
Deep Residual Autoencoder for quality independent JPEG restoration

Simone Zini, Simone Bianco and Raimondo Schettini

arXiv:1903.06117v1 [cs.CV] 14 Mar 2019

Abstract—In this paper we propose a deep residual autoencoder exploiting Residual-in-Residual Dense Blocks (RRDB) to remove artifacts in JPEG compressed images that is independent from the Quality Factor (QF) used. The proposed approach leverages both the learning capacity of deep residual networks and prior knowledge of the JPEG compression pipeline. The proposed model operates in the YCbCr color space and performs JPEG artifact restoration in two phases using two different autoencoders: the first one restores the luma channel exploiting 2D convolutions; the second one, using the restored luma channel as a guide, restores the chroma channels exploiting 3D convolutions. Extensive experimental results on three widely used benchmark datasets (i.e. LIVE1, BSD500, and CLASSIC-5) show that our model is able to outperform the state of the art with respect to all the evaluation metrics considered (i.e. PSNR, PSNR-B, and SSIM). This result is remarkable since the approaches in the state of the art use a different set of weights for each compression quality, while the proposed model uses the same weights for all of them, making it applicable to images in the wild where the QF used for compression is unknown. Furthermore, the proposed model shows a greater robustness than state-of-the-art methods when applied to compression qualities not seen during training.

Fig. 1. PSNR-SSIM comparison of the state-of-the-art models and our proposed method at QF = 10. For both metrics a higher value means better visual results.

Index Terms—JPEG restoration, deep learning, residual network, autoencoder.
S. Zini, S. Bianco, and R. Schettini are with the Department of Informatics, Systems and Communication, University of Milano-Bicocca, 20126 Milan, Italy.

I. INTRODUCTION

Image compression represents a very active research topic due to the high impact of image data in a large number of fields, from image sharing on the web to specific applications involving the acquisition of images and their transfer to elaboration nodes.

Specifically, image compression refers to the task of representing images using the smallest storage space possible. Compression algorithms play a key role in saving space and bandwidth for the storage and transfer of large amounts of images. Two different compression paradigms exist: the former is lossless image compression, where the compression rate is limited by the requirement that the original image must be perfectly recovered; the latter, more widespread, is lossy image compression, where higher compression rates are possible at the cost of some distortion in the recovered image. Among the lossy compression algorithms, the most widespread and used is the JPEG compression algorithm.

The JPEG compression algorithm first converts the original RGB image into the YCbCr color space and processes the luma and chroma channels separately. It divides the luma channel of an input image into non-overlapping 8 × 8 blocks and performs the Discrete Cosine Transform (DCT) on each block separately, while downsampling the chroma components with a bilinear filter. The DCT coefficients obtained from the luma channel are then quantized based on quantization tables and adjusted using the user-selected quality factor. The image is then reconstructed from the quantized DCT coefficients by using the inverse DCT. The described JPEG encoding operation introduces three kinds of artifacts in the recovered images, related to the quality factor used for the compression: i) blocketization artifacts, which come from the recombination of the 8 × 8 blocks, which are independently compressed without considering the adjacent blocks; ii) ringing artifacts, which are most visible along the edges and are related to the coarse quantization of the high-frequency components; iii) blurred low-frequency areas, also related to the compression of the high frequencies in the DCT domain.

The presence of these kinds of artifacts represents a problem, since the general quality of the images is degraded, making them unpleasing for normal users in generic applications (e.g. projection, print, etc.), or even useless for computer vision applications where the loss of information can be potentially critical for the task [1], [2].

With the purpose of reducing these artifacts, in the last years many JPEG artifact reduction algorithms have been proposed. These methods include both traditional image processing pipelines [3]–[8] and machine learning approaches [9]–[16], both making great steps in the restoration of corrupted images.
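To make the origin of these artifacts concrete, the following minimal sketch (our illustration, not code from any of the cited methods) reproduces the lossy step of the pipeline described above on a single luma channel: block-wise DCT, quantization with the standard JPEG luminance table, and inverse DCT. The scale parameter is only a stand-in for the effect of the quality factor.

    import numpy as np
    from scipy.fft import dctn, idctn

    # Standard JPEG luminance quantization table (baseline, roughly QF = 50).
    Q50 = np.array([
        [16, 11, 10, 16,  24,  40,  51,  61],
        [12, 12, 14, 19,  26,  58,  60,  55],
        [14, 13, 16, 24,  40,  57,  69,  56],
        [14, 17, 22, 29,  51,  87,  80,  62],
        [18, 22, 37, 56,  68, 109, 103,  77],
        [24, 35, 55, 64,  81, 104, 113,  92],
        [49, 64, 78, 87, 103, 121, 120, 101],
        [72, 92, 95, 98, 112, 100, 103,  99]], dtype=np.float64)

    def jpeg_luma_roundtrip(y, scale=2.0):
        """Quantize and dequantize the 8x8 DCT blocks of a luma channel.

        y: 2D array in [0, 255]; scale > 1 mimics a lower quality factor.
        The blocking and ringing visible in the output are exactly the
        artifact classes discussed above.
        """
        y = np.asarray(y, dtype=np.float64)
        out = y.copy()
        q = np.maximum(np.round(Q50 * scale), 1.0)
        h, w = (s - s % 8 for s in y.shape)
        for i in range(0, h, 8):
            for j in range(0, w, 8):
                block = y[i:i + 8, j:j + 8] - 128.0
                coeff = dctn(block, norm='ortho')
                coeff = np.round(coeff / q) * q        # the only lossy step
                out[i:i + 8, j:j + 8] = idctn(coeff, norm='ortho') + 128.0
        return np.clip(out, 0, 255)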
However, these methods suffer from two main limits: the first one is that they need to train a different model for each possible quality factor (QF), making them not generally applicable to images downloaded from the web unless the QF used for compression is known; the second one is that the great majority of methods in the state of the art restores just the luma channel, or does not fully exploit the knowledge about the JPEG compression pipeline.

To address these problems we propose a new method for the restoration of JPEG compressed images in YCbCr color space, based on machine learning, specifically on convolutional autoencoders. The proposed approach consists of two deep autoencoders, respectively used for luma and chroma restoration, that are able to restore images independently from the quality factor used for the compression. The main contributions are the following:
- the design of a method for the restoration of JPEG compression artifacts that is independent from the QF used;
- the design of a model trainable end-to-end that fully exploits knowledge about the JPEG compression pipeline;
- a thorough comparison with the state of the art on three standard datasets at fixed QFs;
- an analysis of robustness of restoration results at QFs not used for training.

II. RELATED WORKS

The task of JPEG compression artifact removal has been faced in different ways in the past years. The existing methods can be broadly classified into two groups: traditional image processing methods and learning based methods.

To the first group belong methods based on traditional image processing techniques working both in the spatial and in the frequency domain. For spatial domain processing, different kinds of filters have been proposed, with the intent of restoring specific areas of the images such as edges [3], textures [4], smooth regions [5], etc. Algorithms usually rely on information obtained by the application of the Discrete Cosine Transform (DCT) [6]. SA-DCT, proposed by Foi et al. [7], attempts to reconstruct an estimate of the signal using the DCT of the original image together with the spatial information contained in the image itself. However, SA-DCT is not capable to reproduce details like sharp edges or complex textures. To overcome this limit, different restoration oriented methods have been proposed, like the Regression Tree Fields based method (RTF) [8]. The RTF uses the results of SA-DCT to restore images, taking advantage of a regression tree field model.

Following the success of the application of Deep Convolutional Neural Networks (Deep-CNNs) in image processing tasks, such as image denoising [10] and Single-Image Super-Resolution [17], Deep-CNNs have been applied with success to the JPEG compression artifact removal task. The basic idea behind Deep-CNNs is to learn a function to map a set of images from an input distribution to the desired output one. In the artifact removal case, the objective is to map degraded images into a distribution without the presence of the noise. The trained neural network obtained at the end of the training process represents an approximation of the desired function for the translation of the images from one distribution to another. The first attempt with this kind of models has been done by Dong et al. [9], who proposed ARCNN, a model inspired by SRCNN [17], a neural network for Super-Resolution. This first attempt has been followed by DnCNN [10], a CNN for general denoising tasks that has also been used on JPEG compressed images, and CAS-CNN [11], a model proposed by Cavigelli et al., who presented a much deeper model capable to obtain higher quality images. Wang et al. proposed D3 [12], a deep neural network that adopts JPEG-related priors to improve reconstruction quality, which obtained an improvement in speed and performance with respect to the previous models. In 2017, Galteri et al. [13] developed a generative adversarial network (GAN) [18] for artifact removal and texture reconstruction.

In 2018 a number of new models for JPEG artifact removal have been presented, showing interesting improvements in result quality. Liu et al. [14] proposed the Multi-level Wavelet CNN (MWCNN), a model based on the U-Net architecture [19], trained and used for multiple tasks: compression artifact removal, denoising and super-resolution. Zhang et al. [15] developed DMCNN, a Dual-Domain Multi-Scale CNN, which achieves higher result quality than the previous works by using both pixel and frequency (i.e. DCT) domain information. Lastly, S-Net, the most recent method, by Zheng et al. [16], proposed a "greedy loss architecture" to train deeper models capable to outperform the previous state of the art.

III. PROPOSED METHOD

The methods in the state of the art mainly suffer from two limits: the first one is that each machine learning model needs to know the JPEG compression Quality Factor (QF) of each input image to properly restore it; the second one is that the great majority of them are capable to restore only the luma channel without considering the chroma components, and the only one that recovers all three channels [16] does not fully exploit theoretical knowledge of the JPEG compression pipeline.

In this work we propose a method able to overcome both these problems. The first problem has to do with the way the models are trained: all of the previous existing methods make the implicit assumption that the compression quality factor QF used to compress the input images is known. In fact, most of the previous models present networks trained on datasets compressed at specific quality factors (the most common being QF = 10, 20, 30 and 40). This way of training the models leads to two limits:
- the models are capable to correctly restore only images at a specific QF, with the consequence that a specific training for each quality factor is needed;
- the QF used for the compression of the images is needed in order to train a model and correctly restore the images: this is usually not a known information for images coming from unknown sources (e.g. downloaded from the web), thus largely limiting the usability of the model.
In order to overcome the necessity to know the compression quality factor, we train our model on a dataset containing images compressed at different QFs: this makes the model more generic and able to restore images taken in the wild, i.e. without knowing the actual QF used. This objective poses a challenge, since the training of such a quality independent model is much harder than training on a single quality factor.

The second problem concerns the way the previous models restore the images: all of the previous state-of-the-art methods are trained on the luma channel (Y channel of the YCbCr space) of the images. This approach is based on the fact that the JPEG compression algorithm applies the DCT to the Y channel, introducing ringing and blocketization artifacts on the luma channel, while the Cb and Cr channels are just sub-sampled with bicubic interpolation. The design and training of a model for the specific restoration of the luma component and its subsequent application for the restoration of the chroma components (as done for example by ARCNN [9]) introduces chromatic aberrations and artifacts in the final result. S-Net [16] is the only method considering this problem: instead of training a model for the restoration of just the luma component, it takes as input a full RGB image and recovers a full RGB image as output.

To overcome this second limit and obtain better results, we exploit the knowledge of how the JPEG compression pipeline works and propose the use of two models for the image restoration in YCbCr space: the first model restores the Y channel; the second model then uses the result as a Structure Map (i.e. a guide) for the restoration of the chroma components. A schematic representation of the proposed method is depicted in Figure 2.
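As a compact reading of the scheme in Figure 2, the full inference path can be sketched as follows; LumiNet, ChromaNet and the two color-space helpers are hypothetical callables standing for the components described in the next subsections, so this is a paraphrase of the method rather than the actual implementation.

    def restore_jpeg(rgb, luminet, chromanet, rgb_to_ycbcr, ycbcr_to_rgb):
        """Two-phase restoration: Y first, then CbCr guided by the restored Y'.

        rgb: compressed input image; the two networks and the two color-space
        helpers are hypothetical callables assumed to be defined elsewhere.
        """
        y, cb, cr = rgb_to_ycbcr(rgb)                # split the compressed input
        y_hat = luminet(y)                           # phase 1: restore the luma
        cb_hat, cr_hat = chromanet(y_hat, cb, cr)    # phase 2: Y' as structure map
        return ycbcr_to_rgb(y_hat, cb_hat, cr_hat)   # recombine and convert back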
[26]. The RRDBs blocks combine multi-level residual learning
and dense connection architecture: the RRDBs are designed
A. Luma and chroma Restoration Model without the use of the Batch Normalization and the application
The vast majority of learning based methods for JPEG of the residual learning on different levels. The RRDBs are
compression artifact removal in the state of the art [9]–[12], shown in Figure 5: each RRDB is made of five Dense Blocks,
[14], [15] focus exclusively on the luma component of the which use only convolutions with Leaky ReLUs activation
images. Generally these methods perform the compression and dense skip connection structures, combined together with
artifact removal working on the Y channel of the images, after other skip connections. Finally, the decoder is designed in a
converting them in YCbCr color space. The learned model in symmetrical way with respect to the encoder part.
some cases is then applied as is also on Cb and Cr channels The same architecture has been used for both the networks
(e.g. [9]). These approaches do not take in consideration for luma and chroma restoration, but with some differences:
the chroma aspects of the images, generating results with - different depth in terms of number of RRDBs used in the
aberrations in RGB space and low perceptual quality. central part;
Moreover the JPEG compression algorithm, when operating - different feature extraction from the input in the encoder
with very low compression quality factors, such as QF < part.
20, tends to change the colors of the input images in two For the restoration of the luma (Y channel) the number of
different ways: hue change and spatial location change. As can central RRDBs is set to five, while for the CbCr restoration the
be seen in Figure 3, in the compressed version of the Cb and number of RRDB is decreased to three. The second and more
Cr channels, as expected the color resolution is reduced, and important difference is in the first layer of the CbCr version
also, for some elements, the color position does not correspond of the network, which is a 3-dimensional convolutional layer.
to the one in the original uncompressed image. Considering that the input of the CbCr-Net is the concatenation
Keeping the above considerations in mind we propose a (along the channel dimension) of the restored Y’ channel with
method for restoring both luma and chroma components of the Cb and Cr channels, we decided to use a 3D convolution
the compressed images (see Figure 2). The method consists to make the model capable to correlate information about
of two steps: the first step, after the conversion of the input color and structures with the use of the same kernels for all
image into YCbCr color space, involves the restoration of the the information coming from the three input channels. The
Y channel alone, using a first model named LumiNet, and output of this second network are the two restored Cb and
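A possible PyTorch rendering of this entry layer is sketched below; the stacking follows the text, while the final reduction over the depth axis is our own assumption, since the paper does not detail how the 3D features are collapsed back to 2D.

    import torch
    import torch.nn as nn

    class ChromaNetEntry(nn.Module):
        """First layer of the CbCr network: one 3D convolution over Y', Cb, Cr.

        Stacking the three planes along a depth axis lets a single 3D kernel
        correlate structure (Y') and color (Cb, Cr), instead of learning a
        separate 2D kernel per channel.
        """
        def __init__(self, out_channels=64):
            super().__init__()
            self.conv = nn.Conv3d(1, out_channels, kernel_size=3, padding=1)

        def forward(self, y_hat, cb, cr):
            # y_hat, cb, cr: (N, 1, H, W) tensors
            x = torch.stack([y_hat, cb, cr], dim=2)   # (N, 1, 3, H, W)
            feat = self.conv(x)                       # (N, C, 3, H, W)
            return feat.mean(dim=2)                   # (N, C, H, W); our choice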
Fig. 2. Schematic representation of the proposed method: the input image is first converted to YCbCr color space. The Y channel is restored with the Y-Net and the result Y' is concatenated with the original CbCr channels to restore Cb'Cr' with the CbCr-Net. The restored Y'Cb'Cr' channels are then converted back to RGB color space.

TABLE I
Detailed architecture of the autoencoders used for both the luma and chroma restoration. The number of RRDBs is B = 5 for the Y-Net and B = 3 for the CbCr-Net.

Part      Layer      Filter size, Stride, Padding   Output channels
Encoder   Conv2D     3x3, 1, 1                      64
          Conv2D     5x5, 1, 2                      128
          LReLU      -                              128
          Conv2D     3x3, 1, 1                      64
          LReLU      -                              64
Central   RRDB x B   -                              64
Decoder   Conv2D     3x3, 1, 1                      128
          LReLU      -                              128
          Conv2D     5x5, 1, 2                      64
          LReLU      -                              64
          Conv2D     3x3, 1, 1                      1
          Tanh       -                              1
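Transcribed into PyTorch, Table I corresponds to a skeleton like the following. This is a sketch under our reading of the table: the LeakyReLU slope is an assumption, the first convolution has no activation as in the listing above, and the RRDB module itself is sketched later in this section, so it is passed in as a factory.

    import torch.nn as nn

    def make_autoencoder(rrdb_factory, B=5, in_channels=1):
        """Autoencoder of Table I: B = 5 for the Y-Net, B = 3 for the CbCr-Net.

        rrdb_factory: callable returning one RRDB module for 64-channel
        features (defined separately, see the sketch after the design choices).
        """
        lrelu = lambda: nn.LeakyReLU(0.2, inplace=True)   # slope is our assumption
        return nn.Sequential(
            # encoder
            nn.Conv2d(in_channels, 64, 3, 1, 1),
            nn.Conv2d(64, 128, 5, 1, 2), lrelu(),
            nn.Conv2d(128, 64, 3, 1, 1), lrelu(),
            # central feature enhancement: B Residual-in-Residual Dense Blocks
            *[rrdb_factory() for _ in range(B)],
            # decoder, symmetric to the encoder
            nn.Conv2d(64, 128, 3, 1, 1), lrelu(),
            nn.Conv2d(128, 64, 5, 1, 2), lrelu(),
            nn.Conv2d(64, 1, 3, 1, 1),
            nn.Tanh())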

Fig. 3. Visual example of how the JPEG compression algorithm, when operating with very low compression quality factors, changes the colors of the input images in two different ways: hue change and spatial location change. Panels: (a) Original; (b) Compressed (QF = 10); (c) Original Cb; (d) Compressed Cb (QF = 10); (e) Original Cr; (f) Compressed Cr (QF = 10).

In order to improve the quality of the generated results, as well as to make the training process more stable, the proposed architecture includes the following design choices (a code sketch follows below):
- removal of Batch Normalization (BN) layers from the Residual Blocks;
- use of a residual scaling parameter in each Residual Block;
- initialization of the model weights using a scaled version of the Kaiming initialization [27].

The removal of the batch normalization layers has been proved, in image Super-Resolution [26] and image deblurring [28] tasks, to increase the performance for the generation of images in terms of quality indexes (PSNR and SSIM [29]). The removal of the BN layers, which improves the stability of the training and the appearance of the generated images, makes on the other hand the training of deep networks more difficult. To solve this issue, two solutions have been proved to work well: the so-called residual scaling (in our model set to 0.2), which scales each residual in order to not magnify the input image in a wrong way, and a small weight initialization, obtained by applying the Kaiming initialization presented by He et al. [27] scaled by a factor of 0.1. As can be seen in Figure 5, the residual scaling is applied at the higher levels of the residual learning architecture, i.e. on the output of each dense block and at the end of the RRDBs.
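Putting the three design choices together, one RRDB with its Dense Blocks, residual scaling and scaled Kaiming initialization can be sketched as follows (a reconstruction after [25] and Figure 5; the 32-channel growth rate of the dense connections is our assumption):

    import torch
    import torch.nn as nn

    class DenseBlock(nn.Module):
        """Five densely connected convolutions, no Batch Normalization."""
        def __init__(self, nf=64, growth=32, beta=0.2):
            super().__init__()
            self.beta = beta
            self.convs = nn.ModuleList(
                nn.Conv2d(nf + i * growth, growth if i < 4 else nf, 3, 1, 1)
                for i in range(5))
            self.act = nn.LeakyReLU(0.2, inplace=True)

        def forward(self, x):
            feats = [x]
            for i, conv in enumerate(self.convs):
                out = conv(torch.cat(feats, dim=1))
                if i < 4:
                    feats.append(self.act(out))
            return x + self.beta * out          # residual scaling (beta = 0.2)

    class RRDB(nn.Module):
        """Residual-in-Residual: five Dense Blocks plus an outer scaled residual."""
        def __init__(self, nf=64, beta=0.2):
            super().__init__()
            self.beta = beta
            self.blocks = nn.Sequential(*[DenseBlock(nf, beta=beta) for _ in range(5)])

        def forward(self, x):
            return x + self.beta * self.blocks(x)

    def init_scaled_kaiming(module, scale=0.1):
        """Kaiming initialization [27] scaled by 0.1, applied via model.apply()."""
        if isinstance(module, (nn.Conv2d, nn.Conv3d)):
            nn.init.kaiming_normal_(module.weight, a=0.2, nonlinearity='leaky_relu')
            with torch.no_grad():
                module.weight.mul_(scale)
            if module.bias is not None:
                nn.init.zeros_(module.bias)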
Fig. 4. Graphical representation of the architecture of the autoencoders used for both the luma and chroma restoration: an encoder, a feature enhancement part made of B Residual-in-Residual Dense Blocks, and a decoder.

Fig. 5. Schematic representation of the architecture of the Residual-in-Residual Dense Block (RRDB) [25]: five Dense Blocks of convolutions with Leaky ReLU activations, combined by skip connections, with the residual scaling factor β applied on the block outputs.

IV. EXPERIMENTAL SETUP

The training of the proposed method leads to two different Deep-CNNs, respectively for the restoration of the luminance and chroma components of JPEG compressed images at generic quality (i.e. QFs). In order to evaluate the results, our models have been compared with the state of the art in four different experimental setups:
1) known QF luminance restoration: comparison with the state-of-the-art methods which work only on the Y channel of the input images;
2) unknown QF luminance restoration: comparison to test the ability of the models to restore images at intermediate QFs never seen during training;
3) high and low detail density areas restoration: evaluation of the performance of the state-of-the-art methods and the proposed one over specific areas of the images, by dividing the images in patches classified on high-to-low frequency (DCT domain) and high-to-low detail density;
4) color restoration: evaluation of the color restoration capability of the model on the images converted to RGB space after the elaboration.

A. Dataset

The dataset used for training is the DIV2K dataset, a collection of high-quality images (2K resolution) presented during the NTIRE2017 challenge [30] for image restoration tasks. This dataset is made of a total amount of 900 images: 800 are used for training, while the remaining 100 are used for validation. The complete dataset contains also 100 images for testing; the groundtruths of this last part have not been released after the challenge, and therefore are not used in this paper.

With the purpose of increasing the amount of different textures and patterns to show to the model during training, we have combined the DIV2K dataset with the FLICKR2K dataset [31], a collection of 2650 high-quality images (same resolution as DIV2K) collected from the Flickr website.

In order to train the models on different quality factors, to each image in the dataset we have applied 10 different compression levels, corresponding to the quality factors from QF = 10 to QF = 100, with step 10. The images have been compressed in RGB space with the MATLAB standard library function, then the compressed images have been converted to YCbCr space using the Python scikit-image library (v0.14.0) during the training phase. The compressed version of the training dataset contains 8000 images. The same operation has been applied to the FLICKR2K dataset, for a total amount of 34k training images.

The evaluation of our model has been done on LIVE1 [29], CLASSIC-5 and BSD500 [32], three benchmark datasets widely used for the evaluation of JPEG artifact removal algorithms. For the evaluation of the behaviour of the models with unknown compression quality factor, we adopted the SDIVL dataset [33], proposed for the Image Quality Assessment task.
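A sketch of this preparation step is given below, with Pillow used as a stand-in for the MATLAB compression routine (so the exact encoder, and hence the artifacts, may differ slightly from the paper's data):

    import io
    import numpy as np
    from PIL import Image
    from skimage.color import rgb2ycbcr

    def make_training_pairs(clean_rgb):
        """Compress one clean PIL image at QF = 10, 20, ..., 100.

        Returns (degraded, target) pairs in YCbCr, mirroring Sec. IV-A.
        """
        target = rgb2ycbcr(np.asarray(clean_rgb.convert('RGB')))
        pairs = []
        for qf in range(10, 101, 10):
            buf = io.BytesIO()
            clean_rgb.save(buf, format='JPEG', quality=qf)
            buf.seek(0)
            degraded = np.asarray(Image.open(buf).convert('RGB'))
            pairs.append((rgb2ycbcr(degraded), target))
        return pairs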
B. Evaluation metrics

The globally adopted metrics for the evaluation of the quality of images in artifact removal tasks are the PSNR, PSNR-B [34] (which focuses the evaluation on the blocketization in the image) and SSIM [29] indexes.
For all of these three measures a higher value means better results. The PSNR and PSNR-B indexes give information about the quality of the images in terms of noise and perceived quality, with PSNR-B taking into consideration also the blocketization artifacts; the SSIM index is an indicator of the quality of edges and structures contained in the image. For all the three indexes considered, a higher value means that the content and the structures in the reconstructed image are more similar to the ones in the target image.
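For reference, the PSNR used throughout the following tables reduces to a few lines; SSIM and PSNR-B [34] require dedicated implementations (e.g. scikit-image provides SSIM):

    import numpy as np

    def psnr(reference, restored, peak=255.0):
        """Peak Signal-to-Noise Ratio in dB; higher is better."""
        err = reference.astype(np.float64) - restored.astype(np.float64)
        mse = np.mean(err ** 2)
        return float('inf') if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)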
C. Training Details

All the training has been done on an NVIDIA GTX 1070 GPU with 8 GB of memory, using the PyTorch framework at version 0.4.1. The mini-batch size has been set to 8 and each input image has been cropped to a patch size of 100 × 100 pixels. During the experiments we tried to train the network with different crop sizes (32 × 32, 50 × 50, 100 × 100 and 400 × 400), observing how training deeper networks with a bigger patch size gives a boost in performance on both the PSNR and SSIM indexes.

We also explored the use of different numbers of RRDBs in the model: we observed how, with deeper models using this specific kind of residual blocks, the results got better and better, increasing the PSNR and SSIM values on the validation set. The final structure uses five RRDBs for the Y channel restoration model and three RRDBs for the CbCr model, where each convolution has 64 filters. We found this configuration to be the best one with respect to the patch size, the amount of RRDBs, the number of filters and the limits due to the memory offered by our board.

We trained the model using the Adam optimizer [35] with β1 = 0.9, β2 = 0.999, with the learning rate initialized at 2 × 10−4 and decreased after 200 epochs of training by a factor of 2. The training has been performed using the L1 loss, since it allows us to achieve better PSNR results and makes the training more stable.
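The reported hyperparameters translate into a PyTorch setup along these lines (a sketch: the placeholder network and random batches stand in for the actual models and data pipeline, and the total number of epochs is illustrative):

    import torch
    from torch import nn, optim

    model = nn.Conv2d(1, 1, 3, padding=1)        # placeholder for LumiNet/ChromaNet
    criterion = nn.L1Loss()                      # L1 loss, as in Sec. IV-C
    optimizer = optim.Adam(model.parameters(), lr=2e-4, betas=(0.9, 0.999))
    scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=200, gamma=0.5)

    for epoch in range(400):                     # illustrative number of epochs
        compressed = torch.rand(8, 1, 100, 100)  # stand-in batch: size 8, 100x100 crops
        target = torch.rand(8, 1, 100, 100)
        optimizer.zero_grad()
        loss = criterion(model(compressed), target)
        loss.backward()
        optimizer.step()
        scheduler.step()                         # halves the lr every 200 epochs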
V. EXPERIMENTAL RESULTS

A. Restoration with known compression Quality Factor

We compared our model with the state-of-the-art models ARCNN [9], CAS-CNN [11], D3 [12], and the more recent DMCNN [15], MWCNN [14], ARGAN [13] and S-Net [16]. Since the state-of-the-art methods operate only on the Y channel of the images, in order to make a fair comparison the metrics are evaluated on the Y channel recovered by the first network against the corresponding target images, using the MATLAB standard libraries, over five different compression qualities: 10, 20, 40, 60, 80. For each method, on all the datasets considered, we report the results taken from the corresponding publication, except for ARCNN and MWCNN, which provide the source code and are then used directly for the evaluation. Since the training of the proposed method leads to a single model that can be used for all the quality factors, we used the same model for the evaluation at all the qualities previously mentioned. All the state-of-the-art methods compared, instead, have a different trained model for each QF considered.

Tables II, III and IV respectively report the comparison on the LIVE1, BSD500 and CLASSIC-5 datasets for all the three metrics considered. As can be seen, our model outperforms the state of the art on all the metrics. With the proposed model we obtained improvements with respect to the state-of-the-art methods on both general perceptual quality (PSNR/PSNR-B) and structure reconstruction (SSIM). Since each index focuses on different aspects of the restoration quality, each index alone is not capable to summarize all the aspects of a good reconstruction. Therefore, we also compare the methods in a graph-style view, reported in Figures 1 and 6, to correlate the two indexes. In order for a method to obtain a more pleasing perceived quality, it is necessary that both the metrics obtain high values. It is easy from this kind of view to see how the proposed method outperforms the current state-of-the-art models, even if a single model is used for all the QFs.

B. Restoration with unknown compression Quality Factor

Another kind of evaluation has been done on the capability of the models to recover images at compression quality factors never seen during training. In most of the real use cases, the JPEG compression quality factor previously applied to an image is not known: it is then important that a model is able to recover the images without this prior information. On the other hand, if we are at least able to estimate the compression quality factor of the input compressed image, following the previous approaches we should train new models for each specific quality factor needed, or use the model trained for the closest QF to the desired one.

We compare our model with the two state-of-the-art models for which the code is available (i.e. ARCNN and MWCNN) in a specific selection of cases. Since the previous models have been trained on specific quality factors, and our model has instead been trained over quality factors from 10 to 100 in steps of 10, without the use of images with QFs in between, we decided to test the model robustness on never seen artifacts. In order to perform the evaluation in a coherent way, for the state-of-the-art algorithms we used the pretrained models for the nearest quality factor: for example, if the input image has been compressed with QF = 17, we used the models trained for QF = 20. For this evaluation we adopted the SDIVL dataset: to each image of the testset we applied all of the compression factors in the interval 5−25. The evaluation is done in the same way as in the previous section, by extracting the Y channel and measuring the PSNR, PSNR-B and SSIM indexes.

Figure 7 shows the results of the models on the SDIVL dataset over all the compression quality factors. As can be seen in those graphs, our model shows a more stable behaviour: it is capable to restore images at different QFs with a more coherent and smooth behaviour in relation to the increase of the QF, in comparison with the other methods. Moreover, the previous state-of-the-art models have difficulties to restore images at quality factors distant from the trained one. It is particularly interesting to see how the other models have difficulties to restore images at qualities higher than the QF used in training, in terms of structures in the images (Figure 7c), due to the more complex textures never seen by the models during the training phase.
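The sweep itself is straightforward to reproduce; in the sketch below, restore_fn and metric_fn are hypothetical callables wrapping a trained model and one of the indexes defined above:

    import io
    from PIL import Image

    def qf_sweep(clean, restore_fn, metric_fn, qfs=range(5, 26)):
        """Compress a clean PIL image at every QF in [5, 25] and score the restoration."""
        scores = {}
        for qf in qfs:
            buf = io.BytesIO()
            clean.save(buf, format='JPEG', quality=qf)
            buf.seek(0)
            scores[qf] = metric_fn(clean, restore_fn(Image.open(buf)))
        return scores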
Fig. 6. PSNR-SSIM comparison of the state-of-the-art models and our proposed method at QF = 20 (left panel) and QF = 40 (right panel). For both metrics a higher value means better visual results.

TABLE II
Comparison on the LIVE1 test set: for the methods in the state of the art five different models are trained, one for each QF considered. The proposed method uses the same model for all the QFs.

Metric   QF   ARCNN  DnCNN  CAS-CNN  D3     DMCNN  MWCNN  S-NET  ARGAN-MSE  ARGAN  OUR
              [9]    [10]   [11]     [12]   [15]   [14]   [16]   [13]       [13]
PSNR     10   29.13  29.19  29.44    29.96  29.73  29.37  29.87  29.45      27.29  29.97
         20   31.40  31.59  31.70    32.21  32.09  31.58  32.26  31.77      28.35  32.34
         40   33.63  33.96  34.10    -      -      34.17  34.61  34.09      28.99  34.78
         60   -      -      35.78    -      -      -      -      -          -      36.47
         80   -      -      38.55    -      -      -      -      -          -      39.31
PSNR-B   10   28.74  -      29.19    29.45  29.55  28.85  -      29.10      26.69  29.60
         20   30.69  -      30.88    31.35  31.32  30.83  -      31.26      28.10  31.76
         40   33.12  -      33.68    -      -      33.33  -      33.40      28.84  33.96
         60   -      -      35.10    -      -      -      -      -          -      35.51
         80   -      -      37.73    -      -      -      -      -          -      38.26
SSIM     10   0.823  0.812  0.833    0.823  0.842  0.832  0.847  0.834      0.773  0.850
         20   0.886  0.880  0.895    0.890  0.905  0.891  0.907  0.896      0.817  0.908
         40   0.931  0.924  0.937    -      -      0.936  0.942  0.922      0.837  0.944
         60   -      -      0.954    -      -      -      -      -          -      0.960
         80   -      -      0.973    -      -      -      -      -          -      0.976

TABLE III
Comparison on the BSD500 test set: for the methods in the state of the art five different models are trained, one for each QF considered. The proposed method uses the same model for all the QFs.

Metric   QF   ARCNN  DnCNN  CAS-CNN  D3   DMCNN  MWCNN  S-NET  ARGAN-MSE  ARGAN  OUR
              [9]    [10]   [11]     [12] [15]   [14]   [16]   [13]       [13]
PSNR     10   29.10  -      -        -    29.67  29.50  29.82  29.03      27.01  29.92
         20   31.25  -      -        -    31.98  31.34  32.15  31.20      28.07  32.23
         40   33.55  -      -        -    -      33.23  34.45  33.30      28.61  34.61
PSNR-B   10   28.75  -      -        -    -      28.60  -      28.61      26.30  29.41
         20   30.60  -      -        -    -      29.84  -      30.48      27.76  31.39
         40   32.80  -      -        -    -      31.04  -      32.18      28.20  33.34
SSIM     10   0.819  -      -        -    0.840  0.835  0.844  0.807      0.746  0.847
         20   0.885  -      -        -    0.904  0.889  0.905  0.876      0.794  0.906
         40   0.929  -      -        -    -      0.928  0.941  0.921      0.815  0.943

C. High and low frequency areas restoration

In order to better understand whether the proposed method performs better than the approaches in the state of the art only on certain image types, we conduct a further experiment: we divide the images from the LIVE1 testset, compressed at QF = 10, into 64 × 64 patches and classify each of them into five categories. The categories are obtained by equally dividing the patches into five bins with respect to both frequency and detail density. Patch frequency is computed as the weighted average of the 2D Fourier Transform normalized magnitude. Patch detail density is computed as the 2D average of the result of the Canny edge detection. The results for the considered evaluation metrics over the five categories of frequency and detail density are respectively reported in Tables V and VI. From the results reported, it is possible to notice that the proposed method consistently outperforms the state of the art on all the frequency and detail density categories.
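The two patch statistics can be sketched as follows; the radial weighting in patch_frequency is our interpretation of "weighted average of the 2D Fourier Transform normalized magnitude":

    import numpy as np
    from skimage.feature import canny

    def patch_frequency(patch):
        """Magnitude-weighted mean spatial frequency of a grayscale patch."""
        mag = np.abs(np.fft.fftshift(np.fft.fft2(patch)))
        mag /= mag.sum() + 1e-12                       # normalized magnitude
        h, w = patch.shape
        yy, xx = np.mgrid[0:h, 0:w]
        radius = np.hypot(yy - h / 2.0, xx - w / 2.0)  # distance from the DC term
        return float((mag * radius).sum())

    def detail_density(patch, sigma=1.0):
        """2D average of the Canny edge map of a grayscale patch."""
        edges = canny(patch.astype(np.float64), sigma=sigma)
        return float(edges.mean())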
TABLE IV
Comparison on the CLASSIC-5 test set: for the methods in the state of the art five different models are trained, one for each QF considered. The proposed method uses the same model for all the QFs.

Metric   QF   ARCNN  DnCNN  CAS-CNN  D3   DMCNN  MWCNN  S-NET  ARGAN-MSE  ARGAN  OUR
              [9]    [10]   [11]     [12] [15]   [14]   [16]   [13]       [13]
PSNR     10   29.04  29.40  -        -    -      29.68  -      -          -      29.67
         20   31.16  31.63  -        -    -      31.78  -      -          -      31.89
         40   33.34  33.77  -        -    -      34.05  -      -          -      34.04
PSNR-B   10   28.75  -      -        -    -      29.06  -      -          -      29.35
         20   30.60  -      -        -    -      30.95  -      -          -      31.43
         40   32.80  -      -        -    -      33.20  -      -          -      33.33
SSIM     10   0.811  0.803  -        -    -      0.828  -      -          -      0.829
         20   0.869  0.861  -        -    -      0.878  -      -          -      0.882
         40   0.910  0.900  -        -    -      0.916  -      -          -      0.917

Fig. 7. Comparison on QFs not seen during training: (a) PSNR, (b) PSNR-B and (c) SSIM as a function of the Quality Factor (5–25) for ARCNN, MWCNN and our model. For ARCNN and MWCNN the models trained for QF = 10 and QF = 20 are tested on QFs in the range [5, 25]. The proposed model is trained for QFs in the range [10, 100] with steps of 10, and is tested on the same intermediate QFs not seen in training.

D. Color Restoration

The final evaluation is focused on the color restoration capability of the models. The comparison, in the same way as done in the previous evaluations, has been carried out among ARCNN [9], MWCNN [14] and our proposed model.

We restored the images from the LIVE1 testset compressed with the lowest quality factors, QF = 10, 20, 40. For this specific evaluation we restored both luma and chroma components. In the case of the ARCNN and MWCNN methods, we adopted the same model for all of the three channels (Y, Cb and Cr), while our method uses the two different networks to first restore the luminance channel and then the chrominance channels.

For this comparison we used the PSNR, PSNR-B and SSIM indexes over the restored images in RGB space, instead of only evaluating the luminance information, using the MATLAB standard library: numerical results can be seen in Table VII, and visual results are summarized in some patches from the images of LIVE1 in Figure 8. As can be seen, the proposed model obtains better results than the other methods in terms of the PSNR, PSNR-B and SSIM indexes, and the difference in the final images is also evident. The blocketization and the color aberrations coming from the compression are blurred and maintained by the other models, while they are cleaned by our model, which reshapes the color information with respect to the structures in the images. The results are much more pleasing and realistic than the ones of the other methods.

VI. CONCLUSION

In this paper we proposed a deep residual autoencoder exploiting Residual-in-Residual Dense Blocks (RRDB) to remove artifacts in JPEG compressed images, that is independent from the QF used. The proposed model operates in the YCbCr color space and performs a two-phase restoration of JPEG artifacts: in the former phase, a first autoencoder exploiting 2D convolutions is used to restore the luma channel; in the latter phase, a second autoencoder, by stacking along the channel dimension the results of the first autoencoder and the original chroma channels, employs 3D convolutions to exploit the restored luma channel as a guide, and restores the chroma channels.

The main contributions of this paper are: i) the design of a method for the restoration of JPEG compression artifacts that is independent from the QF used; ii) the design of a model trainable end-to-end that fully exploits knowledge about the JPEG compression pipeline; iii) a thorough comparison with the state of the art on three standard datasets at fixed QFs; iv) an analysis of robustness of restoration results at QFs not used for training.
TABLE V
Comparison by subdividing the image patches on the basis of the frequency content in five classes from high to low.

                ARCNN                    MWCNN                    OUR
Frequency      PSNR   PSNR-B  SSIM      PSNR   PSNR-B  SSIM      PSNR   PSNR-B  SSIM
high           27.53  27.26   0.782     27.61  27.24   0.792     28.18  27.88   0.807
medium-high    25.00  24.66   0.685     25.24  24.67   0.700     25.64  25.18   0.729
medium         24.61  24.27   0.734     24.50  23.82   0.740     25.37  24.91   0.773
medium-low     25.92  25.49   0.794     25.91  25.24   0.803     26.73  26.21   0.827
low            27.08  25.93   0.840     26.72  25.27   0.849     27.81  26.52   0.864

TABLE VI
Comparison by subdividing the image patches on the basis of the detail density in five classes from low to high.

                ARCNN                    MWCNN                    OUR
Detail density PSNR   PSNR-B  SSIM      PSNR   PSNR-B  SSIM      PSNR   PSNR-B  SSIM
high           23.20  22.94   0.667     23.42  22.82   0.683     23.87  23.45   0.716
medium-high    24.69  24.39   0.721     24.91  24.31   0.735     25.42  25.02   0.763
medium         25.68  25.22   0.758     25.94  25.26   0.772     26.41  25.86   0.794
medium-low     26.83  26.12   0.805     27.01  25.95   0.817     27.47  26.61   0.832
low            29.17  28.22   0.884     27.45  26.28   0.888     29.97  28.99   0.897

Fig. 8. Visual comparison of the full color JPEG restoration: (a) Input, (b) ARCNN, (c) MWCNN, (d) OUR, (e) Target.

Extensive experimental results on three widely used benchmark datasets (i.e. LIVE1, BSD500, and CLASSIC-5) show that our model is able to outperform the state of the art with respect to all the evaluation metrics considered (i.e. PSNR, PSNR-B, and SSIM). This result is remarkable since the approaches in the state of the art use a different set of weights for each compression quality, while the proposed model uses the same weights for all of them, making it applicable to images in the wild where the QF used for compression is unknown. Furthermore, the proposed model shows a greater robustness than state-of-the-art methods when applied to compression qualities not seen during training. Since preliminary experiments with the same architecture showed good results for the restoration of other artifacts (i.e. noise removal, in the CVPRW NTIRE2019 challenge), as future work we plan to investigate its extension to other single and multiple distortions [36].
TABLE VII
Comparison of the evaluation metrics computed on full-color restored images of the LIVE1 test set.

Metric   QF   ARCNN  MWCNN  OUR
PSNR     10   28.97  29.84  29.98
         20   31.31  32.00  32.35
         40   33.64  34.58  34.80
PSNR-B   10   28.69  29.40  29.61
         20   30.78  31.28  31.77
         40   33.13  33.78  33.98
SSIM     10   0.822  0.846  0.851
         20   0.888  0.900  0.909
         40   0.931  0.941  0.944

REFERENCES

[1] S. Dodge and L. Karam, "Understanding how image quality affects deep neural networks," in 2016 Eighth International Conference on Quality of Multimedia Experience (QoMEX). IEEE, 2016, pp. 1–6.
[2] S. Bianco, L. Celona, and R. Schettini, "Robust smile detection using convolutional neural networks," Journal of Electronic Imaging, vol. 25, no. 6, p. 063002, 2016.
[3] P. List, A. Joch, J. Lainema, G. Bjontegaard, and M. Karczewicz, "Adaptive deblocking filter," IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, no. 7, pp. 614–619, 2003.
[4] H. C. Reeve and J. S. Lim, "Reduction of blocking effects in image coding," Optical Engineering, vol. 23, no. 1, p. 230134, 1984.
[5] C. Wang, J. Zhou, and S. Liu, "Adaptive non-local means filter for image deblocking," Signal Processing: Image Communication, vol. 28, no. 5, pp. 522–530, 2013.
[6] N. Ahmed, T. Natarajan, and K. R. Rao, "Discrete cosine transform," IEEE Transactions on Computers, vol. 100, no. 1, pp. 90–93, 1974.
[7] A. Foi, V. Katkovnik, and K. Egiazarian, "Pointwise shape-adaptive DCT for high-quality deblocking of compressed color images," in Signal Processing Conference, 2006 14th European. IEEE, 2006, pp. 1–5.
[8] J. Jancsary, S. Nowozin, and C. Rother, "Loss-specific training of non-parametric image restoration models: A new state of the art," in European Conference on Computer Vision. Springer, 2012, pp. 112–125.
[9] C. Dong, Y. Deng, C. Change Loy, and X. Tang, "Compression artifacts reduction by a deep convolutional network," in Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 576–584.
[10] K. Zhang, W. Zuo, Y. Chen, D. Meng, and L. Zhang, "Beyond a gaussian denoiser: Residual learning of deep CNN for image denoising," IEEE Transactions on Image Processing, vol. 26, no. 7, pp. 3142–3155, 2017.
[11] L. Cavigelli, P. Hager, and L. Benini, "CAS-CNN: A deep convolutional neural network for image compression artifact suppression," in Neural Networks (IJCNN), 2017 International Joint Conference on. IEEE, 2017, pp. 752–759.
[12] Z. Wang, D. Liu, S. Chang, Q. Ling, Y. Yang, and T. S. Huang, "D3: Deep dual-domain based fast restoration of JPEG-compressed images," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 2764–2772.
[13] L. Galteri, L. Seidenari, M. Bertini, and A. Del Bimbo, "Deep generative adversarial compression artifact removal," arXiv preprint arXiv:1704.02518, 2017.
[14] P. Liu, H. Zhang, K. Zhang, L. Lin, and W. Zuo, "Multi-level wavelet-CNN for image restoration," in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, June 2018.
[15] "DMCNN: Dual-domain multi-scale convolutional neural network for compression artifacts removal," 2018.
[16] B. Zheng, R. Sun, X. Tian, and Y. Chen, "S-Net: a scalable convolutional neural network for JPEG compression artifact reduction," Journal of Electronic Imaging, vol. 27, no. 4, p. 043037, 2018.
[17] C. Dong, C. C. Loy, K. He, and X. Tang, "Image super-resolution using deep convolutional networks," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 38, no. 2, pp. 295–307, 2016.
[18] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, "Generative adversarial nets," in Advances in Neural Information Processing Systems 27, Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger, Eds. Curran Associates, Inc., 2014, pp. 2672–2680. [Online]. Available: http://papers.nips.cc/paper/5423-generative-adversarial-nets.pdf
[19] O. Ronneberger, P. Fischer, and T. Brox, "U-Net: Convolutional networks for biomedical image segmentation," in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2015, pp. 234–241.
[20] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.
[21] P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros, "Image-to-image translation with conditional adversarial networks," in CVPR, 2017.
[22] K. Zeng, J. Yu, R. Wang, C. Li, and D. Tao, "Coupled deep autoencoder for single image super-resolution," IEEE Transactions on Cybernetics, vol. 47, no. 1, pp. 27–37, 2017.
[23] J. Xie, L. Xu, and E. Chen, "Image denoising and inpainting with deep neural networks," in Advances in Neural Information Processing Systems, 2012, pp. 341–349.
[24] R. Qian, R. T. Tan, W. Yang, J. Su, and J. Liu, "Attentive generative adversarial network for raindrop removal from a single image," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 2482–2491.
[25] X. Wang, K. Yu, S. Wu, J. Gu, Y. Liu, C. Dong, Y. Qiao, and C. C. Loy, "ESRGAN: Enhanced super-resolution generative adversarial networks," in The European Conference on Computer Vision Workshops (ECCVW), September 2018.
[26] B. Lim, S. Son, H. Kim, S. Nah, and K. M. Lee, "Enhanced deep residual networks for single image super-resolution," in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, vol. 1, no. 2, 2017, p. 4.
[27] K. He, X. Zhang, S. Ren, and J. Sun, "Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification," in Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 1026–1034.
[28] S. Nah, T. H. Kim, and K. M. Lee, "Deep multi-scale convolutional neural network for dynamic scene deblurring," in CVPR, vol. 1, no. 2, 2017, p. 3.
[29] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, "Image quality assessment: from error visibility to structural similarity," IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600–612, 2004.
[30] E. Agustsson and R. Timofte, "NTIRE 2017 challenge on single image super-resolution: Dataset and study," in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, July 2017.
[31] R. Timofte, E. Agustsson, L. Van Gool, M.-H. Yang, L. Zhang, B. Lim, S. Son, H. Kim, S. Nah, K. M. Lee et al., "NTIRE 2017 challenge on single image super-resolution: Methods and results," in Computer Vision and Pattern Recognition Workshops (CVPRW), 2017 IEEE Conference on. IEEE, 2017, pp. 1110–1121.
[32] P. Arbelaez, M. Maire, C. Fowlkes, and J. Malik, "Contour detection and hierarchical image segmentation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 33, no. 5, pp. 898–916, 2011.
[33] S. Corchs, F. Gasparini, and R. Schettini, "No reference image quality classification for JPEG-distorted images," Digital Signal Processing, vol. 30, pp. 86–100, 2014.
[34] C. Yim and A. C. Bovik, "Quality assessment of deblocked images," IEEE Transactions on Image Processing, vol. 20, no. 1, pp. 88–98, 2011.
[35] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," arXiv preprint arXiv:1412.6980, 2014.
[36] S. Corchs and F. Gasparini, "A multidistortion database for image quality," in International Workshop on Computational Color Imaging. Springer, 2017, pp. 95–104.

You might also like