FBCNN Iccv2021

Uploaded by

tasks888999

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

13 views10 pages

FBCNN Iccv2021

Uploaded by

tasks888999

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 10

Towards Flexible Blind JPEG Artifacts Removal

Jiaxi Jiang Kai Zhang * Radu Timofte

Computer Vision Lab, ETH Zurich, Switzerland
[email protected] {kai.zhang, timofter}@vision.ee.ethz.ch
https://github.com/jiaxi-jiang/FBCNN

Abstract ble is usually represented by an integer called quality factor

(QF) ranging from 0 to 100, where a lower quality factor
Training a single deep blind model to handle different means less storage size but more lost information. Inspired
quality factors for JPEG image artifacts removal has been by the success of deep neural networks (DNNs) for image
attracting considerable attention due to its convenience for classification [25, 37], researchers began to resort to DNNs
practical usage. However, existing deep blind methods usu- for JPEG artifacts removal and have achieved notable aca-
ally directly reconstruct the image without predicting the demic success.
quality factor, thus lacking the flexibility to control the out- However, existing methods for JPEG artifacts removal
put as the non-blind methods. To remedy this problem, in generally have four limitations in real applications: (1)
this paper, we propose a flexible blind convolutional neural Most existing DNNs based methods [6, 7, 11, 28, 54]
network, namely FBCNN, that can predict the adjustable trained a specific model for each quality factor, lacking the
quality factor to control the trade-off between artifacts re- flexibility to learn a single model for different JPEG quality
moval and details preservation. Specifically, FBCNN de- factors. (2) DCT based methods [13, 17, 53] need to obtain
couples the quality factor from the JPEG image via a decou- the DCT coefficients or quantization table as input, which
pler module and then embeds the predicted quality factor is only stored in JPEG format. Besides, when images are
into the subsequent reconstructor module through a qual- compressed multiple times, only the most recent compres-
ity factor attention block for flexible control. Besides, we sion information is stored. (3) To solve the first problem,
find existing methods are prone to fail on non-aligned dou- some recent work [13, 15, 50] resort to training a single
ble JPEG images even with only a one-pixel shift, and we model for a large range of quality factors. However, these
thus propose a double JPEG degradation model to aug- blind methods can only provide a deterministic reconstruc-
ment the training data. Extensive experiments on single tion result for each input, ignoring the need for user pref-
JPEG images, more general double JPEG images, and erences. (4) Existing methods are all trained with synthetic
real-world JPEG images demonstrate that our proposed images which assumes that the low-quality images are com-
FBCNN achieves favorable performance against state-of- pressed only once. However, most images from the Internet
the-art methods in terms of both quantitative metrics and are compressed multiple times. Despite some progress for
visual quality. real recompressed images, e.g., from Twitter [11, 15], a de-
tailed and complete study on double JPEG artifacts removal
is still missing.
1. Introduction To tackle the above problems, we design a flexible blind
JPEG [39] is one of the most widely-used image com- convolutional neural network, namely FBCNN, for real
pression algorithms and formats due to its simplicity and JPEG image restoration. Our FBCNN is a single model that
fast encoding/decoding speeds. JPEG compression splits an can deal with JPEG images with different quality factors. In
image into 8 × 8 blocks and applies discrete cosine trans- addition, FBCNN can work independent of the image for-
form (DCT) to each block. The DCT coefficients are then mats, as it directly processes images in pixel-domain, with-
divided by a quantization table and rounded to the nearest out the need to access the metadata of images. By further
integer. The elements in the quantization table control the decoupling the latent quality factor from the input JPEG
compression ratio and the rounding operation is the only image, we can use this important parameter to guide the
lossy operation in the whole process. The quantization ta- artifacts removal process. As a controllable variable with
clear physical meaning, the predicted quality factor can also
* Corresponding author. be adjusted via interactive selection to achieve a balance
between artifacts removal and details preservation. To ad- of quantization table as prior information, which allows a
dress the problems with real-world JPEG images, we pro- single model to correct artifacts at any quality factor and
vide a detailed study on the restoration of images with dou- achieved state-of-the-art results.
ble JPEG compression. We find that existing blind methods
are prone to fail when the 8 × 8 blocks of double JPEG Double JPEG Compression. Double JPEG compression
compression are not aligned and QF1 ≤ QF2 . However, our has been studied in the area of image forensics for a long
quality factor predictor can help to explain the behavior of time, as detection of double compression can provide im-
current blind methods under unseen scenarios. We provide portant clues for the recovery of image processing history.
comprehensive empirical evidence showing that blind meth- Fu et al. [14] showed that if an image has been JPEG com-
ods work are easy to be misled by the unseen compound ar- pressed only once, then the first digits of the quantized
tifacts, resulting in an unpleasant reconstructed output. By JPEG coefficients follow a Benford-like logarithmic law.
correcting the predicted quality factor, FBCNN instead can In [3, 4, 8, 29], double JPEG compression was classified
boost the performance on complex double JPEG images. into two cases: aligned and non-aligned. Chen et al. [8]
To obtain a fully blind model, we further propose two solu- formulated the periodic characteristics of JPEG images in
tions: correcting QF to the smaller one which can be esti- both spatial and DCT domains and showed that such pe-
mated by our dominant QF estimation method or augment- riodic characteristics will be changed after recompression.
ing the training data with non-aligned double JPEG images. Recently, learning-based methods [2, 31, 40] were proposed
To summarize, the main contributions of this paper are: to detect double JPEG compression. The estimation of the
(1) A flexible blind convolutional neural network for first quantization table of JPEG images is also a challeng-
JPEG artifacts removal (FBCNN) is proposed. FBCNN can ing problem and studied in both aligned [16, 33, 44, 47]
predict the latent quality factor to guide the image restora- and non-aligned cases [5, 10, 45]. However, these methods
tion. The predicted quality factor can be adjusted manually focus on analyzing the DCT coefficients, which are only
to control the preference between artifacts removal and de- stored in JPEG format. Besides, the research on double
tails preservation according to the user’s needs. JPEG compression restoration is still missing.
(2) We perform a thorough analysis of double JPEG im-
ages and provide solutions to take a step towards the restora-
tion of real images. To the best of our knowledge, this is the Flexible Image Restoration. Flexible image generation
first attempt to handle double non-aligned JPEG compres- based on the conditional variable has drawn much attention
sion. We hope that the community will gradually begin to in e.g. text-to-image generation [26, 35, 43] and facial at-
consider this more challenging and realistic scenario. tribute editing [9, 21, 27]. However, these methods can not
(3) We demonstrate the effectiveness of FBCNN on syn- be directly adopted in image restoration. Zhang et al. [51]
thetic and real JPEG images with complex degradation set- proposed to take a tunable noise level map as the input to
tings. Our proposed FBCNN provides a useful solution for handle noise on different levels. In [52], a PCA-based di-
practical applications. mensionality stretching of the degradation parameters was
proposed to take blur kernel and noise level as input for
2. Related Work super-resolution. Wang et al. [41] proposed a novel con-
trollable framework for interactive image restoration. He et
JPEG Artifacts Removal Networks. Learning-based al. [19] focused on the images with multiple degradations
methods have made notable progress in JPEG artifacts re- and added the multi-dimensional degradation information
moval in the past few years. Dong et al. [11] first in- as input. These methods usually assume that the control-
troduced deep learning to remove JPEG artifacts, inspired lable variable is provided, but such information is almost
by the success of super-resolution network [12]. Zhang et unknown in real applications. This encourages us to work
al. [50] employed batch normalization [22] and residual towards a flexible blind solution for image restoration.
learning [20] strategies to speed up the training process and
boost the performance on general blind image restoration 3. Proposed Method
tasks. A wavelet transform based network was presented
in [28] as the generalization of dilated convolution [46] In this section, we first introduce the architecture of our
and subsampling, leading to a large improvement. Fu et FBCNN, and then present its advantage over other state-
al. [15] proposed a deep convolutional sparse coding net- of-the-art methods, especially for practical recompressed
work that combines model-based methods with deep learn- JPEG images.
ing. Besides, dual-domain convolutional network based
3.1. Flexible Blind Artifacts Removal Network
methods [17, 23, 53, 55] were proposed to take advantage
of redundancies on both pixel and DCT domains. Recently, Fig. 1 illustrates the overall architecture of our proposed
Ehrlich et al. [13] trained their networks with the utilization FBCNN. FBCNN is an end-to-end model which takes a
QF = 90
Skip Connection
QF = 50
Decoupler Reconstructor QF = 10

Image Features
Quality Factor
(γ, β)
QF Features 1
Predictor Controller
2

...

ReLU
Conv

Conv
99
⊙ ⊕ ⊕
100
γ β
QF Attention Block

Interactive Selection

Figure 1. The architecture of the proposed FBCNN for JPEG artifacts removal. FBCNN consists of four parts, i.e., decoupler, quality factor
predictor, flexible controller, and image reconstructor. The decoupler extracts the deep features from the input corrupted JPEG image and
then splits them into image features and QF features which are subsequently fed into the reconstructor and predictor, respectively. The
controller gets the estimated QF from the predictor and then generates QF embeddings. The QF attention block enables the controller to
make the reconstructor produce different results according to different QF embeddings. The predicted quality factor can be changed with
interactive selections to have a balance between artifacts removal and details preservation.

JPEG image as input and directly generates the output im- factor QFest of the compressed image. We set the number
age. Specifically, FBCNN consists of four components: de- of nodes in each hidden layer as 512 for a better predic-
coupler, QF predictor, flexible controller, and image recon- tion. During training, patches with small sizes may only in-
structor. The network is fairly straightforward, with each clude limited information and correspond to multiple qual-
component designed to achieve a specific task. ity factors so that the quality factor can not be accurately
estimated, which may lead to an unstable training process.
Therefore, we use the L1 loss function to avoid too much
Decoupler: The decoupler aims to extract the deep fea-
penalty for such outliers. Let N be the batch size during
tures and decouple the latent quality factor from the input
training, the loss for quality factor estimation in each batch
image. It involves four scales, each of which has an identity
can be written as:
skip connection to the reconstructor. 4 residual blocks are
adopted in each scale, and each residual block is composed
of two 3 × 3 convolution layers with ReLU activation in \mathcal {L}_\text {QF} = \frac {1}{N}\sum _{i=1}^{N} \norm {\text {QF}_\text {est}î - \text {QF}_\text {gt}î}_1. (1)
the middle. 2 × 2 strided convolutions are adopted for the
downscaling operations. The number of output channels in
Flexible Controller: The flexible controller is a 4-layer
each layer from the first to the fourth scale is set to 64, 128,
MLP and takes as input the quality factor, representing the
256, 512, respectively. The image features from the decou-
degree of compression of the targeted image. The con-
pler are passed into the reconstructor. At the same time,
troller aims to learn an embedding of the given quality fac-
they are also shared by an additional quality factor branch
tor that can be fused into the reconstructor for flexible con-
that uses residual blocks to extract higher-level information,
trol. Inspired by recent research in spatial feature trans-
followed by a global average pooling layer to get the global
form [32, 42], the controller learns a mapping function that
quality factor features from the image features.
outputs a modulation parameter pair (γ, β) which embeds
the given quality factor. Specifically, the first three layers
Quality Factor Predictor: The QF predictor is a 3-layer of MLP generate shared intermediate conditions, which are
MLP (multilayer perceptron) that takes as input the 512- then split into three parts corresponding to the three scales
dimensional QF features and produces an estimated quality in the reconstructor. In the last layer of MLP, we learn dif-
ferent parameter pairs for different scales in reconstructor Cascaded QF prediction and non-blind model: It is
whereas shared (γ, β) are broadcasted to the QF Attention also possible to design a QF predictor cascaded by a non-
block within the same scale. blind method like CBD-Net [18]. However, our method
enjoys some benefits compared with such a cascaded de-
sign: First, for accurate quality factor estimation, a con-
Image Reconstructor: The image reconstructor includes volutional network starting from the same scale as the in-
three scales and receives image features from decoupler and put image is needed, which would increase the total model
quality factor embedding parameters (γ, β) to generate the size and cost more training and inference time. Instead, we
restored clean image. The QF attention block is an impor- only add a relatively small prediction branch. Second, our
tant component of the reconstructor. The number of QF at- decoupler shared parameters for QF estimation and image
tention blocks in each scale is set to 4. The learned parame- reconstruction, accelerating the convergence of predicting
ter pair (γ, β) adaptively influences the outputs by applying QF. On the contrary, in cascaded design, inaccurate QF esti-
an affine transformation spatially to each intermediate fea- mation would lead to an unstable training process. It might
ture map inside the QF attention block of each scale. be a solution to train a QF predictor and then freeze it to
After obtaining (γ, β) from the controller, the transfor- train the second part for reconstruction. Nevertheless, it
mation is carried out by scaling and shifting feature maps would cost more training time than our joint training sched-
of a specific layer: ule. Fourth, in cascaded networks, the predicted parame-
ter is treated as the input of the second part and propagates
\boldsymbol {F_\text {out}} = (1+ \boldsymbol {\gamma })\odot \boldsymbol {F_\text {in}}\oplus \boldsymbol {\beta }, (2) through the whole encoder-decoder architecture. Instead,
our predicted parameter QF is the only input to the decoder
where Fin and Fout denote the feature maps before and after part. We can change the QF to adjust different outputs dur-
the affine transformation, and ⊙ is referred to as element- ing inference without the need to change the encoded image
wise multiplication. features, which saves half of the inference time.
Given N training samples within a batch, the goal of the
image reconstructor is to minimize the following L1 loss 3.3. Restoration of Double JPEG Images
function between reconstructed image Irec and the original Limitations of existing methods: Although some exist-
ground-truth image Igt : ing work claimed to work on recompressed JPEG images, a
detailed study on the restoration of double JPEG compres-
sion is still missing. We find that the current blind methods
\mathcal {L}_\text {rec} = \frac {1}{N}\sum _{i=1}^{N}\norm {\mathbf {I}_\text {rec}î - \mathbf {I}_\text {gt}î}_1. (3) always fail when the blocks of two JPEG compression are
not aligned and QF1 ≤ QF2 , even if there is an only one-
pixel shift between two compression.
Overall, the complete training objective can be written as:
Let us look at an example in Fig. 2, where the appear-
ances of JPEG images with different compression settings
\mathcal {L}_\text {total} = \mathcal {L}_\text {rec} + \lambda \cdot \mathcal {L}_\text {QF}, (4)
can be observed. To get non-aligned double JPEG images,
we remove the first row and the first column of the image
where λ controls the balance between image reconstruction
between the first compression with QF1 and the second one
and QF estimation.
with QF2 . For aligned double JPEG with QF = (90, 10),
(10, 90), and non-aligned double JPEG with QF = (90,
3.2. Comparison with Other Design Choices 10)*, the blocking effects are similar to single compression
In the following, we will clarify the differences between with QF = 10: the edges of 8 × 8 blocks are apparent. How-
the proposed FBCNN and two alternative design choices. ever, in the case of non-aligned double JPEG with QF =
(10, 90)*, the blocking edges are not clear anymore. We test
representative blind methods DnCNN [50] and QGAC [13]
A blind model without QF prediction: Existing blind on these images.
methods only provide a deterministic result, ignoring the As shown in Fig. 2, in cases of QF = 90, 10, (90, 10),
need of the user’s preference. Besides, as we will discuss the blocking effects are well removed by both methods.
in Sec. 3.3, although the pure blind model performs favor- DnCNN also works well on QF = (10, 90), while QGAC
ably for single JPEG artifacts removal without knowing the fails in this case because QGAC extracts the quantization
quality factor, it does not generalize well to real corrupted table from the JPEG image, but JPEG images only keep the
images whose artifacts are more complex. FBCNN can be most recent compression information. Therefore, we con-
viewed as multiple deblockers and can control the trade-off clude that existing quantization table-based methods are not
between JPEG artifacts removal and details preservation. suitable for real application.
QF=10 QF=90 QF=(90,10) QF=(10,90) QF=(90,10)* QF=(10,90)*

JPEG

DnCNN

QGAC

Figure 2. Visual comparisons of a JPEG image with different degradation settings and their restored results by DnCNN and QGAC. QF =
(QF1 , QF2 ) denotes that the image is firstly compressed with QF1 and then compressed with QF2 . ‘*’ means there is a pixel shift (1,1)
between blocks of two compression. Even only a shift of one pixel between two compression can lead to failures of existing methods.

However, in the case of non-aligned double JPEG com- FBCNN trained with a single JPEG degradation model
pression when QF1 = 10 and QF2 = 90, both methods do with dominant QF correction: Since our FBCNN can
not work. Since our FBCNN is also a pixel-based blind provide different outputs by setting different quality factors,
method like DnCNN but can predict the quality factor, it can correcting the predicted QF to the smaller one, which actu-
be used to explain the behavior behind a blind method. We ally dominates the main compression, is expected to im-
test FBCNN using the same images. Not surprisingly, we prove the restoration results. However, to get a fully blind
get a similar, almost unchanged reconstructed result, but we model, it is crucial to infer the smaller quality factor auto-
find the predicted quality factor is 90. We continue to test matically. By utilizing the property of JPEG compression,
other images with non-aligned double JPEG compression we find that the quality factor of a JPEG image with single
and QF1 < QF2 , finding that the predicted quality factor is compression can be obtained by doing another JPEG com-
always close to QF2 . This is to say, blind methods trained pression with all possible QFs. The image’s QF corresponds
with single JPEG compression image pairs are always mis- to the global minimum of the MSE (mean squared error) be-
led by the appearance of non-aligned double JPEG images tween two JPEG images. We further extend this method to
with QF1 < QF2 . They also do not work when QF1 = QF2 . challenging non-aligned double JPEG images with QF1 <
In summary, we classify double JPEG compression into QF2 . We apply another JPEG compression with all possi-
two categories: simple and complex compression. Sim- ble QFs after a shift in the range of 0 to 7 in two directions.
ple compression corresponds to non-aligned double JPEG We also calculate the MSE curves for each shift possibility
with QF1 > QF2 and all aligned double JPEG compression, between the two JPEG images. For each MSE curve, we
which is actually equivalent to single JPEG compression. search for the first minimum. It can be found that among all
Complex compression corresponds to non-aligned double the first minimums, the QF at the smallest first minimum is
JPEG with QF1 ≤ QF2 , where composite artifacts occur. always close to QF1 , while the QF at the global minimum is
We test images with these degradation settings by a recent approximate to QF2 . Besides, we constrain the MSE of the
double JPEG compression algorithm [31], finding that only smallest first minimum to be smaller than a threshold T to
images with non-aligned double JPEG with QF1 ≤ QF2 can have more robust results. We empirically set T to 30 in our
be identified as double JPEG compression, which further experiment. We name the FBCNN model with dominant
support our arguments. QF correction as FBCNN-D.
To overcome the problem with non-aligned double JPEG
compression, we propose two solutions, from the perspec- FBCNN trained with double JPEG degradation model:
tives of adjusting the QF to utilize our flexible network and We can also solve this problem by augmenting the training
augmenting the training data. data using images with double JPEG compression. We pro-
Table 1. PSNR|SSIM|PSNRB results of different methods on grayscale JPEG images with single compression. Please note that the
methods marked with * train a specific model for each quality factor. The best two results are highlighted in red and blue colors, respectively.
Dataset Quality JPEG ARCNN* MWCNN* DnCNN DCSC QGAC FBCNN (Ours)
10 27.82|0.760|25.21 29.03 |0.793|28.76 30.01|0.820|29.59 29.40|0.803|29.13 29.62|0.810|29.30 29.84|0.812|29.43 30.12|0.822|29.80
20 30.12|0.834|27.50 31.15|0.852|30.59 32.16|0.870|31.52 31.63|0.861|31.19 31.81|0.864|31.34 31.98|0.869|31.37 32.31|0.872|31.74
Classic5
30 31.48|0.867|28.94 32.51|0.881|31.98 33.43|0.893|32.62 32.91|0.886|32.38 33.06|0.888|32.49 33.22|0.892|32.42 33.54|0.894|32.78
40 32.43|0.885|29.92 33.32|0.895|32.79 34.27|0.906|33.35 33.77|0.900|33.23 33.87|0.902|33.30 34.05|0.905|33.12 34.35|0.907|33.48
10 27.77|0.773|25.33 28.96|0.808|28.68 29.69|0.825|29.32 29.19|0.812|28.90 29.34|0.818|29.01 29.51|0.825|29.13 29.75|0.827|29.40
20 30.07|0.851|27.57 31.29|0.873|30.76 32.04|0.889|31.51 31.59|0.880|31.07 31.70|0.883|31.18 31.83|0.888|31.25 32.13|0.889|31.57
LIVE1
30 31.41|0.885|28.92 32.67|0.904|32.14 33.45|0.915|32.80 32.98|0.909|32.34 33.07|0.911|32.43 33.20|0.914|32.47 33.54|0.916|32.83
40 32.35|0.904|29.96 33.61|0.920|33.11 34.45|0.930|33.78 33.96|0.925|33.28 34.02|0.926|33.36 34.16|0.929|33.36 34.53|0.931|33.74
10 27.80|0.768|25.10 29.10|0.804|28.73 29.61|0.820|29.14 29.21|0.809|28.80 29.32|0.813|28.91 29.46|0.821|28.97 29.67|0.821|29.22
20 30.05|0.849|27.22 31.28|0.870|30.55 31.92|0.885|31.15 31.53|0.878|30.79 31.63|0.880|30.92 31.73|0.884|30.93 32.00|0.885|31.19
BSDS500
30 31.37|0.884|28.53 32.67|0.902|31.94 33.30|0.912|32.34 32.90|0.907|31.97 32.99|0.908|32.08 33.07|0.912|32.04 33.37|0.913|32.32
40 32.30|0.903|29.49 33.55|0.918|32.78 34.27|0.928|33.19 33.85|0.923|32.80 33.92|0.924|32.92 34.01|0.927|32.81 34.33|0.928|33.10

pose a new degradation model to synthesize the non-aligned ARCNN [11], MWCNN [28], DnCNN [50], DCSC [15],
double JPEG image y from the uncompressed image x via QGAC [13]. It should be pointed out that ARCNN and
MWCNN train a single network for each specific value of
\mathbf {y}=\text {JPEG}(\text {shift}(\text {JPEG}(\mathbf {x}, \text {QF}_1)), \text {QF}_2). (5) quality factor, and DCSC is trained with quality factors
from 10 to 40. Only DnCNN, QGAC, and our FBCNN
For shift operation, we randomly remove the first i rows and cover a full range of quality factors. We calculate the PSNR,
j columns of the image after the first compression, where SSIM, and PSNR-B for quantitative assessment. The quan-
0 ≤ i, j ≤ 7. When trained with double JPEG com- titative results are shown in Table 1. Our method has signif-
pressed images, the weight of quality factor loss is set to icantly better results than other blind methods and moder-
zero. Then the dominant quality factor can be trained in an ately better results than MWCNN, which trains each model
unsupervised way. We name the FBCNN model with aug- for a specific quality factor. For subjective comparisons,
mented training data as FBCNN-A. Note that our double some restored images of different approaches on the LIVE1
JPEG degradation model can also be applied to other tasks dataset have been presented. As can be seen in Fig. 3, the
such as blind single image super-resolution [49]. results of our FBCNN are more visually pleasing.

4. Experiments
Color JPEG image restoration We also train our model
4.1. Data Preparation and Network Training on RGB channels, referred to as FBCNN-C. We compare
For fair comparisons, JPEG images used during training FBCNN-C with QGAC, which is a state-of-the-art method,
and evaluation are all generated by the MATLAB JPEG en- especially for color JPEG image restoration. The evalua-
coder. We use the Y channel of YCbCr space for grayscale tion is made on LIVE1 [36], testset of BSDS500 [30], and
image comparison, and the RGB channels for color image ICB [34] dataset. Although QGAC is specially designed
comparison. Following [13], we employ DIV2K [1] and for color JPEG image artifacts removal, we still get better
Flickr2K [38] as our training data. During training, we ran- performance by setting the input/output channels as 3. The
domly extract patch pairs with the size 128 × 128, and the result is shown in Table 2.
quality factor is randomly sampled from 10 to 95. We set
λ to 0.1. To optimize the parameters of FBCNN, we adopt Table 2. PSNR|SSIM|PSNRB results of QGAC and FBCNN-C on
the Adam solver [24] with batch size 256. The learning rate color JPEG images with single compression.
starts from 1 × 10−4 and decays by a factor of 0.5 every
4×104 iterations and finally ends with 1.25×10−5 . We train Dataset QF JPEG QGAC FBCNN-C (Ours)
10 25.69|0.743|24.20 27.62|0.804|27.43 27.77|0.803|27.51
our model with PyTorch on eight NVIDIA GeForce GTX 20 28.06|0.826|26.49 29.88|0.868|29.56 30.11|0.868|29.70
LIVE1
2080Ti GPUs. It takes about two days to obtain FBCNN. 30 29.37|0.861|27.84 31.17|0.896|30.77 31.43|0.897|30.92
40 30.28|0.882|28.84 32.05|0.912|31.61 32.34|0.913|31.80
10 25.84|0.741|24.13 27.74|0.802|27.47 27.85|0.799|27.52
4.2. Single JPEG Image Restoration 20 28.21|0.827|26.37 30.01|0.869|29.53 30.14|0.867|29.56
BSDS500
30 29.57|0.865|27.72 31.33|0.898|30.70 31.45|0.897|30.72
Grayscale JPEG image restoration We first evaluate the 40 30.52|0.887|28.69 32.25|0.915|31.50 32.36|0.913|31.52
performance of the proposed FBCNN on images with sin- 10 29.44|0.757|28.53 32.06|0.816|32.04 32.18|0.815|32.15
20 32.01|0.806|31.11 34.13|0.843|34.10 34.38|0.844|34.34
gle JPEG compression. We test on the commonly used ICB
30 33.20|0.831|32.35 35.07|0.857|35.02 35.41|0.857|35.35
benchmarks: Classic5 [48], LIVE1 [36] and the test set of 40 33.95|0.840|33.14 32.25|0.915|31.50 36.02|0.866|35.95
BSDS500 [30]. We compare our proposed FBCNN with
(a) JPEG (29.64dB) (b) ARCNN (31.15dB) (c) MWCNN (32.38dB) (d) DnCNN (31.36dB)

(e) DCSC (31.68dB) (f) QGAC (31.97dB) (g) FBCNN (32.51dB) (h) Ground Truth
Figure 3. Visual comparisons of different methods on a single JPEG image ‘BSDS500: 140088’ with QF = 10.

(a) JPEG (27.45dB) (b) QF = 10 (28.13dB) (c) QF = 30 (29.34dB) (d) QF = 90 (28.05dB)

Figure 4. An example to show the flexibility of FBCNN by setting different QFs into the network. The JEPG image is ‘LIVE1: cemetry’
compressed with quality factor 30. Although the artifacts around the words can be effectively removed when the set QF is small, the texture
on the bricks becomes blurred. Users can get the desired results according to their preference through interactive selection by FBCNN.

Table 3. PSNR|SSIM|PSNRB results of different methods on grayscale JPEG images with non-aligned double compression. The testing
images are synthesized from the LIVE1 dataset. The best two results are highlighted in red and blue colors, respectively.
Type QF JPEG DnCNN DCSC QGAC FBCNN (Ours) FBCNN-D (Ours) FBCNN-A (Ours)
(30,10) 27.49|0.762|25.62 28.95|0.805|28.61 29.08|0.810|28.81 29.24|0.818|28.94 29.46|0.820|29.11 29.46|0.820|29.10 29.44|0.818|29.12
QF1 >QF2 (50,10) 27.65|0.769|25.69 29.13|0.810|28.76 29.25|0.815|28.96 29.42|0.823|29.08 29.64|0.825|29.23 29.65|0.825|29.22 29.61|0.823|29.20
(50,30) 30.62|0.866|28.85 32.20|0.895|31.50 32.30|0.897|31.78 32.32|0.899|31.72 32.61|0.902|31.88 32.61|0.902|31.89 32.69|0.901|32.24
(10,10) 26.48|0.715|25.08 27.73|0.765|27.49 27.76|0.768|27.59 27.78|0.771|27.59 27.96|0.774|27.75 27.95|0.774|27.74 28.25|0.777|28.14
QF1 =QF2 (30,30) 29.98|0.847|28.53 31.40|0.878|30.86 31.48|0.880|31.10 31.43|0.881|30.99 31.64|0.884|31.14 31.65|0.884|31.14 31.94|0.886|31.73
(50,50) 31.58|0.888|30.18 33.12|0.912|32.44 33.28|0.914|32.80 33.12|0.914|32.50 33.38|0.917|32.61 33.45|0.914|32.85 33.70|0.919|33.34
(10,30) 27.55|0.760|26.94 28.33|0.790|28.17 28.31|0.789|28.19 28.30|0.791|28.18 28.29|0.791|28.15 28.94|0.802|28.82 29.38|0.816|29.30
QF1 <QF2 (10,50) 27.69|0.768|27.41 28.30|0.791|28.24 28.40|0.794|28.35 28.23|0.791|28.18 28.20|0.789|28.14 28.96|0.801|28.88 29.52|0.820|29.45
(30,50) 30.61|0.865|29.60 31.89|0.890|31.46 32.08|0.893|31.78 31.81|0.891|31.43 31.96|0.893|31.50 32.31|0.895|31.94 32.64|0.900|32.49

Flexible JPEG image restoration To demonstrate the posed methods on images with double JPEG compression.
flexibility of FBCNN, we show an example in Fig. 4. By We compare our methods with blind methods: DnCNN,
setting different quality factors, we can get results with dif- DCSC, QGAC. The comparison is conducted using dif-
ferent perception qualities. Users can make an interactive ferent combinations of quality factors (QF1 , QF2 ) on the
selection according to their preferences. LIVE1 dataset. Each original image is JPEG compressed
with QF1 , cropped by a random shift (4, 4) to the upper left
4.3. Double JPEG Image Restoration corner, and JPEG compressed with QF2 .
The focus of our paper is to remove the complex dou- The numerical and visual results are reported in Table 3
ble JPEG compression artifacts, which is an important step and Fig. 5. As shown in Table 3, when changing the or-
towards real image restoration. So we also evaluate the per- der of QF1 and QF2 , although the differences between the
formance of current state-of-the-art methods and our pro- PSNR values of JPEG images are generally smaller than
(a) JPEG (31.34dB) (b) DnCNN (32.10dB) (c) DCSC (31.97dB) (d) QGAC (32.06dB)

(e) FBCNN (32.04dB) (f) FBCNN-D (32.89dB) (g) FBCNN-A (33.62dB) (h) Ground Truth
Figure 5. Visual comparisons of image ‘LIVE1: caps’ with non-aligned double JPEG compression. This image is degraded by the first
JPEG with QF1 =10, pixel shift = (4, 4), the second JPEG with QF2 = 30 successively.

(a) JPEG (b) DnCNN (c) DCSC (d) QGAC (e) FBCNN (f) FBCNN-D (g) FBCNN-A
Figure 6. Visual comparisons of an example from our Meme dataset.

0.05 dB, a significant drop in performance can be seen on ten compressed many times. Fig. 6 shows a test example
other methods and our FBCNN. Since DCSC is only trained on our collected Meme dataset. Since there are no ground-
with small quality factors from 10 to 40, it generally per- truth high-quality images and no reliable no-reference im-
forms better than DnCNN, QGAC, and FBCNN when QF1 age quality assessment (IQA) metrics, we do not report the
< QF2 . Despite some benefits for double JPEG compres- quantitative results. We leave the study of no-reference IQA
sion, it should be pointed out that it is not reasonable to use for JPEG compression artifacts removal for future works.
a model trained with low quality factors to tackle all kinds
of JPEG images. When dealing with relatively high-quality 5. Conclusions
images, it tends to give more blurry results.
In this paper, we proposed a flexible blind JPEG artifacts
We also examine the effectiveness of our proposed two
removal network (FBCNN) for real JPEG image restora-
solutions to non-aligned double JPEG restoration. FBCNN-
tion. FBCNN decouples the quality factor from the input
D is obtained based on FBCNN by correcting the qual-
image via a decoupler and then embeds the predicted qual-
ity factor by dominant QF estimation during inference.
ity factor into the subsequent reconstructor through a qual-
FBCNN-A is obtained by augmenting the training data
ity factor attention block for flexible control. The predicted
with our proposed double JPEG degradation model. Ta-
quality factor can also be adjusted to achieve a balance be-
ble 3 shows that by correcting the predicted quality fac-
tween artifacts removal and details preservation. Besides,
tor, FBCNN-D largely improves the PSNR when QF1 <
we address non-aligned double JPEG restoration tasks to
QF2 . FBCNN-A further improves performance when QF1
take steps towards real JPEG images with severe degrada-
< QF2 . The difficult case when QF1 = QF2 also sees an
tions. Extensive experiments on single JPEG images, the
improvement on FBCNN-A.
more general double JPEG images, and real-world JPEG
images demonstrate the flexibility, effectiveness, and gen-
4.4. Real-World JPEG Image Restoration eralizability of our proposed FBCNN for restoring different
Besides the above experiments on synthetic test images, kinds of degraded JPEG images.
we also conduct experiments on real images to demonstrate Acknowledgments: This work was partly supported by the
the effectiveness of the proposed FBCNN. We collect 400 ETH Zürich Fund (OK) and a Huawei Technologies Oy
meme images from the Internet, as this kind of image is of- (Finland) project.
References forensics. In Security, Steganography, and Watermarking of
Multimedia Contents IX, volume 6505, page 65051L. Inter-
[1] Eirikur Agustsson and Radu Timofte. Ntire 2017 challenge national Society for Optics and Photonics, 2007. 2
on single image super-resolution: Dataset and study. In
[15] Xueyang Fu, Zheng-Jun Zha, Feng Wu, Xinghao Ding, and
IEEE Conference on Computer Vision and Pattern Recog-
John Paisley. Jpeg artifacts reduction via deep convolutional
nition Workshops, pages 126–135, 2017. 6
sparse coding. In International Conference on Computer Vi-
[2] Mauro Barni, Luca Bondi, Nicolò Bonettini, Paolo sion, pages 2501–2510, 2019. 1, 2, 6
Bestagini, Andrea Costanzo, Marco Maggini, Benedetta
[16] Fausto Galvan, Giovanni Puglisi, Arcangelo Ranieri Bruna,
Tondi, and Stefano Tubaro. Aligned and non-aligned double
and Sebastiano Battiato. First quantization matrix estimation
jpeg detection using convolutional neural networks. Jour-
from double compressed jpeg images. IEEE Transactions on
nal of Visual Communication and Image Representation,
Information Forensics and Security, 9(8):1299–1310, 2014.
49:153–163, 2017. 2
2
[3] Mauro Barni, Andrea Costanzo, and Lara Sabatini. Identifi-
[17] Jun Guo and Hongyang Chao. Building dual-domain repre-
cation of cut & paste tampering by means of double-jpeg de-
sentations for compression artifacts reduction. In European
tection and image segmentation. In International Symposium
Conference on Computer Vision, pages 628–644. Springer,
on Circuits and Systems, pages 1687–1690. IEEE, 2010. 2
2016. 1, 2
[4] Tiziano Bianchi and Alessandro Piva. Analysis of non-
[18] Shi Guo, Zifei Yan, Kai Zhang, Wangmeng Zuo, and Lei
aligned double jpeg artifacts for the localization of image
Zhang. Toward convolutional blind denoising of real pho-
forgeries. In International Workshop on Information Foren-
tographs. In IEEE Conference on Computer Vision and Pat-
sics and Security, pages 1–6. IEEE, 2011. 2
tern Recognition, pages 1712–1722, 2019. 4
[5] Tiziano Bianchi and Alessandro Piva. Image forgery
localization via block-grained analysis of jpeg artifacts. [19] Jingwen He, Chao Dong, and Yu Qiao. Interactive multi-
IEEE Transactions on Information Forensics and Security, dimension modulation with dynamic controllable residual
7(3):1003–1017, 2012. 2 learning for image restoration. In European Conference on
Computer Vision. Springer, 2020. 2
[6] Lukas Cavigelli, Pascal Hager, and Luca Benini. Cas-cnn:
A deep convolutional neural network for image compression [20] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun.
artifact suppression. In International Joint Conference on Deep residual learning for image recognition. In IEEE Con-
Neural Networks, pages 752–759. IEEE, 2017. 1 ference on Computer Vision and Pattern Recognition, pages
[7] Yunjin Chen and Thomas Pock. Trainable nonlinear reaction 770–778, 2016. 2
diffusion: A flexible framework for fast and effective image [21] Zhenliang He, Wangmeng Zuo, Meina Kan, Shiguang Shan,
restoration. IEEE Transactions on Pattern Analysis and Ma- and Xilin Chen. Attgan: Facial attribute editing by only
chine Intelligence, 39(6):1256–1272, 2016. 1 changing what you want. IEEE Transactions on Image Pro-
[8] Yi-Lei Chen and Chiou-Ting Hsu. Detecting recompression cessing, 28(11):5464–5478, 2019. 2
of jpeg images via periodicity analysis of compression arti- [22] Sergey Ioffe and Christian Szegedy. Batch normalization:
facts for tampering detection. IEEE Transactions on Infor- Accelerating deep network training by reducing internal co-
mation Forensics and Security, 6(2):396–406, 2011. 2 variate shift. In International Conference on Machine Learn-
[9] Yunjey Choi, Minje Choi, Munyoung Kim, Jung-Woo Ha, ing, pages 448–456. PMLR, 2015. 2
Sunghun Kim, and Jaegul Choo. Stargan: Unified genera- [23] Yoonsik Kim, Jae Woong Soh, and Nam Ik Cho. Agarnet:
tive adversarial networks for multi-domain image-to-image Adaptively gated jpeg compression artifacts removal net-
translation. In IEEE Conference on Computer Vision and work for a wide range quality factor. IEEE Access, 8:20160–
Pattern Recognition, pages 8789–8797, 2018. 2 20170, 2020. 2
[10] Nandita Dalmia and Manish Okade. Robust first quantization [24] Diederik P Kingma and Jimmy Ba. Adam: A method
matrix estimation based on filtering of recompression arti- for stochastic optimization. In International Conference on
facts for non-aligned double compressed jpeg images. Signal Learning Representations, 2015. 6
Processing: Image Communication, 61:9–20, 2018. 2 [25] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton.
[11] Chao Dong, Yubin Deng, Chen Change Loy, and Xiaoou Imagenet classification with deep convolutional neural net-
Tang. Compression artifacts reduction by a deep convolu- works. Neural Information Processing Systems, 25:1097–
tional network. In International Conference on Computer 1105, 2012. 1
Vision, pages 576–584, 2015. 1, 2, 6 [26] Bowen Li, Xiaojuan Qi, Thomas Lukasiewicz, and Philip HS
[12] Chao Dong, Chen Change Loy, Kaiming He, and Xiaoou Torr. Controllable text-to-image generation. In Neural Infor-
Tang. Learning a deep convolutional network for image mation Processing Systems, 2019. 2
super-resolution. In European Conference on Computer Vi- [27] Ming Liu, Yukang Ding, Min Xia, Xiao Liu, Errui Ding,
sion, pages 184–199. Springer, 2014. 2 Wangmeng Zuo, and Shilei Wen. Stgan: A unified selec-
[13] Max Ehrlich, Ser-Nam Lim, Larry Davis, and Abhinav Shri- tive transfer network for arbitrary image attribute editing. In
vastava. Quantization guided jpeg artifact correction. In Eu- IEEE Conference on Computer Vision and Pattern Recogni-
ropean Conference on Computer Vision, 2020. 1, 2, 4, 6 tion, pages 3673–3682, 2019. 2
[14] Dongdong Fu, Yun Q Shi, and Wei Su. A generalized ben- [28] Pengju Liu, Hongzhi Zhang, Kai Zhang, Liang Lin, and
ford’s law for jpeg coefficients and its applications in image Wangmeng Zuo. Multi-level wavelet-cnn for image restora-
tion. In IEEE Conference on Computer Vision and Pattern [43] Tao Xu, Pengchuan Zhang, Qiuyuan Huang, Han Zhang,
Recognition Workshops, June 2018. 1, 2, 6 Zhe Gan, Xiaolei Huang, and Xiaodong He. Attngan: Fine-
[29] Weiqi Luo, Zhenhua Qu, Jiwu Huang, and Guoping Qiu. A grained text to image generation with attentional generative
novel method for detecting cropped and recompressed image adversarial networks. In IEEE Conference on Computer Vi-
block. In International Conference on Acoustics, Speech and sion and Pattern Recognition, pages 1316–1324, 2018. 2
Signal Processing, volume 2, pages II–217. IEEE, 2007. 2 [44] Fei Xue, Ziyi Ye, Wei Lu, Hongmei Liu, and Bin Li. Mse
[30] David Martin, Charless Fowlkes, Doron Tal, and Jitendra period based estimation of first quantization step in double
Malik. A database of human segmented natural images compressed jpeg images. Signal Processing: Image Com-
and its application to evaluating segmentation algorithms and munication, 57:76–83, 2017. 2
measuring ecological statistics. In International Conference [45] Heng Yao, Hongbin Wei, Chuan Qin, and Xinpeng Zhang.
on Computer Vision, volume 2, pages 416–423. IEEE, 2001. An improved first quantization matrix estimation for non-
6 aligned double compressed jpeg images. Signal Processing,
[31] Jinseok Park, Donghyeon Cho, Wonhyuk Ahn, and Heung- 170:107430, 2020. 2
Kyu Lee. Double jpeg detection in mixed jpeg quality factors [46] Fisher Yu and Vladlen Koltun. Multi-scale context aggrega-
using deep convolutional neural network. In European Con- tion by dilated convolutions. In International Conference on
ference on Computer Vision, pages 636–652, 2018. 2, 5 Learning Representations, 2016. 2
[32] Taesung Park, Ming-Yu Liu, Ting-Chun Wang, and Jun-Yan [47] Liyang Yu, Qi Han, Xiamu Niu, SM Yiu, Junbin Fang, and
Zhu. Semantic image synthesis with spatially-adaptive nor- Ye Zhang. An improved parameter estimation scheme for
malization. In IEEE Conference on Computer Vision and image modification detection based on dct coefficient analy-
Pattern Recognition, pages 2337–2346, 2019. 3 sis. Forensic Science International, 259:200–209, 2016. 2
[33] Cecilia Pasquini, Giulia Boato, and Fernando Pérez- [48] Roman Zeyde, Michael Elad, and Matan Protter. On sin-
González. Multiple jpeg compression detection by means of gle image scale-up using sparse-representations. In Interna-
benford-fourier coefficients. In International Workshop on tional Conference on Curves and Surfaces, pages 711–730.
Information Forensics and Security, pages 113–118. IEEE, Springer, 2010. 6
2014. 2 [49] Kai Zhang, Jingyun Liang, Luc Van Gool, and Radu Timofte.
Designing a practical degradation model for deep blind im-
[34] Rawzor. Image compression benchmark. 6
age super-resolution. In IEEE Conference on International
[35] Scott Reed, Zeynep Akata, Xinchen Yan, Lajanugen Lo- Conference on Computer Vision, 2021. 6
geswaran, Bernt Schiele, and Honglak Lee. Generative ad-
[50] Kai Zhang, Wangmeng Zuo, Yunjin Chen, Deyu Meng, and
versarial text to image synthesis. In International Conference
Lei Zhang. Beyond a gaussian denoiser: Residual learning of
on Machine Learning, pages 1060–1069. PMLR, 2016. 2
deep cnn for image denoising. IEEE Transactions on Image
[36] HR Sheikh. Live image quality assessment database release Processing, 26(7):3142–3155, 2017. 1, 2, 4, 6
2. http://live. ece. utexas. edu/research/quality, 2005. 6 [51] Kai Zhang, Wangmeng Zuo, and Lei Zhang. Ffdnet: Toward
[37] K. Simonyan and A. Zisserman. Very deep convolutional a fast and flexible solution for cnn-based image denoising.
networks for large-scale image recognition. In International IEEE Transactions on Image Processing, 27(9):4608–4622,
Conference on Learning Representations, May 2015. 1 2018. 2
[38] Radu Timofte, Eirikur Agustsson, Luc Van Gool, Ming- [52] Kai Zhang, Wangmeng Zuo, and Lei Zhang. Learning a
Hsuan Yang, and Lei Zhang. Ntire 2017 challenge on single single convolutional super-resolution network for multiple
image super-resolution: Methods and results. In IEEE Con- degradations. In IEEE Conference on Computer Vision and
ference on Computer Vision and Pattern Recognition Work- Pattern Recognition, pages 3262–3271, 2018. 2
shops, pages 114–125, 2017. 6 [53] Xiaoshuai Zhang, Wenhan Yang, Yueyu Hu, and Jiaying
[39] Gregory K Wallace. The jpeg still picture compression Liu. Dmcnn: Dual-domain multi-scale convolutional neu-
standard. IEEE Transactions on Consumer Electronics, ral network for compression artifacts removal. In Inter-
38(1):xviii–xxxiv, 1992. 1 national Conference on Image Processing, pages 390–394.
[40] Qing Wang and Rong Zhang. Double jpeg compres- IEEE, 2018. 1, 2
sion forensics based on a convolutional neural network. [54] Y Zhang, K Li, K Li, B Zhong, and Y Fu. Residual non-local
EURASIP Journal on Information Security, 2016(1):1–12, attention networks for image restoration. In International
2016. 2 Conference on Learning Representations, 2019. 1
[41] Wei Wang, Ruiming Guo, Yapeng Tian, and Wenming [55] Bolun Zheng, Yaowu Chen, Xiang Tian, Fan Zhou, and
Yang. Cfsnet: Toward a controllable feature space for im- Xuesong Liu. Implicit dual-domain convolutional network
age restoration. In International Conference on Computer for robust color image compression artifact reduction. IEEE
Vision, pages 4140–4149, 2019. 2 Transactions on Circuits and Systems for Video Technology,
[42] Xintao Wang, Ke Yu, Chao Dong, and Chen Change Loy. 2019. 2
Recovering realistic texture in image super-resolution by
deep spatial feature transform. In IEEE Conference on Com-
puter Vision and Pattern Recognition, pages 606–615, 2018.
3