Blur2Blur: Blur Conversion for Unsupervised Image Deblurring on Unknown Domains

Bang-Dang Pham¹  Phong Tran²  Anh Tran¹  Cuong Pham¹,³  Rang Nguyen¹  Minh Hoai¹,⁴

¹VinAI Research, Vietnam  ²MBZUAI, UAE  ³Posts & Telecommunications Inst. of Tech., Vietnam  ⁴University of Adelaide, Australia

{v.dangpb1, v.anhtt152, v.rangnhm, v.hoainm}@vinai.io  [email protected]  [email protected]
[Figure 1 panels: an unknown-blur input and its blur-translated known-blur version, with deblurred results at 24.78 dB and 26.98 dB PSNR, respectively.]
Figure 1. We address the unsupervised image deblurring problem by training a blur translator that converts an input image with unknown
blur to an image with a predefined known blur. The figure shows the effectiveness of our approach. The blurry images before and after
translation (left image in each box) exhibit similar visual content but have different blur patterns (zoomed-in patches). While a standard
image deblurring technique fails to restore the unknown-blur image, it successfully recovers the known-blur version, yielding an approximate
2.2 dB increase in PSNR score (noted below each deblurred image on the right side of each box).
A deblurring network pre-trained on another blur domain C′ could be applied directly, but its performance is hindered by the differences between C and C′, resulting in suboptimal deblurring performance.

When a pre-trained network is not a good choice, our remaining option is to train a new deblurring network tailored to our camera. The obstacle here is that the specific blur space C of our device is unknown, and we cannot rely on having paired training data of corresponding blurry and sharp images. Paired training data requires a complex hardware setup, involving a beam splitter, identical devices capturing at different speeds, and the capability for time synchronization, geometric alignment, and color calibration. Not all camera devices meet these requirements, and setting up such a system is beyond the expertise of many. Consequently, our only feasible option is to use unpaired data. Fortunately, we can access the camera device to capture sets of blurry images B and sharp images S, which are unpaired and do not require correspondence between images in B and images in S. Thus, gathering these datasets is relatively easy and straightforward. The downside, however, is that learning from unpaired data is challenging. The deblurring process, which transforms a blurred image y into a sharp image x, typically requires an understanding of the blurring domain C. For unpaired data, this necessity poses a significant hurdle, especially in reconstructing fine details absent or distorted in the blurred input. Traditional deblurring networks [20, 40, 44, 46], attempting to 'hallucinate' these missing details, often produce unsatisfactory results, particularly with images affected by real-world blurring.

In this section, we introduce an innovative method to learn G*_C. Rather than directly learning this function, which is extremely challenging, or roughly approximating it with a function G*_{C′} learned for another blur domain, we treat G*_C as a composition of G*_{C′} and a translation function G, i.e., G*_C = G*_{C′} ∘ G. Our goal then shifts to learning G to bridge the gap between domains C and C′.
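To make the composition concrete, the following minimal PyTorch sketch shows how a deployed system would chain the translator G with the pre-trained deblurring network G*_{C′}; the module names (`blur2blur`, `deblur_pretrained`) are illustrative assumptions, not identifiers from the released code.

```python
import torch

@torch.no_grad()
def deblur_unknown(y: torch.Tensor,
                   blur2blur: torch.nn.Module,
                   deblur_pretrained: torch.nn.Module) -> torch.Tensor:
    """Approximate G*_C(y) by the composition G*_{C'}(G(y)).

    y: blurry image with unknown blur, shape (B, 3, H, W), values in [0, 1].
    blur2blur: translation network G mapping unknown blur to known blur.
    deblur_pretrained: network G*_{C'} trained on the known blur domain C'.
    """
    y_known = blur2blur(y)               # y' = F_{C'}(x, k') + eta'  (Eq. 2)
    x_hat = deblur_pretrained(y_known)   # sharp estimate from the known-blur image
    return x_hat
```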
More specifically, our task is to learn a mapping function G that maps each blurry input image y defined in Eq. (1) to an image y′ with the same sharp visual representation x but belonging to a known blur distribution C′:

G: y \rightarrow y', \quad \text{where } y' = \mathcal{F}_{C'}(x, k') + \eta'.   (2)

Our approach breaks a complex task into two manageable ones. One task requires deblurring from C′, which, while challenging, benefits from existing research: we can select a well-performing pre-trained network G*_{C′}, trained with supervised learning on paired data from its own domain. The other task is to learn a translation from the unknown blur domain C to the known domain C′. The difficulty of this task depends on the differences between C and C′, yet it is surely easier than directly learning a mapping from C to a sharp domain, because a blur-to-blur transformation primarily modifies the blur patterns, avoiding the need to reconstruct intricate image details. Moreover, we have the flexibility to choose the most appropriate C′ and G*_{C′} for our specific blur domain. This flexibility extends to the possibility of utilizing synthetic data, which allows for the generation of extensive datasets, ensuring that the deblurring network is thoroughly trained. In particular, blur kernels extracted from blurry-sharp image pairs of the known domain can be transferred onto sharp images to construct a synthesized blurred dataset. Inspired by the effectiveness of this strategy in capturing blur attributes from the known dataset, Blur2Blur adopts this approach: it is designed to discern and retain the blur kernel while selectively ignoring the camera-specific attributes of the target dataset.

In the remainder of this section, we discuss the two main components of our method: the blur-to-blur translation network G and the target blur space C′.

3.2. Blur-to-blur translation

Our objective here is to train a blur-to-blur translation network G capable of converting any blurry image from the unknown blur domain C to a known blur domain C′ while preserving the image content. To train G, we require two datasets: B, which consists of blurry images from the unknown blur domain, and K, which contains images with known blur for which a deblurring model has already been trained. We design G to work at multiple scales and carefully design the training losses to achieve the desired outcome.

Adversarial Loss. We employ an adversarial loss [5] to enforce that the translation network G produces images with the desired target blur. To achieve this, we introduce a discriminator network D, which is responsible for distinguishing between real images from the known blur domain and generated images. The two networks G and D are trained alternately in a minimax game. The adversarial loss is defined as:

\mathcal{L}_{adv}(G, D) = \mathbb{E}_{y \sim \mathcal{K}}[\log D(y)] + \mathbb{E}_{y \sim \mathcal{B}}[\log (1 - D(G(y)))].   (3)

The blur translation network is trained to minimize the above loss term, while the discriminator D is trained to maximize it. We also enforce the Lipschitz continuity constraint on the discriminator using the gradient penalty regularization [6]:

\mathcal{L}^D_{grad}(D) = \mathbb{E}_{\hat{y} \sim \hat{\mathcal{B}}}[(\|\nabla_{\hat{y}} D(\hat{y})\|_2 - 1)^2],   (4)

where B̂ is the set of samples ŷ randomly interpolated between a real image y ∈ B and the generated image G(y) using a random mixing ratio ε ∈ [0, 1], i.e., ŷ = εy + (1 − ε)G(y).
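For concreteness, a minimal PyTorch-style sketch of the two loss terms in Eqs. (3) and (4) is given below. The choice of a logits-based BCE formulation and a non-saturating generator loss is an assumption for illustration, not the authors' released implementation.

```python
import torch
import torch.nn.functional as F

def adversarial_losses(G, D, y_unknown, y_known):
    """Eq. (3): D separates real known-blur images (from K) from translated ones.

    y_unknown: batch sampled from B (unknown blur); y_known: batch sampled from K.
    D is assumed to output raw logits, so BCE-with-logits realizes log D / log(1 - D).
    """
    fake = G(y_unknown)
    real_logits = D(y_known)
    fake_logits = D(fake.detach())
    # Discriminator maximizes Eq. (3); equivalently it minimizes the negated objective.
    d_loss = (F.binary_cross_entropy_with_logits(real_logits, torch.ones_like(real_logits))
              + F.binary_cross_entropy_with_logits(fake_logits, torch.zeros_like(fake_logits)))
    # Generator update (non-saturating form): push D to label G(y) as real.
    gen_logits = D(fake)
    g_loss = F.binary_cross_entropy_with_logits(gen_logits, torch.ones_like(gen_logits))
    return g_loss, d_loss

def gradient_penalty(D, y_unknown, fake):
    """Eq. (4): one-centered gradient penalty on samples interpolated between
    an input y from B and its translation G(y), with a random ratio eps in [0, 1]."""
    eps = torch.rand(y_unknown.size(0), 1, 1, 1, device=y_unknown.device)
    y_hat = (eps * y_unknown + (1.0 - eps) * fake).requires_grad_(True)
    grads = torch.autograd.grad(outputs=D(y_hat).sum(), inputs=y_hat, create_graph=True)[0]
    return ((grads.flatten(1).norm(2, dim=1) - 1.0) ** 2).mean()
```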
Reconstruction Loss. Given a blurry image y, the desired function G should translate the blur characteristics from C to C′ while maintaining the elements belonging to the sharp image x. The adversarial loss helps translate the image to the target blur domain but does not guarantee sharp content preservation. Hence, we integrate a reconstruction loss to enforce visual consistency between the generated blurry image G(y) and the original image y. This loss term has two benefits: (1) it prevents G from modifying the image content, keeping it focused on blur kernel translation, and (2) it provides additional supervision to our network, enhancing training stability. Moreover, to make G focus on preserving the input semantic content rather than being overly constrained by pixel-wise accuracy, we (1) employ a perceptual loss [10] instead of the common L1 or L2 loss and (2) adopt a multi-scale deblurring architecture [4] to reconstruct the image content from coarse to fine:

\mathcal{L}^G_{rec}(G) = \frac{1}{M} \sum_{i=1}^{M} \frac{1}{t_i} \mathbb{E}_{y_i \sim \mathcal{B}}[\|\phi(y_i) - \phi(G(y_i))\|_1],   (5)

where M is the number of scale levels, y_i is the input image at scale level i, and φ(·) is a pre-trained feature extractor with the VGG19 backbone [33]. We divide the loss by the total number of elements t_i for normalization.
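A minimal sketch of Eq. (5) is shown below, assuming a torchvision VGG19 feature extractor truncated at an arbitrary layer; the chosen layer, the omitted input normalization, and the per-scale list interface of the translator are illustrative assumptions.

```python
import torch
import torch.nn.functional as F
from torchvision.models import vgg19, VGG19_Weights

class MultiScalePerceptualLoss(torch.nn.Module):
    """Eq. (5): average L1 distance between VGG19 features of the input and the
    translated image at M scales, normalized by the number of feature elements."""

    def __init__(self, feature_layer: int = 16):
        super().__init__()
        self.phi = vgg19(weights=VGG19_Weights.IMAGENET1K_V1).features[:feature_layer].eval()
        for p in self.phi.parameters():
            p.requires_grad_(False)   # frozen feature extractor phi(.)

    def forward(self, inputs, outputs):
        # inputs / outputs: lists [y_1, ..., y_M] and [G(y)_1, ..., G(y)_M],
        # one entry per scale level of the multi-scale translator.
        loss = 0.0
        for y_i, g_i in zip(inputs, outputs):
            f_real, f_fake = self.phi(y_i), self.phi(g_i)
            loss = loss + F.l1_loss(f_fake, f_real)   # 'mean' reduction divides by t_i
        return loss / len(inputs)                     # divide by the number of levels M
```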
Total Loss. Our final objective function for G combines the adversarial and reconstruction loss terms:

\mathcal{L}^G_{total}(G, D) = \mathcal{L}_{adv}(G, D) + \lambda_{rec}\mathcal{L}_{rec}(G),   (6)

where λ_rec is the weight factor for the reconstruction loss, ensuring the input content is maintained. Concurrently, the objective function for D is established as follows:

\mathcal{L}^D_{total}(G, D) = -\mathcal{L}_{adv}(G, D) + \lambda_{grad}\mathcal{L}_{grad}(D).   (7)

Here λ_grad is a hyperparameter that controls the importance of the gradient penalty loss component.
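Putting Eqs. (3)–(7) together, one alternating optimization step could look like the following sketch. It reuses the helper functions from the earlier snippets; the loss weights are the paper's reported hyper-parameters, while everything else (optimizers, single-scale reconstruction call) is an illustrative assumption.

```python
def train_step(G, D, opt_G, opt_D, y_unknown, y_known, perc_loss,
               lambda_rec=0.8, lambda_grad=0.005):
    """One alternating update: discriminator (Eq. 7), then translator (Eq. 6)."""
    # Discriminator update: maximize Eq. (3) and keep D Lipschitz via Eq. (4).
    _, d_loss_adv = adversarial_losses(G, D, y_unknown, y_known)
    gp = gradient_penalty(D, y_unknown, G(y_unknown).detach())
    d_total = d_loss_adv + lambda_grad * gp              # Eq. (7)
    opt_D.zero_grad()
    d_total.backward()
    opt_D.step()

    # Translator update: adversarial term plus multi-scale reconstruction term.
    g_loss_adv, _ = adversarial_losses(G, D, y_unknown, y_known)
    # A single scale is shown for brevity; the paper feeds M scales into Eq. (5).
    rec = perc_loss([y_unknown], [G(y_unknown)])
    g_total = g_loss_adv + lambda_rec * rec              # Eq. (6)
    opt_G.zero_grad()
    g_total.backward()
    opt_G.step()
    return g_total.item(), d_total.item()
```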
3.3. Known Blur Selection

The choice of C′ and its representative dataset K is important because the difficulty of learning the blur translation network depends on the discrepancy between the two blur domains. As described in Sec. 3.2, the representative dataset K only affects the adversarial training losses. The translation network G aims to convert images in B so that their blur characteristics resemble those of images in K, to the point that the discriminator D cannot differentiate between the generated images and the real images in K. However, if K and B differ in characteristics other than the blur kernel distribution, such as color tone, image resolution, or device-dependent noise patterns, D may rely on those cues to differentiate real from generated images. This can cause G to either fail to converge or to introduce undesired characteristics of the representative dataset K into the transferred outcomes.

To avoid this issue, we propose generating the images in K from a set of sharp images S captured with the same camera as B, so that the two sets share identical characteristics. These sharp images are then augmented with blur kernels from a known domain, characterized by a dataset of blurry-sharp image pairs, using the blur transfer technique [38]. The blurry-sharp pair dataset can be selected from commonly used image deblurring datasets such as REDS [25], GoPro [24], RSBlur [30], and RB2V [26], and we can utilize any deblurring network pre-trained on that dataset. A key component of [38] is a Blur Kernel Extractor F that can isolate blur kernels from random blurry-sharp image pairs and transfer them to target sharp inputs. After applying this blur synthesis procedure, we obtain a known-blur image set K that carries blur kernels from the known-blur domain while maintaining the other camera-based characteristics of the unknown-blur images in B. Consequently, the discriminator can focus on distinguishing images based on their blur kernels, facilitating effective blur-to-blur translation training. The overall problem setup and the pipeline of our method are illustrated in Fig. 2.

[Figure 2: overview of the Blur2Blur pipeline. The Blur Translator converts unknown-blur images into known-blur images (trained with L_adv and L_rec); a Blur Kernel Extractor transfers blur kernels from blur-sharp pairs of the known domain onto unknown-sharp images to form the known-blur set; the converted images are then processed by the deblurring model trained on the known domain.]
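The construction of the known-blur set K described above can be sketched as follows. `BlurKernelExtractor` stands in for the encoded-blur-kernel-space model of [38], and its `transfer` method is a hypothetical interface assumed for illustration, not the actual API of that work's code release.

```python
from pathlib import Path
import random
import cv2

def build_known_blur_set(unknown_sharp_dir: str, known_pairs, extractor, out_dir: str):
    """Create K: apply blur kernels taken from known blurry-sharp pairs (e.g., GoPro)
    to sharp images captured by the target camera.

    known_pairs: list of (blurry_path, sharp_path) tuples from the known dataset.
    extractor:   blur-kernel transfer model in the spirit of [38] (hypothetical API).
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for sharp_path in sorted(Path(unknown_sharp_dir).glob("*.png")):
        sharp = cv2.imread(str(sharp_path))
        blurry_ref_path, sharp_ref_path = random.choice(known_pairs)
        blurry_ref = cv2.imread(str(blurry_ref_path))
        sharp_ref = cv2.imread(str(sharp_ref_path))
        # Extract the blur kernel embedded in the reference pair and re-apply it to the
        # target-camera sharp image, preserving its color and noise characteristics.
        known_blur = extractor.transfer(blurry_ref, sharp_ref, target_sharp=sharp)
        cv2.imwrite(str(out / sharp_path.name), known_blur)
```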
4. Experiments

4.1. Experimental Setups

4.1.1 Datasets and implementation details

We evaluate our proposed method on four datasets. The REDS dataset [25] consists of 300 high-speed videos used to create synthetic blur: by ramping up the frame rate from 120 to 1920 fps and averaging frames with an inverse Camera Response Function (CRF), it simulates more realistic motion blur, differentiating it from other synthetic datasets [23, 32]. The GoPro dataset [24] comprises 3,142 paired frames of sharp and blurred images recorded at 240 frames per second. It employs a synthesis method akin to that of the REDS dataset but with a different camera response function; we use it as the main target data for evaluating deblurring methods in combination with Blur2Blur. The RSBlur dataset [30] contains 13,358 real blurred images. It provides sequences of sharp images alongside blurred ones for in-depth blur analysis and offers higher resolution than similar datasets; noise levels are also estimated to assess and compare against the noise present in real-world blur scenarios. In this paper, for the evaluation on the publicly available RSBlur dataset, we use the official dataset alongside its Additional set at a lower sampling rate, rather than generating blurry images from the RSBlur sharp set with dense sampling as described in the original paper. The RB2V dataset [26] comprises about 11,000 real-world pairs of a blurry image and a sharp image sequence for the street category, denoted as RB2V Street. Experiments on this dataset are crucial for confirming the effectiveness of our algorithm in handling real-world, camera-specific data.

Train and test data. To address practical deblurring problems, our method assumes access to unpaired sets containing blurry images B and sharp images S. When selecting a dataset as the source for our deblurring evaluation, we divide its training data into two disjoint subsets that capture different scenes with a ratio of 0.6:0.4. From the first subset we select blurry images to form the unknown-blur image set B, while from the second subset we choose sharp images to construct the sharp set S. For the chosen target dataset, representing the domain for blur kernel translation via the Blur2Blur mechanism, we employ the entire training set to train our Blur Kernel Extractor [38] and subsequently apply this extractor to map the captured blur embeddings onto the sharp image set S, creating the known-blur image set K. The blurry images in the test split of the source dataset are used to evaluate the image deblurring algorithms. The statistics of the source image sets are reported in Tab. 1.

Dataset        U. blur (B)   U. sharp (S)   Test
RB2V Street    5400          3600           2053
REDS           14400         9600           3000
RSBlur         8115          5410           8301
GoPro          1261          842            1111

Table 1. Statistics of the datasets used as unknown domains (number of data samples).

Implementation Details. We implement the blur-to-blur translation network G using MIMO-UNet [4] with the default configuration of the Pix2Pix [9] implementation. For all experiments, we set the hyper-parameters λ_rec = 0.8, λ_grad = 0.005, and a batch size of 16. To facilitate the training of the network G during its initial iterations, we sort images by their blur degree, determined by the variance of the responses obtained from applying the Laplacian operator: a lower variance corresponds to a reduced range of intensity changes, signaling a blurrier image with fewer edges. Initially, we optimize on approximately 50% of the data within a single batch; after 200K iterations, we incrementally scale this proportion to encompass the full batch. We evaluate Blur2Blur in combination with different state-of-the-art deblurring network backbones, including NAFNet [3] and Restormer [42]. During training, we randomly crop images to a square shape of 256×256 and augment them with rotation, flipping, and color jitter. All experiments are performed using the Adam optimizer [12]. Training our model required roughly 3 days for 1M iterations on 2 Nvidia A100 GPUs. The learning rate is kept constant for the first 500K iterations and then linearly reduced over the remaining iterations, as in [37].
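The blur-degree measure used for this curriculum is simply the variance of the Laplacian response; a small OpenCV sketch is shown below. The ascending ordering (blurriest first) and the grayscale conversion are illustrative assumptions.

```python
import cv2

def blur_degree(image_path: str) -> float:
    """Variance of the Laplacian response: lower values indicate fewer edges
    (a blurrier image), higher values indicate a sharper image."""
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    return float(cv2.Laplacian(gray, cv2.CV_64F).var())

def sort_by_blur(image_paths):
    """Order training images by ascending Laplacian variance (blurriest first)."""
    return sorted(image_paths, key=blur_degree)
```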
4.1.2 Baselines

We compare Blur2Blur with a comprehensive list of baseline methods from three categories: supervised methods (NAFNet [3], Restormer [42]), unpaired training (CycleGAN [46], DualGAN [40]), and generalized image deblurring (BSRGAN+NAFNet [43], RSBlur+NAFNet [29]). For fair comparisons, we retrain the supervised models using the blur-sharp pairs from the source dataset. Furthermore, to replicate real-world scenarios that lack paired data for deblurring network training, we generate synthetic motion-blur data from the unknown-sharp image set S using motion blur synthesis techniques, such as the one provided by the imgaug library [11]; this approach synthesizes motion blur independently on each image. For the unpaired training and generalized image deblurring approaches, we use the blurry images in B and the sharp images in S to train the deblurring network. BSRGAN was originally designed for blind image super-resolution; we adapt it to blind image deblurring by adding motion blur augmentation (via averaging with neighboring frames) to its augmentation pipeline.
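The synthetic motion-blur baseline described above can be reproduced with a few lines of imgaug; the kernel-size and angle ranges below are illustrative choices, not the exact settings used in the paper.

```python
import cv2
import imgaug.augmenters as iaa

# Random linear motion blur, applied independently to each sharp image from S.
motion_blur = iaa.MotionBlur(k=(15, 35), angle=(-180, 180))

def synthesize_blurry(sharp_path: str, out_path: str) -> None:
    sharp = cv2.imread(sharp_path)        # H x W x 3 (BGR), uint8
    blurry = motion_blur(image=sharp)     # a new random kernel is sampled per call
    cv2.imwrite(out_path, blurry)
```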
4.2. Image Deblurring Results

To evaluate the performance of the Blur2Blur mechanism, we define three data configurations. Each configuration consists of the Known-Blur dataset K, sourced from the GoPro dataset, and two unpaired datasets, B and S, derived from the training partitions of the deblurring dataset REDS, RSBlur, or RB2V Street. For a comprehensive evaluation of the Blur2Blur model, we integrate it with two supervised image deblurring backbones, Restormer and NAFNet, and compare the results with state-of-the-art baselines. The quantitative results are summarized in Tab. 2.

Method                       RB2V Street      REDS             RSBlur
NAFNet [3]
  w/ GoPro                   24.78 / 0.714    25.80 / 0.880    26.33 / 0.790
  w/ Synthetic Data          22.10 / 0.644    25.07 / 0.853    23.53 / 0.659
  w/ Blur2Blur (GoPro)       26.98 / 0.812    28.11 / 0.893    29.00 / 0.857
  w/ the source domain*      28.72 / 0.883    29.09 / 0.927    33.06 / 0.888
Restormer [42]
  w/ GoPro                   23.34 / 0.698    25.43 / 0.775    25.98 / 0.788
  w/ Synthetic Data          23.78 / 0.655    24.76 / 0.753    23.34 / 0.651
  w/ Blur2Blur (GoPro)       25.97 / 0.750    27.55 / 0.885    28.89 / 0.850
  w/ the source domain*      27.43 / 0.849    28.23 / 0.916    32.87 / 0.874
Generalized Deblurring
  BSRGAN [43]                23.31 / 0.645    26.39 / 0.803    27.11 / 0.810
  RSBlur [29]                23.42 / 0.603    26.32 / 0.812    26.98 / 0.798
Unpaired Training
  CycleGAN [46]              21.21 / 0.582    23.92 / 0.775    23.34 / 0.782
  DualGAN [40]               21.02 / 0.556    23.50 / 0.700    22.78 / 0.704

Table 2. Comparison of different deblurring methods on various datasets. For each test, we report PSNR↑ / SSIM↑. The best scores are in bold and the second-best are underlined. For a supervised method (NAFNet or Restormer), we assess its upper bound of deblurring performance by training it on the training set of the source dataset (*).
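For reference, PSNR/SSIM values of the kind reported in Tab. 2 can be computed with scikit-image; this is a generic metric sketch assuming 8-bit RGB arrays, not the authors' evaluation script.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate_pair(restored: np.ndarray, sharp_gt: np.ndarray):
    """PSNR (dB) and SSIM between a deblurred image and its ground truth."""
    psnr = peak_signal_noise_ratio(sharp_gt, restored, data_range=255)
    ssim = structural_similarity(sharp_gt, restored, channel_axis=-1, data_range=255)
    return psnr, ssim
```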
As observed, both unsupervised image deblurring and generalized deblurring approaches, despite their expected generalization power, exhibit poor performance on these challenging real-world datasets. Their scores are similar to, and sometimes significantly lower than, those of the state-of-the-art supervised methods Restormer and NAFNet. In contrast, Blur2Blur demonstrates remarkable deblurring results. When combined with Restormer, Blur2Blur increases the PSNR score by 2.63 dB on RB2V Street, 2.12 dB on REDS, and 2.91 dB on RSBlur. When combined with NAFNet, it provides consistent gains of 2.20 dB on RB2V Street, 2.31 dB on REDS, and 2.67 dB on RSBlur. NAFNet outperforms Restormer overall, making the combination of NAFNet and Blur2Blur the most effective approach. Moreover, our method comes close to matching the best results of supervised models trained directly on the source datasets.

We provide a qualitative comparison of image deblurring results in Fig. 3. The comparison highlights a significant performance disparity between supervised methods and their counterparts. Unpaired methods like DualGAN and CycleGAN struggle notably with deblurring; DualGAN in particular is unable to bridge the blur-to-sharp domain gap, tending instead to bridge the content and color distribution gap between the blurry (B) and sharp (S) datasets. Synthesis-based methods such as BSRGAN and RSBlur also fall short, failing to address unseen blurs, which indicates the limitations of augmentation strategies, including those based on the imgaug library. The supervised NAFNet fails to handle unseen blurs, often yielding outputs nearly identical to the blurred inputs. In contrast, our method effectively transforms unknown blurs into known ones: the translation process focuses on the blur kernel, minimizing bias from other image characteristics. By integrating Blur2Blur with NAFNet, we achieve a substantial recovery of high-quality sharp images, demonstrating the practical strength of our approach. Additional qualitative results for Restormer are provided in the supplementary material.

Figure 3. Comparison of image deblurring results on three benchmark datasets with NAFNet. Due to the space limit, we omit the results with the Restormer backbone, which are similar to but slightly worse than those with NAFNet. Best viewed when magnified on a digital display.

4.3. Blur2Blur Visualization

Fig. 4a provides a comparative visualization of original blurry images and their Blur2Blur-converted versions, using the same source and target datasets as in Sec. 4.2. As can be seen, our transformed images effectively adopt the blur pattern of the GoPro dataset, which is noted for its low-sampling-rate blur (further illustrated in Fig. 4b), while preserving the other content elements of the input. This demonstrates the Blur2Blur conversion's capability to produce transformed images that faithfully reflect the target blur pattern while preserving the original content details.

Figure 4. (a) A comparison of original images and their corresponding Blur2Blur-converted versions on RB2V_Street, REDS, and RSBlur; (b) selected examples demonstrating the GoPro dataset's blur pattern (zoom in for best view).
4.4. Ablation Study for the Blur-to-Sharp Ratio

We evaluate the significance of the blur-to-sharp ratio, i.e., the ratio between the Unknown-Blur (B) and Unknown-Sharp (S) sets. Specifically, we use NAFNet as the image deblurring backbone and consider the GoPro–RB2V Street and GoPro–REDS settings, where GoPro represents the target camera device for which we have tailored a deblurring model. We conduct Blur2Blur experiments across a range of ratios from 5:5 to 9:1. The deblurring results in Tab. 3 show that a greater proportion of blurry images, as in the 6:4 and 7:3 ratios, allows a deeper understanding of the blur patterns characteristic of the target device, leading to improved deblurring performance. However, with too few sharp images, as in the 9:1 ratio, the Blur2Blur method may overfit to the limited sharp content. To balance learning and prevent overfitting, a 6:4 ratio is used for all experiments in this study.

Ratio B : S           5:5      6:4      7:3      8:2      9:1
GoPro–RB2V Street     26.02    26.98    26.92    25.98    24.32
GoPro–REDS            27.53    28.11    28.10    27.00    26.43

Table 3. PSNR deblurring results with different blur-to-sharp ratios.
4.5. Practicality Evaluation

We evaluate the practicality of Blur2Blur in two imagined yet realistic scenarios. The first scenario involves a user who wants a deblurring algorithm for images taken with their smartphone camera. To facilitate this, we compiled a dataset named PhoneCraft, featuring images captured with a Samsung Galaxy Note 10 Plus. This dataset includes videos with motion-induced blur, refined through post-processing to remove other blur types, and clear, sharp videos recorded at 60 fps. Over two hours, a variety of scenes and motions were captured, producing 12 blurry and 11 sharp video clips, each between 30 and 40 seconds long.

To deblur the PhoneCraft images, we used the well-known blur datasets GoPro, REDS, and RSBlur. The results in Fig. 5 show that Blur2Blur significantly improves image clarity over the pre-trained models, especially with RSBlur's complex blur patterns. The NIQE [22] scores of the deblurred images transformed by Blur2Blur, using GoPro, REDS, and RSBlur as source datasets, are 9.8, 9.2, and 8.8, respectively. For NIQE, lower is better, and this demonstrates Blur2Blur's ability to handle real-world blurs effectively.

Figure 5. Qualitative comparison of deblurring models on the PhoneCraft dataset with multiple target datasets (GoPro, REDS, and RSBlur, each with and without B2B).

In our second scenario, we explore a webcam-based application for monitoring hand movements during writing exercises, aimed at assisting rehabilitation therapy. The challenge here is motion blur, which complicates hand and object tracking. To test our approach, we created a dataset named WritingHands with four 30 fps webcam-recorded videos, each about 40 s long. Two of these videos provided over 1100 frames with motion blur for training, and one video offered sharp reference images. Leveraging insights from the PhoneCraft experiments, we used the RSBlur dataset and its pre-trained NAFNet model for a two-day training session. The results, shown in Fig. 6, indicate that while RSBlur's model alone leaves some blur, integrating it with Blur2Blur significantly restores the image's sharpness.

Figure 6. Results of using Blur2Blur on the WritingHands dataset (input blur, RSBlur model, and RSBlur + B2B).

5. Conclusions

We have proposed Blur2Blur, an effective approach to address the practical challenge of adapting image deblurring techniques to handle unseen blur. The key idea is to learn to convert an unknown blur into a known blur that can be effectively removed by a deblurring network specifically trained on the known blur. Throughout extensive experiments on synthetic and real-world benchmarks, Blur2Blur consistently exhibited superior performance, delivering impressive quantitative and qualitative results.
References

[1] Tony F Chan and Chiu-Kwong Wong. Total variation blind deconvolution. IEEE Transactions on Image Processing, 7(3):370–375, 1998.
[2] Liangyu Chen, Xin Lu, Jie Zhang, Xiaojie Chu, and Chengpeng Chen. HINet: Half instance normalization network for image restoration. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021.
[3] Liangyu Chen, Xiaojie Chu, Xiangyu Zhang, and Jian Sun. Simple baselines for image restoration. In Proceedings of the European Conference on Computer Vision, 2022.
[4] Sung-Jin Cho, Seo-Won Ji, Jun-Pyo Hong, Seung-Won Jung, and Sung-Jea Ko. Rethinking coarse-to-fine approach in single image deblurring. In Proceedings of the International Conference on Computer Vision, 2021.
[5] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial networks. Communications of the ACM, 63(11):139–144, 2020.
[6] Ishaan Gulrajani, Faruk Ahmed, Martin Arjovsky, Vincent Dumoulin, and Aaron C Courville. Improved training of Wasserstein GANs. Advances in Neural Information Processing Systems, 30, 2017.
[7] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016.
[8] Michal Hradiš, Jan Kotera, Pavel Zemčík, and Filip Šroubek. Convolutional neural networks for direct text deblurring. In Proceedings of the British Machine Vision Conference, 2015.
[9] Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, and Alexei A Efros. Image-to-image translation with conditional adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017.
[10] Justin Johnson, Alexandre Alahi, and Li Fei-Fei. Perceptual losses for real-time style transfer and super-resolution. In Proceedings of the European Conference on Computer Vision, 2016.
[11] Alexander B. Jung, Kentaro Wada, Jon Crall, Satoshi Tanaka, Jake Graving, Christoph Reinders, Sarthak Yadav, Joy Banerjee, Gábor Vecsei, Adam Kraft, Zheng Rui, Jirka Borovec, Christian Vallentin, Semen Zhydenko, Kilian Pfeiffer, Ben Cook, Ismael Fernández, François-Michel De Rainville, Chi-Hung Weng, Abner Ayala-Acevedo, Raphael Meudec, Matias Laporte, et al. imgaug. https://github.com/aleju/imgaug, 2020. Online; accessed 01-Feb-2020.
[12] Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. CoRR, abs/1412.6980, 2014.
[13] Dilip Krishnan and Rob Fergus. Fast image deconvolution using hyper-Laplacian priors. Advances in Neural Information Processing Systems, 22:1033–1041, 2009.
[14] Dilip Krishnan, Terence Tay, and Rob Fergus. Blind deconvolution using a normalized sparsity measure. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2011.
[15] Orest Kupyn, Volodymyr Budzan, Mykola Mykhailych, Dmytro Mishkin, and Jiří Matas. DeblurGAN: Blind motion deblurring using conditional adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018.
[16] Orest Kupyn, Tetiana Martyniuk, Junru Wu, and Zhangyang Wang. DeblurGAN-v2: Deblurring (orders-of-magnitude) faster and better. In Proceedings of the International Conference on Computer Vision, 2019.
[17] Anat Levin, Yair Weiss, Fredo Durand, and William T Freeman. Understanding and evaluating blind deconvolution algorithms. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2009.
[18] Chih-Hung Liang, Yu-An Chen, Yueh-Cheng Liu, and Winston H Hsu. Raw image deblurring. IEEE Transactions on Multimedia, 24:61–72, 2020.
[19] Guangcan Liu, Shiyu Chang, and Yi Ma. Blind image deblurring using spectral properties of convolution operators. IEEE Transactions on Image Processing, 23(12):5047–5056, 2014.
[20] Boyu Lu, Jun-Cheng Chen, and Rama Chellappa. Unsupervised domain-specific deblurring via disentangled representations. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019.
[21] Armin Mehri, Parichehr B Ardakani, and Angel D Sappa. MPRNet: Multi-path residual network for lightweight image super resolution. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2021.
[22] Anish Mittal, Rajiv Soundararajan, and Alan C Bovik. Making a "completely blind" image quality analyzer. IEEE Signal Processing Letters, 20(3):209–212, 2012.
[23] Seungjun Nah, Tae Hyun Kim, and Kyoung Mu Lee. Deep multi-scale convolutional neural network for dynamic scene deblurring. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017.
[24] Seungjun Nah, Tae Hyun Kim, and Kyoung Mu Lee. Deep multi-scale convolutional neural network for dynamic scene deblurring. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017.
[25] Seungjun Nah, Radu Timofte, Sungyong Baik, Seokil Hong, Gyeongsik Moon, Sanghyun Son, and Kyoung Mu Lee. NTIRE 2019 challenge on video deblurring: Methods and results. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2019.
[26] Bang-Dang Pham, Phong Tran, Anh Tran, Cuong Pham, Rang Nguyen, and Minh Hoai. HyperCUT: Video sequence from a single blurry image using unsupervised ordering. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2023.
[27] Dongwei Ren, Kai Zhang, Qilong Wang, Qinghua Hu, and Wangmeng Zuo. Neural blind deconvolution using deep priors. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2020.
[28] Jaesung Rim, Haeyun Lee, Jucheol Won, and Sunghyun Cho. Real-world blur dataset for learning and benchmarking deblurring algorithms. In Proceedings of the European Conference on Computer Vision, 2020.
[29] Jaesung Rim, Geonung Kim, Jungeon Kim, Junyong Lee, Seungyong Lee, and Sunghyun Cho. Realistic blur synthesis for learning image deblurring. In Proceedings of the European Conference on Computer Vision, 2022.
[30] Jaesung Rim, Geonung Kim, Jungeon Kim, Junyong Lee, Seungyong Lee, and Sunghyun Cho. Realistic blur synthesis for learning image deblurring. In Proceedings of the European Conference on Computer Vision, 2022.
[31] Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-Net: Convolutional networks for biomedical image segmentation. In Proceedings of the International Conference on Medical Image Computing and Computer Assisted Intervention, 2015.
[32] Ziyi Shen, Wenguan Wang, Xiankai Lu, Jianbing Shen, Haibin Ling, Tingfa Xu, and Ling Shao. Human-aware motion deblurring. In Proceedings of the International Conference on Computer Vision, 2019.
[33] K Simonyan and A Zisserman. Very deep convolutional networks for large-scale image recognition. In Proceedings of the International Conference on Learning Representations, 2015.
[34] Maitreya Suin, Kuldeep Purohit, and A. N. Rajagopalan. Spatially-attentive patch-hierarchical network for adaptive motion deblurring. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2020.
[35] Xin Tao, Hongyun Gao, Xiaoyong Shen, Jue Wang, and Jiaya Jia. Scale-recurrent network for deep image deblurring. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018.
[36] Xin Tao, Hongyun Gao, Xiaoyong Shen, Jue Wang, and Jiaya Jia. Scale-recurrent network for deep image deblurring. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018.
[37] Dmitrii Torbunov, Yi Huang, Huan-Hsin Tseng, Haiwang Yu, Jin Huang, Shinjae Yoo, Meifeng Lin, Brett Viren, and Yihui Ren. Rethinking CycleGAN: Improving quality of GANs for unpaired image-to-image translation. arXiv preprint arXiv:2303.16280, 2023.
[38] Phong Tran, Anh Tuan Tran, Quynh Phung, and Minh Hoai. Explore image deblurring via encoded blur kernel space. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021.
[39] Andrey Vakunov, Chuo-Ling Chang, Fan Zhang, George Sung, Matthias Grundmann, and Valentin Bazarevsky. MediaPipe Hands: On-device real-time hand tracking. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2020.
[40] Zili Yi, Hao Zhang, Ping Tan, and Minglun Gong. DualGAN: Unsupervised dual learning for image-to-image translation. In Proceedings of the International Conference on Computer Vision, 2017.
[41] Syed Waqas Zamir, Aditya Arora, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Ming-Hsuan Yang, and Ling Shao. Multi-stage progressive image restoration. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021.
[42] Syed Waqas Zamir, Aditya Arora, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, and Ming-Hsuan Yang. Restormer: Efficient transformer for high-resolution image restoration. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2022.
[43] Kai Zhang, Jingyun Liang, Luc Van Gool, and Radu Timofte. Designing a practical degradation model for deep blind image super-resolution. In Proceedings of the International Conference on Computer Vision, 2021.
[44] Suiyi Zhao, Zhao Zhang, Richang Hong, Mingliang Xu, Yi Yang, and Meng Wang. FCL-GAN: A lightweight and real-time baseline for unsupervised blind image deblurring. In Proceedings of the 30th ACM International Conference on Multimedia, 2022.
[45] Zhihang Zhong, Ye Gao, Yinqiang Zheng, and Bo Zheng. Efficient spatio-temporal recurrent neural network for video deblurring. In Proceedings of the European Conference on Computer Vision, 2020.
[46] Jun-Yan Zhu, Taesung Park, Phillip Isola, and Alexei A Efros. Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the International Conference on Computer Vision, 2017.
Blur2Blur: Blur Conversion for Unsupervised Image Deblurring
on Unknown Domains
Supplementary Material
Abstract

In this supplementary PDF, we first provide the qualitative results obtained by methods with the Restormer backbone [42] and additional qualitative results for each dataset to show our effectiveness in deblurring unknown-blur images compared to other baselines. Next, we illustrate the performance of different backbones for the Blur2Blur translator model. Finally, we provide details of our collected PhoneCraft dataset and validate the video deblurring performance of Blur2Blur, demonstrating significant enhancements in hand movement visualization and thus leaving room for practical application. We also include our code and a video of sample deblurring results in the supplementary package.

6. Additional Qualitative Results

6.1. Restormer model

In Fig. 3 of the main paper, we omitted the results with the Restormer backbone due to the space limit; we provide them in Fig. 7 of this supplementary. As can be seen, Restormer shows behavior similar to NAFNet: the original network produces blurry outputs that are close to the input images, but when combined with Blur2Blur it can successfully deblur the images and produce sharper outputs. In quantitative terms, Restormer-based models perform slightly worse than their NAFNet-based counterparts.

6.2. Additional Deblurring Results

In this section, we provide additional qualitative figures comparing the image deblurring results of our Blur2Blur and other baselines. Figures 8, 9, and 10 show samples where K is built upon the GoPro dataset [24], with the unknown set derived from the REDS dataset [25], RB2V Street [26], and RSBlur [30], respectively.

7. Blur2Blur Analysis

7.1. Backbone Experiments

We explore the integration of multi-scale architectures into the Blur2Blur mechanism by experimenting with different backbones. The UNet architecture [31] is adapted to handle inputs at various scales, allowing for a more nuanced understanding of blur at multiple scales. Concurrently, we employ the NAFNet backbone in its original form, taking advantage of its robust feature extraction capabilities without modification. The results in Tab. 4 show that MIMO-UNet clearly surpasses the standard UNet, even in its modified form. Moreover, the results also reveal that NAFNet does not perform as well as the multi-scale variants, highlighting the importance of multi-scale optimization in the Blur2Blur framework for deblurring tasks.

Backbone         PSNR↑   SSIM↑
UNet [31]        22.54   0.732
MIMO-UNet [4]    26.98   0.812
NAFNet [3]       20.54   0.686

Table 4. Ablation studies on the Blur2Blur backbone.

7.2. Impact of Dataset Size

In Tab. 5, we validate the deblurring performance on the GoPro-RB2V setting, maintaining a fixed B:S ratio of 6:4 while varying the dataset size across four different scales of the target dataset (α).

α       0.25    0.5     0.75    1.0
PSNR    25.45   25.93   26.32   26.98

Table 5. Effect of dataset size.

7.3. Validation of the blur converter

To evaluate the effectiveness of our blur converter, we conduct two classification experiments to determine the alignment of converted images with the known-blur domain K, using the GoPro-RB2V setting detailed in the main paper. The first experiment (Acc1) uses the pretrained discriminator from our Blur2Blur framework to assess whether converted images belong to K. For the second (Acc2), we synthesize a new dataset via the Blur Kernel Extractor F [38], combining sharp GoPro images with blur kernels from the target datasets, and then train a ResNet18 [7] to discern whether the blur in converted images corresponds to K or not. In both experiments, our method converts the input images to carry the target known blur with high accuracy (Tab. 6).

Model             Input           Converted
Acc1 / Acc2 (%)   11.24 / 6.85    86.67 / 96.53

Table 6. Blur converter validation.
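The Acc2 probe described above is a standard binary classifier; a sketch using torchvision's ResNet18 is shown below. The two-class head and the evaluation helper are illustrative assumptions, not the exact training protocol.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

def make_blur_domain_classifier() -> nn.Module:
    """Binary classifier: does an image carry the known-blur (K) kernel or not?"""
    model = resnet18(weights=None)            # trained from scratch on the synthesized set
    model.fc = nn.Linear(model.fc.in_features, 2)
    return model

@torch.no_grad()
def accuracy(model: nn.Module, images: torch.Tensor, labels: torch.Tensor) -> float:
    """Fraction of images whose predicted blur domain matches the label (Acc2)."""
    preds = model(images).argmax(dim=1)
    return (preds == labels).float().mean().item()
```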
[Figures 7-10: additional qualitative comparisons. Figure 7 shows Blur, Syn. Data (Restormer), GoPro (Restormer), GoPro+B2B (Restormer), and Sharp on RB2V_Street, REDS, and RSBlur. Figures 8-10 compare Blur, CycleGAN, DualGAN, BSRGAN+NAFNet, RSBlur+NAFNet, Syn. Data, GoPro, and GoPro+B2B (with NAFNet and Restormer backbones) against the Sharp ground truth.]