
Blur2Blur: Blur Conversion for Unsupervised Image Deblurring on Unknown Domains

Bang-Dang Pham¹  Phong Tran²  Anh Tran¹  Cuong Pham¹,³  Rang Nguyen¹  Minh Hoai¹,⁴
¹VinAI Research, Vietnam  ²MBZUAI, UAE  ³Posts & Telecommunications Inst. of Tech., Vietnam  ⁴University of Adelaide, Australia
{v.dangpb1, v.anhtt152, v.rangnhm, v.hoainm}@vinai.io  [email protected]  [email protected]
arXiv:2403.16205v1 [cs.CV] 24 Mar 2024

[Figure 1 labels: deblurring the unknown-blur input yields 24.7837 dB; deblurring the blur-translated (known-blur) version yields 26.9832 dB.]

Figure 1. We address the unsupervised image deblurring problem by training a blur translator that converts an input image with unknown
blur to an image with a predefined known blur. The figure shows the effectiveness of our approach. The blurry images before and after
translation (left image in each box) exhibit similar visual content but have different blur patterns (zoomed-in patches). While a standard
image deblurring technique fails to restore the unknown-blur image, it successfully recovers the known-blur version, yielding an approximate
2.2 dB increase in PSNR score (noted below each deblurred image on the right side of each box).

Abstract

This paper presents an innovative framework designed to train an image deblurring algorithm tailored to a specific camera device. This algorithm works by transforming a blurry input image, which is challenging to deblur, into another blurry image that is more amenable to deblurring. The transformation process, from one blurry state to another, leverages unpaired data consisting of sharp and blurry images captured by the target camera device. Learning this blur-to-blur transformation is inherently simpler than direct blur-to-sharp conversion, as it primarily involves modifying blur patterns rather than the intricate task of reconstructing fine image details. The efficacy of the proposed approach has been demonstrated through comprehensive experiments on various benchmarks, where it significantly outperforms state-of-the-art methods both quantitatively and qualitatively. Our code and data are available at https://zero1778.github.io/blur2blur/

1. Introduction

Motion blur in images and videos is a common issue, often resulting from camera shake or rapid movement within the scene. Such blur can detract from the aesthetic quality of the content and may undermine the performance of downstream computer vision applications. Consequently, an effective image deblurring method is essential in various contexts.

While the idea of deblurring images from arbitrary, diverse sources sounds impressive and broadly useful, the practical necessity, commercial value, and societal impact of image deblurring are frequently connected to specific application scenarios and particular cameras. For example, a mobile phone manufacturer might focus on integrating the most effective deblurring algorithm for the camera types used in their latest phone models. Similarly, a factory manager might consider installing ceiling-mounted cameras to identify errors on the assembly line, enhancing workforce efficiency. However, motion blur could significantly degrade the performance of computer vision algorithms meant to detect and track workers' hands and tools. In law enforcement, a police officer using a body-worn camera coupled with face recognition technology might find that motion blur hampers the accuracy of detecting faces and identifying fugitives. Therefore, in these scenarios, the development of a framework to customize a deblurring algorithm for specific cameras or camera types becomes crucial and represents a significant and growing need.

In this paper, we explore the question: How can we deblur images captured by specific cameras? Classical deblurring algorithms, which use signal processing or theoretical models of motion blur, are one option. Yet, their reliance on oversimplified blur models limits their effectiveness in addressing the complex motion blur encountered in real-world scenarios. An alternative is a data-driven approach that leverages advancements in machine learning. This approach involves using pre-trained deblurring networks developed through supervised learning, as illustrated by works such as [2, 3, 15, 34, 36, 42]. These networks, trained on extensive datasets of paired images, aim to transform blurry images into sharp ones. However, they often suffer from overfitting and tend to underperform on novel blurred images that were not captured by the cameras used to create their training datasets. Our empirical findings indicate that the performance of these models is still unsatisfactory when confronting unseen blurs produced by real-world cameras.

When pre-trained networks are unsuitable, the alternative is to develop a deblurring network specifically for our camera. However, this approach faces the challenge of not having access to paired training data consisting of corresponding blurry and sharp images. Generating such data typically involves a sophisticated setup with a beam splitter, identical cameras operating at varying speeds, and capabilities for time synchronization, geometrical alignment, and color calibration [17, 18, 30]. Often, the camera targeted for deblurring may not meet these stringent requirements, and arranging this setup is not feasible for many. Therefore, we are left with the option of utilizing unpaired data. Yet, training on unpaired data presents its own set of challenges due to the lack of supervision for restoring fine details that are missing or distorted in the blurry input images. Existing methods [20, 40, 44, 46], which attempt to recreate these absent details, frequently fall short, particularly when dealing with blur typical of real-world images.

In this paper, we introduce Blur2Blur, a novel plug-and-play framework that leverages pretrained deblurring models to train an image deblurring algorithm specifically for a chosen camera device. Similar to other unsupervised deblurring methods, we utilize unpaired data. More precisely, we use the target camera to capture a set of blurry images and sharp images, without requiring a one-to-one correspondence between the images in these two sets. This approach makes data collection relatively simple and straightforward. Our method diverges from existing unsupervised methods by not attempting to directly learn a function from the domain of blurry images captured by our camera (the unknown blur domain, denoted as C) to the domain of sharp images. Instead, our strategy involves first learning a mapping G from the domain C to another domain C′ of blurry images, where deblurring techniques are already well established. To deblur an image taken by our camera, we first convert it into an image in C′ using the learned mapping G, then apply a pre-trained network to deblur this transformed image. Consequently, our primary goal is to learn the blur-to-blur mapping from C to C′, which is inherently less challenging than the direct blur-to-sharp mapping, because the former primarily involves altering blur patterns rather than the more complex task of reconstructing detailed image features.

To learn the blur-to-blur mapping, we propose a novel learning framework that leverages the collected set of blurry and sharp images as well as blurry images from the known blur domain C′. To train the blur-to-blur mapping network, we carefully define various loss terms, including perceptual, adversarial, and gradient penalty terms. The details of our approach are illustrated in Fig. 1.

We conducted extensive experiments to compare the effectiveness of our model with other state-of-the-art image deblurring approaches on both real-world and synthetic blur datasets. The results demonstrate that Blur2Blur outperforms other methods by a significant margin, highlighting its superior performance in addressing the challenges of image deblurring in real-world settings. Notably, when combined with our blur translation method, supervised methods achieve an impressive boost of up to 2.91 dB in PSNR.

2. Related Work

Many methods have been proposed for image deblurring. Beyond classical methods that do not necessitate training data, many contemporary approaches are grounded in machine learning. Learning-based methods can be broadly categorized based on their data requirements, whether it be paired, synthetic, or unpaired data. This section reviews representative works from these categories.

Classical Image Deblurring. Early deblurring methods assume that the blur operator is linear and uniform. In other words, the blur can be approximated by a single convolution operator: y = x ∗ k + η, where y, x, k, and η represent the blurry image, sharp image, blur kernel, and noise, respectively. Based on this assumption, given a blurry image y, the sharp image x and the blur kernel k can be obtained by maximizing the posterior distribution: x*, k* = argmax_{x,k} P(x, k | y) ∝ argmax_{x,k} P(y | x, k) P(x) P(k). Traditional methods primarily focus on finding prior distributions for either x [1, 8, 13, 14] or k [17, 19, 27]. However, these methods generalize poorly to real-world blurry images because blur kernels are often non-uniform and non-linear.
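To make the uniform-blur assumption above concrete, the short sketch below synthesizes a blurry observation with a single linear motion kernel. The kernel length, noise level, and random test image are illustrative choices of ours, not settings from the paper.

```python
import numpy as np
from scipy.ndimage import convolve

def linear_motion_kernel(length=9):
    """A horizontal linear motion-blur kernel, the simplest uniform-blur case."""
    k = np.zeros((length, length))
    k[length // 2, :] = 1.0
    return k / k.sum()

def blur_uniform(x, k, noise_std=0.01, seed=0):
    """Classical model y = x * k + eta: one convolution applied to the whole image."""
    rng = np.random.default_rng(seed)
    y = convolve(x, k, mode="reflect")
    return y + rng.normal(0.0, noise_std, size=x.shape)

if __name__ == "__main__":
    sharp = np.random.default_rng(1).random((128, 128))   # stand-in for a sharp image x
    blurry = blur_uniform(sharp, linear_motion_kernel())
    print(blurry.shape)  # (128, 128)
```

Real camera blur breaks this model: the effective kernel varies across the image and over time, which is exactly why the prior-based methods cited above transfer poorly to real-world blur.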
Supervised learning with paired data. Going beyond the assumptions of uniformity and linearity, several deep deblurring neural networks have been proposed [3, 15, 16, 21, 35, 41], demonstrating promising results. These networks are typically trained on large-scale datasets containing pairs of blurry and sharp images. The distinguishing factors among these works primarily lie in their architectural designs. For instance, Tao et al. [35] introduced a multi-scale recurrent network architecture specifically tailored for image deblurring. Other methods [4, 24] leveraged a coarse-to-fine strategy, utilizing multi-scale inputs to incrementally refine the deblurring process. Kupyn et al. [15] were the first to incorporate a GAN-based loss into the image deblurring framework, aiming to enhance the realism of deblurred images. Meanwhile, Zamir et al. [41] proposed a multi-stage framework that breaks down the image restoration task into smaller, more manageable stages. Lastly, Chen et al. [3] reduced the complexity both between and within network blocks based on the UNet [31] architecture.

In supervised learning, training convolutional networks effectively requires extensive datasets comprising both sharp and blurry image pairs. Acquiring these datasets can be a complex and lengthy process, often necessitating advanced hardware and careful setup. Recent studies [26, 28, 45] have introduced real-world deblurring datasets created using a dual-camera system, consisting of a high-speed and a low-speed camera, synchronized and aligned precisely with a time trigger and a beam splitter. This method ensures the collection of perfectly matched pairs of blurry and sharp images. Nonetheless, a limitation arises as deblurring networks trained on these specific datasets may become too tailored to the characteristics of the cameras used, resulting in reduced performance when applied to images from different cameras. Moreover, the dual-camera system is an advanced setup requiring specific camera types that meet certain criteria, which means not all cameras are suitable for this purpose.

Supervised learning with synthesized data. One common approach for synthesizing blurry images is to average multiple consecutive sharp frames from a video sequence [24, 25]. Although this method mimics the way blurry images are captured, it has been demonstrated that models trained on these datasets often underperform when tested on real-world blurry images [29, 38]. Recent studies have proposed more advanced techniques to synthesize deblurring datasets, aiming to improve the generalization of models trained on these datasets to unseen blur. For instance, Zhang et al. [43] created a synthesized dataset by combining multiple types of degradation operators initially developed for the super-resolution task. Rim et al. [29] compared real and synthetic blurry images to design a more realistic blur synthesis pipeline. However, as demonstrated in Sec. 4.2, the degradation augmentation in [43] significantly impairs the quality of input images, leading to distorted outputs. On the other hand, models trained on the synthesized deblurring dataset in [29] exhibit signs of overfitting to the training data.

One promising direction is to leverage the known relationship between blurry and sharp image pairs from existing datasets [38]. This method involves capturing the blur distribution characteristic of each pair, which can then be applied to construct a synthesized blurred dataset. Inspired by the effectiveness of this strategy in capturing blur attributes from the known dataset, Blur2Blur adopts this approach. It is designed to discern and retain the blur kernel while selectively ignoring the camera-specific attributes of the target dataset.

Unsupervised learning with unpaired data. Another approach to address the overfitting problem is through unpaired deblurring [20, 40, 44, 46]. Unlike supervised methods, these techniques do not require paired sharp and blurry images for training. However, they often face limitations, such as being domain-specific [20] or making low-level statistical assumptions about blur operators [44], which may not be valid for real-world blurry images. To facilitate domain adaptation between blurred and sharp images, other methods [40, 46] have been explored. However, these approaches struggle to bridge the gap between these domains effectively due to (1) the significant variation in the degree of blur across different images, which affects the perceived semantics of the objects within, and (2) the complex and unpredictable nature of real-world blur patterns, which often contradicts the simplistic assumptions used in these models. Consequently, the challenge of achieving truly blind image deblurring remains unsolved.

Considering these limitations, our Blur2Blur approach is centered around the idea of blur kernel transfer. This involves transforming the blur kernel from any particular camera into a familiar blur kernel from a dataset or camera that has a strong, pre-trained deblurring model. This method enables us to utilize the benefits of supervised techniques within an unsupervised framework, effectively tackling the challenge of deblurring images with a wide range of unknown blur distributions.

3. Methodology

3.1. Approach Overview

We formulate a blurry image y as a function of the corresponding sharp image x through a blur operator F_C(·, k), which is associated with a device-dependent blur domain C and a blur kernel k:

y = \mathcal{F}_C(x, k) + \eta,    (1)

where η is a noise term. Our task is to find a deblurring function G*_C that can recover the sharp image from the blurry input, i.e., G*_C(y) = x.

One strategy is to utilize an existing, pre-trained deblurring network to approximate the desired function G*_C and then use it for deblurring. However, this approach often leads to unsatisfactory results. The pre-trained network is generally trained on a dataset from a camera with a unique blur space C′, which is likely to be different from the blur space C of our camera. In essence, this would mean approximating G*_C with G*_{C′}, an approach that is not ideal due to the differences between C and C′, resulting in suboptimal deblurring performance.
When a pre-trained network is not a good choice, our remaining option is to train a new deblurring network tailored to our camera. The obstacle here is that the specific blur space C of our device is unknown, and we cannot rely on having paired training data of corresponding blurry and sharp images. Paired training data requires a complex hardware setup, involving a beam splitter along with identical devices capturing at different speeds, and the capability for time synchronization, geometrical alignment, and color calibration. Not all camera devices meet these requirements, and setting up such a system is beyond the expertise of many. Consequently, our only feasible option is to use unpaired data. Fortunately, we can access the camera device to capture sets of blurry images B and sharp images S, which are unpaired and do not necessitate correspondence between images in B and images in S. Thus, gathering these datasets is relatively easy and straightforward. The downside, however, is that learning from unpaired data is challenging. The deblurring process, which transforms a blurred image y into a sharp image x, typically requires an understanding of the blurring domain C. For unpaired data, this necessity poses a significant hurdle, especially in reconstructing fine details absent or distorted in the blurred input. Traditional deblurring networks [20, 40, 44, 46], attempting to 'hallucinate' these missing details, often produce unsatisfactory results, particularly with images affected by real-world blurring.

In this section, we introduce a method to learn G*_C. Rather than directly learning this function, which is extremely challenging, or roughly approximating it using a function learned for another blur domain, G*_{C′}, we treat G*_C as a composition of G*_{C′} and a translation function G, i.e., G*_C = G*_{C′} ∘ G. Our goal then shifts to learning G to bridge the gap between domains C and C′.

More specifically, our task is to learn a mapping function G that maps each blurry input image y defined in Eq. (1) to an image y′ with the same sharp visual representation x but belonging to a known blur distribution C′:

G: y \rightarrow y', \quad \text{where } y' = \mathcal{F}_{C'}(x, k') + \eta'.    (2)

Our approach breaks a complex task into two manageable ones. One task requires deblurring from C′, which, while challenging, benefits from existing research: we can select a well-performing pre-trained network G*_{C′}, which has been trained with supervised learning using paired data in its domain. The other task is to learn a translation from an unknown blur domain C to a known domain C′. The difficulty of this task depends on the differences between C and C′, yet it is surely easier than directly learning a mapping from C to a sharp domain. This is because a blur-to-blur transformation primarily modifies the blur patterns, avoiding the need to reconstruct intricate image details. Moreover, we have the flexibility to choose the most appropriate C′ and G*_{C′} for our specific blur domain. This flexibility extends to the possibility of utilizing synthetic data, which allows for the generation of extensive datasets, ensuring that the deblurring network is thoroughly trained.

In the remainder of this section, we discuss the two main components of our method: the blur-to-blur translation network G and the target blur space C′.
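At inference time, the composition G*_C = G*_{C′} ∘ G is simply two forward passes chained together. The minimal sketch below assumes two already-trained PyTorch modules, here called blur_translator (standing in for G) and pretrained_deblurrer (standing in for G*_{C′}); both names are ours, not identifiers from the released code.

```python
import torch

@torch.no_grad()
def deblur_unknown(blurry, blur_translator, pretrained_deblurrer):
    """Approximate G*_C as G*_{C'} o G: translate the blur first, then deblur.

    blurry: (N, 3, H, W) tensor of images from the target camera (unknown blur domain C).
    """
    known_blur = blur_translator(blurry)        # G: C -> C', content preserved
    sharp = pretrained_deblurrer(known_blur)    # G*_{C'}: known blur -> sharp
    return sharp
```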
3.2. Blur-to-blur translation

Our objective here is to train a blur-to-blur translation network G capable of converting any blurry image from the unknown blur domain C to a known blur domain C′ while preserving the image content. To train G, we require two datasets: B, which consists of blurry images from the unknown blur domain, and K, which contains images with known blur for which a deblurring model has already been trained. We design G to work at multiple scales and carefully design the training losses to achieve the desired outcome.

Adversarial Loss. We employ an adversarial loss [5] to enforce the translation network G to produce images with the desired target blur. To achieve this, we introduce a discriminator network D, which is responsible for distinguishing between real images from the known blur domain and generated images. The two networks G and D are trained alternately in a minimax game. The adversarial loss is defined as:

\mathcal{L}_{adv}(G, D) = \mathbb{E}_{y \sim \mathcal{K}}[\log D(y)] + \mathbb{E}_{y \sim \mathcal{B}}[\log(1 - D(G(y)))].    (3)

The blur translation network is trained to minimize the above loss term, while the discriminator D is trained to maximize it. We also enforce the Lipschitz continuity constraint on the discriminator using the gradient penalty regularization [6]:

\mathcal{L}^D_{grad}(D) = \mathbb{E}_{\hat{y} \sim \hat{\mathcal{B}}}[(\|\nabla_{\hat{y}} D(\hat{y})\|_2 - 1)^2],    (4)

where B̂ is the set of samples ŷ randomly interpolated between a real image y ∈ B and the generated image G(y) using a random mixing ratio ϵ ∈ [0, 1], i.e., ŷ = ϵy + (1 − ϵ)G(y).
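A minimal PyTorch sketch of the interpolation and penalty in Eq. (4), written exactly as the text above describes B̂ (mixing the input y with G(y)); the function and variable names are ours.

```python
import torch

def gradient_penalty(D, y, g_y):
    """Eq. (4): push the gradient norm of D towards 1 on interpolated samples y_hat."""
    eps = torch.rand(y.size(0), 1, 1, 1, device=y.device)       # random mixing ratio per sample
    y_hat = (eps * y + (1.0 - eps) * g_y).requires_grad_(True)  # y_hat = eps*y + (1-eps)*G(y)
    d_hat = D(y_hat)
    grads = torch.autograd.grad(outputs=d_hat.sum(), inputs=y_hat, create_graph=True)[0]
    grad_norm = grads.flatten(1).norm(2, dim=1)
    return ((grad_norm - 1.0) ** 2).mean()
```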
Reconstruction Loss. Given a blurry image y, the desired function G should translate the blur characteristics from C to C′ while maintaining the elements belonging to the sharp image x. Using the adversarial loss helps translate the image to the target blur domain but does not guarantee sharp content preservation. Hence, we integrate a reconstruction loss to enforce visual consistency between the generated blurry image G(y) and the original image y. This loss term has two benefits: (1) it prevents G from modifying the image content and makes it focus only on the blur kernel translation, and (2) it provides additional supervision to our network, enhancing training stability. Moreover, to make G focus on preserving the input semantic content rather than being overly constrained by pixel-wise accuracy, we (1) employ a perceptual loss [10] instead of the common L1 or L2 loss function and (2) adopt a multi-scale deblurring architecture [4] to reconstruct the image content from coarse to fine:

\mathcal{L}^G_{rec}(G) = \frac{1}{M} \sum_{i=1}^{M} \frac{1}{t_i} \mathbb{E}_{y_i \sim \mathcal{B}}\big[\|\phi(y_i) - \phi(G(y_i))\|_1\big],    (5)

where M is the number of levels, y_i is the input image at scale level i, and ϕ(·) is a pre-trained feature extractor with the VGG19 backbone [33]. We divide the loss by the total number of elements t_i for normalization.
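A sketch of the multi-scale perceptual term in Eq. (5) using torchvision's VGG19 features. The feature cut-off layer, the bilinear pyramid built by downsampling a single output, and the averaging over scales are our simplifications; the paper's MIMO-UNet translator produces its multi-scale outputs directly.

```python
import torch
import torch.nn.functional as F
from torchvision.models import vgg19

class PerceptualRecLoss(torch.nn.Module):
    """L_rec of Eq. (5): L1 distance between VGG19 features of y and G(y) at M scales."""

    def __init__(self, num_scales=3, cutoff=16):
        super().__init__()
        phi = vgg19(weights="IMAGENET1K_V1").features[:cutoff].eval()
        for p in phi.parameters():
            p.requires_grad_(False)
        self.phi, self.num_scales = phi, num_scales

    def forward(self, y, g_y):
        loss = 0.0
        for i in range(self.num_scales):
            if i > 0:  # build a coarse-to-fine pyramid by simple downsampling
                y = F.interpolate(y, scale_factor=0.5, mode="bilinear", align_corners=False)
                g_y = F.interpolate(g_y, scale_factor=0.5, mode="bilinear", align_corners=False)
            loss = loss + F.l1_loss(self.phi(g_y), self.phi(y))  # l1_loss averages over t_i elements
        return loss / self.num_scales
```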


Figure 2. Overview of our problem and proposed method. a) Given a camera, we aim to develop an algorithm to deblur its captured
blurry images. We assume access to the camera to collect unpaired sets of blurry images (B) and sharp image sequences (S). b) The
key component in our proposed system is a blur translator that converts unknown-blur images captured by the camera to have the target
known-blur presented in K. This translator is trained using reconstruction and adversarial losses. The converted images have known blur and
can be successfully deblurred using the previously trained deblurring model (Zoom for best view).

Total Loss. Our final objective function for G combines the adversarial and reconstruction loss terms:

\mathcal{L}^G_{total}(G, D) = \mathcal{L}_{adv}(G, D) + \lambda_{rec}\,\mathcal{L}_{rec}(G),    (6)

where λ_rec is the weight factor for the reconstruction loss, ensuring the input content is maintained. Concurrently, the objective function for D is established as follows:

\mathcal{L}^D_{total}(G, D) = -\mathcal{L}_{adv}(G, D) + \lambda_{grad}\,\mathcal{L}_{grad}(D).    (7)

Here λ_grad is a hyperparameter that controls the importance of the gradient penalty loss component.
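Eqs. (3)-(7) translate into the usual alternating GAN update. The sketch below is our interpretation under a few assumptions: the discriminator outputs logits (so the log terms of Eq. (3) become binary cross-entropy with logits), the generator uses the common non-saturating variant, and the gradient_penalty and PerceptualRecLoss helpers sketched earlier are reused. The default λ values follow the hyperparameters listed in Sec. 4.1.1.

```python
import torch
import torch.nn.functional as F

def train_step(G, D, opt_G, opt_D, y_unknown, y_known, rec_loss,
               lambda_rec=0.8, lambda_grad=0.005):
    """One alternating update of D (Eq. 7) and G (Eq. 6).

    rec_loss: e.g. an instance of PerceptualRecLoss; gradient_penalty as sketched above.
    """
    # Discriminator step: real = known-blur images from K, fake = translated unknown-blur images.
    fake = G(y_unknown).detach()
    d_real, d_fake = D(y_known), D(fake)
    loss_adv_d = F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real)) \
               + F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake))
    loss_D = loss_adv_d + lambda_grad * gradient_penalty(D, y_unknown, fake)
    opt_D.zero_grad(); loss_D.backward(); opt_D.step()

    # Generator step: fool D while reconstructing the input content.
    fake = G(y_unknown)
    d_fake = D(fake)
    loss_adv_g = F.binary_cross_entropy_with_logits(d_fake, torch.ones_like(d_fake))
    loss_G = loss_adv_g + lambda_rec * rec_loss(y_unknown, fake)
    opt_G.zero_grad(); loss_G.backward(); opt_G.step()
    return loss_D.item(), loss_G.item()
```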
3.3. Known Blur Selection

The choice of C′ and its representative dataset K is important because the difficulty of learning the blur translation network depends on the discrepancy between the two blur domains. As described in Sec. 3.2, the representative dataset K only affects the adversarial training losses. The translation network G aims to convert images in B to have similar blur characteristics as images in K so that the discriminator D cannot differentiate between the generated images and the real images in K. However, if K and B have different characteristics besides the blur kernel distribution, such as color tone, image resolution, or device-dependent noise pattern, D may rely on them to differentiate real and generated images. This can cause G to either fail to converge or introduce undesired characteristics from the representative dataset K into the transferred outcomes.

To avoid this issue, we propose generating the images in K from a set of sharp images S captured with the same camera as B, thus sharing identical characteristics. These images are then augmented with blur kernels from a known domain, characterized by a dataset of blurry-sharp image pairs, using the blur transfer technique [38]. The blurry-sharp image pair dataset can be selected from commonly used image deblurring datasets such as REDS [25], GoPro [24], RSBlur [30], and RB2V [26], and we can utilize any deblurring network pre-trained on that dataset. A key component in [38] is a Blur Kernel Extractor F that can isolate and transfer blur kernels from random blurry-sharp image pairs to the target sharp inputs. After applying this blur synthesis procedure, we obtain a known-blur image set K that carries blur kernels from the known-blur domain while maintaining other camera-based characteristics similar to the unknown-blur images in B. Consequently, the discriminator can focus on distinguishing based on blur kernels, facilitating effective blur-to-blur translation training. The overall problem and pipeline of our method are illustrated in Fig. 2.
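In pseudocode, the construction of K reduces to re-blurring our own sharp frames with kernels borrowed from the known paired dataset. The BlurKernelExtractor-style wrapper and its transfer method below are hypothetical stand-ins for the encoded-blur-kernel-space model of [38]; the real interface of that code is not reproduced here.

```python
import random

def build_known_blur_set(sharp_images, known_pairs, extractor, seed=0):
    """Build K (Sec. 3.3): sharp frames from the target camera, re-blurred with known kernels.

    sharp_images: list of sharp frames S captured by the target camera.
    known_pairs:  list of (blurry, sharp) pairs from a known dataset such as GoPro.
    extractor:    hypothetical wrapper around the blur-kernel extractor F of [38];
                  extractor.transfer(src_blurry, src_sharp, dst_sharp) is assumed to return
                  dst_sharp re-rendered with the blur of the (src_blurry, src_sharp) pair.
    """
    rng = random.Random(seed)
    known_blur_set = []
    for x_sharp in sharp_images:
        b_src, s_src = rng.choice(known_pairs)     # sample one known-blur example
        known_blur_set.append(extractor.transfer(b_src, s_src, x_sharp))
    return known_blur_set
```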
4. Experiments

4.1. Experimental Setups

4.1.1 Datasets and implementation details

We evaluate our proposed method on four datasets. The REDS dataset [25] consists of 300 high-speed videos used to create synthetic blur. By ramping up the frame rate from 120 to 1920 fps and averaging frames with an inverse Camera Response Function (CRF), it simulates more realistic motion blur, differentiating it from other synthetic datasets [23, 32]. The GoPro dataset [24] comprises 3,142 paired frames of sharp and blurred images, recorded at 240 frames per second. It employed a synthesis method akin to that of the REDS dataset but with a different camera response function. We utilize this dataset as the main target data for evaluating deblurring methods in combination with Blur2Blur. The RSBlur dataset [30] contains 13,358 real blurred images. It provides sequences of sharp images alongside blurred ones for in-depth blur analysis and offers higher resolution than similar datasets. Noise levels are also estimated to assess and compare with the noise present in real-world blur scenarios. In this paper, for the evaluation on the publicly available RSBlur dataset, we utilized the official dataset alongside its Additional set at a lower sampling rate, rather than generating blurry images from the RSBlur sharp set using dense sampling as described in the original paper. The RB2V dataset [26] comprises about 11,000 real-world pairs of a blurry image and a sharp image sequence for street categories, denoted as RB2V Street. Experiments on this dataset are crucial for confirming the effectiveness of our algorithm in handling real-world, camera-specific data.

Dataset      | U. blur (B) | U. sharp (S) | Test
RB2V Street  | 5400        | 3600         | 2053
REDS         | 14400       | 9600         | 3000
RSBlur       | 8115        | 5410         | 8301
GoPro        | 1261        | 842          | 1111

Table 1. Statistics (number of data samples) of the datasets used as unknown domains.

Train and test data. To address practical deblurring problems, our method assumes access to unpaired sets containing blurry images B and sharp images S. When selecting a dataset as the source for our deblurring evaluation, we divide its training data into two disjoint subsets that capture different scenes with a specific ratio of 0.6:0.4. In the first subset, we select blurry images to form the unknown-blur image set B, while in the second subset, we choose sharp images to construct the sharp set S. For the chosen target dataset, representing the domain for blur kernel translation via the Blur2Blur mechanism, we employ the entire training dataset to train our Blur Kernel Extractor [38] and subsequently apply this extractor to map captured blur embeddings onto the sharp image set S, creating the known-blur image set K. The blurry images in the test data of the source dataset are used to evaluate the image deblurring algorithms. The statistics of the source image sets are reported in Tab. 1.

Implementation Details. We implemented the blur-to-blur translation network G using MIMO-UNet [4] with the default configuration of the Pix2Pix [9] implementation. For all experiments, we set the hyper-parameters λ_rec = 0.8, λ_grad = 0.005 and a batch size of 16. To ease the training of the network G during its initial iterations, we sorted images based on their blur degree, determined by the variance of the responses obtained from applying the Laplacian operator. A lower variance corresponds to a reduced range of intensity changes, signaling a blurrier image with fewer edges. Initially, we optimized on approximately 50% of the data within a single batch. Subsequently, after 200K iterations, we incrementally scaled this proportion to encompass the full batch. We evaluated Blur2Blur in combination with different state-of-the-art deblurring network backbones, including NAFNet [3] and Restormer [42]. During training, we randomly cropped the images to a square shape of 256×256 and augmented them with rotation, flipping, and color jitter. All experiments were performed using the Adam optimizer [12]. Training our model required roughly 3 days for 1M iterations on 2 Nvidia A100 GPUs. The learning rate is kept constant for the first 500K iterations and then linearly reduced during the remaining iterations, as in [37].

4.1.2 Baselines

We compared Blur2Blur with a comprehensive list of baseline methods from three categories: supervised methods (NAFNet [3], Restormer [42]), unpaired training (CycleGAN [46], DualGAN [40]), and generalized image deblurring (BSRGAN+NAFNet [43], RSBlur+NAFNet [29]). For fair comparisons, we retrained the supervised models using the blur-sharp pairs from the source dataset. Furthermore, to replicate real-world scenarios with the absence of paired data for deblurring network training, we generated synthetic motion-blur data derived from the unknown-sharp image set S by applying motion blur synthesis techniques, such as the one provided by the imgaug library [11]. This approach synthesized motion blur independently on each image. For the unpaired training and generalized image deblurring approaches, we used the blurry images in B and the sharp images from S for training the deblurring network. BSRGAN was originally designed for blind image super-resolution, and we adapted it to blind image deblurring by adding motion blur augmentation (via averaging with neighboring frames) into its augmentation pipeline.
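The blur-degree ranking used in the curriculum described in the implementation details above is typically computed as the variance of a Laplacian-filtered image. A common OpenCV-based version is sketched below; the exact operator settings used by the authors are not stated, so these are our assumptions.

```python
import cv2

def blur_score(image_bgr):
    """Variance of the Laplacian response: lower variance -> fewer edges -> blurrier image."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    return cv2.Laplacian(gray, cv2.CV_64F).var()

def sort_by_blurriness(images):
    """Order images from blurriest to sharpest, e.g. to schedule the easier half first."""
    return sorted(images, key=blur_score)
```

For the "Synthetic Data" baseline, per-image motion blur can be synthesized with the imgaug augmenter cited above; the kernel-size and angle ranges here are illustrative choices of ours.

```python
import numpy as np
import imgaug.augmenters as iaa

motion_blur = iaa.MotionBlur(k=(7, 21), angle=(0, 360))   # random linear motion kernels

def make_synthetic_pairs(sharp_images):
    """Build (blurry, sharp) pairs from the unpaired sharp set S, one random blur per image."""
    blurry = motion_blur(images=np.stack(sharp_images).astype(np.uint8))
    return list(zip(blurry, sharp_images))
```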

Figure 3. Comparison of image deblurring results on three benchmark datasets with NAFNet. Due to the space limit, we skip the results with the Restormer backbone, which are similar to but slightly worse than those with NAFNet. Best viewed when magnified on a digital display.

4.2. Image Deblurring Results

To evaluate the performance of the Blur2Blur mechanism, we defined three data configurations. Each configuration consists of the Known-Blur dataset K, sourced from the GoPro dataset, and two unpaired datasets, B and S, derived from the training partitions of the deblurring dataset REDS, RSBlur, or RB2V Street. For a comprehensive evaluation of the Blur2Blur model, we integrated it with two supervised image deblurring backbones, Restormer and NAFNet. Additionally, we compared the results with state-of-the-art baselines. The quantitative results are summarized in Tab. 2.

As observed, both unsupervised image deblurring and generalized deblurring approaches, despite their expected generalization power, exhibit poor performance on these challenging real-world datasets. Their scores are similar to, and sometimes significantly lower than, those of state-of-the-art supervised methods such as Restormer and NAFNet. In contrast, Blur2Blur demonstrates remarkable deblurring results. When combined with Restormer, Blur2Blur increases the PSNR score by 2.63 dB on RB2V Street, 2.12 dB on REDS, and 2.91 dB on RSBlur. When combined with NAFNet, it provides consistent score increases, with 2.20 dB on RB2V Street, 2.31 dB on REDS, and 2.67 dB on RSBlur. NAFNet outperforms Restormer overall, making the combination of NAFNet and Blur2Blur the most effective approach. Moreover, our method comes close to matching the best results of supervised models trained on the source datasets.

Method                               | RB2V Street   | REDS          | RSBlur
NAFNet [3] w/ GoPro                  | 24.78 / 0.714 | 25.80 / 0.880 | 26.33 / 0.790
NAFNet [3] w/ Synthetic Data         | 22.10 / 0.644 | 25.07 / 0.853 | 23.53 / 0.659
NAFNet [3] w/ Blur2Blur (GoPro)      | 26.98 / 0.812 | 28.11 / 0.893 | 29.00 / 0.857
NAFNet [3] w/ the source domain*     | 28.72 / 0.883 | 29.09 / 0.927 | 33.06 / 0.888
Restormer [42] w/ GoPro              | 23.34 / 0.698 | 25.43 / 0.775 | 25.98 / 0.788
Restormer [42] w/ Synthetic Data     | 23.78 / 0.655 | 24.76 / 0.753 | 23.34 / 0.651
Restormer [42] w/ Blur2Blur (GoPro)  | 25.97 / 0.750 | 27.55 / 0.885 | 28.89 / 0.850
Restormer [42] w/ the source domain* | 27.43 / 0.849 | 28.23 / 0.916 | 32.87 / 0.874
BSRGAN [43] (generalized)            | 23.31 / 0.645 | 26.39 / 0.803 | 27.11 / 0.810
RSBlur [29] (generalized)            | 23.42 / 0.603 | 26.32 / 0.812 | 26.98 / 0.798
CycleGAN [46] (unpaired)             | 21.21 / 0.582 | 23.92 / 0.775 | 23.34 / 0.782
DualGAN [40] (unpaired)              | 21.02 / 0.556 | 23.50 / 0.700 | 22.78 / 0.704

Table 2. Comparison of different deblurring methods on various datasets. For each test, we report PSNR↑ / SSIM↑ scores as evaluation metrics. For a supervised method, NAFNet or Restormer, we assess its upper bound of deblurring performance by training it on the training set of the source dataset (*).

Ratio B : S        | 5:5   | 6:4   | 7:3   | 8:2   | 9:1
GoPro–RB2V Street  | 26.02 | 26.98 | 26.92 | 25.98 | 24.32
GoPro–REDS         | 27.53 | 28.11 | 28.10 | 27.00 | 26.43

Table 3. PSNR deblurring results with different blur-to-sharp ratios.

We provide a qualitative comparison of image deblurring results in Fig. 3. The comparison highlights a significant performance disparity between supervised methods and their counterparts. Unsupervised methods like DualGAN and CycleGAN struggle notably in deblurring, with DualGAN particularly unable to navigate the blur-to-sharp domain, tending instead to bridge the content and color distribution gap between the blurry (B) and sharp (S) datasets. Synthesis-based methods such as BSRGAN and RSBlur also fall short, failing to address unseen blurs, indicating the limitations of augmentation strategies, including those using the imgaug library. The supervised method NAFNet fails to handle unseen blurs, often yielding output mostly identical to the blurred inputs. In contrast, our method effectively transforms unknown blurs into known ones. Our translation process successfully focuses on the blur kernel, minimizing bias from other image characteristics. By integrating Blur2Blur with NAFNet, we achieve a substantial recovery of high-quality sharp images, demonstrating the practical strength of our approach. Additional qualitative results for Restormer are provided in the supplementary material.

4.3. Blur2Blur Visualization

Fig. 4a provides a comparative visualization between the original blurry images and their Blur2Blur converted versions, using the same source and target datasets as detailed in Sec. 4.2. As can be seen, our transformed images effectively adopt the blur pattern of the GoPro dataset, noted for its low sampling rate blur (as further shown in Fig. 4b), while preserving other content elements identical to the input. This demonstrates the Blur2Blur conversion's capability to produce transformed images that faithfully reflect the specific blur pattern while preserving the original content details.
Figure 4. (a) A comparison of original images and their corresponding Blur2Blur converted versions; (b) selected examples demonstrating the GoPro dataset's blur pattern (Zoom for best view).

4.4. Ablation Study for the Blur-to-Sharp Ratio

We evaluate the significance of the blur-to-sharp ratio, represented as the ratio between the Unknown-Blur (B) and Unknown-Sharp (S) datasets. Specifically, we consider NAFNet as the image deblurring backbone and consider the GoPro–RB2V Street and GoPro–REDS dataset settings, where GoPro represents our target camera device for which we have tailored a deblurring model. We conducted B2B experiments across a range of ratios from 5:5 to 9:1. The deblurring results in Tab. 3 demonstrate that a greater proportion of blurry images in the dataset, as seen in the 6:4 and 7:3 ratios, allows for a deeper understanding of the blur patterns characteristic of the target device, leading to improved deblurring performance. However, excessively few sharp images, as in the 9:1 ratio, may cause the Blur2Blur method to overfit to the limited sharp content. To balance learning and prevent overfitting, a 6:4 ratio has been selected for all experiments in this study.

4.5. Practicality Evaluation

We evaluate the practicality of Blur2Blur in two imagined yet realistic scenarios. The first scenario involved a user desiring a deblurring algorithm for images taken with their smartphone camera. To facilitate this, we compiled a dataset named PhoneCraft, featuring images captured using a Samsung Galaxy Note 10 Plus. This dataset includes videos with motion-induced blur, refined through post-processing to remove other blur types, and clear, sharp videos recorded at 60 fps. Over two hours, a variety of scenes and motions were captured, producing 12 blurry and 11 sharp video clips, each between 30 and 40 seconds long.

In deblurring PhoneCraft images, we used the well-known blur datasets GoPro, REDS, and RSBlur. Results in Fig. 5 show that Blur2Blur significantly improved image clarity over pre-trained models, especially with RSBlur's complex blur patterns. The NIQE [22] scores of the deblurred images transformed by Blur2Blur, using GoPro, REDS, and RSBlur as source datasets, are 9.8, 9.2, and 8.8, respectively. For the NIQE score, lower is better, and this demonstrates Blur2Blur's ability to handle real-world blurs effectively.

Figure 5. Qualitative comparison of deblurring models on the PhoneCraft dataset with multiple target datasets.

In our second scenario, we explored a webcam-based application for monitoring hand movements during writing exercises, aimed at assisting in rehabilitation therapy. The challenge here is motion blur, which complicates hand and object tracking. To test our approach, we created a dataset named WritingHands with four 30 fps webcam-recorded videos, each about 40 s long. From these, two videos provided over 1100 frames with motion blur for training, and one video offered sharp reference images. Leveraging insights from the PhoneCraft experiments, we used the RSBlur dataset and its pre-trained NAFNet model for a two-day training session. Results, shown in Fig. 6, indicate that while RSBlur's model alone leaves some blur, integrating it with Blur2Blur significantly restores the image's sharpness.

Figure 6. Results of using Blur2Blur on the WritingHands dataset.

5. Conclusions

We have proposed Blur2Blur, an effective approach to address the practical challenge of adapting image deblurring techniques to handle unseen blur. The key is to learn to convert an unknown blur to a known blur that can be effectively deblurred using a deblurring network specifically trained to handle the known blur. Throughout extensive experiments on synthetic and real-world benchmarks, Blur2Blur consistently exhibited superior performance, delivering impressive quantitative and qualitative outcomes.
References

[1] Tony F Chan and Chiu-Kwong Wong. Total variation blind deconvolution. IEEE Transactions on Image Processing, 7(3):370–375, 1998.
[2] Liangyu Chen, Xin Lu, Jie Zhang, Xiaojie Chu, and Chengpeng Chen. HINet: Half instance normalization network for image restoration. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021.
[3] Liangyu Chen, Xiaojie Chu, Xiangyu Zhang, and Jian Sun. Simple baselines for image restoration. In Proceedings of the European Conference on Computer Vision, 2022.
[4] Sung-Jin Cho, Seo-Won Ji, Jun-Pyo Hong, Seung-Won Jung, and Sung-Jea Ko. Rethinking coarse-to-fine approach in single image deblurring. In Proceedings of the International Conference on Computer Vision, 2021.
[5] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial networks. Communications of the ACM, 63(11):139–144, 2020.
[6] Ishaan Gulrajani, Faruk Ahmed, Martin Arjovsky, Vincent Dumoulin, and Aaron C Courville. Improved training of Wasserstein GANs. Advances in Neural Information Processing Systems, 30, 2017.
[7] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016.
[8] Michal Hradiš, Jan Kotera, Pavel Zemčík, and Filip Šroubek. Convolutional neural networks for direct text deblurring. In Proceedings of the British Machine Vision Conference, 2015.
[9] Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, and Alexei A Efros. Image-to-image translation with conditional adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017.
[10] Justin Johnson, Alexandre Alahi, and Li Fei-Fei. Perceptual losses for real-time style transfer and super-resolution. In Proceedings of the European Conference on Computer Vision, 2016.
[11] Alexander B. Jung, Kentaro Wada, Jon Crall, Satoshi Tanaka, Jake Graving, Christoph Reinders, Sarthak Yadav, Joy Banerjee, Gábor Vecsei, Adam Kraft, Zheng Rui, Jirka Borovec, Christian Vallentin, Semen Zhydenko, Kilian Pfeiffer, Ben Cook, Ismael Fernández, François-Michel De Rainville, Chi-Hung Weng, Abner Ayala-Acevedo, Raphael Meudec, Matias Laporte, et al. imgaug. https://github.com/aleju/imgaug, 2020. Online; accessed 01-Feb-2020.
[12] Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. CoRR, abs/1412.6980, 2014.
[13] Dilip Krishnan and Rob Fergus. Fast image deconvolution using hyper-Laplacian priors. Advances in Neural Information Processing Systems, 22:1033–1041, 2009.
[14] Dilip Krishnan, Terence Tay, and Rob Fergus. Blind deconvolution using a normalized sparsity measure. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2011.
[15] Orest Kupyn, Volodymyr Budzan, Mykola Mykhailych, Dmytro Mishkin, and Jiří Matas. DeblurGAN: Blind motion deblurring using conditional adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018.
[16] Orest Kupyn, Tetiana Martyniuk, Junru Wu, and Zhangyang Wang. DeblurGAN-v2: Deblurring (orders-of-magnitude) faster and better. In Proceedings of the International Conference on Computer Vision, 2019.
[17] Anat Levin, Yair Weiss, Fredo Durand, and William T Freeman. Understanding and evaluating blind deconvolution algorithms. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2009.
[18] Chih-Hung Liang, Yu-An Chen, Yueh-Cheng Liu, and Winston H Hsu. Raw image deblurring. IEEE Transactions on Multimedia, 24:61–72, 2020.
[19] Guangcan Liu, Shiyu Chang, and Yi Ma. Blind image deblurring using spectral properties of convolution operators. IEEE Transactions on Image Processing, 23(12):5047–5056, 2014.
[20] Boyu Lu, Jun-Cheng Chen, and Rama Chellappa. Unsupervised domain-specific deblurring via disentangled representations. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019.
[21] Armin Mehri, Parichehr B Ardakani, and Angel D Sappa. MPRNet: Multi-path residual network for lightweight image super resolution. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2021.
[22] Anish Mittal, Rajiv Soundararajan, and Alan C Bovik. Making a "completely blind" image quality analyzer. IEEE Signal Processing Letters, 20(3):209–212, 2012.
[23] Seungjun Nah, Tae Hyun Kim, and Kyoung Mu Lee. Deep multi-scale convolutional neural network for dynamic scene deblurring. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017.
[24] Seungjun Nah, Tae Hyun Kim, and Kyoung Mu Lee. Deep multi-scale convolutional neural network for dynamic scene deblurring. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017.
[25] Seungjun Nah, Radu Timofte, Sungyong Baik, Seokil Hong, Gyeongsik Moon, Sanghyun Son, and Kyoung Mu Lee. NTIRE 2019 challenge on video deblurring: Methods and results. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2019.
[26] Bang-Dang Pham, Phong Tran, Anh Tran, Cuong Pham, Rang Nguyen, and Minh Hoai. HyperCUT: Video sequence from a single blurry image using unsupervised ordering. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2023.
[27] Dongwei Ren, Kai Zhang, Qilong Wang, Qinghua Hu, and Wangmeng Zuo. Neural blind deconvolution using deep priors. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2020.
[28] Jaesung Rim, Haeyun Lee, Jucheol Won, and Sunghyun Cho. Real-world blur dataset for learning and benchmarking deblurring algorithms. In Proceedings of the European Conference on Computer Vision, 2020.
[29] Jaesung Rim, Geonung Kim, Jungeon Kim, Junyong Lee, Seungyong Lee, and Sunghyun Cho. Realistic blur synthesis for learning image deblurring. In Proceedings of the European Conference on Computer Vision, 2022.
[30] Jaesung Rim, Geonung Kim, Jungeon Kim, Junyong Lee, Seungyong Lee, and Sunghyun Cho. Realistic blur synthesis for learning image deblurring. In Proceedings of the European Conference on Computer Vision, 2022.
[31] Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-Net: Convolutional networks for biomedical image segmentation. In Proceedings of the International Conference on Medical Image Computing and Computer Assisted Intervention, 2015.
[32] Ziyi Shen, Wenguan Wang, Xiankai Lu, Jianbing Shen, Haibin Ling, Tingfa Xu, and Ling Shao. Human-aware motion deblurring. In Proceedings of the International Conference on Computer Vision, 2019.
[33] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. In Proceedings of the International Conference on Learning Representations, 2015.
[34] Maitreya Suin, Kuldeep Purohit, and A. N. Rajagopalan. Spatially-attentive patch-hierarchical network for adaptive motion deblurring. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2020.
[35] Xin Tao, Hongyun Gao, Xiaoyong Shen, Jue Wang, and Jiaya Jia. Scale-recurrent network for deep image deblurring. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018.
[36] Xin Tao, Hongyun Gao, Xiaoyong Shen, Jue Wang, and Jiaya Jia. Scale-recurrent network for deep image deblurring. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018.
[37] Dmitrii Torbunov, Yi Huang, Huan-Hsin Tseng, Haiwang Yu, Jin Huang, Shinjae Yoo, Meifeng Lin, Brett Viren, and Yihui Ren. Rethinking CycleGAN: Improving quality of GANs for unpaired image-to-image translation. arXiv preprint arXiv:2303.16280, 2023.
[38] Phong Tran, Anh Tuan Tran, Quynh Phung, and Minh Hoai. Explore image deblurring via encoded blur kernel space. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021.
[39] Andrey Vakunov, Chuo-Ling Chang, Fan Zhang, George Sung, Matthias Grundmann, and Valentin Bazarevsky. MediaPipe Hands: On-device real-time hand tracking. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2020.
[40] Zili Yi, Hao Zhang, Ping Tan, and Minglun Gong. DualGAN: Unsupervised dual learning for image-to-image translation. In Proceedings of the International Conference on Computer Vision, 2017.
[41] Syed Waqas Zamir, Aditya Arora, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Ming-Hsuan Yang, and Ling Shao. Multi-stage progressive image restoration. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021.
[42] Syed Waqas Zamir, Aditya Arora, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, and Ming-Hsuan Yang. Restormer: Efficient transformer for high-resolution image restoration. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2022.
[43] Kai Zhang, Jingyun Liang, Luc Van Gool, and Radu Timofte. Designing a practical degradation model for deep blind image super-resolution. In Proceedings of the International Conference on Computer Vision, 2021.
[44] Suiyi Zhao, Zhao Zhang, Richang Hong, Mingliang Xu, Yi Yang, and Meng Wang. FCL-GAN: A lightweight and real-time baseline for unsupervised blind image deblurring. In Proceedings of the 30th ACM International Conference on Multimedia, 2022.
[45] Zhihang Zhong, Ye Gao, Yinqiang Zheng, and Bo Zheng. Efficient spatio-temporal recurrent neural network for video deblurring. In Proceedings of the European Conference on Computer Vision, 2020.
[46] Jun-Yan Zhu, Taesung Park, Phillip Isola, and Alexei A Efros. Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the International Conference on Computer Vision, 2017.
Blur2Blur: Blur Conversion for Unsupervised Image Deblurring on Unknown Domains
Supplementary Material
Abstract

In this supplementary PDF, we first provide the qualitative results obtained by methods with the Restormer backbone [42] and some additional qualitative results on each dataset to show our effectiveness in deblurring unknown-blur images compared to other baselines. Next, we illustrate the performance of different backbones for the Blur2Blur translator model. Finally, we provide details of our collected PhoneCraft dataset and validate the video deblurring performance of Blur2Blur, demonstrating significant enhancements in hand movement visualization and thus leaving room for practical application. We also include our code and a video of sample deblurring results in the supplementary package.

6. Additional Qualitative Results

6.1. Restormer model

In Fig. 3 of the main paper, we omit the results with the Restormer backbone due to the space limit. We provide these results in this supplementary material in Fig. 7. As can be seen, Restormer shows behavior similar to NAFNet. The original network produces blurry images that are close to the input images. However, when combined with Blur2Blur, it can successfully deblur the images and produce sharper outputs. In quantitative terms, Restormer-based models perform slightly worse than their NAFNet-based counterparts.

6.2. Additional Deblurring Results

In this section, we provide additional qualitative figures comparing the image deblurring results of our Blur2Blur and other baselines. Figures 8, 9, and 10 show samples where K is built upon the GoPro dataset [24], with the Unknown set derived respectively from the REDS dataset [25], RB2V Street [26], and RSBlur [30].

7. Blur2Blur Analysis

7.1. Backbone Experiments

We explore the integration of multi-scale architectures into the Blur2Blur mechanism by experimenting with different backbones. The UNet architecture [31] has been adapted to handle inputs at various scales, allowing for a more nuanced understanding of blur at multiple scales. Concurrently, we employed the NAFNet backbone in its original form, taking advantage of its robust feature extraction capabilities without modifications. The results in Tab. 4 show that MIMO-UNet clearly surpasses the standard UNet, even in its modified form. Moreover, the results also reveal that NAFNet does not perform as well as the multi-scale variants, highlighting the importance of multi-scale optimization in the Blur2Blur framework for deblurring tasks.

Backbone      | PSNR↑ | SSIM↑
UNet [31]     | 22.54 | 0.732
MIMO-UNet [4] | 26.98 | 0.812
NAFNet [3]    | 20.54 | 0.686

Table 4. Ablation studies with the Blur2Blur backbone.

7.2. Impact of Dataset Size

In Tab. 5, we validate the deblurring performance using the GoPro–RB2V setting, maintaining a fixed B:S ratio of 6:4 while varying the dataset size across four different scales of the target dataset (α).

α    | 0.25  | 0.5   | 0.75  | 1.0
PSNR | 25.45 | 25.93 | 26.32 | 26.98

Table 5. Effect of dataset size.

7.3. Validation of the blur converter

To evaluate the effectiveness of our blur converter, we conduct two classification experiments to determine the alignment of converted images with the known-blur domain (K), using the GoPro–RB2V setting as detailed in the main paper. The first (Acc1) used the pretrained discriminator from our Blur2Blur framework, assessing whether converted images belong to K. For the second (Acc2), we synthesized a new dataset via the Blur Kernel Extractor F [38] using sharp GoPro images combined with blur kernels from the target datasets. We then trained a ResNet18 [7] to discern whether the blur in converted images corresponds to K or not. In both experiments, our method converts the input image to have the target known blur with high accuracy (Tab. 6).

Model           | Input        | Converted
Acc1 / Acc2 (%) | 11.24 / 6.85 | 86.67 / 96.53

Table 6. Blur converter validation.
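For the Acc2 experiment, any standard image classifier suffices; the sketch below builds a torchvision ResNet-18 with a single-logit head for the known-blur vs. other-blur decision. The head design, training regime, and label convention are our assumptions rather than details reported above.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

def make_blur_domain_classifier():
    """ResNet-18 predicting whether an image carries the known (GoPro-like) blur."""
    model = resnet18(weights=None)                   # trained from scratch on the synthesized set
    model.fc = nn.Linear(model.fc.in_features, 1)    # single logit: known blur vs. not
    return model

@torch.no_grad()
def known_blur_accuracy(model, images, labels):
    """Fraction of images whose predicted blur domain matches the ground-truth label."""
    preds = (torch.sigmoid(model(images)).squeeze(1) > 0.5).float()
    return (preds == labels).float().mean().item()
```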

Figure 7. Qualitative results of Restormer [42] on three datasets.

8. Real-world Application

8.1. Details of PhoneCraft collection

The data collection process for training Blur2Blur is actually inexpensive. Although the number of images required looks high (several thousand for each subset), they are mostly video frames and thus can be collected efficiently. For example, in the PhoneCraft experiment above, we only needed to collect 11 sharp videos and 12 blurry ones, with a total collection time of less than 2 hours. More specifically, the dataset contains more than 12,500 diverse blurry images and 11,000 sharp images.

8.2. Video Deblurring Performance

As mentioned in the main paper, to enrich our practical evaluation with more tangible visual examples and to demonstrate one real-world application of our Blur2Blur model, we incorporated a video from the collected dataset. This video simulates scenarios with significant motion blur, which is common in dynamic environments. The clarity of visual details in such situations is crucial for various applications, including rehabilitation therapy. Accurate hand movement visualization is vital for tasks like hand pose detection and gesture-based interactive rehabilitation systems.

To evaluate our Blur2Blur model, we used a video with pronounced hand movements, pre-training the deblurring model on the RSBlur dataset. The results, demonstrated in video1.mp4, clearly show that our Blur2Blur framework significantly enhances visual clarity compared to using the pre-trained deblurring model alone. Moreover, to further assess the enhancement in hand movement recognition, we validated the deblurred videos using the hand pose estimation model from MediaPipe [39]. The results, shown in the video, highlight a notable improvement in hand pose estimation when using our method. The enhanced sharpness and detail achieved by Blur2Blur enable more accurate and reliable recognition of hand poses. This demonstrates the potential of our Blur2Blur model in applications demanding high-fidelity visualization of hand movements, especially in advanced rehabilitation therapy tools that rely on precise hand movement tracking for effective patient care and recovery.

Besides that, we also provide an additional qualitative video deblurring result on the PhoneCraft dataset in video2.mp4.
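The hand-pose validation can be reproduced with MediaPipe's Hands solution; the loop below is a minimal sketch of such an evaluation over deblurred frames (video decoding, landmark drawing, and the confidence thresholds are our choices, not settings given above).

```python
import cv2
import mediapipe as mp

def count_frames_with_hands(frames_bgr):
    """Run MediaPipe Hands over deblurred frames and count frames with a detected hand."""
    hands = mp.solutions.hands.Hands(static_image_mode=False,
                                     max_num_hands=2,
                                     min_detection_confidence=0.5)
    detected = 0
    for frame in frames_bgr:
        result = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if result.multi_hand_landmarks:
            detected += 1
    hands.close()
    return detected
```

Comparing this count (or downstream pose accuracy) on the raw, pre-trained-only, and Blur2Blur-deblurred versions of the same clip gives a simple proxy for the improvement described above.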

Figure 8. Extra qualitative results on the REDS dataset.



Figure 9. Extra qualitative results on the RB2V Street dataset.



Figure 10. Extra qualitative results on the RSBlur dataset.
