Joint Learning of Blind Super-Resolution and Crack Segmentation
Joint learning is further improved by our two proposed extra paths that encourage the mutual optimization between
SR and segmentation. Comparative experiments with SOTA segmentation methods demonstrate the
superiority of our joint learning, and various ablation studies validate the effects of our contributions.
(a) Input LR (b) HR GT (c) Independent [12] (d) Multi-task [102] (e) Ours (CSBSR)
Figure 1: Difficulty in real-world crack segmentation. From an input Low-Resolution (LR) image (a), High-Resolution (HR) segmentation
results (c), (d), and (e) are acquired. (c) Independent and (d) Multi-task show the results on images enlarged by non-blind SR “trained
independently of segmentation” and “trained with segmentation in a multi-task learning manner,” respectively. (b) is the manually-annotated
ground-truth HR segmentation image.
Table 1
Problems ⟨A⟩, ⟨B⟩, ⟨C⟩, and ⟨D⟩ and their solutions 1, 2, 3, and 4. A check mark (√) in column P of row S means that solution S addresses problem ⟨P⟩.

                                        ⟨A⟩ Class imbalance   ⟨B⟩ Fine cracks   ⟨C⟩ LR cracks   ⟨D⟩ Blur
1. CSBSR                                                                              √             √
2. BC loss                                       √                   √
3. Segmentation-aware SR-loss weights                                √                √
4. Blur-reflected task learning                                                                      √
2. Related Work
2.1. Image Segmentation
Image segmentation techniques [78] are broadly divided into three categories, namely semantic segmentation [90], instance segmentation [47], and panoptic segmentation [59]. Crack segmentation is categorized as semantic segmentation because it classifies all pixels into crack and background pixels with no instances; that is, crack pixels are not divided into crack instances.

Class-imbalance Segmentation: As in various computer vision problems, class imbalance is a critical problem in image segmentation. Many approaches for class imbalance are applicable to class-imbalanced segmentation tasks. Examples include weighted losses such as the Weighted Cross Entropy (WCE) loss [26] and the focal loss [67] for segmentation [64, 49, 109], re-sampling [126] for segmentation [18], and hard mining [29] for segmentation [35].
Among all segmentation tasks, medical image segmentation has to cope with highly-imbalanced classes (e.g., tiny tumors and background). Such difficult medical image segmentation is tackled by a variety of loss functions such as the Dice loss [77], the Generalized Dice loss [97], the Combo loss [98], the Hausdorff loss [54], and the Boundary loss [55].

Crack Segmentation: Since the class-imbalance issue is also important for crack segmentation, as presented as Problem ⟨A⟩ in Table 1, the aforementioned schemes proposed against class imbalance are useful for crack segmentation. For example, in order to balance the number of samples between classes, CrackGAN [119] oversamples crack images by using DC-GAN [85]. The Dice, Combo, and WCE losses are employed for crack segmentation in [86], in [19], and in [71, 128, 69], respectively.
In addition to the class-imbalance issue, the fine boundaries of cracks are difficult to extract, which makes crack segmentation hard, as presented as Problem ⟨B⟩ in Table 1. For such difficult fine crack segmentation, the aforementioned schemes proposed against class imbalance (e.g., weighted losses, re-sampling, and class-imbalance-oriented losses) are also useful. Previous methods for such fine cracks are divided into the following two approaches, namely boundary-based methods and coarse-to-fine weighting.
In the boundary-based approach, the distance between the boundaries of the ground-truth and predicted cracks is minimized. In [54], the Hausdorff distance is evaluated by using the distance transform. While the computational cost of its exact solution is high, the sum of L2 distances is approximated by a sum of regional integrals for efficiency in the Boundary loss [55].
Various coarse-to-fine weighting approaches such as [20] employ pyramid and U-Net-like networks for weighting a fine but unreliable representation by more reliable results in a coarse representation. The effectiveness of this approach is also validated in crack segmentation [107, 128, 71, 69, 22, 66].
While the effectiveness of both approaches is validated, the coarse-to-fine weighting approach is applicable only to pyramid and U-Net-like architectures. On the other hand, the boundary-based approach can in general be employed with any other loss functions in any network architecture.
Figure 3: Proposed joint learning network with blind SR and segmentation. See the caption of Fig. 2 for the explanations of arrows and
ellipses. ⊙ indicates a pixelwise multiplication operator.
3.1. Joint Learning
CSBSR consists of blind SR and segmentation networks, as shown in Fig. 2 (c). Its detail is shown in Fig. 3. The blind SR network S(I^L; θ_S), where θ_S denotes all the parameters of this SR network, maps I^L to its SR image I^S. The crack segmentation network C takes I^S and outputs a crack segmentation image I^C = C(I^S; θ_C). Any differentiable SR and crack segmentation networks can be employed as S and C, respectively. Let L_S and L_C denote the loss functions for S and C, respectively. While L_S is back-propagated through S, L_C is back-propagated through both S and C in an end-to-end manner. The whole network is trained with the following loss, where the task weight β is a hyper-parameter:

\mathcal{L}_{J} = (1 - \beta)\,\mathcal{L}_{S} + \beta\,\mathcal{L}_{C}   (2)

Implementation details: In our experiments, DBPN [45, 43] and its extension to blind SR [111], which is called KBPN, are employed as S for a fair comparison between our proposed methods with non-blind SR and with blind SR (i.e., a comparison between CSSR and CSBSR). Different from DBPN as non-blind SR, KBPN also outputs its estimated blur kernel. The loss functions used in [45, 43] and KBPN [111] are used as L_S in our joint learning with no change.
C is implemented with each of U-Net [87], PSPNet [124], CrackFormer [69], and HRNet+OCR [112] for validating the wide applicability of our method; the implementations of these SR and segmentation networks are publicly available [41, 110, 1, 123, 2, 113]. Section 3.2 proposes a new general-purpose segmentation loss, which is applicable to all of these networks as L_C.
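To make the data flow of Eq. (2) concrete, the following is a minimal PyTorch-style sketch of one training step of the joint loss. The names sr_net, seg_net, sr_loss_fn, and seg_loss_fn are placeholders rather than the actual implementation; any differentiable SR and segmentation networks can be plugged in, and the optimizer settings are those described in Sec. 4.1.

```python
import torch

def joint_training_step(sr_net, seg_net, sr_loss_fn, seg_loss_fn,
                        optimizer, lr_img, hr_img, gt_mask, beta=0.5):
    """One step of the joint loss L_J = (1 - beta) * L_S + beta * L_C (Eq. 2)."""
    sr_img = sr_net(lr_img)                    # I^S = S(I^L; theta_S)
    seg_prob = seg_net(sr_img)                 # I^C = C(I^S; theta_C)

    loss_s = sr_loss_fn(sr_img, hr_img)        # L_S: reconstruction loss of the SR network
    loss_c = seg_loss_fn(seg_prob, gt_mask)    # L_C: segmentation loss (e.g., the BC loss)

    loss_j = (1.0 - beta) * loss_s + beta * loss_c
    optimizer.zero_grad()
    loss_j.backward()                          # L_C is back-propagated through both C and S
    optimizer.step()
    return loss_j.item()
```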
3.2. Boundary Combo Loss
For suppressing the class-imbalance difficulty in crack segmentation, we propose the Boundary Combo (BC) loss, which simultaneously achieves locally-fine and globally-robust segmentation. Fine segmentation can be achieved by a boundary-based approach such as the Boundary loss [55]. However, if only the boundary-based approach is employed, the segmentation network easily falls into local minima, as validated in [55]. This problem can be resolved by employing the boundary-based approach simultaneously with a loss that evaluates the whole image region. In [55], the Generalized Dice (GDice) loss [97] is empirically demonstrated to be a good choice. However, it is reported that the Sigmoid function included in the GDice loss and in its original Dice loss tends to cause the vanishing gradient problem [98].
This paper explores more appropriate losses combined with the Boundary loss for stable learning as well as fine segmentation. We improve learning stability by combining the GDice loss with the WCE loss, which is expressed without the derivative of the Sigmoid function that tends to cause gradient vanishing. Since the Dice loss and the WCE loss have different properties (i.e., they are categorized as region-based and distribution-based losses, respectively, as introduced in [75]), it is also validated that a pair of the Dice and WCE losses, which is called the Combo loss [98], works complementarily for better segmentation. Finally, we propose the following Boundary Combo (BC) loss, L_BC, as L_C for C in our joint learning:

\mathcal{L}_{BC} = \alpha \mathcal{L}_{B} + (1 - \alpha)\left( (1 - \gamma)\,\mathcal{L}_{D} + \gamma\,\mathcal{L}_{WCE} \right)   (3)

where L_B, L_D, and L_WCE denote the Boundary, Dice, and WCE losses, respectively. α ∈ [0, 1] and γ ∈ [0, 1] are hyper-parameters. L_BC thus consists of region-, distribution-, and boundary-based losses; according to the survey [75], a combination of these three loss categories has never been evaluated. As a variant of L_BC, we also propose L_GBC, in which the GDice loss L_GD is used in L_BC instead of L_D. While one may refer to the original papers of L_B, L_D, L_GD, and L_WCE for the details, these losses are briefly explained in the following three paragraphs.
Boundary loss (L_B): The Boundary loss [55] computes the distance-weighted 2D area between the ground-truth crack and its estimated one, which becomes zero in the ideal estimation, as follows:

D(\partial G, \partial S) = \int_{\partial G} \| q_{\partial S}(p) - p \|^{2}\, dp \simeq 2 \int_{\Delta S} D_{G}(p)\, dp = 2 \left( \int_{\Omega} \phi_{G}(p) s(p)\, dp - \int_{\Omega} \phi_{G}(p) g(p)\, dp \right)   (4)

where G and S denote the pixel sets of the ground-truth crack and its estimated one, respectively. p and q_{\partial S}(p) denote a point on boundary ∂G and its corresponding point on boundary ∂S, respectively; q_{\partial S}(p) is the intersection between ∂S and the normal of ∂G at p. ΔS = (S \ G) ∪ (G \ S) is the mismatch region between G and S. D_G(p) is the distance map from G. s(p) and g(p) are binary indicator functions, where s(p) = 1 if p ∈ S and g(p) = 1 if p ∈ G. φ_G(q) is the level-set representation of boundary ∂G: φ_G(q) = −D_G(q) if q ∈ G, and φ_G(q) = D_G(q) otherwise. Ω denotes the pixel set of the image. The second term in Eq. (4) is omitted because it is independent of the network parameters. By replacing s(p) with the network softmax output s_θ(p), we obtain the Boundary loss function below:

\mathcal{L}_{B} = \int_{\Omega} \phi_{G}(p)\, s_{\theta}(p)\, dp   (5)
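The following is a small sketch of how L_B in Eq. (5) can be computed in practice. It uses the signed-distance-map construction that is commonly used in Boundary-loss implementations (precomputed once per ground-truth mask with a Euclidean distance transform); scipy is used here only for illustration, and the function names are placeholders.

```python
import numpy as np
import torch
from scipy.ndimage import distance_transform_edt

def level_set_map(gt_mask: np.ndarray) -> np.ndarray:
    """Signed distance map phi_G of the ground-truth crack region G:
    negative inside G, positive outside (common construction for Eq. (5))."""
    outside = distance_transform_edt(gt_mask == 0)  # distance to G for background pixels
    inside = distance_transform_edt(gt_mask == 1)   # distance to background for crack pixels
    return outside - inside

def boundary_loss(seg_prob: torch.Tensor, phi_g: torch.Tensor) -> torch.Tensor:
    """Discrete counterpart of Eq. (5): mean over pixels of phi_G(p) * s_theta(p)."""
    return (phi_g * seg_prob).mean()
```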
Dice and GDice losses (L_D and L_GD): The Dice loss [77] is based on a harmonic mean of precision and recall and is expressed as follows:

\mathcal{L}_{D} = \frac{2 \sum_{j}^{M} \sum_{i}^{N} p_{ij} g_{ij}}{\sum_{j}^{M} \sum_{i}^{N} (p_{ij}^{2} + g_{ij}^{2})}   (6)

where M and N denote the number of classes (i.e., M = 2 in our problem) and the number of all pixels in each image, respectively. p_ij and g_ij are the classification probability (0 ≤ p_ij ≤ 1) and its ground truth (g_ij ∈ {0, 1}).
Different from the Dice loss, the GDice loss [97] is weighted by the number of pixels in each class as follows:

\mathcal{L}_{GD} = \frac{2 \sum_{j}^{M} w^{(GD)}_{j} \sum_{i}^{N} p_{ij} g_{ij}}{\sum_{j}^{M} w^{(GD)}_{j} \sum_{i}^{N} (p_{ij} + g_{ij})}   (7)

where w^{(GD)}_{j} = 1 / \sum_{i}^{N} g_{ij}.

WCE loss (L_WCE): The WCE loss [26] is the Cross Entropy loss weighted by a hyper-parameter w^{(WCE)}_{j}, which is determined based on the class imbalance (e.g., w^{(WCE)}_{j} = 1 / \sum_{i}^{N'} g_{ij}, where N' = N N_I and N_I is the number of all training images):

\mathcal{L}_{WCE} = - \sum_{j}^{M} \sum_{i}^{N} w^{(WCE)}_{j}\, g_{ij} \log p_{ij}   (8)
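Putting the three components together, the sketch below shows one way to implement Eq. (3) in PyTorch. The tensor shapes, the conventional "1 − Dice" form of the region term, and the mean reductions are assumptions for illustration, not the paper's exact implementation.

```python
import torch

def bc_loss(prob, gt_onehot, phi_g, class_weights, alpha=0.5, gamma=0.5, eps=1e-6):
    """Boundary Combo loss of Eq. (3):
    L_BC = alpha * L_B + (1 - alpha) * ((1 - gamma) * L_D + gamma * L_WCE).

    prob:          softmax output, shape (B, 2, H, W); channel 1 is "crack" (assumption).
    gt_onehot:     one-hot ground truth with the same shape as prob.
    phi_g:         level-set map of the GT boundary, shape (B, H, W) (see the sketch above).
    class_weights: per-class weights w_j^(WCE), shape (2,).
    """
    # Boundary term (discrete Eq. (5)).
    l_b = (phi_g * prob[:, 1]).mean()

    # Dice term; written here in the conventional "1 - Dice" form so that lower is better.
    inter = (prob * gt_onehot).sum(dim=(0, 2, 3))
    denom = (prob ** 2 + gt_onehot ** 2).sum(dim=(0, 2, 3))
    l_d = 1.0 - (2.0 * inter / (denom + eps)).mean()

    # Weighted cross-entropy term (Eq. (8)).
    w = class_weights.view(1, -1, 1, 1)
    l_wce = -(w * gt_onehot * torch.log(prob + eps)).mean()

    return alpha * l_b + (1.0 - alpha) * ((1.0 - gamma) * l_d + gamma * l_wce)
```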
3.3. Segmentation-aware Weights for SR
In addition to end-to-end learning with L_C (i.e., the segmentation loss in Eq. (2)), we propose to weight L_S by L_C for further optimizing the SR network S for segmentation. This weighting is achieved by pixelwise multiplying L_S by L_C.
It is not yet easy to discriminate between crack and background pixels for precisely detecting fine cracks. This difficulty arises especially around crack pixels. For such difficult pixelwise segmentation, our method employs the following two difficulty-aware weights:

• For detecting all fine thin cracks, a segmentation loss function is weighted so that pixels inside and around cracks receive higher weights. The weight given to pixel p, w^C_p, is expressed as follows:

w^{C}_{p} = \exp(-m^{C} D_{p})   (9)

where m^C and D_p denote a weight constant and the distance between p and its nearest crack pixel, respectively. w^C_p is called the Crack-Oriented (CO) weight.

• For hard pixel mining, a segmentation loss function is weighted so that pixels inside and around false-positive and false-negative pixels receive higher weights. For such difficulty-aware segmentation, in our method, the weight given to pixel p, w^F_p, is expressed as follows:

w^{F}_{p} = \exp(m^{F}\, |T^{P}_{p} - T^{GT}_{p}|)   (10)

where 0 ≤ T^P_p ≤ 1 and T^{GT}_p ∈ {0, 1} denote the value of the p-th pixel in the predicted and ground-truth segmentation images, respectively, and m^F is a weight constant. Our w^F_p is applicable to any loss function, such as our BC loss in Eq. (3) consisting of multiple loss functions, whereas the focal loss [67] and the anchor loss [88], both of which also penalize hard samples, are based on a weighted cross entropy loss. w^F_p is called the Fail-Oriented (FO) weight.

These two weights (9) and (10) are multiplied pixelwise by L_S, as illustrated in the sketch below.
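The sketch below illustrates one possible implementation of the CO and FO weights of Eqs. (9) and (10). It assumes gt_mask is a binary (H, W) tensor and pred_prob is the predicted crack-probability map; how the weights are combined with the pixelwise SR loss before reduction is only indicated schematically in the comment.

```python
import torch
from scipy.ndimage import distance_transform_edt

def crack_oriented_weight(gt_mask: torch.Tensor, m_c: float = 1.0) -> torch.Tensor:
    """Eq. (9): w^C_p = exp(-m^C * D_p), where D_p is the distance from pixel p
    to its nearest crack pixel (zero on crack pixels)."""
    dist = distance_transform_edt(gt_mask.cpu().numpy() == 0)
    return torch.exp(-m_c * torch.as_tensor(dist, dtype=torch.float32))

def fail_oriented_weight(pred_prob: torch.Tensor, gt_mask: torch.Tensor,
                         m_f: float = 1.0) -> torch.Tensor:
    """Eq. (10): w^F_p = exp(m^F * |T^P_p - T^GT_p|); large on false positives/negatives."""
    return torch.exp(m_f * (pred_prob - gt_mask.float()).abs())

# The pixelwise SR loss (e.g., an L1 error map before spatial averaging) is then
# multiplied by these weights before reduction, e.g.:
#   weighted_sr_loss = (w_c * w_f * l1_error_map).mean()
```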
3.4. Blur Skip for Blur-reflected Task Learning
It is not easy for the blind SR network to perfectly predict the ground-truth blur kernel K and the ground-truth HR image I^H so that I^S = I^H. Let K^P and K^S denote the predicted kernel and the blur kernel that remains in I^S, so that K = K^P + K^S and I^S = I^H ∗ K^S. We assume that K^S correlates with K^P.
Based on this assumption, this paper proposes blur-reflected segmentation learning via a skip connection, called the blur skip, from the SR network S to the segmentation network C. This skip connection forwards K^P to the end of C in order to condition the features extracted by C with K^P. While this conditioning is achieved by the Spatial Feature Transform (SFT) [104], SFT is marginally modified for CSBSR as follows. The detail of the modified SFT layer is shown in Fig. 4. In the original SFT layer, conditions are directly fed into conv layers for producing the conditioning features (depicted by red and yellow 3D boxes, respectively, in Fig. 4) used for scaling and shifting. Different from this original SFT layer, the target features ("Segmentation features" in Fig. 4) are concatenated to the conditions. It is empirically validated that this concatenation process slightly improves the segmentation quality.
Figure 4: The structure of our blur skip module using SFT [104]. Each 3D box and rectangle depict a feature set and a process, respectively.
⊙ indicates a pixelwise multiplication operator.
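A rough sketch of the modified SFT conditioning described in Sec. 3.4 and Fig. 4 is shown below. The class name, channel sizes, and the way the predicted kernel is flattened and spatially expanded are assumptions for illustration; only the overall structure (expand, concatenate, scale/shift with a residual connection) follows the figure.

```python
import torch
import torch.nn as nn

class BlurSkipSFT(nn.Module):
    """Sketch of the modified SFT layer: the predicted blur kernel K^P (flattened and
    spatially expanded) is concatenated with the segmentation features before the conv
    layers that produce the scale and shift maps; channel sizes are assumptions."""
    def __init__(self, feat_ch=64, cond_ch=441, hidden_ch=64):   # 441 = 21 * 21 kernel entries
        super().__init__()
        self.scale = nn.Sequential(nn.Conv2d(feat_ch + cond_ch, hidden_ch, 1),
                                   nn.ReLU(inplace=True),
                                   nn.Conv2d(hidden_ch, feat_ch, 1))
        self.shift = nn.Sequential(nn.Conv2d(feat_ch + cond_ch, hidden_ch, 1),
                                   nn.ReLU(inplace=True),
                                   nn.Conv2d(hidden_ch, feat_ch, 1))

    def forward(self, seg_feat, blur_kernel):
        b, _, h, w = seg_feat.shape
        cond = blur_kernel.flatten(1)[:, :, None, None].expand(b, -1, h, w)  # "Expand" in Fig. 4
        x = torch.cat([seg_feat, cond], dim=1)                               # "Concat" in Fig. 4
        return seg_feat * self.scale(x) + self.shift(x) + seg_feat           # scale, shift, residual
```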
3.5. Training Strategy
Our joint learning has several loss functions, weights, and hyper-parameters. They should be used properly for training our complex network consisting of S and C.

Step 1: As with most tasks, each of which has a limited amount of training data, S is pre-trained with huge general datasets for blind SR.

Step 2: With a dataset for crack segmentation, only S is initially finetuned with β = 0 in Eq. (2).

Step 3: The whole network is finetuned so that L_C is weighted by a constant (i.e., β ≠ 0).

4. Experimental Results
4.1. Pre-training and Training Details
For pre-training the SR network S, 3,450 images in the DIV2K dataset [6] (800 images) and the Flickr2K dataset [100] (2,650 images) were used. The whole network for crack segmentation C was not pre-trained, but its feature extractor was pre-trained with ImageNet [28].
For pre-training S (i.e., Step 1 in Sec. 3.5) and finetuning S and C (i.e., Steps 2 and 3), an image patch fed into each network is randomly cropped, with vertical and horizontal flips, from each training image for data augmentation. This patch is regarded as an HR image (I^H). From I^H, its LR images (I^L) are generated with various blur kernels (K) and bicubic downsampling (↓_s), as expressed in Eq. (1). K is randomly sampled from anisotropic 2D Gaussian blurs with variances σ_a², σ_b² ∈ [0.2, 4.0] and angle θ_gaus ∈ [0, π). The kernel size is 21 × 21 pixels. The HR-to-LR downscaling factor is 1/4. The feature extractor of C is pre-trained depending on the segmentation network as follows. For U-Net and PSPNet, the VGG-16 model provided by torchvision [4] is used. For HRNet+OCR, the authors' model [113] is used.
For pre-training of S in Step 1, the number of iterations is 200,000. The minibatch size is six. Adam [58] is used as the optimizer with β1 = 0.9, β2 = 0.999, and ε = 10^−8. The learning rate is 2 × 10^−4.
The number of iterations is 30,000 and 150,000 in Steps 2 and 3, respectively. The minibatch size and the optimizer are equal to those in the aforementioned pre-training. The learning rate is 2 × 10^−5.

4.2. Synthetically-degraded Crack Images
4.2.1. Training
For the experiments shown in Secs. 4.2 and 4.3, the Khanhha dataset [56] was used to finetune the whole network for CSBSR, i.e., the SR and segmentation networks.
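The sketch below illustrates the LR-image synthesis used for training (Eq. (1)): blur the HR patch with an anisotropic Gaussian kernel and bicubic-downsample it by 1/4. Treating the sampled σ values directly as standard deviations, and the exact rotation parameterization, are assumptions beyond what the text above states.

```python
import math
import numpy as np
import torch
import torch.nn.functional as F

def anisotropic_gaussian_kernel(size=21, sigma_a=2.0, sigma_b=1.0, theta=0.3):
    """Anisotropic 2D Gaussian blur kernel K (size x size), normalized to sum to one."""
    ax = np.arange(size) - (size - 1) / 2.0
    xx, yy = np.meshgrid(ax, ax)
    xr = xx * math.cos(theta) + yy * math.sin(theta)     # rotate coordinates by theta
    yr = -xx * math.sin(theta) + yy * math.cos(theta)
    k = np.exp(-0.5 * ((xr / sigma_a) ** 2 + (yr / sigma_b) ** 2))
    return torch.as_tensor(k / k.sum(), dtype=torch.float32)

def degrade(hr_img, kernel, scale=4):
    """I^L = (I^H * K) downsampled by 1/scale with bicubic interpolation (Eq. (1))."""
    c = hr_img.shape[1]
    weight = kernel[None, None].repeat(c, 1, 1, 1)       # one kernel per channel
    blurred = F.conv2d(hr_img, weight, padding=kernel.shape[-1] // 2, groups=c)
    return F.interpolate(blurred, scale_factor=1.0 / scale, mode="bicubic",
                         align_corners=False)
```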
Table 2
Results on the Khanhha dataset. CSBSR is implemented with four different segmentation networks [124, 112, 69, 87]. To validate the effect of
our proposed joint learning, the SR and segmentation networks are also trained without joint learning in each pair of the SR and segmentation
networks; see (d) vs. (e), (f) vs. (g), (h) vs. (i), and (j) vs. (k). For comparison, the results of SOTA methods with SR and segmentation are
also shown in (b) and (c). For reference, instead of an LR image, its original HR image is directly fed into the segmentation network in “(a)
Segmentation in HR” for the upper bound analysis. The best score in each column except for “(a) Segmentation in HR” is colored by red.
The Khanhha dataset consists of the CRACK500 [120], GAPs [30], CrackForest [91], AEL [9], cracktree200 [127], DeepCrack [71], and CSSC [108] datasets. As shown in the sample images of these datasets (Fig. 5), the Khanhha dataset is challenging in that a variety of structures are observed and the properties of the annotated cracks differ between the elemental datasets [120, 30, 91, 9, 127, 71, 108]. In the Khanhha dataset, the image size is 448 × 448 pixels, which is regarded as an HR image in our experiments. The dataset has 9,122 training, 481 validation, and 1,695 test images. These training and test sets were used as the training images for all experiments and as the test images for the experiments shown in Sec. 4.2, respectively.

4.2.2. Evaluation Metrics
Each SR image is evaluated with PSNR and SSIM [106]. Each segmentation image is evaluated with Intersection over Union (IoU). While IoU is computed on a binarized image, the output of CSBSR is a segmentation image in which each pixel has a probability of being a crack or not. Since IoU differs depending on the threshold for binarization, the threshold for each method is determined so that the mean IoU over all test images is maximized. This maximized IoU is called IoU_max. For evaluation independent of thresholding, IoUs are also averaged over thresholds (AIU [107]). While IoU is a major metric for segmentation, it is inappropriate for evaluating fine thin cracks because a slight displacement makes IoU significantly small even if the structures of the ground-truth and estimated cracks are almost similar. For appropriately evaluating such similar cracks, the 95% Hausdorff Distance (HD95) [27] is employed. As with IoU, the HD95 threshold for each method is also determined so that the mean HD95 over all test images is minimized. This minimized HD95 is called HD95_min. For evaluation independent of thresholding, HD95s are also averaged over thresholds. This averaged HD95 is called AHD95.
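The following sketch shows how the threshold-swept segmentation metrics described in Sec. 4.2.2 can be computed; the 0.01 threshold step is an assumption, and the same sweep-and-aggregate scheme applies to HD95 (with min and mean instead of max and mean).

```python
import numpy as np

def iou(pred_bin, gt_bin, eps=1e-9):
    inter = np.logical_and(pred_bin, gt_bin).sum()
    union = np.logical_or(pred_bin, gt_bin).sum()
    return (inter + eps) / (union + eps)

def threshold_swept_iou(prob_maps, gt_masks, thresholds=np.arange(0.0, 1.01, 0.01)):
    """Mean IoU over the test set for every binarization threshold.
    IoU_max is the best mean IoU over thresholds; AIU is their average."""
    mean_ious = []
    for t in thresholds:
        ious = [iou(p >= t, g > 0) for p, g in zip(prob_maps, gt_masks)]
        mean_ious.append(np.mean(ious))
    mean_ious = np.asarray(mean_ious)
    return mean_ious.max(), mean_ious.mean()   # (IoU_max, AIU)
```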
4.2.3. Comparison with SOTA segmentation methods
For the comparative experiments, the 1,695 HR test images in the Khanhha dataset are degraded to their LR images in the same manner as the training image generation.
For validating the wide applicability of CSBSR, four SOTA segmentation networks (i.e., PSPNet [124] for Table 2 (e), HRNet+OCR [112] for Table 2 (g), CrackFormer [69] for Table 2 (i), and U-Net [87] for Table 2 (k)) are used as the segmentation network in CSBSR, as described in Sec. 3.1. While CSBSR is trained in a joint end-to-end manner (i.e., (e), (g), (i), and (k) in Table 2), the results of independent blind SR and segmentation networks (i.e., (d), (f), (h), and (j) in Table 2) are also shown for comparison. To focus on the difference between the network architectures for segmentation, all of these segmentation networks are trained with our BC loss in Eq. (3). In the BC loss, α = 0.5 and γ = 0.5 were determined empirically. The task weight β in Eq. (2) is determined empirically for each method and fixed during Step 3 of the training strategy (Sec. 3.5).
In addition, CSBSR is compared with SOTA methods in which non-blind SR and segmentation are used (i.e., Table 2 (b) SrcNet [12], in which SR and segmentation are trained independently, and Table 2 (c) DSRL [102], in which SR and segmentation are trained in a multi-task learning manner). The segmentation networks of SrcNet and DSRL are trained with the BCE loss. While SrcNet is implemented by ourselves because its code is not available, we used the publicly-available implementation of DSRL [3].
Quantitative Results: Table 2 shows the quantitative results. In all metrics, all variants of CSBSR are better than their original segmentation methods; that is, (e), (g), (i), and (k) are better than (d), (f), (h), and (j), respectively, in Table 2. As a result, CSBSR is the best in all segmentation metrics (i.e., IoU, AIU, HD95, and AHD95).
Our proposed methods are also compared with SOTA segmentation methods using SR (i.e., (b) and (c) in Table 2).
Figure 6: IoU and HD95 comparison with SOTA methods on the Khanhha dataset. (a) HR segmentation by PSPNet [124] (b) SR
segmentation by SrcNet. (c) SR segmentation by DSRL. (d) SR segmentation by CSBSR w/o joint learning. (e) SR segmentation by
CSSR. (f) SR segmentation by CSBSR.
Figure 7 (panels, left to right): (a) Input LR, (b) HR GT, (c) SrcNet [12], (d) DSRL [102], (e) Ours.
The performance improvement of CSBSR compared to SrcNet might be acquired by the BC loss, joint learning, and/or blind SR. In the comparison between CSBSR and DSRL, we can see the effectiveness of serial joint learning, as well as of the BC loss and blind SR.
Even in comparison with (a) segmentation in HR images (implemented by PSPNet with the BC loss), the segmentation scores of CSBSR get close to those of segmentation in HR. For example, the IoU and AIU of CSBSR with PSPNet are 93.0% and 98.7% of those of segmentation in HR. In terms of HD95, on the other hand, CSBSR is much inferior to segmentation in HR. This reveals that CSBSR should be further improved in order to extract fine crack structures.
The IoU and HD95 scores of our proposed CSBSR are shown in Fig. 6. For comparison, our method with non-blind SR (i.e., CSSR) and SOTA segmentation methods using SR are compared with CSBSR. As the upper limit, the scores of segmentation on ground-truth HR images are also shown as (a) in Fig. 6, while LR images are fed into all the other methods, (b), (c), (d), (e), and (f) in Fig. 6. It can be seen that (b) SrcNet and (c) DSRL are clearly inferior to the others in both IoU and HD95. In particular, the scores of DSRL change significantly depending on the threshold. This reveals that DSRL is sensitive to a change in the threshold. The scores of all the other methods accepting LR images are close to those of (a) segmentation in HR images. In particular, (f) CSBSR can get higher scores over a wide range of thresholds. This stability against a change in the threshold is crucial in applying CSBSR to a variety of segmentation tasks.
Visual Results: Figure 7 shows visual results. In the upper row, from left to right, the first and second images are an input LR image (enlarged by nearest-neighbor interpolation) and its ground-truth HR image. The remaining three images are the SR images of SrcNet, DSRL, and CSBSR. It can be seen that the SR image of CSBSR is much sharper than those of
(a) Input LR (b) HR GT (c) SS in HR (d) SrcNet [12] (e) DSRL [102] (f) CSBSR
Figure 8: Visual comparison on the Khanhha dataset. In the upper row of each example: (a) Input LR image (enlarged by Bicubic interpolation for visualization). (b, c) Ground-truth HR image. (d) SR image obtained by SrcNet. (e) SR image obtained by DSRL. (f) SR image obtained by our CSBSR. In the lower row of each example: (a) No image. (b) Ground-truth segmentation image in HR. (c) HR segmentation image obtained by PSPNet [124]. (d) SR segmentation image obtained by SrcNet. (e) SR segmentation image obtained by DSRL. (f) SR segmentation image obtained by our CSBSR.
SrcNet and DSRL. In terms of the crack segmentation images as well, CSBSR outperforms SrcNet and DSRL.
Figure 8 shows examples of more complex cracks. Since such complex crack pixels are difficult to detect correctly, even segmentation methods using SR reconstruction (i.e., SrcNet [12] and DSRL [102]) cannot detect many crack pixels, as shown in Fig. 8 (d) and (e). As shown in Fig. 8 (f), on the other hand, our CSBSR can obtain crack segmentation images that are similar to their corresponding segmentation images obtained on the original HR images shown in Fig. 8 (c). It can also be seen that CSBSR can reconstruct and detect even thin fine cracks in the SR and segmentation images, respectively. As a result, our results are similar to the ground-truth segmentation images shown in Fig. 8 (b).
Figure 9 shows examples where (f) the SR segmentation image obtained by CSBSR is better even than (c) the HR segmentation image obtained on the ground-truth HR image. These images are characterized by low image contrast
(a) Input LR (b) HR GT (c) SS in HR (d) SrcNet [12] (e) DSRL [102] (f) CSBSR
Figure 9: Examples where (f) the SR segmentation image obtained by our CSBSR is better than (c) the HR segmentation image obtained
in the ground-truth HR image.
around crack pixels, thin cracks, and/or local illumination changes around crack pixels.
We interpret that, while it is difficult for SR to reconstruct and for segmentation to detect such high-frequency and low-contrast structures as shown in Figs. 8 and 9, our joint learning of SR and segmentation with the segmentation-aware SR loss and the blur skip for blur-reflected segmentation learning can achieve these difficult tasks.
Figure 10 shows sample test images in which no crack pixels are observed. While there are no crack pixels in these images, the observed masonry joints tend to produce false positives. For real applications using automatic image inspection, it is important to suppress such false positives in order to avoid false alarms, because most images of real buildings contain no crack pixels. In Fig. 10, it can be seen that (d) SrcNet and (e) DSRL detect false positives around the masonry joints, while (f) CSBSR successfully ignores all of these masonry-joint pixels.

4.2.4. Effects of β
Table 3 shows the evaluation results obtained in accordance with changes in β. In all metrics of both the SR and segmentation tasks, CSBSR outperforms CSSR. Furthermore, in both CSSR and CSBSR, our proposed joint learning acquires better results in all segmentation metrics.
More specifically, in terms of the segmentation results, IoU_max and AIU do not change much depending on β. On the other hand, the best HD95_min and AHD95 scores are better with the training strategy with increasing β (i.e., "Increasing" in the table) and have a larger margin over the scores obtained with any fixed β. Intuitively speaking, the segmentation score should be best with β = 1 so that the segmentation loss (i.e., L_C in Eq. (2)) is fully weighted. We interpret that the segmentation scores are not best with β = 1 because it is difficult to fully optimize the whole network directly from the pre-trained SR and segmentation networks. That is why the training strategy with increasing β is better than β = 1.
In terms of the SR image quality, while the best SSIM is acquired without joint learning, the best PSNR is obtained with β = 0.3. Since the SR network is trained without joint learning just to improve SR, it is expected that the best SR results would be obtained without joint learning. This expectation is betrayed probably because of the feature-extractor augmentation through the training of the segmentation task. The features can be marginally augmented also for SR, as in multi-task learning, if β is smaller, while the features are optimized for the segmentation task if β is larger.
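Following the "Increasing" setting in Table 3, in which β is increased from 0 to 1 in proportion to iterations, a minimal sketch of such a schedule is given below; the linear form is an assumption consistent with that description.

```python
def beta_schedule(iteration: int, total_iterations: int) -> float:
    """Task weight beta in Eq. (2), increased linearly from 0 to 1 over Step 3."""
    return min(1.0, iteration / float(total_iterations))
```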
(a) Input LR (b) HR GT (c) SS in HR (d) SrcNet [12] (e) DSRL [102] (f) CSBSR (Ours)
Figure 10: Examples where there are no crack pixels in (a) input LR image.
Figure 11: Curves of IoU and HD95 scores varying with a change in the threshold for segmentation-image binarization. The compared losses are BC, GBC, WCE, B + GDice, and Combo, with segmentation in HR images (SS in HR) shown for reference.
4.2.5. Effects of Segmentation losses
To verify the effectiveness of our BC and GBC losses, CSBSR is trained with other losses for class-imbalanced segmentation (i.e., WCE [26], Dice [77], Combo [98], and GDice [97]). As shown in Table 4, the BC loss gets the best scores in four metrics (i.e., IoU, AIU, AHD95, and PSNR) and the second best in HD95. While it is in third place in SSIM, the gap from the best is tiny (0.705 vs 0.703).
Figure 11 shows the IoU and HD95 scores varying with a change in the threshold for binarizing the segmentation image. As shown in Table 4, GBC is inferior to BC. However, GBC gets higher scores over a large range of thresholds in both IoU and HD95. This property might be given by GDice, included in GBC, which is robust to class imbalance. On the other hand, while WCE gets better results in a few metrics in Table 4, its performance drop depending on the threshold is significant. This performance drop makes it difficult to apply the WCE loss to a variety of scenarios. As with GBC, the curves of BC also do not decrease much.
Based on the aforementioned observations, we conclude that our BC and GBC losses are superior to the other SOTA losses in terms of the maximum performance (as shown in Table 4) and stability (as shown in Fig. 11).
Table 3
Performance change depending on 𝛽. 𝛽 is fixed during Step 3 in the training strategy, except for “Increasing” shown in the bottom line in
which 𝛽 is increased from 0 to 1 in proportion to iterations.
Table 4
Comparison with other losses for class-imbalance segmentation. The best and second best scores are colored by red and blue, respectively.
4.2.6. Effects of Segmentation-aware SR-loss Weights
The effects of the additional weights given to L_S, which are proposed in Sec. 3.3, are evaluated in Table 5. Since w^C and w^F have hyper-parameters (i.e., m^C and m^F, respectively), the best results among m^C, m^F ∈ {2^−3, 2^−2, 2^−1, 2^0, 2^1, 2^2, 2^3} are shown in Table 5. We can make the following observations:

• All weights given to L_S improve HD95.

• Conversely, all weights given to L_S decrease IoU and AIU, while the performance drops are not significant. In particular, the IoU and AIU provided by w^F given to L_S are almost equal to those of the baseline CSBSR (i.e., 0.573 vs 0.573 in IoU and 0.551 vs 0.552 in AIU).

• While w^F weights the segmentation loss (L_C), the results are inferior to the baseline in most metrics, as shown in the bottom row of Table 5.

In addition to the quantitative comparison shown in Table 5, Fig. 12 visually shows the effect of the FO weight. All images are the results obtained with w^F = 1.0. In the left part of Fig. 12, we can see that w^F allows CSBSR to
Table 5
Ablation study of the weights given to L_S, namely L_C, w^C, and w^F. Scores better than the baseline (i.e., CSBSR w/o any weight) are underlined.
(a) Input LR (b) HR GT (c) w/o 𝑤𝐹 (d) w/ 𝑤𝐹 (b’) HR GT (c’) w/o 𝑤𝐹 (d’) w/ 𝑤𝐹
Figure 12: Visual comparison between CSBSR w/ and w/o the FO weight 𝑤𝐹 . [Left part] In the upper row of each example: (a) Input
LR image (enlarged by Bicubic interpolation for visualization). (b) Ground-truth HR image. (c) SR image obtained by CSBSR w/o 𝑤𝐹 .
(d) SR image obtained by CSBSR. In the lower row of each example: (a) No image. (b) Ground-truth segmentation image in HR. (c) SR
segmentation image obtained by CSBSR w/o 𝑤𝐹 . (d) SR segmentation image obtained by CSBSR. [Right part] Rectangle regions are
cropped from the SR images shown in the left part, and their zoom-in images are shown. The boundary color of each cropped image shows
the correspondence between the cropped images in the left and right parts. Differences between (c’) and (d’) are pointed by white arrows.
Table 6
Ablation study of our blur skip process. Scores better than the baseline (i.e., CSBSR w/o any weight) are underlined.
detect thin crack pixels in the segmentation images. In order to see the results of SR image enhancement by w^F, zoom-in images of several regions in the SR images are shown in the right part of Fig. 12.
In the (c) images obtained without w^F, the detected crack pixels are broken. In the (d) images obtained with w^F, on the other hand, cracks are detected more continuously, though it is difficult to visually see any significant difference between the zoom-in SR images shown in (c') and (d'). In the opposite way, the background textures enclosed by the purple dashed ellipse are falsely detected by CSBSR without w^F, as shown in (c) of the lower example. However, these background pixels reconstructed by CSBSR without and with w^F (enclosed by the purple dashed ellipses in (c') and (d')) are also almost the same as each other. These results demonstrate the effectiveness of w^F for discriminating between remarkably-similar crack and background pixels in the segmentation network of CSBSR.

4.2.7. Effects of Blur Skip
The effects of the proposed blur skip process are shown in Table 6. Since the quality of the estimated kernel is high enough (e.g., above 50 dB in PSNR), our blur skip should have the potential to support the segmentation task. While the blur skip alone does not work well for all metrics, the blur skip used with w^F improves HD95 and AHD95. Typical examples are shown in Fig. 13. While the results without the blur skip are much inferior to their ground truths, the blur skip can improve the performance, as shown in the rightmost image in Fig. 13.
(a) GT (b) Results without blur skip (c) Results with blur skip
Figure 13: Effectiveness of our proposed blur skip. The left and right images show the HR/SR image and the segmentation image,
respectively.
4.3. Crack Images with Real Degradations
For the experiments with real images, we captured 809 wall images (1280 × 720 pixels) with a flying drone (DJI MAVIC MINI). This dataset includes out-of-focus images as well as motion-blurred images. By using all the images in this dataset as test images, we visually verify the effectiveness of CSBSR for realistically-blurred images. Since it is essentially difficult to annotate severely-blurred cracks correctly, only a qualitative comparison is done with this dataset.
In the first row of Fig. 14, the cracks are very thin. DSRL and SrcNet cannot detect any crack pixels. In addition, false-positive cracks (enclosed by yellow ellipses) are detected. CSBSR, on the other hand, can detect most crack pixels, as depicted by the superimposed red pixels.
The second row of Fig. 14 shows the segmentation results detected on an image of complex cracks observed on a building wall. While DSRL detects no crack pixels, SrcNet and CSBSR successfully detect several crack pixels. CSBSR can detect more true-positive crack pixels, in particular along a crack located in the upper part of the image (enclosed by blue ellipses). However, there are also many false-negative crack pixels (enclosed by green ellipses) even in the segmentation image of CSBSR.
In the input image shown in the third row of Fig. 14, there are thin electrical wires as well as thin cracks (enclosed by blue and green ellipses). A crack segmentation method is required to detect only real cracks without being disturbed by the wires. DSRL detects several wire pixels (enclosed by the yellow ellipse) and crack pixels, while SrcNet detects nothing. While CSBSR detects only crack pixels, even CSBSR fails to detect the blurry cracks observed in the lower part of the image (enclosed by green ellipses).
As mentioned above, while our CSBSR outperforms SOTA segmentation methods using SR, it also fails to detect severely-degraded cracks. Improving crack segmentation in such severely-degraded images is important future work.
5. Concluding Remarks
This paper proposes an end-to-end joint learning network consisting of blind SR and segmentation networks. Blind SR allows us to apply the proposed method to realistically-blurred images. The information exchange between the SR and segmentation networks (i.e., the segmentation-aware SR-loss weights and the blur skip for blur-reflected task learning) enables further improvement. For better segmentation in class-imbalanced fine crack images, the BC loss is proposed.
Future work includes quantitative evaluation on real-image datasets in which ground-truth segmentation pixels are manually given. It is also interesting to apply CSBSR to other segmentation tasks such as medical imaging. An essential difficulty in SR is that SR is an ill-posed problem in which a larger number of pixels are reconstructed from a smaller number of pixels. In order to relieve this difficulty, multiple LR images are used as a set of input images in video SR [34, 80, 46, 42] and burst SR [16]. Our proposed method can also be extended to one using time-series images.

5.1. Acknowledgments
This work was partly supported by JSPS KAKENHI Grant Numbers 19K12129 and 22H03618.

References
[14] Yancheng Bai, Yongqiang Zhang, Mingli Ding, and Bernard Ghanem. Finding tiny faces in the wild with generative adversarial network. In CVPR, 2018.
[15] Yancheng Bai, Yongqiang Zhang, Mingli Ding, and Bernard Ghanem. SOD-MTGAN: small object detection via multi-task generative adversarial network. In ECCV, 2018.
[16] Goutam Bhat, Martin Danelljan, Radu Timofte, Kazutoshi Akita, Wooyeong Cho, Haoqiang Fan, Lanpeng Jia, Daeshik Kim, Bruno Lecouat, Youwei Li, Shuaicheng Liu, Ziluan Liu, Ziwei Luo, Takahiro Maeda, Julien Mairal, Christian Micheloni, Xuan Mo, Takeru Oba, Pavel Ostyakov, Jean Ponce, Sanghyeok Son, Jian Sun, Norimichi Ukita, Rao Muhammad Umer, Youliang Yan, Lei Yu, Magauiya Zhussip, and Xueyi Zou. NTIRE 2021 challenge on burst super-resolution: Methods and results. In CVPR Workshop, 2021.
[17] Yochai Blau, Roey Mechrez, Radu Timofte, Tomer Michaeli, and Lihi Zelnik-Manor. The 2018 PIRM challenge on perceptual image super-resolution. In ECCV Workshop, 2018.
[18] Samuel Rota Bulò, Gerhard Neuhold, and Peter Kontschieder. Loss max-pooling for semantic image segmentation. In CVPR, 2017.
[19] Hanshen Chen, Yishun Su, and Wei He. Automatic crack segmentation using deep high-resolution representation learning. Applied Optics, 60(21):6080–6090, 2021.
[20] Liang-Chieh Chen, George Papandreou, Iasonas Kokkinos, Kevin Murphy, and Alan L. Yuille. DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Trans. Pattern Anal. Mach. Intell., 40(4):834–848, 2018.
[21] Dong-Yoon Choi, Ji Hoon Choi, Jin Wook Choi, and Byung Cheol Song. Sharpness enhancement and super-resolution of around-view monitor images. IEEE Trans. Intell. Transp. Syst., 19(8):2650–2662, 2018.
[1] Crack segmentation. https://github.com/khanhha/crack_ [22] Wooram Choi and Young-Jin Cha. Sddnet: Real-time crack segmen-
segmentation. tation. IEEE Trans. Ind. Electron., 67(9):8016–8025, 2020.
[2] Crackformer-ii. https://github.com/LouisNUST/CrackFormer-II. [23] Victor Cornillère, Abdelaziz Djelouah, Yifan Wang, Olga Sorkine-
[3] Dual super-resolution learning for semantic segmentation. https: Hornung, and Christopher Schroers. Blind image super-
//github.com/Dootmaan/DSRL. resolution with spatially variant degradations˙ ACM Trans. Graph.,
[4] Torchvision.models. https://pytorch.org/vision/stable/models. 38(6):166:1–166:13, 2019.
html. [24] Tao Dai, Jianrui Cai, Yongbing Zhang, Shu-Tao Xia, and Lei Zhang.
[5] Allen Zhang abd Kelvin C. P. Wang, Yue Fei, Yang Liu, Siyu Tao, Second-order attention network for single image super-resolution. In
Cheng Chen, Joshua Q. Li, and Baoxian Li. Deep learning–based CVPR, 2019.
fully automated pavement crack detection on 3d asphalt surfaces [25] Dimitris Dais, Ihsan Engin Bal, Eleni Smyrou, and Vasilis Sarho-
with an improved cracknet. Journal of Computing in Civil Engi- sis. Automatic crack classification and segmentation on masonry
neering, 32(5), 2018. surfaces using convolutional neural networks and transfer learning.
[6] Eirikur Agustsson and Radu Timofte. Ntire 2017 challenge on single Automation in Construction, 25, 2021.
image super-resolution: Dataset and study. In CVPRW, 2017. [26] Dimitris Dais, İhsan Engin Bal, Eleni Smyrou, and Vasilis Sarho-
[7] Kazutoshi Akita, Muhammad Haris, and Norimichi Ukita. Region- sis. Automatic crack classification and segmentation on masonry
dependent scale proposals for super-resolution in object detection. surfaces using convolutional neural networks and transfer learning.
In IPAS, 2020. Automation in Construction, 125:103606, 2021.
[8] Kazutoshi Akita, Masayoshi Hayama, Haruya Kyutoku, and Norim- [27] DeepMind. Surface distance metrics. https://github.com/deepmind/
ichi Ukita. AVM image quality enhancement by synthetic image surface-distance.
learning for supervised deblurring. In MVA, 2021. [28] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-
[9] Rabih Amhazand, Sylvie Chambon, Jérôme Idier, and Vincent Bal- Fei. Imagenet: A large-scale hierarchical image database. In CVPR,
tazart. Automatic crack detection on two-dimensional pavement 2009.
images: An algorithm based on minimal path selection. TITS, [29] Qi Dong, Shaogang Gong, and Xiatian Zhu. Class rectification hard
17(10):2718–2729, 2016. mining for imbalanced deep learning. In ICCV, 2017.
[10] Md Rifat Arefin, Vincent Michalski, Pierre-Luc St-Charles, Alfredo [30] Markus Eisenbach, Ronny Stricker, Daniel Seichter, Karl Amende,
Kalaitzis, Sookyung Kim, Samira Ebrahimi Kahou, and Yoshua Klaus Debes, Maximilian Sesselmann, Dirk Ebersbach, Ulrike
Bengio. Multi-image super-resolution for remote sensing using deep Stoeckert, and Horst-Michael Gross. How to get pavement distress
recurrent networks. In CVPR Workshops, 2020. detection ready for deep learning? a systematic approach. In IJCNN,
[11] Leanne Attard, Carl James Debono, Gianluca Valentino, and 2017.
Mario Di Castro. Tunnel inspection using photogrammetric tech- [31] Faris Elghaish, Saeed Talebi, Essam Abdellatef, Sandra T Matarneh,
niques and image processing: A review. ISPRS Journal of Pho- M Reza Hosseini, Song Wu, Mohammad Mayouf, Aso Hajirasouli,
togrammetry and Remote Sensing, 140:180–18, 2018. et al. Developing a new deep learning cnn model to detect and
[12] Hyunjin Bae, Keunyoung Jang, and Yun-Kyu An. Deep super classify highway cracks. Journal of Engineering, Design and
resolution crack network (srcnet) for improving computer vision– Technology, 2021.
based automated crack detectability in in situ bridges. Structural [32] Yue Fei, Kelvin C. P. Wang, Allen Zhang, Cheng Chen, Joshua Q.
Health Monitoring, 20(4):1428–1442, 2021. Li, Yang Liu, Guangwei Yang, and Baoxian Li. Pixel-level cracking
[13] Yuval Bahat and Tomer Michaeli. Explorable super resolution. In detection on 3d asphalt pavement images through deep-learning-
CVPR, 2020.
[73] Andreas Lugmayr, Martin Danelljan, Luc Van Gool, and Radu Tim- [90] Evan Shelhamer, Jonathan Long, and Trevor Darrell. Fully convo-
ofte. Srflow: Learning the super-resolution space with normalizing lutional networks for semantic segmentation. IEEE Trans. Pattern
flow. In ECCV, 2020. Anal. Mach. Intell., 39(4):640–651, 2017.
[74] Zhengxiong Luo, Yan Huang, Shang Li, Liang Wang, and Tieniu [91] Yong Shi, Limeng Cui, Zhiquan Qi, Fan Meng, and Zhensong Chen.
Tan. Unfolding the alternating optimization for blind super resolutio Automatic road crack detection using random structured forests.
n. In NeurIPS, 2020. TITS, 17(12):3434–3445, 2016.
[75] Jun Ma, Jianan Chen, Matthew Ng, Rui Huang, Yu Li, Chen Li, [92] Gyumin Shim, Jinsun Park, and In So Kweon. Robust reference-
Xiaoping Yang, and Anne L. Martel. Loss odyssey in medical image based super-resolution with similarity-aware deformable convolu-
segmentation. Medical Image Anal., 71:102035, 2021. tion. In CVPR, 2020.
[76] Yiqun Mei, Yuchen Fan, and Yuqian Zhou. Image super-resolution [93] Kodai Shimosato and Norimichi Ukita. Multi-modal data fusion
with non-local sparse attention. In CVPR, 2021. for land-subsidence image improvement in psinsar analysis. IEEE
[77] Fausto Milletari, Nassir Navab, and Seyed-Ahmad Ahmadi. V-net: Access, 9:141970–141980, 2021.
Fully convolutional neural networks for volumetric medical image [94] Maneet Singh, Shruti Nagpal, Richa Singh, and Mayank Vatsa. Dual
segmentation. In 3DV, 2016. directed capsule network for very low resolution image recognition.
[78] Shervin Minaee, Yuri Boykov, Fatih Porikli, Antonio Plaza, Nasser In ICCV, 2019.
Kehtarnavaz, and Demetri Terzopoulos. Image segmentation using [95] Jae Woong Soh, Sunwoo Cho, and Nam Ik Cho. Meta-transfer
deep learning: A survey. arXiv, 2001.05566, 2020. learning for zero-shot super-resolution. In CVPR, 2020.
[79] Masashi Nagaya and Norimichi Ukita. Embryo grading with unre- [96] Simon Stent, Riccardo Gherardi, Björn Stenger, Kenichi Soga, and
liable labels due to chromosome abnormalities by regularized PU Roberto Cipolla. An image-based system for change detection on
learning with ranking. IEEE Trans. Medical Imaging, 41(2):320– tunnel linings. In MVA, 2013.
331, 2022. [97] Carole H. Sudre, Wenqi Li, Tom Vercauteren, Sébastien Ourselin,
[80] Seungjun Nah, Radu Timofte, Shuhang Gu, Sungyong Baik, Seokil and M. Jorge Cardoso. Generalised dice overlap as a deep learning
Hong, Gyeongsik Moon, Sanghyun Son, Kyoung Mu Lee, Xintao loss function for highly unbalanced segmentations. In MICCAI,
Wang, Kelvin C. K. Chan, Ke Yu, Chao Dong, Chen Change Loy, 2017.
Yuchen Fan, Jiahui Yu, Ding Liu, Thomas S. Huang, Xiao Liu, Chao [98] Saeid Asgari Taghanaki, Yefeng Zheng, S Kevin Zhou, Bogdan
Li, Dongliang He, Yukang Ding, Shilei Wen, Fatih Porikli, Ratheesh Georgescu, Puneet Sharma, Daguang Xu, Dorin Comaniciu, and
Kalarot, Muhammad Haris, Greg Shakhnarovich, Norimichi Ukita, Ghassan Hamarneh. Combo loss: Handling input and output im-
Peng Yi, Zhongyuan Wang, Kui Jiang, Junjun Jiang, Jiayi Ma, balance in multi-organ segmentation. Comput Med Imaging Graph,
Hang Dong, Xinyi Zhang, Zhe Hu, Kwan-Young Kim, Dong Un 75:24–33, 2019.
Kang, Se Young Chun, Kuldeep Purohit, A. N. Rajagopalan, Yapeng [99] Hossein Talebi and Peyman Milanfar. Learning to resize images for
Tian, Yulun Zhang, Yun Fu, Chenliang Xu, A. Murat Tekalp, computer vision tasks. In ICCV, 2021.
M. Akin Yilmaz, Cansu Korkmaz, Manoj Sharma, Megh Makwana, [100] Radu Timofte, Eirikur Agustsson, Luc Van Gool, Ming-Hsuan Yang,
Anuj Badhwar, Ajay Pratap Singh, Avinash Upadhyay, Rudrabha and Lei Zhang. Ntire 2017 challenge on single image super-
Mukhopadhyay, Ankit Shukla, Dheeraj Khanna, A. S. Mandal, San- resolution: Methods and results. In CVPRW, 2017.
tanu Chaudhury, Si Miao, Yongxin Zhu, and Xiao Huo. NTIRE 2019 [101] Radu Timofte, Shuhang Gu, Jiqing Wu, and Luc Van Gool. Ntire
challenge on video super-resolution: Methods and results. In CVPR 2018 challenge on single image super-resolution: Methods and re-
Workshop, 2019. sults. In NTIRE (CVPRW), 2018.
[81] Ben Niu, Weilei Wen, Wenqi Ren, Xiangde Zhang, Lianping Yang, [102] Li Wang, Dong Li, Yousong Zhu, Lu Tian, and Yi Shan. Dual super-
Shuzhen Wang, Kaihao Zhang, Xiaochun Cao, and Haifeng Shen. resolution learning for semantic segmentation. In CVPR, 2020.
Single image super-resolution via a holistic attention network. In [103] Longguang Wang, Yingqian Wang, Xiaoyu Dong, Qingyu Xu, Jun-
ECCV, 2020. gang Yang, Wei An, and Yulan Guo. Unsupervised degradation
[82] Yanwei Pang, Jiale Cao, Jian Wang, and Jungong Han. Jcs-net: Joint representation learning for blind super-resolution. In CVPR, 2021.
classification and super-resolution network for small-scale pedes- [104] Xintao Wang, Ke Yu, Chao Dong, and Chen Change Loy. Recover-
trian detection in surveillance images. IEEE Trans. Inf. Forensics ing realistic texture in image super-resolution by deep spatial feature
Secur., 14(12):3322–3331, 2019. transform. In CVPR, 2018.
[83] Seobin Park, Jinsu Yoo, Donghyeon Cho, Jiwon Kim, and Tae Hyun [105] Zhihao Wang, Jian Chen, and Steven C. H. Hoi. Deep learning for
Kim. Fast adaptation to super-resolution networks via meta-learning. image super-resolution: A survey. TPAMI, 43(10):3365–3387, 2021.
In ECCV, 2020. [106] Zhou Wang, Alan C Bovik, Hamid R Sheikh, and Eero P Simon-
[84] Prateek Prasanna, Kristin J. Dana, Nenad Gucunski, Basily B. celli. Image quality assessment: from error visibility to structural
Basily, Hung Manh La, Ronny Salim Lim, and Hooman Parvardeh. similarity. IEEE Transactions on image processing, 13(4):600–612,
Automated crack detection on concrete bridges. IEEE Trans Autom. 2004.
Sci. Eng., 13(2):591–599, 2016. [107] Fan Yang, Lei Zhang, Sijia Yu, Danil Prokhorov, Xue Mei, and
[85] Alec Radford, Luke Metz, and Soumith Chintala. Unsupervised rep- Haibin Ling. Feature pyramid and hierarchical boosting network
resentation learning with deep convolutional generative adversarial for pavement crack detection. TITS, 21(4):1525–1535, 2020.
networks. In ICLR, 2016. [108] Liang Yang, Bing Li, Wei Li, Liu Zhaoming, Guoyong Yang, and
[86] Amir Rezaie, Radhakrishna Achanta, Michele Godio, and Katrin Jizhong Xiao. Deep concrete inspection using unmanned aerial
Beyer. Comparison of crack segmentation using digital image vehicle towards cssc database. In IROS, 2017.
correlation measurements and deep learning. Construction and [109] Michael Yeung, Evis Sala, Carola-Bibiane Sch´’onlieb, and Leonardo
Building Materials, 261(20):120474, 2020. Rundo. Unified focal loss: Generalising dice and cross entropy-based
[87] Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-net: Convo- losses to handle class imbalanced medical image segmentation.
lutional networks for biomedical image segmentation. In MICCAI, Computerized Medical Imaging and Graphics, 95:102026, 2022.
2015. [110] Tomoki Yoshida, Yuki Kondo, Takahiro Maeda, Kazutoshi Akita,
[88] Serim Ryou, Seong-Gyun Jeong, and Pietro Perona. Anchor loss: and Norimichi Ukita. Kernelized back-projection networks for blind
Modulating loss scale based on prediction difficulty. In ICCV, 2019. super resolution. https://github.com/Yuki-11/KBPN.
[89] Tamar Rott Shaham, Tali Dekel, and Tomer Michaeli. Singan: [111] Tomoki Yoshida, Yuki Kondo, Takahiro Maeda, Kazutoshi Akita,
Learning a generative model from a single natural image. In ICCV, and Norimichi Ukita. Kernelized back-projection networks for blind
2019. super resolution. arXiv, 2023.
[112] Yuhui Yuan, Xilin Chen, and Jingdong Wang. Object-contextual 6. Biography Section
representations for semantic segmentation. In ECCV, 2020.
[113] Yuhui Yuan et al. Ocnet series. https://github.com/openseg-group/
openseg.pytorch/blob/master/MODEL_ZOO.md.
[114] Kai Zhang, Jingyun Liang Luc Van Gool, and Radu Timofte. De- Yuki Kondo received the bachelor degree in en-
signing a practical degradation model for deep blind image super- gineering from Toyota Technological Institute in
resolution. In ICCV, 2021. 2022. Currently, he is a researcher with Toyota
[115] Kai Zhang, Luc Van Gool, and Radu Timofte. Deep unfolding Technological Institute. His research interests in-
network for image super-resolution. In CVPR, 2020. clude low-level vision including image and video
[116] Kai Zhang, Shuhang Gu, Radu Timofte, Taizhang Shang, Qiuju Dai, super-resolution and its application to tiny image
Shengchen Zhu, Tong Yang, Yandong Guo, Younghyun Jo, Sejong analysis such as crack detection. His award in-
Yang, Seon Joo Kim, Lin Zha, Jiande Jiang, Xinbo Gao, Wen Lu, cludes the best practical paper award in MVA2021.
Jing Liu, Kwangjin Yoon, Taegyun Jeon, Kazutoshi Akita, Takeru
Ooba, Norimichi Ukita, Zhipeng Luo, Yuehan Yao, Zhenyu Xu,
Dongliang He, Wenhao Wu, Yukang Ding, Chao Li, Fu Li, Shilei Norimichi Ukita received the B.E. and M.E. de-
Wen, Jianwei Li, Fuzhi Yang, Huan Yang, Jianlong Fu, Byung-Hoon grees in information engineering from Okayama
Kim, JaeHyun Baek, Jong Chul Ye, Yuchen Fan, Thomas S. Huang, University, Japan, in 1996 and 1998, respectively,
Junyeop Lee, Bokyeung Lee, Jungki Min, Gwantae Kim, Kanghyu and the Ph.D. degree in Informatics from Kyoto
Lee, Jaihyun Park, Mykola Mykhailych, Haoyu Zhong, Yukai Shi, University, Japan, in 2001. From 2001 to 2016, he
Xiaojun Yang, Zhijing Yang, Liang Lin, Tongtong Zhao, Jinjia Peng, was an assistant professor (2001 to 2007) and an
Huibing Wang, Zhi Jin, Jiahao Wu, Yifu Chen, Chenming Shang, associate professor (2007-2016) with the graduate
Huanrong Zhang, Jeongki Min, Hrishikesh P. S, Densen Puthussery, school of information science, Nara Institute of
and C. V. Jiji. Ntire 2020 challenge on perceptual extreme super- Science and Technology, Japan. In 2016, he be-
resolution: Methods and results. In NTIRE (CVPRW), 2020. came a professor with Toyota Technological Insti-
[117] Kai Zhang, Wangmeng Zuo, and Lei Zhang. Learning a single tute, Japan. He was a research scientist of Precur-
convolutional super-resolution network for multiple degradations. In sory Research for Embryonic Science and Tech-
CVPR, 2018. nology, Japan Science and Technology Agency,
[118] Kai Zhang, Wangmeng Zuo, and Lei Zhang. Deep plug-and-play during 2002–2006, and a visiting research scien-
super-resolution for arbitrary blur kernels. In CVPR, 2019. tist at Carnegie Mellon University during 2007–
[119] Kaige Zhang, Yingtao Zhang, and Heng-Da Cheng. Crackgan: 2009. Currently, he is also an adjunct professor at
Pavement crack detection using partially accurate ground truths Toyota Technological Institute at Chicago. Prof.
based on generative adversarial learning. TITS, 22(2):1306–1319, Ukita’s awards include the excellent paper award
2020. of IEICE (1999), the winner award in NTIRE 2018
[120] Lei Zhang, Fan Yang, Yimin Daniel Zhang, and Ying Julie Zhu. challenge on image super-resolution, the 1st place
Road crack detection using deep convolutional neural network. In in PIRM 2018 perceptual SR challenge, the best
ICIP, 2016. poster award in MVA2019, and the best practical
[121] Yongqiang Zhang, Yancheng Bai, Mingli Ding, Shibiao Xu, and paper award in MVA2021.
Bernard Ghanem. Kgsnet: Key-point-guided super-resolution net-
work for pedestrian detection in the wild. IEEE Trans. Neural
Networks Learn. Syst., 32(5):2251–2265, 2021.
[122] Zhifei Zhang, Zhaowen Wang, Zhe L. Lin, and Hairong Qi. Image
super-resolution by neural texture transfer. In CVPR, 2019.
[123] Hengshuang Zhao. Pytorch semantic segmentation. https://github.
com/hszhao/semseg/blob/master/model/pspnet.py.
[124] Hengshuang Zhao, Jianping Shi, Xiaojuan Qi, Xiaogang Wang, and
Jiaya Jia. Pyramid scene parsing network. In CVPR, 2017.
[125] Ruofan Zhou and Sabine Süsstrunk. Kernel modeling super-
resolution on real low-resolution images. In ICCV, 2019.
[126] Zhi-Hua Zhou and Xu-Ying Liu. Training cost-sensitive neural
networks with methods addressing the class imbalance problem.
IEEE Trans. Knowl. Data Eng., 18(1):63–77, 2006.
[127] Qin Zou, Yu Cao, Qingquan Li, Qingzhou Mao, and Song Wang.
Cracktree: Automatic crack detection from pavement images. Pat-
tern Recognition Letters, 33(3):227–238, 2012.
[128] Qin Zou, Zheng Zhang, Qingquan Li, Xianbiao Qi, Qian Wang, and
Song Wang. Deepcrack: Learning hierarchical convolutional fea-
tures for crack detection. IEEE Trans. Image Process., 28(3):1498–
1512, 2019.