Guided Linear Upsampling
SHUANGBING SONG, FAN ZHONG∗ , TIANJU WANG, XUEYING QIN, and CHANGHE TU, Shandong
University, China
arXiv:2307.09582v1 [cs.CV] 13 Jul 2023
[Fig. 1 diagram labels: input 𝐼; self-upsampling; joint optimization of 𝐼↓ and the interpolation parameters Θ; 𝐼↓; 𝑇↓; output 𝑇ˆ]
Fig. 1. Our method to accelerate high-resolution image processing with guided linear upsampling. Given a high-resolution source image 𝐼 , our method can
jointly optimize the downsampled source image 𝐼 ↓ and the interpolation parameters Θ, and then 𝐼 ↓ is processed by a black-box image operator to get the
low-resolution target image 𝑇 ↓ . The high-resolution target image 𝑇ˆ can be linearly upsampled from 𝑇 ↓ with the optimized parameters Θ.
Guided upsampling is an effective approach for accelerating high-resolution image processing. In this paper, we propose a simple yet effective guided upsampling method. Each pixel in the high-resolution image is represented as a linear interpolation of two low-resolution pixels, whose indices and weights are optimized to minimize the upsampling error. The downsampling can be jointly optimized in order to prevent missing small isolated regions. Our method can be derived from the color line model and local color transformations. Compared to previous methods, our method can better preserve detail effects while suppressing artifacts such as bleeding and blurring. It is efficient, easy to implement, and free of sensitive parameters. We evaluate the proposed method with a wide range of image operators, and show its advantages through quantitative and qualitative analysis. We demonstrate the advantages of our method for both interactive image editing and real-time high-resolution video processing. In particular, for interactive editing, the joint optimization can be precomputed, thus allowing for instant feedback without hardware acceleration.

CCS Concepts: • Imaging/Video → Matting & Compositing; Interactive Editing.

Additional Key Words and Phrases: guided upsampling, optimized downsampling, image processing

ACM Reference Format:
Shuangbing Song, Fan Zhong, Tianju Wang, Xueying Qin, and Changhe Tu. 2023. Guided Linear Upsampling. ACM Trans. Graph. 42, 4 (August 2023), 12 pages. https://doi.org/10.1145/3592453

∗ Corresponding author.

Authors’ address: Shuangbing Song, [email protected]; Fan Zhong, zhongfan@sdu.edu.cn; Tianju Wang, [email protected]; Xueying Qin, [email protected]. cn; Changhe Tu, [email protected], Shandong University, China.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
© 2023 Copyright held by the owner/author(s). Publication rights licensed to ACM.
0730-0301/2023/8-ART $15.00
https://doi.org/10.1145/3592453

1 INTRODUCTION
In the past decades, many useful image processing methods have been proposed for various tasks such as enhancement [Aubry et al. 2014], style transfer [Li et al. 2018; Zhu et al. 2017], matting [Levin et al. 2007], colorization [Iizuka et al. 2016], etc. Most of them require intensive computation and memory, and thus face great challenges for high-resolution images. At the same time, the popularity of mobile devices requires us to consider more about computational efficiency. The problem is even more prominent for interactive image editing [Bousseau et al. 2009; Levin et al. 2004], which requires repetitive user interactions, so instant feedback is necessary for a better user experience.

For general image processing, the guided upsampling should be the simplest and most effective way to achieve acceleration. By using the original image as a guidance map, a large ratio downsampling of the output image can be upsampled to the original resolution without noticeable artifacts. This is amazing because even for image operators of linear complexity in image size, using 8× downsampling can result in 64× speed up.

Two classical approaches for guided upsampling are joint bilateral upsampling (JBU) [Kopf et al. 2007] and bilateral guided upsampling (BGU) [Chen et al. 2016]. JBU is an extension of the bilateral filter [Durand and Dorsey 2002; Paris and Durand 2006], while BGU is based on the local color transformations [Levin et al. 2007], whose effectiveness for guided upsampling has been demonstrated in earlier works such as transform recipes [Gharbi et al. 2015] and guided filter [He et al. 2012]. In BGU, the local transformations are applied in the bilateral space [Barron et al. 2015], which further improves the
ACM Trans. Graph., Vol. 42, No. 4, Article . Publication date: August 2023.
(a) Source (560 × 560) (b) JBU (8× ) (c) GLU∗ (16× ) (d) GLU∗ (32× ) (e) GLU (32× )
Fig. 3. Guided self-upsampling with JBU and our method. The input source image is downscaled and then upsampled to the original resolution with the
guidance of itself. GLU∗ is our method as described in Section 3.1; GLU is the accelerated version introduced in Section 3.2. Both recover the
original image well even for ratios as large as 32×. In comparison, JBU produces obvious blur even at a ratio as small as 8×.
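As a rough illustration of the accelerated GLU used in Fig. 3, the following Python sketch computes the parameters (𝑎, 𝑏, 𝜔𝑎𝑏) for one high-resolution pixel and reuses them on the target image. It is a minimal sketch under stated assumptions, not the authors' implementation: it assumes Eq. (1) has the two-pixel interpolation form that also appears in Eq. (8), it selects 𝑏 by the smallest reconstruction error of the source color as a stand-in for Eq. (3), and it takes the weight from the distance ratio of Eq. (5). The names `color_dist`, `mix`, `glu_params` and `interpolate` are hypothetical.

```python
import math


def color_dist(c1, c2):
    """Euclidean distance between two colors given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(c1, c2)))


def mix(c1, c2, w):
    """Linear interpolation w * c1 + (1 - w) * c2, channel by channel."""
    return tuple(w * x + (1.0 - w) * y for x, y in zip(c1, c2))


def glu_params(pixel, neighbors, eps=1e-6):
    """Return (a, b, w) for one high-res pixel.

    neighbors: colors of the low-res window around the pixel (e.g. 3 x 3 = 9).
    a is fixed to the neighbor with the most similar color; b is then chosen
    among all neighbors, with w given by the distance ratio of Eq. (5).
    Assumption: the candidate with the smallest source reconstruction error wins.
    """
    a = min(range(len(neighbors)), key=lambda i: color_dist(pixel, neighbors[i]))
    d_a = color_dist(pixel, neighbors[a])
    best, best_err = (a, a, 1.0), d_a  # w = 1 is the nearest-pixel (GNU) case
    for b in range(len(neighbors)):
        d_b = color_dist(pixel, neighbors[b])
        w = d_b / (d_a + d_b + eps)  # always in [0, 1]
        err = color_dist(pixel, mix(neighbors[a], neighbors[b], w))
        if err < best_err:
            best, best_err = (a, b, w), err
    return best


def interpolate(values, params):
    """Apply the same (a, b, w) to another low-res image, e.g. the target."""
    a, b, w = params
    return mix(values[a], values[b], w)


# Usage: a gray pixel lying on the color line between two of its neighbors.
source_window = [(0, 0, 0), (100, 100, 100), (200, 0, 0)]
params = glu_params((30, 30, 30), source_window)
target_window = [(10.0,), (20.0,), (99.0,)]
upsampled_target = interpolate(target_window, params)
```

Because the parameters depend only on the source image, the same `params` can be applied to any target window of the same layout, which is what makes precomputation possible.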
which is a combinatorial optimization problem that is usually difficult to solve. Fortunately, in our case Ω𝑝↓ is a small neighborhood (a 3 × 3 window in our experiments), so it is easy to enumerate all possible pixel pairs. For each selected pixel pair (𝑎, 𝑏), the optimal

3.2 Efficient Computation
The complexity of the above method is quadratic in the number of pixels in Ω𝑝↓. For a typical 3 × 3 window, it needs to check 36 pairs of pixels in order to minimize Eq. (3). For high-resolution images, this still requires a large amount of computation, so we propose the following improvements for better efficiency.

Firstly, we find that it is not necessary to enumerate all pixel pairs (𝑎, 𝑏) ∈ Ω𝑝↓ in order to optimize Eq. (3). Instead, we can first fix 𝑎 as the pixel with the most similar color to 𝐼𝑝, and then optimize only 𝑏 and 𝜔𝑎𝑏 with respect to Eq. (3). In this way, the complexity can be reduced to be linear in |Ω𝑝↓|. Since 𝑎 is close to 𝐼𝑝 in the color space, the approximation error should be small for the projection of 𝐼𝑝 on the color line.

Secondly, it is easy to see that if 𝐼𝑝 is on the color line determined by 𝐼𝑎↓ and 𝐼𝑏↓, the interpolation weight 𝜔𝑎𝑏 as in Eq. (4) reduces to

    \omega_{ab} = \frac{\| I_p - I_b^{\downarrow} \|}{\| I_p - I_a^{\downarrow} \| + \| I_p - I_b^{\downarrow} \| + \varepsilon}    (5)

which can be computed more efficiently and the results are guaranteed to be in [0, 1]. Since the color lines not crossing 𝐼𝑝 are less likely to be selected, the above approximation has little impact on the quality of our method.

As shown in Figure 3, the above accelerations would not introduce noticeable differences compared to our original method, but the complexity is much lower. Therefore, in the following we will use the accelerated method by default.

Our final upsampling method is described in Algorithm 1. It is very simple and efficient. Ω𝑝↓ is typically chosen as a 3 × 3 window, so for each pixel, only 9 pixel pairs need to be checked. Note that if we fix 𝜔𝑎𝑏 to 1, then the optimization in line 3 is not necessary, and 𝑇ˆ𝑝 would be equal to 𝑇𝑎↓. We call this special case of our method Guided Nearest Upsampling (GNU). As shown in Figure 4, GNU lacks the ability to recover the ramp edges and smooth variations of natural images, thus producing blocky effects and false contours, which can be effectively eliminated by using GLU.

ALGORITHM 1: Efficient Guided Linear Upsampling.
Input: High-res source image 𝐼, low-res source image 𝐼↓ and corresponding target image 𝑇↓.
Output: High-res target image 𝑇ˆ.
1 for each pixel 𝑝 ∈ 𝐼 do
2   Find 𝑎 as the pixel in Ω𝑝↓ with the most similar color to 𝐼𝑝;
3   Fix 𝑎 and optimize 𝑏, 𝜔𝑎𝑏 with Eq. (3)(5);
4   Compute 𝑇ˆ𝑝 as Eq. (1);
5 end

3.3 Downsample Optimization
For large downsampling ratio, isolated thin structures and small regions may be completely lost due to regular grid downsampling. In this case, it would be impossible for the upsampling process to recover the original content. Figure 5 demonstrates such a situation. Although downsampling optimization has been extensively studied, previous works mainly aim to avoid aliasing artifacts [Kopf et al. 2013; Oeztireli and Gross 2015; Weber et al. 2016], which is different from our goal. Some super-resolution methods [Kim et al. 2018; Sun and Chen 2020; Xiao et al. 2020] also jointly optimize their downscaling and upscaling processes, which, however, are not well suited for the proposed GLU upsampler.

Given the GLU upsampler Ψ(𝐼↓, Θ), we can formulate the downsampling process as an optimization problem aiming to minimize the self-upsampling error of the source image. In practice, since Θ is unknown, the downsampling and upsampling need to be jointly optimized as

    I^{\downarrow}, \Theta = \arg\min_{I^{\downarrow}, \Theta} \| I - \Psi(I^{\downarrow}, \Theta) \|    (6)

with each pixel of 𝐼↓ from exactly one pixel of 𝐼. Note that this is different from previous downsampling optimization methods, in which each pixel of 𝐼↓ is usually filtered from multiple pixels of 𝐼 in order to reduce aliasing artifacts. For our method, the filtering in downsampling may significantly blur the upsampled image because it would shrink the endpoints of color lines, which is detrimental to image details.

Eq. (6) can be solved by iteratively optimizing 𝐼↓ and Θ. Given 𝐼↓, the upsampling parameters Θ can be solved as in Algorithm 1. To optimize the downsampling, we first compute the pixel-wise error map 𝐸, 𝐸𝑝 = ∥𝐼𝑝 − 𝐼ˆ𝑝∥. Obviously, the pixels with large error must be those that cannot be well represented by 𝐼↓, and thus need to be added to 𝐼↓ by replacing some existing pixels. Note that since each pixel in 𝐼↓ may be used to interpolate multiple pixels of 𝐼, the above operation may not reduce the total error. Therefore, we adopt a trial-and-error approach, and if replacing some pixels in 𝐼↓ does not reduce the total error, the replaced pixels would be rolled back.

Figure 6 illustrates the procedure of our method; more details are described in Algorithm 2. The trial-and-error procedure is executed for each connected region 𝐶𝑖 of pixels with large error (E). The pixels with large errors are tentatively added to the downsampled image, and the operation is accepted if it reduces the total error; otherwise it is rolled back. For multiple high-resolution pixels [𝑞↑] mapped to the same low-resolution pixel location 𝑞 ∈ 𝐼↓, the one with the largest error would be selected to replace the original color of 𝑞.

As shown in Figure 5, the above method can effectively prevent the missing of thin structures and small regions. In most cases, it requires only 1 or 2 iterations to converge, and after the initialization, only pixels with large errors are involved for further processing, so only a little more computation is required.

4 ANALYSIS
An ideal guided upsampling method should be able to preserve the detail effects of the target image while avoiding artifacts such as bleeding and blurring. In the following we will analyze the capabilities of our method and show how it relates to previous methods.

4.1 Theoretical Derivation
The proposed upsampling method in Section 3.1 can be derived from the color line model [Levin et al. 2007] and local color transformation methods [Chen et al. 2016; He et al. 2012; Levin et al. 2007].

The color line model tells us that the colors of pixels in a small patch should be roughly on the same line in the color space. Therefore, the color of each pixel in the patch must be well approximated by the linear interpolation of the two endpoints [𝑎, 𝑏] of the color
(a) Source (284 × 202) (b) 16× downsampling (c) GLU − (d) Optimized downsampling (e) GLU
Fig. 5. Demonstration of downsample optimization. (a) The input image with some thin structures. (b) Most thin structures are lost with 16× default
downsampling. (c) The result upsampled from (b), the thin structures cannot be recovered. (d) Optimized 16× downsampled image. (e) The result upsampled
from (d), the thin structures are well recovered.
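[Fig. 6 diagram: the input 𝐼 is downsampled (8×↓) to 𝐼↓; in each iteration (1st, 2nd, …) GLU(Θ) upsamples 𝐼↓, the result is compared with 𝐼 (⊖), and the optimizer is run. Optimizer for 𝐶𝑖: compute 𝑒0 as the sum of 𝐸𝑝 over the region, update 𝐼↓, Θ and the error map with 𝐶𝑖, compute 𝑒1 in the same way, and scroll back 𝐼↓, Θ and the error map if 𝑒1 > 𝑒0.]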
Fig. 6. Illustration of the proposed downsample optimization method. For the input high-resolution image 𝐼 , we initialize 𝐼 ↓ with regular grid downsampling,
and then iteratively update 𝐼 ↓ by trying to add large-error pixels to it for minimizing the total upsampling error.
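The trial-and-error procedure illustrated in Fig. 6 can be sketched on a 1-D grayscale signal as follows. This is a simplified sketch under stated assumptions rather than the paper's Algorithm 2: the window is the three nearest low-resolution samples, the acceptance test compares the total error over the whole signal, and the function names (`glu_reconstruct`, `error_map`, `connected_runs`, `optimize_downsampling`) are hypothetical.

```python
def glu_reconstruct(value, candidates, eps=1e-6):
    """Best two-sample linear reconstruction of value (weights as in Eq. (5))."""
    a = min(candidates, key=lambda c: abs(value - c))
    best = a
    for b in candidates:
        d_a, d_b = abs(value - a), abs(value - b)
        w = d_b / (d_a + d_b + eps)
        recon = w * a + (1.0 - w) * b
        if abs(value - recon) < abs(value - best):
            best = recon
    return best


def error_map(signal, low, scale):
    """Per-sample self-upsampling error E_p = |I_p - reconstructed I_p|."""
    errors = []
    for p, value in enumerate(signal):
        q = p // scale
        window = low[max(q - 1, 0): q + 2]
        errors.append(abs(value - glu_reconstruct(value, window)))
    return errors


def connected_runs(indices):
    """Group ascending indices into runs of consecutive integers."""
    runs = []
    for i in indices:
        if runs and i == runs[-1][-1] + 1:
            runs[-1].append(i)
        else:
            runs.append([i])
    return runs


def optimize_downsampling(signal, scale, tau, max_iters=4):
    low = [signal[i] for i in range(0, len(signal), scale)]  # regular grid
    errors = error_map(signal, low, scale)
    for _ in range(max_iters):
        large = [p for p, e in enumerate(errors) if e > tau]
        if not large:
            break
        for run in connected_runs(large):
            backup_low, backup_errors = list(low), list(errors)
            # For each low-res location keep the high-res sample with largest error.
            worst = {}
            for p in run:
                q = p // scale
                if q not in worst or errors[p] > errors[worst[q]]:
                    worst[q] = p
            for q, p in worst.items():
                low[q] = signal[p]
            errors = error_map(signal, low, scale)
            # Assumption: accept only if the total error strictly decreases.
            if sum(errors) >= sum(backup_errors):
                low, errors = backup_low, backup_errors  # roll back
    return low, errors


# Usage: a one-sample spike that regular 4x grid downsampling would miss.
signal = [0] * 16
signal[5] = 100
low, errors = optimize_downsampling(signal, scale=4, tau=10)
```

In this toy case the spike falls between grid samples, so the initial 𝐼↓ cannot represent it; the optimizer replaces the low-resolution sample covering that location with the spike value, which plays the role of the thin structures in Fig. 5.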
line. After downsampling, it can be expected that [𝑎, 𝑏] still can be well represented by two pixels in the downsampled patch, because of the information redundancy in the high-resolution image. As a result, each pixel color in the original patch can also be linearly interpolated by two pixels in the downsampled patch, as in Eq. (1).

The local color transformation methods assume that the output image can be locally represented as the affine transformation of the input image, i.e. 𝑇𝑝 = 𝐴𝑝 𝐼𝑝, where 𝐴𝑝 is an affine transformation that varies smoothly over the image space. In addition, we require the operator to be approximately scale-invariant: 𝑇𝑝↓ = 𝐴𝑝 𝐼𝑝↓. Therefore, if using Θ𝑝 can linearly interpolate 𝐼𝑝, i.e.

    I_p = \omega_{ab} I_a^{\downarrow} + (1 - \omega_{ab}) I_b^{\downarrow}    (7)

then it immediately follows that

    T_p = \omega_{ab} A_p I_a^{\downarrow} + (1 - \omega_{ab}) A_p I_b^{\downarrow} = \omega_{ab} T_a^{\downarrow} + (1 - \omega_{ab}) T_b^{\downarrow}    (8)

which means that using Θ𝑝 also can linearly interpolate 𝑇𝑝, as we have assumed in Section 3.1.

4.2 Edge Recovery
Typical image edges can be classified into three types: step edge, ramp edge, and roof edge [Koschan and Abidi 2005; Yin et al. 2019]. For natural images, most edges should be ramp edges connecting two regions. Obviously, the transition effects of ramp edges can be well represented by linear interpolation of the two region colors. Therefore, by interpolating only two pixels, GLU can recover the edges of the original image very well. As a comparison, using GNU can recover only the step edges, and thus would introduce significant artifacts, as is shown in Figure 4.

A natural question is whether we can achieve further improvements by interpolating more pixels. Indeed, Eq. (1) can be more generally expressed as

    \hat{T}_p = \sum_{q \in \Omega_{p^{\downarrow}}} \omega_q T_q^{\downarrow}    (9)

with 𝜔𝑞 as the normalized weights. Interestingly, this is exactly the form of JBU. However, in JBU 𝜔𝑞 is not optimized, and the filtering effect would result in blur and edge reversal artifacts [He et al. 2012]. It is easy to see that when 𝜎𝑑 → ∞ and 𝜎𝑟 → 0, JBU will reduce to GNU. However, in practice this is hard to achieve due to the numerical problems of the exp weighting function. By decreasing 𝜎𝑟, the blurring artifacts of JBU can be reduced, but may lead to aliasing artifacts as GNU. Therefore, in this sense both GLU and GNU can be taken as special cases of JBU with optimized weights.

Although not tested, we do not see the need to take more pixels for interpolation. Involving more pixels not only makes the optimization more difficult, but may also lead to overfitting and extrapolation, both of which can reduce the result quality.

4.3 Detail Preservation
As discussed in Section 4.1, our method implicitly takes advantage of the local color transformation for transferring the upsampling parameters. However, it should be noted that unlike previous approaches such as guided filter [He et al. 2012] and BGU [Chen
ALGORITHM 2: Joint Optimization of Down- and Upsampling.
Input: High-res source image 𝐼, the error threshold 𝜏, the maximum iterations 𝑁.
Output: Optimized low-res image 𝐼↓ and upsampling parameters Θ.
1 Initialize 𝐼↓ with regular grid downsampling;
2 Initialize Θ from 𝐼, 𝐼↓ with Algorithm 1;
3 Compute initial error map 𝐸;
4 for 𝑛 = 1, ..., 𝑁 do
5   Find the set of pixels with large error: E = {𝑝 | 𝐸𝑝 > 𝜏};
6   if E = ∅ then
7     Break
8   end
9   Cluster E as connected components 𝐶1, · · · , 𝐶𝑀;
10   for 𝑖 = 1, · · · , 𝑀 do
11     Backup Θ, 𝐼↓, 𝐸 for scroll back;
12     Compute 𝑒0 = Σ_{𝑝∈𝐶𝑖} 𝐸𝑝;

image. However, if the pixel affinities of the source and target images are significantly different (e.g., when new edges are introduced in the target image), unsmooth artifacts may be produced. Actually, this is the main limitation of our method, which we will discuss further in Section 6.

5 EXPERIMENTS
In experiments we evaluate the proposed method in various image processing applications, and compare it qualitatively and quantitatively with previous methods. We also demonstrate the advantages of our method for interactive image editing and real-time video processing, and reveal its limitations for more diverse applications.

5.1 Comparisons
For quantitative evaluations we tested our method with the follow-
Table 1. Comparisons of different methods with PSNR scores. The low-resolution target is produced from the low-resolution source using the image operator,
except the applications with †, for which the low-resolution target is obtained by downsampling the reference image. Each cell lists the scores for 8× / 16×.

PSNR↑    | alpha matting | colorization | unsharp mask | 𝐿0-smoothing | dehazing    | laplacian filter | unsharp mask†
JBU      | 25.6 / 22.9   | 20.9 / 20.1  | 18.2 / 16.9  | 22.3 / 20.1  | 25.9 / 22.3 | 15.6 / 13.8      | 19.0 / 17.8
BGU-fast | 21.4 / 22.2   | 28.5 / 27.8  | 23.8 / 23.2  | 22.8 / 22.2  | 21.1 / 17.7 | 21.8 / 21.3      | 25.1 / 24.9
BGU      | 28.3 / 25.8   | 30.7 / 28.8  | 23.5 / 22.4  | 27.0 / 25.4  | 26.8 / 23.4 | 23.7 / 22.3      | 25.4 / 25.0
GLU−     | 31.4 / 28.9   | 29.7 / 27.7  | 23.6 / 22.3  | 23.6 / 24.5  | 27.6 / 24.1 | 20.5 / 17.2      | 25.2 / 24.0
GLU      | 31.5 / 29.1   | 31.3 / 29.6  | 24.0 / 22.4  | 28.8 / 27.1  | 27.6 / 24.1 | 23.1 / 24.5      | 25.9 / 25.2

SSIM↑    | alpha matting | colorization | unsharp mask | 𝐿0-smoothing | dehazing    | laplacian filter | unsharp mask†
JBU      | 0.93 / 0.91   | 0.60 / 0.56  | 0.40 / 0.36  | 0.80 / 0.78  | 0.91 / 0.88 | 0.32 / 0.27      | 0.41 / 0.37
BGU-fast | 0.71 / 0.64   | 1.00 / 1.00  | 0.85 / 0.84  | 0.83 / 0.82  | 0.79 / 0.72 | 0.82 / 0.81      | 0.77 / 0.75
BGU      | 0.86 / 0.78   | 1.00 / 1.00  | 0.88 / 0.85  | 0.88 / 0.84  | 0.90 / 0.85 | 0.88 / 0.84      | 0.79 / 0.77
GLU−     | 0.96 / 0.94   | 0.97 / 0.97  | 0.86 / 0.83  | 0.86 / 0.83  | 0.94 / 0.89 | 0.80 / 0.68      | 0.82 / 0.79
GLU      | 0.96 / 0.94   | 0.99 / 0.99  | 0.87 / 0.83  | 0.89 / 0.85  | 0.94 / 0.89 | 0.85 / 0.80      | 0.83 / 0.81
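For readers who want to reproduce scores of the kind reported in Table 1, the sketch below computes PSNR with its standard definition, 10 · log10(peak² / MSE). The peak value of 255 and the flattened-list input format are assumptions made for illustration; the evaluation settings behind the table are not stated here, and the function name `psnr` is hypothetical.

```python
import math


def psnr(reference, result, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel lists.

    Assumption: pixel values share the same range, with `peak` as its maximum.
    """
    if len(reference) != len(result):
        raise ValueError("inputs must have the same number of pixels")
    mse = sum((r - t) ** 2 for r, t in zip(reference, result)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)


# Usage: a reference and an upsampled result that differ by one gray level.
score = psnr([10, 10, 10, 10], [11, 9, 11, 9])
```

Higher values indicate a closer match to the reference, which is why the table marks the metric with ↑.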
Fig. 8. Visual comparisons of different methods with 8× downsampling. In the parentheses are the PSNR/SSIM scores. Each row shows: source, local zoom, reference, JBU, BGU, GLU.
alpha matting: JBU (20.2/0.88), BGU (25.3/0.79), GLU (28.2/0.94)
colorization: JBU (22.9/0.70), BGU (31.6/1.00), GLU (34.0/0.99)
L0-smoothing: JBU (16.3/0.96), BGU (30.8/0.85), GLU (31.1/0.98)
optical flow: JBU (31.6/0.98), BGU (29.4/0.96), GLU (35.1/0.98)
(a) Source (3270 × 2388) (b) BGU 32× (c) BGU 64× (d) JBU 32×
(e) Reference (f) GLU 32× (g) GLU 64× (h) GLU 128×
Fig. 11. Experiments with large ratios of downsampling and upsampling. JBU produces obvious blur for 32×, BGU produces significant bleeding for 64×, and
our method gets pretty good results even for 128×. For this experiment the low-resolution target images are downsampled from the reference, in order to
observe the net effect of upsampling methods.
Table 3. Effect of the window size 𝑆 to the resulting quality in PSNR.

window size     | 3×3   | 5×5   | 7×7   | 9×9
self-upsampling | 41.31 | 42.83 | 44.04 | 44.92
matting         | 35.47 | 35.18 | 34.76 | 34.15
colorization    | 32.39 | 29.95 | 29.94 | 28.74

Table 4. Time cost (ms).

Image Size | JBU (C++) | BGU (Matlab) | BGU-fast (Halide) | GLU (C++) | GLU− (CUDA)

Compared with JBU and BGU, the parameter setting of our method is much simpler. This advantage makes it more suitable to be used as a universal guided upsampler for different situations.

only about 5ms for 2K images and 14ms for 4K images, so it can be easily incorporated for real-time video processing.

A great advantage of our method is that the joint optimization process is target-free, i.e. the optimization is independent of the target image. Therefore, it can be precomputed before the target image is acquired, as we will demonstrate in Sections 5.4 and 5.5. For applications where multiple operators may be applied for the same image, optimized parameters can be cached and shared between different operators, which further reduces the overall computations of our method.
[Fig. 16 rows: image2Vangogh, apple2orange, horse2zebra]
Fig. 16. Examples of CycleGAN style transfer [Zhu et al. 2017], which may introduce dramatic changes to local image structures. In this case, our method may
produce unsmooth artifacts, while BGU may smooth out the new structures. Since CycleGAN can only output 256×256 fixed-size results, the reference here is
the low-resolution target image.