Fast Guided Median Filter
As described in [17], it is almost impossible to apply methods that accelerate UWM filters to WM filters. The WM filter requires the construction of a weighted histogram. Since the weights vary spatially from one local window to the next, the sliding window approach cannot be applied directly to construct the weighted histogram. Several acceleration methods have been proposed for WM filters using ideas different from those used for UWM filters. Ma et al. [14] proposed a constant time WM filter: an input image is projected onto a 3D space, and then an edge-preserving filter is applied to each 2D slice. The advantage of this method is that any filter kernel can be used, including bilateral and guided filter kernels. However, since an edge-preserving filter is applied to all 2D slices, the computational cost is not low enough for real-time processing. Zhang et al. [17] proposed an innovative acceleration method for WM filters. They use a joint histogram with two dimensions: one dimension is the pixel value and the other is a feature. Faster computation is achieved by combining the joint histogram with a median tracking technique and a data structure that provides instant access to the data. This method achieves real-time processing for relatively small grayscale images with small window sizes. However, the filter kernels that can be used in this method are limited to those calculated from the color/intensity difference between two pixels, such as the Gaussian kernel based on color/intensity distance. Consequently, this method cannot use a guided filter kernel, which calculates weights from multiple pixels. A fast WM filter using a guided filter kernel is a challenging task that has not yet been achieved.

In this paper, we propose a fast guided median filter weighted by a pointwise guided filter kernel. Like the guided filter kernel proposed in [12], the pointwise guided filter kernel can prevent gradient reversal artifacts. The core idea of the proposed method is a formulation that allows histogram updates with a sliding window approach to be used to find the weighted median. Compared to the conventional fast WM filters, the proposed method is not only faster but also applicable to multidimensional, multichannel, and high precision data. Until recently, conventional methods were used to process various types of data. Examples include 8-bit color images; multidimensional data such as video and light field images; multichannel data such as multispectral images; and high precision data such as medical images, HDR images, and depth sensor data. The conventional methods have been insufficient for such data due to their heavy computational cost and enormous memory requirements. This has led to a need to reduce the data size, e.g., by downsampling, which in turn degrades accuracy. Despite this situation, there has been little discussion about using WM filtering for multidimensional, multichannel, or high precision data.

The main contributions of this research are:
1) We propose an accelerated computation algorithm for the WM filter whose kernel is based on the guided filter. Faster computation is achieved by formulating the WM filter so that a sliding window method can be applied.
2) The proposed filter can be applied to multidimensional, multichannel, and high precision data.

The remainder of this paper is organized as follows. Section II reviews the related work on faster computation for UWM and WM filters. We define the weighted median used in this paper in Section III. The proposed WM filter is presented in Section IV, and its extensions to multidimensional, multichannel, and high precision data are presented in Section V. The implementation details of our method are given in Section VI. Section VII shows the experimental results on high resolution, multidimensional, multichannel, and high precision data. Finally, we conclude the paper in Section VIII.

II. A REVIEW OF RELATED WORK

This section reviews studies on the faster computation of UWM and WM filters in image processing.

A. Median Filter

The median filter, which we also call the UWM filter, replaces the pixel value with the median value in a window centered at each pixel. Methods for calculating median values fall into two categories: histogram-based methods and sort-based methods.

The histogram-based methods first construct a histogram of the pixels in the window and then search for the median. For faster processing, both faster histogram construction and faster median searches have been studied. To accelerate the construction of histograms, many methods use a sliding window approach that exploits the redundancy in the overlapping regions of the windows of adjacent pixels. The sliding window approach has two categories, the O(r) sliding window approach and the O(1) sliding window approach, named after their computational complexities with respect to the window radius r. Huang et al. [18] proposed a UWM filter that uses the O(r) sliding window approach to construct a histogram efficiently. The histogram of a horizontal sliding window is updated by removing the elements of the column that leaves the window and then adding the elements of the new column that enters it. To accelerate the median search, Huang et al. also used a technique referred to as median tracking [17]. Median tracking keeps track of the change in the number of pixels below the median value while the histogram is updated and starts the median search for the next window from the median value of the previous window. Huang's method can also be extended in the temporal dimension, so that it can be used as a UWM filter for video [19]. In the O(r) sliding window approach, the computational cost increases with the window size. In contrast, the computational cost of the O(1) approach [20], [21] is independent of the window size. In the O(1) approach, the histogram of a horizontal sliding window is updated by subtracting the histogram of the column that leaves the window and adding the histogram of the column that enters it. To accelerate the median search, a coarse-level histogram that accumulates only the higher order bits of the pixels is used to reduce the number of bins to search and update. SIMD operations are used to accelerate the addition
MISHIBA: FAST GUIDED MEDIAN FILTER 739
and subtraction between histograms. The O(1) sliding window approach is also used in the arc-distance median filter [22].

The sort-based methods first sort the pixel values in the window and then search for the median. Like the histogram-based methods, the sort-based methods exploit the redundancy in the overlapping regions of the windows of adjacent pixels to accelerate the median search. Chaudhuri [23] used the O(r) sliding window approach to compute rank orders in a window, which can be used for min, max, or median filters. When the ordered elements are updated as the window slides, the elements of the excluded column are removed, and the elements of the newly included column are inserted using the principle of mergesort. Sánchez and Rodríguez [24] proposed a UWM filter that combines a sorting algorithm based on the complementary cumulative distribution function with the O(1) sliding window approach. Adams [25] proposed fast UWM filters using separable sorting networks. The key idea for accelerating the median search is to share most of the sorting tasks between adjacent pixels. Sort-based methods work faster than histogram-based methods for small to medium window sizes or for high precision data.

B. The Weighted Median Filter

The WM filter replaces the value of each pixel with the weighted median value in a window centered at that pixel. The weights are calculated from the input image itself or from a guide image. Since the weights vary from window to window, it is difficult to accelerate the construction of the weighted histogram using a sliding window approach as in UWM filters. Ma et al. [14] proposed the first O(1) time WM filter. This method projected a 2D input image into a 3D space, where the extended coordinate was a histogram bin of the input image. By applying an O(1) edge-preserving filter to the 2D slices with the same bin, the method achieved O(1) time computation. Since the filtering is repeated once per bin, it is difficult to perform real-time processing for high precision images. Zhang et al. [17] used a joint histogram with the O(r) sliding window approach to accelerate the construction of a weighted histogram. Since the joint histogram had a large number of bins, data traversal was time-consuming. To solve this problem, a data structure that allows fast traversal was introduced. The large memory requirement of the joint histogram made it difficult to apply the O(1) sliding window approach and to deal with multichannel and high precision data. Zhao et al. [26] applied the bilateral grid data structure [27] to WM filtering. Their method achieved real-time processing on 2D images when the sampling rate of the bilateral grid was high. Since their method was based on the bilateral grid, it shared the problems of the bilateral grid. For example, a small window size required fine sampling, which incurred large memory and computational costs. Due to memory limitations, processing high resolution, high-dimensional, multichannel, and high precision data requires coarse sampling, leading to significant degradation of the results.

Zhang's method and Zhao's method cannot use the guided filter kernel. Ma's method can use the guided filter kernel, but it is computationally expensive. In this paper, we propose a fast WM filter based on the guided filter kernel.

III. A WEIGHTED MEDIAN FORMULATION

In this section, we define the weighted median formulation used in this paper. Several formulations of weighted medians have been proposed [28], [29], [30]. Our formulation is based on the one proposed in [14].

Let $f_x \in \mathcal{F}$ be the pixel value at $x$ in a single channel input image $f$, where $\mathcal{F} = \{ i \in \mathbb{Z} \mid f_{\min} \le i \le f_{\max} \}$ and $f_{\min}, f_{\max} \in \mathbb{Z}$ are the minimum and maximum values that $f_x$ can take, respectively. In [14], the weighted median $f_x^\star$ in a local window centered at pixel $x$ is defined as:

$$f_x^\star = \min_{j \in \mathcal{F}} j \quad \text{s.t.} \quad \sum_{i=f_{\min}}^{j} H_x(i) \ge \frac{1}{2} \sum_{i=f_{\min}}^{f_{\max}} H_x(i), \tag{1}$$

where $H_x$ is a local weighted histogram whose value at bin $i$ is calculated by

$$H_x(i) = \sum_{y \in \Omega_x} w_{x,y} f_y^\delta(i), \tag{2}$$

where $\Omega_x$ is the set of pixels inside the local window, $w_{x,y}$ is a weight calculated from the affinity between pixels $x$ and $y$, and

$$f_x^\delta(i) = \begin{cases} 1 & (f_x = i) \\ 0 & (f_x \ne i). \end{cases} \tag{3}$$

In this paper, the weighted median formulation is extended as follows. A weighted cumulative histogram up to bin $i$ can be expressed as

$$H_x^\downarrow(i) = \sum_{j=f_{\min}}^{i} H_x(j) = \sum_{y \in \Omega_x} w_{x,y} \sum_{j=f_{\min}}^{i} f_y^\delta(j). \tag{4}$$

The weights $w_{x,y}$ are often calculated from the coefficients of a smoothing filter. Since the coefficients of a smoothing filter sum to 1, the following holds:

$$H_x^\downarrow(f_{\max}) = \sum_{y \in \Omega_x} w_{x,y} \sum_{j=f_{\min}}^{f_{\max}} f_y^\delta(j) = \sum_{y \in \Omega_x} w_{x,y} = 1. \tag{5}$$

From this, the following weighted median with smoothing filter coefficients can be derived:

$$f_x^\star = \min_{i \in \mathcal{F}} i \quad \text{s.t.} \quad H_x^\downarrow(i) \ge 0.5. \tag{6}$$

To find its solution, we track the change in $H_x^\downarrow(i)$ with $i$. Suppose all the weights are non-negative. When tracking from $i = f_{\min}$ with increasing $i$, the solution is the $i$ at which $H_x^\downarrow(i)$ first becomes 0.5 or higher. When tracking from $i = f_{\max}$ with decreasing $i$, the solution is the $i$ just before $H_x^\downarrow(i)$ first becomes less than 0.5. By generalizing this observation, we define the weighted median tracked from a starting bin $k$ as follows:

$$f_x^\star = \begin{cases} \displaystyle \min_{i > k,\, i \in \mathcal{F}} i \quad \text{s.t.} \quad H_x^\downarrow(i) \ge 0.5 & (H_x^\downarrow(k) < 0.5) \\ \displaystyle 1 + \max_{i \le k,\, i \in \mathcal{F}} i \quad \text{s.t.} \quad H_x^\downarrow(i) < 0.5 & (H_x^\downarrow(k) \ge 0.5). \end{cases} \tag{7}$$
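To make the difference between (6) and (7) tangible, the following minimal Python sketch compares them on toy histograms. It assumes the histogram is a plain list `hist` with `hist[t]` holding $H_x(f_{\min}+t)$, that the entries sum to 1 as in (5), and that $H_x^\downarrow(k)$ at the starting bin is already known (for example, carried over from a previous window). The function names are hypothetical. Equation (6) is implemented as a forward scan, and (7) as tracking with $H_x^\downarrow(i) = H_x^\downarrow(i-1) + H_x(i)$ upward or the same relation reversed downward.

```python
def cumulative(hist, k, f_min=0):
    """H↓_x(k): sum of hist over bins f_min..k, cf. Eq. (4)."""
    return sum(hist[: k - f_min + 1])


def weighted_median(hist, f_min=0):
    """Eq. (6): smallest bin i whose cumulative weight reaches 0.5."""
    acc = 0.0
    for t, h in enumerate(hist):
        acc += h
        if acc >= 0.5:
            return f_min + t
    return f_min + len(hist) - 1  # unreachable when the weights sum to 1


def tracked_weighted_median(hist, k, cum_k, f_min=0):
    """Eq. (7): weighted median tracked from starting bin k, given cum_k = H↓_x(k)."""
    t, acc = k - f_min, cum_k
    if acc < 0.5:
        # Track upward: the first i > k with H↓_x(i) >= 0.5.
        while t + 1 < len(hist):
            t += 1
            acc += hist[t]
            if acc >= 0.5:
                return f_min + t
        return f_min + len(hist) - 1  # unreachable when the weights sum to 1
    # Track downward: the largest i <= k with H↓_x(i) < 0.5; the answer is i + 1.
    while t > 0:
        acc -= hist[t]  # acc becomes H↓_x(f_min + t - 1)
        t -= 1
        if acc < 0.5:
            return f_min + t + 1
    return f_min  # H↓_x(i) >= 0.5 for every i <= k


# Non-negative weights: every starting bin agrees with Eq. (6).
pos = [0.25, 0.125, 0.375, 0.25]
print(weighted_median(pos), [tracked_weighted_median(pos, k, cumulative(pos, k)) for k in range(4)])
# → 2 [2, 2, 2, 2]

# One negative weight: the result now depends on where tracking starts.
neg = [0.625, -0.375, 0.25, 0.5]
print(weighted_median(neg), [tracked_weighted_median(neg, k, cumulative(neg, k)) for k in range(4)])
# → 0 [0, 2, 2, 2]
```

The toy values are multiples of 1/8 so that the 0.5 comparisons are exact in floating point. With a negative weight, $H_x^\downarrow$ is no longer monotone, so a forward scan and a tracked search can stop at different bins.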
740 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 32, 2023
This weighted median formulation is used for the proposed method. When the weights are non-negative, the solutions of (6) and (7) are the same. On the other hand, when the weights contain negative values, the solutions are not necessarily the same. As shown in the next section, the weights used in the proposed method may contain negative values.

IV. THE FAST GUIDED MEDIAN FILTER

The construction of the histogram $H_x(i)$ in (2) impedes the fast computation of WM filters. Section IV-A discusses the problems in computing $H_x(i)$ in the WM filter with the guided filter kernel. Section IV-B presents the proposed method to solve these problems. The proposed method efficiently obtains the weighted median using the median tracking technique [17] and the sliding window approach described in Sec. IV-C.

A. Problems With the Guided Filter Kernel

In the histogram-based WM filter process, the weighted cumulative histograms are updated as follows:

$$H_x^\downarrow(i) = H_x^\downarrow(i-1) + H_x(i). \tag{8}$$

As this formula shows, fast computation of $H_x(i)$ is the key to faster WM filtering. In the UWM filter, the histogram can be updated efficiently using a sliding window approach. This subsection discusses the difficulties of using the same approach with the guided filter kernel.

Let $g_x \in \mathbb{R}$ be the pixel value at pixel $x$ in a single channel guide image $g$. The guided filter [12] assumes that the filtering output $e_x$ at $x$ can be estimated by a linear transformation of $g$ in a local window $\Omega_z$ centered at pixel $z$:

$$e_x = a_z g_x + b_z, \tag{9}$$

where

$$a_z = \frac{\frac{1}{|\Omega_z|}\sum_{y \in \Omega_z} g_y f_y - \mu(g_z)\mu(f_z)}{v_z}, \tag{10}$$

$$b_z = \mu(f_z) - a_z \mu(g_z), \tag{11}$$

$\mu(\cdot_\star)$ is the mean value of $\cdot$ in $\Omega_\star$, $|\Omega_\star|$ is the number of elements of $\Omega_\star$, and $v_\star = \sigma_\star + \epsilon$, where $\sigma_\star$ is the variance of $g$ in $\Omega_\star$ and $\epsilon$ is a regularization parameter. The output value $e_x$ is estimated in multiple windows, which is called multipoint modeling [31]. In the rest of this paper, we call the guided filter proposed in [12] the multipoint guided filter. The final estimate is calculated as the average of the multiple estimates:

$$e_x = \mu(a_x) g_x + \mu(b_x). \tag{12}$$

The weight of the multipoint guided filter is

$$w_{x,y} = \frac{1}{|\Omega_x|^2} \sum_{\{z \mid x, y \in \Omega_z\}} \left( 1 + \frac{(g_x - \mu(g_z))(g_y - \mu(g_z))}{v_z} \right). \tag{13}$$

Note that these weights may contain negative values. The summations over the local window needed to calculate the mean values can be computed in O(1) time using the separable summed area table (SSAT) or the one pass summed area table (OP-SAT) [32].

Using the multipoint guided filter kernel, the histogram $H_x(i)$ is

$$H_x(i) = \mu(a_x^\delta(i)) g_x + \mu(b_x^\delta(i)), \tag{14}$$

where

$$a_x^\delta(i) = \frac{\frac{1}{|\Omega_x|}\sum_{y \in \Omega_x} g_y f_y^\delta(i) - \mu(g_x)\mu(f_x^\delta(i))}{v_x}, \tag{15}$$

$$b_x^\delta(i) = \mu(f_x^\delta(i)) - a_x^\delta(i)\mu(g_x). \tag{16}$$

To use a sliding window approach, we attempt to rewrite these equations using histograms. Let $F_x(i) = \sum_{y \in \Omega_x} f_y^\delta(i)$ be the histogram of $f$ at pixel $x$. Then $a_x^\delta(i)$ and $b_x^\delta(i)$ can be rewritten using $F_x(i)$ as

$$a_x^\delta(i) = \frac{\sum_{y \in \Omega_x} g_y f_y^\delta(i) - \mu(g_x) F_x(i)}{v_x |\Omega_x|}, \tag{17}$$

$$b_x^\delta(i) = \frac{1}{|\Omega_x|} F_x(i) - a_x^\delta(i)\mu(g_x). \tag{18}$$

These equations reveal three problems that arise when the multipoint guided filter kernel is used in WM filtering. First, $\sum_{y \in \Omega_x} g_y f_y^\delta(i)$ in (17) cannot be expressed using a histogram of $f$, and the naive calculation of this term is computationally inefficient. Second, computing the mean values of $a_x^\delta(i)$ and $b_x^\delta(i)$ is time consuming: since $a_x^\delta(i)$ and $b_x^\delta(i)$ depend on $i$, SSAT and OP-SAT cannot be used to calculate their mean values. Third, since many histograms are required, the calculation is inefficient and requires a lot of memory: to calculate the mean values of $a_x^\delta(i)$ and $b_x^\delta(i)$, the histograms of all pixels in $\Omega_x$ must be kept.

B. Our Approach

Instead of the multipoint guided filter kernel, the proposed method uses a pointwise guided filter kernel, which is based on pointwise modeling [31]. The filtering result is

$$e_x = a_x g_x + b_x = a_x (g_x - \mu(g_x)) + \mu(f_x). \tag{19}$$

This equation can also be formulated as

$$e_x = c_x \sum_{y \in \Omega_x} g_y f_y + d_x \sum_{y \in \Omega_x} f_y, \tag{20}$$

where

$$c_x = \frac{g_x - \mu(g_x)}{v_x |\Omega_x|}, \qquad d_x = \frac{1}{|\Omega_x|} - \mu(g_x) c_x. \tag{21}$$

The weight in the pointwise guided filter can be expressed as

$$w_{x,y} = \frac{1}{|\Omega_x|}\left( 1 + \frac{(g_x - \mu(g_x))(g_y - \mu(g_x))}{v_x} \right). \tag{22}$$

Unlike that of the multipoint guided filter, the weight of the pointwise guided filter is not symmetric, i.e., in general $w_{x,y} \ne w_{y,x}$. Using (20), $H_x(i)$ can be expressed as

$$H_x(i) = c_x \sum_{y \in \Omega_x} g_y f_y^\delta(i) + d_x \sum_{y \in \Omega_x} f_y^\delta(i). \tag{23}$$
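As a numerical illustration of (21)–(23), the following Python sketch computes $c_x$, $d_x$, and the weights (22) for a single window and checks that splitting (23) into the plain histogram $F_x(i)$ and the $g$-weighted histogram $G_x(i) = \sum_{y \in \Omega_x} g_y f_y^\delta(i)$ reproduces the direct sum (2). It is a sketch under stated assumptions, not the paper's implementation: the window is given as flat lists `f_win` and `g_win` (a 1D window, since the formulas do not depend on window shape), `g_center` is $g_x$, `eps` is $\epsilon$, $\sigma_x$ is taken as the population variance, histograms are stored as sparse dictionaries, and all function names are hypothetical.

```python
import math
from collections import defaultdict


def window_stats(g_win, eps):
    """mu(g_x) and v_x = sigma_x + eps, with sigma_x the population variance of g."""
    n = len(g_win)
    mu = sum(g_win) / n
    var = sum((g - mu) ** 2 for g in g_win) / n
    return mu, var + eps


def pointwise_coeffs(g_win, g_center, eps):
    """c_x and d_x of Eq. (21)."""
    n = len(g_win)
    mu, v = window_stats(g_win, eps)
    c = (g_center - mu) / (v * n)
    return c, 1.0 / n - mu * c


def pointwise_weights(g_win, g_center, eps):
    """w_{x,y} of Eq. (22) for every y in the window."""
    n = len(g_win)
    mu, v = window_stats(g_win, eps)
    return [(1.0 + (g_center - mu) * (gy - mu) / v) / n for gy in g_win]


def histogram_from_F_G(f_win, g_win, g_center, eps):
    """H_x(i) = c_x * G_x(i) + d_x * F_x(i), following Eq. (23); sparse dict by bin i."""
    F = defaultdict(float)  # F_x(i): number of pixels with f_y = i
    G = defaultdict(float)  # G_x(i): sum of g_y over pixels with f_y = i
    for fy, gy in zip(f_win, g_win):
        F[fy] += 1.0
        G[fy] += gy
    c, d = pointwise_coeffs(g_win, g_center, eps)
    return {i: c * G[i] + d * F[i] for i in F}


def histogram_direct(f_win, g_win, g_center, eps):
    """H_x(i) = sum_y w_{x,y} f_y^delta(i), Eq. (2), for comparison."""
    H = defaultdict(float)
    for fy, w in zip(f_win, pointwise_weights(g_win, g_center, eps)):
        H[fy] += w
    return dict(H)


f_win = [3, 1, 2, 3, 5]            # f_y in a 1D window; x is the middle pixel
g_win = [0.0, 1.0, 8.0, 2.0, 4.0]  # g_y in the same window
w = pointwise_weights(g_win, g_win[2], eps=0.5)
print(round(sum(w), 12), min(w) < 0)  # → 1.0 True
H_fast = histogram_from_F_G(f_win, g_win, g_win[2], 0.5)
H_ref = histogram_direct(f_win, g_win, g_win[2], 0.5)
print(all(math.isclose(H_fast[i], H_ref[i], abs_tol=1e-12) for i in H_ref))  # → True
```

In this toy window the weights sum to 1, as required by (5), yet two of them are negative, so a bin of $H_x$ can itself be negative. The point of the split is that $F_x$ and $G_x$ are ordinary (weighted) histograms of $f$ that a sliding window can update, while $c_x$ and $d_x$ depend only on window means of $g$.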
Fig. 2. (a) 2D window and column windows. (b) Column window update in the O(1) sliding window approach.
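The column-window update in Fig. 2(b) is the O(1) sliding window approach reviewed in Sec. II-A. The following pure-Python sketch applies it to the unweighted median filter only, as an illustration: it assumes integer pixel values in `range(n_bins)` and replicated borders, omits the coarse-level histogram and SIMD optimizations, and uses hypothetical function names. At the start of each row, every column-window histogram gains the pixel entering at the bottom and loses the pixel leaving at the top; the main-window histogram then slides horizontally with one histogram subtraction and one addition per pixel, so the per-pixel cost depends on the number of bins but not on r.

```python
def _clamped(img, y, x):
    """Pixel value with replicated borders (assumed border handling)."""
    h, w = len(img), len(img[0])
    return img[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]


def median_filter_o1(img, r, n_bins=256):
    """UWM filter of radius r using column histograms (O(1) sliding window)."""
    h, w = len(img), len(img[0])
    half = (2 * r + 1) ** 2 // 2  # rank of the median in a full window
    # cols[c] is the column-window histogram of image column c - r, rows y-r..y+r.
    cols = [[0] * n_bins for _ in range(w + 2 * r)]
    for c in range(w + 2 * r):
        for yy in range(-r, r + 1):
            cols[c][_clamped(img, yy, c - r)] += 1
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        if y > 0:
            # Fig. 2(b): each column window drops row y-r-1 and gains row y+r.
            for c in range(w + 2 * r):
                cols[c][_clamped(img, y - r - 1, c - r)] -= 1
                cols[c][_clamped(img, y + r, c - r)] += 1
        # Main-window histogram for x = 0 is the sum of column windows 0..2r.
        kern = [sum(col[b] for col in cols[: 2 * r + 1]) for b in range(n_bins)]
        for x in range(w):
            if x > 0:
                # Subtract the column window that leaves, add the one that enters.
                leaving, entering = cols[x - 1], cols[x + 2 * r]
                for b in range(n_bins):
                    kern[b] += entering[b] - leaving[b]
            acc = 0
            for b in range(n_bins):  # median search over the bins
                acc += kern[b]
                if acc > half:
                    out[y][x] = b
                    break
    return out


def median_filter_naive(img, r):
    """Reference: sort every window explicitly."""
    h, w = len(img), len(img[0])
    return [[sorted(_clamped(img, yy, xx)
                    for yy in range(y - r, y + r + 1)
                    for xx in range(x - r, x + r + 1))[(2 * r + 1) ** 2 // 2]
             for x in range(w)] for y in range(h)]
```

For example, with `img = [[(7 * y + 3 * x * x) % 16 for x in range(7)] for y in range(6)]`, `median_filter_o1(img, 2, n_bins=16)` should equal `median_filter_naive(img, 2)`. Note that rows must be processed in order here, because each row reuses the column windows updated in the previous row.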
entries are the pixel value of each channel at pixel $x$ in the guide image $\mathbf{g}$, where $m$ denotes the number of channels of the guide image. The output of the pointwise guided filter with a multichannel guide image is expressed as

$$e_x = \mathbf{c}_x^\top \sum_{y \in \Omega_x} \mathbf{g}_y f_y + d_x \sum_{y \in \Omega_x} f_y, \tag{27}$$

where $\cdot^\top$ denotes the transpose,

$$\mathbf{c}_x = V_x^{-1} \frac{\mathbf{g}_x - \mu(\mathbf{g}_x)}{|\Omega_x|}, \qquad d_x = \frac{1}{|\Omega_x|} - \mathbf{c}_x^\top \mu(\mathbf{g}_x), \tag{28}$$

$V_x = \Sigma_x + \epsilon I$, $\Sigma_x \in \mathbb{R}^{m \times m}$ is the covariance matrix of $\mathbf{g}$ in $\Omega_x$, $I$ is the identity matrix, and $\mu(\mathbf{g}_x) \in \mathbb{R}^m$ is a vector whose entries are the mean values of each channel of $\mathbf{g}$ in $\Omega_x$. Using (27), the histogram $H_x$ is derived in the same way as for the single channel case:

$$H_x(i) = \mathbf{c}_x^\top \mathbf{G}_x(i) + d_x F_x(i), \tag{29}$$

where $\mathbf{G}_x$ is a weighted histogram of $f$ and $\mathbf{G}_x(i) \in \mathbb{R}^m$ is a vector holding the value of bin $i$ for each channel.

We analyze the behavior of the regularization parameter $\epsilon$, which controls the smoothing effect, when using multichannel guide images. Consider the case where all channels of $\mathbf{g}$ are the same, that is, $\mathbf{g}_x = g_x \mathbf{1}_m$, where $\mathbf{1}_m \in \mathbb{R}^m$ is a vector whose entries are all 1. In this case, we want the same output as when the single channel $g$ is used as the guide image. Substituting $\mathbf{g}_x = g_x \mathbf{1}_m$ into (27), we obtain

$$e_x = \alpha_x (g_x - \mu(g_x)) + \mu(f_x), \tag{30}$$

where

$$\alpha_x = \mathbf{1}_m^\top \tilde{V}_x^{-1} \mathbf{1}_m \left( \frac{1}{|\Omega_x|} \sum_{y \in \Omega_x} g_y f_y - \mu(g_x)\mu(f_x) \right) \tag{31}$$

and $\tilde{V}_x = \sigma_x \mathbf{1}_m \mathbf{1}_m^\top + \epsilon I$. Using the Sherman-Morrison formula, we obtain

$$\left( \mathbf{1}_m^\top \tilde{V}_x^{-1} \mathbf{1}_m \right)^{-1} = \sigma_x + \frac{\epsilon}{m}. \tag{32}$$

A comparison with (19), in which $a_x$ is given by (10) with $z = x$ and hence has the denominator $v_x = \sigma_x + \epsilon$, shows that $\epsilon$ is effectively replaced by $\epsilon / m$, which becomes smaller as the number of channels increases. To obtain the same level of smoothing when the guide image has multiple channels as when it has a single channel, $\epsilon$ should be multiplied by the number of channels. The same is true for the multipoint guided filter kernel.

C. High Precision Extensions

Theoretically, the proposed method described so far can be applied to high precision data. However, high precision data causes an explosive increase in the size of the histogram, which leads to two practical problems: a large amount of memory is required to store the histograms, and the computational cost of updating the histograms increases. In a UWM filter, the ordinal transform solves these problems because it can reduce the number of bins in the histogram [33]. In contrast, the proposed method cannot use the ordinal transform because (25) is not invariant to the transform.

These problems are solved by storing the histogram as a linked list of high precision values sorted in ascending order. Both the O(r) and O(1) sliding window methods can use this solution; this paper describes its use with the O(1) sliding window approach. Adding elements to the linked list is similar to the merge step of mergesort, with one difference: an element whose bin already exists in the list is not inserted but is added to the existing element. Assume for the moment that the linked list of the histogram of a column window has already been sorted while processing the previous row. Then, adding one pixel to the histogram of the column window merges one element into the linked list, and adding the histogram of the column window to the histogram of the main window merges two sorted lists. When elements of the histogram of the column window are removed from the histogram of the main window, bins whose value becomes zero are removed from the list. The computation time for updating the linked list is proportional to the length of the list. As the window size increases, the computation time tends to increase because the window is more likely to contain values in many different bins.

VI. THE IMPLEMENTATION DETAILS

In this section, we describe CPU and GPU implementations of the proposed method. In general, a CPU has fewer cores and more cache memory than a GPU. Because of this difference, the suitable sliding window approach differs between the CPU and the GPU, as discussed in Sec. VI-A. In our implementation, the O(r) sliding window approach is used for the GPU implementation, as shown in Sec. VI-B, and the O(1) sliding window approach is used for the CPU implementation, as shown in Sec. VI-C. We also describe the implementation for high precision data in Sec. VI-D.

A. Sliding Window Approach for CPU and GPU

The O(1) sliding window approach is faster than the O(r) sliding window approach in single-threaded environments, but not necessarily in multi-threaded environments. In the O(1) sliding window approach, rows cannot be computed independently because each row needs the column window data updated in the previous row. In a parallel implementation of the O(1) sliding window approach, the input image is divided into multiple blocks, which are processed in parallel. A small number of blocks provides little parallelism, whereas a large number of blocks causes a lot of overhead, mainly in constructing the histograms of the column windows. For fast computation, the data of the column windows must be stored in the cache memory, which requires a large cache. The O(1) sliding window approach is therefore suitable for CPUs but not for GPUs. Green [34] shows that, for UWM filtering with the O(1) sliding window approach, the CPU implementation outperforms the GPU implementation in computational speed. Unlike the O(1) sliding window approach, the O(r) sliding window approach has no parallelization overhead because each row is computed independently. It also does not require a large cache because it does not need the data of the column windows. Therefore, the O(r) sliding window approach is suitable for processors with many cores and a small cache, such as GPUs.

Algorithm 6 GPU Kernel Implementation
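For contrast with the column-window scheme, the O(r) approach with median tracking, in the style of Huang et al. [18], can be sketched as follows. This is again a sketch for the unweighted median only, under the same assumptions (integer values in `range(n_bins)`, replicated borders) and with hypothetical function names; it is not the GPU kernel of Algorithm 6. `median_row_or` reads only the input image, so rows share no state and could be assigned to separate threads. Each step costs 2(2r+1) histogram updates plus a short walk from the previous median.

```python
def _clamped(img, y, x):
    """Pixel value with replicated borders (assumed border handling)."""
    h, w = len(img), len(img[0])
    return img[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]


def median_row_or(img, y, r, n_bins=256):
    """One output row of a UWM filter: O(r) sliding window plus median tracking."""
    w = len(img[0])
    half = (2 * r + 1) ** 2 // 2  # rank of the median in a full window
    hist = [0] * n_bins
    for yy in range(y - r, y + r + 1):
        for xx in range(-r, r + 1):
            hist[_clamped(img, yy, xx)] += 1
    med, below = 0, 0  # below = number of window pixels with value < med
    while below + hist[med] <= half:
        below += hist[med]
        med += 1
    row = [med]
    for x in range(1, w):
        for yy in range(y - r, y + r + 1):  # 2(2r + 1) updates: O(r) per pixel
            v = _clamped(img, yy, x - r - 1)  # pixel of the leaving column
            hist[v] -= 1
            if v < med:
                below -= 1
            v = _clamped(img, yy, x + r)  # pixel of the entering column
            hist[v] += 1
            if v < med:
                below += 1
        # Median tracking: start from the previous median and move until
        # below <= half < below + hist[med].
        while below > half:
            med -= 1
            below -= hist[med]
        while below + hist[med] <= half:
            below += hist[med]
            med += 1
        row.append(med)
    return row


def median_filter_or(img, r, n_bins=256):
    """Rows are independent, unlike in the O(1) approach with column windows."""
    return [median_row_or(img, y, r, n_bins) for y in range(len(img))]
```

For instance, `median_filter_or([[5, 1, 4, 2, 3]], 1, n_bins=8)` gives `[[5, 4, 2, 3, 3]]`: with a single row and replicated borders, each 3 × 3 window holds three copies of three neighboring values. The list comprehension in `median_filter_or` is where a parallel implementation would distribute rows.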
TABLE II
AVERAGE COMPUTATION TIMES (IN MILLISECONDS) WITH RESPECT TO DIFFERENT IMAGE SIZES FOR GRAYSCALE AND COLOR IMAGES

TABLE III
AVERAGE PSNRs OF IMAGES FILTERED WITH DIFFERENT CHANNEL RADII $r_c$ ON NOISE REDUCTION FOR MULTISPECTRAL IMAGE

TABLE IV
AVERAGE MSEs OF IMAGES FILTERED WITH DIFFERENT ANGULAR RADII $r_a$ ON DISPARITY REFINEMENT FOR LIGHT FIELD IMAGE

Fig. 9. The denoising results for the upper left viewpoint of the estimated disparity in the dataset antinous. (a) The estimated disparity in the initial estimation process of [36]. (b) UWM filter. (c) FWM-O(r) filter. (d) CT-O(1) filter. (e) GPU-O(r) filter with $r_a = 0$. (f) GPU-O(r) filter with $r_a = 4$. (g) Ground truth disparity.
[29] J. Nieweglowski, M. Gabbouj, and Y. Neuvo, "Weighted medians—Positive Boolean functions conversion algorithms," Signal Process., vol. 34, no. 2, pp. 149–161, Nov. 1993.
[30] G. R. Arce, "A general weighted median filter structure admitting negative weights," IEEE Trans. Signal Process., vol. 46, no. 12, pp. 3195–3205, Dec. 1998.
[31] V. Katkovnik, A. Foi, K. Egiazarian, and J. Astola, "From local kernel to nonlocal multiple-model image denoising," Int. J. Comput. Vis., vol. 86, no. 1, pp. 1–32, Jul. 2009.
[32] N. Fukushima et al., "Efficient computational scheduling of box and Gaussian FIR filtering for CPU microarchitecture," in Proc. Asia–Pacific Signal Inf. Process. Assoc. Annu. Summit Conf., Nov. 2018, pp. 875–879.
[33] B. Weiss, "Fast median and bilateral filtering," ACM Trans. Graph., vol. 25, no. 3, pp. 519–526, 2006.
[34] O. Green, "Efficient scalable median filtering using histogram-based operations," IEEE Trans. Image Process., vol. 27, no. 5, pp. 2217–2228, May 2018.
[35] K. Honauer, O. Johannsen, D. Kondermann, and B. Goldluecke, "A dataset and evaluation methodology for depth estimation on 4D light fields," in Proc. Asian Conf. Comput. Vis., Taipei, Taiwan, 2016, pp. 19–34.
[36] K. Mishiba, "Fast depth estimation for light field cameras," IEEE Trans. Image Process., vol. 29, pp. 4232–4242, 2020.
[37] E. Agustsson and R. Timofte, "Ntire 2017 challenge on single image super-resolution: Dataset and study," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. Workshops (CVPRW), Jul. 2017, pp. 1122–1131.
[38] A. Clapes, J. C. S. J. Junior, C. Morral, and S. Escalera, "ChaLearn LAP 2020 challenge on identity-preserved human detection: Dataset and results," in Proc. 15th IEEE Int. Conf. Autom. Face Gesture Recognit. (FG), Nov. 2020, pp. 801–808.
[39] University of Pavia Dataset. Accessed: Mar. 17, 2022. [Online]. Available: http://www.ehu.ews/ccwintoco/index.php?title=Hypespectral_Remote_Sensing_Scenes

Kazu Mishiba (Member, IEEE) received the B.E., M.E., and Ph.D. degrees from Keio University, Yokohama, Japan, in 2004, 2006, and 2011, respectively. In 2006, he joined FUJIFILM Company Ltd. He became an Assistant Professor at Keio University in 2011. He is currently an Associate Professor at Tottori University. His research interests include image processing and computer vision.