Design of a 2D Median Filter with a High Throughput FPGA Implementation

Anish Goel, Student Member, IEEE, M. Omair Ahmad, Fellow, IEEE, and M.N.S. Swamy, Fellow, IEEE
Department of Electrical and Computer Engineering
Concordia University, Montreal, QC, Canada H3G 1M8
email: {an_goe, omair, swamy}@ece.concordia.ca

Abstract — In this paper, a hybrid technique for median filtering of images affected by impulse noise is proposed. Our technique combines impulse noise detection, histogram-based median calculation and bit-plane processing to obtain an approximate median, with the aim of optimizing the throughput at minimum cost to image quality. The proposed median filter is implemented on FPGA with pipelining and is significantly faster than existing FPGA-based pipelined median filter architectures. Implementation of the proposed median filter hardware provides a throughput of 282 Full High Definition (FHD) frames per second on a Zynq-7 FPGA, 48% higher than the throughput of the low-latency median filter. Compared to an FPGA implementation of low-complexity noise removal, the proposed median filter utilizes only 45% of the FPGA slices and provides a speed-up of 2.2 on the Zynq-7 FPGA.

Keywords— Noise Detection, Bit-Planes, Histogram-Based, Pipelining, Approximate Median.

I. INTRODUCTION
Median filtering is one of the most commonly used techniques for removal of impulse noise in images [1]. Defects in the sensing or capturing device, memory corruption and shot noise affect an image in such a way that some pixel values are set to the minimum while others are set to the maximum, a phenomenon that characterizes salt and pepper noise. For recovery of the original image from a corrupted image, several techniques have been proposed in which the best match for each corrupted pixel is calculated. Finding a pixel from the neighborhood, or a close approximation based on the neighborhood pixels, to replace the corrupted pixel is the typical approach in image filtering. This approach is characterized by a 2-Dimensional (2D) window, in which the pixels surrounding the noisy pixel are processed. Several algorithms and techniques have tried to solve the problem of retrieving the corrupted pixels to achieve high quality of filtered images. However, it is worth noting that the actual values of all the corrupted pixels may never be known. This is supported by the fact that none of these algorithms has resulted in an infinite Peak Signal to Noise Ratio (PSNR) between the original and filtered image, in spite of low noise density.

For implementation of median filtering algorithms on hardware platforms such as FPGAs, performance parameters like area, frequency and power are as important as the quality of the filtered image. Of these, the frequency of the design plays a vital role in many real-time applications. The minimum frequency requirement, also known as the ‘pixel clock’, for a real-time Full High Definition (FHD) vision system at a rate of 60 frames per second (fps) is 148.5 MHz [2], which must be satisfied by every block of the vision system. For offline processing of massive data, even higher frequencies are desired.

The throughput of algorithms that use sorting-based median filters [3] [4] is limited, as these architectures use cascaded stages of sorting blocks (compare and swap) for median calculation. A Low Energy Adaptive Median Filter (LEAMF) architecture presented in [5] processes only the higher 4 bits of pixel values for median calculation. Although LEAMF processes only 4 bits of pixel values, its FPGA implementation results in lower throughput compared to the adaptive median filter presented in [4], which utilizes all 8 bits. On the other hand, FPGA implementation of a histogram-based technique [6] results in large area and low operating frequency due to its design complexity. The quality of images filtered by techniques [3-6] is equivalent to the filtered image quality of the Conventional Median Filter (CMF) [7]. In contrast, a median filter based on a low complexity noise removal technique [8] provides high quality of filtered images by filtering only noisy pixels (decision-based filtering). The work presented in [9] is a bit-plane processing architecture that processes N W-bit integers in smaller blocks of B-bits to achieve area and speed optimization. In the proposed work, we design only one block for processing the higher B-bits and ignore the remaining (W-B) lower bits. Although ignoring the lower bits compromises the filtered image quality by a small amount, the improvement achieved in hardware performance is large enough to justify this trade-off.

The quality of a filtered image can be judged qualitatively by visualizing the image and quantitatively by calculating its PSNR. In general, a higher PSNR indicates a better approximation to the original image. Qualitatively, there is no significant loss of information even if some of the pixels of a noisy image are not retrieved to near-original values. Following this concept, techniques based on approximate arithmetic [10] have been implemented to achieve optimization in area and power at a small cost (< 1 dB) in PSNR.
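As an illustration of the quantitative measure mentioned above, the following is a minimal sketch of a PSNR calculation for 8-bit images, assuming the usual definition PSNR = 10·log10(peak² / MSE) with peak = 255. The function name `psnr`, the flat-list image representation and the sample pixel values are hypothetical and are not taken from the paper.

```python
import math


def psnr(original, filtered, peak=255):
    """PSNR in dB between two equally sized sequences of pixel values."""
    if len(original) != len(filtered) or not original:
        raise ValueError("images must be non-empty and of equal size")
    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(original, filtered)) / len(original)
    if mse == 0:
        return math.inf  # identical images: the 'infinite PSNR' case
    return 10 * math.log10(peak ** 2 / mse)


# Hypothetical 4-pixel images, used only to exercise the function.
original = [10, 20, 30, 40]
close = [10, 20, 30, 44]   # one pixel off by 4
far = [10, 20, 90, 44]     # one more pixel badly restored
print(psnr(original, close) > psnr(original, far))  # closer image scores higher
```

Under this definition the PSNR is infinite only when every pixel is restored exactly, which is why a finite PSNR indicates that some corrupted pixels were not recovered to their original values.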

This work was supported in part by the Natural Science and Engineering
Research Council (NSERC) of Canada and in part by the Regroupement
Stratégique en Microélectronique du Québec (ReSMiQ).

978-1-7281-2788-0/19/$31.00 ©2019 IEEE


In this paper we implement a 2D decision-based median filter that optimizes the delay, using a histogram-based technique to calculate the median with only the higher bit-planes of the image. To achieve higher operating frequencies, pipeline stages are made very shallow. We also examine the effects of the proposed technique on other performance parameters like area, power and output image quality.

The rest of this paper is organized as follows: Section II presents the details of the background techniques with examples of median calculation using only the higher B-bits and calculation of the median using the histogram-based technique. The architecture of the proposed median filter is discussed in Section III, along with the architecture of the median filter core. Section IV presents the results on filtered image quality and the results of implementing the proposed median filter architecture on different generations of FPGAs with comparative analysis. The overall conclusion of the work is presented in Section V.

II. BACKGROUND TECHNIQUES

A. Decision Based Median Filtering
Fig. 1 shows a simple modification around the CMF to convert it to a decision-based median filter, where the output pixel is changed to the median of the input window only if the center pixel is found to be corrupted by impulse noise (0 or 255). The decision-based median filter provides better image quality compared to the CMF for de-noising images affected by impulse noise.

Fig. 1. Decision-based Median Filter using CMF

B. Processing Higher B-Bits for Calculating Approximate Median
The higher bit-planes of images consist of the majority of visually significant data and, due to the nature of images, most pixels in most windows have values in a similar range [11]. Hence an approximate median can be calculated using the higher B-bits of pixel values. To calculate the median using the higher B-bits, consider a 3×3 window of an image with 9 pixels represented in hexadecimal in Fig. 2(a). The center pixel in this window is to be filtered by processing the window with B = 4.

In the first step, the lower hex characters (lower 4 bits) of the input pixels are masked. The resulting 3×3 window is shown in Fig. 2(b). The output of the previous step is sorted considering only the higher 4 bits and reading pixels in order, which results in the window in Fig. 2(c). Although the lower nibbles of the pixels are not used for processing or calculation, they are attached to the higher nibble. Sorting using the higher 4 bits results in a 4-bit median value of 0x4. However, an 8-bit output is required, which means one of the three values 0x46, 0x41 and 0x42 is to be selected. Of these values, we select the first value (0x46) in the order of reading input. Although this example refers to ‘Sorting’, our method uses non-sorting based median calculation, discussed in the next section.

Fig. 2. (a) Input window (b) Masking lower nibble (c) Sorted window (d) Processed window

It is experimentally observed that there is no significant difference in the performance (image quality) if any of the three matching values (0x46, 0x41, 0x42) in the above example is chosen as the output. We select the first value in the order of reading input values, as it does not add any overhead to the hardware. In this example, the true median value (0x42) is not chosen as the median, and hence we say that an approximate median is calculated using the higher B-bits. However, for most windows in an image, the actual median is obtained by processing only the higher bits. To demonstrate this experimentally, we calculated the median with a 5×5 window size and different values of B (4, 3 and 2) for several noisy images and video frames from databases [12] and [13]. Fig. 3 shows the graph of window number versus actual and approximate median for one such sample image for different values of B. It is observed from Fig. 3 that the difference between the median obtained with different values of B and the actual median is negligible. In some cases the median values are exactly equal, which shows that, in general, a close approximation of the median is obtained using only the higher B-bits. Numerically, B = 4 provides better results than B = 2 or B = 3.

Fig. 3. Actual Median and Median using Higher B-bits for a sample image

C. Histogram-Based Median Calculation
The number of cascaded stages of sorting blocks in a classical sorting network for median calculation is equal to the number of input values [14], which results in a low frequency hardware implementation. On the other hand, the number of stages in a histogram-based median calculation hardware is independent of the number of input values, resulting in comparatively higher operating frequencies. A histogram-based median filter that reads and processes all input pixels of a window simultaneously is feasible for hardware implementation only if a limited number of bits of the input pixel values is considered. For more bits, the hardware grows exponentially.
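The following is a rough software sketch of the ideas in this section, not the authors' VHDL. It computes a B-bit median with the three histogram stages (compare, count, accumulate), picks the first pixel in reading order whose upper bits match, and applies the decision-based rule on the center pixel. All function names are illustrative, the 3×3 window is hypothetical (chosen only so that three pixels share the median upper nibble, as in the example of Section II-B), and the comparator count assumes one equality comparator per (input, level) pair.

```python
def histogram_median(values, bits):
    """Median of integers in 0 .. 2**bits - 1, found without sorting."""
    levels = 1 << bits
    if not values or any(v < 0 or v >= levels for v in values):
        raise ValueError("need a non-empty list of values that fit in `bits`")
    # Stage I: compare every input with every possible level.
    matches = [[int(v == level) for v in values] for level in range(levels)]
    # Stage II: add the comparison bits of each level (the histogram).
    histogram = [sum(row) for row in matches]
    # Stage III: accumulate the counts until the median position is reached.
    position = (len(values) + 1) // 2  # 5 for 9 inputs, 13 for 25 inputs
    running = 0
    for level, count in enumerate(histogram):
        running += count
        if running >= position:
            return level


def approximate_median(window, bits, width=8):
    """Pixel whose upper `bits` bits equal the median of the upper bits."""
    shift = width - bits
    upper = [p >> shift for p in window]
    target = histogram_median(upper, bits)
    # First pixel in reading order whose upper bits match the B-bit median.
    return next(p for p, u in zip(window, upper) if u == target)


def decision_based_output(window, bits, width=8):
    """Replace the center pixel only if it looks like impulse noise."""
    center = window[len(window) // 2]
    noisy = center in (0, (1 << width) - 1)  # 0 or 255 for 8-bit pixels
    return approximate_median(window, bits, width) if noisy else center


def stage1_comparators(n_inputs, bits):
    """Assumed Stage-I size: one equality comparator per (input, level)."""
    return n_inputs * (1 << bits)


# Hypothetical 3x3 window in reading order; the center pixel (0xFF) is noisy.
window = [0x12, 0x46, 0x9A, 0x41, 0xFF, 0x23, 0x42, 0xB0, 0x35]
print(hex(decision_based_output(window, bits=4)))          # 0x46
print(hex(sorted(window)[4]))                              # true median: 0x42
print(stage1_comparators(25, 4), stage1_comparators(25, 8))  # 400 6400
```

For this window the sketch returns 0x46 while the true median is 0x42, mirroring the approximate result discussed in Section II-B. Under the stated assumption, going from B = 4 to B = 8 for a 5×5 window raises the Stage-I comparator count from 400 to 6400, which illustrates why only a few upper bits are processed.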

Fig. 4. Histogram-Based Median Calculation

Fig. 4 shows median calculation using the histogram technique with B = 3 as an example. In the first step, all input values are compared with all possible values (0-7) that an input may take, forming Stage-I of the median calculation block. In the next step, the comparison results of Stage-I are added together to calculate the histogram of the input data, which forms Stage-II. The Stage-II outputs (the count of each value) are added successively and, after each addition, the result is compared with the value ‘5’ (position of the median for 9 inputs), forming Stage-III of the architecture. The value in the range 0-7 whose count was added to make the comparison true is the median.

III. PROPOSED ARCHITECTURE
We implement the proposed hybrid median filter for a 5×5 window and serially process each window for decision-based median filtering using histogram calculation with B = 4. As the input pixels are available serially, a buffer stores four rows and five pixels of the next row of the image and then provides a 5×5 window to the median filter block in every clock cycle.

A. Proposed Hybrid Median Filter Architecture
Fig. 5 shows the block diagram of the proposed hybrid median filter that processes N W-bit integers. Out of the W bits, B bits are processed using the histogram-based technique to calculate a B-bit median value by the median filter core. Using the B-bit median, the W-bit median is determined from the input window. Based on the decision made by the impulse noise detector, either the W-bit median or the center pixel is selected as the output. The hardware implementation of the proposed hybrid median filter uses pipeline registers between all stages.

Fig. 5. Proposed Hybrid Median Filter Architecture

B. Median Filter Core
For simplified representation, the block diagram of the median filter core for B = 2 and N = 5 is shown in Fig. 6. For N = 5, the value of the last stage comparators is ‘3’ (position of the median for 5 inputs). The hardware design consists of 3 stages with pipeline registers between these stages. The structure of the median filter core is the same for higher values of B and N, with more comparators and adders.

The operating frequency of the median filter core depends on the critical path delay, which is the delay of Stage-II, as it accumulates all comparison bits (25 bits to be added for a 5×5 window). To reduce this delay, we exploit the concept of super-pipelining by adding registers inside the adder stage (Stage-II).

Fig. 6. Median Filter Core with B = 2 and N = 5

IV. RESULTS

A. Image Quality
To analyze the quality of images filtered by the proposed hybrid median filter, salt and pepper noise of varying densities is added to images from database [12] and FHD video frames from database [13]; the noisy images are then filtered using the proposed technique and the PSNR is calculated. The average PSNR of the images filtered by the proposed median filter is at least 8 dB greater than the average PSNR of images filtered by median filters [3-6] and is less than the average PSNR of images filtered by the Low Complexity Median Filter (LCMF) [8] by approximately 0.7 dB.

B. Hardware Evaluation
To evaluate the proposed hybrid technique on hardware, we designed and implemented it using VHDL for B = 4 and N = 25 along with supporting buffers, logic and control on a Zybo Z7

evaluation board equipped with a Xilinx XC7Z010 SoC. Real-time input to the complete system was provided using an APEMAN 1080P FHD camera capturing video at 30 fps with a resolution of 1080×1920, and the frames were fed to the median filter block sequentially, with the processed output visualized on a monitor via the HDMI interface as seen in Fig. 7.

Fig. 7. Evaluation of Proposed Hybrid Median Filter on Zybo Z7 Board

C. Experimental Results
The hardware implementations are verified using RTL simulations, and the VHDL code for the proposed hybrid median filter is mapped to Virtex-2, Virtex-6 and Zynq-7 FPGAs. All experiments were performed on Xilinx Vivado Design Suite. Implementation results of the proposed filter are compared with FPGA implementations of other relevant pipelined median filters for the same window size (5×5). These results are presented in Table I. The throughput Tp in fps is calculated using (1) from the operating frequency fmax (MHz) obtained from the timing summary report after inserting the timing constraints. The pipeline latency ‘δ’ is very small and may be ignored. Fs is 1080×1920, the frame size for FHD video frames. The throughput obtained from (1) is an estimate. The reference implementations also estimate the throughput of their architectures using similar calculations.

Tp = [(fmax × 10^6) + δ] / Fs        (1)

TABLE I. IMPLEMENTATION RESULTS WITH COMPARISON

FPGA        Architecture   # Slices   Frequency (MHz)   Throughput (fps)
Virtex-2    [4]            1506       305               140
            [5]            366        140               56
            [6]            2300       72                35
            [8]            2718       101               48
            Proposed       1688       384               185
Virtex-6    [5]            136        263               105
            [8]            1713       169               91
            Proposed       527        531               256
Zynq-7      [3]            1550       394               190
            [8]            1990       261               125
            Proposed       900        586               282

It is observed from Table I that the proposed hybrid median filter architecture provides the highest throughput compared to the reference architectures. Although the proposed median filter hardware utilizes more area than some architectures, the throughput achieved in such cases is significantly higher. In comparison with LCMF [8], the proposed median filter requires substantially less area while providing a significantly higher operating frequency, leading to a high throughput on all three FPGAs. Thus, there is a trade-off between image quality and hardware performance when the proposed median filter is compared with LCMF.

For comparing performance with LEAMF [5], the energy required by the proposed architecture is calculated for noisy FHD video frames by importing the simulation activity file in the Xilinx XPower analyzer for the Virtex-6 implementation. It is observed that the proposed median filter consumes only 30% extra energy compared to LEAMF while providing better image quality and a speed-up of 2.4.

V. CONCLUSION
This paper presented a hybrid technique for high speed filtering of images affected by impulse noise. The proposed hybrid median filter performs significantly better than many relevant architectures in terms of image quality as well as hardware performance. We validated the proposed hybrid median filter by designing and testing the complete system on a Xilinx Zynq-7 SoC. Experimental results show that the proposed median filter hardware can work at a frequency of 586 MHz when implemented on a Zynq-7 FPGA, with an estimated throughput of 282 FHD frames per second.

REFERENCES
[1] Tai, Y. M., Tu, W. C. and Chien, S. Y., “VLSI Architecture Design of Layer-Based Bilateral and Median Filtering for 4k2k Videos at 30fps”, 2017 International Symposium on Circuits and Systems.
[2] Kowalczyk, A., Przewlocka, D. and Kryjak, T., “Real-time Implementation of Contextual Image Processing Operations for 4K Video Stream in Zynq UltraScale+ MPSoC”, 2018 Conference on Design and Architectures for Signal and Image Processing (DASIP).
[3] Kumar, V., Asati, A. and Gupta, A., “Low-latency Median Filter Core for Hardware Implementation of 5 × 5 Median Filtering”, IET Image Processing, 2017, Vol. 11, Issue 10, pp. 927-934.
[4] Vasicek, Z. and Sekanina, L., “Novel Hardware Implementation of Adaptive Median Filters”, 11th IEEE Workshop on Design and Diagnostics of Electronics Circuits and Systems, Apr. 2008.
[5] Kalali, E. and Hamzaoglu, I., “A Low Energy 2D Adaptive Median Filter Hardware”, Design, Automation & Test in Europe Conference & Exhibition (DATE), Grenoble, France, 2015.
[6] Fahmy, S. A., Cheung, P. Y. K. and Luk, W., “High-throughput One Dimensional Median and Weighted Median Filters on FPGA”, IET Computers & Digital Techniques, Vol. 3, Issue 4, July 2009.
[7] T. Nodes and N. Gallager, “Median filters: Some Modifications and their Properties”, IEEE Trans. Acoustics, Speech and Signal Processing, vol. ASSP-30, no. 5, pp. 739-746, Oct. 1982.
[8] T. Matsubara, V. Moshnyaga and K. Hashimoto, “A FPGA Implementation of Low-Complexity Noise Removal”, in Proc. of 17th International Conference on Electronics, Circuits, and Systems (ICECS), pp. 255-258, 2010.
[9] Cadenas, J. O., Megson, G. M. and Sherratt, R. S., “Median Filter Architecture by Accumulative Parallel Counters”, IEEE Transactions on Circuits and Systems-II: Express Briefs, Vol. 62, No. 7, pp. 661-665, July 2015.
[10] Sekanina, L., Vasicek, Z. and Mrazek, V., “Approximate Circuits in Low-Power Image and Video Processing: The Approximate Median Filter”, Radio Engineering, Vol. 26, No. 3, September 2017.
[11] R. C. Gonzalez and R. E. Woods, Eds., Digital Image Processing, 3rd ed. Englewood Cliffs, NJ, USA: Prentice-Hall, 2008.
[12] Image Database: http://www.eecs.qmul.ac.uk/~phao/IP/Images/
[13] Video Database: https://media.xiph.org/video/derf/
[14] Batcher, K. E., “Sorting Networks and their Applications”, Spring Joint Computer Conference, 1968.
