Parallel and Pipelined 2-D Median Filter Architecture
IEEE EMBEDDED SYSTEMS LETTERS, VOL. 10, NO. 3, SEPTEMBER 2018
Abstract—The existing 2-D median filters in the literature are computationally intensive. This letter proposes to optimally reduce the amount of data handled in the architecture-level realization of the basic median filtering operation on images. The proposed architecture reads 4 pixels of the input image at a time, the 4 pixels forming one word on a 32-bit hardware processing system; the subsequent processing is carried out by a parallel and pipelined median filter architecture. Two read operations process eight input pixels, which results in the generation of four output pixels after an initial latency. The proposed architecture offers a reduced number of read operations and increased speed.

Index Terms—Median filter, parallel median filter, pipelined median filter, systolic arrays.

I. INTRODUCTION

Image processing is integral to deciphering the intelligence associated with images. Generally, image processing involves handling huge amounts of data, and modeling human intelligence in automated processes demands a heavy amount of computation. Handling images involves loss of original information at various levels, through both the environment and the processing stages. One such level is the processing of an image for feature extraction [1]. Before this, an image corrupted by non-Gaussian noise is preprocessed to retrieve the original information, mostly by nonlinear digital filters, dominant among which are median-based filters.

The median filter provides robustness to impulse noise; however, the development of median filtering algorithms has not addressed the requirements of real-time intelligence systems. Generic processing ICs do not provide a cost-effective solution for image processing because of their predefined architectural limitations [2]. In addition, there is always a tradeoff between the quality of information contained in images and the resources required to handle them. FPGAs are sufficiently flexible and cost effective for prototyping and reconfiguring applications [3] and, therefore, provide ample opportunity to develop application-specific architectures that cater to real-time requirements. A basic median filter can be implemented on such architectures by performing the median operation through sorting-based systolic arrays [4] or nonsorting-based techniques [5]–[8]. Researchers continue to optimize systolic arrays [9]–[11]. The architecture developed by Vega-Rodríguez et al. [4] exploits 32-bit data-width hardware for image data transfer; it is implemented as the systolic array reported in [10].

The architecture proposed in this letter handles image data efficiently: a pixel in the input image is read only twice, whereas in the architectures reported in the literature a pixel is read three or more times. Reducing the number of read operations needed to filter an image reduces the overall operating time. The proposed architecture can also be employed in different word-length realizations, programmable digital signal processors, ASICs, and reconfigurable architectures. In addition, it does not impose any constraint on the time required for reading and processing pixels.
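The two-versus-three read counts in the preceding paragraph can be illustrated with a simple counting model. This section does not detail how the proposed architecture reaches two reads per pixel, so the Python sketch below is only a hypothetical model, not the authors' design: it assumes a 3 × 3 filter that processes the image in horizontal bands, producing `band_rows` output rows per pass from the `band_rows + 2` input rows they depend on, and counts how often each input pixel is fetched. The names `pixel_read_counts`, `max_reads`, and `band_rows` are illustrative.

```python
def pixel_read_counts(height, width, band_rows):
    """Count fetches of each input pixel for a 3x3 filter processed in bands.

    Hypothetical model: each pass produces `band_rows` output rows and must
    fetch the `band_rows + 2` input rows those outputs depend on.
    """
    if band_rows < 1:
        raise ValueError("band_rows must be at least 1")
    counts = [[0] * width for _ in range(height)]
    # Border rows 0 and height-1 are not filtered in this model, so the
    # output rows are 1 .. height-2, split into bands of band_rows rows.
    for top in range(1, height - 1, band_rows):
        end = min(top + band_rows, height - 1)  # exclusive end of the band
        # Output rows top .. end-1 need input rows top-1 .. end.
        for r in range(top - 1, end + 1):
            for c in range(width):
                counts[r][c] += 1
    return counts


def max_reads(height, width, band_rows):
    """Largest number of times any single input pixel is fetched."""
    return max(max(row) for row in pixel_read_counts(height, width, band_rows))


print(max_reads(12, 8, 1))  # one output row per pass: prints 3
print(max_reads(12, 8, 2))  # two output rows per pass: prints 2
```

The model only shows why sharing fetched rows across several output rows lowers the per-pixel read count; whether CPMA attains its two reads per pixel in this way is not stated here.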
II. MEDIAN FILTER MEETING REAL-TIME REQUIREMENTS

Vega-Rodríguez et al. [4] presented an architecture for the basic median filter based on the systolic array implementation of [10]. The architecture employs pipelining and parallelism and is implemented with an FPGA as the target device. The FPGA is interfaced with a computer through a 32-bit PCI port for real-time interfacing and better human interaction. Every read instruction on a 32-bit system can read 4 pixels, each 8 bits wide. Reading multiple pixels in a single read cycle, together with parallelism in the systolic array, leads to the simultaneous production of four filtered pixels. The parallel and pipelined median filter architecture is shown in Fig. 1. Smith’s network [10] introduced parallelism and pipelining by splitting the nine-level systolic array into two stages. The first stage in Fig. 1 is called the elementary sorting stage, also known as the E-stage. The next stage is called the network sorting stage, also known as the N-stage, with six levels of

Manuscript received September 22, 2017; revised November 3, 2017; accepted November 5, 2017. Date of publication November 7, 2017; date of current version September 7, 2018. This work was supported in part by the Project “Capacity Building in the Areas of EPDPT” of the Ministry of Electronics and Information Technology, Government of India, implemented by NIELIT Chennai Centre, and in part by the Scholarship Scheme “Visvesvaraya Ph.D. Scheme for Electronics and IT” of the Ministry of Electronics and Information Technology, Government of India, through VIT University Chennai Campus, Chennai, India. This manuscript was recommended for publication by D. Sciuto. (Corresponding author: Janarthanam Subramaniam.)

J. Subramaniam is with SENSE, VIT University Chennai Campus, Chennai 600127, India, and also with VLSI and Embedded Systems Group, NIELIT Chennai, Chennai 600025, India (e-mail: [email protected]).

R. J. Kannan is with SCSE, VIT University Chennai Campus, Chennai 600127, India (e-mail: [email protected]).

D. Ebenezer is with the Department of ECE, Anna University, Chennai 600025, India (e-mail: [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/LES.2017.2771453
1943-0663 © 2017 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
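Section II combines two ingredients: a 32-bit read that returns four 8-bit pixels, and a median network split into an E-stage and an N-stage. The Python sketch below puts them together under stated assumptions. It assumes the first pixel sits in the least-significant byte of a word, and it uses the widely circulated column-sort formulation of the 3 × 3 median: sort each column of three pixels (taken here as the E-stage), then take the median of the largest column minimum, the median of the column medians, and the smallest column maximum (taken here as the N-stage). Because the section is cut off, we cannot confirm that this matches Smith’s network [10] in detail; it does give three compare-exchange levels in the first stage and six in the second, which is consistent with the nine-level split described. All function names are hypothetical.

```python
def unpack_word(word):
    """Split a 32-bit word into four 8-bit pixels.

    Assumption: the first pixel sits in the least-significant byte.
    """
    return [(word >> (8 * i)) & 0xFF for i in range(4)]


def sort3(a, b, c):
    """Sort three values with compare-exchange steps (a,b), (b,c), (a,b)."""
    if a > b:
        a, b = b, a
    if b > c:
        b, c = c, b
    if a > b:
        a, b = b, a
    return a, b, c


def n_stage(s0, s1, s2):
    """Median of a 3x3 window from its three column-sorted triples."""
    max_low = max(s0[0], s1[0], s2[0])       # largest column minimum
    med_mid = sort3(s0[1], s1[1], s2[1])[1]  # median of column medians
    min_high = min(s0[2], s1[2], s2[2])      # smallest column maximum
    return sort3(max_low, med_mid, min_high)[1]


def medians_from_words(top, mid, bottom):
    """Medians of the two complete 3x3 windows held in three 4-pixel words.

    Each word holds four horizontally adjacent pixels of one image row.
    """
    rows = [unpack_word(w) for w in (top, mid, bottom)]
    # E-stage (assumed): sort each of the four columns once.
    cols = [sort3(rows[0][c], rows[1][c], rows[2][c]) for c in range(4)]
    # N-stage (assumed): windows centred on columns 1 and 2 reuse the
    # sorted columns they share instead of re-sorting them.
    return [n_stage(cols[c - 1], cols[c], cols[c + 1]) for c in (1, 2)]


# Rows [1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], least-significant byte first.
print(medians_from_words(0x04030201, 0x08070605, 0x0C0B0A09))  # prints [6, 7]
```

With only four columns in hand, the sketch yields two complete windows; producing four outputs per step, as described for Vega’s architecture, would also require the neighbouring column on each side. Sorting each column once and reusing it across adjacent windows is where the column-sort formulation saves comparators.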
Authorized licensed use limited to: NATIONAL INSTITUTE OF TECHNOLOGY CALICUT. Downloaded on March 09,2020 at 04:29:59 UTC from IEEE Xplore. Restrictions apply.
TABLE I
3 × 3 MEDIAN FILTER IN XILINX FPGA VIRTEX 4 XC4VSX25 (N = 9, INPUT SAMPLE WIDTH = 8 BITS)
TABLE II
3 × 3 MEDIAN FILTER IN XILINX FPGA VIRTEX 7 XC7VX330T

low hardware complexity pipelined rank filter (LCBP) [5] and Cadenas’s method [8]. However, Cadenas’s method has fewer latency cycles. FM-WCA uses 7% fewer DFFs and has a smaller latency than Smith’s network. LCBP, Cadenas’s method, Smith’s network, and FM-WCA are implemented as pipelined architectures; their throughput is the same and is not improved at a given clock speed. Parallelism with pipelining is introduced in Vega’s architecture and CPMA to improve throughput at 454 MHz. The introduction of parallelism reduces the amount of image data these architectures must handle: Vega’s architecture handles only 33.3% of the image data handled by the pipelined methods, and CPMA handles only 66.6% of the image data handled by Vega’s architecture. It is evident from Table I that each pixel of the input image is read only twice in the complete filtering process. CPMA requires fewer resources per unit of throughput than Vega’s architecture, and its speed of 1816 MHz is significantly higher than that of FM-WCA and the other methods. Table II presents implementations of the sorting-based methods for evaluating resource utilization and speed on a state-of-the-art prototyping platform. The Xilinx Virtex 7 FPGA implementation further confirms the performance of CPMA: CPMA uses at least 30% fewer hardware resources than FM-WCA and offers four times its throughput. Similarly, CPMA uses 2.5% fewer logic resources and handles 33.3% less image data for the same throughput than the architecture in [4].

V. CONCLUSION

Computationally intensive median filtering algorithms are a challenge in the context of real-time processing. For the basic median filter, this letter optimizes the amount of data handled at the architecture level by applying pipelining and parallelism to an existing systolic array. While the number of filter processing cycles is fixed by the size of the image to be filtered, median filtering with the proposed architecture requires less data handling. The proposed architecture saves a significant amount of resources and of the time required for median filtering of images. Further optimization of pipelining and parallelism may lead to further improvement in real-time processing.

REFERENCES

[1] G. A. Baxes, Digital Image Processing: Principles and Applications. New York, NY, USA: Wiley, 1994.
[2] M. J. S. Smith, Application-Specific Integrated Circuits. Reading, MA, USA: Addison-Wesley, 2008.
[3] S. Hauck, “The roles of FPGAs in reprogrammable systems,” Proc. IEEE, vol. 86, no. 4, pp. 615–638, Apr. 1998.
[4] M. A. Vega-Rodríguez, J. M. Sánchez-Pérez, and J. A. Gómez-Pulido, “An FPGA-based implementation for median filter meeting the real-time requirements of automated visual inspection systems,” in Proc. 10th Mediterr. Conf. Control Autom., 2002.
[5] D. Prokin and M. Prokin, “Low hardware complexity pipelined rank filter,” IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 57, no. 6, pp. 446–450, Jun. 2010.
[6] J. Cadenas, G. M. Megson, R. S. Sherratt, and P. Huerta, “Fast median calculation method,” Electron. Lett., vol. 48, no. 10, pp. 558–560, May 2012.
[7] J. Cadenas, “Pipelined median architecture,” Electron. Lett., vol. 51, no. 24, pp. 1999–2001, Nov. 2015.
[8] J. O. Cadenas, G. M. Megson, and R. S. Sherratt, “Median filter architecture by accumulative parallel counters,” IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 62, no. 7, pp. 661–665, Jul. 2015.
[9] B. Morcego, J. Frau, and A. Català, “Suavizado de imágenes en tiempo real mediante filtrado por mediana utilizando arrays sistólicos,” in Proc. VII DCIS, Toledo, Spain, 1992, pp. 545–546.
[10] J. L. Smith, “Implementing median filters in XC4000E FPGAs,” Xilinx Xcell, vol. 23, no. 1, p. 16, 1996.
[11] J. Subramaniam, J. K. Raju, and D. Ebenezer, “Fast median-finding word comparator array,” Electron. Lett., vol. 53, no. 21, pp. 1402–1404, Dec. 2017.