A Real-Time Implementation of FPGA Hardware for MPEG Artifact Reduction for Video Applications



Charayaphan Charoensak
Digital Video Processing, Architecture and Standard Design, Philips Consumer Electronics
620A Lorong 1, Toa Payoh, Singapore 319762
emails: Charayaphan.Charoensak@Philips, [email protected]

Farook Sattar
School of Electrical and Electronic Engineering, Nanyang Technological University
Nanyang Avenue, Singapore 639798
email: [email protected]

ABSTRACT

This paper presents an efficient hardware architecture for the implementation of real-time MPEG artifact reduction. The MPEG artifact reduction, or deblocking, implemented here is based on a modified Bilateral low-pass filter. The Bilateral filter is a type of non-iterative filter that preserves edge information. When applied to images compressed with BDCT-based compressors, it results in improved visual quality without over-smoothing of the image [1],[2]. We propose a modified Bilateral filter (BF) that is sensitive to the activity across the orthogonal block boundaries and is thus suitable for deblocking applications. The proposed architecture demonstrates a good compromise between filtering performance and FPGA resource requirements. The architecture was prototyped in hardware using an FPGA (Field Programmable Gate Array). FPGA design and simulation were carried out using a system-level design tool.

1. INTRODUCTION

Today, digital media are widely used for storage and transmission of video information. Many efficient and standardized video compression formats exist for various applications, such as H.261, H.263, and MPEG-1/2/4. These compression formats are based on the block-based discrete cosine transform (BDCT). BDCT is commonly used because of its near-optimum energy compaction and its fast algorithm for hardware implementation. Most compression standards use an 8x8 block discrete cosine transform (DCT). At high compression ratios, the BDCT method suffers from artifacts including blocking, ringing, and mosquito noise.

Several deblocking algorithms [3],[4] have been developed and reported in publications. In video applications such as television sets, the digital data stream passes through many processing stages such as scaling, luminance transient improvement (LTI), and motion blur reduction. Thus, the blocking artifact may no longer appear as 8x8 blocks, and the block boundaries cannot be specified accurately. A practical post-processing deblocking algorithm should therefore not be based on a fixed block size and fixed block locations.

Methods for reducing the blocking artifacts may be grouped into three categories according to their means of reconstruction. The first category uses low-pass filtering [5]. The second category involves statistical estimation [6]. The last category involves set-theoretic reconstruction [7], which defines constraint sets from observed data and tries to reconstruct the original image by projecting onto convex sets (POCS). The last two categories require iterations, which are not practical for real-time processing because of the memory required for storing video data.

There is an increasing demand for high-definition (HD) picture quality in consumer television, including full-HD television sets, HDTV, and Blu-ray Disc. High-resolution display technologies make the MPEG artifacts more visible. Increasing the bit rate of the data stream in order to improve the picture quality is typically not possible. Post-processing is the most feasible solution because it does not require any modification to the existing compression standards. The high computation power of hardware-based circuits such as FPGAs allows real-time processing at a reasonable cost.

This paper presents our work on the hardware architecture of an FPGA-based circuit for MPEG artifact reduction suitable for video applications. The algorithm is based on a modified Bilateral filter (BF). The BF offers edge-preserving smoothing of the image and requires no iteration. The modified Bilateral filter discussed here simplifies the hardware while remaining sensitive to the activity across the orthogonal boundaries around the block boundaries. The measurement of the activity is also used for adapting the BF parameters to different levels of artifacts. The result is improved visual quality without over-smoothing the image details and sharp edges in the image. After MATLAB simulations, the final verification of the design was carried out using a system-level tool called System Generator from Xilinx [8].

2. BILATERAL FILTER AS AN IMAGE ESTIMATOR

The Bilateral filter was first introduced by Smith and Brady under the name “SUSAN” [9] and was later referred to as the “Bilateral filter” [10]. The filter replaces each pixel by a weighted average of its neighbors. The weight assigned to each neighbor decreases with both the distance in the image plane and the distance on the intensity axis. Thus, it is a form of adaptively weighted moving-average filter:

$$ \hat{x}_l = \frac{\sum_{k \in \eta_l} y_k \, w_1(|y_l - y_k|) \, w_2(\|l - k\|)}{\sum_{k \in \eta_l} w_1(|y_l - y_k|) \, w_2(\|l - k\|)} \tag{1} $$

Here, $y_l$ and $\hat{x}_l$ are the filter input and output values respectively, $l$ and $k$ are 2D coordinates of the image pixel locations, $\eta_l$ is a neighborhood around $l$, $\|\cdot\|$ denotes the Euclidean distance, and w1(.) and w2(.) are weight functions. w1(.) is a function of the absolute difference of brightness values and w2(.) is a function of Euclidean distance. The weight functions are usually chosen as a Gaussian for w2(.) and an exponential for w1(.).

The estimate $\hat{x}$ of the original signal is computed from a distorted signal $y = x + n$, where $n$ is uncorrelated noise. The least mean-square (LMS) estimate is obtained by the conditional expectation

$$ \hat{x} = E\{x \mid y\} \tag{2} $$

and the linear solution of this problem is the Wiener filter. Similarly, a locally adaptive Wiener filter is expressed as:

$$ \hat{x}_l = r_{x_l y} \, R_{yy}^{-1} \, y \tag{3} $$

Here, the pixels in $y$ located around position $l$ are denoted $\zeta$. The correlation $\rho_{xy}$ is defined such that a high value indicates that the observation belongs to the same structure, and a low value indicates that it does not. This correlation within a structure is expressed as $\rho_{xy}^2 = (\sigma / \sigma_x)^2$, where $\sigma$ denotes the noise variance and $\sigma_x$ the signal variance. Typically, $\rho_{xy}$ is close to 1 and we may assume:

$$ r_{x_l y} = \rho_{xy} \ \text{if } y_k \in \zeta, \ \text{else } 0. \tag{4} $$

Since the observations are corrupted by noise, we may introduce the probability $P(\zeta \mid y, k)$ that an observed value $y$ at location $k$ belongs to $\zeta$. If we assume constant variance and uncorrelated observations, a formulation similar to the Bilateral filter can be derived. $R_{yy}$ as well as its inverse are then diagonal with constant entries. The Wiener filter may then be implemented by the conditional average:

$$ \hat{x}_l = \frac{1}{K} \sum_{k \in \eta_l} y_k \, P(\zeta \mid y, k) \tag{5} $$

where $K = \sum_{k \in \eta_l} P(\zeta \mid y, k)$ is the normalization factor and $\rho_{xy}$ is constant. Applying Bayes' rule to $P(\zeta \mid y, k)$, equation (5) may be expressed in the form of the Bilateral filter:

$$ \hat{x}_l = \frac{1}{K} \sum_{k \in \eta_l} y_k \, \frac{p(y \mid \zeta, k)}{p(y \mid k)} \, P(\zeta \mid k), \tag{6} $$

where

$$ w_1(|y_l - y_k|) \triangleq \frac{p(y \mid \zeta, k)}{p(y \mid k)} \tag{7} $$

$$ w_2(\|l - k\|) \triangleq P(\zeta \mid k) \tag{8} $$

are the correspondences between the image estimator and the Bilateral filter. Thus, the Bilateral filter may be used as an efficient image estimator.

3. PROPOSED HARDWARE ARCHITECTURE FOR MPEG ARTIFACT REDUCTION

We showed in the last section that the Bilateral filter may be used as an efficient image estimator and, in our application, for MPEG noise reduction. In this section, we propose an architecture for effective MPEG artifact reduction. Our design goals, which include low resource requirements, are:

1. It should effectively reduce MPEG artifacts, namely blocking and mosquito noise, with minimal reduction in picture sharpness.
2. The deblocking algorithm should not depend on a fixed size and location of the block boundaries. This is because, in practice, the block boundaries cannot be assumed to be fixed.
3. It should be adaptive in removing noise at different levels.
4. It should be simple to realize in hardware. It should not require a frame buffer. Thus, the hardware will be practical for real-time video post-processing using low-cost hardware.

The Bilateral filter is a good choice for our application because it is non-iterative, robust, and relatively simple. The proposed modified Bilateral filter takes into account the activity across block boundaries and is more effective at reducing MPEG blockiness. The measurement of the activity across block boundaries can easily be used to adjust the BF parameters according to the actual level of MPEG artifacts existing in the image.

As shown in equation (1), the filtered output of the Bilateral filter is a function of two weights. w1(.) is a function of the absolute difference of the brightness values at the two locations, and w2(.) is a function of the Euclidean distance between the two locations. In our application, the typical Gaussian function is used for the weight w2(.).
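
As a point of reference, equation (1) can be evaluated directly in software. The following is a minimal sketch, not the FPGA design described in this paper: it uses the conventional Gaussian for w2(.) and an exponential for w1(.). The 7x7 window and the variance of 6 are the values reported in Section 4; the function name and the `range_scale` parameter of the exponential are assumptions made for illustration only.

```python
import numpy as np

def bilateral_filter(y, radius=3, sigma_s2=6.0, range_scale=10.0):
    """Direct evaluation of equation (1) on a 2-D grayscale array.

    radius=3 gives a 7x7 neighborhood; sigma_s2 is the variance of the
    spatial Gaussian w2. range_scale is a hypothetical decay constant
    for the exponential range weight w1.
    """
    y = np.asarray(y, dtype=np.float64)
    rows, cols = y.shape

    # Spatial weight w2: Gaussian of the Euclidean distance ||l - k||.
    ax = np.arange(-radius, radius + 1)
    dy, dx = np.meshgrid(ax, ax, indexing="ij")
    w2 = np.exp(-(dx ** 2 + dy ** 2) / (2.0 * sigma_s2))

    # Replicate border pixels so every pixel has a full neighborhood.
    padded = np.pad(y, radius, mode="edge")
    out = np.empty_like(y)
    size = 2 * radius + 1
    for r in range(rows):
        for c in range(cols):
            patch = padded[r:r + size, c:c + size]
            # Range weight w1: exponential decay in |y_l - y_k|.
            w1 = np.exp(-np.abs(patch - y[r, c]) / range_scale)
            weights = w1 * w2
            # Numerator and denominator of equation (1).
            out[r, c] = np.sum(weights * patch) / np.sum(weights)
    return out
```

Because the output is a normalized weighted average, it always lies between the minimum and maximum of the neighborhood, and a sharp edge much larger than `range_scale` passes through almost unchanged.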

To simplify the hardware, the weight function w1(.) is defined as a step function. The transition of the step function depends on the sum of absolute differences of pixels along the horizontal and vertical directions across the centre pixel, scaled by the measurement of the average activity along the block boundaries in the whole image. Fig. 1 shows the pixel coordinate system, the horizontal and vertical block boundaries, and the pixels defined for the measurement of activity around the boundaries. Note that, for illustration purposes, the block size is shown as 8x8. However, in a typical application, the block size and location are not known. Here, h1 to h8 are the pixels used for measurement of activity along the horizontal boundaries and v1 to v8 are used for the vertical boundaries. H1 to H4 refer to pixels from the adjacent block below. The measurement of activity along the bottom horizontal boundary is computed from pixels h5 to h8 and H1 to H4:

$$ \Delta h_2 = \mathrm{abs}\big( (h5 - H1) + (h6 - H2) + (h7 - H3) + (h8 - H4) \big) \tag{9} $$

When the DC coefficient of the DCT used in the upper block is significantly different from that of the lower block, the four difference pairs, h5-H1, h6-H2, h7-H3, and h8-H4, will exhibit four offset values of the same sign and of significant magnitude. Thus, the measurement $\Delta h_2$ will be a large positive number indicating the activity on the lower horizontal boundary. Similarly, we define the activity of the upper horizontal boundary:

$$ \Delta h_1 = \mathrm{abs}\big( (h1 - H5) + (h2 - H6) + (h3 - H7) + (h4 - H8) \big) \tag{10} $$

Similar definitions are used for measurements of activity on the left and right vertical boundaries, $\Delta v_1$ and $\Delta v_2$.

If the summation of the four activity measurements, $\Delta h_1 + \Delta h_2 + \Delta v_1 + \Delta v_2$, is higher than a threshold $\lambda$, then the function w1(.) is zero, otherwise 1:

$$ w_1(\cdot) = \begin{cases} 0, & \text{when } \Delta h_1 + \Delta h_2 + \Delta v_1 + \Delta v_2 > \lambda \\ 1, & \text{otherwise.} \end{cases} $$

The threshold $\lambda$ is a function of the average measurement of the boundary activity across the whole image. Replacing the exponential function with a step function for the weight w1(.) removes the need for a multiplier and a look-up table for the exponential function. This results in a simplified circuit as well as a shorter critical path, and thus a higher operating frequency.

Fig. 1 Boundaries, and pixels used for measurement of activity

4. HARDWARE PROTOTYPE DESIGN FOR MPEG ARTIFACT REDUCTION

This section describes the FPGA design of the prototype circuit that implements the modified Bilateral filter for MPEG deblocking. An integrated system-level environment called System Generator from Xilinx [8] is used. Using System Generator, the FPGA design and simulation are carried out using Simulink and Xilinx blocks. The FPGA functional simulation is done in the MATLAB Simulink environment. After successful simulation, synthesizable VHDL code is automatically generated from the models for the final FPGA implementation.

The modified Bilateral filter described in the previous section was implemented. For the spatial weight w2(.), a Gaussian function of variance $\sigma^2 = 6$ and a 7x7 convolution kernel were used. These values proved to be a good trade-off between hardware complexity and deblocking performance. The top-level FPGA design for MPEG deblocking is shown in Fig. 2. Note that the design contains two subsystems labeled “Virtex2 7 Line Buffer” and “Filter”. The more detailed design of the subsystem “Filter”, which implements the modified Bilateral filter, is shown in Fig. 3. Notice the sub-systems “sumval”, “sumweight”, and “7-tap BF filter”. The sub-systems “sumval” and “sumweight” perform the summations

$$ \sum_{k \in \eta_l} y_k \, w_1(|y_l - y_k|) \, w_2(\|l - k\|) \tag{11} $$

and

$$ \sum_{k \in \eta_l} w_1(|y_l - y_k|) \, w_2(\|l - k\|) \tag{12} $$

in equation (1) respectively. The Xilinx CORDIC divider block shown in Fig. 2 is used to perform the division by the sum of weights mentioned above. The sub-system “7-tap BF filter” is a MAC-based implementation of the 7x7 modified BF as explained in the last section.
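
The boundary-activity measurement of equations (9) and (10) and the resulting step weight can be sketched in a few lines. This is an illustrative software model only, under stated assumptions: the function names are hypothetical, the threshold `lam` is passed in directly because the text does not give the exact function that derives it from the average boundary activity, and the step direction follows the prose description (zero when the summed activity exceeds the threshold).

```python
import numpy as np

def boundary_activity(inner, outer):
    """Absolute value of the summed signed differences across a boundary.

    `inner` holds the four pixels on this side of the boundary
    (for example h5..h8) and `outer` the four facing pixels of the
    adjacent block (for example H1..H4), as in equation (9).
    """
    inner = np.asarray(inner, dtype=np.int64)
    outer = np.asarray(outer, dtype=np.int64)
    return int(abs(np.sum(inner - outer)))

def step_weight(dh1, dh2, dv1, dv2, lam):
    """Step-function w1: 0 when the total activity exceeds lam, else 1."""
    return 0 if (dh1 + dh2 + dv1 + dv2) > lam else 1

# A DC offset between blocks gives four differences of the same sign,
# so they accumulate to a large value.
dc_offset = boundary_activity([120, 121, 119, 120], [100, 101, 99, 100])

# Differences of alternating sign cancel in the signed sum.
alternating = boundary_activity([120, 100, 120, 100], [100, 120, 100, 120])
```

The signed sum inside the absolute value is what lets the measure respond to a block-to-block DC offset while cancelling differences of mixed sign, and the comparison against `lam` needs only an adder and a comparator rather than a multiplier or an exponential look-up table.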
Fig. 2 Top-level design of BF for MPEG deblocking using System Generator

Fig. 3 Detailed circuit for the modified 7x7 BF and for the weight summation for scaling

The FPGA reads in the gray-scale image data sequentially from the MATLAB workspace variable “grayScaleSignal”, and writes the filtered image data sequentially into the workspace variable named “filteredImage”. After the simulation is completed, a MATLAB program plots the input image and the filtered output image for comparison. The result of the FPGA simulation is shown in Fig. 4. Fig. 4a is the input image with blocking and ringing artifacts clearly visible, Fig. 4b is the simulation output image, and Fig. 4c shows the absolute difference of the two images. It is observed in Fig. 4b that the blocking and ringing artifacts are much reduced with minimal reduction in sharpness and details.

The performance of the algorithm is measured by comparing the Power Signal-to-Noise Ratio (PSNR) of the image before and after the BF:

$$ \mathrm{PSNR\ (dB)} = 10 \log \left[ \sum_{i \in R} y^2(i) \Big/ \sum_{i \in R} e^2(i) \right] \tag{13} $$

where $\sum_{i \in R} y^2(i)$ is the measurement of the total image energy ($R$ represents the image width and height dimensions) and $\sum_{i \in R} e^2(i)$ represents the error due to MPEG noise, calculated from:

$$ e(i) = x(i) - y(i) \tag{14} $$

Table 1 shows the comparison between the measured PSNRs before and after applying the modified BF discussed. The BF improves the PSNR by 8.3 dB. The improvement varies according to picture content, and more experiments are needed.

Table 1. Comparison between measured PSNRs with and without modified Bilateral filter

                                          PSNR (dB)
Before MPEG noise reduction using BF      25.5
After noise reduction using BF            33.8

5. FPGA SYNTHESIS RESULTS

After successful simulation, the VHDL code was generated from the design. The VHDL code was then synthesized using Xilinx ISE 8.1i, targeting the Xilinx Spartan3 family. The optimization setting was for maximum clock speed. Table 2 details the resource requirements of the design. Note that in practice, additional circuitry is needed for the input/output interface and synchronization. Note also that system-level design using System Generator may not offer optimal gate requirements and clock speed.

Table 3 shows the reported maximum path delay and the highest FPGA clock frequency. Because a 7-tap MAC-based structure is used for the filter, the actual maximum pixel rate achievable is 81.2/7 = 11.6 million pixels/second. This is slightly slower than the typical frequency of 13.5 MHz required for un-scaled standard definition (SD) television. More work is needed on hardware optimization, and a poly-phase filter may be used. Additional circuitry for color space transformation is also needed.

Table 2. Resource utilization of the FPGA design for MPEG deblocking based on modified Bilateral filter

Number of Slices for Logic                1,550
Number of Slices for Flip Flops           950
Number of 4-input LUTs                    6,002

Table 3. Maximum combinational path delay and operating frequency of the FPGA design

Maximum path delay from/to any node       13.8 ns
Maximum operating frequency               81.2 MHz

6. CONCLUSION

In this paper, we present an FPGA implementation of a modified Bilateral filter for MPEG deblocking. The modified Bilateral filter architecture takes into account the activity along the horizontal and vertical directions, which represents the blockiness around the block boundaries. The architecture is simplified by using a step function for the weight w1(.) instead of a decaying exponential function. This reduces the hardware resources needed.
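
For readers who want to reproduce the quality measure, equations (13) and (14) can be sketched as below. This is a minimal illustration, not the authors' MATLAB code: the function name is hypothetical, the logarithm is assumed to be base 10 as is conventional for decibels, and the arguments `x` and `y` take the roles they have in equations (13) and (14).

```python
import numpy as np

def psnr_db(x, y):
    """Energy of y over energy of the error e = x - y, in dB.

    Follows equations (13) and (14); base-10 logarithm is assumed.
    Returns infinity when the two images are identical.
    """
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    e = x - y                      # equation (14)
    err_energy = np.sum(e ** 2)
    if err_energy == 0.0:
        return float("inf")
    return float(10.0 * np.log10(np.sum(y ** 2) / err_energy))

# Worked example: four pixels of value 100 with a uniform error of 10
# give an energy ratio of 40000 / 400 = 100, that is 20 dB.
y_example = np.full((2, 2), 100.0)
x_example = y_example + 10.0
example_db = psnr_db(x_example, y_example)
```

On this definition, the figures in Table 1 correspond to a gain of 33.8 - 25.5 = 8.3 dB.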
Fig. 4 FPGA simulation results. (a) input image with visible blocking and mosquito noise, (b) output of BF filter, (c) the absolute difference of (a) and (b)

The hardware implementation of the algorithm was realized using an FPGA. The FPGA design was carried out using a relatively new system-level tool called System Generator from Xilinx. FPGA functional simulations were carried out to verify and measure the deblocking performance of the proposed architecture. After successful simulation, the VHDL code for the design was generated and synthesized. The estimated FPGA resource requirement is reported. The estimated maximum operating speed of the FPGA design suggests that the design can operate close to a real-time standard definition television frame rate. Additional work on testing with more images, optimization and improvement of the design, and real-time demonstration of the system is needed.

It is found that the Bilateral filter is very efficient for removing MPEG artifacts because it preserves edge information, requires no iteration, is stable, and is relatively simple to realize in hardware. Thus, the filter offers much potential for real-time applications including JPEG and MPEG deblocking, impulse noise removal, and high-quality image up-sampling. Applications to color images may be realized by first performing a color space transformation, from RGB to YUV or HSV for example, then performing the deblocking on the luminance information, and then inverting the transformation to generate the RGB output. In another area, the development of a locally adaptive Bilateral filter, for improved performance, is a good subject for future work.

7. REFERENCES

[1] Skočir, P., Marušič, B., Tasič, J., “A three-dimensional extension of the SUSAN filter for wavelet video coding artifact removal,” in Proc. Electrotechnical Conference, MELECON 2002, pp. 395-398, 2002.
[2] Elad, M., “On the origin of the bilateral filter and ways to improve it,” IEEE Transactions on Image Processing, pp. 1141-1151, 2002.
[3] Peter List, et al, “Adaptive Deblocking Filter,” IEEE Trans. Circuits Syst. Video Technol., vol. 13, no. 7, pp. 614-619, July 2003.
[4] S. D. Kim, J. Yi, H. M. Kim, and J. B. Ra, “A deblocking filter with two separate modes in block-based video coding,” IEEE Trans. Circuits Syst. Video Technol., vol. 9, pp. 156–160, Feb. 1999.
[5] Y. F. Hsu and Y. C. Chen, “A new adaptive separable median filter for removing blocking effects,” IEEE Trans. Consum. Electron., vol. 2, no. 3, pp. 91–95, Mar. 1993.
[6] J. Luo, C. W. Chen, K. J. Parker, and T. S. Huang, “Artifact reduction in low bit rate DCT-based image compression,” IEEE Trans. Image Process., vol. 5, no. 9, pp. 1363–1368, Sep. 1996.
[7] P. L. Combettes, “The foundations of set theoretic estimation,” Proc. IEEE, vol. 81, no. 2, pp. 182–208, Feb. 1993.
[8] Xilinx Inc., System Generator v8.1 for the MathWorks Simulink: Quick Start Guide, 2006.
[9] Smith, S.M., Brady, J.M., “SUSAN – a new approach to low level image processing,” International Journal of Computer Vision 23, pp. 45–78, 1997.
[10] Tomasi, C., Manduchi, R., “Bilateral filtering for gray and color images,” in IEEE Proc. Int. Conf. Computer Vision, pp. 839–846, 1998.
