
Aydin 2019

The document discusses implementing multichannel FIR filters on an FPGA. It describes FIR filter theory, architectures for efficient implementation, and presents results from implementing various order filters on a Xilinx Artix-7 FPGA. Parallel multichannel FIR filters allow efficient processing of signals from applications like power electronics in smart grids.


ECAI 2019 - International Conference – 11th Edition

Electronics, Computers and Artificial Intelligence


27 June -29 June, 2019, Pitesti, ROMÂNIA

FPGA Implementation of Multichannel FIR Filters


Cihan Aydın
Smart Grids
Gazi University
Ankara, Turkey
[email protected]

İbrahim Sefa
Electrical and Electronics Engineering
Gazi University
Ankara, Turkey
[email protected]

Abstract—Latest-generation FPGAs are shaping the future use of FIR filters. Their DSP blocks can implement fixed-point data types for efficient computation. The systolic multiply-accumulate architecture is used for filters of various orders and for parallel multiple channels, so that resource and timing considerations are handled efficiently. Filters with various numbers of taps are implemented. Their resource usage and latency are observed on a Xilinx Artix-7 (XC7A100T-1CSG324C) device with a 100 MHz clock, a 12-bit input and a 12-bit output. The results also show that the proposed design is suitable for multichannel parallel implementation in applications such as power electronics in smart grids.

Keywords—FPGA, FIR filter, parallel multichannel, ADC, DAC, smart grids

I. INTRODUCTION
Digital filtering is one of the important aspects of digital signal processing. Digital filters are used to remove unwanted portions of a signal in applications such as power electronics and control systems. They are built from adders, multipliers and shift-register blocks. The architecture of a digital filter, that is, the way it arranges these blocks, determines its speed, complexity and power [1].

A digital filter receives digital inputs and produces digital outputs. Typically, a digital signal processor reads samples from an analog-to-digital converter (ADC), performs the mathematical operations required by the filter type, and sends the result to a digital-to-analog converter (DAC) [2]. Fig. 1 shows this conventional flow implemented with an FPGA (field-programmable gate array).

Fig. 1. FPGA signal processing

FIR (finite impulse response) filters have a crucial position in digital signal processing. Fundamentally, an FIR filter carries out a convolution over a window of N data samples [3]. FPGAs are increasingly used in digital signal processing. New-generation FPGAs comprise a wide range of CLBs (configurable logic blocks). Compared to conventional DSP (digital signal processing) processors, this structure offers advantages in speed, flexibility, cost and performance [4].

Noisy signals that enter the control loop without filtering cause problems such as instability and misdetection. Another important issue concerns systems that process signals only with microprocessors. Because filtering takes a long time, these systems lose performance and response time. In power electronics applications, a large number of signals must be read and evaluated for protection or control, and most of them contain switching noise. Filtering the signals with minimum latency is therefore essential to improve system performance.

This paper demonstrates the implementation of parallel multichannel filters with various numbers of taps on a Xilinx Artix-7 (XC7A100T-1CSG324C) FPGA, evaluated for resource usage, latency and multichannel operation. The rest of the paper is organized as follows. Section II describes the smart grid approach. Section III gives the theory of FIR filters. Section IV explains how an FIR filter is designed. Section V describes the architecture used in this implementation. Section VI explains how the coefficients are quantized. Section VII gives details of output rounding. Section VIII discusses hardware implementation on an FPGA. Section IX describes the test setup, tables and results. Finally, Section X presents the conclusion and future work.

II. SMART GRIDS APPROACH
A smart grid involves a large variety of components, which may include distributed generation, demand response and the electric grid itself. The communication system must therefore handle data obtained from sensors and monitoring equipment. This task becomes more complicated as advanced power electronics equipment is deployed [5].

In today's smart grids, power electronics equipment needs high data density and very fast communication units. Processing data from sensors and monitoring equipment requires FPGAs to overcome the slow response of classic microprocessors. Fig. 2 illustrates this approach and the context of using FPGAs in smart grids.

978-1-7281-1624-2/19/$31.00 ©2019 IEEE

Authorized licensed use limited to: Auckland University of Technology. Downloaded on June 04,2020 at 11:57:51 UTC from IEEE Xplore. Restrictions apply.
Fig. 2. Smart Grid approach

III. FIR FILTER THEORY
A digital filter is a mathematical algorithm executed in digital signal processors, FPGAs and/or software to fulfill the required objectives. Digital filters are classified as linear or nonlinear filters. Linear filters are further classified into FIR and IIR (infinite impulse response) filters.
The specific benefits of FIR filters can be summarized as follows:
• FIR filters are always stable, since there is no feedback from past outputs and they have no poles.
• A linear-phase response can be guaranteed by design.
• Finite-precision errors are less severe in FIR filters than in IIR filters.
• FIR filters can be efficiently implemented on DSP processors or FPGAs [6].
A causal FIR filter can be specified mathematically by the following difference equation:

$$y[n] = \sum_{k=0}^{N-1} h[k]\, x[n-k] \tag{1}$$

Fig. 3 shows the basic direct form of a conventional FIR filter. Here x[n] is the input signal, y[n] is the output signal, h[k] are the coefficients and N is the number of taps.

Fig. 3. FIR filter basic direct form

IV. DESIGN OF FIR FILTERS
The characteristics of digital filters are usually specified in the frequency domain, and hence the design is based on magnitude-response specifications. In practice, an infinitely sharp cutoff as in Fig. 4 cannot be achieved.

Fig. 4. Ideal low pass filter

A more gradual cutoff, with a transition band between the passband and the stopband, must be accepted. A typical magnitude response of a practical low-pass filter is shown in Fig. 5. The dotted horizontal lines in the figure indicate the tolerance limits. The frequencies $\omega_p$ and $\omega_s$ are the passband edge (cutoff) frequency and the stopband edge frequency, respectively [6].

Fig. 5. Actual low pass filter

The desired frequency response can be represented as:

$$H_d(e^{j\omega}) = \sum_{n=-\infty}^{\infty} h_d[n]\, e^{-j\omega n} \tag{2}$$

where $h_d[n]$ is the corresponding impulse response sequence, which can be expressed as:

$$h_d[n] = \frac{1}{2\pi} \int_{-\pi}^{\pi} H_d(e^{j\omega})\, e^{j\omega n}\, d\omega \tag{3}$$

For an ideal linear-phase low-pass filter with cutoff frequency $\omega_c$, the desired frequency response is defined as:

$$H_p(e^{j\omega}) = \begin{cases} e^{-j\omega M/2}, & |\omega| < \omega_c \\ 0, & \omega_c < |\omega| \le \pi \end{cases} \tag{4}$$

The corresponding ideal impulse response is:

$$h_p[n] = \frac{1}{2\pi} \int_{-\omega_c}^{\omega_c} e^{-j\omega M/2}\, e^{j\omega n}\, d\omega = \frac{\sin\!\big(\omega_c (n - M/2)\big)}{\pi\, (n - M/2)} \tag{5}$$

V. ARCHITECTURE
The systolic multiply-accumulate (MAC) architecture is directly supported by the DSP slices and results in area-efficient, high-performance filter implementations. This architecture can also be extended to exploit coefficient symmetry, which gives further resource savings [7]. Fig. 6 shows the systolic multiply-accumulate architecture implementation.
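Equations (1) and (5) can be tied together in a short floating-point sketch. It is an illustration under stated assumptions, not the authors' design flow. It samples the ideal response of (5) for n = 0..N-1 with M = N - 1 (an assumption), which amounts to a rectangular window. That is not necessarily how FDA Tool produced the coefficients used in Section IX. The function names are hypothetical. The sampling and cutoff frequencies in the usage lines are the ones listed in Section IX.

```python
from __future__ import annotations

import math


def ideal_lowpass_taps(num_taps: int, fc_hz: float, fs_hz: float) -> list[float]:
    """Sample eq. (5) for n = 0..num_taps-1, assuming M = num_taps - 1.

    Keeping only these samples is a rectangular window on the ideal response.
    """
    wc = 2.0 * math.pi * fc_hz / fs_hz  # cutoff in rad/sample
    m = num_taps - 1
    taps = []
    for n in range(num_taps):
        t = n - m / 2.0
        if t == 0.0:
            # Limit of sin(wc*t) / (pi*t) as t -> 0 (centre tap of odd-length filters).
            taps.append(wc / math.pi)
        else:
            taps.append(math.sin(wc * t) / (math.pi * t))
    return taps


def fir_direct_form(h: list[float], x: list[float]) -> list[float]:
    """Causal FIR filter of eq. (1): y[n] = sum_k h[k] * x[n-k], with x[n<0] = 0."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k, hk in enumerate(h):
            if n - k >= 0:
                acc += hk * x[n - k]
        y.append(acc)
    return y


if __name__ == "__main__":
    # fs = 20 kHz and fc = 4 kHz as listed in Section IX; 8 taps as in Fig. 14.
    h = ideal_lowpass_taps(8, 4000.0, 20000.0)
    impulse = [1.0] + [0.0] * 9
    # The impulse response of a direct-form FIR filter is its coefficient list.
    print([round(v, 4) for v in fir_direct_form(h, impulse)])
```

The coefficients come out symmetric (h[k] = h[N-1-k]), which is the property the symmetry-exploiting architecture of Section V relies on.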

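Section V notes that the architecture can exploit coefficient symmetry. The behavioural sketch below shows the general idea with a pre-adder ("folded") evaluation of (1). For a linear-phase filter with h[k] = h[N-1-k], the two samples that share a coefficient are added first. Only ceil(N/2) multiplications per output are then needed instead of N. This models the folding idea only. It is not a cycle-accurate model of the FIR Compiler's systolic structure, and the function name is hypothetical.

```python
from __future__ import annotations

import math


def fir_symmetric_folded(h: list[float], x: list[float]) -> tuple[list[float], int]:
    """Evaluate eq. (1) for symmetric coefficients using pre-adders.

    Returns the output samples and the number of multiplications per output.
    """
    n_taps = len(h)
    for k in range(n_taps // 2):
        if not math.isclose(h[k], h[n_taps - 1 - k], rel_tol=1e-9, abs_tol=1e-12):
            raise ValueError("coefficients are not symmetric")

    def sample(i: int) -> float:
        return x[i] if i >= 0 else 0.0  # causal filter: x[n] = 0 for n < 0

    half = n_taps // 2
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k in range(half):
            # Pre-add the two samples that share coefficient h[k], then multiply once.
            acc += h[k] * (sample(n - k) + sample(n - (n_taps - 1 - k)))
        if n_taps % 2 == 1:
            mid = half  # the unpaired centre tap of an odd-length filter
            acc += h[mid] * sample(n - mid)
        y.append(acc)
    multiplies = half + n_taps % 2  # ceil(N / 2)
    return y, multiplies


if __name__ == "__main__":
    y, mults = fir_symmetric_folded([0.25, 0.5, 0.5, 0.25], [1.0, 2.0, 3.0, 4.0, 5.0])
    print(y, mults)  # [0.25, 1.0, 2.25, 3.75, 5.25] 2
```

The outputs match the direct form of (1) while using half the multipliers. This is in line with the roughly N/2 DSP slices per channel reported in Table III.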
Fig. 6. Systolic Multi-MAC implementation

VI. COEFFICIENT QUANTIZATION
When the filter coefficients are specified as non-integer real numbers, the FIR Compiler quantizes them to the specified coefficient bit width. The coefficient values are rounded to the nearest value using a round-towards-zero algorithm. The FIR Compiler examines the filter coefficients to determine how many bits are needed to represent the integer part of the coefficient values. All remaining coefficient bits are then assigned to the fractional part. When all the specified coefficients lie between -1 and 1, only a single bit is needed for the integer part, and the rest of the coefficient word is used for fractional bits [7]. Fig. 7 shows an example of this quantization.

Fig. 7. Coefficient quantization fix18_17

VII. OUTPUT ROUNDING
In this paper, the non-symmetric rounding-up mode is used. In this mode, a binary value equal to 0.5 (of the output LSB) is added to the accumulator result, and then the excess LSBs (least significant bits) are discarded. In most filter forms this addition can be implemented at little or no resource cost in DSP processors or FPGAs by using the DSP slices [7].

VIII. HARDWARE IMPLEMENTATION
Algorithmic complexity increases as application demands increase. Implementing these new algorithms requires higher-performance signal processing hardware. Classic fixed-architecture DSP processors cannot keep pace on their own, so the performance gap widens as algorithmic complexity increases. Fig. 8 shows the difference between a conventional DSP processor and an FPGA.

Fig. 8. Conventional DSP implementation and an FPGA implementation

FPGAs are well suited to filling this performance gap for several reasons:
• They offer extremely high-performance signal processing capability through concurrency.
• They carry very low risk due to their flexible architecture.
• They allow design migration to handle changing standards.
• Developers can use them to create specialized and differentiated solutions.
• Their price is relatively low.
• They provide very low power per function [8].

IX. TESTING AND RESULTS
A Xilinx Artix-7 (XC7A100T-1CSG324C) FPGA and the Xilinx Vivado 2017.4 platform are used for this design. A MicroBlaze soft-core processor and IP (intellectual property) blocks are also used. A 12-bit ADC (analog-to-digital converter) digitizes the noisy input signal. The input and output widths are both set to 12 bits, and the whole design runs at 100 MHz. The coefficients for the Xilinx FIR Compiler are generated with FDA Tool from Simulink. The signal, sampling frequency, cut-off frequency and stop-band frequency are set to 10 kHz, 20 kHz, 4 kHz and 5 kHz, respectively. The passband ripple and stopband attenuation are set to 1 dB and 80 dB, respectively. Filtered signals are observed with the hardware ILA (Integrated Logic Analyzer). In addition, an external 12-bit DAC (digital-to-analog converter) is used to display the filtered signal.

Fig. 9 shows the overall system used in this paper. Fig. 10 shows the 10 kHz signal with noise. Figs. 11, 12, 13 and 14 show the signals filtered with 3, 4, 5 and 8 taps, respectively, as captured by the hardware ILA. Figs. 15, 16, 17 and 18 show the filtered signals output through the DAC. Finally, Fig. 19 plots latency against the number of taps.

Tables I and II list the passband and stopband values in dB. Tables III and IV give the DSP slice usage for the single-channel and multichannel designs, respectively. Tables V and VI give the LUT (look-up table) usage, and Tables VII and VIII give the flip-flop usage, for the single-channel and multichannel designs, respectively. Finally, Table IX gives the latency values (in cycles) for the single-channel and multichannel designs.
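The quantization rule of Section VI can be sketched in integer arithmetic. The sketch rests on three assumptions. First, an 18-bit coefficient word, as in the fix18_17 example of Fig. 7 (read here as a signed 18-bit word with 17 fractional bits). Second, the phrase "rounded to the nearest value using a round-towards-zero algorithm" is interpreted as round-to-nearest with exact ties resolved toward zero. Third, codes that overflow the word are clamped. The helper names are hypothetical, and this is not the FIR Compiler's own code.

```python
import math


def integer_bits_needed(coeffs):
    """Fewest integer bits (sign bit included) whose two's-complement range
    [-2**(b-1), 2**(b-1)) contains every coefficient."""
    b = 1
    while not all(-(2 ** (b - 1)) <= c < 2 ** (b - 1) for c in coeffs):
        b += 1
    return b


def round_half_toward_zero(v):
    """Round to the nearest integer; exact .5 ties go toward zero (assumed tie rule)."""
    return int(math.copysign(math.ceil(abs(v) - 0.5), v))


def quantize_coefficients(coeffs, width=18):
    """Quantize real coefficients to signed fixed point with `width` bits.

    Integer bits are chosen from the data and all remaining bits become
    fraction bits. Returns (integer codes, number of fraction bits).
    """
    frac_bits = width - integer_bits_needed(coeffs)
    lo, hi = -(2 ** (width - 1)), 2 ** (width - 1) - 1
    codes = [min(hi, max(lo, round_half_toward_zero(c * 2 ** frac_bits))) for c in coeffs]
    return codes, frac_bits


if __name__ == "__main__":
    # All coefficients lie in (-1, 1), so one integer (sign) bit suffices: fix18_17.
    codes, frac = quantize_coefficients([0.4, -0.25, 0.1])
    print(codes, frac)  # [52429, -32768, 13107] 17
```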

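The non-symmetric rounding-up mode of Section VII amounts to adding half of the new LSB and then discarding the dropped bits with an arithmetic shift. Exact halves therefore always move toward plus infinity, which is why the mode is non-symmetric. The sketch applies this to an integer multiply-accumulate of (1). It assumes integer sample codes (for example from the 12-bit ADC) and coefficient codes with a known number of fraction bits, and that rounding removes exactly those fraction bits. How the result is then fitted into the 12-bit output word is not described in the text and is not modelled. The function names are hypothetical.

```python
from __future__ import annotations


def round_up_nonsymmetric(acc: int, drop_bits: int) -> int:
    """Add 0.5 of the new LSB, then discard `drop_bits` LSBs.

    Python's >> on negative ints is an arithmetic (floor) shift, like a
    two's-complement hardware shift, so a tie such as -2.5 rounds up to -2.
    """
    if drop_bits <= 0:
        return acc
    return (acc + (1 << (drop_bits - 1))) >> drop_bits


def fixed_point_fir(x_codes: list[int], h_codes: list[int], frac_bits: int) -> list[int]:
    """Full-precision integer MAC per eq. (1), then round off the coefficient fraction bits."""
    out = []
    for n in range(len(x_codes)):
        acc = 0
        for k, hk in enumerate(h_codes):
            if n - k >= 0:
                acc += hk * x_codes[n - k]  # exact integer products and sums
        out.append(round_up_nonsymmetric(acc, frac_bits))
    return out


if __name__ == "__main__":
    print(round_up_nonsymmetric(5, 1), round_up_nonsymmetric(-5, 1))  # 3 -2
    # Two taps of 0.5 in fix18_17 (code 65536) applied to 12-bit sample codes.
    print(fixed_point_fir([100, 200, -301], [65536, 65536], 17))  # [50, 150, -50]
```

In the last output, -50.5 rounds to -50 rather than -51. This one-directional tie behaviour is the only difference from symmetric rounding, and it costs just one extra addition of a constant.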
Fig. 9. Overall system

Fig. 10. Signal with noise

Fig. 11. 3 taps filtered signal in hardware ILA

Fig. 12. 4 taps filtered signal in hardware ILA

Fig. 13. 5 taps filtered signal in hardware ILA

Fig. 14. 8 taps filtered signal in hardware ILA

Fig. 15. Output of the 3 taps filtered signal and noisy signal

Fig. 16. Output of the 4 taps filtered signal and noisy signal

Fig. 17. Output of the 5 taps filtered signal and noisy signal

Fig. 18. Output of the 8 taps filtered signal and noisy signal

TABLE I. FILTER ANALYSIS (PASS BAND)

Taps | Min (dB)   | Max (dB)   | Ripple (dB)
3    | -38.401220 | -21.330237 | 17.070983
4    | -38.416314 | -13.901644 | 24.514670
8    | -39.653911 | 5.255691   | 44.909602
16   | -44.288399 | 3.451442   | 47.739841
32   | -54.391398 | 1.186509   | 55.577907
64   | -84.288399 | 0.086351   | 84.374750
128  | -76.329599 | 0.005684   | 76.335282
256  | -72.247199 | 0.010586   | 72.257785

TABLE II. FILTER ANALYSIS (STOP BAND)

Taps | Min (dB) | Max (dB)   | Ripple (dB)
3    | -        | -38.249385 | -
4    | -        | -38.416314 | -
8    | -        | -39.489903 | -
16   | -        | -43.989592 | -
32   | -        | -53.345216 | -
64   | -        | -61.919041 | -
128  | -        | -59.358466 | -
256  | -        | -60.048297 | -

TABLE III. DSP SLICE USAGE (SINGLE CHANNEL)

Taps | Used | Available | Utilization (%)
3    | 2    | 240       | 0.83
4    | 3    | 240       | 1.25
8    | 5    | 240       | 2.08
16   | 9    | 240       | 3.75
32   | 17   | 240       | 7.08
64   | 33   | 240       | 13.75
128  | 65   | 240       | 27.08
256  | 129  | 240       | 53.75

TABLE IV. DSP SLICE USAGE (MAX 8-PARALLEL CHANNEL)

Taps                 | Used | Available | Utilization (%)
3                    | 16   | 240       | 6.67
4                    | 24   | 240       | 10
8                    | 40   | 240       | 16.67
16                   | 72   | 240       | 30
32                   | 136  | 240       | 56.67
64 (max 7 channels)  | 231  | 240       | 96.25
128 (max 3 channels) | 195  | 240       | 81.25
256 (max 1 channel)  | 129  | 240       | 53.75

TABLE V. LUT USAGE (SINGLE CHANNEL)

Taps | Used  | Available | Utilization (%)
3    | 13812 | 63400     | 21.79
4    | 13844 | 63400     | 21.84
8    | 13908 | 63400     | 21.94
16   | 14036 | 63400     | 22.14
32   | 14292 | 63400     | 22.54
64   | 14805 | 63400     | 23.35
128  | 16326 | 63400     | 25.75
256  | 21080 | 63400     | 33.25

TABLE VI. LUT USAGE (MAX 8-PARALLEL CHANNEL)

Taps                 | Used  | Available | Utilization (%)
3                    | 14036 | 63400     | 22.14
4                    | 14292 | 63400     | 22.54
8                    | 14804 | 63400     | 23.35
16                   | 15828 | 63400     | 24.97
32                   | 17876 | 63400     | 28.20
64 (max 7 channels)  | 22629 | 63400     | 35.69
128 (max 3 channels) | 21414 | 63400     | 33.78
256 (max 1 channel)  | 21080 | 63400     | 33.25

TABLE VII. FLIP-FLOP USAGE (SINGLE CHANNEL)

Taps | Used  | Available | Utilization (%)
3    | 14769 | 126800    | 11.65
4    | 14802 | 126800    | 11.67
8    | 14900 | 126800    | 11.75
16   | 15096 | 126800    | 11.91
32   | 15488 | 126800    | 12.21
64   | 16272 | 126800    | 12.83
128  | 17840 | 126800    | 14.07
256  | 21044 | 126800    | 16.60

TABLE VIII. FLIP-FLOP USAGE (MAX 8-PARALLEL CHANNEL)

Taps                 | Used  | Available | Utilization (%)
3                    | 15665 | 126800    | 12.35
4                    | 15922 | 126800    | 12.56
8                    | 16692 | 126800    | 13.16
16                   | 18232 | 126800    | 14.38
32                   | 21312 | 126800    | 16.81
64 (max 7 channels)  | 26776 | 126800    | 21.12
128 (max 3 channels) | 24112 | 126800    | 19.02
256 (max 1 channel)  | 21044 | 126800    | 16.60

TABLE IX. LATENCY (SINGLE VS MAX 8-PARALLEL CHANNEL)

Taps                 | Single channel (cycles) | Multichannel (cycles) | Difference
3                    | 9                       | 9                     | -
4                    | 10                      | 10                    | -
8                    | 12                      | 12                    | -
16                   | 16                      | 16                    | -
32                   | 24                      | 24                    | -
64 (max 7 channels)  | 40                      | 40                    | -
128 (max 3 channels) | 72                      | 72                    | -
256 (max 1 channel)  | 140                     | -                     | -
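The DSP slice counts in Tables III and IV follow a simple pattern. Every single-channel count equals floor(N/2) + 1 for N taps. Every multichannel count equals that figure multiplied by the number of channels that fit in the 240 available slices, capped at eight. This is an empirical fit to the reported numbers, not a formula taken from the FIR Compiler documentation. The sketch below checks the fit against both tables. It also reproduces the 478-tap single-channel limit quoted in Section X, taking the largest even tap count. The function names are hypothetical.

```python
TOTAL_DSP = 240    # DSP slices available on the XC7A100T (Tables III and IV)
MAX_CHANNELS = 8   # largest channel count used in the paper

# DSP slice counts copied from Table III (single channel) and Table IV (up to 8 channels).
TABLE_III = {3: 2, 4: 3, 8: 5, 16: 9, 32: 17, 64: 33, 128: 65, 256: 129}
TABLE_IV = {3: 16, 4: 24, 8: 40, 16: 72, 32: 136, 64: 231, 128: 195, 256: 129}


def dsp_per_channel(taps: int) -> int:
    """Empirical fit to Table III (an assumption, not a documented formula)."""
    return taps // 2 + 1


def max_channels(taps: int) -> int:
    """Channels that fit in the DSP budget, capped at MAX_CHANNELS."""
    return min(MAX_CHANNELS, TOTAL_DSP // dsp_per_channel(taps))


def max_even_single_channel_taps() -> int:
    """Largest even tap count whose single channel still fits in TOTAL_DSP."""
    return max(n for n in range(2, 4 * TOTAL_DSP, 2) if dsp_per_channel(n) <= TOTAL_DSP)


if __name__ == "__main__":
    for taps, used in TABLE_IV.items():
        ch = max_channels(taps)
        assert dsp_per_channel(taps) == TABLE_III[taps]
        assert ch * dsp_per_channel(taps) == used
        print(f"{taps:>3} taps: {ch} channel(s), {used} DSP slices")
    print(max_even_single_channel_taps())  # 478
```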

[Chart "LATENCY-TAPS": latency (cycles) versus taps; plotted points (taps, cycles): (3, 9), (4, 10), (8, 12), (16, 16), (32, 24), (64, 40), (128, 72), (256, 140)]
Fig. 19. Latency taps graph

X. CONCLUSION AND FUTURE SCOPE
Low-pass FIR filters of various orders are implemented on the Artix-7 FPGA. The systolic multiply-accumulate architecture is used to save resources. The results show that parallel multichannel implementation is possible as long as DSP slices are available. For a single channel, a 478-tap filter can be implemented, using all 240 DSP slices. Eight channels are implemented to demonstrate parallel operation. However, the results show that for filters below 64 taps, enough resources remain to increase the number of channels further. The flip-flop and LUT (look-up table) usage changes only slightly with the number of taps and parallel channels. As long as DSP slices are available for the multichannel implementation, the latency does not change. The filtered signals are captured through the ILA and the DAC. The phase difference between the original noisy signal and the filtered signal is measured as 12.5 µs, 17.3 µs, 22.1 µs and 26.2 µs for 3, 4, 5 and 8 taps, respectively. This implementation shows that hardware based on the parallel data processing capabilities of FPGAs can remove unwanted portions of signals with FIR filters in minimal time. The low-cost Artix-7 FPGA filters the noisy signal as intended with low tap counts. Considering the cost of conventional DSP processors, an FPGA with DSP capabilities gives better results.

Minimum latency and parallel multichannel processing let users process signals in real time, with minimal latency, in critical systems such as the power electronic devices of future smart grids. They also make it possible to eliminate unwanted portions of the signals (noise) produced by switching devices and similar sources. Thanks to the parallel processing capability of FPGAs, such systems can be used in applications that require high speed.

REFERENCES
[1] S. M. Qasim, M. S. BenSaleh and A. M. Obeid, “Efficient FPGA implementation of microprogram control unit based FIR filter using Xilinx and Synopsys tools,” Proc. of Synopsys User Group Conference (SNUG), Silicon Valley, USA, vol. 3, pp. 1-14, 2012.
[2] A. A. AlJuffri, A. S. Badawi, M. S. BenSaleh, A. M. Obeid and S. M. Qasim, “FPGA implementation of scalable microprogrammed FIR filter architectures using Wallace tree and Vedic multipliers,” 3rd IEEE International Conference on Technological Advances in Electrical, Electronics and Computer Engineering, Beirut, Lebanon, vol. 29, pp. 159-162, 2015.
[3] B. Mamatha and V. V. S. V. S. Ramachandram, “Design and implementation 120 order FIR filter based on FPGA,” International Journal of Engineering Sciences & Emerging Technologies, vol. 3, pp. 90-97, 2012.
[4] S. Mirzaei, A. Hosangadi and R. Kastner, “FPGA implementation of high speed FIR filters using add and shift methods,” 2006 International Conference on Computer Design, San Jose, CA, USA, vol. 10, pp. 308-313, 2007.
[5] P. Faria and Z. Vale, “Digital signal processing issues in the context of the future smart grids,” Advanced Science and Technology Letters (SUComS 2015), vol. 97, pp. 30-35, 2015.
[6] S. M. Kuo, B. H. Lee and W. Tian, “Design and implementation of FIR filters,” in Real-Time Digital Signal Processing: Implementations and Applications, 2nd ed., England: Wiley, 2006, ch. 4, pp. 185-245.
[7] Xilinx, “FIR Compiler v7.2 LogiCORE IP Product Guide,” Vivado Design Suite, PG149, November 18, 2015.
[8] Xilinx, “DSP: Designing for Optimal Results, High-Performance DSP Using Virtex-4 FPGAs,” DSP Solutions Advanced Design Guide, Edition 1.0, March 2005.

