DSP Processing

The document discusses DSP algorithms and their implementation for applications like audio and video processing. It covers DSP processors and architectures, and algorithms like FIR filters, IIR filters, and LFSRs. It also discusses considerations for the DSP processing environment and architecture.

Uploaded by

marvel homes
Copyright
© © All Rights Reserved

Chapter 8

DSP Algorithms and Video Processing

The DSP processing environment should use the real-time clocks, DSP processor cores, and DMA.

Abstract This chapter discusses DSP algorithms and the role of the design engineer
in achieving the desired performance for DSP designs. It covers the basics of FIR
and IIR filter design using Verilog and ways to improve the performance of the
design. It also covers the architecture and micro-architecture of video applications
and their implementation. The video encoder and decoder architectures, and the
micro-architecture needed to design them, are discussed with practical scenarios.

Keywords RTL · Verilog · DSP · FIR · IIR · LFSR · Video decoder · Video encoder ·
Audio processing · Video processing · If-else · Case · Process · Sequential design ·
Pipelining · DSP processor · MAC

Digital signal processing (DSP) is now used in a wide range of applications. The
complexity of these applications and the speed they demand drive the design of
high-performance DSP processors. Whether the application is multimedia, audio, or
video processing, the requirement is the least area, low power, and high speed. The
data rate, the efficiency of the design, and multitasking are further key parameters
to consider before implementing such applications. This chapter covers these aspects
and the design of DSP algorithms, along with DSP processor trends, architecture, and
the Xilinx and Intel FPGAs suited to such complex applications.

© Springer Nature Singapore Pte Ltd. 2019
V. Taraate, Advanced HDL Synthesis and SOC Prototyping,
https://doi.org/10.1007/978-981-10-8776-9_8

8.1 DSP Processors

DSP algorithms are used in many areas to reach the desired speed, power, and area
targets. These areas have evolved over the past decade and remain active research
topics. A few of the DSP application areas are listed below:
1. Control and instrumentation applications: DSP processors and algorithms are
used extensively for navigation and guidance, power system monitoring,
transient analysis of signals, RADAR, sonar, etc.
2. Speech processing applications: Encryption, decryption, and speech recognition
usually need an efficient DSP algorithm or architecture. For such designs, the
DSP algorithms can be realized using FPGAs.
3. General purpose DSP applications: Filters such as FIR and IIR, and operations
such as convolution, are usually designed as DSP algorithms in C/C++ or in an
HDL.
4. Audio processing applications: Audio equalization, audio mixing, and sound
synthesis are a few of the important applications where an efficient DSP
architecture and implementation give better results.
5. Image processing: Image compression and decompression, image recognition,
face recognition, and image enhancement are a few of the areas where DSP
algorithms and processors are used extensively.
In this context, the designs need to be prototyped on FPGAs. Modern FPGA
architectures from Xilinx and Intel are efficient enough to achieve the desired
performance for complex DSP tasks. A basic DSP system can be represented by the
blocks shown in Fig. 8.1.

Fig. 8.1 DSP-based processing blocks



As shown in Fig. 8.1, any DSP implementation can be built from the following
blocks:
1. ADC
2. DSP processor
3. DAC
The input to the system is analog and is converted into digital data by the ADC.
The analog input is sampled by a sample-and-hold circuit at the sampling frequency
of the ADC, and each sample is quantized to the resolution of the ADC to produce
the digital output. This digital data is given to the DSP processor for processing
by the desired application. The output from the system can be digital or analog
depending on the requirement of the design. In a practical environment, the system
may also need input filters.
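
As a worked illustration of the sampling and quantization step described above, the relations below give the minimum sampling rate, the quantization step, and the raw data rate delivered to the DSP processor. The symbols f_max (highest input frequency), V_FS (ADC full-scale range), and N (ADC resolution in bits), as well as the numeric values (3.3 V, 12 bits, 1 MS/s), are assumptions chosen for this example and do not come from the chapter.

```latex
\begin{align}
f_s &> 2 f_{\max} && \text{(Nyquist criterion)} \\
\Delta &= \frac{V_{FS}}{2^{N}} = \frac{3.3\ \text{V}}{2^{12}} \approx 0.81\ \text{mV} && \text{(quantization step)} \\
R &= f_s \cdot N = 10^{6}\ \text{samples/s} \times 12\ \text{bits} = 12\ \text{Mbit/s} && \text{(raw data rate)}
\end{align}
```

The data rate R is what the FPGA interface and any internal buffering have to sustain, which is why the speed of the input signal appears first in the list of points that follows.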
As a prototype engineer implementing these applications, consider the following
points:
1. How fast is the input signal? This determines the ADC needed to sample the
signal correctly.
2. In a practical scenario, the designer can use an ADC daughter card with the
FPGA.
3. The complexity, speed, power, and bandwidth requirements of the DSP algorithm
decide the selection of the FPGA.
4. Does the design need a hard processor or a DSP core, or should the DSP
algorithms be implemented in HDL?
5. What operating frequency is required to execute a single instruction or
multiple instructions?
6. Is the design architecture efficient enough to keep the required chunk of data
inside the FPGA platform?
7. Can the design run at a lower frequency with multitasking, or does it need to
run at a high frequency without parallelism?
8. Does the design support real-time processing of the data?
The answers to these questions lead to a better DSP architecture and algorithm
implementation.

8.2 DSP Algorithms and Implementation

The points to think about are:

1. What kind of computational elements are required?
2. The complexity of the DSP algorithm
3. The functional implementation requirements:
a. Adders, multipliers, shifters, MAC
b. The pipelining requirements
c. The speed, area, and low power requirements for the design
d. The design partitioning into multiple functional blocks

Fig. 8.2 Multiply and accumulate

For example, consider a DSP algorithm that needs to accumulate data after
multiplication. An efficient multiply and accumulate (MAC) design is shown in
Fig. 8.2. The RTL design and implementation of the linear feedback shift register
(LFSR) are discussed in this section. Designers can implement other algorithms,
such as FIR and IIR filters, using HDLs.
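
The multiply-then-accumulate operation described above can be sketched in Verilog as follows. This is a minimal sketch under stated assumptions, not the design of Fig. 8.2: the module name mac_unit, the port names, the 18-bit operand widths, the 48-bit accumulator, and the single pipeline register between multiplier and adder are all hypothetical choices made for illustration.

```verilog
// Illustrative signed multiply-accumulate (MAC) unit.
// Module name, port names, and widths are assumptions for this sketch.
module mac_unit #(
    parameter A_W   = 18,  // width of operand a
    parameter B_W   = 18,  // width of operand b
    parameter ACC_W = 48   // accumulator width (extra bits guard against overflow)
) (
    input  wire                    clk,
    input  wire                    reset_n,  // asynchronous reset, active low
    input  wire                    clear,    // synchronous clear of the running sum
    input  wire                    enable,   // a and b are valid in this cycle
    input  wire signed [A_W-1:0]   a,
    input  wire signed [B_W-1:0]   b,
    output reg  signed [ACC_W-1:0] acc
);
    // Product register: one pipeline stage between multiplier and adder.
    reg signed [A_W+B_W-1:0] product;
    reg                      product_valid;

    always @(posedge clk or negedge reset_n) begin
        if (!reset_n) begin
            product       <= 0;
            product_valid <= 1'b0;
            acc           <= 0;
        end else begin
            product       <= a * b;      // signed multiply, full-width result
            product_valid <= enable;     // valid flag travels with the product
            if (clear)
                acc <= 0;
            else if (product_valid)
                acc <= acc + product;    // product is sign-extended to ACC_W bits
        end
    end
endmodule
```

The accumulator is deliberately wider than the product so that many products can be summed before overflow. Registering the product before the adder shortens the critical path at the cost of one cycle of latency.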

8.2.1 LFSR

Most applications of the LFSR require implementing its feedback polynomial.
The RTL code for the polynomial is described in Example 8.1 (Fig. 8.3).

8.3 DSP Processing Environment

While designing algorithms for DSP applications, consider the following important
points (Fig. 8.4).
1. The processing speed, throughput, and IO data rate
2. The processor architecture, and whether it supports pipelining of the instructions
3. Whether the processor supports floating point operations using the available
instructions
4. The architecture should have separate program memory and data memory
buses
5. For fast access to the data, a DMA interface is the better choice
6. Internal storage in the form of FIFOs or circular buffers
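
The internal storage mentioned in the last point can be sketched as a synchronous FIFO built on a circular buffer. This is a minimal sketch under stated assumptions: the module name sync_fifo, the port names, the 8-bit data width, and the 16-entry depth are hypothetical and are not taken from the chapter.

```verilog
// Illustrative synchronous FIFO implemented as a circular buffer.
// Names and sizes are assumptions for this sketch.
module sync_fifo #(
    parameter DATA_W = 8,
    parameter ADDR_W = 4          // depth is 2**ADDR_W entries
) (
    input  wire              clk,
    input  wire              reset_n,   // asynchronous reset, active low
    input  wire              wr_en,
    input  wire [DATA_W-1:0] wr_data,
    input  wire              rd_en,
    output reg  [DATA_W-1:0] rd_data,   // valid one cycle after an accepted read
    output wire              full,
    output wire              empty
);
    reg [DATA_W-1:0] mem [0:(1<<ADDR_W)-1];

    // Pointers carry one extra bit so that full and empty can be told apart.
    reg [ADDR_W:0] wr_ptr, rd_ptr;

    assign empty = (wr_ptr == rd_ptr);
    assign full  = (wr_ptr[ADDR_W]     != rd_ptr[ADDR_W]) &&
                   (wr_ptr[ADDR_W-1:0] == rd_ptr[ADDR_W-1:0]);

    // Memory access without reset, so the array can map to RAM resources.
    always @(posedge clk) begin
        if (wr_en && !full)
            mem[wr_ptr[ADDR_W-1:0]] <= wr_data;
        if (rd_en && !empty)
            rd_data <= mem[rd_ptr[ADDR_W-1:0]];
    end

    // Pointer update: the pointers wrap around naturally (circular buffer).
    always @(posedge clk or negedge reset_n) begin
        if (!reset_n) begin
            wr_ptr <= 0;
            rd_ptr <= 0;
        end else begin
            if (wr_en && !full)
                wr_ptr <= wr_ptr + 1'b1;
            if (rd_en && !empty)
                rd_ptr <= rd_ptr + 1'b1;
        end
    end
endmodule
```

Writes that arrive while the FIFO is full, and reads that arrive while it is empty, are ignored in this sketch. A real design would normally report them to the producer or consumer.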

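// Verilog code for the LFSR
// The LFSR is triggered on the positive edge of the clock and has the output y_out.
module lfsr (clk, y_out);
  input clk;
  output [5:0] y_out;
  reg [5:0] tmp_reg;
  integer k;

  // The module has no reset port, so the register is given a known start value.
  // With XNOR feedback, all ones is the lock-up state; all zeros is a valid state.
  initial tmp_reg = 6'b000000;

  always @ (posedge clk)
  begin
    // XNOR feedback from bits 4 and 5
    tmp_reg[0] <= tmp_reg[4] ~^ tmp_reg[5];
    // Shift the remaining bits by one position
    for (k=5; k>=1; k=k-1)
      tmp_reg[k] <= tmp_reg[k-1];
  end

  assign y_out = tmp_reg;
endmodule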

Example 8.1 Verilog code for the LFSR

Fig. 8.3 Synthesis result for the LFSR

8.4 Architecture for the DSP Algorithms

What needs to be considered while architecting an SOC for DSP? The main
components are the following:

1. DSP processor core: The core should be capable of performing complex
operations. It should have
a. Internal memory and storage registers
b. Circular buffers and FIFO mechanisms to support the queuing of the data
c. Multipliers and large-density accumulators
d. Shifters
e. Floating point support logic
f. Separate logic for program and data memory access
g. Pipelining and multitasking features

2. DMA controllers: The direct memory access (DMA) controller should be on chip
with the processor. This frees the DSP processor to perform operations
concurrently with DMA transfers.

a. DMA can be used to transfer bursts of data between memories or from
memory to IO.

3. Serial interfaces: Serial data transfer using I2C or SPI is an added advantage.
These interfaces can be used to connect external serial devices to the SOC.

Fig. 8.4 DSP processing system (blocks shown: program memory, program memory
address generator, data memory, data memory address generator, instruction cache,
program sequencer and scheduler, multipliers, registers, ALU, shifters, memory,
DMA controller, CLK and reset logic)



4. On-chip PLL: Phase locked loops generate the clock, and additional clock
distribution logic distributes it with uniform clock skew.
5. Real-time data processing: To process real-time data, timers and real-time
clocks should be present in the DSP SOC.
6. USB controllers: A USB interface for data transfer between the host system
and the DSP processor core can be used in most architectures.
7. Analog blocks: Analog blocks like ADCs and DACs make the SOC compatible
with analog interfaces.
8. Bus interface logic: The SOC components, with additional logic, can be
interfaced with the host using high speed bus interfaces.

An SOC architecture that takes all these features into account and provides the
required DSP capabilities is shown in Fig. 8.5.

Fig. 8.5 Architecture for the DSP processing system (blocks shown: real-time clock,
timer, USB controller, DSP processor cores, external bus interface, internal storage,
serial interfaces I2C and SPI, DMA controller, ADC, DAC, PLL)



8.5 Video Encoders and Decoders

High-density video processing systems need video encoders and decoders. The
architecture of such a system is complex because of its parallelism and storage
needs. Video encoding should run in real time, which requires the following:
1. Ping-pong buffers or circular buffers to queue the data.
2. Frame prediction logic to detect the type of the frame. For example, with the
H.264 encoder standard, a frame can be intra or inter, and the type is predicted
by the frame prediction logic.
3. Frame processing logic: This logic performs the quantization and the entropy
coding. As such a system uses complex matrix multiplications, the density of
this logic is high.
4. Internal memory buffers: High capacity memory buffers are required to store
the data for the predictions.
5. Controllers: A controller built from multiple state machines can be designed for
such encoders to derive the timing and control signals.
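
The ping-pong buffering named in the first point can be sketched as two memory banks: the producer fills one bank while the consumer reads the other, and the roles swap at a block boundary. This is a minimal sketch under stated assumptions. The module name ping_pong_buffer, the port names, the swap pulse, and the bank size are hypothetical, and nothing here is specific to H.264.

```verilog
// Illustrative ping-pong buffer: two banks in one memory, selected by the
// most significant address bit. Names and sizes are assumptions for this sketch.
module ping_pong_buffer #(
    parameter DATA_W = 8,
    parameter ADDR_W = 6          // each bank holds 2**ADDR_W words
) (
    input  wire              clk,
    input  wire              reset_n,   // asynchronous reset, active low
    input  wire              wr_en,
    input  wire [ADDR_W-1:0] wr_addr,
    input  wire [DATA_W-1:0] wr_data,
    input  wire              swap,      // one-cycle pulse when a block is complete
    input  wire [ADDR_W-1:0] rd_addr,
    output reg  [DATA_W-1:0] rd_data    // registered read, one cycle of latency
);
    // Twice the bank depth: bank 0 is the lower half, bank 1 the upper half.
    reg [DATA_W-1:0] mem [0:(1<<(ADDR_W+1))-1];

    // Bank currently owned by the writer; the reader always uses the other one.
    reg wr_bank;

    always @(posedge clk or negedge reset_n) begin
        if (!reset_n)
            wr_bank <= 1'b0;
        else if (swap)
            wr_bank <= ~wr_bank;
    end

    always @(posedge clk) begin
        if (wr_en)
            mem[{wr_bank, wr_addr}] <= wr_data;   // fill the writer's bank
        rd_data <= mem[{~wr_bank, rd_addr}];      // drain the other bank
    end
endmodule
```

Because the writer and the reader never address the same bank, the consumer can process a complete block while the next block is still arriving, which is the property that real-time encoding relies on.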
The video encoding system is shown in Fig. 8.6. It is assumed that the video input
and output are digital data.
The compressed video data from the video encoder is the input to the video
decoding system (Fig. 8.7). The components of such a system are
1. Entropy decoding
2. Intra or inter frame prediction logic
3. Inverse quantization and transform
4. Deblocking logic
5. Frame buffer (frame storage)

Fig. 8.6 Video encoder



Fig. 8.7 Video decoder (H.264)

8.6 How the Discussion Is Helpful in SOC Prototyping?

As FPGAs have DSP capabilities, they can be used to implement DSP algorithms.
During SOC prototyping, the RTL can be tweaked to obtain its FPGA equivalent.
Implement FPGA-based algorithms by using the dedicated DSP blocks available
inside the FPGA.
High-density FPGAs from Xilinx or Intel are efficient for digital signal processing
(DSP) applications because they can implement custom, fully parallel algorithms.
As stated earlier, the DSP system should use multipliers and accumulators while
executing the DSP algorithms.
Features of Xilinx 7 series FPGAs are listed below:
1. Full-custom, low-power DSP slices
2. High-speed, small size architecture
3. DSP slices that enhance the design performance
The basic functionality of the DSP48E1 slice is shown in Fig. 8.8 [1], and
highlights of the DSP functionality include [1]:
1. 25 × 18 two's complement multiplier with dynamic bypass
2. 48-bit accumulator
3. Single instruction, multiple data (SIMD) arithmetic unit
4. Dual 24-bit or quad 12-bit add/subtract/accumulate
5. 96-bit-wide logic functions when the pattern detector is used in conjunction
with the logic unit
6. Optional pipelining and dedicated buses for cascading

Fig. 8.8 Xilinx DSP slice architecture [1]
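
The dual 24-bit SIMD add/subtract listed above can be illustrated with a small behavioral model in which a 48-bit word is treated as two independent 24-bit lanes. This is only a sketch of the lane arithmetic. The module name simd_dual24_add and its ports are hypothetical, and no claim is made that a synthesis tool maps this code onto the SIMD mode of a DSP slice.

```verilog
// Behavioral model of a dual 24-bit SIMD add/subtract.
// Module and port names are assumptions for this sketch.
module simd_dual24_add (
    input  wire [47:0] a,
    input  wire [47:0] b,
    input  wire        subtract,   // 1: a - b in each lane, 0: a + b in each lane
    output wire [47:0] result
);
    // Each lane is computed on its own 24 bits, so no carry or borrow
    // crosses the boundary between bit 23 and bit 24.
    wire [23:0] lo = subtract ? (a[23:0]  - b[23:0])  : (a[23:0]  + b[23:0]);
    wire [23:0] hi = subtract ? (a[47:24] - b[47:24]) : (a[47:24] + b[47:24]);

    assign result = {hi, lo};
endmodule
```

A plain 48-bit adder would let the carry out of the lower lane corrupt the upper lane. Cutting the carry chain is what allows one adder to serve two narrower operands per clock.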

8.6.1 Intel FPGA DSP Block

Intel Stratix 10 devices have powerful DSP features for floating point operations.
The DSP block also has hard fixed point capability. The DSP architecture is a
variable precision architecture.
The important features of the DSP block are listed below:
1. Hard 18-bit and 25-bit pre-adders
2. Hard floating point adders and multipliers
3. A 64-bit accumulator for separate I and Q product accumulation
4. Embedded coefficient registers for 18-bit and 27-bit coefficients
5. Cascaded output adder chain for 18-bit and 27-bit FIR filters
6. Fully independent multiplier outputs
7. Easy inference in all modes from HDL
The DSP block in standard precision fixed point mode is shown in Fig. 8.9 [2].
The DSP block in high precision fixed point mode is shown in Fig. 8.10 [2].
The DSP block in single precision floating point mode is shown in Fig. 8.11 [2].
As shown in the figures, each DSP block can be configured independently as either
dual 18 × 19 or single 27 × 27 multiply and accumulate. The main application of
such a DSP block is to implement high precision DSP functions using the 64-bit
cascade bus. The architecture of the DSP block is flexible, and multiple high
precision blocks can be cascaded over the 64-bit bus. In floating point mode, each
DSP block provides a single precision floating point adder and multiplier.

Fig. 8.9 Standard precision fixed point mode [2]

Fig. 8.10 High precision fixed point mode [2]


Fig. 8.11 Single precision floating point number [2]

Table 8.1 shows the variable precision DSP block configurations.

Complex multiplication using the variable precision DSP block supports FFT
algorithms. The DSP block supports 18-bit DSP applications such as high-definition
video processing, and it supports floating point multiplication. The major advantage
of this DSP capability is reduced overhead in the system design, together with
higher system performance and lower power consumption.
The prototype team can choose the FPGA depending on the required DSP
capabilities and complexity.
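
The complex multiplication mentioned above, which is the core operation of an FFT butterfly, can be sketched with four real multipliers using (a + jb)(c + jd) = (ac - bd) + j(ad + bc). This is a minimal sketch under stated assumptions: the module name complex_mult, the port names, the 18-bit operand width, and the two-stage pipeline are hypothetical choices. Whether the products land in DSP blocks depends on the tool and the device.

```verilog
// Illustrative pipelined complex multiplier (four-multiplier form).
// Names and widths are assumptions for this sketch.
module complex_mult #(
    parameter W = 18                       // width of each real or imaginary part
) (
    input  wire                clk,
    input  wire signed [W-1:0] a_re, a_im, // first operand:  a_re + j*a_im
    input  wire signed [W-1:0] b_re, b_im, // second operand: b_re + j*b_im
    output reg  signed [2*W:0] p_re, p_im  // product, with one bit of growth
);
    // Stage 1 registers: the four real products
    reg signed [2*W-1:0] rr, ii, ri, ir;

    always @(posedge clk) begin
        // Stage 1: multiply
        rr <= a_re * b_re;
        ii <= a_im * b_im;
        ri <= a_re * b_im;
        ir <= a_im * b_re;
        // Stage 2: combine the products of the previous cycle
        p_re <= rr - ii;    // real part:      a_re*b_re - a_im*b_im
        p_im <= ri + ir;    // imaginary part: a_re*b_im + a_im*b_re
    end
endmodule
```

The result appears two clock cycles after the operands. The extra output bit keeps the sum and the difference of two full-width products from overflowing.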

8.7 Design Scenarios

This section describes a few design scenarios. Most DSP algorithm implementations
need multipliers, barrel shifters, and filters.

8.7.1 The Design of the IIR Filter

The implementation of the infinite impulse response (IIR) filter is described in Example 8.2.


Table 8.1 Multiplier size [2]

Multiplier size                  DSP block resources                                      Expected usage
18 × 19 bits                     Half of variable precision DSP block                     Medium precision fixed point operation
27 × 27 bits                     One variable precision DSP block                         High precision fixed point operation
19 × 36 bits                     One variable precision DSP block with external adder     Fixed point FFT
36 × 36 bits                     Two variable precision DSP blocks with external adder    Very high precision fixed point
54 × 54 bits                     Four variable precision DSP blocks with external adder   Double precision floating point
Single precision floating point  One single precision floating point adder,               Floating point
                                 one single precision floating point multiplier

8.7.2 FIR Filter

The Verilog description of the direct form FIR filter is given in Example 8.3. This
filter structure uses a large number of multipliers. While implementing filters, the
designer can map this logic onto the DSP slices (Figs. 8.12 and 8.13).
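
To illustrate a filter written so that its arithmetic can be mapped onto DSP slices, the sketch below describes a four tap direct form FIR filter with explicit signed multiplications. This is a sketch under stated assumptions and is not Example 8.3: the module name fir4_mult, the port names, the widths, and the coefficient values C0 to C3 are hypothetical. Whether the multipliers are actually inferred as DSP slices depends on the synthesis tool and its settings.

```verilog
// Illustrative four tap direct form FIR filter using explicit multipliers.
// Names, widths, and coefficient values are assumptions for this sketch.
module fir4_mult #(
    parameter DATA_W = 8,
    parameter COEF_W = 8,
    parameter signed [COEF_W-1:0] C0 = -4,
    parameter signed [COEF_W-1:0] C1 = 15,
    parameter signed [COEF_W-1:0] C2 = 15,
    parameter signed [COEF_W-1:0] C3 = -4
) (
    input  wire                            clk,
    input  wire                            reset_n,   // asynchronous, active low
    input  wire signed [DATA_W-1:0]        data_in,
    // Product width plus two bits of growth for the sum of four products
    output reg  signed [DATA_W+COEF_W+1:0] data_out
);
    // Tap delay line
    reg signed [DATA_W-1:0] tap_0, tap_1, tap_2, tap_3;

    always @(posedge clk or negedge reset_n) begin
        if (!reset_n) begin
            tap_0    <= 0;
            tap_1    <= 0;
            tap_2    <= 0;
            tap_3    <= 0;
            data_out <= 0;
        end else begin
            // Shift the new sample into the delay line
            tap_0 <= data_in;
            tap_1 <= tap_0;
            tap_2 <= tap_1;
            tap_3 <= tap_2;
            // Sum of products over the four taps (one multiplier per tap)
            data_out <= C0*tap_0 + C1*tap_1 + C2*tap_2 + C3*tap_3;
        end
    end
endmodule
```

Writing the taps as multiplications with parameterized coefficients leaves the choice between shift-and-add logic and dedicated multipliers to the tool. It also makes the one-multiplier-per-tap cost of the direct form visible.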

8.7.3 Barrel Shifters

Barrel shifters are used to shift data in DSP algorithms. The Verilog code is
described in Example 8.4 (Fig. 8.14).
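
For comparison with a register that moves data by one position per clock, a combinational barrel shifter moves the data by any distance in a single step. The sketch below rotates an 8-bit word left using three multiplexer stages. It is a minimal sketch under stated assumptions: the module name barrel_rotate_left8 and the shift_amt port are hypothetical and do not appear in the chapter.

```verilog
// Illustrative combinational barrel shifter: 8-bit rotate left by 0 to 7.
// Module and port names are assumptions for this sketch.
module barrel_rotate_left8 (
    input  wire [7:0] data_in,
    input  wire [2:0] shift_amt,   // rotate distance
    output wire [7:0] data_out
);
    // Three mux stages rotate by 1, 2, and 4 positions. Each bit of
    // shift_amt enables one stage, so any distance from 0 to 7 is covered.
    wire [7:0] stage1 = shift_amt[0] ? {data_in[6:0], data_in[7]}  : data_in;
    wire [7:0] stage2 = shift_amt[1] ? {stage1[5:0],  stage1[7:6]} : stage1;
    wire [7:0] stage4 = shift_amt[2] ? {stage2[3:0],  stage2[7:4]} : stage2;

    assign data_out = stage4;
endmodule
```

For example, with data_in = 8'b1000_0001 and shift_amt = 3, the first two stages are active and the output is 8'b0000_1100. The logarithmic structure needs log2(width) multiplexer levels instead of one clock cycle per position.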

// Verilog code for the IIR filter
// The IIR filter is sensitive to the rising edge of the clock.
module iir_design (clk, reset_n, data_in, data_out);
  parameter N=15;
  input clk;
  input reset_n;
  input [N-1:0] data_in;
  output [N-1:0] data_out;
  reg [N-1:0] tmp1_data_out, tmp2_data_out;

  always @ (posedge clk or negedge reset_n)
  begin
    if (~reset_n)
    begin
      tmp1_data_out <= 0;
      tmp2_data_out <= 0;
    end
    else
    begin
      tmp1_data_out <= data_in;
      // The printed feedback expression is garbled. It is read here as
      // y/2 + y/4 using sign-extended right shifts, following the shift-and-add
      // pattern of Example 8.3. This reading is an assumption.
      tmp2_data_out <= tmp1_data_out
                     + {tmp2_data_out[N-1], tmp2_data_out[N-1:1]}
                     + {{2{tmp2_data_out[N-1]}}, tmp2_data_out[N-1:2]};
    end
  end

  assign data_out = tmp2_data_out;
endmodule

Example 8.2 Synthesizable Verilog code of the IIR filter



// Verilog code for the 4 tap direct FIR filter
// The four tap direct FIR filter is realized using non-blocking assignments.
// It is sensitive to the positive edge of the clock and has the output data_out.
module fir_design (clk, reset_n, data_in, data_out);
  parameter N=8;
  input clk;
  input reset_n;
  input [N-1:0] data_in;
  output [N-1:0] data_out;
  reg [N-1:0] tmp_0, tmp_1, tmp_2, tmp_3;
  reg [N-1:0] data_out, tap_0, tap_1, tap_2, tap_3;

  always @ (posedge clk or negedge reset_n)
  begin
    if (~reset_n)
    begin
      data_out <= 0;
      // Unsized constants are not allowed inside a concatenation
      {tmp_0, tmp_1, tmp_2, tmp_3} <= {(4*N){1'b0}};
      tap_3 <= 0;
      tap_2 <= 0;
      tap_1 <= 0;
      tap_0 <= 0;
    end
    else
    begin
      // 2*tap + tap + tap/2 + tap/4. The shift is parenthesized because
      // + binds tighter than << in Verilog.
      tmp_1 <= (tap_1<<1) + tap_1 + {tap_1[N-1], tap_1[N-1:1]} + {tap_1[N-1], tap_1[N-1], tap_1[N-1:2]};
      tmp_2 <= (tap_2<<1) + tap_2 + {tap_2[N-1], tap_2[N-1:1]} + {tap_2[N-1], tap_2[N-1], tap_2[N-1:2]};
      tmp_3 <= tap_3;
      tmp_0 <= tap_0;
      data_out <= tmp_1 + tmp_2 - (tmp_3 + tmp_0);
      // Tap delay line
      tap_3 <= tap_2;
      tap_2 <= tap_1;
      tap_1 <= tap_0;
      tap_0 <= data_in;
    end
  end
endmodule

Example 8.3 Synthesizable Verilog code of four tap FIR filters



Fig. 8.12 Synthesis result for the FIR filter

// Verilog code for the barrel shifter
// The data is shifted on the rising edge of the clock.
module barrel_shifter (clk, reset_n, load_en, data_in, data_out);
  input clk;
  input reset_n;
  input load_en;
  input [7:0] data_in;
  output wire [7:0] data_out;
  reg [7:0] tmp_data_out;

  always @ (posedge clk or negedge reset_n)
  begin
    if (~reset_n)
      tmp_data_out <= 0;
    else if (load_en)
      tmp_data_out <= data_in;
    else
      // Rotate left by one position per clock
      tmp_data_out <= {tmp_data_out[6:0], tmp_data_out[7]};
  end

  assign data_out = tmp_data_out;
endmodule

Example 8.4 Synthesizable Verilog code for the barrel shifter



Fig. 8.13 Synthesis result for the FIR filter (contd.)

Fig. 8.14 Synthesis result for the barrel shifter

8.8 Important Takeaways and Further Discussions

The following points summarize the chapter:

1. DSP designs such as IIR and FIR filters should be implemented using the
dedicated FPGA blocks.
2. If DSP IPs are available, use the vendor specified boards during prototyping.
3. If multiple FPGAs are used in the design, take care with the design partitioning
of complex IIR and FIR filters.
4. Choose the DSP processor by understanding the speed requirements of the
design.
5. Use pipelined algorithms and pipelined control stages to implement the DSP
algorithms.
6. For floating point operations, the important parameters are area, speed, and power.
If FPGAs are used, check that the RTL code is inferred in the intended DSP
slice.

The next chapter discusses ASIC and FPGA synthesis and provides an understanding
of synthesis and constraints. It also discusses the RTL tweaks required for the
FPGA equivalent.

References

1. www.xilinx.com
2. www.altera.com
