21ec42 Module 5
MODULE-5
DIGITAL SIGNAL PROCESSORS
Prepared By:
Mrs. Latha S
Asst. Prof., Dept. of ECE
1 Module-5 Digital Signal Processors Dept. of ECE, SJBIT
Syllabus
DSP Architecture, DSP Hardware Units, Fixed point format, Floating point Format, IEEE Floating
point formats, Fixed point digital signal processors, Floating point processors, FIR and IIR filter
implementations in Fixed point systems.
Unlike general-purpose microprocessors and microcontrollers, digital signal (DS) processors have
special features to support operations such as the fast Fourier transform (FFT), filtering,
convolution and correlation, as well as real-time sample-based and block-based processing.
Therefore, DS processors use a dedicated hardware architecture.
• To accelerate the execution speed of digital signal processing, DS processors are designed
based on the Harvard architecture (Figure 2), which originated from the Mark 1 relay-based
computers built by IBM in 1944 at Harvard University.
• The DS processor has two separate memory spaces. One is dedicated for the program
code, while the other is employed for data.
• Hence, to accommodate two memory spaces, two corresponding address buses and two
data buses are used.
• In this way, the program memory and data memory have their own connections to the
program memory bus and data memory bus, respectively.
• This means that the Harvard processor can fetch the program instruction and data in
parallel at the same time, the former via the program memory bus and the latter via the data
memory bus.
• There is an additional unit called a multiplier and accumulator (MAC), which is the
dedicated hardware used for the digital filtering operation.
• The last additional unit, the shift unit, is used for the scaling operation for fixed-point
implementation when the processor performs digital filtering.
Application:
• The Harvard architecture is preferred for all DS processors due to the requirements of most
digital signal processing (DSP) algorithms, such as filtering, convolution, and FFT, which
need repetitive arithmetic operations, including multiplications, additions, memory access,
and heavy data flow through the CPU.
• For other applications, such as those based on simple microcontrollers with less stringent
timing requirements, the Von Neumann architecture may be a better choice, since it requires
much less silicon area and is thus less expensive.
2) Shifters
• In digital filtering, to prevent overflow, a scaling operation is required.
• A simple scaling-down operation shifts data to the right, while a scaling-up operation shifts data
to the left.
• Shifting data to the right is the same as dividing the data by 2 and truncating the fraction part;
shifting data to the left is equivalent to multiplying the data by 2.
E.g. for a 3-bit data word $011_2 = 3_{10}$, shifting 011 to the right gives $001_2 = 1_{10}$, that is,
3/2 = 1.5, and truncating 1.5 results in 1.
Shifting the same number to the left, we have $110_2 = 6_{10}$, that is, 3 × 2 = 6.
• The DS processor often shifts data by several bits for each data word.
• To speed up such operation, the special hardware shift unit is designed to accommodate the scaling
operation, as depicted in Figure 2.
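The shift-based scaling described above can be checked with a minimal sketch. Python integers stand in for the hardware shift unit here, and the 3-bit unsigned word width (`WIDTH`) and the helper names are assumptions taken from the example, not part of any processor's instruction set.

```python
WIDTH = 3                      # word width from the example (assumed unsigned)
MASK = (1 << WIDTH) - 1        # keeps results inside the 3-bit word


def shift_right(value: int, bits: int = 1) -> int:
    """Scale down: divide by 2**bits and truncate the fraction."""
    return (value & MASK) >> bits


def shift_left(value: int, bits: int = 1) -> int:
    """Scale up: multiply by 2**bits, discarding bits that leave the word."""
    return (value << bits) & MASK


x = 0b011                                   # decimal 3
print(format(shift_right(x), "03b"))        # right shift: 3 / 2 truncated
print(format(shift_left(x), "03b"))         # left shift: 3 * 2
```

Shifting by several bits at once (for example `shift_right(x, 2)`) corresponds to scaling by a larger power of 2 in a single operation.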
3) Address Generators
• The DS processor generates the addresses for each datum on the data buffer to be processed.
• A special hardware unit for circular buffering is used.
• Figure 5 describes the basic mechanism of circular buffering for a buffer having eight data
samples.
• In circular buffering, a pointer is used and always points to the newest data sample, as shown in
Figure 5.
• After the next sample is obtained from analog-to-digital conversion (ADC), the data will be placed
at the location of x(n-7) and the oldest sample is pushed out.
• Thus, the location for x(n-7) becomes the location for the current sample.
• The original location for x(n) becomes a location for the past sample of x(n-1). The process
continues. Only one location is updated.
• The circular buffer acts like a first-in/first-out (FIFO) buffer, but each datum on the buffer does not
have to be moved.
• Figure 6 gives a simple illustration of the 2-bit circular buffer. In the figure, there is a data flow
from the ADC (a, b, c, d, e, f, g, ...) and a circular buffer initially containing a, b, c, and d.
• The pointer specifies the current data of d, and the equivalent FIFO buffer is shown on the right side
with the current data of d at the top of the memory.
• In applications such as finite impulse response (FIR) filtering, the data buffer size can reach several
hundred samples. Hence, using the circular buffer will significantly enhance the processing speed.
Figure 5: Illustration of circular buffering. Figure 6: Circular buffer and equivalent FIFO.
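The circular buffering mechanism described above can be sketched in software as follows. This is only an illustration under assumptions: the class name `CircularBuffer` and its methods are hypothetical, and a real DS processor performs the modulo address update in its address generation hardware rather than in code.

```python
class CircularBuffer:
    """Fixed-size sample buffer; the pointer always marks the newest sample."""

    def __init__(self, size: int) -> None:
        self.size = size
        self.data = [None] * size
        self.newest = size - 1          # so that the first push lands in slot 0

    def push(self, sample) -> None:
        # Advance the pointer to the slot holding the oldest sample and
        # overwrite it. No other location is touched.
        self.newest = (self.newest + 1) % self.size
        self.data[self.newest] = sample

    def delayed(self, k: int):
        """Return x(n-k) for 0 <= k < size by stepping back modulo size."""
        return self.data[(self.newest - k) % self.size]


buf = CircularBuffer(4)
for sample in "abcd":
    buf.push(sample)
print(buf.delayed(0), buf.delayed(3))   # newest and oldest sample

buf.push("e")                           # overwrites only the slot that held "a"
print(buf.data, buf.delayed(1))
```

Reading `delayed(k)` gives the same view as the equivalent FIFO buffer, but no stored sample is ever moved.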
Fixed-Point Format
• Considering a 3-bit 2’s complement, we can represent all the decimal numbers shown in Table A.
Table A. 3-bit 2's complement representation
Decimal number    2's complement
 3                011
 2                010
 1                001
 0                000
-1                111
-2                110
-3                101
-4                100
• As we see, a 3-bit 2's complement number system has a dynamic range from -4 to 3, which is very
narrow. Since the basic DSP operations include multiplications and additions, the results of these
operations can cause overflow. Overflow occurs when the magnitude of a result exceeds the limits
that can be held in the destination location.
• To address this, we can treat the decimal values as fractional numbers to obtain the fractional
binary 2's complement system shown in the table below.
• To become familiar with the fractional binary 2's complement system, let us convert a positive
fraction 3/4 and a negative fraction -1/4 in decimal to their 2's complements.
Since $\frac{3}{4} = 0 \times 2^{0} + 1 \times 2^{-1} + 1 \times 2^{-2}$, its 2's complement is 011.
And since $\frac{1}{4} = 0 \times 2^{0} + 0 \times 2^{-1} + 1 \times 2^{-2}$, its 2's complement is 001.
For $-\frac{1}{4}$, taking the 2's complement of 001 (inverting the bits to 110 and adding 1) gives 110 + 1 = 111.
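The two conversions above can be reproduced with a short sketch. The function name `to_fractional_twos_complement` is hypothetical, and the code assumes the input is an exact multiple of the interval size, as in the worked examples.

```python
def to_fractional_twos_complement(value: float, bits: int = 3) -> str:
    """Encode a value in [-1, 1) using 1 sign bit and bits-1 fractional bits."""
    scale = 1 << (bits - 1)            # number of steps per unit, 2**(bits-1)
    code = int(value * scale)          # exact for multiples of 2**-(bits-1)
    if not -scale <= code < scale:
        raise ValueError("value outside the representable range")
    # Masking a negative integer yields its 2's complement bit pattern.
    return format(code & ((1 << bits) - 1), f"0{bits}b")


print(to_fractional_twos_complement(0.75))    # 3/4
print(to_fractional_twos_complement(-0.25))   # -1/4
```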
Q- Format
• Q-format number representation is the most common one used in fixed-point DSP implementation.
• The Q number represents the number of fractional bits. Q-15 has 15 fractional bits.
It is defined in Figure below.
• Q-15 means that the data are stored as a 16-bit 2's complement fraction with one sign bit and 15
magnitude (fractional) bits. Note that after the sign bit, the dot shown in the figure implies the
binary point.
• The number is normalized to the fractional range from -1 to 1.
• The range is divided into $2^{16}$ intervals, each with a size of $2^{-15}$. The most negative number
is -1, while the most positive number is $1 - 2^{-15}$.
• Any result from multiplication is within the fractional range of -1 to 1.
Let us study the following examples to become familiar with Q-format number representation.
Example:
Find the signed Q-15 representation for the decimal number 0.560123.
Table C
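As a cross-check of this example, the conversion can be sketched in a few lines. The helper names are hypothetical, and truncation (rather than rounding) is assumed, in line with the truncation error discussed in these notes.

```python
import math


def float_to_q15(value: float) -> int:
    """Truncate a value in [-1, 1) to a 16-bit Q-15 integer code."""
    if not -1.0 <= value < 1.0:
        raise ValueError("value outside the Q-15 range")
    return math.floor(value * 32768)      # 2**15 steps per unit


def q15_bits(code: int) -> str:
    """Return the 16-bit pattern: sign bit followed by 15 fractional bits."""
    return format(code & 0xFFFF, "016b")


def q15_to_float(code: int) -> float:
    return code / 32768


code = float_to_q15(0.560123)
print(q15_bits(code))                     # sign bit, then the binary fraction
print(q15_to_float(code))                 # value actually stored
```

The difference between `q15_to_float(code)` and 0.560123 is the truncation error, which stays below one interval of size 2^-15.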
The Q-format number representation is a better choice than the 2's complement integer representation,
because it can prevent multiplication overflow.
But we need to be concerned with the following problems.
1) When converting a decimal number to its Q-N format, where N denotes the number of
magnitude bits, we may lose accuracy due to the truncation error, which is bounded by the size
of the interval, that is, $2^{-N}$.
2) Addition and subtraction may cause overflow, where adding two positive numbers leads to a
negative number, or adding two negative numbers yields a positive number; similarly,
subtracting a positive number from a negative number gives a positive number, while
subtracting a negative number from a positive number results in a negative number.
3) Multiplying two numbers in Q-15 format will lead to a Q-30 format, which has 31 bits in total
(30 magnitude bits and a sign bit). As in Example 9.7, the multiplication of Q-3 yields a Q-6
format, that is, 6 magnitude bits and a sign bit. In practice, it is common for a DS processor to
hold the multiplication result in a double-word (32-bit) register, as in the MAC operation, when
multiplying two numbers in Q-15 format. Stored in 32 bits, the Q-30 product has one
sign-extended bit. We may get rid of it by shifting left by one bit to obtain Q-31 format and
maintaining the Q-31 format for each MAC operation.
4) Underflow can happen when the result of multiplication is too small to be represented in the
Q-format. As an example, in a Q-2 system (number of fractional bits = 2), multiplying the binary
fractions $0.01_2 \times 0.01_2$ leads to $0.0001_2$. To keep the result in Q-2, we truncate the last
two bits of 0.0001 to obtain 0.00, which is zero. Hence, underflow occurs.
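The addition overflow, the Q-30 to Q-31 step, and the underflow listed above can be illustrated with a small integer sketch. The function names are hypothetical, Python integers stand in for the 16-bit and 32-bit registers, and the special case (-1) × (-1), which is not representable in Q-15, is deliberately not handled.

```python
def wrap16(value: int) -> int:
    """Interpret the low 16 bits of value as a signed 2's complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def q15_add_wrap(a: int, b: int) -> int:
    """Add two Q-15 codes the way a plain 16-bit adder would (no saturation)."""
    return wrap16(a + b)


def q15_mul(a: int, b: int) -> int:
    """Multiply two Q-15 codes and return a truncated Q-15 code."""
    product_q30 = a * b                 # 30 fractional bits
    product_q31 = product_q30 << 1      # drop the redundant sign bit
    return product_q31 >> 16            # keep the upper 16 bits (Q-15)


half, three_quarters = 16384, 24576     # 0.5 and 0.75 in Q-15
print(q15_mul(half, half) / 32768)                   # 0.5 * 0.5
print(q15_add_wrap(three_quarters, half) / 32768)    # overflow: sign flips
print(q15_mul(1, 1))                                 # underflow to zero
```

A practical implementation would usually saturate the adder result instead of letting it wrap; that choice is outside what these notes specify.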
Floating-Point Format
• To increase the dynamic range of number representation, a floating-point format, which is similar
to scientific notation, is used.
• The general format for floating-point number representation is given by
$M \cdot 2^{E}$
• where M is the mantissa, or fractional part in Q format, and E is the exponent. The mantissa and
exponent are signed numbers. If we assign 12 bits for the mantissa and 4 bits for the exponent, the
format looks like Figure 7.
• Since the 12-bit mantissa is limited to between -1 and +1, the number of bits assigned to the exponent
controls the dynamic range.
• The bigger the number of bits designated to the exponent, the larger the dynamic range.
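• The number of bits for the mantissa defines the interval in the normalized range; as shown in Figure
7, the interval size is $2^{-11}$ in the normalized range, which is coarser than the $2^{-15}$ interval of
the Q-15 format. However, when more mantissa bits are used, there will be a smaller interval size.
Using the format in Figure 7, we can determine the most negative and most positive numbers as
Most negative number $= (1.00000000000)_2 \times 2^{0111_2} = -1.0 \times 2^{7} = -128.0$
Most positive number $= (0.11111111111)_2 \times 2^{0111_2} = (1 - 2^{-11}) \times 2^{7} = 127.9375$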
The exponent acts like a scale factor to increase the dynamic range of the number
representation.
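To make the 16-bit format concrete, the sketch below decodes a word into its value. The bit layout is an assumption inferred from the worked cases in these notes (a 4-bit 2's complement exponent followed by a 12-bit 2's complement mantissa with 11 fractional bits), and `decode_float16` is a hypothetical helper.

```python
def decode_float16(word: str) -> float:
    """Decode a 16-character bit string: 4-bit exponent, then 12-bit mantissa."""
    if len(word) != 16 or set(word) - {"0", "1"}:
        raise ValueError("expected a 16-bit binary string")
    exponent = int(word[:4], 2)
    if exponent >= 8:                   # 4-bit 2's complement sign correction
        exponent -= 16
    mantissa = int(word[4:], 2)
    if mantissa >= 2048:                # 12-bit 2's complement sign correction
        mantissa -= 4096
    return (mantissa / 2048) * 2.0 ** exponent   # 11 fractional bits


print(decode_float16("0111" + "011111111111"))   # largest mantissa and exponent
print(decode_float16("0111" + "100000000000"))   # mantissa -1, largest exponent
print(decode_float16("0111" + "011000000000"))   # 0.75 * 2**7
```

With only 4 exponent bits the scale factor ranges from 2^-8 to 2^7, which shows how the exponent width sets the dynamic range.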
Floating point arithmetic is more complicated. We must obey the rules for manipulating two
floating point numbers.
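Addition of two floating-point numbers given as
$x_1 = M_1 2^{E_1}$
$x_2 = M_2 2^{E_2}$
requires making the exponents equal (here the greater one is chosen); this is called aligning
the exponents. The floating-point sum is performed as follows:
$x_1 + x_2 = \left(M_1 + M_2 \times 2^{-(E_1 - E_2)}\right) \times 2^{E_1}$ if $E_1 \ge E_2$
$x_1 + x_2 = \left(M_1 \times 2^{-(E_2 - E_1)} + M_2\right) \times 2^{E_2}$ if $E_1 < E_2$
For multiplication of two properly normalized floating-point numbers $x_1$ and $x_2$, where
$0.5 \le |M_1| < 1$ and $0.5 \le |M_2| < 1$, the calculation can be performed as follows:
$x_1 \times x_2 = (M_1 \times M_2) \times 2^{E_1 + E_2}$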
Overflow
During an operation, overflow will occur when a number is too large to be represented in the floating-
point number system. Adding two mantissa numbers may lead to a number larger than +1 or less than
-1; and multiplying two numbers causes the addition of their two exponents so that the sum of the two
exponents could overflow. Consider the following overflow cases.
Case 1.
Add the following two floating-point numbers:
0111 011000000000 + 0111 010000000000
Note that the two exponents are the same and they are the biggest positive number in 4-bit 2's
complement representation. We add the two positive mantissa numbers as
  0.11000000000
+ 0.10000000000
= 1.01000000000
The MSB of the sum is 1, so the result of adding the two positive mantissa numbers appears negative.
Hence, overflow occurs.
Case 2: Multiply the following two numbers:
0111 011000000000 × 0111 011000000000
Adding the two positive exponents gives 0111 + 0111 = 1110, which reads as a negative number in
4-bit 2's complement, so overflow occurs.
Multiplying the two mantissa numbers gives
0.11000000000 × 0.11000000000 = 0.10010000000 (This is OK!)
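Both overflow cases can be replayed with a minimal sketch that works on (mantissa, exponent) pairs. It assumes the 12-bit mantissa and 4-bit exponent limits used in these cases, uses hypothetical function names, and only flags overflow; it does not renormalize or saturate the result.

```python
E_MIN, E_MAX = -8, 7                 # 4-bit 2's complement exponent range
M_MIN, M_MAX = -1.0, 1 - 2 ** -11    # 12-bit 2's complement mantissa range


def fp_add(m1: float, e1: int, m2: float, e2: int):
    """Align to the greater exponent, add mantissas, flag mantissa overflow."""
    e = max(e1, e2)
    m = m1 * 2.0 ** (e1 - e) + m2 * 2.0 ** (e2 - e)
    return m, e, not (M_MIN <= m <= M_MAX)


def fp_mul(m1: float, e1: int, m2: float, e2: int):
    """Multiply mantissas, add exponents, flag exponent overflow."""
    m = m1 * m2
    e = e1 + e2
    return m, e, not (E_MIN <= e <= E_MAX)


print(fp_add(0.75, 7, 0.5, 7))    # Case 1: mantissa sum leaves the range
print(fp_mul(0.75, 7, 0.75, 7))   # Case 2: exponent sum leaves the range
```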
IEEE Floating-Point Formats
• In the IEEE 754 single precision format, the sign bit S indicates the sign of the number: when
S = 1 the number is negative, and when S = 0 the number is positive.
• The exponent E is in excess 127 form. The value of 127 is the offset from the 8-bit exponent
range from 0 to 255, so that E-127 will have a range from -127 to +128.
• The formula shown in Figure 8 can be applied to convert the IEEE 754 standard (single
precision) to the decimal number.
The following simple examples also illustrate this conversion:
Figure 9
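The single precision conversion can also be checked programmatically. The sketch below applies the standard rule for normalized numbers, $x = (-1)^{S} \times (1.F) \times 2^{E-127}$, and compares it with Python's `struct` module; the function names are hypothetical, and the reserved exponent codes 0 and 255 (zero, denormals, infinity, NaN) are intentionally left out.

```python
import struct


def decode_ieee754_single(bits: int) -> float:
    """Decode a 32-bit pattern of a normalized IEEE 754 single precision number."""
    sign = (bits >> 31) & 0x1
    exponent = (bits >> 23) & 0xFF          # 8-bit excess-127 exponent
    fraction = bits & 0x7FFFFF              # 23-bit fraction field
    if exponent in (0, 255):
        raise ValueError("only normalized numbers are handled in this sketch")
    return (-1.0) ** sign * (1 + fraction / 2 ** 23) * 2.0 ** (exponent - 127)


def float_to_bits(value: float) -> int:
    """Return the 32-bit pattern Python stores for a single precision value."""
    return struct.unpack(">I", struct.pack(">f", value))[0]


print(decode_ieee754_single(0x3FC00000))    # S=0, E=127, F=0.5
print(decode_ieee754_single(0xC0A00000))    # S=1, E=129, F=0.25
print(hex(float_to_bits(1.5)))
```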
Floating-Point Processors
• Floating-point DS processors perform DSP operations using floating-point arithmetic, as we
discussed before.
• The advantages of using the floating-point processor include getting rid of finite word length effects
such as overflows, round-off errors, truncation errors, and coefficient quantization error.
• Hence, in terms of coding, we do not need to scale input samples to avoid overflow, shift the
accumulator result to fit the DAC word size, scale the filter coefficients, or apply Q-format
arithmetic.
• A floating-point DS processor with high speed and calculation precision facilitates a friendly
environment to develop and implement DSP algorithms.
• Analog Devices provides floating-point DSP families such as ADSP210xx and TigerSHARC.
• Texas Instruments offers a wide range of floating-point DSP families, in which the TMS320C3x
is the first generation, followed by the TMS320C4x and TMS320C67x families.
• Since the first generation of a floating-point DS processor is less complicated than later generations
but still has the common basic features, we review the first-generation architecture first.
• Figure 11 shows the typical architecture of Texas Instruments’ TMS320C3x family of processors.
We discuss some key features briefly.
• The TMS320C3x family consists of 32-bit single chip floating-point processors that support both
integer and floating-point operations.
• The processor has a large memory space and is equipped with dual-access on-chip memories.
• A program cache is employed to enhance the execution of commonly used codes.
• Similar to the fixed-point processor, it uses the Harvard architecture, where there are separate
buses used for program and data so that instructions can be fetched at the same time that data are
being accessed.
• There also exist memory buses and data buses for direct-memory access (DMA) for concurrent I/O
and CPU operations, and peripheral access such as serial ports, I/O ports, memory expansion, and
an external clock.
• The C3x CPU contains the floating-point/integer multiplier; an ALU, which is capable of operating
both integer and floating-point arithmetic; a 32-bit barrel shifter; internal buses; a CPU register
file; and dedicated auxiliary register arithmetic units (ARAUs).
• The multiplier operates single-cycle multiplications on 24-bit integers and on 32-bit floating-point
values.
• Using parallel instructions, a multiplication and an ALU operation can each be performed in a single
cycle, which means that a multiplication and an addition are equally fast.
• The ARAUs support addressing modes, some of which are specific to DSP, such as circular
buffering (for digital filtering) and bit-reversal addressing (for FFT operations).
• The CPU register file offers 28 registers, which can be operated on by the multiplier and ALU. The
special functions of the registers include eight extended-precision 40-bit registers for maintaining
accuracy of the floating-point results.
• Eight auxiliary registers can be used for addressing and for integer arithmetic.
• These registers provide internal temporary storage of internal variables instead of external memory
storage, to allow performance of arithmetic between registers. In this way, program efficiency is
greatly increased.
• The prominent feature of C3x is its floating-point capability, allowing operation of numbers with a
very large dynamic range.
• It offers implementation of the DSP algorithm without worrying about problems such as overflows
and coefficient quantization.
• Three floating-point formats are supported.
➢ A short 16-bit floating-point format has 4 exponent bits, 1 sign bit, and 11 mantissa
bits.
➢ A 32-bit single precision format has 8 exponent bits, 1 sign bit, and 23 fraction bits.
➢ A 40-bit extended precision format contains 8 exponent bits, 1 sign bit, and 31 fraction
bits. Although the formats are slightly different from the IEEE 754 standard,
conversions are available between these formats.
• The TMS320C30 offers high-speed performance with 60-nanosecond single-cycle instruction
execution time, which is equivalent to 16.7 MIPS.
• For speech quality applications with an 8 kHz sampling rate, it can handle over 2,000 single-cycle
instructions between two samples (125 microseconds).
• With instruction enhancements such as pipelines executing each instruction in a single cycle (four
cycles required from fetch to execution by the instruction itself) and a multiple interrupt structure,
this high-speed processor validates implementation of real-time applications in floating-point
arithmetic.
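The timing figures quoted above for the TMS320C30 follow from simple arithmetic, reproduced here as a quick check. The variable names are illustrative only.

```python
CYCLE_NS = 60                # single-cycle instruction time in nanoseconds
SAMPLE_RATE_HZ = 8000        # speech-quality sampling rate

# One instruction per 60 ns gives 1000 / 60 instructions per microsecond,
# which is the same number expressed in MIPS.
mips = 1000 / CYCLE_NS

# Time between two samples, and how many single-cycle instructions fit in it.
sample_period_ns = 1_000_000_000 // SAMPLE_RATE_HZ
instructions_per_sample = sample_period_ns // CYCLE_NS

print(round(mips, 1), sample_period_ns, instructions_per_sample)
```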
In Figure 12, the input is scaled down by S and the filter coefficients are scaled down by B. The
output is scaled up by multiplying by B and S to restore the value.
The scale factor B makes the coefficients $b_k/B$ convertible to the Q-format. The scale factors S and B
are usually chosen to be powers of 2, so that a simple shift operation can be used in the coding process.
Let us implement an FIR filter containing filter coefficients larger than 1 in the fixed-point
implementation.
Example:
Given the FIR filter, 𝑦(𝑛) = 0.9𝑥(𝑛) + 3𝑥(𝑛 − 1) + 0.9𝑥(𝑛 − 2)
with a passband gain of 4, and assuming that the input range only occupies one quarter of the full range
for a particular application, develop the DSP implementation equations in the Q-15 fixed point system.
Solution: The adder may overflow even when the input data occupy only one quarter of the full dynamic
range. The scale factor is determined using the impulse response, which consists of the FIR filter
coefficients.
$S = \frac{1}{4}\left(|h(0)| + |h(1)| + |h(2)|\right) = \frac{1}{4}(0.9 + 3 + 0.9) = 1.2$
Since S > 1, overflow may occur. Hence, we select S = 2 (a power of 2). We choose B = 4 to scale all
the coefficients to be less than 1, so the Q-15 format can be used. As per Figure 12, the developed
difference equations are given by
$x_s(n) = \frac{x(n)}{2}$
The input is scaled down by S = 2; $x_s(n)$ is the scaled input.
$y_s(n) = 0.225\,x_s(n) + 0.75\,x_s(n-1) + 0.225\,x_s(n-2)$
The coefficients of the difference equation given in the problem statement are divided by B = 4 so that
each is less than 1 and fits the Q-15 format.
$y(n) = 8\,y_s(n)$
Multiplying by B × S = 8 restores the output value.
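The scaled FIR equations of this example can be tried out in integer arithmetic. The sketch below is one possible Q-15 realization under assumptions: coefficients are truncated to Q-15, the scale factors S = 2 and B = 4 are applied as shifts, Python integers stand in for the accumulator, and the names `to_q15` and `fir_fixed` are hypothetical.

```python
import math

Q = 15                                    # number of fractional bits
S_SHIFT = 1                               # S = 2 as a right shift
B_SHIFT = 2                               # B = 4 as a shift


def to_q15(value: float) -> int:
    """Truncate a value in [-1, 1) to a Q-15 integer code."""
    return math.floor(value * (1 << Q))


# Coefficients 0.9, 3, 0.9 divided by B = 4: 0.225, 0.75, 0.225
COEFFS_Q15 = [to_q15(b / 4) for b in (0.9, 3.0, 0.9)]


def fir_fixed(x_q15):
    """Run y(n) = 0.9x(n) + 3x(n-1) + 0.9x(n-2) using the scaled equations."""
    history = [0, 0, 0]                   # xs(n), xs(n-1), xs(n-2)
    output = []
    for x in x_q15:
        xs = x >> S_SHIFT                 # xs(n) = x(n) / 2
        history = [xs] + history[:2]
        acc = sum(c * v for c, v in zip(COEFFS_Q15, history))   # Q-30 sum
        ys = acc >> Q                     # back to Q-15
        output.append(ys << (S_SHIFT + B_SHIFT))                # y(n) = 8 ys(n)
    return output


x = [to_q15(0.2)] * 5                     # input within one quarter of full range
print([round(v / 32768, 4) for v in fir_fixed(x)])
```

The printed values should stay close to the floating-point outputs 0.18, 0.78, 0.96, ...; the small differences come from truncating the input, the coefficients, and the accumulator.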
Next, the direct-form I implementation of the IIR filter is illustrated in Figure 13. As shown in the
figure, the purpose of the scale factor C is to scale down the original filter coefficients to the Q-format.
The factor C is usually chosen to be a power of 2 for using a simple shift operation in DSP.
Example:
Given the IIR filter, $y(n) = 2x(n) + 0.5y(n-1)$, which uses the direct-form I realization, and given
that for a particular application the maximum input is $I_{max} = 0.010\ldots0_2 = 0.25$, develop the
DSP implementation equations in the Q-15 fixed-point system.
Solution: Given the difference equation $y(n) = 2x(n) + 0.5y(n-1)$, the transfer function of the IIR
filter is found by applying the z-transform on both sides and rearranging:
$H(z) = \frac{2}{1 - 0.5z^{-1}} = \frac{2z}{z - 0.5}$
Applying the inverse z-transform, we obtain the impulse response
$h(n) = 2 \times (0.5)^{n} u(n)$
To prevent overflow in the adder, we can compute the S factor with the help of the Maclaurin series
or approximate Equation (1) numerically. We get
$S = 0.25 \times \left[2(0.5)^{0} + 2(0.5)^{1} + 2(0.5)^{2} + \cdots\right] = 0.25 \times 2 \times \frac{1}{1 - 0.5} = 1$
because $\sum_{n=0}^{\infty} a^{n} = \frac{1}{1 - a}$ for $|a| < 1$.
Hence, we do not need to perform input scaling. However, we need to scale down all the coefficients to
use the Q-15 format. A factor of C = 4 is selected. From Figure 13, we get the difference equations as
$x_s(n) = x(n)$
$y_s(n) = 0.5\,x_s(n) + 0.125\,y_f(n-1)$ [after scaling by C = 4]
$y_f(n) = 4\,y_s(n)$
$y(n) = y_f(n)$
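A possible integer realization of this direct-form I example is sketched below. It assumes that the scaled output is restored by multiplying by C = 4 (a left shift by 2) before it is fed back, that S = 1 so the input is used unscaled, and that the input stays below the stated maximum of 0.25; the function names are hypothetical.

```python
import math

Q = 15                                     # number of fractional bits
C_SHIFT = 2                                # C = 4 as a shift


def to_q15(value: float) -> int:
    """Truncate a value in [-1, 1) to a Q-15 integer code."""
    return math.floor(value * (1 << Q))


B0_Q15 = to_q15(2.0 / 4)                   # 2 / C   = 0.5
A1_Q15 = to_q15(0.5 / 4)                   # 0.5 / C = 0.125


def iir_df1_fixed(x_q15):
    """Run y(n) = 2x(n) + 0.5y(n-1) using coefficients scaled down by C = 4."""
    yf_prev = 0                            # yf(n-1), restored output
    output = []
    for x in x_q15:
        xs = x                             # S = 1: no input scaling
        acc = B0_Q15 * xs + A1_Q15 * yf_prev   # Q-30 accumulator
        ys = acc >> Q                      # ys(n) in Q-15
        yf = ys << C_SHIFT                 # yf(n) = 4 ys(n)
        output.append(yf)
        yf_prev = yf
    return output


def iir_reference(x):
    """Floating-point reference for the same difference equation."""
    y_prev, output = 0.0, []
    for xn in x:
        y_prev = 2 * xn + 0.5 * y_prev
        output.append(y_prev)
    return output


fixed = [v / 32768 for v in iir_df1_fixed([to_q15(0.2)] * 10)]
print([round(v, 4) for v in fixed])
```

Comparing `fixed` with `iir_reference([0.2] * 10)` shows only small truncation differences.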
The fixed-point implementation for direct-form II is more complicated. The developed direct-form II
implementation of the IIR filter is illustrated in Figure 14. As shown in the figure, two scale factors A
and B are designated to scale denominator coefficients and numerator coefficients to their Q-format
representations, respectively. Here S is a special factor to scale down the input sample so that
numerical overflow in the first sum in Figure 14 can be prevented. The difference equations describing
the two stages are:
$w(n) = x(n) - a_1 w(n-1) - a_2 w(n-2) - \cdots - a_M w(n-M)$
$y(n) = b_0 w(n) + b_1 w(n-1) + \cdots + b_M w(n-M)$
The first equation is scaled down by the factor A to ensure that all the denominator coefficients are less
than 1, that is,
$w_s(n) = \frac{1}{A} w(n) = \frac{1}{A} x(n) - \frac{1}{A} a_1 w(n-1) - \frac{1}{A} a_2 w(n-2) - \cdots - \frac{1}{A} a_M w(n-M)$
$w(n) = A \times w_s(n)$
Similarly, the second equation yields
$y_s(n) = \frac{1}{B} y(n) = \frac{1}{B} b_0 w(n) + \frac{1}{B} b_1 w(n-1) + \cdots + \frac{1}{B} b_M w(n-M)$
and $y(n) = B \times y_s(n)$
To avoid overflow in the first adder (first equation), the scale factor S can be safely determined by
Equation (1):
$S = I_{max} \times \left(|h(0)| + |h(1)| + |h(2)| + \cdots\right)$ (1)
where h(k) is the impulse response due to the denominator polynomial of the IIR filter, whose poles
can cause a large value in the first sum. Hence, h(n) is given by
$h(n) = Z^{-1}\left\{\frac{1}{1 + a_1 z^{-1} + \cdots + a_M z^{-M}}\right\}$ (2)
where $Z^{-1}$ denotes the inverse z-transform. All the scale factors A, B, and S are usually chosen to
be powers of 2, so that shift operations can be used in the coding process.
Example:
Given the IIR filter,
𝑦(𝑛) = 0.75𝑥(𝑛) + 1.49𝑥(𝑛 − 1) + 0.75𝑥(𝑛 − 2) − 1.52𝑦(𝑛 − 1) − 0.64𝑦(𝑛 − 2)
with a passband gain of 1 and a full range of input, use the direct-form II implementation to develop
the DSP implementation in the Q-15 fixed-point system.
Solution: The difference equations without scaling in the direct-form II implementation are given by,
𝑤(𝑛) = 𝑥(𝑛) − 1.52 𝑤(𝑛 − 1) − 0.64 𝑤(𝑛 − 2)
𝑦(𝑛) = 0.75 𝑤(𝑛) + 1.49 𝑤(𝑛 − 1) + 0.75 𝑤(𝑛 − 2)
To prevent overflow in the first adder, we obtain the reciprocal of the denominator polynomial as
$A(z) = \frac{1}{1 + 1.52z^{-1} + 0.64z^{-2}}$
Applying Equation (1) to the impulse response of A(z), we choose the S factor as S = 16, and we choose
A = 2 to scale down the denominator coefficients by half.
Since the second adder output after scaling is
$y_s(n) = \frac{0.75}{B} w(n) + \frac{1.49}{B} w(n-1) + \frac{0.75}{B} w(n-2)$
to avoid second adder overflow we have to ensure that each coefficient is less than 1, along with the
sum of the absolute values:
$\frac{0.75}{B} + \frac{1.49}{B} + \frac{0.75}{B} < 1$
This requires B > 2.99. Hence B = 4 is selected. We develop the DSP equations as
𝑥𝑠 (𝑛) = 𝑥(𝑛)/16
𝑤𝑠 (𝑛) = 0.5 𝑥𝑠 (𝑛) − 0.76 𝑤(𝑛 − 1) − 0.32𝑤(𝑛 − 2)
𝑤(𝑛) = 2 𝑤𝑠 (𝑛)
𝑦𝑠 (𝑛) = 0.1875 𝑤(𝑛) + 0.3725 𝑤(𝑛 − 1) + 0.1875 𝑤(𝑛 − 2)
𝑦(𝑛) = (𝐵 × 𝑆) 𝑦𝑠 (𝑛) = 64 𝑦𝑠 (𝑛)
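The choice of scale factors in this example can be checked numerically. The sketch below sums the impulse response of 1/(1 + 1.52z^-1 + 0.64z^-2) to evaluate Equation (1) for a full-range input (I_max = 1 is assumed), rounds up to a power of 2, and then confirms in floating point that the scaled direct-form II equations reproduce the original filter. All function names are hypothetical, and the 200-term truncation of the sum is an assumption that relies on the impulse response decaying quickly.

```python
def impulse_abs_sum(a1: float, a2: float, n_terms: int = 200) -> float:
    """Sum |h(k)| for h(k) generated by 1 / (1 + a1 z^-1 + a2 z^-2)."""
    h1 = h2 = 0.0
    total = 0.0
    for k in range(n_terms):
        h = (1.0 if k == 0 else 0.0) - a1 * h1 - a2 * h2
        total += abs(h)
        h1, h2 = h, h1
    return total


def next_power_of_two(value: float) -> int:
    """Smallest power of 2 that is greater than or equal to value."""
    power = 1
    while power < value:
        power *= 2
    return power


def df2_scaled(x):
    """Scaled direct-form II equations with S = 16, A = 2, B = 4."""
    w1 = w2 = 0.0
    output = []
    for xn in x:
        xs = xn / 16
        ws = 0.5 * xs - 0.76 * w1 - 0.32 * w2
        w = 2 * ws
        ys = 0.1875 * w + 0.3725 * w1 + 0.1875 * w2
        output.append(64 * ys)
        w1, w2 = w, w1
    return output


def df_reference(x):
    """Original difference equation, computed directly."""
    x1 = x2 = y1 = y2 = 0.0
    output = []
    for xn in x:
        yn = 0.75 * xn + 1.49 * x1 + 0.75 * x2 - 1.52 * y1 - 0.64 * y2
        output.append(yn)
        x2, x1 = x1, xn
        y2, y1 = y1, yn
    return output


print(next_power_of_two(impulse_abs_sum(1.52, 0.64)))   # candidate for S
print(next_power_of_two(0.75 + 1.49 + 0.75))            # candidate for B
```

Feeding the same test sequence to `df2_scaled` and `df_reference` should give matching outputs up to floating-point rounding, which indicates that the scale factors cancel as intended.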