Lab 1 - Block Filtering
In a real-time signal processing scenario, an input signal is constantly fed into the DSP to be
processed, and the associated output is made ready without “much delay” (otherwise it wouldn’t
be called “real-time”). So, if we had a long input signal that continues in time indefinitely, we
don’t want to wait until the entire input signal has terminated before filtering it; this would be
equivalent to doing something “off-line” after all the input is available. Instead, we would like
to filter the blocks of input as they come in. Furthermore, we would like to filter these blocks
quickly, which means the use of the Fast Fourier Transform (FFT) algorithm is necessary to
implement the associated processes in the frequency domain. There are two methods for
segmenting a long (possibly infinitely long) input signal into shorter blocks, and processing
them quickly using the FFT. They are called the Overlap-Add method and the Overlap-Save
method. The former is easier to explain.
Overlap-Add
The Overlap-Add method is based on the observation that when we consider two
discrete-time signals, say xk(n) and h(n), with support L and support M, respectively (note: the support
is the length of the smallest consecutive stretch of points that contains all non-zero signal
elements), the resulting convolution yk(n) = xk(n) * h(n), has a support of L+M-1. For example,
say the support for xk(n) is n = 0, …, L-1 and the support for h(n) is n = 0, …, M-1, then the support
for yk(n) is at n = 0, …, L+M-2. The questions at the end of this lab will elucidate this concept.
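As a quick check of the support claim, here is a minimal sketch in plain Python. The helper name convolve and the sample values are illustrative assumptions, not part of the lab materials. It convolves a block with L = 4 and a filter with M = 3 and confirms that the result has L+M-1 = 6 samples.

```python
def convolve(x, h):
    """Direct linear convolution of two finite sequences."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

x_k = [1.0, 2.0, 3.0, 4.0]   # L = 4, support n = 0, ..., 3
h = [1.0, 1.0, 1.0]          # M = 3, support n = 0, ..., 2
y_k = convolve(x_k, h)       # support n = 0, ..., L+M-2

print(len(y_k))  # 6, i.e. L + M - 1
print(y_k)       # [1.0, 3.0, 6.0, 9.0, 7.0, 4.0]
```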
Using this idea suppose our input stream x(n) is an infinite sequence starting at time n = 0.
Divide x(n) into L-length blocks and convolve each L-block with h(n) (using linear convolution).
Then sum all the convolution outcomes along the L-boundaries (we elaborate more soon). This
works because convolution distributes over addition: (x1(n) + x2(n)) * h(n) = x1(n) * h(n) +
x2(n) * h(n). The idea is depicted in Figure 1.
Figure 1 Basic Idea Behind Overlap-Add Method. The kth L-block of x(n) is denoted xk(n).
In Figure 1, the operation of convolving a very long x(n) with h(n) is equivalent to the
operation of convolving each L-block of x(n), denoted xk(n) for the kth block, with h(n) and
then conducting addition judiciously to deal with the “tail” region from each block convolution as
we discuss next. An important aspect is that after convolving each block with h(n), the resulting
intermediate signal is L+M-1 samples in length as discussed earlier, and thus the extra M-1
samples at the end due to convolution expanding the support (called the “tail”) must be
added to the first M-1 samples of the next convolved block. This is illustrated in Figure 2, where
the right-hand side (RHS) of the equality shows the result (graphically) after convolution.
Specifically, the kth output block is given by: yk(n) = xk(n) * h(n) and the last M-1 samples of
yk(n) must be added to the first M-1 samples of yk+1(n) to produce the appropriate output
signal y(n) = x(n) * h(n).
Figure 2 “Tail” Resulting from Convolution must be Added to Next Convolved Block;
Note: yk(n) = xk(n) * h(n).
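The tail handling described above can be sketched with ordinary time-domain convolution, before any FFT is involved. This is only an illustration under assumed values (the function names, the test signal and L = 4 are hypothetical). Each block result is added into the output starting at its block boundary, so its last M-1 samples fall on top of the first M-1 samples of the next block.

```python
def convolve(x, h):
    """Direct linear convolution; the result has len(x) + len(h) - 1 samples."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

def overlap_add_direct(x, h, L):
    """Convolve x with h one L-block at a time, adding each tail into the next block."""
    M = len(h)
    y = [0.0] * (len(x) + M - 1)
    for start in range(0, len(x), L):
        y_k = convolve(x[start:start + L], h)   # up to L + M - 1 samples
        for n, value in enumerate(y_k):
            y[start + n] += value               # last M-1 samples overlap the next block
    return y

x = [float(n % 5) for n in range(12)]   # 12 input samples (illustrative)
h = [0.5, 0.25, 0.25]                   # M = 3
y_blocks = overlap_add_direct(x, h, L=4)
y_full = convolve(x, h)

# Largest difference between block-wise and one-shot convolution
print(max(abs(a - b) for a, b in zip(y_blocks, y_full)))
```

If the tails were dropped instead of added, the samples around every block boundary would differ from the one-shot result.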
One main challenge in a real-time processing scenario is that the timing of completing
a block convolution needs to be appropriately synchronized with the overall output speed so
that the tail region may be added to the next block at the right time. If the process of convolving
each block is slower than outputting the samples of blocks already convolved, then the tail
region will not have the opportunity to be added to the next block (because the next block is still
in the process of going through convolution), resulting in the erroneous output of the samples 0 to
M-2 and L to L+M-2 for each block. One way to deal with this is to slow down the rate of output,
which may fail to meet the timing requirements for a given application. Another, more attractive
approach is to speed up the convolution itself, which can be achieved by using the FFT.
The steps for filtering one block of input with the FFT are summarized here (note: this is not
the entire Overlap-Add algorithm, as the output blocks must still be combined afterwards):
1. Zero-pad the filter h(n) with L-1 zeros to make it length (L+M-1).
2. Compute the (L+M-1)-FFT of the zero-padded h(n) from Step 1 and save.
3. Zero-pad each L-block input segment xk(n) with M-1 zeros so that it has length (L+M-1).
4. Compute the (L+M-1)-FFT of the zero-padded result of Step 3 and save.
5. Multiply sample-by-sample the two FFT results from Steps 2 and 4.
6. Take the inverse (L+M-1)-FFT (IFFT) of the resulting product from Step 5 to produce yk(n).
After each block yk(n) is computed as above, it is added to the next block as shown in
Figure 2 and more simply in Figure 3. The throughput is L samples per block processed.
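Steps 1 to 6, followed by the tail addition, can be sketched as below. To keep the example dependency-free, a naive O(N^2) DFT (the hypothetical helper dft) stands in for the FFT. It produces the same values as an FFT but none of the speed benefit, so treat this as a check of the bookkeeping rather than a real-time implementation. All names and test values are assumptions for illustration.

```python
import cmath

def dft(a, sign=-1):
    """Naive DFT (sign=-1) or unscaled inverse DFT (sign=+1); a stand-in for an FFT."""
    n = len(a)
    return [sum(a[m] * cmath.exp(sign * 2j * cmath.pi * k * m / n) for m in range(n))
            for k in range(n)]

def overlap_add_fft(x, h, L):
    """Overlap-Add using frequency-domain block convolution of length N = L + M - 1."""
    M = len(h)
    N = L + M - 1
    H = dft(h + [0.0] * (L - 1))                      # Steps 1-2: pad h, transform once
    y = [0.0] * (len(x) + M - 1)
    for start in range(0, len(x), L):
        block = x[start:start + L]
        X = dft(block + [0.0] * (N - len(block)))     # Steps 3-4: pad block to N, transform
        Y = [a * b for a, b in zip(X, H)]             # Step 5: sample-by-sample product
        y_k = [v.real / N for v in dft(Y, sign=+1)]   # Step 6: inverse transform
        for n in range(min(N, len(y) - start)):
            y[start + n] += y_k[n]                    # add the tail into the next block
    return y

x = [float(n) for n in range(1, 11)]   # 10 input samples (illustrative)
h = [1.0, -1.0, 2.0]                   # M = 3
y = overlap_add_fft(x, h, L=4)         # N = 6
print([round(v, 6) for v in y])
```

In practice the dft helper would be replaced by a library FFT, and L would be chosen so that N = L+M-1 is a length the FFT handles efficiently.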
Overlap-Save
The Overlap-Save method is a bit more difficult to explain than the Overlap-Add method
as it is based, in part, on the concept of circular convolution which in this context results in
time-domain aliasing. We describe an Overlap-Save method with the same throughput of L
samples per block processed as discussed for Overlap-Add.
Textbooks often give complicated formulas when explaining circular convolution. Here
is an easier way to consider circular convolution. Let’s consider xk(n) and h(n) with support
regions n = 0, …, N-1 and n = 0, …, M-1, respectively; note the difference from the previous
section: the support length for xk(n) is now N for Overlap-Save, but it was L for Overlap-Add. Let
yL,k(n) be the result of normal convolution (often called linear convolution) of xk(n) and h(n).
Then the N-circular convolution of xk(n) and h(n) can be described in terms of yL,k(n) via the
diagram in Figure 4 for N = 4 and M = 3. Consider two stages. The first stage takes a periodic
extension of the linear convolution result yL,k(n) where the period used for the extension is N.
The second stage windows the result such that we consider only N points at n = 0, …, N-1
leaving all other points with zero-values. The periodic extension is created by adding all shifted
versions (where the shifts are restricted to be integer multiples of N) of the linear convolution
result yL,k(n). Because xk(n) is of length N and h(n) of length M, the length of yL,k(n) will be
N+M-1. Thus, taking a periodic extension of yL,k(n) with period N will result in overlap. Since the
overall goal is to conduct a linear convolution (but using the FFT), this overlap is essentially
“corruption” in the beginning of the windowed sequence as shown in Figure 4. In general, the
points at n = 0, 1, 2, …, M-2 will be corrupted by what we call time-domain “self-aliasing”.
However, the remaining points at n = M-1, M, M+1, …, N-1 will still be equal to the
desired yL,k(n).
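The two-stage description (periodic extension with period N, then windowing to n = 0, …, N-1) can be checked numerically for the N = 4, M = 3 case. The sketch below is illustrative only. The function names and sample values are assumptions, and it compares the aliased linear convolution against the textbook circular convolution sum.

```python
def convolve(x, h):
    """Direct linear convolution of two finite sequences."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

def circular_from_linear(x, h):
    """N-circular convolution obtained by aliasing the linear result with period N."""
    N = len(x)
    y_circ = [0.0] * N
    for n, value in enumerate(convolve(x, h)):
        y_circ[n % N] += value      # shifted copies add up; keep only n = 0, ..., N-1
    return y_circ

def circular_direct(x, h):
    """Textbook circular convolution sum with indices taken modulo N."""
    N = len(x)
    return [sum(h[m] * x[(n - m) % N] for m in range(len(h))) for n in range(N)]

x_k = [1.0, 2.0, 3.0, 4.0]   # N = 4
h = [1.0, 1.0, 1.0]          # M = 3
N, M = len(x_k), len(h)

y_lin = convolve(x_k, h)                 # [1.0, 3.0, 6.0, 9.0, 7.0, 4.0]
y_circ = circular_from_linear(x_k, h)    # [8.0, 7.0, 6.0, 9.0]

print(y_circ == circular_direct(x_k, h))   # True
print(y_circ[M - 1:] == y_lin[M - 1:N])    # True: n = M-1, ..., N-1 are uncorrupted
```

The first M-1 = 2 samples (8 and 7) differ from the linear result (1 and 3) because the tail samples 7 and 4 wrap around onto them.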
Finally, after each output block is generated, it is combined into the convolved output
stream y(n) as illustrated in Figure 5. Figure 5 demonstrates the relationship between input
blocks and output blocks for the Overlap-Save method. The N-length blocks need to overlap
effectively by M-1 points in order to compensate for the corruption due to using the FFT for
convolution (resulting in corruption in the circularly convolved output block).
The following algorithm will further help you implement the Overlap-Save method. We
deal with input blocks of length N. To have an equivalent throughput to the Overlap-Add
method (which makes comparison easier), we let N=L+M-1, where L is the same parameter
as specified in the Overlap-Add method and M is the length of h(n) in both
methods:
1. Zero-pad h(n) with L-1 zeros such that it is of length N and has support n=0, 1, …, N-1.
2. Take the N-FFT of the zero-padded h(n) and save for repeated access in the future.
3. If the current block being processed is the first block, then the first M-1 samples are zeros; fill
the remaining L samples with the signal x(n), starting at n=0. Save the last M-1 samples of
this block for processing of the next block.
If the current block being processed is not the first block, then take the last M-1 samples
from the previous block (previously saved) as the first M-1 samples of the current block
(this creates an overlap), and fill the remaining L samples with the next new samples of x(n).
Save the last M-1 samples of this block for processing of the next block.
Figure 5 illustrates this overlap process.
4. Take the N-FFT of each N-length input block xk(n) and save.
5. Multiply sample-by-sample the FFT result from Step 2 with the FFT result from Step 4.
6. Take the inverse N-FFT of the result from Step 5.
7. Discard the first M-1 samples of the Step 6 result, which are corrupted, keeping only the last L samples.
8. Concatenate the kept L samples of each block to produce the overall output as illustrated in
Figure 5.
Figure 5 Illustration of Breaking up Input Blocks and Combining Filtered Output Blocks for
Overlap-Save Method.
Notice that for the Overlap-Save method, no zero-padding is required for any of the input
blocks (although it may be required once for the FFT of h(n)), and, unlike the Overlap-Add
method, no addition of overlapping output blocks is required. Depending on the relative values of the
parameters L, M and N, one method may be more computationally and/or memory efficient than
the other.
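The eight Overlap-Save steps can be sketched as follows. As an assumption made to keep the example dependency-free, a naive O(N^2) DFT (the hypothetical helper dft) stands in for the N-FFT. Zero-filling a short final block is also an assumption, since the lab text does not say how the end of a finite input is handled. Names and test values are illustrative.

```python
import cmath

def dft(a, sign=-1):
    """Naive DFT (sign=-1) or unscaled inverse DFT (sign=+1); a stand-in for an FFT."""
    n = len(a)
    return [sum(a[m] * cmath.exp(sign * 2j * cmath.pi * k * m / n) for m in range(n))
            for k in range(n)]

def overlap_save(x, h, L):
    """Overlap-Save with block length N = L + M - 1 and L kept samples per block."""
    M = len(h)
    N = L + M - 1
    H = dft(h + [0.0] * (L - 1))                      # Steps 1-2: pad h to N, transform once
    saved = [0.0] * (M - 1)                           # Step 3: first block starts with zeros
    y = []
    for start in range(0, len(x), L):
        new = x[start:start + L]
        new = new + [0.0] * (L - len(new))            # assumption: zero-fill a short last block
        block = saved + new                           # Step 3: M-1 old samples + L new ones
        saved = block[L:]                             # keep the last M-1 samples for next time
        X = dft(block)                                # Step 4
        Y = [a * b for a, b in zip(X, H)]             # Step 5
        y_blk = [v.real / N for v in dft(Y, sign=+1)] # Step 6
        y.extend(y_blk[M - 1:])                       # Steps 7-8: drop M-1, concatenate L
    return y

x = [float(n) for n in range(1, 11)]   # 10 input samples (illustrative)
h = [1.0, -1.0, 2.0]                   # M = 3
y = overlap_save(x, h, L=4)            # N = 6, three blocks, 12 output samples
print([round(v, 6) for v in y[:len(x)]])
```

Note that the loop contains no output-side addition: each block contributes exactly L finished samples, which is the structural difference from Overlap-Add.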
Overlap-Save Algorithm
First, you will construct a block diagram for an Overlap-Save algorithm using
elementary Simulink blocks. For this model, we will walk you through it step by step. The
model will implement an FIR filter kernel of length M = 113. The algorithm will use an FFT and
inverse FFT of length N = L + M - 1 = 512. Thus, the input blocks will be of length N = 512 and
the throughput will be L = 400 output samples per processed block.
1. Use the same lowpass filter as was used for the Overlap-Save algorithm. However, change the
order of the filter to 500 such that the filter kernel has a length of M = 501. Also, change the
Fstop parameter from 800 Hz to 500 Hz. Essentially, we are using a longer, more expensive filter
kernel to produce a lowpass frequency response that drops off much more sharply than that of the
previous filter. Even with a filter order of 500, FFT convolution can compute the output in a
reasonable amount of time.
2. Divide the input to the filter into data blocks of length L = 1548. Note that this makes the size of
your FFT and inverse FFT calculations N = L+M-1 = 2048.
1. It is probably best to use Pad blocks from the Signal Processing Blockset (Signal Operations
library) to append zeros to the end of data blocks.
2. When using the Delay Line block, make sure that the block parameter named Delay line size is
set to be the same size as the input vector to the block. For example, if you input 50x1 data
blocks into the Delay Line and want a delay of one data block, set the Delay line size to 50.
3. If you use a From Workspace block to import the filter coefficients to the Simulink model in a
similar way to the Overlap-Save algorithm, make sure to set the sample time to 1548/8000. This
will match the rate at which the length L data blocks are received by the filter at its input.
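The block sizes and sample times quoted in this lab can be sanity-checked with a few lines of arithmetic. The sketch below assumes a sampling rate of 8000 Hz, which is inferred from the 1548/8000 sample time and is not stated explicitly for the first configuration. The function name is hypothetical.

```python
def block_parameters(L, M, fs):
    """Return the FFT length N = L + M - 1 and the block period L / fs in seconds."""
    return L + M - 1, L / fs

fs = 8000.0  # assumed sampling rate in Hz, inferred from the 1548/8000 sample time

# (L, M) pairs taken from the two filter configurations described in this lab
for L, M in [(400, 113), (1548, 501)]:
    N, period = block_parameters(L, M, fs)
    print(f"L={L}, M={M}: N={N}, block period={period} s")
```

Both configurations give a power-of-two FFT length (512 and 2048), which is presumably why L was chosen as 400 and 1548.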
Perform the same tests as for the Overlap-Save algorithm. Note and explain any obtained result