Module 3C
1. Explain the overlap-add (OLA) method of short-time synthesis, with flow chart, waveforms, and
necessary equations in detail.
Objective: To consider the reconstruction of x[n] from its short-time spectrum based on the
DFT interpretation.
This section addresses whether the signal x[n] can be recovered from the STFT and, if so,
how the reconstruction is achieved mathematically.
The process of reconstructing the signal x[n] from the Short Time Fourier Transform (STFT) is
called Short Time Fourier Synthesis (STFS), or simply short-time synthesis.
The STFT is naturally discrete in the time dimension but continuous in the frequency
dimension.
For a finite window that is non-zero only for 0 ≤ n ≤ L − 1, the above equation becomes, with
ω_k = 2πk/N,
The sampled STFT has two variables: the time index n = 0, R, 2R, ..., rR along the time
dimension, where R is the sampling period in time of the STFT, and the frequency
ω_k = 2πk/N with k = 0, 1, 2, ..., N − 1 for uniform sampling in frequency.
The window function w[n] used in Eq. (7.42) is defined from 0 to L − 1. Therefore the
limits in the above equation are finite: only the L samples of x[m] under the shifted window
contribute to each short-time spectrum.
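The sampled STFT described above can be written out as follows. This is a reconstruction that is consistent with the frame definition y_r[m] = x[m]w[rR − m] used in this section; it is not a verbatim copy of the original Eq. (7.42), and the shorthand Y_r[k] is assumed to denote the short-time spectrum at frame r and frequency index k.

```latex
Y_r[k] = X_{rR}\!\left(e^{j\omega_k}\right)
       = \sum_{m=rR-L+1}^{rR} x[m]\, w[rR-m]\, e^{-j\omega_k m},
\qquad \omega_k = \frac{2\pi k}{N}, \quad k = 0, 1, \ldots, N-1 .
```

The summation limits follow from the window support: w[rR − m] is non-zero only for 0 ≤ rR − m ≤ L − 1.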
Now we consider the reconstruction of x[n] from its short-time spectrum based on the DFT
interpretation.
Module III Derivations
𝑦𝑟 [𝑚] = 𝑥[𝑚]𝑤[𝑟𝑅 − 𝑚]
i.e.,
(because the IDFT equation is given by $y[n] = \frac{1}{N} \sum_{k=0}^{N-1} X(e^{j\omega_k})\, e^{j\omega_k n}$)
If N ≥ L, so that no time aliasing occurs due to the sampling in the ω̂ variable, x[m] inside the
shifted window w[rR − m] can be reconstructed by computing the inverse DFT of
X_n(e^{jω_k}) and dividing by the window (assuming the window is strictly non-zero over its
L-sample support). In this manner L signal values of x[m] can be reconstructed for each
analysis window (where L is the window duration). Then the window can be moved by L samples
and the process repeated.
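The single-frame recovery described above can be sketched in NumPy. This is a minimal illustration, not the notes' own implementation: the sizes L = 8 and N = 16, the random test signal, and the frame position n0 (playing the role of rR) are all assumptions. The frame is indexed from its own first sample, so its DFT differs from the STFT by a linear-phase factor that cancels when the inverse DFT is taken.

```python
import numpy as np

rng = np.random.default_rng(0)
L, N = 8, 16                  # window length and DFT size; N >= L avoids time aliasing
x = rng.standard_normal(64)   # assumed test signal
w = np.hamming(L)             # strictly non-zero on 0..L-1 (end points are 0.08)
n0 = 20                       # assumed frame position, plays the role of rR

# Samples under the shifted window: n0-L+1 <= m <= n0, weighted by w[n0 - m]
m = np.arange(n0 - L + 1, n0 + 1)
frame = x[m] * w[n0 - m]

# N-point DFT of the zero-padded frame, followed by the inverse DFT
Y = np.fft.fft(frame, N)
y = np.fft.ifft(Y).real[:L]

# Dividing by the shifted window recovers the L signal values
x_rec = y / w[n0 - m]
```

Because the window must be divided out, this approach fails for any window with zero-valued samples, which is one motivation for the overlap-add rule that follows.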
That is, for a causal window of length L as in Eq. (7.42), to reconstruct the signal, the inverse
transform of Y_r(e^{jω_k}) is computed for each value of r, giving the sequences:
The inverse Fourier transform of Y_r(e^{jω_k}) is y_r[m].
Then the signal at time n is obtained by summing the values at time n of all the
sequences y_r[m] that overlap at time n; i.e.,
That is, the reconstructed signal is equal to x[n] multiplied by a periodically time-varying
weighting sequence ŵ[n]. The condition for exact reconstruction of x[n] is therefore,
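The overlap-add relations referred to here can be written out as below. This is a reconstruction from the definitions in this section (y_r[m] = x[m]w[rR − m] and the weighting sequence ŵ[n]); the equation numbers of the original are not reproduced, and C denotes the constant reconstruction gain mentioned later in the text.

```latex
\begin{aligned}
y[n] &= \sum_{r=-\infty}^{\infty} y_r[n]
      = x[n] \sum_{r=-\infty}^{\infty} w[rR-n]
      = x[n]\,\hat{w}[n], \\
\hat{w}[n] &= \sum_{r=-\infty}^{\infty} w[rR-n], \\
\hat{w}[n] &= C \quad \text{for all } n
  \quad \Longrightarrow \quad y[n] = C\,x[n].
\end{aligned}
```

Since ŵ[n + R] = ŵ[n], the weighting sequence is periodic with period R, so the condition only needs to be checked over one block of R samples.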
The summation process defined by Eq. (7.45) is illustrated in Figure 7.23, which shows five
overlapping sections of the speech signal, where each section uses a Hamming window of
duration L = 400 samples, and adjacent frames are separated by R = 100 samples. The
summation implied by Eq. (7.47a) is illustrated in this figure by showing at the bottom of the
figure the resulting speech waveform that is recovered by overlapping and adding the
sections.
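The weighting produced by overlapping Hamming windows with L = 400 and R = 100 can be checked numerically. The sketch below is an illustration under assumptions: it uses NumPy's symmetric `np.hamming`, an arbitrary span of 2000 samples, and the variable names are hypothetical. It evaluates ŵ[n] = Σ_r w[rR − n] and compares it with the average gain Σ w[n] / R.

```python
import numpy as np

L, R = 400, 100
w = np.hamming(L)                  # symmetric Hamming window on 0..L-1
n_total = 2000                     # assumed span of output samples to inspect

# w_hat[n] = sum over r of w[rR - n]
i = np.arange(L)                   # window index i = rR - n
w_hat = np.zeros(n_total)
for r in range((n_total + L) // R + 1):
    n = r * R - i                  # output samples touched by frame r
    ok = (n >= 0) & (n < n_total)
    w_hat[n[ok]] += w[i[ok]]

gain = w.sum() / R                 # W(e^{j0}) / R, the average value of w_hat
ripple = np.max(np.abs(w_hat / gain - 1.0))
```

Here `ripple` is the largest relative deviation of ŵ[n] from a constant; it is small but not exactly zero for the symmetric window, which matches the "very nearly exact" reconstruction described in the text.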
And ŵ[n] = w_rect[n] = 1 for the rectangular window.
If L for a rectangular window is even and R = L/2, Eq. (7.48) is satisfied with C = 2.
In fact, for the rectangular window, if L = 2^ν, where ν is an appropriately large integer, the
signal x[n] can be perfectly reconstructed from Y_r[k] by the OLA method of Eq. (7.47a) when
L ≤ N and R = L, L/2, L/4, ..., 1. The corresponding reconstruction gains are
C = L/R = 1, 2, 4, ..., L.
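The gains for the rectangular window can be verified with a short script. This is a sketch under assumptions: L = 16 is an arbitrary power of two, and `ola_weight` is a hypothetical helper that evaluates ŵ[n] = Σ_r w[rR − n] over a finite range of n.

```python
import numpy as np

def ola_weight(w, R, n_total):
    """Return w_hat[n] = sum_r w[rR - n] for 0 <= n < n_total."""
    L = len(w)
    i = np.arange(L)                       # window index i = rR - n
    w_hat = np.zeros(n_total)
    for r in range((n_total + L) // R + 1):
        n = r * R - i
        ok = (n >= 0) & (n < n_total)
        w_hat[n[ok]] += w[i[ok]]
    return w_hat

L = 16                                     # assumed window length, L = 2**4
gains = {}
R = L
while R >= 1:
    w_hat = ola_weight(np.ones(L), R, 10 * L)
    gains[R] = (w_hat.min(), w_hat.max())  # equal values mean a constant gain
    R //= 2
```

For each hop size the minimum and maximum of ŵ[n] coincide and equal L/R, because any run of L consecutive integers contains exactly L/R multiples of R when R divides L.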
We know that,
With Ω = 0 and k = 0,
if w[n] has a bandlimited Fourier transform and if X_n(e^{jω_k}) is properly sampled in time, i.e.,
R is small enough to avoid aliasing in the time dimension, then
$\hat{w}[n] = \sum_{r} w[rR-n] = W(e^{j0})/R$
for all n. Examples would include the unmodified Hamming window or the Kaiser window,
both of which can achieve very nearly exact reconstruction when R is properly chosen. Thus
Eq. (7.47a) becomes
$y[n] = x[n]\, W(e^{j0})/R$
showing that the synthesis rule of Eq. (7.45) can lead to nearly exact reconstruction of x[n]
(to within a constant multiplier) by adding overlapping sections of the waveform.
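The intermediate step behind this result can be sketched as follows. The derivation is a reconstruction based on the periodicity of ŵ[n] (period R), not a copy of the original equations; W(e^{jω}) denotes the discrete-time Fourier transform of w[n].

```latex
\begin{aligned}
\hat{w}[n] &= \sum_{r=-\infty}^{\infty} w[rR-n]
  = \frac{1}{R} \sum_{k=0}^{R-1} W\!\left(e^{j 2\pi k / R}\right) e^{-j 2\pi k n / R}, \\
W\!\left(e^{j 2\pi k / R}\right) &= 0 \ \text{ for } k = 1, \ldots, R-1
  \quad \Longrightarrow \quad
  \hat{w}[n] = \frac{W(e^{j0})}{R}, \qquad
  y[n] = x[n]\,\frac{W(e^{j0})}{R}.
\end{aligned}
```

For a real window such as the Hamming window the terms with k ≠ 0 are small rather than exactly zero, so ŵ[n] has a small ripple around W(e^{j0})/R and the reconstruction is nearly, not perfectly, exact.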
Figures 7.27 and 7.28 illustrate in more detail how the OLA method is implemented for w[n],
an L-point Hamming window with R = L/4. Figure 7.27 gives a flow chart of the method,
assuming the signal x[n] is 0 for n < 0. Since a time overlap of 4 to 1 is chosen for the
Hamming window, to obtain the correct initial conditions, the first analysis section is
positioned to begin at n = L/4 as shown in Figure 7.28. The window (assumed to be causal
and non-zero for 0 ≤ n ≤ L − 1) is used to give the signal yr[m] = w[rR − m]x[m], which is
non-zero for rR − L + 1 ≤ m ≤ rR. This L-point sequence is padded with sufficient zeros to
account for the effects of any modifications of the short-time spectrum (as discussed in
Section 7.11) and to increase N to a convenient size for fast computation. Then an N-point
FFT of the resulting sequence is used to give Y_r(e^{jω_k}).
To reconstruct the signal at time n, we use Eq. (7.45). Figure 7.28 illustrates the operations
implied by Eq. (7.45) for a value of n such that 0 ≤ n ≤ R − 1. Note that y[n] for each n in 0 ≤
n ≤ R − 1 consists of the sum of four numbers;
i.e., y[n] = x[n]w[R − n] + x[n]w[2R − n] + x[n]w[3R − n] + x[n]w[4R − n]. (7.58)
For values of n in the next block of samples, where R ≤ n ≤ 2R − 1, the term x[n]w[R − n]
would be replaced by a term x[n]w[5R − n], etc.
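The whole analysis and overlap-add synthesis loop can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the flow chart of Figure 7.27 itself: `ola_resynthesize` is a hypothetical helper, the sizes L = 64, R = L/4 = 16, N = 128 are arbitrary, frames start at r = 0 rather than at n = L/4, and a periodic Hamming window is assumed so that the gain is exactly constant. Each frame is indexed from its own first sample, which only changes the short-time spectrum by a linear phase.

```python
import numpy as np

def ola_resynthesize(x, w, R, N):
    """Analyze x frame by frame and resynthesize it by overlap-add.

    x is treated as zero outside 0 <= n < len(x); w is a causal L-point window.
    """
    L = len(w)
    if N < L:
        raise ValueError("N must be at least L to avoid time aliasing")
    w_rev = w[::-1]                       # w[rR - m] for m = rR-L+1, ..., rR
    offsets = np.arange(L)
    y = np.zeros(len(x))
    n_frames = (len(x) + L) // R + 1      # enough frames to cover every sample
    for r in range(n_frames):
        m = r * R - L + 1 + offsets       # absolute sample indices of frame r
        ok = (m >= 0) & (m < len(x))
        frame = np.zeros(L)
        frame[ok] = x[m[ok]] * w_rev[ok]  # y_r[m] = x[m] w[rR - m]
        Y = np.fft.fft(frame, N)          # zero-padded N-point FFT
        # Any modification of the short-time spectrum would be applied to Y here.
        y_r = np.fft.ifft(Y).real[:L]     # back to the L-point section
        y[m[ok]] += y_r[ok]               # overlap-add into the output
    return y

L, R, N = 64, 16, 128
w = 0.54 - 0.46 * np.cos(2 * np.pi * np.arange(L) / L)   # periodic Hamming (assumed)
rng = np.random.default_rng(0)
x = rng.standard_normal(500)              # assumed test signal
y = ola_resynthesize(x, w, R, N)
C = w.sum() / R                           # reconstruction gain W(e^{j0}) / R
x_hat = y / C
```

With R = L/4 every output sample receives exactly four window-weighted contributions, as in Eq. (7.58), and dividing by the gain C returns the input when the short-time spectrum is left unmodified.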
2. Explain the Filter bank summation method of short time synthesis with neat block
diagrams, waveforms and detailed derivation.
FILTER BANK SUMMATION METHOD OF SHORT TIME SYNTHESIS
Here we discuss the impact of sampling the Short Time Fourier Transform (STFT) in the
frequency dimension.
Objective: To show that the requirement N ≥ L for exact reconstruction can be relaxed
if we choose the window and N properly.
Assumption: The sampling rate in the time dimension is identical to that of the input
signal.
The method of synthesis that emerges from this interpretation is called the filter bank
summation (FBS) method of short-time synthesis. Before we begin, it is necessary
to observe that if N ≥ L, where L is the window length, the inverse DFT produces
w[n − m]x[m] for n − m inside the region of support of the window. Therefore, it follows from
setting m = n that
That is, Eq. (7.60b) exactly reconstructs x[n] to within a constant multiplier, and if w[0] > 0,
we can divide by w[0] to obtain x[n].
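The statement above can be checked directly from the definition of the STFT. The sketch below is an illustration under assumptions: `stft_at` and `fbs_synthesize` are hypothetical helpers, the synthesis sum is assumed to carry the 1/N factor of the inverse DFT, and the sizes and test signal are arbitrary. It also shows what happens when N < L.

```python
import numpy as np

def stft_at(x, w, n, N):
    """X_n(e^{j w_k}) = sum_m x[m] w[n-m] e^{-j w_k m} for k = 0..N-1."""
    L = len(w)
    k = np.arange(N)
    X = np.zeros(N, dtype=complex)
    for m in range(max(0, n - L + 1), n + 1):   # x[m] is zero for m < 0
        X += x[m] * w[n - m] * np.exp(-2j * np.pi * k * m / N)
    return X

def fbs_synthesize(x, w, N):
    """y[n] = (1/N) sum_k X_n(e^{j w_k}) e^{j w_k n} at every sample n."""
    k = np.arange(N)
    y = np.zeros(len(x), dtype=complex)
    for n in range(len(x)):
        X = stft_at(x, w, n, N)
        y[n] = np.sum(X * np.exp(2j * np.pi * k * n / N)) / N
    return y

rng = np.random.default_rng(0)
x = rng.standard_normal(40)        # assumed test signal
w = np.hamming(8)                  # L = 8, with w[0] = 0.08 > 0

y_ok = fbs_synthesize(x, w, N=8)   # N >= L: y[n] = w[0] x[n]
y_bad = fbs_synthesize(x, w, N=4)  # N < L: the term w[4] x[n-4] leaks in
```

Dividing `y_ok` by `w[0]` returns x[n]. In the N < L case the output becomes Σ_r w[rN] x[n − rN], which is why the window samples at non-zero multiples of N matter in the relaxed condition discussed later.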
When expressed in the form of Eq. (7.60a), X_n(e^{jω_k}) has the interpretation of lowpass
filtering following frequency down-shifting by ω_k. With a change of summation variable, we
have the alternative form
Since the window w[n] has the properties of a lowpass filter, Eq. (7.64) can be interpreted as
in Figure 7.14 as a bandpass filter with impulse response h_k[n] followed by frequency
down-shifting by modulation with a complex exponential e^{−jω_k n}.
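The equivalence of the two interpretations can be confirmed numerically for one channel. The sketch assumes h_k[n] = w[n]e^{jω_k n} as the bandpass impulse response (consistent with the interpretation above, though the original equation is not shown here), an arbitrary channel k = 5 with N = 64, and a random test signal that is zero for n < 0.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(200)          # assumed test signal
L, N, k = 32, 64, 5                   # assumed window length, DFT size, channel
w = np.hamming(L)
wk = 2 * np.pi * k / N                # analysis frequency of channel k
n = np.arange(len(x))

# Lowpass form: down-shift x[n] by wk, then filter with the window w[n]
lowpass_form = np.convolve(x * np.exp(-1j * wk * n), w)[:len(x)]

# Bandpass form: filter with h_k[n] = w[n] e^{j wk n}, then down-shift by wk
h_k = w * np.exp(1j * wk * np.arange(L))
bandpass_form = np.exp(-1j * wk * n) * np.convolve(x, h_k)[:len(x)]
```

Both arrays hold X_n(e^{jω_k}) for n = 0, ..., len(x) − 1; they agree to rounding error because the modulation can be moved across the convolution.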
Figure 7.13 shows an example of how the corresponding lowpass and bandpass filter
frequency responses are related for the case of a Hamming window for analysis frequency ω̂.
Figure 7.30a depicts Eqs. (7.64) and (7.65), and Figure 7.30b depicts Eqs. (7.60a) and
(7.65). Figure 7.30c shows the equivalent bandpass filter for both cases.
The result summarized in Figure 7.30 provides the key to understanding how Eq. (7.60b)
provides a practical method for reconstructing the input signal from its time-dependent
Fourier transform. We simply implement N bandpass channels of the form of Eq. (7.65) and
sum the outputs;
which is identical to Eq. (7.60b). This motivates the name FBS for this approach to STFS.
We have seen that at one value of frequency ω̂ = ω_k, the combination of analysis followed by
synthesis can be represented as a bandpass filter centered on the analysis frequency ω_k. Now
consider the set of N frequencies {ω_k = 2πk/N}, k = 0, 1, ..., N − 1, and suppose that the N
time sequences X_n(e^{jω_k}) are available for each frequency. This can be achieved by a "bank"
of analysis/synthesis channels of the form of either Figure 7.30a or Figure 7.30b, whose
outputs are summed to effect the implementation of Eq. (7.60b).
The N bandpass filters of the form of Eq. (7.63) have frequency responses
$H_k(e^{j\omega}) = W(e^{j(\omega-\omega_k)})$
as illustrated in Figure 7.13b for a Hamming window. Then if we consider the entire
collection of bandpass filters, each having the same input and their outputs added together as
in Figure 7.31, the composite frequency response relating y[n] to x[n] is
Equation (7.69) is readily obtained by noting that W(e^{j(ω−ω_k)}) is a uniformly sampled
version of W(e^{jω}) evaluated at ω − ω_k rather than at ω_k. According to sampling theory, any
set of N uniformly spaced samples is adequate.
It is interesting to note that the FBS method and the OLA method are duals of one another;
i.e., one depends on a sampling relation in frequency, and the other depends on a sampling
relation in time. The FBS requires that the sampling in frequency be such that the window
transform obeys the relation
From Eqs. (7.68) and (7.69), we see that the impulse response of the composite system is
and therefore, the composite output y[n] will be y[n] = w[0]x[n] as we showed in Eq. (7.61)
under the same assumption N ≥ L.
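The composite response of the filter bank can be evaluated numerically. The sketch below rests on assumptions: channel impulse responses h_k[n] = w[n]e^{jω_k n}, a 1/N factor in the channel sum (chosen so that the output is w[0]x[n], as stated above), and arbitrary sizes L = N = 32. It checks both the composite impulse response and the composite frequency response.

```python
import numpy as np

L, N = 32, 32                               # assumed sizes with N >= L
w = np.hamming(L)
n = np.arange(L)
k = np.arange(N)[:, None]

# Row k holds h_k[n] = w[n] e^{j w_k n}; sum the channels with a 1/N factor
h = w * np.exp(2j * np.pi * k * n / N)
h_comp = h.sum(axis=0) / N                  # composite impulse response

def W(omega):
    """DTFT of the window, W(e^{j omega}), at an array of frequencies."""
    return np.sum(w * np.exp(-1j * omega[..., None] * n), axis=-1)

# Composite frequency response (1/N) sum_k W(e^{j(omega - w_k)}) on a grid
omega = np.linspace(0, 2 * np.pi, 50, endpoint=False)
H_comp = sum(W(omega - 2 * np.pi * kk / N) for kk in range(N)) / N
```

The composite impulse response reduces to w[0]δ[n] and the composite frequency response is flat at w[0], independent of the shape of the window, as long as N ≥ L.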
Under the condition that w[n] has finite duration L, the sequence x[n] can be reconstructed
exactly from the time-dependent Fourier transform sampled in the time dimension at the
sampling rate of the input signal and sampled in the frequency dimension at N ≥ L equally
spaced frequencies over 0 ≤ ω̂ < 2π.
To understand why it is not necessary for the window to have finite length with L ≤ N, it is
helpful to redefine the synthesis
where we have introduced a set of complex gain coefficients, P[k], for each of the N filter
bank channels. These complex gain coefficients are shown in Figure 7.33. The coefficients
P[k] can be used to adjust the magnitude and phase of the individual channels. This adds
significant flexibility in the design and implementation of STFA/STFS systems. Now the
overall composite impulse response of the filter bank becomes ĥ[n] = p[n]w[n].
Comparing Figures 7.34a and 7.34b reveals that the product ℎ̂[n] = p[n]w[n] will be zero
everywhere except at n = 0 where the product is unity.
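The product p[n]w[n] can be illustrated with a window that is longer than N. Everything specific in this sketch is an assumption made for illustration: equal gains P[k] = 1/N, the normalization p[n] = Σ_k P[k]e^{jω_k n}, a window centered at n = 0 (non-causal indexing for simplicity), and a Hamming-tapered sinc shape chosen because it is zero at the non-zero multiples of N.

```python
import numpy as np

N, M = 8, 40
n = np.arange(-M, M + 1)                       # window support, length 2M+1 = 81 > N

# Assumed window: tapered sinc with w[0] = 1 and zeros at n = +-N, +-2N, ...
w = np.sinc(n / N) * np.hamming(2 * M + 1)

# Assumed equal channel gains and the periodic sequence they generate
P = np.ones(N) / N
k = np.arange(N)[:, None]
p = (P[:, None] * np.exp(2j * np.pi * k * n / N)).sum(axis=0).real

h_comp = p * w                                 # composite impulse response
```

Here p[n] is 1 at multiples of N and 0 elsewhere, while w[n] vanishes at every non-zero multiple of N, so the product is non-zero only at n = 0 even though the window spans many periods of p[n].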
Thus the composite impulse response is ℎ̂ [n] = δ[n],