Module 3C

The document discusses two methods for short time synthesis from a short time Fourier transform (STFT): overlap-add and filter bank summation. 1) The overlap-add method reconstructs the signal by taking the inverse DFT of each frame's STFT, overlapping the frames based on the window function, and summing them. Exact reconstruction requires the window and sampling rates meet certain conditions. 2) The filter bank summation method interprets the STFT as lowpass filtering at different frequencies. It reconstructs by modulating each frequency component, filtering with the window function, and summing. This interpretation allows relaxing the requirement that the DFT size exceed the window length for exact reconstruction.

Uploaded by

mahima

Module III Derivations

1. Explain overlap addition method of short time synthesis, with flow chart, waveforms and
necessary equations in detail.

OVERLAP ADDITION METHOD OF SYNTHESIS

Objective: To consider the reconstruction of x[n] from its short-time spectrum based on the
DFT interpretation.

This section addresses whether the signal x[n] can be recovered from the STFT and, if so,
how the reconstruction is achieved mathematically.

The process of reconstructing the signal x[n] from the Short Time Fourier Transform (STFT) is
called Short Time Fourier Synthesis (STFS), or simply short time synthesis.

The STFT is naturally discrete in the time dimension but continuous in the frequency
dimension:

$$X_n(e^{j\hat{\omega}}) = \sum_{m=-\infty}^{\infty} x[m]\,w[n-m]\,e^{-j\hat{\omega} m}$$

Sampling the STFT at the times n = 0, R, 2R, ..., rR and at the N frequencies
$\omega_k = 2\pi k/N$, k = 0, 1, 2, ..., N − 1, leaves two discrete variables: the frame index r
along the time dimension and the frequency index k along the frequency dimension.

The window function w[n] used in Eq. (7.42) is non-zero only for 0 ≤ n ≤ L − 1, so the shifted
window w[rR − m] is non-zero only for 0 ≤ rR − m ≤ L − 1. The limits of the sum are therefore
modified as follows:

Upper limit: rR − m ≥ 0, i.e., m ≤ rR

Lower limit: rR − m ≤ L − 1, i.e., m ≥ rR − L + 1

Hence the equation becomes

$$X_{rR}(e^{j\omega_k}) = \sum_{m=rR-L+1}^{rR} x[m]\,w[rR-m]\,e^{-j\omega_k m}$$

where R is the sampling period in time of the STFT, and for uniform sampling in frequency,
$\omega_k = 2\pi k/N$, with k = 0, 1, ..., N − 1.

Now we consider the reconstruction of x[n] from its short-time spectrum based on the DFT
interpretation

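The DFT of the sequence $x[m]\,w[rR-m]$ is $X_{rR}(e^{j\omega_k})$,

so we can use the inverse DFT to obtain

$$y_r[m] = x[m]\,w[rR-m]$$

i.e.,

$$y_r[m] = \frac{1}{N}\sum_{k=0}^{N-1} X_{rR}(e^{j\omega_k})\,e^{j\omega_k m}$$

because the inverse DFT (IDFT) is given by

$$y[n] = \frac{1}{N}\sum_{k=0}^{N-1} X(e^{j\omega_k})\,e^{j\omega_k n}$$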

If N ≥ L, so that no time aliasing occurs due to the sampling in the $\hat{\omega}$ variable, x[m] inside the
shifted window $w[rR-m]$ can be reconstructed by computing the inverse DFT of
$X_{rR}(e^{j\omega_k})$ and dividing by the window (assuming it is strictly non-zero over its L-sample support).
In this manner L signal values of x[m] can be reconstructed for each analysis window (where
L is the window duration). Then the window can be moved by L samples and the process
repeated.
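
The frame-by-frame recovery described above can be sketched in a few lines of Python. This is only an illustration under assumed values: the test signal, the sizes L = R = N = 8, and the helper names `hamming`, `analyze_frame` and `recover_frame` are all hypothetical and not part of the original notes. A Hamming window is used because it is non-zero over its whole support.

```python
import cmath
import math

def hamming(L):
    """Causal L-point Hamming window, non-zero for 0 <= n <= L - 1."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (L - 1)) for n in range(L)]

def analyze_frame(x, w, r, R, N):
    """X_rR(e^{j w_k}) = sum_{m=rR-L+1}^{rR} x[m] w[rR-m] e^{-j 2 pi k m / N}."""
    hi = r * R
    lo = hi - len(w) + 1
    return [sum(x[m] * w[hi - m] * cmath.exp(-2j * math.pi * k * m / N)
                for m in range(lo, hi + 1))
            for k in range(N)]

def recover_frame(X, w, r, R):
    """Inverse DFT at each m in the frame, then division by the shifted window."""
    N = len(X)
    hi = r * R
    lo = hi - len(w) + 1
    recovered = {}
    for m in range(lo, hi + 1):
        y_r = sum(X[k] * cmath.exp(2j * math.pi * k * m / N) for k in range(N)) / N
        recovered[m] = y_r.real / w[hi - m]
    return recovered

# Assumed example values: N >= L, and the window moves by L samples per frame.
L, R, N = 8, 8, 8
x = [math.sin(0.3 * n) + 0.5 for n in range(32)]
w = hamming(L)
r = 2                                   # this frame covers m = 9 .. 16
recovered = recover_frame(analyze_frame(x, w, r, R, N), w, r, R)
max_error = max(abs(recovered[m] - x[m]) for m in recovered)
print(max_error)
```

With N ≥ L, the printed error should be at the level of floating-point round-off; choosing N < L would alias samples of the frame onto one another and break the recovery.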

Conditions for Exact Reconstruction


Assume that the short-time transform is sampled with period R samples in the time dimension
and at N frequencies as in Eq. (7.42);
i.e., we have $Y_r(e^{j\omega_k}) = X_{rR}(e^{j\omega_k})$,

where r is an integer and $\omega_k = 2\pi k/N$ for 0 ≤ k ≤ N − 1. The Overlap Addition
method is based upon the synthesis equation

$$y[n] = \sum_{r=-\infty}^{\infty}\left[\frac{1}{N}\sum_{k=0}^{N-1} Y_r(e^{j\omega_k})\,e^{j\omega_k n}\right] \qquad (7.45)$$

That is, for a causal window of length L as in Eq. (7.42), to reconstruct the signal, the inverse
transform of $Y_r(e^{j\omega_k})$ is computed for each value of r, giving the sequences

$$y_r[m] = x[m]\,w[rR-m], \qquad rR-L+1 \le m \le rR.$$

Then the signal at time n is obtained by summing the values at time n of all the
sequences $y_r[m]$ that overlap at time n; i.e.,

$$y[n] = \sum_{r=-\infty}^{\infty} y_r[n] = x[n]\sum_{r=-\infty}^{\infty} w[rR-n] \qquad (7.47a)$$

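That is, the reconstructed signal is equal to x[n] multiplied by a periodically time-varying
weighting sequence

$$\hat{w}[n] = \sum_{r=-\infty}^{\infty} w[rR-n] \qquad (7.47b)$$

The condition for exact reconstruction of x[n] is therefore

$$\hat{w}[n] = \sum_{r=-\infty}^{\infty} w[rR-n] = C \quad \text{for all } n \qquad (7.48)$$

where we will refer to the constant C as the reconstruction gain.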



The summation process defined by Eq. (7.45) is illustrated in Figure 7.23, which shows five
overlapping sections of the speech signal, where each section uses a Hamming window of
duration L = 400 samples, and adjacent frames are separated by R = 100 samples. The
summation implied by Eq. (7.47a) is illustrated in this figure by showing at the bottom of the
figure the resulting speech waveform that is recovered by overlapping and adding the
sections.

Note that the sequence $\hat{w}[n]$ in Eq. (7.47b) is a periodic sequence (with period R) composed
of time-aliased (time-reflected) window sequences. As a simple example, consider a
rectangular window $w_{rect}[n]$ of length L samples. If R (the frame shift) equals L (the
window length), the windowed segments simply fit together block-by-block with no overlap.
In that case exactly one shifted window covers each sample n, and since $w_{rect}[n] = 1$
inside the window,

$$\hat{w}[n] = \sum_{r=-\infty}^{\infty} w_{rect}[rL-n] = 1$$

Therefore, from Equation (7.48),

$$\hat{w}[n] = C = 1$$

If L for a rectangular window is even, and R = L/2, Eq. (7.48) is satisfied with C = 2.
In fact, for the rectangular window, if $L = 2^{\nu}$, where ν is an integer, the
signal x[n] can be perfectly reconstructed from $Y_r[k]$ by the OLA method of Eq. (7.47a) when
L ≤ N and R = L, L/2, L/4, ..., 1. The corresponding reconstruction gains are C = 1, 2, 4, ..., L.
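
The rectangular-window gains can be checked numerically. The sketch below evaluates the weighting sequence, i.e. the sum over r of w[rR − n], for a rectangular window and several frame shifts. The helper name `ola_weight` and the value L = 16 are assumptions made for illustration.

```python
def ola_weight(w, R, n):
    """Sum over all integers r of w[rR - n], for a window non-zero on 0..L-1 and n >= 0."""
    L = len(w)
    total = 0.0
    r = -(-n // R)                      # smallest r with rR - n >= 0 (ceiling of n / R)
    while r * R - n <= L - 1:           # stop once rR - n leaves the window support
        total += w[r * R - n]
        r += 1
    return total

L = 16                                  # assumed window length, a power of two
rect = [1.0] * L
gains = {}
for R in (16, 8, 4, 2, 1):
    values = [ola_weight(rect, R, n) for n in range(3 * L)]
    assert max(values) == min(values)   # the weighting sequence is constant in n
    gains[R] = values[0]
print(gains)
```

Each gain comes out as L/R, which is the sequence C = 1, 2, 4, ..., L for R = L, L/2, L/4, ..., 1.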
We know that,

With $\Omega = 0$ and $k = 0$,

the above equation becomes

$$\hat{W}(e^{j\Omega T}) = \frac{1}{R}\,W(e^{j0})$$

If w[n] has a bandlimited Fourier transform and if $X_n(e^{j\omega_k})$ is properly sampled in time, i.e.,
R is small enough to avoid aliasing in the time dimension, then

$$\hat{w}[n] = \sum_{r=-\infty}^{\infty} w[rR-n] \approx \frac{W(e^{j0})}{R}$$

for all n. Examples would include the unmodified Hamming window or the Kaiser window,
both of which can achieve very nearly exact reconstruction when R is properly chosen. Thus
Eq. (7.47a) becomes

$$y[n] \approx x[n]\,\frac{W(e^{j0})}{R}$$

showing that the synthesis rule of Eq. (7.45) can lead to nearly exact reconstruction of x[n]
(to within a constant multiplier) by adding overlapping sections of the waveform.
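
As a rough numerical check of the "very nearly exact" claim, the sketch below compares the weighting sequence of a Hamming window with the constant W(e^{j0})/R, where W(e^{j0}) is simply the sum of the window samples. The sizes L = 400 and R = 100 are the ones quoted for Figure 7.23; the helper names and the symmetric Hamming definition are assumptions of this sketch.

```python
import math

def hamming(L):
    """Causal L-point Hamming window (symmetric definition, assumed here)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (L - 1)) for n in range(L)]

def ola_weight(w, R, n):
    """Sum over all integers r of w[rR - n], for a window non-zero on 0..L-1 and n >= 0."""
    L = len(w)
    total = 0.0
    r = -(-n // R)                      # smallest r with rR - n >= 0
    while r * R - n <= L - 1:
        total += w[r * R - n]
        r += 1
    return total

L, R = 400, 100
w = hamming(L)
target = sum(w) / R                     # W(e^{j0}) / R
w_hat = [ola_weight(w, R, n) for n in range(R)]   # one full period of the weighting sequence
max_rel_dev = max(abs(v - target) for v in w_hat) / target
print(target, max_rel_dev)
```

For these values the weighting sequence stays well within one percent of W(e^{j0})/R, so the overlap-added output is close to a scaled copy of x[n] rather than an exact one.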

Figures 7.27 and 7.28 illustrate in more detail how the OLA method is implemented for w[n],
an L-point Hamming window with R = L/4. Figure 7.27 gives a flow chart of the method,
assuming the signal x[n] is 0 for n < 0. Since a time overlap of 4 to 1 is chosen for the
Hamming window, to obtain the correct initial conditions, the first analysis section is
positioned to begin at n = L/4 as shown in Figure 7.28. The window (assumed to be causal
and non-zero for 0 ≤ n ≤ L − 1) is used to give the signal yr[m] = w[rR − m]x[m], which is
non-zero for rR − L + 1 ≤ m ≤ rR. This L-point sequence is padded with sufficient zeros to
account for the effects of any modifications of the short-time spectrum (as discussed in
Section 7.11) and to increase N to a convenient size for fast computation. Then an N-point
FFT of the resulting sequence is used to give $Y_r(e^{j\omega_k})$.
To reconstruct the signal at time n, we use Eq. (7.45). Figure 7.28 illustrates the operations
implied by Eq. (7.45) for a value of n such that 0 ≤ n ≤ R − 1. Note that y[n] for each n in 0 ≤
n ≤ R − 1 consists of the sum of four numbers;
i.e., y[n] = x[n]w[R − n] + x[n]w[2R − n] + x[n]w[3R − n] + x[n]w[4R − n]. (7.58)

For values of n in the next block of samples where R ≤ n ≤ 2R − 1, the term x[n]w[R − n]
would be replaced by a term x[n]w[5R − n], etc.
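
The complete analysis and overlap-add loop can be sketched as follows. This is a minimal illustration, not the flow chart of Figure 7.27 itself: the sizes L = 16, R = L/4 = 4, N = 32 (zero-padded), the test signal, and the function name `ola_round_trip` are assumed, and a direct DFT sum stands in for the FFT.

```python
import cmath
import math

def hamming(L):
    """Causal L-point Hamming window, non-zero for 0 <= n <= L - 1."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (L - 1)) for n in range(L)]

def ola_round_trip(x, w, R, N):
    """Analyze every frame with an N-point DFT, invert it, and overlap-add the results."""
    L = len(w)
    y = [0.0] * len(x)
    weight = [0.0] * len(x)             # accumulates the sum over r of w[rR - n]
    last_frame = (len(x) + L) // R      # enough frames to cover every sample fully
    for r in range(last_frame + 1):
        hi = r * R
        support = [m for m in range(hi - L + 1, hi + 1) if 0 <= m < len(x)]
        # Analysis: DFT of x[m] w[rR - m] (x is taken as zero outside its range).
        Y = [sum(x[m] * w[hi - m] * cmath.exp(-2j * math.pi * k * m / N) for m in support)
             for k in range(N)]
        # Synthesis: inverse DFT over the frame, added into the output.
        for m in support:
            y_r = sum(Y[k] * cmath.exp(2j * math.pi * k * m / N) for k in range(N)) / N
            y[m] += y_r.real
            weight[m] += w[hi - m]
    return y, weight

L, R, N = 16, 4, 32
x = [math.cos(0.2 * n) + 0.1 * n for n in range(48)]
w = hamming(L)
y, weight = ola_round_trip(x, w, R, N)
max_error = max(abs(y[n] - x[n] * weight[n]) for n in range(len(x)))

# For a sample inside the first block (here n = 1), four frames overlap.
n = 1
four_terms = sum(x[n] * w[i * R - n] for i in (1, 2, 3, 4))
print(max_error, abs(y[n] - four_terms))
```

Both printed values should be at round-off level: the output equals x[n] times the accumulated window weights, and for n = 1 that weight is exactly the four-term sum w[R − n] + w[2R − n] + w[3R − n] + w[4R − n].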

2. Explain the Filter bank summation method of short time synthesis with neat block
diagrams, waveforms and detailed derivation.
FILTER BANK SUMMATION METHOD OF SHORT TIME SYNTHESIS

This section examines the effect of sampling the Short Time Fourier Transform (STFT)
in the frequency dimension.

Objective: To show that the requirement N ≥ L for exact reconstruction can be relaxed
if we choose the window and N properly.

Assumption: The sampling rate in the time dimension is identical to that of the input
signal, i.e., the STFT is available at every sample n.

Procedure: Part A: Analysis: converting x[n] into STFT representation


Part B: Synthesis: Given STFT to get back x[n]
Part A:
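The frequency-sampled STFT is

$$X_n(e^{j\omega_k}) = \sum_{m=-\infty}^{\infty} w[n-m]\,x[m]\,e^{-j\omega_k m} \qquad (7.60a)$$

where for uniform sampling in frequency, $\omega_k = 2\pi k/N$, with k = 0, 1, ..., N − 1.


Part B:
The STFS equation that we shall employ first is

$$y[n] = \frac{1}{N}\sum_{k=0}^{N-1} X_n(e^{j\omega_k})\,e^{j\omega_k n} \qquad (7.60b)$$

which is simply the inverse DFT of $X_n(e^{j\omega_k})$ evaluated at the particular time n.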


Principle Behind Synthesis: study the process of STFS from the point of view of the linear
filtering interpretation of the STFT

The method of synthesis that emerges from this interpretation is called the filter bank
summation (FBS) method of short-time synthesis. Before we begin, it is useful
to observe that if N ≥ L, where L is the window length, the inverse DFT produces
w[n − m]x[m] for n − m inside the region of support of the window. Therefore, it follows from
setting m = n that

$$y[n] = w[0]\,x[n] \qquad (7.61)$$

That is, Eq. (7.60b) exactly reconstructs x[n] to within a constant multiplier, and if w[0] > 0,
we can divide by w[0] to obtain x[n].
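
A small numerical check of this observation: compute the frequency-sampled STFT at every time n, take the inverse DFT at that same time n, and compare the result with w[0]x[n]. The sketch assumes a causal Hamming window with L = N = 8 and a made-up test signal; the function names are hypothetical.

```python
import cmath
import math

def hamming(L):
    """Causal L-point Hamming window, non-zero for 0 <= n <= L - 1."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (L - 1)) for n in range(L)]

def stft_at(x, w, n, N):
    """X_n(e^{j w_k}) = sum_m w[n-m] x[m] e^{-j w_k m} for k = 0..N-1."""
    L = len(w)
    m_range = range(max(0, n - L + 1), min(n, len(x) - 1) + 1)
    return [sum(w[n - m] * x[m] * cmath.exp(-2j * math.pi * k * m / N) for m in m_range)
            for k in range(N)]

def fbs_sample(X, n):
    """Inverse DFT of X_n(e^{j w_k}) evaluated at the same time index n."""
    N = len(X)
    return sum(X[k] * cmath.exp(2j * math.pi * k * n / N) for k in range(N)) / N

L, N = 8, 8                             # assumed sizes with N >= L
x = [math.sin(0.5 * n) + 0.2 for n in range(24)]
w = hamming(L)
y = [fbs_sample(stft_at(x, w, n, N), n) for n in range(len(x))]
max_error = max(abs(y[n] - w[0] * x[n]) for n in range(len(x)))
print(w[0], max_error)
```

Here w[0] is the first Hamming sample (about 0.08), so the output is a heavily scaled but otherwise exact copy of x[n]; dividing by w[0] restores the original amplitude.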

From the linear filtering interpretation of the STFT, we showed that
when $\hat{\omega}$ is fixed at a frequency $\omega_k$, $X_n(e^{j\omega_k})$ is a lowpass representation of the signal in a
band centered at $\omega_k$.

When expressed in the form of Eq. (7.60a), $X_n(e^{j\omega_k})$ has the interpretation of lowpass
filtering following frequency down-shifting by $\omega_k$. With a change of summation variable
(replacing m by n − m), we have the alternative form

$$X_n(e^{j\omega_k}) = e^{-j\omega_k n}\sum_{m=-\infty}^{\infty} x[n-m]\,w[m]\,e^{j\omega_k m} \qquad (7.64)$$

Since the window w[n] has the properties of a lowpass filter, Eq. (7.64) can be interpreted as
in Figure 7.14 as a bandpass filter with impulse response $h_k[n] = w[n]\,e^{j\omega_k n}$ followed by frequency
down-shifting by modulation with the complex exponential $e^{-j\omega_k n}$.
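
The equivalence of the two interpretations can be verified numerically. The sketch below computes the STFT at one frequency in two ways: by modulating the input down and lowpass filtering with w[n], and by bandpass filtering with w[n] multiplied by e^{jω_k n} and then modulating down. The signal, window, frequency and function names are assumptions chosen for illustration.

```python
import cmath
import math

def lowpass_form(x, w, n, wk):
    """Down-shift x[m] by e^{-j wk m}, then filter with the lowpass window w."""
    return sum(w[n - m] * x[m] * cmath.exp(-1j * wk * m)
               for m in range(len(x)) if 0 <= n - m < len(w))

def bandpass_form(x, w, n, wk):
    """Filter x with the bandpass response w[m] e^{j wk m}, then down-shift by e^{-j wk n}."""
    h = [w[m] * cmath.exp(1j * wk * m) for m in range(len(w))]
    v = sum(x[n - m] * h[m] for m in range(len(w)) if 0 <= n - m < len(x))
    return cmath.exp(-1j * wk * n) * v

x = [math.cos(0.7 * n) + 0.3 * math.sin(0.2 * n) for n in range(40)]
w = [0.54 - 0.46 * math.cos(2 * math.pi * n / 11) for n in range(12)]  # 12-point Hamming
wk = 2 * math.pi * 3 / 16               # assumed analysis frequency: k = 3 of N = 16
max_diff = max(abs(lowpass_form(x, w, n, wk) - bandpass_form(x, w, n, wk))
               for n in range(len(x) + len(w) - 1))
print(max_diff)
```

The two forms agree to round-off at every time index, which is why either structure can serve as one analysis channel of the filter bank.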

Figure 7.13 shows an example of how the corresponding lowpass and bandpass filter
frequency responses are related for the case of a Hamming window for analysis frequency $\hat{\omega}$.

where Figure 7.30a depicts Eqs. (7.64) and (7.65) and Figure 7.30b depicts Eqs. (7.60a) and
(7.65). Figure 7.30c shows the equivalent bandpass filter for both cases

The result summarized in Figure 7.30 provides the key to understanding how Eq. (7.60b)
provides a practical method for reconstructing the input signal from its time-dependent
Fourier transform. We simply implement N bandpass channels of the form of Eq. (7.65) and
sum the outputs:

$$y[n] = \frac{1}{N}\sum_{k=0}^{N-1} X_n(e^{j\omega_k})\,e^{j\omega_k n}$$

which is identical to Eq. (7.60b). This motivates the name FBS for this approach to STFS.

We have seen that at one value of frequency $\hat{\omega} = \omega_k$, the combination of analysis followed by
synthesis can be represented as a bandpass filter centered on the analysis frequency $\omega_k$. Now
consider the set of N frequencies {$\omega_k = 2\pi k/N$}, k = 0, 1, ..., N − 1, and suppose that the N
time sequences $X_n(e^{j\omega_k})$ are available for each frequency. This can be achieved by a “bank”
of analysis/synthesis channels of the form of either Figure 7.30a or Figure 7.30b, whose
outputs are summed to effect the implementation of Eq. (7.60b).

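The N bandpass filters of the form of Eq. (7.63) have frequency responses

$$H_k(e^{j\omega}) = W(e^{j(\omega-\omega_k)}), \qquad k = 0, 1, \ldots, N-1$$

as illustrated in Figure 7.13b for a Hamming window. Then if we consider the entire
collection of bandpass filters, each having the same input and their outputs added together as
in Figure 7.31, the composite frequency response relating y[n] to x[n] is

$$\hat{H}(e^{j\omega}) = \frac{1}{N}\sum_{k=0}^{N-1} H_k(e^{j\omega}) = \frac{1}{N}\sum_{k=0}^{N-1} W(e^{j(\omega-\omega_k)}) \qquad (7.68)$$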

If $W(e^{j\omega})$ is properly sampled in the frequency dimension (i.e., if N ≥ L, where L is
the time duration of the window), then it can be shown that

$$\frac{1}{N}\sum_{k=0}^{N-1} W(e^{j(\omega-\omega_k)}) = w[0] \quad \text{for all } \omega \qquad (7.69)$$

Equation 7.70 is from Equation 7.47b

Equation (7.69) is readily obtained by noting that, for a fixed ω, the values $W(e^{j(\omega-\omega_k)})$ are
uniformly spaced samples of $W(e^{j\omega})$ evaluated at $\omega-\omega_k$ rather than at $\omega_k$. According to sampling theory, any
set of N uniformly spaced samples is adequate.
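
The sampling relation can be checked directly by evaluating the window transform at the shifted frequencies and averaging. The sketch below uses an assumed 8-point Hamming window and a few arbitrary test frequencies; it also evaluates the case N = 4 < L, where the relation is expected to fail for this window. Function names are hypothetical.

```python
import cmath
import math

def hamming(L):
    """Causal L-point Hamming window, non-zero for 0 <= n <= L - 1."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (L - 1)) for n in range(L)]

def window_transform(w, omega):
    """W(e^{j omega}) = sum_n w[n] e^{-j omega n}."""
    return sum(w[n] * cmath.exp(-1j * omega * n) for n in range(len(w)))

def composite_response(w, N, omega):
    """(1/N) sum_k W(e^{j(omega - w_k)}) with w_k = 2 pi k / N."""
    return sum(window_transform(w, omega - 2 * math.pi * k / N) for k in range(N)) / N

L = 8
w = hamming(L)
omegas = [0.0, 0.3, 1.1, 2.5]           # arbitrary test frequencies
dev_ok = max(abs(composite_response(w, N, om) - w[0])
             for N in (8, 12) for om in omegas)
dev_aliased = abs(composite_response(w, 4, 0.3) - w[0])   # N = 4 < L = 8
print(dev_ok, dev_aliased)
```

For N = 8 and N = 12 the composite response equals w[0] at every test frequency (to round-off). For N = 4 the sample w[4] is aliased into the response, so it is no longer flat for this window.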

It is interesting to note that the FBS method and the OLA method are duals of one another;
i.e., one depends on a sampling relation in frequency, and the other depends on a sampling
relation in time. The FBS requires that the sampling in frequency be such that the window
transform obeys the relation

$$\frac{1}{N}\sum_{k=0}^{N-1} W(e^{j(\omega-\omega_k)}) = w[0] \quad \text{for all } \omega$$

From Eqs. (7.68) and (7.69), we see that the impulse response of the composite system is

$$\hat{h}[n] = w[0]\,\delta[n]$$

and therefore, the composite output y[n] will be y[n] = w[0]x[n] as we showed in Eq. (7.61)
under the same assumption N ≥ L.

Under the condition that w[n] has finite duration, L, the sequence x[n] can be reconstructed
exactly from the time-dependent Fourier transform sampled in the time dimension at the
sampling rate of the input signal and sampled in the frequency dimension at N ≥ L equally
spaced frequencies over $0 \le \hat{\omega} < 2\pi$.
To understand why it is not necessary for the window to have finite length with L ≤ N, it is
helpful to redefine the synthesis

where we have introduced a set of complex gain coefficients, P[k], for each of the N filter
bank channels. These complex gain coefficients are shown in Figure 7.33. The coefficients
P[k] can be used to adjust the magnitude and phase of the individual channels. This adds
significant flexibility in the design and implementation of STFA/STFS systems. Now the
overall composite impulse response of the filter bank becomes

$$\hat{h}[n] = p[n]\,w[n]$$

Comparing Figures 7.34a and 7.34b reveals that the product $\hat{h}[n] = p[n]\,w[n]$ will be zero
everywhere except at n = 0, where the product is unity.
Thus the composite impulse response is $\hat{h}[n] = \delta[n]$.
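
To illustrate how exact reconstruction can survive with L > N, the sketch below uses N = 4 channels and a 21-point window whose samples at non-zero multiples of N are zero. Several things here are assumptions, not taken from the notes: the gains are taken as equal (P[k] = 1 with a 1/N normalization, so that p[n] is a unit impulse train with period N), the window is a hypothetical tapered sinc centered at n = 0 (non-causal), and the test signal is made up.

```python
import cmath
import math

N = 4            # number of frequency channels
M = 10           # window is non-zero for -M <= n <= M, so L = 2M + 1 = 21 > N

def w(n):
    """Tapered sinc (illustrative choice): w[0] = 1 and w[rN] = 0 for r != 0."""
    if abs(n) > M:
        return 0.0
    if n == 0:
        return 1.0
    taper = 0.54 + 0.46 * math.cos(math.pi * n / M)
    return taper * math.sin(math.pi * n / N) / (math.pi * n / N)

def fbs_output(x, n):
    """y[n] = (1/N) sum_k X_n(e^{j w_k}) e^{j w_k n}, with x taken as zero outside its range."""
    total = 0.0
    for k in range(N):
        wk = 2 * math.pi * k / N
        X = sum(w(n - m) * x[m] * cmath.exp(-1j * wk * m)
                for m in range(max(0, n - M), min(len(x) - 1, n + M) + 1))
        total += X * cmath.exp(1j * wk * n)
    return (total / N).real

x = [math.sin(0.4 * n) + 0.05 * n for n in range(30)]
y = [fbs_output(x, n) for n in range(len(x))]
max_error = max(abs(y[n] - x[n]) for n in range(len(x)))
print(max_error)
```

Under these assumptions the channel sum keeps only the window samples at multiples of N; since all of those except w[0] = 1 are zero, the output matches x[n] to round-off even though the window is much longer than N.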
