Digital Signal Processing with Selected Topics
Basic Theory and Applications
Ljubiša Stanković
ISBN-13: 978-1514179987
ISBN-10: 1514179989
All rights reserved. Printed and bound in the United States of America.
No part of this book may be reproduced or utilized in any form or by any means, electronic or
mechanical, including photocopying, recording, or by any information storage and retrieval
system, without permission in writing from the copyright holder.
To
my parents
Božo and Cana,
my wife Snežana,
and our
Irena, Isidora, and Nikola.
Contents
I Review of Continuous-Time Signals and Systems
Chapter 1 Continuous-Time Signals and Systems
1.1 Continuous-Time Signals
1.2 Linear Systems
1.3 Periodic Signals and Fourier Series
1.3.1 Fourier Series of Real-Valued Signals
1.4 Fourier Transform
1.4.1 Fourier Transform and Linear Time-Invariant Systems
1.4.2 Properties of the Fourier Transform
1.4.3 Relationship Between the Fourier Series and the Fourier Transform
1.5 Fourier Transform and the Stationary Phase Method
1.6 Laplace Transform
1.6.1 Properties of the Laplace Transform
1.6.2 Table of the Laplace Transform
1.6.3 Linear Systems Described by Differential Equations
1.7 Butterworth Filter
This book is a result of the author's thirty-three years of experience in teaching and research in signal processing. It is written for students and engineers as a first book in digital signal processing, assuming that the reader is familiar with basic mathematics, including integral and differential calculus and linear algebra. Although a review of continuous-time analysis is presented in the first chapter, a prerequisite for the presented content is a basic knowledge of continuous-time signal processing.
The book consists of three parts. After an introductory review part, the basic principles
of digital signal processing are presented within Part two of the book. This part starts with
Chapter two which deals with basic definitions, transforms, and properties of discrete-time
signals. The sampling theorem, providing an essential relation between continuous-time
and discrete-time signals, is presented in this chapter as well. Discrete Fourier transform
and its applications to signal processing are the topics of the third chapter. Other common
discrete transforms, such as the cosine, sine, Walsh-Hadamard, and Haar transforms, are also presented in this chapter. The z-transform, as a powerful tool for the analysis of discrete-time systems, is
the topic of Chapter four. Various methods for transforming a continuous-time system into
a corresponding discrete-time system are derived and illustrated in Chapter five. Chapter six is dedicated to the forms of discrete-time system realizations. Basic definitions and properties of random discrete-time signals are given in Chapter seven. Systems to process random discrete-time signals are considered in that chapter as well. Chapter seven concludes with a short study of quantization effects.
The presentation is supported by numerous illustrations and examples. Chapters within
Part two are followed by a number of solved and unsolved problems for practice. The
theory is explained in a simple way, with the necessary mathematical rigor. The book provides
simple examples and explanations for every presented transform, method, algorithm or
approach. Sophisticated results in signal processing theory are illustrated by simple numerical
examples.
Part three of the book contains a few selected topics in digital signal processing: adaptive
discrete-time systems, time-frequency signal analysis, and processing of discrete-time sparse
signals. This part could be studied within an advanced course in digital signal processing,
following the basic course. Some parts from the selected topics may be included in tailoring
a more extensive first course in digital signal processing as well.
The author would like to thank colleagues: prof. Zdravko Uskoković, prof. Srdjan
Stanković, prof. Igor Djurović, prof. Veselin Ivanović, prof. Miloš Daković, prof. Božo
Krstajić, prof. Vesna Popović-Bugarin, prof. Slobodan Djukanović, prof. Irena Orović, dr.
Nikola Žarić, dr Marko Simeunović, M.Sc. Predrag Raković, M.Sc. Andjela Draganić and
M.Sc. Isidora Stanković for careful reading of the initial version of this book and for the
comments that helped to improve the presentation.
The author thanks the colleagues that helped in preparing the special topics part of the
book. Many thanks to Miloš Daković who coauthored all three chapters of Part three of
this book and to other coauthors of chapters in this part: Thayaparan Thayananthan, Srdjan
Stanković, and Irena Orović. Special thanks to M.Sc. Miloš Brajović and M.Sc. Stefan
Vujović for their careful double-check of the presented theory and examples, numerous
comments, and for the help in proofreading the final version of the book.
London,
July 2013 - July 2015.
Author
The book has been slightly edited, keeping the main structure unchanged. The
chapter dealing with random signals is updated to provide the basis for machine learning,
compressive sensing, graph signal processing, and other modern signal processing areas.
Podgorica, Montenegro
March - June 2020.
Author
A signal is a representation of information. Signal theory and processing are the areas dealing with the efficient generation, description, transformation, transmission, reception, and interpretation of signals. In the beginning, the most common physical processes used for these purposes were electric signals, for example, varying currents or electromagnetic waves. Signal theory is most commonly studied within electrical engineering. Signal theory
tools are strongly related to applied mathematics and information theory. Examples of
signals include speech, music, image, video, medical, biological, geophysical, sonar, radar,
biomedical, car engine, financial, and molecular data. In terms of signal generation, the
main topics are in sensing, acquisition, synthesis, and reproduction of information. Various
mathematical transforms, representations, and algorithms are used for describing signals.
Signal transformations are a set of methods for decomposition, filtering, estimation, and
detection. Modulation, demodulation, detection, coding, and compression are the most
important aspects of signal transmission. In the process of interpretation, various approaches
may be used, including adaptive and learning-based tools and analysis.
Mathematically, signals are represented by functions of one or more variables. Examples of one-dimensional signals are speech and music signals. A typical example of a two-dimensional signal is an image, while a video sequence is an example of a three-dimensional signal. Some signals, for example, geophysical, medical, biological, radar, or sonar signals, may be represented and interpreted as one-dimensional, two-dimensional, or multidimensional.
Signals may be continuous functions of independent variables, for example, functions of
time or space. Independent variables may also be discrete, with the signal values being defined
only over an ordered set of discrete values of the independent variable. Such a signal is a discrete-time signal. Discrete-time signals, after being stored in a general-purpose computer or special-purpose hardware, are discretized (quantized) in amplitude as well, so that they can be stored within registers of finite length. These kinds of signals are referred to as digital signals,
Fig. 1. A continuous-time and continuous amplitude (analog) signal is transformed into a
discrete-time and discrete-amplitude (digital) signal using analog-to-digital (A/D) converters,
Fig. 2. Their processing is known as digital signal processing. In modern systems, the amplitude quantization errors are very small. Common A/D converters operate with sampling frequencies of up to a megasample (some even up to a few gigasamples) per second, with 8 to 24 bits of resolution in amplitude. Digital signals are usually mathematically treated as continuous (nondiscretized) in amplitude, while the quantization error is studied, if needed,
Figure 1 A continuous-time analog signal (left) and its discrete-time (middle) and digital version (right).
as a small disturbance in processing, reduced to a noise in the input signal. Digital signals
are transformed back into analog form by digital-to-analog (D/A) converters.
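As a small illustration of the amplitude discretization performed by an A/D converter, the following minimal Python sketch quantizes a finely sampled signal to a B-bit grid; the test signal and the full-scale range [-1, 1) are illustrative assumptions, not taken from the text.

```python
import numpy as np

# Uniform B-bit quantization over an assumed full-scale range [-1, 1):
# a minimal sketch of the amplitude discretization in an A/D converter.
def quantize(x, bits):
    step = 2.0 / 2**bits                  # quantization step
    return np.clip(np.round(x / step) * step, -1.0, 1.0 - step)

t = np.linspace(0, 1, 1000)               # finely sampled "analog" signal
x = 0.9 * np.sin(2 * np.pi * 5 * t)
xq = quantize(x, bits=4)                  # digital amplitude grid
print("max quantization error:", np.max(np.abs(x - xq)))  # about step/2
```

With more bits, the quantization error quickly becomes negligible, which is why digital signals are usually treated as continuous in amplitude.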
Figure 2 Illustration of an analog and a digital system used to process an analog signal.
According to the nature of their behavior, all signals could be deterministic or stochastic.
For deterministic signals, the values are known in the past and future, while the stochastic
signals are described by probabilistic methods. The deterministic signals are commonly used
for theoretical description, analysis, and synthesis of systems for signal processing.
The advantages of processing signals in digital form lie in their flexibility and adaptability: any transformation that can be described by an algorithm can be implemented on a computer. The time required for processing in real time (all calculations have to be completed between two signal samples) is a limitation as compared to analog systems, which are limited only by the physical delay of electrical components and circuits.
Part I
Review of Continuous-Time
Signals and Systems
Chapter 1
Continuous-Time Signals and Systems
1.1 Continuous-Time Signals

The unit-step (Heaviside) signal is defined as
$$u(t) = \begin{cases} 1, & t \geq 0\\ 0, & t < 0. \end{cases} \tag{1.1}$$
In the Heaviside function definition, the value u(0) = 1/2 is also used. Note that the independent variable t is continuous, while the signal itself is not a continuous function; it has a discontinuity at t = 0.
The boxcar signal (rectangular window) is formed as b(t) = u(t + 1/2) − u(t − 1/2),
that is, b(t) = 1 for −1/2 ≤ t < 1/2 and b(t) = 0 elsewhere. The signal obtained by
multiplying the unit-step signal by t is called the ramp signal, with notation R(t) = tu(t).
The impulse signal (or delta function) is defined as
$$\delta(t) = 0 \ \text{ for } t \neq 0, \qquad \int_{-\infty}^{\infty} \delta(t)\,dt = 1. \tag{1.2}$$
The impulse signal is equal to 0 everywhere except at t = 0, where it takes an infinitely large value, so that its area is 1. From the definition of the impulse signal, it follows that δ(at) = δ(t)/|a|.
This function cannot be implemented in real-world systems due to its infinitely short duration
and infinitely large amplitude at t = 0.
Using the definition of the impulse signal, any signal x(t) can be written as
$$x(t) = \int_{-\infty}^{\infty} x(t-\tau)\,\delta(\tau)\,d\tau = \int_{-\infty}^{\infty} x(\tau)\,\delta(t-\tau)\,d\tau. \tag{1.3}$$
The unit-step signal can be related to the impulse signal using the previous relation as
$$u(t) = \int_{-\infty}^{\infty} \delta(\tau)\,u(t-\tau)\,d\tau = \int_{-\infty}^{t} \delta(\tau)\,d\tau$$
or
$$\frac{du(t)}{dt} = \delta(t). \tag{1.4}$$
The sinusoidal signal, with amplitude A, frequency Ω0 , and initial phase ϕ, is a signal
of the form
$$x(t) = A\sin(\Omega_0 t + \varphi). \tag{1.5}$$
This signal is periodic in time, since it satisfies the periodicity condition
$$x(t+T) = x(t) \tag{1.6}$$
with the period T = 2π/Ω₀; it is also periodic with any integer multiple of this period. Fig. 1.1 depicts basic continuous-time signals.
Figure 1.1 Continuous-time signals: (a) unit-step signal, (b) impulse signal, (c) boxcar signal, and (d) sinusoidal
signal.
Example 1.1. Find the periods of the signals: x1 (t) = sin(2πt/36), x2 (t) = cos(4πt/15 + 2),
x3 (t) = exp( j0.1t), x4 (t) = x1 (t) + x2 (t), and x5 (t) = x1 (t) + x3 (t).
⋆ Periods are calculated according to (1.6). For x1(t), the period follows from 2πT1/36 = 2π as T1 = 36. Similarly, T2 = 15/2 and T3 = 20π. The period of x4(t) is the smallest interval that contains an integer number of periods T1 and T2. It is T4 = 180 (5 periods of x1(t) and 24 periods of x2(t)). For the signal x5(t), whose components have periods T1 = 36 and T3 = 20π, there is no common interval T5 that contains both T1 and T3 an integer number of times. Thus, the signal x5(t) is not periodic.
$$P_{AV} = \lim_{T\to\infty}\frac{1}{2T}\int_{-T}^{T}|x(t)|^2\,dt. \tag{1.11}$$
The average power is a time average of energy. Energy signals are signals with a finite
energy, while power signals have finite and nonzero power. The average signal power
of energy signals is zero.
Example 1.3. Find the magnitude, energy, instantaneous power, and average power of the signal
x(t) given by
$$x(t) = te^{-t}u(t). \tag{1.12}$$
⋆ The signal x (t) is a nonnegative continuous function with the initial and the final value equal
to x (0) = 0 and limt→∞ x (t) = 0, respectively. The magnitude of this signal is obtained as its
maximum, from
$$\frac{dx(t)}{dt} = e^{-t} - te^{-t} = (1-t)e^{-t} = 0, \quad \text{for } t > 0. \tag{1.13}$$
The maximum Mx = 1/e is achieved at t = 1. The energy of this signal is equal to
$$E_x = \int_0^{\infty} t^2e^{-2t}\,dt = \left.-t^2\frac{e^{-2t}}{2}\right|_0^{\infty} + \int_0^{\infty} te^{-2t}\,dt = \left.-t\frac{e^{-2t}}{2}\right|_0^{\infty} + \frac{1}{2}\int_0^{\infty} e^{-2t}\,dt = \frac{1}{4}, \tag{1.14}$$
where integration by parts is used twice, with lim_{t→∞} t²e^{−2t} = 0. The instantaneous power of the signal x(t) is Px(t) = t²e^{−2t}u(t). The average power of this signal is P_AV = 0.
1.2 Linear Systems

A system transforms one signal (the input signal) into another signal (the output signal). Assume that x(t) is the input signal. The system transformation will be denoted by an operator T{◦}. The output signal can then be written as
$$y(t) = T\{x(t)\}. \tag{1.15}$$
A system is linear if, for any two signals x1(t) and x2(t) and arbitrary constants a1 and a2, the relation
$$y(t) = T\{a_1x_1(t) + a_2x_2(t)\} = a_1T\{x_1(t)\} + a_2T\{x_2(t)\} \tag{1.16}$$
holds.
A system is time-invariant if its properties and parameters do not change over time. For a time-invariant system, the relation
$$T\{x(t-t_0)\} = y(t-t_0) \tag{1.17}$$
holds for any t₀. If the response of a linear time-invariant system to the impulse signal δ(t) is denoted by
$$h(t) = T\{\delta(t)\},$$
then for any signal x (t) at the system input, the output can be obtained using (1.3), as
$$y(t) = T\{x(t)\} = T\left\{\int_{-\infty}^{\infty}x(\tau)\delta(t-\tau)\,d\tau\right\}$$
$$\overset{\text{Linearity}}{=} \int_{-\infty}^{\infty}x(\tau)\,T\{\delta(t-\tau)\}\,d\tau \overset{\text{Time-invariance}}{=} \int_{-\infty}^{\infty}x(\tau)\,h(t-\tau)\,d\tau.$$
The last integral is of particular importance in the theory of signals and systems. It is called the convolution in time of x(t) and h(t), with the notation
$$y(t) = x(t)*_t h(t) = \int_{-\infty}^{\infty}x(\tau)\,h(t-\tau)\,d\tau. \tag{1.18}$$
The commutativity property x(t) ∗t h(t) = h(t) ∗t x(t) holds.
Example 1.4. Find the convolution of the two boxcar signals x(t) = u(t) − u(t − 5) and h(t) = u(t) − u(t − 2).
⋆ The signals x(τ) and h(t − τ) are shown in Fig. 1.2 for t = 0 and t = 1.25. For example, the convolution value at t = 0 is obtained using the integral of the product of x(τ) and h(−τ), that is,
$$y(0) = \int_{-\infty}^{\infty}x(\tau)\,h(-\tau)\,d\tau = 0. \tag{1.20}$$
For t < 0, the nonzero values of x(τ) and h(t − τ) do not overlap, resulting in y(t) = 0. For 0 ≤ t < 2, the output signal is $y(t) = \int_0^t d\tau = t$, while for 2 ≤ t < 5, y(t) = 2. For 5 ≤ t < 7, the value of y(t) is y(t) = 7 − t. Finally, for t ≥ 7 the convolution value is equal to zero, y(t) = 0, as shown in Fig. 1.2.
The duration of the convolution, y(t) = x(t) ∗t h(t), is equal to the sum of the durations of x(t) and h(t), that is, Ty = Tx + Th, where Tx, Th, and Ty are the respective durations of x(t), h(t), and y(t).
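The piecewise result of Example 1.4 can also be verified numerically. A minimal Python sketch approximates the convolution integral by a Riemann sum, using np.convolve scaled by the time step:

```python
import numpy as np

# Numerical check of Example 1.4: y(t) = (u(t)-u(t-5)) * (u(t)-u(t-2)).
dt = 0.001
t = np.arange(0, 10, dt)
x = ((t >= 0) & (t < 5)).astype(float)
h = ((t >= 0) & (t < 2)).astype(float)
y = np.convolve(x, h)[:len(t)] * dt      # Riemann-sum convolution on the t grid

for t0, expected in [(1.0, 1.0), (3.0, 2.0), (6.0, 1.0), (8.0, 0.0)]:
    print(t0, round(y[int(t0 / dt)], 3), expected)
```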
Example 1.5. Find the convolution of the two signals x(t) = u(t + 1) − u(t − 1) and h(t) = e^{−t}u(t).
Figure 1.2 Calculation of the convolution, y(t) = x(t) ∗t h(t), of the signals x(t) = u(t) − u(t − 5) and h(t) = u(t) − u(t − 2).
A system is causal if there is no response before the input signal appears. For causal systems, h(t) = 0 for t < 0. In general, a signal that could be the impulse response of a causal system is referred to as a causal signal.
A system is stable if any input signal with a finite magnitude M_x = max_{−∞<t<∞}|x(t)| produces an output y(t) whose values are finite, |y(t)| < ∞. A sufficient condition for a linear time-invariant system to be stable is
$$\int_{-\infty}^{\infty}|h(\tau)|\,d\tau < \infty \tag{1.21}$$
since
$$|y(t)| = \left|\int_{-\infty}^{\infty}x(t-\tau)h(\tau)\,d\tau\right| \leq \int_{-\infty}^{\infty}|x(t-\tau)h(\tau)|\,d\tau = \int_{-\infty}^{\infty}|x(t-\tau)||h(\tau)|\,d\tau \leq M_x\int_{-\infty}^{\infty}|h(\tau)|\,d\tau < \infty,$$
if (1.21) holds. It can be shown that the absolute integrability of the impulse response is also a necessary condition for a linear time-invariant system to be stable.
1.3 Periodic Signals and Fourier Series

Consider a periodic signal x(t) with a period T. It can be expanded onto the periodic complex sinusoidal functions φn(t) = e^{j2πnt/T}, −∞ < n < ∞, as
$$x(t) = \sum_{n=-\infty}^{\infty}X_ne^{j2\pi nt/T}. \tag{1.22}$$
The basis functions satisfy
$$\left\langle\phi_n(t),\phi_m(t)\right\rangle = \frac{1}{T}\int_{-T/2}^{T/2}e^{j2\pi nt/T}e^{-j2\pi mt/T}\,dt = \begin{cases}1, & n = m\\ 0, & n \neq m.\end{cases}$$
It means that the inner product of any two different basis functions is zero (an orthogonal set), while the self-inner product of each basis function is 1 (a normalized set). For an orthonormal set of basis functions, it is easy to show that the weighting coefficients Xn can be calculated as the projections of x(t) onto the basis functions, here φn(t) = e^{j2πnt/T}, −∞ < n < ∞,
$$X_n = \left\langle x(t),\,e^{j2\pi nt/T}\right\rangle = \frac{1}{T}\int_{-T/2}^{T/2}x(t)e^{-j2\pi nt/T}\,dt. \tag{1.23}$$
This relation follows after a simple multiplication of both sides of (1.22) by e^{−j2πmt/T} and a normalized integration over the period, that is, $\frac{1}{T}\int_{-T/2}^{T/2}(\cdot)\,dt$.
Normalization is achieved using the factor 1/T in the scalar product definition. If this factor were not used, then the orthonormal set of basis functions would be defined by φn(t) = e^{j2πnt/T}/√T, −∞ < n < ∞, with the same conclusions and relations.
Since the signal and the basis functions are periodic with period T, we can use
$$\frac{1}{T}\int_{-T/2}^{T/2}x(t)e^{-j2\pi nt/T}\,dt = \frac{1}{T}\int_{-T/2+\Lambda}^{T/2+\Lambda}x(t)e^{-j2\pi nt/T}\,dt \tag{1.24}$$
for any Λ.
Example 1.6. What are the Fourier series coefficients of the periodic signal x(t) = cos²(πt/4)? What will the coefficient values be if the period T = 8 is assumed?
⋆ The signal x(t) can be written as x(t) = (1 + cos(πt/2))/2. The period is T = 4. Assuming that the Fourier series coefficients are calculated with T = 4, after transforming the signal into the form (1.22), we get
$$x(t) = \frac{1}{4}e^{-j2\pi t/4} + \frac{1}{2} + \frac{1}{4}e^{j2\pi t/4}.$$
The Fourier series coefficients are recognized as X−1 = 1/4, X0 = 1/2 and X1 = 1/4 (without
the calculation defined by (1.23)). Other coefficients are equal to zero. In the above transformation,
the relation cos(πt/2) = (e jπt/2 + e− jπt/2 )/2 is used.
If the period T = 8 is used, then the signal is decomposed into complex sinusoids of the
form e j2πnt/8 (see relation (1.22)). The signal can be written as
$$x(t) = \frac{1}{4}e^{-j2\pi 2t/8} + \frac{1}{2} + \frac{1}{4}e^{j2\pi 2t/8}. \tag{1.25}$$
Thus, by comparing the signal definition with the basis functions e^{j2πnt/8}, we may write X₋₂ = 1/4, X₀ = 1/2, and X₂ = 1/4. The remaining coefficients Xn are equal to zero.
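The coefficients in Example 1.6 are easily confirmed by a numerical evaluation of (1.23); a minimal Python sketch:

```python
import numpy as np

# Numerical Fourier series coefficients (1.23) of x(t) = cos^2(pi*t/4), T = 4;
# expected: X_{-1} = X_1 = 1/4, X_0 = 1/2, all others zero.
T = 4.0
t = np.linspace(-T / 2, T / 2, 100000, endpoint=False)
x = np.cos(np.pi * t / 4)**2

for n in range(-2, 3):
    Xn = np.mean(x * np.exp(-1j * 2 * np.pi * n * t / T))  # (1/T) * integral
    print(n, round(Xn.real, 4))
```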
Example 1.7. Calculate the Fourier series coefficients of a periodic signal x (t) defined as
$$x(t) = \sum_{n=-\infty}^{\infty}x_0(t+2n)$$
with
$$x_0(t) = u(t+1/4) - u(t-1/4). \tag{1.26}$$
⋆The signal x (t) is a periodic extension of x0 (t), with period T = 2. This signal is equal to 1
for −1/4 ≤ t < 1/4, within its basic period. The Fourier series coefficients are obtained from
$$X_n = \frac{1}{2}\int_{-1/4}^{1/4}1\cdot e^{-j2\pi nt/2}\,dt = \frac{\sin(\pi n/4)}{\pi n}, \tag{1.27}$$
with X₀ = 1/4 as the limit value for n = 0.
Figure 1.3 Periodic signal, x (t), (left) and its Fourier series coefficients, Xn , (right).
Figure 1.4 Reconstruction of the signal x (t) using a finite Fourier series with: (a) the coefficients Xn within
−1 ≤ n ≤ 1, (b) the coefficients Xn within −2 ≤ n ≤ 2, (c) the coefficients Xn within −6 ≤ n ≤ 6, and (d) the
coefficients Xn within −30 ≤ n ≤ 30.
1.3.1 Fourier Series of Real-Valued Signals

For a real-valued signal x(t), the Fourier series coefficients can be written in the form
$$X_n = \frac{1}{T}\int_{-T/2}^{T/2}x(t)\cos\Big(\frac{2\pi nt}{T}\Big)dt - j\frac{1}{T}\int_{-T/2}^{T/2}x(t)\sin\Big(\frac{2\pi nt}{T}\Big)dt = \frac{A_n - jB_n}{2}, \tag{1.28}$$
where An/2 and −Bn/2 are the real and imaginary parts of Xn. Since X*n = X₋n holds for real-valued signals, the values of An and Bn are equal to
$$A_n = X_n + X_{-n} = \frac{2}{T}\int_{-T/2}^{T/2}x(t)\cos\Big(\frac{2\pi nt}{T}\Big)dt,$$
$$B_n = \frac{X_n - X_{-n}}{-j} = \frac{2}{T}\int_{-T/2}^{T/2}x(t)\sin\Big(\frac{2\pi nt}{T}\Big)dt. \tag{1.29}$$
The signal x(t) can now be written as
$$x(t) = \sum_{n=-\infty}^{-1}X_ne^{j2\pi nt/T} + X_0 + \sum_{n=1}^{\infty}X_ne^{j2\pi nt/T}$$
$$= X_0 + \sum_{n=1}^{\infty}\Big(X_ne^{j2\pi nt/T} + X_{-n}e^{-j2\pi nt/T}\Big)$$
$$= X_0 + \sum_{n=1}^{\infty}\Big[(X_n + X_{-n})\cos\Big(\frac{2\pi nt}{T}\Big) + j(X_n - X_{-n})\sin\Big(\frac{2\pi nt}{T}\Big)\Big]$$
$$= \frac{A_0}{2} + \sum_{n=1}^{\infty}A_n\cos\Big(\frac{2\pi nt}{T}\Big) + \sum_{n=1}^{\infty}B_n\sin\Big(\frac{2\pi nt}{T}\Big), \tag{1.30}$$
with $|X_n| = \sqrt{A_n^2 + B_n^2}/2$. For real-valued signals, the integrals in (1.29), corresponding to An and Bn, are even and odd functions of n, respectively. Therefore, it is possible to calculate
$$H_n = \frac{1}{T}\int_{-T/2}^{T/2}x(t)\Big[\cos\Big(\frac{2\pi nt}{T}\Big) + \sin\Big(\frac{2\pi nt}{T}\Big)\Big]dt = \frac{1}{T}\int_{-T/2}^{T/2}x(t)\,\mathrm{cas}\Big(\frac{2\pi nt}{T}\Big)dt \tag{1.31}$$
and to get
$$A_n = H_n + H_{-n}, \qquad B_n = H_n - H_{-n}.$$
The coefficients calculated by (1.31) are the Hartley series coefficients. For a real-valued and even signal, x(t) = x(−t), the Hartley series reduces to the Fourier cosine series with the coefficients
$$C_n = X_n = \frac{A_n}{2} = \frac{1}{T}\int_{-T/2}^{T/2}x(t)\cos\Big(\frac{2\pi nt}{T}\Big)dt = \frac{2}{T}\int_{0}^{T/2}x(t)\cos\Big(\frac{2\pi nt}{T}\Big)dt.$$
A similar expression is obtained for an odd and real-valued signal x(t), when the Fourier series reduces to the Fourier sine series.
Example 1.8. Consider the Fourier series based reconstruction of the signal
$$x(t) = t[u(t) - u(t-1/2)],$$
whose duration (nonzero values) is limited to 0 ≤ t < 1/2. For the Fourier series expansion, a periodic extension of the signal must be formed. The rate of convergence of the Fourier series coefficients depends on how the periodic extension of this signal is formed.
(a) Calculate the Fourier series of the original signal extended periodically with period T = 1/2,
$$x_p(t) = \sum_{n=-\infty}^{\infty}x\Big(t+\frac{n}{2}\Big).$$
(b) Calculate the Fourier series of the signal extended with a zero interval to the period T = 1.
(c) Calculate the Fourier series of the signal extended first with its time-reversed version,
$$x_c(t) = x(t) + x(1-t),$$
and then extended periodically with the period T = 1. Find the Fourier series coefficients and the reconstruction formula.
(d) Comment on the coefficients' convergence in all cases.
⋆ (a) In the first case, the reconstruction with 2M + 1 coefficients is
$$x_M(t) = \frac{1}{4} + \sum_{n=1}^{M}\Big(-\frac{1}{4j\pi n}e^{j4\pi nt} + \frac{1}{4j\pi n}e^{-j4\pi nt}\Big) = \frac{1}{4} - \sum_{n=1}^{M}\frac{\sin(4\pi nt)}{2\pi n}.$$
(b) In the second case, the coefficients are
$$X_n = \int_0^{1/2}te^{-j2\pi nt}\,dt = \left.\frac{te^{-j2\pi nt}}{-j2\pi n}\right|_0^{1/2} + \left.\frac{e^{-j2\pi nt}}{(2\pi n)^2}\right|_0^{1/2} = \frac{(-1)^n}{-j4\pi n} + \frac{(-1)^n - 1}{(2\pi n)^2},$$
with X₀ = 1/8. Note that the relation between the Fourier coefficients in (a) and (b) is $2X_{2n}^{(b)} = X_n^{(a)}$. The reconstruction is given in Fig. 1.6.
Figure 1.5 Reconstruction of the signal x (t) using the Fourier series. Reconstructed signal is denoted by x M (t),
where M indicates the number of coefficients used in reconstruction.
Figure 1.6 Reconstruction of the periodic signal x (t), with a zero interval extension before the Fourier series is
used.
(c) For the signal xc(t), extended with its reversed version, it follows that
$$X_n = C_n = \int_0^{1/2}te^{-j2\pi nt}\,dt + \int_{1/2}^{1}(1-t)e^{-j2\pi nt}\,dt = 2\int_0^{1/2}t\cos(2\pi nt)\,dt = \frac{(-1)^n - 1}{2\pi^2n^2}$$
and the reconstruction with 2M + 1 coefficients is
$$x_M(t) = \frac{1}{4} + 2\sum_{n=1}^{M}\frac{(-1)^n - 1}{2\pi^2n^2}\cos(2\pi nt) = \frac{1}{4} - \frac{2}{\pi^2}\sum_{n=1}^{M}\frac{\cos\big(2\pi(2n-1)t\big)}{(2n-1)^2}.$$
Figure 1.7 Reconstruction of a periodic signal after an even extension before using the Fourier series (cosine
Fourier series).
(d) The coefficients' convergence in cases (a) and (b) is of order 1/n, while the convergence in the last case (c) is of order 1/n². The best signal reconstruction with a given number of coefficients is achieved in case (c). Also, for a given reconstruction error, the smallest number of reconstruction terms M is required in case (c). This kind of signal extension (even signal extension) will later be used as the basis for the definition of the so-called cosine signal transforms. From these periodic extensions, we can also conclude that an extension that avoids signal discontinuities at the interval-ending instants improves the series convergence.
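The different convergence rates discussed in (d) are easy to observe numerically by computing the coefficient magnitudes of the two periodic extensions; a minimal Python sketch:

```python
import numpy as np

# Coefficient decay for Example 1.8: case (a), the sawtooth extension with
# T = 1/2, decays as 1/n; case (c), the even (triangle) extension with T = 1,
# decays as 1/n^2.
def fs_coeff(x, T, n, N=200000):
    t = np.linspace(0, T, N, endpoint=False)
    return np.mean(x(t) * np.exp(-1j * 2 * np.pi * n * t / T))

xa = lambda t: t % 0.5                           # case (a): sawtooth
xc = lambda t: np.minimum(t % 1, 1 - t % 1)      # case (c): triangle

for n in (1, 5, 25):
    print(n, abs(fs_coeff(xa, 0.5, n)), abs(fs_coeff(xc, 1.0, n)))
```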
Example 1.9. Show that the Fourier series coefficients Xn of a periodic signal x(t) can be obtained by minimizing the mean squared error between the signal and $\sum_{n=-N}^{N}X_ne^{j2\pi nt/T}$ within the period T.
⋆ The reconstruction error is
$$e(t) = x(t) - \sum_{n=-N}^{N}X_ne^{j2\pi nt/T},$$
and the mean squared error within the period is
$$I = \frac{1}{T}\int_{-T/2}^{T/2}\left|x(t) - \sum_{n=-N}^{N}X_ne^{j2\pi nt/T}\right|^2dt.$$
From ∂I/∂X*m = 0 it follows that
$$\frac{1}{T}\int_{-T/2}^{T/2}e^{-j2\pi mt/T}\left(x(t) - \sum_{n=-N}^{N}X_ne^{j2\pi nt/T}\right)dt = 0,$$
that is,
$$X_m = \frac{1}{T}\int_{-T/2}^{T/2}x(t)e^{-j2\pi mt/T}\,dt. \tag{1.33}$$
Note: The derivatives of a complex function F(z) = u(x, y) + jv(x, y) of z = x + jy, where u(x, y) and v(x, y) are real-valued functions, are defined by
$$\frac{\partial F(z)}{\partial z} = \Big(\frac{\partial}{\partial x} - j\frac{\partial}{\partial y}\Big)F(x,y), \qquad \frac{\partial F(z)}{\partial z^*} = \Big(\frac{\partial}{\partial x} + j\frac{\partial}{\partial y}\Big)F(x,y).$$
Commonly, a half of these values is used in the definition.
In order to justify the complex derivative ∂I/∂X*m in (1.33), let us denote: (i) the complex-valued variable Xm by z = x + jy, (ii) all terms in $x(t) - \sum_{n=-N}^{N}X_ne^{j2\pi nt/T} = f(z)$ which do not depend on z = Xm = x + jy by a + jb, and (iii) the value of −e^{j2πmt/T} by e^{jα}. Now we have to show that
$$\frac{\partial F(z)}{\partial z^*} = \frac{\partial|f(z)|^2}{\partial z^*} = 2e^{-j\alpha}f(z).$$
In our case,
$$|f(z)|^2 = \left|a + jb + e^{j\alpha}(x+jy)\right|^2 = (a + x\cos\alpha - y\sin\alpha)^2 + (b + x\sin\alpha + y\cos\alpha)^2.$$
For the minimization of the real-valued function |f(z)|² of two variables x and y, we need the partial derivatives
$$\frac{\partial|f(z)|^2}{\partial x} = 2\cos\alpha\,(a + x\cos\alpha - y\sin\alpha) + 2\sin\alpha\,(b + x\sin\alpha + y\cos\alpha) = 2\,\mathrm{Re}\{e^{-j\alpha}f(z)\} \tag{1.34}$$
and
$$\frac{\partial|f(z)|^2}{\partial y} = 2\,\mathrm{Im}\{e^{-j\alpha}f(z)\}. \tag{1.35}$$
Therefore, all calculations with the two real-valued equations (1.34) and (1.35) are the same as with the one complex-valued relation
$$\frac{\partial|f(z)|^2}{\partial x} + j\frac{\partial|f(z)|^2}{\partial y} = \Big(\frac{\partial}{\partial x} + j\frac{\partial}{\partial y}\Big)|f(z)|^2 = \Big(\frac{\partial}{\partial x} + j\frac{\partial}{\partial y}\Big)F(z) = \frac{\partial F(z)}{\partial z^*}.$$
1.4 Fourier Transform

The Fourier series has been introduced and presented for periodic signals with a period T. Assume now that the signal is of limited duration and that the period for its expansion is extended toward infinity, while the signal itself is not changed. This case corresponds to the analysis of an aperiodic signal x(t). The limit of the Fourier series coefficients, normalized by the period,
$$\lim_{T\to\infty}X_nT = \lim_{T\to\infty}\int_{-T/2}^{T/2}x(t)e^{-j2\pi nt/T}\,dt = \int_{-\infty}^{\infty}x(t)e^{-j\Omega t}\,dt, \tag{1.36}$$
that is, the transform
$$X(\Omega) = \int_{-\infty}^{\infty}x(t)e^{-j\Omega t}\,dt, \tag{1.37}$$
is called the Fourier transform (FT) of a signal x(t). For the existence of the Fourier transform, it is sufficient that the signal is absolutely integrable, that is,
$$\int_{-\infty}^{\infty}|x(t)|\,dt < \infty. \tag{1.38}$$
There are some signals that do not satisfy this condition, such as the unit-step signal, whose
Fourier transform exists in the form of generalized functions.
The inverse Fourier transform (IFT) can be obtained by multiplying both sides of (1.37)
by e jΩτ and integrating over Ω,
$$\int_{-\infty}^{\infty}X(\Omega)e^{j\Omega\tau}\,d\Omega = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}x(t)e^{j\Omega(\tau-t)}\,dt\,d\Omega.$$
Condition (1.38) is, for example, satisfied for the signal x(t) = Ae^{−at}u(t), a > 0, since
$$\int_{-\infty}^{\infty}|x(t)|\,dt = A\int_0^{\infty}e^{-at}\,dt = A\left.\frac{e^{-at}}{-a}\right|_0^{\infty} = \frac{A}{a} < \infty.$$
Example 1.11. Find the Fourier transform of the signal x(t) = sign(t).
⋆ Since a direct calculation of the Fourier transform for this signal is not possible, let us consider the signal
$$x_a(t) = \begin{cases}e^{-at}, & t > 0\\ 0, & t = 0\\ -e^{at}, & t < 0,\end{cases}$$
where a > 0 is a real-valued constant. It is obvious that the signal x(t) can be obtained as the limit
$$\lim_{a\to 0}x_a(t) = x(t).$$
The Fourier transform of x(t) can be calculated from X(Ω) = lim_{a→0} Xa(Ω), where
$$X_a(\Omega) = -\int_{-\infty}^{0}e^{at}e^{-j\Omega t}\,dt + \int_0^{\infty}e^{-at}e^{-j\Omega t}\,dt = \frac{-j2\Omega}{a^2+\Omega^2}. \tag{1.41}$$
It results in
$$X(\Omega) = \frac{2}{j\Omega}. \tag{1.42}$$
Based on the definitions of the Fourier transform and the inverse Fourier transform, it is easy to conclude that the duality property holds: if X(Ω) is the Fourier transform of x(t), then the Fourier transform of X(t) is 2πx(−Ω).
Example 1.12. Find the Fourier transform of the signals δ(t), x(t) = 1, and u(t).
⋆ Directly from the definition, FT{δ(t)} = 1 and, by duality, FT{1} = 2πδ(Ω). Since u(t) = (1 + sign(t))/2, combining this with Example 1.11 gives FT{u(t)} = πδ(Ω) + 1/(jΩ).
1.4.1 Fourier Transform and Linear Time-Invariant Systems

Consider a linear time-invariant system with an impulse response h(t) and the input signal x(t) = Ae^{j(Ω₀t+φ)}. The output signal is
$$y(t) = x(t)*_t h(t) = \int_{-\infty}^{\infty}Ae^{j(\Omega_0(t-\tau)+\varphi)}h(\tau)\,d\tau = Ae^{j(\Omega_0t+\varphi)}\int_{-\infty}^{\infty}h(\tau)e^{-j\Omega_0\tau}\,d\tau = H(\Omega_0)x(t), \tag{1.47}$$
where
$$H(\Omega) = \int_{-\infty}^{\infty}h(t)e^{-j\Omega t}\,dt \tag{1.48}$$
is the Fourier transform of h(t). The linear time-invariant system does not change the form of an input complex harmonic signal x(t) = Ae^{j(Ω₀t+φ)}. It remains a complex harmonic signal after passing through the linear time-invariant system, with the same frequency Ω₀. The amplitude of the input signal x(t) is scaled by |H(Ω₀)|, and the phase is changed by arg{H(Ω₀)}.
1.4.2 Properties of the Fourier Transform

1. Linearity: The Fourier transform of a linear combination of signals is
$$\mathrm{FT}\{a_1x_1(t) + a_2x_2(t)\} = a_1X_1(\Omega) + a_2X_2(\Omega), \tag{1.49}$$
where X1(Ω) and X2(Ω) are the Fourier transforms of the signals x1(t) and x2(t), respectively.
2. Realness: The Fourier transform of a signal is real-valued (that is, X*(Ω) = X(Ω)) if x*(−t) = x(t), since
$$X^*(\Omega) = \int_{-\infty}^{\infty}x^*(t)e^{j\Omega t}\,dt \overset{t\to -t}{=} \int_{-\infty}^{\infty}x^*(-t)e^{-j\Omega t}\,dt = X(\Omega) \tag{1.50}$$
if x*(−t) = x(t).
3. Modulation: If the signal x(t) is modulated by e^{jΩ₀t}, the Fourier transform of the modulated signal is shifted in frequency, that is,
$$\mathrm{FT}\{x(t)e^{j\Omega_0t}\} = \int_{-\infty}^{\infty}x(t)e^{j\Omega_0t}e^{-j\Omega t}\,dt = X(\Omega-\Omega_0), \tag{1.51}$$
and consequently
$$\mathrm{FT}\{2x(t)\cos(\Omega_0t)\} = X(\Omega-\Omega_0) + X(\Omega+\Omega_0).$$
4. Shift in time: The Fourier transform of the signal x(t) shifted in time by t₀ is modulated in the frequency domain,
$$\mathrm{FT}\{x(t-t_0)\} = \int_{-\infty}^{\infty}x(t-t_0)e^{-j\Omega t}\,dt = X(\Omega)e^{-jt_0\Omega}. \tag{1.52}$$
5. Time-scaling: For a signal scaled in time by a factor a, the Fourier transform is given by
$$\mathrm{FT}\{x(at)\} = \int_{-\infty}^{\infty}x(at)e^{-j\Omega t}\,dt = \frac{1}{|a|}X\Big(\frac{\Omega}{a}\Big). \tag{1.53}$$
6. Convolution: The Fourier transform of the convolution of signals x(t) and h(t) is equal to the product of their corresponding Fourier transforms, that is,
$$\mathrm{FT}\{x(t)*_t h(t)\} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}x(\tau)h(t-\tau)e^{-j\Omega t}\,d\tau\,dt \tag{1.54}$$
$$\overset{t-\tau\to u}{=} \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}x(\tau)h(u)e^{-j\Omega(\tau+u)}\,d\tau\,du = X(\Omega)H(\Omega).$$
Parseval's theorem: For signals x(t) and y(t), whose Fourier transforms are X(Ω) and Y(Ω),
$$\int_{-\infty}^{\infty}x(t)y^*(t)\,dt = \frac{1}{2\pi}\int_{-\infty}^{\infty}X(\Omega)Y^*(\Omega)\,d\Omega \tag{1.56}$$
holds, with the special case y(t) = x(t) giving the energy relation
$$\int_{-\infty}^{\infty}|x(t)|^2\,dt = \frac{1}{2\pi}\int_{-\infty}^{\infty}|X(\Omega)|^2\,d\Omega.$$
Integration in time: The integral of a signal,
$$\int_{-\infty}^{t}x(\tau)\,d\tau,$$
can be considered as the convolution of the signals x(t) and u(t),
$$x(t)*_t u(t) = \int_{-\infty}^{\infty}x(\tau)u(t-\tau)\,d\tau = \int_{-\infty}^{t}x(\tau)\,d\tau.$$
Then, the Fourier transform of the signal integral is obtained as
$$\mathrm{FT}\left\{\int_{-\infty}^{t}x(\tau)\,d\tau\right\} = \mathrm{FT}\{x(t)\}\,\mathrm{FT}\{u(t)\} = \Big(\frac{1}{j\Omega}+\pi\delta(\Omega)\Big)X(\Omega) = \frac{1}{j\Omega}X(\Omega) + \pi X(0)\delta(\Omega). \tag{1.58}$$
If the mean value of the signal x(t) is zero, that is, X(0) = 0, a multiplication by 1/(jΩ) in the Fourier transform domain corresponds to the signal integration in the time domain.
11. Analytic part: The analytic part of a signal x(t), whose Fourier transform is X(Ω), is a signal with the Fourier transform defined by
$$X_a(\Omega) = \begin{cases}2X(\Omega), & \Omega > 0\\ X(0), & \Omega = 0\\ 0, & \Omega < 0.\end{cases} \tag{1.59}$$
It can be written as
$$X_a(\Omega) = X(\Omega)\big(1+\mathrm{sign}(\Omega)\big) = X(\Omega) + jX_h(\Omega), \tag{1.60}$$
where Xh(Ω) = −j sign(Ω)X(Ω) is the Fourier transform of the Hilbert transform of the signal x(t). From Example 1.11, with the signal x(t) = sign(t), and the duality property of the Fourier transform pair, the inverse Fourier transform of sign(Ω) is obviously j/(πt). Therefore, the analytic part of the signal x(t), in the time domain, takes the form
$$x_a(t) = x(t) + jx_h(t) = x(t) + x(t)*_t\frac{j}{\pi t} = x(t) + j\,\frac{1}{\pi}\,\mathrm{p.v.}\!\int_{-\infty}^{\infty}\frac{x(\tau)}{t-\tau}\,d\tau, \tag{1.61}$$
where p.v. stands for the Cauchy principal value of the considered integral.
1.4.3 Relationship Between the Fourier Series and the Fourier Transform
Consider an aperiodic signal x (t), with the Fourier transform X (Ω). Assume that the signal
is of limited duration (that is, x (t) = 0 for |t| > T0 /2). Then,
$$X(\Omega) = \int_{-T_0/2}^{T_0/2}x(t)e^{-j\Omega t}\,dt. \tag{1.62}$$
If we make a periodic extension of x (t), with the period T, we get the periodic signal
$$x_p(t) = \sum_{n=-\infty}^{\infty}x(t+nT).$$
This periodic signal x p (t) can be expanded into Fourier series with the coefficients
$$X_n = \frac{1}{T}\int_{-T/2}^{T/2}x_p(t)e^{-j2\pi nt/T}\,dt. \tag{1.63}$$
Since xp(t) = x(t) within |t| < T/2 for T > T₀,
$$\int_{-T/2}^{T/2}x_p(t)e^{-j2\pi nt/T}\,dt = \left.\int_{-T_0/2}^{T_0/2}x(t)e^{-j\Omega t}\,dt\right|_{\Omega=2\pi n/T}$$
or
$$X_n = \left.\frac{1}{T}X(\Omega)\right|_{\Omega=2\pi n/T}. \tag{1.64}$$
It means that the Fourier series coefficients are equal to the samples of the Fourier transform,
divided by T. The only condition in the derivation of this relation is that the signal duration
is shorter than the period of its periodic extension (that is, T > T₀). The sampling interval in frequency is
$$\Delta\Omega = \frac{2\pi}{T}, \qquad \Delta\Omega < \frac{2\pi}{T_0}.$$
This sampling interval should be smaller than 2π/T₀, where T₀ is the duration of the signal x(t). This is a form of the sampling theorem in the frequency domain. It states that the values of X(Ω) can be recovered for any Ω from the samples X(2πn/T) = XnT, if T > T₀. The sampling theorem in the time domain will be discussed later.
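Relation (1.64) can be checked numerically on Example 1.7, where the Fourier transform of the boxcar x0(t) is X(Ω) = 2 sin(Ω/4)/Ω; a minimal Python sketch:

```python
import numpy as np

# Check of (1.64) for Example 1.7: the Fourier series coefficients of the
# periodic extension (T = 2) of x0(t) = u(t+1/4) - u(t-1/4) equal the
# samples X(2*pi*n/T)/T of X(Omega) = 2*sin(Omega/4)/Omega.
T = 2.0
for n in range(1, 6):
    Omega = 2 * np.pi * n / T
    Xn_from_FT = (2 * np.sin(Omega / 4) / Omega) / T
    Xn_direct = np.sin(np.pi * n / 4) / (np.pi * n)   # coefficients (1.27)
    print(n, round(Xn_from_FT, 6), round(Xn_direct, 6))
```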
In order to write the Fourier series coefficients in the Fourier transform form, note that
the periodic signal x p (t), formed by a periodic extension of x (t) with period T, can be
written as
$$x_p(t) = \sum_{n=-\infty}^{\infty}x(t+nT) = x(t)*_t\sum_{n=-\infty}^{\infty}\delta(t+nT). \tag{1.65}$$
The Fourier transform of this periodic signal is
$$X_p(\Omega) = \mathrm{FT}\left\{x(t)*_t\sum_{n=-\infty}^{\infty}\delta(t+nT)\right\} \tag{1.66}$$
$$= X(\Omega)\cdot\frac{2\pi}{T}\sum_{n=-\infty}^{\infty}\delta\Big(\Omega-\frac{2\pi}{T}n\Big) = \frac{2\pi}{T}\sum_{n=-\infty}^{\infty}X\Big(\frac{2\pi}{T}n\Big)\delta\Big(\Omega-\frac{2\pi}{T}n\Big),$$
since
$$\mathrm{FT}\left\{\sum_{n=-\infty}^{\infty}\delta(t+nT)\right\} = \int_{-\infty}^{\infty}\sum_{n=-\infty}^{\infty}\delta(t+nT)e^{-j\Omega t}\,dt = \sum_{n=-\infty}^{\infty}e^{j\Omega nT} = \frac{2\pi}{T}\sum_{n=-\infty}^{\infty}\delta\Big(\Omega-\frac{2\pi}{T}n\Big). \tag{1.67}$$
1.5 Fourier Transform and the Stationary Phase Method

When a signal
$$x(t) = A(t)e^{j\phi(t)} \tag{1.68}$$
is not of a simple analytic form, it may be possible, in some cases, to obtain an approximate expression for its Fourier transform using the method of stationary phase.
The method of stationary phase states that if the phase function φ(t) is monotonous and the amplitude A(t) is a sufficiently smooth function, then
$$\int_{-\infty}^{\infty}A(t)e^{j\phi(t)}e^{-j\Omega t}\,dt \simeq A(t_0)e^{j\phi(t_0)}e^{-j\Omega t_0}\sqrt{\frac{2\pi j}{|\phi''(t_0)|}}, \tag{1.69}$$
where t₀ is the stationary phase instant. The derivative of the phase,
$$\Omega_i(t) = \phi'(t),$$
is called the instantaneous frequency of a signal. Around the stationary phase instant t₀, the following relation holds:
$$\left.\frac{d\big(\phi(t)-\Omega t\big)}{dt}\right|_{t=t_0} = 0, \qquad \text{that is,}\qquad \phi'(t_0)-\Omega = 0.$$
In the vicinity of the stationary phase instant t₀, the phase can be expanded into a Taylor series,
$$\phi(t)-\Omega t = [\phi(t_0)-\Omega t_0] + [\phi'(t_0)-\Omega](t-t_0) + \frac{1}{2}\phi''(t_0)(t-t_0)^2 + \dots$$
Since φ′(t₀) − Ω = 0, the integral in (1.69) can be written in the form
$$\int_{-\infty}^{\infty}A(t)e^{j(\phi(t)-\Omega t)}\,dt \cong A(t_0)e^{j(\phi(t_0)-\Omega t_0)}\int_{-\infty}^{\infty}e^{j\frac{1}{2}\phi''(t_0)(t-t_0)^2}\,dt,$$
where A(t) ≅ A(t₀) is also used.
With
$$\int_{-\infty}^{\infty}e^{j\frac{1}{2}at^2}\,dt = \sqrt{\frac{2\pi j}{|a|}},$$
the approximation (1.69) follows.

Example 1.13. Consider the signal
$$x(t) = e^{-(t-1)^2t^2/2}\,e^{j(4\pi t^2+10\pi t)}.$$
Find its Fourier transform approximation using the stationary phase method.
⋆ According to the stationary phase method, the instant of stationary phase follows from φ′(t₀) − Ω = 0, that is,
$$8\pi t_0 + 10\pi = \Omega, \qquad t_0 = \frac{\Omega-10\pi}{8\pi},$$
and
$$\phi''(t_0) = 8\pi. \tag{1.70}$$
The amplitude of X(Ω) is
$$|X(\Omega)| \simeq A(t_0)\sqrt{\frac{2\pi}{\phi''(t_0)}} = \sqrt{\frac{2\pi}{8\pi}}\exp\Big(-\frac{(t_0-1)^2t_0^2}{2}\Big)$$
$$= \frac{1}{2}\exp\left(-\frac{1}{2}\Big(\frac{\Omega-10\pi}{8\pi}-1\Big)^2\Big(\frac{\Omega-10\pi}{8\pi}\Big)^2\right). \tag{1.71}$$
The signal, the stationary phase approximation of the Fourier transform amplitude, and the numerically computed Fourier transform amplitude are shown in Fig. 1.8.
Example 1.14. Consider the signal x(t) = A(t)e^{jat^{2N}}, where A(t) is a slowly varying non-negative function. Find its Fourier transform approximation using the stationary phase method.
⋆ According to the stationary phase method, the stationary phase point follows from
$$2Nat_0^{2N-1} = \Omega, \qquad t_0 = \Big(\frac{\Omega}{2Na}\Big)^{1/(2N-1)},$$
and
$$\phi''(t_0) = 2N(2N-1)a\Big(\frac{\Omega}{2Na}\Big)^{(2N-2)/(2N-1)}. \tag{1.72}$$
The amplitude and phase of X(Ω), according to (1.69), are
$$|X(\Omega)|^2 \simeq A^2(t_0)\frac{2\pi}{\phi''(t_0)} = A^2\!\left(\Big(\frac{\Omega}{2Na}\Big)^{1/(2N-1)}\right)\frac{2\pi}{(2N-1)\Omega}\Big(\frac{\Omega}{2aN}\Big)^{1/(2N-1)} \tag{1.73}$$
$$\arg\{X(\Omega)\} \simeq \phi(t_0)-\Omega t_0 + \pi/4 = \frac{1-2N}{2N}\Omega\Big(\frac{\Omega}{2aN}\Big)^{1/(2N-1)} + \pi/4.$$
Figure 1.8 The signal (top), along with the stationary phase method approximation of its Fourier transform and
the Fourier transform obtained by numeric calculation with a high precision (bottom).
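The approximation (1.71) can be compared against a direct numerical computation of the Fourier transform of the signal from Example 1.13; a minimal Python sketch:

```python
import numpy as np

# Stationary phase approximation versus numerical Fourier transform for
# x(t) = exp(-(t-1)^2 t^2 / 2) exp(j(4*pi*t^2 + 10*pi*t)), phi''(t) = 8*pi.
dt = 0.0005
t = np.arange(-8, 8, dt)
x = np.exp(-(t - 1)**2 * t**2 / 2) * np.exp(1j * (4*np.pi*t**2 + 10*np.pi*t))

Omega = np.linspace(-20, 80, 6)                 # test frequencies (rad/s)
X_num = np.array([np.sum(x * np.exp(-1j * w * t)) * dt for w in Omega])

t0 = (Omega - 10 * np.pi) / (8 * np.pi)         # stationary phase instants
X_sp = np.sqrt(2*np.pi / (8*np.pi)) * np.exp(-(t0 - 1)**2 * t0**2 / 2)
print(np.round(np.abs(X_num), 4))               # numeric |X(Omega)|
print(np.round(X_sp, 4))                        # approximation (1.71)
```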
The method of stationary phase may be defined in the frequency domain as well. For a Fourier transform
$$X(\Omega) = B(\Omega)e^{j\theta(\Omega)}, \tag{1.74}$$
the method of stationary phase states that if the Fourier transform phase θ(Ω) is monotonous and the amplitude B(Ω) is a sufficiently smooth function, then
$$x(t) = \frac{1}{2\pi}\int_{-\infty}^{\infty}B(\Omega)e^{j\theta(\Omega)}e^{j\Omega t}\,d\Omega \simeq \frac{1}{2\pi}B(\Omega_0)e^{j\theta(\Omega_0)}e^{j\Omega_0t}\sqrt{\frac{2\pi j}{|\theta''(\Omega_0)|}}, \tag{1.75}$$
where Ω₀ is the stationary phase point, θ′(Ω₀) + t = 0.
Example 1.15. For a system with the frequency response H(Ω) = |H(Ω)|e^{−j(aΩ²+bΩ)}, find the impulse response using the stationary phase method.
⋆ The signal amplitude is delayed by b. The second-order parameter a in the phase scales the time axis of the impulse response. This is an undesirable effect in common systems.
Example 1.16. For a system with the frequency response H(Ω) = |H(Ω)|e^{j0}, the impulse response is h(t). Find the impulse responses of the systems whose transfer functions are:
(a) Ha(Ω) = |H(Ω)|e^{−j4Ω},
(b) Hb(Ω) = |H(Ω)|e^{−j2πΩ²}, and
(c) Hc(Ω) = |H(Ω)|[3/4 + (1/4)cos(2πΩ²)]e^{j0}.
1.6 Laplace Transform

The Fourier transform can be considered as a special case of the Laplace transform. In the beginning, Fourier's work was not even published as an original contribution, mainly due to this fact. The Laplace transform is defined by
$$X(s) = \mathcal{L}\{x(t)\} = \int_{-\infty}^{\infty}x(t)e^{-st}\,dt, \tag{1.76}$$
where s = σ + jΩ is a complex variable.
Example 1.17. Calculate the Laplace transform of x(t) = e^{at}u(t), for a real-valued constant a.
⋆ By definition,
$$X(s) = \int_0^{\infty}e^{at}e^{-st}\,dt = \left.\frac{e^{-(s-a)t}}{-(s-a)}\right|_0^{\infty} = \frac{1}{s-a},$$
if lim_{t→∞} e^{−(s−a)t} = 0 or σ − a > 0, that is, σ > a. The region of convergence of this Laplace transform, X(s), is the part of the complex s-plane where σ > a. The point s = a is the pole of the Laplace transform. The region of convergence cannot include any poles, and it is limited by a vertical line in the complex s-plane passing through the pole, as shown in Fig. 1.9.
The Laplace transform may be considered as the Fourier transform of the signal x(t) multiplied by exp(−σt), with a varying parameter σ, that is,
$$\mathrm{FT}\{x(t)e^{-\sigma t}\} = \int_{-\infty}^{\infty}x(t)e^{-\sigma t}e^{-j\Omega t}\,dt = \int_{-\infty}^{\infty}x(t)e^{-st}\,dt = X(s). \tag{1.77}$$
Figure 1.9 The region of convergence (ROC) of the Laplace transform of the signal x (t) = e at u(t) for a = −1.
In this way, we may calculate the Laplace transform of signals that are not absolutely integrable, that is, that do not satisfy the condition $\int_{-\infty}^{\infty}|x(t)|\,dt < \infty$ for the Fourier transform convergence. For some values of σ, the new signal x(t)e^{−σt} may be absolutely integrable, and the Laplace transform then exists.
In the previous example, the Fourier transform does not exist for a > 0, while for a = 0 it exists only in the sense of generalized functions. The Laplace transform of the considered signal always exists, with the region of convergence σ > a. If a < 0, then the region of convergence σ > a includes the line σ = 0, meaning that the Fourier transform exists as well.
Example 1.18. Find the Laplace transform of x(t) = −e^{at}u(−t) and the region of its convergence.
⋆ By definition,
$$X(s) = -\int_{-\infty}^{0}e^{at}e^{-st}\,dt = \frac{1}{s-a},$$
if lim_{t→−∞} e^{−(s−a)t} = 0 or σ − Re{a} < 0, that is, σ < Re{a}, where Re{a} is the real part of a. The Laplace transform X(s) in this example has the same form as X(s) in the previous example, but with a different region of convergence. The Fourier transform of x(t) = −e^{at}u(−t) exists if σ = 0 is within the region of convergence, that is, if Re{a} > 0.
The inverse Laplace transform is
$$x(t) = \frac{1}{2\pi j}\int_{\gamma-j\infty}^{\gamma+j\infty}X(s)e^{st}\,ds,$$
where the integration is performed along a vertical path, with γ within the region of convergence of X(s).
1.6.1 Properties of the Laplace Transform

Properties of the Laplace transform may easily be generalized from those presented for the Fourier transform in Section 1.4.2, such as the linearity property,
$$\mathcal{L}\{a_1x_1(t) + a_2x_2(t)\} = a_1X_1(s) + a_2X_2(s),$$
and the convolution property,
$$\mathcal{L}\{x(t)*_t h(t)\} = \mathcal{L}\{x(t)\}\,\mathcal{L}\{h(t)\} = X(s)H(s).$$
Since the Laplace transform will be used to analyze linear systems described by linear differential equations, we will pay special attention to the relation between the signal derivatives and the corresponding forms in the Laplace domain. In general, the Laplace transform of the first derivative, dx(t)/dt, of a signal x(t) is
$$\int_{-\infty}^{\infty}\frac{dx(t)}{dt}e^{-st}\,dt = \left.x(t)e^{-st}\right|_{-\infty}^{\infty} + s\int_{-\infty}^{\infty}x(t)e^{-st}\,dt = sX(s). \tag{1.78}$$
This relation follows from integration by parts, with the assumption that the values of x(t)e^{−st} tend to zero as t → ±∞.
Unilateral Laplace transform. In many applications, causal systems are assumed, with the corresponding causal signals used in calculations. In these cases, x(t) = 0 for t < 0, that is, x(t) = x(t)u(t). Then, the so-called one-sided Laplace transform (unilateral Laplace transform) is used. Its definition is
$$X(s) = \int_0^{\infty}x(t)e^{-st}\,dt.$$
The region of convergence for the unilateral Laplace transform is the right-sided part of the s
plane. This topic is discussed in Section 1.6.3.
When dealing with the derivatives of causal signals, we have to take care about a possible discontinuity at t = 0. In general, for the first derivative of the function x(t)u(t),
$$\int_0^{\infty}\frac{dx(t)}{dt}e^{-st}\,dt = \left.x(t)e^{-st}\right|_0^{\infty} + s\int_0^{\infty}x(t)e^{-st}\,dt = sX(s) - x(0). \tag{1.79}$$
The previous relation can easily be generalized to the higher-order derivatives of the signal x(t) and the corresponding Laplace transforms,
$$\int_0^{\infty}\frac{d^nx(t)}{dt^n}e^{-st}\,dt = s^n\int_0^{\infty}x(t)e^{-st}\,dt - s^{n-1}x(0) - s^{n-2}x'(0) - \dots - x^{(n-1)}(0)$$
$$= s^nX(s) - \sum_{m=1}^{n}s^{n-m}x^{(m-1)}(0).$$
The unilateral Laplace transform of an integral with a variable upper limit of the signal x(t) is
$$\mathcal{L}\left\{\int_0^tx(\tau)\,d\tau\right\} = \mathcal{L}\{u(t)*_t x(t)\} = \mathcal{L}\{u(t)\}\,\mathcal{L}\{x(t)\} = \frac{1}{s}X(s),$$
since
$$\mathcal{L}\{u(t)\} = \int_0^{\infty}e^{-st}\,dt = \frac{1}{s}.$$
The signal that corresponds to the derivative of the Laplace transform is obtained from
$$\frac{dX(s)}{ds} = \frac{d}{ds}\int_0^{\infty}x(t)e^{-st}\,dt = \int_0^{\infty}(-t)x(t)e^{-st}\,dt.$$
Example 1.20. Find the Laplace transform of the signal x(t) = e^{jΩ₀t}u(t).
⋆ By definition,
$$X(s) = \int_0^{\infty}e^{j\Omega_0t}e^{-st}\,dt = \frac{1}{s-j\Omega_0},$$
for σ > 0. The Laplace transforms of cos(Ω₀t)u(t) and sin(Ω₀t)u(t) follow from the last relation as
$$\mathcal{L}\{\cos(\Omega_0t)u(t)\} = \frac{1}{2}\mathcal{L}\{e^{j\Omega_0t}u(t)\} + \frac{1}{2}\mathcal{L}\{e^{-j\Omega_0t}u(t)\} = \frac{s}{s^2+\Omega_0^2},$$
$$\mathcal{L}\{\sin(\Omega_0t)u(t)\} = \frac{1}{2j}\mathcal{L}\{e^{j\Omega_0t}u(t)\} - \frac{1}{2j}\mathcal{L}\{e^{-j\Omega_0t}u(t)\} = \frac{\Omega_0}{s^2+\Omega_0^2}.$$
The initial value theorem and the final value theorem for the signal x(t) are
$$x(0^+) = \lim_{s\to\infty}sX(s) \qquad\text{and}\qquad \lim_{t\to\infty}x(t) = \lim_{s\to 0}sX(s),$$
respectively. Both of them follow from (1.79). The requirement is that the Laplace transforms of the signal x(t) and its derivative dx(t)/dt exist. The final value of the signal does not exist if the poles of sX(s) are: (a) on the right side of the s-plane, (b) a pair of conjugate-complex poles on the imaginary axis, or (c) at the origin. The initial value theorem requires that the signal does not contain delta pulses at the origin.
1.6.3 Linear Systems Described by Differential Equations

After we have established the relation between the Laplace transform and the signal derivatives, we may use it to analyze the systems described by differential equations. Consider a causal system described by the differential equation
$$a_N\frac{d^Ny(t)}{dt^N} + \dots + a_1\frac{dy(t)}{dt} + a_0y(t) = b_M\frac{d^Mx(t)}{dt^M} + \dots + b_1\frac{dx(t)}{dt} + b_0x(t), \tag{1.80}$$
with zero initial conditions. The Laplace transform of both sides of this differential equation gives the transfer function
$$H(s) = \frac{Y(s)}{X(s)} = \frac{b_Ms^M + \dots + b_1s + b_0}{a_Ns^N + \dots + a_1s + a_0}. \tag{1.81}$$
Stability and causality. A linear time-invariant system is stable if its impulse response h(t) satisfies the condition $\int_{-\infty}^{\infty}|h(t)|\,dt < \infty$. Within the Laplace transform framework, this condition means that the line σ = 0 in the complex s-plane belongs to the region of convergence of the transfer function H(s).
A system whose impulse response is of the form
$$h(t) = \sum_{n=1}^{N}A_ne^{a_nt}u(t)$$
is causal. Although this is not the most general form of a causal system, it is important in system analysis. The transfer function of this system is given by
$$H(s) = \sum_{n=1}^{N}\frac{A_n}{s-a_n}, \tag{1.82}$$
with the region of convergence defined by the set of inequalities σ > Re{a₁}, σ > Re{a₂}, ..., σ > Re{a_N}. These inequalities can be written in the compact form σ > max_n Re{a_n}. The region of convergence of the causal system in (1.82) is thus the right side of the vertical line σ = max_n Re{a_n}, passing through the pole with the largest real part.
The system defined by (1.81) can be written in the form given by (1.82) if the polynomial order of the denominator is higher than the numerator polynomial order, that is, if N > M. This system is causal and stable if all poles a_n reside in the left side of the complex s-plane, that is, if Re{a_n} < 0 for all n = 1, 2, ..., N, and the region of convergence is defined by
σ > maxn Re{ an }, as illustrated in Fig. 1.10. Possible higher-order (multiple) poles in H (s)
would not change the conclusion about its causality.
Figure 1.10 Poles of a stable and causal system, with its region of convergence, σ > maxn Re{ an }, that includes
the line σ = 0.
Example 1.21. A causal system with a proportional regulator is described with the transfer function,
K
H (s) = ,
s2 + 4s + K
where K is the constant of the regulator. Find the system response to the input signal x (t) = u(t),
for K = 4, K = 3 < 4, and K = 20 > 4.
⋆ The Laplace transform of the input signal is X (s) = L{u(t)} = 1/s. The poles of the transfer
function H (s) are obtained from s2 + 4s + K = 0 as
$$s_{1,2} = -2 \pm \sqrt{4-K}.$$
For K = 3, the poles are s₁ = −1 and s₂ = −3, and the output transform is
$$Y(s) = \frac{3}{s(s+1)(s+3)} = \frac{A}{s} + \frac{B}{s+1} + \frac{C}{s+3} = \frac{1}{s} - \frac{3/2}{s+1} + \frac{1/2}{s+3},$$
and
$$y(t) = \Big(1 - \frac{3}{2}e^{-t} + \frac{1}{2}e^{-3t}\Big)u(t).$$
In this case, the slowest converging term is of the form e^{−t}. The convergence of this term toward the steady state is slower than in the case with K = 4 (overdamped response).
Finally, for K = 20, the output in the Laplace domain and in the time domain is of the form
$$Y(s) = \frac{20}{s(s+2+j4)(s+2-j4)} = \frac{1}{s} - \frac{(2+j)/4}{s+2+j4} - \frac{(2-j)/4}{s+2-j4},$$
and
$$y(t) = \Big(1 - \frac{2+j}{4}e^{-2t(1+2j)} - \frac{2-j}{4}e^{-2t(1-2j)}\Big)u(t) = \Big(1 - e^{-2t}\cos(4t) - \frac{1}{2}e^{-2t}\sin(4t)\Big)u(t).$$
The convergence toward the steady state is governed by the function e^{−2t}, with the oscillatory term sin(4t) (underdamped response that overshoots its final value).
The responses of the system for the three considered cases are shown in Fig. 1.11.
Figure 1.11 The responses of the system for the three considered cases: K = 4 (critically damped), K = 3 (overdamped), and K = 20 (underdamped).
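The three step responses are easily reproduced with scipy.signal; a minimal sketch:

```python
import numpy as np
from scipy import signal

# Step responses of H(s) = K / (s^2 + 4*s + K) from Example 1.21 for the
# critically damped, overdamped, and underdamped cases.
t = np.linspace(0, 5, 1000)
for K in (4, 3, 20):
    sys = signal.TransferFunction([K], [1, 4, K])
    _, y = signal.step(sys, T=t)
    print("K =", K, " y(5) =", round(float(y[-1]), 3))   # all settle near 1
```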
Example 1.22. A system has the transfer function
$$H(s) = \frac{1}{s+2} + \frac{1}{s+1} + \frac{1}{s-1}.$$
Find its impulse response such that the system is stable.
⋆ The poles of this transfer function are s₁ = −2, s₂ = −1, and s₃ = 1. The system is stable if the line σ = 0 belongs to the region of convergence; here, this region of convergence is defined by −1 < σ < 1. For this region of convergence, the impulse response is
$$h(t) = \big(e^{-2t} + e^{-t}\big)u(t) - e^{t}u(-t).$$
Solution of differential equations using the Laplace transform. The output of a linear time-invariant system described by (1.80) can be found by solving the corresponding differential equation. The Laplace transform approach to solving differential equations is of crucial importance in engineering. In general, if the initial conditions are included in (1.80), the corresponding Laplace-domain equation is
$$a_Ns^NY(s) + \dots + a_1sY(s) + a_0Y(s) - \sum_{n=0}^{N}a_n\sum_{m=1}^{n}s^{n-m}y^{(m-1)}(0) = b_Ms^MX(s) + \dots + b_1sX(s) + b_0X(s).$$
The Laplace transform of the solution (output signal) can be written in the form
$$Y(s) = \frac{B(s)}{A(s)}X(s) + \frac{C(s)}{A(s)},$$
where A(s) = a_Ns^N + ⋯ + a₁s + a₀, B(s) = b_Ms^M + ⋯ + b₁s + b₀, and
$$C(s) = \sum_{n=0}^{N}a_n\sum_{m=1}^{n}s^{n-m}y^{(m-1)}(0).$$
The output consists of two parts, Y(s) = Yp(s) + Yh(s), defined as follows. The first part is driven by the input signal,
$$Y_p(s) = \frac{B(s)}{A(s)}X(s),$$
and it is called the forced response (in mathematics, the particular part of the differential equation solution).
The second part,
$$Y_h(s) = \frac{C(s)}{A(s)},$$
is independent of the input signal, and it is called the natural response (in mathematics, the homogeneous part of the solution).
For example, for an output signal with the Laplace transform
$$Y(s) = \frac{3}{s(s+2)} = \frac{A}{s} + \frac{B}{s+2},$$
the coefficients follow from 3 = A(s + 2) + Bs as A = 3/2 and B = −3/2, so that the output signal is
$$y(t) = \frac{3}{2}\big(1-e^{-2t}\big)u(t).$$
Example 1.24. Find the output of the causal system described by the differential equation
$$\frac{d^2y(t)}{dt^2} + 3\frac{dy(t)}{dt} + 2y(t) = x(t),$$
with the input x(t) = e^{−4t}u(t) and the initial conditions y(0) = 0 and y′(0) = 1.
⋆ The Laplace transform of the differential equation is
$$s^2Y(s) - sy(0) - y'(0) + 3sY(s) - 3y(0) + 2Y(s) = X(s),$$
or
$$Y(s)(s^2+3s+2) = X(s) + sy(0) + y'(0) + 3y(0).$$
The Laplace transform of x(t) = e^{−4t}u(t) is X(s) = 1/(s + 4). The Laplace transform of the output signal is equal to
$$Y(s) = \frac{s+5}{(s+4)(s^2+3s+2)} = \frac{A_1}{s+4} + \frac{A_2}{s+2} + \frac{A_3}{s+1}.$$
The coefficients can be calculated as Ai = (s − si)Y(s)|_{s=si}. For example,
$$A_1 = \left.(s+4)\frac{s+5}{(s+4)(s^2+3s+2)}\right|_{s=-4} = \frac{1}{6}.$$
The other two coefficients are A₂ = −3/2 and A₃ = 4/3.
The output signal, y(t), is the inverse Laplace transform of Y(s), that is,
$$y(t) = \frac{1}{6}e^{-4t}u(t) - \frac{3}{2}e^{-2t}u(t) + \frac{4}{3}e^{-t}u(t).$$
Note that $\frac{1}{6}e^{-4t}u(t) = y_p(t)$ is the forced response and $-\frac{3}{2}e^{-2t}u(t) + \frac{4}{3}e^{-t}u(t) = y_h(t)$ is the natural response in the time domain.
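The partial fraction coefficients of Example 1.24 can be cross-checked with scipy.signal.residue; a minimal sketch:

```python
import numpy as np
from scipy import signal

# Partial fraction expansion of Y(s) = (s+5)/((s+4)(s^2+3s+2)),
# expecting A1 = 1/6, A2 = -3/2, A3 = 4/3 at the poles -4, -2, -1.
num = [1, 5]
den = np.polymul([1, 4], [1, 3, 2])
r, p, _ = signal.residue(num, den)
for ri, pi in zip(r, p):
    print("pole", round(pi.real, 1), " residue", round(ri.real, 4))
```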
1.7 Butterworth Filter

The most common processing systems in communications and signal processing are filters, used to selectively pass the part of the input signal within a predefined band in the frequency domain and to reduce possible interferences in this way. The basic filter form is the lowpass filter. Here we will present the simple Butterworth lowpass filter.
The squared frequency response of the Butterworth lowpass filter is defined by
$$|H(j\Omega)|^2 = \frac{1}{1+\left(\dfrac{\Omega}{\Omega_c}\right)^{2N}}.$$
It is shown in Fig. 1.12, for various N. This filter definition contains two parameters. The order of the filter is N. It is a measure of the sharpness of the transition from the passband to the stopband region. For N → ∞, the amplitude form of an ideal lowpass filter is achieved. The second parameter is the critical frequency Ωc. At the frequency Ω equal to the critical frequency, Ω = Ωc, we get
$$|H(j\Omega_c)|^2 = \frac{1}{2}|H(0)|^2 = \frac{1}{2},$$
corresponding to a gain of 10 log(1/2) ≈ −3 dB, for any filter order N.
Since |H(jΩ)|² = H(jΩ)H(−jΩ), we can write
$$H(j\Omega)H(-j\Omega) = \frac{1}{1+\left(\dfrac{j\Omega}{j\Omega_c}\right)^{2N}}$$
or
$$H(s)H(-s) = \frac{1}{1+\left(\dfrac{s}{j\Omega_c}\right)^{2N}} \quad\text{for } s = j\Omega.$$
The poles of the product of the transfer functions, H(s)H(−s), are of the form
$$\left(\frac{s_k}{j\Omega_c}\right)^{2N} = -1 = e^{j(2\pi k+\pi)},$$
$$s_k = \Omega_ce^{j(2\pi k+\pi)/(2N)+j\pi/2} \quad\text{for } k = 0, 1, 2, \dots, 2N-1.$$
The poles of the product H(s)H(−s) of the transfer function H(s) of the Butterworth filter and its reversed version H(−s) are located on the circle whose radius is Ωc, at the positions defined by the phases
$$\alpha_k = \frac{2\pi k+\pi}{2N} + \frac{\pi}{2} \quad\text{for } k = 0, 1, 2, \dots, 2N-1.$$
For a given filter order N and critical frequency Ωc, the only remaining decision is to select the half of the poles sk that belong to H(s) and to declare that the remaining half of the poles belong to H(−s). Since we want the designed filter to be stable and causal, we choose the poles s₀, s₁, ..., s_{N−1} within the left side of the s-plane, where Re{s} < 0, that is, π/2 < αk < 3π/2. The symmetric poles with Re{s} > 0 are the poles of H(−s); they are not used in the filter design.
Figure 1.12 Squared amplitude of the frequency response of a Butterworth filter for various orders N.
Example 1.25. Design a lowpass Butterworth filter with the following filter order, N, and critical
frequency, Ωc ,
(a) N = 3 and Ωc = 1,
(b) N = 4 and Ωc = 3.
⋆ (a) The poles for the third-order filter, N = 3, and critical frequency Ωc = 1, have the phases
$$\alpha_k = \frac{2\pi k+\pi}{6} + \frac{\pi}{2}, \quad\text{for } k = 0, 1, 2.$$
The pole values are
$$s_0 = \cos\Big(\frac{2\pi}{3}\Big) + j\sin\Big(\frac{2\pi}{3}\Big) = -\frac{1}{2}+j\frac{\sqrt{3}}{2},$$
$$s_1 = \cos\Big(\frac{2\pi}{3}+\frac{\pi}{3}\Big) + j\sin\Big(\frac{2\pi}{3}+\frac{\pi}{3}\Big) = -1,$$
$$s_2 = \cos\Big(\frac{2\pi}{3}+\frac{2\pi}{3}\Big) + j\sin\Big(\frac{2\pi}{3}+\frac{2\pi}{3}\Big) = -\frac{1}{2}-j\frac{\sqrt{3}}{2},$$
with the third-order Butterworth filter transfer function
$$H(s) = \frac{1}{\Big(s+\frac{1}{2}-j\frac{\sqrt{3}}{2}\Big)\Big(s+\frac{1}{2}+j\frac{\sqrt{3}}{2}\Big)(s+1)} = \frac{1}{(s^2+s+1)(s+1)}.$$
(b) For N = 4 and Ωc = 3, the phases of the poles are
$$\alpha_k = \frac{2\pi k+\pi}{8} + \frac{\pi}{2}, \quad\text{for } k = 0, 1, 2, 3.$$
Their values are
$$s_0 = 3\cos\Big(\frac{\pi}{2}+\frac{\pi}{8}\Big) + j3\sin\Big(\frac{\pi}{2}+\frac{\pi}{8}\Big), \qquad s_1 = 3\cos\Big(\frac{\pi}{2}+\frac{3\pi}{8}\Big) + j3\sin\Big(\frac{\pi}{2}+\frac{3\pi}{8}\Big),$$
$$s_2 = 3\cos\Big(\frac{\pi}{2}+\frac{5\pi}{8}\Big) + j3\sin\Big(\frac{\pi}{2}+\frac{5\pi}{8}\Big), \qquad s_3 = 3\cos\Big(\frac{\pi}{2}+\frac{7\pi}{8}\Big) + j3\sin\Big(\frac{\pi}{2}+\frac{7\pi}{8}\Big).$$
In practice, we usually do not know the filter order N, but rather the passband frequency Ωp and the stopband frequency Ωs of the filter, with the maximum attenuation ap [dB] in the passband and the minimum attenuation as [dB] in the stopband, as shown in Fig. 1.14. Based on these values, we can calculate the filter order N and the critical frequency Ωc needed for the filter design.
The passband and stopband relations for N and Ωc are
$$\frac{1}{1+\left(\Omega_p/\Omega_c\right)^{2N}} \geq A_p^2 \tag{1.83}$$
$$\frac{1}{1+\left(\Omega_s/\Omega_c\right)^{2N}} \leq A_s^2, \tag{1.84}$$
where Ap and As are the required amplitudes of the frequency response at the respective passband and stopband frequencies, Ωp and Ωs. The relation
$$a = 20\log A$$
connects the attenuation a in [dB] with the amplitude A.
Figure 1.14 Specification of the Butterworth filter parameters in the passband and stopband.
Using the equality in both relations, (1.83) and (1.84), the order N follows as
$$N = \frac{\log\Big[\big(1/A_s^2-1\big)\big/\big(1/A_p^2-1\big)\Big]}{2\log(\Omega_s/\Omega_p)}.$$
The nearest greater integer is adopted for the filter order N. Next, we can use either of the relations (1.83) or (1.84), with the equality sign, to calculate Ωc. If we choose the first
one, then the critical frequency Ωc will satisfy |H(jΩp)|² = Ap², while if we use the second relation, the value of Ωc will satisfy |H(jΩs)|² = As². These two values differ; however, both of them are within the defined criteria for the transfer function passband and stopband.
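This design procedure maps directly onto scipy.signal.buttord and scipy.signal.butter for analog filters; the specification values below are illustrative assumptions, not taken from the text:

```python
import numpy as np
from scipy import signal

# Analog Butterworth design from passband/stopband specifications.
Wp, Ws = 1.0, 2.0        # passband and stopband edge frequencies (rad/s)
ap, As = 1.0, 20.0       # max passband / min stopband attenuation (dB)

N, Wc = signal.buttord(Wp, Ws, ap, As, analog=True)
b, a = signal.butter(N, Wc, btype='low', analog=True)
w, h = signal.freqs(b, a, worN=[Wp, Ws])
print("N =", N, " Wc =", round(Wc, 4))
print("attenuation at Wp, Ws [dB]:", np.round(-20*np.log10(np.abs(h)), 2))
```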
All other filter forms, such as bandpass and highpass, may be obtained from a lowpass filter with appropriate signal modulations. These modulations will be discussed for discrete-time filter forms in Chapter five.
Part II
Deterministic Discrete-Time
Signals and Systems
Chapter 2
Discrete-Time Signals and Transforms
Discrete-time signals (discrete signals) are represented in the form of an ordered set of numbers {x(n)}. Commonly, they are obtained by sampling continuous-time signals. There also exist discrete-time signals whose independent variable is inherently discrete in nature. In the case when a discrete-time signal is obtained by sampling a continuous-time signal, we can write (Fig. 2.1)
$$x(n) = x(t)|_{t=n\Delta t}, \tag{2.1}$$
where Δt is the sampling interval.
Figure 2.1 Signal discretization: continuous-time signal (left) and corresponding discrete-time signal (right).
Discrete-time signals are defined for integer values of the argument n. We will use the same notation for continuous-time and discrete-time signals, x(t) and x(n). However, we hope that this will not cause any confusion, since different sets of variables will be used: for example, t and τ for continuous time and n and m for discrete time. Also, we hope that the context will always make clear what kind of signal is considered. The notation x[n] is sometimes used in the literature for discrete-time signals, instead of x(n).
Figure 2.2 Illustration of discrete-time signals: (a) unit-step function, (b) discrete-time impulse signal, (c) boxcar signal b(n) = u(n + 2) − u(n − 3), and (d) discrete-time sinusoid.

The discrete-time impulse signal is defined as
$$\delta(n) = \begin{cases}1, & n = 0\\ 0, & n \neq 0.\end{cases} \tag{2.2}$$
It is presented in Fig. 2.2. In contrast to the continuous-time impulse signal, which cannot be practically implemented and used, the discrete-time unit impulse is a signal that can easily be implemented and used in realizations. In mathematical notation, this signal corresponds to the Kronecker delta function
$$\delta_{m,n} = \begin{cases}1, & m = n\\ 0, & m \neq n.\end{cases} \tag{2.3}$$
Any discrete-time signal can be written in the form of a sum of shifted and weighted discrete-time impulses,
$$x(n) = \sum_{k=-\infty}^{\infty}x(k)\delta(n-k), \tag{2.4}$$
as illustrated in Fig. 2.3.
Figure 2.3 Decomposition of a signal x(n) into shifted and weighted impulses: x(n), −2δ(n + 2), 3δ(n), and −δ(n − 1).
The discrete-time impulse and unit-step signals are related by
$$\delta(n) = u(n) - u(n-1), \qquad u(n) = \sum_{k=-\infty}^{n}\delta(k).$$
A discrete-time signal x(n) is periodic if
$$x(n+N) = x(n). \tag{2.7}$$
The smallest positive integer N that satisfies this equation is called the period of the discrete-time signal x(n). Note that a signal x(n) with a period N is also periodic in any integer multiple of N. Some basic discrete-time signals are presented in Fig. 2.2.
Example 2.1. Check the periodicity of discrete-time signals x1 (n) = sin(2πn/36), x2 (n) =
cos(4πn/15 + 2), x3 (n) = exp( j0.1n), x4 (n) = x1 (n) + x2 (n), and x5 (n) = x1 (n) + x3 (n).
⋆ The period of the discrete-time signal x1(n) = sin(2πn/36) is obtained from 2πN1/36 = 2πk, where k is an integer. It is N1 = 36, for k = 1. The period N2 follows from 4πN2/15 = 2πk as N2 = 15, with k = 2. The period of the signal x3(n) should be calculated from 0.1N3 = 2πk. Obviously, there is no integer k such that N3 is an integer. This signal is not periodic. The same holds for x5(n). The period of x4(n) is the common period of the signals x1(n) and x2(n), with N1 = 36 and N2 = 15. It is N4 = 180.
A signal is even if
$$x(n) = x(-n).$$
Example 2.2. Show that any signal x(n) can be written as the sum x(n) = xe(n) + xo(n), where xe(n) and xo(n) are its even and odd parts, respectively.
⋆ For a signal x(n), we can form its even and odd parts as
$$x_e(n) = \frac{x(n)+x(-n)}{2} \qquad\text{and}\qquad x_o(n) = \frac{x(n)-x(-n)}{2}.$$
Summing these two parts reconstructs the signal x(n). Note that xo(0) = 0.
A signal is Hermitian if x(n) = x*(−n).
The magnitude of a discrete-time signal is defined as the maximum value of the signal's absolute amplitude,
$$M_x = \max_{-\infty<n<\infty}|x(n)|.$$
The energy of a discrete-time signal is defined by
$$E_x = \sum_{n=-\infty}^{\infty}|x(n)|^2. \tag{2.8}$$
The instantaneous power of x(n) is Px(n) = |x(n)|², while the average signal power is
$$P_{AV} = \lim_{N\to\infty}\frac{1}{2N+1}\sum_{n=-N}^{N}|x(n)|^2 = \left\langle|x(n)|^2\right\rangle, \tag{2.9}$$
where ⟨|x(n)|²⟩ denotes an average over a large number of signal values, as N → ∞.
The average power of signals with a finite energy (energy signals) is PAV = 0. For power
signals (when 0 < PAV < ∞) the energy is infinite, Ex → ∞.
Example 2.3. The energy of signal x (n) is Ex = 10. The energy of its even part is Exe = 3. Find
the energy of its odd part.
⋆Since x(n) = xe(n) + xo(n),
Ex = ∑_{n=−∞}^{∞} |xe(n) + xo(n)|² = ∑_{n=−∞}^{∞} ( |xe(n)|² + |xo(n)|² + xo(n)xe∗(n) + xe(n)xo∗(n) ).
The terms xo(n)xe∗(n) and xe(n)xo∗(n) in the last sum correspond to odd signals, whose sums over all n are zero. For the signals xe(n) and xo(n), satisfying the previous relation, we say that they are orthogonal. Therefore, for the energies Ex, Exe, and Exo,
Ex = Exe + Exo
holds, and the energy of the odd part is Exo = 10 − 3 = 7.
A discrete-time (discrete) system transforms one discrete-time signal (the input) into another (the output signal),
y(n) = T{x(n)}.  (2.10)
A discrete system T{·} is linear if, for any two signals x1(n) and x2(n) and any two constants a1 and a2,
T{a1x1(n) + a2x2(n)} = a1T{x1(n)} + a2T{x2(n)}
holds. A discrete system is time-invariant if, with y(n) = T{x(n)},
T{x(n − n0)} = y(n − n0)
holds for any n0.
For any input signal x (n) the signal at the output of a linear time-invariant discrete
system can be calculated if we know the output to the impulse signal. The output to the
impulse signal, h(n) = T {δ(n)}, is the impulse response.
The output to an input signal x (n) is
y(n) = T{x(n)} = T{ ∑_{k=−∞}^{∞} x(k) δ(n − k) } = ∑_{k=−∞}^{∞} x(k) h(n − k) = x(n) ∗n h(n).
The convolution is commutative,
x(n) ∗n h(n) = h(n) ∗n x(n).  (2.15)
Example 2.4. Calculate discrete-time convolution of signals x (n) and h(n) shown in Fig. 2.4.
Figure 2.4 The signals x(n) and h(n) whose convolution is calculated.
Figure 2.5 The signals x (k ), h(−k), h(1 − k), and h(2 − k ) used for the calculation of the output signal values
y(0), y(1), and y(2).
⋆The values y(0), y(1), and y(2) are obtained by summing the overlapping samples of x(k) and the reversed, shifted h(n − k), Fig. 2.5. In a similar way, y(−2) = 2, y(−1) = −1, y(2) = 6, y(3) = 2, y(4) = −1, y(5) = −1, and y(n) = 0, for all other n. The convolution y(n) is shown in Fig. 2.6.
Figure 2.6 The convolution y(n) = x(n) ∗n h(n).
Example 2.5. Calculate the convolution of signals x (n) = n[u(n) − u(n − 10)] and h(n) = u(n).
⋆Using the fact that u(k) − u(k − 10) = 1 for 0 ≤ k ≤ 9 and u(n − k) = 1 for k ≤ n, we get
y(n) = ∑_{0≤k≤9, k≤n} k = { ∑_{k=0}^{n} k = n(n + 1)/2, for 0 ≤ n ≤ 9;  ∑_{k=0}^{9} k = 45, for n > 9 }
= (n(n + 1)/2) [u(n) − u(n − 10)] + 45 u(n − 10).
Example 2.6. If the response of a linear time-invariant system to the unit-step is y(n) = T{u(n)} = e^{−n}u(n), find the impulse response h(n) of this system.
⋆Since δ(n) = u(n) − u(n − 1), the impulse response is h(n) = y(n) − y(n − 1) = e^{−n}u(n) − e^{−(n−1)}u(n − 1).
A discrete system is causal if there is no response before the input signal appears. For
causal linear time-invariant discrete systems h(n) = 0 for n < 0 holds. For a signal that may
be an impulse response of a causal system we say that it is a causal signal or one-sided signal.
A discrete system is stable if any input signal with a finite magnitude, Mx = max_{−∞<n<∞} |x(n)|, produces the output y(n) whose values are finite, |y(n)| < ∞. A
discrete linear time-invariant system is stable if
∑_{m=−∞}^{∞} |h(m)| < ∞.  (2.16)
Indeed, |y(n)| = |∑_{m} h(m)x(n − m)| ≤ Mx ∑_{m} |h(m)|, so |y(n)| < ∞ if (2.16) holds. It can be shown that the absolute summability of the impulse response h(n) is also a necessary condition for a linear time-invariant discrete system to be stable.
The Fourier transform of a discrete-time signal x(n) is defined by X(e^{jω}) = ∑_{n=−∞}^{∞} x(n)e^{−jωn}. Notation X(e^{jω}) is used to emphasize the fact that it is a periodic function of the normalized frequency ω. The period is 2π.
In order to establish the relation between the Fourier transform of discrete-time signals
and the Fourier transform of continuous-time signals,
X(Ω) = ∫_{−∞}^{∞} x(t) e^{−jΩt} dt,
multiply X(e^{jω}) by e^{jωm} and integrate over the basic period,
∑_{n=−∞}^{∞} x(n) ∫_{−π}^{π} e^{−jω(n−m)} dω = ∫_{−π}^{π} X(e^{jω}) e^{jωm} dω.
Since
∫_{−π}^{π} e^{−jω(n−m)} dω = 2π δ(n − m),
the inversion formula follows,
x(n) = (1/(2π)) ∫_{−π}^{π} X(e^{jω}) e^{jωn} dω.  (2.21)
Example 2.7. Find the Fourier transform of the two-sided exponential discrete-time signal x(n) = Ae^{−α|n|}, with α > 0.
⋆Summing the two geometric series, for n ≥ 0 and for n < 0, gives
X(e^{jω}) = A (1 − e^{−2α}) / (1 − 2e^{−α} cos(ω) + e^{−2α}).  (2.22)
Example 2.8. Find the inverse Fourier transform of a discrete-time signal if X(e^{jω}) = 2πδ(ω) for −π ≤ ω < π and X(e^{jω}) = 2π ∑_{k=−∞}^{∞} δ(ω + 2kπ) for any ω.
⋆By definition
x(n) = (1/(2π)) ∫_{−π}^{π} 2πδ(ω) e^{jωn} dω = 1.
Therefore, the Fourier transform of signal x (n) = 1 is
∑_{n=−∞}^{∞} e^{−jωn} = 2π ∑_{k=−∞}^{∞} δ(ω + 2kπ).  (2.23)
The equivalent form in the continuous-time domain is obtained (by using ω = ΩT and
δ( TΩ) = δ(Ω)/T) as
∑_{n=−∞}^{∞} e^{jΩnT} = 2π ∑_{k=−∞}^{∞} δ(ΩT + 2kπ) = (2π/T) ∑_{k=−∞}^{∞} δ(Ω + 2kπ/T).  (2.24)
2.3.1 Properties
Linearity: For any constants a1 and a2,
FT{a1 x(n) + a2 y(n)} = a1 X(e^{jω}) + a2 Y(e^{jω}),
where X(e^{jω}) and Y(e^{jω}) are the Fourier transforms of the discrete-time signals x(n) and y(n), respectively.
Shift and modulation: With respect to the signal shift and modulation the Fourier transform
of discrete-time signals behaves in the same way as the Fourier transform of continuous-time
signals,
FT{ x (n − n0 )} = X (e jω )e− jn0 ω (2.26)
and
FT{ x (n)e jω0 n } = X (e j(ω −ω0 ) ). (2.27)
Example 2.9. The Fourier transform of a discrete-time signal x (n) is X (e jω ). Find the Fourier
transform of y(n) = x (2n).
Example 2.10. Calculate the Fourier transform of the discrete-time signal (rectangular window),
w R ( n ) = u ( N + n ) − u ( n − N − 1). (2.29)
⋆By definition,
WR(e^{jω}) = ∑_{n=−N}^{N} e^{−jωn} = e^{jωN} (1 − e^{−jω(2N+1)})/(1 − e^{−jω}) = sin(ω(2N + 1)/2)/sin(ω/2).  (2.30)
As the window width increases in the time domain, the main lobe width in the Fourier domain narrows. The first zero value of the Fourier transform of a rectangular window is at ω(2N + 1)/2 = π, that is, at ω = 2π/(2N + 1), where 2N + 1 is the signal duration. In the case of a Hann(ing) window the main lobe is wider than for the rectangular window of the same width, but its convergence is much faster, with strongly reduced oscillations in the Fourier transform, Fig. 2.7.

Figure 2.7 Discrete-time signal in the form of a rectangular window of the widths 2N + 1 = 9 and 2N + 1 = 17 samples (top and middle), and a Hann(ing) window with 2N + 1 = 17 (bottom). The time-domain values are on the left, while the Fourier transforms of these discrete-time signals are on the right.
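The behavior shown in Fig. 2.7 can be reproduced numerically, for example, by evaluating the window transforms on a dense frequency grid with a zero-padded FFT; a minimal sketch, assuming Python with numpy:

# Rectangular and Hann(ing) windows of the same width 2N + 1 = 17: the
# Hann(ing) main lobe is about twice as wide, with much lower sidelobes.
import numpy as np

N = 8
n = np.arange(-N, N + 1)
w_rect = np.ones(2 * N + 1)
w_hann = 0.5 * (1 + np.cos(n * np.pi / N))

def spectrum(w, nfft=4096):
    # zero-padded FFT as a dense sampling of W(e^{jw})
    return np.abs(np.fft.fftshift(np.fft.fft(w, nfft)))

W_rect, W_hann = spectrum(w_rect), spectrum(w_hann)
print(2 * np.pi / (2 * N + 1))       # first zero of the rectangular window
print(W_rect.max(), W_hann.max())    # 17.0 and 8.0 (the window sums)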
Example 2.11. Find the output of a discrete linear time-invariant system with frequency response
H (e jω ) if the input signals are:
(a) x (n) = Ae jω0 n and
(b) x (n) = A cos(ω0 n + ϕ). What is the output if the impulse response h(n) is real-valued?
⋆(a) For the input x(n) = Ae^{jω0 n}, the output is y(n) = ∑_{k} h(k) A e^{jω0(n−k)} = A H(e^{jω0}) e^{jω0 n}.
(b) For a real-valued impulse response h(n),
H(e^{jω}) = H∗(e^{−jω})
holds, with the even amplitude and odd phase of the transfer function,
|H(e^{jω})| = |H(e^{−jω})|
arg{H(e^{jω})} = −arctan( ∑_{n=−∞}^{∞} h(n) sin(ωn) / ∑_{n=−∞}^{∞} h(n) cos(ωn) ) = −arg{H(e^{−jω})}.
The output signal for a real-valued impulse response and x (n) = A cos(ω0 n + ϕ) is of the form
y(n) = A |H(e^{jω0})| cos(ω0 n + ϕ + arg{H(e^{jω0})}).
h(n) = cos(πn)/n = (−1)^n/n
for n ≠ 0, and h(n) = 0 for n = 0. Using samples n = ±1, ±2, . . . , ±N, the approximation of the frequency response is
HN(e^{jω}) = ∑_{n=−N}^{N} h(n) e^{−jωn} = 2j ∑_{n=1}^{N} (−1)^{n−1} sin(ωn)/n.
Product of signals: The Fourier transform of a product of the two discrete-time signals x (n)
and h(n) is equal to the convolution of their Fourier transforms in the frequency domain,
FT{x(n)h(n)} = (1/(2π)) ∫_{−π}^{π} X(e^{jθ}) H(e^{j(ω−θ)}) dθ = X(e^{jω}) ∗ω H(e^{jω}).  (2.34)
Parseval's theorem: For two signals x(n) and y(n),
∑_{n=−∞}^{∞} x(n) y∗(n) = ∑_{n=−∞}^{∞} (1/(2π)) ∫_{−π}^{π} X(e^{jω}) e^{jωn} dω y∗(n)  (2.35)
= (1/(2π)) ∫_{−π}^{π} X(e^{jω}) ( ∑_{n=−∞}^{∞} (e^{−jωn} y(n))∗ ) dω = (1/(2π)) ∫_{−π}^{π} X(e^{jω}) Y∗(e^{jω}) dω.
For y(n) = x(n), it reduces to
∑_{n=−∞}^{∞} |x(n)|² = (1/(2π)) ∫_{−π}^{π} |X(e^{jω})|² dω = Ex.
Function |X(e^{jω})|² is the spectral energy density of signal x(n).
Since the average power of a signal x (n) is defined by
PAV = lim_{N→∞} (1/(2N + 1)) ∑_{n=−N}^{N} |x(n)|²,  (2.36)
the power spectral density of x(n) can be defined as
Pxx(e^{jω}) = lim_{N→∞} (1/(2N + 1)) |XN(e^{jω})|²,  (2.37)
where XN(e^{jω}) is the Fourier transform of the signal truncated to −N ≤ n ≤ N, that is,
Pxx(e^{jω}) = lim_{N→∞} (1/(2N + 1)) ∑_{n=−N}^{N} ∑_{m=−N}^{N} x(n) x∗(m) e^{−jω(n−m)}.  (2.38)
With r(k) denoting the signal autocorrelation at the lag k = n − m, the power spectral density can be written as
Pxx(e^{jω}) = lim_{N→∞} (1/(2N + 1)) ∑_{k=−2N}^{2N} (2N + 1 − |k|) r(k) e^{−jωk},
since the value r(0), for n − m = k = 0, appears 2N + 1 times along the diagonal in the (n, m) domain in (2.38). The value for n − m = k = ±1 appears 2N times, and so on. In general, the value r(k), for n − m = k, appears 2N + 1 − |k| times in the double summation (2.38). Note that Pxx(e^{jω}), in this case, is the Fourier transform of r(k) multiplied by the Bartlett window 1 − |k|/(2N + 1).
Consider a continuous-time signal, x(t), whose Fourier transform X(Ω) is nonzero only within a limited frequency band |Ω| ≤ Ωm, that is, X(Ω) = 0 for |Ω| > Ωm. The signal x(t) can be reconstructed at any t from the discrete-time samples, x(n∆t), acquired at the instants t = n∆t with the sampling interval ∆t such that
∆t < π/Ωm = 1/(2 fm),
where Ωm = 2π fm. The corresponding discrete-time signal is formed as
x(n) = x(n∆t)∆t.
Now we will prove this fundamental statement in the analog-to-digital signal conversion.
Since a limited duration of X (Ω) is assumed, we can make its periodic extension
Xp(Ω) = ∑_{m=−∞}^{∞} X(Ω + 2Ω0 m)  (2.40)
with the period in the frequency domain equal to 2Ω0 . For the reconstruction of the original
aperiodic Fourier transform, X (Ω), from this periodic extension, X p (Ω), it is of crucial
importance that the basic period of Xp(Ω) contains the undisturbed X(Ω), that is,
Xp(Ω) = X(Ω) for −Ω0 < Ω < Ω0.
This condition is satisfied if the extension period 2Ω0 and the maximum frequency in the
Fourier transform of the signal, Ωm , satisfy the inequality Ω0 > Ωm . In this case, it is
possible to make transition from X (Ω) to X p (Ω), and back, without losing any information.
Of course, that would not be the case if Ω0 > Ωm did not hold. In a periodic extension of X(Ω) with Ω0 ≤ Ωm, overlapping (aliasing) would occur in Xp(Ω), and the extension would not be reversible. A periodic extension of the Fourier transform with Ω0 > Ωm is illustrated
in Fig. 2.8.
The periodic function X p (Ω) can be expanded into the Fourier series with coefficients
X−n = (1/(2Ω0)) ∫_{−Ω0}^{Ω0} Xp(Ω) e^{jπΩn/Ω0} dΩ = (1/(2Ω0)) ∫_{−∞}^{∞} X(Ω) e^{jπΩn/Ω0} dΩ.  (2.41)
The integration limits are extended to the infinity since X (Ω) = X p (Ω) within the basic
period interval and X (Ω) = 0 outside this interval.
Figure 2.8 The Fourier transform of a signal, X (Ω), such that X (Ω) = 0 for |Ω| > Ωm (top) and its periodically
extended version, X p (Ω), with the period 2Ω0 > 2Ωm (bottom).
Since the inverse Fourier transform is
x(t) = (1/(2π)) ∫_{−∞}^{∞} X(Ω) e^{jΩt} dΩ,  (2.42)
comparing (2.41) with (2.42) gives
X−n = (π/Ω0) x(t)|_{t=πn/Ω0} = x(n∆t)∆t, with ∆t = π/Ω0,
meaning that the Fourier series coefficients of the periodically extended Fourier transform of
X (Ω) are the samples of the signal x (t), acquired at the instants t = n∆t, with the sampling
interval ∆t = π/Ω0 . Therefore, the samples x (n∆t) of the signal x (t) and the periodically
extended Fourier transform, X p (Ω) are the Fourier series pair
x(n∆t)∆t = X−n ←→ Xp(Ω) = ∑_{m=−∞}^{∞} X(Ω + 2Ω0 m)  (2.43)
with ∆t = π/Ω0 .
The reconstruction formula for x(t) from the samples x(n∆t) then follows from
x(t) = (1/(2π)) ∫_{−∞}^{∞} X(Ω) e^{jΩt} dΩ = (1/(2π)) ∫_{−Ω0}^{Ω0} Xp(Ω) e^{jΩt} dΩ.  (2.44)
The periodic Fourier transform X p (Ω) is now expanded into Fourier series to produce
x(t) = (1/(2π)) ∫_{−Ω0}^{Ω0} ( ∑_{n=−∞}^{∞} Xn e^{jπnΩ/Ω0} ) e^{jΩt} dΩ.  (2.45)
Using Xn = x(−n∆t)∆t, we get
x(t) = (1/(2π)) ∫_{−Ω0}^{Ω0} ( ∑_{n=−∞}^{∞} x(−n∆t)∆t e^{jπnΩ/Ω0} ) e^{jΩt} dΩ.  (2.46)
The reconstruction formula
x(t) = ∑_{n=−∞}^{∞} x(n∆t) sin(Ω0(t − n∆t)) / (Ω0(t − n∆t))  (2.47)
follows by evaluating the simple integral over Ω. In this way, the signal x(t) is expressed, for any t, in terms of its samples x(n∆t).
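A numerical illustration of the reconstruction formula (2.47), assuming Python with numpy; the signal and the truncation of the infinite sum are chosen arbitrarily for the illustration:

# Band-limited signal, sampled above the Nyquist rate, rebuilt between
# the samples from a truncated sum of sin(x)/x terms, as in (2.47).
import numpy as np

f_m = 3.0                            # highest frequency, Omega_m = 2*pi*f_m
dt = 1 / (2.5 * f_m)                 # dt < 1/(2*f_m)
n = np.arange(-200, 201)             # truncation of the infinite sum

x = lambda t: np.cos(2 * np.pi * 3.0 * t + 0.3) + np.sin(2 * np.pi * 1.2 * t)

t = np.linspace(-1, 1, 7)            # points between the samples
x_rec = np.sum(x(n * dt) * np.sinc(t[:, None] / dt - n), axis=1)
print(np.max(np.abs(x_rec - x(t))))  # small (the sinc terms decay slowly)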
Example 2.13. The sampling theorem and relation (2.47) can be used to prove that X (Ω) = X (e jω )
with Ω∆t = ω for |ω | < π for the signals x (t) sampled at the rate which satisfies the sampling
theorem.
⋆The signal x(t), which satisfies the sampling theorem, can be written in terms of its samples, according to (2.46), as
x(t) = (1/(2π)) ∫_{−Ω0}^{Ω0} ( ∑_{n=−∞}^{∞} x(n∆t)∆t e^{−j∆tnθ} ) e^{jθt} dθ.
The Fourier transform of x(t) is then
X(Ω) = ∫_{−∞}^{∞} (1/(2π)) ∫_{−Ω0}^{Ω0} ( ∑_{n=−∞}^{∞} x(n∆t)∆t e^{−j∆tnθ} ) e^{jθt} dθ e^{−jΩt} dt
= ∑_{n=−∞}^{∞} x(n∆t)∆t ∫_{−Ω0}^{Ω0} δ(θ − Ω) e^{−j∆tnθ} dθ = ∑_{n=−∞}^{∞} x(n∆t)∆t e^{−j∆tnΩ} for |Ω| < Ω0,  (2.48)
resulting in
X(Ω) = ∑_{n=−∞}^{∞} x(n) e^{−jωn} = X(e^{jω}) for |ω| < π,
with ω = Ω∆t and x(n) = x(n∆t)∆t.
Example 2.14. If the highest frequency in a signal x(t) is Ωm1 and the highest frequency in a signal y(t) is Ωm2, what should be the sampling interval for the signals x(t)y(t) and x(t − t1)y∗(t − t2)? The highest frequency Ωm in a signal is used in the sense that the Fourier transform of the signal is zero for |Ω| > Ωm.
⋆The Fourier transform of a product of two signals is proportional to the convolution of their Fourier transforms, so the highest frequency in x(t)y(t) is Ωm1 + Ωm2, requiring ∆t < π/(Ωm1 + Ωm2). Shifts and conjugation do not increase the width of the Fourier transform, so the same sampling interval applies to x(t − t1)y∗(t − t2).
Consider now the signal x(t) = exp(−|t|), whose Fourier transform is X(Ω) = 2/(1 + Ω²) and which is not strictly band-limited. After sampling the signal in the time domain with the sampling interval ∆t = 0.1, the Fourier transform is periodically extended with the period 2Ω0 = 2π/∆t = 20π.
(a) The periodic Fourier transform is
Xp(Ω) = · · · + 2/(1 + (Ω + 20π)²) + 2/(1 + Ω²) + 2/(1 + (Ω − 20π)²) + · · ·
The value of Xp(Ω) at the period ending points ±10π will approximately be equal to Xp(±10π) ≅ 2/(1 + 100π²) ≅ 0.002. By comparing this value with the maximum Fourier transform value X(0) = 2, we can conclude that the expected error due to the discretization of this signal (since it does not strictly satisfy the sampling theorem) will be of the order of 0.1%.
(b) The discrete-time signal obtained by sampling x (t) = exp(− |t|) with ∆t = 0.1 is
x (n) = 0.1e−0.1|n| . Its Fourier transform is already calculated with A = 0.1 and α = 0.1 in
equation (2.22). The result is
X(e^{jω}) = 0.1 (1 − e^{−0.2}) / (1 − 2e^{−0.1} cos(ω) + e^{−0.2}).  (2.50)
Therefore, the exact value of the infinite sum in Xp(Ω) is X(e^{jω}) with ω = Ω∆t = 0.1Ω,
Xp(Ω) = ∑_{k=−∞}^{∞} 2/(1 + (Ω + 20kπ)²) = 0.1 (1 − e^{−0.2}) / (1 − 2e^{−0.1} cos(0.1Ω) + e^{−0.2}).
In this way, we have solved an interesting mathematical problem of finding a sum of an infinite
series.
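This sum of the infinite series is easily checked numerically; a minimal sketch, assuming Python with numpy:

# Truncated series versus the closed form for Xp(Omega).
import numpy as np

Omega = np.linspace(-10 * np.pi, 10 * np.pi, 11)
k = np.arange(-20000, 20001)
lhs = np.sum(2 / (1 + (Omega[:, None] + 20 * np.pi * k) ** 2), axis=1)
rhs = 0.1 * (1 - np.exp(-0.2)) / (1 - 2 * np.exp(-0.1) * np.cos(0.1 * Omega) + np.exp(-0.2))
print(np.max(np.abs(lhs - rhs)))     # ~1e-7, limited by the truncation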
For Ω = 0, the original value of the Fourier transform is X(0) = 2, while the value obtained from the discretized signal is Xp(0) = 0.1(1 + e^{−0.1})/(1 − e^{−0.1}) = 2.00167. The increase of 0.00167 is due to the overlapping of the periods. This overlapping produces the aliasing error in X(0). The value of the error corresponds to our previous conclusion of about a 0.1% error order.
is sampled with the sampling interval ∆t = 1/100 and a discrete-time signal x (n) = x (n∆t)∆t
is formed. The discrete-time signal is processed using the system whose impulse response is
h(n) = (1/2)δ(n) + (1/4)δ(n − 2) + (1/4)δ(n + 2).
Find the output signal y(n) and the corresponding continuous-time signal y a (t).
⋆The frequency response of the system is H(e^{jω}) = 1/2 + (1/2) cos(2ω). Its values at the frequencies of nonzero values in X(e^{jω}), within the basic period −π ≤ ω < π, are H(e^{±jπ/4}) = 1/2 and H(e^{±jπ/2}) = 0. Therefore, the Fourier transform of the output signal is
Y(e^{jω}) = H(e^{jω}) X(e^{jω}) = (π/200)[δ(ω + π/4)e^{−jπ/4} + δ(ω − π/4)e^{jπ/4}] for −π ≤ ω < π.
The output discrete-time signal is
y(n) = (1/2) cos(nπ/4 + π/4) ∆t.
The corresponding continuous-time output signal is given by
y(t) = (1/2) cos((π/(4∆t)) t + π/4) = (1/2) cos(25πt + π/4).
Hint: Find the output signal for the same input and h(n) = ∑_{i=−2}^{2} δ(n − i).
2.5 PROBLEMS
Problem 2.1. Check the periodicity and find the period of signals:
(a) x (n) = sin(2πn/32),
(b) x (n) = cos(9πn/82),
(c) x (n) = e jn/32 , and
(d) x (n) = sin(πn/5) + cos(5πn/6) − sin(πn/4).
Problem 2.2. Check the linearity and time-invariance of the discrete system described by
equation
y(n) = x (n) + 2.
Problem 2.3. The output of a linear time-invariant discrete system to the input signal
x (n) = u(n) is y(n) = 2−n u(n). Find the impulse response h(n). Is the system stable?
Problem 2.4. Find the convolution
Figure 2.9 Problem 2.7, impulse response h(n) (left) and Problem 2.14, discrete signal x (n) (right).
Problem 2.7. The impulse response h(n) of a system is shown in Fig. 2.9 (left). Find h1(n) and y(n) = h(n) ∗n x(n), with x(n) = δ(n) − δ(n − 1).
Problem 2.8. Find the output of a discrete system whose impulse response is h(n) = ne^{−n/2}u(n).
Problem 2.10. In order to design a system whose output will produce an approximation of the input signal derivative, we may use a system with the impulse response h(n) = a[δ(n + 1) − δ(n − 1)] + b[δ(n + 2) − δ(n − 2)], such that
H(e^{jω}) ≅ jω for small ω,
that is,
dH(e^{jω})/dω|_{ω=0} = j  and  d²H(e^{jω})/dω²|_{ω=0} = 0.
Problem 2.11. Find the Fourier transform of the following discrete-time signal (triangular
window)
wT(n) = (1 − |n|/(N + 1)) [u(n + N) − u(n − N − 1)],
with N being an even number.
Problem 2.12. Calculate the integral
I = (1/(2π)) ∫_{−π}^{π} sin²((N + 1)ω/2) / sin²(ω/2) dω.
Problem 2.13. A window is formed as
w(n) = wH(n + N) + wH(n) + wH(n − N),
where wH(n) is the Hann(ing) window
wH(n) = (1/2)[1 + cos(nπ/N)] [u(N + n) − u(n − N − 1)].
Plot the window w(n) and express its Fourier transform as a function of the Fourier transform
of the Hann(ing) window WH (e jω ). Generalize the results for
w(n) = ∑_{k=−K}^{K} wH(n + kN).
Problem 2.14. A discrete-time signal x (n) is shown in Fig. 2.9 (right). Without calculating
its Fourier transform X (e jω ) find
X(e^{j0}), X(e^{jπ}), ∫_{−π}^{π} X(e^{jω}) dω, ∫_{−π}^{π} |X(e^{jω})|² dω,
and a signal whose Fourier transform is the real part of X (e jω ), denoted by Re{ X (e jω )}.
Problem 2.15. Find the Fourier transform of the signal x(n) = e^{−n/4}u(n). Using this Fourier transform, find the center of gravity of the signal, defined by
ng = ∑_{n=−∞}^{∞} n x(n) / ∑_{n=−∞}^{∞} x(n).
Problem 2.16. Three discrete systems have the impulse responses:
(a) h(n) = sin(nπ/3)/(nπ), with h(0) = 1/3,
(b) h(n) = sin²(nπ/3)/(nπ)², with h(0) = 1/9,
(c) h(n) = sin((n − 2)π/4)/((n − 2)π), with h(2) = 1/4.
Show that the frequency response of the system with the impulse response h(n) =
sin(nπ/3)/nπ is H (e jω ) = 1 for |ω | ≤ π/3 and H (e jω ) = 0 for π/3 < |ω | < π. Find
the frequency responses in the other two cases, (b) and (c). Find the output of these three systems
to the input signal x (n) = sin(nπ/6).
Problem 2.18. An analytic part x a (n) of a discrete-time signal x (n) is defined in the
frequency domain by
Xa(e^{jω}) = { 2X(e^{jω}), for 0 < ω < π;  X(e^{jω}), for ω = 0;  0, for −π ≤ ω < 0 }.
Show that xa(n) = x(n) + jxh(n), where xh(n) is the Hilbert transform of x(n). Find the impulse response of the system that transforms a signal x(n) into its Hilbert transform (Hilbert transformer).
Problem 2.19. The Fourier transform of a continuous signal x (t) is nonzero only within
3Ω1 < Ω < 5Ω1 . Find the maximum possible sampling interval ∆t such that the signal can
be reconstructed based on the samples x (n∆t).
Problem 2.20. For a signal whose Fourier transform is zero-valued for frequencies
|Ω| ≥ Ωm = 2π f m = π/∆t show that
x(t) = ∫_{−∞}^{∞} x(τ) sin(π(t − τ)/∆t) / (π(t − τ)) dτ.
2.6 EXERCISE
Exercise 2.1. Calculate the convolution of the signals x (n) = n[u(n) − u(n − 3)] and
h(n) = δ(n + 1) + 2δ(n) − δ(n − 2).
Exercise 2.2. Find the convolution of the signals x(n) = e^{−|n|} and h(n) = u(3 − n)u(3 + n).
Exercise 2.3. The output of a linear time-invariant discrete system to the input signal x(n) = u(n) is y(n) = (1/3^n + n)u(n). Find the impulse response h(n). Is the system stable?
Exercise 2.4. For the signal x(n) = 2n u(5 − n)u(n + 5) find the values of X(e^{j0}), X(e^{jπ}), ∫_{−π}^{π} X(e^{jω}) dω, and ∫_{−π}^{π} |X(e^{jω})|² dω without the Fourier transform calculation. Check the results by calculating the Fourier transform.
Exercise 2.5. For a signal x (n) at an instant m a signal y(n) = x (m − n) x ∗ (m + n)
is formed. Show that the Fourier transform of y(n) is real-valued. What is the Fourier
transform of y(n) if x(n) = A exp(jan²/4 + j2ω0n)? Find the Fourier transform of
z(m) = x (m − n) x ∗ (m + n) for a given n.
Note: The Fourier transform of y(n) is the Wigner distribution of x(n) for a given m, while the Fourier transform of z(m) is the ambiguity function of x(n) for a given n.
Exercise 2.6. For a signal x (n) with the Fourier transform X (e jω ) find the Fourier transform
of x(2n). Find the Fourier transform of y1(n) defined by y1(2n) = x(2n) and y1(2n + 1) = 0. What is the Fourier transform of x(2n + 1), and what is the Fourier transform of the signal y2(n) defined by y2(2n) = 0 and y2(2n + 1) = x(2n + 1)? Check the result by showing that Y1(e^{jω}) + Y2(e^{jω}) = X(e^{jω}).
Exercise 2.7. For a real-valued signal x (n) find the relation between its Fourier transform
X (e jω ) and the corresponding Hartley transform
H(e^{jω}) = ∑_{n=−∞}^{∞} x(n)[cos(ωn) + sin(ωn)].
Write this relation if the signal is real-valued and even, that is, x (n) = x (−n).
Exercise 2.8. Systems with impulse responses h1(n), h2(n), and h3(n) are connected in cascade. The impulse responses h2(n) = h3(n) = u(n) − u(n − 2) are known, and the resulting impulse response is h(n) = δ(n) + 5δ(n − 1) + 10δ(n − 2) + 11δ(n − 3) + 8δ(n − 4) + 4δ(n − 5) + δ(n − 6). Find the impulse response h1(n).
Exercise 2.9. Continuous-time signal x (t) = sin(100πt) + cos(180πt) + sin(200πt +
π/4) is sampled with the sampling interval ∆t = 1/125 and used as an input to the system
with the transfer function H (e jω ) = 1 for |ω | < 3π/4 and H (e jω ) = 0 for |ω | ≥ 3π/4.
What is the discrete-time output of this system? What is the corresponding continuous-time
output signal? What should be the sampling interval so that the continuous-time output signal
y(t) is equal to the input signal x (t)?
2.7 SOLUTIONS
Solution 2.1. (a) The signal shifted for N is given by x (n + N ) = sin(2π (n + N )/32).
The equality x (n + N ) = x (n) holds for 2πN/32 = 2kπ, k = 1, 2, . . . . The smallest
integer N satisfying the previous condition is N = 32, with k = 1. The period of this signal
is N = 32.
(b) For the signal x (n) = cos(9πn/82), the equality x (n) = x (n + N ) = cos(9πn/82 +
9πN/82) holds for 9πN/82 = 2kπ, k = 1, 2, . . . . The period follows from N = 164k/9
and it is equal to N = 164 for k = 9.
(c) In this case x (n + N ) = e j(n/32+ N/32) . The relation N/32 = 2kπ, k = 1, 2, . . . ,
produces N = 64kπ. This is not an integer for any k, meaning that the signal x (n) is not
periodic.
(d) The periods of the signal components are obtained from N1 = 10k, N2 = 12k/5,
and N3 = 8k. The smallest value of N when N1 = N2 = N3 = N is N = 120 containing 12
periods of sin(πn/5), 50 periods of cos(5πn/6), and 15 periods of sin(πn/4).
Solution 2.2. In order to establish if the linearity property holds, we have to check the system output to a linear combination of the input signals x1(n) and x2(n),
T{a1x1(n) + a2x2(n)} = a1x1(n) + a2x2(n) + 2 ≠ a1T{x1(n)} + a2T{x2(n)} = a1x1(n) + a2x2(n) + 2(a1 + a2),
so the system is not linear. The system is time-invariant, since
T{x(n − N)} = x(n − N) + 2 = y(n − N).
Solution 2.3. The impulse response is defined by h(n) = T{δ(n)}. Since δ(n) = u(n) − u(n − 1), it can be written as h(n) = 2^{−n}u(n) − 2^{−(n−1)}u(n − 1). The system is stable, since ∑_n |h(n)| < ∞.
Solution 2.4. For the signal x(n) = u(n) − u(n − 5), the values of the convolution y(n) = x(n) ∗n x(n) are
y(0) = ∑_{k=−∞}^{∞} x(k)x(−k) = x(0)x(0) = 1,
y(1) = ∑_{k=−∞}^{∞} x(k)x(1 − k) = x(0)x(1) + x(1)x(0) = 2,
y(−1) = ∑_{k=−∞}^{∞} x(k)x(−1 − k) = 0,
y(2) = ∑_{k=−∞}^{∞} x(k)x(2 − k) = 3,
and so on.
The calculation process is illustrated in Fig. 2.10, along with the final result y(n).
Figure 2.10 Illustration of the convolution calculation for a discrete-time signal x (n) = u(n) − u(n − 5).
Solution 2.5. The convolution of the signals x(n) = e^{−|n|} and h(n) = u(n + 5) − u(n − 6) is
y(n) = x(n) ∗n h(n) = ∑_{k=−∞}^{∞} x(k)h(n − k)  (2.51)
= ∑_{k=−∞}^{∞} e^{−|k|} (u((n − k) + 5) − u((n − k) − 6)).
With
u((n − k) + 5) = { 1, for k ≤ n + 5;  0, for k > n + 5 }  and  u((n − k) − 6) = { 1, for k ≤ n − 6;  0, for k > n − 6 },
we get
u((n − k) + 5) − u((n − k) − 6) = { 1, for n − 6 < k ≤ n + 5;  0, elsewhere }.
Therefore,
y(n) = ∑_{k=n−5}^{n+5} e^{−|k|}.
Since |k| = k, for k ≥ 0, and |k| = −k, for k < 0, we have three cases:
(1) For n + 5 ≤ 0, that is, n ≤ −5, we have k ≤ 0 for all terms. Therefore |k| = −k, and
y(n) = ∑_{k=n−5}^{n+5} e^{k} = e^{n−5}(1 − e^{11})/(1 − e) = e^{n}(e^{−5} − e^{6})/(1 − e) = e^{n}(e^{−5.5} − e^{5.5})/(e^{−0.5} − e^{0.5}) = e^{n} sinh(5.5)/sinh(0.5).
(2) For n − 5 ≥ 0, the lowest index k = n − 5 is greater than or equal to 0. Then k ≥ 0 for all terms and
y(n) = ∑_{k=n−5}^{n+5} e^{−k} = e^{−n+5}(1 − e^{−11})/(1 − e^{−1}) = e^{−n}(e^{5.5} − e^{−5.5})/(e^{0.5} − e^{−0.5}) = e^{−n} sinh(5.5)/sinh(0.5).
(3) For −5 < n < 5, the index k takes both positive and negative values. The convolution is split into two sums as
y(n) = ∑_{k=n−5}^{n+5} e^{−|k|} = ∑_{k=n−5}^{−1} e^{k} + ∑_{k=0}^{n+5} e^{−k} = ∑_{k=1}^{5−n} e^{−k} + ∑_{k=0}^{n+5} e^{−k}
= e^{−1}(1 − e^{−(5−n)})/(1 − e^{−1}) + (1 − e^{−(n+6)})/(1 − e^{−1})
= (cosh(1/2) − e^{−11/2} cosh(n)) / sinh(1/2).
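The three-case closed form can be verified against a direct summation; a minimal sketch, assuming Python with numpy (for x(n) = e^{−|n|} and h(n) = u(n + 5) − u(n − 6), as above):

# y(n) = sum_{k=n-5}^{n+5} exp(-|k|), direct sum versus closed form.
import numpy as np

def y_direct(n):
    k = np.arange(n - 5, n + 6)
    return np.sum(np.exp(-np.abs(k)))

def y_closed(n):
    if n <= -5:
        return np.exp(n) * np.sinh(5.5) / np.sinh(0.5)
    if n >= 5:
        return np.exp(-n) * np.sinh(5.5) / np.sinh(0.5)
    return (np.cosh(0.5) - np.exp(-5.5) * np.cosh(n)) / np.sinh(0.5)

print(all(np.isclose(y_direct(n), y_closed(n)) for n in range(-12, 13)))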
Solution 2.6. (a) For a parallel connection of systems the output signal is given by
y(n) = y1(n) + y2(n) + y3(n)
= ∑_{k=−∞}^{∞} h1(k)x(n − k) + ∑_{k=−∞}^{∞} h2(k)x(n − k) + ∑_{k=−∞}^{∞} h3(k)x(n − k)
= ∑_{k=−∞}^{∞} [h1(k) + h2(k) + h3(k)] x(n − k).
(b) For a cascade of systems with the impulse responses h2 (n) and h3 (n), the output from
the first system is
y2(n) = ∑_{k=−∞}^{∞} h2(k)x(n − k) = ∑_{k=−∞}^{∞} h2(n − k)x(k).
The input to the second system is equal to the output of the first system, while the output of
the second system is
y3(n) = ∑_{m=−∞}^{∞} h3(m) y2(n − m) = ∑_{m=−∞}^{∞} h3(m) ∑_{k=−∞}^{∞} h2(n − m − k)x(k)
= ∑_{k=−∞}^{∞} ∑_{m=−∞}^{∞} h3(m) h2(n − m − k) x(k) = ∑_{k=−∞}^{∞} h23(n − k) x(k),
where
h23(n) = ∑_{m=−∞}^{∞} h3(m) h2(n − m) = h2(n) ∗n h3(n).
with
h2(n) ∗n h3(n) = ∑_{m=−∞}^{∞} e^{−b(n−m)} u(n − m)u(m) = u(n) ∑_{m=0}^{n} e^{−b(n−m)} = ((e^{−bn} − e^{b})/(1 − e^{b})) u(n).
Solution 2.7. Since we know the impulse response h2 (n), we can calculate
From the last relation it follows that h1(n) = 0 for n < 0, h1(0) = h(0) = 1, h1(1) = h(1) − 2h1(0) = 3, h1(2) = h(2) − 2h1(1) − h1(0) = 3, h1(3) = 2, h1(4) = 1, h1(5) = 0, and
h1 (n) = 0 for n > 5. The output signal for the input x (n) = δ(n) − δ(n − 1) can be easily
calculated as
y ( n ) = h ( n ) − h ( n − 1).
Solution 2.8. Instead of a direct convolution we will calculate the frequency response,
H (e jω ), of the discrete system. First, we will find the Fourier transform of the signal
e−n/2 u(n),
H1(e^{jω}) = ∑_{n=0}^{∞} e^{−n/2} e^{−jωn} = 1/(1 − e^{−(1/2+jω)}),
and differentiate both sides with respect to ω,
−j ∑_{n=0}^{∞} n e^{−n/2} e^{−jωn} = −j e^{−(1/2+jω)} / (1 − e^{−(1/2+jω)})².
Therefore,
H(e^{jω}) = ∑_{n=0}^{∞} n e^{−n/2} e^{−jωn} = e^{−(1/2+jω)} / (1 − e^{−(1/2+jω)})².
Solution 2.9. (a) The unit-step signal can be considered as the limit, when a → 0, a > 0, of the signal (1/2)e^{−an}u(n) + 1/2 − (1/2)e^{an}u(−n − 1), whose Fourier transform is
Xa(e^{jω}) = ∑_{n=−∞}^{∞} ( (1/2)e^{−an}u(n) + 1/2 − (1/2)e^{an}u(−n − 1) ) e^{−jωn}
= (1/2)/(1 − e^{−a−jω}) + ∑_{k=−∞}^{∞} πδ(ω + 2kπ) − (1/2) e^{−a+jω}/(1 − e^{−a+jω}).
The Fourier transform of the unit step is
X(e^{jω}) = lim_{a→0} Xa(e^{jω}) = 1/(1 − e^{−jω}) + ∑_{k=−∞}^{∞} πδ(ω + 2kπ).
The result from (2.23) is used to transform the constant signal equal to 1/2.
(b) This signal can be written in the form
Y(e^{jω}) = ∑_{k=−∞}^{∞} ∑_{n=−∞}^{∞} x(n + kN) e^{−jωn} = ∑_{k=−∞}^{∞} X(e^{jω}) e^{jωkN} = X(e^{jω}) ∑_{k=−∞}^{∞} e^{jωkN}.
Solution 2.10. For the impulse response h(n) = a[δ(n + 1) − δ(n − 1)] + b[δ(n + 2) − δ(n − 2)], the stated conditions on the frequency response reduce to
a + 2b = 1/2
a + 4b = 0,
with the solution a = 1 and b = −1/4, producing
h(n) = δ(n + 1) − δ(n − 1) − (1/4)[δ(n + 2) − δ(n − 2)].
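A quick numerical check of this differentiator design, assuming Python with numpy:

# H(e^{jw}) = 2j(sin(w) - sin(2w)/4) of the obtained h(n) approximates jw
# for small w; the error behaves as w^3/3.
import numpy as np

w = np.linspace(-0.5, 0.5, 11)
H = 2j * (np.sin(w) - 0.25 * np.sin(2 * w))
print(np.max(np.abs(H - 1j * w)))    # about |w|^3 / 3 at the interval ends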
Solution 2.11. The triangular window can be written as a convolution of two rectangular windows,
wT(n) = (1/(N + 1)) wR(n) ∗n wR(n),
where w R (n) = u(n + N/2) − u(n − N/2 − 1) is the rectangular window. Since
WR(e^{jω}) = sin(ω(N + 1)/2) / sin(ω/2),
we have
WT(e^{jω}) = (1/(N + 1)) WR(e^{jω}) WR(e^{jω}) = (1/(N + 1)) sin²(ω(N + 1)/2) / sin²(ω/2).
Solution 2.12. This integral represents the energy of a discrete-time signal whose Fourier
transform is defined by
X(e^{jω}) = sin(ω(N + 1)/2) / sin(ω/2).
This signal is a rectangular window, x (n) = u(n + N/2) − u(n − N/2 − 1). Its energy is
I = (1/(2π)) ∫_{−π}^{π} sin²((N + 1)ω/2) / sin²(ω/2) dω = ∑_{n=−N/2}^{N/2} |x(n)|² = ∑_{n=−N/2}^{N/2} 1 = N + 1.
Solution 2.13. For the Hann(ing) window
wH(n) = (1/2)[1 + cos(nπ/N)] [u(N + n) − u(n − N − 1)],
within the overlapping region the sum of two shifted windows is
w(n) = wH(n) + wH(n − N)
= (1/2)[1 + cos(nπ/N)] + (1/2)[1 + cos((n − N)π/N)]
= 1 + (1/2) cos(nπ/N) + (1/2) cos(nπ/N − π) = 1.
In the same way, w(n) = wH(n + N) + wH(n) = 1.
For
w(n) = ∑_{k=−K}^{K} wH(n + kN)
we get
w(n) = { 0, for n < −(K + 1)N + 1;
(1/2)[1 + cos((n + KN)π/N)], for −(K + 1)N + 1 ≤ n ≤ −KN − 1;
1, for −KN ≤ n ≤ KN − 1;
(1/2)[1 + cos((n − KN)π/N)], for KN ≤ n ≤ (K + 1)N − 1;
0, for n > (K + 1)N − 1 },
with
W(e^{jω}) = WH(e^{jω}) ∑_{k=−K}^{K} e^{−jωkN} = WH(e^{jω}) sin(ω(2K + 1)N/2) / sin(ωN/2).
Similar results hold for the Hamming and the triangular window. These results can be
generalized for shifts of N/2, N/4,. . .
For very large K the variations of the second term in W(e^{jω}) are much faster than the variations of WH(e^{jω}). Thus, for large K the Fourier transform W(e^{jω}) approaches the Fourier transform of a rectangular window of the width (2K + 1)N.
Solution 2.14. Based on the definition of the Fourier transform of discrete-time signals,
X(e^{j0}) = ∑_{n=−∞}^{∞} x(n) = 7,  X(e^{jπ}) = ∑_{n=−∞}^{∞} x(n)(−1)^n = 1,
∫_{−π}^{π} X(e^{jω}) dω = 2πx(0) = 4π,  ∫_{−π}^{π} |X(e^{jω})|² dω = 2π ∑_{n=−∞}^{∞} |x(n)|² = 30π.
The signal whose Fourier transform is the real part, Re{X(e^{jω})}, is
y(n) = (x(n) + x∗(−n))/2.
Solution 2.15. The Fourier transform of nx(n), for x(n) = e^{−n/4}u(n), is Y(e^{jω}) = e^{−1/4−jω}/(1 − e^{−1/4−jω})², obtained by differentiating X(e^{jω}) = 1/(1 − e^{−1/4−jω}). The center of gravity is
ng = ∑_{n=−∞}^{∞} n x(n) / ∑_{n=−∞}^{∞} x(n) = Y(e^{j0}) / X(e^{j0}) = ( e^{−1/4}/(1 − e^{−1/4})² ) / ( 1/(1 − e^{−1/4}) ) = 1/(e^{1/4} − 1) = 3.52.
Solution 2.16. (a) The impulse response of the system whose frequency response is H(e^{jω}) = 1 for |ω| ≤ π/3 and H(e^{jω}) = 0 for π/3 < |ω| < π is
h(n) = (1/(2π)) ∫_{−π/3}^{π/3} e^{jωn} dω = e^{jωn}/(2jπn)|_{−π/3}^{π/3} = sin(πn/3)/(πn).
The frequency response value at the input signal frequency ω = ±π/6 is H (e± jπ/6 ) = 1.
The output signal is given by y(n) = sin(nπ/6).
(b) The frequency response is H (e jω ) ∗ω H (e jω ), resulting in y(n) = 0.25 sin(nπ/6).
(c) The output signal is equal to y(n) = sin((n − 2)π/6) = sin(nπ/6 − π/3).
Solution 2.17. (a) For the signal x(t) = cos(20πt + π/4) + sin(90πt), sampled with the sampling interval ∆t = 1/100, the corresponding discrete-time signal is x(n) = [cos(0.2πn + π/4) + sin(0.9πn)]∆t, with the Fourier transform
X(e^{jω}) = (π/100) ∑_{k=−∞}^{∞} [δ(ω − 0.2π + 2kπ)e^{jπ/4} + δ(ω + 0.2π + 2kπ)e^{−jπ/4}]
+ (π/(j100)) ∑_{k=−∞}^{∞} [δ(ω − 0.9π + 2kπ) − δ(ω + 0.9π + 2kπ)].
Figure 2.11 Illustration of the system output with various sampling intervals (a)-(c).
(b) If the signal is sampled with the sampling interval ∆t = 1/50 the discrete-time signal is
X(e^{jω}) = (π/50) ∑_{k=−∞}^{∞} [δ(ω − 0.4π + 2kπ)e^{jπ/4} + δ(ω + 0.4π + 2kπ)e^{−jπ/4}]
+ (π/(j50)) ∑_{k=−∞}^{∞} [δ(ω − 1.8π + 2kπ) − δ(ω + 1.8π + 2kπ)].
The Fourier transform components within the basic period, −π ≤ ω < π, are
X(e^{jω}) = (π/50)[δ(ω − 0.4π)e^{jπ/4} + δ(ω + 0.4π)e^{−jπ/4}]
+ (π/(j50))[δ(ω − 1.8π + 2π) − δ(ω + 1.8π − 2π)]
= (π/50)[δ(ω − 0.4π)e^{jπ/4} + δ(ω + 0.4π)e^{−jπ/4}] + (π/(j50))[δ(ω + 0.2π) − δ(ω − 0.2π)].
The component − sin(10πt) does not correspond to any frequency in the input signal, Fig.
2.11(middle). This effect is illustrated in Fig. 2.12.
Figure 2.12 Illustration of the aliasing caused frequency change, from signal sin(90πt) to signal − sin(10πt).
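This frequency change can be confirmed directly on the signal samples; a minimal sketch, assuming Python with numpy:

# Aliasing in Solution 2.17(b): sin(90*pi*t) sampled with dt = 1/50
# produces exactly the same samples as -sin(10*pi*t).
import numpy as np

n = np.arange(20)
dt = 1 / 50
print(np.allclose(np.sin(90 * np.pi * n * dt), -np.sin(10 * np.pi * n * dt)))
# True, since 1.8*pi and 1.8*pi - 2*pi = -0.2*pi are indistinguishable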
(c) For the sampling interval ∆t = 3/100 the frequencies of the discrete-time signal are ω = 0.6π and ω = 2.7π. The Fourier transform components within the basic period, −π ≤ ω < π, are given by
X(e^{jω}) = (3π/100)[δ(ω − 0.6π)e^{jπ/4} + δ(ω + 0.6π)e^{−jπ/4}]
+ (3π/(j100))[δ(ω − 2.7π + 2π) − δ(ω + 2.7π − 2π)].
Solution 2.18. The impulse response of the Hilbert transformer, whose frequency response is H(e^{jω}) = −j sign(ω), is
h(n) = (1/(2π)) ∫_{−π}^{π} (−j) sign(ω) e^{jωn} dω = (1 − cos(πn))/(πn) = 2 sin²(πn/2)/(πn).
For n = 0 the impulse response is h(0) = 0. The discrete-time Hilbert transformer, in the
frequency and the time domain, is shown in Fig. 2.13.
Figure 2.13 Frequency and impulse response of the discrete-time Hilbert transformer.
Solution 2.19. From the definition and conditions for the sampling theorem we could
conclude that the maximum sampling interval should be related to the maximum frequency
5Ω1 as ∆t = π/(5Ω1 ), corresponding to the periodical extension of the Fourier transform
X (Ω) with period 10Ω1 . However, in this case, there is no need to use such a large period in
order to achieve that two periods do not overlap. It is sufficient to use the period equal to
2Ω1 , as shown in Fig. 2.14. In this case, we will be able to reconstruct the signal, with some
additional processing.
Figure 2.14 Problem 2.19: illustration of the Fourier transform periodic extension.
It is obvious that, after the signal sampling with ∆t = π/Ω1 (periodic extension of the Fourier transform with 2Ω1), the basic period −Ω1 < Ω < Ω1 will contain the original values of X(Ω), shifted from the band 3Ω1 < |Ω| < 5Ω1, so that the signal can be reconstructed with appropriate additional processing. The maximum possible sampling interval is therefore ∆t = π/Ω1.
Solution 2.20. For a signal whose Fourier transform is zero for frequencies |Ω| ≥ Ωm = 2π fm = π/∆t,
X(Ω) = X(Ω)H(Ω)
holds, where
H(Ω) = { 1, for |Ω| < Ωm = π/∆t;  0, for |Ω| ≥ Ωm = π/∆t }.
The impulse response corresponding to H(Ω) is
h(t) = (1/(2π)) ∫_{−π/∆t}^{π/∆t} e^{jΩt} dΩ = sin(πt/∆t)/(πt).
Therefore,
x(t) = ∫_{−∞}^{∞} x(τ)h(t − τ) dτ = ∫_{−∞}^{∞} x(τ) sin(π(t − τ)/∆t)/(π(t − τ)) dτ.  (2.52)
This relation also holds if the Fourier transform of the signal, X(Ω), is periodically extended with the period 2π/∆t ≥ 2Ωm, to produce
(1/(2π)) X(Ω) ∗Ω ∑_{k=−∞}^{∞} 2πδ(Ω − 2πk/∆t) = Xp(Ω).
Convolution in the frequency domain of two Fourier transforms corresponds to the product of signals in the time domain, that is,
x(t) ∑_{n=−∞}^{∞} δ(t + n∆t)∆t = IFT{Xp(Ω)} = xp(t).  (2.53)
From (2.52) and then (2.53) the discrete-time form follows
x(t) = xp(t) ∗t h(t) = ∫_{−∞}^{∞} x(τ) ∑_{n=−∞}^{∞} δ(τ − n∆t) h(t − τ) ∆t dτ
= ∑_{n=−∞}^{∞} x(n∆t) h(t − n∆t) ∆t = ∑_{n=−∞}^{∞} x(n∆t) sin(π(t − n∆t)/∆t) / (π(t − n∆t)/∆t).  (2.54)
Figure 2.15 Smoothed filter in the sampling theorem illustration (first two graphs) versus original sampling
theorem relation within filtering framework.
illustrated in Fig. 2.15. The sampling step should be such that Ωm + ∆Ωm/2 = π/∆t, so that the periodic extension of X(Ω)H(Ω) does not include overlapped X(Ω) values. The impulse
response h(t) can be then used in the reconstruction formula
x(t) = ∑_{n=−∞}^{∞} x(n∆t) h(t − n∆t),
with a reduction of the sampling interval to ∆t = π/(Ωm + ∆Ωm/2) with respect to ∆t = π/Ωm.
X1(Ω) = ∑_{n=−∞}^{∞} X(Ω + 2πn/∆t),
X2(Ω) = ∑_{n=−∞}^{∞} X(Ω + 2πn/∆t) e^{j(Ω+2πn/∆t)a}.
Within the basic period (considering the positive frequencies 0 ≤ Ω < Ωm ), only two periods
overlap
The second term X (Ω − 2π/∆t) in these relations is the overlapped period (aliasing) of
the Fourier transform, that should be eliminated using these two equations. The Fourier
transform X (Ω) of the original signal follows in the form
Similarly for negative frequencies, within the basic period −Ωm < Ω < 0, follows
Therefore, the signal can be reconstructed from two independent discrete-time signals, each undersampled by a factor of two. A similar result can be derived for N independently sampled, N times undersampled signals.
with
Ω0 = (1/∆t) arccos( (x(t0 + ∆t) + x(t0 − ∆t)) / (2x(t0)) ).
The condition for a unique solution is 0 ≤ Ω0∆t ≤ π, limiting the approach to small values of ∆t.
We will discuss the discrete complex-valued signal as well. For a complex sinusoid
x (n) = A exp( j2πk0 n/N + φ0 ), with available two samples x (n1 ) = A exp( jϕ(n1 )) and
x (n2 ) = A exp( jϕ(n2 )), from
x(n1)/x(n2) = exp(j2πk0(n1 − n2)/N)
follows
2πk0(n1 − n2)/N = ϕ(n1) − ϕ(n2) + 2kπ,
where k is an arbitrary integer. Then
k0 = ((ϕ(n1) − ϕ(n2))/(2π(n1 − n2))) N + (k/(n1 − n2)) N.  (2.55)
Let us analyze the role of the ambiguous term kN/(n1 − n2) in the determination of k0. For n1 − n2 = 1, this term is kN, meaning that any frequency k0 would be ambiguous with k0 + kN. Any value k0 + kN, for k ≠ 0, in this case, will be outside the basic period 0 ≤ k ≤ N − 1. Thus, we may find k0 in a unique way, within 0 ≤ k0 ≤ N − 1. However, for n1 − n2 = L > 1, the terms kN/(n1 − n2) = kN/L produce shifts within the frequency basic period. Then
several possible solutions for the frequency k0 are obtained. For example, for N = 16 and k0 = 5, if we use n1 = 1 and n2 = 5, possible solutions to (2.55) are of the form k0 = 5 + 16k/4, that is, k0 = 5, k0 = 9, k0 = 13, and k0 = 1, within 0 ≤ k0 ≤ 15, are possible solutions for the frequency of the considered discrete-time signal.
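The ambiguity in (2.55) can be illustrated numerically; a minimal sketch, assuming Python with numpy:

# For N = 16, the samples at n1 = 1 and n2 = 5 cannot distinguish the
# frequency k0 = 5 from 9, 13, or 1: the ratio x(n1)/x(n2) is the same.
import numpy as np

N, n1, n2 = 16, 1, 5

def ratio(k0):
    x = lambda n: np.exp(2j * np.pi * k0 * n / N)
    return x(n1) / x(n2)

print([k0 for k0 in range(N) if np.isclose(ratio(k0), ratio(5))])
# [1, 5, 9, 13]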
Solution 2.23. For the absolute values of the even and odd parts of the signal,
|xe(n)|² + |xo(n)|² = |(x(n) + x(−n))/2|² + |(x(n) − x(−n))/2|²
= (|x(n)|² + |x(−n)|²)/2 = A²s(n).
Obviously, |xe(n)|² ≤ A²s(n) and |xo(n)|² ≤ A²s(n). Replacing |xo(n)| = √(A²s(n) − |xe(n)|²) into |xe(n)| + |xo(n)|, we get
|xe(n)| + |xo(n)| = |xe(n)| + √(A²s(n) − |xe(n)|²).
Considered as a function of χ = |xe(n)|, 0 ≤ χ ≤ As(n), this expression has the maximum √2 As(n), at χ = As(n)/√2, while the minimum value is achieved at the interval ending points, for χ = 0 or χ = As(n), producing the final result
As(n) ≤ |xe(n)| + |xo(n)| ≤ √2 As(n).
Chapter 3
Discrete Fourier Transform
DISCRETE-TIME signals can be processed on digital computers in the time domain. For processing in the frequency domain, the discrete Fourier transform (DFT) of a signal x(n) with N samples is defined by
DFT{x(n)} = X(k) = ∑_{n=0}^{N−1} x(n) e^{−j2πkn/N}  (3.1)
for k = 0, 1, 2, . . . , N − 1.
In order to establish the relation between the DFT with the Fourier transform of discrete-
time signals, consider a discrete-time signal x (n) of limited duration. Assume that nonzero
samples of x (n) are within 0 ≤ n ≤ N0 − 1. The Fourier transform of this discrete-time
signal is given by
X(e^{jω}) = ∑_{n=0}^{N0−1} x(n) e^{−jωn}.
The DFT values can be considered as the frequency domain samples of the Fourier transform
of discrete-time signals, taken with the sampling interval ∆ω = 2π/N in the frequency
domain, where N is the number of samples of X (e jω ) within the period −π ≤ ω < π,
X(k) = X(e^{j2πk/N}) = X(e^{jω})|_{ω=k∆ω=2πk/N}.  (3.2)
In order to examine how the Fourier transform sampling in the frequency domain
influences the signal in the time domain, we will form a periodic extension of the discrete-
time signal x (n) with the period N equal to the number of the samples within the basic
frequency domain period, such that N ≥ N0 , Fig. 3.1.
Figure 3.1 Discrete-time signal x(n) = x(t)∆t|_{t=n∆t}, within 0 ≤ n ≤ N − 1 (top), and its periodic extension xp(n) (bottom).
With N greater than or equal to the signal duration N0, we will be able to reconstruct
the original signal x(n) from its periodic extension xp(n). Furthermore, we will assume that
the periodic signal x p (n) is formed from the samples of periodic continuous-time signal
x p (t) with a period T (corresponding to N signal samples within the period, T = N∆t, and
x p (n) = x p (n∆t)∆t). The Fourier series coefficients of x p (t) are defined by
Xk = (1/T) ∫_{0}^{T} xp(t) e^{−j2πkt/T} dt.
Assuming that the sampling theorem is satisfied, the integral can be replaced by a sum (in
the sense of Example 2.13)
Xk = (1/T) ∑_{n=0}^{N−1} x(n∆t) e^{−j2πkn∆t/T} ∆t,
with x p (t) = x (t) within 0 ≤ t < T. Using T/∆t = N, x (n∆t)∆t = x (n), and X (k) = TXk
this sum can be written in the form
X(k) = ∑_{n=0}^{N−1} x(n) e^{−j2πkn/N}.  (3.3)
Therefore, the relation between the DFT and the Fourier series coefficients is
X (k ) = TXk . (3.4)
The inverse DFT is obtained by multiplying both sides of the DFT definition (3.1) by
e j2πkm/N and summing over k
∑_{k=0}^{N−1} X(k) e^{j2πmk/N} = ∑_{n=0}^{N−1} x(n) ∑_{k=0}^{N−1} e^{j2πk(m−n)/N},
with
∑_{k=0}^{N−1} e^{j2πk(m−n)/N} = (1 − e^{j2π(m−n)})/(1 − e^{j2π(m−n)/N}) = Nδ(m − n),
for 0 ≤ m, n ≤ N − 1. The inverse discrete Fourier transform (IDFT) of a signal x (n) is
x(n) = (1/N) ∑_{k=0}^{N−1} X(k) e^{j2πnk/N},  (3.5)
for 0 ≤ n ≤ N − 1.
The signal calculated using the IDFT is, by definition, periodic with the period N since
x(n + N) = (1/N) ∑_{k=0}^{N−1} X(k) e^{j2π(n+N)k/N} = x(n).
Therefore the DFT of a signal x (n) calculated using the signal samples within
0 ≤ n ≤ N − 1 assumes that the signal x (n) is periodically extended with the period
N, that is
IDFT{DFT{x(n)}} = ∑_{m=−∞}^{∞} x(n + mN),
with ∑_{m=−∞}^{∞} x(n + mN) = x(n) for 0 ≤ n ≤ N − 1.
The values of this periodical extension within the basic period are equal to x (n). This is a
circular extension of the signal x(n). Other notations are also used in the literature for this kind of extension.
If the original signal x(n), used in the DFT calculation, is aperiodic, then IDFT{DFT{x(n)}} = x(n) holds only within 0 ≤ n ≤ N − 1, assuming that the initial DFT was calculated for the signal samples x(n) within 0 ≤ n ≤ N − 1.
In literature it is quite common to use the same notation for both x (n) and
IDFT{DFT{ x (n)}} having in mind that any DFT calculation with N signal samples
implicitly assumes a periodic extension of the original signal x (n) with period N. Thus, we
will use this kind of notation, except in the cases when we want to emphasize a difference in
the results when the inherent periodicity in the signal (when the DFT is used) is not properly
taken into account.
Example 3.1. For the signals x (n) = 2 cos(2πn/8) for 0 ≤ n ≤ 7 and x (n) = 2 cos(2πn/16) for
0 ≤ n ≤ 7 plot the periodic signals IDFT{DFT{ x (n)}} with N = 8 without calculating the
DFTs.
Figure 3.2 Signals x (n) = 2 cos(2πn/8) for 0 ≤ n ≤ 7 (left) and x (n) = 2 cos(2πn/16) for 0 ≤ n ≤ 7 (right)
along with their periodic extensions IDFT{DFT{ x (n)}} with N = 8.
Example 3.2. For the signal x (n), whose values are x (0) = 1, x (1) = 1/2, x (2) = −1, and
x (3) = 1/2, find the DFT with N = 4. What is the IDFT for n = −2?
⋆The DFT values are X(k) = 1 + cos(2πk/4) + (−1)^{k+1}, that is, X(0) = 1, X(1) = 2, X(2) = −1, and X(3) = 2. The IDFT is
x(n) = (1/4) ∑_{k=0}^{3} [1 + cos(2πk/4) + (−1)^{k+1}] e^{j2πnk/4},
for 0 ≤ n ≤ 3. The DFT and IDFT inherently assume the signal and its Fourier transform
periodicity. Thus, the result for n = −2 is
x(−2) = (1/4) ∑_{k=0}^{3} X(k) e^{j2π(−2)k/4} = (1/4) ∑_{k=0}^{3} X(k) e^{j2π(4−2)k/4} = x(4 − 2) = x(2) = −1.
Matrix notation and calculation complexity: The DFT can be written in a matrix form as
[ X(0); X(1); . . . ; X(N − 1) ] = [ 1, 1, . . . , 1;  1, e^{−j2π/N}, . . . , e^{−j2π(N−1)/N};  . . . ;  1, e^{−j2π(N−1)/N}, . . . , e^{−j2π(N−1)(N−1)/N} ] [ x(0); x(1); . . . ; x(N − 1) ]  (3.6)
or
X = Wx, (3.7)
where x and X are the vectors containing the signal values and its DFT values, respectively, while W is the discrete Fourier transform matrix with the coefficients
W = [ 1, 1, . . . , 1;  1, WN, . . . , WN^{N−1};  . . . ;  1, WN^{N−1}, . . . , WN^{(N−1)(N−1)} ],  (3.8)
where
WN^{k} = e^{−j2πk/N}
is used to simplify the notation.
The number of additions to calculate a DFT is N − 1 for every X (k) in (3.1). Since
there are N DFT coefficients, the total number of additions is N ( N − 1). From the matrix
in (3.6) we can see that the multiplications are not needed for calculation of X (0). There is
no need for a multiplication in the first term of every coefficient calculation as well. If we
neglect the fact that some other terms in matrix (3.6) may also take values 1, −1, j, or − j
then the number of multiplications is ( N − 1)2 . The order of the number of multiplications
and the number of additions for the DFT calculation is N 2 .
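The matrix form (3.6)-(3.8) translates directly into code; a minimal sketch, assuming Python with numpy:

# X = W x with W[k, n] = exp(-j*2*pi*k*n/N); this direct product costs an
# order of N^2 operations and matches the FFT result.
import numpy as np

N = 8
n = np.arange(N)
W = np.exp(-2j * np.pi * np.outer(n, n) / N)

x = np.random.randn(N)
print(np.allclose(W @ x, np.fft.fft(x)))     # True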
The inverse DFT in a matrix form is
x = W^{−1}X,  (3.9)
where W^{−1} = W∗/N.
Most of the DFT properties can be derived in the same way as for the Fourier transform and
the Fourier transform of discrete-time signals.
IDFT{X(k) e^{−j2πkn0/N}} = (1/N) ∑_{k=0}^{N−1} X(k) e^{−j2πkn0/N} e^{j2πkn/N}
= (1/N) ∑_{k=0}^{N−1} X(k) e^{j2πk(n−n0)/N} = x(n − n0).  (3.10)
Here x (n − n0 ) is the signal obtained when x (n) is periodically extended with N first
and then this periodic signal is shifted for n0 . The basic period of the original signal
x (n) is now within n0 ≤ n ≤ N + n0 − 1.
This kind of shift in periodic signals, used in the above relation, is also referred to as a circular shift.
For a real-valued signal x(n), the DFT satisfies
X∗(k) = X(N − k).
4. Parseval’s theorem of discrete-time periodic signals relates the energy in the time
domain and the frequency domain
∑_{n=0}^{N−1} |x(n)|² = ∑_{n=0}^{N−1} x(n)x∗(n) = ∑_{n=0}^{N−1} x(n) (1/N) ∑_{k=0}^{N−1} X∗(k) e^{−j2πnk/N}
= (1/N) ∑_{k=0}^{N−1} X∗(k) ∑_{n=0}^{N−1} x(n) e^{−j2πnk/N} = (1/N) ∑_{k=0}^{N−1} |X(k)|².
5. Convolution of two periodic signals x (n) and h(n), whose period is N, is defined
by
y(n) = ∑_{m=0}^{N−1} x(m)h(n − m).
The DFT of this signal is
Y(k) = DFT{y(n)} = ∑_{n=0}^{N−1} ∑_{m=0}^{N−1} x(m)h(n − m) e^{−j2πnk/N} = X(k)H(k).  (3.13)
Thus, the DFT of the convolution of two periodic signals is equal to the product of
the DFTs of individual signals. Since the convolution is performed on periodic signals
(the DFT inherently assumes signals periodicity), a circular shift of signals is assumed
in the calculation. This kind of convolution is called circular convolution.
Relation (3.13) indicates that we can calculate convolution of two aperiodic discrete-
time signals of a limited duration in the following way:
• Calculate the DFTs of signals x (n) and h(n) with N nonzero samples, to obtain
X (k) and H (k). At this point, inherently, we make periodic extensions of x (n)
and h(n), with a period N.
• Multiply these two DFTs to obtain the DFT of the output signal Y (k ) =
X ( k ) H ( k ).
• Calculate the inverse DFT to get the convolution (the output signal with N
samples)
y(n) = IDFT{Y (k)}.
This procedure looks computationally more complex than the direct calculation of
convolution, by definition. However, due to very efficient and fast routines for the
DFT and the IDFT calculation, this way of calculating the convolution could be more
efficient than the direct one.
In using this procedure, we have to take care of the lengths of the signals and of their DFTs, which assume periodic extensions, as in the sketch below.
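A minimal sketch of the described DFT-based convolution, assuming Python with numpy (the FFT routines calculate the DFT and the IDFT):

# Zero-padding both signals to N >= M + L - 1 makes one period of the
# circular convolution equal to the aperiodic convolution.
import numpy as np

x = np.random.randn(12)              # M = 12
h = np.random.randn(5)               # L = 5
N = len(x) + len(h) - 1              # N = M + L - 1 = 16

Y = np.fft.fft(x, N) * np.fft.fft(h, N)
y = np.fft.ifft(Y).real
print(np.allclose(y, np.convolve(x, h)))     # True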
Example 3.3. Consider the signal x(n) = u(n) − u(n − 5). Calculate the convolution x(n) ∗ x(n). Periodically extend the signal with period N = 7 and calculate the circular convolution (corresponding to the DFT-based convolution calculation with N = 7). This value of N satisfies the condition that it is larger than the signal duration. Compare the results. What value of N should be used for the period so that the direct convolution corresponds to one period of the circular convolution?
⋆The signal x (n) and its reversed version x (−n), along with some shifted signals used
in the convolution calculation, are presented in Fig. 3.3.
In the circular (DFT) calculation, for example, at n = 0, the convolution value is
xp(n) ∗n xp(n)|_{n=0} = ∑_{m=0}^{6} xp(m) xp(0 − m) = 1 + 0 + 0 + 1 + 1 + 0 + 0 = 3.
In addition to the term x(0)x(0) = 1, which exists in the aperiodic convolution, two terms for m = 3 and m = 4 appeared due to the periodic extension of the signal. They make the circular convolution value different from the convolution of the original aperiodic signals. The same situation occurs for n = 1 and n = 2. For n = 3, 4, and 5 the correct result for the aperiodic convolution is obtained using the circular convolution.
From the previous calculation, it could be concluded that if the signal periods in the
calculation of the circular convolution were separated by at least two more zero signal
samples (if the period N were N ≥ 9) this difference would not occur (overlapping
of the signal samples in the basic period with the extended period samples would be
avoided), as shown in Fig. 3.4 for N = 9. Then one period of the circular convolution, for
0 ≤ n ≤ N − 1, would correspond to the original aperiodic convolution.
Figure 3.3 Illustration of the discrete-time signal convolution and circular convolution for signals whose length is
5 and the circular convolution is calculated with N = 7.
Figure 3.4 Illustration of the discrete-time signal circular convolution for signals whose length is 5 and the circular
convolution is calculated with N = 9.
Generalization: If a signal x (n) is of the length M, then we can calculate its DFT with
any N ≥ M, and the signal will not overlap with its extended periods, implicitly added
using the DFT. If a signal h(n) is of the length L, then we can calculate its DFT with
any N ≥ L. However, if we want to use their DFTs for the convolution calculation (to
use the circular convolution), then from the previous example we see that the length of
the convolution y(n) is equal to M + L − 1. Therefore, for the DFT-based calculation of
the convolution y(n), we have to use at least
N ≥ M + L − 1.
This means that both DFTs, X(k) and H(k), whose product results in Y(k), must be calculated with a duration (period) of at least N ≥ M + L − 1. Otherwise, aliasing (overlapping of the
periods) will appear and the circular convolution calculated in this way would not
correspond (within the basic period) to the convolution of the original discrete-time
(aperiodic) signals.
The duration of an input signal x(n) may be much longer than the duration of the impulse response h(n). For example, an input signal may have tens of thousands of samples, while the duration of the impulse response of a discrete system is, for example, tens of samples, that is,
M ≫ L. The direct convolution of these two signals could be calculated (after the first L − 1 output samples) as
y(n) = ∑_{m=n−L+1}^{n} x(m)h(n − m).
For every output sample, L multiplications would be used. For a direct DFT application in
the convolution calculation we should wait until the end of the signal and then zero-pad
both the input signal and the impulse response up to M + L − 1. This kind of calculation
is not efficient. Instead of using the direct DFT calculation, the signal can be split into
nonoverlapping sequences whose duration N is of the order of the impulse response duration
L,
x(n) = ∑_{k=0}^{K−1} xk(n),
where
xk(n) = x(n)[u(n − kN) − u(n − (k + 1)N)]
and M = KN (the input signal can always be zero-padded up to the nearest KN duration,
where K is an integer). The output signal is
y(n) = ∑_{k=0}^{K−1} ( ∑_{m=n−L+1}^{n} xk(m) h(n − m) ) = ∑_{k=0}^{K−1} yk(n).  (3.14)
For the calculation of the convolutions yk (n) = xk (n) ∗n h(n), the signals xk (n) and h(n)
should be of duration N + L − 1 only. These convolutions can be calculated after every
N ≪ M input signal samples. The duration of each output sequence yk(n) is N + L − 1. Since yk(n), k = 0, 1, . . . , K − 1, are calculated with the step N in the time domain, they overlap, although the input signals xk(n) are nonoverlapping. For two successive outputs yk(n) and yk+1(n), with L ≤ N, the L − 1 samples within kN + N ≤ n < kN + N + L − 1 overlap.
This effect should be taken into account, by summing the overlapped output samples in y(n),
after the individual convolutions yk (n) = xk (n) ∗n h(n) are calculated using the DFTs, as
shown in Fig. 3.5.
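A minimal overlap-add sketch of (3.14), assuming Python with numpy; the block length N and the test signals are arbitrary:

# The input is split into blocks of N samples, each block is convolved
# with h(n) through the DFT, and the overlapped outputs are summed.
import numpy as np

def overlap_add(x, h, N=64):
    L = len(h)
    nfft = N + L - 1
    H = np.fft.fft(h, nfft)
    y = np.zeros(len(x) + L - 1)
    for start in range(0, len(x), N):
        block = x[start:start + N]
        yk = np.fft.ifft(np.fft.fft(block, nfft) * H).real
        y[start:start + len(block) + L - 1] += yk[:len(block) + L - 1]
    return y

x = np.random.randn(1000)
h = np.random.randn(16)
print(np.allclose(overlap_add(x, h), np.convolve(x, h)))   # True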
The basic period of the DFT X (k), calculated for k = 0, 1, 2, . . . , N − 1, should be considered
as having two parts:
• One part of the DFT values for 0 ≤ k ≤ N/2 − 1, which corresponds to the positive
frequencies
ω = (2π/N)k or Ω = (2π/(N∆t))k, for 0 ≤ k ≤ N/2 − 1,  (3.15)
and the
• Other part being a shifted version of the DFT corresponding to the negative frequencies
(in the original aperiodic signal)
ω = (2π/N)(k − N) or Ω = (2π/(N∆t))(k − N), for N/2 ≤ k ≤ N − 1.  (3.16)
An illustration of the correspondence between the frequency value and the frequency index in the DFT is given in Fig. 3.6.
We have seen that the DFT of a signal whose duration is limited to M samples can be
calculated using any N ≥ M. In practice, this means that we can add (use) as many zeros,
after the nonzero signal x (n) values, as we like. By doing this, we increase the calculation
complexity, but we also increase the number of samples within the same frequency range of
the Fourier transform.
If we recall that X(k) = X(e^{jω})|_{ω=2πk/N} holds in the case when the sampling theorem is satisfied, then we see that by increasing N
in the DFT calculation, the density of sampling (interpolation) in the Fourier transform of
the original signal increases. The DFT interpolation by zero padding the signal in the time
domain is illustrated in Fig. 3.7.
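Zero padding as interpolation of the DFT is easily demonstrated; a minimal sketch, assuming Python with numpy:

# The zero-padded DFT samples the same X(e^{jw}) on a denser grid: every
# fourth sample of the N = 64 DFT equals the N = 16 DFT.
import numpy as np

x = np.random.randn(16)                      # M = 16 nonzero samples
X_dense = np.fft.fft(x, 64)                  # zero-padded to N = 64
X_coarse = np.fft.fft(x, 16)
print(np.allclose(X_dense[::4], X_coarse))   # True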
The same holds for the frequency domain. If we calculate the DFT of a signal with N
samples and then add, for example, N zeros after the region corresponding to the highest
frequencies, then by the IDFT of this 2N point DFT, we will interpolate the original signal in
time. All zero values in the frequency domain should be inserted between two parts (regions)
of the original DFT corresponding to positive and negative frequencies.
Figure 3.5 Illustration of the calculation of the convolution, y(n), when the duration of the input signal, x(n), is much longer than the duration of the system impulse response, h(n).
Figure 3.6 Relation between the frequency in the continuous-time and the DFT frequency index.
Example 3.4. The Hann(ing) window for a signal truncation within − N/2 ≤ n ≤ N/2 − 1, is
w(n) = (1/2)[1 + cos(2πn/N)], for −N/2 ≤ n ≤ N/2 − 1.  (3.18)
If the original signal values are within 0 ≤ n ≤ N − 1 then the Hann(ing) window form is
w(n) = (1/2)[1 − cos(2πn/N)], for 0 ≤ n ≤ N − 1.  (3.19)
Present the zero-padded forms of the Hann(ing) windows with 2N samples.
⋆The zero-padded form of the Hann(ing) windows used for windowing data within the intervals
− N/2 ≤ n ≤ N/2 − 1 and 0 ≤ n ≤ N − 1 are shown in Fig. 3.8. The DFTs of windows (3.18)
and (3.19) are
W (k) = N [δ(k) + δ(k − 1)/2 + δ(k + 1)/2]/2
and
W (k) = N [δ(k) − δ(k − 1)/2 − δ(k + 1)/2]/2,
respectively. After the presented zero-padding, the symmetry property that guarantees a real-valued window DFT is preserved (for an even N in the case −N/2 ≤ n ≤ N/2 − 1 and for an odd N for data within 0 ≤ n ≤ N − 1).
Figure 3.7 Discrete-time signal and its DFT (top two subplots). Discrete-time signal zero-padded and its DFT
interpolated (two subplots in the middle). Zero-padding (interpolation) factor was 2. Discrete-time signal zero-
padded and its DFT interpolated (two bottom subplots). Zero-padding (interpolation) factor was 4. According to the
duality property, the same holds if X (k) were a signal in the discrete-time and x (−n) was its Fourier transform.
Figure 3.8 Zero-padding of the Hann(ing) windows used to window data within − N/2 ≤ n ≤ N/2 − 1 and
0 ≤ n ≤ N − 1.
Presentation of the DFT will be concluded with an illustration (Fig. 3.9) of the relation
among the four forms of the Fourier domain signal representations for the cases of:
x(t) = (1/(2π)) ∫_{−∞}^{∞} X(Ω) e^{jΩt} dΩ,  X(Ω) = ∫_{−∞}^{∞} x(t) e^{−jΩt} dt.
xp(t) = ∑_{n=−∞}^{∞} Xn e^{j2πnt/T},  Xn = (1/T) ∫_{−T/2}^{T/2} x(t) e^{−j2πnt/T} dt,
Xn = (1/T) X(Ω)|_{Ω=2πn/T}.
If the periodic signal is formed by a periodic extension of an aperiodic signal x (t) then
there is no signal overlapping (aliasing) in the periodic signal if the original aperiodic
signal duration is shorter than the extension period T and the previous relation holds.
x (n) = x (n∆t)∆t,
x(n) = (1/(2π)) ∫_{−π}^{π} X(e^{jω}) e^{jωn} dω,  X(e^{jω}) = ∑_{n=−∞}^{∞} x(n) e^{−jωn},
X(e^{jω}) = ∑_{m=−∞}^{∞} X(Ω + 2πm/∆t)|_{Ω=ω/∆t}.
Figure 3.9 Aperiodic continuous-time signal and its Fourier transform (first row). Discrete-time signal and its
Fourier transform (second row). Periodic continuous-time signal and its Fourier series coefficients (third row).
Periodic discrete-time signal and its discrete Fourier transform (DFT), (fourth row).
xp(n) = (1/N) ∑_{k=0}^{N−1} X(k) e^{j2πnk/N},  X(k) = ∑_{n=0}^{N−1} x(n) e^{−j2πnk/N},
All forms of the Fourier representations are related and shown in Fig. 3.9.
Algorithms that provide efficient calculation of the DFT, with a reduced number of arithmetic
operations, are called the fast Fourier transform (FFT). A unified approach to the DFT and
the inverse DFT, (3.5), is used. The only differences between the DFT and inverse DFT
calculation are in the sign of the exponent and a division of the final result by N.
Here we will present an algorithm based on splitting the signal x (n), with N samples,
into two signals x (n) for 0 ≤ n ≤ N/2 − 1 and x (n) for N/2 ≤ n ≤ N − 1, whose duration
is N/2. It is assumed that N is an even number. By definition, the DFT of a signal with N
samples is
DFTN{x(n)} = X(k) = ∑_{n=0}^{N−1} x(n) e^{−j2πnk/N}
= ∑_{n=0}^{N/2−1} x(n) e^{−j2πnk/N} + ∑_{n=N/2}^{N−1} x(n) e^{−j2πnk/N}
= ∑_{n=0}^{N/2−1} [x(n) + x(n + N/2)(−1)^k] e^{−j2πnk/N},
since
e^{−j2π(n+N/2)k/N} = e^{−j2πnk/N} e^{−jπk} = (−1)^k e^{−j2πnk/N}.
For an even number, k = 2r, we have
DFTN/2{g(n)} = X(2r) = ∑_{n=0}^{N/2−1} g(n) e^{−j2πnr/(N/2)},
with
g(n) = x (n) + x (n + N/2).
For an odd number, k = 2r + 1, follows
DFTN/2{h(n)} = X(2r + 1) = ∑_{n=0}^{N/2−1} h(n) e^{−j2πnr/(N/2)},
where
h(n) = (x(n) − x(n + N/2)) e^{−j2πn/N}.
In this way, one DFT of N elements is split into two DFTs of N/2 elements. Having in mind that the direct calculation of a DFT with N elements requires an order of N² operations, it means that we have reduced the calculation complexity, since 2(N/2)² = N²/2 < N².
An illustration of the DFT calculation, with N = 8, using two DFT with N/2 = 4 is
shown in Fig. 3.10.
We can continue in the same way and split every DFT with N/2 elements into two
DFTs with N/4, and so on. A complete calculation scheme is shown in Fig. 3.11.
This kind of the DFT calculation is referred to as the decimation-in-frequency algorithm.
We can conclude that in this FFT algorithm an order of N log2 N of operations is required.
Here, it has been assumed that log2 N = p is an integer, that is, N = 2 p .
If we want to be precise, the number of additions is exactly

N_additions = N log_2 N.

In the first stage, there are (N/2 − 1) multiplications. In the second stage, there are 2(N/4 − 1) multiplications. In the next stage there would be 4(N/8 − 1) multiplications. Finally, in the last stage there would be 2^{p−1}(N/2^p − 1) = (N/2)(N/N − 1) = 0 multiplications (N = 2^p or p = log_2 N). The total number of multiplications in this FFT algorithm is

N_multiplicat. = (N/2 − 1) + 2(N/4 − 1) + 4(N/8 − 1) + ··· + 2^{p−1}(N/2^p − 1)
= (N/2 − 1) + (N/2 − 2) + (N/2 − 4) + ··· + (N/2 − N/2)
= (N/2)p − (1 + 2 + 2^2 + ··· + 2^{p−1}) = (N/2)p − (1 − 2^p)/(1 − 2)
= (N/2) log_2 N − (N − 1) = (N/2)[log_2 N − 2] + 1.
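As a quick numerical illustration of the decimation-in-frequency splitting, the following minimal Python/NumPy sketch (our own; the function name fft_dif and the use of NumPy are assumptions, not the book's) applies the g(n) and h(n) relations recursively and checks the result against a library FFT.

    import numpy as np

    def fft_dif(x):
        # Radix-2 decimation-in-frequency FFT sketch; len(x) must be a power of 2.
        x = np.asarray(x, dtype=complex)
        N = len(x)
        if N == 1:
            return x
        g = x[:N//2] + x[N//2:]                              # g(n) = x(n) + x(n + N/2)
        h = (x[:N//2] - x[N//2:]) * np.exp(-2j*np.pi*np.arange(N//2)/N)
        X = np.empty(N, dtype=complex)
        X[0::2] = fft_dif(g)                                 # X(2r)   = DFT_{N/2}{g(n)}
        X[1::2] = fft_dif(h)                                 # X(2r+1) = DFT_{N/2}{h(n)}
        return X

    x = np.random.randn(8) + 1j*np.random.randn(8)
    print(np.allclose(fft_dif(x), np.fft.fft(x)))            # True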
Example 3.5. Consider a signal x (n) within 0 ≤ n ≤ N − 1. Assume that N is an even number.
Show that the DFT of x (n) can be calculated as two DFTs, one DFT calculated using the even
samples of x (n) and the other DFT obtained using the odd samples of x (n).
⋆ By definition, splitting the DFT sum into even- and odd-indexed samples gives

X(k) = Σ_{m=0}^{N/2−1} x_e(m) e^{−j2π(2m)k/N} + Σ_{m=0}^{N/2−1} x_o(m) e^{−j2π(2m+1)k/N},

where x_e(m) = x(2m) and x_o(m) = x(2m + 1) are the signal samples with even and odd indices, respectively. If we use the notation X_e(k) = DFT{x_e(n)} and X_o(k) = DFT{x_o(n)}, for k = 0, 1, . . . , N/2 − 1, then

X(k) = X_e(k) + e^{−j2πk/N} X_o(k)

and

X(k + N/2) = X_e(k) − e^{−j2πk/N} X_o(k),

since X_e(k) and X_o(k) are periodic with period N/2. Thus, the DFT of N elements is split into two DFTs of N/2 elements. If N/2 is an even number, we can continue and split the two DFTs of N/2 elements into four DFTs of N/4 elements, and so on. This is a decimation-in-time algorithm, Fig. 3.12.
[Figure 3.12: the decimation-in-time FFT scheme for N = 8; the input samples are taken in bit-reversed order and combined through butterflies with twiddle factors W_8^k to produce X(0), . . . , X(7).]
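A one-stage check of this decimation-in-time relation can be written as follows (a NumPy sketch, ours, with a randomly chosen test signal): the N-point DFT is assembled from the N/2-point DFTs of the even- and odd-indexed samples.

    import numpy as np

    N = 8
    x = np.random.randn(N)
    Xe = np.fft.fft(x[0::2])                  # X_e(k), DFT of even-indexed samples
    Xo = np.fft.fft(x[1::2])                  # X_o(k), DFT of odd-indexed samples
    k = np.arange(N//2)
    W = np.exp(-2j*np.pi*k/N)                 # twiddle factors e^{-j2 pi k/N}
    X = np.concatenate([Xe + W*Xo, Xe - W*Xo])
    print(np.allclose(X, np.fft.fft(x)))      # True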
Example 3.6. Consider a signal x (n) within 0 ≤ n ≤ N − 1. Assume that N = 3M. Show that the
DFT of x (n) can be calculated using three DFTs of M samples.
A periodic signal x(t), with a period T, can be reconstructed from its samples if its Fourier series has a limited number of nonzero coefficients, that is, X_k = 0 for |k| > k_m. This means that the Fourier series coefficients corresponding to frequencies greater than Ω_m = 2πk_m/T are zero-valued. The periodic signal x(t) can then be reconstructed from the samples taken with the sampling interval ∆t < π/Ω_m = 1/(2f_m). The number of samples within the period is N = T/∆t.
The reconstructed signal is

x(t) = Σ_{n=0}^{N−1} x(n∆t) e^{j(n − t/∆t)π/N} sin[(n − t/∆t)π] / (N sin[(n − t/∆t)π/N])

for an even N.
Example 3.7. Samples of a periodic signal x(t) are taken with the sampling interval ∆t = 1. The obtained discrete-time signal samples x(n) are the elements of the signal vector x = [0, 2.8284, −2, 2.8284, 0, −2.8284, 2, −2.8284]^T for 0 ≤ n ≤ N − 1, with N = 8. Assuming that the signal satisfies the sampling theorem, find its value at t = 1.5. Check the accuracy, given that the original signal was x(t) = 3 sin(3πt/4) + sin(πt/4).
⋆ Using the reconstruction formula for an even number of samples N within the period, we get

x(1.5) = Σ_{n=0}^{7} x(n) e^{j(n−1.5)π/8} sin[(n − 1.5)π] / (8 sin[(n − 1.5)π/8]) = −0.2242.

This result is equal to the original signal value. The calculation is repeated for 0 ≤ t ≤ 8 with a step of 0.01. The reconstructed values of x(t) are presented in Fig. 3.13.
[Figure 3.13: the reconstructed signal x(t) and the samples x(n), with ∆t = 1, for 0 ≤ t ≤ 8.]
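The reconstruction in Example 3.7 can be reproduced with a few lines of NumPy (our own check, not part of the book):

    import numpy as np

    x = np.array([0, 2.8284, -2, 2.8284, 0, -2.8284, 2, -2.8284])
    N, dt, t = 8, 1.0, 1.5
    n = np.arange(N)
    a = (n - t/dt) * np.pi                                  # argument (n - t/dt)*pi
    kernel = np.exp(1j*a/N) * np.sin(a) / (N * np.sin(a/N))
    print(np.sum(x * kernel).real)                          # -0.2242, as in the example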
In order to prove the sampling theorem for periodic signals, write the signal x(t) in the form of the Fourier series expansion

x(t) = Σ_{k=−k_m}^{k_m} X_k e^{j2πkt/T}. (3.21)

Using N samples of the signal x(t) within the period (assuming that N is an odd number), that is, by sampling the signal at the instants n∆t = nT/N, we get

x(n∆t) = Σ_{k=−k_m}^{k_m} X_k e^{j2πkn/N}.
Multiplying both sides by ∆t = T/N and using X_k = 0 for |k| > k_m, so that the summation limits can be extended to ±(N − 1)/2, gives

x(n∆t)∆t = (T/N) Σ_{k=−k_m}^{k_m} X_k e^{j2πkn/N} = (T/N) Σ_{k=−(N−1)/2}^{(N−1)/2} X_k e^{j2πkn/N}.

With x(n∆t)∆t = x(n) and TX_k = X(k), this form of the Fourier series reduces to the DFT and the inverse DFT,

x(n) = (1/N) Σ_{k=−(N−1)/2}^{(N−1)/2} X(k) e^{j2πkn/N},   X(k) = Σ_{n=0}^{N−1} x(n) e^{−j2πkn/N}.
Substituting the Fourier series coefficients X_k, expressed in terms of X(k) and x(n), into the signal (3.21), with k_m = (N − 1)/2, we get

x(t) = (1/T) Σ_{k=−(N−1)/2}^{(N−1)/2} ( Σ_{n=0}^{N−1} x(n) e^{−j2πkn/N} ) e^{j2πkt/T}
= (1/N) Σ_{n=0}^{N−1} x(n∆t) Σ_{k=−(N−1)/2}^{(N−1)/2} e^{j2πk(t/T − n/N)}
= (1/N) Σ_{n=0}^{N−1} x(n∆t) e^{−j2π(t/T − n/N)(N−1)/2} (1 − e^{j2π(t/T − n/N)N}) / (1 − e^{j2π(t/T − n/N)})
= Σ_{n=0}^{N−1} x(n∆t) sin[π(t − n∆t)/∆t] / (N sin[π(t − n∆t)/(N∆t)]).
This is the reconstruction formula that can be used to calculate x (t) for any t, based on the
signal samples x (n∆t) at the instants n∆t, with ∆t < π/Ωm = 1/(2 f m ).
In a similar way, the reconstruction formula for an even number of samples N can be
obtained.
The sampling theorem reconstruction formula for aperiodic signals follows as a special case as N → ∞, since for a small argument

sin[π(t − n∆t)/(N∆t)] → π(t − n∆t)/(N∆t)

and

x(t) → Σ_{n=−∞}^{∞} x(n∆t) sin[π(t − n∆t)/∆t] / (π(t − n∆t)/∆t).
Example 3.8. For a signal x(t) whose period is T, it is known that the signal has components corresponding to the nonzero Fourier series coefficients at the indices k_1, k_2, . . . , k_K. What is the minimum number of signal samples needed to reconstruct the signal? What condition should the sampling instants and the frequencies satisfy for the reconstruction?
⋆ The signal x(t) can be reconstructed using the Fourier series (1.22). In the calculation, a finite number of K nonzero terms will be used,

x(t) = Σ_{m=1}^{K} X_{k_m} e^{j2πk_m t/T}.

Since there are K unknown values X_{k_1}, X_{k_2}, . . . , X_{k_K}, the minimum number of equations needed to calculate them is K. The equations are written for K time instants,

Σ_{m=1}^{K} X_{k_m} e^{j2πk_m t_i/T} = x(t_i), for i = 1, 2, . . . , K,

or

ΦX = y,   X = Φ^{−1} y,

where

Φ = [ e^{j2πk_1 t_1/T}  e^{j2πk_2 t_1/T}  ···  e^{j2πk_K t_1/T}
      e^{j2πk_1 t_2/T}  e^{j2πk_2 t_2/T}  ···  e^{j2πk_K t_2/T}
      ⋮                 ⋮                      ⋮
      e^{j2πk_1 t_K/T}  e^{j2πk_2 t_K/T}  ···  e^{j2πk_K t_K/T} ].

The reconstruction condition is det(Φ) ≠ 0 for the selected time instants t_i and the given frequency indices k_i.
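In a program, this reconstruction amounts to forming Φ and solving a K × K linear system. A small NumPy sketch follows; the indices, instants, and coefficient values are our own test data, not taken from the book.

    import numpy as np

    T = 1.0
    k_idx = np.array([1, 3, 7])                  # indices of nonzero coefficients (assumed)
    X_true = np.array([1.0, 0.5 - 0.2j, 0.1j])   # assumed coefficient values
    t = np.array([0.05, 0.41, 0.83])             # K sampling instants
    Phi = np.exp(2j*np.pi*np.outer(t, k_idx)/T)  # matrix of e^{j2 pi k_m t_i / T}
    y = Phi @ X_true                             # signal samples x(t_i)
    X_rec = np.linalg.solve(Phi, y)              # valid when det(Phi) != 0
    print(np.allclose(X_rec, X_true))            # True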
Analysis and estimation of the frequency and amplitude of pure sinusoidal signals is of great importance in many applications.
Consider a simple continuous-time sinusoidal signal

x(t) = A e^{jΩ_0 t},

whose Fourier transform is X(Ω) = 2πAδ(Ω − Ω_0). The whole signal energy is concentrated at just one frequency point, Ω = Ω_0. Obviously, the position of the maximum is equal
to the signal frequency. For this operation we will use the notation

Ω_0 = arg max_{−∞<Ω<∞} |X(Ω)|. (3.23)
Assume that the signal x(t) is sampled with the sampling interval ∆t. The discrete-time form of this signal is

x(n) = A e^{jω_0 n} ∆t,

where ω_0 = Ω_0∆t. In order to compute the DFT of this signal, we will assume a value of N and calculate

X(k) = Σ_{n=0}^{N−1} A e^{jω_0 n} e^{−j2πnk/N} ∆t.
In general, the DFT is of the form

X(k) = A Σ_{n=0}^{N−1} e^{jω_0 n} e^{−j2πnk/N} ∆t = A ∆t (1 − e^{jω_0 N} e^{−j2πk}) / (1 − e^{jω_0} e^{−j2πk/N}) (3.24)
= A e^{j(N−1)(ω_0 − 2πk/N)/2} ∆t sin(N(ω_0 − 2πk/N)/2) / sin((ω_0 − 2πk/N)/2). (3.25)
Two characteristic cases are possible.
1. If the signal frequency is on the frequency grid, that is, ω_0 = 2πk_0/N for an integer k_0, then

X(k) = A Σ_{n=0}^{N−1} e^{j2πk_0 n/N} e^{−j2πnk/N} ∆t = NAδ(k − k_0)∆t. (3.27)

Obviously, in this case we can find the signal frequency index from the position of the DFT maximum, k_0 = arg max_k |X(k)|, and the amplitude as

A = X(k_0) / (N∆t).
2. In reality, the signal period (or Ω_0) is not known in advance (if we knew it, this analysis would not be needed). So, it is highly unlikely to have the previous case, with the frequency on the grid, Ω_0 = 2πk_0/(N∆t), as in Fig. 3.14, top row. More common is the case illustrated in Fig. 3.14, bottom row, when the true signal frequency does not correspond to any DFT sample position. Then the simple signal of sinusoidal form produces the DFT components at all frequencies, since |X(k)| in (3.26) is not zero for any k. This effect, that a simple sinusoidal signal produces nonzero DFT values at all frequencies (Fig. 3.14, bottom row), is known as the leakage effect.
Figure 3.14 Sinusoid x (n) = cos(8πn/64) and its DFT with N = 64 (top row) and sinusoid x (n) =
cos(8.8πn/64) and its DFT absolute value, with N = 64 (bottom row).
When the frequency is not on the grid, the detected maximum position k̂_0 produces the frequency estimate 2πk̂_0/(N∆t), with the estimation error

e = Ω_0 − (2π/(N∆t)) k̂_0.

The estimation error could be up to a half of the discretization interval ∆Ω = 2π/(N∆t),

−π/(N∆t) ≤ e < π/(N∆t)   and   (2π/(N∆t))k̂_0 − π/(N∆t) ≤ Ω_0 < (2π/(N∆t))k̂_0 + π/(N∆t). (3.29)
Two ways to improve the estimation will be described here.
1. The simplest way to reduce the estimation error is to increase the number of samples and
to reduce the discretization interval in frequency ∆Ω = 2π/( N∆t). This could be achieved
by appropriate zero-padding in the time domain, before the DFT calculation (corresponding
to the interpolation in the frequency domain). This way of error reduction increases the
calculation complexity.
2. The other way is based on applying a window function in the DFT calculation,

X(k) = Σ_{n=0}^{N−1} w(n) A e^{jω_0 n} e^{−j2πnk/N} ∆t = A W(e^{j(2πk/N − ω_0)}) ∆t,

where W(e^{jω}) is the Fourier transform of the window function. Windows, like for example the Hann(ing) or Hamming window, smooth the transition and reduce the discontinuities at the ending calculation points that cause leakage. A simple realization uses, for example, the Hann(ing) window (relation (2.31) and Fig. 2.7),

w(n) = (1/2)[1 − cos(2nπ/N)] [u(n) − u(n − N − 1)].
The DFT of the windowed signal is

X_H(k) = Σ_{n=0}^{N−1} (1/2)[1 − cos(2nπ/N)] A e^{jω_0 n} e^{−j2πnk/N} ∆t
= (1/2) Σ_{n=0}^{N−1} [1 − (1/2)e^{j2nπ/N} − (1/2)e^{−j2nπ/N}] A e^{jω_0 n} e^{−j2πnk/N} ∆t
= (1/2)[X_R(k) − (1/2)X_R(k − 1) − (1/2)X_R(k + 1)],
where X_R(k) would be the DFT if the rectangular window were used; it is defined by (3.24). The DFTs of sinusoids on the grid and outside of the grid, multiplied by a Hann(ing) window, are shown in Fig. 3.15. The leakage effect is reduced. However, the DFT is spread over two additional consecutive samples even in the case when the frequency is on the DFT grid, Fig. 3.15 (top). In this case the amplitude is estimated as

A = (|X(k_0)| + |X(k_0 + 1)| + |X(k_0 − 1)|) / (N∆t).
This method is more efficient for the leakage effect reduction than for the improvement in the
frequency estimation. However, the idea of using a few neighboring samples in the parameters
estimation will be used next to define an approach for accurate frequency estimation.
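The leakage reduction by a Hann(ing) window can be verified numerically; the following short NumPy check (ours, not from the book) uses the off-grid sinusoid of Figs. 3.14 and 3.15:

    import numpy as np

    N = 64
    n = np.arange(N)
    x = np.cos(8.8*np.pi*n/64)                 # frequency between two DFT bins
    w = 0.5*(1 - np.cos(2*np.pi*n/N))          # Hann(ing) window
    X  = np.abs(np.fft.fft(x))                 # rectangular-window DFT
    XH = np.abs(np.fft.fft(x*w))               # Hann-windowed DFT
    print(X[30], XH[30])                       # far from the peak, XH is much smaller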
3.7.2 Displacement
The maximum DFT value and its relation to a few surrounding values of the windowed DFT are used to calculate a correction, the displacement bin, for the estimated frequency.
Figure 3.15 Sinusoid x (n) = cos(8πn/64) multiplied by a Hann(ing) window and its DFT with N = 64 (top
row) and sinusoid x (n) = cos(8.8πn/64) multiplied by a Hann(ing) window and its DFT absolute value, with
N = 64 (bottom row).
For a given window function it is possible to derive the exact displacement formula for the shift of the true maximum position with respect to the detected maximum position. However, instead of deriving an exact formula for every window form, we will present an approach that combines interpolation and a general fitting polynomial form. It can be used with any window.
We can always interpolate the DFT values X(k) (by appropriate zero-padding of the signal x(n)), so that there are several DFT samples within the main lobe. Then, for any symmetric window, we can approximate the Fourier transform around the maximum by a quadratic function (in the analog domain, X(Ω) = aΩ^2 + bΩ + c). Since there are three parameters, a, b, and c, in this approximation, we need three Fourier transform values to calculate them. Let us denote the largest sample of the Fourier transform, following from

k̂_0 = arg max_{0≤k≤N−1} |X(k)|,

by

X_0 = |X(k̂_0)|,

and the two neighboring Fourier transform samples by X_{−1} = |X(k̂_0 − 1)| and X_1 = |X(k̂_0 + 1)|. With these bins treated as the points −1, 0, and 1 and the transform values at these points being denoted by X_{−1}, X_0, and X_1, we have the Lagrange second-order polynomial, Fig. 3.16,

X(k̂_0 + d) = X_0 + d (X_1 − X_{−1})/2 + d^2 (X_1 − 2X_0 + X_{−1})/2.

This function reaches its maximum at ∂X(k̂_0 + d)/∂d = 0, resulting in the displacement bin for the frequency correction

d = (X_1 − X_{−1}) / (2(2X_0 − X_1 − X_{−1})),

with the frequency as in (3.32). The displacement calculation is illustrated in Fig. 3.16. Thus,
Figure 3.16 Illustration of the displacement bin correction for a true maximum position calculation based on the
three neighboring values (full range – left and zoomed graph – right) .
Ω_0 = (2π/(N∆t))(k̂_0 + d) (3.32)

for 0 ≤ k̂_0 ≤ N/2 − 1, and Ω_0 = (2π/(N∆t))((k̂_0 + d) − N) for N/2 ≤ k̂_0 ≤ N − 1.
Example 3.9. The sinusoidal signal x (t) = A exp( jΩ0 t) is sampled with a sampling interval
∆t = 1/128 and N0 = 64 samples are considered. Prior to the DFT calculation, the signal is
zero-padded four times, up to N = 256. The DFT maximum is detected at the frequency index
position k̂0 = 95. The maximum DFT value is X (95) = 0.9936. Neighboring DFT values are
X (96) = 0.9432 and X (94) = 0.8470. Calculate the displacement bin d and estimate the value
of signal frequency Ω0 .
⋆ The displacement bin is

d = (X(96) − X(94)) / (2(2X(95) − X(96) − X(94))) = 0.2442.

The total number of samples in the DFT calculation was N = 4N_0 = 256, meaning that the value k̂_0 = 95 is within the first half of the samples (corresponding to a positive frequency Ω_0). Therefore, we can use (3.32) for the frequency calculation,

Ω_0 = (2π/(N∆t))(k̂_0 + d) = 95.2442π.

The true signal used in the simulation was x(t) = exp(j95.25πt)/64, with the estimation error e = (95.25 − 95.2442)π = 0.0058π. If only the position of the maximum were used, the estimated frequency would be 95π, with an error of 0.25π.
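The displacement-bin calculation of this example takes only a few lines (our own numerical check of the values above):

    import numpy as np

    X0, X1, Xm1 = 0.9936, 0.9432, 0.8470          # |X(95)|, |X(96)|, |X(94)|
    d = (X1 - Xm1) / (2*(2*X0 - X1 - Xm1))        # displacement bin
    N, dt = 256, 1/128
    Omega0 = 2*np.pi/(N*dt) * (95 + d)
    print(d, Omega0/np.pi)                        # 0.2442  95.2442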
It is possible to derive the exact displacement formula for some specific windows, based on their Fourier transform function; such a formula exists, for example, for the Hann(ing) window.
After the displacement is calculated, the signal can also be modulated by the displacement frequency shift in order to produce a signal with the frequency on the frequency grid. This is especially important if we expect that the signal contains much smaller higher-order harmonics that were masked by strong values of the dominant harmonic. If we detected that the k_0th harmonic is dominant and displaced by d, then this harmonic should be removed from the signal modulated by the resulting estimated frequency. The DFT of the new signal is used for the analysis of the second largest harmonic, and so on.
The DFT of a signal satisfies many desirable properties. Its calculation is simple and efficient using the FFT algorithm. In the DFT calculation, the periodic extension of the signal is assumed and embedded in the discrete transform. However, this periodic extension of the signal will, in general, introduce a significant signal change (corresponding to discontinuities in continuous time) at the period ending points, Fig. 3.17 (first and second row). This change (discontinuity) will significantly worsen the convergence of the DFT coefficients and increase the number of coefficients needed in the signal reconstruction for a given accuracy. In order to reduce the influence of this effect and to improve the convergence of the signal transform coefficients, the signal could be extended in an appropriate way.
The discrete cosine transform (DCT) and the discrete sine transform (DST) are used to analyze real-valued discrete-time signals, periodically extended to produce even or odd signal forms, respectively. However, this extension is not straightforward for discrete-time signals.
Consider a discrete-time signal of duration N, where x(n) takes nonzero values for 0 ≤ n ≤ N − 1. If we try a direct extension (using all signal values) and form a periodic signal

y(n) = x(n) for 0 ≤ n ≤ N − 1,   y(n) = x(2N − n − 1) for N ≤ n ≤ 2N − 1,

the obtained signal is not even, Fig. 3.17 (third row). It is obvious that y(n) does not satisfy the condition y(n) = y(−n) = y(2N − n), required for a real-valued DFT. The same holds for an odd extension, Fig. 3.17 (fourth row),

y(n) = x(n) for 0 ≤ n ≤ N − 1,   y(n) = −x(2N − n − 1) for N ≤ n ≤ 2N − 1.
One of our goals, to have a real-valued transform after the periodic extension of a real-valued signal, is not achieved. However, from Fig. 3.17 (third and fourth row) we can see that the signals y(n) are even (or odd) with respect to the vertical line at n = −1/2. Thus, if we add zeros between every sample of y(n) and assume that the position which was at n = −1/2 in the initial signal is the new coordinate origin, n = 0, in the new signal z(n), then these signals will be even and odd, respectively, Fig. 3.17 (last two rows). This is just one of the possible extensions that make the original discrete-time signal even (or odd). Several forms of the DCT and DST are defined, based on other ways of obtaining an even (odd) signal extension.
The most commonly used form of the DCT is the so-called DCT-II, or just DCT. If no form of the DCT is referred to in its name, then it is assumed that the DCT-II form is used. It will be presented here. The signal periodic extension for this transform corresponds to the one already described in Fig. 3.17. The DCT definition is

C(k) = Σ_{n=0}^{N−1} 2x(n) cos(2π(2n + 1)k/(4N)),   0 ≤ k ≤ N − 1.
This transform will be derived and explained next. There are two main advantages of this transform over the standard DFT calculation: the DCT coefficients are real-valued for a real-valued signal, and this transform can produce a better energy concentration than the DFT. In order to understand why a better energy concentration can be obtained, we will compare the DCT to the standard DFT,

X(k) = Σ_{n=0}^{N−1} x(n) e^{−j2πnk/N},   0 ≤ k ≤ N − 1,

which assumes the periodic extension of x(n) with the period N. In the DCT calculation, the signal is first extended by its mirrored version into y(n), as described above. This extension eliminates possible signal discontinuities at the period ending points. Thus, in general, the Fourier transform of such a signal will converge faster, requiring fewer coefficients in the reconstruction.
Figure 3.17 Illustration of a signal x (n), its periodic extension corresponding to the DFT, an even and odd
discrete-time signal extension corresponding to the DCT and DST of type II.
A zero value is then inserted between every pair of samples and an even signal z(n), with the period 4N, is formed as z(2n + 1) = y(n), z(2n) = 0. Its DFT is

X_C(k) = DFT{z(n)} = Σ_{n=0}^{4N−1} z(n) e^{−j2πnk/(4N)} = Σ_{n=0}^{2N−1} z(2n + 1) e^{−j2π(2n+1)k/(4N)}
= Σ_{n=0}^{2N−1} y(n) e^{−j2π(2n+1)k/(4N)} = Σ_{n=0}^{N−1} 2x(n) cos(2π(2n + 1)k/(4N)) = C(k).
Only N terms of the transform are used and the DCT values are obtained.
Since the basis functions are orthogonal, the inverse DCT is obtained by multiplying both sides of the DCT by w_k cos(2π(2m + 1)k/(4N)) and summing over 0 ≤ k ≤ N − 1,

Σ_{n=0}^{N−1} 2x(n) Σ_{k=0}^{N−1} w_k cos(2π(2n + 1)k/(4N)) cos(2π(2m + 1)k/(4N)) = Σ_{k=0}^{N−1} w_k C(k) cos(2π(2m + 1)k/(4N)),

where w_0 = 1/2 and w_k = 1 for k ≠ 0. Using the orthogonality relation

Σ_{k=0}^{N−1} w_k cos(2π(2n + 1)k/(4N)) cos(2π(2m + 1)k/(4N)) = (N/2) δ(m − n),

we get

x(n) = (1/N) Σ_{k=0}^{N−1} w_k C(k) cos(2π(2n + 1)k/(4N)). (3.34)
A symmetric relation, with the same coefficients in the time and frequency domain, is

C(k) = v_k Σ_{n=0}^{N−1} x(n) cos(2π(2n + 1)k/(4N)),
x(n) = Σ_{k=0}^{N−1} v_k C(k) cos(2π(2n + 1)k/(4N)),

where v_0 = √(1/N) and v_k = √(2/N) for k ≠ 0.
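The DCT definition and the extension-based derivation can be cross-checked numerically. The sketch below (ours; dct2_book is an assumed name) computes C(k) directly and from the 4N-point DFT of the zero-interleaved even extension z(n):

    import numpy as np

    def dct2_book(x):
        # C(k) = sum_n 2 x(n) cos(2 pi (2n+1) k / (4N)), as defined in the text
        N = len(x)
        n, k = np.arange(N), np.arange(N).reshape(-1, 1)
        return 2*np.cos(2*np.pi*(2*n + 1)*k/(4*N)) @ x

    x = np.random.randn(6)
    y = np.concatenate([x, x[::-1]])      # even extension y(n), period 2N
    z = np.zeros(4*len(x)); z[1::2] = y   # z(2n+1) = y(n), z(2n) = 0, period 4N
    print(np.allclose(dct2_book(x), np.fft.fft(z)[:len(x)].real))   # True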
In a similar way, the discrete sine transforms are defined. The most common form is the DST of type II (DST-II), whose definition is

S(k) = Σ_{n=0}^{N−1} 2x(n) sin(2π(2n + 1)(k + 1)/(4N)).

The corresponding odd extension y(n) is interleaved with zeros in the same way,

z(2n + 1) = y(n),   z(2n) = 0.
The DFT of this signal is

X_S(k) = Σ_{n=0}^{4N−1} z(n) e^{−j2πnk/(4N)} = Σ_{n=0}^{2N−1} y(n) e^{−j2π(2n+1)k/(4N)}
= Im{ Σ_{n=0}^{N−1} 2jx(n) sin(2π(2n + 1)k/(4N)) } = S(k),

with N terms of the transform being used. The DST is the imaginary part of this DFT.
Calculate its DFT with N = 32. Plot the periodic extension of the signal x (n). Plot its even
extension y(n). Calculate the DFT (the DCT) of such a signal and discuss the results.
⋆ Signal x(n), along with its extended versions and the corresponding transforms, is presented in Fig. 3.18. The better energy concentration in the DCT is due to the introduced symmetry in y(n). The artificial discontinuity in the DFT, which causes its slow convergence, is eliminated in the DCT.

[Figure 3.18: the signal x(n) and its DFT X(k) (first row); the repeated signal [x(n) x(n)] and its DFT X_2(k) (second row); the even extension y(n) and its DCT C(k) (third row).]
By using periodic extensions in the cosine transform, the convolution property of the DFT is lost. Thus, this kind of transform may be used for signal reconstruction and compression, but not in the realization of discrete systems, unless the transform values are properly related to the corresponding DFT values (see Problem 3.10).
Calculate its DFT with N = 32. Plot the periodic extension of this signal. Plot the even and odd extensions y(n) of x(n). Calculate the DCT and DST. Comment on the results.
⋆The signal x (n), with its periodic extensions yc (n) and ys (n), corresponding to the DFT,
DCT, and DST, respectively, is presented in Fig. 3.19(left), as x (n), [ x (n) x (n)], yc (n), and
ys (n), respectively. The corresponding transforms are shown in the right panels of this figure.
Note that the convergence of the DFT and DCT is similar. Here the DST converges faster, since
its extension is "smoother".
Two discrete signal transforms that can be calculated without using multiplications will
be presented next. One of them will be used to explain the basic principle of the wavelet
transform calculation as well.
Let us consider a two-sample signal x(n), with N = 2. The corresponding two-sample DFT is

X(k) = Σ_{n=0}^{1} x(n) e^{−j2πnk/2} = x(0) + (−1)^k x(1).

It can be calculated without using multiplications, X(0) = x(0) + x(1) and X(1) = x(0) − x(1). Now we can show that it is possible to define basis functions for any signal duration in such a way that multiplications are not used in the signal transformation. These transform values will be denoted by H(k). For the two-sample signal case,

H(0) = x(0) + x(1),
H(1) = x(0) − x(1).
The whole frequency interval is represented by a low-frequency value X(0) and a high-frequency value X(1). In matrix form,

[ H(0) ]   [ 1   1 ] [ x(0) ]
[ H(1) ] = [ 1  −1 ] [ x(1) ].   (3.35)
Figure 3.19 Signal and its periodic extensions, corresponding to: the DFT (second row), the cosine transform
(third row), and the sine transform (fourth row). Positive frequencies for the DFT are shown.
Example 3.12. For the signal shown in Fig. 3.20, calculate the two-sample DFT for every pair of signal samples.
⋆The values of lowpass part, Hn (0) = y L (n), and the highpass part, Hn (1) = y H (n), are
calculated and are presented in Fig. 3.20. The signal y L (n) is a low-frequency and smoothed
version of the original signal, while the signal y H (n) contains the details that are lost in the
smoothed version y L (n).
Figure 3.20 Original signal x (n) and its two-sample lowpass part y L (n) and highpass part y H (n).
The original signal values may easily be reconstructed from H_n(0) = y_L(n) and H_n(1) = y_H(n) as

[ x(2n)     ]         [ 1   1 ] [ H_n(0) ]
[ x(2n + 1) ] = (1/2) [ 1  −1 ] [ H_n(1) ]

for 0 ≤ n ≤ N/2 − 1.
In some cases the smoothed version y_L(n), with half of the samples of the original signal (Fig. 3.20), is quite a good representative of the original signal, so there is no need to use the corrections. Note that for many instants the correction is zero as well. This result can be used as a basis for signal compression, when the signal is presented with a reduced set of samples, with no significant distortion.
There are two possibilities to continue and apply the two-point DFT scheme to a signal
with N samples:
• The first one consists in splitting further both existing (lowpass and highpass) signals,
y L (n) and y H (n), into their corresponding lowpass and highpass parts. This scheme
leads to the discrete Walsh-Hadamard transform, shown in Fig. 3.21 for the signal
x (n) from Fig. 3.20.
• In the second case, the splitting is done for the lowpass part, y L (n), only, while the
highpass correction, y H (n), is kept unchanged. This scheme leads to the Haar wavelet
transform, Fig. 3.22.
Figure 3.21 Illustration of the procedure leading to the Walsh-Hadamard transform calculation.
Let us continue the idea of splitting both (lowpass and highpass) parts of the signal and define a transformation of a four-sample signal. For this signal, form two auxiliary two-sample signals y_L(n) and y_H(n) as

y_L(n) = x(2n) + x(2n + 1),   y_H(n) = x(2n) − x(2n + 1),   for n = 0, 1.

They represent the low-frequency and high-frequency parts of the pairs x(0), x(1) and x(2), x(3) of two-sample signals. The lowpass part of the auxiliary two-sample lowpass signal y_L(n) is

H(0) = y_L(0) + y_L(1) = x(0) + x(1) + x(2) + x(3).

The highpass part of the auxiliary two-sample lowpass signal y_L(n) is

H(1) = y_L(0) − y_L(1) = x(0) + x(1) − x(2) − x(3),

and, in the same way, the lowpass and highpass parts of y_H(n) are H(2) = y_H(0) + y_H(1) and H(3) = y_H(0) − y_H(1).
Figure 3.22 Illustration of the procedure leading to the Haar wavelet transform calculation.
By replacing the values of y_L(n) and y_H(n) with the signal values x(n), we get the transformation equation

[ H(0) ]   [ 1   1   1   1 ] [ x(0) ]        [ x(0) ]
[ H(1) ]   [ 1   1  −1  −1 ] [ x(1) ]        [ x(1) ]
[ H(2) ] = [ 1  −1   1  −1 ] [ x(2) ] = T_4  [ x(2) ] ,   (3.40)
[ H(3) ]   [ 1  −1  −1   1 ] [ x(3) ]        [ x(3) ]
where ⊗ denotes the Kronecker multiplication of the two submatrices of T_2 (its rows) with T_2, defined by (3.36). The notation T_2(i, :) is used for the ith row of T_2. The transformation matrix of order N is obtained by a Kronecker product of the N/2-order transformation matrix rows and T_2,

T_N = [ T_2 ⊗ [T_{N/2}(1, :)]
        T_2 ⊗ [T_{N/2}(2, :)]
        ⋮
        T_2 ⊗ [T_{N/2}(N/2, :)] ].   (3.42)
In this way, although we started from a two-point DFT, in splitting the frequency domain, we did
not obtain the Fourier transform of a signal, but a form of the Walsh-Hadamard transform. In ordering
the coefficients (matrix rows) in our example, we followed the frequency region order from the Fourier
domain (for example, in the four-sample case, low-low, low-high, high-low, and high-high frequency
region).
Three ways of ordering transform coefficients in the Walsh-Hadamard transform (ordering of
transformation matrix rows) are used. They produce the same result with different orderings of the
coefficients and different recursive formulae for constructing the transformation matrices. The presented
way of ordering coefficients, as in (3.41), is known as the Walsh transform with dyadic ordering. It
will be used in examples and denoted as the Walsh-Hadamard transform.
The Hadamard transform would correspond to the so-called natural ordering of rows from the transformation matrix T_8,

H_8 = [ 1   1   1   1   1   1   1   1
        1  −1   1  −1   1  −1   1  −1
        1   1  −1  −1   1   1  −1  −1
        1  −1  −1   1   1  −1  −1   1
        1   1   1   1  −1  −1  −1  −1
        1  −1   1  −1  −1   1  −1   1
        1   1  −1  −1  −1  −1   1   1
        1  −1  −1   1  −1   1   1  −1 ].
It would correspond to [ H (0), H (4), H (2), H (6), H (1), H (5), H (3), H (7)] T order of coefficients
in the Walsh transform with dyadic ordering (3.41).
Recursive construction of the Hadamard transform matrix H_{2N} is easy, using the Kronecker product of T_2, defined by (3.36), and H_N:

H_{2N} = T_2 ⊗ H_N = [ H_N   H_N
                       H_N  −H_N ].
The following order, [H(0), H(1), H(3), H(2), H(6), H(7), H(5), H(4)]^T in (3.41), would correspond to a Walsh transform with sequency ordering.
Calculation of the Walsh-Hadamard transforms requires only additions. For an N-order transform
the number of additions is ( N − 1) N.
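The recursive Kronecker construction (3.42) of the dyadic-ordered transformation matrix can be sketched as follows (our own NumPy code; walsh_dyadic is an assumed name):

    import numpy as np

    def walsh_dyadic(N):
        # builds T_N row-block by row-block as in (3.42); N must be a power of 2
        T2 = np.array([[1, 1], [1, -1]])
        T = T2.copy()
        while T.shape[1] < N:
            T = np.vstack([np.kron(T2, T[i:i+1, :]) for i in range(T.shape[0])])
        return T

    print(walsh_dyadic(4))   # rows ordered low-low, low-high, high-low, high-high

A Walsh-Hadamard transform is then simply H = walsh_dyadic(N) @ x, which in principle requires only additions and subtractions.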
Consider again two pairs of signal samples, x(0), x(1) and x(2), x(3). The high-frequency parts of these pairs are calculated as y_H(n) = x(2n) − x(2n + 1), for n = 0, 1. They are used in the Haar transform without any further modification. Since they represent the highpass Haar transform coefficients, they will be denoted by W(2) = y_H(0) = x(0) − x(1) and W(3) = y_H(1) = x(2) − x(3). The lowpass coefficients of these pairs are y_L(0) = x(0) + x(1) and y_L(1) = x(2) + x(3). The highpass and lowpass parts of these signals are calculated as y_LH(0) = [x(0) + x(1)] − [x(2) + x(3)] and y_LL(0) = [x(0) + x(1)] + [x(2) + x(3)]. For a four-sample signal the transformation ends here, with W(1) = y_LH(0) and W(0) = y_LL(0). Note that the order of the coefficients is such that the lowest frequency coefficient corresponds to the transform index k = 0. The matrix form of the transform for a four-sample signal is

[ W(0) ]   [ 1   1   1   1 ] [ x(0) ]
[ W(1) ]   [ 1   1  −1  −1 ] [ x(1) ]
[ W(2) ] = [ 1  −1   0   0 ] [ x(2) ] .
[ W(3) ]   [ 0   0   1  −1 ] [ x(3) ]
For an eight-sample signal, the highpass coefficients would be kept without further modification at every step (scale), while for the lowpass parts of the signal their highpass and lowpass parts would be calculated. The transformation matrix in the case of a signal with eight samples is

[ W(0) ]   [ 1   1   1   1   1   1   1   1 ] [ x(0) ]
[ W(1) ]   [ 1   1   1   1  −1  −1  −1  −1 ] [ x(1) ]
[ W(2) ]   [ 1   1  −1  −1   0   0   0   0 ] [ x(2) ]
[ W(3) ]   [ 0   0   0   0   1   1  −1  −1 ] [ x(3) ]
[ W(4) ] = [ 1  −1   0   0   0   0   0   0 ] [ x(4) ] .   (3.43)
[ W(5) ]   [ 0   0   1  −1   0   0   0   0 ] [ x(5) ]
[ W(6) ]   [ 0   0   0   0   1  −1   0   0 ] [ x(6) ]
[ W(7) ]   [ 0   0   0   0   0   0   1  −1 ] [ x(7) ]
This is the Haar transform, or the Haar wavelet transform, of a signal with eight samples.
The Haar transform is useful in the analysis of signals where we expect a slowly varying signal with only a few details.
The Haar wavelet transform is computationally very efficient. The efficiency comes from the fact that the Haar wavelet transform almost does not transform the signal at high frequencies. It leaves it almost as it is, using a very simple two-sample transform. For lower frequencies, the number of operations is increased.
Specifically, for the highest N/2 coefficients, the Haar transform does only one addition (of two signal values) for every coefficient. For the next N/4 coefficients, the Haar wavelet uses 4 signal values with 3 additions, and so on. The total number of additions for the Haar transform is

N_additions = (N/2)(2 − 1) + (N/4)(4 − 1) + (N/8)(8 − 1) + ··· + (N/N)(N − 1).

For N of the form N = 2^m we can write

N_additions = N log_2 N − N(1/2 + 1/2^2 + 1/2^3 + ··· + 1/2^m)
= N log_2 N − N (1/2)(1 − 1/2^m)/(1 − 1/2) = N log_2 N − (N − 1) = N[log_2 N − 1] + 1.
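The fast Haar calculation described above (keep the pairwise differences, recurse on the pairwise sums) can be sketched in a few lines of NumPy (ours; the coefficient ordering follows (3.43), lowest frequency first):

    import numpy as np

    def haar_transform(x):
        x = np.asarray(x, dtype=float)
        if len(x) == 1:
            return x
        low  = x[0::2] + x[1::2]                    # y_L(n): pairwise sums
        high = x[0::2] - x[1::2]                    # y_H(n): kept as coefficients
        return np.concatenate([haar_transform(low), high])

    x = np.array([2, 2, 12, -8, 2, 2, 2, 2, -3, -3, -3, -3, 3, -9, -3, -3])
    print(haar_transform(x))                        # most coefficients are zero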
Consider the signal

x(n) = [2, 2, 12, −8, 2, 2, 2, 2, −3, −3, −3, −3, 3, −9, −3, −3].

Calculate its Haar and Walsh-Hadamard transforms with N = 16. Discuss the results.
Figure 3.23 Signal x (n) and its discrete Haar transform H (k ). Reconstructed signals: using H (0) presented
by x0 (n), using two coefficients H (0) and H (1) denoted by x0−1 (n), using H (0), H (1), and H (9) denoted by
x0−1,9 (n), and using H (0), H (1), H (9), and H (14) denoted by x0−1,9,14 (n). Vertical axes scales for the signal
and transform are different.
⋆ In full analogy with (3.43), the Haar transformation matrix of order N = 16 is formed; the higher coefficients are just two-sample signal transforms.
Although there are some short-duration pulses (x(2), x(3), x(13)), the Haar transform coefficients W(2), W(3), . . . , W(8), W(10), W(11), W(12), W(13), W(15) are zero-valued, Fig. 3.23. This is the result of the Haar transform property to decompose the high-frequency signal region into short-duration (two-sample) basis functions. A short-duration pulse is contained in the high-frequency part of only one Haar coefficient. This is not the case in the Fourier transform (or the Walsh-Hadamard transform), where a single delta pulse will cause all coefficients to be nonzero, Fig. 3.24. The transformation matrix T_16 for the Walsh-Hadamard transform is obtained from T_8 using (3.42).
The property that the high-frequency coefficients are well localized in the time domain and that they represent short-duration signal components is used in image compression, where adding high-frequency coefficients adds details to an image, with the important property that one detail in the image corresponds to one (or a few) nonzero coefficients. Reconstruction of the signal from the Haar transform, using various numbers of coefficients, is presented in Fig. 3.23. As explained, it can be considered as "zooming" into the signal toward the details as the higher frequency coefficients are added. Since half of the coefficients are zero-valued, a significant compression ratio can be achieved by storing or transmitting the nonzero coefficients only. This is the basic idea behind multiresolution wavelet-based image representations and compression.
[Figure 3.24: a signal x(n) containing a short pulse and its transform H(k), with all coefficients nonzero.]
⋆ The Haar wavelet transform and the Walsh-Hadamard transform are shown in Fig. 3.25. We can see that, for a signal of long duration with high frequencies, the number of nonzero coefficients in the Haar wavelet transform is large. Just one such component in the Walsh-Hadamard transform can require half of the available coefficients in the Haar wavelet transform, Fig. 3.25 (left). In addition, the fact that a much smaller number of coefficients is used for the Walsh-Hadamard transform-based reconstruction, as compared to a very large number of coefficients in the Haar wavelet transform reconstruction, may annul the Haar transform's computational complexity advantage in this case.
Figure 3.25 The Haar wavelet transform (second row) and the Walsh-Hadamard transform (third row) for high
frequency long duration signals (first row). Vertical axes scales for the signal and transform are different.
3.10 PROBLEMS
Problem 3.2. If the signals g(n) and f(n) are real-valued, show that their DFTs, G(k) and F(k), can be obtained from the DFT Y(k) of the signal y(n) defined as y(n) = g(n) + j f(n).
Problem 3.3. The frequency of a continuous-time signal is related to the DFT index according to

Ω = 2πk/(N∆t) for 0 ≤ k ≤ N/2 − 1,
Ω = 2π(k − N)/(N∆t) for N/2 ≤ k ≤ N − 1.

This mapping is achieved in programs using shift functions. Show that the shift will not be necessary if we use the signal x(n)(−1)^n. The DFT values of this signal will be ordered from the one corresponding to the lowest negative frequency toward the highest positive frequency.
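A short NumPy check of this statement (ours): multiplying by (−1)^n reorders the DFT exactly as the usual shift function does.

    import numpy as np

    N = 16
    x = np.random.randn(N)
    X1 = np.fft.fft(x * (-1.0)**np.arange(N))                # DFT of x(n)(-1)^n
    print(np.allclose(X1, np.fft.fftshift(np.fft.fft(x))))   # True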
Problem 3.4. If the DFT of a signal x(n), with the period N, is X(k), find the DFTs of the signals

y(n) = x(n) for n = 2m,   y(n) = 0 for n = 2m + 1,

and

z(n) = 0 for n = 2m,   z(n) = x(n) for n = 2m + 1.
Problem 3.5. Find the convolution of the signals x(n) and h(n), whose nonzero values are x(0) = 1, x(1) = −1 and h(0) = 2, h(1) = −1, h(2) = 2, using their DFTs and the inverse DFT of the resulting product, that is, x(n) ∗_n h(n) = IDFT{DFT{x(n)} DFT{h(n)}}.
Problem 3.6. Find the circular convolution of the signals x (n) = e j4πn/N + sin(2πn/N ) and
h(n) = cos(4πn/N ) + e j2πn/N within the common period for both signals.
Problem 3.7. Find the signal whose DFT is Y (k) = | X (k)|2 and X (k) is the DFT of the signal
x (n) = u(n) − u(n − 3), calculated with the period N = 10.
Problem 3.8. What is the relation between the discrete Hartley transform (DHT) of a real-valued signal x(n), defined by

H(k) = Σ_{n=0}^{N−1} x(n) [cos(2πnk/N) + sin(2πnk/N)],

and the DFT of the same signal? Express the DHT in terms of the DFT and the DFT in terms of the DHT.
Problem 3.9. Show that the DCT of a signal x(n) with N samples, defined by

C(k) = Σ_{n=0}^{N−1} 2x(n) cos((2πk/(2N))(n + 1/2)),

can be calculated using the DFT of a signal y(n) of the same length N, obtained by reordering the samples of x(n).
Problem 3.10. A real-valued signal x(n) of a duration shorter than N, defined for 0 ≤ n ≤ N − 1, has the DFT X(k). The signal y(n) is formed as

y(n) = 2x(n) for 0 ≤ n ≤ N − 1,   y(n) = 0 for N ≤ n ≤ 2N − 1, (3.44)

with the DFT denoted by Y(k). The signal z(n) is formed using y(n) as

z(2n + 1) = y(n),   z(2n) = 0.
(a) What are the real and imaginary parts of Z(k) = DFT{z(n)}? How are they related to the DCT and the DST of the signal x(n)? (b) The signal x(n) is applied as an input to a linear impulse-invariant system with the impulse response h(n), such that h(n) is of a duration shorter than N, defined within 0 ≤ n ≤ N − 1, and x(n) ∗_n h(n) is also within the same interval, 0 ≤ n ≤ N − 1. The DCT of the output signal is calculated. How is the DCT of the output signal related to the DCT and DST of the input signal x(n)?
Problem 3.11. Consider a signal x(n) whose duration is N, with nonzero values within the interval 0 ≤ n ≤ N − 1. Define the system with the output

y_k(n + (N − 1)) = Σ_{m=0}^{N−1} x(n + m) e^{−j2πmk/N},

so that its value y_k(N − 1), at the last instant of the signal duration, is equal to the DFT of the signal for a given k,

y_k(N − 1) = Σ_{m=0}^{N−1} x(m) e^{−j2πmk/N} = DFT{x(n)} = X(k).

Note that the system is causal, since y_k(n) uses only x(n) at the instant n and previous instants. Show that the output signal y_k(n) is related to the previous output value y_k(n − 1) by the equation

y_k(n) = e^{j2πk/N} [y_k(n − 1) + x(n) − x(n − N)].
Problem 3.12. Show that the discrete Hartley transform (DHT) coefficients of a signal x (n) with an
even number of samples N can be calculated, for an even frequency index k = 2r, using the DHTs with
N/2 samples (fast DHT calculation).
Problem 3.13. Find the DFT of the signal x(n) = exp(j4π√3 n/N), for n = 0, 1, . . . , N − 1, with N = 16. If the DFT is interpolated four times (the signal zero-padded), find the displacement bin, estimate the signal frequency, and compare it with the true frequency value. What is the displacement bin if the general formula is applied without the interpolation?
3.11 EXERCISE
Exercise 3.1. Find the DFT of the signal x (n) = δ(n) − δ(n − 3) with the assumed periods N = 4
and N = 8.
Exercise 3.2. Calculate the DFT of the signal x (n) = sin(nπ/4) for 0 ≤ n < N with N = 8 and
N = 16.
Exercise 3.3. For a real-valued signal x (n), the DFT is calculated with N = 8 and the following DFT
values are known: X (0) = 1, X (2) = 2 − j, X (4) = 2, X (5) = j, X (7) = 3. Find the remaining DFT
values. What are the values of x (0) and ∑7n=0 x (n)?
Exercise 3.4. Signal x (n) is presented in Fig. 3.26. Find X (0), X (4), and X (8), where X (k) is the
DFT of the signal x (n) calculated with the period N = 16.
Exercise 3.5. Prove that the DFT value X ( N/2) is real-valued for an arbitrary real-valued signal
x (n), defined for 0 ≤ n < N, where N is an even integer.
Exercise 3.6. Consider the signal x (n) whose DFT values X (k), calculated with N = 16, are presented
in Fig. 3.27.
[Figure 3.26: the signal x(n), 0 ≤ n ≤ 15.]
[Figure 3.27: the DFT values X(k), 0 ≤ k ≤ 15.]
Exercise 3.7. Prove that if |x(n)| ≤ A for 0 ≤ n < N, then |X(k)| ≤ NA for any k, where X(k) is the DFT of x(n) calculated with N points.
Exercise 3.8. Prove that if Σ_{n=0}^{N−1} |x(n)| ≤ A, then Σ_{k=0}^{N−1} |X(k)| ≤ NA, where X(k) is the DFT of x(n) calculated with N samples.
3.12 SOLUTIONS
Solution 3.1. The DFT assumes that the signals are periodic. In order to calculate the DFT, we first have to assume a period for the considered aperiodic signals. The period N should be greater than or equal to the signal duration, so that the signal values do not overlap after their periodic extension. Larger values of N will increase the density of the frequency domain samples, but they will also increase the computation time.
a) For this signal, any N ≥ 1 is acceptable, producing

X(k) = 1, k = 0, 1, . . . , N − 1.
Solution 3.2. From the signal y(n) = g(n) + j f(n), its real and imaginary parts g(n) and f(n) can be obtained as

g(n) = (y(n) + y*(n))/2   and   f(n) = (y(n) − y*(n))/(2j).

Then the DFTs of the signals g(n) and f(n) are obtained from

G(k) = (Y(k) + Y*(N − k))/2   and   F(k) = (Y(k) − Y*(N − k))/(2j).
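This trick computes two real-signal DFTs with a single complex FFT. A minimal NumPy check (ours):

    import numpy as np

    N = 16
    g, f = np.random.randn(N), np.random.randn(N)
    Y = np.fft.fft(g + 1j*f)
    Yc = np.conj(Y[(-np.arange(N)) % N])      # Y*(N-k), with the index taken mod N
    G = (Y + Yc)/2
    F = (Y - Yc)/(2j)
    print(np.allclose(G, np.fft.fft(g)), np.allclose(F, np.fft.fft(f)))  # True True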
Solution 3.3. The DFT of the signal x(n)(−1)^n is

X_1(k) = Σ_{n=0}^{N−1} x(n)(−1)^n e^{−j2πnk/N} = Σ_{n=0}^{N−1} x(n) e^{−j2πn(k−N/2)/N} = X(k + N/2),

since (−1)^n = e^{jπn} and X(k) is periodic with the period N. The DFT values are therefore shifted by N/2, as required.
Solution 3.4. For the signal y(n) we may write

Y(k) = Σ_{n=0}^{N−1} y(n) e^{−j2πnk/N} = (1/2) Σ_{n=0}^{N−1} [x(n) + (−1)^n x(n)] e^{−j2πnk/N}
= (1/2) Σ_{n=0}^{N−1} [x(n) + x(n) e^{−jπn}] e^{−j2πnk/N} = (1/2)[X(k) + X(k + N/2)].

In the same way, Z(k) = (1/2)[X(k) − X(k + N/2)]. It is obvious that the DFT of the signal x(n) is equal to the sum of the DFTs of the signals y(n) and z(n),

Y(k) + Z(k) = X(k).
Solution 3.5. For the convolution calculation using the DFTs of signals, the minimum number for the period N is N = K + L − 1 = 4, where K = 2 is the duration of the signal x(n) and L = 3 is the duration of the impulse response h(n). With N = 4, we get

X(k) = 1 − e^{−j2πk/4},
H(k) = 2 − e^{−j2πk/4} + 2e^{−j4πk/4},
Y(k) = X(k)H(k) = 2 − 3e^{−j2πk/4} + 3e^{−j4πk/4} − 2e^{−j6πk/4},

so that the inverse DFT produces y(n) = {2, −3, 3, −2} for n = 0, 1, 2, 3.
Solution 3.6. The DFT of the circular convolution y(n) of the signals x(n) and h(n), y(n) = x(n) ∗ h(n), is equal to the product of the corresponding DFTs, Y(k) = X(k)H(k), with

X(k) = Σ_{n=0}^{N−1} [e^{j4πn/N} + (1/(2j))e^{j2πn/N} − (1/(2j))e^{−j2πn/N}] e^{−j2πnk/N}
= Nδ(k − 2) + (N/(2j))δ(k − 1) − (N/(2j))δ(k + 1)

and

H(k) = Σ_{n=0}^{N−1} [(1/2)e^{j4πn/N} + (1/2)e^{−j4πn/N} + e^{j2πn/N}] e^{−j2πnk/N}
= (N/2)δ(k − 2) + (N/2)δ(k + 2) + Nδ(k − 1).

The value of Y(k) is

Y(k) = (N^2/2)δ(k − 2) + (N^2/(2j))δ(k − 1).

The circular convolution is obtained as the inverse DFT of Y(k),

y(n) = (N/2)e^{j4πn/N} + (N/(2j))e^{j2πn/N}.
Solution 3.7. The DFT of the signal y(n), equal to Y(k) = |X(k)|^2, can be written as Y(k) = X(k)X*(k). The inverse DFT of this product is equal to the circular convolution of the individual inverse DFTs, that is,

y(n) = x(n) ∗_n IDFT{X*(k)}.

Since

IDFT{X*(k)} = (1/N) Σ_{k=0}^{N−1} X*(k) e^{j2πnk/N} = ( (1/N) Σ_{k=0}^{N−1} X(k) e^{−j2πnk/N} )*
= ( (1/N) Σ_{k=0}^{N−1} X(k) e^{j2πk(N−n)/N} )* = x*(N − n),

we get

y(n) = (x(n))_{10} ∗_n (x*(10 − n))_{10} = (u(n) − u(n − 3))_{10} ∗_n (u(10 − n) − u(7 − n))_{10}
= (δ(n + 2) + 2δ(n + 1) + 3δ(n) + 2δ(n − 1) + δ(n − 2))_{10},

where (x(n))_N indicates that the signal x(n) is periodically extended with the period N.
Solution 3.8. The DFT of the real-valued signal x(n) can be written as

X(k) = Σ_{n=0}^{N−1} [x(n) cos(2πnk/N) − jx(n) sin(2πnk/N)],
X(N − k) = Σ_{n=0}^{N−1} [x(n) cos(2πnk/N) + jx(n) sin(2πnk/N)].

From the previous equations, we can easily conclude that the following relations hold:

Σ_{n=0}^{N−1} x(n) cos(2πnk/N) = (X(k) + X(N − k))/2 = (H(k) + H(N − k))/2,
Σ_{n=0}^{N−1} x(n) sin(2πnk/N) = (X(N − k) − X(k))/(2j) = (H(k) − H(N − k))/2.
Solution 3.9. We can split the summation in the DCT into an even and an odd part,

C(k) = Σ_{n=0}^{N−1} 2x(n) cos((2πk/(2N))(n + 1/2))
= Σ_{n=0}^{N/2−1} 2x(2n) cos((2πk/(2N))(2n + 1/2)) + Σ_{n=0}^{N/2−1} 2x(2n + 1) cos((2πk/(2N))(2n + 1 + 1/2)).

By reverting the summation index in the second sum, using n = N/2 − 1 − m, the summation over m goes from m = N/2 − 1, for n = 0, down to m = 0, for n = N/2 − 1. Then

Σ_{n=0}^{N/2−1} 2x(2n + 1) cos((2πk/(2N))(2n + 1 + 1/2))
= Σ_{m=0}^{N/2−1} 2x(N − 2m − 1) cos((2πk/(2N))(N − 2m − 1 + 1/2)).

The summation index in this sum can be shifted by N/2 + m = n to get

Σ_{m=0}^{N/2−1} 2x(N − 2m − 1) cos((2πk/(2N))(N − 2m − 1 + 1/2))
= Σ_{n=N/2}^{N−1} 2x(2N − 2n − 1) cos((2πk/(2N))(2N − 2n − 1/2)).

Now we can go back to the DCT and replace the second sum, to get

C(k) = Σ_{n=0}^{N/2−1} 2x(2n) cos((2πk/(2N))(2n + 1/2))
+ Σ_{n=N/2}^{N−1} 2x(2N − 2n − 1) cos((2πk/(2N))(2n + 1/2)) = Σ_{n=0}^{N−1} y(n) cos((2πk/(2N))(2n + 1/2)),

where y(n) = 2x(2n) for 0 ≤ n ≤ N/2 − 1 and y(n) = 2x(2N − 2n − 1) for N/2 ≤ n ≤ N − 1. We can conclude that the relation between the DFT and the DCT is given by

C(k) = Re{ Σ_{n=0}^{N−1} y(n) e^{−j(2πk/(2N))(2n + 1/2)} } = Re{ e^{−jπk/(2N)} DFT{y(n)} }.
Solution 3.10. (a) The DFT of the signal z(n) is

DFT{z(n)} = Σ_{n=0}^{4N−1} z(n) e^{−j2πnk/(4N)} = Σ_{n=0}^{2N−1} z(2n + 1) e^{−j2π(2n+1)k/(4N)}
= Σ_{n=0}^{2N−1} y(n) e^{−j2π(2n+1)k/(4N)} = Σ_{n=0}^{N−1} 2x(n) e^{−j2π(2n+1)k/(4N)},

that is,

Z(k) = DFT{z(n)} = e^{−j2πk/(4N)} Σ_{n=0}^{N−1} 2x(n) e^{−j2πnk/(2N)} = e^{−jπk/(2N)} 2X(k/2).

Since e^{−j2π(2n+1)k/(4N)} = cos(2π(2n + 1)k/(4N)) − j sin(2π(2n + 1)k/(4N)), the real part of Z(k) is the DCT of x(n) and the imaginary part is related to the DST of x(n). Note that X(k/2) is just a notation; 2X(k/2) = Y(k), where Y(k) = DFT{y(n)} and y(n) is the zero-padded version of 2x(n), defined by (3.44).
(b) If the signal x(n) is used as an input to a system, then the DCT is calculated for the output signal x_h(n) = x(n) ∗_n h(n). It has been assumed that all signals, x(n), h(n), and x(n) ∗_n h(n), are zero-valued outside the interval 0 ≤ n ≤ N − 1 (it means that the durations of x(n) and h(n) should be such that their convolution is within 0 ≤ n ≤ N − 1). Then, for the signal z_h(n), related to x_h(n) = x(n) ∗_n h(n) in the same way as the signal z(n) is related to x(n) in (a), we can write

DFT{z_h(n)} e^{jπk/(2N)} = 2X_h(k/2) = 2X(k/2)H(k/2) = Y(k)H(k/2).

Then

C_h(k) = DCT{x_h(n)} = Re{Y(k)H(k/2) e^{−jπk/(2N)}}
= Re{Y(k) e^{−jπk/(2N)}} Re{H(k/2)} − Im{Y(k) e^{−jπk/(2N)}} Im{H(k/2)}
= C(k) Re{H(k/2)} + S(k) Im{H(k/2)}.

The system output is then y(n) = x(n) ∗_n h(n) = IDCT{C_h(k)}, with IDCT{C_h(k)} defined by (3.34). The transform H(k/2) is the DFT of the signal h(n) zero-padded by a factor of 2; only the first half of the DFT samples is then used in the calculation.
Solution 3.11. The output of the system can be written as

y_k(n) = Σ_{m=0}^{N−1} x(n − N + 1 + m) e^{−j2πmk/N}.

Comparing y_k(n) and y_k(n − 1) term by term, every sample of x(n) present in both sums changes its weighting coefficient by the factor e^{j2πk/N}, the new sample x(n) enters the sum, and the sample x(n − N) leaves it, producing

y_k(n) = e^{j2πk/N} [y_k(n − 1) + x(n) − x(n − N)].
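This recursive (sliding-DFT) relation is easily verified numerically; the sketch below (ours, with an assumed helper xs for the zero-extended signal) runs the recursion up to n = N − 1 and compares with X(k):

    import numpy as np

    N, k = 8, 3
    x = np.random.randn(N)
    def xs(m):                                   # x(m), zero outside 0 <= m <= N-1
        return x[m] if 0 <= m < N else 0.0
    y = 0j
    for n in range(N):                           # y_k(-1) = 0 for a causal x(n)
        y = np.exp(2j*np.pi*k/N) * (y + xs(n) - xs(n - N))
    print(np.allclose(y, np.fft.fft(x)[k]))      # True: y_k(N-1) = X(k)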
Solution 3.12. For an even frequency index, k = 2r, the DHT can be written as

H(2r) = Σ_{n=0}^{N/2−1} x(n) [cos(2πrn/(N/2)) + sin(2πrn/(N/2))] + Σ_{n=N/2}^{N−1} x(n) [cos(2πrn/(N/2)) + sin(2πrn/(N/2))]
= Σ_{n=0}^{N/2−1} (x(n) + x(n + N/2)) [cos(2πrn/(N/2)) + sin(2πrn/(N/2))],

where g(n) = x(n) + x(n + N/2). This is the DHT of g(n) with N/2 samples.
Note: For odd frequency indices, k = 2r + 1, we can write

H(2r + 1) = Σ_{n=0}^{N−1} x(n) [cos(2π(2r + 1)n/N) + sin(2π(2r + 1)n/N)].

After rearranging the terms, this sum reduces to

H(2r + 1) = Σ_{n=0}^{N/2−1} f(n) [cos(2πrn/(N/2)) + sin(2πrn/(N/2))],
where

f(n) = [x(n) − x(n + N/2)] cos(2πn/N) + [x(N/2 − n) − x(N − n)] sin(2πn/N).

This is again the DHT of a signal, f(n), with N/2 samples. In this way, the DHT of the signal with N samples is split into two DHTs with N/2 samples.
Solution 3.13. The absolute value of the DFT of the signal x(n) = exp(j4π√3 n/N), for n = 0, 1, . . . , N − 1 with N = 16, is

|X(k)| = | Σ_{n=0}^{15} e^{j2π(2√3−k)n/16} | = | sin(π(2√3 − k)) / sin(π(2√3 − k)/16) |. (3.46)

The elements |X(k)|, k = 0, 1, . . . , 15, form the vector |X|, whose maximum is achieved at k = 3. This means that the frequency estimation, without the displacement bin, would be

(2π · 3)/16 = 1.1781,

while the true frequency is (2π · 2√3)/16 = 1.3603. The error is 13.4%.
For the zero-padded signal (interpolated DFT), with a factor of 4,

|X(k)| = | Σ_{n=0}^{15} e^{j4π√3 n/16} e^{−j2πnk/64} | = | Σ_{n=0}^{15} e^{j2π(8√3−k)n/64} |
= | sin(π(8√3 − k)/4) / sin(π(8√3 − k)/64) |.
The maximum value is obtained for k = [8√3] = 14, where [·] denotes the nearest integer value. The maximum absolute DFT value at k = 14, along with the absolute values of its neighbors, is

|X(14)| = | sin(π(8√3 − 14)/4) / sin(π(8√3 − 14)/64) | = 15.9662,
|X(15)| = | sin(π(8√3 − 15)/4) / sin(π(8√3 − 15)/64) | = 13.9412,
|X(13)| = | sin(π(8√3 − 13)/4) / sin(π(8√3 − 13)/64) | = 14.8249.
The displacement bin is

d = (1/2) (|X(15)| − |X(13)|) / (2|X(14)| − |X(15)| − |X(13)|) = −0.1395.
The true frequency index would be 8√3 = 13.8564, with the true frequency 2π · 13.8564/64 = 1.3603. The correct value of the frequency index is shifted from the nearest integer k = 14 (on the frequency grid) by 13.8564 − 14 = −0.1436 when the interpolation is done. Thus, the obtained displacement bin value −0.1395 is close to the true shift value −0.1436. The estimated frequency, using the displacement bin, is 1.3608. As compared to the true frequency, the error is 0.03%.
If the displacement formula is applied to the DFT values without interpolation, we would get d = 0.3356, while 2√3 = 3.4641 is displaced from the nearest integer by 0.4641.
Chapter 4
z-Transform
The Fourier transform of discrete-time signals and the DFT are used for direct signal processing and calculations. A transform that generalizes these transforms, in the same way as the Laplace transform generalizes the Fourier transform of continuous-time signals, is the z-transform. This transform provides an efficient tool for the qualitative analysis and design of discrete systems.
The Fourier transform of a discrete-time signal x(n) can be considered as a special case of the z-transform, defined by

X(z) = Σ_{n=−∞}^{∞} x(n) z^{−n}, (4.1)

where z = r exp(jω) is a complex number. The value of the z-transform along the unit circle, |z| = 1 or z = exp(jω), is equal to the Fourier transform X(e^{jω}) of the discrete-time signal.
The z-transform, in general, converges only for some values of the complex argument z. The
region of z where X (z) is finite is the region of convergence (ROC) of the z-transform.
Consider the signal

x(n) = a^n u(n) + b^n u(n),

where a and b are complex numbers, |a| < |b|. Find the z-transform of this signal and its region of convergence.
not cancel out to produce a finite value). Since | a| < |b|, the region of convergence for X (z) is
|z| > |b|, as shown in Fig. 4.1.
Figure 4.1 Regions of convergence (gray area) for the signal x (n) = an u(n) + bn u(n).
Consider the signal x(n) = a^n u(n − 1) − b^n u(−n − 1), where a and b are complex numbers, |b| > |a|. Find the z-transform of x(n) and its region of convergence.
⋆ The z-transform is

X(z) = Σ_{n=1}^{∞} a^n z^{−n} − Σ_{n=−∞}^{−1} b^n z^{−n} = Σ_{n=1}^{∞} a^n z^{−n} − Σ_{n=1}^{∞} b^{−n} z^{n}
= (a/z)/(1 − a/z) − (z/b)/(1 − z/b) = a/(z − a) + z/(z − b).

The infinite geometric series with the progression coefficient (a/z) converges for |a/z| < 1. The other series converges for |z/b| < 1. Since |b| > |a|, the region of convergence is |a| < |z| < |b|, Fig. 4.2.
Note that in this example and the previous one, two different signals, b^n u(n) and −b^n u(−n − 1), produced the same z-transform X_b(z) = z/(z − b), but with different regions of convergence. For the signal b^n u(n) the region of convergence was |b/z| < 1, while for −b^n u(−n − 1) the region of convergence was |z/b| < 1.
Figure 4.2 Regions of convergence (gray area) for the signal x (n) = an u(n − 1) − bn u(−n − 1).
4.2.1 Linearity
The z-transform is linear,

Z{ax(n) + by(n)} = aX(z) + bY(z),

with the region of convergence being at least the intersection of the regions of convergence of X(z) and Y(z). In special cases the region can be larger than the intersection of the regions of convergence of X(z) and Y(z), if some poles defining the region of convergence cancel out in the linear combination of the transforms.
4.2.2 Time-Shift
For a shifted signal,

Z{x(n − n_0)} = z^{−n_0} X(z).

An additional pole at z = 0 is introduced for n_0 > 0. The region of convergence is the same, except for z = 0 or z → ∞, depending on the value of n_0.
⋆ The z-transform of this equation is obtained using the linearity and the shift property
Example 4.4. For a causal signal x(n) = x(n)u(n), find the z-transform of x(n + n_0)u(n), for n_0 ≥ 0.
⋆ By definition,

Z{x(n + n_0)u(n)} = Σ_{n=0}^{∞} x(n + n_0) z^{−n} = z^{n_0} Σ_{n=n_0}^{∞} x(n) z^{−n}
= z^{n_0} [ Σ_{n=0}^{∞} x(n) z^{−n} − x(0) − x(1)z^{−1} − ··· − x(n_0 − 1)z^{−n_0+1} ]
= z^{n_0} [ X(z) − x(0) − x(1)z^{−1} − ··· − x(n_0 − 1)z^{−n_0+1} ].

For n_0 = 1 follows

Z{x(n + 1)u(n)} = z(X(z) − x(0)) = zX(z) − zx(0). (4.2)

Note that for this signal x(n + n_0)u(n) ≠ x(n + n_0)u(n + n_0).
4.2.3 Modulation
For a modulated signal,

Z{a^n x(n)} = X(z/a),

with the region of convergence being scaled by |a|. In the special case, when a = e^{jω_0}, the z-transform plane is just rotated in the complex plane,

Z{e^{jω_0 n} x(n)} = Σ_{n=−∞}^{∞} x(n) e^{jω_0 n} z^{−n} = X(z e^{−jω_0}).
4.2.4 Differentiation

Z{n(n + 1) ··· (n + N − 1) x(n) u(n)} = (−1)^N z^N d^N X(z)/dz^N.
4.2.5 Convolution
The z-transform of a convolution of two signals is equal to the product of their z-transforms,

Z{x(n) ∗_n h(n)} = X(z)H(z),

with the region of convergence being at least the intersection of the regions of convergence of X(z) and H(z). In the case of a product of two z-transforms, it may happen that some poles are canceled out, causing the resulting region of convergence to be larger than the intersection of the individual regions of convergence.
⋆ This signal can be written as the convolution of the signals x(n) and u(n), that is,

Σ_{m=−∞}^{n} x(m) = x(n) ∗_n u(n) = Σ_{m=−∞}^{n} x(m) u(n − m).

The z-transform of the convolution of two signals is equal to the product of their corresponding z-transforms,

Z{x(n) ∗_n u(n)} = X(z) z/(z − 1). (4.3)
The initial value of a causal signal can be obtained from the z-transform as x(0) = lim_{z→∞} X(z). According to the z-transform definition, all terms with z^{−n}, n > 0, vanish as z → ∞. The term which does not depend on z is obtained as the result of this limit; it is equal to x(0).
The stationary state value of a causal signal x(n) is

lim_{n→∞} x(n) = lim_{z→1} (z − 1)X(z). (4.5)

To show this, note that the z-transform of x(n + 1)u(n) − x(n)u(n) is zX(z) − zx(0) − X(z), while the sum Σ_{n=0}^{N} [x(n + 1) − x(n)] telescopes to

lim_{N→∞} [x(N + 1) − x(0)].

Thus,

lim_{N→∞} [x(N + 1) − x(0)] = lim_{z→1} [zX(z) − zx(0) − X(z)],

which produces the stationary state value (4.5).
The most common approach to the z-transform inversion is based on a direct expansion of the given transform into a power series with the terms z^{−n}, within the region of convergence. After the z-transform is expanded into such a series,

X(z) = Σ_{n=−∞}^{∞} X_n z^{−n},

the signal is identified as x(n) = X_n for −∞ < n < ∞.
In general, various techniques may be used to expand a function into a power series. Most of the cases in signal processing, after some transformations, reduce to a simple form of an infinite geometric series,

1/(1 − q) = 1 + q + q^2 + ··· = Σ_{n=0}^{∞} q^n,

for |q| < 1.
For the z-transform

X(z) = 1/(1 − 1/(2z)) + 1/(1 − 3z),

identify the possible regions of convergence and find the inverse z-transform for each of them.
⋆ Obviously, the z-transform has the poles z_1 = 1/2 and z_2 = 1/3. Since there are no poles in the region of convergence, there are three possibilities for defining the region of convergence: (1) |z| > 1/2, (2) 1/3 < |z| < 1/2, and (3) |z| < 1/3. The signals are obtained using a power series expansion for every case.
expansion for every case.
(1) For the region of convergence |z| > 1/2, the term 12 z−1 satisfies the condition that
1 −1
| 2 z | < 1. It can be expanded into geometric series as
∞
1 1 n ∞
1 −n 1 1
1
= ∑ = ∑ n z for < 1 or |z| > .
1 − 2z n =0
2z n =0
2 2z 2
However, the term 3z does not satisfy the condition |3z| < 1 for |z| > 1/2. This part of X (z)
should be modified so that it can also be expanded into geometric series for |z| > 1/2. This is
achieved if X (z) is rewritten as
1 1
X (z) = 1
+ 1
.
1− 2z −3z(1 − 3z )
Now, the second part of X (z) can be expanded using the following geometric series
∞
1 1 n ∞
1 −n 1
< 1 or |z| > 1 .
1
= ∑ = ∑ n
z for 3z
1 − 3z n =0
3z n =0
3 3
Both of these sums converge for |z| > 1/2. The resulting power series expansion of X (z) is
∞
1 −n 1 ∞ 1 −n
X (z) = ∑ 2 n
z −
3z n∑ n
z
n =0 =0 3
∞ ∞
1 1
= ∑ n z−n − ∑ n z−n .
n =0
2 n =1
3
The inverse z-transform, for this region of convergence |z| > 1/2, is
1 1
x (n) = u ( n ) − n u ( n − 1).
2n 3
(2) For the region of convergence defined by 1/3 < |z| < 1/2, the z-transform should be written in the form

X(z) = −2z/(1 − 2z) + 1/(−3z(1 − 1/(3z))).

The corresponding geometric series are

1/(1 − 2z) = Σ_{n=0}^{∞} (2z)^n = Σ_{n=−∞}^{0} 2^{−n} z^{−n}, for |2z| < 1 or |z| < 1/2,
1/(1 − 1/(3z)) = Σ_{n=0}^{∞} (1/(3z))^n = Σ_{n=0}^{∞} (1/3^n) z^{−n}, for |1/(3z)| < 1 or |z| > 1/3.

They converge for 1/3 < |z| < 1/2. The resulting power series expansion is

X(z) = −2z Σ_{n=−∞}^{0} 2^{−n} z^{−n} − (1/(3z)) Σ_{n=0}^{∞} (1/3^n) z^{−n}
= − Σ_{n=−∞}^{−1} (1/2^n) z^{−n} − Σ_{n=1}^{∞} (1/3^n) z^{−n},

with the corresponding signal

x(n) = −(1/2^n) u(−n − 1) − (1/3^n) u(n − 1).

(3) For the region of convergence |z| < 1/3, the z-transform is written as
X(z) = −2z/(1 − 2z) + 1/(1 − 3z).

The corresponding geometric series are

1/(1 − 2z) = Σ_{n=0}^{∞} (2z)^n = Σ_{n=−∞}^{0} 2^{−n} z^{−n}, for |2z| < 1 or |z| < 1/2,
1/(1 − 3z) = Σ_{n=0}^{∞} (3z)^n = Σ_{n=−∞}^{0} 3^{−n} z^{−n}, for |3z| < 1 or |z| < 1/3.

Both series converge for |z| < 1/3. The power series expansion is

X(z) = −2z Σ_{n=−∞}^{0} 2^{−n} z^{−n} + Σ_{n=−∞}^{0} 3^{−n} z^{−n}
= − Σ_{n=−∞}^{−1} (1/2^n) z^{−n} + Σ_{n=−∞}^{0} (1/3^n) z^{−n},

with the corresponding signal

x(n) = −(1/2^n) u(−n − 1) + (1/3^n) u(−n).
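Each expansion can be checked numerically by summing the series at a test point inside its region of convergence; for case (1), for example (our own check, not from the book):

    import numpy as np

    def X(z):
        return 1/(1 - 1/(2*z)) + 1/(1 - 3*z)

    z = 1.7 + 0.3j                        # a point with |z| > 1/2
    n = np.arange(200)                    # truncated series (terms decay fast here)
    x = 0.5**n - np.where(n >= 1, 3.0**(-n), 0.0)
    print(np.sum(x * z**(-n)), X(z))      # the two values agree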
For the z-transform

X(z) = (z^2 + 1) / ((z − 1/2)(z^2 − 3z/4 + 1/8)),

find the signal x(n) if the region of convergence is |z| > 1/2.
⋆ The z-transform can be written as

X(z) = (z^2 + 1) / ((z − 1/2)(z − z_1)(z − z_2)) = (z^2 + 1) / ((z − 1/2)^2 (z − 1/4)),

where z_1 = 1/2 and z_2 = 1/4. Writing X(z) in the form of partial fractions,

X(z) = A/(z − 1/2)^2 + B/(z − 1/2) + C/(z − 1/4),
or from

(z^2 + 1) = A(z − 1/4) + B(z − 1/2)(z − 1/4) + C(z − 1/2)^2. (4.6)

For z = 1/4 we get 17/16 = C/16, or C = 17. Using the value z = 1/2 gives

(1/4 + 1) = A(1/2 − 1/4)
and A = 5 is obtained. Finally, if the highest-order coefficients (with z^2) in relation (4.6) are equated,

z^2 = Bz^2 + Cz^2,

we get 1 = B + C, producing B = −16. The z-transform is

X(z) = 5/(z − 1/2)^2 − 16/(z − 1/2) + 17/(z − 1/4).
For the region of convergence |z| > 1/2 and for a parameter |a| ≤ 1/2,

1/(z − a) = 1/(z(1 − a/z)) = z^{−1}(1 + az^{−1} + a^2 z^{−2} + ···) = Σ_{n=1}^{∞} a^{n−1} z^{−n}.

The resulting signal is

x(n) = 5 ((n − 1)/2^{n−2}) u(n − 2) − 16 (1/2^{n−1}) u(n − 1) + 17 (1/4^{n−1}) u(n − 1).
Note: In general, the relation

1/(z − a)^{m+1} = (1/m!) d^m/da^m (1/(z − a)) = (1/m!) d^m/da^m ( Σ_{n=1}^{∞} a^{n−1} z^{−n} )
= Σ_{n=1}^{∞} ((n − 1)(n − 2) ··· (n − m)/m!) a^{n−m−1} z^{−n}

holds and can be used for the inversion of higher-order poles.
In general, the inversion is calculated using the Cauchy relation from complex analysis,

(1/(2πj)) ∮_C z^{m−1} dz = δ(m),

where C is any closed contour line within the region of convergence. The complex plane origin is within the contour. By multiplying both sides of X(z) by z^{m−1}, after integration along the closed contour the signal values are obtained as a sum of residues,

x(n) = Σ_{z_i} Res{z^{n−1} X(z)}|_{z=z_i} = Σ_{z_i} (1/(k − 1)!) d^{k−1}/dz^{k−1} [z^{n−1} X(z)(z − z_i)^k]|_{z=z_i},

where z_i are the poles of z^{n−1}X(z) within the integration contour C, which is within the region of convergence, and k is the pole order. If the signal is causal, n ≥ 0, and all the poles of z^{n−1}X(z) within the contour C are simple (first-order poles, with k = 1), then, for a given instant n,

x(n) = Σ_{z_i} [z^{n−1} X(z)(z − z_i)]|_{z=z_i}. (4.7)
find the signal x(n) for n ≥ 0 if the region of convergence is |z| > 1/2.
Hint: Since for each n < 0 there is a pole at z = 0 of the order −n + 1, to avoid different derivatives for each n we can make a substitution of variables, z = 1/p, with dz = −dp/p^2. The new region of convergence in the complex plane p will be |1/p| > 1/2, or |p| < 2. All the poles are now outside this region and outside the integration contour, producing a zero-valued integral.
For a linear time-invariant discrete system with the impulse response h(n), the z-transform of the output signal y(n) = x(n) ∗_n h(n) is

Y(z) = X(z)H(z).

The z-transform of the output signal is obtained by multiplying the input signal z-transform by the transfer function

H(z) = Σ_{n=−∞}^{∞} h(n) z^{−n}.

It is possible to relate two important properties of linear time-invariant systems to the transfer function properties.
The system is stable if
∞
∑ |h(n)| < ∞.
n=−∞
It means that the z-transform exists at |z| = 1, that is, that the circle
|z| = 1
a a a
Im{z}
Im{z}
Im{z}
1 1 1
c
b b
Re{z} Re{z} Re{z}
2 4
60 h (n) 1.5 h2(n) 3 h3(n)
1
40 1 2
20 0.5 1
0 0 0
Figure 4.3 Regions of convergence (gray) with corresponding signals. Poles are marked by "x".
plot the regions of convergence and discuss the stability and causality. Find and plot the impulse
response for every case.
⋆ The regions of convergence are shown in Fig. 4.3. The system described by H1 (z) is causal
but not stable. The system H2 (z) is stable but not causal, while the system H3 (z) is both stable
and causal. Their impulse responses are shown in Fig. 4.3 as well.
Amplitude of the frequency response (gain) of a discrete system is related to the transfer function
as
| H (e jω )| = | H (z)||z=e jω .
Consider a discrete system whose transfer function assumes the form of a ratio of two polynomials
where z0i are the zeros and z pi are the poles of the transfer function. For the amplitude of the frequency
response we my write
B TO1 TO2 . . . TO M
| H (e jω )| = 0 ,
A0 TP1 TP2 . . . TPN
where TOi are the distances from the point T at the given frequency z = e jω to zero Oi at z0i . Distances
from the point T to the poles Pi at z pi are denoted by TPi .
Ljubiša Stanković Digital Signal Processing 171
Example 4.11. Plot the frequency response of a causal notch filter with the transfer function
z − e jπ/3
H (z) = .
z − 0.95e jπ/3
|e jω − e jπ/3 | TO1
| H (e jω )| = =
|e jω − 0.95e jπ/3 | TP1
where the zero O1 is positioned at z01 = e jπ/3 and the pole P1 is at z p1 = 0.95e jπ/3 . For any
point T at z = e jω , ω 6= π/3, the distances TO1 and TP1 from T to O1 and from T to P1 are
almost the same, TO1 ∼ = TP1 . Then | H (z)||z=e jω ∼
= 1 except at ω = π/3, when TO1 = 0 and
TP1 6= 0 resulting in | H (z)||z=e jπ/3 = 0. The frequency response | H (e jω )| is shown in Fig. 4.4.
O
1
1.5
T P
1
ω
|H(ejω)|
π/3
Im{z}
0.5
0
Re{z} −2 0 π/3 2 ω
Figure 4.4 Poles and zeros of a first-order notch filter (left). The frequency response of this notch filter (right).
An important class of discrete systems can be described by difference equations. They are obtained
by converting corresponding differential equations or by describing an intrinsically discrete system
relating the input and output signal in a recursive way. A general form of a linear difference equation
with constant coefficients, that relates the output signal y(n), at an instant n, with the input signal x (n)
and the previous input and output samples, is
The z-transform of the linear difference equation, assuming zero-valued initial conditions, is
since Z { x (n − i )} = X (z)z−i and Z {y(n − k)} = Y (z)z−k . The solution y(n) of the difference
equation is obtained as an inverse z-transform of
B0 + B1 z−1 + · · · + B M z− M
Y (z) = X ( z ).
1 + A1 z −1 + · · · + A N z − N
⋆The z-transform domain form of the system is Y (z) − 56 z−1 Y (z) + 16 z−2 Y (z) = X (z),
producing
1
Y (z) = X ( z ).
1 − 65 z−1 + 61 z−2
The z-transform of the input signal is X (z) = 1/(1 − 41 z−1 ) for |z| > 1/4. The z-transform of
the output signal y(n) is
z3
Y (z) = .
(z − 21 )(z − 31 )(z − 41 )
For a causal system the region of convergence is |z| > 1/2. The output signal is the inverse
z-transform of Y (z). For n > 0 it is equal to
n o
y(n) = ∑ [zn−1 Y (z)(z − zi )]|z=zi
zi =1/2,1/3,1/4
z n +2 z n +2 z n +2
= 1 1
+ 1 1
+
(z − 3 )( z − 4 ) |z=1/2 (z − 2 )( z − 4 ) |z=1/3 (z − 21 )(z − 31 ) |z=1/4
1 8 3
=6 − n + n.
2n 3 4
For n = 0 there is no pole at z = 0. Thus, the above expressions hold for n = 0 as well. The
output signal is given by
6 8 3
y ( n ) = n − n + n u ( n ).
2 3 4
Note: This kind of solution assumes the initial conditions from the system causality and x (n) in
the form: y(0) = x (0) = 1 and y(1) − 5y(0)/6 = x (1), that is, y(1) = 13/12.
Ljubiša Stanković Digital Signal Processing 173
Example 4.13. A first-order causal discrete system is described by the following difference equation
Find its impulse response and discuss its behavior in terms of the system coefficient A1 .
⋆For the impulse response calculation the input signal is defined by x (n) = δ(n) with X (z) = 1.
Then we have
with
y(n) = B0 δ(n) + (− A1 )n−1 (− A1 B0 + B1 )u(n − 1).
We can conclude that, in general, the impulse response has an infinite duration for any A1 6= 0. It
is a result of the recursive relation between the output y(n) and its previous value(s) y(n − 1).
This kind of systems is referred to as the infinite impulse response (IIR) systems or recursive
systems. If the value of coefficient A1 is zero-valued, that is A1 = 0, then there is no recursion
and
y(n) = B0 δ(n) + B1 δ(n − 1).
This is the system with a finite impulse response (FIR). This kind of system produces an output
to the signal x (n) in the form
and is referred to as the moving average (MA) system. Systems without recursion are always
stable since a finite sum of the finite signal values is always finite.
Systems that would contain only x (n) and the output signal recursions,
are auto-regressive (AR) systems or all pole systems. This kind of systems could be unstable, due
to the output signal recursion. In our case, the system is obviously unstable if | A1 | > 1. Systems
defined by (4.9) are in general auto-regressive moving average (ARMA) systems.
174 z-Transform
If the region of convergence were |z| < | A1 |, then the function Y (z) would be expanded
into series with q = z/A1 as
∞
B0 + B1 z−1 B0 B
Y (z) = = z+ 1 ∑ (− A1−1 z)n
A1 z−1 (z/A1 + 1) A1 A1 n =0
0 0
B
= B0 ∑ (− A1 )n−1 z−(n−1) + 1 ∑ (− A1 )n z−n
n=−∞ A1 n=−∞
−1 B1 0
= B0 ∑ (− A1 )n z−n + ∑ (− A1 )n z−n
n=−∞ A1 n=− ∞
with
B1
y(n) = B0 (− A1 )n u(−n − 1) + (− A1 )n u(−n).
A1
This system would be stable if |1/A1 | < 1 and unstable if |1/A1 | > 1, having in mind that
y(n) is nonzero for n < 0. This is an anticausal system since it has impulse response satisfying
h(n) = 0 for n ≥ 1.
Here, we have just introduced the notions. These systems will be considered in Chapter 5.
A direct way to solve a linear difference equation with constant coefficients of the form
y ( n ) + A1 y ( n − 1) + · · · + A N y ( n − N ) = x ( n ) (4.10)
yi (n) = Ci λin ,
where Ci and λi are constants. Replacing yi (n) into (4.11), the characteristic polynomial equation
follows
This is a polynomial of the Nth order. In general, it has N solutions λi , i = 1, 2, . . . , N. All functions
yi (n) = λin , i = 1, 2, . . . , N are the solutions of equation (4.11). Since the equation is linear, a linear
combination of these solutions,
N
yh (n) = ∑ Ci λin
i =1
is also a solution of the homogeneous equation (4.11). This solution is called homogeneous part of the
solution to (4.10).
Ljubiša Stanković Digital Signal Processing 175
Next a particular solution y p (n), corresponding to the form of the input signal x (n), should be
found. The solution to equation (4.10) is then
y ( n ) = y h ( n ) + y p ( n ).
The constants Ci , i = 1, 2, . . . , N are calculated based on the initial conditions y(i − 1), i = 1, 2, . . . , N.
5 1
y p (n) − y p (n − 1) + y p (n − 2) = n + 11/6
6 6
5 1
An + B − ( An − A + B) + ( An − 2A + B) = n + 11/6,
6 6
and A = 3, B = 1 follow. The solution to (4.12) is a sum of the homogeneous and the particular
solution,
1 1
y(n) = yh (n) + y p (n) = C1 n + C2 n + 3n + 1.
2 3
Using the initial conditions
y(0) = C1 + C2 + 1 = 1
C1 C
y (1) = + 2 +4=5
2 3
the constants C1 = 6 and C2 = −6 follow. The final solution is
6 6
y(n) = n − n + 3n + 1 u(n).
2 3
176 z-Transform
Note: The z-transform based solution would assume y(0) = x (0) = 11/6 and y(1) =
5y(0)/6 + x (1) = 157/36. The solution with the initial conditions y(0) = 1 and y(1) = 5 could
be obtained from this solution with appropriate changes of the first two samples of the input signal
in order to take into account the previous system state and to produce the given initial conditions
y(0) = 1 and y(1) = 5 .
If multiple polynomial roots are obtained, for example λi = λi+1 , then yi (n) = λin and
yi+1 (n) = nλin .
Consider a periodic signal x (n) with a period N and its DFT values X (k),
1 N −1
x (n) = ∑ X (k)e j2πnk/N . (4.14)
N k =0
If the signal within one of its periods, 0 ≤ n ≤ N − 1, is applied as the input to the system
described by difference equation (4.13) show that the output signal at n = N − 1 is equal to the
DFT of the signal x (n) at frequency index k = k0 , that is
y ( N − 1) = X ( k 0 ).
Consider now the case when the input signal x (n) is applied to the system. Since the system is
linear, consider one component of the input signal (4.14) in the form
1
xk (n) = X (k)e j2πnk/N ,
N
for an arbitrary 0 ≤ k ≤ N − 1. Then the difference equation (and the z-transform relation) for
this input signal reads
1
X (k)e j2πnk/N }
Z { xk (n)} = Z { (4.16)
N
1 N −1 1 1 − e j2πk z− N
= X (k) ∑ e j2πnk/N z−n = X (k) .
N n =0 N 1 − e j2πk/N z−1
or
z0 = e j2π (k+l )/N , l = 0, 1, 2, . . . , N − 1.
Note that the zero
z0 = e j2πk/N , obtained for l = 0
is canceled with the pole z p = e j2πkn/N in (4.16). Therefore the remaining zeros are at
z p = e j2πk0 /N .
- If k 6= k0 then one of zeros z0 = e j2π (k+l )/N , l = 1, 2,. . . , N − 1 will coincide with the pole
z p = e j2πk0 /N and will cancel it out. Thus, for k 6= k0 , the function Yk (z) will not have any pole.
Then, I
1
y k ( N − 1) = z N −2 Yk (z)dz = 0 (4.17)
2πj
C
since there are no poles within C, Fig. 4.5.
z=ej2πk/N z=ej2πk/N
k0=k
Im{z}
Im{z}
Im{z}
z=ej2πk0/N
k0≠ k
Re{z} Re{z} Re{z}
Figure 4.5 Zeros and the pole in Z { xk (n)} (left), the pole in 1/(1 − e j2πk0 n/N z−1 ) for k 6= k0 (middle), and
the pole in 1/(1 − e j2πk0 n/N z−1 ) for k = k0 (right). Illustration is for N = 16.
1 1−e j2πk 0 z − N
= z N −1 X ( k 0 )
N 1 − e j2πk0 /N z−1 j2πk /N
z=e 0
1 z N − e j2πk0
= X (k0 ) lim j2πk0 /N
= X ( k 0 ).
N z→e j2πk0 /N z − e
y k ( N − 1) = X ( k ) δ ( k − k 0 ).
is called the Goertzel algorithm for the DFT calculation at a given single frequency k0 .
It is interesting to note that the computation of (4.19) is more efficient than the computation
of (4.18). For the calculation of (4.18), for one k0 , we need one complex multiplication (4 real
multiplications) and one complex addition (2 real additions). For N instants and one k0 we need
4N real multiplications and 2N real additions. For the calculation of (4.19) we can use linear
property and calculate only
at every instant. It requires a multiplication of complex signal with a real coefficient. It means 2
real multiplications for every instant or 2N in total for N instants. The resulting output, at the
instant N − 1, is
It requires just one additional complex multiplication for the last instant and for one frequency.
The total number of multiplications is 2N + 4. It is reduced with respect to the previously needed
4N real multiplications. The total number of additions is 4N + 2. It is increased. However the
time needed for a multiplication is much longer than the time needed for an addition. Thus, the
Ljubiša Stanković Digital Signal Processing 179
overall efficiency is improved. The efficiency is even more improved having in mind that (4.20) is
the same for calculation of X (k0 ) and X (−k0 ) = X ( N − k0 ).
The z-transform will be related to the all other transform, presented so far. Consider a continuous-time
signal x (t) and its the Laplace transform X (s). If the integral in the Laplace transform is approximated
by the corresponding sum, we have
Z∞ ∞ ∞
X (s) = x (t)e−st dt ∼
= ∑ x (n∆t)e−sn∆t ∆t = ∑ x (n)e−sn∆t ,
−∞ n=−∞ n=−∞
with x (n) = x (n∆t)∆t. When this relation is compared to the z-transform definition we can conclude
that the Laplace transform of x (t) can be approximated by the z-transform of its samples with
z = exp(s∆t),
that is,
X (s) ↔ X (z)|z=exp(s∆t) . (4.21)
A point in the complex Laplace domain, s = σ + jΩ, maps to the point z = re jω
with r = eσ∆t
and ω = Ω∆t. Points from the left half-plane in the s domain, σ < 0, map to the interior of the unit
circle in the z domain, r < 1.
In applied mathematics, the transform X (z) at z = exp(s∆t) is called the starred or star
transform. It can be obtained as the Laplace transform of the sampled signal, denoted in the continuous-
time domain as
∞
x ∗ (t) = x (t) ∑ δ(t − n∆t).
n=−∞
The starred transform is derived by
Z∞ ∞ Z∞ ∞
X (s) = x ∗ (t)e−st dt = ∑ x (n∆t)e−sn∆t δ(t − n∆t)dt = ∑ x (n)e−sn∆t . (4.22)
−∞ n=−∞ −∞ n=−∞
According to the sampling theorem, for the Laplace transform of discrete-time signal holds
X (s)|σ=0 = X ( jΩ) = X ( j(Ω + 2kπ/∆t)).
The Fourier transform of a discrete-time signal is
∞
X (e jω ) = X (z)|z=e jω = ∑ x (n)z− n
|z=e jω
.
n=−∞
Example 4.16. The Fourier transform of a causal discrete-time signal x (n) is X (e jω ). Write its
z-transform in terms of the Fourier transform of the discrete-time signal, that is, write the z-
transform values in the complex plane based on their values on the unit circle.
180 z-Transform
∞ Zπ ∞ Zπ
−n 1 jω jωn −n 1 X (e jω )
X (z) = ∑ x (n)z =
2π
X (e ) ∑e z dω =
2π 1 − e jω z−1
dω,
n =0 −π n =0 −π
for |z| > 1. In this way, the relation between X (z), for any z, and X (e jω ) is established.
N=16
jω j2π k/16
z=e z=e
π/∆t
Im{s}=Ω
Im{z}
Im{z}
−π/∆t
Figure 4.6 Illustration of the z-transform relation with the Laplace transform (left), the Fourier transform of
discrete signals (middle), and the DFT (right).
Example 4.17. Consider a discrete-time signal x (n) with N samples different from zero within
0 ≤ n ≤ N − 1. Show that all values of X (z), for any z, can be calculated based on its N samples
on the unit circle in the z-plane.
⋆If the signal has N nonzero samples, then it can be expressed in term of its DFT as
N −1
1 N −1
∑ x (n)e− j2πnk/N and x (n) = X (k)e j2πnk/N .
N k∑
X (k) =
n =0 =0
Thus, the z-transform of this x (n) can be expressed in terms of the IDFT within 0 ≤ n ≤ N − 1,
in the form
N −1
1 N −1 N −1
1 N −1 1 − z− N e j2πk
∑ x (n)z−n = ∑ X (k) ∑ e j2πnk/N z−n =
N k∑
X (z) = −1 j2πk/N
X (k)
n =0
N k =0 n =0 =0 1 − z e
Ljubiša Stanković Digital Signal Processing 181
1 N −1 ∞ 1 N −1 1
∑ ∑ X (k)e j2πnk/N z−n =
N k∑
X (z) = −1 e j2πk/N
X ( k ).
N k =0 n =0 =0 1 − z
4.7 PROBLEMS
Problem 4.1. Find the z-transform and the region of convergence for the following signals:
(a) x (n) = δ(n − 2),
(b) x (n) = a|n| ,
(c) x (n) = 21n u(n) + 31n u(n)
Problem 4.2. Find the z-transform and the region of convergence for the signals:
(a) x (n) = δ(n + 1) + δ(n) + δ(n − 1),
(b) x (n) = 21n [u(n) − u(n − 10)].
dX (z)
Y (z) = −z
dz
corresponds to
y(n) = nx (n)u(n)
in the discrete-time domain, with the same region of convergence for X (z) and Y (z), find a causal
signal whose z-transform is
(a) X (z) = e a/z , |z| > 0.
(b) X (z) = ln(1 + az−1 ), |z| > | a|.
Problem 4.4. (a) How the z-transform of x (−n) is related to the z-transform of x (n)?
(b) If the signal x (n) is real-valued show that its z-transform satisfies X (z) = X ∗ (z∗ ).
Problem 4.5. If X (z) is the z-transform of a signal x (n) find the z-transform of
∞
y(n) = ∑ x ( k ) x ( n + k ).
k =−∞
z+1
X (z) = .
(2z − 1)(3z + 2)
3 − 65 z−1
H (z) = 1 −1 1 −1
.
(1 − 4 z )(1 − 3 z )
identify possible regions of convergence. For every case, comment the stability and causality of
the system whose transfer function is H (z). What is the output of the stable system to the input
x (n) = 2 cos(nπ/2)?
Problem 4.10. Find the impulse response of a causal system whose transfer function is
z+2
H (z) = .
( z − 2) z2
Problem 4.11. Find the inverse z-transform of
z2
X (z) = .
z2 +1
Problem 4.12. The system is described by the following difference equation
5 1 5 3
y ( n ) − y ( n − 1) + y(n − 2) − y(n − 3) = 3x (n) − x (n − 1) + x ( n − 2).
16 32 4 16
Find the impulse response of a causal system.
Problem 4.13. Show that the system defined by
3 1
y(n) = x (n) − x ( n − 1) + x ( n − 2)
4 8
has a finite output duration for an infinite duration input x (n) = 1/4n u(n) .
Problem 4.14. A linear time-invariant system is characterized by the impulse response
Using the z-transform find the output of the system to the input signal x (n) = u(n) − u(n − 6) .
Problem 4.15. Find the output of the causal discrete system
11 1 3
y(n) − y(n − 1) + y(n − 2) = 2x (n) − x (n − 1)
6 2 2
if the input signal is x (n) = δ(n) − 23 δ(n − 1).
Problem 4.16. Solve the difference equation
x (n + 2) + 3x (n + 1) + 2x (n) = 0
Ljubiša Stanković Digital Signal Processing 183
using the z-transform. The initial conditions are x (0) = 0 and x (1) = 1. The signal x (n) is causal.
x ( n + 1) = x ( n ) + a n u ( n )
∇ x ( n ) = x ( n ) − x ( n − 1),
∇ m x ( n ) = ∇ m −1 x ( n ) − ∇ m −1 x ( n − 1).
∆ m x ( n ) = ∆ m −1 x ( n + 1) − ∆ m −1 x ( n ).
Problem 4.20. Based on the pole-zero geometry plot the amplitude of the frequency response of the
system described by the difference equation
√ √
y(n) = x (n) − 2x (n − 1) + x (n − 2) + r 2y(n − 1) − r2 y(n − 2)
for r = 0.99. Based on the frequency response, find approximative values of the output signal if the
input is the continuous-time signal
Problem 4.21. Plot the frequency response of the discrete system (comb filter)
1 − z− N
H (z) =
1 − rz− N
4.8 EXERCISE
Exercise 4.1. Find the z-transform and the region of convergence for the following signals:
(a) x (n) = δ(n − 3) − δ(n + 3),
(b) x (n) = u(n) − u(n − 20) + 3δ(n),
(c) x (n) = 1/3|n| + 1/2n u(n),
(d) x (n) = 3n u(−n) + 2−n u(n),
(e) x (n) = n(1/3)n u(n).
(f) x (n) = cos(n π2 ).
Exercise 4.2. Find the z-transform and the region of convergence for the signals:
(a) x (n) = 3n u(n) − (−2)n u(n) + n2 u(n).
(b) x (n) = ∑nk=0 2k 3n−k ,
(c) x (n) = ∑nk=0 3k .
Exercise 4.3. Find the inverse z-transform of:
−8
(a) X (z) = 1z−z + 3, if X (z) is the z-transform of a causal signal x (n).
(b) X (z) = (zz−+22)z2 , if X (z) is the z-transform of a causal signal x (n).
2
(c) X (z) = 6z +3z−2 , if X ( z ) is the z-transform of an infinite-duration signal x ( n ).
6z2 −5z+1
∞
Find ∑n=−∞ x (n) in this case.
Exercise 4.4. Find the inverse z-transforms of:
z−5 (5z−3)
(a) X (z) = (3z−1)(2z−4) , if x (n) is causal,
(b) Y (z) = X ( 2z ), for a causal signal y(n),
(c) Y (z) = z−2 X (z), for a causal signal y(n).
Exercise 4.5. Find the inverse z-transforms of X (z) = cosh( az) and X (z) = sinh( az).
Exercise 4.6. If X (z) is the z-transform of a signal x (n), with the region of convergence |z| > 21 , find
the z-transforms of the following signals:
(a) y(n) = x (n) − x (n − 1),
∞
(b) y(n) = ∑ x (n − kN ), where N is an integer,
k =−∞
(c) y(n) = x (n) ∗n x (−n), where ∗n denotes the convolution.
d
(d) Find the signal whose z-transform is Y (z) = dz X ( z ).
Exercise 4.7. If X (z) is the z-transform of a signal x (n), find the z-transform of
∞
y(n) = ∑ x ∗ ( n − k ) x ( n + k ).
k =−∞
Exercise 4.12. Using the basic trigonometric transformations show that the real-valued signal
y(n) = cos(2πk0 n/N + ϕ) is a solution to the homogeneous difference equation
and r = 0.9999 plot the amplitude of the frequency response and find the output to the signal
4.9 SOLUTIONS
for any z 6= 0.
(b) For the signal x (n) = a|n| , the z-transform is given by
∞ −1 ∞
(1 − a2 ) z
X (z) = ∑ a|n| z−n = ∑ a−n z−n + ∑ an z−n = (1 − az)(z − a)
n=−∞ n=−∞ n =0
for |z| < 1/a and |z| > a. If | a| < 1 then the region of convergence is a < |z| < 1/a.
(c) In this case, when x (n) = 21n u(n) + 31n u(n), the z-transform is
∞ ∞
1 −n 1 1 1
X (z) = ∑ n
z + ∑ n z−n = 1 −1
+ 1 −1
n =0
2 n =0
3 1 − 2 z 1 − 3z
2 − 65 z−1 z(2z − 65 )
X (z) = =
(1 − 12 z−1 )(1 − 31 z−1 ) (z − 21 )(z − 31 )
for |z| > 1/2 and |z| > 1/3. The region of convergence is |z| > 1/2.
Solution 4.2. (a) The z-transform of signal x (n) = δ(n + 1) + δ(n) + δ(n − 1) is
∞
X (z) = ∑ (δ(n + 1) + δ(n) + δ(n − 1)) z−n =
n=−∞
1
= z + 1 + z −1 = z + 1 + .
z
The region of convergence excludes z = 0 and z −→ ∞.
(b) For x (n) = 21n [u(n) − u(n − 10)] we know that
1, n = 0, 1, . . . , 9
u(n) − u(n − 10) =
0, elsewhere.
The expression for X (z) is written in this way in order to find the region of convergence, observing
the zero-pole locations in the z-plane, Fig. 4.7. Poles are at z p1 = 0 and z p2 = 1/2. Zeros are
z0i = e j2iπ/10 /2, Fig. 4.7. Since the z-transform has a zero at z0 = 1/2, it will cancel out the pole
z p2 = 1/2. The resulting region of convergence will include the whole z plane, except the point at
z = 0.
Ljubiša Stanković Digital Signal Processing 187
j2π/10
z=e /2
Im{z}
z=1/2
Re{z}
dX (z) a a
−z = z 2 e a/z = X (z)
dz z z
The inverse z-transform of left and right side of this equation is
nx (n)u(n) = ax (n − 1)u(n)
dX (z)
since Z [nx (n)] = −z dz and z−1 X (z) = Z [ x (n − 1)]. This means that
a
x (n) = x ( n − 1)
n
for n > 0. According to the initial value theorem, we the value of x (0) is equal to
a2 a3
x (1) = a, x (2) = , x (3) = ,...
2 2·3
or
an
x (n) = u ( n ).
n!
(b) For X (z) = ln(1 + az−1 )
follows. The region of convergence is complementary to the one holding for the original signal. If the
region of convergence for x (n) is |z| > a, then the region of convergence for x (−n) is |z| < a .
(b) For a real-valued signal holds x ∗ (n) = x (n). Then we can write X ∗ (z∗ ) as
∞ ∗ ∞ ∗
X ∗ (z∗ ) = ∑ x (n)(z∗ )−n = ∑ x ∗ (n) (z∗ )−n .
n=−∞ n=−∞
∗ ∗
Since (z∗ )−n = (z−n )∗ = z−n , we get
∞ ∞
X ∗ (z∗ ) = ∑ x ∗ (n)z−n = ∑ x ( n ) z − n = X ( z ),
n=−∞ n=−∞
Solution 4.6. A direct expansion of the given transform into power series within the region of
convergence is used. In order to find the signal x (n), whose z-transform is X (z) = 2−13z , the z-
transformshould
be written in the form of the power series of X (z) in powers of z−1 . Since the
3z
condition 2 < 1 does not correspond to the region of convergence given in the problem formulation
we have to rewrite X (z) as
1 1
X (z) = − 2
.
3z 1 − 3z
Ljubiša Stanković Digital Signal Processing 189
2
Now, the condition 3z < 1, that is |z| > 32 , corresponds to the given region of convergence. In order
to obtain the inverse z-transform, we can write
1 1 1
X (z) = − 2
= − X1 ( z ) ,
3z 1 − 3z 3z
where
1
X1 ( z ) = 2
.
1 − 3z
The power series expansion of X1 (z) gives
∞ n ∞ n
2 2
X1 ( z ) = ∑ = ∑ z−n .
n =0
3z n =0
3
Finally, comparing this result to (4.26) we get the signal x (n) in the form
n −1
1 2
x (n) = − u ( n − 1).
3 3
Solution 4.7. Since the signal is causal, the region of convergence is outside the pole with the largest
radius (outside the circle passing through this pole). The poles of the z-transform X (z) are
1 2
z p1 = and z p2 = − .
2 3
The region of convergence is |z| > 32 . The z-transform can be written as
z+1 A B
X (z) = = +
(2z − 1)(3z + 2) 2z − 1 3z + 2
3 1
A= , B=− .
7 7
190 z-Transform
The terms in X (z) should be expressed in such a way that they represent sums of geometric series for
the given region of convergence. Based on the solution to the previous problem, we conclude that
A 1 B 1
X (z) = 1
+ 2
.
2z 1 − 2z 3z 1 + 3z
and
B 1 B ∞ 2 n −n B ∞ 2 n − n −1 2
3z n∑ ∑ 3 z
2
= − z = − , |z| > .
3z 1 + 3z =0 3 3 n =0
3
The z-transform, with m = n + 1, is of the form
m −1
A ∞ 1 −m B ∞ 2 m −1 − m
2 m∑ 3 m∑
X (z) = z + − z .
=1 2 =1 3
The signal x (n) is obtained by comparing this transform to the z-transform definition,
n
3 1 1 2 n
x (n) = + − u ( n − 1).
7 2 14 3
3 − 56 z−1 A B
H (z) = 1 −1 1 −1
= + ,
(1 − 4 z )(1 − 3 z ) 1 − 14 z−1 1 − 13 z−1
with A = 1 and B = 2.
(a) The region of convergence must contain |z| = 1, for a stable system. This region of convergence
is |z| > 13 . From
n ∞ ∞ n
1 2 1 −n 1 1 1
H (z) = + = ∑ z +2 ∑ z−n , |z| > and |z| >
1 − 41 z−1 1 − 13 z−1 n=0 4 n =0
3 3 4
h ( n ) = (4− n + 2 × 3− n ) u ( n ).
(b) The region of convergence is 14 < |z| < 31 . The first term in H (z) is the same as in (a), since it
converges for |z| > 14 . This term corresponds to the signal 4−n u(n). The second term must be rewritten
in such a way that its geometric series converges for |z| < 13 . Then,
2 3z ∞ −1 1
1 −1
= −2 = −2 ∑ (3z)n = −2 ∑ (3z)−m with |z| < .
1− 3z
1 − 3z n =1
m=−n m=−∞ 3
Ljubiša Stanković Digital Signal Processing 191
The signal corresponding to this z-transform is −2 × 3−n u(−n − 1). Then, the impulse response of
the system with the region of convergence 14 < |z| < 31 is obtained in the form
c) For an anticausal system, the region of convergence is |z| < 14 . Now, the second term in H (z) is the
same as in (b). For |z| < 41 , the first term in H (z) should be written as
1 4z ∞ −1 1
=− = − ∑ (4z)n = − ∑ (4z)−m with |z| < .
1 − 14 z−1 1 − 4z n =1
m=−n m=−∞ 4
The signal corresponding to this term is −4−n u(−n − 1). The impulse response of the anticausal
discrete system, with the given transfer function H (z), is
1 1
H (z) = √ = √ √
3 3 3
(1 − 4z)( 14 − 2
2 z+z ) (1 − 4z)(z − 4 + j 14 )(z − 4 − j 41 )
√ √
with the poles z1 = 1/4, z2 = 43 − j 14 , and z3 = 43 + j 14 . Since |z2 | = |z3 | = 1/2, the possible
regions of convergence are:
(1) |z| < 1/4,
(2) 1/4 < |z| < 1/2, and
(3) |z| > 1/2.
In the first two cases, the system is neither causal nor stable, while in the third case, the system is
causal and stable since |z| = 1 and |z| → ∞ belong to the region of convergence.
The output to the input signal
is
z+2 A B C
= + + 2.
z2 ( z − 2) z−2 z z
Az2 + Bz(z − 2) + C (z − 2) = z + 2
( A + B)z2 + (−2B + C ) − 2C = z + 2.
A + B = 0, −2B + C = 1, and − 2C = 2
192 z-Transform
z −1 1 1
H (z) = − 2− .
1 − 2z−1 z z
The region of convergence of a causal system is |z| > 2. The inverse z-transform of H (z) for causal
system is the system impulse response equal to
h ( n ) = 2n −1 u ( n − 1) − δ ( n − 2) − δ ( n − 1) = δ ( n − 2) + 2n −1 u ( n − 3).
For the region of convergence defined by |z| > 1, the signal is causal and
1 1
x (n) = [1 + (−1)n ] jn u(n) = [1 + (−1)n ]e jπn/2 u(n).
2 2
For n = 4k, where k ≥ 0 is an integer, x (n) = 1 , while for n = 4k + 2 the signal values are x (n) = −1.
For other n the signal value is x (n) = 0.
For |z| < 1 the inverse z-transform is
1
x (n) = − [1 + (−1)n ] jn u(−n − 1).
2
3 − 45 z−1 + 3 −2
16 z 3 − 54 z−1 + 3 −2
16 z
H (z) = 5 −2 1 −3
= 1 −1 1 −2 1 −1
1 − z −1 + 16 z − 32 z (1 − 2z + 16 z )(1 − 2 z )
1 1 5
=− − 2 + .
1 − 41 z−1 1− 1 −1 (1 − 12 z−1 )
4z
For a causal system, the region of convergence is outside of the pole z = 1/2, that is |z| > 1/2. Since
1 d z
2 =
da 1 − az −1
1 −1 a=1/4
1 − 4z
d ∞ n −(n−1) ∞
∞
1
= ∑ a z = ∑ nan−1 z−(n−1) = ∑ ( n + 1) n z − n ,
da n=0 n =1
n =0
4
a=1/4 a=1/4
Solution 4.13. The transfer function of the system defined by difference equation
3 1
y(n) = x (n) − x ( n − 1) + x ( n − 2)
4 8
is
3 1
H ( z ) = 1 − z −1 + z −2 .
4 8
The z-transform of the input signal x (n) = 1/4n u(n) is equal to
1
X (z) = ,
1 − 14 z−1
with the region of convergence |z| > 1/4. The z-transform of the output signal is
1
H (z) =
1 − 13 z−1
1 − z −6
X ( z ) = 1 + z −1 + z −2 + z −3 + z −4 + z −5 = .
1 − z −1
The z-transform of the output signal is
1 − z −6
Y (z) = = Y1 (z) − Y1 (z)z−6
(1 − z−1 )(1 − 1/3z−1 )
with
1 3/2 1/2
Y1 (z) = = − .
(1 − z−1 )(1 − 1/3z−1 ) 1 − z−1 1 − 31 z−1
Its inverse z-transform is n
3 1 1
y1 ( n ) = − u ( n ).
2 2 3
The system output is obtained in the form
n " #
3 1 1 3 1 1 n −6
y(n) = − u(n) − − u ( n − 6).
2 2 3 2 2 3
11 −1 1 −2 3
Y (z)(1 − z + z ) = X (z)(2 − z−1 )
6 2 2
194 z-Transform
as
2 − 23 z−1
H (z) = 11 −1
.
1− 6 z + 21 z−2
The poles are at z p1 = 1/3 and z p2 = 3/2 with the region of convergence |z| > 3/2. This means that
the system is not stable, Fig. 4.8.
Im{z}
Im{z}
Im{z}
1/3 1/3
3/2 3/2 3/2
Figure 4.8 Poles and zeros of the system (left), the input signal z-transform (middle), and the z-transform of the
output signal (right).
The output signal transform does not have a pole at z = 3/2, since this pole is canceled out. The output
signal is
2 3 1
y(n) = n u(n) − u ( n − 1).
3 2 3n −1
with
z 1 1
X (z) = = −
− .
z2 + 3z + 2 1 + z 1 1 + 2z−1
The inverse z-transform of X (z) is
Solution 4.17. The z-transforms of the left and right side of this equation are
z
zX (z) − zx (0) = X (z) +
z−a
z 1 1 a
X (z) = = − .
(z − a)(z − 1) 1 − a z − 1 z − a
The inverse z-transform is
1 1 − an
x (n) = [u(n − 1) − an u(n − 1)] = u ( n − 1)
1−a 1−a
or
n −1
x (n) = ∑ ak , n > 0.
k =0
Solution 4.18. For a direct solution in the discrete-time domain we assume a solution to the
homogeneous part of the equation
√
2 1
y(n) − y ( n − 1) + y ( n − 2) = 0 (4.27)
2 4
in the form yi (n) = Ci λin . The characteristic polynomial is
√
2 1
λ2 − λ+ =0
2 4
√ √
2 2
with λ1,2 = 4 ±j 4 .
The homogeneous solution to the difference equation is
√ √ √ √
2 2 n 2 2 n 1 1
yh (n) = C1 ( +j ) + C2 ( −j ) = C1 n e jnπ/4 + C2 n e− jnπ/4 .
4 4 4 4 2 2
1
A particular solution is assumed in the form of the input signal x (n) = 3n u ( n ) , that is
1
y p (n) = A u ( n ).
3n
The constant A is obtained by replacing this signal into (4.23)
√
1 2 1 1 1 1
A n − A n −1 + A n −2 = n
3 2 3 4 3 3
√
3 2 9
A (1 − + ) = 1.
2 4
Its value is A = 0.886. The general solution to the considered difference equation is equal to the sum
of the homogeneous solution and the particular solution
1 jnπ/4 1 1
y(n) = yh (n) + y p (n) = C1 e + C2 n e− jnπ/4 + 0.886 n .
2n 2 3
Since the system is causal with y(n) = 0 for n < 0, the constants C1 and C2 can be obtained from the
initial conditions following from
√
2 1
y(n) − y ( n − 1) + y ( n − 2) = x ( n )
2 4
196 z-Transform
as
y (0) = x (0) = 1
and √ √
2 2 1
y (1) = y (0) + x (1) = +
2 2 3
. With this initial conditions, we get
C1 + C2 + 0.886 = 1 (4.28)
√ √ √ √ √
2 2 2 2 1 2 1
C1 ( +j )/2 + C2 ( −j )/2 + 0.886 = + ,
2 2 2 2 3 2 3
as C1 = 0.057 − j0.9967 = 0.9984 exp(− j1.5137) = C2∗ . The final solution is
1 1
y(n) = 2 × 0.9984 cos(nπ/4 − 1.5137) + 0.886 n .
2n 3
For the z-domain, we write
√
2 1
Y (z) − Y ( z ) z −1 + Y ( z ) z −2 = X ( z )
2 4
with
1 1
Y (z) = √
1− 2 −1
+ 1 −2 1 − 31 z−1
2 z 4z
or
z3
Y (z) = √ √ √ √ .
2 2 2 2 1
(z − ( 4 +j 4 ))( z − ( 4 −j 4 ))( z − 3 )
Using the residual value based inversion of the z-transform, we can get the signal in the form
n o
y(n) = ∑ [zn−1 Y (z)(z − zi )]|z=zi .
√ √
2 2
z1,2,3 = 4 ± j 4 ,1/3
With the residual value definition for the simple poles z1 , z2 , and z3 we get
n +2 1 1
y(n) = z √ √ + z n + 2 √ √
2− j 2
1 √ √ 2+ j 2 1
√ √
(z − 4 )( z − 3 ) 2+ j 2 ( z − 4 )( z − 3 ) z= 4
2− j 2
4
1
+ z n +2 √ √ √ √
2+ j 2 2− j 2
(z − 4 )(z − 4 ) z=1/3
√ !
√ n +2 √ !
√ n +2
1 2+j 2 1 1 2−j 2 1 1 1
= √ √ √ − √ √ √ + n +2 √
j 2 4 2+ j 2 1 j 2 4 2− j 2 1 3 ( 1
− 1 2 1
2 4 −3 2 4 −3 9 3 2 + 4)
√ √
1 −j 2 1 j 2 1
= n+2 e j(n+2)π/4 √ √ + n+2 e− j(n+2)π/4 √ √ + 0.886 n
2 2+ j 2
−31 2 2− j 2
−3 1 3
4 4
√ √
1 2 1 2 1
= n e jnπ/4 √ √ + n e− jnπ/4 √ √ + 0.886 n
2 2 + j 2 − 43 2 2 − j 2 − 34 3
1 1
= 2 × 0.9984 cos(nπ/4 − 1.5137) + 0.886 n ,
2n 3
Ljubiša Stanković Digital Signal Processing 197
for n ≥ 1. For n = 0, there is no additional pole at z = 0. The previous result holds for n ≥ 0.
Its z-transform is
Z [∇2 x (n)] = (1 − z−1 )2 X (z).
In the same way we get
Z [∇m x (n)] = (1 − z−1 )m X (z).
The z-transform of the first forward difference is
Solution 4.20. The transfer function of this system can be written in the form
√ √ √ √ √
1 − 2z−1 + z−2 [1 − ( 22 + j 22 )z−1 ][1 − ( 22 − j 22 )z−1 ]
H (z) = √ = √ √ √ √
1 − r 2z−1 + r2 z−2 [1 − r ( 22 + j 22 )z−1 ][1 − r ( 22 − j 22 )z−1 ]
√ √ √ √
2
[z − ( 2 + j 22 )][z − ( 22 − j 22 )]
= √ √ √ √ .
[z − r ( 22 + j 22 )][z − r ( 22 − j 22 )]
√ √ √ √
The zeros and poles are z01,02 = 22 ± j 22 and z p1,p2 = r 22 ± jr 22 . Their locations are shown in
Fig. 4.9.
The amplitude of the frequency response is
B TO1 TO2 TO1 TO2
| H (e jω )| = 0 = .
A0 TP1 TP2 TP1 TP2
The values of TP1 and TO1 , and TP2 and TO2 , are almost the same for any ω, except
ω = ±π/4, where the distance to the corresponding zeros of the transfer function is 0, while the
198 z-Transform
distance to the corresponding pole is small but finite. Based on this analysis, the amplitude of the
frequency response is shown in Fig. 4.9.
O1
1.5
T
P1
|H(ejω)|
Im{z}
P2
0.5
O2
0
Re{z} −2 −π/4 0 π/4 2 ω
This system will filter out signal the components at ω = ±π/4. The output discrete-time signal is
z−
o
N
= 1 = e− j2πm
zom = e j2πm/N , m = 0, 1, . . . , N − 1.
Similarly, the poles are equal to zmp = r1/N e j2πm/N , m = 0, 1, . . . , N − 1. The frequency response of
the comb filter is
N −1 N −1
z − zom z − e j2πm/N
H (z) = ∏ = ∏ .
m =0
z − z pm m =0 z − r
1/N e j2πm/N
| H (e jω )| ∼
= 1 for z 6= e j2πm/N
| H (e jω )| = 0 for z = e j2πm/N .
T high importance. Some discrete-time systems are designed and realized in order to replace
or perform as equivalents of the continuous-time systems. It is quite common to design a
continuous-time system with desired properties, since the designing procedures in this domain are
simpler and well developed. In the next step the obtained continuous-time system is then transformed
into the corresponding discrete-time system.
Consider an Nth order linear continuous-time system described by a differential equation with
constant coefficients
d N y(t) dy(t) d M x (t) dx (t)
aN + · · · + a 1 + a 0 y ( t ) = b M + · · · + b1 + b0 x (t).
dt N dt dt M dt
The Laplace transform domain equation for this system is
There are several approaches to establish the relation between the continuous-time system in (5.1) and
the discrete-time system in (5.1), represented by their corresponding impulse responses or transfer
functions.
A natural approach to transform a continuous-time system into the corresponding discrete-time system
is based on the relation between the impulse responses of these two systems. Assume that the impulse
response of the continuous-time system is hc (t). The impulse response h(n) of the corresponding
discrete-time system, according to this approach, is equal to the samples of hc (t),
h(n) = hc (n∆t)∆t.
200
Ljubiša Stanković Digital Signal Processing 201
Obviously this relation can be used only if the sampling theorem is satisfied for the sampling interval
Figure 5.1 Sampling of the impulse response for the impulse invariance method.
∆t. It means that the frequency response of the continuous-time system must satisfy the sampling
theorem condition
and ∆t < π/Ωm . Otherwise the discrete-time version will not correspond to the continuous-time
version of the frequency response. Here, the frequency response of the discrete-time system is related
to a periodically extended form of the continuous-time system frequency response H (Ω) as
∞
∑ H (Ω + 2kπ/∆t) = H (e jω ), Ω = ω/∆t.
k=−∞
The transfer function of the continuous-time system in (5.1) may be decomposed using the partial
fractions as
b M s M + · · · + b1 s + b0 k1 k2 kN
H (s) = N
= + + ··· + , (5.3)
a N s + · · · + a1 s + a0 s − s 1 s − s 2 s − sN
where only simple poles, s1 , s2 , . . . , s N , of the transfer function are used. The case of multiple poles
will be discussed later. The inverse Laplace transform of a causal system, described by the previous
transfer function, is
h c ( t ) = k 1 e s1 t u ( t ) + k 2 e s2 t u ( t ) + · · · + k N e s N t u ( t ).
The impulse response of the corresponding discrete-time system is equal to the the samples of hc (t),
h(n) = hc (n∆t)∆t = [k1 ∆tes1 n∆t u(n) + k2 ∆tes2 n∆t u(n) + · · · + k N ∆tes N n∆t u(n)],
since u(n∆t) = u(n). The z-transform of the impulse response h(n) of the discrete-time system is
k1 ∆t k2 ∆t k N ∆t
H (z) = −
+ −
+ ··· + . (5.4)
1−e s 1 ∆t z 1 1−e s 2 ∆t z 1 1 − es N ∆t z−1
Comparing (5.3) to (5.4) it can be concluded that the terms in the transfer functions are transformed
from the continuous-time to the discrete-time case as
ki k i ∆t
→ . (5.5)
s − si 1 − esi ∆t z−1
202 From Continuous to Discrete Systems
If a multiple pole si of the (m + 1)th order exists in the continuous-time system transfer function,
then this term can be written as
ki 1 dm k i
= .
( s − si ) m + 1 m! dsim s − si
The term in the discrete-time system, corresponding to this continuous-time system term, is
1 dm ki 1 dm k i ∆t
→ . (5.6)
m! dsim s − si m! dsim 1 − esi ∆t z−1
s=jΩ
j2π/∆t
jπ/∆t z=ejω
Im{z}
Im{s}
−jπ/∆t
−j2π/∆t
Re{s} Re{z}
si → esi ∆t .
This mapping relation does not hold for zeros, Fig. 5.2.
Impulse response with an initial instant discontinuity. In the case when the continuous-time impulse
response hc (t) has a discontinuity at t = 0, that is, when
hc (t)|t=−0 6= hc (t)|t=+0 ,
then the previous forms assume that the discrete-time impulse response h(0) = hc (t)|t=+0 . Recall that
the theory of Fourier transforms in this case states that the inverse Fourier transform IFT{ H ( jΩ)} =
hc (t) where the signal hc (t) is continuous and
IFT{ H ( jΩ)} = hc (t)|t=−0 + hc (t)|t=+0 /2
at the discontinuity points (in this case at t = 0). The special case of discontinuity at t = 0 can be easily
detected for a causal system by mapping H (s) onto H (z) and by checking is the following relation
satisfied
0 = hc (t)|t=−0 = hc (t)|t=+0 = h(n)|n=0 = lim H (z).
z→∞
If limz→∞ H (z) 6= 0 then a discontinuity exists and we should use
since hc (t)|t=−0 = 0 and hc (t)|t=+0 ∆t = limz→∞ H (z). The resulting frequency response is
What is the corresponding discrete-time system according to the impulse invariance method with
∆t = 1?
with
k1 = H (s)(s + 1)|s=−1 = −1,
1
k2 = H (s)(s + ) = 2.
2 s=−1/2
Thus, we get
−1 2
H (s) = + .
s+1 s + 12
According to (5.5) the discrete-time system is
−1 2
H (z) = − −
+ .
1−e z1 1 1 − e 1/2 z−1
−
Since limz→∞ H (z) = 1, obviously there is a discontinuity in the impulse response and the
resulting transfer function should be corrected as
−1 2
H (z) = + − 1/2.
1 − e −1 z −1 1 − e−1/2 z−1
The impulse response and the frequency response of the discrete-time systems with uncorrected
and corrected discontinuity effect are presented in Fig. 5.3.
(1 − 3s/2)
H (s) = .
(6s2 + 5s + 1)(s + 1)2
What is the corresponding discrete-time system according to the impulse invariance method with
∆t = 1?
204 From Continuous to Discrete Systems
0.5 0.5
0 0
−5 0 5 10 15 −5 0 5 10 15
4 4
|H(ejω)| |H(ejω)|
3 |H(jΩ)| 3 |H(jΩ)|
2 2
1 1
0 0
−2 0 2 −2 0 2
Figure 5.3 Impulse responses of systems in continuous and discrete-time domains (top). Amplitude of the
frequency response of systems in continuous and discrete-time domains (bottom). System without discontinuity
correction (left) and system with discontinuity correction (right).
with
k1 = H (s)(s + 1/2)|s=−1/2 = −7, k2 = 27/8, and k3 = H (s)(s + 1)2 = 5/4.
s=−1
Since h(0) = limz→∞ H (z) = 0 there no need to consider possible impulse response correction
due to discontinuity.
Ljubiša Stanković Digital Signal Processing 205
we can easily see that the poles are mapped according to s pi → es pi ∆t , Fig. 5.4, while there is
no direct correspondence among zeros of the transfer functions. The impulse responses of the
continuous-time system and the discrete-time system are shown in Fig. 5.5.
s=jΩ
z=ejω
Im{z}
Im{s}
1
−1 2/32/3 1.9894
Re{s} Re{z}
Figure 5.4 Pole-zero locations in the s-domain and the z-domain using the impulse invariance method.
The matched z-transform method is based on a discrete-time approximation of the Laplace transform
derived in the previous chapter as the starred transform (4.22)
Z∞ ∞
X (s) = x (t)e−st dt ∼
= ∑ x (n)e−sn∆t = X (z)|z=es∆t .
−∞ n=−∞
This approximation leads to a relation between the Laplace domain and the z-domain in the form of
z = es∆t .
If we use this relation to map all zeros and poles of a continuous-time system transfer function
z0i = es0i ∆t
z pi = es pi ∆t ,
206 From Continuous to Discrete Systems
0.3
hc(t), h(n)
0.2
0.1
−0.1
0 5 10 15 20 25 30 35 40
|H(jΩ)|, |H(ejω)|
1
0.5
0
−3 −2 −1 0 1 2 3
1
10
20log|H(jΩ)|
0 20log|H(ejω)|
10
−1
10
−2
10
−3 −2 −1 0 1 2 3
Figure 5.5 Impulse responses of systems in continuous and discrete-time domains (top). Amplitude of the
frequency response of systems in continuous and discrete-time domains (middle). Amplitude of the frequency
response of systems in continuous and discrete-time domains in logarithmic scale (bottom).
the matched z-transform method of the system follows. The discrete-time system transfer function is
Example 5.3. For the continuous-time system with the transfer function
1−s
H (s) =
8s2 + 6s + 1
find the corresponding discrete-time system according to the matched z-transform method and
∆t = 1?
Ljubiša Stanković Digital Signal Processing 207
s=jΩ
j2π/∆t
jπ/∆t z=ejω
Im{z}
Im{s}
1
−jπ/∆t
−j2π/∆t
Re{s} Re{z}
Figure 5.6 Illustration of the zeros and poles mapping in the matched z−transform method.
z−e
H (z) = k .
8(z − e−1/2 )(z − e−1/4 )
The first-order backward difference is a common method to approximate the first-order derivative of a
continuous-time signal
dx (t)
y(t) =
dt
∼ n∆t) − x ((n − 1)∆t) .
y(n∆t) =
x (
∆t
The Laplace transform domain of the continuous-time first derivative is
In the discrete-time domain, with y(n) = y(n∆t)∆t and x (n) = x (n∆t)∆t, the approximation of this
derivative results in the first-order linear difference equation
x ( n ) − x ( n − 1)
y(n) = .
∆t
208 From Continuous to Discrete Systems
1 − z −1
Y (z) = X ( z ). (5.9)
∆t
Based on (5.8) and (5.9) we can conclude that the mapping of the corresponding differentiation
operators from the continuous-time to the discrete-time domain is
1 − z −1
s= . (5.10)
∆t
With a normalized discretization step ∆t = 1 this mapping is of the form
s = 1 − z −1 .
The same result could be obtained by considering a rectangular rule approximation of a continuous-
time integral, at an instant t = n∆t,
n∆t
Z Z −∆t
n∆t
y(n∆t) = x (t)dt ∼
= x (t)dt + x (n∆t)∆t.
−∞ −∞
y(n∆t) ∼
= y(n∆t − ∆t) + x (n∆t)∆t.
The Laplace and the z-transform domain forms of the previous integral equations are
1
Y (s) = X (s)
s
∆t
Y (z) = X ( z ).
1 − z −1
The same mapping of the z-plane to the s-plane as in (5.10) follows.
Consider the imaginary axis from the s-plane (the Fourier transform line). According to (5.10)
the mapping, with ∆t = 1, is defined by
1 − s → z −1 . (5.11)
Now we will consider the region which corresponds to the imaginary axis and the left semi-plane of
the s-domain (containing poles of a stable system), Fig. 5.7(left). The aim is to find the corresponding
region in the z-domain.
If we start from the s-domain and the region in Fig. 5.7(left), the first mapping is to reverse the
s-domain to −s and shift it for +1, as
1 − s → p.
The corresponding domain, after this mapping, is shown in Fig. 5.7(middle).
The next step is to map the region from p-domain into the z-domain, according to (5.11), as
p → z −1 .
Ljubiša Stanković Digital Signal Processing 209
By denoting Re{z} = x and Im{z} = y we get that the line Re{ p} = 1 in the p−domain,
corresponding to the imaginary axis in the s-plane, is transformed into the z-domain according to
1
Re{ p} = Re{ }
z
1
1 = Re{ }
x + jy
1 x − jy
1 = Re{ }
x + jy x − jy
resulting in
x
1=
x 2 + y2
or in 2
1 1
( x − )2 + y2 = . (5.12)
2 2
Therefore, the imaginary axis in the s-plane is mapped onto a circle defined by (5.12), Fig. 5.7(right) in
the z-plane. From the mapping relation 1 − s → z−1 it is easy to conclude that the origin s = 0 + j0
maps into z = 1 and that s = 0 ± j∞ maps into z = ±0, according to 1/(1 − s) → z.
s=0+jΩ p=1
z=ejω
Im{p}
Im{z}
Im{s}
1−s→ p p→ z−1
Figure 5.7 Illustration of the differentiation based mapping of the left s−semi-plane with the imaginary axis (left),
translated and reversed p−domain (middle), and the z−domain (right).
Mapping of the imaginary axis into z-domain can also be analyzed from
with
r −1 tan ω
Ω= sin ω = .
∆t ∆t
Obviously, ω = 0 maps to Ω = 0 (with Ω ∼ = ω/∆t for small ω), and ω = ±π/2 maps into Ω → ±∞.
Thus, the whole imaginary axis maps onto −π/2 ≤ ω ≤ π/2. These values of ω could be used within
the basic
p period. Relation (5.13),pwith −π/2 ≤ ω ≤ π/2, is a circle defined by (5.12) if we replace
r = x2 + y2 and cos ω = x/ x2 + y2 with σ < 0 (semi-plane with negative real values) being
mapped into r < cos ω (interior of unit circle).
210 From Continuous to Discrete Systems
What is the corresponding transfer function of the discrete-time system using the first-order
backward difference approximation with ∆t = 1/2? What is the solution to the differential
equation for x (t) = u(t). Compare it with the solution to difference equation y(n) with ∆t = 1/8.
⋆The discrete-time system transfer function is obtained using s = 1 − z−1 /∆t in H (s) as
1
H (z) = 2
1 − z −1 3 1 − z −1 1
∆t + 4 ∆t + 8
(∆t)2
= 3 1 2
1+ 4 ∆t + 8 (∆t) − [2 + 34 ∆t]z−1 + z−2
with
where ∆t = 1/2. For x (t) = u(t), the continuous-time output signal is obtained from
1
Y (s) = H (s) X (s) =
s(s2 + 34 s + 81 )
8 8 16
= + 1
− 1
s s+ 2 s+ 4
as
y(t) = [8 + 8e−t/2 − 16e−t/4 ]u(t).
The results of the difference equation for y(n) are compared with the exact solution y(t) in Fig.
5.8. The agreement is high. It could be additionally improved by reducing the sampling interval,
for example, to ∆t = 1/8.
Ljubiša Stanković Digital Signal Processing 211
10
y(t), y(n)
0 5 10 15
Figure 5.8 The exact solution to the differential equation for y(t), in solid line, and the discrete-time system
output y(n), in large dots for ∆t = 1/2 and in small dots for ∆t = 1/8.
In the case of a differentiator based mapping, the imaginary axis in the s−domain, corresponding to
the Fourier transform values, has been mapped onto a circle with radius 1/2 and the center at z = 1/2
in the z−domain, as shown in Fig. 5.7. This mapping does not correspond to the Fourier transform of
discrete-time signals position in the z−plane, which is along the circle line |z| = 1. A transformation
that will map the imaginary axis from the s−domain onto the unit circle in the z−domain is presented
next.
Consider numerical integration in the case of the first-order system (for example, the charge on a
capacitor), using the trapezoid rule
n∆t
Z Z −∆t
n∆t
x (n∆t) + x ((n − 1)∆t)
y(n∆t) = x (t)dt ∼
= x (t)dt + ∆t
2
−∞ −∞
x ( n ) + x ( n − 1)
y ( n ) = y ( n − 1) + ∆t.
2
In the Laplace and the z-transform domain, these relations have the following forms
1
Y (s) = X (s)
s
∆t 1 + z−1
Y (z) = X ( z ).
2 1 − z −1
The mapping from the s−domain to the z−domain is defined here by
2 1 − z −1
s→ . (5.14)
∆t 1 + z−1
In complex analysis this mapping is known as the bilinear transform.
Now we can repeat the transformation of the continuous-time system, H (s), from Example 5.1 to
get the discrete-time system, H (z), by replacing s with 2(1 − z−1 )/(1 + z−1 ) in (5.7).
212 From Continuous to Discrete Systems
Within the derivatives framework, the bilinear transform can be understood as the following
derivative approximation. Consider the first-order backward derivative approximation
y ( n ) = x ( n ) − x ( n − 1).
The same signal samples can used for the first-order forward derivative approximation
y ( n − 1) = x ( n ) − x ( n − 1).
If we assume that the difference x (n) − x (n − 1) fits better to the mean of y(n) and y(n − 1) than to
any single one of them, then the derivative approximation using the difference equation
y ( n ) + y ( n − 1)
= x ( n ) − x ( n − 1),
2
produces the bilinear transform.
In order to prove that the unit circle in the z−domain maps onto the imaginary axis in the
s−domain we may simply replace z = e jω into (5.14) and obtain
1 − e− jω e jω/2 − e− jω/2 ω
2 −
= 2 jω/2 = 2j tan( ) → s∆t.
1+e jω e + e− jω/2 2
For s = σ + jΩ, follows
σ=0
2 ω
Ω= tan( ).
∆t 2
Therefore, the unit circle z = e jω maps onto the imaginary axis σ = 0. The frequency points ω = 0
and ω = ±π map into Ω = 0 and Ω → ±∞, respectively.
The linearity of the frequency mapping Ω → ω is lost. It holds for small values of ω only
2 ω ω
Ω= tan( ) ∼
= , for |ω | ≪ 1.
∆t 2 ∆t
From
q
1+ s∆t (1 + σ∆t 2
2 ) + ( Ω∆t
2 )
2
2
z= s∆t
and |z| = q
1− (1 − σ∆t 2
+ ( Ω∆t 2
2 2 ) 2 )
it may easily be concluded that σ < 0 maps into |z| < 1, since 1 + σ∆t σ∆t
2 < 1 − 2 for σ < 0.
The bilinear transform mapping can be derived using a series of complex plane mappings. Since
s∆t
1+ 2 2
z= s∆t
= s∆t
− 1,
1− 2 1− 2
we can write
s∆t 1
1− → p1 , → p2 , and 2p2 − 1 → z.
2 p1
This series of mappings from the s-domain to the z-domain is illustrated in Fig. 5.9, with ∆t = 1. The
2
fact that p11 → p2 maps the line Re{ p1 } = 1 onto the circle ( x − 21 )2 + y2 = 21 in p2 -domain is
proven in the previous section.
Ljubiša Stanković Digital Signal Processing 213
s=0+jΩ p=1
Im{p }
Im{s}
1
1− s p−1→ p
Re{s} 2
→ p1 Re{p1} 1 2
z=ejω
Im{p2}
Im{z}
1 1
Re{p } 2p −1→ z
2 Re{z}
2
Figure 5.9 Bilinear mapping illustration trough a series of elementary complex plane mappings.
Since the bilinear transform introduces a nonlinear transformation of the frequency axis
from the continuous-time domain to the discrete-time domain, Ω = ∆t 2
tan( ω2 ), this nonlinearity
must be compensated during the system design. Usually it is done by pre-modifying the desired
important frequency values Ωc from the analog domain using Ωd = ∆t 2
tan( ω2c ), and ωc = Ωc ∆t.
The new continuous-time domain frequencies Ωd will be returned back to the desired values ωc and
Ωc = ωc /∆t after the bilinear transformation.
and to stop all other possible signal components. The parameters are Q = 0.01, Ω1 = π/4, and
Ω2 = 3π/5. The signal is sampled with ∆t = 1 and the discrete-time signal x (n) is formed. Using
214 From Continuous to Discrete Systems
the bilinear transform, design the discrete-time system that corresponds to the continuous-time
system with the transfer function H (s).
1 − z −1
s→2 (5.15)
1 + z −1
and map H (s) to HB (z) without any pre-modification. The result is presented in the first two
subplots of Fig. 5.10. The discrete frequencies are shifted since the bilinear transform (5.15) made
a nonlinear frequency mapping from the continuous-time to discrete-time domain, according to
ω
Ω = 2 tan( ).
2
Thus, obviously, the system HB (z) is not a system that will filter the corresponding frequencies
in x (n) in the same way as H (s) filters x (t).
In order to correct the shift introduced by the bilinear transform mapping, the continuous-
time system should be pre-modified as
2QΩ1d 2QΩ2d
Hd (s) = + 2
s2 + 2Ω1d Qs + Ω21d + Q2 s + 2Ω2d Qs + Ω22d + Q2
with
2 Ω ∆t
Ω1d = tan( 1 ) = 0.8284 = 0.2637π
∆t 2
2 Ω2 ∆t
Ω2d = tan( ) = 2.7528 = 0.8762π.
∆t 2
We see that the shift of Ω1 = 0.25π to Ω1d = 0.2637π is small since the bilinear transform
frequency mapping for small frequency values is almost linear. However, for Ω2 = 0.6π, the
shift to Ω2d = 0.8762π is significant due to a high nonlinearity of mapping in that region. The
modified system Hd (s) is presented in subplot 3 of Fig. 5.10. Next, using the bilinear transform
z −1
mapping s → 2 11−+ z −1
the modified frequencies will map to the desired ones ω1 = Ω1 ∆t and
ω2 = Ω2 ∆t. The obtained discrete-time system transfer function is of the form
2QΩ1d 2QΩ2d
H (z) = 2 + 2 .
1 − z −1 1 − z −1 1 − z −1 −1
2 1 + z −1
+ 4Ω1d Q 1 + z −1
+ Ω21d + Q2 2 1 + z −1
+ 4Ω2d Q 11− z
+ z −1
+ Ω22d + Q2
When the expression for H (z) is appropriately rearranged, its final form is given by
The frequency response of this system is shown in panel 4 of Fig. 5.10. This is the desired
discrete-time system corresponding to the continuous-time system in panel 1 of this figure. In
calculations the coefficients are rounded to four decimal places.
0.5
0
-1 -0.8 -0.6 -0.4 -0.2 0 0.2 0.4 0.6 0.8 1
1
0.5
0
-1 -0.8 -0.6 -0.4 -0.2 0 0.2 0.4 0.6 0.8 1
0.5
0
-1 -0.8 -0.6 -0.4 -0.2 0 0.2 0.4 0.6 0.8 1
1
0.5
0
-1 -0.8 -0.6 -0.4 -0.2 0 0.2 0.4 0.6 0.8 1
Figure 5.10 Amplitude of the continuous-time system with the transfer function H (s) and the amplitude of
the transfer function HB (z) of the discrete-time system obtained by the bilinear transform (first two panels). A
premodified system to take into account the nonlinearity of the frequency mapping in the bilinear transform, Hd (s),
and the amplitude of the transfer function H (z) of the discrete-time system obtained by the bilinear transform of
Hd (s) (last two panels).
Comparison of the mapping methods presented in this section is summarized in the next table.
Sampling
Fourier transform
Method theorem
H (s)|s= jΩ → H (z)|z=e jω
condition
Impulse Invariance Yes, Ω = ω/∆t Yes
Matched z-transform No No
First-oder difference No No
tan(ω/2)
Bilinear transform Yes, Ω = ∆t/2 No
216 From Continuous to Discrete Systems
The digital filter design will be explained here. The lowpass filter is assumed as the basic filter form,
while the other filters (highpass and bandpass) are designed by modifying the system that corresponds
to the discrete-time lowpass filter. In the examples, the lowpass Butherworth filters will be used.
is noncausal.
There are several methods to approximate the ideal lowpass filter frequency response. One of
them is the Butterworth approximation. Some of commonly used approximations are Chebyshev and
elliptic forms as well.
A lowpass filter of the Butterworth type is shown in Fig. 5.11, along with the ideal one.
Figure 5.11 Lowpass filter frequency response: ideal case (left) and Butterworth type (right).
Example 5.6. Implement the Butterworth discrete-time filter of the order N = 4 with a critical
frequency corresponding to the continuous-time domain filter with the critical frequency
f c = 4[kHz] and the sampling interval ∆t = 31.25[µ sec], using:
(a) The impulse invariance method and
(b) the bilinear transform.
The poles of the fourth-order Butterworth filter in the continuous-time domain (Chapter I,
Subsection 1.6) are
h π π π π i
s0 = Ωc cos( + ) + j sin( + ) = Ωc (−0.3827 + j0.9239)
2 8 2 8
π 3π π 3π
s1 = Ωc cos( + ) + j sin( + ) = Ωc (−0.9239 + j0.3827)
2 8 2 8
π 5π π 5π
s2 = Ωc cos( + ) + j sin( + ) = Ωc (−0.9239 − j0.3827)
2 8 2 8
π 7π π 7π
s3 = Ωc cos( + ) + j sin( + ) = Ωc (−0.3827 − j0.9239).
2 8 2 8
The transfer function of the filter in the Laplace domain is
Ω4c
H (s) = 2
. (5.16)
( s2 + 0.7654Ωc s + Ωc )(s2 + 1.8478Ωc s + Ω2c )
(a) For the impulse invariance method the transfer function (5.16) should be expanded into partial
fractions,
k0 k1 k2 k3
H (s) = + + + ,
s − s0 s − s1 s − s2 s − s3
with the constants k i calculated based on k i = H (s)(s − si )|s=si as
k0 = (−0.3628 + j0.1503)/∆t,
k1 = (0.3628 − j0.8758)/∆t,
k2 = (0.3628 + j0.8758)/∆t,
k3 = (−0.3628 − j0.1503)/∆t.
Using the impulse invariance method we get the transfer function of the Butterworth filter
k0 ∆t k1 ∆t k2 ∆t k3 ∆t
H (z) = + + +
1 − es0 ∆t z−1 1 − es1 ∆t z−1 1 − es2 ∆t z−1 1 − es3 ∆t z−1
−0.3628 + j0.1503 0.3628 − j0.8758
= +
1 − eωc (−0.3827+ j0.9239) z−1 1 − eωc (−0.9239+ j0.3827) z−1
0.3628 + j0.8758 −0.3628 − j0.1503
+ + .
1 − eωc (−0.9239− j0.3827) z−1 1 − eωc (−0.3827− j0.9239) z−1
It can be seen that the discrete-time filter is a function of ωc . Thus, for a given continuous domain
frequencies and the sampling interval ∆t, it is possible to calculate the corresponding discrete-time
frequency ωc = Ωc ∆t and to use this frequency in the filter design with the normalized ∆t = 1.
Using the value of the critical frequency, ωc = π/4, we get
−0.3628 + j0.1503 −0.3628 − j0.1503
H (z) = +
1 − (0.5539 + j0.4913)z−1 1 − (0.5539 − j0.4913)z−1
0.3628 − j0.8758 0.3628 + j0.8758
+ + .
1 − (0.4623 + j0.1433)z−1 1 − (0.4623 − j0.1433)z−1
A system form with the real-valued coefficients is obtained by grouping the complex-conjugate
terms,
(b) For the bilinear transform, the critical frequency ωc has to be pre-modified according to
2 ωc 0.8284
Ωd = tan( ) = .
∆t 2 ∆t
Then, the frequency Ωd is used for the design in (5.16), instead of Ωc . The frequency Ωd will be
transformed back to Ωc = ωc /∆t, after the bilinear transform is used. Using the substitutions
2 1 − z −1
s → ∆t 1 + z −1
and
ωd = Ωd ∆t = 0.8284
in (5.16), the filter transfer function follows as
ωd 4
H (z) =
z −1 2 − z −1 −1 −1
[4( 11−
+ z −1
) + 2ωd 0.7654 11+ z −1
+ ωd2 ][4( 11− z
+ z −1
−z + ω 2 ]
)2 + 2ωd 1.8478 11+ z −1 d
0.4710
= −1 −1 −1
−z )2 + 1.2626 1−z + 0.6863][4( 1−z )2 + 3.0481 1−z + 0.6863] −1
[4( 11+ z −1 1 + z −1 1 + z −1 1 + z −1
− 1
4
0.4710 1 + z
=
3.4237z−2 − 6.6274z−1 + 5.9484 1.6382z−2 − 6.6274z−1 + 7.7704
4
0.084 1 + z−1
= −2
z − 1.9357z−1 + 1.7343 z−2 − 4.0455z−1 + 4.7433
0.084z−4 + 0.336z−3 + 0.504z−2 + 0.336z−1 + 0.084
= .
z−4 − 5.9810z−3 + 14.3z−2 − 16.1977z−1 + 8.2263
The transfer function (amplitude and phase) of the continuous-time filter and the discrete-
time filters, obtained using the impulse invariance method and the bilinear transform, are presented
in Fig. 5.12, within one period in frequency. The agreement between the amplitude and the phase
functions is high. The difference equation describing this Butterworth filter is
1.5
0.5
0
/2
Figure 5.12 Amplitude and phase of the fourth-order Butterworth filter frequency response, obtained using the
impulse invariance method and bilinear transform.
In calculations, the coefficients are normalized by 8.2263 and rounded to four decimal places.
Rounding may cause small quantization errors (that will be discussed within the next chapter).
Ljubiša Stanković Digital Signal Processing 219
⋆The maximum attenuation in the passband and the minimum attenuation in the stopband are defined by a_p = 20 log(A_p) and a_s = 20 log(A_s), with
A_p = 10^{a_p/20} = 0.7943
A_s = 10^{a_s/20} = 0.1679.
The relations for the filter order N and the critical frequency Ω_c are (Chapter I, Subsection 1.6)
1/(1 + (Ω_p/Ω_c)^{2N}) ≥ A_p^2,  1/(1 + (Ω_s/Ω_c)^{2N}) ≤ A_s^2. (5.17)
Using the equality in both of these relations, the value of N follows from
N = ln[(1/A_p^2 − 1)/(1/A_s^2 − 1)] / (2 ln(Ω_p/Ω_s)) = 2.9407. (5.18)
The adopted filter order is N = 3.
We can use any of the relations in (5.17) with the equality sign in order to calculate Ω_c. For the first relation, the value of Ω_c will be such that |H(jΩ_p)|^2 = A_p^2 is satisfied. Then,
Ω_c = Ω_p / (1/A_p^2 − 1)^{1/(2N)} = 2π × 3.2805 kHz,
ω_c = Ω_c ∆t = 1.0306.
The designed continuous-time filter is
H_d(s) = (2π 3.5470 × 10^3)^3 / [(s + 2π 3.5470 × 10^3)(s^2 + 2π 3.5470 × 10^3 s + (2π 3.5470 × 10^3)^2)].
The discrete-time Butterworth filter transfer function H(z) follows when the substitution
s = (2/∆t)(1 − z^{−1})/(1 + z^{−1})
is performed. This filter is of the form
H(z) = 1.1143^3 / {[2(1 − z^{−1})/(1 + z^{−1}) + 1.1143][(2(1 − z^{−1})/(1 + z^{−1}))^2 + 1.1143 · 2(1 − z^{−1})/(1 + z^{−1}) + 1.1143^2]}
= 0.0595(1 + z^{−1})^3 / (1 − 1.0229z^{−1} + 0.6133z^{−2} − 0.1147z^{−3}).
The corresponding difference equation of this filter is
y(n) = 1.0229y(n − 1) − 0.6133y(n − 2) + 0.1147y(n − 3) + 0.0595[x(n) + 3x(n − 1) + 3x(n − 2) + x(n − 3)].
In addition to the last two components that have frequencies corresponding to the analog signal, there is the first component
2π[δ(ω − 11π/6 + 12π/6) + δ(ω + 11π/6 − 12π/6)] = 2π[δ(ω + π/6) + δ(ω − π/6)]
that corresponds to
x_1(n) = 2 cos(πn/6).
The lowpass filter output is
y(n) = 2 cos(πn/6) + sin(πn/4).
It corresponds to the continuous-time signal
y(t) = 8 cos(πt/6) + 4 sin(πt).
One component, at the frequency ω = 2π/3 > π/3, is filtered out. The component at ω = π/4 is unchanged. One more component has appeared at the frequency ω = π/6, due to the periodic extension of the Fourier transform of the discrete-time signal.
In general, a signal component x(t) = exp(jΩ_0 t), Ω_0 > 0, with a sampling interval ∆t such that
Kπ ≤ Ω_0 ∆t < (K + 1)π
will, after sampling, result in a component within the basic period of the Fourier transform of the discrete-time signal, corresponding to the continuous signal exp(j(Ω_0 t − (K/∆t)πt)). This effect is known as aliasing. The most obvious visual effect is when a wheel rotating with f_0 = 25 [Hz], Ω_0 = 50π, is sampled in a video sequence at ∆t = 1/50 [sec]. Then Ω_0 ∆t = π corresponds to exp(j(Ω_0 t − 50πt)) = e^{j0}, that is, the wheel looks as if it were a static (nonmoving) object.
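The wheel example can be checked with a few lines of code (a sketch assuming numpy; the numbers are those from the text):

    import numpy as np

    f0, dt = 25.0, 1 / 50             # Omega_0 * dt = 2*pi*25/50 = pi
    n = np.arange(8)
    samples = np.exp(1j * 2 * np.pi * f0 * n * dt)
    print(np.round(samples.real, 3))  # (-1)^n: the wheel advances half a turn
                                      # per frame, so a symmetric wheel appears static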
Highpass filters can be obtained by transforming the corresponding continuous-time filters into the discrete-time domain. For example, if a lowpass filter H(s), with cutoff frequency Ω_c, is transformed using H_H(s) = H(1/s), then the resulting filter H_H(s) is of the highpass type, with the cutoff frequency 1/Ω_c.
In the discrete-time domain, a highpass filter frequency response, H_H(e^{jω}), is obtained by shifting the corresponding lowpass filter response, H(e^{jω}), by π in frequency, Fig. 5.13, that is,
H_H(e^{jω}) = H(e^{j(ω−π)}).
Figure 5.13 Ideal highpass filter, H_H(e^{jω}), as a shifted version of the ideal lowpass filter, H(e^{jω}).
Thus, if we have an impulse response h(n) of a lowpass filter, the corresponding highpass filter impulse response, h_H(n), is obtained by multiplying the impulse response values h(n) by (−1)^n. The output of the highpass filter to any input signal x(n) is given by
y(n) = x(n) ∗_n h_H(n) = Σ_{m=−∞}^{∞} x(m)(−1)^{n−m} h(n − m)
= (−1)^n Σ_{m=−∞}^{∞} (−1)^m x(m)h(n − m) = (−1)^n [((−1)^n x(n)) ∗_n h(n)]. (5.19)
Figure 5.14 Highpass filter realization using the corresponding lowpass filter.
This relation means that the highpass filter can be implemented using the corresponding lowpass filter, within the scheme shown in Fig. 5.14, as illustrated by the sketch below.
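A minimal sketch of this scheme, assuming scipy (the lowpass h(n) below is an arbitrary example, not a filter from the text):

    import numpy as np
    from scipy import signal

    h = signal.firwin(31, 0.25)           # an example lowpass impulse response h(n)
    x = np.random.randn(256)              # any input signal
    n = np.arange(len(x))
    # Fig. 5.14: modulate by (-1)^n, filter with h(n), modulate by (-1)^n
    y = (-1.0) ** n * signal.lfilter(h, 1, (-1.0) ** n * x)
    hH = (-1.0) ** np.arange(len(h)) * h  # direct highpass h_H(n) = (-1)^n h(n)
    print(np.allclose(y, signal.lfilter(hH, 1, x)))  # True, confirming (5.19)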
Figure 5.15 Amplitude of the frequency response of a lowpass Butterworth filter (left) and the filter obtained from the lowpass Butterworth filter when z is replaced by −z (right).
⋆The impulse response is obtained by changing the sign of every other sample in h(n). In the z-transform definition, that means using (−z)^{−n} instead of z^{−n}. The resulting highpass transfer function is
H_H(z) = 0.1236(1 − z^{−1})^4 / [(z^{−2} + 1.9389z^{−1} + 1.7420)(z^{−2} + 4.0790z^{−1} + 4.7686)].
A bandpass filter is obtained from the corresponding lowpass filter by shifting its frequency response by ω_0 and −ω_0, as shown in Fig. 5.16. The frequency response of the bandpass filter is
H_B(e^{jω}) = H(e^{j(ω−ω_0)}) + H(e^{j(ω+ω_0)}),
with the corresponding impulse response h_B(n) = 2h(n) cos(ω_0 n). The last relation indicates that we may write the output of a bandpass filter as a function of the lowpass impulse response. This relation leads to the realization of the bandpass filter using the corresponding lowpass filter and signal modulation, as shown in Fig. 5.17.
Figure 5.17 Bandpass system realization using the corresponding lowpass filter and signal modulation.
A system (filter) with unit (constant) amplitude of the frequency response (an allpass system) is defined by
H_A(z) = (z^{−1} − ae^{−jθ})/(1 − ae^{jθ}z^{−1}),
where 0 < a < 1 is real-valued and θ is an arbitrary phase. For this system
|H_A(e^{jω})| = 1.
Consider the system H(z) = (z + 2)/[(z − 1/2)(z − 1/3)(z − 2)]. This system cannot be causal and stable since there is a pole at z = 2. Define an allpass system to be connected to H(z) in cascade such that the resulting system is causal and stable, with the same amplitude of the frequency response as H(z).
⋆When an allpass system, H_A(z), is added in cascade with the given system, H(z), the overall system transfer function, H_s(z), is of the form
H_s(z) = H(z)H_A(z) = (z + 2)/[(z − 1/2)(z − 1/3)(z − 2)] · (z − (1/a)e^{jθ})/(1 − (1/a)e^{−jθ}z) · e^{−j2θ}.
The values of a and θ are chosen in such a way that the undesirable pole at z = 2 is canceled out, that is, a = 1/2 and θ = 0. With these values of a and θ we get
H_s(z) = (z + 2)/[(z − 1/2)(z − 1/3)(z − 2)] · (z − 2)/(1 − 2z) = −(z + 2)/[2(z − 1/2)^2(z − 1/3)].
This system has the same amplitude of the frequency response as the initial system H(z), since
|H_s(e^{jω})| = |H(e^{jω})H_A(e^{jω})| = |H(e^{jω})| |H_A(e^{jω})| = |H(e^{jω})|.
This system can be used for multiple poles cancellation and phase correction.
The inverse system to a given system H(z) is defined by
H_i(z) = 1/H(z).
It is obvious that
H(z)H_i(z) = 1,
h(n) ∗ h_i(n) = δ(n).
The inverse system can be used to reverse the signal distortion. For example, assume that the Fourier transform of a signal x(n) is distorted during the transmission by a transfer function H(z), that is, the received signal z-transform is R(z) = H(z)X(z). In this case, the distortion can be compensated by processing the received signal using the inverse system. The output signal is obtained as
Y(z) = R(z)/H(z) = X(z).
The system H_i(z) = 1/H(z) should be stable as well. It means that the poles of the inverse system should be within the unit circle. The poles of the inverse system are equal to the zeros of H(z). A system whose poles and zeros are both within the unit circle is called a minimum phase system.
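The minimum phase property is straightforward to test numerically. A small helper, assuming numpy (the function below is an illustration, not a routine from the text):

    import numpy as np

    def is_minimum_phase(num, den):
        # num, den: polynomial coefficients in z, descending powers
        return (np.all(np.abs(np.roots(num)) < 1)
                and np.all(np.abs(np.roots(den)) < 1))

    # H2(z) = (z - 1/4)(z - 3/4) / ((z + 1/4)(z + 3/4)) from the example below
    print(is_minimum_phase(np.poly([1/4, 3/4]), np.poly([-1/4, -3/4])))  # True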
(a) Consider the systems
H_1(z) = (z − 1/4)(z + 5/4) / [(z + 1/4)(z + 3/4)],
H_2(z) = (z − 1/4)(z − 3/4) / [(z + 1/4)(z + 3/4)].
The first system is causal and stable for the region of convergence |z| > 3/4. However, one of its zeros is at |z| = 5/4 > 1 and the system is not a minimum phase system, since its causal inverse form is not stable. The second system is causal and stable. The same holds for its inverse, since all poles of the inverse system are within |z| < 1. Thus, the system H_2(z) is a minimum phase system.
(b) In this case
R(z) = (z^2 + z − 5/16)/(z^2 + z + 3/16) X(z) = [(z − 1/4)(z + 5/4)] / [(z + 1/4)(z + 3/4)] X(z).
An inverse system to H_1(z) cannot be used since it would not be stable. However, the inverse system can be stabilized with an allpass system H_A(z), so that the amplitude is not changed,
Y(z) = R(z) (1/H_1(z)) H_A(z) = H_1(z)X(z) (1/H_1(z)) H_A(z),
where
H_A(z) = (z + 5/4)/(1 + (5/4)z)
and
H_D(z) = (1/H_1(z)) H_A(z) = [(z + 1/4)(z + 3/4)]/[(z − 1/4)(z + 5/4)] · (z + 5/4)/(1 + (5/4)z) = [(z + 1/4)(z + 3/4)] / [(z − 1/4)(1 + (5/4)z)].
This system is stable and causal and will produce |Y(e^{jω})| = |X(e^{jω})|.
If a system is the minimum phase system (with all poles and zeros within |z| < 1), then this system has the minimum group delay out of all systems with the same amplitude of the frequency response. Thus, any nonminimum phase system will have a more negative phase compared to the minimum phase system. The negative part of the phase is called the phase-lag function. The name minimum phase system comes from the minimum phase-lag function.
In order to prove this statement, consider a nonminimum phase system H(z) with the same amplitude of the frequency response as a minimum phase system H_min(z). Its frequency response can, therefore, be written as
H(z) = H_min(z)H_A(z) = H_min(z) (z^{−1} − ae^{−jθ})/(1 − ae^{jθ}z^{−1}).
Here, we assumed the first-order allpass system, without any loss of generality, since the same proof can be used for any number of allpass systems that multiply H_min(z). Since 0 < a < 1 and the system H_min(z) is stable, the system H(z) has a zero at |z| = 1/a > 1.
The phases of the systems in the previous equation are related as
arg{H_A(e^{jω})} = arg{(e^{−jω} − ae^{−jθ})/(1 − ae^{jθ}e^{−jω})} = arg{e^{−jω}} + arg{(1 − ae^{−jθ}e^{jω})/(1 − ae^{jθ}e^{−jω})}
= −ω + arg{1 − ae^{−jθ}e^{jω}} − arg{1 − ae^{jθ}e^{−jω}} = −ω − 2 arctan[a sin(ω − θ)/(1 − a cos(ω − θ))].
The group delays are therefore related as
τ_g(ω) = τ_g min(ω) + τ_gA(ω)
and, since the group delay of the allpass part is nonnegative,
τ_g(ω) ≥ τ_g min(ω),
with τ_g(ω) and τ_g min(ω) being the phase derivatives (group delays) of the systems H(z) and H_min(z), respectively.
The phase behavior of the allpass system is
arg{H_A(e^{j0})} = arg{(1 − ae^{−jθ})/(1 − ae^{jθ})} = 0 (5.20)
arg{H_A(e^{jω})} = −∫_0^ω τ_gA(ω)dω ≤ 0 (5.21)
since τ_gA(ω) > 0 for 0 ≤ ω < π.
We can conclude that the minimum phase systems satisfy the following conditions.
1. A minimum phase system is the system of minimum group delay out of the systems with the same amplitude of the frequency response. A system containing one or more allpass parts with uncompensated zeros outside of the unit circle will have a larger delay than the system which does not contain zeros outside the unit circle.
2. The phase of any system with the same amplitude of the frequency response is lower than the phase of the minimum phase system since, according to (5.21),
arg{H(e^{jω})} = arg{H_min(e^{jω})} + arg{H_A(e^{jω})} ≤ arg{H_min(e^{jω})}.
This proves the fact that the phase of any system arg{H(e^{jω})} is always lower than the phase of the minimum phase system arg{H_min(e^{jω})} having the same amplitude of the frequency response.
3. Since the group delay is minimum, we can conclude that
Σ_{m=0}^{n} |h_min(m)|^2 ≥ Σ_{m=0}^{n} |h(m)|^2.
This relation may be proven in a similar way to the minimum phase property, by considering the outputs of a minimum phase system and a system H(z) = H_min(z)H_A(z).
Example 5.12. A system has the squared amplitude of the frequency response equal to
|H(e^{jω})|^2 = (2 cos(ω) + 5/2)^2 / [(12 cos(ω) + 13)(24 cos(ω) + 25)].
Find the corresponding minimum phase system.
⋆In the z-domain, the system with this amplitude of the frequency response (with real-valued coefficients) satisfies
H(z)H^*(1/z^*)|_{z=e^{jω}} = H(z)H(1/z)|_{z=e^{jω}} = |H(e^{jω})|^2 = H(e^{jω})H(e^{−jω}).
In this sense,
|H(e^{jω})|^2 = (e^{jω} + e^{−jω} + 5/2)^2 / [(6e^{jω} + 6e^{−jω} + 13)(12e^{jω} + 12e^{−jω} + 25)]
and
H(z)H(1/z) = (z + 5/2 + z^{−1})^2 / [(6z + 13 + 6z^{−1})(12z + 25 + 12z^{−1})]
= (z^2 + (5/2)z + 1)^2 / [(6z^2 + 13z + 6)(12z^2 + 25z + 12)]
= (1/72) (z + 2)^2 (z + 1/2)^2 / [(z + 2/3)(z + 3/2)(z + 3/4)(z + 4/3)]
= (2/72) (1/z + 1/2)^2 (z + 1/2)^2 / [(z + 2/3)(1/z + 2/3)(z + 3/4)(1/z + 3/4)].
The minimum phase system, with the desired amplitude of the frequency response, is the part of H(z)H^*(1/z^*) = H(z)H(1/z) with the zeros and poles inside the unit circle,
H(z) = (√2/12) (z + 1/2)^2 / [(z + 2/3)(z + 3/4)].
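The spectral factorization in this example can be mimicked numerically, assuming numpy: factor the numerator and denominator of H(z)H(1/z) and keep the roots inside the unit circle.

    import numpy as np

    zeros = np.roots([1, 5/2, 1])                    # z^2 + (5/2)z + 1: -2 and -1/2
    poles = np.concatenate([np.roots([6, 13, 6]),    # -3/2 and -2/3
                            np.roots([12, 25, 12])]) # -4/3 and -3/4
    print(np.sort(zeros[np.abs(zeros) < 1]))         # -1/2 (the squared numerator doubles it)
    print(np.sort(poles[np.abs(poles) < 1]))         # -2/3 and -3/4, as in H(z) above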
5.6 PROBLEMS
Problem 5.1. A continuous-time system is given with R/L = 8 and 1/(LC) = 25. Find the difference equation describing the corresponding discrete-time system obtained by the impulse invariance method. What is the impulse response of the discrete-time system? Use the sampling interval ∆t = 1.
Problem 5.2. Could the method of impulse invariance be used to map the system
H(s) = (s^2 − 3s + 3)/(s^2 + 3s + 3)
to the discrete-time domain? What is the corresponding discrete-time system obtained by the bilinear transform with ∆t = 1?
Problem 5.3. A continuous-time system is described by the differential equation
y''(t) + (3/2)y'(t) + (1/2)y(t) = x(t)
with zero initial conditions. What is the corresponding transfer function of the discrete-time system using the first-order backward difference approximation with ∆t = 1/10? Write the difference equation of the system whose output approximates the output of the continuous-time system.
Problem 5.4. The transfer function of a continuous-time system is
H(s) = −2s/(s^2 + 2s + 2).
What is the corresponding discrete-time system using the impulse invariance method and the bilinear transform for the system mapping with the sampling interval ∆t = 1?
Problem 5.5. A continuous-time system is described by the transfer function of the form
H(s) = (1 + 4s) / [(s + 1/2)(s + 1)^3].
What is the corresponding discrete-time system according to:
(a) the impulse invariance method,
(b) the bilinear transform,
(c) the matched z-transform?
Use ∆t = 1.
Problem 5.6. The continuous-time system
H(s) = 2QΩ_1 / (s^2 + 2Ω_1 Qs + Ω_1^2 + Q^2)
should be transformed into a discrete-time system using the bilinear transform with ∆t = 1, preserving the critical frequency Ω_1 = 0.5π.
Problem 5.7. (a) By using the bilinear transform find the transfer function of the second-order
Butterworth filter with f ac = 4kHz. The sampling interval is ∆t = 50µ sec.
(b) Translate the discrete-time transfer function to obtain a highpass filter. Find its corresponding
critical frequency in the continuous-time domain.
Problem 5.8. Design a discrete-time lowpass Butterworth filter for the sampling frequency 1/∆t = 10
kHz. The passband should be from 0 to 1 kHz, the maximum attenuation in the passband should be 3
dB (a p ≥ −3dB) and the attenuation should be more than 10 dB (as < −10dB) for frequencies above
2 kHz.
Problem 5.9. Using the impulse invariance method design a Butterworth filter with the passband
frequency ω p = 0.1π and the stopband frequency ωs = 0.3π in the discrete-time domain. The
maximum attenuation in the passband region should be less than 2dB, and the minimum attenuation in
the stopband should be 20dB.
Problem 5.10. A highpass filter can be obtained from a lowpass filter using H_H(s) = H(1/s). With the bilinear transform with ∆t = 2, we can transform a continuous-time domain function into the discrete domain using the relation s = (z − 1)/(z + 1). If we have a design of a lowpass filter, how should its coefficients be changed in order to get a highpass filter?
Problem 5.11. For filtering of a continuous-time signal, a discrete-time filter is used. Find the
corresponding continuous-time filter frequencies if the discrete-time filter is: a) lowpass with
ω p = 0.15π, b) bandpass within 0.2π ≤ ω ≤ 0.25π, c) highpass with ω p = 0.35. Consider the
cases when ∆t = 0.001s and ∆t = 0.1s.
What should be the frequencies to design these systems in the continuous-time domain if the
impulse invariance method is used and what are the design frequencies if the bilinear transform is used?
Problem 5.12. The transfer function of a first-order lowpass system is
H(z) = (1 − α)/(1 − αz^{−1}).
Find the corresponding bandpass system transfer function with frequency shifts for ±ω_c.
Problem 5.13. Using an appropriate allpass system, find the stable systems with the same amplitude of the frequency response as the systems:
(a)
H_1(z) = (2 − 3z^{−1} + 2z^{−2}) / (1 − 4z^{−1} + 4z^{−2}),
(b)
H_2(z) = z / [(4 − z)(1/3 − z)].
Problem 5.14. The z-transform
5.7 EXERCISE
Exercise 5.1. The transfer function of a continuous-time system is
H(s) = (s + 2)/(4s^2 + s + 1).
What is the corresponding discrete-time system obtained with ∆t = 1 by using the impulse invariance method and the bilinear transform?
x (t) = A cos(Ω0 t + ϕ)
Exercise 5.4. (a) By using the bilinear transform find the transfer function of the third-order
Butterworth filter with the cutoff frequency f c = 3.4 kHz. The sampling step is ∆t = 40 µ sec.
(b) Translate the discrete transfer function to obtain a bandpass system with the corresponding
central frequency f 0 = 12.5 kHz in the continuous-time domain.
Exercise 5.6. Using an allpass system, find a stable and causal system with the same amplitude of the frequency response as the systems:
H_1(z) = (2 − 5z^{−1} + 2z^{−2}) / (1 − 4z^{−1} + z^{−2}),
H_2(z) = z^{−1} / [(2 − z)(1/4 − z)].
Exercise 5.7. The z-transform
R(z) = [(z − 1/3)(z^{−1} − 1/3)] / [(z + 1/2)(z^{−1} + 1/2)]
can be written as
R(z) = H(z)H^*(1/z^*).
Find H(z) for the minimum phase system. If h(n) is the impulse response of H(z) and h_1(n) is the impulse response of
H_1(z) = H(z) (z^{−1} − a_1 e^{−jθ_1})/(1 − a_1 e^{jθ_1} z^{−1}),
show that |h(0)| ≤ |h_1(0)| for any θ_1 and |a_1| < 1. All systems are causal.
Exercise 5.8. A signal x(n) has passed through a medium whose influence can be described by the transfer function
H(z) = (1 − z/3)(1 − 5z)(z^2 − z + 4/3) / (z^2 − 2/3)
and the signal r(n) = x(n) ∗ h(n) is obtained. Find a causal and stable system to process r(n) in order to obtain |Y(e^{jω})| = |X(e^{jω})|.
5.8 SOLUTIONS
Solution 5.1. Using the impulse invariance method, the transfer function of the discrete-time system is
H(z) = (j25/6)/(1 − e^{−(4+j3)}z^{−1}) − (j25/6)/(1 − e^{−(4−j3)}z^{−1})
= (25/3) e^{−4} z^{−1} sin(3) / (1 − 2e^{−4} cos(3) z^{−1} + e^{−8}z^{−2}),
with the corresponding difference equation
y(n) = (25/3) e^{−4} sin(3) x(n − 1) + 2e^{−4} cos(3) y(n − 1) − e^{−8} y(n − 2).
The output signal values can be calculated for any input signal using this difference equation. For x(n) = δ(n) the impulse response follows. The impulse response can be obtained in a closed form from
H(z) = j(25/6) Σ_{n=0}^{∞} e^{−(4+j3)n} z^{−n} − j(25/6) Σ_{n=0}^{∞} e^{−(4−j3)n} z^{−n}
as
h(n) = (25/6) e^{−4n} (je^{−j3n} − je^{j3n}) u(n) = (25/3) e^{−4n} sin(3n) u(n).
There is no correction term since lim_{z→∞} H(z) = 0.
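A short numerical check of this solution, assuming numpy: the recursion and the closed-form impulse response agree.

    import numpy as np

    N = 20
    h = np.zeros(N)
    x = np.zeros(N); x[0] = 1.0                      # x(n) = delta(n)
    for n in range(N):
        h[n] = ((25/3) * np.exp(-4) * np.sin(3) * (x[n-1] if n >= 1 else 0)
                + 2 * np.exp(-4) * np.cos(3) * (h[n-1] if n >= 1 else 0)
                - np.exp(-8) * (h[n-2] if n >= 2 else 0))
    n = np.arange(N)
    print(np.allclose(h, (25/3) * np.exp(-4*n) * np.sin(3*n)))   # True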
Solution 5.2. The system is not of lowpass type. For s → ∞ we get H(s) → 1. Thus, the impulse invariance method cannot be used. The bilinear transform can be used. It produces
H(z) = [4(1 − z^{−1})^2/(1 + z^{−1})^2 − 6(1 − z^{−1})/(1 + z^{−1}) + 3] / [4(1 − z^{−1})^2/(1 + z^{−1})^2 + 6(1 − z^{−1})/(1 + z^{−1}) + 3] = (13z^{−2} − 2z^{−1} + 1)/(z^{−2} − 2z^{−1} + 13).
Solution 5.3. For the system described by
y''(t) + (3/2)y'(t) + (1/2)y(t) = x(t),
the transfer function is
H(s) = 1/(s^2 + (3/2)s + 1/2).
The transfer function of the corresponding discrete-time system follows, with the backward difference approximation
s → (1 − z^{−1})/∆t = 10(1 − z^{−1}),
as
H(z) = 1/[100(1 − z^{−1})^2 + 15(1 − z^{−1}) + 1/2] = 1/(100z^{−2} − 215z^{−1} + 231/2).
Solution 5.4. The transfer function is expanded into partial fractions as
H(s) = −(1 + j)/(s + 1 − j) − (1 − j)/(s + 1 + j).
Using the bilinear transform, the transfer function of the discrete-time system follows as
H(z) = −2(1 − z^{−2})/(5 − 2z^{−1} + z^{−2}).
Solution 5.5. The transfer function
H(s) = (1 + 4s)/[(s + 1/2)(s + 1)^3]
is expanded into partial fractions, appropriate for the impulse invariance method, as
H(s) = k_1/(s + 1/2) + k_2/(s + 1) + k_3/(s + 1)^2 + k_4/(s + 1)^3,
with k_1 = H(s)(s + 1/2)|_{s=−1/2} = −8 and k_4 = H(s)(s + 1)^3|_{s=−1} = 6. By equating the coefficients with s^3 to 0, we get the relation k_1 + k_2 = 0. A similar relation follows for the coefficients with s^2, in the form 3k_1 + 5k_2/2 + k_3 = 0, or k_1/2 + k_3 = 0. Then, k_2 = 8 and k_3 = 4. With
k_i/(s − s_i) → k_i/(1 − e^{s_i}z^{−1})
and
(1/m!) d^m/ds_i^m [k_i/(s − s_i)] → (1/m!) d^m/ds_i^m [k_i/(1 − e^{s_i}z^{−1})],
the discrete-time system follows.
Solution 5.6. Since the bilinear transform is used, we have to pre-modify the system according to
Ω_d = (2/∆t) tan(Ω_1∆t/2) = 2.0 = 0.6366π.
The frequency value is shifted from Ω_1 = 0.5π to Ω_d = 0.6366π. The modified system is
H_d(s) = 2QΩ_d / (s^2 + 2Ω_d Qs + Ω_d^2 + Q^2).
Now, using s = 2(1 − z^{−1})/(1 + z^{−1}), the corresponding discrete-time system is obtained,
H(z) = 2QΩ_d / {[2(1 − z^{−1})/(1 + z^{−1})]^2 + 2Ω_d Q · 2(1 − z^{−1})/(1 + z^{−1}) + Ω_d^2 + Q^2}.
The bilinear transform returns the pre-modified frequency to the desired one.
Solution 5.7. The poles of H(s)H(−s) for a continuous-time second-order (N = 2) Butterworth filter are
s_k = 2πf_c e^{j(2k+1)π/4},
where
f_c = (2/∆t) tan(2πf_ac ∆t/2)/(2π) = 4.6253 kHz.
With k = 0, 1, 2, 3 follows
s_k = 2πf_c (±√2/2 ± j√2/2).
For a stable system, the poles satisfy Re{s_p} < 0, thus
s_{0,1} = 2πf_c (−√2/2 ± j√2/2)
and
H_a(s) = s_0 s_1 / [(s − s_0)(s − s_1)] = 4π^2 f_c^2 / (s^2 + 2πf_c √2 s + 4π^2 f_c^2).
Using the bilinear transform with ∆t = 50 · 10^{−6}, we get the corresponding discrete-time system transfer function,
H(z) = 1.0548(1 + z^{−1})^2 / (5.1066 − 1.8874z^{−1} + z^{−2}).
This filter has −3 dB attenuation at ω = 0.4π, corresponding to Ω = 0.4π/∆t = 2π × 4 × 10^3.
(b) The discrete highpass filter is obtained by the shift corresponding to
H_h(e^{jω}) = H(e^{j(ω+π)}).
This shift corresponds to the impulse response modulation h_h(n) = (−1)^n h(n), or to the substitution of z by −z in the transfer function,
H_H(z) = 1.0548(1 − z^{−1})^2 / (5.1066 + 1.8874z^{−1} + z^{−2}).
The critical frequency of the highpass filter, H_H(z), is ω_c = 0.6π or f_ac = 6 kHz.
Solution 5.8. For the continuous-time system the design frequencies are
f_p = 1 kHz
f_s = 2 kHz.
They correspond to
Ω_p = 2π 10^3 rad/s
Ω_s = 4π 10^3 rad/s,
that is, to the discrete-time frequencies
ω_p = 0.2π
ω_s = 0.4π.
The frequencies for the filter design, which will be mapped to ω_p and ω_s after the bilinear transform is used, are
Ω_pd = (2/∆t) tan(0.2π/2) = 0.6498/∆t
Ω_sd = (2/∆t) tan(0.4π/2) = 1.4531/∆t.
The filter order follows from
N = (1/2) log[(10^{−0.1a_p} − 1)/(10^{−0.1a_s} − 1)] / log(Ω_pd/Ω_sd) = 1.368.
We assume N = 2.
Since the frequency for −3 dB attenuation is given, the design cutoff frequency is
Ω_cd = Ω_pd = 0.6498/∆t.
The poles of the filter transfer function, for N = 2 and Ω_cd, are
s_{0,1} = (0.6498/∆t)(−√2/2 ± j√2/2),
with the transfer function
H(s) = s_0 s_1 / [(s − s_0)(s − s_1)] = (0.4223/∆t^2) / (s^2 + 0.919s/∆t + 0.4223/∆t^2).
Mapping of this system into the discrete-time domain using the bilinear transform,
s = (2/∆t)(1 − z^{−1})/(1 + z^{−1}),
produces the second-order Butterworth filter
H(z) = 0.067569(1 + z^{−1})^2 / (1 − 1.14216z^{−1} + 0.412441z^{−2}).
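This design is easy to confirm numerically (a sketch assuming scipy):

    import numpy as np
    from scipy import signal

    dt = 1e-4                                          # 1/dt = 10 kHz
    b, a = signal.butter(2, 0.6498 / dt, analog=True)  # N = 2 at Omega_cd
    bz, az = signal.bilinear(b, a, fs=1 / dt)
    print(np.round(bz, 6), np.round(az, 6))            # matches H(z) above
    for w in (0.2 * np.pi, 0.4 * np.pi):               # passband and stopband edges
        Hw = signal.freqz(bz, az, worN=[w])[1][0]
        print(20 * np.log10(abs(Hw)))                  # about -3 dB and about -14 dB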
Solution 5.9. The filter order follows from
N = (1/2) log[(10^{−0.1a_p} − 1)/(10^{−0.1a_s} − 1)] / log(Ω_p/Ω_s) = 2.335,
so N = 3 is adopted. The partial fraction constants are
k_i = H(s)(s − s_i)|_{s=s_i}.
Using the impulse invariance method, mapping from the continuous-time domain to the discrete-time domain is done according to
k_i/(s − s_i) → ∆t k_i/(1 − e^{s_i∆t}z^{−1}).
The discrete-time filter transfer function is
H(z) = (−0.0253z^{−2} − 0.0318z^{−1}) / (−1.98774 + 4.61093z^{−1} − 3.68033z^{−2} + z^{−3}).
Solution 5.10. A highpass filter is obtained as
H_H(s) = H(1/s),
with s = (2/∆t)(1 − z^{−1})/(1 + z^{−1}) = (2/∆t)(z − 1)/(z + 1) and ∆t = 2. The corresponding lowpass filter would be
H_L(z) = H(s)|_{s=(z−1)/(z+1)} = H((z − 1)/(z + 1)).
The discrete-time highpass filter is
H_H(z) = H_H(s)|_{s=(z−1)/(z+1)} = H(1/s)|_{s=(z−1)/(z+1)} = H((z + 1)/(z − 1)).
Obviously, H_H(z) = H_L(−z). It means that a discrete highpass system can be realized by replacing z with −z in the transfer function. For ∆t ≠ 2 a scaling is present as well.
Solution 5.11. a) The frequency relation with ∆t = 0.001 s produces a lowpass filter with Ω p =
ω p /∆t = 150 π rad/s. For ∆t = 0.1 s the frequency is Ω p = ω p /∆t = 1.5 π rad/s.
b) For ∆t = 0.001 s a bandpass filter is obtained for the range 200π rad/s ≤ Ω ≤ 250π rad/s,
while ∆t = 0.1 s produces a bandpass filter with 2π rad/s ≤ Ω ≤ 2.5π rad/s.
c) For ∆t = 0.001 s a highpass filter has the frequency Ω p = 350 rad/s, while for ∆t = 0.1 s
the highpass filter has critical frequency Ω p = 3.5 rad/s.
For the impulse invariance method, the starting design frequencies should be equal to the calculated analog frequencies. If the bilinear transform is used, the calculated analog frequencies Ω_p should be pre-modified to Ω_m according to Ω_m = (2/∆t) tan(Ω_p∆t/2).
Solution 5.12. The impulse response of the bandpass filter is h_B(n) = 2h(n) cos(ω_c n). The z-transform of the impulse response is
H_B(z) = Σ_{n=−∞}^{∞} 2h(n) cos(ω_c n)z^{−n} = Σ_{n=−∞}^{∞} h(n)(e^{−jω_c}z)^{−n} + Σ_{n=−∞}^{∞} h(n)(e^{jω_c}z)^{−n} = H(e^{−jω_c}z) + H(e^{jω_c}z).
Solution 5.13. (a) The system
H_1(z) = (2 − 3z^{−1} + 2z^{−2}) / (1 − 2z^{−1})^2
is not stable since it has a second-order pole at z = 2. This system may be stabilized, keeping the same amplitude of the frequency response, using a second-order allpass system with a zero at z = 2,
H_A(z) = [(z^{−1} − 1/2)/(1 − (1/2)z^{−1})]^2.
The resulting stable system is
H_1(z)H_A(z) = (2 − 3z^{−1} + 2z^{−2}) / (z^{−1} − 2)^2.
(b) The causal system H_2(z) has the pole at z = 4. It can be stabilized using the allpass system
H_A(z) = (z^{−1} − 1/4)/(1 − (1/4)z^{−1}) = (4 − z)/(4z − 1).
Solution 5.14. Since the z-transform R(z) can be written as
R(z) = H(z)H^*(1/z^*),
the minimum phase system is the part of R(z) whose zeros and poles are all located inside the unit circle, meaning that the system H(z) and its inverse system 1/H(z) can be causal and stable. Therefore,
H(z) = (z − 1/4)(z + 1/2) / [(z + 4/5)(z − 3/7)].
It is easy to check that H^*(1/z^*) is equal to the remaining terms in R(z).
Solution 5.15. The received signal should be processed by the inverse system
H_i(z) = 1/H(z) = (z − 1/2) / [(4 − z)(1/3 − z)(z^2 − √2 z + 1/4)].
However, this system has two poles outside the unit circle since
H_i(z) = (z − 1/2) / [(4 − z)(1/3 − z)(z − 1.2071)(z − 0.2071)].
These poles have to be compensated, keeping the same amplitude, using two first-order allpass systems. The resulting system transfer function is
H_i(z) = (z − 4)/(1 − 4z) · (z − 1.2071)/(1 − 1.2071z) · (z − 1/2)/[(4 − z)(1/3 − z)(z − 1.2071)(z − 0.2071)] = −(z − 1/2) / [(1 − 4z)(1 − 1.2071z)(1/3 − z)(z − 0.2071)].
Chapter 6
Realization of Discrete Systems
Linear discrete-time systems may, in general, be described by a difference equation relating the output signal with the input signal at the considered instant and the previous values of the output and input signals. The transfer function can be written in various forms, producing different system realizations. Some of them will be presented next. Symbols that are used in the realizations are presented in Fig. 6.1.
Figure 6.1 Symbols representing particular digital systems and their functions in the realization of discrete-time systems.
A system that includes recursions of the output signal values results in an infinite impulse response (IIR). These systems will be presented first.
The second-order system, as a special case of the system in (6.1), will be presented first. Its implementation is shown in Fig. 6.2. A general system described by (6.1) can be implemented as in Fig. 6.3. This form is a direct realization I of a discrete-time system.
Direct realization I consists of two system blocks connected in cascade. The first block implements the non-recursive part of the difference equation,
y_1(n) = B_0 x(n) + B_1 x(n − 1) + ··· + B_M x(n − M),
and the second block implements the recursive part,
y(n) = A_1 y(n − 1) + ··· + A_N y(n − N) + y_1(n),
with the output from the first block, y_1(n), being the input signal to the second block. The cascade of these two blocks is shown in Fig. 6.3. The transfer functions of these blocks are
H_1(z) = B_0 + B_1 z^{−1} + ··· + B_M z^{−M}
and
H_2(z) = 1/(1 − A_1 z^{−1} − ··· − A_N z^{−N}),
as implemented in the sketch below.
Example 6.1. Find the transfer function of the discrete-time system presented in Fig. 6.5.
⋆The system can be recognized as a direct realization II form. After its blocks are separated and interchanged, the system in the form shown in Fig. 6.6 is obtained.
Figure 6.6 The system from Fig. 6.5, with interchanged blocks.
The difference equation for the whole system is obtained after y_1(n) from (6.3) is replaced into (6.4),
y(n) = (1/2)y(n − 2) − (1/6)y(n − 3) + x(n) − (1/2)x(n − 1) + (1/3)x(n − 2).
The system transfer function of the whole system is
H(z) = H_1(z)H_2(z) = (1 − (1/2)z^{−1} + (1/3)z^{−2}) / (1 − (1/2)z^{−2} + (1/6)z^{−3}).
Systems with a large number of elements in a recursion may be sensitive to errors caused by coefficient deviations. Deviations of the coefficients from the true values are caused by the finite-length registers used to memorize them in a computer. The influence of the finite register lengths on the signal and system realization will be studied later, as a part of random disturbance analysis. Here, we will only consider the influence of this effect on the system coefficients, since it may influence the way how to realize a discrete-time system.
For the first-order system with a real-valued pole,
H(z) = 1/(1 + A_1 z^{−1}) = 1/(1 − z_{p1} z^{−1}),
the error in coefficient A_1 is the same as the error in the system pole z_{p1}. If the coefficient is quantized with a step ∆, then the error in the pole location is of order ∆. The same holds for the system zeros.
For a second-order system with real-valued coefficients and a pair of complex-conjugated poles,
H(z) = 1/(1 + A_1 z^{−1} + A_2 z^{−2}) = 1/[(1 − z_{p1} z^{−1})(1 − z_{p2} z^{−1})],
the relation between the coefficients and the real and imaginary parts of the poles z_{p1/2} = x_p ± jy_p is
H(z) = 1/(1 − 2x_p z^{−1} + (x_p^2 + y_p^2)z^{−2}),
A_1 = −2x_p,
A_2 = x_p^2 + y_p^2.
The error in coefficient A_1 defines the error in the real part of the pole, x_p.
When the coefficient A_2 takes discrete values A_2 = m∆, with A_1 ∼ x_p = n∆, then the imaginary part of the poles may take the values y_p = ±√(A_2 − x_p^2) = ±√(m∆ − n^2∆^2), with n^2 ≤ mN. For small n, that is, for a small real part of the pole, y_p = ±√(∆m). For N discretization levels, assuming that the poles are within the unit circle, x_p^2 + y_p^2 ≤ 1, the first discretization step is changed from order 1/N to order 1/√N. The error, in this case, could be significantly increased. The changes in y_p due to the discretization of A_2 may be large.
The quantization of x p and y p as a result of quantization of − A1 /2 and A2 = x2p + y2p is shown
in Fig. 6.7, for the case of N = 16 and N = 32 quantization levels. We see that the error in y p , when
it assumes small values, can be very large. We can conclude that the poles close to the unit circle
with larger imaginary values y p are less sensitive to the errors. The highest error could appear if the
second-order real-valued pole (with y p = 0) were implemented using the second-order system.
We have concluded that the poles close to the real axis (small y p ) are sensitive to the error in
coefficients even in the second-order systems. The sensitivity increases with the system order, since the
higher powers in the polynomial increase the maximum possible error.
Consider a general form of a polynomial in the transfer function, written in two forms,
P(z) = z^M + A_1 z^{M−1} + ··· + A_M
and
P(z) = (z − z_1)(z − z_2)···(z − z_M).
If the coefficients A_1, A_2, ..., A_M are changed for small ∆A_1, ∆A_2, ..., ∆A_M (due to quantization), then the pole position (without loss of generality and for notation simplicity, consider the pole z_1) is
Figure 6.7 Quantization of the real part and the imaginary part, x_p = Re{z_p} and y_p = Im{z_p}, of poles (zeros) as a result of the quantization in 16 levels (left) and 32 levels (right) of the coefficients A_1 = −2x_p and A_2 = x_p^2 + y_p^2.
changed for
∆z_1 ≅ (∂z_1/∂A_1)∆A_1 + (∂z_1/∂A_2)∆A_2 + ··· + (∂z_1/∂A_M)∆A_M |_{z=z_1}. (6.5)
Since there is no direct relation between z_1 and A_i, we will find ∂z_1/∂A_i implicitly, from the two forms of P(z). The coefficients ∂z_1/∂A_i|_{z=z_1} could be large, especially in the case when there are close poles, with a small distance (z_i − z_k).
In the realization of this system, the coefficients are rounded to two decimal positions, with the absolute error up to 0.005. Find the poles of the system with rounded coefficients.
⋆The polynomial in the denominator is
P(z) ≅ z^4 − 2.4673z^3 + 2.1200z^2 − 0.7336z + 0.0849.
With the rounded coefficients, the polynomial factors as
P̂(z) = (z − 0.5370)(z − 0.2045)(z − 0.7285)(z − 1).
The poles of this function with rounded coefficients can differ significantly from the original pole values in (6.7). The maximum error in the poles is 0.8409 − 0.7285 = 0.1124. One pole is on the unit circle, making the system with rounded coefficients unstable, in contrast to the stable original system. Note that if the system is written as a product of the first-order functions in the denominator,
H(z) = 1 / [(z − 7/29)(z − 12/27)(z − 111/132)(z − 95/101)],
and every pole value is rounded to two decimals,
P(z) ≅ (z − 0.24)(z − 0.44)(z − 0.84)(z − 0.94),
the poles will differ from the original ones for no more than 0.005.
If the poles are grouped into the second-order terms (what should be done if the poles were complex-conjugate, in order to avoid calculation with complex-valued coefficients), then
P(z) ≅ (z^2 − 0.6858z + 0.1073)(z^2 − 1.7815z + 0.7910).
Rounding these coefficients to two decimals, we will get
P̂(z) = (z − 0.25)(z − 0.44)(z − 0.8442)(z − 0.9358),
with a maximum error of 0.01.
The pole values are illustrated in Fig. 6.8.
The sensitivity analysis for this example can be done for each of the poles. Assume that the poles are denoted by z_1 = 12/27, z_2 = 7/29, z_3 = 111/132, and z_4 = 95/101. Then,
Figure 6.8 Poles for a system with errors in coefficients: for the fourth-order polynomial (top) and the product of the two second-order polynomials (bottom).
∆z_1 ≅ 0.0878.
The true error is ∆z_1 = 0.0926. The small difference is due to the linear approximation, assuming small ∆A_i. The obtained result is a good estimate of the order of error for the pole z_1. The error in z_1 is about 18.5 times greater than the maximum error in the coefficients A_i, which is of order 0.005. The experiment is easy to repeat numerically, as in the sketch below.
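A sketch of that experiment, assuming numpy:

    import numpy as np

    poles = np.array([12/27, 7/29, 111/132, 95/101])
    P = np.poly(poles)                        # exact fourth-order coefficients
    print(np.sort(np.roots(np.round(P, 2))))  # poles after rounding the coefficients
    print(np.sort(np.roots(np.poly(np.round(poles, 2)))))  # rounding each factor instead

Rounding the polynomial coefficients moves the poles by an order of 0.1, while rounding the first-order factors keeps the error within 0.005, as discussed above.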
A transfer function of the discrete-time system in (6.2), with M = N, might be written as a product of the first-order subsystems. Commonly, real-valued signals are processed, and the poles and zeros in the transfer function appear in complex-conjugated pairs. In that case, it is better to group these pairs into second-order subsystems.
In the realization, the second-order subsystems are commonly used. Note that it is possible to realize these second-order subsystems using first-order systems with the real-valued coefficients x_pL and y_pL, which are the real and imaginary parts of the complex-conjugated pair of poles, z_pL = x_pL ± jy_pL, respectively. To this aim, consider first an example.
Example 6.3. Find the transfer function of a system with a feedback shown in Fig. 6.10.
⋆The z-transform of the signal at the output of the adder in the system from Fig. 6.10 is
R(z) = X(z) − H(z)Y(z).
The output is then
Y(z) = H(z)R(z) = H(z)X(z) − H^2(z)Y(z),
so that
H_e(z) = Y(z)/X(z) = H(z)/(1 + H^2(z)).
Consider now the second-order subsystem
Q_i(z) = y_pL z^{−1} / (1 + A_1i z^{−1} + A_2i z^{−2}).
Using the real and imaginary parts of the complex-conjugate poles z_pL = x_pL ± jy_pL, the transfer function, Q_i(z), can be expressed as
Q_i(z) = y_pL z^{−1} / (1 − 2x_pL z^{−1} + x_pL^2 z^{−2} + y_pL^2 z^{−2}) = y_pL z^{−1} / [(1 − x_pL z^{−1})^2 + y_pL^2 z^{−2}]
= [y_pL z^{−1}/(1 − x_pL z^{−1})^2] · 1/[1 + (y_pL z^{−1}/(1 − x_pL z^{−1}))^2] = H(z)H_2(z)/(1 + H^2(z)),
where
H(z) = y_pL z^{−1}/(1 − x_pL z^{−1}) and H_2(z) = 1/(1 − x_pL z^{−1}).
Therefore, the second-order system can be implemented as in Fig. 6.11, using the first-order systems
shown in Fig. 6.12. In this case there is no grouping of the coefficients into the second-order polynomials.
Figure 6.11 Complete second-order subsystem with the complex-conjugate pair of poles realized using the
first-order systems.
The error in one coefficient (the real or imaginary part of a pole) does not influence the other coefficients. However, if an error in the signal calculation happens in one cascade, it will propagate as an input to the following cascades. In that sense, it would be best to order the cascades in such a way that the lowest probability of an error appears in the first cascade. From the error analysis we can conclude that the cascades with the poles and zeros close to the origin are more sensitive to the error and should be used in later cascade stages.
Figure 6.12 First-order system used in the realization of the second-order system with the complex-conjugate pair
of poles.
For the system
H(z) = 1.4533(1 + z^{−1})^3 / [(−0.8673z^{−1} + 3.1327)(3.0177z^{−2} − 5.434z^{−1} + 7.54)]
= 0.0615 (1 + z^{−1})/(1 − 0.2769z^{−1}) × (1 + 2z^{−1} + z^{−2})/(1 − 0.7207z^{−1} + 0.4002z^{−2}),
present the cascade realization using:
(a) both the first and the second-order systems;
(b) the first-order systems with real-valued coefficients only.
⋆(a) Realization of the system H(z), when both the first and the second-order subsystems can be used, is done according to the system transfer function, as in Fig. 6.13.
(b) For the first-order subsystems, the realization should be done based on
H(z) = 0.0615 × (1 + z^{−1})/(1 − 0.2769z^{−1}) × (1 + z^{−1}) × (1 + z^{−1}) × 1/(1 − 0.7207z^{−1} + 0.4002z^{−2}),
with
1/(1 − 0.7207z^{−1} + 0.4002z^{−2}) = 1/[(1 − (0.3603 + j0.5199)z^{−1})(1 − (0.3603 − j0.5199)z^{−1})]
= 1/(1 − 2 × 0.3603z^{−1} + 0.3603^2 z^{−2} + 0.5199^2 z^{−2})
= 1/[0.5199^2 z^{−2} + (1 − 0.3603z^{−1})^2] = 1/(1 − 0.3603z^{−1})^2 · 1/[1 + (0.5199z^{−1}/(1 − 0.3603z^{−1}))^2].
In this way, the system can be written and realized in terms of the first-order subsystems,
H(z) = 0.0615 (1 + z^{−1})/(1 − 0.2769z^{−1}) · (1 + z^{−1})/(1 − 0.3603z^{−1}) · (1 + z^{−1})/(1 − 0.3603z^{−1}) · 1/[1 + (0.5199z^{−1}/(1 − 0.3603z^{−1}))^2].
⋆ The parallel realization follows directly from the system transfer function definition. It is
presented in Fig. 6.16.
For the cascade realization, the system transfer function should be written in the form of the product of the second-order transfer functions.
For each of the previous realizations, an inverse form may be implemented by switching the input and the output signals and changing the flow directions of the signals. As an example, consider the direct realization II from Fig. 6.4. This realization, with separated delay circuits, is shown in Fig. 6.18. Its inverse form is presented in Fig. 6.19.
It is easy to conclude that the inverse realization of the direct realization II has the same transfer
function as the direct realization I. Since both realization I and realization II have the same transfer
functions it follows that the inverse realization has the same transfer function as the original realization.
In general, transfer functions of discrete-time systems are obtained in the form of a ratio of two polynomials. The polynomial in the transfer function denominator defines the poles. In the time domain, this means a recursive relation, relating the output signal at the current instant with the previous output signal values. Realization of this kind of systems is efficient, as described in the previous section. When the output signal is a linear combination of the input signal, x(n), and its delayed versions, x(n − m), only, the system does not have recursions. Its difference equation is
y(n) = h(0)x(n) + h(1)x(n − 1) + ··· + h(N − 1)x(n − N + 1).
This system is characterized by a finite impulse response, and it is referred to as the FIR system. This system is always stable. The FIR systems can also have a linear phase. For a linear phase, arg{H(e^{jω})} = −ωq, the group delay
τ_g = −d(arg{H(e^{jω})})/dω = q
is constant, and the system will not distort the signal with respect to the zero-phase system. The impulse response will only be delayed in time for a constant q.
Consider a signal composed of M complex sinusoidal components. After passing through a system with the frequency response H(e^{jω}), this signal is changed to
y(n) = Σ_{m=1}^{M} A_m |H(e^{jω_m})| e^{j(ω_m n + θ_m + arg{H(e^{jω_m})})}.
In general, the phase of every signal component is changed in a different way, by arg{H(e^{jω_m})}, causing signal distortion due to the different delays corresponding to different frequencies. If the phase function of the frequency response is linear, then all signal component phases are changed in the same way, by arg{H(e^{jω_m})} = −ω_m q, corresponding to a constant delay for all components. A delayed signal, without distortion, is obtained,
y(n) = y_0(n − q) = Σ_{m=1}^{M} A_m |H(e^{jω_m})| e^{j(ω_m(n−q) + θ_m)},
where y_0(n) would be the response if the phase of the transfer function were 0. In the case of a linear phase, arg{H(e^{jω})} = −ωq, the phase delay
τ_φ = −arg{H(e^{jω})}/ω = q
and the group delay τ_g are the same. In general, the group delay and the phase delay are different. The group delay, as the notion dual to the instantaneous frequency, is introduced and discussed in the first chapter of this book.
Consider a system with a real-valued impulse response h(n). Its frequency response is
H(e^{jω}) = Σ_{n=0}^{N−1} h(n)e^{−jωn} = Σ_{n=0}^{N−1} h(n) cos(ωn) − j Σ_{n=0}^{N−1} h(n) sin(ωn). (6.9)
Combining the linear phase condition (6.8) with the form in (6.9), we get
Σ_{n=0}^{N−1} h(n) sin(ω(n − q)) = 0. (6.10)
The middle point of the interval where h(n) ≠ 0 is n = (N − 1)/2. If q = (N − 1)/2, then sin(ω(n − q)) is an odd function with respect to n = (N − 1)/2. The summation (6.10) is zero if the impulse response h(n) is an even function with respect to n = (N − 1)/2. Hence, the solution to (6.10) is
q = (N − 1)/2,
h(n) = h(N − 1 − n), 0 ≤ n ≤ N − 1.
Since the Fourier transform is unique, this is the unique solution for the linear phase condition. It is illustrated for an even and odd N in Fig. 6.20. From the symmetry condition, it is easy to conclude that there is no causal linear phase system with an infinite impulse response. The symmetry and the constant group delay are easy to check numerically, as in the sketch below.
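A minimal sketch assuming scipy; the windowed-sinc response below is an arbitrary symmetric example, not a filter from the text:

    import numpy as np
    from scipy import signal

    N = 33
    n = np.arange(N)
    h = np.hanning(N) * np.sinc(0.3 * (n - (N - 1) / 2))  # symmetric h(n)
    print(np.allclose(h, h[::-1]))                        # h(n) = h(N-1-n): True
    w, gd = signal.group_delay((h, [1.0]))
    print(gd[:5])                                         # constant (N-1)/2 = 16 samples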
6.2.2 Windows
When a system obtained from the design procedure is an IIR system and the requirement is to implement it as an FIR system, in order to get a linear phase or to guarantee the system stability (when small changes of the coefficients are possible), the most obvious way is to truncate the desired impulse response h_d(n) of the resulting IIR system. The impulse response of the FIR system is
h(n) = h_d(n), for 0 ≤ n ≤ N − 1,
h(n) = 0, elsewhere.
Figure 6.20 The impulse response of a system with the linear phase for an odd and even N.
In the frequency domain, this truncation (a multiplication by a rectangular window) corresponds to the convolution
H(e^{jω}) = H_d(e^{jω}) ∗_ω W(e^{jω}).
Since the rectangular window function has the Fourier transform of the form
W(e^{jω}) = Σ_{n=0}^{N−1} e^{−jωn} = e^{−jω(N−1)/2} sin(ωN/2)/sin(ω/2),
its convergence is slow, with significant oscillations. These oscillations will cause oscillations in the resulting frequency response H(e^{jω}), Fig. 6.21. By increasing the number of samples N, the convergence speed will increase. However, the amplitude of the oscillations will remain the same, Figs. 6.21(d) and (f). Even with N → ∞ the amplitude oscillations will remain, Fig. 6.21(b). This effect is called the Gibbs phenomenon.
Example 6.7. The desired frequency response of a system is H_d(e^{jω}), with the IIR h_d(n) for −∞ < n < ∞. Find the FIR system impulse response h_c(n) that approximates the desired transfer function with a minimum mean squared error.
⋆Without loss of generality, assume that the most significant values of h_d(n) are within −N/2 ≤ n ≤ N/2 − 1. The impulse response h_c(n) can take nonzero values only within −N/2 ≤ n ≤ N/2 − 1. Therefore,
e^2 = Σ_{n=−N/2}^{N/2−1} |h_d(n) − h_c(n)|^2 + Σ_{n=−∞}^{−N/2−1} |h_d(n)|^2 + Σ_{n=N/2}^{∞} |h_d(n)|^2.
Since the last two terms are h_c(n) independent and all three terms are nonnegative, the error e^2 is minimum if h_c(n) = h_d(n) within −N/2 ≤ n ≤ N/2 − 1. A causal form of this system is
h(n) = h_c(n − N/2).
A shift in time does not change the amplitude of the desired frequency response, since
|H(e^{jω})| = |H_c(e^{jω})|.
In order to reduce the oscillations in the frequency response amplitude, other windows are introduced. They are presented within the introductory chapters, through the examples. Here we will list the basic windows (more details on the window functions will be given in Part V).
The triangular (Bartlett) window is defined as
w(n) = 1 − |n + 1 − N/2|/(N/2), for 0 ≤ n ≤ N − 1,
w(n) = 0, elsewhere.
By avoiding window discontinuities at the ending points, the convergence of its transform is improved. Since this window may be considered as a convolution of the two rectangular windows,
w(n) = (1/(N/2)) [u(n) − u(n − N/2)] ∗_n [u(n) − u(n − N/2)],
its Fourier transform is the product of the corresponding rectangular window Fourier transforms.
Figure 6.21 Impulse response of a FIR system obtained by truncating the desired IIR response (a), (b), using two rectangular windows of different widths (c)-(f), and using a Hann(ing) window (g), (h).
The Hann(ing) window would be continuous in the continuous-time domain. In that domain, its first derivative would be continuous as well. Thus, its Fourier domain convergence is further improved with respect to the rectangular and the Bartlett windows. The Fourier transform of this window is related to the Fourier transform of the rectangular window as W(e^{jω})/2 + W(e^{j(ω+2π/N)})/4 + W(e^{j(ω−2π/N)})/4.
The Hamming window loses the continuity property (in the continuous-time domain). Its convergence for very large values of ω will be slower than in the Hann(ing) window case. However, as it will be shown later, its coefficients are derived in such a way that the first side-lobe is canceled out at its mid point. Then, the immediate convergence, after the main lobe, is much better than in the Hann(ing) window case.
Other windows are derived with other different constraints. Some of them will be reviewed in Part V of this book as well.
Suppose that the desired system frequency response is given in the frequency domain. If we want to get an N point FIR system that approximates the desired frequency response, then it can be obtained by sampling the desired frequency response H_d(e^{jω}) at ω = 2πk/N, k = 0, 1, 2, ..., N − 1, that is,
H(k) = H_d(e^{jω})|_{ω=2πk/N}.
This procedure is illustrated on a lowpass filter design, Fig. 6.22. Note that at the discontinuity points, high oscillations will occur in the resulting H(e^{jω}). The oscillations can be avoided by smoothing the transition intervals. Smoothing by a Hann(ing) window in the frequency domain is shown in Fig. 6.23.
The FIR systems can be realized in the same way as the IIR systems presented in the previous section, without using the recursive coefficients. A common way of presenting the direct realization of the FIR system is shown in Fig. 6.24. It is often referred to as an adder with the weighting coefficients h(n).
A realization of a linear phase FIR system that uses the coefficient symmetry property h(0) = h(N − 1), h(1) = h(N − 2), ..., is shown in Fig. 6.25.
Realization of a frequency sampled FIR filter may be done using the relation between the z-transform and the DFT of a signal. If we want to realize a FIR system with N nonzero samples, then it can be expressed in terms of the DFT of the frequency response (samples of the transfer function H(z) along the unit circle) as
Figure 6.22 Realization of a FIR system with N samples in time, obtained by sampling the desired frequency response with N samples. A direct sampling (left) and the sampling with a smoothed transition (right).
Figure 6.23 A Hann(ing) window for smoothing the frequency response in the frequency domain (left) and in the time domain (right).
h(n) = (1/N) Σ_{k=0}^{N−1} H(k)e^{j2πnk/N}.
Then, the transfer function H(z), calculated using the values of h(n), 0 ≤ n ≤ N − 1, is
H(z) = Σ_{n=0}^{N−1} h(n)z^{−n} = (1/N) Σ_{k=0}^{N−1} Σ_{n=0}^{N−1} H(k)e^{j2πnk/N} z^{−n} = (1/N) Σ_{k=0}^{N−1} H(k) (1 − z^{−N}e^{j2πk})/(1 − z^{−1}e^{j2πk/N}),
as verified numerically in the sketch below.
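A numerical sketch of this identity, assuming numpy (the sample values H(k) are arbitrary):

    import numpy as np

    N = 16
    Hk = np.where(np.arange(N) < 3, 1.0, 0.0)  # arbitrary frequency samples H(k)
    h = np.fft.ifft(Hk)                        # h(n) = IDFT{H(k)}
    z = 1.4 * np.exp(1j * 0.7)                 # an arbitrary test point
    H_direct = np.sum(h * z ** -np.arange(N))
    k = np.arange(N)
    H_fs = (1 - z ** -N) / N * np.sum(Hk / (1 - z ** -1 * np.exp(2j * np.pi * k / N)))
    print(np.allclose(H_direct, H_fs))         # True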
Example 6.8. For a system whose impulse response is the Hamming window function of the length N = 32, present the FIR filter based realization.
⋆For the Hamming window with N = 32, the impulse response is given by
h(n) = 0.52 + 0.48 cos((n − 16)π/16), for 0 ≤ n ≤ 31.
The DFT values are H(0) = 0.52 × 32, H(1) = −0.24 × 32, H(31) = H(−1) = −0.24 × 32, and H(k) = 0 for other k within 0 ≤ k ≤ 31. Therefore, only the k = 0 and k = ±1 terms remain in the frequency-sampling realization, with the k = ±1 terms grouped into
H_3(z) = −2H(1) (1 − cos(π/16)z^{−1}) / (1 − 2 cos(π/16)z^{−1} + z^{−2}).
Example 6.9. For the system whose frequency response H_d(jΩ) in the continuous-time domain is
H_d(jΩ) = π − |Ω|, for |Ω| ≤ π,
with the corresponding H_d(e^{jω}) in the discrete-time domain (∆t = 1 is assumed, Fig. 6.26), find the FIR filter impulse response with N = 7 and N = 8:
(a) Sampling the desired frequency response H_d(e^{jω}) in the frequency domain.
(b) Calculating h_d(n) = IFT{H_d(e^{jω})} and taking its N most significant values, h(n) = h_d(n) for −N/2 ≤ n ≤ N/2 − 1 and h(n) = 0 elsewhere (using a rectangular window).
(c) Comment on the error in both cases.
⋆(a) Sampling in the frequency domain is illustrated in Fig. 6.26. The values of the FIR system frequency response, in this case, are the samples of H_d(e^{jω}),
H(k) = H_d(e^{jω})|_{ω=2πk/N} = π(1 − 2k/N), for 0 ≤ k < N/2,
H(k) = π(2k/N − 1), for N/2 ≤ k ≤ N − 1.
The sampling is illustrated in the second row of Fig. 6.26 for N = 7 and N = 8. The impulse response of the FIR filter is
h(n) = IDFT{H(k)} = (1/N) Σ_{k=0}^{N−1} H(k)e^{j2πnk/N}.
For N = 7,
h(n) = π/7 + (10π/49) cos(2πn/7) + (6π/49) cos(2 · 2πn/7) + (2π/49) cos(3 · 2πn/7), 0 ≤ n ≤ 6.
For N = 8,
h(n) = π/8 + (3π/16) cos(2πn/8) + (π/8) cos(2 · 2πn/8) + (π/16) cos(3 · 2πn/8), 0 ≤ n ≤ 7.
These impulse responses are shown in Fig. 6.26 (third row). The frequency response of the FIR filter is
H(e^{jω}) = FT{h(n)}.
Its values are equal to the desired frequency response at the sampling points,
H(e^{jω})|_{ω=2πk/N} = H_d(e^{jω})|_{ω=2πk/N}.
Outside these points, the frequency responses significantly differ (calculate, for example, the values H(e^{j0}), H(e^{jπ/2}), and H(e^{jπ})). Here, there is no significant discontinuity in the frequency response. It means that the frequency response smoothing, using a window (Hann(ing) or Hamming window in the time domain), would not significantly improve the result.
(b) The impulse response of the desired system is
h_d(n) = IFT{H_d(e^{jω})} = (1/2π) ∫_{−π}^{π} (π − |ω|)e^{jωn} dω = (2/2π) ∫_{0}^{π} (π − ω) cos(ωn) dω = (1 − cos(nπ))/(πn^2).
The frequency response of the FIR filter, obtained by truncating h_d(n), is again
H(e^{jω}) = FT{h(n)}.
Figure 6.26 Design of a FIR filter by the frequency sampling of the desired frequency response.
Figure 6.27 Design of a FIR filter by windowing the impulse response of an IIR filter.
Figure 6.28 Error in the case of the frequency response sampling (top, Er = 0.008092) and the IIR impulse response truncation (bottom, Er = 0.0018945), along with the corresponding mean square error (Er) values.
6.3 PROBLEMS
Problem 6.1. For the system with the transfer function
H(z) = 16(z + 1)z^2 / [(4z^2 − 2z + 1)(4z + 3)],
plot the cascade, parallel, and direct realizations.
Problem 6.2. A discrete-time system is described by the difference equation
y(n) = y(n − 1) − y(n − 2) − 3y(n − 3) + x(n) + x(n − 1) + x(n − 2).
Plot its direct realization I, direct realization II, parallel realization, and cascade realization.
Problem 6.3. Find the transfer function of the discrete system presented in Fig. 6.29.
Problem 6.4. Find the transfer function of the discrete system presented in Fig. 6.30.
For the system with the transfer function
H(z) = [4z^2/(4z^2 − 2z + 1)] · [(4z + 4)/(4z + 3)],
plot its cascade and parallel realization. Write down the difference equation which describes this system.
For the system
H(z) = (1 + z^{−2}) / (1 + 2z^{−1} + 2z^{−2} + z^{−3}),
plot the cascade realization.
Problem 6.8. For the system presented in Fig. 6.31 find the transfer function.
Problem 6.9. A discrete system is defined by the following two equations:
y(n) + (1/4)y(n − 1) + w(n) + (1/2)w(n − 1) = (2/3)x(n)
y(n) − (5/4)y(n − 1) + 2w(n) − 2w(n − 1) = −(5/3)x(n),
where x(n) is the input signal, y(n) is the output signal, and w(n) is a signal within the system. What are the frequency and impulse responses of the system?
Problem 6.10. The system
H(z) = (1 + 2z − z^2 + 4z^3 − z^4 + 2z^5 + z^6) / z^6
has a linear phase function. Find its group delay.
Problem 6.11. Let h(n) be an impulse response of a causal system with the Fourier transform H (e jω ).
A real-valued output signal y1 (n) = x (n) ∗ h(n) of this system is reversed, r (n) = y1 (−n), and
passed through the same system, resulting in the output signal y2 (n) = r (n) ∗ h(n). The final output
is reversed again y(n) = y2 (−n). Find the phase of the frequency response function of the overall
system.
Problem 6.12. For the system whose frequency response in the continuous-time domain is
H_d(jΩ) = 2, for |ω| < π/2,
H_d(jΩ) = 1, for π/2 < |ω| < 3π/4,
H_d(jΩ) = 0, elsewhere,
with the corresponding H_d(e^{jω}) in the discrete-time domain obtained with ∆t = 1, find the FIR filter impulse response with N = 15 and N = 14:
(a) Sampling the desired frequency response H_d(e^{jω}) in the frequency domain,
(b) Calculating h_d(n) = IFT{H_d(e^{jω})} and taking its N most significant values, h(n) = h_d(n) for −N/2 ≤ n ≤ N/2 − 1 and h(n) = 0 elsewhere (rectangular window).
(c) Comment on the sources of error in both cases.
6.4 EXERCISE
Exercise 6.1. For the system
H(z) = (z^2 − 2) / [(z − 1)(z − 2)],
plot the direct realization I, direct realization II, parallel realization, and cascade realization.
Exercise 6.2. For the system
H(z) = (3z^{−2} + 6) / (z^{−3} − 2z^{−2} + 3z^{−1} − 6),
a) plot the direct realization I, direct realization II, cascade realization, and parallel realization;
b) find Σ_{n=−∞}^{∞} h(n), where h(n) is the impulse response of the system.
Exercise 6.4. Find the impulse response of the discrete system presented in Fig. 6.32.
Exercise 6.5. Using the impulse invariance method with the sampling interval ∆t = 0.1, transform the continuous-time system given with the transfer function
H(s) = (1 + 5s)/(8 + 2s + 5s^2)
into a discrete-time system, and plot the direct and the cascade realization of the system. Is the obtained discrete-time system stable?
Exercise 6.6. Using the bilinear transform with the sampling interval ∆t = 1, transform the system given with the transfer function
H(s) = (2 + s)/(8 + 2s + 5s^2)
into a discrete-time system, and plot the direct and the cascade realization of the system. Is the obtained discrete system stable?
Exercise 6.7. Using the bilinear transform with the sampling interval ∆t = 0.2, transform the continuous-time system given with the transfer function
H(s) = (3s + 6)/[(s + 1)(s + 3)]
into a discrete-time system, and plot its direct realization II.
Exercise 6.8. For the system whose frequency response in the continuous-time domain is
H_d(jΩ) = 2 − |Ω|/(π/2), for |ω| < π/2,
H_d(jΩ) = 0, elsewhere,
with the corresponding H_d(e^{jω}) in the discrete-time domain obtained for ∆t = 1, and presented in Fig. 6.33, find the FIR filter impulse response with N = 7 and N = 8:
Figure 6.33 The desired system in the continuous-time domain (left) and discrete-time domain (right).
6.5 SOLUTIONS
Solution 6.1. In order to plot the direct form of realization, the transfer function should be written in a form suitable for this type of realization,
H(z) = 16(z + 1)z^2 / [(4z^2 − 2z + 1)(4z + 3)] = (1 + z^{−1}) / [(1 − (1/2)z^{−1} + (1/4)z^{−2})(1 + (3/4)z^{−1})]
= (1 + z^{−1}) / (1 + (1/4)z^{−1} − (1/8)z^{−2} + (3/16)z^{−3}). (6.11)
According to the previous relation, direct realizations I and II follow. They are presented in Fig.
6.34 and Fig. 6.35, respectively.
For the cascade realization, the transfer function is written in the form
H(z) = (1 + z^{−1}) / [(1 − (1/2)z^{−1} + (1/4)z^{−2})(1 + (3/4)z^{−1})]
= [(1 + z^{−1})/(1 − (1/2)z^{−1} + (1/4)z^{−2})] · [1/(1 + (3/4)z^{−1})] = H_1(z)H_2(z).
The cascade realization, implemented as a product of two blocks, has the form shown in Fig. 6.36.
In order to plot a parallel realization, the transfer function should be written in the form of a partial fractions expansion, which is suitable for this type of realization,
H(z) = (1 + z^{−1}) / [(1 − (1/2)z^{−1} + (1/4)z^{−2})(1 + (3/4)z^{−1})] = (Az^{−1} + B)/(1 − (1/2)z^{−1} + (1/4)z^{−2}) + C/(1 + (3/4)z^{−1}).
Solution 6.2. Using the z-transform properties, the given difference equation can be written as
H(z) = Y(z)/X(z) = (1 + z^{−1} + z^{−2}) / (1 − z^{−1} + z^{−2} + 3z^{−3}). (6.12)
Direct realizations I and II, presented in Fig. 6.38 and Fig. 6.39, respectively, follow from the previous equation.
For the cascade realization, the transfer function should be written in the form of a product of two blocks,
H(z) = [(1 + z^{−1} + z^{−2})/(1 − 2z^{−1} + 3z^{−2})] · [1/(1 + z^{−1})] = H_1(z)H_2(z).
This form is now suitable for the cascade realization given in Fig. 6.40.
For the parallel realization, we will write the transfer function in the form of partial fractions with real-valued coefficients,
H(z) = \frac{1/6}{1+z^{-1}} + \frac{\frac{1}{2}z^{-1}+\frac{5}{6}}{1-2z^{-1}+3z^{-2}}.
Its realization is now straightforward.
Solution 6.3. The system can be recognized as a cascade of two subsystems and its transfer function can be written as a product of the transfer functions of these two blocks,
H(z) = H_1(z)H_2(z),
where H_1(z) denotes the first block and H_2(z) denotes the second block. The first subsystem with the transfer function H_1(z) can be considered as a direct realization II, with the input to output relation
y_1(n) = 2y_1(n-1) + \frac{1}{3}y_1(n-2) + x(n) + \frac{1}{2}x(n-1) - \frac{1}{3}x(n-2),
as shown in Fig. 6.41. Using the z-transform properties, its transfer function is
H_1(z) = \frac{1+\frac{1}{2}z^{-1}-\frac{1}{3}z^{-2}}{1-2z^{-1}-\frac{1}{3}z^{-2}}.
[Figure 6.41 Direct realization II of the first subsystem.]
Now consider the second block whose transfer function is H2 (z). This block can be considered
as a parallel realization of two blocks, H2 (z) = H21 (z) + H22 (z) where H21 (z) = 1.
The second transfer function is the transfer function that corresponds to a direct realization II, of
a subsystem described by
y_2(n) = y_2(n-1) + y_2(n-2) + x_1(n) + \frac{1}{3}x_1(n-1) - \frac{1}{4}x_1(n-2).
Thus, the transfer function of this subsystem is
H_2(z) = H_{21}(z) + H_{22}(z) = 1 + \frac{1+\frac{1}{3}z^{-1}-\frac{1}{4}z^{-2}}{1-z^{-1}-z^{-2}}.
Finally, the transfer function of the whole system is
H(z) = H_1(z)H_2(z) = \frac{1+\frac{1}{2}z^{-1}-\frac{1}{3}z^{-2}}{1-2z^{-1}-\frac{1}{3}z^{-2}}\left(1 + \frac{1+\frac{1}{3}z^{-1}-\frac{1}{4}z^{-2}}{1-z^{-1}-z^{-2}}\right).
Solution 6.4. This realization can be considered as a cascade realization of two blocks H1 (z) and
H2 (z),
H (z) = H1 (z) H2 (z).
The first block is a direct realization II, whose transfer function is
H_1(z) = \frac{1+(\frac{1}{2}+1)z^{-1}-\frac{1}{3}z^{-2}}{1-2z^{-1}-\frac{1}{3}z^{-2}}.
The previous relation holds since the upper delay block (above the obvious direct realization II block) has the same input and output as the first delay block below it.
The block with the transfer function H_2(z) can be considered as a parallel realization of two blocks, similarly to the previous example, with H_{21}(z) and H_{22}(z) defined by
H_{21}(z) = \frac{1+\frac{1}{3}z^{-1}-\frac{1}{4}z^{-2}}{1-z^{-1}-z^{-2}},
and
H22 (z) = z−1 .
Hence, the transfer function of the second block is
H_2(z) = H_{21}(z) + H_{22}(z) = \frac{1+\frac{1}{3}z^{-1}-\frac{1}{4}z^{-2}}{1-z^{-1}-z^{-2}} + z^{-1}.
Now, the resulting transfer function can be written in the form
H(z) = H_1(z)H_2(z) = \frac{1+(\frac{1}{2}+1)z^{-1}-\frac{1}{3}z^{-2}}{1-2z^{-1}-\frac{1}{3}z^{-2}}\left(\frac{1+\frac{1}{3}z^{-1}-\frac{1}{4}z^{-2}}{1-z^{-1}-z^{-2}} + z^{-1}\right).
Solution 6.5. The transfer function of the system can be expressed, using the roots of the numerator and denominator polynomials, as
H(z) = \frac{(1-(0.1+j0.1)z^{-1})(1-(0.1-j0.1)z^{-1})}{(1-(0.85-j0.75)z^{-1})(1-(0.85+j0.75)z^{-1})} \times \frac{(1-(0.9+j0.8)z^{-1})(1-(0.9-j0.8)z^{-1})}{(1-(0.05-j0.1)z^{-1})(1-(0.05+j0.1)z^{-1})}.
Figure 6.42 Cascade realization of the system with blocks ordered in such a way that the whole system is less sensitive to possible quantization errors.
Solution 6.6. For a cascade realization, the transfer function is expressed in the form
H(z) = \frac{1}{1-\frac{1}{2}z^{-1}+\frac{1}{4}z^{-2}} \cdot \frac{1+z^{-1}}{1+\frac{3}{4}z^{-1}}. (6.13)
[Figure 6.43 Cascade realization corresponding to (6.13).]
[Figure 6.44 Parallel realization of the system.]
Solution 6.7. The transfer function form which corresponds to the cascade realization of the system is
H(z) = \frac{1+z^{-2}}{(1+z^{-1})(1+z^{-1}+z^{-2})}.
In order to use the smallest number of delay circuits, it can be expressed in the form
H(z) = H_1(z)H_2(z) = \frac{1}{1+z^{-1}} \cdot \frac{1+z^{-2}}{1+z^{-1}+z^{-2}}. (6.15)
This form corresponds to the cascade realization presented in Fig. 6.45.
[Figure 6.45 Cascade realization of the system (6.15).]
Y(z)\left(1+\frac{1}{4}z^{-1}\right) + W(z)\left(1+\frac{1}{2}z^{-1}\right) = \frac{2}{3}X(z)
Y(z)\left(1-\frac{5}{4}z^{-1}\right) + 2W(z)(1-z^{-1}) = -\frac{5}{3}X(z).
By eliminating W(z) we get
Y(z)\left[\left(2+\frac{1}{2}z^{-1}\right)(1-z^{-1}) - \left(1-\frac{5}{4}z^{-1}\right)\left(1+\frac{1}{2}z^{-1}\right)\right] = X(z)\left[\frac{4}{3}(1-z^{-1}) + \frac{5}{3}\left(1+\frac{1}{2}z^{-1}\right)\right].
The transfer function is
H(z) = \frac{Y(z)}{X(z)} = \frac{3-\frac{1}{2}z^{-1}}{1-\frac{3}{4}z^{-1}+\frac{1}{8}z^{-2}},
with the difference equation describing this system
y(n) - \frac{3}{4}y(n-1) + \frac{1}{8}y(n-2) = 3x(n) - \frac{1}{2}x(n-1).
The frequency response is
H(e^{jω}) = \frac{3-\frac{1}{2}e^{-jω}}{1-\frac{3}{4}e^{-jω}+\frac{1}{8}e^{-j2ω}}.
Based on
H(z) = \frac{Y(z)}{X(z)} = \frac{3-\frac{1}{2}z^{-1}}{1-\frac{3}{4}z^{-1}+\frac{1}{8}z^{-2}} = \frac{4}{1-\frac{1}{2}z^{-1}} - \frac{1}{1-\frac{1}{4}z^{-1}},
the impulse response is
h(n) = [4(1/2)^n - (1/4)^n]u(n).
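This result is easy to verify numerically. The following minimal sketch, assuming numpy is available (all variable names are illustrative, not from the book), runs the difference equation for an impulse input and compares the output with the closed form.

import numpy as np

# Difference equation: y(n) = (3/4)y(n-1) - (1/8)y(n-2) + 3x(n) - (1/2)x(n-1),
# driven by the discrete-time impulse delta(n).
N = 20
x = np.zeros(N); x[0] = 1.0
y = np.zeros(N)
for n in range(N):
    y[n] = 3*x[n] - 0.5*(x[n-1] if n >= 1 else 0.0)
    if n >= 1: y[n] += 0.75*y[n-1]
    if n >= 2: y[n] -= 0.125*y[n-2]
h_closed = 4*0.5**np.arange(N) - 0.25**np.arange(N)   # h(n) = [4(1/2)^n - (1/4)^n]u(n)
print(np.max(np.abs(y - h_closed)))                   # ~1e-15, machine precision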
Solution 6.8. The transfer function of the subsystem denoted by H_1(z) follows from its realization, where x_1(n) is the input signal to this subsystem. The feedback block has the transfer function
H_2(z) = -\frac{z^{-1} r \sin θ}{1 - r\cos θ\, z^{-1}}.
For the feedback,
H_1(z)(X(z) + Y(z)H_2(z)) = Y(z)
holds. This relation produces
H(z) = \frac{Y(z)}{X(z)} = \frac{H_1(z)}{1 - H_1(z)H_2(z)}.
For an FIR filter whose impulse response satisfies the symmetry condition
h(n) = h(N-1-n), \quad 0 \le n \le N-1,
with N = 7, the phase function is linear. Thus, the group delay q is
q = \frac{N-1}{2} = 3.
Solution 6.11. Let h(n) be an impulse response of a causal system with the Fourier transform
H (e jω ). A real-valued output signal y1 (n) = x (n) ∗ h(n) of this system is reversed, r (n) = y1 (−n),
and passed through the same system, resulting in the output signal y2 (n) = r (n) ∗ h(n). The final
output is reversed again y(n) = y2 (−n). Find the phase of the frequency response function of the
overall system. The frequency domain form of the system y1 (n) = x (n) ∗ h(n) is
Y_1(e^{jω}) = H(e^{jω})X(e^{jω}).
For the operation r(n) = y_1(-n) in the time domain, the frequency domain form is
R(e^{jω}) = Y_1^*(e^{jω}) = H^*(e^{jω})X^*(e^{jω}).
When this signal passes through the same system, y_2(n) = r(n) * h(n), we have
Y_2(e^{jω}) = R(e^{jω})H(e^{jω}) = H^*(e^{jω})H(e^{jω})X^*(e^{jω}),
Y(e^{jω}) = Y_2^*(e^{jω}) = H(e^{jω})H^*(e^{jω})X(e^{jω}).
So we get
Y(e^{jω}) = |H(e^{jω})|^2 X(e^{jω}).
Ljubiša Stanković Digital Signal Processing 283
Obviously, the frequency response of the overall system, |H(e^{jω})|^2, is real-valued and nonnegative, so the phase function of the overall system is equal to zero for all ω.
Solution 6.12. (a) The values of the FIR filter, obtained by sampling the desired frequency response in the frequency domain, are
H(k) = H_d(e^{jω})\big|_{ω=2πk/N}.
This sampling is illustrated in the second row of Fig. 6.46 for N = 15 and N = 14.
Figure 6.46 Design of the FIR filter using the frequency sampling of the desired frequency response.
The impulse response of the FIR filter is obtained as the inverse DFT of these samples,
h(n) = \text{IDFT}\{H(k)\} = \frac{1}{N}\sum_{k=0}^{N-1} H(k)e^{j2πnk/N}.
It is shown in Fig. 6.46 (third row). The frequency response of the FIR filter is
H (e jω ) = FT{ h(n)}.
Its values are equal to the desired frequency response at the sampling points,
H(e^{jω})\big|_{ω=2πk/N} = H_d(e^{jω})\big|_{ω=2πk/N}.
(b) The impulse response of the desired system is
h_d(n) = \text{IFT}\{H_d(e^{jω})\} = \frac{\sin(nπ/2)}{πn} + \frac{\sin(3nπ/4)}{πn}.
Using the first N = 15 samples in the discrete-time domain we get
h(n) = \begin{cases} h_d(n), & \text{for } -7 \le n \le 7 \\ 0, & \text{elsewhere} \end{cases}
or for N = 16
h(n) = \begin{cases} h_d(n), & \text{for } -8 \le n \le 7 \\ 0, & \text{elsewhere.} \end{cases}
The frequency response of this FIR filter is
H(e^{jω}) = \text{FT}\{h(n)\}.
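Since h_d(n) is the sum of two ideal low-pass impulse responses with cutoffs π/2 and 3π/4, the desired response equals 2, 1, and 0 on the corresponding bands. Under that assumption, both designs can be sketched in a few lines (numpy assumed; names are illustrative):

import numpy as np

def hd(n):
    # h_d(n) = sin(n*pi/2)/(pi*n) + sin(3*n*pi/4)/(pi*n); the n = 0 value is the
    # limit (1/2 + 3/4) of the two sinc-type terms.
    n = np.asarray(n, dtype=float)
    return np.where(n == 0, 0.5 + 0.75,
                    (np.sin(n*np.pi/2) + np.sin(3*n*np.pi/4)) / (np.pi*np.where(n == 0, 1, n)))

# (b) Truncation (window approach), N = 15 samples, -7 <= n <= 7:
h_trunc = hd(np.arange(-7, 8))

# (a) Frequency sampling, N = 15: sample H_d at w = 2*pi*k/N and take the IDFT.
N = 15
w = 2*np.pi*np.arange(N)/N
w = np.where(w > np.pi, w - 2*np.pi, w)     # map the DFT grid to [-pi, pi)
Hd_samp = np.where(np.abs(w) < np.pi/2, 2.0, np.where(np.abs(w) < 3*np.pi/4, 1.0, 0.0))
h_samp = np.real(np.fft.ifft(Hd_samp))      # impulse response, circularly indexed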
Figure 6.47 The FIR filter design using the N most significant values of the impulse response (window approach).
(c) The error value, as a function of the frequency ω, along with the mean squared absolute error
Er is shown in Fig. 6.48.
Figure 6.48 Error in the case of the frequency response sampling (top, Er = 0.037954) and the IIR impulse response truncation (bottom, Er = 0.028921), along with the corresponding mean square error (Er) values.
Part III
Random Discrete-Time
Signals and Systems
Chapter 7
Discrete-Time Random Signals
Random signal values cannot be defined by simple deterministic mathematical functions. Their values are not known in advance. These signals can be described by stochastic tools only. Here, we will restrict the analysis to discrete-time random signals. The first-order and second-order statistics will be considered.
Statistics is a science or practice dealing with the collection, analysis, interpretation, and presentation of numerical data, inferring parameters from the whole set of data or their representative sample. A statistic is a single numerical fact obtained from the analysis of the considered set of data and used to describe the whole data set.
The first-order statistics is the starting point in describing random signals. The mean value, or the data sample average, of a random signal is one of the parameters of this statistics. If we have a set of signal samples,
X = \{x(n)\,|\, n = 1, 2, \ldots, N\}, (7.1)
the mean value of this set of signal samples is calculated as
µ̂_x = \text{mean}\{x(n)\,|\, n = 1, 2, \ldots, N\} = \frac{1}{N}(x(1) + x(2) + \cdots + x(N)). (7.2)
For notation simplicity, we will also use µ̂ x = mean{ x (n)}, meaning the mean of the dataset { x (n)}
for all indices n where the signal is available.
To distinguish the calculated (statistically estimated) value µ̂ of a signal parameter from the true
one µ (if all possible signal realizations were available) we will use the hat (ˆ) symbol.
Example 7.1. Consider a random signal x(n) whose one realization is given in Table 7.1. Find the mean value of this signal. Find how many samples of the signal are within the intervals [1, 10], [11, 20], \ldots, [91, 100]. Plot the number of occurrences of signal x(n) samples within these intervals as a function of the interval range.
Table 7.1
A realization of random signal
54 62 58 51 70 43 99 52 57 76
56 53 38 61 28 69 87 41 72 80
23 26 66 47 69 71 69 81 68 79
31 55 52 23 60 34 83 39 66 59
37 12 54 42 67 95 89 67 42 63
35 55 54 55 49 77 18 64 73 70
67 56 42 66 50 47 49 25 50 57
61 84 48 67 71 74 35 59 60 42
40 77 52 63 57 42 44 64 36 71
66 39 50 31 11 75 45 62 60 55
⋆The realization of signal x (n) defined in Table 7.1 is presented in Fig. 7.1.
[Figure 7.1 One realization of the random signal x(n) defined in Table 7.1.]
Its mean value is
µ̂_x = \frac{1}{100}\sum_{n=1}^{100} x(n) = 55.76.
From Table 7.1 or its visualized presentation in Fig. 7.1, we can conclude that, for example, there is no signal sample whose value is within the interval [1, 10]. Within [11, 20] there are three signal samples (x(42) = 12, x(57) = 18, and x(95) = 11). In a similar way, the number of signal samples
within other intervals are counted and the result is shown in Fig. 7.2. This kind of random signal
presentation is called a histogram of x (n), with the defined intervals.
Figure 7.2 Histogram of the random signal x (n) from Fig. 7.1, with 10 intervals defined by [10i + 1, 10i + 10],
i = 0, 1, 2, . . . , 9.
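The mean value and the histogram counts of this example can be reproduced with a minimal sketch, assuming numpy is available (the code simply reads the data of Table 7.1 row by row):

import numpy as np

x = np.array([
    54,62,58,51,70,43,99,52,57,76, 56,53,38,61,28,69,87,41,72,80,
    23,26,66,47,69,71,69,81,68,79, 31,55,52,23,60,34,83,39,66,59,
    37,12,54,42,67,95,89,67,42,63, 35,55,54,55,49,77,18,64,73,70,
    67,56,42,66,50,47,49,25,50,57, 61,84,48,67,71,74,35,59,60,42,
    40,77,52,63,57,42,44,64,36,71, 66,39,50,31,11,75,45,62,60,55])
print(x.mean())                                            # 55.76, as in the example
counts, _ = np.histogram(x, bins=np.arange(0.5, 101, 10))  # intervals [1,10], ..., [91,100]
print(counts)                                              # e.g. counts[1] = 3 for [11, 20]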
Example 7.2. For the signal x(n) from the previous example assume that a new random signal y(n) is formed as
y(n) = \text{int}\left\{\frac{x(n)+5}{10}\right\},
where int {·} denotes the nearest integer. This means that y(n) = 1 for 1 ≤ x (n) ≤ 10, y(n) = 2
for 11 ≤ x (n) ≤ 20, . . . , y(n) = i for 10(i − 1) + 1 ≤ x (n) ≤ 10i, up to i = 10. What is the
set of possible values of y(n)? Find and graphically present the number of occurrences of every
possible value of y(n) in this signal realization. Find the mean value of the new signal y(n) and
discuss the result.
⋆ The signal y(n) is shown in Fig. 7.3. It takes the values from the set {2, 3, 4, 5, 6, 7, 8, 9, 10}.
For the signal y(n), instead of the histogram, we can plot a diagram of the number of
occurrences of every value that y(n) can take, as in Fig. 7.4. The mean value of y(n) is
µ̂_y = \frac{1}{100}\sum_{n=1}^{100} y(n) = 6.13.
The mean value can also be written, by grouping the same values of y(n), as
µ̂_y = \frac{1}{100}(1 \cdot n_1 + 2 \cdot n_2 + 3 \cdot n_3 + \cdots + 10 \cdot n_{10}) = 1 \cdot \frac{n_1}{N} + 2 \cdot \frac{n_2}{N} + 3 \cdot \frac{n_3}{N} + \cdots + 10 \cdot \frac{n_{10}}{N},
[Figure 7.3 The signal y(n).]
where N = 100 is the total number of the available signal values and ni is the number showing how
many times each of the values i appeared in y(n). If there is a sufficient number of occurrences
for every outcome value i, then
P_y(i) = \frac{n_i}{N}
can be considered as an estimate of the probability that the value i appears. In that sense
µ̂_y = 1 \cdot P_y(1) + 2 \cdot P_y(2) + 3 \cdot P_y(3) + \cdots + 10 \cdot P_y(10) = \sum_{i=1}^{10} i\,P_y(i),
with
\sum_{i=1}^{10} P_y(i) = \sum_{i=1}^{10} \frac{n_i}{N} = 1.
Values of the probability estimates Py (i ) are shown in Fig. 7.4.
In general, the mean value for every signal sample could be different. For example, if the signal
values represent the highest daily temperature during a year then the mean value is highly dependent
on the considered sample. In order to calculate the mean value of temperature, we have to have several
realizations of these random signals (measurements over M years), denoted by { xi (n)}, where the
argument n = 1, 2, 3, . . . , N is the cardinal number of the day within a year and i = 1, 2, . . . , M is the
index of realization (year index). The mean value is then calculated as
µ̂_x(n) = \frac{1}{M}(x_1(n) + x_2(n) + \cdots + x_M(n)) = \frac{1}{M}\sum_{i=1}^{M} x_i(n), (7.3)
for every n. In this case, we have a set (a signal) of mean values {µ̂ x (n)}, for n = 1, 2, . . . , 365.
Figure 7.4 Number of appearances of every possible value of y(n) (left) and the estimates of the probabilities
that the random signal y(n) takes a value i = 1, 2, . . . , 10 (right).
Example 7.3. Consider the signal x (n) whose realizations are given in Table 7.2. The values of x (n)
are equal to the monthly average of the maximum daily temperatures in a city measured from
year 2001 to 2015. Find the mean of this temperature for each month over the considered period
of years. What is the mean value of the temperature over all months and years? What is the mean
temperature for every year?
⋆ The signal for years 2001 to 2007 is given in Fig. 7.5. The mean temperature for the nth month, over the considered years, is
µ̂_x(n) = \frac{1}{15}\sum_{i=1}^{15} x_{20i}(n),
where the notation 20i is symbolic in the sense that 2001, 2002, . . . 2015 holds for i =
01, 02, . . . , 15. The mean value signal µ̂ x (n) is shown in the last panel of Fig. 7.5. The mean value
over all months and years is
µ̂_x = \frac{1}{15 \cdot 12}\sum_{n=1}^{12}\sum_{i=1}^{15} x_{20i}(n) = 19.84.
The mean temperature for the year 20i is
µ̂_x(20i) = \frac{1}{12}\sum_{n=1}^{12} x_{20i}(n).
The mean value calculated as the sample average is commonly used due to its calculation simplicity. Later, it will be shown that the sample average is optimal in the estimation of the true mean value of a signal sample when its realizations are corrupted by a quite common disturbance called Gaussian noise (it is interesting to notice that Gauss introduced his famous distribution as the best framework for the sample average estimator, see Section 7.4.5).
Table 7.2
Average of maximum temperature values within months over 15 years, 2001-2015.
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
10 4 18 17 22 29 30 28 27 17 17 5
6 7 11 23 22 32 35 33 22 26 22 8
10 11 10 16 21 26 32 31 23 19 17 4
3 11 13 19 22 26 34 29 26 22 12 9
7 10 13 21 27 29 30 34 24 20 16 11
7 11 17 17 27 25 37 34 33 22 14 14
7 12 13 19 23 32 34 38 21 21 12 10
12 5 9 20 21 37 34 34 27 22 20 7
7 12 13 23 27 33 29 31 25 21 6 11
8 12 10 17 27 33 38 32 23 20 15 9
8 10 13 24 23 33 33 31 27 21 16 8
4 6 15 18 25 26 27 33 23 23 13 11
3 6 16 17 27 28 30 32 29 24 12 10
11 12 14 18 22 29 34 34 23 21 20 11
6 13 8 22 22 29 30 34 23 18 15 8
The mean value, calculated as the sample average using (7.2) or (7.3), is the result of the following minimization problem. Given a set of realizations of the random sample x(n), \{x_i(n)\}, where i = 1, 2, \ldots, M is the index of a realization (in (7.2), x_i(n) = x(i)), the aim is to estimate the true mean signal value µ(n) by µ̂(n), such that its squared distance (deviation) from the available realizations x_i(n), i = 1, 2, \ldots, M, is minimum, that is,
µ̂_x(n) = \arg\min_{α}\left[(x_1(n)-α)^2 + (x_2(n)-α)^2 + \cdots + (x_M(n)-α)^2\right] = \arg\min_{α} \|x - α\|_2^2 = \arg\min_{α} f(α),
where x = [x_1(n), x_2(n), \ldots, x_M(n)]^T and f(α) = \|x - α\|_2^2 is the squared two-norm of the vector x - α. The result of this minimization is obtained from
\frac{d}{dα}\left[(x_1(n)-α)^2 + (x_2(n)-α)^2 + \cdots + (x_M(n)-α)^2\right] = 0 (7.4)
in the form given by (7.3) or (7.2).
The sample average estimation is very sensitive to possible wrongly recorded realizations of a
sample x (n) or to the realization with a very high disturbance due to some exceptional circumstances.
These signal realizations, which significantly differ from the true value of the signal sample, are called
outliers, in contrast to the realizations with relatively small errors called inliers.
The sample average calculation will produce a completely wrong (unbounded) result if at least
one outlier happens in the considered set of realizations { xi (n)}. The smallest possible fraction of
samples needed to be replaced by outliers, in order to make an estimator unbounded, is called the
breakdown point of the estimator. For the sample average (7.3) or (7.2), the breakdown point is the smallest possible, 1/M (where M is the number of available realizations), since only one sample (outlier) can make it unbounded.
[Figure 7.5 Several realizations of a random signal x_{20i}(n), for i = 01, 02, \ldots, 07, and the mean value µ̂_x(n) for every sample (month) over the 15 available realizations.]
The estimators which are robust to possible outliers in the data are defined and analyzed within
robust statistics. The simplest tool in this area will be considered next.
7.1.2 Median
In addition to the sample average, a sample median is used as a statistic to describe a set of random values. The median of a dataset is the value in the middle of the set of available samples, after the members of the set are sorted. If we denote the sorted values of x(n) as s(n), then for an odd N the median is the middle sorted sample,
\text{median}\{x(n)\,|\, n = 1, 2, \ldots, N\} = s\left(\frac{N+1}{2}\right), \text{ for an odd } N.
If N is even, then the median is defined as the mean value of the two samples nearest to (N+1)/2,
\text{median}\{x(n)\,|\, n = 1, 2, \ldots, N\} = \frac{s\left(\frac{N}{2}\right) + s\left(\frac{N}{2}+1\right)}{2}, \text{ for an even } N.
⋆(a) After sorting the values in the set A we get A = {−9, −2, −1, 0, 1, 4, 6}. Therefore,
median(A ) = 0.
(b) Similarly, median(B ) = 0. The mean values of these data would significantly differ.
(c) The sorted values of x (n) are shown in Fig. 7.6. Since the number of samples of signal
x (n) is N = 100, there is no single sample in the middle of the sorted sequence. The middle is
between the sorted samples 50 and 51. Thus, the median is defined here as the mean value of the
50th and 51st sorted sample.
[Figure 7.6 Sorted values of the signal x(n) from Example 7.1.]
The median will not be influenced by a possible small number of big outliers (signal values
being significantly different from the values in the rest of the data). In the worst case, we have to
replace N/2 of the realizations in order to be certain that the middle signal sample is among the
outliers and the median result will not be an inlier. Therefore, the breakdown point of this estimator is
( N/2)/N = 1/2.
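This robustness is easy to see numerically; a minimal sketch assuming numpy (the data values are illustrative):

import numpy as np

x = np.array([54., 62., 58., 51., 70., 43., 52., 57.])
x_out = x.copy(); x_out[0] = 5400.0       # one wrongly recorded value (outlier)
print(np.mean(x), np.mean(x_out))         # mean moves from ~55.9 to ~724
print(np.median(x), np.median(x_out))     # median barely moves (55.5 -> 57.5)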
The sample average estimator was introduced by minimizing the squared distance (deviation) from the available realizations x_i(n). Since the square of large errors is very large, this kind of estimator is highly influenced by the outliers. A common way to reduce the influence of large errors is to use the absolute value of the difference, instead of the squared distance, in the minimization function (7.4), that is,
\min_{α}\left(|x_1(n)-α| + |x_2(n)-α| + \cdots + |x_M(n)-α|\right).
The same holds for the case when we consider x(n), n = 1, 2, \ldots, N, with
\min_{α}\left(|x(1)-α| + |x(2)-α| + \cdots + |x(N)-α|\right).
Next, we will show that the result of this minimization is the median of the considered set,
\text{median}_{i=1,2,\ldots,M}\{x_i(n)\} = \arg\min_{α}\left(|x_1(n)-α| + |x_2(n)-α| + \cdots + |x_M(n)-α|\right) = \arg\min_{α} f(α),
and assume, without loss of generality, that the samples in x = [x_1(n), x_2(n), \ldots, x_M(n)]^T are already sorted, x_1(n) \le x_2(n) \le \cdots \le x_M(n), and that M is odd. The minimum of this function cannot be
obtained as in (7.4) since this function is not differentiable at the points α = x1 (n), α = x2 (n), . . . ,
α = x M (n). However, the function f (α) is differentiable for all other values of α and continuous for
any α. We will use this property to establish the intervals of α where it decreases and increases. The
derivative of the function |x_i(n)-α| is equal to
\frac{d|x_i(n)-α|}{dα} = \begin{cases} -1, & \text{for } α < x_i(n) \\ \;\;\,1, & \text{for } α > x_i(n). \end{cases}
Therefore, the derivative of f (α) within the interval on the left of the smallest signal value, α < x1 (n), is
equal to the sum of derivatives d| xi (n) − α|/dα = −1 of all terms, and it is equal to d f (α)/dα = − M.
If we move to the right along the α axis, to the interval x1 (n) < α < x2 (n), then the derivative of
|x_1(n)-α| is changed to 1, while all other M-1 terms have derivatives equal to -1. This means that df(α)/dα = -M+2 in this interval. If we continue and move next to the interval x_2(n) < α < x_3(n), and so on, we get
\frac{df(α)}{dα} = \begin{cases} -M, & \text{for } α < x_1(n) \\ -M+2, & \text{for } x_1(n) < α < x_2(n) \\ \;\;\vdots & \\ -1, & \text{for } x_{(M-1)/2}(n) < α < x_{(M+1)/2}(n) \\ \;\;\,1, & \text{for } x_{(M+1)/2}(n) < α < x_{(M+3)/2}(n) \\ \;\;\vdots & \\ \;\;M, & \text{for } α > x_M(n), \end{cases}
as illustrated in Fig. 7.7 for x = [ x1 (n), x2 (n), . . . , x7 (n)] T = [−0.9, −0.5, 0, 0.2, 0.7, 0.8, 1] T .
Obviously, the cost function f(α) is a decreasing function, df(α)/dα < 0, for α < x_{(M+1)/2}(n), and an increasing function, df(α)/dα > 0, for α > x_{(M+1)/2}(n). Since the function f(α) is continuous, this proves that
\text{median}_{i=1,2,\ldots,M}\{x_i(n)\} = \arg\min_{α} f(α).
[Figure 7.7 The cost function f(α) and its derivative for x = [-0.9, -0.5, 0, 0.2, 0.7, 0.8, 1]^T.]
When M is even, then d f (α)/dα = 0 will be obtained for the interval x M/2 (n) < α <
x M/2+1 (n). This means that the cost function decreases for α < x M/2 (n), it is a constant within the
interval x M/2 (n) < α < x M/2+1 (n), and then increases for α > x M/2+1 (n). In the case of an even
M, the mean value of x M/2 (n) and x M/2+1 (n) is used as the sample median.
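The behavior of f(α) can also be checked numerically; a minimal sketch assuming numpy, with the vector from Fig. 7.7:

import numpy as np

# Evaluate f(alpha) = sum_i |x_i - alpha| on a dense grid and locate its minimum.
x = np.array([-0.9, -0.5, 0.0, 0.2, 0.7, 0.8, 1.0])
alpha = np.linspace(-1.5, 1.5, 3001)
f = np.abs(x[None, :] - alpha[:, None]).sum(axis=1)
print(alpha[np.argmin(f)], np.median(x))   # both equal 0.2, the median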
In some cases the number of outliers is small. Then, the median will neglect many inlier signal values that could produce a good estimate of the mean value. In these cases, the best choice would be to use not only the mid-value of the sorted signal, but several samples of the signal around its median, and to calculate their (trimmed) mean, for an odd N, as
\text{LSmean}\{x(n)\,|\, n = 1, 2, \ldots, N\} = \frac{1}{2L+1}\sum_{i=-L}^{L} s\left(\frac{N+1}{2}+i\right).
With L = (N-1)/2, all signal values are used and LSmean\{x(n)\,|\, n = 1, 2, \ldots, N\} is the standard mean of the signal. With L = 0, the value of LSmean\{x(n)\,|\, n = 1, 2, \ldots, N\} is equal to the sample median. In general, this way of estimating the mean is known as the L-statistics (α-trimmed) based estimation.
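A minimal sketch of the LSmean estimator for an odd N, assuming numpy (the data values are illustrative):

import numpy as np

def ls_mean(x, L):
    s = np.sort(x)                     # s(1) <= s(2) <= ... <= s(N)
    mid = (len(s) - 1) // 2            # 0-based index of s((N+1)/2)
    return s[mid - L: mid + L + 1].mean()

x = np.array([3., 1., 2., 100., 4., 5., 0.])   # one large outlier
print(ls_mean(x, 0))   # L = 0: the median, 3.0
print(ls_mean(x, 2))   # five central sorted values: (1+2+3+4+5)/5 = 3.0
print(ls_mean(x, 3))   # all values: the ordinary mean, ~16.4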
The next important parameter in statistics is a measure of the deviation of the realizations of a random sample from the mean value. The most commonly used parameter for the description of this statistical property is the standard deviation (also called the spread) or its squared value, called the variance. For a random signal x(n) whose values are available in M realizations, the variance is calculated as the mean squared deviation of the signal values from the corresponding true mean values, µ_x(n),
σ̂_x^2(n) = \frac{1}{M}\left(|x_1(n)-µ_x(n)|^2 + \cdots + |x_M(n)-µ_x(n)|^2\right). (7.6)
The standard deviation is the square root of the variance. It can be estimated as the square root of the mean of the squares of the centered data,
σ̂_x(n) = \sqrt{\frac{1}{M}\left(|x_1(n)-µ_x(n)|^2 + \cdots + |x_M(n)-µ_x(n)|^2\right)}. (7.7)
If the mean value is estimated using the same set of data, µ̂_x(n) = \frac{1}{M}\sum_{i=1}^{M} x_i(n), the previous estimate (which assumes the true mean value µ_x(n)) tends to produce lower values of the standard deviation (a biased standard deviation). Thus, an adjusted version, the sample standard deviation, is used as an unbiased spread measure,
σ̂_x(n) = \sqrt{\frac{1}{M-1}\left(|x_1(n)-µ̂_x(n)|^2 + \cdots + |x_M(n)-µ̂_x(n)|^2\right)}. (7.8)
This form confirms the fact that in the case when only one sample is available, M = 1, we should not
be able to estimate the standard deviation (see Problem 7.2).
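The effect of the 1/(M-1) adjustment can be seen numerically; a sketch assuming numpy (the Gaussian data and sizes are illustrative):

import numpy as np

# Many realizations of M = 5 Gaussian samples with true standard deviation 2.
rng = np.random.default_rng(0)
x = rng.normal(loc=0.0, scale=2.0, size=(100000, 5))
s_biased = x.std(axis=1, ddof=0)     # divides by M, as in (7.7) with estimated mean
s_sample = x.std(axis=1, ddof=1)     # divides by M - 1, the sample std (7.8)
print(s_biased.mean(), s_sample.mean())   # the ddof=0 average falls further below 2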
Example 7.5. For the signal x (n) from Example 7.1 calculate the mean value and the variance.
Compare it with the mean value and the variance of the signal z(n) given in Table 7.3.
Table 7.3
Random signal z(n)
55 57 56 54 59 52 66 54 56 60
55 55 51 56 48 59 63 52 59 61
47 48 58 53 58 59 59 61 58 61
49 55 54 47 56 50 62 51 58 56
50 44 55 50 58 58 63 58 52 57
50 55 55 55 53 60 46 57 59 59
58 55 58 58 54 53 54 48 54 56
57 62 53 58 59 60 50 56 56 50
51 60 54 57 55 52 52 57 50 59
58 51 54 49 44 60 52 57 56 55
⋆The mean value and the variance of signal x (n) are µ̂ x = 55.76 and σ̂x2 = 314.3863. The
standard deviation is σ̂x = 17.7309. It is a measure of the signal value deviations from the mean
value. For the signal z(n), the mean value is µ̂z = 55.14 (very close to µ̂ x ), while the variance
is σ̂z2 = 18.7277 and the standard deviation is σ̂z = 4.3275. Deviations of z(n) from its mean
value are much smaller. If the signals x (n) and z(n) were the measurements of the same physical
value, then the individual measurements from z(n) would be more reliable than the individual
measurements from x (n).
If we denote the sample standard deviation of the data set \{x_i(n)\}, where i = 1, 2, \ldots, M, as S(x_1(n), x_2(n), \ldots, x_M(n)) = σ̂_x(n), then it satisfies the scale property
S(ax_1(n)+b, ax_2(n)+b, \ldots, ax_M(n)+b) = |a|\,σ̂_x(n).
The proof is simple, using (7.8) and the property that the mean value of y_i(n) = ax_i(n)+b is µ̂_y(n) = aµ̂_x(n)+b.
The sample standard deviation is sensitive to outliers. This can be concluded from its definition (7.8). For the spread estimation, when outliers can be expected, the median absolute deviation (MAD) can be used as its robust measure. The MAD is defined as
\text{MAD}_x(n) = \text{median}_{j=1,2,\ldots,M}\left\{|x_j(n) - \text{median}_{i=1,2,\ldots,M}\{x_i(n)\}|\right\},
by analogy with the variance definition in (7.6), σ_x^2(n) = \text{mean}_{j}\{|x_j(n) - \text{mean}_{i}\{x_i(n)\}|^2\}.
The MAD value is related to the sample standard deviation as
σ̂_x(n) \approx 1.4826\,\text{MAD}_x(n)
for the Gaussian random variable (see Problem 7.11). The breakdown point for the MAD is the same as for the sample median.
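As a sketch of this robust spread estimate, assuming numpy (the contamination level is illustrative):

import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(0.0, 2.0, 1000)               # true standard deviation 2
x[:50] = 100.0                               # 5% gross outliers
mad = np.median(np.abs(x - np.median(x)))
print(x.std(ddof=1), 1.4826 * mad)           # the std is ruined; 1.4826*MAD stays near 2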
The regression analysis deals with random variable modeling and is widely used in various areas, including machine learning and data prediction. The most common model is the linear regression, where it is assumed that the outcome random variable fits a linear model of the independent (also random) variable. Within the signal processing framework, we will consider a continuous-time random signal x(t) sampled at random instants t_n. In linear regression, the signal model is a linear function,
x(t_n) = at_n + b + ε(t_n), \quad n = 1, 2, \ldots, N,
where ε(t_n) is a random variable that describes the deviations of the individual realizations, x(t_n), from the assumed linear model, at_n + b, with constant parameters a and b. The values of ε(t_n) are unknown.
The aim is to estimate the linear model parameters a and b from the available data and to use
them for prediction or classification of new data values. Since the values of x (tn ) and tn are available,
the error function is formed as
e(n) = x (tn ) − atn − b.
The cost function that will be used in the minimization process is most commonly defined as the sum of the squared values of the error function,
J(a,b) = \sum_{n=1}^{N} e^2(n) = \sum_{n=1}^{N}\left(x(t_n) - at_n - b\right)^2. (7.9)
This cost function is optimal if the measurement disturbances e(n) = ε(tn ) are Gaussian distributed.
The minimization of this function (least squares - LS minimization) is done using
\frac{\partial J(a,b)}{\partial a} = -2\sum_{n=1}^{N} t_n\left(x(t_n) - at_n - b\right) = 0
and
\frac{\partial J(a,b)}{\partial b} = -2\sum_{n=1}^{N}\left(x(t_n) - at_n - b\right) = 0.
The system of equations
â\sum_{n=1}^{N} t_n^2 + b̂\sum_{n=1}^{N} t_n = \sum_{n=1}^{N} t_n x(t_n) (7.10)
â\sum_{n=1}^{N} t_n + b̂N = \sum_{n=1}^{N} x(t_n) (7.11)
that can be written in the form A[ â b̂] T = B, produces the estimates â and b̂ of the linear regression
model parameters a and b, [ â b̂] T = A−1 B. After the system of equations is solved, the values of
parameters are estimated as
â = \frac{µ̂_{xt} - µ̂_x µ̂_t}{µ̂_{t^2} - µ̂_t^2} \quad \text{and} \quad b̂ = µ̂_x - âµ̂_t,
where µ̂_{xt} = \text{mean}\{x(t_n)t_n\}, µ̂_x = \text{mean}\{x(t_n)\}, µ̂_t = \text{mean}\{t_n\}, and µ̂_{t^2} = \text{mean}\{t_n^2\}.
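The system (7.10)-(7.11) translates directly into a short sketch (numpy assumed; the test data are illustrative, not the data of the following example):

import numpy as np

def fit_line(t, x):
    # Build and solve the 2x2 normal equations (7.10)-(7.11).
    A = np.array([[np.sum(t**2), np.sum(t)],
                  [np.sum(t),    len(t)  ]])
    B = np.array([np.sum(t*x), np.sum(x)])
    a, b = np.linalg.solve(A, B)
    return a, b

t = np.linspace(0, 1, 8)
x = 2.0*t + 3.0 + 0.1*np.random.default_rng(2).standard_normal(8)
print(fit_line(t, x))      # close to the true a = 2, b = 3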
Example 7.6. The random signal x (t), whose behavior is expected to be linear, is sampled at the
instants tn
[ x (t1 ), x (t2 ), . . . , x (t N )] T = [4.81, 3.13, 4.25, 4.04, 4.55, 4.76, 4.16, 3.03] T .
Find the linear regression model using the least squares approach. What is the prediction of the
signal value x (t) at t = 1.1?
⋆ The matrix form of the system of equations (7.10) and (7.11), used for the estimation of the model parameters a and b, is
A = \begin{bmatrix} 3.15 & 4.41 \\ 4.41 & 8.00 \end{bmatrix}, \quad B = \begin{bmatrix} 19.50 \\ 32.73 \end{bmatrix}, \text{ with}
\begin{bmatrix} â \\ b̂ \end{bmatrix} = \begin{bmatrix} 3.15 & 4.41 \\ 4.41 & 8.00 \end{bmatrix}^{-1}\begin{bmatrix} 19.50 \\ 32.73 \end{bmatrix} = \begin{bmatrix} 2.03 \\ 2.97 \end{bmatrix}.
The estimated linear regression model is
x(t_n) = 2.03t_n + 2.97.
For t_n = 1.1, we can predict x(1.1) = 5.2. The data and the results are shown in Fig. 7.9.
Figure 7.9 The data x (tn ) measured at tn for n = 1, 2, 3, 4, 5, 6, 7, 8 (dots) and the linear model, x (tn ) =
2.03tn + 2.97, obtained by the least squares approach (dotted line). The predicted signal value at tn = 1.1 is
marked by the circle.
The solution to the minimization problem is obtained from ∂J (a)/∂a T = 0 or −2T T (x − Tâ) = 0 as
â = (T T T)−1 T T x.
The regression analysis can be generalized to the cases with more than one independent variable.
These regression forms will also be considered in Section 7.1.6, after the RANSAC method is presented
in the next section.
When the mean value of the data and the spread measure (commonly the standard deviation) are known, we can define a criterion to identify the outliers in the data. The function used for this purpose is
z(n) = \frac{x(n) - µ̂_x(n)}{σ̂_x(n)},
and it is called the z-score. It is common to assume the threshold value T = 2.5 and to declare the signal samples with |z(n)| \le T as inliers and |z(n)| > T as outliers. The meaning of the threshold T = 2.5 will be explained later. Since the values of the average µ̂_x(n) and the sample standard deviation σ̂_x(n) can be significantly compromised by possible outliers, it is recommended to use the median and the corresponding MAD in the z-score.
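A minimal sketch of such a median/MAD based z-score, assuming numpy (the data values are illustrative):

import numpy as np

def robust_z(x):
    med = np.median(x)
    mad = np.median(np.abs(x - med))
    return (x - med) / (1.4826 * mad)    # MAD rescaled to a std-like spread

x = np.array([5.1, 4.9, 5.3, 5.0, 4.8, 12.0, 5.2])
print(np.abs(robust_z(x)) > 2.5)         # flags only the value 12.0 as an outlier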
The random sample consensus (RANSAC) is used for linear regression when outliers in the data are expected. Consider a set of data x(t_n) sampled at random instants t_n and assume that the true data values fit a linear model. Since a large number of outliers is expected, the linear model can be far from most of the data samples. In the RANSAC approach we will:
1. Randomly select a small subset of the data samples, with indices in a set S.
2. The samples with indices in S are used to estimate the linear regression model parameters,
â\sum_{n \in S} t_n^2 + b̂\sum_{n \in S} t_n = \sum_{n \in S} t_n x(t_n)
â\sum_{n \in S} t_n + b̂S = \sum_{n \in S} x(t_n),
where S also denotes the number of the selected samples.
3. With the solution
[â\ b̂]^T = A_S^{-1}B_S,
the line
x = ât + b̂
is defined. The distances d_n of all data points (t_n, x(t_n)), n = 1, 2, \ldots, N, from this line are calculated,
d_n = \frac{|ât_n + b̂ - x(t_n)|}{\sqrt{1+â^2}}.
4. If a sufficient number of data points is such that their distance from the model line is lower than an assumed distance threshold d, then all these points are included into a new set of data, D, and the final estimation of the parameters a and b (for machine learning or prediction) is obtained with all data from D.
5. If there was no sufficient number of data points within the distance d, a new random small set of
data, i ∈ S, is taken and the procedure is repeated from Step 2.
6. The procedure is stopped when the desired number of data points within D is achieved or the
maximum number of trials is reached.
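The steps above can be summarized in a minimal sketch (numpy assumed; the function name, thresholds, and subset size are illustrative, and np.polyfit is used for the subset least squares):

import numpy as np

def ransac_line(t, x, n_subset=4, d=0.25, T=10, max_trials=100, seed=0):
    rng = np.random.default_rng(seed)
    for _ in range(max_trials):
        S = rng.choice(len(t), size=n_subset, replace=False)   # step 1: random subset
        a, b = np.polyfit(t[S], x[S], 1)                       # step 2: LS on the subset
        dist = np.abs(a*t + b - x) / np.sqrt(1 + a**2)         # step 3: distances
        D = dist < d                                           # step 4: consensus set
        if D.sum() >= T:
            return np.polyfit(t[D], x[D], 1)                   # final refit on all inliers
    return None                                                # step 6: trials exhausted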
Find the linear regression model parameters a and b for these data. Comment on the results. Next, apply the RANSAC approach as follows: Use S = 4 randomly chosen samples with indices n ∈ S = {8, 10, 18, 19}. Find the linear regression model parameters with this subset of data. How many data points are within the distance d = 0.25 from the obtained linear model line? If the number of the data points within these lines is below the assumed T = 10, choose another random set S. In that case use S = {5, 11, 16, 19} and repeat the procedure. If the number of data points between these lines is not below T = 10, use all the data points within these lines to find the linear regression model parameters.
⋆ For the given data set, the estimates of the linear regression model parameters a and b, obtained using all data points, are
\begin{bmatrix} â \\ b̂ \end{bmatrix} = \begin{bmatrix} 7.8107 & 11.1070 \\ 11.1070 & 20.0000 \end{bmatrix}^{-1}\begin{bmatrix} 44.3483 \\ 81.8400 \end{bmatrix} = \begin{bmatrix} -0.6707 \\ 4.4645 \end{bmatrix}.
The RANSAC approach is applied next. The parameters estimated with the random subset S = {8, 10, 18, 19}, with the linear model x(t_n) = -1.5245t_n + 4.6840, do not fit the data. This is confirmed by the fact that only D = 8 data points are within the lines at the distance d = 0.25, as can be seen in Fig. 7.10(b). Since the number of the data points within D is below T = 10, a new random set, S = {5, 11, 16, 19}, is used. With the data corresponding to this subset, the estimation is obtained
Figure 7.10 (a) The data x (tn ) measured at tn for n = 1, 2, . . . , 10 (dots) and the linear model obtained by the
least squares (dotted line). The RANSAC illustration: (b) The data (dots) and the linear model obtained by the least
squares using a random subset of 4 marked samples at S = {8, 10, 18, 19} (dotted line). (c) The data (dots) and the
linear model obtained by the least squares using another random subset of 4 marked samples at S = {5, 11, 16, 19}
(dotted line). (d) The data (dots) and the linear model obtained by the least squares using all data at marked samples
in D (dotted line).
from
\begin{bmatrix} â \\ b̂ \end{bmatrix} = \begin{bmatrix} 1.9992 & 2.6390 \\ 2.6390 & 4.0000 \end{bmatrix}^{-1}\begin{bmatrix} 11.2537 \\ 16.3400 \end{bmatrix} = \begin{bmatrix} 1.8342 \\ 2.8749 \end{bmatrix},
with the linear model x(t_n) = 1.8342t_n + 2.8749 that fits the data, since D = 16 data points are within the assumed distance. Since this number is above the threshold, T = 10, the algorithm is stopped and the linear regression model is re-estimated using all D = 16 data points from the set D, producing
\begin{bmatrix} â \\ b̂ \end{bmatrix} = \begin{bmatrix} 5.9802 & 8.9180 \\ 8.9180 & 16.0000 \end{bmatrix}^{-1}\begin{bmatrix} 37.7188 \\ 64.1400 \end{bmatrix} = \begin{bmatrix} 1.9502 \\ 2.9218 \end{bmatrix}
and the final estimated linear regression model x(t_n) = 1.9502t_n + 2.9218.
The probability that a subset of S = M data points is free of the I outliers in a data sequence with N samples is calculated in Example 7.10. This probability can be used to estimate the expected number of iterations in the RANSAC approach.
The regression can be generalized to cases with more than one independent variable (the multivariable case), when the considered sample, x(n), is a function of the random variables t_1(n), t_2(n), \ldots, t_M(n), that is, x(n) = x(t_1(n), t_2(n), \ldots, t_M(n)). The regression can be written in the form of a multidimensional linear model,
x(n) = a_1t_1(n) + a_2t_2(n) + \cdots + a_Mt_M(n) + ε(n),
where t_i(n), i = 1, 2, \ldots, M, are the independent variables for the sample x(n). This system of equations can be written in the matrix/vector form as
x = Ta + Ξ, where
x = \begin{bmatrix} x(1) \\ x(2) \\ \vdots \\ x(N) \end{bmatrix}, \quad T = \begin{bmatrix} t_1(1) & t_2(1) & \ldots & t_M(1) \\ t_1(2) & t_2(2) & \ldots & t_M(2) \\ \vdots & \vdots & & \vdots \\ t_1(N) & t_2(N) & \ldots & t_M(N) \end{bmatrix}, \quad a = \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_M \end{bmatrix}, \quad Ξ = \begin{bmatrix} ε(1) \\ ε(2) \\ \vdots \\ ε(N) \end{bmatrix}.
The common goal is to minimize the squared error between the data, x, and the model, Ta,
J(a) = \|x - Ta\|_2^2.
The estimates of the regression parameters a = [a_1, a_2, \ldots, a_M]^T that minimize this cost function (the least squares solution) follow from \partial J(a)/\partial a^T = -2T^T(x - Ta) = 0, as
â = (T^T T)^{-1} T^T x = \text{pinv}\{T\}x, (7.12)
where \text{pinv}\{T\} = (T^T T)^{-1} T^T is the so-called pseudo-inverse of the matrix T.
The sensitivity of the reconstructed coefficients to the random variations in data x (n), caused
by the noise ε(n), highly depends on the condition number of the matrix T T T, whose inverse is to
be calculated. For a high condition number of this matrix, corresponding to a relatively small value
of the determinant det{T T T}, a small noise ε(n) in the input data causes very high variations of the
resulting parameters â = [ â1 , â2 , . . . , â M ] T in the model (ill-posed problem). In order to regularize the
inversion (and to limit possible extremely large elements in the inverse matrix) a small value λ is added
before the inversion, and the vector of parameters is calculated using
â = (T^T T + λI)^{-1} T^T x. (7.13)
It can easily be shown that this form of a is the solution to the minimization of the cost function
J (a) = ||x − Ta||22 , when the energy of the coefficients, a = [ a1 , a2 , . . . , a M ] T , constraint is added. The
energy constraint in the minimization keeps the energy of a = [ a1 , a2 , . . . , a M ] T as low as possible (in
order to avoid high values of its elements, due to the possible instability of the inversion of T T T). This
is the reason why the regression estimation is called shrinkage estimation as well. The constrained cost
function is of the form
J(a) = \|x - Ta\|_2^2 + λ\|a\|_2^2,
where \|a\|_2^2 is the squared L_2-norm of a. The minimum value of this cost function is obtained from
\frac{\partial J(a)}{\partial a^T} = -2T^T(x - Ta) + 2λa = 0.
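Both (7.12) and (7.13) translate directly into a few lines; a sketch assuming numpy (the matrix sizes, the true parameters, and the value of λ are illustrative):

import numpy as np

rng = np.random.default_rng(3)
T = rng.standard_normal((20, 3))                     # N = 20 observations, M = 3 variables
x = T @ np.array([1.0, -2.0, 0.5]) + 0.1*rng.standard_normal(20)

a_ls    = np.linalg.pinv(T) @ x                                # (7.12), pseudo-inverse
lam     = 0.1
a_ridge = np.linalg.solve(T.T @ T + lam*np.eye(3), T.T @ x)    # (7.13), regularized
print(a_ls, a_ridge)    # the regularized solution is slightly shrunk toward zero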
⋆ The estimation of the linear model parameters, obtained from the ridge regression (7.13), are
The LASSO minimization, with the same penalty factor λ = 0.01, would produce the estimation
enforcing as many zero-valued elements in a as possible. Due to this property (crucial in sparse
signal processing and compressive sensing), the LASSO minimization would be able to produce
the solution even in the case when the number of observations is smaller than the number of
elements in a. If we keep just the first N = 6 < M = 7 rows in T and x we would get
The regression model can also be polynomial in a single independent variable, t_n. This regression is still linear, since the linearity holds with respect to the model parameters a_1, a_2, \ldots, a_M. The independent variables matrix is given by
T = \begin{bmatrix} 1 & t_1 & t_1^2 & \ldots & t_1^M \\ 1 & t_2 & t_2^2 & \ldots & t_2^M \\ \vdots & & & & \vdots \\ 1 & t_N & t_N^2 & \ldots & t_N^M \end{bmatrix}, (7.14)
with all other vectors, results, and comments as in the previous multivariable case. Within the polynomial fitting framework, the regularization constraints on the solution prevent over-fitting of the model and keep the parameters low (see Problem 7.3).
Probability theory is a scientific discipline dealing with the analysis of random phenomena through a
set of axioms. The outcomes of a random event are determined by chance. Probability is a measure of
the likelihood of an event to occur.
For the calculation of the parameters of the first-order statistics, it is sufficient to know the
probability or the probability density function of a random variable, as its basic probabilistic description.
7.2.1 Probability
Assume that a random signal, x(n), may take one of the discrete values (amplitudes), ξ_i, from the set A = \{ξ_1, ξ_2, \ldots, ξ_N\}. Then, we deal with probabilities that the random signal, x(n), at an instant n takes a specific value ξ_i from the set of all possible values,
P_{x(n)}(ξ_i) = \text{Probability}\{x(n) = ξ_i\}.
The probability function Px(n) (ξ ) satisfies the following conditions (axioms of probability theory):
(1) 0 ≤ Px(n) (ξ ) ≤ 1 for any ξ.
(2) For the events x(n) = ξ_i and x(n) = ξ_j, i ≠ j, which exclude each other,
\text{Probability}\{x(n) = ξ_i \text{ or } x(n) = ξ_j\} = P_{x(n)}(ξ_i) + P_{x(n)}(ξ_j).
(3) The sum of the probabilities that x(n) takes any value ξ_i from the set A of all possible values is the probability of a certain event and is equal to 1, that is,
\sum_{ξ \in A} P_{x(n)}(ξ) = 1.
Independence. Two events (random signal samples) are independent of each other if the probability that one event occurs (one signal sample takes a specific value) does not affect the probability of the other event occurring (does not affect the value of the other signal sample). If the signal samples x(n) and x(m) are statistically independent random variables, then
\text{Probability}\{x(n) = ξ_i \text{ and } x(m) = ξ_j\} = P_{x(n)}(ξ_i)P_{x(m)}(ξ_j).
Exclusiveness (disjointness). Two random events (random signal sample values) are mutually exclusive or disjoint if they cannot both occur at the same time. If the signal sample values x(n) = ξ_i and x(n) = ξ_j are mutually exclusive events, then (Property (2))
\text{Probability}\{x(n) = ξ_i \text{ or } x(n) = ξ_j\} = P_{x(n)}(ξ_i) + P_{x(n)}(ξ_j).
Example 7.9. Consider a random signal whose values are equal to the numbers appearing in a die tossing. The set of possible signal values is ξ_i ∈ A = \{1, 2, 3, 4, 5, 6\}. Find the probability that the signal sample takes the value x(n) = 2 or the value x(n) = 5. Find the probability that the signal sample at an instant n takes the value x(n) = 2 and that in the next tossing the signal takes the value x(n+1) = 5.
⋆ The events x(n) = 2 and x(n) = 5 are obviously mutually exclusive. Thus, the probability of two mutually exclusive events is equal to the sum of their individual probabilities,
\text{Probability}\{x(n) = 2 \text{ or } x(n) = 5\} = P_{x(n)}(2) + P_{x(n)}(5) = \frac{1}{6} + \frac{1}{6} = \frac{1}{3}.
The events x(n) = 2 and x(n+1) = 5 are statistically independent. In this case,
\text{Probability}\{x(n) = 2 \text{ and } x(n+1) = 5\} = P_{x(n)}(2)P_{x(n+1)}(5) = \frac{1}{6}\cdot\frac{1}{6} = \frac{1}{36}.
Conditional probability. Conditional probability is the probability that an event A occurs, given that another event B has already occurred. The conditional probability of A, given B, is written in the form P(A|B). The probability that both events A and B occur is
P(A \text{ and } B) = P(A|B)P(B),
where P(B) is the probability that the event B has occurred, while P(A|B) denotes the probability that the event A occurs subject to the condition that the event B has already occurred.
Example 7.10. Assume that the length of random signal x (n) is N and that the number of samples
disturbed by an extremely high noise is I. The observation set of signal samples is taken as a
subset of M < N randomly positioned signal samples. What is the probability that within M
randomly selected signal samples there are no samples affected by the high noise? If N = 128,
I = 16, and M = 32 find how many sets of M samples without any sample corrupted by the high
noise can be expected in 1000 realizations (trials).
⋆ The probability that the first randomly chosen sample is not affected by the high noise can be calculated as an a priori probability,
P(1) = P(B) = \frac{N-I}{N},
since there are N samples in total and N-I of them are noise-free. After the first noise-free sample is chosen, in the remaining N-1 signal samples there are N-1-I noise-free samples. The probability of choosing a noise-free sample is now P(A|B) = (N-1-I)/(N-1). The probability that the second randomly chosen sample is not affected by the high noise, given that the first randomly chosen sample is not affected, is equal to the product of the probabilities,
P(2) = P(A) = P(A|B)P(B) = \frac{N-1-I}{N-1}\cdot\frac{N-I}{N}.
Here we used the conditional probability property.
Then, we continue the process of random sample selection. In the same way we can calculate the probability that all of the M randomly chosen samples are not affected by the high noise as
P(M) = \frac{N-I}{N}\cdot\frac{N-1-I}{N-1}\cdots\frac{N-(M-1)-I}{N-(M-1)} = \prod_{i=0}^{M-1}\frac{N-I-i}{N-i}.
For N = 128 signal samples, with I = 16 samples affected by the high noise, the probability that M = 32 randomly selected samples are noise-free is equal to P(32) = 0.0071. If we repeat the whole procedure 1000 times, by selecting M = 32 samples, we can expect
1000 \cdot P(32) = 1000 \cdot 0.0071 \approx 7,
that is, about 7 realizations where none of the M signal samples is disturbed by the high noise. One high noise-free realization is expected in about 140 realizations.
In the literature, it is common to use the following calculation for the expected number of iterations needed to get a high-noise-free realization. The probability that one randomly selected sample is high-noise-free (an inlier) is (N-I)/N. It is then assumed that this probability can be used for all M samples (that is, that a sample is returned and may be chosen again). The probability that there is at least one high-noise sample among the M samples is then 1 - ((N-I)/N)^M. Finally, the probability of a high-noise-free realization in N_{it} such trials is
P = 1 - \left[1 - \left(\frac{N-I}{N}\right)^M\right]^{N_{it}}, \quad \text{which gives} \quad N_{it} = \frac{\ln(1-P)}{\ln\left(1 - \left(\frac{N-I}{N}\right)^M\right)}.
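Both calculations of this example can be reproduced with a few lines (numpy assumed; the target probability P = 0.99 in the last step is an illustrative choice):

import numpy as np

N, I, M = 128, 16, 32
i = np.arange(M)
P = np.prod((N - I - i) / (N - i))          # exact P(M), ~0.0071
print(P, 1000*P)                            # about 7 noise-free draws per 1000 trials

P_target = 0.99
Nit = np.log(1 - P_target) / np.log(1 - ((N - I)/N)**M)   # approximate iteration count
print(Nit)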
Bayes' theorem. Consider two events A_i and B. From the probability that both of these events happen, P(A_i|B)P(B) = P(B|A_i)P(A_i), it follows that
P(A_i|B) = \frac{P(B|A_i)P(A_i)}{P(B)}.
Assume also that the events A_i, i = 1, 2, \ldots, N, are mutually exclusive and exhaustive. It means that two events A_i and A_j, i ≠ j, cannot happen at the same time (exclusiveness) and that one of the events A_i, i = 1, 2, \ldots, N, must happen (exhaustiveness). In that case, we may write
P(A_i|B) = \frac{P(B|A_i)P(A_i)}{P(B|A_1)P(A_1) + P(B|A_2)P(A_2) + \cdots + P(B|A_N)P(A_N)}.
Since the Bayesian approach is of great importance in modern data processing we will comment
on the terms in more detail.
• The event B is assumed (the evidence has happened). The probability P( B) shows how probable
is the assumed event (evidence) under all possible events (hypotheses) Ai . It can be written in
the form P( B) = P( B| A1 ) P( A1 ) + P( B| A2 ) P( A2 ) + · · · + P( B| A N ) P( A N ).
• The specific event Ai is called the hypothesis, assuming that the event B has occurred. The
probability P( Ai ) shows how probable is the event Ai (hypothesis), independent of the event B
occurrence.
• Probability P( B| Ai ) indicates how probable is the event B (evidence), given that the event Ai
(hypothesis) is true.
• The result P( Ai | B) is the probability of the event Ai (how the hypothesis Ai is probable) given
the fact that the evidence B occurred.
Example 7.11. Consider four images, denoted by A1 , A2 , A3 , and A4 . In two images (A1 and A2 )
there are 20% of red pixels, in the third image (A3 ) there are 30% of red pixels, while in the
fourth image (A4 ) there are 50% of red pixels. One image is chosen randomly and one of its
pixels is observed. The chosen pixel is red (evidence B). What is the probability that the image
A4 was chosen?
⋆The probability that the image A4 was chosen (hypothesis A4 ) when the red pixel is obtained
as the evidence (denoted by B) is equal to
P ( B | A4 ) P ( A4 )
P ( A4 | B ) = ,
P( B| A1 ) P( A1 )+P( B| A2 ) P( A2 )+P( B| A3 ) P( A3 )+P( B| A4 ) P( A4 )
where:
• Each image is chosen with the same probability, P(A_i) = 1/4, and the probabilities of a red pixel within the images are P(B|A_1) = P(B|A_2) = 0.2, P(B|A_3) = 0.3, and P(B|A_4) = 0.5.
• The probability of the red pixel P(B) being obtained, under all possible events (hypotheses) A_i, i = 1, 2, 3, 4, is P(B) = (0.2 + 0.2 + 0.3 + 0.5)/4 = 0.3, so that P(A_4|B) = (0.5 \cdot 0.25)/0.3 = 5/12 \approx 0.42.
Example 7.12. A system used for virus testing has reported its expected reliability. The probability of a correct result, positive or negative, for a tested person is very high and equal to 0.978. The probability of a false-positive result (the tested person does not have the specified virus, but the test is positive) is P_{F+} = 0.018. The probability of a false-negative result (the tested person does have the specified virus, but the test is negative) is P_{F-} = 0.004. What is the expected number of positive results in 1000 randomly tested persons from a country if the expected rate of people contaminated by the virus is: (a) 1 per 1000 people (p = P(V) = 10^{-3}); (b) 1 per 10,000 people (p = P(V) = 10^{-4}); (c) 1.75 per 100 people (p = P(V) = 0.0175); and (d) 25 per 100 of the selected population set for testing (formed by a prior evaluation of other symptoms)?
A randomly selected person is tested for the virus and the result is positive. Find the probability that this person is contaminated by the virus in all four previous cases.
⋆ The probability of a positive result is equal to the sum of the probability that the tested person does not have the virus and that the result is positive, (1 - P(V))P_{F+}, and the probability that the person does have the virus and the result is positive, P(V)(1 - P_{F-}). This means that the test of a randomly selected person is positive with the probability
P(+) = (1 - P(V))P_{F+} + P(V)(1 - P_{F-}).
For the given expected number of the virus contaminated people, p = P(V ), we get: (a)
P(+) = 0.019, meaning that there will be 19 positive results in 1000 randomly tested persons,
although the expected rate is 1 per 1000, in this case. Therefore, most of the test results are
false-positive. (b) P(+) = 0.0181, confirming that the false-positive dominates the test again.
(c) In this case, P(+) = 0.035, meaning that half of the positive tested are indeed the people
contaminated with the virus, since this probability, with an ideal test, would indicate that there are
3.5 contaminated people per 100. (d) For the selected symptomatic set of people, with a relatively
high probability of the virus occurrence, we get P(+) = 0.2625, meaning that the agreement
with the expected number of the contaminated people in this set is high.
The conclusion is that a random screening of the population would not produce a satisfactory
result if the occurrence rate of the virus (disease) is low and the test is not an ideal one with the
zero false-positive and false-negative rates. Testing should be done on a preselected set of people.
This conclusion will be even more obvious from the next Bayes’ analysis.
When a randomly selected person is tested and the result is positive, then the a posteriori probability of the event that the person is virus contaminated, given the positive test, P(V|+), is
P(V|+) = \frac{P(+|V)P(V)}{P(+)},
where P(+|V ) is the probability of the positive test, given that the person is virus contaminated,
p = P(V ) is the probability that a random person has the virus, and P(+) is the probability that
the test is positive, including both cases that the person does and does not have the virus.
For the four considered cases, the values of the probability P(V|+) are: (a) P(V|+) = 0.0525, (b) P(V|+) = 0.0055, and (c) P(V|+) = 0.4964. (d) For the selected set of people, with a significant probability of having the virus, we get P(V|+) = 0.9486, meaning a high reliability of the test results (if the test is positive, the person is contaminated).
These results confirm the conclusion that the random test should not be done, unless the
probability of the virus (disease) in the tested people is increased using other symptoms, meaning
that the set of the tested people will contain the virus (disease) with a significant probability.
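All four cases can be reproduced with a few lines of plain Python:

# P(+) and P(V|+) for the four contamination rates of this example.
PFp, PFn = 0.018, 0.004
for p in (1e-3, 1e-4, 0.0175, 0.25):
    P_pos = (1 - p)*PFp + p*(1 - PFn)        # probability of a positive test
    P_virus_given_pos = p*(1 - PFn)/P_pos    # Bayes' rule
    print(p, round(P_pos, 4), round(P_virus_given_pos, 4))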
In the previous example, we assumed that the probability p = P(V) is known. Commonly, it is not known and should be estimated based on the posterior evidence that k out of N tests were positive, with the given testing system. This problem will be considered in Section 7.4.3.
The mean (average) value is calculated over a set of available samples, resulting from an experiment
and it is also of random nature. If the probabilistic description of a random signal (variable) is known,
then we can predict the mean (average) value of this signal without using its specific random realization
or performing experiments with random trials. This analytically obtained value is called the expected
value and it represents the true value of the mean that would be obtained with a large number of
experiments. The expected value is deterministic.
Expected value. The expected value of the signal sample x(n) is calculated as a sum over the set of possible amplitudes, ξ ∈ A = \{ξ_1, ξ_2, \ldots, ξ_N\}, weighted by the corresponding probabilities,
µ_{x(n)} = E\{x(n)\} = \sum_{i=1}^{N} ξ_i P_{x(n)}(ξ_i).
Variance. The variance of a random signal sample x(n), which takes the values ξ from the discrete set A = \{ξ_1, ξ_2, \ldots, ξ_N\} with the known probabilities P_{x(n)}(ξ_i), is defined as
σ_{x(n)}^2 = E\{|x(n) - µ_{x(n)}|^2\} = \sum_{i=1}^{N} (ξ_i - µ_{x(n)})^2 P_{x(n)}(ξ_i).
Example 7.13. A random signal x(n) can take values from the set ξ ∈ A = \{0, 1, 2, 3, 4, 5\}. It is known that for k = 0, 1, 2, 3, 4 the probability of x(n) = k is twice as high as the probability of x(n) = k+1. Find the probabilities P_{x(n)}(ξ_k) = P\{x(n) = k\}. Find the expected value and the variance of this random signal.
⋆ Assume that P\{x(n) = 5\} = A. Then the probabilities that x(n) takes a value k are
ξ_k = k:                          0     1     2    3    4    5
P_{x(n)}(ξ_k) = P\{x(n) = k\}:  32A   16A   8A   4A   2A    A
Since the probabilities must sum to one, 63A = 1, that is, A = 1/63.
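The remaining numbers of the example follow directly; a sketch assuming numpy:

import numpy as np

# P{x(n) = k} = 2^(5-k) * A with A = 1/63, from the normalization of probabilities.
k = np.arange(6)
P = 2.0**(5 - k)
P /= P.sum()                     # A = 1/63
mu  = np.sum(k*P)                # expected value: 57/63 = 19/21 ~ 0.905
var = np.sum((k - mu)**2 * P)    # variance: 626/441 ~ 1.42
print(P[5], mu, var)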
If a random signal can take continuous amplitude values, then we cannot define the probability that one exact signal amplitude value ξ is taken by the signal sample x(n). In this case, the probability density function p_{x(n)}(ξ) should be used. This function defines the probability that the nth signal sample x(n) takes a value within an infinitesimally small interval dξ around ξ,
\text{Probability}\{ξ < x(n) \le ξ + dξ\} = p_{x(n)}(ξ)dξ,
with
\int_{-\infty}^{\infty} p_{x(n)}(ξ)dξ = 1.
The probability of an event that the signal x(n) value is within a < x(n) \le b is
\text{Probability}\{a < x(n) \le b\} = \int_{a}^{b} p_{x(n)}(ξ)dξ.
Cumulative probability distribution. The distribution F_x(χ) is defined as the probability that the signal x(n) value is lower than or equal to χ,
F_x(χ) = \text{Probability}\{x(n) \le χ\} = \int_{-\infty}^{χ} p_{x(n)}(ξ)dξ.
Obviously, \lim_{χ\to-\infty} F_x(χ) = 0, \lim_{χ\to+\infty} F_x(χ) = 1,
\text{Probability}\{a < x(n) \le b\} = \int_{a}^{b} p_{x(n)}(ξ)dξ = F_x(b) - F_x(a),
and p_{x(n)}(χ) = dF_x(χ)/dχ.
For the case of random signals whose samples take continuous amplitude values, the variance is defined by
σ_x^2(n) = \int_{-\infty}^{\infty}\left(ξ - µ_{x(n)}\right)^2 p_{x(n)}(ξ)dξ,
where p x(n) (ξ ) is the probability density function.
Example 7.14. Consider a real-valued random signal x (n) with samples whose values are uniformly
distributed over the interval −1 ≤ x (n) < 1.
(a) Find the expected value and the variance of the signal x (n) samples.
(b) The signal y(n) is obtained as y(n) = x2 (n). Find the expected value and the variance
of signal y(n).
⋆ Since the random signal x(n) is uniformly distributed within the interval -1 \le x(n) < 1, its probability density function is of the form
p_{x(n)}(ξ) = \begin{cases} A, & \text{for } -1 \le ξ < 1 \\ 0, & \text{elsewhere.} \end{cases}
The value of the constant A, A = 1/2, is obtained from \int_{-\infty}^{\infty} p_{x(n)}(ξ)dξ = 1. The expected value and the variance are given by
µ_x(n) = \int_{-\infty}^{\infty} ξ\, p_{x(n)}(ξ)dξ = \frac{1}{2}\int_{-1}^{1} ξ\, dξ = 0, \quad σ_x^2(n) = \int_{-\infty}^{\infty}(ξ-µ_x(n))^2 p_{x(n)}(ξ)dξ = \frac{1}{2}\int_{-1}^{1} ξ^2 dξ = \frac{1}{3}.
The probability that the amplitude of the signal y(n) is not larger than an assumed χ is, by definition, the probability distribution of y(n). Its form is
F_y(χ) = P\{y(n) \le χ\} = P\{x^2(n) \le χ\} = P\{-\sqrt{χ} \le x(n) \le \sqrt{χ}\} = \begin{cases} 0, & \text{for } χ \le 0 \\ \int_{-\sqrt{χ}}^{\sqrt{χ}} p_{x(n)}(ξ)dξ = \sqrt{χ}, & \text{for } 0 < χ \le 1 \\ 1, & \text{for } χ > 1, \end{cases}
Figure 7.11 Illustration of the probability distribution Fy (χ) calculation for y(n) = x2 (n), when −1 ≤ x (n) ≤ 1.
The probability density function is obtained as the derivative of the probability distribution F_y(χ), that is,
p_{y(n)}(ξ) = \frac{dF_y(ξ)}{dξ} = \begin{cases} \frac{1}{2\sqrt{ξ}}, & \text{for } 0 < ξ \le 1 \\ 0, & \text{otherwise.} \end{cases}
The expected value and the variance of the signal y(n) are
µ_y(n) = µ_y = \int_{0}^{1} ξ\frac{1}{2\sqrt{ξ}}dξ = \frac{1}{3}, \quad σ_y^2(n) = σ_y^2 = \int_{0}^{1}\left(ξ-\frac{1}{3}\right)^2\frac{1}{2\sqrt{ξ}}dξ = \frac{4}{45}.
Example 7.15. Find the probability density function of y(n) for an arbitrary monotonous function
y(n) = f ( x (n)), with inverse f −1 (y(n)) = x (n), if the probability density function of x (n) is
p x ( n ) ( ξ ).
What is the form of py(n) (ξ ) if x (n) is a random variable with the uniform probability
density function, within the interval [−π/2, π/2) and y(n) = tan( x (n)), with the inverse
function x (n) = arctan(y(n))?
⋆ For a monotonously increasing function f,
F_y(χ) = P\{y(n) \le χ\} = P\{f(x(n)) \le χ\} = P\{x(n) \le f^{-1}(χ)\} = \int_{-\infty}^{f^{-1}(χ)} p_{x(n)}(ξ)dξ,
or
F_y(χ) = F_x(f^{-1}(χ)).
The probability density function is
p_{y(n)}(ξ) = \frac{dF_y(ξ)}{dξ} = \frac{dF_x(f^{-1}(ξ))}{dξ} = p_{x(n)}(f^{-1}(ξ))\frac{df^{-1}(ξ)}{dξ}.
This relation can also be obtained from the fact that the probability contained in a differential area must be invariant under the change of variables, that is, p_{y(n)}(ζ)|dζ| = p_{x(n)}(ξ)|dξ|.
For the random variable x(n), with the uniform probability density function within the interval [-π/2, π/2), the random variable y(n) = \tan(x(n)) is distributed from -∞ to ∞. Its probability density function is
p_{y(n)}(ξ) = p_{x(n)}(\arctan(ξ))\frac{d(\arctan(ξ))}{dξ} = \frac{1}{π}\frac{1}{1+ξ^2},
which is the Cauchy probability density function.
As an introduction to the second-order statistics (that will be considered in the next section), consider two signals x(n) and y(n), with continuous amplitude values. The probability that the nth signal sample x(n) takes a value within ξ \le x(n) < ξ + dξ and that y(m) takes a value within ζ \le y(m) < ζ + dζ is
p_{x(n),y(m)}(ξ, ζ)dξdζ,
where p_{x(n),y(m)}(ξ, ζ) is the joint probability density function. The probability of an event a \le x(n) < b and c \le y(m) < d is
\text{Probability}\{a \le x(n) < b, c \le y(m) < d\} = \int_{a}^{b}\int_{c}^{d} p_{x(n),y(m)}(ξ, ζ)dζdξ.
For mutually independent signals p x(n),y(m) (ξ, ζ ) = p x(n) (ξ ) py(m) (ζ ). A special case of the previous
relations is obtained when y(m) = x (m).
Example 7.16. The signal x (n) is defined by x (n) = a(n) + b(n) + c(n), where a(n), b(n), and
c(n) are mutually independent random signals with the uniform probability density function over
the interval [−1, 1). Find the probability density function of the signal x (n), its mean µ x , and the
variance σx2 .
⋆Consider a sum of two independent random signals s(n) = a(n) + b(n). The probability that s(n) = a(n) + b(n) ≤ χ can be calculated from the joint probability distribution of a(n) and b(n) as

F_s(\chi) = P\{s(n) \le \chi\} = \text{Probability}\{-\infty < a(n) < \infty \text{ and } -\infty < a(n) + b(n) \le \chi\}
= \int_{-\infty}^{\infty}\int_{-\infty}^{\chi-\zeta} p_{a(n),b(n)}(\xi, \zeta)\,d\xi\,d\zeta = \int_{-\infty}^{\infty} p_{b(n)}(\zeta)\int_{-\infty}^{\chi-\zeta} p_{a(n)}(\xi)\,d\xi\,d\zeta.
Now, we can calculate the probability density function of s(n) as a derivative of F_s(\chi), that is,

p_{s(n)}(\chi) = \frac{dF_s(\chi)}{d\chi} = \frac{d}{d\chi}\int_{-\infty}^{\infty} p_{b(n)}(\zeta)\int_{-\infty}^{\chi-\zeta} p_{a(n)}(\xi)\,d\xi\,d\zeta
= \int_{-\infty}^{\infty} p_{b(n)}(\zeta)\,p_{a(n)}(\chi - \zeta)\,d\zeta = p_{b(n)}(\chi) \ast_\chi p_{a(n)}(\chi),
meaning that the probability density function of the sum of two independent random variables is the convolution of the individual probability density functions. In a similar way, we can include the third signal and obtain

p_{x(n)}(\chi) = p_{c(n)}(\chi) \ast_\chi p_{b(n)}(\chi) \ast_\chi p_{a(n)}(\chi),

which results in

p_{x(n)}(\chi) = \begin{cases} \dfrac{(\chi+3)^2}{16} & \text{for } -3 \le \chi \le -1 \\[4pt] \dfrac{3-\chi^2}{8} & \text{for } -1 < \chi \le 1 \\[4pt] \dfrac{(\chi-3)^2}{16} & \text{for } 1 < \chi \le 3 \\[4pt] 0 & \text{for } |\chi| > 3. \end{cases}
The mean value and the variance can be calculated using p_{x(n)}(\chi), or in a direct way using the linearity property, as

\mu_x = \mu_a + \mu_b + \mu_c = 0, \qquad \sigma_x^2 = \sigma_a^2 + \sigma_b^2 + \sigma_c^2 = 3\cdot\frac{1}{3} = 1,

since the signals are independent and each is uniform over [−1, 1), with zero mean and variance 1/3.
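As a numerical cross-check of the convolution result, a minimal Python sketch (NumPy assumed; names are illustrative) can compare a histogram of x(n) = a(n) + b(n) + c(n) with the derived piecewise density and with µ_x = 0, σ_x² = 1:

    import numpy as np

    rng = np.random.default_rng(0)
    a, b, c = rng.uniform(-1, 1, size=(3, 1_000_000))  # three independent uniform signals
    x = a + b + c                                      # x(n) = a(n) + b(n) + c(n)

    print(x.mean(), x.var())                           # approx 0 and 1

    hist, edges = np.histogram(x, bins=120, range=(-3, 3), density=True)
    chi = (edges[:-1] + edges[1:]) / 2
    analytic = np.where(np.abs(chi) <= 1, (3 - chi**2) / 8, (3 - np.abs(chi))**2 / 16)
    print(np.max(np.abs(hist - analytic)))             # close to 0 for large sample sizes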
Example 7.17. Consider two independent random signals x(n) and y(n), with probability density functions p_{x(n)}(\xi) and p_{y(n)}(\xi). A new random signal is defined in such a way that it takes the lower value of the signals x(n) and y(n) at every instant n,

z(n) = \min\{x(n), y(n)\}.

Find the probability distribution and the probability density function of the random signal z(n). What is the probability density function of z(n) if

p_{x(n)}(\xi) = \frac{1}{\beta_x}\, e^{-\xi/\beta_x}\, u(\xi) \quad \text{and} \quad p_{y(n)}(\xi) = \frac{1}{\beta_y}\, e^{-\xi/\beta_y}\, u(\xi)?
⋆ Since the random signal z(n) takes the lower of the values x(n) and y(n), the probability that z(n) = min{x(n), y(n)} is lower than or equal to an assumed χ is equal to the probability that at least one of the random samples x(n) and y(n) is below this assumed χ, that is,

F_z(\chi) = P\{z(n) \le \chi\} = 1 - P\{x(n) > \chi\}\,P\{y(n) > \chi\}.

Since

P\{x(n) > \chi\} = 1 - F_x(\chi) \quad \text{and} \quad P\{y(n) > \chi\} = 1 - F_y(\chi),

we get the probability distribution of the random variable z(n) in the form

F_z(\chi) = F_x(\chi) + F_y(\chi) - F_x(\chi)F_y(\chi).
The probability density function follows as the derivative of the probability distribution,

p_{z(n)}(\xi) = \frac{dF_z(\xi)}{d\xi} = p_{x(n)}(\xi) + p_{y(n)}(\xi) - p_{x(n)}(\xi)F_y(\xi) - F_x(\xi)p_{y(n)}(\xi)
= p_{x(n)}(\xi)\,(1 - F_y(\xi)) + p_{y(n)}(\xi)\,(1 - F_x(\xi)).

For the given exponential distributions, since F_x(\xi) = (1 - e^{-\xi/\beta_x})u(\xi) and F_y(\xi) = (1 - e^{-\xi/\beta_y})u(\xi), this reduces to

p_{z(n)}(\xi) = \frac{1}{\beta_z}\, e^{-\xi/\beta_z}\, u(\xi), \quad \text{with} \quad \frac{1}{\beta_z} = \frac{1}{\beta_x} + \frac{1}{\beta_y}.
See also Problem 7.8 and its solution.
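The result for the minimum of two exponentially distributed signals can also be checked numerically (a minimal Python sketch, NumPy assumed; the β values are arbitrary examples):

    import numpy as np

    rng = np.random.default_rng(0)
    beta_x, beta_y = 2.0, 3.0
    x = rng.exponential(beta_x, size=1_000_000)
    y = rng.exponential(beta_y, size=1_000_000)
    z = np.minimum(x, y)                     # z(n) = min{x(n), y(n)}

    beta_z = 1 / (1 / beta_x + 1 / beta_y)   # combined rate, as derived above
    print(z.mean(), beta_z)                  # both approx 1.2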
The correlations and covariances, as the most important parameters of the second-order statistics, will be analyzed in this section and related to the spectral power density of random signals.
For a real-valued random signal with continuous amplitudes of its samples and the second-order probability density function p_{x(n),x(m)}(\xi_1, \xi_2), the autocorrelation is

r_{xx}(n, m) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \xi_1 \xi_2\, p_{x(n),x(m)}(\xi_1, \xi_2)\,d\xi_1\,d\xi_2. (7.21)
If the real-valued random variables x(n) and x(m) are statistically independent, then

p_{x(n),x(m)}(\xi_1, \xi_2) = p_{x(n)}(\xi_1)\,p_{x(m)}(\xi_2)

and

r_{xx}(n, m) = \mu_x(n)\,\mu_x(m).
For a signal {x_i(n)}, n = 1, 2, . . . , N, with i = 1, 2, . . . , M being the index of realization of this signal, the autocorrelation function is estimated by

\hat{r}_{xx}(n, m) = \frac{1}{M}\sum_{i=1}^{M} x_i(n)\,x_i^*(m). (7.22)

In matrix notation, with the realization vectors

\mathbf{x}_i = [x_i(1), x_i(2), \ldots, x_i(N)]^T,

the autocorrelation matrix is estimated as \hat{R}_x = \frac{1}{M}\sum_{i=1}^{M} \mathbf{x}_i \mathbf{x}_i^H. Diagonal elements in the covariance matrix, C_x, are the variances \sigma_x^2(n).
The cross-correlation and the cross-covariance of two signals x(n) and y(n) are defined as

r_{xy}(n, m) = E\{x(n)\,y^*(m)\}

and

c_{xy}(n, m) = E\{(x(n) - \mu_x(n))\,(y(m) - \mu_y(m))^*\} = r_{xy}(n, m) - \mu_x(n)\,\mu_y^*(m). (7.25)
For the signals whose samples are available, the autocovariance is estimated using the following relation,

\hat{c}_{xx}(n, m) = \frac{1}{M}\sum_{i=1}^{M} (x_i(n) - \hat{\mu}_x(n))\,(x_i(m) - \hat{\mu}_x(m))^*.
The covariance matrix can be written in the form

\hat{C}_x = \text{Cov}(\mathbf{x}) = \frac{1}{M}\sum_{i=1}^{M} (\mathbf{x}_i - \hat{\boldsymbol{\mu}}_x)(\mathbf{x}_i - \hat{\boldsymbol{\mu}}_x)^H = \hat{R}_x - \hat{\boldsymbol{\mu}}_x \hat{\boldsymbol{\mu}}_x^H.
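The estimator (7.22) and the covariance matrix relation can be written compactly for a set of realizations stored as matrix rows. A minimal Python sketch (NumPy assumed; the synthetic data set is illustrative):

    import numpy as np

    rng = np.random.default_rng(0)
    M, N = 10_000, 8                          # M realizations of an N-sample signal
    X = rng.standard_normal((M, N)) + 0.5     # rows are realizations x_i(n), mean 0.5

    mu_hat = X.mean(axis=0)                   # sample mean vector
    R_hat = (X.T @ X.conj()) / M              # element (n, m): mean of x_i(n) x_i*(m)
    C_hat = R_hat - np.outer(mu_hat, mu_hat.conj())   # C = R - mu mu^H

    print(np.diag(C_hat.real))                # approx the variances sigma_x^2(n) = 1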
Example 7.18. Consider the nth set of experiments, where the independent variable in the experiment assumes N random values t_1(n), t_2(n), . . . , t_N(n) and the result of the experiment takes the random values x_1(n), x_2(n), . . . , x_N(n). If the linear model x_i(n) = a t_i(n) + b is assumed, show that the solution for the parameter a in this linear regression problem can be written as \hat{a} = \hat{c}_{tx}(n, n)/\hat{c}_{tt}(n, n), where

\hat{c}_{tx}(n, n) = \hat{r}_{tx}(n, n) - \hat{\mu}_t(n)\hat{\mu}_x(n) = \frac{1}{N}\sum_{i=1}^{N} t_i(n)x_i(n) - \frac{1}{N}\sum_{i=1}^{N} t_i(n)\,\frac{1}{N}\sum_{i=1}^{N} x_i(n),

\hat{c}_{tt}(n, n) = \hat{r}_{tt}(n, n) - \hat{\mu}_t^2(n) = \frac{1}{N}\sum_{i=1}^{N} t_i^2(n) - \Big(\frac{1}{N}\sum_{i=1}^{N} t_i(n)\Big)^2.
⋆ The solution to the linear regression model is obtained from the system (see Section 7.1.4)

\frac{1}{N}\sum_{i=1}^{N} t_i(n)\big(x_i(n) - a t_i(n) - b\big) = 0,

\frac{1}{N}\sum_{i=1}^{N} \big(x_i(n) - a t_i(n) - b\big) = 0,
whose second equation gives b = \hat{\mu}_x(n) - a\hat{\mu}_t(n). Substituting this into the first equation results in

a\,(\hat{r}_{tt}(n, n) - \hat{\mu}_t^2(n)) = \hat{r}_{tx}(n, n) - \hat{\mu}_t(n)\hat{\mu}_x(n),

producing the estimate of parameter a in the form \hat{a} = \hat{c}_{tx}(n, n)/\hat{c}_{tt}(n, n).
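The estimate \hat{a} = \hat{c}_{tx}/\hat{c}_{tt} is straightforward to apply. A minimal Python sketch (NumPy assumed; the model values a = 2.5 and b = 1.0 are illustrative):

    import numpy as np

    rng = np.random.default_rng(0)
    N = 1000
    t = rng.uniform(0, 10, N)                     # independent variable t_i(n)
    x = 2.5 * t + 1.0 + rng.standard_normal(N)    # x_i(n) = a t_i(n) + b + noise

    c_tx = np.mean(t * x) - t.mean() * x.mean()   # cross-covariance estimate
    c_tt = np.mean(t**2) - t.mean()**2            # variance estimate of t
    a_hat = c_tx / c_tt
    b_hat = x.mean() - a_hat * t.mean()
    print(a_hat, b_hat)                           # approx 2.5 and 1.0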
Signals whose first-order and second-order statistics are invariant to a shift in time are called wide sense stationary (WSS) signals. For WSS signals,

\mu_x(n) = E\{x(n)\} = \mu_x,
r_{xx}(n, m) = E\{x(n)\,x^*(m)\} = r_{xx}(n - m). (7.26)

A signal is stationary in the strict sense (SSS) if the statistics of all orders are invariant to a shift in time. For a real-valued WSS signal, the property r_{xx}(0) \ge r_{xx}(n - m) follows from

\frac{1}{M}\sum_{i=1}^{M} (x_i(n) - x_i(m))^2 \ge 0.

A random signal is periodic, in the wide sense, with a period N if

E\{x(n)\} = E\{x(n + N)\} \quad \text{and} \quad E\{x(n + N)\,x^*(m)\} = r_{xx}(N + n, m) = r_{xx}(n, m).
The relations introduced for the second-order statistics may be extended to the higher-order statistics. For example, the third-order moment of a signal x(n) is defined by

r_{xxx}(n; m, l) = E\{x(n)\,x(n + m)\,x(n + l)\}.

In order to calculate the third-order moment, we should know the third-order statistics, like the third-order probability distribution P_{x(n),x(m),x(l)}(\xi_1, \xi_2, \xi_3) or the corresponding probability density function.
For m = l = 0, the third-order moment of the real-valued random variable x(n) is

M_3 = E\{x^3(n)\} = \int_{-\infty}^{\infty} \xi^3 p_{x(n)}(\xi)\,d\xi.
For an ergodic signal, the ensemble average can be replaced by the time average over a single realization,

\mu_x(n) = \lim_{M\to\infty} \frac{1}{M}\big(x_1(n) + \cdots + x_M(n)\big) = \lim_{N\to\infty} \frac{1}{N}\big(x_i(n) + \cdots + x_i(n - N + 1)\big).
The characteristic function of a random variable x(n) is defined as the expected value of the random variable y(n) = e^{j\theta x(n)}, that is,

\Phi_x(\theta) = E\{e^{j\theta x(n)}\} = \int_{-\infty}^{\infty} p_{x(n)}(\xi)\,e^{j\theta\xi}\,d\xi.

It can be interpreted as the Fourier transform of the probability density function p_{x(n)}(\xi) (with sign + in the exponent instead of −). The characteristic function can be related to the moments of the random variable x(n), using the Taylor series expansion of e^{j\theta\xi} around zero,

e^{j\theta\xi} = 1 + j\theta\xi - \frac{1}{2!}\theta^2\xi^2 - j\frac{1}{3!}\theta^3\xi^3 + \ldots.
The characteristic function can be written in the form

\Phi_x(\theta) = \int_{-\infty}^{\infty} p_{x(n)}(\xi)\,d\xi + j\theta\int_{-\infty}^{\infty} \xi\,p_{x(n)}(\xi)\,d\xi - \frac{1}{2!}\theta^2\int_{-\infty}^{\infty} \xi^2 p_{x(n)}(\xi)\,d\xi - j\frac{1}{3!}\theta^3\int_{-\infty}^{\infty} \xi^3 p_{x(n)}(\xi)\,d\xi + \ldots
= 1 + j\theta M_1 - \frac{1}{2!}\theta^2 M_2 - j\frac{1}{3!}\theta^3 M_3 + \ldots, (7.29)

where the moments M_i are defined by

M_i = \int_{-\infty}^{\infty} \xi^i p_{x(n)}(\xi)\,d\xi.
The moments follow from the derivatives of the characteristic function at θ = 0,

\Phi_x(0) = 1, \quad \frac{d\Phi_x(\theta)}{j\,d\theta}\Big|_{\theta=0} = M_1, \quad \frac{d^2\Phi_x(\theta)}{-d\theta^2}\Big|_{\theta=0} = M_2, \quad \frac{d^3\Phi_x(\theta)}{-j\,d\theta^3}\Big|_{\theta=0} = M_3, \ldots
For the sum of random variables, z(n) = x(n) + y(n), whose probability density function is equal to the convolution of the corresponding probability density functions, p_{z(n)}(\xi) = p_{x(n)}(\xi) \ast p_{y(n)}(\xi) (see Example 7.16), the characteristic function is equal to the product of their individual characteristic functions,

\Phi_z(\theta) = \Phi_x(\theta)\,\Phi_y(\theta). (7.30)
From this relation, we can easily find the moments of the sum of random variables (see also Problem
7.21 and Example 7.24).
For stationary signals, the autocorrelation function is a function of the difference of the time arguments,

r_{xx}(n) = E\{x(n + m)\,x^*(m)\}.

The Fourier transform of the autocorrelation function of a WSS signal is the power spectral density

S_{xx}(e^{j\omega}) = \sum_{n=-\infty}^{\infty} r_{xx}(n)\,e^{-j\omega n} (7.31)

with the inverse transform

r_{xx}(n) = \frac{1}{2\pi}\int_{-\pi}^{\pi} S_{xx}(e^{j\omega})\,e^{j\omega n}\,d\omega. (7.32)
In particular, the average power of the signal is

\frac{1}{2\pi}\int_{-\pi}^{\pi} S_{xx}(e^{j\omega})\,d\omega = r_{xx}(0) = E\{|x(n)|^2\}. (7.33)
Example 7.20. Find the expected value, the autocorrelation, and the power spectral density of the random signal

x(n) = \sum_{k=1}^{K} a_k\, e^{j(\omega_k n + \theta_k)},

where θ_k are random variables uniformly distributed over −π < θ_k ≤ π. All random variables are statistically independent. The frequencies ω_k satisfy −π < ω_k ≤ π for every k.
⋆ The expected value is

\mu_x = \sum_{k=1}^{K} a_k\, E\{e^{j(\omega_k n + \theta_k)}\} = \sum_{k=1}^{K} a_k\,\frac{1}{2\pi}\int_{-\pi}^{\pi} e^{j(\omega_k n + \theta_k)}\,d\theta_k = 0.
The autocorrelation is

r_{xx}(n) = E\Big\{\sum_{k=1}^{K} a_k\, e^{j(\omega_k(n+m) + \theta_k)} \sum_{l=1}^{K} a_l\, e^{-j(\omega_l m + \theta_l)}\Big\} = \sum_{k=1}^{K} a_k^2\, e^{j\omega_k n},

since the cross terms with k ≠ l vanish due to the statistically independent, uniformly distributed phases. The power spectral density then follows as S_{xx}(e^{j\omega}) = 2\pi\sum_{k=1}^{K} a_k^2\,\delta(\omega - \omega_k).
The average signal power of a signal x(n) has been defined as (2.9)

P_{AV} = \lim_{N\to\infty} \frac{1}{2N+1}\sum_{n=-N}^{N} |x(n)|^2 = \big\langle |x(n)|^2 \big\rangle.

This relation leads to another definition of the power spectral density of random discrete-time signals, given by

P_{xx}(e^{j\omega}) = \lim_{N\to\infty} \frac{1}{2N+1}\,E\{|X_N(e^{j\omega})|^2\} = \lim_{N\to\infty} \frac{1}{2N+1}\,E\Big\{\Big|\sum_{n=-N}^{N} x(n)\,e^{-j\omega n}\Big|^2\Big\}. (7.34)
Different notation is used since the definitions of the power spectral density, (7.31) and (7.34), in general, will not produce the same result. We can write

P_{xx}(e^{j\omega}) = \lim_{N\to\infty} \frac{1}{2N+1}\,E\Big\{\sum_{m=-N}^{N}\sum_{n=-N}^{N} x(m)\,x^*(n)\,e^{-j\omega(m-n)}\Big\}.

With the substitution k = m − n, this reduces to

P_{xx}(e^{j\omega}) = \lim_{N\to\infty} \sum_{k=-2N}^{2N} \Big(1 - \frac{|k|}{2N+1}\Big)\,r_{xx}(k)\,e^{-j\omega k} = \lim_{N\to\infty} \sum_{k=-2N}^{2N} w_B(k)\,r_{xx}(k)\,e^{-j\omega k}.

The function w_B(k) corresponds to the Bartlett (triangular) window over the calculation interval.
Figure 7.12 Illustration of the power spectral density domain and the autocorrelation function r xx (m − n).
If the values of the autocorrelation function r_{xx}(k) are such that the second part of the sum, \sum_k \frac{|k|}{2N+1}\,r_{xx}(k)\,e^{-j\omega k}, is negligible as compared to \sum_k r_{xx}(k)\,e^{-j\omega k}, then

P_{xx}(e^{j\omega}) = \lim_{N\to\infty} \sum_{k=-2N}^{2N} r_{xx}(k)\,e^{-j\omega k} = \text{FT}\{r_{xx}(n)\} = S_{xx}(e^{j\omega}).
This is true for r xx (k) = Cδ(k) or r xx (k) being nonzero within the region |k| < k0 , such that
k0 /(2N + 1) is negligible. Otherwise Pxx (e jω ) is a smoothed version of Sxx (e jω ). Note that Pxx (e jω )
is always nonnegative, by definition (for a numeric illustration see Example 7.52). Estimation of the
power spectral density will be revisited in Section 7.5.6.
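The agreement between the averaged periodogram P_{xx}(e^{j\omega}) and S_{xx}(e^{j\omega}) = FT{r_{xx}(n)} for a short-correlation signal can be illustrated numerically. A minimal Python sketch (NumPy assumed; the two-tap filter is an arbitrary example with r_{xx}(0) = 1.25 and r_{xx}(±1) = 0.5):

    import numpy as np

    rng = np.random.default_rng(0)
    N, trials = 64, 2000
    omega = 2 * np.pi * np.arange(N) / N

    # White noise through h(n) = [1, 0.5]: r_xx(k) is nonzero only for |k| <= 1
    Pxx = np.zeros(N)
    for _ in range(trials):
        x = np.convolve(rng.standard_normal(N + 1), [1, 0.5], mode='valid')
        Pxx += np.abs(np.fft.fft(x))**2 / N
    Pxx /= trials                            # averaged periodogram estimate

    Sxx = 1.25 + np.cos(omega)               # FT of r_xx for this filter
    print(np.max(np.abs(Pxx - Sxx)))         # small when many trials are averaged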
Periodically extended signals. Another way to estimate the power spectral density is to assume that a WSS signal x(n), n = 0, 1, 2, . . . , N − 1, is periodically extended. Then

P_{xx}(k) = \frac{1}{N}\,E\{|X(k)|^2\} = \frac{1}{N}\,E\Big\{\sum_{m=0}^{N-1}\sum_{n=0}^{N-1} x(m)\,x^*(n)\,e^{-j2\pi k(m-n)/N}\Big\}
= \frac{1}{N}\sum_{m=0}^{N-1}\sum_{n=0}^{N-1} E\{x(m)\,x^*(n)\}\,e^{-j2\pi k(m-n)/N} = \frac{1}{N}\sum_{m=0}^{N-1}\sum_{n=0}^{N-1} r_{xx}(m-n)\,e^{-j2\pi k(m-n)/N}
= \frac{1}{N}\sum_{m=0}^{N-1}\ \sum_{i=m-N+1}^{m} r_{xx}(i)\,e^{-j2\pi ki/N} = \sum_{i=0}^{N-1} r_{xx}(i)\,e^{-j2\pi ki/N} = S_{xx}(k).
Since the signal, x (n), is periodically extended, the autocorrelation, r xx (n), is periodically extended as
well. This means that r xx ( N ) = E{ x (m + N ) x ∗ (m)} = E{ x (m) x ∗ (m)} = r xx (0), r xx ( N + 1) =
E{ x (m + N + 1) x ∗ (m)} = E{ x (m + 1) x ∗ (m)} = r xx (1), and so on, producing the last equality in
the previous derivation.
R_x = N\,W^H P_x W.
where the WSS random signal ε(n), n = 0, 1, 2, . . . , N − 1, N = 16, with E{ε(n)} = 0 and
rεε (n) = δ(n), is periodically extended in such a way that ε(n + Nk) = ε(n), where k is an integer.
Find the autocorrelation r xx (n) within the basic period, n = 0, 1, 2, . . . , N − 1, and the power
spectral density Sxx (k). Use 10000 realizations of ε(n) to calculate Xi (k) = DFT N { xi (n)} and
plot meani { Xi (k) Xi∗ (l )}/N, for k = 0, 1, 2, . . . , N − 1 and l = 0, 1, 2, . . . , N − 1.
⋆ The autocorrelation, r_{xx}(n) = E\{x(m + n)\,x^*(m)\}, satisfies r_{xx}(\pm n) = 0 for 4 < n < 16 − 4, and r_{xx}(n + 16k) = r_{xx}(n). The exact value of r_{xx}(n) and its estimate using 10000 realizations,

\hat{r}_{xx}(n) = \frac{1}{10000\,N}\sum_{i=1}^{10000}\sum_{m=0}^{N-1} x_i(n + m)\,x_i(m), \quad \text{with } x(N + n) = x(n),
are shown in Fig. 7.13(c). The value of meani { Xi (k) Xi∗ (l )}, averaged over 10000 realizations,
is given in Fig. 7.13(a). For comparison, the DFT of r xx (n) is presented on the diagonal of Fig.
7.13(b), with its exact value mean{ X (k) X ∗ (k)} shown in Fig. 7.13(d).
Figure 7.13 The power spectral matrix illustration. (a) The value of meani { Xi (k) Xi∗ (l )}/N averaged over
10000 realizations. (b) The DFT of r xx (n) as a diagonal matrix. (c) The exact value of r xx (n) and its estimation,
r̂ xx (n), using 10000 realizations. (d) The exact value of E{ X (k) X ∗ (k)}/N from (b).
In many applications, the desired signal is disturbed by various forms of random signals, caused by numerous factors in signal sensing, transmission, and/or processing. Often, a cumulative influence of these factors, disturbing the useful signal, is described by an equivalent random signal, called noise. In most cases, we will use the notation ε(n) for these kinds of signals. They model random disturbances, commonly originating from multiple sources.
A noise is said to be white if its values are uncorrelated,

r_{\varepsilon\varepsilon}(n, m) = \sigma_\varepsilon^2\,\delta(n - m).

The spectral density of this kind of noise is constant (as is the case with white light),

S_{\varepsilon\varepsilon}(e^{j\omega}) = \sigma_\varepsilon^2.

If this property is not satisfied, then the power spectral density is not constant. Such a noise is referred to as colored.
Regarding the distribution of the noise ε(n) amplitudes, the most common types of noise in signal processing are: uniform, binary, Gaussian, Rayleigh, Laplacian, Cauchy, and Poisson noise.
The uniform noise is a discrete-time signal with the probability density function defined by

p_{\varepsilon(n)}(\xi) = \frac{1}{\Delta}, \quad \text{for } -\Delta/2 \le \xi < \Delta/2, (7.36)

and p_{\varepsilon(n)}(\xi) = 0 elsewhere, Fig. 7.14. This noise takes values within the interval [−Δ/2, Δ/2) with equal probability. The variance of the uniform noise is

\sigma_\varepsilon^2 = \int_{-\Delta/2}^{\Delta/2} \xi^2\,p_{\varepsilon(n)}(\xi)\,d\xi = \frac{\Delta^2}{12}.
This kind of noise is used to model rounding errors in the amplitude quantization of discrete-time
signals. Its distribution indicates that all errors within −∆/2 ≤ ξ < ∆/2 are equally probable.
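The quantization-noise model σ_ε² = Δ²/12 is easy to verify numerically (a minimal Python sketch, NumPy assumed; the step Δ = 0.05 is an arbitrary example):

    import numpy as np

    rng = np.random.default_rng(0)
    delta = 0.05                              # quantization step
    x = rng.standard_normal(1_000_000)        # a test signal
    xq = delta * np.round(x / delta)          # uniform (rounding) quantization
    e = xq - x                                # quantization error, uniform on [-delta/2, delta/2)

    print(e.var(), delta**2 / 12)             # both approx 2.08e-4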
Figure 7.14 A realization of the uniform noise (left) with its probability density function (right), when ∆ = 1.
Random binary sequence, or binary noise, is a stochastic signal which randomly takes one of two fixed signal values. Assume that the noise ε(n) values are, for example, {−1, 1} and that the probability that ε(n) takes the value 1 is P_ε(1) = p, while P_ε(−1) = 1 − p. The expected value of this noise is

\mu_\varepsilon = \sum_{\xi=-1,1} \xi\,P_\varepsilon(\xi) = (-1)(1-p) + 1\cdot p = 2p - 1.

The variance is

\sigma_\varepsilon^2 = \sum_{\xi=-1,1} (\xi - \mu_\varepsilon)^2 P_\varepsilon(\xi) = 4p(1-p).

A special case is obtained when the values from the set {−1, 1} are equally probable, that is, when p = 1/2. Then, we get µ_ε = 0 and σ_ε² = 1.
When the random signal ε(n) values are from the set {0, 1}, then this form of binary signal is referred to as the Bernoulli random signal or Bernoulli noise. This signal takes the value ε(n) = 1 with the probability p, while the probability of ε(n) = 0 is equal to (1 − p). The probability that the signal sample ε(n) takes one specific value ξ ∈ {0, 1} can be written as

P(\varepsilon(n) = \xi \mid p) = p^{\xi}(1-p)^{1-\xi},

since P(ε(n) = 1|p) = p and P(ε(n) = 0|p) = 1 − p. The expected value of the Bernoulli noise is µ_ε = p, while the variance is σ_ε² = p(1 − p).
Example 7.22. Consider a set of N → ∞ balls. An equal number of balls is marked with 1 (or white)
and 0 (or black). A random signal x (n), n = 0, 1, 2, 3, corresponds to drawing of four balls in a
row. This signal has four samples x (0), x (1), x (2), and x (3). The signal values x (n) are equal
to the marks on the drawn balls. Write all possible realizations of x (n). If k is the number of
appearances of value 1 in the signal, write the probabilities for each value of k.
⋆Signal realizations, x_m(n), with the number k being equal to the number of appearances of digit 1 in every signal realization, are given in the next table.

x_m(0):  0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1
x_m(1):  0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1
x_m(2):  0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1
x_m(3):  0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
y(m) = \sum_{n=0}^{3} x_m(n) = k:  0 1 1 2 1 2 2 3 1 2 2 3 2 3 3 4

Counting these realizations, the probabilities of k = 0, 1, 2, 3, 4 are P(0) = 1/16, P(1) = 4/16, P(2) = 6/16, P(3) = 4/16, and P(4) = 1/16, that is, P(k) = \binom{4}{k} p^k q^{4-k}, with p = 1/2 and q = 1 − p = 1/2. For the case when N is a finite number see Problem 7.9.
An interesting form of the random variable, resulting from an experiment where the random variable can take only two possible values, {−1, 1} or {0, 1} or {No, Yes} or {A, B}, is the binomial random variable. The binomial random variable is equal to the number, k, of successes (1, or Yes, or B) in a sequence of N independent binary experiments, each of which yields success with probability p. This random variable obeys the binomial distribution, which is the basis for the popular binomial test of statistical significance.
The binomial random variable, k, has been introduced through the previous simple example. In general, if the signal x(n) takes the value B from the set {A, B} with the probability p, then the probability that there are exactly k values of B in a specific order, within a sequence of N samples of x(n), is p^k(1-p)^{N-k}. For k = 1, this result can be achieved in N specific orders (combinations, see Example 7.22). When k = 2, there are N(N-1)/2 = \binom{N}{2} such combinations. In general, for any k, there are \binom{N}{k} orders (combinations) in which x(n) takes the value B exactly k times, that is,

P(k) = \binom{N}{k} p^k (1-p)^{N-k} = \frac{N!}{k!(N-k)!}\,p^k(1-p)^{N-k}. (7.37)
These are the binomial coefficients in the expansion of (p + q)^N = (p + (1-p))^N. The expected value of the number of appearances, y(m) = k, of the event B or “Yes” in N samples is

\mu_y = E\{y\} = \sum_{k=0}^{N} k\,P(k) = \sum_{k=0}^{N} k\,\frac{N!}{k!(N-k)!}\,p^k(1-p)^{N-k}.

Since the first term in the summation is 0, we can shift the summation by one and reindex it to

\mu_y = E\{y\} = \sum_{k=0}^{N-1} (k+1)\,\frac{N!}{(k+1)!(N-(k+1))!}\,p^{k+1}(1-p)^{N-(k+1)}
= Np\sum_{k=0}^{N-1} \frac{(N-1)!}{k!((N-1)-k)!}\,p^k(1-p)^{(N-1)-k},

producing

\mu_y = E\{y\} = Np.

As we could write from the beginning, the expected value of the number of appearances of an event B, whose probability is p, in N realizations is E{y} = Np. This derivation was performed not only to prove this fact, but because it leads us to the next step in deriving the variance of y, by using the second-order factorial moment

E\{y(y-1)\} = \sum_{k=0}^{N} k(k-1)\,\frac{N!}{k!(N-k)!}\,p^k(1-p)^{N-k}.
Since the first two terms are 0, we can reindex the summation as
E\{y(y-1)\} = \sum_{k=0}^{N-2} (k+2)(k+1)\,\frac{N!}{(k+2)!(N-2-k)!}\,p^{k+2}(1-p)^{N-2-k}
= N(N-1)p^2\sum_{k=0}^{N-2} \frac{(N-2)!}{k!(N-2-k)!}\,p^k(1-p)^{N-2-k}.
The relation

\sum_{k=0}^{N-2} \frac{(N-2)!}{k!(N-2-k)!}\,p^k(1-p)^{N-2-k} = (p + (1-p))^{N-2} = 1

is used to get

E\{y(y-1)\} = N(N-1)p^2.
The variance of y follows from

\sigma_y^2 = E\{y^2\} - \mu_y^2 = E\{y(y-1)\} + E\{y\} - \mu_y^2 = N(N-1)p^2 + Np - N^2p^2 = Np(1-p).
Therefore, in a sequence of N values of the signal x(n) that can take values {0, 1}, the expected value of the number of appearances of 1, y = \sum_{n=1}^{N} x(n), divided by N, is

E\{z\} = \frac{1}{N}\sum_{n=1}^{N} E\{x(n)\} = E\Big\{\frac{y}{N}\Big\} = \frac{Np}{N} = p, (7.38)

with the variance

\sigma_z^2 = \frac{1}{N^2}\,\sigma_y^2 = \frac{Np(1-p)}{N^2} = \frac{p(1-p)}{N}.

By increasing the total number of values, N, the variance becomes lower, so a finite set x(n) produces a more reliable estimate of the mean value p (see Example 7.33).
Notice that the random variable z is the mean of the independent random variables x (n).
Here we will reconsider Bayes’ theorem and use Bayesian analysis to test hypotheses about the
probabilistic model parameters in the case of binary random signals. Bayesian inference is a method
in which Bayes’ theorem is used to update the probability for a hypothesis as more evidence samples
(events) become available.
Assume that the random signal x (n) takes the values +1 and 0 (positive or negative test result;
’head’ or ’tail’ in the coin-tossing; value 1 or −1) with probabilities p and 1 − p, respectively, that are
not known.
The event B consists of N observed samples x (n), n = 1, 2, . . . , N, and the number k of x (n) = 1
occurrences in this sequence. The aim is to estimate the probability p based on the results obtained in
the observed event B (that is, based on k and N).
Ljubiša Stanković Digital Signal Processing 331
The classical frequentist approach to the estimation of the probability p is based on (7.38) and

\hat{p} = \frac{1}{N}\sum_{n=1}^{N} x(n) = \frac{k}{N},
assuming that x (n) takes the value of 1 with the probability p and the value of 0 with the probability
q = 1 − p. This problem is elaborated in detail in Example 7.33.
Here, we will consider the problem of the probability p estimation within the Bayes framework.
In this case, Bayes’ relation can be written in the form
P(p|B) = \frac{P(B|p)\,P(p)}{P(B)}, (7.39)
where the hypothesis is that the probability p takes a particular value from the given set of all possible
values and that the event B occurred assuming this probability. The terms in this expression are:
• Prior P(p) for the hypothesis p. It has to be assumed based on our possible knowledge about the resulting p:
1. If all values of p are equally probable, then the uniform prior is assumed, P(p) = C, for a discrete set of possible values 0 ≤ p ≤ 1 with a step of, for example, Δp = 0.01, where C is a constant (as will be shown, not relevant).
2. If we expect that the value of p is close to 0.5, then we can assume, for example, the prior P(p) = 2C(1 − 2|p − 0.5|) for 0 ≤ p ≤ 1, with the step Δp for p, or
3. the Gaussian prior form

P(p) = C\,\frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{(p-0.5)^2}{2\sigma^2}},

calculated at the discrete values of p, with the step Δp.
• The likelihood factor P(B|p) is equal to the probability that the event B occurred for the assumed value of p. For the random binary signal, the event B denotes a realization which consists of k samples x(n) = 1 and N − k samples x(n) = 0. The probability of this event B, for the assumed p, is

P(B|p) = \binom{N}{k}\,p^k(1-p)^{N-k}. (7.40)

• The probability of observing the data specified by the event B (k times x(n) = 1 and N − k times x(n) = 0), summed over all hypotheses (all possible values of p), is

P(B) = \sum_{p} \binom{N}{k}\,p^k(1-p)^{N-k}\,P(p).
It is common to avoid the last (marginal) probability, P(B), in (7.39), which does not depend on p (it plays the role of a normalization factor), and to consider only the so-called posterior

P(p|B) \propto P(B|p)\,P(p).

For the probabilistic interpretation, the posterior is normalized so that

\sum_{p} P(p|B) = 1.
Example 7.23. Here, we will calculate the posterior of the hypothesis p, P(p|B) ∝ P(B|p)P(p), for the binary signal x(n) when the evidence B consists of N signal samples, with x(n) = 1 appearing k times. The posterior P(p|B) is updated as N increases. The following events are analyzed:
(a) The event B of N = 6 samples x (n), with k = 2 samples taking the value x (n) = 1.
(b) The event B when the number of available samples (observations) is increased to N = 50
and k = 9 times x (n) = 1 is obtained.
(c) The event B with a large number of available samples, N = 1000, when k = 220 nonzero
samples, x (n) = 1, are observed.
For the hypothesis p use:
(i) the uniform prior P(p) = C, and
(ii) the Gaussian prior P(p) = C\exp(-(p-0.5)^2/(2\sigma^2))/(\sigma\sqrt{2\pi}), with σ = 0.05,
and the set of values 0 ≤ p ≤ 1 with the step Δp = 0.01. The value of the constant C is not relevant, since the results are normalized.
⋆ The results for the posterior, P(p|B) ∝ P(B|p)P(p), with P(B|p) defined in (7.40), are shown in Fig. 7.15, for the uniform prior P(p) (left) and for the Gaussian prior P(p) (right), for various p and the given N and k in (a), (b), and (c), respectively.
We can see that the hypothesis’ probability is influenced by the prior distribution. When
the evidence is large (with large N, as in (c)) both cases produce highly concentrated probability
descriptions (denoted by (c)) close to the expected value of the parameter p.
For the probabilistic interpretation, the values of P( p| B) in Fig. 7.15 should be normalized
so that ∑ p P( p| B) = 1 for every considered case (bottom panels).
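The posterior update described in this example can be sketched in a few lines of Python (NumPy assumed; the grid, priors, and the case N = 50, k = 9 follow the example):

    import numpy as np

    N, k = 50, 9                         # observed: k ones in N binary samples
    p = np.arange(0, 1.01, 0.01)         # hypothesis grid with step 0.01

    likelihood = p**k * (1 - p)**(N - k)           # binomial coefficient is constant in p
    prior_uniform = np.ones_like(p)
    prior_gauss = np.exp(-(p - 0.5)**2 / (2 * 0.05**2))

    for prior in (prior_uniform, prior_gauss):
        posterior = likelihood * prior
        posterior /= posterior.sum()     # normalize so that sum_p P(p|B) = 1
        print(p[np.argmax(posterior)])   # 0.18 for the uniform prior; pulled toward 0.5 by the Gaussian prior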
The maximum position of the likelihood factor P(B|p) (the maximum likelihood estimate) can be found in an analytic way. From

\frac{dP(B|p)}{dp} = 0

follows k(1 − p) = (N − k)p, or p = k/N. This solution also holds for the maximization of the posterior P(p|B) with the uniform prior P(p). However, for other prior functions, the maximum of the posterior depends on the prior function, especially for low N. The influence of the prior will be analyzed next.
Log-Likelihood Function. A specific form of the binary random signal, called the Bernoulli random signal, will be used to introduce a few more important concepts in Bayesian analysis.
For the Bernoulli random signal, the probability that the signal sample x(n) takes one specific value, with the assumed parameter value p (hypothesis), can be written in a compact form as

P(x(n) \mid p) = p^{x(n)}(1-p)^{1-x(n)}, \quad x(n) \in \{0, 1\},

so that, for N independent samples \mathbf{x} = [x(1), x(2), \ldots, x(N)]^T,

P(\mathbf{x} \mid p) = \prod_{n=1}^{N} p^{x(n)}(1-p)^{1-x(n)}.
Figure 7.15 Bayesian approach based estimation of the probability p that a nonzero sample, x (n) = 1, is obtained
in the binary random signal for the different number of available samples (realizations) N and the number of nonzero
samples k: (a) N = 6, k = 2, (b) N = 50, k = 9, and (c) N = 1000, k = 220. The uniform prior P( p) is used for
the left panels and the Gaussian prior centered at p = 0.5 for the right panels. The step in p was ∆p = 0.01. All
shown probabilities are normalized to one (top panels). For the probabilistic interpretation, the values of P( p| B)
should be normalized so that ∑_p P(p|B) = 1 for each of the considered cases (bottom panels).
The aim is to find the maximum of P(x|p). We may use the derivative of P(x|p), or the derivative of its logarithm, since the logarithm is a monotonic function and will not change the maximum position of the considered positive function. In general, we have

\frac{d\ln(f(x))}{dx} = \frac{f'(x)}{f(x)},

meaning that both f(x) and ln(f(x)) produce the same result for the maximum position, f'(x) = 0.
The log-likelihood function is of the form

\ln(P(\mathbf{x}|p)) = \sum_{n=1}^{N} x(n)\ln(p) + \sum_{n=1}^{N} (1 - x(n))\ln(1-p).

The maximum of this function (the maximum likelihood estimate (MLE) of the model parameter p) is obtained from

p_{MLE} = \arg\max_p\{\ln(P(\mathbf{x}|p))\}.

It follows by setting the derivative of ln(P(x|p)) with respect to p equal to 0,

\frac{\sum_{n=1}^{N} x(n)}{p} - \frac{\sum_{n=1}^{N}(1 - x(n))}{1-p} = 0, \quad \text{that is,} \quad p_{MLE} = \frac{1}{N}\sum_{n=1}^{N} x(n) = \frac{k}{N}.
Frequentist versus Bayesian inference. In the previous analysis, we obtained a specific value for the hypothesis parameter p (this is the so-called frequentist inference). It does not give the probability of the hypothesis parameter p, as was the case in Fig. 7.15. The Bayesian analysis, presented in Fig. 7.15, is based on the posterior P(p|B) ∝ P(B|p)P(p), which results not in a single value of the parameter p, but in its probabilistic description (Bayesian inference). This probabilistic description includes our prior belief, P(p), and the current experiment outcome (evidence), P(B|p).
The maximum a posteriori (MAP) solution is obtained as the position of the maximum of the logarithm of the a posteriori probability P(p|B) ∝ P(B|p)P(p), that is,

p_{MAP} = \arg\max_p\{\ln P(B|p) + \ln P(p)\}.

Note that the maximum likelihood solution p_{MLE} is the special case of the maximum a posteriori solution p_{MAP} when the prior P(p) is uniform.
In many optimization approaches, the negative value of the logarithm is used as a cost function. Then, instead of the maximum position, the position of the minimum is the problem solution,

p_{MAP} = \arg\min_p\{-\ln P(B|p) - \ln P(p)\}.

In solving the minimization problem, various gradient-based algorithms can also be used.
The Gaussian (normal) noise is used to model a disturbance caused by many small independent factors. Namely, the central limit theorem states that the sum of a large number of statistically independent random variables, with any distribution (of finite variance), obeys the Gaussian distribution.
Figure 7.16 A realization of Gaussian noise (left) with its probability density function (right).
The variance of this noise is σ_ε² (see Problem 7.10). For a Gaussian distributed random signal with the mean value µ and the variance σ_ε², the probability density function is

p_{\varepsilon(n)}(\xi) = p_{\varepsilon(n)}(\xi|\mu, \sigma_\varepsilon) = \frac{1}{\sigma_\varepsilon\sqrt{2\pi}}\,e^{-(\xi-\mu)^2/(2\sigma_\varepsilon^2)}. (7.42)
Example 7.24. Consider N random signals (variables), x_i(n), i = 1, 2, . . . , N, that are independent and identically distributed (i.i.d.), with unit variance and zero mean. The probability density functions of the random signals x_i(n) are the same and equal to p_x(\xi). A new random signal y(n) is formed as the sum

y(n) = \frac{1}{\sqrt{N}}\,x_1(n) + \frac{1}{\sqrt{N}}\,x_2(n) + \cdots + \frac{1}{\sqrt{N}}\,x_N(n).

The factors 1/\sqrt{N} are introduced so that the variance of y(n) is \sigma_{y(n)}^2 = 1. Find the probability density function of the random signal y(n) when N → ∞.
⋆ The probability density function of the random signal x_i(n)/\sqrt{N} is \sqrt{N}\,p_x(\xi\sqrt{N}). The characteristic functions of x_i(n) and x_i(n)/\sqrt{N} are, respectively, (7.29)

\Phi_x(\theta) = 1 + j\theta M_1 - \frac{1}{2!}\theta^2 M_2 - j\frac{1}{3!}\theta^3 M_3 + \ldots = 1 - \frac{1}{2!}\theta^2 - j\frac{1}{3!}\theta^3 M_3 + \ldots,

\Phi_{x/\sqrt{N}}(\theta) = 1 + j\frac{\theta}{\sqrt{N}}M_1 - \frac{1}{2!}\frac{\theta^2}{N}M_2 - j\frac{1}{3!}\frac{\theta^3}{N^{3/2}}M_3 + \cdots = 1 - \frac{1}{2!}\frac{\theta^2}{N} - j\frac{1}{3!}\frac{\theta^3}{N^{3/2}}M_3 + \ldots,

since M_1 = 0 and M_2 = 1. The characteristic function of the sum y(n) of these independent random variables is the product of their characteristic functions,

\Phi_y(\theta) = \Big[\Phi_{x/\sqrt{N}}(\theta)\Big]^N \to e^{-\theta^2/2} \quad \text{as } N \to \infty,

since \lim_{N\to\infty}(1 - \frac{\theta^2}{2N})^N = e^{-\theta^2/2} and the higher-order terms vanish faster.
The inverse Fourier transform (with sign −) of e^{-\theta^2/2} is equal to the probability density function of the unit-variance Gaussian random variable,

\frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-\theta^2/2}\,e^{-j\theta\xi}\,d\theta = \frac{1}{\sqrt{2\pi}}\,e^{-\xi^2/2},

which proves the central limit theorem (CLT) for the sum of independent and identically distributed (i.i.d.) random variables.
The probability that the amplitude of a zero-mean Gaussian random variable takes a value smaller than λ is

\text{Probability}\{|\varepsilon(n)| < \lambda\} = \frac{1}{\sigma_\varepsilon\sqrt{2\pi}}\int_{-\lambda}^{\lambda} e^{-\xi^2/(2\sigma_\varepsilon^2)}\,d\xi = \text{erf}\Big(\frac{\lambda}{\sqrt{2}\,\sigma_\varepsilon}\Big), (7.43)

where

\text{erf}(\lambda) = \frac{2}{\sqrt{\pi}}\int_{0}^{\lambda} e^{-\xi^2}\,d\xi
is the error function.
Commonly used probabilities that the absolute value of the Gaussian random variable is within one standard deviation, two standard deviations (two-sigma rule), or three standard deviations are

\text{Probability}\{-\sigma_\varepsilon < \varepsilon(n) < \sigma_\varepsilon\} = \text{erf}(1/\sqrt{2}) = 0.6827, (7.44)
\text{Probability}\{-2\sigma_\varepsilon < \varepsilon(n) < 2\sigma_\varepsilon\} = \text{erf}(2/\sqrt{2}) = 0.9545,
\text{Probability}\{-3\sigma_\varepsilon < \varepsilon(n) < 3\sigma_\varepsilon\} = \text{erf}(3/\sqrt{2}) = 0.9973.
Figure 7.17 Probability density function with the intervals corresponding to −σε < ε(n) < σε , −2σε < ε(n) < 2σε ,
and −3σε < ε(n) < 3σε . Value of σε = 1 is used.
Example 7.25. Given 12 measurements of a Gaussian zero-mean noise ε(n) ∈ {−0.7519, 1.5163,
−0.0326, −0.4251, 0.5894, −0.0628, −2.0220, −0.9821, 0.6125, −0.0549, −1.1187, 1.6360}.
Estimate the sample standard deviation of this data and use it to estimate the probability that the
absolute value of this noise will be smaller than 2.5.
⋆The standard deviation of this noise could be estimated using (7.7) with µ = 0 and M = 12 (see also Section 7.4.5). Its value is σ̂ = 1.031. Thus, the absolute value of this noise will be smaller than 2.5 with the probability

P\{|\varepsilon(n)| < 2.5\} = \frac{1}{1.031\sqrt{2\pi}}\int_{-2.5}^{2.5} e^{-\xi^2/(2\cdot 1.031^2)}\,d\xi = \text{erf}\big(2.5/(\sqrt{2}\cdot 1.031)\big) = 0.9847.
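The same computation in Python (the math.erf function from the standard library; data from the example):

    import numpy as np
    from math import erf, sqrt

    eps = np.array([-0.7519, 1.5163, -0.0326, -0.4251, 0.5894, -0.0628,
                    -2.0220, -0.9821, 0.6125, -0.0549, -1.1187, 1.6360])

    sigma = sqrt(np.mean(eps**2))           # zero-mean assumed, so no mean removal
    print(sigma)                            # approx 1.031
    print(erf(2.5 / (sqrt(2) * sigma)))     # approx 0.9847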
Example 7.26. The random signal x (n) is a Gaussian noise with the mean value µ x = 1 and the
variance σx2 = 1. The random sequence y(n) is obtained by omitting samples from the signal
x (n) that are either negative or higher than 1. Find the probability density function of sequence
y(n). Find its mean and variance, µy and σy .
Estimation of the Gaussian distribution parameters based on the observed set of the signal values will
be presented in this section, using the maximum likelihood approach.
Stationary signal. Consider a stationary random Gaussian distributed signal x(n) whose N samples are available. The probability density function of a signal sample x(n) is defined by

p_{x(n)}(\xi|\sigma, \mu) = \frac{1}{\sigma\sqrt{2\pi}}\exp\Big(-\frac{(\xi-\mu)^2}{2\sigma^2}\Big),

where σ and µ are the assumed (unknown) parameters of the Gaussian distribution.
For statistically independent samples, the joint probability density function is

p_{x(1),\ldots,x(N)}(\xi_1, \ldots, \xi_N|\sigma, \mu) = \frac{1}{\sigma\sqrt{2\pi}}\exp\Big(-\frac{(\xi_1-\mu)^2}{2\sigma^2}\Big) \times \cdots \times \frac{1}{\sigma\sqrt{2\pi}}\exp\Big(-\frac{(\xi_N-\mu)^2}{2\sigma^2}\Big). (7.45)

The vector form of this probability density function is

p_{\mathbf{x}}(\boldsymbol{\xi}|\sigma, \mu) = \frac{1}{\sigma^N\sqrt{(2\pi)^N}}\exp\Big(-\frac{\|\boldsymbol{\xi} - \mu\|_2^2}{2\sigma^2}\Big),

where \mathbf{x} = [x(1), x(2), \ldots, x(N)]^T, \boldsymbol{\xi} = [\xi_1, \xi_2, \ldots, \xi_N]^T, and \|\boldsymbol{\xi} - \mu\|_2^2 = (\xi_1-\mu)^2 + \cdots + (\xi_N-\mu)^2.
Within the maximum likelihood estimation (MLE) framework, the goal is to find the unknown (prior) parameters σ and µ so that the distribution best fits the observed data x = [x(1), x(2), . . . , x(N)]^T. The probability of σ and µ, given the observed random signal samples, P(σ, µ|x), can be written using Bayes' relation for the posterior distribution, as in (7.16) and (7.39),

P(\sigma, \mu|\mathbf{x}) = \frac{P(\mathbf{x}|\sigma, \mu)\,P(\sigma, \mu)}{P(\mathbf{x})}.
Since P(x) does not depend on the parameters σ and µ, this (marginal) probability does not influence the optimization with regard to the parameters σ and µ, and is commonly omitted from the analysis. Furthermore, using the uniform prior, P(σ, µ) = c, we can write

P(\sigma, \mu|\mathbf{x}) \propto P(\mathbf{x}|\sigma, \mu).

The probability that the random signal x(n) takes the specific values given by x = [x(1), x(2), . . . , x(N)]^T is

P(\mathbf{x}|\sigma, \mu) = p_{\mathbf{x}}(\mathbf{x}|\sigma, \mu)\,d\mathbf{x} = \frac{1}{\sigma^N\sqrt{(2\pi)^N}}\exp\Big(-\frac{\|\mathbf{x} - \mu\|_2^2}{2\sigma^2}\Big)\,d\mathbf{x}.
Therefore, the best fitting parameter values (σ, µ) can be obtained by maximizing the likelihood function p_x(x|σ, µ). For the Gaussian distributed random signal, the maximization is performed straightforwardly, by differentiating the probability density (likelihood) function p_x(x|σ, µ) or its logarithm (the log-likelihood function).
We will use the negative logarithm function, when the likelihood maximization problem is equivalent to the minimization problem stated as

(\sigma, \mu)_{MLE} = \arg\min_{\sigma,\mu}\big\{-\ln p_{\mathbf{x}}(\mathbf{x}|\sigma, \mu)\big\}. (7.46)

This means that we have to minimize the cost function −ln p_x(x|σ, µ), defined by

J(\sigma, \mu) = -\ln p_{\mathbf{x}}(\mathbf{x}|\sigma, \mu) = \frac{N}{2}\ln(2\pi) + N\ln(\sigma) + \frac{\|\mathbf{x} - \mu\|_2^2}{2\sigma^2}, (7.47)
where \|\mathbf{x} - \mu\|_2^2 is the squared two-norm (L_2-norm) of the vector x − µ. Using ∂J(σ, µ)/∂µ = 0, an estimate of the parameter µ is obtained from

2(x(1) - \mu) + 2(x(2) - \mu) + \cdots + 2(x(N) - \mu) = 0, \quad \text{as} \quad \hat{\mu} = \frac{1}{N}\big(x(1) + x(2) + \cdots + x(N)\big),

while using ∂J(σ, µ)/∂σ = 0, an estimate of the parameter σ is obtained from

\frac{N}{\sigma} - \|\mathbf{x} - \mu\|_2^2\,\frac{1}{\sigma^3} = 0, \quad \text{as} \quad \hat{\sigma}^2 = \frac{1}{N}\|\mathbf{x} - \hat{\mu}\|_2^2.
These are the well-known statistical relations for the mean value and the variance introduced
intuitively (frequentist inference) in Section 7.1. In Bayesian inference, we should provide P(σ, µ|x) ∝
px (x|σ, µ) P(σ, µ), for an assumed prior P(σ, µ) and a set of possible values for µ and σ, rather than
their specific values (as in the next example).
Example 7.27. The concept of finding the parameters µ and σ of the Gaussian distribution, to fit
data, is illustrated on a simple data set. Assume that four observations of a Gaussian stationary
signal x (n) are available, and given by x (1) = 0.2, x (2) = −0.3, x (3) = −0.4, and x (4) = 0.5.
Estimate the expected value, µ, and the variance, σ2 , of the Gaussian distribution from the
observed data. The data set is then increased to N = 20 available samples, whose values are given
in Fig. 7.18(right).
Find the posterior distribution P(σ, µ|x) for the discrete sets −1 ≤ µ ≤ 1 and 0.1 ≤ σ ≤ 1,
with the step 0.05, and the uniform prior P(σ, µ) = C.
⋆ The log-likelihood function of the joint distribution of the observed data is given by (7.47). The differentiation of J(σ, µ) with respect to µ results in

\hat{\mu} = \frac{1}{4}\big(x(1) + x(2) + x(3) + x(4)\big) = 0,

while the differentiation of J(σ, µ) with respect to σ results in

\frac{4}{\sigma} - \big((x(1)-\mu)^2 + (x(2)-\mu)^2 + (x(3)-\mu)^2 + (x(4)-\mu)^2\big)\frac{1}{\sigma^3} = 0,

\hat{\sigma} = \frac{1}{2}\sqrt{0.2^2 + 0.3^2 + 0.4^2 + 0.5^2} = 0.3674.
The Bayesian inference approach, for the uniform prior P(σ, µ) = C, would produce the probability

P(\sigma, \mu|\mathbf{x}) \propto p_{\mathbf{x}}(\mathbf{x}|\sigma, \mu)\,P(\sigma, \mu) \propto p_{\mathbf{x}}(\mathbf{x}|\sigma, \mu) = \frac{1}{\sigma^N\sqrt{(2\pi)^N}}\exp\Big(-\frac{\|\mathbf{x} - \mu\|_2^2}{2\sigma^2}\Big),
as shown in Fig. 7.18(left) for given x = [2, −3, −4, 5] T /10 and variable µ and σ.
In order to show the influence of the number of samples on the reliability of the result for µ
and σ we have also calculated P(σ, µ|x) ∝ px (x|σ, µ) for N = 20 available signal samples and
discrete sets −1 ≤ µ ≤ 1 and 0.1 ≤ σ ≤ 1, with the step 0.05, Fig. 7.18(right).
Both of these sets of the available samples x (n), with N = 4 and N = 20, produce almost
the same result in the frequentist inference approach, µ ≈ 0.00 and σ ≈ 0.37, while their posterior
distributions P(σ, µ|x) are quite different.
x = [2, −3, −4, 5] T /10, x = [2, −3, −4, 5, 8, 1, −1, 3, 0, −7, 0, −5, 2, 4, −1, 4, −6, −1, 1, −1] T /10.
Figure 7.18 Bayesian inference approach based estimation of the parameters µ and σ in the Gaussian random
signal for different numbers of the available samples (realizations) N. All shown probabilities P(σ, µ|x) are
normalized so that ∑σ ∑µ P(σ, µ|x) = 1 holds for considered cases. The uniform prior, P(σ, µ) = C, is used.
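The frequentist point estimates and the posterior grid from this example can be reproduced with a minimal Python sketch (NumPy assumed; the N = 4 data set from the example is used):

    import numpy as np

    x = np.array([2, -3, -4, 5]) / 10                 # observed samples, N = 4
    N = len(x)

    # Frequentist (MLE) point estimates
    mu_mle = x.mean()
    sigma_mle = np.sqrt(np.mean((x - mu_mle)**2))
    print(mu_mle, sigma_mle)                          # 0.0 and approx 0.3674

    # Bayesian posterior on a (mu, sigma) grid, uniform prior
    mu = np.arange(-1, 1.05, 0.05)
    sigma = np.arange(0.1, 1.05, 0.05)
    M, S = np.meshgrid(mu, sigma)
    loglik = -N * np.log(S) - ((x[:, None, None] - M)**2).sum(axis=0) / (2 * S**2)
    post = np.exp(loglik - loglik.max())
    post /= post.sum()                                # normalized posterior P(sigma, mu | x)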
Example 7.28. A noisy random variable x(n) is a linear function of M independent variables t_i(n), i = 1, 2, . . . , M, so that, in vector notation, x = Ta + ε, where ε(n) is a zero-mean i.i.d. Gaussian noise with

p_{\varepsilon(1),\ldots,\varepsilon(N)}(\xi_1, \ldots, \xi_N|\sigma) = \frac{1}{\sigma\sqrt{2\pi}}\exp\Big(-\frac{\xi_1^2}{2\sigma^2}\Big) \times \cdots \times \frac{1}{\sigma\sqrt{2\pi}}\exp\Big(-\frac{\xi_N^2}{2\sigma^2}\Big).

Having in mind that ε = x − Ta, and as explained in this section, the best fitting parameters are obtained by minimizing

J(\sigma, \mathbf{a}, \mathbf{T}) = \frac{N}{2}\ln(2\pi) + N\ln(\sigma) + \frac{\|\mathbf{x} - \mathbf{T}\mathbf{a}\|_2^2}{2\sigma^2}, (7.48)

where T is the matrix whose rows are the vectors t(n).
The minimization with respect to a produces the result a = (T^T T)^{-1} T^T x, as in (7.12). This maximum a posteriori (MAP) solution corresponds to the uniform prior P(σ, a) and is equal to the MLE solution. If a nonuniform prior P(σ, a) for σ and a is added, then the posterior probability is

P(\sigma, \mathbf{a}|\mathbf{x}) \propto p_{\mathbf{x}}(\mathbf{x}|\sigma, \mathbf{a})\,P(\sigma, \mathbf{a}).

Using, for example, a prior of the form P(\mathbf{a}) \propto e^{-\lambda\|\mathbf{a}\|_1}, which penalizes high values of the elements and enforces the solution with the maximum possible number of zero-valued elements in the vector of coefficients a, we get

\mathbf{a}_{MAP} = \arg\min_{\mathbf{a}}\Big\{\frac{\|\mathbf{x} - \mathbf{T}\mathbf{a}\|_2^2}{2\sigma^2} + \lambda\|\mathbf{a}\|_1\Big\}.
The case of a nonstationary Gaussian random signal, when we cannot assume either that the expected value and the variance of the samples are time-invariant or that the samples are statistically independent, is more complex. This case will be considered in Part VI.
Consider the signal x = [x(1), x(2), . . . , x(N)]^T and its true parameter θ, whose unbiased estimate, obtained using the data in x, is θ̂(x). For the unbiased estimate,

E\{\hat{\theta}(\mathbf{x}) - \theta\} = 0,
and therefore

\frac{\partial}{\partial\theta}\,E\{\hat{\theta}(\mathbf{x}) - \theta\} = 0 \quad \text{for all } \theta.
Assuming that the probability density function of x, with an assumed θ, is p_x(x|θ), we can write

\frac{\partial}{\partial\theta}\int_{-\infty}^{\infty} (\hat{\theta}(\mathbf{x}) - \theta)\,p_{\mathbf{x}}(\mathbf{x}|\theta)\,d\mathbf{x} = 0 \quad \text{for all } \theta.

After the differentiation is performed, this equation can be rewritten in the form

-\int_{-\infty}^{\infty} p_{\mathbf{x}}(\mathbf{x}|\theta)\,d\mathbf{x} + \int_{-\infty}^{\infty} (\hat{\theta}(\mathbf{x}) - \theta)\,\frac{\partial p_{\mathbf{x}}(\mathbf{x}|\theta)}{\partial\theta}\,d\mathbf{x} = 0, \quad \text{or}

\int_{-\infty}^{\infty} (\hat{\theta}(\mathbf{x}) - \theta)\,\frac{\partial p_{\mathbf{x}}(\mathbf{x}|\theta)}{\partial\theta}\,d\mathbf{x} = 1, (7.51)
since the first integral is equal to 1, by definition. We know that the derivative of the logarithm of the function p_x(x|θ) gives

\frac{\partial p_{\mathbf{x}}(\mathbf{x}|\theta)}{\partial\theta} = p_{\mathbf{x}}(\mathbf{x}|\theta)\,\frac{\partial\ln(p_{\mathbf{x}}(\mathbf{x}|\theta))}{\partial\theta}.

Now, we adjust the form of this relation for the application of the Schwartz inequality,

\bigg(\int_{-\infty}^{\infty} \Big[(\hat{\theta}(\mathbf{x}) - \theta)\sqrt{p_{\mathbf{x}}(\mathbf{x}|\theta)}\Big]\Big[\frac{\partial\ln(p_{\mathbf{x}}(\mathbf{x}|\theta))}{\partial\theta}\sqrt{p_{\mathbf{x}}(\mathbf{x}|\theta)}\Big]d\mathbf{x}\bigg)^2 = 1. (7.52)
According to the Schwartz inequality, \big(\int f(x)g(x)\,dx\big)^2 \le \int f^2(x)\,dx \int g^2(x)\,dx, it follows that

1 = \bigg(\int_{-\infty}^{\infty} \Big[(\hat{\theta}(\mathbf{x}) - \theta)\sqrt{p_{\mathbf{x}}(\mathbf{x}|\theta)}\Big]\Big[\frac{\partial\ln(p_{\mathbf{x}}(\mathbf{x}|\theta))}{\partial\theta}\sqrt{p_{\mathbf{x}}(\mathbf{x}|\theta)}\Big]d\mathbf{x}\bigg)^2
\le \int_{-\infty}^{\infty} (\hat{\theta}(\mathbf{x}) - \theta)^2\,p_{\mathbf{x}}(\mathbf{x}|\theta)\,d\mathbf{x} \int_{-\infty}^{\infty} \Big(\frac{\partial\ln(p_{\mathbf{x}}(\mathbf{x}|\theta))}{\partial\theta}\Big)^2 p_{\mathbf{x}}(\mathbf{x}|\theta)\,d\mathbf{x}.
Denoting

I(\theta) = E\Big\{\Big(\frac{\partial\ln(p_{\mathbf{x}}(\mathbf{x}|\theta))}{\partial\theta}\Big)^2\Big\},

we finally get the Cramer-Rao bound for the variance of the estimated parameter,

\text{Var}(\hat{\theta}(\mathbf{x})) \ge \frac{1}{E\big\{\big(\partial\ln(p_{\mathbf{x}}(\mathbf{x}|\theta))/\partial\theta\big)^2\big\}} = \frac{1}{I(\theta)}.

The equality in the Schwartz inequality holds if the two functions are proportional, that is,

\hat{\theta}(\mathbf{x}) - \theta = k\,\frac{\partial\ln(p_{\mathbf{x}}(\mathbf{x}|\theta))}{\partial\theta},

where the factor \sqrt{p_{\mathbf{x}}(\mathbf{x}|\theta)} on both sides is omitted. The constant k is obtained as k = 1/I(\theta) from the condition that the integral in (7.52) is equal to 1 for \hat{\theta}(\mathbf{x}) - \theta = k\,\partial\ln(p_{\mathbf{x}}(\mathbf{x}|\theta))/\partial\theta. Therefore, for the optimal estimator and the minimal variance, the following equality

\frac{\partial\ln(p_{\mathbf{x}}(\mathbf{x}|\theta))}{\partial\theta} = I(\theta)\,(\hat{\theta}(\mathbf{x}) - \theta) (7.54)

holds. This relation can be used to find the optimal estimator, \hat{\theta}(\mathbf{x}), and the minimal variance, 1/I(\theta), without the evaluation of the second-order derivative in

E\Big\{\frac{\partial^2\ln(p_{\mathbf{x}}(\mathbf{x}|\theta))}{\partial\theta^2}\Big\} = -I(\theta). (7.55)
Example 7.29. Consider the signal x (n) = s(n) + ε(n), where ε(n) is a zero-mean Gaussian noise.
The aim is to estimate a parameter a of the sinusoidal signal s(n), for example, its amplitude,
frequency, or phase, from N samples of the signal, x = [ x (1), x (2), . . . , x ( N )] T . Find the
minimum variance estimator and the Cramer-Rao bound.
⋆The random variable x(n) − s(n) = ε(n) is Gaussian distributed. For N statistically independent values of the error ε(n), with the assumed parameter value a, it holds that

p_{\mathbf{x}}(x(1), \ldots, x(N)|a) = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{(x(1)-s(1|a))^2}{2\sigma^2}} \times \cdots \times \frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{(x(N)-s(N|a))^2}{2\sigma^2}}, (7.56)

or in vector form

p_{\mathbf{x}}(\mathbf{x}|a) = \frac{1}{\sigma^N\sqrt{(2\pi)^N}}\exp\Big(-\frac{\|\mathbf{x} - \mathbf{s}|a\|_2^2}{2\sigma^2}\Big),

where s(n|a) is the considered signal with the assumed parameter a, and s|a is its vector form.
The log-likelihood function for this random signal is

J(\mathbf{x}|a) = -\ln p_{\mathbf{x}}(\mathbf{x}|a) = \frac{N}{2}\ln(2\pi) + N\ln(\sigma) + \frac{\|\mathbf{x} - \mathbf{s}|a\|_2^2}{2\sigma^2}. (7.57)
(a) In the case when we want to estimate the amplitude a of the sinusoidal signal s(n|a) = a\cos(2\pi nk_0/N), the derivative of the log-likelihood function is

-\frac{\partial J(\mathbf{x}|a)}{\partial a} = \frac{1}{\sigma^2}\sum_{n=1}^{N}\big(x(n) - a\cos(2\pi nk_0/N)\big)\cos(2\pi nk_0/N)
= \frac{N}{2\sigma^2}\Big(\frac{2}{N}\sum_{n=1}^{N} x(n)\cos(2\pi nk_0/N) - a\Big), (7.59)

since \sum_{n=1}^{N} a\cos^2(2\pi nk_0/N) = aN/2. Comparing relation (7.59) with (7.54), we can conclude that the optimal estimator is the cosine transform

\hat{a} = g(\mathbf{x}) = \frac{2}{N}\sum_{n=1}^{N} x(n)\cos(2\pi nk_0/N),

while for the minimum variance, in this way, we confirm the previous result

\text{Var}\{\hat{a}\} \ge \frac{2\sigma^2}{N}.
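The estimator and its variance can be checked against the bound numerically (a minimal Python sketch, NumPy assumed; N, k₀, a, and σ are arbitrary example values):

    import numpy as np

    rng = np.random.default_rng(0)
    N, k0, a, sigma = 64, 5, 1.7, 1.0
    n = np.arange(1, N + 1)
    c = np.cos(2 * np.pi * n * k0 / N)

    # Many noisy realizations of x(n) = a cos(2 pi n k0 / N) + noise
    x = a * c + sigma * rng.standard_normal((10_000, N))
    a_hat = (2 / N) * (x @ c)               # cosine-transform estimator per realization

    print(a_hat.mean())                     # approx a = 1.7 (unbiased)
    print(a_hat.var(), 2 * sigma**2 / N)    # both approx the bound 0.03125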
(b) When the parameter a is the frequency of the sinusoidal signal, s(n|a) = \sin(an), we have \partial s(n|a)/\partial a = n\cos(an), and the bound for the variance of the frequency estimation is

\text{Var}\{\hat{a}\} \ge \frac{\sigma^2}{\sum_{n=1}^{N}\big(\partial s(n|a)/\partial a\big)^2} = \frac{\sigma^2}{\sum_{n=1}^{N} n^2\cos^2(an)}.
The Cramer-Rao bound is shown in Fig. 7.19 for N = 10 and N = 50, and various values of a,
with σ2 = 1.
Figure 7.19 Cramer-Rao bound for the variance of the frequency estimation.
Example 7.30. Consider the signal x(t_n) = a t_n + ε(n), n = 1, 2, . . . , N, where ε(n) is a zero-mean Gaussian noise. The goal is to revisit the linear regression model and the estimation of the parameter a and its variance. What is the optimal estimator for a from the available data x(t_n), given in the vector x, for the instants being elements of the vector t? What is the variance of the optimal estimator of a (the Cramer-Rao bound)?
⋆The cost function for this random signal, with zero-mean Gaussian noise ε(n), is

J(\mathbf{x}|a) = -\ln p_{\mathbf{x}}(\mathbf{x}|a) = \frac{N}{2}\ln(2\pi) + N\ln(\sigma) + \frac{\|\mathbf{x} - a\mathbf{t}\|_2^2}{2\sigma^2}. (7.61)
The first derivative is given by

-\frac{\partial J(\mathbf{x}|a)}{\partial a} = \frac{1}{\sigma^2}\sum_{n=1}^{N}\big(x(t_n) - a t_n\big)t_n = \frac{\|\mathbf{t}\|_2^2}{\sigma^2}\Big(\frac{\mathbf{t}^T\mathbf{x}}{\|\mathbf{t}\|_2^2} - a\Big).

When this expression is compared to I(\theta)(g(\mathbf{x}) - \theta) in (7.54), we get the optimal estimator form

\hat{a} = g(\mathbf{x}) = \frac{\mathbf{t}^T\mathbf{x}}{\|\mathbf{t}\|_2^2} = \frac{\sum_{n=1}^{N} t_n x(t_n)}{\sum_{n=1}^{N} t_n^2}, \quad \text{with} \quad \text{Var}\{\hat{a}\} \ge \frac{\sigma^2}{\|\mathbf{t}\|_2^2}.
Example 7.31. We can come to the Cramer-Rao relations in an inductive way, analyzing the mean value estimation for the Gaussian distributed random variable, presented in Section 7.4.5, with

p_{\mathbf{x}}(\mathbf{x}|\mu) = \frac{1}{\sigma^N\sqrt{(2\pi)^N}}\exp\Big(-\frac{\|\mathbf{x} - \mu\|_2^2}{2\sigma^2}\Big)

and the log-likelihood function (7.47), used for the estimation of the Gaussian distribution parameters,

J(\mu) = -\ln p_{\mathbf{x}}(\mathbf{x}|\mu) = \frac{N}{2}\ln(2\pi) + N\ln(\sigma) + \frac{\|\mathbf{x} - \mu\|_2^2}{2\sigma^2}. (7.62)

The estimated mean value follows from ∂J(µ)/∂µ = 0, with

\frac{\partial J(\mu)}{\partial\mu} = -\frac{\partial\ln p_{\mathbf{x}}(\mathbf{x}|\mu)}{\partial\mu} = -\frac{N}{\sigma^2}\Big(\frac{1}{N}\sum_{n=1}^{N} x(n) - \mu\Big), (7.63)

while the second-order derivative is

\frac{\partial^2 J(\mu)}{\partial\mu^2} = \frac{N}{\sigma^2}. (7.64)
In addition, it has been shown that an unbiased estimator attains the bound for all θ if and only if the first derivative of the log-likelihood function can be written in the form (7.54),

\frac{\partial\ln p_{\mathbf{x}}(\mathbf{x}|\theta)}{\partial\theta} = I(\theta)\,(g(\mathbf{x}) - \theta), (7.66)

where the estimator with the minimum variance is defined by \hat{\theta}(\mathbf{x}) = g(\mathbf{x}) and the minimum variance value is \text{Var}\{\hat{\theta}\} = \frac{1}{I(\theta)}.
Notice that the relation in (7.63) is of this form, with I(\mu) = N/\sigma^2 and g(\mathbf{x}) = \frac{1}{N}\sum_{n=1}^{N} x(n).
Cramer-Rao bound for the minimum variances in simultaneous estimation of more than one
parameter, for example, θ = (θ1 , θ2 , . . . , θK ), from the data in x can also be derived following similar
concepts.
The result of an experiment or calculation is commonly a random variable. When an estimate x(n) is provided, the main questions are how much confidence we can have in this specific value and how far from it the true (expected) value of the considered physical or mathematical quantity could be. The confidence interval provides a range within which the true value is estimated to lie. This interval quantifies the reliability of the presented estimate.
Consider a Gaussian distributed random variable as the most common case in practice. Assume that the experiment (or calculation) results are Gaussian distributed. The aim of the experiment is to estimate the unknown true value µ_x. For the Gaussian variable, it is known that any result of the experiment, x(n), will be within the interval

[\mu_x - 2\sigma_x,\ \mu_x + 2\sigma_x] (7.67)

with the probability of 0.9545 ≈ 0.95, according to (7.44). This probability is sufficient for most of the experiments. If required, the probability can be increased using wider intervals. Here, the unknown true value µ_x and the interval bounds are fixed values, without any randomness.
The confidence intervals are calculated for the specific outcome of the experiment, x(n), and the a priori estimated spread measure (here the standard deviation σ_x). The confidence interval is defined as

[x(n) - 2\sigma_x,\ x(n) + 2\sigma_x]. (7.68)
Obviously, the confidence interval is not the same as the interval in (7.67), meaning that the 0.95 probability of (7.67) does not mean that any result of the experiment will be within the confidence interval with the same probability. However, since the obtained result x(n) is within the interval in (7.67) with the probability of 0.95, it follows that the true value µ_x is within the confidence interval, [x(n) − 2σ_x, x(n) + 2σ_x], with the same probability, Fig. 7.20.
Example 7.32. A deterministic signal s(n), with an additive Gaussian noise ε(n), is observed at two
instants n1 and n2 . The standard deviation of the measurement method at the instant n1 was
σx (n1 ) = 0.5, while the standard deviation of the measurement method at n2 was σx (n2 ) = 0.2
(different estimation approaches were used, for example, different windows for averaging; for
the same measurement method, σx (n1 ) = σx (n2 ) would hold). The observed values in these two
measurements are denoted by x (n), and they are equal to:
(a) x (n1 ) = 1.1 and x (n2 ) = −0.2;
(b) x (n1 ) = −0.6 and x (n2 ) = 1.8.
Figure 7.20 The interval [µ x − 2σx , µ x + 2σx ] where the Gaussian random variable x (n) lies with the probability
of 0.95, along with the confidence intervals for various x (n) from the defined interval. The common point for all
these confidence intervals is the true mean value µ x (vertical line).
Could we conclude that the true signal s(n) has changed, that is, s(n_1) ≠ s(n_2), for these two cases? (For an experiment, this is the question of how confident we can be that a difference in the true result is obtained under different experiment conditions at n_1 and n_2.) The common probability of 0.95 is assumed for the confidence interval definition.
⋆ (a) For the signal values (experiment outcomes) x(n_1) = 1.1 and x(n_2) = −0.2, the corresponding confidence intervals are

[x(n_1) - 2\sigma_x(n_1),\ x(n_1) + 2\sigma_x(n_1)] = [1.1 - 1,\ 1.1 + 1] = [0.1, 2.1] (7.69)

and

[x(n_2) - 2\sigma_x(n_2),\ x(n_2) + 2\sigma_x(n_2)] = [-0.2 - 0.4,\ -0.2 + 0.4] = [-0.6, 0.2]. (7.70)

These two confidence intervals overlap, meaning that we cannot exclude the case that both true values, s(n_1) and s(n_2), are within the overlapping interval [0.1, 0.2] and that they take the same value within this overlapping interval.
(b) When the obtained signal values are x(n_1) = −0.6 and x(n_2) = 1.8, the corresponding confidence intervals are

[x(n_1) - 2\sigma_x(n_1),\ x(n_1) + 2\sigma_x(n_1)] = [-0.6 - 1,\ -0.6 + 1] = [-1.6, 0.4] (7.71)

and

[x(n_2) - 2\sigma_x(n_2),\ x(n_2) + 2\sigma_x(n_2)] = [1.8 - 0.4,\ 1.8 + 0.4] = [1.4, 2.2]. (7.72)

These two confidence intervals are clearly separated, meaning that the true signal values, s(n_1) and s(n_2), are different with a sufficiently high probability.
Example 7.33. Consider a random signal x (n) that can take values {No, Yes} or {0, 1} with
probabilities 1 − p and p. If a random realization of this signal is available with N = 1000
samples and we obtained that the event Yes appeared k = 555 times, find the interval where the
true p will be with the probability of 0.95.
Notes: The mean value of the samples x(n) is defined by

\hat{p} = \frac{1}{N}\big(x(1) + x(2) + \cdots + x(N)\big) = k/N.

For the binomial distribution, \binom{N}{k} p^k(1-p)^{N-k} with x(n) ∈ {0, 1}, the expected value of p̂ is

E\{\hat{p}\} = E\{k\}/N = \frac{pN}{N} = p.
The variance of p̂ is given by

\sigma_{\hat{p}}^2 = \text{Var}\{\hat{p}\} = \text{Var}\{k\}/N^2 = \frac{p(1-p)}{N}.
Figure 7.21 Binomial distribution for N = 1000 and p = 0.55 as a function of k (left) and the Gaussian distribution with the mean value pN and the variance σ² = p(1 − p)N (right).
⋆For the given observation, with k = 555 responses x(n) = 1, the expected value p of the binomially distributed random variable is estimated as

\hat{p} = \frac{k}{N} = \frac{555}{1000} = 0.555.
For the variance estimation we should know the exact value of p, which is not available. With the assumption that p̂ is not far from the exact p, we can use the value of p̂ in the variance calculation,

\sigma_{\hat{p}}^2 = \frac{p(1-p)}{N} \simeq \frac{\hat{p}(1-\hat{p})}{N} = \frac{\frac{555}{1000}\big(1 - \frac{555}{1000}\big)}{1000} = \frac{0.2470}{1000},

and σ̂_p̂ = 0.0157. Therefore, the estimated value p̂ = 0.555 is within the interval

\hat{p} = 0.555 \in [p - 2\hat{\sigma}_{\hat{p}},\ p + 2\hat{\sigma}_{\hat{p}}] = [p - 0.0314,\ p + 0.0314]

with the probability of 0.95, meaning that the true value p is within

p \in [\hat{p} - 0.0314,\ \hat{p} + 0.0314] = [0.5236, 0.5864]

with the same probability. The true value is around 55.5%, within a ±3.14% range (from 52.36% to 58.64%), with the probability of 0.95.
By increasing the value of N, we can reduce the margin of the estimation error (σ̂_p̂ ∝ 1/√N).
However, about 1000 values are commonly used for various opinion poll estimations.
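The calculation from this example in Python (values from the example):

    import numpy as np

    N, k = 1000, 555
    p_hat = k / N
    sigma_hat = np.sqrt(p_hat * (1 - p_hat) / N)

    # Two-sigma (approx. 0.95) confidence interval for the true p
    print(p_hat - 2 * sigma_hat, p_hat + 2 * sigma_hat)   # approx [0.5236, 0.5864]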
Bayesian analysis. Within the Bayes framework, the probability of the event B (k times x(n) = 1 (Yes) and N − k times x(n) = 0 (No)), with an assumed p, is equal to

P(B|p) = \binom{N}{k}\,p^k(1-p)^{N-k}. (7.73)

The posterior is

P(p|B) = P(B|p)P(p)/P(B) \propto P(B|p)P(p) \propto P(B|p),

for the assumed uniform prior P(p).
The value of the posterior P( p| B) is shown in Fig. 7.22 for 0 ≤ p ≤ 1 with a step of ∆p = 0.005.
From Fig. 7.22 we can conclude that the posterior, P(p|B), reaches its maximum at p = 0.555, while the region of significant P(p|B) values extends about 7 steps of Δp = 0.005 to the left and right of the maximum position, corresponding to 0.555 ± 7 · 0.005 = 0.555 ± 0.035.
Student's t-distribution. In the previous analysis, we assumed that the standard deviation is known. When the true value (mean) estimation is done based on a small number of samples, then the standard deviation has to be estimated as well. For the set of samples x(n), we have the mean and the variance estimates

\hat{\mu}_x(n) = \frac{1}{N}\big(x_1(n) + x_2(n) + \cdots + x_N(n)\big) (7.74)

\hat{\sigma}_x(n) = \sqrt{\frac{1}{N-1}\big(|x_1(n) - \hat{\mu}_x(n)|^2 + |x_2(n) - \hat{\mu}_x(n)|^2 + \cdots + |x_N(n) - \hat{\mu}_x(n)|^2\big)}. (7.75)
Figure 7.22 The posterior P(p|B), proportional to the binomial distribution for the event B when N = 1000 and k = 555, as a function of the probability p.
The new random variable

z(n) = \frac{\hat{\mu}_x(n) - \mu}{\hat{\sigma}_x(n)/\sqrt{N}},

in which both µ̂_x(n) and σ̂_x(n) are random, is t-distributed (Student's distribution). The t-distribution is defined for a given number of degrees of freedom, ν = N − 1, using gamma functions. For large ν it approaches the Gaussian distribution, while for ν = 1 (just two samples) it is quite heavy-tailed and equal to the Cauchy distribution (see Section 7.4.11). The interval −t_ν < z(n) < t_ν, within which a t-distributed random variable z(n) takes its value with the probability 0.95, is defined by

\text{Probability}\{-t_\nu < z(n) < t_\nu\} = 0.95 (7.76)
for the values of t_ν given in Table 7.4 for several values of ν = N − 1. We can conclude that the confidence intervals are very wide for small N (for example, six times wider for N = 2 than for N = 60), while they are almost the same as for the Gaussian distributed random variable for large N, for example, N ≥ 12.

ν = N − 1:  1       2      3      5      12     20     60     120
t_ν:        12.706  4.303  3.182  2.571  2.179  2.086  2.000  1.980

Table 7.4 Values of the interval bounds for the t-distribution for Probability{−t_ν < z(n) < t_ν} = 0.95.
Example 7.34. The available samples of the random signal x, with the elements x (n), are given by
(a) x = (0.93, 0.17, −0.69, −0.72),
(b) x = (0.93, 0.17, −0.69, −0.72, −0.57, −0.31, 0.27, 1.33, −1.33, 1.40, −0.57, −0.35, −0.64).
Find the confidence intervals, where the true mean of this random signal is expected with
the probability of 0.95.
⋆In this experiment, both the mean value and the variance are not known and should be estimated
based on the available data.
(a) For N = 4, ν = N − 1 = 3, and the available realizations of x(n), we have

\hat{\mu}_x = \frac{1}{4}\big(x(0) + x(1) + x(2) + x(3)\big) = -0.08,

\hat{\sigma}_x = \sqrt{\frac{1}{3}\big(|x(0) - \hat{\mu}_x|^2 + |x(1) - \hat{\mu}_x|^2 + |x(2) - \hat{\mu}_x|^2 + |x(3) - \hat{\mu}_x|^2\big)} = 0.79.

The confidence interval of the normalized and centered random signal

z(n) = \frac{\hat{\mu}_x - \mu}{\hat{\sigma}_x/\sqrt{4}} = \frac{-0.08 - \mu}{0.39},

for the probability of 0.95, is defined by −3.182 < z(n) < 3.182 (see Table 7.4), or

[-0.08 - 0.39 \times 3.182,\ -0.08 + 0.39 \times 3.182] = [-0.08 - 1.24,\ -0.08 + 1.24]. (7.77)
(b) For N = 13 and the available realizations of x(n), we have

\hat{\mu}_x = \frac{1}{13}\sum_{n=0}^{12} x(n) = -0.08,

\hat{\sigma}_x = \sqrt{\frac{1}{12}\sum_{n=0}^{12} |x(n) - \hat{\mu}_x|^2} = 0.82,

and the confidence interval, for ν = N − 1 = 12 and the probability of 0.95, is (see Table 7.4)

[-0.08 - 0.23 \times 2.179,\ -0.08 + 0.23 \times 2.179] = [-0.08 - 0.5,\ -0.08 + 0.5]. (7.78)
Although the same mean value is obtained in both cases, with similar standard deviations, the confidence interval in (b) shows that the experiment with N = 13 realizations produces a more reliable estimate of the true mean µ.
Repeat Example 7.27 with the data from this example and comment on the results within
the frequentist and the Bayesian framework.
Variance stabilization – Delta method. Consider again the Bernoulli random variable from Example 7.33. The estimate of the expected value, p, of the probability that x(n) = 1 will appear in the Bernoulli trial is given by

\hat{p} = \frac{k}{N} = \frac{1}{N}\sum_{n=1}^{N} x(n),

where k is the number of x(n) = 1 appearances in N samples. For large N, this estimate is approximately Gaussian distributed, Fig. 7.21, with the expected value E{p̂} = E{k}/N = pN/N = p and the variance

\sigma_{\hat{p}}^2 = \text{Var}\{\hat{p}\} = \text{Var}\{k\}/N^2 = p(1-p)/N.
The property that p̂ − p tends to a Gaussian distributed random variable can be written as

\hat{p} - p \xrightarrow{D} \mathcal{N}(0, \sigma_{\hat{p}}^2).

The problem in the confidence interval definition for p was that the variance σ_p̂² depends on the parameter p, which is to be estimated. This was solved in Example 7.33 using p ≃ p̂. Another approach to this problem is based on the so-called Delta method. This method states that, for any differentiable function g(x),

g(\hat{p}) - g(p) \xrightarrow{D} \mathcal{N}\big(0, (g'(p))^2\sigma_{\hat{p}}^2\big), (7.79)

for g'(p) ≠ 0. The proof is simple since, for a large data set size N, we can assume that p̂ − p is small, so that the linear Taylor series expansion of the function g(p̂) around p holds, that is,

g(\hat{p}) \approx g(p) + g'(p)(\hat{p} - p),

meaning that g(p̂) − g(p) behaves as (p̂ − p), for sufficiently large N, with the deterministic proportionality factor g'(p). Since Var{a(p̂ − p)} = a²Var{p̂}, from the previous relation we get (7.79). Choosing g(p) = \arcsin(\sqrt{p}), with

g'(p) = \frac{1}{2\sqrt{p(1-p)}},

we obtain

(g'(p))^2\sigma_{\hat{p}}^2 = (g'(p))^2\,p(1-p)/N = \frac{1}{4N}.
This means that the random variable \arcsin(\sqrt{\hat{p}}) is Gaussian distributed with the expected value \arcsin(\sqrt{p}) and the variance 1/(4N). The confidence interval for \arcsin(\sqrt{p}), with the probability of 0.95, is defined by the two-sigma rule,

\arcsin(\sqrt{p}) \in \Big[\arcsin(\sqrt{\hat{p}}) - \frac{2}{\sqrt{4N}},\ \arcsin(\sqrt{\hat{p}}) + \frac{2}{\sqrt{4N}}\Big]. (7.81)
The confidence interval for p is then obtained by taking the sine of both bounds and squaring the result,

p \in \Big[\sin^2\Big(\arcsin(\sqrt{\hat{p}}) - \sqrt{\tfrac{1}{N}}\Big),\ \sin^2\Big(\arcsin(\sqrt{\hat{p}}) + \sqrt{\tfrac{1}{N}}\Big)\Big]. (7.82)

In the case when \sin(\arcsin(\sqrt{\hat{p}}) - 1/\sqrt{N}) is negative, the zero value is used as the lower bound.
For the data from Example 7.33 we get

p \in \Big[\sin^2\Big(\arcsin(\sqrt{0.555}) - \sqrt{\tfrac{1}{1000}}\Big),\ \sin^2\Big(\arcsin(\sqrt{0.555}) + \sqrt{\tfrac{1}{1000}}\Big)\Big] = [0.5235, 0.5863].
This interval is almost the same as [0.5236, 0.5864], obtained in Example 7.33, using p ≃ p̂ in the
variance estimation.
The bootstrap is a simple method for statistical inference that exploits remarkable modern computing power, without relying on many assumptions about the random variable. The main idea is to estimate a statistic of the considered signal by increasing the number of signal realizations using the existing data and resampling. Here is the origin of the method's name, “pulling itself up by its own bootstraps.” In producing many realizations, the bootstrap method relies on resampling the existing signal with replacement.
The bootstrap method can be summarized as follows:
1. Consider a signal (data set) { x (n), n = 1, 2, . . . , N }, being a part of a much larger population { x (n), n = 1, 2, . . . , P}, P ≫ N. The aim is to provide a statistic as an estimate of the corresponding large-population parameter, using the available data set with N samples only.
2. The original data set x (n), n = 1, 2, . . . , N, is resampled into new signals of length M. We will consider the cases M = N/2 and M = N; the inference is performed based on these resampled data.
A new resampled realization of the signal is formed as follows: (a) A random signal sample from
x (n), n = 1, 2, . . . , N, is picked up and assigned to x1 (1). Then this sample is “returned” to the
original data set (so that it can be picked up again, by chance – resampling with replacement).
(b) A new signal sample is randomly picked up from the original set x (n), n = 1, 2, . . . , N and
assigned to x1 (2). This sample is also “returned” to the original data set. This procedure is
repeated M times to form a new resampled signal (bootstrap sample) x1 with M elements.
3. The desired statistic (in our example, we will consider the mean value) is estimated using x1 as
µ̂(1) = mean{x1 }.
4. Steps 2 and 3 are repeated for every xb , b = 1, 2, . . . , B, to get
$$\hat{\mu}(b) = \mathrm{mean}\{\mathbf{x}_b\}, \quad b = 1, 2, \dots, B,$$
as illustrated in the sketch below.
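A minimal Python sketch of this procedure follows. Since the data values of Table 7.1 are not reproduced here, a Gaussian sample with the statistics from Example 7.5 (mean 55.76, standard deviation 17.73) stands in for the original data set; this substitution is made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def bootstrap_means(x, M, B=1000):
    """Resample x with replacement B times (M samples each) and
    return the B bootstrap estimates of the mean."""
    x = np.asarray(x)
    idx = rng.integers(0, len(x), size=(B, M))  # resampling with replacement
    return x[idx].mean(axis=1)

# Stand-in for the N = 100 values of Table 7.1 (assumed Gaussian here)
x = rng.normal(55.76, 17.73, 100)
mu_b = bootstrap_means(x, M=50, B=1000)

# Confidence interval from the 0.05 and 0.95 levels of the bootstrap statistics
print(np.quantile(mu_b, [0.05, 0.95]), x.mean())
```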
Example 7.35. In order to introduce the basic definitions and principles of the bootstrap method we
will revisit the introductory Example 7.1 and the signal shown in Fig. 7.1, whose values are given
in Table 7.1. Here, we will assume that this set of N = 100 signal values, x (n), is a sample
of a large population with P ≫ N elements. The aim is to estimate the mean value of a large
population using the statistics of the available data set.
In order to perform the statistical analysis using the bootstrap, new realizations should be
created by resampling the original data with replacement. An illustration of this resampling is
given in Table 7.5 for M = 20 and B = 15. The new resampled signals, xb , are obtained by
sampling the original data with replacement, as described in Step 2. Consider, for example, x7 ,
given in the seventh column of this table. Note that the signal sample x (n) = 48 is repeated,
although there is only one sample x (n) = 48 in the original data set, while many other signal
values do not appear at all in this realization.
A set of B = 1000 resampled realizations of the original signal, xb , is formed next. The
bootstrap is applied to this data set with: (a) M = N/2 = 50 and (b) M = N = 100.
The results are shown in Fig. 7.23. We can see that the maximum of the normalized
histogram of B = 1000 values of µ̂(b) = mean{xb } is close to the sample mean value calculated
as the sample average of all 100 available data values. We can also conclude that the confidence
intervals can be estimated considering the probability distribution, Fµ , and its, for example, 0.05
and 0.95 levels.
Since the considered data in Example 7.1 are of Gaussian nature (which is not an assumption required by the bootstrap method), we can compare this result with the one obtained from the variance of the mean value estimate in the Gaussian distributed variable, (7.65), using the standard deviation calculated in Example 7.5, $\sigma_\mu = \hat{\sigma}_x/\sqrt{M} = 17.73/\sqrt{50} = 2.5$.
For the probability of 0.90 the confidence intervals for the Gaussian distribution would be
[55.76 − 1.65σµ , 55.76 + 1.65σµ ] = [51.63, 59.88]. The confidence intervals of the mean value
estimation obtained with the bootstrap method correspond with the theoretical ones for this
distribution.
Table 7.5
Bootstrap resampling of the signal x (n), n = 1, 2, . . . , 100 from Fig. 7.1. B = 15 new signals xb , b = 1, 2, . . . , B, are
formed. Every new signal is of length M = 20. New resampled signals are formed by randomly picking up a sample from x (n), n = 1, 2, . . . , N, then “returning” this sample into the original set (so that it can be picked up again, by chance), randomly picking up a second sample for xb , “returning” it, and so on, M times. This procedure is repeated for every xb , b = 1, 2, . . . , B. In practice, B is commonly large.
Figure 7.23 Bootstrap statistics of the mean value of a large population with the reduced set of available data
shown in Fig. 7.1. A large point on the horizontal axis stands for the sample average of the considered data set.
Hypothesis testing has been part of statistics since its foundations were established in the first part of the last century. The main goal of hypothesis testing is to provide a statistical decision based on the experimental data (random signal values). Although an answer to this kind of question can be obtained, in an indirect way, using the presented Bayes' inference or the confidence intervals, we will present here the original analysis due to its importance in signal processing and detection theory.
The basic concepts in the hypothesis testing are:
• Null hypothesis, H0 . It assumes that the tested event has not happened and that the experiment
result is obtained by pure chance.
• Alternative hypothesis H1 is contrary to the null hypothesis, meaning that the null hypothesis is
rejected and the experiment result is not obtained by pure chance.
• Level of significance shows how confident we are in the decision made about accepting or rejecting the null hypothesis, since a probability of 1 is not possible in this kind of testing. It is common to assume that the level of significance is equal to α = 0.05 or α = 0.01, corresponding to the probabilities of 0.95 or 0.99.
• Type I error or false-positive result, when the null hypothesis is rejected although this hypothesis was true.
• Type II error or false-negative result, when the null hypothesis is accepted, while this hypothesis was not true.
Example 7.36. Consider a multiple-choice test with 5 answers to each of N = 20 questions. Only one
of these 5 answers is correct for every question. The null hypothesis, H0 , is the assumption that
the person who answers the test does not have any knowledge of the test topic. Find the number
of the correct answers when the null hypothesis can be accepted with the probability of 0.95.
⋆ The probability of a correct answer to a specific question, if the null hypothesis holds, is
p = 1/5. The probability that the person will give k correct answers to N = 20 questions, with
the null hypothesis, is already calculated, (7.37), and it is equal to
$$P(k|H_0) = \binom{20}{k} p^k (1-p)^{20-k}.$$
These probabilities are calculated for every k and shown in Fig. 7.24(a). The probability
distribution is given in Fig. 7.24(b) and (c).
Figure 7.24 Hypothesis testing. (a) The logarithm of the probability of k correct answers with the null hypothesis,
P(k | H0 ). (b) Cumulative probability distribution of P(k| H0 ). (c) Values of the complementary probability
distribution, being equal to the probability that more than k correct answers will be given with the null hypothesis.
Now, for the given probability we can find the limit number, k, of correct answers if the null hypothesis is true. Obviously, it is k = 7 for α = 0.05. This means that if the person has given k < 7 correct answers, the decision should be that the null hypothesis (the person does not have any knowledge of the tested subject) is true, with a significance level of 0.05. The hypothesis rejection region is k ≥ 7.
For the significance level of α = 0.01, the hypothesis rejection region would be k ≥ 8.
For example, if the tested person provided k = 10 correct answers on this multiple-choice test, the so-called p-value of this result is equal to the probability that the considered experiment produces such an outcome or a more extreme one,
$$p = \sum_{k=10}^{20} P(k|H_0) = \sum_{k=10}^{20}\binom{20}{k} p^k (1-p)^{20-k} = 0.0026 < \alpha.$$
This means that for k = 10 correct answers, the null hypothesis should be rejected for both α = 0.05 and α = 0.01.
Finally, we will calculate the Type I error (false-positive result) for the case k = 10. It is equal to the probability that we have decided to reject the null hypothesis, since the p-value is p = 0.0026 < α, but that the person, in reality, does not have any knowledge in this area and the null hypothesis was true. The Type I error is equal to the probability that the null hypothesis holds and that there were k = 10 correct answers,
$$P(k|H_0) = \binom{20}{k} p^k (1-p)^{20-k} = 0.002,$$
meaning that about 1 person in 500 will achieve this kind of result.
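The limit numbers and the p-value of this example can be checked with a short Python sketch; the tail criterion below follows the complementary distribution from Fig. 7.24(c).

```python
from math import comb

N, p = 20, 1/5

# P(k | H0): probability of exactly k correct answers by pure guessing
P = [comb(N, k) * p**k * (1 - p)**(N - k) for k in range(N + 1)]

# Probability of MORE than k correct answers (panel (c) of Fig. 7.24);
# the limit number is the smallest k for which this tail drops below alpha
tail = lambda k: sum(P[k + 1:])
for alpha in (0.05, 0.01):
    k_lim = next(k for k in range(N + 1) if tail(k) < alpha)
    print(f"alpha={alpha}: limit number k = {k_lim}")   # 7 and 8, as in the text

# p-value for an observed result of k = 10 correct answers
print("p-value(k=10) =", sum(P[10:]))                   # ~0.0026
```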
In many practical cases, we may assume that the random variables (random signal samples) in
the considered experiment (in hypothesis testing, called population) are Gaussian distributed, under
the null hypothesis. This assumption also holds if the particular random variable is not Gaussian, but
the total number of samples is sufficiently large so that the distribution of the sample mean value is
approximately Gaussian, for example, as was the case in the poll analysis in Example 7.33, and
proven in Example 7.24.
Consider a random signal, x (n), whose probability density function is
$$p_x(\xi) = \frac{1}{\sigma_x\sqrt{2\pi}}\, e^{-(\xi-\mu_x)^2/(2\sigma_x^2)}$$
under the null hypothesis. The result of the experiment is the signal sample x (n) = A. Here, we may
consider three possible scenarios of practical interest for the null hypothesis rejection:
• The experiment result is not equal to the expected mean (two-sided test). This case corresponds
to the case when we want to make the decision if any constant value (positive or negative) is
added to the considered random variable under the null hypothesis. For the assumed level of
significance, the region of rejection is obtained from
$$\mathrm{Probability}\{|A - \mu_x| > \lambda\} = 1 - \operatorname{erf}\!\left(\frac{\lambda}{\sqrt{2}\,\sigma_x}\right) < \alpha.$$
For the significance level of α = 0.05, the rejection region for the null hypothesis is
| A − µ x | > 2σx .
• The experiment result is greater than the expected mean (right-tailed test). This scenario appears
when the aim is to establish if a certain action has increased the expected value in a positive
direction. Here, we are not interested in a possible decrease in the mean. For the assumed level
of significance, the region of rejection follows from
$$\mathrm{Probability}\{A - \mu_x > \lambda\} = \frac{1}{2}\left(1 - \operatorname{erf}\!\left(\frac{\lambda}{\sqrt{2}\,\sigma_x}\right)\right) < \alpha.$$
For the significance level of α = 0.05, the rejection region for the null hypothesis is
$$A - \mu_x > 1.645\,\sigma_x. \qquad (7.83)$$
• The experiment result is lower than the expected mean (left-tailed test). This is opposite to the
previous one.
Example 7.37. An author was selling 34 books on average per week. To improve the sales of his book,
the author designed and implemented an advertisement campaign. The following week, he sold
41 books. Can the author reject the null hypothesis (meaning that the advertisement campaign
had no impact on book sales) with a significance level of α = 0.05?
The number of sold books with the null hypothesis obeys the Poisson distribution (Section
7.4.12 and Problem 7.23)
$$P(x(n) = k|H_0) = \frac{\lambda^k e^{-\lambda}}{k!}$$
with λ = 34, which can be approximated (for large λ ≥ 20) by the Gaussian distribution with µ = λ = 34 and σ² = λ = 34, as illustrated in Fig. 7.27,
$$p(\xi|H_0) = \frac{1}{\sqrt{68\pi}}\, e^{-(\xi-34)^2/68}.$$
⋆ Since we are looking for a possible influence of the advertisement campaign on the increase in the number of sold books, we are interested in the right-tailed test, where the criterion for the hypothesis rejection, (7.83), requires
$$41 - 34 > 1.645\sqrt{34},$$
with the observed value A = 41. Since 1.645√34 = 9.5919 > 7, the author cannot reject the null hypothesis, meaning that the hypothesis that the advertisement campaign does not have any influence on the number of sold books cannot be rejected.
Example 7.38. The Fourier transform of a signal is presented in Fig. 7.25(left). The Fourier transform
elements of the noise only (null hypothesis) are zero-mean Gaussian random variables with the
variance σX2 = 1. For every element of the Fourier transform, X ( k ), test the null hypothesis and
indicate the elements for which this hypothesis can be rejected with significance level α = 0.001,
meaning that we can reject the hypothesis that there is no signal component at the considered
frequency index.
⋆ For the significance level of α = 0.001 the rejection region of the null hypothesis, for the
Gaussian random variable with the mean µ X and the variance σX2 , is
$$\mathrm{Probability}\{|X(k) - \mu_X| > \lambda\} = 1 - \operatorname{erf}\!\left(\frac{\lambda}{\sqrt{2}\,\sigma_X}\right) < 0.001,$$
$$|X(k) - \mu_X| > 3.2905\,\sigma_X \quad \text{or} \quad |X(k)| > 3.2905.$$
Therefore, the null hypothesis cannot be rejected for any Fourier transform element (µX = 0), except those at k ∈ {4, 10, 66, 71, 88}, as shown in Fig. 7.25(right), where the rejection region, for the significance level α = 0.001, is shaded.
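A short Python sketch of this test follows. The component amplitude of 5 added at the indicated frequency indices is a hypothetical value, chosen only so that the rejection threshold is exceeded; the noise model (real-valued, unit variance) follows the example.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 128

# Noise-only DFT model: zero-mean Gaussian with variance 1 in each element
X = rng.normal(0, 1, N)

# Hypothetical signal components added at a few frequency indices
for k in (4, 10, 66, 71, 88):
    X[k] += 5.0

# Reject H0 (noise only) where |X(k)| exceeds the alpha = 0.001 threshold
threshold = 3.2905
print("H0 rejected at k =", np.flatnonzero(np.abs(X) > threshold))
```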
When the mean value and the variance of the random variable in the experiment are not known in
advance, then the t-distribution (see Example 7.34) should be used.
In many applications the complex-valued Gaussian noise is used as a model for disturbance. Its form is ε(n) = ε r (n) + jε i (n), where ε r (n) and ε i (n) are real-valued Gaussian noises. Commonly, it is assumed that they are zero-mean, independent, identically distributed (i.i.d.), with variance σ²/2. The mean value of this noise is
$$\mu_\varepsilon = E\{\varepsilon(n)\} = E\{\varepsilon_r(n)\} + jE\{\varepsilon_i(n)\} = 0.$$
Figure 7.25 The null hypothesis testing for the Fourier transform of a signal with zero-mean Gaussian noise with
the variance σX2 = 1 (left). The null hypothesis rejection regions (shaded) for the random variable X (k) with the
significance level of α = 0.001, corresponding to | X (k )| > 3.2905.
The variance is
$$\sigma_\varepsilon^2 = E\{\varepsilon(n)\varepsilon^*(n)\} = E\{\varepsilon_r^2(n)\} + E\{\varepsilon_i^2(n)\} + j\left(E\{\varepsilon_i(n)\varepsilon_r(n)\} - E\{\varepsilon_r(n)\varepsilon_i(n)\}\right) = E\{\varepsilon_r^2(n)\} + E\{\varepsilon_i^2(n)\} = \sigma^2.$$
The amplitude of Gaussian noise |ε(n)| is an important parameter in many detection problems.
The probability density function of the complex-Gaussian noise amplitude is of the form
$$p_{|\varepsilon(n)|}(\xi) = \frac{2\xi}{\sigma^2}\, e^{-\xi^2/\sigma^2}\, u(\xi).$$
The probability density function p|ε(n)| (ξ ) is called the Rayleigh distribution.
In order to prove the previous relation, consider the probability density functions of ε r (n) and ε i (n). Since they are independent and equally distributed, their joint probability density function is
$$p_{\varepsilon_r,\varepsilon_i}(\xi,\zeta) = \frac{1}{\sigma^2\pi}\, e^{-(\xi^2+\zeta^2)/\sigma^2}.$$
With ξ = ρ cos α and ζ = ρ sin α (the Jacobian of the polar coordinate transformation is J = |ρ|), we get
$$P\left\{\sqrt{\varepsilon_r^2(n)+\varepsilon_i^2(n)} < \chi\right\} = \frac{1}{\sigma^2\pi}\int_0^{2\pi}\!\!\int_0^{\chi} e^{-\rho^2/\sigma^2}\rho\, d\rho\, d\alpha = \int_0^{\chi^2/\sigma^2} e^{-\lambda}\, d\lambda = \left(1 - e^{-\chi^2/\sigma^2}\right)u(\chi) = F_{|\varepsilon(n)|}(\chi).$$
$$p_{|\varepsilon(n)|}(\xi) = \frac{dF_{|\varepsilon(n)|}(\xi)}{d\xi} = \frac{2\xi}{\sigma^2}\, e^{-\xi^2/\sigma^2}\, u(\xi). \qquad (7.84)$$
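The Rayleigh form (7.84) can be verified by simulation; a minimal Python sketch follows (the sample size and bin range are arbitrary choices made for illustration).

```python
import numpy as np

rng = np.random.default_rng(2)
sigma2, n = 1.0, 100_000

# Complex Gaussian noise with i.i.d. real/imaginary parts, each N(0, sigma2/2)
eps = rng.normal(0, np.sqrt(sigma2 / 2), n) + 1j * rng.normal(0, np.sqrt(sigma2 / 2), n)
amp = np.abs(eps)

# Compare the empirical histogram of |eps| with the Rayleigh pdf (7.84)
hist, edges = np.histogram(amp, bins=50, range=(0, 4), density=True)
centers = (edges[:-1] + edges[1:]) / 2
pdf = 2 * centers / sigma2 * np.exp(-centers**2 / sigma2)
print("max |hist - pdf| ~", np.max(np.abs(hist - pdf)))  # small for large n
```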
Example 7.39. A random signal is defined as y(n) = |ε(n)|, where ε(n) is the Gaussian complex
zero-mean i.i.d. noise with variance σ2 . What is the probability that y(n) ≥ A? Calculate this
probability for A = 2 and σ2 = 1.
⋆ The random variable y(n) is Rayleigh distributed,
$$p_y(\xi) = \frac{2\xi}{\sigma^2}\, e^{-\xi^2/\sigma^2}\, u(\xi).$$
The probability that y(n) ≥ A is
$$P\{y(n) \geq A\} = 1 - P\{y(n) \leq A\} = 1 - \int_0^{A}\frac{2\xi}{\sigma^2}\, e^{-\xi^2/\sigma^2}\, d\xi = 1 - \left(1 - e^{-A^2/\sigma^2}\right) = e^{-A^2/\sigma^2}.$$
For A = 2 and σ² = 1, this probability is e⁻⁴ ≈ 0.0183.
The Rayleigh distribution can be related to the χ-squared distribution, which is obtained as the distribution of the sum of squares of N Gaussian random variables, xi (n), i = 1, 2, . . . , N,
$$z(n) = \sum_{i=1}^{N} x_i^2(n).$$
The distribution of z(n) = |ε(n)|², where |ε(n)| is the Rayleigh distributed variable, is equal to the χ-squared distribution of z(n) with N = 2 (see Example 7.14).
Laplacian noise is used to model disturbances when strong impulses occur more often than in the case of Gaussian noise. Due to possible stronger pulses, its probability density function decays toward ±∞ more slowly than in the case of Gaussian noise (a definition of the so-called heavy-tailed noise is given in Example 7.15).
The Laplacian noise has the probability density function
$$p_{\varepsilon(n)}(\xi) = \frac{1}{2\alpha}\, e^{-|\xi|/\alpha}.$$
It decays much more slowly as |ξ| increases than in the Gaussian noise case.
The Laplacian noise can be generated using
$$\varepsilon(n) = \varepsilon_1(n)\varepsilon_2(n) + \varepsilon_3(n)\varepsilon_4(n),$$
where ε i (n), i = 1, 2, 3, 4, are real-valued, independent, zero-mean Gaussian noises, Fig. 7.26 (for the variance of this noise see Problem 7.20).
The parameters of the Laplace distributed signal can be estimated from data, as it is done in Section
7.4.5. For the stationary Laplacian distributed random variable x (n), x = [ x (1), x (2), . . . , x ( N )] T , with
mean µ, the likelihood maximization problem is equivalent to the log-likelihood minimization problem
again stated as
$$(\alpha, \mu)_{\mathrm{MLE}} = \arg\min\left\{-\ln p_x(\mathbf{x}|\alpha,\mu)\right\}, \qquad (7.85)$$
with
$$p_x(\mathbf{x}|\alpha,\mu) = \frac{1}{2\alpha}\, e^{-|x(1)-\mu|/\alpha} \times \cdots \times \frac{1}{2\alpha}\, e^{-|x(N)-\mu|/\alpha} = \frac{1}{2^N\alpha^N}\, e^{-\|\mathbf{x}-\mu\|_1/\alpha}. \qquad (7.86)$$
Here, we have to minimize the cost function −ln px(x|α, µ), defined by
$$J(\alpha,\mu) = N\ln(2) + N\ln(\alpha) + \frac{\|\mathbf{x}-\mu\|_1}{\alpha}, \qquad (7.87)$$
where ‖x − µ‖₁ is the one-norm (L₁-norm) of the vector x − µ. The solution to the L₁-norm minimization problem is presented in Section 7.1.2,
$$\mu = \mathrm{median}\{\mathbf{x}\}, \qquad \alpha = \frac{1}{N}\|\mathbf{x}-\mu\|_1 = \frac{1}{N}\|\mathbf{x}-\mathrm{median}\{\mathbf{x}\}\|_1.$$
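A minimal Python sketch of this estimation is given below, assuming the product-based generation of Laplacian noise described above; the sample size is an arbitrary illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(3)

# Laplacian noise generated as e1*e2 + e3*e4 from four independent
# zero-mean, unit-variance Gaussian noises (cf. Fig. 7.26)
e = rng.normal(0, 1, (4, 10_000))
x = e[0] * e[1] + e[2] * e[3]

# Maximum-likelihood estimates for the Laplace distribution:
# location = median, scale = mean absolute deviation from the median
mu_hat = np.median(x)
alpha_hat = np.mean(np.abs(x - mu_hat))
print(f"mu_hat = {mu_hat:.3f}, alpha_hat = {alpha_hat:.3f}")  # alpha ~ 1 here
```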
Figure 7.26 The Gaussian and Laplacian noise histograms (with 10000 realizations), with corresponding
probability density function (dots).
using N = 1001 realizations of the zero-mean Gaussian distributed random variables, x1 (n),
x2 (n), x3 (n), and x4 (n), with the same variance σx = 1.
The Laplacian distribution parameters, obtained by minimizing the cost function (7.87), are
µ = median{y} = 0.98
and
α = ||y − median{y}||1 /N = 1.04,
where y = [y(1), y(2), . . . , y(1001)]T .
We can also calculate the posterior distribution of the parameters α and µ, with the likelihood
$$p_y(\mathbf{y}|\alpha,\mu) = \frac{1}{(2\alpha)^N}\, e^{-(|y(1)-\mu| + |y(2)-\mu| + \cdots + |y(N)-\mu|)/\alpha},$$
and present it, for a given N, using discrete sets of α and µ, as in Fig. 7.18.
The impulsive noise could be distributed in other ways, like, for example, the Cauchy distributed
noise, whose probability density function is
$$p_{\varepsilon(n)}(\xi) = \frac{1}{\pi(1+\xi^2)}.$$
The Cauchy distributed noise ε(n) is a random signal that can be obtained as a ratio of two independent
Gaussian random signals ε 1 (n) and ε 2 (n), that is, as
$$\varepsilon(n) = \frac{\varepsilon_1(n)}{\varepsilon_2(n)}.$$
Another realization of the Cauchy random signal and the definition of the heavy-tailed noise are given
in Example 7.15.
The Poisson noise (or shot noise) is a random signal ε(n) which can take discrete integer values k with
the probability of
$$P(\varepsilon(n) = k) = P(k) = \frac{\lambda^k e^{-\lambda}}{k!} \quad \text{for } \lambda > 0.$$
The mean value and the variance of ε(n) are µε = λ and σε2 = λ, respectively (see Problem 7.23).
The Poisson random variable is commonly used to model small-probability discrete events. It is typically
concerned with the number of events (for example, the number of phone calls in communications or
the actual number of particles detected in an image sensor) that occur in a certain (unit) time interval.
Figure 7.27 Poisson probability for λ = 5 (left), λ = 10 (middle), and λ = 20 (right), along with the Gaussian
probability density function (crosses) with the mean value µ = λ = 20 and the variance σ2 = λ = 20.
Example 7.41. Within a long duration, continuous-time signal, an impulsive disturbance appears 15
times per minute, on average. What is the probability that there will be less than 3 impulsive
disturbances within a randomly selected continuous-time interval, whose duration is 24 seconds?
⋆ Since the analyzed interval is 24 seconds, all parameters will be reduced to 24 seconds, as the
unit of time. The average number of disturbances within every 24 seconds is 15/60 × 24 = 6.
This means that the parameter λ in the Poisson distribution is λ = 6. The probability that there
are less than 3 disturbing events in 24 seconds is then equal to the probability that there are either
0 disturbances, ε(n) = 0, or 1 disturbance, ε(n) = 1, or 2 disturbances, ε(n) = 2, within the
selected interval, that is
$$P(\varepsilon(n)=0) + P(\varepsilon(n)=1) + P(\varepsilon(n)=2) = \sum_{k=0}^{2}\frac{6^k e^{-6}}{k!} = e^{-6} + 6e^{-6} + \frac{6^2 e^{-6}}{2!} = 0.062.$$
This means that the event of less than 3 disturbances in 24 seconds will occur once in about 16
such intervals.
The probability of, for example, 6 or fewer disturbances in 24 seconds would be 0.6063.
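Both probabilities from this example can be checked with a few lines of Python:

```python
from math import exp, factorial

lam = 6  # average number of disturbances per 24-second interval

# Probability of fewer than 3 disturbances in one interval
p_lt3 = sum(lam**k * exp(-lam) / factorial(k) for k in range(3))
print(f"P(fewer than 3) = {p_lt3:.3f}")   # ~0.062

# Probability of 6 or fewer disturbances
p_le6 = sum(lam**k * exp(-lam) / factorial(k) for k in range(7))
print(f"P(6 or fewer)  = {p_le6:.4f}")    # ~0.6063
```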
A random signal x (n) with the probability density function
$$p_x(\xi) = \frac{1}{\beta}\, e^{-\xi/\beta}\, u(\xi)$$
and β > 0 is called an exponentially distributed signal. The expected value of this signal is µ x = β, since
$$\mu_x = \int_0^\infty \frac{\xi}{\beta}\, e^{-\xi/\beta}\, d\xi = \left.-\xi e^{-\xi/\beta}\right|_0^\infty + \int_0^\infty e^{-\xi/\beta}\, d\xi = \beta.$$
The distribution function is
$$F_x(\chi) = \int_0^{\chi}\frac{1}{\beta}\, e^{-\xi/\beta}\, d\xi = \left(1 - e^{-\chi/\beta}\right)u(\chi).$$
The probability that a random variable x (n) will take a value greater than χ is
$$P\{x(n) > \chi\} = 1 - F_x(\chi) = e^{-\chi/\beta}, \quad \chi \geq 0.$$
Example 7.42. A random signal x (n) is equal to the length of life of the system denoted by the index
n. The average lifetime of this system is 10 years and its life-length is exponentially distributed.
What is the probability that the signal value is x (n) > 20, meaning that the system n will last
more than 20 years?
If the system consists of three components whose life-lengths are statistically independent
and exponentially distributed, with average lifetimes β 1 = 5, β 2 = 10, and β 3 = 15 years,
respectively, and if the system fails if any of its components fails, what is the average lifetime of
the system?
⋆ The value of the parameter β in the exponential distribution is equal to the expected lifetime, β = 10. The probability that the system lasts x (n) > 20 is
$$P\{x(n) > 20\} = e^{-20/10} = e^{-2} \approx 0.135.$$
The probability that the system with three statistically independent components will last longer than χ is equal to the product of the probabilities that each of the components will last longer than χ, that is,
$$P\{x(n) > \chi\} = e^{-\chi/\beta_1}\, e^{-\chi/\beta_2}\, e^{-\chi/\beta_3} = e^{-\chi(1/5 + 1/10 + 1/15)} = e^{-\chi/\beta},$$
with 1/β = 1/5 + 1/10 + 1/15 = 11/30. The average lifetime is β = 30/11 ≈ 2.7 years, and it is shorter than the average life of any of the components.
Example 7.43. Find the Fourier transform, characteristic function, and the moment generating function
of the exponentially distributed random variable x (n). What are the moments of this random
variable?
The exponentially distributed random variable exhibits the memoryless property, since its probability of exceeding the value χ + a, given that it has exceeded the value a, satisfies
$$P\{x(n) > \chi + a \text{ and } x(n) > a\} = P\{x(n) > \chi + a \,|\, x(n) > a\}\, P\{x(n) > a\},$$
and the fact that P{x(n) > χ + a and x(n) > a} = P{x(n) > χ + a}, for χ ≥ 0, since the event x(n) > χ + a includes the event x(n) > a. These two relations produce
$$P\{x(n) > \chi + a \,|\, x(n) > a\} = \frac{e^{-(\chi+a)/\beta}}{e^{-a/\beta}} = e^{-\chi/\beta} = P\{x(n) > \chi\}.$$
In real-world scenarios, signals s(n) are commonly corrupted with additive disturbances, denoted by ε(n). Then, processing methods are applied to the noisy signals,
$$x(n) = s(n) + \varepsilon(n),$$
where ε(n) is the additive noise. For a deterministic signal s(n), the expected value of the noisy signal x(n) is equal to the sum of the deterministic signal value and the expected value of the noise, that is,
$$E\{x(n)\} = s(n) + \mu_\varepsilon.$$
The variance of the noisy signal is not influenced by the deterministic signal,
$$\sigma_x^2 = E\{|x(n) - E\{x(n)\}|^2\} = E\{|\varepsilon(n) - \mu_\varepsilon|^2\} = \sigma_\varepsilon^2.$$
In some application the noise effect is multiplicative and depends on the signal itself. Then, the
noisy signal model is
x (n) = (1 + ε(n))s(n).
The expected value and the variance of the noisy signal, with multiplicative noise, are given by
$$E\{x(n)\} = (1+\mu_\varepsilon)s(n), \qquad \sigma_x^2 = |s(n)|^2\sigma_\varepsilon^2.$$
Both the mean and the variance are signal-dependent in the case of multiplicative noise.
Depending on the type of noise, the results obtained so far for various disturbance forms can be applied to the analysis of noisy signals. This will be the topic of the next sections.
In signal processing, the most common signal models are sinusoidal signals, along with their processing using Fourier analysis. The influence of noise on these signals and transforms will be studied in this section.
we get
$$\sigma_X^2(k) = \sigma_\varepsilon^2 N. \qquad (7.92)$$
If the deterministic signal s(n) is a complex sinusoid, that is,
$$s(n) = A e^{j2\pi k_0 n/N},$$
with the frequency k0 on the grid, ω0 = 2πk0/N, then its DFT is
$$S(k) = AN\delta(k - k_0).$$
The peak signal-to-noise ratio, a relevant parameter for the DFT-based estimation of the signal frequency, is defined by
$$\mathrm{PSNR}_{\mathrm{out}} = \frac{\max_k|S(k)|^2}{\sigma_X^2} = \frac{A^2N^2}{\sigma_\varepsilon^2 N} = \frac{A^2}{\sigma_\varepsilon^2}N. \qquad (7.94)$$
Its logarithmic form, expressed in dB, is 20 log10 ( AN/σε ). The value of the peak signal-to-noise ratio
increases as N increases. This result is expected, since the signal values are added in phase, increasing
the DFT amplitude N times (its power N 2 times), while the noise values are summed up in power.
The noise influence on the DFT of a real-valued sinusoid is illustrated in Fig. 7.28.
Figure 7.28 Illustration of the noise-free signal, x (n) = cos(6πn/64), and its DFT, X(k) (top panels). The same signal, corrupted with an additive zero-mean real-valued Gaussian noise of variance σε² = 1/4, is shown along with its DFT (bottom panels).
The input signal-to-noise ratio (SNR) for the signal in (7.93) is defined by
$$\mathrm{SNR}_{\mathrm{in}} = \frac{E_x}{E_\varepsilon} = \frac{\sum_{n=0}^{N-1}|x(n)|^2}{\sum_{n=0}^{N-1}E\{|\varepsilon(n)|^2\}} = \frac{NA^2}{N\sigma_\varepsilon^2} = \frac{A^2}{\sigma_\varepsilon^2}. \qquad (7.95)$$
If the maximum DFT value is detected, then only its value could be used for the signal reconstruction (equivalent to a filter that passes only the frequency bin k = k0). The DFT of the output signal is then
$$Y(k) = X(k)\delta(k - k_0).$$
The output signal in the discrete-time domain is
$$y(n) = \frac{1}{N}\sum_{k=0}^{N-1}Y(k)e^{j2\pi kn/N} = \frac{1}{N}X(k_0)e^{j2\pi k_0 n/N}.$$
Since X(k0) = AN + Ξ(k0), according to (7.89) and (7.92), where Ξ(k) is the noise in the frequency domain, whose variance is equal to σε²N, we get
$$\mathrm{SNR}_{\mathrm{out}} = \frac{A^2}{\sigma_\varepsilon^2 N/N^2} = N\frac{A^2}{\sigma_\varepsilon^2} = N\,\mathrm{SNR}_{\mathrm{in}}.$$
Taking 10 log(·) of both sides, we get the output-to-input relation for the signal-to-noise ratio in dB,
$$\mathrm{SNR}_{\mathrm{out}}\,[\mathrm{dB}] = \mathrm{SNR}_{\mathrm{in}}\,[\mathrm{dB}] + 10\log_{10}N.$$
In order to improve the representation and estimation performance of the Fourier transform of a noisy
signal s(n) + ε(n), the Fourier transform is commonly calculated using a window function w(n). This
topic will be studied again, in detail, in Part V, since the windows play a crucial role in time-frequency
analysis. Here, we will present the basic forms and results.
The assumed noise is additive and white, rεε = σε2 δ(n), with the zero-mean. The DFT of the
signal, multiplied by the window function, is equal to
$$X(k) = \sum_{n=0}^{N-1}w(n)\left[s(n)+\varepsilon(n)\right]e^{-j2\pi kn/N},$$
with the expected value
$$E\{X(k)\} = \frac{1}{N}W(k) \ast_k S(k),$$
where W(k) = DFT{w(n)} is the DFT of the window and ∗k denotes the convolution in frequency.
The variance of X (k) is given by
$$\sigma_X^2(k) = \sum_{n_1=0}^{N-1}\sum_{n_2=0}^{N-1}w(n_1)w^*(n_2)\,\sigma_\varepsilon^2\delta(n_1-n_2)e^{-j2\pi k(n_1-n_2)/N} = \sigma_\varepsilon^2\sum_{n=0}^{N-1}|w(n)|^2 = \sigma_\varepsilon^2 E_w. \qquad (7.97)$$
Since we will use mathematical tools that require a continuous frequency variable, consider the Fourier transform of the discrete-time noisy signal x (n) = s(n) + ε(n),
$$X(e^{j\omega}) = \sum_{n=-\infty}^{\infty}w(n)x(n)e^{-j\omega n}, \qquad (7.98)$$
where w(n) is a real-valued window, such that w(0) = 1. The frequency variable will be kept in
continuous form since we will use its derivatives in the explanations that follow. The signal s(n) is
deterministic and the noise ε(n) = ε r (n) + jε i (n) is a complex-valued white Gaussian noise with independent and identically distributed real and imaginary parts, N(0, σε²/2). The auto-correlation
function of this noise is
rεε (m) = E{ε(n)ε∗ (n − m)} = σε2 δ(m). (7.99)
The expected value of the Fourier transform, for the noisy signal x (n) = s(n) + ε(n), is
$$E\{X(e^{j\omega})\} = E\left\{\sum_{n=-\infty}^{\infty}w(n)[s(n)+\varepsilon(n)]e^{-j\omega n}\right\}.$$
The expected value of the Fourier transform can be written as a convolution of the Fourier transform
W (e jω ) of the window w(n),
$$W(e^{j\omega}) = \sum_{n=-\infty}^{\infty}w(n)e^{-j\omega n},$$
and the original Fourier transform, S(e jω ), of the signal s(n), without the window
$$S(e^{j\omega}) = \sum_{n=-\infty}^{\infty}s(n)e^{-j\omega n}.$$
Thus,
$$E\{X(e^{j\omega})\} = \frac{1}{2\pi}\int_{-\pi}^{\pi}S(e^{j(\omega-\alpha)})W(e^{j\alpha})\,d\alpha, \qquad (7.101)$$
where the integration is performed over the discrete-time Fourier transform period, −π < ω ≤ π.
The Fourier transform calculated with a window is biased. The window w(n) causes the bias in the
Fourier transform, since its application results in a form that differs from the original Fourier transform
without a window. By expanding S(e j(ω −α) ) in (7.101) into a Taylor series, around ω,
$$S(e^{j(\omega-\alpha)}) = S(e^{j\omega}) - \frac{\partial S(e^{j\omega})}{\partial\omega}\alpha + \frac{1}{2}\frac{\partial^2 S(e^{j\omega})}{\partial\omega^2}\alpha^2 + \dots,$$
we get
$$\frac{1}{2\pi}\int_{-\pi}^{\pi}S(e^{j(\omega-\alpha)})W(e^{j\alpha})\,d\alpha = S(e^{j\omega}) + \frac{1}{2}\frac{\partial^2 S(e^{j\omega})}{\partial\omega^2}m_2 + \dots, \qquad (7.102)$$
where
$$m_0 = \frac{1}{2\pi}\int_{-\pi}^{\pi}W(e^{j\omega})\,d\omega = w(0) = 1, \quad m_1 = \frac{1}{2\pi}\int_{-\pi}^{\pi}\omega W(e^{j\omega})\,d\omega = 0, \quad m_2 = \frac{1}{2\pi}\int_{-\pi}^{\pi}\omega^2 W(e^{j\omega})\,d\omega.$$
The first frequency domain moment m1 (and all other odd moments) of W (e jω ) is equal to zero, since
W (e jω ) is an even function (as the Fourier transform of an even, real-valued window function w(n)).
From (7.102) it follows that the first term is the original Fourier transform, while the remaining terms introduce the Fourier transform distortion (bias). They can be approximated by
$$\frac{1}{2\pi}\int_{-\pi}^{\pi}S(e^{j(\omega-\alpha)})W(e^{j\alpha})\,d\alpha - S(e^{j\omega}) = \frac{1}{2}\frac{\partial^2 S(e^{j\omega})}{\partial\omega^2}m_2 + \dots \cong \frac{1}{2}\frac{\partial^2 S(e^{j\omega})}{\partial\omega^2}m_2 = \frac{1}{2}b(\omega)m_2. \qquad (7.103)$$
A complex-valued Gaussian noise with independent and identically distributed real and imaginary parts, N(0, σε²/2), is assumed. For a white noise, the variance of the Fourier transform estimator reduces to
$$\sigma_X^2 = \sum_{n=-\infty}^{\infty}\sigma_\varepsilon^2 w^2(n) = \sigma_\varepsilon^2 E_w, \qquad (7.105)$$
n=−∞
where Ew is the energy of the window. A finite energy window is sufficient to make the variance of
X (e jω ) finite for the Gaussian, zero-mean, white noise. We can conclude that the variance increases as
the energy of the window, Ew , increases. This means that wide windows will produce large variances,
just opposite to the bias which is small for wide windows. Since narrow windows produce large bias
and wide windows are characterized by large variances in the Fourier transform estimation, a trade-off
is required to balance these two sources of the estimation error.
The optimum window width can be obtained by minimizing the mean squared error (MSE) defined as
a sum of the squared bias and variance
$$e^2 = \mathrm{bias}_X^2(\omega) + \sigma_X^2(\omega). \qquad (7.106)$$
Example 7.44. Consider a signal s(n) whose Fourier transform has the second-order derivative ∂²S(e^{jω})/∂ω² (higher-order derivatives can be neglected), and assume that the Hann(ing) window w(n) of width N is used in the calculation. Find the optimum window width.
⋆ For the Hann(ing) window, Ew = 3N/8 and m2 = 2π 2 /N 2 , so using (7.103) and (7.105), we
get
$$e^2 \cong \frac{\pi^4}{N^4}\left(\frac{\partial^2 S(e^{j\omega})}{\partial\omega^2}\right)^2 + \frac{3N}{8}\sigma_\varepsilon^2. \qquad (7.107)$$
It has been assumed that the fourth and other higher-order Fourier transform derivatives can be
neglected. From ∂e2 /∂N = 0, the approximation of the optimum window width follows
$$N_{\mathrm{opt}}(\omega) \cong \sqrt[5]{\frac{40\,b^2(\omega)\,\pi^4}{3\sigma_\varepsilon^2}}, \qquad (7.108)$$
with b(ω ) = ∂2 S(e jω )/∂ω 2 . Roughly speaking, this relation means that small values of the
window width (intensive smoothing in frequency direction) should be used at the points where
there are no variations in frequency of the Fourier transform, that is, where b2 (ω ) is small.
When b²(ω) is large, then the window should be wide, meaning less intensive smoothing, that is, keeping the original Fourier transform form at the points where its variations are high. As far as
the noise is concerned, low noise cases (small σε2 ) do not require any smoothing of the original
Fourier transform in the frequency direction. Thus, wide windows should be used. For a high
noise, the Fourier transform smoothing will improve the results.
Of course, in reality, we do not know anything about the signal or its Fourier transform in advance.
An algorithm for the estimation of Nopt (ω ), without using the value of b2 (ω ), will be presented in the
next example.
Example 7.45. A noisy signal, defined within −512 ≤ n ≤ 511, where ε(n) is the zero-mean, unit-variance Gaussian noise, σε = 1, is analyzed using the Fourier transforms, X N (k), with two Hann(ing) windows, one whose width is N = 1024 and the other with N = 128. For each frequency index k, we will use the better of these two Fourier transforms by checking the intersection of their confidence intervals.
To simplify the problem, a real-valued and even signal is assumed, whose Fourier transform is real-valued. The standard deviation of the real part of the Fourier transform, X N (k), calculated using the Hann(ing) window of width N, is
$$\sigma_{X_N} = \frac{\sigma_\varepsilon}{\sqrt{2}}\sqrt{\frac{3N}{8}},$$
while the confidence interval is
$$\left[X_N(k) - 2.5\,\sigma_{X_N},\ X_N(k) + 2.5\,\sigma_{X_N}\right],$$
where the factor of 2.5 is used for the confidence intervals (probability of almost 0.99), assuming that the noise variance can be estimated from the data. The standard deviation σε/√2 was used in σ_{X_N} since the noise is not even, so only half of its power is in the real-valued part of the Fourier transform.
⋆ For each frequency index k, with the corresponding continuous frequency ω = 2πk/1024,
the Fourier transform is calculated using N = 128, zero-padded up to 1024. This value is denoted
by X128 (k). Then the Fourier transform with N = 1024 is calculated and denoted by X1024 (k).
The confidence intervals are formed for these two Fourier transform values calculated with two
window widths,
$$\left[X_{128}(k) - 7.1\sqrt{\frac{3}{8}128},\ X_{128}(k) + 7.1\sqrt{\frac{3}{8}128}\right]$$
$$\left[X_{1024}(k) - 7.1\sqrt{\frac{3}{8}1024},\ X_{1024}(k) + 7.1\sqrt{\frac{3}{8}1024}\right].$$
If these intervals intersect, then X (k) = X128 (k), otherwise X (k) = X1024 (k). Namely, if the bias
is small, then the Fourier transform X N (k) calculated using both windows will contain the true
value of the Fourier transform (of the noise-free signal). Therefore, for small bias the confidence
intervals will intersect, meaning we should use the window with a smaller variance, which is in
our experiment N = 128. If the bias is large, then it will highly depend on the window width
and will move the obtained Fourier transform X N (k) from its true position. Then, the confidence
intervals will be dominated by the bias (different for two windows) and will not contain the
true Fourier transform value, meaning that they will not intersect. Since the bias is large, in this
case, we should use a small bias window with N = 1024. The result is shown in Fig. 7.29. The
improvement in the SNR ratio is evident.
This is a simplified version of the intersection of confidence intervals (ICI) method for window width optimization (the Katkovnik-Stankovic method for window width optimization in
time-frequency analysis). For practical applications, the noise variance should also be estimated
from the data (see Problem 7.12).
Calculation of higher-order moments and the cross-correlation functions for the Fourier transform
of noisy signals could be found in the literature (for the correlation calculation, see the problems).
7.5.6 Periodogram
The power spectral density of a signal is commonly estimated using the squared absolute value of the Fourier transform of the signal, called the periodogram,
$$P_x(e^{j\omega}) = \frac{1}{2N+1}\left|X(e^{j\omega})\right|^2 = \frac{1}{2N+1}\left|\sum_{n=-N}^{N}x(n)e^{-j\omega n}\right|^2. \qquad (7.109)$$
Figure 7.29 Spectral analysis of a signal with two windows in order to approximate optimal window width for
Example 7.45.
As it has been shown in Section 7.3.4, the periodogram is equal to the power spectral density calculated (windowed) by a Bartlett window, that is,
$$P_{xx}(e^{j\omega}) = \lim_{N\to\infty}\sum_{k=-2N}^{2N}\left(1 - \frac{|k|}{2N+1}\right)r_{xx}(k)e^{-j\omega k} = S_{xx}(e^{j\omega}) \ast_\omega W(e^{j\omega}), \qquad (7.110)$$
where $W(e^{j\omega}) = \mathrm{FT}\{(1 - \frac{|n|}{2N+1})\}$. This means that the periodogram is a biased estimate of the power spectral density for any signal, except when r xx (k) = Cδ(k).
Example 7.46. Find the power spectral density of the random signal
where ε(n) is zero-mean Gaussian noise with unit variance. Find the power spectral density calculated using the periodogram with a window of width N.
The periodogram of a noisy signal is also a biased estimator of the noise-free periodogram of
deterministic signals. Consider the signal x (n) = s(n) + ε(n), where s(n) is deterministic and ε(n) is
white complex-valued i.i.d. noise with the variance σε2 . Its periodogram is
$$P_x(e^{j\omega}) = \frac{1}{N}\left|\sum_{n=-N/2}^{N/2-1}(s(n)+\varepsilon(n))e^{-j\omega n}\right|^2. \qquad (7.111)$$
Figure 7.30 Periodogram of a chirp signal (a), the noisy chirp signal (b), and the difference of the previous two periodograms, which is highly signal dependent, with variations (and variance) proportional to |S(k)|²/N (c).
The window, w(k), decays smoothly from w(0) = 1 toward zero for k = ±N. The frequency domain form of this estimator is equal to the convolution of the true power spectral density and the Fourier transform of the window, W(e^{jω}) = FT{w(n)}. In the discrete frequency domain, the Blackman–Tukey periodogram can be calculated using either the unbiased estimate of the autocorrelation function,
$$\hat{r}_{xx}(\pm k) = \frac{1}{N-k}\sum_{i=0}^{N-k-1}x(k+i)x(i),$$
or its biased estimate,
$$\hat{r}_{xx}(\pm k) = \frac{1}{N}\sum_{i=0}^{N-k-1}x(k+i)x(i).$$
The biased estimator under-estimates the r xx (k) values for large |k|; however, these values should be small anyway. This estimator avoids possible large outliers in estimating r xx (k) from a small number of samples at large |k|.
Daniell periodogram. In order to reduce the noise influence, the smoothed versions of the periodogram
are used as the spectral estimators. The simplest smoothed form of the periodogram is
$$P_x^S(k) = \frac{1}{2L+1}\sum_{i=-L}^{L}P_x(k-i) = \frac{1}{2L+1}\sum_{i=-L}^{L}\frac{1}{N}|X(k-i)|^2.$$
Here, the frequency domain window, W (k), takes the simplest possible form of the rectangular window
in the Blackman-Tukey method, where W (i )| X (k − i )|2 /N was used. Therefore, the Daniell spectral
estimator is a particular case of the Blackman–Tukey class of spectral estimators. It can easily be
related to the Blackman–Tukey periodogram estimator (7.110) using
$$S_{xx}^A(e^{j\omega}) = \sum_{n=-N/2}^{N/2-1}w(n)\left(1 - \frac{|n|}{N}\right)r_{xx}(n)e^{-j\omega n} = W(k) \ast_k S_{xx}(k),$$
where the smoothing window in the frequency domain is the Fourier transform of the auto-correlation function window, w(n)(1 − |n|/N), and corresponds to
$$P_x^S(k) = \frac{1}{2L+1}\sum_{i=-L}^{L}W(i)P_x(k-i) = \frac{1}{2L+1}\sum_{i=-L}^{L}W(i)\frac{1}{N}|X(k-i)|^2. \qquad (7.115)$$
S-method. In the analysis of signals with varying spectral content, the Fourier transform is spread due
to the frequency variations of the spectral content within the window (see the stationary phase method in
Chapter 1). Then, instead of smoothing the periodogram in the same direction (in-direction smoothing)
in (7.115), the counter-direction cross-multiplication can be done, and the spectral estimator
$$\mathrm{SM}_x(k) = \frac{1}{2L+1}\sum_{i=-L}^{L}W(i)\frac{1}{N}X(k+i)X^*(k-i)$$
is obtained. This is the so-called S-method based spectral estimator. By increasing the width L in this method, we could arrive at the Wigner distribution.
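A direct (non-optimized) Python sketch of the S-method estimator follows. The test signal is an illustrative chirp plus sinusoid in noise, not the exact signal of Example 7.47, whose full definition is assumed here.

```python
import numpy as np

def s_method(x, L=7):
    """S-method spectral estimator: counter-direction cross-products of
    DFT values, with a rectangular frequency window W(i) = 1."""
    N = len(x)
    X = np.fft.fft(x)
    SM = np.zeros(N)
    for k in range(N):
        for i in range(-L, L + 1):
            SM[k] += np.real(X[(k + i) % N] * np.conj(X[(k - i) % N]))
    return SM / ((2 * L + 1) * N)

# Illustrative chirp plus sinusoid in unit-variance Gaussian noise
n = np.arange(-128, 128)
x = np.exp(1j * (np.pi / 256) * n**2) + np.exp(1j * np.pi * n / 4) \
    + np.random.default_rng(4).normal(0, 1, len(n))
print(s_method(x, L=7).max())
```

Note that the i = 0 term alone reproduces the periodogram |X(k)|²; the cross-terms with i ≠ 0 are what improve the concentration of frequency-varying components.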
where −128 ≤ n ≤ 127, and ε(n) is a zero-mean Gaussian noise with the unit variance. Use
the periodogram with a Hann(ing) window of the width N = 256, the Daniell (Blackman-Tukey
smoothed) estimator, and the S-method, with the same window. In both the Daniell (Blackman-Tukey smoothed) estimator and the S-method estimator, use L = 7 and W(i) = 1.
⋆ Spectral analysis of this random noisy signal using the periodogram, the Daniell (Blackman-
Tukey smoothed) estimator, and the S-method based estimator is shown in Fig. 7.31. The
periodogram of the noise-free signal is shown in Fig. 7.31(a). Two highly concentrated sinusoidal
components and one spread (chirp) component can be noticed. For the noisy signal, the noise
almost completely degrades the chirp component in the periodogram with the Hann(ing) window,
Fig. 7.31(b). The visibility of this component is significantly improved by smoothing the
periodogram as in the Daniell (Blackman-Tukey smoothed) estimator, given in Fig. 7.31(c).
In this case, the highly concentrated sinusoidal components are spread as well. Combining the
Fourier transform values in the counter-direction, the S-method based spectral estimation is
obtained. This estimator preserves a high concentration of the sinusoidal components while
improving the concentration of the chirp signal, as shown in Fig. 7.31(d). The S-method based
spectral estimator of the noise-free signal is given in Fig. 7.31(e).
Bartlett Method and Welch periodogram. The Fourier transform of the signal x (n), whose duration is N, is calculated here over K shorter intervals. The duration of these intervals is M, commonly with the step R = M (Bartlett periodogram) or R = M/2 (Welch periodogram; in this case a window can also be used). The Fourier transforms of x (n), within these shorter intervals, are
$$X_i(e^{j\omega}) = \frac{1}{M}\sum_{n=0}^{M-1}x(iR+n)e^{-j\omega n},$$
and the averaged periodogram is
$$P_x^S(\omega) = \frac{1}{K}\sum_{i=0}^{K-1}\left|X_i(e^{j\omega})\right|^2.$$
For a numeric illustration of the Welch periodogram calculation see Example 7.52.
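A minimal Python sketch of the averaged periodogram is given below; the segment parameters and the noisy sinusoid are illustrative choices (for R = M the sketch reduces to the Bartlett method).

```python
import numpy as np

def averaged_periodogram(x, M=32, R=16):
    """Averaged periodogram over segments of length M with step R
    (R = M gives the Bartlett method, R = M/2 the Welch method)."""
    x = np.asarray(x)
    K = (len(x) - M) // R + 1
    P = np.zeros(M)
    for i in range(K):
        Xi = np.fft.fft(x[i * R : i * R + M]) / M
        P += np.abs(Xi) ** 2
    return P / K

# Illustrative noisy sinusoid at frequency bin 8 of a 32-point segment
rng = np.random.default_rng(5)
n = np.arange(128)
x = np.cos(2 * np.pi * 8 * n / 32) + rng.normal(0, 1, len(n))
print(averaged_periodogram(x).argmax())  # dominant frequency bin
```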
Figure 7.31 Spectral analysis of the random noisy signal, x (n), using the periodogram, the Daniell (Blackman-
Tukey smoothed) estimator, and the S-method based spectral estimator. Order and the description of the panels
correspond to the task order in Example 7.47.
Consider a set of data x (n), for 0 ≤ n ≤ N − 1. Assume that this set of data are noisy samples of the
signal
s(n) = Ae j2πk0 n/N .
The additive noise ε(n) is white, complex-valued Gaussian, with zero-mean and independent real and
imaginary parts. The variance of noise is σε2 . The aim is to find the signal s(n) parameters from the
noisy observations x (n). Since the signal form is known, we look for a solution of the same form, using the model be^{j2πkn/N}, where b and k are parameters that have to be determined, and
$$\alpha = \{b, k\}$$
is the set of these parameters. The parameter b is complex-valued. It includes the amplitude and the initial phase of the signal model. For every value of x (n) we may define an error as the difference between the true value x (n) and the assumed model at the considered instant n,
$$e(n,\alpha) = x(n) - b e^{j2\pi kn/N}.$$
Since the noise is Gaussian, the probability density function of the error is
$$p(e(n,\alpha)) = \frac{1}{\sigma_\varepsilon\sqrt{2\pi}}\, e^{-|e(n,\alpha)|^2/(2\sigma_\varepsilon^2)}.$$
The joint probability density function, for all signal samples from the data set, is equal to the product
of the individual probability density functions
$$p_e(e(0,\alpha), e(1,\alpha), \dots, e(N-1,\alpha)) = \frac{1}{(2\pi\sigma_\varepsilon^2)^{N/2}}\, e^{-\sum_{n=0}^{N-1}|e(n,\alpha)|^2/(2\sigma_\varepsilon^2)}.$$
The maximum-likelihood solution for the parameters α = {b, k} is obtained by maximizing the probability density function for given values of x (n). Maximization of pe(e(0, α), e(1, α), . . . , e(N − 1, α)) is the same as the minimization of the total squared error,
$$\epsilon(\alpha) = \sum_{n=0}^{N-1}|e(n,\alpha)|^2 = \sum_{n=0}^{N-1}\left|x(n) - be^{j2\pi kn/N}\right|^2. \qquad (7.117)$$
The solution to this problem is obtained from ∂ǫ(α)/∂b∗ = 0 (see Example 1.3). It is in the form of a
standard DFT of signal x (n),
$$b = \frac{1}{N}\sum_{n=0}^{N-1}x(n)e^{-j2\pi kn/N} = \mathrm{mean}\left\{x(n)e^{-j2\pi kn/N}\right\} = \frac{1}{N}X(k).$$
A specific value of parameter k, that minimizes ǫ(α) and gives the estimate of the signal frequency
index k0 , is obtained by replacing the obtained b back into relation (7.117), defining ǫ(α),
$$\epsilon(\alpha) = \sum_{n=0}^{N-1}|x(n) - be^{j2\pi kn/N}|^2 = \left(\sum_{n=0}^{N-1}|x(n)|^2\right) - N|b|^2.$$
The minimum value of ǫ(α) is achieved when |b|² (or |X(k)|²) is maximum, that is, for
$$\hat{k}_0 = \arg\max_k|X(k)|^2.$$
The same analysis holds for a signal of the form
$$s(n) = Ae^{j\omega_0 n}.$$
Assuming the solution in the form be^{jωn}, the Fourier transform of discrete-time signals would follow.
If the additive noise were, for example, impulsive with the Laplacian distribution, then the
probability density function would be
$$p(e(n,\alpha)) = \frac{1}{2\sigma_\varepsilon}\, e^{-|e(n,\alpha)|/\sigma_\varepsilon},$$
and the solution to the minimization of $\epsilon(\alpha) = \sum_{n=0}^{N-1}|e(n,\alpha)|$ would follow from
$$X(k) = N\underset{n=0,1,\dots,N-1}{\mathrm{median}}\left\{x(n)e^{-j2\pi kn/N}\right\}.$$
Example 7.48. The DFT definition, for a given frequency index k, can be understood as
$$X(k) = \sum_{n=0}^{N-1}(s(n)+\varepsilon(n))e^{-j2\pi kn/N} = N\underset{n=0,1,\dots,N-1}{\mathrm{mean}}\left\{(s(n)+\varepsilon(n))e^{-j2\pi kn/N}\right\}. \qquad (7.118)$$
Show that, in the presence of impulsive noise, the median-based form
$$X(k) = N\underset{n=0,1,\dots,N-1}{\mathrm{median}}\left\{(s(n)+\varepsilon(n))e^{-j2\pi kn/N}\right\} \qquad (7.119)$$
can produce better results than (7.118). Calculate the value of X(0) using (7.118) and estimate its value by (7.119) for the signal
value by (7.119) for the signal
s(n) = exp( j4πn/N )
with N = 8, and the additive noise
⋆ If a strong impulsive noise is expected in the signal, then the mean value will be highly sensitive to this kind of noise. As stated, the median-based calculation is less sensitive to strong impulsive disturbances. For the signal, the median-based estimate is obviously not influenced by this impulsive noise. In this case it produced a better estimate (the exact value) of the considered noise-free DFT element X(0).
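This comparison can be illustrated with a short Python sketch; the single strong impulse used as the disturbance is a hypothetical stand-in for the noise specified in the example, and the complex median is taken component-wise over the real and imaginary parts.

```python
import numpy as np

N = 8
n = np.arange(N)
s = np.exp(1j * 4 * np.pi * n / N)        # signal from Example 7.48

# Hypothetical impulsive disturbance: one very strong sample
eps = np.zeros(N, complex)
eps[3] = 100.0
x = s + eps

k = 0
terms = x * np.exp(-1j * 2 * np.pi * k * n / N)

X_mean = N * np.mean(terms)                                       # DFT, (7.118)
X_med = N * (np.median(terms.real) + 1j * np.median(terms.imag))  # (7.119)
print(X_mean, X_med)   # the impulse shifts the mean-based value, not the median
```

Here the noise-free X(0) is exactly 0; the mean-based estimate is pulled to 100 by the impulse, while the median-based estimate remains 0.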
Now we will analyze the signal frequency estimation for a single component sinusoidal signal
s(n), with unknown discrete frequency ω0 = 2πk0 /N using the DFT. Since the signal frequency is
assumed on the frequency grid, this case can be understood as the signal frequency position detection.
Available observations of the signal are
$$x(n) = s(n) + \varepsilon(n) = Ae^{j2\pi k_0 n/N} + \varepsilon(n),$$
where ε(n) is a complex zero-mean i.i.d. Gaussian white noise, with variance σε2 . Its DFT is
$$X(k) = \sum_{n=0}^{N-1}(s(n)+\varepsilon(n))e^{-j2\pi kn/N} = NA\,\delta(k-k_0) + \Xi(k),$$
with σX²(k) = σε²N and E{Ξ(k)} = 0. The real and imaginary parts of the DFT X(k), at the signal position k = k0, are Gaussian random variables with the total variance σε²N, or
$$\mathrm{Re}\{X(k_0)\} \sim \mathcal{N}\!\left(\mathrm{Re}\{NA\}, \frac{\sigma_\varepsilon^2 N}{2}\right), \qquad \mathrm{Im}\{X(k_0)\} \sim \mathcal{N}\!\left(\mathrm{Im}\{NA\}, \frac{\sigma_\varepsilon^2 N}{2}\right). \qquad (7.121)$$
Next, we will find the probability that a DFT value of noise at any k 6= k0 is higher than the
signal DFT value at k = k0 . This case corresponds to a false detection of the signal frequency position,
resulting in an arbitrary large and uniform estimation error (within the considered frequency range).
The probability density function for the absolute DFT values outside the signal frequency, k 6= k0 ,
is Rayleigh-distributed (7.84)
$$q(\xi) = \frac{2\xi}{\sigma_\varepsilon^2 N}\, e^{-\xi^2/(\sigma_\varepsilon^2 N)}, \quad \xi \geq 0.$$
The DFT at a noise only position takes a value greater than χ, with probability
$$Q(\chi) = \int_{\chi}^{\infty}\frac{2\xi}{\sigma_\varepsilon^2 N}\, e^{-\xi^2/(\sigma_\varepsilon^2 N)}\, d\xi = \exp\!\left(-\frac{\chi^2}{\sigma_\varepsilon^2 N}\right). \qquad (7.122)$$
The probability that a noise-only DFT value is lower than χ is [1 − Q(χ)]. The total number of noise-only points in the DFT is M = N − 1. The probability that M independent noise-only DFT values are all lower than χ is [1 − Q(χ)]^M. The probability that at least one of the M noise-only DFT values is greater than χ is
$$G(\chi) = 1 - [1 - Q(\chi)]^M. \qquad (7.123)$$
The probability density function for the absolute DFT values at the position of the signal (whose
real and imaginary parts are described by (7.121)) is Rice-distributed
$$p(\xi) = \frac{2\xi}{\sigma_\varepsilon^2 N}\, e^{-(\xi^2 + N^2A^2)/(\sigma_\varepsilon^2 N)}\, I_0\!\left(\frac{2NA\xi}{\sigma_\varepsilon^2 N}\right), \quad \xi \geq 0, \qquad (7.124)$$
where I0 (ξ ) is the zero-order modified Bessel function (for A = 0, when I0 (0) = 1 the Rayleigh
distribution is obtained).
When a noise only DFT value surpasses the DFT signal value, then an error in the estimation
occurs. To calculate this probability, consider the absolute DFT value of a signal at and around ξ. The
DFT value at the signal position is within ξ and ξ + dξ with the probability p(ξ )dξ , where p(ξ )
is defined by (7.124). The probability that at least one of M DFT noise only values is above ξ in
amplitude, is equal to
G (ξ ) = 1 − [1 − Q(ξ )] M .
Thus, the probability that the absolute value of the DFT of the signal component is within ξ and ξ + dξ
and that at least one of the absolute DFT noise only values exceeds the DFT signal value is equal to
G (ξ ) p(ξ )dξ. Considering all possible values of ξ, from (7.122) and (7.123), the probability of the
wrong signal frequency detection follows as
$$P_E = \int_0^\infty G(\xi)p(\xi)\,d\xi = \int_0^\infty\left(1 - \left[1 - \exp\!\left(-\frac{\xi^2}{\sigma_\varepsilon^2 N}\right)\right]^M\right)\frac{2\xi}{\sigma_\varepsilon^2 N}\, e^{-(\xi^2 + N^2A^2)/(\sigma_\varepsilon^2 N)}\, I_0\!\left(\frac{2NA\xi}{\sigma_\varepsilon^2 N}\right)d\xi. \qquad (7.125)$$
An approximation of this expression can be obtained by assuming that the DFT of the signal component is not random and that it is equal to NA (positioned at the mean value of the signal's DFT),
$$P_E \cong 1 - \left[1 - \exp\!\left(-\frac{NA^2}{\sigma_\varepsilon^2}\right)\right]^M. \qquad (7.126)$$
Analysis can easily be generalized to the case with K signal components, $s(n) = \sum_{k=1}^{K}A_k e^{j\omega_k n}$.
In many cases, the discrete frequency of the deterministic signal does not satisfy the relation ω0 = 2πk0/N, where k0 is an integer. In such cases, when ω0 ≠ 2πk0/N, the frequency estimation result can be improved, for example, by zero-padding before the Fourier transform calculation or by using a finer grid around the detected maximum. Comments on the estimation of a signal frequency outside the grid are given in Chapter III as well.
If a random signal x (n) passes through a linear time-invariant system, with an impulse response h(n),
then the expected value of the output signal y(n) is given by
$$\mu_y(n) = E\{y(n)\} = \sum_{k=-\infty}^{\infty}h(k)E\{x(n-k)\} = \sum_{k=-\infty}^{\infty}h(k)\mu_x(n-k) = h(n) \ast_n \mu_x(n). \qquad (7.127)$$
The cross-correlation of the input signal, x (n), and the output signal, y(n), can be expressed through the input signal autocorrelation and the impulse response, h(n), as follows.
For a stationary signal, with n − m = l and n − k = p, we get that the cross-correlation of the input
signal and the output signal is equal to the convolution of the input signal autocorrelation and the
reversed and conjugated impulse response, that is
$$r_{xy}(l) = \sum_{p=-\infty}^{\infty}r_{xx}(p)h^*(p-l) = r_{xx}(l) \ast_l h^*(-l).$$
The z-transform of both sides maps this equation into the z-transform domain,
$$\sum_{l=-\infty}^{\infty}r_{xy}(l)z^{-l} = \sum_{l=-\infty}^{\infty}\sum_{p=-\infty}^{\infty}r_{xx}(p)h^*(p-l)z^{-l} = \sum_{k=-\infty}^{\infty}\sum_{p=-\infty}^{\infty}r_{xx}(p)h^*(k)z^{-p}\left(z^{-1}\right)^{-k},$$
$$R_{xy}(z) = R_{xx}(z)H^*\!\left(\frac{1}{z^*}\right).$$
The Fourier transform follows from the last equation as
$$S_{xy}(e^{j\omega}) = S_{xx}(e^{j\omega})H^*(e^{j\omega}).$$
After some straightforward transformations, we get the z-transform domain relation between the
autocorrelations of the stationary input and the output signal
$$R_{yy}(z) = R_{xx}(z)H(z)H^*\!\left(\frac{1}{z^*}\right).$$
The Fourier transform of output signal autocorrelation function in terms of the Fourier transform of the
input signal autocorrelation function and the system frequency response is given by
$$S_{yy}(e^{j\omega}) = S_{xx}(e^{j\omega})\left|H(e^{j\omega})\right|^2, \qquad (7.132)$$
proving that Sxx(e^{jω}) is indeed the power spectral density. By taking a narrow bandpass filter with unit amplitude, |H(e^{jω})|² = 1 for ω0 ≤ ω < ω0 + dω, we will get the spectral density of the signal x (n) within that small frequency range.
The input signal is a zero-mean white noise ε(n) with the variance σε2 . Find the cross-correlation
of the input signal and the output signal and the autocorrelation of the output signal. For a = −1
find the power spectral density of the output signal.
Since the input signal is a white noise of variance σε², its autocorrelation is, by definition,
$$r_{xx}(n) = \sigma_\varepsilon^2\delta(n).$$
The power spectral density of the input signal is obtained as the Fourier transform of the
autocorrelation function, that is
$$S_{xx}(\omega) = \sum_{n=-\infty}^{\infty}r_{xx}(n)e^{-j\omega n} = \sigma_\varepsilon^2.$$
The z-transform of the autocorrelation function of the output signal, for the linear time-invariant
system, is equal to
The autocorrelation function of the output signal is equal to the inverse z-transform of Ryy (z),
while the z-transform of the cross-correlation of the input and output signal is
Its inverse z-transform of Ryx (z) is equal to the cross-correlation, ryx (n),
with the random input signal x (n) = ε(n), µε = 0 and rεε (n) = δ(n), find:
(a) The expected value µy (n) and the autocorrelation ryy (n) of the output signal,
(b) The power spectral density functions Syy (ω ) and Syx (ω ).
⋆ The expected value of the output is
$$\mu_y = \mu_x H(e^{j0}) = \mu_\varepsilon H(e^{j0}) = 0,$$
and R xx (z) = 1.
The transfer function of the considered system has the following form
$$H(z) = \frac{1}{1 - 1.3z^{-1} + 0.36z^{-2}} = \frac{1}{(1-0.9z^{-1})(1-0.4z^{-1})}.$$
while the cross-power spectral density function Syx(ω) is equal to Ryx(z) at z = e^{jω}.
Example 7.51. The white noise ε(n) with variance σε2 and zero mean is an input to a linear time-
invariant system. If the impulse response of the system is h(n) show that
$$E\{x(n)y(n)\} = h(0)\sigma_\varepsilon^2$$
and
$$\sigma_y^2 = \sigma_\varepsilon^2\sum_{n=-\infty}^{\infty}|h(n)|^2 = \sigma_\varepsilon^2 E_h,$$
where y(n) is the output of this system.
⋆The expected value of the product of the input signal and the output signal is
$$E\{x(n)y(n)\} = E\left\{\sum_{k=-\infty}^{\infty}h(k)x(n)x(n-k)\right\},$$
producing
$$E\{x(n)y(n)\} = \sum_{k=-\infty}^{\infty}h(k)\sigma_\varepsilon^2\delta(k) = h(0)\sigma_\varepsilon^2.$$
The variance of the output signal is defined by
$$\sigma_y^2 = E\{y(n)y^*(n)\} - E\{y(n)\}E\{y^*(n)\},$$
or
$$\sigma_y^2 = E\left\{\sum_{k=-\infty}^{\infty}h(k)x(n-k)\sum_{l=-\infty}^{\infty}h^*(l)x^*(n-l)\right\} - E\left\{\sum_{k=-\infty}^{\infty}h(k)x(n-k)\right\}E\left\{\sum_{l=-\infty}^{\infty}h^*(l)x^*(n-l)\right\}.$$
The output signal is a zero-mean signal, since
$$E\{y(n)\} = E\{y^*(n)\} = \sum_{k=-\infty}^{\infty}h(k)E\{x(n-k)\} = 0.$$
Since r xx (n) = σε2 δ(n) , that is, r xx (l − k) = σε2 δ(l − k) , only the terms with l = k remain in
the double summation expression for the variance σy2 , producing
$$\sigma_y^2 = \sigma_\varepsilon^2\sum_{k=-\infty}^{\infty}|h(k)|^2 = \sigma_\varepsilon^2 E_h.$$
A narrowband random signal with Np components around the frequencies ω1, ω2, . . . , ωNp can be considered, from a spectral point of view, as an output of the system whose transfer function is
$$H(z) = \frac{G}{(1-r_1e^{j\omega_1}z^{-1})(1-r_2e^{j\omega_2}z^{-1})\cdots(1-r_{N_p}e^{j\omega_{N_p}}z^{-1})} = \frac{G}{1 + a_1z^{-1} + a_2z^{-2} + \cdots + a_{N_p}z^{-N_p}}$$
when the input is a white noise. The amplitudes of the poles ri , i = 1, 2, . . . , Np , are inside (and close
to) the unit circle. The discrete-time domain description of this system is
$$y(n) = -a_1y(n-1) - a_2y(n-2) - \cdots - a_{N_p}y(n-N_p) + Gx(n), \qquad (7.133)$$
where x (n) is a white noise with variance σx² = 1, the autocorrelation r xx (k) = δ(k), and the spectral energy density S xx (ω) = 1. For a given narrowband random signal y(n), the task is to find the coefficients ai and G.
The autocorrelation of the real-valued output signal is obtained after the multiplication of the difference equation (7.133) by y(n − k) and taking the expectation,
$$r_{yy}(k) + a_1r_{yy}(k-1) + \cdots + a_{N_p}r_{yy}(k-N_p) = G\,E\{x(n)y(n-k)\}.$$
For k = 0, it follows that
$$r_{yy}(0) + a_1r_{yy}(1) + \cdots + a_{N_p}r_{yy}(N_p) = G^2,$$
since E{x(n)y(n)} = h(0)σx² = G. For k > 0 and a causal system, we may find that r xy (k) = h(−k) = 0. It is also clear from (7.133) that x (n) is related to y(n) and that any y(n − k), for k > 0, does not include x (n), meaning that E{x(n)y(n − k)} = 0, and
$$r_{yy}(k) + a_1r_{yy}(k-1) + \cdots + a_{N_p}r_{yy}(k-N_p) = 0, \quad k > 0.$$
The previous equations are known as the Yule-Walker equations. The matrix form of this system of equations is
$$\begin{bmatrix} r_{yy}(0) & r_{yy}(1) & \dots & r_{yy}(N_p) \\ r_{yy}(1) & r_{yy}(0) & \dots & r_{yy}(N_p-1) \\ \vdots & \vdots & \ddots & \vdots \\ r_{yy}(N_p) & r_{yy}(N_p-1) & \dots & r_{yy}(0) \end{bmatrix}\begin{bmatrix} 1 \\ a_1 \\ \vdots \\ a_{N_p} \end{bmatrix} = \begin{bmatrix} G^2 \\ 0 \\ \vdots \\ 0 \end{bmatrix}. \qquad (7.134)$$
The system is solved for the unknown system coefficients [a0, a1, a2, . . . , aNp] with G = 1. Then, the coefficients are normalized as [a0, a1, a2, . . . , aNp]/a0, with G = 1/a0. The spectral energy density of y(n) follows, with S xx (ω) = 1, as
$$S_{yy}(\omega) = \left|\frac{G}{1 + a_1e^{-j\omega} + a_2e^{-j2\omega} + \cdots + a_{N_p}e^{-jN_p\omega}}\right|^2. \qquad (7.135)$$
In practice, the autocorrelation is estimated from the available data as
$$r_{yy}(k) = \frac{1}{N-k}\sum_{n=0}^{N-1-k}y(n+k)y(n), \quad \text{for } 0 \leq k \leq N-1, \qquad (7.136)$$
and ryy(k) = ryy(−k) for −N + 1 ≤ k < 0. These values are then used in (7.134) for the autoregressive spectral estimation.
Next, we will comment on the estimated autocorrelation within the basic definition of the power
spectral density framework, Section 7.3.4. Relation (7.136) corresponds to the unbiased estimation
of the autocorrelation function. The power spectral density, according to (7.32), is calculated as
Syy (ω ) = FT{ryy (k)}.
Since the autocorrelation estimates for a large k use only a small number of signal samples
in averaging, they are not reliable. It is common to apply a triangular (Bartlett) window function
(w(k) = (N − |k|)/N) to reduce the weight of these estimates in the Fourier transform calculation,
$$w(k)r_{yy}(k) = w(k)\frac{1}{N-k}\sum_{n=0}^{N-1-k}y(n+k)y(n) = \frac{1}{N}\sum_{n=0}^{N-1-k}y(n+k)y(n), \qquad (7.137)$$
for 0 ≤ k ≤ N − 1. Since the window is used, this autocorrelation function estimate is biased. The
Fourier transform of the biased autocorrelation function w(k)ryy (k) = (1 − |k|/N )ryy (k) is the power
spectral density Pyy (ω ) = FT{(1 − |k|/N )ryy (k)} defined by (7.34).
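A compact Python sketch of the autoregressive spectral estimation based on (7.134)–(7.136) follows; the test signal and the model order are illustrative choices, not taken from the text.

```python
import numpy as np

def ar_spectrum(y, Np, n_freq=512):
    """Autoregressive spectral estimate via the Yule-Walker equations (7.134),
    using the unbiased autocorrelation estimate (7.136). A sketch."""
    N = len(y)
    r = np.array([np.sum(y[k:] * y[:N - k]) / (N - k) for k in range(Np + 1)])
    # Equations for k = 1..Np: sum_i a_i r(|k-i|) = -r(k)
    R = np.array([[r[abs(i - j)] for j in range(Np)] for i in range(Np)])
    a = np.linalg.solve(R, -r[1:Np + 1])
    G2 = r[0] + np.dot(a, r[1:Np + 1])           # gain from the k = 0 equation
    w = np.linspace(-np.pi, np.pi, n_freq)
    denom = 1 + sum(a[i] * np.exp(-1j * (i + 1) * w) for i in range(Np))
    return w, G2 / np.abs(denom) ** 2            # Syy(w), cf. (7.135)

rng = np.random.default_rng(6)
n = np.arange(256)
y = np.cos(2 * np.pi * 0.1 * n) + 0.5 * rng.normal(0, 1, len(n))
w, S = ar_spectrum(y, Np=4)
print(abs(w[np.argmax(S)]) / (2 * np.pi))  # ~0.1, the sinusoid frequency
```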
within 0 ≤ n ≤ N − 1 = 127, where ϕ1 and ϕ2 are random variables uniformly distributed from
−1/2 rad to 1/2 rad, while ε(n) is the zero-mean, unit-variance Gaussian noise. Plot the power
spectral density calculated using:
(a) The Fourier transform of ryy (k)
$$S_{yy}(\omega) = \mathrm{FT}\{r_{yy}(k)\} = \sum_{k=-N+1}^{N-1}r_{yy}(k)e^{-j\omega k},$$
where the autocorrelation is estimated as
$$r_{yy}(\pm k) = \frac{1}{N-k}\sum_{i=0}^{N-k-1}y(k+i)y(i).$$
(c) The power spectrum in (b) corresponds to FT{w B (k)ryy (k)}, where w B (k) is the
Bartlett window whose width is equal to the width of the autocorrelation function ryy .
(d) The Fourier transform of the signal y(n) over K = 7 shorter intervals. The duration of
these intervals is M = 32, with the step R = M/2. The Fourier transforms of y(n), within these
shorter intervals, are
$$Y_i(e^{j\omega}) = \frac{1}{M}\sum_{n=0}^{M-1}y(iR+n)e^{-j\omega n}$$
for i = 0, 1, . . . , 6. The power spectral densities |Yi (e jω )|2 are averaged to produce (Welch
periodogram)
$$S_{yy}^A(\omega) = \frac{1}{K}\sum_{i=0}^{K-1}\left|Y_i(e^{j\omega})\right|^2.$$
(e) Relation (7.135) with appropriately estimated coefficients ai and G, along with the
relations (7.134) and (7.136).
⋆The results are shown in Fig. 7.32, in order from (a) to (e).
Figure 7.32 Spectral analysis of a signal with random phases (normalized values). The top panel shows the noise-free spectrum; the order and description of the remaining panels correspond to the task order in Example 7.52.
Detection of a known deterministic signal in a high-noise environment is of crucial interest in many real-world applications. In this case the problem is in testing the hypothesis
y(n) = ys(n) + yε(n),
where ys(n) and yε(n) are the system outputs to the input signals s(n) and ε(n), respectively. For the output signal ys(n), it holds that
Ys(e^{jω}) = H(e^{jω}) S(e^{jω}).
The power spectral density of ys (n) is equal to
|Ys(e^{jω})|² = |H(e^{jω})|² |S(e^{jω})|².
The aim is to maximize the output signal at an instant n0 if the input signal contains s(n). According to the Schwarz inequality (for its discrete form see Part VI),
|(1/(2π)) ∫_{−π}^{π} H(e^{jω}) S(e^{jω}) e^{jωn0} dω|² ≤ [(1/(2π)) ∫_{−π}^{π} |S(e^{jω})|² dω] [(1/(2π)) ∫_{−π}^{π} |H(e^{jω})|² dω],
This ratio reaches its maximum when the equality sign holds,
PSNRmax = (1/(2πσε²)) ∫_{−π}^{π} |S(e^{jω})|² dω = Es/σε².
This system is called a matched filter. Its impulse response is matched to the signal form. The matched filter maximizes the ratio of the output signal and the noise, and it is used in detection to decide whether the known signal s(n) exists in the noisy signal x(n).
If the additive noise is Gaussian distributed, then the null hypothesis (there is no deterministic signal in the input signal) rejection region, for the significance level α = 0.001, is
Probability{|y(n0)| > λ} = 1 − erf(λ/(√2 σy)) < 0.001,
that is,
|y(n0)| > 3.2905 σy.
For an application where on the order of 1000 nonzero samples is expected in the output signal, the significance level must be this small; otherwise, we will have many false positive results.
Example 7.53. The matched filter is illustrated on the detection of the chirp signal
s(n) = e^{−2(n/128)²} cos(8π(n/128)² + πn/8)
in a Gaussian white noise of the variance σε2 = 1. The output of the matched filter is calculated
for n0 = 0 using the known signal,
y(n) = x(n) ∗n s(−n) = Σ_{m=−∞}^{∞} (s(m) + ε(m)) s(m − n).
Figure 7.33 Illustration of the matched filter: the signal s(t), s(n); the input noisy signal x(n) = s(n) + ε(n), which contains the signal s(n); and the input signal x(n) = ε(n), which does not contain the signal s(n). The corresponding outputs of the matched filter y(n) = x(n) ∗ s(−n) are presented below the input signal panels. The null hypothesis rejection region is shaded.
Two cases are shown in Fig. 7.33: (1) when the input signal contains s(n) and (2) when the input signal does not contain s(n). We can see that the output of the matched filter has an easily detectable peak at n = 0 when the input signal contains s(n). There is no such peak in y(n) when the input signal x(n) is noise only. Therefore, the null hypothesis can be rejected in the case presented in the left panels, while it cannot be rejected for the case shown in the right panels.
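A short simulation of the matched-filter detector from Example 7.53 can reproduce this behavior; the chirp and the noise variance follow the example, while the test statistic is evaluated at the matching lag n0 = 0 with the rejection bound 3.2905σy derived above.

```python
import numpy as np

n = np.arange(-128, 129)
s = np.exp(-2 * (n / 128) ** 2) * np.cos(8 * np.pi * (n / 128) ** 2 + np.pi * n / 8)

rng = np.random.default_rng(2)
eps = rng.standard_normal(s.size)              # white Gaussian noise, variance 1

# matched filter output y(n) = x(n) * s(-n), i.e., the correlation of x with s
y_signal = np.correlate(s + eps, s, mode="full")   # input contains s(n)
y_noise = np.correlate(eps, s, mode="full")        # input is noise only

mid = s.size - 1                               # index of the lag n = 0 in "full" mode
sigma_y = np.sqrt(np.sum(s ** 2))              # output noise deviation, sqrt(Es), for unit noise variance
threshold = 3.2905 * sigma_y                   # rejection bound for alpha = 0.001
print(abs(y_signal[mid]) > threshold)          # True: the null hypothesis is rejected
print(abs(y_noise[mid]) > threshold)           # False with probability 0.999
```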
Consider a signal s(n) that can take one of two constant values s(n) = A1 or s(n) = A2 , corrupted by
an additive random noise ε(n):
(1) x (n) = A1 + ε(n) or
(2) x (n) = A2 + ε(n).
Assume that the probabilities of these two signal states are P(A1) and P(A2), such that P(A1) + P(A2) = 1. In this experiment, a value of the signal x(n) = y is observed and the question is which of these two hypotheses is true:
where p(y)dy is the probability that y takes a specific value within [y, y + dy). This relation can be
written as
P(A1|y) = p(y|A1) P(A1) / p(y).
Similarly, Bayes’ formula for the state x (n) = A2 + ε(n) produces
P(A2|y) = p(y|A2) P(A2) / p(y).
Since P( A1 |y) is the probability of A1 if y occurred and P( A2 |y) is the probability of A2 if the same
y occurred, then a criterion to state that the hypothesis H1 is true can be defined by
P ( A1 | y ) > P ( A2 | y ).
For the Gaussian probability density function of the disturbance ε(n), the signal x(n) in the case x(n) = A1 + ε(n) is distributed as
p(ξ|A1) = (1/(σx√(2π))) e^{−(ξ−A1)²/(2σx²)}.
For x(n) = A2 + ε(n), the probability density is
p(ξ|A2) = (1/(σx√(2π))) e^{−(ξ−A2)²/(2σx²)}.
The decision threshold is then obtained from
e^{−((d−A1)² − (d−A2)²)/(2σx²)} = P(A2)/P(A1),
that is,
2d(A1 − A2) − A1² + A2² = 2σx² ln(P(A2)/P(A1)).
The threshold value, for the Gaussian distribution, is
d = σx² ln(P(A2)/P(A1)) / (A1 − A2) + (A1 + A2)/2.
Example 7.54. Consider the random signal with two possible forms: (1) x (n) = A1 + ε(n) = 1 + ε(n)
or (2) x (n) = A2 + ε(n) = −1 + ε(n), where ε(n) is the zero-mean Gaussian distributed
random variable with variance σε2 = 0.5. Assume that the probabilities of these two states are
P( A1 ) = 1/3 and P( A2 ) = 2/3. Find the decision threshold d.
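A quick numeric check of the threshold formula above, for the values given in Example 7.54, is shown in this short sketch.

```python
import numpy as np

A1, A2 = 1.0, -1.0
P1, P2 = 1 / 3, 2 / 3
sigma2 = 0.5                                   # variance of eps(n)

d = sigma2 * np.log(P2 / P1) / (A1 - A2) + (A1 + A2) / 2
print(d)   # 0.25*ln(2) = 0.1733..., moved from the midpoint toward the less probable A1
```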
Assume that the input signal is x(n) and that it contains information about the desired signal d(n). The input signal is processed by a system whose impulse response is h(n). The output signal is y(n) = h(n) ∗n x(n). The task here is to find the impulse response h(n) of the system such that the difference between the desired signal and the output signal, denoted as the error
e(n) = d(n) − y(n),
This relation states that the expected value of the product of the error signal e(n) = d(n) − y(n) and the input signal x*(n − k) is zero,
E{2e(n) x*(n − k)} = 0,
for any k. Signals satisfying this relation are said to be orthogonal to each other.
Relation (7.143) can be written as
E{ Σ_{m=−∞}^{∞} h(m) x(n − m) x*(n − k) } = E{ d(n) x*(n − k) }
or
Σ_{m=−∞}^{∞} h(m) rxx(k − m) = rdx(k).
In the z-domain, this relation gives
H(z) = Rdx(z) / Rxx(z).
For a special case, when the input signal is the desired signal d(n) with an additive noise
x ( n ) = d ( n ) + ε ( n ),
where ε(n) is uncorrelated with the desired signal, the optimal (Wiener) filtering relation follows
H(z) = Rdd(z) / (Rdd(z) + Rεε(z))
since
Here we used E {d(n)ε∗ (n − k)} = 0, since d(n) and ε(n) are uncorrelated. Also
H(e^{jω}) = Sdd(ω) / (Sdd(ω) + Sεε(ω)).
Example 7.55. A signal x(n) = d(n) + ε(n) is processed by an optimal filter. The power spectral density of d(n) is Sdd(ω). If the signal d(n) and the additive noise ε(n), whose power spectral density is Sεε(ω), are independent, find the output signal-to-noise ratio.
The optimal prediction system follows with the input signal x(n) = d(n − 1) + ε(n − 1) and the desired signal d(n). The transfer function of the optimal predictor is obtained from
and
as
H(z) = z Sdd(z) / (Sdd(z) + Sεε(z))
since
Σ_{k=−∞}^{∞} rdd(k + 1) z^{−k} = Σ_{k=−∞}^{∞} rdd(k) z^{−k+1} = z Sdd(z).
The optimal smoothing is the case when the desired signal is d(n) and we can use its future
value(s). This processing follows with x (n) = d(n + 1) + ε(n + 1) as
Example 7.56. The input signal is x (n) = s(n) + ε(n), where d(n) = s(n) is the desired signal and
ε(n) is a noise. If the autocorrelation functions of the signal and noise are rss (n) = 4−|n| and
rεε (n) = 2δ(n), respectively, and the cross-correlation of the signal and noise is rsε (n) = δ(n),
design the optimal filter.
where
Rdx (z) = Rss (z) + Rsε (z) and R xx (z) = Rss (z) + 2Rsε (z) + Rεε (z).
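A numeric sketch for Example 7.56 may evaluate the optimal filter on a frequency grid, H(e^{jω}) = Sdx(ω)/Sxx(ω), using the closed-form spectra of the given correlation sequences; the two-sided geometric sum Σ a^{|n|} e^{−jωn} = (1 − a²)/(1 − 2a cos ω + a²) is used for rss(n) = 4^{−|n|}.

```python
import numpy as np

omega = np.linspace(-np.pi, np.pi, 512)
a = 0.25

Sss = (1 - a**2) / (1 - 2 * a * np.cos(omega) + a**2)   # FT of rss(n) = 4^{-|n|}
See = 2.0                                               # FT of 2*delta(n)
Sse = 1.0                                               # FT of delta(n)

Sdx = Sss + Sse                # cross power spectral density of d(n) = s(n) and x(n)
Sxx = Sss + 2 * Sse + See      # input power spectral density
H = Sdx / Sxx                  # optimal (Wiener) filter frequency response
```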
The realization of optimal systems using FIR filters will be presented in the introductory part of the chapter dealing with adaptive discrete systems in Part III.
In order to process analog signals using computers, they have to be converted into numbers stored in registers of finite precision. Continuous-time signals are transformed into digital signals using
analog-to-digital (A/D) converters. This operation is done in two steps. First, the continuous-time signal
is converted into a discrete-time signal by taking samples of the continuous-time signal at discrete-time
instants (sampling)
x (n) = x (n∆t)∆t.
Next, the discrete-time signal, with continuous amplitudes of samples, is converted into a digital signal
xQ (n) = Q[ x (n)]
with discrete-valued amplitudes (quantization). This process is illustrated in Fig. 7.35. The error caused
by the quantization of the discrete-time signal amplitudes is called the quantization noise.
Figure 7.35 Illustration of a continuous signal and its discrete-time and digital version.
For registers with b bits, the digital signal values xQ(n) are coded into a binary format. Assume that registers with b bits are used and that all input signals are normalized to the range
0 ≤ x(n) < 1.
A value is then represented by its fractional binary digits a−1 a−2 a−3 . . . a−b, for example, 10110010 for b = 8.
The quantization error is the difference between the amplitude of the original signal and the quantized signal,
e(n) = x(n) − xQ(n).
When the rounding approach is used, the maximum absolute error can be a half of the last digit weight,
−(1/2) 2^{−b} ≤ x(n) − xQ(n) < (1/2) 2^{−b}
or
−∆/2 ≤ x(n) − xQ(n) < ∆/2,
where ∆ = 2^{−b}. We can also write |e(n)| ≤ 2^{−(b+1)} = ∆/2.
In the example from Fig. 7.35, the quantization step is 2^{−4} = 1/16 and the error is within |e(n)| ≤ (1/2)(1/16) = 1/32.
The error values are equally probable within the defined interval, with the probability density function
pe(ξ) = 1/∆ for −∆/2 ≤ ξ < ∆/2, and pe(ξ) = 0 elsewhere.
The quantization error of the signal x (n) may be described as an additive uniform white noise.
The expected value of the quantization error, with the rounding approach, is
µe = E{e(n)} = ∫_{−∆/2}^{∆/2} ξ pe(ξ) dξ = 0.
With the truncation approach, the expected value is
µe = E{e(n)} = ∆/2,
and the variance is
σe² = (1/∆) ∫_{0}^{∆} (ξ − ∆/2)² dξ = ∆²/12.
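The uniform error model can be verified numerically; the following sketch rounds a random signal to b fractional bits and compares the empirical mean and variance of e(n) = x(n) − xQ(n) with the values 0 and ∆²/12 derived above.

```python
import numpy as np

b = 8
Delta = 2.0 ** (-b)
rng = np.random.default_rng(3)
x = rng.uniform(0, 1, 100000)          # signal normalized to 0 <= x(n) < 1

xQ = np.round(x / Delta) * Delta       # rounding to b fractional bits
e = x - xQ

print(e.mean())                        # close to 0
print(e.var(), Delta ** 2 / 12)        # close to Delta^2 / 12
```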
Example 7.58. The DFT of a signal x(n) is calculated using the quantized version xQ(n) of this signal. Quantization is done by an A/D converter with b + 1 = 8 bits, using rounding. The DFT is
calculated on a high precision computer with N = 1024 signal samples. Find the expected value
and variance of the calculated DFT.
The variance is
σ²_{XQ}(k) = Σ_{n1=0}^{N−1} Σ_{n2=0}^{N−1} σe² δ(n1 − n2) e^{−j2πk(n1−n2)/N} = σe² N = (1/12) 2^{−14} 2^{10} = 1/192.
The noise in the DFT is a sum of many independent noises from the input signal and coefficients.
Thus, it is Gaussian distributed with standard deviation σXQ = 0.072. It may significantly influence
the signal DFT values, especially if they are not well concentrated or if there are signal components
with small amplitudes.
Example 7.59. How does the input signal quantization error influence the results of:
(a) the weighted sum
Xs = Σ_{n=0}^{N−1} an x(n), and
(b) the product
XP = Π_{n=0}^{N−1} x(n)?
⋆If the quantized values xQ (n) = Q[ x (n)] = x (n) + e(n) of the signal x (n) are used in
calculation instead of the true signal values then:
(a) The estimator of a weighted sum is
X̂s = Σ_{n=0}^{N−1} an xQ(n) = Σ_{n=0}^{N−1} an x(n) + Σ_{n=0}^{N−1} an e(n).
(b) Assuming that the individual errors are small, so that all the higher-order error terms containing the error products, for example, e(n)e(m) or e(n)e(m)e(l), can be neglected, we get
X̂P ≅ Π_{n=0}^{N−1} x(n) + Σ_{m=0}^{N−1} e(m) Π_{n=0, n≠m}^{N−1} x(n).
The expected value is zero if the rounding is used. The variance is signal-dependent,
σ²_{XP} = Σ_{m=0}^{N−1} Π_{n=0, n≠m}^{N−1} x²(n) var{e(n)} = (1/12) ∆² Σ_{m=0}^{N−1} Π_{n=0, n≠m}^{N−1} x²(n).
In the quantization of results, after basic arithmetic operations are performed, we can distinguish two cases. The first case is when fixed-point arithmetic is used. The register here assumes that the decimal point is positioned at a fixed place. All data are written with respect to this assumed decimal point position. In floating-point arithmetic, the numbers are written in the sign-mantissa-exponent format. The quantization error is then produced in the mantissa only.
Fixed point arithmetic assumes that the decimal point position is at a fixed place. The common
assumption is that all input values and the intermediate results, in this case, are normalized so that
0 ≤ x (n) < 1 or −1 < x (n) < 1, if the sign bit is used.
The multiplication of two quantized values, xQ(n) xQ(m), will, in general, produce a result with 2b digits. It should be quantized in the same way as the input signal,
Q[xQ(n) xQ(m)] = xQ(n) xQ(m) + e(n, m),
where e(n, m) is the quantization error satisfying all the previous properties, with
−∆/2 ≤ e(n, m) ≤ ∆/2.
Example 7.60. Find the expected value of the quantization error for
r(n) = Σ_{m=0}^{N−1} x(n + m) x(n − m),
where x (n) is quantized and the product of signals is quantized to b bits as well. Assume that the
signal values are such that their additions will not cause overflow.
In the case when complex-valued numbers are used in calculation, the quantization of the real part and the imaginary part is done separately,
xQ(n) = Q[x(n)] = Q[Re{x(n)}] + j Q[Im{x(n)}] = x(n) + er(n) + j ei(n).
Since the real and imaginary part are independent, with the same variance, the variance of the
quantization error for a complex-valued signal is given by
σe² = 2 · (1/12) ∆² = (1/6) ∆².
For the additions the variance is doubled as well.
In the case of multiplications, one complex-valued multiplication requires four real-valued multiplications, introducing four errors. The quantization variance of a complex-valued multiplication is
σe² = 4 · (1/12) ∆² = (1/3) ∆².
If the values of a signal x (n) are not small we have to ensure that no overflow occurs during
the calculations using the fixed-point arithmetic. Consider a real-valued random white signal whose
samples are within −1 < x (n) < 1, with the variance σx2 . The registers of b + 1 bits are assumed, with
one bit being used for the sign. As an example consider the expected value calculation
XN = (1/N) Σ_{n=0}^{N−1} x(n).
We have to be sure that an overflow will not occur during the expected value calculation. All sums
should stay within the interval (−1, 1).
One approach to calculate XN is to divide the input signal values by N and sum them, that is,
XN = x(0)/N + x(1)/N + · · · + x(N − 1)/N.
Then we are sure that no result will be outside the interval (−1, 1). Division of the signal samples by
N introduces an additive quantization noise,
x (0) x (1) x ( N − 1)
X̂ N = + e (0) + + e (1) + · · · + + e ( N − 1).
N N N
Variance of the equivalent noise e(0) + e(1) + · · · + e( N − 1) is
σe² = (1/12) ∆² N = (1/12) 2^{−2b} N.
Since the variance of x (n)/N is σx2 /N 2 , the variance of X̂ N is
σ²_{X̂N} = N σx²/N² + (1/12) ∆² N.
Ratio of the variances corresponding to the signal and the noise in the result is
SNR = (σx²/N) / ((1/12) ∆² N) = (1/N²) σx² / ((1/12) ∆²) = (1/N²) σx² / ((1/12) 2^{−2b}),
or in [dB],
SNR = 10 log((1/N²) σx² / ((1/12) 2^{−2b})) = 20 log σx − 20 log N − 20 log 2^{−b} + 10 log 12
= 20 log σx − 20 log2(N)/log2(10) − 20 log2(2^{−b})/log2(10) + 10.8 = 20 log σx − 6.02(m − b) + 10.8,
where N = 2^m. Obviously, increasing the number of samples from N to 2N will keep the same SNR if b is increased by one bit, since (m + 1) − (b + 1) = m − b.
Another way to calculate the mean is to perform the summation step by step, according to the following scheme, presented for N = 8,
XN = [(x(0)/2 + x(1)/2)/2 + (x(2)/2 + x(3)/2)/2]/2 + [(x(4)/2 + x(5)/2)/2 + (x(6)/2 + x(7)/2)/2]/2.
Here, two adjacent signal values x(n) are first divided by 2. They are then added, avoiding possible overflows. The error in one step is
x(n)/2 + e(n) + x(n + 1)/2 + e(n + 1) = (x(n) + x(n + 1))/2 + e_n^(2).
The error
e_n^(2) = e(n) + e(n + 1)
has the variance
var{e_n^(2)} = (1/12) ∆² + (1/12) ∆² = (1/6) ∆².
After every division by 2, the result is shifted in the register to the right and a quantization error is created. Thus, the error model, due to the addition quantization, is
X̂N = [(x(0)/2 + x(1)/2 + e0^(2))/2 + (x(2)/2 + x(3)/2 + e2^(2))/2 + e0^(4)]/2
+ [(x(4)/2 + x(5)/2 + e4^(2))/2 + (x(6)/2 + x(7)/2 + e6^(2))/2 + e4^(4)]/2 + e0^(8)
= x(0)/N + x(1)/N + · · · + x(N − 1)/N
+ e0^(2)/(N/2) + e2^(2)/(N/2) + · · · + e_{N−2}^(2)/(N/2)
+ e0^(4)/(N/4) + · · · + e_{N−4}^(4)/(N/4)
+ · · ·
+ e0^(N)/(N/N).    (7.144)
The variance of all quantization noises is the same, σe² = (1/6) ∆² = (1/6) 2^{−2b}. Notice that the noises in the first stage are divided by N/2, due to the divisions by 2 in the next stages of summation. Their variance is reduced by the factor N²/4. The values of the variances of the errors in these stages are
var{e0^(2)/(N/2) + e2^(2)/(N/2) + · · · + e_{N−2}^(2)/(N/2)} = (1/6) ∆² (1/(N²/4)) (N/2) = (1/6) ∆² (2/N),
var{e0^(4)/(N/4) + · · · + e_{N−4}^(4)/(N/4)} = (1/6) ∆² (1/(N²/16)) (N/4) = (1/6) ∆² (4/N),
· · ·
var{e0^(N)/(N/N)} = (1/6) ∆² (1/(N²/N²)) (N/N) = (1/6) ∆² (2^m/N).
σ²_{X̂N} = N σx²/N² + (1/6) ∆² (2/N) + (1/6) ∆² (4/N) + · · · + (1/6) ∆² (2^m/N)    (7.145)
= σx²/N + (1/6) ∆² (2/N)(1 + 2 + · · · + 2^{m−1}) = σx²/N + (1/6) ∆² (2/N) (1 − 2^m)/(1 − 2)
= σx²/N + (1/6) ∆² (2/N)(N − 1) = σx²/N + (1/3) ∆² (1 − 1/N).
Ratio of the variances σx²/N and (1/3) ∆² (1 − 1/N), corresponding to the output signal-to-noise ratio, is
SNR = (σx²/N) / ((1/3) ∆² (1 − 1/N)) = σx² / ((1/3) ∆² (N − 1)) ≅ σx² / ((N/3) 2^{−2b}) = 3 σx² 2^{2(b−m/2)}.
A significant improvement (of the order of N) is obtained using this scheme for the summation, instead of the direct one. In dB, the ratio is
SNR ≅ 10 log(3 σx² 2^{2(b−m/2)}) = 20 log σx − 6.02(m/2 − b) + 4.8.
If the signal values were complex then 2−2b /12 would be changed to 2−2b /6.
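The two summation strategies can be compared by a simple simulation; the sketch below models every register write with b-bit rounding, an assumption consistent with the analysis above.

```python
import numpy as np

b = 12
Delta = 2.0 ** (-b)
rng = np.random.default_rng(4)

def quantize(v):
    """Model of storing values into (b+1)-bit registers with rounding."""
    return np.round(v / Delta) * Delta

def mean_direct(x):
    """Divide every sample by N first, then accumulate."""
    return quantize(x / len(x)).sum()

def mean_pairwise(x):
    """Halve and add pairwise, quantizing after every division by 2."""
    while len(x) > 1:
        x = quantize(x[0::2] / 2) + quantize(x[1::2] / 2)
    return x[0]

m = 10
x = rng.uniform(-1, 1, 2 ** m)
exact = x.mean()
print(abs(mean_direct(x) - exact))     # direct scheme error
print(abs(mean_pairwise(x) - exact))   # typically smaller, as derived above
```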
The previous results are common in the literature. They are derived assuming that the variances of the errors are the same and obtained assuming the uniform nature of the quantization errors. However, these results differ from the ones obtained by statistical analysis. The reason is in the quantization error distribution and variance. Namely, after the high-precision signal x(n) is divided by 2 and stored into (b + 1)-bit registers, the errors in x(n)/2 + e(n) are uniform, with −∆/2 ≤ e(n) < ∆/2. When these values are stored into registers, then in every next stage, when we calculate {[x(n)/2 + e(n)] + [x(n + 1)/2 + e(n + 1)]}/2, the input values x(n)/2 + e(n) and x(n + 1)/2 + e(n + 1) are already stored in the (b + 1)-bit registers. Division by 2 is just a one-bit shift to the right. This shift causes a one-bit error. Therefore, this one-bit error is discrete in amplitude,
ed ∈ {−∆/2, 0, ∆/2},
with probabilities
Pd (±∆/2) = 1/4 and Pd (0) = 1/2.
The expected value of this kind of error is zero, provided that the rounding is done in such a way that it
takes values ±∆/2 with equal probability (various tie-breaking algorithms for rounding exist). The variance of ed^(i) is
var{e_n^(i)} = 2 var{ed} = 2[(1/4)(−∆/2)² + (1/4)(∆/2)²] = (1/4) ∆², for i > 2.
The total variance of X̂N is then of the form
σ²_{X̂N} = N σx²/N² + (1/6) ∆² (2/N) + (1/4) ∆² (4/N) + · · · + (1/4) ∆² (2^m/N) = σx²/N + (1/2) ∆² (1 − 4/(3N)),
The previous analysis corresponds to the calculation of the DFT coefficient X (0) when the input
signal is a random uniform signal, whose values are within −1 < x (n) < 1, with variance σx2 . A model
for the element X (k), with all quantization errors included, is
X̂(k) = (1/N) Σ_{n=0}^{N−1} { [x(n) + ei(n)] W_N^{nk} + em(n) } = Σ_{n=0}^{N−1} y(n),
where ei (n) is the input signal quantization error and em (n) is the multiplication quantization error.
The variances for complex-valued signals are
var{ei(n)} = 2 · (1/12) ∆² = (1/6) ∆²,   var{em(n)} = 4 · (1/12) ∆² = (1/3) ∆².
Moreover, we have to ensure that the additions do not produce an overflow. If we use the calculation scheme, presented for N = 8, as
X̂(k) = [(y(0)/2 + y(1)/2 + e0^(2))/2 + (y(2)/2 + y(3)/2 + e2^(2))/2 + e0^(4)]/2
+ [(y(4)/2 + y(5)/2 + e4^(2))/2 + (y(6)/2 + y(7)/2 + e6^(2))/2 + e4^(4)]/2 + e0^(8),
then in every addition, the terms should be divided by 2. This division introduces the quantization error.
In the first step,
y(n)/2 + e(n) + y(n + 1)/2 + e(n + 1) = (1/2){[x(n) + ei(n)] W_N^{nk} + em(n) + [x(n + 1) + ei(n + 1)] W_N^{(n+1)k} + em(n + 1)} + e(n) + e(n + 1),
so that
e_n^(2) = [ei(n) W_N^{nk} + em(n) + ei(n + 1) W_N^{(n+1)k} + em(n + 1)]/2 + e(n) + e(n + 1),
with the variance
var{e_n^(2)} = (1/4)[(1/6) ∆² + (1/3) ∆² + (1/6) ∆² + (1/3) ∆²] + 2 · (1/6) ∆² = (7/12) ∆².
In all other steps, within the errors e0^(4) to e0^(N), just the addition errors appear. Their variance, for complex-valued terms, is
var{e_n^(i)} = 2 · (1/6) ∆² = (1/3) ∆².
Therefore, the variance of
X̂N = x(0)/N + x(1)/N + · · · + x(N − 1)/N + e0^(2)/(N/2) + e2^(2)/(N/2) + · · · + e_{N−2}^(2)/(N/2)
+ e0^(4)/(N/4) + · · · + e_{N−4}^(4)/(N/4) + · · · + e0^(N)/(N/N)    (7.146)
is obtained using
var{e0^(2)/(N/2) + e2^(2)/(N/2) + · · · + e_{N−2}^(2)/(N/2)} = (7/12) ∆² (1/(N²/4)) (N/2) = (7/12) ∆² (2/N),
var{e0^(4)/(N/4) + · · · + e_{N−4}^(4)/(N/4)} = (1/3) ∆² (1/(N²/16)) (N/4) = (1/3) ∆² (4/N),
· · ·
var{e0^(N)/(N/N)} = (1/3) ∆² (1/(N²/N²)) (N/N) = (1/3) ∆² (2^m/N).
The total variance is
σ²_{X̂N} = σx²/N + (1/3) ∆² (2/N)(3/4 + 1 + 2 + · · · + 2^{m−1})
= σx²/N + (2/3) ∆² (N − 1/4)/N ≅ σx²/N + (2/3) ∆²,
with
SNR = 10 log(3σx²/(2N∆²)) = 20 log σx − 6.02(m/2 − b) + 1.76.
If the described discrete nature of the quantization error amplitude, after the first quantization
step, is taken into account (provided that the rounding is done in such a way that the error takes values
±∆/2 with equal probability), then with
var{e_n^(i)} = 4 var{ed} = (1/2) ∆²,
for i > 2, the variance of X̂N follows as
σ²_{X̂N} = σx²/N + (∆²/N)(7/6 + 2 + 4 + · · · + 2^{m−1}) = σx²/N + ∆² (N − 5/6)/N ≅ σx²/N + ∆².
If the FFT is calculated using fixed-point arithmetic and the signal is uniformly distributed within −1 < x(n) < 1, with the variance σx², then in order to avoid an overflow the signal could be divided at the input by N and the standard FFT could be used, as in Fig. 7.36.
An improvement in the SNR can be achieved if the scaling is done not on the input signal x(n) by N, but by 1/2 in every butterfly, as shown in Fig. 7.37. The improvement achieved here is due to the fact that the quantization errors appearing in the early butterfly stages are scaled by 1/2 in each subsequent stage and therefore reduced at the output, as in (7.144). An improvement of the order of N is achieved in the output signal-to-noise ratio.
Fixed point arithmetic is simple, but could be inefficient if the signal values within a wide range of
amplitudes are expected. For example, if we can expect the signal values
xQ (n1 ) = 1011111110101.010
xQ (n2 ) = 0.0000000000110101,
Figure 7.36 The FFT calculation scheme obtained using the decimation-in-frequency for N = 8 with the signal
being divided by N in order to avoid an overflow when the fixed point arithmetic is used.
Figure 7.37 The FFT calculation scheme obtained using the decimation-in-frequency for N = 8, with the signal being scaled by 1/2 in every butterfly in order to avoid an overflow when the fixed-point arithmetic is used.
then obviously fixed-point arithmetic would require large registers so that both values can be stored without losing their significant digits. However, we can represent these signal values in the exponential form as
The exponential format of numbers is then written within the register in the following form:
sn se e1 e2 e3 e4 e5 e6 e7 m−1 m−2 m−3 . . . m−b
where:
sn is the sign of the number (1 for a positive number and 0 for a negative number),
se is the sign of the exponent (1 for a positive exponent and 0 for a negative exponent),
e1 e2 . . . e7 is the binary format of the exponent, and
m−1 m−2 . . . m−b is the mantissa; assuming that its integer part is always 1, it is omitted.
Within this format, the previous signal value xQ (n1 ), with a register of 19 bits in total, is
1 1 0 0 0 1 1 0 0 0 1 1 1 1 1 1 1 0 1,
while xQ (n2 ) is
1 0 0 0 0 1 0 1 1 1 0 1 0 1 0 0 0 0 0.
If the exponent cannot be written within the defined number of bits (here 7), the computer has to stop
the calculation and indicate "overflow", that is, the number cannot fit into the register. For mantissa,
the values are just rounded to the available number of bits. In the implementations based on the
floating-point arithmetic, the quantization affects the mantissa only. The relative error in mantissa is
again
|e(n)| ≤ 2^{−(b+1)} = ∆/2.
The error in signal is multiplied by the exponent. Since we can say that the exponent value is of the
signal order, we can write
The quantization error behaves here as a multiplicative uniform noise. Thus, for the floating-point
representation, multiplicative errors appear.
The floating-point additions also produce quantization errors, which are represented by a multiplicative noise. During additions, the number of bits may increase. This increase in the number of bits requires a mantissa shift, which causes a multiplicative error.
In addition to the IEEE standard format, where the total number of bits is 32 (23 bits for the mantissa and 8 bits for the exponent), we will mention two standard formats for telephone signal coding. The µ-law pulse-code modulation (PCM) is used in North America and the A-law PCM is used in European telephone networks. They use 8-bit representations with a sign bit, 3 exponent bits, and 4 mantissa bits,
s e1 e2 e3 m1 m2 m3 m4.
The µ-law encoding takes a 14-bit signed signal value (its two’s complement representation) as input,
adds 33 (binary 100001) and converts it to an 8-bit value. The encoding formula in the µ-law is
(−1)^s [2^{e+1}(m + 16.5) − 33].
The sign bit s is set to 1 if the input sample is negative. It is set to 0 if the input sample is positive.
Number 0 is written as
0 0 0 0 0 0 0 0.
Example 7.61. As an example, consider the positive numbers from +1 to +30. They are written as +2^1(m + 16.5) − 33 with 15 quantization steps equal to 2 (starting from m = 1 to m = 15). Then the numbers from +31 to +94 are written as +2^2(m + 16.5) − 33 with 16 quantization steps equal to 4 (with m from 0 to 15). The last interval for positive numbers is from +4063 to +8158, written as +2^8(m + 16.5) − 33 with 16 quantization intervals (with m from 0 to 15) of width 256. The range of the input values is from −8159 to +8159 (about ±2^13), with the minimum step size 2 for the smallest amplitudes.
This encoding approximates the µ-law compression function
F(x) = sign(x) ln(1 + µ|x|) / ln(1 + µ),
with µ = 255.
Example 7.62. Write the number a = 456 in the binary µ-law format.
⋆The number to be represented by 2^{e+1}(m + 16.5) is 456 + 33 = 489. The mantissa range is 0 ≤ m ≤ 15. This means that the exponent (e + 1) should be such that
0 + 16.5 ≤ 489/2^{e+1} ≤ 15 + 16.5,
for the range 16.5 ≤ m + 16.5 ≤ 31.5. It is easy to conclude that 489/16 = 30.5625, meaning e + 1 = 4 with m + 16.5 = 30.5625. The nearest integer value of m is m = 14. Therefore, â = 2^{3+1} × (14 + 16.5) − 33 = 455 is the nearest µ-law format number to a. The binary form is
0 0 1 1 1 1 1 0.
The quantization step for this range of numbers is 2^4 = 16. It means that the closest possible smaller number is 439, while the next possible larger number would be 471. It is the last number with the quantization step 2^{e+1} = 16.
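The positive branch of this µ-law quantization can be sketched in a few lines, reproducing â = 455 for a = 456 from Example 7.62; sign handling and the 8-bit packing are omitted for brevity.

```python
def mulaw_quantize(a):
    """Round a positive sample to the nearest 2^(e+1)*(m + 16.5) - 33,
    with exponent e in 0..7 and mantissa m in 0..15, as in the text."""
    v = a + 33                                    # add 33, as in the mu-law encoder
    for e in range(8):
        if v / 2 ** (e + 1) <= 31.5:              # mantissa range 16.5 <= m + 16.5 <= 31.5
            m = round(v / 2 ** (e + 1) - 16.5)
            return 2 ** (e + 1) * (m + 16.5) - 33, e, m

print(mulaw_quantize(456))   # (455.0, 3, 14): binary 0 011 1110
```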
r (n, m) = x (n + m) x (n − m)
if the quantization error is caused by the floating-point registers with b bits for the mantissa. What
is the expected value? Write the model for
y ( n ) = x ( n ) + x ( n + 1).
where e(n, n + 1) is the multiplicative noise that models the addition error.
7.8 PROBLEMS
Problem 7.1. Signal x20i (n), for i = 01, 02, .., 15, is the monthly average of the maximum daily
temperatures in a city, measured from the year 2001 to 2015. The values of this signal are given in
Table 7.2. If we can assume that the signal for every individual month is Gaussian, find the probability
that the average of maximum daily temperatures: (a) in January is lower than 2, (b) in January is higher
than 12.
Problem 7.2. Available are M realizations of the random variable xi(n), i = 1, 2, . . . , M, at an instant n. The variance of x(n) is estimated in two possible scenarios:
(a) The mean value is known in advance and it is equal to zero, µx(n) = 0.
(b) The mean value is not known and it is estimated from the data as
µx(n) = (1/M)(x1(n) + x2(n) + · · · + xM(n)).
How is the estimate of the variance in (a) related to the variance estimate in (b)?
x = [0.26, 0.31, 0.64, 0.99, 1.00, 0.92, 0.85, 0.73, 0.58, 0.15] T ,
with different independent random variable, tn , values given in the vector form
t = [−0.8, −0.83, −0.60, −0.10, −0.01, 0.28, 0.39, 0.52, 0.65, 0.92] T .
â = (T^T T)^{−1} T^T x.    (7.147)
(b) Estimate the model parameters with the ridge regression model (the solution to the
minimization of J (a) = ||x − Ta||22 + λ||a||22 ) in the form
â = (T T T + λI)−1 T T x, (7.148)
with λ = 0.1.
(c) Repeat the calculations in (a) and (b) with an increased additive noise in the data
x = [0.35, 0.33, 0.57, 0.92, 0.94, 0.89, 0.87, 0.86, 0.44, 0.29] T .
(d) Predict the value x (1.12) in all considered cases. Use the result in (a) as the reference.
(e) Find the bias and the covariance matrix of the regression ridge estimator as a function of λ,
when the noise in the true data s is white, with the variance σε2 and the considered signal is x = s + ε
(advanced topic).
Find the probability density function p(ξ ) and the probability that x (n) < 2.5.
where a and b are constants. Find the relation between a and b. What is the cumulative probability distribution function for a = 1?
Problem 7.6. A random signal x (n) is characterized by the probability density function
px(ξ) = (λ/2) e^{−λ|ξ|},   λ > 0.
Find the expected value and variance of x (n).
Problem 7.7. The joint probability density function of signals x (n) and y(n) is
pxy(ξ, ζ) = k ξ e^{−ξ(ζ+1)} for 0 ≤ ξ < ∞, 0 ≤ ζ < ∞, and pxy(ξ, ζ) = 0 elsewhere.
Problem 7.8. Consider two independent random signals x(n) and y(n) with probability density functions px(n)(ξ) and py(n)(ξ). A new random signal is defined in such a way that it takes the greater value of the signals x(n) and y(n) at each instant n,
Find the probability distribution and the probability density function of the random signal z(n).
Problem 7.9. A set of N = 10 balls is considered, with an equal number of balls being marked with 1
(or white) and 0 (or black). A random signal x (n) corresponds to drawing four balls in a row. It has
four samples x (0), x (1), x (2), and x (3) corresponding to these draws. The signal values are equal to
the number (color) associated with the randomly drawn ball. If k is the number of values 0 that appear
in the signal (number of black balls), write the probability for k = 0. Generalize the result for N balls
and M signal samples.
Problem 7.10. The random signal x(n) is zero-mean Gaussian distributed with the probability density function
px(ξ) = (1/(σx√(2π))) e^{−ξ²/(2σx²)}.
Show that the variance of this random variable is equal to σx2 .
Problem 7.11. The random signal x(n) is a zero-mean Gaussian distributed random variable with the variance σx². Find the median of x(n) and the median of |x(n)|.
Problem 7.12. (a) Consider a zero-mean Gaussian distributed random noise ε(n) with variance
σε2 . Find the variance of y(n) = ε(n) − ε(n − 1) and relate it to the sample median of |y(n)| =
|ε(n) − ε(n − 1)|.
(b) Show that, if a signal x(n) consists of a slowly varying deterministic signal s(n) such that |s(n) + ε(n) − s(n − 1) − ε(n − 1)| ≈ |ε(n) − ε(n − 1)|, the noise standard deviation can be estimated using
σ̂ε = (1/(√2 · 0.6745)) median_{n=2,3,...,N}{|x(n) − x(n − 1)|}.
(c) Check this result on the signal and noise from Example 7.45.
Problem 7.13. The random signal x (n) is such that x (n) = 0 with probability 0.8. In all other cases
x (n) is Gaussian random variable with the expected value 3 and the variance equal to 2. Find the
expected value and the variance of x (n).
Problem 7.14. The signal ε(n) is a Gaussian noise with the expected value µε = 0 and the variance
σε2 . Find the probability that |ε(n)| > A. If the signal length is N = 2000, find the expected number of
samples with amplitudes higher than A = 10, assuming that σε² = 2. What is the result for A = 4 and σε² = 2?
Problem 7.15. The random signal x (n) is a Gaussian noise with the expected value 0 and the variance
σx2 . The signal has a large number N of samples. A random sequence y(n) is formed using M samples
from the signal x (n) with the lowest amplitudes. Find µy and σy .
Problem 7.16. Consider the signal s(n) = Aδ(n − n0 ) and a zero-mean Gaussian noise ε(n) with
variance σε2 , within the interval 0 ≤ n ≤ N − 1, where n0 is a constant integer within 0 ≤ n0 ≤ N − 1.
Find the probability of the event A that the maximum value of x (n) = s(n) + ε(n) is obtained at
n = n0 .
Problem 7.17. The random signal x (n) is a Gaussian noise with the expected value 0 and the variance
σx2 . A random sequence y(n) is formed by omitting the samples from the signal x (n) whose amplitudes
are higher than A. Find the probability density function of the sequence y(n). Find µy and σy .
Problem 7.18. The signal samples x (n) are such that
A + ε(n), for n ∈ N x
x (n) =
ε ( n ), otherwise
where ε(n) is a Gaussian noise with the expected value µε = 0 and the variance σε2 , A > 0 is a constant
and N x is a nonempty set of discrete-time instants. The threshold-based criterion is used to detect if an
arbitrary time instant n belongs to the set N x
n ∈ Nx if x (n) > T,
where T is the threshold. Find the value of threshold T if the probability of false detection is 0.01.
Problem 7.19. The signal x (n) is a random Gaussian sequence with the expected value µ x = 5 and
the variance σx2 = 1. The signal y(n) is a random Gaussian sequence, independent from x (n), with the
expected value µy = 1 and the variance σy2 = 1. If we consider N = 1000 samples of these signals,
find the expected number of time instants where x (n) > y(n) holds.
Problem 7.20. Let x (n) and y(n) be independent real-valued white Gaussian random variables with
expected values µ x = µy = 0 and the variances σx2 and σy2 . Show that the random variable
z = (1/M) Σ_{n=1}^{M} x(n) y(n)
Problem 7.22. A random signal ε(n) is stationary and Cauchy distributed with the probability density
function
a
pε(n) (ξ ) = .
1 + ξ2
Find the coefficient a, expected value, and the variance of this signal.
Problem 7.23. Find the expected value and the variance of the Poisson distributed random variable,
P(x(n) = k) = P(k) = λ^k e^{−λ}/k!,   for λ > 0.
Problem 7.24. The causal system is defined by
The input signal is x (n) = aδ(n) with the random amplitude a. The random variable a is uniformly
distributed within the interval from 4 to 5. Find the expected value and autocorrelation of the output
signal. Is the output signal WSS?
Problem 7.25. Consider the Hilbert transformer with the impulse response
h(n) = (2/π) sin²(nπ/2)/n for n ≠ 0, and h(n) = 0 for n = 0.
The input signal to this transformer is a white noise with the variance equal to 1.
(a) Find the autocorrelation function of the output signal.
(b) Find the cross-correlation of the input and the output signal. Show that the cross-correlation is
an antisymmetric function.
(c) Find the autocorrelation and the power spectral density function of the analytic signal
ε a (n) = ε(n) + jε h (n), where ε h (n) = ε(n) ∗n h(n).
If the input signal is a white noise x (n) = ε(n), with the autocorrelation function rεε (n) = σε2 δ(n),
find the autocorrelation and the power spectral density of the output signal.
x (n) = ε(n)u(n)
Problem 7.28. Find the expected value, the autocorrelation, and the power spectral density of the
random signal
x(n) = ε(n) + Σ_{k=1}^{N} ak e^{j(ωk n + θk)},
where ε(n) is a stationary real-valued noise with the expected value µε and the autocorrelation
rεε (n, m) = σε2 δ(n − m) + µ2ε and θk are random variables, uniformly distributed over the interval
−π < θk ≤ π. All random variables are statistically independent.
Problem 7.29. Find a stable optimal filter if the correlation functions for the desired signal and additive
noise are rss (n) = 0.25|n| , rsε (n) = 0 and rεε (n) = δ(n). Discuss the filter causality.
Problem 7.30. Calculate the DFT value X (2) of the signal s(n) = exp( j4πn/N ), with N = 8,
corrupted by the additive noise ε(n) = 2001δ(n) − 204δ(n − 3), using
X(k) = Σ_{n=0}^{N−1} (s(n) + ε(n)) e^{−j2πkn/N}.
Problem 7.31. The spectrogram is one of the most commonly used tools in time-frequency analysis.
Its form is
Sx(n, k) = |Σ_{i=0}^{N−1} x(n + i) w(i) e^{−j(2π/N)ik}|²,
where the signal is x (n) = s(n) + ε(n), with s(n) being the desired deterministic signal and ε(n)
being a complex-valued, zero-mean white Gaussian noise, with the variance σε2 and independent
and identically distributed (i.i.d.) real and imaginary parts. The window function is w(i ). Using the
rectangular window of the width N find:
a) the expected value of Sx (n, k),
b) the variance of Sx (n, k).
Note: For a Gaussian random signal ε(n), holds
E{ε(l )ε∗ (m)ε∗ (n)ε( p)} = E{ε(l )ε∗ (m)} E{ε∗ (n)ε( p)}
+ E{ε(l )ε∗ (n)} E{ε∗ (m)ε( p)} + E{ε(l )ε( p)} E{ε∗ (m)ε∗ (n)}. (7.151)
Problem 7.32. The basic quadratic time-frequency distribution is the Wigner distribution, whose discrete-time form reads
Wx(n, ω) = Σ_{k=−L}^{L} x(n + k) x*(n − k) e^{−j2ωk},
where the signal is given by x (n) = s(n) + ε(n), with s(n) being the desired deterministic signal and
ε(n) being the complex-valued, zero-mean white Gaussian noise whose variance is σε2 . The real and
imaginary parts of the noise are independent and identically distributed (i.i.d.). Find:
a) the expected value of Wx (n, ω ),
b) the variance of Wx (n, ω ).
Use the previous problem note. Find the variance for an FM signal, when |s(n)| = A.
Problem 7.33. A random signal s(n) carries information. Its autocorrelation function is rss(n) = 4(0.5)^{|n|}. A noise with the autocorrelation rεε(n) = 2δ(n) is added to the signal. Find the optimal filter for:
optimal filter for:
(a) d(n) = s(n) - optimal filtering,
(b) d(n) = s(n − 1) - optimal smoothing,
(c) d(n) = s(n + 1) - optimal prediction.
Problem 7.34. Design an optimal filter if the autocorrelation function of the signal is rss (n) =
3(0.9)|n| . The autocorrelation of noise is rεε (n) = 4δ(n), while the cross–correlation of the signal and
noise is rsε (n) = 2δ(n) .
Problem 7.35. The power spectral densities of the signal, Sdd(e^{jω}), and of the input noise, Sεε(e^{jω}), are given in Fig. 7.38. Show that the frequency response of the optimal filter H(e^{jω}) is as presented in Fig. 7.38 (bottom). Find the SNR at the input and the output of the optimal filter.
Problem 7.36. Find the expected value of the quantization error of the Fourier transform (its pseudo
form over-sampled in frequency)
Wx(n, k) = Σ_{m=0}^{N−1} x(n + m) x(n − m) e^{−j2πmk/N},
where x (n) is a real-valued quantized signal. The product of signals is quantized to b bits as well.
Neglect the quantization of the coefficients e− j2πmk/N and the quantization of their products with the
signal.
7.9 EXERCISES
Exercise 7.1. Signal x20i (n) is equal to the monthly average of the maximum daily temperatures in
a city measured from year 2001 to 2015. If we can assume that the signal for an individual month is
Gaussian find the probability that the average of the maximum temperatures: (a) in July is lower than
25, (b) in August is higher than 39.
Exercise 7.2. The random signal x (n) is such that x (n) = x1 (n) with probability p. In all other cases
x (n) is x2 (n). If the expected value and the variance of x1 (n) and x2 (n) are µ x1 , σx21 and µ x2 , σx22 ,
respectively, find the expected value and the variance of x (n).
Result: µ x = pµ x1 + (1 − p)µ x2 and
σx² = p[E{x1²(n)} − µx²] + (1 − p)[E{x2²(n)} − µx²].
Figure 7.38 Power spectral densities of the signal, Sdd(e^{jω}), and the input noise, Sεε(e^{jω}), along with the frequency response of an optimal filter H(e^{jω}).
Exercise 7.3. Find the expected value and the variance of a white uniform noise whose values are
within the interval − a ≤ x (n) ≤ a. If this signal is an input to the FIR system with the impulse response
h(n) = 1 for 1 ≤ n ≤ N and h(n) = 0 elsewhere, find the expected value and the variance of the
output signal.
Exercise 7.4. Consider the signal x (n) equal to the Gaussian zero-mean noise with the variance σε2 . A
new noise y(n) is formed using the values of x (n) lower than median value. Find the expected value
and the variance of this new noise y(n). Result: σy2 = 0.1426σε2 .
x (n) = ε(n)u(n)
where µε = 0 and rεε(n) = σε²δ(n). Find the expected value and the autocorrelation ryy(n, m) of the output signal. What is the cross-correlation between the input signal and the output signal, ryx(n, m)? Show that for n → ∞ the output signal tends to a WSS signal.
Exercise 7.6. (a) Calculate the DFT value X (4) for x (n) = exp( j4πn/N ) with N = 16.
(b) Calculate the DFT of a noisy signal x (n) + ε(n), where the noise realization is ε(n) =
1001δ(n) − 899δ(n − 3) + 561δ(n − 11) − 32δ(n − 14).
(c) Estimate the DFT using the noisy signal x(n) + ε(n) and
XR(k) = N median_{n=0,1,...,N−1}{Re[(x(n) + ε(n)) e^{−j2πkn/N}]} + jN median_{n=0,1,...,N−1}{Im[(x(n) + ε(n)) e^{−j2πkn/N}]}.
Exercise 7.7. The power spectral densities of the desired signal Sdd (e jω ) and the input noise Sεε (e jω )
are given in Fig. 7.39 for two cases. One on the left panels and the other on the right panels. Show that
the frequency response of the optimal filter H (e jω ) is given in Fig. 7.39(bottom panel for both cases of
the signal and noise). Find the SNR at the input and the output of the optimal filter in both cases.
Exercise 7.8. Find the transfer function of the optimal filter for the signal x (n) = s(n) + ε(n), where
ε(n) is a white noise with the autocorrelation rεε (n) = Nδ(n), and s(n) is the random signal obtained
as the output of the first-order linear system to the white noise with the autocorrelation rss (n) = a|n| ,
0 < a < 1. The signal and noise are not correlated.
Exercise 7.9. A random signal s(n) carries information. Its autocorrelation function is rss(n) = (1/4)^{|n|}. A noise with the autocorrelation rεε(n) = 0.5δ(n) is added to the signal. Find the optimal filter for:
(a) d(n) = s(n) - optimal filtering,
(b) d(n) = s(n − 1) - optimal smoothing,
(c) d(n) = s(n + 1) - optimal prediction.
Exercise 7.10. Find the power spectral densities of the signals whose autocorrelation functions are:
(a) r xx (n) = δ(n) + 2 cos(0.πn),
(b) rxx(n) = −4δ(n + 1) + 7δ(n) − 4δ(n − 1), and
(c) rxx(n) = 2a cos(ω0 n) + Σ_{k=0}^{∞} σ²(1/2)^k δ(n − k).
Exercise 7.11. Find the expected value and variance of the periodogram, Pxx(e^{jω}), of a deterministic signal s(n) corrupted by a white noise with variance σε²,
Pxx(e^{jω}) = (1/N) |Σ_{n=−N/2}^{N/2−1} (s(n) + ε(n)) e^{−jωn}|².    (7.152)
Figure 7.39 Power spectral densities of the signal, Sdd(e^{jω}), and the input noise, Sεε(e^{jω}), along with the frequency responses of the optimal filters H(e^{jω}). Two cases are shown, one on the left panels and the other on the right panels.
7.10 SOLUTIONS
Solution 7.1. (a) The expected value of the temperature for January, Table 7.2, is
µ x (1) = 7.2667.
The standard deviation for January, calculated over 15 years, is σx (1) = 2.6196. The probability that
the average maximum temperature in January is lower than 2 is
P(x(1) < 2) = (1/(σx(1)√(2π))) ∫_{−∞}^{2} e^{−(ξ−µx(1))²/(2σx²(1))} dξ = 0.5[1 − erf((7.2667 − 2)/(2.7115√2))] = 0.0260.
This means that this event will occur once in about 40 years.
(b) The average maximum temperature is higher than 12 with the probability
P(x(1) > 12) = (1/(σx(1)√(2π))) ∫_{12}^{∞} e^{−(ξ−µx(1))²/(2σx²(1))} dξ = 0.5[1 − erf((12 − 7.2667)/(2.7115√2))] = 0.0404.
This means that this event will happen once in about 25 years.
Solution 7.2. (a) In the scenario when the expected value is a priori known, µx(n) = 0, the variance estimate is
σx²(n) = (1/M)(x1²(n) + x2²(n) + · · · + xM²(n)) − µx²(n).
(b) When the mean is also estimated from the data, the variance will be denoted by s2x (n), and it
is equal to
sx²(n) = (1/M)[(x1(n) − (1/M) Σ_{i=1}^{M} xi(n))² + · · · + (xM(n) − (1/M) Σ_{i=1}^{M} xi(n))²]
= (1/M) Σ_{j=1}^{M} xj²(n) − ((1/M) Σ_{i=1}^{M} xi(n))²
= (1/M) Σ_{j=1}^{M} xj²(n) − (1/M²) Σ_{j=1}^{M} Σ_{i=1}^{M} xi(n) xj(n)
= (1/M) Σ_{j=1}^{M} xj²(n) − (1/M²) Σ_{j=1}^{M} xj²(n) − (1/M²) Σ_{j=1}^{M} Σ_{i=1, i≠j}^{M} xi(n) xj(n)
= ((M − 1)/M²) Σ_{j=1}^{M} xj²(n) − ((M² − M)/M²) (1/(M² − M)) Σ_{j=1}^{M} Σ_{i=1, i≠j}^{M} xi(n) xj(n).
In the second summation, the denominator ( M2 − M) is used since there are exactly ( M2 − M ) terms
in it, and the mean value estimate is
µ̂x²(n) ≈ (1/(M² − M)) Σ_{j=1}^{M} Σ_{i=1, i≠j}^{M} xi(n) xj(n).
Therefore, the variance sx²(n) can be written in the form
sx²(n) = ((M − 1)/M)[(1/M) Σ_{j=1}^{M} xj²(n) − (1/(M² − M)) Σ_{j=1}^{M} Σ_{i=1, i≠j}^{M} xi(n) xj(n)]
= ((M − 1)/M)[(1/M) Σ_{j=1}^{M} xj²(n) − µ̂x²(n)] ≈ ((M − 1)/M) σx²(n).
This means that the variance with the true mean value, σx2 (n), is (approximately) related to the variance
with the estimated mean value, s2x (n), as
σx²(n) ≈ (M/(M − 1)) sx²(n) = (1/(M − 1))[(x1(n) − (1/M) Σ_{i=1}^{M} xi(n))² + · · · + (xM(n) − (1/M) Σ_{i=1}^{M} xi(n))²].
Solution 7.3. (a) The model parameters, obtained as the solution to the least squares minimization
problem of J (a) = ||x − Ta||22 , for the data
t = [−0.8, −0.83, −0.60, −0.10, −0.01, 0.28, 0.39, 0.52, 0.65, 0.92] T
and
x = [0.26, 0.31, 0.64, 0.99, 1.00, 0.92, 0.85, 0.73, 0.58, 0.15] T .
are given by (see the matrix T definition in (7.14))
The estimated model is shown in Fig. 7.40(a) by the dotted line. Since the noise is small (caused by rounding the data to two decimals), the model fits the data accurately.
(b) When the regularization constant is added, the solution to the ridge regression minimization of J(a) = ||x − Ta||₂² + λ||a||₂² is obtained in the form (7.148). In this case, a small bias in fitting the data can be observed in Fig. 7.40(b).
(c) When a stronger additive noise is present in the data
x = [0.35, 0.33, 0.57, 0.92, 0.94, 0.89, 0.87, 0.86, 0.44, 0.29] T ,
the least squares and the ridge regression solutions are, respectively,
and
The model results are shown in Fig. 7.40(c) and (d), respectively. The higher-order model coefficients
in â are larger in the solution when the regularization is not used.
(d) The predicted values of x (1.12) in all considered cases are indicated by a circle. We can see
that the moderate noise causes significant deviation (over-fitting) of the results, Fig. 7.40(c), if the
regularization is not used. In the case with a very small noise, the regularizations slightly worsen the
results, by introducing the bias, as shown in Fig. 7.40(b).
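The two estimators compared in this solution can be sketched as follows, using the data vectors given in the problem; the polynomial order of the model matrix T is an assumption here, since its exact definition is given in (7.14).

```python
import numpy as np

t = np.array([-0.8, -0.83, -0.60, -0.10, -0.01, 0.28, 0.39, 0.52, 0.65, 0.92])
x = np.array([0.26, 0.31, 0.64, 0.99, 1.00, 0.92, 0.85, 0.73, 0.58, 0.15])

order = 5                                        # assumed polynomial order
T = np.vander(t, order + 1, increasing=True)     # columns 1, t, t^2, ..., t^order

a_ls = np.linalg.solve(T.T @ T, T.T @ x)         # least squares, relation (7.147)
lam = 0.1
a_ridge = np.linalg.solve(T.T @ T + lam * np.eye(order + 1), T.T @ x)   # ridge (7.148)

t_new = np.vander([1.12], order + 1, increasing=True)   # prediction point from part (d)
print(t_new @ a_ls, t_new @ a_ridge)
```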
Figure 7.40 Polynomial fitting example. (a) Data with a small noise (the black dots) and the polynomial fitting
with the least-squares solution (the dotted line). (b) Data with a small noise and the polynomial fitting with the
regression model using regularization constant λ = 0.1, producing a small deviation from the data. (c) Data with
moderate noise and the polynomial fitting with the least-squares solution. The noise causes significant deviations
and an over-fitted model. (d) Data with moderate noise and the polynomial fitting with the regression model using
the regularization constant λ = 0.1, keeping the energy of all model coefficients low. The predicted value at x (1.12)
is marked by the circle.
(e) Advanced topic: Assume that the observed signal consists of the true data s and the noise ε, that is, x = s + ε. The parameters of the true model, Ta = s, are the solution to the least-squares minimization of J(a) = ||s − Ta||₂², that is,
a = (T T T)−1 T T s.
For λ = 0 the estimator is unbiased, bias(â) = 0, while for large λ the bias increases toward
|bias(â)| = |(T T T)−1 T T s|.
The covariance matrix of the estimator is, by definition,
Cov(â) = E{(â − E{â})(â − E{â})^T} = E{[(T^T T + λI)^{−1} T^T ε][(T^T T + λI)^{−1} T^T ε]^T}
= σε² (T^T T + λI)^{−1} T^T [(T^T T + λI)^{−1} T^T]^T = σε² (T^T T + λI)^{−1} T^T T (T^T T + λI)^{−1}.
The probability of x (n) < 2.5 is P( x (n) < 2.5) = F (2.5) = 0.75.
Therefore,
∫_{−∞}^{∞} a e^{−b|ξ|} dξ = a(∫_{−∞}^{0} e^{bξ} dξ + ∫_{0}^{∞} e^{−bξ} dξ) = 2a/b = 1,
resulting in b = 2a.
For a = 1, the probability density function is px(ξ) = e^{−2|ξ|} for −∞ < ξ < ∞. The probability distribution function is
Fx(χ) = ∫_{−∞}^{χ} px(ξ) dξ = ∫_{−∞}^{χ} e^{2ξ} dξ = e^{2χ}/2
for −∞ < χ < 0, and
Fx(χ) = 1/2 + ∫_{0}^{χ} px(ξ) dξ = 1/2 + ∫_{0}^{χ} e^{−2ξ} dξ = 1 − e^{−2χ}/2
for 0 ≤ χ < ∞.
Solution 7.8. Since the random signal z(n) takes the greater of the values x (n) and y(n) the
probability that z(n) = max{ x (n), y(n)} is lower than or equal to an assumed χ is equal to the
probability that both random samples x (n) and y(n) are lower than or equal to this assumed χ, that is
Since
P{ x (n) ≤ χ} = Fx(n) (χ) and P{y(n) ≤ χ} = Fy(n) (χ),
we get the probability distribution of the random variable z(n) in the form
The probability density function follows as the derivative of the probability distribution,
pz(n)(ξ) = dFz(n)(ξ)/dξ = px(n)(ξ) Fy(n)(ξ) + Fx(n)(ξ) py(n)(ξ).
Solution 7.9. There are 5 out of 10 black balls. The probability that x(0) = 0 is
P0 = 5/10.
If the first ball was 0, then we have 9 balls for the second draw, with 4 balls marked with 0. The probability that x(1) = 0, if x(0) = 0, is
P1 = 4/9.
If x(0) = 0 and x(1) = 0, then there are 8 remaining balls with 3 of them being marked with 0. The probability that x(2) = 0, with x(0) = 0 and x(1) = 0, is
P2 = 3/8.
The probability for k = 0 is
P(k = 0) = (5/10)(4/9)(3/8)(2/7).
In general, if there were N balls, with an equal number of balls being marked with 1 (or white) and 0 (or black), and we considered M signal samples (drawings), the probability P(k = 0) would be
P(k = 0) = Π_{i=0}^{M−1} (N/2 − i)/(N − i).
Var{x(n)} = ∫_{−∞}^{∞} ξ² px(ξ) dξ = ∫_{−∞}^{∞} ξ² (1/(σx√(2π))) e^{−ξ²/(2σx²)} dξ.
and the substitution of the variables to polar coordinates, ξ = σx ρ cos(φ), ζ = σx ρ sin(φ), we get
Var{x(n)} Var{x(n)} = (1/(2π)) ∫_{0}^{∞} ∫_{0}^{2π} ρ⁴ cos²(φ) sin²(φ) e^{−ρ²/2} σx⁴ ρ dρ dφ
= σx⁴ (1/8) ∫_{0}^{∞} ρ⁴ e^{−ρ²/2} ρ dρ = (1/2) σx⁴ ∫_{0}^{∞} t² e^{−t} dt = σx⁴,
where cos²(φ) sin²(φ) = sin²(2φ)/4 = (1 − cos(4φ))/8 and the substitution ρ²/2 = t are used. This means that Var{x(n)} = σx².
Solution 7.11. For a random signal x(n), with a probability density function px(ξ), the median is defined as the value mx such that
∫_{−∞}^{mx} px(ξ) dξ = ∫_{mx}^{∞} px(ξ) dξ.
For the zero-mean Gaussian distributed random variable, mx = 0. For the random variable |x(n)|, the probability density function is
p|x|(ξ) = 2 px(ξ) u(ξ) = (2/(σx√(2π))) e^{−ξ²/(2σx²)} u(ξ).
The median of |x(n)| is obtained from
∫_{0}^{mx} (2/(σx√(2π))) e^{−ξ²/(2σx²)} dξ = ∫_{mx}^{∞} (2/(σx√(2π))) e^{−ξ²/(2σx²)} dξ = 1/2,
or
∫_{mx}^{∞} (2/(σx√(2π))) e^{−ξ²/(2σx²)} dξ = 1 − erf(mx/(σx√2)) = 1/2.
The solution is
mx = 0.6745 σx.
Solution 7.12. (a) When ε(n) is a zero-mean Gaussian distributed random noise, with variance σε2 ,
the variance of y(n) = ε(n) − ε(n − 1) is equal to
σy2 = 2σε2 .
Based on the result from Problem 7.11, the median of |y(n)| = |ε(n) − ε(n − 1)| is related to the standard deviation as
my = 0.6745 σy = √2 · 0.6745 σε.
(b) For the signal x(n) such that |x(n) − x(n − 1)| = |s(n) + ε(n) − s(n − 1) − ε(n − 1)| ≈ |ε(n) − ε(n − 1)| holds, we have
median_{n=2,3,...,N}{|x(n) − x(n − 1)|} ≈ median_{n=2,3,...,N}{|ε(n) − ε(n − 1)|} ≈ √2 · 0.6745 σε,
so the previous relation can be used to estimate σε and, in turn, the input SNR value.
(c) The true standard deviation of the noise in Example 7.45 was σε = 4. The value estimated using the previous relation is
σ̂ε = (1/(√2 · 0.6745)) median_{n=2,3,...,N}{|x(n) − x(n − 1)|} = 4.5.
The difference between σ̂ε and σε is due to the fact that the signal variations |s(n) − s(n − 1)| are small, but not negligible. However, the estimate σ̂ε is sufficiently accurate for the presented algorithm, since its slightly higher value than the true standard deviation will affect the confidence intervals only, by increasing their bounds and the corresponding probabilities (from the factor of 7.1 in Example 7.45 to the factor of 8, corresponding to the case as if the true standard deviation σε = 4 were used with the confidence interval bounds defined by 2.82σXN, instead of the assumed bounds defined by 2.5σXN).
The standard deviation estimate could also be obtained using the variance definition for x(n) − x(n − 1), given by
mean_{n=2,3,...,N}{|x(n) − x(n − 1)|²} ≈ mean_{n=2,3,...,N}{|ε(n) − ε(n − 1)|²} = 2σ̂ε².
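A numeric check of the median-based estimator from part (b); the smooth sinusoid below is only a stand-in for the slowly varying signal s(n).

```python
import numpy as np

rng = np.random.default_rng(5)
n = np.arange(1000)
s = np.sin(2 * np.pi * n / 500)                  # slowly varying stand-in signal
sigma = 4.0
x = s + sigma * rng.standard_normal(n.size)

d = np.abs(np.diff(x))                           # |x(n) - x(n-1)|
sigma_hat = np.median(d) / (np.sqrt(2) * 0.6745)
print(sigma_hat)                                 # close to the true value 4
```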
Solution 7.13. Let us find the probability that x(n) < ξ, for arbitrary ξ. Consider the case when ξ < 0,
P{x(n) < ξ} = (0.2/2)[1 + erf((ξ − 3)/2)].
It has been taken into account that the considered sample is Gaussian (with probability 0.2), along with
the probability that the sample value is smaller than ξ.
For ξ > 0, we should take into account that the signal takes x(n) = 0 with the probability 80%, as well as that in the remaining 20% of cases the Gaussian random value could be smaller than ξ. So, we get
P{x(n) < ξ} = 0.8 + (0.2/2)[1 + erf((ξ − 3)/2)].
Now, we have
P{x(n) < ξ} = (0.2/2)[1 + erf((ξ − 3)/2)] for ξ < 0, and
P{x(n) < ξ} = 0.8 + (0.2/2)[1 + erf((ξ − 3)/2)] for ξ > 0.
This function has a discontinuity at ξ = 0. It is not differentiable at this point as well. The derivative of
P{ x (n) < ξ } can be expressed in the form of the generalized functions (Dirac delta function) as
(d/dξ) P{x(n) < ξ} = py(n)(ξ) = (0.2/(2√π)) e^{−(ξ−3)²/4} + 0.8 δ(ξ).
The expected value and the variance are
µy(n) = ∫_{−∞}^{∞} ξ py(n)(ξ) dξ = 0.2 × 3 + 0.8 × 0 = 0.6,
σ²y(n) = ∫_{−∞}^{∞} (ξ − 0.6)² py(n)(ξ) dξ = 0.2 × 7.76 + 0.8 × (0.6)² = 1.84.
For N = 2000, the expected number of samples with amplitude above A is P{|ε(n)| > 10} × 2000 ≈
3 × 10−9 ≈ 0. This means that we do not expect any sample with amplitude higher than 10.
For A = 4, we have P{|ε(n)| > 4} ≈ 4.7 × 10⁻³, with 2000 × 4.7 × 10⁻³ = 9.4 ≈ 9 samples among the considered 2000 having an amplitude higher than 4.
Solution 7.15. If we are in the position to use a reduced set of the signal samples for processing, then
the ideal scenario would be to eliminate signal samples with the higher noise values and to keep for
processing the samples with the lower noise values. For the case of N, the signal samples and signal
processing based on M samples, we can find the interval of amplitudes A for the lowest M noisy
samples. The probability that | x (n)| < Aσε is equal to
P{|x(n)| < Aσε} = (1/(σε√(2π))) ∫_{−Aσε}^{Aσε} e^{−ξ²/(2σε²)} dξ.
(1/√(2π)) ∫_{−A}^{A} e^{−ξ²/2} dξ = erf(A/√2) = M/N.
The constant k is obtained from the condition that ∫_{−∞}^{∞} py(ξ) dξ = 1. Its value is k = N/M.
The variance of this new noise, formed from the Gaussian noise after the largest N − M values are removed, is much lower than the variance of the whole noise. It is given by
σy² = (N/M) (1/(σε√(2π))) ∫_{−√2 erfinv(M/N) σε}^{√2 erfinv(M/N) σε} ξ² e^{−ξ²/(2σε²)} dξ.    (7.157)
Solution 7.16. The probability density function for any sample x(n), n ≠ n0, is
p_{x(n), n≠n0}(ξ) = (1/(σε√(2π))) e^{−ξ²/(2σε²)}.
The probability that any of these samples is smaller than a value of λ could be defined using (7.43). Since the random variables x(n), 0 ≤ n ≤ N − 1, n ≠ n0, are statistically independent, the probability that all of them are smaller than λ is
P⁻_{N−1}(λ) = Probability{all N − 1 values of x(n) < λ, n ≠ n0} = [0.5 + 0.5 erf(λ/(√2 σε))]^{N−1}.
The probability density function of the sample x(n0) is a Gaussian function with the mean value A, that is,
p_{x(n0)}(ξ) = (1/(σε√(2π))) e^{−(ξ−A)²/(2σε²)}.
The probability that the random variable x (n0 ) takes a value around λ, λ ≤ x (n0 ) < λ + dλ, is
$$P^{+}_{n_0}(\lambda) = \operatorname{Probability}\{\lambda \le x(n_0) < \lambda + d\lambda\} = \frac{1}{\sigma_\varepsilon\sqrt{2\pi}}e^{-(\lambda-A)^2/(2\sigma_\varepsilon^2)}\,d\lambda. \qquad (7.158)$$
The probability that all values of x (n), 0 ≤ n ≤ N − 1, n 6= n0 are smaller than λ and that, at the same
time, λ ≤ x (n0 ) < λ + dλ is
$$P_A(\lambda) = P^{-}_{N-1}(\lambda)P^{+}_{n_0}(\lambda) = \left[0.5 + 0.5\operatorname{erf}\!\left(\frac{\lambda}{\sqrt{2}\sigma_\varepsilon}\right)\right]^{N-1}\frac{1}{\sigma_\varepsilon\sqrt{2\pi}}e^{-(\lambda-A)^2/(2\sigma_\varepsilon^2)}\,d\lambda,$$
while the total probability that all x(n), 0 ≤ n ≤ N − 1, n ≠ n0, are below x(n0) is an integral over all possible values of λ,
$$P_A = \int_{-\infty}^{\infty}P_A(\lambda) = \int_{-\infty}^{\infty}\left[0.5 + 0.5\operatorname{erf}\!\left(\frac{\lambda}{\sqrt{2}\sigma_\varepsilon}\right)\right]^{N-1}\frac{1}{\sigma_\varepsilon\sqrt{2\pi}}e^{-(\lambda-A)^2/(2\sigma_\varepsilon^2)}\,d\lambda. \qquad (7.159)$$
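The probability (7.159) has no simple closed form, but it is easy to evaluate numerically. The sketch below assumes particular values of A, σε, and N, which are not fixed by the problem statement, and uses a standard numerical integration routine.

import numpy as np
from scipy import integrate, special

# Sketch: numerical evaluation of (7.159), the probability that the
# sample x(n0), with mean A, is the largest among N samples.
def P_A(A, sigma=1.0, N=64):
    def integrand(lam):
        cdf = 0.5 + 0.5 * special.erf(lam / (np.sqrt(2) * sigma))
        pdf = np.exp(-(lam - A)**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))
        return cdf**(N - 1) * pdf
    val, _ = integrate.quad(integrand, -np.inf, np.inf)
    return val

print(P_A(4.0))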
Solution 7.17. The probability density function for the sequence y(n) is
$$p_{y(n)}(\zeta) = \begin{cases} B\dfrac{1}{\sigma_x\sqrt{2\pi}}e^{-\zeta^2/(2\sigma_x^2)} & \text{for } -A < \zeta \le A \\ 0 & \text{otherwise.} \end{cases}$$
The constant B can be calculated from $\int_{-\infty}^{\infty}p_{y(n)}(\zeta)\,d\zeta = 1$. Its value is $B = 1/\operatorname{erf}\!\left(\frac{A}{\sigma_x\sqrt{2}}\right)$. Now, we have µ_{y(n)} = 0 and
$$\sigma^2_{y(n)} = \frac{1}{\operatorname{erf}\!\left(\frac{A}{\sigma_x\sqrt{2}}\right)}\int_{-A}^{A}\zeta^2\frac{1}{\sigma_x\sqrt{2\pi}}e^{-\zeta^2/(2\sigma_x^2)}\,d\zeta = \sigma_x^2\left(1 - \frac{A\sqrt{2}\,e^{-A^2/(2\sigma_x^2)}}{\sigma_x\sqrt{\pi}\operatorname{erf}\!\left(\frac{A}{\sigma_x\sqrt{2}}\right)}\right).$$
By denoting β = A/(√2σx), the variance σ²_{y(n)} can be written as
$$\sigma^2_{y(n)} = \sigma_x^2\left(1 - 2\beta\frac{e^{-\beta^2}}{\sqrt{\pi}\operatorname{erf}(\beta)}\right).$$
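The derived formula is readily verified by simulation. In the following sketch, the values σx = 1 and A = 1.5 are arbitrary assumptions; the samples with |x| > A are discarded and the empirical variance of the remaining samples is compared with the closed-form expression.

import numpy as np
from scipy import special

# Sketch: Monte Carlo check of the truncated-Gaussian variance formula.
rng = np.random.default_rng(2)
sigma_x, A = 1.0, 1.5
samples = rng.normal(0.0, sigma_x, 2_000_000)
y = samples[np.abs(samples) <= A]          # keep only |x| <= A

beta = A / (np.sqrt(2) * sigma_x)
theory = sigma_x**2 * (1 - 2 * beta * np.exp(-beta**2)
                       / (np.sqrt(np.pi) * special.erf(beta)))
print(y.var(), theory)                     # the two values agree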
Solution 7.18. False detection means that we make a wrong decision by classifying the instant n into the set Nx. The probability is
$$P_F = P\{\varepsilon(n) > T\} = \frac{1}{2} - \frac{1}{2}\operatorname{erf}\!\left(\frac{T}{\sqrt{2}\sigma_\varepsilon}\right).$$
Now, we can find T as
$$T = \sqrt{2}\sigma_\varepsilon\operatorname{erfinv}(1-2P_F) \approx 2.33\sigma_\varepsilon,$$
where erfinv(·) is the inverse erf function. Note that the threshold does not depend on A.
since the signals are mutually independent. The probability that x(n) > y(n) can be obtained by integrating p_{x(n),y(n)}(ξ, ζ) over the region ξ > ζ,
$$P\{x(n) > y(n)\} = \int_{-\infty}^{\infty}\int_{-\infty}^{\xi}\frac{1}{\sqrt{2\pi}}e^{-(\xi-5)^2/2}\frac{1}{\sqrt{2\pi}}e^{-(\zeta-1)^2/2}\,d\zeta\,d\xi \approx 0.99766.$$
For 1000 instants, we expect that x (n) > y(n) is satisfied in about 998 instants.
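A direct Monte Carlo estimate confirms this probability; the means 5 and 1 and the unit variances below are taken from the densities in the integral above, while the number of trials is arbitrary.

import numpy as np

# Sketch: Monte Carlo confirmation that P{x(n) > y(n)} ~ 0.99766 for
# independent unit-variance Gaussian signals with means 5 and 1.
rng = np.random.default_rng(3)
M = 5_000_000
x = rng.normal(5.0, 1.0, M)
y = rng.normal(1.0, 1.0, M)
print(np.mean(x > y))   # approximately 0.99766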
$$= \frac{1}{M^2}\sum_{n=1}^{M}\sum_{m=1}^{M}E[x(n)y(n)x(m)y(m)] = \frac{1}{M^2}\sum_{n=1}^{M}\sum_{m=1}^{M}E[x(n)x(m)]\,E[y(n)y(m)]$$
$$= \frac{1}{M^2}\sum_{n=1}^{M}E[x^2(n)]\,E[y^2(n)] = \frac{1}{M^2}\sum_{n=1}^{M}\sigma_x^2\sigma_y^2 = \frac{1}{M}\sigma_x^2\sigma_y^2.$$
Solution 7.21. The moments of the Gaussian distributed random variable follow from the moment generating function, related to the Fourier transform of the Gaussian distribution (the characteristic function), as
$$M_x(\theta) = \Phi_x(-j\theta) = \frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{-(\xi-\mu)^2/(2\sigma^2)}e^{j(-j\theta)\xi}\,d\xi = e^{-\sigma^2(-j\theta)^2/2}e^{j(-j\theta)\mu} = e^{\sigma^2\theta^2/2}e^{\theta\mu}.$$
Expanding the moment generating function Mx(θ) into a Taylor series around θ = 0, we get
$$e^{\theta\mu}e^{\sigma^2\theta^2/2} = \left(1 + \theta\mu + \frac{1}{2!}(\theta\mu)^2 + \frac{1}{3!}(\theta\mu)^3 + \frac{1}{4!}(\theta\mu)^4 + \dots\right)$$
$$\times\left(1 + \frac{\sigma^2\theta^2}{2} + \frac{1}{2!}\left(\frac{\sigma^2\theta^2}{2}\right)^2 + \frac{1}{3!}\left(\frac{\sigma^2\theta^2}{2}\right)^3 + \frac{1}{4!}\left(\frac{\sigma^2\theta^2}{2}\right)^4 + \dots\right)$$
$$= 1 + \theta\mu + \frac{1}{2!}\theta^2(\mu^2+\sigma^2) + \frac{1}{3!}\theta^3(\mu^3+3\mu\sigma^2) + \frac{1}{4!}\theta^4(\mu^4+6\mu^2\sigma^2+3\sigma^4) + \dots$$
The moments of the Gaussian distributed random variable are given by (on the right for µ = 0)
$$\begin{aligned} M_1 &= \mu, & M_1 &= 0,\\ M_2 &= \mu^2+\sigma^2, & M_2 &= \sigma^2,\\ M_3 &= \mu^3+3\mu\sigma^2, & M_3 &= 0,\\ M_4 &= \mu^4+6\mu^2\sigma^2+3\sigma^4, & M_4 &= 3\sigma^4. \end{aligned}$$
The cumulants Ki of the random variable x(n) are obtained, by definition, from the Taylor series around θ = 0 of the logarithm of the moment generating function, ln(Mx(θ)). In the case of the Gaussian distributed variable,
$$\ln(M_x(\theta)) = \theta\mu + \sigma^2\theta^2/2.$$
Obviously,
$$\begin{aligned} K_1 &= \mu, & K_1 &= 0,\\ K_2 &= \sigma^2, & K_2 &= \sigma^2,\\ K_3 &= 0, & K_3 &= 0,\\ K_4 &= 0, & K_4 &= 0, \end{aligned}$$
and Ki = 0 for i > 2. This is a well-known criterion to test whether a random variable is Gaussian distributed, since the cumulants of this distribution should be zero for i > 2. Since the third-order moments are zero-valued for any even distribution function, the fourth-order cumulant is used to check whether a random variable is Gaussian distributed. For a random variable with an even distribution, the fourth-order cumulant is related to the moments as K4 = M4 − 3M2², where Mi is statistically estimated as Mi = mean(x^i(n)).
The kurtosis is defined as the fourth-order moment of the centered and normalized random variable,
$$\operatorname{Kurt}_x = E\left\{\left(\frac{x(n)-\mu_x}{\sigma_x}\right)^4\right\}.$$
For the Gaussian random variable, Kurtx = 3. Any value different from Kurtx = 3 produces a nonzero excess kurtosis,
$$\operatorname{ExcessKurt}_x = \operatorname{Kurt}_x - 3,$$
which is an indicator of the deviation of the distribution from the Gaussian distribution.
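A numerical sketch of this Gaussianity check follows; the two test distributions (Gaussian and uniform) are arbitrary choices for the illustration.

import numpy as np

# Sketch: Gaussianity check via the fourth-order cumulant
# K4 = M4 - 3*M2^2 and the excess kurtosis, for zero-mean data.
rng = np.random.default_rng(4)
M = 1_000_000
gauss = rng.normal(0.0, 2.0, M)
unif = rng.uniform(-1.0, 1.0, M)

for name, x in (("Gaussian", gauss), ("uniform", unif)):
    M2 = np.mean(x**2)
    M4 = np.mean(x**4)
    K4 = M4 - 3 * M2**2          # ~0 only for the Gaussian data
    kurt = M4 / M2**2            # kurtosis; 3 for Gaussian data
    print(name, K4, kurt - 3)    # excess kurtosis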
Solution 7.22. The probability that the random variable is within −∞ < ξ < ∞ is equal to
$$1 = \int_{-\infty}^{\infty}p_{\varepsilon(n)}(\xi)\,d\xi = \int_{-\infty}^{\infty}\frac{a}{1+\xi^2}\,d\xi = a\arctan(\xi)\Big|_{-\infty}^{\infty} = a\pi,$$
resulting in a = 1/π.
Solution 7.23. The expected value of the Poisson distributed random variable is
$$\mu_x = \sum_{k=0}^{\infty}kP(k) = \sum_{k=0}^{\infty}k\frac{\lambda^k e^{-\lambda}}{k!} = \sum_{k=1}^{\infty}\frac{\lambda^k e^{-\lambda}}{(k-1)!} = \lambda e^{-\lambda}\sum_{k=1}^{\infty}\frac{\lambda^{k-1}}{(k-1)!} = \lambda e^{-\lambda}\sum_{k=0}^{\infty}\frac{\lambda^k}{k!} = \lambda e^{-\lambda}e^{\lambda} = \lambda.$$
Notice that the variance is mean-value dependent, σx² = µx = λ. In the confidence interval calculation, this problem can be solved using variance stabilization (see Section 7.4.7 and Fig. 7.27). The transformation that would produce a mean-value independent estimate of the variance is g(λ) = √λ.
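The effect of the square-root transformation is easy to observe numerically. In the sketch below, the values of λ are arbitrary; the variance of the transformed data stays close to 1/4 regardless of λ.

import numpy as np

# Sketch: variance stabilization g(lambda) = sqrt(lambda); after the
# square-root transform the variance is nearly 1/4 for any mean lambda.
rng = np.random.default_rng(5)
for lam in (5.0, 20.0, 100.0):
    x = rng.poisson(lam, 1_000_000)
    print(lam, x.var(), np.sqrt(x).var())  # var(x) ~ lam, var(sqrt(x)) ~ 0.25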
$$H(z) = \frac{1}{1-0.5z^{-1}}.$$
The z-transform of the input signal x(n) is
$$X(z) = \sum_{n=-\infty}^{\infty}x(n)z^{-n} = \sum_{n=-\infty}^{\infty}a\delta(n)z^{-n} = a.$$
$$r_{yy}(n,m) = E\{y(n)y^*(m)\} = \frac{16}{3}\,2^{-(n+m)}u(n)u(m).$$
The output signal y(n) is not WSS.
(b) The z-transform of the cross-correlation of the input and the output signal, y(n) = ε(n) ∗n h(n) = εh(n), is
$$R_{xy}(z) = R_{xx}(z)H(z).$$
For z = e^{jω}, we get
$$R_{\varepsilon\varepsilon_h}(e^{j\omega}) = S_{\varepsilon\varepsilon}(\omega)H(e^{j\omega}) = H(e^{j\omega}),$$
resulting in
$$r_{\varepsilon\varepsilon_h}(n) = h(n) = \begin{cases} \dfrac{2}{\pi}\dfrac{\sin^2(n\pi/2)}{n}, & n \neq 0 \\ 0, & n = 0. \end{cases}$$
It is easy to conclude that the cross-correlation function is antisymmetric, r_xy(−n) = −r_xy(n).
$$X_a(e^{j\omega}) = X(e^{j\omega}) + jH(e^{j\omega})X(e^{j\omega}).$$
$$r_{yy}(n) = \frac{a^{|n|}}{1-a^2}\sigma_\varepsilon^2.$$
$$S_{yy}(\omega) = R_{yy}(e^{j\omega}) = \frac{\sigma_\varepsilon^2}{(1-ae^{j\omega})(1-ae^{-j\omega})} = \frac{\sigma_\varepsilon^2}{1-2a\cos\omega+a^2}.$$
The variance is
$$\sigma_y^2(n) = E\left\{\left[y(n)-\mu_y(n)\right]^2\right\} = E\{y^2(n)\} - \mu_y^2(n)$$
$$= \sum_{k_1=0}^{n}\sum_{k_2=0}^{n}a^{k_1}a^{k_2}E\{\varepsilon(n-k_1)\varepsilon(n-k_2)\}u(n) - \left(\mu_\varepsilon\frac{1-a^{n+1}}{1-a}\right)^2 u(n),$$
resulting in
$$\sigma_y^2(n) = \sigma_\varepsilon^2\frac{1-a^{2(n+1)}}{1-a^2}u(n).$$
$$\mu_x = \mu_\varepsilon + \sum_{k=1}^{N}a_k E\{e^{j(\omega_k n+\theta_k)}\} = \mu_\varepsilon,$$
since
$$E\{e^{j(\omega_k n+\theta_k)}\} = \frac{1}{2\pi}\int_{-\pi}^{\pi}e^{j(\omega_k n+\theta_k)}\,d\theta_k = 0.$$
The autocorrelation is
$$r_{xx}(n) = \sigma_\varepsilon^2\delta(n) + \mu_\varepsilon^2 + \sum_{k=1}^{N}a_k^2 e^{j\omega_k n},$$
Solution 7.29. For the optimal filtering, d(n) = s(n). The cross-correlation of the input signal and the desired signal is the signal autocorrelation. Its z-transform is
$$R_{dx}(z) = R_{ss}(z) = \frac{-15z/4}{(z-1/4)(z-4)}.$$
A stable system requires the region of convergence 0.127 < |z| < 7.873. This region of convergence
does not correspond to a causal system.
and
$$X_R(k) = 8.$$
Note that the noise-free DFT value X(2) is 8.
Solution 7.31. With the rectangular window, the spectrogram is given by
$$S_x(n,k) = \left|\sum_{i=0}^{N-1}x(n+i)e^{-j\frac{2\pi}{N}ik}\right|^2 = \sum_{i_1=0}^{N-1}\sum_{i_2=0}^{N-1}x(n+i_1)x^*(n+i_2)e^{-j\frac{2\pi}{N}(i_1-i_2)k}.$$
Using the fact that the signal s(n) is deterministic and the noise ε(n) is zero-mean, we get
$$E\{S_x(n,k)\} = \sum_{i_1=0}^{N-1}\sum_{i_2=0}^{N-1}s(n+i_1)s^*(n+i_2)e^{-j\frac{2\pi}{N}(i_1-i_2)k} + \sum_{i_1=0}^{N-1}\sum_{i_2=0}^{N-1}E\{\varepsilon(n+i_1)\varepsilon^*(n+i_2)\}e^{-j\frac{2\pi}{N}(i_1-i_2)k}$$
or
$$E\{S_x(n,k)\} = S_s(n,k) + \sigma_\varepsilon^2\sum_{i_1=0}^{N-1}\sum_{i_2=0}^{N-1}\delta(i_1-i_2)e^{-j\frac{2\pi}{N}(i_1-i_2)k} = S_s(n,k) + \sigma_\varepsilon^2\sum_{i=0}^{N-1}1 = S_s(n,k) + N\sigma_\varepsilon^2.$$
The variance of the spectrogram is
$$\sigma^2 = E\{S_x(n,k)S_x^*(n,k)\} - E\{S_x(n,k)\}E\{S_x^*(n,k)\},$$
where
$$\begin{aligned} E\{x(n+i_1)x^*(n+i_2)x^*(n+i_3)x(n+i_4)\} &= s(n+i_1)s^*(n+i_2)s^*(n+i_3)s(n+i_4)\\ &\quad + s(n+i_1)s^*(n+i_2)r_{\varepsilon\varepsilon}(i_4-i_3) + s(n+i_1)s^*(n+i_3)r_{\varepsilon\varepsilon}(i_4-i_2)\\ &\quad + s^*(n+i_2)s(n+i_4)r_{\varepsilon\varepsilon}(i_1-i_3) + s^*(n+i_3)s(n+i_4)r_{\varepsilon\varepsilon}(i_1-i_2)\\ &\quad + E\{\varepsilon(n+i_1)\varepsilon^*(n+i_2)\varepsilon^*(n+i_3)\varepsilon(n+i_4)\}. \end{aligned}$$
The facts that the odd-order moments of a Gaussian zero-mean noise are zero, and that r_{εε*}(k) = r_{ε*ε}(k) = 0 for a complex-valued noise with i.i.d. real and imaginary parts, are used. According to relation (7.151) from the note,
$$E\{\varepsilon(n+i_1)\varepsilon^*(n+i_2)\varepsilon^*(n+i_3)\varepsilon(n+i_4)\} = r_{\varepsilon\varepsilon}(i_1-i_2)r_{\varepsilon\varepsilon}(i_4-i_3) + r_{\varepsilon\varepsilon}(i_1-i_3)r_{\varepsilon\varepsilon}(i_4-i_2) \qquad (7.160)$$
holds, so that
$$\sigma^2 = S_s^2(n,k) + 4N\sigma_\varepsilon^2 S_s(n,k) + 2N^2\sigma_\varepsilon^4 - (S_s(n,k)+N\sigma_\varepsilon^2)^2 = 2N\sigma_\varepsilon^2 S_s(n,k) + N^2\sigma_\varepsilon^4. \qquad (7.161)$$
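The mean and variance relations can be verified by simulation. The sketch below assumes a complex sinusoid as the deterministic signal and a circular complex Gaussian noise with i.i.d. real and imaginary parts; N, σε, and the analyzed DFT bin are arbitrary choices.

import numpy as np

# Sketch: Monte Carlo check of E{S} = Ss + N*sigma^2 and
# var{S} = 2*N*sigma^2*Ss + N^2*sigma^4.
rng = np.random.default_rng(10)
N, sigma, trials = 16, 0.5, 200_000
i = np.arange(N)
s = np.exp(2j * np.pi * 3 * i / N)                    # deterministic signal
k = 3                                                 # analyzed DFT bin
F = np.exp(-2j * np.pi * i * k / N)
Ss = np.abs(np.sum(s * F))**2                         # noise-free spectrogram

eps = (rng.normal(0, sigma / np.sqrt(2), (trials, N))
       + 1j * rng.normal(0, sigma / np.sqrt(2), (trials, N)))
S = np.abs((s + eps) @ F)**2                          # spectrogram realizations
print(S.mean(), Ss + N * sigma**2)                    # mean check
print(S.var(), 2 * N * sigma**2 * Ss + N**2 * sigma**4)  # variance check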
Solution 7.32. (a) The expected value of the Wigner distribution Wx(n,ω) of x(n) is
$$E\{W_x(n,\omega)\} = \sum_{k=-L}^{L}E\{x(n+k)x^*(n-k)\}e^{-j2\omega k},$$
since the signal is deterministic and is not correlated with the white noise ε(n). Here, rεε(2k) denotes the autocorrelation function of the additive noise ε(n), whose variance is σε². In the case of a Gaussian zero-mean white stationary complex-valued noise with i.i.d. real and imaginary parts, r_{εε*}(k) = r_{ε*ε}(k) = 0, and we can write
$$\begin{aligned} E\{x(n+k_1)x^*(n-k_1)x^*(n+k_2)x(n-k_2)\} &= s(n+k_1)s^*(n-k_1)s^*(n+k_2)s(n-k_2) + s(n+k_1)s^*(n-k_1)r_{\varepsilon\varepsilon}(-2k_2)\\ &\quad + s(n+k_1)s^*(n+k_2)r_{\varepsilon\varepsilon}(k_2-k_1) + s^*(n-k_1)s(n-k_2)r_{\varepsilon\varepsilon}(k_1-k_2)\\ &\quad + s^*(n+k_2)s(n-k_2)r_{\varepsilon\varepsilon}(2k_1) + r_{\varepsilon\varepsilon}(2k_1)r_{\varepsilon\varepsilon}(-2k_2) + r_{\varepsilon\varepsilon}^2(k_1-k_2). \end{aligned}$$
Solution 7.33. The signal s(n) and the noise ε(n) are not correlated. In this case:
(a) For the optimal filtering, d(n) = s(n). The cross-correlation of the desired and input signal is
$$r_{dx}(n) = E\{d(k)x(n-k)\} = E\{s(k)[s(k-n)+\varepsilon(k-n)]\} = r_{ss}(n) = 4(0.5)^{|n|}.$$
(b) In the case of smoothing, d(n) = s(n − 1) and
$$R_{dx}(z) = \sum_{n=-\infty}^{\infty}4(0.5)^{|n-1|}z^{-n} = zR_{ss}(z) = \frac{3z^2}{(2z-1)(2-z)},$$
from which
$$H(z) = \frac{3z^2}{-2z^2+8z-2}$$
follows.
(c) In the case of prediction, d(n) = s(n + 1) and
$$r_{dx}(n) = E\{d(k)x(n-k)\} = E\{s(k+1)[s(k-n)+\varepsilon(k-n)]\} = r_{ss}(n+1),$$
$$R_{dx}(z) = \sum_{n=-\infty}^{\infty}4(0.5)^{|n+1|}z^{-n} = z^{-1}R_{ss}(z) = \frac{3}{(2z-1)(2-z)},$$
with
$$H(z) = \frac{3}{-2z^2+8z-2}.$$
Solution 7.34. For the optimal filter, the desired signal is d(n) = s(n). The correlation function of the desired and the input signal is
$$r_{dx}(n) = E\{s(k)[s^*(k-n)+\varepsilon^*(k-n)]\} = r_{ss}(n) + r_{s\varepsilon}(n) = 3(0.9)^{|n|} + 2\delta(n).$$
Calculation of the z-transforms and of the filter transfer function is left to the reader.
Solution 7.35. The power spectral densities of the signal and the input noise are
$$S_{dd}(e^{j\omega}) = \begin{cases} 1-|\omega/2| & \text{for } |\omega/2| < 1 \\ 0 & \text{elsewhere} \end{cases}$$
and
$$S_{\varepsilon\varepsilon}(e^{j\omega}) = \begin{cases} 1-\left||\omega|-2\right| & \text{for } \left||\omega|-2\right| < 1 \\ 0 & \text{elsewhere.} \end{cases}$$
The optimal filter frequency response is
$$H(e^{j\omega}) = \frac{S_{dd}(e^{j\omega})}{S_{dd}(e^{j\omega}) + S_{\varepsilon\varepsilon}(e^{j\omega})}.$$
The result for −π ≤ ω < 0 is symmetric and is shown in Fig. 7.38 (bottom). The input SNR is
$$SNR_i = \frac{E_s}{E_\varepsilon} = \frac{2}{2} = 1$$
or 0 [dB]. The output SNR is
$$SNR_o = \frac{\frac{1}{2\pi}\int_{-\pi}^{\pi}S_{dd}(e^{j\omega})\left|H(e^{j\omega})\right|^2 d\omega}{\frac{1}{2\pi}\int_{-\pi}^{\pi}S_{\varepsilon\varepsilon}(e^{j\omega})\left|H(e^{j\omega})\right|^2 d\omega} = \frac{3/2 + 2\int_1^2\left(1-\frac{\omega}{2}\right)\left(\frac{2-\omega}{\omega}\right)^2 d\omega}{2\int_1^2\left(1+(\omega-2)\right)\left(\frac{2-\omega}{\omega}\right)^2 d\omega} = \frac{10-12\ln 2}{16\ln 2-11} = 18.6181$$
or 12.7 [dB].
$$W_{x_Q}(n,k) = \sum_{m=0}^{N-1}\left[x_Q(n+m)x_Q(n-m) + e(n+m,n-m)\right]e^{-j2\pi mk/N}$$
$$= \sum_{m=0}^{N-1}\left\{[x(n+m)+e(n+m)][x(n-m)+e(n-m)] + e(n+m,n-m)\right\}e^{-j2\pi mk/N}.$$
It has been assumed that the errors in two different signal samples are not correlated, E{e(n+m)e(n−m)} = 0 for m ≠ 0, and that the signal and the error are not correlated, E{x(n+m)e(n−m)} = 0 for any m and n.
Part IV
Chapter 8
Adaptive Systems
8.1 INTRODUCTION
Classic systems for signal processing are designed to satisfy properties defined in advance. Their parameters are time-invariant. Adaptive systems change their parameters or form in order to achieve the best possible performance. These systems are characterized by the ability to observe variations in the input signal behavior and to react to these changes by adapting their parameters, in order to improve the desired performance of the output signal. Adaptive systems have the ability to "learn", so that they can appropriately adapt their performance when the system environment changes. By definition, adaptive systems are time-variant. These systems are often nonlinear as well. These two facts make the design and analysis of adaptive systems more difficult than in the case of classic time-invariant systems. Adaptive systems are the topic of this chapter.
Consider an adaptive system with one input and one output signal, as in Figure 8.1. In addition to the algorithm that transforms the input signal into the output signal, the adaptive system has a part that tracks the system performance and implements appropriate system changes. This control system takes into account the input signal, the output signal, and some additional information that can help in making a decision on how the system parameters should change.
(Figure 8.1: an adaptive system; the adaptation rule uses the input signal, the output signal, and other data.)
________________________________________
Authors: Ljubiša Stanković, Miloš Daković
Part V
Time-Frequency Analysis
Chapter 10
Linear Time-Frequency Representations
The Fourier transform provides a unique mapping of a signal from the time domain to the
frequency domain. The frequency domain representation provides the signal’s spectral content.
Although the phase characteristic of the Fourier transform contains information about the time
distribution of the spectral content, it is very difficult to use this information. Therefore, one may say
that the Fourier transform is practically useless for this purpose, that is, that the Fourier transform does
not provide a time distribution of the spectral components.
Depending on problems encountered in practice, various representations have been proposed to analyze non-stationary signals, in order to provide a time-varying spectral description. The field of time-frequency signal analysis deals with these representations of non-stationary signals and their properties. Time-frequency representations may roughly be classified as linear, quadratic, and higher-order representations.
Linear time-frequency representations exhibit linearity, that is, the representation of a linear combination of signals equals the linear combination of the individual representations. From this class, the most important is the short-time Fourier transform (STFT) and its variations. The energetic version of the STFT is called the spectrogram. It is the most frequently used tool in time-frequency signal analysis.
The second class of time-frequency representations are the quadratic ones. The most interesting representations of this class are those which provide a distribution of the signal energy in the time-frequency plane. They will be referred to as distributions. The concept of a distribution is borrowed from probability theory, although there is a fundamental difference. For example, in time-frequency analysis, distributions may take negative values. Other possible domains for quadratic signal representations are the ambiguity domain, the time-lag domain, and the frequency-Doppler frequency domain. In order to improve the time-frequency representation, various higher-order distributions have been defined as well.
The idea behind the short-time Fourier transform (STFT) is to apply the Fourier transform to a portion
of the original signal, obtained by introducing a sliding window function w(t) to localize the analyzed
signal x (t). The Fourier transform is calculated for the localized part of the signal. It produces the
spectral content of the portion of the analyzed signal within the time interval defined by the width of
the window function. The STFT (a time-frequency representation of the signal) is then obtained by
sliding the window along the signal. Illustration of the STFT calculation is presented in Fig.10.1.
_________________________________________________
Authors: Ljubiša Stanković, Miloš Daković, Thayaparan Thayananthan
(Figure 10.1: a signal x(t), a localization window w(τ) centered at the instant t, and the localized signal x(t+τ)w(τ).)
The STFT is defined as
$$STFT(t,\Omega) = \int_{-\infty}^{\infty}x(t+\tau)w(\tau)e^{-j\Omega\tau}\,d\tau. \qquad (10.1)$$
From (10.1) it is apparent that the STFT actually represents the Fourier transform of a signal x (t),
truncated by the window w(τ ) centered at instant t (see Fig. 10.1). From the definition, it is clear that
the STFT satisfies properties inherited from the Fourier transform (e.g., linearity).
By denoting xt (τ ) = x (t + τ ) we can conclude that the STFT is the Fourier transform of the
signal xt (τ )w(τ ),
STFT (t, Ω) = FTτ { xt (τ )w(τ )}.
Another form of the STFT, with the same time-frequency performance, is
$$STFT_{II}(t,\Omega) = \int_{-\infty}^{\infty}x(\tau)w^*(\tau-t)e^{-j\Omega\tau}\,d\tau. \qquad (10.2)$$
Example 10.1. To illustrate the STFT application, let us perform a time-frequency analysis of the following signal
$$x(t) = \delta(t-t_1) + \delta(t-t_2) + e^{j\Omega_1 t} + e^{j\Omega_2 t}. \qquad (10.3)$$
The STFT of this signal equals
$$STFT(t,\Omega) = w(t_1-t)e^{-j\Omega(t_1-t)} + w(t_2-t)e^{-j\Omega(t_2-t)} + W(\Omega-\Omega_1)e^{j\Omega_1 t} + W(\Omega-\Omega_2)e^{j\Omega_2 t}, \qquad (10.4)$$
where W(Ω) is the Fourier transform of the used window. The STFT is depicted in Fig. 10.2 for various window lengths, along with the ideal representation. A wide window w(t) in the time domain is characterized by a narrow Fourier transform W(Ω), and vice versa. The influence of the window on the results will be studied later.
Figure 10.2 Time-frequency representation of a sum of two delta pulses and two sinusoids obtained using: (a) a
wide window, (b) a narrow window, (c) a medium width window, and (d) an ideal time-frequency representation.
Consider now a linear frequency modulated signal x(t) = e^{jat²}, analyzed using a window w(τ) of the width 2T. The stationary phase method gives
$$STFT(t,\Omega) = \int_{-\infty}^{\infty}e^{ja(t+\tau)^2}w(\tau)e^{-j\Omega\tau}\,d\tau = \int_{-T}^{T}e^{ja(t+\tau)^2}w(\tau)e^{-j\Omega\tau}\,d\tau$$
$$\simeq e^{jat^2}e^{j(2at-\Omega)\tau_0}e^{ja\tau_0^2}w(\tau_0)\sqrt{\frac{2\pi j}{2a}} = e^{jat^2}e^{-j(2at-\Omega)^2/(4a)}\,w\!\left(\frac{\Omega-2at}{2a}\right)\sqrt{\frac{\pi j}{a}}, \qquad (10.6)$$
since the stationary phase point τ0 follows from 2a(t + τ0) = Ω.
In this case, the width of |STFT(t,Ω)| along frequency does not decrease with an increase of the window w(τ) width. The width of |STFT(t,Ω)| around the central frequency Ω = 2at is
$$D = 4aT,$$
where 2T is the window width in the time domain. Note that this relation holds for a wide window w(τ), such that the stationary phase method may be applied. If the window is narrow with respect to the phase variations of the signal, the STFT width is defined by the width of the Fourier transform of the window. It is proportional to 1/T. Thus, the overall STFT width could be approximated by a sum of the width caused by the frequency variation and the width of the window's Fourier transform, that is,
$$D_o = 4aT + \frac{2c}{T}, \qquad (10.8)$$
where c is a constant defined by the window shape (by using the main lobe as the window width, it will be shown later that c = 2π for a rectangular window and c = 4π for a Hann(ing) window). This relation corresponds to the STFT calculated as a convolution of an appropriately scaled time domain window, whose width is |τ| < 2aT, and the frequency domain form of the window, W(Ω). The approximation is checked against the exact STFT calculated by definition. The agreement is almost complete, Fig. 10.3.
Figure 10.3 Exact absolute STFT value of a linear FM signal at t = 0 for various window widths T =
2, 4, 8, 16, .., 1024 (left) and its approximation calculated as an appropriately scaled convolution of the time and
frequency domain window w(τ ) (right).
Therefore, there is a window width T producing the narrowest possible STFT for this signal.
It is obtained by equating the derivative of the overall width to zero,
$$4a - \frac{2c}{T^2} = 0,$$
which results in
$$T_o = \sqrt{\frac{c}{2a}}. \qquad (10.9)$$
As expected, for a sinusoid, a → 0, To → ∞. This is just an approximation of the optimal window,
since for narrow windows we may not apply the stationary phase method (the term 4aT is then
much smaller than 2c/T and may be neglected anyway).
Note that for a = 1/2, when the instantaneous frequency is a symmetry line for the time and the frequency axes, 2 − 2c/T² = 0 or 2T = 2c/T, meaning that the optimal window should have equal widths in the time domain, 2T, and in the frequency domain, 2c/T (the main lobe width).
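The optimal width (10.9) can also be checked numerically. The sketch below is only a rough illustration: the STFT width at t = 0 is measured as the half-maximum extent of |STFT(0,Ω)| (one possible width definition, an assumption here) and compared with Do = 4aT + 2c/T for a rectangular window (c = 2π); only qualitative agreement should be expected.

import numpy as np

# Sketch: measured STFT width of x(t) = exp(j*a*t^2) at t = 0 versus
# the approximation Do = 4aT + 2c/T, rectangular window (c = 2*pi).
a, c = 16.0, 2 * np.pi
dt = 1 / 1024
tau = np.arange(-2048, 2048) * dt                 # lag grid
dOmega = 2 * np.pi / (tau.size * dt)              # frequency grid step
x = np.exp(1j * a * tau**2)

for T in (0.1, np.sqrt(c / (2 * a)), 2.0):        # narrow, optimal, wide
    w = (np.abs(tau) <= T).astype(float)          # rectangular window
    S = np.abs(np.fft.fft(x * w))
    width = (S > S.max() / 2).sum() * dOmega      # half-maximum width
    print(T, width, 4 * a * T + 2 * c / T)        # compare with Do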
Example 10.3. For illustration, consider two different signals x1(t) and x2(t) producing the same amplitude of the Fourier transform, Fig. 10.4,
$$\begin{aligned} x_1(t) &= \sin\!\left(122\pi\frac{t}{128}\right) - \cos\!\left(42\pi\frac{t}{128} - \frac{16}{11}\pi\left(\frac{t-128}{64}\right)^2\right)\\ &\quad - 1.2\cos\!\left(94\pi\frac{t}{128} - 2\pi\left(\frac{t-128}{64}\right)^2 - \pi\left(\frac{t-120}{64}\right)^3\right)e^{-\left(\frac{t-140}{75}\right)^2}\\ &\quad - 1.6\cos\!\left(15\pi\frac{t}{128} - 2\pi\left(\frac{t-50}{64}\right)^2\right)e^{-\left(\frac{t-50}{16}\right)^2} \end{aligned} \qquad (10.11)$$
$$x_2(t) = x_1(255-t).$$
Their spectrograms are shown in Fig.10.5. From the spectrograms, we can follow time variations
of the spectral content. The signals obviously consist of one constant high frequency component,
one linear frequency component (in the first signal with decreasing frequency as time progresses,
and in the second signal with increasing frequency), and two chirps (one appearing at different
time instants and the other having different frequency variations).
Figure 10.4 Two different signals x1 (t) 6= x2 (t) with the same amplitudes of their Fourier transforms, | X1 (Ω)| =
| X2 (Ω)|.
The signal can be obtained from the STFT calculated at an instant t, STFT(t,Ω), as its inverse Fourier transform
$$x(t+\tau)w(\tau) = \frac{1}{2\pi}\int_{-\infty}^{\infty}STFT(t,\Omega)e^{j\Omega\tau}\,d\Omega.$$
This relation can theoretically be used to obtain the signal within the region where w(τ) ≠ 0,
$$x(t+\tau) = \frac{1}{w(\tau)}\frac{1}{2\pi}\int_{-\infty}^{\infty}STFT(t,\Omega)e^{j\Omega\tau}\,d\Omega.$$
If the value of the step R is smaller than the window duration, then the same signal value is used within two (or several) windows. For the correct reconstruction, the segments ri(τ) = x(iR+τ)w(τ), whose positions on the time axis are illustrated in Fig. 10.1, should be properly repositioned on the t-axis, using τ = t − iR, and summed.
(Figure 10.5: the spectrograms SPEC1(t,Ω) and SPEC2(t,Ω) of the signals x1(t) and x2(t).)
If the sum of the shifted versions of the window is constant (without loss of generality, assume that it is equal to 1), then
$$x(t) = \sum_i\frac{1}{2\pi}\int_{-\infty}^{\infty}STFT(iR,\Omega)e^{j\Omega(t-iR)}\,d\Omega.$$
The condition ∑i w(τ − iR) = 1 is equivalent to the requirement that the periodic extension of the window, with the period R, is constant (see Fig. 10.6). The periodic extension of a continuous signal corresponds to the sampling of the window's Fourier transform at Ω = 2πk/R in the Fourier domain, (1.66). This means that
$$W\!\left(\frac{2\pi}{R}k\right) = 0,\quad \text{for } k \neq 0.$$
If this STFT is available at a discrete set of instants, t = iR (or any other set of discrete instants ti), then the summation of STFT(t,Ω) over all values calculated at different instants t is
$$\sum_i\frac{1}{2\pi}\int_{-\infty}^{\infty}STFT(iR,\Omega)e^{j\Omega\tau}\,d\Omega = \sum_i x(\tau)w^*(\tau-iR) = x(\tau) \qquad (10.15)$$
or, with an additional weighting by the window,
$$\sum_i\left[\frac{1}{2\pi}\int_{-\infty}^{\infty}STFT(iR,\Omega)e^{j\Omega\tau}\,d\Omega\right]w(\tau-iR) = \sum_i x(\tau)w^2(\tau-iR) = x(\tau) \qquad (10.16)$$
if the condition
$$\sum_i w^2(\tau-iR) = 1 \qquad (10.17)$$
holds. The same condition can be derived from the following analysis.
The STFT can be considered as a projection (inner product) of the signal x(τ) onto the time-frequency kernel function
$$h_{t,\Omega}(\tau) = w(\tau-t)e^{j\Omega\tau},$$
that is,
$$STFT(t,\Omega) = \left\langle x(\tau), h_{t,\Omega}(\tau)\right\rangle_\tau = \left\langle x(\tau), w(\tau-t)e^{j\Omega\tau}\right\rangle_\tau = \int_{-\infty}^{\infty}x(\tau)w^*(\tau-t)e^{-j\Omega\tau}\,d\tau. \qquad (10.18)$$
The back-projection of the STFT onto the same (conjugate) kernel is $\left\langle STFT(t,\Omega), h^*_{t,\Omega}(\tau)\right\rangle_{t,\Omega}$, or
$$\left\langle STFT(t,\Omega), w^*(\tau-t)e^{-j\Omega\tau}\right\rangle_{t,\Omega} = \sum_{t_i}\frac{1}{2\pi}\int_{-\infty}^{\infty}STFT(t_i,\Omega)w(\tau-t_i)e^{j\Omega\tau}\,d\Omega. \qquad (10.19)$$
With t = ti = iR, relation (10.19) reduces to (10.16), with the same reconstruction condition.
10.3 WINDOWS
The window function plays a crucial role in the localization of the signal in the time-frequency plane.
The most commonly used windows will be presented next.
The rectangular window function has very strong and oscillatory sidelobes in the frequency domain, since the function sin(ΩT)/Ω converges very slowly toward zero as Ω → ±∞. The slow convergence in the Fourier domain is caused by the significant discontinuity in the time domain, at t = ±T. The mainlobe width of WR(Ω) is dΩ = 2π/T. In order to enhance the signal localization in the frequency domain, other window functions have been introduced.
The discrete-time form of the rectangular window is w(n) = u(n + N/2) − u(n − N/2).
The triangular (Bartlett) window can be considered as a convolution of the rectangular window of the duration T with itself. The Fourier transform of the triangular window is, therefore, a product of two Fourier transforms of the rectangular window of the width T,
$$W_T(\Omega) = \frac{4\sin^2(\Omega T/2)}{\Omega^2}. \qquad (10.23)$$
Convergence of this function toward zero, as Ω → ±∞, is of the order 1/Ω². It is a continuous function of time, with discontinuities in the first derivative at t = 0 and t = ±T. The mainlobe of this window function is twice as wide in the frequency domain as in the rectangular window case. Its width follows from ΩT/2 = π as dΩ = 4π/T.
The discrete-time form is
$$w(n) = \left(1 - \frac{2|n|}{N}\right)[u(n+N/2) - u(n-N/2)].$$
In the frequency domain its form is
$$W(e^{j\omega}) = \sum_{n=-N/2}^{N/2-1}\left(1 - \frac{2|n|}{N}\right)e^{-j\omega n} = \frac{\sin^2(\omega N/4)}{\sin^2(\omega/2)}.$$
The Hann(ing) window is defined by w(τ) = 0.5(1 + cos(πτ/T)) = cos²(πτ/(2T)) for |τ| < T and w(τ) = 0 elsewhere. Since cos(πτ/T) = [e^{jπτ/T} + e^{−jπτ/T}]/2, the Fourier transform of this window is related to the Fourier transform of the rectangular window of the same width as
$$W_H(\Omega) = \frac{1}{2}W_R(\Omega) + \frac{1}{4}W_R(\Omega-\pi/T) + \frac{1}{4}W_R(\Omega+\pi/T) = \frac{\pi^2\sin(\Omega T)}{\Omega(\pi^2-\Omega^2T^2)}. \qquad (10.25)$$
Example 10.4. Find the window that corresponds to the frequency smoothing (X(k+1) + X(k) + X(k−1))/3, that is, to
$$\operatorname{DFT}\{x(n)w(n)\} = \frac{1}{N}\operatorname{DFT}\{x(n)\}*_k\operatorname{DFT}\{w(n)\} = \frac{1}{3}X(k+1) + \frac{1}{3}X(k) + \frac{1}{3}X(k-1).$$
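By the DFT shift property, a shift by ±1 in the DFT index corresponds to a multiplication of x(n) by e^{∓j2πn/N}, so the smoothing above corresponds to w(n) = (1 + 2cos(2πn/N))/3. The sketch below states this as a derived claim (the closed form is not reproduced from the book) and verifies it numerically.

import numpy as np

# Sketch: verify that multiplying x(n) by w(n) = (1 + 2cos(2pi n/N))/3
# equals the frequency smoothing (X(k+1) + X(k) + X(k-1))/3.
rng = np.random.default_rng(6)
N = 64
n = np.arange(N)
x = rng.normal(size=N) + 1j * rng.normal(size=N)

X = np.fft.fft(x)
smoothed = (np.roll(X, -1) + X + np.roll(X, 1)) / 3   # (X(k+1)+X(k)+X(k-1))/3
w = (1 + 2 * np.cos(2 * np.pi * n / N)) / 3
print(np.allclose(np.fft.fft(x * w), smoothed))       # True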
Example 10.5. Find the formula to calculate the STFT with a Hann(ing) window, if the STFT calculated with a rectangular window is known.
For the Hann(ing) window w(τ) of the width 2T, we may roughly assume that its Fourier transform WH(Ω) is nonzero within the mainlobe |Ω| < 2π/T only, since the sidelobes decay very fast. Then, we may write dΩ = 4π/T. It means that the STFT is nonzero valued in the shaded regions in Fig. 10.2.
We see that the duration in time of the STFT of a delta pulse is equal to the window width, dt = 2T. The STFTs of two delta pulses (very short duration signals) do not overlap in the time-frequency domain if their distance is greater than the window duration, |t1 − t2| > dt. Then, these two pulses can be resolved. Thus, the window width is here a measure of the time resolution. Since the Fourier transform of the Hann(ing) window converges fast, we can roughly assume that a measure of duration in frequency is the width of its mainlobe, dΩ = 4π/T. Then, we may say that the Fourier transforms of two sinusoidal signals do not overlap in frequency if the condition |Ω1 − Ω2| > dΩ holds. It is important to observe that the product of the window durations in time and frequency is a constant. In this example, considering the time domain duration of the Hann(ing) window and the width of its mainlobe in the frequency domain, this product is dt dΩ = 8π. Therefore, if we improve the resolution in the time domain dt, by decreasing T, we inherently increase the value of dΩ in the frequency domain. This essentially prevents us from achieving the ideal resolution (dt = 0 and dΩ = 0) in both domains. A general formulation of this principle, stating that the product of the effective window durations in time and in frequency cannot be arbitrarily small, will be presented later.
The Hann(ing) window satisfies the constant overlap-add (COLA) reconstruction condition,
∑i w(τ − iR) = 1, with R = T, as shown in Fig. 10.6. This property follows from cos2 (πτ/(2T )) +
cos2 (π (τ ± T )/(2T )) = cos2 (πτ/(2T )) + sin2 (πτ/(2T )) = 1.
The same condition can be satisfied with R = T/2, R = T/4, . . . , after an appropriate scaling of
the window amplitude.
Figure 10.6 Hann(ing) window and its shifted versions, that satisfy the constant overlap-add (COLA) reconstruc-
tion condition ∑i w(τ − iR) = 1, with R = 1.
A relation between the Hamming and the rectangular window transforms, similar to that in the case of the Hann(ing) window, holds. The Hamming window was derived starting from w(τ) = a + (1 − a)cos(πτ/T) for |τ| < T. If we choose the value of a so as to cancel out the second sidelobe at its maximum (that is, at ΩT ≅ 2.5π), then we get
$$0 = \frac{2aT}{2.5\pi} - (1-a)\left(\frac{T}{1.5\pi} + \frac{T}{3.5\pi}\right),$$
resulting in
$$a = 25/46 \cong 0.54. \qquad (10.28)$$
This window has several sidelobes, next to the mainlobe, lower than in the previous two windows. However, since it is not continuous at t = ±T, its decay in frequency, as Ω → ±∞, is not fast. Note that we let the mainlobe be twice as wide as in the rectangular window case, so we cancel out not the first but the second sidelobe, at its maximum.
The discrete-time domain form is
$$w(n) = \left[0.54 + 0.46\cos\left(\frac{2\pi n}{N}\right)\right][u(n+N/2) - u(n-N/2)]$$
with
$$W(k) = 0.54N\delta(k) + 0.23N\delta(k+1) + 0.23N\delta(k-1).$$
In some applications, it is crucial that the sidelobes are suppressed as much as possible. This is achieved using windows of more complicated forms, like the Blackman window. It is defined by
$$w(\tau) = \begin{cases} 0.42 + 0.5\cos(\pi\tau/T) + 0.08\cos(2\pi\tau/T) & \text{for } |\tau| < T \\ 0 & \text{elsewhere,} \end{cases} \qquad (10.29)$$
with a0 + a1 + a2 = 1 and canceling out the Fourier transform values W(Ω) at the positions of the third and the fourth sidelobe maxima (that is, at ΩT ≅ 3.5π and ΩT ≅ 4.5π). Here, we let the mainlobe be three times as wide as in the rectangular window case, so we cancel out neither the first nor the second but the third and fourth sidelobes, at their maxima.
The discrete-time and frequency domain forms are
$$w(n) = \left[0.42 + 0.5\cos\left(\frac{2\pi n}{N}\right) + 0.08\cos\left(\frac{4\pi n}{N}\right)\right]\left[u\left(n+\frac{N}{2}\right) - u\left(n-\frac{N}{2}\right)\right],$$
$$W(k) = \left[0.42\delta(k) + 0.25(\delta(k+1)+\delta(k-1)) + 0.04(\delta(k+2)+\delta(k-2))\right]N.$$
Further reduction of the sidelobes can be achieved by, for example, the Kaiser (Kaiser-Bessel) window. It is an approximation to a restricted time duration function with minimum energy outside the mainlobe. This window is defined by using the zeroth-order Bessel function, with a localization parameter. It has the ability to keep the maximum energy within the mainlobe, while minimizing the sidelobe energy. The sidelobe level can be as low as −70 dB, as compared to the mainlobe, and even lower. This kind of window is used in the analysis of signals with significantly different amplitudes, when the sidelobe of one component can be much higher than the mainlobe of other components.
These are just a few of the windows used in signal processing. Some windows, along with the
corresponding Fourier transforms, are presented in Fig. 10.7.
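A short sketch (not from the book) that reproduces the content of Fig. 10.7 numerically: the discrete windows defined above are generated and their zero-padded spectra are displayed on a logarithmic scale; the window length and padding factor are arbitrary.

import numpy as np
import matplotlib.pyplot as plt

# Sketch: generate the discrete windows on n = -N/2, ..., N/2-1 and
# plot 10*log|W| of their zero-padded spectra.
N = 64
n = np.arange(-N // 2, N // 2)

windows = {
    "rectangular": np.ones(N),
    "triangular": 1 - 2 * np.abs(n) / N,
    "Hann(ing)": 0.5 + 0.5 * np.cos(2 * np.pi * n / N),
    "Hamming": 0.54 + 0.46 * np.cos(2 * np.pi * n / N),
    "Blackman": 0.42 + 0.5 * np.cos(2 * np.pi * n / N)
                + 0.08 * np.cos(4 * np.pi * n / N),
}

for name, w in windows.items():
    W = np.abs(np.fft.fftshift(np.fft.fft(w, 16 * N)))   # zero-padded spectrum
    plt.plot(10 * np.log10(W / W.max() + 1e-8), label=name)
plt.legend(); plt.ylabel("10 log|W|"); plt.show()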
Figure 10.7 Windows in the time and frequency domains: rectangular window (first row), triangular (Bartlett)
window (second row), Hann(ing) window (third row), Hamming window (fourth row), and Blackman window (fifth
row).
Example 10.6. Calculate the STFT of the signals x1(t) = 2cos(4πt/T) + 2cos(12πt/T) and x2(t) = 2cos(4πt/T) + 0.001cos(64πt/T) at t = 0. Use a Hamming and a Blackman window with T = 128 and ∆t = 1. Comment on the results.
⋆ The STFT at t = 0 is shown in Fig. 10.8. The resolution of the close components in x1(t) is better when the Hamming window is used, since the mainlobe of the Blackman window is wider. The small signal component in x2(t) is visible in the STFT with the Blackman window, since its sidelobes are lower.
Figure 10.8 The STFT at n = 0 calculated using the Hamming window (left) and the Blackman window (right)
of the signals x1 (n) (top) and x2 (n) (bottom).
Discretization and realizations of the STFT will be discussed in this section. A recursive realization, appropriate for the on-line implementation of the STFT, will be presented, along with the filter bank form of the STFT.
In numerical calculations, the integral form of the STFT should be discretized. By sampling the signal with a sampling interval ∆t, we get
$$STFT(t,\Omega) = \int_{-\infty}^{\infty}x(t+\tau)w(\tau)e^{-j\Omega\tau}\,d\tau \simeq \sum_{m=-\infty}^{\infty}x((n+m)\Delta t)\,w(m\Delta t)\,e^{-jm\Delta t\,\Omega}\,\Delta t.$$
By denoting x(n) = x(n∆t)∆t and normalizing the frequency Ω by ∆t, ω = ∆tΩ, we get the discrete-time form of the STFT as
$$STFT(n,\omega) = \sum_{m=-\infty}^{\infty}w(m)x(n+m)e^{-jm\omega}. \qquad (10.30)$$
We will use the same notation for continuous-time and discrete-time signals, x(t) and x(n). However, we hope that this will not cause any confusion, since we will use different sets of variables, for example, t and τ for continuous time and n and m for discrete time. Also, we hope that the context will always be clear, so that there is no doubt about what kind of signal is considered.
It is important to note that STFT(n,ω) is periodic in frequency with period 2π. The relation between the analog and the discrete-time form is
$$STFT(n,\omega) = \sum_{k=-\infty}^{\infty}STFT(n\Delta t, \Omega + 2k\Omega_0),\quad \text{with } \omega = \Delta t\,\Omega.$$
The sampling interval ∆t is related to the period in frequency as ∆t = π/Ω0. According to the sampling theorem, in order to avoid overlapping of the STFT periods (aliasing), we should take
$$\Delta t = \frac{\pi}{\Omega_0} \le \frac{\pi}{\Omega_m},$$
where Ωm is the maximum frequency in the STFT. Strictly speaking, the windowed signal x(t+τ)w(τ) is time-limited, thus it is not frequency-limited. Theoretically, there is no maximum frequency, since the width of the window's Fourier transform is infinite. However, in practice we can always assume that the spectral content of x(t+τ)w(τ) above a frequency Ωm, that is, for |Ω| > Ωm, can be neglected, and that the overlapping of the frequency content above Ωm does not degrade the basic frequency period.
The discretization in frequency should be done with a number of samples greater than or equal to the window length N. If we assume that the number of discrete frequency points is equal to the window length, then
$$STFT(n,k) = STFT(n,\omega)\big|_{\omega=\frac{2\pi}{N}k} = \sum_{m=-N/2}^{N/2-1}w(m)x(n+m)e^{-j2\pi mk/N} \qquad (10.31)$$
for a given instant n. When the DFT routines with indices from 0 to N − 1 are used, then a shifted
version of w(m) x (n + m) should be formed for the calculation for N/2 ≤ m ≤ N − 1. It is obtained
as w(m − N ) x (n + m − N ), since in the DFT calculation periodicity of the signal w(m) x (n + m),
with period N, is inherently assumed.
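A minimal sketch of (10.31) in Python follows; the window, the test signal, and the calculation step are arbitrary choices, and the reordering of w(m)x(n+m) described above is performed with an index shift before the FFT.

import numpy as np

# Sketch: discrete STFT (10.31) computed column by column with the FFT.
def stft(x, w, R=1):
    N = len(w)
    cols = []
    for n in range(N // 2, len(x) - N // 2 + 1, R):
        seg = x[n - N // 2 : n + N // 2] * w          # w(m) x(n+m), m = -N/2..N/2-1
        # reorder so that m = 0 is the first DFT sample (periodicity in N)
        cols.append(np.fft.fft(np.fft.ifftshift(seg)))
    return np.array(cols).T                            # rows: frequency index k

M, N = 256, 32
n = np.arange(M)
x = np.exp(1j * np.pi * n**2 / M)                      # linear FM test signal
S = stft(x, np.hanning(N), R=4)
print(S.shape)                                         # (32, 57)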
Example 10.7. Consider a signal with M = 16 samples, x(0), x(1), ..., x(15), and write a matrix form for the calculation of a four-sample STFT. Present the nonoverlapping and overlapping cases of the STFT calculation.
⋆ For the calculation of (10.31) with N = 4 and k = −2, −1, 0, 1, at a given instant n, the following matrix notation can be used:
$$\begin{bmatrix} STFT(n,-2)\\ STFT(n,-1)\\ STFT(n,0)\\ STFT(n,1) \end{bmatrix} = \begin{bmatrix} W_4^{4} & W_4^{2} & 1 & W_4^{-2}\\ W_4^{2} & W_4^{1} & 1 & W_4^{-1}\\ 1 & 1 & 1 & 1\\ W_4^{-2} & W_4^{-1} & 1 & W_4^{1} \end{bmatrix}\begin{bmatrix} x(n-2)\\ x(n-1)\\ x(n)\\ x(n+1) \end{bmatrix}$$
or
$$\mathbf{STFT}(n) = \mathbf{W}_4\mathbf{x}(n),$$
with STFT(n) = [STFT(n,−2) STFT(n,−1) STFT(n,0) STFT(n,1)]^T, x(n) = [x(n−2) x(n−1) x(n) x(n+1)]^T, where W4 is the DFT matrix of order four with elements W4^{mk} = exp(−j2πmk/N). Here, a rectangular window is assumed. Including the window function, the previous relation can be written as
$$\mathbf{STFT}(n) = \mathbf{W}_4\mathbf{H}_4\mathbf{x}(n),$$
with
$$\mathbf{H}_4 = \begin{bmatrix} w(-2) & 0 & 0 & 0\\ 0 & w(-1) & 0 & 0\\ 0 & 0 & w(0) & 0\\ 0 & 0 & 0 & w(1) \end{bmatrix}$$
being a diagonal matrix whose elements are the window values w(m), H4 = diag(w(m)), m = −2, −1, 0, 1, and
$$\mathbf{W}_4\mathbf{H}_4 = \begin{bmatrix} w(-2)W_4^{4} & w(-1)W_4^{2} & w(0) & w(1)W_4^{-2}\\ w(-2)W_4^{2} & w(-1)W_4^{1} & w(0) & w(1)W_4^{-1}\\ w(-2) & w(-1) & w(0) & w(1)\\ w(-2)W_4^{-2} & w(-1)W_4^{-1} & w(0) & w(1)W_4^{1} \end{bmatrix}.$$
For the nonoverlapping calculation of the whole signal, STFT = W4H4X4,4, where STFT is a matrix of the STFT values whose columns correspond to the calculation instants and whose rows correspond to the frequencies. This matrix is of the form
$$\mathbf{STFT} = \begin{bmatrix} STFT(2,-2) & STFT(6,-2) & STFT(10,-2) & STFT(14,-2)\\ STFT(2,-1) & STFT(6,-1) & STFT(10,-1) & STFT(14,-1)\\ STFT(2,0) & STFT(6,0) & STFT(10,0) & STFT(14,0)\\ STFT(2,1) & STFT(6,1) & STFT(10,1) & STFT(14,1) \end{bmatrix}.$$
The matrix X4,4 is formed of four successive signal values in each column. The notation X_{N,R} will be used to denote the signal matrix whose columns contain N signal values and where the difference between the first signal value indices in successive columns is R. For R = N, the nonoverlapping calculation is performed.
For an STFT calculation with overlapping, R < N, for example with the time step R = 1, we get
$$\mathbf{STFT} = \mathbf{W}_4\mathbf{H}_4\begin{bmatrix} x(0) & x(1) & x(2) & \dots & x(10) & x(11) & x(12)\\ x(1) & x(2) & x(3) & \dots & x(11) & x(12) & x(13)\\ x(2) & x(3) & x(4) & \dots & x(12) & x(13) & x(14)\\ x(3) & x(4) & x(5) & \dots & x(13) & x(14) & x(15) \end{bmatrix} = \mathbf{W}_4\mathbf{H}_4\mathbf{X}_{4,1}.$$
The step R defines the difference between the arguments in two neighboring columns. In the first case, the difference of the arguments in two neighboring columns was 4 (the time step in the STFT calculation was R = 4, equal to the window width, meaning a nonoverlapped calculation). In the second example, the difference is R = 1 < 4, meaning an overlapped STFT calculation. Note that the window matrix H_N and the DFT matrix W_N remain the same in both cases.
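The matrix formulation is compact to implement. The following sketch builds W4, H4 (a Hann(ing) window is an arbitrary choice here), and the signal matrix X_{N,R} for both cases; the rows are indexed 0, ..., N−1 rather than −N/2, ..., N/2−1, which only reorders them.

import numpy as np

# Sketch: STFT = W4 H4 X_{4,R} for R = 4 (nonoverlapping) and R = 1.
rng = np.random.default_rng(7)
M, N = 16, 4
x = rng.normal(size=M)

m = np.arange(N)
W4 = np.exp(-2j * np.pi * np.outer(m, m) / N)    # DFT matrix of order 4
H4 = np.diag(np.hanning(N))                      # window values on the diagonal

for R in (N, 1):
    X = np.array([x[i:i + N] for i in range(0, M - N + 1, R)]).T   # X_{N,R}
    STFT = W4 @ H4 @ X
    print(R, STFT.shape)                         # (4, 4) and (4, 13)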
Example 10.8. Assuming that the values of the signal with amplitudes below 1/e⁴ can be neglected, find the sampling rate for the STFT-based analysis of this signal. Write the approximate spectrogram expression for the Hann(ing) window of N = 32 samples in the analysis. What signal will be presented in the time-frequency plane, within the basic frequency period, if the signal is sampled at ∆t = 1/128?
⋆ The time interval with significant signal content for the first signal component is −2 ≤ t ≤ 2, with the frequency content within −56π ≤ Ω ≤ −8π, since the instantaneous frequency is Ω(t) = −12πt − 32π. For the second component, these intervals are 0 ≤ t ≤ 2 and 160π ≤ Ω ≤ 224π. The maximum frequency in the signal is Ωm = 224π. Here, we have to take into account the possible spreading of the spectrum caused by the lag window. Its width in the time domain is dt = 2T = N∆t = 32∆t. The width of the mainlobe in the frequency domain, dw, is defined by 32dw∆t = 4π, or dw = π/(8∆t). Thus, taking the sampling interval ∆t = 1/256, we will satisfy the sampling theorem condition in the worst instant case, since π/(Ωm + dw) = 1/256.
In the case of the Hann(ing) window with N = 32 and ∆t = 1/256, the lag interval is N∆t = 1/8. We will assume that the amplitude variations within the window are small, that is, w(τ)e^{−(t+τ)²} ≅ w(τ)e^{−t²} for −1/16 < τ ≤ 1/16. Then, according to the stationary phase method, we can write the STFT approximation
$$|STFT(t,\Omega)|^2 = \frac{1}{6}e^{-2t^2}w^2\!\left(\frac{\Omega+12\pi t+32\pi}{12\pi}\right) + \frac{1}{32}e^{-8(t-1)^2}w^2\!\left(\frac{\Omega-32\pi t-160\pi}{32\pi}\right).$$
If the signal is sampled at ∆t = 1/128, the frequency content of the second component will overlap into the range −96π ≤ Ω ≤ −32π. Thus, the signal represented by the STFT in this case will correspond to
$$x_r(t) = e^{-t^2}e^{-j6\pi t^2 - j32\pi t} + e^{-4(t-1)^2}e^{j16\pi t^2 + j(160-256)\pi t},$$
with the approximation
$$|STFT(t,\Omega)|^2 = \frac{1}{6}e^{-2t^2}w^2\!\left(\frac{\Omega+12\pi t+32\pi}{12\pi}\right) + \frac{1}{32}e^{-8(t-1)^2}w^2\!\left(\frac{\Omega-32\pi t+96\pi}{32\pi}\right). \qquad (10.32)$$
For the rectangular window, the STFT values at an instant n can be calculated recursively from the STFT values at n − 1 as
$$STFT_R(n,k) = e^{j2\pi k/N}\left\{STFT_R(n-1,k) + (-1)^k\left[x(n+N/2-1) - x(n-N/2-1)\right]\right\}.$$
This recursive formula follows easily from the STFT definition (10.31).
For other window forms, the STFT can be obtained from the STFT calculated using the rectangular window. For example, according to (10.26), the STFT with the Hann(ing) window, STFTH(n,k), is related to the STFT with the rectangular window, STFTR(n,k), as
$$STFT_H(n,k) = \frac{1}{2}STFT_R(n,k) + \frac{1}{4}STFT_R(n,k-1) + \frac{1}{4}STFT_R(n,k+1).$$
This recursive calculation is important for hardware implementations of the STFT and of other related time-frequency representations (e.g., higher-order representations based on the STFT).
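The recursive relation above (reconstructed here from the STFT definition, and consistent with the structure of Fig. 10.9) can be verified directly, as in the following sketch with an arbitrary test signal.

import numpy as np

# Sketch: check the sliding (recursive) STFT update for the rectangular
# window against the direct definition (10.31).
rng = np.random.default_rng(8)
M, N = 64, 8
x = rng.normal(size=M) + 1j * rng.normal(size=M)
k = np.arange(N)

def stft_direct(n):
    m = np.arange(-N // 2, N // 2)
    return np.array([np.sum(x[n + m] * np.exp(-2j * np.pi * m * kk / N))
                     for kk in k])

n0 = 20
S = stft_direct(n0 - 1)
S_rec = np.exp(2j * np.pi * k / N) * (S + (-1.0)**k * (x[n0 + N // 2 - 1]
                                                       - x[n0 - N // 2 - 1]))
print(np.allclose(S_rec, stft_direct(n0)))   # True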
Figure 10.9 Recursive implementation of the STFT for the rectangular and other windows.
A system for the recursive implementation of the STFT is shown in Fig. 10.9. The STFT obtained using the rectangular window is denoted by STFTR(n,k), Fig. 10.9, while the values of the coefficients are
(a−1, a0, a1) = (1/4, 1/2, 1/4) for the Hann(ing) window,
(a−1, a0, a1) = (0.23, 0.54, 0.23) for the Hamming window, and
(a−2, a−1, a0, a1, a2) = (0.04, 0.25, 0.42, 0.25, 0.04) for the Blackman window.
The STFT can also be written as a convolution,
$$STFT(t,\Omega) = \int_{-\infty}^{\infty}x(t+\tau)w(\tau)e^{-j\Omega\tau}\,d\tau = \int_{-\infty}^{\infty}x(t-\tau)w(\tau)e^{j\Omega\tau}\,d\tau = x(t)*_t\left[w(t)e^{j\Omega t}\right],$$
where an even, real-valued window function is assumed, w(τ) = w(−τ). For a discrete set of frequencies Ωk = k∆Ω = 2πk/(N∆t), k = 0, 1, 2, ..., N−1, and discrete values of the signal, we get that the discrete STFT, (10.31), is an output of the filter bank with the impulse responses hk(n) = w(n)e^{j2πkn/N},
$$STFT(n,k) = x(n)*_n\left[w(n)e^{j2\pi kn/N}\right] = x(n)*_n h_k(n),$$
as illustrated in Fig. 10.10. The next STFT can be calculated with the time step R∆t, meaning downsampling in time by a factor 1 ≤ R ≤ N. Two special cases are: no downsampling, R = 1, and the nonoverlapping calculation, R = N. The influence of R on the signal reconstruction will be discussed later.
Nonoverlapping cases are important and easy to analyze. They also keep the number of the STFT coefficients equal to the number of the signal samples. However, the STFT is commonly calculated using overlapping windows. There are several reasons for introducing overlapped STFT representations. Rectangular windows have a poor localization in the frequency domain. The localization is improved by other window forms. In the case of nonrectangular windows, some of the signal samples are weighted in such a way that their contribution to the final representation is small. Then we want to use an additional STFT with a window positioned in such a way that these samples contribute more to the STFT calculation. Also, in parameter estimation and detection, the task is to achieve the best possible estimation or detection at each time instant, instead of using interpolations for the skipped instants when the STFT with a large step (equal to the window width) is calculated. Commonly, the overlapped STFTs are calculated using, for example, a rectangular, Hann(ing), Hamming, Bartlett, Kaiser, or Blackman window of a constant window width N, with steps N/2, N/4, N/8, . . . in time. The computational cost is increased in the overlapped STFTs, since more STFTs are calculated. A way of composing STFTs
(Figure 10.10: the filter bank implementation of the STFT; each channel filters x(n) with the impulse response w(n)e^{j2πnk/N}, k = 0, 1, ..., N−1, followed by downsampling by R.)
calculated with a rectangular window into a STFT with, for example, the Hann(ing), Hamming, or
Blackman window, is presented in Fig.10.9.
If a signal x(n) is of the duration M, with 0 ≤ n ≤ M−1, in some cases, in addition to the overlapping in time, an interpolation in frequency is done, for example, up to the DFT grid with M samples. The overlapped and interpolated STFT of this signal is calculated, using a window w(m) whose width is N ≤ M, as
$$STFT_N(n,k) = \sum_{m=-N/2}^{N/2-1}w(m)x(n+m)e^{-j2\pi mk/M},\quad k = -M/2, -M/2+1, \dots, M/2-1.$$
Example 10.9. The STFT calculation of a signal whose frequency changes linearly is done by using a rectangular window. Signal samples within 0 ≤ n ≤ M − 1 with M = 64 were available. The nonoverlapping STFT of this signal is calculated with a rectangular window of the width N = 8 and presented in Fig. 10.11. Its values are STFT8(n,k) at n = 4, 12, 20, ..., 60 and −4 ≤ k ≤ 3. The nonoverlapping STFT values obtained using the rectangular window are shifted in frequency, scaled, and added up, Fig. 10.12, to produce the STFT with a Hamming window, Fig. 10.13.
The STFT calculation for the same linear FM signal will be repeated for the overlapping STFT with the step R = 1, when n = 0, 1, 2, 3, ..., 63 is used. Here, it has been assumed that the linear FM signal is available for all −N/2 ≤ m + n ≤ M − 1 + N/2 − 1. The results for the rectangular and the Hamming window (obtained by a simple matrix calculation from the rectangular window case) are presented in Fig. 10.14. Three window widths are used here.
The same procedure is repeated with the windows zero-padded up to the widest used window (interpolation in frequency). The results are presented in Fig. 10.15. Note that, regarding the amount of information, all these figures do not differ from the basic time-frequency representation presented in Fig. 10.11.
Figure 10.11 The STFT of a linear FM signal x (n) calculated using a rectangular window of the width N = 8.
Figure 10.12 The STFT of a linear FM signal calculated using a rectangular window (from the previous figure),
along with its frequency shifted versions STFTR (n, k − 1) and STFTR (n, k + 1). Their weighted sum produces
the STFT of the same signal with a Hamming window STFTH (n, k ).
572 Linear Time-Frequency Representations
Figure 10.13 The STFT of a linear FM signal x (n) calculated using the Hamming window with N = 8.
Calculation is illustrated in the previous figure.
Signal reconstruction from nonoverlapping STFT values is obvious for a rectangular window. A simple illustration is presented in Fig. 10.16. Windowed signal values are reconstructed from the STFTs by a simple inversion of each STFT,
$$\mathbf{STFT}(n) = \mathbf{W}_N\mathbf{H}_w\mathbf{x}(n)$$
$$\mathbf{H}_w\mathbf{x}(n) = \operatorname{IDFT}\{\mathbf{STFT}(n)\} = \mathbf{W}_N^{-1}\mathbf{STFT}(n),$$
where Hw is a diagonal matrix with the window values as its elements, Hw = diag(w(m)).
(Fig. 10.14 panels: the STFT with a rectangular window (left) and with a Hamming window (right), for N = 48, 16, and 8.)
Figure 10.14 Time-frequency analysis of a linear frequency modulated signal with overlapping windows of
various widths. Time step in the STFT calculation is R = 1.
(Fig. 10.15 panels: the STFT with a rectangular window (left) and with a Hamming window (right), for N = 48, 16, and 8.)
Figure 10.15 Time-frequency analysis of a linear frequency modulated signal with overlapping windows of various
widths. Time step in the STFT calculation is R = 1. For each window width the frequency axis is interpolated
(signal in time is zero padded) up to the total number of available signal samples M = 64.
Figure 10.16 Illustration of the signal reconstruction from the STFT with nonoverlapping windows.
Example 10.10. Consider a signal with M = 16 samples, x(0), x(1), ..., x(15). Write a matrix form for the signal inversion using a four-sample STFT (N = 4), calculated with the rectangular and a Hann(ing) window: (a) without overlapping, R = 4; (b) with a time step in the STFT calculation of R = 2.
⋆ (a) For the nonoverlapping case, the STFT calculation is done according to
$$\mathbf{STFT} = \mathbf{W}_4\mathbf{H}_4\begin{bmatrix} x(0) & x(4) & x(8) & x(12)\\ x(1) & x(5) & x(9) & x(13)\\ x(2) & x(6) & x(10) & x(14)\\ x(3) & x(7) & x(11) & x(15) \end{bmatrix},$$
with H4 = diag([w(−2) w(−1) w(0) w(1)]) and W4 the corresponding four-sample DFT matrix.
The inversion relation is
$$\begin{bmatrix} x(0) & x(4) & x(8) & x(12)\\ x(1) & x(5) & x(9) & x(13)\\ x(2) & x(6) & x(10) & x(14)\\ x(3) & x(7) & x(11) & x(15) \end{bmatrix} = \mathbf{H}_4^{-1}\mathbf{W}_4^{-1}\mathbf{STFT}.$$
(b) With the time step R = 2, the inversion is
$$\mathbf{W}_4^{-1}\mathbf{STFT} = \mathbf{H}_4\mathbf{X} = \begin{bmatrix} 0 & x(0)w(-2) & x(2)w(-2) & x(4)w(-2) & \dots & x(14)w(-2)\\ 0 & x(1)w(-1) & x(3)w(-1) & x(5)w(-1) & \dots & x(15)w(-1)\\ x(0)w(0) & x(2)w(0) & x(4)w(0) & x(6)w(0) & \dots & 0\\ x(1)w(1) & x(3)w(1) & x(5)w(1) & x(7)w(1) & \dots & 0 \end{bmatrix},$$
where X is the matrix with the signal elements. The window matrix is left on the right-hand side since, in general, it may not be invertible. By calculating W4^{−1}STFT we can then recombine the signal values. For example, the element producing x(0)w(0) in the first column is combined with the element producing x(0)w(−2) in the second column to get x(0)w(0) + x(0)w(−2) = x(0), since for the Hann(ing) window of the width N, w(n) + w(n − N/2) = 1 holds. The same is done for the other signal values in the matrix obtained after the inversion.
Note that the same relation would hold for a triangular window, while for a Hamming window a similar relation, w(n) + w(n − N/2) = 1.08, would hold. The results should be corrected in that case by a constant factor of 1.08.
An illustration of the STFT calculation for an arbitrary window width N at n = n0 is presented in Fig. 10.17. Its inversion produces x(n0+m)w(m) = IDFT{STFTN(n0,k)}. Consider the previous STFT value in the case of nonoverlapping windows. It would be STFTN(n0−N,k). Its inverse,
$$\operatorname{IDFT}\{STFT_N(n_0-N,k)\} = x(n_0-N+m)w(m),$$
is also presented in Fig. 10.17. As can be seen, by combining these two inverse transforms we would get a signal with very low values around n = n0 − N/2. If one more STFT is calculated at n = n0 − N/2 and its inverse is combined with the previous two, it will improve the signal presentation within the overlapping region n0 − N ≤ n < n0. In addition, for most of the common windows, w(m−N) + w(m−N/2) + w(m) = 1 (or a constant) within 0 ≤ m < N, meaning that the sum of the overlapped inverse STFTs, as in Fig. 10.17, will give the original signal within n0 − N ≤ n < n0.
In general, let us consider the STFT calculation with overlapping windows. Assume that the STFTs are calculated with a step 1 ≤ R ≤ N in time. The available STFT values are
$$\dots,\ \mathbf{STFT}(n_0-2R),\ \mathbf{STFT}(n_0-R),\ \mathbf{STFT}(n_0),\ \mathbf{STFT}(n_0+R),\ \mathbf{STFT}(n_0+2R),\ \dots \qquad (10.33)$$
Based on the available STFT values (10.33), the windowed signal values can be reconstructed as
$$\mathbf{H}_w\mathbf{x}(n_0+iR) = \mathbf{W}_N^{-1}\mathbf{STFT}(n_0+iR),\quad i = \dots,-2,-1,0,1,2,\dots$$
$$w(m)x(n_0+iR+m) = \frac{1}{N}\sum_{k=-N/2}^{N/2-1}STFT(n_0+iR,k)e^{j2\pi mk/N}. \qquad (10.34)$$
Since R < N, we will get the same signal value within different STFTs, for different i. For example, for N = 8, R = 2, and n0 = 0, we will get the value x(0) for m = 0 and i = 0, but also for m = −2 and i = 1, or m = 2 and i = −1, and so on. Then, in the reconstruction, we should use all these values to get the most reliable reconstruction.
Let us re-index the reconstructed signal values (10.34) by the substitution m = l − iR, as in (10.12),
$$w(l-iR)x(n_0+l) = \frac{1}{N}\sum_{k=-N/2}^{N/2-1}STFT(n_0+iR,k)e^{j2\pi lk/N}e^{-j2\pi iRk/N},\quad -N/2 \le l-iR \le N/2-1.$$
Figure 10.17 Illustration of the STFT calculation with windows overlapping in order to produce an inverse STFT
whose sum will give the original signal within n0 − N ≤ n < n0 .
If R < N, then a value of the signal x(n0+l) will be obtained by inverting the STFT,
$$w(l)x(n_0+l) = \frac{1}{N}\sum_{k=-N/2}^{N/2-1}STFT(n_0,k)e^{j2\pi lk/N}.$$
The same signal value, x(n0+l), will be obtained within the other overlapping inversions:
$$\vdots$$
$$w(l-2R)x(n_0+l) = \frac{1}{N}\sum_{k=-N/2}^{N/2-1}STFT(n_0+2R,k)e^{j2\pi lk/N}e^{-j2\pi 2Rk/N}$$
$$w(l-R)x(n_0+l) = \frac{1}{N}\sum_{k=-N/2}^{N/2-1}STFT(n_0+R,k)e^{j2\pi lk/N}e^{-j2\pi Rk/N}$$
$$w(l+R)x(n_0+l) = \frac{1}{N}\sum_{k=-N/2}^{N/2-1}STFT(n_0-R,k)e^{j2\pi lk/N}e^{j2\pi Rk/N}$$
$$w(l+2R)x(n_0+l) = \frac{1}{N}\sum_{k=-N/2}^{N/2-1}STFT(n_0-2R,k)e^{j2\pi lk/N}e^{j2\pi 2Rk/N}$$
$$\vdots$$
By summing all the reconstructions over i satisfying −N/2 ≤ l − iR ≤ N/2 − 1, we get the final reconstructed signal x(n0 + l). Obviously, this sum produces the exact, up to a constant, undistorted signal value if
$$\sum_i w(l-iR) = 1 \qquad (10.35)$$
or
$$c(l) = \sum_i w(l-iR) = \text{const.} = C, \qquad (10.36)$$
since
$$\sum_i w(l-iR)x(n_0+l) = Cx(n_0+l)$$
for any n0 and l. Note that ∑i w(l−iR) is a periodic extension of w(l) with a period R. If W(e^{jω}) is the Fourier transform of w(l), then the Fourier transform of its periodic extension is equal to the samples of W(e^{jω}) at ω = 2πk/R. The condition (10.36) is equivalent to W(e^{j2πk/R}) = 0 for k ≠ 0.
Special cases:
1. For R = N (nonoverlapping), relation (10.36) is satisfied for the rectangular window only.
2. For a half of the overlapping period, R = N/2, the condition (10.36) is met for the rectangular, Hann(ing), Hamming, and triangular windows. A realization for N = 8 and R = N/2 = 4 is presented in Fig. 10.18. Signal values with a delay of N/2 = 4 samples are obtained at the output. The STFT calculation process is repeated after every 4 samples, producing blocks of 4 signal samples at the output.
3. The same holds for R = N/2, N/4, N/8, . . . , if the values of R are integers. A numerical check of this condition is sketched after this list.
Figure 10.18 Signal reconstruction from the STFT for the case N = 8, when the STFT is calculated with step
R = N/2 = 4 and the window satisfies w(m) + w(m − N/2) = 1. This is the case for the rectangular, Hann(ing),
Blackman and triangular windows. The same holds for the Hamming window up to a constant scaling factor of
1.08.
4. For R = 1 (the STFT calculation at each available time instant), any window satisfies the inversion relation. In this case we may also use a simple reconstruction formula, Fig. 10.19,
$$\frac{1}{N}\sum_{k=-N/2}^{N/2-1}STFT(n,k) = \frac{1}{N}\sum_{m=-N/2}^{N/2-1}w(m)x(n+m)\left(\sum_{k=-N/2}^{N/2-1}e^{-j2\pi mk/N}\right) = w(0)x(n).$$
Very efficient realizations for this case are the recursive ones, instead of the direct DFT calculation, Fig. 10.9.
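The reconstruction conditions above are easy to examine numerically. The sketch below forms c(l) = Σi w(l − iR) for a sampled Hann(ing) window and shows that it is constant in the interior for R = N/2 but not for R = N; the window length is an arbitrary choice.

import numpy as np

# Sketch: check the overlap-add condition (10.36) for a Hann(ing) window.
N = 8
m = np.arange(-N // 2, N // 2)
w = 0.5 * (1 + np.cos(2 * np.pi * m / N))     # Hann(ing) window samples

L = 10 * N
for R in (N // 2, N):
    c = np.zeros(L + N)
    for i in range(L // R + 1):
        c[i * R : i * R + N] += w             # sum of shifted windows
    print(R, np.unique(np.round(c[N : L - N], 6)))  # one value only for R = N/2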
In the analysis of non-stationary signals, our primary interest is not in the signal reconstruction with the fewest number of calculation points. Rather, we are interested in tracking the signal's non-stationary parameters, like, for example, the instantaneous frequency. These parameters may significantly vary between the neighboring time instants n and n + 1. The quasi-stationarity of the signal within R samples (implicitly assumed when the downsampling by a factor of R is done) is not a good starting point for the analysis in this case. Here, we have to use the time-frequency analysis of the signal at each instant n, without any downsampling.
If the reconstructed signal is weighted by the same analysis window, that is, if w(l−iR)x(n0+l) is multiplied by w(l−iR), then the reconstruction condition for the weighted overlap-add method is
$$\sum_i w^2(l-iR) = 1. \qquad (10.37)$$
For more details on this form and its kernel framework interpretation, see Section 10.2.
Figure 10.19 Signal reconstruction when the STFT is calculated with step R = 1.
The window width and form can vary for different time instants or frequency bands, or can be time-frequency varying. These forms of the windows will be presented next.
In general, varying window widths could be used for different time-frequency points. When Ni changes with ni, we have the case of a time-varying window. Assuming a rectangular window, we can write
$$STFT_{N_i}(n_i,k) = \sum_{m=-N_i/2}^{N_i/2-1}x(n_i+m)e^{-j\frac{2\pi}{N_i}mk}. \qquad (10.38)$$
The notation STFTNi(ni,k) means that the STFT is calculated using the signal samples within the window [ni − Ni/2, ni + Ni/2 − 1] for −Ni/2 ≤ k ≤ Ni/2 − 1, corresponding to an even number of Ni discrete frequencies from −π to π. For an odd Ni, the summation limits are ±(Ni − 1)/2. Let us restate that a wide window includes signal samples over a wide time interval, losing the possibility to detect fast changes in time, but achieving a high frequency resolution. A narrow window in the STFT will track the time changes, but with a low resolution in frequency. Two extreme cases are Ni = 1, when
$$STFT_1(n,k) = x(n),$$
and Ni = M, when
$$STFT_M(n,k) = X(k),$$
where M is the total number of all available signal samples and X (k) = DFT{ x (n)}.
In vector notation,
$$\mathbf{STFT}_{N_i}(n_i) = \mathbf{W}_{N_i}\mathbf{x}_{N_i}(n_i),$$
where STFT_{Ni}(ni) and x_{Ni}(ni) are column vectors. Their elements are STFTNi(ni,k), k = −Ni/2, ..., Ni/2 − 1, and x(ni+m), m = −Ni/2, ..., Ni/2 − 1, respectively,
where m is the column index and k is the row index of the matrix. The STFT value STFTNi(ni,k) is presented as a block in the time-frequency plane of the width Ni in the time direction, covering all the time instants [ni − Ni/2, ni + Ni/2 − 1] used in its calculation. The frequency axis can be labeled with the DFT indices p = −M/2, ..., M/2 − 1, corresponding to the DFT frequencies 2πp/M (dots in Fig. 10.20). With respect to this axis labeling, the block STFTNi(ni,k) will be positioned at the frequency 2πk/Ni = 2π(kM/Ni)/M, that is, at p = kM/Ni. The block width in frequency is M/Ni DFT samples. Therefore, the block area in time and DFT frequency is always equal to the number of all available signal samples, M, as shown in Fig. 10.20, where M = 16.
Example 10.11. Consider a signal x (n) with M = 16 samples. Write the expression for calculation of
the STFT value STFT4 (2, 1) with a rectangular window. Indicate graphically the region of time
instants used in the calculation and the frequency range in terms of the DFT frequency values
included in the calculation of STFT4 (2, 1)?
−2 ≤ 2 + m < 1
0 ≤ n ≤ 3.
The frequency term is exp(− j2πm/4). For the DFT of a signal with M = 16
X(k) = ∑_{m=0}^{15} x(m) e^{−j2πmk/16},  k = −8, −7, …, −1, 0, 1, …, 6, 7,
this frequency would correspond to the term exp(− j2π4m/16). Therefore k = 1 corresponds
to the frequency index p = 4 in the DFT. Since the whole frequency range −π ≤ ω < π in the case of Ni = 4 is covered with 4 STFT values, STFT_4(2, −2), STFT_4(2, −1), STFT_4(2, 0), and STFT_4(2, 1), and the same frequency range in the DFT has 16 frequency samples, each STFT value calculated with Ni = 4 corresponds to a range of 4 DFT frequency values, as illustrated in Fig. 10.20.
Figure 10.20 The nonoverlapping STFTs with: (a) constant window of the width N = 4, (b) constant window
of the width N = 2, (c)-(d) time-varying windows. Time index is presented on the horizontal axis, while the DFT
frequency index is shown on the vertical axis (the STFT is denoted by S for notation simplicity).
For a time-varying window, all of the nonoverlapping STFT vectors can be stacked into a single relation, STFT = W̃ x,
where STFT is a column vector containing all STFT vectors STFT Ni (ni ), i = 0, 1, . . . , K, X = W M x
is a DFT of the whole signal x (n), while W̃ is a block matrix (M × M) formed from the smaller
DFT matrices W N0 , W N1 , . . . ,W NK , as in (10.38). Since the time-varying nonoverlapping STFT
corresponds to a decimation-in-time DFT scheme, its calculation is more efficient than the DFT
calculation of the whole signal. Illustration of time-varying window STFTs is shown in Fig.10.20(c),
(d). For a signal with M samples, there is a large number of possible nonoverlapping STFTs with a
time-varying window Ni ∈ {1, 2, 3, . . . , M }. The exact number will be derived later.
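The block structure of W̃ can be made explicit in a short MATLAB sketch; the particular sequence of window widths below is an assumption, chosen only so that the widths sum up to M = 16.

% Nonoverlapping time-varying STFT as a block-diagonal transform (a sketch)
Nlist = [4 4 2 2 2 2];                   % assumed window widths, sum = M = 16
x  = randn(sum(Nlist), 1);               % assumed test signal (column vector)
Wt = [];                                 % the block matrix W~
for Ni = Nlist
    WNi = exp(-2j*pi*(0:Ni-1).' * (0:Ni-1) / Ni);   % Ni-point DFT matrix
    Wt  = blkdiag(Wt, WNi);
end
STFT = Wt * x;                           % all nonoverlapping STFT values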
Example 10.12. Consider a signal x (n) with M = 16 samples, whose values are x = [0.5, 0.5,
−0.25, j0.25, 0.25, − j0.25, −0.25, 0.25, −0.25, 0.25, 0.5, 0.5, − j0.5, j0.5, 0, −1]. Some of its
nonoverlapping STFTs are calculated according to (10.38) and shown in Fig.10.20. Different
representations can be compared based on concentration measures. The best STFT representation, in this sense, would be the one with the smallest measure value µ[STFT_N(n, k)].
For the considered signal and its four representations shown in Fig.10.20 the best representation,
according to this criterion, is the one shown in Fig.10.20(b).
Example 10.13. Consider a signal x (n) with M = 8 samples. Its values are x (0) = 0, x (1) = 1,
x (2) = 1/2, x (3) = −1/2, x (4) = 1/4, x (5) = − j/4, x (6) = −1/4, and x (7) = j/4.
(a) Calculate the STFTs of this signal with rectangular window of the widths N = 1, N = 2,
N = 4. Use the following STFT definition
STFT_N(n, k) = ∑_{m=−N/2}^{N/2−1} x(n + m) e^{−j2πmk/N}.
For an odd N, the summation limits are ±( N − 1)/2. Calculate STFT1 (n, k) for n =
0, 1, 2, 3, 4, 5, 6, 7, then STFT2 (n, k) for n = 1, 3, 5, 7, then STFT4 (n, k) for n = 2, 6 and
STFT8 (n, k) for n = 4. For frequency axis use notation k = 0, 1, 2, 3, 4, 5, 6, 7.
(b) Assuming that the time-varying approach is used in the nonoverlapping STFT calculation, find the total number of possible representations.
(c) Calculate the concentration measure µ[STFT(n, k)]^{1/2} for each of the cases in (b) and find the representation (nonoverlapping combination of the previous STFTs) for which the signal is represented with the smallest number of coefficients. Does it correspond to the minimum of µ[STFT(n, k)]^{1/2}?
⋆ (a) The STFT values follow directly from the definition: for N = 1 they are the signal samples themselves, for N = 2 they are calculated at n = 1, 3, 5, 7, and for N = 4 at n = 2, 6 (their absolute values are shown in Fig. 10.21).
(b) Now we have to make all possible nonoverlapping combinations of these transforms and calculate the concentration measure for each of them. The total number of combinations is 25. The absolute STFT values are shown in Fig. 10.21, along with the measure values.
Figure 10.21 Time-frequency representation in various lattices (grid-lines are shown), with the concentration measure M = µ[SPEC(n, k)]^{1/2}. The optimal representation, with respect to this measure, is presented with thicker grid-lines. The time axis is n = 0, 1, 2, 3, 4, 5, 6, 7 and the frequency axis is k = 0, 1, 2, 3, 4, 5, 6, 7.
(c) By measuring the concentration for all of them, we find that the optimal combination covering the time-frequency plane represents the signal with just three nonzero transformation coefficients. It corresponds to the minimum of µ[SPEC(n, k)].
In this case there is an algorithm for efficient determination of the optimal lattice, based on the consideration of two regions, starting from lattices 1, 19, and 25 in Fig. 10.21, which correspond to the constant window widths of N = 1, N = 2, and N = 4 samples.
(Figure: the optimal nonoverlapping lattice, composed of the blocks STFT_1(2, 0), STFT_2(1, 0), STFT_2(1, 1), STFT_3(4, 0), STFT_3(4, 1), and STFT_3(4, 2), shown over the time axis 0, 1, …, 5 and the frequency axis from 0 to 3π/4.)
for an odd and an even number of samples N, respectively.
Example 10.15. A discrete signal x (n) is considered for 0 ≤ n < M. Find the number of the STFTs
of this signal with time-varying windows.
(a) Consider arbitrary window widths from 1 to M.
(b) Consider dyadic windows, that is, windows whose width is 2^m, where m is an integer such that 2^m ≤ M. In this case find the number of time-varying window STFTs for M = 1, 2, 3, …, 15, 16.
⋆ (a) Let us analyze the problem recursively. Denote by F ( M) the number of STFTs for a signal
with M samples. It is obvious that F (1) = 1, that is, for one-sample signal there is only one
STFT (the signal sample itself). If M > 1, we can use a window of width k = 1, 2, …, M as the first analysis window. The remaining (M − k) samples can then be analyzed in all possible ways, so we can
write a recursive relation for the total number of the STFTs. If the first window is one-sample
window, then the number of the STFTs is F ( M − 1). When the first window is a two-sample
window, then the total number of the STFTs is F ( M − 2), and so on, until the first window is the
M-sample window, when F ( M − M) = 1. Thus, the total number of the STFTs for all cases is
F ( M) = F ( M − 1) + F ( M − 2) + . . . + F (1) + 1.
We can introduce F (0) = 1 (meaning that if there are no signal samples we have only one way to
calculate time-varying window STFT) and obtain
F(M) = F(M − 1) + F(M − 2) + … + F(1) + F(0) = ∑_{k=1}^{M} F(M − k)

and

F(M) − F(M − 1) = ∑_{k=1}^{M} F(M − k) − ∑_{k=2}^{M} F(M − k) = F(M − 1),

that is, F(M) = 2F(M − 1), resulting in F(M) = 2^{M−1}.
(b) In a similar way, following the previous analysis, we can write

F(M) = F(M − 2^0) + F(M − 2^1) + F(M − 2^2) + … + F(M − 2^{⌊log₂ M⌋}) = ∑_{m=0}^{⌊log₂ M⌋} F(M − 2^m),
where ⌊log₂ M⌋ is the integer part of log₂ M. Here we cannot write a simple recurrent relation as in the previous case. It is obvious that F(1) = 1. We can also assume that F(0) = 1. By unfolding the recursion, the following values are obtained:

M      1    2    3    4    5    6    7    8
F(M)   1    2    3    6   10   18   31   56
M      9   10   11   12   13   14   15   16
F(M)  98  174  306  542  956 1690 2983 5272
An approximate closed-form expression for F(M) (involving the integer part [·] of its argument) holds with a relative error smaller than 0.4% for 1 ≤ M ≤ 1024. For example, for M = 16 there are 5272 different ways to split the time-frequency plane into non-overlapping time-frequency regions.
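The dyadic recursion is easily checked numerically; the following MATLAB sketch (a direct implementation of the recursion, with an offset index since MATLAB arrays start at 1) reproduces the values from the table above.

% Number of dyadic time-varying window STFTs, F(M) (a sketch)
M = 16;
F = zeros(1, M + 1);                     % F(n+1) stores F(n)
F(1) = 1;                                % F(0) = 1 by convention
for n = 1:M
    p = 1;                               % p runs over 2^m <= n
    while p <= n
        F(n + 1) = F(n + 1) + F(n - p + 1);
        p = 2 * p;
    end
end
F(M + 1)                                 % returns 5272 for M = 16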
where (a_k, b_k] and (b_k, c_k] define the width of w_k(τ − b_k). For consecutive intervals the relations a_{k+1} = b_k and b_{k+1} = c_k hold, as shown in Fig. 10.24. These windows satisfy the constant overlap-add relation
∑_{k=0}^{K−1} w(τ − b_k) = 1,    (10.41)
since the squared sine and cosine sum up in two consecutive windows with the same parameters.
The initial window function, w_0(τ − b_0), is defined as

w_0(τ − b_0) =
  1,                                           for a_0 = 0 < τ ≤ b_0,
  cos²( (π/2) (b_0/(c_0 − b_0)) (τ/b_0 − 1) ), for b_0 < τ ≤ c_0,    (10.42)
  0,                                           elsewhere,
A simple way to construct a window (function) for the weighted overlap-add method is to take the square root of the constant overlap-add window (for example, the sine window as the square root of the Hann window). In this case, the window functions become
w_k(τ − b_k) =
  sin( (π/2) (a_k/(b_k − a_k)) (τ/a_k − 1) ),  for a_k < τ ≤ b_k,
  cos( (π/2) (b_k/(c_k − b_k)) (τ/b_k − 1) ),  for b_k < τ ≤ c_k,    (10.46)
  0,                                           elsewhere,
Figure 10.25 (a) Time-varying asymmetric Hann(ing) windows, 0 ≤ t ≤ 8, that satisfy the constant overlap-
add (COLA) reconstruction condition ∑k w(τ − bk ) = 1, with bk ∈ {0.1, 0.6, 1.6, 2.2, 3.3, 4.5, 6.0, 8.0},
k = 0, 1, 2, 3, 4, 5, 6, 7. (b) Time-varying asymmetric square root of the Hann(ing) windows (sine window),
0 ≤ t ≤ 8, that satisfy the weighted overlap-add (WOLA) reconstruction condition ∑k w2 (τ − bk ) = 1, with
bk ∈ {0.1, 0.6, 1.6, 2.2, 3.3, 4.5, 6.0, 8.0}, k = 0, 1, 2, 3, 4, 5, 6, 7.
with a_{k+1} = b_k, b_{k+1} = c_k, and the initial and the final intervals defined as in (10.42) and (10.43). The problem with this window is its lack of differentiability at the interval end points, as can be seen in Fig. 10.25(b), causing slow frequency-domain convergence. This problem will be addressed later, within the continuous wavelet transform analysis.
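The constant overlap-add condition (10.41) can be verified numerically. The following MATLAB sketch builds the asymmetric Hann windows with the breakpoints b_k from Fig. 10.25(a) (the grid density is an arbitrary assumption) and checks that they sum to one; replacing the squared sine and cosine edges by their square roots would give the WOLA windows of (10.46).

% Numerical check of the COLA condition (10.41) for asymmetric Hann windows
b   = [0.1 0.6 1.6 2.2 3.3 4.5 6.0 8.0];     % breakpoints b_k from Fig. 10.25
tau = linspace(0, 8, 4001);
S   = zeros(size(tau));
for k = 1:numel(b)
    w = zeros(size(tau));
    if k == 1                                % initial window, flat part (10.42)
        w(tau <= b(1)) = 1;
    else                                     % rising sin^2 edge on (b_{k-1}, b_k]
        r = tau > b(k-1) & tau <= b(k);
        w(r) = sin(pi/2 * (tau(r) - b(k-1)) / (b(k) - b(k-1))).^2;
    end
    if k < numel(b)                          % falling cos^2 edge on (b_k, b_{k+1}]
        f = tau > b(k) & tau <= b(k+1);
        w(f) = cos(pi/2 * (tau(f) - b(k)) / (b(k+1) - b(k))).^2;
    else
        w(tau > b(end)) = 1;                 % final window, flat part
    end
    S = S + w;
end
max(abs(S - 1))                              % numerically zero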
The STFT may use a frequency-varying window as well. For a given DFT frequency p_i, the window width in time is constant, Fig. 10.26,
STFT_{Ni}(n, k_i) = ∑_{m=−Ni/2}^{Ni/2−1} w(m) x(n + m) e^{−j2πmk_i/Ni}.
Figure 10.26 Time-frequency analysis with the STFT using frequency-varying windows.
For the signal used to illustrate the frequency-varying STFT in Fig. 10.26, the best concentration (out
of the presented four) is the one shown in the last subplot. Optimization can be done in the same way
as in the case of time-varying windows.
The STFT can be calculated using the signal’s DFT instead of the signal. There is a direct relation
between the time and the frequency domain STFT via coefficients of the form exp( j2πnk/M ). A dual
form of the STFT is
STFT(n, k) = (1/M) ∑_{i=0}^{M−1} P(i) X(k + i) e^{j2πin/M},    (10.47)
or, in vector notation, STFT_M(k) = W_M^{−1} P_M X(k).
The frequency domain window P(i) may have a frequency-varying width. This form is dual to the time-varying form. Forms corresponding to frequency-varying windows, dual to the ones for the time-varying windows, can easily be defined, for example, for a rectangular frequency domain window, as the block-diagonal relation

STFT = blockdiag( W_{N0}^{−1}, W_{N1}^{−1}, …, W_{NK}^{−1} ) X,    (10.48)
where X = [ X (0), X (1), . . . , X ( M − 1)] T is the DFT vector. A specific form of the STFT with the
frequency-varying windows is called the wavelet transform and will be considered later in the book.
In general, the spectral content of a signal changes in time and frequency in an arbitrary manner. Combining time-varying and frequency-varying windows, we get hybrid time–frequency-varying windows with

STFT_{N(i,l)}(n_i, k_l) = ∑_{m=−N(i,l)/2}^{N(i,l)/2−1} w_{(i,l)}(m) x(n_i + m) e^{−j2πmk_l/N(i,l)}.    (10.49)
For a graphical representation of the STFT with varying windows, the corresponding STFT value
should be assigned to each instant n = 0, 1, . . . , M − 1 and each DFT frequency p = − M/2, − M/2 +
1, . . . , M/2 − 1 within a block. In the case of a hybrid time–frequency-varying window the matrix
form is obtained from the definition for each STFT value. For example, for the STFT calculated as in
Fig.10.27, for each STFT value an expression based on (10.49) should be written. Then the resulting
matrix STFT can be formed.
There are several methods in the literature that adapt windows or basis functions to the signal
form for each time instant or even for every considered time and frequency point in the time-frequency
plane. Selection of the most appropriate form of the basis functions (windows) for each time-frequency
point includes a criterion for selecting the optimal window width (basis function scale) for each point.
After the presentation of the wavelet transform we will shift back our attention to the frequency of the
signal, rather than to its amplitude values. There are signals whose instantaneous frequency variations
are known up to an unknown set of parameters. For example, many signals could be expressed as
polynomial-phase signals
x(t) = A e^{j(Ω_0 t + a_1 t² + a_2 t³ + ⋯ + a_N t^{N+1})},
where the parameters Ω0 , a1 , a2 , . . . , a N are unknown. For nonstationary signals, this approach may be
used if the nonstationary signal could be considered as a polynomial phase signal within the analysis
window. In that case, the local polynomial Fourier transform (LPFT) may be used. It is defined as
LPFT_{Ω1,Ω2,…,ΩN}(t, Ω) = ∫_{−∞}^{∞} x(t + τ) w(τ) e^{−j(Ωτ + Ω_1 τ² + Ω_2 τ³ + ⋯ + Ω_N τ^{N+1})} dτ.    (10.50)
In general, parameters Ω1 , Ω2 , . . . , Ω N could be time dependent, that is, for each time instant t, the set
of optimal parameters could be different.
(Figure 10.27 The STFT calculated with a hybrid time–frequency-varying window; time is on the horizontal axis and frequency on the vertical axis, with blocks such as STFT_4(2, 1), STFT_8(12, 3), and STFT_16(8, 0).)
Realization of the LPFT reduces to the demodulation of the local signal x(t + τ) by e^{−j(Ω_1 τ² + Ω_2 τ³ + ⋯ + Ω_N τ^{N+1})}, followed by the STFT calculation.
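A minimal MATLAB sketch of this realization for the first-order LPFT follows; the signal, its parameters, and the window are assumptions chosen for the illustration.

% First-order LPFT by demodulation followed by the STFT (a sketch)
N   = 64; dt = 1/128;
tau = (-N/2 : N/2 - 1).' * dt;           % lag axis
W0  = 100; a1 = 40;                      % assumed LFM parameters
xt  = exp(1j*(W0*tau + a1*tau.^2));      % local signal x(t + tau) at t = 0
w   = 0.5 + 0.5*cos(2*pi*(-N/2 : N/2-1).'/N);   % Hann lag window
W1  = a1;                                % matched chirp-rate parameter
LPFT = fftshift(fft(ifftshift(xt .* w .* exp(-1j*W1*tau.^2))));
% abs(LPFT) is concentrated around the bin nearest to W0, as for a pure
% sinusoid; with W1 = 0 (the plain STFT) the peak is spread by the chirp.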
Example 10.16. Consider a linear frequency-modulated signal x(t) = e^{j(Ω_0 t + a_1 t²)}. Show that its LPFT could be completely concentrated along the instantaneous frequency.
⋆ For Ω_1 = a_1, the second-order phase term does not introduce any distortion to the local polynomial spectrogram,

|LPFT_{Ω1=a1}(t, Ω)|² = |W(Ω − Ω_0 − 2a_1 t)|²,
with respect to the spectrogram of a sinusoid with constant frequency. For a wide window w(τ ),
like in the case of the STFT of a pure sinusoid, we achieve high concentration.
The LPFT could be considered as the Fourier transform of the windowed signal demodulated with exp(−j(Ω_1 τ² + Ω_2 τ³ + ⋯ + Ω_N τ^{N+1})). Thus, if we are interested in signal filtering, we can find the coefficients Ω_1, Ω_2, …, Ω_N, demodulate the signal by multiplying it with exp(−j(Ω_1 τ² + Ω_2 τ³ + ⋯ + Ω_N τ^{N+1})), and use a standard filter for an almost pure sinusoid. In general, we can extend this approach to any signal x(t) = e^{jφ(t)} by estimating its phase φ(t) with φ̂(t) (using the instantaneous frequency estimation that will be discussed later) and filtering the demodulated signal x(t) exp(−jφ̂(t)) by a lowpass filter. The resulting signal is obtained when the filtered signal is returned back to the original frequencies, by modulation with exp(jφ̂(t)).
Example 10.17. Consider the first-order LPFT of a signal x (t). Show that the second-order moments
of the LPFT could be calculated based on the windowed signal moment, windowed signal’s
Fourier transform moment and one more LPFT moment for any Ω1 in (10.50), for example for
Ω1 = 1.
⋆ The second-order moment of the LPFT, defined by

M_{Ω1} = (1/(2π)) ∫_{−∞}^{∞} Ω² |LPFT_{Ω1}(t, Ω)|² dΩ,    (10.52)

is equal to

M_{Ω1} = ∫_{−∞}^{∞} |d[x_t(τ) e^{−jΩ1τ²}]/dτ|² dτ,

since the LPFT could be considered as the Fourier transform of x_t(τ) e^{−jΩ1τ²}, that is, LPFT_{Ω1}(t, Ω) = FT{x_t(τ) e^{−jΩ1τ²}}, and Parseval's theorem is used. After the derivative calculation,
M_{Ω1} = ∫_{−∞}^{∞} |dx_t(τ)/dτ − j2Ω_1 τ x_t(τ)|² dτ
       = ∫_{−∞}^{∞} ( |dx_t(τ)/dτ|² + j2Ω_1 τ x_t*(τ) dx_t(τ)/dτ − j2Ω_1 τ x_t(τ) dx_t*(τ)/dτ + |2Ω_1 τ x_t(τ)|² ) dτ.
The first term is the moment of X_t(Ω) = FT{x_t(τ)}, since the integral of |dx_t(τ)/dτ|² over τ is equal to the integral of |jΩ X_t(Ω)|² over Ω, according to Parseval's theorem. Also, we can see that the last term in M_{Ω1} contains the signal moment,
m_x = ∫_{−∞}^{∞} τ² |x_t(τ)|² dτ,    (10.53)
so that

M_{Ω1} − M_0 − 4m_x Ω_1² = Ω_1 ∫_{−∞}^{∞} ( j2τ x_t*(τ) dx_t(τ)/dτ − j2τ x_t(τ) dx_t*(τ)/dτ ) dτ.
Note that the last integral does not depend on the parameter Ω_1. Thus, the relation among the LPFT moments at any two values of Ω_1, for example, Ω_1 = a and an arbitrary Ω_1, easily follows as the ratio

(M_{Ω1=a} − M_0 − 4a² m_x) / (M_{Ω1} − M_0 − 4Ω_1² m_x) = a / Ω_1,    (10.54)

with M_1 = M_{Ω1=1} for a = 1.
Obviously, the second-order moment, for any Ω_1, can be expressed as a function of the other three moments. In this case the relation reads

M_{Ω1} = 4Ω_1² m_x + Ω_1 (M_1 − M_0 − 4m_x) + M_0.
Example 10.18. Find the position and the value of the second-order moment minimum of the LPFT,
based on the windowed signal moment, the windowed signal’s Fourier transform moment, and
the LPFT moment for Ω1 = 1.
⋆ The minimum value of the second-order moment (meaning the best concentrated LPFT in the
sense of the duration measures) could be calculated from
dM_{Ω1}/dΩ_1 = 0

as

Ω_1 = −(M_1 − M_0 − 4m_x) / (8m_x).
Since m x > 0 this is a minimum of the function MΩ1 . Thus, in general, there is no need for a
direct search for the best concentrated LPFT over all possible values of Ω1 . It can be found based
on three moments.
The corresponding minimum value of the moment is

M_{Ω1} = M_0 − (M_1 − M_0 − 4m_x)² / (16m_x).    (10.56)
Note that any two moments, instead of M0 and M1 , could be used in the derivation.
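The moment relations can be verified numerically. A minimal MATLAB sketch follows; the windowed chirp, its rate, and the discretization grid are assumptions for the illustration.

% Optimal Omega_1 from three moments, as in Example 10.18 (a sketch)
dt  = 1/256; tau = (-128:127).' * dt;           % assumed lag grid
xt  = exp(1j*40*tau.^2) .* exp(-tau.^2/0.05);   % windowed chirp, rate 40
mx  = sum(tau.^2 .* abs(xt).^2) * dt;           % signal moment (10.53)
M   = @(W1) sum(abs(gradient(xt .* exp(-1j*W1*tau.^2), dt)).^2) * dt;
M0  = M(0); M1 = M(1);                          % moments for W1 = 0 and 1
W1opt = -(M1 - M0 - 4*mx) / (8*mx)              % approximately 40, no search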
where

K_α(u, τ) = √((1 − j cot α)/(2π)) e^{j(u²/2) cot α} e^{j(τ²/2) cot α} e^{−juτ csc α}.    (10.58)
It can be considered as a rotation of the signal in the time-frequency plane by an angle α. Its inverse can be considered as a rotation by the angle −α,

x(t) = ∫_{−∞}^{∞} X_α(u) K_{−α}(u, t) du.
Special cases of the FRFT reduce to X_0(u) = x(u) and X_{π/2}(u) = X(u)/√(2π), that is, to the signal and its Fourier transform.
The windowed FRFT is

X_{w,α}(t, u) = √((1 − j cot α)/(2π)) e^{j(u²/2) cot α} ∫_{−∞}^{∞} x(t + τ) w(τ) e^{j(τ²/2) cot α} e^{−juτ csc α} dτ,    (10.59)
meaning that the lag truncation could be applied after signal rotation or prior to the rotation. Results
are similar. A similar relation for the moments, like (10.55) in the case of LPFT, could be derived here.
It states that any FRFT moment can be calculated if we know just any three of its moments.
High-resolution techniques are developed for efficient processing and separation of very close sinusoidal
signals (in array signal processing, separation of sources with very close DOAs). Among these
techniques the most widely used are Capon’s method, MUSIC, and ESPRIT. The formulation of
high-resolution techniques could be extended to the time-frequency representations. Here we will
present a simple formulation of the STFT and the LPFT within Capon’s method framework.
Here we will present the STFT formulation in a common array signal-processing notation. The STFT of a discrete-time signal x(n), in the (causal) notation

STFT(ω, n) = (1/N) ∑_{m=0}^{N−1} x(n + m) e^{−jωm},
can be written as
STFT(ω, n) = ŝ_ω(n) = h^H x(n) = (1/N) a^H(ω) x(n),

a^H(ω) = [1  e^{−jω}  e^{−j2ω}  …  e^{−j(N−1)ω}],    (10.63)

x(n) = [x(n)  x(n + 1)  x(n + 2)  …  x(n + N − 1)]^T,
where T denotes the transpose operation, and H denotes the conjugate and transpose (Hermitian)
operation. Normalization of the STFT with N is done, as in the robust signal analysis.
The average power of the output signal ŝω (n), over M samples (ergodicity over M samples
around n is assumed), for a frequency ω, is
P(ω) = (1/M) ∑_n |ŝ_ω(n)|²    (10.64)
     = (1/N²) a^H(ω) [ (1/M) ∑_n x(n) x^H(n) ] a(ω) = (1/N²) a^H(ω) R̂_x a(ω).
The standard STFT (10.63) can be derived based on the following consideration. Find h as a solution
of the problem
min_h {h^H h}  subject to  h^H a(ω) = 1.    (10.65)
This minimization problem will be explained through the next example.
600 Linear Time-Frequency Representations
Example 10.19. Show that the output power of the filter producing s(n) = h^H x(n) is minimized, for the input x(n) = A a(ω) + ε(n), with respect to the input white noise ε(n), whose autocorrelation function is R̂_ε = ρI, if h^H h is minimal subject to h^H a(ω) = 1.
⋆ The output for the noise only is s_ε(n) = h^H ε(n), while its average power is

(1/M) ∑_n |h^H ε(n)|² = (1/M) ∑_n h^H ε(n) ε^H(n) h = h^H ( (1/M) ∑_n ε(n) ε^H(n) ) h = ρ h^H h.

For the noise-free part of the input, the output is

h^H x(n) = h^H A a(ω) = A.
Thus, the condition h H a(ω ) = 1 means that the estimate is unbiased with respect to input
sinusoidal signal with amplitude A.
resulting in

h = a(ω) / (a^H(ω) a(ω)) = (1/N) a(ω),    (10.66)

and the estimate (10.63), which is the standard STFT, follows.
Consider now a different optimization problem, defined by

min_h { (1/M) ∑_n |h^H x(n)|² }  subject to  h^H a(ω) = 1.    (10.67)
Two points are emphasized in this optimization problem. First, the weights are selected to minimize the average power (1/M) ∑_n |h^H x(n)|² of the filter output signal. It means that the filter should give the best possible suppression of all signal-plus-noise components of the observations, including the components of the desired signal, over all time instants (minimization of the output power). Second, the condition h^H a(ω) = 1 ensures that, at the considered time instant n, the signal amplitude is preserved at the output.
The optimization problem can be rewritten in the form

min_h { (1/M) ∑_n h^H x(n) x^H(n) h }  subject to  h^H a(ω) = 1.
By denoting

R̂_x = (1/M) ∑_n x(n) x^H(n),
we get

∂/∂h^H { h^H R̂_x h + λ (h^H a(ω) − 1) } = 0,  subject to  h^H a(ω) = 1,

which gives the solution

h = −R̂_x^{−1} λ a(ω)/2,  subject to  h^H a(ω) = 1.    (10.68)
The solution can be written in the form

ĥ = R̂_x^{−1} a(ω) / (a^H(ω) R̂_x^{−1} a(ω)),    (10.69)
where

R̂_x = (1/M) ∑_n x(n) x^H(n).    (10.70)
The output signal power, in these cases, corresponds to Capon's form of the STFT, defined by

S_Capon(ω) = (1/M) ∑_n |h^H x(n)|² = h^H R̂_x h    (10.71)
           = ( R̂_x^{−1} a(ω) / (a^H(ω) R̂_x^{−1} a(ω)) )^H R̂_x ( R̂_x^{−1} a(ω) / (a^H(ω) R̂_x^{−1} a(ω)) )    (10.72)
           = 1 / (a^H(ω) R̂_x^{−1} a(ω)).    (10.73)
For a time-localized analysis, the autocorrelation matrix is estimated over a symmetric sliding window around the instant n,

R̂_x(n, K) = (1/(K + 1)) ∑_{p=n−K/2}^{n+K/2} x(p) x^H(p),    (10.74)

where K is a parameter defining the width of the window. Inserting R̂_x(n, K) instead of R̂_x in (10.71) gives the STFT with weights minimizing the output power in (10.67), for the observations in the neighborhood of the time instant of interest n.
The mean value of this power function, calculated in the neighborhood of the time instant n over the window used in (10.74), gives an averaged Capon's STFT as follows:

S_Capon(n, ω) = 1 / (a^H(ω) R̂_x^{−1}(n) a(ω)),    (10.75)
where n indicates the time instant of interest and the mean is calculated over the observations in the corresponding window.
In the realization, the autocorrelation function is regularized by the identity matrix I; thus, we use

R̂(n) = (1/(K + 1)) ∑_{p=n−K/2}^{n+K/2} x(p) x^H(p) + ρI    (10.76)
instead of R̂x (n) for the inverse calculation in (10.75) and (10.71).
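A minimal MATLAB sketch of this calculation is given below; the two-component test signal and all of the parameter values are assumptions chosen to mimic Example 10.20 that follows.

% Capon's STFT (10.75) with the regularized matrix (10.76) (a sketch)
N = 16; K = 14; rho = 1e-4; n = 128;
x = @(p) exp(1j*1.00*p) + exp(1j*1.10*p);    % assumed two close sinusoids
R = rho * eye(N);
for p = n - K/2 : n + K/2                    % autocorrelation estimate (10.76)
    xp = x((p : p + N - 1).');
    R  = R + xp * xp' / (K + 1);
end
w = linspace(0, 2*pi, 2048);                 % frequencies of interest
S = zeros(size(w));
for k = 1:numel(w)
    a = exp(1j*w(k)*(0:N-1).');              % vector a(omega), as in (10.63)
    S(k) = 1 / real(a' * (R \ a));           % Capon spectrogram (10.75)
end
% S peaks sharply at the two signal frequencies, far below the resolution
% limit of the N = 16 rectangular-window STFT.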
In the MUSIC formulation of the high-resolution STFT, the eigenvalue decomposition of the autocorrelation matrix (10.76) is used,

R̂(n) = (1/(K + 1)) ∑_{p=n−K/2}^{n+K/2} x(p) x^H(p) + ρI = V^H(n) Λ(n) V(n).
Note that the Capon spectrogram, using the eigenvalues and eigenvectors of the autocorrelation matrix, can be written as

S_Capon(n, ω) = 1 / (a^H(ω) V^H(n) Λ^{−1}(n) V(n) a(ω)) = 1 / ( ∑_{k=1}^{N} (1/λ_k) |STFT_k(n, ω)|² ),
where
STFTk (n, ω ) = a H (ω )vk (n)
is the STFT of the kth eigenvector (column) of the autocorrelation matrix R̂(n), corresponding to
the eigenvalue λk . If the signal has N − M components then the first N − M largest eigenvalues λk
(corresponding to the smallest values 1/λk ) will represent the signal space (components), and the
remaining M eigenvalues will correspond to the noise space (represented by ρI in the definition of
autocorrelation matrix R̂(n)).
If a frequency ω corresponds to a signal component, then all eigenvectors corresponding to the noise space will be orthogonal to that harmonic, represented by a^H(ω). It means that the spectrograms of all noise-space components will be very small at the frequencies corresponding to the signal frequencies.
The MUSIC STFT is defined based on this fact. It is calculated using the eigenvectors corresponding to the noise space, as

S_MUSIC(n, ω) = 1 / (a^H(ω) V_M^H V_M a(ω)) = 1 / ( ∑_{k=N−M+1}^{N} |STFT_k(n, ω)|² ),    (10.77)
where V M is the eigenvector matrix containing only M eigenvectors corresponding to the M lowest
eigenvalues in Λ, representing the space of noise. In this case the signal has N − M components
corresponding to the largest eigenvalues. A special case with M = 1 is the Pisarenko method.
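Continuing the previous sketch, the MUSIC spectrogram (10.77) follows from the eigen-decomposition of the same matrix R; the choice of two signal eigenvectors matches the assumed two-component signal.

% MUSIC spectrogram (10.77) from the noise-space eigenvectors (a sketch)
[V, D] = eig(R);                             % R from the Capon sketch above
[~, idx] = sort(real(diag(D)), 'descend');
Vn = V(:, idx(3:end));                       % noise space (all but 2 vectors)
Smusic = zeros(size(w));
for k = 1:numel(w)
    a = exp(1j*w(k)*(0:N-1).');
    Smusic(k) = 1 / real(a' * (Vn * Vn') * a);    % as in (10.77)
end
% Keeping only the single weakest eigenvector in Vn gives the Pisarenko form.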
Example 10.20. Calculate the high-resolution forms of the spectrogram for a two-component signal whose frequencies ω_0 + ∆ω and ω_0 − ∆ω may be considered as constants around the instant of interest n = 128.
In the STFT calculation use a rectangular window of the width N = 16. Use 15 samples for
averaging (estimation) of the autocorrelation matrix, as well as its regularization by a 0.0001 · I
(corresponding to noise signal x (n) + ε(n), where ε(n) is complex white noise with variance
σε2 = 0.0001). Assume that signal samples needed for autocorrelation function estimation are
also available.
⋆ Signal values around n = 128 are considered. The STFT is calculated using N = 16 signal samples,

x(128) = [x(128) x(129) x(130) … x(143)]^T,

and a rectangular window. The mainlobe width of this window is D = 4π/N = π/4 = 0.7854. It will not be able to resolve two components closer than 2∆ω ∼ D/2 = 0.3927. The considered ∆ω = 0.05 is well below this limit. The STFT is interpolated in frequency up to 2048 samples.
The result is shown in Fig. 10.28(a). Next, the autocorrelation matrix

R̂(128) = (1/15) ∑_{p=128−7}^{128+7} x(p) x^H(p) + 0.0001·I

is estimated, and the high-resolution forms are calculated at the frequencies of interest ω = 2πk/2048, for k = 0, 1, 2, …, 1023. The Capon's STFT is then

S_Capon(128, ω) = 1 / (a^H(ω) R̂^{−1}(128) a(ω)) = 1 / ( ∑_{k=1}^{16} (1/λ_k) |STFT_k(128, ω)|² ).
Figure 10.28 (a) The standard STFT using a rectangular window N = 16. The STFT is interpolated in frequency
up to 2048 samples. (b) Capon’s spectrogram calculated in 2048 frequency points. (c) MUSIC spectrogram calculated
in 2048 frequency points. (d) Capon’s spectrogram zoomed to the signal components. (e) MUSIC spectrogram
zoomed to the signal components. (f) Pisarenko spectrogram zoomed to the signal components.
With varying coefficients or appropriate signal multiplication, before the STFT calculation, a local
polynomial version of Capon’s transform could be defined. For example, for a linear frequency-
modulated signal of the form

x(n) = A e^{j(α_0 n² + ω_0 n + φ_0)},

we should use (10.75) or (10.71) with the matrix

R̂_x(n, K, α) = (1/(K + 1)) ∑_{p=n−K/2}^{n+K/2} x_α(p) x_α^H(p),  with  x_α(p) = x(p) e^{−jαp²},
where α is a parameter. The high-resolution form of the LPFT can be used for efficient processing of close linear frequency-modulated signals with the same rate within the considered interval.
Example 10.21. The Capon LPFT form is illustrated on an example of a signal with two close components that, in addition to the linear frequency modulation, contain a small disturbing cubic phase term. The considered time interval was −1 ≤ t ≤ 1 − ∆t with ∆t = 2/512, ρ = 0.5, K = 30, and the frequency domain is interpolated eight times. The standard STFT, LPFT, Capon's STFT, and Capon's LPFT-based representations are presented in Fig. 10.29.
Figure 10.29 (a) The standard STFT, (b) the LPFT, (c) Capon’s STFT, and (d) Capon’s LPFT-based representations
of two close almost linear frequency-modulated signals.
A way to improve the time-frequency representation of this signal is to transform the signal into a sinusoid whose constant frequency is equal to the instantaneous frequency value of the linear frequency-modulated signal at the considered instant. Then, a wide window can be used, with a high frequency resolution. The obtained result is valid for the considered instant only, and the signal transformation procedure should be repeated for each instant of interest.
A simple way to introduce this kind of signal representation is presented. Consider an LFM
signal,

x(t) = A exp(jφ(t)) = A exp(j(at²/2 + bt + c)).
Its instantaneous frequency changes in time as
Ωi (t) = dφ(t)/dt = at + b.
One of the goals of time-frequency analysis is to obtain a function that will (in an ideal case) fully concentrate the signal power along its instantaneous frequency. The ideal representation would be fully concentrated at Ω = Ω_i(t). For the considered signal, the phase difference satisfies

(dφ(t)/dt) τ = φ(t + τ/2) − φ(t − τ/2) = τ(at + b) = τ Ω_i(t).
Figure 11.1 Optimal STFT (absolute value, calculated with optimal window width) and the Wigner distribution
of a linear frequency modulated signal.
This property can easily be converted into an ideal time-frequency representation for the linear frequency-modulated signal, by using the product x(t + τ/2) x*(t − τ/2) = |A|² e^{jΩ_i(t)τ}, whose Fourier transform over τ is fully concentrated along Ω_i(t).
The Fourier transform of x (t + τ/2) x ∗ (t − τ/2) over τ, for a given t, is called the Wigner distribution.
It is defined as

WD(t, Ω) = ∫_{−∞}^{∞} x(t + τ/2) x*(t − τ/2) e^{−jΩτ} dτ.    (11.1)
The Wigner distribution is originally introduced in quantum mechanics. The illustration of the Wigner
distribution calculation is presented in Fig. 11.2.
Expressing x(t) in terms of X(Ω) and substituting it into (11.1), we get

WD(t, Ω) = (1/(2π)) ∫_{−∞}^{∞} X(Ω + θ/2) X*(Ω − θ/2) e^{jθt} dθ.    (11.2)
Figure 11.2 Illustration of the Wigner distribution calculation, for a considered time instant t. Real values of a
linear frequency modulated signal (linear chirp) are presented.
Based on the definition of the Wigner distribution in the frequency domain, (11.2), one may easily
prove the fulfillment of the frequency marginal.
Example 11.1. Find the Wigner distribution of signals: (a) x (t) = δ(t − t1 ) and (b) x (t) = exp( jΩ1 t).
⋆ For x(t) = δ(t − t_1),

WD(t, Ω) = ∫_{−∞}^{∞} δ(t − t_1 + τ/2) δ(t − t_1 − τ/2) e^{−jΩτ} dτ = δ(t − t_1),

since |a| δ(at) x(t) = δ(t) x(0). From the Wigner distribution definition in terms of the Fourier transform, for x(t) = exp(jΩ_1 t) with X(Ω) = 2πδ(Ω − Ω_1), follows

WD(t, Ω) = 2πδ(Ω − Ω_1).
Example 11.2. Consider a linear frequency-modulated signal, x(t) = A e^{jbt²/2}. Find its Wigner distribution.
⋆ The local autocorrelation is x(t + τ/2) x*(t − τ/2) = |A|² e^{jbtτ}, with

WD(t, Ω) = 2π |A|² δ(Ω − bt).
Again, a high concentration along the instantaneous frequency in the time-frequency plane may be achieved for linear frequency-modulated signals. These two examples demonstrate that the Wigner distribution can provide a superior time-frequency representation of a one-component signal, in comparison to the STFT.
Example 11.3. Calculate the Wigner distribution for a linear frequency-modulated signal with a Gaussian amplitude (Gaussian chirp signal),

x(t) = A e^{−at²/2} e^{j(bt²/2 + ct)}.
For a multicomponent signal x(t) = ∑_{m=1}^{M} x_m(t), the Wigner distribution contains all of the pairwise terms,

WD(t, Ω) = ∑_{m=1}^{M} ∑_{n=1}^{M} ∫_{−∞}^{∞} x_m(t + τ/2) x_n*(t − τ/2) e^{−jΩτ} dτ.

In addition to the auto-terms (the terms with m = n), it contains the cross-terms

WD_ct(t, Ω) = ∑_{m=1}^{M} ∑_{n=1, n≠m}^{M} ∫_{−∞}^{∞} x_m(t + τ/2) x_n*(t − τ/2) e^{−jΩτ} dτ.
Usually, they are not desirable in time-frequency signal analysis. Cross-terms can mask the presence of auto-terms, which makes the Wigner distribution unsuitable for the time-frequency analysis of multicomponent signals.
For a two-component signal with auto-terms located around (t1 , Ω1 ) and (t2 , Ω2 ) (see Fig.11.3)
the oscillatory cross-terms are located around ((t1 + t2 )/2, (Ω1 + Ω2 )/2).
(Figure 11.3 Auto-terms, located around (t_1, Ω_1) and (t_2, Ω_2), and the oscillatory cross-term of a two-component signal in the time-frequency plane.)
Example 11.4. Analyze auto-terms and cross-terms for two-component signal of the form
x(t) = e^{−(t−t_1)²/2} e^{jΩ_1 t} + e^{−(t+t_1)²/2} e^{−jΩ_1 t}.

⋆ The Wigner distribution of this signal consists of three terms,
where the first and second terms represent auto-terms while the third term is a cross-term. Note
that the cross-term is oscillatory in both directions. The oscillation rate along the time axis is
proportional to the frequency distance between components 2Ω1 , while the oscillation rate along
frequency axis is proportional to the distance in time of components, 2t1 . The oscillatory nature
of cross-terms will be used for their suppression.
To analyze auto-terms and cross-terms, the well-known ambiguity function can be used as well.
It is defined as

AF(θ, τ) = ∫_{−∞}^{∞} x(t + τ/2) x*(t − τ/2) e^{−jθt} dt.    (11.6)
It is already a classical tool in optics as well as in radar and sonar signal analysis.
The ambiguity function and the Wigner distribution form a two-dimensional Fourier transform pair,

AF(θ, τ) = FT²D_{t,Ω}{WD(t, Ω)},

WD(t, Ω) = (1/(2π)) ∫_{−∞}^{∞} ∫_{−∞}^{∞} [ ∫_{−∞}^{∞} x(u + τ/2) x*(u − τ/2) e^{−jθu} du ] e^{jθt − jΩτ} dτ dθ,
where the integration over frequency related variable θ assumes factor 1/(2π ) and the positive sign in
the exponent exp( jθt).
Consider a signal whose components are limited in time, x_m(t) ≠ 0 only for |t − t_m| < T_m. It means that x_m(t + τ/2) x_m*(t − τ/2) is located within |τ| < 2T_m, that is, around the θ-axis, independently of the signal's position t_m. The cross-term between the signal's m-th and n-th component is located within |τ + t_n − t_m| < T_m + T_n. It is dislocated from τ = 0 for two components that do not occur simultaneously, that is, when t_m ≠ t_n.
From the frequency domain definition of the Wigner distribution, a corresponding ambiguity function form follows,

AF(θ, τ) = (1/(2π)) ∫_{−∞}^{∞} X(Ω + θ/2) X*(Ω − θ/2) e^{jΩτ} dΩ.    (11.7)
From this form we can conclude that the auto-terms of the components, limited in frequency to X_m(Ω) ≠ 0 only for |Ω − Ω_m| < W_m, are located in the ambiguity domain around the τ-axis, within |θ| < 2W_m, while the cross-terms are located within

|θ + Ω_n − Ω_m| < W_m + W_n,
where Ωm and Ωn are the frequencies around which the Fourier transform of each component lies.
Therefore, all auto-terms are located along and around the ambiguity domain axis. The cross-terms,
for the components which do not overlap in the time and frequency, simultaneously, are dislocated from
the ambiguity axes, Fig. 11.4. This property will be used in the definition of the reduced interference
time-frequency distributions.
Figure 11.4 Auto and cross-terms for two-component signal in the ambiguity domain.
The ambiguity function of a four-component signal, consisting of two Gaussian pulses, one sinusoidal and one linear frequency-modulated component, is presented in Fig. 11.5.
In the ambiguity domain (θ, τ ) auto-terms are located around (0, 0) while cross-terms are located
around (2Ω1 , 2t1 ) and (−2Ω1 , −2t1 ) as presented in Fig. 11.4.
(Figure 11.5 The ambiguity function AF(θ, τ) of the four-component signal.)
A list of the properties satisfied by the Wigner distribution follows. The obvious ones will just be stated, while proofs will be given for the more complex ones. When the Wigner distributions of more than one signal are considered, the signal will be added as an index in the Wigner distribution notation. Otherwise, the signal x(t) is assumed as the default signal in the notation.
P1 – Realness
For any signal,

WD*(t, Ω) = WD(t, Ω)

holds.
P2 – Time-shift property
The Wigner distribution of a signal shifted in time
y ( t ) = x ( t − t0 ),
is
WDy (t, Ω) = WDx (t − t0 , Ω).
P3 – Frequency shift property
For a modulated signal
y(t) = x (t)e jΩ0 t ,
we have
WDy (t, Ω) = WDx (t, Ω − Ω0 ).
P4 – Time marginal property
(1/(2π)) ∫_{−∞}^{∞} WD(t, Ω) dΩ = |x(t)|².
P5 – Frequency marginal property
∫_{−∞}^{∞} WD(t, Ω) dt = |X(Ω)|².
This property follows from (1/(2π)) ∫_{−∞}^{∞} WD(t, Ω) dΩ = |x(t)|².
P8 – Scaling
For a scaled version of the signal,

y(t) = √|a| x(at), a ≠ 0,
In order to prove this property, we will use the derivative of the inverse Fourier transform of the
Wigner distribution
d[x(t + τ/2) x*(t − τ/2)]/dτ = (1/(2π)) ∫_{−∞}^{∞} jΩ WD(t, Ω) e^{jΩτ} dΩ.
The proof is the same as in the instantaneous frequency case, using the frequency domain relations.
P11 – Time constraint
The Wigner distribution is a function of x(t + τ/2) x*(t − τ/2). If x(t) = 0 for t outside [t_1, t_2], then x(t + τ/2) x*(t − τ/2) is different from zero only for t within [t_1, t_2], so that WD(t, Ω) = 0 for t outside [t_1, t_2].
P12 – Frequency constraint
If X(Ω) = 0 for Ω outside [Ω_1, Ω_2], then also WD(t, Ω) = 0 for Ω outside [Ω_1, Ω_2].
P13 – Convolution
For

y(t) = ∫_{−∞}^{∞} h(t − τ) x(τ) dτ,

the Wigner distribution is

WD_y(t, Ω) = ∫_{−∞}^{∞} WD_h(t − τ, Ω) WD_x(τ, Ω) dτ.
P14 – Product
For

y(t) = h(t) x(t),

the Wigner distribution is

WD_y(t, Ω) = (1/(2π)) ∫_{−∞}^{∞} WD_h(t, Ω − ν) WD_x(t, ν) dν.
The local autocorrelation of y(t) is h(t + τ/2)h∗ (t − τ/2) x (t + τ/2) x ∗ (t − τ/2). Thus, the Wigner
distribution of y(t) is the Fourier transform of the product of local autocorrelations h(t + τ/2)h∗ (t −
τ/2) and x (t + τ/2) x ∗ (t − τ/2). It is a convolution in frequency of the corresponding Wigner
distributions of h(t) and x (t). Property P13 could be proven in the same way using the Fourier
transforms of signals h(t) and x (t).
P15 – Fourier transform property
For

y(t) = √(|c|/(2π)) X(ct), c ≠ 0,

the signal y(t) is equal to a scaled version of the Fourier transform of the signal x(t). Then

WD_y(t, Ω) = (|c|/(2π)) ∫_{−∞}^{∞} X(ct + cτ/2) X*(ct − cτ/2) e^{−jΩτ} dτ
           = (1/(2π)) ∫_{−∞}^{∞} X(ct + θ/2) X*(ct − θ/2) e^{j(−Ω/c)θ} dθ.    (11.10)
In practical realizations of the Wigner distribution, we are constrained with a finite time lag τ. A pseudo
form of the Wigner distribution is then used. It is defined as
PWD(t, Ω) = ∫_{−∞}^{∞} w(τ/2) w*(−τ/2) x(t + τ/2) x*(t − τ/2) e^{−jΩτ} dτ,    (11.14)
where window w(τ ) localizes the considered lag interval. If w(0) = 1, the pseudo Wigner distribution
satisfies the time marginal property. Note that the pseudo Wigner distribution is smoothed in the
frequency direction with respect to the Wigner distribution,

PWD(t, Ω) = (1/(2π)) ∫_{−∞}^{∞} WD(t, θ) W_e(Ω − θ) dθ,

where W_e(Ω) is the Fourier transform of w(τ/2) w*(−τ/2).
For the sinusoidally frequency-modulated signal x(t) = e^{−j128 cos(πt/64)}, we calculate an approximate value of the pseudo Wigner distribution with a window w(τ) of the width defined by T = 8,

PWD(t, Ω) = ∫_{−8}^{8} e^{−j128 cos(π(t+τ/2)/64)} e^{j128 cos(π(t−τ/2)/64)} w(τ) e^{−jΩτ} dτ.
Expanding the phase difference into a Taylor series in τ, we get

PWD(t, Ω) = ∫_{−8}^{8} e^{j2π sin(πt/64)τ} e^{jΔ(t,τ)} w(τ) e^{−jΩτ} dτ,

where Δ(t, τ) collects the higher-order terms of the expansion. For |τ| ≤ 8 it holds that |Δ(t, τ)| ≤ 0.33. By neglecting this term we may write
PWD(t, Ω) ≅ W(Ω − 2π sin(πt/64)),
where W(Ω) is the Fourier transform of the window w(τ), Fig. 11.7(a) (with a Hann(ing) window). For a wider window this approximation does not hold, and the inner interferences in the Wigner distribution appear, Fig. 11.7(b) (with a four times wider Hann(ing) window).
If the signal in (11.14) is discretized in τ with a sampling interval ∆t, then a sum instead of an integral
is formed. The pseudo Wigner distribution of a discrete-lag signal, for a given time instant t, is given by
PWD(t, Ω) = ∑_{m=−∞}^{∞} w(m∆t/2) w*(−m∆t/2) x(t + m∆t/2) x*(t − m∆t/2) e^{−jmΩ∆t} ∆t.    (11.15)
Sampling in τ with ∆t = π/Ω0 , Ω0 > Ωm corresponds to the sampling of signal x (t + τ/2) in τ/2
with ∆t/2 = π/(2Ω0 ).
The discrete-lag pseudo Wigner distribution is the Fourier transform of signal
R(t, m) = w(m∆t/2) w*(−m∆t/2) x(t + m∆t/2) x*(t − m∆t/2) ∆t.
Figure 11.7 Pseudo Wigner distribution of sinusoidally frequency modulated signal. Narrow Hann(ing) window
(left) and a four times wider window (right).
with ω = Ω∆t. If the sampling interval satisfies the sampling theorem, then the sum in (11.15) is equal
to the integral form (11.14). A discrete form of the pseudo Wigner distribution, with N + 1 samples
and ω = 2πk/( N + 1), for a given time instant t, is
PWD(t, k) = ∑_{m=−N/2}^{N/2} R(t, m) e^{−j2πmk/(N+1)}.
Here, N/2 is an integer. This distribution could be calculated using the standard DFT routines.
For discrete-time instants t = n∆t, introducing the notation
R(n∆t, m∆t) = w(m∆t/2) w*(−m∆t/2) x(n∆t + m∆t/2) x*(n∆t − m∆t/2) ∆t,

that is,

R(n, m) = w(m/2) w*(−m/2) x(n + m/2) x*(n − m/2),
the discrete-time and discrete-lag pseudo Wigner distribution can be written as
PWD(n, ω) = ∑_{m=−∞}^{∞} w(m/2) w*(−m/2) x(n + m/2) x*(n − m/2) e^{−jmω}.    (11.16)
The notation x(n + m/2), for given n and m, should be understood as the signal value at the instant (n + m/2)∆t. In this notation, the discrete-time pseudo Wigner distribution is periodic in ω with the period 2π.
Since various discretization steps are used (here and in open literature), we will provide a relation
of discrete indexes to the continuous time and frequency, for each definition, as
PWD(t, Ω)|_{t=n∆t, Ω=2πk/((N+1)∆t)} = PWD(n∆t, 2πk/((N+1)∆t)) → PWD(n, k).
The sign → could be understood as the equality sign in the sense of sampling theorem (Example 2.13).
Otherwise, it should be considered as a correspondence sign. The discrete form of (11.14), with N + 1 samples, is

PWD(n∆t, 2πk/((N+1)∆t)) → PWD(n, k),

PWD(n, k) = ∑_{m=−N/2}^{N/2} w(m/2) w*(−m/2) x(n + m/2) x*(n − m/2) e^{−j2πkm/(N+1)},
The discrete-time and discrete-lag pseudo Wigner distribution, in this case, is of the form

PWD(n, ω) = 2 ∑_{m=−∞}^{∞} w(m) w*(−m) x(n + m) x*(n − m) e^{−j2mω}.    (11.17)

It corresponds to the continuous-time pseudo Wigner distribution (11.14) with the substitution τ/2 → τ,

PWD(t, Ω) = 2 ∫_{−∞}^{∞} w(τ) w*(−τ) x(t + τ) x*(t − τ) e^{−j2Ωτ} dτ,
for −N/2 ≤ 2k ≤ N/2. Since the standard DFT routines are commonly used for the pseudo Wigner distribution calculation, we may use every other (2k) sample in (11.18) or oversample the pseudo Wigner distribution in frequency (as it has been done in time). Then,

PWD(n∆t/2, 2πk/((N+1)∆t)) → PWD(n, k),

PWD(n, k) = ∑_{m=−N/2}^{N/2} w(m) w*(−m) x(n + m) x*(n − m) e^{−j2πmk/(N+1)}.    (11.19)
This discrete pseudo Wigner distribution, oversampled in both time and frequency by a factor of 2, has a finer time-frequency grid, producing smaller time-frequency estimation errors at the expense of the calculation complexity.
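A minimal MATLAB sketch of (11.19) is given below; the test signal, its sampling, and the Hann lag window are assumptions for the illustration.

% Oversampled discrete pseudo Wigner distribution (11.19) (a sketch)
N  = 64; dt = 1/128; nt = 256;
t  = (0 : nt-1) * dt/2;                      % time grid with step dt/2
x  = exp(1j*31*pi*(t - 1).^2);               % assumed LFM test signal
m  = -N/2 : N/2 - 1;
w2 = (0.5 + 0.5*cos(2*pi*m/N)).^2;           % w(m)w*(-m) for the Hann window
PWD = zeros(N, nt);
for n = N/2 + 1 : nt - N/2                   % instants with full lag support
    R = w2(:) .* (x(n + m).' .* conj(x(n - m)).');
    PWD(:, n) = fftshift(fft(ifftshift(R))); % DFT over the lag index m
end
% Each column of PWD is concentrated along the instantaneous frequency of
% the chirp at the corresponding instant.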
Example 11.7. Signal x (t) = exp( j31πt2 ) is considered within −1 ≤ t ≤ 1. Find the sampling
interval of signal for discrete pseudo Wigner distribution calculation. If the rectangular window of
the width N + 1 = 31 is used in analysis, find the pseudo Wigner distribution values and estimate
the instantaneous frequency at t = 0.5 based on the discrete pseudo Wigner distribution.
⋆ For this signal the instantaneous frequency is Ω_i(t) = 62πt. It is within the range −62π ≤ Ω_i(t) ≤ 62π. Thus, we may approximately assume that the maximum frequency is Ω_m = 62π. The sampling interval for the Fourier transform would be ∆t ≤ 1/62. For the direct pseudo Wigner distribution calculation, it should be twice smaller, ∆t/2 ≤ 1/124. Therefore, the discrete version of the pseudo Wigner distribution, normalized with 2∆t, at t = 0.5, that is, n = 62, is (11.18)
PWD(n, k) = ∑_{m=−15}^{15} e^{j31π((n+m)/124)²} e^{−j31π((n−m)/124)²} e^{−j4πmk/31}
          = ∑_{m=−15}^{15} e^{jπmn/124} e^{−j4πmk/31} = sin(π(n − 16k)/8) / sin(π(n − 16k)/248).
The argument k for which the pseudo Wigner distribution reaches its maximum at n = 62 follows from 62 − 16k ≈ 0 as

k̂ = arg max_k PWD(n, k) = [62/16] = 4,
where [·] stands for the nearest integer. Obviously, the exact instantaneous frequency is not on the discrete frequency grid. The estimated value of the instantaneous frequency at t = 1/2 is Ω̂_i(1/2) = 8πk̂ = 32π. The true value is Ω_i(1/2) = 31π. When the true frequency is not on the grid, the estimation can be improved using the interpolation or displacement bin, as explained in Chapter 1. The frequency sampling interval is ∆Ω = 4π/((N + 1)∆t) = 8π, with the maximum estimation absolute error ∆Ω/2 = 4π.
If we used the standard DFT routine (11.19) with N + 1 = 31 and all available frequency samples, we would get

PWD(n, k) = DFT_31{ e^{j31π((n+m)/124)²} e^{−j31π((n−m)/124)²} }
          = ∑_{m=−15}^{15} e^{j31π((n+m)/124)²} e^{−j31π((n−m)/124)²} e^{−j2πmk/31} = sin(π(n − 8k)/8) / sin(π(n − 8k)/248).
Using an odd number of samples N + 1 in the previous definitions, the symmetry of the product x(n + m) x*(n − m) is preserved in the summation. However, when an even number of samples is used, that is not the case. To illustrate this effect, consider a simple example, for n = 0, with N = 4 samples. Then, the four values of the signal x(m) used in the calculation are x(−2), x(−1), x(0), and x(1).
So, in forming the local autocorrelation function, there are several possibilities. One is to omit the sample x(−2) and to use an odd number of samples in this case as well. Also, it is possible to periodically extend the signal and to form the product based on the extended sequence. Here, we can use four product terms, but with the first one formed as x(−2) x*(−2), that is, as x(−N/2) x*(−N/2). When a lag window with zero ending value is used (for example, a Hann(ing) window), this term does not influence the result. The used lag window must also follow the symmetry, for example, w_e(m) = cos²(πm/N). Then,
PWD(n∆t/2, 2πk/(N∆t)) → PWD(n, k),

PWD(n, k) = ∑_{m=−N/2}^{N/2−1} w_e(m) x(n + m) x*(n − m) e^{−j2πmk/N}
          = ∑_{m=−N/2+1}^{N/2−1} w_e(m) x(n + m) x*(n − m) e^{−j2πmk/N},
since we (− N/2) = 0. However, if the window is nonzero at the ending point m = − N/2, this term
will result in a kind of aliased distribution.
In order to introduce another way of the discrete Wigner distribution calculation, with an even
number of samples, consider again the continuous form of the Wigner distribution of a signal with a
limited duration. Assume that the signal is sampled in such a way that the sampling theorem can be
applied and the equality sign used (Example 2.13). Then, the integral may be replaced by a sum
WD(t, Ω) = ∑_{m=−N}^{N} x(t + m∆t/2) x*(t − m∆t/2) e^{−jmΩ∆t} ∆t
         = ∑_{m=−N/2}^{N/2} x(t + 2m∆t/2) x*(t − 2m∆t/2) e^{−j2mΩ∆t} ∆t
         + ∑_{m=−N/2}^{N/2−1} x(t + (2m+1)∆t/2) x*(t − (2m+1)∆t/2) e^{−j(2m+1)Ω∆t} ∆t.    (11.20)
The initial sum is split into its even and odd terms part. Now, let us assume that the signal is sampled in
such a way that twice wider sampling interval ∆t is also sufficient to obtain the Wigner distribution (by
using every other signal sample). Then, for the first sum (with an odd number of samples) holds,
∑_{m=−N/2}^{N/2} x(t + m∆t) x*(t − m∆t) e^{−j2mΩ∆t} ∆t = (1/2) WD(t, Ω).
The factor 1/2 comes from the sampling interval. Now, from (11.20) follows

∑_{m=−N/2}^{N/2−1} x(t + (2m+1)∆t/2) x*(t − (2m+1)∆t/2) e^{−j(2m+1)Ω∆t} ∆t = (1/2) WD(t, Ω).    (11.21)
This is just the discrete Wigner distribution with an even number of samples. If we denote

x(t + (2m+1)∆t/2) = x(t + m∆t + ∆t/2) = x_e(t + m∆t),
x(n∆t + m∆t + ∆t/2) √(2∆t) = x_e(n + m),

then

x(t − (2m+1)∆t/2) = x(t − m∆t − ∆t/2) = x(t − m∆t + ∆t/2 − ∆t),
x(n∆t − m∆t + ∆t/2 − ∆t) √(2∆t) = x_e(n − m − 1).
They would produce a modulated version of the pseudo Wigner distribution, due to the shift of a half of the sampling interval. However, this shift can be corrected, so that (11.21) becomes

WD(t, Ω) = e^{−jΩ∆t} ∑_{m=−N/2}^{N/2−1} x_e(t + m∆t) x_e*(t − m∆t − ∆t) e^{−j2mΩ∆t} (2∆t)
for any t and Ω (having in mind the sampling theorem). Thus, we may also write

WD(n∆t, πk/(N∆t)) → WD(n, k),

WD(n, k) = e^{−jπk/N} ∑_{m=−N/2}^{N/2−1} x_e(n + m) x_e*(n − m − 1) e^{−j2πmk/N}.    (11.22)
In MATLAB notation, relation (11.22) can be implemented as follows. The signal values are

x_n^+ = [x_e(n − N/2), x_e(n − N/2 + 1), …, x_e(n + N/2 − 1)],
x_n^− = [x_e*(n + N/2 − 1), x_e*(n + N/2 − 2), …, x_e*(n − N/2)].

The vector of the Wigner distribution values, for a given n and k, is

WD(n, k) = e^{−jπk/N} (x_n^+ .* x_n^−) * (e^{−j2πkm/N})^T,

where e^{−j2πkm/N} is the vector with elements e^{−j2πkm/N}, for −N/2 ≤ m ≤ N/2 − 1, * is the matrix multiplication, and .* denotes the element-by-element vector multiplication.
Thus, in the case of an even number of samples, the discrete Wigner distribution of a signal x_e(n), calculated according to (11.22), corresponds to the original signal x(t) related to x_e(n) as

x_e(n) ↔ x(n∆t + ∆t/2) √(2∆t).
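A runnable vectorized version of the above notation may read as follows; it assumes that xe is stored as a MATLAB vector indexed so that all of the used indices are valid (n > N/2).

% Discrete Wigner distribution (11.22) for all k at a given n (a sketch)
m  = (-N/2 : N/2 - 1).';                     % lag index as a column
k  = -N/2 : N/2 - 1;                         % frequency index as a row
r  = xe(n + m) .* conj(xe(n - m - 1));       % the vector x_n^+ .* x_n^-
r  = reshape(r, 1, N);                       % ensure a row vector
WD = exp(-1j*pi*k/N) .* (r * exp(-2j*pi*m*k/N));   % all N values of (11.22)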
To check this statement, consider the time marginal property of this distribution. It is

(1/N) ∑_{k=−N/2}^{N/2−1} WD(n, k) = ∑_{m=−N/2}^{N/2−1} x_e(n + m) x_e*(n − m − 1) ( (1/N) ∑_{k=−N/2}^{N/2−1} e^{−j(2m+1)πk/N} )

= ∑_{m=−N/2}^{N/2−1} x_e(n + m) x_e*(n − m − 1) ( e^{j(2m+1)π/2} (1/N) (1 − e^{−j(2m+1)π}) / (1 − e^{−j(2m+1)π/N}) )

= ∑_{m=−N/2}^{N/2−1} x_e(n + m) x_e*(n − m − 1) δ(2m + 1) = |x_e(n − 1/2)|² = |x(n∆t)|² (2∆t),
With the notation Y(k) = DFT_N{y(n)} for the N-point DFT, the pseudo Wigner distribution (11.22), without frequency oversampling, in the case of an even N, can be calculated as
WD(n∆t, 2πk/(N∆t)) → WD(n, k),

WD(n, k) = e^{−jπk/(N/2)} ∑_{m=−N/4}^{N/4−1} ( R(n, m) + R(n, m + N/2) ) e^{−j2πmk/(N/2)},
where R(n, m) = x_e(n + m) x_e*(n − m − 1). Periodicity in m, for a given n, with period N is assumed in R(n, m), that is, R(n, m + N) = R(n, m) = R(n, m − N). It is needed in order to calculate R(n, m + N/2) for −N/4 ≤ m ≤ N/4 − 1 using R(n, m) for −N/2 ≤ m ≤ N/2 − 1 only.
In the case of real-valued signals, in order to avoid the need for oversampling, as well as to
eliminate cross-terms (that will be discussed later) between positive and negative frequency components,
their analytic part is used in calculations.
Consider the following combination of the STFT values,

SM(t, Ω) = (1/π) ∫_{−L_P}^{L_P} P(θ) STFT(t, Ω + θ) STFT*(t, Ω − θ) dθ,    (11.25)

where P(θ) is a finite frequency domain window (we also assume a rectangular form), P(θ) = 0 for |θ| > L_P. The distribution obtained in this way is referred to as the S-method. Two special cases are: the spectrogram, for P(θ) = πδ(θ), and the pseudo Wigner distribution, for P(θ) = 1.
The S-method can produce a representation of a multi-component signal such that the distribution
of each component is its Wigner distribution, avoiding cross-terms, if the STFTs of the components do
not overlap in time-frequency plane.
Consider a signal

x(t) = ∑_{m=1}^{M} x_m(t),
where x_m(t) are monocomponent signals. Assume that the STFT of each component lies inside the region D_m(t, Ω), m = 1, 2, …, M, and assume that the regions D_m(t, Ω) do not overlap. Denote the length of the m-th region along Ω, for a given t, by 2B_m(t), and its central frequency by Ω_0m(t). Under these assumptions, the S-method of x(t) produces the sum of the pseudo Wigner distributions of each signal component,
SM_x(t, Ω) = ∑_{m=1}^{M} PWD_{x_m}(t, Ω),    (11.26)
if the width of the rectangular window P(θ), for a point (t, Ω), is defined by

L_P(t, Ω) = B_m(t) − |Ω − Ω_0m(t)| for (t, Ω) ∈ D_m(t, Ω), and L_P(t, Ω) = 0 elsewhere.
To prove this, consider a point (t, Ω) inside a region D_m(t, Ω). The integration interval in (11.25), for the m-th signal component, is symmetric with respect to θ = 0. It is defined by the smallest absolute value of θ for which Ω + θ or Ω − θ falls outside D_m(t, Ω).
For Ω > Ω0m (t) and positive θ, the integration limit is reached for θ = Bm (t) − (Ω − Ω0m (t)). For
Ω < Ω0m (t) and positive θ, the limit is reached for θ = Bm (t) + (Ω − Ω0m (t)). Thus, having in
mind the interval symmetry, an integration limit which produces the same value of integral (11.25) as
the value of (11.23), over the region Dm (t, Ω), is given by L P (t, Ω). Therefore, for (t, Ω) ∈ Dm (t, Ω)
we have SMx (t, Ω) = PWDxm (t, Ω). Since regions Dm (t, Ω) do not overlap we have
SM_x(t, Ω) = ∑_{m=1}^{M} PWD_{x_m}(t, Ω).
A rectangular window P(θ) of a constant width also produces SM_x(t, Ω) = ∑_{m=1}^{M} PWD_{x_m}(t, Ω) if the regions D_m(t, Ω), m = 1, 2, …, M, are at least 2L_P apart along the frequency axis, |Ω_0p(t) − Ω_0q(t)| > B_p(t) + B_q(t) + 2L_P, for each p, q, and t. This is the S-method with a constant window width. The best choice of L_P is the value for which P(θ) is wide enough to enable complete integration over the auto-terms, but narrower than the distance between the auto-terms, in order to avoid the cross-terms. If two components overlap at some time instants t, then the cross-term will appear, but only between these two components and only at those time instants.
A discrete form of the S-method (11.25) reads

SM_L(n, k) = ∑_{i=−L}^{L} S_N(n, k + i) S_N*(n, k − i)

for P(i) = 1, −L ≤ i ≤ L (a weighted form, P(i) = 1/(2L + 1), could be used). A recursive relation for the S-method calculation is

SM_L(n, k) = SM_{L−1}(n, k) + 2 Re[S_N(n, k + L) S_N*(n, k − L)].    (11.27)

The spectrogram is the initial distribution, SM_0(n, k) = |S_N(n, k)|², and 2 Re[S_N(n, k + i) S_N*(n, k − i)], i = 1, 2, …, L, are the correction terms. Changing the parameter L, we can start from the spectrogram (L = 0) and gradually make the transition toward the pseudo Wigner distribution by increasing L.
For the S-method realization we have to implement the STFT first, based either on the FFT
routines or recursive approaches suitable for hardware realizations. After we get the STFT we have to
“correct” the obtained values, according to (11.27), by adding a few “correction” terms to the spectrogram
values. Note that S-method is one of the rare quadratic time-frequency distributions allowing easy
hardware realization, based on the hardware realization of the STFT, presented in the first part, and its
“correction” according to (11.27). There is no need for analytic signal since the cross-terms between
negative and positive frequency components are removed in the same way as are the other cross-terms.
If we take that STFT (n, k) = 0 outside the basic period, that is, when k < − N/2 or k > N/2 − 1,
then there is no aliasing when the STFT is alias-free (in this way we can calculate the alias-free Wigner
distribution by taking L = N/2 in (11.27)). The calculation in (11.27) can be performed for the whole
matrix of the S-method and the STFT. This can significantly save time in some matrix based calculation
tools.
There are two ways to implement summation in the S-method. The first one is with a constant
L. Theoretically, in order to get the Wigner distribution for each individual component, the number
of correcting terms L should be such that 2L is equal to the width of the widest auto-term. This will
guarantee cross-terms free distribution for all components which are at least 2L frequency samples
apart.
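A compact MATLAB sketch of the constant-L S-method, computed directly from an STFT matrix, follows; the matrix layout (frequency index along the rows, with values taken as zero outside the basic period, as discussed above) is an assumption of this sketch.

% S-method with a constant number of correction terms L (a sketch)
% STFT is an (Nf x Nt) complex matrix, frequency index k along the rows
L  = 3;
SM = abs(STFT).^2;                           % L = 0: the spectrogram
[Nf, Nt] = size(STFT);
for i = 1:L
    Sp = [STFT(1+i:Nf, :); zeros(i, Nt)];    % S_N(n, k+i), zero outside
    Sm = [zeros(i, Nt); STFT(1:Nf-i, :)];    % S_N(n, k-i), zero outside
    SM = SM + 2*real(Sp .* conj(Sm));        % correction terms of (11.27)
end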
The second way to implement the S-method is with a time-frequency dependent L = L(n, k). The
summation, for each point (n, k), is performed as long as the absolute values of S N (n, k + i ) and
S∗N (n, k − i ) for that (n, k ) are above an assumed reference level (established, for example, as a few
percents of the STFT maximum value). Here, we start with the spectrogram, L = 0. Consider the
correction term S N (n, k + i )S∗N (n, k − i ) with i = 1. If the STFT values are above the reference level
then it is included in the summation. The next term, with i = 2, is considered in the same way, and so on. The summation is stopped when an STFT value in a correction term falls below the reference level. This procedure will guarantee a cross-terms-free distribution for components that do not overlap in the STFT.
Example 11.8. A signal consisting of three linear frequency-modulated components, with the parameters

(a_1, a_2, a_3) = (−21, −1, 20)

and

(b_1, b_2, b_3) = (2, −0.75, −2.8),

is considered at the instant n = 0. The instantaneous frequencies of the signal components are k_i = a_i, while the normalized squared amplitudes of the components are indicated by dotted lines in Fig. 11.8. An ideal time-frequency representation of this signal, at n = 0, would be fully concentrated at these three frequencies.
The starting STFT, with the corresponding spectrogram, obtained using the cosine window of the width N = 64, is shown in Fig.11.8(a),(b). The first correction term is presented in Fig.11.8(c). The result of summing the spectrogram with the first correction term is the S-method with L = 1, Fig.11.8(d). The second correction term (Fig.11.8(e)), when added to SM₁(0,k), produces the S-method with L = 2, Fig.11.8(f). The S-methods for L = 3, 5, 6, 8, and 9, ending with the Wigner distribution (L = 31), are presented in Fig.11.8(g)-(l). Just a few correction terms are sufficient in this case to achieve a high concentration. The cross-terms start appearing at L = 8 and increase as L increases toward the Wigner distribution. They make the Wigner distribution almost useless, since they cover a great part of the frequency range, including some signal components (Fig.11.8(l)).
The optimal number of correction terms L is the one that produces the best S-method
concentration (sparsity), using the ℓ1/2 -norm of the spectrogram and the S-method (corresponding
to the ℓ₁-norm of the STFT). In this case the best concentrated S-method is detected for L = 5. Considering the parameter L as a frame index, we can make a video of the transition from the spectrogram to the Wigner distribution.
Example 11.9. The adaptive S-method realization will be illustrated on a five-component signal x(t), defined for 0 ≤ t < 1 and sampled with ∆t = 1/256. The Hamming window of the width T_w = 1/2 (128 samples) is used for the STFT calculation. The spectrogram is presented in Fig.11.9(a), while the S-method with the constant L_d = 3 is shown in Fig.11.9(b). The concentration improvement with respect to the case L_d = 0, Fig.11.9(a), is evident. Further increasing L_d would improve the concentration, but the cross-terms would also appear. Small changes are noticeable between the components with constant instantaneous frequency and between the quadratic and constant instantaneous frequency components. An improved concentration, without cross-terms, can be achieved using the variable window width L_d(n,k). The regions D_i(n,k), determining the summation limit L_d(n,k) for each point (n,k), are obtained by imposing a reference level R_n corresponding to 0.14% of the spectrogram maximum at that time instant n. They are defined as:
$$D_i(n,k)=\begin{cases}1, & |STFT_{x_i}(n,k)|^2\ge R_n\\ 0, & \text{elsewhere}\end{cases}$$
and presented in Fig.11.9(c). White regions mean that the value of the spectrogram is below 0.14% of its maximum value at that time instant n, meaning that the concentration improvement is not performed at these points. The signal-dependent S-method is given in Fig.11.9(d). The sensitivity of the method with respect to the reference level is low.
Figure 11.8 Analysis of a signal consisting of three LFM components (at the instant n = 0). (a) The STFT with a cosine window of the width N = 64. (b) The spectrogram. (c) The first correction term. (d) The S-method (SM) with one correction term. (e) The second correction term. (f) The S-method with two correction terms. (g) The S-method with three correction terms. (h) The S-method with five correction terms. (i) The S-method with six correction terms. (j) The S-method with eight correction terms. (k) The S-method with nine correction terms. (l) The Wigner distribution (the S-method with L = 31 correction terms).
In order to provide additional insight into the field of joint time-frequency analysis, as well as
to improve concentration of time-frequency representation, energy distributions of signals were
introduced. We have already mentioned the spectrogram which belongs to this class of representations
and is a straightforward extension of the STFT. Here, we will discuss other distributions and their
generalizations.
Figure 11.9 Time-frequency analysis of a multi-component signal: (a) the spectrogram, (b) the S-method with a constant window width, L_d = 3, (c) regions of support for the S-method with a variable window width calculation, corresponding to Q₂ = 725, (d) the S-method with the variable window width calculated using the regions in (c).
The basic condition for the definition of time-frequency energy distributions is that a two-dimensional function of time and frequency, P(t,Ω), represents the energy density of a signal in the
time-frequency plane. Thus, the signal energy associated with the small time and frequency intervals ∆t and ∆Ω, respectively, would be P(t,Ω)∆t∆Ω/(2π). However, a point-by-point definition of time-frequency energy densities in the time-frequency plane is not possible, since the uncertainty principle prevents us from defining the concept of energy at a specific instant and frequency. This is the reason why some more general conditions are considered to derive time-frequency distributions of a signal. Namely, one requires that the integral of P(t,Ω) over Ω, for a particular instant of time, should be equal to the instantaneous power of the signal, |x(t)|², while the integral over time, for a particular frequency, should be equal to the spectral energy density |X(Ω)|². These conditions are known as marginal conditions or marginal properties of time-frequency distributions.
Therefore, it is desirable that an energetic time-frequency distribution of a signal x (t) satisfies:
– Energy property
$$\frac{1}{2\pi}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}P(t,\Omega)\,d\Omega\,dt=E_x, \qquad (11.28)$$
– Time marginal property
$$\frac{1}{2\pi}\int_{-\infty}^{\infty}P(t,\Omega)\,d\Omega=|x(t)|^2, \qquad (11.29)$$
– Frequency marginal property
$$\int_{-\infty}^{\infty}P(t,\Omega)\,dt=|X(\Omega)|^2. \qquad (11.30)$$
Figure 11.10 The marginal properties: integration of P(t,Ω) over Ω produces the instantaneous power |x(t)|², while integration over t produces the spectral energy density |X(Ω)|².
Time and frequency marginal properties (11.29) and (11.30) may be considered as the projections of the distribution P(t,Ω) along the time and frequency axes, that is, as the Radon transform of P(t,Ω) along these two directions. It is known that the Fourier transform of the projection of a two-dimensional function on a given line is equal to the value of the two-dimensional Fourier transform of P(t,Ω), denoted by AF(θ,τ), along the same direction (inverse Radon transform property). Therefore, if P(t,Ω) satisfies the marginal properties, then any other function whose two-dimensional Fourier transform is equal to AF(θ,τ) along the axis lines θ = 0 and τ = 0, with arbitrary values elsewhere, will satisfy the marginal properties as well, Fig. 11.11. Assuming that the Wigner distribution is a basic distribution which satisfies the marginal properties (any other distribution satisfying the marginal properties can be used as the basic one), any other distribution whose two-dimensional Fourier transform is of the form c(θ,τ)AF(θ,τ), with c(θ,0) = c(0,τ) = 1, will also satisfy them.
Figure 11.11 Marginal properties and their relation to the ambiguity function.
Various distributions can be obtained by altering the kernel function c(θ, τ ). For example, c(θ, τ ) = 1
produces the Wigner distribution, while for c(θ, τ ) = e jθτ/2 the Rihaczek distribution follows.
The Cohen class of distributions, defined in the ambiguity domain as
$$CD(t,\Omega)=\frac{1}{2\pi}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}c(\theta,\tau)\,AF(\theta,\tau)\,e^{j\theta t-j\Omega\tau}\,d\tau\,d\theta, \qquad (11.34)$$
can be written in other domains as well. The time-lag domain form is obtained from (11.32), after integration over θ, as
$$CD(t,\Omega)=\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}c_T(t-u,\tau)\,x(u+\tau/2)\,x^{*}(u-\tau/2)\,e^{-j\Omega\tau}\,d\tau\,du. \qquad (11.35)$$
The frequency-Doppler frequency domain form follows from (11.33), after integration over τ, as
$$CD(t,\Omega)=\frac{1}{(2\pi)^2}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}C_{\Omega}(\theta,\Omega-u)\,X(u+\theta/2)\,X^{*}(u-\theta/2)\,e^{j\theta t}\,d\theta\,du. \qquad (11.36)$$
Finally, the time-frequency domain form is obtained as a two-dimensional convolution of the two-dimensional Fourier transforms, from (11.34), as
$$CD(t,\Omega)=\frac{1}{2\pi}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\Pi(t-u,\Omega-\xi)\,WD(u,\xi)\,du\,d\xi. \qquad (11.37)$$
Kernel functions in the respective time-lag, Doppler frequency-frequency and time-frequency domains
are related to the ambiguity domain kernel c(θ, τ ) as:
$$c_T(t,\tau)=\frac{1}{2\pi}\int_{-\infty}^{\infty}c(\theta,\tau)\,e^{j\theta t}\,d\theta \qquad (11.38)$$
$$C_{\Omega}(\theta,\Omega)=\int_{-\infty}^{\infty}c(\theta,\tau)\,e^{-j\Omega\tau}\,d\tau \qquad (11.39)$$
$$\Pi(t,\Omega)=\frac{1}{2\pi}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}c(\theta,\tau)\,e^{j\theta t-j\Omega\tau}\,d\tau\,d\theta. \qquad (11.40)$$
According to (11.37), all distributions from the Cohen class may be considered as 2D filtered versions of the Wigner distribution. Although any distribution could be taken as a basis for the Cohen class derivation, the form with the Wigner distribution is used because it is the best concentrated distribution from the Cohen class with signal-independent kernels.
The analysis performed on the ambiguity function and the Cohen class of time-frequency distributions leads to the conclusion that the cross-terms may be suppressed or eliminated if the kernel c(θ,τ) is a two-dimensional low-pass type function. In order to preserve the marginal properties, the values of c(θ,τ) along the axes should be c(θ,0) = 1 and c(0,τ) = 1.
Choi and Williams exploited one of the possibilities, defining the distribution with the kernel of the form
$$c(\theta,\tau)=e^{-\theta^2\tau^2/\sigma^2}.$$
The parameter σ controls the slope of the kernel function, which affects the influence of the cross-terms. A small σ causes the elimination of the cross-terms, but it should not be too small because, for the finite width of the auto-terms around the θ and τ coordinate axes, the kernel would distort them as well. Thus, there should be a trade-off in the selection of σ.
Here we will mention some other interesting kernel functions, producing corresponding
distributions, Fig. 11.12:
Born-Jordan distribution
$$c(\theta,\tau)=\frac{\sin(\theta\tau/2)}{\theta\tau/2},$$
Zhao-Atlas-Marks distribution
$$c(\theta,\tau)=w(\tau)\,|\tau|\,\frac{\sin(\theta\tau/2)}{\theta\tau/2},$$
Sinc distribution
$$c(\theta,\tau)=\mathrm{rect}\!\left(\frac{\theta\tau}{\alpha}\right)=\begin{cases}1, & |\theta\tau/\alpha|<1/2\\ 0, & \text{otherwise,}\end{cases}$$
Butterworth distribution
$$c(\theta,\tau)=\frac{1}{1+\left(\dfrac{\theta\tau}{\theta_c\tau_c}\right)^{2N}},$$
where w(τ ) is a function corresponding to a lag window and α, N, θc and τc are constants in the above
kernel definitions.
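For illustration, the listed kernels can be evaluated on a (θ,τ) grid; the values of σ, α, N, and θ_cτ_c below are arbitrary example choices, not values prescribed by the text:

```python
import numpy as np

theta = np.linspace(-3, 3, 256)
tau = np.linspace(-100, 100, 256)
TH, TA = np.meshgrid(theta, tau)

cw = np.exp(-TH ** 2 * TA ** 2 / 10.0 ** 2)            # Choi-Williams, sigma = 10
bj = np.sinc(TH * TA / (2 * np.pi))                    # Born-Jordan: sin(x)/x, x = theta*tau/2
zam = np.abs(TA) * bj                                  # Zhao-Atlas-Marks with w(tau) = 1
snc = (np.abs(TH * TA / 50.0) < 0.5).astype(float)     # Sinc (rect) kernel, alpha = 50
bw = 1.0 / (1.0 + (TH * TA / 50.0) ** (2 * 2))         # Butterworth, N = 2, theta_c*tau_c = 50
```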
Figure 11.12 Kernel functions for: Choi-Williams distribution, Born-Jordan distribution, Sinc distribution and
Zhao-Atlas-Marks distribution.
The spectrogram belongs to this class of distributions. Its kernel in the (θ,τ) domain is the ambiguity function of the window,
$$c(\theta,\tau)=\int_{-\infty}^{\infty}w\!\left(t-\frac{\tau}{2}\right)w\!\left(t+\frac{\tau}{2}\right)e^{-j\theta t}\,dt=AF_w(\theta,\tau).$$
Since the Cohen class is linear with respect to the kernel, it is easy to conclude that a distribution from the Cohen class is positive if its kernel can be written as
$$c(\theta,\tau)=\sum_{i=1}^{M}a_i\,AF_{w_i}(\theta,\tau),$$
where ai ≥ 0, i = 1, 2, . . . , M.
There are several ways to calculate the reduced interference distributions from the Cohen class. The first method is based on the ambiguity function (11.34):
1. Calculation of the ambiguity function,
2. Multiplication with the kernel,
3. Calculation of the inverse two-dimensional Fourier transform of this product.
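A compact sketch of these three steps, for the Choi-Williams kernel and an oversampled (analytic) signal, is given below (the helper name and the value of sigma are assumptions for illustration):

```python
import numpy as np

def choi_williams(x, sigma=10.0):
    """Steps 1-3 sketch: ambiguity function, Choi-Williams kernel, inverse 2D DFT.
    x is assumed to be an oversampled analytic signal (to avoid aliasing)."""
    N = len(x)
    n = np.arange(N)
    m = np.arange(-N // 2, N // 2)
    # local autocorrelation x(n+m)x*(n-m) with circular indexing, as in (11.43)
    R = x[(n[:, None] + m[None, :]) % N] * np.conj(x[(n[:, None] - m[None, :]) % N])
    AF = np.fft.fftshift(np.fft.fft(R, axis=0), axes=0)        # step 1: time -> Doppler
    theta = 2 * np.pi * np.fft.fftshift(np.fft.fftfreq(N))
    kernel = np.exp(-(np.outer(theta, m)) ** 2 / sigma ** 2)   # step 2: c(theta, m)
    CD = np.fft.ifft(np.fft.ifftshift(AF * kernel, axes=0), axis=0)  # step 3: back to time
    CD = np.fft.fft(np.fft.ifftshift(CD, axes=1), axis=1)            # and lag -> frequency
    return np.real(CD)
```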
The reduced interference distribution can also be calculated by using (11.35) or (11.37), with the appropriate kernel transformations defined by (11.38) and (11.40). All these methods assume signal oversampling in order to avoid aliasing effects. Figure 11.13 shows the ambiguity function along with the Choi-Williams kernel. Figure 11.14(a) presents the Choi-Williams distribution calculated according to the presented procedure. In order to reduce the high side lobes of the rectangular window, the Choi-Williams distribution is also calculated with the Hann(ing) window in the kernel definition, c(θ,τ)w(τ), and shown in Fig. 11.14(b). The pseudo Wigner distribution with the Hann(ing) window is given in Fig. 11.6.
Figure 11.13 Ambiguity function for the signal from Fig.10.4, with the Choi-Williams kernel.
For discrete-time signals, there are several ways to calculate the reduced interference distributions from the Cohen class, based on (11.34), (11.35), (11.36), or (11.37).
The kernel functions are usually defined in the Doppler-lag domain (θ, τ ). Thus, here we should
use (11.34) with the ambiguity function of a discrete-time signal
Figure 11.14 Choi-Williams distribution: (a) direct calculation, (b) calculation with the kernel multiplied by a
Hann(ing) lag window.
$$AF(\theta,m\Delta t)=\sum_{p=-\infty}^{\infty}x\!\left(p\Delta t+m\frac{\Delta t}{2}\right)x^{*}\!\left(p\Delta t-m\frac{\Delta t}{2}\right)e^{-jp\theta\Delta t}\,\Delta t.$$
The signal should be sampled as in the Wigner distribution case. For a given lag instant m, the ambiguity function can be calculated using the standard DFT routines. Another way to calculate the ambiguity function is to take the inverse two-dimensional transform of the Wigner distribution. Note that the corresponding transformation pairs are time ↔ Doppler and lag ↔ frequency, that is, t ↔ θ and τ ↔ Ω. The relation between the discretization values in the Fourier transform pairs (considered interval, sampling interval in time ∆t, number of samples N, sampling interval in frequency ∆Ω = 2π/(N∆t)) is discussed in Chapter 1.
The generalized ambiguity function is obtained as
$$AF_g(\theta,m\Delta t)=c(\theta,m\Delta t)\,AF(\theta,m\Delta t), \qquad (11.41)$$
while a distribution with the kernel c(θ,τ) is its two-dimensional inverse Fourier transform, in the form
$$CD(n\Delta t,k\Delta\Omega)=\frac{1}{2\pi}\sum_{l=-\infty}^{\infty}\sum_{m=-\infty}^{\infty}AF_g(l\Delta\theta,m\Delta t)\,e^{-jkm\Delta t\Delta\Omega}\,e^{jnl\Delta\theta\Delta t}\,\Delta t\,\Delta\theta.$$
In this notation we can calculate CD(n,k) = IDFT2D_{l,m}{AF_g(l,m)}, where the values of AF_g(l,m) are calculated according to (11.41).
In the time-lag domain, the discrete-time form reads
$$CD(n\Delta t,k\Delta\Omega)=\sum_{p=-\infty}^{\infty}\sum_{m=-\infty}^{\infty}c_T(n\Delta t-p\Delta t,m\Delta t)\,x\!\left(p\Delta t+m\frac{\Delta t}{2}\right)x^{*}\!\left(p\Delta t-m\frac{\Delta t}{2}\right)e^{-jkm\Delta t\Delta\Omega}\,(\Delta t)^2 \qquad (11.42)$$
with
$$c_T(n\Delta t-p\Delta t,m\Delta t)=\frac{1}{2\pi}\sum_{l=-\infty}^{\infty}c(l\Delta\theta,m\Delta t)\,e^{jnl\Delta\theta\Delta t}\,e^{-jlp\Delta\theta\Delta t}\,\Delta\theta.$$
For discrete-time signals, it is common to write and use the Cohen class of distributions in the form
$$CD(n,\omega)=\sum_{p=-\infty}^{\infty}\sum_{m=-\infty}^{\infty}c_T(n-p,m)\,x(p+m)\,x^{*}(p-m)\,e^{-j2m\omega}, \qquad (11.43)$$
where
$$x(p+m)\,x^{*}(p-m)=x\!\left((p+m)\frac{\Delta t}{2}\right)x^{*}\!\left((p-m)\frac{\Delta t}{2}\right)\Delta t$$
$$c_T(n-p,m)=c_T\!\left((n-p)\frac{\Delta t}{2},m\Delta t\right)$$
$$CD(n,\omega)\rightarrow CD\!\left(n\frac{\Delta t}{2},\Omega\Delta t\right).$$
Here we should mention that the presented kernel functions are of infinite duration along the coordinate axes in (θ,τ); thus, they should be limited in calculations. Their transforms exist in a generalized sense only.
Distributions from the Cohen class can be calculated using a decomposition of the kernel function in the time-lag domain. Starting from
$$CD(t,\Omega)=\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}c_T(t-u,\tau)\,x(u+\tau/2)\,x^{*}(u-\tau/2)\,e^{-j\Omega\tau}\,d\tau\,du,$$
and using the eigenvalue decomposition of the matrix C formed of the kernel samples in the time-lag domain, it is easy to conclude that the Cohen class of distributions can be written as a sum of spectrograms,
$$CD(t,\Omega)=\sum_{i}\lambda_i\,\left|STFT_{q_i}(t,\Omega)\right|^2,$$
where λ_i represents the eigenvalues, while q_i are the corresponding eigenvectors of C, that is, the columns of Q, used as windows in the STFT calculations.
Example 11.10. A four-component real-valued signal with M = 384 samples is considered. Its STFT is calculated with a Hann(ing) window of the width N = 128, with a step of 4 samples. The spectrogram (L = 0) is shown in Fig.11.15(a). The alias-free Wigner distribution (L = N/2) is presented in Fig. 11.15(b). The Choi-Williams distribution of the analytic signal is shown in Fig. 11.15(c). Its cross-terms are smoothed by the kernel, which also spreads the auto-terms of the LFM signal and the chirps. The S-method with L = 10 is shown in Fig. 11.15(d). For graphical presentation, the distributions are interpolated by a factor of 2. In all cases the pure sinusoidal signal is well concentrated. In the Wigner distribution and the SM the same concentration is achieved for the LFM signal.
Figure 11.15 Time-frequency representation of a four-component signal: (a) the spectrogram, (b) the Wigner distribution, (c) the Choi-Williams distribution, and (d) the S-method.
Chapter 12
Wavelet Transform
The first form of functions having the basic property of wavelets was used by Haar at the beginning of the twentieth century. At the beginning of the 1980s, Morlet introduced a form of basis functions for the analysis of seismic signals, naming them "wavelets". The theory of wavelets was linked to image processing by Mallat in the following years. In the late 1980s, Daubechies presented a whole new class of wavelets that can be implemented in a simple way, using digital filtering ideas. The most important applications of wavelets are found in image processing and compression, pattern recognition, and signal denoising. Here, we will only link the basics of the wavelet transform to time-frequency analysis.
The common STFT is characterized by a constant window and constant time and frequency resolutions at both low and high frequencies. The basic idea behind the wavelet transform, as it was originally introduced by Morlet, was to vary the resolution with scale (being related to frequency) in such a way that a high frequency resolution is obtained for signal components at low frequencies, whereas a high time resolution is obtained for signal components at high frequencies. This kind of resolution change could be relevant for some practical applications, like, for example, seismic signals. It is achieved by introducing a frequency-variable window width. The window width is decreased as frequency increases.
The basis functions in the STFT are analyzed starting from
$$STFT_{II}(t,\Omega_0)=\int_{-\infty}^{\infty}x(\tau)\,w^{*}(\tau-t)\,e^{-j\Omega_0\tau}\,d\tau=\left\langle x(\tau),\,w(\tau-t)e^{j\Omega_0\tau}\right\rangle=\left\langle x(\tau),\,h(\tau-t)\right\rangle=\int_{-\infty}^{\infty}x(\tau)\,h^{*}(\tau-t)\,d\tau,$$
where h(τ−t) = w(τ−t)e^{jΩ₀τ} is a band-pass signal, obtained when a real-valued window w(τ−t) is modulated by e^{jΩ₀τ}. Notice that h(τ−t) = w(τ−t)e^{jΩ₀(τ−t)} is also used. This form follows from
$$STFT(t,\Omega_0)=\int_{-\infty}^{\infty}x(t+\tau)\,w^{*}(\tau)\,e^{-j\Omega_0\tau}\,d\tau=\int_{-\infty}^{\infty}x(\tau)\,w^{*}(\tau-t)\,e^{-j\Omega_0(\tau-t)}\,d\tau.$$
When the above idea about the wavelet transform is translated into the mathematical form and
related to the STFT, one gets the definition of a continuous wavelet transform
$$WT(t,a)=\frac{1}{\sqrt{|a|}}\int_{-\infty}^{\infty}x(\tau)\,h^{*}\!\left(\frac{\tau-t}{a}\right)d\tau, \qquad (12.1)$$
where h(t) is a band-pass signal and the parameter a is the scale. This transform produces a time-scale, rather than a time-frequency, signal representation. For the Morlet wavelet the relation between the scale and the frequency is a = Ω₀/Ω. In order to establish a strong formal relationship between the wavelet transform and the STFT, we will choose the basic Morlet wavelet h(t) in the form
$$h(t)=w(t)\,e^{j\Omega_0 t}, \qquad (12.2)$$
where w(t) is a window function and Ω₀ is a constant frequency. For the Morlet wavelet we have a Gaussian function,
$$w(\tau)=\sqrt{\frac{1}{2\pi}}\,e^{-\alpha\tau^2},$$
where the values of α and Ω₀ are chosen such that the ratio of h(0) = 1/√(2π) to the first maximum (of the real part of the Morlet wavelet, w(τ−t)cos(Ω₀τ), which is also used in the analysis) at τ = 2π/Ω₀ is equal to 1/2 = exp(−4απ²/Ω₀²), that is, Ω₀ = 2π√(α/ln 2). Substitution of (12.2) into (12.1) leads to a continuous wavelet transform form suitable for a direct comparison with the STFT,
$$WT(t,a)=\frac{1}{\sqrt{|a|}}\int_{-\infty}^{\infty}x(\tau)\,w^{*}\!\left(\frac{\tau-t}{a}\right)e^{-j\Omega_0(\tau-t)/a}\,d\tau=\frac{1}{\sqrt{|a|}}\int_{-\infty}^{\infty}x(t+\tau)\,w^{*}\!\left(\frac{\tau}{a}\right)e^{-j\Omega_0\tau/a}\,d\tau. \qquad (12.3)$$
From the definition of w(τ/a) it is obvious that small Ω (that is, large a) corresponds to a
wide wavelet, that is, a wide window, and vice versa. The basic idea of the wavelet transform and its
comparison with the STFT is illustrated in Fig. 12.1.
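A direct sketch of (12.3) in Python, with α treated as a free parameter and the scaled Morlet wavelet evaluated on the signal grid (an illustration under these assumptions, not a prescribed implementation):

```python
import numpy as np

def morlet_cwt(x, scales, dt=1.0, alpha=1.0):
    """Morlet continuous wavelet transform sketch: WT(t, a) per (12.3),
    computed as a convolution with the conjugated, time-reversed wavelet."""
    Omega0 = 2 * np.pi * np.sqrt(alpha / np.log(2))    # ratio-of-maxima choice
    t = (np.arange(len(x)) - len(x) // 2) * dt
    WT = np.zeros((len(scales), len(x)), dtype=complex)
    for i, a in enumerate(scales):
        h = np.sqrt(1 / (2 * np.pi)) * np.exp(-alpha * (t / a) ** 2) \
            * np.exp(1j * Omega0 * t / a)              # h(t/a) = w(t/a)exp(j*Omega0*t/a)
        WT[i] = np.convolve(x, np.conj(h[::-1]), mode='same') * dt / np.sqrt(abs(a))
    return WT
```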
From the filter theory point of view, the wavelet transform, for a given scale a, could be considered as the output of a system with impulse response h*(−t/a)/√|a|, that is,
$$WT(t,a)=x(t)*_t \frac{1}{\sqrt{|a|}}\,h^{*}(-t/a),$$
where *_t denotes a convolution in time. Similarly, the STFT, for a given Ω, may be considered as STFT_II(t,Ω) = x(t) *_t [w*(−t)e^{jΩt}]. If we consider these two bandpass filters from the bandwidth point of view, we can see that, in the case of the STFT, the filtering is done by a system whose impulse response w*(−t)e^{jΩt} has a constant bandwidth, equal to the width of the Fourier transform of w(t).
Constant Q-Factor Transform: The quality factor Q of a band-pass filter, as a measure of the filter selectivity, is defined as
$$Q=\frac{\text{Central Frequency}}{\text{Bandwidth}}.$$
In the STFT the bandwidth is constant, equal to the window Fourier transform width B_w. Thus, the factor Q is proportional to the considered frequency,
$$Q=\frac{\Omega}{B_w}.$$
In the case of the wavelet transform, the bandwidth of the impulse response is the width of the Fourier transform of w(t/a). It is equal to B₀/a, where B₀ is the constant bandwidth corresponding to the basic wavelet (a = 1). Since the central frequency at scale a is Ω₀/a, the factor Q = (Ω₀/a)/(B₀/a) = Ω₀/B₀ is the same for all scales; the wavelet transform is a constant-Q transform.
Figure 12.1 Expansion functions for the wavelet transform (left) and the short-time Fourier transform (right). Top
row presents high scale (low frequency), middle row is for medium scale (medium frequency) and bottom row is for
low scale (high frequency).
Figure 12.2 Illustration of the wavelet transform (a) of a sum of two delta pulses and two sinusoids, compared to the STFT (b).
In analogy with the spectrogram, the scalogram is defined as the squared magnitude of the wavelet transform,
$$SCAL(t,a)=\left|WT(t,a)\right|^2. \qquad (12.6)$$
The scalogram obviously loses the linearity property and fits into the category of quadratic transforms.
12.1.1 S-Transform
The S-transform (the Stockwell transform) is conceptually a combination of the STFT analysis and the wavelet analysis. It employs a window, as in the STFT, but with a frequency-variable length, as in the wavelet transform. The frequency-dependent window produces a higher frequency resolution at lower frequencies, while at higher frequencies a sharper time localization can be achieved, the same as in the continuous wavelet case.
For a signal x(t) the S-transform is defined by
$$S_c(t,\Omega)=\frac{|\Omega|}{(2\pi)^{3/2}}\int_{-\infty}^{+\infty}x(\tau)\,e^{-\frac{(\tau-t)^2\Omega^2}{8\pi^2}}\,e^{-j\Omega\tau}\,d\tau. \qquad (12.7)$$
With the frequency-dependent window
$$w(\tau,\Omega)=\frac{|\Omega|}{(2\pi)^{3/2}}\,e^{-\frac{\tau^2\Omega^2}{8\pi^2}}, \qquad (12.9)$$
the definition of the continuous S-transform can be rewritten as follows:
$$S_c(t,\Omega)=e^{-j\Omega t}\int_{-\infty}^{+\infty}x(t+\tau)\,w(\tau,\Omega)\,e^{-j\Omega\tau}\,d\tau. \qquad (12.10)$$
A discretization over τ of (12.10) results in the discrete form of the S-transform,
$$S(n,k)=e^{-j2\pi nk/N}\sum_{m=-N/2}^{N/2-1}x(n+m)\,w(m,k)\,e^{-j2\pi mk/N},$$
where w(m,k) are the samples of the frequency-dependent window (12.9).
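A direct (and deliberately unoptimized) Python sketch of this discretization is given below; FFT-based frequency-domain implementations are commonly used in practice, and the Ω = 0 (DC) sample is omitted here:

```python
import numpy as np

def s_transform(x, dt=1.0):
    """Direct discretization of (12.10) (a sketch under these assumptions)."""
    N = len(x)
    m = np.arange(-N // 2, N // 2)                 # lag samples, tau = m*dt
    S = np.zeros((N, N // 2), dtype=complex)
    for k in range(1, N // 2):                     # frequency Omega = 2*pi*k/(N*dt)
        Om = 2 * np.pi * k / (N * dt)
        w = abs(Om) / (2 * np.pi) ** 1.5 \
            * np.exp(-(m * dt) ** 2 * Om ** 2 / (8 * np.pi ** 2))
        for n in range(N):
            seg = x[(n + m) % N]                   # circular x(t + tau)
            S[n, k] = np.exp(-1j * Om * n * dt) \
                * np.sum(seg * w * np.exp(-1j * Om * m * dt)) * dt
    return S
```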
The spectral domain STFT can be obtained from the corresponding time domain form,
$$STFT(t,\Omega)=\int_{-\infty}^{\infty}x(t+\tau)\,w^{*}(\tau)\,e^{-j\Omega\tau}\,d\tau=\frac{1}{2\pi}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}X(\theta)\,e^{j(t+\tau)\theta}\,w^{*}(\tau)\,e^{-j\Omega\tau}\,d\theta\,d\tau$$
$$=\frac{1}{2\pi}\int_{-\infty}^{\infty}X(\theta)\,W^{*}(\theta-\Omega)\,e^{j\theta t}\,d\theta=\frac{1}{2\pi}\int_{-\infty}^{\infty}X(\theta)\,W_{\Omega}^{*}(\theta)\,e^{j\theta t}\,d\theta,$$
where W(θ) is the Fourier transform of the window function w(τ) and W*_Ω(θ) is its bandpass form centered at the frequency Ω (including the possibility that the form of W_Ω(θ) is frequency-varying and that it changes with Ω as well, as in Section 10.6.1.1). The STFT can be considered as the projection (inner product) of the Fourier transform of the signal, X(θ), onto the kernel W_Ω(θ)e^{−jθt}.
The inversion relation can be derived in the same way as in Section 10.2. Assume that the STFT is calculated (available) for a set of discrete frequency values Ω_i. The Fourier transform of the signal is a projection of the STFT onto the kernel functions,
$$\left\langle STFT(t,\Omega),\,W_{\Omega}^{*}(\theta)e^{j\theta t}\right\rangle_{\Omega,t}=\sum_{\Omega_i}\int_{-\infty}^{\infty}STFT(t,\Omega_i)\,W_{\Omega_i}(\theta)\,e^{-j\theta t}\,dt$$
$$=\sum_{\Omega_i}X(\theta)\,W_{\Omega_i}^{*}(\theta)\,W_{\Omega_i}(\theta)=X(\theta)\sum_{\Omega_i}W_{\Omega_i}^{*}(\theta)\,W_{\Omega_i}(\theta)=X(\theta),$$
if the condition
$$\sum_{\Omega_i}\left|W_{\Omega_i}(\theta)\right|^2=1 \qquad (12.12)$$
holds. The inverse Fourier transform relation,
$$\int_{-\infty}^{\infty}STFT(t,\Omega)\,e^{-j\theta t}\,dt=X(\theta)\,W_{\Omega}^{*}(\theta),$$
is used.
Notice that we have not used a factor of 1/(2π) within the scalar product definition and the summation over Ω_i, in order to simplify the notation. With this factor, the reconstruction condition would be ∑_{Ω_i} |W_{Ω_i}(θ)|²/(2π) = 1.
The spectral wavelet function is defined as a projection of the signal onto a set of the kernel functions W(a_iθ)e^{−jθt} = W_{Ω_i}(θ)e^{−jθt}, where a_i is the scale, which changes the position and the form of the basic bandpass function W(θ).
The Meyer wavelet transfer functions in the spectral domain, at a scale a_i, in the notation W(a_iθ), are defined as in (10.40), (10.42), and (10.43),
$$W_i(\theta)=W(a_i\theta)=\begin{cases}\sin\!\left(\frac{\pi}{2}\,q(a_i\theta-1)\right), & 1<a_i\theta\le M\\[4pt] \cos\!\left(\frac{\pi}{2}\,q\!\left(\frac{a_i\theta}{M}-1\right)\right), & M<a_i\theta\le M^2\end{cases} \qquad (12.13)$$
and 0 elsewhere, for 2 ≤ i ≤ K − 1, where q = 1/(M − 1). The sine and cosine function arguments are such that they are either 0 or π/2 at the interval ending points. The scales a_i for each frequency interval are related through a geometric progression with a factor of M > 1, that is,
$$a_i=M\,a_{i-1}. \qquad (12.14)$$
Figure 12.3 (a) Spectral domain windows (sine type) for the wavelet transform, 0 ≤ θ ≤ 8, that satisfy the reconstruction condition ∑_i W²(a_iθ) = 1, with W(a₀θ) = G(θ), M = 2, θ_max = 8, and K = 7. (b) Spectral domain windows (Meyer spectral domain wavelet) for the wavelet transform, 0 ≤ θ ≤ 8, that satisfy the reconstruction condition ∑_i W²(a_iθ) = 1, with W(a₀θ) = G(θ), M = 2, θ_max = 8, and K = 7.
Since W(a_iθ) are bandpass functions, to handle the lowpass spectral components (the interval for θ which includes θ = 0), the lowpass-type scale function G(θ) is added, in the form
$$G(\theta)=\begin{cases}1, & 0\le a_K\theta\le M \ \ (\text{that is, } 0\le\theta\le M/a_K=\theta_{max}/M^{K-1})\\[4pt] \cos\!\left(\frac{\pi}{2}\,q\!\left(\frac{a_K\theta}{M}-1\right)\right), & M<a_K\theta\le M^2\\[4pt] 0, & \text{elsewhere.}\end{cases} \qquad (12.15)$$
An example of the frequency domain windows (spectral transfer functions) of this wavelet is
shown in Fig. 12.3(a).
The reconstruction condition in (12.12) can be written in the form of a sum of all normalized spectral transfer functions,
$$\sum_{a_i}\left|W(a_i\theta)\right|^2=\left|G(\theta)\right|^2+\sum_{i=1}^{K-1}\left|W(a_i\theta)\right|^2=1. \qquad (12.16)$$
The derivative discontinuity at the frequency band ending points can be avoided by introducing a polynomial argument v_x(·) into the sine and cosine functions, of the form given in (12.17). This polynomial keeps the property that the argument is such that v_x(0) = 0 and v_x(1) = 1, but it makes the derivatives smooth at the transition points. The Meyer wavelet functions are defined by
$$W(a_i\theta)=\begin{cases}\sin\!\left(\frac{\pi}{2}\,v_x\!\left(q(a_i\theta-1)\right)\right), & 1<a_i\theta\le M\\[4pt] \cos\!\left(\frac{\pi}{2}\,v_x\!\left(q\!\left(\frac{a_i\theta}{M}-1\right)\right)\right), & M<a_i\theta\le M^2\\[4pt] 0, & \text{elsewhere.}\end{cases} \qquad (12.18)$$
The same form is used in the first transfer function W(a₁θ) and in the scale function G(θ). The spectral Meyer wavelet functions, with the same parameters as in the previous example, are shown in Fig. 12.3(b). They exhibit smooth transitions and satisfy the reconstruction condition (12.16).
This analysis will start by splitting the signal's spectral content into its high-frequency and low-frequency parts. Within the STFT framework, this can be achieved by a two-sample rectangular window, w(m) = δ(m) + δ(m−1), when
$$STFT(n,0)=\frac{1}{\sqrt{2}}\sum_{m=0}^{1}x(n+m)\,e^{-j0}=\frac{1}{\sqrt{2}}\left(x(n)+x(n+1)\right)=x_L(n), \qquad (12.19)$$
$$STFT(n,1)=\frac{1}{\sqrt{2}}\sum_{m=0}^{1}x(n+m)\,e^{-j\pi m}=\frac{1}{\sqrt{2}}\left(x(n)-x(n+1)\right)=x_H(n). \qquad (12.20)$$
Here, the STFT definition
$$STFT(n,k)=\frac{1}{\sqrt{N}}\sum_{m=0}^{N-1}x(n+m)\,e^{-j2\pi km/N}$$
is used, instead of STFT(n,k) = ∑_{m=−N/2}^{N/2−1} x(n+m)e^{−j2πkm/N}, in order to remain within the common wavelet literature notation. For the same reason the STFT is scaled by 1/√N (a form when the DFT and IDFT have the same factor 1/√N).
This kind of signal analysis leads to the Haar (wavelet) transform. In the Haar wavelet transform the high-frequency part, x_H(n), is not processed anymore. It is kept with this (high) two-sample resolution in time. The resolution in time of x_H(n) is just slightly (two times) lower than the original signal sampling interval allows. The lowpass part, x_L(n) = (x(n) + x(n+1))/√2, will be used in further processing. After the signal samples x(n) and x(n+1) are processed using (12.19) and (12.20), the next two samples, x(n+2) and x(n+3), are analyzed. The highpass part is again calculated, x_H(n+2) = (x(n+2) − x(n+3))/√2, and kept as it is. The lowpass part, x_L(n+2) = (x(n+2) + x(n+3))/√2, is considered as a new signal, along with its corresponding previous sample x_L(n).
The spectral content of the lowpass part of the signal is divided, in the same way, into its low and high frequency parts,
$$x_{LL}(n)=\frac{1}{\sqrt{2}}\left(x_L(n)+x_L(n+2)\right)=\frac{1}{2}\left[x(n)+x(n+1)+x(n+2)+x(n+3)\right]$$
$$x_{LH}(n)=\frac{1}{\sqrt{2}}\left(x_L(n)-x_L(n+2)\right)=\frac{1}{2}\left[x(n)+x(n+1)-x(n+2)-x(n+3)\right].$$
The highpass part x_LH(n) is left with a resolution of four samples in time, while the lowpass part is further processed in the same way, by dividing the spectral content of x_LL(n) and x_LL(n+4) into its low and high frequency parts. This process is continued until the full length of the signal is reached.
wavelet transformation matrix in the case of signal with 8 samples is
√
√ 2W1 (0, H ) 1 −1 0 0 0 0 0 0 x (0)
√2W1 (2, H ) 0 0 1 −1 0 0 0 0
x (1)
√2W1 (4, H ) 0 0 0 0 1 − 1 0 0 x (2)
2W1 (6, H ) 0 0 0 0 0 0 1 −1
x (3) .
= (12.21)
2W2 (0, H ) 1 1 − 1 − 1 0 0 0 0 x ( 4 )
2W2 (4, H ) 0 0 0 0 1 1 −1 −1
x (5)
√
2 2W4 (0, H ) 1 1 1 1 −1 −1 −1 −1 x (6)
√
2 2W4 (0, L) 1 1 1 1 1 1 1 1 x (7)
This kind of signal transformation was introduced by Haar more than a century ago. In this notation, the scale a = 1 values of the wavelet coefficients W₁(2n,H) are equal to the highpass part of the signal calculated using two samples, W₁(2n,H) = x_H(2n). The scale a = 2 wavelet coefficients are W₂(4n,H) = x_LH(4n). In scale a = 4 there is only one highpass and one lowpass coefficient, at n = 0: W₄(8n,H) = x_LLH(8n) and W₄(8n,L) = x_LLL(8n). In this way a signal of any length N = 2^m can be decomposed into Haar wavelet coefficients.
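The complete decomposition described above can be sketched in a few lines (the helper name is an assumption):

```python
import numpy as np

def haar_dwt(x):
    """Full Haar decomposition of a signal of length N = 2**m, as in (12.21):
    the highpass part is kept at every stage, the lowpass part is split again."""
    x = np.asarray(x, dtype=float)
    coeffs = []
    while len(x) > 1:
        xL = (x[0::2] + x[1::2]) / np.sqrt(2)   # lowpass part, see (12.19)
        xH = (x[0::2] - x[1::2]) / np.sqrt(2)   # highpass part, see (12.20)
        coeffs.append(xH)                       # W_a(n, H) for this stage
        x = xL                                  # process the lowpass part further
    coeffs.append(x)                            # the single remaining lowpass value
    return coeffs

# For a constant signal all highpass coefficients are zero:
# haar_dwt(np.ones(8)) leaves only W4(0, L) = 2*sqrt(2), as in (12.21).
```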
The Haar wavelet transform has the property that its highpass coefficients are equal to zero if the analyzed signal is constant within the analyzed time interval, for the considered scale. If a signal has a large number of constant-valued samples within the analyzed time intervals, then many Haar wavelet transform coefficients are zero-valued. They can be omitted in signal storage or transmission. In recovery, their values are assumed to be zeros and the original signal is obtained. The same can be done in the case of noisy signals, when all coefficients below an assumed level of noise can be zero-valued and the signal-to-noise ratio in the reconstructed signal improved.
Although the presented Haar wavelet analysis is quite simple, we will use it as an example to introduce the filter bank framework of the wavelet transform. The obvious results for the Haar wavelet will be used to introduce other wavelet forms. For the Haar wavelet calculation, two signals, x_L(n) and x_H(n), are formed according to (12.19) and (12.20), based on the input signal x(n). Transfer functions of the corresponding (anticausal) lowpass and highpass filters are
$$H_L(z)=\frac{1}{\sqrt{2}}(1+z), \qquad H_H(z)=\frac{1}{\sqrt{2}}(1-z), \qquad (12.22)$$
with the impulse responses h_L(n) = [δ(n) + δ(n+1)]/√2 and h_H(n) = [δ(n) − δ(n+1)]/√2.
Figure 12.4 Amplitude of the Fourier transform of the basic Haar wavelet and scale function, divided by √2.
In the filter bank implementation, the filter outputs are downsampled,
$$s_L(n)=x_L(2n)$$
$$s_H(n)=x_H(2n). \qquad (12.23)$$
Downsampling of a signal x(n), to get the signal y(n) = x(2n), is described in the z-transform domain by
$$Y(z)=\frac{1}{2}X(z^{1/2})+\frac{1}{2}X(-z^{1/2}). \qquad (12.24)$$
Figure 12.5 Signal filtering by a lowpass and a highpass filter, followed by downsampling by 2.
If the signals s_L(n) and s_H(n) are obtained by passing x(n) through the lowpass and highpass filters H_L(z) and H_H(z), followed by downsampling, then
$$S_L(z)=\frac{1}{2}H_L(z^{1/2})X(z^{1/2})+\frac{1}{2}H_L(-z^{1/2})X(-z^{1/2})$$
$$S_H(z)=\frac{1}{2}H_H(z^{1/2})X(z^{1/2})+\frac{1}{2}H_H(-z^{1/2})X(-z^{1/2})$$
hold.
12.2.2 Upsampling
Let us assume that we are not going to transform the signals s_L(n) and s_H(n) any more. The only goal is to reconstruct the signal x(n) based on its downsampled lowpass and highpass part signals s_L(n) and s_H(n). The first step in the signal reconstruction is to restore the original sampling interval of the discrete-time signal. This is done by upsampling the signals s_L(n) and s_H(n). Upsampling of a signal x(n) is described by Y(z) = X(z²), since
$$X(z^2)=\sum_{n=-\infty}^{\infty}x(n)z^{-2n}=\ldots+x(-1)z^{2}+0\cdot z^{1}+x(0)+0\cdot z^{-1}+x(1)z^{-2}+\ldots \qquad (12.25)$$
If a signal x(n) is downsampled first and then upsampled, the resulting signal transform is
$$Y(z)=\frac{1}{2}X\!\left((z^2)^{1/2}\right)+\frac{1}{2}X\!\left(-(z^2)^{1/2}\right)$$
$$Y(z)=\frac{1}{2}X(z)+\frac{1}{2}X(-z). \qquad (12.26)$$
In the Fourier domain this means Y(e^{jω}) = (X(e^{jω}) + X(e^{j(ω+π)}))/2. This form indicates that an aliasing component, X(e^{j(ω+π)}), appeared in this process. In general, when the signal is downsampled and upsampled, aliasing appears, since the component X(−z) exists in addition to the original signal X(z) in (12.26). The upsampled versions of the signals s_L(n) and s_H(n) should be appropriately filtered and combined in order to eliminate the aliasing. The conditions to avoid the aliasing in the reconstructed signal will be studied next.
Figure 12.6 One stage of the filter bank with reconstruction, corresponding to one stage of the wavelet transform realization.
In the reconstruction process the signals are upsampled (S_L(z) → S_L(z²) and S_H(z) → S_H(z²)) and passed through the reconstruction filters G_L(z) and G_H(z) before being added up to form the output signal, Fig.12.6. The output signal transforms are
$$Y_L(z)=S_L(z^2)\,G_L(z)=\frac{1}{2}\left[H_L(z)X(z)+H_L(-z)X(-z)\right]G_L(z)$$
$$Y_H(z)=S_H(z^2)\,G_H(z)=\frac{1}{2}\left[H_H(z)X(z)+H_H(-z)X(-z)\right]G_H(z).$$
The output signal, Y(z) = Y_L(z) + Y_H(z), is equal to the input signal,
$$Y(z)=X(z),$$
if the reconstruction conditions
$$H_L(z)G_L(z)+H_H(z)G_H(z)=2 \qquad (12.27)$$
$$H_L(-z)G_L(z)+H_H(-z)G_H(z)=0 \qquad (12.28)$$
are satisfied. From (12.28) follows
$$G_H(z)=-\frac{H_L(-z)\,G_L(z)}{H_H(-z)},$$
while the expression
$$H_H(z)=-\frac{H_L(z)\,G_L(-z)}{G_H(-z)}$$
is obtained from (12.28) with z being replaced by −z, when H_L(z)G_L(−z) + H_H(z)G_H(−z) = 0. Substituting these values into (12.27) we get
$$H_L(e^{j\omega})G_L(e^{j\omega})+H_H(e^{j\omega})G_H(e^{j\omega})=2 \qquad (12.32)$$
$$H_L(-e^{j\omega})G_L(e^{j\omega})+H_H(-e^{j\omega})G_H(e^{j\omega})=0.$$
The wavelet transform is calculated using downsampling by a factor of 2. One of the basic requirements imposed on the filter impulse response, for an efficient signal reconstruction, is that it is orthogonal to its shifted versions with step 2 (and its multiples). In addition, the wavelet functions in different scales should be orthogonal. Orthogonality of the wavelet functions in different scales will be discussed later. The orthogonality condition for the impulse response is
$$\sum_{m}h_L(m)\,h_L(m-2n)=\delta(n). \qquad (12.33)$$
For the Haar wavelet transform this condition is obviously satisfied. In general, for wavelet transforms where the duration of the impulse response h_L(n) is greater than two, the previous relation can be understood as a downsampled convolution of h_L(n) and h_L(−n), r(2n) = (h_L(n) ∗ h_L(−n))|₂ₙ = δ(n).
The Fourier transform of the downsampled convolution, for a real-valued h_L(n), is, according to (12.24),
$$\mathrm{FT}\{r(2n)\}=\frac{1}{2}\left|H_L(e^{j\omega/2})\right|^2+\frac{1}{2}\left|H_L(-e^{j\omega/2})\right|^2.$$
From r(2n) = δ(n) follows
$$\left|H_L(e^{j\omega})\right|^2+\left|H_L(-e^{j\omega})\right|^2=2.$$
The impulse response is orthogonal, in the sense of (12.33), if the frequency response satisfies
$$\left|H_L(e^{j\omega})\right|^2+\left|H_L(e^{j(\omega+\pi)})\right|^2=2.$$
If the impulse response h_L(n) is orthogonal, as in (12.33), then the last relation is satisfied for g_L(n) = h_L(−n), or
$$P(z)+P(-z)=2, \qquad (12.34)$$
with P(z) = G_L(z)G_L(z^{−1}). Relation (12.34) may also be written for H_L(z).
If the highpass filters are obtained from the corresponding lowpass filters by reversal, in addition to the common multiplication by (−1)ⁿ, then g_H(n) = (−1)ⁿ g_L(K − n) and
$$G_H(e^{j\omega})=\sum_{n=0}^{K}g_H(n)e^{-j\omega n}=\sum_{n=0}^{K}(-1)^n g_L(K-n)e^{-j\omega n}$$
$$=\sum_{m=0}^{K}(-1)^{K-m}g_L(m)e^{-j\omega(K-m)}=(-1)^K e^{-j\omega K}\sum_{m=0}^{K}e^{j\pi m}g_L(m)e^{-j(-\omega)m}$$
$$=-e^{-j\omega K}G_L(e^{-j(\omega-\pi)})=-e^{-j\omega K}G_L(-e^{-j\omega})$$
for odd K, or
$$G_H(e^{j\omega})=-e^{-j\omega K}G_L(-e^{-j\omega})=-e^{-j\omega K}H_L(-e^{j\omega})$$
for G_L(e^{jω}) = H_L(e^{−jω}). A similar relation holds for the anticausal impulse response h_H(n).
The reconstruction conditions are satisfied since, according to (12.27) and (12.31), a relation corresponding to
$$H_H(z)G_H(z)=H_L(-z)G_L(-z)$$
holds in the Fourier domain,
$$H_H(e^{j\omega})G_H(e^{j\omega})=\left[-e^{j\omega K}H_L(-e^{-j\omega})\right]\left[-e^{-j\omega K}H_L(-e^{j\omega})\right]=H_L(-e^{-j\omega})H_L(-e^{j\omega}).$$
In summary,
$$H_L(e^{j\omega})=G_L(e^{-j\omega})$$
$$G_H(e^{j\omega})=-e^{-j\omega K}G_L(-e^{-j\omega})$$
$$H_H(e^{j\omega})=-e^{j\omega K}G_L(-e^{j\omega}). \qquad (12.35)$$
Note that the following symmetry of the frequency response amplitude functions holds:
$$\left|H_L(e^{j\omega})\right|=\left|G_L(e^{-j\omega})\right|=\left|H_H(e^{j(\omega+\pi)})\right|=\left|H_H(e^{-j(\omega+\pi)})\right|.$$
The orthogonality condition
$$\sum_{m}h_L(m)\,h_H(m-2n)=0 \qquad (12.36)$$
is also satisfied with these forms of the transfer functions, for any n. Since this relation can be understood as a downsampled convolution, Z{h_L(2n) ∗ h_H(−2n)} = 0, in the Fourier domain it assumes the form
$$H_L(-e^{j\omega})G_L(e^{j\omega})+H_H(-e^{j\omega})G_H(e^{j\omega})=0.$$
The condition that the reconstruction filter G_L(z) has zero value at z = e^{jπ} = −1 means that its form is G_L(z) = a(1 + z^{−1}). Without additional requirements, this form would produce a = 1/√2 from the reconstruction relation G_L(z)G_L(z^{−1}) + G_L(−z)G_L(−z^{−1}) = 2. The time domain filter form is
$$g_L(n)=\frac{1}{\sqrt{2}}\left[\delta(n)+\delta(n-1)\right].$$
It corresponds to the Haar wavelet. All other filter functions can be defined using g_L(n) or G_L(e^{jω}).
The same result would be obtained starting from the filter transfer functions for the Haar wavelet, already introduced as
$$H_L(z)=\frac{1}{\sqrt{2}}(1+z)$$
$$H_H(z)=\frac{1}{\sqrt{2}}(1-z).$$
The reconstruction filters are obtained from (12.27)-(12.28),
$$\frac{1}{\sqrt{2}}(1+z)G_L(z)+\frac{1}{\sqrt{2}}(1-z)G_H(z)=2$$
$$\frac{1}{\sqrt{2}}(1-z)G_L(z)+\frac{1}{\sqrt{2}}(1+z)G_H(z)=0,$$
as
$$G_L(z)=\frac{1}{\sqrt{2}}\left(1+z^{-1}\right) \qquad (12.37)$$
$$G_H(z)=\frac{1}{\sqrt{2}}\left(1-z^{-1}\right)$$
with
$$g_L(n)=\frac{1}{\sqrt{2}}\delta(n)+\frac{1}{\sqrt{2}}\delta(n-1) \qquad (12.38)$$
$$g_H(n)=\frac{1}{\sqrt{2}}\delta(n)-\frac{1}{\sqrt{2}}\delta(n-1).$$
The values of the impulse responses in the Haar wavelet transform (relations (12.22) and (12.38)) are:

    n     √2·h_L(n)   √2·h_H(n)        n    √2·g_L(n)   √2·g_H(n)
    0         1           1            0        1           1
   −1         1          −1            1        1          −1
A detailed time domain filter bank implementation of the reconstruction process in the Haar wavelet case is described next. The reconstruction is implemented in two steps:
1) The signals s_L(n) and s_H(n) from (12.23) are upsampled, according to (12.25), by inserting zeros between their samples.
2) These signals are then passed through the reconstruction filters. The sum of the outputs from these filters is
$$y(0)=\frac{1}{\sqrt{2}}\left[x_L(0)+x_H(0)\right]=x(0)$$
$$y(1)=\frac{1}{\sqrt{2}}\left[x_L(0)-x_H(0)\right]=x(1)$$
$$\ldots$$
$$y(2n)=\frac{1}{\sqrt{2}}\left[x_L(2n)+x_H(2n)\right]=x(2n)$$
$$y(2n+1)=\frac{1}{\sqrt{2}}\left[x_L(2n)-x_H(2n)\right]=x(2n+1).$$
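These two reconstruction steps can be verified numerically for the Haar filters (a minimal sketch with a random test signal):

```python
import numpy as np

# Haar analysis per (12.19)-(12.20) and synthesis per the relations above.
x = np.random.randn(16)
sL = (x[0::2] + x[1::2]) / np.sqrt(2)    # x_L(2n), lowpass branch
sH = (x[0::2] - x[1::2]) / np.sqrt(2)    # x_H(2n), highpass branch
y = np.empty_like(x)
y[0::2] = (sL + sH) / np.sqrt(2)         # y(2n)   = x(2n)
y[1::2] = (sL - sH) / np.sqrt(2)         # y(2n+1) = x(2n+1)
print(np.allclose(y, x))                 # True: perfect reconstruction
```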
A system for the implementation of the Haar wavelet transform of a signal with eight samples is
presented in Fig.12.7. It corresponds to the matrix form realization (12.21).
⋆ The wavelet transform of a signal with M = 16 samples after the stage a = 1 is shown in Fig.12.8(a). The whole frequency range is divided into two subregions, denoted by L and H, with the coefficients W₁(n,L) = [x(n) + x(n+1)]/√2 and W₁(n,H) = [x(n) − x(n+1)]/√2 calculated at the instants n = 0, 2, 4, 6, 8, 10, 12, 14. In the second stage (a = 2) the highpass region is not transformed, while the lowpass part s₂(n) = W₁(2n,L) is divided into its lowpass and highpass regions, W₂(n,L) = [s₂(n) + s₂(n+1)]/√2 and W₂(n,H) = [s₂(n) − s₂(n+1)]/√2, respectively, Fig.12.8(b). The same calculation is performed in the third and fourth stage, Fig.12.8(c)-(d).
The Haar wavelet has an impulse response duration equal to two. In one stage, it corresponds to a two-sample STFT calculated using a rectangular window. Its Fourier transform, presented in Fig.12.4, is a quite rough approximation of a lowpass and a highpass filter. In order to improve the filter performance, the number of filter coefficients should be increased. A fourth order FIR system will be considered next. The impulse response of the anticausal fourth order FIR filter is
$$h_L(n)=[h_L(0),h_L(-1),h_L(-2),h_L(-3)]=[h_0,h_1,h_2,h_3]. \qquad (12.39)$$
Figure 12.8 Wavelet transform of a signal with M = 16 samples at the output of stages 1, 2, 3, and 4, respectively. The notation W_a(n,H) is used for the highpass coefficient value after stage (scale) a at an instant n, and W_a(n,L) for the corresponding lowpass coefficient value.
If the highpass and reconstruction filter coefficients are chosen as g_L(n) = h_L(−n), g_H(n) = (−1)ⁿ g_L(3 − n), and h_H(n) = (−1)ⁿ g_L(n + 3), then relation (12.35) is satisfied with K = 3. The reconstruction conditions are satisfied if
$$h_0^2+h_1^2+h_2^2+h_3^2=1$$
and
$$h_0h_2+h_1h_3=0.$$
For the calculation of the impulse response values h₀, h₁, h₂, h₃ of the fourth order system (12.39), four independent equations (conditions) are needed. We already have three conditions. The filter has to satisfy the zero-frequency condition H_L(e^{j0}) = √2, the high-frequency condition H_L(e^{jπ}) = 0, and the reconstruction condition h₀² + h₁² + h₂² + h₃² = 1. Therefore one more condition is needed. In the Daubechies D4 wavelet derivation the fourth condition is imposed so that the derivative of the filter transfer function at ω = π is equal to zero,
$$\left.\frac{dH_L(e^{j\omega})}{d\omega}\right|_{\omega=\pi}=0.$$
This condition, meaning a smooth approach to the zero value at ω = π, also guarantees that the output of the highpass filter H_H(−z) to a linear input signal, x(n) = an + b, will be zero. This will be illustrated later. Now we have a system of four equations:
$$h_0+h_1+h_2+h_3=\sqrt{2} \qquad \text{from } H_L(e^{j0})=\sqrt{2}$$
$$h_0^2+h_1^2+h_2^2+h_3^2=1 \qquad \text{reconstruction condition}$$
$$h_0-h_1+h_2-h_3=0 \qquad \text{from } H_L(e^{j\pi})=0$$
$$-h_1+2h_2-3h_3=0 \qquad \text{from } \left.dH_L(e^{j\omega})/d\omega\right|_{\omega=\pi}=0.$$
Its solution produces the fourth order Daubechies wavelet coefficients (D4),
$$(h_0,h_1,h_2,h_3)=\left(\frac{1+\sqrt{3}}{4\sqrt{2}},\,\frac{3+\sqrt{3}}{4\sqrt{2}},\,\frac{3-\sqrt{3}}{4\sqrt{2}},\,\frac{1-\sqrt{3}}{4\sqrt{2}}\right).$$
Note that this is just one of the possible symmetric solutions of the previous system of equations, Fig.12.9.
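The four conditions can be checked numerically for this solution (a short verification sketch):

```python
import numpy as np

s3 = np.sqrt(3)
h = np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / (4 * np.sqrt(2))  # h0..h3
print(np.isclose(h.sum(), np.sqrt(2)))                 # H_L(e^{j0}) = sqrt(2)
print(np.isclose((h ** 2).sum(), 1))                   # reconstruction condition
print(np.isclose(h[0] - h[1] + h[2] - h[3], 0))        # H_L(e^{j*pi}) = 0
print(np.isclose(-h[1] + 2 * h[2] - 3 * h[3], 0))      # dH_L/dw = 0 at w = pi
```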
Figure 12.9 shows the reconstruction and analysis filter impulse responses g_L(n), g_H(n), h_L(n), and h_H(n) for the Daubechies D4 wavelet transform.
The reconstruction conditions for the fourth order FIR filter
$$H_L(e^{j\omega})=h_0+h_1e^{j\omega}+h_2e^{j2\omega}+h_3e^{j3\omega}$$
with the Daubechies wavelet coefficients (D4) can also be checked in a graphical way, by calculating
$$\left|H_L(e^{j\omega})\right|^2+\left|H_L(e^{j(\omega+\pi)})\right|^2=2.$$
From Fig.12.10 we can see that this is a much better approximation of the lowpass and highpass filters than in the Haar wavelet case, Fig.12.4.
Figure 12.10 Amplitude of the Fourier transform of basic Daubechies D4 wavelet and scale function.
Another way to derive the Daubechies wavelet coefficients (D4) is by using relation (12.34),
$$P(z)+P(-z)=2,$$
with
$$P(z)=G_L(z)H_L(z)=G_L(z)G_L(z^{-1}).$$
The condition imposed on the transfer function G_L(z) in the D4 wavelet is that its value and the value of its first derivative at z = −1 are zero-valued (a smooth approach to the highpass zero value),
$$\left.G_L(e^{j\omega})\right|_{\omega=\pi}=0$$
$$\left.\frac{dG_L(e^{j\omega})}{d\omega}\right|_{\omega=\pi}=0.$$
Then G_L(z) must contain a factor of the form (1 + z^{−1})². Since the filter order must be even (K must be odd), and taking into account that (1 + z^{−1})² alone would produce a FIR system with 3 nonzero coefficients, we have to add at least one factor of the form a(1 + z₁z^{−1}) to G_L(z). Thus, the lowest order FIR filter with an even number of (nonzero) impulse response values is
$$G_L(z)=\left(1+z^{-1}\right)^2 a\left(1+z_1z^{-1}\right)$$
with
$$P(z)=\left(1+z^{-1}\right)^2\left(1+z\right)^2 R(z),$$
where
$$R(z)=\left[a(1+z_1z^{-1})\right]\left[a(1+z_1z)\right]=z_0z^{-1}+b+z_0z.$$
Using
$$P(z)+P(-z)=2,$$
only the terms with even exponents of z will remain in P(z) + P(−z), producing the conditions b + 4z₀ = 0 and 6b + 8z₀ = 1. The solution is z₀ = −1/16 and b = 1/4. It produces a²z₁ = z₀ = −1/16 and a²(1 + z₁²) = b = 1/4, with
$$a=\frac{1}{4\sqrt{2}}\left(1+\sqrt{3}\right) \quad \text{and} \quad z_1=\frac{1-\sqrt{3}}{1+\sqrt{3}},$$
and
$$R(z)=\left(\frac{1}{4\sqrt{2}}\right)^2\left[\left(1+\sqrt{3}\right)+\left(1-\sqrt{3}\right)z^{-1}\right]\left[\left(1+\sqrt{3}\right)+\left(1-\sqrt{3}\right)z\right].$$
The reconstruction filter transfer function is
$$G_L(z)=\frac{1}{4\sqrt{2}}\left(1+z^{-1}\right)^2\left[\left(1+\sqrt{3}\right)+\left(1-\sqrt{3}\right)z^{-1}\right]$$
with
$$g_L(n)=\frac{1}{4\sqrt{2}}\left[\left(1+\sqrt{3}\right)\delta(n)+\left(3+\sqrt{3}\right)\delta(n-1)+\left(3-\sqrt{3}\right)\delta(n-2)+\left(1-\sqrt{3}\right)\delta(n-3)\right].$$
All other impulse responses follow from this one (as in the presented table). For a linear input signal, x(n) = an + b, the smooth approach of the transfer function to zero at ω = π is equivalent to the condition that the highpass coefficients (the output from H_H(e^{jω})) are zero-valued, Fig.12.10. It can also be shown that the lowpass coefficients remain a linear function of time.
⋆ The highpass coefficients after the first stage, W₁(2n,H), are obtained by downsampling W₁(n,H), whose form is a linear combination of four signal samples weighted by the highpass impulse response values. For x(n) = an + b, all of these coefficients are zero-valued if
$$-h_0+h_1-h_2+h_3=0 \quad \text{and}$$
$$h_1-2h_2+3h_3=0.$$
Thus, we may consider that the highpass D4 coefficients indicate the deviation of the signal from a linear function x(n) = an + b. In the first stage, the coefficients indicate the deviation from the linear function within four samples. In the next stage, the equivalent length of the wavelet is doubled. The highpass coefficients in this stage indicate the deviation of the signal from the linear function within a doubled number of signal samples, and so on. This is a significant difference from the nature of the STFT, which is derived from the Fourier transform and decomposes the signal by tracking its frequency content.
The matrix for the D4 wavelet transform calculation in the first stage is of the form
$$\begin{bmatrix}W_1(0,L)\\ W_1(0,H)\\ W_1(2,L)\\ W_1(2,H)\\ W_1(4,L)\\ W_1(4,H)\\ W_1(6,L)\\ W_1(6,H)\end{bmatrix}=\begin{bmatrix}h_0&h_1&h_2&h_3&0&0&0&0\\ h_3&-h_2&h_1&-h_0&0&0&0&0\\ 0&0&h_0&h_1&h_2&h_3&0&0\\ 0&0&h_3&-h_2&h_1&-h_0&0&0\\ 0&0&0&0&h_0&h_1&h_2&h_3\\ 0&0&0&0&h_3&-h_2&h_1&-h_0\\ h_2&h_3&0&0&0&0&h_0&h_1\\ h_1&-h_0&0&0&0&0&h_3&-h_2\end{bmatrix}\begin{bmatrix}x(0)\\ x(1)\\ x(2)\\ x(3)\\ x(4)\\ x(5)\\ x(6)\\ x(7)\end{bmatrix}. \qquad (12.40)$$
In the first row of the transformation matrix the coefficients correspond to h_L(n), while the second row corresponds to h_H(n). The first row produces the D4 scaling function, while the second row produces the D4 wavelet function. The coefficients are shifted by 2 in the subsequent rows. As described in the Hann(ing) window reconstruction case, the calculation should be performed in a circular manner, assuming signal periodicity. That is why the coefficients are circularly shifted in the last two rows.
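The circular structure of (12.40) can be built and checked for orthonormality as follows (the helper name is an assumption; the row ordering matches (12.40)):

```python
import numpy as np

def d4_analysis_matrix(N):
    """Build the N x N D4 transformation matrix of (12.40) with
    circularly shifted rows (N even)."""
    s3, s2 = np.sqrt(3), np.sqrt(2)
    h = np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / (4 * s2)
    rowL = np.zeros(N); rowL[:4] = h                           # h_L row
    rowH = np.zeros(N); rowH[:4] = [h[3], -h[2], h[1], -h[0]]  # h_H row
    T = np.zeros((N, N))
    for i in range(N // 2):
        T[2 * i] = np.roll(rowL, 2 * i)
        T[2 * i + 1] = np.roll(rowH, 2 * i)
    return T

T = d4_analysis_matrix(8)
print(np.allclose(T @ T.T, np.eye(8)))   # True: the matrix is orthonormal
```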
Example 12.5. Consider the signal x(n) = 64 − |n − 64| within 0 ≤ n ≤ 128. How many nonzero coefficients will there be in the first stage of the wavelet transform calculation using the D4 wavelet functions? Assume that the signal can be appropriately extended, so that the boundary effects can be neglected.
⋆ In the first stage, all highpass coefficients corresponding to linear four-sample intervals will be zero. It means that out of 64 highpass coefficients (calculated with a step of two in time) only one nonzero coefficient will exist, calculated for n = 62, including the nonlinear interval 62 ≤ n ≤ 65. It means that almost a half of the coefficients can be omitted in transmission or storage, corresponding to a 50% compression ratio. In the DFT analysis this would correspond to a signal with a half of (the high frequency) spectrum being equal to zero. In the wavelet analysis this process would be continued, with additional savings in the next stages of the wavelet transform coefficients calculation. It also means that if there is some noise in the signal, we can filter out all zero-valued coefficients using an appropriate threshold. For this kind of signal (a piecewise linear function of time) we will be able to improve the signal-to-noise ratio by about 3 dB in just one wavelet stage.
Example 12.6. For the signal x(n) = δ(n − 7), defined within 0 ≤ n ≤ 15, calculate the wavelet transform coefficients using the D4 wavelet/scale functions. Repeat the same calculation for the signal x(n) = 2cos(16πn/N) + 1, 0 ≤ n ≤ N − 1, with N = 16.
⋆ The wavelet coefficients in the first stage (scale a = 1, see also Fig.12.7) are calculated using
$$[h_3,h_2,h_1,h_0]=\left[\frac{1-\sqrt{3}}{4\sqrt{2}},\,\frac{3-\sqrt{3}}{4\sqrt{2}},\,\frac{3+\sqrt{3}}{4\sqrt{2}},\,\frac{1+\sqrt{3}}{4\sqrt{2}}\right].$$
Specifically, W₁(0,H) = 0, W₁(2,H) = 0, W₁(4,H) = −0.4830, W₁(6,H) = −0.2241, W₁(8,H) = 0, W₁(10,H) = 0, W₁(12,H) = 0, and W₁(14,H) = 0.
The lowpass part of the first stage values is used as the input to the second stage (a = 2). The values of W₂(4n,H) are: W₂(0,H) = −0.5123, W₂(4,H) = −0.1708, W₂(8,H) = 0, and W₂(12,H) = 0. The lowpass values at this stage are the input to the next stage (a = 3) calculation.
Figure 12.11 Daubechies D4 wavelet transform (absolute value) of the signal x (n) = δ(n − 7) using N = 16
signal samples, 0 ≤ n ≤ N − 1 (left). The Daubechies D4 wavelet transform (absolute value) of the signal
x (n) = 2 cos(2π8n/N ) + 1, 0 ≤ n ≤ N − 1, with N = 16 (right).
The inverse matrix for the D4 wavelet transform for a signal with N = 8 samples would be calculated from the lowest level, in this case for a = 2, with the coefficients W₂(0,L), W₂(0,H), W₂(4,L), and W₂(4,H). The lowpass part of the signal at level a = 1 would be reconstructed using
$$\begin{bmatrix}W_1(0,L)\\ W_1(2,L)\\ W_1(4,L)\\ W_1(6,L)\end{bmatrix}=\begin{bmatrix}h_0&h_3&h_2&h_1\\ h_1&-h_2&h_3&-h_0\\ h_2&h_1&h_0&h_3\\ h_3&-h_0&h_1&-h_2\end{bmatrix}\begin{bmatrix}W_2(0,L)\\ W_2(0,H)\\ W_2(4,L)\\ W_2(4,H)\end{bmatrix}.$$
After the lowpass part W₁(0,L), W₁(2,L), W₁(4,L), and W₁(6,L) is reconstructed, it is used with the wavelet coefficients from this stage, W₁(0,H), W₁(2,H), W₁(4,H), and W₁(6,H), to reconstruct the signal as
$$\begin{bmatrix}x(0)\\ x(1)\\ x(2)\\ x(3)\\ x(4)\\ x(5)\\ x(6)\\ x(7)\end{bmatrix}=\begin{bmatrix}h_0&h_3&0&0&0&0&h_2&h_1\\ h_1&-h_2&0&0&0&0&h_3&-h_0\\ h_2&h_1&h_0&h_3&0&0&0&0\\ h_3&-h_0&h_1&-h_2&0&0&0&0\\ 0&0&h_2&h_1&h_0&h_3&0&0\\ 0&0&h_3&-h_0&h_1&-h_2&0&0\\ 0&0&0&0&h_2&h_1&h_0&h_3\\ 0&0&0&0&h_3&-h_0&h_1&-h_2\end{bmatrix}\begin{bmatrix}W_1(0,L)\\ W_1(0,H)\\ W_1(2,L)\\ W_1(2,H)\\ W_1(4,L)\\ W_1(4,H)\\ W_1(6,L)\\ W_1(6,H)\end{bmatrix}. \qquad (12.41)$$
This procedure can be continued for a signal of length N = 16 with one more stage. An additional stage would be added for N = 32, and so on.
Example 12.7. For the wavelet transform from the previous example, find its inverse (reconstruct the signal).
⋆ The inversion is done backwards. From W₃(0,H), W₃(0,L), W₃(8,H), and W₃(8,L) we get the signal s₃(n), that is, W₂(4n,L), as
$$\begin{bmatrix}W_2(0,L)\\ W_2(4,L)\\ W_2(8,L)\\ W_2(12,L)\end{bmatrix}=\begin{bmatrix}h_0&h_3&h_2&h_1\\ h_1&-h_2&h_3&-h_0\\ h_2&h_1&h_0&h_3\\ h_3&-h_0&h_1&-h_2\end{bmatrix}\begin{bmatrix}W_3(0,L)\\ W_3(0,H)\\ W_3(8,L)\\ W_3(8,H)\end{bmatrix}=\begin{bmatrix}h_0&h_3&h_2&h_1\\ h_1&-h_2&h_3&-h_0\\ h_2&h_1&h_0&h_3\\ h_3&-h_0&h_1&-h_2\end{bmatrix}\begin{bmatrix}0.4668\\ -0.1251\\ -0.1132\\ -0.4226\end{bmatrix}=\begin{bmatrix}-0.1373\\ 0.6373\\ 0\\ 0\end{bmatrix}.$$
Then W₂(4n,L) = s₃(n) are used with the wavelet coefficients W₂(4n,H) to reconstruct W₁(2n,L), or s₂(n), using
$$\begin{bmatrix}W_1(0,L)\\ W_1(2,L)\\ W_1(4,L)\\ W_1(6,L)\\ W_1(8,L)\\ W_1(10,L)\\ W_1(12,L)\\ W_1(14,L)\end{bmatrix}=\begin{bmatrix}h_0&h_3&0&0&0&0&h_2&h_1\\ h_1&-h_2&0&0&0&0&h_3&-h_0\\ h_2&h_1&h_0&h_3&0&0&0&0\\ h_3&-h_0&h_1&-h_2&0&0&0&0\\ 0&0&h_2&h_1&h_0&h_3&0&0\\ 0&0&h_3&-h_0&h_1&-h_2&0&0\\ 0&0&0&0&h_2&h_1&h_0&h_3\\ 0&0&0&0&h_3&-h_0&h_1&-h_2\end{bmatrix}\begin{bmatrix}W_2(0,L)\\ W_2(0,H)\\ W_2(4,L)\\ W_2(4,H)\\ W_2(8,L)\\ W_2(8,H)\\ W_2(12,L)\\ W_2(12,H)\end{bmatrix}.$$
The obtained values W₁(n,L), together with the wavelet coefficients W₁(n,H), are used to reconstruct the original signal x(n). The transformation matrix in this case is of order 16 × 16 and is formed using the same structure as the previous transformation matrix.
Although the wavelet realization can be performed using the same basic functions presented in the previous section, here we will consider the equivalent wavelet function h_H(n) and the equivalent scale function h_L(n) in different scales. To this aim, we will analyze the reconstruction part of the system. Assume that in the wavelet analysis of a signal only one coefficient is nonzero. Also assume that this nonzero coefficient is at the output of the all-lowpass branch of the filter structure. It means that the signal is equal to the basic scale function of the wavelet analysis. The scale function can be found in an inverse way, by reconstructing the signal corresponding to this delta-pulse-like transform. The system of reconstruction filters is shown in Fig.12.12. Note that in the Haar transform this case would correspond to the coefficient W₄(0,L) = 1 in (12.21), or in Fig.12.7. The reconstruction process consists of upsampling the signal and passing it through the reconstruction stages. For example, the output of the third reconstruction stage has the z-transform
$$\Phi_2(z)=G_L(z)\,G_L(z^2)\,G_L(z^4),$$
where g_L(n) is the four-sample impulse response (Daubechies D4 coefficients). The duration of the scale function φ₁(n) is (4+3) + 4 − 1 = 10 samples, while the duration of φ₂(n) is 19 + 4 − 1 = 22 samples. The scale functions for different scales a (outputs of different numbers of reconstruction stages) are presented in Fig.12.14. The normalized values φ_a(n)2^{(a+1)/2} are shown; the amplitudes are scaled by 2^{(a+1)/2} in order to keep their values within the same range for various a.
Figure 12.12 The system of reconstruction filters used for the calculation of the scale function: δ(n) is fed to the all-lowpass branch, producing φ₀(n) = h_L(n), φ₁(n), φ₂(n), . . . after the successive upsampling and G_L(z) stages.
In a similar way, the wavelet function ψ(n) is calculated. The mother wavelet is obtained in the wavelet analysis of a signal when only one nonzero coefficient exists, at the highpass output of the lowest level of the signal analysis. To reconstruct the mother wavelet, the reconstruction system shown in Fig.12.13 is used. The values of ψ(n) are calculated using the values of g_H(n) at the first input, upsampling and passing them through the reconstruction stages with g_L(n) to obtain ψ₁(n), and repeating this procedure for the next steps. The resulting z-transform is
$$\Psi(z)=G_H(z)\,G_L(z^2)\,G_L(z^4).$$
In the Haar transform, (12.21) and Fig.12.7, this case would correspond to W₄(0,H) = 1.
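The cascade described in the prose (Figs. 12.12 and 12.13) can be sketched as follows, with the D4 reconstruction filters; each further stage upsamples the current function and filters it with g_L(n):

```python
import numpy as np

def upsample(c):
    """Insert zeros between samples: C(z) -> C(z^2)."""
    u = np.zeros(2 * len(c) - 1)
    u[::2] = c
    return u

s3, s2 = np.sqrt(3), np.sqrt(2)
gL = np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / (4 * s2)
gH = np.array([gL[3], -gL[2], gL[1], -gL[0]])   # g_H(n) = (-1)^n g_L(3 - n)

phi, psi = gL.copy(), gH.copy()                 # phi_0 = h_L, psi_0 = h_H
for _ in range(3):                              # further stages a = 1, 2, 3
    phi = np.convolve(upsample(phi), gL)        # len(phi): 4 -> 10 -> 22 -> ...
    psi = np.convolve(upsample(psi), gL)
```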
Figure 12.13 The system of reconstruction filters used for the calculation of the wavelet function: δ(n) is fed to the highpass input, producing ψ₀(n) = h_H(n), ψ₁(n), ψ₂(n), . . . after the successive upsampling and G_L(z) stages.
The time-domain calculation of the wavelet function in different scales is done in the same way. Different scales of the wavelet function are presented in Fig.12.14, normalized using ψ_a(n)2^{(a+1)/2}. The wavelet functions are orthogonal in different scales, with the corresponding steps, as well. For example, it is easy to show that
$$\left\langle\psi_0(n-2m),\,\psi_1(n)\right\rangle=0,$$
since
$$\left\langle\psi_0(n-2m),\,\psi_1(n)\right\rangle=\sum_{p}g_H(p)\left(\sum_{n}g_H(n-2m)\,g_L(n-2p)\right)=0$$
for any p and m, according to (12.36).
Note that the wavelet and scale functions in the last row are plotted as continuous functions. The continuous wavelet transform (CWT) is calculated by using the discretized versions of the continuous functions. However, in contrast to the discrete wavelet transform, whose steps in time and scale change are strictly defined, the continuous wavelet transform can be used with various steps and scale functions.
Example 12.8. In order to illustrate the procedure, it has been repeated for the Haar wavelet, when g_L(n) = [1 1] and g_H(n) = [1 −1]. The results are presented in Fig.12.15.
Figure 12.14 The Daubechies D4 wavelet scale function and wavelet calculated using the filter bank relation
in different scales: a = 0 (first row), a = 1 (second row), a = 2 (third row), a = 3 (fourth row), a = 10 (fifth
row-approximation of a continuous domain). The amplitudes are scaled by 2(a+1)/2 to keep them within the same
range. Values ψa (n)2(a+1)/2 and φa (n)2(a+1)/2 are presented.
Figure 12.15 The Haar wavelet scale function and wavelet calculated using the filter bank relation in different scales. The values are normalized by 2^{(a+1)/2}.
The results derived for the Daubechies D4 wavelet transform can be extended to higher order polynomial functions. Consider a six-tap FIR system with coefficients h0, h1, . . . , h5. Since the filter length is now 6, two orthogonality conditions must be used, one for the shift of 2 and the other for the shift of 4,

h0h2 + h1h3 + h2h4 + h3h5 = 0
h0h4 + h1h5 = 0.
The linear signal cancellation condition is again used, as in the D4 case (the output of the highpass filter must be zero for a signal with a linear change in time, x(n) = an + b). The final condition in the Daubechies D6 wavelet transform is that the quadratic signal cancellation is achieved for the highpass filter, meaning
d²HL(e^{jω})/dω² |_{ω=π} = d²( ∑_{n=0}^{5} hn e^{jωn} )/dω² |_{ω=π} = −∑_{n=0}^{5} n² hn e^{jωn} |_{ω=π} = 0,

that is, since e^{jπn} = (−1)ⁿ,

−h1 + 2²h2 − 3²h3 + 4²h4 − 5²h5 = 0.
From this set of six equations the Daubechies D6 wavelet transform coefficients are obtained, as one of the possible symmetric solutions of the system. From the definition it is obvious that the highpass coefficients will be zero as long as the signal is of a quadratic nature within the considered interval. These coefficients can therefore be used as a measure of the signal deviation from the quadratic form in each scale.
The implementation is the same as in the case of the Haar or D4 wavelet transform; the only difference is in the filter coefficients.
This form can also be derived from the reconstruction conditions and the fact that the transfer function GL(z) contains a factor of the form (1 + z⁻¹)³, since z = −1 is its third-order zero according to the assumptions.
In the Daubechies D6 wavelet transform the last condition is introduced so that the output of the highpass filter is zero when the input signal is quadratic. Another way to form the filter coefficients for a six-sample wavelet is to introduce the condition that the first moment of the scale function is zero, instead of the second-order moment of the wavelet function. In this case a symmetric form of the coefficients should be used in the definition:
hL(−2) + hL(−1) + hL(0) + hL(1) + hL(2) + hL(3) = √2
hL²(−2) + hL²(−1) + hL²(0) + hL²(1) + hL²(2) + hL²(3) = 1
−2hL(−2) + hL(−1) − hL(1) + 2hL(2) − 3hL(3) = 0
hL(−2)hL(0) + hL(−1)hL(1) + hL(0)hL(2) + hL(1)hL(3) = 0
hL(−2)hL(2) + hL(−1)hL(3) = 0.
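The stated conditions are easy to check numerically against tabulated Daubechies D6 (db3) coefficients. The sketch below is an added illustration; the coefficient values are the standard tabulated ones (normalized so that their sum is √2), and every printed value is zero to within rounding.

```python
import numpy as np

# Tabulated Daubechies D6 (db3) lowpass coefficients, normalized so that their sum is sqrt(2)
h = np.array([0.3326705529509569, 0.8068915093133388, 0.4598775021193313,
              -0.1350110200103908, -0.0854412738822415, 0.0352262918821007])
n = np.arange(6)

print(np.sum(h) - np.sqrt(2))        # normalization: ~0
print(np.sum(h**2) - 1)              # unit energy: ~0
print(np.sum(h[:-2] * h[2:]))        # orthogonality for shift 2: h0h2 + h1h3 + h2h4 + h3h5 ~ 0
print(np.sum(h[:-4] * h[4:]))        # orthogonality for shift 4: h0h4 + h1h5 ~ 0
print(np.sum((-1)**n * h))           # constant signal cancellation at the highpass output: ~0
print(np.sum((-1)**n * n * h))       # linear signal cancellation: ~0
print(np.sum((-1)**n * n**2 * h))    # quadratic signal cancellation: ~0
```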
Originally, the wavelet transform was introduced by Morlet as a frequency-varying STFT. Its aim was to analyze the spectrum of a signal with varying resolution in time and frequency. For the specific seismic signals analyzed, higher resolution in frequency was required at low frequencies, while at high frequencies high resolution in time was the aim.
The Daubechies D4 wavelet/scale function is derived from the condition that the highpass coefficients of a signal with a linear change in time (x(n) = an + b) are zero-valued. Higher order Daubechies wavelet/scale functions are derived by increasing the order of the polynomial signal changes. The frequency of a signal does not play any direct role in the definition of the discrete wavelet transform using Daubechies functions. In this sense, it is easier to relate the wavelet transform to the linear (D4) and higher order interpolations of functions (signals) within intervals of various lengths (corresponding to various wavelet transform scales) than to spectral analysis, where the harmonic basis functions play the central role.
Example 12.9. Consider a signal x (n) with M = 16 samples, 0 ≤ n ≤ M − 1. Write the Daubechies
D4 wavelet transform based decomposition of this signal that will divide the frequency axis into
four equal regions.
⋆ In the STFT, a 4-point (N-point) signal would be used to calculate 4 (or N) coefficients of the frequency plane. The wavelet transform divides the time-frequency plane into two regions (high and low) regardless of the number of the signal values (wavelet transform coefficients) being used. If the Haar wavelet were used in Fig. 12.16, then by dividing both the highpass and the lowpass bands in the same way, the short-time Walsh-Hadamard transform with 4-sample nonoverlapping calculation would be obtained. In the case of the Daubechies D4 wavelet transform, a kind of short-time analysis with the Daubechies functions is obtained. For the Daubechies D4 function, the scale-2 functions shown in Fig. 12.17 would be used to calculate W(4n, 0), W(4n, 1), W(4n, 2), and W(4n, 3). The asymmetry of the frequency regions is visible in Fig. 12.17.
Figure 12.16 Full coverage of the time-frequency plane using the filter bank calculation and systems with impulse
responses corresponding to the wavelet transformation.
Note that the STFT analysis of this case, with a Hann(ing) window of N = 8 and a calculation step R = 4, will result in the same number of time instants; however, the frequency range will be divided into 8 regions, giving a finer grid. This grid is redundant with respect to the signal and to the wavelet transform, since both the signal and the wavelet transform have 16 values (coefficients).
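As an added illustration of this full-tree splitting (not part of the original example), the following Python sketch filters an M = 16 sample test signal with the D4 lowpass and highpass filters, downsamples by 2, and splits both resulting bands once more, dividing the frequency axis into four regions with four coefficients each; circular convolution is assumed for the finite-length signal, and the helper analysis_stage is introduced here for the sketch.

```python
import numpy as np

s3 = np.sqrt(3)
gL = np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / (4 * np.sqrt(2))   # D4 lowpass
gH = gL[::-1] * np.array([1, -1, 1, -1])                             # D4 highpass

def analysis_stage(x, g):
    """Circular convolution with g followed by downsampling by 2."""
    N = len(x)
    y = np.array([sum(g[m] * x[(n - m) % N] for m in range(len(g))) for n in range(N)])
    return y[::2]

rng = np.random.default_rng(0)
x = rng.standard_normal(16)            # a test signal with M = 16 samples

low = analysis_stage(x, gL)            # 8 lowpass coefficients
high = analysis_stage(x, gH)           # 8 highpass coefficients
# Splitting both bands once more divides the frequency axis into four regions,
# giving the coefficients W(4n, 0), ..., W(4n, 3), four in each band
bands = [analysis_stage(low, gL), analysis_stage(low, gH),
         analysis_stage(high, gL), analysis_stage(high, gH)]
print([len(b) for b in bands])         # [4, 4, 4, 4] -> 16 coefficients in total
```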
Figure 12.17 Daubechies functions: scaling function (first row), mother wavelet function (second row), function producing the low-frequency part in the second stage of the high-frequency part in the first stage (third row), function producing the high-frequency part in the second stage of the high-frequency part in the first stage (fourth row). Time-domain forms of the functions are shown on the left, while their spectral content is shown on the right.
Part VI
Chapter 13
Sensing of Sparse Signals
Authors: Ljubiša Stanković, Miloš Daković, Srdjan Stanković, Irena Orović
A discrete-time signal can be transformed into various domains using different signal trans-
formations. Some signals that cover the whole considered interval in one domain could have only
a few nonzero coefficients in a transformation domain. These signals are sparse in the considered
transformation domain. An observation or measurement of a sparse signal is a linear combination
of the sparsity domain coefficients. Since the signal samples are linear combinations of the signal
transform coefficients they can be considered as the measurements of a sparse signal in the respective
transformation domain.
Compressive sensing is a field dealing with a model for data acquisition, including the problem of sparse signal recovery from a reduced set of measurements. A reduced set of measurements can be a result of a desire to sense a sparse signal with the lowest possible number of measurements/observations (compressive sensing). It can also be a result of physical or measurement constraints that make the complete set of measurements unavailable. In applications it could also happen that some arbitrarily positioned samples of a signal are so heavily corrupted by disturbances that it is better to omit them, consider them as unavailable in the analysis, and try to reconstruct the signal from the reduced set of samples. Although in the first case the reduced set of measurements is a result of the user's strategy to compress the information, while in the other two cases it is not a result of the user's intention, all of these cases can be considered within a unified framework. Under some conditions, a full reconstruction of a sparse signal can be obtained from a reduced set of measurements/samples, as if the complete set of measurements/samples were available. A priori information about the sparse nature of the analyzed signal in a known transformation domain must be used in this analysis. Sparsity is the main requirement that should be satisfied in order to efficiently apply the compressive sensing methods for sparse signal reconstruction.
The topic of this chapter is the analysis of signals that are sparse in one of their transformation domains. The DFT will be used as a case study. The compressive sensing results and algorithms are presented and used as a tool to solve engineering problems involving sparse signals.
Before we start the analysis, we will describe two simple examples that can be interpreted and solved within the context of sparse signal processing and compressive sensing.
Consider a large set of real numbers X (0), X (1), . . . , X ( N − 1). Assume that only one of them
is nonzero (or different from a common and known expected value). We do not know either its position
or its value. The aim is to find the position and the value of this nonzero number. The nonzero-valued sample will be denoted by X(i). A direct way to find the position of the nonzero sample would be to perform up to N measurements and to check which sample assumes a nonzero value. However, if N is very large and there is only one nonzero sample, we can get the result using just a few measurements. A procedure that solves the problem with a reduced number of measurements is described next.
Take random numbers as weighting coefficients ak(0), k = 0, 1, 2, . . . , N − 1, one for each sample. Measure the total value of all N weighted samples. Since only one of the samples is different from zero, we will get the measurement

y(0) = ∑_{k=0}^{N−1} ak(0) X(k) = ai(0) X(i).    (13.1)
The same value would be obtained if there were only one sample different from the common and known expected value m of all other samples. Then the total measured value is

M = a1 m + a2 m + · · · + ai (m + X(i)) + · · · + aN m.    (13.2)
In the space of unknowns (variables) X(0), X(1), . . . , X(N − 1), this equation represents a hyperplane. We know that only one unknown X(k) is nonzero, at an unknown position k = i. The cross-section of the hyperplane (13.2) with any of the coordinate axes could be a solution to our problem, Fig. 13.2(a). Assuming that a single X(k) is nonzero, a solution will exist for any k. Thus, one measurement produces a set of N possible single nonzero values,

X(k) = y(0)/ak(0), k = 0, 1, . . . , N − 1.

As expected, from one measurement we are not able to solve the problem (to find the position and the value of the one nonzero sample).
If we perform one more measurement, with another set of weighting coefficients ak(1), k = 0, 1, . . . , N − 1, and get the measurement y(1) = X(i) ai(1), the result will be another hyperplane, Fig. 13.2(b),

y(1) = ∑_{k=0}^{N−1} X(k) ak(1).    (13.3)
This measurement produces a new set of possible solutions for each X(k), defined by X(k) = y(1)/ak(1), k = 0, 1, . . . , N − 1. If these two hyperplanes (sets of possible solutions) produce only one common value, at k = i, then this value is the unique solution of our problem.
Figure 13.1 There are N bags with coins. One of them, at an unknown position, contains false coins. False coins differ from the true ones in mass by an unknown X(i) = ∆m. The mass of the true coins is m. A set of coins for the measurement is formed using a1 coins from the first bag, a2 coins from the second bag, and so on. The total measured value is M = a1 m + · · · + ai (m + X(i)) + · · · + aN m. The difference of this value from the total mass if all coins were true is M − MT. The equations for the cases with one and two bags of false coins are shown. The notation ak(0) = ak+1, for k = 0, 1, . . . , N − 1, is used in this illustration.
Figure 13.2 The solution illustration for N = 3, K = 1, and various possible cases: (a) Three possible solutions
for one measurement plane. (b) Unique solution for two measurement planes. (c) Two possible solutions for two
measurement planes.
Example 13.1. Consider a set of N = 5 bags of coins. In one of them all coins are false. The weight of the true coins is m = 2.
In the first measurement we use ak(0) = k coins from the kth bag. The total weight of the coins in this measurement is M = 31. This weight satisfies (1 + 2 + 3 + 4 + 5) · 2 + iX(i) = M, where X(i) is the unknown weight difference of the false coins. It means that iX(i) = 1, since all true coins would produce M = (1 + 2 + 3 + 4 + 5) · 2 = 30. If the false coins were in the first bag, their weight difference would be X(1) = 1/1 = 1; if they were in the second bag, then X(2) = 1/2; and so on, X(3) = 1/3, X(4) = 1/4, X(5) = 1/5. The false coins can be in any of the five bags.
Let us perform one more measurement, with ak(1) = k² coins from each bag. Assume that we get the total measured weight M = 113. It is equal to M = 2(1² + 2² + 3² + 4² + 5²) + i²X(i) = 113. Obviously, i²X(i) = 3. Again, if the false coins were in the first bag then X(1) = 3/1 = 3, the second bag would produce X(2) = 3/2² = 3/4, and so on, X(3) = 3/3² = 1/3, X(4) = 3/4² = 3/16, X(5) = 3/5² = 3/25.
The common solution for both sets is X(3) = 1/3. Thus, the false coins are in the third bag. Their weight difference from the true coins is 1/3.
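The two weighings of Example 13.1 can be reproduced with a short Python sketch (added here for illustration); the solver sees only the two measured totals and intersects the two candidate sets.

```python
import numpy as np

N, m = 5, 2                  # five bags, true coin weight m = 2
i_true, dm = 3, 1/3          # false coins in bag 3 with weight difference 1/3 (unknown to the solver)

k = np.arange(1, N + 1)
a0, a1 = k, k**2             # coins taken per bag in the first and second weighing

M0 = np.sum(a0 * m) + a0[i_true - 1] * dm    # measured total: 31
M1 = np.sum(a1 * m) + a1[i_true - 1] * dm    # measured total: 113

# Candidate X(i) from each weighing, assuming the false coins are in bag i
X0 = (M0 - np.sum(a0 * m)) / a0              # [1, 1/2, 1/3, 1/4, 1/5]
X1 = (M1 - np.sum(a1 * m)) / a1              # [3, 3/4, 1/3, 3/16, 3/25]

common = np.isclose(X0, X1)                  # the only common candidate
print("false coins in bag", k[common][0], "with weight difference", X0[common][0])
```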
The solution is unique if

ai(0) ak(1) − ai(1) ak(0) ≠ 0

for all i ≠ k. It also means that rank(A2) = 2 for all 2 × 2 submatrices, denoted by A2, of the measurement matrix A defined by (13.4).
In order to prove this statement, assume that two different solutions X(i) and X(k), for the case of one nonzero coefficient, satisfy the same measurement hyperplane equations (proof by contradiction). Then
ai(0) X(i) = ak(0) X(k)

and

ai(1) X(i) = ak(1) X(k).

Dividing these two equations results in

ai(0)/ai(1) = ak(0)/ak(1),

or ai(0) ak(1) − ai(1) ak(0) = 0. This is contrary to the assumption that ai(0) ak(1) − ai(1) ak(0) ≠ 0.
The same conclusion can be made considering the matrix form of the relations for X(i) and X(k). If both of them were to satisfy the same two measurements, then

[ y(0) ]   [ ai(0)  ak(0) ] [ X(i) ]
[ y(1) ] = [ ai(1)  ak(1) ] [  0   ]

and

[ y(0) ]   [ ai(0)  ak(0) ] [  0   ]
[ y(1) ] = [ ai(1)  ak(1) ] [ X(k) ].    (13.5)
Subtraction of the previous matrix equations results in

[ ai(0)  ak(0) ] [  X(i)  ]
[ ai(1)  ak(1) ] [ −X(k) ] = 0.
If ai(0) ak(1) − ai(1) ak(0) ≠ 0 is satisfied, then the trivial solution to the problem, X(i) = X(k) = 0, follows. Therefore, two different nonzero solutions X(i) and X(k) cannot exist in this case.
The previous experiment can be repeated assuming two nonzero values X(i) and X(k), Fig. 13.1 (second option). In the case of two nonzero elements in vector X, two measurements,

y(0) = ∑_{l=0}^{N−1} X(l) al(0) = X(i) ai(0) + X(k) ak(0)    (13.6)
y(1) = ∑_{l=0}^{N−1} X(l) al(1) = X(i) ai(1) + X(k) ak(1),
will result in X(i) and X(k) for any assumed i and k, i ≠ k, since they are the solution of a system of two equations with two unknowns. Therefore, with two measurements we cannot solve the problem and find the positions and the values of two nonzero coefficients. If two more measurements are performed, then an additional system of two equations,

y(2) = X(i) ai(2) + X(k) ak(2)    (13.7)
y(3) = X(i) ai(3) + X(k) ak(3),

is formed. The two systems of two equations, (13.6) and (13.7), can be solved to find X(i) and X(k) for each combination of i and k. If these two systems produce only one common solution pair X(i) and X(k), then this pair is the unique solution to our problem. As in the case of one nonzero coefficient, we may show that a sufficient condition for the unique solution is
det
[ ak1(0)  ak2(0)  ak3(0)  ak4(0) ]
[ ak1(1)  ak2(1)  ak3(1)  ak4(1) ]
[ ak1(2)  ak2(2)  ak3(2)  ak4(2) ]
[ ak1(3)  ak2(3)  ak3(3)  ak4(3) ]
≠ 0    (13.8)
for all combinations of k1, k2, k3, and k4, or rank(A4) = 4 for all A4, where A4 is a 4 × 4 submatrix of the measurement matrix A defined, in this case, as

A =
[ a0(0)  a1(0)  . . .  aN−1(0) ]
[ a0(1)  a1(1)  . . .  aN−1(1) ]
[ a0(2)  a1(2)  . . .  aN−1(2) ]
[ a0(3)  a1(3)  . . .  aN−1(3) ].    (13.9)
Suppose that (13.8) holds and that two pairs of solutions of the problem, X(k1), X(k2) and X(k3), X(k4), exist. Then

[ y(0) ]   [ ak1(0)  ak2(0)  ak3(0)  ak4(0) ] [ X(k1) ]
[ y(1) ] = [ ak1(1)  ak2(1)  ak3(1)  ak4(1) ] [ X(k2) ]
[ y(2) ]   [ ak1(2)  ak2(2)  ak3(2)  ak4(2) ] [   0   ]
[ y(3) ]   [ ak1(3)  ak2(3)  ak3(3)  ak4(3) ] [   0   ]

and

[ y(0) ]   [ ak1(0)  ak2(0)  ak3(0)  ak4(0) ] [   0   ]
[ y(1) ] = [ ak1(1)  ak2(1)  ak3(1)  ak4(1) ] [   0   ]
[ y(2) ]   [ ak1(2)  ak2(2)  ak3(2)  ak4(2) ] [ X(k3) ]
[ y(3) ]   [ ak1(3)  ak2(3)  ak3(3)  ak4(3) ] [ X(k4) ].
Subtracting these two systems we get

[ ak1(0)  ak2(0)  ak3(0)  ak4(0) ] [  X(k1) ]
[ ak1(1)  ak2(1)  ak3(1)  ak4(1) ] [  X(k2) ] = 0.
[ ak1(2)  ak2(2)  ak3(2)  ak4(2) ] [ −X(k3) ]
[ ak1(3)  ak2(3)  ak3(3)  ak4(3) ] [ −X(k4) ]
Since (13.8) holds, it follows that X(k1) = X(k2) = X(k3) = X(k4) = 0, meaning that two independent pairs of solutions with two nonzero coefficients cannot exist if (13.8) holds.
The presented approach to solving the problem (and to checking the solution uniqueness) is illustrative, but computationally infeasible. For example, in a simple case with N = 1024 and just two nonzero coefficients, we would have to solve the systems of equations (13.6) and (13.7) for each possible combination of i and k and to compare their solutions. The total number of combinations of two out of N indices is

(N choose 2) = N(N − 1)/2 ∼ 5 × 10⁵.

In order to check the solution uniqueness we should calculate a determinant value for all combinations of four indices k1, k2, k3, k4 out of the set of N indices. The number of determinants is (N choose 4) ∼ 10¹⁰. If one determinant of the fourth order is calculated in 10⁻⁵ seconds, then more than 5 days are needed to calculate all the determinants for this quite simple case of two nonzero coefficients.
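These counts are straightforward to reproduce (an added sketch; the 10⁻⁵-second figure per determinant is the assumption used in the text):

```python
from math import comb

N = 1024
print(comb(N, 2))                            # 523776, about 5e5 candidate index pairs
print(comb(N, 4))                            # about 4.6e10 fourth-order determinants to check
print(comb(N, 4) * 1e-5 / 86400, "days")     # more than 5 days at 1e-5 s per determinant
```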
As the second illustrative example, consider a signal described by a weighted sum of K harmonics from the set of possible discrete oscillatory functions e^{j2πkn/N}, k = 0, 1, 2, . . . , N − 1,

x(n) = ∑_{i=1}^{K} Ai e^{j2πki n/N},

with K ≪ N. This signal is sparse in the DFT domain. Its DFT X(k) assumes only a few nonzero values, at k = ki, i = 1, 2, . . . , K.
In classical signal processing, this signal is described by a full set of N signal samples/measurements x(n) at n = 0, 1, 2, . . . , N − 1.
However, if we know that the signal consists of only K ≪ N discrete oscillatory functions with
unknown amplitudes and frequency indices k i , then regardless of their frequencies, the signal can be
fully reconstructed from a reduced set of signal samples. As in the first illustrative example, a signal sample at an arbitrary instant n1 can be considered as a weighted measurement of the sparse coefficients X(k),

y(0) = x(n1) = ∑_{k=0}^{N−1} X(k) ψk(n1) = ∑_{k=0}^{N−1} X(k) ak(0),

with the weighting factors ψk(n1) = e^{j2πn1k/N}/N = ak(0). The previous relation is the inverse DFT.
Now an analysis similar to that in the first illustrative example can be performed, assuming, for example, K = 1 or K = 2. We can find the positions and the values of the nonzero coefficients X(k) using just a few signal samples/measurements y(i). When the nonzero coefficient positions and their values are recovered, the whole DFT, and hence the signal, is recovered.
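For K = 1, the search described above reduces to a few lines. The sketch below is an added illustration with arbitrarily chosen N, k, and sample positions: it recovers the position and the value of the single nonzero DFT coefficient from only two signal samples. The two sample positions are chosen so that their difference is coprime with N, which guarantees a unique consistent candidate.

```python
import numpy as np

N = 64
k_true, X_true = 17, 2.5                     # single nonzero DFT coefficient (unknown to the solver)
n = np.arange(N)
x = X_true * np.exp(1j * 2 * np.pi * k_true * n / N) / N   # inverse DFT of the 1-sparse X

n1, n2 = 3, 10                               # the only two available samples (measurements)
y0, y1 = x[n1], x[n2]

k = np.arange(N)
a0 = np.exp(1j * 2 * np.pi * n1 * k / N) / N # weights of the first measurement
a1 = np.exp(1j * 2 * np.pi * n2 * k / N) / N # weights of the second measurement

X_cand = y0 / a0                             # candidate X(k) consistent with the first sample
k_hat = np.argmin(np.abs(X_cand * a1 - y1))  # keep the one consistent with the second sample
print(k_hat, np.round(X_cand[k_hat], 10))    # 17 (2.5+0j)
```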
This model corresponds to many signals in real life. For example, in Doppler-radar systems the speed of a radar target is transformed into the frequency of a sinusoidal signal. Since the returned signal contains only one or just a few targets, the signal representing the target velocity is sparse in the DFT domain. It can be reconstructed from fewer samples than the total number N of radar return signal samples.
After the basic notions have been introduced through the illustrative examples in the previous section, here we provide formal definitions of the key concepts in sparse signal processing and compressive sensing.
13.2.1 Sparsity
A signal is sparse in a transformation domain if its transform X(k) has only a few nonzero values, that is, if

X(k) = 0

for k ∉ {k1, k2, . . . , kK} = K, where the sparsity support set K is a subset of all possible indices, and

‖X‖₀ = card{K} = K,

where card{K} is the notation for the number of elements in K. Counting the nonzero elements in a signal representation X can be achieved using the so-called ℓ0-norm

‖X‖₀ = ∑_{k=0}^{N−1} |X(k)|⁰.

This function is referred to as the ℓ0-norm (norm-zero) although it does not satisfy the norm properties (‖cX‖₀ = ‖X‖₀ ≠ c‖X‖₀ for an arbitrary constant c). By definition, |X(k)|⁰ = 0 for |X(k)| = 0 and |X(k)|⁰ = 1 for |X(k)| ≠ 0.
A signal is sparse in the considered transformation domain if card{K} = K ≪ N.
Example 13.2. Consider two sets of sparse numbers, X(k) and H(k), k = 0, 1, . . . , N − 1, written in vector notation as X and H. Show that the sparsity of the sum of these numbers is not greater than the sum of their individual sparsities,

‖H + X‖₀ ≤ ‖H‖₀ + ‖X‖₀.    (13.10)

⋆ Assume that the sparsity support of X is KX and the sparsity support of H is KH. We can distinguish the following cases:
- If KX ∩ KH = ∅, then the number of nonzero numbers in X(k) + H(k) is equal to the sum of the numbers of nonzero elements in X(k) and H(k), and ‖H + X‖₀ = ‖H‖₀ + ‖X‖₀.
- If KX ∩ KH ≠ ∅, then the number of nonzero numbers in X(k) + H(k) is always smaller than the sum of the numbers of nonzero elements in X(k) and H(k), because overlapping indices are counted once and some elements may cancel out. Then ‖H + X‖₀ < ‖H‖₀ + ‖X‖₀.
Inequality (13.10) follows from these two cases.
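A quick numerical confirmation of (13.10), an added sketch with arbitrary example vectors:

```python
import numpy as np

def l0(X):
    """Number of nonzero elements (the so-called l0-norm)."""
    return np.count_nonzero(X)

X = np.array([0, 0, 3.0, 0, -1.0, 0, 0, 0])
H = np.array([0, 2.0, -3.0, 0, 0, 0, 0, 0])   # overlaps X at k = 2 and cancels it there

print(l0(X), l0(H), l0(X + H))                # 2, 2, 2
print(l0(X + H) <= l0(X) + l0(H))             # True, as in (13.10)
```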
13.2.2 Measurements
A linear combination of the sparsity domain coefficients X(k),

y(n) = ∑_{k=0}^{N−1} ak(n) X(k),

is called a measurement, with the weighting coefficients (weights) denoted by ak(n). The measurements can be written in the form of a system of M equations,

[ y(0)   ]   [ a0(0)     a1(0)     . . .  aN−1(0)   ] [ X(0)   ]
[ y(1)   ] = [ a0(1)     a1(1)     . . .  aN−1(1)   ] [ X(1)   ]
[  ...   ]   [  ...       ...              ...      ] [  ...   ]
[ y(M−1) ]   [ a0(M−1)   a1(M−1)   . . .  aN−1(M−1) ] [ X(N−1) ]    (13.12)

or y = AX, where A is the M × N measurement matrix.
Figure 13.3 Principle of compressive sensing. The short and wide measurement matrix A maps the original
N-dimensional K-sparse vector, X, to an M-dimensional dense vector of measurements, y, with M < N and
K ≪ N. In our case N = 14, M = 7, and K = 2.
The fact that the signal is sparse, with X(k) = 0 for k ∉ {k1, k2, . . . , kK} = K, is not included in the measurement matrix A, since the positions of the nonzero values are unknown. If the knowledge that X(k) = 0 for k ∉ {k1, k2, . . . , kK} = K were included, then a reduced measurement matrix would be obtained as

[ y(0)   ]   [ ak1(0)     ak2(0)     . . .  akK(0)   ] [ X(k1) ]
[ y(1)   ] = [ ak1(1)     ak2(1)     . . .  akK(1)   ] [ X(k2) ]
[  ...   ]   [  ...        ...               ...     ] [  ...  ]
[ y(M−1) ]   [ ak1(M−1)   ak2(M−1)   . . .  akK(M−1) ] [ X(kK) ]    (13.13)

or

y = AK XK.
The M × K matrix AK would be formed if we knew the positions of the nonzero samples, k ∈ {k1, k2, . . . , kK} = K. It follows from the measurement matrix A by omitting the columns corresponding to the zero-valued elements X(k). The vector XK consists of the assumed nonzero elements X(k).
Assuming that there are K nonzero elements X(k), the total number of possible different matrices AK is equal to the number of combinations of K out of N positions, that is, (N choose K). This matrix will play an important role in the analysis that follows.
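The measurement relation y = AX and the reduced system y = AK XK can be illustrated with the dimensions of Fig. 13.3 (N = 14, M = 7, K = 2). The sketch below is an added illustration that assumes a random Gaussian measurement matrix; when the support is known, the K unknowns follow from the overdetermined system in the least-squares sense.

```python
import numpy as np

rng = np.random.default_rng(1)
N, M, K = 14, 7, 2
support = np.array([3, 11])               # positions of the nonzero coefficients
X = np.zeros(N)
X[support] = [1.5, -0.8]

A = rng.standard_normal((M, N))           # M x N measurement matrix (random weights assumed)
y = A @ X                                 # M measurements of the K-sparse vector

# If the support were known, keep only the corresponding columns of A ...
AK = A[:, support]                        # M x K reduced measurement matrix
# ... and solve the overdetermined system y = AK XK in the least-squares sense
XK, *_ = np.linalg.lstsq(AK, y, rcond=None)
print(XK)                                 # [1.5, -0.8] recovered exactly (noise-free case)
```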
In signal processing, the sparsity domain is commonly one of the signal transformation domains. For a
linear signal transform X = Φx and its inverse transform x = ΨX, the signal samples are

x(n) = ∑_{k=0}^{N−1} X(k) ψk(n),
σ = 0.1 are assumed. The assumed threshold for considering hyperparameters extremely large is Th = 100. Hyperparameters above this threshold are omitted from the calculation (along with the corresponding values in X, A, D, and V). The results for the estimated mean value V in the first iteration are shown in Fig. 14.38(c), along with the values of the hyperparameters in Fig. 14.38(d). The hyperparameters whose value is above Th are omitted (pruned), along with the corresponding values at the same positions in all other matrices. The values of the remaining hyperparameters in the second iteration are shown in Fig. 14.38(e). After the elimination of the hyperparameters above the threshold, the third iteration is calculated with the remaining positions of the hyperparameters. In this iteration all hyperparameters, except those whose values are close to one, are eliminated, Fig. 14.38(f). The remaining positions after this iteration correspond to the positions of the nonzero elements X(ki), i = 1, 2, . . . , K, with the corresponding pruned matrices ΣK, AK, DK. The values of X(ki) are estimated using VK from

VK = ΣK AK^T y/σ² = (AK^T AK + σ² DK)⁻¹ AK^T y

in the final iteration. If the measurements were noise-free, this would be an exact recovery. The values of the estimated X(ki), i = 1, 2, . . . , K, are shown in Fig. 14.38(g). The diagonal values of ΣK are the variances of the estimates of X(ki).
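The final-iteration estimate can be illustrated with a small sketch (added here; the matrices are filled with arbitrary illustrative values rather than those of Fig. 14.38):

```python
import numpy as np

rng = np.random.default_rng(2)
M, K = 32, 3
sigma = 0.1

AK = rng.standard_normal((M, K))         # pruned measurement matrix (illustrative values)
X_true = np.array([1.0, -0.6, 0.4])      # remaining nonzero coefficients
y = AK @ X_true + sigma * rng.standard_normal(M)
DK = np.eye(K)                           # remaining hyperparameters, close to one after pruning

SigmaK = np.linalg.inv(AK.T @ AK / sigma**2 + DK)   # posterior covariance
VK = SigmaK @ AK.T @ y / sigma**2                   # equals (AK^T AK + sigma^2 DK)^(-1) AK^T y
print(VK)                                # close to X_true; diag(SigmaK) gives the variances
```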
About the Author
Ljubiša Stanković was born in Montenegro on June 1, 1960. He received a BSc degree in electrical engineering from the University of Montenegro in 1982, with an award as the best student at the University. As a student, he won several competitions in mathematics in Montenegro and the former Yugoslavia. He received an MSc degree in communications from the University of Belgrade and a PhD degree in the theory of electromagnetic waves from the University of Montenegro in 1988. As a Fulbright grantee, he spent the 1984-1985 academic year at the Worcester Polytechnic Institute, Worcester, Massachusetts. Since 1982, he has been on the faculty at the University of Montenegro, where he has been a full professor since 1995.
In 1997-1999, he was on leave at the Ruhr University Bochum, Germany, supported by the
Alexander von Humboldt Foundation. At the beginning of 2001, he was at the Technische Universiteit
Eindhoven.
From 2003 to 2008, Stanković was the rector of the University of Montenegro. He was the ambassador of Montenegro to the United Kingdom, Iceland, and Ireland from 2011 to 2015. During his stay in the United Kingdom he was a visiting academic at Imperial College London in 2013-2014.
His current interests are in signal processing. He has published about 500 technical papers, more than 170 of them in leading journals.
Stanković received the highest state award of Montenegro in 1997 for scientific achievements.
Stanković was an associate editor of the IEEE Transactions on Image Processing, an associate editor of
the IEEE Signal Processing Letters, an associate editor of the IEEE Transactions on Signal Processing,
and an associate editor of the IET Signal Processing.
Stanković is a member of the Editorial Board of Signal Processing (Elsevier), associate editor of
the CN Computer Sciences (Springer Nature), deputy editor of the IET Signal Processing, and a senior
area editor of the IEEE Transactions on Image Processing.
He has been a member of the National Academy of Sciences and Arts of Montenegro (CANU) since 1996 and its vice-president since 2015; he is also a member of the Academia Europaea and of the European Academy of Sciences and Arts. Stanković is a Fellow of the IEEE for contributions to time-frequency signal analysis.
Stanković (with coauthors) won the Best Paper Award of the European Association for Signal Processing (EURASIP) for 2017, for a paper published in Signal Processing.
For bibliographic data and copies of the published papers, see www.tfsa.ac.me.