AN INTRODUCTION TO DIGITAL
COMMUNICATIONS
Armand M. Makowski
1997-2011
© by Armand M. Makowski
Department of Electrical and Computer Engineering, and Institute for Systems Research, University of Maryland, College Park, MD 20742. E-mail: [email protected]. Phone: (301) 405-6844
Part I
Preliminaries
Chapter 1
Decision Theory
In this chapter we present the basic ideas of statistical decision theory that will
be used repeatedly in designing optimal receivers in a number of settings. These
design problems can all be reduced to problems of M-ary hypothesis testing which
we investigate below in generic form.
$p_m := P[H = m], \quad m = 1, \ldots, M.$
$P[X \le x \mid H = m] = F_m(x), \quad x \in \mathbb{R}^d.$
$P[X \le x, H = m] = p_m F_m(x), \quad x \in \mathbb{R}^d, \ m = 1, \ldots, M.$
for all $k, m = 1, \ldots, M$. The use of any admissible rule d in $\mathcal{D}$ thus incurs a cost $C(H, d(X))$. However, the value of the cost $C(H, d(X))$ is not available to the decision-maker³ and attention focuses instead on the expected cost $J : \mathcal{D} \to \mathbb{R}$ defined by
$J(d) := E\left[C(H, d(X))\right], \quad d \in \mathcal{D}.$
The Bayesian M-ary hypothesis testing problem $(P_B)$ is now formulated as that of finding an admissible rule $d^\star$ in $\mathcal{D}$ such that
$J(d^\star) \le J(d), \quad d \in \mathcal{D}.$
Versions of the problem with cost (1.1)-(1.2) will be extensively discussed in this text. The remainder of the discussion assumes this cost structure.
³ Indeed the value of H is not known, and in fact needs to be estimated!
$\Delta_m \cap \Delta_k = \emptyset, \quad k \ne m, \ k, m = 1, \ldots, M$
with
$\mathbb{R}^d = \cup_{m=1}^M \Delta_m.$
$d(x) = m \quad \text{if } x \in \Delta_m, \quad m = 1, \ldots, M.$
with tie breakers if necessary. For the sake of concreteness, ties are broken according to the lexicographic order, i.e., if at point x it holds that $p_i f_i(x) = p_j f_j(x)$ (both being largest) for distinct values i and j, then x will be assigned to $\Delta^\star_i$ if $i < j$. With such precautions, these sets form a partition $(\Delta^\star_1, \ldots, \Delta^\star_M)$ of $\mathbb{R}^d$, and the detector $d^\star : \mathbb{R}^d \to \mathcal{H}$ associated with this partition takes the form
(1.4) $\widehat{H} = m$ iff $p_m f_m(x)$ largest
with the interpretation that upon collecting the observation vector x, the detector $d^\star$ selects the state of nature m as its estimate on the basis of x.
implies
$\int_{\mathbb{R}^d} f(x)\, dx \le \sum_{m=1}^M p_m \int_{\mathbb{R}^d} f_m(x)\, dx = \sum_{m=1}^M p_m = 1,$
and the function f is indeed integrable over all of $\mathbb{R}^d$. This fact will be used without further mention in the discussion below to validate some of the manipulations involving integrals.
For any partition $(\Delta_1, \ldots, \Delta_M)$ of $\mathbb{R}^d$, we need to show that
(1.5) $F(\Delta^\star_1, \ldots, \Delta^\star_M) - F(\Delta_1, \ldots, \Delta_M) \ge 0,$
where
$F(\Delta^\star_1, \ldots, \Delta^\star_M) - F(\Delta_1, \ldots, \Delta_M) = \sum_{m=1}^M \left( \int_{\Delta^\star_m} p_m f_m(x)\, dx - \int_{\Delta_m} p_m f_m(x)\, dx \right)$
$\ge \sum_{m=1}^M \left( \int_{\Delta^\star_m} f(x)\, dx - \int_{\Delta_m} f(x)\, dx \right) = \sum_{m=1}^M \int_{\Delta^\star_m} f(x)\, dx - \sum_{m=1}^M \int_{\Delta_m} f(x)\, dx$
$= \int_{\mathbb{R}^d} f(x)\, dx - \int_{\mathbb{R}^d} f(x)\, dx = 0,$
since $p_m f_m(x) = f(x)$ on $\Delta^\star_m$ while $p_m f_m(x) \le f(x)$ everywhere.
The MAP detector With the usual caveat on tie breakers, the definition (1.3) of the optimal detector $d^\star$ yields
Choose $\widehat{H} = m$ iff $p_m f_m(x)$ largest
iff $\frac{p_m f_m(x)}{\sum_{k=1}^M p_k f_k(x)}$ largest
iff $P[H = m \mid X = x]$ largest
where the last equivalence follows from Bayes' Theorem in the form
$P[H = m \mid X = x] = \frac{p_m f_m(x)}{\sum_{k=1}^M p_k f_k(x)}, \quad x \in \mathbb{R}^d.$
Since the logarithm is strictly increasing, this prescription can equivalently be written as
Choose $\widehat{H} = m$ iff $\log\left(p_m f(x \mid H = m)\right)$ largest.
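For concreteness, the MAP prescription above is easy to mechanize. The short Python sketch below is only an illustration: the priors, the scalar Gaussian likelihoods and all numerical values are invented, and ties are broken toward the smallest index as in the lexicographic convention adopted earlier.

import numpy as np

def map_detect(x, priors, likelihoods):
    """Return the index m maximizing p_m * f_m(x) (the MAP estimate).

    priors      -- array of prior probabilities p_m
    likelihoods -- list of callables f_m giving the conditional density at x
    np.argmax returns the smallest maximizing index, matching the
    lexicographic tie-breaker used in the text.
    """
    scores = np.array([p * f(x) for p, f in zip(priors, likelihoods)])
    return int(np.argmax(scores))

# Illustration: M = 3 hypotheses, scalar Gaussian observations with unit variance
# and (made-up) means -1, 0, 2.
means = [-1.0, 0.0, 2.0]
priors = np.array([0.2, 0.5, 0.3])
likelihoods = [lambda x, m=m: np.exp(-0.5 * (x - m) ** 2) / np.sqrt(2 * np.pi)
               for m in means]

for x in (-1.2, 0.1, 1.4):
    print(x, "->", map_detect(x, priors, likelihoods))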
Uniform prior and the ML detector There is one situation of great interest,
from both practical and theoretical viewpoints, where further simplifications are
achieved in the structure of the optimal detector. This occurs when the rv H is
uniformly distributed over H, namely
(1.6) $P[H = m] = \frac{1}{M}, \quad m = 1, \ldots, M.$
In that case the prior contributes the same factor $\frac{1}{M}$ to each hypothesis, and the optimal detector reduces to the Maximum Likelihood (ML) detector
Choose $\widehat{H} = m$ iff $f_m(x)$ largest.
(1.8) $X = \mu_H + V$
where the rvs H and V are assumed to be mutually independent rvs distributed as before. Under this observation model, for each $m = 1, \ldots, M$, $F_m$ admits the density
(1.9) $f_m(x) = \frac{1}{\sqrt{(2\pi)^d \det(\Sigma)}}\, e^{-\frac{1}{2}(x - \mu_m)'\Sigma^{-1}(x - \mu_m)}, \quad x \in \mathbb{R}^d.$
We note that
Choose $\widehat{H} = m$ iff $2\log p_m - (x - \mu_m)'\Sigma^{-1}(x - \mu_m)$ largest.
Under uniform prior, this MAP detector becomes the ML detector and takes the form
Choose $\widehat{H} = m$ iff $(x - \mu_m)'\Sigma^{-1}(x - \mu_m)$ smallest.
The form of the MAP detector given above very crisply illustrates how the prior information $(p_m)$ on the hypothesis is modified by the posterior information collected through the observation vector x. Indeed, at first, if only the prior distribution were known, and with no further information available, it is reasonable to select the most likely state of nature H = m, i.e., the one with largest value of $p_m$. However, as the observation vector x becomes available, its closeness to $\mu_m$ should provide some indication on the underlying state of nature. More precisely, if $\mu_m$ is the closest (in some sense) to the observation x among all the vectors $\mu_1, \ldots, \mu_M$, then this should be taken as an indication of high likelihood that H = m; here the appropriate notion of closeness is the norm on $\mathbb{R}^d$ induced by $\Sigma^{-1}$. The MAP detector combines these two trends when constructing the optimal decision in the following way: The state of nature H = m may have a rather small value for its prior $p_m$, making it a priori unlikely to be the underlying state of nature, yet this will be offset if the observation x yields an extremely small value for the distance $(x - \mu_m)'\Sigma^{-1}(x - \mu_m)$ to the mean vector $\mu_m$.
When $\Sigma = \sigma^2 I_d$ for some $\sigma > 0$, the components of V are mutually independent, and the MAP and ML detectors take the simpler forms
Choose $\widehat{H} = m$ iff $2\log p_m - \frac{1}{\sigma^2}\|x - \mu_m\|^2$ largest
and
Choose $\widehat{H} = m$ iff $\|x - \mu_m\|^2$ smallest,
respectively. Thus, given the observation vector x, the ML detector returns the state of nature m whose mean vector $\mu_m$ is closest (in the usual Euclidean sense) to x. This is an example of nearest-neighbor detection.
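The nearest-neighbor interpretation can be checked numerically. The Python sketch below is illustrative only (the constellation points, noise level and sample size are made up): it implements both the MAP score $2\log p_m - \frac{1}{\sigma^2}\|x - \mu_m\|^2$ and the ML nearest-neighbor rule, and estimates the error rate of the latter by simulation.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical constellation in R^2 (values chosen only for illustration).
mu = np.array([[1.0, 1.0], [-1.0, 1.0], [-1.0, -1.0], [1.0, -1.0]])
priors = np.array([0.25, 0.25, 0.25, 0.25])
sigma = 0.6

def ml_nearest_neighbor(x):
    # ML under Sigma = sigma^2 I: pick the mean vector closest to x in Euclidean norm.
    d2 = np.sum((mu - x) ** 2, axis=1)
    return int(np.argmin(d2))

def map_detect(x):
    # MAP: maximize 2*log(p_m) - ||x - mu_m||^2 / sigma^2.
    scores = 2 * np.log(priors) - np.sum((mu - x) ** 2, axis=1) / sigma ** 2
    return int(np.argmax(scores))

# Simulate X = mu_H + V with V ~ N(0, sigma^2 I) and estimate the ML error rate.
n = 10000
H = rng.integers(0, 4, size=n)
X = mu[H] + sigma * rng.standard_normal((n, 2))
H_hat = np.array([ml_nearest_neighbor(x) for x in X])
print("empirical ML error rate:", np.mean(H_hat != H))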
$p(h) = P[\boldsymbol{H} = h] = P[H_1 = h_1, \ldots, H_n = h_n], \quad h = (h_1, \ldots, h_n) \in \mathcal{L}^n.$
Note that the functional form of (1.11) implies more than the conditional independence of the rvs $X_1, \ldots, X_n$ as it also stipulates for each $i = 1, \ldots, n$ that the conditional distribution of $X_i$ given $\boldsymbol{H}$ depends only on $H_i$, the state of nature at the epoch i when this observation is taken.
The results obtained earlier apply, for it suffices to identify the state of nature as the rv $\boldsymbol{H}$ and the observation as $\boldsymbol{X}$: We then see that the ML detector for $\boldsymbol{H}$ on the basis of the observation vector $\boldsymbol{X}$ prescribes
Choose $\widehat{\boldsymbol{H}} = (h_1, \ldots, h_n)$ iff $\prod_{i=1}^n f_{h_i}(x_i)$ largest.
As the product is maximized by maximizing each factor separately, this reduces to the separate prescriptions
Choose $\widehat{H}_i = h_i$ iff $f_{h_i}(x_i)$ largest, $\quad i = 1, \ldots, n.$
This time again, a separation occurs under the independence assumption (1.12), namely the combined prescriptions
Choose $\widehat{H}_i = h_i$ iff $P[H_i = h_i]\, f_{h_i}(x_i)$ largest, $\quad i = 1, \ldots, n.$
Again great simplification is achieved as the MAP detector reduces to sequentially applying a MAP detector for deciding the state of nature $H_i$ at epoch i on the basis of the observation $X_i$ collected only at that epoch, for each $i = 1, \ldots, n$.
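The separation property is easy to illustrate numerically: under the independence assumptions, the joint ML decision over all $|\mathcal{L}|^n$ sequences coincides with the epoch-by-epoch decisions. The small Python check below uses binary states and scalar Gaussian observations with invented parameters; it is a sketch, not part of the text.

import numpy as np
from itertools import product

rng = np.random.default_rng(1)

# Binary states per epoch, scalar observation X_i = mu[H_i] + V_i (illustrative model).
mu = np.array([-1.0, 1.0])
sigma = 1.0
n = 4
x = mu[rng.integers(0, 2, size=n)] + sigma * rng.standard_normal(n)

def loglik(h, x):
    # log of prod_i f_{h_i}(x_i), up to an additive constant common to all sequences
    return float(np.sum(-0.5 * (x - mu[list(h)]) ** 2))

# Joint ML over all 2^n candidate sequences ...
joint = max(product(range(2), repeat=n), key=lambda h: loglik(h, x))
# ... versus per-epoch ML decisions.
separate = tuple(int(np.argmax([-0.5 * (xi - m) ** 2 for m in mu])) for xi in x)
print(joint == separate)  # True: the two prescriptions coincide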
In fact, with the convention $\frac{0}{0} = 0$, we find
(1.14) $h_m(z|y) = \frac{f_m(y, z)}{g_m(y)}, \quad y \in \mathbb{R}^p, \ z \in \mathbb{R}^q.$
Returning to the definition (1.4) of the optimal detector, we see that $d^\star$ prescribes
$\widehat{H} = m$ iff $p_m\, g_m(y)\, h_m(z|y)$ largest
with a tie-breaker. Therefore, if the conditional density at (1.14) were to not depend on m, i.e., if
(1.15) $h_m(z|y) = h(z|y), \quad m = 1, \ldots, M,$
then the optimal detector would prescribe
(1.16) $\widehat{H} = m$ iff $p_m\, g_m(y)$ largest.
The condition (1.15) and the resulting form (1.16) of the optimal detector suggest
that knowledge of Z plays no role in developing inference of H on the basis of
the pair (Y , Z), hence the terminology irrelevant data given to Z.
In a number of cases occurring in practice, the condition (1.15) is guaranteed by the following stronger conditional independence: (i) The rvs Y and Z are mutually independent conditionally on the rv H, and (ii) the rv Z is itself independent of the rv H. In other words, for each $m = 1, \ldots, M$, it holds that
$f_m(y, z) = g_m(y)\, h(z), \quad y \in \mathbb{R}^p, \ z \in \mathbb{R}^q$
in which case
$h(z|y) = h(z), \quad y \in \mathbb{R}^p, \ z \in \mathbb{R}^q.$
where
$\|x - \alpha_m \mu\|^2 = \|x\|^2 - 2\alpha_m\, x'\mu + \alpha_m^2 \|\mu\|^2.$
As a result, the density $f_m$ can be written in the form (1.17) with
$h(x) = \frac{1}{\sqrt{(2\pi\sigma^2)^d}}\, e^{-\frac{1}{2\sigma^2}\|x\|^2}, \quad x \in \mathbb{R}^d$
and
$g_m(t) = e^{-\frac{1}{2\sigma^2}\left(-2\alpha_m t + \alpha_m^2 \|\mu\|^2\right)}, \quad t \in \mathbb{R}.$
It now follows from Theorem 1.8.1 that the mapping $T : \mathbb{R}^d \to \mathbb{R}$ given by
$T(x) := x'\mu, \quad x \in \mathbb{R}^d$
is a sufficient statistic.
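As a sanity check on this factorization, the Python sketch below verifies that MAP decisions based on the full observation x and on the scalar statistic $T(x) = x'\mu$ alone coincide. The dimension, amplitudes and noise level are invented, and the amplitude notation follows the reconstruction above.

import numpy as np

rng = np.random.default_rng(2)

d = 8
mu = rng.standard_normal(d)                  # common signal direction (illustrative)
alphas = np.array([-3.0, -1.0, 1.0, 3.0])    # hypothetical amplitudes alpha_m
sigma = 1.0
priors = np.full(4, 0.25)

def map_from_x(x):
    # MAP from the full observation: maximize log p_m - ||x - alpha_m mu||^2 / (2 sigma^2).
    scores = np.log(priors) - np.sum((x - np.outer(alphas, mu)) ** 2, axis=1) / (2 * sigma ** 2)
    return int(np.argmax(scores))

def map_from_T(x):
    # MAP from the scalar statistic T(x) = x' mu, using the factor g_m(t) above.
    t = mu @ x
    scores = np.log(priors) - (-2 * alphas * t + alphas ** 2 * (mu @ mu)) / (2 * sigma ** 2)
    return int(np.argmax(scores))

x = alphas[2] * mu + sigma * rng.standard_normal(d)
print(map_from_x(x), map_from_T(x))   # the two decisions agree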
1.9 Exercises
Ex. 1.1 Consider the Bayesian hypothesis testing problem with an arbitrary cost function $C : \mathcal{H} \times \mathcal{H} \to \mathbb{R}$. Revisit the arguments of Section 1.2 to identify the optimal detector.
Ex. 1.2 Show that the detector identified in Exercise 1.1 is indeed the optimal
detector. Arguments similar to the ones given in Section 1.3 can be used.
Ex. 1.4 Show that the formulations (1.7) and (1.8) are equivalent.
Ex. 1.5 In the setting of Section 1.6, show that the rv $\boldsymbol{H}$ is uniformly distributed on $\mathcal{L}^n$ if and only if the rvs $H_1, \ldots, H_n$ are i.i.d. rvs, each of which is uniformly distributed on $\mathcal{L}$. Use this fact to obtain the form of the ML detector from the results derived in the second half of Section 1.6, under the assumption (1.12) on the prior.
Ex. 1.6 Consider the situation where the scalar observation X and the state of nature H are rvs related through the measurement equation
$X = \mu_H + V$
under the following assumptions: The rvs H and V are mutually independent, the rv H takes values in some finite set $\mathcal{H} = \{1, \ldots, M\}$, and the $\mathbb{R}$-valued rv V admits a density $f_V$. Here $\mu_1, \ldots, \mu_M$ denote distinct scalars, say $\mu_1 < \ldots < \mu_M$. Find the corresponding ML detector.
Ex. 1.7 Continue Exercise 1.6 when the noise V has a Cauchy distribution with density
$f_V(v) = \frac{1}{\pi(1 + v^2)}, \quad v \in \mathbb{R}.$
Show that the ML detector implements nearest-neighbor detection.
Ex. 1.8 Consider the multi-dimensional version of Exercise 1.6 with the observation X and the state of nature H related through the measurement equation
$X = \mu_H + V$
under the following assumptions: The rvs H and V are mutually independent, the rv H takes values in some finite set $\mathcal{H} = \{1, \ldots, M\}$, and the $\mathbb{R}^d$-valued rv V admits a density $f_V$. Here the vectors $\mu_1, \ldots, \mu_M$ are distinct elements of $\mathbb{R}^d$. Find the ML detector when $f_V$ is of the form
$f_V(v) = g(\|v\|^2), \quad v \in \mathbb{R}^d.$
(in which case $\sigma^2 > 0$). Under either circumstance, it can be shown that
(2.1) $E\left[e^{i\theta X}\right] = e^{i\theta\mu - \frac{\sigma^2\theta^2}{2}}, \quad \theta \in \mathbb{R},$
and
(2.2) $E[X] = \mu \quad \text{and} \quad E\left[X^2\right] = \mu^2 + \sigma^2,$
so that $\text{Var}[X] = \sigma^2$. This confirms the meaning ascribed to the parameters $\mu$ and $\sigma^2$ as mean and variance, respectively.
$= P\left[U \le \sigma^{-1}(x - \mu)\right]$
$= \Phi\left(\sigma^{-1}(x - \mu)\right), \quad x \in \mathbb{R}.$
The evaluation of probabilities involving Gaussian rvs thus reduces to the evalua-
tion of related probabilities for the standard Gaussian rv.
For each x in $\mathbb{R}$, we note by symmetry that $P[U \le -x] = P[U > x]$, so that $\Phi(-x) = 1 - \Phi(x)$, and $\Phi$ is therefore fully determined by the complementary probability distribution function of U on $[0, \infty)$, namely
(2.5) $Q(x) := 1 - \Phi(x) = P[U > x], \quad x \ge 0.$
$ax - bx^2 = -b\left(x^2 - \frac{a}{b}x\right) = -b\left(x - \frac{a}{2b}\right)^2 + \frac{a^2}{4b}, \quad x \in \mathbb{R}$
so that
$I(a, b) = e^{\frac{a^2}{4b}} \int_{\mathbb{R}} e^{-b\left(x - \frac{a}{2b}\right)^2} dx = e^{\frac{a^2}{4b}} \sqrt{\frac{\pi}{b}} \int_{\mathbb{R}} \sqrt{\frac{b}{\pi}}\, e^{-b\left(x - \frac{a}{2b}\right)^2} dx.$
The desired conclusion (2.6) follows once we observe that
$\int_{\mathbb{R}} \sqrt{\frac{b}{\pi}}\, e^{-b\left(x - \frac{a}{2b}\right)^2} dx = 1$
as the integral of a Gaussian density with mean $\mu = \frac{a}{2b}$ and variance $\sigma^2 = \frac{1}{2b}$.
Sometimes we shall be faced with the task of evaluating integrals that reduce
to integrals of the form (2.6). This is taken on in
and
(2.12) $\text{Erfc}(x) = \frac{2}{\sqrt{\pi}} \int_x^{\infty} e^{-t^2}\, dt, \quad x \ge 0.$
$\text{Erf}(x) = 2\,\Phi(x\sqrt{2}) - 1 \quad \text{and} \quad \text{Erfc}(x) = 2\,Q(x\sqrt{2}),$
so that
$\text{Erf}(x) = 1 - \text{Erfc}(x), \quad x \ge 0.$
Conversely, we also have
$\Phi(x) = \frac{1}{2}\left(1 + \text{Erf}\left(\frac{x}{\sqrt{2}}\right)\right) \quad \text{and} \quad Q(x) = \frac{1}{2}\,\text{Erfc}\left(\frac{x}{\sqrt{2}}\right).$
Thus, knowledge of any one of the quantities , Q, Erf or Erfc is equivalent to
that of the other three quantities. Although the last two quantities do not have
a probabilistic interpretation, evaluating Erf is computationally more efficient.
Indeed, Erf(x) is an integral of a positive function over the finite interval [0, x]
(and not over an infinite interval as in the other cases).
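These relations are easy to exercise with the standard library, which provides Erf and Erfc directly; the Python check below uses a few arbitrary test points.

import math

def Phi(x):
    # Standard Gaussian cdf via Phi(x) = (1 + Erf(x / sqrt(2))) / 2
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def Q(x):
    # Complementary cdf via Q(x) = Erfc(x / sqrt(2)) / 2
    return 0.5 * math.erfc(x / math.sqrt(2.0))

for x in (0.0, 0.5, 1.0, 2.0, 4.0):
    # Consistency checks: Q(x) = 1 - Phi(x) and Erfc(x) = 2 Q(x sqrt(2))
    assert abs(Q(x) - (1.0 - Phi(x))) < 1e-12
    assert abs(math.erfc(x) - 2.0 * Q(x * math.sqrt(2.0))) < 1e-12
    print(f"x = {x:4.1f}   Q(x) = {Q(x):.6e}")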
$P[U > x] \le E\left[e^{\theta U}\right] e^{-\theta x} = e^{-\theta x + \frac{\theta^2}{2}}$
(2.13) $= e^{-\frac{x^2}{2}}\, e^{\frac{(\theta - x)^2}{2}}$
for every $\theta > 0$; the best bound obtains at $\theta = x$, namely $Q(x) \le e^{-\frac{x^2}{2}}$ for $x \ge 0$.
With the Taylor series expansion of $e^{-\frac{t^2}{2}}$ in mind, approximations for Q(x) of increasing accuracy thus suggest themselves by simply approximating the second exponential factor (namely $e^{-\frac{t^2}{2}}$) in the integral at (2.15) by terms of the form
(2.16) $\sum_{k=0}^n \frac{(-1)^k}{2^k k!}\, t^{2k},$
for each n = 0, 1, . . ..
Proposition 2.4.1 Fix $n = 0, 1, \ldots$. For each $x > 0$ it holds that
(2.17) $Q_{2n+1}(x) \le Q(x) \le Q_{2n}(x),$
with
(2.18) $\left| Q(x) - Q_n(x) \right| \le \frac{(2n)!}{2^n n!}\, x^{-(2n+1)} \varphi(x),$
where
(2.19) $Q_n(x) = \varphi(x) \sum_{k=0}^n \frac{(-1)^k (2k)!}{2^k k!}\, x^{-(2k+1)}.$
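The bounds (2.17)-(2.19) are straightforward to evaluate numerically. The Python sketch below (the test point x = 3 is arbitrary) computes $Q_n(x)$ and confirms the sandwich against Q obtained from Erfc.

import math

def Q(x):
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def phi(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Qn(x, n):
    # Q_n(x) = phi(x) * sum_{k=0}^n (-1)^k (2k)! / (2^k k!) x^{-(2k+1)}, cf. (2.19)
    s = 0.0
    for k in range(n + 1):
        s += (-1) ** k * math.factorial(2 * k) / (2 ** k * math.factorial(k)) * x ** (-(2 * k + 1))
    return phi(x) * s

x = 3.0
for n in range(3):
    lower, upper = Qn(x, 2 * n + 1), Qn(x, 2 * n)
    print(n, lower <= Q(x) <= upper, lower, Q(x), upper)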
and
(2.23) $X =_{st} \mu + T U_p$
where $U_p$ is the $\mathbb{R}^p$-valued rv $(U_1, \ldots, U_p)'$.
From (2.22) and (2.23) it is plain that
$E[X] = E[\mu + T U_p] = \mu + T E[U_p] = \mu$
and
$E\left[(X - \mu)(X - \mu)'\right] = E\left[T U_p (T U_p)'\right] = T E\left[U_p U_p'\right] T'$
(2.24) $= T I_p T' = \Sigma,$
whence
$E[X] = \mu \quad \text{and} \quad \text{Cov}[X] = \Sigma.$
Again this confirms the terminology used for $\mu$ and $\Sigma$ as mean vector and covariance matrix, respectively.
It is a well-known fact from Linear Algebra [, , p. ] that for any symmetric and non-negative definite $d \times d$ matrix $\Sigma$, there exists a $d \times d$ matrix T such that (2.22) holds with p = d. This matrix T can be selected to be symmetric and non-negative definite, and is called the square root of $\Sigma$. Consequently, for any vector $\mu$ in $\mathbb{R}^d$ and any symmetric non-negative definite $d \times d$ matrix $\Sigma$, there always exists an $\mathbb{R}^d$-valued Gaussian rv X with mean vector $\mu$ and covariance matrix $\Sigma$: simply take
$X =_{st} \mu + T U_d$
where T is the square root of $\Sigma$.
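A minimal numerical sketch of this construction (Python/NumPy; the particular $\mu$ and $\Sigma$ are illustrative) builds the symmetric square root via an eigendecomposition, generates samples of $X =_{st} \mu + T U_d$, and checks the sample mean and covariance.

import numpy as np

rng = np.random.default_rng(3)

# A symmetric non-negative definite covariance matrix and a mean vector (made-up values).
Sigma = np.array([[2.0, 0.8, 0.0],
                  [0.8, 1.0, 0.3],
                  [0.0, 0.3, 0.5]])
mu = np.array([1.0, -2.0, 0.5])

# Symmetric non-negative definite square root T of Sigma: T = V diag(sqrt(lambda)) V'.
lam, V = np.linalg.eigh(Sigma)
T = V @ np.diag(np.sqrt(np.clip(lam, 0.0, None))) @ V.T
assert np.allclose(T @ T, Sigma)              # (2.22): T T' = Sigma (T is symmetric here)

# Generate samples of X = mu + T U_d with U_d standard Gaussian, cf. (2.23).
n = 200000
U = rng.standard_normal((n, 3))
X = mu + U @ T.T

print(np.round(X.mean(axis=0), 2))            # close to mu
print(np.round(np.cov(X, rowvar=False), 2))   # close to Sigma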
$= e^{i\theta'\mu}\, E\left[e^{i\sum_{k=1}^p (T'\theta)_k U_k}\right]$
(2.26) $= e^{i\theta'\mu} \prod_{k=1}^p E\left[e^{i(T'\theta)_k U_k}\right]$
(2.27) $= e^{i\theta'\mu} \prod_{k=1}^p e^{-\frac{1}{2}\left|(T'\theta)_k\right|^2} = e^{i\theta'\mu - \frac{1}{2}\theta'\Sigma\theta}$
upon invoking (2.22). It is now plain from (2.27) that the characteristic function of the Gaussian $\mathbb{R}^d$-valued rv X is given by (2.25).
Conversely, consider an $\mathbb{R}^d$-valued rv X with characteristic function of the form (2.25) for some vector $\mu$ in $\mathbb{R}^d$ and some symmetric non-negative definite $d \times d$ matrix $\Sigma$. By comments made earlier, there exists a $d \times d$ matrix T such that (2.22) holds. By the first part of the proof, the $\mathbb{R}^d$-valued rv $\widetilde{X}$ given by $\widetilde{X} := \mu + T U_d$ has characteristic function given by (2.25). Since a probability distribution is completely determined by its characteristic function, it follows that $X =_{st} \widetilde{X}$, so that X is indeed a Gaussian rv with mean vector $\mu$ and covariance matrix $\Sigma$.
$\mathcal{N}(\Sigma) := \{x \in \mathbb{R}^d :\ \Sigma x = 0_d\}.$
$= A\, E\left[(X - \mu)(X - \mu)'\right] A'$
(2.29) $= A \Sigma A'.$
Consequently, the $\mathbb{R}^q$-valued rv Y has mean vector $\nu + A\mu$ and covariance matrix $A\Sigma A'$.
Next, by the Gaussian character of X, there exist a $d \times p$ matrix T for some positive integer p and i.i.d. zero-mean unit-variance Gaussian rvs $U_1, \ldots, U_p$ such that (2.22) and (2.23) hold. Thus,
$Y =_{st} \nu + A\left(\mu + T U_p\right) = \nu + A\mu + AT U_p$
(2.30) $= \widetilde{\mu} + \widetilde{T} U_p$
with
$\widetilde{\mu} := \nu + A\mu \quad \text{and} \quad \widetilde{T} := AT,$
and the Gaussian character of Y is established.
This result can also be established through the evaluation of the characteristic
function of the rv Y . As an immediate consequence of Lemma 2.8.1 we get
The first part of Lemma 2.9.1 is a simple rewrite of Corollary 2.8.1. Sometimes we refer to the fact that the rv X is Gaussian by saying that the rvs $X_1, \ldots, X_r$ are jointly Gaussian. A converse to Lemma 2.9.1 is available:
It might be tempting to conclude that the Gaussian character of each of the rvs $X_1, \ldots, X_r$ alone suffices to imply the Gaussian character of the combined rv X. However, it can be shown through simple counterexamples that this is not so. In other words, the joint Gaussian character of X does not follow merely from that of its components $X_1, \ldots, X_r$ without further assumptions.
where for each $n = 1, 2, \ldots$, the integer $k_n$ and the coefficients $a^{(n)}_j$, $j = 1, \ldots, k_n$, are non-random while the rvs $\{Y^{(n)}_j, j = 1, \ldots, k_n\}$ are jointly Gaussian rvs. Typically, as n goes to infinity so does $k_n$. Note that under the foregoing assumptions, for each $n = 1, 2, \ldots$, the rv $X_n$ is Gaussian with
(2.36) $E[X_n] = \sum_{j=1}^{k_n} a^{(n)}_j E\left[Y^{(n)}_j\right]$
and
(2.37) $\text{Var}[X_n] = \sum_{i=1}^{k_n} \sum_{j=1}^{k_n} a^{(n)}_i a^{(n)}_j\, \text{Cov}\left[Y^{(n)}_i, Y^{(n)}_j\right].$
Therefore, the study of such integrals is expected to pass through the convergence of sequences of rvs $\{X_n, n = 1, 2, \ldots\}$ of the form (2.35). Such considerations lead naturally to the need for the following result [, Thm. , p.]:
In that case,
$X_k \Longrightarrow_k X$
where X is an $\mathbb{R}^d$-valued Gaussian rv with mean vector $\mu$ and covariance matrix $\Sigma$.
The second half of condition (2.38) ensures that the matrix $\Sigma$ is symmetric and non-negative definite, hence a covariance matrix.
Returning to the partial sums (2.35) we see that Lemma 2.10.1 (applied with d = 1) requires identifying the limits $\mu = \lim_n E[X_n]$ and $\sigma^2 = \lim_n \text{Var}[X_n]$, in which case $X_n \Longrightarrow_n X$ where X is an $\mathbb{R}$-valued Gaussian rv with mean $\mu$ and variance $\sigma^2$. In Section ?? we discuss a situation where this can be done quite easily.
with Y and Z independent zero-mean Gaussian rvs with variance $\sigma^2$. It is easy to check that
(2.40) $P[X > x] = e^{-\frac{x^2}{2\sigma^2}}, \quad x \ge 0$
with corresponding density function
(2.41) $\frac{d}{dx} P[X \le x] = \frac{x}{\sigma^2}\, e^{-\frac{x^2}{2\sigma^2}}, \quad x \ge 0.$
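A quick Monte Carlo check of (2.40): the Python snippet below forms X from two independent zero-mean Gaussian rvs as in the text; the value of $\sigma$ and the grid of test points are arbitrary.

import numpy as np

rng = np.random.default_rng(4)
sigma = 1.5
n = 1_000_000

Y = sigma * rng.standard_normal(n)
Z = sigma * rng.standard_normal(n)
X = np.sqrt(Y ** 2 + Z ** 2)          # Rayleigh rv with parameter sigma

for x in (0.5, 1.5, 3.0):
    empirical = np.mean(X > x)
    exact = np.exp(-x ** 2 / (2 * sigma ** 2))   # (2.40)
    print(f"x = {x}: empirical {empirical:.4f}  vs  exp(-x^2/(2 sigma^2)) {exact:.4f}")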
with
(2.49) $\left| H_n(y) - e^{-y} \right| \le \frac{y^n}{n!}.$
so that
$\frac{d}{dy}\left(e^{-y} - H_{n+1}(y)\right) = -\left(e^{-y} - H_n(y)\right).$
Integrating and using the fact $H_{n+1}(0) = 1$, we find
(2.50) $e^{-y} - H_{n+1}(y) = -\int_0^y \left(e^{-t} - H_n(t)\right) dt.$
An easy induction argument now yields (2.48) once we note for the basis step that $H_0(y) > e^{-y}$ for all $y > 0$.
To obtain the bound (2.49) on the accuracy of approximating $e^{-y}$ by $H_n(y)$, we proceed by induction on n. For n = 0, it is always the case that $|e^{-y} - H_0(y)| \le 1$, whence (2.49) holds for all $y \ge 0$ and the basis step is established. Next, we assume that (2.49) holds for all $y \ge 0$ for n = m with some $m = 0, 1, \ldots$, namely
(2.51) $|e^{-y} - H_m(y)| \le \frac{y^m}{m!}, \quad y \ge 0.$
Hence, upon invoking (2.50) we observe that
$|e^{-y} - H_{m+1}(y)| \le \int_0^y |e^{-t} - H_m(t)|\, dt \le \int_0^y \frac{t^m}{m!}\, dt = \frac{y^{m+1}}{(m+1)!}, \quad y \ge 0,$
and the induction step is established.
With the plan in mind to use (2.48) to bound the second exponential factor in the integrand of (2.15), we note that
$\int_0^\infty e^{-xt} H_n\!\left(\frac{t^2}{2}\right) dt = \sum_{k=0}^n \frac{(-1)^k}{2^k k!} \int_0^\infty t^{2k} e^{-xt}\, dt = \sum_{k=0}^n \frac{(-1)^k}{2^k k!}\, x^{-(2k+1)} \int_0^\infty u^{2k} e^{-u}\, du$
(2.52) $= \sum_{k=0}^n \frac{(-1)^k (2k)!}{2^k k!}\, x^{-(2k+1)}$
where the last equality made use of the well-known closed-form expressions
$\int_0^\infty u^p e^{-u}\, du = p!, \quad p = 0, 1, \ldots$
2.13 Exercises
Ex. 2.1 Derive the relationships between the quantities $\Phi$, Q, Erf and Erfc which are given in Section 2.4.
Ex. 2.2 Given the covariance matrix $\Sigma$, explain why the representation (2.22)-(2.23) may not be unique. Give a counterexample.
Ex. 2.3 Give a proof of Lemma 2.9.1 and of Lemma 2.9.2.
Ex. 2.4 Construct an R2 -valued rv X = (X1 , X2 ) such that the R-valued rvs X1
and X2 are each Gaussian but the R2 -valued rv X is not (jointly) Gaussian.
Ex. 2.5 Derive the probability distribution function (2.40) of a Rayleigh rv with parameter $\sigma$ ($\sigma > 0$).
Ex. 2.6 Show by direct arguments that if X is a Rayleigh rv with parameter $\sigma$, then $X^2$ is exponentially distributed with parameter $(2\sigma^2)^{-1}$. [Hint: Compute $E\left[e^{-\theta X^2}\right]$ for a Rayleigh rv X and $\theta \ge 0$.]
Ex. 2.7 Derive the probability distribution function (2.45) of a Rice rv with parameters $\mu$ (in $\mathbb{R}$) and $\sigma$ ($\sigma > 0$).
Ex. 2.9 Let $X_1, \ldots, X_n$ be i.i.d. Gaussian rvs with zero mean and unit variance and write $S_n = X_1 + \ldots + X_n$. For each $a > 0$ show that
(2.53) $P[S_n > na] \sim \frac{e^{-\frac{na^2}{2}}}{a\sqrt{2\pi n}} \quad (n \to \infty).$
This asymptotic equivalence is known as the Bahadur-Rao correction to the large deviations asymptotics of $S_n$.
1. (Commutativity)
$v + w = w + v, \quad v, w \in V$
2. (Associativity)
$(u + v) + w = u + (v + w), \quad u, v, w \in V$
3. (Zero element)
$v + 0 = v = 0 + v, \quad v \in V$
4. (Negative element)
$v + (-v) = 0 = (-v) + v, \quad v \in V$
It is a simple matter to check that there can be only one such zero vector 0, and that for every vector v in V, its negative $-v$ is unique.
In order for the group (V, +) to become a vector space on $\mathbb{R}$ we need to endow it with an external multiplication operation whereby multiplying a vector by a scalar is given a meaning as a vector. This multiplication operation, say $\cdot : \mathbb{R} \times V \to V$, is required to satisfy the following properties:
1. (Distributivity)
$(a + b) \cdot v = a \cdot v + b \cdot v, \quad a, b \in \mathbb{R}, \ v \in V$
2. (Distributivity)
$a \cdot (v + w) = a \cdot v + a \cdot w, \quad a \in \mathbb{R}, \ v, w \in V$
3. (Associativity)
$a \cdot (b \cdot v) = (ab) \cdot v = b \cdot (a \cdot v), \quad a, b \in \mathbb{R}, \ v \in V$
4. (Unity law)
$1 \cdot v = v, \quad v \in V$
(3.2) a1 = . . . = ap = 0.
$v + w \in E \quad \text{and} \quad a v \in E$
2. (Symmetry)
$\langle v, w \rangle = \langle w, v \rangle, \quad v, w \in V$
3. (Positive definiteness)
$\langle v, v \rangle > 0$ if $v \ne 0$ in $V$, and
$\langle v, v \rangle \ge 0, \quad v \in V.$
The properties listed in Proposition 3.4.1 form the basis for the notion of norm
in more general settings [?].
product and its positive definiteness. To establish the triangular inequality, consider elements v and w of V. It holds that
$\|v + w\|^2 = \|v\|^2 + 2\langle v, w \rangle + \|w\|^2 \le \|v\|^2 + 2\|v\|\,\|w\| + \|w\|^2 = \left(\|v\| + \|w\|\right)^2,$
where the first equality follows by bilinearity of the scalar product, and the inequality is justified by the Cauchy-Schwarz inequality (discussed in Proposition 3.4.2 below). This establishes the triangular inequality.
holds with equality in (3.6) if and only if v and w are co-linear, i.e., there exists a
scalar a in R such that v = aw.
$Q(t) := \|v + tw\|^2$
(3.7) $= \|v\|^2 + 2t\langle v, w \rangle + t^2\|w\|^2, \quad t \in \mathbb{R}$
by bilinearity of the scalar product. The fact that $Q(t) \ge 0$ for all t in $\mathbb{R}$ is equivalent to the quadratic equation $Q(t) = 0$ having at most one (double) real root. This forces the corresponding discriminant $\Delta := 4\left(\langle v, w \rangle^2 - \|v\|^2\|w\|^2\right)$ to be non-positive, i.e., $\langle v, w \rangle^2 \le \|v\|^2\|w\|^2$, and the proof of (3.6) is completed. Equality occurs in (3.6) if and only if $\Delta = 0$, in which case there exists $t^\star$ in $\mathbb{R}$ such that $Q(t^\star) = 0$, whence $v + t^\star w = 0$, and the co-linearity of v and w follows.
In the remainder of this chapter, all discussions are carried out in the context of a vector space (V, +) on $\mathbb{R}$ equipped with a scalar product $\langle \cdot, \cdot \rangle : V \times V \to \mathbb{R}$.
3.5 Orthogonality
The elements v and w of V are said to be orthogonal if
$\langle v, w \rangle = 0.$
We also say that the vectors $v_1, \ldots, v_p$ are (pairwise) orthogonal if
$\langle v_i, v_j \rangle = 0, \quad i \ne j, \ i, j = 1, \ldots, p.$
More generally, consider an arbitrary family $\{v_\alpha, \alpha \in A\}$ of elements in V with A some index set (not necessarily finite). We say that this family is an orthogonal family if every one of its finite subsets is itself a collection of orthogonal vectors. A moment of reflection shows that this is equivalent to requiring the pairwise conditions
(3.8) $\langle v_\alpha, v_\beta \rangle = 0, \quad \alpha \ne \beta, \ \alpha, \beta \in A.$
Moreover, for any subset E of V, the element v of V is said to be orthogonal to E if
$\langle v, w \rangle = 0, \quad w \in E.$
If the set E coincides with the linear span of the vectors $v_1, \ldots, v_p$, then v is orthogonal to E if and only if $\langle v, v_i \rangle = 0$ for all $i = 1, \ldots, p$.
An important consequence of orthogonality is the following version of Pythagoras' Theorem.
Proposition 3.5.1 When v and w are orthogonal elements in V, we have Pythagoras' relation
(3.9) $\|v + w\|^2 = \|v\|^2 + \|w\|^2.$
The notions of orthogonality and norm come together through the notion of orthonormality: If the vectors $v_1, \ldots, v_p$ are orthogonal with unit norm, they are said to be orthonormal, a property characterized by
(3.10) $\langle v_i, v_j \rangle = \delta(i, j), \quad i, j = 1, \ldots, p.$
The usefulness of this notion is already apparent when considering the following representation result.
Lemma 3.5.2 If E is a linear subspace of V spanned by the orthonormal family $u_1, \ldots, u_p$, then the representation
(3.11) $h = \sum_{i=1}^p \langle h, u_i \rangle u_i, \quad h \in E$
holds.
We emphasize that the discussion of Sections 3.4 and 3.5 depends only on
the defining properties of the scalar product. This continues to be the case in the
material of the next section.
(3.15) $\langle v - v^\star, h \rangle = 0, \quad h \in E.$
Before giving the proof of Proposition 3.6.1 in the next section we discuss some easy consequences of the conditions (3.15). These conditions state that the vector $v - v^\star$ is orthogonal to E. The unique element $v^\star$ satisfying these constraints is often called the projection of v onto E, and at times we shall use the notation
$v^\star = \text{Proj}_E(v),$
in which case (3.15) takes the form
$\langle v - \text{Proj}_E(v), h \rangle = 0, \quad h \in E.$
Corollary 3.6.1 For any linear space E of V, the projection mapping $\text{Proj}_E : V \to E$ is a linear mapping wherever defined: For every v and w in V whose projections $\text{Proj}_E(v)$ and $\text{Proj}_E(w)$ onto E exist, the projection of $av + bw$ onto E exists for arbitrary scalars a and b in $\mathbb{R}$, and is given by
$\text{Proj}_E(av + bw) = a\,\text{Proj}_E(v) + b\,\text{Proj}_E(w).$
We stress again that at this level of generality, there is no guarantee that the
projection always exists. There is however a situation of great practical impor-
tance where this is indeed the case.
For future use, under the conditions of Lemma 3.6.1, we note that
(3.19) $\|\text{Proj}_E(v)\|^2 = \sum_{i=1}^p |\langle v, u_i \rangle|^2, \quad v \in V.$
Indeed, for the projection $v^\star = \text{Proj}_E(v)$ of Lemma 3.6.1 we can check that
$\langle v - v^\star, u_i \rangle = \langle v, u_i \rangle - \langle v^\star, u_i \rangle = \langle v, u_i \rangle - \sum_{j=1}^p \langle v, u_j \rangle \langle u_j, u_i \rangle$
(3.20) $= \langle v, u_i \rangle - \langle v, u_i \rangle = 0, \quad i = 1, \ldots, p.$
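In the special case of $\mathbb{R}^d$ with the usual scalar product, (3.19)-(3.20) can be verified directly. The NumPy sketch below is only an illustration; the orthonormal family is obtained from a QR factorization of a random matrix.

import numpy as np

rng = np.random.default_rng(5)

# An orthonormal family u_1, u_2 in R^4 obtained by QR (illustrative).
A = rng.standard_normal((4, 2))
U, _ = np.linalg.qr(A)        # columns of U are orthonormal

v = rng.standard_normal(4)
coeffs = U.T @ v              # <v, u_i>
proj = U @ coeffs             # Proj_E(v) = sum_i <v, u_i> u_i

# (3.20): the residual v - Proj_E(v) is orthogonal to every u_i.
print(np.allclose(U.T @ (v - proj), 0.0))
# (3.19): ||Proj_E(v)||^2 = sum_i |<v, u_i>|^2.
print(np.isclose(np.dot(proj, proj), np.sum(coeffs ** 2)))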
For k = 1, 2, we have
$\langle v - v^\star_k, h \rangle = 0, \quad h \in E,$
so that
$\langle v^\star_1 - v^\star_2, h \rangle = 0, \quad h \in E.$
Using $h = v^\star_1 - v^\star_2$, element of E, in this last relation we find $\|v^\star_1 - v^\star_2\| = 0$, whence $v^\star_1 = v^\star_2$ necessarily.
Let $v^\star$ be an element in E which satisfies (3.14). For any h in E, the vector $v^\star + th$ is also an element of E for all t in $\mathbb{R}$. Thus, by the definition of $v^\star$ it holds that
$\|v - v^\star\|^2 \le \|v - (v^\star + th)\|^2, \quad t \in \mathbb{R}$
with
$\|v - (v^\star + th)\|^2 = \|v - v^\star\|^2 + t^2\|h\|^2 - 2t\langle v - v^\star, h \rangle.$
Consequently,
$t^2\|h\|^2 - 2t\langle v - v^\star, h \rangle \ge 0, \quad t \in \mathbb{R}.$
This last inequality readily implies
$t\|h\|^2 \ge 2\langle v - v^\star, h \rangle, \quad t > 0$
and
$|t|\,\|h\|^2 \ge -2\langle v - v^\star, h \rangle, \quad t < 0.$
Letting t go to zero in each of these last two inequalities yields $\langle v - v^\star, h \rangle \le 0$ and $\langle v - v^\star, h \rangle \ge 0$, respectively, and the desired conclusion (3.15) follows.
Conversely, consider any element $v^\star$ in E satisfying (3.15). For each x in E, (3.15) implies the orthogonality of $v - v^\star$ and $h = v^\star - x$ (this last vector being in E), and Pythagoras' Theorem thus yields
$\|v - x\|^2 = \|v - v^\star\|^2 + \|v^\star - x\|^2 \ge \|v - v^\star\|^2.$
This establishes the minimum distance requirement for $v^\star$ and (3.15) indeed characterizes the solution to (3.14).
$\text{sp}(v_1, \ldots, v_n) = \text{sp}(u_1, \ldots, u_p).$
and go to Step 2.
At Step k, the procedure has already returned the $\ell$ orthonormal vectors $u_1, \ldots, u_\ell$ with $\ell = \ell(k) \le k$; let $E_\ell$ denote the corresponding linear span, i.e., $E_\ell := \text{sp}(u_1, \ldots, u_\ell)$.
Step k + 1: Pick $v_{k+1}$. Either $v_{k+1}$ lies in the span $E_\ell$, i.e.,
$v_{k+1} = \sum_{j=1}^{\ell} \langle v_{k+1}, u_j \rangle u_j,$
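A compact implementation of the procedure just described is sketched below (Python/NumPy; the tolerance and test vectors are illustrative). Vectors found to lie in the current span, which is the first branch of Step k + 1, contribute no new orthonormal vector.

import numpy as np

def gram_schmidt(vectors, tol=1e-10):
    """Return an orthonormal list u_1, ..., u_p spanning sp(v_1, ..., v_n).

    vectors -- iterable of 1-D numpy arrays; linearly dependent inputs are skipped.
    """
    basis = []
    for v in vectors:
        # Subtract the projection of v onto the span of the vectors returned so far.
        residual = v - sum(np.dot(v, u) * u for u in basis)
        norm = np.linalg.norm(residual)
        if norm > tol:                     # v does not lie in the current span
            basis.append(residual / norm)  # new orthonormal vector
        # else: v lies (numerically) in the span; nothing is added.
    return basis

vs = [np.array([1.0, 1.0, 0.0]),
      np.array([2.0, 2.0, 0.0]),   # dependent on the first vector: skipped
      np.array([1.0, 0.0, 1.0])]
us = gram_schmidt(vs)
print(len(us))                                                          # 2 orthonormal vectors
print(np.round(np.array([[np.dot(a, b) for b in us] for a in us]), 6))  # identity matrix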
3.8.1 Exercises
Ex. 3.1 Show that in a commutative group (V, +), there can be only one zero
vector.
Ex. 3.2 Show that in a commutative group (V, +), for every vector v in V , its
negative v is unique.
Ex. 3.4 If E is a linear subspace of V, then it necessarily contains the zero element 0. Moreover, v belongs to E if and only if $-v$ belongs to E.
Ex. 3.5 For non-zero vectors v and w in V, we define their correlation coefficient by
$\rho(v; w) = \frac{\langle v, w \rangle}{\|v\|\,\|w\|}.$
Ex. 3.6 Show that $|\rho(v; w)| \le 1$. Find a necessary and sufficient condition for $\rho(v; w) = 1$ and for $\rho(v; w) = -1$.
Ex. 3.7 If the set E is the linear span of the vectors $v_1, \ldots, v_p$ in V, then show that v is orthogonal to E if and only if $\langle v, v_i \rangle = 0$ for all $i = 1, \ldots, p$.
Ex. 3.8 Consider a linear subspace E which is spanned by the set F in V. Show that v in V is orthogonal to E if and only if v is orthogonal to F.
Ex. 3.9 Let $E_1$ and $E_2$ be subsets of V such that $E_1 \subseteq E_2$. Assume that for some v in V, its projection $\text{Proj}_{E_2}(v)$ exists and is an element of $E_1$. Explain why $\text{Proj}_{E_1}(v)$ also exists and coincides with $\text{Proj}_{E_2}(v)$.
Ex. 3.11 Repeat Exercise 3.3 using the Gram-Schmidt orthonormalization proce-
dure.
Ex. 3.12 Let $(V_1, +)$ and $(V_2, +)$ denote two vector spaces on $\mathbb{R}$. A mapping $T : V_1 \to V_2$ is linear if
$T(av + bw) = aT(v) + bT(w), \quad v, w \in V_1, \ a, b \in \mathbb{R}.$
Ex. 3.13 For i = 1, 2, let $(V_i, +)$ denote a vector space on $\mathbb{R}$, equipped with its own scalar product $\langle \cdot, \cdot \rangle_i : V_i \times V_i \to \mathbb{R}$, and let $\|\cdot\|_i$ denote the corresponding norm. A mapping $T : V_1 \to V_2$ is said to be norm-preserving if
$\|T(v)\|_2 = \|v\|_1, \quad v \in V_1.$
Ex. 3.14
Chapter 4
Finite-dimensional representations
The signal $\varphi$ has finite energy if $E(\varphi) < \infty$. The space of all finite energy signals on the interval I is denoted by $L^2(I)$, namely
$L^2(I) := \{\varphi : I \to \mathbb{R} :\ E(\varphi) < \infty\}.$
The set $L^2(I)$ can be endowed with a vector space structure by introducing a vector addition and multiplication by constants, i.e., for any $\varphi$ and $\psi$ in $L^2(I)$ and any scalar a in $\mathbb{R}$, we define the signals $\varphi + \psi$ and $a\varphi$ by
$(\varphi + \psi)(t) := \varphi(t) + \psi(t), \quad t \in I$
and
$(a\varphi)(t) := a\varphi(t), \quad t \in I.$
The signals $\varphi + \psi$ and $a\varphi$ are all finite energy signals if $\varphi$ and $\psi$ are in $L^2(I)$. It is easy to show that, equipped with these operations, $(L^2(I), +)$ is a vector space on $\mathbb{R}$. The zero element for $(L^2(I), +)$ will be the zero signal $\mathbf{0} : I \to \mathbb{R}$ defined by $\mathbf{0}(t) = 0$ for all t in I.
In $L^2(I)$ the notion of linear independence specializes as follows: The signals $\varphi_1, \ldots, \varphi_p$ in $L^2(I)$ are linearly independent if
$\sum_{i=1}^p a_i \varphi_i = \mathbf{0}$
holds only when $a_1 = \ldots = a_p = 0$.
Example 4.2.1 Take I = [0, 1] and for each $k = 0, 1, \ldots$, define the signal $\phi_k : [0, 1] \to \mathbb{R}$ by $\phi_k(t) = t^k$ ($t \in I$). For each $p = 1, 2, \ldots$, the signals $\phi_0, \phi_1, \ldots, \phi_p$ are linearly independent in $L^2(I)$. Therefore, $L^2(I)$ cannot be of finite dimension.
Here as well, we can define a scalar product by setting
$\langle \varphi, \psi \rangle := \int_I \varphi(t)\psi(t)\, dt, \quad \varphi, \psi \in L^2(I).$
We leave it as an exercise to show that this definition gives rise to a scalar product on $L^2(I)$. The norm of a finite energy signal is now defined by
$\|\varphi\| := \sqrt{\langle \varphi, \varphi \rangle}, \quad \varphi \in L^2(I)$
or in extensive form,
$\|\varphi\| = \left(\int_I |\varphi(t)|^2\, dt\right)^{\frac{1}{2}} = \sqrt{E(\varphi)}, \quad \varphi \in L^2(I).$
It should be noted that this notion of energy norm is not quite a norm on $L^2(I)$ as understood earlier. Indeed, positive definiteness fails here since $\|\varphi\| = 0$ does not necessarily imply $\varphi = \mathbf{0}$: Just take $\varphi(t) = 1$ for t in $I \cap \mathbb{Q}$ and $\varphi(t) = 0$ for t in $I \cap \mathbb{Q}^c$, in which case $\|\varphi\| = 0$ but $\varphi \ne \mathbf{0}$! This difficulty is overcome by partitioning $L^2(I)$ into equivalence classes with signals considered as equivalent if their difference has zero energy, i.e., the two signals $\varphi$ and $\varphi'$ in $L^2(I)$ are equivalent if $\|\varphi - \varphi'\|^2 = 0$. It is this collection of equivalence classes that should be endowed with a vector space structure and a notion of scalar product, instead of the collection of all finite energy signals defined on I. Pointers are provided in Exercises 4.3-4.6. This technical point will not be pursued any further as it does not affect the analyses carried out here. Thus, with a slight abuse of notation, we will consider the scalar product defined earlier on $L^2(I)$ as a bona fide scalar product.
With these definitions, the notions of orthogonality and orthonormality are defined as before. However, while in $\mathbb{R}^d$ no more than d vectors can ever be orthonormal, this is not the case in $L^2(I)$ [Exercise 4.8].
Example 4.2.2 Pick I = [0, 1] and for each $k = 0, 1, \ldots$ define the signals $\phi_k : I \to \mathbb{R}$ by
(4.1) $\phi_0(t) = 1, \quad \phi_k(t) = \sqrt{2}\cos(2\pi k t), \quad t \in I, \ k = 1, 2, \ldots$
For each $p = 1, 2, \ldots$, the signals $\phi_0, \phi_1, \ldots, \phi_p$ are orthonormal in $L^2(I)$.
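The orthonormality claimed in (4.1) can be verified by numerical quadrature; a short Python check follows (trapezoidal rule, grid size arbitrary).

import numpy as np

t = np.linspace(0.0, 1.0, 20001)

def inner(f, g):
    # Trapezoidal approximation of the scalar product on L^2[0, 1].
    h = f * g
    return float((t[1] - t[0]) * (0.5 * h[0] + h[1:-1].sum() + 0.5 * h[-1]))

def phi(k):
    # The family (4.1) on I = [0, 1].
    return np.ones_like(t) if k == 0 else np.sqrt(2.0) * np.cos(2.0 * np.pi * k * t)

G = np.array([[inner(phi(i), phi(j)) for j in range(5)] for i in range(5)])
print(np.round(G, 6))   # approximately the 5 x 5 identity: <phi_i, phi_j> = delta(i, j)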
The notion of distance on $L^2(I)$ associated with the energy norm takes the special form
$d(\varphi, \psi) := \left(\int_I |\varphi(t) - \psi(t)|^2\, dt\right)^{\frac{1}{2}}, \quad \varphi, \psi \in L^2(I).$
Additional assumptions are needed on E for (4.2) to hold for all signals in $L^2(I)$. However, when the projection does exist, it is necessarily unique by virtue of Proposition 3.6.1.
To gain a better understanding as to why the projection onto E may fail to exist, consider the situation where a countably infinite family of orthonormal signals $\{\phi_k, k = 1, 2, \ldots\}$ is available. For each $n = 1, 2, \ldots$, let $E_n$ denote the linear span of the first n signals $\phi_1, \ldots, \phi_n$. Fix $\gamma$ in $L^2(I)$. By Lemma 3.6.1 the projection of $\gamma$ onto $E_n$ always exists, and is given by
$\hat{\gamma}_n := \text{Proj}_{E_n}(\gamma) = \sum_{k=1}^n \langle \gamma, \phi_k \rangle \phi_k,$
with corresponding residual
$\tilde{\gamma}_n := \gamma - \hat{\gamma}_n.$
Case 2: When $\gamma$ is not an element of $E_\infty$, then $\tilde{\gamma}_n$ is not the zero signal, but two distinct scenarios are possible.
$E_\infty = \text{sp}(\phi_k, \ k = 1, 2, \ldots).$
The set $\overline{E}_\infty$ is called the closure of the linear subspace $E_\infty$; it is itself a linear subspace of $L^2(I)$ which could be defined by
$\overline{E}_\infty := \left\{ \gamma \in L^2(I) :\ \lim_n \|\gamma - \hat{\gamma}_n\| = 0 \right\},$
to capture the intuitive fact that $\text{Proj}_{\overline{E}_\infty}(\gamma)$ is the limiting signal increasingly approximated by the projection signals $\{\hat{\gamma}_n, n = 1, 2, \ldots\}$. Note that here $\gamma = \text{Proj}_{\overline{E}_\infty}(\gamma)$.
It follows from the discussion above that infinitely many of the coefficients $\{\langle \gamma, \phi_k \rangle, k = 1, 2, \ldots\}$ may be non-zero, and some care therefore needs to be exercised in defining the element (4.4) of $L^2(I)$: up to now only finite linear combinations have been considered. For our purpose, it suffices to note that for any sequence $\{c_k, k = 1, \ldots\}$ of scalars, the infinite series $\sum_{k=1}^\infty c_k \phi_k$ can be made to represent an element of $L^2(I)$ under the summability condition
(4.5) $\sum_{k=1}^\infty |c_k|^2 < \infty.$
Indeed, under (4.5) the partial sums $\sum_{k=1}^n c_k \phi_k$, $n = 1, 2, \ldots$, converge in some suitable sense to an element of $L^2(I)$ (which is represented by $\sum_{k=1}^\infty c_k \phi_k$). We invite the reader to check that indeed
(4.6) $\sum_{k=1}^\infty |\langle \gamma, \phi_k \rangle|^2 < \infty, \quad \gamma \in L^2(I).$
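The Python sketch below (the triangular test signal and the number of coefficients are arbitrary choices) computes the coefficients $\langle \gamma, \phi_k \rangle$ for the cosine family of Example 4.2.2 and confirms numerically that the partial sums of $|\langle \gamma, \phi_k \rangle|^2$ stay below $E(\gamma)$, in line with (4.6).

import numpy as np

t = np.linspace(0.0, 1.0, 20001)
dt = t[1] - t[0]

def inner(f, g):
    # Trapezoidal approximation of the scalar product on L^2[0, 1].
    h = f * g
    return float(dt * (0.5 * h[0] + h[1:-1].sum() + 0.5 * h[-1]))

def phi(k):
    return np.ones_like(t) if k == 0 else np.sqrt(2.0) * np.cos(2.0 * np.pi * k * t)

gamma = np.minimum(t, 1.0 - t)            # a triangular test signal on [0, 1] (illustrative)
energy = inner(gamma, gamma)

coeffs = [inner(gamma, phi(k)) for k in range(1, 21)]
running = np.cumsum(np.array(coeffs) ** 2)
print(f"E(gamma) = {energy:.6f}")
print("partial sums of |<gamma, phi_k>|^2:", np.round(running[[0, 4, 9, 19]], 6))
# The partial sums increase but remain bounded by E(gamma), consistent with (4.6).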
Example 4.3.1 Continue with the situation in Example 4.2.2, and set
$\gamma(t) := \sum_{k=1}^\infty \frac{1}{k^2}\cos(2\pi k t), \quad t \in I.$
The signal $\gamma$ is a well defined element of $L^2(I)$ with $\lim_n \|\gamma - \hat{\gamma}_n\| = 0$, and yet $\gamma$ is not an element of $E_\infty$.
Example 4.3.2 Continue with the situation in Example 4.2.2, and take
$\gamma(t) := \sin(2\pi t), \quad t \in I.$
Example 4.3.3 Continue with the situation in Example 4.2.2, and take
$\gamma(t) := \sin(2\pi t) + \sum_{k=1}^\infty \frac{1}{k^2}\cos(2\pi k t), \quad t \in I.$
This time, it is still the case that $\lim_n \|\gamma - \hat{\gamma}_n\| > 0$ but the projection of $\gamma$ onto $E_\infty$ does not exist.
$\overline{E}_\infty \ne L^2(I),$
Proof. First, when restricted to $E_n$, the projection operator $\text{Proj}_{E_n}$ reduces to the identity, i.e., $\text{Proj}_{E_n}(\gamma) = \gamma$ whenever $\gamma$ is an element of $E_n$. Thus, with the notation introduced earlier, for any $\gamma$ in $E_n$, we have
$\gamma = \hat{\gamma}_n = \sum_{k=1}^n \langle \gamma, \phi_k \rangle \phi_k$
so that
$\|\gamma\|^2 = \sum_{k=1}^n |\langle \gamma, \phi_k \rangle|^2$
and (4.8) holds. The relation (4.9) is proved in a similar way.
As a result, if $T_n(\gamma) = T_n(\gamma')$ for signals $\gamma$ and $\gamma'$ in $E_n$, then $T_n(\gamma - \gamma') = 0$ by linearity and $\|\gamma - \gamma'\| = \|T_n(\gamma - \gamma')\| = 0$ by isometry. The inescapable conclusion is that $\gamma = \gamma'$, whence $T_n$ is one-to-one.
Finally, any vector $v = (v_1, \ldots, v_n)$ in $\mathbb{R}^n$ gives rise to a signal $\gamma_v$ in $E_n$ through
$\gamma_v := \sum_{k=1}^n v_k \phi_k.$
It is plain that $\langle \gamma_v, \phi_k \rangle = v_k$ for each $k = 1, \ldots, n$, hence $T_n(\gamma_v) = v$ and the mapping $T_n$ is onto.
holds.
4.5 Exercises
Ex. 4.1 Consider two families u1 , . . . , up and w1 , . . . , wq of linearly independent
vectors in Rd . Show that we necessarily have p = q whenever
sp (u1 , . . . , up ) = sp (w1 , . . . , wq ) .
Ex. 4.4 With the notation of Exercise 4.3, show that addition of signals and multiplication of signals by scalars are compatible with this equivalence relation $\sim$. More precisely, with $\varphi \sim \varphi'$ and $\psi \sim \psi'$ in $L^2(I)$, show that $\varphi + \psi \sim \varphi' + \psi'$ and $a\varphi \sim a\varphi'$ for every scalar a.
Ex. 4.5 With $\varphi \sim \varphi'$ and $\psi \sim \psi'$ in $L^2(I)$, show that $\|\varphi\|^2 = \|\varphi'\|^2$ and that $\langle \varphi, \psi \rangle = \langle \varphi', \psi' \rangle$.
Ex. 4.6 Let $\mathbf{L}^2(I)$ denote the collection of equivalence classes induced on $L^2(I)$ by the equivalence relation $\sim$. Using Exercise 4.4 and Exercise 4.5, define a structure of vector space on $\mathbf{L}^2(I)$ and a notion of scalar product.
Ex. 4.7 Show that the signals $\{\phi_k, k = 0, 1, \ldots\}$ of Example 4.2.1 are linearly independent in $L^2(I)$.
Ex. 4.8 Show that the signals $\{\phi_k, k = 0, 1, \ldots\}$ of Example 4.2.2 form an orthonormal family in $L^2(I)$.
Ex. 4.9 Apply the Gram-Schmidt orthonormalization procedure to the signals
$\phi_k(t) = t^k, \quad t \in [0, 1], \ k = 0, 1, 2.$
Does the answer depend on the order in which the algorithm processes the signals $\phi_0$, $\phi_1$ and $\phi_2$?
Ex. 4.10 The distinct finite energy signals $\varphi_1, \ldots, \varphi_n$ defined on [0, 1] have the property that $\varphi_1(t) = \ldots = \varphi_n(t)$ for all t in the subinterval $[\alpha, \beta]$ with $0 < \alpha < \beta < 1$. Are such signals necessarily linearly independent in $L^2[0, 1]$? Explain.
Ex. 4.11 Starting with a finite energy signal g in $L^2[0, T]$ with $E(g) > 0$, define the two signals $g_c$ and $g_s$ in $L^2[0, T]$ by
$g_c(t) := g(t)\cos(2\pi f_c t) \quad \text{and} \quad g_s(t) := g(t)\sin(2\pi f_c t), \quad 0 \le t \le T$
for some carrier frequency $f_c > 0$. Show that the signals $g_c$ and $g_s$ are always linearly independent in $L^2[0, T]$.
Ex. 4.12 Consider the M signals in $L^2[0, T]$ given by
$s_m(t) = A\cos(2\pi f_c t + \theta_m), \quad 0 \le t \le T, \ m = 1, \ldots, M$
with amplitude $A > 0$, carrier frequency $f_c > 0$ and distinct phases $0 \le \theta_1 < \ldots < \theta_M < 2\pi$. What is the dimension L of $\text{sp}(s_1, \ldots, s_M)$? Find an orthonormal family in $L^2[0, T]$, say $\phi_1, \ldots, \phi_L$, such that $\text{sp}(s_1, \ldots, s_M) = \text{sp}(\phi_1, \ldots, \phi_L)$. Find the corresponding finite-dimensional representation.
Ex. 4.14 Same problem as in Exercise 4.12 for the M signals given by
$s_m(t) = A_m g(t), \quad 0 \le t \le T, \ m = 1, \ldots, M.$
Ex. 4.18 Consider a finite energy non-constant pulse $g : [0, 1] \to \mathbb{R}$, with $g(t) > 0$ in the unit interval [0, 1]. Are the signals g and $g^2$ linearly independent in $L^2[0, 1]$? Are the signals $g, g^2, \ldots, g^p$ always linearly independent in $L^2[0, 1]$?
Ex. 4.19 For each $\lambda > 0$, let $s_\lambda$ and $c_\lambda$ denote the signals $\mathbb{R} \to \mathbb{R}$ given by
Ex. 4.24