Hilbert Spaces and Least Squares Methods for Signal Processing

Yoram Bresler, Samit Basu, Christophe Couvreur

February 16, 2009

Acknowledgement: The help of Kiryung Lee in drafting some of the figures is gratefully acknowledged.

Copyright © 2009 by Yoram Bresler
Contents

1 Matrix Inverse Problems with Least-Squares Solutions  7
  1.1 Introduction  7
  1.2 The vector space C^n  8
    1.2.1 Definitions  8
    1.2.2 Orthogonality  13
    1.2.3 Matrices and C^n  18
    1.2.4 Subspaces Associated with Linear Maps  23
  1.3 Existence and Uniqueness of Solutions  24
  1.4 Left and Right Inverses  28
    1.4.1 Full Column Rank Case  28
    1.4.2 Full Row Rank Case  31
  1.5 Projections  33
    1.5.1 Approximation on a Subspace  33
    1.5.2 Orthogonality Principle  34
    1.5.3 Projectors  37
    1.5.4 Projectors in C^n  40
  1.6 Four Fundamental Subspaces  43
  1.7 Least Squares Solutions  45
  1.8 Minimum Norm Solutions  48
    1.8.1 Choosing a Solution  49
    1.8.2 Finding the Minimum Norm Solution  50
    1.8.3 Properties of Minimum Norm Solutions  55
  1.9 Minimum Norm Least Squares Solutions  57
  1.10 The Moore-Penrose Pseudoinverse  60
  1.11 Problems  62

2 Singular Value Decomposition  71
  2.1 Introduction  71
  2.2 Properties of Hermitian Symmetric Matrices  72
  2.3 The Singular Value Decomposition  75
    2.3.1 The SVD Theorem  75
  2.4 Computing the SVD  77
  2.5 Properties of the SVD  77
  2.6 Application to MNLS and Construction of the Pseudoinverse  80
  2.7 Lower Rank Approximation of Matrices  81
  2.8 Sensitivity and Conditioning of Matrix Inverse Problems  86
    2.8.1 Matrix Inversion  86
    2.8.2 Conditioning of Least-Squares Problems  89
  2.9 Regularization  90
  2.10 Total Least Squares  99
    2.10.1 Unconstrained TLS  100
    2.10.2 Constrained TLS  103
  2.11 Subspace Fitting  104

3 Temporal and Spatial Spectrum Estimation  109
  3.1 Harmonic Retrieval  109
    3.1.1 Periodogram  112
    3.1.2 Prony's Method  113
    3.1.3 Tufts and Kumaresan Method  116
    3.1.4 Coordinate Search  117
    3.1.5 Iterated Quadratic Maximum Likelihood  118
  3.2 Sensor Array Processing  121
    3.2.1 Beamforming for DOA Estimation  125
    3.2.2 MUSIC  125
    3.2.3 ESPRIT  128
    3.2.4 Harmonic Retrieval Using Sensor Array Methods  132

4 Inverse Problems in Linear Vector Spaces  137
  4.1 Why General Linear Vector Spaces?  137
  4.2 Basic Concepts  137
    4.2.1 Linear Combinations and Bases  140
  4.3 Normed Vector Spaces  144
    4.3.1 Definition  144
    4.3.2 Sequence Spaces: The l^p NVSs  147
    4.3.3 Function Spaces: The Lebesgue L^p NVSs  149
  4.4 Inner Product Spaces  154
    4.4.1 Definition  154
    4.4.2 Induced Norms on IPSs  156
    4.4.3 The l^2 IPS  158
    4.4.4 The L^2 IPS  159
  4.5 Orthogonality  160
  4.6 Linear Operators  162
  4.7 Norms on Linear Operators  164
  4.8 Adjoints  167

5 Finite Dimensional Linear Vector Spaces  173
  5.1 Introduction  173
  5.2 Coordinate Systems  175
  5.3 Representation of Linear Operators in Finite Dimensional Spaces  175
    5.3.1 Matrix Representation of Operators  175
  5.4 Transition Matrices and Similarity Transformations  178
    5.4.1 Numerical Issues  179
  5.5 Norms  180
  5.6 Inner Product Spaces  181
  5.7 MNLS Problems in Finite Dimensional Spaces  184

6 Infinite Dimensional Linear Vector Spaces  187
  6.1 Introduction  187
  6.2 Vector Sequences  187
  6.3 Complete NVS  190
  6.4 Vector Series  193
  6.5 Linear Operators in Banach Spaces  194

7 Complete Inner Product Spaces - Hilbert Spaces  197
  7.1 Projections and Projectors  197
    7.1.1 Approximation on a Subspace  197
    7.1.2 Orthogonality Principle  198
    7.1.3 Projectors  198
  7.2 Adjoints  202
  7.3 Fundamental Subspace Theorem  204
  7.4 Linear Inverse Problems in Infinite Dimensional Vector Spaces  205
    7.4.1 Least Squares, Minimum Norm, and Minimum Norm Least Squares  205
    7.4.2 Pseudoinverse  206
  7.5 Extrapolation of Bandlimited Sequences  208

8 Review of Probability  215
  8.1 Introduction  215
  8.2 Random Variables  215
  8.3 Discrete-Time Random Processes  217

9 The Hilbert Space of Random Variables  221
  9.1 Introduction  221
  9.2 Defining the Space  221
  9.3 Convergence of Sequences of Random Variables  223
  9.4 Linear Minimum Variance Estimation  223
  9.5 Best Linear Unbiased Estimation  227
  9.6 Sequential Estimation  228

10 Least Squares and Random Processes  231
  10.1 LTI Transformations of Zero-Mean, WSS Processes  231
  10.2 FIR Wiener Filtering  233
  10.3 Infinite Past Wiener Filter  233
Chapter 1

Matrix Inverse Problems with Least-Squares Solutions

1.1 Introduction
In this chapter, we will begin to introduce the mathematical tools needed to solve least squares problems. We will begin our discussion of least squares problems in C^n, which is the n-dimensional complex Euclidean space. We will start here for three reasons. First, as a straightforward generalization of the standard 2- and 3-dimensional Euclidean spaces that we study in geometry, C^n serves as the setting in which all of our intuition works. Concepts like planes, points, vectors, etc., for which we have a physical intuition, can be directly used to formulate and solve least squares problems. The second reason for studying C^n is that most of the concepts that we learn in this chapter will generalize to spaces that are far more complicated than C^n. We will rely on C^n (indeed, on the 2D and 3D real Euclidean spaces, where we can build models and draw pictures) to develop and understand the results in these more complicated spaces. The third reason for studying C^n is that many practical problems need to be solved on a computer, which is essentially limited to manipulations on C^n.[1]

In light of these three objectives, our approach to the vector space C^n is perhaps somewhat unusual. Because we want to use the results that we develop in this part of the book in later parts, we will generally avoid the shortest path to the desired result. Indeed, at times it will seem like our treatment is positively obscure. But there is a good reason for this lengthy treatment. Many of the arguments and proofs that we present in this chapter are "universal" in the sense that they work verbatim in the more complicated spaces considered later. Thus, a thorough understanding of the results and their proofs now is invaluable to accessing the analogous material in the more complicated spaces. If we instead gave the simplest arguments for these results (e.g., by using properties of C^n that do not generalize), then we would have to resort to re-explaining these results from scratch in the more complicated situations. Our intuition from this chapter would no longer generalize to later chapters. Hence, we take the less obvious route now, and reap the benefits later in our study of least squares problems.

[1] In fact, owing to the finite word-length used in representing numbers, computing is limited to Q^n, a space where all coordinates are rational numbers. However, because every real number can be approximated arbitrarily well by a rational (stated mathematically as "the rationals are dense on the real line"), we will ignore this nuance in our discussions.
Throughout this chapter, we have included a number of "Illustrations." These are segments in which we discuss the relevant theory in terms of signal processing applications. For example, the vast majority of the illustrations in this chapter focus on the deconvolution problem. The reason for this is twofold. First, deconvolution is flexible enough to demonstrate and require all of the theory developed in this chapter. Second, deconvolution is an extremely useful tool in signal processing. For these two reasons, we will return time and again to deconvolution as a means of illustrating the relevant theory.
1.2 The vector space C^n

1.2.1 Definitions

In this section we briefly define the notation used throughout the first part of the book. We assume that the reader has a working knowledge of basic linear algebra, and will skim through the basic definitions that are important from that field. For a more thorough review of linear algebra, we strongly recommend [?] or [?].

We will be primarily interested in the set C^n, which is the set of all n-tuples of complex numbers, i.e., items of the form x = [x_1, x_2, ..., x_n]^T, where x_j ∈ C. These lists of complex numbers will be referred to as vectors, which is suggestive of their generalization of vectors in the plane. We will also use the set R^n, which is the set of all n-tuples of real numbers. In DSP, such vectors can be used to represent finite-length sequences, or finite-duration discrete-time signals. We will therefore use these terms interchangeably.
The sets C^n and R^n have additional structure that makes them special. First, we can add elements of these sets by adding their components:

    [a_1, a_2, ..., a_n]^T + [b_1, b_2, ..., b_n]^T = [a_1 + b_1, a_2 + b_2, ..., a_n + b_n]^T        (1.1)

Note that the sum of two vectors from C^n is always a vector in C^n. Such a property is referred to as a closure property, and we say that C^n is closed under addition, using (1.1) as a definition of addition. Note also that R^n is a subset of C^n which is also closed under addition. We can also multiply vectors by constants from the appropriate set, and still end up with a vector. For example, if x ∈ C^n and α ∈ C, then

    αx = [αx_1, αx_2, ..., αx_n]^T.        (1.2)

So, clearly, C^n is closed under multiplication by complex numbers. Thus, we say that C^n is closed under scalar multiplication, using (1.2) as a definition of multiplication. The same also holds for R^n, if we assume that scalars are real. Subsets of C^n (or subsets of R^n if the set of scalars is taken as R, and not C) with the closure properties under addition as defined in (1.1) and scalar multiplication as defined in (1.2) are called subspaces, a term which we will use in greater generality in Part II of this book. For now, we will work exclusively with subspaces of C^n and R^n. When the subset in question is the entire set, i.e., C^n or R^n, we will refer to the subspace as a vector space.

It is useful to think of subspaces as hyperplanes (the generalization of a plane to n dimensions), but with an extra condition. Consider, for example, the following plane:

Example 1.1:

    V = {x ∈ R^3 : x_1 + x_2 + x_3 = 1}.
A quick check verifies that V is not closed under addition or scalar multiplication. For example, the zero vector 0 is not an element of V. But from the definition of a subspace, we can deduce that 0 must be an element of a subspace, because subspaces are closed under scalar multiplication, and multiplying a vector by 0 yields 0. So when thinking of a subspace as a plane, we must be careful to remember that the hyperplane must include the origin 0. Here are a few examples of subspaces of C^n and R^n. We leave it to the reader to verify the closure properties in each case.

Example 1.2:

    {x ∈ C^5 : x_1 = 0}
    {x ∈ R^3 : x_1 + 2x_3 = 0}
    {x ∈ C^1 : x = 0}
    R^2

Note that the last example is a valid subspace because we do not require the subspaces to be proper, i.e., we allow the entire set R^2, in this case, to be a subspace of itself.
Illustration 1.1 (DFT and Subspaces): Subspaces often consist of signals sharing a useful common physical property. For example, consider DFT-bandlimited signals. Recall that the DFT of a length-N complex signal {x(n)}_{n=0}^{N-1} is the sequence

    y(k) = Σ_{n=0}^{N-1} x(n) e^{-j(2π/N)kn},    k = 0, ..., N-1.        (1.3)

Arranging the elements of the signal and its DFT into vectors x, y ∈ C^N, respectively, we write the DFT more compactly as y = DFT{x}. Consider now the subset of signals with zero energy outside the DFT frequency band B = {k_1, ..., k_2}. In set notation, we can describe the set of such signals as

    S_B = {x ∈ C^N : y(k) = 0 ∀ k ∉ B, y = DFT{x}}.        (1.4)

We will call such signals DFT-bandlimited to band B, and will say that their DFT y is supported on B. The band B can be, in fact, more general, consisting of non-consecutive frequencies, such as the union of several bands, allowing us to model bandpass signals, bandstop signals, etc.

For a fixed band B, is S_B a subspace? Indeed, from the linearity properties of the DFT, we can show that S_B is a subspace of C^N. The closure properties are easily checked: let x_1, x_2 ∈ S_B with DFTs y_1 and y_2, respectively, and consider x_3 = αx_1 + βx_2. Then by the linearity of the DFT

    y_3 = DFT{x_3} = DFT{αx_1 + βx_2} = α DFT{x_1} + β DFT{x_2} = αy_1 + βy_2.

Because y_1 and y_2 are supported on B, it follows that so is y_3, hence x_3 ∈ S_B, establishing closure under addition and scalar multiplication.

The fact that signals with a common physical property often form a subspace is extremely useful. In particular, it will allow us, given an arbitrary signal x ∈ C^N, to find the signal closest to it that has the desired property. We caution, however, that various sets of signals that are of interest may not be subspaces. For example, we leave it to the reader to explain why the set of signals with conjugate symmetric DFT,

    {x ∈ C^N : y(k) = y*(N-k), k = 0, ..., N-1, y = DFT{x}},

is not a subspace.
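Below is a minimal numerical sketch of this illustration (not part of the original development), assuming numpy and that np.fft.fft / np.fft.ifft implement the DFT convention of (1.3): two signals whose DFTs are supported on a band B are combined linearly, and the combination is checked to remain DFT-bandlimited to B.

```python
import numpy as np

N = 16
B = [2, 3, 4]                       # example band: bins allowed to be nonzero

def bandlimited(rng):
    """Random signal whose DFT is supported on B."""
    y = np.zeros(N, dtype=complex)
    y[B] = rng.standard_normal(len(B)) + 1j * rng.standard_normal(len(B))
    return np.fft.ifft(y)           # inverse DFT gives the time-domain signal

rng = np.random.default_rng(0)
x1, x2 = bandlimited(rng), bandlimited(rng)
alpha, beta = 2.0 - 1j, 0.5 + 3j
x3 = alpha * x1 + beta * x2         # arbitrary linear combination

y3 = np.fft.fft(x3)
outside = np.delete(np.arange(N), B)
print(np.max(np.abs(y3[outside])))  # ~1e-15: x3 is still DFT-bandlimited to B
```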
Illustration 1.2 (FIR Filters and Subspaces): The concept of a subspace arises naturally in the study of linear systems. For example, consider an FIR filter with a length-K unit-pulse response h = [h(0), h(1), ..., h(K-1)]^T, applied to a length-N input signal x = [x(0), x(1), ..., x(N-1)]^T. We know that the output is of length M = N + K - 1, and that with zero initial conditions on the filter h, the output is described by a convolution:

    y(m) = Σ_{k=0}^{N-1} x(k) h_T(m-k),    m = 0, 1, ..., M-1        (1.5)

where

    h_T(n) = { h(n)  if 0 ≤ n ≤ K-1
               0     otherwise.

We can write (1.5) more compactly as y = x * h, using * to denote convolution.

Suppose now that we fix the impulse response of the filter, h, and allow x to take on any value in C^N. The output of the filter is an element of C^M, and the set of all possible output signals y is described in set notation as

    V = {y ∈ C^M : y = x * h, x ∈ C^N}.        (1.6)

Why is this interesting? Well, it follows from (1.5) (or alternately from the linearity properties of convolution) that V is a subspace of C^M. The closure properties are easily checked as follows. Let y_1, y_2 ∈ V, and y_3 = αy_1 + βy_2.

    y_1 ∈ V ⟹ ∃ x_1 ∈ C^N such that y_1 = x_1 * h
    y_2 ∈ V ⟹ ∃ x_2 ∈ C^N such that y_2 = x_2 * h,

so now by the linearity of convolution

    y_3 = αy_1 + βy_2 = αx_1 * h + βx_2 * h = (αx_1 + βx_2) * h = x_3 * h,  x_3 ∈ C^N  ⟹  y_3 ∈ V.

As we will shortly begin to discuss, subspaces are mathematical entities that we can manipulate and study. Like the previous illustration, this illustration in terms of an FIR filter provides the motivation for that study. For example, one question we might ask is "does y_0 belong to V?". This kind of question arises when we have a model for h, but can only measure y_0. Noise in the measurements means we cannot be certain that y_0 ∈ V. Or, if the noise is negligible, we may instead want to determine whether our model for h is correct (i.e., agrees with the physical measurement). We may want to "clean up" our measurement y_0 by finding an element of V that is close to y_0. All of these questions, and far more, can be answered using the tools of this chapter.
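As a quick numerical companion to the closure argument (a sketch only, assuming that np.convolve computes the full length-M linear convolution x * h), the following checks that a linear combination of two filter outputs is again the output of the same filter:

```python
import numpy as np

rng = np.random.default_rng(1)
N, K = 8, 3
h = rng.standard_normal(K)                   # fixed FIR impulse response
x1 = rng.standard_normal(N)
x2 = rng.standard_normal(N)
alpha, beta = -1.5, 0.25

y1 = np.convolve(x1, h)                      # elements of V (length N + K - 1)
y2 = np.convolve(x2, h)
y3 = alpha * y1 + beta * y2                  # candidate output

# By linearity, y3 is the response to x3 = alpha*x1 + beta*x2, so y3 is in V.
x3 = alpha * x1 + beta * x2
print(np.allclose(y3, np.convolve(x3, h)))   # True
```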
Being able to add two vectors together is useful, and we can generalize it by allowing a finite number of vectors, and allowing for scalings as well. Hence, we define a linear combination of vectors in a set S (which may or may not be finite) as a finite sum of the form

    Σ_{m=1}^{M} α_m x_m,    x_m ∈ S,

where the α_m are scalars.[2] A set S is termed linearly independent if

    ( Σ_{m=1}^{M} α_m x_m = 0,  α_m ∈ C )  ⟹  ( α_m = 0,  m = 1, ..., M )

for every finite M, and every set {x_m}_{m=1}^{M} ⊆ S. The name correctly suggests that none of these vectors can be written as a linear combination of the remaining M-1 vectors. The set of all vectors reachable by taking (finite) linear combinations of elements of a set of vectors S is called the span of S. It can be defined explicitly as

    span S = set of all possible linear combinations of vectors in S.

We leave it as an exercise to verify that the span of any set of vectors S is a subspace. When the set S has the additional property of linear independence, then the set S is called a basis for the subspace span S. In other words, a set of vectors B is a basis for a subspace V if it satisfies two conditions: (i) the elements of B are linearly independent; and (ii) every element of V can be written as a linear combination of elements of B. The dimension of a subspace is the number of elements in any basis for that subspace. This definition only makes sense because every basis for a subspace must have the same number of elements in it.

Example 1.3 (The Standard Basis): The set of vectors e_i = [0, ..., 0, 1, 0, ..., 0]^T, i = 1, ..., n, where only the ith element of e_i is one, is a basis for both C^n and R^n. It is known as the standard basis.

Because they admit the standard basis, both C^n and R^n have dimension n. Likewise, because every basis in these spaces can have at most n elements in it, every subspace of C^n (or R^n) has dimension at most n. The following example, with an infinite set S, further illustrates the concepts of span, basis, and dimension.

Example 1.4: Consider the set S = {x = [1, 2, α]^T : α ∈ R} ⊂ R^3. Clearly, S is a line in three-dimensional Euclidean space, parallel to the z-axis, and passing through the point [1, 2, 0]^T. It is easily verified that

    span S = {x = [β, 2β, α]^T : α, β ∈ R} = span{[1, 2, 0]^T, [0, 0, 1]^T}.

Hence, span S is a two-dimensional subspace of R^3 (a plane), with basis {[1, 2, 0]^T, [0, 0, 1]^T}. Hence, a single line can span a two-dimensional space.

[2] At this point, it makes no sense to take infinite sums, because such sums require a notion of convergence that we will not cover until Part II.
Illustration 1.3 (FIR Filters, Subspaces, Basis, and Dimension): Consider again the space V of possible outputs of the FIR filter h, defined in (1.6). Is this a proper subspace of C^M? One way to determine this is to find the dimension dim(V) of V. If dim(V) < dim(C^M) = M, then (why?) V is a proper subspace of C^M.

To find the exact dimension of V, we will find a basis for it. To do so, we rewrite (1.5) as

    y = Σ_{k=0}^{N-1} x(k) D_k h        (1.7)

where D_k denotes a delay by k samples, so that D_k h ∈ C^M is defined by

    (D_k h)(m) = { h(m-k)  if 0 ≤ m-k ≤ K-1
                   0       otherwise,            m = 0, ..., M-1.

Even without further work, we can already determine that dim(V) ≤ N, because by (1.7) any y ∈ V can be represented as a linear combination of no more than N fixed vectors D_k h, k = 0, ..., N-1. It follows that V is a proper subspace of C^M for any K > 1, i.e., for any FIR filter with more than one tap.

To determine the exact dimension of V, we will now show that the set {D_k h, k = 0, ..., N-1} is in fact a basis for V. In view of (1.7), we only need to check the linear independence of these vectors. To that end, arrange the D_k h as columns of an M x N matrix

    H ≜ [D_0 h, D_1 h, ..., D_{N-1} h] =

        [ h(0)     0        ...     0
          h(1)     h(0)             .
           .       h(1)      .      .
          h(K-1)    .        .     h(0)
           0       h(K-1)    .     h(1)
           .        .        .      .
           0       ...       0     h(K-1) ].        (1.8)

Then, by (1.7), we have

    y = h * x = Hx.        (1.9)

It is not hard to show by direct algebraic arguments (see the exercises) that all N columns of H are linearly independent whenever h ≠ 0. Here, we present instead an argument that uses our knowledge of signal processing.

Returning to (1.7), suppose y = 0. If we can show that this implies x(k) = 0, k = 0, ..., N-1, then {D_k h, k = 0, ..., N-1} must be a linearly independent set. Consider the Z-transform of the convolution relation y = h * x, namely Y(z) = H(z)X(z). Suppose y = 0; then Y(z) = 0 for all z ∈ C. Furthermore, if h ≠ 0, H(z) can only vanish at at most K-1 points in the complex plane (why?). Hence, X(z) = 0 ∀ z ∈ C, implying that x = 0. It therefore follows that if h ≠ 0, then y = 0 ⟹ x = 0, and {D_k h, k = 0, ..., N-1} is a linearly independent set. In conclusion, we have shown that if h ≠ 0, then {D_k h, k = 0, ..., N-1} is a basis for V, and therefore dim(V) = N.
We can measure the size of vectors in these subspaces using the concept of a norm. A norm on a vector space is a function that assigns a nonnegative real number to every vector. It has four properties that make it conform to our notions of length in the plane. These properties are:

Property 1.1 (Norm Properties).
1. Nonnegative: ‖x‖ ≥ 0
2. Positive Definite: ‖x‖ = 0 ⟺ x = 0
3. Homogeneous: ‖αx‖ = |α| ‖x‖
4. Triangle Inequality: ‖x + y‖ ≤ ‖x‖ + ‖y‖.

Recall the definition of the Euclidean norm on C^n, which is a straightforward extension of the notion of length in two or three dimensions:

    ‖x‖_2 = ( Σ_{i=1}^{n} |x_i|^2 )^{1/2}.

Verifying that the Euclidean norm, along with some other common norms, satisfies Property 1.1 is left to the exercises. A norm also induces a measure of distance. For our purposes, we will define the distance between two vectors x and y to be ‖x - y‖.

In the sequel, we will restrict our attention, unless explicitly stated otherwise, to the Euclidean norm, or two-norm ‖·‖_2, for vectors in C^n. We will therefore usually drop the subscript 2 in its notation.
A final structural element that we will add to our vector spaces is that of the dot product. The dot product of two vectors x, y in C^n is denoted and defined as

    x^H y = Σ_{i=1}^{n} x̄_i y_i.

To recall one of the most important interpretations of the dot product, we need the following inequality, which is arguably one of the most important inequalities in the context of this book. The proof is left as an exercise.

Theorem 1.1 (Cauchy-Schwarz Inequality). If x, y ∈ C^n, then

    |x^H y| ≤ ‖x‖ ‖y‖,

with equality if and only if x = αy, with α ∈ C.

This theorem (which for brevity we will refer to as the CS Inequality) allows us to interpret the dot product as the generalization of an angle between two vectors,

    θ = arccos( |x^H y| / (‖x‖ ‖y‖) ).        (1.10)

We can make such a definition because the argument of the arccos is guaranteed, by the CS inequality, to be between 0 and 1. Also, this definition is consistent with our geometric notion of angle in Euclidean 2- or 3-space.
Illustration 1.4 (Comparing Signals): The dot product, the CS inequality, and (1.10) together serve as very useful tools in the comparison of signals. Suppose, for example, a sample segment of a speech signal x = [x(0), x(1), ..., x(N-1)]^T ∈ R^N is to be compared to a template x_0 recorded under different circumstances. We would like to compare the two signals to gauge how similar x is to x_0. But because the signals were acquired under possibly different conditions, it is possible that x has been scaled by some unknown constant α. If we think of these sampled signals as vectors in an N-dimensional vector space, then the angle between the two vectors,

    θ(x, x_0) = arccos( |x^H x_0| / (‖x‖ ‖x_0‖) ),

gives us a measure of proximity that is length invariant. In particular, we get the following properties by using this angle as a measure of similarity:

    θ(x, x_0) = 0 implies x = αx_0, by the CS inequality,
    θ(c_1 x, c_2 x_0) = θ(x, x_0) for any nonzero c_1, c_2 ∈ R,
    θ(x, x_0) = θ(x_0, x).

Hence, the notion of angles between vectors has a useful interpretation in signal processing as a measure of similarity that is invariant to scale changes.
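A small sketch of this similarity measure, assuming only numpy; the helper name `angle` and the clipping guard against round-off are our own additions, not part of the text:

```python
import numpy as np

def angle(x, x0):
    """Angle of (1.10) between two signals, invariant to (real) scaling."""
    c = abs(np.vdot(x, x0)) / (np.linalg.norm(x) * np.linalg.norm(x0))
    return np.arccos(np.clip(c, 0.0, 1.0))

rng = np.random.default_rng(3)
x0 = rng.standard_normal(64)               # template signal
x = 5.0 * (x0 + 0.3 * rng.standard_normal(64))   # noisy observation, unknown gain

print(angle(x, x0))                                          # small, nonzero angle
print(np.isclose(angle(-3.0 * x, 0.5 * x0), angle(x, x0)))   # True: scale invariant
print(np.isclose(angle(x, x0), angle(x0, x)))                # True: symmetric
```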
1.2.2 Orthogonality

With the ability to measure angles between vectors, we can extend the notion of perpendicular vectors in Euclidean 2- or 3-space to general vectors in C^n. Recall that two vectors are perpendicular if the angle between them is π/2. A nearly equivalent definition is that of orthogonality, in which two vectors x, y in C^n are orthogonal if

    x^H y = 0,

which we denote as x ⊥ y. We will say that a vector x is orthogonal to a subset S ⊆ C^n, denoted x ⊥ S, if it is orthogonal to every vector in S, i.e., x ⊥ z ∀ z ∈ S.

Note that the zero vector is orthogonal to every vector in C^n. Furthermore, it is the only vector orthogonal to itself, a fact that we will state formally because of its central importance to this chapter.

Lemma 1.1. x ⊥ x if and only if x = 0.

Proof. If x = 0, then x^H x = 0 by definition. Conversely, if x ⊥ x then we have

    x^H x = Σ_{i=1}^{n} |x_i|^2 = 0.

Since each term in the sum is nonnegative, each component x_i must be zero, and thus x = 0.

Orthogonality is important in applications, because it embodies a deterministic notion of independence. To fully understand this idea will require some more machinery, and we will return to an interpretation of orthogonality in terms of applications in Section 1.5.

If we fix a vector of interest x, then orthogonality also defines some sets of vectors that are interesting. First, given any two subsets S and T of our vector space C^n, the sum of these sets is the set

    S + T = {x + z : x ∈ S, z ∈ T},

which is a subset of C^n. Note that, in general, the sum of the sets is different from their union: S + T ≠ S ∪ T.

Example 1.5: Let S = {[1, 0]^T} ⊂ R^2, and T = {[0, α]^T ∈ R^2 : α ∈ R}. Then T is the y-axis in the plane, and S is a point on the x-axis. Now, the union of these two sets is simply

    S ∪ T = {[1, 0]^T} ∪ {[0, α]^T : α ∈ R},

which is the y-axis, augmented by a single point on the x-axis. The sum of these sets, on the other hand, is the set

    S + T = {[1, α]^T : α ∈ R},

which is a line parallel to the y-axis, passing through the point [1, 0]^T.
For the most part, we will be interested in sums of subspaces. Two subspaces are called disjoint when the only vector common to both is 0.[3] Now, when two subspaces S and T are disjoint, their sum is called a direct sum (denoted S ⊕ T). Direct sums are extremely useful, because they allow us to uniquely decompose an arbitrary element s ∈ (S_1 ⊕ S_2) into the sum of two components s = s_1 + s_2 with s_1 ∈ S_1 and s_2 ∈ S_2.

Theorem 1.2. Let S be a subspace of C^n, and let S_1 and S_2 be subspaces of C^n that satisfy S = S_1 ⊕ S_2. Then for every s ∈ S, there exists a unique pair s_1 ∈ S_1, s_2 ∈ S_2 such that s = s_1 + s_2.

[3] This usage is distinct from the notion of disjoint sets, because the two subspaces do have 0 as a common element.
Proof. The proof is simple. Existence of a pair s_1 ∈ S_1, s_2 ∈ S_2 is guaranteed by the definition of a direct sum. What we have to prove is uniqueness. Suppose that there are two decompositions:

    s = s_1 + s_2,    s_1 ∈ S_1, s_2 ∈ S_2
    s = t_1 + t_2,    t_1 ∈ S_1, t_2 ∈ S_2.

Then it follows that s_1 + s_2 = t_1 + t_2, or by rearranging, s_1 - t_1 = t_2 - s_2. Now, because S_1 is a subspace (closed under addition and scalar multiplication), we have s_1 - t_1 ∈ S_1, and likewise for S_2, we have s_2 - t_2 ∈ S_2. Thus, it follows that s_1 - t_1 ∈ S_1 ∩ S_2, and t_2 - s_2 ∈ S_1 ∩ S_2. The intersection of these subspaces is the single element 0. We conclude

    s_1 - t_1 = 0,    t_2 - s_2 = 0,

and the decomposition is unique.
Next, we define algebraic complements. Two subspaces S and T of C^n are algebraic complements if they are disjoint, and their algebraic sum is the entire space: S + T = C^n. The notion of algebraic complements is distinct from the notion of set complements. Furthermore, while the set complement of a given subspace S ⊆ C^n is unique, there are many subspaces that can be the algebraic complements of S. Recall that for any set S, and any subset T ⊆ S, the set complement of T, denoted T^c, is the set {x ∈ S : x ∉ T}. The following example serves to illustrate these differences.

Example 1.6: Let S = span{[1, 0]^T}, and T = span{[0, 1]^T}. We can think of S and T as being the x- and y-axis, respectively. Now, these subspaces are algebraic complements, because R^2 = S + T (indeed, we can write R^2 = S ⊕ T). Likewise, S and Q = span{[1, 1]^T} are algebraic complements. But S^c ≠ T, S^c ≠ Q, and in fact,

    S^c = {x ∈ R^2 : x_2 ≠ 0}.
Illustration 1.5 (FIR Filters and Subspaces, Continued): We can examine the various notions of complements in terms of our earlier FIR filter example. Recall our subspace of output signals for a fixed h:

    V = {y ∈ C^M : y = x * h, x ∈ C^N}.

The set complement of V, V^c, is simply the set of "unreachable" signals, i.e., signals y ∈ C^M for which no input x exists that satisfies y = x * h. Although this set would appear to be of great interest to us, it will turn out (for reasons to be discussed later) to be of secondary importance.

The concept of a direct sum (which is a special case of algebraic sum) can also be illustrated in terms of the FIR filter example. Suppose we partition our class of input signals x ∈ C^N into two subspaces. The first subspace X_1 corresponds to the first half of the input signal (assuming N is even), i.e.,

    X_1 = {x ∈ C^N : x_n = 0, n = N/2 + 1, N/2 + 2, ..., N},

and the second subspace corresponds to the second half of the input signal,

    X_2 = {x ∈ C^N : x_n = 0, n = 1, 2, ..., N/2}.

Note that X_1 + X_2 = X_1 ⊕ X_2 = C^N, so that X_1 and X_2 are algebraic complements.

If we then consider the outputs generated by passing signals in these subspaces through the filter h, we get

    V_1 = {y ∈ C^M : y = x * h, x ∈ X_1}

and

    V_2 = {y ∈ C^M : y = x * h, x ∈ X_2}.

The subspaces V_1 and V_2 clearly satisfy V_1 ⊕ V_2 = V, as the only vector common to both is the zero signal y = 0 (proving this fact is left to the exercises). Furthermore, while X_1 and X_2 are algebraic complements, V_1 and V_2 are not (unless V = C^M). It follows by Theorem 1.2 that any signal y ∈ V can be uniquely decomposed into the sum of a signal y_1 ∈ V_1 and a signal y_2 ∈ V_2. Thus, given a measurement y from the output of our FIR filter, we can decompose it into the output from the first half of the signal, and into the output from the second half of the signal, although these outputs will, in general, overlap. Such a decomposition could potentially be very useful if we are trying to learn something about the structure of the input signal. For instance, if the FIR filter represents the response of a communications channel, separating the output into y_1 and y_2 would amount to overcoming intersymbol interference between successive input symbols x_1 and x_2. We will learn how to perform such a decomposition in this chapter.
The most useful notion of a complement for our purposes, however, is the idea of an orthogonal complement. For any subset S of C^n, the orthogonal complement, denoted S^⊥, is the set defined by

    S^⊥ = {x ∈ C^n : x^H s = 0 ∀ s ∈ S}.

Figure ?? depicts several examples of orthogonal complements in R^2.

Example 1.7:

    S = { [1, 4, 3]^T, [0, 5, 2]^T },
    S^⊥ = {x ∈ C^3 : [1, 4, 3]x = 0, [0, 5, 2]x = 0} = span{ [-7/5, -2/5, 1]^T }.

Example 1.8:

    S = {x ∈ C^2 : x_1 + x_2 = 0},
    S^⊥ = {x ∈ C^2 : x_1 - x_2 = 0}.
Orthogonal complements have many properties that make them fairly easy to work with. The first property is that the orthogonal complement of any set is a subspace.

Property 1.2. For every subset S of C^n, S^⊥ is a subspace of C^n.

Proof.

    x, y ∈ S^⊥  ⟹  x^H z = y^H z = 0  ∀ z ∈ S
                ⟹  ∀ α, β ∈ C, (αx + βy)^H z = ᾱ x^H z + β̄ y^H z = 0  ∀ z ∈ S
                ⟹  αx + βy ∈ S^⊥.

For a general set S, taking the orthogonal complement twice does not give back the original set S. In light of the previous property, this should not be surprising, since the orthogonal complement of any set is a subspace, and the original S was not necessarily a subspace. Instead, we have the following containment property.
Property 1.3. S ⊆ (S^⊥)^⊥.

Proof.

    x ∈ S  ⟹  x ⊥ y  ∀ y ∈ S^⊥  ⟹  x ∈ (S^⊥)^⊥.

The next property is simply a restatement of Lemma 1.1 in terms of orthogonal complements, and states that at most the zero vector belongs to both a set and its orthogonal complement. From our previous definitions, this means that the algebraic sum of S and S^⊥ is a direct sum when S is a subspace.

Property 1.4. Let S be a set that contains 0. Then S ∩ S^⊥ = {0}.

Proof.

    x ∈ S ∩ S^⊥  ⟹  x ∈ S, x ∈ S^⊥
                 ⟹  x ⊥ x
                 ⟹  x = 0    (Lemma 1.1).

Next, we find that the only vector that is orthogonal to the set S + S^⊥ is the zero vector.

Property 1.5. Let S be a non-empty set. Then (S + S^⊥)^⊥ = {0}.

Proof.

    x ∈ (S + S^⊥)^⊥  ⟹  x ⊥ (S + S^⊥)
                     ⟹  x ⊥ y + z,  ∀ y ∈ S, z ∈ S^⊥
                     ⟹  x ⊥ y + 0,  ∀ y ∈ S    (0 ∈ S^⊥, Property 1.2)
                     ⟹  x ⊥ z,  ∀ z ∈ S^⊥    (subtracting the two relations above for a fixed y ∈ S)
                     ⟹  x ∈ S^⊥, x ⊥ S^⊥
                     ⟹  x ⊥ x
                     ⟹  x = 0.

The next property is a direct consequence of Lemma 1.1 in terms of orthogonal complements, and states that the only vector that is orthogonal to the entire space C^n is the zero vector.

Property 1.6. (C^n)^⊥ = {0}.

Proof.

    x ∈ (C^n)^⊥  ⟹  x ⊥ x  ⟹  x = 0.
The last, and arguably the most important, property states that when S is a subspace, we can replace the containment in Property 1.3 with an equality. Thus, for any subspace of C^n, we can "undo" an orthogonal complement by taking the orthogonal complement: (S^⊥)^⊥ = S. Furthermore, for any set S, we can combine Property 1.5 and Property 1.6 to get the general statement:

    (S + S^⊥)^⊥ = (C^n)^⊥.

Now, when S is a subspace, note that S + S^⊥ is the sum of two subspaces, which is also a subspace (verify this for yourself). So for this special case, we can take the orthogonal complement of both sides of this equation to yield

    S + S^⊥ = C^n,

or, using the direct sum notation, C^n = S ⊕ S^⊥. This decomposition is extremely important to our study of least squares problems and subspaces (and their orthogonal complements) in general. Together with Theorem 1.2, it states that any vector x ∈ C^n can be decomposed uniquely into a sum x = x_S + x_{S^⊥}, where x_S ∈ S and x_{S^⊥} ∈ S^⊥. This decomposition is important because, out of all such additive decompositions x = x_1 + x_2, where x_1 ∈ S and x_2 ∈ T (with S + T = C^n), this particular decomposition has optimality properties that we will study and exploit to solve least squares problems.

Property 1.7. If S is a subspace of C^n, then (i) (S^⊥)^⊥ = S, (ii) C^n = S ⊕ S^⊥.

Proof. Our arguments above show that (ii) follows from (i) and Properties 1.5 and 1.6. The proof of (i) is deferred until later.
1.2.3 Matrices and C^n

Now that we have described C^n and given it some structure, we turn to the manipulation of vectors. Recall that a map is a function that assigns to each element of a set X (called the domain) a unique element in another set Y (called the range). If a map T has domain X and range Y, we will denote this by writing T : X → Y, read "T maps X into Y". When X and Y are sets of vectors, like C^n, it is useful to think of maps as black boxes, or filters, that take inputs from one vector space, and return vectors in another vector space. This model, depicted in Figure ??, is tremendously flexible. For example, if a vector represents n samples of an input signal, we might pass these samples through some type of digital filter to obtain m samples of the output signal. The digital filter is then a map.

This notion of a map is too flexible to admit much detailed analysis. We will focus on a class of maps that are reasonably flexible, and yet can be analyzed quite thoroughly. Specifically, we will be interested in linear maps. Linear maps A : X → Y are maps that obey superposition and homogeneity (or in short, linearity):

    A(αx + βy) = αA(x) + βA(y),    α, β ∈ C,  x, y ∈ X.

Linear maps are sufficiently flexible to characterize many, but not all, signal processing problems of interest.
Illustration 1.6 (Vector Operations in Signal Processing): Here we list a number of vector manipulations that might arise in signal processing applications, and discuss the linearity of each operation. A small matrix sketch of two of these operations follows the list.

1. Convolution - when the input signal x and the impulse response h are finite length, then the output signal y = h * x is of finite length, and we can think of convolution in terms of operations on vectors. This operation is, of course, linear.

2. Time reversal - for a vector x ∈ C^n, the operation of reversing its elements, i.e.,

       x_R = [x_n, x_{n-1}, ..., x_1]^T,

   arises in signal processing applications, such as computing correlations or convolutions, and is a linear operation.

3. Truncation and zero padding - truncating a vector x ∈ C^n means to map it to a smaller vector y ∈ C^m with m < n by discarding some of the elements of x. Zero padding is the reverse operation, and is the mapping of x into a larger vector by inserting zeros where appropriate. Both of these operations are linear.

4. Down-sampling and up-sampling by integer factors - down-sampling by a factor of N is the operation of keeping only 1 out of every N samples from a signal, and up-sampling is the insertion of N-1 zeros after every sample of a signal. Both of these operations are linear.

5. Permutation of vector elements - a permutation is simply a reshuffling of the components of a vector, i.e., each index k is mapped to a unique new index k', with component x_k moved to position k'. This operation is linear. Note that time reversal is simply a special case of a permutation.

6. DFT, DWT, DCT - the Discrete Fourier, Wavelet, and Cosine transforms are all linear operations on vectors that frequently arise in signal processing applications and methods.

7. Quantization - quantization, which maps the elements of x into integers, is clearly not a linear operation.

8. Image rotation - by appropriately reordering an N x N image, we can think of it as a vector in C^{N^2}. Rotation of this image by a multiple of π/2 radians corresponds to linear operations on that vector.

9. Thresholding - the threshold operation,

       (Thresh_τ x)_i = { x_i  if |x_i| > τ
                          0    otherwise,

   is a nonlinear mapping that zeroes out the components of x that are smaller in magnitude than some threshold τ.
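As a sketch of the matrix viewpoint (assuming numpy; the specific matrices R and D below are our own illustrative choices, not notation from the text), time reversal, down-sampling by 2, and up-sampling by 2 can each be written as multiplication by an appropriate matrix:

```python
import numpy as np

n = 6
x = np.arange(1.0, n + 1)            # x = [1, 2, 3, 4, 5, 6]

R = np.fliplr(np.eye(n))             # time-reversal matrix (a permutation)
print(R @ x)                         # [6. 5. 4. 3. 2. 1.]

D = np.eye(n)[::2]                   # down-sampling by 2: keep rows 0, 2, 4
print(D @ x)                         # [1. 3. 5.]

# Up-sampling by 2 (insert a zero after each sample) is multiplication by D^T:
print(D.T @ (D @ x))                 # [1. 0. 3. 0. 5. 0.]
```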
An important characteristic of linear maps is that when the domain and range are vector spaces like C^n or R^n, these linear maps can be written in an explicit form. Every linear transformation from C^n into C^m can be written as a matrix, or array of scalars, of size m x n. A matrix A acts on a vector x ∈ C^n by matrix multiplication to return a vector y ∈ C^m, denoted y = Ax, and defined by

    y_i = Σ_{j=1}^{n} a_{i,j} x_j,    i = 1, ..., m,

where a_{i,j} is the element of A (written as an array) in the ith row and jth column. We will write a matrix that maps C^n into C^m (i.e., an m x n matrix) as an element of the set C^{m x n}.

We leave it to the reader to check that every linear transformation can be written as a matrix that acts on vectors in this way. So our interest in linear maps leads us directly to a study of matrices. We will generally focus on these matrices for the remainder of this part, occasionally using the equivalence of linear maps and matrices in our work.
Illustration 1.7 (Matrix Representation of the DFT): The DFT of a length-N complex sequence, y = DFT{x}, defined in (1.3), can be represented as

    y = Wx,        (1.11)

where the N x N DFT matrix W has elements W_{mn} = e^{-j(2π/N)mn}.

Likewise, we know from DSP that the original sequence can be recovered from its DFT by the inverse DFT,

    x(n) = (1/N) Σ_{k=0}^{N-1} y(k) e^{j(2π/N)kn},    n = 0, ..., N-1,        (1.12)

denoted as x = DFT^{-1}{y}. Not surprisingly, this operation too can be represented in matrix form as x = W^{-1}y, where W^{-1} is the inverse DFT matrix, which is equal to the matrix inverse of the DFT matrix, and has elements (W^{-1})_{mn} = (1/N) e^{j(2π/N)mn}.

For future reference, we note a few additional properties of the DFT matrix and its inverse. From the form of the elements of these matrices, it follows that they are symmetric matrices, that is,

    W^T = W,    (W^{-1})^T = W^{-1},

and that they are related in a simple way: one is the complex conjugate of the other, to within scaling,

    W^{-1} = (1/N) W̄ = (1/N) W^H,

where the last equality follows by the symmetry of W. It therefore follows that

    W W^H = W^H W = N I,

so that the normalized DFT matrix W̃ ≜ (1/√N) W is a unitary matrix:

    W̃ W̃^H = W̃^H W̃ = I.
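These identities are easy to verify numerically. The following sketch assumes numpy and that np.fft.fft uses the same sign convention as (1.3):

```python
import numpy as np

N = 8
m = np.arange(N)
W = np.exp(-2j * np.pi * np.outer(m, m) / N)       # DFT matrix of (1.11)

x = np.random.default_rng(4).standard_normal(N)
print(np.allclose(W @ x, np.fft.fft(x)))           # matrix form matches the DFT
print(np.allclose(W @ W.conj().T, N * np.eye(N)))  # W W^H = N I
print(np.allclose(np.linalg.inv(W), W.conj() / N)) # W^{-1} = (1/N) conj(W)
```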
Illustration 1.8 (FIR Filters and Matrix Representation of Convolution): As shown in (1.9), the linear map corresponding to convolution with a finite impulse response h can be represented by the convolution matrix H defined in (1.8). Note that H has a special structure: its elements are constant along diagonals. In other words, H_{ij} is only a function of the difference i - j of the row and column indices. A matrix with this structure is known as a Toeplitz matrix. Toeplitz matrices arise in many signal processing applications (including convolution, of course) and have been the subject of extensive studies. We will encounter them and their properties again in other parts of the book.
Illustration 1.9 (FIR Filters and Matrix Representation of Truncated Convolution): Consider again the convolution relation y = h * x. This time, however, we only observe a truncated version of the convolved signal: our measurement is now the vector of samples b ∈ C^L for L ≤ N + K - 1,

    b = [y(c), y(c+1), ..., y(c+L-1)]^T,    0 ≤ c ≤ N + K - 1 - L.        (1.13)

Truncated convolution frequently arises in imaging systems. Here, the signal x (the imaged scene) is infinitely long, or at least very long compared to the size of the image plane. Thus, we only record part of the scene y. Occasionally, we are interested in the part of the scene that is the same size as our recording. Hence, in this instance, we have L = N: the sizes of the input and output are the same. The matrix H̃ ∈ C^{L x N} describing the truncated linear convolution problem can be constructed as H̃ = TH, where H was defined previously (the untruncated linear convolution matrix), and T ∈ C^{L x (N+K-1)} is the truncation matrix defined as

    T = [ e_{c+1}, e_{c+2}, ..., e_{c+L} ]^T
      = [ 0 ... 0 1 0 ...           ... 0
          .         .       .           .
          0 ...         ... 0 1 0 ... 0 ],        (1.14)

where e_l is the lth column of I_{(N+K-1)}. The resulting L x N truncated convolution matrix H̃ is also a Toeplitz matrix.
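A short numerical sketch of the construction H̃ = TH (assuming numpy; the sizes N, K, L and the offset c below are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(5)
N, K = 8, 4
M = N + K - 1
h = rng.standard_normal(K)

H = np.zeros((M, N))
for k in range(N):
    H[k:k + K, k] = h                         # full (Toeplitz) convolution matrix

L, c = N, 2                                   # keep L samples starting at y(c)
T = np.eye(M)[c:c + L]                        # truncation (row-selection) matrix
H_trunc = T @ H                               # L x N truncated convolution matrix

x = rng.standard_normal(N)
y = np.convolve(x, h)
print(np.allclose(H_trunc @ x, y[c:c + L]))   # True: b = [y(c), ..., y(c+L-1)]
```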
Illustration 1.10 (Matrix Representation of Circular Convolution): The circular convolution (modulo N) of two length-N vectors x ∈ C^N, h ∈ C^N, is denoted by y = h ⊛_N x and defined by

    y(m) = Σ_{n=0}^{N-1} x(n) h(<m-n>_N),    m = 0, ..., N-1,        (1.15)

where <·>_N is the modulo-N operation. Using arguments similar to those in Illustration 1.3, the reader can show that circular convolution can be represented in matrix form as

    y = h ⊛_N x = Hx,        (1.16)

where the N x N circular convolution matrix H is

    H ≜ [ h(0)     h(N-1)   h(N-2)   ...   h(1)
          h(1)     h(0)     h(N-1)   ...   h(2)
          h(2)     h(1)     h(0)     ...   h(3)
           .         .         .      .     .
          h(N-1)   h(N-2)   h(N-3)   ...   h(0) ].        (1.17)

Similarly to noncircular convolution, H is a Toeplitz matrix. However, it has an additional property particular to circular convolution: successive rows of H are related by a cyclic shift, as are successive columns. A matrix with these properties is known as a circulant matrix. Its most important property is that it is diagonalized by a DFT matrix (see exercises). Note that if C is a circulant matrix, so are C^T and C^H.
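The circulant structure of (1.17) can be built and checked against a DFT-based circular convolution in a few lines (a sketch assuming numpy; np.roll supplies the cyclic shifts):

```python
import numpy as np

N = 8
rng = np.random.default_rng(6)
h = rng.standard_normal(N)
x = rng.standard_normal(N)

# Column k of the circulant matrix is h cyclically shifted down by k samples.
H = np.column_stack([np.roll(h, k) for k in range(N)])

y_matrix = H @ x                                          # y = H x, as in (1.16)
y_dft = np.fft.ifft(np.fft.fft(h) * np.fft.fft(x)).real   # circular convolution
print(np.allclose(y_matrix, y_dft))                       # True
```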
Before continuing, we recall the definition of the adjoint, or Hermitian transpose, of a matrix. For any matrix A : C^m → C^n, the Hermitian transpose is the matrix B : C^n → C^m defined by

    b_{i,j} = ā_{j,i},    i = 1, ..., m,  j = 1, ..., n.

We will write A^H for the Hermitian transpose of the matrix A. When working with linear maps on real vector spaces, i.e., real matrices, the Hermitian transpose is also equal to the transpose, denoted by A^T.

Like the vectors on which they operate, we can measure the size of a linear map by means of a matrix norm, a nonnegative real-valued function of a matrix with the same properties as Property 1.1, only with x replaced by A. We will work exclusively with the following two norms: the Frobenius norm, which is more or less a straightforward extension of the Euclidean norm of a vector, and the spectral norm, which is trickier to understand, but generally more useful.

Definition 1.1 (Frobenius Norm). ‖A‖_F = ( Σ_{i=1}^{m} Σ_{j=1}^{n} |a_{i,j}|^2 )^{1/2} = ( tr(A^H A) )^{1/2}.

Definition 1.2 (Spectral Norm). ‖A‖_2 = max_{x ∈ C^n, ‖x‖=1} ‖Ax‖.

The spectral norm has the form of maximizing the "gain" of the matrix A, measured as ‖Ax‖ / ‖x‖, over all possible choices of x.
Example 1.9 (Spectral Norm of a Diagonal Matrix): Let Λ ∈ C^{n x n} be a diagonal matrix, with diagonal entries λ_k. We denote this by Λ = diag(λ), where λ = [λ_1, ..., λ_n]^T. By definition, the spectral norm squared of Λ is

    ‖Λ‖_2^2 = max_{x ∈ C^n, ‖x‖=1} ‖Λx‖^2.

We can bound the norm from above by noting

    ‖Λx‖^2 = Σ_{k=1}^{n} |λ_k x_k|^2 ≤ |λ_max|^2 Σ_{k=1}^{n} |x_k|^2,

where

    λ_max = max_{k=1,...,n} |λ_k|.

Using the constraint that ‖x‖^2 = Σ_{k=1}^{n} |x_k|^2 = 1, we arrive at an upper bound for the matrix norm:

    ‖Λ‖_2^2 ≤ |λ_max|^2.

We can achieve this upper bound by choosing x = [0, 0, ..., 0, 1, 0, ..., 0]^T, with a 1 in the position corresponding to λ_max. Thus, we have

    ‖diag(λ)‖_2 = |λ_max|.        (1.18)

We will also find it useful to have a special notation for multiplication by a diagonal matrix. In particular, we write Λx = λ ⊙ x, where ⊙ denotes the element-by-element, or Schur-Hadamard, product,

    λ ⊙ x = [λ_1 x_1, λ_2 x_2, ..., λ_n x_n]^T.

Hence, the operation of element-by-element scaling of a vector is a linear operation, with a diagonal matrix representation.
Illustration 1.11 (Spectral Norm of Circular Convolutions): The ubiquitous convolution gives a special interpretation to the spectral norm, as the word "spectral" suggests. Suppose that x ∈ C^N, h ∈ C^N, and y = h ⊛_N x is the circular convolution of the two. As shown in Illustration 1.10, the circular convolution can be represented by a circulant matrix H. We will compute the spectral norm of this matrix.

From basic DSP, we know that circular convolutions can be computed using the DFT. Furthermore, because the DFT is a linear operation, it has a matrix representation; let us call it W. Now, the circular convolution can be computed as

    y = W^{-1}([Wh] ⊙ [Wx]).

We can therefore compute the spectral norm of H using this representation in terms of the DFT. In particular, note that by the Parseval identity for the DFT, ‖Wx‖ = ‖x‖ = ‖W^{-1}x‖ for any vector x (assuming that the DFT is normalized properly). Hence, we find

    ‖Hx‖ = ‖W^{-1}([Wh] ⊙ [Wx])‖ = ‖[Wh] ⊙ [Wx]‖ = ‖diag(Wh)[Wx]‖.

Using the fact that the DFT is an invertible transformation, and its norm-preserving properties, we get the following expression for the spectral norm of H:

    ‖H‖_2 = max_{x ∈ C^n, ‖x‖=1} ‖Hx‖
          = max_{z ∈ C^n, ‖z‖=1} ‖HW^{-1}z‖
          = max_{z ∈ C^n, ‖z‖=1} ‖diag(Wh)z‖
          = max_i |(Wh)_i|,

where the last equality follows from (1.18). So the spectral norm of the circulant matrix H is simply the magnitude of the largest component in the DFT of h (i.e., the largest spectral component of h).
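A numerical check of this conclusion (a sketch assuming numpy; np.linalg.norm(H, 2) returns the largest singular value of H, which is the spectral norm):

```python
import numpy as np

N = 16
rng = np.random.default_rng(7)
h = rng.standard_normal(N)

H = np.column_stack([np.roll(h, k) for k in range(N)])   # circulant matrix of h
spectral_norm = np.linalg.norm(H, 2)                     # largest singular value
print(np.isclose(spectral_norm, np.max(np.abs(np.fft.fft(h)))))  # True
```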
We will discuss each of these norms in more detail (and particularly how to calculate the spectral norm)
later on.
1.2.4 Subspaces Associated with Linear Maps

The remainder of Part I is devoted to the study of inverse problems in C^n; that is, given a set of measurements b ∈ C^m, which are related to some unknown quantities x ∈ C^n by a known linear map A, find x. In terms of matrices, which we know are equivalent to linear maps, we would like to solve the equation Ax = b for x, given b. Furthermore, we will be almost exclusively interested in the case when A is not a square matrix, leading to more measurements than unknowns, or more unknowns than measurements. In studying such equations, two subspaces associated with A will be of central importance.

The first is the range space of the linear map A : C^n → C^m, denoted R(A), which is a subset of C^m.

Definition 1.3 (Range Space). The range space of A ∈ C^{m x n}, denoted R(A), is the subset of C^m defined by

    R(A) ≜ {Ax : x ∈ C^n}.

The range space is actually a subspace of C^m, as the following property establishes.

Property 1.8. R(A) is a subspace of C^m, and dim R(A) = rank A.

Proof. Let the columns of the matrix A be a_1, a_2, ..., a_n; then

    Ax = (a_1 a_2 ... a_n)x = a_1 x_1 + a_2 x_2 + ... + a_n x_n,

so that

    R(A) = span{a_1, a_2, ..., a_n},

which is a subspace whose dimension is equal to the number of linearly independent columns of A, which, by definition, is also the rank of A.

The other space of primary interest associated with a matrix A is the nullspace of A.

Definition 1.4 (Nullspace). The nullspace of A ∈ C^{m x n}, denoted N(A), is the subset of C^n defined by

    N(A) ≜ {x ∈ C^n : Ax = 0}.

The nullspace of A is also a subspace. A fact from linear algebra is that the dimension of N(A) is n - rank(A). There are two ways to prove this fact. One method is by reducing A to row-echelon form; see [?] for details. We will provide an alternate proof in Section 1.5.
Illustration 1.12 (FIR Filters): We can now recast our study of convolution in Illustration 1.2-Illustration 1.8 in terms of matrices and their associated spaces. The set V of Illustration 1.2 is then simply R(H). Considering our discussion of V and its usefulness in terms of signal processing applications, the range space of a matrix can be considered to be the subspace of all possible output signals. This interpretation remains true for any linear operation, not just for convolution. If a vector y belongs to R(H), then there is an input signal x that, when operated on by H, yields y. Note the importance of the range space in an inverse problem sense. If we observe the output of the convolution y = Hx, and wish to find x given y (called deconvolution), then a prerequisite for such an x to exist is that y ∈ R(H). Such a y is called consistent, because it is consistent with our model that y was formed by filtering a signal with h.

The nullspace of H is equally important in the context of inverse problems. In terms of the convolution, the nullspace N(H) consists of all signals zeroed out by the filter h. If we were working with infinite-length sequences, this would correspond to signals in the stop-band of the filter h. But FIR filters acting on finite length sequences do not give rise to nontrivial nullspaces. In particular, it follows from Illustration 1.3 that N(H) = {0} unless h = 0, in which case N(H) = C^N. So for an FIR model, with no truncation and linear convolution, the resulting convolution matrix has a trivial nullspace. This is good news if we wish to solve the deconvolution problem, because it means that no part of the input signal was lost when filtered with h. If, on the other hand, in a different linear system H has a nontrivial nullspace, and we partition the input signal into x = x_1 + x_2 with x_1 ∈ N(H) and x_2 ∈ N^⊥(H), then we get the same output y in response to the inputs x = x_1 + x_2 and x = x_2, because

y = H(x_1 + x_2) = Hx_1 + Hx_2 = 0 + Hx_2 = Hx_2.

If we are trying to solve a deconvolution problem, and recover x from our measurements y, we run into trouble even if y is consistent, because some components of the input signal (those in N(H)) were not measured. In signal processing terminology, these signals are sometimes referred to as "ghosts", or "ghost signals", because they cannot be measured in the output.

Although this cannot happen with FIR filters and linear convolutions, it can certainly happen with circular convolutions. The study of the range space and null space of a circular convolution is left as an exercise for the reader.
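As a starting point for that exercise, here is a minimal numerical sketch (Python/numpy, with an arbitrary nonzero 3-tap filter): it builds the linear-convolution matrix, which has full column rank, and a circular-convolution matrix, whose rank it compares against the number of nonzero DFT coefficients of the zero-padded filter.

import numpy as np

# Arbitrary nonzero FIR filter (K = 3 taps) and input length N = 5.
h = np.array([1.0, -1.0, 0.5])
K, N = len(h), 5

# Linear-convolution matrix: (N+K-1) x N, column j is h shifted down by j.
H = np.zeros((N + K - 1, N))
for j in range(N):
    H[j:j + K, j] = h

print(np.linalg.matrix_rank(H))   # N, so N(H) = {0} (full column rank)

# Circular-convolution matrix: N x N circulant built from h padded to length N.
hp = np.concatenate([h, np.zeros(N - K)])
C = np.column_stack([np.roll(hp, k) for k in range(N)])

# Compare its rank with the number of nonzero DFT coefficients of hp.
print(np.linalg.matrix_rank(C), np.sum(np.abs(np.fft.fft(hp)) > 1e-12))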
Two other subspaces that are associated with a matrix, which are defined in terms of the previous two, will play an important role in the sequel. These are the row space of A, R(A^H), and the left null space of A, N(A^H). By its definition, R(A^H) is the subspace of C^n spanned by the Hermitian transposes of the rows of A, and hence its name. Likewise, N(A^H) takes its name from its alternative definition as N(A^H) = { x ∈ C^m : x^H A = 0 }. We will see that these various subspaces are related through orthogonal complementation.
1.3 Existence and Uniqueness of Solutions
Confronted with an inverse problem of the form Ax = b, where b is known to within some measurement error, and we would like to find x, the immediate question we should ask ourselves is "does a solution exist?". If the answer is yes, then the next question is "how many solutions are there?". Characterizing the existence and uniqueness of solutions is important, because when the answer to the first question is "no" (a solution does not exist) and/or the answer to the second question is "more than one" (the solution is not unique), we must redefine what we mean by a solution.

We can characterize both existence and uniqueness of solutions to the equation

Ax = b

by examining the null and range spaces of A. For instance, an exact solution to the system of equations Ax = b exists if and only if b ∈ R(A), i.e., the system of linear equations Ax = b is consistent. So, to characterize the situation when a solution always exists, we have the following result.

Theorem 1.3 (Existence Theorem). Let A ∈ C^{m×n}. Then the following four statements are equivalent:
1. for each and every b ∈ C^m, there exists at least one solution to Ax = b,
2. the range space of A is full, i.e., C^m = R(A),
3. rank A = m,
4. the rows of A are linearly independent, i.e., A is full row rank.

Each of the four conditions of Theorem 1.3 requires that m ≤ n, i.e., that there are at most as many equations as unknowns. Equivalently, it is necessary (but not sufficient) for A to have at least as many columns as rows.
Example 1.10: For the special case that m = 1, e.g. A = [9, 2, 6], the existence theorem is satisfied unless A is identically zero.

Example 1.11: Consider the following matrix,

A = [ 4 8 6
      0 4 7 ].

By inspection, we see that the rows of A are linearly independent, so that A satisfies the conditions of the existence theorem. So for every vector b ∈ C^2, there is at least one solution to the equation b = Ax.

Example 1.12: Let

A = [ 2 6
      1 2 ].

Again, by inspection, we see that A satisfies the conditions of the existence theorem. Note that in this case m = n.

Example 1.13: Let

A = [ 1 0 1
      0 1 1
      1 1 2 ].

This matrix has three rows but only rank 2. The range space is described by

R(A) = span{ [1, 0, 1]^T, [0, 1, 1]^T } ≠ C^3.

For this A, the existence theorem is not satisfied, and vectors b exist for which Ax = b has no solution. Consider, for example, the vector b = [0, 0, 1]^T. Rewriting Ax = b, we find

x_1 + x_3 = 0
x_2 + x_3 = 0
x_1 + x_2 + 2x_3 = 1,

which is reduced (by subtracting the first two equations from the third) to the clearly inconsistent equation

0 = 1.

Example 1.14: Let

A = [ 6 5 6 3
      2 3 0 1
      4 4 3 2 ].

Although this matrix has more columns than rows, it is not full row rank: the last row of A is the average of the first two rows. Hence, the existence conditions are not satisfied, and a solution to Ax = b does not always exist. This example points out the danger of using the phrases "more columns than rows", "more unknowns than equations" or "underdetermined" to describe a matrix that satisfies the existence conditions. This matrix is adequately described by all three of these phrases, but fails to satisfy the existence conditions. We will only use the phrase "full row rank", which, by the statement of the existence theorem, is a precise condition that guarantees existence.
While the range space characterizes existence of a solution, the nullspace of A characterizes uniqueness. To see why, suppose Ax = b has two solutions x_1 and x_2. Then we can subtract these equations to arrive at A(x_1 - x_2) = 0. Hence if N(A) = {0}, then x_1 = x_2, and the solution must be unique. Conversely, if A has a non-zero null vector, say z, then whenever Ax = b has a solution x_0, then x_0 + αz is also a solution for any scalar α. Hence we have the following four equivalent statements.

Theorem 1.4 (Uniqueness Theorem). Let A ∈ C^{m×n}. Then the following four statements are equivalent:
1. if there exists a solution to Ax = b, it is unique,
2. the nullspace of A is trivial, i.e., N(A) = {0},
3. rank A = n,
4. the columns of A are linearly independent, i.e., A is full column rank.

Each of these four statements implies that m ≥ n, i.e., that there are at least as many equations as unknowns.
Example 1.15: If n = 1, e.g. A = [1, 5, 6]^T, then the conditions of the uniqueness theorem are trivially satisfied unless A is identically zero. A solution to Ax = b does not always exist (unless m = 1), because A does not satisfy the existence theorem, but when a solution does exist, it is unique.

Example 1.16: Let

A = [ 0 1
      1 1
      1 0 ].

This matrix satisfies the conditions of the uniqueness theorem, because it has full column rank. Thus, when a solution to Ax = b exists, it is unique. For example, if b = [4, 6, 2]^T, the unique solution is x = [2, 4]^T. On the other hand, if b = [1, 0, 0]^T, no solution exists.

Example 1.17: Let A be the same as in Example 1.13. The columns of A are not linearly independent. The null space is described by

N(A) = span{ [1, 1, -1]^T }.

Thus, the uniqueness theorem is not satisfied. Note that this matrix also failed the existence theorem. Thus, in some sense, it is the worst possible type of matrix with respect to uniqueness and existence: a solution to Ax = b may not exist, and if it does exist, then it cannot be unique.
Example 1.18: Let

A = [ 2 5 3
      3 5 2
      3 7 4
      7 8 1 ].

This example is analogous to Example 1.14. The matrix has more rows than columns, but the columns are linearly dependent. The null space is spanned by the single vector [1, -1, 1]^T. Hence, A does not satisfy the conditions of the uniqueness theorem. Furthermore, this example warns against the use of phrases like "overdetermined", "more rows than columns" or "more equations than unknowns" to characterize uniqueness. Instead, we will say "full column rank", which, by the statement of the uniqueness theorem, is a precise condition that guarantees uniqueness.
Combining the two theorems, we arrive at the existence and uniqueness theorem.
Theorem 1.5 (Existence and Uniqueness). Let A ∈ C^{m×n}. Then the following four statements are equivalent:
1. For each and every b ∈ C^m, there exists a solution to Ax = b and it is unique.
2. R(A) = C^m and N(A) = {0}.
3. rank A = m = n.
4. A is square and nonsingular.

We will find that in practical problems, the conditions of Theorem 1.5 are almost never satisfied. Indeed, the square, invertible case is of little interest to us in this book. Instead, we will focus on the other cases: full column rank, full row rank, square and singular, or rectangular and rank deficient. So far, we have established that in these instances, a solution may either (1) not exist, or (2) exist and not be unique. When the solution exists, but is not unique, the set of all solutions has a special structure. Let x_p be one particular solution to Ax = b. Then every other solution must be expressible as x_p + z for some z ∈ N(A), because

Ax_p = b = Ax  ⇒  A(x - x_p) = 0  ⇒  x = x_p + z,  Az = 0.

Conversely, for every z ∈ N(A), x_p + z must also be a solution. Thus, the solution set of Ax = b is

S = x_p + N(A).

Recall that the elements of N(A) are the solutions to the homogeneous system Ax = 0. Also recall that N(A) is a subspace of C^n. Thus, the solution set S is a translated subspace, also called an affine subspace or a linear variety. Note that S is not a subspace, because it does not, in general, contain the 0 vector. Note also that we can use any particular solution to translate the subspace; the resulting set S is unchanged.
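A small numerical illustration of this structure (Python/numpy; the matrix and right-hand side are arbitrary choices): any two solutions of Ax = b differ by a vector in N(A), so translating by either one yields the same set S.

import numpy as np

# Arbitrary full-row-rank system with more unknowns than equations.
A = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])
b = np.array([3.0, 1.0])

# Two different particular solutions: one from a library solver, one by hand.
xp1 = np.linalg.lstsq(A, b, rcond=None)[0]
xp2 = np.array([3.0, 0.0, 1.0])            # check: A @ xp2 == b

d = xp1 - xp2
print(np.linalg.norm(A @ xp1 - b), np.linalg.norm(A @ xp2 - b))  # ~0, 0
print(np.linalg.norm(A @ d))               # ~0: the difference lies in N(A)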
Illustration 1.13: We will once again consider the deconvolution problem, only this time we will examine it in the context of our existence and uniqueness conditions. Let us begin with the problem of linear convolution. From Illustration 1.12, it is clear that the matrix we are interested in analyzing is H. As shown in Illustration 1.3, H has full column rank, provided that the filter is not identically zero. Hence, for any nonzero filter h, H will satisfy the uniqueness conditions. On the other hand, for K > 1, H has more rows than columns, and thus cannot possibly have full row rank. So except for the trivial case of a one tap filter, H cannot satisfy the existence conditions. So we can draw the following conclusion about the linear deconvolution problem:

Linear Deconvolution: For any y ∈ C^{N+K-1}, a vector x may or may not exist that satisfies y = h ∗ x. When it does exist, it is the only such vector, i.e., it is unique.

Consider now the case of circular deconvolution, which has the matrix representation y = h ⊛_N x = Hx, with H defined in (1.17). For circular convolutions defined by (1.15), the matrix H is always square, and so we need to check the existence and uniqueness conditions carefully to determine whether they are satisfied. We leave this study as an exercise for the reader, with the hint that the DFT matrix introduced in Illustration 1.11 is very useful in analyzing and interpreting the relevant conditions.

Our third example is that of truncated linear convolution, discussed in Illustration 1.9. In the context of an inverse problem, truncated deconvolution frequently arises in image processing for deblurring. For the truncated convolution problem, testing the existence and uniqueness conditions is somewhat more complicated than for the untruncated or circular cases. A study of truncated convolution in terms of uniqueness and existence is treated in the exercises.
1.4 Left and Right Inverses
We are now in a position to examine a matrix A that describes a problem, and determine whether a solution to the equation Ax = b exists, and if it does exist, whether it is unique. These conditions, stated in terms of the column and row rank of the matrix A, are useful in classifying the solutions to this equation. But now, we turn to the more practical matter of solving Ax = b for x, given b. We cannot, in general, write x = A^{-1}b, because A may not satisfy the conditions of Theorem 1.5 (i.e., A may not be square and nonsingular). Instead we will try to find matrices B, known as generalized inverses, such that x = Bb is a solution to Ax = b when such a solution exists. We will treat the cases of full column rank and full row rank individually, beginning with the case of full column rank.

1.4.1 Full Column Rank Case

When the matrix A is full column rank, then:

For each b ∈ R(A), there exists a unique x ∈ C^n such that Ax = b.

What we will show now is that the solution is linear in b. Because we know that any linear map can be written as a matrix, we will have proven the existence of a generalized inverse matrix, called a left inverse of A, which, given any b ∈ R(A), returns the unique x ∈ C^n for which Ax = b.
Theorem 1.6 (Existence of the Left Inverse). If A ∈ C^{m×n}, with rank A = n, then there exists A^L ∈ C^{n×m} such that (i) A(A^L b) = b for all b ∈ R(A), and (ii) A^L(Ax) = x for all x ∈ C^n.

Proof. Because we already know that for every vector b ∈ R(A) there is a unique x ∈ C^n for which Ax = b, we can define a map A^L : R(A) → C^n such that A^L(b) = x. We can immediately conclude that:

A^L(Ax) = x for all x ∈ C^n,
A A^L(b) = b for all b ∈ R(A).

The key step is to show that this map is linear, and hence has a matrix representation. Let x_1, x_2 ∈ C^n and b_1 = Ax_1, b_2 = Ax_2, so that x_1 = A^L(b_1) and x_2 = A^L(b_2). Then for scalars α, β ∈ C,

A(αx_1 + βx_2) = αb_1 + βb_2,

so that

A^L(αb_1 + βb_2) = A^L[A(αx_1 + βx_2)] = αx_1 + βx_2 = αA^L(b_1) + βA^L(b_2).

Since A^L is linear, it must have a matrix representation A^L.
Before continuing, there are two important points to make about this proof. First, it is not constructive, meaning that it gives no immediate indication of how one goes about finding or constructing A^L. We will consider constructions of A^L later on. The other important point is that we proved that A^L exists by first proving that a map exists, and then proving that the map is linear. This is a technique that will be used time and again in developing key results.

The matrix A^L is a generalized inverse of A, because for all x ∈ C^n we have A^L(Ax) = x, which implies that A^L A = I_n, where I_n denotes the n×n identity matrix. But A^L is not an inverse, because AA^L ≠ I_m, except under the conditions outlined in Property 1.11. Nor is the left inverse unique, as illustrated by the following example.
Example 1.19: Let A = [1, 2]^T. Then A has an infinite number of left inverses, some of which include

[1 0],   [0 1/2],   [-1 1],

and convex combinations of these as well. But AA^L ≠ I_2 for any of these left inverses.

Now, let us take b ∈ R(A) to be b = [3, 6]^T. For each of the left inverses listed above, we get A^L b = 3, which is the unique answer to Ax = b.

The lack of a unique left inverse arises because the action of A^L is not specified on vectors b not in the range space of A.

Example 1.20: Take the same A = [1, 2]^T as in the previous example, and b = [3, 4]^T ∉ R(A). Then each of the left inverses listed in the previous example yields a different solution:

[1 0]b = 3,   [0 1/2]b = 2,   [-1 1]b = 1.
We can now list some useful properties of left inverses.

Property 1.9. AA^L is a left identity for A:

(AA^L)A = A.

Proof. Follows directly from Theorem 1.6.

Property 1.10. AA^L is idempotent:

(AA^L)(AA^L) = AA^L.

Proof. Follows directly from Property 1.9.
Property 1.11. The left inverse is unique if and only if the range space of A is all of C^m, i.e., A is square and invertible.

Proof. We can write the equation defining the left inverse as

A^H (A^L)^H = I_n   (1.19)

or, if we let the kth row of A^L be a_k^H, and denote the kth column of I_n by e_k, we have the following set of equations

A^H a_k = e_k,   k = 1, ..., n.

For A^L to be unique, each of these matrix-vector equations must have a unique solution, which we know from our previous discussion requires N(A^H) = {0}. This implies that A has linearly independent rows, and that n ≥ m. But for A to have a left inverse, we require that the columns be linearly independent and m ≥ n. Hence m = n, and A has full rank. Also, we would then have

AA^L = I_m,   A^L A = I_n,

which defines the matrix inverse A^{-1}.
Illustration 1.14: Let us try and apply our newfound results on left inverses to the problem of deconvolution. For the sake of simplicity, we will choose a very simple deconvolution problem, in which the filter h ∈ C^2 is h = [1, -1]^T and the input x ∈ C^4 has length 4. The matrix H is then

H = [  1  0  0  0
      -1  1  0  0
       0 -1  1  0
       0  0 -1  1
       0  0  0 -1 ].

This matrix is clearly a full column rank matrix. So how do we find a left inverse for it? The proof of Theorem 1.6 is unhelpful, because no left inverse is constructed there. However, the proof of Property 1.11 provides a hint by means of (1.19). There, we find that the rows of the left inverse satisfy a system of independent linear equations. If we write this system for our particular choice of A = H, we get the following system of equations:

[ 1 -1  0  0  0
  0  1 -1  0  0
  0  0  1 -1  0
  0  0  0  1 -1 ] [c_1, c_2, c_3, c_4] = I_4,

where c_i ∈ C^5 and c_i^H is the ith row of the left inverse matrix.

We can solve this system of equations by inspection. If we write the solution in matrix form, we find that the left inverse must take on the following form:

(H^L)^H = [ α_1      α_2      α_3      α_4
            α_1 - 1  α_2      α_3      α_4
            α_1 - 1  α_2 - 1  α_3      α_4
            α_1 - 1  α_2 - 1  α_3 - 1  α_4
            α_1 - 1  α_2 - 1  α_3 - 1  α_4 - 1 ],

where the constants α_i are arbitrary. Now, given an observation b = Hx, we can determine x by simply computing x = H^L b. Furthermore, because H is full column rank, we get the unique solution regardless of how we choose the four constants α_i. The solution agrees with our intuition. The filter h is a first order difference, and the rows of the left inverse are shifted step functions (i.e., discrete versions of integrators).

What happens when b ∉ R(H)? Consider the simple example of b = [1, 1, 1, 1, 1]^T. We leave it to the reader to verify that this b is not in the range space of H (consider the nature of h). If we compute H^L b for this b, we get the solution x = [5α_1 - 4, 5α_2 - 3, 5α_3 - 2, 5α_4 - 1]^T, which obviously depends on the choice of the constants α_i. It is not clear that this x constitutes any kind of solution, or how to choose the constants α_i so as to make it as close to a solution as possible (in some sense). We will address these issues later, when we have more results on least squares theory.
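In practice one rarely solves for the rows of a left inverse by hand. A minimal numerical sketch (Python/numpy) of the same setup: np.linalg.pinv returns one particular left inverse of the full-column-rank H, corresponding to one specific choice of the constants α_i (the pseudoinverse is discussed formally later in this chapter).

import numpy as np

# First-order difference filter h = [1, -1], input length N = 4,
# as in Illustration 1.14.
h = np.array([1.0, -1.0])
N = 4
H = np.zeros((N + 1, N))
for j in range(N):
    H[j:j + 2, j] = h

HL = np.linalg.pinv(H)                  # one particular left inverse
print(np.allclose(HL @ H, np.eye(N)))   # True: H^L H = I_4

x = np.array([1.0, 2.0, -1.0, 0.5])
b = H @ x
print(HL @ b)                           # recovers x, since b is in R(H)

print(HL @ np.ones(5))                  # b = [1,...,1] is not in R(H); this
                                        # output is some vector, but not an
                                        # exact solution of Hx = b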
1.4.2 Full Row Rank Case
In this case, we know that R(A) = C^m, and there always exists a solution to the system Ax = b for every b ∈ C^m:

For each b ∈ C^m, there exists at least one x ∈ C^n such that Ax = b.

Similar to the case of A having full column rank, we can construct a linear map, called the right inverse, which, given any b ∈ C^m, returns an x ∈ C^n for which Ax = b. In this case, however, we will explicitly construct a right inverse of A.
Theorem 1.7. If R(A) = C^m, then there exists a matrix A^R such that A A^R b = b for all b ∈ C^m.

Proof. Let x_k be any solution to Ax_k = e_k, where e_k ∈ C^m is the kth column of the I_m identity matrix. Let A^R = (x_1 x_2 ... x_m). It is an n×m matrix. Now, for any b ∈ C^m,

b = b_1 e_1 + b_2 e_2 + ... + b_m e_m,

so that

A(A^R b) = b_1 A(A^R e_1) + b_2 A(A^R e_2) + ... + b_m A(A^R e_m)
         = b_1 Ax_1 + b_2 Ax_2 + ... + b_m Ax_m
         = b_1 e_1 + b_2 e_2 + ... + b_m e_m = b.

The matrix A^R is a generalized inverse of A, because AA^R = I_m. However, except under the conditions outlined in Property 1.15, A^R A ≠ I_n. Like the left inverse, the right inverse is also not unique, as illustrated by the following example.
Example 1.21: Let A = [1 2]. Then A is full row rank, and all of the following are right inverses:

[1, 0]^T,   [0, 1/2]^T,   [-1, 1]^T.

Linear combinations of these matrices with coefficients that add up to one are also right inverses. But A^R A ≠ I_2 for any of these matrices. Furthermore, for any b ∈ C, each of these matrices gives a different, valid solution x = A^R b.
In this example we have used the transpose of the matrix A from Example 1.19, to illustrate the following property of right inverses.

Property 1.12. Let A ∈ C^{m×n}. Then A^R is a right inverse for A if and only if (A^R)^T is a left inverse for A^T.

Proof. From our study of left inverses, we know that a left inverse of A^T is any matrix C that satisfies CA^T = I_m. Taking the transpose of both sides of this relation, we find that AC^T = I_m, which implies that C^T = A^R. The converse follows by reversing the argument.
Some further properties that are analogous to the properties of left inverses are listed below.

Property 1.13. A^R A is a right identity for A:

A(A^R A) = A.

Proof.
A(A^R A) = (AA^R)A = I_m A = A.

Property 1.14. A^R A is idempotent:

(A^R A)(A^R A) = A^R A.

Proof.
(A^R A)(A^R A) = A^R(AA^R)A = A^R I_m A = A^R A.

Property 1.15. The right inverse is unique if and only if the null space of A is trivial, i.e., A is square and invertible.

Proof. Left as an exercise.
Illustration 1.15: We will revisit the deconvolution problem of Illustration 1.14, this time assuming a truncated convolution with L < N. If we use the same h = [1, -1]^T, then the matrix A that maps the input signals to the convolved and truncated observations b is

A = [ -1  1  0  0
       0 -1  1  0
       0  0 -1  1 ],

where we have chosen L = 3, c = 1 in (1.13).

For this A, the range space is C^3, because the rows are linearly independent. Furthermore, the null space of A is nontrivial. Hence, for any b ∈ C^3, there are always an infinite number of x that satisfy the equation Ax = b. We can construct a right inverse by setting up the system of equations outlined in the proof of Theorem 1.7:

[ -1  1  0  0
   0 -1  1  0
   0  0 -1  1 ] [c_1, c_2, c_3] = I_3,

where c_i ∈ C^4 is the ith column of the right inverse A^R. Solving this system for the c_i yields a right inverse matrix of the form

A^R = [ α_1      α_2      α_3
        α_1 + 1  α_2      α_3
        α_1 + 1  α_2 + 1  α_3
        α_1 + 1  α_2 + 1  α_3 + 1 ],

where the α_i are arbitrary constants. Now, one can easily verify that for any b ∈ C^3, the choice of these constants dictates which solution, from the set of all possible solutions, one obtains. Unlike the left inverse case, there is no problem of a solution failing to exist for a particular b; rather, we must decide which of the infinity of solutions we want to pick, and choose the constants α_i accordingly. We will have to wait until we have covered least squares theory to make such a selection.
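Numerically, one convenient right inverse of a full-row-rank A is A^H(AA^H)^{-1}; this particular choice reappears in Section 1.8. A minimal sketch (Python/numpy), using the truncated-convolution matrix above:

import numpy as np

# Truncated convolution of h = [1, -1] with a length-4 input, keeping 3 samples,
# as in Illustration 1.15.
A = np.array([[-1.0,  1.0,  0.0,  0.0],
              [ 0.0, -1.0,  1.0,  0.0],
              [ 0.0,  0.0, -1.0,  1.0]])

AR = A.T @ np.linalg.inv(A @ A.T)      # one particular right inverse
print(np.allclose(A @ AR, np.eye(3)))  # True: A A^R = I_3

b = np.array([1.0, 0.0, 2.0])
x = AR @ b
print(A @ x)                           # equals b: x is one of infinitely many solutions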
1.5 Projections
We now know how to find a solution to the equation Ax = b, if a solution exists, by constructing left or right inverses, as appropriate. But it is not at all clear what we should consider as a solution if an exact solution either does not exist, or is not unique. When an exact solution does not exist, our intuition tells us that a solution x̂ for which A x̂ is "close" to b will have to do. To try and find such an x̂ requires that we first clarify what we mean by "close".
1.5.1 Approximation on a Subspace
We begin with a simpler form of the problem. Suppose we are given some vector b ∈ C^n, and a subspace S of C^n, from which we would like to select the point that is closest to b. Figure 1.1 depicts the situation when b is a point in Euclidean 3-space, and S is a plane. Since we are working in our vector space, we already have a notion of distance, which we defined in Section 1.2.1. What we would like to do is choose the element of S that minimizes the distance to a given point x,

x̂ = arg min_{s ∈ S} ||s - x||.   (1.20)

But how do we know that such an x̂ exists, or that it is unique? Or, for that matter, that we can compute it? The following examples illustrate some of the difficulties that can arise, particularly when S is not a subspace.

Example 1.22: Let S = { s ∈ R : 0 ≤ s < 1 }. For x ≥ 1, there is no element of S closest to x.

Example 1.23: Let S = { s ∈ R^2 : ||s|| = 1 }. Consider the problem

x̂ = arg min_{s ∈ S} ||s - 0||,   (1.21)

that is, finding an approximation to 0 from the points in S. A closest point exists in this problem (in fact, any point in S is a closest point), but the closest point is clearly not unique. Even more interesting, for any x ≠ 0, there exists a unique closest point to x in S!
Example 1.24: Let S = span{[1, 1]^T} ⊂ R^2. Now, for any point x = [x_1, x_2]^T ∈ R^2, the squared distance from any point s ∈ S to x is simply f(s) = (s_1 - x_1)^2 + (s_2 - x_2)^2. Any s ∈ S can be expressed as s = α[1, 1]^T for some α ∈ R. So we can rewrite f(s) as a function of α: f(α) = (α - x_1)^2 + (α - x_2)^2. Now we apply the first order necessary conditions for a minimum of f(α), by setting the derivative of f(α) with respect to α to zero:

df(α)/dα = 4α - 2(x_1 + x_2) = 0,

so that the unique minimizer is α̂ = (x_1 + x_2)/2. (Strictly speaking, we only know that α̂ is an extremum of f(α), but the second derivative of f(α) with respect to α is 4 > 0, so that α̂ must be a minimum. Furthermore, f(α) is convex, so it is, in fact, the global minimum.) Hence, for this S, the nearest point exists, is unique, and is of the form

x̂ = ((x_1 + x_2)/2) [1, 1]^T.

Example 1.24 suggests that for subspaces a closest point can always be found, and that the closest point is unique. We will indeed prove this in the subsequent sections, but will take a different approach that can be generalized to our treatment in Part II of the course.
Illustration 1.16: Consider again our FIR filter illustration from before. There, we knew that the set of all valid outputs for a fixed filter h of length K, acting on signals of length N, formed a subspace of C^{N+K-1}:

V = { y ∈ C^{N+K-1} : y = x ∗ h, x ∈ C^N }.

Suppose now that we measure the output using some physical system. This measurement will be corrupted by some measurement errors, or noise, so that we actually measure y = x ∗ h + n. How do we remove the noise n from our measurement y? Well, if we know that the output should belong to the subspace V, we can try and find the point in that space, ŷ ∈ V, that is closest to y, i.e.,

ŷ = arg min_{v ∈ V} ||v - y||.
1.5.2 Orthogonality Principle
To solve (1.20), we need the notion of an orthogonal projection of a point onto a subspace. This is a generalization of our intuition in 2D and 3D Euclidean spaces and Figure 1.1.

Definition 1.5. A point x̂ is an orthogonal projection of x onto a subspace S if x̂ ∈ S and x̂ - x ∈ S^⊥.

Example 1.25: Let S = span{[1, 1]^T} ⊂ R^2, and let x = [2, 4]^T. Then the point x̂ = [3, 3]^T is the orthogonal projection of x onto S, because

x̂ - x = [1, -1]^T ⊥ S.

We said that x̂ is "an" orthogonal projection, because we were unsure whether it is unique. The following theorem allows us to say "the" orthogonal projection. We must, however, preface our claims with a condition on existence, since we don't know yet whether orthogonal projections of points exist for arbitrary subspaces of C^n.
[Figure 1.1: Orthogonal projection of a point b in R^3 onto a two-dimensional plane S.]
Theorem 1.8. If the orthogonal projection of a point x ∈ C^n onto a subspace S ⊆ C^n exists, then it is unique.

Proof. Let x̂_1, x̂_2 be two orthogonal projections of x onto S. Then by definition, we have

x̂_1 - x ∈ S^⊥,   x̂_2 - x ∈ S^⊥.

Because S^⊥ is a subspace, it contains the difference,

(x - x̂_1) - (x - x̂_2) = x̂_2 - x̂_1 ∈ S^⊥.

However, x̂_2 - x̂_1 ∈ S, because x̂_1, x̂_2 ∈ S and S is a subspace. Now, by Property 1.4, S^⊥ ∩ S = {0}, so that x̂_2 = x̂_1.

What makes the orthogonal projection of a point onto a subspace useful and interesting is the following principle, which is arguably the most important and fundamental concept in this part of the book.

Theorem 1.9 (Orthogonality Principle). Let S be a subspace of C^n, and x ∈ C^n, and suppose that x̂ ∈ S satisfies

x̂ - x ∈ S^⊥.

Then

||x̂ - x|| ≤ ||y - x||

for any y ∈ S, with equality iff y = x̂. That is, if the orthogonal projection of x onto S exists, it is the nearest point to x among all points y ∈ S.
Example 1.26: Let us revisit our example of S = span{[1, 1]^T}. We found that the optimal approximation to any x ∈ R^2 was the vector x̂ = ((x_1 + x_2)/2)[1, 1]^T. We can check that this solution satisfies the orthogonality principle, by noting that

s ∈ S^⊥  ⇔  [1 1]s = 0.

Hence, we check that

[1 1] [ (x_1 + x_2)/2 - x_1 ,  (x_1 + x_2)/2 - x_2 ]^T = x_1 + x_2 - x_1 - x_2 = 0.

Thus, for any x, our solution for x̂ satisfies x̂ - x ∈ S^⊥.

The orthogonality principle (or OP for short) is illustrated on our 3D example in Figure 1.1. It tells us how to go about finding the solution to (1.20): we need only find the point ŝ ∈ S whose residual ŝ - x is orthogonal to S. Because of the OP, we call the optimal x̂ chosen in accordance with (1.20), when it exists, the orthogonal projection of x onto S. To prove the OP, we need an extension of the Pythagorean theorem to our vector space C^n.

Theorem 1.10 (Pythagorean Theorem). For any x, z ∈ C^n, x ⊥ z ⇒ ||x + z||^2 = ||x||^2 + ||z||^2.

Proof.
||x + z||^2 = (x + z)^H (x + z) = ||x||^2 + ||z||^2 + x^H z + z^H x.
The proof is completed by applying the definition of orthogonality.
Example 1.27: The converse of the Pythagorean theorem does not hold in C^n. Consider, for instance, the vectors

x = [j, j]^T,   y = [0, 1]^T.

Then ||x + y||^2 = ||x||^2 + ||y||^2, but x^H y = -j ≠ 0, so that x is not orthogonal to y.
With the aid of the Pythagorean Theorem, the OP can be easily proved.

Proof of the Orthogonality Principle.

||y - x||^2 = ||y - x̂ + x̂ - x||^2.

Because y, x̂ ∈ S by hypothesis, and S is a subspace (and hence closed under addition), y - x̂ ∈ S. Also by hypothesis, x̂ - x ∈ S^⊥. Hence we can apply the Pythagorean Theorem to the right hand side:

||y - x||^2 = ||y - x̂||^2 + ||x̂ - x||^2,

which leads to the inequality

||x̂ - x||^2 ≤ ||y - x||^2.

Equality implies that ||y - x̂||^2 = 0, which, from the positivity property of the norm, implies that y = x̂.
1.5.3 Projectors
Now, for every point x in C^n, we would like to find its orthogonal projection onto the subspace S. Unfortunately, nothing we have seen so far guarantees existence of the orthogonal projection. An orthogonal projection certainly need not exist if S is not a subspace, as we saw in Example 1.22. This example illustrates that we must be careful when discussing "the" orthogonal projection, simply because it might not exist. Nonetheless, we may define a projector P_S for the situations when the orthogonal projection onto S does exist, as the (possibly nonlinear) map that, given any point x ∈ C^n, returns its unique orthogonal projection on S. Hence, we have the following.

Definition 1.6 (Projector). A mapping P_S : C^n → S is a projector associated with subspace S if it satisfies

P_S(x) - x ∈ S^⊥   ∀ x ∈ C^n.
Combining Definition 1.6 with the Orthogonality Principle, we immediately obtain the following theorem. Known alternately as the Nearest Point Theorem (abbreviated NPT), for obvious reasons, or the Projection Theorem, it is second in importance only to the OP. It explicitly states the role of projectors in the approximation-on-a-subspace problem.

Theorem 1.11 (Nearest Point Theorem). Suppose that a projector P_S exists for a subspace S ⊆ C^n. Then for any x ∈ C^n, P_S(x) is the unique vector in S that is closest to x, i.e.,

||x - P_S(x)|| ≤ ||x - s||   ∀ s ∈ S, x ∈ C^n,

with equality iff s = P_S(x).

The projector map has some very special properties, the most important one being linearity.

Theorem 1.12 (Linearity of Projectors). Let S be a subspace of C^n and x_1, x_2 ∈ C^n. Then for any α, β ∈ C,

P_S(αx_1 + βx_2) = αP_S(x_1) + βP_S(x_2).
Proof.

[αP_S(x_1) + βP_S(x_2)] - [αx_1 + βx_2] = α[P_S(x_1) - x_1] + β[P_S(x_2) - x_2].

The right hand side is clearly an element of S^⊥, because S^⊥ is a subspace, hence αP_S(x_1) + βP_S(x_2) is an orthogonal projection of αx_1 + βx_2 on S. Furthermore, the orthogonal projection of a point is unique. Hence αP_S(x_1) + βP_S(x_2) must be P_S(αx_1 + βx_2).
From Theorem 1.12 it follows that if the projector for a subspace S exists, then it must have a matrix representation P ∈ C^{n×n}. We will explicitly construct this representation later in this chapter. First, however, we will derive many of the properties of projectors without being assured of existence. (As we discussed earlier, the reason for this circuitous approach is that these properties will be directly usable in the more general framework of infinite dimensional spaces. The same would not be true if we proved the properties using the expression for the projector that we present much later.) We will often drop the subscript that indicates the subspace with which a projector P is associated.

Property 1.16. Ps = s for all s ∈ S.
Proof. Note that for all s ∈ S,

s - s = 0 ∈ S^⊥.

By the uniqueness of the orthogonal projection of s onto S, we must have Ps = s.

Property 1.17. R(P) = S.

Proof. To show equality of these two subspaces, or more generally, of two sets A and B, we will often use the method of showing A ⊆ B and A ⊇ B. For this case, by definition, P : C^n → S, so clearly R(P) ⊆ S. The reverse containment follows from Property 1.16, because for every s ∈ S, s = Ps, so that S ⊆ R(P).
Example 1.28: Let us return to our simple example with S = span{[1, 1]^T}. We know from our previous work that for every vector x ∈ R^2 we must have

Px = (1/2) [x_1 + x_2, x_1 + x_2]^T.

From this equation, we can infer the form of the projector matrix to be

P = (1/2) [ 1 1
            1 1 ].

This P clearly satisfies Property 1.17, because R(P) = span{[1, 1]^T} = S. We can also easily check Property 1.16, which is simply:

P [1, 1]^T = [1, 1]^T.
The following theorem gives us an alternate characterization of a projector in terms of idempotence and Hermitian symmetry. Recall that a matrix A is idempotent if AA = A, and a matrix is Hermitian if A^H = A.

Theorem 1.13. An n×n matrix P is a projector associated with the subspace R(P) if and only if P is idempotent and Hermitian, i.e., P^2 = P^H = P.
Proof. We will begin with the sufficiency of the two conditions. Let P be a Hermitian, idempotent matrix, and let S = R(P). Then for every s ∈ S, there exists some z ∈ C^n such that s = Pz. Hence for any x ∈ C^n, we have

s^H(x - Px) = (Pz)^H(x - Px)
            = z^H(P^H x - P^H P x)
            = z^H(Px - P^2 x)   because P^H = P
            = z^H(Px - Px)      because P^2 = P
            = 0,

so that x - Px ⊥ S for all x ∈ C^n. Thus, P is a projector associated with the subspace S.

Conversely, if P is the projector associated with S, then, by Property 1.17, R(P) = S, and by Property 1.16, Ps = s for all s ∈ S. Thus, for any z ∈ C^n,

P^2 z = PPz = Ps = s = Pz,

where s = Pz. Thus, P is idempotent. To prove it is Hermitian:

R(P) = S and (x - Px) ∈ S^⊥  ⇒  (Pz)^H(x - Px) = 0  ∀ z, x ∈ C^n
⇒  z^H(P^H x - P^H P x) = 0  ∀ z, x ∈ C^n
⇒  P^H x - P^H P x ⊥ C^n  ⇒  P^H x - P^H P x = 0  ∀ x ∈ C^n   (Property 1.6)
⇒  P^H = P^H P
⇒  P = (P^H)^H = (P^H P)^H = P^H P = P^H.
Corollary 1.1. If P is a projector associated with S, then ∀ x ∈ S^⊥, Px = 0.

Proof.

x ∈ S^⊥  ⇒  s^H x = 0  ∀ s ∈ S
⇒  (Pv)^H x = 0  ∀ v ∈ C^n, x ∈ S^⊥   (because R(P) = S)
⇒  v^H Px = 0  ∀ v ∈ C^n   (because P^H = P)
⇒  Px ⊥ C^n  ⇒  Px = 0.
With the properties derived so far, we can prove uniqueness of the projector associated with a subspace.

Theorem 1.14 (Uniqueness of the Projector). If the projector associated with a subspace S exists, then it is unique.

Proof. Let P_1 and P_2 be two projectors associated with a subspace S. Then

x = P_1 x + (I - P_1)x   ∀ x ∈ C^n
⇒  P_2 x = P_2 P_1 x + P_2 (I - P_1)x = P_1 x + 0,

recognizing that P_1 x ∈ S and (I - P_1)x ∈ S^⊥ by Definition 1.6, so that P_2 P_1 x = P_1 x by Property 1.16 and P_2((I - P_1)x) = 0 by Corollary 1.1. Thus P_1 = P_2.
Property 1.18. P is a projector if and only if I - P is a projector. Furthermore, if P is a projector associated with subspace S, then I - P is a projector associated with S^⊥. Conversely, if I - P is a projector associated with S^⊥, then P is the projector associated with (S^⊥)^⊥.

Proof. Suppose P is the projector associated with S. Then ∀ x ∈ C^n, (I - P)x ∈ S^⊥, so that I - P : C^n → S^⊥. Furthermore, for all x ∈ C^n, (I - P)x - x = -Px ∈ S ⊆ (S^⊥)^⊥. Hence (I - P)x - x ∈ (S^⊥)^⊥, and I - P is the projector for S^⊥ by Definition 1.6.

Next, to prove the converse, let Q = I - P be the projector for S^⊥, and replace P by Q and S by S^⊥ in the previous argument. It then follows that I - Q = P is a projector for (S^⊥)^⊥.

Note that in C^n we will obtain a sharper result, stated in Corollary 1.3. However, because at this point we have not yet proven Property 1.7 (which we intend to prove using the properties of projectors), we cannot claim (S^⊥)^⊥ = S.
Example 1.29: Let

P = [ 1 0
      0 1 ].

Then we see that P satisfies P = P^2 = P^H. Thus, the identity matrix is trivially a projector for the space C^2. Also, by Property 1.18, it follows that the zero matrix P = 0 is a projector for the zero subspace S = {0}.

Example 1.30: Let

P = (1/2) [ 1 1
            1 1 ].

This is the projector for Example 1.24. It is easily verified that P^2 = P^H = P. Also, by Property 1.18, the following matrix

I - P = (1/2) [  1 -1
                -1  1 ]

is also a projector, and is associated with S^⊥.
Next, we establish the fact (familiar from geometry in R^2 and R^3) that projection can only shrink the norm (length) of a vector.

Corollary 1.2. ||Px|| ≤ ||x||, with equality if and only if Px = x.

Proof. For any x ∈ C^n, x = Px + (I - P)x with Px ⊥ (I - P)x. Thus by Theorem 1.10,

||x||^2 = ||Px||^2 + ||(I - P)x||^2,

from which the inequality is immediate. Moreover, equality is achieved if and only if (I - P)x = 0, or Px = x.
1.5.4 Projectors in C^n
While the characterizations of projectors treated in the previous subsection are important, both in the solution of problems using projectors and in gaining an understanding of their structure, we have not yet addressed the problem of finding a projector. In fact, we have not even demonstrated that a projector exists for an arbitrary subspace. Establishing existence of the projector for an arbitrary subspace S of C^n is the goal of this subsection.

Theorem 1.15 (Existence of a Projector). Every subspace S of C^n has an associated projector matrix P. Furthermore, if the columns of B ∈ C^{n×r} constitute a basis for S, then the matrix

P := B(B^H B)^{-1} B^H   (1.22)

is the projector for S.

Proof. Every subspace of C^n has a basis, so that B can be formed. First, note that B^H B is nonsingular, because

B^H B x = 0  ⇒  x^H B^H B x = 0  ⇒  ||Bx||^2 = 0  ⇒  Bx = 0  ⇒  x = 0,

because the columns of B are linearly independent. Hence N(B^H B) = {0}, and thus B^H B is nonsingular. Hence, P defined in (1.22) exists for any subspace S.

Next, consider any x ∈ C^n. For any s ∈ S, there exists z ∈ C^r such that s = Bz. Now,

s^H(Px - x) = z^H B^H (B(B^H B)^{-1} B^H x - x) = z^H B^H x - z^H B^H x = 0,

so that Px - x ∈ S^⊥ ∀ x ∈ C^n. Also, clearly P : C^n → S, hence by Definition 1.6, P is a projector for S. Uniqueness of the projector follows from Theorem 1.14.

Note that P takes on a particularly simple form if the columns of B constitute an orthonormal basis for S, for then B^H B = I, and P = BB^H.
Example 1.31: Let S = span{[1, 1]^T}, and let us construct the projector for S. Using (1.22), and [1, 1]^T as a basis for S, we find

P = [1, 1]^T ( [1 1][1, 1]^T )^{-1} [1 1] = (1/2) [ 1 1
                                                    1 1 ],

which agrees with the expression for P we found previously.

A basis for S is not unique (in the previous example, we could have taken any nonzero multiple of [1, 1]^T as a basis). It would seem that different choices of B in the above construction might lead to different projection matrices P. But Theorem 1.14 guarantees that this cannot happen.

Example 1.32: Let

S = span{ [1, 0, 1]^T, [0, 1, 1]^T }.

If we construct the projector using the basis

B = [ 1 0
      0 1
      1 1 ],

we find

P = (1/3) [  2 -1  1
            -1  2  1
             1  1  2 ].

We get the same projector if we use the following basis instead:

B = [ 2 1
      1 5
      3 6 ].
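A quick numerical check of Example 1.32 (Python/numpy): both bases yield the same projector, built via (1.22).

import numpy as np

def projector(B):
    # Projector onto the column space of B, via (1.22); B must have
    # linearly independent columns.
    B = np.asarray(B, dtype=float)
    return B @ np.linalg.inv(B.T @ B) @ B.T

B1 = np.array([[1.0, 0.0],
               [0.0, 1.0],
               [1.0, 1.0]])
B2 = np.array([[2.0, 1.0],
               [1.0, 5.0],
               [3.0, 6.0]])

P1, P2 = projector(B1), projector(B2)
print(np.allclose(P1, P2))                               # True: basis-independent
print(np.allclose(P1, P1 @ P1), np.allclose(P1, P1.T))   # idempotent and Hermitian
print(3 * P1)   # approximately [[2,-1,1],[-1,2,1],[1,1,2]]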
Illustration 1.17 (Denoising and FIR filters): Theorem 1.15 gives us a means of solving our denoising problem for the output of an FIR filter. Recall that we wished to find the point v̂ ∈ V, where V was the subspace of all possible outputs from the FIR filter, that was closest to our measurement y in a least squares sense. According to (1.22), we can do so if we have a basis for V. From Illustration 1.3 we know that V = R(H), where, for a K tap filter h with a length N input signal, H is the (K + N - 1) × N convolution matrix. We also claimed that H has full column rank provided that h ≠ 0. Thus, we can construct the projector for V as

P_V = H(H^H H)^{-1} H^H.

This allows us to solve our problem by using Theorem 1.11. We can immediately write down that

arg min_{v ∈ V} ||v - y|| = P_V y.

Thus, we now have a simple formula for finding the optimal output signal v̂, in the sense that it is closest to our measurements y in a least squares sense!
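A minimal numerical sketch of this denoising recipe (Python/numpy; the filter, input, and noise level are arbitrary choices):

import numpy as np

rng = np.random.default_rng(0)

# Arbitrary FIR filter (K = 3) and input (N = 16).
h = np.array([1.0, 0.5, -0.25])
x = rng.standard_normal(16)
K, N = len(h), len(x)

# Convolution matrix H (full column rank for h != 0) and clean output.
H = np.zeros((N + K - 1, N))
for j in range(N):
    H[j:j + K, j] = h
y_clean = H @ x

# Noisy measurement and its orthogonal projection onto V = R(H).
y = y_clean + 0.1 * rng.standard_normal(N + K - 1)
P_V = H @ np.linalg.inv(H.T @ H) @ H.T
y_hat = P_V @ y

print(np.linalg.norm(y - y_clean), np.linalg.norm(y_hat - y_clean))
# The projected estimate is never farther from the clean signal than y is,
# since the noise component outside the subspace has been removed.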
Now that we have existence of a projector for every subspace S ⊆ C^n, we can finally prove Property 1.7. Let us first restate Property 1.7 as a theorem and prove it.

Theorem 1.16 (Property 1.7). If S is a subspace of C^n, then (i) C^n = S ⊕ S^⊥, (ii) (S^⊥)^⊥ = S.

Proof. The previous theorems established the existence of a unique projector for every subspace S. Let P be the projector for S. Then every x ∈ C^n can be decomposed as

x = Px + (I - P)x.

Here, Px ∈ S and (I - P)x ∈ S^⊥. We thus have C^n = S + S^⊥. By Property 1.4, S ∩ S^⊥ = {0}, so the sum is actually a direct sum.

To prove (ii), we only need to show that S ⊇ (S^⊥)^⊥, because the reverse containment S ⊆ (S^⊥)^⊥ has already been demonstrated in Property 1.3. Again, let P be the projector for S. Then (I - P)z ∈ S^⊥ for all z ∈ C^n. Then,

x ∈ (S^⊥)^⊥  ⇒  x ⊥ (I - P)z  ∀ z ∈ C^n  ⇒  x^H(I - P)z = 0  ∀ z ∈ C^n
⇒  (I - P)x ⊥ z  ∀ z ∈ C^n  ⇒  (I - P)x = 0
⇒  x = Px  ⇒  x ∈ S.

Corollary 1.3. P is a projector for S ⊆ C^n if and only if I - P is a projector for S^⊥.

Proof. This follows immediately from Property 1.18 and Theorem 1.16 (ii).
The next result on projectors is known as the Projector Update Formula. Given two subspaces V and U of C^n, this result allows us to express the projector on the subspace V + U in terms of the projector P_V on V, and an update term accounting for the change from V to V + U. This result, too, is a corollary of the previous results, but because of its importance we state it as a separate theorem. We define the projection P_W U of one subspace U onto another subspace W as the subspace

P_W U = { w ∈ W : w = P_W u, u ∈ U }.

Theorem 1.17 (Projector Update Formula). Let V and U be subspaces of C^n. Then

P_{V+U} = P_V + P_{P_{V^⊥} U}.   (1.23)

Proof. One way to prove the result is using Theorem 1.13 and Theorem 1.16 (i). The details of the proof are left as an exercise.
Theorem 1.17 is extremely useful, because it expresses the new information involved in a subspace update. In particular, it leads to efficient updates of projectors in spectral analysis, and to the powerful idea of innovations in recursive least-squares filtering and in the representation of random processes.

Some further results on projectors on subspaces of C^n are given below, with proofs left as exercises.

Property 1.19. The rank of a projector P ∈ C^{n×n} satisfies rank P ≤ n, with equality if and only if P = I, the n×n identity matrix.

Property 1.20. The rank of a projector P ∈ C^{n×n} satisfies rank P = tr[P].
1.6 Four Fundamental Subspaces
In preparation for the next section on solving inverse problems, we need a few additional properties of matrices and the subspaces associated with them. Associated with an m×n matrix A are four fundamental subspaces, R(A), R(A^H), N(A), and N(A^H). These subspaces are related, as stated explicitly in what is sometimes called the Fundamental Theorem of Linear Algebra:

Theorem 1.18 (Fundamental Theorem of Linear Algebra). If A is an m×n matrix, then

C^m = R(A) ⊕ N(A^H),   R^⊥(A) = N(A^H),   N^⊥(A^H) = R(A)

and

C^n = R(A^H) ⊕ N(A),   R^⊥(A^H) = N(A),   N^⊥(A) = R(A^H).

Proof. We can obtain the last three results by proving only

R^⊥(A^H) = N(A)

and utilizing Theorem 1.16. The remaining results can be obtained by replacing A^H by A. To prove this relation, note that

x ∈ R^⊥(A^H)  ⇔  x^H A^H z = 0  ∀ z ∈ C^m  ⇔  Ax ⊥ C^m  ⇔  Ax = 0  ⇔  x ∈ N(A).
Example 1.33: Let

A = [ 1 2
      2 4
      3 6 ].

Then we find that

R(A) = span{ [1, 2, 3]^T },   R(A^H) = span{ [1, 2]^T },

and that

N(A) = span{ [2, -1]^T },   N(A^H) = span{ [1, 1, -1]^T, [3, 0, -1]^T }.

We can easily verify that (R(A))^⊥ = N(A^H) and that (R(A^H))^⊥ = N(A).
[Figure 1.2: Schematic diagram of the four fundamental subspaces R(A), N(A^H) ⊂ C^m and R(A^H), N(A) ⊂ C^n, and the relationships between them under A and A^H.]
Figure 1.2 illustrates the Fundamental Theorem of Linear Algebra, including the orthogonality relations between the subspaces. Note that Theorem 1.18 also establishes that the nullity of A, which is dim N(A), is n - rank A^H = n - rank A. Some properties of the following composite operators

AA^H : C^m → C^m   and   A^H A : C^n → C^n

also follow from Theorem 1.18.

Corollary 1.4.

R(AA^H) = R(A),   N(AA^H) = N(A^H)

and

R(A^H A) = R(A^H),   N(A^H A) = N(A),

with the further result that

rank AA^H = rank A^H A = rank A.

Proof. It suffices to show that N(AA^H) = N(A^H), because it will imply equality of the orthogonal complements R(AA^H) and R(A). The remaining results will follow by replacing A with A^H. The rank result then follows by observing rank A = dim R(A) = dim R(A^H). To prove the equality of the nullspaces, we note that N(AA^H) ⊇ N(A^H), because if A^H x = 0 then AA^H x = 0. The reverse containment follows from:

x ∈ N(AA^H)  ⇒  x^H AA^H x = 0
⇒  A^H x ⊥ A^H x
⇒  A^H x = 0  ⇒  x ∈ N(A^H).
1.7 Least Squares Solutions
Let us now return to the problem of solving Ax = b for x, given b and A, and apply the machinery of the last several sections to the problem. We can now use our notion of an orthogonal projection to find an approximate solution to Ax = b when A has full column rank and an exact solution does not exist (i.e., b ∉ R(A)). First, recall that if A has full column rank, then a solution may not exist, but if it does, then it is unique. Our goal, then, is to find an x̂ for which A x̂ ≈ b in some sense, when an exact solution does not exist. But we have just considered the problem of approximating a vector b by an element of a subspace S. Suppose we decide to solve Ax = b in the following manner:

x̂_LS = arg min_x ||Ax - b||^2,   (1.24)

which is referred to as a least squares solution, because it minimizes the square of the residual. However, as x ranges over C^n, Ax takes values in R(A). So an exactly equivalent means of solving the same problem is:

1. Find b̂_LS = arg min_{s ∈ R(A)} ||s - b||
2. Find x̂_LS such that A x̂_LS = b̂_LS

The equivalence of these problems forms the basis for the statement and proof of the following theorem:
Theorem 1.19 (Least Squares Theorem). Suppose the columns of the m×n matrix A are linearly independent (A is full column rank). Then there exists a unique element x̂_LS ∈ C^n that satisfies

||A x̂_LS - b|| ≤ ||Ax - b||   ∀ x ∈ C^n,

and furthermore x̂_LS is given by A^LS b, where A^LS is the unique matrix

A^LS = A^L P_{R(A)}

and A^L is any left inverse of A.

Proof. Because of the equivalence of the following two expressions,

min_{x ∈ C^n} ||Ax - b|| = min_{s ∈ R(A)} ||s - b||,

we can apply the Nearest Point Theorem to conclude that there is a unique point in R(A) that is closest to b, which is given by b̂_LS = P_{R(A)} b. Since A has full column rank, Ax = b̂_LS has a unique solution, and by Theorem 1.6, the unique solution can be obtained by letting x = A^L b̂_LS for any left inverse A^L. Finally, because, for every b ∈ C^m, the corresponding LS solution x̂_LS is unique, it follows that A^LS must also be unique.

While the Least Squares Theorem gives us the existence and uniqueness of the least squares solution, it does not completely answer the question of how to compute it, especially since we do not have a constructive method for finding left inverses of a matrix A. A more direct, if less intuitive, result gives the least squares solution in a computationally direct form known as the Normal Equations.
Theorem 1.20 (Normal Equations). Let A ∈ C^{m×n}, b ∈ C^m. Then x̂_LS satisfies

||A x̂_LS - b|| ≤ ||Ax - b||   ∀ x ∈ C^n

if and only if it satisfies the normal equations

A^H A x̂_LS = A^H b.

Proof. From the OP, we know that the residual A x̂_LS - b must be orthogonal to the subspace we are using to approximate b, in this case R(A). Thus, (A x̂_LS - b) ∈ R(A)^⊥. By Theorem 1.18, R(A)^⊥ = N(A^H), which implies that A^H(A x̂_LS - b) = 0.

Corollary 1.5. If A has full column rank, then

A^LS = (A^H A)^{-1} A^H.   (1.25)

Proof. If A has full column rank, then A^H A is invertible by Corollary 1.4, and the normal equations have the unique solution

x̂_LS = (A^H A)^{-1} A^H b.

The proof is completed by noting that x̂_LS = A^LS b, and that A^LS is unique.
[Figure 1.3: Geometric interpretation of the LS solution as a direct mapping from b to x̂_LS via A^LS, and as a mapping from b to P_{R(A)}b, followed by a mapping from P_{R(A)}b to x̂_LS via A^L.]

The same result can also be obtained using the constructive definition of a projector matrix and the properties of left inverses. Figure 1.3 illustrates both views of the least squares solution in terms of the various spaces associated with A.
We will briefly list some important properties of A^LS.

Property 1.21. A^LS is the unique left inverse of A with the same nullspace as A^H.

Proof. It is clear that A^LS is a left inverse, because

A^LS A = A^L P_{R(A)} A = A^L A = I,

where P_{R(A)} has no effect on the columns of A, which are (by definition) already in the range space of A. Now, let Ã^L be any left inverse of A that satisfies N(Ã^L) = N(A^H). Then because the least squares residual (b - AA^LS b) lies in R(A)^⊥ = N(A^H), we have

Ã^L(b - AA^LS b) = 0   ∀ b ∈ C^m
⇒  Ã^L b = Ã^L A A^LS b = A^LS b   (using Ã^L A = I).

Hence, Ã^L = A^LS.
Property 1.22. AA^LS is the unique projector matrix for R(A).

Proof. We present two proofs. First, using Theorem 1.19, we have for all b ∈ C^m,

AA^LS b = A(A^L P_{R(A)})b = (AA^L)P_{R(A)} b = P_{R(A)} b,

where we have used Theorem 1.6, which states that AA^L c = c for all vectors c ∈ R(A), with c = P_{R(A)} b.

Alternately, from Theorem 1.20, the expression for AA^LS is

AA^LS = A(A^H A)^{-1} A^H,

which, by Theorem 1.15, is the expression for the projection matrix for the subspace spanned by the columns of A, i.e., R(A).
Example 1.34: Consider the simple example of

[1, 2]^T x = [3, 4]^T,

which has no solution. We will solve this system using both Theorem 1.19 and Theorem 1.20 in turn. First, by Theorem 1.19, we find the projection matrix for R(A), which in this case, by (1.22), is simply

P_{R(A)} = (1/5) [ 1 2
                   2 4 ].

We also need a left inverse of A, which we find from Example 1.19 to be A^L = [1 0]. Thus, computing the least squares solution by Theorem 1.19,

x̂_LS = [1 0] P_{R(A)} [3, 4]^T = [1 0] (1/5)[11, 22]^T = 11/5.

The same solution is obtained by choosing A^L = [0 1/2] or A^L = (1/4)[2, 1].

On the other hand, if we use Theorem 1.20, we write the normal equations as

[1 2][1, 2]^T x̂_LS = [1 2][3, 4]^T,

for which the unique solution is x̂_LS = 11/5.

We can also compute A^LS for this example using both approaches. First, from Theorem 1.19, we find that A^LS = A^L P_{R(A)} for any left inverse A^L of A. Using any of the left inverses from Example 1.19 we get the same answer, A^LS = (1/5)[1 2]. We get the same expression if we use Theorem 1.20.
Example 1.35: Let us take a slightly more complicated example. Suppose we wish to solve

[ 1 0  1
  0 1  0
  1 0  0
  1 0 -1 ] x = [ 1
                 0
                 0
                 1 ].

One can readily verify that the columns of A are linearly independent, because A^H A is a diagonal matrix with nonzero diagonal entries (thus, A^H A is full rank, and by Corollary 1.4, A is full column rank). We can also determine by inspection that the right hand side, b, is not in the range space of A, so that no solution exists. This time, we will find the least squares solution using only the normal equations, which are

[ 3 0 0
  0 1 0
  0 0 2 ] x̂_LS = [ 2
                    0
                    0 ].

Solving this yields the unique solution x̂_LS = (2/3)[1, 0, 0]^T.
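A numerical cross-check of this example (Python/numpy), using the matrix above: solving the normal equations directly, and comparing with numpy's own least squares routine.

import numpy as np

A = np.array([[1.0, 0.0,  1.0],
              [0.0, 1.0,  0.0],
              [1.0, 0.0,  0.0],
              [1.0, 0.0, -1.0]])
b = np.array([1.0, 0.0, 0.0, 1.0])

# Normal equations: (A^H A) x = A^H b.
x_ne = np.linalg.solve(A.T @ A, A.T @ b)

# Same answer from the library least squares solver.
x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)

print(x_ne, x_ls)    # both equal [2/3, 0, 0]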
Illustration 1.18 (Least Squares Deconvolution): Let us return to our FIR deconvolution problem. In matrix-vector form, we had an observation y ∈ C^{N+K-1} and an unknown input signal x ∈ C^N, and the two were related by y = Hx, where H ∈ C^{(N+K-1)×N} is the convolution matrix associated with a known K-tap filter h. Now, given a noisy measurement b = y + n, we know that it is unlikely that b ∈ R(H), so that no solution for x exists. However, we can solve for x in a least squares fashion by solving the normal equations:

H^H H x̂_LS = H^H b.

This will give us the best candidate for an input signal, in the sense that

x̂_LS = arg min_{x ∈ C^N} ||b - Hx||^2.

Because we have effectively inverted the convolution operation, this process for finding x̂_LS is known as least squares deconvolution.

If H is a circular convolution, then we know that it is diagonalized by a DFT matrix: H = QΛQ^H, where Λ is a diagonal matrix. We can solve the normal equations explicitly in this case, as

QΛ^H Q^H QΛQ^H x̂_LS = QΛ^H Q^H b
Q|Λ|^2 Q^H x̂_LS = QΛ^H Q^H b
x̂_LS = QΛ^{-1} Q^H b.

Of course, for this example H must be square and invertible, so that the LS solution is exact in this instance.
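A minimal least-squares deconvolution sketch (Python/numpy; the filter, signal, and noise level are arbitrary choices):

import numpy as np

rng = np.random.default_rng(1)

# Arbitrary K-tap filter and length-N input.
h = np.array([1.0, -0.8, 0.2])
x = rng.standard_normal(12)
K, N = len(h), len(x)

# Convolution matrix H and noisy observation b = Hx + n.
H = np.zeros((N + K - 1, N))
for j in range(N):
    H[j:j + K, j] = h
b = H @ x + 0.05 * rng.standard_normal(N + K - 1)

# Least squares deconvolution via the normal equations H^H H x = H^H b.
x_ls = np.linalg.solve(H.T @ H, H.T @ b)

print(np.linalg.norm(x_ls - x))   # small: x is recovered up to the noise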
1.8 Minimum Norm Solutions
So far, we have learned how to solve the problem of finding the best approximation to a point on a subspace, and how to find an approximate (in the sense of least squares) solution to Ax = b when A is full column rank. Let us now consider the next case of interest, in which A has full row rank, and thus at least one solution to Ax = b is guaranteed to exist for every b ∈ C^m. The problem now is to pick one of the (infinitely) many solutions to Ax = b.
1.8.1 Choosing a Solution
From our previous discussions, we know that the solution set is a translated version of the subspace N(A), also called a linear variety, as depicted in Figure 1.4. We must choose a point in this translated subspace as our desired solution, since every vector in this set satisfies the equation Ax = b.

[Figure 1.4: The solution set T = {x ∈ R^2 : Ax = b} of Ax = b for A with full row rank is a linear variety, parallel to N(A). The diagram illustrates the 2-D case.]

Unfortunately, there is no definitive answer to this problem. In many cases, the structure of the physical problem that gave rise to the system of equations may suggest a means of selecting a solution. In a general context, however, without knowledge of the problem structure, we might resort to any of the following criteria:

- minimum energy, i.e., of all x that satisfy Ax = b, choose the x which has the smallest Euclidean norm,
- closest solution to some nominal point, where some x_0 serves as a nominal solution that we know the solution should be close to,
- minimize the worst case error among all solutions with a bounded energy.

The first two criteria are straightforward. The third is depicted in Figure 1.5. Here we are assuming that the true solution x* has energy that is bounded by some unknown constant E, and lies in the set F. We then pick the solution that minimizes the worst case error among all solutions with energy bounded by E.

All three of these criteria can be satisfied by one method of solution selection, in which we pick the minimum norm solution x_MN (note the absence of a hat, because this is an exact solution), i.e., the solution that satisfies

||x_MN|| ≤ ||x||   ∀ x ∈ T,

where T = {x ∈ C^n : Ax = b} is the solution set. How the minimum norm solution satisfies the first of our three criteria is obvious, since the Euclidean norm of a vector is equal to the square root of its energy.
[Figure 1.5: If we know a priori that the energy of the true solution x* is bounded above by some constant E^2, then we restrict our attention to the set F illustrated in the diagram. We then seek the solution x_minmax that minimizes the largest error we could make over all possible choices of x*.]

Proving that the minimum norm solution satisfies the other two criteria is slightly more difficult, and will be postponed until after we have covered the minimum norm solution in detail.
1.8.2 Finding the Minimum Norm Solution
Let us first consider an example to try and guess how the minimum norm solution might be found.

Example 1.36: Let A = [1, 1], and suppose b = 3. For simplicity, we will restrict our attention to solutions of Ax = b that are real, i.e., x ∈ R^2. Then the set T of all possible solutions to Ax = b is simply

T = { x ∈ R^2 : x_1 + x_2 = 3 },

which is shown in Figure 1.6, and is a linear variety. Any x ∈ T can be reparameterized as

x = [α, 3 - α]^T,   α ∈ R.

The energy (norm squared) of this vector is simply f(α) = α^2 + (3 - α)^2. We can apply the first order necessary conditions for optimality by setting the derivative of f(α) with respect to α equal to zero:

df(α)/dα = 2α - 2(3 - α) = 4α - 6 = 0,

from which we conclude that

x_MN = [3/2, 3/2]^T.

This solution is plotted in Figure 1.6.

Note that for any x ∈ T, the norm ||x|| is equal to the distance between the origin 0 and the point x. The point of minimum norm in T is therefore the one closest to 0. It follows from the geometry of the projection that this point is obtained by dropping a perpendicular to T from 0. Because T is a line parallel to N(A), this perpendicular lies along N(A)^⊥. The solution x_MN should then be the unique point in the intersection of T and N(A)^⊥. This result is suggested by Figure 1.6, in which the subspace N(A)^⊥ is shown to include the optimal x_MN.
Figure 1.6: A specific example of the construction of the minimum norm solution: T = {x ∈ R^2 : x_1 + x_2 = 3} is parallel to N(A) = {x ∈ R^2 : x_1 + x_2 = 0}, and x_MN = [3/2, 3/2]^T is the unique point in T ∩ N(A)^⊥.
The generalization of this geometrical explanation is stated in two parts. The first is the Dual Projection Theorem, which states that the element of a linear variety T = x_p + S with minimum norm is the unique point in the intersection of T and S^⊥. The second is the Minimum Norm Theorem stated later.

Theorem 1.21 (Dual Projection Theorem). Let T = x_p + S be a translated subspace of C^n. Then the element of T with minimum norm, i.e.,

t_MN = arg min_{t ∈ T} ‖t‖,

exists, is unique, and it is the only point that satisfies t_MN ∈ T ∩ S^⊥. Furthermore, t_MN = P_{S^⊥} t for any t ∈ T.
Proof. The proof of this theorem closely follows the geometric argument given above. It reduces the problem of finding the minimum norm element of a linear variety to an application of the Nearest Point Theorem. Every t ∈ T can be written as t = x_p + s for some s ∈ S. Hence,

t_MN = arg min_{t ∈ T} ‖t‖ = x_p + arg min_{s ∈ S} ‖x_p + s‖ = x_p + arg min_{s ∈ S} ‖s - (-x_p)‖
     = x_p + P_S(-x_p) = (I - P_S) x_p = P_{S^⊥} x_p.

Because S has an associated projector P_S, P_{S^⊥} = I - P_S exists and P_{S^⊥} x_p is unique for every x_p. Furthermore, ∀ t ∈ T, t = x_p + s for some s ∈ S. Hence

P_{S^⊥} t = P_{S^⊥}(x_p + s) = P_{S^⊥} x_p = t_MN   ∀ t ∈ T,

as claimed. Thus the minimum norm element t_MN of T exists, is unique, and t_MN = P_{S^⊥} t ∀ t ∈ T. Finally, we prove that T ∩ S^⊥ = {t_MN}. Let t ∈ T ∩ S^⊥. Then because t ∈ S^⊥, we must have

t = P_{S^⊥} t = t_MN.
Theorem 1.22 (Minimum Norm Theorem). Let A be an m×n matrix with rank m, and let b be any vector in C^m. Then amongst all x ∈ C^n satisfying Ax = b, there exists a unique solution x_MN with least norm, and it lies in R(A^H). Moreover, it is a linear function of b given by x_MN = A_MN b, where

A_MN = A^H (A A^H)^{-1}.
Proof. We first prove that the solution set T = {x ∈ C^n : Ax = b} is a linear variety. Because A has full row rank, there exists an x_p ∈ C^n such that A x_p = b. Now for any n ∈ N(A),

A(x_p + n) = A x_p + A n = b + 0 = b,

so that the set of all solutions satisfies T ⊇ x_p + N(A). Conversely, if we take any solution t ∈ T, then

0 = A t - b = A t - A x_p = A(t - x_p),

so that t - x_p ∈ N(A), or t ∈ x_p + N(A), implying T ⊆ x_p + N(A). Hence T = x_p + N(A).

We can now apply the Dual Projection Theorem to conclude that the minimum norm solution is the unique vector that satisfies x_MN ∈ T ∩ N(A)^⊥ = T ∩ R(A^H). Moreover,

x_MN ∈ R(A^H)  ⟹  x_MN = A^H z for some z ∈ C^m  ⟹  b = A x_MN = (A A^H) z.

The m×m matrix A A^H is invertible, because by Corollary 1.4 it has rank m. Therefore, z = (A A^H)^{-1} b, which implies that

x_MN = A^H (A A^H)^{-1} b.
Example 1.37: Returning to Example 1.36, we find that

x_MN = A_MN b = (1/2) [1, 1]^T · 3 = [3/2, 3/2]^T.
Example 1.38: Let

A = [1 0 1; 0 1 0]

and let b = [1, 2]^T. We will compute the minimum norm solution using all four means to demonstrate that they are, in fact, equal. First, we can write the solution set as

T = {x ∈ R^3 : x_2 = 2, x_1 + x_3 = 1}.

Again, the vector x ∈ T can be reparameterized as x = [α, 2, 1 - α]^T. We could then form the energy cost function f(α) and set the first derivative to zero. However, we know from Example 1.36 what the form of the final solution is (because x_2 does not depend on α):

x_MN = [1/2, 2, 1/2]^T.
Next, let us compute x_MN using the geometry of the problem. First, we note that by inspection, A has a one dimensional null space, which can be written as

N(A) = span{ [1, 0, -1]^T }.

We can then look for the unique x_MN ∈ T ∩ N(A)^⊥. Now, since x ∈ N(A)^⊥ ⟺ n^H x = 0 ∀ n ∈ N(A), we arrive at the following conditions:

2 = x_2           (from x ∈ T)
1 = x_1 + x_3     (from x ∈ T)
0 = x_1 - x_3     (from x ∈ N(A)^⊥)

from which we find the unique solution x_MN = [1/2, 2, 1/2]^T, as before.
We can solve for x_MN in yet another manner by using the relationship x_MN = P_{R(A^H)} x_p, where x_p is any solution to Ax = b. We choose the following for x_p:

x_p = [1, 2, 0]^T.

The projector for R(A^H) can be found as P_{R(A^H)} = B (B^H B)^{-1} B^H by Theorem 1.15 with B = A^H, since the rows of A form a basis for R(A^H). We thus find

P_{R(A^H)} = (1/2) [1 0 1; 0 2 0; 1 0 1],

with which we can compute

x_MN = P_{R(A^H)} x_p = (1/2) [1, 4, 1]^T = [1/2, 2, 1/2]^T.
Finally, if we compute A_MN, we get

A_MN = A^H (A A^H)^{-1} = [1 0; 0 1; 1 0] [2 0; 0 1]^{-1} = [1/2 0; 0 1; 1/2 0].

Computing x_MN = A_MN b yields exactly the same answer as before.
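Because every quantity in Example 1.38 is explicit, the computations are easy to check numerically. The short MATLAB sketch below (a minimal check, not part of the original derivation) reproduces three of the routes above, the projection of a particular solution onto R(A^H), the formula A_MN = A^H (A A^H)^{-1}, and the direct product A_MN b, and confirms that they all return [1/2, 2, 1/2]^T:

    % Example 1.38: several routes to the same minimum norm solution
    A  = [1 0 1; 0 1 0];   b = [1; 2];
    xp = [1; 2; 0];                    % a particular solution, A*xp = b
    P  = A' * ((A*A') \ A);            % projector onto R(A^H) (B = A^H in Theorem 1.15)
    x1 = P * xp;                       % projection route
    Amn = A' / (A*A');                 % A_MN = A^H (A A^H)^{-1}
    x2  = Amn * b;                     % right-inverse route
    x3  = A' * ((A*A') \ b);           % same formula without forming A_MN
    disp([x1 x2 x3])                   % three identical columns: [0.5; 2; 0.5]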
Illustration 1.19 (Tomography): In this illustration, we will study the minimum norm solution to the tomography problem. In its most common form, tomography consists of trying to reconstruct a 2D cross sectional density f(x, y) of some object of interest, from a set of line integrals of that density, i.e., measurements of the form

b_i = ∫_{α_i x + β_i y = c_i} f dl.

These measurements can be made via a number of techniques, one example being the ubiquitous medical Computed Tomography (CT) scanner, which obtains the measurements b_i by measuring X-ray absorption through the sample along different lines.

Let us consider the tomographic problem in a very simplified form. Suppose that we wish to determine the entries of a matrix X ∈ C^{n×n} (a discrete version of the cross sectional density f(x, y)), but we are given measurements that are
sums of the entries of X along rows and columns. In particular, let us assume n = 3, so that X can be written as

X = [x_{1,1} x_{1,2} x_{1,3}; x_{2,1} x_{2,2} x_{2,3}; x_{3,1} x_{3,2} x_{3,3}].

Furthermore, suppose that our measurements b ∈ C^4 consist of the sums of X along the central column, row, diagonal and antidiagonal:

b = [x_{1,2} + x_{2,2} + x_{3,2},  x_{2,1} + x_{2,2} + x_{2,3},  x_{1,1} + x_{2,2} + x_{3,3},  x_{1,3} + x_{2,2} + x_{3,1}]^T.     (1.26)
Our intuition tells us that b does not uniquely determine X. It is not necessarily clear, however, if any of the elements of X might be uniquely determined. Furthermore, even if we cannot recover X uniquely, what is the minimum norm solution? To answer these questions, we first have to rewrite the measurement equation in matrix-vector form. The first step in such a reformulation is to write X as a vector x, which is known as lexicographic reordering. Suppose we choose to define x ∈ C^9 as the concatenation of the columns of X, denoted by vec(X):

x = [x_{1,1}, x_{2,1}, x_{3,1}, x_{1,2}, x_{2,2}, x_{3,2}, x_{1,3}, x_{2,3}, x_{3,3}]^T.

Then, because (1.26) is linear, there is a matrix representation A ∈ C^{4×9} such that b = Ax.

Without writing A in explicit form, how can we explore the properties of its fundamental subspaces? For such a simple case, we can do much of the reasoning by inspection. For example, A must have full row rank, which we can argue by mapping the following input matrices X into measurement vectors b:

[1 0 0; 0 0 0; 0 0 0],   [0 1 0; 0 0 0; 0 0 0],   [0 0 1; 0 0 0; 0 0 0],   [0 0 0; 1 0 0; 0 0 0].

Using (1.26), we find that each of these inputs gives rise to a measurement of the form b = e_i, where e_i is a column of the 4×4 identity matrix. It thus follows that R(A) = C^4, since any vector y ∈ C^4 can be written as the measurement of a linear combination of these inputs.
We can construct the null space of A by inspection in this simple case as well. The reader can quickly verify that each of the following matrices corresponds to a linearly independent null vector of A after lexicographic reordering:

[1 0 0; 0 0 0; 0 0 -1],   [0 1 0; 0 0 0; 0 -1 0],   [0 0 1; 0 0 0; -1 0 0],   [0 0 0; 1 0 -1; 0 0 0],   [1 1 1; 1 -2 1; 1 1 1].     (1.27)

Since there are exactly 5 linearly independent null vectors here, and dim N(A) = 9 - rank(A) = 5, these inputs constitute a basis for the null space of A. From the structure of these null vectors, it immediately follows that no element of X is uniquely determined by the measurements b (the exact argument is left as an exercise). In the context of tomography, these are called ghost images, because their presence cannot be determined from the measurements.
To compute the minimum norm solution for a given set of measurements, we solve x_MN = A^H (A A^H)^{-1} b. The operation of computing A^H y arises frequently in tomography problems, and has been given the special name of backprojection, because the measurements y_i are projected back along the directions in which they were measured. The matrix (A A^H)^{-1} can be found after some algebra to be

(A A^H)^{-1} = [3 1 1 1; 1 3 1 1; 1 1 3 1; 1 1 1 3]^{-1} = (1/12) [5 -1 -1 -1; -1 5 -1 -1; -1 -1 5 -1; -1 -1 -1 5].
For a simple numerical example, let

X = [1 0 0; 0 1 0; 1 0 0],     (1.28)

in which case b = [1, 1, 2, 2]^T. Solving for the minimum norm solution yields

X_MN = (1/2) [1 0 1; 0 2 0; 1 0 1] = [0.5 0 0.5; 0 1 0; 0.5 0 0.5].     (1.29)

Note that this solution yields exactly the same measurements as the original X, but differs significantly from X. Yet, it can be argued to be better (in a sense made precise in Subsection 1.8.3) than the solution

X + 1000 X_5 = [1001 1000 1000; 1000 -1999 1000; 1001 1000 1000],

where X_5 is the fifth ghost in (1.27).
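The numbers in this illustration are easy to reproduce. The MATLAB sketch below (a minimal check, building A exactly as described above for the column-major ordering x = vec(X)) recomputes the minimum norm reconstruction (1.29) by backprojection, and verifies that the fifth matrix in (1.27) is indeed a ghost:

    % Simplified tomography of Illustration 1.19: 4 measurements of a 3x3 image
    idx = @(i,j) (j-1)*3 + i;                 % position of x_{i,j} in vec(X)
    A = zeros(4, 9);
    A(1, [idx(1,2) idx(2,2) idx(3,2)]) = 1;   % central column sum
    A(2, [idx(2,1) idx(2,2) idx(2,3)]) = 1;   % central row sum
    A(3, [idx(1,1) idx(2,2) idx(3,3)]) = 1;   % main diagonal sum
    A(4, [idx(1,3) idx(2,2) idx(3,1)]) = 1;   % antidiagonal sum
    X = [1 0 0; 0 1 0; 1 0 0];
    b = A * X(:);                             % = [1; 1; 2; 2], as in (1.28)
    x_mn = A' * ((A*A') \ b);                 % backproject (A A^H)^{-1} b
    X_mn = reshape(x_mn, 3, 3)                % = [.5 0 .5; 0 1 0; .5 0 .5], as in (1.29)
    ghost = [1 1 1; 1 -2 1; 1 1 1];           % fifth matrix in (1.27)
    disp(A * ghost(:))                        % all-zero measurements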
1.8.3 Properties of Minimum Norm Solutions
Properties of A_MN

The matrix A_MN has several useful properties, which we list below. The proofs are left to the exercises.

Property 1.23. A_MN is a right inverse of A, and the unique right inverse of A with the same range space as A^H.

Property 1.24. A_MN A is a projection matrix for R(A^H).
Example 1.39: Let A = [1, 1]. Then

A_MN = (1/2) [1, 1]^T.

We note that A A_MN = 1, so that A_MN is a right inverse of A. Furthermore, R(A_MN) = R(A^H). Finally, the matrix A_MN A is

A_MN A = (1/2) [1 1; 1 1] = P_{R(A^H)}.
Minimizing Distance from a Nominal
As we mentioned earlier, one criterion for selecting a solution from the solution set T is to find the solution closest to some nominal point x_0. If the norm used to measure closeness is the Euclidean norm, the required solution can be obtained by translating the minimum norm solution of a slightly modified system of equations with the same coefficient matrix A. Define z := x - x_0 and c := b - A x_0. Then,

Ax = b  ⟺  Az = c.

Since ‖x - x_0‖ = ‖z‖, the solution to Ax = b that is closest to x_0 is obtained from the minimum norm solution z_MN of Az = c by the translation x^* = z_MN + x_0.
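As a concrete sketch of this translation trick, the MATLAB lines below solve the setting of Example 1.36 for the solution of Ax = b closest to a nominal point; the particular x_0 used here is an arbitrary illustrative choice, not taken from the text:

    % Solution of Ax = b closest to a nominal x0, via a translated MN problem
    A  = [1 1];    b = 3;              % the setting of Example 1.36
    x0 = [0; 1];                       % hypothetical nominal point
    c  = b - A*x0;                     % modified right-hand side
    z_mn   = A' * ((A*A') \ c);        % minimum norm solution of Az = c
    x_star = x0 + z_mn                 % = [1; 2]: closest point to x0 on x1 + x2 = 3
    x_mn   = A' * ((A*A') \ b)         % = [1.5; 1.5]: plain minimum norm solution, for comparison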
Illustration 1.20: Let us revisit the simplified tomography problem of Illustration 1.19. This time, suppose that we wish to determine the input matrix X that is closest to the following matrix

X_0 = [1 0 0; 1 1 0; 1 0 0],

and that is still consistent with the measurements b. If we use the same input X of (1.28), the minimum norm solution X_MN is still (1.29). On the other hand, using the formulation outlined above, we find the solution closest to X_0 to be

X^* = (1/12) [13 1 1; 7 10 -5; 13 1 1] ≈ [1.0833 0.0833 0.0833; 0.5833 0.8333 -0.4167; 1.0833 0.0833 0.0833].

Comparing the error norms, we find

‖X_MN - X‖_F = ‖vec(X_MN) - vec(X)‖ = 1,
‖X^* - X‖_F = ‖vec(X^*) - vec(X)‖ ≈ 0.7638,

so that X^* is closer to X than X_MN in terms of the Euclidean norm of the relevant errors.
Minimizing Worst-case Error

Let us now examine the solution that minimizes the worst case error among all solutions with bounded energy. We know a priori that ‖x‖^2 ≤ E. The set of admissible solutions is now the set

F = {x ∈ C^n : Ax = b, ‖x‖^2 ≤ E}.

This set, in relation to the solution set T, is depicted for a typical example in R^2 in Figure 1.5. If x_0 is a candidate solution, then since the unknown solution must be from F, the worst-case estimation error is

max_{x ∈ F} ‖x - x_0‖

(the maximum exists because the Euclidean norm is a continuous function, and the set F is compact), which is a function of the candidate solution x_0. The candidate solution which minimizes this worst case error (the minimax solution) is

x_minmax = arg min_{x_0 ∈ F} max_{x ∈ F} ‖x - x_0‖.

From the figure, it appears that this solution lies at the center of the chord F, and is also on N(A)^⊥. So it seems that the minimum norm solution should be the desired x_minmax as well. This claim is established by the following theorem:

Theorem 1.23 (Min-Max Theorem for Minimum Norm Solutions). x_minmax = x_MN, with a worst-case error of

max_{x ∈ F} ‖x - x_minmax‖^2 = E - ‖x_minmax‖^2.

Proof. Recall that every x ∈ F ⊆ T can be expressed as

x = x_MN + s,
where s ∈ N(A). Because x_MN ⊥ s, we have, by the Pythagorean Theorem, that ‖x‖^2 = ‖x_MN‖^2 + ‖s‖^2. Hence, we can rewrite F as

F = {x_MN + s : s ∈ N(A), ‖s‖^2 ≤ E - ‖x_MN‖^2}.

Thus,

max_{x ∈ F} ‖x - x_0‖ = max_{s ∈ N(A), ‖s‖^2 ≤ E - ‖x_MN‖^2} ‖s + x_MN - x_0‖.

By the triangle inequality, ‖s + x_MN - x_0‖ ≤ ‖x_MN - x_0‖ + ‖s‖. Hence

max_{x ∈ F} ‖x - x_0‖ = ‖x_MN - x_0‖ + √(E - ‖x_MN‖^2),

where the maximum is achieved by

s_worst = √(E - ‖x_MN‖^2) (x_MN - x_0)/‖x_MN - x_0‖        if x_MN ≠ x_0,
s_worst = any s ∈ N(A) with ‖s‖ = √(E - ‖x_MN‖^2)          if x_MN = x_0.

Consequently,

min_{x_0} max_{x ∈ F} ‖x - x_0‖ = √(E - ‖x_MN‖^2),

and the minimizing choice is x_0 = x_MN.
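The claim of Theorem 1.23 can also be checked numerically by brute force. The MATLAB sketch below (an illustrative check; the energy bound E = 10 is an arbitrary assumption) samples the chord F densely for A = [1, 1] and b = 3, and compares the worst-case error of x_MN with that of another feasible candidate:

    % Brute-force check of the min-max property of x_MN (A = [1 1], b = 3, E = 10 assumed)
    A = [1 1];   b = 3;   E = 10;
    x_mn = A' * ((A*A') \ b);                       % [1.5; 1.5]
    n  = [1; -1] / sqrt(2);                         % unit vector spanning N(A)
    r  = sqrt(E - norm(x_mn)^2);                    % half-length of the chord F
    t  = linspace(-r, r, 2001);
    F  = x_mn + n*t;                                % dense sampling of F (2 x 2001)
    worst = @(x0) max(sqrt(sum((F - x0).^2, 1)));   % worst-case error of candidate x0
    disp([worst(x_mn), r])                          % agree: sqrt(E - ||x_MN||^2)
    disp(worst(x_mn + 0.3*n))                       % a different feasible candidate does worse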
1.9 Minimum Norm Least Squares Solutions
We have dealt with the cases where A has full column or full row rank, by constructing least squares and minimum norm solutions, respectively. The third and final case is that in which A has neither full column nor full row rank. In this case, a solution to Ax = b may not exist, and if it does exist, it is not unique. Consider the following example.
Example 1.40: Let A be defined as

A = [1 2; 2 4].

Let b = [2, 3]^T. Then clearly b ∉ R(A), so that a solution to Ax = b does not exist. Furthermore,

N(A) = span{ [-2, 1]^T } ≠ {0},

so that when a solution does exist it is not unique. If we attempt to solve the normal equations to obtain a least squares solution x̂_LS, we find

A^H A x̂_LS = A^H b  ⟺  5 [1 2; 2 4] x̂_LS = 8 [1, 2]^T.

Now clearly, there is an infinity of solutions to the normal equations, all in the linear variety

T = [1, 3/10]^T + span{ [-2, 1]^T }.
How do we choose one of the solutions from T? One approach is to combine the results of the previous two sections, and choose the least squares solution of minimum norm, i.e., the Minimum Norm Least Squares (MNLS) solution. In particular, we will find the solution that satisfies

x̂_MNLS = arg min_{x ∈ T} ‖x‖,

where

T = {x ∈ C^n : x minimizes ‖Ax - b‖}.     (1.30)

The resulting approximate solution is denoted x̂_MNLS, and is unique. Although we will be able to write x̂_MNLS as the solution of a set of equations (much like the normal equations for the least squares solution), we will be unable, for now, to explicitly construct a matrix A_MNLS that maps b ∈ C^m to the minimum norm least squares solution.
Theorem 1.24 (Minimum Norm Least Squares Theorem). Among all vectors x that minimize ‖Ax - b‖, the one with smallest norm is the only one in R(A^H), and it is unique. In addition, it is the only vector x̂_MNLS that satisfies

x̂_MNLS = A^H z,     (1.31)
A^H A A^H z = A^H b.

Proof. By the Nearest Point Theorem, x is a least squares solution in T if and only if

Ax = P_{R(A)} b.     (1.32)

Thus, every exact solution of (1.32) is a least squares solution in T of (1.30). Furthermore, existence of the projection P_{R(A)} b, and thus of a least squares solution, is guaranteed by the existence of a projector for the subspace R(A). Now, if N(A) ≠ {0}, there is an infinite set of solutions to (1.32). The set of least squares solutions T = x_p + N(A) is a linear variety, so that the minimum norm least squares element exists and is unique by the Dual Projection Theorem. Furthermore, {x̂_MNLS} = T ∩ R(A^H).
We can also use Theorem 1.20 to develop an alternate characterization of x̂_MNLS. Theorem 1.20 states that any least squares solution of Ax = b satisfies the normal equations

A^H A x = A^H b.     (1.33)

From the Dual Projection Theorem, we know that x̂_MNLS ∈ R(A^H), so that there exists a z ∈ C^m that satisfies

x̂_MNLS = A^H z.     (1.34)

Combining (1.33) and (1.34) yields the following two equations for x̂_MNLS:

x̂_MNLS = A^H z,     (1.35)
A^H A A^H z = A^H b.     (1.36)

Note that a solution z to A^H A A^H z = A^H b is not unique, because N(A^H A A^H) may be non-trivial. However, different solutions z will differ from each other only by vectors from N(A^H A A^H), which is the same as N(A^H). Thus, A^H z will be the same for all these solutions.
Figure 1.7: Schematic of the minimum norm least squares solution, showing the four fundamental subspaces R(A^H), N(A) in C^n and R(A), N(A^H) in C^m, the linear variety T of least squares solutions, and the measurement b.
The geometry of the minimum norm least squares solution is depicted in Figure 1.7. Note how multiple least squares solutions are mapped by A to the projection of b on R(A). Also, x̂_MNLS is the projection of these least squares solutions on R(A^H).

As a final structural detail, we note that the minimum norm least squares solution inherits the projection properties of both the least squares and minimum norm solutions:
Property 1.25. Let x̂_MNLS be the MNLS solution of Ax = b. Then A x̂_MNLS = P_{R(A)} b.

Proof. Follows from the fact that x̂_MNLS is a least squares solution, and thus A x̂_MNLS is the projection of b onto R(A).

Property 1.26. Let x̂_MNLS be the MNLS solution of Ax = b, and let P_{R(A)} b = Ay. Then x̂_MNLS = P_{R(A^H)} y.

Proof. Because x̂_MNLS is an MN solution of Ax = Ay, it is the projection of every solution of Ax = Ay onto R(A^H). Since one such solution is y, x̂_MNLS = P_{R(A^H)} y.
Example 1.41: Let A = [1 2; 2 4], and let b = [2, 3]^T. Let us find the MNLS solution by solving the equations of Theorem 1.24. We have

A^H A A^H z = A^H b  ⟺  25 [1 2; 2 4] z = 8 [1, 2]^T.

Now let us take any solution to this set of equations. Suppose, for example, we choose z = [1, -17/50]^T. We find the MNLS solution by computing

x̂_MNLS = A^H z = (8/25) [1, 2]^T.
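The two-equation characterization of Theorem 1.24 is easy to verify numerically for this example. The MATLAB sketch below checks that the chosen z solves (1.36), forms x̂_MNLS = A^H z, and compares it with the output of MATLAB's built-in pinv (the pseudoinverse studied in Section 1.10):

    % Example 1.41: MNLS solution from equations (1.35)-(1.36)
    A = [1 2; 2 4];   b = [2; 3];
    z = [1; -17/50];                  % one solution of A'*A*A'*z = A'*b
    disp(A'*A*A'*z - A'*b)            % residual of (1.36): zero
    x_mnls = A' * z                   % = (8/25)*[1; 2]
    disp(pinv(A) * b)                 % MATLAB's pseudoinverse returns the same vector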
Illustration 1.21: Let us revisit the tomography problem, but this time with a slightly different problem formulation. Once again, we will assume that the input X ∈ C^{3×3} is

X = [x_{1,1} x_{1,2} x_{1,3}; x_{2,1} x_{2,2} x_{2,3}; x_{3,1} x_{3,2} x_{3,3}].

This time, the measurement b ∈ C^6 consists of all the row and column sums of X, arranged as

b = [x_{1,1}+x_{1,2}+x_{1,3},  x_{2,1}+x_{2,2}+x_{2,3},  x_{3,1}+x_{3,2}+x_{3,3},  x_{1,1}+x_{2,1}+x_{3,1},  x_{1,2}+x_{2,2}+x_{3,2},  x_{1,3}+x_{2,3}+x_{3,3}]^T.

Again, we will assume a lexicographic reordering of X into a vector x ∈ C^9 as x = vec(X).
The mapping from x to b is clearly linear, and thus has a matrix representation A ∈ C^{6×9}. Now, the columns of A are clearly not linearly independent, as the matrix has more columns than rows. However, after a more careful examination, we see that A cannot have full row rank either. To see this, note that if we sum the first three and the last three elements of b, we must get the same number, because

b ∈ R(A)  ⟹  b_1 + b_2 + b_3 = b_4 + b_5 + b_6 = Σ_{i=1}^{3} Σ_{j=1}^{3} x_{i,j}.

Thus, there exist vectors b ∈ C^6 such that b ∉ R(A). Take, for example, the vector b = [1, 0, 0, -1, 0, 0]^T. There is no input X for which b corresponds to a set of measurements on X. So we can reason that the rows of A must also be linearly dependent, since otherwise we would have R(A) = C^6.

For this version of the tomography problem, then, we may or may not have a solution to Ax = b, depending on whether b ∈ R(A). Furthermore, if we do have a solution, it cannot be unique, as the null space of A is nontrivial. In particular, consider the following inputs, which form a basis for N(A):

[1 0 -1; 0 0 0; -1 0 1],   [1 -1 0; -1 1 0; 0 0 0],   [0 0 0; 0 1 -1; 0 -1 1],   [1 -1 0; 0 0 0; -1 1 0].
Thus, to find a solution to this form of the tomography problem, we must resort to an MNLS type of solution. A further exploration is covered in the exercises.
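A quick MATLAB sketch (a minimal check of the claims above, using the same vec(X) ordering) confirms the rank deficiency, exhibits a right-hand side outside R(A), and verifies one of the ghosts:

    % Row/column-sum tomography of Illustration 1.21
    R = kron(ones(1,3), eye(3));       % rows 1-3 of A: row sums of X
    C = kron(eye(3), ones(1,3));       % rows 4-6 of A: column sums of X
    A = [R; C];
    disp(rank(A))                      % = 5: neither full column nor full row rank
    b = [1 0 0 -1 0 0]';               % violates b1+b2+b3 = b4+b5+b6
    disp(rank([A b]))                  % = 6 > rank(A), so b is not in R(A)
    ghost = [1 0 -1; 0 0 0; -1 0 1];   % first basis matrix of N(A) above
    disp(A * ghost(:))                 % zero measurements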
1.10 The Moore-Penrose Pseudoinverse
Nothing in our treatment of MNLS solutions suggests that the MNLS solution is linear in the vector of measurements b. So far, we can only establish the existence of a (possibly nonlinear) map A_MNLS : C^m → C^n, where A_MNLS(b) = x̂_MNLS. Furthermore, we know from the various properties that A A_MNLS(b) = P_{R(A)} b, and A_MNLS(A y) = P_{R(A^H)} y. We will show that A_MNLS is in fact linear. To do so, we will first study a matrix A^+ known as the Moore-Penrose Pseudoinverse, which is defined implicitly in terms of a number of conditions. We will then show that when A^+ exists, it has the property A^+ b = x̂_MNLS, so that if A^+ exists, then A_MNLS is linear and has the matrix representation A^+. Studying the existence of A^+ is postponed until the next chapter, when we will develop the tools necessary to construct A^+ and thus prove its existence.
The Moore-Penrose pseudoinverse is traditionally defined exclusively via its algebraic properties. In particular, the Moore-Penrose pseudoinverse (or just pseudoinverse for short) B of a matrix A is any matrix that satisfies the Penrose Conditions, which are:

1. ABA = A,
2. BAB = B,
3. (AB)^H = AB,
4. (BA)^H = BA.
From these algebraic properties alone, we can establish that if the pseudoinverse exists, it must be unique.
Lemma 1.2 (Uniqueness of the Pseudoinverse). Let B and C be two matrices that satisfy the Penrose
conditions for some matrix A. Then B = C.
Proof. The proof follows by applying the Penrose conditions to equate B and C; the condition used at each step is indicated in parentheses:

B = BAB                            (2)
  = A^H B^H B                      (4)
  = (ACA)^H B^H B                  (1)
  = (A^H C^H)(A^H B^H) B
  = CABAB                          (4)
  = CAB,                           (1)

and, similarly,

C = CAC                            (2)
  = C C^H A^H                      (3)
  = C C^H (ABA)^H                  (1)
  = C (C^H A^H)(B^H A^H)
  = CACAB                          (3)
  = CAB.                           (1)

Hence B = CAB = C.
As a result of the uniqueness lemma, we can speak of the pseudoinverse of a matrix A. Furthermore, from the algebraic properties, and the properties of projectors, we can draw some additional conclusions about the pseudoinverse, specifically in terms of projections.

Theorem 1.25. Suppose B is the pseudoinverse of A. Then

1. AB is the projector for R(A),
2. BA is the projector for R(B), and
3. R(B) = R(A^H).
Proof. (1) From the Penrose conditions, it follows that (AB)^H = AB and (AB)(AB) = (ABA)B = AB, so that AB is Hermitian and idempotent. From Theorem 1.13, it follows that AB is a projector. Furthermore, it is clear that R(AB) ⊆ R(A). But A = ABA (by Penrose condition (1)), so that R(A) ⊆ R(AB). Hence AB = P_{R(A)}.

(2) Again, from the Penrose conditions, it follows that (BA)^H = BA and (BA)(BA) = (BAB)A = BA, so that BA is Hermitian and idempotent, and BA is a projector. Now, R(BA) ⊆ R(B). But B = BAB (by Penrose condition (2)), so R(B) ⊆ R(BA). Hence BA = P_{R(B)}.

(3) First, from the Penrose conditions,

A^H = (ABA)^H = (A^H B^H) A^H = BA A^H,
so that R(A^H) ⊆ R(B). Second, note that BA = (BA)^H = A^H B^H, so that R(BA) ⊆ R(A^H); since R(B) = R(BA) by part (2), this gives R(B) ⊆ R(A^H). Hence, R(B) = R(A^H).
The most important aspect of the pseudoinverse is that if it exists, then it is the unique map that
generates MNLS solutions to Ax = b for any b.
Theorem 1.26. For any A ∈ C^{m×n}, if A^+ exists, then A_MNLS = A^+, i.e.,

x̂_MNLS = A^+ b   ∀ b ∈ C^m.     (1.37)
Proof. From the MNLS theorem, we know that x̂_MNLS must satisfy two conditions. First, it must be a least squares solution, and thus satisfy the normal equations. Second, it must lie in R(A^H). We obtain that A^+ b ∈ R(A^H) by Theorem 1.25 (3): R(A^+) = R(A^H). To prove that A^+ b satisfies the normal equations, we simply use the Penrose conditions:

A^H A (A^+ b) = A^H (A A^+) b = A^H (A A^+)^H b = (A A^+ A)^H b = A^H b.
We will postpone the proof of existence of the pseudoinverse until the next chapter, when we will be able to construct the pseudoinverse (and thus, by Theorem 1.26, A_MNLS) for arbitrary matrices A. However, when the matrix A has special rank structure, the pseudoinverse reduces to the special cases of A^{-1}, A_LS, and A_MN that we have considered already.

Theorem 1.27. Let A ∈ C^{m×n}. If

1. A has full column rank, then A^+ = A_LS;
2. A has full row rank, then A^+ = A_MN;
3. A is square and nonsingular, then A^+ = A^{-1}.
The proof of this theorem is left to the exercises.
Example 1.42: Suppose A = [1 0; 0 0]. We claim that A is its own pseudoinverse, which is readily verified via the Penrose conditions.

Example 1.43: Suppose A = [1 2; 2 4]. Then A^+ = (1/25) A.
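The Penrose conditions are straightforward to check numerically. The MATLAB sketch below uses the built-in pinv to verify Example 1.43 and the four conditions, and checks one special case of Theorem 1.27; all printed norms should be zero up to rounding:

    % Numerical check of the Penrose conditions and of Theorem 1.27
    A = [1 2; 2 4];
    B = pinv(A);                        % MATLAB's pseudoinverse
    disp(norm(B - A/25))                % Example 1.43: A^+ = (1/25) A
    disp(norm(A*B*A - A))               % Penrose condition 1
    disp(norm(B*A*B - B))               % Penrose condition 2
    disp(norm((A*B)' - A*B))            % Penrose condition 3
    disp(norm((B*A)' - B*A))            % Penrose condition 4
    C = [1 1];                          % full row rank case of Theorem 1.27
    disp(norm(pinv(C) - C'/(C*C')))     % A^+ = A_MN = A^H (A A^H)^{-1}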
1.11 Problems
Review
1. Is the set {x ∈ C^1 : Re x = 0} a subspace? Why?
2. For what values of r is the set A_r = {x ∈ R^4 : x_1 + ⋯ + x_4 = r} a linear subspace of R^4? Why? Describe this subspace in words.
3. What is the definition of dimension of a linear space?
4. Let A^T = [r_1, r_2, . . . , r_m]. How is dim span{r_1, r_2, . . . , r_m} related to rank(A)? Why?
5. Consider the equation Ax = b, where A = [2 1; 6 3], b = [3, 9]^T. Write out and sketch the solution set S for this equation.
6. Given a length 4 complex sequence x = [x_0, . . . , x_3]^T, its convolution with a length 3 filter h = [h_0, h_1, h_2]^T = [1, 2, 3]^T is y = x * h. Define appropriate spaces X and Y for x and y, and write the convolution mapping H : X → Y as an appropriate matrix H.
(a) Is H linear? one-to-one? onto? invertible?
(b) What are R(H), N(H)? What is the physical significance of these subspaces?
(c) The deconvolution problem is to determine x given y and h.
    i. Suppose we measure y = [1, 3, 6, 5, 5, 3]^T. Can you determine x? Why? How is this related to R(H) and N(H)? What could have caused the problem?
    ii. You only know the measured values of [y_1, . . . , y_4]. A solution for x exists: (i) always? (ii) sometimes? (iii) never? Why?
    iii. As above, but you only know [y_1, y_2, y_3].
    iv. You know [y_2, y_3, y_4] = [4, 5, 6]. Write out the solution set for x. How is this related to the null space of a certain matrix?
7. Let A^T = [1 2 3; 4 5 6]. Does A have a left inverse? A right inverse? How about A^T? Why?
8. Suppose that b ∈ R(A) ⟹ ∃! x ∈ C^n s.t. Ax = b.
(a) How does it follow that A^L exists?
(b) Under what conditions is each of the following identities true? (i) A^L A = I_n; (ii) A A^L = I_m.
9. Give an example (different from the text) of a matrix A with 3 different left inverses A^L_1, A^L_2, A^L_3. For which b will you get A^L_1 b = A^L_2 b = A^L_3 b?
10. For which matrices A does a left inverse A^L exist, and for which a right inverse A^R?
11. What are the right and left inverses of A^L? Of A^R?
12. Give an example of a set S in R^2 for which the nearest point map is (i) nonlinear but unique; (ii) neither linear nor unique. Is it possible for the nearest point map to be nonunique but linear?
13. Why is the nearest point map always idempotent?
14. For each of the following instances find S^⊥ and indicate whether S and S^⊥ are complements of one another.
(a) S = {[1, 1]^T} in R^2.
(b) S = the y axis in R^3.
(c) S = the unit circle in R^2.
(d) S = the x and y axes in R^3.
15. Property 4 of ortho-complements implies (S + S^⊥)^⊥ = (C^n)^⊥. Why can't we conclude S + S^⊥ = C^n (i.e., Part II of Property 6)?
16. Determine the nearest point mapping from C^4 to S = span{[1, j, 0, 1]^T, [j, 0, 1, 0]^T, [0, j, j, 1]^T}. Use it to find the point in S nearest to x = [1, 1, 1, 1]^T.
17. Determine the nearest point mapping from R^4 to the set
(a) S = {x ∈ R^4 : x_0 = x_2 = 0}
(b) S = {x ∈ R^4 : x_i ≥ 0, i = 0, . . . , 3}
Which of these mappings are orthogonal projections? Why? Apply these mappings to the vector x = [1, 2, 3, 4]^T.
18. Prove the projector update formula, Theorem 1.17.
19. Use the property R(A^H)^⊥ = N(A) to prove [N(A^H)]^⊥ = R(A). Justify carefully every step.
20. How is dim N(A^H) related to (i) rank(A); (ii) dim N(A)?
21. Is there more than one left inverse that makes A^L b = 0 ∀ b ∈ [R(A)]^⊥? Why?
22. Is it true that b - A x̂_LS ∈ N(A^H)? Why?
23. Find the solution to [1, 2] x = 4 that is closest to the nominal x_0 = (1, 2)^T, using the method of Subsection 1.8.3. Illustrate your result by a sketch in R^2.
Problems
24. Prove that the span of any set of vectors is a subspace.
25. Prove that the definition of dimension makes sense, i.e., that every basis for a subspace must have the
same number of elements in it.
26. Prove the Cauchy-Schwarz Inequality.
27. Prove that (1.2.1) satisfies the four properties of a norm.
28. Another norm that is useful in applications of least squares theory is the weighted Euclidean norm, defined by

‖x‖_W^2 = x^H W x = Σ_{i,j=1}^{n} x_i^* w_{i,j} x_j,

where W is a Hermitian and positive definite matrix (i.e., x^H W x ≥ 0 with equality if and only if x = 0). Prove that the weighted Euclidean norm satisfies the four properties of a norm.
29. The infinity or max norm for vectors of C^n is defined by

‖x‖_∞ = max_{1 ≤ k ≤ n} |x_k|.

Prove that the infinity norm satisfies Property 1.1.
30. The ℓ_1 norm for vectors of C^n is defined by

‖x‖_1 = Σ_{k=1}^{n} |x_k|.

Prove that the ℓ_1 norm satisfies Property 1.1.
31. Prove that every linear map L : C^n → C^m can be represented by a matrix A such that L(x) = Ax ∀ x ∈ C^n.
32. Find conditions on the unit pulse response h such that the matrix corresponding to a circular convolution has full rank.
33. Diagonalization of a Circulant Matrix. Use arguments similar to those in Illustration 1.11 to show that any circulant matrix H can be factored as H = W^H Λ W, where W represents the (appropriately scaled) DFT matrix, and Λ is an appropriate diagonal matrix.
34. Use the results of Problem 33 to show that an alternative factorization for a circulant matrix is H = W D W^H, where D is another diagonal matrix. (Hint: use the fact that if C is a circulant matrix, so is its transpose C^T.)
35. What is the range space of the matrix representation of circular convolution? The null space? Express in terms of the DFT of the filter.
36. Prove, in Illustration 1.5, that V_1 and V_2 have only the vector 0 in common.
37. Reconsider Illustration 1.5. This time, we partition the input signals into

X_1 = {x ∈ C^N : x_n = 0, n = 1, 3, 5, . . . , N - 1}

and

X_2 = {x ∈ C^N : x_n = 0, n = 2, 4, 6, . . . , N},

i.e., the space of the even half of the signal (X_1) and the space of the odd half of the signal (X_2). Show that the two sets of output signals V_1 and V_2, obtained by convolving signals in X_1 and X_2 respectively with h, are subspaces, have only the zero signal in common, and that their direct sum is V, the subspace of all possible outputs. Which, if any, of these four subspaces are mutually orthogonal, and which are orthogonal complements?
38. Prove that the Frobenius norm, defined in Definition 1.1, satisfies the properties of a norm, listed in Property 1.1.
39. Prove that the spectral norm, defined in Definition 1.2, satisfies the properties of a norm.
40. Prove that H, the convolution matrix defined in (1.8), has a trivial null space if h ≠ 0.
41. * Postulate some sufficient conditions for existence and uniqueness of solutions to a truncated convolution problem. Try to prove your postulates.
42. Prove Property 1.15.
43. Prove or disprove (by counterexample) the following statements:
(a) Setting: the m×n matrix A is full rank, and the n×n matrix B is invertible.
Statement: AB is full rank. Consider both m ≤ n and n < m.
(b) A is full rank and AB is full rank ⟹ B is full rank.
44. Prove or disprove the following statements about A^L, the left inverse of A.
(a) A A^L x = x ∀ x ∈ R(A).
(b) A^L is unique ⟺ A^L is also a right inverse.
(c) A has a left inverse ⟺ A has full column rank.
(d) If B has the same size as A, then A A^L B = A ⟹ A = B.
45. Prove the converse of the Pythagorean Theorem, in the real case.
46. Let S ⊆ T be two sets. Is it true that T^⊥ ⊆ S^⊥? Prove or disprove.
47. Let S be a subset of C^n. Prove that (S^⊥)^⊥ = span S.
48. Prove or disprove (by counterexample) the following statements. Setting: V is a vector space; T is a subspace of V; S is a subset of V.
(a) span(S) = T ⟹ S^⊥ = T^⊥.
(b) S^⊥ = T^⊥ ⟹ span(S) = T.
49. Suppose the columns of B_1 form a basis for subspace S and the same is true for B_2. Show that B_2 = B_1 M, where M is a nonsingular square matrix.
50. Use the factorization of a circulant matrix H in Problem 33 to rederive its spectral norm, and compare
to its Frobenius norm.
51. Prove the Fundamental Theorem of Linear Algebra, Theorem 1.18, by first proving N(A)^⊥ = R(A^H).
52. Show directly, without appealing to the uniqueness of the projector, that P = B(B^H B)^{-1} B^H does not depend on the choice of basis B for the subspace spanned by it.
53. Suppose we re-define orthogonality on C^m as y ⊥ z if (Tz)^H (Ty) = 0, where T is a rank-m matrix (possibly rectangular). Orthogonality on C^n remains unchanged. Let A ∈ C^{m×n}. Express the orthogonal complements of R(A) and N(A) in terms of the null and range spaces of appropriate matrices, respectively. Then, repeat the above in the case m = n (when orthogonality is redefined on both C^m and C^n).
54. Let S := {x : Ax = b} for some b ≠ 0 and some A with a nontrivial null space. What is S^⊥? The answer depends on whether or not b ∈ R(A).
55. Consider F defined as {x ∈ C^n : Ax = b, ‖x‖ ≤ E}. If non-Euclidean norms are used, will the minimizer of the worst-case error be the minimum-norm element of F?
56. Tomography for a 3×3 image. You are given exact measurements of the row and column sums of the pixels (6 measurements).
(a) Can you recover the image? How? Why?
(b) Ghosts are images that produce exactly zero measurements. Write a matlab program to generate random ghosts. Show three of these ghosts. How are the ghosts related to your ability or inability to reconstruct the image?
(c) You suspect that your assistant was too lazy to collect actual measurements, and just fabricated the numbers instead. Construct a certain projector and show how to use it to verify your suspicion.
(d) Repeat (a)-(c) when you are also given the sums along the two main diagonals and the two subdiagonals in each direction (a total of 6 + 2 + 4 = 12 measurements).
(e) Would your answers to (a)-(c) change if you knew the image is non-negative? How?
57. Simple Noise Cleaning. A certain Polynesian string musical instrument can be modeled by an FIR filter with impulse response h_n = 0.99^n cos(πn/3) + 0.98^n cos(πn/2), n = 0, . . . , 30, which is excited by a short input determined by the player plucking the string. Your anthropologist friend obtained a recording of music played with this instrument, but his trip through the jungle wreaked havoc with the tape. The recording is now corrupted by additive noise of unknown nature.
Fortunately, you discover that the music consists of segments at most 33 samples long, separated by periods of silence.
(a) Devise a method to process the recording to reduce the noise. Hint: think of the subspace in which the noiseless signal lives. How does your method work? Assume the recording has already been correctly segmented into signal segments 33 samples long each.
(b) Write a matlab denoising program to implement this method, and demonstrate its operation with three kinds of noise: (i) white Gaussian; (ii) white U[-1, 1]; (iii) constant tone interference. Use plots to show the signal before and after. Pick the noise level so that there is a visible difference before and after, and the after signal looks similar to the noise-free signal.
(c) Consider a signal y = x + w consisting of a desired signal x ∈ C^n corrupted by additive noise w ∈ C^n. The signal-to-noise ratio (SNR) of y is defined as ‖x‖_2^2 / ‖w‖_2^2 and represents the ratio of signal to noise energy. What is the worst-case improvement in SNR, and what is the best-case improvement that can be expected from the denoising algorithm over all possible noise vectors?
(d) When the signal and noise are modeled as random vectors, the SNR is defined as the ratio of expected powers, E{‖x‖_2^2} / E{‖w‖_2^2}. (When the signal x is deterministic (no statistical model), the expected value in the numerator can be omitted.) Derive an expression to compute the SNR improvement for white additive noise. Consider both deterministic and random signal x.
(e) Compute the actual SNR improvement observed in your experiment in (b). Does it agree with your theoretical predictions above? Why or why not?
(f) Speculation: conjecture whether it would be possible to solve the problem even if h_n were unknown. You can use qualitative hand-waving arguments.
58. If A^# is defined as the matrix satisfying (i) A A^# = P_{R(A)} and (ii) A^# A = P_{R(A^H)}, then
(a) Is A^# unique?
(b) Are the Penrose conditions sufficient to specify A^#?
(c) Is A^# the pseudo-inverse of A?
59. (a) Suppose the columns of A ∈ C^{m×n} are linearly independent, and let S ∈ C^{l×m}. The weighted least-squares solution, denoted x_WLS, is a minimizer of ‖S(Ax - b)‖. Prove that regardless of the number of rows l ≥ 1 of S, x_WLS is unique if and only if A^H S^H S A is non-singular. Obtain an explicit expression for x_WLS in this case, without employing pseudoinverses.
(b) Suppose the rows of A ∈ C^{m×n} are linearly independent, m < n, and let T ∈ C^{n×p}. The weighted minimum norm solution to Ax = b, denoted x_WMN, is the one minimizing ‖Tx‖. Show by example that x_WMN can be unique even if T is rectangular. Find a necessary and sufficient condition for the uniqueness of x_WMN. Obtain an explicit expression for x_WMN in this case, without employing pseudoinverses.
(c) Derive the equation satisfied by the unique weighted minimum norm least-squares solution x_WMNLS in the general case when none of the matrices A, S, and T has full rank, but A^H S^H S A and T^H T are nonsingular. Also, consider the case when A^H S^H S A is singular.
60. The x ∈ C^n that minimizes ‖S(Ax - b)‖ and ‖Tx‖ subject to the linear equality constraints Ex = c is a constrained, weighted minimum norm least-squares solution to Ax = b, denoted x_CWMNLS. State the necessary and sufficient conditions for uniqueness of the solution, and obtain an expression for x_CWMNLS: (i) when A^H S^H S A and T^H T are nonsingular and (ii) when A^H S^H S A and T^H T are singular. Hint: consider a decomposition of the solution into two parts: one that satisfies the constraint, the other that is unaffected by it.
61. ** The goal in this problem is to derive necessary and sufficient conditions on matrices A and B such that (AB)^+ = B^+ A^+, where A^+ denotes the pseudo-inverse of A.
(a) Show that the proposed identity does not hold for all matrices of compatible sizes, by appropriate counterexample(s).
(b) Find nontrivial examples for which the identity does hold (other than scalars or diagonal matrices).
(c) State the appropriate necessary and sufficient conditions. Express your conditions in terms of the fundamental subspaces of A and B. You may find it useful to check your conditions on the examples in (a) and (b) above.
Projects and Applications
62. Give signal processing applications for two of the following problems in C^n or R^n: LS, MN, solution closest to nominal, MNLS. In other words, show how the application in question can be set up as one of these problems, and motivate it. Identify the matrix A, the vectors b and x, give their dimensions and relation to physical parameters in the problem. Comment on the physical significance of the solution and its properties. Your application should be different from those described in BBC or in the HW, or in the book by Moon. Feel free to draw on the literature (e.g., IEEE Trans. Signal Processing, IEEE Signal Processing Magazine, Proceedings ICASSP), other textbooks, or the web. Try to come up with examples different from those of your friends!