In the previous section, we introduced the idea of a metric space (X, d): A set X along with a
metric d that assigns distances between any two elements x, y ∈ X. It now remains to discuss the
important idea of convergence in such metric spaces. This will be easy, since you have already seen
the idea of convergence of sequences of real numbers.
Recall that a sequence {x_n} of real numbers is said to converge to a limit x ∈ R if the following
holds true: for any ε > 0, there exists an N_ε > 0 such that
    |x_n − x| < ε for all n > N_ε. (1)
The ε-subscripted N, i.e., N_ε, indicates that the value of N will generally be dependent upon the value
of ε. (Most probably, the smaller we make ε, thus “squeezing” the tail of the sequence closer to x, the
larger the required value of N.)
But look again at the LHS of the inequality in (1) – not surprisingly, it is the distance between
xn and the limit x, i.e., |xn −x| = d(xn , x), where the metric d on the set of real numbers was discussed
earlier. Therefore, we can rewrite the above requirement for convergence of a sequence {x_n} as follows:
for any ε > 0, there exists an N_ε > 0 such that d(x_n, x) < ε for all n > N_ε.
The beautiful thing is that the above result carries over to metric spaces in general – regardless
of what comprises these spaces, e.g., numbers, functions, sets, measures, etc. We say that a sequence
{xn } of elements of a metric space X, i.e., xn ∈ X converges to a limit x ∈ X if
d(xn , x) → 0 as n → ∞, (5)
acknowledging that this is a shorthand notation for the more proper ε–N_ε definition.
The above definition of convergence is fine if you happen to know the limit x of your sequence
{xn }. You may then be able “measure” the distances d(xn , x) and show that this sequence converges
to zero. But what if you don’t know the limit x? Can you still characterize a sequence {xn } as
being convergent, or possibly convergent, if you don’t actually know the limit x? The mathematician
Cauchy struggled with this problem in his study of the real numbers and came up with the following
definition (Cauchy 1821):
A sequence of real numbers {x_n} is said to be a Cauchy sequence if, for any ε > 0, there
exists an N_ε > 0 such that
    |x_n − x_m| < ε for all n, m > N_ε.
In other words, the elements xn of the “tail” of the sequence are getting closer and closer to each
other. Cauchy then proved the following remarkable result:
(Cauchy 1821) A sequence of real numbers {xn } is convergent (to a limit x ∈ R) if and
only if it is a Cauchy sequence.
The “if and only if”, i.e., equivalence, of Cauchy sequences and convergence to a limit, is true because
of the completeness of the real number system. We’ll return to this idea in a moment.
The idea of a Cauchy sequence is easily brought into a general metric space setting:
Let (X, d) be a metric space. Then a sequence of elements {x_n} is said to be a Cauchy
sequence in (X, d) if, for any ε > 0, there exists an N_ε > 0 such that
    d(x_n, x_m) < ε for all n, m > N_ε.
We have simply replaced the distance function |x−y| on the real number line with the metric d(x, y) in
our metric space. This definition is now applicable to real numbers, ordered n-tuples in Rn , functions,
etc.
In general, however, Cauchy’s “if and only if” result for sequences of real numbers may not hold.
We do have the following result, however:
Theorem: Let {xn } be a convergent sequence in a metric space (X, d). Then {xn } is a Cauchy
sequence.
The proof of this theorem is so straightforward that it is worthy of mention here. (It is a direct
analogue of the proof for real numbers.)
Proof: Since {x_n} is convergent, there exists an x ∈ X (the limit of the sequence) such that the
following holds: For any ε > 0, there exists an N_ε > 0 such that d(x_n, x) < ε/2 for all n > N_ε. (Note
that we're using ε/2 on the RHS, for reasons that will be clear below – it's quite OK to do this.) Then,
for any n, m > N_ε, it follows, from the triangle inequality, that
    d(x_n, x_m) ≤ d(x_n, x) + d(x, x_m) < ε/2 + ε/2 = ε.
Just to repeat the final conclusion: d(x_n, x_m) < ε for all n, m > N_ε. Therefore, by definition, the
sequence {x_n} is Cauchy, proving the theorem.
And what about the converse? Are all Cauchy sequences convergent? The answer is “no” – it
depends upon the metric space X in which we are working. Here is an example.
Example: Let X = Q, the set of rational numbers, with metric d(x, y) = |x − y|. Consider the
following sequence,
    x_1 = 3,  x_2 = 31/10,  x_3 = 314/100,  x_4 = 3141/1000,  · · · . (9)
The element xn is obtained by truncating the decimal expansion of π to n − 1 decimal digits. It can
be shown (left as an exercise – it’s not that difficult) that this sequence is Cauchy. And it darn well
looks like it is convergent, i.e., converging to π. But this limit point π is NOT in the metric space Q.
Therefore the sequence {xn } is NOT convergent.
Of course, if you extended the metric space to be the set of real numbers R, then the sequence
{xn } is convergent – it converges to an element of R, the non-rational number π. As you may already
know, the real numbers are said to be the completion of the rational numbers.
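To make the example concrete, here is a minimal Python sketch (our own illustration, not part of the original development; the function name is arbitrary) that builds the truncations as exact rationals and prints a few tail differences |x_n − x_m|:

    from fractions import Fraction
    import math

    def x(n):
        """n-th truncation of pi: n - 1 decimal digits, kept as an exact rational."""
        return Fraction(math.floor(math.pi * 10**(n - 1)), 10**(n - 1))

    # The tail differences shrink (|x_n - x_m| <= 10^{-(min(n, m) - 1)}),
    # even though the limit, pi, is not a rational number.
    for n, m in [(3, 5), (6, 10), (10, 14)]:
        print(n, m, float(abs(x(n) - x(m))))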
Definition: A metric space (X, d) is said to be complete if every Cauchy sequence {x_n} in X converges
(to an element x ∈ X).
4. The interval X = (a, b] is not complete. (It is possible to construct a sequence in X which converges
to a, and a is not an element of X.) Likewise, the interval (a, b) is not complete.
In a short time, we shall arrive at some interesting conclusions regarding metric spaces of functions.
First, however, we examine the consequence of convergence with respect to a couple of important
metrics.
Convergence in d∞ metric
Here we examine the consequence of convergence of sequences of continuous functions with respect to
the d∞ metric. Recall that for two functions f, g ∈ C[a, b], the distance between them according to
this metric is given by
    d_∞(f, g) = max_{a≤x≤b} |f(x) − g(x)|. (10)
The distance between f and g is the maximum difference of their values on [a, b]. An important
consequence of this is the following:
    d_∞(f, g) < ε implies that |f(x) − g(x)| < ε ∀ x ∈ [a, b]. (11)
This implies that for all x ∈ [a, b], the difference between f(x) and g(x) is less than ε. This is a very
“tight” closeness between the graphs of f (x) and g(x).
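As a quick illustration (our own sketch; the sample functions and grid size are arbitrary choices), the d_∞ distance in (10) can be estimated numerically by sampling on a fine grid:

    import numpy as np

    def d_inf(f, g, a, b, num=10_001):
        """Grid-based estimate of the d_infinity distance between f and g on [a, b]."""
        x = np.linspace(a, b, num)
        return np.max(np.abs(f(x) - g(x)))

    # Example: f(x) = sin(x), g(x) = x on [0, 1]. The maximum difference occurs
    # at x = 1, so the result is close to 1 - sin(1) = 0.1585...
    print(d_inf(np.sin, lambda x: x, 0.0, 1.0))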
Now suppose that we have a sequence of functions fn ∈ C[a, b] that converges to a limit function
f ∈ C[a, b] with respect to the d∞ metric. This means that
d∞ (fn , f ) → 0 as n → ∞. (12)
Using the ε definition: For any ε > 0, there exists an N_ε > 0 such that
    d_∞(f_n, f) < ε for all n > N_ε, (13)
which implies that
    |f_n(x) − f(x)| < ε for all x ∈ [a, b] and all n > N_ε. (14)
First of all, let us consider a fixed value of x in [a, b]. As we push ε toward 0, the values f_n(x)
for n > Nǫ are forced to be closer and closer toward f (x). In other words, the real numbers fn (x)
converge to f (x), i.e.,
    lim_{n→∞} f_n(x) = f(x), or simply f_n(x) → f(x). (15)
This is true for any x ∈ [a, b]. This type of convergence is known as pointwise convergence.
But there is another important aspect of this result: the convergence is uniform: For any
x ∈ [a, b], the values of f_n(x) with n > N_ε lie within ε of f(x).
Another way to see this is to note that Eq. (14) implies the following (from the property of
absolute values):
    f(x) − ε < f_n(x) < f(x) + ε for all n > N_ε. (16)
In other words, the graphs of the f_n(x) functions for n > N_ε lie inside an ε-ribbon of the graph of
f(x). Letting ε go to zero reduces the width of this ribbon. The fact that for a given ε > 0, the same
N_ε will work for all values of x in [a, b] implies uniform convergence. This is made possible by the fact
that the interval [a, b] is closed and bounded.
Convergence in d2 metric
Recall that for two functions f, g ∈ C[a, b], the distance between them according to the d2 metric is
given by
    d_2(f, g) = [ ∫_a^b (f(x) − g(x))² dx ]^{1/2}. (17)
Note that
    d_2(f, g) < ε implies that [ ∫_a^b (f(x) − g(x))² dx ]^{1/2} < ε. (18)
This metric involves an integration over the functions f and g and not a comparison of their values
over the interval. Unlike the case for the d∞ metric, Eq. (18) does not imply that the maximum
difference between f(x) and g(x) values must be less than ε. It is possible that the values of f(x) and
g(x) differ significantly, but only over a sufficiently small interval.
To illustrate, consider the following example. Let f(x) = 0 and let g_n(x), n = 2, 3, · · ·, be a sequence
of functions defined over [0, 1] as follows: For n ≥ 2, define
    g_n(x) = { 0,               0 ≤ x ≤ 1/2 − 1/n,
               1 + n(x − 1/2),  1/2 − 1/n ≤ x ≤ 1/2,
               1 − n(x − 1/2),  1/2 ≤ x ≤ 1/2 + 1/n,
               0,               1/2 + 1/n ≤ x ≤ 1.       (19)
The graph of this seemingly complicated formula is a “hat” centered at x = 1/2, as sketched below.
[Figure: the “hat” function y = g_n(x), of height 1, centered at x = 1/2 and supported on [1/2 − 1/n, 1/2 + 1/n], plotted along with y = f(x) = 0.]
We now compute the d2 distance between f and gn . After a little algebra, we find that
    d_2(f, g_n) = [ ∫_0^1 (f(x) − g_n(x))² dx ]^{1/2}
                = [ ∫_{1/2 − 1/n}^{1/2 + 1/n} (g_n(x))² dx ]^{1/2}
                = √(2/(3n)). (20)
We can make n as large as we please, thereby making the d2 distance between f and gn as small as
desired. In other words, d2 (f, gn ) → 0 as n → ∞.
But note that for all n ≥ 2, g_n(1/2) − f(1/2) = 1, which implies that d_∞(f, g_n) = 1 for all n ≥ 2.
In summary,
1. In the d_2 metric, the functions g_n converge to the function f(x) = 0, since d_2(f, g_n) → 0 as n → ∞.
2. In the d_∞ metric, the functions g_n remain a constant distance of 1 away from the function f(x) = 0.
Therefore, they cannot converge to f(x) = 0 in the d_∞ metric.
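Both behaviours can be checked numerically. A small Python sketch (our own verification of this example; the grid size is arbitrary):

    import numpy as np

    def g(n, x):
        # The hat function (19): height 1 at x = 1/2, support [1/2 - 1/n, 1/2 + 1/n].
        return np.maximum(0.0, 1.0 - n * np.abs(x - 0.5))

    x = np.linspace(0.0, 1.0, 200_001)
    dx = x[1] - x[0]
    for n in [2, 10, 100]:
        d2 = np.sqrt(np.sum(g(n, x)**2) * dx)   # f = 0, so d_2(f, g_n) = ||g_n||_2
        dinf = np.max(np.abs(g(n, x)))          # d_inf(f, g_n)
        print(n, d2, np.sqrt(2.0 / (3.0 * n)), dinf)   # d2 matches sqrt(2/(3n)); dinf stays 1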
Let us make one more observation regarding the g_n functions. If we focus our attention on a fixed
value of x ∈ [0, 1], the behaviour of the real numbers g_n(x) as n → ∞ may be summarized
as follows:
1. If x = 1/2, then g_n(1/2) = 1 for all n ≥ 2, implying that
    lim_{n→∞} g_n(1/2) = 1. (21)
2. If x ≠ 1/2, then for some N > 0, g_n(x) = 0 for all n > N. (The hat just gets thinner and
thinner.) This implies that
    lim_{n→∞} g_n(x) = 0, x ≠ 1/2. (22)
To summarize, from a pointwise perspective, the sequence of functions g_n(x), in the limit n → ∞,
approaches the function
    g(x) = { 0, x ≠ 1/2,
             1, x = 1/2.       (23)
For this reason, g(x) is known as the pointwise limit function of the sequence of functions {gn (x)}.
Clearly, g(x) is not a continuous function on [0,1] since it is discontinuous at x = 1/2. Note that
g(x) differs from the zero function f (x) = 0 only at x = 1/2. From the perspective of the d2 metric,
these functions are identical, i.e.,
    d_2(f, g) = [ ∫_0^1 (f(x) − g(x))² dx ]^{1/2} = 0. (24)
The integral does not register the difference of the functions at the single point x = 1/2. More on this
later.
The question that you may now ask is, “So, which metric is better?” The answer is, “It depends
on what you want.” If you are working on a problem that demands functions to be close to each other
over an interval, e.g., uniform convergence, then you use the d∞ metric. On the other hand, if the
difference between functions is more aptly described in terms of integrals, then the d2 metric might
be more useful.
In signal and image processing, the space of continuous functions is too restrictive. This should
be quite evident in the case of images which, by their very nature, are not continuous functions –
every edge of an image, e.g., boundary of an object, represents a curve of discontinuity of the function
u(x, y) representing the image.
The fact that you can have convergence of functions in the d2 metric while still having function
values differing by significant amounts will help us to understand the phenomenon of “ringing”, i.e.,
the Gibbs phenomenon, that occurs when approximating functions with Fourier series.
Finally, we comment that the above phenomenon observed for the d2 metric will also be observed
for the other dp metrics, p ≥ 1, that are formulated in terms of integrals.
The material in this section was not covered in the lecture. It is included here for the
purpose of “completeness” for those interested. It will not be covered on any examina-
tion.
We now return to the idea of completeness, as applied to the space of continuous functions C[a, b]
on a closed interval. We shall simply state the following result. Its proof is probably discussed in an
advanced course on analysis (and certainly covered in AMATH 731, “Applied Functional Analysis”).
Theorem: The metric space (C[a, b], d_∞) is complete.
In other words, if the metric d_∞ is employed, a Cauchy sequence of continuous functions {f_n} ⊂
C[a, b] converges to a limit function f which is continuous, i.e., f ∈ C[a, b]. (Remember that the
sequence {f_n} is Cauchy as measured by the metric d_∞.)
On the other hand, the metric space (C[a, b], d_2) is not complete. In other words, a sequence of
continuous functions {f_n} ⊂ C[a, b] which is Cauchy with respect to the d_2 metric may converge to
a function f that is NOT continuous, i.e., NOT an element of C[a, b].
An example was presented earlier – the “hat function” centered at x = 1/2. Here is another example.
Example: Let [a, b] = [0, 1] and consider the sequence of functions f_n(x) = x^n, n = 1, 2, · · ·, sketched
schematically below.
Clearly, f_n ∈ C[0, 1]. Let's examine the behaviour of these functions as n → ∞.
1. For 0 ≤ x < 1, clearly, x^n → 0 as n → ∞.
2. For x = 1, clearly, x^n = 1.
[Figure: graphs of y = x, y = x², y = x³, . . . , y = xⁿ on [0, 1], flattening toward 0 on [0, 1) while all passing through the point (1, 1).]
We'll simply state the fact that the sequence {f_n} is Cauchy in the d_2 metric (and leave it to the reader
as an optional exercise). Instead, we’ll focus on the limit f of this sequence which, in this case, can
be found rather easily.
Pointwise, it appears that the functions fn (x) are converging to the function f (x) given by
    f(x) = { 0, 0 ≤ x < 1,
             1, x = 1.       (25)
Clearly, f (x) is not a continuous function – it is discontinuous at x = 1. Let us now examine the
distances d2 (fn , f ):
    d_2(f_n, f) = [ ∫_0^1 (f_n(x) − f(x))² dx ]^{1/2}
                = [ ∫_0^1 (x^n − 0)² dx ]^{1/2}
                = 1/√(2n + 1). (26)
(The single point x = 1, at which f(x) = 1, can be ignored in the integration, since its contribution to
the integral is zero: f_n(1) − f(1) = 0. Even if f_n(1) ≠ f(1), the contribution of a single point to the
integral would be zero.) Clearly, d_2(f_n, f) → 0 as n → ∞, i.e., the sequence of functions {f_n} is converging to f in the d_2
metric.
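A brief Python check (our own sketch; the grid size is arbitrary) of Eq. (26):

    import numpy as np

    x = np.linspace(0.0, 1.0, 100_001)
    dx = x[1] - x[0]
    for n in [1, 5, 50]:
        # f = 0 except at the single point x = 1, which contributes nothing.
        d2 = np.sqrt(np.sum((x**n)**2) * dx)
        print(n, d2, 1.0 / np.sqrt(2 * n + 1))   # the two columns agree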
In summary, we have constructed a sequence of continuous functions fn that converges, in d2
metric, to a discontinuous function, i.e., a function that does not belong to C[0, 1]. Therefore, the
metric space (C[a, b], d2 ) is not complete.
The moral of the story
The moral of the story is that if we wish to talk about convergence in the d2 metric – which we shall
have to do in our applications to signal and image processing – then we’ll have to extend the class of
functions considered. The space C[a, b] of continuous functions is insufficient. We must be prepared
to allow discontinuous functions.
The “completion” of the space C[a, b] with respect to the d_2 metric is the space denoted as L²[a, b].
It is the metric space of functions for which the following distance is finite,
    d_2(f, g) = [ ∫_a^b (f(x) − g(x))² dx ]^{1/2} < ∞. (27)
End of Appendix
Lecture 4
Metric spaces – sets with distance functions – are very nice, but they are not sufficient for the appli-
cations we wish to study. We would like to be able to add and subtract signals and images, even take
linear combinations of them. For this reason, we would like to work with spaces that have a vector
space structure. In the literature, vector spaces are also called “linear spaces.”
Definition: Let X be a real (or complex) vector space. A real-valued function ‖x‖ defined on X is a
norm on X if the following properties are satisfied:
1. ‖x‖ ≥ 0, and ‖x‖ = 0 if and only if x = 0,
2. ‖αx‖ = |α| ‖x‖ for all scalars α,
3. ‖x + y‖ ≤ ‖x‖ + ‖y‖ for all x, y ∈ X (the triangle inequality for norms).
A norm then defines a metric on X in the natural way,
    d(x, y) = ‖x − y‖. (28)
It can be verified that this metric satisfies all of the required properties of a metric. In particular,
the triangle inequality for norms given above guarantees that the above metric satisfies the triangle
inequality for metrics: If we replace x with x − z and y with z − y in Property 3 above, we obtain
    ‖x − y‖ ≤ ‖x − z‖ + ‖y − z‖, (29)
or
    d(x, y) ≤ d(x, z) + d(z, y). (30)
If we now consider a normed linear space X as a metric space with metric d(x, y) = ‖x − y‖, then
we may ask whether or not it is complete, i.e., whether all Cauchy sequences in X with respect to this
metric converge to an element of X. If so, then we say that the normed linear space X is complete.
Examples:
1. The space X = Rⁿ. Of course, you are very familiar with the Euclidean magnitude of a vector,
i.e., an n-tuple, x = (x_1, x_2, · · ·, x_n), used to characterize its length as follows,
    ‖x‖_2 = [x_1² + x_2² + · · · + x_n²]^{1/2}. (31)
This is a particular example of a family of norms that can be assigned to elements of Rⁿ, the
p-norms,
    ‖x‖_p = [|x_1|^p + |x_2|^p + · · · + |x_n|^p]^{1/p}, p ≥ 1. (32)
By virtue of the completeness property of the real numbers, the normed linear spaces (Rⁿ, ‖·‖_p)
are complete for all p ≥ 1, as well as in the case p = ∞, where ‖x‖_∞ = max_{1≤i≤n} |x_i|. They are all Banach spaces.
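These norms are available directly in NumPy; a one-line sketch (ours, with an arbitrary sample vector):

    import numpy as np

    x = np.array([3.0, -4.0, 1.0])
    for p in [1, 2, 4, np.inf]:
        print(p, np.linalg.norm(x, ord=p))   # the p-norms (32), including p = infinity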
2. X = C[a, b], the space of continuous functions on the interval [a, b]. Here, we use the “infinity
norm”: For an f ∈ C[a, b],
    ‖f‖_∞ = max_{a≤x≤b} |f(x)|. (34)
3. X = ℓ^p, the space of sequences x = (x_1, x_2, · · ·) with finite p-norm. We define the p-norm as
follows: For an x = (x_1, x_2, · · ·),
    ‖x‖_p = [ Σ_{i=1}^{∞} |x_i|^p ]^{1/p}. (37)
4. The space of real-valued, p-integrable functions on [a, b], denoted as L^p[a, b] and defined as
follows:
    L^p[a, b] = { f : [a, b] → R | ∫_a^b |f(x)|^p dx < ∞ }, p ≥ 1. (39)
The corresponding norm is
    ‖f‖_p = [ ∫_a^b |f(x)|^p dx ]^{1/p}. (40)
These norms define the d_p metrics introduced in a previous lecture: For f, g ∈ L^p[a, b],
    d_p(f, g) = ‖f − g‖_p = [ ∫_a^b |f(x) − g(x)|^p dx ]^{1/p}. (41)
The most commonly used normed linear space in this family is the case p = 2, namely, the space
L2 [a, b] of square integrable functions with norm,
    ‖f‖_2 = [ ∫_a^b |f(x)|² dx ]^{1/2}. (42)
(We say that a function f ∈ L²[a, b] is “square integrable” since the integration of its squared
magnitude, |f(x)|², over the interval [a, b] is finite.) The metric associated with this norm is the
usual “L² metric”,
    d_2(f, g) = ‖f − g‖_2 = [ ∫_a^b |f(x) − g(x)|² dx ]^{1/2}. (43)
As you may recall – and as will certainly be discussed very shortly – this is the function space
that is relevant to Fourier series.
Note that for any p ≥ 1, the space Lp [a, b] includes the space of continuous functions C[a, b].
This follows from the fact that continuous functions on a closed interval are bounded: For each
f ∈ C[a, b], there exists an M ≥ 0 such that |f (x)| ≤ M for all x ∈ [a, b]. It then follows that
    ∫_a^b |f(x)|^p dx ≤ ∫_a^b M^p dx = M^p (b − a) < ∞. (44)
Therefore f ∈ Lp .
But the Lp spaces include a great deal more: discontinuous functions, even unbounded functions.
(The latter won’t be needed in our applications.) And the Lp spaces are not identical. For
example, consider the case [a, b] = [0, 1]. The function f(x) = x^{−1/2} is an element of L¹[0, 1]
since
    ∫_0^1 |f(x)| dx = ∫_0^1 x^{−1/2} dx = 2 < ∞. (45)
On the other hand, f(x)² = x^{−1} is not integrable on [0, 1], so f is not an element of L²[0, 1].
Therefore, there are functions which are in L¹[0, 1] but not in L²[0, 1], suggesting that L²[a, b] ⊂
L¹[a, b]. This result can be generalized to L^p[a, b] ⊂ L^q[a, b] when p > q. But this is beyond the
scope of the course.
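The two integrals can be checked symbolically; a short sketch (ours) using SymPy:

    import sympy as sp

    x = sp.symbols('x', positive=True)
    f = x**sp.Rational(-1, 2)
    print(sp.integrate(f, (x, 0, 1)))      # 2   -> f is in L^1[0, 1]
    print(sp.integrate(f**2, (x, 0, 1)))   # oo  -> f is not in L^2[0, 1]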
Note: The L^p spaces may easily be extended to include complex-valued functions f : [a, b] → C.
The definition in (39) holds, with “R” simply replaced by “C”.
It will be necessary to consider functions defined on the entire real line in this course. From your
previous encounters with improper integrals, you will not be surprised to know that functions
in L^p(R) must satisfy a rather stringent condition – roughly, that |f(x)| decay to zero as x → ±∞.
It's actually a little more complicated than this – the rate of decay will depend on p.
We now come to an extremely important concept of this course. You are, of course, very familiar with
finite-dimensional normed linear spaces, such as X = Rn . In such cases, which we shall denote as
“dim(X) = n”, suppose that we have a set of n linearly independent elements u_i ∈ X, 1 ≤ i ≤ n, i.e.,
    c_1u_1 + c_2u_2 + · · · + c_nu_n = 0 implies c_1 = c_2 = · · · = c_n = 0. (50)
As you know, the {u_i} form a basis for X: Given an element v ∈ X, there exists a set of coefficients
ci ∈ R, 1 ≤ i ≤ n, such that
v = c1 u1 + c2 u2 + · · · + cn un . (51)
This course is concerned with the approximation of functions which, as stated in a previous lecture,
are elements of infinite-dimensional spaces, i.e., dim(X) = ∞. In such cases, we have to be a little
more careful. Suppose that we have a set of linearly independent elements ui ∈ X, 1 ≤ i ≤ n. Define
Sn to be the span of the ui , i.e.,
    S_n = span{u_1, u_2, · · ·, u_n}.
Given an element v ∈ X, we seek the best approximation to v from the subspace S_n: the element
v_n ∈ S_n for which the approximation error is smallest, i.e.,
    Δ_n = ‖v_n − v‖ = min_{u ∈ S_n} ‖u − v‖. (54)
[Figure: the element v ∈ X, its best approximation v_n in the subspace S_n, and the error Δ_n = ‖v_n − v‖.]
Once again, v_n is the element in S_n that lies closest to v, as measured by the metric that is
defined by the norm ‖·‖ in X.
The above statement regarding v_n, which involves the preceding equation (54), may be written
mathematically and more compactly by means of the so-called “arg” notation. We write,
    v_n = arg min_{u ∈ S_n} ‖u − v‖. (55)
This statement may be read as follows: “v_n is the ‘argument’ or ‘element’ in S_n at which the mini-
mization over S_n is achieved”. In other words, “v_n is the element in S_n that minimizes the distance
to v, i.e., ‖u − v‖.”
(Note: At this point we should make a parenthetical remark that, technically speaking, after the phrase “given
by the element vn ∈ Sn ” should be added “(provided it exists)”. As well, the “min” in the equation should
be replaced by “inf”. There are pathological situations where a minimum value for the error is approached
in some limit by a sequence of approximations, with the actual best approximation not existing. But in most
applications, including those considered here, this will not be the case, i.e., Eq. (54) will hold. And in the case
of Hilbert spaces, i.e., complete inner product spaces – to be discussed next – such a “minimizer” always exists,
and it is unique.)
Since the approximation vn lies in Sn , it will have an expansion in the basis set ui , i.e.,
vn = c1 u1 + c2 u2 + · · · + cn un . (56)
We then write
    v ≈ v_n = c_1u_1 + c_2u_2 + · · · + c_nu_n, (57)
often adding the phrase “in the norm ‖·‖ or associated metric on X” (e.g., “in the L² norm,” “in the d_2
metric” or even “in the L² metric”). The error of this approximation, or simply “approximation error,” is
    Δ_n = ‖v − v_n‖. (58)
As we increase n, i.e., the number of basis elements u_i employed, the approximation cannot get
worse, i.e., the error Δ_n cannot increase. (If the additional basis elements u_k are not used, i.e., their
associated coefficients are zero, then the approximation error remains the same.) Hopefully, the
Δ_n will decrease as n increases. And, indeed, we would hope that Δ_n → 0 as n → ∞. Of course, if,
for some n, v ∈ S_n, then Δ_n = 0, i.e., v = v_n, admitting the expansion in (57).
Note: In the discussion above, there has been no mention of orthogonality. This is because normed
linear spaces, in general, are not necessarily inner product spaces that are equipped with an inner
product, from which comes the property of orthogonality. The special case of best approximation in
inner product spaces will be discussed in the next section of this course.
1. Example No. 1: Let X = C[a, b], the normed linear space of continuous functions on [a, b].
The functions u_k(x) = x^{k−1}, k = 1, 2, · · ·, i.e., {1, x, x², · · ·}, form a linearly independent set,
hence a basis in C[a, b]. In this case, given a function f ∈ C[a, b], the best approximation in S_n
would be the polynomial approximation to f of degree n − 1 having the form,
    f(x) ≅ v_n = c_0 + c_1x + · · · + c_{n−1}x^{n−1}. (59)
(a) The special case n = 1. Here, we use only one function from the basis set, i.e., u1 (x) = 1.
The best approximation v1 ∈ S1 is the best constant approximation to f (x) in the d∞
metric, i.e., f(x) ≃ c, where c is obtained by minimizing the distance function,
    Δ_1(c) = d_∞(f, c) = max_{a≤x≤b} |f(x) − c|. (60)
If a formula for f(x) is available, we may be able to remove the absolute value by considering
the various intervals over which the difference f(x) − c is positive and negative. In Problem
Set No. 1, you are asked to find the best constant approximation to f(x) = x² on [0, 1]
using the above distance function.
2. Example No. 2: Let X = L1 [a, b], the normed linear space of functions on [a, b] satisfying the
condition that
    ‖f‖_1 = ∫_a^b |f(x)| dx < ∞. (63)
Recall that the space of continuous functions C[a, b] is a subset of this space. The functions
u_k(x) = x^{k−1}, k = 1, 2, · · ·, also form a basis in this space. Given a function f ∈ L¹[0, 1], its best
approximation in Sn is given by the solution to the following optimization problem,
    min_{c_0,···,c_{n−1}} ‖f − v_n‖_1 = min_{c_0,···,c_{n−1}} ∫_a^b |f(x) − c_0 − c_1x − · · · − c_{n−1}x^{n−1}| dx. (65)
This problem actually turns out to be a bit more tractable – it can be solved by the method of
linear programming, since the optimization problem is linear in the unknowns c_i.
(a) The special case n = 1. Here, we use only one function from the basis set, i.e., u1 (x) = 1.
The best approximation v1 ∈ S1 is the best constant approximation to f (x) in the L1
metric, i.e., f (x) ≃ c, where we minimize the distance function,
    Δ_1(c) = d_1(f, c) = ‖f − c‖_1 = ∫_a^b |f(x) − c| dx, (66)
where we have replaced c_0 with c. Because the integrand contains an absolute value, we
cannot use differentiation methods to find the c-value which minimizes Δ_1(c). If the formula
for f(x) exists and is not too complicated, we may be able to evaluate the integral by
evaluating it on separate intervals over which f(x) − c is positive and negative. In Problem
Set No. 1, you are asked to find the best constant approximation to f(x) = x² on [0,1]
using the above distance function.
3. Example No. 3: Let X = L2 [a, b], the normed linear space of functions on [a, b] satisfying the
condition
    ‖f‖_2² = ∫_a^b (f(x))² dx < ∞. (67)
Once again, the space of continuous functions C[a, b] is a subset of this space. The functions
u_k(x) = x^{k−1}, k = 1, 2, · · ·, also form a basis in this space. Given a function f ∈ L²[0, 1], its best
approximation in S_n is given by the solution to the following optimization problem,
    min_{c_0,···,c_{n−1}} ‖f − v_n‖_2² = min_{c_0,···,c_{n−1}} ∫_a^b [f(x) − c_0 − c_1x − · · · − c_{n−1}x^{n−1}]² dx. (69)
Note that we have chosen to minimize the squared L² distance since it avoids square roots.
(Minimizing the square of a nonnegative quantity is equivalent to minimizing the quantity
itself.) This problem can, in principle, be solved analytically by finding the stationary points
of the squared distance function,
    Δ_2²(c_0, c_1, · · ·, c_{n−1}) = ∫_a^b [f(x) − c_0 − c_1x − · · · − c_{n−1}x^{n−1}]² dx. (70)
This is one of the reasons that the metric associated with the space L2 is used in most signal
and image processing applications. We’ll illustrate with two special cases.
(a) The special case n = 1. Here again, we use only one function from the basis set, i.e.,
u1 (x) = 1. The best approximation v1 ∈ S1 is once again the best constant approximation
to f (x), but this time with respect to the L2 metric, i.e., f (x) ≃ c, where we minimize the
squared L2 distance function,
    h(c) = Δ_1²(c) = [d_2(f, c)]² = ‖f − c‖_2² = ∫_a^b [f(x) − c]² dx. (71)
• Method No. 1: Expand the integrand and integrate to produce an expression for
h(c) in terms of c:
    h(c) = ∫_a^b (f(x))² dx − 2c ∫_a^b f(x) dx + c² (b − a). (72)
Find the critical points:
    h′(c) = −2 ∫_a^b f(x) dx + 2c(b − a) = 0, (73)
which yields the single critical point
    c = (1/(b − a)) ∫_a^b f(x) dx. (74)
Note that h′′ (c) = 2(b − a) > 0, so the critical point is a global minimum.
In summary, the best constant approximation to a function f (x) using the L2 metric
on [a, b] is given by
    f(x) ≃ (1/(b − a)) ∫_a^b f(x) dx. (75)
Note that this constant is the mean or average value of f(x) over [a, b], often denoted
as f̄_{[a,b]}. This is a very well known result in signal and image processing.
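A quick numerical sanity check (our own sketch; f(x) = x² on [0, 1] is the Problem Set example, whose mean value is 1/3):

    import numpy as np

    x = np.linspace(0.0, 1.0, 100_001)
    dx = x[1] - x[0]
    f = x**2

    def h(c):
        """Squared L^2 distance (71) between f and the constant c."""
        return np.sum((f - c)**2) * dx

    cs = np.linspace(0.0, 1.0, 1001)
    c_best = cs[np.argmin([h(c) for c in cs])]
    print(c_best, np.sum(f) * dx)   # both are close to the mean value 1/3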
(b) The special case n = 2. We now use two functions from the basis set, i.e., u_1(x) = 1 and
u_2(x) = x, to produce the best approximation, v_2(x), to f(x) having the form,
    f(x) ≃ c_0 + c_1x. (77)
We minimize the squared L² distance,
    h(c_0, c_1) = ∫_a^b [f(x) − c_0 − c_1x]² dx, (78)
which is now a function of two variables, c_0 and c_1. Critical points (c_0, c_1) must satisfy the
following stationarity conditions,
    ∂h/∂c_0 = −2 ∫_a^b [f(x) − c_0 − c_1x] dx = 0,
    ∂h/∂c_1 = −2 ∫_a^b [f(x) − c_0 − c_1x] x dx = 0. (79)
These conditions yield the following set of inhomogeneous linear equations in c_0 and c_1,
    ( ∫_a^b dx ) c_0 + ( ∫_a^b x dx ) c_1 = ∫_a^b f(x) dx,
    ( ∫_a^b x dx ) c_0 + ( ∫_a^b x² dx ) c_1 = ∫_a^b x f(x) dx. (80)
The integrals on the LHS are easily evaluated in terms of the endpoints a and b. Assuming
that the integrals on the RHS involving f (x) can be evaluated, the system is easily solved
using Cramer’s Rule.
This system of equations may be viewed as a continuous version of the classical method of
least squares best approximation of data points (xi , f (xi )) by the straight line y = c0 + c1 x.
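As a sketch (ours) of this continuous least-squares problem, the system (80) for f(x) = x² on [0, 1] can be assembled and solved numerically; working out the integrals by hand gives the exact answer c_0 = −1/6, c_1 = 1:

    import numpy as np

    a, b = 0.0, 1.0
    x = np.linspace(a, b, 100_001)
    dx = x[1] - x[0]
    f = x**2

    # LHS integrals of (80), evaluated in terms of the endpoints a and b.
    A = np.array([[b - a,             (b**2 - a**2) / 2],
                  [(b**2 - a**2) / 2, (b**3 - a**3) / 3]])
    rhs = np.array([np.sum(f) * dx, np.sum(x * f) * dx])
    print(np.linalg.solve(A, rhs))   # approximately [-1/6, 1]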
4. Example No. 4: We return to the approximations that are yielded by partial sums of the
Fourier series of a function f (x) defined on the interval [−π, π], i.e.,
    f(x) ≃ S_N(x) = a_0 + Σ_{n=1}^{N} [a_n cos nx + b_n sin nx], (81)
where the coefficients a_n and b_n are given in Eq. (2) in Lecture 2. The relevant normed linear
space is L²[−π, π], which is also an inner product space. The approximation S_N(x) is an element
of the (2N + 1)-dimensional space that is spanned by the basis functions,
    {1, cos x, sin x, cos 2x, sin 2x, · · ·, cos Nx, sin Nx}. (82)
Here, we simply state that SN (x) is the best approximation to f (x) in this space and that the
coefficients an and bn are obtained from the inner product defined on this space. We shall justify
these statements in the next section, where we discuss approximations in an inner product, or
“Hilbert”, space.
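A short sketch (ours) of the partial sum S_N computed by numerical quadrature; the coefficient formulas used below (a_0 the mean of f, and a_n, b_n with a factor 1/π) are the standard ones and are assumed to match Eq. (2) of Lecture 2:

    import numpy as np

    def fourier_partial_sum(f, N, num=100_001):
        x = np.linspace(-np.pi, np.pi, num)
        dx = x[1] - x[0]
        fx = f(x)
        a0 = np.sum(fx) * dx / (2 * np.pi)          # mean value of f
        S = np.full_like(x, a0)
        for n in range(1, N + 1):
            an = np.sum(fx * np.cos(n * x)) * dx / np.pi
            bn = np.sum(fx * np.sin(n * x)) * dx / np.pi
            S += an * np.cos(n * x) + bn * np.sin(n * x)
        return x, S

    x, S = fourier_partial_sum(np.abs, N=10)
    print(np.max(np.abs(S - np.abs(x))))   # modest: the series for |x| converges uniformly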
Once again, we mention that in the discussion preceding the above examples, nothing was said
about the “orthogonality” of the basis set un . That is because nothing could be said! We have been
working with normed linear spaces, which are not necessarily inner product spaces. Only in
inner product spaces, the subject of the next section, can we have the property of orthogonality.
Inner product spaces
Of course, you are familiar with the idea of inner product spaces – at least finite-dimensional ones.
Let X be an abstract vector space with an inner product, denoted as “⟨ , ⟩”, a mapping from X × X
to R (or perhaps C). The inner product satisfies the following conditions,
1. ⟨x + y, z⟩ = ⟨x, z⟩ + ⟨y, z⟩ for all x, y, z ∈ X,
2. ⟨x, y⟩ = \overline{⟨y, x⟩} for all x, y ∈ X,
3. ⟨αx, y⟩ = α⟨x, y⟩ for all scalars α and all x, y ∈ X,
4. ⟨x, x⟩ ≥ 0, with ⟨x, x⟩ = 0 if and only if x = 0,
where the bar indicates complex conjugation. Note that, from Property 2, this implies that
    ⟨x, αy⟩ = \overline{α} ⟨x, y⟩.
Note: For anyone who has taken courses in Mathematical Physics, the above properties may be
slightly different than what you have seen, as regards the complex conjugations. In Physics, the usual
convention is to complex conjugate the first entry, i.e., ⟨αx, y⟩ = \overline{α}⟨x, y⟩.
The inner product generates a norm:
    ⟨x, x⟩ = ‖x‖², or ‖x‖ = √⟨x, x⟩. (83)
(Note: An inner product always generates a norm. But not the other way around, i.e., a norm is not
always expressible in terms of an inner product. You may have seen this in earlier courses – the norm
has to satisfy the so-called “parallelogram law.”)
And where there is a norm, there is a metric: The norm defined by the inner product ⟨ , ⟩ will define
the following metric,
    d(x, y) = ‖x − y‖ = √⟨x − y, x − y⟩, ∀ x, y ∈ X. (84)
A complete inner product space is called a Hilbert space, in honour of the celebrated
mathematician David Hilbert (1862-1943).
Finally, the inner product satisfies the following relation, called the “Cauchy-Schwarz inequality,”
    |⟨x, y⟩| ≤ ‖x‖ ‖y‖. (85)
You probably saw this relation in your studies of finite-dimensional inner product spaces, e.g., Rn . It
holds in abstract spaces as well.
Lecture 5
Examples:
1. X = Rⁿ. For x = (x_1, x_2, · · ·, x_n) and y = (y_1, y_2, · · ·, y_n),
    ⟨x, y⟩ = x_1y_1 + x_2y_2 + · · · + x_ny_n. (86)
The norm induced by the inner product is the familiar Euclidean norm, i.e.,
    ‖x‖ = ‖x‖_2 = [ Σ_{i=1}^{n} x_i² ]^{1/2}. (87)
You'll note that the inner product generates only the p = 2 norm or metric. Rⁿ is also
complete, i.e., it is a Hilbert space.
2. X = Cⁿ – a minor modification of the real vector space case. Here, for x = (x_1, x_2, · · ·, x_n) and
y = (y_1, y_2, · · ·, y_n),
    ⟨x, y⟩ = x_1\overline{y_1} + x_2\overline{y_2} + · · · + x_n\overline{y_n}. (89)
3. X = ℓ², the space of square-summable sequences, with inner product ⟨x, y⟩ = Σ_{i=1}^{∞} x_i\overline{y_i}.
Note that ℓ² is the only ℓ^p space for which an inner product exists. It is a Hilbert space.
4. X = C[a, b], the space of continuous functions on [a, b], is NOT an inner product space.
5. X = L²[a, b], the space of square-integrable functions on [a, b] introduced earlier, with inner
product ⟨f, g⟩ = ∫_a^b f(x)\overline{g(x)} dx.
As in the case of sequence spaces, L² is the only L^p space for which an inner product exists. It
is also a Hilbert space.
6. The space of (real- or complex-valued) square-integrable functions L²(R) on the real line, also
introduced earlier. Here,
    ‖f‖² = ⟨f, f⟩ = ∫_{−∞}^{∞} |f(x)|² dx < ∞. (95)
Once again, L² is the only L^p space for which an inner product exists. It is also a Hilbert space,
and will be the primary space in which we work for the remainder of this course.
An important property of inner product spaces is “orthogonality.” Let X be an inner product space.
If ⟨x, y⟩ = 0 for two elements x, y ∈ X, then x and y are said to be orthogonal (to each other).
Mathematically, this relation is written as “x ⊥ y,” just as is done for vectors in Rⁿ.
We’re now going to need a few ideas and definitions for the discussion that is coming up.
• Recall that a subspace Y of a vector space X is a nonempty subset Y ⊂ X such that for all
y_1, y_2 ∈ Y and all scalars c_1, c_2, the element c_1y_1 + c_2y_2 ∈ Y, i.e., Y is itself a vector space. This
implies that Y must contain the zero element, i.e., 0 ∈ Y.
• Moreover, the subspace Y is convex: For every x, y ∈ Y, the “segment” joining x and y, i.e., the
set of all convex combinations,
    z = αx + (1 − α)y, 0 ≤ α ≤ 1, (97)
is contained in Y.
• A vector space X is said to be the direct sum of two subspaces, Y and Z, written as follows,
    X = Y ⊕ Z, (98)
if every x ∈ X has a unique representation of the form
    x = y + z, y ∈ Y, z ∈ Z. (99)
In this case, Z is said to be an algebraic complement of Y (and vice versa).
Note: The concept of an algebraic complement does not have to invoke the use of orthogonality.
(Unfortunately, it was used in the lecture.)
• In Rⁿ and inner product spaces X in general, it is convenient to consider spaces that are orthog-
onal to each other. Let S ⊂ X be a subset of X and define its orthogonal complement,
    S^⊥ = {x ∈ X | ⟨x, s⟩ = 0 for all s ∈ S}. (100)
Aside: Some remarks regarding the idea of a “complement”
This section was not covered in the lecture. It is included here only for supplementary
purposes. You will not be examined on this material.
The concept of an algebraic complement does not have to invoke the use of orthogonality. With
thanks to a student who once raised the question of algebraic vs. orthogonal complementarity in class,
let us consider the following example.
Let X denote the (11-dimensional) vector space of polynomials of degree at most 10, i.e.,
    X = { Σ_{k=0}^{10} c_k x^k, c_k ∈ R, 0 ≤ k ≤ 10 }. (101)
Equivalently,
    X = span{1, x, x², · · ·, x¹⁰}. (102)
Now define
    Y = span{1, x, x², x³, x⁴, x⁵}, Z = span{x⁶, x⁷, x⁸, x⁹, x¹⁰}. (103)
First of all, Y and Z are subspaces of X. Furthermore, X is a direct sum of the subspaces Y and Z.
However, the spaces Y and Z are not orthogonal complements of each other.
First of all, for the notion of orthogonal complementarity, we would have to define an interval of
support, e.g., [0, 1], over which the inner products of the functions are defined. (And then we would
have to make sure that all inner products involving these functions are defined.) Using the linearly
independent functions x^k, 0 ≤ k ≤ 10, one can then construct (via Gram-Schmidt orthogonalization)
an orthogonal set of polynomial basis functions, {φ_k(x)}, 0 ≤ k ≤ 10, for X. It is possible that the
first few members of this orthogonal set will involve only the functions x^k, 0 ≤ k ≤ 5, which come from
the set Y. But the remaining members of the orthogonal set will contain higher powers of x, i.e., x^k,
6 ≤ k ≤ 10, as well as lower powers of x, i.e., x^k, 0 ≤ k ≤ 5. In other words, the remaining members
of the orthogonal set will not be elements of the set Z – they will have nonzero components in Y.
See also Example 3 below.
Examples:
1. Let X = R³ with S = span{(1, 0, 0)}, a subspace of X. Then S^⊥ = span{(0, 1, 0), (0, 0, 1)}.
2. As before X = R3 but with S = {(c, 0, 0) | c ∈ [0, 1]}. Now, S is no longer a subspace but simply
a subset of X. Nevertheless S ⊥ is the same set as in 1. above, i.e., S ⊥ = span{(0, 1, 0), (0, 0, 1)}.
We have to include all elements of X that are orthogonal to the elements of S. That being
said, we shall normally be working more along the lines of Example 1, i.e., subspaces and their
orthogonal complements.
3. Further to the discussion of algebraic vs. orthogonal complementarity, consider the same spaces
X, Y and Z as defined in Eqs. (102) and (103), but defined over the interval [−1, 1]. The
orthogonal polynomials φ_k(x) over [−1, 1] that may be constructed from the functions x^k, 0 ≤
k ≤ 10, are the so-called Legendre polynomials, P_n(x), listed below:
    n     P_n(x)
    0     1
    1     x
    2     (1/2)(3x² − 1)
    3     (1/2)(5x³ − 3x)
    4     (1/8)(35x⁴ − 30x² + 3)
    5     (1/8)(63x⁵ − 70x³ + 15x)
    6     (1/16)(231x⁶ − 315x⁴ + 105x² − 5)
    7     (1/16)(429x⁷ − 693x⁵ + 315x³ − 35x)
    8     (1/128)(6435x⁸ − 12012x⁶ + 6930x⁴ − 1260x² + 35)
    9     (1/128)(12155x⁹ − 25740x⁷ + 18018x⁵ − 4620x³ + 315x)
    10    (1/256)(46189x¹⁰ − 109395x⁸ + 90090x⁶ − 30030x⁴ + 3465x² − 63)       (104)
These polynomials satisfy the orthogonality relation
    ∫_{−1}^{1} P_m(x) P_n(x) dx = (2/(2n + 1)) δ_{mn}, (105)
where δ_{mn} is the Kronecker delta,
    δ_{mn} = { 1, m = n,
               0, m ≠ n.       (106)
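The relation (105) can be verified directly; a small sketch (ours) using NumPy's Legendre module, in which Legendre.basis(n) represents P_n:

    import numpy as np
    from numpy.polynomial.legendre import Legendre

    for m in range(4):
        for n in range(4):
            product = Legendre.basis(m) * Legendre.basis(n)
            antideriv = product.integ()
            value = antideriv(1.0) - antideriv(-1.0)   # integral over [-1, 1]
            expected = 2.0 / (2 * n + 1) if m == n else 0.0
            assert abs(value - expected) < 1e-12
    print("orthogonality relation (105) verified for m, n = 0, ..., 3")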
From the above table, we see that the Legendre polynomials P_n(x), 0 ≤ n ≤ 5, belong to the space
Y, whereas the polynomials P_n, 6 ≤ n ≤ 10, do not belong solely to Z. This suggests that the
spaces Y and Z are not orthogonal complements. However, the following spaces are orthogonal
complements:
    Y′ = span{P_0, P_1, · · ·, P_5}, Z′ = span{P_6, P_7, · · ·, P_10}. (107)
There is, however, another decomposition going on in this space, which is made possible by
the fact that the interval [−1, 1] is symmetric with respect to the point x = 0. Note that the
polynomials P_n(x) are either even or odd. This suggests that we should consider the following
subsets of X,
    Y″ = {u ∈ X | u(−x) = u(x)}, Z″ = {u ∈ X | u(−x) = −u(x)}. (108)
It is a quite simple exercise to show that any function f(x) defined on an interval [−a, a] may
be written as a sum of an even function and an odd function. This implies that any element
u ∈ X may be expressed in the form
    u = v + w, v ∈ Y″, w ∈ Z″. (109)
Therefore the spaces Y″ and Z″ are algebraic complements. In terms of the inner product of
functions over [−1, 1], however, Y″ and Z″ are also orthogonal complements, since
    ∫_{−1}^{1} f(x) g(x) dx = 0 (110)
whenever f is even and g is odd.
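A brief sketch (ours, with an arbitrary sample polynomial) of this even/odd decomposition and its orthogonality over [−1, 1]:

    import numpy as np

    f = lambda x: 1 + x + x**2 + x**3       # a sample element of X

    even = lambda x: 0.5 * (f(x) + f(-x))   # v, the even part
    odd  = lambda x: 0.5 * (f(x) - f(-x))   # w, the odd part

    x = np.linspace(-1.0, 1.0, 100_001)
    dx = x[1] - x[0]
    print(np.max(np.abs(even(x) + odd(x) - f(x))))   # ~0: f = v + w, as in (109)
    print(np.sum(even(x) * odd(x)) * dx)             # ~0: <v, w> = 0, as in (110)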
The discussion that follows will be centered around Hilbert spaces, i.e., complete inner product
spaces. This is because we shall need the closure properties of these spaces, i.e., that they contain
the limit points of all Cauchy sequences. The following result is very important.
Theorem: Let H be a Hilbert space and Y ⊂ H any closed subspace of H. (Note: This means that
Y contains its limit points. In the case of finite-dimensional spaces, e.g., Rn , a subspace is closed. But
a subspace of an infinite-dimensional vector space need not be closed.) Now let Z = Y ⊥ . Then for
any x ∈ H, there is a unique decomposition of the form
x = y + z, y ∈ Y, z ∈ Z = Y ⊥ . (111)
This is an extremely important result from analysis, and equally important in applications. We’ll
examine its implications a little later, in terms of “best approximations” in a Hilbert space. In the
figure below, we provide a sketch that will hopefully illustrate the situation.
[Figure: sketch of the closed subspace Y (a region through 0), an element x ∈ H, and its orthogonal projection y ∈ Y.]
The space Y is contained between the two lines that emanate from 0. Note that Y lies on both sides
of 0: If p ∈ Y, then −p, which lies on the “other side” of 0, is also an element of Y. The point y ∈ Y is
the orthogonal projection of x onto the set Y and may be viewed as the foot of a line which extends
from x to the set Y in such a way that it is “perpendicular” to Y. The examples
that we consider should clarify this idea.
From Eq. (111), we can define a mapping PY : H → Y , the projection of H onto Y so that
PY : x → y. (112)
Note that
    P_Y : H → Y, P_Y : Y → Y, P_Y : Y^⊥ → {0}. (113)
Moreover, P_Y is idempotent,
    P_Y² = P_Y. (114)
This follows from the fact that P_Y(x) = y and P_Y(y) = y, implying that P_Y(P_Y(x)) = P_Y(x).
The decomposition (111) also obeys a Pythagorean relation,
    ‖x‖² = ‖y‖² + ‖z‖². (115)
This follows from the fact that the norm is defined by means of the inner product:
    ‖x‖² = ⟨x, x⟩ = ⟨y + z, y + z⟩ = ⟨y, y⟩ + ⟨y, z⟩ + ⟨z, y⟩ + ⟨z, z⟩ = ‖y‖² + ‖z‖², (116)
where the final equality results from the fact that ⟨y, z⟩ = ⟨z, y⟩ = 0, since y ∈ Y and z ∈ Z = Y^⊥.
Example: Let H = R³, Y = span{(1, 0, 0)} and Y^⊥ = span{(0, 1, 0), (0, 0, 1)}. Then x = (1, 2, 3)
admits the unique expansion,
    (1, 2, 3) = (1, 0, 0) + (0, 2, 3), (117)
so that P_Y(x) = (1, 0, 0).
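A minimal sketch (ours) of this example, checking the decomposition (117), the orthogonality of the two parts, and the idempotence property (114):

    import numpy as np

    e = np.array([1.0, 0.0, 0.0])        # orthonormal basis of Y

    def P_Y(x):
        """Orthogonal projection of x onto Y = span{e}."""
        return np.dot(x, e) * e

    x = np.array([1.0, 2.0, 3.0])
    y = P_Y(x)
    z = x - y
    print(y, z)                          # (1, 0, 0) and (0, 2, 3)
    print(np.dot(y, z))                  # 0: y is orthogonal to z
    print(np.allclose(P_Y(y), y))        # True: P_Y(P_Y(x)) = P_Y(x)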
Let H denote a Hilbert space. A set {u_1, u_2, · · ·, u_n} ⊂ H is said to form an orthogonal set in H if
    ⟨u_i, u_j⟩ = 0, i ≠ j. (118)
If, in addition,
    ⟨u_i, u_i⟩ = ‖u_i‖² = 1, 1 ≤ i ≤ n, (119)
then the {ui } are said to form an orthonormal set in H.
You will not be surprised by the following result, since you have most probably seen it in earlier
courses in linear algebra.
Theorem: An orthogonal set {u_1, u_2, · · ·} not containing the element 0 is linearly independent.
Proof: Suppose that
    c_1u_1 + c_2u_2 + · · · + c_nu_n = 0. (120)
For each k = 1, 2, · · ·, n, form the inner product of both sides of the above equation with u_k, i.e.,
    ⟨c_1u_1 + c_2u_2 + · · · + c_nu_n, u_k⟩ = ⟨0, u_k⟩ = 0. (121)
By the orthogonality of the u_i, the LHS of the above equation reduces to c_k ‖u_k‖², implying that
    c_k ‖u_k‖² = 0, k = 1, 2, · · ·, n. (122)
By assumption, however, u_k ≠ 0, implying that ‖u_k‖² ≠ 0. This implies that all scalars c_k are zero,
which means that the set {u_1, u_2, · · ·, u_n} is linearly independent.
Note: As you have also seen in courses in linear algebra, given a linearly independent set {v1 , v2 , · · · , vn },
we can always construct an orthonormal set {e1 , e2 , · · · , en } via the Gram-Schmidt orthogonalization
procedure. Moreover,
span{v1 , v2 , · · · , vn } = span{e1 , e2 , · · · , en }. (123)
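A minimal sketch (ours) of the Gram-Schmidt procedure in Rⁿ; the same loop works in any inner product space once np.dot is replaced by the appropriate inner product:

    import numpy as np

    def gram_schmidt(vectors):
        """Orthonormalize a linearly independent list of vectors."""
        basis = []
        for v in vectors:
            w = v - sum(np.dot(v, e) * e for e in basis)  # subtract projections onto e_1, ..., e_k
            basis.append(w / np.linalg.norm(w))           # normalize (w != 0 by independence)
        return basis

    vs = [np.array([1.0, 1.0, 0.0]), np.array([1.0, 0.0, 1.0]), np.array([0.0, 1.0, 1.0])]
    es = gram_schmidt(vs)
    print(np.round([[np.dot(a, b) for b in es] for a in es], 10))  # the identity matrix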
Best approximation in Hilbert spaces
Recall the idea of the best approximation in normed linear spaces, discussed a couple of lectures ago.
Let X be an infinite-dimensional normed linear space. Furthermore, let ui ∈ X, 1 ≤ i ≤ n, be a set
of n linearly independent elements of X and define the n-dimensional subspace,
Sn = span{u1 , u2 , · · · , un }. (124)
Then let x be an arbitrary element of X. We wish to find the best approximation to x in the subspace
S_n. It will be given by the element y_n ∈ S_n that lies closest to x, i.e.,
    y_n = arg min_{y ∈ S_n} ‖x − y‖. (125)
(The variables used above may be different from those used in the earlier lecture.)
We are going to use the same idea of best approximation, but in a Hilbert space setting, where
we have the additional property that an inner product exists in our space. This, of course, opens the
door to the idea of orthogonality, which will play an important role.
Theorem: Let {e_1, e_2, · · ·, e_n} be an orthonormal set in a Hilbert space H. Define Y = span{e_i}_{i=1}^{n};
Y is a subspace of H. Then for any x ∈ H, the best approximation of x in Y is given by the unique
element
    y = P_Y(x) = Σ_{k=1}^{n} c_k e_k (the projection of x onto Y), (126)
where
    c_k = ⟨x, e_k⟩, k = 1, 2, · · ·, n. (127)
The c_k are called the Fourier coefficients of x w.r.t. the set {e_k}.
Furthermore,
    Σ_{k=1}^{n} |c_k|² ≤ ‖x‖². (128)
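A concrete sketch (ours) of the theorem in R³: compute the Fourier coefficients (127), form the projection (126), and check Bessel's inequality (128):

    import numpy as np

    e1 = np.array([1.0, 0.0, 0.0])
    e2 = np.array([0.0, 1.0, 0.0])       # {e1, e2} is an orthonormal set; Y = span{e1, e2}
    x = np.array([1.0, 2.0, 3.0])

    c = [np.dot(x, e) for e in (e1, e2)]          # c_k = <x, e_k>
    y = c[0] * e1 + c[1] * e2                     # best approximation to x in Y
    print(y)                                      # (1, 2, 0)
    print(sum(ck**2 for ck in c) <= np.dot(x, x)) # True: Bessel's inequality (128)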
Proof: The best approximation to x in Y is the point y ∈ Y that minimizes the distance ‖x − v‖,
v ∈ Y, i.e.,
    y = arg min_{v ∈ Y} ‖x − v‖. (129)
Writing v = Σ_{k=1}^{n} c_k e_k, we consider the squared distance as a function of the coefficients
(assumed real for the moment),
    g(c_1, c_2, · · ·, c_n) = ‖x − Σ_{k=1}^{n} c_k e_k‖² (132)
    = ⟨x − Σ_{k=1}^{n} c_k e_k, x − Σ_{l=1}^{n} c_l e_l⟩
    = ⟨x, x⟩ − ⟨x, Σ_{l=1}^{n} c_l e_l⟩ − ⟨Σ_{k=1}^{n} c_k e_k, x⟩ + ⟨Σ_{k=1}^{n} c_k e_k, Σ_{l=1}^{n} c_l e_l⟩. (133)
The first term on the RHS is simply ⟨x, x⟩ = ‖x‖². The second term, using once again the linearity
properties of the inner product, may be expressed as follows,
    ⟨x, Σ_{l=1}^{n} c_l e_l⟩ = Σ_{l=1}^{n} ⟨x, c_l e_l⟩ = Σ_{l=1}^{n} c_l ⟨x, e_l⟩. (134)
Similarly, the third term is
    ⟨Σ_{k=1}^{n} c_k e_k, x⟩ = Σ_{k=1}^{n} c_k ⟨e_k, x⟩ = Σ_{k=1}^{n} c_k ⟨x, e_k⟩, (135)
and the fourth term is
    ⟨Σ_{k=1}^{n} c_k e_k, Σ_{l=1}^{n} c_l e_l⟩ = Σ_{k=1}^{n} Σ_{l=1}^{n} c_k c_l ⟨e_k, e_l⟩ = Σ_{k=1}^{n} c_k², (136)
where the final line is a result of the orthonormality of the e_n.
From all of these calculations, Eq. (133) becomes,
    g(c_1, c_2, · · ·, c_n) = ‖x‖² − Σ_{l=1}^{n} c_l ⟨x, e_l⟩ − Σ_{k=1}^{n} c_k ⟨x, e_k⟩ + Σ_{k=1}^{n} c_k². (137)
Recall that we would like to minimize this function of n variables. We first impose the necessary
stationarity conditions for a minimum,
    ∂g/∂c_p = −⟨x, e_p⟩ − ⟨e_p, x⟩ + 2c_p = 0, p = 1, 2, · · ·, n. (138)
Therefore,
    c_p = ⟨x, e_p⟩, p = 1, 2, · · ·, n, (139)
Finally, substitution of these (real or complex) values of ck into the squared distance function in
Eq. (132) yields the result
    g(c_1, c_2, · · ·, c_n) = ‖x‖² − Σ_{l=1}^{n} |c_l|² − Σ_{k=1}^{n} |c_k|² + Σ_{k=1}^{n} |c_k|²
    = ‖x‖² − Σ_{k=1}^{n} |c_k|²
    ≥ 0, (141)
which then implies Eq. (128). The proof of the Theorem is complete.
1. The above result implies that the element x ∈ H may be expressed uniquely as
x = y + z, y ∈ Y, z ∈ Z = Y ⊥ . (142)
where y = P_Y(x) = Σ_{k=1}^{n} c_k e_k and
    z = x − y = x − Σ_{k=1}^{n} c_k e_k, (143)
where the c_k are given by (127). For l = 1, 2, · · ·, n, take the inner product of e_l with both sides
of this equation to give
    ⟨z, e_l⟩ = ⟨x, e_l⟩ − Σ_{k=1}^{n} c_k ⟨e_k, e_l⟩
    = ⟨x, e_l⟩ − c_l
    = 0. (144)
In other words, z is orthogonal to each of the e_l, i.e., z ∈ Y^⊥.
2. The term z in (142) may be viewed as the residual in the approximation x ≈ y. Its norm then
defines the error of approximation,
    ‖x − y‖ = ‖z‖. (145)
If y_n denotes the best approximation in Y_n = span{e_1, · · ·, e_n}, the associated residuals z_n have norms
    ‖z_n‖ = ‖x − y_n‖. (146)
3. From (141), the inequality
    Σ_{k=1}^{n} |⟨x, e_k⟩|² ≤ ‖x‖² (147)
holds for all appropriate values of n > 0. (If H is finite-dimensional, i.e., dim(H) = N, then
n = 1, 2, · · ·, N.) In other words, the partial sums on the left are bounded from above. This
inequality, known as Bessel's inequality, will have important consequences.
Finally, let us redo the minimization allowing the coefficients c_k to be complex. The squared distance
is then
    f(c_1, · · ·, c_n) = ⟨x − Σ_{k=1}^{n} c_k e_k, x − Σ_{l=1}^{n} c_l e_l⟩
    = ‖x‖² − ⟨Σ_{k=1}^{n} c_k e_k, x⟩ − ⟨x, Σ_{l=1}^{n} c_l e_l⟩ + Σ_{k=1}^{n} Σ_{l=1}^{n} ⟨c_k e_k, c_l e_l⟩
    = ‖x‖² − Σ_{k=1}^{n} [ c_k \overline{⟨x, e_k⟩} + \overline{c_k} ⟨x, e_k⟩ − |c_k|² ]
    = ‖x‖² + Σ_{k=1}^{n} [ |⟨x, e_k⟩|² − c_k \overline{⟨x, e_k⟩} − \overline{c_k} ⟨x, e_k⟩ + |c_k|² ] − Σ_{k=1}^{n} |⟨x, e_k⟩|²
    = ‖x‖² + Σ_{k=1}^{n} |⟨x, e_k⟩ − c_k|² − Σ_{k=1}^{n} |⟨x, e_k⟩|². (148)
The first and last terms are fixed. The middle term is a sum of nonnegative numbers. The minimum
value is achieved when all of these terms are zero. Consequently, f(c_1, · · ·, c_n) is a minimum if and
only if c_k = ⟨x, e_k⟩ for k = 1, 2, · · ·, n. As in the real case, we have
    ‖x‖² − Σ_{k=1}^{n} |⟨x, e_k⟩|² ≥ 0. (149)