In the previous section, we introduced the idea of a metric space (X, d): A set X along with a
metric d that assigns distances between any two elements x, y ∈ X. It now remains to discuss the
important idea of convergence in such metric spaces. This will be easy, since you have already seen
the idea of convergence of sequences of real numbers.
Recall that a sequence {x_n} of real numbers is said to converge to a limit x ∈ R if the following
holds true: for any ε > 0, there exists an N_ε > 0 such that
    |x_n − x| < ε for all n > N_ε. (1)
The ε-subscripted N, i.e., N_ε, indicates that the value of N will generally be dependent upon the value
of ε. (Most probably, the smaller we make ε, thus “squeezing” the tail of the sequence closer to x, the
larger the required value of N.)
But look again at the LHS of the inequality in (1) – not surprisingly, it is the distance between
xn and the limit x, i.e., |xn −x| = d(xn , x), where the metric d on the set of real numbers was discussed
earlier. Therefore, we can rewrite the above requirement for convergence of a sequence {x_n} as follows:
for any ε > 0, there exists an N_ε > 0 such that d(x_n, x) < ε for all n > N_ε.
The beautiful thing is that the above result carries over to metric spaces in general – regardless
of what comprises these spaces, e.g., numbers, functions, sets, measures, etc. We say that a sequence
{xn } of elements of a metric space X, i.e., xn ∈ X converges to a limit x ∈ X if
d(xn , x) → 0 as n → ∞, (5)
acknowledging that this is a shorthand notation for the more proper ε–N_ε definition.
The above definition of convergence is fine if you happen to know the limit x of your sequence
{xn }. You may then be able “measure” the distances d(xn , x) and show that this sequence converges
to zero. But what if you don’t know the limit x? Can you still characterize a sequence {xn } as
being convergent, or possibly convergent, if you don’t actually know the limit x? The mathematician
Cauchy struggled with this problem in his study of the real numbers and came up with the following
definition (Cauchy 1821):
A sequence of real numbers {x_n} is said to be a Cauchy sequence if, for any ε > 0, there
exists an N_ε > 0 such that
    |x_n − x_m| < ε for all n, m > N_ε.
In other words, the elements xn of the “tail” of the sequence are getting closer and closer to each
other. Cauchy then proved the following remarkable result:
(Cauchy 1821) A sequence of real numbers {xn } is convergent (to a limit x ∈ R) if and
only if it is a Cauchy sequence.
The “if and only if”, i.e., equivalence, of Cauchy sequences and convergence to a limit, is true because
of the completeness of the real number system. We’ll return to this idea in a moment.
The idea of a Cauchy sequence is easily brought into a general metric space setting:
Let (X, d) be a metric space. Then a sequence of elements {x_n} is said to be a Cauchy
sequence in (X, d) if, for any ε > 0, there exists an N_ε > 0 such that
    d(x_n, x_m) < ε for all n, m > N_ε.
We have simply replaced the distance function |x−y| on the real number line with the metric d(x, y) in
our metric space. This definition is now applicable to real numbers, ordered n-tuples in Rn , functions,
etc.
In general, however, Cauchy’s “if and only if” result for sequences of real numbers may not hold.
We do have the following result, however:
Theorem: Let {xn } be a convergent sequence in a metric space (X, d). Then {xn } is a Cauchy
sequence.
The proof of this theorem is so straightforward that it is worthy of mention here. (It is a direct
analogue of the proof for real numbers.)
Proof: Since {x_n} is convergent, there exists an x ∈ X (the limit of the sequence) such that the
following holds: For any ε > 0, there exists an N_ε > 0 such that d(x_n, x) < ε/2 for all n > N_ε. (Note
that we're using ε/2 on the RHS, for reasons that will be clear below – it's quite OK to do this.) Then,
for any n, m > N_ε, it follows, from the triangle inequality, that
    d(x_n, x_m) ≤ d(x_n, x) + d(x, x_m) < ε/2 + ε/2 = ε.
Just to repeat the final conclusion: d(x_n, x_m) < ε for all n, m > N_ε. Therefore, by definition, the
sequence {x_n} is Cauchy, proving the theorem.
And what about the converse? Are all Cauchy sequences convergent? The answer is “no” – it
depends upon the metric space X in which we are working. Here is an example.
Example: Let X = Q, the set of rational numbers, with metric d(x, y) = |x − y|. Consider the
following sequence,
    x_1 = 3,  x_2 = 31/10,  x_3 = 314/100,  x_4 = 3141/1000,  · · · . (9)
The element xn is obtained by truncating the decimal expansion of π to n − 1 decimal digits. It can
be shown (left as an exercise – it’s not that difficult) that this sequence is Cauchy. And it darn well
looks like it is convergent, i.e., converging to π. But this limit point π is NOT in the metric space Q.
Therefore the sequence {xn } is NOT convergent.
Of course, if you extended the metric space to be the set of real numbers R, then the sequence
{xn } is convergent – it converges to an element of R, the non-rational number π. As you may already
know, the real numbers are said to be the completion of the rational numbers.
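To make the example concrete, here is a minimal Python sketch (our own illustration, not part of the original development; the function name is arbitrary) that builds the truncations as exact rationals and prints a few tail differences |x_n − x_m|:

    from fractions import Fraction
    import math

    def x(n):
        """n-th truncation of pi: n - 1 decimal digits, kept as an exact rational."""
        return Fraction(math.floor(math.pi * 10**(n - 1)), 10**(n - 1))

    # The tail differences shrink (|x_n - x_m| <= 10^{-(min(n, m) - 1)}),
    # even though the limit, pi, is not a rational number.
    for n, m in [(3, 5), (6, 10), (10, 14)]:
        print(n, m, float(abs(x(n) - x(m))))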
Definition: A metric space (X, d) is said to be complete if every Cauchy sequence {x_n} in X converges
(to an element x ∈ X).
4. The interval X = (a, b] is not complete. (It is possible to construct a sequence in X which converges
to a, and a is not an element of X.) Likewise, the interval (a, b) is not complete.
In a short time, we shall arrive at some interesting conclusions regarding metric spaces of functions.
First, however, we examine the consequence of convergence with respect to a couple of important
metrics.
Convergence in d∞ metric
Here we examine the consequence of convergence of sequences of continuous functions with respect to
the d∞ metric. Recall that for two functions f, g ∈ C[a, b], the distance between them according to
this metric is given by
    d_∞(f, g) = max_{a≤x≤b} |f(x) − g(x)|. (10)
The distance between f and g is the maximum difference of their values on [a, b]. An important
consequence of this is the following:
    d_∞(f, g) < ε implies that |f(x) − g(x)| < ε ∀ x ∈ [a, b]. (11)
This implies that for all x ∈ [a, b], the difference between f(x) and g(x) is less than ε. This is a very
“tight” closeness between the graphs of f (x) and g(x).
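As a quick illustration (our own sketch; the sample functions and grid size are arbitrary choices), the d_∞ distance in (10) can be estimated numerically by sampling on a fine grid:

    import numpy as np

    def d_inf(f, g, a, b, num=10_001):
        """Grid-based estimate of the d_infinity distance between f and g on [a, b]."""
        x = np.linspace(a, b, num)
        return np.max(np.abs(f(x) - g(x)))

    # Example: f(x) = sin(x), g(x) = x on [0, 1]. The maximum difference occurs
    # at x = 1, so the result is close to 1 - sin(1) = 0.1585...
    print(d_inf(np.sin, lambda x: x, 0.0, 1.0))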
Now suppose that we have a sequence of functions fn ∈ C[a, b] that converges to a limit function
f ∈ C[a, b] with respect to the d∞ metric. This means that
d∞ (fn , f ) → 0 as n → ∞. (12)
Using the ε definition: For any ε > 0, there exists an N_ε > 0 such that
    d_∞(f_n, f) < ε for all n > N_ε, (13)
which implies that
    |f_n(x) − f(x)| < ε for all x ∈ [a, b] and all n > N_ε. (14)
First of all, let us consider a fixed value of x in [a, b]. As we push ε toward 0, the values f_n(x)
for n > Nǫ are forced to be closer and closer toward f (x). In other words, the real numbers fn (x)
converge to f (x), i.e.,
    lim_{n→∞} f_n(x) = f(x), or simply f_n(x) → f(x). (15)
This is true for any x ∈ [a, b]. This type of convergence is known as pointwise convergence.
But there is another important aspect of this result: the convergence is uniform: For any
x ∈ [a, b], the values of f_n(x) with n > N_ε lie within ε of f(x).
Another way to see this is to note that Eq. (14) implies the following (from the property of
absolute values):
    f(x) − ε < f_n(x) < f(x) + ε for all n > N_ε. (16)
In other words, the graphs of the f_n(x) functions for n > N_ε lie inside an ε-ribbon of the graph of
f(x). Letting ε go to zero reduces the width of this ribbon. The fact that for a given ε > 0, the same
N_ε will work for all values of x in [a, b] implies uniform convergence. This is made possible by the fact
that the interval [a, b] is closed and bounded.
Convergence in d2 metric
Recall that for two functions f, g ∈ C[a, b], the distance between them according to the d2 metric is
given by
    d_2(f, g) = [ ∫_a^b (f(x) − g(x))² dx ]^{1/2}. (17)
Note that
    d_2(f, g) < ε implies that [ ∫_a^b (f(x) − g(x))² dx ]^{1/2} < ε. (18)
This metric involves an integration over the functions f and g and not a comparison of their values
over the interval. Unlike the case for the d∞ metric, Eq. (18) does not imply that the maximum
difference between f(x) and g(x) values must be less than ε. It is possible that the values of f(x) and
g(x) differ significantly, but only over a sufficiently small interval.
To illustrate, consider the following example. Let f(x) = 0 and let g_n(x), n = 2, 3, · · ·, be a sequence
of functions defined over [0, 1] as follows: For n ≥ 2, define
    g_n(x) = { 0,               0 ≤ x ≤ 1/2 − 1/n,
               1 + n(x − 1/2),  1/2 − 1/n ≤ x ≤ 1/2,
               1 − n(x − 1/2),  1/2 ≤ x ≤ 1/2 + 1/n,
               0,               1/2 + 1/n ≤ x ≤ 1.       (19)
The graph of this seemingly complicated formula is a “hat” centered at x = 1/2, as sketched below.
[Figure: the “hat” function y = g_n(x), of height 1, centered at x = 1/2 and supported on [1/2 − 1/n, 1/2 + 1/n], plotted along with y = f(x) = 0.]
We now compute the d2 distance between f and gn . After a little algebra, we find that
    d_2(f, g_n) = [ ∫_0^1 (f(x) − g_n(x))² dx ]^{1/2}
                = [ ∫_{1/2 − 1/n}^{1/2 + 1/n} (g_n(x))² dx ]^{1/2}
                = √(2/(3n)). (20)
We can make n as large as we please, thereby making the d2 distance between f and gn as small as
desired. In other words, d2 (f, gn ) → 0 as n → ∞.
But note that for all n ≥ 2, g_n(1/2) − f(1/2) = 1, which implies that d_∞(f, g_n) = 1 for all n ≥ 2.
In summary,
1. In the d_2 metric, the functions g_n converge to the function f(x) = 0, since d_2(f, g_n) → 0 as n → ∞.
2. In the d_∞ metric, the functions g_n remain a constant distance of 1 away from the function f(x) = 0.
Therefore, they cannot converge to f(x) = 0 in the d_∞ metric.
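Both behaviours can be checked numerically. A small Python sketch (our own verification of this example; the grid size is arbitrary):

    import numpy as np

    def g(n, x):
        # The hat function (19): height 1 at x = 1/2, support [1/2 - 1/n, 1/2 + 1/n].
        return np.maximum(0.0, 1.0 - n * np.abs(x - 0.5))

    x = np.linspace(0.0, 1.0, 200_001)
    dx = x[1] - x[0]
    for n in [2, 10, 100]:
        d2 = np.sqrt(np.sum(g(n, x)**2) * dx)   # f = 0, so d_2(f, g_n) = ||g_n||_2
        dinf = np.max(np.abs(g(n, x)))          # d_inf(f, g_n)
        print(n, d2, np.sqrt(2.0 / (3.0 * n)), dinf)   # d2 matches sqrt(2/(3n)); dinf stays 1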
Let us make one more observation regarding the g_n functions. If we focus our attention on a fixed
value of x ∈ [0, 1], the behaviour of the real numbers g_n(x) as n → ∞ may be summarized
as follows:
1. If x = 1/2, then g_n(1/2) = 1 for all n ≥ 2, implying that
    lim_{n→∞} g_n(1/2) = 1. (21)
2. If x ≠ 1/2, then for some N > 0, g_n(x) = 0 for all n > N. (The hat just gets thinner and
thinner.) This implies that
    lim_{n→∞} g_n(x) = 0, x ≠ 1/2. (22)
To summarize, from a pointwise perspective, the sequence of functions g_n(x), in the limit n → ∞,
approaches the function
    g(x) = { 0, x ≠ 1/2,
             1, x = 1/2.       (23)
For this reason, g(x) is known as the pointwise limit function of the sequence of functions {gn (x)}.
Clearly, g(x) is not a continuous function on [0,1] since it is discontinuous at x = 1/2. Note that
g(x) differs from the zero function f (x) = 0 only at x = 1/2. From the perspective of the d2 metric,
these functions are identical, i.e.,
    d_2(f, g) = [ ∫_0^1 (f(x) − g(x))² dx ]^{1/2} = 0. (24)
The integral does not register the difference of the functions at the single point x = 1/2. More on this
later.
The question that you may now ask is, “So, which metric is better?” The answer is, “It depends
on what you want.” If you are working on a problem that demands functions to be close to each other
over an interval, e.g., uniform convergence, then you use the d∞ metric. On the other hand, if the
difference between functions is more aptly described in terms of integrals, then the d2 metric might
be more useful.
In signal and image processing, the space of continuous functions is too restrictive. This should
be quite evident in the case of images which, by their very nature, are not continuous functions –
every edge of an image, e.g., boundary of an object, represents a curve of discontinuity of the function
u(x, y) representing the image.
The fact that you can have convergence of functions in the d2 metric while still having function
values differing by significant amounts will help us to understand the phenomenon of “ringing”, i.e.,
the Gibbs phenomenon, that occurs when approximating functions with Fourier series.
Finally, we comment that the above phenomenon observed for the d2 metric will also be observed
for the other dp metrics, p ≥ 1, that are formulated in terms of integrals.
The material in this section was not covered in the lecture. It is included here for the
purpose of “completeness” for those interested. It will not be covered on any examina-
tion.
We now return to the idea of completeness, as applied to the space of continuous functions C[a, b]
on a closed interval. We shall simply state the following result. Its proof is probably discussed in an
advanced course on analysis (and certainly covered in AMATH 731, “Applied Functional Analysis”).
Theorem: The metric space (C[a, b], d_∞) is complete.
In other words, if the metric d_∞ is employed, a Cauchy sequence of continuous functions {f_n} ⊂
C[a, b] converges to a limit function f which is continuous, i.e., f ∈ C[a, b]. (Remember that the
sequence {f_n} is Cauchy as measured by the metric d_∞.)
On the other hand, the metric space (C[a, b], d_2) is not complete. In other words, a sequence of
continuous functions {f_n} ⊂ C[a, b] which is Cauchy with respect to the d_2 metric may converge to
a function f that is NOT continuous, i.e., NOT an element of C[a, b].
An example was presented earlier – the “hat function” centered at x = 1/2. Here is another example.
Example: Let [a, b] = [0, 1] and consider the sequence of functions f_n(x) = x^n, n = 1, 2, · · ·, sketched
schematically below.
Clearly, f_n ∈ C[0, 1]. Let's examine the behaviour of these functions as n → ∞.
1. For 0 ≤ x < 1, clearly, x^n → 0 as n → ∞.
2. For x = 1, clearly, x^n = 1.
[Figure: graphs of y = x, y = x², y = x³, . . . , y = xⁿ on [0, 1], flattening toward 0 on [0, 1) while all passing through the point (1, 1).]
We'll simply state the fact that the sequence {f_n} is Cauchy in the d_2 metric (and leave it to the reader
as an optional exercise). Instead, we’ll focus on the limit f of this sequence which, in this case, can
be found rather easily.
Pointwise, it appears that the functions fn (x) are converging to the function f (x) given by
    f(x) = { 0, 0 ≤ x < 1,
             1, x = 1.       (25)
Clearly, f (x) is not a continuous function – it is discontinuous at x = 1. Let us now examine the
distances d2 (fn , f ):
    d_2(f_n, f) = [ ∫_0^1 (f_n(x) − f(x))² dx ]^{1/2}
                = [ ∫_0^1 (x^n − 0)² dx ]^{1/2}
                = 1/√(2n + 1). (26)
(The single point x = 1, at which f(x) = 1, can be ignored in the integration, since its contribution to
the integral is zero: f_n(1) − f(1) = 0. Even if f_n(1) ≠ f(1), the contribution of a single point to the
integral would be zero.) Clearly, d_2(f_n, f) → 0 as n → ∞, i.e., the sequence of functions {f_n} is converging to f in the d_2
metric.
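A brief Python check (our own sketch; the grid size is arbitrary) of Eq. (26):

    import numpy as np

    x = np.linspace(0.0, 1.0, 100_001)
    dx = x[1] - x[0]
    for n in [1, 5, 50]:
        # f = 0 except at the single point x = 1, which contributes nothing.
        d2 = np.sqrt(np.sum((x**n)**2) * dx)
        print(n, d2, 1.0 / np.sqrt(2 * n + 1))   # the two columns agree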
In summary, we have constructed a sequence of continuous functions fn that converges, in d2
metric, to a discontinuous function, i.e., a function that does not belong to C[0, 1]. Therefore, the
metric space (C[a, b], d2 ) is not complete.
The moral of the story
The moral of the story is that if we wish to talk about convergence in the d2 metric – which we shall
have to do in our applications to signal and image processing – then we’ll have to extend the class of
functions considered. The space C[a, b] of continuous functions is insufficient. We must be prepared
to allow discontinuous functions.
The “completion” of the space C[a, b] with respect to the d_2 metric is the space denoted as L²[a, b].
It is the metric space of functions for which the following distance is finite,
    d_2(f, g) = [ ∫_a^b (f(x) − g(x))² dx ]^{1/2} < ∞. (27)
End of Appendix
Lecture 4
Metric spaces – sets with distance functions – are very nice, but they are not sufficient for the appli-
cations we wish to study. We would like to be able to add and subtract signals and images, even take
linear combinations of them. For this reason, we would like to work with spaces that have a vector
space structure. In the literature, vector spaces are also called “linear spaces.”
Definition: Let X be a real (or complex) vector space. A real-valued function ‖x‖ defined on X is a
norm on X if the following properties are satisfied:
1. ‖x‖ ≥ 0, and ‖x‖ = 0 if and only if x = 0,
2. ‖αx‖ = |α| ‖x‖ for all scalars α,
3. ‖x + y‖ ≤ ‖x‖ + ‖y‖ for all x, y ∈ X (the triangle inequality for norms).
A norm then defines a metric on X in the natural way,
    d(x, y) = ‖x − y‖. (28)
It can be verified that this metric satisfies all of the required properties of a metric. In particular,
the triangle inequality for norms given above guarantees that the above metric satisfies the triangle
inequality for metrics: If we replace x with x − z and y with z − y in Property 3 above, we obtain
    ‖x − y‖ ≤ ‖x − z‖ + ‖y − z‖, (29)
or
    d(x, y) ≤ d(x, z) + d(z, y). (30)
If we now consider a normed linear space X as a metric space with metric d(x, y) = ‖x − y‖, then
we may ask whether or not it is complete, i.e., whether all Cauchy sequences in X with respect to this
metric converge to an element of X. If so, then we say that the normed linear space X is complete.
Examples:
1. The space X = Rⁿ. Of course, you are very familiar with the Euclidean magnitude of a vector,
i.e., an n-tuple, x = (x_1, x_2, · · ·, x_n), used to characterize its length as follows,
    ‖x‖_2 = [x_1² + x_2² + · · · + x_n²]^{1/2}. (31)
This is a particular example of a family of norms that can be assigned to elements of Rⁿ, the
p-norms,
    ‖x‖_p = [|x_1|^p + |x_2|^p + · · · + |x_n|^p]^{1/p}, p ≥ 1. (32)
By virtue of the completeness property of the real numbers, the normed linear spaces (Rⁿ, ‖·‖_p)
are complete for all p ≥ 1, as well as in the case p = ∞, where ‖x‖_∞ = max_{1≤i≤n} |x_i|. They are all Banach spaces.
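These norms are available directly in NumPy; a one-line sketch (ours, with an arbitrary sample vector):

    import numpy as np

    x = np.array([3.0, -4.0, 1.0])
    for p in [1, 2, 4, np.inf]:
        print(p, np.linalg.norm(x, ord=p))   # the p-norms (32), including p = infinity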
2. X = C[a, b], the space of continuous functions on the interval [a, b]. Here, we use the “infinity
norm”: For an f ∈ C[a, b],
    ‖f‖_∞ = max_{a≤x≤b} |f(x)|. (34)
3. X = ℓ^p, the space of sequences x = (x_1, x_2, · · ·) with finite p-norm. We define the p-norm as
follows: For an x = (x_1, x_2, · · ·),
    ‖x‖_p = [ Σ_{i=1}^{∞} |x_i|^p ]^{1/p}. (37)
4. The space of real-valued, p-integrable functions on [a, b], denoted as L^p[a, b] and defined as
follows:
    L^p[a, b] = { f : [a, b] → R | ∫_a^b |f(x)|^p dx < ∞ }, p ≥ 1. (39)
The corresponding norm is
    ‖f‖_p = [ ∫_a^b |f(x)|^p dx ]^{1/p}. (40)
These norms define the d_p metrics introduced in a previous lecture: For f, g ∈ L^p[a, b],
    d_p(f, g) = ‖f − g‖_p = [ ∫_a^b |f(x) − g(x)|^p dx ]^{1/p}. (41)
The most commonly used normed linear space in this family is the case p = 2, namely, the space
L2 [a, b] of square integrable functions with norm,
    ‖f‖_2 = [ ∫_a^b |f(x)|² dx ]^{1/2}. (42)
(We say that a function f ∈ L²[a, b] is “square integrable” since the integration of its squared
magnitude, |f(x)|², over the interval [a, b] is finite.) The metric associated with this norm is the
usual “L² metric”,
    d_2(f, g) = ‖f − g‖_2 = [ ∫_a^b |f(x) − g(x)|² dx ]^{1/2}. (43)
As you may recall – and as will certainly be discussed very shortly – this is the function space
that is relevant to Fourier series.
Note that for any p ≥ 1, the space Lp [a, b] includes the space of continuous functions C[a, b].
This follows from the fact that continuous functions on a closed interval are bounded: For each
f ∈ C[a, b], there exists an M ≥ 0 such that |f (x)| ≤ M for all x ∈ [a, b]. It then follows that
    ∫_a^b |f(x)|^p dx ≤ ∫_a^b M^p dx = M^p (b − a) < ∞. (44)
Therefore f ∈ Lp .
But the Lp spaces include a great deal more: discontinuous functions, even unbounded functions.
(The latter won’t be needed in our applications.) And the Lp spaces are not identical. For
example, consider the case [a, b] = [0, 1]. The function f(x) = x^{−1/2} is an element of L¹[0, 1]
since
    ∫_0^1 |f(x)| dx = ∫_0^1 x^{−1/2} dx = 2 < ∞. (45)
On the other hand, f(x)² = x^{−1} is not integrable on [0, 1], so f is not an element of L²[0, 1].
Therefore, there are functions which are in L¹[0, 1] but not in L²[0, 1], suggesting that L²[a, b] ⊂
L¹[a, b]. This result can be generalized to L^p[a, b] ⊂ L^q[a, b] when p > q. But this is beyond the
scope of the course.
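The two integrals can be checked symbolically; a short sketch (ours) using SymPy:

    import sympy as sp

    x = sp.symbols('x', positive=True)
    f = x**sp.Rational(-1, 2)
    print(sp.integrate(f, (x, 0, 1)))      # 2   -> f is in L^1[0, 1]
    print(sp.integrate(f**2, (x, 0, 1)))   # oo  -> f is not in L^2[0, 1]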
Note: The L^p spaces may easily be extended to include complex-valued functions f : [a, b] → C.
The definition in (39) holds, with “R” simply replaced by “C”.
It will be necessary to consider functions defined on the entire real line in this course. From your
previous encounters with improper integrals, you will not be surprised to know that functions
in L^p(R) must satisfy a rather stringent condition – roughly, that |f(x)| decay to zero as x → ±∞.
It's actually a little more complicated than this – the rate of decay will depend on p.
We now come to an extremely important concept of this course. You are, of course, very familiar with
finite-dimensional normed linear spaces, such as X = Rn . In such cases, which we shall denote as
“dim(X) = n”, suppose that we have a set of n linearly independent elements u_i ∈ X, 1 ≤ i ≤ n, i.e.,
    c_1u_1 + c_2u_2 + · · · + c_nu_n = 0 implies c_1 = c_2 = · · · = c_n = 0. (50)
As you know, the {u_i} form a basis for X: Given an element v ∈ X, there exists a set of coefficients
ci ∈ R, 1 ≤ i ≤ n, such that
v = c1 u1 + c2 u2 + · · · + cn un . (51)
This course is concerned with the approximation of functions which, as stated in a previous lecture,
are elements of infinite-dimensional spaces, i.e., dim(X) = ∞. In such cases, we have to be a little
more careful. Suppose that we have a set of linearly independent elements ui ∈ X, 1 ≤ i ≤ n. Define
Sn to be the span of the ui , i.e.,
    S_n = span{u_1, u_2, · · ·, u_n}.
Given an element v ∈ X, we seek the best approximation to v from the subspace S_n: the element
v_n ∈ S_n for which the approximation error is smallest, i.e.,
    Δ_n = ‖v_n − v‖ = min_{u ∈ S_n} ‖u − v‖. (54)
[Figure: the element v ∈ X, its best approximation v_n in the subspace S_n, and the error Δ_n = ‖v_n − v‖.]
Once again, v_n is the element in S_n that lies closest to v, as measured by the metric that is
defined by the norm ‖·‖ in X.
The above statement regarding v_n, which involves the preceding equation (54), may be written
mathematically and more compactly by means of the so-called “arg” notation. We write,
    v_n = arg min_{u ∈ S_n} ‖u − v‖. (55)
This statement may be read as follows: “v_n is the ‘argument’ or ‘element’ in S_n at which the mini-
mization over S_n is achieved”. In other words, “v_n is the element in S_n that minimizes the distance
to v, i.e., ‖u − v‖.”
(Note: At this point we should make a parenthetical remark that, technically speaking, after the phrase “given
by the element vn ∈ Sn ” should be added “(provided it exists)”. As well, the “min” in the equation should
be replaced by “inf”. There are pathological situations where a minimum value for the error is approached
in some limit by a sequence of approximations, with the actual best approximation not existing. But in most
applications, including those considered here, this will not be the case, i.e., Eq. (54) will hold. And in the case
of Hilbert spaces, i.e., complete inner product spaces – to be discussed next – such a “minimizer” always exists,
and it is unique.)
Since the approximation vn lies in Sn , it will have an expansion in the basis set ui , i.e.,
vn = c1 u1 + c2 u2 + · · · + cn un . (56)
We then write
    v ≈ v_n = c_1u_1 + c_2u_2 + · · · + c_nu_n, (57)
often adding the phrase “in the norm ‖·‖ or associated metric on X” (e.g., “in the L² norm,” “in the d_2
metric” or even “in the L² metric”). The error of this approximation, or simply “approximation error,” is
    Δ_n = ‖v − v_n‖. (58)
As we increase n, i.e., the number of basis elements u_i employed, the approximation cannot get
worse, i.e., the error Δ_n cannot increase. (If the additional basis elements u_k are not used, i.e., their
associated coefficients are zero, then the approximation error remains the same.) Hopefully, the
Δ_n will decrease as n increases. And, indeed, we would hope that Δ_n → 0 as n → ∞. Of course, if,
for some n, v ∈ S_n, then Δ_n = 0, i.e., v = v_n, admitting the expansion in (57).
Note: In the discussion above, there has been no mention of orthogonality. This is because normed
linear spaces, in general, are not necessarily inner product spaces that are equipped with an inner
product, from which comes the property of orthogonality. The special case of best approximation in
inner product spaces will be discussed in the next section of this course.
1. Example No. 1: Let X = C[a, b], the normed linear space of continuous functions on [a, b].
The functions u_k(x) = x^{k−1}, k = 1, 2, · · ·, i.e., {1, x, x², · · ·}, form a linearly independent set,
hence a basis in C[a, b]. In this case, given a function f ∈ C[a, b], the best approximation in S_n
would be the polynomial approximation to f of degree n − 1 having the form,
    f(x) ≅ v_n = c_0 + c_1x + · · · + c_{n−1}x^{n−1}. (59)
(a) The special case n = 1. Here, we use only one function from the basis set, i.e., u1 (x) = 1.
The best approximation v1 ∈ S1 is the best constant approximation to f (x) in the d∞
metric, i.e., f(x) ≃ c, where c is obtained by minimizing the distance function,
    Δ_1(c) = d_∞(f, c) = max_{a≤x≤b} |f(x) − c|. (60)
If a formula for f(x) is available, we may be able to remove the absolute value by considering
the various intervals over which the difference f(x) − c is positive and negative. In Problem
Set No. 1, you are asked to find the best constant approximation to f(x) = x² on [0, 1]
using the above distance function.
2. Example No. 2: Let X = L1 [a, b], the normed linear space of functions on [a, b] satisfying the
condition that
    ‖f‖_1 = ∫_a^b |f(x)| dx < ∞. (63)
Recall that the space of continuous functions C[a, b] is a subset of this space. The functions
u_k(x) = x^{k−1}, k = 1, 2, · · ·, also form a basis in this space. Given a function f ∈ L¹[0, 1], its best
approximation in Sn is given by the solution to the following optimization problem,
    min_{c_0,···,c_{n−1}} ‖f − v_n‖_1 = min_{c_0,···,c_{n−1}} ∫_a^b |f(x) − c_0 − c_1x − · · · − c_{n−1}x^{n−1}| dx. (65)
This problem actually turns out to be a bit more tractable – it can be solved by the method of
linear programming, since the optimization problem is linear in the unknowns c_i.
(a) The special case n = 1. Here, we use only one function from the basis set, i.e., u1 (x) = 1.
The best approximation v1 ∈ S1 is the best constant approximation to f (x) in the L1
metric, i.e., f (x) ≃ c, where we minimize the distance function,
    Δ_1(c) = d_1(f, c) = ‖f − c‖_1 = ∫_a^b |f(x) − c| dx, (66)
where we have replaced c_0 with c. Because the integrand contains an absolute value, we
cannot use differentiation methods to find the c-value which minimizes Δ_1(c). If the formula
for f(x) exists and is not too complicated, we may be able to evaluate the integral by
evaluating it on separate intervals over which f(x) − c is positive and negative. In Problem
Set No. 1, you are asked to find the best constant approximation to f(x) = x² on [0,1]
using the above distance function.
3. Example No. 3: Let X = L2 [a, b], the normed linear space of functions on [a, b] satisfying the
condition
    ‖f‖_2² = ∫_a^b (f(x))² dx < ∞. (67)
Once again, the space of continuous functions C[a, b] is a subset of this space. The functions
u_k(x) = x^{k−1}, k = 1, 2, · · ·, also form a basis in this space. Given a function f ∈ L²[0, 1], its best
approximation in S_n is given by the solution to the following optimization problem,
    min_{c_0,···,c_{n−1}} ‖f − v_n‖_2² = min_{c_0,···,c_{n−1}} ∫_a^b [f(x) − c_0 − c_1x − · · · − c_{n−1}x^{n−1}]² dx. (69)
Note that we have chosen to minimize the squared L² distance since it avoids square roots.
(Minimizing the square of a nonnegative quantity is equivalent to minimizing the quantity
itself.) This problem can, in principle, be solved analytically by finding the stationary points
of the squared distance function,
    Δ_2²(c_0, c_1, · · ·, c_{n−1}) = ∫_a^b [f(x) − c_0 − c_1x − · · · − c_{n−1}x^{n−1}]² dx. (70)
This is one of the reasons that the metric associated with the space L2 is used in most signal
and image processing applications. We’ll illustrate with two special cases.
(a) The special case n = 1. Here again, we use only one function from the basis set, i.e.,
u1 (x) = 1. The best approximation v1 ∈ S1 is once again the best constant approximation
to f (x), but this time with respect to the L2 metric, i.e., f (x) ≃ c, where we minimize the
squared L2 distance function,
    h(c) = Δ_1²(c) = [d_2(f, c)]² = ‖f − c‖_2² = ∫_a^b [f(x) − c]² dx. (71)
• Method No. 1: Expand the integrand and integrate to produce an expression for
h(c) in terms of c:
    h(c) = ∫_a^b (f(x))² dx − 2c ∫_a^b f(x) dx + c² (b − a). (72)
Find the critical points:
    h′(c) = −2 ∫_a^b f(x) dx + 2c(b − a) = 0, (73)
which yields the single critical point
    c = (1/(b − a)) ∫_a^b f(x) dx. (74)
Note that h′′ (c) = 2(b − a) > 0, so the critical point is a global minimum.
In summary, the best constant approximation to a function f (x) using the L2 metric
on [a, b] is given by
    f(x) ≃ (1/(b − a)) ∫_a^b f(x) dx. (75)
Note that this constant is the mean or average value of f(x) over [a, b], often denoted
as f̄_{[a,b]}. This is a very well known result in signal and image processing.
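A quick numerical sanity check (our own sketch; f(x) = x² on [0, 1] is the Problem Set example, whose mean value is 1/3):

    import numpy as np

    x = np.linspace(0.0, 1.0, 100_001)
    dx = x[1] - x[0]
    f = x**2

    def h(c):
        """Squared L^2 distance (71) between f and the constant c."""
        return np.sum((f - c)**2) * dx

    cs = np.linspace(0.0, 1.0, 1001)
    c_best = cs[np.argmin([h(c) for c in cs])]
    print(c_best, np.sum(f) * dx)   # both are close to the mean value 1/3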
(b) The special case n = 2. We now use two functions from the basis set, i.e., u_1(x) = 1 and
u_2(x) = x, to produce the best approximation, v_2(x), to f(x) having the form,
    f(x) ≃ c_0 + c_1x. (77)
We minimize the squared L² distance,
    h(c_0, c_1) = ∫_a^b [f(x) − c_0 − c_1x]² dx, (78)
which is now a function of two variables, c_0 and c_1. Critical points (c_0, c_1) must satisfy the
following stationarity conditions,
    ∂h/∂c_0 = −2 ∫_a^b [f(x) − c_0 − c_1x] dx = 0,
    ∂h/∂c_1 = −2 ∫_a^b [f(x) − c_0 − c_1x] x dx = 0. (79)
These conditions yield the following set of inhomogeneous linear equations in c_0 and c_1,
    ( ∫_a^b dx ) c_0 + ( ∫_a^b x dx ) c_1 = ∫_a^b f(x) dx,
    ( ∫_a^b x dx ) c_0 + ( ∫_a^b x² dx ) c_1 = ∫_a^b x f(x) dx. (80)
The integrals on the LHS are easily evaluated in terms of the endpoints a and b. Assuming
that the integrals on the RHS involving f (x) can be evaluated, the system is easily solved
using Cramer’s Rule.
This system of equations may be viewed as a continuous version of the classical method of
least squares best approximation of data points (xi , f (xi )) by the straight line y = c0 + c1 x.
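As a sketch (ours) of this continuous least-squares problem, the system (80) for f(x) = x² on [0, 1] can be assembled and solved numerically; working out the integrals by hand gives the exact answer c_0 = −1/6, c_1 = 1:

    import numpy as np

    a, b = 0.0, 1.0
    x = np.linspace(a, b, 100_001)
    dx = x[1] - x[0]
    f = x**2

    # LHS integrals of (80), evaluated in terms of the endpoints a and b.
    A = np.array([[b - a,             (b**2 - a**2) / 2],
                  [(b**2 - a**2) / 2, (b**3 - a**3) / 3]])
    rhs = np.array([np.sum(f) * dx, np.sum(x * f) * dx])
    print(np.linalg.solve(A, rhs))   # approximately [-1/6, 1]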
4. Example No. 4: We return to the approximations that are yielded by partial sums of the
Fourier series of a function f (x) defined on the interval [−π, π], i.e.,
    f(x) ≃ S_N(x) = a_0 + Σ_{n=1}^{N} [a_n cos nx + b_n sin nx], (81)
where the coefficients a_n and b_n are given in Eq. (2) in Lecture 2. The relevant normed linear
space is L²[−π, π], which is also an inner product space. The approximation S_N(x) is an element
of the (2N + 1)-dimensional space that is spanned by the basis functions,
    {1, cos x, sin x, cos 2x, sin 2x, · · ·, cos Nx, sin Nx}. (82)
Here, we simply state that SN (x) is the best approximation to f (x) in this space and that the
coefficients an and bn are obtained from the inner product defined on this space. We shall justify
these statements in the next section, where we discuss approximations in an inner product, or
“Hilbert”, space.
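A short sketch (ours) of the partial sum S_N computed by numerical quadrature; the coefficient formulas used below (a_0 the mean of f, and a_n, b_n with a factor 1/π) are the standard ones and are assumed to match Eq. (2) of Lecture 2:

    import numpy as np

    def fourier_partial_sum(f, N, num=100_001):
        x = np.linspace(-np.pi, np.pi, num)
        dx = x[1] - x[0]
        fx = f(x)
        a0 = np.sum(fx) * dx / (2 * np.pi)          # mean value of f
        S = np.full_like(x, a0)
        for n in range(1, N + 1):
            an = np.sum(fx * np.cos(n * x)) * dx / np.pi
            bn = np.sum(fx * np.sin(n * x)) * dx / np.pi
            S += an * np.cos(n * x) + bn * np.sin(n * x)
        return x, S

    x, S = fourier_partial_sum(np.abs, N=10)
    print(np.max(np.abs(S - np.abs(x))))   # modest: the series for |x| converges uniformly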
Once again, we mention that in the discussion preceding the above examples, nothing was said
about the “orthogonality” of the basis set un . That is because nothing could be said! We have been
working with normed linear spaces, which are not necessarily inner product spaces. Only in
inner product spaces, the subject of the next section, can we have the property of orthogonality.
Inner product spaces
Of course, you are familiar with the idea of inner product spaces – at least finite-dimensional ones.
Let X be an abstract vector space with an inner product, denoted as “⟨ , ⟩”, a mapping from X × X
to R (or perhaps C). The inner product satisfies the following conditions,
1. ⟨x + y, z⟩ = ⟨x, z⟩ + ⟨y, z⟩ for all x, y, z ∈ X,
2. ⟨x, y⟩ = \overline{⟨y, x⟩} for all x, y ∈ X,
3. ⟨αx, y⟩ = α⟨x, y⟩ for all scalars α and all x, y ∈ X,
4. ⟨x, x⟩ ≥ 0, with ⟨x, x⟩ = 0 if and only if x = 0,
where the bar indicates complex conjugation. Note that, from Property 2, this implies that
    ⟨x, αy⟩ = \overline{α} ⟨x, y⟩.
Note: For anyone who has taken courses in Mathematical Physics, the above properties may be
slightly different than what you have seen, as regards the complex conjugations. In Physics, the usual
convention is to complex conjugate the first entry, i.e., ⟨αx, y⟩ = \overline{α}⟨x, y⟩.
The inner product generates a norm:
    ⟨x, x⟩ = ‖x‖², or ‖x‖ = √⟨x, x⟩. (83)
(Note: An inner product always generates a norm. But not the other way around, i.e., a norm is not
always expressible in terms of an inner product. You may have seen this in earlier courses – the norm
has to satisfy the so-called “parallelogram law.”)
And where there is a norm, there is a metric: The norm defined by the inner product ⟨ , ⟩ will define
the following metric,
    d(x, y) = ‖x − y‖ = √⟨x − y, x − y⟩, ∀ x, y ∈ X. (84)
A complete inner product space is called a Hilbert space, in honour of the celebrated
mathematician David Hilbert (1862-1943).
Finally, the inner product satisfies the following relation, called the “Cauchy-Schwarz inequality,”
    |⟨x, y⟩| ≤ ‖x‖ ‖y‖. (85)
You probably saw this relation in your studies of finite-dimensional inner product spaces, e.g., Rn . It
holds in abstract spaces as well.
Lecture 5
Examples:
1. X = Rⁿ. For x = (x_1, x_2, · · ·, x_n) and y = (y_1, y_2, · · ·, y_n),
    ⟨x, y⟩ = x_1y_1 + x_2y_2 + · · · + x_ny_n. (86)
The norm induced by the inner product is the familiar Euclidean norm, i.e.,
    ‖x‖ = ‖x‖_2 = [ Σ_{i=1}^{n} x_i² ]^{1/2}. (87)
You'll note that the inner product generates only the p = 2 norm or metric. Rⁿ is also
complete, i.e., it is a Hilbert space.
2. X = Cⁿ – a minor modification of the real vector space case. Here, for x = (x_1, x_2, · · ·, x_n) and
y = (y_1, y_2, · · ·, y_n),
    ⟨x, y⟩ = x_1\overline{y_1} + x_2\overline{y_2} + · · · + x_n\overline{y_n}. (89)
3. X = ℓ², the space of square-summable sequences, with inner product ⟨x, y⟩ = Σ_{i=1}^{∞} x_i\overline{y_i}.
Note that ℓ² is the only ℓ^p space for which an inner product exists. It is a Hilbert space.
4. X = C[a, b], the space of continuous functions on [a, b], is NOT an inner product space.
5. X = L²[a, b], the space of square-integrable functions on [a, b] introduced earlier, with inner
product ⟨f, g⟩ = ∫_a^b f(x)\overline{g(x)} dx.
As in the case of sequence spaces, L² is the only L^p space for which an inner product exists. It
is also a Hilbert space.
6. The space of (real- or complex-valued) square-integrable functions L²(R) on the real line, also
introduced earlier. Here,
    ‖f‖² = ⟨f, f⟩ = ∫_{−∞}^{∞} |f(x)|² dx < ∞. (95)
Once again, L² is the only L^p space for which an inner product exists. It is also a Hilbert space,
and will be the primary space in which we work for the remainder of this course.
An important property of inner product spaces is “orthogonality.” Let X be an inner product space.
If ⟨x, y⟩ = 0 for two elements x, y ∈ X, then x and y are said to be orthogonal (to each other).
Mathematically, this relation is written as “x ⊥ y,” just as is done for vectors in Rⁿ.
We’re now going to need a few ideas and definitions for the discussion that is coming up.
• Recall that a subspace Y of a vector space X is a nonempty subset Y ⊂ X such that for all
y_1, y_2 ∈ Y and all scalars c_1, c_2, the element c_1y_1 + c_2y_2 ∈ Y, i.e., Y is itself a vector space. This
implies that Y must contain the zero element, i.e., 0 ∈ Y.
• Moreover, the subspace Y is convex: For every x, y ∈ Y, the “segment” joining x and y, i.e., the
set of all convex combinations,
    z = αx + (1 − α)y, 0 ≤ α ≤ 1, (97)
is contained in Y.
• A vector space X is said to be the direct sum of two subspaces, Y and Z, written as follows,
    X = Y ⊕ Z, (98)
if every x ∈ X has a unique representation of the form
    x = y + z, y ∈ Y, z ∈ Z. (99)
In this case, Z is said to be an algebraic complement of Y (and vice versa).
Note: The concept of an algebraic complement does not have to invoke the use of orthogonality.
(Unfortunately, it was used in the lecture.)
• In Rⁿ and inner product spaces X in general, it is convenient to consider spaces that are orthog-
onal to each other. Let S ⊂ X be a subset of X and define its orthogonal complement,
    S^⊥ = {x ∈ X | ⟨x, s⟩ = 0 for all s ∈ S}. (100)
Aside: Some remarks regarding the idea of a “complement”
This section was not covered in the lecture. It is included here only for supplementary
purposes. You will not be examined on this material.
The concept of an algebraic complement does not have to invoke the use of orthogonality. With
thanks to a student who once raised the question of algebraic vs. orthogonal complementarity in class,
let us consider the following example.
Let X denote the (11-dimensional) vector space of polynomials of degree at most 10, i.e.,
    X = { Σ_{k=0}^{10} c_k x^k, c_k ∈ R, 0 ≤ k ≤ 10 }. (101)
Equivalently,
    X = span{1, x, x², · · ·, x¹⁰}. (102)
Now define
    Y = span{1, x, x², x³, x⁴, x⁵}, Z = span{x⁶, x⁷, x⁸, x⁹, x¹⁰}. (103)
First of all, Y and Z are subspaces of X. Furthermore, X is a direct sum of the subspaces Y and Z.
However, the spaces Y and Z are not orthogonal complements of each other.
First of all, for the notion of orthogonal complementarity, we would have to define an interval of
support, e.g., [0, 1], over which the inner products of the functions are defined. (And then we would
have to make sure that all inner products involving these functions are defined.) Using the linearly
independent functions x^k, 0 ≤ k ≤ 10, one can then construct (via Gram-Schmidt orthogonalization)
an orthogonal set of polynomial basis functions, {φ_k(x)}, 0 ≤ k ≤ 10, for X. It is possible that the
first few members of this orthogonal set will involve only the functions x^k, 0 ≤ k ≤ 5, which come from
the set Y. But the remaining members of the orthogonal set will contain higher powers of x, i.e., x^k,
6 ≤ k ≤ 10, as well as lower powers of x, i.e., x^k, 0 ≤ k ≤ 5. In other words, the remaining members
of the orthogonal set will not be elements of the set Z – they will have nonzero components in Y.
See also Example 3 below.
Examples:
1. Let X = R³ with S = span{(1, 0, 0)}, a subspace of X. Then S^⊥ = span{(0, 1, 0), (0, 0, 1)}.
2. As before X = R3 but with S = {(c, 0, 0) | c ∈ [0, 1]}. Now, S is no longer a subspace but simply
a subset of X. Nevertheless S ⊥ is the same set as in 1. above, i.e., S ⊥ = span{(0, 1, 0), (0, 0, 1)}.
We have to include all elements of X that are orthogonal to the elements of S. That being
said, we shall normally be working more along the lines of Example 1, i.e., subspaces and their
orthogonal complements.
3. Further to the discussion of algebraic vs. orthogonal complementarity, consider the same spaces
X, Y and Z as defined in Eqs. (102) and (103), but defined over the interval [−1, 1]. The
orthogonal polynomials φ_k(x) over [−1, 1] that may be constructed from the functions x^k, 0 ≤
k ≤ 10, are the so-called Legendre polynomials, P_n(x), listed below:
    n     P_n(x)
    0     1
    1     x
    2     (1/2)(3x² − 1)
    3     (1/2)(5x³ − 3x)
    4     (1/8)(35x⁴ − 30x² + 3)
    5     (1/8)(63x⁵ − 70x³ + 15x)
    6     (1/16)(231x⁶ − 315x⁴ + 105x² − 5)
    7     (1/16)(429x⁷ − 693x⁵ + 315x³ − 35x)
    8     (1/128)(6435x⁸ − 12012x⁶ + 6930x⁴ − 1260x² + 35)
    9     (1/128)(12155x⁹ − 25740x⁷ + 18018x⁵ − 4620x³ + 315x)
    10    (1/256)(46189x¹⁰ − 109395x⁸ + 90090x⁶ − 30030x⁴ + 3465x² − 63)       (104)
These polynomials satisfy the orthogonality relation
    ∫_{−1}^{1} P_m(x) P_n(x) dx = (2/(2n + 1)) δ_{mn}, (105)
where δ_{mn} is the Kronecker delta,
    δ_{mn} = { 1, m = n,
               0, m ≠ n.       (106)
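The relation (105) can be verified directly; a small sketch (ours) using NumPy's Legendre module, in which Legendre.basis(n) represents P_n:

    import numpy as np
    from numpy.polynomial.legendre import Legendre

    for m in range(4):
        for n in range(4):
            product = Legendre.basis(m) * Legendre.basis(n)
            antideriv = product.integ()
            value = antideriv(1.0) - antideriv(-1.0)   # integral over [-1, 1]
            expected = 2.0 / (2 * n + 1) if m == n else 0.0
            assert abs(value - expected) < 1e-12
    print("orthogonality relation (105) verified for m, n = 0, ..., 3")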
From the above table, we see that the Legendre polynomials P_n(x), 0 ≤ n ≤ 5, belong to the space
Y, whereas the polynomials P_n, 6 ≤ n ≤ 10, do not belong solely to Z. This suggests that the
spaces Y and Z are not orthogonal complements. However, the following spaces are orthogonal
complements:
    Y′ = span{P_0, P_1, · · ·, P_5}, Z′ = span{P_6, P_7, · · ·, P_10}. (107)
There is, however, another decomposition going on in this space, which is made possible by
the fact that the interval [−1, 1] is symmetric with respect to the point x = 0. Note that the
polynomials P_n(x) are either even or odd. This suggests that we should consider the following
subsets of X,
    Y″ = {u ∈ X | u(−x) = u(x)}, Z″ = {u ∈ X | u(−x) = −u(x)}. (108)
It is a quite simple exercise to show that any function f(x) defined on an interval [−a, a] may
be written as a sum of an even function and an odd function. This implies that any element
u ∈ X may be expressed in the form
    u = v + w, v ∈ Y″, w ∈ Z″. (109)
Therefore the spaces Y″ and Z″ are algebraic complements. In terms of the inner product of
functions over [−1, 1], however, Y″ and Z″ are also orthogonal complements, since
    ∫_{−1}^{1} f(x) g(x) dx = 0 (110)
whenever f is even and g is odd.
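A brief sketch (ours, with an arbitrary sample polynomial) of this even/odd decomposition and its orthogonality over [−1, 1]:

    import numpy as np

    f = lambda x: 1 + x + x**2 + x**3       # a sample element of X

    even = lambda x: 0.5 * (f(x) + f(-x))   # v, the even part
    odd  = lambda x: 0.5 * (f(x) - f(-x))   # w, the odd part

    x = np.linspace(-1.0, 1.0, 100_001)
    dx = x[1] - x[0]
    print(np.max(np.abs(even(x) + odd(x) - f(x))))   # ~0: f = v + w, as in (109)
    print(np.sum(even(x) * odd(x)) * dx)             # ~0: <v, w> = 0, as in (110)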
The discussion that follows will be centered around Hilbert spaces, i.e., complete inner product
spaces. This is because we shall need the closure properties of these spaces, i.e., that they contain
the limit points of all Cauchy sequences. The following result is very important.
Theorem: Let H be a Hilbert space and Y ⊂ H any closed subspace of H. (Note: This means that
Y contains its limit points. In the case of finite-dimensional spaces, e.g., Rn , a subspace is closed. But
a subspace of an infinite-dimensional vector space need not be closed.) Now let Z = Y ⊥ . Then for
any x ∈ H, there is a unique decomposition of the form
x = y + z, y ∈ Y, z ∈ Z = Y ⊥ . (111)
This is an extremely important result from analysis, and equally important in applications. We’ll
examine its implications a little later, in terms of “best approximations” in a Hilbert space. In the
figure below, we provide a sketch that will hopefully illustrate the situation.
[Figure: sketch of the closed subspace Y (a region through 0), an element x ∈ H, and its orthogonal projection y ∈ Y.]
The space Y is contained between the two lines that emanate from 0. Note that Y lies on both sides
of 0: If p ∈ Y, then −p, which lies on the “other side” of 0, is also an element of Y. The point y ∈ Y is
the orthogonal projection of x onto the set Y and may be viewed as the foot of a line which extends
from x to the set Y in such a way that it is “perpendicular” to Y. The examples
that we consider should clarify this idea.
From Eq. (111), we can define a mapping PY : H → Y , the projection of H onto Y so that
PY : x → y. (112)
Note that
    P_Y : H → Y, P_Y : Y → Y, P_Y : Y^⊥ → {0}. (113)
Moreover, P_Y is idempotent,
    P_Y² = P_Y. (114)
This follows from the fact that P_Y(x) = y and P_Y(y) = y, implying that P_Y(P_Y(x)) = P_Y(x).
The decomposition (111) also obeys a Pythagorean relation,
    ‖x‖² = ‖y‖² + ‖z‖². (115)
This follows from the fact that the norm is defined by means of the inner product:
    ‖x‖² = ⟨x, x⟩ = ⟨y + z, y + z⟩ = ⟨y, y⟩ + ⟨y, z⟩ + ⟨z, y⟩ + ⟨z, z⟩ = ‖y‖² + ‖z‖², (116)
where the final equality results from the fact that ⟨y, z⟩ = ⟨z, y⟩ = 0, since y ∈ Y and z ∈ Z = Y^⊥.
Example: Let H = R³, Y = span{(1, 0, 0)} and Y^⊥ = span{(0, 1, 0), (0, 0, 1)}. Then x = (1, 2, 3)
admits the unique expansion,
    (1, 2, 3) = (1, 0, 0) + (0, 2, 3), (117)
so that P_Y(x) = (1, 0, 0).
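A minimal sketch (ours) of this example, checking the decomposition (117), the orthogonality of the two parts, and the idempotence property (114):

    import numpy as np

    e = np.array([1.0, 0.0, 0.0])        # orthonormal basis of Y

    def P_Y(x):
        """Orthogonal projection of x onto Y = span{e}."""
        return np.dot(x, e) * e

    x = np.array([1.0, 2.0, 3.0])
    y = P_Y(x)
    z = x - y
    print(y, z)                          # (1, 0, 0) and (0, 2, 3)
    print(np.dot(y, z))                  # 0: y is orthogonal to z
    print(np.allclose(P_Y(y), y))        # True: P_Y(P_Y(x)) = P_Y(x)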
Let H denote a Hilbert space. A set {u_1, u_2, · · ·, u_n} ⊂ H is said to form an orthogonal set in H if
    ⟨u_i, u_j⟩ = 0, i ≠ j. (118)
If, in addition,
    ⟨u_i, u_i⟩ = ‖u_i‖² = 1, 1 ≤ i ≤ n, (119)
then the {ui } are said to form an orthonormal set in H.
You will not be surprised by the following result, since you have most probably seen it in earlier
courses in linear algebra.
Theorem: An orthogonal set {u_1, u_2, · · ·} not containing the element 0 is linearly independent.
Proof: Suppose that
    c_1u_1 + c_2u_2 + · · · + c_nu_n = 0. (120)
For each k = 1, 2, · · ·, n, form the inner product of both sides of the above equation with u_k, i.e.,
    ⟨c_1u_1 + c_2u_2 + · · · + c_nu_n, u_k⟩ = ⟨0, u_k⟩ = 0. (121)
By the orthogonality of the u_i, the LHS of the above equation reduces to c_k ‖u_k‖², implying that
    c_k ‖u_k‖² = 0, k = 1, 2, · · ·, n. (122)
By assumption, however, u_k ≠ 0, implying that ‖u_k‖² ≠ 0. This implies that all scalars c_k are zero,
which means that the set {u_1, u_2, · · ·, u_n} is linearly independent.
Note: As you have also seen in courses in linear algebra, given a linearly independent set {v1 , v2 , · · · , vn },
we can always construct an orthonormal set {e1 , e2 , · · · , en } via the Gram-Schmidt orthogonalization
procedure. Moreover,
span{v1 , v2 , · · · , vn } = span{e1 , e2 , · · · , en }. (123)
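A minimal sketch (ours) of the Gram-Schmidt procedure in Rⁿ; the same loop works in any inner product space once np.dot is replaced by the appropriate inner product:

    import numpy as np

    def gram_schmidt(vectors):
        """Orthonormalize a linearly independent list of vectors."""
        basis = []
        for v in vectors:
            w = v - sum(np.dot(v, e) * e for e in basis)  # subtract projections onto e_1, ..., e_k
            basis.append(w / np.linalg.norm(w))           # normalize (w != 0 by independence)
        return basis

    vs = [np.array([1.0, 1.0, 0.0]), np.array([1.0, 0.0, 1.0]), np.array([0.0, 1.0, 1.0])]
    es = gram_schmidt(vs)
    print(np.round([[np.dot(a, b) for b in es] for a in es], 10))  # the identity matrix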
Best approximation in Hilbert spaces
Recall the idea of the best approximation in normed linear spaces, discussed a couple of lectures ago.
Let X be an infinite-dimensional normed linear space. Furthermore, let ui ∈ X, 1 ≤ i ≤ n, be a set
of n linearly independent elements of X and define the n-dimensional subspace,
Sn = span{u1 , u2 , · · · , un }. (124)
Then let x be an arbitrary element of X. We wish to find the best approximation to x in the subspace
S_n. It will be given by the element y_n ∈ S_n that lies closest to x, i.e.,
    y_n = arg min_{y ∈ S_n} ‖x − y‖. (125)
(The variables used above may be different from those used in the earlier lecture.)
We are going to use the same idea of best approximation, but in a Hilbert space setting, where
we have the additional property that an inner product exists in our space. This, of course, opens the
door to the idea of orthogonality, which will play an important role.
Theorem: Let {e_1, e_2, · · ·, e_n} be an orthonormal set in a Hilbert space H. Define Y = span{e_i}_{i=1}^{n};
Y is a subspace of H. Then for any x ∈ H, the best approximation of x in Y is given by the unique
element
    y = P_Y(x) = Σ_{k=1}^{n} c_k e_k (the projection of x onto Y), (126)
where
    c_k = ⟨x, e_k⟩, k = 1, 2, · · ·, n. (127)
The c_k are called the Fourier coefficients of x w.r.t. the set {e_k}.
Furthermore,
    Σ_{k=1}^{n} |c_k|² ≤ ‖x‖². (128)
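A concrete sketch (ours) of the theorem in R³: compute the Fourier coefficients (127), form the projection (126), and check Bessel's inequality (128):

    import numpy as np

    e1 = np.array([1.0, 0.0, 0.0])
    e2 = np.array([0.0, 1.0, 0.0])       # {e1, e2} is an orthonormal set; Y = span{e1, e2}
    x = np.array([1.0, 2.0, 3.0])

    c = [np.dot(x, e) for e in (e1, e2)]          # c_k = <x, e_k>
    y = c[0] * e1 + c[1] * e2                     # best approximation to x in Y
    print(y)                                      # (1, 2, 0)
    print(sum(ck**2 for ck in c) <= np.dot(x, x)) # True: Bessel's inequality (128)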
Proof: The best approximation to x in Y is the point y ∈ Y that minimizes the distance ‖x − v‖,
v ∈ Y, i.e.,
    y = arg min_{v ∈ Y} ‖x − v‖. (129)
Writing v = Σ_{k=1}^{n} c_k e_k, we consider the squared distance as a function of the coefficients
(assumed real for the moment),
    g(c_1, c_2, · · ·, c_n) = ‖x − Σ_{k=1}^{n} c_k e_k‖² (132)
    = ⟨x − Σ_{k=1}^{n} c_k e_k, x − Σ_{l=1}^{n} c_l e_l⟩
    = ⟨x, x⟩ − ⟨x, Σ_{l=1}^{n} c_l e_l⟩ − ⟨Σ_{k=1}^{n} c_k e_k, x⟩ + ⟨Σ_{k=1}^{n} c_k e_k, Σ_{l=1}^{n} c_l e_l⟩. (133)
The first term on the RHS is simply ⟨x, x⟩ = ‖x‖². The second term, using once again the linearity
properties of the inner product, may be expressed as follows,
    ⟨x, Σ_{l=1}^{n} c_l e_l⟩ = Σ_{l=1}^{n} ⟨x, c_l e_l⟩ = Σ_{l=1}^{n} c_l ⟨x, e_l⟩. (134)
Similarly, the third term is
    ⟨Σ_{k=1}^{n} c_k e_k, x⟩ = Σ_{k=1}^{n} c_k ⟨e_k, x⟩ = Σ_{k=1}^{n} c_k ⟨x, e_k⟩, (135)
and the fourth term is
    ⟨Σ_{k=1}^{n} c_k e_k, Σ_{l=1}^{n} c_l e_l⟩ = Σ_{k=1}^{n} Σ_{l=1}^{n} c_k c_l ⟨e_k, e_l⟩ = Σ_{k=1}^{n} c_k², (136)
where the final line is a result of the orthonormality of the e_n.
From all of these calculations, Eq. (133) becomes,
    g(c_1, c_2, · · ·, c_n) = ‖x‖² − Σ_{l=1}^{n} c_l ⟨x, e_l⟩ − Σ_{k=1}^{n} c_k ⟨x, e_k⟩ + Σ_{k=1}^{n} c_k². (137)
Recall that we would like to minimize this function of n variables. We first impose the necessary
stationarity conditions for a minimum,
    ∂g/∂c_p = −⟨x, e_p⟩ − ⟨e_p, x⟩ + 2c_p = 0, p = 1, 2, · · ·, n. (138)
Therefore,
    c_p = ⟨x, e_p⟩, p = 1, 2, · · ·, n, (139)
Finally, substitution of these (real or complex) values of ck into the squared distance function in
Eq. (132) yields the result
    g(c_1, c_2, · · ·, c_n) = ‖x‖² − Σ_{l=1}^{n} |c_l|² − Σ_{k=1}^{n} |c_k|² + Σ_{k=1}^{n} |c_k|²
    = ‖x‖² − Σ_{k=1}^{n} |c_k|²
    ≥ 0, (141)
which then implies Eq. (128). The proof of the Theorem is complete.
1. The above result implies that the element x ∈ H may be expressed uniquely as
x = y + z, y ∈ Y, z ∈ Z = Y ⊥ . (142)
where y = P_Y(x) = Σ_{k=1}^{n} c_k e_k and
    z = x − y = x − Σ_{k=1}^{n} c_k e_k, (143)
where the c_k are given by (127). For l = 1, 2, · · ·, n, take the inner product of e_l with both sides
of this equation to give
    ⟨z, e_l⟩ = ⟨x, e_l⟩ − Σ_{k=1}^{n} c_k ⟨e_k, e_l⟩
    = ⟨x, e_l⟩ − c_l
    = 0. (144)
In other words, z is orthogonal to each of the e_l, i.e., z ∈ Y^⊥.
2. The term z in (142) may be viewed as the residual in the approximation x ≈ y. Its norm then
defines the error of approximation,
    ‖x − y‖ = ‖z‖. (145)
If y_n denotes the best approximation in Y_n = span{e_1, · · ·, e_n}, the associated residuals z_n have norms
    ‖z_n‖ = ‖x − y_n‖. (146)
3. From (141), the inequality
    Σ_{k=1}^{n} |⟨x, e_k⟩|² ≤ ‖x‖² (147)
holds for all appropriate values of n > 0. (If H is finite-dimensional, i.e., dim(H) = N, then
n = 1, 2, · · ·, N.) In other words, the partial sums on the left are bounded from above. This
inequality, known as Bessel's inequality, will have important consequences.
Finally, let us redo the minimization allowing the coefficients c_k to be complex. The squared distance
is then
    f(c_1, · · ·, c_n) = ⟨x − Σ_{k=1}^{n} c_k e_k, x − Σ_{l=1}^{n} c_l e_l⟩
    = ‖x‖² − ⟨Σ_{k=1}^{n} c_k e_k, x⟩ − ⟨x, Σ_{l=1}^{n} c_l e_l⟩ + Σ_{k=1}^{n} Σ_{l=1}^{n} ⟨c_k e_k, c_l e_l⟩
    = ‖x‖² − Σ_{k=1}^{n} [ c_k \overline{⟨x, e_k⟩} + \overline{c_k} ⟨x, e_k⟩ − |c_k|² ]
    = ‖x‖² + Σ_{k=1}^{n} [ |⟨x, e_k⟩|² − c_k \overline{⟨x, e_k⟩} − \overline{c_k} ⟨x, e_k⟩ + |c_k|² ] − Σ_{k=1}^{n} |⟨x, e_k⟩|²
    = ‖x‖² + Σ_{k=1}^{n} |⟨x, e_k⟩ − c_k|² − Σ_{k=1}^{n} |⟨x, e_k⟩|². (148)
The first and last terms are fixed. The middle term is a sum of nonnegative numbers. The minimum
value is achieved when all of these terms are zero. Consequently, f(c_1, · · ·, c_n) is a minimum if and
only if c_k = ⟨x, e_k⟩ for k = 1, 2, · · ·, n. As in the real case, we have
    ‖x‖² − Σ_{k=1}^{n} |⟨x, e_k⟩|² ≥ 0. (149)