Linear Methods of Applied Mathematics Evans M. Harrell II and James V. Herod
The other familiar vector operation we shall use, besides sums and scalar multiples, is the
dot product, which we abstractly call an inner product. (We won’t be concerned with
analogues of the cross product, although you may see these in other courses.) You
probably learned about the dot product by beginning with trigonometry - the dot product of
two vectors is the product of their lengths times the cosine of the angle between them:
v⋅w = ||v|| ||w|| cos ∠(v,w). (2.1)
Later you learned that this could be conveniently calculated from the coordinates of the two
vectors, by multiplying given components together and summing: If the components of the
vector v are υ1, …, υn and those of w are ω1, …, ωn, then
v⋅w = Σ_{k=1}^{n} υk ωk. (2.2)
From the abstract point of view it is best not to begin with angle, since we don't have a
good intuitive picture of the angle between two functions or matrices. Instead, we will
make sense of the “angle” between functions from the connection between these formulae.
In the abstract setting, we shall denote the inner product <v,w> rather than v⋅w. Given an
inner product, we may speak about the length, defined by ||v|| = √<v,v>. We shall usually
call this the norm rather than the length.
An abstract inner product will share most of the properties you learned about for the
ordinary dot product:
1. The inner product is a mapping taking pairs of vectors and producing scalars.
2. Linearity (in the left argument): <αu + βv, w> = α<u,w> + β<v,w> for all scalars α, β.
Note: some texts define an inner product to be linear on the right side rather than the left
side. This makes no practical difference, but if there are complex quantities, you should be
careful to be consistent with whichever convention you follow.
3. Symmetry: <v,w> = <w,v>*. (The asterisk denotes the complex conjugate, in case these
are complex numbers.)
4. Positivity:
<v,v> ≥ 0, and <v,v> = 0 only if v is the zero vector.
5. The Cauchy, Schwarz, or Buniakovskii inequality (depending on nationality):
|<v,w>| ≤ ||v|| ||w||. (2.3)
6. The triangle inequality:
||v + w|| ≤ ||v|| + ||w||.
We need Property 5 if an inner product is to correlate with Eq. (2.1), and we need Property
6 if the length ||v|| = √<v,v> is to make sense geometrically. We will shortly define an
inner product for functions. The reason we can speak of the geometry of functions is that
properties 5 and 6 follow automatically from Properties 1-4. Since this is not at all
obvious, I shall prove it below.
Definition II.1. Given an abstract vector space, an inner product is any mapping
satisfying properties 1-4 above. A vector space with an inner product is called an inner
product space.
Examples II.2.
1. The ordinary dot product (2.2) for n-vectors.
2. A different inner product for 2-vectors:
<v,w>_A := Σ_{j,k=1}^{2} υj Ajk ωk,
where
A := ( 2 1
       1 2 ).
Actually, we could consider the vector space of n-vectors and let A be any positive n×n
matrix. (By definition, a positive matrix is one for which the inner product so defined
satisfies Property 4 of the inner product.) A short computational sketch of this weighted
inner product appears below, after Example 4.
3. The standard inner product for functions. Consider the set of functions which are
continuous on an interval a ≤ x ≤ b. Then the standard inner product on them is the
integral:
<f,g> := ∫_a^b f(x) g(x)* dx
(the asterisk, denoting the complex conjugate, matters only if g is complex-valued).
Another name for this is the L² inner product. We can
use it for functions with discontinuities and even singularities, so long as the singularities
are not too strong.
4. Other inner products for functions. We can generalize Example 3 in various ways.
The first is to insert a positive weight function w(x):
<f,g> := ∫_a^b f(x) g(x)* w(x) dx.
Or - here is a really great one - we could have weight functions and lots of dimensions!
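Although the notebooks accompanying this text use Mathematica or Maple, the same experiments are easy in other systems. Here is a minimal Python sketch of the weighted inner product of Example 2 above, assuming NumPy is available (the name inner_A is ours):

import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])          # the weight matrix from Example 2

def inner_A(v, w):
    # <v,w>_A = sum over j,k of v_j A_jk w_k, for real vectors v and w
    return v @ A @ w

v = np.array([1.0, -1.0])
w = np.array([3.0, 2.0])

print(inner_A(v, w))                      # a scalar, linear in each argument
print(inner_A(v, v))                      # positivity: > 0 for this nonzero v
print(np.all(np.linalg.eigvalsh(A) > 0))  # True: all eigenvalues positive

For a symmetric matrix A, having all eigenvalues positive is one convenient way to check that it is a positive matrix in the sense defined above.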
Theorem II.3. If V is an inner product space, then the CSB inequality (Property 5) and
the triangle inequality (Property 6) hold.
proof (don’t worry - it won’t hurt you!): Because of the positivity property 4, the square
length of any vector is ≥ 0, in particular for any linear combination αv+βw,
0 ≤ <αv+βw, αv+βw> = |α|² ||v||² + |β|² ||w||² + αβ*<v,w> + α*β<w,v>
= |α|² ||v||² + |β|² ||w||² + 2 Re(αβ*<v,w>).
The trick now is to choose the scalars just right. Some clever person - perhaps Hermann
Amandus Schwarz - figured out to choose α = ||w||² and β = −<v,w>. The inequality then
boils down to
0 ≤ ||w||⁴ ||v||² − ||w||² |<v,w>|².
If we collect terms and divide through by ||w||², we get the CSB inequality. (If ||w|| = 0, the
CSB inequality is automatic.)
Having the CSB inequality in hand, we may now define a strange but useful idea - the
angle between two functions. Consider, for example, the interval 0≤x≤L and the functions
f(x) = 1 and g(x) = x. With the standard inner product, we first calculate their norms:
||1|| := √( ∫_0^L 1² dx ) = √L,  and  ||x|| := √( ∫_0^L x² dx ) = √(L³/3).
Since their inner product is
<1,x> = ∫_0^L x dx = L²/2,
the cosine of the “angle” between them is
<1,x> / (||1|| ||x||) = (L²/2) / (√L √(L³/3)) = √3/2,
so the “angle” between the functions 1 and x is π/6, that is, 30°.
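The same computation can be checked with a computer algebra system. Here is a minimal Python sketch, assuming SymPy is available (the helper name inner is ours), which reproduces the norms, the inner product, and the 30° angle:

import sympy as sp

x, L = sp.symbols('x L', positive=True)

def inner(f, g, a, b):
    # standard inner product <f,g> = integral from a to b of f(x) g(x) dx (real functions)
    return sp.integrate(f * g, (x, a, b))

f, g = sp.Integer(1), x                  # the two functions on [0, L]
norm_f = sp.sqrt(inner(f, f, 0, L))      # sqrt(L)
norm_g = sp.sqrt(inner(g, g, 0, L))      # sqrt(L**3/3)
cos_angle = sp.simplify(inner(f, g, 0, L) / (norm_f * norm_g))

print(cos_angle)              # sqrt(3)/2, independent of L
print(sp.acos(cos_angle))     # pi/6, i.e. 30 degrees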
Definition II.4. Two functions f and g are said to be orthogonal if <f,g> = 0. A set of
functions {fj} is orthogonal if <fj,fk> = 0 whenever j ≠ k. The set is said to be
orthonormal if it is orthogonal and ||fj|| = 1 for all j.
With the Kronecker delta symbol, δjk = 0 when j ≠ k and δkk = 1, orthonormality can be
expressed as <fj,fk> = δjk.
Examples II.5.
1. Integral tables, mathematical software, integration by parts (twice), substitution with the
cosine angle-sum rule, and rewriting trigonometric functions as complex exponentials can
all be used to evaluate integrals such as
∫_0^L sin(mπx/L) sin(nπx/L) dx.
Any or all of these methods will lead to the same conclusion, viz.:
∫_0^L sin(mπx/L) sin(nπx/L) dx = (L/2) δmn.
The set of functions
{sin(mπx/L)}_{m=1}^{∞} (2.5)
is orthogonal on the interval [0,L], and to turn it into
an orthonormal set, we normalize the functions by multiplying by the appropriate constant:
{√(2/L) sin(mπx/L)}_{m=1}^{∞}. (2.6)
2. Similarly,
{√(2/L) cos(mπx/L)}_{m=1}^{∞} (2.7)
is orthonormal on the interval [0,L], and we can even include another function, the
constant:
{1/√L} ∪ {√(2/L) cos(mπx/L)}_{m=1}^{∞}.
3. We can mix the previous two sets to have both sines and cosines, as long as we leave
out all of the odd frequencies (keeping only even multiples of πx/L):
{1/√L} ∪ {√(2/L) cos(2mπx/L)}_{m=1}^{∞} ∪ {√(2/L) sin(2nπx/L)}_{n=1}^{∞} (2.8)
is also an orthonormal set. This one is the basis of the usual Fourier series, and is perhaps
the most important of all our orthonormal sets. By the way, we do not claim that the
functions in (2.6), ( 2.7), and (2.8) are orthogonal to functions in the other sets, but only
separately among themselves. For instance, √(2/L) sin(πx/L) is not orthogonal to
1/√L on the interval [0,L].
4. Recall that by Euler’s formula, exp(iα) := e^{iα} := cos(α) + i sin(α). The complex
trigonometric functions exp(2πinx/L) have many useful properties, including
|exp(2πinx/L)| = 1 for all x;
and the set
{(1/√L) exp(2πinx/L)}_{n=−∞}^{∞} (2.9)
is orthonormal on the interval [0,L].
For later purposes, we observe here that the sets of functions (2.5)-(2.9) are each
orthogonal on any interval [a,b] with L = b-a.
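As a quick check of these claims, here is a short Python sketch, assuming SymPy is available (the helper name inner is ours), which builds the matrix of inner products <fj,fk> for a few members of the set (2.8) and verifies that it is the identity matrix:

import sympy as sp

x, L = sp.symbols('x L', positive=True)

def inner(f, g):
    # L2 inner product on [0, L] for real-valued functions
    return sp.integrate(f * g, (x, 0, L))

# a few members of the orthonormal set (2.8)
funcs = [1 / sp.sqrt(L),
         sp.sqrt(2 / L) * sp.cos(2 * sp.pi * x / L),
         sp.sqrt(2 / L) * sp.cos(4 * sp.pi * x / L),
         sp.sqrt(2 / L) * sp.sin(2 * sp.pi * x / L)]

gram = sp.Matrix(len(funcs), len(funcs),
                 lambda j, k: sp.simplify(inner(funcs[j], funcs[k])))
print(gram)    # the 4x4 identity matrix: <fj,fk> = delta_jk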
Before finishing this section we need two more notions about vectors and functions,
thought of as abstract vectors. The first is distance. With the standard inner product, we
would like to define the distance between two functions f and g as
||f − g|| = √( ∫_a^b |f(x) − g(x)|² dx ).
This turns out to be a familiar quantity in data analysis, called the root-mean-square, or
r.m.s., deviation. It is a convenient way of specifying how large the error is when the true
function f is replaced by a mathematical approximation or experimentally measured function
g. It is always positive unless f = g almost everywhere.
Definition II.6. Almost everywhere is a technical phrase meaning that f and g differ for
sufficiently few values of x that all integrals involving f have the same values as those
involving g. For most practical purposes we may regard f and g as the same functions, and
write:
f = g a.e.
The second notion we generalize from ordinary vectors is that of projection. Suppose that
we have a position vector in three dimensions such as v = (3, 4, 5).
Model Problem II.7. We wish to find the vector in the plane spanned by (1,1,1) and
(1,-1,0), which is the closest to v = (3, 4, 5) .
Solution. We solve this problem with a projection. The projection of a vector v onto the
vector v1 is given by the formula:
P_{v1}(v) = (v⋅v1 / ||v1||²) v1. (2.10)
Notice that this points in the direction of v1 but has a length equal to ||v|| cos(θ), where θ is
the angle between v and v1. The length of v1 has nothing to do with the result of this
projection - if we were being very careful, we would say that we were projecting v onto the
direction determined by v1, or onto the line through v1. For similar reasons we notice that
the vector v1 could be normalized to have length 1, so that the
denominator can be ignored - it is 1. In our example,
P_{(1,1,1)}( (3,4,5) ) = (12/3) (1,1,1) = (4,4,4).
(If we think of the vectors as column vectors, the projection operator is equivalent to a
3×3 matrix multiplying them.)
------------------------------------------------
If the basis for the plane consists of orthogonal vectors v1 and v2, as in our example, then
the projection into the plane is just the sum of the projections onto the two vectors:
P_{v1,v2}(v) = Σ_{n=1}^{2} (v⋅vn / ||vn||²) vn. (2.11)
In our example,
P_{v1,v2}(v) = (4,4,4) + (−1/2, 1/2, 0) = (7/2, 9/2, 4).
Calculations like these are easily automated with Mathematica or Maple (see the notebook
or worksheet which accompany this text).
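For instance, here is a minimal Python sketch of Model Problem II.7 using formulae (2.10) and (2.11), assuming NumPy is available (the helper name proj is ours):

import numpy as np

def proj(v, u):
    # projection of v onto the direction of u, formula (2.10)
    return (v @ u) / (u @ u) * u

v  = np.array([3.0, 4.0, 5.0])
v1 = np.array([1.0, 1.0, 1.0])
v2 = np.array([1.0, -1.0, 0.0])    # orthogonal to v1

p = proj(v, v1) + proj(v, v2)      # projection into the plane, formula (2.11)
print(proj(v, v1))                 # [4. 4. 4.]
print(p)                           # [3.5 4.5 4. ]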
Model Problem II.8. We wish to find a) the vector in the plane spanned by (1,1,1) and
(1,2,3), which is the closest to v = (3, 4, 5) and b) the vector in the same plane closest to
(1,-1,0).
Solution. This is similar to the previous problem, except that the vectors defining the plane
are not orthogonal. We need to replace them with a different pair of vectors, which are
linear combinations of the first, but which are orthogonal. (We'll do this later
systematically, with the Gram-Schmidt method.) The formula (2.11) is definitely wrong if
the vectors v n are not orthogonal. After finding the new pair of vectors, however, the
solution will be as before - just sum the projections onto the orthogonal basis vectors.
There is more than one good choice for the pair of orthogonal basis vectors. If we decide
that the pair of orthogonal vectors will include v1 = (1,1,1), then we can look for a second
vector of the form v2 = (1,2,3) − α(1,1,1) which will be orthogonal to v1, but will still lie
in the plane. For orthogonality, we need to solve the vector equation
(1,1,1) ⋅ ( (1,2,3) − α(1,1,1) ) = 6 − 3α = 0,
so α = 2 and v2 = (1,2,3) − 2(1,1,1) = (−1,0,1).
The projection of (3,4,5) into the plane is the sum of its projections onto these vectors, i.e.,
((3+4+5)/3) (1,1,1) + ((−3+0+5)/2) (−1,0,1) = (4,4,4) + (−1,0,1) = (3,4,5).
Perhaps it looks strange to see that the projection of the vector (3,4,5) is itself, but the
interpretation is simply that the vector was already in the plane before it was projected.
In general a vector will be moved (and shortened) when it is projected into a plane, and we
can see this when we project (1,-1,0):
((1−1+0)/3) (1,1,1) + ((−1+0+0)/2) (−1,0,1) = (0,0,0) + (1/2, 0, −1/2) = (1/2, 0, −1/2).
Now that we have a vector in the plane, if we project again, it won't move. Algebraically,
projections satisfy the equation
P² = P.
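A short Python sketch, assuming NumPy (the names are ours), makes both points for Model Problem II.8: it assembles the projection onto the plane as a 3×3 matrix, reproduces the two projections found above, and checks that P² = P:

import numpy as np

v1 = np.array([1.0, 1.0, 1.0])
v2 = np.array([-1.0, 0.0, 1.0])    # (1,2,3) - 2*(1,1,1), orthogonal to v1

def proj_matrix(u):
    # 3x3 matrix of the projection onto the direction of u: (u u^T) / ||u||^2
    return np.outer(u, u) / (u @ u)

P = proj_matrix(v1) + proj_matrix(v2)    # projection onto the plane

print(P @ np.array([3.0, 4.0, 5.0]))     # [3. 4. 5.]  (already in the plane)
print(P @ np.array([1.0, -1.0, 0.0]))    # [ 0.5  0.  -0.5]
print(np.allclose(P @ P, P))             # True: projections satisfy P^2 = P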
We shall next make these same calculations in function space to find the best mean-square
fit of a function f(x) by a nicer expression, such as the functions in (2.8) or polynomials.
(This can be automated with Mathematica or Maple.) The formula simply replaces the dot
product with the standard inner product:
P_{g(x)}(f(x)) := (<f,g> / ||g||²) g(x).
For example, if we wish to find the multiple of sin(3x) which is closest to the function x on
the interval 0 ≤ x ≤ π, we find:
P_{sin(3x)}(x) = ( ∫_0^π x sin(3x) dx / ∫_0^π sin²(3x) dx ) sin(3x) = (2/3) sin(3x).
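The same coefficient can be found with a few lines of Python, assuming SymPy is available:

import sympy as sp

x = sp.symbols('x')
f, g = x, sp.sin(3 * x)

coeff = sp.integrate(f * g, (x, 0, sp.pi)) / sp.integrate(g**2, (x, 0, sp.pi))
print(coeff)          # 2/3
print(coeff * g)      # 2*sin(3*x)/3, the projection of x onto sin(3x)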
Now consider what we mean when we project a function onto the constant function 1.
This should be the best approximation to f(x) consisting of a single number. What could
this be but the average of f? Indeed, the projection formula gives us
P_{1}(f(x)) := (<f,1> / ||1||²) · 1 = ( ∫_a^b f(x) dx ) / (b − a),
which is familiar as the average of a function.
Model Problem II.9. Consider the set of functions on the interval −1 ≤ x ≤ 1. We wish
to find the function in the span of the functions 1, x, and x², which is the closest to f(x) =
cos(πx/2) on the interval -1 ≤ x ≤ 1. In other words, find the best quadratic fit to the
function f in the mean-square sense. In Exercise II.5 you are asked to compare a similar
global polynomial approximation to this function with the Taylor series.
Solution. The calculations can be done with Mathematica or Maple, if you prefer.
Conceptually, the calculations are closely analogous to those of Model Problem II.8.
First let us ask whether the three functions are orthogonal. Actually, no. The function x is
orthogonal to each of the other two, but 1 and x² are not orthogonal. We can see this
immediately because x is odd, while 1 and x² are both even, and both positive a.e.
A more suitable function than x² would be x² minus its projection onto the direction of 1,
that is, x² minus its average, which is easily found to be 1/3. The set of functions
{1, x, x² − 1/3} is an orthogonal set, as you can check.
Then we can project cos(πx/2) onto the span of the three functions 1, x, and x² − 1/3. The
projection onto 1 is the average of cos(πx/2), namely 2/π, and the projection onto x is zero,
since cos(πx/2) is even. The remaining projection is
P_{x²−1/3}(cos(πx/2)) = ( ∫_{−1}^{1} (t² − 1/3) cos(πt/2) dt / ∫_{−1}^{1} (t² − 1/3)² dt ) (x² − 1/3)
= ( −15(24 − 2π²) / (2π³) ) (x² − 1/3).
The best quadratic approximation to cos(πx/2) on the interval -1 ≤ x ≤ 1 is the sum of these
three functions. [Figure: the original function cos(πx/2) and its best quadratic approximation on [−1,1].]
[Figure: for comparison, the Taylor approximation plotted together with the original
function and the best quadratic.]
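If you would like to reproduce these comparisons yourself, here is a minimal Python sketch, assuming SymPy is available (the helper name project is ours); it assembles the best quadratic from the three projections and compares the two approximations at the endpoint x = 1, where cos(πx/2) = 0:

import sympy as sp

x = sp.symbols('x')
f = sp.cos(sp.pi * x / 2)

def project(f, g):
    # the projection of f onto the direction of g: (<f,g>/||g||^2) g, on [-1, 1]
    num = sp.integrate(f * g, (x, -1, 1))
    den = sp.integrate(g**2, (x, -1, 1))
    return (num / den) * g

basis = [sp.Integer(1), x, x**2 - sp.Rational(1, 3)]    # orthogonal on [-1, 1]
best_quadratic = sum(project(f, g) for g in basis)
print(sp.simplify(best_quadratic))

taylor = sp.series(f, x, 0, 3).removeO()                # 1 - pi**2*x**2/8
print(sp.N(best_quadratic.subs(x, 1)), sp.N(taylor.subs(x, 1)))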
The general algorithm for finding the best approximation to a function is as follows.
Suppose that we want to find the best approximation to f(x), a ≤ x ≤ b, of the form
α1g1(x) + α2g2(x) + … + αngn(x),
where g1, …, gn are some functions with nice properties - they may oscillate with definite
frequencies, have simple shapes, etc. They could be chosen to capture the important
features of f(x), while possibly simplifying its form or filtering out some noise.
Step 1. Replace g1, …, gn by an orthogonal set with the same span. (A systematic way to do
this, the Gram-Schmidt procedure, is described below.) Let's call the orthogonal set
h1, …, hn.
Step 2. Project f onto each of the hk.
Step 3. Sum the projections. If P denotes the span of g1, …, gn, then
ProjP(f) = Σ_{k=1}^{n} (<f,hk> / ||hk||²) hk(x). (2.12)
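Here is a minimal Python sketch of these three steps, assuming SymPy is available (the function names gram_schmidt and best_approximation are ours); it uses the standard inner product on an interval [a,b] and reproduces Model Problem II.9 as a check:

import sympy as sp

x = sp.symbols('x')

def inner(f, g, a, b):
    # standard inner product for real-valued functions on [a, b]
    return sp.integrate(f * g, (x, a, b))

def gram_schmidt(gs, a, b):
    # Step 1: replace g1..gn by an orthogonal set h1..hn with the same span
    hs = []
    for g in gs:
        h = g - sum(inner(g, hk, a, b) / inner(hk, hk, a, b) * hk for hk in hs)
        hs.append(sp.expand(h))
    return hs

def best_approximation(f, gs, a, b):
    # Steps 2 and 3: sum the projections of f onto the hk, formula (2.12)
    hs = gram_schmidt(gs, a, b)
    return sp.simplify(sum(inner(f, h, a, b) / inner(h, h, a, b) * h for h in hs))

# check: the best quadratic fit to cos(pi*x/2) on [-1, 1] (Model Problem II.9)
print(best_approximation(sp.cos(sp.pi * x / 2), [1, x, x**2], -1, 1))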
Perhaps the most important functional approximation uses the Fourier functions (2.8), as
we shall learn to call them. The coefficients of these functions in the projection are called
Fourier coefficients, and the approximation is as follows:
f(x) ≅ a0 + Σ_{m=1}^{M} am cos(2πmx/L) + Σ_{n=1}^{N} bn sin(2πnx/L).
The right side should be the projection of f(x) on the span of the sines and cosines
(including the constant) on the right. To get the coefficients we use the analogue of the
formula
(2.10). For example,
am = (2/L) ∫_0^L cos(2πmx/L) f(x) dx, m = 1, 2, … (2.13)
a0 = (1/L) ∫_0^L f(x) dx, (2.14)
bn = (2/L) ∫_0^L sin(2πnx/L) f(x) dx, n = 1, 2, … (2.15)
Formulae (2.13)-(2.15) will be the basis of the Fourier series in the next section.
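In software, the coefficients (2.13)-(2.15) are just integrals, which can be evaluated numerically. Here is a short Python sketch, assuming NumPy and SciPy are available (the function name is ours):

import numpy as np
from scipy.integrate import quad

def fourier_coefficients(f, L, M, N):
    # a0 and the am, bn of formulae (2.13)-(2.15), by numerical quadrature
    a0 = quad(f, 0, L)[0] / L
    a = [2 / L * quad(lambda x: np.cos(2 * np.pi * m * x / L) * f(x), 0, L)[0]
         for m in range(1, M + 1)]
    b = [2 / L * quad(lambda x: np.sin(2 * np.pi * n * x / L) * f(x), 0, L)[0]
         for n in range(1, N + 1)]
    return a0, a, b

# example: f(x) = x on [0, 1]; exactly, a0 = 1/2, am = 0, bn = -1/(n*pi)
a0, a, b = fourier_coefficients(lambda x: x, 1.0, 3, 3)
print(a0, a, b)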
But how do we come up with a basis in the first place? Suppose you are given several
vectors, such as v1,2,3 = (1,0,0), (1,1,0), and (1,1,1), and you want to recombine them to
get an orthonormal, or at least orthogonal set. You can do this by subtracting from each
vector its projections onto the vectors already chosen. The systematic way of doing this is called the
Gram-Schmidt procedure, and it depends a great deal on the order in which it is done.
Model Problem II.10. Find an orthonormal set with the same span as v1,2,3 = (1,0,0),
(1,1,0), and (1,1,1), beginning with w1 = v1 = (1,0,0). (We rename it because it is the
first vector in a new set of recombined vectors.)
Solution. The next vector v2 is not orthogonal to v1, so we subtract off the projection of v2
onto the direction of v1:
w2 = v2 − P_{w1}(v2) = (1,1,0) − (1,0,0) = (0,1,0).
For w3, we begin with v3 and project away the parts in the plane spanned by v1 and v2,
which is the same as the plane spanned by w1 and w2. We find the standard basis vector
w3 = (0,0,1).
Notice that the results are different if we take the same vectors in a different order:
Model Problem II.11. Find an orthonormal set with the same span as v1,2,3 = (1,0,0),
(1,1,0), and (1,1,1), beginning with v3.
Solution. Beginning with w1 = v3 = (1,1,1), which normalizes to (1/√3)(1,1,1), we subtract
from v2 its projection onto w1:
w2 = v2 − P_{w1}(v2) = (1,1,0) − (2/3)(1,1,1) = (1/3)(1,1,−2),
which normalizes to (1/√6)(1,1,−2).
Finally, taking v1, projecting out the components in these directions, and normalizing,
gives us the vector
w3 = (1/√2)(1,−1,0).
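The whole procedure takes only a few lines of Python, assuming NumPy is available (the function name is ours), and running it on the two orderings above shows how much the result depends on the order:

import numpy as np

def gram_schmidt(vectors):
    # orthonormalize the vectors in the given order (the order matters)
    basis = []
    for v in vectors:
        w = v - sum((v @ u) * u for u in basis)   # subtract projections onto earlier directions
        basis.append(w / np.linalg.norm(w))
    return basis

v1, v2, v3 = np.array([1., 0., 0.]), np.array([1., 1., 0.]), np.array([1., 1., 1.])

print(gram_schmidt([v1, v2, v3]))   # the standard basis e1, e2, e3 (Model Problem II.10)
print(gram_schmidt([v3, v2, v1]))   # a different orthonormal basis (Model Problem II.11)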
Model Problem II.12. Construction of the Legendre polynomials. Let us consider the
interval -1 ≤ x ≤ 1, and find a set of orthonormal functions which are more mundane than
the trigonometric functions, namely the polynomials. We begin with the power functions
1, x, x², x³, …. Some of these are orthogonal because some are even functions and others
are odd, but they are not all orthogonal to one another. For instance,
<1, x²> = ∫_{−1}^{1} x² dx = 2/3.
Let us denote the set of orthogonal polynomials we get using the Gram-Schmidt procedure
on the power functions (in this order) p0, p 1, p 2, …. These are important in approximation
theory and differential equations, and are known as the normalized Legendre polynomials.
Beginning with the function x⁰ = 1, after normalization we find
p0(x) = 1/√2.
The next power is x¹ = x. Since x is already orthogonal to any constant function, all we do
is normalize it:
p1(x) = √(3/2) x.
To make x² orthogonal to p0, we need to subtract a constant: x² − 1/3. Because of
symmetry, it is already orthogonal to p1, so we don't worry about p1 yet, and just
normalize:
p2(x) = √(5/2) ( (3/2) x² − 1/2 ).
Similarly, when orthogonalizing x³ we need to project out x but not 1 or x². We find
x³ − 3x/5, or, when normalized:
p3(x) = √(7/2) ( (5/2) x³ − (3/2) x ),
etc.
By the way, Legendre polynomials are traditionally not normalized as we have done, but
rather are denoted Pk(x) and scaled so that Pk(1) = 1. The normalization for Legendre
polynomials of arbitrary index is such that
pn(x) = (n + 1/2)^{1/2} Pn(x).
Most special functions are known to Mathematica and Maple, so calculations with them are
no more difficult than calculations with sines and cosines.
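For instance, here is a minimal Python sketch, assuming SymPy is available (the function name is ours), which carries out the Gram-Schmidt construction above and checks the result against SymPy's built-in Legendre polynomials via the relation pn(x) = (n + 1/2)^{1/2} Pn(x):

import sympy as sp

x = sp.symbols('x')

def inner(f, g):
    # standard inner product on [-1, 1]
    return sp.integrate(f * g, (x, -1, 1))

def normalized_legendre(n_max):
    # Gram-Schmidt on the powers 1, x, x**2, ... over [-1, 1], normalizing at each step
    ps = []
    for n in range(n_max + 1):
        p = x**n - sum(inner(x**n, q) * q for q in ps)   # the q are already normalized
        ps.append(sp.expand(p / sp.sqrt(inner(p, p))))
    return ps

for n, p in enumerate(normalized_legendre(3)):
    print(p)                                                                   # sqrt(2)/2, sqrt(6)*x/2, ...
    print(sp.simplify(p - sp.sqrt(n + sp.Rational(1, 2)) * sp.legendre(n, x)))  # 0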
Exercises.
II.1. Find the norms of and “angles” between the following functions defined for
-1≤x≤1. (In the case of complex quantities, the “angle” is not so meaningful, but you can
still define the cosines of the angles.) Use the standard inner product:
1, x, x², cos(πx), exp(iπx)
II.2. Verify the inner product of Example II.2.5 for 2×2 matrices. Find all matrices
orthogonal to
( 0 1
  1 0 ).
Do they form a subspace of the 2×2 matrices? If so, of what dimension?
II.3. It was remarked above that the projection operator on ordinary 3-vectors is
equivalent to a matrix. What is the matrix for the projection onto (1,2,-3)? What is the
matrix for the projection onto the plane through the origin determined by the vectors
(1,2,-3) and (-2,0,0)?
II.4. Use the Gram-Schmidt procedure to construct other orthonormal sets from the basis
vectors v1,2,3 given in the text, taken in other orders. How many distinct basis sets are
there?
II.6. a) Calculate the Taylor series for cos(πx/2) about the point x=0, up to the term with
x⁴. (Go to x⁶ if you choose to use software to do this problem.)
b) Use the Legendre polynomials to find the polynomial of degree 4 which is the best
approximation to cos(πx/2) for -1≤x≤1.
c) On a single graph, sketch cos(πx/2) and these two approximations.
II.7. In each case below, find a) the multiple of g which is closest in the mean-square
sense to f, and b) a function of the form f - αg which is orthogonal to f. If for some reason
this is not possible, interpret the situation geometrically.
(i) f(x) = sin(πx), g(x) = x, 0 ≤ x ≤ 1
(ii) f(x) = sin(πx), g(x) = x, -1 ≤ x ≤ 1
(iii) f(x) = cos(πx), g(x) = x, -1 ≤ x ≤ 1
(iv) f(x) = x² − 1, g(x) = x² + 1, −1 ≤ x ≤ 1
II.8. Show with explicit examples that the formula (2.11) is definitely wrong if the
vectors vn are not orthogonal.