Lec4 Orth
The point: The framework for approximation by orthogonal bases ('generalized Fourier series') is set up here: the appropriate spaces, the general idea, and examples using polynomials. The Chebyshev polynomials, which have an important connection to Fourier series, are a notable example to be revisited soon. The structure here is the foundation for essential methods in numerical analysis: Gaussian quadrature (efficient integration), Fourier series (efficient approximation) and more.
1 General theory
We now consider the problem of continuous approximation. For a space of functions on $[a, b]$, we seek a basis $\{\phi_j\}$ such that the first $n$ basis functions can be used to approximate the function.
Let $w(x) > 0$ be a positive weight function. For complex-valued functions on $[a, b]$, define the weighted $L^2$ inner product and norm$^1$
$$\langle f, g\rangle_w = \int_a^b f(x)\,\overline{g(x)}\,w(x)\, dx, \qquad \|f\|_w = \left( \int_a^b w(x)\,|f(x)|^2\, dx \right)^{1/2}. \qquad (1)$$
Note that in the case of real functions, the overbar (complex conjugate) can be dropped.
We consider approximating functions in the weighted $L^2$ space
$$L^2_w([a, b]) = \Big\{ f : [a, b] \to \mathbb{C} \ \text{ s.t. } \int_a^b |f(x)|^2 w(x)\, dx < \infty \Big\},$$
which includes, in practice, essentially any function of interest. The norm and inner product (1) are well-defined on this space.
$^1$Technically, $\|f\|_{2,w}$ or something similar should be written to distinguish it from other $L^p$ norms, but the 2 here is implied; the subscript may be dropped entirely if the context is clear.
A basis $\{\phi_j\}$ ($j = 1, 2, \dots$) for $L^2_w([a, b])$ is a set of functions such that any $f$ in the space can be expressed uniquely as
$$f = \sum_{j=1}^\infty c_j \phi_j$$
in the sense that the partial sums $\sum_{j=1}^N c_j \phi_j$ converge in norm, i.e.
$$\lim_{N\to\infty} \Big\|\, f - \sum_{j=1}^N c_j \phi_j \,\Big\|_w = 0.$$
1.1 Orthogonality
Orthogonal bases are particularly nice, both for theory and numerical approximation. Suppose $\{\phi_j\}$ is an orthogonal basis for $L^2_w([a, b])$, that is, a basis where $\langle \phi_i, \phi_j\rangle = 0$ for $i \neq j$. Then the coefficients $c_j$ are easily found by taking the inner product with a basis function to select that component:
$$\langle f, \phi_k\rangle = \sum_j c_j \langle \phi_j, \phi_k\rangle \implies c_k = \frac{\langle f, \phi_k\rangle}{\langle \phi_k, \phi_k\rangle}.$$
Define $S_N = \mathrm{span}(\phi_1, \dots, \phi_N)$ and let $f_N = \sum_{j=1}^N c_j \phi_j$ denote the truncated series; the squared error is then $\|f - f_N\|_w^2 = \sum_{j=N+1}^\infty |c_j|^2\, \|\phi_j\|^2$.
Formally, this is proven by writing $\|g\|^2 = \langle g, g\rangle$ and distributing the inner product:
$$\|f - f_N\|^2 = \langle f - f_N,\, f - f_N\rangle = \Big\langle \sum_{j=N+1}^\infty c_j \phi_j,\ \sum_{k=N+1}^\infty c_k \phi_k \Big\rangle = \sum_{j=N+1}^\infty \sum_{k=N+1}^\infty c_j \overline{c_k}\, \langle \phi_j, \phi_k\rangle.$$
But the inner product is non-zero only when $j = k$, which yields the result (to be rigorous, one has to do some work to prove convergence). The 'error' in the $N$-th approximation is then the sum of the squares of the norms of the omitted terms; e.g. if $c_j \sim C/j^2$ then the squared error looks like $\sum_{j=N+1}^\infty C^2/j^4 \sim C^2/N^3$.
To find the coefficients minimizing the least-squares error $E(c_1, \dots, c_n) = \|f - \sum_{j=1}^n c_j \phi_j\|_w^2$, note that the minimum occurs at a point where the gradient of $E$ is zero, so the conditions for a critical point are
$$0 = \frac{\partial E}{\partial c_i} = -2\int_a^b \Big( f - \sum_{j=1}^n c_j \phi_j \Big)\phi_i\, w(x)\, dx.$$
1.3 The Gram-Schmidt process
Suppose we have a basis $\{f_j\}$ of functions and wish to convert it into an orthogonal basis $\{\phi_j\}$. The Gram-Schmidt process does so, ensuring that
$$\phi_j \in \mathrm{span}(f_0, \dots, f_j).$$
The process is simple: take $f_j$ as the 'starting' function, then subtract off the components of $f_j$ in the direction of the previous $\phi$'s, so that the result is orthogonal to them. That is, we compute the sequence
$$\phi_0 = f_0,$$
$$\phi_1 = f_1 - \frac{\langle f_1, \phi_0\rangle}{\langle \phi_0, \phi_0\rangle}\,\phi_0,$$
$$\vdots$$
$$\phi_j = f_j - \sum_{k=0}^{j-1} \frac{\langle f_j, \phi_k\rangle}{\langle \phi_k, \phi_k\rangle}\,\phi_k.$$
It is easy to verify that this procedure generates the desired orthogonal basis. More generally, we can take the 'starting' function $f_j$ to be any function not in the span of $\phi_0, \dots, \phi_{j-1}$, so it can be chosen to be whatever is most convenient at that step.
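To make the procedure concrete, here is a minimal numerical sketch (an illustration, not code from the notes), assuming SciPy's `quad` for the integrals; the names `inner` and `gram_schmidt` are hypothetical.

```python
import numpy as np
from scipy.integrate import quad

def inner(f, g, w, a, b):
    """Weighted L2 inner product <f, g>_w on [a, b] (real-valued case)."""
    val, _ = quad(lambda x: f(x) * g(x) * w(x), a, b)
    return val

def gram_schmidt(fs, w, a, b):
    """Orthogonalize the 'starting' functions fs[0], fs[1], ... under <.,.>_w."""
    phis = []
    for f in fs:
        # Subtract off the components of f along the previous phi's.
        coeffs = [inner(f, p, w, a, b) / inner(p, p, w, a, b) for p in phis]
        prev = list(phis)
        phis.append(lambda x, f=f, cs=coeffs, ps=prev:
                    f(x) - sum(c * p(x) for c, p in zip(cs, ps)))
    return phis

# Example: monomials with w = 1 on [-1, 1] give multiples of the Legendre polynomials.
monomials = [lambda x, k=k: x**k for k in range(3)]
phis = gram_schmidt(monomials, lambda x: 1.0, -1.0, 1.0)
print([phi(1.0) for phi in phis])  # approx [1, 1, 2/3]: phi_2 = x^2 - 1/3
```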
The normalization $\|\phi_j\|^2 = \langle \phi_j, \phi_j\rangle$ can be freely chosen once the basis is constructed by scaling the $\phi$'s. For common sets of orthogonal functions, there can be more than one 'standard' choice of normalization (e.g. $\|\phi_j\| = 1$), so one should be careful when consulting references.
However, a more judicious choice lets us remove most of the terms in the formula. Suppose $\phi_0, \dots, \phi_j$ have been constructed with the property (2) (each $\phi_k$ a polynomial of degree exactly $k$). Then
$$\phi_j \text{ is orthogonal to } \phi_0, \dots, \phi_{j-1} \implies \phi_j \text{ is orthogonal to } P_{j-1}.$$
Now we take $x\phi_j$ as the starting function for the next basis function. This function is a polynomial of degree $j + 1$, and
$$\langle x\phi_j, \phi_k\rangle = \langle \phi_j, x\phi_k\rangle = 0 \quad \text{for } k \le j - 2,$$
since $x\phi_k$ has degree $\le j - 1$ if $k \le j - 2$. Thus $x\phi_j$ is already orthogonal to the previous basis functions except $\phi_{j-1}$ and $\phi_j$, so the Gram-Schmidt formula only has three terms in it:
$$\phi_{j+1} = x\phi_j - \frac{\langle x\phi_j, \phi_j\rangle}{\langle \phi_j, \phi_j\rangle}\,\phi_j - \frac{\langle x\phi_j, \phi_{j-1}\rangle}{\langle \phi_{j-1}, \phi_{j-1}\rangle}\,\phi_{j-1} = (x - \alpha_j)\phi_j + \beta_j \phi_{j-1}$$
for values $\alpha_j, \beta_j$ that can be computed. The formula can be simplified a bit further; see the textbook. The point here is that the 'three-term' formula allows the polynomials to be generated with the same number of calculations per step, so they are reasonable to compute (via a computer algebra package; not so reasonable by hand!).
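As an illustration of how the recurrence might be implemented (a sketch, not the textbook's method), the following computes $\alpha_j$ and the projection coefficient onto $\phi_{j-1}$ by numerical quadrature; note it subtracts the projection, so its `beta` equals $-\beta_j$ in the convention above.

```python
import numpy as np
from numpy.polynomial import Polynomial
from scipy.integrate import quad

def orth_polys(n, w=lambda x: 1.0, a=-1.0, b=1.0):
    """Monic orthogonal polynomials phi_0..phi_n for weight w on [a, b],
    built by the three-term recurrence (O(1) inner products per step)."""
    ip = lambda p, q: quad(lambda x: p(x) * q(x) * w(x), a, b)[0]
    x = Polynomial([0.0, 1.0])
    phis = [Polynomial([1.0])]
    for j in range(n):
        pj = phis[-1]
        alpha = ip(x * pj, pj) / ip(pj, pj)
        nxt = (x - alpha) * pj
        if j > 0:
            # Projection onto phi_{j-1}; the notes absorb the minus sign
            # into beta_j, writing "+ beta_j phi_{j-1}".
            beta = ip(x * pj, phis[-2]) / ip(phis[-2], phis[-2])
            nxt = nxt - beta * phis[-2]
        phis.append(nxt)
    return phis

# Weight w = 1 on [-1, 1]: recovers the monic Legendre polynomials.
for p in orth_polys(3):
    print(p.coef)   # [1], [0, 1], [-1/3, 0, 1], [0, -3/5, 0, 1]
```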
2.1 Legendre polynomials
For the weight $w(x) = 1$ on $[-1, 1]$, the resulting orthogonal polynomials are the Legendre polynomials, with normalization constants $n_j = \langle \phi_j, \phi_j\rangle$; one can compute $n_j$ explicitly with some work (for much more detail and a standard reference, see Abramowitz and Stegun). Any function in $L^2[-1, 1]$ may then be expressed as a series in terms of this basis as
$$f = \sum_{j=0}^\infty c_j \phi_j, \qquad c_j = \frac{\langle f, \phi_j\rangle}{\langle \phi_j, \phi_j\rangle} = \frac{1}{n_j}\int_{-1}^1 f(x)\phi_j(x)\, dx.$$
(Convention): Traditionally, the Legendre polynomials are normalized so that $\phi_j(1) = 1$. If this is done, then they satisfy $n_j = \langle \phi_j, \phi_j\rangle = \frac{2}{2j+1}$.
Example (linear approx.): As a simple application, suppose we wish to construct the best 'least-squares' line to $f(x) = e^x$ in the interval $[0, 1]$. First, change variables to $s \in [-1, 1]$ using $x = (s + 1)/2$:
$$g(s) = f((s+1)/2) = e^{(s+1)/2}.$$
By the theory, $g$ has a representation in terms of the Legendre basis:
$$g = \sum_{j=0}^\infty c_j \phi_j(s), \qquad c_j = \frac{\langle g, \phi_j\rangle}{\langle \phi_j, \phi_j\rangle}.$$
The first two terms are the best approximation in $P_1$ in the least-squares sense (by the general theory), so we need only calculate $c_0$ and $c_1$ (with $\phi_0 = 1$ and $\phi_1 = s$):
$$c_0 = \frac{\int_{-1}^1 e^{(s+1)/2}\, ds}{\int_{-1}^1 1^2\, ds} = e - 1, \qquad c_1 = \frac{\int_{-1}^1 e^{(s+1)/2}\, s\, ds}{\int_{-1}^1 s^2\, ds} = 9 - 3e.
$$
[Figure: $e^x$ and its least-squares linear approximation on $[0, 1]$.]
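A quick numerical check of these coefficients (a sketch assuming SciPy; not part of the notes):

```python
import numpy as np
from scipy.integrate import quad

# Coefficients of the best least-squares line to e^x on [0, 1],
# computed in the s-variable with phi_0 = 1, phi_1 = s:
g = lambda s: np.exp((s + 1) / 2)
c0 = quad(g, -1, 1)[0] / 2                         # <g,1>/<1,1>
c1 = quad(lambda s: s * g(s), -1, 1)[0] / (2 / 3)  # <g,s>/<s,s>
print(c0, np.e - 1)       # both ~ 1.7183
print(c1, 9 - 3 * np.e)   # both ~ 0.8452

# Map back to x in [0, 1] using s = 2x - 1:
line = lambda x: c0 + c1 * (2 * x - 1)
```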
2.2 Chebyshev polynomials
For the inner product
$$\langle f, g\rangle = \int_{-1}^1 \frac{f(x)g(x)}{\sqrt{1 - x^2}}\, dx$$
we obtain the Chebyshev polynomials. To obtain them, simply transform the integral using $x = \cos\theta$ (so $\theta \in [0, \pi]$); then
$$\langle f, g\rangle = \int_0^\pi f(\cos\theta)\, g(\cos\theta)\, d\theta.$$
We know from Fourier series that the set $\{\cos(k\theta)\}$ is orthogonal on $[0, \pi]$ in the $L^2$ inner product $\int_0^\pi f(\theta)g(\theta)\, d\theta$, from which it follows that the polynomials satisfy
$$T_k(\cos\theta) = \cos(k\theta),$$
so they are given by the explicit formula
$$T_j(x) = \cos(j \cos^{-1}(x)). \qquad (3)$$
Trigonometric identities guarantee that this formula actually produces polynomials. For example,
$$T_2(\cos\theta) = \cos(2\theta) = 2\cos^2\theta - 1 \implies T_2(x) = 2x^2 - 1.$$
The Chebyshev polynomials and the corresponding 'Chebyshev nodes' (their zeros)
$$x_k = \cos\big((k + 1/2)\pi/j\big), \qquad k = 0, \dots, j - 1 \qquad (4)$$
play a key role in numerical analysis due to their close relation to Fourier series, among other nice properties. The three-term recurrence reduces to
$$T_{j+1} = 2xT_j - T_{j-1}, \qquad j = 1, 2, \dots$$
and the first few Chebyshev polynomials are $T_0(x) = 1$ and
$$T_1(x) = x, \quad T_2(x) = 2x^2 - 1, \quad T_3(x) = 4x^3 - 3x, \ \dots$$
The leading coefficient of $T_j(x)$ is $2^{j-1}$ (for $j \ge 1$), so $2^{1-j}T_j(x)$ is a monic polynomial. This is the polynomial of 'least oscillation': it has the smallest maximum absolute value on $[-1, 1]$ among all monic polynomials of that degree.
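A short sketch (illustrative, not from the notes) that evaluates $T_j$ by this recurrence and checks that the nodes (4) are its zeros:

```python
import numpy as np

def cheb_T(j, x):
    """Evaluate T_j(x) via the recurrence T_{j+1} = 2x T_j - T_{j-1}."""
    x = np.asarray(x, dtype=float)
    t_prev, t = np.ones_like(x), x.copy()
    if j == 0:
        return t_prev
    for _ in range(j - 1):
        t_prev, t = t, 2 * x * t - t_prev
    return t

j = 5
nodes = np.cos((np.arange(j) + 0.5) * np.pi / j)  # Chebyshev nodes, eq. (4)
print(np.max(np.abs(cheb_T(j, nodes))))           # ~ 1e-16: the nodes are zeros of T_j
```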
Theorem: The monic Chebyshev polynomials $\tilde{T}_j = 2^{1-j}T_j$ have the minimax property that they have the smallest maximum absolute value on $[-1, 1]$ among monic polynomials in $P_j$. Since $|T_j(x)| \le 1$ and the bound is attained, the minimum is $2^{1-j}$, i.e.
$$\min_{\text{monic } p \in P_j}\ \max_{x \in [-1,1]} |p(x)| = 2^{1-j}.$$
This suggests that Chebyshev polynomials can be used to minimize the maximum error (or at least come close to doing so).
Interpolation: Incidentally, the minimax property in the second form shows that for interpolation, the Chebyshev nodes do a good job of keeping the Lagrange error under control:
$$\left| \frac{f^{(n+1)}(\eta_x)}{(n+1)!} \prod_{k=0}^n (x - x_k) \right| \le \frac{M_n}{(n+1)!}\, 2^{-n}, \qquad M_n = \max_{x \in [-1,1]} |f^{(n+1)}(x)|,$$
which explains why they are such a good choice for interpolation.
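To see the effect numerically, here is an assumed example (not from the notes) interpolating Runge's function $1/(1 + 25x^2)$ at equispaced versus Chebyshev nodes:

```python
import numpy as np
from numpy.polynomial import Polynomial

f = lambda x: 1 / (1 + 25 * x**2)   # Runge's function: hard for equispaced nodes
n = 20
xs = np.linspace(-1, 1, 2001)

for name, nodes in [("equispaced", np.linspace(-1, 1, n + 1)),
                    ("Chebyshev ", np.cos((np.arange(n + 1) + 0.5) * np.pi / (n + 1)))]:
    p = Polynomial.fit(nodes, f(nodes), n)   # degree-n interpolant through the nodes
    print(name, np.max(np.abs(f(xs) - p(xs))))
```

The equispaced error grows rapidly with $n$ near the endpoints (the Runge phenomenon), while the Chebyshev error stays small, as the bound above predicts.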
3 Gaussian quadrature
The structure here provides an elegant way to construct a formula
$$I = \int_a^b f(x)w(x)\, dx \approx \sum_{k=0}^n c_k f(x_k) \qquad (5)$$
with the highest possible degree of accuracy. Here both the coefficients and the nodes are to be chosen. There are $2n + 2$ unknowns, suggesting a degree of accuracy of $2n + 1$. The construction is as follows:
• Let $\langle f, g\rangle_w = \int_a^b f(x)g(x)w(x)\, dx$ (the weighted inner product). By the Gram-Schmidt process, there is a sequence $\{\phi_j\}$ of orthogonal polynomials where $\phi_j$ has degree $j$.
• $\phi_{n+1}$ has $n + 1$ distinct real zeros $x_0, \dots, x_n$ in $[a, b]$.
• Let $\ell_k(x)$ be the $k$-th Lagrange basis polynomial for these zeros and let
$$c_k = \int_a^b \ell_k(x)\, w(x)\, dx.$$
The claim is that with this set of $x_k$'s and $c_k$'s, the formula (5) has degree of accuracy $2n + 1$.
Proof. Suppose $f \in P_{2n+1}$. Since $\phi_{n+1}$ has degree $n + 1$, polynomial division gives
$$f(x) = q(x)\phi_{n+1}(x) + r(x), \qquad q, r \in P_n.$$
Then
$$I = \int_a^b q(x)\phi_{n+1}(x)w(x)\, dx + \int_a^b r(x)w(x)\, dx = \langle q, \phi_{n+1}\rangle_w + \int_a^b r(x)w(x)\, dx = \int_a^b r(x)w(x)\, dx,$$
since $q \in P_n$ is orthogonal to $\phi_{n+1}$; moreover $f(x_k) = r(x_k)$ because $\phi_{n+1}(x_k) = 0$. Last, we need to establish that $I$ and the formula are equal. Because $r(x)$ has degree $\le n$, it is equal to its Lagrange interpolant through the nodes $x_0, \dots, x_n$, so
$$r(x) = \sum_{k=0}^n r(x_k)\ell_k(x) \qquad \Longrightarrow \qquad \int_a^b r(x)w(x)\, dx = \sum_{k=0}^n r(x_k)c_k = \sum_{k=0}^n c_k f(x_k),$$
which establishes equality. To see that the degree of accuracy is exactly $2n + 1$, consider
$$f(x) = \prod_{j=0}^n (x - x_j)^2,$$
which has degree $2n + 2$: here $I > 0$ but the quadrature sum is zero.
Summary (Gaussian quadrature): Let $\{\phi_j\}$ be an orthogonal basis of polynomials in the inner product $\langle f, g\rangle_w = \int_a^b f(x)g(x)w(x)\, dx$ and let $x_0, \dots, x_n$ be the zeros of the polynomial $\phi_{n+1}$, with Lagrange basis $\{\ell_k(x)\}$. Then
$$I = \int_a^b f(x)w(x)\, dx \approx \sum_{k=0}^n c_k f(x_k), \qquad c_k = \int_a^b \ell_k(x)w(x)\, dx, \qquad (6)$$
called the Gaussian quadrature formula for $w(x)$, has degree of accuracy $2n + 1$.
Note that the nodes $x_k$ depend on the degree, so really they should be written $x_{n,k}$ (for $k = 0, \dots, n$) for $\phi_{n+1}$. One can show that, unlike with equally spaced interpolation,
$$\lim_{n\to\infty} \Big|\, I - \sum_{k=0}^n c_k f(x_{n,k}) \,\Big| = 0$$
under reasonable assumptions on $f$, and the rate of convergence is fast (see textbook for details). Thus, Gaussian quadrature does well at reducing the error by adding more points, provided values of $f$ are available at arbitrary points.
The value here is that the accuracy is quite high; however, one has to have control over the choice of nodes for the formula to be usable.
Example: For $w(x) = 1$ on $[-1, 1]$, the orthogonal polynomials are the Legendre polynomials. When $n = 2$, $\phi_3(x) = x^3 - 3x/5$, so the zeros are at $\pm\sqrt{3/5}$ and $0$:
$$\int_{-1}^1 f(x)\, dx \approx c_0 f(-\sqrt{3/5}) + c_1 f(0) + c_2 f(\sqrt{3/5}),$$
and this formula has degree of accuracy 5. The coefficients $c_k$ are computed from the formula (6); while the algebra is messy, they can be computed in advance.
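NumPy ships these precomputed nodes and weights; a quick check of the degree of accuracy (a sketch using NumPy's `leggauss`):

```python
import numpy as np

# 3-point Gauss-Legendre rule (n = 2): nodes -+sqrt(3/5), 0; weights 5/9, 8/9, 5/9.
nodes, weights = np.polynomial.legendre.leggauss(3)
print(nodes, weights)

# Degree of accuracy 5: exact for x^d, d = 0..5, but not d = 6.
for d in range(7):
    exact = 2 / (d + 1) if d % 2 == 0 else 0.0
    print(d, exact, np.dot(weights, nodes**d))
```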
3.1 Gauss-Lobatto quadrature
In a slight variation, we instead include the endpoints $a$ and $b$ among the nodes. The claim is that the formula (supposing $w(x) = 1$ for simplicity)
$$I = \int_a^b f(x)\, dx \approx \sum_{k=0}^n c_k f(y_k)$$
where
$$y_0 = a, \qquad y_n = b, \qquad y_1, y_2, \dots, y_{n-1} = \text{zeros of } p_n' \text{ in } (a, b),$$
$$c_i = \int_a^b \ell_i(x)\, dx, \qquad \ell_i(x) = \text{Lagrange basis polynomial for } y_0, y_1, \dots, y_n,$$
is exact for all $f \in P_{2n-1}$. That is, it has a degree of accuracy two less than Gaussian quadrature ($2n - 1$ vs. $2n + 1$). The Lobatto version is used when the endpoints are needed in the approximation (and as a building block for other methods that need the endpoints).
The existence of the zeros $y_k$ is clear: $p_n$ has $n$ distinct real zeros (the nodes we used for Gaussian quadrature), so by Rolle's theorem $p_n'$ has $n - 1$ distinct real zeros between them. The coefficient formula for $c_i$ has the same derivation as before.
To show the degree of accuracy, suppose $f \in P_{2n-1}$ and use polynomial division to write (note that $p_n'$ has degree $n - 1$)
$$f(x) = q(x)p_n'(x) + r(x), \qquad q, r \in P_n.$$
Then, after an integration by parts,
$$\int_a^b f(x)\, dx = q(x)p_n(x)\Big|_a^b - \int_a^b q'(x)p_n(x)\, dx + \int_a^b r(x)\, dx = q(b)p_n(b) - q(a)p_n(a) + \sum_{k=0}^n c_k r(y_k),$$
since $q' \in P_{n-1}$ is orthogonal to $p_n$ and $r \in P_n$ is integrated exactly by the quadrature rule.
3.2 Example: Laplace transform (Laguerre)
The Laplace transform is defined as
$$F(s) = \mathcal{L}[f(t)] = \int_0^\infty f(t)e^{-st}\, dt.$$
Often, we need to evaluate this transform at many different points $s$ for a given function $f(t)$. This integral is over an infinite interval, which can be handled using Gaussian quadrature. First, scale out the $s$ with $x = st$ to get
$$F(s) = \frac{1}{s}\int_0^\infty f(x/s)e^{-x}\, dx.$$
Now, to be efficient, consider Gaussian quadrature on $[0, \infty)$ with weight $w(x) = e^{-x}$; the inner product on $L^2_w([0, \infty))$ is
$$\langle f, g\rangle = \int_0^\infty f(x)g(x)e^{-x}\, dx.$$
Applying Gram-Schmidt to the monomials $1, x, x^2, \dots$ gives the (monic) Laguerre polynomials: $p_0 = 1$, $p_1 = x - 1$, then
$$p_2(x) = x^2 - \frac{\langle x^2, x - 1\rangle}{\langle x - 1, x - 1\rangle}(x - 1) - \frac{\langle x^2, 1\rangle}{\langle 1, 1\rangle}\cdot 1 = x^2 - 4x + 2,$$
and so on. With two points, the nodes are the zeros of $p_2$:
$$x_0 = 2 - \sqrt{2}, \qquad x_1 = 2 + \sqrt{2}.$$
One can, of course, go to a much higher degree if more accuracy is needed; the weights $c_i$ and nodes $x_i$ are not pleasant to compute, but this can be done in advance to high accuracy and the values then stored as hard-coded constants in the algorithm.
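A sketch of this in practice, using NumPy's `laggauss` for the Gauss-Laguerre nodes and weights; the test function $f(t) = t$, with exact transform $1/s^2$, is an assumed example:

```python
import numpy as np

# Gauss-Laguerre rule for w(x) = e^{-x} on [0, inf).
# For n = 2 the nodes are 2 -+ sqrt(2), the roots of p_2 = x^2 - 4x + 2.
nodes, weights = np.polynomial.laguerre.laggauss(2)
print(nodes)   # ~ [0.5858, 3.4142]

f = lambda t: t   # exact transform: F(s) = 1/s^2
for s in (1.0, 2.0, 5.0):
    F = np.dot(weights, f(nodes / s)) / s
    print(s, F, 1 / s**2)   # exact here, since f(x/s) has degree <= 2n-1
```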
3.3 Singular integrals, briefly
There are many strategies for computing singular integrals. A few starting ideas are presented
here. Take, for example, the integral
$$I = \int_0^1 \frac{\sin x}{x^{3/2}}\, dx.$$
Option 1 (brute force): We can use an open Newton-Cotes formula, which avoids evaluating at the singular endpoint $x = 0$. However, the singularity means standard convergence results may not apply.
Option 2 (split): Split the integral into a 'small' singular region and a 'large' good region:
$$I = \int_0^\epsilon f(x)\, dx + \int_\epsilon^1 f(x)\, dx = I_{1,\epsilon} + I_{2,\epsilon}.$$
For the good region, just use any normal method; the integrand is not singular there. (Note that an adaptive method is suggested, since one probably needs higher accuracy near $x = \epsilon$.) For the bad region, integrate the power series term by term analytically:
$$\int_0^\epsilon f(x)\, dx = \sum_{n=0}^\infty \frac{(-1)^n\, \epsilon^{2n + 1/2}}{(2n+1)!\,(2n + 1/2)}.$$
We now choose $\epsilon$ small and enough terms of the sum to get the desired accuracy.
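A minimal sketch of this strategy (assuming SciPy's `quad` for the good region and the arbitrary choice $\epsilon = 0.1$):

```python
import numpy as np
from math import factorial
from scipy.integrate import quad

eps = 0.1

# Bad region [0, eps]: power series, integrated term by term.
I1 = sum((-1)**n * eps**(2 * n + 0.5) / (factorial(2 * n + 1) * (2 * n + 0.5))
         for n in range(8))

# Good region [eps, 1]: the integrand is smooth here; any standard method works.
I2, _ = quad(lambda x: np.sin(x) / x**1.5, eps, 1)

print(I1 + I2)   # ~ 1.9351
```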
Option 3 (Gaussian quadrature): In some cases, one can use Gaussian quadrature, putting the singularity into the weight function. Here we write
$$I = \int_0^1 \frac{\sin x / x}{x^{1/2}}\, dx, \qquad \langle f, g\rangle = \int_0^1 \frac{f(x)g(x)}{x^{1/2}}\, dx,$$
then proceed by obtaining the orthogonal polynomials and their zeros. This approach can be useful if the calculation of the weights/nodes can be done in advance.