
Math 563 Lecture Notes

Approximation with orthogonal bases


Spring 2020

The point: We set up the framework for approximation by orthogonal bases ('generalized Fourier series'): the appropriate spaces, the general idea, and examples using polynomials. The Chebyshev polynomials, which have an important connection to Fourier series, are a notable example to be revisited soon. The structure here is the foundation for essential methods in numerical analysis: Gaussian quadrature (efficient integration), Fourier series (efficient approximation) and more.

Related reading: Details on orthogonal polynomials can be found in Quarteroni, 10.1.

1 General theory
We now consider the problem of continuous approximation. For a space of functions on $[a, b]$, we seek a basis $\{\phi_j\}$ such that the first $n$ basis functions can be used to approximate a given function. Let $w(x) > 0$ be a weight function. For complex-valued functions on $[a, b]$, define the 'weighted' $L^2$ norm¹

$$\|f\|_w = \left( \int_a^b w(x)\,|f(x)|^2 \, dx \right)^{1/2}$$

which has an associated (complex) inner product

$$\langle f, g \rangle_w = \int_a^b f(x)\,\overline{g(x)}\, w(x) \, dx. \tag{1}$$

Note that in the case of real functions, the overbar (complex conjugate) can be dropped. We consider approximating functions in the 'weighted $L^2$' space

$$L^2_w([a, b]) = \Big\{ f : [a, b] \to \mathbb{C} \ \text{ s.t. } \int_a^b |f(x)|^2 w(x) \, dx < \infty \Big\},$$

which includes, in practice, essentially any function of interest. The norm and inner product (1) are well-defined on this space.

¹Technically, $\|f\|_{2,w}$ or something similar should be written to distinguish it from other $L^p$ norms, but the 2 here is implied; the subscript may be dropped entirely if the context is clear.

A basis $\{\phi_j\}$ ($j = 1, 2, \cdots$) for $L^2_w([a, b])$ is a set of functions such that any $f$ in the space can be expressed uniquely as

$$f = \sum_{j=1}^{\infty} c_j \phi_j$$

in the sense that the partial sums $\sum_{j=1}^{N} c_j \phi_j$ converge in norm, i.e.

$$\lim_{N \to \infty} \Big\| f - \sum_{j=1}^{N} c_j \phi_j \Big\|_w = 0.$$

A set of functions $\{f_j\}$ is called orthogonal if $\langle f_i, f_j \rangle = 0$ for $i \neq j$.

1.1 Orthogonality
Orthogonal bases are particularly nice, both for theory and numerical approximation. Suppose $\{\phi_j\}$ is an orthogonal basis for $L^2_w([a, b])$, that is, a basis where $\langle \phi_i, \phi_j \rangle = 0$ for $i \neq j$.

Coefficients: The coefficients $c_j$ in the representation

$$f = \sum_j c_j \phi_j$$

are easily found by taking the inner product with a basis function to select that component:

$$\langle f, \phi_k \rangle = \sum_j c_j \langle \phi_j, \phi_k \rangle \implies c_k = \frac{\langle f, \phi_k \rangle}{\langle \phi_k, \phi_k \rangle}.$$

Best approximation property: The approximation

$$f_N = \sum_{j=1}^{N} c_j \phi_j = \text{first } N \text{ terms of the series for } f$$

is the best approximation to $f$ in the subspace

$$S_N = \mathrm{span}(\phi_1, \cdots, \phi_N)$$

in the sense that

$$g = f_N \ \text{ minimizes } \ \|f - g\|_w \ \text{ for } g \in S_N.$$
Error: We also have Parseval's theorem:

$$\Big\| f - \sum_{j=1}^{N} c_j \phi_j \Big\|_w^2 = \sum_{j=N+1}^{\infty} c_j^2 \|\phi_j\|_w^2.$$

Formally, this is proven by writing $\|g\|_w^2 = \langle g, g \rangle$ and distributing the inner product:

$$\|f - f_N\|_w^2 = \langle f - f_N, f - f_N \rangle = \Big\langle \sum_{j=N+1}^{\infty} c_j \phi_j, \sum_{k=N+1}^{\infty} c_k \phi_k \Big\rangle = \sum_{j=N+1}^{\infty} \sum_{k=N+1}^{\infty} c_j c_k \langle \phi_j, \phi_k \rangle.$$

But the inner product is non-zero only when $j = k$, which yields the result (to be rigorous, one has to do some work to prove convergence). The 'error' in the $N$-th approximation is then the sum of the squares of the norms of the omitted terms; e.g. if $c_j \sim C/j^2$ then the error looks like $\sum_{j=N+1}^{\infty} C/j^4 \sim C/N^3$.
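For a concrete feel for these formulas, here is a minimal numerical sketch (our own illustration, not part of the notes), using the orthogonal family $\phi_k(t) = \cos(kt)$ on $[0, \pi]$ with $w = 1$: the coefficients come straight from the projection formula above, and the partial-sum error shrinks as $N$ grows.

```python
# Minimal sketch: generalized Fourier coefficients
# c_k = <f, phi_k> / <phi_k, phi_k> for phi_k(t) = cos(k t) on [0, pi].
import numpy as np
from scipy.integrate import quad

f = lambda t: t * (np.pi - t)   # an arbitrary smooth test function

def coeff(k):
    num, _ = quad(lambda t: f(t) * np.cos(k * t), 0, np.pi)
    den, _ = quad(lambda t: np.cos(k * t) ** 2, 0, np.pi)
    return num / den

t = np.linspace(0, np.pi, 400)
for N in [2, 4, 8, 16]:
    fN = sum(coeff(k) * np.cos(k * t) for k in range(N + 1))
    print(N, np.max(np.abs(f(t) - fN)))   # max error decreases with N
```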

1.2 (Continuous) least squares


The properties listed above suggest we can use orthogonal bases to construct good approximations. Suppose $\{\phi_j\}$ is a basis (not necessarily orthogonal) and

$$f \approx \sum_{j=1}^{N} c_j \phi_j \ \text{ minimizes } \ \|f - g\|_w \ \text{ over } S_N.$$

Then the $c_j$'s minimize the $L^2$ error

$$E(c) = \Big\| f - \sum_{j=1}^{N} c_j \phi_j \Big\|_w^2 = \int_a^b \Big( f - \sum_{j=1}^{N} c_j \phi_j \Big)^2 w(x) \, dx.$$

To find the coefficients, note that the minimum occurs at a point where the gradient of $E$ is zero, so the conditions for a critical point are

$$0 = \frac{\partial E}{\partial c_i} = -2 \int_a^b \Big( f - \sum_{j=1}^{N} c_j \phi_j \Big) \phi_i \, w(x) \, dx.$$

It follows that $E$ is minimized when

$$\int_a^b f(x) \phi_i(x) w(x) \, dx = \sum_{j=1}^{N} \Big( \int_a^b \phi_i \phi_j w(x) \, dx \Big) c_j, \quad i = 1, \cdots, N.$$

In matrix form, this is a linear system

$$A c = f, \quad a_{ij} = \int_a^b \phi_i \phi_j w(x) \, dx = \langle \phi_i, \phi_j \rangle_w, \quad f_i = \int_a^b f(x) \phi_i(x) w(x) \, dx = \langle f, \phi_i \rangle_w.$$
For numerical stability and computational efficiency, we want the matrix $A$ to be as simple as possible. If the basis is orthogonal, then the equations reduce to a diagonal system, since $\langle \phi_i, \phi_j \rangle = 0$ for $i \neq j$, and the solution is just

$$c_i = \frac{\langle f, \phi_i \rangle_w}{\langle \phi_i, \phi_i \rangle_w}.$$

Exploiting the structure of the orthogonal basis is our starting point for building good numerical approximations.
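To see why the choice of basis matters numerically, here is a small sketch (our own illustration): for the monomial basis $1, x, x^2, \cdots$ on $[0, 1]$ with $w = 1$, the Gram matrix $A$ is the notoriously ill-conditioned Hilbert matrix, while for an orthogonal basis $A$ is diagonal.

```python
# Sketch: the Gram matrix a_ij = <phi_i, phi_j> for two bases.
import numpy as np
from scipy.integrate import quad

N = 6
# monomials on [0, 1]: a_ij = int_0^1 x^(i+j) dx = 1/(i+j+1) (Hilbert matrix)
A = np.array([[1.0 / (i + j + 1) for j in range(N)] for i in range(N)])
print(np.linalg.cond(A))        # ~1e7 already for N = 6

# Legendre polynomials on [-1, 1]: the same matrix is diagonal
leg = np.polynomial.legendre.Legendre.basis
A_leg = np.array([[quad(lambda x: leg(i)(x) * leg(j)(x), -1, 1)[0]
                   for j in range(N)] for i in range(N)])
print(np.round(A_leg, 12))      # nonzero entries only on the diagonal
```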

1.3 The Gram-Schmidt process
Suppose we have a basis $\{f_j\}$ of functions and wish to convert it into an orthogonal basis $\{\phi_j\}$. The Gram-Schmidt process does so, ensuring that

$$\phi_j \in \mathrm{span}(f_0, \cdots, f_j).$$

The process is simple: take $f_j$ as the 'starting' function, then subtract off the components of $f_j$ in the direction of the previous $\phi$'s, so that the result is orthogonal to them. That is, we compute the sequence
$$\phi_0 = f_0$$
$$\phi_1 = f_1 - \frac{\langle f_1, \phi_0 \rangle}{\langle \phi_0, \phi_0 \rangle} \phi_0$$
$$\vdots$$
$$\phi_j = f_j - \sum_{k=0}^{j-1} \frac{\langle f_j, \phi_k \rangle}{\langle \phi_k, \phi_k \rangle} \phi_k.$$

It is easy to verify that this procedure generates the desired orthogonal basis.

More generally, we can take the ‘starting’ function fj to be any function not in the span of
φ0 , · · · , φj−1 , so it can be chosen to be whatever is most convenient at this step.

Caution (normalization): The norm-squared

$$\|\phi_j\|^2 = \langle \phi_j, \phi_j \rangle$$

can be freely chosen once the basis is constructed by scaling the $\phi$'s (normalization). For common sets of orthogonal functions, there can be more than one 'standard' choice of normalization (e.g. $\|\phi_j\| = 1$), so one should be careful when using such references.
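As a sketch of the procedure in code (using sympy for exact integrals; our own illustration, not the notes' code), Gram-Schmidt on the monomials over $[-1, 1]$ with $w = 1$ produces the monic Legendre polynomials of Section 2.1:

```python
# Sketch: Gram-Schmidt on 1, x, x^2, ... in the L^2(-1, 1) inner product.
import sympy as sp

x = sp.symbols('x')

def inner(f, g):
    """<f, g> = int_{-1}^{1} f g dx (weight w = 1)."""
    return sp.integrate(f * g, (x, -1, 1))

def gram_schmidt(fs):
    """Orthogonalize the functions fs in the order given."""
    phis = []
    for f in fs:
        phi = f
        for p in phis:
            phi -= inner(f, p) / inner(p, p) * p
        phis.append(sp.expand(phi))
    return phis

print(gram_schmidt([1, x, x**2, x**3]))
# [1, x, x**2 - 1/3, x**3 - 3*x/5]
```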

1.4 The ‘three-term’ recurrence


When considering a polynomial basis, we can simplify the process. We seek an orthogonal basis $\{\phi_j\}$ such that $\phi_j$ is a polynomial of degree $j$, so that

$$\mathrm{span}(\phi_0, \cdots, \phi_j) = \mathcal{P}_j. \tag{2}$$

One could use the starting basis $1, x, x^2, \cdots$ and then apply Gram-Schmidt.

However, a more judicious choice lets us remove most of the terms in the formula. Suppose $\phi_0, \cdots, \phi_j$ have been constructed with the property (2). Then $\phi_j$ is orthogonal to $\phi_0, \cdots, \phi_{j-1}$, hence $\phi_j$ is orthogonal to $\mathcal{P}_{j-1}$.

Now we take $x\phi_j$ as the starting function for the next basis function. This function is a polynomial of degree $j + 1$ and

$$\langle x\phi_j, \phi_k \rangle = \langle \phi_j, x\phi_k \rangle = 0 \ \text{ if } k \leq j - 2$$

since $x\phi_k$ has degree $\leq j - 1$ if $k \leq j - 2$. Thus $x\phi_j$ is already orthogonal to the previous basis functions except $\phi_{j-1}$ and $\phi_j$, so the Gram-Schmidt formula has only three terms in it:

$$\phi_{j+1} = x\phi_j - \frac{\langle x\phi_j, \phi_j \rangle}{\langle \phi_j, \phi_j \rangle} \phi_j - \frac{\langle x\phi_j, \phi_{j-1} \rangle}{\langle \phi_{j-1}, \phi_{j-1} \rangle} \phi_{j-1} = (x - \alpha_j)\phi_j + \beta_j \phi_{j-1}$$

for values $\alpha_j$, $\beta_j$ that can be computed. The formula can be simplified a bit further; see the textbook. The point here is that the 'three-term' formula allows the polynomials to be generated with the same number of calculations per step, so they are reasonable to compute (via a computer algebra package; not so reasonable by hand!).

2 Approximation by orthogonal polynomials


2.1 Legendre polynomials
To start, consider $[-1, 1]$ and $w(x) = 1$. We use Gram-Schmidt and the three-term recurrence trick to find the basis elements, which are the Legendre polynomials. The first few calculations are as follows:

$$\phi_0(x) = 1$$
$$\phi_1(x) = x - \frac{\langle x, 1 \rangle}{\langle 1, 1 \rangle} \cdot 1 = x$$
$$\phi_2(x) = x^2 - \frac{\langle x^2, x \rangle}{\langle x, x \rangle} x - \frac{\langle x^2, 1 \rangle}{\langle 1, 1 \rangle} \cdot 1 = x^2 - 0 \cdot x - \frac{1}{3} = x^2 - 1/3$$
$$\phi_3(x) = x\phi_2 - \frac{\langle x\phi_2, \phi_2 \rangle}{\langle \phi_2, \phi_2 \rangle} \phi_2 - \frac{\langle x\phi_2, x \rangle}{\langle x, x \rangle} x = x^3 - \frac{3}{5} x$$
and so on. One can obtain a recurrence for the Legendre polynomials by some further work. Explicitly, the orthogonality relation is that

$$\int_{-1}^{1} \phi_i(x) \phi_j(x) \, dx = \begin{cases} 0 & i \neq j \\ n_j & i = j \end{cases}$$

and one can compute $n_j$ explicitly with some work (for much more detail and a standard reference, see Abramowitz and Stegun). Any function in $L^2[-1, 1]$ may then be expressed as a series in terms of this basis as

$$f = \sum_{j=0}^{\infty} c_j \phi_j, \quad c_j = \frac{\langle f, \phi_j \rangle}{\langle \phi_j, \phi_j \rangle} = \frac{1}{n_j} \int_{-1}^{1} f(x) \phi_j(x) \, dx.$$

(Convention): Traditionally, the Legendre polynomials are normalized so that $\phi_j(1) = 1$. If this is done, then they satisfy

$$(j+1)\phi_{j+1} - (2j+1) x \phi_j + j \phi_{j-1} = 0.$$

With this convention, $\phi_2 = \frac{1}{2}(3x^2 - 1)$ and $\phi_3 = \frac{1}{2}(5x^3 - 3x)$ and so on.

Example (linear approx.): As a simple application, suppose we wish to construct the best 'least-squares' line to $f(x) = e^x$ in the interval $[0, 1]$. First, change variables to $s \in [-1, 1]$ using $x = (s+1)/2$:

$$g(s) = f((s+1)/2) = e^{(s+1)/2}.$$

By the theory, $g$ has a representation in terms of the Legendre basis:

$$g = \sum_{j=0}^{\infty} c_j \phi_j(s), \quad c_j = \frac{\langle g, \phi_j \rangle}{\langle \phi_j, \phi_j \rangle}.$$

The first two terms are the best approximation in $\mathcal{P}_1$ in the least-squares sense (by the general theory), so we need only calculate $c_0$ and $c_1$ (with $\phi_0 = 1$ and $\phi_1 = s$):

$$c_0 = \frac{\int_{-1}^{1} e^{(s+1)/2} \, ds}{\int_{-1}^{1} 1^2 \, ds} = e - 1, \quad c_1 = \frac{\int_{-1}^{1} e^{(s+1)/2} s \, ds}{\int_{-1}^{1} s^2 \, ds} = 9 - 3e.$$

Converting back to $x \in [0, 1]$, we have the approximation

$$f(x) = g(2x - 1) \approx (e - 1) + (9 - 3e)(2x - 1).$$

This line minimizes the $L^2$ error in $[0, 1]$.

[Figure: $f(x) = e^x$ and its least-squares linear approximation on $[0, 1]$.]
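A quick numerical check of these coefficients (a sketch using scipy; not part of the notes):

```python
# Verify c0 = e - 1 and c1 = 9 - 3e for the least-squares line to e^x.
import numpy as np
from scipy.integrate import quad

g = lambda s: np.exp((s + 1) / 2)                    # f moved to [-1, 1]
c0 = quad(g, -1, 1)[0] / 2                           # <g, 1> / <1, 1>
c1 = quad(lambda s: g(s) * s, -1, 1)[0] / (2 / 3)    # <g, s> / <s, s>

print(c0, np.e - 1)       # both ~ 1.71828
print(c1, 9 - 3 * np.e)   # both ~ 0.84525
```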

2.2 Chebyshev polynomials
For the inner product

$$\langle f, g \rangle = \int_{-1}^{1} \frac{f(x) g(x)}{\sqrt{1 - x^2}} \, dx$$

we obtain the Chebyshev polynomials. To obtain them, simply transform the integral using $x = \cos\theta$ (so $\theta \in [0, \pi]$); then

$$\langle f, g \rangle = \int_0^{\pi} f(\cos\theta) g(\cos\theta) \, d\theta.$$

We know from Fourier series that the set $\{\cos(k\theta)\}$ is orthogonal on $[0, \pi]$ in the $L^2$ inner product $\int_0^{\pi} f(\theta) g(\theta) \, d\theta$, from which it follows that the polynomials satisfy

$$T_k(\cos\theta) = \cos(k\theta),$$

so they are given by the explicit formula

$$T_j(x) = \cos(j \cos^{-1}(x)). \tag{3}$$

Trigonometric identities guarantee that this formula actually produces polynomials. For example,

$$T_2(\cos\theta) = \cos(2\theta) = 2\cos^2(\theta) - 1 \implies T_2 = 2x^2 - 1.$$

The Chebyshev polynomials and the corresponding 'Chebyshev nodes' (the zeros)

$$x_k = \cos\big( (k + 1/2)\pi/j \big), \quad k = 0, \cdots, j - 1 \tag{4}$$

play a key role in numerical analysis due to their close relation to Fourier series, among other nice properties. The three-term recurrence reduces to

$$T_{j+1} = 2 x T_j - T_{j-1}, \quad j = 1, 2, \cdots$$

and the first few Chebyshev polynomials are $T_0(x) = 1$ and

$$T_1(x) = x, \quad T_2 = 2x^2 - 1, \quad T_3 = 4x^3 - 3x, \cdots$$

The leading coefficient of $T_j(x)$ is $2^{j-1}$, so $2^{1-j} T_j(x)$ is a monic polynomial.
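Here is a small sketch (our own) checking the identity $T_j(\cos\theta) = \cos(j\theta)$ with $T_j$ built from the three-term recurrence:

```python
# Sketch: build T_j by the recurrence T_{j+1} = 2 x T_j - T_{j-1}
# and verify T_j(cos t) = cos(j t).
import numpy as np

def chebyshev_T(j, x):
    Tprev, Tcur = np.ones_like(x), x      # T_0, T_1
    if j == 0:
        return Tprev
    for _ in range(j - 1):
        Tprev, Tcur = Tcur, 2 * x * Tcur - Tprev
    return Tcur

t = np.linspace(0, np.pi, 7)
print(np.allclose(chebyshev_T(5, np.cos(t)), np.cos(5 * t)))   # True
```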

2.2.1 Minimax property


An interesting question is to determine the polynomial that minimizes

$$\max_{x \in [-1, 1]} |p(x)| \ \text{ for monic } p \in \mathcal{P}_j.$$

This is the polynomial of 'least oscillation': it has the smallest peaks among all monic polynomials of that degree.

Surprisingly, the answer is that the scaled Chebyshev polynomial

$$t_j(x) = 2^{1-j} T_j(x) = x^j + \cdots$$

is this minimizer.

Theorem: The monic Chebyshev polynomials $t_j$ have the minimax property that

$$t_j(x) \ \text{ minimizes } \ \max_{x \in [-1, 1]} |p(x)| \ \text{ for monic } p \in \mathcal{P}_j.$$

Since $|T_j(x)| \leq 1$ and can equal 1, the minimum is $2^{1-j}$, i.e.

$$\min_{\text{monic } p \in \mathcal{P}_j} \Big( \max_{x \in [-1, 1]} |p(x)| \Big) = 2^{1-j}.$$

Equivalently, the Chebyshev nodes (4) (the zeros of $T_j(x)$) minimize

$$\max_{x \in [-1, 1]} \Big| \prod_{k=0}^{j-1} (x - x_k) \Big|$$

among all sets of nodes $x_0, \cdots, x_{j-1}$.

This suggests that Chebyshev polynomials can be used to minimize the maximum error (or at least come close to doing so).

Interpolation: Incidentally, the minimax property in the second form shows that for interpolation at $n + 1$ points, the Chebyshev nodes do a good job of keeping the Lagrange error under control:

$$\Big| \frac{f^{(n+1)}(\eta_x)}{(n+1)!} \prod_{k=0}^{n} (x - x_k) \Big| \leq \frac{M_n}{(n+1)!} 2^{-n}, \quad M_n = \max_{x \in [-1, 1]} |f^{(n+1)}(x)|,$$

which explains why they are such a good choice for interpolation.
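To illustrate (our own example; the test function $1/(1 + 25x^2)$ is Runge's classic case, not from the notes), compare the maximum interpolation error for equispaced versus Chebyshev nodes:

```python
# Sketch: equispaced vs Chebyshev interpolation nodes on [-1, 1].
import numpy as np
from numpy.polynomial import polynomial as P

f = lambda x: 1 / (1 + 25 * x**2)
n = 16                                  # polynomial degree, n + 1 nodes
xs = np.linspace(-1, 1, 2001)

for name, nodes in [
    ("equispaced", np.linspace(-1, 1, n + 1)),
    ("chebyshev", np.cos((np.arange(n + 1) + 0.5) * np.pi / (n + 1))),
]:
    coeffs = P.polyfit(nodes, f(nodes), n)           # interpolant
    err = np.max(np.abs(P.polyval(xs, coeffs) - f(xs)))
    print(name, err)   # equispaced error is huge; Chebyshev error is small
```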

3 Gaussian quadrature
The structure here provides an elegant way to construct a formula

$$I = \int_a^b f(x) w(x) \, dx \approx \sum_{k=0}^{n} c_k f(x_k) \tag{5}$$

with the highest possible degree of accuracy. Here both the coefficients and the nodes are to be chosen. There are $2n + 2$ unknowns, suggesting a degree of accuracy of $2n + 1$.

• Let $\langle f, g \rangle_w = \int_a^b f(x) g(x) w(x) \, dx$ (the weighted inner product). By the Gram-Schmidt process, there is a sequence $\{\phi_j\}$ of orthogonal polynomials where $\phi_j$ has degree $j$.

• $\phi_{n+1}$ has $n + 1$ distinct real zeros $x_0, \cdots, x_n$ in $[a, b]$.

• Let $\ell_k(x)$ be the $k$-th Lagrange basis polynomial for these zeros and let

$$c_k = \int_a^b \ell_k(x) w(x) \, dx.$$

The claim is that with this set of $x_k$'s and $c_k$'s, the formula (5) has degree of accuracy $2n + 1$.

Proof. Suppose $f \in \mathcal{P}_{2n+1}$. Since $p_{n+1} := \phi_{n+1}$ has degree $n + 1$, polynomial division gives

$$f = q(x) p_{n+1}(x) + r(x), \quad q, r \in \mathcal{P}_n.$$

Plugging this expression into the integral,

$$I = \int_a^b \big( q(x) p_{n+1}(x) + r(x) \big) w(x) \, dx = \langle q, p_{n+1} \rangle_w + \int_a^b r(x) w(x) \, dx = \int_a^b r(x) w(x) \, dx$$

because $p_{n+1}$ is orthogonal to all polynomials of degree $\leq n$, which includes $q$.


Now plug the expression into the formula:

$$\text{formula} = \sum_{k=0}^{n} c_k f(x_k) = \sum_{k=0}^{n} c_k q(x_k) p_{n+1}(x_k) + \sum_{k=0}^{n} c_k r(x_k) = \sum_{k=0}^{n} c_k r(x_k)$$

since the $x_k$'s are exactly the zeros of $p_{n+1}$.

Last, we need to establish that $I$ and the formula are equal. Because $r(x)$ has degree $\leq n$, it is equal to its Lagrange interpolant through the nodes $x_0, \cdots, x_n$, so

$$r(x) = \sum_{k=0}^{n} r(x_k) \ell_k(x).$$

Thus, working from the formula for $I$,

$$I = \int_a^b r(x) w(x) \, dx = \sum_{k=0}^{n} r(x_k) \int_a^b \ell_k(x) w(x) \, dx = \sum_{k=0}^{n} c_k r(x_k)$$

which establishes equality. To see that the degree of accuracy is exactly $2n + 1$, consider

$$f(x) = \prod_{j=0}^{n} (x - x_j)^2,$$

a polynomial of degree $2n + 2$ for which the formula gives zero while the integral is positive.

Summary (Gaussian quadrature): Let $\{\phi_j\}$ be an orthogonal basis of polynomials in the inner product $\langle f, g \rangle_w = \int_a^b f(x) g(x) w(x) \, dx$ and let $x_0, \cdots, x_n$ be the zeros of the polynomial $\phi_{n+1}$, with Lagrange basis $\{\ell_k(x)\}$. Then

$$I = \int_a^b f(x) w(x) \, dx \approx \sum_{k=0}^{n} c_k f(x_k), \quad c_k = \int_a^b \ell_k(x) w(x) \, dx, \tag{6}$$

called the Gaussian quadrature formula for $w(x)$, has degree of accuracy $2n + 1$.

Note that the nodes $x_k$ depend on the degree, so really they should be written $x_{n,k}$ (for $k = 0, \cdots, n$) for $\phi_{n+1}$. One can show that, unlike with equally spaced interpolation,

$$\lim_{n \to \infty} \Big| I - \sum_{k=0}^{n} c_k f(x_{n,k}) \Big| = 0$$

under reasonable assumptions on $f$ (see the textbook for details and the rate). Thus, Gaussian quadrature does well at reducing the error as points are added, provided function values of $f$ are available at arbitrary points.
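As a sketch of this convergence (our illustration, using numpy's built-in Gauss-Legendre rule; note that numpy's `leggauss(m)` returns $m$ points, i.e. degree of accuracy $2m - 1$ in our indexing):

```python
# Sketch: Gauss-Legendre quadrature of exp(x) on [-1, 1].
import numpy as np

f = lambda x: np.exp(x)
exact = np.e - 1 / np.e

for m in [2, 4, 6]:
    xk, ck = np.polynomial.legendre.leggauss(m)   # nodes and weights
    print(m, abs(np.sum(ck * f(xk)) - exact))     # error drops very fast
```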

Example: For example, when $n = 1$ and $w(x) = 1$ we have

$$\int_{-1}^{1} f(x) \, dx \approx c_0 f(x_0) + c_1 f(x_1).$$

The nodes are the zeros of $p_2(x) = x^2 - 1/3$, so

$$x_0 = -\frac{1}{\sqrt{3}}, \quad x_1 = \frac{1}{\sqrt{3}}.$$

It is not hard to then compute $c_0 = c_1 = 1$, yielding

$$\int_{-1}^{1} f(x) \, dx \approx f\Big( -\frac{1}{\sqrt{3}} \Big) + f\Big( \frac{1}{\sqrt{3}} \Big),$$

which has degree of accuracy 3.

The value here is that the accuracy is quite high; however, one has to have control over the choice of nodes for the formula to be available.

When $n = 2$, $p_3(x) = x^3 - \frac{3}{5}x$, so the zeros are at $\pm\sqrt{3/5}$ and $0$:

$$\int_{-1}^{1} f(x) \, dx \approx c_0 f(-\sqrt{3/5}) + c_1 f(0) + c_2 f(\sqrt{3/5})$$

and this formula has degree of accuracy 5. The coefficients $c_k$ are computed from the formula (6); while the algebra is messy, they can be computed in advance.

3.1 Gauss-Lobatto quadrature
In a slight variation, we instead include the endpoints $a$ and $b$ in the integration.² The claim is that the formula (supposing $w(x) = 1$ for simplicity)

$$I = \int_a^b f(x) \, dx \approx \sum_{k=0}^{n} c_k f(y_k)$$

where

$$y_1, y_2, \cdots, y_{n-1} = \text{zeros of } p_n' \text{ in } (a, b), \quad y_0 = a, \quad y_n = b,$$

$$c_i = \int_a^b \ell_i(x) \, dx, \quad \ell_i(x) = \text{Lagrange basis polynomial for } y_0, y_1, \cdots, y_n,$$

is exact for all $f \in \mathcal{P}_{2n-1}$. That is, it has a degree of accuracy two less than Gaussian quadrature ($2n - 1$ vs. $2n + 1$). The Lobatto version is used when the endpoints are needed in the approximation (and as a building block for other methods that need the endpoints).

The existence of the zeros $y_k$ is clear: $p_n$ has $n$ distinct real zeros in $(a, b)$ (the nodes we used for Gaussian quadrature), so by Rolle's theorem $p_n'$ has $n - 1$ distinct real zeros between them. The coefficient formula for the $c_i$ has the same derivation as before.

To show the degree of accuracy, suppose $f \in \mathcal{P}_{2n-1}$ and use polynomial division to write (note that $p_n'$ has degree $n - 1$)

$$f(x) = q(x) p_n'(x) + r(x), \quad q, r \in \mathcal{P}_n.$$

Then after an integration by parts,

$$\int_a^b f(x) \, dx = q(x) p_n(x) \Big|_a^b - \int_a^b q'(x) p_n(x) \, dx + \int_a^b r(x) \, dx = q(b) p_n(b) - q(a) p_n(a) + \sum_{k=0}^{n} c_k r(y_k)$$

since $q'$ has degree $n - 1$, so it is orthogonal to $p_n$ (and the interpolatory rule integrates $r \in \mathcal{P}_n$ exactly). Now note that


$$r(y_k) = f(y_k) \ \text{ if } 1 \leq k \leq n - 1$$

(since $p_n'$ vanishes at the interior nodes) but not when $k = 0$ or $k = n$, so

$$\int_a^b f(x) \, dx = q(b) p_n(b) - q(a) p_n(a) + \sum_{k=0}^{n} c_k f(y_k) - c_0 q(a) p_n'(a) - c_n q(b) p_n'(b)$$

$$= \sum_{k=0}^{n} c_k f(y_k) + q(b)\big( p_n(b) - c_n p_n'(b) \big) - q(a)\big( p_n(a) + c_0 p_n'(a) \big).$$
The boundary terms can be shown to vanish using the identities

$$0 = \int_a^b p_n(x) \, dx, \quad p_n(b) - p_n(a) = \int_a^b p_n'(x) \, dx,$$

noting that $p_n$ is orthogonal to $p_0 = 1$ (left as an exercise).
²Adapted from Trangenstein, Scientific Computing lecture notes, 2011.
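Here is a sketch (our own construction, following the recipe above) of an $(n+1)$-point Lobatto rule on $[-1, 1]$: take the endpoints plus the zeros of $p_n'$, then integrate the Lagrange basis polynomials for the weights.

```python
# Sketch: Gauss-Lobatto nodes/weights on [-1, 1] for w(x) = 1.
import numpy as np
from numpy.polynomial import legendre as L, polynomial as P

n = 4
pn = L.Legendre.basis(n)                                 # Legendre P_n
y = np.concatenate([[-1.0], pn.deriv().roots(), [1.0]])  # nodes y_0..y_n

c = np.empty(n + 1)
for i in range(n + 1):
    others = np.delete(y, i)
    li = P.polyfromroots(others) / np.prod(y[i] - others)  # ell_i coeffs
    anti = P.polyint(li)                                   # antiderivative
    c[i] = P.polyval(1.0, anti) - P.polyval(-1.0, anti)

# exact for degree <= 2n - 1 = 7:
print(np.sum(c * y**7), np.sum(c * y**6), 2 / 7)   # ~0, 2/7, 2/7
```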

3.2 Example: Laplace transform (Laguerre)
The Laplace transform is defined as

$$F(s) = \mathcal{L}[f(t)] = \int_0^{\infty} f(t) e^{-st} \, dt.$$

Often, we need to evaluate this transform at many different points $s$ given a function $f(t)$. This integral is over an infinite interval, which can be handled using Gaussian quadrature. First, scale out the $s$ with $x = st$ to get

$$F(s) = \frac{1}{s} \int_0^{\infty} f(x/s) e^{-x} \, dx.$$

Now, to be efficient, consider Gaussian quadrature in $[0, \infty)$ with weight $w(x) = e^{-x}$; the inner product on $L^2_w([0, \infty))$ is

$$\langle f, g \rangle = \int_0^{\infty} f(x) g(x) e^{-x} \, dx.$$

Start with $p_0(x) = 1$ and then (using $1, x, x^2$ for simplicity here)

$$p_1(x) = x - \frac{\langle x, 1 \rangle}{\langle 1, 1 \rangle} \cdot 1 = x - \int_0^{\infty} x e^{-x} \, dx = x - 1,$$

$$p_2(x) = x^2 - \frac{\langle x^2, x - 1 \rangle}{\langle x - 1, x - 1 \rangle} (x - 1) - \frac{\langle x^2, 1 \rangle}{\langle 1, 1 \rangle} \cdot 1 = x^2 - 4x + 2,$$

and so on. With two points, the nodes (the zeros of $p_2$) are

$$x_0 = 2 - \sqrt{2}, \quad x_1 = 2 + \sqrt{2}$$

and the coefficients are

$$c_0 = \int_0^{\infty} \frac{x - x_1}{x_0 - x_1} e^{-x} \, dx = -\frac{1}{2\sqrt{2}} \int_0^{\infty} (x - x_1) e^{-x} \, dx = \frac{1}{4}(2 + \sqrt{2}) \approx 0.853553,$$

$$c_1 = \int_0^{\infty} \frac{x - x_0}{x_1 - x_0} e^{-x} \, dx = \frac{1}{4}(2 - \sqrt{2}) \approx 0.146447,$$

so a quick-to-compute approximation with degree of accuracy 3 is

$$\int_0^{\infty} g(x) e^{-x} \, dx \approx c_0 g(x_0) + c_1 g(x_1).$$

One can, of course, go to a much higher degree if more accuracy is needed; the weights $c_i$ and nodes $x_i$ are not pleasant to compute by hand, but this can be done in advance to high accuracy and the results stored as hard-coded values in the algorithm.
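For instance, here is a sketch (our own illustration) using numpy's built-in Gauss-Laguerre rule, `laggauss`, to evaluate a transform at a point; its two-point rule is exactly the one derived above.

```python
# Sketch: Laplace transform F(s) = (1/s) int_0^inf f(x/s) e^{-x} dx
# via Gauss-Laguerre quadrature.
import numpy as np

def laplace(f, s, n=2):
    xk, ck = np.polynomial.laguerre.laggauss(n)   # nodes/weights for e^{-x}
    return np.sum(ck * f(xk / s)) / s

# test on f(t) = cos(t), whose transform is s/(s^2 + 1)
s = 2.0
print(laplace(np.cos, s, n=2), laplace(np.cos, s, n=8), s / (s**2 + 1))
```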

3.3 Singular integrals, briefly
There are many strategies for computing singular integrals. A few starting ideas are presented here. Take, for example, the integral

$$I = \int_0^1 \frac{\sin x}{x^{3/2}} \, dx.$$

Option 1 (brute force): We can use an open Newton-Cotes formula, which avoids the
singularity at the endpoint x = 0. However, the singularity means convergence results may
not apply.

Option 2 (local approximation): The trick here is to use an asymptotic approximation (from theory) near the singularity. In this case, we can just use a Taylor series. Let

$$f(x) = \frac{\sin x}{x^{3/2}}.$$

Then expand the series for $\sin x$ to get

$$f(x) = \sum_{n=0}^{\infty} \frac{(-1)^n}{(2n+1)!} x^{2n - 1/2}.$$

Now split the integral into a 'small' singular region and a 'large' good region:

$$I = \int_0^{\epsilon} f(x) \, dx + \int_{\epsilon}^{1} f(x) \, dx = I_{1,\epsilon} + I_{2,\epsilon}.$$

For the good region, just use any normal method; the integrand is not singular. (Note that an adaptive method is suggested, since one probably needs higher accuracy near $x = \epsilon$.)

For the bad region, integrate the power series term by term analytically:

$$\int_0^{\epsilon} f(x) \, dx = \sum_{n=0}^{\infty} \frac{(-1)^n}{(2n+1)! \, (2n + 1/2)} \epsilon^{2n + 1/2}.$$

We now choose $\epsilon$ small and enough terms of the sum to get the desired accuracy.
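A sketch of this strategy in code (our own illustration, with $\epsilon = 0.1$ and eight series terms):

```python
# Sketch: I = int_0^1 sin(x)/x^{3/2} dx by series (near 0) + quadrature.
import math
import numpy as np
from scipy.integrate import quad

eps = 0.1
# regular piece on [eps, 1] by ordinary quadrature
I2, _ = quad(lambda x: np.sin(x) / x**1.5, eps, 1)
# singular piece on [0, eps] by the term-by-term series
I1 = sum((-1)**n * eps**(2 * n + 0.5)
         / (math.factorial(2 * n + 1) * (2 * n + 0.5)) for n in range(8))
print(I1 + I2)   # ~ 1.93516
```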

Option 3 (Gaussian quadrature): In some cases, one can use Gaussian quadrature, putting the singularity into the weight function. Here we write

$$I = \int_0^1 \frac{\sin x / x}{x^{1/2}} \, dx, \quad \langle f, g \rangle = \int_0^1 \frac{f(x) g(x)}{x^{1/2}} \, dx,$$

then proceed by obtaining the orthogonal polynomials (orthogonal in this inner product) and their zeros. This approach can be useful if we can do the calculation of the weights/nodes in advance.

Option 4 (transform!): There are a number of tricks to transform a singular integral into a non-singular one. These methods are of varying complexity and tend to be problem-dependent (exception: the rather general double exponential rule). The details will not be pursued here.
