Armin Iske
Approximation Theory and Algorithms for Data Analysis
Texts in Applied Mathematics
Volume 68
Editors-in-chief
S. S. Antman, University of Maryland, College Park, USA
A. Bloch, University of Michigan, Ann Arbor, USA
A. Goriely, University of Oxford, Oxford, UK
L. Greengard, New York University, New York, USA
P. J. Holmes, Princeton University, Princeton, USA
Series editors
J. Bell, Lawrence Berkeley National Lab, Berkeley, USA
R. Kohn, New York University, New York, USA
P. Newton, University of Southern California, Los Angeles, USA
C. Peskin, New York University, New York, USA
R. Pego, Carnegie Mellon University, Pittsburgh, USA
L. Ryzhik, Stanford University, Stanford, USA
A. Singer, Princeton University, Princeton, USA
A. Stevens, Max-Planck-Institute for Mathematics, Leipzig, Germany
A. Stuart, University of Warwick, Coventry, UK
T. Witelski, Duke University, Durham, USA
S. Wright, University of Wisconsin, Madison, USA
The mathematization of all sciences, the fading of traditional scientific boundaries,
the impact of computer technology, the growing importance of computer modeling
and the necessity of scientific planning all create the need both in education and
research for books that are introductory to and abreast of these developments. The
aim of this series is to provide such textbooks in applied mathematics for the student
scientist. Books should be well illustrated and have clear exposition and sound
pedagogy. A large number of examples and exercises at varying levels is
recommended. TAM publishes textbooks suitable for advanced undergraduate and
beginning graduate courses, and complements the Applied Mathematical Sciences
(AMS) series, which focuses on advanced textbooks and research-level monographs.
Armin Iske
Department of Mathematics
University of Hamburg
Hamburg, Germany
Original German edition published by Springer-Verlag GmbH, Heidelberg, 2017. Title of German
edition: Approximation.
© Springer Nature Switzerland AG 2018
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part
of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations,
recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission
or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar
methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this
publication does not imply, even in the absence of a specific statement, that such names are exempt from
the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this
book are believed to be true and accurate at the date of publication. Neither the publisher nor the
authors or the editors give a warranty, express or implied, with respect to the material contained herein or
for any errors or omissions that may have been made. The publisher remains neutral with regard to
jurisdictional claims in published maps and institutional affiliations.
This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 Preliminaries, Definitions and Notations . . . . . . . . . . . . . . . . . . . 2
1.2 Basic Problems and Outlook . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.3 Approximation Methods for Data Analysis . . . . . . . . . . . . . . . . . 7
1.4 Hints on Classical and More Recent Literature . . . . . . . . . . . . . 8
3 Best Approximations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
3.1 Existence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
3.2 Uniqueness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
3.3 Dual Characterization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
3.4 Direct Characterization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
3.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 349
f : Ω −→ R
s∗ ≈ f,
C (Ω) := {u : Ω −→ R | u continuous on Ω}
denotes the linear space of all continuous functions on Ω. Recall that C(Ω)
is a linear space of infinite dimension. When equipped with the maximum
norm ‖·‖_∞, defined as
‖u‖_∞ := max_{x∈Ω} |u(x)|   for u ∈ C(Ω),
C(Ω) is a normed linear function space (or, in short: a normed space). The
normed space C(Ω), equipped with the maximum norm ‖·‖_∞, is complete,
i.e., (C(Ω), ‖·‖_∞) is a Banach space. We note this important result as follows.
are of particular interest. The function spaces C k (Ω) form a nested sequence
is a Banach space. ♦
We will discuss further relevant examples of normed spaces (F, ‖·‖) and
approximation spaces S ⊂ F later. In this short introduction, we only
touch on a few more important aspects of approximation, as an outlook.
S₀ ⊂ S₁ ⊂ … ⊂ S_n ⊂ F   for n ∈ ℕ₀
with respect to both the Euclidean norm ‖·‖ and the maximum norm ‖·‖_∞.
The latter will lead us to the Jackson theorems, one of which is as follows.
From this result, we see that the power of the approximation method depends
not only on the approximation spaces 𝒯_n but also, and essentially, on
the smoothness of the target f. Indeed, the following principle holds:
The smoother the target function f ∈ C_{2π}, the faster the minimal distances
η(f, 𝒯_n), or η_∞(f, 𝒯_n), converge to zero.
We will prove this and other classical results concerning the asymptotic
behaviour of minimal distances in Chapter 6.
B = ( s_j(x_i) )_{i,j},   with last row ( s₁(x_n), …, s_m(x_n) ).
its gradient
∇F(c) = 2BᵀB c − 2Bᵀ f_X
and its (constant) Hessian matrix
∇²F(c) = 2BᵀB.
Recall that any local minimum of F can be characterized via the solution
of the linear system of equations
BᵀB c = Bᵀ f_X,   (2.6)
B = QR   (2.7)
Note that the matrix B has full rank, rank(B) = m, if and only if no diagonal
entry r_kk, 1 ≤ k ≤ m, of the upper triangular matrix R ∈ R^{m×m} vanishes.
A numerically stable solution of the minimization problem (2.5) relies on
the alternative representation
where we use that the inverse Q⁻¹ = Qᵀ is an isometry with respect to the
Euclidean norm ‖·‖₂, i.e.,
For further illustration, we discuss one example of linear regression.
Fig. 2.1. (a) We take 26 noisy samples f˜X = fX + εX from f (x) = 1 + 3x. (b) We
compute the regression line s∗ (x) = c∗0 + c∗1 x, with c∗0 ≈ 0.9379 and c∗1 ≈ 3.0617, by
using linear least squares approximation (cf. Example 2.2).
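The regression line of Figure 2.1 can be reproduced with a short numerical sketch. The following Python snippet is an illustration only, not the book's code: the noise level and random seed are arbitrary assumptions, so the computed coefficients will differ slightly from c₀* ≈ 0.9379 and c₁* ≈ 3.0617. It solves the normal equations (2.6) in the numerically stable way described above, via a QR decomposition (2.7) of the design matrix B.

```python
import numpy as np

# 26 noisy samples of f(x) = 1 + 3x on [0, 1] (cf. Example 2.2 / Fig. 2.1)
rng = np.random.default_rng(0)                 # arbitrary seed, for reproducibility
x = np.linspace(0.0, 1.0, 26)
f_noisy = 1.0 + 3.0 * x + 0.25 * rng.standard_normal(x.size)   # hypothetical noise

# design matrix B for the basis {s_1(x) = 1, s_2(x) = x}
B = np.column_stack([np.ones_like(x), x])

# stable least squares solution via B = QR:  R c = Q^T f_X
Q, R = np.linalg.qr(B)
c = np.linalg.solve(R, Q.T @ f_noisy)
print("regression line: s*(x) = %.4f + %.4f x" % (c[0], c[1]))
```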
f˜X = fX + εX .
J : S −→ R,
where J(s) quantifies, for instance, the smoothness, the variation, the energy,
or the oscillation of s ∈ S. Combining the data error η_X and the
regularization functional J, balanced by a fixed parameter α > 0, leads us
to an extension of the linear least squares approximation problem, Problem 2.1,
giving a regularization method that is described by the minimization problem
‖c‖_A² := cᵀ A c   (2.14)
Note that Problem 2.3 coincides for α = 0 with the linear least squares
approximation problem. As we show in the following, the minimization problem
(2.15) of Tikhonov regularization has a unique solution for any α > 0,
in particular in the case where the design matrix B does not have full rank.
We further remark that the linear least squares approximation problem,
Problem 2.1, has non-unique solutions for rank(B) < m. However, as we will show,
3 Andrey Nikolayevich Tikhonov (1906-1993), Russian mathematician
its gradient
∇F_α(c) = 2(BᵀB + αA)c − 2Bᵀ f_X
and the (constant) positive definite Hessian matrix
∇²F_α = 2(BᵀB + αA).   (2.16)
Note that the function F_α has a unique stationary point c*_α ∈ R^m satisfying
the necessary condition ∇F_α(c) = 0. Therefore, c*_α can be characterized as the
unique solution of the minimization problem (2.15) via the unique solution
of the linear system
(BᵀB + αA) c*_α = Bᵀ f_X,
i.e., c*_α = (BᵀB + αA)⁻¹ Bᵀ f_X. Due to the positive definiteness of the
Hessian ∇²F_α in (2.16), c*_α is a local minimum of F_α. Moreover, in this case F_α
A = U ΛU T ,
This implies
‖Bc − f_X‖₂² + α‖c‖_A² = ‖Bc − f_X‖₂² + ‖√α·A^{1/2} c‖₂²
                       = ‖ [ B ; √α·A^{1/2} ] c − [ f_X ; 0 ] ‖₂²,
where [ · ; · ] denotes vertical stacking.
C = V Σ Wᵀ,
= Σ_{j=1}^{r} (σ_j a_j − v_jᵀ f_X)² + Σ_{j=r+1}^{n+1} (v_jᵀ f_X)² + α Σ_{j=1}^{m} a_j².
a_j := 0   for r + 1 ≤ j ≤ m,
Since all terms of the cost function in (2.19) are non-negative, the minimization
problem (2.19) can be split into the r independent subproblems
g_j′(a_j) = 2((σ_j² + α) a_j − σ_j v_jᵀ f_X)   and   g_j″(a_j) = 2(σ_j² + α) > 0
b* = W a* = Σ_{j=1}^{r} ( σ_j / (σ_j² + α) ) (v_jᵀ f_X) w_j.
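As a hedged numerical illustration of this filter-factor formula, the following sketch treats the special case A = I (so that b = c and C = B), and compares the solution of the regularized normal equations (BᵀB + αI)c = Bᵀf_X with the SVD representation derived above. The matrix, the data, and the parameter α are arbitrary choices made for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, alpha = 12, 5, 0.1                        # arbitrary sizes and parameter
B, fX = rng.standard_normal((n, m)), rng.standard_normal(n)

# (a) Tikhonov solution from the regularized normal equations (A = I)
c_normal = np.linalg.solve(B.T @ B + alpha * np.eye(m), B.T @ fX)

# (b) the same solution via B = V Sigma W^T and the filter factors sigma/(sigma^2+alpha)
V, sigma, Wt = np.linalg.svd(B, full_matrices=False)   # numpy's U, s, Vh in the book's notation
c_svd = Wt.T @ ((sigma / (sigma**2 + alpha)) * (V.T @ fX))

print(np.allclose(c_normal, c_svd))             # True: both formulas agree
```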
c*_α ⟶ 0   for α ⟶ ∞
and
c*_α ⟶ c*₀ = A^{−1/2} b*₀   for α ↘ 0,
where c*₀ ∈ R^m denotes the solution of the linear least squares problem
which minimizes the norm ‖·‖_A. For the solution s*_α ∈ S of (2.13), we obtain
s*_α ⟶ 0   for α ⟶ ∞
and
s*_α ⟶ s*₀   for α ↘ 0,
where s*₀ ∈ S is the solution of the linear least squares problem
‖s_X − f_X‖₂² ⟶ min_{s∈S}!
whose coefficient vector c* ∈ R^m minimizes the norm ‖·‖_A.
L_j(x) = ∏_{k=0, k≠j}^{n} (x − x_k)/(x_j − x_k)   for 0 ≤ j ≤ n.
L_j(x_k) = δ_{jk} = { 1 for k = j,  0 for k ≠ j }   for all 0 ≤ j, k ≤ n.
Therefore, the Lagrange polynomials L₀, …, L_n are a basis of the polynomial
space 𝒫_n. Moreover, the solution p ∈ 𝒫_n to the interpolation problem (2.22)
is in its Lagrange representation given as
p(x) = f₀L₀(x) + … + f_n L_n(x) = Σ_{j=0}^{n} f_j L_j(x).   (2.25)
5 Joseph-Louis Lagrange (1736-1813), mathematician and astronomer
Fig. 2.2. For X = {0, π, 3π/2, 2π} and fX = (1, −1, 0, 1)T the cubic polynomial
p = L0 − L1 + L3 solves the interpolation problem pX = fX from Example 2.8.
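As a small sketch (assuming nothing beyond the definitions above), the following Python code evaluates the Lagrange basis polynomials and reproduces the interpolant p = L₀ − L₁ + L₃ of Example 2.8 for X = {0, π, 3π/2, 2π} and f_X = (1, −1, 0, 1)ᵀ.

```python
import numpy as np

def lagrange_basis(xnodes, j, x):
    """Evaluate the j-th Lagrange basis polynomial L_j for the nodes xnodes at x."""
    Lj = np.ones_like(x, dtype=float)
    for k, xk in enumerate(xnodes):
        if k != j:
            Lj *= (x - xk) / (xnodes[j] - xk)
    return Lj

X  = np.array([0.0, np.pi, 1.5 * np.pi, 2.0 * np.pi])
fX = np.array([1.0, -1.0, 0.0, 1.0])

# p(x) = sum_j f_j L_j(x); check the interpolation conditions p(x_j) = f_j
p_at_nodes = sum(fX[j] * lagrange_basis(X, j, X) for j in range(len(X)))
print(np.allclose(p_at_nodes, fX))   # True
```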
[Figure panels: graphs of Lagrange basis polynomials for X = {0, π, 3π/2, 2π} on [0, 2π],
among them L₁(x) = (2/π³)·x(x − (3/2)π)(x − 2π) and L₃(x) = (1/π³)·x(x − π)(x − (3/2)π).]
For fixed x ∈ R the values p_{k,j}(x) can be computed recursively. This is done
by using the Aitken lemma.
Lemma 2.9. For the interpolation polynomials p_{k,j} ∈ 𝒫_j satisfying (2.26)
we have the recursion
Induction step (j − 1 ⟶ j): Note that the right hand side of the recursion,
q(x) := p_{k,j−1}(x) + ( (x − x_k)/(x_{k−j} − x_k) ) · ( p_{k−1,j−1}(x) − p_{k,j−1}(x) ),
is a polynomial of degree at most j, i.e., q ∈ 𝒫_j. From the stated recursion
and by using the induction hypothesis we can conclude that q, as well as p_{k,j},
interpolates the data
(x_{k−j}, f_{k−j}), …, (x_k, f_k).
Therefore, we have q ≡ p_{k,j} by uniqueness of the interpolant p_{k,j}.
6 Alexander Craig Aitken (1895-1967), New Zealand mathematician
By the recursion of the Aitken lemma, Lemma 2.9, we can, on given in-
terpolation points X and function values fX , recursively evaluate the unique
interpolation polynomial p ≡ pn,n ∈ Pn at any point x ∈ R. To this end, we
organize the values pk,j ≡ pk,j (x), for 0 ≤ j ≤ k ≤ n, in a triangular scheme
as follows.
f₀ = p_{0,0}
f₁ = p_{1,0}   p_{1,1}
f₂ = p_{2,0}   p_{2,1}   p_{2,2}
 ⋮      ⋮        ⋮        ⋱
f_n = p_{n,0}  p_{n,1}  p_{n,2}  ⋯  p_{n,n}
The values in the first column of the triangular scheme are the given function
values pk,0 = fk , for 0 ≤ k ≤ n. The values of the subsequent columns can be
computed, according to the recursion in the Aitken lemma, from two values in
the previous column. In this way, we can compute all entries of the triangular
scheme, column-wise from left to right, and so we obtain the sought function
value p(x) = pn,n .
To compute the entry p_{k,j} we merely need (besides the interpolation
points x_{k−j} and x_k) the two entries p_{k−1,j−1} and p_{k,j−1} from the
previous column. If we compute the entries in each column from the bottom to
the top, then in each step we can overwrite one entry, p_{k,j−1}, since it is no
longer needed in the subsequent computations.
This leads us to the Neville-Aitken algorithm, Algorithm 1, a
memory-efficient variant of the Aitken recursion in Lemma 2.9. The Neville-
Aitken algorithm operates on the input data vector f_X = (f₀, …, f_n)ᵀ
recursively as shown in Algorithm 1.
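Algorithm 1 itself is not reproduced in this excerpt, but the memory-efficient scheme just described (overwriting the data vector column by column, from the bottom to the top) can be sketched as follows. This is an illustrative Python version under those assumptions, not the book's pseudocode.

```python
import math

def neville_aitken(xnodes, fvals, x):
    """Evaluate the interpolation polynomial p(x) by the Neville-Aitken recursion,
    operating in place on a copy of the data vector f_X."""
    p = list(fvals)                       # p[k] holds p_{k,j}(x) for the current column j
    n = len(xnodes) - 1
    for j in range(1, n + 1):             # columns of the triangular scheme
        for k in range(n, j - 1, -1):     # bottom to top, so p[k-1] is still from column j-1
            p[k] = p[k] + (x - xnodes[k]) / (xnodes[k - j] - xnodes[k]) * (p[k - 1] - p[k])
    return p[n]                           # p_{n,n}(x)

# example: the cubic interpolant of Example 2.8, evaluated at x = pi/2
X, fX = [0.0, math.pi, 1.5 * math.pi, 2.0 * math.pi], [1.0, -1.0, 0.0, 1.0]
print(neville_aitken(X, fX, math.pi / 2))
```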
7 Eric Harold Neville (1889-1961), English mathematician
ω_k(x) = ∏_{j=0}^{k−1} (x − x_j) ∈ 𝒫_k   for 0 ≤ k ≤ n.   (2.27)
f₀ = p_n(x₀) = b₀
f₁ = p_n(x₁) = b₀ + b₁(x₁ − x₀)
f₂ = p_n(x₂) = b₀ + b₁(x₂ − x₀) + b₂(x₂ − x₀)(x₂ − x₁)
 ⋮
f_n = p_n(x_n) = b₀ + … + b_n(x_n − x₀)·…·(x_n − x_{n−1}).
Further note that for the computation of b_k we only need the first k + 1 data
8 Sir Isaac Newton (1643-1727), English philosopher and scientist
p_{n+1}(x) = p_n(x) + b_{n+1} ∏_{k=0}^{n} (x − x_k) = p_n(x) + b_{n+1} ω_{n+1}(x),
Remark 2.11. Note that the n-th divided difference [x0 , . . . , xn ](f ) in Defi-
nition 2.10 is the leading coefficient of the interpolation polynomial p for f
on X with respect to its monomial representation (2.30). We remark that
the leading coefficient of p with respect to its monomial representation (2.30)
coincides with the leading coefficient of p with respect to its Newton repre-
sentation in (2.28) so that we have
p(x) = [x0 , . . . , xn ](f )ωn (x) + bn−1 ωn−1 (x) + . . . + b1 ω1 (x) + b0 (2.33)
ω_n(x) = ∏_{j=0}^{n−1} (x − x_j) ∈ 𝒫_n
As we show now, all coefficients in the Newton representation (2.28) of
the interpolation polynomial p are divided differences.
Theorem 2.13. For X = {x₀, …, x_n} and f_X ∈ R^{n+1},
p(x) = Σ_{k=0}^{n} [x₀, …, x_k](f) · ω_k(x) ∈ 𝒫_n   (2.34)
p = Σ_{k=0}^{n−1} [x₀, …, x_k](f) · ω_k ∈ 𝒫_{n−1}
with q_{n−1} ∈ 𝒫_{n−1}, where the latter follows directly from (2.33). Since
q_{n−1} = Σ_{k=0}^{n−1} [x₀, …, x_k](f) · ω_k.
p = [x₀, …, x_n](f) · ω_n + Σ_{k=0}^{n−1} [x₀, …, x_k](f) · ω_k = Σ_{k=0}^{n} [x₀, …, x_k](f) · ω_k.
holds.
♦
By the recursion in Theorem 2.14 we can view the n-th divided difference
[x0 , . . . , xn ](f ) as a discretization of the n-th derivative of f ∈ C n . We will
be more precise on this observation later in this section.
X     f_X
x₀    f₀
x₁    f₁    [x₀,x₁](f)
x₂    f₂    [x₁,x₂](f)        [x₀,x₁,x₂](f)
 ⋮     ⋮         ⋮                  ⋮
x_n   f_n   [x_{n−1},x_n](f)   [x_{n−2},x_{n−1},x_n](f)   ⋯   [x₀,…,x_n](f)
by using the efficient and stable recursion of Theorem 2.14. To this end, we
organize the divided differences in a triangular scheme, as shown in Table 2.1.
The organization of the data in Table 2.1 reminds us of the triangular
scheme of the Neville-Aitken algorithm, Algorithm 1. In fact, to compute
the Newton coefficients [x0 , . . . , xk ](f ) in (2.34), we can (similarly as in Al-
gorithm 1) process the data in Table 2.1 by a memory-efficient algorithm
operating only on the data vector fX = (f0 , . . . , fn )T , see Algorithm 2.
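Algorithm 2 is not reproduced in this excerpt; the following Python sketch (an illustration under the assumptions of this section, not the book's pseudocode) computes the Newton coefficients [x₀,…,x_k](f) in place on the data vector f_X, in the spirit of the memory-efficient scheme just described, and then evaluates the Newton form (2.34) by nested multiplication.

```python
import math

def divided_differences(xnodes, fvals):
    """Return the Newton coefficients [x_0,...,x_k](f), k = 0,...,n,
    computed in place on a copy of the data vector (cf. Table 2.1)."""
    b = list(fvals)
    n = len(xnodes) - 1
    for j in range(1, n + 1):             # column j of the divided difference scheme
        for k in range(n, j - 1, -1):     # bottom to top keeps column j-1 entries intact
            b[k] = (b[k] - b[k - 1]) / (xnodes[k] - xnodes[k - j])
    return b

def newton_eval(xnodes, coeffs, x):
    """Evaluate p(x) = sum_k [x_0,...,x_k](f) * omega_k(x) by Horner-like nesting."""
    p = coeffs[-1]
    for k in range(len(coeffs) - 2, -1, -1):
        p = p * (x - xnodes[k]) + coeffs[k]
    return p

X  = [0.0, math.pi, 1.5 * math.pi, 2.0 * math.pi]
fX = [math.cos(xk) for xk in X]           # Example 2.15/2.16, f(x) = cos(x)
b  = divided_differences(X, fX)
print(b)                                  # [1, -2/pi, 8/(3 pi^2), -4/(3 pi^3)]
print(newton_eval(X, b, math.pi))         # reproduces f(pi) = -1
```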
X₃     f_{X₃}
0      1
π      −1     −2/π
3π/2   0      2/π     8/(3π²)
2π     1      2/π     0        −4/(3π³)

X₄     f_{X₄}
0      1
π      −1     −2/π
3π/2   0      2/π     8/(3π²)
2π     1      2/π     0        −4/(3π³)
π/4    1/√2   4(√2−1)/(7√2·π)   8(2+5√2)/(35√2·π²)   −32(2+5√2)/(105√2·π³)   −16(16+5√2)/(105√2·π⁴)
Fig. 2.5. (a) The cubic polynomial p3 ∈ P3 interpolates the trigonometric function
f (x) = cos(x) on X3 = {0, π, 3π/2, 2π}. (b) The quartic polynomial p4 ∈ P4
interpolates f (x) = cos(x) on X4 = {0, π, 3π/2, 2π, π/4} (see Example 2.16).
= (1/(x_n − x₀)) [ ∫_{Δ_{n−1}} f^{(n−1)}( x_n + Σ_{k=1}^{n−1} λ_k (x_k − x_n) ) dλ
                  − ∫_{Δ_{n−1}} f^{(n−1)}( x₀ + Σ_{k=1}^{n−1} λ_k (x_k − x₀) ) dλ ]
= (1/(x_n − x₀)) ( [x_n, x₁, …, x_{n−1}](f) − [x₀, …, x_{n−1}](f) )
= (1/(x_n − x₀)) ( [x₁, …, x_n](f) − [x₀, …, x_{n−1}](f) )
= [x₀, …, x_n](f).
[x₀, …, x_n](f) = f^{(n)}(τ)/n!   for some τ ∈ [x_min, x_max],
where x_min = min_{0≤k≤n} x_k and x_max = max_{0≤k≤n} x_k.
For x₀ = … = x_n, we have
[x₀, …, x_n](f) = f^{(n)}(x₀)/n!.
(b) For p ∈ 𝒫_{n−1}, we have [x₀, …, x_n](p) = 0 for n ≥ 1.
holds.
p_f = Σ_{k=0}^{n} [x₀, …, x_k](f)·ω_k   and   p_g = Σ_{j=0}^{n} [x_j, …, x_n](g)·ω̃_j
ω_k(x) = ∏_{ℓ=0}^{k−1} (x − x_ℓ) ∈ 𝒫_k   and   ω̃_j(x) = ∏_{m=j+1}^{n} (x − x_m) ∈ 𝒫_{n−j}
p := p_f · p_g = Σ_{k,j=0}^{n} [x₀, …, x_k](f)·ω_k · [x_j, …, x_n](g)·ω̃_j   (2.40)
p = Σ_{k,j=0, k≤j}^{n} [x₀, …, x_k](f)·[x_j, …, x_n](g)·ω_k·ω̃_j.
Since ω_k·ω̃_j ∈ 𝒫_{n+k−j}, for all 0 ≤ k ≤ j ≤ n, we have p ∈ 𝒫_n. Therefore, p is
the unique interpolation polynomial in 𝒫_n for f·g on X, and so we obtain
the stated representation
[x₀, …, x_n](f·g) = Σ_{j=0}^{n} [x₀, …, x_j](f)·[x_j, …, x_n](g)   (2.41)
holds for h ∈ C^m. Therefore, for h ∈ C^m the divided differences [x₀, …, x_m](h)
depend continuously on X, since the integrand h^{(m)} in (2.42) is continuous in
X. Since f·g ∈ C^n, we can conclude that the representation (2.41) holds for
arbitrary point sets X = {x₀, …, x_n}.
and so
(f·g)^{(n)}(x₀) = Σ_{j=0}^{n} ( n!/(j!(n−j)!) ) f^{(j)}(x₀) g^{(n−j)}(x₀)
              = Σ_{j=0}^{n} (n choose j) f^{(j)}(x₀) g^{(n−j)}(x₀),
From Corollary 2.18 (a) we see that divided differences are also well-defined
for coincident interpolation points, provided that f has sufficiently many
derivatives. In particular, for coincident interpolation points all coefficients
in the Newton representation (2.34) are well-defined (cf. Example 2.15). Now we
extend the problem of Lagrange interpolation, Problem 2.4, to the problem of
Hermite interpolation. In Hermite interpolation, the interpolation conditions
contain not only point evaluations of f, but also values of derivatives of f;
this corresponds to coincident interpolation points. To be more precise, we
formulate the Hermite interpolation problem as follows.
p(x) = Σ_{k=0}^{N−1} [y₀, …, y_k](f)·ω_k(x).   (2.46)
p ⟼ L(p) = ( p(x₀), …, p^{(μ₀−1)}(x₀), …, p(x_n), …, p^{(μ_n−1)}(x_n) )ᵀ ∈ R^N,
Y     f_Y
0     1
0     1     0
π     0     −1/π    −1/π²
π     0     −1/π    0       1/π³
π     0     −1/π    1/π²    1/π³    0
2π    0     0       1/π²    0       −1/(2π⁴)    −1/(4π⁵)
f(x) − p_{N−1}(x) = [y₀, …, y_{N−1}, x](f) · ∏_{k=0}^{N−1} (x − y_k)   for x ∈ R.   (2.48)
p_N(x) = p_{N−1}(x) + [y₀, …, y_{N−1}, x](f) · ∏_{k=0}^{N−1} (x − y_k)
and so
f(x) − p_{N−1}(x) = f(x) − ( p_N(x) − [y₀, …, y_{N−1}, x](f) · ∏_{k=0}^{N−1} (x − y_k) )
                 = [y₀, …, y_{N−1}, x](f) · ∏_{k=0}^{N−1} (x − y_k).
Theorem 2.24 immediately yields the following upper bound for the
interpolation error f − p in (2.47) on the interval [a, b], where we combine the
representation in (2.48) with the result of Corollary 2.18 (a).
in x ∈ [a, b].
‖f − p‖_∞ ≤ ( ‖f^{(N)}‖_∞ / N! ) · ‖ω_Y‖_∞   for f ∈ C^N[a, b]   (2.51)
follows from the pointwise error estimate in (2.50) for any compact interval
[a, b] ⊂ R containing the set of interpolation points Y, i.e., Y ⊂ [a, b].
To reduce the interpolation error in (2.51), we wish to minimize the maximum
norm ‖ω_Y‖_∞ of the knot polynomial ω_Y under variation of the interpolation
points in Y ⊂ [a, b]. Without loss of generality, we restrict ourselves
to the interval [a, b] = [−1, 1]. This immediately leads us to the nonlinear
optimization problem
T1 (x) = x
T2 (x) = 2x2 − 1
T3 (x) = 4x3 − 3x
[Figure panels: graphs of the Chebyshev polynomials T₁, …, T₉ on [−1, 1].]
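As a brief illustration of the three-term recursion from Theorem 2.27 (which generates the polynomials plotted in the panels above), the following sketch builds coefficient representations of T₀, …, T₉ and recovers T₂(x) = 2x² − 1 and T₃(x) = 4x³ − 3x listed before the figure. It is a hedged example using numpy's polynomial helpers, not code from the book.

```python
import numpy as np
from numpy.polynomial import polynomial as P

def chebyshev_T(n):
    """Coefficients (lowest degree first) of T_0,...,T_n via
    T_{k+1}(x) = 2x T_k(x) - T_{k-1}(x), with T_0 = 1 and T_1 = x."""
    T = [np.array([1.0]), np.array([0.0, 1.0])]
    for _ in range(2, n + 1):
        T.append(P.polysub(2.0 * P.polymulx(T[-1]), T[-2]))
    return T[: n + 1]

T = chebyshev_T(9)
print(T[2])   # [-1.  0.  2.]      i.e. T_2(x) = 2x^2 - 1
print(T[3])   # [ 0. -3.  0.  4.]  i.e. T_3(x) = 4x^3 - 3x
```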
Corollary 2.30. For n ∈ N0 , let X ∗ = {x∗0 , . . . , x∗n } ⊂ [−1, 1] denote the set
of Chebyshev knots in (2.57). Then the corresponding knot polynomial ωX ∗
has the representation
ωX ∗ = 2−n Tn+1 . (2.59)
Proof. The knot polynomial ωX in (2.49) has for any point set X leading
coefficient one, in particular for the set X ∗ of Chebyshev knots. By the
representation in (2.56), the polynomial 2−n Tn+1 ∈ Pn+1 has also leading
coefficient one. Therefore, the difference
qn = ωX ∗ − 2−n Tn+1
Now assume that for a point set X = {x0 , . . . , xn } ⊂ [−1, 1] its knot
polynomial ωX ∈ Pn+1 satisfies
Then we have ωX (yk ) < ωX ∗ (yk ), for all even indices k ∈ {0, . . . , n} and
ωX (yk ) > ωX ∗ (yk ), for all odd indices k ∈ {1, . . . , n}. Therefore, the difference
ω = ωX ∗ − ωX
Problem 2.34. Compute from a given set X = {x0 , x1 , . . . , x2n } ⊂ [0, 2π) of
2n+1 pairwise distinct interpolation points and corresponding function values
fX = (f0 , f1 , . . . , f2n )T ∈ R2n+1 a real trigonometric polynomial T ∈ TnR
satisfying TX = fX , i.e.,
for m = 0, . . . , N , whereby c0 = . . . = cN = 0.
By the Euler formula
e^{ix} = cos(x) + i sin(x)   (2.67)
we can represent any real trigonometric polynomial T ∈ 𝒯_n^R in (2.63) as a
complex trigonometric polynomial p ∈ 𝒯_N^C of the form (2.66). Indeed, by
using the Euler formula (2.67) we find the standard trigonometric identities
cos(x) = (1/2)(e^{ix} + e^{−ix})   and   sin(x) = (1/(2i))(e^{ix} − e^{−ix})   (2.68)
and so we obtain for any T ∈ 𝒯_n^R the representation
T(x) = a₀/2 + Σ_{k=1}^{n} [ a_k cos(kx) + b_k sin(kx) ]
     = a₀/2 + Σ_{k=1}^{n} [ (a_k/2)(e^{ikx} + e^{−ikx}) + (b_k/(2i))(e^{ikx} − e^{−ikx}) ]
     = a₀/2 + Σ_{k=1}^{n} [ ((a_k − i b_k)/2) e^{ikx} + ((a_k + i b_k)/2) e^{−ikx} ]
     = Σ_{k=−n}^{n} c_k e^{ikx} = e^{−inx} Σ_{k=0}^{2n} c_{k−n} e^{ikx}
Note that the mapping (2.70) between the real Fourier coefficients ak , bk
of T and the complex Fourier coefficients ck of p is linear,
By the bijectivity of the linear mappings in (2.70) and (2.71) between the
complex and the real Fourier coefficients, we can determine the dimension of
TnR . The following result is a direct consequence of Theorem 2.36.
Now let us return to the interpolation problem, Problem 2.34. For the case
of complex trigonometric polynomials, we can solve Problem 2.34 as follows.
p(x_k) = Σ_{j=0}^{N} c_j e^{ijx_k} = Σ_{j=0}^{N} c_j z_k^j.
q(x) := e^{2inx} \overline{p(x)} = Σ_{j=0}^{2n} \overline{c_j} e^{i(2n−j)x} = Σ_{j=0}^{2n} \overline{c_{2n−j}} e^{ijx}   for x ∈ [0, 2π)
Lemma 2.41. For N ∈ ℕ the N-th root of unity ω_N has the property
(1/N) Σ_{j=0}^{N−1} ω_N^{(ℓ−k)j} = δ_{ℓk}   for all 0 ≤ ℓ, k ≤ N − 1.   (2.75)
Σ_{j=0}^{N−1} ω_N^{(ℓ−k)j} = ( ω_N^{(ℓ−k)N} − 1 ) / ( ω_N^{ℓ−k} − 1 ) = ( e^{2πi(ℓ−k)} − 1 ) / ( ω_N^{ℓ−k} − 1 ) = 0
Now we are in a position where we can already give the solution to the
posed interpolation problem at equidistant interpolation points.
p(x_ℓ) = Σ_{j=0}^{N−1} ( (1/N) Σ_{k=0}^{N−1} f_k ω_N^{−jk} ) e^{ijx_ℓ}
       = (1/N) Σ_{k=0}^{N−1} f_k Σ_{j=0}^{N−1} ω_N^{(ℓ−k)j} = f_ℓ
for all ℓ = 0, …, N − 1.
A_N : C^N ⟶ C^N,
A_N^{−1} : C^N ⟶ C^N,
p(x) = Σ_{j=0}^{N−1} c_j e^{ijx} ∈ 𝒯_{N−1}^C,
f_k = p(x_k) = Σ_{j=0}^{N−1} c_j e^{ijx_k} = Σ_{j=0}^{N−1} c_j ω_N^{jk}   for k = 0, …, N − 1,
The discrete Fourier analysis and the Fourier synthesis are usually referred
to as discrete Fourier transform and discrete inverse Fourier transform. In
the following discussion, we derive an efficient method for computing the
discrete (inverse) Fourier transform. But we first give a formal introduction
for the discrete (inverse) Fourier transform.
is defined componentwise as
ẑ(j) = Σ_{k=0}^{N−1} z(k) ω_N^{−jk}   for 0 ≤ j ≤ N − 1,   (2.79)
is defined componentwise as
z(k) = (1/N) Σ_{j=0}^{N−1} ẑ(j) ω_N^{jk}   for 0 ≤ k ≤ N − 1.
The discrete Fourier transform (DFT) and the inverse DFT are represented
by the Fourier matrices F_N = N·A_N and F_N^{−1} = A_N^{−1}/N, i.e.,
F_N = ( ω_N^{−jk} )_{0≤j,k≤N−1} ∈ C^{N×N}   and   F_N^{−1} = (1/N) ( ω_N^{jk} )_{0≤j,k≤N−1} ∈ C^{N×N}.
Example 2.44. We compute the DFT ẑ ∈ C512 of the vector z ∈ C512 with
components z(k) = 3 sin(2π · 7k/512) − 4 cos(2π · 8k/512). To this end, we
regard the Fourier series (from the Fourier inversion formula)
z(k) = (1/512) Σ_{j=0}^{511} ẑ(j) e^{2πijk/512}.
Therefore, we have
ẑ(7) = −768 i,   ẑ(505) = 768 i,   ẑ(8) = ẑ(504) = −1024,
and, moreover, ẑ(j) = 0 for all j ∈ {0, …, 511} \ {7, 8, 504, 505}. Thereby,
the vector z ∈ C^512 has a sparse representation by the four non-vanishing
Fourier coefficients ẑ(7), ẑ(8), ẑ(504) and ẑ(505) (see Figure 2.7). ♦
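The sparse DFT of Example 2.44 can be checked numerically. The following sketch (illustration only) uses numpy's FFT, whose sign convention matches the DFT (2.79), and confirms that exactly the four coefficients ẑ(7), ẑ(8), ẑ(504), ẑ(505) are non-zero.

```python
import numpy as np

N = 512
k = np.arange(N)
z = 3.0 * np.sin(2 * np.pi * 7 * k / N) - 4.0 * np.cos(2 * np.pi * 8 * k / N)

zhat = np.fft.fft(z)                      # numpy's fft implements the DFT (2.79)
support = np.nonzero(np.abs(zhat) > 1e-8)[0]
print(support)                            # [  7   8 504 505]
print(np.round(zhat[support]))            # approx. -768i, -1024, -1024, 768i
```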
[Fig. 2.7: the signal z ∈ C^512 (top) and the magnitudes of its DFT ẑ (bottom),
supported on the four indices j = 7, 8, 504, 505.]
ẑ(j) = Σ_{k=0}^{N−1} z(k) ω_N^{−kj}
     = Σ_{k even} z(k) ω_N^{−kj} + Σ_{k odd} z(k) ω_N^{−kj}
     = Σ_{k=0}^{N/2−1} z(2k) ω_N^{−2kj} + Σ_{k=0}^{N/2−1} z(2k+1) ω_N^{−(2k+1)j}
     = Σ_{k=0}^{N/2−1} z(2k) ω_N^{−2kj} + ω_N^{−j} Σ_{k=0}^{N/2−1} z(2k+1) ω_N^{−2kj}.
ẑ(j) = Σ_{k=0}^{M−1} z(2k) ω_N^{−2kj} + ω_N^{−j} Σ_{k=0}^{M−1} z(2k+1) ω_N^{−2kj}
     = Σ_{k=0}^{M−1} u(k) ω_{N/2}^{−kj} + ω_N^{−j} Σ_{k=0}^{M−1} v(k) ω_{N/2}^{−kj}
     = Σ_{k=0}^{M−1} u(k) ω_M^{−kj} + ω_N^{−j} Σ_{k=0}^{M−1} v(k) ω_M^{−kj}
for j = 0, …, N − 1, where u(k) := z(2k) and v(k) := z(2k+1) for 0 ≤ k ≤ M − 1.
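The even/odd splitting just derived is the core of the fast Fourier transform. A minimal recursive radix-2 sketch, for N a power of two (illustration only, not the book's Algorithm), reads as follows.

```python
import numpy as np

def fft_radix2(z):
    """Recursive radix-2 FFT for the DFT (2.79); len(z) must be a power of two."""
    N = len(z)
    if N == 1:
        return np.array(z, dtype=complex)
    u_hat = fft_radix2(z[0::2])                            # DFT of u(k) = z(2k)
    v_hat = fft_radix2(z[1::2])                            # DFT of v(k) = z(2k+1)
    twiddle = np.exp(-2j * np.pi * np.arange(N // 2) / N)  # omega_N^{-j}
    return np.concatenate([u_hat + twiddle * v_hat,
                           u_hat - twiddle * v_hat])

z = np.random.default_rng(2).standard_normal(64)
print(np.allclose(fft_radix2(z), np.fft.fft(z)))           # True
```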
Proof. For the entries of the Toeplitz matrix C = (C_{jk})_{0≤j,k≤N−1}, we have
C_{jk} = c_{(j−k) mod N}. Applying C to the ℓ-th column ω^{(ℓ)} = (1/N)(ω_N^{ℓk})_{0≤k≤N−1}
of F_N^{−1}, we obtain
(C ω^{(ℓ)})_j = (1/N) Σ_{k=0}^{N−1} c_{(j−k) mod N} · ω_N^{ℓk}
            = (1/N) ω_N^{ℓj} Σ_{k=0}^{N−1} c_{(j−k) mod N} · ω_N^{ℓ(k−j)}
            = (1/N) ω_N^{ℓj} Σ_{m=0}^{N−1} c_{m mod N} · ω_N^{−ℓm}
            = (1/N) ω_N^{ℓj} Σ_{m=0}^{N−1} c_m ω_N^{−ℓm} = (1/N) ω_N^{ℓj} d_ℓ,
where
d_ℓ = Σ_{k=0}^{N−1} c_k ω_N^{−ℓk}   for 0 ≤ ℓ ≤ N − 1,
whereby
C F_N^{−1} = F_N^{−1} diag(d)
or
F_N C F_N^{−1} = diag(d).
Now we finally regard the linear system (2.80) for a cyclic Toeplitz matrix
C ∈ CN ×N with generating vector c ∈ CN . By application of the discrete
Fourier transform FN to both sides in (2.80) we get the identity
FN CFN−1 FN x = FN b.
Dy = r,   (2.81)
x = F_N^{−1} y.
We summarize the proposed solution of the Toeplitz system (2.80) in Algorithm 3.
Note that Algorithm 3 can be implemented efficiently by using the fast Fourier
transform (FFT): by Theorem 2.46, each of the steps in lines 5, 6 and 8 of
Algorithm 3 can be performed by the (inverse) FFT at a cost of only O(N log(N))
operations. In this case, a total of only O(N log(N)) operations is required to
perform Algorithm 3. In comparison, solving the linear system (2.80) by Gaussian
elimination, which requires O(N³) operations, is far too expensive. Unlike
Algorithm 3, however, Gaussian elimination does not exploit the Toeplitz
structure of the matrix C.
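Algorithm 3 itself is given in the book as pseudocode; a hedged Python sketch of the same idea, solving a cyclic Toeplitz system Cx = b by diagonalizing C with the FFT, could look as follows. Since numpy's fft matches the DFT (2.79), the eigenvalues d are simply the FFT of the generating vector c.

```python
import numpy as np

def solve_circulant(c, b):
    """Solve C x = b for the cyclic Toeplitz matrix C_{jk} = c_{(j-k) mod N},
    using F_N C F_N^{-1} = diag(d) with d = fft(c) (assumed to have no zeros)."""
    d = np.fft.fft(c)                 # eigenvalues of C
    r = np.fft.fft(b)                 # r = F_N b
    return np.fft.ifft(r / d)         # x = F_N^{-1} (r / d), O(N log N) overall

# consistency check against an explicitly assembled C
rng = np.random.default_rng(3)
N = 8
c, b = rng.standard_normal(N), rng.standard_normal(N)
C = np.array([[c[(j - k) % N] for k in range(N)] for j in range(N)])
print(np.allclose(C @ solve_circulant(c, b), b))   # True
```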
Moreover,
η ≡ η(f, S) = inf_{s∈S} ‖s − f‖
is called the minimal distance between f and S.
In the following investigations, we will first address questions concerning
the existence and uniqueness of best approximations. To this end, we develop
sufficient conditions for the linear space F and the subset S ⊂ F, under which
we can guarantee for any f ∈ F the existence of a best approximation s∗ ∈ S
for f . To guarantee the uniqueness of s∗ , we require strict convexity for the
norm · .
In the following discussion, we develop suitable sufficient and necessary
conditions to characterize best approximations. To this end, we first derive
dual characterizations for best approximations, giving conditions on the
elements of the topological dual space F′ of linear and continuous functionals
on F.
This is followed by direct characterizations of best approximations, where
we use directional derivatives (Gâteaux derivatives) of the norm · . On that
occasion, we consider computing directional derivatives of relevant norms
explicitly.
To study the material of this chapter (and for the following chapters)
we require knowledge of elementary results from optimization and functional
analysis. Therefore, we decided to explain a selection of relevant results. But
for further reading, we refer to the textbook [33].
η_p = inf_{s∈S} ‖s − f_α‖_p = α − 3   for p = 1, 2, ∞,
where
‖s − f_α‖_p > inf_{s∈S} ‖s − f_α‖_p = α − 3   for all s ∈ S,
[Figure (three rows of panels): best approximation sets S*_p and minimal distances η_p to f_α from S for p = 1, 2, ∞.
α = 4:  S*₁ = S*₂ = S*_∞ = ∅,  η₁ = η₂ = η_∞ = 1.
α = 1:  S*₁ = {(2, 0)}, η₁ = 1;  S*₂ = {(2, 0)}, η₂ = 1;  S*_∞ = { ((√7+1)/2, ±(√7−1)/2) }, η_∞ = (√7−1)/2.
α = 0:  S*₁ = {(±2, 0), (0, ±2)}, η₁ = 2;  S*₂ = {x ∈ R² | ‖x‖₂ = 2}, η₂ = 2;  S*_∞ = {(±√2, ±√2)}, η_∞ = √2.]
3.1 Existence
In the following discussion, the notions of compactness, completeness, and
continuity play an important role. We assume that their definitions and fur-
ther properties are familiar from analysis. Nevertheless, let us recall the con-
tinuity of functionals. Throughout this chapter, F denotes a linear space with
norm · .
‖u_n − u‖ ⟶ 0   for n → ∞,
we have
ϕ(u_n) ⟶ ϕ(u)   for n → ∞.
Moreover, ϕ is called continuous on F, if ϕ is continuous at every u ∈ F.
Now recall that any continuous functional attains its minimum (and its
maximum) on compact sets. Any compact set is closed and bounded. The
converse, however, is only true in finite-dimensional spaces.
For the discussion in this section, we need the continuity of norms. This
requirement is already covered by the following result.
‖v_n − v‖ ⟶ 0   for n → ∞,
and therefore (by the reverse triangle inequality | ‖v_n‖ − ‖v‖ | ≤ ‖v_n − v‖)
‖v_n‖ ⟶ ‖v‖   for n → ∞,
i.e., ‖·‖ is continuous at v ∈ F. Since we did not pose any further conditions
on v ∈ F, the norm ‖·‖ is continuous on F.
ϕ(v) = ‖v − f‖   for v ∈ F,
S₀ = S ∩ {v ∈ F | ‖v − f‖ ≤ ‖s₀ − f‖} ⊂ S
so that altogether,
R_f = span{f, r₁, …, r_n} ⊂ F,
Fig. 3.2. On the geometry of the parallelogram identity (see Theorem 3.9).
which immediately follows from the definition of (·, ·). In particular, we have
Remark 3.15. The required convexity for S is necessary for the result of
Theorem 3.14. In order to see this, we regard the sequence space
ℓ² ≡ ℓ²(R) = { x = (x_k)_{k∈ℕ} ⊂ R | Σ_{k=1}^{∞} |x_k|² < ∞ }   (3.10)
where e_k ∈ ℓ² is the sequence with (e_k)_j = δ_{jk}, for j, k ∈ ℕ. Note that the
elements x^{(k)} ∈ S are isolated in ℓ², and so S is closed. But S is not convex.
Now we have η(0, S) = 1 for the minimal distance between 0 ∈ ℓ² and S,
and, moreover,
‖x^{(k)} − 0‖₂ > 1   for all x^{(k)} ∈ S.
Hence there exists no x(k) ∈ S with unit distance to the origin.
Finally, we remark that the result of Theorem 3.14 does not generalize
to Banach spaces. To see this, a counterexample can for instance be found
in [42, Section 5.2].
3.2 Uniqueness
In the following discussion, the notion of (strict) convexity for point sets,
functions, functionals and norms plays an important role. Recall the relevant
definitions for sets (see Definition 3.13) and for functions (see Definition 3.20),
as these should be familiar from analysis.
Now we note some fundamental results, where F denotes, throughout this
section, a linear space with norm · . We start with a relevant example for
a convex set.
i.e., s∗λ = λs∗1 + (1 − λ)s∗2 ∈ [s∗1 , s∗2 ], for λ ∈ [0, 1], lies in S ∗ .
Fig. 3.3. S is not convex; for s* ∈ [s*₁, s*₂] we have ‖s* − f‖ < η(f, S), cf. Remark 3.17.
To further illustrate this, let us make one simple example.
Example 3.19. For S = {x ∈ R2 | x∞ ≤ 1} and f = (2, 0), the set S ∗ of
best approximations to f from S with respect to the maximum norm · ∞
is given by
S* = { (1, α) ∈ R² | α ∈ [−1, 1] } ⊂ S
with the minimal distance
η(f, S) = inf_{s∈S} ‖s − f‖_∞ = 1.
For s∗1 , s∗2 ∈ S ∗ every element s∗ ∈ [s∗1 , s∗2 ] lies in S ∗ (see Figure 3.4). ♦
Fig. 3.4. S* = {(1, α) ∈ R² | α ∈ [−1, 1]} is the set of best approximations to
f = (2, 0) from S = {x ∈ R² | ‖x‖_∞ ≤ 1} with respect to ‖·‖_∞ (see Example 3.19).
holds; f is said to be strictly convex on [a, b] if for all x, y ∈ [a, b] with x ≠ y
we have
f(λx + (1 − λ)y) < λf(x) + (1 − λ)f(y)   for all λ ∈ (0, 1).
holds. If f is strictly convex, then equality holds, if and only if all points
coincide, i.e., x1 = . . . = xn .
Proof. We prove the statement of Jensen’s inequality by induction on n.
Initial step: For n = 2, the statement of Jensen’s inequality is obviously true.
Induction hypothesis: Assume the statement holds for n points {x1 , . . . , xn }.
Induction step (n −→ n + 1): For n + 1 points {x1 , . . . , xn , xn+1 } ⊂ [a, b] and
λ₁, …, λ_n, λ_{n+1} ∈ (0, 1) with Σ_{j=1}^{n} λ_j = 1 − λ_{n+1}
we have
f( Σ_{j=1}^{n+1} λ_j x_j ) = f( (1 − λ_{n+1}) Σ_{j=1}^{n} ( λ_j/(1 − λ_{n+1}) ) x_j + λ_{n+1} x_{n+1} )
                        ≤ (1 − λ_{n+1}) f( Σ_{j=1}^{n} ( λ_j/(1 − λ_{n+1}) ) x_j ) + λ_{n+1} f(x_{n+1})
5 Johan Ludwig Jensen (1859-1925), Danish mathematician
holds.
Remark 3.23. Every norm ‖·‖ : F ⟶ [0, ∞) is a convex functional on F.
Indeed, for any u, v ∈ F we find the inequality
‖λu + (1 − λ)v‖ ≤ λ‖u‖ + (1 − λ)‖v‖   for all λ ∈ [0, 1],   (3.15)
due to the triangle inequality and the homogeneity of ‖·‖. Moreover, equality
in (3.15) holds for all pairs of linearly dependent elements u, v ∈ F with
u = αv for a positive scalar α > 0, i.e., we have
‖λαv + (1 − λ)v‖ = (λα + 1 − λ)‖v‖ = λ‖αv‖ + (1 − λ)‖v‖   for all λ ∈ [0, 1],
by the homogeneity of ‖·‖.
We introduce the notion of a strictly convex norm classically as follows.
Definition 3.24. A norm · is called strictly convex on F, if the unit
ball B = {u ∈ F | u ≤ 1} ⊂ F is strictly convex.
As we will show, not every norm is strictly convex. But before we do
so, our ”classical” introduction for strictly convex norms in Definition 3.24
deserves a comment.
ϕ(λu + (1 − λ)v) < λϕ(u) + (1 − λ)ϕ(v) for all λ ∈ (0, 1), (3.17)
then no norm would be strictly convex in this particular sense! This important
observation is verified by the counterexample in (3.16).
When working with strictly convex norms · (according to Defini-
tion 3.24), we can exclude non-uniqueness of best approximations, if S ⊂ F
is convex. To explain this, we need to further analyze strictly convex norms.
To this end, we first prove the following useful characterization.
Theorem 3.26. Let F be a linear space with norm ‖·‖. Then the following
statements are equivalent.
(a) The norm ‖·‖ is strictly convex.
(b) The unit ball B = {u ∈ F | ‖u‖ ≤ 1} ⊂ F is strictly convex.
(c) The inequality ‖u + v‖ < 2 holds for all u ≠ v with ‖u‖ = ‖v‖ = 1.
(d) The equality ‖u + v‖ = ‖u‖ + ‖v‖, v ≠ 0, implies u = αv for some α ≥ 0.
Proof. Note that the equivalence (a) ⇔ (b) holds by Definition 3.24.
(b) ⇒ (c): The strict convexity of B implies ‖(u + v)/2‖ < 1 for u ≠ v with
‖u‖ = ‖v‖ = 1, and so in this case we have ‖u + v‖ < 2.
(c) ⇒ (d): For u = 0 statement (d) holds with α = 0. Now suppose u, v ∈
F \ {0} satisfy ‖u + v‖ = ‖u‖ + ‖v‖. Without loss of generality, we may
assume ‖u‖ ≤ ‖v‖ (otherwise we swap u and v). In this case, in the sequence
of inequalities
2 ≥ ‖ u/‖u‖ + v/‖v‖ ‖ = ‖ (u + v)/‖u‖ − v/‖u‖ + v/‖v‖ ‖
  ≥ ‖u + v‖/‖u‖ − ‖ v/‖u‖ − v/‖v‖ ‖ = (‖u‖ + ‖v‖)/‖u‖ − (1/‖u‖ − 1/‖v‖)‖v‖ = 2
equality holds everywhere, in particular
‖ u/‖u‖ + v/‖v‖ ‖ = 2.
provided that ‖u‖ < 1 or ‖v‖ < 1. Otherwise, i.e., if ‖u‖ = ‖v‖ = 1, we have
If ‖λu + (1 − λ)v‖ = 1, then we have λu = α(1 − λ)v for some α > 0 from (d).
Therefore, we have u = v, since ‖u‖ = ‖v‖. This, however, is in contradiction
to the assumption u ≠ v. Therefore, we have, also for this case,
Remark 3.27. The absolute value |·| is a strictly convex norm on R. Indeed,
in the equivalence (c) of Theorem 3.26 we can only use the two points u = −1
and v = 1, where we have |u + v| = 0 < 2. But note that the absolute value,
when regarded as a function | · | : R −→ R is not strictly convex on R.
equipped with the ℓ^∞-norm
‖x‖_∞ := sup_{k∈ℕ} |x_k|   for x = (x_k)_{k∈ℕ} ∈ ℓ^∞.
equipped with the ℓ^p-norm
‖x‖_p := ( Σ_{k=1}^{∞} |x_k|^p )^{1/p}   for x = (x_k)_{k∈ℕ} ∈ ℓ^p.
To further analyze the ℓ^p-norms we prove the Hölder inequality.
|x_k|^{p−1} = α|y_k|   with α = ‖x‖_p^{p−1}/‖y‖_q > 0   for y ≠ 0.   (3.20)
x = (x_k)_{k∈ℕ} ∈ ℓ^p   and   y = (y_k)_{k∈ℕ} ∈ ℓ^q.
by the Jensen inequality, Theorem 3.21, here applied to the strictly convex
function −log : (0, ∞) ⟶ R. This yields the Young inequality
|x_k y_k| / (‖x‖_p ‖y‖_q) = ( |x_k|^p/‖x‖_p^p )^{1/p} ( |y_k|^q/‖y‖_q^q )^{1/q}
                          ≤ (1/p)·|x_k|^p/‖x‖_p^p + (1/q)·|y_k|^q/‖y‖_q^q,   (3.22)
and this already proves the Hölder inequality (3.19), with equality if and
only if (3.20) holds for all k ∈ ℕ.
Proof. For 1 < p < ∞, let 1 < q < ∞ be the conjugate Hölder exponent of p
satisfying 1/p + 1/q = 1.
For
x = (x_k)_{k∈ℕ} and y = (y_k)_{k∈ℕ} ∈ ℓ^p,
where x ≠ y and ‖x‖_p = ‖y‖_p = 1, we wish to prove the inequality
in particular,
‖x + y‖_p ≤ 2   for ‖x‖_p = ‖y‖_p = 1.
If ‖x + y‖_p = 2 for ‖x‖_p = ‖y‖_p = 1, then we have equality in both (3.26)
and (3.27). But equality in (3.27) is by (3.20) equivalent to the two conditions
|x_k|^{p−1} = α|s_k|   and   |y_k|^{p−1} = α|s_k|   with α = 1/‖s‖_q,
which implies
|x_k| = |y_k|   for all k ∈ ℕ.
In this case, we have equality in (3.26) if and only if sgn(x_k) = sgn(y_k), for
all k ∈ ℕ, i.e., equality in (3.26) and (3.27) implies x = y.
Therefore, the inequality (3.25) holds for all x ≠ y with ‖x‖_p = ‖y‖_p = 1.
for 1 < p < ∞, where Lp ≡ Lp (Rd ) is the linear space of all functions
whose p-th power is Lebesgue9 integrable. Indeed, in this case (in analogy to
Theorem 3.29) the Hölder inequality
holds for 1 < p, q < ∞ satisfying 1/p + 1/q = 1. This implies (as in the proof
of Theorem 3.30) the Minkowski inequality
where for 1 < p < ∞ we have equality if and only if u = αv for some α ≥ 0
(see [35, Theorem 12.6]). Therefore, the L^p-norm ‖·‖_p, for 1 < p < ∞, is
strictly convex by the equivalence statement (d) in Theorem 3.26.
But there are norms that are not strictly convex. Here are two examples.
9 Henri Léon Lebesgue (1875-1941), French mathematician
Thus, by Theorem 3.26, statement (b), the ℓ¹-norm ‖·‖₁ is not strictly convex.
Likewise, we show that for the linear space ℓ^∞ of all bounded sequences
the ℓ^∞-norm ‖·‖_∞, defined as
‖x‖_∞ = sup_{k∈ℕ} |x_k|   for x = (x_k)_{k∈ℕ} ∈ ℓ^∞,
Example 3.34. For the linear space C([0,1]^d) of all continuous functions
on the unit cube [0,1]^d ⊂ R^d, the maximum norm ‖·‖_∞, defined as
‖u‖_∞ := max_{x∈[0,1]^d} |u(x)|   for u ∈ C([0,1]^d),
is not strictly convex. To see this we take a continuous function u₁ ∈ C([0,1]^d)
satisfying u1 ∞ = 1 and another continuous function u2 ∈ C ([0, 1]d ) satis-
fying u2 ∞ = 1, so that |u1 | and |u2 | attain their maximum on [0, 1]d at one
point x∗ ∈ [0, 1]d , respectively, i.e.,
‖u₁‖_∞ = max_{x∈[0,1]^d} |u₁(x)| = |u₁(x*)| = |u₂(x*)| = max_{x∈[0,1]^d} |u₂(x)| = ‖u₂‖_∞ = 1.
This then implies for u_λ = λu₁ + (1 − λ)u₂ ∈ (u₁, u₂), with λ ∈ (0, 1),
|u_λ(x)| ≤ λ|u₁(x)| + (1 − λ)|u₂(x)| ≤ 1   for all x ∈ [0,1]^d,
where equality holds for x = x*, whereby ‖u_λ‖_∞ = 1 for all λ ∈ (0, 1).
In this case, the unit ball B = {u ∈ C([0,1]^d) | ‖u‖_∞ ≤ 1} is not strictly
convex, i.e., ‖·‖_∞ is not strictly convex by statement (b) in Theorem 3.26.
To make an explicit example for the required functions u1 and u2 , we take
the geometric mean ug ∈ C ([0, 1]d ) and the arithmetic mean ua ∈ C ([0, 1]d ),
3.2 Uniqueness 81
u_g(x) = (x₁·…·x_d)^{1/d} ≤ (x₁ + … + x_d)/d = u_a(x),
for x = (x₁, …, x_d) ∈ [0,1]^d. Obviously, we have ‖u_g‖_∞ = ‖u_a‖_∞ = 1, where
ug and ua attain their unique maximum on [0, 1]d at 1 = (1, . . . , 1) ∈ [0, 1]d .
♦
‖x‖_p^p = Σ_{k=1}^{d} |x_k|^p   for 1 ≤ p < ∞   and   ‖x‖_∞ = max_{1≤k≤d} |x_k|
Remark 3.36. In statements (b), (c) of Corollary 3.35, we excluded the case
d = 1, since in this univariate setting the norms · 1 and · ∞ coincide
with the strictly convex norm | · | on R (see Remark 3.27).
Theorem 3.37. Let F be a linear space, equipped with a strictly convex norm
· . Moreover, assume S ⊂ F is convex and f ∈ F. If there exists a best
approximation s∗ ∈ S to f , then s∗ is unique.
Proof. Suppose s*₁, s*₂ ∈ S are two different best approximations to f from
S, i.e., s*₁ ≠ s*₂. Then we have
‖(s*₁ + s*₂)/2 − f‖ = (1/2)‖(s*₁ − f) + (s*₂ − f)‖ < (1/2)(‖s*₁ − f‖ + ‖s*₂ − f‖) = η(f, S),   (3.28)
where the strict inequality holds by the strict convexity of ‖·‖.
Due to the assumed convexity of S, the element s* = (s*₁ + s*₂)/2 lies in S.
Moreover, s* is closer to f than s*₁ and s*₂, by (3.28). But this is in
contradiction to the optimality of s*₁ and s*₂.
We remark that the strict convexity of the norm · gives, in combination
with the convexity of S ⊂ F, only a sufficient condition for the uniqueness
of the best approximation. Now we show that this condition is not necessary.
To this end, we make a simple example.
Corollary 3.40. Let S ⊂ L^p be convex for 1 < p < ∞. Then there is for
any f ∈ L^p at most one best approximation s* ∈ S to f w.r.t. ‖·‖_p.
Corollary 3.41. Let S ⊂ ℓ^p be convex for 1 < p < ∞. Then there is for any
f ∈ ℓ^p at most one best approximation s* ∈ S to f w.r.t. ‖·‖_p.
whereby we get the minimal distance η_p(f, S) between f and S with respect
to ‖·‖_p. Again, by the uniqueness of the best approximation we obtain the
stated result by
s*_p(x) = r*_p(x) = s*_p(−x)   for all x ∈ [−1, 1].
For an alternative proof of Proposition 3.42, we refer to Exercise 3.74.
The elements of the linear space F′ are called dual functionals. On
this occasion, we recall the notions of linearity, continuity and boundedness
of functionals. We start with linearity.
Now we can introduce a norm on the dual space F′, by using the norm
‖·‖ of F. To this end, we take for any functional ϕ ∈ F′ the smallest upper
bound C ≡ C_ϕ in (3.29). To be more precise, we define by
‖ϕ‖ = sup_{u∈F, u≠0} |ϕ(u)|/‖u‖ = sup_{u∈F, ‖u‖=1} |ϕ(u)|
Proof. (a) ⇒ (b): Let ϕ be continuous at u0 ∈ F, and, moreover, let (un )n∈N
be a convergent sequence in F with limit u ∈ F. Then we have
since otherwise there would exist an upper bound N ∈ N for ϕ (i.e., ϕ would
be bounded). In this case, the sequence (vn )n∈N , defined as
v_n = u_n / |ϕ(u_n)|   for n ∈ ℕ,
is a zero sequence in F by
‖v_n‖ = 1/|ϕ(u_n)| ⟶ 0   for n → ∞,
and so, by continuity of ϕ, we have
Proof. Note that the sufficiency of the statement is covered by Theorem 3.46.
To prove the necessity, suppose that s∗ ∈ S is a best approximation to f .
Regard the open ball
B_η(f) = {u ∈ F | ‖u − f‖ < ‖s* − f‖} ⊂ F
10 Hans Hahn (1879-1934), Austrian mathematician and philosopher
11 Stefan Banach (1892-1945), Polish mathematician
12 Stanislaw Mazur (1905-1981), Polish mathematician
This implies
‖ϕ‖ = sup_{‖v‖≤1} |ϕ(v)| ≤ ϕ( (s* − f)/‖s* − f‖ )
and, moreover, by using the continuity of ϕ once more, we have
‖ϕ‖ = ϕ(s* − f)/‖s* − f‖   ⟺   ϕ(s* − f) = ‖ϕ‖ · ‖s* − f‖.
holds.
13 René Gâteaux (1889-1914), French mathematician
D_{u,v}(h) = (1/h)( ϕ(u + hv) − ϕ(u) )   for h > 0,   (3.32)
is a monotonically increasing function in h > 0, which, moreover, is bounded
below. To verify the monotonicity, we regard the convex combination
u + h₁v = ( (h₂ − h₁)/h₂ ) u + ( h₁/h₂ )( u + h₂v )   for h₂ > h₁ > 0.
The convexity of ϕ then implies the inequality
ϕ(u + h₁v) ≤ ( (h₂ − h₁)/h₂ ) ϕ(u) + ( h₁/h₂ ) ϕ(u + h₂v)
and, after elementary calculations, the monotonicity
D_{u,v}(h₁) = (1/h₁)( ϕ(u + h₁v) − ϕ(u) ) ≤ (1/h₂)( ϕ(u + h₂v) − ϕ(u) ) = D_{u,v}(h₂).
If we now form the convex combination
u = ( h₂/(h₁ + h₂) )( u − h₁v ) + ( h₁/(h₁ + h₂) )( u + h₂v )   for h₁, h₂ > 0,
we obtain, by using the convexity of ϕ, the inequality
ϕ(u) ≤ ( h₂/(h₁ + h₂) ) ϕ(u − h₁v) + ( h₁/(h₁ + h₂) ) ϕ(u + h₂v)
and, after elementary calculations, we obtain the estimate
−D_{u,−v}(h₁) = −(1/h₁)( ϕ(u − h₁v) − ϕ(u) ) ≤ (1/h₂)( ϕ(u + h₂v) − ϕ(u) ) = D_{u,v}(h₂).   (3.33)
This implies that the monotonically increasing difference quotient D_{u,v} is
bounded from below for all u, v ∈ F. In particular, D_{u,−v} is a monotonically
increasing function that is bounded from below. Therefore, the Gâteaux
derivatives ϕ₊(u, v) and ϕ₊(u, −v) exist. By (3.33), we finally have
−(1/h)( ϕ(u − hv) − ϕ(u) ) ≤ −ϕ₊(u, −v) ≤ ϕ₊(u, v) ≤ (1/h)( ϕ(u + hv) − ϕ(u) )
for all h > 0, as stated.
holds for all λ ∈ [0, 1], by using properties (a) and (b).
Remark 3.52. By the properties (a) and (b) in Theorem 3.51, we call the
functional ϕ+ (u, ·) : F −→ R sublinear. We can show that the sublinearity of
ϕ+ (u, ·), for all u ∈ F, in combination with the inequality
holds.
Moreover, we have
(F ∘ ϕ)₊(u, v) = lim_{h↘0} (1/h)( F(ϕ(u + hv)) − F(ϕ(u)) )
             = lim_{h↘0} (1/h)( F(x_h) − F(x) )
             = lim_{h↘0} G(x_h) · lim_{h↘0} (1/h)( x_h − x )
             = lim_{h↘0} G(ϕ(u + hv)) · lim_{h↘0} (1/h)( ϕ(u + hv) − ϕ(u) ),
proving both the existence of (F ∘ ϕ)₊(u, v) and the chain rule in (3.34).
Proof. (b) ⇒ (a): Suppose ϕ+ (u0 , u−u0 ) ≥ 0 for u ∈ K. Then we have, due to
the monotonicity of the difference quotient Du0 ,u−u0 in (3.32), in particular
for h = 1,
ϕ_f(v) = ‖v − f‖   for v ∈ F.
ϕ_f(λv₁ + (1 − λ)v₂) = ‖λv₁ + (1 − λ)v₂ − f‖ = ‖λ(v₁ − f) + (1 − λ)(v₂ − f)‖
                    ≤ λ‖v₁ − f‖ + (1 − λ)‖v₂ − f‖ = λϕ_f(v₁) + (1 − λ)ϕ_f(v₂).
Therefore, ϕf has a Gâteaux derivative, for which the chain rule (3.34) holds.
Now the direct characterization from Theorem 3.54 can be applied to the
distance functional ϕf . This leads us to a corresponding equivalence, which
is referred to as Kolmogorov14 criterion.
For the Gâteaux derivative of the norm ϕ = ‖·‖ : F ⟶ R we will
henceforth use the notation ‖·‖₊(u, v) := lim_{h↘0} (1/h)( ‖u + hv‖ − ‖u‖ ).
Remark 3.56. For proving the implication (b) ⇒ (a) in Theorem 3.54 we
did not use the convexity of K. Therefore, we can specialize the equivalence
in Corollary 3.55 to establish the implication
Theorem 3.58. Let F be a linear space with norm · and S ⊂ F be convex.
Moreover, suppose f ∈ F. Then the following statements are equivalent.
(a) s* ∈ S is the strongly unique best approximation to f.
(b) There is an α > 0 satisfying ‖·‖₊(s* − f, s − s*) ≥ α‖s − s*‖ for all s ∈ S.
‖s*_g − s*_f‖ ≤ (2/α₀) ‖g − f‖   for all f, g ∈ F,
Lipschitz continuous on F with Lipschitz constant 2/α₀ (see Definition 6.64).
This implies
|·|₊(x, y) = lim_{h↘0} (1/h)( |x + hy| − |x| ) = lim_{h↘0} (1/h)( |x| + hy·sgn(x) − |x| ) = y·sgn(x).
C (Ω) = {u : Ω −→ R | u continuous on Ω}
(1/h)( ‖u + hv‖_∞ − ‖u‖_∞ ) ≥ (1/h)( |u(x) + hv(x)| − |u(x)| )
                           = (1/h)( |u(x)| + hv(x)·sgn(u(x)) − |u(x)| )
                           = v(x)·sgn(u(x))
for h|v(x)| < |u(x)|, which by h ↘ 0 already implies the stated inequality.
"≤": To verify the inequality
‖·‖₊(u, v) ≤ max_{x∈Ω, |u(x)|=‖u‖_∞} v(x)·sgn(u(x)),
Since χ_{Ω_h} ⟶ χ_{Ω₊}, or, χ_{Ω₊\Ω_h} ⟶ 0, for h ↘ 0, the statement in (3.37)
follows from the representations (3.38), (3.39) and (3.40).
To compute the Gâteaux derivatives for the remaining L^p-norms ‖·‖_p,
‖u‖_p = ( ∫_Ω |u(x)|^p dx )^{1/p}   for u ∈ C(Ω),
Since χ_{Ω_h} ⟶ χ_{Ω₊}, or, χ_{Ω₊\Ω_h} ⟶ 0, for h ↘ 0, the stated representation
in (3.41) follows from (3.42), (3.43), (3.44), and (3.45).
Now we can finally provide the Gâteaux derivatives for the remaining
L^p-norms ‖·‖_p, for 1 < p < ∞.
Theorem 3.67. Let Ω ⊂ R^d be compact. Moreover, suppose 1 < p < ∞.
Then, for the Gâteaux derivative of the L^p-norm ‖·‖ = ‖·‖_p on C(Ω), we have
‖·‖₊(u, v) = ( 1/‖u‖_p^{p−1} ) ∫_Ω |u(x)|^{p−1} v(x) sgn(u(x)) dx
Proof. The statement follows from the chain rule (3.34) in Theorem 3.53 with
F(x) = x^p in combination with the representation of the Gâteaux derivative
(ϕ^p)₊ in Lemma 3.66, whereby
ϕ₊(u, v) = (ϕ^p)₊(u, v) / ( p·ϕ^{p−1}(u) ) = ( p/(p·‖u‖_p^{p−1}) ) ∫_Ω |u(x)|^{p−1} v(x) sgn(u(x)) dx,
3.5 Exercises
Exercise 3.68. Consider approximating the parabola f (x) = x2 on the unit
interval [0, 1] by linear functions of the form
g_ξ(x) = ξ·x   for ξ ∈ R
η_p(ξ) = ‖g_ξ − f‖_p.
along with the minimal distance η_p(ξ*), for each of the three cases p = 1, 2, ∞.
Exercise 3.69. Suppose we wish to approximate the identity f(x) = x on
the unit interval [0, 1], by an exponential sum of the form
min_{s∈S} ‖s − f‖.
Exercise 3.72. Let (F, ‖·‖) be a normed linear space whose norm ‖·‖ is not
strictly convex. Show that there exist an element f ∈ F, a linear subspace
S ⊂ F, and distinct best approximations s*₁, s*₂ ∈ S to f, s*₁ ≠ s*₂, satisfying
Exercise 3.73. Transfer the result of Proposition 3.42 to the case of odd
functions f ∈ C [−1, 1]. To this end, formulate and prove a corresponding
result for subsets S ⊂ C [−1, 1] that are invariant under point reflections,
i.e., for any s(x) ∈ S, we have −s(−x) ∈ S.
ϕ(f) = Σ_{k=0}^{n} λ_k f(x_k)   for f ∈ C[a, b],
‖ϕ‖_∞ = Σ_{k=0}^{n} |λ_k|.
Exercise 3.78. Consider the linear space F = C([0,1]²), equipped with the
maximum norm ‖·‖_∞. Approximate the function
ϕ(g) = Σ_{j=1}^{4} λ_j g(x_j, y_j)   for g ∈ F
(b) Assume that the Gâteaux derivative ϕ+ (u, v) exists for all u, v ∈ F.
Moreover, assume that ϕ+ (u, ·) : F −→ R is sublinear for all u ∈ F. If
the inequality
Theorem 4.1. Let F be a Euclidean space with inner product (·, ·). More-
over, suppose S ⊂ F is a convex subset of F. Then the following statements
are equivalent.
(a) s∗ ∈ S is a best approximation to f ∈ F \ S.
(b) We have (s∗ − f, s − s∗ ) ≥ 0 for all s ∈ S.
Note that the equivalence statement in Remark 4.2 identifies a best ap-
proximation s∗ ∈ S to f ∈ F as the unique orthogonal projection of f onto
S. In Section 4.2, we will study the projection operator Π : F −→ S, which
assigns every f ∈ F to its unique best approximation s∗ ∈ S in more detail.
Before doing so, we first use the orthogonality in (4.1) to characterize best
approximations s∗ ∈ S for convex subsets S ⊂ F. To this end, we work with
the dual characterization of Theorem 3.46.
Theorem 4.3. Let F be a Euclidean space with inner product (·, ·) and let
S ⊂ F be a convex subset of F. Moreover, suppose that s∗ ∈ S satisfies
s∗ − f ⊥ S. Then, s∗ is the unique best approximation to f .
satisfies all three conditions from the dual characterization of Theorem 3.46:
Indeed, the first condition, ‖ϕ‖ = 1, follows from the Cauchy-Schwarz
inequality,
|ϕ(u)| = | ( (s* − f)/‖s* − f‖ , u ) | ≤ ( ‖s* − f‖/‖s* − f‖ )·‖u‖ = ‖u‖   for all u ∈ F,
where equality holds for u = s* − f ∈ F, since
ϕ(s* − f) = ( (s* − f)/‖s* − f‖ , s* − f ) = ‖s* − f‖²/‖s* − f‖ = ‖s* − f‖.
Therefore, ϕ also satisfies the second condition in Theorem 3.46. By s* − f ⊥ S
we have
ϕ(s) = ( (s* − f)/‖s* − f‖ , s ) = 0   for all s ∈ S,
and so ϕ finally satisfies the third condition in Theorem 3.46.
In conclusion, s* is a best approximation to f. The uniqueness of s* follows
from the strict convexity of the Euclidean norm ‖·‖ = (·,·)^{1/2}.
S = span{s₁, …, s_n} ⊂ F
1 Augustin-Louis Cauchy (1789-1857), French mathematician
2 Hermann Amandus Schwarz (1843-1921), German mathematician
c* = ( (f, s₁), …, (f, s_n) )ᵀ ∈ R^n.
Theorem 4.5. Let F be a Euclidean space with inner product (·, ·). More-
over, let S ⊂ F be a finite-dimensional linear subspace with orthogonal basis
{s1 , . . . , sn }. Then, for any f ∈ F,
s* = Σ_{j=1}^{n} ( (f, s_j)/‖s_j‖² ) s_j ∈ S   (4.4)
= Σ_{j,k=1}^{n} ( (f, s_j)/‖s_j‖² )( (g, s_k)/‖s_k‖² )(s_j, s_k) = Σ_{j=1}^{n} (f, s_j)(g, s_j)/‖s_j‖²
Proof. The Bessel inequality follows from the second stability estimate in (4.7)
in combination with the representation in (4.11). The second statement fol-
lows from the Pythagoras theorem (4.6) and the representation (4.11).
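As a small numerical illustration of the projection formula (4.4), not tied to any particular function space, the following sketch computes the best approximation from the span of an orthogonal (not necessarily orthonormal) basis in R³ by summing the scaled inner products; the vectors are arbitrary examples.

```python
import numpy as np

def project(f, basis):
    """Orthogonal projection of f onto span(basis) for an orthogonal basis,
    following (4.4): s* = sum_j (f, s_j)/||s_j||^2 * s_j."""
    return sum((f @ s) / (s @ s) * s for s in basis)

s1 = np.array([1.0, 1.0, 0.0])            # orthogonal basis of a plane in R^3
s2 = np.array([1.0, -1.0, 2.0])
f  = np.array([3.0, 1.0, 4.0])

s_star = project(f, [s1, s2])
print((f - s_star) @ s1, (f - s_star) @ s2)   # both ~ 0: the residual is orthogonal to S
```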
for j ≠ k and
(sin(j·), cos(k·)) = (1/(2π)) ∫₀^{2π} [ sin((j−k)x) + sin((j+k)x) ] dx = 0
and
(cos(j·), cos(j·)) = (1/(2π)) ∫₀^{2π} [ 1 + cos(2jx) ] dx = 1,
(sin(j·), sin(j·)) = (1/(2π)) ∫₀^{2π} [ 1 − cos(2jx) ] dx = 1,
where we use the representations in (4.16) and (4.17) yet once more.
We now connect to the results of Theorems 4.5 and 4.11, where we can,
for any function f ∈ C2π , represent its unique best approximation s∗ ∈ Tn
by
s*(x) = (f, 1/√2)·(1/√2) + Σ_{j=1}^{n} [ (f, cos(j·)) cos(jx) + (f, sin(j·)) sin(jx) ].   (4.19)
(F_n f)(x) = a₀/2 + Σ_{j=1}^{n} [ a_j cos(jx) + b_j sin(jx) ].   (4.20)
The Fourier partial sum (4.20) is split into an even part, given by the
partial sum of the even trigonometric polynomials {cos(j·), 0 ≤ j ≤ n} with
”even” Fourier coefficients aj , and into an odd part, given by the partial
sum of the odd trigonometric polynomials {sin(j·), 1 ≤ j ≤ n} with ”odd”
Fourier coefficients bj . We can show that for any even function f ∈ C2π , all
odd Fourier coefficients bj vanish. Likewise, for an odd function f ∈ C2π , all
even Fourier coefficients aj vanish. On this occasion, we recall the result of
Proposition 3.42, from which these statements immediately follow. But we
wish to compute the Fourier coefficients explicitly.
This completes our proof for (a). We can prove (b) analogously.
Example 4.14. We consider approximating the periodic function f ∈ C2π ,
defined as f (x) = π −|x|, for x ∈ [−π, π]. To this end, we determine for n ∈ N
the Fourier coefficients aj , bj of the Fourier partial sum Fn f . Since f is an
even function, we can apply Corollary 4.13, statement (a). From this, we see
that bj = 0, for all 1 ≤ j ≤ n, and, moreover,
a_j = (2/π) ∫₀^π f(x) cos(jx) dx   for 0 ≤ j ≤ n.
Integration by parts gives
∫₀^π f(x) cos(jx) dx = [ (1/j) f(x) sin(jx) ]₀^π − (1/j) ∫₀^π f′(x) sin(jx) dx
                    = (1/j) ∫₀^π sin(jx) dx = −(1/j²) [ cos(jx) ]₀^π   for 1 ≤ j ≤ n,
a_j = 4/(π j²)   for all odd indices j ∈ {1, …, n}.
We finally compute the Fourier coefficient a₀ by
a₀ = (f, 1) = (1/π) ∫₀^{2π} f(x) dx = (2/π) ∫₀^π (π − x) dx = [ −(1/π)(π − x)² ]₀^π = π.
(F_n f)(x) = π/2 + Σ_{j=1}^{n} a_j cos(jx) = π/2 + (4/π) Σ_{j=1, j odd}^{n} cos(jx)/j²
           = π/2 + (4/π) Σ_{k=0}^{⌊(n−1)/2⌋} cos((2k+1)x)/(2k+1)²
(F_n f)(x) = Σ_{j=−n}^{n} c_j e^{ijx}.   (4.23)
For the conversion of the Fourier coefficients, we apply the linear mapping
in (2.69), whereby, using the Euler formula (2.67), we obtain for the
complex Fourier coefficients in (4.23) the representation
c_j = (1/(2π)) ∫₀^{2π} f(x) e^{−ijx} dx   for j = −n, …, n.   (4.24)
We remark that the complex Fourier coefficients c_j in (4.24) can also, like the
real Fourier coefficients a_j in (4.21) and b_j in (4.22), be expressed via inner
products. In fact, by using the complex inner product (·,·)_C in (4.14), we can
rewrite the representation in (4.24) as c_j = (f, e^{ij·})_C, for j = −n, …, n.
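The Fourier partial sums of Example 4.14, shown in the following figures, can be reproduced with a short sketch (illustration only): it evaluates (F_n f) with the exact coefficients a₀ = π and a_j = 4/(πj²) for odd j derived above, and reports the maximum error on [−π, π] for the cases n = 2, 4, 16 of Figures 4.1 to 4.3.

```python
import numpy as np

f = lambda x: np.pi - np.abs(x)           # the 2*pi-periodic target of Example 4.14

def fourier_partial_sum(n, x):
    """Evaluate (F_n f)(x) with a_0 = pi and a_j = 4/(pi j^2) for odd j."""
    s = np.full_like(x, np.pi / 2)
    for j in range(1, n + 1, 2):          # only odd j contribute
        s += 4.0 / (np.pi * j**2) * np.cos(j * x)
    return s

x = np.linspace(-np.pi, np.pi, 1001)
for n in (2, 4, 16):
    print(n, np.max(np.abs(fourier_partial_sum(n, x) - f(x))))   # error decreases with n
```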
Fig. 4.1. Approximation of the function f(x) = π − |x| on the interval [−π, π] by
the Fourier partial sum (F₂f)(x) (top) and the error function F₂f − f (bottom)
(see Example 4.14).
Fig. 4.2. Approximation of the function f(x) = π − |x| on the interval [−π, π] by
the Fourier partial sum (F₄f)(x) (top) and the error function F₄f − f (bottom)
(see Example 4.14).
Fig. 4.3. Approximation of the function f(x) = π − |x| on the interval [−π, π] by
the Fourier partial sum (F₁₆f)(x) (see Example 4.14).
where ωN = e2πi/N denotes the N -th root of unity in (2.74). In this way, the
vector c = (c−n , . . . , cn )T ∈ CN of the complex Fourier coefficients (4.24) is
approximated by the discrete Fourier transform (2.79) from the data vector
f = (f0 , . . . , fN −1 )T ∈ RN ,
• Is the Fourier series F_∞f of f convergent?
• If so, does the Fourier series F∞ f converge to f ?
• If so, how fast does the Fourier series F∞ f converge to f ?
In particular, we will investigate, if at all, in which sense (e.g. pointwise,
or uniformly, or with respect to the Euclidean norm · ) the convergence of
the Fourier series F∞ f holds. In Chapter 6, we will give answers, especially
for the asymptotic behaviour of the approximation error
equipped with an inner product, yielding the Euclidean norm ‖·‖_w = (·,·)_w^{1/2},
so that
‖f‖_w² = ∫_a^b |f(x)|² w(x) dx   for f ∈ C[a, b].
Later in this section, we make concrete examples for the weight function w.
To approximate functions from C [a, b], we apply Theorem 4.5, so that
we can, for f ∈ C [a, b], represent the unique best approximation s∗ ∈ Pn
to f explicitly. In order to do so, we need an orthogonal system for Pn . To
this end, we propose an algorithm, which constructs for any weighted inner
product (·, ·)w an orthogonal basis
{p0 , p1 , . . . , pn } ⊂ Pn
is the orthogonal projection of the monomial xk+1 onto Pk w.r.t. (·, ·)w .
Therefore, the polynomials p0 , . . . , pn form an orthogonal basis for Pn .
Note that the orthogonalization method of Gram-Schmidt guarantees,
for any weighted inner product (·, ·)w , the existence of an orthogonal basis
for Pn with respect to (·, ·)w . Moreover, the Gram-Schmidt construction of
orthogonal polynomials in Algorithm 4 is unique up to n + 1 (non-vanishing)
scaling factors, one for the initialization (in line 2) and one for each of the
n for loop cycles (line 4). The scaling factors could be used to normalize the
orthogonal system of polynomials, where the following options are commonly
used.
• Normalization of the leading coefficient
p0 ≡ 1 and pk (x) = xk + qk−1 (x) for some qk−1 ∈ Pk−1 for k = 1, . . . , n
• Normalization at one
pk (1) = 1 for all k = 0, . . . , n
• Normalization of norm (orthonormalization)
Let p0 := p0 /p0 w (line 2) and pk := pk /pk w (line 4), k = 1, . . . , n.
However, the Gram-Schmidt algorithm is problematic for numerical rea-
sons. In fact, on the one hand, it is unstable, especially for input bases B with
almost linearly dependent basis elements. On the other hand, the Gram-
Schmidt algorithm is very inefficient. In contrast, the following three-term
recursion is much more suitable for efficient and stable constructions of or-
thogonal polynomials.
Theorem 4.16. For any weighted inner product (·,·)_w, there are unique orthogonal
polynomials p_k ∈ 𝒫_k, for k ≥ 0, with leading coefficient one. The
orthogonal polynomials (p_k)_{k∈ℕ₀} satisfy the three-term recursion
p_k(x) − x·p_{k−1}(x) = Σ_{j=0}^{k−1} c_j p_j(x)   with   c_j = (p_k − x·p_{k−1}, p_j)_w / ‖p_j‖_w².
pk (x) = (x + ck−1 )pk−1 (x) + ck−2 pk−2 (x) = (x + ak )pk−1 (x) + bk pk−2 (x)
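For illustration, the following short Python sketch builds the monic orthogonal polynomials directly from this three-term recursion, with the coefficients $a_k = -(x\,p_{k-1}, p_{k-1})_w/\|p_{k-1}\|_w^2$ and $b_k = -\|p_{k-1}\|_w^2/\|p_{k-2}\|_w^2$ (the same pattern used for the Legendre case below); the function names and the Gauss–Legendre quadrature used for the weighted inner products are our own choices, not part of the text.

```python
import numpy as np
from numpy.polynomial import Polynomial

def monic_orthogonal_polynomials(weight, a, b, n, quad_nodes=200):
    """Monic orthogonal polynomials p_0,...,p_n w.r.t. (f,g)_w = int_a^b f g w dx,
    built with the three-term recursion p_k = (x + a_k) p_{k-1} + b_k p_{k-2}."""
    # Gauss-Legendre quadrature on [a, b] for the weighted inner product (a sketch)
    t, wq = np.polynomial.legendre.leggauss(quad_nodes)
    x = 0.5 * (b - a) * t + 0.5 * (b + a)
    wq = 0.5 * (b - a) * wq * weight(x)
    inner = lambda p, q: np.sum(wq * p(x) * q(x))

    X = Polynomial([0.0, 1.0])            # the monomial x
    polys = [Polynomial([1.0])]           # p_0 = 1
    for k in range(1, n + 1):
        p1 = polys[-1]
        ak = -inner(X * p1, p1) / inner(p1, p1)
        pk = (X + ak) * p1
        if k >= 2:
            p2 = polys[-2]
            bk = -inner(p1, p1) / inner(p2, p2)
            pk = pk + bk * p2
        polys.append(pk)
    return polys

# Example: w = 1 on [-1, 1] reproduces the monic Legendre polynomials,
# e.g. p_2(x) = x^2 - 1/3 and p_3(x) = x^3 - (3/5)x (cf. Table 4.1 below).
if __name__ == "__main__":
    P = monic_orthogonal_polynomials(lambda x: np.ones_like(x), -1.0, 1.0, 4)
    print(P[2].coef)   # approximately [-1/3, 0, 1]
    print(P[3].coef)   # approximately [0, -3/5, 0, 1]
```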
Theorem 4.18. Let g ∈ C [a, b] satisfy (g, p)w = 0 for all p ∈ Pn , i.e.,
g ⊥ Pn , for n ∈ N0 . Then, either g vanishes identically on [a, b] or g has at
least n + 1 zeros with changing sign in (a, b).
with changing sign. Then, the product g · p between g and the polynomial
$$p(x) = \prod_{j=1}^{k}(x - x_j) \in P_k \subset P_n$$
Proof. On the one hand, by Theorem 4.18, pn has at least n pairwise distinct
zeros in (a, b). Now suppose pn ≡ 0. Since pn is an algebraic polynomial in
Pn \ {0}, pn has, on the other hand, at most n zeros. Altogether, pn has
exactly n zeros in (a, b), where each zero must be simple.
already in Section 2.5. Let us first recall some of the basic properties of the
Chebyshev polynomials Tn ∈ Pn , in particular the three-term recursion from
Theorem 2.27,
by using Theorem 4.11. Theorem 4.11 also yields the stated values for the squared norms $\|T_k\|_w^2 = (T_k, T_k)_w$.
8 Gábor Szegő (1895–1985), Hungarian mathematician
Indeed, this follows directly by induction from the three-term recursion (4.28).
Due to Corollary 2.28, the n-th Chebyshev polynomial Tn has, for n ≥ 1, the
leading coefficient 2n−1 , and so the scaled polynomial
where the form of the Chebyshev partial sum in (4.32) reminds us of the form
of the Fourier partial sum Fn f from Corollary 4.12. Indeed, the coefficients
in the series expansion for the best approximation Πn f in (4.32) can be
identified as Fourier coefficients.
$$\Pi_n f = \frac{a_0}{2} + \sum_{k=1}^{n} a_k\,T_k. \qquad (4.33)$$
Proof. For f ∈ C [−1, 1], the coefficients (f, Tk )w in (4.32) can be computed
by using the substitution φ = arccos(x):
$$(f, T_k)_w = \int_{-1}^{1}\frac{f(x)\,T_k(x)}{\sqrt{1-x^2}}\,dx = \int_{0}^{\pi} f(\cos\varphi)\cos(k\varphi)\,d\varphi = \frac{\pi}{2}\cdot\frac{1}{\pi}\int_{0}^{2\pi} f(\cos x)\cos(kx)\,dx = \frac{\pi}{2}\,a_k(g),$$
$$\frac{(f, T_0)_w}{\|T_0\|_w^2} = \frac{\pi}{2}\cdot\frac{1}{\pi}\,a_0(g) = \frac{a_0(g)}{2}.$$
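The coefficient formula just derived lends itself to a direct numerical computation of the Chebyshev coefficients. The following short Python sketch is our own illustration (the midpoint rule and the function name are assumptions, not part of the text):

```python
import numpy as np

def chebyshev_coefficients(f, n, m=4000):
    """Coefficients a_0,...,a_n of the Chebyshev partial sum
    Pi_n f = a_0/2 + sum_k a_k T_k, computed from
    a_k = (2/pi) * int_0^pi f(cos(phi)) cos(k*phi) dphi  (midpoint rule)."""
    phi = (np.arange(m) + 0.5) * np.pi / m     # midpoint nodes on [0, pi]
    g = f(np.cos(phi))
    return np.array([(2.0 / m) * np.sum(g * np.cos(k * phi))
                     for k in range(n + 1)])

# Example: for f(x) = |x| the odd coefficients vanish,
# a_0 = 4/pi ~ 1.2732 and a_2 = 4/(3*pi) ~ 0.4244.
if __name__ == "__main__":
    print(chebyshev_coefficients(np.abs, 4))
```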
To verify the Clenshaw algorithm, we use the recursion for the Chebyshev
polynomials in (4.28). By the assignment in line 6 of the Clenshaw algorithm,
we get the representation
for the coefficients of the Chebyshev partial sum, where for k = n with
zn+1 = 0 and zn = an we get zn+2 = 0. The sum over the last n terms of the
Chebyshev partial sum (4.33) can be rewritten by using the representation
in (4.34) in combination with the recursion (4.28):
$$\sum_{k=1}^{n} a_k T_k(x) = \sum_{k=1}^{n}(z_k - 2x z_{k+1} + z_{k+2})\,T_k(x) = \sum_{k=1}^{n} z_k T_k(x) - \sum_{k=2}^{n+1} 2x\,z_k T_{k-1}(x) + \sum_{k=3}^{n+2} z_k T_{k-2}(x)$$
$$= z_1 T_1(x) + z_2 T_2(x) - 2x z_2 T_1(x) + \sum_{k=3}^{n} z_k\bigl[T_k(x) - 2x T_{k-1}(x) + T_{k-2}(x)\bigr]$$
$$= z_1 x + z_2(2x^2 - 1) - 2x z_2 x = z_1 x - z_2.$$
$$(\Pi_n f)(x) = \frac{a_0}{2} + \sum_{k=1}^{n} a_k T_k(x) = \frac{1}{2}\bigl(z_0 - 2x z_1 + z_2 + 2 z_1 x - 2 z_2\bigr) = \frac{1}{2}\,(z_0 - z_2).$$
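The computation above is exactly what the Clenshaw algorithm exploits. A compact Python sketch, assuming the recursion $z_k = a_k + 2x z_{k+1} - z_{k+2}$ from (4.34) and returning $(z_0 - z_2)/2$ (variable names are our own):

```python
import numpy as np

def clenshaw(a, x):
    """Evaluate a[0]/2 + sum_{k=1}^n a[k] T_k(x) via the Clenshaw recursion."""
    n = len(a) - 1
    b1, b2 = 0.0, 0.0                 # z_{k+1}, z_{k+2}, initialized to zero
    for k in range(n, 0, -1):         # k = n, ..., 1
        b1, b2 = a[k] + 2.0 * x * b1 - b2, b1
    z0 = a[0] + 2.0 * x * b1 - b2     # after the loop: b1 = z_1, b2 = z_2
    return 0.5 * (z0 - b2)

# Quick check against direct evaluation with T_k(x) = cos(k arccos x):
if __name__ == "__main__":
    a, x = [1.0, -0.5, 0.25, 0.125], 0.3
    direct = a[0] / 2 + sum(a[k] * np.cos(k * np.arccos(x)) for k in range(1, len(a)))
    print(clenshaw(a, x), direct)
```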
$$L_n(x) = \frac{n!}{(2n)!}\,\frac{d^n}{dx^n}\,(x^2-1)^n \quad\text{for } n \ge 0 \qquad (4.35)$$
We show that the Legendre polynomials are the (unique) orthogonal poly-
nomials with leading coefficient one, belonging to the weight function w ≡ 1.
Therefore, we regard the usual (unweighted) L2 inner product on C [−1, 1],
defined as
$$(f, g)_w := (f, g) = \int_{-1}^{1} f(x)\,g(x)\,dx \quad\text{for } f, g \in C[-1,1].$$
which implies
$$(L_n, L_k) = \frac{n!\,k!}{(2n)!\,(2k)!}\,I_{nk} = 0 \quad\text{for } n > k. \qquad (4.38)$$
9 Benjamin Olinde Rodrigues (1795–1851), French mathematician and banker
10 Adrien-Marie Legendre (1752–1833), French mathematician
$$L_{n+1}(x) = x\,L_n(x) - \frac{n^2}{4n^2-1}\,L_{n-1}(x) \quad\text{for } n \ge 1 \qquad (4.40)$$
with initial values L0 ≡ 1 and L1(x) = x.
By statement (b) in Theorem 4.25, the square $L_n^2$ of the Legendre polynomial is, for any n ∈ N0, an even function, and therefore $x\,L_n^2(x)$ is odd, so that $a_n = 0$ for all n ≥ 0.
$$L_1(x) = x$$
$$L_2(x) = x^2 - \tfrac{1}{3}$$
$$L_3(x) = x^3 - \tfrac{3}{5}x$$
$$L_4(x) = x^4 - \tfrac{6}{7}x^2 + \tfrac{3}{35}$$
$$L_5(x) = x^5 - \tfrac{10}{9}x^3 + \tfrac{5}{21}x$$
$$L_6(x) = x^6 - \tfrac{15}{11}x^4 + \tfrac{5}{11}x^2 - \tfrac{5}{231}$$
$$L_7(x) = x^7 - \tfrac{21}{13}x^5 + \tfrac{105}{143}x^3 - \tfrac{35}{429}x$$
$$L_8(x) = x^8 - \tfrac{28}{15}x^6 + \tfrac{14}{13}x^4 - \tfrac{28}{143}x^2 + \tfrac{7}{1287}$$
$$L_9(x) = x^9 - \tfrac{36}{17}x^7 + \tfrac{126}{85}x^5 - \tfrac{84}{221}x^3 + \tfrac{63}{2431}x$$
$$L_{10}(x) = x^{10} - \tfrac{45}{19}x^8 + \tfrac{630}{323}x^6 - \tfrac{210}{323}x^4 + \tfrac{315}{4199}x^2 - \tfrac{63}{46189}$$
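This table can be reproduced from the three-term recursion (4.40); the following short sketch uses exact rational arithmetic (the helper name and the coefficient-list representation are our own choices):

```python
from fractions import Fraction

def monic_legendre(n):
    """Monic Legendre polynomials via (4.40):
    L_{k+1}(x) = x*L_k(x) - k^2/(4k^2-1)*L_{k-1}(x), L_0 = 1, L_1 = x.
    Each polynomial is a list of exact coefficients, lowest degree first."""
    L = [[Fraction(1)], [Fraction(0), Fraction(1)]]
    for k in range(1, n):
        xLk = [Fraction(0)] + L[k]                 # multiply L_k by x
        c = Fraction(k * k, 4 * k * k - 1)
        pad = L[k - 1] + [Fraction(0)] * (len(xLk) - len(L[k - 1]))
        L.append([a - c * b for a, b in zip(xLk, pad)])
    return L

# L_4 should be x^4 - (6/7)x^2 + 3/35, as in the list above.
if __name__ == "__main__":
    print(monic_legendre(4)[4])   # [Fraction(3, 35), 0, Fraction(-6, 7), 0, 1]
```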
and therefore
$$b_{n+1} = -\frac{\|L_n\|^2}{\|L_{n-1}\|^2} = -\frac{n^4\,2^2\,(2n-1)}{(2n)^2\,(2n-1)^2\,(2n+1)} = -\frac{n^2}{(2n-1)(2n+1)} = -\frac{n^2}{4n^2-1} \quad\text{for } n \ge 1,$$
$$w(x) = e^{-x^2}.$$
with $P_{n+1}(x) = P_n'(x) - 2x\,P_n(x)$, where $P_{n+1} \in \mathcal{P}_{n+1}\setminus\mathcal{P}_n$ for $P_n \in \mathcal{P}_n\setminus\mathcal{P}_{n-1}$.
This yields, for the function $h(x,t) = e^{2xt - t^2}$, the series expansion
$$h(x,t) = w(x-t)\cdot e^{x^2} = \sum_{k=0}^{\infty} H_k(x)\,\frac{t^k}{k!} \quad\text{for all } x, t \in \mathbb{R}. \qquad (4.44)$$
On the other hand, by using the uniform convergence of the series for h(x, t) in (4.44), we have the representation
$$\int_{\mathbb{R}} e^{-x^2}\,h(x,t)\,h(x,s)\,dx = \int_{\mathbb{R}} e^{-x^2}\left(\sum_{k=0}^{\infty} H_k(x)\,\frac{t^k}{k!}\right)\left(\sum_{j=0}^{\infty} H_j(x)\,\frac{s^j}{j!}\right)dx = \sum_{k,j=0}^{\infty}\frac{t^k s^j}{k!\,j!}\int_{\mathbb{R}} e^{-x^2}\,H_k(x)\,H_j(x)\,dx. \qquad (4.46)$$
and so in particular
$$\|H_k\|_w^2 = 2^k\sqrt{\pi}\,k! \quad\text{for all } k \in \mathbb{N}_0.$$
This completes our proof.
Now we prove a three-term recursion for the Hermite polynomials.
Theorem 4.29. The Hermite polynomials satisfy the three-term recursion
Hn+1 (x) = 2xHn (x) − 2nHn−1 (x) for n ≥ 0 (4.48)
with the initial values H−1 ≡ 0 and H0 (x) ≡ 1.
Proof. Obviously, we have H0 ≡ 1. By applying partial differentiation to the
series expansion for h(x, t) in (4.44) with respect to variable t we get
$$\frac{\partial}{\partial t}\,h(x,t) = 2(x-t)\,h(x,t) = \sum_{k=1}^{\infty} H_k(x)\,\frac{t^{k-1}}{(k-1)!}$$
Moreover, we have
$$\sum_{k=0}^{\infty} H_k(x)\,\frac{t^{k+1}}{k!} = \sum_{k=0}^{\infty}(k+1)\,H_k(x)\,\frac{t^{k+1}}{(k+1)!} = \sum_{k=0}^{\infty} k\,H_{k-1}(x)\,\frac{t^{k}}{k!} \qquad (4.50)$$
$$H_1(x) = 2x$$
$$H_2(x) = 4x^2 - 2$$
whereby (4.52) follows from the three-term recursion for Hn+1 in (4.48).
Proof. Statement (a) follows by induction from the three-term recursion (4.48),
whereas statement (b) follows from (4.52) with H0 ≡ 1 and H1 (x) = 2x.
We can conclude that for the weighted L2 inner product (·, ·)w in (4.42)
the Hermite polynomials Hn are the unique orthogonal polynomials with
leading coefficient 2n . The Hermite polynomials Hn are, for n = 1, . . . , 8,
shown in their monomial form in Table 4.2.
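Analogously, this table can be reproduced from the recursion (4.48); a brief Python sketch (helper names are our own):

```python
from numpy.polynomial import Polynomial

def hermite_polynomials(n):
    """Hermite polynomials via (4.48): H_{k+1} = 2x H_k - 2k H_{k-1},
    with H_{-1} = 0 and H_0 = 1."""
    x = Polynomial([0.0, 1.0])
    H, prev = [Polynomial([1.0])], Polynomial([0.0])
    for k in range(n):
        H_next = 2 * x * H[k] - 2 * k * prev
        prev = H[k]
        H.append(H_next)
    return H

# H_1 = 2x, H_2 = 4x^2 - 2, and the leading coefficient of H_n is 2^n.
if __name__ == "__main__":
    H = hermite_polynomials(4)
    print(H[2].coef)   # [-2., 0., 4.]
    print(H[4].coef)   # [12., 0., -48., 0., 16.]
```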
4.5 Exercises
Exercise 4.32. Let F = C[−1, 1] be equipped with the Euclidean norm $\|\cdot\|_2$, defined by the inner product
$$(f, g) = \int_{-1}^{1} f(x)\,g(x)\,dx \quad\text{for } f, g \in C[-1,1],$$
$$F_n(x) = \frac{a_0}{2} + \sum_{j=1}^{n}\bigl[a_j\cos(jx) + b_j\sin(jx)\bigr] \quad\text{for } x \in [0, 2\pi)$$
Plot the graphs of R and the best approximation F10 R to R in one figure.
Plot the graphs of S and the best approximation F10 S to S in one figure.
Exercise 4.34. Approximate the function f (x) = 2x−1 on the unit interval
[0, 1] by a trigonometric polynomial of the form
$$T_n(x) = \frac{c_0}{2} + \sum_{k=1}^{n} c_k\cos(k\pi x) \quad\text{for } x \in [0,1]. \qquad (4.53)$$
(a) Among all polynomials p ∈ Pk with leading coefficient one, the ortho-
gonal polynomial $p_k$ is norm-minimal with respect to $\|\cdot\|_w$, i.e.,
$$\|p_k\|_w = \min\bigl\{\|p\|_w \;\big|\; p \in P_k \text{ with } p(x) = x^k + q(x) \text{ for } q \in P_{k-1}\bigr\}.$$
$$\sum_{j=0}^{k}\frac{p_j(x)\,p_j(y)}{\|p_j\|_w^2} = \frac{1}{\|p_k\|_w^2}\cdot\frac{p_{k+1}(x)\,p_k(y) - p_k(x)\,p_{k+1}(y)}{x-y}$$
and, moreover,
$$\sum_{j=0}^{k}\frac{(p_j(x))^2}{\|p_j\|_w^2} = \frac{p_{k+1}'(x)\,p_k(x) - p_k'(x)\,p_{k+1}(x)}{\|p_k\|_w^2} \quad\text{for all } x \in [a,b].$$
(c) Conclude from (b) that all zeros of pk are simple. Moreover, conclude
that pk+1 and pk have no common zeros.
Exercise 4.37. In this problem, make use of the results in Exercise 4.36.
(a) Prove for g ∈ C [−1, 1] and h(x) = x · g(x), for x ∈ [−1, 1], the relation
1
c0 (h) = c1 (g) and ck (h) = (ck−1 (g) + ck+1 (g)) for all k ≥ 1
2
between the Chebyshev coefficients ck (g) of g and ck (h) of h.
(b) Conclude from the relation in Exercise 4.36 (c) the representation
T2k (x) = Tk (2x2 − 1) for all x ∈ [−1, 1] and k ∈ N0 . (4.54)
(c) Can the representation in (4.54) be used to simplify the evaluation of
a Chebyshev partial sum for an even function in the Clenshaw algo-
rithm, Algorithm 5? If so, how could this simplification be used for the
implementation of the Clenshaw algorithm?
Exercise 4.38. On given coefficient functions ak ∈ C [a, b], for k ≥ 1, and
bk ∈ C [a, b], for k ≥ 2, let pk ∈ C [a, b], for k ≥ 0, be a function sequence
satisfying the three-term recursion
pk+1 (x) = ak+1 (x) pk (x) + bk+1 (x) pk−1 (x) for k ≥ 1
with initial functions p0 ∈ C [a, b] and p1 = a1 p0 ∈ C [a, b]. Show that the
sum
$$f_n(x) = \sum_{j=0}^{n} c_j\,p_j(x) \quad\text{for } x \in [a,b]$$
$$L_k(x) = \frac{k!}{(2k)!}\,\frac{d^k}{dx^k}\,(x^2-1)^k \quad\text{for } 0 \le k \le n$$
to determine the best approximation p∗n ∈ Pn , n ∈ N0 , to the exponential
function f (x) = e−x on [−1, 1] w.r.t. the (unweighted) Euclidean norm · 2 .
Compute the first eight coefficients c∗ = (c∗0 , . . . , c∗7 )T ∈ R8 of the sought
best approximation
$$p_n^*(x) = \sum_{k=0}^{n} c_k^*\,L_k(x) \quad\text{for } x \in [-1,1].$$
Hint: Use the recursions from Theorem 4.29 and Corollary 4.30.
5 Chebyshev Approximation
where
$$E_{s^*-f} = \{x \in \Omega : |(s^*-f)(x)| = \|s^*-f\|_\infty\} \subset \Omega$$
denotes the set of extremal points of s∗ − f in Ω.
where we have used the Gâteaux derivative of the norm · ∞ from Theo-
rem 3.64. By the linearity of S, this condition is equivalent to (5.1).
Given the result of Theorem 5.1, we can immediately solve one simple
problem of Chebyshev approximation. To this end, we regard the univariate
case, d = 1, where Ω = [a, b] ⊂ R for a compact interval. In this case, we
wish to approximate continuous functions from C [a, b] by constants.
$$c^* = \frac{f_{\min} + f_{\max}}{2} \in P_0$$
is the unique best approximation to f from P0 with respect to $\|\cdot\|_\infty$, where
for c < 0 on the other hand. Altogether, the Kolmogorov criterion from
Theorem 5.1,
$$\max_{x \in E_{c^*-f}} c\,\operatorname{sgn}(c^* - f(x)) \ge 0 \quad\text{for all } c \in P_0,$$
$$\omega_{X^*}(x) = \prod_{k=1}^{m-1}(x - x_k^*) \in P_{m-1} \subset P_{n-1}$$
$$\|p - f\|_\infty = |(p-f)(x_k)| \le \frac{1}{2}\,|(p^*-f)(x_k)| + \frac{1}{2}\,|(q^*-f)(x_k)| \le \frac{1}{2}\,\|p^*-f\|_\infty + \frac{1}{2}\,\|q^*-f\|_\infty = \|p^*-f\|_\infty = \|q^*-f\|_\infty,$$
equality holds for k = 1, . . . , n + 1. In particular, we have
for all 1 ≤ k ≤ n + 1.
Due to the strict convexity of the norm | · | (see Remark 3.27) and by the
equivalence statement (d) in Theorem 3.26, the signs of the error functions
p∗ − f and q ∗ − f must agree on {x1 , . . . , xn+1 }, i.e.,
Now we note another important corollary, which directly follows from our
observation in Proposition 3.42 and from Exercise 3.73.
Corollary 5.5. For L > 0 let f ∈ C [−L, L]. Moreover, let p∗ ∈ Pn , for
n ∈ N0 , be the unique best approximation to f from Pn with respect to · ∞ .
Then the following statements are true.
(a) If f is even, then its best approximation p∗ ∈ Pn is even.
(b) If f is odd, then its best approximation p∗ ∈ Pn is odd.
Proof. The linear space Pn of algebraic polynomials is reflection-invariant,
i.e., for p(x) ∈ Pn , we have p(−x) ∈ Pn . Moreover, by Corollary 5.4 there
exists for any f ∈ C [−L, L] a unique best approximation p∗ ∈ Pn to f from
Pn with respect to · ∞ . Without loss of generality, we assume L = 1. By
Proposition 3.42 and Exercise 3.73, both statements (a) and (b) hold.
For illustration, we apply Corollary 5.5 in the following two examples.
Example 5.6. We approximate fm (x) = sin(mx), for m ∈ N, on [−π, π] by
linear polynomials. The function fm is odd, for all m ∈ N, and so is the best
approximation p∗m ∈ P1 to fm odd. Therefore, p∗m has the form p∗m (x) = αm x
for a slope αm ≥ 0, which is yet to be determined.
Case 1: For m = 1, the constant c ≡ 0 cannot be a best approximation
to f1 (x) = sin(x), since c − f1 has only two alternation points ±π/2. By
symmetry, we can restrict our following investigations to the interval [0, π].
The function p∗1 (x) − f1 (x) = α1 x − sin(x), with α1 > 0, has two alternation
points {x∗ , π} on [0, π],
(p∗1 − f1 )(x∗ ) = α1 x∗ − sin(x∗ ) = −η and (p∗1 − f1 )(π) = α1 π = η,
where η = p∗1 − f1 ∞ is the minimal distance between f1 and P1 . Moreover,
the alternation point x∗ satisfies the condition
0 = (p∗1 − f1)'(x∗) = α1 − cos(x∗), which implies α1 = cos(x∗).
Therefore, x∗ is a solution of the nonlinear equation
cos(x∗ )(x∗ + π) = sin(x∗ ),
which we can solve numerically, whereby we obtain the alternation point x∗ ≈
1.3518, the slope α1 = cos(x∗ ) ≈ 0.2172 and the minimal distance η ≈ 0.6825.
Altogether, the best approximation p∗1 (x) = α1 x with {−π, −x∗ , x∗ , π} gives
four alternation points for p∗1 − f1 on [−π, π], see Figure 5.1 (a).
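The nonlinear equation for x∗ can be solved, e.g., by bisection; the following small Python sketch reproduces the numbers above (the bracketing interval [1, 2] is our own choice):

```python
import numpy as np

def bisect(g, lo, hi, tol=1e-12):
    """Simple bisection for a sign change of g on [lo, hi]."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(lo) * g(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

# cos(x)(x + pi) = sin(x) on (0, pi)
g = lambda x: np.cos(x) * (x + np.pi) - np.sin(x)
x_star = bisect(g, 1.0, 2.0)
alpha1 = np.cos(x_star)
eta = alpha1 * np.pi
print(x_star, alpha1, eta)   # approx 1.3518, 0.2172, 0.6825
```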
Case 2: For m > 1, p∗m ≡ 0 is the unique best approximation to fm .
For the minimal distance, we get $\|p_m^* - f_m\|_\infty = 1$, and the error function $p_m^* - f_m$ has 2m alternation points
$$x_k = \pm\frac{2k-1}{2m}\,\pi \quad\text{for } k = 1, 2, \dots, m,$$
see Figure 5.1 (b) for the case m = 2. ♦
[Fig. 5.1. Approximation of fm(x) = sin(mx) on [−π, π] by linear polynomials: (a) the case m = 1, (b) the case m = 2.]
with the minimal distance η = p∗2 −f ∞ , for some positive slope α > 0. More-
over, e = p∗2 − f has on the set of extremal points Ep∗2 −f = {−1, −x∗ , 0, x∗ , 1}
alternating signs ε = (1, −1, 1, −1, 1). We compute α by the alternation con-
dition at x = 1,
(p∗2 − f )(1) = η + α − 1 = η,
and so we obtain α = 1, so that p∗2 (x) = η + x2 . The local minimum x∗ of
the error function e = p∗2 − f satisfies the necessary condition
$$e'(x^*) = 2x^* - 1 = 0,$$
whereby x∗ = 1/2, so that $E_{p_2^*-f} = \{-1, -1/2, 0, 1/2, 1\}$. Finally, at x∗ = 1/2 the alternation condition
$$(p_2^* - f)(x^*) = \eta + \tfrac{1}{4} - \tfrac{1}{2} = -\eta$$
holds, whereby η = 1/8. Hence, the quadratic polynomial $p_2^*(x) = 1/8 + x^2$ is the unique best approximation to f from P2 with respect to $\|\cdot\|_\infty$. ♦
Fig. 5.2. Approximation of the function f (x) = |x| on [−1, 1] by quadratic polyno-
mials. The best approximation p∗2 ∈ P2 to f is even and convex. The set of extremal
points Ep∗2 −f = {−1, −x∗ , 0, x∗ , 1} has five alternation points.
$$\varepsilon = (\varepsilon_1,\dots,\varepsilon_m)^T \in \{\pm 1\}^m$$
$$\varphi(s^*-f) = \sum_{k=1}^{m}\lambda_k\,\varepsilon_k\,(s^*-f)(x_k) = \sum_{k=1}^{m}\lambda_k\,|(s^*-f)(x_k)| = \|s^*-f\|_\infty.$$
Definition 5.8. Let F be a linear space and M ⊂ F. Then the convex hull
conv(M) of M is the smallest convex set in F containing M, i.e.,
$$\operatorname{conv}(\mathcal{M}) = \bigcap_{\substack{\mathcal{M}\subset K\subset\mathcal{F}\\ K\ \text{convex}}} K.$$
Note that any convex combination αx + (1 − α)y, α ∈ [0, 1], can be written
as a convex combination of the points x1 , . . . , xm , y1 , . . . , yn ,
$$\alpha x + (1-\alpha)y = \alpha\sum_{j=1}^{m}\lambda_j x_j + (1-\alpha)\sum_{k=1}^{n}\mu_k y_k = \sum_{j=1}^{m}\alpha\lambda_j x_j + \sum_{k=1}^{n}(1-\alpha)\mu_k y_k,$$
$$x = \sum_{j=1}^{m}\lambda_j x_j \quad\text{with } \lambda = (\lambda_1,\dots,\lambda_m)^T \in \Lambda_m \text{ and } x_1,\dots,x_m \in \mathcal{M}$$
$$\rho_j = \frac{\mu_j(t^*)}{\sum_{k=1}^{m}\mu_k(t^*)} \ge 0 \quad\text{for } j = 1,\dots,m$$
we have
$$\sum_{j=1}^{m}\rho_j = 1$$
and
$$\sum_{j=1}^{m}\rho_j\,(x - x_j) = 0 \iff x = \sum_{j=1}^{m}\rho_j\,x_j.$$
Proof. We regard on the compact set Ln+1 = Λn+1 × Mn+1 the continuous
mapping ϕ : Ln+1 −→ F, defined as
$$\varphi(\lambda, X) = \sum_{j=1}^{n+1}\lambda_j\,x_j$$
$$0 = \sum_{j=1}^{m}\lambda_j x_j \quad\text{with } \lambda = (\lambda_1,\dots,\lambda_m)^T \in \Lambda_m \text{ and } x_1,\dots,x_m \in \mathcal{M}.$$
(a) ⇒ (b): Suppose statement (a) holds. Further suppose that 0 ∈ / conv(M).
Since conv(M) is compact, by Corollary 5.11, there is one β∗ ∈ conv(M),
β∗ = 0, of minimal Euclidean norm in conv(M). This minimum β∗ , viewed
as a best approximation from conv(M) to the origin with respect to · 2 , is
characterized by
Remark 5.13. The equivalence statement (a) in Theorem 5.12 says that the
Euclidean space Rd cannot be split by a separating hyperplane through the
origin into two half-spaces, such that M is entirely contained in one of the
two half-spaces.
|(s∗ − f )(x)|2 − 2(s∗ − f )(x)sβ (x) + s2β (x) < |(s∗ − f )(x)|2
But this is, due to the Kolmogorov criterion, Theorem 5.1, in contradiction
to the optimality of s∗ in (a),
Corollary 5.14 yields an important result concerning the characterization
of best approximations.
Corollary 5.15. For s∗ ∈ S the following statements are equivalent.
(a) s∗ is a best approximation to f ∈ C (Ω) \ S.
(b) There are m ≤ n + 1
• pairwise distinct extremal points x1 , . . . , xm ∈ Es∗ −f
• signs εj = sgn((s∗ − f )(xj )), for j = 1, . . . , m,
• coefficients λ = (λ1 , . . . , λm )T ∈ Λm with λj > 0 for all 1 ≤ j ≤ m,
satisfying
$$\varphi(s) := \sum_{j=1}^{m}\lambda_j\,\varepsilon_j\,s(x_j) = 0 \quad\text{for all } s \in S. \qquad (5.11)$$
$$0 = \sum_{j=1}^{m}\lambda_j\,\bigl((s^*-f)(x_j)\bigr)\,s_k(x_j) = \sum_{j=1}^{m}\lambda_j\,\varepsilon_j\,\|s^*-f\|_\infty\,s_k(x_j) = \|s^*-f\|_\infty\sum_{j=1}^{m}\lambda_j\,\varepsilon_j\,s_k(x_j)$$
εj · εj+1 = −1 for j = 1, . . . , m − 1
$$\varphi(u) = \sum_{k=1}^{m}\lambda_k\,\varepsilon_k\,u(x_k) \quad\text{for } u \in C(\Omega) \qquad (5.12)$$
satisfying the dual characterization (5.11) of Corollary 5.15 for a point set
X = {x1 , . . . , xm } ⊂ Es∗ −f , where 2 ≤ m ≤ n + 1. Then, we have for any
s ∈ S the estimates
$$\|s - f\|_\infty \ge \|s - f\|_{\infty,X} \ge \|s^* - f\|_\infty + \frac{\lambda_{\min}}{1 - \lambda_{\min}}\,\|s^* - s\|_{\infty,X}, \qquad (5.13)$$
where λmin := min1≤j≤m λj > 0.
Since m ≥ 2, we have λmin ∈ (0, 1/2] and so λmin/(1 − λmin) ∈ (0, 1].
Now let $x_{j^*} \in X$ be a point satisfying $|(s - s^*)(x_{j^*})| = \|s - s^*\|_{\infty,X}$. If $\varepsilon_{j^*}(s - s^*)(x_{j^*}) = \|s - s^*\|_{\infty,X}$, then the second estimate in (5.13) is satisfied, with λmin/(1 − λmin) ≤ 1. Otherwise, we have $\varepsilon_{j^*}(s - s^*)(x_{j^*}) = -\|s - s^*\|_{\infty,X}$, whereby with ϕ(s − s∗) = 0 the estimate
$$\lambda_{j^*}\,\|s-s^*\|_{\infty,X} = \sum_{\substack{k=1\\ k\ne j^*}}^{m}\lambda_k\,\varepsilon_k\,(s-s^*)(x_k) \le (1-\lambda_{j^*})\,\max_{k\ne j^*}\varepsilon_k\,(s-s^*)(x_k) \qquad (5.15)$$
$$\frac{\lambda_{\min}}{1-\lambda_{\min}}\,\|s^*-s\|_{\infty,X} \le \frac{\lambda_{j^*}}{1-\lambda_{j^*}}\,\|s^*-s\|_{\infty,X} \le \varepsilon_{k^*}\,(s-s^*)(x_{k^*}),$$
$$\|s-f\|_\infty - \|s^*-f\|_\infty \ge \frac{\lambda_{\min}}{1-\lambda_{\min}}\,\|s-s^*\|_{\infty,X} \quad\text{for all } s \in S. \qquad (5.16)$$
Given this result, we can further analyze the question for the (strong) unique-
ness of best approximations to f ∈ C (Ω) \ S. To this end, we first take note
of the following simple observation.
$$0 = \|s^{**}-f\|_\infty - \|s^*-f\|_\infty \ge \frac{\lambda_{\min}}{1-\lambda_{\min}}\,\|s^{**}-s^*\|_{\infty,X}$$
by (5.16), and this implies, for λmin ∈ (0, 1), the identity
$$\|s^{**}-s^*\|_{\infty,X} = 0.$$
$$\sum_{j=1}^{m}\lambda_j\,\varepsilon_j\,p(x_j) = 0 \quad\text{for all } p \in P_2. \qquad (5.18)$$
λ1 + . . . + λm = 1 (5.19)
We pose the conditions from (5.20) to the three elements of the monomial
basis {1, x, x2 } of P2 . For p ≡ 1 we get −λ1 + λ2 − λ3 + λ4 = 0, whereby
from (5.19) we get
$$\lambda_{\min} = \frac{1}{8} \quad\text{and}\quad \frac{\lambda_{\min}}{1-\lambda_{\min}} = \frac{1}{7}.$$
The characterization (5.13) in Theorem 5.17 implies the estimate
$$\|p-f\|_\infty - \|p_2^*-f\|_\infty \ge \frac{1}{7}\,\|p-p_2^*\|_{\infty,X} \quad\text{for all } p \in P_2. \qquad (5.23)$$
Next, we show the strong uniqueness of p∗2 , where we use Theorem 5.19.
To this end, note that · ∞,X is a norm on P2 . Therefore, it remains to
determine an equivalence constant β > 0, like in (5.17), satisfying
for all p ∈ P2, whereby (5.24) holds for β = 1/7. Together with (5.23), this finally yields the sought estimate
$$\|p-f\|_\infty - \|p_2^*-f\|_\infty \ge \frac{1}{7}\,\|p-p_2^*\|_{\infty,X} \ge \frac{1}{49}\,\|p-p_2^*\|_\infty \quad\text{for all } p \in P_2.$$
Therefore, p∗2 is the strongly unique best approximation to f . ♦
sX = 0 =⇒ s ≡ 0 on Ω,
According to the Mairhuber4 -Curtis5 theorem [17, 48] there are no non-
trivial Haar systems on multivariate connected domains Ω ⊂ Rd , d > 1.
Before we prove the Mairhuber-Curtis theorem, we introduce a few notions.
Fig. 5.3. According to the Mairhuber-Curtis theorem, Theorem 5.25, there are no
non-trivial Haar systems H on domains Ω containing bifurcations.
If d{x1 ,x2 ,x3 ...,xn } = 0, then H, by Theorem 5.23, is not a Haar system.
Otherwise, we can shift the two points x1 and x2 by a continuous mapping
along the two branches of the bifurcation, without any coincidence between
points in X (see Figure 5.4).
Therefore, the determinant d{x2 ,x1 ,x3 ,...,xn } has, by swapping the first two
columns in matrix VH,X , opposite sign to d{x1 ,x2 ,x3 ,...,xn } , i.e.,
sgn d{x1 ,x2 ,x3 ,...,xn } = −sgn d{x2 ,x1 ,x3 ,...,xn } .
Fig. 5.4. Illustration of the Mairhuber-Curtis theorem, Theorem 5.25. The two
points x1 and x2 can be swapped by a continuous mapping, i.e., by shifts along the
branches of the bifurcation without coinciding with any other point from X.
Due to the continuity of the determinant, there must be a sign change of the
determinant during the (continuous) swapping between x1 and x2 . In this
case, H = {s1 , . . . , sn } cannot be a Haar system, by Theorem 5.23. But this
is in contradiction to our assumption on H.
Due to the result of the Mairhuber-Curtis theorem, Theorem 5.25, we restrict ourselves from now on to the univariate case, d = 1. Moreover, we assume from now on that the domain Ω is a compact interval, i.e.,
Ω = [a, b] ⊂ R for − ∞ < a < b < ∞.
Before we continue our analysis on strongly unique best approximations,
we first give a few elementary examples for Haar spaces.
Example 5.26. For n ∈ N0 and [a, b] ⊂ R the linear space of polynomials Pn
is a Haar space of dimension n+1 on [a, b], since according to the fundamental
theorem of algebra any non-trivial polynomial from Pn has at most n zeros.
♦
Example 5.27. For N ∈ N0 the linear space TNC of all complex trigonometric
polynomials of degree at most N is a Haar space of dimension N + 1 on
[0, 2π), since TNC is, by Theorem 2.36, a linear space of dimension N + 1, and,
moreover, the linear mapping p −→ pX , for p ∈ TNC is, due to Theorem 2.39,
for all sets X ⊂ [0, 2π) of |X| = N + 1 pairwise distinct points bijective.
Likewise, we can show, by using Corollaries 2.38 and 2.40, that the linear
space TnR of all real trigonometric polynomials of degree at most n ∈ N0 is a
Haar space of dimension 2n + 1 on [0, 2π). ♦
Example 5.28. For [a, b] ⊂ R and λ0 < . . . < λn the functions
$$\bigl\{\,e^{\lambda_0 x},\ \dots,\ e^{\lambda_n x}\,\bigr\}$$
are a Haar system on [a, b]. We can show this by induction on n.
Initial step: For n = 0 the statement is trivial.
Induction hypothesis: Suppose the statement is true for n − 1 ∈ N.
Induction step (n − 1 −→ n): If a function of the form
$$u(x) \in \operatorname{span}\bigl\{e^{\lambda_0 x},\dots,e^{\lambda_n x}\bigr\}$$
has n + 1 zeros in [a, b], then the function
$$v(x) = \frac{d}{dx}\Bigl(e^{-\lambda_0 x}\cdot u(x)\Bigr) \quad\text{for } x \in [a,b]$$
has, according to the Rolle theorem, at least n zeros in [a, b]. However,
$$v(x) \in \operatorname{span}\bigl\{e^{(\lambda_1-\lambda_0)x},\dots,e^{(\lambda_n-\lambda_0)x}\bigr\},$$
Example 5.29. The functions f1 (x) = x and f2 (x) = ex are not a Haar
system on [0, 2]. This is because dim(S) = 2 for S = span{f1 , f2 }, but the
continuous function
f (x) = ex − 3x ≡ 0
has by f (0) = 1, f (1) = e − 3 < 0 and f (2) > 0 at least two zeros in [0, 2].
Therefore, S cannot be a Haar space on [0, 2]. ♦
Example 5.30. For [a, b] ⊂ R let g ∈ C n+1 [a, b] satisfy g (n+1) (x) > 0 for all
x ∈ [a, b]. Then, the functions {1, x, . . . , xn , g} are a Haar system on [a, b]:
First note that the functions 1, x, . . . , xn , g(x) are linearly independent, since
from
α0 1 + α1 x + . . . + αn xn + αn+1 g(x) ≡ 0 for x ∈ [a, b]
we can conclude αn+1 g (n+1) (x) ≡ 0 after (n + 1)-fold differentiation, whereby
αn+1 = 0. The remaining coefficients α0 , . . . , αn do also vanish, since the
monomials 1, x, . . . , xn are linearly independent. Moreover, we can show that
any function u ∈ span{1, x, . . . , xn , g} \ {0} has at most n + 1 zeros in [a, b]:
Suppose
$$u(x) = \sum_{j=0}^{n}\alpha_j x^j + \alpha_{n+1}\,g(x) \not\equiv 0$$
satisfying ϕ(S) = 0, where m ≤ n+1. For the case of Haar spaces S ⊂ C [a, b]
the length of the dual functional in (5.25) is necessarily m = n + 1. Let us
take note of this important observation.
Proposition 5.31. Let ϕ : C [a, b] −→ R be a functional of the form (5.25),
where m ≤ n + 1. Moreover, let S ⊂ C [a, b] be a Haar space of dimension
dim(S) = n ∈ N on [a, b]. If ϕ(S) = {0}, then we have m = n + 1.
Proof. Suppose m ≤ n. Then, due to Theorem 5.23 (c), the Haar space S
contains one element s ∈ S satisfying s(xj ) = εj , for all 1 ≤ j ≤ m. But for
this s, we find ϕ(s) = λ1 = 1, in contradiction to ϕ(S) = {0}.
$$d_k = \det\bigl(V_{H,\,X\setminus\{x_k\}}\bigr) \ne 0 \quad\text{for } 1 \le k \le n+1$$
$$\varepsilon_k = (-1)^{k-1}\sigma \quad\text{for } 1 \le k \le n+1$$
satisfying d(0) = dk+1 and d(1) = dk must have a sign change on (0, 1). Due
to the continuity of d there is one α∗ ∈ (0, 1) satisfying d(α∗ ) = 0. However,
in this case, the Vandermonde matrix VH,(x1 ,...,xk−1 ,γ(α∗ ),xk+2 ,...,xn+1 ) ∈ Rn×n
is singular. Due to Theorem 5.23 (d), the elements in (s1 , . . . , sn ) are not a
Haar system on I ⊂ R. But this is in contradiction to our assumption.
(b): According to the Laplace expansion (here with respect to the first row), the determinant of $A_{\varepsilon,H,X}$ has the representation
$$\det(A_{\varepsilon,H,X}) = \sum_{k=1}^{n+1}(-1)^{k+1}\,(-1)^{k-1}\sigma\cdot d_k = \sigma\sum_{k=1}^{n+1} d_k.$$
By using the results of Propositions 5.31 and 5.32, we can prove the
alternation theorem, being the central result of this chapter. According to the
alternation theorem, the signs ε = (ε1 , . . . , εn+1 ) of the dual characterization
in (5.25) are for the case of Haar spaces S alternating. Before we prove the
alternation theorem, we first give a formal definition for alternation sets.
with the alternation matrix Aε,H,X on the left hand side in (5.28). According
to Proposition 5.32 (a), the matrix Aε,H,X is non-singular. Therefore, the
products εk λk , for 1 ≤ k ≤ n + 1, uniquely solve the linear system (5.28).
Due to the Cramer rule we have the representation
$$\varepsilon_k\,\lambda_k = \frac{(-1)^{k-1}\,d_k}{\det(A_{\varepsilon,H,X})} \quad\text{for all } 1 \le k \le n+1,$$
where according to Proposition 5.32 (a) the signs of the n + 1 determinants $d_k = \det(V_{H,X\setminus\{x_k\}})$, for 1 ≤ k ≤ n + 1, are constant. This implies $\varepsilon_k\lambda_k \ne 0$, and, moreover, there is one unique vector λ = (λ1, ..., λn+1)^T ∈ Λn+1 with positive coefficients
$$\lambda_k = \frac{d_k}{\sum_{j=1}^{n+1} d_j} > 0 \quad\text{for all } 1 \le k \le n+1$$
which solves the linear system (5.28). This solution λ ∈ Λn+1 of (5.28) finally
yields the characterizing functional (according to Corollary 5.15),
$$\varphi(u) = \sum_{j=1}^{n+1}\lambda_j\,\varepsilon_j\,u(x_j) \quad\text{for } u \in C(I_K), \qquad (5.29)$$
satisfying ϕ(S) = {0}. Due to Corollary 5.15, s∗ is the (strongly unique) best
approximation to f .
Now suppose that s∗ ∈ S is the strongly unique best approximation to
f ∈ C (IK ) \ S. Recall that the dual characterization in Corollary 5.15 proves
the existence of a functional ϕ : C (IK ) −→ R of the form (5.25) satisfying
ϕ(S) = {0}, where ϕ has, according to Proposition 5.31, length m = n + 1.
We show that the point set X = (x1 , . . . , xn+1 ) ∈ Esn+1 ∗ −f (from the dual
with the right hand side fX = (f (x1 ), . . . , f (xn+1 ))T ∈ Rn+1 and the alter-
nation matrix Aε,H,X ∈ R(n+1)×(n+1) in (5.27), containing the sign vector
ε = (−1, 1, . . . , (−1)n+1 ) ∈ {±1}n+1 , or,
$$\begin{bmatrix} -1 & s_1(x_1) & \cdots & s_n(x_1) \\ 1 & s_1(x_2) & \cdots & s_n(x_2) \\ \vdots & \vdots & & \vdots \\ (-1)^{n+1} & s_1(x_{n+1}) & \cdots & s_n(x_{n+1}) \end{bmatrix}\begin{bmatrix}\eta_X \\ \alpha_1^* \\ \vdots \\ \alpha_n^*\end{bmatrix} = \begin{bmatrix} f(x_1) \\ f(x_2) \\ \vdots \\ f(x_{n+1})\end{bmatrix}.$$
$$\omega_k(x) = \prod_{j=1}^{k}(x - x_j) \in P_k \quad\text{for } 0 \le k \le n-1.$$
$$s_X^* = \sum_{k=0}^{n-1}[x_1,\dots,x_{k+1}](f - \eta_X\,\varepsilon)\,\omega_k \ \in P_{n-1} \qquad (5.34)$$
Indeed, due to Corollary 2.18 (b), all polynomials from Pn−1 are contained
in the kernel of [x1 , . . . , xn+1 ]. In particular, we have [x1 , . . . , xn+1 ](s∗X ) = 0.
Under the alternation condition (5.32), s∗X ∈ Pn−1 is the unique solution
of the interpolation problem
already for the first n alternation points $(x_1,\dots,x_n) \in E_{s_X^*-f}^{\,n}$. This gives the stated Newton representation of $s_X^*$ in (5.34).
  X   f_X                              X   ε_X
  0   1                                0   −1
  1   e     e − 1                      1    1     2
  2   e²    e(e − 1)   (e − 1)²/2      2   −1    −2    −2

Hereby we obtain
$$\eta_X = \frac{[0,1,2](f)}{[0,1,2](\varepsilon)} = -\left(\frac{e-1}{2}\right)^2 \quad\text{and so}\quad \|s_X^* - f\|_{\infty,X} = \left(\frac{e-1}{2}\right)^2.$$
Moreover,
$$s_X^* = [0](f - \eta_X\varepsilon) + [0,1](f - \eta_X\varepsilon)\,x = 1 - \left(\frac{e-1}{2}\right)^2 + \frac{e^2-1}{2}\,x$$
is the unique best approximation to f from P1 w.r.t. $\|\cdot\|_{\infty,X}$ (see Fig. 5.5 (a)).
♦
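The divided-difference computations of this example are easy to reproduce numerically; a short Python sketch (function names are our own):

```python
import numpy as np

def divided_differences(x, y):
    """Newton divided differences: returns [x_1]y, [x_1,x_2]y, ..., [x_1,...,x_m]y."""
    c = np.array(y, dtype=float)
    coef = [c[0]]
    for k in range(1, len(x)):
        c = (c[1:] - c[:-1]) / (np.array(x[k:]) - np.array(x[:-k]))
        coef.append(c[0])
    return np.array(coef)

# Example 5.37: f(x) = e^x on the reference set X = (0, 1, 2), signs (-1, 1, -1).
X = [0.0, 1.0, 2.0]
f = np.exp(X)
eps = np.array([-1.0, 1.0, -1.0])
df, de = divided_differences(X, f), divided_differences(X, eps)
eta = df[-1] / de[-1]                       # eta_X = [0,1,2](f) / [0,1,2](eps)
print(eta, -((np.e - 1) / 2) ** 2)          # both approx -0.7381
coef = divided_differences(X, f - eta * eps)
print(coef[0], coef[1])                     # 1 - ((e-1)/2)^2 and (e^2 - 1)/2
```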
[Fig. 5.5. Best approximations s∗j ∈ P1 to f on the reference sets Xj, for j = 0, 1.]
  X      f_X                          X      ε_X
 −1      1                           −1      −1
 −1/2    1/2    −1                   −1/2     1      4
  0      0      −1     0              0      −1     −4     −8
  1/2    1/2     1     2    4/3       1/2     1      4      8    32/3
Now we describe the iteration steps of the Remez algorithm. At any Remez
step the current reference set (in increasing order)
With the Remez exchange, the point x∗ is swapped for the point x̂ ∈ X, such
that the points of the new reference set X+ are in increasing order, i.e.,
$$a \le x_1^+ < x_2^+ < \dots < x_n^+ < x_{n+1}^+ \le b,$$
$$\operatorname{sgn}\bigl((s_X^* - f)(x_j^+)\bigr) = (-1)^j\,\sigma \quad\text{for } 1 \le j \le n+1$$
for some σ ∈ {±1}. The exchange for the point pair (x̂, x∗ ) ∈ X × [a, b] \ X
is described by the Remez exchange, Algorithm 8.
These conditions are required for the performance of the Remez algorithm.
The Remez algorithm generates a sequence (Xk )k∈N0 ⊂ [a, b]n+1 of reference
sets, so that for the transition from X = Xk to X+ = Xk+1 , for any k ∈ N0 ,
all three conditions in Remark 5.39 are satisfied. The corresponding sequence
of best approximations s∗k ∈ S to f with respect to · ∞,Xk satisfying
Remark 5.40. We remark that the construction of the reference set Xk+1
in line 11 of Algorithm 9 can be accomplished by a Remez exchange step,
Algorithm 8. In this case, all three conditions in lines 12-15 of Algorithm 9
are satisfied, according to Remark 5.39.
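For orientation, the following is a much-simplified Python sketch of one possible implementation for S = P_{n−1} on [a, b]: it solves the alternation system on the current reference set, locates a point of maximal error on a fine grid, and performs a simple nearest-point exchange. This replaces the exchange rule of Algorithm 8 and the stopping criteria of Algorithm 9 by crude substitutes; all names, the grid, and the stopping rule are our own choices, not the book's.

```python
import numpy as np

def remez(f, n, a, b, iters=20, grid=2000):
    """Sketch of a Remez iteration for S = P_{n-1} on [a, b]."""
    # start with Chebyshev-like points as initial reference set (our choice)
    X = np.cos(np.linspace(np.pi, 0, n + 1)) * (b - a) / 2 + (a + b) / 2
    xs = np.linspace(a, b, grid)
    for _ in range(iters):
        # solve p(x_j) + (-1)^j eta = f(x_j) for the coefficients of p and eta
        A = np.vander(X, n, increasing=True)
        A = np.column_stack([A, (-1.0) ** np.arange(n + 1)])
        sol = np.linalg.solve(A, f(X))
        c, eta = sol[:-1], sol[-1]
        err = f(xs) - np.polyval(c[::-1], xs)
        j = np.argmax(np.abs(err))
        if np.abs(err[j]) <= np.abs(eta) * (1 + 1e-12):
            break                                   # reference error equals max error
        # simplified exchange: swap in the new extremal point for its nearest neighbour
        X[np.argmin(np.abs(X - xs[j]))] = xs[j]
    return c, eta, X

# Example: best linear approximation to e^x on [0, 2];
# |eta| should come out near 0.7579 (cf. Example 5.45).
if __name__ == "__main__":
    c, eta, X = remez(np.exp, 2, 0.0, 2.0)
    print(c, abs(eta))
```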
$$\varphi(u) = \sum_{j=1}^{n+1}\lambda_j^{(k)}\,\varepsilon_j^{(k)}\,u\bigl(x_j^{(k)}\bigr) \quad\text{for } u \in C[a,b],$$
Proposition 5.41. Let the assumptions from the Remez algorithm be satis-
fied. Then, for any step k ∈ N0 , where the Remez iteration does not terminate,
we have the monotonicity of the minimal distances,
ηk+1 > ηk .
$$\eta_{k+1} = \sum_{j=1}^{n+1}\lambda_j^{(k+1)}\,\varepsilon_j^{(k+1)}\,(s_{k+1}^*-f)\bigl(x_j^{(k+1)}\bigr) = \sum_{j=1}^{n+1}\lambda_j^{(k+1)}\,\varepsilon_j^{(k+1)}\,(s_k^*-f)\bigl(x_j^{(k+1)}\bigr) = \sum_{j=1}^{n+1}\lambda_j^{(k+1)}\,\bigl|(s_k^*-f)\bigl(x_j^{(k+1)}\bigr)\bigr|$$
where
$$\varepsilon_j^{(k+1)} = \operatorname{sgn}\bigl((s_k^*-f)(x_j^{(k+1)})\bigr) \quad\text{for all } 1 \le j \le n+1$$
Suppose the statement is false. Then, there are sequences of reference sets
(Xk )k , signs (ε(k) )k , and coefficients (λ(k) )k satisfying
$$\eta_k = -\sum_{j=1}^{n+1}\lambda_j^{(k)}\,\varepsilon_j^{(k)}\,f\bigl(x_j^{(k)}\bigr) \ge \eta_0 > 0 \quad\text{for all } k \in \mathbb{N}_0, \qquad (5.37)$$
But the elements of the sequences (Xk )k , (ε(k) )k , and (λ(k) )k lie in com-
pact sets, respectively. Therefore, there are convergent subsequences with
$$x_j^{(k_\ell)} \longrightarrow x_j \in [a,b], \qquad \varepsilon_j^{(k_\ell)} \longrightarrow \varepsilon_j \in \{\pm 1\}, \qquad \lambda_j^{(k_\ell)} \longrightarrow \lambda_j \in [0,1] \qquad\text{for } \ell \to \infty,$$
for all 1 ≤ j ≤ n + 1, where $\lambda_{j^*} = 0$ for one index $j^* \in \{1,\dots,n+1\}$.
Now we regard an interpolant s ∈ S satisfying $s(x_j) = f(x_j)$ for all 1 ≤ j ≤ n + 1, j ≠ j∗. Then, we have
$$\eta_{k_\ell} = \sum_{j=1}^{n+1}\lambda_j^{(k_\ell)}\varepsilon_j^{(k_\ell)}(s_{k_\ell}^*-f)\bigl(x_j^{(k_\ell)}\bigr) = \sum_{j=1}^{n+1}\lambda_j^{(k_\ell)}\varepsilon_j^{(k_\ell)}(s-f)\bigl(x_j^{(k_\ell)}\bigr)$$
$$= \sum_{\substack{j=1\\ j\ne j^*}}^{n+1}\lambda_j^{(k_\ell)}\varepsilon_j^{(k_\ell)}(s-f)\bigl(x_j^{(k_\ell)}\bigr) + \lambda_{j^*}^{(k_\ell)}\varepsilon_{j^*}^{(k_\ell)}(s-f)\bigl(x_{j^*}^{(k_\ell)}\bigr)$$
$$\longrightarrow \sum_{\substack{j=1\\ j\ne j^*}}^{n+1}\lambda_j\,\varepsilon_j\,(s-f)(x_j) + \lambda_{j^*}\,\varepsilon_{j^*}\,(s-f)(x_{j^*}) = 0 \quad\text{for } \ell \to \infty.$$
and so
$$\eta - \eta_{k+1} < \bigl(1 - \lambda_{j^*}^{(k+1)}\bigr)\,(\eta - \eta_k).$$
By Lemma 5.42, there is one α > 0 satisfying $\lambda_j^{(k+1)} \ge \alpha$, for all 1 ≤ j ≤ n + 1 and all k ∈ N0. Therefore, the stated contraction (5.38) holds for θ = 1 − α ∈ (0, 1). From this, we get the estimate
We can conclude that the sequence (s∗k )k ⊂ S of the strongly unique best
approximations to f on Xk converges to the strongly unique best approxi-
mation s∗ to f .
Finally, we discuss one important observation. We note that for the ap-
proximation of strictly convex functions f ∈ C [a, b] by linear polynomials,
the Remez algorithm may return the best approximation s∗ ∈ P1 to f after
only one step.
(f − s)(λx + (1 − λ)y)
= f (λx + (1 − λ)y) − m · (λx + (1 − λ)y) − c
< λf (x) + (1 − λ)f (y) − m · (λx + (1 − λ)y) − c
= λf (x) − λmx − λc + (1 − λ)f (y) − (1 − λ)my − (1 − λ)c
= λ(f − s)(x) + (1 − λ)(f − s)(y)
$$(f - s^*)(a) = \|f - s^*\|_\infty = (f - s^*)(b).$$
$$m^* = \frac{f(b) - f(a)}{b - a} = [a, b](f).$$
$$(f - s_0^*)(a) = \sigma\,\|f - s_0^*\|_{\infty,X_0} = (f - s_0^*)(b) \quad\text{for some } \sigma \in \{\pm 1\},$$
$$(f - s_0^*)(x^*) < (f - s_0^*)(x) \quad\text{for all } x \in [a,b],\ x \ne x^*. \qquad (5.40)$$
By the strict convexity of f − s∗0, we can further conclude the strict inequality
or,
$$\rho_0 = \|f - s_0^*\|_\infty = |(f - s_0^*)(x^*)| > |(f - s_0^*)(x_0)| = \|f - s_0^*\|_{\infty,X_0} = \eta_0. \qquad (5.41)$$
By (5.40) and (5.41), the point x∗ is the unique global maximum of |f −s∗0 |
on [a, b]. Therefore, x∗ is the only candidate for the required Remez exchange
(in line 5 of Algorithm 8) for x0 . After the execution of the Remez exchange,
we have X1 = (a, x∗ , b), so that the Remez algorithm immediately terminates
with returning s∗1 = s∗ .
According to Proposition 5.44, the Remez algorithm returns already after the
next iteration the best approximation s∗1 to f .
Finally, we compute s∗1 , the best approximation to f for the reference set
X1 = (0, x∗ , 2). To this end, we proceed as in Example 5.37, where we first
determine the required divided differences for f and ε = (−1, 1, −1) by using
the recursion in Theorem 2.14:
  X    f_X                                                           X    ε_X
  0    1                                                             0    −1
  x*   (e²−1)/2   (e²−3)/(2x*)                                       x*    1     2/x*
  2    e²         (e²+1)/(2(2−x*))   [(e²−1)(x*−1)+2]/(2(2−x*)x*)    2    −1    −2/(2−x*)   −2/((2−x*)x*)
From this we compute the minimal distance $\|s_1^* - f\|_{\infty,X_1} = -\eta_1 \approx 0.7579$ by
$$\eta_1 = -\frac{1}{4}\bigl[(e^2-1)(x^*-1)+2\bigr]$$
and the best approximation to f from P1 with respect to $\|\cdot\|_{\infty,X_1}$ by
$$s_1^* = [0](f - \eta_{X}\varepsilon) + [0, x^*](f - \eta_{X}\varepsilon)\,x = 1 + \eta_1 + \frac{e^2 - 4\eta_1 - 3}{2x^*}\,x.$$
By Proposition 5.44, the Remez algorithm terminates with the reference set
X1 = Es∗1 −f , so that by s∗1 ∈ P1 the unique best approximation to f with
respect to · ∞ is found. Figure 5.5 shows the best approximations s∗j ∈ P1
to f for the reference sets Xj , for j = 0, 1. ♦
5.5 Exercises
Exercise 5.46. Let F = C [−1, 1] be equipped with the maximum norm
· ∞ . Moreover, let f ∈ P3 \ P2 be a cubic polynomial, i.e., has the form
for s∗ and f (see Corollary 5.4). For the dual characterization of the best
approximation p∗ ∈ Pn−1 we use, as in (5.6), a linear functional ϕ ∈ F of
the form
$$\varphi(u) = \sum_{k=1}^{n+1}\lambda_k\,\varepsilon_k\,u(x_k) \quad\text{for } u \in C[a,b]$$
a ≤ x 1 < . . . < xn ≤ b
Exercise 5.51. Let F = C [0, 2] be equipped with the maximum norm ·∞ .
Determine the strongly unique best approximation p∗ ∈ P1 from P1 to the
function f ∈ C [0, 2], defined as
$$p^*(x) = \sum_{k=0}^{n-1}\alpha_k\,\omega_k(x) \quad\text{where}\quad \omega_k(x) = \prod_{j=1}^{k}(x - x_j) \in P_k \ \text{ for } 0 \le k \le n-1$$
$$p(x) = \sum_{k=0}^{n-1}\alpha_k\,\omega_k(x) \in P_{n-1},$$
Exercise 5.62. Analyze for the case S = Pn−1 the asymptotic computa-
tional complexity for only one iteration of the Remez algorithm, Algorithm 9.
(a) Determine the costs for the minimal distance ηk = s∗k − f ∞,Xk .
Hint: Use divided differences (according to Proposition 5.35).
(b) Determine the costs for computing the Newton coefficients of s∗k .
Hint: Reuse the divided differences from (a).
(c) Sum up the required asymptotic costs in (a) and (b).
How do you efficiently compute the update ηk+1 from information that is
required to compute ηk ?
$$\rho_k = \|s_k^* - f\|_\infty \quad\text{for } k \in \mathbb{N}_0$$
between f ∈ C[a, b] and the current best approximation $s_k^* \in S$ to f, for the current reference set $X_k = (x_1^{(k)},\dots,x_{n+1}^{(k)}) \in [a,b]^{n+1}$ and w.r.t. $\|\cdot\|_{\infty,X_k}$.
Show that the sequence (ρk )k∈N0 is not necessarily strictly increasing. To
this end, construct a simple (but non-trivial) counterexample.
6 Asymptotic Results
$$(F_n f)(x) = \frac{(f,1)}{2} + \sum_{j=1}^{n}\bigl[(f,\cos(j\cdot))\cos(jx) + (f,\sin(j\cdot))\sin(jx)\bigr],$$
and with respect to the maximum norm · ∞ . To this end, we first show for
continuous functions f ∈ C2π convergence of Fn f to f with respect to · ,
and then we prove convergence rates, for f ∈ C2π
k
, k ∈ N0 , of the form
Fn f − f ∞ for n → ∞.
Likewise, we will also discuss the algebraic case for the approximation to
f ∈ C [a, b] by partial sums Pn f from Pn .
s − f < ε.
Now we can give a more concise formulation for the above two questions.
• Are the algebraic polynomials P dense in C [a, b] with respect to · ∞ ?
• Are the trigonometric polynomials T dense in C2π with respect to · ∞ ?
S = F,
or, in other words: For any f ∈ F, there is a convergent sequence (sn )n∈N in
S with limit f , so that sn − f −→ 0 for n → ∞.
Example 6.4. The set Q of rational numbers is dense in the set R of real
numbers with respect to the absolute-value function | · |. ♦
Now let us turn to the Weierstrass theorems, for which there exist many
different proofs (see, e.g. [33]). Our constructive proof for the algebraic case of
the Weierstrass theorem relies on a classical account via Korovkin sequences.
$$\sum_{j=0}^{n}\beta_j^{(n)}(x) = 1 \quad\text{for all } x \in [0,1].$$
2 Pavel Petrovich Korovkin (1913–1985), Russian mathematician
3 Sergei Natanovich Bernstein (1880–1968), Russian mathematician
Note that property (c) holds by the binomial theorem, whereas properties (a)
and (b) can be verified by elementary calculations (cf. Exercise 6.83).
By using the Bernstein polynomials in (6.1) we can make an important
example for monotone linear operators on C [0, 1].
Definition 6.8. For n ∈ N, the Bernstein operator Bn : C [0, 1] −→ Pn
is defined as
$$(B_n f)(x) = \sum_{j=0}^{n} f(j/n)\,\beta_j^{(n)}(x) \quad\text{for } f \in C[0,1], \qquad (6.2)$$
where $\beta_0^{(n)},\dots,\beta_n^{(n)} \in P_n$ are the Bernstein polynomials in (6.1).
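A direct implementation of (6.2) is straightforward; the following Python sketch (our own illustration, not part of the text) also hints at the slow uniform convergence of Bn f for a function with a kink, where the maximum error decays only roughly like n^{-1/2}:

```python
import numpy as np
from math import comb

def bernstein(f, n, x):
    """Evaluate (B_n f)(x) = sum_j f(j/n) * C(n,j) x^j (1-x)^(n-j) on [0, 1]."""
    x = np.asarray(x, dtype=float)
    j = np.arange(n + 1)
    binom = np.array([comb(n, k) for k in j], dtype=float)
    B = binom * x[..., None] ** j * (1 - x[..., None]) ** (n - j)
    return B @ f(j / n)

if __name__ == "__main__":
    xs = np.linspace(0, 1, 1001)
    f = lambda t: np.abs(t - 0.5)            # kink at 1/2
    for n in (10, 100, 1000):
        print(n, np.max(np.abs(bernstein(f, n, xs) - f(xs))))
```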
The Bernstein operators Bn are obviously linear on C [0, 1]. By the posi-
(n)
tivity of the Bernstein polynomials βj , Remark 6.7 (b), the Bernstein ope-
rators Bn are, moreover, positive (and therefore monotone) on C [0, 1]. We
note yet another elementary property of the operators Bn .
Remark 6.9. The Bernstein operators Bn : C[0,1] → Pn in (6.2) are bounded on C[0,1] with respect to $\|\cdot\|_\infty$, since for any f ∈ C[0,1] we have
$$\|B_n f\|_\infty = \Bigl\|\sum_{j=0}^{n} f(j/n)\,\beta_j^{(n)}\Bigr\|_\infty \le \|f\|_\infty\,\Bigl\|\sum_{j=0}^{n}\beta_j^{(n)}\Bigr\|_\infty = \|f\|_\infty$$
and so
$$\|B_n f\|_\infty \le \|f\|_\infty \quad\text{for all } f \in C[0,1].$$
In particular, by transferring the result of Theorem 3.45 from linear func-
tionals to linear operators, we can conclude that the Bernstein operators
Bn : C [0, 1] −→ Pn are continuous on C [0, 1].
Now we prove the Korovkin property for the Bernstein operators.
Theorem 6.10. The sequence of Bernstein operators Bn : C [0, 1] −→ Pn ,
for n ∈ N, is a Korovkin sequence on C [0, 1].
Proof. The Bernstein operators Bn , n ∈ N, reproduce linear polynomials.
Indeed, on the one hand, we have Bn 1 ≡ 1, for all n ∈ N, by the partition of
unity, according to Remark 6.7 (c). On the other hand, we find for p1 (x) = x
the identity Bn p1 = p1 , for any n ∈ N, since we get
$$(B_n p_1)(x) = \sum_{j=0}^{n}\frac{j}{n}\binom{n}{j}x^j(1-x)^{n-j} = \sum_{j=1}^{n}\binom{n-1}{j-1}x^j(1-x)^{n-j} = x\sum_{j=0}^{n-1}\binom{n-1}{j}x^j(1-x)^{n-j-1} = x.$$
for the quadratic monomial p2 (x) = x2 . To this end, we apply the Bernstein
operators Bn to the sequence of functions
$$f_n(x) = \frac{n}{n-1}\,x^2 - \frac{x}{n-1} \in P_2 \quad\text{for } n \ge 2,$$
where for n ≥ 2 we have
$$(B_n f_n)(x) = \sum_{j=0}^{n}\binom{n}{j}\left(\frac{n}{n-1}\,\frac{j^2}{n^2} - \frac{j}{n(n-1)}\right)x^j(1-x)^{n-j} = \sum_{j=0}^{n}\frac{n!}{(n-j)!\,j!}\,\frac{j(j-1)}{n(n-1)}\,x^j(1-x)^{n-j}$$
$$= \sum_{j=2}^{n}\frac{(n-2)!}{(n-j)!\,(j-2)!}\,x^j(1-x)^{n-j} = x^2\sum_{j=0}^{n-2}\binom{n-2}{j}x^j(1-x)^{n-j-2} = p_2(x).$$
Proof. Suppose f ∈ C [a, b]. Then, f is bounded on [a, b], i.e., there is some
M > 0 with f ∞ ≤ M . Moreover, f is uniformly continuous on the compact
interval [a, b], i.e., for any ε > 0 there is some δ > 0 satisfying
|x − y| < δ =⇒ |f (x) − f (y)| < ε/2 for all x, y ∈ [a, b].
Now let t ∈ [a, b] be fixed. Then, we have for x ∈ [a, b] the two estimates
$$f(x) - f(t) \le \frac{\varepsilon}{2} + 2M\left(\frac{x-t}{\delta}\right)^2 = \frac{\varepsilon}{2} + \frac{2M}{\delta^2}\bigl(x^2 - 2xt + t^2\bigr)$$
$$f(x) - f(t) \ge -\frac{\varepsilon}{2} - 2M\left(\frac{x-t}{\delta}\right)^2 = -\frac{\varepsilon}{2} - \frac{2M}{\delta^2}\bigl(x^2 - 2xt + t^2\bigr),$$
as well as
for all n ≥ N. From (6.4), (6.5) and (6.6), we obtain the estimate
$$|(K_n f)(x) - f(t)| \le |(K_n f)(x) - f(t)(K_n 1)(x)| + |f(t)(K_n 1)(x) - f(t)| \le (\tilde\varepsilon + 1)\,\frac{\varepsilon}{2} + \frac{2M}{\delta^2}\bigl[\tilde\varepsilon\,(1 + 2|t| + t^2) + (x-t)^2\bigr] + M\tilde\varepsilon,$$
where for x = t, the inequality
$$|(K_n f)(t) - f(t)| \le (\tilde\varepsilon + 1)\,\frac{\varepsilon}{2} + \frac{2M}{\delta^2}\,\tilde\varepsilon\,(1 + 2|t| + t^2) + M\tilde\varepsilon \qquad (6.7)$$
follows for all n ≥ N.
Now the right hand side in (6.7) can uniformly be bounded from above
by an arbitrarily small ε̂ > 0, so that we have, for some N ≡ N (ε̂) ∈ N,
p(cos(kx)) ∈ T for k ∈ N0
is an even function.
We now show that the even trigonometric polynomials are, with respect
to the maximum norm · ∞ , dense in C [0, π].
Lemma 6.16. For any f ∈ C [0, π] and ε > 0, there is one even trigono-
metric polynomial Tg ∈ T satisfying
Tg − f ∞ < ε.
Proof. Suppose f ∈ C [0, π]. Then, g(t) = f (arccos(t)) ∈ C [−1, 1]. Therefore,
according to the Weierstrass theorem, Corollary 6.12, there is one algebraic
polynomial p ∈ P satisfying p − g∞,[−1,1] < ε. This implies
with (even) error functions ηfe , ηge ∈ C2π , where ηfe ∞ , ηge ∞ < ε/4.
From these two representations, we obtain the identity
where
cos2 (x)f (x) = Tfs˜(x − π/2) + ηfs˜(x − π/2) = Tfc (x) + ηfc (x) with ηfc ∞ < ε/2,
f (x) = Tfs (x) + Tfc (x) + ηfs (x) + ηfc (x) = Tf (x) + ηf (x) with ηf ∞ < ε
Remark 6.19. Corollary 6.18 states that convergence in the maximum norm
· ∞ implies convergence in any p-norm · p , 1 ≤ p < ∞. The converse,
however, does not hold in general. In this sense, the maximum norm · ∞
is the strongest among all p-norms, for 1 ≤ p ≤ ∞.
and the Euclidean norm $\|\cdot\|_w = (\cdot,\cdot)_w^{1/2}$. Then, any function f ∈ C[a, b] can,
w.r.t. · w , be approximated arbitrarily well by algebraic polynomials, i.e.,
the polynomial space P is, with respect to · w , dense in C [a, b].
We wish to transfer our results from Section 4.2 to infinite (countable and
ordered) orthogonal systems (and orthonormal systems) (sj )j∈N in F. Our
first result on this is based on the following characterization.
Theorem 6.21. Let (sj )j∈N be an orthogonal system in a Euclidean space
F with inner product (·, ·) and norm · = (·, ·)1/2 . Then, the following
statements are equivalent.
(a) The span of (sj )j∈N is dense in F, i.e., F = span{sj | j ∈ N}.
(b) For any f ∈ F the sequence (Πn f )n∈N of partial sums Πn f in (6.9)
converges to f with respect to the norm · , i.e.,
Πn f −→ f for n → ∞. (6.11)
Proof. For any f ∈ F, the n-th partial sum Πn f is the unique best approxi-
mation to f from Sn = span{s1 , . . . , sn } with respect to · .
(a) ⇒ (b): Suppose for f ∈ F and ε > 0, there is one N ∈ N and sN ∈ SN
satisfying sN − f < ε. Then, we have for n ≥ N
Πn f − f −→ 0 for n → ∞,
(c) ⇒ (a): From the Pythagoras theorem (6.13) and by (6.10), we obtain
$$\|\Pi_n f - f\|^2 = \|f\|^2 - \sum_{j=1}^{n}\frac{|(f, s_j)|^2}{\|s_j\|^2} \longrightarrow 0 \quad\text{for } n \to \infty$$
and so there is, for any ε > 0, one N ≡ N(ε) satisfying $\|\Pi_N f - f\| < \varepsilon$.
Definition 6.22. An orthogonal system (sj )j∈N satisfying one of the proper-
ties (a), (b), or (c) in Theorem 6.21 (and so all three properties), is called a
complete orthogonal system in F. The notion of a complete orthonormal
system is defined accordingly.
Proof. The representation (6.15) follows from property (c) in Theorem 6.21
by the Pythagoras theorem (6.13) and the Parseval identity (6.10).
Next, we prove a useful criterion for the completeness of systems (sj )j∈N
in Hilbert spaces F, in particular for the completeness of orthogonal systems.
S := span{sj | j ∈ N} ⊂ F
where 2 denotes the linear space of all square summable sequences with
indices in Z (cf. Remark 3.15).
For a Riesz basis B, the “best possible” constants, i.e., the largest A and the
smallest B satisfying (6.18), are called Riesz constants of B.
4
Frigyes Riesz (1880-1956), Hungarian mathematician
describe the stability of the Riesz basis representation with respect to per-
turbations of the coefficients in c ∈ 2 . Therefore, Riesz bases are also often
referred to as 2 -stable bases of F.
Proposition 6.29. Let B = (un )n∈Z be a Riesz basis of F with Riesz con-
stants 0 < A ≤ B < ∞. Then, the synthesis operator G : 2 −→ F in (6.19)
has the following properties.
(a) The operator G is continuous, where G has operator norm $\|G\| = \sqrt{B}$.
(b) The operator G is bijective.
(c) The inverse G⁻¹ of G is continuous with operator norm $\|G^{-1}\| = 1/\sqrt{A}$.
Proof. Statement (a) follows directly from the upper Riesz estimate in (6.18).
As for the proof of (b), note that G is surjective, since span{un | n ∈ Z}
is by (6.17) dense in F. Moreover, G is injective, since by (6.18) the kernel of
G can only contain the zero element. Altogether, the operator G is bijective.
Finally, for the inverse $G^{-1} : F \to \ell^2$ of G we find by (6.18) the estimate
$$\|G^{-1}(f)\|_2^2 \le \frac{1}{A}\,\|f\|^2 \quad\text{for all } f \in F$$
and this implies the continuity of G⁻¹ with operator norm $\|G^{-1}\| = 1/\sqrt{A}$. This proves property (c).
G∗ (f ) = ((f, un ))n∈Z ∈ 2
for all f ∈ F.
(b) The operator G∗ is bijective and has the inverse (G∗ )−1 = (G−1 )∗ .
(c) The operators G∗ and (G∗ )−1 are continuous via the isometries
for all c ∈ 2 , and this already implies the stated representation in (a).
By the representation in (a) in combination with the Riesz basis property
of B, we see that G∗ is bijective. Moreover, for f, g ∈ F the representation
(b) the Riesz basis B̃ has Riesz constants 0 < 1/B ≤ 1/A < ∞.
(c) any f ∈ F can uniquely be represented w.r.t. B or B̃, respectively, as
$$f = \sum_{n\in\mathbb{Z}}(f, \tilde u_n)\,u_n = \sum_{n\in\mathbb{Z}}(f, u_n)\,\tilde u_n. \qquad (6.22)$$
$$(u_n, \tilde u_m) = \bigl(u_n, (GG^*)^{-1}u_m\bigr) = \bigl(G^{-1}u_n, G^{-1}u_m\bigr)_2 = \delta_{mn} \qquad (6.23)$$
holds for any m, n ∈ Z. Moreover, for $c = (c_n)_{n\in\mathbb{Z}} \in \ell^2$, we have the identity
$$\sum_{n\in\mathbb{Z}} c_n\,\tilde u_n = (GG^*)^{-1}\Bigl(\sum_{n\in\mathbb{Z}} c_n\,u_n\Bigr) = (G^*)^{-1}c.$$
By $\|G^*\|^2 = B$ and $\|(G^*)^{-1}\|^2 = 1/A$, we get the Riesz stability for B̃, i.e.,
$$\frac{1}{B}\,\|c\|_2^2 \le \Bigl\|\sum_{n\in\mathbb{Z}} c_n\,\tilde u_n\Bigr\|^2 \le \frac{1}{A}\,\|c\|_2^2 \quad\text{for all } c = (c_n)_{n\in\mathbb{Z}} \in \ell^2. \qquad (6.24)$$
Now the continuity of (GG∗ )−1 and the completeness of B in (6.17) implies
F = span{ũn | n ∈ Z},
i.e., B̃ is a Riesz basis of F with Riesz constants 0 < 1/B ≤ 1/A < ∞. The
stated uniqueness of B̃ follows from the orthonormality relation (6.23).
Let us finally show property (c). Since G is surjective, any f ∈ F can be
represented as
$$f = \sum_{n\in\mathbb{Z}} c_n\,u_n \quad\text{for some } c = (c_n)_{n\in\mathbb{Z}} \in \ell^2.$$
Likewise, the stated representation in (6.22) with respect to the Riesz basis
B̃ can be shown by similar arguments.
From the estimates in (6.24) and the representation in (6.22), we get the
stability of the coefficients (f, un ))n∈Z ∈ 2 under perturbations of f ∈ F.
Corollary 6.32. Let B = (un )n∈Z be a Riesz basis of F with Riesz constants
0 < A ≤ B < ∞. Then, the stability estimates
hold.
hold, where the “best possible” constants, i.e., the largest A and the smallest
B satisfying (6.26), are called frame constants of B.
Remark 6.35. Any frame B = (un )n∈Z of F is complete in F, i.e., the span
of B is dense in F,
F = span{un | n ∈ Z}.
This immediately follows from the completeness criterion, Theorem 6.26, by
using the lower estimate in (6.26).
Remark 6.36. Every Riesz basis B is a frame, but the converse is in general not true. Indeed, a frame B = (un)n∈Z allows ambiguities in the representation
$$f = \sum_{n\in\mathbb{Z}} c_n\,u_n \quad\text{for } f \in F,$$
due to a possible ℓ²-linear dependence of the elements in B.
Remark 6.37. For any frame B = (un )n∈Z of F, there exists a dual frame
B̃ = (ũn )n∈Z of F satisfying
$$f = \sum_{n\in\mathbb{Z}}(f, u_n)\,\tilde u_n = \sum_{n\in\mathbb{Z}}(f, \tilde u_n)\,u_n \quad\text{for all } f \in F.$$
However, the duality relation (un, ũm) = δnm in (6.21) does not in general hold, since otherwise the elements of B and the elements of B̃ would be ℓ²-linearly independent, respectively.
Example 6.39. For the Euclidean space Rd , where d ∈ N, equipped with the
Euclidean norm · 2 , any basis B = {u1 , . . . , ud } of Rd is a Riesz basis of Rd .
Indeed, in this case, we have for the regular matrix U = (u1, ..., ud) ∈ R^{d×d} and for any vector c = (c1, ..., cd)^T ∈ R^d the stability estimates
$$\|U^{-1}\|_2^{-1}\,\|c\|_2 \le \Bigl\|\sum_{n=1}^{d} c_n\,u_n\Bigr\|_2 = \|Uc\|_2 \le \|U\|_2\,\|c\|_2.$$
Therefore, the Riesz constants 0 < A ≤ B < ∞ of B are given by the spectral norms of the matrices U and U⁻¹, so that $A = \|U^{-1}\|_2^{-2}$ and $B = \|U\|_2^2$. The unique dual Riesz basis B̃ of B is given by the rows of the inverse U⁻¹. This immediately follows by UU⁻¹ = I from Theorem 6.31 (a). ♦
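These identities are easy to verify numerically; a small Python sketch, assuming a randomly chosen regular matrix U (all names are our own):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4
U = rng.standard_normal((d, d))              # columns u_1, ..., u_d (assumed regular)
s = np.linalg.svd(U, compute_uv=False)       # singular values of U
A, B = s[-1] ** 2, s[0] ** 2                 # A = sigma_min^2 = ||U^{-1}||_2^{-2}, B = ||U||_2^2
U_dual = np.linalg.inv(U)                    # rows are the dual basis vectors

# Riesz estimates A ||c||^2 <= ||U c||^2 <= B ||c||^2 on a random coefficient vector:
c = rng.standard_normal(d)
print(A * (c @ c) <= np.linalg.norm(U @ c) ** 2 <= B * (c @ c))
# duality (u_n, u~_m) = delta_{nm}:
print(np.allclose(U_dual @ U, np.eye(d)))
```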
$$f = \sum_{n=1}^{d}(f, u_n)\,\tilde u_n \quad\text{for all } f \in \mathbb{R}^d.$$
By $U^T f = c_f$, we have $UU^T f = U c_f$ and so
$$(F_n f)(x) = \frac{a_0}{2} + \sum_{j=1}^{n}\bigl(a_j\cos(jx) + b_j\sin(jx)\bigr) \quad\text{for } f \in C_{2\pi}^{\mathbb{R}} \qquad (6.29)$$
with Fourier coefficients $a_0 = (f,1)_{\mathbb{R}}$, $a_j = (f,\cos(j\cdot))_{\mathbb{R}}$, and $b_j = (f,\sin(j\cdot))_{\mathbb{R}}$, for j ∈ N, see Corollary 4.12. As we noticed in Section 4.3, the Fourier operator $F_n : C_{2\pi}^{\mathbb{R}} \longrightarrow T_n^{\mathbb{R}}$ gives the orthogonal projection of $C_{2\pi}^{\mathbb{R}}$ onto $T_n^{\mathbb{R}}$. In particular, $F_n f \in T_n^{\mathbb{R}}$ is the unique best approximation to $f \in C_{2\pi}^{\mathbb{R}}$ from $T_n^{\mathbb{R}}$ with respect to the Euclidean norm $\|\cdot\|_{\mathbb{R}}$.
As regards our notations concerning real-valued against complex-valued functions, we recall Remark 4.10: for real-valued functions $f \in C_{2\pi}^{\mathbb{R}} \equiv C_{2\pi}$, we apply the inner product $(\cdot,\cdot) = (\cdot,\cdot)_{\mathbb{R}}$ and the norm $\|\cdot\| = \|\cdot\|_{\mathbb{R}}$. In contrast, for complex-valued functions $f \in C_{2\pi}^{\mathbb{C}}$, we use $(\cdot,\cdot)_{\mathbb{C}}$ and $\|\cdot\|_{\mathbb{C}}$.
From our above discussion, we can conclude the following convergence result.
Proof. The statement follows immediately from property (b) in Theorem 6.21
in combination with Corollary 6.41.
Next, we quantify the speed of convergence for the Fourier partial sums
Fn f . To this end, the complex representation in (4.23),
$$(F_n f)(x) = \sum_{j=-n}^{n} c_j\,e^{ijx}, \qquad (6.30)$$
for $f \in C_{2\pi}^k$ and therefore
$$\|F_n f - f\| \le \frac{1}{(n+1)^k}\,\|F_n f^{(k)} - f^{(k)}\| = o(n^{-k}) \quad\text{for } n \to \infty,$$
where we use the convergence
$$\|F_n f^{(k)} - f^{(k)}\| \longrightarrow 0 \quad\text{for } n \to \infty$$
Further note that the decay of $c_j(f)$ follows from the assumption $f \in C_{2\pi}^k$.
As for the converse, we can determine the smoothness of f from the asymp-
totic decay of the Fourier coefficients cj (f ). More precisely: If the Fourier
coefficients cj (f ) of f have the asymptotic decay
|cj (f )| = O |j|−(k+1+ε) for |j| → ∞
lim Fn f − f ∞ = 0.
n→∞
Therefore, the error function Fn f − f has at least one zero xn in the open interval (0, 2π), whereby for x ∈ [0, 2π] we obtain the representation
$$(F_n f - f)(x) = \int_{x_n}^{x}(F_n f - f)'(\xi)\,d\xi = \int_{x_n}^{x}(F_n f' - f')(\xi)\,d\xi,$$
where we used the identity $(F_n f)' = F_n f'$ (see Exercise 6.92). By the Cauchy–Schwarz inequality, we further obtain
$$|(F_n f - f)(x)|^2 \le \int_{x_n}^{x} 1\,d\xi\cdot\int_{x_n}^{x}|(F_n f' - f')(\xi)|^2\,d\xi \le (2\pi)^2\,\|F_n f' - f'\|^2 \longrightarrow 0 \quad\text{for } n \to \infty, \qquad (6.33)$$
which already proves the stated uniform convergence.
Now we conclude from Theorem 6.44 a corresponding result concerning the convergence rate of (Fn f)n∈N0 with respect to the maximum norm $\|\cdot\|_\infty$.
Corollary 6.47. For $f \in C_{2\pi}^k$, where k ≥ 1, the Fourier partial sums Fn f converge uniformly to f at convergence rate k − 1, according to
$$\|F_n f - f\|_\infty = o(n^{-(k-1)}) \quad\text{for } n \to \infty.$$
Proof. For $f' \in C_{2\pi}^{k-1}$, we have by (6.33) and (6.31) the estimate
$$\|F_n f - f\|_\infty \le 2\pi\,\|F_n f' - f'\| \le \frac{2\pi}{(n+1)^{k-1}}\,\|F_n f^{(k)} - f^{(k)}\|,$$
whereby we obtain for $f^{(k)} \in C_{2\pi}$ the asymptotic convergence behaviour
$$\|F_n f - f\|_\infty = o(n^{-(k-1)}) \quad\text{for } n \to \infty$$
according to Corollary 6.43.
Note that in the last line we applied the trigonometric addition formula
is called Dirichlet5 kernel. Note that the Dirichlet kernel is 2π-periodic and
even, so that we can further simplify the representation in (6.36) to obtain
$$(F_n f)(x) = \frac{1}{\pi}\int_0^{2\pi} f(\tau)\,D_n(\tau - x)\,d\tau = \frac{1}{\pi}\int_{-x}^{2\pi - x} f(x+\sigma)\,D_n(\sigma)\,d\sigma = \frac{1}{\pi}\int_{-\pi}^{\pi} f(x+\sigma)\,D_n(\sigma)\,d\sigma. \qquad (6.38)$$
we can rewrite the representation for the pointwise error as a sum of the form
with the Fourier coefficients bn (vx ) and an (wx ) of the 2π-periodic functions
so that the Fourier coefficients (bn (vx ))n∈Z and (an (wx ))n∈Z are a zero se-
quence, respectively, whereby the pointwise convergence of (Fn f )(x) to f (x)
at x would follow.
Now we are in a position where we can, from our above investigations,
formulate a sufficient condition for f ∈ C2π which guarantees pointwise con-
vergence of (Fn f )(x) to f (x) at x ∈ R.
Proof. First note that the function gx in (6.39) can only have singularities at
σk = 2πk, for k ∈ Z. Now we analyze the behaviour of gx around zero, where
we find
$$\lim_{\sigma\to 0} g_x(\sigma) = \lim_{\sigma\to 0}\frac{f(x+\sigma) - f(x)}{2\sin(\sigma/2)} = \lim_{\sigma\to 0}\frac{f(x+\sigma) - f(x)}{\sigma}\cdot\lim_{\sigma\to 0}\frac{\sigma}{2\sin(\sigma/2)} = f'(x),$$
Now let us return to the uniform convergence of Fourier partial sums, where
the following question is of particular importance.
Question: Can we, under conditions on $f \in C_{2\pi}\setminus C_{2\pi}^1$ that are as mild as possible, prove
statements concerning uniform convergence of the Fourier partial sums Fn f ?
To answer this question, we need to analyze the norm Fn ∞ of the
Fourier operator Fn with respect to the maximum norm · ∞ . To this end,
we first derive a suitable representation for the operator norm
$$\|F_n\|_\infty := \sup_{f\in C_{2\pi}\setminus\{0\}}\frac{\|F_n f\|_\infty}{\|f\|_\infty} \quad\text{for } n \in \mathbb{N}_0, \qquad (6.40)$$
Theorem 6.49. The norm of the Fourier operator Fn has the representation $\|F_n\|_\infty = \lambda_n$, where
$$\lambda_n := \frac{2}{\pi}\int_0^{\pi}|D_n(\sigma)|\,d\sigma = \frac{1}{\pi}\int_0^{\pi}\left|\frac{\sin((n+1/2)\sigma)}{\sin(\sigma/2)}\right|d\sigma \qquad (6.42)$$
$$\|F_n f\|_\infty \le \|f\|_\infty\cdot\lambda_n$$
6 Marquis de L'Hôpital (1661–1704), French mathematician
7 Henri Léon Lebesgue (1875–1941), French mathematician
Fn f − f ∞ −→ 0 for n → ∞,
Fn f ∞ ≤ Fn f − f ∞ + f ∞ .
Indeed, if the norms Fn ∞ are not uniformly bounded from above, then
there must be at least one f ∈ C2π yielding divergence Fn f ∞ −→ ∞ for
n → ∞, in which case the sequence of error norms Fn f − f ∞ must be
divergent, i.e., Fn f − f ∞ −→ ∞ for n → ∞.
$$= \frac{4}{\pi^2}\sum_{k=0}^{n-1}\frac{1}{k+1} \ge \frac{4}{\pi^2}\,\log(n+1), \qquad (6.45)$$
where we have used the estimate
$$\sum_{k=0}^{n-1}\frac{1}{k+1} \ge \log(n+1) \quad\text{for all } n \in \mathbb{N}$$
in (6.45).
On the other hand, we have for the integrand in (6.42) the estimates
$$\left|\frac{\sin((n+1/2)\sigma)}{\sin(\sigma/2)}\right| = \left|2\left[\frac{1}{2} + \sum_{j=1}^{n}\cos(j\sigma)\right]\right| = \left|1 + 2\sum_{j=1}^{n}\cos(j\sigma)\right| \le 1 + 2n,$$
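The logarithmic growth of λn between these two bounds can be observed numerically; a short Python sketch approximating the integral in (6.42) by a midpoint rule (our own illustration, not part of the text):

```python
import numpy as np

def lebesgue_constant(n, m=20001):
    """Midpoint-rule approximation of
    lambda_n = (1/pi) * int_0^pi |sin((n+1/2)s) / sin(s/2)| ds."""
    s = (np.arange(m) + 0.5) * np.pi / m
    return np.mean(np.abs(np.sin((n + 0.5) * s) / np.sin(s / 2)))

# Compare with the lower bound (4/pi^2) log(n+1) from (6.45) and the crude bound 1 + 2n.
for n in (1, 4, 16, 64):
    print(n, lebesgue_constant(n), 4 / np.pi ** 2 * np.log(n + 1), 1 + 2 * n)
```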
Let us first quote the Banach8 -Steinhaus9 theorem, a well-known result from
functional analysis, before we draw relevant conclusions. We will not prove the
Banach-Steinhaus theorem, but rather refer the reader to the textbook [33].
Ln : B1 −→ B2 for n ∈ N
Then, the uniform boundedness principle holds for the operators Ln , i.e.,
Fn f − f ∞ −→ ∞ for n → ∞.
Fn f ∞ −→ ∞ for n → ∞.
Proof. The function space C2π , equipped with the maximum norm · ∞ , is
a Banach space. By the divergence Fn ∞ = λn −→ ∞ for n → ∞, there is
one f ∈ C2π with Fn f ∞ −→ ∞ for n → ∞. Indeed, otherwise this would
contradict the Banach-Steinhaus theorem. Now the estimate
Fn f − f ∞ ≥ Fn f ∞ − f ∞
Next, we show the norm minimality of the Fourier operator Fn among all
surjective projection operators onto the linear space of trigonometric poly-
nomials Tn . The following result dates back to Charshiladse-Losinski.
L∞ ≥ Fn ∞ .
G∞ ≤ L∞ .
Case 2: For |j| > n, we have Fn eij· (x) = 0. Moreover, the function
e is orthogonal to the trigonometric polynomial L eij· (x − s) ∈ TnC .
ijs
for n → ∞ with respect to the maximum norm ·∞ . According to the Weier-
strass theorems, Corollaries 6.12 and 6.17, we can rely on the convergence
η∞ (f, Tn ) −→ 0 and η∞ (f, Pn ) −→ 0 for n → ∞.
In this section, we quantify the asymptotic decay of the zero sequences
(η∞ (f, Tn ))n∈N0 and (η∞ (f, Pn ))n∈N0 for n → ∞.
We begin our analysis with the trigonometric case, i.e., with the asymp-
totic behaviour of (η∞ (f, Tn ))n∈N0 . On this occasion, we first recall the con-
vergence rates of the Fourier partial sums Fn f for f ∈ C2π. By the estimate
$$\eta_\infty(f, T_n) \le \|F_n f - f\|_\infty \quad\text{for } n \in \mathbb{N}_0$$
we expect for $f \in C_{2\pi}^k$, k ≥ 1, at least the convergence rate k − 1, according to Corollary 6.47. However, as it turns out, we gain even more. In fact, we will obtain the convergence rate k, i.e.,
$$\eta_\infty(f, T_n) = O(n^{-k}) \quad\text{for } n \to \infty \quad\text{for } f \in C_{2\pi}^k.$$
Note that this complies with the convergence behaviour of Fourier partial sums Fn f with respect to the Euclidean norm $\|\cdot\|$. Indeed, in that case, we have, by Theorem 6.44, the asymptotic behaviour
$$\eta(f, T_n) = o(n^{-k}) \quad\text{for } n \to \infty \quad\text{for } f \in C_{2\pi}^k.$$
For an intermediate conclusion, we note one important principle:
The smoother $f \in C_{2\pi}^k$ is, i.e., the larger k ∈ N, the faster the convergence of the minimal distances η(f, Tn) and η∞(f, Tn) to zero, for n → ∞.
10
Georg Faber (1877-1966), German mathematician
where a0 = (f, 1), aj = (f, cos(j·)) and bj = (f, sin(j·)), for 1 ≤ j ≤ n, are the Fourier coefficients of f in (6.29). Then we have, for $f \in C_{2\pi}^1$, the error representation
$$(L_n f - f)(x) = \frac{1}{\pi}\int_{-\pi}^{\pi}\left[\frac{\xi}{2} + \sum_{j=1}^{n}\frac{(-1)^j}{j}\,A_j\sin(j\xi)\right] f'(x+\pi-\xi)\,d\xi. \qquad (6.55)$$
the estimate
$$\eta_\infty(f, T_n) \le \|L_n f - f\|_\infty \le \|f'\|_\infty\cdot\frac{1}{\pi}\int_{-\pi}^{\pi}\left|\frac{\xi}{2} + \sum_{j=1}^{n}\frac{(-1)^j}{j}\,A_j\sin(j\xi)\right| d\xi$$
$$= \|f'\|_\infty\cdot\frac{1}{\pi}\int_{0}^{\pi}\left|\xi + \sum_{j=1}^{n}\frac{2(-1)^j}{j}\,A_j\sin(j\xi)\right| d\xi = \|f'\|_\infty\cdot\frac{1}{\pi}\cdot\frac{\pi^2}{2(n+1)} = \|f'\|_\infty\cdot\frac{\pi}{2(n+1)},$$
where in the second line we use the error representation (6.55). Moreover,
in the penultimate line we choose optimal coefficients A1 , . . . , An according
to (6.53).
$$g(\xi) := \frac{\xi}{2} + \sum_{j=1}^{n}\frac{(-1)^j}{j}\,A_j\sin(j\xi)$$
for the first factor of the integrand in (6.55). This way we obtain
$$\frac{1}{\pi}\int_{-\pi}^{\pi} g(\xi)\,f'(x+\pi-\xi)\,d\xi = \frac{1}{\pi}\Bigl[-g(\xi)\,f(x+\pi-\xi)\Bigr]_{\xi=-\pi}^{\xi=\pi} + \frac{1}{\pi}\int_{-\pi}^{\pi} g'(\xi)\,f(x+\pi-\xi)\,d\xi$$
$$= -\frac{1}{\pi}\,\frac{\pi}{2}\,f(x) - \frac{1}{\pi}\,\frac{\pi}{2}\,f(x+2\pi) + \frac{1}{\pi}\int_{-\pi}^{\pi} g'(x+\pi-\sigma)\,f(\sigma)\,d\sigma = -f(x) + \frac{1}{\pi}\int_{-\pi}^{\pi} g'(x+\pi-\sigma)\,f(\sigma)\,d\sigma$$
after integration by parts from the error representation (6.55). Now we have
$$g'(x+\pi-\sigma) = \frac{1}{2} + \sum_{j=1}^{n}\frac{(-1)^j}{j}\,A_j\,j\,\cos(j(x+\pi-\sigma)) = \frac{1}{2} + \sum_{j=1}^{n}(-1)^j A_j\bigl[\cos(j(x+\pi))\cos(j\sigma) + \sin(j(x+\pi))\sin(j\sigma)\bigr]$$
$$= \frac{1}{2} + \sum_{j=1}^{n}(-1)^j A_j\,(-1)^j\bigl(\cos(jx)\cos(j\sigma) + \sin(jx)\sin(j\sigma)\bigr) = \frac{1}{2} + \sum_{j=1}^{n}A_j\bigl[\cos(jx)\cos(j\sigma) + \sin(jx)\sin(j\sigma)\bigr]$$
and so
$$\frac{1}{\pi}\int_{-\pi}^{\pi} g'(x+\pi-\sigma)\,f(\sigma)\,d\sigma = \frac{a_0}{2} + \sum_{j=1}^{n}A_j\bigl[a_j\cos(jx) + b_j\sin(jx)\bigr] = (L_n f)(x),$$
must necessarily change signs at the points ξk = kπ/(n + 1) ∈ (0, π), for
1 ≤ k ≤ n. Indeed, this is because the function sgn(sin((n + 1)ξ)) has sign
changes on (0, π) only at the points ξ1 , . . . , ξn .
Note that this requirement yields n conditions on the sought coefficients
a1 , . . . , an ∈ R, where these conditions are the interpolation conditions
$$\xi_k = \sum_{j=1}^{n} a_j\sin(j\xi_k) \quad\text{for } 1 \le k \le n. \qquad (6.59)$$
But the interpolation problem (6.59) has a unique solution, since the trigono-
metric polynomials sin(j·), 1 ≤ j ≤ n, form a Haar system on (0, π) (see
Exercise 5.54).
Proof. The integrand in (6.60) is an even function. Now we regard the integral in (6.60) on [−π, π] (rather than on [0, π]). By using the identity
$$\sin(j\xi) = \frac{1}{2i}\bigl(e^{ij\xi} - e^{-ij\xi}\bigr)$$
it is sufficient to show
$$I_j := \int_{-\pi}^{\pi} e^{ij\xi}\cdot\operatorname{sgn}\bigl(\sin((n+1)\xi)\bigr)\,d\xi = 0 \quad\text{for } 1 \le |j| < n+1. \qquad (6.61)$$
$$= -e^{ij\pi/(n+1)}\cdot I_j$$
holds. Since $-e^{ij\pi/(n+1)} \ne 1$, this implies $I_j = 0$ for 1 ≤ |j| < n + 1.
We wish to work with weaker conditions on f (i.e., weaker than f ∈ C2π
1
Remark 6.69. Note that the modulus of continuity ω(f, δ) quantifies the
local distance between the function values of f uniformly on [a, b]. In fact,
the smaller the modulus of continuity ω(f, δ), the smaller is the local variation
of f on [a, b]. For a compact interval [a, b] ⊂ R, the modulus of continuity
ω(f, δ) of f ∈ C[a, b] is finite by
$$\omega(f, \delta) \longrightarrow 0 \quad\text{for } \delta \searrow 0.$$
$$\omega(f, \delta) \le \delta\cdot\|f'\|_\infty.$$
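The modulus of continuity can also be approximated numerically on a grid; a rough Python sketch (the grid size and function names are our own choices, and the grid maximum only approximates the supremum):

```python
import numpy as np

def modulus_of_continuity(f, a, b, delta, m=2000):
    """Grid approximation of omega(f, delta) = sup{|f(x)-f(y)| : |x-y| <= delta}."""
    x = np.linspace(a, b, m)
    h = (b - a) / (m - 1)
    k = int(np.floor(delta / h))
    best = 0.0
    for shift in range(1, k + 1):
        best = max(best, np.max(np.abs(f(x[shift:]) - f(x[:-shift]))))
    return best

# For f(x) = sqrt(x) on [0, 1] one has omega(f, delta) = sqrt(delta); for a
# continuously differentiable f the bound omega(f, delta) <= delta * ||f'||_inf applies.
if __name__ == "__main__":
    for d in (0.1, 0.01, 0.001):
        print(d, modulus_of_continuity(np.sqrt, 0.0, 1.0, d), np.sqrt(d))
```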
The following Jackson theorem gives an upper bound for the minimal
distance η∞ (f, Tn ) by involving the modulus of continuity of f ∈ C2π .
Remark 6.71. The estimate of Jackson 3, Theorem 6.70, is not sharp. For
more details, we refer to Exercise 6.97.
$$\eta_\infty(f, T_n) \le \|T^*(\varphi_\delta) - f\|_\infty \le \|T^*(\varphi_\delta) - \varphi_\delta\|_\infty + \|\varphi_\delta - f\|_\infty \le \frac{\pi}{2(n+1)}\cdot\frac{1}{2\delta}\cdot\omega(f, 2\delta) + \omega(f, \delta) \le \omega(f, 2\delta)\left(\frac{\pi}{4\delta(n+1)} + 1\right).$$
the zero sequence (η∞ (f, Tn ))n∈N0 . Our perception matches with the result
of the following Jackson theorem.
Tn = {T ∈ C2π | T ∈ Tn } ⊂ Tn
and this explains our notation Tn . By Tn ⊂ Tn , we find the estimate
we have T' = T^* and so
\[
\| (T - f)' \|_\infty = \| T^* - f' \|_\infty = \eta_\infty(f', \mathcal{T}_n).
\]
But this implies, by using Jackson 1, Theorem 6.59, the stated estimate:
\[
\eta_\infty(f, \mathcal{T}_n) = \eta_\infty(T - f, \mathcal{T}_n) \le \frac{\pi}{2(n+1)} \cdot \| (T - f)' \|_\infty = \frac{\pi}{2(n+1)} \cdot \eta_\infty(f', \mathcal{T}_n).
\]
\[
(L_n f)(x) = \frac{a_0}{2} + \sum_{k=1}^{n} A_k \big( a_k \cos(kx) + b_k \sin(kx) \big),
\]
for f ∈ C^k_{2π}, where k ≥ 1.
Now we return to the discussion from the outset of this section concerning the uniform convergence of Fourier partial sums. In that discussion, we
developed the error estimate (6.49),
‖F_n f − f‖∞ −→ 0 for n → ∞,
(c) If f ∈ C^k_{2π}, for k ≥ 1, then we have (by Jackson 4)
\[
\eta_\infty(f, \mathcal{P}_n) \le \frac{3\pi \cdot L}{2(n+1)}.
\]
We split the proof of Jackson 5, Theorem 6.77, into several lemmas. The
following lemma reveals the structural connection between the trigonometric
and the algebraic case.
η∞ (f, Pn ) = η∞ (g, Tn ).
Proof. For f ∈ C [−1, 1] the function g ∈ C2π is even. Therefore, the unique
best approximation T ∗ ∈ Tn to g is even, so that we have
\[
\Pi_n f = \sum_{k=0}^{n} \frac{(f, T_k)_w}{\| T_k \|_w^2}\, T_k
\]
‖Π_n f − f‖∞ −→ 0 for n → ∞,
6.5 Exercises
Exercise 6.82. Prove the following results.
(a) Show that for a set of n + 1 pairwise distinct interpolation points
a ≤ x 0 < . . . < xn ≤ b
with the Bernstein polynomials \( \beta_j^{(n)}(x) = \binom{n}{j} x^j (1-x)^{n-j} \), for 0 ≤ j ≤ n.
Show that, for any f ∈ C¹[0, 1], the sequence ((B_n f)′)_{n∈N_0} of derivatives
of B_n f converges uniformly on [0, 1] to f′, i.e.,
\[
\lim_{n\to\infty} \| (B_n f)' - f' \|_\infty = 0.
\]
where pt (x) = 0, if and only if t = x. Then, for any sequence (Ln )n∈N of
linear positive operators Ln : C (Ω) −→ C (Ω) satisfying
Conclude from this the statement of the Korovkin theorem, Theorem 6.11.
T e n = un for all n ∈ Z.
at equidistant knots
\[
x_{j,n} = a + j\,\frac{b-a}{n} \qquad \text{for } j = 0, \dots, n
\]
and weights
\[
\alpha_{j,n} = \frac{1}{b-a} \int_a^b L_{j,n}(x)\, dx \qquad \text{for } j = 0, \dots, n,
\]
where {L0,n , . . . , Ln,n } ⊂ Pn are the Lagrange basis functions for the knot set
Xn = {x0,n , . . . , xn,n } (cf. the discussion on Lagrange bases in Section 2.3).
Show that there is a continuous function f ∈ C [a, b], for which the se-
quence of Newton-Cotes approximations ((Qn f ))n∈N diverges.
Hint: Apply the Kuzmin15 theorem, according to which the sum of the
weights’ moduli |αj,n | diverges, i.e.,
\[
\sum_{j=0}^{n} |\alpha_{j,n}| \longrightarrow \infty \qquad \text{for } n \to \infty.
\]
is also sharp.
14 Roger Cotes (1682-1716), English mathematician
15 Rodion Ossijewitsch Kuzmin (1891-1949), Russian mathematician
Exercise 6.97. The estimate of Jackson 3, Theorem 6.70, is not sharp. Show
that the estimate
\[
\eta_\infty(f, \mathcal{T}_n) \le \omega\!\left( f, \frac{\pi}{n+1} \right) \qquad \text{for } f \in C_{2\pi}
\]
is sharp (under the assumptions and with the notations in Theorem 6.70).
Hint: Apply the theorem of de La Vallée Poussin from Exercise 6.96.
Exercise 6.99. Prove part (c) of the Dini-Lipschitz theorem, Theorem 6.81,
in two steps as follows. First show that, for any f ∈ C 1 [−1, 1], the sequence
(Πn f )n∈N0 of Chebyshev partial sums
\[
\Pi_n f = \sum_{j=0}^{n} \frac{(f, T_j)_w}{\| T_j \|_w^2}\, T_j, \qquad \text{where } T_j = \cos(j \arccos(\cdot)) \in \mathcal{P}_j,
\]
converges uniformly on [−1, 1] to f, i.e.,
\[
\lim_{n\to\infty} \| \Pi_n f - f \|_\infty = 0.
\]
16 Charles-Jean de La Vallée Poussin (1866-1962), Belgian mathematician
7 Basic Concepts of Signal Approximation
The second half of this chapter is devoted to wavelets. Wavelets are popu-
lar and powerful tools of modern mathematical signal processing, in particular
for the approximation of functions f ∈ L2 (R). A wavelet approximation to f
is essentially based on a multiresolution of L2 (R), i.e., on a nested sequence
· · · ⊂ V−1 ⊂ V0 ⊂ V1 ⊂ · · · ⊂ Vj−1 ⊂ Vj ⊂ · · · ⊂ L2 (R) (7.2)
of closed scale spaces Vj ⊂ L2 (R). The nested sequence in (7.2) leads us to
stable approximation methods, where f is represented on different frequency
bands by orthogonal projectors Πj : L2 (R) −→ Vj . More precisely, for a fixed
scaling function ϕ ∈ L2 (R), the scale spaces Vj ⊂ L2 (R) in (7.2) are generated
by dilations and translations of basis functions ϕjk (x) := 2j/2 ϕ(2j x − k), for
j, k ∈ Z, so that V_j is the closed span of {ϕ_{jk} | k ∈ Z}. The corresponding wavelet spaces are
W_j = span{ψ_k^j | k ∈ Z} for j ∈ Z.
The basic construction of wavelet approximations to f ∈ L2 (R) is based
on refinement equations of the form
\[
\varphi(x) = \sum_{k \in \mathbb{Z}} h_k\, \varphi(2x - k) \qquad \text{and} \qquad \psi(x) = \sum_{k \in \mathbb{Z}} g_k\, \varphi(2x - k),
\]
with the frequency ω = 2π/T and with the complex Fourier coefficients
\[
c_j = \frac{1}{T} \int_0^T f_T(\xi)\, e^{-ij\omega\xi}\, d\xi = \frac{1}{T} \int_{-T/2}^{T/2} f_T(\xi)\, e^{-ij\omega\xi}\, d\xi \tag{7.4}
\]
where the T -periodic signal fT is assumed to coincide on (−T /2, T /2) with f .
Moreover, we regard the function
\[
g_T(\omega) := \int_{-T/2}^{T/2} f_T(\xi)\, e^{-i\omega\xi}\, d\xi
\]
from (7.6). We remark that the infinite series in (7.8) is a Riemann sum on the knot sequence {ω_j}_{j∈Z}. Note that the mesh width Δω of the sequence {ω_j}_{j∈Z} is, for large enough T > 0, arbitrarily small. This observation leads us, via the above-mentioned limit in (7.7), to the function
\[
g(\omega) := \lim_{T \to \infty} g_T(\omega) = \int_{-\infty}^{\infty} f(\xi)\, e^{-i\omega\xi}\, d\xi \qquad \text{for } \omega \in \mathbb{R}. \tag{7.9}
\]
Proposition 7.2. The Fourier transform F : L1 (R) −→ C (R) has the fol-
lowing properties, where we assume f ∈ L1 (R) for all statements (a)-(e).
(a) For fx0 := f (· − x0 ), where x0 ∈ R, we have
(c) For the conjugate complex f¯ ∈ L1 (R), where f¯(x) = f (x), we have
\[
\frac{d}{d\omega}(\mathcal{F}f)(\omega) = -i\, (\mathcal{F}(x f))(\omega) \qquad \text{for all } \omega \in \mathbb{R}
\]
under the assumption xf ∈ L1 (R).
Now C_c(R) is dense in L¹(R), so that for any f ∈ L¹(R) and ε > 0 there is some g ∈ C_c(R) satisfying ‖f − g‖_{L¹(R)} < ε. The statement then follows from the estimate (7.11), whereby
F : L1 (R) −→ C0 (R).
Proposition 7.6. For f, g ∈ L1 (R) both functions fˆg and f ĝ are integrable.
Moreover, we have
\[
\int_{\mathbb{R}} \hat f(x)\, g(x)\, dx = \int_{\mathbb{R}} f(\omega)\, \hat g(\omega)\, d\omega. \tag{7.15}
\]
Proof. Since the functions fˆ and ĝ are continuous and bounded, both functions fˆg and f ĝ are integrable. By using the Fubini theorem, we can conclude
\[
\int_{\mathbb{R}} f(\omega)\, \hat g(\omega)\, d\omega
= \int_{\mathbb{R}} f(\omega) \int_{\mathbb{R}} g(x)\, e^{-ix\omega}\, dx\, d\omega
= \int_{\mathbb{R}} \int_{\mathbb{R}} f(\omega)\, e^{-ix\omega}\, d\omega\; g(x)\, dx
= \int_{\mathbb{R}} \hat f(x)\, g(x)\, dx.
\]
Example 7.7. For α > 0, let 1_α = χ_{[−α,α]} be the indicator function of the compact interval [−α, α] ⊂ R. Then,
\[
(\mathcal{F} 1_1)(\omega) = \int_{-1}^{1} e^{-ix\omega}\, dx = 2 \cdot \mathrm{sinc}(\omega) \qquad \text{for } \omega \in \mathbb{R}
\]
with the sinc function
\[
\mathrm{sinc}(\omega) := \begin{cases} \sin(\omega)/\omega & \text{for } \omega \neq 0 \\ 1 & \text{for } \omega = 0. \end{cases}
\]
Fig. 7.1. The sinc function yields the Fourier transform of the indicator function 1_1.
\[
g_\alpha(x) = e^{-\alpha x^2} \qquad \text{for } x \in \mathbb{R}
\]
for α > 0 by
\[
\begin{aligned}
\hat g_\alpha(\omega) = \int_{\mathbb{R}} e^{-\alpha x^2} e^{-ix\omega}\, dx
&= \int_{\mathbb{R}} e^{-\alpha (x^2 + ix\omega/\alpha)}\, dx \\
&= e^{\alpha (i\omega/(2\alpha))^2} \int_{\mathbb{R}} e^{-\alpha \left( x + \frac{i\omega}{2\alpha} \right)^2} dx \\
&= e^{-\omega^2/(4\alpha)} \int_{\mathbb{R}} e^{-\alpha \left( x + \frac{i\omega}{2\alpha} \right)^2} dx \\
&= \sqrt{\frac{\pi}{\alpha}} \cdot e^{-\omega^2/(4\alpha)}.
\end{aligned}
\]
\[
\| \mathcal{F} \|_{L^1(\mathbb{R}) \to C_0(\mathbb{R})} = \sup_{f \in L^1(\mathbb{R}) \setminus \{0\}} \frac{\| \mathcal{F} f \|_\infty}{\| f \|_{L^1(\mathbb{R})}} = 1.
\]
From the result of Proposition 7.9, we can draw the following conclusion.
Corollary 7.10. Let (fn )n∈N be a convergent sequence in L1 (R) with limit
f ∈ L1 (R). Then, the corresponding sequence (fˆn )n∈N of Fourier transforms
Ffn = fˆn ∈ C0 (R) converges uniformly on R to fˆ.
Proof. The statement follows immediately from the estimate
‖fˆ_n − fˆ‖∞ = ‖F(f_n − f)‖∞ ≤ ‖F‖ · ‖f_n − f‖_{L¹(R)} = ‖f_n − f‖_{L¹(R)},
Remark 7.13. Due to Proposition 7.12, the Banach space L1 (R) is closed
under the convolution product ∗, i.e., we have f ∗ g ∈ L1 (R) for f, g ∈ L1 (R).
Moreover, for f, g ∈ L1 (R), we have the identity
\[
(f * g)(x) = \int_{\mathbb{R}} f(x-y)\, g(y)\, dy = \int_{\mathbb{R}} f(y)\, g(x-y)\, dy = (g * f)(x)
\]
Due to Proposition 7.12 and Remark 7.13, we can apply the Fourier trans-
form F to the convolution of two L1 -functions. As we show now, the Fourier
transform F(f ∗ g) of the convolution f ∗ g, for f, g ∈ L1 (R), coincides with
the algebraic product of their Fourier transforms Ff and Fg.
where we used the properties (a) and (b) in Definition 7.17. Note that the
function hy := g − g(· − y) satisfies, for any y ∈ R, the estimate
Now we split the outer integral in (7.22) into a sum of two terms, which
we estimate uniformly from above by (7.23), so that we have, for any ρ > 0,
\[
\begin{aligned}
\| g - g * \delta_k \|_{L^1(\mathbb{R})}
&\le \int_{-\rho}^{\rho} \delta_k(y) \int_{\mathbb{R}} |h_y(x)|\, dx\, dy + \int_{\mathbb{R} \setminus (-\rho,\rho)} \delta_k(y) \int_{\mathbb{R}} |h_y(x)|\, dx\, dy \\
&\le 2 \cdot |\mathrm{supp}(g)| \cdot \| g' \|_\infty \cdot \rho + 2 \cdot |\mathrm{supp}(g)| \cdot \| g \|_\infty \int_{\mathbb{R} \setminus (-\rho,\rho)} \delta_k(y)\, dy \\
&\le 4 \cdot K \cdot M \cdot \rho
\end{aligned}
\]
by using property (c) in Definition 7.17. For ε > 0 we have ‖g − g ∗ δ_k‖_{L¹(R)} < ε for all k ≥ N, provided that ρ < ε/(4KM). Therefore, g ∈ C¹_c(R) can
be approximated arbitrarily well in L1 (R) by convolutions g ∗ δk . Finally,
Cc1 (R) is dense in L1 (R), which implies the stated L1 -convergence in (7.21)
for f ∈ L1 (R).
Now we turn to the Fourier inversion formula. At the outset of Section 7.1,
we derived the representation (7.8) for periodic functions. We can transfer
the inversion formula (7.8) from the discrete case to the continuous case. This
motivates the following definition.
f = F −1 Ff
Proof. In the following proof, we utilize the Dirac sequence (δk )k∈N of Gauss
functions from Example 7.18. For δk in (7.20), the identity (7.17) yields the
representation
\[
\delta_k(x) = \frac{k}{2\pi} \int_{\mathbb{R}} e^{-y^2/2} \cdot e^{ikxy}\, dy = \frac{1}{2\pi} \int_{\mathbb{R}} e^{-\omega^2/(2k^2)} \cdot e^{ix\omega}\, d\omega \tag{7.26}
\]
for all k ∈ N. This in turn implies
\[
\begin{aligned}
(f * \delta_k)(x) &= \int_{\mathbb{R}} f(y)\, \frac{1}{2\pi} \int_{\mathbb{R}} e^{-\omega^2/(2k^2)} \cdot e^{i(x-y)\omega}\, d\omega\, dy \\
&= \frac{1}{2\pi} \int_{\mathbb{R}} \left( \int_{\mathbb{R}} f(y) \cdot e^{-iy\omega}\, dy \right) e^{-\omega^2/(2k^2)} \cdot e^{ix\omega}\, d\omega \\
&= \frac{1}{2\pi} \int_{\mathbb{R}} \hat f(\omega) \cdot e^{-\omega^2/(2k^2)} \cdot e^{ix\omega}\, d\omega, \tag{7.27}
\end{aligned}
\]
where, for changing the order of integration, we applied the dominated convergence theorem with the dominating function |f(y)| e^{-\omega^2/(2k^2)}.
Remark 7.22. According to Remark 7.5, the Fourier transform maps any
f ∈ L1 (R) to a continuous function fˆ ∈ C0 (R). Therefore, by the Fourier
inversion formula, Theorem 7.21, there exists for any f ∈ L1 (R) satisfying
fˆ ∈ L1 (R) a continuous representative f˜ ∈ L1 (R), which coincides with f
almost everywhere on R (i.e., f ≡ f˜ in the L1 -sense), and for which the
Fourier inversion formula holds on R.
In the following discussion, we will often apply the Fourier inversion for-
mula to continuous functions f ∈ L1 (R) ∩ C (R). By the following result, we
can in this case drop the assumption fˆ ∈ L1 (R) (see Exercise 7.64).
holds.
Remark 7.25. Every function f ∈ S(R) and all of its derivatives f (k) , for
k ∈ N, are rapidly decaying to zero around infinity, i.e., for any (complex-
valued) polynomial p ∈ P C and for any k ∈ N0 , we have
p(x)f (k) (x) −→ 0 for |x| → ∞.
Therefore, all derivatives f (k) of f ∈ S(R), for k ∈ N, are also contained
in S(R). Obviously, we have the inclusion S(R) ⊂ L1 (R), and so f ∈ S(R)
and all its derivatives f (k) , for k ∈ N, are absolutely integrable, i.e., we have
f (k) ∈ L1 (R) for all k ∈ N0 .
Typical examples of elements in the Schwartz space S(R) are C ∞ func-
tions with compact support. Another example is the Gauss function gα , for
α > 0, from Example 7.8. Before we give further examples of functions in the
Schwartz space S(R), we first note a few observations.
According to Remark 7.25, every function f ∈ S(R) and all its derivatives
f^{(k)}, for k ∈ N, have a Fourier transform. Moreover, for f ∈ S(R) and k, ℓ ∈ N₀, we have the representations
\[
\frac{d^\ell}{d\omega^\ell} (\mathcal{F}f)(\omega) = (-i)^\ell\, (\mathcal{F}(x^\ell f))(\omega) \qquad \text{for all } \omega \in \mathbb{R}
\]
\[
(\mathcal{F} f^{(k)})(\omega) = (i\omega)^k (\mathcal{F}f)(\omega) \qquad \text{for all } \omega \in \mathbb{R},
\]
as they directly follow (by induction) from Proposition 7.2 (d)-(e) (see Exercise 7.59). This yields the uniform estimate
\[
\left| \omega^k \frac{d^\ell}{d\omega^\ell} (\mathcal{F}f)(\omega) \right| \le \left\| \frac{d^k}{dx^k} \big( x^\ell f(x) \big) \right\|_{L^1(\mathbb{R})} \qquad \text{for all } \omega \in \mathbb{R}, \tag{7.29}
\]
i.e., all functions ω^k (Ff)^{(ℓ)}(ω), for k, ℓ ∈ N₀, are bounded. Therefore, we see
that the Fourier transform Ff of any f ∈ S(R) is also contained in S(R).
By the Fourier inversion formula, Theorem 7.25, the Fourier transform F is
bijective on S(R). We reformulate this important result as follows.
Theorem 7.26. The Fourier transform F : S(R) −→ S(R) is an automor-
phism on the Schwartz space S(R), i.e., F is linear and bijective on S(R).
Now we give an important example of a family of functions that are
contained in the Schwartz space S(R). To this end, we recall the Hermite
polynomials Hn from Section 4.4.3 and their associated Hermite functions
hn from Exercise 4.42.
Example 7.27. The Hermite functions
Proposition 7.28. The Hermite functions (hn )n∈N0 in (7.30) are a com-
plete orthogonal system in the Hilbert space L2 (R).
Proof. The orthogonality of (hn )n∈N0 follows from the orthogonality of the
Hermite polynomials in Theorem 4.28. According to (4.47), we have
\[
(h_m, h_n) = 2^n\, n!\, \sqrt{\pi} \cdot \delta_{mn} \qquad \text{for all } m, n \in \mathbb{N}_0. \tag{7.32}
\]
Now we show the completeness of the system (hn )n∈N0 . To this end, we
use the completeness criterion in Theorem 6.26, as follows.
Suppose that f ∈ L2 (R) satisfies (f, hn ) = 0 for all n ∈ N0 . Then, we
consider the function g : C −→ C, defined as
\[
g(z) = \int_{\mathbb{R}} h_0(x)\, f(x)\, e^{-ixz}\, dx \qquad \text{for } z \in \mathbb{C}.
\]
\[
\begin{aligned}
&= \lim_{R\to\infty} \left( \big[ -e^{-ix\omega}\, h_n(x) \big]_{x=-R}^{x=R} + \int_{-R}^{R} (-i\omega + x)\, e^{-ix\omega}\, h_n(x)\, dx \right) \\
&= -i\omega\, \hat h_n(\omega) + \widehat{x h_n}(\omega).
\end{aligned}
\]
holds with the initial values h_{−1} ≡ 0 and h_0(x) = exp(−x²/2). By using the
recursion H_n′(x) = 2n H_{n−1}(x), for n ∈ N, from Corollary 4.30, we get
\[
\begin{aligned}
h_n'(x) = \frac{d}{dx} \Big( e^{-x^2/2} \cdot H_n(x) \Big)
&= -x \cdot e^{-x^2/2} \cdot H_n(x) + e^{-x^2/2} \cdot H_n'(x) \\
&= -x\, h_n(x) + e^{-x^2/2}\, \big( 2n H_{n-1}(x) \big) \\
&= -x\, h_n(x) + 2n\, h_{n-1}(x)
\end{aligned}
\]
We close this section by the following remark.
Remark 7.31. The Fourier operator F : L2 (R) −→ L2 (R) is uniquely de-
termined by the properties in Theorem 7.30. Moreover, we remark that the
Fourier transform F : L1 (R) −→ C0 (R) maps any f ∈ L1 (R) to a unique
uniformly continuous function Ff ∈ C0(R). In contrast, the Fourier transform
F : L2 (R) −→ L2 (R) maps any f ∈ L2 (R) to a function Ff ∈ L2 (R) that is
merely almost everywhere unique.
pointwise for all x ∈ R. Note that we have applied the Fourier inversion
formula of the Plancherel theorem, Theorem 7.30, to obtain (7.38) and (7.40).
Finally, we remark that the interchange of integration and summation
in (7.39) is valid by the Parseval identity
\[
\frac{1}{2\pi} \int_{-\pi}^{\pi} g(\omega)\, \overline{h(\omega)}\, d\omega = \sum_{j \in \mathbb{Z}} c_j(g) \cdot \overline{c_j(h)} \qquad \text{for all } g, h \in L^2[-\pi, \pi],
\]
which completes our proof for the stated reconstruction formula in (7.37).
7 Raymond Paley (1907-1933), English mathematician
8 Norbert Wiener (1894-1964), US-American mathematician
Remark 7.35. By the Shannon sampling theorem, Theorem 7.34, any band-limited function f ∈ L¹(R) ∩ C(R), or f ∈ L²(R), with bandwidth L > 0 can uniquely be reconstructed from its values on the uniform sampling grid {jd | j ∈ Z} ⊂ R for all sampling rates d ≤ π/L. Therefore, the optimal sampling rate is d∗ = π/L, and this rate corresponds to half of the smallest wavelength 2π/L that is present in the signal f. The optimal sampling rate d∗ = π/L is called the Nyquist rate (or Nyquist distance).
Remark 7.37. The Shannon sampling theorem is, in its different variants,
also connected with the names of Nyquist9 , Whittaker10 , and Kotelnikov11 . In
fact, Kotelnikov had formulated and published the sampling theorem already
in 1933, although his work was widely unknown for a long time. Shannon
formulated the sampling theorem in 1948, where he used this result as a
starting point for his theory on maximal channel capacities.
9 Harry Nyquist (1889-1976), US-American electrical engineer
10 Edmund Taylor Whittaker (1873-1956), British astronomer, mathematician
11 Vladimir Kotelnikov (1908-2005), Russian pioneer of information theory
appearing in the Fourier transform’s formulas (7.41) and (7.42), we can generalize the results for the univariate case, d = 1, to the multivariate case, d ≥ 1. In the remainder of this section, we merely quote results that are
needed in Chapters 8 and 9. Of course, the Fourier inversion formula from
Theorem 7.21 is of central importance.
holds.
\[
g_\alpha(x) = e^{-\alpha \|x\|_2^2} \qquad \text{for } x \in \mathbb{R}^d \text{ and } \alpha > 0
\]
is
\[
(\mathcal{F}_d\, g_\alpha)(\omega) = \left( \frac{\pi}{\alpha} \right)^{d/2} e^{-\|\omega\|_2^2/(4\alpha)} \qquad \text{for } \omega \in \mathbb{R}^d.
\]
♦
As for the univariate case, in Theorem 7.14 and Corollary 7.16, the Fourier
convolution theorem holds for the multivariate Fourier transform.
By following along the lines of Section 7.2, we can transfer the multivariate
Fourier transform Fd : L1 (Rd ) −→ C0 (Rd ) to the Hilbert space
\[
L^2(\mathbb{R}^d) = \left\{ f : \mathbb{R}^d \to \mathbb{C} \;\Big|\; \int_{\mathbb{R}^d} |f(x)|^2\, dx < \infty \right\}
\]
and the Euclidean norm ‖·‖_{L²(R^d)} = (·, ·)^{1/2}. To this end, we first introduce the Fourier transform F_d on the Schwartz space
\[
\mathcal{S}(\mathbb{R}^d) = \left\{ f \in C^\infty(\mathbb{R}^d) \;\Big|\; x^k \cdot \frac{d^\ell}{dx^\ell} f(x) \text{ is bounded for all } k, \ell \in \mathbb{N}_0^d \right\}
\]
of all rapidly decaying C ∞ functions. As for the univariate case, Theorem 7.26,
the Fourier transform Fd is bijective on S(Rd ).
Theorem 7.44. The multivariate Fourier transform Fd : S(Rd ) −→ S(Rd )
is an automorphism on the Schwartz space S(Rd ).
This implies the Plancherel theorem, as in Theorem 7.30 for d = 1.
Theorem 7.45. (Plancherel theorem).
The Fourier transform Fd : S(Rd ) −→ S(Rd ) can uniquely be extended to
a bounded and bijective linear mapping on the Hilbert space L2 (Rd ). The
extended Fourier transform Fd : L2 (Rd ) −→ L2 (Rd ) has the following pro-
perties.
(a) The Parseval identity
(Fd f, Fd g) = (2π)d (f, g) for all f, g ∈ L2 (Rd ),
holds, so that in particular
‖F_d f‖_{L²(R^d)} = (2π)^{d/2} ‖f‖_{L²(R^d)} for all f ∈ L²(R^d).
(b) The Fourier inversion formula
F_d^{−1}(F_d f) = f for all f ∈ L²(R^d)
holds on L²(R^d), i.e.,
\[
f(x) = (2\pi)^{-d} \int_{\mathbb{R}^d} \hat f(\omega)\, e^{i\langle x, \omega\rangle}\, d\omega \qquad \text{for almost every } x \in \mathbb{R}^d.
\]
(c) For any j, k ∈ Z, the wavelet function ψkj has compact support, where
Fig. 7.2. The Haar wavelet ψ = ψ00 generates the functions ψkj = 2j/2 ψ(2j · −k).
Proof. According to Proposition 7.47 (b), any ψ_k^j has unit L²-norm.
Now suppose that ψ_k^j and ψ_ℓ^m are, for j, k, ℓ, m ∈ Z, distinct.
Case 1: If j = m, then k ≠ ℓ. In this case, the intersection of the support intervals of ψ_k^j and ψ_ℓ^m contains at most one point, according to Proposition 7.47 (c), so that (ψ_k^j, ψ_ℓ^m) = 0.
Case 2: If j ≠ m, then we assume m > j (without loss of generality). In this case, we either have
supp(ψ_k^j) ∩ supp(ψ_ℓ^m) = ∅,
so that (ψ_k^j, ψ_ℓ^m) = 0, or, for ℓ = 2^{m−j}k, …, 2^{m−j}(k+1) − 1, the support supp(ψ_ℓ^m) is contained in one of the two halves of supp(ψ_k^j), on which ψ_k^j is constant, so that
\[
(\psi_k^j, \psi_\ell^m) = \pm 2^{j/2} \int_{\mathrm{supp}(\psi_\ell^m)} \psi_\ell^m(x)\, dx = 0.
\]
ϕ = χ[0,1)
(b) For any j, k ∈ Z, the function ϕjk has compact support, where
\[
\varphi_k^{j-1} = 2^{-1/2} \big( \varphi_{2k}^{j} + \varphi_{2k+1}^{j} \big) \qquad \text{for all } j, k \in \mathbb{Z} \tag{7.48}
\]
\[
\psi_k^{j-1} = 2^{-1/2} \big( \varphi_{2k}^{j} - \varphi_{2k+1}^{j} \big) \qquad \text{for all } j, k \in \mathbb{Z} \tag{7.49}
\]
hold.
Proof. Property (a) follows from the scale-invariance of the wavelet basis,
of subspaces in L2 (R).
Now we study further properties of the nested sequence (Vj )j∈Z . To this
end, we work with the orthogonal projection operator Π_j : L²(R) −→ V_j, for j ∈ Z, which assigns to every f ∈ L²(R) its unique best approximation s*_j = Π_j f in L²(R). According to our discussion in Section 6.2, we have the series representation
\[
\Pi_j f = \sum_{k \in \mathbb{Z}} (f, \varphi_k^j)\, \varphi_k^j \in V_j \qquad \text{for } f \in L^2(\mathbb{R}) \tag{7.52}
\]
‖Π_j f − f‖ −→ 0 for j → ∞.
Proof. Let ε > 0 and f ∈ L²(R). Then there is, for a (sufficiently fine) dyadic decomposition of R, a step function T ∈ L²(R) with ‖T − f‖ < ε/2. Moreover, for the indicator functions χ_{I_k^j} of the dyadic intervals I_k^j := [2^{−j}k, 2^{−j}(k+1)), we have the reproduction property Π_j χ_{I_k^j} = χ_{I_k^j}, for all k ∈ Z. Therefore, there is a level index j₀ ∈ Z with T = Π_j T for all j ≥ j₀. From this, we can conclude statement (a) by the estimate
\[
= 2^j \big( c_{-1}\, \chi_{I_{-1}^j} + c_0\, \chi_{I_0^j} \big),
\]
where c_{−1} = (g, ϕ_{−1}^j) and c_0 = (g, ϕ_0^j). Then, we have ‖Π_j g‖² = 2^j (c_{−1}² + c_0²) and, moreover, ‖Π_j g‖ < ε/2 for j ≡ j(ε) ∈ Z small enough. For this j, we finally get
Theorem 7.54. The system (Vj )j∈Z of scale spaces Vj in (7.50) forms a
multiresolution analysis of L2 (R) by satisfying the following conditions.
(a) The scale spaces in (Vj )j∈Z are nested, so that the inclusions (7.51) hold.
(b) The system (V_j)_{j∈Z} is complete in L²(R), i.e., the closure of ⋃_{j∈Z} V_j equals L²(R).
(c) The system (V_j)_{j∈Z} satisfies the separation ⋂_{j∈Z} V_j = {0}.
for the orthogonality relation between Wj−1 and Vj−1 . In this way, the lin-
ear scale space Vj is by (7.53) decomposed into a smooth scale space Vj−1
Wj = span{ψkj | k ∈ Z} for j ∈ Z.
(ψ_k^{j−1}, ϕ_ℓ^{j−1}) = 0 for all k, ℓ ∈ Z. (7.56)
This representation follows directly from Theorem 6.21 (b) and Theorem 7.55.
Now we organize the representation (7.57) for f ∈ L2 (R) on multiple
wavelet scales. Our starting point for doing so is the multiresolution analysis
of L2 (R) in Theorem 7.54. For simplification we suppose supp(f ) ⊂ [0, 1]. We
approximate f on the scale space Vj , for j ∈ N, by the orthogonal projectors
Πj : L2 (R) −→ Vj , given as
\[
\Pi_j f = \sum_{k=0}^{N-1} c_k^j\, \varphi_k^j \in V_j \qquad \text{for } f \in L^2(\mathbb{R}), \tag{7.58}
\]
\[
\Pi_{j-1}^{\perp} f = \sum_{k=0}^{N/2-1} d_k^{j-1}\, \psi_k^{j-1} \qquad \text{for } f \in L^2(\mathbb{R}), \tag{7.60}
\]
where d_k^{j−1} := (f, ψ_k^{j−1}), for k = 0, …, N/2 − 1.
By (7.58) and (7.60), the identity (7.59) can be written in the basis form
\[
\sum_{k=0}^{N-1} c_k^j\, \varphi_k^j = \sum_{k=0}^{N/2-1} d_k^{j-1}\, \psi_k^{j-1} + \sum_{k=0}^{N/2-1} c_k^{j-1}\, \varphi_k^{j-1}. \tag{7.61}
\]
\[
\Pi_j f = \sum_{r=0}^{j-1} \Pi_r^{\perp} f + \Pi_0 f \qquad \text{for } f \in L^2(\mathbb{R}), \tag{7.62}
\]
In the next level, the vector cj−1 ∈ RN/2 is decomposed into the vec-
tors cj−2 ∈ RN/4 and dj−2 ∈ RN/4 . The resulting recursion is called the
pyramid algorithm. The decomposition scheme of the pyramid algorithm is
represented as follows.
\[
T = T_1 \cdot T_2 \cdot \ldots \cdot T_{j-1} \cdot T_j \in \mathbb{R}^{N \times N} \tag{7.67}
\]
\[
T^{-1} = T_j^{-1} \cdot T_{j-1}^{-1} \cdot \ldots \cdot T_2^{-1} \cdot T_1^{-1} = T_j^{T} \cdot T_{j-1}^{T} \cdot \ldots \cdot T_2^{T} \cdot T_1^{T} \in \mathbb{R}^{N \times N}
\]
of T in (7.67), so that
\[
c^j = T_j^{T} \cdot \ldots \cdot T_1^{T} \cdot d.
\]
The discrete wavelet analysis and the discrete wavelet synthesis are associated with the terms discrete wavelet transform (wavelet analysis) and inverse discrete wavelet transform (wavelet synthesis).
Due to the orthogonality of the matrices T_{j−r} in (7.66), the wavelet transform is numerically stable, since
Moreover, the complexity of the wavelet transform is only linear, since the j
decomposition steps (for r = 0, 1, . . . , j − 1) require altogether
operations.
7.6 Exercises
Exercise 7.56. Show that the Fourier transform fˆ : R −→ C,
\[
\hat f(\omega) = \int_{\mathbb{R}} f(x)\, e^{-ix\omega}\, dx \qquad \text{for } \omega \in \mathbb{R},
\]
Exercise 7.57. Consider the Banach space (L1 (R), ·L1 (R) ) and the Hilbert
space (L2 (R), · L2 (R) ). Show that neither the inclusion L1 (R) ⊂ L2 (R) nor
the inclusion L²(R) ⊂ L¹(R) holds. Give a (non-trivial) example of a linear space S satisfying S ⊂ L¹(R) and S ⊂ L²(R).
Exercise 7.59. Prove the following statements for the Fourier transform F.
(a) For the k-th derivative of the Fourier transform Ff of f, we have
\[
\frac{d^k}{d\omega^k} (\mathcal{F}f)(\omega) = (-i)^k\, (\mathcal{F}(x^k f))(\omega) \qquad \text{for all } \omega \in \mathbb{R}
\]
under the assumption x^k f ∈ L¹(R).
Exercise 7.60. Conclude from the results in Exercise 7.59 the statement:
”f ∈ L1 (R) is smooth, if and only if Ff has rapid decay around infinity”.
Be more precise on this and quantify the decay and the smoothness of f .
Exercise 7.64. Prove for f ∈ L1 (R) ∩ C (R) the Fourier inversion formula
\[
f(x) = \lim_{\varepsilon \searrow 0} \frac{1}{2\pi} \int_{\mathbb{R}} \hat f(\omega) \cdot e^{ix\omega}\, e^{-\varepsilon |\omega|^2}\, d\omega \qquad \text{for all } x \in \mathbb{R},
\]
W0 = span{ψ(· − k) | k ∈ Z}
Wj = span{ψkj | k ∈ Z} for j ∈ Z
S = span{s1 , . . . , sn } ⊂ C (Ω)
VB,X · c = fX
SX = span{K(·, xj ) | xj ∈ X} ⊂ C (Ω),
For the sake of unique interpolation, in Problem 8.1, and with assuming (8.5),
the matrix AK,X must necessarily be regular. Indeed, this follows directly
from Theorem 5.23. In the following discussion, we wish to construct continuous functions K : Ω × Ω −→ R, such that A_{K,X} is symmetric positive
definite for all finite sets X of interpolation points, in which case AK,X would
be regular. Obviously, the matrix AK,X is symmetric, if the function K is
symmetric, i.e., if K(x, y) = K(y, x) for all x, y ∈ Rd . The requirement
for AK,X to be positive definite leads us to the notion of positive definite
functions. Since we allow arbitrary parameter domains Ω ⊂ R^d, we will from now on restrict ourselves (without loss of generality) to the case Ω = R^d.
\[
\ell_j(x_k) = \delta_{jk} = \begin{cases} 1 & \text{for } j = k \\ 0 & \text{for } j \neq k \end{cases} \qquad \text{for all } 1 \le j, k \le n. \tag{8.7}
\]
Therefore, the Lagrange basis functions are also often referred to as cardinal interpolants. We can represent the elements of the Lagrange basis {ℓ_1, …, ℓ_n} as follows.
where
where ·, · denotes the usual inner product on the Euclidean space Rn .
Proof. For x = xj , the right hand side R(xj ) in (8.8) coincides with the j-th
column of AK,X , and so the j-th unit vector ej ∈ Rn is the unique solution
of the linear equation system (8.8), i.e.,
The following fundamental result is due to Bochner1 who studied in [8] posi-
tive (semi-)definite functions of one variable. We can make use of the Bochner
theorem in [8] to prove suitable characterizations for multivariate positive
definite functions.
Theorem 8.7. (Bochner, 1932).
Suppose that Φ ∈ C (Rd )∩L1 (Rd ) is an even function. If the Fourier transform
Φ̂ of Φ is positive on Rd , Φ̂ > 0, then Φ is positive definite on Rd , Φ ∈ PDd .
Proof. For Φ ∈ C (Rd ) ∩ L1 (Rd ), the Fourier inversion formula
\[
\Phi(x) = (2\pi)^{-d} \int_{\mathbb{R}^d} \hat\Phi(\omega)\, e^{i\langle x, \omega\rangle}\, d\omega
\]
\[
\Phi(x) = e^{-\|x\|_2^2} \qquad \text{for } x \in \mathbb{R}^d
\]
\[
\hat\Phi(\omega) = \pi^{d/2}\, e^{-\|\omega\|_2^2/4} > 0,
\]
Now that we have provided three explicit examples for positive definite
(radial) functions, we remark that the characterization of Bochner’s theorem
allows us to construct even larger classes of positive definite functions. This
is done by using convolutions. Recall that for any pair f, g ∈ L1 (Rd ) of
functions, the Fourier transform maps the convolution product f ∗g ∈ L1 (Rd ),
\[
(f * g)(x) = \int_{\mathbb{R}^d} f(x-y)\, g(y)\, dy \qquad \text{for } f, g \in L^1(\mathbb{R}^d)
\]
\[
\widehat{f * g} = \hat f \cdot \hat g \qquad \text{for } f, g \in L^1(\mathbb{R}^d)
\]
\[
\widehat{f * f^*} = \hat f \cdot \overline{\hat f} = |\hat f|^2 \qquad \text{for } f \in L^1(\mathbb{R}^d).
\]
Proof. For Ψ ∈ L1 (Rd )\{0}, we have Φ ∈ L1 (Rd )\{0}, and so Φ̂ ∈ C (Rd )\{0}.
Moreover, the Fourier transform Φ̂ = |Ψ̂ |2 of the autocorrelation Φ = Ψ ∗ Ψ ∗
is, due to the Fourier convolution theorem, Theorem 7.43, non-negative, so
that Φ ∈ PDd , due to Remark 8.8.
The practical value of the construction resulting from Corollary 8.12 is,
however, rather limited. This is because the autocorrelations Ψ ∗Ψ ∗ are rather
awkward to evaluate. To avoid numerical integration, one would prefer to
work with explicit (preferably simple) analytic expressions for positive defi-
nite functions Φ = Ψ ∗ Ψ ∗ .
We remark that the basic idea of Corollary 8.12 has led to the construc-
tion of compactly supported positive definite (radial) functions, dating back to
earlier Göttingen works of Schaback & Wendland [62] (in 1993), Wu [74] (in
1994), and Wendland [71] (in 1995). In their constructions, explicit formulas
were given for autocorrelations Φ = Ψ ∗ Ψ∗, whose generators Ψ(x) = ψ(‖x‖₂),
x ∈ Rd , are specific radially symmetric and compactly supported functions
ψ : [0, ∞) −→ R. This has provided a large family of continuous, radially
symmetric, and compactly supported functions Φ = Ψ ∗ Ψ ∗ , as they were
later popularized by Wendland [71], who used the radial characteristic func-
tions of Example 8.11 for Ψ to obtain piecewise polynomial positive definite
compactly supported radial functions of minimal degree. For further details
concerning the construction of compactly supported positive definite radial
functions, we refer to the survey [61] of Schaback.
\[
s(x) = \sum_{j=1}^{n} c_j\, K(x, x_j) \tag{8.15}
\]
\[
s(x) \equiv s_\lambda(x) := \lambda^y K(x, y) \qquad \text{for } \lambda = \sum_{j=1}^{n} c_j\, \delta_{x_j} \tag{8.16}
\]
By ‖·‖_K := (·, ·)_K^{1/2}, L is a Euclidean space. Likewise, via the duality relation in (8.16), we can equip S with the inner product
\[
(s_\lambda, s_\mu)_K := (\lambda, \mu)_K \qquad \text{for } s_\lambda, s_\mu \in S \tag{8.18}
\]
and the norm ‖·‖_K = (·, ·)_K^{1/2}. Note that the normed linear spaces S and L are isometrically isomorphic, S ≅ L, via the linear bijection λ ↦ s_λ and by the norm isometry
\[
\|\lambda\|_K = \|s_\lambda\|_K \qquad \text{for all } \lambda \in L. \tag{8.19}
\]
Before we study the topology of the spaces L and S in more detail, we first
discuss a few concrete examples for inner products and norms of elements in
L and S.
Example 8.13. For any pair of point evaluation functionals δ_{z_1}, δ_{z_2} ∈ L, with z_1, z_2 ∈ R^d, their inner product is given by
\[
(\delta_{z_1}, \delta_{z_2})_K = \delta_{z_1}^x \delta_{z_2}^y K(x, y) = K(z_1, z_2) = \Phi(z_1 - z_2).
\]
Moreover, for the norm of any δ_z ∈ L, z ∈ R^d, we obtain
\[
\|\delta_z\|_K^2 = (\delta_z, \delta_z)_K = \delta_z^x \delta_z^y K(x, y) = K(z, z) = \Phi(0) = 1,
\]
by using the normalization Φ(0) = 1, as introduced in Remark 8.6. Likewise, we have
\[
(K(\cdot, z_1), K(\cdot, z_2))_K = K(z_1, z_2) = \Phi(z_1 - z_2) \tag{8.20}
\]
for all z_1, z_2 ∈ R^d and
\[
\|K(\cdot, z)\|_K = \|\delta_z\|_K = 1 \qquad \text{for all } z \in \mathbb{R}^d.
\]
♦
To extend this first elementary example, we regard, for a fixed point set X = {x_1, …, x_n} ⊂ R^d, the linear bijective operator G : R^n −→ S_X, defined as
\[
G(c) = \sum_{j=1}^{n} c_j\, K(\cdot, x_j) = \langle c, R(\cdot) \rangle \qquad \text{for } c = (c_1, \dots, c_n)^T \in \mathbb{R}^n, \tag{8.21}
\]
where
\[
\langle c, d \rangle_{A_{K,X}} := c^T A_{K,X}\, d \qquad \text{for } c, d \in \mathbb{R}^n
\]
denotes the inner product generated by the positive definite matrix A_{K,X}. In particular, G is an isometry by
\[
(G(c), G(d))_K = \sum_{j,k=1}^{n} c_j d_k\, (K(\cdot, x_j), K(\cdot, x_k))_K = c^T A_{K,X}\, d = \langle c, d \rangle_{A_{K,X}}
\]
Proposition 8.15. For any finite point set X = {x1 , . . . , xn } ⊂ Rd , the dual
operator G∗ : SX −→ Rn of G in (8.21), characterized by the relation
is given as
G∗ (s) = sX for s ∈ SX .
for all c ∈ Rn , in which case the assertion follows directly from (8.22).
Next, we compute inner products and norms for the Lagrange basis func-
tions 1 , . . . , n of SX . The following proposition yields an important result
concerning our subsequent stability analysis of the interpolation method.
\[
(\ell_j, \ell_k)_K = a_{jk}^{-1} \qquad \text{for all } 1 \le j, k \le n,
\]
where A_{K,X}^{−1} = (a_{jk}^{−1})_{1≤j,k≤n} ∈ R^{n×n}. In particular, the norm of ℓ_j ∈ S_X is given by
\[
\|\ell_j\|_K^2 = a_{jj}^{-1} \qquad \text{for all } 1 \le j \le n.
\]
\[
(\ell_j, \ell_k)_K = e_j^T A_{K,X}^{-1} A_{K,X} A_{K,X}^{-1} e_k = e_j^T A_{K,X}^{-1} e_k = a_{jk}^{-1}
\]
for all 1 ≤ j, k ≤ n.
From Example 8.13 and Proposition 8.16, we see that the matrices
are Gramian, i.e., the entries of the symmetric positive definite matrices AK,X
and A_{K,X}^{−1} are represented by inner products, respectively.
D ≅ F,
\[
|\mu(s_\lambda)| = |\mu^x \lambda^y K(x, y)| = |(\mu, \lambda)_K| \le \|\mu\|_K \cdot \|\lambda\|_K = \|\mu\|_K \cdot \|s_\lambda\|_K.
\]
Now we are in a position where we can show that the positive definite function
K ∈ PDd is the (unique) reproducing kernel for the Hilbert space F ≡ FK .
To this end, we rely on the seminal works [45, 46, 47] of Madych and Nelson.
holds, with the inner products (·, ·)K : L × L −→ R and (·, ·)K : S × S −→ R
in (8.17) and in (8.18). By continuous extension of the representation (8.24)
from L to D and from S to F we already obtain the statement in (8.23).
Remark 8.25. Any dual functional λ ∈ D is, according to (8.23) and in the
sense of the Fréchet-Riesz representation theorem, Theorem 8.21, uniquely
represented by the element sλ = λy K(·, y) ∈ F.
for f ∈ F, and by
In this section, we prove further results that directly follow from the Madych-
Nelson theorem, Theorem 8.24. As we show, the proposed Lagrange interpo-
lation method is optimal in two different senses.
\[
\mathcal{F} = S_X \oplus \{ f \in \mathcal{F} \mid f_X = 0 \}, \tag{8.25}
\]
where S_X^⊥ = {f ∈ F | f_X = 0} is the orthogonal complement of S_X in F.
For f ∈ F and the unique interpolant s ∈ S_X to f on X satisfying s_X = f_X, the Pythagoras theorem holds, i.e.,

\[
g_X = 0 \;\Longrightarrow\; g \perp S_X
\]
holds. But this implies f − s ⊥ S_X, or f − s ∈ S_X^⊥, since (f − s)_X = 0. Therefore, the stated decomposition with the direct sum in (8.25) holds by
\[
f = s + (f - s) \in S_X \oplus S_X^{\perp}
\]
and, moreover,
Moreover, Corollary 8.28 implies that the interpolant has minimal variation.
\[
\| I_X \|_K = \sup_{f \in \mathcal{F} \setminus \{0\}} \frac{\| I_X f \|_K}{\| f \|_K}.
\]
\[
\| I_X \|_K = 1.
\]
Remark 8.32. By the stability property (8.27) in Theorem 8.31, the pro-
posed interpolation method has minimal condition number w.r.t. · K .
holds, where the norm εx K of the error functional can be written as
The error estimate in (8.30) is sharp, where equality holds for the function
so that (8.30) follows directly from (8.34) and the Cauchy-Schwarz inequality.
We compute the norm of the error functional εx in (8.29) by
(cf. Example 8.13), where we use the representation in (8.8). The upper bound
for εx K in (8.32) follows from the positive definiteness of AK,X .
Finally, for the function fx in (8.33) equality holds in (8.30), since we get
|ε_x(f_x)| = |(ε_x^y K(·, y), f_x)_K| = (f_x, f_x)_K = (ε_x, ε_x)_K = ‖ε_x‖_K · ‖f_x‖_K
from the Madych-Nelson theorem, and so the estimate in (8.30) is sharp.
Finally, we show the pointwise optimality of the interpolation method.
To this end, we regard quasi-interpolants of the form
\[
s_{\ell} = \ell^T f_X = \sum_{j=1}^{n} \ell_j\, f(x_j) \qquad \text{for } \ell = (\ell_1, \dots, \ell_n)^T \in \mathbb{R}^n
\]
For the norm ‖ε_x^{(ℓ)}‖_K we have, like in (8.31), the representation
\[
\| \varepsilon_x^{(\ell)} \|_K^2 = 1 - 2\, \ell^T R(x) + \ell^T A_{K,X}\, \ell.
\]
Now let us minimize the norm ‖ε_x^{(ℓ)}‖_K under variation of the coefficients ℓ ∈ R^n. This leads us directly to the unconstrained optimization problem
\[
\| \varepsilon_x^{(\ell)} \|_K^2 = 1 - 2\, \ell^T R(x) + \ell^T A_{K,X}\, \ell \;\longrightarrow\; \min_{\ell \in \mathbb{R}^n}! \tag{8.36}
\]
whose unique solution is the solution to the linear system A_{K,X} ℓ = R(x).
But this already implies the pointwise optimality, which we state as follows.
Corollary 8.34. Let X = {x_1, …, x_n} ⊂ R^d and x ∈ R^d. Then, the pointwise error functional ε_x in (8.29) is norm-minimal among all error functionals of the form (8.35), where
\[
\| \varepsilon_x \|_K < \| \varepsilon_x^{(\ell)} \|_K \qquad \text{for all } \ell \in \mathbb{R}^n \text{ with } A_{K,X}\, \ell \neq R(x),
\]
i.e., ε_x is the unique solution to the optimization problem (8.36).
for all s, s̃ ∈ SX .
Proof. For s, s̃ ∈ S_X we have the Lagrange representations
\[
s(x) = \langle s_X, \ell(x) \rangle = \sum_{j=1}^{n} s(x_j)\, \ell_j(x) \qquad \text{and} \qquad \tilde s(x) = \langle \tilde s_X, \ell(x) \rangle = \sum_{k=1}^{n} \tilde s(x_k)\, \ell_k(x)
\]
according to (8.9) in Proposition 8.4. From this and Proposition 8.16, we get
\[
(s, \tilde s)_K = \sum_{j,k=1}^{n} s(x_j)\, \tilde s(x_k)\, (\ell_j, \ell_k)_K = \sum_{j,k=1}^{n} s(x_j)\, \tilde s(x_k)\, a_{jk}^{-1} = \langle s_X, \tilde s_X \rangle_{A_{K,X}^{-1}},
\]
FΩ := span {K(·, y) | y ∈ Ω} ⊂ F
i.e., f ∈ FΩ . Finally,
\[
h_{X,\Omega} := \sup_{y \in \Omega} \min_{x \in X} \| y - x \|_2 \tag{8.40}
\]
\[
X_1 \subset X_2 \subset X_3 \subset \ldots \subset X_n \subset \ldots \subset \Omega \tag{8.41}
\]
\[
\| y - x_n \|_2 \le h_{X_n,\Omega} \longrightarrow 0 \qquad \text{for } n \to \infty.
\]
Moreover, we have
\[
\eta_K(K(\cdot, y), S_{X_n})^2 \le \| K(\cdot, x_n) - K(\cdot, y) \|_K^2 = 2 - 2K(y, x_n) \longrightarrow 0
\]
For any y_j ∈ Y, 1 ≤ j ≤ N, we take a sequence (x_n^{(j)})_{n∈N} ⊂ Ω of interpolation points x_n^{(j)} ∈ X_n satisfying ‖y_j − x_n^{(j)}‖₂ ≤ h_{X_n,Ω}. Moreover, we consider the functions
\[
s_{c,n} = \sum_{j=1}^{N} c_j\, K(\cdot, x_n^{(j)}) \in S_{X_n} \qquad \text{for } n \in \mathbb{N}.
\]
Then, we have
\[
\begin{aligned}
\eta_K(f_{c,Y}, S_{X_n}) \le \| s_{c,n} - f_{c,Y} \|_K
&= \Big\| \sum_{j=1}^{N} c_j \big( K(\cdot, x_n^{(j)}) - K(\cdot, y_j) \big) \Big\|_K \\
&\le \sum_{j=1}^{N} |c_j| \cdot \| K(\cdot, x_n^{(j)}) - K(\cdot, y_j) \|_K \longrightarrow 0 \qquad \text{for } n \to \infty.
\end{aligned}
\]
SΩ := {fc,Y ∈ SY | |Y | < ∞} ⊂ FΩ .
of the interpolant
\[
s_{n+1} = \sum_{j=1}^{n+1} c_j^{(n+1)}\, K(\cdot, x_j) \in S_{X_{n+1}}
\]
For the special case of kernel-based interpolation from finite data, we can
characterize Riesz bases in a rather straightforward manner: For a finite set
X = {x1 , . . . , xn } ⊂ Rd of pairwise distinct interpolation points, the basis
functions BX = {K(·, xj )}nj=1 ⊂ SX are (obviously) a Riesz basis of SX ,
where we have the Riesz stability estimate
\[
\sigma_{\min}(A_{K,X})\, \|c\|_2^2 \;\le\; \Big\| \sum_{j=1}^{n} c_j\, K(\cdot, x_j) \Big\|_K^2 \;\le\; \sigma_{\max}(A_{K,X})\, \|c\|_2^2 \tag{8.53}
\]
the representation
Therefore, the stated Riesz stability estimate in (8.53) holds by the Courant6 -
Fischer7 theorem, which should be familiar from linear algebra. In fact, ac-
cording to the Courant-Fischer theorem, the minimal eigenvalue σmin (A) and
the maximal eigenvalue σmax (A) of a symmetric matrix A can be represented
by the minimal and the maximal Rayleigh8 quotient, respectively, i.e.,
\[
\sigma_{\min}(A) = \min_{c \in \mathbb{R}^n \setminus \{0\}} \frac{\langle c, Ac \rangle}{\langle c, c \rangle} \qquad \text{and} \qquad \sigma_{\max}(A) = \max_{c \in \mathbb{R}^n \setminus \{0\}} \frac{\langle c, Ac \rangle}{\langle c, c \rangle}.
\]
By Theorem 6.31, any Riesz basis B has a unique dual Riesz basis B̃.
Now let us determine the dual Riesz basis of BX = {K(·, xj )}nj=1 ⊂ SX . To
this end, we rely on the results from Section 6.2.2. By Theorem 6.31, we can
identify the Lagrange basis of SX as dual to BX , i.e., B̃X = { 1 , . . . , n } ⊂ SX .
for all s ∈ SX .
Due to Theorem 6.31, the Lagrange basis B̃X = { j }nj=1 ⊂ SX is the uniquely
determined dual Riesz basis of BX = {K(·, xj )}nj=1 ⊂ SX .
Moreover, by Proposition 8.36, the representation
\[
\Big\| \sum_{j=1}^{n} f(x_j)\, \ell_j \Big\|_K^2 = f_X^T A_{K,X}^{-1} f_X = \langle f_X, f_X \rangle_{A_{K,X}^{-1}} \qquad \text{for all } f_X \in \mathbb{R}^n
\]
6 Richard Courant (1888-1972), German-US American mathematician
7 Ernst Sigismund Fischer (1875-1954), Austrian mathematician
8 John William Strutt, 3. Baron Rayleigh (1842-1919), English physicist
\[
\sigma_{\min}(A_{K,X}^{-1})\, \|f_X\|_2^2 \;\le\; f_X^T A_{K,X}^{-1} f_X \;\le\; \sigma_{\max}(A_{K,X}^{-1})\, \|f_X\|_2^2
\]
hold for all f_X ∈ R^n. This implies the stability estimate in (8.55), where
\[
\sigma_{\max}^{-1}(A_{K,X}) = \sigma_{\min}(A_{K,X}^{-1}) \qquad \text{and} \qquad \sigma_{\min}^{-1}(A_{K,X}) = \sigma_{\max}(A_{K,X}^{-1}).
\]
From the Riesz duality relation between the bases BX = {K(·, xj )}nj=1 and
B̃X = { j }nj=1 , in combination with Theorem 6.31, in particular with (6.22),
we can conclude another important result.
hold.
from Proposition 8.16. On the other hand, we have (f, K(·, x_j))_K = f(x_j) by the reproduction property of the kernel K, for all 1 ≤ j ≤ n.
For our subsequent analysis, we equip C (Ω) with the maximum norm
· ∞ . Moreover, for any set of interpolation points X = {x1 , . . . , xn } ⊂ Ω,
we denote by I_X : C(Ω) −→ S_X the interpolation operator for X, which assigns to every function f ∈ C(Ω) its unique interpolant s ∈ S_X satisfying s_X = f_X.
\[
\Lambda_\infty := \max_{x \in \Omega} \sum_{j=1}^{n} |\ell_j(x)| = \max_{x \in \Omega} \| \ell(x) \|_1, \tag{8.58}
\]
i.e., ‖I_X‖_∞ = Λ_∞.
\[
\| I_X f \|_\infty = \| s \|_\infty \le \max_{x \in \Omega} \sum_{j=1}^{n} |\ell_j(x)| \cdot |f(x_j)| \le \Lambda_\infty \cdot \| f \|_\infty,
\]
\[
\| I_X g \|_\infty \ge (I_X g)(x^*) = \sum_{j=1}^{n} \ell_j(x^*)\, g(x_j) = \sum_{j=1}^{n} |\ell_j(x^*)| = \Lambda_\infty
\]
Proof. We first prove the upper bound in (8.59). To this end, we assume that
the maximum in (8.58) is attained at x∗ ∈ Ω. Then, from Example 8.13 and
Proposition 8.16, we get the first upper bound in (8.59) by
\[
\Lambda_\infty = \sum_{j=1}^{n} |\ell_j(x^*)| = \sum_{j=1}^{n} |\delta_{x^*}(\ell_j)|
\le \sum_{j=1}^{n} \| \delta_{x^*} \|_K \cdot \| \ell_j \|_K = \sum_{j=1}^{n} \| \ell_j \|_K = \sum_{j=1}^{n} \sqrt{a_{jj}^{-1}}.
\]
\[
a_{jj}^{-1} \le \sigma_{\max}(A_{K,X}^{-1}) \qquad \text{for all } 1 \le j \le n.
\]
To make a compromise between the data error in (8.60) and the smoothness in (8.61), we consider in the remainder of this section the minimization of the cost functional Jα : S −→ R, defined as
The term αJ(s) in (8.62) is called the regularization term, which penalizes non-smooth elements s ∈ S that are admissible for the optimization problem. Moreover, the regularization parameter α > 0 is used to balance between the data error η_X(f, s) and the smoothness J(s) of s.
Therefore, we can view the approximation method of this section as a
regularization method (see Section 2.2). According to the jargon of approxi-
mation theory, the proposed method of this section is also referred to as
penalized least squares approximation (see, e.g. [30]).
According to Remark 4.2, the best approximation s∗α is unique and, moreover,
characterized by the orthogonality condition
(sα )X = AX,Y cα ∈ RN ,
and so we obtain the normal equation in (8.69): On the one hand, the left
hand side in (8.69) can be written as
1 1
(f − sα )X , R(yk ) = RT (yk )fX − RT (yk )AX,Y cα
N N
1
= eTk ATX,Y fX − eTk ATX,Y AX,Y cα .
N
On the other hand, the right hand side in (8.69) can be written as
of the kernel K.
\[
\begin{aligned}
\frac{1}{N} \| (s_\alpha - f)_X \|_2^2 + \alpha \| s_\alpha \|_K^2
&= \frac{1}{N} \sum_{k=1}^{N} | s_\alpha(x_k) - f(x_k) |^2 + \alpha \| s_\alpha \|_K^2 \\
&\le \frac{1}{N} \sum_{k=1}^{N} | s_f(x_k) - f(x_k) |^2 + \alpha \| s_f \|_K^2 \\
&\le \frac{1}{N} \sum_{x \in X \setminus Y} \| \varepsilon_x \|_K^2 \cdot \| f \|_K^2 + \alpha \| f \|_K^2 \\
&= \Bigg( \frac{1}{N} \sum_{x \in X \setminus Y} \| \varepsilon_x \|_K^2 + \alpha \Bigg) \| f \|_K^2 \\
&\le \left( \frac{N-n}{N} + \alpha \right) \| f \|_K^2 \le (1 + \alpha) \| f \|_K^2,
\end{aligned}
\]
where we use the pointwise error estimate in (8.30) along with the uniform estimate ‖ε_x‖_K ≤ 1 in (8.32).
Next, we analyze the sensitivity of problem (Pα ) under variation of the
smoothing parameter α ≥ 0. To this end, we first observe that the solution
sα ≡ sα (f ) of problem (Pα ) coincides with that of the target function s0 , i.e.,
sα (s0 ) = sα (f ).
Lemma 8.56. For any α ≥ 0, the solution sα ≡ sα (f ) of (Pα ) satisfies the
following properties.
(a) The Pythagoras theorem, i.e.,
‖s_α‖_K ≤ ‖s_0‖_K.
8.7 Exercises
Exercise 8.59. Let K : Rd × Rd −→ R be a continuous symmetric function,
for d > 1. Moreover, suppose that for some n ∈ N all symmetric matrices of
the form
AK,X = (K(xk , xj ))1≤j,k≤n ∈ Rn×n ,
for sets X = {x1 , . . . , xn } ⊂ Rd of n pairwise distinct points, are regular.
Show that all symmetric matrices AK,X ∈ Rn×n are positive definite, as
soon as there is one point set Y = {y1 , . . . , yn } ⊂ Rd for which the matrix
AK,Y ∈ Rn×n is symmetric positive definite.
Hint: Proof of the Mairhuber-Curtis theorem, Theorem 5.25.
\[
S_{c,X} \equiv 0 \;\Longrightarrow\; c = 0
\]
by using the n linear conditions S_{c,X}^{(k)}(0) = 0, for 0 ≤ k < n. Finally, to prove the assertion for the multivariate case, d > 1, use the separation of the components in e^{i⟨x_j, ω⟩}, for ω = (ω_1, …, ω_d)^T ∈ R^d and 1 ≤ j ≤ n.
Exercise 8.62. Let K ∈ PDd . Show that the native space norm · K of
the Hilbert space F ≡ FK is stronger than the maximum norm · ∞ , i.e.,
if a sequence (fn )n∈N of functions in F converges w.r.t. · K to f ∈ F, so
that fn − f K −→ 0 for n → ∞, then (fn )n∈N does also converge w.r.t. the
maximum norm · ∞ to f , so that fn − f ∞ −→ 0 for n → ∞.
Exercise 8.63. Let H be a Hilbert space of functions with reproducing ker-
nel K ∈ PDd . Show that H is the native Hilbert space of K, i.e., FK = H.
Hint: First, show the inclusion FK ⊂ H. Then, consider the direct sum
H = FK ⊕ G
around zero, for some r > 0 and some C > 0. Show that in this case, every
f ∈ F ≡ F_K is globally Hölder continuous with Hölder exponent α/2, i.e.,
\[
|f(x) - f(y)| \le C\, \|x - y\|_2^{\alpha/2} \qquad \text{for all } x, y \in \mathbb{R}^d.
\]
around zero, for some r > 0 and C > 0. Moreover, for compact Ω ⊂ Rd , let
(Xn )n∈N be a nested sequence of subsets Xn ⊂ Ω, as in (8.41), whose mono-
tonically decreasing fill distances hXn ,Ω are a zero sequence, i.e., hXn ,Ω 0
for n → ∞. Show for f ∈ FΩ the uniform convergence
\[
\| s_{f,X_n} - f \|_\infty = \mathcal{O}\big( h_{X_n,\Omega}^{\alpha/2} \big) \qquad \text{for } n \to \infty.
\]
Determine from this result the convergence rate for the special case of the
Gauss kernel in Example 8.9.
of the Cholesky factor L̄n+1 in (8.49) is positive. To this end, first show the
representation
\[
1 - S_n^T D_n^{-1} S_n = \| \varepsilon_{x_{n+1},X_n} \|_K^2,
\]
where εxn+1 ,Xn is the error functional in (8.29) at xn+1 ∈ Xn+1 \ Xn with
respect to the set of interpolation points Xn .
9 Computerized Tomography
Fig. 9.1. X-ray beam travelling from emitter xE to detector xD along [xE , xD ] ⊂ Ω.
\[
\frac{dI(x)}{dx} = -f(x)\, I(x) \tag{9.1}
\]
the rate of change for the X-ray intensity I(x) at x is quantified by the factor
f (x), where f (x) is referred to as the attenuation-coefficient function. There-
fore, the attenuation-coefficient function f (x) yields the energy absorption on
the computational domain Ω, and so f (x) represents an important material
property of the scanned medium.
In the remainder of this chapter, we are interested in the reconstruction
of f (x). To this end, we further study the differential equation (9.1). By
integrating (9.1) along the straight line segment [xE , xD ], we determine the
loss of intensity (or, the loss of energy) of the X-ray beam on [xE , xD ] by
\[
\int_{x_E}^{x_D} \frac{dI(x)}{I(x)} = - \int_{x_E}^{x_D} f(x)\, dx. \tag{9.2}
\]
Fig. 9.2. Representation of the straight line ℓ_{t,θ} ⊂ R² by coordinates (t, θ) ∈ R × [0, π), with the unit vectors n_θ = (cos(θ), sin(θ)) and n_θ^⊥ = (−sin(θ), cos(θ)), and the foot point x = (t cos(θ), t sin(θ)).
For the parameterization of a straight line t,θ , for (t, θ) ∈ R × [0, π), we
use the standard point-vector representation, whereby any point (x, y) ∈ t,θ
in t,θ is uniquely represented as a linear combination of the form
\[
(x, y) = t \cdot n_\theta + s \cdot n_\theta^{\perp} \tag{9.5}
\]
with the unit vectors n_θ = (cos(θ), sin(θ)) and n_θ^⊥ = (−sin(θ), cos(θ)), or
\[
\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} \cos(\theta) & -\sin(\theta) \\ \sin(\theta) & \cos(\theta) \end{pmatrix} \cdot \begin{pmatrix} t \\ s \end{pmatrix} = Q_\theta \cdot \begin{pmatrix} t \\ s \end{pmatrix} \tag{9.6}
\]
with the rotation matrix Q_θ ∈ R^{2×2}. The inverse of the orthogonal matrix Q_θ is given by the rotation matrix Q_{−θ} = Q_θ^T, whereby the representation
\[
\begin{pmatrix} t \\ s \end{pmatrix} = \begin{pmatrix} \cos(\theta) & \sin(\theta) \\ -\sin(\theta) & \cos(\theta) \end{pmatrix} \cdot \begin{pmatrix} x \\ y \end{pmatrix} = Q_\theta^T \cdot \begin{pmatrix} x \\ y \end{pmatrix} \tag{9.7}
\]
follows immediately from (9.6). Moreover, (9.6), or (9.7), yields the relation
\[
t^2 + s^2 = x^2 + y^2, \tag{9.8}
\]
that are assumed to be known for all straight lines ℓ_{t,θ}, (t, θ) ∈ R × [0, π).
3 Johann Radon (1887-1956), Austrian mathematician
For any function f ∈ L1 (R2 ), the line integral in (9.9) is, for any coordi-
nate pair (t, θ) ∈ R × [0, π), defined as
\[
\int_{\ell_{t,\theta}} f(x, y)\, d(x, y) = \int_{\mathbb{R}} f(t\cos(\theta) - s\sin(\theta),\, t\sin(\theta) + s\cos(\theta))\, ds, \tag{9.11}
\]
where we use the coordinate transform in (9.6) with the arc length element
\[
\sqrt{\dot x(s)^2 + \dot y(s)^2}\, ds = \sqrt{(-\sin(\theta))^2 + (\cos(\theta))^2}\, ds = ds
\]
Before we turn to the solution of Problem 9.5, we first give some elementary examples of Radon transforms. We begin with the indicator function (i.e., the characteristic function) of the disk B_r = {x ∈ R² | ‖x‖₂ ≤ r}, for r > 0.
\[
f(x, y) = \chi_{B_r}(x, y) := \begin{cases} 1 & \text{for } x^2 + y^2 \le r^2, \\ 0 & \text{for } x^2 + y^2 > r^2, \end{cases}
\]
\[
f(t\cos(\theta) - s\sin(\theta),\, t\sin(\theta) + s\cos(\theta)) = \begin{cases} 1 & \text{for } t^2 + s^2 \le r^2, \\ 0 & \text{for } t^2 + s^2 > r^2. \end{cases}
\]
Note that Rf(t, θ) = 0, if and only if the straight line ℓ_{t,θ} does not intersect the interior of the disk B_r, i.e., if and only if |t| ≥ r. Otherwise, i.e., for |t| < r, we obtain
\[
Rf(t, \theta) = \int_{\ell_{t,\theta}} f(x, y)\, d(x, y) = \int_{-\sqrt{r^2 - t^2}}^{\sqrt{r^2 - t^2}} 1\, ds = 2\sqrt{r^2 - t^2}
\]
Remark 9.8. For any radially symmetric function f(·) = f(‖·‖₂), the Radon transform Rf(t, θ) depends only on t ∈ R, but not on the angle θ ∈ [0, π). Indeed, in this case, we have the identity
\[
Rf(t, \theta) = \int_{\ell_{t,\theta}} f(\|x\|_2)\, dx = \int_{\ell_{t,0}} f(\|Q_\theta x\|_2)\, dx = \int_{\ell_{t,0}} f(\|x\|_2)\, dx = Rf(t, 0)
\]
Fig. 9.3. Bull’s eye and its Radon transform (see Example 9.9).
Example 9.9. The phantom bull’s eye is given by the linear combination
\[
f(x, y) = \chi_{B_{3/4}}(x, y) - \frac{3}{4}\, \chi_{B_{1/2}}(x, y) + \frac{1}{4}\, \chi_{B_{1/4}}(x, y) \tag{9.12}
\]
of three indicator functions χ_{B_r} of the disks B_r, for r = 3/4, 1/2, 1/4. To compute Rf, we apply the linearity of the operator R, whereby
\[
Rf(t, \theta) = (R\chi_{B_{3/4}})(t, \theta) - \frac{3}{4}\, (R\chi_{B_{1/2}})(t, \theta) + \frac{1}{4}\, (R\chi_{B_{1/4}})(t, \theta). \tag{9.13}
\]
Due to the radial symmetry of f (or, of χBr ), the Radon transform Rf (t, θ)
does depend on t, but not on θ (cf. Remark 9.8). Now we can use the result
of Example 9.6 to represent the Radon transform Rf in (9.13) by linear com-
bination of the Radon transforms RχBr , for r = 3/4, 1/2, 1/4. The phantom
f and its Radon transform Rf are shown in Figure 9.3. ♦
t = x cos(θ) + y sin(θ),
see (9.7), and so this condition on t is also sufficient. Therefore, only the straight lines ℓ_{x cos(θ)+y sin(θ), θ}, for θ ∈ [0, π), contain the point (x, y).
contain the point (x, y). This observation leads us to the following definition
for the back projection operator.
Remark 9.12. The back projection B is not the inverse of the Radon trans-
form R. To see this, we make a simple counterexample. We consider the indi-
cator function f := χB1 ∈ L1 (R2 ) of the unit ball B1 = {x ∈ R2 | x2 ≤ 1},
whose (non-negative) Radon transform
\[
Rf(t, \theta) = \begin{cases} 2\sqrt{1 - t^2} & \text{for } |t| \le 1, \\ 0 & \text{for } |t| > 1 \end{cases}
\]
Fig. 9.5. The Shepp-Logan phantom f and its back projection B(Rf ).
The following result will lead us directly to the inversion of the Radon
transform. In fact, the Fourier slice theorem (also often referred to as central
slice theorem) is an important result in Fourier analysis.
Theorem 9.14. (Fourier slice theorem). For f ∈ L1 (R2 ), we have
F2 f (S cos(θ), S sin(θ)) = F1 (Rf )(S, θ) for all S ∈ R, θ ∈ [0, π). (9.14)
Proof. For f ≡ f(x, y) ∈ L¹(R²), we consider the Fourier transform
\[
\mathcal{F}_2 f(S\cos(\theta), S\sin(\theta)) = \int_{\mathbb{R}} \int_{\mathbb{R}} f(x, y)\, e^{-iS(x\cos(\theta) + y\sin(\theta))}\, dx\, dy \tag{9.15}
\]
at (S, θ) ∈ R × [0, π). By the variable transformation (9.6), the right hand side in (9.15) can be represented as
\[
\int_{\mathbb{R}} \int_{\mathbb{R}} f(t\cos(\theta) - s\sin(\theta),\, t\sin(\theta) + s\cos(\theta))\, e^{-iSt}\, ds\, dt,
\]
or, as
\[
\int_{\mathbb{R}} \left( \int_{\mathbb{R}} f(t\cos(\theta) - s\sin(\theta),\, t\sin(\theta) + s\cos(\theta))\, ds \right) e^{-iSt}\, dt.
\]
Note that the inner integral coincides with the Radon transform Rf(t, θ). But this already implies the stated identity
\[
\mathcal{F}_2 f(S\cos(\theta), S\sin(\theta)) = \int_{\mathbb{R}} Rf(t, \theta)\, e^{-iSt}\, dt = \mathcal{F}_1(Rf)(S, \theta).
\]
is, for L > 0, the indicator function of the interval [−L, L], and we let := 1 .
Example 9.18. The Ram-Lak filter FRL is given by the window
so that
\[
F_{\mathrm{RL}}(S) = |S| \cdot 1_L(S) = \begin{cases} |S| & \text{for } |S| \le L, \\ 0 & \text{for } |S| > L. \end{cases}
\]
The Ram-Lak filter is shown in Figure 9.6 (a). ♦
Example 9.19. The Shepp-Logan filter FSL is given by the window
so that
\[
F_{\mathrm{SL}}(S) = |S| \cdot \frac{\sin(\pi S/(2L))}{\pi S/(2L)} \cdot 1_L(S) = \begin{cases} \dfrac{2L}{\pi} \cdot |\sin(\pi S/(2L))| & \text{for } |S| \le L, \\ 0 & \text{for } |S| > L. \end{cases}
\]
Fig. 9.6. Three commonly used low-pass filters (see Examples 9.18-9.20).
Fig. 9.7. The Hamming filter Fβ for β ∈ {0.5, 0.6, 0.7} (see Example 9.21).
Fig. 9.8. The Gauss filter Fα for α ∈ {2.5, 5.0, 10.0} (see Example 9.22).
so that
\[
F_{\mathrm{CF}}(S) = |S| \cdot \cos(\pi S/(2L)) \cdot 1_L(S) = \begin{cases} |S| \cdot \cos(\pi S/(2L)) & \text{for } |S| \le L, \\ 0 & \text{for } |S| > L. \end{cases}
\]
Note that the Hamming filter Fβ is a combination of the Ram-Lak filter FRL
and the cosine filter FCF . The Hamming filter Fβ is shown in Figure 9.7, for
β ∈ {0.5, 0.6, 0.7}. ♦
The Gauss filter Fα is shown in Figure 9.8, for α ∈ {2.5, 5.0, 10.0}. ♦
For θ ∈ [0, π) and functions g(·, θ), h(·, θ) ∈ L1 (R), the convolution g ∗ h
between g and h is defined as
\[
(g * h)(T, \theta) = \int_{\mathbb{R}} g(T - t, \theta)\, h(t, \theta)\, dt \qquad \text{for } T \in \mathbb{R}.
\]
Theorem 9.24. For h ∈ L1 (R×[0, π)) and f ∈ L1 (R2 ), we have the relation
B (h ∗ Rf ) (X, Y ) = (Bh ∗ f ) (X, Y ) for all (X, Y ) ∈ R2 . (9.19)
Proof. For the right hand side in (9.19), we obtain the representation
\[
\begin{aligned}
(Bh * f)(X, Y)
&= \int_{\mathbb{R}} \int_{\mathbb{R}} (Bh)(X - x, Y - y)\, f(x, y)\, dx\, dy \\
&= \frac{1}{\pi} \int_{\mathbb{R}} \int_{\mathbb{R}} \left( \int_0^{\pi} h((X - x)\cos(\theta) + (Y - y)\sin(\theta), \theta)\, d\theta \right) f(x, y)\, dx\, dy.
\end{aligned}
\]
By variable transformation on (x, y) by (9.6) and dx dy = ds dt, we obtain
\[
\begin{aligned}
(Bh * f)(X, Y)
&= \frac{1}{\pi} \int_0^{\pi} \int_{\mathbb{R}} h(X\cos(\theta) + Y\sin(\theta) - t, \theta)\, (Rf)(t, \theta)\, dt\, d\theta \\
&= \frac{1}{\pi} \int_0^{\pi} (h * Rf)(X\cos(\theta) + Y\sin(\theta), \theta)\, d\theta \\
&= B(h * Rf)(X, Y)
\end{aligned}
\]
for all (X, Y) ∈ R².
Theorem 9.24 and (9.18) provide a very useful representation for fF ,
where we use the inverse Fourier transform F1−1 F of the filter F by
(F1−1 F )(t, θ) := (F −1 F )(t) for t ∈ R and θ ∈ [0, π)
as a bivariate function.
Corollary 9.25. Let f ∈ L1 (R2 ). Moreover, let F be a filter satisfying
F1−1 F ∈ L1 (R × [0, π)). Then, the representation
\[
f_F(x, y) = \frac{1}{2} \big( B(\mathcal{F}_1^{-1}F) * f \big)(x, y) = (K_F * f)(x, y), \tag{9.20}
\]
holds, where
\[
K_F(x, y) := \frac{1}{2}\, B\big( \mathcal{F}_1^{-1} F \big)(x, y)
\]
denotes the convolution kernel of the low-pass filter F.
Remark 9.26. The statement of Corollary 9.25 does also hold without the
assumption F1−1 F ∈ L1 (R×[0, π)), see [5]. Therefore, in Section 9.4, we apply
Corollary 9.25 without any assumptions on the low-pass filter F .
where
\[
\Phi_{\alpha,W}(L) := \sup_{S \in [-1,1]} \frac{(1 - W(S))^2}{(1 + L^2 S^2)^\alpha} \qquad \text{for } L > 0. \tag{9.23}
\]
4 Sergei Lvovich Sobolev (1908-1989), Russian mathematician
\[
\| f - f * K_F \|_{L^2(\mathbb{R}^2)}^2
= \frac{1}{4\pi^2} \| \mathcal{F}f - \mathcal{F}f \cdot \mathcal{F}K_F \|_{L^2(\mathbb{R}^2;\mathbb{C})}^2
= \frac{1}{4\pi^2} \| \mathcal{F}f - W_L \cdot \mathcal{F}f \|_{L^2(\mathbb{R}^2;\mathbb{C})}^2, \tag{9.24}
\]
where for the scaled window W_L(S) := W(S/L), S ∈ R, we used the identity
(see Exercise 9.44). Since supp(W_L) ⊂ [−L, L], we can split the square error in (9.24) into a sum of two integrals,
\[
\begin{aligned}
\frac{1}{4\pi^2} \| \mathcal{F}f - W_L \cdot \mathcal{F}f \|_{L^2(\mathbb{R}^2;\mathbb{C})}^2
&= \frac{1}{4\pi^2} \int_{\|(x,y)\|_2 \le L} | (\mathcal{F}f - W_L \cdot \mathcal{F}f)(x, y) |^2\, d(x, y) \quad (9.26) \\
&\quad + \frac{1}{4\pi^2} \int_{\|(x,y)\|_2 > L} | \mathcal{F}f(x, y) |^2\, d(x, y). \quad (9.27)
\end{aligned}
\]
Finally, the sum of the two upper bounds in (9.29) and in (9.28) yields the stated error estimate in (9.22).
Remark 9.28. For the Ram-Lak filter from Example 9.18, we have W ≡ 1
on [−1, 1], and so Φα,W ≡ 0. In this case, Theorem 9.27 yields the error
estimate
\[
\Phi_{\alpha,W}(L) = \max_{S \in [0,1]} \frac{(1 - W(S))^2}{(1 + L^2 S^2)^\alpha} \longrightarrow 0 \qquad \text{for } L \to \infty. \tag{9.30}
\]
Proof. Let S*_{α,W,L} ∈ [0, 1] be the smallest maximizer on [0, 1] of the function
\[
\Phi_{\alpha,W,L}(S) := \frac{(1 - W(S))^2}{(1 + L^2 S^2)^\alpha} \qquad \text{for } S \in [0, 1].
\]
Case 1: Suppose S*_{α,W,L} is uniformly bounded away from zero, i.e., we have S*_{α,W,L} ≥ c > 0 for all L > 0, for some c ≡ c_{α,W} > 0. Then,
\[
0 \le \Phi_{\alpha,W,L}\big( S^*_{\alpha,W,L} \big) = \frac{\big( 1 - W(S^*_{\alpha,W,L}) \big)^2}{\big( 1 + L^2 (S^*_{\alpha,W,L})^2 \big)^\alpha} \le \frac{\| 1 - W \|_{\infty,[-1,1]}^2}{(1 + L^2 c^2)^\alpha} \longrightarrow 0
\]
holds for L → ∞.
Case 2: Suppose S*_{α,W,L} −→ 0 for L → ∞. Then, we have
\[
0 \le \Phi_{\alpha,W,L}\big( S^*_{\alpha,W,L} \big) = \frac{\big( 1 - W(S^*_{\alpha,W,L}) \big)^2}{\big( 1 + L^2 (S^*_{\alpha,W,L})^2 \big)^\alpha} \le \big( 1 - W(S^*_{\alpha,W,L}) \big)^2 \longrightarrow 0,
\]
Fig. 9.9. Parallel beam geometry. Regular distribution of 110 Radon lines ℓ_{t_j,θ_k},
for N = 10 angles θk , 2M + 1 = 11 Radon lines per angle, at sampling rate d = 0.2.
is given by
\[
\big( \mathcal{F}^{-1} F_{\mathrm{RL}} \big)(t) = \frac{1}{\pi} \left( \frac{(Lt) \cdot \sin(Lt)}{t^2} - \frac{2 \cdot \sin^2(Lt/2)}{t^2} \right) \qquad \text{for } t \in \mathbb{R}. \tag{9.34}
\]
The evaluation of F^{−1}F_{RL} at t_j = jd, with sampling rate d = π/L > 0, yields
\[
\mathcal{F}^{-1} F_{\mathrm{RL}}(\pi j / L) = \begin{cases} L^2/(2\pi) & \text{for } j = 0, \\ 0 & \text{for } j \neq 0 \text{ even}, \\ -2L^2/(\pi^3 \cdot j^2) & \text{for } j \text{ odd}. \end{cases} \tag{9.35}
\]
Proof. The inverse Fourier transform F −1 FRL of the even function FRL is
given by the inverse cosine transform
\[
\mathcal{F}^{-1} F_{\mathrm{RL}}(t) = \frac{1}{\pi} \int_0^L S \cdot \cos(tS)\, dS.
\]
The evaluation of F −1 FSL at tj = jd, with sampling rate d = π/L > 0, yields
\[
\mathcal{F}^{-1} F_{\mathrm{SL}}(\pi j / L) = \frac{4 L^2}{\pi^3 (1 - 4 j^2)}. \tag{9.36}
\]
For the inverse Fourier transforms of the remaining filters F from Exam-
ples 9.20-9.22 we refer to Exercise 9.43.
\[
\big( \mathcal{F}_1^{-1} F * Rf \big)(t_m, \theta_k) \approx \frac{\pi}{L} \sum_{j=-M}^{M} u_{m-j} \cdot v_j \qquad \text{for } m \in \mathbb{Z} \tag{9.39}
\]
\[
h_{ik} := \frac{\pi}{L} \sum_{j=-M}^{M} \mathcal{F}_1^{-1} F\big( (i - j)\pi/L \big) \cdot Rf(t_j, \theta_k)
\]
\[
f_{nm} := \frac{1}{2N} \sum_{k=0}^{N-1} Ih\big( x_n \cos(\theta_k) + y_m \sin(\theta_k),\, \theta_k \big).
\]
(a) 2460 Radon lines, (b) 15150 Radon lines, (c) 60300 Radon lines.
9.6 Exercises
Exercise 9.33. Prove for f ∈ L1 (R2 ) the estimate
to conclude
Rf ∈ L1 (R × [0, π)) for all f ∈ L1 (R2 ),
i.e., for f ∈ L1 (R2 ), we have
Exercise 9.35. Show that the Radon transform Rf of f ∈ L1 (R2 ) has com-
pact support, if f has compact support.
Does the converse of this statement hold? I.e., does f ∈ L1 (R2 ) necessarily
have compact support, if supp(Rf ) is compact?
Exercise 9.36. Recall the rotation matrix Qθ ∈ R2×2 in (9.6) and the unit
vector nθ = (cos(θ), sin(θ))T ∈ R2 , for θ ∈ [0, π), respectively.
Prove the following properties for the Radon transform Rf of f ∈ L1 (R2 ).
Exercise 9.37. Show that for a radially symmetric function f ∈ L1 (R2 ), the
backward projection B(Rf ) of Rf is radially symmetric.
Now consider the indicator function f = χB1 of the unit ball B1 and its
Radon transform Rf from Example 9.6. Show that the backward projection
B(Rf ) of Rf is positive on the open annulus
√ 3 √ 4
R1 2 = x ∈ R2 1 < x2 < 2 ⊂ R2 .
Exercise 9.39. Show that the backward projection B is (up to factor π) the
adjoint operator of the Radon transform R. To this end, prove the relation
for the initial value ∧0 := and where, moreover, the positive scaling factor
αk > 0 in (9.46) is chosen, such that supp(∧k ) = [−1, 1].
In this exercise, we construct a spline filter of second order.
(a) Show that the initial value ∧0 yields the Ram-Lak filter, i.e., F0 ≡ FRL .
(b) Show that the scaling factor αk > 0 in (9.46) is, for any k ∈ N, uniquely
determined by the requirement supp(∧k ) = [−1, 1].
(c) Show that ∧1 generates by F1 the spline filter from Exercise 9.40.
Determine the scaling factor α1 of F1 .
(d) Compute the second order spline filter F2 . To this end, determine the
B-spline ∧2 in (9.46), along with its scaling factor α2 .
Exercise 9.42. Develop a construction scheme for higher order spline filters
Fk of the form (9.45), where k ≥ 3. To this end, apply the recursion in (9.46)
and determine the scaling factors αk , for k ≥ 3.
(F −1 F )(πj/L) for j ∈ Z.
(a) 630 Radon lines, (b) 2460 Radon lines, (c) 15150 Radon lines.
Fig. 9.11. Reconstruction of bull’s eye from Example 9.9 (see Figure 9.3).
Subject Index
Fourier
– coefficient, 48, 112
– convolution theorem, 247
– inversion formula, 250, 251, 255, 258
– matrix, 53
– operator, 118, 240, 250, 258
– partial sum, 112
– series, 118
– slice theorem, 327
– spectrum, 239, 241
– transform, 240, 258, 327
frame, 202
frequency spectrum, 239
functional
– bounded, 84
– continuous, 64
– convex, 74
– dual, 84
– linear, 84
Gâteaux derivative, 87
Gauss
– filter, 333
– function, 245, 259, 281
– normal equation, 11
Hölder inequality, 77, 79
Haar
– space, 158
– system, 158
– wavelet, 261
Hermite
– function, 138, 252
– Genocchi formula, 36
– polynomials, 130
Hilbert space, 69
indicator function, 261
inequality
– Bessel, 109
– Hölder, 77, 79
– Jensen, 73
– Minkowski, 78, 79
– Young, 77
Jackson theorems, 217
Jensen inequality, 73
Kolmogorov criterion, 92
Lagrange
– basis, 278
– polynomial, 21
– representation, 21, 278
Lebesgue
– constant, 211, 305
– integrable, 79
Legendre polynomial, 127
Leibniz formula, 37
Lemma
– Aitken, 26
– Riemann-Lebesgue, 242
Lipschitz
– constant, 222
– continuity, 222
low-pass filter, 329
matrix
– alternation, 164
– design, 10
– Gram, 106, 286
– Toeplitz, 57
– unitriangular, 299
– Vandermonde, 20, 276
minimal
– distance, 61
– sequence, 69
Minkowski inequality, 78, 79
modulus of continuity, 224
multiresolution analysis, 266
Newton
– Cotes quadrature, 235
– polynomial, 28, 168
operator
– analysis, 199
– Bernstein, 188
– difference, 29
– projection, 108
– synthesis, 199
orthogonal
– basis, 106
– complement, 108, 266
– projection, 104, 108, 265
– system, 196
orthonormal
– basis, 107, 267
– system, 196, 293