Armin Iske
Approximation Theory and Algorithms for Data Analysis
Texts in Applied Mathematics
Volume 68
Editors-in-chief
S. S. Antman, University of Maryland, College Park, USA
A. Bloch, University of Michigan, Ann Arbor, USA
A. Goriely, University of Oxford, Oxford, UK
L. Greengard, New York University, New York, USA
P. J. Holmes, Princeton University, Princeton, USA
Series editors
J. Bell, Lawrence Berkeley National Lab, Berkeley, USA
R. Kohn, New York University, New York, USA
P. Newton, University of Southern California, Los Angeles, USA
C. Peskin, New York University, New York, USA
R. Pego, Carnegie Mellon University, Pittsburgh, USA
L. Ryzhik, Stanford University, Stanford, USA
A. Singer, Princeton University, Princeton, USA
A. Stevens, Max-Planck-Institute for Mathematics, Leipzig, Germany
A. Stuart, University of Warwick, Coventry, UK
T. Witelski, Duke University, Durham, USA
S. Wright, University of Wisconsin, Madison, USA
The mathematization of all sciences, the fading of traditional scientific boundaries,
the impact of computer technology, the growing importance of computer modeling
and the necessity of scientific planning all create the need both in education and
research for books that are introductory to and abreast of these developments. The
aim of this series is to provide such textbooks in applied mathematics for the student
scientist. Books should be well illustrated and have clear exposition and sound
pedagogy. A large number of examples and exercises at varying levels is
recommended. TAM publishes textbooks suitable for advanced undergraduate and
beginning graduate courses, and complements the Applied Mathematical Sciences
(AMS) series, which focuses on advanced textbooks and research-level monographs.
Armin Iske
Department of Mathematics
University of Hamburg
Hamburg, Germany
Original German edition published by Springer-Verlag GmbH, Heidelberg, 2017. Title of German
edition: Approximation.
© Springer Nature Switzerland AG 2018
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part
of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations,
recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission
or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar
methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this
publication does not imply, even in the absence of a specific statement, that such names are exempt from
the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this
book are believed to be true and accurate at the date of publication. Neither the publisher nor the
authors or the editors give a warranty, express or implied, with respect to the material contained herein or
for any errors or omissions that may have been made. The publisher remains neutral with regard to
jurisdictional claims in published maps and institutional affiliations.
This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 Preliminaries, Definitions and Notations . . . . . . . . . . . . . . . . . . . 2
1.2 Basic Problems and Outlook . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.3 Approximation Methods for Data Analysis . . . . . . . . . . . . . . . . . 7
1.4 Hints on Classical and More Recent Literature . . . . . . . . . . . . . 8
3 Best Approximations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
3.1 Existence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
3.2 Uniqueness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
3.3 Dual Characterization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
3.4 Direct Characterization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
3.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 349
f : Ω −→ R
s∗ ≈ f,
C (Ω) := {u : Ω −→ R | u continuous on Ω}
denotes the linear space of all continuous functions on Ω. Recall that C(Ω)
is a linear space of infinite dimension. When equipped with the maximum
norm ‖·‖_∞, defined as
‖u‖_∞ := max_{x∈Ω} |u(x)|   for u ∈ C(Ω),
C(Ω) is a normed linear function space (or, in short: a normed space). The
normed space C(Ω), equipped with the maximum norm ‖·‖_∞, is complete,
i.e., (C(Ω), ‖·‖_∞) is a Banach space. We note this important result as follows.
are of particular interest. The function spaces C k (Ω) form a nested sequence
is a Banach space. ♦
We will discuss further relevant examples of normed spaces (F, ‖·‖) and
approximation spaces S ⊂ F later. In this short introduction, we only
touch on a few more important aspects of approximation, as an outlook.
S₀ ⊂ S₁ ⊂ … ⊂ S_n ⊂ F   for n ∈ ℕ₀
with respect to both the Euclidean norm ‖·‖ and the maximum norm ‖·‖_∞.
The latter will lead us to the Jackson theorems, one of which is as follows.
From this result, we see that the power of the approximation method depends
not only on the approximation spaces 𝒯_n but also, and essentially, on
the smoothness of the target f. Indeed, the following principle holds:
The smoother the target function f ∈ C_{2π}, the faster the minimal distances
η(f, 𝒯_n), or η_∞(f, 𝒯_n), converge to zero.
We will prove this and other classical results concerning the asymptotic
behaviour of minimal distances in Chapter 6.
B = ( s_j(x_i) )_{i,j},   with last row ( s₁(x_n), …, s_m(x_n) ).
its gradient
∇F(c) = 2BᵀB c − 2Bᵀ f_X
and its (constant) Hessian matrix
∇²F(c) = 2BᵀB.
Recall that any local minimum of F can be characterized via the solution
of the linear system of equations
BᵀB c = Bᵀ f_X,   (2.6)
B = QR   (2.7)
Note that the matrix B has full rank, rank(B) = m, if and only if no diagonal
entry r_kk, 1 ≤ k ≤ m, of the upper triangular matrix R ∈ R^{m×m} vanishes.
A numerically stable solution of the minimization problem (2.5) relies on
the alternative representation
where we use that the inverse Q⁻¹ = Qᵀ is an isometry with respect to the
Euclidean norm ‖·‖₂, i.e.,
For further illustration, we discuss one example of linear regression.
Fig. 2.1. (a) We take 26 noisy samples f˜X = fX + εX from f (x) = 1 + 3x. (b) We
compute the regression line s∗ (x) = c∗0 + c∗1 x, with c∗0 ≈ 0.9379 and c∗1 ≈ 3.0617, by
using linear least squares approximation (cf. Example 2.2).
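The regression line of Figure 2.1 can be reproduced with a short numerical sketch. The following Python snippet is an illustration only, not the book's code: the noise level and random seed are arbitrary assumptions, so the computed coefficients will differ slightly from c₀* ≈ 0.9379 and c₁* ≈ 3.0617. It solves the normal equations (2.6) in the numerically stable way described above, via a QR decomposition (2.7) of the design matrix B.

```python
import numpy as np

# 26 noisy samples of f(x) = 1 + 3x on [0, 1] (cf. Example 2.2 / Fig. 2.1)
rng = np.random.default_rng(0)                 # arbitrary seed, for reproducibility
x = np.linspace(0.0, 1.0, 26)
f_noisy = 1.0 + 3.0 * x + 0.25 * rng.standard_normal(x.size)   # hypothetical noise

# design matrix B for the basis {s_1(x) = 1, s_2(x) = x}
B = np.column_stack([np.ones_like(x), x])

# stable least squares solution via B = QR:  R c = Q^T f_X
Q, R = np.linalg.qr(B)
c = np.linalg.solve(R, Q.T @ f_noisy)
print("regression line: s*(x) = %.4f + %.4f x" % (c[0], c[1]))
```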
f˜X = fX + εX .
J : S −→ R,
where J(s) quantifies, for instance, the smoothness, the variation, the energy,
or the oscillation of s ∈ S. Combining the data error η_X and the
regularization functional J, balanced by a fixed parameter α > 0, leads us
to an extension of the linear least squares approximation problem, Problem 2.1,
giving a regularization method that is described by the minimization problem
‖c‖_A² := cᵀ A c   (2.14)
Note that Problem 2.3 coincides for α = 0 with the linear least squares
approximation problem. As we show in the following, the minimization problem
(2.15) of Tikhonov regularization has a unique solution for any α > 0,
in particular in the case where the design matrix B does not have full rank.
We further remark that the linear least squares approximation problem,
Problem 2.1, has non-unique solutions for rank(B) < m. However, as we will show,
3 Andrey Nikolayevich Tikhonov (1906-1993), Russian mathematician
its gradient
∇F_α(c) = 2(BᵀB + αA)c − 2Bᵀ f_X
and the (constant) positive definite Hessian matrix
∇²F_α = 2(BᵀB + αA).   (2.16)
Note that the function F_α has a unique stationary point c*_α ∈ R^m satisfying
the necessary condition ∇F_α(c) = 0. Therefore, c*_α can be characterized as the
unique solution of the minimization problem (2.15) via the unique solution
of the linear system
(BᵀB + αA) c*_α = Bᵀ f_X,
i.e., c*_α = (BᵀB + αA)⁻¹ Bᵀ f_X. Due to the positive definiteness of the
Hessian ∇²F_α in (2.16), c*_α is a local minimum of F_α. Moreover, in this case F_α
A = U ΛU T ,
This implies
‖Bc − f_X‖₂² + α‖c‖_A² = ‖Bc − f_X‖₂² + ‖√α·A^{1/2} c‖₂²
                       = ‖ [ B ; √α·A^{1/2} ] c − [ f_X ; 0 ] ‖₂²,
where [ · ; · ] denotes vertical stacking.
C = V Σ Wᵀ,
= Σ_{j=1}^{r} (σ_j a_j − v_jᵀ f_X)² + Σ_{j=r+1}^{n+1} (v_jᵀ f_X)² + α Σ_{j=1}^{m} a_j².
a_j := 0   for r + 1 ≤ j ≤ m,
Since all terms of the cost function in (2.19) are non-negative, the minimization
problem (2.19) can be split into the r independent subproblems
g_j′(a_j) = 2((σ_j² + α) a_j − σ_j v_jᵀ f_X)   and   g_j″(a_j) = 2(σ_j² + α) > 0
b* = W a* = Σ_{j=1}^{r} ( σ_j / (σ_j² + α) ) (v_jᵀ f_X) w_j.
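As a hedged numerical illustration of this filter-factor formula, the following sketch treats the special case A = I (so that b = c and C = B), and compares the solution of the regularized normal equations (BᵀB + αI)c = Bᵀf_X with the SVD representation derived above. The matrix, the data, and the parameter α are arbitrary choices made for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, alpha = 12, 5, 0.1                        # arbitrary sizes and parameter
B, fX = rng.standard_normal((n, m)), rng.standard_normal(n)

# (a) Tikhonov solution from the regularized normal equations (A = I)
c_normal = np.linalg.solve(B.T @ B + alpha * np.eye(m), B.T @ fX)

# (b) the same solution via B = V Sigma W^T and the filter factors sigma/(sigma^2+alpha)
V, sigma, Wt = np.linalg.svd(B, full_matrices=False)   # numpy's U, s, Vh in the book's notation
c_svd = Wt.T @ ((sigma / (sigma**2 + alpha)) * (V.T @ fX))

print(np.allclose(c_normal, c_svd))             # True: both formulas agree
```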
c*_α ⟶ 0   for α ⟶ ∞
and
c*_α ⟶ c*₀ = A^{−1/2} b*₀   for α ↘ 0,
where c*₀ ∈ R^m denotes the solution of the linear least squares problem
which minimizes the norm ‖·‖_A. For the solution s*_α ∈ S of (2.13), we obtain
s*_α ⟶ 0   for α ⟶ ∞
and
s*_α ⟶ s*₀   for α ↘ 0,
where s*₀ ∈ S is the solution of the linear least squares problem
‖s_X − f_X‖₂² ⟶ min_{s∈S}!
whose coefficient vector c* ∈ R^m minimizes the norm ‖·‖_A.
L_j(x) = ∏_{k=0, k≠j}^{n} (x − x_k)/(x_j − x_k)   for 0 ≤ j ≤ n.
L_j(x_k) = δ_{jk} = { 1 for k = j,  0 for k ≠ j }   for all 0 ≤ j, k ≤ n.
Therefore, the Lagrange polynomials L₀, …, L_n are a basis of the polynomial
space 𝒫_n. Moreover, the solution p ∈ 𝒫_n to the interpolation problem (2.22)
is in its Lagrange representation given as
p(x) = f₀L₀(x) + … + f_n L_n(x) = Σ_{j=0}^{n} f_j L_j(x).   (2.25)
5 Joseph-Louis Lagrange (1736-1813), mathematician and astronomer
Fig. 2.2. For X = {0, π, 3π/2, 2π} and fX = (1, −1, 0, 1)T the cubic polynomial
p = L0 − L1 + L3 solves the interpolation problem pX = fX from Example 2.8.
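As a small sketch (assuming nothing beyond the definitions above), the following Python code evaluates the Lagrange basis polynomials and reproduces the interpolant p = L₀ − L₁ + L₃ of Example 2.8 for X = {0, π, 3π/2, 2π} and f_X = (1, −1, 0, 1)ᵀ.

```python
import numpy as np

def lagrange_basis(xnodes, j, x):
    """Evaluate the j-th Lagrange basis polynomial L_j for the nodes xnodes at x."""
    Lj = np.ones_like(x, dtype=float)
    for k, xk in enumerate(xnodes):
        if k != j:
            Lj *= (x - xk) / (xnodes[j] - xk)
    return Lj

X  = np.array([0.0, np.pi, 1.5 * np.pi, 2.0 * np.pi])
fX = np.array([1.0, -1.0, 0.0, 1.0])

# p(x) = sum_j f_j L_j(x); check the interpolation conditions p(x_j) = f_j
p_at_nodes = sum(fX[j] * lagrange_basis(X, j, X) for j in range(len(X)))
print(np.allclose(p_at_nodes, fX))   # True
```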
[Figure panels: graphs of Lagrange basis polynomials for X = {0, π, 3π/2, 2π} on [0, 2π],
among them L₁(x) = (2/π³)·x(x − (3/2)π)(x − 2π) and L₃(x) = (1/π³)·x(x − π)(x − (3/2)π).]
For fixed x ∈ R the values p_{k,j}(x) can be computed recursively. This is done
by using the Aitken lemma.
Lemma 2.9. For the interpolation polynomials p_{k,j} ∈ 𝒫_j satisfying (2.26)
we have the recursion
Induction step (j − 1 ⟶ j): Note that the right hand side of the recursion,
q(x) := p_{k,j−1}(x) + ( (x − x_k)/(x_{k−j} − x_k) ) · ( p_{k−1,j−1}(x) − p_{k,j−1}(x) ),
is a polynomial of degree at most j, i.e., q ∈ 𝒫_j. From the stated recursion
and by using the induction hypothesis we can conclude that q, as well as p_{k,j},
interpolates the data
(x_{k−j}, f_{k−j}), …, (x_k, f_k).
Therefore, we have q ≡ p_{k,j} by uniqueness of the interpolant p_{k,j}.
6 Alexander Craig Aitken (1895-1967), New Zealand mathematician
By the recursion of the Aitken lemma, Lemma 2.9, we can, on given in-
terpolation points X and function values fX , recursively evaluate the unique
interpolation polynomial p ≡ pn,n ∈ Pn at any point x ∈ R. To this end, we
organize the values pk,j ≡ pk,j (x), for 0 ≤ j ≤ k ≤ n, in a triangular scheme
as follows.
f₀ = p_{0,0}
f₁ = p_{1,0}   p_{1,1}
f₂ = p_{2,0}   p_{2,1}   p_{2,2}
 ⋮      ⋮        ⋮        ⋱
f_n = p_{n,0}  p_{n,1}  p_{n,2}  ⋯  p_{n,n}
The values in the first column of the triangular scheme are the given function
values pk,0 = fk , for 0 ≤ k ≤ n. The values of the subsequent columns can be
computed, according to the recursion in the Aitken lemma, from two values in
the previous column. In this way, we can compute all entries of the triangular
scheme, column-wise from left to right, and so we obtain the sought function
value p(x) = pn,n .
To compute the entry p_{k,j} we merely need (besides the interpolation
points x_{k−j} and x_k) the two entries p_{k−1,j−1} and p_{k,j−1} from the
previous column. If we compute the entries in each column from the bottom to
the top, then in each step we can overwrite one entry, p_{k,j−1}, since it is no
longer needed in the subsequent computations.
This leads us to the Neville-Aitken algorithm, Algorithm 1, a
memory-efficient variant of the Aitken recursion in Lemma 2.9. The Neville-
Aitken algorithm operates on the input data vector f_X = (f₀, …, f_n)ᵀ
recursively as shown in Algorithm 1.
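Algorithm 1 itself is not reproduced in this excerpt, but the memory-efficient scheme just described (overwriting the data vector column by column, from the bottom to the top) can be sketched as follows. This is an illustrative Python version under those assumptions, not the book's pseudocode.

```python
import math

def neville_aitken(xnodes, fvals, x):
    """Evaluate the interpolation polynomial p(x) by the Neville-Aitken recursion,
    operating in place on a copy of the data vector f_X."""
    p = list(fvals)                       # p[k] holds p_{k,j}(x) for the current column j
    n = len(xnodes) - 1
    for j in range(1, n + 1):             # columns of the triangular scheme
        for k in range(n, j - 1, -1):     # bottom to top, so p[k-1] is still from column j-1
            p[k] = p[k] + (x - xnodes[k]) / (xnodes[k - j] - xnodes[k]) * (p[k - 1] - p[k])
    return p[n]                           # p_{n,n}(x)

# example: the cubic interpolant of Example 2.8, evaluated at x = pi/2
X, fX = [0.0, math.pi, 1.5 * math.pi, 2.0 * math.pi], [1.0, -1.0, 0.0, 1.0]
print(neville_aitken(X, fX, math.pi / 2))
```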
7 Eric Harold Neville (1889-1961), English mathematician
ω_k(x) = ∏_{j=0}^{k−1} (x − x_j) ∈ 𝒫_k   for 0 ≤ k ≤ n.   (2.27)
f₀ = p_n(x₀) = b₀
f₁ = p_n(x₁) = b₀ + b₁(x₁ − x₀)
f₂ = p_n(x₂) = b₀ + b₁(x₂ − x₀) + b₂(x₂ − x₀)(x₂ − x₁)
 ⋮
f_n = p_n(x_n) = b₀ + … + b_n(x_n − x₀)·…·(x_n − x_{n−1}).
Further note that for the computation of b_k we only need the first k + 1 data
8 Sir Isaac Newton (1643-1727), English philosopher and scientist
p_{n+1}(x) = p_n(x) + b_{n+1} ∏_{k=0}^{n} (x − x_k) = p_n(x) + b_{n+1} ω_{n+1}(x),
Remark 2.11. Note that the n-th divided difference [x0 , . . . , xn ](f ) in Defi-
nition 2.10 is the leading coefficient of the interpolation polynomial p for f
on X with respect to its monomial representation (2.30). We remark that
the leading coefficient of p with respect to its monomial representation (2.30)
coincides with the leading coefficient of p with respect to its Newton repre-
sentation in (2.28) so that we have
p(x) = [x0 , . . . , xn ](f )ωn (x) + bn−1 ωn−1 (x) + . . . + b1 ω1 (x) + b0 (2.33)
ω_n(x) = ∏_{j=0}^{n−1} (x − x_j) ∈ 𝒫_n
As we show now, all coefficients in the Newton representation (2.28) of
the interpolation polynomial p are divided differences.
Theorem 2.13. For X = {x₀, …, x_n} and f_X ∈ R^{n+1},
p(x) = Σ_{k=0}^{n} [x₀, …, x_k](f) · ω_k(x) ∈ 𝒫_n   (2.34)
p = Σ_{k=0}^{n−1} [x₀, …, x_k](f) · ω_k ∈ 𝒫_{n−1}
with q_{n−1} ∈ 𝒫_{n−1}, where the latter follows directly from (2.33). Since
q_{n−1} = Σ_{k=0}^{n−1} [x₀, …, x_k](f) · ω_k.
p = [x₀, …, x_n](f) · ω_n + Σ_{k=0}^{n−1} [x₀, …, x_k](f) · ω_k = Σ_{k=0}^{n} [x₀, …, x_k](f) · ω_k.
holds.
♦
By the recursion in Theorem 2.14 we can view the n-th divided difference
[x0 , . . . , xn ](f ) as a discretization of the n-th derivative of f ∈ C n . We will
be more precise on this observation later in this section.
X     f_X
x₀    f₀
x₁    f₁    [x₀,x₁](f)
x₂    f₂    [x₁,x₂](f)        [x₀,x₁,x₂](f)
 ⋮     ⋮         ⋮                  ⋮
x_n   f_n   [x_{n−1},x_n](f)   [x_{n−2},x_{n−1},x_n](f)   ⋯   [x₀,…,x_n](f)
by using the efficient and stable recursion of Theorem 2.14. To this end, we
organize the divided differences in a triangular scheme, as shown in Table 2.1.
The organization of the data in Table 2.1 reminds us of the triangular
scheme of the Neville-Aitken algorithm, Algorithm 1. In fact, to compute
the Newton coefficients [x0 , . . . , xk ](f ) in (2.34), we can (similarly as in Al-
gorithm 1) process the data in Table 2.1 by a memory-efficient algorithm
operating only on the data vector fX = (f0 , . . . , fn )T , see Algorithm 2.
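Algorithm 2 is not reproduced in this excerpt; the following Python sketch (an illustration under the assumptions of this section, not the book's pseudocode) computes the Newton coefficients [x₀,…,x_k](f) in place on the data vector f_X, in the spirit of the memory-efficient scheme just described, and then evaluates the Newton form (2.34) by nested multiplication.

```python
import math

def divided_differences(xnodes, fvals):
    """Return the Newton coefficients [x_0,...,x_k](f), k = 0,...,n,
    computed in place on a copy of the data vector (cf. Table 2.1)."""
    b = list(fvals)
    n = len(xnodes) - 1
    for j in range(1, n + 1):             # column j of the divided difference scheme
        for k in range(n, j - 1, -1):     # bottom to top keeps column j-1 entries intact
            b[k] = (b[k] - b[k - 1]) / (xnodes[k] - xnodes[k - j])
    return b

def newton_eval(xnodes, coeffs, x):
    """Evaluate p(x) = sum_k [x_0,...,x_k](f) * omega_k(x) by Horner-like nesting."""
    p = coeffs[-1]
    for k in range(len(coeffs) - 2, -1, -1):
        p = p * (x - xnodes[k]) + coeffs[k]
    return p

X  = [0.0, math.pi, 1.5 * math.pi, 2.0 * math.pi]
fX = [math.cos(xk) for xk in X]           # Example 2.15/2.16, f(x) = cos(x)
b  = divided_differences(X, fX)
print(b)                                  # [1, -2/pi, 8/(3 pi^2), -4/(3 pi^3)]
print(newton_eval(X, b, math.pi))         # reproduces f(pi) = -1
```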
X₃     f_{X₃}
0      1
π      −1     −2/π
3π/2   0      2/π     8/(3π²)
2π     1      2/π     0        −4/(3π³)

X₄     f_{X₄}
0      1
π      −1     −2/π
3π/2   0      2/π     8/(3π²)
2π     1      2/π     0        −4/(3π³)
π/4    1/√2   4(√2−1)/(7√2·π)   8(2+5√2)/(35√2·π²)   −32(2+5√2)/(105√2·π³)   −16(16+5√2)/(105√2·π⁴)
Fig. 2.5. (a) The cubic polynomial p3 ∈ P3 interpolates the trigonometric function
f (x) = cos(x) on X3 = {0, π, 3π/2, 2π}. (b) The quartic polynomial p4 ∈ P4
interpolates f (x) = cos(x) on X4 = {0, π, 3π/2, 2π, π/4} (see Example 2.16).
= (1/(x_n − x₀)) [ ∫_{Δ_{n−1}} f^{(n−1)}( x_n + Σ_{k=1}^{n−1} λ_k (x_k − x_n) ) dλ
                  − ∫_{Δ_{n−1}} f^{(n−1)}( x₀ + Σ_{k=1}^{n−1} λ_k (x_k − x₀) ) dλ ]
= (1/(x_n − x₀)) ( [x_n, x₁, …, x_{n−1}](f) − [x₀, …, x_{n−1}](f) )
= (1/(x_n − x₀)) ( [x₁, …, x_n](f) − [x₀, …, x_{n−1}](f) )
= [x₀, …, x_n](f).
[x₀, …, x_n](f) = f^{(n)}(τ)/n!   for some τ ∈ [x_min, x_max],
where x_min = min_{0≤k≤n} x_k and x_max = max_{0≤k≤n} x_k.
For x₀ = … = x_n, we have
[x₀, …, x_n](f) = f^{(n)}(x₀)/n!.
(b) For p ∈ 𝒫_{n−1}, we have [x₀, …, x_n](p) = 0 for n ≥ 1.
holds.
p_f = Σ_{k=0}^{n} [x₀, …, x_k](f)·ω_k   and   p_g = Σ_{j=0}^{n} [x_j, …, x_n](g)·ω̃_j
ω_k(x) = ∏_{ℓ=0}^{k−1} (x − x_ℓ) ∈ 𝒫_k   and   ω̃_j(x) = ∏_{m=j+1}^{n} (x − x_m) ∈ 𝒫_{n−j}
p := p_f · p_g = Σ_{k,j=0}^{n} [x₀, …, x_k](f)·ω_k · [x_j, …, x_n](g)·ω̃_j   (2.40)
p = Σ_{k,j=0, k≤j}^{n} [x₀, …, x_k](f)·[x_j, …, x_n](g)·ω_k·ω̃_j.
Since ω_k·ω̃_j ∈ 𝒫_{n+k−j}, for all 0 ≤ k ≤ j ≤ n, we have p ∈ 𝒫_n. Therefore, p is
the unique interpolation polynomial in 𝒫_n for f·g on X, and so we obtain
the stated representation
[x₀, …, x_n](f·g) = Σ_{j=0}^{n} [x₀, …, x_j](f)·[x_j, …, x_n](g)   (2.41)
holds for h ∈ C^m. Therefore, for h ∈ C^m the divided differences [x₀, …, x_m](h)
depend continuously on X, since the integrand h^{(m)} in (2.42) is continuous in
X. Since f·g ∈ C^n, we can conclude that the representation (2.41) holds for
arbitrary point sets X = {x₀, …, x_n}.
and so
(f·g)^{(n)}(x₀) = Σ_{j=0}^{n} ( n!/(j!(n−j)!) ) f^{(j)}(x₀) g^{(n−j)}(x₀)
              = Σ_{j=0}^{n} (n choose j) f^{(j)}(x₀) g^{(n−j)}(x₀),
From Corollary 2.18 (a) we see that divided differences are also well-defined
for coincident interpolation points, provided that f has sufficiently many
derivatives. In particular, for coincident interpolation points all coefficients
in the Newton representation (2.34) are well-defined (cf. Example 2.15). Now we
extend the problem of Lagrange interpolation, Problem 2.4, to the problem of
Hermite interpolation. In Hermite interpolation, the interpolation conditions
contain not only point evaluations of f, but also values of derivatives of f;
this corresponds to coincident interpolation points. To be more precise, we
formulate the Hermite interpolation problem as follows.
p(x) = Σ_{k=0}^{N−1} [y₀, …, y_k](f)·ω_k(x).   (2.46)
p ⟼ L(p) = ( p(x₀), …, p^{(μ₀−1)}(x₀), …, p(x_n), …, p^{(μ_n−1)}(x_n) )ᵀ ∈ R^N,
Y     f_Y
0     1
0     1     0
π     0     −1/π    −1/π²
π     0     −1/π    0       1/π³
π     0     −1/π    1/π²    1/π³    0
2π    0     0       1/π²    0       −1/(2π⁴)    −1/(4π⁵)
f(x) − p_{N−1}(x) = [y₀, …, y_{N−1}, x](f) · ∏_{k=0}^{N−1} (x − y_k)   for x ∈ R.   (2.48)
p_N(x) = p_{N−1}(x) + [y₀, …, y_{N−1}, x](f) · ∏_{k=0}^{N−1} (x − y_k)
and so
f(x) − p_{N−1}(x) = f(x) − ( p_N(x) − [y₀, …, y_{N−1}, x](f) · ∏_{k=0}^{N−1} (x − y_k) )
                 = [y₀, …, y_{N−1}, x](f) · ∏_{k=0}^{N−1} (x − y_k).
Theorem 2.24 immediately yields the following upper bound for the
interpolation error f − p in (2.47) on the interval [a, b], where we combine the
representation in (2.48) with the result of Corollary 2.18 (a).
in x ∈ [a, b].
‖f − p‖_∞ ≤ ( ‖f^{(N)}‖_∞ / N! ) · ‖ω_Y‖_∞   for f ∈ C^N[a, b]   (2.51)
follows from the pointwise error estimate in (2.50) for any compact interval
[a, b] ⊂ R containing the set of interpolation points Y, i.e., Y ⊂ [a, b].
To reduce the interpolation error in (2.51), we wish to minimize the maximum
norm ‖ω_Y‖_∞ of the knot polynomial ω_Y under variation of the interpolation
points in Y ⊂ [a, b]. Without loss of generality, we restrict ourselves
to the interval [a, b] = [−1, 1]. This immediately leads us to the nonlinear
optimization problem
T1 (x) = x
T2 (x) = 2x2 − 1
T3 (x) = 4x3 − 3x
[Figure panels: graphs of the Chebyshev polynomials T₁, …, T₉ on [−1, 1].]
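As a brief illustration of the three-term recursion from Theorem 2.27 (which generates the polynomials plotted in the panels above), the following sketch builds coefficient representations of T₀, …, T₉ and recovers T₂(x) = 2x² − 1 and T₃(x) = 4x³ − 3x listed before the figure. It is a hedged example using numpy's polynomial helpers, not code from the book.

```python
import numpy as np
from numpy.polynomial import polynomial as P

def chebyshev_T(n):
    """Coefficients (lowest degree first) of T_0,...,T_n via
    T_{k+1}(x) = 2x T_k(x) - T_{k-1}(x), with T_0 = 1 and T_1 = x."""
    T = [np.array([1.0]), np.array([0.0, 1.0])]
    for _ in range(2, n + 1):
        T.append(P.polysub(2.0 * P.polymulx(T[-1]), T[-2]))
    return T[: n + 1]

T = chebyshev_T(9)
print(T[2])   # [-1.  0.  2.]      i.e. T_2(x) = 2x^2 - 1
print(T[3])   # [ 0. -3.  0.  4.]  i.e. T_3(x) = 4x^3 - 3x
```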
Corollary 2.30. For n ∈ N0 , let X ∗ = {x∗0 , . . . , x∗n } ⊂ [−1, 1] denote the set
of Chebyshev knots in (2.57). Then the corresponding knot polynomial ωX ∗
has the representation
ωX ∗ = 2−n Tn+1 . (2.59)
Proof. The knot polynomial ωX in (2.49) has for any point set X leading
coefficient one, in particular for the set X ∗ of Chebyshev knots. By the
representation in (2.56), the polynomial 2−n Tn+1 ∈ Pn+1 has also leading
coefficient one. Therefore, the difference
qn = ωX ∗ − 2−n Tn+1
Now assume that for a point set X = {x0 , . . . , xn } ⊂ [−1, 1] its knot
polynomial ωX ∈ Pn+1 satisfies
Then we have ωX (yk ) < ωX ∗ (yk ), for all even indices k ∈ {0, . . . , n} and
ωX (yk ) > ωX ∗ (yk ), for all odd indices k ∈ {1, . . . , n}. Therefore, the difference
ω = ωX ∗ − ωX
Problem 2.34. Compute from a given set X = {x0 , x1 , . . . , x2n } ⊂ [0, 2π) of
2n+1 pairwise distinct interpolation points and corresponding function values
fX = (f0 , f1 , . . . , f2n )T ∈ R2n+1 a real trigonometric polynomial T ∈ TnR
satisfying TX = fX , i.e.,
for m = 0, . . . , N , whereby c0 = . . . = cN = 0.
By the Euler formula
e^{ix} = cos(x) + i sin(x)   (2.67)
we can represent any real trigonometric polynomial T ∈ 𝒯_n^R in (2.63) as a
complex trigonometric polynomial p ∈ 𝒯_N^C of the form (2.66). Indeed, by
using the Euler formula (2.67) we find the standard trigonometric identities
cos(x) = (1/2)(e^{ix} + e^{−ix})   and   sin(x) = (1/(2i))(e^{ix} − e^{−ix})   (2.68)
and so we obtain for any T ∈ 𝒯_n^R the representation
T(x) = a₀/2 + Σ_{k=1}^{n} [ a_k cos(kx) + b_k sin(kx) ]
     = a₀/2 + Σ_{k=1}^{n} [ (a_k/2)(e^{ikx} + e^{−ikx}) + (b_k/(2i))(e^{ikx} − e^{−ikx}) ]
     = a₀/2 + Σ_{k=1}^{n} [ ((a_k − i b_k)/2) e^{ikx} + ((a_k + i b_k)/2) e^{−ikx} ]
     = Σ_{k=−n}^{n} c_k e^{ikx} = e^{−inx} Σ_{k=0}^{2n} c_{k−n} e^{ikx}
Note that the mapping (2.70) between the real Fourier coefficients ak , bk
of T and the complex Fourier coefficients ck of p is linear,
By the bijectivity of the linear mappings in (2.70) and (2.71) between the
complex and the real Fourier coefficients, we can determine the dimension of
TnR . The following result is a direct consequence of Theorem 2.36.
Now let us return to the interpolation problem, Problem 2.34. For the case
of complex trigonometric polynomials, we can solve Problem 2.34 as follows.
p(x_k) = Σ_{j=0}^{N} c_j e^{ijx_k} = Σ_{j=0}^{N} c_j z_k^j.
q(x) := e^{2inx} \overline{p(x)} = Σ_{j=0}^{2n} \overline{c_j} e^{i(2n−j)x} = Σ_{j=0}^{2n} \overline{c_{2n−j}} e^{ijx}   for x ∈ [0, 2π)
Lemma 2.41. For N ∈ ℕ the N-th root of unity ω_N has the property
(1/N) Σ_{j=0}^{N−1} ω_N^{(ℓ−k)j} = δ_{ℓk}   for all 0 ≤ ℓ, k ≤ N − 1.   (2.75)
Σ_{j=0}^{N−1} ω_N^{(ℓ−k)j} = ( ω_N^{(ℓ−k)N} − 1 ) / ( ω_N^{ℓ−k} − 1 ) = ( e^{2πi(ℓ−k)} − 1 ) / ( ω_N^{ℓ−k} − 1 ) = 0
Now we are in a position where we can already give the solution to the
posed interpolation problem at equidistant interpolation points.
p(x_ℓ) = Σ_{j=0}^{N−1} ( (1/N) Σ_{k=0}^{N−1} f_k ω_N^{−jk} ) e^{ijx_ℓ}
       = (1/N) Σ_{k=0}^{N−1} f_k Σ_{j=0}^{N−1} ω_N^{(ℓ−k)j} = f_ℓ
for all ℓ = 0, …, N − 1.
A_N : C^N ⟶ C^N,
A_N^{−1} : C^N ⟶ C^N,
p(x) = Σ_{j=0}^{N−1} c_j e^{ijx} ∈ 𝒯_{N−1}^C,
f_k = p(x_k) = Σ_{j=0}^{N−1} c_j e^{ijx_k} = Σ_{j=0}^{N−1} c_j ω_N^{jk}   for k = 0, …, N − 1,
The discrete Fourier analysis and the Fourier synthesis are usually referred
to as discrete Fourier transform and discrete inverse Fourier transform. In
the following discussion, we derive an efficient method for computing the
discrete (inverse) Fourier transform. But we first give a formal introduction
for the discrete (inverse) Fourier transform.
is defined componentwise as
ẑ(j) = Σ_{k=0}^{N−1} z(k) ω_N^{−jk}   for 0 ≤ j ≤ N − 1,   (2.79)
is defined componentwise as
z(k) = (1/N) Σ_{j=0}^{N−1} ẑ(j) ω_N^{jk}   for 0 ≤ k ≤ N − 1.
The discrete Fourier transform (DFT) and the inverse DFT are represented
by the Fourier matrices F_N = N·A_N and F_N^{−1} = A_N^{−1}/N, i.e.,
F_N = ( ω_N^{−jk} )_{0≤j,k≤N−1} ∈ C^{N×N}   and   F_N^{−1} = (1/N) ( ω_N^{jk} )_{0≤j,k≤N−1} ∈ C^{N×N}.
Example 2.44. We compute the DFT ẑ ∈ C512 of the vector z ∈ C512 with
components z(k) = 3 sin(2π · 7k/512) − 4 cos(2π · 8k/512). To this end, we
regard the Fourier series (from the Fourier inversion formula)
z(k) = (1/512) Σ_{j=0}^{511} ẑ(j) e^{2πijk/512}.
Therefore, we have
ẑ(7) = −768 i,   ẑ(505) = 768 i,   ẑ(8) = ẑ(504) = −1024,
and, moreover, ẑ(j) = 0 for all j ∈ {0, …, 511} \ {7, 8, 504, 505}. Thereby,
the vector z ∈ C^512 has a sparse representation by the four non-vanishing
Fourier coefficients ẑ(7), ẑ(8), ẑ(504) and ẑ(505) (see Figure 2.7). ♦
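The sparse DFT of Example 2.44 can be checked numerically. The following sketch (illustration only) uses numpy's FFT, whose sign convention matches the DFT (2.79), and confirms that exactly the four coefficients ẑ(7), ẑ(8), ẑ(504), ẑ(505) are non-zero.

```python
import numpy as np

N = 512
k = np.arange(N)
z = 3.0 * np.sin(2 * np.pi * 7 * k / N) - 4.0 * np.cos(2 * np.pi * 8 * k / N)

zhat = np.fft.fft(z)                      # numpy's fft implements the DFT (2.79)
support = np.nonzero(np.abs(zhat) > 1e-8)[0]
print(support)                            # [  7   8 504 505]
print(np.round(zhat[support]))            # approx. -768i, -1024, -1024, 768i
```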
[Fig. 2.7: the signal z ∈ C^512 (top) and the magnitudes of its DFT ẑ (bottom),
supported on the four indices j = 7, 8, 504, 505.]
ẑ(j) = Σ_{k=0}^{N−1} z(k) ω_N^{−kj}
     = Σ_{k even} z(k) ω_N^{−kj} + Σ_{k odd} z(k) ω_N^{−kj}
     = Σ_{k=0}^{N/2−1} z(2k) ω_N^{−2kj} + Σ_{k=0}^{N/2−1} z(2k+1) ω_N^{−(2k+1)j}
     = Σ_{k=0}^{N/2−1} z(2k) ω_N^{−2kj} + ω_N^{−j} Σ_{k=0}^{N/2−1} z(2k+1) ω_N^{−2kj}.
ẑ(j) = Σ_{k=0}^{M−1} z(2k) ω_N^{−2kj} + ω_N^{−j} Σ_{k=0}^{M−1} z(2k+1) ω_N^{−2kj}
     = Σ_{k=0}^{M−1} u(k) ω_{N/2}^{−kj} + ω_N^{−j} Σ_{k=0}^{M−1} v(k) ω_{N/2}^{−kj}
     = Σ_{k=0}^{M−1} u(k) ω_M^{−kj} + ω_N^{−j} Σ_{k=0}^{M−1} v(k) ω_M^{−kj}
for j = 0, …, N − 1, where u(k) := z(2k) and v(k) := z(2k+1) for 0 ≤ k ≤ M − 1.
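The even/odd splitting just derived is the core of the fast Fourier transform. A minimal recursive radix-2 sketch, for N a power of two (illustration only, not the book's Algorithm), reads as follows.

```python
import numpy as np

def fft_radix2(z):
    """Recursive radix-2 FFT for the DFT (2.79); len(z) must be a power of two."""
    N = len(z)
    if N == 1:
        return np.array(z, dtype=complex)
    u_hat = fft_radix2(z[0::2])                            # DFT of u(k) = z(2k)
    v_hat = fft_radix2(z[1::2])                            # DFT of v(k) = z(2k+1)
    twiddle = np.exp(-2j * np.pi * np.arange(N // 2) / N)  # omega_N^{-j}
    return np.concatenate([u_hat + twiddle * v_hat,
                           u_hat - twiddle * v_hat])

z = np.random.default_rng(2).standard_normal(64)
print(np.allclose(fft_radix2(z), np.fft.fft(z)))           # True
```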
Proof. For the entries of the Toeplitz matrix C = (C_{jk})_{0≤j,k≤N−1}, we have
C_{jk} = c_{(j−k) mod N}. Applying C to the ℓ-th column ω^{(ℓ)} = (1/N)(ω_N^{ℓk})_{0≤k≤N−1}
of F_N^{−1}, we obtain
(C ω^{(ℓ)})_j = (1/N) Σ_{k=0}^{N−1} c_{(j−k) mod N} · ω_N^{ℓk}
            = (1/N) ω_N^{ℓj} Σ_{k=0}^{N−1} c_{(j−k) mod N} · ω_N^{ℓ(k−j)}
            = (1/N) ω_N^{ℓj} Σ_{m=0}^{N−1} c_{m mod N} · ω_N^{−ℓm}
            = (1/N) ω_N^{ℓj} Σ_{m=0}^{N−1} c_m ω_N^{−ℓm} = (1/N) ω_N^{ℓj} d_ℓ,
where
d_ℓ = Σ_{k=0}^{N−1} c_k ω_N^{−ℓk}   for 0 ≤ ℓ ≤ N − 1,
whereby
C F_N^{−1} = F_N^{−1} diag(d)
or
F_N C F_N^{−1} = diag(d).
Now we finally regard the linear system (2.80) for a cyclic Toeplitz matrix
C ∈ CN ×N with generating vector c ∈ CN . By application of the discrete
Fourier transform FN to both sides in (2.80) we get the identity
FN CFN−1 FN x = FN b.
Dy = r,   (2.81)
x = F_N^{−1} y.
We summarize the proposed solution of the Toeplitz system (2.80) in Algorithm 3.
Note that Algorithm 3 can be implemented efficiently by using the fast Fourier
transform (FFT): by Theorem 2.46, each of the steps in lines 5, 6 and 8 of
Algorithm 3 can be performed by the (inverse) FFT at a cost of only O(N log(N))
operations. In this case, a total of only O(N log(N)) operations is required to
perform Algorithm 3. In comparison, solving the linear system (2.80) by Gaussian
elimination, which requires O(N³) operations, is far too expensive. Unlike
Algorithm 3, however, Gaussian elimination does not exploit the Toeplitz
structure of the matrix C.
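Algorithm 3 itself is given in the book as pseudocode; a hedged Python sketch of the same idea, solving a cyclic Toeplitz system Cx = b by diagonalizing C with the FFT, could look as follows. Since numpy's fft matches the DFT (2.79), the eigenvalues d are simply the FFT of the generating vector c.

```python
import numpy as np

def solve_circulant(c, b):
    """Solve C x = b for the cyclic Toeplitz matrix C_{jk} = c_{(j-k) mod N},
    using F_N C F_N^{-1} = diag(d) with d = fft(c) (assumed to have no zeros)."""
    d = np.fft.fft(c)                 # eigenvalues of C
    r = np.fft.fft(b)                 # r = F_N b
    return np.fft.ifft(r / d)         # x = F_N^{-1} (r / d), O(N log N) overall

# consistency check against an explicitly assembled C
rng = np.random.default_rng(3)
N = 8
c, b = rng.standard_normal(N), rng.standard_normal(N)
C = np.array([[c[(j - k) % N] for k in range(N)] for j in range(N)])
print(np.allclose(C @ solve_circulant(c, b), b))   # True
```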
Moreover,
η ≡ η(f, S) = inf_{s∈S} ‖s − f‖
is called the minimal distance between f and S.
In the following investigations, we will first address questions concerning
the existence and uniqueness of best approximations. To this end, we develop
sufficient conditions for the linear space F and the subset S ⊂ F, under which
we can guarantee for any f ∈ F the existence of a best approximation s∗ ∈ S
for f . To guarantee the uniqueness of s∗ , we require strict convexity for the
norm · .
In the following discussion, we develop suitable sufficient and necessary
conditions to characterize best approximations. To this end, we first derive
dual characterizations for best approximations, giving conditions on the
elements of the topological dual space F′ of linear and continuous functionals
on F.
This is followed by direct characterizations of best approximations, where
we use directional derivatives (Gâteaux derivatives) of the norm · . On that
occasion, we consider computing directional derivatives of relevant norms
explicitly.
To study the material of this chapter (and for the following chapters)
we require knowledge of elementary results from optimization and functional
analysis. Therefore, we decided to explain a selection of relevant results. But
for further reading, we refer to the textbook [33].
η_p = inf_{s∈S} ‖s − f_α‖_p = α − 3   for p = 1, 2, ∞,
where
‖s − f_α‖_p > inf_{s∈S} ‖s − f_α‖_p = α − 3   for all s ∈ S,
[Figure (three rows of panels): best approximation sets S*_p and minimal distances η_p to f_α from S for p = 1, 2, ∞.
α = 4:  S*₁ = S*₂ = S*_∞ = ∅,  η₁ = η₂ = η_∞ = 1.
α = 1:  S*₁ = {(2, 0)}, η₁ = 1;  S*₂ = {(2, 0)}, η₂ = 1;  S*_∞ = { ((√7+1)/2, ±(√7−1)/2) }, η_∞ = (√7−1)/2.
α = 0:  S*₁ = {(±2, 0), (0, ±2)}, η₁ = 2;  S*₂ = {x ∈ R² | ‖x‖₂ = 2}, η₂ = 2;  S*_∞ = {(±√2, ±√2)}, η_∞ = √2.]
3.1 Existence
In the following discussion, the notions of compactness, completeness, and
continuity play an important role. We assume that their definitions and fur-
ther properties are familiar from analysis. Nevertheless, let us recall the con-
tinuity of functionals. Throughout this chapter, F denotes a linear space with
norm · .
‖u_n − u‖ ⟶ 0   for n → ∞,
we have
ϕ(u_n) ⟶ ϕ(u)   for n → ∞.
Moreover, ϕ is called continuous on F, if ϕ is continuous at every u ∈ F.
Now recall that any continuous functional attains its minimum (and its
maximum) on compact sets. Any compact set is closed and bounded. The
converse, however, is only true in finite-dimensional spaces.
For the discussion in this section, we need the continuity of norms. This
requirement is already covered by the following result.
‖v_n − v‖ ⟶ 0   for n → ∞,
and therefore (by the reverse triangle inequality | ‖v_n‖ − ‖v‖ | ≤ ‖v_n − v‖)
‖v_n‖ ⟶ ‖v‖   for n → ∞,
i.e., ‖·‖ is continuous at v ∈ F. Since we did not pose any further conditions
on v ∈ F, the norm ‖·‖ is continuous on F.
ϕ(v) = ‖v − f‖   for v ∈ F,
S₀ = S ∩ {v ∈ F | ‖v − f‖ ≤ ‖s₀ − f‖} ⊂ S
so that altogether,
R_f = span{f, r₁, …, r_n} ⊂ F,
Fig. 3.2. On the geometry of the parallelogram identity (see Theorem 3.9).
which immediately follows from the definition of (·, ·). In particular, we have
Remark 3.15. The required convexity for S is necessary for the result of
Theorem 3.14. In order to see this, we regard the sequence space
ℓ² ≡ ℓ²(R) = { x = (x_k)_{k∈ℕ} ⊂ R | Σ_{k=1}^{∞} |x_k|² < ∞ }   (3.10)
where e_k ∈ ℓ² is the sequence with (e_k)_j = δ_{jk}, for j, k ∈ ℕ. Note that the
elements x^{(k)} ∈ S are isolated in ℓ², and so S is closed. But S is not convex.
Now we have η(0, S) = 1 for the minimal distance between 0 ∈ ℓ² and S,
and, moreover,
‖x^{(k)} − 0‖₂ > 1   for all x^{(k)} ∈ S.
Hence there exists no x(k) ∈ S with unit distance to the origin.
Finally, we remark that the result of Theorem 3.14 does not generalize
to Banach spaces. To see this, a counterexample can for instance be found
in [42, Section 5.2].
3.2 Uniqueness
In the following discussion, the notion of (strict) convexity for point sets,
functions, functionals and norms plays an important role. Recall the relevant
definitions for sets (see Definition 3.13) and for functions (see Definition 3.20),
as these should be familiar from analysis.
Now we note some fundamental results, where F denotes, throughout this
section, a linear space with norm · . We start with a relevant example for
a convex set.
i.e., s∗λ = λs∗1 + (1 − λ)s∗2 ∈ [s∗1 , s∗2 ], for λ ∈ [0, 1], lies in S ∗ .
Fig. 3.3. S is not convex; for s* ∈ [s*₁, s*₂] we have ‖s* − f‖ < η(f, S), cf. Remark 3.17.
To further illustrate this, let us make one simple example.
Example 3.19. For S = {x ∈ R2 | x∞ ≤ 1} and f = (2, 0), the set S ∗ of
best approximations to f from S with respect to the maximum norm · ∞
is given by
S* = { (1, α) ∈ R² | α ∈ [−1, 1] } ⊂ S
with the minimal distance
η(f, S) = inf_{s∈S} ‖s − f‖_∞ = 1.
For s∗1 , s∗2 ∈ S ∗ every element s∗ ∈ [s∗1 , s∗2 ] lies in S ∗ (see Figure 3.4). ♦
Fig. 3.4. S* = {(1, α) ∈ R² | α ∈ [−1, 1]} is the set of best approximations to
f = (2, 0) from S = {x ∈ R² | ‖x‖_∞ ≤ 1} with respect to ‖·‖_∞ (see Example 3.19).
holds; f is said to be strictly convex on [a, b] if for all x, y ∈ [a, b] with x ≠ y
we have
f(λx + (1 − λ)y) < λf(x) + (1 − λ)f(y)   for all λ ∈ (0, 1).
holds. If f is strictly convex, then equality holds, if and only if all points
coincide, i.e., x1 = . . . = xn .
Proof. We prove the statement of Jensen’s inequality by induction on n.
Initial step: For n = 2, the statement of Jensen’s inequality is obviously true.
Induction hypothesis: Assume the statement holds for n points {x1 , . . . , xn }.
Induction step (n −→ n + 1): For n + 1 points {x1 , . . . , xn , xn+1 } ⊂ [a, b] and
λ₁, …, λ_n, λ_{n+1} ∈ (0, 1) with Σ_{j=1}^{n} λ_j = 1 − λ_{n+1}
we have
f( Σ_{j=1}^{n+1} λ_j x_j ) = f( (1 − λ_{n+1}) Σ_{j=1}^{n} ( λ_j/(1 − λ_{n+1}) ) x_j + λ_{n+1} x_{n+1} )
                        ≤ (1 − λ_{n+1}) f( Σ_{j=1}^{n} ( λ_j/(1 − λ_{n+1}) ) x_j ) + λ_{n+1} f(x_{n+1})
5 Johan Ludwig Jensen (1859-1925), Danish mathematician
holds.
Remark 3.23. Every norm ‖·‖ : F ⟶ [0, ∞) is a convex functional on F.
Indeed, for any u, v ∈ F we find the inequality
‖λu + (1 − λ)v‖ ≤ λ‖u‖ + (1 − λ)‖v‖   for all λ ∈ [0, 1],   (3.15)
due to the triangle inequality and the homogeneity of ‖·‖. Moreover, equality
in (3.15) holds for all pairs of linearly dependent elements u, v ∈ F with
u = αv for a positive scalar α > 0, i.e., we have
‖λαv + (1 − λ)v‖ = (λα + 1 − λ)‖v‖ = λ‖αv‖ + (1 − λ)‖v‖   for all λ ∈ [0, 1],
by the homogeneity of ‖·‖.
We introduce the notion of a strictly convex norm classically as follows.
Definition 3.24. A norm · is called strictly convex on F, if the unit
ball B = {u ∈ F | u ≤ 1} ⊂ F is strictly convex.
As we will show, not every norm is strictly convex. But before we do
so, our ”classical” introduction for strictly convex norms in Definition 3.24
deserves a comment.
ϕ(λu + (1 − λ)v) < λϕ(u) + (1 − λ)ϕ(v) for all λ ∈ (0, 1), (3.17)
then no norm would be strictly convex in this particular sense! This important
observation is verified by the counterexample in (3.16).
When working with strictly convex norms · (according to Defini-
tion 3.24), we can exclude non-uniqueness of best approximations, if S ⊂ F
is convex. To explain this, we need to further analyze strictly convex norms.
To this end, we first prove the following useful characterization.
Theorem 3.26. Let F be a linear space with norm ‖·‖. Then the following
statements are equivalent.
(a) The norm ‖·‖ is strictly convex.
(b) The unit ball B = {u ∈ F | ‖u‖ ≤ 1} ⊂ F is strictly convex.
(c) The inequality ‖u + v‖ < 2 holds for all u ≠ v with ‖u‖ = ‖v‖ = 1.
(d) The equality ‖u + v‖ = ‖u‖ + ‖v‖, v ≠ 0, implies u = αv for some α ≥ 0.
Proof. Note that the equivalence (a) ⇔ (b) holds by Definition 3.24.
(b) ⇒ (c): The strict convexity of B implies ‖(u + v)/2‖ < 1 for u ≠ v with
‖u‖ = ‖v‖ = 1, and so in this case we have ‖u + v‖ < 2.
(c) ⇒ (d): For u = 0 statement (d) holds with α = 0. Now suppose u, v ∈
F \ {0} satisfy ‖u + v‖ = ‖u‖ + ‖v‖. Without loss of generality, we may
assume ‖u‖ ≤ ‖v‖ (otherwise we swap u and v). In this case, in the sequence
of inequalities
2 ≥ ‖ u/‖u‖ + v/‖v‖ ‖ = ‖ (u + v)/‖u‖ − v/‖u‖ + v/‖v‖ ‖
  ≥ ‖u + v‖/‖u‖ − ‖ v/‖u‖ − v/‖v‖ ‖ = (‖u‖ + ‖v‖)/‖u‖ − (1/‖u‖ − 1/‖v‖)‖v‖ = 2
equality holds everywhere, in particular
‖ u/‖u‖ + v/‖v‖ ‖ = 2.
provided that ‖u‖ < 1 or ‖v‖ < 1. Otherwise, i.e., if ‖u‖ = ‖v‖ = 1, we have
If ‖λu + (1 − λ)v‖ = 1, then we have λu = α(1 − λ)v for some α > 0 from (d).
Therefore, we have u = v, since ‖u‖ = ‖v‖. This, however, is in contradiction
to the assumption u ≠ v. Therefore, we have, also for this case,
Remark 3.27. The absolute value |·| is a strictly convex norm on R. Indeed,
in the equivalence (c) of Theorem 3.26 we can only use the two points u = −1
and v = 1, where we have |u + v| = 0 < 2. But note that the absolute value,
when regarded as a function | · | : R −→ R is not strictly convex on R.
equipped with the ℓ^∞-norm
‖x‖_∞ := sup_{k∈ℕ} |x_k|   for x = (x_k)_{k∈ℕ} ∈ ℓ^∞.
equipped with the ℓ^p-norm
‖x‖_p := ( Σ_{k=1}^{∞} |x_k|^p )^{1/p}   for x = (x_k)_{k∈ℕ} ∈ ℓ^p.
To further analyze the ℓ^p-norms we prove the Hölder inequality.
|x_k|^{p−1} = α|y_k|   with α = ‖x‖_p^{p−1}/‖y‖_q > 0   for y ≠ 0.   (3.20)
x = (x_k)_{k∈ℕ} ∈ ℓ^p   and   y = (y_k)_{k∈ℕ} ∈ ℓ^q.
by the Jensen inequality, Theorem 3.21, here applied to the strictly convex
function −log : (0, ∞) ⟶ R. This yields the Young inequality
|x_k y_k| / (‖x‖_p ‖y‖_q) = ( |x_k|^p/‖x‖_p^p )^{1/p} ( |y_k|^q/‖y‖_q^q )^{1/q}
                          ≤ (1/p)·|x_k|^p/‖x‖_p^p + (1/q)·|y_k|^q/‖y‖_q^q,   (3.22)
and this already proves the Hölder inequality (3.19), with equality if and
only if (3.20) holds for all k ∈ ℕ.
Proof. For 1 < p < ∞, let 1 < q < ∞ be the conjugate Hölder exponent of p
satisfying 1/p + 1/q = 1.
For
x = (x_k)_{k∈ℕ} and y = (y_k)_{k∈ℕ} ∈ ℓ^p,
where x ≠ y and ‖x‖_p = ‖y‖_p = 1, we wish to prove the inequality
in particular,
‖x + y‖_p ≤ 2   for ‖x‖_p = ‖y‖_p = 1.
If ‖x + y‖_p = 2 for ‖x‖_p = ‖y‖_p = 1, then we have equality in both (3.26)
and (3.27). But equality in (3.27) is by (3.20) equivalent to the two conditions
|x_k|^{p−1} = α|s_k|   and   |y_k|^{p−1} = α|s_k|   with α = 1/‖s‖_q,
which implies
|x_k| = |y_k|   for all k ∈ ℕ.
In this case, we have equality in (3.26) if and only if sgn(x_k) = sgn(y_k), for
all k ∈ ℕ, i.e., equality in (3.26) and (3.27) implies x = y.
Therefore, the inequality (3.25) holds for all x ≠ y with ‖x‖_p = ‖y‖_p = 1.
for 1 < p < ∞, where Lp ≡ Lp (Rd ) is the linear space of all functions
whose p-th power is Lebesgue9 integrable. Indeed, in this case (in analogy to
Theorem 3.29) the Hölder inequality
holds for 1 < p, q < ∞ satisfying 1/p + 1/q = 1. This implies (as in the proof
of Theorem 3.30) the Minkowski inequality
where for 1 < p < ∞ we have equality if and only if u = αv for some α ≥ 0
(see [35, Theorem 12.6]). Therefore, the L^p-norm ‖·‖_p, for 1 < p < ∞, is
strictly convex by the equivalence statement (d) in Theorem 3.26.
But there are norms that are not strictly convex. Here are two examples.
9 Henri Léon Lebesgue (1875-1941), French mathematician
Thus, by Theorem 3.26, statement (b), the ℓ¹-norm ‖·‖₁ is not strictly convex.
Likewise, we show that for the linear space ℓ^∞ of all bounded sequences
the ℓ^∞-norm ‖·‖_∞, defined as
‖x‖_∞ = sup_{k∈ℕ} |x_k|   for x = (x_k)_{k∈ℕ} ∈ ℓ^∞,
Example 3.34. For the linear space C([0,1]^d) of all continuous functions
on the unit cube [0,1]^d ⊂ R^d, the maximum norm ‖·‖_∞, defined as
‖u‖_∞ := max_{x∈[0,1]^d} |u(x)|   for u ∈ C([0,1]^d),
is not strictly convex. To see this we take a continuous function u₁ ∈ C([0,1]^d)
satisfying u1 ∞ = 1 and another continuous function u2 ∈ C ([0, 1]d ) satis-
fying u2 ∞ = 1, so that |u1 | and |u2 | attain their maximum on [0, 1]d at one
point x∗ ∈ [0, 1]d , respectively, i.e.,
‖u₁‖_∞ = max_{x∈[0,1]^d} |u₁(x)| = |u₁(x*)| = |u₂(x*)| = max_{x∈[0,1]^d} |u₂(x)| = ‖u₂‖_∞ = 1.
This then implies for u_λ = λu₁ + (1 − λ)u₂ ∈ (u₁, u₂), with λ ∈ (0, 1),
|u_λ(x)| ≤ λ|u₁(x)| + (1 − λ)|u₂(x)| ≤ 1   for all x ∈ [0,1]^d,
where equality holds for x = x*, whereby ‖u_λ‖_∞ = 1 for all λ ∈ (0, 1).
In this case, the unit ball B = {u ∈ C([0,1]^d) | ‖u‖_∞ ≤ 1} is not strictly
convex, i.e., ‖·‖_∞ is not strictly convex by statement (b) in Theorem 3.26.
To make an explicit example for the required functions u1 and u2 , we take
the geometric mean ug ∈ C ([0, 1]d ) and the arithmetic mean ua ∈ C ([0, 1]d ),
3.2 Uniqueness 81
u_g(x) = (x₁·…·x_d)^{1/d} ≤ (x₁ + … + x_d)/d = u_a(x),
for x = (x₁, …, x_d) ∈ [0,1]^d. Obviously, we have ‖u_g‖_∞ = ‖u_a‖_∞ = 1, where
ug and ua attain their unique maximum on [0, 1]d at 1 = (1, . . . , 1) ∈ [0, 1]d .
♦
‖x‖_p^p = Σ_{k=1}^{d} |x_k|^p   for 1 ≤ p < ∞   and   ‖x‖_∞ = max_{1≤k≤d} |x_k|
Remark 3.36. In statements (b), (c) of Corollary 3.35, we excluded the case
d = 1, since in this univariate setting the norms · 1 and · ∞ coincide
with the strictly convex norm | · | on R (see Remark 3.27).
Theorem 3.37. Let F be a linear space, equipped with a strictly convex norm
· . Moreover, assume S ⊂ F is convex and f ∈ F. If there exists a best
approximation s∗ ∈ S to f , then s∗ is unique.
Proof. Suppose s*₁, s*₂ ∈ S are two different best approximations to f from
S, i.e., s*₁ ≠ s*₂. Then we have
‖(s*₁ + s*₂)/2 − f‖ = (1/2)‖(s*₁ − f) + (s*₂ − f)‖ < (1/2)(‖s*₁ − f‖ + ‖s*₂ − f‖) = η(f, S),   (3.28)
where the strict inequality holds by the strict convexity of ‖·‖.
Due to the assumed convexity of S, the element s* = (s*₁ + s*₂)/2 lies in S.
Moreover, s* is closer to f than s*₁ and s*₂, by (3.28). But this is in
contradiction to the optimality of s*₁ and s*₂.
We remark that the strict convexity of the norm · gives, in combination
with the convexity of S ⊂ F, only a sufficient condition for the uniqueness
of the best approximation. Now we show that this condition is not necessary.
To this end, we make a simple example.
Corollary 3.40. Let S ⊂ L^p be convex for 1 < p < ∞. Then there is for
any f ∈ L^p at most one best approximation s* ∈ S to f w.r.t. ‖·‖_p.
Corollary 3.41. Let S ⊂ ℓ^p be convex for 1 < p < ∞. Then there is for any
f ∈ ℓ^p at most one best approximation s* ∈ S to f w.r.t. ‖·‖_p.
whereby we get the minimal distance η_p(f, S) between f and S with respect
to ‖·‖_p. Again, by the uniqueness of the best approximation we obtain the
stated result by
s*_p(x) = r*_p(x) = s*_p(−x)   for all x ∈ [−1, 1].
For an alternative proof of Proposition 3.42, we refer to Exercise 3.74.
The elements of the linear space F′ are called dual functionals. On
this occasion, we recall the notions of linearity, continuity and boundedness
of functionals. We start with linearity.
Now we can introduce a norm on the dual space F′, by using the norm
‖·‖ of F. To this end, we take for any functional ϕ ∈ F′ the smallest upper
bound C ≡ C_ϕ in (3.29). To be more precise, we define by
‖ϕ‖ = sup_{u∈F, u≠0} |ϕ(u)|/‖u‖ = sup_{u∈F, ‖u‖=1} |ϕ(u)|
Proof. (a) ⇒ (b): Let ϕ be continuous at u0 ∈ F, and, moreover, let (un )n∈N
be a convergent sequence in F with limit u ∈ F. Then we have
since otherwise there would exist an upper bound N ∈ N for ϕ (i.e., ϕ would
be bounded). In this case, the sequence (vn )n∈N , defined as
v_n = u_n / |ϕ(u_n)|   for n ∈ ℕ,
is a zero sequence in F by
‖v_n‖ = 1/|ϕ(u_n)| ⟶ 0   for n → ∞,
and so, by continuity of ϕ, we have
Proof. Note that the sufficiency of the statement is covered by Theorem 3.46.
To prove the necessity, suppose that s∗ ∈ S is a best approximation to f .
Regard the open ball
B_η(f) = {u ∈ F | ‖u − f‖ < ‖s* − f‖} ⊂ F
10 Hans Hahn (1879-1934), Austrian mathematician and philosopher
11 Stefan Banach (1892-1945), Polish mathematician
12 Stanislaw Mazur (1905-1981), Polish mathematician
This implies
‖ϕ‖ = sup_{‖v‖≤1} |ϕ(v)| ≤ ϕ( (s* − f)/‖s* − f‖ )
and, moreover, by using the continuity of ϕ once more, we have
‖ϕ‖ = ϕ(s* − f)/‖s* − f‖   ⟺   ϕ(s* − f) = ‖ϕ‖ · ‖s* − f‖.
holds.
13 René Gâteaux (1889-1914), French mathematician
D_{u,v}(h) = (1/h)( ϕ(u + hv) − ϕ(u) )   for h > 0,   (3.32)
is a monotonically increasing function in h > 0, which, moreover, is bounded
below. To verify the monotonicity, we regard the convex combination
u + h₁v = ( (h₂ − h₁)/h₂ ) u + ( h₁/h₂ )( u + h₂v )   for h₂ > h₁ > 0.
The convexity of ϕ then implies the inequality
ϕ(u + h₁v) ≤ ( (h₂ − h₁)/h₂ ) ϕ(u) + ( h₁/h₂ ) ϕ(u + h₂v)
and, after elementary calculations, the monotonicity
D_{u,v}(h₁) = (1/h₁)( ϕ(u + h₁v) − ϕ(u) ) ≤ (1/h₂)( ϕ(u + h₂v) − ϕ(u) ) = D_{u,v}(h₂).
If we now form the convex combination
u = ( h₂/(h₁ + h₂) )( u − h₁v ) + ( h₁/(h₁ + h₂) )( u + h₂v )   for h₁, h₂ > 0,
we obtain, by using the convexity of ϕ, the inequality
ϕ(u) ≤ ( h₂/(h₁ + h₂) ) ϕ(u − h₁v) + ( h₁/(h₁ + h₂) ) ϕ(u + h₂v)
and, after elementary calculations, we obtain the estimate
−D_{u,−v}(h₁) = −(1/h₁)( ϕ(u − h₁v) − ϕ(u) ) ≤ (1/h₂)( ϕ(u + h₂v) − ϕ(u) ) = D_{u,v}(h₂).   (3.33)
This implies that the monotonically increasing difference quotient D_{u,v} is
bounded from below for all u, v ∈ F. In particular, D_{u,−v} is a monotonically
increasing function that is bounded from below. Therefore, the Gâteaux
derivatives ϕ₊(u, v) and ϕ₊(u, −v) exist. By (3.33), we finally have
−(1/h)( ϕ(u − hv) − ϕ(u) ) ≤ −ϕ₊(u, −v) ≤ ϕ₊(u, v) ≤ (1/h)( ϕ(u + hv) − ϕ(u) )
for all h > 0, as stated.
holds for all λ ∈ [0, 1], by using properties (a) and (b).
Remark 3.52. By the properties (a) and (b) in Theorem 3.51, we call the
functional ϕ+ (u, ·) : F −→ R sublinear. We can show that the sublinearity of
ϕ+ (u, ·), for all u ∈ F, in combination with the inequality
holds.
Moreover, we have
(F ∘ ϕ)₊(u, v) = lim_{h↘0} (1/h)( F(ϕ(u + hv)) − F(ϕ(u)) )
             = lim_{h↘0} (1/h)( F(x_h) − F(x) )
             = lim_{h↘0} G(x_h) · lim_{h↘0} (1/h)( x_h − x )
             = lim_{h↘0} G(ϕ(u + hv)) · lim_{h↘0} (1/h)( ϕ(u + hv) − ϕ(u) ),
proving both the existence of (F ∘ ϕ)₊(u, v) and the chain rule in (3.34).
Proof. (b) ⇒ (a): Suppose ϕ+ (u0 , u−u0 ) ≥ 0 for u ∈ K. Then we have, due to
the monotonicity of the difference quotient Du0 ,u−u0 in (3.32), in particular
for h = 1,
ϕ_f(v) = ‖v − f‖   for v ∈ F.
ϕ_f(λv₁ + (1 − λ)v₂) = ‖λv₁ + (1 − λ)v₂ − f‖ = ‖λ(v₁ − f) + (1 − λ)(v₂ − f)‖
                    ≤ λ‖v₁ − f‖ + (1 − λ)‖v₂ − f‖ = λϕ_f(v₁) + (1 − λ)ϕ_f(v₂).
Therefore, ϕf has a Gâteaux derivative, for which the chain rule (3.34) holds.
Now the direct characterization from Theorem 3.54 can be applied to the
distance functional ϕf . This leads us to a corresponding equivalence, which
is referred to as Kolmogorov14 criterion.
For the Gâteaux derivative of the norm ϕ = ‖·‖ : F ⟶ R we will
henceforth use the notation ‖·‖₊(u, v) := lim_{h↘0} (1/h)( ‖u + hv‖ − ‖u‖ ).
Remark 3.56. For proving the implication (b) ⇒ (a) in Theorem 3.54 we
did not use the convexity of K. Therefore, we can specialize the equivalence
in Corollary 3.55 to establish the implication
Theorem 3.58. Let F be a linear space with norm · and S ⊂ F be convex.
Moreover, suppose f ∈ F. Then the following statements are equivalent.
(a) s* ∈ S is the strongly unique best approximation to f.
(b) There is an α > 0 satisfying ‖·‖₊(s* − f, s − s*) ≥ α‖s − s*‖ for all s ∈ S.
‖s*_g − s*_f‖ ≤ (2/α₀) ‖g − f‖   for all f, g ∈ F,
Lipschitz continuous on F with Lipschitz constant 2/α₀ (see Definition 6.64).
This implies
|·|₊(x, y) = lim_{h↘0} (1/h)( |x + hy| − |x| ) = lim_{h↘0} (1/h)( |x| + hy·sgn(x) − |x| ) = y·sgn(x).
C (Ω) = {u : Ω −→ R | u continuous on Ω}
(1/h)( ‖u + hv‖_∞ − ‖u‖_∞ ) ≥ (1/h)( |u(x) + hv(x)| − |u(x)| )
                           = (1/h)( |u(x)| + hv(x)·sgn(u(x)) − |u(x)| )
                           = v(x)·sgn(u(x))
for h|v(x)| < |u(x)|, which by h ↘ 0 already implies the stated inequality.
"≤": To verify the inequality
‖·‖₊(u, v) ≤ max_{x∈Ω, |u(x)|=‖u‖_∞} v(x)·sgn(u(x)),
Since χ_{Ω_h} ⟶ χ_{Ω₊}, or, χ_{Ω₊\Ω_h} ⟶ 0, for h ↘ 0, the statement in (3.37)
follows from the representations (3.38), (3.39) and (3.40).
To compute the Gâteaux derivatives for the remaining L^p-norms ‖·‖_p,
‖u‖_p = ( ∫_Ω |u(x)|^p dx )^{1/p}   for u ∈ C(Ω),
Since χ_{Ω_h} ⟶ χ_{Ω₊}, or, χ_{Ω₊\Ω_h} ⟶ 0, for h ↘ 0, the stated representation
in (3.41) follows from (3.42), (3.43), (3.44), and (3.45).
Now we can finally provide the Gâteaux derivatives for the remaining
L^p-norms ‖·‖_p, for 1 < p < ∞.
Theorem 3.67. Let Ω ⊂ R^d be compact. Moreover, suppose 1 < p < ∞.
Then, for the Gâteaux derivative of the L^p-norm ‖·‖ = ‖·‖_p on C(Ω), we have
‖·‖₊(u, v) = ( 1/‖u‖_p^{p−1} ) ∫_Ω |u(x)|^{p−1} v(x) sgn(u(x)) dx
Proof. The statement follows from the chain rule (3.34) in Theorem 3.53 with
F(x) = x^p in combination with the representation of the Gâteaux derivative
(ϕ^p)₊ in Lemma 3.66, whereby
ϕ₊(u, v) = (ϕ^p)₊(u, v) / ( p·ϕ^{p−1}(u) ) = ( p/(p·‖u‖_p^{p−1}) ) ∫_Ω |u(x)|^{p−1} v(x) sgn(u(x)) dx,
3.5 Exercises
Exercise 3.68. Consider approximating the parabola f (x) = x2 on the unit
interval [0, 1] by linear functions of the form
g_ξ(x) = ξ·x   for ξ ∈ R
η_p(ξ) = ‖g_ξ − f‖_p.
along with the minimal distance η_p(ξ*), for each of the three cases p = 1, 2, ∞.
Exercise 3.69. Suppose we wish to approximate the identity f(x) = x on
the unit interval [0, 1], by an exponential sum of the form
min_{s∈S} ‖s − f‖.
Exercise 3.72. Let (F, ‖·‖) be a normed linear space whose norm ‖·‖ is not
strictly convex. Show that there exist an element f ∈ F, a linear subspace
S ⊂ F, and distinct best approximations s*₁, s*₂ ∈ S to f, s*₁ ≠ s*₂, satisfying
Exercise 3.73. Transfer the result of Proposition 3.42 to the case of odd
functions f ∈ C [−1, 1]. To this end, formulate and prove a corresponding
result for subsets S ⊂ C [−1, 1] that are invariant under point reflections,
i.e., for any s(x) ∈ S, we have −s(−x) ∈ S.
ϕ(f) = Σ_{k=0}^{n} λ_k f(x_k)   for f ∈ C[a, b],
‖ϕ‖_∞ = Σ_{k=0}^{n} |λ_k|.
Exercise 3.78. Consider the linear space F = C([0,1]²), equipped with the
maximum norm ‖·‖_∞. Approximate the function
ϕ(g) = Σ_{j=1}^{4} λ_j g(x_j, y_j)   for g ∈ F
(b) Assume that the Gâteaux derivative ϕ+ (u, v) exists for all u, v ∈ F.
Moreover, assume that ϕ+ (u, ·) : F −→ R is sublinear for all u ∈ F. If
the inequality
Theorem 4.1. Let F be a Euclidean space with inner product (·, ·). More-
over, suppose S ⊂ F is a convex subset of F. Then the following statements
are equivalent.
(a) s∗ ∈ S is a best approximation to f ∈ F \ S.
(b) We have (s∗ − f, s − s∗ ) ≥ 0 for all s ∈ S.
Note that the equivalence statement in Remark 4.2 identifies a best ap-
proximation s∗ ∈ S to f ∈ F as the unique orthogonal projection of f onto
S. In Section 4.2, we will study the projection operator Π : F −→ S, which
assigns every f ∈ F to its unique best approximation s∗ ∈ S in more detail.
Before doing so, we first use the orthogonality in (4.1) to characterize best
approximations s∗ ∈ S for convex subsets S ⊂ F. To this end, we work with
the dual characterization of Theorem 3.46.
Theorem 4.3. Let F be a Euclidean space with inner product (·, ·) and let
S ⊂ F be a convex subset of F. Moreover, suppose that s∗ ∈ S satisfies
s∗ − f ⊥ S. Then, s∗ is the unique best approximation to f .
satisfies all three conditions from the dual characterization of Theorem 3.46:
Indeed, the first condition, ‖ϕ‖ = 1, follows from the Cauchy-Schwarz
inequality,
|ϕ(u)| = | ( (s* − f)/‖s* − f‖ , u ) | ≤ ( ‖s* − f‖/‖s* − f‖ )·‖u‖ = ‖u‖   for all u ∈ F,
where equality holds for u = s* − f ∈ F, since
ϕ(s* − f) = ( (s* − f)/‖s* − f‖ , s* − f ) = ‖s* − f‖²/‖s* − f‖ = ‖s* − f‖.
Therefore, ϕ also satisfies the second condition in Theorem 3.46. By s* − f ⊥ S
we have
ϕ(s) = ( (s* − f)/‖s* − f‖ , s ) = 0   for all s ∈ S,
and so ϕ finally satisfies the third condition in Theorem 3.46.
In conclusion, s* is a best approximation to f. The uniqueness of s* follows
from the strict convexity of the Euclidean norm ‖·‖ = (·,·)^{1/2}.
S = span{s₁, …, s_n} ⊂ F
1 Augustin-Louis Cauchy (1789-1857), French mathematician
2 Hermann Amandus Schwarz (1843-1921), German mathematician
c* = ( (f, s₁), …, (f, s_n) )ᵀ ∈ R^n.
Theorem 4.5. Let F be a Euclidean space with inner product (·, ·). More-
over, let S ⊂ F be a finite-dimensional linear subspace with orthogonal basis
{s1 , . . . , sn }. Then, for any f ∈ F,
s* = Σ_{j=1}^{n} ( (f, s_j)/‖s_j‖² ) s_j ∈ S   (4.4)
= Σ_{j,k=1}^{n} ( (f, s_j)/‖s_j‖² )( (g, s_k)/‖s_k‖² )(s_j, s_k) = Σ_{j=1}^{n} (f, s_j)(g, s_j)/‖s_j‖²
Proof. The Bessel inequality follows from the second stability estimate in (4.7)
in combination with the representation in (4.11). The second statement fol-
lows from the Pythagoras theorem (4.6) and the representation (4.11).
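As a small numerical illustration of the projection formula (4.4), not tied to any particular function space, the following sketch computes the best approximation from the span of an orthogonal (not necessarily orthonormal) basis in R³ by summing the scaled inner products; the vectors are arbitrary examples.

```python
import numpy as np

def project(f, basis):
    """Orthogonal projection of f onto span(basis) for an orthogonal basis,
    following (4.4): s* = sum_j (f, s_j)/||s_j||^2 * s_j."""
    return sum((f @ s) / (s @ s) * s for s in basis)

s1 = np.array([1.0, 1.0, 0.0])            # orthogonal basis of a plane in R^3
s2 = np.array([1.0, -1.0, 2.0])
f  = np.array([3.0, 1.0, 4.0])

s_star = project(f, [s1, s2])
print((f - s_star) @ s1, (f - s_star) @ s2)   # both ~ 0: the residual is orthogonal to S
```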
for j ≠ k and
(sin(j·), cos(k·)) = (1/(2π)) ∫₀^{2π} [ sin((j−k)x) + sin((j+k)x) ] dx = 0
and
(cos(j·), cos(j·)) = (1/(2π)) ∫₀^{2π} [ 1 + cos(2jx) ] dx = 1,
(sin(j·), sin(j·)) = (1/(2π)) ∫₀^{2π} [ 1 − cos(2jx) ] dx = 1,
where we use the representations in (4.16) and (4.17) yet once more.
We now connect to the results of Theorems 4.5 and 4.11, where we can,
for any function f ∈ C2π , represent its unique best approximation s∗ ∈ Tn
by
s*(x) = (f, 1/√2)·(1/√2) + Σ_{j=1}^{n} [ (f, cos(j·)) cos(jx) + (f, sin(j·)) sin(jx) ].   (4.19)
(F_n f)(x) = a₀/2 + Σ_{j=1}^{n} [ a_j cos(jx) + b_j sin(jx) ].   (4.20)
The Fourier partial sum (4.20) is split into an even part, given by the
partial sum of the even trigonometric polynomials {cos(j·), 0 ≤ j ≤ n} with
”even” Fourier coefficients aj , and into an odd part, given by the partial
sum of the odd trigonometric polynomials {sin(j·), 1 ≤ j ≤ n} with ”odd”
Fourier coefficients bj . We can show that for any even function f ∈ C2π , all
odd Fourier coefficients bj vanish. Likewise, for an odd function f ∈ C2π , all
even Fourier coefficients aj vanish. On this occasion, we recall the result of
Proposition 3.42, from which these statements immediately follow. But we
wish to compute the Fourier coefficients explicitly.
This completes our proof for (a). We can prove (b) analogously.
Example 4.14. We consider approximating the periodic function f ∈ C2π ,
defined as f (x) = π −|x|, for x ∈ [−π, π]. To this end, we determine for n ∈ N
the Fourier coefficients aj , bj of the Fourier partial sum Fn f . Since f is an
even function, we can apply Corollary 4.13, statement (a). From this, we see
that bj = 0, for all 1 ≤ j ≤ n, and, moreover,
a_j = (2/π) ∫₀^π f(x) cos(jx) dx   for 0 ≤ j ≤ n.
Integration by parts gives
∫₀^π f(x) cos(jx) dx = [ (1/j) f(x) sin(jx) ]₀^π − (1/j) ∫₀^π f′(x) sin(jx) dx
                    = (1/j) ∫₀^π sin(jx) dx = −(1/j²) [ cos(jx) ]₀^π   for 1 ≤ j ≤ n,
a_j = 4/(π j²)   for all odd indices j ∈ {1, …, n}.
We finally compute the Fourier coefficient a₀ by
a₀ = (f, 1) = (1/π) ∫₀^{2π} f(x) dx = (2/π) ∫₀^π (π − x) dx = [ −(1/π)(π − x)² ]₀^π = π.
(F_n f)(x) = π/2 + Σ_{j=1}^{n} a_j cos(jx) = π/2 + (4/π) Σ_{j=1, j odd}^{n} cos(jx)/j²
           = π/2 + (4/π) Σ_{k=0}^{⌊(n−1)/2⌋} cos((2k+1)x)/(2k+1)²
(F_n f)(x) = Σ_{j=−n}^{n} c_j e^{ijx}.   (4.23)
For the conversion of the Fourier coefficients, we apply the linear mapping
in (2.69), whereby, using the Euler formula (2.67), we obtain for the
complex Fourier coefficients in (4.23) the representation
c_j = (1/(2π)) ∫₀^{2π} f(x) e^{−ijx} dx   for j = −n, …, n.   (4.24)
We remark that the complex Fourier coefficients c_j in (4.24) can also, like the
real Fourier coefficients a_j in (4.21) and b_j in (4.22), be expressed via inner
products. In fact, by using the complex inner product (·,·)_C in (4.14), we can
rewrite the representation in (4.24) as c_j = (f, e^{ij·})_C, for j = −n, …, n.
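The Fourier partial sums of Example 4.14, shown in the following figures, can be reproduced with a short sketch (illustration only): it evaluates (F_n f) with the exact coefficients a₀ = π and a_j = 4/(πj²) for odd j derived above, and reports the maximum error on [−π, π] for the cases n = 2, 4, 16 of Figures 4.1 to 4.3.

```python
import numpy as np

f = lambda x: np.pi - np.abs(x)           # the 2*pi-periodic target of Example 4.14

def fourier_partial_sum(n, x):
    """Evaluate (F_n f)(x) with a_0 = pi and a_j = 4/(pi j^2) for odd j."""
    s = np.full_like(x, np.pi / 2)
    for j in range(1, n + 1, 2):          # only odd j contribute
        s += 4.0 / (np.pi * j**2) * np.cos(j * x)
    return s

x = np.linspace(-np.pi, np.pi, 1001)
for n in (2, 4, 16):
    print(n, np.max(np.abs(fourier_partial_sum(n, x) - f(x))))   # error decreases with n
```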
Fig. 4.1. Approximation of the function f(x) = π − |x| on the interval [−π, π] by
the Fourier partial sum (F₂f)(x) (top) and the error function F₂f − f (bottom)
(see Example 4.14).
Fig. 4.2. Approximation of the function f(x) = π − |x| on the interval [−π, π] by
the Fourier partial sum (F₄f)(x) (top) and the error function F₄f − f (bottom)
(see Example 4.14).
Fig. 4.3. Approximation of the function f(x) = π − |x| on the interval [−π, π] by
the Fourier partial sum (F₁₆f)(x) (see Example 4.14).
where ωN = e2πi/N denotes the N -th root of unity in (2.74). In this way, the
vector c = (c−n , . . . , cn )T ∈ CN of the complex Fourier coefficients (4.24) is
approximated by the discrete Fourier transform (2.79) from the data vector
f = (f0 , . . . , fN −1 )T ∈ RN ,
• Is the Fourier series F_∞f of f convergent?
• If so, does the Fourier series F∞ f converge to f ?
• If so, how fast does the Fourier series F∞ f converge to f ?
In particular, we will investigate, if at all, in which sense (e.g. pointwise,
or uniformly, or with respect to the Euclidean norm · ) the convergence of
the Fourier series F∞ f holds. In Chapter 6, we will give answers, especially
for the asymptotic behaviour of the approximation error
equipped with an inner product, yielding the Euclidean norm ‖·‖_w = (·,·)_w^{1/2},
so that
‖f‖_w² = ∫_a^b |f(x)|² w(x) dx   for f ∈ C[a, b].
Later in this section, we make concrete examples for the weight function w.
To approximate functions from C [a, b], we apply Theorem 4.5, so that
we can, for f ∈ C [a, b], represent the unique best approximation s∗ ∈ Pn
to f explicitly. In order to do so, we need an orthogonal system for Pn . To
this end, we propose an algorithm, which constructs for any weighted inner
product (·, ·)w an orthogonal basis
{p0 , p1 , . . . , pn } ⊂ Pn
is the orthogonal projection of the monomial xk+1 onto Pk w.r.t. (·, ·)w .
Therefore, the polynomials p0 , . . . , pn form an orthogonal basis for Pn .
Note that the orthogonalization method of Gram-Schmidt guarantees,
for any weighted inner product (·, ·)w , the existence of an orthogonal basis
for Pn with respect to (·, ·)w . Moreover, the Gram-Schmidt construction of
orthogonal polynomials in Algorithm 4 is unique up to n + 1 (non-vanishing)
scaling factors, one for the initialization (in line 2) and one for each of the
n for loop cycles (line 4). The scaling factors could be used to normalize the
orthogonal system of polynomials, where the following options are commonly
used.
• Normalization of the leading coefficient
p0 ≡ 1 and pk (x) = xk + qk−1 (x) for some qk−1 ∈ Pk−1 for k = 1, . . . , n
• Normalization at one
pk (1) = 1 for all k = 0, . . . , n
• Normalization of norm (orthonormalization)
Let p0 := p0 /p0 w (line 2) and pk := pk /pk w (line 4), k = 1, . . . , n.
However, the Gram-Schmidt algorithm is problematic for numerical rea-
sons. In fact, on the one hand, it is unstable, especially for input bases B with
almost linearly dependent basis elements. On the other hand, the Gram-
Schmidt algorithm is very inefficient. In contrast, the following three-term
recursion is much more suitable for efficient and stable constructions of or-
thogonal polynomials.
Theorem 4.16. For any weighted inner product (·,·)_w, there are unique orthogonal
polynomials p_k ∈ 𝒫_k, for k ≥ 0, with leading coefficient one. The
orthogonal polynomials (p_k)_{k∈ℕ₀} satisfy the three-term recursion
p_k(x) − x·p_{k−1}(x) = Σ_{j=0}^{k−1} c_j p_j(x)   with   c_j = (p_k − x·p_{k−1}, p_j)_w / ‖p_j‖_w².
pk (x) = (x + ck−1 )pk−1 (x) + ck−2 pk−2 (x) = (x + ak )pk−1 (x) + bk pk−2 (x)
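For illustration, the following short Python sketch builds the monic orthogonal polynomials directly from this three-term recursion, with the coefficients $a_k = -(x\,p_{k-1}, p_{k-1})_w/\|p_{k-1}\|_w^2$ and $b_k = -\|p_{k-1}\|_w^2/\|p_{k-2}\|_w^2$ (the same pattern used for the Legendre case below); the function names and the Gauss–Legendre quadrature used for the weighted inner products are our own choices, not part of the text.

```python
import numpy as np
from numpy.polynomial import Polynomial

def monic_orthogonal_polynomials(weight, a, b, n, quad_nodes=200):
    """Monic orthogonal polynomials p_0,...,p_n w.r.t. (f,g)_w = int_a^b f g w dx,
    built with the three-term recursion p_k = (x + a_k) p_{k-1} + b_k p_{k-2}."""
    # Gauss-Legendre quadrature on [a, b] for the weighted inner product (a sketch)
    t, wq = np.polynomial.legendre.leggauss(quad_nodes)
    x = 0.5 * (b - a) * t + 0.5 * (b + a)
    wq = 0.5 * (b - a) * wq * weight(x)
    inner = lambda p, q: np.sum(wq * p(x) * q(x))

    X = Polynomial([0.0, 1.0])            # the monomial x
    polys = [Polynomial([1.0])]           # p_0 = 1
    for k in range(1, n + 1):
        p1 = polys[-1]
        ak = -inner(X * p1, p1) / inner(p1, p1)
        pk = (X + ak) * p1
        if k >= 2:
            p2 = polys[-2]
            bk = -inner(p1, p1) / inner(p2, p2)
            pk = pk + bk * p2
        polys.append(pk)
    return polys

# Example: w = 1 on [-1, 1] reproduces the monic Legendre polynomials,
# e.g. p_2(x) = x^2 - 1/3 and p_3(x) = x^3 - (3/5)x (cf. Table 4.1 below).
if __name__ == "__main__":
    P = monic_orthogonal_polynomials(lambda x: np.ones_like(x), -1.0, 1.0, 4)
    print(P[2].coef)   # approximately [-1/3, 0, 1]
    print(P[3].coef)   # approximately [0, -3/5, 0, 1]
```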
Theorem 4.18. Let g ∈ C [a, b] satisfy (g, p)w = 0 for all p ∈ Pn , i.e.,
g ⊥ Pn , for n ∈ N0 . Then, either g vanishes identically on [a, b] or g has at
least n + 1 zeros with changing sign in (a, b).
with changing sign. Then, the product g · p between g and the polynomial
$$p(x) = \prod_{j=1}^{k}(x - x_j) \in P_k \subset P_n$$
Proof. On the one hand, by Theorem 4.18, pn has at least n pairwise distinct
zeros in (a, b). Now suppose pn ≡ 0. Since pn is an algebraic polynomial in
Pn \ {0}, pn has, on the other hand, at most n zeros. Altogether, pn has
exactly n zeros in (a, b), where each zero must be simple.
already in Section 2.5. Let us first recall some of the basic properties of the
Chebyshev polynomials Tn ∈ Pn , in particular the three-term recursion from
Theorem 2.27,
by using Theorem 4.11. Theorem 4.11 also yields the stated values for the squared norms $\|T_k\|_w^2 = (T_k, T_k)_w$.
8 Gábor Szegő (1895–1985), Hungarian mathematician
Indeed, this follows directly by induction from the three-term recursion (4.28).
Due to Corollary 2.28, the n-th Chebyshev polynomial Tn has, for n ≥ 1, the
leading coefficient 2n−1 , and so the scaled polynomial
where the form of the Chebyshev partial sum in (4.32) reminds us of the form
of the Fourier partial sum Fn f from Corollary 4.12. Indeed, the coefficients
in the series expansion for the best approximation Πn f in (4.32) can be
identified as Fourier coefficients.
$$\Pi_n f = \frac{a_0}{2} + \sum_{k=1}^{n} a_k\,T_k. \qquad (4.33)$$
Proof. For f ∈ C [−1, 1], the coefficients (f, Tk )w in (4.32) can be computed
by using the substitution φ = arccos(x):
$$(f, T_k)_w = \int_{-1}^{1}\frac{f(x)\,T_k(x)}{\sqrt{1-x^2}}\,dx = \int_{0}^{\pi} f(\cos\varphi)\cos(k\varphi)\,d\varphi = \frac{\pi}{2}\cdot\frac{1}{\pi}\int_{0}^{2\pi} f(\cos x)\cos(kx)\,dx = \frac{\pi}{2}\,a_k(g),$$
$$\frac{(f, T_0)_w}{\|T_0\|_w^2} = \frac{\pi}{2}\cdot\frac{1}{\pi}\,a_0(g) = \frac{a_0(g)}{2}.$$
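The coefficient formula just derived lends itself to a direct numerical computation of the Chebyshev coefficients. The following short Python sketch is our own illustration (the midpoint rule and the function name are assumptions, not part of the text):

```python
import numpy as np

def chebyshev_coefficients(f, n, m=4000):
    """Coefficients a_0,...,a_n of the Chebyshev partial sum
    Pi_n f = a_0/2 + sum_k a_k T_k, computed from
    a_k = (2/pi) * int_0^pi f(cos(phi)) cos(k*phi) dphi  (midpoint rule)."""
    phi = (np.arange(m) + 0.5) * np.pi / m     # midpoint nodes on [0, pi]
    g = f(np.cos(phi))
    return np.array([(2.0 / m) * np.sum(g * np.cos(k * phi))
                     for k in range(n + 1)])

# Example: for f(x) = |x| the odd coefficients vanish,
# a_0 = 4/pi ~ 1.2732 and a_2 = 4/(3*pi) ~ 0.4244.
if __name__ == "__main__":
    print(chebyshev_coefficients(np.abs, 4))
```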
To verify the Clenshaw algorithm, we use the recursion for the Chebyshev
polynomials in (4.28). By the assignment in line 6 of the Clenshaw algorithm,
we get the representation
for the coefficients of the Chebyshev partial sum, where for k = n with
zn+1 = 0 and zn = an we get zn+2 = 0. The sum over the last n terms of the
Chebyshev partial sum (4.33) can be rewritten by using the representation
in (4.34) in combination with the recursion (4.28):
$$\sum_{k=1}^{n} a_k T_k(x) = \sum_{k=1}^{n}(z_k - 2x z_{k+1} + z_{k+2})\,T_k(x) = \sum_{k=1}^{n} z_k T_k(x) - \sum_{k=2}^{n+1} 2x\,z_k T_{k-1}(x) + \sum_{k=3}^{n+2} z_k T_{k-2}(x)$$
$$= z_1 T_1(x) + z_2 T_2(x) - 2x z_2 T_1(x) + \sum_{k=3}^{n} z_k\bigl[T_k(x) - 2x T_{k-1}(x) + T_{k-2}(x)\bigr]$$
$$= z_1 x + z_2(2x^2 - 1) - 2x z_2 x = z_1 x - z_2.$$
$$(\Pi_n f)(x) = \frac{a_0}{2} + \sum_{k=1}^{n} a_k T_k(x) = \frac{1}{2}\bigl(z_0 - 2x z_1 + z_2 + 2 z_1 x - 2 z_2\bigr) = \frac{1}{2}\,(z_0 - z_2).$$
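The computation above is exactly what the Clenshaw algorithm exploits. A compact Python sketch, assuming the recursion $z_k = a_k + 2x z_{k+1} - z_{k+2}$ from (4.34) and returning $(z_0 - z_2)/2$ (variable names are our own):

```python
import numpy as np

def clenshaw(a, x):
    """Evaluate a[0]/2 + sum_{k=1}^n a[k] T_k(x) via the Clenshaw recursion."""
    n = len(a) - 1
    b1, b2 = 0.0, 0.0                 # z_{k+1}, z_{k+2}, initialized to zero
    for k in range(n, 0, -1):         # k = n, ..., 1
        b1, b2 = a[k] + 2.0 * x * b1 - b2, b1
    z0 = a[0] + 2.0 * x * b1 - b2     # after the loop: b1 = z_1, b2 = z_2
    return 0.5 * (z0 - b2)

# Quick check against direct evaluation with T_k(x) = cos(k arccos x):
if __name__ == "__main__":
    a, x = [1.0, -0.5, 0.25, 0.125], 0.3
    direct = a[0] / 2 + sum(a[k] * np.cos(k * np.arccos(x)) for k in range(1, len(a)))
    print(clenshaw(a, x), direct)
```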
$$L_n(x) = \frac{n!}{(2n)!}\,\frac{d^n}{dx^n}\,(x^2-1)^n \quad\text{for } n \ge 0 \qquad (4.35)$$
We show that the Legendre polynomials are the (unique) orthogonal poly-
nomials with leading coefficient one, belonging to the weight function w ≡ 1.
Therefore, we regard the usual (unweighted) L2 inner product on C [−1, 1],
defined as
$$(f, g)_w := (f, g) = \int_{-1}^{1} f(x)\,g(x)\,dx \quad\text{for } f, g \in C[-1,1].$$
which implies
$$(L_n, L_k) = \frac{n!\,k!}{(2n)!\,(2k)!}\,I_{nk} = 0 \quad\text{for } n > k. \qquad (4.38)$$
9 Benjamin Olinde Rodrigues (1795–1851), French mathematician and banker
10 Adrien-Marie Legendre (1752–1833), French mathematician
$$L_{n+1}(x) = x\,L_n(x) - \frac{n^2}{4n^2-1}\,L_{n-1}(x) \quad\text{for } n \ge 1 \qquad (4.40)$$
with initial values L0 ≡ 1 and L1(x) = x.
By statement (b) in Theorem 4.25, the square $L_n^2$ of the Legendre polynomial is, for any n ∈ N0, an even function, and therefore $x\,L_n^2(x)$ is odd, so that $a_n = 0$ for all n ≥ 0.
$$L_1(x) = x$$
$$L_2(x) = x^2 - \tfrac{1}{3}$$
$$L_3(x) = x^3 - \tfrac{3}{5}x$$
$$L_4(x) = x^4 - \tfrac{6}{7}x^2 + \tfrac{3}{35}$$
$$L_5(x) = x^5 - \tfrac{10}{9}x^3 + \tfrac{5}{21}x$$
$$L_6(x) = x^6 - \tfrac{15}{11}x^4 + \tfrac{5}{11}x^2 - \tfrac{5}{231}$$
$$L_7(x) = x^7 - \tfrac{21}{13}x^5 + \tfrac{105}{143}x^3 - \tfrac{35}{429}x$$
$$L_8(x) = x^8 - \tfrac{28}{15}x^6 + \tfrac{14}{13}x^4 - \tfrac{28}{143}x^2 + \tfrac{7}{1287}$$
$$L_9(x) = x^9 - \tfrac{36}{17}x^7 + \tfrac{126}{85}x^5 - \tfrac{84}{221}x^3 + \tfrac{63}{2431}x$$
$$L_{10}(x) = x^{10} - \tfrac{45}{19}x^8 + \tfrac{630}{323}x^6 - \tfrac{210}{323}x^4 + \tfrac{315}{4199}x^2 - \tfrac{63}{46189}$$
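This table can be reproduced from the three-term recursion (4.40); the following short sketch uses exact rational arithmetic (the helper name and the coefficient-list representation are our own choices):

```python
from fractions import Fraction

def monic_legendre(n):
    """Monic Legendre polynomials via (4.40):
    L_{k+1}(x) = x*L_k(x) - k^2/(4k^2-1)*L_{k-1}(x), L_0 = 1, L_1 = x.
    Each polynomial is a list of exact coefficients, lowest degree first."""
    L = [[Fraction(1)], [Fraction(0), Fraction(1)]]
    for k in range(1, n):
        xLk = [Fraction(0)] + L[k]                 # multiply L_k by x
        c = Fraction(k * k, 4 * k * k - 1)
        pad = L[k - 1] + [Fraction(0)] * (len(xLk) - len(L[k - 1]))
        L.append([a - c * b for a, b in zip(xLk, pad)])
    return L

# L_4 should be x^4 - (6/7)x^2 + 3/35, as in the list above.
if __name__ == "__main__":
    print(monic_legendre(4)[4])   # [Fraction(3, 35), 0, Fraction(-6, 7), 0, 1]
```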
and therefore
$$b_{n+1} = -\frac{\|L_n\|^2}{\|L_{n-1}\|^2} = -\frac{n^4\,2^2\,(2n-1)}{(2n)^2\,(2n-1)^2\,(2n+1)} = -\frac{n^2}{(2n-1)(2n+1)} = -\frac{n^2}{4n^2-1} \quad\text{for } n \ge 1,$$
$$w(x) = e^{-x^2}.$$
with $P_{n+1}(x) = P_n'(x) - 2x\,P_n(x)$, where $P_{n+1} \in \mathcal{P}_{n+1}\setminus\mathcal{P}_n$ for $P_n \in \mathcal{P}_n\setminus\mathcal{P}_{n-1}$.
This yields, for the function $h(x,t) = e^{2xt - t^2}$, the series expansion
$$h(x,t) = w(x-t)\cdot e^{x^2} = \sum_{k=0}^{\infty} H_k(x)\,\frac{t^k}{k!} \quad\text{for all } x, t \in \mathbb{R}. \qquad (4.44)$$
On the other hand, by using the uniform convergence of the series for h(x, t) in (4.44), we have the representation
$$\int_{\mathbb{R}} e^{-x^2}\,h(x,t)\,h(x,s)\,dx = \int_{\mathbb{R}} e^{-x^2}\left(\sum_{k=0}^{\infty} H_k(x)\,\frac{t^k}{k!}\right)\left(\sum_{j=0}^{\infty} H_j(x)\,\frac{s^j}{j!}\right)dx = \sum_{k,j=0}^{\infty}\frac{t^k s^j}{k!\,j!}\int_{\mathbb{R}} e^{-x^2}\,H_k(x)\,H_j(x)\,dx. \qquad (4.46)$$
and so in particular
$$\|H_k\|_w^2 = 2^k\sqrt{\pi}\,k! \quad\text{for all } k \in \mathbb{N}_0.$$
This completes our proof.
Now we prove a three-term recursion for the Hermite polynomials.
Theorem 4.29. The Hermite polynomials satisfy the three-term recursion
Hn+1 (x) = 2xHn (x) − 2nHn−1 (x) for n ≥ 0 (4.48)
with the initial values H−1 ≡ 0 and H0 (x) ≡ 1.
Proof. Obviously, we have H0 ≡ 1. By applying partial differentiation to the
series expansion for h(x, t) in (4.44) with respect to variable t we get
$$\frac{\partial}{\partial t}\,h(x,t) = 2(x-t)\,h(x,t) = \sum_{k=1}^{\infty} H_k(x)\,\frac{t^{k-1}}{(k-1)!}$$
Moreover, we have
$$\sum_{k=0}^{\infty} H_k(x)\,\frac{t^{k+1}}{k!} = \sum_{k=0}^{\infty}(k+1)\,H_k(x)\,\frac{t^{k+1}}{(k+1)!} = \sum_{k=0}^{\infty} k\,H_{k-1}(x)\,\frac{t^{k}}{k!} \qquad (4.50)$$
$$H_1(x) = 2x$$
$$H_2(x) = 4x^2 - 2$$
whereby (4.52) follows from the three-term recursion for Hn+1 in (4.48).
Proof. Statement (a) follows by induction from the three-term recursion (4.48),
whereas statement (b) follows from (4.52) with H0 ≡ 1 and H1 (x) = 2x.
We can conclude that for the weighted L2 inner product (·, ·)w in (4.42)
the Hermite polynomials Hn are the unique orthogonal polynomials with
leading coefficient 2n . The Hermite polynomials Hn are, for n = 1, . . . , 8,
shown in their monomial form in Table 4.2.
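Analogously, this table can be reproduced from the recursion (4.48); a brief Python sketch (helper names are our own):

```python
from numpy.polynomial import Polynomial

def hermite_polynomials(n):
    """Hermite polynomials via (4.48): H_{k+1} = 2x H_k - 2k H_{k-1},
    with H_{-1} = 0 and H_0 = 1."""
    x = Polynomial([0.0, 1.0])
    H, prev = [Polynomial([1.0])], Polynomial([0.0])
    for k in range(n):
        H_next = 2 * x * H[k] - 2 * k * prev
        prev = H[k]
        H.append(H_next)
    return H

# H_1 = 2x, H_2 = 4x^2 - 2, and the leading coefficient of H_n is 2^n.
if __name__ == "__main__":
    H = hermite_polynomials(4)
    print(H[2].coef)   # [-2., 0., 4.]
    print(H[4].coef)   # [12., 0., -48., 0., 16.]
```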
4.5 Exercises
Exercise 4.32. Let F = C[−1, 1] be equipped with the Euclidean norm $\|\cdot\|_2$, defined by the inner product
$$(f, g) = \int_{-1}^{1} f(x)\,g(x)\,dx \quad\text{for } f, g \in C[-1,1],$$
$$F_n(x) = \frac{a_0}{2} + \sum_{j=1}^{n}\bigl[a_j\cos(jx) + b_j\sin(jx)\bigr] \quad\text{for } x \in [0, 2\pi)$$
Plot the graphs of R and the best approximation F10 R to R in one figure.
Plot the graphs of S and the best approximation F10 S to S in one figure.
Exercise 4.34. Approximate the function f (x) = 2x−1 on the unit interval
[0, 1] by a trigonometric polynomial of the form
$$T_n(x) = \frac{c_0}{2} + \sum_{k=1}^{n} c_k\cos(k\pi x) \quad\text{for } x \in [0,1]. \qquad (4.53)$$
(a) Among all polynomials p ∈ Pk with leading coefficient one, the ortho-
gonal polynomial $p_k$ is norm-minimal with respect to $\|\cdot\|_w$, i.e.,
$$\|p_k\|_w = \min\bigl\{\|p\|_w \;\big|\; p \in P_k \text{ with } p(x) = x^k + q(x) \text{ for } q \in P_{k-1}\bigr\}.$$
$$\sum_{j=0}^{k}\frac{p_j(x)\,p_j(y)}{\|p_j\|_w^2} = \frac{1}{\|p_k\|_w^2}\cdot\frac{p_{k+1}(x)\,p_k(y) - p_k(x)\,p_{k+1}(y)}{x-y}$$
and, moreover,
$$\sum_{j=0}^{k}\frac{(p_j(x))^2}{\|p_j\|_w^2} = \frac{p_{k+1}'(x)\,p_k(x) - p_k'(x)\,p_{k+1}(x)}{\|p_k\|_w^2} \quad\text{for all } x \in [a,b].$$
(c) Conclude from (b) that all zeros of pk are simple. Moreover, conclude
that pk+1 and pk have no common zeros.
Exercise 4.37. In this problem, make use of the results in Exercise 4.36.
(a) Prove for g ∈ C [−1, 1] and h(x) = x · g(x), for x ∈ [−1, 1], the relation
1
c0 (h) = c1 (g) and ck (h) = (ck−1 (g) + ck+1 (g)) for all k ≥ 1
2
between the Chebyshev coefficients ck (g) of g and ck (h) of h.
(b) Conclude from the relation in Exercise 4.36 (c) the representation
T2k (x) = Tk (2x2 − 1) for all x ∈ [−1, 1] and k ∈ N0 . (4.54)
(c) Can the representation in (4.54) be used to simplify the evaluation of
a Chebyshev partial sum for an even function in the Clenshaw algo-
rithm, Algorithm 5? If so, how could this simplification be used for the
implementation of the Clenshaw algorithm?
Exercise 4.38. On given coefficient functions ak ∈ C [a, b], for k ≥ 1, and
bk ∈ C [a, b], for k ≥ 2, let pk ∈ C [a, b], for k ≥ 0, be a function sequence
satisfying the three-term recursion
pk+1 (x) = ak+1 (x) pk (x) + bk+1 (x) pk−1 (x) for k ≥ 1
with initial functions p0 ∈ C [a, b] and p1 = a1 p0 ∈ C [a, b]. Show that the
sum
$$f_n(x) = \sum_{j=0}^{n} c_j\,p_j(x) \quad\text{for } x \in [a,b]$$
$$L_k(x) = \frac{k!}{(2k)!}\,\frac{d^k}{dx^k}\,(x^2-1)^k \quad\text{for } 0 \le k \le n$$
to determine the best approximation p∗n ∈ Pn , n ∈ N0 , to the exponential
function f (x) = e−x on [−1, 1] w.r.t. the (unweighted) Euclidean norm · 2 .
Compute the first eight coefficients c∗ = (c∗0 , . . . , c∗7 )T ∈ R8 of the sought
best approximation
$$p_n^*(x) = \sum_{k=0}^{n} c_k^*\,L_k(x) \quad\text{for } x \in [-1,1].$$
Hint: Use the recursions from Theorem 4.29 and Corollary 4.30.
5 Chebyshev Approximation
where
$$E_{s^*-f} = \{x \in \Omega : |(s^*-f)(x)| = \|s^*-f\|_\infty\} \subset \Omega$$
denotes the set of extremal points of s∗ − f in Ω.
where we have used the Gâteaux derivative of the norm · ∞ from Theo-
rem 3.64. By the linearity of S, this condition is equivalent to (5.1).
Given the result of Theorem 5.1, we can immediately solve one simple
problem of Chebyshev approximation. To this end, we regard the univariate
case, d = 1, where Ω = [a, b] ⊂ R for a compact interval. In this case, we
wish to approximate continuous functions from C [a, b] by constants.
$$c^* = \frac{f_{\min} + f_{\max}}{2} \in P_0$$
is the unique best approximation to f from P0 with respect to $\|\cdot\|_\infty$, where
for c < 0 on the other hand. Altogether, the Kolmogorov criterion from
Theorem 5.1,
$$\max_{x \in E_{c^*-f}} c\,\operatorname{sgn}(c^* - f(x)) \ge 0 \quad\text{for all } c \in P_0,$$
$$\omega_{X^*}(x) = \prod_{k=1}^{m-1}(x - x_k^*) \in P_{m-1} \subset P_{n-1}$$
$$\|p - f\|_\infty = |(p-f)(x_k)| \le \frac{1}{2}\,|(p^*-f)(x_k)| + \frac{1}{2}\,|(q^*-f)(x_k)| \le \frac{1}{2}\,\|p^*-f\|_\infty + \frac{1}{2}\,\|q^*-f\|_\infty = \|p^*-f\|_\infty = \|q^*-f\|_\infty,$$
equality holds for k = 1, . . . , n + 1. In particular, we have
for all 1 ≤ k ≤ n + 1.
Due to the strict convexity of the norm | · | (see Remark 3.27) and by the
equivalence statement (d) in Theorem 3.26, the signs of the error functions
p∗ − f and q ∗ − f must agree on {x1 , . . . , xn+1 }, i.e.,
Now we note another important corollary, which directly follows from our
observation in Proposition 3.42 and from Exercise 3.73.
Corollary 5.5. For L > 0 let f ∈ C [−L, L]. Moreover, let p∗ ∈ Pn , for
n ∈ N0 , be the unique best approximation to f from Pn with respect to · ∞ .
Then the following statements are true.
(a) If f is even, then its best approximation p∗ ∈ Pn is even.
(b) If f is odd, then its best approximation p∗ ∈ Pn is odd.
Proof. The linear space Pn of algebraic polynomials is reflection-invariant,
i.e., for p(x) ∈ Pn , we have p(−x) ∈ Pn . Moreover, by Corollary 5.4 there
exists for any f ∈ C [−L, L] a unique best approximation p∗ ∈ Pn to f from
Pn with respect to · ∞ . Without loss of generality, we assume L = 1. By
Proposition 3.42 and Exercise 3.73, both statements (a) and (b) hold.
For illustration, we apply Corollary 5.5 in the following two examples.
Example 5.6. We approximate fm (x) = sin(mx), for m ∈ N, on [−π, π] by
linear polynomials. The function fm is odd, for all m ∈ N, and so is the best
approximation p∗m ∈ P1 to fm odd. Therefore, p∗m has the form p∗m (x) = αm x
for a slope αm ≥ 0, which is yet to be determined.
Case 1: For m = 1, the constant c ≡ 0 cannot be a best approximation
to f1 (x) = sin(x), since c − f1 has only two alternation points ±π/2. By
symmetry, we can restrict our following investigations to the interval [0, π].
The function p∗1 (x) − f1 (x) = α1 x − sin(x), with α1 > 0, has two alternation
points {x∗ , π} on [0, π],
(p∗1 − f1 )(x∗ ) = α1 x∗ − sin(x∗ ) = −η and (p∗1 − f1 )(π) = α1 π = η,
where η = p∗1 − f1 ∞ is the minimal distance between f1 and P1 . Moreover,
the alternation point x∗ satisfies the condition
0 = (p∗1 − f1)'(x∗) = α1 − cos(x∗), which implies α1 = cos(x∗).
Therefore, x∗ is a solution of the nonlinear equation
cos(x∗ )(x∗ + π) = sin(x∗ ),
which we can solve numerically, whereby we obtain the alternation point x∗ ≈
1.3518, the slope α1 = cos(x∗ ) ≈ 0.2172 and the minimal distance η ≈ 0.6825.
Altogether, the best approximation p∗1 (x) = α1 x with {−π, −x∗ , x∗ , π} gives
four alternation points for p∗1 − f1 on [−π, π], see Figure 5.1 (a).
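The nonlinear equation for x∗ can be solved, e.g., by bisection; the following small Python sketch reproduces the numbers above (the bracketing interval [1, 2] is our own choice):

```python
import numpy as np

def bisect(g, lo, hi, tol=1e-12):
    """Simple bisection for a sign change of g on [lo, hi]."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(lo) * g(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

# cos(x)(x + pi) = sin(x) on (0, pi)
g = lambda x: np.cos(x) * (x + np.pi) - np.sin(x)
x_star = bisect(g, 1.0, 2.0)
alpha1 = np.cos(x_star)
eta = alpha1 * np.pi
print(x_star, alpha1, eta)   # approx 1.3518, 0.2172, 0.6825
```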
Case 2: For m > 1, p∗m ≡ 0 is the unique best approximation to fm .
For the minimal distance, we get $\|p_m^* - f_m\|_\infty = 1$, and the error function $p_m^* - f_m$ has 2m alternation points
$$x_k = \pm\frac{2k-1}{2m}\,\pi \quad\text{for } k = 1, 2, \dots, m,$$
see Figure 5.1 (b) for the case m = 2. ♦
[Fig. 5.1. Approximation of fm(x) = sin(mx) on [−π, π] by linear polynomials: (a) the case m = 1, (b) the case m = 2.]
with the minimal distance η = p∗2 −f ∞ , for some positive slope α > 0. More-
over, e = p∗2 − f has on the set of extremal points Ep∗2 −f = {−1, −x∗ , 0, x∗ , 1}
alternating signs ε = (1, −1, 1, −1, 1). We compute α by the alternation con-
dition at x = 1,
(p∗2 − f )(1) = η + α − 1 = η,
and so we obtain α = 1, so that p∗2 (x) = η + x2 . The local minimum x∗ of
the error function e = p∗2 − f satisfies the necessary condition
$$e'(x^*) = 2x^* - 1 = 0,$$
whereby x∗ = 1/2, so that $E_{p_2^*-f} = \{-1, -1/2, 0, 1/2, 1\}$. Finally, at x∗ = 1/2 the alternation condition
$$(p_2^* - f)(x^*) = \eta + \tfrac{1}{4} - \tfrac{1}{2} = -\eta$$
holds, whereby η = 1/8. Hence, the quadratic polynomial $p_2^*(x) = 1/8 + x^2$ is the unique best approximation to f from P2 with respect to $\|\cdot\|_\infty$. ♦
Fig. 5.2. Approximation of the function f (x) = |x| on [−1, 1] by quadratic polyno-
mials. The best approximation p∗2 ∈ P2 to f is even and convex. The set of extremal
points Ep∗2 −f = {−1, −x∗ , 0, x∗ , 1} has five alternation points.
$$\varepsilon = (\varepsilon_1,\dots,\varepsilon_m)^T \in \{\pm 1\}^m$$
$$\varphi(s^*-f) = \sum_{k=1}^{m}\lambda_k\,\varepsilon_k\,(s^*-f)(x_k) = \sum_{k=1}^{m}\lambda_k\,|(s^*-f)(x_k)| = \|s^*-f\|_\infty.$$
Definition 5.8. Let F be a linear space and M ⊂ F. Then the convex hull
conv(M) of M is the smallest convex set in F containing M, i.e.,
$$\operatorname{conv}(\mathcal{M}) = \bigcap_{\substack{\mathcal{M}\subset K\subset\mathcal{F}\\ K\ \text{convex}}} K.$$
Note that any convex combination αx + (1 − α)y, α ∈ [0, 1], can be written
as a convex combination of the points x1 , . . . , xm , y1 , . . . , yn ,
$$\alpha x + (1-\alpha)y = \alpha\sum_{j=1}^{m}\lambda_j x_j + (1-\alpha)\sum_{k=1}^{n}\mu_k y_k = \sum_{j=1}^{m}\alpha\lambda_j x_j + \sum_{k=1}^{n}(1-\alpha)\mu_k y_k,$$
$$x = \sum_{j=1}^{m}\lambda_j x_j \quad\text{with } \lambda = (\lambda_1,\dots,\lambda_m)^T \in \Lambda_m \text{ and } x_1,\dots,x_m \in \mathcal{M}$$
$$\rho_j = \frac{\mu_j(t^*)}{\sum_{k=1}^{m}\mu_k(t^*)} \ge 0 \quad\text{for } j = 1,\dots,m$$
we have
$$\sum_{j=1}^{m}\rho_j = 1$$
and
$$\sum_{j=1}^{m}\rho_j\,(x - x_j) = 0 \iff x = \sum_{j=1}^{m}\rho_j\,x_j.$$
Proof. We regard on the compact set Ln+1 = Λn+1 × Mn+1 the continuous
mapping ϕ : Ln+1 −→ F, defined as
$$\varphi(\lambda, X) = \sum_{j=1}^{n+1}\lambda_j\,x_j$$
$$0 = \sum_{j=1}^{m}\lambda_j x_j \quad\text{with } \lambda = (\lambda_1,\dots,\lambda_m)^T \in \Lambda_m \text{ and } x_1,\dots,x_m \in \mathcal{M}.$$
(a) ⇒ (b): Suppose statement (a) holds. Further suppose that 0 ∈ / conv(M).
Since conv(M) is compact, by Corollary 5.11, there is one β∗ ∈ conv(M),
β∗ = 0, of minimal Euclidean norm in conv(M). This minimum β∗ , viewed
as a best approximation from conv(M) to the origin with respect to · 2 , is
characterized by
Remark 5.13. The equivalence statement (a) in Theorem 5.12 says that the
Euclidean space Rd cannot be split by a separating hyperplane through the
origin into two half-spaces, such that M is entirely contained in one of the
two half-spaces.
|(s∗ − f )(x)|2 − 2(s∗ − f )(x)sβ (x) + s2β (x) < |(s∗ − f )(x)|2
But this is, due to the Kolmogorov criterion, Theorem 5.1, in contradiction
to the optimality of s∗ in (a),
Corollary 5.14 yields an important result concerning the characterization
of best approximations.
Corollary 5.15. For s∗ ∈ S the following statements are equivalent.
(a) s∗ is a best approximation to f ∈ C (Ω) \ S.
(b) There are m ≤ n + 1
• pairwise distinct extremal points x1 , . . . , xm ∈ Es∗ −f
• signs εj = sgn((s∗ − f )(xj )), for j = 1, . . . , m,
• coefficients λ = (λ1 , . . . , λm )T ∈ Λm with λj > 0 for all 1 ≤ j ≤ m,
satisfying
$$\varphi(s) := \sum_{j=1}^{m}\lambda_j\,\varepsilon_j\,s(x_j) = 0 \quad\text{for all } s \in S. \qquad (5.11)$$
$$0 = \sum_{j=1}^{m}\lambda_j\,\bigl((s^*-f)(x_j)\bigr)\,s_k(x_j) = \sum_{j=1}^{m}\lambda_j\,\varepsilon_j\,\|s^*-f\|_\infty\,s_k(x_j) = \|s^*-f\|_\infty\sum_{j=1}^{m}\lambda_j\,\varepsilon_j\,s_k(x_j)$$
εj · εj+1 = −1 for j = 1, . . . , m − 1
$$\varphi(u) = \sum_{k=1}^{m}\lambda_k\,\varepsilon_k\,u(x_k) \quad\text{for } u \in C(\Omega) \qquad (5.12)$$
satisfying the dual characterization (5.11) of Corollary 5.15 for a point set
X = {x1 , . . . , xm } ⊂ Es∗ −f , where 2 ≤ m ≤ n + 1. Then, we have for any
s ∈ S the estimates
$$\|s - f\|_\infty \ge \|s - f\|_{\infty,X} \ge \|s^* - f\|_\infty + \frac{\lambda_{\min}}{1 - \lambda_{\min}}\,\|s^* - s\|_{\infty,X}, \qquad (5.13)$$
where λmin := min1≤j≤m λj > 0.
Since m ≥ 2, we have λmin ∈ (0, 1/2] and so λmin/(1 − λmin) ∈ (0, 1].
Now let $x_{j^*} \in X$ be a point satisfying $|(s - s^*)(x_{j^*})| = \|s - s^*\|_{\infty,X}$. If $\varepsilon_{j^*}(s - s^*)(x_{j^*}) = \|s - s^*\|_{\infty,X}$, then the second estimate in (5.13) is satisfied, with λmin/(1 − λmin) ≤ 1. Otherwise, we have $\varepsilon_{j^*}(s - s^*)(x_{j^*}) = -\|s - s^*\|_{\infty,X}$, whereby with ϕ(s − s∗) = 0 the estimate
$$\lambda_{j^*}\,\|s-s^*\|_{\infty,X} = \sum_{\substack{k=1\\ k\ne j^*}}^{m}\lambda_k\,\varepsilon_k\,(s-s^*)(x_k) \le (1-\lambda_{j^*})\,\max_{k\ne j^*}\varepsilon_k\,(s-s^*)(x_k) \qquad (5.15)$$
$$\frac{\lambda_{\min}}{1-\lambda_{\min}}\,\|s^*-s\|_{\infty,X} \le \frac{\lambda_{j^*}}{1-\lambda_{j^*}}\,\|s^*-s\|_{\infty,X} \le \varepsilon_{k^*}\,(s-s^*)(x_{k^*}),$$
$$\|s-f\|_\infty - \|s^*-f\|_\infty \ge \frac{\lambda_{\min}}{1-\lambda_{\min}}\,\|s-s^*\|_{\infty,X} \quad\text{for all } s \in S. \qquad (5.16)$$
Given this result, we can further analyze the question for the (strong) unique-
ness of best approximations to f ∈ C (Ω) \ S. To this end, we first take note
of the following simple observation.
$$0 = \|s^{**}-f\|_\infty - \|s^*-f\|_\infty \ge \frac{\lambda_{\min}}{1-\lambda_{\min}}\,\|s^{**}-s^*\|_{\infty,X}$$
by (5.16), and this implies, for λmin ∈ (0, 1), the identity
$$\|s^{**}-s^*\|_{\infty,X} = 0.$$
$$\sum_{j=1}^{m}\lambda_j\,\varepsilon_j\,p(x_j) = 0 \quad\text{for all } p \in P_2. \qquad (5.18)$$
λ1 + . . . + λm = 1 (5.19)
We pose the conditions from (5.20) to the three elements of the monomial
basis {1, x, x2 } of P2 . For p ≡ 1 we get −λ1 + λ2 − λ3 + λ4 = 0, whereby
from (5.19) we get
$$\lambda_{\min} = \frac{1}{8} \quad\text{and}\quad \frac{\lambda_{\min}}{1-\lambda_{\min}} = \frac{1}{7}.$$
The characterization (5.13) in Theorem 5.17 implies the estimate
$$\|p-f\|_\infty - \|p_2^*-f\|_\infty \ge \frac{1}{7}\,\|p-p_2^*\|_{\infty,X} \quad\text{for all } p \in P_2. \qquad (5.23)$$
Next, we show the strong uniqueness of p∗2 , where we use Theorem 5.19.
To this end, note that · ∞,X is a norm on P2 . Therefore, it remains to
determine an equivalence constant β > 0, like in (5.17), satisfying
for all p ∈ P2, whereby (5.24) holds for β = 1/7. Together with (5.23), this finally yields the sought estimate
$$\|p-f\|_\infty - \|p_2^*-f\|_\infty \ge \frac{1}{7}\,\|p-p_2^*\|_{\infty,X} \ge \frac{1}{49}\,\|p-p_2^*\|_\infty \quad\text{for all } p \in P_2.$$
Therefore, p∗2 is the strongly unique best approximation to f . ♦
sX = 0 =⇒ s ≡ 0 on Ω,
According to the Mairhuber4 -Curtis5 theorem [17, 48] there are no non-
trivial Haar systems on multivariate connected domains Ω ⊂ Rd , d > 1.
Before we prove the Mairhuber-Curtis theorem, we introduce a few notions.
Fig. 5.3. According to the Mairhuber-Curtis theorem, Theorem 5.25, there are no
non-trivial Haar systems H on domains Ω containing bifurcations.
If d{x1 ,x2 ,x3 ...,xn } = 0, then H, by Theorem 5.23, is not a Haar system.
Otherwise, we can shift the two points x1 and x2 by a continuous mapping
along the two branches of the bifurcation, without any coincidence between
points in X (see Figure 5.4).
Therefore, the determinant d{x2 ,x1 ,x3 ,...,xn } has, by swapping the first two
columns in matrix VH,X , opposite sign to d{x1 ,x2 ,x3 ,...,xn } , i.e.,
sgn d{x1 ,x2 ,x3 ,...,xn } = −sgn d{x2 ,x1 ,x3 ,...,xn } .
Fig. 5.4. Illustration of the Mairhuber-Curtis theorem, Theorem 5.25. The two
points x1 and x2 can be swapped by a continuous mapping, i.e., by shifts along the
branches of the bifurcation without coinciding with any other point from X.
Due to the continuity of the determinant, there must be a sign change of the
determinant during the (continuous) swapping between x1 and x2 . In this
case, H = {s1 , . . . , sn } cannot be a Haar system, by Theorem 5.23. But this
is in contradiction to our assumption on H.
Due to the result of the Mairhuber-Curtis theorem, Theorem 5.25, we restrict ourselves from now on to the univariate case, d = 1. Moreover, we assume from now on that the domain Ω is a compact interval, i.e.,
Ω = [a, b] ⊂ R for − ∞ < a < b < ∞.
Before we continue our analysis on strongly unique best approximations,
we first give a few elementary examples for Haar spaces.
Example 5.26. For n ∈ N0 and [a, b] ⊂ R the linear space of polynomials Pn
is a Haar space of dimension n+1 on [a, b], since according to the fundamental
theorem of algebra any non-trivial polynomial from Pn has at most n zeros.
♦
Example 5.27. For N ∈ N0 the linear space TNC of all complex trigonometric
polynomials of degree at most N is a Haar space of dimension N + 1 on
[0, 2π), since TNC is, by Theorem 2.36, a linear space of dimension N + 1, and,
moreover, the linear mapping p −→ pX , for p ∈ TNC is, due to Theorem 2.39,
for all sets X ⊂ [0, 2π) of |X| = N + 1 pairwise distinct points bijective.
Likewise, we can show, by using Corollaries 2.38 and 2.40, that the linear
space TnR of all real trigonometric polynomials of degree at most n ∈ N0 is a
Haar space of dimension 2n + 1 on [0, 2π). ♦
Example 5.28. For [a, b] ⊂ R and λ0 < . . . < λn the functions
$$\bigl\{\,e^{\lambda_0 x},\ \dots,\ e^{\lambda_n x}\,\bigr\}$$
are a Haar system on [a, b]. We can show this by induction on n.
Initial step: For n = 0 the statement is trivial.
Induction hypothesis: Suppose the statement is true for n − 1 ∈ N.
Induction step (n − 1 −→ n): If a function of the form
$$u(x) \in \operatorname{span}\bigl\{e^{\lambda_0 x},\dots,e^{\lambda_n x}\bigr\}$$
has n + 1 zeros in [a, b], then the function
$$v(x) = \frac{d}{dx}\Bigl(e^{-\lambda_0 x}\cdot u(x)\Bigr) \quad\text{for } x \in [a,b]$$
has, according to the Rolle theorem, at least n zeros in [a, b]. However,
$$v(x) \in \operatorname{span}\bigl\{e^{(\lambda_1-\lambda_0)x},\dots,e^{(\lambda_n-\lambda_0)x}\bigr\},$$
Example 5.29. The functions f1 (x) = x and f2 (x) = ex are not a Haar
system on [0, 2]. This is because dim(S) = 2 for S = span{f1 , f2 }, but the
continuous function
f (x) = ex − 3x ≡ 0
has by f (0) = 1, f (1) = e − 3 < 0 and f (2) > 0 at least two zeros in [0, 2].
Therefore, S cannot be a Haar space on [0, 2]. ♦
Example 5.30. For [a, b] ⊂ R let g ∈ C n+1 [a, b] satisfy g (n+1) (x) > 0 for all
x ∈ [a, b]. Then, the functions {1, x, . . . , xn , g} are a Haar system on [a, b]:
First note that the functions 1, x, . . . , xn , g(x) are linearly independent, since
from
α0 1 + α1 x + . . . + αn xn + αn+1 g(x) ≡ 0 for x ∈ [a, b]
we can conclude αn+1 g (n+1) (x) ≡ 0 after (n + 1)-fold differentiation, whereby
αn+1 = 0. The remaining coefficients α0 , . . . , αn do also vanish, since the
monomials 1, x, . . . , xn are linearly independent. Moreover, we can show that
any function u ∈ span{1, x, . . . , xn , g} \ {0} has at most n + 1 zeros in [a, b]:
Suppose
$$u(x) = \sum_{j=0}^{n}\alpha_j x^j + \alpha_{n+1}\,g(x) \not\equiv 0$$
satisfying ϕ(S) = 0, where m ≤ n+1. For the case of Haar spaces S ⊂ C [a, b]
the length of the dual functional in (5.25) is necessarily m = n + 1. Let us
take note of this important observation.
Proposition 5.31. Let ϕ : C [a, b] −→ R be a functional of the form (5.25),
where m ≤ n + 1. Moreover, let S ⊂ C [a, b] be a Haar space of dimension
dim(S) = n ∈ N on [a, b]. If ϕ(S) = {0}, then we have m = n + 1.
Proof. Suppose m ≤ n. Then, due to Theorem 5.23 (c), the Haar space S
contains one element s ∈ S satisfying s(xj ) = εj , for all 1 ≤ j ≤ m. But for
this s, we find ϕ(s) = λ1 = 1, in contradiction to ϕ(S) = {0}.
$$d_k = \det\bigl(V_{H,\,X\setminus\{x_k\}}\bigr) \ne 0 \quad\text{for } 1 \le k \le n+1$$
$$\varepsilon_k = (-1)^{k-1}\sigma \quad\text{for } 1 \le k \le n+1$$
satisfying d(0) = dk+1 and d(1) = dk must have a sign change on (0, 1). Due
to the continuity of d there is one α∗ ∈ (0, 1) satisfying d(α∗ ) = 0. However,
in this case, the Vandermonde matrix VH,(x1 ,...,xk−1 ,γ(α∗ ),xk+2 ,...,xn+1 ) ∈ Rn×n
is singular. Due to Theorem 5.23 (d), the elements in (s1 , . . . , sn ) are not a
Haar system on I ⊂ R. But this is in contradiction to our assumption.
(b): According to the Laplace expansion (here with respect to the first row), the determinant of $A_{\varepsilon,H,X}$ has the representation
$$\det(A_{\varepsilon,H,X}) = \sum_{k=1}^{n+1}(-1)^{k+1}\,(-1)^{k-1}\sigma\cdot d_k = \sigma\sum_{k=1}^{n+1} d_k.$$
By using the results of Propositions 5.31 and 5.32, we can prove the
alternation theorem, being the central result of this chapter. According to the
alternation theorem, the signs ε = (ε1 , . . . , εn+1 ) of the dual characterization
in (5.25) are for the case of Haar spaces S alternating. Before we prove the
alternation theorem, we first give a formal definition for alternation sets.
with the alternation matrix Aε,H,X on the left hand side in (5.28). According
to Proposition 5.32 (a), the matrix Aε,H,X is non-singular. Therefore, the
products εk λk , for 1 ≤ k ≤ n + 1, uniquely solve the linear system (5.28).
Due to the Cramer rule we have the representation
$$\varepsilon_k\,\lambda_k = \frac{(-1)^{k-1}\,d_k}{\det(A_{\varepsilon,H,X})} \quad\text{for all } 1 \le k \le n+1,$$
where according to Proposition 5.32 (a) the signs of the n + 1 determinants $d_k = \det(V_{H,X\setminus\{x_k\}})$, for 1 ≤ k ≤ n + 1, are constant. This implies $\varepsilon_k\lambda_k \ne 0$, and, moreover, there is one unique vector λ = (λ1, ..., λn+1)^T ∈ Λn+1 with positive coefficients
$$\lambda_k = \frac{d_k}{\sum_{j=1}^{n+1} d_j} > 0 \quad\text{for all } 1 \le k \le n+1$$
which solves the linear system (5.28). This solution λ ∈ Λn+1 of (5.28) finally
yields the characterizing functional (according to Corollary 5.15),
$$\varphi(u) = \sum_{j=1}^{n+1}\lambda_j\,\varepsilon_j\,u(x_j) \quad\text{for } u \in C(I_K), \qquad (5.29)$$
satisfying ϕ(S) = {0}. Due to Corollary 5.15, s∗ is the (strongly unique) best
approximation to f .
Now suppose that s∗ ∈ S is the strongly unique best approximation to
f ∈ C (IK ) \ S. Recall that the dual characterization in Corollary 5.15 proves
the existence of a functional ϕ : C (IK ) −→ R of the form (5.25) satisfying
ϕ(S) = {0}, where ϕ has, according to Proposition 5.31, length m = n + 1.
We show that the point set X = (x1 , . . . , xn+1 ) ∈ Esn+1 ∗ −f (from the dual
with the right hand side fX = (f (x1 ), . . . , f (xn+1 ))T ∈ Rn+1 and the alter-
nation matrix Aε,H,X ∈ R(n+1)×(n+1) in (5.27), containing the sign vector
ε = (−1, 1, . . . , (−1)n+1 ) ∈ {±1}n+1 , or,
$$\begin{bmatrix} -1 & s_1(x_1) & \cdots & s_n(x_1) \\ 1 & s_1(x_2) & \cdots & s_n(x_2) \\ \vdots & \vdots & & \vdots \\ (-1)^{n+1} & s_1(x_{n+1}) & \cdots & s_n(x_{n+1}) \end{bmatrix}\begin{bmatrix}\eta_X \\ \alpha_1^* \\ \vdots \\ \alpha_n^*\end{bmatrix} = \begin{bmatrix} f(x_1) \\ f(x_2) \\ \vdots \\ f(x_{n+1})\end{bmatrix}.$$
$$\omega_k(x) = \prod_{j=1}^{k}(x - x_j) \in P_k \quad\text{for } 0 \le k \le n-1.$$
$$s_X^* = \sum_{k=0}^{n-1}[x_1,\dots,x_{k+1}](f - \eta_X\,\varepsilon)\,\omega_k \ \in P_{n-1} \qquad (5.34)$$
Indeed, due to Corollary 2.18 (b), all polynomials from Pn−1 are contained
in the kernel of [x1 , . . . , xn+1 ]. In particular, we have [x1 , . . . , xn+1 ](s∗X ) = 0.
Under the alternation condition (5.32), s∗X ∈ Pn−1 is the unique solution
of the interpolation problem
already for the first n alternation points $(x_1,\dots,x_n) \in E_{s_X^*-f}^{\,n}$. This gives the stated Newton representation of $s_X^*$ in (5.34).
  X   f_X                              X   ε_X
  0   1                                0   −1
  1   e     e − 1                      1    1     2
  2   e²    e(e − 1)   (e − 1)²/2      2   −1    −2    −2

Hereby we obtain
$$\eta_X = \frac{[0,1,2](f)}{[0,1,2](\varepsilon)} = -\left(\frac{e-1}{2}\right)^2 \quad\text{and so}\quad \|s_X^* - f\|_{\infty,X} = \left(\frac{e-1}{2}\right)^2.$$
Moreover,
$$s_X^* = [0](f - \eta_X\varepsilon) + [0,1](f - \eta_X\varepsilon)\,x = 1 - \left(\frac{e-1}{2}\right)^2 + \frac{e^2-1}{2}\,x$$
is the unique best approximation to f from P1 w.r.t. $\|\cdot\|_{\infty,X}$ (see Fig. 5.5 (a)).
♦
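The divided-difference computations of this example are easy to reproduce numerically; a short Python sketch (function names are our own):

```python
import numpy as np

def divided_differences(x, y):
    """Newton divided differences: returns [x_1]y, [x_1,x_2]y, ..., [x_1,...,x_m]y."""
    c = np.array(y, dtype=float)
    coef = [c[0]]
    for k in range(1, len(x)):
        c = (c[1:] - c[:-1]) / (np.array(x[k:]) - np.array(x[:-k]))
        coef.append(c[0])
    return np.array(coef)

# Example 5.37: f(x) = e^x on the reference set X = (0, 1, 2), signs (-1, 1, -1).
X = [0.0, 1.0, 2.0]
f = np.exp(X)
eps = np.array([-1.0, 1.0, -1.0])
df, de = divided_differences(X, f), divided_differences(X, eps)
eta = df[-1] / de[-1]                       # eta_X = [0,1,2](f) / [0,1,2](eps)
print(eta, -((np.e - 1) / 2) ** 2)          # both approx -0.7381
coef = divided_differences(X, f - eta * eps)
print(coef[0], coef[1])                     # 1 - ((e-1)/2)^2 and (e^2 - 1)/2
```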
[Fig. 5.5. Best approximations s∗j ∈ P1 to f on the reference sets Xj, for j = 0, 1.]
  X      f_X                          X      ε_X
 −1      1                           −1      −1
 −1/2    1/2    −1                   −1/2     1      4
  0      0      −1     0              0      −1     −4     −8
  1/2    1/2     1     2    4/3       1/2     1      4      8    32/3
Now we describe the iteration steps of the Remez algorithm. At any Remez
step the current reference set (in increasing order)
With the Remez exchange, the point x∗ is swapped for the point x̂ ∈ X, such
that the points of the new reference set X+ are in increasing order, i.e.,
$$a \le x_1^+ < x_2^+ < \dots < x_n^+ < x_{n+1}^+ \le b,$$
$$\operatorname{sgn}\bigl((s_X^* - f)(x_j^+)\bigr) = (-1)^j\,\sigma \quad\text{for } 1 \le j \le n+1$$
for some σ ∈ {±1}. The exchange for the point pair (x̂, x∗ ) ∈ X × [a, b] \ X
is described by the Remez exchange, Algorithm 8.
These conditions are required for the performance of the Remez algorithm.
The Remez algorithm generates a sequence (Xk )k∈N0 ⊂ [a, b]n+1 of reference
sets, so that for the transition from X = Xk to X+ = Xk+1 , for any k ∈ N0 ,
all three conditions in Remark 5.39 are satisfied. The corresponding sequence
of best approximations s∗k ∈ S to f with respect to · ∞,Xk satisfying
Remark 5.40. We remark that the construction of the reference set Xk+1
in line 11 of Algorithm 9 can be accomplished by a Remez exchange step,
Algorithm 8. In this case, all three conditions in lines 12-15 of Algorithm 9
are satisfied, according to Remark 5.39.
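For orientation, the following is a much-simplified Python sketch of one possible implementation for S = P_{n−1} on [a, b]: it solves the alternation system on the current reference set, locates a point of maximal error on a fine grid, and performs a simple nearest-point exchange. This replaces the exchange rule of Algorithm 8 and the stopping criteria of Algorithm 9 by crude substitutes; all names, the grid, and the stopping rule are our own choices, not the book's.

```python
import numpy as np

def remez(f, n, a, b, iters=20, grid=2000):
    """Sketch of a Remez iteration for S = P_{n-1} on [a, b]."""
    # start with Chebyshev-like points as initial reference set (our choice)
    X = np.cos(np.linspace(np.pi, 0, n + 1)) * (b - a) / 2 + (a + b) / 2
    xs = np.linspace(a, b, grid)
    for _ in range(iters):
        # solve p(x_j) + (-1)^j eta = f(x_j) for the coefficients of p and eta
        A = np.vander(X, n, increasing=True)
        A = np.column_stack([A, (-1.0) ** np.arange(n + 1)])
        sol = np.linalg.solve(A, f(X))
        c, eta = sol[:-1], sol[-1]
        err = f(xs) - np.polyval(c[::-1], xs)
        j = np.argmax(np.abs(err))
        if np.abs(err[j]) <= np.abs(eta) * (1 + 1e-12):
            break                                   # reference error equals max error
        # simplified exchange: swap in the new extremal point for its nearest neighbour
        X[np.argmin(np.abs(X - xs[j]))] = xs[j]
    return c, eta, X

# Example: best linear approximation to e^x on [0, 2];
# |eta| should come out near 0.7579 (cf. Example 5.45).
if __name__ == "__main__":
    c, eta, X = remez(np.exp, 2, 0.0, 2.0)
    print(c, abs(eta))
```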
$$\varphi(u) = \sum_{j=1}^{n+1}\lambda_j^{(k)}\,\varepsilon_j^{(k)}\,u\bigl(x_j^{(k)}\bigr) \quad\text{for } u \in C[a,b],$$
Proposition 5.41. Let the assumptions from the Remez algorithm be satis-
fied. Then, for any step k ∈ N0 , where the Remez iteration does not terminate,
we have the monotonicity of the minimal distances,
ηk+1 > ηk .
$$\eta_{k+1} = \sum_{j=1}^{n+1}\lambda_j^{(k+1)}\,\varepsilon_j^{(k+1)}\,(s_{k+1}^*-f)\bigl(x_j^{(k+1)}\bigr) = \sum_{j=1}^{n+1}\lambda_j^{(k+1)}\,\varepsilon_j^{(k+1)}\,(s_k^*-f)\bigl(x_j^{(k+1)}\bigr) = \sum_{j=1}^{n+1}\lambda_j^{(k+1)}\,\bigl|(s_k^*-f)\bigl(x_j^{(k+1)}\bigr)\bigr|$$
where
$$\varepsilon_j^{(k+1)} = \operatorname{sgn}\bigl((s_k^*-f)(x_j^{(k+1)})\bigr) \quad\text{for all } 1 \le j \le n+1$$
Suppose the statement is false. Then, there are sequences of reference sets
(Xk )k , signs (ε(k) )k , and coefficients (λ(k) )k satisfying
$$\eta_k = -\sum_{j=1}^{n+1}\lambda_j^{(k)}\,\varepsilon_j^{(k)}\,f\bigl(x_j^{(k)}\bigr) \ge \eta_0 > 0 \quad\text{for all } k \in \mathbb{N}_0, \qquad (5.37)$$
But the elements of the sequences (Xk )k , (ε(k) )k , and (λ(k) )k lie in com-
pact sets, respectively. Therefore, there are convergent subsequences with
$$x_j^{(k_\ell)} \longrightarrow x_j \in [a,b], \qquad \varepsilon_j^{(k_\ell)} \longrightarrow \varepsilon_j \in \{\pm 1\}, \qquad \lambda_j^{(k_\ell)} \longrightarrow \lambda_j \in [0,1] \qquad\text{for } \ell \to \infty,$$
for all 1 ≤ j ≤ n + 1, where $\lambda_{j^*} = 0$ for one index $j^* \in \{1,\dots,n+1\}$.
Now we regard an interpolant s ∈ S satisfying $s(x_j) = f(x_j)$ for all 1 ≤ j ≤ n + 1, j ≠ j∗. Then, we have
$$\eta_{k_\ell} = \sum_{j=1}^{n+1}\lambda_j^{(k_\ell)}\varepsilon_j^{(k_\ell)}(s_{k_\ell}^*-f)\bigl(x_j^{(k_\ell)}\bigr) = \sum_{j=1}^{n+1}\lambda_j^{(k_\ell)}\varepsilon_j^{(k_\ell)}(s-f)\bigl(x_j^{(k_\ell)}\bigr)$$
$$= \sum_{\substack{j=1\\ j\ne j^*}}^{n+1}\lambda_j^{(k_\ell)}\varepsilon_j^{(k_\ell)}(s-f)\bigl(x_j^{(k_\ell)}\bigr) + \lambda_{j^*}^{(k_\ell)}\varepsilon_{j^*}^{(k_\ell)}(s-f)\bigl(x_{j^*}^{(k_\ell)}\bigr)$$
$$\longrightarrow \sum_{\substack{j=1\\ j\ne j^*}}^{n+1}\lambda_j\,\varepsilon_j\,(s-f)(x_j) + \lambda_{j^*}\,\varepsilon_{j^*}\,(s-f)(x_{j^*}) = 0 \quad\text{for } \ell \to \infty.$$
and so
$$\eta - \eta_{k+1} < \bigl(1 - \lambda_{j^*}^{(k+1)}\bigr)\,(\eta - \eta_k).$$
By Lemma 5.42, there is one α > 0 satisfying $\lambda_j^{(k+1)} \ge \alpha$, for all 1 ≤ j ≤ n + 1 and all k ∈ N0. Therefore, the stated contraction (5.38) holds for θ = 1 − α ∈ (0, 1). From this, we get the estimate
We can conclude that the sequence (s∗k )k ⊂ S of the strongly unique best
approximations to f on Xk converges to the strongly unique best approxi-
mation s∗ to f .
Finally, we discuss one important observation. We note that for the ap-
proximation of strictly convex functions f ∈ C [a, b] by linear polynomials,
the Remez algorithm may return the best approximation s∗ ∈ P1 to f after
only one step.
(f − s)(λx + (1 − λ)y)
= f (λx + (1 − λ)y) − m · (λx + (1 − λ)y) − c
< λf (x) + (1 − λ)f (y) − m · (λx + (1 − λ)y) − c
= λf (x) − λmx − λc + (1 − λ)f (y) − (1 − λ)my − (1 − λ)c
= λ(f − s)(x) + (1 − λ)(f − s)(y)
$$(f - s^*)(a) = \|f - s^*\|_\infty = (f - s^*)(b).$$
$$m^* = \frac{f(b) - f(a)}{b - a} = [a, b](f).$$
$$(f - s_0^*)(a) = \sigma\,\|f - s_0^*\|_{\infty,X_0} = (f - s_0^*)(b) \quad\text{for some } \sigma \in \{\pm 1\},$$
$$(f - s_0^*)(x^*) < (f - s_0^*)(x) \quad\text{for all } x \in [a,b],\ x \ne x^*. \qquad (5.40)$$
By the strict convexity of f − s∗0, we can further conclude the strict inequality
or,
$$\rho_0 = \|f - s_0^*\|_\infty = |(f - s_0^*)(x^*)| > |(f - s_0^*)(x_0)| = \|f - s_0^*\|_{\infty,X_0} = \eta_0. \qquad (5.41)$$
By (5.40) and (5.41), the point x∗ is the unique global maximum of |f −s∗0 |
on [a, b]. Therefore, x∗ is the only candidate for the required Remez exchange
(in line 5 of Algorithm 8) for x0 . After the execution of the Remez exchange,
we have X1 = (a, x∗ , b), so that the Remez algorithm immediately terminates
with returning s∗1 = s∗ .
According to Proposition 5.44, the Remez algorithm returns already after the
next iteration the best approximation s∗1 to f .
Finally, we compute s∗1 , the best approximation to f for the reference set
X1 = (0, x∗ , 2). To this end, we proceed as in Example 5.37, where we first
determine the required divided differences for f and ε = (−1, 1, −1) by using
the recursion in Theorem 2.14:
  X    f_X                                                           X    ε_X
  0    1                                                             0    −1
  x*   (e²−1)/2   (e²−3)/(2x*)                                       x*    1     2/x*
  2    e²         (e²+1)/(2(2−x*))   [(e²−1)(x*−1)+2]/(2(2−x*)x*)    2    −1    −2/(2−x*)   −2/((2−x*)x*)
From this we compute the minimal distance $\|s_1^* - f\|_{\infty,X_1} = -\eta_1 \approx 0.7579$ by
$$\eta_1 = -\frac{1}{4}\bigl[(e^2-1)(x^*-1)+2\bigr]$$
and the best approximation to f from P1 with respect to $\|\cdot\|_{\infty,X_1}$ by
$$s_1^* = [0](f - \eta_{X}\varepsilon) + [0, x^*](f - \eta_{X}\varepsilon)\,x = 1 + \eta_1 + \frac{e^2 - 4\eta_1 - 3}{2x^*}\,x.$$
By Proposition 5.44, the Remez algorithm terminates with the reference set
X1 = Es∗1 −f , so that by s∗1 ∈ P1 the unique best approximation to f with
respect to · ∞ is found. Figure 5.5 shows the best approximations s∗j ∈ P1
to f for the reference sets Xj , for j = 0, 1. ♦
5.5 Exercises
Exercise 5.46. Let F = C [−1, 1] be equipped with the maximum norm
· ∞ . Moreover, let f ∈ P3 \ P2 be a cubic polynomial, i.e., has the form
for s∗ and f (see Corollary 5.4). For the dual characterization of the best
approximation p∗ ∈ Pn−1 we use, as in (5.6), a linear functional ϕ ∈ F of
the form
$$\varphi(u) = \sum_{k=1}^{n+1}\lambda_k\,\varepsilon_k\,u(x_k) \quad\text{for } u \in C[a,b]$$
a ≤ x 1 < . . . < xn ≤ b
Exercise 5.51. Let F = C [0, 2] be equipped with the maximum norm ·∞ .
Determine the strongly unique best approximation p∗ ∈ P1 from P1 to the
function f ∈ C [0, 2], defined as
$$p^*(x) = \sum_{k=0}^{n-1}\alpha_k\,\omega_k(x) \quad\text{where}\quad \omega_k(x) = \prod_{j=1}^{k}(x - x_j) \in P_k \ \text{ for } 0 \le k \le n-1$$
$$p(x) = \sum_{k=0}^{n-1}\alpha_k\,\omega_k(x) \in P_{n-1},$$
Exercise 5.62. Analyze for the case S = Pn−1 the asymptotic computa-
tional complexity for only one iteration of the Remez algorithm, Algorithm 9.
(a) Determine the costs for the minimal distance ηk = s∗k − f ∞,Xk .
Hint: Use divided differences (according to Proposition 5.35).
(b) Determine the costs for computing the Newton coefficients of s∗k .
Hint: Reuse the divided differences from (a).
(c) Sum up the required asymptotic costs in (a) and (b).
How do you efficiently compute the update ηk+1 from information that is
required to compute ηk ?
$$\rho_k = \|s_k^* - f\|_\infty \quad\text{for } k \in \mathbb{N}_0$$
between f ∈ C[a, b] and the current best approximation $s_k^* \in S$ to f, for the current reference set $X_k = (x_1^{(k)},\dots,x_{n+1}^{(k)}) \in [a,b]^{n+1}$ and w.r.t. $\|\cdot\|_{\infty,X_k}$.
Show that the sequence (ρk )k∈N0 is not necessarily strictly increasing. To
this end, construct a simple (but non-trivial) counterexample.
6 Asymptotic Results
$$(F_n f)(x) = \frac{(f,1)}{2} + \sum_{j=1}^{n}\bigl[(f,\cos(j\cdot))\cos(jx) + (f,\sin(j\cdot))\sin(jx)\bigr],$$
and with respect to the maximum norm · ∞ . To this end, we first show for
continuous functions f ∈ C2π convergence of Fn f to f with respect to · ,
and then we prove convergence rates, for f ∈ C2π
k
, k ∈ N0 , of the form
Fn f − f ∞ for n → ∞.
Likewise, we will also discuss the algebraic case for the approximation to
f ∈ C [a, b] by partial sums Pn f from Pn .
s − f < ε.
Now we can give a more concise formulation for the above two questions.
• Are the algebraic polynomials P dense in C [a, b] with respect to · ∞ ?
• Are the trigonometric polynomials T dense in C2π with respect to · ∞ ?
S = F,
or, in other words: For any f ∈ F, there is a convergent sequence (sn )n∈N in
S with limit f , so that sn − f −→ 0 for n → ∞.
Example 6.4. The set Q of rational numbers is dense in the set R of real
numbers with respect to the absolute-value function | · |. ♦
Now let us turn to the Weierstrass theorems, for which there exist many
different proofs (see, e.g. [33]). Our constructive proof for the algebraic case of
the Weierstrass theorem relies on a classical account via Korovkin sequences.
$$\sum_{j=0}^{n}\beta_j^{(n)}(x) = 1 \quad\text{for all } x \in [0,1].$$
2 Pavel Petrovich Korovkin (1913–1985), Russian mathematician
3 Sergei Natanovich Bernstein (1880–1968), Russian mathematician
Note that property (c) holds by the binomial theorem, whereas properties (a)
and (b) can be verified by elementary calculations (cf. Exercise 6.83).
By using the Bernstein polynomials in (6.1) we can make an important
example for monotone linear operators on C [0, 1].
Definition 6.8. For n ∈ N, the Bernstein operator Bn : C [0, 1] −→ Pn
is defined as
$$(B_n f)(x) = \sum_{j=0}^{n} f(j/n)\,\beta_j^{(n)}(x) \quad\text{for } f \in C[0,1], \qquad (6.2)$$
where $\beta_0^{(n)},\dots,\beta_n^{(n)} \in P_n$ are the Bernstein polynomials in (6.1).
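A direct implementation of (6.2) is straightforward; the following Python sketch (our own illustration, not part of the text) also hints at the slow uniform convergence of Bn f for a function with a kink, where the maximum error decays only roughly like n^{-1/2}:

```python
import numpy as np
from math import comb

def bernstein(f, n, x):
    """Evaluate (B_n f)(x) = sum_j f(j/n) * C(n,j) x^j (1-x)^(n-j) on [0, 1]."""
    x = np.asarray(x, dtype=float)
    j = np.arange(n + 1)
    binom = np.array([comb(n, k) for k in j], dtype=float)
    B = binom * x[..., None] ** j * (1 - x[..., None]) ** (n - j)
    return B @ f(j / n)

if __name__ == "__main__":
    xs = np.linspace(0, 1, 1001)
    f = lambda t: np.abs(t - 0.5)            # kink at 1/2
    for n in (10, 100, 1000):
        print(n, np.max(np.abs(bernstein(f, n, xs) - f(xs))))
```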
The Bernstein operators Bn are obviously linear on C [0, 1]. By the posi-
(n)
tivity of the Bernstein polynomials βj , Remark 6.7 (b), the Bernstein ope-
rators Bn are, moreover, positive (and therefore monotone) on C [0, 1]. We
note yet another elementary property of the operators Bn .
Remark 6.9. The Bernstein operators Bn : C[0,1] → Pn in (6.2) are bounded on C[0,1] with respect to $\|\cdot\|_\infty$, since for any f ∈ C[0,1] we have
$$\|B_n f\|_\infty = \Bigl\|\sum_{j=0}^{n} f(j/n)\,\beta_j^{(n)}\Bigr\|_\infty \le \|f\|_\infty\,\Bigl\|\sum_{j=0}^{n}\beta_j^{(n)}\Bigr\|_\infty = \|f\|_\infty$$
and so
$$\|B_n f\|_\infty \le \|f\|_\infty \quad\text{for all } f \in C[0,1].$$
In particular, by transferring the result of Theorem 3.45 from linear func-
tionals to linear operators, we can conclude that the Bernstein operators
Bn : C [0, 1] −→ Pn are continuous on C [0, 1].
Now we prove the Korovkin property for the Bernstein operators.
Theorem 6.10. The sequence of Bernstein operators Bn : C [0, 1] −→ Pn ,
for n ∈ N, is a Korovkin sequence on C [0, 1].
Proof. The Bernstein operators Bn , n ∈ N, reproduce linear polynomials.
Indeed, on the one hand, we have Bn 1 ≡ 1, for all n ∈ N, by the partition of
unity, according to Remark 6.7 (c). On the other hand, we find for p1 (x) = x
the identity Bn p1 = p1 , for any n ∈ N, since we get
$$(B_n p_1)(x) = \sum_{j=0}^{n}\frac{j}{n}\binom{n}{j}x^j(1-x)^{n-j} = \sum_{j=1}^{n}\binom{n-1}{j-1}x^j(1-x)^{n-j} = x\sum_{j=0}^{n-1}\binom{n-1}{j}x^j(1-x)^{n-j-1} = x.$$
for the quadratic monomial p2 (x) = x2 . To this end, we apply the Bernstein
operators Bn to the sequence of functions
$$f_n(x) = \frac{n}{n-1}\,x^2 - \frac{x}{n-1} \in P_2 \quad\text{for } n \ge 2,$$
where for n ≥ 2 we have
$$(B_n f_n)(x) = \sum_{j=0}^{n}\binom{n}{j}\left(\frac{n}{n-1}\,\frac{j^2}{n^2} - \frac{j}{n(n-1)}\right)x^j(1-x)^{n-j} = \sum_{j=0}^{n}\frac{n!}{(n-j)!\,j!}\,\frac{j(j-1)}{n(n-1)}\,x^j(1-x)^{n-j}$$
$$= \sum_{j=2}^{n}\frac{(n-2)!}{(n-j)!\,(j-2)!}\,x^j(1-x)^{n-j} = x^2\sum_{j=0}^{n-2}\binom{n-2}{j}x^j(1-x)^{n-j-2} = p_2(x).$$
Proof. Suppose f ∈ C [a, b]. Then, f is bounded on [a, b], i.e., there is some
M > 0 with f ∞ ≤ M . Moreover, f is uniformly continuous on the compact
interval [a, b], i.e., for any ε > 0 there is some δ > 0 satisfying
|x − y| < δ =⇒ |f (x) − f (y)| < ε/2 for all x, y ∈ [a, b].
Now let t ∈ [a, b] be fixed. Then, we have for x ∈ [a, b] the two estimates
$$f(x) - f(t) \le \frac{\varepsilon}{2} + 2M\left(\frac{x-t}{\delta}\right)^2 = \frac{\varepsilon}{2} + \frac{2M}{\delta^2}\bigl(x^2 - 2xt + t^2\bigr)$$
$$f(x) - f(t) \ge -\frac{\varepsilon}{2} - 2M\left(\frac{x-t}{\delta}\right)^2 = -\frac{\varepsilon}{2} - \frac{2M}{\delta^2}\bigl(x^2 - 2xt + t^2\bigr),$$
as well as
for all n ≥ N. From (6.4), (6.5) and (6.6), we obtain the estimate
$$|(K_n f)(x) - f(t)| \le |(K_n f)(x) - f(t)(K_n 1)(x)| + |f(t)(K_n 1)(x) - f(t)| \le (\tilde\varepsilon + 1)\,\frac{\varepsilon}{2} + \frac{2M}{\delta^2}\bigl[\tilde\varepsilon\,(1 + 2|t| + t^2) + (x-t)^2\bigr] + M\tilde\varepsilon,$$
where for x = t, the inequality
$$|(K_n f)(t) - f(t)| \le (\tilde\varepsilon + 1)\,\frac{\varepsilon}{2} + \frac{2M}{\delta^2}\,\tilde\varepsilon\,(1 + 2|t| + t^2) + M\tilde\varepsilon \qquad (6.7)$$
follows for all n ≥ N.
Now the right hand side in (6.7) can uniformly be bounded from above
by an arbitrarily small ε̂ > 0, so that we have, for some N ≡ N (ε̂) ∈ N,
p(cos(kx)) ∈ T for k ∈ N0
is an even function.
We now show that the even trigonometric polynomials are, with respect
to the maximum norm · ∞ , dense in C [0, π].
Lemma 6.16. For any f ∈ C [0, π] and ε > 0, there is one even trigono-
metric polynomial Tg ∈ T satisfying
Tg − f ∞ < ε.
Proof. Suppose f ∈ C [0, π]. Then, g(t) = f (arccos(t)) ∈ C [−1, 1]. Therefore,
according to the Weierstrass theorem, Corollary 6.12, there is one algebraic
polynomial p ∈ P satisfying p − g∞,[−1,1] < ε. This implies
with (even) error functions ηfe , ηge ∈ C2π , where ηfe ∞ , ηge ∞ < ε/4.
From these two representations, we obtain the identity
where
cos2 (x)f (x) = Tfs˜(x − π/2) + ηfs˜(x − π/2) = Tfc (x) + ηfc (x) with ηfc ∞ < ε/2,
f (x) = Tfs (x) + Tfc (x) + ηfs (x) + ηfc (x) = Tf (x) + ηf (x) with ηf ∞ < ε
Remark 6.19. Corollary 6.18 states that convergence in the maximum norm
· ∞ implies convergence in any p-norm · p , 1 ≤ p < ∞. The converse,
however, does not hold in general. In this sense, the maximum norm · ∞
is the strongest among all p-norms, for 1 ≤ p ≤ ∞.
and the Euclidean norm $\|\cdot\|_w = (\cdot,\cdot)_w^{1/2}$. Then, any function f ∈ C[a, b] can,
w.r.t. · w , be approximated arbitrarily well by algebraic polynomials, i.e.,
the polynomial space P is, with respect to · w , dense in C [a, b].
We wish to transfer our results from Section 4.2 to infinite (countable and
ordered) orthogonal systems (and orthonormal systems) (sj )j∈N in F. Our
first result on this is based on the following characterization.
Theorem 6.21. Let (sj )j∈N be an orthogonal system in a Euclidean space
F with inner product (·, ·) and norm · = (·, ·)1/2 . Then, the following
statements are equivalent.
(a) The span of (sj )j∈N is dense in F, i.e., F = span{sj | j ∈ N}.
(b) For any f ∈ F the sequence (Πn f )n∈N of partial sums Πn f in (6.9)
converges to f with respect to the norm · , i.e.,
Πn f −→ f for n → ∞. (6.11)
Proof. For any f ∈ F, the n-th partial sum Πn f is the unique best approxi-
mation to f from Sn = span{s1 , . . . , sn } with respect to · .
(a) ⇒ (b): Suppose for f ∈ F and ε > 0, there is one N ∈ N and sN ∈ SN
satisfying sN − f < ε. Then, we have for n ≥ N
Πn f − f −→ 0 for n → ∞,
(c) ⇒ (a): From the Pythagoras theorem (6.13) and by (6.10), we obtain
$$\|\Pi_n f - f\|^2 = \|f\|^2 - \sum_{j=1}^{n}\frac{|(f, s_j)|^2}{\|s_j\|^2} \longrightarrow 0 \quad\text{for } n \to \infty$$
and so there is, for any ε > 0, one N ≡ N(ε) satisfying $\|\Pi_N f - f\| < \varepsilon$.
Definition 6.22. An orthogonal system (sj )j∈N satisfying one of the proper-
ties (a), (b), or (c) in Theorem 6.21 (and so all three properties), is called a
complete orthogonal system in F. The notion of a complete orthonormal
system is defined accordingly.
Proof. The representation (6.15) follows from property (c) in Theorem 6.21
by the Pythagoras theorem (6.13) and the Parseval identity (6.10).
Next, we prove a useful criterion for the completeness of systems (sj )j∈N
in Hilbert spaces F, in particular for the completeness of orthogonal systems.
S := span{sj | j ∈ N} ⊂ F
where 2 denotes the linear space of all square summable sequences with
indices in Z (cf. Remark 3.15).
For a Riesz basis B, the “best possible” constants, i.e., the largest A and the
smallest B satisfying (6.18), are called Riesz constants of B.
4
Frigyes Riesz (1880-1956), Hungarian mathematician
describe the stability of the Riesz basis representation with respect to per-
turbations of the coefficients in c ∈ 2 . Therefore, Riesz bases are also often
referred to as 2 -stable bases of F.
Proposition 6.29. Let B = (un )n∈Z be a Riesz basis of F with Riesz con-
stants 0 < A ≤ B < ∞. Then, the synthesis operator G : 2 −→ F in (6.19)
has the following properties.
(a) The operator G is continuous, where G has operator norm $\|G\| = \sqrt{B}$.
(b) The operator G is bijective.
(c) The inverse G⁻¹ of G is continuous with operator norm $\|G^{-1}\| = 1/\sqrt{A}$.
Proof. Statement (a) follows directly from the upper Riesz estimate in (6.18).
As for the proof of (b), note that G is surjective, since span{un | n ∈ Z}
is by (6.17) dense in F. Moreover, G is injective, since by (6.18) the kernel of
G can only contain the zero element. Altogether, the operator G is bijective.
Finally, for the inverse $G^{-1} : F \to \ell^2$ of G we find by (6.18) the estimate
$$\|G^{-1}(f)\|_2^2 \le \frac{1}{A}\,\|f\|^2 \quad\text{for all } f \in F$$
and this implies the continuity of G⁻¹ with operator norm $\|G^{-1}\| = 1/\sqrt{A}$. This proves property (c).
G∗ (f ) = ((f, un ))n∈Z ∈ 2
for all f ∈ F.
(b) The operator G∗ is bijective and has the inverse (G∗ )−1 = (G−1 )∗ .
(c) The operators G∗ and (G∗ )−1 are continuous via the isometries
for all c ∈ 2 , and this already implies the stated representation in (a).
By the representation in (a) in combination with the Riesz basis property
of B, we see that G∗ is bijective. Moreover, for f, g ∈ F the representation
(b) the Riesz basis B̃ has Riesz constants 0 < 1/B ≤ 1/A < ∞.
(c) any f ∈ F can uniquely be represented w.r.t. B or B̃, respectively, as
$$f = \sum_{n\in\mathbb{Z}}(f, \tilde u_n)\,u_n = \sum_{n\in\mathbb{Z}}(f, u_n)\,\tilde u_n. \qquad (6.22)$$
$$(u_n, \tilde u_m) = \bigl(u_n, (GG^*)^{-1}u_m\bigr) = \bigl(G^{-1}u_n, G^{-1}u_m\bigr)_2 = \delta_{mn} \qquad (6.23)$$
holds for any m, n ∈ Z. Moreover, for $c = (c_n)_{n\in\mathbb{Z}} \in \ell^2$, we have the identity
$$\sum_{n\in\mathbb{Z}} c_n\,\tilde u_n = (GG^*)^{-1}\Bigl(\sum_{n\in\mathbb{Z}} c_n\,u_n\Bigr) = (G^*)^{-1}c.$$
By $\|G^*\|^2 = B$ and $\|(G^*)^{-1}\|^2 = 1/A$, we get the Riesz stability for B̃, i.e.,
$$\frac{1}{B}\,\|c\|_2^2 \le \Bigl\|\sum_{n\in\mathbb{Z}} c_n\,\tilde u_n\Bigr\|^2 \le \frac{1}{A}\,\|c\|_2^2 \quad\text{for all } c = (c_n)_{n\in\mathbb{Z}} \in \ell^2. \qquad (6.24)$$
Now the continuity of (GG∗ )−1 and the completeness of B in (6.17) implies
F = span{ũn | n ∈ Z},
i.e., B̃ is a Riesz basis of F with Riesz constants 0 < 1/B ≤ 1/A < ∞. The
stated uniqueness of B̃ follows from the orthonormality relation (6.23).
Let us finally show property (c). Since G is surjective, any f ∈ F can be
represented as
$$f = \sum_{n\in\mathbb{Z}} c_n\,u_n \quad\text{for some } c = (c_n)_{n\in\mathbb{Z}} \in \ell^2.$$
Likewise, the stated representation in (6.22) with respect to the Riesz basis
B̃ can be shown by similar arguments.
From the estimates in (6.24) and the representation in (6.22), we get the
stability of the coefficients (f, un ))n∈Z ∈ 2 under perturbations of f ∈ F.
Corollary 6.32. Let B = (un )n∈Z be a Riesz basis of F with Riesz constants
0 < A ≤ B < ∞. Then, the stability estimates
hold.
hold, where the “best possible” constants, i.e., the largest A and the smallest
B satisfying (6.26), are called frame constants of B.
Remark 6.35. Any frame B = (un )n∈Z of F is complete in F, i.e., the span
of B is dense in F,
F = span{un | n ∈ Z}.
This immediately follows from the completeness criterion, Theorem 6.26, by
using the lower estimate in (6.26).
Remark 6.36. Every Riesz basis B is a frame, but the converse is in general not true. Indeed, a frame B = (un)n∈Z allows ambiguities in the representation
$$f = \sum_{n\in\mathbb{Z}} c_n\,u_n \quad\text{for } f \in F,$$
due to a possible ℓ²-linear dependence of the elements in B.
Remark 6.37. For any frame B = (un )n∈Z of F, there exists a dual frame
B̃ = (ũn )n∈Z of F satisfying
$$f = \sum_{n\in\mathbb{Z}}(f, u_n)\,\tilde u_n = \sum_{n\in\mathbb{Z}}(f, \tilde u_n)\,u_n \quad\text{for all } f \in F.$$
However, the duality relation (un, ũm) = δnm in (6.21) does not in general hold, since otherwise the elements of B and the elements of B̃ would be ℓ²-linearly independent, respectively.
Example 6.39. For the Euclidean space Rd , where d ∈ N, equipped with the
Euclidean norm · 2 , any basis B = {u1 , . . . , ud } of Rd is a Riesz basis of Rd .
Indeed, in this case, we have for the regular matrix U = (u1, ..., ud) ∈ R^{d×d} and for any vector c = (c1, ..., cd)^T ∈ R^d the stability estimates
$$\|U^{-1}\|_2^{-1}\,\|c\|_2 \le \Bigl\|\sum_{n=1}^{d} c_n\,u_n\Bigr\|_2 = \|Uc\|_2 \le \|U\|_2\,\|c\|_2.$$
Therefore, the Riesz constants 0 < A ≤ B < ∞ of B are given by the spectral norms of the matrices U and U⁻¹, so that $A = \|U^{-1}\|_2^{-2}$ and $B = \|U\|_2^2$. The unique dual Riesz basis B̃ of B is given by the rows of the inverse U⁻¹. This immediately follows by UU⁻¹ = I from Theorem 6.31 (a). ♦
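These identities are easy to verify numerically; a small Python sketch, assuming a randomly chosen regular matrix U (all names are our own):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4
U = rng.standard_normal((d, d))              # columns u_1, ..., u_d (assumed regular)
s = np.linalg.svd(U, compute_uv=False)       # singular values of U
A, B = s[-1] ** 2, s[0] ** 2                 # A = sigma_min^2 = ||U^{-1}||_2^{-2}, B = ||U||_2^2
U_dual = np.linalg.inv(U)                    # rows are the dual basis vectors

# Riesz estimates A ||c||^2 <= ||U c||^2 <= B ||c||^2 on a random coefficient vector:
c = rng.standard_normal(d)
print(A * (c @ c) <= np.linalg.norm(U @ c) ** 2 <= B * (c @ c))
# duality (u_n, u~_m) = delta_{nm}:
print(np.allclose(U_dual @ U, np.eye(d)))
```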
$$f = \sum_{n=1}^{d}(f, u_n)\,\tilde u_n \quad\text{for all } f \in \mathbb{R}^d.$$
By $U^T f = c_f$, we have $UU^T f = U c_f$ and so
$$(F_n f)(x) = \frac{a_0}{2} + \sum_{j=1}^{n}\bigl(a_j\cos(jx) + b_j\sin(jx)\bigr) \quad\text{for } f \in C_{2\pi}^{\mathbb{R}} \qquad (6.29)$$
with Fourier coefficients $a_0 = (f,1)_{\mathbb{R}}$, $a_j = (f,\cos(j\cdot))_{\mathbb{R}}$, and $b_j = (f,\sin(j\cdot))_{\mathbb{R}}$, for j ∈ N, see Corollary 4.12. As we noticed in Section 4.3, the Fourier operator $F_n : C_{2\pi}^{\mathbb{R}} \longrightarrow T_n^{\mathbb{R}}$ gives the orthogonal projection of $C_{2\pi}^{\mathbb{R}}$ onto $T_n^{\mathbb{R}}$. In particular, $F_n f \in T_n^{\mathbb{R}}$ is the unique best approximation to $f \in C_{2\pi}^{\mathbb{R}}$ from $T_n^{\mathbb{R}}$ with respect to the Euclidean norm $\|\cdot\|_{\mathbb{R}}$.
As regards our notations concerning real-valued against complex-valued functions, we recall Remark 4.10: for real-valued functions $f \in C_{2\pi}^{\mathbb{R}} \equiv C_{2\pi}$, we apply the inner product $(\cdot,\cdot) = (\cdot,\cdot)_{\mathbb{R}}$ and the norm $\|\cdot\| = \|\cdot\|_{\mathbb{R}}$. In contrast, for complex-valued functions $f \in C_{2\pi}^{\mathbb{C}}$, we use $(\cdot,\cdot)_{\mathbb{C}}$ and $\|\cdot\|_{\mathbb{C}}$.
From our above discussion, we can conclude the following convergence result.
Proof. The statement follows immediately from property (b) in Theorem 6.21
in combination with Corollary 6.41.
Next, we quantify the speed of convergence for the Fourier partial sums
Fn f . To this end, the complex representation in (4.23),
$$(F_n f)(x) = \sum_{j=-n}^{n} c_j\,e^{ijx}, \qquad (6.30)$$
for $f \in C_{2\pi}^k$ and therefore
$$\|F_n f - f\| \le \frac{1}{(n+1)^k}\,\|F_n f^{(k)} - f^{(k)}\| = o(n^{-k}) \quad\text{for } n \to \infty,$$
where we use the convergence
$$\|F_n f^{(k)} - f^{(k)}\| \longrightarrow 0 \quad\text{for } n \to \infty$$
Further note that the decay of $c_j(f)$ follows from the assumption $f \in C_{2\pi}^k$.
As for the converse, we can determine the smoothness of f from the asymp-
totic decay of the Fourier coefficients cj (f ). More precisely: If the Fourier
coefficients cj (f ) of f have the asymptotic decay
|cj (f )| = O |j|−(k+1+ε) for |j| → ∞
lim Fn f − f ∞ = 0.
n→∞
Therefore, the error function Fn f − f has at least one zero xn in the open interval (0, 2π), whereby for x ∈ [0, 2π] we obtain the representation
$$(F_n f - f)(x) = \int_{x_n}^{x}(F_n f - f)'(\xi)\,d\xi = \int_{x_n}^{x}(F_n f' - f')(\xi)\,d\xi,$$
where we used the identity $(F_n f)' = F_n f'$ (see Exercise 6.92). By the Cauchy–Schwarz inequality, we further obtain
$$|(F_n f - f)(x)|^2 \le \int_{x_n}^{x} 1\,d\xi\cdot\int_{x_n}^{x}|(F_n f' - f')(\xi)|^2\,d\xi \le (2\pi)^2\,\|F_n f' - f'\|^2 \longrightarrow 0 \quad\text{for } n \to \infty, \qquad (6.33)$$
which already proves the stated uniform convergence.
Now we conclude from Theorem 6.44 a corresponding result concerning the convergence rate of (Fn f)n∈N0 with respect to the maximum norm $\|\cdot\|_\infty$.
Corollary 6.47. For $f \in C_{2\pi}^k$, where k ≥ 1, the Fourier partial sums Fn f converge uniformly to f at convergence rate k − 1, according to
$$\|F_n f - f\|_\infty = o(n^{-(k-1)}) \quad\text{for } n \to \infty.$$
Proof. For $f' \in C_{2\pi}^{k-1}$, we have by (6.33) and (6.31) the estimate
$$\|F_n f - f\|_\infty \le 2\pi\,\|F_n f' - f'\| \le \frac{2\pi}{(n+1)^{k-1}}\,\|F_n f^{(k)} - f^{(k)}\|,$$
whereby we obtain for $f^{(k)} \in C_{2\pi}$ the asymptotic convergence behaviour
$$\|F_n f - f\|_\infty = o(n^{-(k-1)}) \quad\text{for } n \to \infty$$
according to Corollary 6.43.
Note that in the last line we applied the trigonometric addition formula
is called Dirichlet5 kernel. Note that the Dirichlet kernel is 2π-periodic and
even, so that we can further simplify the representation in (6.36) to obtain
$$(F_n f)(x) = \frac{1}{\pi}\int_0^{2\pi} f(\tau)\,D_n(\tau - x)\,d\tau = \frac{1}{\pi}\int_{-x}^{2\pi - x} f(x+\sigma)\,D_n(\sigma)\,d\sigma = \frac{1}{\pi}\int_{-\pi}^{\pi} f(x+\sigma)\,D_n(\sigma)\,d\sigma. \qquad (6.38)$$
we can rewrite the representation for the pointwise error as a sum of the form
with the Fourier coefficients bn (vx ) and an (wx ) of the 2π-periodic functions
so that the Fourier coefficients (bn (vx ))n∈Z and (an (wx ))n∈Z are a zero se-
quence, respectively, whereby the pointwise convergence of (Fn f )(x) to f (x)
at x would follow.
Now we are in a position where we can, from our above investigations,
formulate a sufficient condition for f ∈ C2π which guarantees pointwise con-
vergence of (Fn f )(x) to f (x) at x ∈ R.
Proof. First note that the function gx in (6.39) can only have singularities at
σk = 2πk, for k ∈ Z. Now we analyze the behaviour of gx around zero, where
we find
$$\lim_{\sigma\to 0} g_x(\sigma) = \lim_{\sigma\to 0}\frac{f(x+\sigma) - f(x)}{2\sin(\sigma/2)} = \lim_{\sigma\to 0}\frac{f(x+\sigma) - f(x)}{\sigma}\cdot\lim_{\sigma\to 0}\frac{\sigma}{2\sin(\sigma/2)} = f'(x),$$
Now let us return to the uniform convergence of Fourier partial sums, where
the following question is of particular importance.
Question: Can we, under conditions on $f \in C_{2\pi}\setminus C_{2\pi}^1$ that are as mild as possible, prove
statements concerning uniform convergence of the Fourier partial sums Fn f ?
To answer this question, we need to analyze the norm Fn ∞ of the
Fourier operator Fn with respect to the maximum norm · ∞ . To this end,
we first derive a suitable representation for the operator norm
$$\|F_n\|_\infty := \sup_{f\in C_{2\pi}\setminus\{0\}}\frac{\|F_n f\|_\infty}{\|f\|_\infty} \quad\text{for } n \in \mathbb{N}_0, \qquad (6.40)$$
Theorem 6.49. The norm of the Fourier operator Fn has the representation $\|F_n\|_\infty = \lambda_n$, where
$$\lambda_n := \frac{2}{\pi}\int_0^{\pi}|D_n(\sigma)|\,d\sigma = \frac{1}{\pi}\int_0^{\pi}\left|\frac{\sin((n+1/2)\sigma)}{\sin(\sigma/2)}\right|d\sigma \qquad (6.42)$$
$$\|F_n f\|_\infty \le \|f\|_\infty\cdot\lambda_n$$
6 Marquis de L'Hôpital (1661–1704), French mathematician
7 Henri Léon Lebesgue (1875–1941), French mathematician
Fn f − f ∞ −→ 0 for n → ∞,
Fn f ∞ ≤ Fn f − f ∞ + f ∞ .
Indeed, if the norms Fn ∞ are not uniformly bounded from above, then
there must be at least one f ∈ C2π yielding divergence Fn f ∞ −→ ∞ for
n → ∞, in which case the sequence of error norms Fn f − f ∞ must be
divergent, i.e., Fn f − f ∞ −→ ∞ for n → ∞.
$$= \frac{4}{\pi^2}\sum_{k=0}^{n-1}\frac{1}{k+1} \ge \frac{4}{\pi^2}\,\log(n+1), \qquad (6.45)$$
where we have used the estimate
$$\sum_{k=0}^{n-1}\frac{1}{k+1} \ge \log(n+1) \quad\text{for all } n \in \mathbb{N}$$
in (6.45).
On the other hand, we have for the integrand in (6.42) the estimates
$$\left|\frac{\sin((n+1/2)\sigma)}{\sin(\sigma/2)}\right| = \left|2\left[\frac{1}{2} + \sum_{j=1}^{n}\cos(j\sigma)\right]\right| = \left|1 + 2\sum_{j=1}^{n}\cos(j\sigma)\right| \le 1 + 2n,$$
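The logarithmic growth of λn between these two bounds can be observed numerically; a short Python sketch approximating the integral in (6.42) by a midpoint rule (our own illustration, not part of the text):

```python
import numpy as np

def lebesgue_constant(n, m=20001):
    """Midpoint-rule approximation of
    lambda_n = (1/pi) * int_0^pi |sin((n+1/2)s) / sin(s/2)| ds."""
    s = (np.arange(m) + 0.5) * np.pi / m
    return np.mean(np.abs(np.sin((n + 0.5) * s) / np.sin(s / 2)))

# Compare with the lower bound (4/pi^2) log(n+1) from (6.45) and the crude bound 1 + 2n.
for n in (1, 4, 16, 64):
    print(n, lebesgue_constant(n), 4 / np.pi ** 2 * np.log(n + 1), 1 + 2 * n)
```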
Let us first quote the Banach8 -Steinhaus9 theorem, a well-known result from
functional analysis, before we draw relevant conclusions. We will not prove the
Banach-Steinhaus theorem, but rather refer the reader to the textbook [33].
Ln : B1 −→ B2 for n ∈ N
Then, the uniform boundedness principle holds for the operators Ln , i.e.,
Fn f − f ∞ −→ ∞ for n → ∞.
Fn f ∞ −→ ∞ for n → ∞.
Proof. The function space C2π , equipped with the maximum norm · ∞ , is
a Banach space. By the divergence Fn ∞ = λn −→ ∞ for n → ∞, there is
one f ∈ C2π with Fn f ∞ −→ ∞ for n → ∞. Indeed, otherwise this would
contradict the Banach-Steinhaus theorem. Now the estimate
Fn f − f ∞ ≥ Fn f ∞ − f ∞
Next, we show the norm minimality of the Fourier operator Fn among all
surjective projection operators onto the linear space of trigonometric poly-
nomials Tn . The following result dates back to Charshiladse-Losinski.
L∞ ≥ Fn ∞ .
G∞ ≤ L∞ .
Case 2: For |j| > n, we have Fn eij· (x) = 0. Moreover, the function
e is orthogonal to the trigonometric polynomial L eij· (x − s) ∈ TnC .
ijs
for n → ∞ with respect to the maximum norm ·∞ . According to the Weier-
strass theorems, Corollaries 6.12 and 6.17, we can rely on the convergence
η∞ (f, Tn ) −→ 0 and η∞ (f, Pn ) −→ 0 for n → ∞.
In this section, we quantify the asymptotic decay of the zero sequences
(η∞ (f, Tn ))n∈N0 and (η∞ (f, Pn ))n∈N0 for n → ∞.
We begin our analysis with the trigonometric case, i.e., with the asymp-
totic behaviour of (η∞ (f, Tn ))n∈N0 . On this occasion, we first recall the con-
vergence rates of the Fourier partial sums Fn f for f ∈ C2π. By the estimate
$$\eta_\infty(f, T_n) \le \|F_n f - f\|_\infty \quad\text{for } n \in \mathbb{N}_0$$
we expect for $f \in C_{2\pi}^k$, k ≥ 1, at least the convergence rate k − 1, according to Corollary 6.47. However, as it turns out, we gain even more. In fact, we will obtain the convergence rate k, i.e.,
$$\eta_\infty(f, T_n) = O(n^{-k}) \quad\text{for } n \to \infty \quad\text{for } f \in C_{2\pi}^k.$$
Note that this complies with the convergence behaviour of Fourier partial sums Fn f with respect to the Euclidean norm $\|\cdot\|$. Indeed, in that case, we have, by Theorem 6.44, the asymptotic behaviour
$$\eta(f, T_n) = o(n^{-k}) \quad\text{for } n \to \infty \quad\text{for } f \in C_{2\pi}^k.$$
For an intermediate conclusion, we note one important principle:
The smoother $f \in C_{2\pi}^k$ is, i.e., the larger k ∈ N, the faster the convergence of the minimal distances η(f, Tn) and η∞(f, Tn) to zero, for n → ∞.
10
Georg Faber (1877-1966), German mathematician
where a0 = (f, 1), aj = (f, cos(j·)) and bj = (f, sin(j·)), for 1 ≤ j ≤ n, are the Fourier coefficients of f in (6.29). Then we have, for $f \in C_{2\pi}^1$, the error representation
$$(L_n f - f)(x) = \frac{1}{\pi}\int_{-\pi}^{\pi}\left[\frac{\xi}{2} + \sum_{j=1}^{n}\frac{(-1)^j}{j}\,A_j\sin(j\xi)\right] f'(x+\pi-\xi)\,d\xi. \qquad (6.55)$$
the estimate
$$\eta_\infty(f, T_n) \le \|L_n f - f\|_\infty \le \|f'\|_\infty\cdot\frac{1}{\pi}\int_{-\pi}^{\pi}\left|\frac{\xi}{2} + \sum_{j=1}^{n}\frac{(-1)^j}{j}\,A_j\sin(j\xi)\right| d\xi$$
$$= \|f'\|_\infty\cdot\frac{1}{\pi}\int_{0}^{\pi}\left|\xi + \sum_{j=1}^{n}\frac{2(-1)^j}{j}\,A_j\sin(j\xi)\right| d\xi = \|f'\|_\infty\cdot\frac{1}{\pi}\cdot\frac{\pi^2}{2(n+1)} = \|f'\|_\infty\cdot\frac{\pi}{2(n+1)},$$
where in the second line we use the error representation (6.55). Moreover,
in the penultimate line we choose optimal coefficients A1 , . . . , An according
to (6.53).
$$g(\xi) := \frac{\xi}{2} + \sum_{j=1}^{n}\frac{(-1)^j}{j}\,A_j\sin(j\xi)$$
for the first factor of the integrand in (6.55). This way we obtain
$$\frac{1}{\pi}\int_{-\pi}^{\pi} g(\xi)\,f'(x+\pi-\xi)\,d\xi = \frac{1}{\pi}\Bigl[-g(\xi)\,f(x+\pi-\xi)\Bigr]_{\xi=-\pi}^{\xi=\pi} + \frac{1}{\pi}\int_{-\pi}^{\pi} g'(\xi)\,f(x+\pi-\xi)\,d\xi$$
$$= -\frac{1}{\pi}\,\frac{\pi}{2}\,f(x) - \frac{1}{\pi}\,\frac{\pi}{2}\,f(x+2\pi) + \frac{1}{\pi}\int_{-\pi}^{\pi} g'(x+\pi-\sigma)\,f(\sigma)\,d\sigma = -f(x) + \frac{1}{\pi}\int_{-\pi}^{\pi} g'(x+\pi-\sigma)\,f(\sigma)\,d\sigma$$
after integration by parts from the error representation (6.55). Now we have
$$g'(x+\pi-\sigma) = \frac{1}{2} + \sum_{j=1}^{n}\frac{(-1)^j}{j}\,A_j\,j\,\cos(j(x+\pi-\sigma)) = \frac{1}{2} + \sum_{j=1}^{n}(-1)^j A_j\bigl[\cos(j(x+\pi))\cos(j\sigma) + \sin(j(x+\pi))\sin(j\sigma)\bigr]$$
$$= \frac{1}{2} + \sum_{j=1}^{n}(-1)^j A_j\,(-1)^j\bigl(\cos(jx)\cos(j\sigma) + \sin(jx)\sin(j\sigma)\bigr) = \frac{1}{2} + \sum_{j=1}^{n}A_j\bigl[\cos(jx)\cos(j\sigma) + \sin(jx)\sin(j\sigma)\bigr]$$
and so
$$\frac{1}{\pi}\int_{-\pi}^{\pi} g'(x+\pi-\sigma)\,f(\sigma)\,d\sigma = \frac{a_0}{2} + \sum_{j=1}^{n}A_j\bigl[a_j\cos(jx) + b_j\sin(jx)\bigr] = (L_n f)(x),$$
must necessarily change signs at the points ξk = kπ/(n + 1) ∈ (0, π), for
1 ≤ k ≤ n. Indeed, this is because the function sgn(sin((n + 1)ξ)) has sign
changes on (0, π) only at the points ξ1 , . . . , ξn .
Note that this requirement yields n conditions on the sought coefficients
a1 , . . . , an ∈ R, where these conditions are the interpolation conditions
$$\xi_k = \sum_{j=1}^{n} a_j\sin(j\xi_k) \quad\text{for } 1 \le k \le n. \qquad (6.59)$$
But the interpolation problem (6.59) has a unique solution, since the trigono-
metric polynomials sin(j·), 1 ≤ j ≤ n, form a Haar system on (0, π) (see
Exercise 5.54).
Proof. The integrand in (6.60) is an even function. Now we regard the integral in (6.60) on [−π, π] (rather than on [0, π]). By using the identity
$$\sin(j\xi) = \frac{1}{2i}\bigl(e^{ij\xi} - e^{-ij\xi}\bigr)$$
it is sufficient to show
$$I_j := \int_{-\pi}^{\pi} e^{ij\xi}\cdot\operatorname{sgn}\bigl(\sin((n+1)\xi)\bigr)\,d\xi = 0 \quad\text{for } 1 \le |j| < n+1. \qquad (6.61)$$
$$= -e^{ij\pi/(n+1)}\cdot I_j$$
holds. Since $-e^{ij\pi/(n+1)} \ne 1$, this implies $I_j = 0$ for 1 ≤ |j| < n + 1.
We wish to work with weaker conditions on f (i.e., weaker than f ∈ C2π
1
Remark 6.69. Note that the modulus of continuity ω(f, δ) quantifies the
local distance between the function values of f uniformly on [a, b]. In fact,
the smaller the modulus of continuity ω(f, δ), the smaller is the local variation
of f on [a, b]. For a compact interval [a, b] ⊂ R, the modulus of continuity
ω(f, δ) of f ∈ C[a, b] is finite by
$$\omega(f, \delta) \longrightarrow 0 \quad\text{for } \delta \searrow 0.$$
$$\omega(f, \delta) \le \delta\cdot\|f'\|_\infty.$$
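The modulus of continuity can also be approximated numerically on a grid; a rough Python sketch (the grid size and function names are our own choices, and the grid maximum only approximates the supremum):

```python
import numpy as np

def modulus_of_continuity(f, a, b, delta, m=2000):
    """Grid approximation of omega(f, delta) = sup{|f(x)-f(y)| : |x-y| <= delta}."""
    x = np.linspace(a, b, m)
    h = (b - a) / (m - 1)
    k = int(np.floor(delta / h))
    best = 0.0
    for shift in range(1, k + 1):
        best = max(best, np.max(np.abs(f(x[shift:]) - f(x[:-shift]))))
    return best

# For f(x) = sqrt(x) on [0, 1] one has omega(f, delta) = sqrt(delta); for a
# continuously differentiable f the bound omega(f, delta) <= delta * ||f'||_inf applies.
if __name__ == "__main__":
    for d in (0.1, 0.01, 0.001):
        print(d, modulus_of_continuity(np.sqrt, 0.0, 1.0, d), np.sqrt(d))
```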
The following Jackson theorem gives an upper bound for the minimal
distance η∞ (f, Tn ) by involving the modulus of continuity of f ∈ C2π .
Remark 6.71. The estimate of Jackson 3, Theorem 6.70, is not sharp. For
more details, we refer to Exercise 6.97.
$$\eta_\infty(f, T_n) \le \|T^*(\varphi_\delta) - f\|_\infty \le \|T^*(\varphi_\delta) - \varphi_\delta\|_\infty + \|\varphi_\delta - f\|_\infty \le \frac{\pi}{2(n+1)}\cdot\frac{1}{2\delta}\cdot\omega(f, 2\delta) + \omega(f, \delta) \le \omega(f, 2\delta)\left(\frac{\pi}{4\delta(n+1)} + 1\right).$$
the zero sequence (η∞ (f, Tn ))n∈N0 . Our perception matches with the result
of the following Jackson theorem.
Tn = {T ∈ C2π | T ∈ Tn } ⊂ Tn
and this explains our notation Tn . By Tn ⊂ Tn , we find the estimate
we have T' = T^* and so
\[
\| (T - f)' \|_\infty = \| T^* - f' \|_\infty = \eta_\infty(f', \mathcal{T}_n).
\]
But this implies, by using Jackson 1, Theorem 6.59, the stated estimate:
\[
\eta_\infty(f, \mathcal{T}_n) = \eta_\infty(T - f, \mathcal{T}_n) \le \frac{\pi}{2(n+1)} \cdot \| (T - f)' \|_\infty = \frac{\pi}{2(n+1)} \cdot \eta_\infty(f', \mathcal{T}_n).
\]
\[
(L_n f)(x) = \frac{a_0}{2} + \sum_{k=1}^{n} A_k \big( a_k \cos(kx) + b_k \sin(kx) \big),
\]
for f ∈ C^k_{2π}, where k ≥ 1.
Now we return to the discussion from the outset of this section concerning the uniform convergence of Fourier partial sums. In that discussion, we
developed the error estimate (6.49),
‖F_n f − f‖∞ −→ 0 for n → ∞,
(c) If f ∈ C^k_{2π}, for k ≥ 1, then we have (by Jackson 4)
\[
\eta_\infty(f, \mathcal{P}_n) \le \frac{3\pi \cdot L}{2(n+1)}.
\]
We split the proof of Jackson 5, Theorem 6.77, into several lemmas. The
following lemma reveals the structural connection between the trigonometric
and the algebraic case.
η∞ (f, Pn ) = η∞ (g, Tn ).
Proof. For f ∈ C [−1, 1] the function g ∈ C2π is even. Therefore, the unique
best approximation T ∗ ∈ Tn to g is even, so that we have
\[
\Pi_n f = \sum_{k=0}^{n} \frac{(f, T_k)_w}{\| T_k \|_w^2}\, T_k
\]
‖Π_n f − f‖∞ −→ 0 for n → ∞,
6.5 Exercises
Exercise 6.82. Prove the following results.
(a) Show that for a set of n + 1 pairwise distinct interpolation points
a ≤ x 0 < . . . < xn ≤ b
with the Bernstein polynomials \( \beta_j^{(n)}(x) = \binom{n}{j} x^j (1-x)^{n-j} \), for 0 ≤ j ≤ n.
Show that, for any f ∈ C¹[0, 1], the sequence ((B_n f)′)_{n∈N_0} of derivatives
of B_n f converges uniformly on [0, 1] to f′, i.e.,
\[
\lim_{n\to\infty} \| (B_n f)' - f' \|_\infty = 0.
\]
where pt (x) = 0, if and only if t = x. Then, for any sequence (Ln )n∈N of
linear positive operators Ln : C (Ω) −→ C (Ω) satisfying
Conclude from this the statement of the Korovkin theorem, Theorem 6.11.
T e n = un for all n ∈ Z.
at equidistant knots
\[
x_{j,n} = a + j\,\frac{b-a}{n} \qquad \text{for } j = 0, \dots, n
\]
and weights
\[
\alpha_{j,n} = \frac{1}{b-a} \int_a^b L_{j,n}(x)\, dx \qquad \text{for } j = 0, \dots, n,
\]
where {L0,n , . . . , Ln,n } ⊂ Pn are the Lagrange basis functions for the knot set
Xn = {x0,n , . . . , xn,n } (cf. the discussion on Lagrange bases in Section 2.3).
Show that there is a continuous function f ∈ C [a, b], for which the se-
quence of Newton-Cotes approximations ((Qn f ))n∈N diverges.
Hint: Apply the Kuzmin15 theorem, according to which the sum of the
weights’ moduli |αj,n | diverges, i.e.,
\[
\sum_{j=0}^{n} |\alpha_{j,n}| \longrightarrow \infty \qquad \text{for } n \to \infty.
\]
is also sharp.
14 Roger Cotes (1682-1716), English mathematician
15 Rodion Ossijewitsch Kuzmin (1891-1949), Russian mathematician
Exercise 6.97. The estimate of Jackson 3, Theorem 6.70, is not sharp. Show
that the estimate
\[
\eta_\infty(f, \mathcal{T}_n) \le \omega\!\left( f, \frac{\pi}{n+1} \right) \qquad \text{for } f \in C_{2\pi}
\]
is sharp (under the assumptions and with the notations in Theorem 6.70).
Hint: Apply the theorem of de La Vallée Poussin from Exercise 6.96.
Exercise 6.99. Prove part (c) of the Dini-Lipschitz theorem, Theorem 6.81,
in two steps as follows. First show that, for any f ∈ C 1 [−1, 1], the sequence
(Πn f )n∈N0 of Chebyshev partial sums
\[
\Pi_n f = \sum_{j=0}^{n} \frac{(f, T_j)_w}{\| T_j \|_w^2}\, T_j, \qquad \text{where } T_j = \cos(j \arccos(\cdot)) \in \mathcal{P}_j,
\]
converges uniformly on [−1, 1] to f, i.e.,
\[
\lim_{n\to\infty} \| \Pi_n f - f \|_\infty = 0.
\]
16 Charles-Jean de La Vallée Poussin (1866-1962), Belgian mathematician
7 Basic Concepts of Signal Approximation
The second half of this chapter is devoted to wavelets. Wavelets are popu-
lar and powerful tools of modern mathematical signal processing, in particular
for the approximation of functions f ∈ L2 (R). A wavelet approximation to f
is essentially based on a multiresolution of L2 (R), i.e., on a nested sequence
· · · ⊂ V−1 ⊂ V0 ⊂ V1 ⊂ · · · ⊂ Vj−1 ⊂ Vj ⊂ · · · ⊂ L2 (R) (7.2)
of closed scale spaces Vj ⊂ L2 (R). The nested sequence in (7.2) leads us to
stable approximation methods, where f is represented on different frequency
bands by orthogonal projectors Πj : L2 (R) −→ Vj . More precisely, for a fixed
scaling function ϕ ∈ L2 (R), the scale spaces Vj ⊂ L2 (R) in (7.2) are generated
by dilations and translations of basis functions ϕjk (x) := 2j/2 ϕ(2j x − k), for
j, k ∈ Z, so that V_j is the closed span of {ϕ_{jk} | k ∈ Z}. The corresponding wavelet spaces are
W_j = span{ψ_k^j | k ∈ Z} for j ∈ Z.
The basic construction of wavelet approximations to f ∈ L2 (R) is based
on refinement equations of the form
\[
\varphi(x) = \sum_{k \in \mathbb{Z}} h_k\, \varphi(2x - k) \qquad \text{and} \qquad \psi(x) = \sum_{k \in \mathbb{Z}} g_k\, \varphi(2x - k),
\]
with the frequency ω = 2π/T and with the complex Fourier coefficients
\[
c_j = \frac{1}{T} \int_0^T f_T(\xi)\, e^{-ij\omega\xi}\, d\xi = \frac{1}{T} \int_{-T/2}^{T/2} f_T(\xi)\, e^{-ij\omega\xi}\, d\xi \tag{7.4}
\]
where the T -periodic signal fT is assumed to coincide on (−T /2, T /2) with f .
Moreover, we regard the function
\[
g_T(\omega) := \int_{-T/2}^{T/2} f_T(\xi)\, e^{-i\omega\xi}\, d\xi
\]
from (7.6). We remark that the infinite series in (7.8) is a Riemann sum on the knot sequence {ω_j}_{j∈Z}. Note that the mesh width Δω of the sequence {ω_j}_{j∈Z} is, for large enough T > 0, arbitrarily small. This observation leads us, via the above-mentioned limit in (7.7), to the function
\[
g(\omega) := \lim_{T \to \infty} g_T(\omega) = \int_{-\infty}^{\infty} f(\xi)\, e^{-i\omega\xi}\, d\xi \qquad \text{for } \omega \in \mathbb{R}. \tag{7.9}
\]
Proposition 7.2. The Fourier transform F : L1 (R) −→ C (R) has the fol-
lowing properties, where we assume f ∈ L1 (R) for all statements (a)-(e).
(a) For fx0 := f (· − x0 ), where x0 ∈ R, we have
(c) For the conjugate complex f¯ ∈ L1 (R), where f¯(x) = f (x), we have
\[
\frac{d}{d\omega}(\mathcal{F}f)(\omega) = -i\, (\mathcal{F}(x f))(\omega) \qquad \text{for all } \omega \in \mathbb{R}
\]
under the assumption xf ∈ L1 (R).
Now C_c(R) is dense in L¹(R), so that for any f ∈ L¹(R) and ε > 0 there is some g ∈ C_c(R) satisfying ‖f − g‖_{L¹(R)} < ε. The statement then follows from the estimate (7.11), whereby
F : L1 (R) −→ C0 (R).
Proposition 7.6. For f, g ∈ L1 (R) both functions fˆg and f ĝ are integrable.
Moreover, we have
\[
\int_{\mathbb{R}} \hat f(x)\, g(x)\, dx = \int_{\mathbb{R}} f(\omega)\, \hat g(\omega)\, d\omega. \tag{7.15}
\]
Proof. Since the functions fˆ and ĝ are continuous and bounded, both functions fˆg and f ĝ are integrable. By using the Fubini theorem, we can conclude
\[
\int_{\mathbb{R}} f(\omega)\, \hat g(\omega)\, d\omega
= \int_{\mathbb{R}} f(\omega) \int_{\mathbb{R}} g(x)\, e^{-ix\omega}\, dx\, d\omega
= \int_{\mathbb{R}} \int_{\mathbb{R}} f(\omega)\, e^{-ix\omega}\, d\omega\; g(x)\, dx
= \int_{\mathbb{R}} \hat f(x)\, g(x)\, dx.
\]
Example 7.7. For α > 0, let 1_α = χ_{[−α,α]} be the indicator function of the compact interval [−α, α] ⊂ R. Then,
\[
(\mathcal{F} 1_1)(\omega) = \int_{-1}^{1} e^{-ix\omega}\, dx = 2 \cdot \mathrm{sinc}(\omega) \qquad \text{for } \omega \in \mathbb{R}
\]
with the sinc function
\[
\mathrm{sinc}(\omega) := \begin{cases} \sin(\omega)/\omega & \text{for } \omega \neq 0 \\ 1 & \text{for } \omega = 0. \end{cases}
\]
Fig. 7.1. The sinc function yields the Fourier transform of the indicator function 1_1.
\[
g_\alpha(x) = e^{-\alpha x^2} \qquad \text{for } x \in \mathbb{R}
\]
for α > 0 by
\[
\begin{aligned}
\hat g_\alpha(\omega) = \int_{\mathbb{R}} e^{-\alpha x^2} e^{-ix\omega}\, dx
&= \int_{\mathbb{R}} e^{-\alpha (x^2 + ix\omega/\alpha)}\, dx \\
&= e^{\alpha (i\omega/(2\alpha))^2} \int_{\mathbb{R}} e^{-\alpha \left( x + \frac{i\omega}{2\alpha} \right)^2} dx \\
&= e^{-\omega^2/(4\alpha)} \int_{\mathbb{R}} e^{-\alpha \left( x + \frac{i\omega}{2\alpha} \right)^2} dx \\
&= \sqrt{\frac{\pi}{\alpha}} \cdot e^{-\omega^2/(4\alpha)}.
\end{aligned}
\]
\[
\| \mathcal{F} \|_{L^1(\mathbb{R}) \to C_0(\mathbb{R})} = \sup_{f \in L^1(\mathbb{R}) \setminus \{0\}} \frac{\| \mathcal{F} f \|_\infty}{\| f \|_{L^1(\mathbb{R})}} = 1.
\]
From the result of Proposition 7.9, we can draw the following conclusion.
Corollary 7.10. Let (fn )n∈N be a convergent sequence in L1 (R) with limit
f ∈ L1 (R). Then, the corresponding sequence (fˆn )n∈N of Fourier transforms
Ffn = fˆn ∈ C0 (R) converges uniformly on R to fˆ.
Proof. The statement follows immediately from the estimate
‖fˆ_n − fˆ‖∞ = ‖F(f_n − f)‖∞ ≤ ‖F‖ · ‖f_n − f‖_{L¹(R)} = ‖f_n − f‖_{L¹(R)},
Remark 7.13. Due to Proposition 7.12, the Banach space L1 (R) is closed
under the convolution product ∗, i.e., we have f ∗ g ∈ L1 (R) for f, g ∈ L1 (R).
Moreover, for f, g ∈ L1 (R), we have the identity
\[
(f * g)(x) = \int_{\mathbb{R}} f(x-y)\, g(y)\, dy = \int_{\mathbb{R}} f(y)\, g(x-y)\, dy = (g * f)(x)
\]
Due to Proposition 7.12 and Remark 7.13, we can apply the Fourier trans-
form F to the convolution of two L1 -functions. As we show now, the Fourier
transform F(f ∗ g) of the convolution f ∗ g, for f, g ∈ L1 (R), coincides with
the algebraic product of their Fourier transforms Ff and Fg.
where we used the properties (a) and (b) in Definition 7.17. Note that the
function hy := g − g(· − y) satisfies, for any y ∈ R, the estimate
Now we split the outer integral in (7.22) into a sum of two terms, which
we estimate uniformly from above by (7.23), so that we have, for any ρ > 0,
\[
\begin{aligned}
\| g - g * \delta_k \|_{L^1(\mathbb{R})}
&\le \int_{-\rho}^{\rho} \delta_k(y) \int_{\mathbb{R}} |h_y(x)|\, dx\, dy + \int_{\mathbb{R} \setminus (-\rho,\rho)} \delta_k(y) \int_{\mathbb{R}} |h_y(x)|\, dx\, dy \\
&\le 2 \cdot |\mathrm{supp}(g)| \cdot \| g' \|_\infty \cdot \rho + 2 \cdot |\mathrm{supp}(g)| \cdot \| g \|_\infty \int_{\mathbb{R} \setminus (-\rho,\rho)} \delta_k(y)\, dy \\
&\le 4 \cdot K \cdot M \cdot \rho
\end{aligned}
\]
by using property (c) in Definition 7.17. For ε > 0 we have ‖g − g ∗ δ_k‖_{L¹(R)} < ε for all k ≥ N, provided that ρ < ε/(4KM). Therefore, g ∈ C¹_c(R) can
be approximated arbitrarily well in L1 (R) by convolutions g ∗ δk . Finally,
Cc1 (R) is dense in L1 (R), which implies the stated L1 -convergence in (7.21)
for f ∈ L1 (R).
Now we turn to the Fourier inversion formula. At the outset of Section 7.1,
we derived the representation (7.8) for periodic functions. We can transfer
the inversion formula (7.8) from the discrete case to the continuous case. This
motivates the following definition.
f = F −1 Ff
Proof. In the following proof, we utilize the Dirac sequence (δk )k∈N of Gauss
functions from Example 7.18. For δk in (7.20), the identity (7.17) yields the
representation
\[
\delta_k(x) = \frac{k}{2\pi} \int_{\mathbb{R}} e^{-y^2/2} \cdot e^{ikxy}\, dy = \frac{1}{2\pi} \int_{\mathbb{R}} e^{-\omega^2/(2k^2)} \cdot e^{ix\omega}\, d\omega \tag{7.26}
\]
for all k ∈ N. This in turn implies
\[
\begin{aligned}
(f * \delta_k)(x) &= \int_{\mathbb{R}} f(y)\, \frac{1}{2\pi} \int_{\mathbb{R}} e^{-\omega^2/(2k^2)} \cdot e^{i(x-y)\omega}\, d\omega\, dy \\
&= \frac{1}{2\pi} \int_{\mathbb{R}} \left( \int_{\mathbb{R}} f(y) \cdot e^{-iy\omega}\, dy \right) e^{-\omega^2/(2k^2)} \cdot e^{ix\omega}\, d\omega \\
&= \frac{1}{2\pi} \int_{\mathbb{R}} \hat f(\omega) \cdot e^{-\omega^2/(2k^2)} \cdot e^{ix\omega}\, d\omega, \tag{7.27}
\end{aligned}
\]
where, for changing the order of integration, we applied the dominated convergence theorem with the dominating function |f(y)| e^{-\omega^2/(2k^2)}.
Remark 7.22. According to Remark 7.5, the Fourier transform maps any
f ∈ L1 (R) to a continuous function fˆ ∈ C0 (R). Therefore, by the Fourier
inversion formula, Theorem 7.21, there exists for any f ∈ L1 (R) satisfying
fˆ ∈ L1 (R) a continuous representative f˜ ∈ L1 (R), which coincides with f
almost everywhere on R (i.e., f ≡ f˜ in the L1 -sense), and for which the
Fourier inversion formula holds on R.
In the following discussion, we will often apply the Fourier inversion for-
mula to continuous functions f ∈ L1 (R) ∩ C (R). By the following result, we
can in this case drop the assumption fˆ ∈ L1 (R) (see Exercise 7.64).
holds.
Remark 7.25. Every function f ∈ S(R) and all of its derivatives f (k) , for
k ∈ N, are rapidly decaying to zero around infinity, i.e., for any (complex-
valued) polynomial p ∈ P C and for any k ∈ N0 , we have
p(x)f (k) (x) −→ 0 for |x| → ∞.
Therefore, all derivatives f (k) of f ∈ S(R), for k ∈ N, are also contained
in S(R). Obviously, we have the inclusion S(R) ⊂ L1 (R), and so f ∈ S(R)
and all its derivatives f (k) , for k ∈ N, are absolutely integrable, i.e., we have
f (k) ∈ L1 (R) for all k ∈ N0 .
Typical examples of elements in the Schwartz space S(R) are C ∞ func-
tions with compact support. Another example is the Gauss function gα , for
α > 0, from Example 7.8. Before we give further examples of functions in the
Schwartz space S(R), we first note a few observations.
According to Remark 7.25, every function f ∈ S(R) and all its derivatives
f^{(k)}, for k ∈ N, have a Fourier transform. Moreover, for f ∈ S(R) and k, ℓ ∈ N₀, we have the representations
\[
\frac{d^\ell}{d\omega^\ell} (\mathcal{F}f)(\omega) = (-i)^\ell\, (\mathcal{F}(x^\ell f))(\omega) \qquad \text{for all } \omega \in \mathbb{R}
\]
\[
(\mathcal{F} f^{(k)})(\omega) = (i\omega)^k (\mathcal{F}f)(\omega) \qquad \text{for all } \omega \in \mathbb{R},
\]
as they directly follow (by induction) from Proposition 7.2 (d)-(e) (see Exercise 7.59). This yields the uniform estimate
\[
\left| \omega^k \frac{d^\ell}{d\omega^\ell} (\mathcal{F}f)(\omega) \right| \le \left\| \frac{d^k}{dx^k} \big( x^\ell f(x) \big) \right\|_{L^1(\mathbb{R})} \qquad \text{for all } \omega \in \mathbb{R}, \tag{7.29}
\]
i.e., all functions ω^k (Ff)^{(ℓ)}(ω), for k, ℓ ∈ N₀, are bounded. Therefore, we see
that the Fourier transform Ff of any f ∈ S(R) is also contained in S(R).
By the Fourier inversion formula, Theorem 7.25, the Fourier transform F is
bijective on S(R). We reformulate this important result as follows.
Theorem 7.26. The Fourier transform F : S(R) −→ S(R) is an automor-
phism on the Schwartz space S(R), i.e., F is linear and bijective on S(R).
Now we give an important example of a family of functions that are
contained in the Schwartz space S(R). To this end, we recall the Hermite
polynomials Hn from Section 4.4.3 and their associated Hermite functions
hn from Exercise 4.42.
Example 7.27. The Hermite functions
Proposition 7.28. The Hermite functions (hn )n∈N0 in (7.30) are a com-
plete orthogonal system in the Hilbert space L2 (R).
Proof. The orthogonality of (hn )n∈N0 follows from the orthogonality of the
Hermite polynomials in Theorem 4.28. According to (4.47), we have
\[
(h_m, h_n) = 2^n\, n!\, \sqrt{\pi} \cdot \delta_{mn} \qquad \text{for all } m, n \in \mathbb{N}_0. \tag{7.32}
\]
Now we show the completeness of the system (hn )n∈N0 . To this end, we
use the completeness criterion in Theorem 6.26, as follows.
Suppose that f ∈ L2 (R) satisfies (f, hn ) = 0 for all n ∈ N0 . Then, we
consider the function g : C −→ C, defined as
\[
g(z) = \int_{\mathbb{R}} h_0(x)\, f(x)\, e^{-ixz}\, dx \qquad \text{for } z \in \mathbb{C}.
\]
\[
\begin{aligned}
&= \lim_{R\to\infty} \left( \big[ -e^{-ix\omega}\, h_n(x) \big]_{x=-R}^{x=R} + \int_{-R}^{R} (-i\omega + x)\, e^{-ix\omega}\, h_n(x)\, dx \right) \\
&= -i\omega\, \hat h_n(\omega) + \widehat{x h_n}(\omega).
\end{aligned}
\]
holds with the initial values h_{−1} ≡ 0 and h_0(x) = exp(−x²/2). By using the
recursion H_n′(x) = 2n H_{n−1}(x), for n ∈ N, from Corollary 4.30, we get
\[
\begin{aligned}
h_n'(x) = \frac{d}{dx} \Big( e^{-x^2/2} \cdot H_n(x) \Big)
&= -x \cdot e^{-x^2/2} \cdot H_n(x) + e^{-x^2/2} \cdot H_n'(x) \\
&= -x\, h_n(x) + e^{-x^2/2}\, \big( 2n H_{n-1}(x) \big) \\
&= -x\, h_n(x) + 2n\, h_{n-1}(x)
\end{aligned}
\]
We close this section by the following remark.
Remark 7.31. The Fourier operator F : L2 (R) −→ L2 (R) is uniquely de-
termined by the properties in Theorem 7.30. Moreover, we remark that the
Fourier transform F : L1 (R) −→ C0 (R) maps any f ∈ L1 (R) to a unique
uniformly continuous function Ff ∈ C0(R). In contrast, the Fourier transform
F : L2 (R) −→ L2 (R) maps any f ∈ L2 (R) to a function Ff ∈ L2 (R) that is
merely almost everywhere unique.
pointwise for all x ∈ R. Note that we have applied the Fourier inversion
formula of the Plancherel theorem, Theorem 7.30, to obtain (7.38) and (7.40).
Finally, we remark that the interchange of integration and summation
in (7.39) is valid by the Parseval identity
\[
\frac{1}{2\pi} \int_{-\pi}^{\pi} g(\omega)\, \overline{h(\omega)}\, d\omega = \sum_{j \in \mathbb{Z}} c_j(g) \cdot \overline{c_j(h)} \qquad \text{for all } g, h \in L^2[-\pi, \pi],
\]
which completes our proof for the stated reconstruction formula in (7.37).
7 Raymond Paley (1907-1933), English mathematician
8 Norbert Wiener (1894-1964), US-American mathematician
Remark 7.35. By the Shannon sampling theorem, Theorem 7.34, any band-limited function f ∈ L¹(R) ∩ C(R), or f ∈ L²(R), with bandwidth L > 0 can uniquely be reconstructed from its values on the uniform sampling grid {jd | j ∈ Z} ⊂ R for all sampling rates d ≤ π/L. Therefore, the optimal sampling rate is d∗ = π/L, and this rate corresponds to half of the smallest wavelength 2π/L that is present in the signal f. The optimal sampling rate d∗ = π/L is called the Nyquist rate (or Nyquist distance).
Remark 7.37. The Shannon sampling theorem is, in its different variants,
also connected with the names of Nyquist9 , Whittaker10 , and Kotelnikov11 . In
fact, Kotelnikov had formulated and published the sampling theorem already
in 1933, although his work was widely unknown for a long time. Shannon
formulated the sampling theorem in 1948, where he used this result as a
starting point for his theory on maximal channel capacities.
9 Harry Nyquist (1889-1976), US-American electrical engineer
10 Edmund Taylor Whittaker (1873-1956), British astronomer, mathematician
11 Vladimir Kotelnikov (1908-2005), Russian pioneer of information theory
appearing in the Fourier transform’s formulas (7.41) and (7.42), we can generalize the results for the univariate case, d = 1, to the multivariate case, d ≥ 1. In the remainder of this section, we merely quote results that are
needed in Chapters 8 and 9. Of course, the Fourier inversion formula from
Theorem 7.21 is of central importance.
holds.
\[
g_\alpha(x) = e^{-\alpha \|x\|_2^2} \qquad \text{for } x \in \mathbb{R}^d \text{ and } \alpha > 0
\]
is
\[
(\mathcal{F}_d\, g_\alpha)(\omega) = \left( \frac{\pi}{\alpha} \right)^{d/2} e^{-\|\omega\|_2^2/(4\alpha)} \qquad \text{for } \omega \in \mathbb{R}^d.
\]
♦
As for the univariate case, in Theorem 7.14 and Corollary 7.16, the Fourier
convolution theorem holds for the multivariate Fourier transform.
By following along the lines of Section 7.2, we can transfer the multivariate
Fourier transform Fd : L1 (Rd ) −→ C0 (Rd ) to the Hilbert space
\[
L^2(\mathbb{R}^d) = \left\{ f : \mathbb{R}^d \to \mathbb{C} \;\Big|\; \int_{\mathbb{R}^d} |f(x)|^2\, dx < \infty \right\}
\]
and the Euclidean norm ‖·‖_{L²(R^d)} = (·, ·)^{1/2}. To this end, we first introduce the Fourier transform F_d on the Schwartz space
\[
\mathcal{S}(\mathbb{R}^d) = \left\{ f \in C^\infty(\mathbb{R}^d) \;\Big|\; x^k \cdot \frac{d^\ell}{dx^\ell} f(x) \text{ is bounded for all } k, \ell \in \mathbb{N}_0^d \right\}
\]
of all rapidly decaying C ∞ functions. As for the univariate case, Theorem 7.26,
the Fourier transform Fd is bijective on S(Rd ).
Theorem 7.44. The multivariate Fourier transform Fd : S(Rd ) −→ S(Rd )
is an automorphism on the Schwartz space S(Rd ).
This implies the Plancherel theorem, as in Theorem 7.30 for d = 1.
Theorem 7.45. (Plancherel theorem).
The Fourier transform Fd : S(Rd ) −→ S(Rd ) can uniquely be extended to
a bounded and bijective linear mapping on the Hilbert space L2 (Rd ). The
extended Fourier transform Fd : L2 (Rd ) −→ L2 (Rd ) has the following pro-
perties.
(a) The Parseval identity
(Fd f, Fd g) = (2π)d (f, g) for all f, g ∈ L2 (Rd ),
holds, so that in particular
‖F_d f‖_{L²(R^d)} = (2π)^{d/2} ‖f‖_{L²(R^d)} for all f ∈ L²(R^d).
(b) The Fourier inversion formula
F_d^{−1}(F_d f) = f for all f ∈ L²(R^d)
holds on L²(R^d), i.e.,
\[
f(x) = (2\pi)^{-d} \int_{\mathbb{R}^d} \hat f(\omega)\, e^{i\langle x, \omega\rangle}\, d\omega \qquad \text{for almost every } x \in \mathbb{R}^d.
\]
(c) For any j, k ∈ Z, the wavelet function ψkj has compact support, where
Fig. 7.2. The Haar wavelet ψ = ψ00 generates the functions ψkj = 2j/2 ψ(2j · −k).
Proof. According to Proposition 7.47 (b), any ψ_k^j has unit L²-norm.
Now suppose that ψ_k^j and ψ_ℓ^m are, for j, k, ℓ, m ∈ Z, distinct.
Case 1: If j = m, then k ≠ ℓ. In this case, the intersection of the support intervals of ψ_k^j and ψ_ℓ^m contains at most one point, according to Proposition 7.47 (c), so that (ψ_k^j, ψ_ℓ^m) = 0.
Case 2: If j ≠ m, then we assume m > j (without loss of generality). In this case, we either have
supp(ψ_k^j) ∩ supp(ψ_ℓ^m) = ∅,
so that (ψ_k^j, ψ_ℓ^m) = 0, or, for ℓ = 2^{m−j}k, …, 2^{m−j}(k+1) − 1, the support supp(ψ_ℓ^m) is contained in one of the two halves of supp(ψ_k^j), on which ψ_k^j is constant, so that
\[
(\psi_k^j, \psi_\ell^m) = \pm 2^{j/2} \int_{\mathrm{supp}(\psi_\ell^m)} \psi_\ell^m(x)\, dx = 0.
\]
ϕ = χ[0,1)
(b) For any j, k ∈ Z, the function ϕjk has compact support, where
\[
\varphi_k^{j-1} = 2^{-1/2} \big( \varphi_{2k}^{j} + \varphi_{2k+1}^{j} \big) \qquad \text{for all } j, k \in \mathbb{Z} \tag{7.48}
\]
\[
\psi_k^{j-1} = 2^{-1/2} \big( \varphi_{2k}^{j} - \varphi_{2k+1}^{j} \big) \qquad \text{for all } j, k \in \mathbb{Z} \tag{7.49}
\]
hold.
Proof. Property (a) follows from the scale-invariance of the wavelet basis,
of subspaces in L2 (R).
Now we study further properties of the nested sequence (Vj )j∈Z . To this
end, we work with the orthogonal projection operator Π_j : L²(R) −→ V_j, for j ∈ Z, which assigns to every f ∈ L²(R) its unique best approximation s*_j = Π_j f in L²(R). According to our discussion in Section 6.2, we have the series representation
\[
\Pi_j f = \sum_{k \in \mathbb{Z}} (f, \varphi_k^j)\, \varphi_k^j \in V_j \qquad \text{for } f \in L^2(\mathbb{R}) \tag{7.52}
\]
‖Π_j f − f‖ −→ 0 for j → ∞.
Proof. Let ε > 0 and f ∈ L²(R). Then there is, for a (sufficiently fine) dyadic decomposition of R, a step function T ∈ L²(R) with ‖T − f‖ < ε/2. Moreover, for the indicator functions χ_{I_k^j} of the dyadic intervals I_k^j := [2^{−j}k, 2^{−j}(k+1)), we have the reproduction property Π_j χ_{I_k^j} = χ_{I_k^j}, for all k ∈ Z. Therefore, there is a level index j₀ ∈ Z with T = Π_j T for all j ≥ j₀. From this, we can conclude statement (a) by the estimate
\[
= 2^j \big( c_{-1}\, \chi_{I_{-1}^j} + c_0\, \chi_{I_0^j} \big),
\]
where c_{−1} = (g, ϕ_{−1}^j) and c_0 = (g, ϕ_0^j). Then, we have ‖Π_j g‖² = 2^j (c_{−1}² + c_0²) and, moreover, ‖Π_j g‖ < ε/2 for j ≡ j(ε) ∈ Z small enough. For this j, we finally get
Theorem 7.54. The system (Vj )j∈Z of scale spaces Vj in (7.50) forms a
multiresolution analysis of L2 (R) by satisfying the following conditions.
(a) The scale spaces in (Vj )j∈Z are nested, so that the inclusions (7.51) hold.
(b) The system (V_j)_{j∈Z} is complete in L²(R), i.e., the closure of ⋃_{j∈Z} V_j equals L²(R).
(c) The system (V_j)_{j∈Z} satisfies the separation ⋂_{j∈Z} V_j = {0}.
for the orthogonality relation between Wj−1 and Vj−1 . In this way, the lin-
ear scale space Vj is by (7.53) decomposed into a smooth scale space Vj−1
Wj = span{ψkj | k ∈ Z} for j ∈ Z.
(ψ_k^{j−1}, ϕ_ℓ^{j−1}) = 0 for all k, ℓ ∈ Z. (7.56)
This representation follows directly from Theorem 6.21 (b) and Theorem 7.55.
Now we organize the representation (7.57) for f ∈ L2 (R) on multiple
wavelet scales. Our starting point for doing so is the multiresolution analysis
of L2 (R) in Theorem 7.54. For simplification we suppose supp(f ) ⊂ [0, 1]. We
approximate f on the scale space Vj , for j ∈ N, by the orthogonal projectors
Πj : L2 (R) −→ Vj , given as
\[
\Pi_j f = \sum_{k=0}^{N-1} c_k^j\, \varphi_k^j \in V_j \qquad \text{for } f \in L^2(\mathbb{R}), \tag{7.58}
\]
\[
\Pi_{j-1}^{\perp} f = \sum_{k=0}^{N/2-1} d_k^{j-1}\, \psi_k^{j-1} \qquad \text{for } f \in L^2(\mathbb{R}), \tag{7.60}
\]
where d_k^{j−1} := (f, ψ_k^{j−1}), for k = 0, …, N/2 − 1.
By (7.58) and (7.60), the identity (7.59) can be written in the basis form
\[
\sum_{k=0}^{N-1} c_k^j\, \varphi_k^j = \sum_{k=0}^{N/2-1} d_k^{j-1}\, \psi_k^{j-1} + \sum_{k=0}^{N/2-1} c_k^{j-1}\, \varphi_k^{j-1}. \tag{7.61}
\]
\[
\Pi_j f = \sum_{r=0}^{j-1} \Pi_r^{\perp} f + \Pi_0 f \qquad \text{for } f \in L^2(\mathbb{R}), \tag{7.62}
\]
In the next level, the vector cj−1 ∈ RN/2 is decomposed into the vec-
tors cj−2 ∈ RN/4 and dj−2 ∈ RN/4 . The resulting recursion is called the
pyramid algorithm. The decomposition scheme of the pyramid algorithm is
represented as follows.
\[
T = T_1 \cdot T_2 \cdot \ldots \cdot T_{j-1} \cdot T_j \in \mathbb{R}^{N \times N} \tag{7.67}
\]
\[
T^{-1} = T_j^{-1} \cdot T_{j-1}^{-1} \cdot \ldots \cdot T_2^{-1} \cdot T_1^{-1} = T_j^{T} \cdot T_{j-1}^{T} \cdot \ldots \cdot T_2^{T} \cdot T_1^{T} \in \mathbb{R}^{N \times N}
\]
of T in (7.67), so that
\[
c^j = T_j^{T} \cdot \ldots \cdot T_1^{T} \cdot d.
\]
The discrete wavelet analysis and the discrete wavelet synthesis are associated with the terms discrete wavelet transform (wavelet analysis) and inverse discrete wavelet transform (wavelet synthesis).
Due to the orthogonality of the matrices T_{j−r} in (7.66), the wavelet transform is numerically stable, since
Moreover, the complexity of the wavelet transform is only linear, since the j
decomposition steps (for r = 0, 1, . . . , j − 1) require altogether
operations.
7.6 Exercises
Exercise 7.56. Show that the Fourier transform fˆ : R −→ C,
\[
\hat f(\omega) = \int_{\mathbb{R}} f(x)\, e^{-ix\omega}\, dx \qquad \text{for } \omega \in \mathbb{R},
\]
Exercise 7.57. Consider the Banach space (L1 (R), ·L1 (R) ) and the Hilbert
space (L2 (R), · L2 (R) ). Show that neither the inclusion L1 (R) ⊂ L2 (R) nor
the inclusion L²(R) ⊂ L¹(R) holds. Give a (non-trivial) example of a linear space S satisfying S ⊂ L¹(R) and S ⊂ L²(R).
Exercise 7.59. Prove the following statements for the Fourier transform F.
(a) For the k-th derivative of the Fourier transform Ff of f, we have
\[
\frac{d^k}{d\omega^k} (\mathcal{F}f)(\omega) = (-i)^k\, (\mathcal{F}(x^k f))(\omega) \qquad \text{for all } \omega \in \mathbb{R}
\]
under the assumption x^k f ∈ L¹(R).
Exercise 7.60. Conclude from the results in Exercise 7.59 the statement:
”f ∈ L1 (R) is smooth, if and only if Ff has rapid decay around infinity”.
Be more precise on this and quantify the decay and the smoothness of f .
Exercise 7.64. Prove for f ∈ L1 (R) ∩ C (R) the Fourier inversion formula
\[
f(x) = \lim_{\varepsilon \searrow 0} \frac{1}{2\pi} \int_{\mathbb{R}} \hat f(\omega) \cdot e^{ix\omega}\, e^{-\varepsilon |\omega|^2}\, d\omega \qquad \text{for all } x \in \mathbb{R},
\]
W0 = span{ψ(· − k) | k ∈ Z}
Wj = span{ψkj | k ∈ Z} for j ∈ Z
S = span{s1 , . . . , sn } ⊂ C (Ω)
VB,X · c = fX
SX = span{K(·, xj ) | xj ∈ X} ⊂ C (Ω),
For the sake of unique interpolation, in Problem 8.1, and with assuming (8.5),
the matrix AK,X must necessarily be regular. Indeed, this follows directly
from Theorem 5.23. In the following discussion, we wish to construct continuous functions K : Ω × Ω −→ R, such that A_{K,X} is symmetric positive
definite for all finite sets X of interpolation points, in which case AK,X would
be regular. Obviously, the matrix AK,X is symmetric, if the function K is
symmetric, i.e., if K(x, y) = K(y, x) for all x, y ∈ Rd . The requirement
for AK,X to be positive definite leads us to the notion of positive definite
functions. Since we allow arbitrary parameter domains Ω ⊂ R^d, we will from now on restrict ourselves (without loss of generality) to the case Ω = R^d.
\[
\ell_j(x_k) = \delta_{jk} = \begin{cases} 1 & \text{for } j = k \\ 0 & \text{for } j \neq k \end{cases} \qquad \text{for all } 1 \le j, k \le n. \tag{8.7}
\]
Therefore, the Lagrange basis functions are also often referred to as cardinal interpolants. We can represent the elements of the Lagrange basis {ℓ_1, …, ℓ_n} as follows.
where
where ·, · denotes the usual inner product on the Euclidean space Rn .
Proof. For x = xj , the right hand side R(xj ) in (8.8) coincides with the j-th
column of AK,X , and so the j-th unit vector ej ∈ Rn is the unique solution
of the linear equation system (8.8), i.e.,
The following fundamental result is due to Bochner1 who studied in [8] posi-
tive (semi-)definite functions of one variable. We can make use of the Bochner
theorem in [8] to prove suitable characterizations for multivariate positive
definite functions.
Theorem 8.7. (Bochner, 1932).
Suppose that Φ ∈ C (Rd )∩L1 (Rd ) is an even function. If the Fourier transform
Φ̂ of Φ is positive on Rd , Φ̂ > 0, then Φ is positive definite on Rd , Φ ∈ PDd .
Proof. For Φ ∈ C (Rd ) ∩ L1 (Rd ), the Fourier inversion formula
\[
\Phi(x) = (2\pi)^{-d} \int_{\mathbb{R}^d} \hat\Phi(\omega)\, e^{i\langle x, \omega\rangle}\, d\omega
\]
\[
\Phi(x) = e^{-\|x\|_2^2} \qquad \text{for } x \in \mathbb{R}^d
\]
\[
\hat\Phi(\omega) = \pi^{d/2}\, e^{-\|\omega\|_2^2/4} > 0,
\]
Now that we have provided three explicit examples for positive definite
(radial) functions, we remark that the characterization of Bochner’s theorem
allows us to construct even larger classes of positive definite functions. This
is done by using convolutions. Recall that for any pair f, g ∈ L1 (Rd ) of
functions, the Fourier transform maps the convolution product f ∗g ∈ L1 (Rd ),
\[
(f * g)(x) = \int_{\mathbb{R}^d} f(x-y)\, g(y)\, dy \qquad \text{for } f, g \in L^1(\mathbb{R}^d)
\]
\[
\widehat{f * g} = \hat f \cdot \hat g \qquad \text{for } f, g \in L^1(\mathbb{R}^d)
\]
\[
\widehat{f * f^*} = \hat f \cdot \overline{\hat f} = |\hat f|^2 \qquad \text{for } f \in L^1(\mathbb{R}^d).
\]
Proof. For Ψ ∈ L1 (Rd )\{0}, we have Φ ∈ L1 (Rd )\{0}, and so Φ̂ ∈ C (Rd )\{0}.
Moreover, the Fourier transform Φ̂ = |Ψ̂ |2 of the autocorrelation Φ = Ψ ∗ Ψ ∗
is, due to the Fourier convolution theorem, Theorem 7.43, non-negative, so
that Φ ∈ PDd , due to Remark 8.8.
The practical value of the construction resulting from Corollary 8.12 is,
however, rather limited. This is because the autocorrelations Ψ ∗Ψ ∗ are rather
awkward to evaluate. To avoid numerical integration, one would prefer to
work with explicit (preferably simple) analytic expressions for positive defi-
nite functions Φ = Ψ ∗ Ψ ∗ .
We remark that the basic idea of Corollary 8.12 has led to the construc-
tion of compactly supported positive definite (radial) functions, dating back to
earlier Göttingen works of Schaback & Wendland [62] (in 1993), Wu [74] (in
1994), and Wendland [71] (in 1995). In their constructions, explicit formulas
were given for autocorrelations Φ = Ψ ∗ Ψ∗, whose generators Ψ(x) = ψ(‖x‖₂),
x ∈ Rd , are specific radially symmetric and compactly supported functions
ψ : [0, ∞) −→ R. This has provided a large family of continuous, radially
symmetric, and compactly supported functions Φ = Ψ ∗ Ψ ∗ , as they were
later popularized by Wendland [71], who used the radial characteristic func-
tions of Example 8.11 for Ψ to obtain piecewise polynomial positive definite
compactly supported radial functions of minimal degree. For further details
concerning the construction of compactly supported positive definite radial
functions, we refer to the survey [61] of Schaback.
\[
s(x) = \sum_{j=1}^{n} c_j\, K(x, x_j) \tag{8.15}
\]
\[
s(x) \equiv s_\lambda(x) := \lambda^y K(x, y) \qquad \text{for } \lambda = \sum_{j=1}^{n} c_j\, \delta_{x_j} \tag{8.16}
\]
By ‖·‖_K := (·, ·)_K^{1/2}, L is a Euclidean space. Likewise, via the duality relation in (8.16), we can equip S with the inner product
\[
(s_\lambda, s_\mu)_K := (\lambda, \mu)_K \qquad \text{for } s_\lambda, s_\mu \in S \tag{8.18}
\]
and the norm ‖·‖_K = (·, ·)_K^{1/2}. Note that the normed linear spaces S and L are isometrically isomorphic, S ≅ L, via the linear bijection λ ↦ s_λ and by the norm isometry
\[
\|\lambda\|_K = \|s_\lambda\|_K \qquad \text{for all } \lambda \in L. \tag{8.19}
\]
Before we study the topology of the spaces L and S in more detail, we first
discuss a few concrete examples for inner products and norms of elements in
L and S.
Example 8.13. For any pair of point evaluation functionals δ_{z_1}, δ_{z_2} ∈ L, with z_1, z_2 ∈ R^d, their inner product is given by
\[
(\delta_{z_1}, \delta_{z_2})_K = \delta_{z_1}^x \delta_{z_2}^y K(x, y) = K(z_1, z_2) = \Phi(z_1 - z_2).
\]
Moreover, for the norm of any δ_z ∈ L, z ∈ R^d, we obtain
\[
\|\delta_z\|_K^2 = (\delta_z, \delta_z)_K = \delta_z^x \delta_z^y K(x, y) = K(z, z) = \Phi(0) = 1,
\]
by using the normalization Φ(0) = 1, as introduced in Remark 8.6. Likewise, we have
\[
(K(\cdot, z_1), K(\cdot, z_2))_K = K(z_1, z_2) = \Phi(z_1 - z_2) \tag{8.20}
\]
for all z_1, z_2 ∈ R^d and
\[
\|K(\cdot, z)\|_K = \|\delta_z\|_K = 1 \qquad \text{for all } z \in \mathbb{R}^d.
\]
♦
To extend this first elementary example, we regard, for a fixed point set X = {x_1, …, x_n} ⊂ R^d, the linear bijective operator G : R^n −→ S_X, defined as
\[
G(c) = \sum_{j=1}^{n} c_j\, K(\cdot, x_j) = \langle c, R(\cdot) \rangle \qquad \text{for } c = (c_1, \dots, c_n)^T \in \mathbb{R}^n, \tag{8.21}
\]
where
\[
\langle c, d \rangle_{A_{K,X}} := c^T A_{K,X}\, d \qquad \text{for } c, d \in \mathbb{R}^n
\]
denotes the inner product generated by the positive definite matrix A_{K,X}. In particular, G is an isometry by
\[
(G(c), G(d))_K = \sum_{j,k=1}^{n} c_j d_k\, (K(\cdot, x_j), K(\cdot, x_k))_K = c^T A_{K,X}\, d = \langle c, d \rangle_{A_{K,X}}
\]
Proposition 8.15. For any finite point set X = {x1 , . . . , xn } ⊂ Rd , the dual
operator G∗ : SX −→ Rn of G in (8.21), characterized by the relation
is given as
G∗ (s) = sX for s ∈ SX .
for all c ∈ Rn , in which case the assertion follows directly from (8.22).
Next, we compute inner products and norms for the Lagrange basis func-
tions 1 , . . . , n of SX . The following proposition yields an important result
concerning our subsequent stability analysis of the interpolation method.
\[
(\ell_j, \ell_k)_K = a_{jk}^{-1} \qquad \text{for all } 1 \le j, k \le n,
\]
where A_{K,X}^{−1} = (a_{jk}^{−1})_{1≤j,k≤n} ∈ R^{n×n}. In particular, the norm of ℓ_j ∈ S_X is given by
\[
\|\ell_j\|_K^2 = a_{jj}^{-1} \qquad \text{for all } 1 \le j \le n.
\]
\[
(\ell_j, \ell_k)_K = e_j^T A_{K,X}^{-1} A_{K,X} A_{K,X}^{-1} e_k = e_j^T A_{K,X}^{-1} e_k = a_{jk}^{-1}
\]
for all 1 ≤ j, k ≤ n.
From Example 8.13 and Proposition 8.16, we see that the matrices
are Gramian, i.e., the entries of the symmetric positive definite matrices AK,X
and A_{K,X}^{−1} are represented by inner products, respectively.
D ≅ F,
\[
|\mu(s_\lambda)| = |\mu^x \lambda^y K(x, y)| = |(\mu, \lambda)_K| \le \|\mu\|_K \cdot \|\lambda\|_K = \|\mu\|_K \cdot \|s_\lambda\|_K.
\]
Now we are in a position where we can show that the positive definite function
K ∈ PDd is the (unique) reproducing kernel for the Hilbert space F ≡ FK .
To this end, we rely on the seminal works [45, 46, 47] of Madych and Nelson.
holds, with the inner products (·, ·)K : L × L −→ R and (·, ·)K : S × S −→ R
in (8.17) and in (8.18). By continuous extension of the representation (8.24)
from L to D and from S to F we already obtain the statement in (8.23).
Remark 8.25. Any dual functional λ ∈ D is, according to (8.23) and in the
sense of the Fréchet-Riesz representation theorem, Theorem 8.21, uniquely
represented by the element sλ = λy K(·, y) ∈ F.
for f ∈ F, and by
In this section, we prove further results that directly follow from the Madych-
Nelson theorem, Theorem 8.24. As we show, the proposed Lagrange interpo-
lation method is optimal in two different senses.
\[
\mathcal{F} = S_X \oplus \{ f \in \mathcal{F} \mid f_X = 0 \}, \tag{8.25}
\]
where S_X^⊥ = {f ∈ F | f_X = 0} is the orthogonal complement of S_X in F.
For f ∈ F and the unique interpolant s ∈ S_X to f on X satisfying s_X = f_X, the Pythagoras theorem holds, i.e.,

\[
g_X = 0 \;\Longrightarrow\; g \perp S_X
\]
holds. But this implies f − s ⊥ S_X, or f − s ∈ S_X^⊥, since (f − s)_X = 0. Therefore, the stated decomposition with the direct sum in (8.25) holds by
\[
f = s + (f - s) \in S_X \oplus S_X^{\perp}
\]
and, moreover,
Moreover, Corollary 8.28 implies that the interpolant has minimal variation.
\[
\| I_X \|_K = \sup_{f \in \mathcal{F} \setminus \{0\}} \frac{\| I_X f \|_K}{\| f \|_K}.
\]
\[
\| I_X \|_K = 1.
\]
Remark 8.32. By the stability property (8.27) in Theorem 8.31, the pro-
posed interpolation method has minimal condition number w.r.t. · K .
holds, where the norm εx K of the error functional can be written as
The error estimate in (8.30) is sharp, where equality holds for the function
so that (8.30) follows directly from (8.34) and the Cauchy-Schwarz inequality.
We compute the norm of the error functional εx in (8.29) by
(cf. Example 8.13), where we use the representation in (8.8). The upper bound
for εx K in (8.32) follows from the positive definiteness of AK,X .
Finally, for the function fx in (8.33) equality holds in (8.30), since we get
|ε_x(f_x)| = |(ε_x^y K(·, y), f_x)_K| = (f_x, f_x)_K = (ε_x, ε_x)_K = ‖ε_x‖_K · ‖f_x‖_K
from the Madych-Nelson theorem, and so the estimate in (8.30) is sharp.
Finally, we show the pointwise optimality of the interpolation method.
To this end, we regard quasi-interpolants of the form
\[
s_{\ell} = \ell^T f_X = \sum_{j=1}^{n} \ell_j\, f(x_j) \qquad \text{for } \ell = (\ell_1, \dots, \ell_n)^T \in \mathbb{R}^n
\]
For the norm ‖ε_x^{(ℓ)}‖_K we have, like in (8.31), the representation
\[
\| \varepsilon_x^{(\ell)} \|_K^2 = 1 - 2\, \ell^T R(x) + \ell^T A_{K,X}\, \ell.
\]
Now let us minimize the norm ‖ε_x^{(ℓ)}‖_K under variation of the coefficients ℓ ∈ R^n. This leads us directly to the unconstrained optimization problem
\[
\| \varepsilon_x^{(\ell)} \|_K^2 = 1 - 2\, \ell^T R(x) + \ell^T A_{K,X}\, \ell \;\longrightarrow\; \min_{\ell \in \mathbb{R}^n}! \tag{8.36}
\]
whose unique solution is the solution to the linear system A_{K,X} ℓ = R(x).
But this already implies the pointwise optimality, which we state as follows.
Corollary 8.34. Let X = {x_1, …, x_n} ⊂ R^d and x ∈ R^d. Then, the pointwise error functional ε_x in (8.29) is norm-minimal among all error functionals of the form (8.35), where
\[
\| \varepsilon_x \|_K < \| \varepsilon_x^{(\ell)} \|_K \qquad \text{for all } \ell \in \mathbb{R}^n \text{ with } A_{K,X}\, \ell \neq R(x),
\]
i.e., ε_x is the unique solution to the optimization problem (8.36).
for all s, s̃ ∈ SX .
Proof. For s, s̃ ∈ S_X we have the Lagrange representations
\[
s(x) = \langle s_X, \ell(x) \rangle = \sum_{j=1}^{n} s(x_j)\, \ell_j(x) \qquad \text{and} \qquad \tilde s(x) = \langle \tilde s_X, \ell(x) \rangle = \sum_{k=1}^{n} \tilde s(x_k)\, \ell_k(x)
\]
according to (8.9) in Proposition 8.4. From this and Proposition 8.16, we get
\[
(s, \tilde s)_K = \sum_{j,k=1}^{n} s(x_j)\, \tilde s(x_k)\, (\ell_j, \ell_k)_K = \sum_{j,k=1}^{n} s(x_j)\, \tilde s(x_k)\, a_{jk}^{-1} = \langle s_X, \tilde s_X \rangle_{A_{K,X}^{-1}},
\]
FΩ := span {K(·, y) | y ∈ Ω} ⊂ F
i.e., f ∈ FΩ . Finally,
\[
h_{X,\Omega} := \sup_{y \in \Omega} \min_{x \in X} \| y - x \|_2 \tag{8.40}
\]
\[
X_1 \subset X_2 \subset X_3 \subset \ldots \subset X_n \subset \ldots \subset \Omega \tag{8.41}
\]
\[
\| y - x_n \|_2 \le h_{X_n,\Omega} \longrightarrow 0 \qquad \text{for } n \to \infty.
\]
Moreover, we have
\[
\eta_K(K(\cdot, y), S_{X_n})^2 \le \| K(\cdot, x_n) - K(\cdot, y) \|_K^2 = 2 - 2K(y, x_n) \longrightarrow 0
\]
For any y_j ∈ Y, 1 ≤ j ≤ N, we take a sequence (x_n^{(j)})_{n∈N} ⊂ Ω of interpolation points x_n^{(j)} ∈ X_n satisfying ‖y_j − x_n^{(j)}‖₂ ≤ h_{X_n,Ω}. Moreover, we consider the functions
\[
s_{c,n} = \sum_{j=1}^{N} c_j\, K(\cdot, x_n^{(j)}) \in S_{X_n} \qquad \text{for } n \in \mathbb{N}.
\]
Then, we have
\[
\begin{aligned}
\eta_K(f_{c,Y}, S_{X_n}) \le \| s_{c,n} - f_{c,Y} \|_K
&= \Big\| \sum_{j=1}^{N} c_j \big( K(\cdot, x_n^{(j)}) - K(\cdot, y_j) \big) \Big\|_K \\
&\le \sum_{j=1}^{N} |c_j| \cdot \| K(\cdot, x_n^{(j)}) - K(\cdot, y_j) \|_K \longrightarrow 0 \qquad \text{for } n \to \infty.
\end{aligned}
\]
SΩ := {fc,Y ∈ SY | |Y | < ∞} ⊂ FΩ .
of the interpolant
\[
s_{n+1} = \sum_{j=1}^{n+1} c_j^{(n+1)}\, K(\cdot, x_j) \in S_{X_{n+1}}
\]
For the special case of kernel-based interpolation from finite data, we can
characterize Riesz bases in a rather straightforward manner: For a finite set
X = {x1 , . . . , xn } ⊂ Rd of pairwise distinct interpolation points, the basis
functions BX = {K(·, xj )}nj=1 ⊂ SX are (obviously) a Riesz basis of SX ,
where we have the Riesz stability estimate
\[
\sigma_{\min}(A_{K,X})\, \|c\|_2^2 \;\le\; \Big\| \sum_{j=1}^{n} c_j\, K(\cdot, x_j) \Big\|_K^2 \;\le\; \sigma_{\max}(A_{K,X})\, \|c\|_2^2 \tag{8.53}
\]
the representation
Therefore, the stated Riesz stability estimate in (8.53) holds by the Courant6 -
Fischer7 theorem, which should be familiar from linear algebra. In fact, ac-
cording to the Courant-Fischer theorem, the minimal eigenvalue σmin (A) and
the maximal eigenvalue σmax (A) of a symmetric matrix A can be represented
by the minimal and the maximal Rayleigh8 quotient, respectively, i.e.,
\[
\sigma_{\min}(A) = \min_{c \in \mathbb{R}^n \setminus \{0\}} \frac{\langle c, Ac \rangle}{\langle c, c \rangle} \qquad \text{and} \qquad \sigma_{\max}(A) = \max_{c \in \mathbb{R}^n \setminus \{0\}} \frac{\langle c, Ac \rangle}{\langle c, c \rangle}.
\]
By Theorem 6.31, any Riesz basis B has a unique dual Riesz basis B̃.
Now let us determine the dual Riesz basis of BX = {K(·, xj )}nj=1 ⊂ SX . To
this end, we rely on the results from Section 6.2.2. By Theorem 6.31, we can
identify the Lagrange basis of SX as dual to BX , i.e., B̃X = { 1 , . . . , n } ⊂ SX .
for all s ∈ SX .
Due to Theorem 6.31, the Lagrange basis B̃X = { j }nj=1 ⊂ SX is the uniquely
determined dual Riesz basis of BX = {K(·, xj )}nj=1 ⊂ SX .
Moreover, by Proposition 8.36, the representation
\[
\Big\| \sum_{j=1}^{n} f(x_j)\, \ell_j \Big\|_K^2 = f_X^T A_{K,X}^{-1} f_X = \langle f_X, f_X \rangle_{A_{K,X}^{-1}} \qquad \text{for all } f_X \in \mathbb{R}^n
\]
6 Richard Courant (1888-1972), German-US American mathematician
7 Ernst Sigismund Fischer (1875-1954), Austrian mathematician
8 John William Strutt, 3. Baron Rayleigh (1842-1919), English physicist
\[
\sigma_{\min}(A_{K,X}^{-1})\, \|f_X\|_2^2 \;\le\; f_X^T A_{K,X}^{-1} f_X \;\le\; \sigma_{\max}(A_{K,X}^{-1})\, \|f_X\|_2^2
\]
hold for all f_X ∈ R^n. This implies the stability estimate in (8.55), where
\[
\sigma_{\max}^{-1}(A_{K,X}) = \sigma_{\min}(A_{K,X}^{-1}) \qquad \text{and} \qquad \sigma_{\min}^{-1}(A_{K,X}) = \sigma_{\max}(A_{K,X}^{-1}).
\]
From the Riesz duality relation between the bases BX = {K(·, xj )}nj=1 and
B̃X = { j }nj=1 , in combination with Theorem 6.31, in particular with (6.22),
we can conclude another important result.
hold.
from Proposition 8.16. On the other hand, we have (f, K(·, x_j))_K = f(x_j) by the reproduction property of the kernel K, for all 1 ≤ j ≤ n.
For our subsequent analysis, we equip C (Ω) with the maximum norm
· ∞ . Moreover, for any set of interpolation points X = {x1 , . . . , xn } ⊂ Ω,
we denote by I_X : C(Ω) −→ S_X the interpolation operator for X, which assigns to every function f ∈ C(Ω) its unique interpolant s ∈ S_X satisfying s_X = f_X.
\[
\Lambda_\infty := \max_{x \in \Omega} \sum_{j=1}^{n} |\ell_j(x)| = \max_{x \in \Omega} \| \ell(x) \|_1, \tag{8.58}
\]
i.e., ‖I_X‖_∞ = Λ_∞.
\[
\| I_X f \|_\infty = \| s \|_\infty \le \max_{x \in \Omega} \sum_{j=1}^{n} |\ell_j(x)| \cdot |f(x_j)| \le \Lambda_\infty \cdot \| f \|_\infty,
\]
\[
\| I_X g \|_\infty \ge (I_X g)(x^*) = \sum_{j=1}^{n} \ell_j(x^*)\, g(x_j) = \sum_{j=1}^{n} |\ell_j(x^*)| = \Lambda_\infty
\]
Proof. We first prove the upper bound in (8.59). To this end, we assume that
the maximum in (8.58) is attained at x∗ ∈ Ω. Then, from Example 8.13 and
Proposition 8.16, we get the first upper bound in (8.59) by
\[
\Lambda_\infty = \sum_{j=1}^{n} |\ell_j(x^*)| = \sum_{j=1}^{n} |\delta_{x^*}(\ell_j)|
\le \sum_{j=1}^{n} \| \delta_{x^*} \|_K \cdot \| \ell_j \|_K = \sum_{j=1}^{n} \| \ell_j \|_K = \sum_{j=1}^{n} \sqrt{a_{jj}^{-1}}.
\]
\[
a_{jj}^{-1} \le \sigma_{\max}(A_{K,X}^{-1}) \qquad \text{for all } 1 \le j \le n.
\]
To make a compromise between the data error in (8.60) and the smoothness in (8.61), we consider in the remainder of this section the minimization of the cost functional Jα : S −→ R, defined as
The term αJ(s) in (8.62) is called the regularization term, which penalizes non-smooth elements s ∈ S that are admissible for the optimization problem. Moreover, the regularization parameter α > 0 is used to balance between the data error η_X(f, s) and the smoothness J(s) of s.
Therefore, we can view the approximation method of this section as a
regularization method (see Section 2.2). According to the jargon of approxi-
mation theory, the proposed method of this section is also referred to as
penalized least squares approximation (see, e.g. [30]).
According to Remark 4.2, the best approximation s∗α is unique and, moreover,
characterized by the orthogonality condition
(sα )X = AX,Y cα ∈ RN ,
and so we obtain the normal equation in (8.69): On the one hand, the left
hand side in (8.69) can be written as
1 1
(f − sα )X , R(yk ) = RT (yk )fX − RT (yk )AX,Y cα
N N
1
= eTk ATX,Y fX − eTk ATX,Y AX,Y cα .
N
On the other hand, the right hand side in (8.69) can be written as
of the kernel K.
\[
\begin{aligned}
\frac{1}{N} \| (s_\alpha - f)_X \|_2^2 + \alpha \| s_\alpha \|_K^2
&= \frac{1}{N} \sum_{k=1}^{N} | s_\alpha(x_k) - f(x_k) |^2 + \alpha \| s_\alpha \|_K^2 \\
&\le \frac{1}{N} \sum_{k=1}^{N} | s_f(x_k) - f(x_k) |^2 + \alpha \| s_f \|_K^2 \\
&\le \frac{1}{N} \sum_{x \in X \setminus Y} \| \varepsilon_x \|_K^2 \cdot \| f \|_K^2 + \alpha \| f \|_K^2 \\
&= \Bigg( \frac{1}{N} \sum_{x \in X \setminus Y} \| \varepsilon_x \|_K^2 + \alpha \Bigg) \| f \|_K^2 \\
&\le \left( \frac{N-n}{N} + \alpha \right) \| f \|_K^2 \le (1 + \alpha) \| f \|_K^2,
\end{aligned}
\]
where we use the pointwise error estimate in (8.30) along with the uniform estimate ‖ε_x‖_K ≤ 1 in (8.32).
Next, we analyze the sensitivity of problem (Pα ) under variation of the
smoothing parameter α ≥ 0. To this end, we first observe that the solution
sα ≡ sα (f ) of problem (Pα ) coincides with that of the target function s0 , i.e.,
sα (s0 ) = sα (f ).
Lemma 8.56. For any α ≥ 0, the solution sα ≡ sα (f ) of (Pα ) satisfies the
following properties.
(a) The Pythagoras theorem, i.e.,
‖s_α‖_K ≤ ‖s_0‖_K.
8.7 Exercises
Exercise 8.59. Let K : Rd × Rd −→ R be a continuous symmetric function,
for d > 1. Moreover, suppose that for some n ∈ N all symmetric matrices of
the form
AK,X = (K(xk , xj ))1≤j,k≤n ∈ Rn×n ,
for sets X = {x1 , . . . , xn } ⊂ Rd of n pairwise distinct points, are regular.
Show that all symmetric matrices AK,X ∈ Rn×n are positive definite, as
soon as there is one point set Y = {y1 , . . . , yn } ⊂ Rd for which the matrix
AK,Y ∈ Rn×n is symmetric positive definite.
Hint: Proof of the Mairhuber-Curtis theorem, Theorem 5.25.
\[
S_{c,X} \equiv 0 \;\Longrightarrow\; c = 0
\]
by using the n linear conditions S_{c,X}^{(k)}(0) = 0, for 0 ≤ k < n. Finally, to prove the assertion for the multivariate case, d > 1, use the separation of the components in e^{i⟨x_j, ω⟩}, for ω = (ω_1, …, ω_d)^T ∈ R^d and 1 ≤ j ≤ n.
Exercise 8.62. Let K ∈ PDd . Show that the native space norm · K of
the Hilbert space F ≡ FK is stronger than the maximum norm · ∞ , i.e.,
if a sequence (fn )n∈N of functions in F converges w.r.t. · K to f ∈ F, so
that fn − f K −→ 0 for n → ∞, then (fn )n∈N does also converge w.r.t. the
maximum norm · ∞ to f , so that fn − f ∞ −→ 0 for n → ∞.
Exercise 8.63. Let H be a Hilbert space of functions with reproducing ker-
nel K ∈ PDd . Show that H is the native Hilbert space of K, i.e., FK = H.
Hint: First, show the inclusion FK ⊂ H. Then, consider the direct sum
H = FK ⊕ G
around zero, for some r > 0 and some C > 0. Show that in this case, every
f ∈ F ≡ F_K is globally Hölder continuous with Hölder exponent α/2, i.e.,
\[
|f(x) - f(y)| \le C\, \|x - y\|_2^{\alpha/2} \qquad \text{for all } x, y \in \mathbb{R}^d.
\]
around zero, for some r > 0 and C > 0. Moreover, for compact Ω ⊂ Rd , let
(Xn )n∈N be a nested sequence of subsets Xn ⊂ Ω, as in (8.41), whose mono-
tonically decreasing fill distances hXn ,Ω are a zero sequence, i.e., hXn ,Ω 0
for n → ∞. Show for f ∈ FΩ the uniform convergence
\[
\| s_{f,X_n} - f \|_\infty = \mathcal{O}\big( h_{X_n,\Omega}^{\alpha/2} \big) \qquad \text{for } n \to \infty.
\]
Determine from this result the convergence rate for the special case of the
Gauss kernel in Example 8.9.
of the Cholesky factor L̄n+1 in (8.49) is positive. To this end, first show the
representation
\[
1 - S_n^T D_n^{-1} S_n = \| \varepsilon_{x_{n+1},X_n} \|_K^2,
\]
where εxn+1 ,Xn is the error functional in (8.29) at xn+1 ∈ Xn+1 \ Xn with
respect to the set of interpolation points Xn .
9 Computerized Tomography
Fig. 9.1. X-ray beam travelling from emitter xE to detector xD along [xE , xD ] ⊂ Ω.
\[
\frac{dI(x)}{dx} = -f(x)\, I(x) \tag{9.1}
\]
the rate of change for the X-ray intensity I(x) at x is quantified by the factor
f (x), where f (x) is referred to as the attenuation-coefficient function. There-
fore, the attenuation-coefficient function f (x) yields the energy absorption on
the computational domain Ω, and so f (x) represents an important material
property of the scanned medium.
In the remainder of this chapter, we are interested in the reconstruction
of f (x). To this end, we further study the differential equation (9.1). By
integrating (9.1) along the straight line segment [xE , xD ], we determine the
loss of intensity (or, the loss of energy) of the X-ray beam on [xE , xD ] by
\[
\int_{x_E}^{x_D} \frac{dI(x)}{I(x)} = - \int_{x_E}^{x_D} f(x)\, dx. \tag{9.2}
\]
Fig. 9.2. Representation of the straight line ℓ_{t,θ} ⊂ R² by coordinates (t, θ) ∈ R × [0, π), with the unit vectors n_θ = (cos(θ), sin(θ)) and n_θ^⊥ = (−sin(θ), cos(θ)), and the foot point x = (t cos(θ), t sin(θ)).
For the parameterization of a straight line t,θ , for (t, θ) ∈ R × [0, π), we
use the standard point-vector representation, whereby any point (x, y) ∈ t,θ
in t,θ is uniquely represented as a linear combination of the form
\[
(x, y) = t \cdot n_\theta + s \cdot n_\theta^{\perp} \tag{9.5}
\]
with the unit vectors n_θ = (cos(θ), sin(θ)) and n_θ^⊥ = (−sin(θ), cos(θ)), or
\[
\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} \cos(\theta) & -\sin(\theta) \\ \sin(\theta) & \cos(\theta) \end{pmatrix} \cdot \begin{pmatrix} t \\ s \end{pmatrix} = Q_\theta \cdot \begin{pmatrix} t \\ s \end{pmatrix} \tag{9.6}
\]
with the rotation matrix Q_θ ∈ R^{2×2}. The inverse of the orthogonal matrix Q_θ is given by the rotation matrix Q_{−θ} = Q_θ^T, whereby the representation
\[
\begin{pmatrix} t \\ s \end{pmatrix} = \begin{pmatrix} \cos(\theta) & \sin(\theta) \\ -\sin(\theta) & \cos(\theta) \end{pmatrix} \cdot \begin{pmatrix} x \\ y \end{pmatrix} = Q_\theta^T \cdot \begin{pmatrix} x \\ y \end{pmatrix} \tag{9.7}
\]
follows immediately from (9.6). Moreover, (9.6), or (9.7), yields the relation
\[
t^2 + s^2 = x^2 + y^2, \tag{9.8}
\]
that are assumed to be known for all straight lines ℓ_{t,θ}, (t, θ) ∈ R × [0, π).
3 Johann Radon (1887-1956), Austrian mathematician
For any function f ∈ L1 (R2 ), the line integral in (9.9) is, for any coordi-
nate pair (t, θ) ∈ R × [0, π), defined as
\[
\int_{\ell_{t,\theta}} f(x, y)\, d(x, y) = \int_{\mathbb{R}} f(t\cos(\theta) - s\sin(\theta),\, t\sin(\theta) + s\cos(\theta))\, ds, \tag{9.11}
\]
where we use the coordinate transform in (9.6) with the arc length element
\[
\sqrt{\dot x(s)^2 + \dot y(s)^2}\, ds = \sqrt{(-\sin(\theta))^2 + (\cos(\theta))^2}\, ds = ds
\]
Before we turn to the solution of Problem 9.5, we first give some elementary examples of Radon transforms. We begin with the indicator function (i.e., the characteristic function) of the disk B_r = {x ∈ R² | ‖x‖₂ ≤ r}, for r > 0.
\[
f(x, y) = \chi_{B_r}(x, y) := \begin{cases} 1 & \text{for } x^2 + y^2 \le r^2, \\ 0 & \text{for } x^2 + y^2 > r^2, \end{cases}
\]
\[
f(t\cos(\theta) - s\sin(\theta),\, t\sin(\theta) + s\cos(\theta)) = \begin{cases} 1 & \text{for } t^2 + s^2 \le r^2, \\ 0 & \text{for } t^2 + s^2 > r^2. \end{cases}
\]
Note that Rf(t, θ) = 0, if and only if the straight line ℓ_{t,θ} does not intersect the interior of the disk B_r, i.e., if and only if |t| ≥ r. Otherwise, i.e., for |t| < r, we obtain
\[
Rf(t, \theta) = \int_{\ell_{t,\theta}} f(x, y)\, d(x, y) = \int_{-\sqrt{r^2 - t^2}}^{\sqrt{r^2 - t^2}} 1\, ds = 2\sqrt{r^2 - t^2}
\]
Remark 9.8. For any radially symmetric function f(·) = f(‖·‖₂), the Radon transform Rf(t, θ) depends only on t ∈ R, but not on the angle θ ∈ [0, π). Indeed, in this case, we have the identity
\[
Rf(t, \theta) = \int_{\ell_{t,\theta}} f(\|x\|_2)\, dx = \int_{\ell_{t,0}} f(\|Q_\theta x\|_2)\, dx = \int_{\ell_{t,0}} f(\|x\|_2)\, dx = Rf(t, 0)
\]
Fig. 9.3. Bull’s eye and its Radon transform (see Example 9.9).
Example 9.9. The phantom bull’s eye is given by the linear combination
\[
f(x, y) = \chi_{B_{3/4}}(x, y) - \frac{3}{4}\, \chi_{B_{1/2}}(x, y) + \frac{1}{4}\, \chi_{B_{1/4}}(x, y) \tag{9.12}
\]
of three indicator functions χ_{B_r} of the disks B_r, for r = 3/4, 1/2, 1/4. To compute Rf, we apply the linearity of the operator R, whereby
\[
Rf(t, \theta) = (R\chi_{B_{3/4}})(t, \theta) - \frac{3}{4}\, (R\chi_{B_{1/2}})(t, \theta) + \frac{1}{4}\, (R\chi_{B_{1/4}})(t, \theta). \tag{9.13}
\]
Due to the radial symmetry of f (or, of χBr ), the Radon transform Rf (t, θ)
does depend on t, but not on θ (cf. Remark 9.8). Now we can use the result
of Example 9.6 to represent the Radon transform Rf in (9.13) by linear com-
bination of the Radon transforms RχBr , for r = 3/4, 1/2, 1/4. The phantom
f and its Radon transform Rf are shown in Figure 9.3. ♦
t = x cos(θ) + y sin(θ),
see (9.7), and so this condition on t is also sufficient. Therefore, only the straight lines ℓ_{x cos(θ)+y sin(θ), θ}, for θ ∈ [0, π), contain the point (x, y).
contain the point (x, y). This observation leads us to the following definition
for the back projection operator.
Remark 9.12. The back projection B is not the inverse of the Radon trans-
form R. To see this, we make a simple counterexample. We consider the indi-
cator function f := χB1 ∈ L1 (R2 ) of the unit ball B1 = {x ∈ R2 | x2 ≤ 1},
whose (non-negative) Radon transform
\[
Rf(t, \theta) = \begin{cases} 2\sqrt{1 - t^2} & \text{for } |t| \le 1, \\ 0 & \text{for } |t| > 1 \end{cases}
\]
Fig. 9.5. The Shepp-Logan phantom f and its back projection B(Rf ).
The following result will lead us directly to the inversion of the Radon
transform. In fact, the Fourier slice theorem (also often referred to as central
slice theorem) is an important result in Fourier analysis.
Theorem 9.14. (Fourier slice theorem). For f ∈ L1 (R2 ), we have
F2 f (S cos(θ), S sin(θ)) = F1 (Rf )(S, θ) for all S ∈ R, θ ∈ [0, π). (9.14)
Proof. For f ≡ f(x, y) ∈ L¹(R²), we consider the Fourier transform
\[
\mathcal{F}_2 f(S\cos(\theta), S\sin(\theta)) = \int_{\mathbb{R}} \int_{\mathbb{R}} f(x, y)\, e^{-iS(x\cos(\theta) + y\sin(\theta))}\, dx\, dy \tag{9.15}
\]
at (S, θ) ∈ R × [0, π). By the variable transformation (9.6), the right hand side in (9.15) can be represented as
\[
\int_{\mathbb{R}} \int_{\mathbb{R}} f(t\cos(\theta) - s\sin(\theta),\, t\sin(\theta) + s\cos(\theta))\, e^{-iSt}\, ds\, dt,
\]
or, as
\[
\int_{\mathbb{R}} \left( \int_{\mathbb{R}} f(t\cos(\theta) - s\sin(\theta),\, t\sin(\theta) + s\cos(\theta))\, ds \right) e^{-iSt}\, dt.
\]
Note that the inner integral coincides with the Radon transform Rf(t, θ). But this already implies the stated identity
\[
\mathcal{F}_2 f(S\cos(\theta), S\sin(\theta)) = \int_{\mathbb{R}} Rf(t, \theta)\, e^{-iSt}\, dt = \mathcal{F}_1(Rf)(S, \theta).
\]
is, for L > 0, the indicator function of the interval [−L, L], and we let := 1 .
Example 9.18. The Ram-Lak filter FRL is given by the window
so that
\[
F_{\mathrm{RL}}(S) = |S| \cdot 1_L(S) = \begin{cases} |S| & \text{for } |S| \le L, \\ 0 & \text{for } |S| > L. \end{cases}
\]
The Ram-Lak filter is shown in Figure 9.6 (a). ♦
Example 9.19. The Shepp-Logan filter FSL is given by the window
so that
\[
F_{\mathrm{SL}}(S) = |S| \cdot \frac{\sin(\pi S/(2L))}{\pi S/(2L)} \cdot 1_L(S) = \begin{cases} \dfrac{2L}{\pi} \cdot |\sin(\pi S/(2L))| & \text{for } |S| \le L, \\ 0 & \text{for } |S| > L. \end{cases}
\]
Fig. 9.6. Three commonly used low-pass filters (see Examples 9.18-9.20).
Fig. 9.7. The Hamming filter Fβ for β ∈ {0.5, 0.6, 0.7} (see Example 9.21).
Fig. 9.8. The Gauss filter Fα for α ∈ {2.5, 5.0, 10.0} (see Example 9.22).
so that
\[
F_{\mathrm{CF}}(S) = |S| \cdot \cos(\pi S/(2L)) \cdot 1_L(S) = \begin{cases} |S| \cdot \cos(\pi S/(2L)) & \text{for } |S| \le L, \\ 0 & \text{for } |S| > L. \end{cases}
\]
Note that the Hamming filter Fβ is a combination of the Ram-Lak filter FRL
and the cosine filter FCF . The Hamming filter Fβ is shown in Figure 9.7, for
β ∈ {0.5, 0.6, 0.7}. ♦
The Gauss filter Fα is shown in Figure 9.8, for α ∈ {2.5, 5.0, 10.0}. ♦
For θ ∈ [0, π) and functions g(·, θ), h(·, θ) ∈ L1 (R), the convolution g ∗ h
between g and h is defined as
\[
(g * h)(T, \theta) = \int_{\mathbb{R}} g(T - t, \theta)\, h(t, \theta)\, dt \qquad \text{for } T \in \mathbb{R}.
\]
Theorem 9.24. For h ∈ L1 (R×[0, π)) and f ∈ L1 (R2 ), we have the relation
B (h ∗ Rf ) (X, Y ) = (Bh ∗ f ) (X, Y ) for all (X, Y ) ∈ R2 . (9.19)
Proof. For the right hand side in (9.19), we obtain the representation
\[
\begin{aligned}
(Bh * f)(X, Y)
&= \int_{\mathbb{R}} \int_{\mathbb{R}} (Bh)(X - x, Y - y)\, f(x, y)\, dx\, dy \\
&= \frac{1}{\pi} \int_{\mathbb{R}} \int_{\mathbb{R}} \left( \int_0^{\pi} h((X - x)\cos(\theta) + (Y - y)\sin(\theta), \theta)\, d\theta \right) f(x, y)\, dx\, dy.
\end{aligned}
\]
By variable transformation on (x, y) by (9.6) and dx dy = ds dt, we obtain
\[
\begin{aligned}
(Bh * f)(X, Y)
&= \frac{1}{\pi} \int_0^{\pi} \int_{\mathbb{R}} h(X\cos(\theta) + Y\sin(\theta) - t, \theta)\, (Rf)(t, \theta)\, dt\, d\theta \\
&= \frac{1}{\pi} \int_0^{\pi} (h * Rf)(X\cos(\theta) + Y\sin(\theta), \theta)\, d\theta \\
&= B(h * Rf)(X, Y)
\end{aligned}
\]
for all (X, Y) ∈ R².
Theorem 9.24 and (9.18) provide a very useful representation for fF ,
where we use the inverse Fourier transform F1−1 F of the filter F by
(F1−1 F )(t, θ) := (F −1 F )(t) for t ∈ R and θ ∈ [0, π)
as a bivariate function.
Corollary 9.25. Let f ∈ L1 (R2 ). Moreover, let F be a filter satisfying
F1−1 F ∈ L1 (R × [0, π)). Then, the representation
\[
f_F(x, y) = \frac{1}{2} \big( B(\mathcal{F}_1^{-1}F) * f \big)(x, y) = (K_F * f)(x, y), \tag{9.20}
\]
holds, where
\[
K_F(x, y) := \frac{1}{2}\, B\big( \mathcal{F}_1^{-1} F \big)(x, y)
\]
denotes the convolution kernel of the low-pass filter F.
Remark 9.26. The statement of Corollary 9.25 does also hold without the
assumption F1−1 F ∈ L1 (R×[0, π)), see [5]. Therefore, in Section 9.4, we apply
Corollary 9.25 without any assumptions on the low-pass filter F .
where
\[
\Phi_{\alpha,W}(L) := \sup_{S \in [-1,1]} \frac{(1 - W(S))^2}{(1 + L^2 S^2)^\alpha} \qquad \text{for } L > 0. \tag{9.23}
\]
4 Sergei Lvovich Sobolev (1908-1989), Russian mathematician
\[
\| f - f * K_F \|_{L^2(\mathbb{R}^2)}^2
= \frac{1}{4\pi^2} \| \mathcal{F}f - \mathcal{F}f \cdot \mathcal{F}K_F \|_{L^2(\mathbb{R}^2;\mathbb{C})}^2
= \frac{1}{4\pi^2} \| \mathcal{F}f - W_L \cdot \mathcal{F}f \|_{L^2(\mathbb{R}^2;\mathbb{C})}^2, \tag{9.24}
\]
where for the scaled window W_L(S) := W(S/L), S ∈ R, we used the identity
(see Exercise 9.44). Since supp(W_L) ⊂ [−L, L], we can split the square error in (9.24) into a sum of two integrals,
\[
\begin{aligned}
\frac{1}{4\pi^2} \| \mathcal{F}f - W_L \cdot \mathcal{F}f \|_{L^2(\mathbb{R}^2;\mathbb{C})}^2
&= \frac{1}{4\pi^2} \int_{\|(x,y)\|_2 \le L} | (\mathcal{F}f - W_L \cdot \mathcal{F}f)(x, y) |^2\, d(x, y) \quad (9.26) \\
&\quad + \frac{1}{4\pi^2} \int_{\|(x,y)\|_2 > L} | \mathcal{F}f(x, y) |^2\, d(x, y). \quad (9.27)
\end{aligned}
\]
Finally, the sum of the two upper bounds in (9.29) and in (9.28) yields the stated error estimate in (9.22).
Remark 9.28. For the Ram-Lak filter from Example 9.18, we have W ≡ 1
on [−1, 1], and so Φα,W ≡ 0. In this case, Theorem 9.27 yields the error
estimate
\[
\Phi_{\alpha,W}(L) = \max_{S \in [0,1]} \frac{(1 - W(S))^2}{(1 + L^2 S^2)^\alpha} \longrightarrow 0 \qquad \text{for } L \to \infty. \tag{9.30}
\]
Proof. Let S*_{α,W,L} ∈ [0, 1] be the smallest maximizer on [0, 1] of the function
\[
\Phi_{\alpha,W,L}(S) := \frac{(1 - W(S))^2}{(1 + L^2 S^2)^\alpha} \qquad \text{for } S \in [0, 1].
\]
Case 1: Suppose S*_{α,W,L} is uniformly bounded away from zero, i.e., we have S*_{α,W,L} ≥ c > 0 for all L > 0, for some c ≡ c_{α,W} > 0. Then,
\[
0 \le \Phi_{\alpha,W,L}\big( S^*_{\alpha,W,L} \big) = \frac{\big( 1 - W(S^*_{\alpha,W,L}) \big)^2}{\big( 1 + L^2 (S^*_{\alpha,W,L})^2 \big)^\alpha} \le \frac{\| 1 - W \|_{\infty,[-1,1]}^2}{(1 + L^2 c^2)^\alpha} \longrightarrow 0
\]
holds for L → ∞.
Case 2: Suppose S*_{α,W,L} −→ 0 for L → ∞. Then, we have
\[
0 \le \Phi_{\alpha,W,L}\big( S^*_{\alpha,W,L} \big) = \frac{\big( 1 - W(S^*_{\alpha,W,L}) \big)^2}{\big( 1 + L^2 (S^*_{\alpha,W,L})^2 \big)^\alpha} \le \big( 1 - W(S^*_{\alpha,W,L}) \big)^2 \longrightarrow 0,
\]
Fig. 9.9. Parallel beam geometry. Regular distribution of 110 Radon lines ℓ_{t_j,θ_k},
for N = 10 angles θk , 2M + 1 = 11 Radon lines per angle, at sampling rate d = 0.2.
is given by
\[
\big( \mathcal{F}^{-1} F_{\mathrm{RL}} \big)(t) = \frac{1}{\pi} \left( \frac{(Lt) \cdot \sin(Lt)}{t^2} - \frac{2 \cdot \sin^2(Lt/2)}{t^2} \right) \qquad \text{for } t \in \mathbb{R}. \tag{9.34}
\]
The evaluation of F^{−1}F_{RL} at t_j = jd, with sampling rate d = π/L > 0, yields
\[
\mathcal{F}^{-1} F_{\mathrm{RL}}(\pi j / L) = \begin{cases} L^2/(2\pi) & \text{for } j = 0, \\ 0 & \text{for } j \neq 0 \text{ even}, \\ -2L^2/(\pi^3 \cdot j^2) & \text{for } j \text{ odd}. \end{cases} \tag{9.35}
\]
Proof. The inverse Fourier transform F −1 FRL of the even function FRL is
given by the inverse cosine transform
\[
\mathcal{F}^{-1} F_{\mathrm{RL}}(t) = \frac{1}{\pi} \int_0^L S \cdot \cos(tS)\, dS.
\]
The evaluation of F −1 FSL at tj = jd, with sampling rate d = π/L > 0, yields
\[
\mathcal{F}^{-1} F_{\mathrm{SL}}(\pi j / L) = \frac{4 L^2}{\pi^3 (1 - 4 j^2)}. \tag{9.36}
\]
For the inverse Fourier transforms of the remaining filters F from Exam-
ples 9.20-9.22 we refer to Exercise 9.43.
\[
\big( \mathcal{F}_1^{-1} F * Rf \big)(t_m, \theta_k) \approx \frac{\pi}{L} \sum_{j=-M}^{M} u_{m-j} \cdot v_j \qquad \text{for } m \in \mathbb{Z} \tag{9.39}
\]
\[
h_{ik} := \frac{\pi}{L} \sum_{j=-M}^{M} \mathcal{F}_1^{-1} F\big( (i - j)\pi/L \big) \cdot Rf(t_j, \theta_k)
\]
\[
f_{nm} := \frac{1}{2N} \sum_{k=0}^{N-1} Ih\big( x_n \cos(\theta_k) + y_m \sin(\theta_k),\, \theta_k \big).
\]
(a) 2460 Radon lines, (b) 15150 Radon lines, (c) 60300 Radon lines.
9.6 Exercises
Exercise 9.33. Prove for f ∈ L1 (R2 ) the estimate
to conclude
Rf ∈ L1 (R × [0, π)) for all f ∈ L1 (R2 ),
i.e., for f ∈ L1 (R2 ), we have
Exercise 9.35. Show that the Radon transform Rf of f ∈ L1 (R2 ) has com-
pact support, if f has compact support.
Does the converse of this statement hold? I.e., does f ∈ L1 (R2 ) necessarily
have compact support, if supp(Rf ) is compact?
Exercise 9.36. Recall the rotation matrix Qθ ∈ R2×2 in (9.6) and the unit
vector nθ = (cos(θ), sin(θ))T ∈ R2 , for θ ∈ [0, π), respectively.
Prove the following properties for the Radon transform Rf of f ∈ L1 (R2 ).
Exercise 9.37. Show that for a radially symmetric function f ∈ L1 (R2 ), the
backward projection B(Rf ) of Rf is radially symmetric.
Now consider the indicator function f = χB1 of the unit ball B1 and its
Radon transform Rf from Example 9.6. Show that the backward projection
B(Rf ) of Rf is positive on the open annulus
√ 3 √ 4
R1 2 = x ∈ R2 1 < x2 < 2 ⊂ R2 .
Exercise 9.39. Show that the backward projection B is (up to factor π) the
adjoint operator of the Radon transform R. To this end, prove the relation
for the initial value ∧0 := and where, moreover, the positive scaling factor
αk > 0 in (9.46) is chosen, such that supp(∧k ) = [−1, 1].
In this exercise, we construct a spline filter of second order.
(a) Show that the initial value ∧0 yields the Ram-Lak filter, i.e., F0 ≡ FRL .
(b) Show that the scaling factor αk > 0 in (9.46) is, for any k ∈ N, uniquely
determined by the requirement supp(∧k ) = [−1, 1].
(c) Show that ∧1 generates by F1 the spline filter from Exercise 9.40.
Determine the scaling factor α1 of F1 .
(d) Compute the second order spline filter F2 . To this end, determine the
B-spline ∧2 in (9.46), along with its scaling factor α2 .
Exercise 9.42. Develop a construction scheme for higher order spline filters
Fk of the form (9.45), where k ≥ 3. To this end, apply the recursion in (9.46)
and determine the scaling factors αk , for k ≥ 3.
(F −1 F )(πj/L) for j ∈ Z.
(a) 630 Radon lines, (b) 2460 Radon lines, (c) 15150 Radon lines.
Fig. 9.11. Reconstruction of bull’s eye from Example 9.9 (see Figure 9.3).
Subject Index
Fourier
– coefficient, 48, 112
– convolution theorem, 247
– inversion formula, 250, 251, 255, 258
– matrix, 53
– operator, 118, 240, 250, 258
– partial sum, 112
– series, 118
– slice theorem, 327
– spectrum, 239, 241
– transform, 240, 258, 327
frame, 202
frequency spectrum, 239
functional
– bounded, 84
– continuous, 64
– convex, 74
– dual, 84
– linear, 84
Gâteaux derivative, 87
Gauss
– filter, 333
– function, 245, 259, 281
– normal equation, 11
Hölder inequality, 77, 79
Haar
– space, 158
– system, 158
– wavelet, 261
Hermite
– function, 138, 252
– Genocchi formula, 36
– polynomials, 130
Hilbert space, 69
indicator function, 261
inequality
– Bessel, 109
– Hölder, 77, 79
– Jensen, 73
– Minkowski, 78, 79
– Young, 77
Jackson theorems, 217
Jensen inequality, 73
Kolmogorov criterion, 92
Lagrange
– basis, 278
– polynomial, 21
– representation, 21, 278
Lebesgue
– constant, 211, 305
– integrable, 79
Legendre polynomial, 127
Leibniz formula, 37
Lemma
– Aitken, 26
– Riemann-Lebesgue, 242
Lipschitz
– constant, 222
– continuity, 222
low-pass filter, 329
matrix
– alternation, 164
– design, 10
– Gram, 106, 286
– Toeplitz, 57
– unitriangular, 299
– Vandermonde, 20, 276
minimal
– distance, 61
– sequence, 69
Minkowski inequality, 78, 79
modulus of continuity, 224
multiresolution analysis, 266
Newton
– Cotes quadrature, 235
– polynomial, 28, 168
operator
– analysis, 199
– Bernstein, 188
– difference, 29
– projection, 108
– synthesis, 199
orthogonal
– basis, 106
– complement, 108, 266
– projection, 104, 108, 265
– system, 196
orthonormal
– basis, 107, 267
– system, 196, 293