
Sparse grids

Hans-Joachim Bungartz
IPVS, Universität Stuttgart,
Universitätsstraße 38, D-70569 Stuttgart, Germany
E-mail: [email protected]

Michael Griebel
Institut für Numerische Simulation, Universität Bonn,
Wegelerstraße 6, D-53113 Bonn, Germany
E-mail: [email protected]

We present a survey of the fundamentals and the applications of sparse grids,
with a focus on the solution of partial differential equations (PDEs). The
sparse grid approach, introduced in Zenger (1991), is based on a higher-dimensional
multiscale basis, which is derived from a one-dimensional multiscale
basis by a tensor product construction. Discretizations on sparse grids
involve only O(N · (log N)^{d−1}) degrees of freedom, where d denotes the
underlying problem's dimensionality and where N is the number of grid points in
one coordinate direction at the boundary. The accuracy obtained with piecewise
linear basis functions, for example, is O(N^{−2} · (log N)^{d−1}) with respect
to the L2- and L∞-norm, if the solution has bounded second mixed derivatives.
In this way, the curse of dimensionality, i.e., the exponential dependence
O(N^d) of conventional approaches, is overcome to some extent. For the energy
norm, only O(N) degrees of freedom are needed to give an accuracy of
O(N^{−1}). That is why sparse grids are especially well-suited for problems of
very high dimensionality.
The sparse grid approach can be extended to nonsmooth solutions by adaptive
refinement methods. Furthermore, it can be generalized from piecewise linear
to higher-order polynomials. Also, more sophisticated basis functions like
interpolets, prewavelets, or wavelets can be used in a straightforward way.
We describe the basic features of sparse grids and report the results of various
numerical experiments for the solution of elliptic PDEs as well as for other
selected problems such as numerical quadrature and data mining.

https://doi.org/10.1017/S0962492904000182 Published online by Cambridge University Press


CONTENTS
1 Introduction
2 Breaking the curse of dimensionality
3 Piecewise linear interpolation on sparse grids
4 Generalizations, related concepts, applications
5 Numerical experiments
6 Concluding remarks
References

1. Introduction
The discretization of PDEs by conventional methods is limited to problems
with up to three or four dimensions, due to storage requirements and
computational complexity. The reason is the so-called curse of dimensionality,
a term coined in Bellman (1961). Here, the cost of computing and
representing an approximation with a prescribed accuracy ε depends
exponentially on the dimensionality d of the considered problem. We encounter
complexities of the order O(ε^{−αd}) with α > 0 depending on the respective
approach, the smoothness of the function under consideration, and the
details of the implementation. If we consider simple uniform grids with
piecewise d-polynomial functions over a bounded domain in a finite element
or finite difference approach, for instance, this complexity estimate
translates to O(N^d) grid points or degrees of freedom, for which approximation
accuracies of the order O(N^{−α}) are achieved, where α depends on the
smoothness of the function under consideration and the polynomial degree
of the approximating functions.^1 Thus, the computational cost and storage
requirements grow exponentially with the dimensionality of the problem,
which is the reason for the dimensional restrictions mentioned above, even
on the most powerful machines presently available.
The curse of dimensionality can be circumvented to some extent by restricting
the class of functions under consideration. If we make a stronger
assumption on the smoothness of the solution such that the order of accuracy
depends on d in a negative exponential way, i.e., it behaves like
O(N^{−β·d}), we directly see that the cost–benefit ratio is independent of d:
with M = N^d degrees of freedom, the accuracy is of the order
O((N^d)^{−β}) = O(M^{−β}), with β independent of d. In this way, the curse
of dimensionality can be broken easily. However, such an assumption is
somewhat unrealistic.

1
If the solution is not smooth but possesses singularities, the order α of accuracy
deteriorates. Adaptive refinement/nonlinear approximation is employed with success. In
the best case, the cost–benefit ratio of a smooth solution can be recovered.



In the sparse grid method, in principle, we follow this approach, but
we only assume the functions to live in spaces of functions with bounded
mixed derivatives instead. First, we need a 1D multilevel basis, preferably
an H^1- and L2-stable one. Then, if we express a 1D function as usual as
a linear combination of these basis functions, the corresponding coefficients
decrease from level to level at a rate which depends on the smoothness
of the function and on the chosen set of basis functions. From this, by
a simple tensor product construction, we obtain a multilevel basis for the
higher-dimensional case. Note that here 1D bases living on different levels
are used in the product construction, i.e., we obtain basis functions with
anisotropic support in the higher-dimensional case. Now, if the function
to be represented has bounded second mixed derivatives and if we use a
piecewise linear 1D basis as a starting point, it can be shown that the
corresponding coefficients decrease with a factor proportional to 2^{−2|l|_1},
where the multi-index l = (l_1, ..., l_d) denotes the different levels involved. If we
then omit coefficients whose absolute values are smaller than a prescribed
tolerance, we obtain sparse grids. It turns out that the number of degrees
of freedom needed for some prescribed accuracy no longer depends (up to
logarithmic factors) exponentially on d. This allows substantially faster
solution of moderate-dimensional problems and can enable the solution of
higher-dimensional problems.^2
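To give a feel for these counts, the following small Python sketch compares degrees of freedom of full and sparse grids, assuming the standard sparse grid selection of all levels l with |l|_1 ≤ n + d − 1 (derived formally in Section 3) and the subspace dimensions |W_l| = 2^{|l−1|_1}; the function names are ours, introduced only for this illustration.

```python
from itertools import product

def full_grid_dof(n, d):
    # Conventional full grid: (2^n - 1)^d interior points, i.e., O(N^d).
    return (2**n - 1)**d

def sparse_grid_dof(n, d):
    # Sparse grid: sum |W_l| = 2^{|l-1|_1} over all levels with |l|_1 <= n + d - 1.
    return sum(2**(sum(l) - d)
               for l in product(range(1, n + 1), repeat=d)
               if sum(l) <= n + d - 1)

for d in (2, 3, 4):
    print(d, full_grid_dof(6, d), sparse_grid_dof(6, d))
```

For n = 6, the full grid already needs 3969 points in 2D and 250047 in 3D, while the sparse grid counts (321 in 2D) grow only mildly with d.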
Of course, this sparse grid approach is not restricted to the piecewise
linear case. Extensions of the standard piecewise linear hierarchical basis
to general polynomial degree p, as well as interpolets (see Section 4.3) or
(pre-) wavelets (see Section 4.4), have been successfully used as the
univariate ingredient for the tensor product construction. Finally, the spectrum
of applications of sparse grids ranges from numerical quadrature, via the
discretization of PDEs, to fields such as data mining.
The remainder of this paper is organized as follows. In Section 2, we
briefly discuss the breaking of the curse of dimensionality from the
theoretical point of view. Here, we resort to harmonic analysis and collect known
approaches for escaping the exponential dependence on d. It turns out that
one example of such a method simply corresponds to the assumption of
using a space of bounded mixed first variation, i.e., an anisotropic
multivariate smoothness assumption. Except for the degree of the derivative,
this resembles the explicit assumption of Section 3, where we introduce the
principles of the sparse grid technique and derive the interpolation properties
of the resulting sparse grid spaces. As a starting point, we use the

2
The constant in the corresponding complexity estimates, however, still depends
exponentially on d. This still limits the approach for PDEs to moderate-dimensional
problems. At present we are able to deal with 18-dimensional PDEs.



standard piecewise linear multiscale basis in one dimension, i.e., the Faber
basis (Faber 1909, Yserentant 1986), as the simplest example of a multiscale
series expansion which involves interpolation by piecewise linears. To
attain higher dimensions, we resort to a tensor product construction. Then,
for functions with bounded mixed second derivatives, sparse grid approximation
schemes are derived which exhibit cost complexities of the order
O(N (log N)^{d−1}) and give an error of O(N^{−2} (log N)^{d−1}) in the L2-norm.
We then show that, for our smoothness assumption, the sparse grid
discretization scheme can also be formally derived by solving an optimization
problem which is closely related to n-term approximation (DeVore 1998).
Finally, we consider the energy norm, for which optimality leads us to an
energy-based sparse grid with cost complexity O(N) and accuracy O(N^{−1}).
Thus the exponential dependence of the logarithmic terms on d is completely
removed (although it is still present in the constants).
In Section 4 we discuss generalizations of the above approach based on
piecewise linear hierarchical bases. First, instead of these, higher-order
polynomial hierarchical bases can be employed. Here we describe the
construction of multilevel polynomial bases by means of a hierarchical Lagrangian
interpolation scheme and analyse the resulting sparse grid approximation
properties. A further example of an extension is based on the interpolets
due to Deslauriers and Dubuc (1989). Such higher-order approaches allow
us to take advantage of higher regularity assumptions concerning the solution,
resulting in better approximation rates. However, all these hierarchical
multiscale bases have a crucial drawback when d > 1. They are not stable,
in the sense that the error can be estimated from above by a multilevel norm
with constants independent of the level, but not from below. Lower bounds
can be obtained by using wavelet-type multiscale bases, semi-orthogonal
prewavelets, or related biorthogonal multiscale bases instead. Again, the
tensor product construction and successive optimization then lead to sparse
grids. Finally, we close this section with a short overview of the state of the
art of sparse grid research. Here, the focus is put on the numerical treatment
of PDEs based on different discretization approaches, and we include
a short discussion of adaptive grid refinement and fast solvers in the sparse
grid context.
In Section 5 we present numerical results of selected experiments. First, to
show the properties of the sparse grid approximation, we discuss some PDE
model problems in two or three dimensions. Second, we apply sparse grids
to the solution of flow problems via the Navier–Stokes equations. Finally,
we present results for two non-PDE problem classes of high dimensionality:
numerical quadrature and data mining.
The concluding remarks of Section 6 close this discussion of sparse grid
methods.



2. Breaking the curse of dimensionality
Classical approximation schemes exhibit the curse of dimensionality
(Bellman 1961) mentioned above. We have

    ‖f − f_n‖ = O(n^{−r/d}),

where r and d denote the isotropic smoothness of the function f and the
problem's dimensionality, respectively. This is one of the main obstacles
in the treatment of high-dimensional problems. Therefore, the question is
whether we can find situations, i.e., either function spaces or error norms,
for which the curse of dimensionality can be broken. At first glance, there
is an easy way out: if we make a stronger assumption on the smoothness
of the function f such that r = O(d), then we directly obtain ‖f − f_n‖ =
O(n^{−c}) with a constant c > 0. Of course, such an assumption is completely
unrealistic.
However, about ten years ago, Barron (1993) found an interesting result.
Denote by FL1 the class of functions with Fourier transforms in L1. Then
consider the class of functions on R^d with

    ∇f ∈ FL1.

We would expect an approximation rate

    ‖f − f_n‖ = O(n^{−1/d}),

since ∇f ∈ FL1 corresponds roughly to r = 1. However, Barron was able to show

    ‖f − f_n‖ = O(n^{−1/2}),

independent of d. Meanwhile, other function classes with such properties
are known. These comprise certain radial basis schemes, stochastic sampling
techniques, and approaches that work with spaces of functions with bounded
mixed derivatives.
A better understanding of these results is possible with the help of harmonic
analysis (Donoho 2000). Here we resort to the approach of the L1-combination
of L∞-atoms; see also Triebel (1992) and DeVore (1998). Consider
the class of functions F(M) with integral representation

    f(x) = ∫ A(x, t) dµ(t)

with

    ∫ d|µ|(t) ≤ M,    (2.1)

where, for fixed t, we call A(x, t) = A_t(x) an L∞-atom if |A_t(x)| ≤ 1 holds.
Then there are results from Maurey for Banach spaces and Stechkin in
https://doi.org/10.1017/S0962492904000182 Published online by Cambridge University Press


Fourier analysis which state that there exists an n-term sum

    f_n(x) = Σ_{j=1}^{n} a_j A_{t_j}(x)

where

    ‖f − f_n‖_∞ ≤ C · n^{−1/2},

with C independent of d. In the following, we consider examples of such
spaces.
Example 1. (Radial basis schemes) Consider superpositions of Gaussian
bumps. These resemble the space F(M, Gaussians) with t := (x_0, s)
and Gaussian atoms A(x, t) = exp(−‖x − x_0‖²/s²), where ‖·‖ denotes the
Euclidean norm. Now, if the sum of the heights of all Gaussians is bounded
by M, Niyogi and Girosi (1998) showed that the resulting approximation
rate is independent of d for the corresponding radial basis schemes. There is
no further condition on the widths or positions of the bumps. Note that this
corresponds to a ball in the Besov space B^d_{1,1}(R^d), which is just Meyer's bump
algebra (Meyer 1992). Thus, we have a restriction to smoother functions in
higher dimensions such that the ratio r/d stays constant and, consequently,
n^{−r/d} does not grow with d.
Example 2. (Orthant scheme I) Another class of functions with an
approximation rate independent of d is F(M, Orthant). Now t = (x_0, k),
and k is the orthant indicator. Furthermore, A(x, t) is the indicator function
of orthant k with apex at x_0. Again, if the integral (2.1) is at most M, the
resulting approximation rate is of order O(n^{−1/2}), independent of d. A typical and
well-known example of such a construction is the cumulative distribution
function in R^d. This simply results in the Monte Carlo method.
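The connection to Monte Carlo can be made tangible with a small sketch: the sampling error decays like O(n^{−1/2}) regardless of d. The integrand below is our own toy example, not one from the references.

```python
import random

def mc_integrate(f, d, n, rng):
    # Plain Monte Carlo: average n samples of f over the unit cube [0,1]^d.
    return sum(f([rng.random() for _ in range(d)]) for _ in range(n)) / n

def f(x):
    # Toy integrand: product of (x_j + 1/2); its exact integral over [0,1]^d is 1.
    p = 1.0
    for xj in x:
        p *= xj + 0.5
    return p

rng = random.Random(42)
for d in (2, 10):
    for n in (10**3, 10**5):
        print(f"d={d:2d}  n={n:6d}  error={abs(mc_integrate(f, d, n, rng) - 1.0):.4f}")
```

Increasing n by a factor of 100 shrinks the error by roughly a factor of 10, for d = 2 and d = 10 alike.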
Example 3. (Orthant scheme II) A further interesting function class
consists of the functions which are formed by any superposition of 2^d functions,
each orthant-wise monotone for a different orthant. Now, the condition
∫ d|µ|(t) ≤ 1 is the same as

    ∂^d f / (∂x_1 · · · ∂x_d) ∈ L1,

i.e., we obtain the space of bounded mixed first variation. Again, this
means considering only functions that get smoother as the dimensionality
grows, but, in contrast to the examples mentioned above, only an anisotropic
smoothness assumption is involved. Note that this is just the prerequisite
for sparse grids with the piecewise constant hierarchical basis.
Further results on high-dimensional (and even infinite-dimensional) problems
and their tractability were recently given by Woźniakowski, Sloan,
and others (Wasilkowski and Woźniakowski 1995, Sloan and Woźniakowski
1998, Wasilkowski and Woźniakowski 1999, Sloan 2001, Hickernell, Sloan
and Wasilkowski 2003, Dick, Sloan, Wang and Woźniakowski 2003). Here,
especially in the context of numerical integration, they introduce the notion
of weighted Sobolev spaces. Having observed that for some problems the
integrand becomes less and less variable in successive coordinate directions,
they introduce a sequence of positive weights {γ_j} with decreasing values,
the weight γ_j being associated with coordinate direction j. They are
able to show that the integration problem in a particular Sobolev space
setting becomes strongly tractable (Traub and Woźniakowski 1980, Traub,
Wasilkowski and Woźniakowski 1983, 1988), i.e., that the worst-case error
for all functions in the unit ball of the weighted Sobolev space is bounded
independently of d and goes polynomially to zero, if and only if the sum of
the weights is asymptotically bounded from above. This corresponds to the
decay of the kernel contributions in a reproducing kernel Hilbert space with
rising d. The original paper (Sloan and Woźniakowski 1998) assumes that
the integrand belongs to a Sobolev space of functions with square-integrable
mixed first derivatives, with the weights built into the definition of the
associated inner product. Note that this assumption is closely related to that
of Example 3 above. Since then, more general assumptions on the weights,
and thus on the induced weighted function spaces, have been found (Dick
et al. 2003, Hickernell et al. 2003).
In any case, we observe that a certain smoothness assumption on the
function under consideration has to change with d in order to obtain
approximation rates which no longer depend exponentially on d. This raises the
question of the very meaning of smoothness as the dimension changes and
tends to ∞.
To this end, let us finally note an interesting aspect, namely the concentration
of measure phenomenon (Milman 1988, Talagrand 1995, Gromov 1999)
for probabilities in normed spaces of high dimension (also known as the
geometric law of large numbers). This is an important development in modern
analysis and geometry, manifesting itself across a wide range of mathematical
sciences, particularly geometric functional analysis, probability theory,
graph theory, diverse fields of computer science, and statistical physics. In
the statistical setting it states the following. Let f be a Lipschitz function,
with Lipschitz constant L, on the d-sphere. Let P be the normalized Lebesgue
measure on the sphere and let X be uniformly distributed with respect to P.
Then

    P(|f(X) − Ef(X)| > t) ≤ c_1 exp(−c_2 t²/L²),

with constants c_1 and c_2 independent of f and d (see Milman and Schechtman
(1986) or Baxter and Iserles (2003, Section 2.3)). In its simplest form,
the concentration of measure phenomenon states that every Lipschitz
function on a sufficiently high-dimensional domain Ω is well approximated by a



constant function (Hegland and Pestov 1999). Thus, there is some chance
of treating high-dimensional problems despite the curse of dimensionality.
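A quick numerical illustration (our own sketch): take the 1-Lipschitz function f(x) = x_1 on the unit sphere S^{d−1} and sample it uniformly; the observed spread of f(X) around its mean shrinks roughly like d^{−1/2} as the dimension grows.

```python
import math
import random

def sample_sphere(d, rng):
    # Uniform point on the unit sphere S^{d-1}: normalize a Gaussian vector.
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]

def spread(d, n, rng):
    # Empirical standard deviation of the 1-Lipschitz function f(x) = x_1.
    vals = [sample_sphere(d, rng)[0] for _ in range(n)]
    mean = sum(vals) / n
    return math.sqrt(sum((v - mean)**2 for v in vals) / n)

rng = random.Random(0)
for d in (3, 30, 300):
    print(f"d={d:4d}  std of f(X) = {spread(d, 2000, rng):.3f}")
```

The printed spreads decrease steadily with d: in high dimensions, f(X) is nearly constant.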

3. Piecewise linear interpolation on sparse grids


As a first approach to sparse grids and their underlying hierarchical multilevel
setting, we discuss the problem of interpolating smooth functions with
the help of piecewise d-linear hierarchical bases. For that, we introduce a
tensor-product-based subspace splitting and study the resulting subspaces.
Starting from their properties, sparse grids are defined via an optimization
process in a cost–benefit spirit closely related to the notion of n-term
approximation. Out of the variety of norms with respect to which such
an optimized discretization scheme can be derived, we restrict ourselves to
the L2-, the L∞-, and the energy norm, and thus to the respective types of
sparse grids. After presenting the most important approximation properties
of the latter, a short digression into recurrences and complexity will
demonstrate their asymptotic characteristics and, consequently, their potential for
problems of high dimensionality.

3.1. Hierarchical multilevel subspace splitting


In this section, the basic ingredients for our tensor-product-based hierarchical
setting are provided.

Subspace decomposition
Let us start with some notation and with the preliminaries that are necessary
for a detailed discussion of sparse grids for purposes of interpolation or
approximation, respectively. On the d-dimensional unit interval Ω̄ := [0, 1]^d,
we consider multivariate functions u, u(x) ∈ R, x := (x_1, ..., x_d) ∈ Ω̄, with
(in some sense) bounded weak mixed derivatives

    D^α u := ∂^{|α|_1} u / (∂x_1^{α_1} · · · ∂x_d^{α_d})    (3.1)

up to some given order r ∈ N_0. Here, α ∈ N_0^d denotes a d-dimensional
multi-index with the two norms

    |α|_1 := Σ_{j=1}^{d} α_j   and   |α|_∞ := max_{1≤j≤d} α_j.

In the context of multi-indices, we use component-wise arithmetic operations,
for example

    α · β := (α_1 β_1, ..., α_d β_d),   γ · α := (γ α_1, ..., γ α_d),

or

    2^α := (2^{α_1}, ..., 2^{α_d}),

the relational operators

    α ≤ β  ⇔  α_j ≤ β_j for all 1 ≤ j ≤ d

and

    α < β  ⇔  α ≤ β and α ≠ β,

and, finally, special multi-indices such as

    0 := (0, ..., 0)   or   1 := (1, ..., 1),
and so on. In the following, for q ∈ {2, ∞} and r ∈ N_0, we study the spaces

    X^{q,r}(Ω̄) := {u : Ω̄ → R : D^α u ∈ L^q(Ω), |α|_∞ ≤ r},    (3.2)
    X_0^{q,r}(Ω̄) := {u ∈ X^{q,r}(Ω̄) : u|_{∂Ω} = 0}.

Thus, X^{q,r}(Ω̄) denotes the space of all functions with bounded (with respect to
the L^q-norm) mixed derivatives up to order r, and X_0^{q,r}(Ω̄) is the subspace
of X^{q,r}(Ω̄) consisting of those u ∈ X^{q,r}(Ω̄) vanishing on the boundary
∂Ω. Note that, for the theoretical considerations, we shall restrict ourselves
to the case of homogeneous boundary conditions, i.e., to X_0^{q,r}(Ω̄).
Furthermore, note that we omit the ambient dimension d when it is clear from the
context. Concerning the smoothness parameter r ∈ N_0, we need r = 2 for
the case of piecewise linear approximations which, for the moment, will be
at the centre of interest. Finally, for functions u ∈ X_0^{q,r}(Ω̄) and multi-indices
α with |α|_∞ ≤ r, we introduce the seminorms

    |u|_{α,∞} := ‖D^α u‖_∞,    (3.3)
    |u|_{α,2} := ‖D^α u‖_2 = (∫_Ω̄ (D^α u)² dx)^{1/2}.

Now, with the multi-index l = (l_1, ..., l_d) ∈ N^d, which indicates the level
in a multivariate sense, we consider the family of d-dimensional standard
rectangular grids

    {Ω_l : l ∈ N^d}    (3.4)

on Ω̄ with mesh size

    h_l := (h_{l_1}, ..., h_{l_d}) := 2^{−l}.    (3.5)

That is, the grid Ω_l is equidistant with respect to each individual coordinate
direction, but, in general, may have different mesh sizes in the different
coordinate directions. The grid points x_{l,i} of grid Ω_l are just the points

    x_{l,i} := (x_{l_1,i_1}, ..., x_{l_d,i_d}) := i · h_l,   0 ≤ i ≤ 2^l.    (3.6)



Figure 3.1. Tensor product approach for piecewise bilinear basis functions.

Thus, here and in the following, the multi-index l indicates the level (of
a grid, a point, or, later on, a basis function, respectively), whereas the
multi-index i denotes the location of a given grid point x_{l,i} in the respective
grid Ω_l.
Next, we have to define discrete approximation spaces and sets of basis
functions that span those discrete spaces. In a piecewise linear setting, the
simplest choice of a 1D basis function is the standard hat function φ(x),

    φ(x) := { 1 − |x|,  if x ∈ [−1, 1],
            { 0,        otherwise.    (3.7)

This mother of all piecewise linear basis functions can be used to generate
an arbitrary φ_{l_j,i_j}(x_j) with support [x_{l_j,i_j} − h_{l_j}, x_{l_j,i_j} + h_{l_j}] =
[(i_j − 1)h_{l_j}, (i_j + 1)h_{l_j}] by dilation and translation, that is,

    φ_{l_j,i_j}(x_j) := φ((x_j − i_j · h_{l_j}) / h_{l_j}).    (3.8)
The resulting 1D basis functions are the input of the tensor product
construction, which provides a suitable piecewise d-linear basis function for each
grid point x_{l,i} (see Figure 3.1):

    φ_{l,i}(x) := ∏_{j=1}^{d} φ_{l_j,i_j}(x_j).    (3.9)

Since we deal with homogeneous boundary conditions (i.e., with X_0^{q,2}(Ω̄)),
only those φ_{l,i}(x) that correspond to inner grid points of Ω_l are taken into
account for the definition of

    V_l := span{φ_{l,i} : 1 ≤ i ≤ 2^l − 1},    (3.10)

the space of piecewise d-linear functions with respect to the interior of Ω_l.
Obviously, the φ_{l,i} form a basis of V_l, with one basis function φ_{l,i} of support
of the fixed size 2 · h_l for each inner grid point x_{l,i} of Ω_l, and this basis {φ_{l,i}}
is just the standard nodal point basis of the finite-dimensional space V_l.



Additionally, we introduce the hierarchical increments W_l,

    W_l := span{φ_{l,i} : 1 ≤ i ≤ 2^l − 1, i_j odd for all 1 ≤ j ≤ d},    (3.11)

for which the relation

    V_l = ⊕_{k≤l} W_k    (3.12)

can easily be seen. Note that the supports of all basis functions φ_{l,i} spanning
W_l are mutually disjoint. Thus, with the index set

    I_l := {i ∈ N^d : 1 ≤ i ≤ 2^l − 1, i_j odd for all 1 ≤ j ≤ d},    (3.13)

we get another basis of V_l, the hierarchical basis

    {φ_{k,i} : i ∈ I_k, k ≤ l},    (3.14)

which generalizes the well-known 1D basis shown in Figure 3.2 to the
d-dimensional case by means of a tensor product approach. With these
hierarchical difference spaces W_l, we can define

    V := ⊕_{l_1=1}^{∞} · · · ⊕_{l_d=1}^{∞} W_{(l_1,...,l_d)} = ⊕_{l∈N^d} W_l    (3.15)

with its natural hierarchical basis

    {φ_{l,i} : i ∈ I_l, l ∈ N^d}.    (3.16)

Except for completion with respect to the H^1-norm, V is simply the underlying
Sobolev space H_0^1(Ω̄), i.e., V̄ = H_0^1(Ω̄).

Figure 3.2. Piecewise linear hierarchical basis (solid)
vs. nodal point basis (dashed).



Later we shall deal with finite-dimensional subspaces of V. Note that, for
instance, with the discrete spaces

    V_n^{(∞)} := ⊕_{|l|_∞ ≤ n} W_l,    (3.17)

the limit

    lim_{n→∞} V_n^{(∞)} = lim_{n→∞} ⊕_{|l|_∞ ≤ n} W_l := ⋃_{n=1}^{∞} V_n^{(∞)} = V    (3.18)

exists, due to V_n^{(∞)} ⊂ V_{n+1}^{(∞)}. Hence, any function u ∈ H_0^1(Ω̄) and,
consequently, any u ∈ X_0^{q,2}(Ω̄) can be uniquely split by

    u(x) = Σ_l u_l(x),   u_l(x) = Σ_{i∈I_l} v_{l,i} · φ_{l,i}(x) ∈ W_l,    (3.19)

where the v_{l,i} ∈ R are the coefficient values of the hierarchical product basis
representation of u, also called the hierarchical surplus.
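The splitting (3.19) is easy to verify numerically in one dimension. In the sketch below (the helper names are ours), the surplus of an odd-indexed point is computed as v_{l,i} = u(x_{l,i}) − (u(x_{l,i} − h_l) + u(x_{l,i} + h_l))/2, anticipating the stencil representation (3.23) derived below; summing the hierarchical contributions then reproduces the piecewise linear interpolant of u.

```python
def surpluses(u, n):
    # 1-D hierarchical coefficients v_{l,i} of u on levels 1..n (odd i only):
    # v_{l,i} = u(x_{l,i}) - (u(x_{l,i} - h_l) + u(x_{l,i} + h_l)) / 2.
    v = {}
    for l in range(1, n + 1):
        h = 2.0**(-l)
        for i in range(1, 2**l, 2):
            v[(l, i)] = u(i * h) - 0.5 * (u(i * h - h) + u(i * h + h))
    return v

def evaluate(v, x):
    # Sum the hierarchical representation (3.19) at a point x.
    total = 0.0
    for (l, i), vli in v.items():
        h = 2.0**(-l)
        total += vli * max(0.0, 1.0 - abs((x - i * h) / h))
    return total

u = lambda x: x * (1.0 - x)   # smooth, vanishes on the boundary
v = surpluses(u, 8)
print(abs(evaluate(v, 0.3) - u(0.3)))   # tiny interpolation error
print(v[(3, 1)], 4.0**(-3))             # for this u, every surplus equals 4^{-l}
```

For this particular u, every level-l surplus equals h_l² = 4^{−l}, a first glimpse of the level-wise coefficient decay exploited throughout this section.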
Before we turn to finite-dimensional approximation spaces for X_0^{q,2}(Ω̄),
we summarize the most important properties of the hierarchical subspaces
W_l according to Bungartz (1992b) and Bungartz and Griebel (1999).

Basic properties of the subspaces
Concerning the subspaces W_l, the crucial questions are how important W_l is
for the interpolation of some given u ∈ X_0^{q,2}(Ω̄), and what computational and
storage cost come along with it. From (3.11) and (3.13), we immediately
learn the dimension of W_l, i.e., the number of degrees of freedom (grid
points or basis functions, respectively) associated with W_l:

    |W_l| = |I_l| = 2^{|l−1|_1}.    (3.20)

Equation (3.20) already answers the second question.
The following discussion of a subspace's contribution to the overall
interpolant according to (3.19) will be based upon three norms: the maximum
norm ‖·‖_∞, the L^p-norm ‖·‖_p (p = 2 in general), and the energy norm

    ‖u‖_E := (∫_Ω Σ_{j=1}^{d} (∂u(x)/∂x_j)² dx)^{1/2},    (3.21)

which is equivalent to the H^1-norm on H_0^1(Ω̄). For the Laplacian, (3.21)
indeed indicates the energy norm in finite element terminology. First we
look at the different hierarchical basis functions φ_{l,i}(x).



Lemma 3.1. For any piecewise d-linear basis function φ_{l,i}(x), the following
equations hold:

    ‖φ_{l,i}‖_∞ = 1,    (3.22)
    ‖φ_{l,i}‖_p = (2/(p+1))^{d/p} · 2^{−|l|_1/p},   (p ≥ 1),
    ‖φ_{l,i}‖_E = √2 · (2/3)^{(d−1)/2} · 2^{−|l|_1/2} · (Σ_{j=1}^{d} 2^{2l_j})^{1/2}.

Proof. All equalities result from straightforward calculations based on the
definition of φ_{l,i}(x) (see (3.7) to (3.9), and Bungartz (1992b) and Bungartz
and Griebel (1999), for example).
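The p-norm formula of Lemma 3.1 can be checked by simple quadrature. A 1D sketch (midpoint rule; the code and names are ours):

```python
def phi(x):
    # Mother hat function (3.7).
    return max(0.0, 1.0 - abs(x))

def lp_norm(l, i, p, m=100000):
    # Midpoint-rule approximation of ||phi_{l,i}||_p on [0, 1].
    h = 2.0**(-l)
    s = sum(phi(((k + 0.5) / m - i * h) / h)**p for k in range(m)) / m
    return s**(1.0 / p)

# Lemma 3.1 with d = 1 predicts ||phi_{l,i}||_p = (2/(p+1))^{1/p} * 2^{-l/p}.
l, i, p = 3, 5, 2
print(lp_norm(l, i, p), (2.0 / (p + 1))**(1.0 / p) * 2.0**(-l / p))
```

The two printed numbers agree closely, and the same check works for other levels, positions, and exponents p.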
Next, we consider the hierarchical coefficient values v_{l,i} in more detail.
They can be computed from the function values u(x_{l,i}) in the following
way:

    v_{l,i} = (∏_{j=1}^{d} [−1/2  1  −1/2]_{x_{l_j,i_j}, l_j}) u =: I_{x_{l,i}, l} u.    (3.23)

This is due to the definition of the spaces W_l and their basis functions
(3.11), whose supports are mutually disjoint and do not contain coarse grid
points x_{k,j}, k < l, in their interior. Definition (3.23) illustrates why v_{l,i} is
also called the hierarchical surplus. In (3.23), as usual in multigrid terminology
(cf. Hackbusch (1985, 1986), for instance), I_{x_{l,i}, l} denotes a d-dimensional
stencil which gives the coefficients for a linear combination of nodal values
of its argument u. This operator-based representation of the hierarchical
coefficients v_{l,i} leads to an integral representation of v_{l,i}, as follows.
Lemma 3.2. Let ψ_{l_j,i_j}(x_j) := −2^{−(l_j+1)} · φ_{l_j,i_j}(x_j). Further, let
ψ_{l,i}(x) := ∏_{j=1}^{d} ψ_{l_j,i_j}(x_j). For any coefficient value v_{l,i} of the hierarchical
representation (3.19) of u ∈ X_0^{q,2}(Ω̄), the following relation holds:

    v_{l,i} = ∫_Ω ψ_{l,i}(x) · D^2 u(x) dx.    (3.24)

Proof. First we look at the simplest case d = 1, where we can omit the
index j for clarity. Partial integration provides

    ∫_Ω ψ_{l,i}(x) · ∂²u(x)/∂x² dx
      = ∫_{x_{l,i}−h_l}^{x_{l,i}+h_l} ψ_{l,i}(x) · ∂²u(x)/∂x² dx
      = [ψ_{l,i}(x) · ∂u(x)/∂x]_{x_{l,i}−h_l}^{x_{l,i}+h_l}
        − ∫_{x_{l,i}−h_l}^{x_{l,i}+h_l} ∂ψ_{l,i}(x)/∂x · ∂u(x)/∂x dx
      = ∫_{x_{l,i}−h_l}^{x_{l,i}} (1/2) · ∂u(x)/∂x dx − ∫_{x_{l,i}}^{x_{l,i}+h_l} (1/2) · ∂u(x)/∂x dx
      = I_{x_{l,i}, l} u,

since ψ_{l,i}(x_{l,i} − h_l) = ψ_{l,i}(x_{l,i} + h_l) = 0 and since ∂ψ_{l,i}(x)/∂x ∈ {1/2, −1/2} due
to the construction of ψ_{l,i} and φ_{l,i}. Finally, the tensor product approach
according to the operator product given in (3.23) leads to a straightforward
generalization to d > 1.

The above lemma and its proof show the close relations of our hierarchical
basis approach to integral transforms like wavelet transforms. Applying
successive partial integration to (3.24), twice for d = 1 and 2d times for
general dimensionality, we get

    v_{l,i} = ∫_Ω ψ_{l,i}(x) · D^2 u(x) dx = ∫_Ω ψ̂_{l,i}(x) · u(x) dx,    (3.25)

where ψ̂_{l,i}(x) equals D^2 ψ_{l,i}(x) in a weak sense (i.e., in the sense of
distributions) and is a linear combination of 3^d Dirac pulses of alternating sign.
Thus, the hierarchical surplus v_{l,i} can be interpreted as the coefficient
resulting from an integral transform with respect to a function ψ̂_{l,i}(x) of an
oscillating structure.
Starting from (3.24), we are now able to give bounds for the hierarchical
coefficients with respect to the different seminorms introduced in (3.3).

Lemma 3.3. Let u ∈ X_0^{q,2}(Ω̄) be given in its hierarchical representation.
Then, the following estimates for the hierarchical coefficients v_{l,i} hold:

    |v_{l,i}| ≤ 2^{−d} · 2^{−2·|l|_1} · |u|_{2,∞},    (3.26)
    |v_{l,i}| ≤ 2^{−d} · (2/3)^{d/2} · 2^{−(3/2)·|l|_1} · | u|_{supp(φ_{l,i})} |_{2,2},

where supp(φ_{l,i}) denotes the support of φ_{l,i}(x).

Proof. With (3.22), (3.24), and the definition of ψ_{l,i}, we get

    |v_{l,i}| = | ∫ ψ_{l,i}(x) · D^2 u(x) dx | ≤ ‖ψ_{l,i}‖_1 · ‖ D^2 u|_{supp(φ_{l,i})} ‖_∞
             = 2^{−d} · 2^{−|l|_1} · ‖φ_{l,i}‖_1 · | u|_{supp(φ_{l,i})} |_{2,∞}
             ≤ 2^{−d} · 2^{−2·|l|_1} · |u|_{2,∞}.

For |·|_{2,2}, the Cauchy–Schwarz inequality provides

    |v_{l,i}| = | ∫ ψ_{l,i}(x) · D^2 u(x) dx | ≤ ‖ψ_{l,i}‖_2 · ‖ D^2 u|_{supp(φ_{l,i})} ‖_2
             = 2^{−d} · 2^{−|l|_1} · ‖φ_{l,i}‖_2 · | u|_{supp(φ_{l,i})} |_{2,2},

which, with (3.22), is the desired result.
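Estimate (3.26) can be verified numerically. The sketch below (our own code) applies the tensor product stencil (3.23) to the smooth test function u(x) = sin(πx_1) sin(πx_2), for which |u|_{2,∞} = π⁴, and compares each surplus against the bound 2^{−d} · 2^{−2|l|_1} · |u|_{2,∞} with d = 2.

```python
import math

def surplus_2d(u, l, i):
    # Tensor product stencil (3.23): apply [-1/2, 1, -1/2] in each direction.
    w = {-1: -0.5, 0: 1.0, 1: -0.5}
    h = (2.0**(-l[0]), 2.0**(-l[1]))
    return sum(w[o1] * w[o2] * u(((i[0] + o1) * h[0], (i[1] + o2) * h[1]))
               for o1 in (-1, 0, 1) for o2 in (-1, 0, 1))

u = lambda x: math.sin(math.pi * x[0]) * math.sin(math.pi * x[1])

for l in ((1, 1), (2, 2), (3, 3)):
    v = surplus_2d(u, l, (1, 1))
    bound = 2.0**(-2) * 2.0**(-2 * sum(l)) * math.pi**4   # (3.26), d = 2
    print(f"l={l}  |v|={abs(v):.3e}  bound={bound:.3e}")
```

The surpluses shrink roughly like 2^{−2|l|_1} from level to level and stay below the bound throughout.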
Finally, the results from the previous three lemmata lead to bounds for
the contribution u_l ∈ W_l of a subspace W_l to the hierarchical representation
(3.19) of a given u ∈ X_0^{q,2}(Ω̄).

Lemma 3.4. Let u ∈ X_0^{q,2}(Ω̄) be given in its hierarchical representation
(3.19). Then, the following estimates for its components u_l ∈ W_l hold:

    ‖u_l‖_∞ ≤ 2^{−d} · 2^{−2·|l|_1} · |u|_{2,∞},    (3.27)
    ‖u_l‖_2 ≤ 3^{−d} · 2^{−2·|l|_1} · |u|_{2,2},
    ‖u_l‖_E ≤ (1 / (2 · 12^{(d−1)/2})) · 2^{−2·|l|_1} · (Σ_{j=1}^{d} 2^{2·l_j})^{1/2} · |u|_{2,∞},
    ‖u_l‖_E ≤ √3 · 3^{−d} · 2^{−2·|l|_1} · (Σ_{j=1}^{d} 2^{2·l_j})^{1/2} · |u|_{2,2}.

Proof. Since the supports of all φ_{l,i} contributing to u_l according to (3.19)
are mutually disjoint, the first estimate follows immediately from the
respective statements in (3.22) and (3.26). For the estimate concerning the
L2-norm, we get, with the same argument of disjoint supports and with
(3.22) and (3.26),

    ‖u_l‖_2² = ‖ Σ_{i∈I_l} v_{l,i} · φ_{l,i} ‖_2² = Σ_{i∈I_l} |v_{l,i}|² · ‖φ_{l,i}‖_2²
             ≤ Σ_{i∈I_l} (1/6^d) · 2^{−3·|l|_1} · | u|_{supp(φ_{l,i})} |_{2,2}² · (2/3)^d · 2^{−|l|_1}
             = 9^{−d} · 2^{−4·|l|_1} · |u|_{2,2}².

Finally, an analogous argument provides

    ‖u_l‖_E² = ‖ Σ_{i∈I_l} v_{l,i} · φ_{l,i} ‖_E² = Σ_{i∈I_l} |v_{l,i}|² · ‖φ_{l,i}‖_E²
             ≤ Σ_{i∈I_l} (1/4^d) · 2^{−4·|l|_1} · |u|_{2,∞}² · 2 · (2/3)^{d−1} · 2^{−|l|_1} · Σ_{j=1}^{d} 2^{2·l_j}
             = (1 / (2 · 6^{d−1})) · 2^{−5·|l|_1} · (Σ_{j=1}^{d} 2^{2·l_j}) · |u|_{2,∞}² · Σ_{i∈I_l} 1
             = (1 / (4 · 12^{d−1})) · 2^{−4·|l|_1} · (Σ_{j=1}^{d} 2^{2·l_j}) · |u|_{2,∞}²,

as well as the second estimate for ‖u_l‖_E.

In the next section, the information gathered above will be used to construct finite-dimensional approximation spaces $U$ for $V$ or $X_0^{q,2}(\bar\Omega)$, respectively. Such a $U$ shall be based on a subspace selection $I \subset \mathbb{N}^d$,
$$
U := \bigoplus_{l\in I} W_l, \tag{3.28}
$$
with corresponding interpolants or approximants
$$
u_U := \sum_{l\in I} u_l, \qquad u_l \in W_l. \tag{3.29}
$$
The estimate
$$
\|u - u_U\| = \Big\| \sum_{l} u_l - \sum_{l\in I} u_l \Big\| \le \sum_{l\notin I} \|u_l\| \le \sum_{l\notin I} b(l)\cdot|u| \tag{3.30}
$$
will allow the evaluation of the approximation space $U$ with respect to a norm $\|\cdot\|$ and a corresponding seminorm $|\cdot|$ on the basis of the bounds from above indicating the benefit $b(l)$ of $W_l$.

3.2. Sparse grids

Interpolation in finite-dimensional spaces

The hierarchical multilevel splitting introduced in the previous section brings along a whole family of hierarchical subspaces $W_l$ of $V$. However, for discretization purposes, we are more interested in decompositions of finite-dimensional subspaces of $V$ or $X_0^{q,2}(\bar\Omega)$ than in the splitting of $V$ itself. Therefore, we now turn to finite sets $I$ of active levels $l$ in the summation (3.19). For some $n \in \mathbb{N}$, for instance, one possibility $V_n^{(\infty)}$ has already been mentioned in (3.17). The finite-dimensional $V_n^{(\infty)}$ is just the usual space of piecewise $d$-linear functions on the rectangular grid $\Omega_{n\cdot 1} = \Omega_{(n,\dots,n)}$ with equidistant mesh size $h_n = 2^{-n}$ in each coordinate direction. In a scheme of subspaces $W_l$ as shown in Figure 3.3 for the 2D case, $V_n^{(\infty)}$ corresponds to a square sector of subspaces: see Figure 3.4.

Figure 3.3. Scheme of subspaces for $d = 2$. Each square represents one subspace $W_l$ with its associated grid points. The supports of the corresponding basis functions have the same mesh size $h_l$ and cover the domain $\Omega$.

Obviously, the dimension of $V_n^{(\infty)}$ (i.e., the number of inner grid points in the underlying grid) is
$$
\big|V_n^{(\infty)}\big| = \big(2^n - 1\big)^d = O\big(2^{d\cdot n}\big) = O\big(h_n^{-d}\big). \tag{3.31}
$$
For the error $u - u_n^{(\infty)}$ of the interpolant $u_n^{(\infty)} \in V_n^{(\infty)}$ of a given function $u \in X_0^{q,2}(\bar\Omega)$ with respect to the different norms we are interested in, the following lemma states the respective results.

Figure 3.4. The (full) grid of $V_3^{(\infty)}$, $d = 2$ (panels for $n = 1, 2, 3$), and the assignment of grid points to subspaces.

Lemma 3.5. For $u \in X_0^{q,2}(\bar\Omega)$, the following estimates for the different norms of the interpolation error $u - u_n^{(\infty)}$, $u_n^{(\infty)} \in V_n^{(\infty)}$, hold:
$$
\|u - u_n^{(\infty)}\|_\infty \le \frac{d}{6^d}\cdot 2^{-2n}\cdot |u|_{2,\infty} = O(h_n^2), \tag{3.32}
$$
$$
\|u - u_n^{(\infty)}\|_2 \le \frac{d}{9^d}\cdot 2^{-2n}\cdot |u|_{2,2} = O(h_n^2),
$$
$$
\|u - u_n^{(\infty)}\|_E \le \frac{d^{3/2}}{2\cdot 3^{(d-1)/2}\cdot 6^{d-1}}\cdot 2^{-n}\cdot |u|_{2,\infty} = O(h_n),
$$
$$
\|u - u_n^{(\infty)}\|_E \le \frac{d^{3/2}}{\sqrt{3}\cdot 9^{d-1}}\cdot 2^{-n}\cdot |u|_{2,2} = O(h_n).
$$

Proof. For the $L_\infty$-norm, (3.27) provides
$$
\|u - u_n^{(\infty)}\|_\infty \le \sum_{|l|_\infty > n} \|u_l\|_\infty \le \frac{1}{2^d}\cdot |u|_{2,\infty}\cdot \sum_{|l|_\infty > n} 2^{-2\cdot|l|_1},
$$
from which we get
$$
\|u - u_n^{(\infty)}\|_\infty \le \frac{1}{2^d}\cdot |u|_{2,\infty}\cdot \left( \sum_{l} 4^{-|l|_1} - \sum_{|l|_\infty\le n} 4^{-|l|_1} \right)
= \frac{1}{2^d}\cdot |u|_{2,\infty}\cdot \left( \left(\frac{1}{3}\right)^d - \left(\sum_{i=1}^n 4^{-i}\right)^d \right)
$$
$$
= \frac{1}{2^d}\cdot |u|_{2,\infty}\cdot \frac{1}{3^d}\cdot \Big(1 - \big(1 - 4^{-n}\big)^d\Big)
\le \frac{d}{6^d}\cdot |u|_{2,\infty}\cdot 4^{-n}.
$$
The respective result for the $L_2$-norm can be obtained in exactly the same way. For the error with respect to the energy norm, (3.27) leads to
$$
\|u - u_n^{(\infty)}\|_E \le \frac{d^{1/2}}{2\cdot 12^{(d-1)/2}}\cdot |u|_{2,\infty}\cdot \sum_{|l|_\infty > n} 2^{-2\cdot|l|_1}\cdot \max_{1\le j\le d} 2^{l_j}
$$
$$
\le \frac{d^{3/2}}{2\cdot 12^{(d-1)/2}}\cdot |u|_{2,\infty}\cdot \sum_{|l|_\infty = l_1 > n} 2^{-2\cdot|l|_1}\cdot 2^{l_1}
= \frac{d^{3/2}}{2\cdot 12^{(d-1)/2}}\cdot |u|_{2,\infty}\cdot \sum_{l_1 > n} 2^{-l_1}\cdot \left(\sum_{l_j=1}^{l_1} 4^{-l_j}\right)^{d-1}
$$
$$
\le \frac{d^{3/2}}{2\cdot 12^{(d-1)/2}}\cdot |u|_{2,\infty}\cdot \frac{1}{3^{d-1}}\cdot 2^{-n},
$$
and an analogous argument provides the second estimate.
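Lemma 3.5 can be checked numerically. The sketch below (our illustration; the test function is an arbitrary choice with bounded second mixed derivatives) builds the piecewise bilinear full grid interpolant for $d = 2$ and observes the $O(h_n^2)$ decay of the $L_\infty$ error, i.e., error ratios close to $4$ when $n$ is increased by one:

```python
def interp_full_grid(u, n):
    """Piecewise bilinear interpolant of u on the full grid with mesh h = 2^-n."""
    m = 2**n
    def I(x, y):
        ix, iy = min(int(x*m), m - 1), min(int(y*m), m - 1)
        tx, ty = x*m - ix, y*m - iy
        f = lambda a, b: u((ix + a)/m, (iy + b)/m)
        return ((1-tx)*(1-ty)*f(0, 0) + tx*(1-ty)*f(1, 0)
                + (1-tx)*ty*f(0, 1) + tx*ty*f(1, 1))
    return I

u = lambda x, y: x*(1 - x) * y*(1 - y)   # smooth, bounded second mixed derivatives

errs = []
for n in (3, 4, 5):
    I, m = interp_full_grid(u, n), 2**n
    # sample the error at the cell midpoints, where it is largest
    errs.append(max(abs(u((i+0.5)/m, (j+0.5)/m) - I((i+0.5)/m, (j+0.5)/m))
                    for i in range(m) for j in range(m)))
ratios = [errs[k]/errs[k+1] for k in range(len(errs) - 1)]
print(ratios)   # ratios close to 4: the O(h_n^2) behaviour of (3.32)
```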

It is important to note that we get the same order of accuracy as in standard approximation theory, although our regularity assumptions differ from those normally used there.

Equations (3.31) and (3.32) clearly reveal the crucial drawback of $V_n^{(\infty)}$, the curse of dimensionality discussed in detail in Section 2. With $d$ increasing, the number of degrees of freedom that are necessary to achieve an accuracy of $O(h)$ or $O(h^2)$, respectively, grows exponentially. Therefore, we ask how to construct discrete approximation spaces that are better than $V_n^{(\infty)}$ in the sense that the same number of invested grid points leads to a higher order of accuracy. Hence, in the following, we look for an optimum $V^{(\mathrm{opt})}$ by solving a restricted optimization problem of the type
$$
\max_{u\in X_0^{q,2}:\ |u|=1} \big\|u - u_{V^{(\mathrm{opt})}}\big\| = \min_{U\subset V:\ |U|=w}\ \max_{u\in X_0^{q,2}:\ |u|=1} \|u - u_U\| \tag{3.33}
$$



for some prescribed cost or work count w. The aim is to profit from a given
work count as much as possible. Note that an optimization the other way
round could be done as well. Prescribe some desired accuracy ε and look
for the discrete approximation scheme that achieves this with the smallest
work count possible. This is in fact the point of view of computational
complexity. Of course, any potential solution V (opt) of (3.33) has to be
expected to depend on the norm  ·  as well as on the seminorm | · | used
to measure the error of u’s interpolant uU ∈ U or the smoothness of u,
respectively. According  to our hierarchical setting, we will allow discrete
spaces of the type U := l∈I Wl for an arbitrary finite index set I ⊂ Nd as
candidates for the optimization process only.
An approach such as (3.33) selects certain Wl due to their importance,
and thus selects the respective underlying grid points. Depending on the
invested work count w, we can expect to get some kind of regular struc-
ture or grid patterns. However, in contrast to adaptive grid refinement,
which is highly problem-dependent, such a proceeding simply depends on
the problem class (i.e., on the space u has to belong to, here X0q,2 (Ω̄)), but
not on u itself. Although such a priori optimization strategies are not very
widespread in the context of PDEs, there is a long tradition in approximation theory and numerical quadrature. For example, think of the Gauss
quadrature rules where the grid points are chosen as the roots of certain
classes of orthogonal polynomials; cf. Krommer and Ueberhuber (1994), for
instance. Compared with equidistant quadrature rules based on polynomial
interpolants with the same number n of grid points, the degree of accuracy,
i.e., the maximum polynomial degree up to which a numerical quadrature
rule provides exact results, can be augmented from at most n to 2n − 1.
Another nice example of the usefulness of an a priori grid optimization in numerical quadrature is provided by the Koksma–Hlawka inequality (Hlawka 1961, Krommer and Ueberhuber 1994), which says that, for every quadrature formula $Q_n$ based on a simple averaging of samples,
$$
Q_n u := \frac{1}{n}\cdot\sum_{i=1}^n u(x_i), \tag{3.34}
$$
that is used to get an approximation to $Iu := \int_{\bar\Omega} u(x)\,\mathrm{d}x$, a sharp error bound is given by
$$
\big| Q_n u - Iu \big| \le V(u)\cdot D_n^*(x_1,\dots,x_n). \tag{3.35}
$$

Here, $V(u)$ is the so-called variation of $u$ in the sense of Hardy and Krause, a property of $u$ indicating the global smoothness of $u$: the smoother $u$ is on $\bar\Omega$, the smaller the values of $V(u)$. $D_n^*(x_1,\dots,x_n)$ denotes the so-called star discrepancy of the grid $(x_1,\dots,x_n)$, which measures the deviation of a finite part of the sequence $x_1, x_2, \dots$ from the uniform distribution, and is defined by
$$
D_n^*(x_1,\dots,x_n) := \sup_{E\in\mathcal{E}} \left| \frac{1}{n}\cdot\sum_{i=1}^n \chi_E(x_i) - \int_{\bar\Omega} \chi_E(x)\,\mathrm{d}x \right|, \tag{3.36}
$$
where $\mathcal{E}$ denotes the set of all subcubes $E = [0,e_1[\,\times\cdots\times\,[0,e_d[\ \subset \bar\Omega$ with $0$ as a corner, and $\chi_E$ is the characteristic function of $E\in\mathcal{E}$. Although we do not want to go into detail here, we emphasize the crucial point of (3.35). The quadrature error $|Q_n u - Iu|$ divides into two parts: a problem-dependent one (the variation of $u$) unaffected by the grid, and a grid-dependent one (the star discrepancy) uninfluenced by the actual problem (i.e., $u$). This clearly shows the benefit of the two optimization strategies mentioned above: the construction of grids of low discrepancy (low-discrepancy formulas) reduces the second factor in (3.35), whereas adaptive grid refinement can help to concentrate further grid points on subregions of $\bar\Omega$ with a (locally) high variation. After this digression into numerical quadrature, we return to our actual topic, the solution of (3.33) in our hierarchical subspace setting.
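The star discrepancy (3.36) is easy to evaluate exactly in one dimension, which makes the trade-off in (3.35) tangible. The sketch below (ours; the midpoint grid and the van der Corput sequence are standard examples, not taken from the text) compares the optimal equidistant midpoint grid, with $D_n^* = 1/(2n)$, against the extensible van der Corput low-discrepancy sequence:

```python
def star_discrepancy_1d(pts):
    """Exact 1D star discrepancy:
    D*_n = max_i max(i/n - x_(i), x_(i) - (i-1)/n) over sorted points."""
    xs, n = sorted(pts), len(pts)
    return max(max((i + 1)/n - x, x - i/n) for i, x in enumerate(xs))

def van_der_corput(n, base=2):
    """First n points of the base-b van der Corput sequence (radical inverse)."""
    pts = []
    for k in range(1, n + 1):
        q, b, x = k, 1.0/base, 0.0
        while q:
            q, r = divmod(q, base)
            x += r*b
            b /= base
        pts.append(x)
    return pts

n = 64
mid = [(i + 0.5)/n for i in range(n)]   # in 1D the midpoint grid is optimal: D* = 1/(2n)
vdc = van_der_corput(n)                 # extensible sequence with D* = O(log n / n)
print(star_discrepancy_1d(mid), star_discrepancy_1d(vdc))
```

In 1D the equidistant midpoints cannot be beaten; the point of low-discrepancy sequences is that they stay within a logarithmic factor of optimal while being extensible and, in higher dimensions, far better than tensorized equidistant grids.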

Formal derivation and properties of sparse grids

As already mentioned, the candidates for $V^{(\mathrm{opt})}$ are finite sets of $W_l$. Therefore, spaces $U := \bigoplus_{l\in I} W_l$, the respective grids, and the underlying index sets $I \subset \mathbb{N}^d$ have to be identified. There are two obvious ways to tackle such problems: a continuous one based on an analytical approach where the multi-index $l$ is generalized to a nonnegative real one, and a discrete one which uses techniques known from combinatorial optimization; for details, we refer the reader to Bungartz (1998).

Continuous optimization

For the following, a grid and its representation $I$ – formerly a finite set of multi-indices – is nothing but a bounded subset of $\mathbb{R}_+^d$, and a hierarchical subspace $W_l$ just corresponds to a point $l \in \mathbb{R}_+^d$.

First we have to formulate the optimization problem (3.33). To this end, and inspired by (3.20), the local cost function $c(l)$ is defined as a straightforward generalization of the number of degrees of freedom involved:
$$
c(l) := 2^{|l|_1 - d} = 2^{l_1+\cdots+l_d-d}. \tag{3.37}
$$
For the local benefit function $b(l)$, we use the squared upper bounds for $\|u_l\|$ according to (3.27). At the moment, we do not fix the norm to be used here. Obviously, the search for an optimal $I \subset \mathbb{R}_+^d$ can be restricted to $I \subset I^{(\max)} := [0,N]^d$ for a sufficiently large $N$ without loss of generality. Based on the two local quantities $c(l)$ and $b(l)$, the global cost $C(I)$ and the global benefit $B(I)$ of a grid $I$ are defined by
$$
C(I) := \int_I c(l)\,\mathrm{d}l, \qquad B(I) := \int_I b(l)\,\mathrm{d}l. \tag{3.38}
$$

This leads to the desired restricted optimization problem according to (3.33):
$$
\max_{C(I)=w} B(I). \tag{3.39}
$$
For the solution of (3.39), we start from an arbitrary $I \subset I^{(\max)}$ that has a sufficiently smooth boundary $\partial I$. With a sufficiently smooth mapping $\tau$,
$$
\tau : \mathbb{R}_+^d \to \mathbb{R}_+^d, \qquad \tau(l) = 0 \ \text{for}\ l \in \partial\mathbb{R}_+^d, \tag{3.40}
$$
we define a small disturbance $\varphi_{\varepsilon,\tau}$ of the grid $I$:
$$
\varphi_{\varepsilon,\tau} : I \to I_{\varepsilon,\tau} \subset I^{(\max)}, \qquad \varphi_{\varepsilon,\tau}(l) := l + \varepsilon\cdot\tau(l), \quad \varepsilon\in\mathbb{R}. \tag{3.41}
$$
For the global cost of the disturbed grid $I_{\varepsilon,\tau}$, we get
$$
C(I_{\varepsilon,\tau}) = \int_{I_{\varepsilon,\tau}} c(k)\,\mathrm{d}k = \int_I c\big(l+\varepsilon\cdot\tau(l)\big)\cdot\big|\det D\varphi_{\varepsilon,\tau}\big|\,\mathrm{d}l. \tag{3.42}
$$
Taylor expansion of $c(l+\varepsilon\cdot\tau(l))$ in $\varepsilon = 0$ provides
$$
c\big(l+\varepsilon\cdot\tau(l)\big) = c(l) + \varepsilon\cdot\nabla c(l)\cdot\tau(l) + O(\varepsilon^2), \tag{3.43}
$$
where $\nabla c(l)\cdot\tau(l)$ denotes the scalar product. Furthermore, a straightforward calculation shows
$$
\big|\det D\varphi_{\varepsilon,\tau}\big| = 1 + \varepsilon\cdot\operatorname{div}\tau + O(\varepsilon^2). \tag{3.44}
$$
Thus, since $I \subset I^{(\max)}$ with $I^{(\max)}$ bounded, Gauss's theorem leads to
$$
C(I_{\varepsilon,\tau}) = C(I) + \varepsilon\cdot\int_{\partial I} c(l)\cdot\tau(l)\,\mathrm{d}S + O(\varepsilon^2). \tag{3.45}
$$
Consequently, for the derivative with respect to $\varepsilon$, we get
$$
\left.\frac{\partial C(I_{\varepsilon,\tau})}{\partial\varepsilon}\right|_{\varepsilon=0} = \lim_{\varepsilon\to 0}\frac{C(I_{\varepsilon,\tau}) - C(I)}{\varepsilon} = \int_{\partial I} c(l)\cdot\tau(l)\,\mathrm{d}S. \tag{3.46}
$$

Similar arguments hold for the global benefit $B(I)$ and result in
$$
\left.\frac{\partial B(I_{\varepsilon,\tau})}{\partial\varepsilon}\right|_{\varepsilon=0} = \lim_{\varepsilon\to 0}\frac{B(I_{\varepsilon,\tau}) - B(I)}{\varepsilon} = \int_{\partial I} b(l)\cdot\tau(l)\,\mathrm{d}S. \tag{3.47}
$$
Now, starting from the optimal grid $I^{(\mathrm{opt})}$, Lagrange's principle for the optimization under a constraint can be applied, and we get
$$
\lambda\cdot\int_{\partial I^{(\mathrm{opt})}} c(l)\cdot\tau(l)\,\mathrm{d}S = \int_{\partial I^{(\mathrm{opt})}} b(l)\cdot\tau(l)\,\mathrm{d}S. \tag{3.48}
$$
Since $\tau$ vanishes on the boundary of $\mathbb{R}_+^d$, i.e., $\tau(l) = 0$ when any component of $l$ vanishes, (3.48) is equivalent to
$$
\lambda\cdot\int_{\partial I^{(\mathrm{opt})}\setminus\partial\mathbb{R}_+^d} c(l)\cdot\tau(l)\,\mathrm{d}S = \int_{\partial I^{(\mathrm{opt})}\setminus\partial\mathbb{R}_+^d} b(l)\cdot\tau(l)\,\mathrm{d}S. \tag{3.49}
$$
Finally, since (3.49) is valid for all appropriate smooth disturbances $\tau$,
$$
\lambda\cdot c(l) = b(l) \tag{3.50}
$$
holds for all $l \in \partial I^{(\mathrm{opt})} \setminus \partial\mathbb{R}_+^d$.
This is a quite interesting result, because (3.50) says that the ratio of the local benefit $b(l)$ to the local cost $c(l)$ is constant on the boundary $\partial I^{(\mathrm{opt})}\setminus\partial\mathbb{R}_+^d$ of any grid $I^{(\mathrm{opt})}$ that is optimal in our sense. This means that the global optimization process (3.33) or (3.39), respectively, in which we look for an optimal grid can be reduced to studying the local cost–benefit ratios $b(l)/c(l)$ of the subspaces associated with $l$. Therefore, if we come back to real hierarchical subspaces $W_l$ and to indices $l \in \mathbb{N}^d$, all one has to do is to identify sets of subspaces $W_l$ with constant cost–benefit ratio in the subspace scheme of Figure 3.3. The grid $I^{(\mathrm{opt})}$, then, contains the region where the cost–benefit ratio is bigger than or equal to the constant value on the boundary $\partial I^{(\mathrm{opt})}\setminus\partial\mathbb{R}_+^d$.

Discrete optimization

Since the above continuous optimization process with its roundabout way of generalizing integer multi-indices to real ones is a bit unnatural, (3.33) is now formulated as a discrete optimization problem.

First of all, we redefine the local functions $c(l)$ and $b(l)$, now for multi-indices $l \in \mathbb{N}^d$ only. According to (3.20), the local cost $c(l)$ is defined by
$$
c(l) := |W_l| = 2^{|l-1|_1}, \tag{3.51}
$$
which is exactly the same as (3.37) restricted to $l \in \mathbb{N}^d$. Obviously, $c(l) \in \mathbb{N}$ holds for all $l \in \mathbb{N}^d$. Concerning the local benefit function, we define
$$
b(l) := \gamma\cdot\beta(l), \tag{3.52}
$$
where $\beta(l)$ is an upper bound for $\|u_l\|^2$ according to (3.27), and $\gamma$ is a factor depending on the problem's dimensionality $d$ and on the smoothness of the data, i.e., of $u$, but constant with respect to $l$, such that $b(l) \in \mathbb{N}$. The respective bounds in (3.27) show that such a choice of $\gamma$ is possible for each of the three norms that are of interest in our context. Note that, as in the continuous case, we do not make any decision concerning the actual choice of norm to be used for $b(l)$ for the moment.

Again, the search for an optimal grid $I \subset \mathbb{N}^d$ can be restricted to all $I \subset I^{(\max)} := \{1,\dots,N\}^d$ for a sufficiently large $N$ without loss of generality.

Next, the global cost and benefit functions are redefined as well. For $C(I)$, we define
$$
C(I) := \sum_{l\in I} c(l) = \sum_{l\in I^{(\max)}} x(l)\cdot c(l), \tag{3.53}
$$
where
$$
x(l) := \begin{cases} 0, & l \notin I, \\ 1, & l \in I. \end{cases} \tag{3.54}
$$
The interpolant to $u$ on a grid $I$ provides the global benefit $B(I)$:
$$
\Big\| u - \sum_{l\in I} u_l \Big\|^2 \approx \Big\| \sum_{l\in I^{(\max)}} u_l - \sum_{l\in I} u_l \Big\|^2 \tag{3.55}
$$
$$
\le \sum_{l\in I^{(\max)}\setminus I} \|u_l\|^2
\le \sum_{l\in I^{(\max)}} \big(1 - x(l)\big)\cdot\gamma\cdot\beta(l)
= \sum_{l\in I^{(\max)}} \gamma\cdot\beta(l) - \sum_{l\in I^{(\max)}} x(l)\cdot\gamma\cdot\beta(l)
=: \sum_{l\in I^{(\max)}} \gamma\cdot\beta(l) - B(I).
$$

Of course, (3.55) gives only an upper bound for an approximation to the (squared) interpolation error, because it does not take into account all $l \notin I^{(\max)}$. However, since $N$ and, consequently, $I^{(\max)}$ can be chosen to be as big as appropriate, this is not a serious restriction. Altogether, we get the following reformulation of (3.33):
$$
\max_{I\subset I^{(\max)}}\ \sum_{l\in I^{(\max)}} x(l)\cdot\gamma\cdot\beta(l) \quad\text{with}\quad \sum_{l\in I^{(\max)}} x(l)\cdot c(l) = w. \tag{3.56}
$$

If we arrange the $l \in I^{(\max)}$ in some linear order (e.g., a lexicographical one) with local cost $c_i$ and benefit $b_i$, $i = 1,\dots,N^d =: M$, (3.56) reads as
$$
\max_x\ b^T x \quad\text{with}\quad c^T x = w, \tag{3.57}
$$
where $b \in \mathbb{N}^M$, $c \in \mathbb{N}^M$, $x \in \{0,1\}^M$, and, without loss of generality, $w \in \mathbb{N}$. In combinatorial optimization, a problem like (3.57) is called a binary knapsack problem (Martello and Toth 1990), which is known to be an NP-hard one. However, a slight change makes things much easier. If rational solutions, i.e., $x \in \big([0,1]\cap\mathbb{Q}\big)^M$, are allowed too, there exists a very simple algorithm that provides an optimal solution vector $x \in \big([0,1]\cap\mathbb{Q}\big)^M$:

(1) rearrange the order such that $\frac{b_1}{c_1} \ge \frac{b_2}{c_2} \ge \cdots \ge \frac{b_M}{c_M}$;
(2) let $r := \max\big\{ j : \sum_{i=1}^j c_i \le w \big\}$;
(3) $x_1 := \cdots := x_r := 1$, $\ x_{r+1} := \big(w - \sum_{i=1}^r c_i\big)/c_{r+1}$, $\ x_{r+2} := \cdots := x_M := 0$.
Although there is only one potential non-binary coefficient $x_{r+1}$, the rational solution vector $x$, generally, has nothing to do with its binary counterpart. But fortunately our knapsack is of variable size, since the global work count $w$ is an arbitrarily chosen natural number. Therefore, it is possible to force the solution of the rational problem to be a binary one which is, of course, also a solution of the corresponding binary problem. Consequently, as in the continuous case before, the global optimization problem (3.33) or (3.57), respectively, can be reduced to the discussion of the local cost–benefit ratios $b_i/c_i$ or $b(l)/c(l)$ of the underlying subspaces $W_l$. Those subspaces with the best cost–benefit ratios are taken into account first, and the smaller these ratios become, the more negligible the underlying subspaces turn out to be. This is in the same spirit as $n$-term approximation (DeVore 1998).
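The three steps of the greedy algorithm translate directly into code. A minimal sketch (ours; the data are made up for illustration) of the rational relaxation, showing how choosing the budget $w$ as a reachable prefix sum of costs forces the solution to be binary:

```python
def fractional_knapsack(b, c, w):
    """Greedy optimum of max b^T x s.t. c^T x = w with x in [0,1]^M:
    sort by benefit/cost ratio, then fill items until the budget is spent."""
    order = sorted(range(len(b)), key=lambda i: b[i]/c[i], reverse=True)
    x, left = [0.0]*len(b), w
    for i in order:
        take = min(1.0, left/c[i])   # at most one item can be fractional
        x[i] = take
        left -= take*c[i]
        if left == 0:
            break
    return x

b, c = [6, 10, 3, 7], [2, 5, 1, 4]   # hypothetical benefits and costs
x = fractional_knapsack(b, c, w=8)   # w = 8 = 2 + 1 + 5 is a reachable prefix sum
print(x)                             # → [1.0, 1.0, 1.0, 0.0]: binary solution
```

With $w = 9$, by contrast, the last touched item becomes fractional ($x_4 = 0.25$), which is exactly the situation the variable knapsack size avoids.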

$L_2$-based sparse grids

Owing to (3.27), the $L_2$- and $L_\infty$-norm of $W_l$'s contribution $u_l$ to the hierarchical representation (3.19) of $u \in X_0^{q,2}(\bar\Omega)$ are of the same order of magnitude. Therefore there are no differences in the character of the cost–benefit ratio, and the same optimal grids $I^{(\mathrm{opt})}$ will result from the optimization process described above. According to (3.20) and (3.27), we define
$$
\mathrm{cbr}_\infty(l) := \frac{b_\infty(l)}{c(l)} = \frac{2^{-4\cdot|l|_1}\cdot|u|_{2,\infty}^2}{4^d\cdot 2^{|l-1|_1}} = \frac{1}{2^d}\cdot 2^{-5\cdot|l|_1}\cdot|u|_{2,\infty}^2, \tag{3.58}
$$
$$
\mathrm{cbr}_2(l) := \frac{b_2(l)}{c(l)} = \frac{2^{-4\cdot|l|_1}\cdot|u|_{2,2}^2}{9^d\cdot 2^{|l-1|_1}} = \left(\frac{2}{9}\right)^d\cdot 2^{-5\cdot|l|_1}\cdot|u|_{2,2}^2
$$
as the local cost–benefit ratios. Note that we use bounds for the squared norms of $u_l$ for reasons of simplicity, but without loss of generality. An optimal grid $I^{(\mathrm{opt})}$ will consist of all multi-indices $l$ or their corresponding subspaces $W_l$ where $\mathrm{cbr}_\infty(l)$ or $\mathrm{cbr}_2(l)$ is bigger than some prescribed threshold $\sigma_\infty(n)$ or $\sigma_2(n)$, respectively. We choose those thresholds to be of the order of $\mathrm{cbr}_\infty(\bar l)$ or $\mathrm{cbr}_2(\bar l)$ with $\bar l := (n,1,\dots,1)$:
$$
\sigma_\infty(n) := \mathrm{cbr}_\infty(\bar l) = \frac{1}{2^d}\cdot 2^{-5\cdot(n+d-1)}\cdot|u|_{2,\infty}^2, \tag{3.59}
$$
$$
\sigma_2(n) := \mathrm{cbr}_2(\bar l) = \left(\frac{2}{9}\right)^d\cdot 2^{-5\cdot(n+d-1)}\cdot|u|_{2,2}^2.
$$

Figure 3.5. The sparse grid of $V_3^{(1)}$, $d = 2$ (panels for $n = 1, 2, 3$), and the assignment of grid points to subspaces.

That is, we fix $d$ subspaces on the axes in the subspace scheme of Figure 3.3 and search for all $W_l$ whose cost–benefit ratio is equal or better. Thus, applying the criterion $\mathrm{cbr}_\infty(l) \ge \sigma_\infty(n)$ or $\mathrm{cbr}_2(l) \ge \sigma_2(n)$, respectively, we get the relation
$$
|l|_1 \le n + d - 1 \tag{3.60}
$$
that qualifies a subspace $W_l$ to be taken into account. This result leads us to the definition of a new discrete approximation space $V_n^{(1)}$,
$$
V_n^{(1)} := \bigoplus_{|l|_1 \le n+d-1} W_l, \tag{3.61}
$$
which is $L_\infty$- and $L_2$-optimal with respect to our cost–benefit setting. The grids that correspond to the spaces $V_n^{(1)}$ are just the standard sparse grids as were introduced in Zenger (1991), studied in detail in Bungartz (1992b), and discussed in a variety of other papers for different applications. In comparison with the standard full grid space $V_n^{(\infty)}$, we now have triangular or simplicial sectors of subspaces in the scheme of Figure 3.3: see Figure 3.5. Figure 3.6, finally, gives two examples of sparse grids: a regular 2D and an adaptively refined 3D one.



Figure 3.6. Sparse grids: regular example (left) and adaptive one (right).

Now, let us turn to the basic properties of the sparse grid approximation spaces $V_n^{(1)}$.

Lemma 3.6. The dimension of the space $V_n^{(1)}$, i.e., the number of degrees of freedom or inner grid points, is given by
$$
\big|V_n^{(1)}\big| = \sum_{i=0}^{n-1} 2^i\cdot\binom{d-1+i}{d-1} \tag{3.62}
$$
$$
= (-1)^d + 2^n\cdot\sum_{i=0}^{d-1}\binom{n+d-1}{i}\cdot(-2)^{d-1-i}
= 2^n\cdot\left(\frac{n^{d-1}}{(d-1)!} + O\big(n^{d-2}\big)\right).
$$
Thus, we have
$$
\big|V_n^{(1)}\big| = O\big(h_n^{-1}\cdot|\log_2 h_n|^{d-1}\big). \tag{3.63}
$$
Proof. With (3.20) and (3.61), we get
$$
\big|V_n^{(1)}\big| = \Big|\bigoplus_{|l|_1\le n+d-1} W_l\Big| = \sum_{|l|_1\le n+d-1} 2^{|l-1|_1} = \sum_{i=d}^{n+d-1} 2^{i-d}\cdot\sum_{|l|_1=i} 1
= \sum_{i=d}^{n+d-1} 2^{i-d}\cdot\binom{i-1}{d-1}
= \sum_{i=0}^{n-1} 2^i\cdot\binom{d-1+i}{d-1},
$$
since there are $\binom{i-1}{d-1}$ ways to form the sum $i$ with $d$ positive integers. Furthermore,
$$
\sum_{i=0}^{n-1} 2^i\cdot\binom{d-1+i}{d-1}
= \frac{1}{(d-1)!}\cdot\left(\sum_{i=0}^{n-1} x^{i+d-1}\right)^{(d-1)}\Bigg|_{x=2}
= \frac{1}{(d-1)!}\cdot\left(x^{d-1}\cdot\frac{1-x^n}{1-x}\right)^{(d-1)}\Bigg|_{x=2}
$$
$$
= \frac{1}{(d-1)!}\cdot\sum_{i=0}^{d-1}\binom{d-1}{i}\cdot\big(x^{d-1}-x^{n+d-1}\big)^{(i)}\cdot\left(\frac{1}{1-x}\right)^{(d-1-i)}\Bigg|_{x=2}
= (-1)^d + 2^n\cdot\sum_{i=0}^{d-1}\binom{n+d-1}{i}\cdot(-2)^{d-1-i},
$$
from which the result concerning the order and the leading coefficient follows immediately.
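Both representations in (3.62) can be cross-checked numerically; the sketch below (ours) also reproduces two of the sparse grid dimensions listed later in Table 3.1:

```python
from math import comb

def dim_sparse(d, n):
    """|V_n^(1)| summed over levels: 2^(i-d) points per level sum i = d..n+d-1,
    with C(i-1, d-1) levels l of |l|_1 = i."""
    return sum(2**(i - d) * comb(i - 1, d - 1) for i in range(d, n + d))

def dim_closed(d, n):
    """Closed form from (3.62): (-1)^d + 2^n * sum_i C(n+d-1, i) * (-2)^(d-1-i)."""
    return (-1)**d + 2**n * sum(comb(n + d - 1, i) * (-2)**(d - 1 - i)
                                for i in range(d))

for d in (2, 3, 4):
    for n in range(1, 13):
        assert dim_sparse(d, n) == dim_closed(d, n)
print(dim_sparse(2, 10), dim_sparse(3, 10))   # 9217 47103, matching Table 3.1
```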

The above lemma shows the order $O(2^n\cdot n^{d-1})$ or, with $h_n = 2^{-n}$, $O(h_n^{-1}\cdot|\log_2 h_n|^{d-1})$, which is a significant reduction of the number of degrees of freedom and, thus, of the computational and storage requirement compared with the order $O(h_n^{-d})$ of $V_n^{(\infty)}$.

The other question to be discussed concerns the interpolation accuracy that can be obtained on sparse grids. For that, we look at the interpolation error $u - u_n^{(1)}$ of the sparse grid interpolant $u_n^{(1)} \in V_n^{(1)}$ which, due to (3.19) and (3.61), can be written as
$$
u - u_n^{(1)} = \sum_l u_l - \sum_{|l|_1\le n+d-1} u_l = \sum_{|l|_1>n+d-1} u_l.
$$
Therefore, for any norm $\|\cdot\|$, we have
$$
\big\|u - u_n^{(1)}\big\| \le \sum_{|l|_1>n+d-1} \|u_l\|. \tag{3.64}
$$
The following lemma provides a prerequisite for the estimates of the interpolation error with respect to the different norms we are interested in. For $d, n \in \mathbb{N}$, we define
$$
A(d,n) := \sum_{k=0}^{d-1}\binom{n+d-1}{k} = \frac{n^{d-1}}{(d-1)!} + O\big(n^{d-2}\big). \tag{3.65}
$$



Lemma 3.7. For purposes of summation over all grid points $x_{l,i}$ with corresponding basis functions $\phi_{l,i} \notin V_n^{(1)}$, we obtain for arbitrary $s \in \mathbb{N}$
$$
\sum_{|l|_1>n+d-1} 2^{-s\cdot|l|_1} = 2^{-s\cdot n}\cdot 2^{-s\cdot d}\cdot\sum_{i=0}^\infty 2^{-s\cdot i}\cdot\binom{n+i+d-1}{d-1} \tag{3.66}
$$
$$
\le 2^{-s\cdot n}\cdot 2^{-s\cdot d}\cdot 2\cdot A(d,n).
$$

Proof. As for the proof of the previous lemma, we get
$$
\sum_{|l|_1>n+d-1} 2^{-s\cdot|l|_1} = \sum_{i=n+d}^\infty 2^{-s\cdot i}\cdot\sum_{|l|_1=i} 1
= \sum_{i=n+d}^\infty 2^{-s\cdot i}\cdot\binom{i-1}{d-1}
= 2^{-s\cdot n}\cdot 2^{-s\cdot d}\cdot\sum_{i=0}^\infty 2^{-s\cdot i}\cdot\binom{n+i+d-1}{d-1}.
$$
Since
$$
\sum_{i=0}^\infty x^i\cdot\binom{n+i+d-1}{d-1} = \frac{x^{-n}}{(d-1)!}\cdot\left(\sum_{i=0}^\infty x^{n+i+d-1}\right)^{(d-1)} \tag{3.67}
$$
$$
= \frac{x^{-n}}{(d-1)!}\cdot\left(x^{n+d-1}\cdot\frac{1}{1-x}\right)^{(d-1)}
= \frac{x^{-n}}{(d-1)!}\cdot\sum_{k=0}^{d-1}\binom{d-1}{k}\cdot\big(x^{n+d-1}\big)^{(k)}\cdot\left(\frac{1}{1-x}\right)^{(d-1-k)}
$$
$$
= \sum_{k=0}^{d-1}\binom{n+d-1}{k}\cdot\left(\frac{x}{1-x}\right)^{d-1-k}\cdot\frac{1}{1-x},
$$
we get with $x := 2^{-s} \le \frac{1}{2}$ (so that $x/(1-x) \le 1$ and $1/(1-x) \le 2$):
$$
\sum_{i=0}^\infty 2^{-s\cdot i}\cdot\binom{n+i+d-1}{d-1} \le 2\cdot\sum_{k=0}^{d-1}\binom{n+d-1}{k} = 2\cdot A(d,n).
$$

With the above lemma, we obtain the desired result concerning the interpolation quality of standard sparse grid spaces $V_n^{(1)}$.



Theorem 3.8. For the $L_\infty$-, the $L_2$-, and the energy norm, we have the following upper bounds for the interpolation error of a function $u \in X_0^{q,2}(\bar\Omega)$ in the sparse grid space $V_n^{(1)}$:
$$
\big\|u - u_n^{(1)}\big\|_\infty \le \frac{2\cdot|u|_{2,\infty}}{8^d}\cdot 2^{-2n}\cdot A(d,n) = O\big(h_n^2\cdot n^{d-1}\big), \tag{3.68}
$$
$$
\big\|u - u_n^{(1)}\big\|_2 \le \frac{2\cdot|u|_{2,2}}{12^d}\cdot 2^{-2n}\cdot A(d,n) = O\big(h_n^2\cdot n^{d-1}\big),
$$
$$
\big\|u - u_n^{(1)}\big\|_E \le \frac{d\cdot|u|_{2,\infty}}{2\cdot 3^{(d-1)/2}\cdot 4^{d-1}}\cdot 2^{-n} = O(h_n),
$$
$$
\big\|u - u_n^{(1)}\big\|_E \le \frac{d\cdot|u|_{2,2}}{\sqrt{3}\cdot 6^{d-1}}\cdot 2^{-n} = O(h_n).
$$
Proof. With (3.27), (3.64), and (3.66) for $s = 2$, we get
$$
\big\|u - u_n^{(1)}\big\|_\infty \le \sum_{|l|_1>n+d-1} \|u_l\|_\infty \le \frac{|u|_{2,\infty}}{2^d}\cdot\sum_{|l|_1>n+d-1} 2^{-2\cdot|l|_1}
\le \frac{2\cdot|u|_{2,\infty}}{8^d}\cdot 2^{-2n}\cdot A(d,n)
$$
and, analogously, the corresponding result for the $L_2$-norm. Concerning the first bound with respect to the energy norm, we have
$$
\big\|u - u_n^{(1)}\big\|_E \le \sum_{|l|_1>n+d-1} \|u_l\|_E
\le \frac{|u|_{2,\infty}}{2\cdot 12^{(d-1)/2}}\cdot\sum_{|l|_1>n+d-1} 4^{-|l|_1}\cdot\left(\sum_{j=1}^d 4^{l_j}\right)^{1/2}
$$
$$
= \frac{|u|_{2,\infty}}{2\cdot 12^{(d-1)/2}}\cdot\sum_{i=n+d}^\infty 4^{-i}\cdot\sum_{|l|_1=i}\left(\sum_{j=1}^d 4^{l_j}\right)^{1/2}
\le \frac{|u|_{2,\infty}}{2\cdot 12^{(d-1)/2}}\cdot\sum_{i=n+d}^\infty d\cdot 2^{-i}
= \frac{d\cdot|u|_{2,\infty}}{2\cdot 3^{(d-1)/2}\cdot 4^{d-1}}\cdot 2^{-n},
$$
because
$$
\sum_{|l|_1=i}\left(\sum_{j=1}^d 4^{l_j}\right)^{1/2} \le d\cdot 2^i,
$$
which can be shown by complete induction with respect to $d$. The last estimate can be obtained with analogous arguments.

https://doi.org/10.1017/S0962492904000182 Published online by Cambridge University Press


This theorem shows the crucial improvement of the sparse grid space $V_n^{(1)}$ in comparison with $V_n^{(\infty)}$. The number of degrees of freedom is reduced significantly, whereas the accuracy is only slightly deteriorated – for the $L_\infty$- and the $L_2$-norm – or even stays of the same order if the error is measured in the energy norm. This lessens the curse of dimensionality, but it does not overcome it completely. Since this result is optimal with respect to both the $L_\infty$- and the $L_2$-norm, a further improvement can only be expected if we change the setting. Therefore, in the following, we study the optimization process with respect to the energy norm.

Energy-based sparse grids

Now, we base our cost–benefit approach on the energy norm. According to (3.20) and (3.27), we define
$$
\mathrm{cbr}_E(l) := \frac{b_E(l)}{c(l)} = \frac{2^{-4\cdot|l|_1}\cdot|u|_{2,\infty}^2}{4\cdot 12^{d-1}\cdot 2^{|l-1|_1}}\cdot\sum_{j=1}^d 4^{l_j}
= \frac{3}{6^d}\cdot 2^{-5\cdot|l|_1}\cdot\sum_{j=1}^d 4^{l_j}\cdot|u|_{2,\infty}^2 \tag{3.69}
$$
as the local cost–benefit ratio. Again, instead of $\|u_l\|_E$ itself, only an upper bound for the squared energy norm of $u_l$ is used. The resulting optimal grid $I^{(\mathrm{opt})}$ will consist of all those multi-indices $l$ or their respective hierarchical subspaces $W_l$ that fulfil $\mathrm{cbr}_E(l) \ge \sigma_E(n)$ for some given constant threshold $\sigma_E(n)$. As before, $\sigma_E(n)$ is defined via the cost–benefit ratio of $W_{\bar l}$ with $\bar l := (n,1,\dots,1)$:
$$
\sigma_E(n) := \mathrm{cbr}_E(\bar l) = \frac{3}{6^d}\cdot 2^{-5\cdot(n+d-1)}\cdot\big(4^n + 4\cdot(d-1)\big)\cdot|u|_{2,\infty}^2. \tag{3.70}
$$
Thus, applying the criterion $\mathrm{cbr}_E(l) \ge \sigma_E(n)$, we come to an alternative sparse grid approximation space $V_n^{(E)}$, which is based on the energy norm:
$$
V_n^{(E)} := \bigoplus_{|l|_1 - \frac{1}{5}\cdot\log_2\big(\sum_{j=1}^d 4^{l_j}\big)\ \le\ (n+d-1) - \frac{1}{5}\cdot\log_2\big(4^n+4d-4\big)} W_l. \tag{3.71}
$$

First, we look at the number of grid points of the underlying sparse grids.

Lemma 3.9. The energy-based sparse grid space $V_n^{(E)}$ is a subspace of $V_n^{(1)}$, and its dimension fulfils
$$
\big|V_n^{(E)}\big| \le 2^n\cdot\frac{d}{2}\cdot e^d = O\big(h_n^{-1}\big). \tag{3.72}
$$



Proof. For subspaces $W_l$ with $|l|_1 = n+d-1+i$, $i \in \mathbb{N}$, we have
$$
|l|_1 - \frac{1}{5}\cdot\log_2\left(\sum_{j=1}^d 4^{l_j}\right) \ge n+d-1+i - \frac{1}{5}\cdot\log_2\big(4^{n+i}+4d-4\big)
$$
$$
\ge n+d-1+i - \frac{1}{5}\cdot\log_2\big(4^i\cdot(4^n+4d-4)\big)
> n+d-1 - \frac{1}{5}\cdot\log_2\big(4^n+4d-4\big).
$$
Therefore, no $W_l$ with $|l|_1 > n+d-1$ can belong to $V_n^{(E)}$. Consequently, $V_n^{(E)}$ is a subspace of $V_n^{(1)}$ and $|V_n^{(E)}| \le |V_n^{(1)}|$ for all $n \in \mathbb{N}$. Starting from that, (3.20) provides
$$
\big|V_n^{(E)}\big| = \sum_{i=0}^{n-1}\ \sum_{\substack{|l|_1=n+d-1-i \\ \sum_{j=1}^d 4^{l_j} \ge (4^n+4d-4)/32^i}} |W_l|
= 2^n\cdot\frac{1}{2}\cdot\sum_{i=0}^{n-1} 2^{-i}\ \sum_{\substack{|l|_1=n+d-1-i \\ \sum_{j=1}^d 4^{l_j} \ge (4^n+4d-4)/32^i}} 1
$$
$$
\le 2^n\cdot\frac{1}{2}\cdot\lim_{n\to\infty}\ \sum_{i=0}^{n-1} 2^{-i}\ \sum_{\substack{|l|_1=n+d-1-i \\ \sum_{j=1}^d 4^{l_j} \ge (4^n+4d-4)/32^i}} 1
= 2^n\cdot\frac{1}{2}\cdot\lim_{n\to\infty}\ \sum_{i=0}^{n-1} 2^{-i}\cdot d\cdot\binom{d-1+1.5i}{d-1},
$$
since it can be shown that, for $n \to \infty$, our energy-based sparse grid and the grid resulting from the second condition $|l|_\infty \ge n - 2.5i$ for the inner sum, instead of
$$
\sum_{j=1}^d 4^{l_j} \ge \frac{4^n+4d-4}{32^i},
$$
are the same, and since there exist $\binom{d-1+1.5i}{d-1}$ such subspaces $W_l$ with $|l|_\infty = l_1$. Consequently, we obtain
$$
\big|V_n^{(E)}\big| \le 2^n\cdot\frac{d}{2}\cdot\sum_{i=0}^\infty 2^{-\frac{2}{3}\cdot i}\cdot\binom{d-1+i}{d-1}
= 2^n\cdot\frac{d}{2}\cdot\Big(1 - 2^{-\frac{2}{3}}\Big)^{-d}
\le 2^n\cdot\frac{d}{2}\cdot e^d,
$$
since $\sum_{i=0}^\infty x^i\cdot\binom{k+i}{k} = (1-x)^{-k-1}$ for $k \in \mathbb{N}_0$ and $0 < x < 1$.

Table 3.1. Dimension of $V_n^{(\infty)}$, $V_n^{(1)}$, and $V_n^{(E)}$ for different values of $d$ and $n$.

                        d = 2                       d = 3                       d = 4
                  n = 10      n = 20          n = 10      n = 20          n = 10      n = 20

$V_n^{(\infty)}$  1.05·10^6   1.10·10^12      1.07·10^9   1.15·10^18      1.10·10^12  1.21·10^24
$V_n^{(1)}$       9217        1.99·10^7       47103       2.00·10^8       1.78·10^5   1.41·10^9
$V_n^{(E)}$       3841        4.72·10^6       10495       1.68·10^7       24321       5.27·10^7

Figure 3.7. Scheme of subspaces for $V_{30}^{(1)}$ (left) and $V_{30}^{(E)}$ (right), $d = 2$.
Table 3.1 compares the dimensions of the standard full grid approximation space $V_n^{(\infty)}$ with both sparse grid spaces $V_n^{(1)}$ and $V_n^{(E)}$ for different dimensionalities $d \in \{2,3,4\}$ and for the two resolutions $n = 10$ and $n = 20$. The sparse grid effect is already obvious for $V_n^{(1)}$. Especially for larger $d$, the advantages of $V_n^{(E)}$ become evident. For a comparison of the underlying subspace schemes of $V_n^{(1)}$ and $V_n^{(E)}$ in 2D, see Figure 3.7.
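The $V_n^{(E)}$ column of Table 3.1 can be reproduced by enumerating the criterion (3.71) directly; a sketch (ours), which also checks the bound (3.72):

```python
from itertools import product
from math import log2, e

def dim_energy(d, n):
    """|V_n^(E)|: count 2^(|l|_1 - d) points per level l (l_j >= 1)
    satisfying the selection criterion (3.71)."""
    rhs = (n + d - 1) - log2(4**n + 4*d - 4) / 5
    total = 0
    # by Lemma 3.9, only levels with |l|_1 <= n+d-1 can qualify
    for l in product(range(1, n + d), repeat=d):
        if sum(l) - log2(sum(4**lj for lj in l)) / 5 <= rhs + 1e-12:
            total += 2**(sum(l) - d)
    return total

n = 10
for d in (2, 3):
    print(d, dim_energy(d, n))                    # d = 2 gives 3841, as in Table 3.1
    assert dim_energy(d, n) <= 2**n * (d/2) * e**d   # bound (3.72)
```

The small tolerance guards against floating-point ties on the boundary levels such as $\bar l = (n,1,\dots,1)$, which satisfy (3.71) with equality.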



Next we have to deal with the interpolation accuracy of the energy-based sparse grid spaces $V_n^{(E)}$ and to study the sparse grid interpolant $u_n^{(E)} \in V_n^{(E)}$. Note that we look at the energy norm only, since, with respect to both the $L_\infty$- and the $L_2$-norm, the spaces $V_n^{(1)}$ are already optimal in our cost–benefit setting. Thus, with the reduced number of grid points of $V_n^{(E)}$, a deterioration of the ($L_\infty$- and $L_2$-) interpolation quality is to be expected.

Theorem 3.10. The energy norm of the interpolation error of some $u \in X_0^{q,2}(\bar\Omega)$ in the energy-based sparse grid space $V_n^{(E)}$ is bounded by
$$
\big\|u - u_n^{(E)}\big\|_E \le \frac{d\cdot|u|_{2,\infty}}{3^{(d-1)/2}\cdot 4^{d-1}}\cdot\left(\frac{1}{2}+\left(\frac{5}{2}\right)^{d-1}\right)\cdot 2^{-n} = O(h_n), \tag{3.73}
$$
$$
\big\|u - u_n^{(E)}\big\|_E \le \frac{2\cdot d\cdot|u|_{2,2}}{\sqrt{3}\cdot 6^{d-1}}\cdot\left(\frac{1}{2}+\left(\frac{5}{2}\right)^{d-1}\right)\cdot 2^{-n} = O(h_n).
$$
Proof. First, since
$$
\big\|u - u_n^{(E)}\big\|_E \le \big\|u - u_n^{(1)}\big\|_E + \big\|u_n^{(1)} - u_n^{(E)}\big\|_E,
$$
and since we already know that $\|u - u_n^{(1)}\|_E$ is of the order $O(h_n)$, we can restrict ourselves to $\|u_n^{(1)} - u_n^{(E)}\|_E$. For that, it can be shown that, for $i \in \mathbb{N}_0$, each $W_l$ with $|l|_1 = n+d-1-i$ and $|l|_\infty \ge n-2.5i$ is a subspace of $V_n^{(E)}$. Therefore, we obtain with (3.27)
$$
\big\|u_n^{(1)} - u_n^{(E)}\big\|_E \le \sum_{W_l\subseteq V_n^{(1)}\setminus V_n^{(E)}} \|u_l\|_E
\le \sum_{i=0}^{i^*}\ \sum_{\substack{|l|_1=n+d-1-i \\ |l|_\infty<n-2.5i}} \|u_l\|_E
\le \frac{|u|_{2,\infty}}{2\cdot 12^{(d-1)/2}}\cdot\sum_{i=0}^{i^*}\ \sum_{\substack{|l|_1=n+d-1-i \\ |l|_\infty<n-2.5i}} 4^{-|l|_1}\cdot\left(\sum_{j=1}^d 4^{l_j}\right)^{1/2}
$$
$$
\le \frac{|u|_{2,\infty}}{2\cdot 12^{(d-1)/2}}\cdot 4^{-n-d+1}\cdot\sum_{i=0}^{i^*} 4^i\ \sum_{\substack{|l|_1=n+d-1-i \\ |l|_\infty<n-2.5i}}\ \sum_{j=1}^d 2^{l_j}
\le \frac{|u|_{2,\infty}}{2\cdot 12^{(d-1)/2}}\cdot 4^{-n-d+1}\cdot\sum_{i=0}^{i^*} 4^i\cdot d\cdot\sum_{j=1}^{n-1-2.5i}\binom{n+d-2-i-j}{d-2}\cdot 2^j
$$
$$
= \frac{|u|_{2,\infty}}{2\cdot 12^{(d-1)/2}}\cdot 4^{-n-d+1}\cdot\sum_{i=0}^{i^*} 4^i\cdot d\cdot\sum_{k=1}^{n-1-2.5i}\binom{d-2+1.5i+k}{d-2}\cdot 2^{n-2.5i-k}
$$
$$
= \frac{d\cdot|u|_{2,\infty}}{2\cdot 12^{(d-1)/2}}\cdot 4^{-(d-1)}\cdot 2^{-n}\cdot\sum_{i=0}^{i^*} 2^{-i/2}\ \sum_{k=1}^{n-1-2.5i}\binom{d-2+1.5i+k}{d-2}\cdot 2^{-k}
$$
$$
\le \frac{d\cdot|u|_{2,\infty}}{2\cdot 12^{(d-1)/2}}\cdot 4^{-(d-1)}\cdot 2^{-n}\cdot 2\cdot 5^{d-1}
= \frac{d\cdot|u|_{2,\infty}}{3^{(d-1)/2}\cdot 4^{d-1}}\cdot\left(\frac{5}{2}\right)^{d-1}\cdot 2^{-n},
$$
where $0 \le i^* \le n-1$ is the maximum value of $i$ for which the set of indices $l$ with $|l|_1 = n+d-1-i$ and $|l|_\infty < n-2.5i$ is not empty. Together with (3.68), we get the first result and, in a completely analogous way, the second one, too.

Though we have only derived upper bounds for the energy norm of the interpolation error, it is helpful to compare the respective results (3.32), (3.68) and (3.73) for the three approximation spaces $V_n^{(\infty)}$, $V_n^{(1)}$ and $V_n^{(E)}$. Table 3.2 shows that there is no asymptotic growth with respect to $d$, either for the full grid case or for our two sparse grid spaces.
The crucial result of this section is that, with the energy-based sparse grid
(E)
spaces Vn , the curse of dimensionality can be overcome. In both (3.72)
and (3.73), the n-dependent terms are free of any d-dependencies. There is
an order of O(2n ) for the dimension and O(2−n ) for the interpolation error.
Especially, there is no longer any polynomial term in n like nd−1 . That
is, apart from the factors that are constant with respect to n, there is no
(E) (E)
d-dependence of both |Vn | and u − un E , and thus no deterioration
in complexity for higher-dimensional problems. Furthermore, the growth
of the d-dependent terms in d is not too serious, since we have a factor

Table 3.2. d-depending constants in the bounds for ‖u − u_n^{(·)}‖_E (multiply
with |u|_{2,∞} · 2^{−n} (first row) or |u|_{2,2} · 2^{−n} (second row) to get the
respective bounds).

                 V_n^{(∞)}                              V_n^{(1)}                          V_n^{(E)}

first row:   d^{3/2} / (2 · 3^{(d−1)/2} · 6^{d−1})      d / (2 · 3^{(d−1)/2} · 4^{d−1})    d / (3^{(d−1)/2} · 4^{d−1}) · (1/2 + (5/2)^{d−1})

second row:  d^{3/2} / (√3 · 9^{d−1})                   d / (√3 · 6^{d−1})                 2 · d / (√3 · 6^{d−1}) · (1/2 + (5/2)^{d−1})



Figure 3.8. Recursive structure of V_3^{(1)} for d = 2.

of (d/2) · e^d in the upper bound of |V_n^{(E)}| and (in the best case; see Table 3.2)
d / (3^{(d−1)/2} · 4^{d−1}) · (1/2 + (5/2)^{d−1}) in the upper bound of ‖u − u_n^{(E)}‖_E.

3.3. Recurrences and complexity


In this section, we make a short digression into recurrence formulas for
sparse grids and into sparse grid complexity, in order to learn more about
their asymptotic behaviour. We present the most interesting results only
and refer the reader to Bungartz (1998) for details or proofs.

Recurrence formulas

For the following, we restrict ourselves to the sparse grid spaces V_n^{(1)}. In
(3.61), they were introduced with the help of an explicit formula. Now we
study their recursive character to obtain further results concerning their
complexity and asymptotic properties. First, starting from (3.62), one can
show a recurrence relation for |V_n^{(1)}|, the number of (inner) grid points of a
sparse grid. Note that |V_n^{(1)}| depends on two parameters: the dimensionality
d and the resolution n. Defining

    a_{n,d} := |V_n^{(1)}|,    (3.74)
we get

    a_{n,d} = a_{n,d−1} + 2 · a_{n−1,d}.    (3.75)

That is, the d-dimensional sparse grid of resolution (or depth) n consists of a
(d − 1)-dimensional one of depth n (separator) and of two d-dimensional
sparse grids of depth n − 1 (cf. Figure 3.8, left). If we continue with this
decomposition in a recursive way, we finally obtain a full-history version of
(3.75) with respect to n,

    a_{n,d} = Σ_{i=0}^{n−1} 2^i · a_{n−i,d−1},    (3.76)



Figure 3.9. Recursive structure of b3,2 .

(1)
since a1,d = 1 for all d ∈ N due to (3.74). Thus, a sparse grid Vn of
dimensionality d and depth n can be completely reduced to sparse grids of
dimensionality d − 1 and depth k, k = 1, . . . , n (cf. Figure 3.8, right).
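The recurrence (3.75) and the direct characterization of |V_n^{(1)}| as the total number of points in all subspaces W_l with |l|_1 ≤ n + d − 1 can be cross-checked numerically. The following Python sketch is our own illustration (the function names are not from the paper); each W_l contributes Π_j 2^{l_j − 1} grid points:

```python
from functools import lru_cache
from itertools import product

@lru_cache(maxsize=None)
def a(n, d):
    # a_{n,d} = |V_n^(1)| via the recurrence (3.75),
    # with a_{1,d} = 1 and a_{n,1} = 2**n - 1.
    if n == 1:
        return 1
    if d == 1:
        return 2 ** n - 1
    return a(n, d - 1) + 2 * a(n - 1, d)

def a_direct(n, d):
    # Direct count: sum over all levels l >= 1 with |l|_1 <= n + d - 1
    # of the number of points in W_l, which is prod_j 2**(l_j - 1).
    total = 0
    for l in product(range(1, n + 1), repeat=d):
        if sum(l) <= n + d - 1:
            points = 1
            for lj in l:
                points *= 2 ** (lj - 1)
            total += points
    return total
```

For instance, a(2, 2) gives the 5 inner points of the 2D sparse grid of depth 2, and both counting methods agree for all small n and d.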
In addition to a_{n,d}, we shall now deal with sparse grids with grid points on
the boundary. To this end, let b_{n,d} be the number of overall grid points of the
L2-based sparse grid of parameters d and n, i.e., in the interior and on the
boundary of Ω̄. On ∂Ω̄, we assume sparse grids of the same resolution n, but
of a reduced dimensionality. Since the boundary of the d-dimensional unit
interval Ω̄ consists of binom(d, j) · 2^{d−j} j-dimensional unit intervals
(j = 0, …, d), we get

    b_{n,d} := Σ_{j=0}^{d} binom(d, j) · 2^{d−j} · a_{n,j},    (3.77)

where a_{n,0} := 1 for all n ∈ N. With the help of (3.75) and (3.77), the
following recurrence relation for the b_{n,d} can be derived:

    b_{n,d} = 2 · b_{n−1,d} + 3 · b_{n,d−1} − 4 · b_{n−1,d−1}    (3.78)

with its full-history version with respect to n

    b_{n,d} = 3 · b_{n,d−1} + Σ_{i=1}^{n−1} 2^i · b_{n−i,d−1}    (3.79)
            = 2 · b_{n,d−1} + Σ_{i=0}^{n−1} 2^i · b_{n−i,d−1},

where the first term stands for the boundary faces xd ∈ {0, 1}, whereas
the sum denotes the inner part of the grid with respect to direction xd .
Figure 3.9 illustrates the recursive structure of bn,d in the 2D case.
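As a sanity check, the definition (3.77) and the recurrence (3.78) can be compared numerically. The sketch below is our own illustration; it assumes b_{n,0} := 1 as the recursion base, in line with a_{n,0} := 1:

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def a(n, d):
    # Inner points a_{n,d} from (3.75), with a_{n,0} := 1,
    # a_{1,d} = 1 and a_{n,1} = 2**n - 1.
    if d == 0:
        return 1
    if n == 1:
        return 1
    if d == 1:
        return 2 ** n - 1
    return a(n, d - 1) + 2 * a(n - 1, d)

def b_def(n, d):
    # Overall points (interior plus boundary) via the definition (3.77).
    return sum(comb(d, j) * 2 ** (d - j) * a(n, j) for j in range(d + 1))

@lru_cache(maxsize=None)
def b_rec(n, d):
    # The same numbers via the recurrence (3.78), with b_{n,0} := 1 (assumed),
    # b_{n,1} = 2**n + 1 and b_{1,d} = 3**d.
    if d == 0:
        return 1
    if d == 1:
        return 2 ** n + 1
    if n == 1:
        return 3 ** d
    return 2 * b_rec(n - 1, d) + 3 * b_rec(n, d - 1) - 4 * b_rec(n - 1, d - 1)
```

Both routes reproduce, e.g., b_{1,d} = 3^d and b_{2,2} = 21.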
Finally, a third quantity cp,d shall be introduced that motivates the sparse
grid pattern from a perspective of approximation with polynomials, thus
anticipating, to some extent, the higher-order approaches to be discussed



later. Starting from our hierarchical setting, we are looking for the minimum
number of grid points that are necessary to realize a polynomial approxim-
ation of a certain degree p. In the simple 1D case, things are evident.
First, with one degree of freedom available in the left point of the bound-
ary (x = 0), a constant basis function is placed there, while on the right
side (x = 1), a linear function is chosen, thus allowing linear interpolation
(p = 1) in Ω̄ with two degrees of freedom. Afterwards, on each level l
of inner grid points, we raise the degree p of the basis functions by one,
and consequently get an approximation order of degree p = l + 1 for level
l. Note that there is no overall interpolant of degree p > 2 on Ω̄, but, ow-
ing to the hierarchical subspace decomposition, there exists an interpolant,
continuous on Ω̄, that is piecewise polynomial of degree p with respect to
level l − 1 = p − 2.
For d > 1, things are a little bit more complicated. We discuss the defini-
tion of cp,d for d = 2. For constant interpolation, just one degree of freedom
is necessary. Linear interpolation, i.e., p = 1, can be obtained with three
grid points, e.g., with the three degrees of freedom 1, x, and y living in three
of the unit square’s four corners. This procedure is completely consistent
with the tensor product approach: for x = 0, the 1D basis function is con-
stant with respect to x. Thus, we need linear interpolation with respect to
y (two degrees of freedom for x = 0). On the right-hand side of the unit
square (x = 1), we are linear with respect to x and thus need only one degree
of freedom with respect to y (see Figure 3.10). For a quadratic approxim-
ation, six degrees of freedom are necessary, and so on. This economic use
of degrees of freedom leads to a certain asymmetry of the respective grid
patterns and to a delayed generation of inner grid points: when the first
inner grid point appears, the maximum depth on the boundary is already
three in the 2D case. Nevertheless, the resulting grids for a given polynomial
degree p are very closely related to the standard sparse grids described by
b_{n,d}, since, obviously, the grid patterns resulting in the interior are just the
standard patterns of V_n^{(1)}. Hence, here, the maximum degree p of the basis
functions used takes the part of the resolution n as the second parameter
besides d.
The principles of construction for the grids resulting from this polynomial
approach lead us to a recurrence relation for c_{p,d},

    c_{p,d} = c_{p,d−1} + c_{p−1,d−1} + Σ_{i=0}^{p−2} 2^i · c_{p−2−i,d−1},    (3.80)

from which we get

    c_{p,d} = 2 · c_{p−1,d} + c_{p,d−1} − c_{p−1,d−1} − c_{p−2,d−1}.    (3.81)
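The recurrence (3.80), together with the initial conditions c_{0,d} = 1, c_{1,d} = d + 1 and c_{p,1} = 2^{p−1} + 1 stated in (3.82) below, reproduces the point counts 1, 3, 6, 12, 25 of Figure 3.10 for d = 2. A minimal Python sketch (our own illustration):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def c(p, d):
    # c_{p,d} from the recurrence (3.80); the base cases are the initial
    # conditions (3.82): c_{0,d} = 1, c_{1,d} = d + 1, c_{p,1} = 2**(p-1) + 1.
    if p == 0:
        return 1
    if p == 1:
        return d + 1
    if d == 1:
        return 2 ** (p - 1) + 1
    return (c(p, d - 1) + c(p - 1, d - 1)
            + sum(2 ** i * c(p - 2 - i, d - 1) for i in range(p - 1)))

print([c(p, 2) for p in range(5)])  # point counts of Figure 3.10
```

The same values also satisfy the short recurrence (3.81), which can be checked directly.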



Figure 3.10. Minimum number of grid points for piecewise polynomial
degree p in the hierarchical setting (d = 2) and corresponding local bases:
p = 0: 1 point (basis {1}); p = 1: 3 points (basis {1, x, y});
p = 2: 6 points (basis {1, x, y, x^2, y^2, xy}); p = 3: 12 points; p = 4: 25 points.

In the following, we present some properties of the an,d , bn,d , and cp,d .
Though we do not want to go into detail here, note that, in contrast to
many recurrences studied for the analysis of algorithms (cf. Graham, Knuth
and Patashnik (1994), Sedgewick and Flajolet (1996), for example), these
quantities, and thus the overall storage requirement and computational cost
connected with sparse grids, depend on two parameters. Let d ∈ N, n ∈ N,
and p ∈ N_0. The initial conditions

    a_{1,d} = 1 ∀d ∈ N,      a_{n,1} = 2^n − 1 ∀n ∈ N,    (3.82)
    b_{1,d} = 3^d ∀d ∈ N,    b_{n,1} = 2^n + 1 ∀n ∈ N,
    c_{0,d} = 1 ∀d ∈ N,      c_{p,1} = 2^{p−1} + 1 ∀p ∈ N,
    c_{1,d} = d + 1 ∀d ∈ N

follow immediately from the semantics of a_{n,d}, b_{n,d}, and c_{p,d}. Owing to
(3.76), (3.79), (3.80), and the initial conditions (3.82), all a_{n,d}, b_{n,d}, and c_{p,d}
are natural numbers. Next, we study the behaviour of our three quantities
for increasing d, n, or p, respectively. The following lemma summarizes the
asymptotic behaviour of an,d , bn,d , and cp,d .

Lemma 3.11. The following relations are valid for the n- and d-asymptotic
behaviour of a_{n,d} and b_{n,d}:

    a_{n,d+1}/a_{n,d} → 1 as d → ∞,    a_{n,d+1}/a_{n,d} → ∞ as n → ∞,    (3.83)
    a_{n+1,d}/a_{n,d} → 2 as n → ∞,    a_{n+1,d}/a_{n,d} → ∞ as d → ∞,
    b_{n,d+1}/b_{n,d} → 3 as d → ∞,    b_{n,d+1}/b_{n,d} → ∞ as n → ∞,
    b_{n+1,d}/b_{n,d} → 2 as n → ∞,    b_{n+1,d}/b_{n,d} → ∞ as d → ∞.

For c_{p,d}, the respective limits are

    c_{p,d+1}/c_{p,d} → 1 as d → ∞,    c_{p,d+1}/c_{p,d} → ∞ as p → ∞,    (3.84)
    c_{p+1,d}/c_{p,d} → 2 as p → ∞,    c_{p+1,d}/c_{p,d} → ∞ as d → ∞.
For a_{n,d}, b_{n,d}, and c_{p,d}, the results for the limits n → ∞ are not too
surprising. In the case of a_{n,d} = |V_n^{(1)}|, for instance, they can be derived
from the order term O(2^n · n^{d−1}) with respect to n according to (3.63).
However, the
statements concerning the limits d → ∞ provide some new and interesting
information. First, the ratio an+1,d /an,d of two sparse grids of resolution
n+1 and n, for example, is not bounded for increasing d. That is, increasing
the resolution does not entail only a certain bounded factor of increasing
cost, but the relative increase in cost switching from resolution n to n + 1
becomes bigger and bigger for higher dimensionality and is not bounded.
Second, for a fixed resolution n, the relative difference between two sparse
grids of dimensionality d and d + 1 becomes more and more negligible for
increasing d, which is, in fact, somewhat surprising. Note that this is a hard
statement and not just one dealing with orders of magnitude.
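These limits can also be observed numerically. The sketch below (our own illustration, not from the paper) evaluates a_{n,d} via (3.75) and prints the two non-trivial ratios from (3.83):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def a(n, d):
    # a_{n,d} = |V_n^(1)| from (3.75), with a_{1,d} = 1 and a_{n,1} = 2**n - 1.
    if n == 1:
        return 1
    if d == 1:
        return 2 ** n - 1
    return a(n, d - 1) + 2 * a(n - 1, d)

# a_{n,d+1}/a_{n,d} tends to 1 for fixed n as d grows ...
for d in (2, 10, 50):
    print("d =", d, "ratio:", a(3, d + 1) / a(3, d))

# ... while a_{n+1,d}/a_{n,d} tends to 2 for fixed d as n grows.
for n in (2, 10, 20):
    print("n =", n, "ratio:", a(n + 1, 1) / a(n, 1))
```

Already for moderate d the first ratio is close to 1, illustrating how negligible the step from dimensionality d to d + 1 becomes at fixed resolution.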
However, after all those asymptotic considerations, it is important to note
that, often, such limits are not excessively useful for numerical purposes,
since practical computations do not always reach the region of asymptotic
behaviour. Therefore, usually, the constant factors play a more predominant
part than an asymptotic point of view suggests. Furthermore, it certainly
makes sense to increase resolution during the numerical solution of a given
problem, but a variable dimensionality is, of course, more of a theoretical
interest.

ε-complexity
An approach that is closely related to the cost–benefit setting used for
the derivation of the sparse grid approximation spaces is the concept of
the ε-complexity (Traub and Woźniakowski 1980, Traub, Wasilkowski and
Woźniakowski 1983, Traub, Wasilkowski and Woźniakowski 1988, Woźnia-
kowski 1985). The ε-complexity of a numerical method or algorithm indic-
ates the computational work that is necessary to produce an approximate
solution of some prescribed accuracy ε. In particular, for the complex-
ity of general multivariate tensor product problems, see Wasilkowski and
Woźniakowski (1995). We consider the ε-complexity of the different discrete
approximation spaces V_n^{(∞)}, V_n^{(1)}, and V_n^{(E)} for the problem of representing
a function u ∈ X_0^{q,2}(Ω̄) on a grid, i.e., the problem of constructing the
interpolant u_n^{(∞)} ∈ V_n^{(∞)}, u_n^{(1)} ∈ V_n^{(1)}, or u_n^{(E)} ∈ V_n^{(E)}, respectively. To this
end, the overall computational cost caused by the interpolation in one of the



three above discrete spaces will be estimated by the number of degrees of
freedom (i.e., grid points), or by an upper bound for this number. This does
not constitute a restriction, since there are algorithms that can calculate the
interpolant u_n^{(·)} in O(|V_n^{(·)}|) arithmetic operations, of course. Furthermore,
as a measure for the accuracy, we use the upper bounds for the interpolation
error ‖u − u_n^{(∞)}‖, ‖u − u_n^{(1)}‖, and ‖u − u_n^{(E)}‖ with respect to the different
norms, as provided by (3.32), (3.68), and (3.73).
First, we deal with the well-known case of the regular full grid space V_n^{(∞)},
where the curse of dimensionality is predominant and causes the problem
to be intractable in the sense of Traub et al. (1988): the computational
cost of obtaining an approximate solution of some given accuracy ε grows
exponentially in the problem’s dimensionality d. Note that, in the following,
all occurring order terms have to be read with respect to ε or N, respectively,
i.e., for arbitrary, but fixed d.

Lemma 3.12. For the ε-complexities N_∞(ε), N_2(ε), and N_E(ε) of the
problem of computing the interpolant u_n^{(∞)} ∈ V_n^{(∞)} with respect to the L_∞-,
the L_2-, and the energy norm for some prescribed accuracy ε, the following
relations hold:

    N_∞(ε) = O(ε^{−d/2}),    N_2(ε) = O(ε^{−d/2}),    N_E(ε) = O(ε^{−d}).    (3.85)

Conversely, given a number N of grid points, the following accuracies can
be obtained with respect to the different norms:

    ε_{L_∞}(N) = O(N^{−2/d}),    ε_{L_2}(N) = O(N^{−2/d}),    ε_E(N) = O(N^{−1/d}).    (3.86)

Proof. The statements follow directly from (3.31) and (3.32).
Next we turn to the L2-based sparse grid space V_n^{(1)}. As we have already
seen, V_n^{(1)} lessens the curse of dimensionality, but does not yet overcome it.

Lemma 3.13. For the ε-complexities N_∞(ε), N_2(ε), and N_E(ε) of the
problem of computing the interpolant u_n^{(1)} ∈ V_n^{(1)} with respect to the L_∞-,
the L_2-, and the energy norm for some prescribed accuracy ε, the following
relations hold:

    N_∞(ε), N_2(ε) = O(ε^{−1/2} · |log_2 ε|^{(3/2)·(d−1)}),    (3.87)
    N_E(ε) = O(ε^{−1} · |log_2 ε|^{d−1}).

Conversely, given a number N of grid points, the following accuracies can
be obtained with respect to the different norms:

    ε_∞(N), ε_2(N) = O(N^{−2} · |log_2 N|^{3·(d−1)}),    (3.88)
    ε_E(N) = O(N^{−1} · |log_2 N|^{d−1}).

Proof. See Bungartz (1998), for example.
Finally, for the energy-based sparse grid space V_n^{(E)}, the situation is
evident and gratifying: the curse of dimensionality has disappeared.

Lemma 3.14. For the ε-complexity N_E(ε) of the problem of computing
the interpolant u_n^{(E)} ∈ V_n^{(E)} with respect to the energy norm for some
prescribed accuracy ε, the following relation holds:

    N_E(ε) = O(ε^{−1}).    (3.89)

Thus, for a fixed number N of grid points, the following accuracy can be
obtained:

    ε_E(N) = O(N^{−1}).    (3.90)

Proof. Both results are obvious consequences of (3.72) and (3.73).
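To see the difference concretely, one can tabulate the leading-order degree-of-freedom counts from Lemmas 3.12–3.14 for a fixed target accuracy in the energy norm, ignoring all constant factors. The following is a rough illustration of ours, not a statement about actual grid sizes:

```python
import math

def dof_full(eps, d):
    # Full grid, energy norm: N_E(eps) = O(eps**-d), cf. Lemma 3.12.
    return eps ** -d

def dof_sparse_l2(eps, d):
    # L2-based sparse grid: N_E(eps) = O(eps**-1 * |log2 eps|**(d-1)), Lemma 3.13.
    return eps ** -1 * abs(math.log2(eps)) ** (d - 1)

def dof_sparse_energy(eps, d):
    # Energy-based sparse grid: N_E(eps) = O(eps**-1), Lemma 3.14 (d-independent).
    return eps ** -1

for d in (2, 5, 10):
    print(d, dof_full(0.01, d), dof_sparse_l2(0.01, d), dof_sparse_energy(0.01, d))
```

For ε = 0.01 and d = 10, the full grid term is of order 10^20, the L2-based sparse grid term of order 10^9, while the energy-based sparse grid term stays at 10^2.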
With the above remarks on the ε-complexity of the problem of interpolating
functions u ∈ X_0^{q,2}(Ω̄) in our two sparse grid spaces V_n^{(1)} and V_n^{(E)},
we close the discussion of the piecewise d-linear case.

4. Generalizations, related concepts, applications


In the previous section we presented the main ideas of the sparse grid ap-
proach starting from the 1D piecewise linear hierarchical basis, which was
then extended to the general piecewise d-linear hierarchical basis by the dis-
cussed tensor product construction. However, the discretization on sparse
grids is, of course, not limited to this explanatory example, but can be
directly generalized to other multiscale bases such as p-type hierarchical
bases, prewavelets, or wavelets, for instance. To this end, another 1D multi-
scale basis must be chosen. Then the tensor product construction as well
as the cut-off of the resulting series expansion will lead to closely related
sparse grids.
In the following, we first give a short survey of the historic background of
sparse grids. Then we generalize the piecewise linear hierarchical basis to
hierarchical polynomial bases of higher order, and discuss hierarchical La-
grangian interpolation of Bungartz (1998) and the so-called interpolets of
Deslauriers and Dubuc (1989). After that, we discuss the use of prewavelets
and wavelets in a sparse grid context. A short overview of the current state



concerning sparse grid applications, with a focus on the discretization of
PDEs including some remarks on adaptive refinement and fast solvers, will
close this section.

4.1. Ancestors
The discussion of hierarchical finite elements (Peano 1976, Zienkiewicz,
Kelly, Gago and Babuška 1982) and, in particular, the series of articles
by Yserentant (1986, 1990, 1992) introducing the use of hierarchical bases
for the numerical solution of PDEs, both for purposes of an explicit discret-
ization and for the construction of preconditioners, was the starting point of
Zenger’s sparse grid concept (Zenger 1991). The generalization of Yserent-
ant’s hierarchical bases to a strict tensor product approach with its underly-
ing hierarchical subspace splitting discussed in Section 3 allowed the a priori
identification of more and of less important subspaces and grid points. As
we have seen, it is this characterization of subspaces that the definition of
sparse grids is based on. With the sparse grid approach, for the first time,
a priori optimized and fully structured grid patterns were integrated into
existing and well-established discretization schemes for PDEs such as finite
elements, and were combined with a very straightforward access to adaptive
grid refinement.
Even if the sparse grid concept was new for the context of PDEs, very
closely related techniques had been studied for purposes of approximation,
recovery, or numerical integration of smooth functions, before. For instance,
the generalization of Archimedes’ well-known hierarchical quadrature of
1 − x² on [−1, 1] to the d-dimensional case via Cavalieri’s principle (see
Figure 4.1) is a very prominent example of an (indeed early) hierarchical
tensor product approach. Much later, Faber (1909) discussed the hierarch-
ical representation of functions.
Finally, once more, the Russian literature turns out to be helpful for
exploring the roots of a new approach in numerical mathematics. Two

Figure 4.1. Hierarchical quadrature according to Archimedes (left)


and application of Cavalieri’s principle (right).



Figure 4.2. Smolyak quadrature patterns based on the trapezoidal
rule as the one-dimensional algorithm: p = 2 and n = 8 (left), p = 3
and n = 5 (centre), p = 4 and n = 4 (right).

names that have to be mentioned here are those of Smolyak and Babenko.
Smolyak (1963) studied classes of quadrature formulas of the type

    Q_n^{(d)} f := Σ_{i=0}^{n} ( Q_i^{(1)} − Q_{i−1}^{(1)} ) ⊗ Q_{n−i}^{(d−1)} f    (4.1)

that are based on a tensor product ⊗ of lower-dimensional operators. In
(4.1), Q_n^{(d)} denotes a d-dimensional quadrature formula based on the 1D
rule Q_n^{(1)} that is, for n ∈ N, usually chosen to be the compound formula
resulting from the application of some simple formula Q on p^n subintervals
[i/p^n, (i + 1)/p^n], i = 0, …, p^n − 1, of [0, 1] for some natural number p ≥ 2.
Furthermore, the midpoint rule is usually taken as Q_0^{(1)}, and Q_{−1}^{(1)} ≡ 0 (see
Frank and Heinrich (1996), Novak and Ritter (1996), and Smolyak (1963)).
Figure 4.2 shows several examples of grids resulting from the application
of 2D Smolyak quadrature for different n and p. Functions suitable for
the Smolyak approach typically live in spaces of bounded (Lp -integrable)
mixed derivatives which are closely related to our choice of X0q,r (Ω̄) in (3.2).
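As an illustration of how (4.1) works, here is a small Python sketch of our own (not from the paper) for the case p = 2: the 1D rule Q_n^{(1)} is the compound trapezoidal rule on 2^n subintervals, Q_0^{(1)} is the midpoint rule, Q_{−1}^{(1)} ≡ 0, and the d-dimensional rule is built by recursion over the dimension:

```python
def rule_1d(level):
    # 1D rule Q_level^(1): midpoint rule for level 0, otherwise the compound
    # trapezoidal rule on 2**level subintervals of [0, 1] (the case p = 2).
    if level == 0:
        return [0.5], [1.0]
    m = 2 ** level
    nodes = [i / m for i in range(m + 1)]
    weights = [1.0 / m] * (m + 1)
    weights[0] = weights[-1] = 0.5 / m
    return nodes, weights

def smolyak(f, n, d):
    # Q_n^(d) f = sum_{i=0}^{n} (Q_i^(1) - Q_{i-1}^(1)) (x) Q_{n-i}^(d-1) f,
    # with Q_{-1}^(1) = 0; f takes a d-tuple of coordinates in [0, 1]^d.
    if d == 1:
        nodes, weights = rule_1d(n)
        return sum(w * f((x,)) for x, w in zip(nodes, weights))
    total = 0.0
    for i in range(n + 1):
        for level, sign in ((i, 1.0), (i - 1, -1.0)):
            if level < 0:
                continue  # the term Q_{-1}^(1) contributes nothing
            nodes, weights = rule_1d(level)
            for x, w in zip(nodes, weights):
                total += sign * w * smolyak(lambda rest, x=x: f((x,) + rest),
                                            n - i, d - 1)
    return total
```

Since midpoint and trapezoidal rules integrate linear functions exactly, smolyak(lambda t: t[0] * t[1], 3, 2) reproduces ∫∫ xy dx dy = 1/4 exactly; the points touched are the Smolyak patterns of Figure 4.2.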
A similar approach to the approximation of periodic multivariate functions
of bounded mixed derivatives may be found in Babenko’s hyperbolic crosses
(Babenko 1960). Here, Fourier monomials or coefficients, respectively, are
taken from sets of the type

    Γ(n) := { k ∈ Z^d : Π_{j=1}^{d} max{ |k_j|, 1 } ≤ n }.    (4.2)
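The index sets (4.2) are easy to enumerate for small n and d; the following sketch (our own illustration) does so by brute force:

```python
from itertools import product

def hyperbolic_cross(n, d):
    # Enumerate Gamma(n) = { k in Z^d : prod_j max(|k_j|, 1) <= n }.
    # No coordinate can exceed n in absolute value, so the search box
    # [-n, n]^d suffices.
    points = []
    for k in product(range(-n, n + 1), repeat=d):
        size = 1
        for kj in k:
            size *= max(abs(kj), 1)
        if size <= n:
            points.append(k)
    return points
```

For d = 2 this gives |Γ(1)| = 9 and |Γ(2)| = 21, markedly fewer than the 25 points of the full cross product {−2, …, 2}².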

For a detailed discussion of those methods, we refer to the work of Temlyakov


(1989, 1993, 1994). There are several other approaches that are more or
less based on Smolyak’s tensor product technique, for example the so-called
Boolean methods of Delvos et al. (Delvos 1982, Delvos and Schempp 1989,
Delvos 1990, Baszenski and Delvos 1993) or the discrete blending methods



of Baszenski, Delvos and Jester (1992) going back to work of Gordon (Gordon
1969, 1971, Gordon and Hall 1973). For the general analysis of such
tensor product methods, see also Wasilkowski and Woźniakowski (1995).
Concerning the computational cost, Smolyak’s tensor product approach and
its derivatives are characterized by terms of the order O(N · (log_2 N)^{d−1}),
where N denotes the 1D cost, as we have shown them for the sparse grid
spaces V_n^{(1)} in (3.63). Furthermore, the tensor product approach of (4.1)
itself, obviously, calls to mind the definition (3.61) of the L2-based sparse
grid spaces V_n^{(1)}, which can be written as

    V_n^{(d,1)} := V_n^{(1)} = ⊕_{|l|_1 ≤ n+d−1} W_l = ⊕_{l=1}^{n} W_l^{(1)} ⊗ V_{n+d−1−l}^{(d−1,1)},    (4.3)

that is, in a tensor product form, too.


Finally, there are of course close relations of hierarchical bases and sparse
grids on the one hand and wavelets on the other hand. These will be dis-
cussed in Section 4.4.

4.2. Higher-order polynomials


In this section, we discuss how to generalize the piecewise linear sparse grid
method to higher-order basis functions.

The hierarchical Lagrangian interpolation


The first approach, the so-called hierarchical Lagrangian interpolation intro-
duced in Bungartz (1998), uses a hierarchical basis of piecewise polynomials
of arbitrary degree p, still working with just one degree of freedom per node.
Before we discuss the underlying p-hierarchy, note that the piecewise con-
stant case, i.e., p = 0, as illustrated in Figure 4.3, is the natural starting
point of such a hierarchy.
In accordance with the tensor product approach (3.9), we define basis
functions of the (generalized) degree p := (p_1, …, p_d) ∈ N^d as products

    φ_{l,i}^{(p)}(x) := Π_{j=1}^{d} φ_{l_j,i_j}^{(p_j)}(x_j)    (4.4)

of d 1D basis polynomials of degree p_j with the respective supports
[x_{l_j,i_j} − h_{l_j}, x_{l_j,i_j} + h_{l_j}]. Note that, since there is no change in the
underlying grids Ω_l, the grid points x_{l,i} or x_{l_j,i_j} and the mesh widths h_l
or h_{l_j} are defined exactly as in the linear case. This, again, allows the
restriction to the 1D case. For reasons of clarity, we will omit the index j.
As already mentioned,
we want to preserve the ‘character’ of the elements, i.e., we want to keep
to C^0-elements and manage without increasing the number of degrees of



Figure 4.3. Piecewise constant hierarchical bases: continuous
from the left (left) or continuous from the right (right).

freedom per element or per grid point for a higher p. However, to determine
a polynomial u(p) (x) of degree p uniquely on [xl,i − hl , xl,i + hl ], we need
p + 1 conditions u(p) (x) has to fulfil. In the linear case, the interpolant
resulting from the hierarchically higher levels is defined by its values in
the two boundary points xl,i ± hl . For p ≥ 2, these two conditions are
no longer sufficient. Therefore, we profit from the hierarchical history of
xl,i . Figure 4.4 shows the hierarchical relations of the grid points according
to the hierarchical subspace splitting of Section 3.1. Apart from xl,i ± hl ,
(p)
which mark the boundary of the support of φl,i , xl,i may have hierarchical
ancestors that are all located outside this support. Consequently, for the
definition of such a local interpolant u(p) (x), it is reasonable and, for the
construction of a hierarchical basis, essential to take the values of u in
xl,i ± hl (as in the linear case) and, in addition, in a sufficient number of
hierarchically next ancestors of xl,i . These considerations lead us to the
following definition.
Let u ∈ C^{p+1}([0, 1]), 1 ≤ p ≤ l, and let Ω_l denote the 1D grid of mesh
width h_l = 2^{−l} with grid points x_{l,i} according to (3.4)–(3.6). Then, the
hierarchical Lagrangian interpolant u(p) (x) of degree p of u(x) with respect
to Ωl is defined on [xl,i − hl , xl,i + hl ], i odd, as the polynomial interpolant of
(xk , u(xk )), k = 1, . . . , p+1, where the xk are just xl,i ±hl and the p−1 next
hierarchical ancestors of xl,i . Note that u(p) (x) is continuous on Ω̄, piecewise
of polynomial degree p with respect to the grid Ωl−1 , and it interpolates u(x)



Figure 4.4. Ancestors (here: boundary points of the respective
basis function’s support and two more (p = 4); solid) and
descendants (dotted) of two grid points.

on Ωl−1 . Nevertheless, it is defined locally on [xl,i − hl , xl,i + hl ]. Thus, the


width of the interval taken for the local definition of u^{(p)}(x) is 2^p · h_l, but for
any kind of further calculations, u(p) (x) is living only on the local interval
of size 2 · hl . The restriction l ≥ p is due to the fact that, with degree p,
we need at least p + 1 ancestors for the interpolation. Now we study the
local approximation properties of the hierarchical Lagrangian interpolation,
that is, the interpolation error u(xl,i ) − u(p) (xl,i ) or hierarchical surplus
(cf. (3.19)).

Lemma 4.1. Let u ∈ C^{p+1}([0, 1]), 1 ≤ p ≤ l, and let x_1 < · · · < x_{p+1}
be the ancestors of x_{l,i} on level l, i odd, taken for the construction of the
hierarchical Lagrangian interpolant u^{(p)} of u in [x_{l,i} − h_l, x_{l,i} + h_l]. Then the
hierarchical surplus v^{(p)}(x_{l,i}) in x_{l,i} fulfils

    v^{(p)}(x_{l,i}) := u(x_{l,i}) − u^{(p)}(x_{l,i}) = (1/(p + 1)!) · D^{p+1}u(ξ) · Π_{k=1}^{p+1} (x_{l,i} − x_k)    (4.5)

for some ξ ∈ [x_1, x_{p+1}]. Moreover, the order of approximation is given by

    |v^{(p)}(x_{l,i})| ≤ (1/(p + 1)!) · |D^{p+1}u(ξ)| · h_l^{p+1} · 2^{p·(p+1)/2−1}.    (4.6)



Proof. (4.5) is a standard remainder formula for the interpolation with
polynomials. For (4.6), a careful look at the distances x_{l,i} − x_k provides

    |v^{(p)}(x_{l,i})| = (1/(p + 1)!) · |D^{p+1}u(ξ)| · Π_{k=1}^{p+1} |x_{l,i} − x_k|
                  ≤ (1/(p + 1)!) · |D^{p+1}u(ξ)| · h_l^{p+1} · Π_{k=1}^{p} (2^k − 1)
                  ≤ (1/(p + 1)!) · |D^{p+1}u(ξ)| · h_l^{p+1} · 2^{p·(p+1)/2−1},

which is (4.6).
Hence, we obtain the desired increase in the order of approximation, for
the price of a factor growing exponentially in p². This is a hint that increasing
p on each new level will not be the best strategy.
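Lemma 4.1 can be checked numerically for a concrete constellation. The sketch below (our own) takes p = 2 at the grid point x_{3,5} = 5/8 on level 3; taking x = 1.0 as the additional node beyond x ± h_l is one possible ancestor constellation, assumed here purely for illustration:

```python
import numpy as np

def hierarchical_surplus(u, nodes, x):
    # v^(p)(x) = u(x) - u^(p)(x), where u^(p) is the polynomial of degree
    # p = len(nodes) - 1 interpolating u at the ancestor nodes.
    p = len(nodes) - 1
    coeffs = np.polyfit(nodes, [u(t) for t in nodes], p)
    return u(x) - np.polyval(coeffs, x)

u = lambda t: np.sin(2 * np.pi * t)
x, h = 0.625, 0.125          # the grid point x_{3,5} on level l = 3, h_l = 2**-3
nodes = [0.5, 0.75, 1.0]     # x +- h_l plus one further ancestor (assumed choice)

v = hierarchical_surplus(u, nodes, x)
# bound (4.6) for p = 2, with |D^3 u| <= (2*pi)**3 for sin(2*pi*t):
bound = (2 * np.pi) ** 3 / 6.0 * h ** 3 * 2 ** 2
print(abs(v), bound)
```

The computed surplus (about 0.043) stays well below the bound (about 0.32), which is generous because it covers the worst-case node constellation.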
As an analogue to (3.24), an integral representation can be shown for
the general order hierarchical surplus, too. For that, define s_{l,i}^{(p)}(t) as the
minimum support spline with respect to x_{l,i} and its p + 1 direct hierarchical
ancestors (renamed x_0, …, x_{p+1} in increasing order):

    s_{l,i}^{(p)}(t) := [x_0, …, x_{p+1}] (x − t)_+^p = Σ_{k=0}^{p+1} (x_k − t)_+^p / w′_{l,i}(x_k).    (4.7)

Here, [x_0, …, x_{p+1}] f(x) just denotes the divided difference with respect
to x,

    (x − t)_+^p := (x − t)^p for x − t ≥ 0, and 0 otherwise,    (4.8)

and

    w_{l,i}(x) := Π_{j=0}^{p+1} (x − x_j).    (4.9)

Lemma 4.2. With the above definitions, we get the following integral
representation for the hierarchical surplus v^{(p)}(x_{l,i}):

    v^{(p)}(x_{l,i}) = ( w′_{l,i}(x_{l,i}) / p! ) · ∫_{−∞}^{∞} s_{l,i}^{(p)}(t) · D^{p+1}u(t) dt.    (4.10)

Proof. See Bungartz (1998).


An immediate consequence of (4.10) is the relation

    ∫_{x_0}^{x_{p+1}} D^{p−1} s_{l,i}^{(p)}(t) · f(t) dt = 0    (4.11)



Figure 4.5. D^{p−1} s_{l,i}^{(p)}(t) for p = 1 and the two constellations of
p = 2 (left) and for two of the four cubic constellations (right).

for all f ∈ P_{p−2}. Owing to (4.7),

    D^{p−1} s_{l,i}^{(p)}(t) = (−1)^{p−1} · Σ_{k=0}^{p+1} (x_k − t)_+ / w′_{l,i}(x_k)    (4.12)

is piecewise linear and continuous. In wavelet terminology, (4.11) means


that the p − 1 first moments of the D^{p−1} s_{l,i}^{(p)}(t) vanish. Therefore we could
construct s_{l,i}^{(p)}(t) and, hence, the hierarchical surplus v^{(p)}(x_{l,i}) of degree p
the other way round, too. Starting from x_0, …, x_{p+1}, look for the piecewise
linear and continuous function σ^{(p)}(t) with σ^{(p)}(x_0) = σ^{(p)}(x_{p+1}) = 0 and
with vanishing first p − 1 moments, which is determined up to a constant
factor, and integrate the resulting σ^{(p)}(t) p − 1 times. The remaining degree
of freedom can be fixed by forcing the coefficient of u(x_m) to be 1 (as it is in
v^{(p)}(x_m)). The left-hand part of Figure 4.5 shows this (p − 1)st derivative of
s_{l,i}^{(p)}(t) for the possible constellations of ancestors for p = 1, 2. In the linear
case, we have no vanishing moment; for p = 2, we get one. The cubic case
is illustrated in the right-hand part of Figure 4.5.
Note that Figure 4.5 suggests that the higher-order case may be obtained
without explicitly working with polynomials of arbitrary degree, but just by
a superposition of the linear approximations in different ancestors. In fact,
the spline’s (p − 1)st derivative, and thus the calculation of the hierarchical
surplus v (p) (xl,i ) of degree p, can be reduced to the linear case.
Lemma 4.3. Let p, l, and i be defined as before. Furthermore, let xm
denote the current grid point xl,i , and let xn be its hierarchical father. For
any u ∈ C p+1 ([0, 1]), the hierarchical surplus v (p) (xm ) of degree p in xm can
be calculated with the help of v (p−1) (xm ) and of v (p−1) (xn ):
v (p) (xm ) = v (p−1) (xm ) − α(x0 , . . . , xp+1 ) · v (p−1) (xn ), (4.13)



Figure 4.6. Hierarchical basis polynomials for p = 2 and p = 3
(different scaling for clarity): construction via hierarchical
Lagrangian interpolation (left) and restriction to the respective
hierarchical support (right).

where α depends on the relative position of xm ’s ancestors, but not on the


interpolated values.
Proof. See Bungartz (1998).
Having introduced the hierarchical Lagrangian interpolation we now discuss
the corresponding hierarchical basis polynomials of degree p. Such
a φ_{l,i}^{(p)} in x_{l,i} with i odd, l ≥ p − 1, and p ≥ 2 is uniquely defined on
[x_{l,i} − h_l, x_{l,i} + h_l] by the following p + 1 conditions:

    φ_{l,i}^{(p)}(x_{l,i}) := 1,    φ_{l,i}^{(p)}(x_k) := 0,    (4.14)

where the x_k are just x_{l,i} ± h_l and the p − 2 next hierarchical ancestors of
x_{l,i}. Additionally, we force φ_{l,i}^{(p)} to vanish outside [x_{l,i} − h_l, x_{l,i} + h_l]:

    φ_{l,i}^{(p)}(x) := 0 for x ∉ [x_{l,i} − h_l, x_{l,i} + h_l].    (4.15)
Note that the restriction p ≥ 2 is due to the fact that p = 1 does not fit into
this scheme. Since the typical basis function used for linear approximation is
a piecewise linear one only, there are three degrees of freedom to determine
it uniquely on [xl,i − hl , xl,i + hl ], as in the quadratic case.
At this point, three things are important to realize. First, this definition
is fully consistent with the hierarchical Lagrangian interpolation: a global
interpolant u^{(p)}(x) built up with the help of these φ_{l,i}^{(p)} fulfils all
requirements. Second, though the definition of the φ_{l,i}^{(p)} is based on points
outside [x_{l,i} − h_l, x_{l,i} + h_l] (for p > 2, at least, to be precise), we use only
[x_{l,i} − h_l, x_{l,i} + h_l] as support of φ_{l,i}^{(p)} according to (4.15). Thus, the support
of the basis polynomials does not change in comparison with the piecewise



Figure 4.7. Hierarchical basis polynomials for p = 4: construction
via hierarchical Lagrangian interpolation (left) and restriction to
the respective hierarchical support (right).

linear case. Finally, since we need p − 2 ancestors outside [xl,i − hl , xl,i + hl ],


a basis polynomial of degree p cannot be used earlier than on level p − 1.
The quadratic basis polynomial is uniquely defined by its values in xl,i
and xl,i ± hl . For p = 3, however, the shape of the resulting polynomial
depends on where the third zero outside φ_{l,i}^{(p)}’s support is located. Owing to
the hierarchical relations illustrated in Figure 4.4, there are two possibilities
for this ancestor’s position: to the left of xl,i , i.e., in x = xl,i − 3 · hl , or to
the right in x = xl,i + 3 · hl . Thus we get two different types of cubic basis
polynomials. Figure 4.6 illustrates this for the highest two levels l ∈ {1, 2}.
In x = 0.5 (i.e., l = 1), only p = 2 is possible owing to the lack of ancestors
outside Ω̄. On the next level, in x = 0.25, the third ancestor x = 1.0 is
situated to the right of 0.5. In x = 0.75, the third ancestor is x = 0.0, to
the left of x = 0.75. Of course, both cubic functions are symmetric with
respect to x = 0.5. If we continue with cubic basis polynomials on the lower
levels l > 2, no new relationships will occur. Thus we can manage with these
two types of cubic polynomials. Figure 4.7 shows the four types of quartic
basis polynomials. Owing to the underlying hierarchy, these four (pairwise
symmetric) polynomials cover all possible constellations. Obviously, in the
general case of an arbitrary polynomial degree p > 1, our approach leads to
2^{p−2} different types of basis functions.
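To make this construction concrete, here is a small sketch (our own Python, with hypothetical names; not code from the paper) of one cubic basis polynomial: it takes the value 1 at x_{l,i}, has zeros at x_{l,i} ± h_l and at the hierarchical ancestor x_{l,i} ± 3h_l, and is then truncated to the hierarchical support as in (4.15).

```python
def hierarchical_cubic(xc, h, ancestor_side):
    # Cubic hierarchical Lagrangian basis polynomial centred at xc with mesh
    # width h: value 1 at xc, zeros at xc - h, xc + h and at the hierarchical
    # ancestor xc + 3h (type 'right') or xc - 3h (type 'left'), restricted to
    # the support [xc - h, xc + h] according to (4.15).
    zeros = (xc - h, xc + h,
             xc + 3 * h if ancestor_side == 'right' else xc - 3 * h)

    def phi(x):
        if not (xc - h <= x <= xc + h):
            return 0.0
        num, den = 1.0, 1.0
        for z in zeros:
            num *= x - z
            den *= xc - z
        return num / den

    return phi

phi = hierarchical_cubic(0.25, 0.25, 'right')   # the point x = 0.25 on level 2
print(phi(0.25), phi(0.0), phi(0.5))            # 1 at the centre, 0 at both ends
```

The `'left'` variant, e.g. for x = 0.75 on level 2, gives the mirror-symmetric second cubic type discussed above.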
Moreover, Figure 4.7 illustrates that the four quartic basis polynomials
do not differ that much. This effect even holds for all basis polynomials of
arbitrary p. Figure 4.8 shows this similarity for all different types of basis
polynomials up to degree 7 (all now plotted with respect to the common
support [−1, 1]). This similarity is also reflected in the analytical properties
of the basis polynomials. For details, we refer the reader to Bungartz (1998).

Figure 4.8. All hierarchical basis polynomials for p ≤ 6
(31 different types; left) and p ≤ 7 (63 different types; right)
with respect to the support [−1, 1].

Higher-order approximation on sparse grids


Finally, we deal with the generalization of the piecewise linear sparse grid
approximation spaces V_n^{(1)} and V_n^{(E)} to the case of piecewise polynomials
of degree p ≥ 2 or p ≥ 2·1, respectively, according to the hierarchical
Lagrangian interpolation. For the discussion of the approximation properties
of the resulting sparse grid spaces of higher order, we restrict ourselves to
the p-regular case. For some fixed maximum degree p := p_max, the polynomials
are chosen as described in the previous section until we reach level p − 1 or
(p − 1) · 1. On the further levels, we use this p. Note that the degree p of a
basis function or of a subspace, respectively, has to fulfil the condition

    2 · 1 ≤ p = min{p · 1, l + 1},                                        (4.16)
where the minimum is taken component-wise. The lower bound reflects the
fact that there are no linear hierarchical Lagrangian basis functions; the
upper bound is caused by our choice of a maximum degree p and by the
lack of basis polynomials of degree p earlier than on level p − 1.
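Componentwise, condition (4.16) is straightforward to evaluate; a tiny sketch (our own helper name, not from the paper):

```python
def degree_vector(l, p_max):
    # Degree vector p of the subspace W_l under (4.16): the componentwise
    # minimum of the target degree p_max and l_j + 1. Since l_j >= 1 and
    # p_max >= 2, this never drops below the quadratic case p_j = 2.
    return [min(p_max, lj + 1) for lj in l]

# On level l = (1, 2, 5) with maximum degree 4, the basis is quadratic in
# the first direction, cubic in the second, and quartic in the third.
print(degree_vector((1, 2, 5), 4))   # [2, 3, 4]
```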
The starting point is the space X_0^{p+1,q} of functions of (in some sense)
bounded weak mixed derivatives of order less than or equal to p + 1 in each
direction: cf. (3.2). For the approximation of some u ∈ X_0^{p+1,q}, we use
product-type basis functions φ_{l,i}^{(p)}(x) according to (4.4). Without
introducing nodal subspaces V_l^{(p)}, we directly turn to the hierarchical
subspaces W_l^{(p)}:

    W_l^{(p)} := span{ φ_{l,i}^{(p)} : i ∈ I_l }.                          (4.17)

The completion of the sum of all W_l^{(p)} with respect to the energy norm
contains X_0^{p+1,q}. Thus, analogously to (3.19), the hierarchical subspace
decomposition leads to a hierarchical representation of degree p of u ∈ X_0^{p+1,q}:

    u(x) = Σ_l u_l^{(p)}(x),   u_l^{(p)}(x) = Σ_{i∈I_l} v_{l,i}^{(p)} · φ_{l,i}^{(p)}(x),   u_l^{(p)} ∈ W_l^{(p)},   (4.18)

where v_{l,i}^{(p)} ∈ R is just the hierarchical coefficient or surplus. Note that,
in (4.18), the degree p is not constant because of (4.16).
Concerning the cost and benefit of the W_l^{(p)}, remember that the number
of degrees of freedom induced by W_l^{(p)} does not increase in comparison to
the linear situation. Thus we still have

    |W_l^{(p)}| = |W_l| = 2^{|l−1|_1}.                                     (4.19)

On the other hand, the discussion of W_l^{(p)}'s benefit, i.e., of its contribution
u_l^{(p)} ∈ W_l^{(p)} to the overall interpolant of some u ∈ X_0^{p+1,q}, requires
studying the different norms of our hierarchical Lagrangian basis polynomials
φ_{l,i}^{(p)}(x).
Lemma 4.4. For any d-dimensional hierarchical Lagrangian basis polynomial
φ_{l,i}^{(p)}(x) according to the above discussion, the following relations hold:

    ‖φ_{l,i}^{(p)}‖_∞ ≤ 1.117^d,                                           (4.20)
    ‖φ_{l,i}^{(p)}‖_q ≤ 1.117^d · 2^{d/q} · 2^{−|l|_1/q},   q ≥ 1,
    ‖φ_{l,i}^{(p)}‖_E ≤ 3.257 · (5/2)^{d/2} · 2^{−|l|_1/2} · ( Σ_{j=1}^d 2^{2l_j} )^{1/2}.

Proof. The statements follow immediately from analytical properties of
the basis polynomials: see Bungartz (1998) for details.
Next we consider v_{l,i}^{(p)}, the d-dimensional hierarchical surplus of degree p.
Inspired by the integral representation (4.10) of the 1D surplus, we define

    σ_{l,i}^{(p)}(x) := Π_{j=1}^d ( w_{l_j,i_j}(x_{l_j,i_j}) / p_j! ) · s_{l_j,i_j}^{(p_j)}(x_j),     (4.21)

where w_{l_j,i_j}(x_j) and s_{l_j,i_j}^{(p_j)}(x_j) are defined exactly as in (4.9) and (4.7),
but now for direction x_j and based on the respective hierarchical ancestors.
With the help of (4.21) we obtain an integral representation analogous to (3.24).

Lemma 4.5. For u ∈ X_0^{p+1,q}, the hierarchical surplus v_{l,i}^{(p)} fulfils

    v_{l,i}^{(p)} = ∫ σ_{l,i}^{(p)}(x) · D^{p+1}u(x) dx.                   (4.22)

Proof. Analogously to (3.23) for p = 1, the d-dimensional interpolation
operator or stencil can be written as an operator product of the d univari-
ate operators. Thus we can proceed as in the proof of (3.24) and do the
integration with respect to the d coordinate directions one after the other.
According to (4.10) and (4.21), the 1D integral with respect to xj leads to
the 1D surplus with respect to xj . Consequently, the d-dimensional integral
equals the d-dimensional surplus, as asserted in (4.22).

Again, note the close relations between the hierarchical approach and
integral transforms. Applying successive partial integration to (4.22), we get

    v_{l,i}^{(p)} = ∫_Ω σ_{l,i}^{(p)}(x) · D^{p+1}u(x) dx = (−1)^{|p+1|_1} ∫_Ω σ̂_{l,i}^{(p)}(x) · u(x) dx,     (4.23)

where σ̂_{l,i}^{(p)}(x) equals D^{p+1}σ_{l,i}^{(p)}(x) in a weak sense and is a linear
combination of (p + 2)^d Dirac pulses of alternating sign. Thus, again, the
surplus can be interpreted as the coefficient resulting from an integral
transform based on the function σ̂_{l,i}^{(p)}(x).
Next, (4.22) leads us to upper bounds for the d-dimensional surplus of
degree p and for W_l^{(p)}'s contribution u_l^{(p)} to the hierarchical
representation of u.

Lemma 4.6. Let u ∈ X_0^{p+1,q} be given in its hierarchical representation
(4.18). Then, with

    c(p) := Π_{j=1}^d 2^{p_j·(p_j+1)/2} / (p_j + 1)!,                      (4.24)

and with the seminorms |u|_{α,∞} and |u|_{α,2} defined in (3.3), the following
estimates hold for the d-dimensional hierarchical surplus v_{l,i}^{(p)}:

    |v_{l,i}^{(p)}| ≤ (1/2)^d · c(p) · 2^{−|l·(p+1)|_1} · |u|_{p+1,∞},     (4.25)
    |v_{l,i}^{(p)}| ≤ (1/6)^{d/2} · c(p) · 2^{−|l·(p+1)|_1} · 2^{|l|_1/2} · ‖D^{p+1}u|_{supp(φ_{l,i})}‖_2.
(p )
Proof. Owing to (4.21), (4.22), and s_{l_j,i_j}^{(p_j)}(x_j) ≥ 0, we have

    |v_{l,i}^{(p)}| = | ∫ σ_{l,i}^{(p)}(x) · D^{p+1}u(x) dx |
                   = | ∫_Ω Π_{j=1}^d ( w_{l_j,i_j}(x_{l_j,i_j}) / p_j! ) · s_{l_j,i_j}^{(p_j)}(x_j) · D^{p+1}u(x) dx |
                   ≤ Π_{j=1}^d | ( w_{l_j,i_j}(x_{l_j,i_j}) / p_j! ) · ∫_{[0,1]} s_{l_j,i_j}^{(p_j)}(x_j) dx_j | · ‖D^{p+1}u‖_∞
                   = Π_{j=1}^d | ( w_{l_j,i_j}(x_{l_j,i_j}) / p_j! ) · ∫_{[0,1]} s_{l_j,i_j}^{(p_j)}(x_j) dx_j | · |u|_{p+1,∞}.

Because of (4.10), each of the d factors in the above product is the absolute
value of the hierarchical surplus of x_j^{p_j+1}/(p_j + 1)!, for j = 1, . . . , d, which
is bounded by

    ( 1/(p_j + 1)! ) · h_{l_j}^{p_j+1} · 2^{p_j·(p_j+1)/2 − 1}

because of (4.6). Thus we obtain

    |v_{l,i}^{(p)}| ≤ Π_{j=1}^d ( 1/(p_j + 1)! ) · h_{l_j}^{p_j+1} · 2^{p_j·(p_j+1)/2 − 1} · |u|_{p+1,∞}
                   = 2^{−d} · c(p) · 2^{−|l·(p+1)|_1} · |u|_{p+1,∞}.

For the bound with respect to the L2-norm, we start from

    |v_{l,i}^{(p)}| = | ∫ σ_{l,i}^{(p)}(x) · D^{p+1}u(x) dx | ≤ ‖σ_{l,i}^{(p)}‖_2 · ‖D^{p+1}u|_{supp(φ_{l,i})}‖_2.

According to (4.21), and since s_{l_j,i_j}^{(p_j)}(x_j) ≥ 0,

    ‖σ_{l,i}^{(p)}‖_2 = Π_{j=1}^d ‖ ( w_{l_j,i_j}(x_{l_j,i_j}) / p_j! ) · s_{l_j,i_j}^{(p_j)} ‖_2
        = Π_{j=1}^d ( ∫_{[0,1]} ( ( w_{l_j,i_j}(x_{l_j,i_j}) / p_j! ) · s_{l_j,i_j}^{(p_j)}(x_j) )² dx_j )^{1/2}
        ≤ Π_{j=1}^d ( max_{x_j∈[0,1]} | ( w_{l_j,i_j}(x_{l_j,i_j}) / p_j! ) · s_{l_j,i_j}^{(p_j)}(x_j) | · ( 1/(p_j + 1)! ) · h_{l_j}^{p_j+1} · 2^{p_j·(p_j+1)/2 − 1} )^{1/2}
        ≤ Π_{j=1}^d ( max_{x_j∈[0,1]} | s_{l_j,i_j}^{(p_j)}(x_j) | · ( 1/(p_j! · (p_j + 1)!) ) · h_{l_j}^{2·(p_j+1)} · 2^{p_j·(p_j+1) − 2} )^{1/2}

holds because of (4.6) and (4.10). In (4.10), choose u such that

    D^{p_j+1}u(x_j) = ( w_{l_j,i_j}(x_{l_j,i_j}) / p_j! ) · s_{l_j,i_j}^{(p_j)}(x_j)

and apply (4.6). Finally, since

    | s_{l_j,i_j}^{(p_j)}(x_j) | ≤ (2/3) · 1/((p_j + 1) · h_{l_j})

can be shown for all x_j ∈ [0, 1] and for all possible l_j, i_j, and p_j, we get

    ‖σ_{l,i}^{(p)}‖_2 ≤ 2^{−d} · (2/3)^{d/2} · c(p) · 2^{−|l·(p+1)|_1} · 2^{|l|_1/2},
    |v_{l,i}^{(p)}| ≤ (1/6)^{d/2} · c(p) · 2^{−|l·(p+1)|_1} · 2^{|l|_1/2} · ‖D^{p+1}u|_{supp(φ_{l,i})}‖_2.

Lemma 4.7. Let u ∈ X_0^{p+1,q} be given in its representation (4.18). Then
the following upper bounds hold for the contributions u_l^{(p)} ∈ W_l^{(p)}:

    ‖u_l^{(p)}‖_∞ ≤ 0.5585^d · c(p) · 2^{−|l·(p+1)|_1} · |u|_{p+1,∞},      (4.26)
    ‖u_l^{(p)}‖_2 ≤ 1.117^d · (1/3)^{d/2} · c(p) · 2^{−|l·(p+1)|_1} · |u|_{p+1,2},
    ‖u_l^{(p)}‖_E ≤ 3.257 · (5/8)^{d/2} · c(p) · 2^{−|l·(p+1)|_1} · ( Σ_{j=1}^d 2^{2l_j} )^{1/2} · |u|_{p+1,∞},
    ‖u_l^{(p)}‖_E ≤ 3.257 · (5/12)^{d/2} · c(p) · 2^{−|l·(p+1)|_1} · ( Σ_{j=1}^d 2^{2l_j} )^{1/2} · |u|_{p+1,2}.

Proof. All results follow from the previous lemmata, with arguments com-
pletely analogous to the piecewise linear case.
Now, we are ready for the optimization process studied in detail for the
piecewise linear case in Section 3. For the p-regular scenario, a slight
simplification allows us to use the diagonal subspace pattern again, that is,

    V_n^{(1,1)} := V_n^{(1)},    V_n^{(p,1)} := ⊕_{|l|_1 ≤ n+d−1} W_l^{(p)}   for p > 1.      (4.27)

As before, note that p is not constant, owing to (4.16). The following
theorem deals with the approximation quality of these sparse grid spaces V_n^{(p,1)}.

Theorem 4.8. For the L∞-, L2-, and energy norm, the following bounds
for the error of the interpolant u_n^{(p,1)} ∈ V_n^{(p,1)} of u ∈ X_0^{p+1,q} hold:

    ‖u − u_n^{(p,1)}‖_∞ ≤ (0.5585/2^{p+1})^d · c(p·1) · |u|_{(p+1)·1,∞} · A(d,n) · h_n^{p+1} + O(h_n^{p+1})
                       = O(h_n^{p+1} · n^{d−1}),                           (4.28)

    ‖u − u_n^{(p,1)}‖_2 ≤ (1.117/(√3 · 2^{p+1}))^d · c(p·1) · |u|_{(p+1)·1,2} · A(d,n) · h_n^{p+1} + O(h_n^{p+1})
                       = O(h_n^{p+1} · n^{d−1}),

    ‖u − u_n^{(p,1)}‖_E ≤ 3.257 · (√5/(√2 · 2^{p+1}))^d · c(p·1) · ( d · |u|_{(p+1)·1,∞} / (1 − 2^{−p}) ) · h_n^p + O(h_n^p)
                       = O(h_n^p),

    ‖u − u_n^{(p,1)}‖_E ≤ 3.257 · (√5/(√3 · 2^{p+1}))^d · c(p·1) · ( d · |u|_{(p+1)·1,2} / (1 − 2^{−p}) ) · h_n^p + O(h_n^p)
                       = O(h_n^p).

Proof. Actually, there is just one major difference compared to the proof
of (3.68) dealing with the piecewise linear case. Now, owing to (4.16),
the polynomial degree p is not constant for all subspaces W_l^{(p)} neglected
in V_n^{(p,1)}, but depends on the respective level l. However, the influence of
all subspaces with p < p · 1 can be collected in a term of the order O(h_n^{p+1})
with respect to the L∞- and L2-norm or of the order O(h_n^p) with respect
to the energy norm, if n ≥ p − 1: for sufficiently large n, each of those
subspaces involves at least one coordinate direction x_j with l_j = O(n) and
p_j = p.
Therefore we can proceed as in the proof of (3.68) and assume a constant
degree p = p · 1. With (4.26) and (3.66) for s = p + 1, we get

    ‖u − u_n^{(p,1)}‖_∞ ≤ Σ_{|l|_1 > n+d−1} ‖u_l^{(p)}‖_∞
        ≤ Σ_{|l|_1 > n+d−1} 0.5585^d · c(p·1) · 2^{−(p+1)·|l|_1} · |u|_{(p+1)·1,∞} + O(h_n^{p+1})
        = 0.5585^d · c(p·1) · |u|_{(p+1)·1,∞} · Σ_{|l|_1 > n+d−1} 2^{−(p+1)·|l|_1} + O(h_n^{p+1})
        ≤ 0.5585^d · c(p·1) · |u|_{(p+1)·1,∞} · 2^{−(p+1)·(n+d)} · A(d,n) + O(h_n^{p+1})
        = (0.5585/2^{p+1})^d · c(p·1) · |u|_{(p+1)·1,∞} · A(d,n) · h_n^{p+1} + O(h_n^{p+1})
        = O(h_n^{p+1} · n^{d−1})

owing to the definition of A(d, n) in (3.65), and, by analogy, the corresponding
result for the L2-norm. Concerning the energy norm, we have

    ‖u − u_n^{(p,1)}‖_E ≤ Σ_{|l|_1 > n+d−1} ‖u_l^{(p)}‖_E
        ≤ 3.257 · (5/8)^{d/2} · c(p·1) · |u|_{(p+1)·1,∞} · Σ_{|l|_1 > n+d−1} 2^{−(p+1)·|l|_1} · ( Σ_{j=1}^d 4^{l_j} )^{1/2} + O(h_n^p)

and, as for the linear case,

    ‖u − u_n^{(p,1)}‖_E
        ≤ 3.257 · (5/8)^{d/2} · c(p·1) · |u|_{(p+1)·1,∞} · Σ_{i=n+d}^∞ 2^{−(p+1)·i} · Σ_{|l|_1 = i} ( Σ_{j=1}^d 4^{l_j} )^{1/2} + O(h_n^p)
        ≤ 3.257 · (5/8)^{d/2} · c(p·1) · |u|_{(p+1)·1,∞} · d · Σ_{i=n+d}^∞ 2^{−p·i} + O(h_n^p)
        ≤ 3.257 · (√5/(√2 · 2^{p+1}))^d · c(p·1) · ( d · |u|_{(p+1)·1,∞} / (1 − 2^{−p}) ) · h_n^p + O(h_n^p)
        = O(h_n^p),

because Σ_{|l|_1 = i} ( Σ_{j=1}^d 4^{l_j} )^{1/2} ≤ d · 2^i, as in the proof of (3.68). The second
energy estimate can be obtained in an analogous way.
This theorem shows that our approach, indeed, leads to a sparse grid
approximation of higher order. For the space V_n^{(p,1)}, i.e., for a maximum
degree of p in each direction, we get an interpolation error of the order
O(h_n^{p+1} · |log₂ h_n|^{d−1}) with respect to both the L∞- and the L2-norm.
For the energy norm, the result is an error of the order O(h_n^p).
Of course, the above optimization process can be based on the energy
norm, too. For that, we start from (4.26) and define the local cost–benefit
ratio, now with respect to the energy norm (cf. (3.69)):

    cbr_E(l) := b_E(l) / c(l)                                              (4.29)
             = 10.608 · (5/4)^d · c²(p) · 2^{−|2·l·p+3·l|_1} · ( Σ_{j=1}^d 4^{l_j} ) · |u|²_{p+1,∞}.

Although we do not want to study the energy-based sparse grid spaces of
higher order V_n^{(p,E)} resulting from (4.29) in detail, we shall nevertheless
mention their most important properties. First, we can show that

    V_n^{(E)} ⊆ V_n^{(p,E)} ⊆ V_n^{(1)}                                    (4.30)

is valid for arbitrary n. Second, as long as V_n^{(p,E)} is not the same space as
V_n^{(1)}, we are rid of the log-factors again, both concerning the overall cost
(which is of the order O(h_n^{−1})) and the approximation quality with respect
to the energy norm (which is of the order O(h_n^p)).
We close this discussion by briefly returning to the notion of ε-complexity.
As in the discussion of the piecewise linear counterpart, all occurring order
terms have to be read with respect to ε or N . That is, for the following,
both d and p are supposed to be arbitrary, but fixed.

Lemma 4.9. For the ε-complexities N_∞^{(p)}(ε), N_2^{(p)}(ε), and N_E^{(p)}(ε) of the
problem of computing the interpolant u_n^{(p,1)} ∈ V_n^{(p,1)} with respect to the L∞-,
L2-, and the energy norm for given accuracy ε, the following relations hold:

    N_∞^{(p)}(ε) = O( ε^{−1/(p+1)} · |log₂ ε|^{((p+2)/(p+1))·(d−1)} ),     (4.31)
    N_2^{(p)}(ε) = O( ε^{−1/(p+1)} · |log₂ ε|^{((p+2)/(p+1))·(d−1)} ),
    N_E^{(p)}(ε) = O( ε^{−1/p} · |log₂ ε|^{d−1} ).

Conversely, for a given number N of grid points, the following accuracies can
be obtained with respect to the different norms:

    ε_∞^{(p)}(N) = O( N^{−(p+1)} · |log₂ N|^{(p+2)·(d−1)} ),               (4.32)
    ε_2^{(p)}(N) = O( N^{−(p+1)} · |log₂ N|^{(p+2)·(d−1)} ),
    ε_E^{(p)}(N) = O( N^{−p} · |log₂ N|^{p·(d−1)} ).

Finally, for the same problem tackled by the energy-based sparse grid
approximation space V_n^{(p,E)}, we get

    N_E^{(p)}(ε) = O( ε^{−1/p} )   and   ε_E^{(p)}(N) = O( N^{−p} ).       (4.33)

Proof. The proof follows exactly the argumentation of the linear case.

In comparison with the full grid case, where we get N_∞^{(p)}(ε) = O(ε^{−d/(p+1)})
and ε_∞^{(p)}(N) = O(N^{−(p+1)/d}) with respect to the L∞- or L2-norm, and
N_E^{(p)}(ε) = O(ε^{−d/p}) and ε_E^{(p)}(N) = O(N^{−p/d}) with respect to the energy
norm, as before, the sparse grid space V_n^{(p,1)} lessens the curse of dimensionality
in a significant manner; V_n^{(p,E)}, however, completely overcomes it.
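These counts are easy to reproduce in numbers. The following sketch (our own code, not from the paper) sums the subspace sizes |W_l| = 2^{|l−1|_1} over the diagonal pattern |l|_1 ≤ n + d − 1 of (4.27) and compares the result with a full grid of the same mesh width:

```python
from itertools import product

def sparse_grid_size(d, n):
    # Degrees of freedom of the sparse grid space V_n^(1): the sum of the
    # subspace sizes |W_l| = 2^{|l - 1|_1} over all levels l >= 1 with
    # |l|_1 <= n + d - 1 (the diagonal subspace pattern of (4.27)).
    return sum(2 ** (sum(l) - d)
               for l in product(range(1, n + 1), repeat=d)
               if sum(l) <= n + d - 1)

# O(2^n * n^(d-1)) sparse grid points versus O(2^(nd)) points on the
# full grid with the same mesh width h_n = 2^(-n) (interior points only).
for d in (1, 2, 3):
    print(d, sparse_grid_size(d, 6), (2 ** 6 - 1) ** d)
```

For d = 3 and n = 6, for instance, the sparse grid needs roughly a thousand points where the full grid needs a quarter of a million.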

4.3. Interpolets
Another hierarchical multiscale basis with higher-order functions is given by
the interpolet family (Deslauriers and Dubuc 1989, Donoho and Yu 1999).
These functions are obtained from a simple but powerful interpolation pro-
cess. For given data y(s), s ∈ Z, we seek an interpolating function y : R → R
which is as smooth as possible. To this end, in a first step, the interpolated
values in Z + 1/2 are determined. Here, for the determination of y(s + 1/2),
the Lagrangian polynomial p(x) of degree 2n − 1 is calculated which inter-
polates the data in s − n + 1, . . . , s + n. Then we set y(s + 1/2) := p(s + 1/2).
The parameter n later determines the smoothness of the interpolant and
the degree of polynomial exactness. Since the values y(Z) and y(Z + 1/2) are
now known, we can compute the values y(Z + 1/4) and y(Z + 3/4) using the
same scheme. Here, for example, the values y(s + (−n+1)/2), . . . , y(s + n/2)
are used for y(s + 1/4). This way, interpolation values for y can be found on a
set which is dense in R. Since the interpolant depends linearly on the data,
there exists a fundamental function F with

    y(x) = Σ_{s∈Z} y(s) · F(x − s).

F is the interpolant for the data y(s) = δ0,s , s ∈ Z. Now, the interpolet
mother function of the order 2n is just φ := F . A hierarchical multiscale
basis is formed from that φ by dilation and translation as in (3.8). The
function φ has the following properties.

Scaling equation. φ is the solution of

    φ(x) = Σ_{s∈Z} h_s · φ(2x − s).                                        (4.34)

The mask coefficients h := {h_s}_{s∈Z} are given by h_0 = 1, h_s = h_{−s} (s ∈ Z), and

    ⎛ 1^0        3^0        ···  (2n−1)^0      ⎞   ⎛ h_1      ⎞   ⎛ 1/2 ⎞
    ⎜ 1^2        3^2        ···  (2n−1)^2      ⎟   ⎜ h_3      ⎟   ⎜  0  ⎟
    ⎜  ⋮          ⋮          ⋱        ⋮        ⎟ · ⎜   ⋮      ⎟ = ⎜  ⋮  ⎟      (4.35)
    ⎝ 1^{2n−2}   3^{2n−2}   ···  (2n−1)^{2n−2} ⎠   ⎝ h_{2n−1} ⎠   ⎝  0  ⎠

All other h_s are zero.
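As an illustration (our own sketch, not code from the paper), the nonzero odd mask entries can be recovered directly by evaluating the Lagrangian interpolant of the 2n data values at the midpoint; exact rational arithmetic from the standard library suffices:

```python
from fractions import Fraction

def dd_midpoint_weights(n):
    # Weights of the degree-(2n-1) Lagrangian polynomial through the data
    # at s = -n+1, ..., n, evaluated at the midpoint s + 1/2 (here s = 0).
    nodes = [Fraction(k) for k in range(-n + 1, n + 1)]
    x = Fraction(1, 2)
    weights = []
    for xj in nodes:
        w = Fraction(1)
        for xk in nodes:
            if xk != xj:
                w *= (x - xk) / (xj - xk)
        weights.append(w)
    return weights

# n = 2: the classical four-point Deslauriers-Dubuc scheme. The weights are
# exactly the nonzero odd mask coefficients solving (4.35):
# h_{+-1} = 9/16 and h_{+-3} = -1/16.
print(dd_midpoint_weights(2))
```

For n = 2 this yields [−1/16, 9/16, 9/16, −1/16], i.e., y(s + 1/2) = (9/16)(y(s) + y(s+1)) − (1/16)(y(s−1) + y(s+2)).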

Compact support.

supp φ = [−2n + 1, 2n − 1]. (4.36)

Figure 4.9. The interpolets φ for N = 2, 4 and 6.

Polynomial exactness. In a pointwise sense, polynomials of degree less
than N = 2n can be written as linear combinations of translates of φ, e.g.,

    ∀ 0 ≤ i < N : ∀ x ∈ R :  x^i = Σ_{s∈Z} s^i · φ(x − s).                 (4.37)

Interpolation property. With δ_{0,s} denoting the Kronecker delta, we get

    ∀ s ∈ Z :  φ(s) = δ_{0,s}.                                             (4.38)
This is the main property which distinguishes interpolating multiscale
bases from other non-interpolating multiscale bases. It allows particularly
simple multilevel algorithms for the evaluation of nonlinear terms.
The functions φ for different values of the parameter N = 2n are given in
Figure 4.9. Note that there is not much difference between the interpolets
with N = 4 and N = 6. This behaviour is quite similar to that of the
higher-order polynomials of Section 4.2, which were created by hierarchical
Lagrangian interpolation. Note, furthermore, that whilst interpolets are
defined on the whole of R, their construction can easily be adapted to a
bounded interval; see Koster (2002) for the respective construction.

Now, we can use such a higher-order interpolet function φ as the mother
function in (3.7) for the definition of the 1D hierarchical basis {φ_{l_j,i_j}(x_j)} in
(3.8). Again, as in our introductory example of piecewise linear hierarchical
bases of Section 3.1, we use these 1D functions as the input of the tensor
product construction, which provides a suitable d-dimensional basis function
φ_{l,i}(x) := Π_{j=1}^d φ_{l_j,i_j}(x_j) in each grid point x_{l,i}. In an analogous
way to that of Sections 3 and 4.2, we can derive sparse grids based on
interpolets. Also, all other considerations regarding the estimates for the
cost and error complexities carry over accordingly.

4.4. Prewavelets and wavelets


Note again that the presented hierarchical multiscale bases with higher-
order polynomials or interpolets, after the tensor product construction, al-
low for a relatively cheap evaluation of differential operators and discretiz-
ation schemes on sparse grids in a Galerkin approach owing to their inter-
polating properties. Also, upper error estimates can be easily derived for
various sparse grid spaces following the arguments in Sections 3.2 and 4.2.
However, they all exhibit a main drawback: there is no lower error estimate
with constants independent of the number of levels involved, that is, they
form no stable multiscale splitting (Oswald 1994). The consequences are
twofold. First, the (absolute) value of the hierarchical coefficient is just a
local error indicator and no true error estimator. We obtain sufficiently
refined sparse grids (compare the associated numerical experiments in Sec-
tion 5) on which the error is properly reduced, but it may happen that too
many grid points are employed for a prescribed error tolerance in an adap-
tive procedure. Efficiency is thus not guaranteed. Second, the condition
number of the linear system which results from a symmetric elliptic par-
tial differential operator in multiscale basis representation is, after diagonal
scaling, in general not independent of the finest mesh size involved.3 To
obtain a fast mesh independent solver, additional lifting tricks (Sweldens
1997, Daubechies and Sweldens 1998, Koster 2002) or multigrid-type exten-
sions (Oswald 1994, Griebel 1994a, Griebel and Oswald 1994, Griebel and
Oswald 1995a, Griebel and Oswald 1995b) are necessary. These difficulties
are avoided if we employ stable multiscale splittings (Oswald 1994, Dah-
men and Kunoth 1992, Carnicer, Dahmen and Pena 1996, Cohen 2003) and
the respective L2 - and H1 -stable multiscale bases, for which two-sided es-
timates exist.

3 This is the case for stable splittings with wavelets. Then a simple fast solver results
from the diagonal scaling preconditioner: see Dahmen and Kunoth (1992) and Oswald
(1994).

Figure 4.10. Non-orthogonal linear spline wavelet
(left: interior; right: boundary).

Since Daubechies’ fundamental discovery of orthogonal multiscale bases


with local support, i.e., the classical wavelets (Daubechies 1988), an enorm-
ous literature has arisen. At present, there exists a whole zoo of such bases
from the wavelet family. Here, we stick to simple hierarchical spline-like
function systems. Within this class, we have orthonormal wavelets (Lemarié
1988), biorthogonal spline wavelets (Cohen, Daubechies and Feauveau 1992),
semi-orthogonal spline wavelets (Chui and Wang 1992), fine grid correction
spline wavelets (Cohen and Daubechies 1996, Lorentz and Oswald 1998)
and multiwavelets derived from splines. The construction principles of such
functions are highly developed, and there is abundant literature on them.
Most theory can be found in Daubechies (1992), Chui (1992) and Meyer
(1992), and the references cited therein. Further reading is Daubechies
(1988, 1993), and Cohen et al. (1992, 2001). A nice introduction, similar in
spirit to this paper, is given in Cohen (2003).
In the following, let us briefly mention the simplest but also cheapest
mother wavelets which are made up from piecewise linears by stable com-
pletion procedures (Carnicer et al. 1996). They are sufficient to be used for
a second-order PDE within the Galerkin method.
These are the so-called linear pre-prewavelets, which are a special case of
non-orthogonal spline wavelets (Cohen and Daubechies 1996, Lorentz and
Oswald 1998, Stevenson 1996) of linear order. The corresponding mother
function φ is shown in Figure 4.10 (left). Again, by translation and dilation
in an analogous way to (3.8), we get a 1D multilevel basis from this φ.
Note that a modification is necessary near the boundary. For homogeneous
Dirichlet conditions, the scaled and dilated function in Figure 4.10 (right)
can be used in points next to the boundary.
Another, still reasonably cheap approach involves linear prewavelets, a
special case of semi-orthogonal spline wavelets (Chui and Wang 1992, Griebel

Figure 4.11. Semi-orthogonal linear spline wavelet
(left: interior; right: boundary).

and Oswald 1995b). The corresponding mother function φ is shown in
Figure 4.11 (left). Again, a modification is necessary near the boundary.
For homogeneous Dirichlet conditions, this is shown in Figure 4.11 (right).
A construction for Neumann conditions can be found in Griebel and Oswald
(1995b).
Finally, let us mention the so-called lifting wavelets. They are the result
of the application of Sweldens' lifting scheme (Carnicer et al. 1996, Sweldens
1997) to the interpolet basis {φ_{l_j,i_j}(x_j)} or other interpolatory multiscale
bases, for example. Here, the hierarchical (i.e., if we consider odd indices
only; see (3.11)) lifting wavelets are defined by

    φ̂_{l_j,2i_j+1} := φ_{l_j,2i_j+1} + Σ_{s_j} Q^{l_j}_{s_j,2i_j+1} · φ_{l_j−1,s_j}

on finer levels l_j > 0. The basic idea is to choose the weights Q^{l_j}_{s_j,2i_j+1} in this
linear combination in such a way that φ̂_{l_j,i_j} has more vanishing moments
than φ_{l_j,i_j}, and thus to obtain a stabilization effect. If we apply this
approach to the hierarchical interpolet basis of Section 4.3 so that we achieve
two vanishing moments in the lifting wavelet basis, we end up with

    φ̂_{l_j,2i_j+1} = φ_{l_j,2i_j+1} − (1/4) · ( φ_{l_j−1,i_j} + φ_{l_j−1,i_j+1} )

for the odd indices 2i_j + 1. The corresponding mother function φ̂ is shown
in Figure 4.12.
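A quick numerical sanity check (our own sketch, not from the paper) confirms the two vanishing moments of this lifted hat function:

```python
def hat(x):
    # Interpolet mother function for N = 2: the standard piecewise linear hat.
    return max(0.0, 1.0 - abs(x))

def lifted(x):
    # The lifting wavelet from the formula above: the fine-level hat centred
    # at x = 1/2 minus 1/4 of its two coarse-level neighbours centred at
    # 0 and 1 (values 3/4 and -1/4, cf. Figure 4.12).
    return hat(2.0 * x - 1.0) - 0.25 * (hat(x) + hat(x - 1.0))

# Riemann-sum approximation of the moments int x^k lifted(x) dx over the
# support [-1, 2]: the moments for k = 0 and k = 1 vanish (numerically),
# the second moment does not.
h = 1e-4
xs = [-1.0 + i * h for i in range(int(3 / h))]
for k in (0, 1, 2):
    print(k, sum(lifted(x) * x ** k * h for x in xs))
```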
Again, as in our introductory example of piecewise linear hierarchical
bases of Section 3.1, we can use these 1D multilevel basis functions as the
input of the tensor product construction which provides a suitable piecewise
d-dimensional basis function φl,i (x) in each grid point xl,i . As in Sections 3

Figure 4.12. Lifting interpolet with N = 2
and two vanishing moments.

and 4.2, we can derive sparse grids based on wavelets. Also, most other
considerations regarding the estimates for the cost and error complexities
carry over accordingly. Further information on wavelets and sparse grids
can be found in Griebel and Oswald (1995b), Hochmuth (1999), DeVore,
Konyagin and Temlyakov (1998), Koster (2002), Knapek (2000a), Griebel
and Knapek (2000) and the references cited therein.
At the end of this subsection, let us go a little deeper into the theory and
give an argument why there is a difference between hierarchical polynomials
and interpolets on the one hand and prewavelets and pre-prewavelets on the
other hand.
For all mother functions and the resulting product multiscale bases and
sparse grid subspaces, we can, in principle, follow the arguments and proofs
of Sections 3 and 4.2, respectively, to obtain cost complexity estimates and
upper error bounds with relatively sharp estimates for the order constants.
The main tools in the proofs are the simple triangle inequality and geometric
series arguments. However, as already mentioned, a lower bound for the
error can not be obtained so easily. An alternative approach is that of
Griebel and Oswald (1995b) and Knapek (2000a), where we developed a
technique for which two-sided error norm estimates for the 1D situation can
be carried over to the higher-dimensional case. The approach is based on
the representation of Sobolev spaces H^s([0,1]^d), s ≥ 0, as

    H^s([0,1]^d) = ∩_{i=1}^d ( L2([0,1]) ⊗ ··· ⊗ L2([0,1]) ⊗ H^s([0,1]) ⊗ L2([0,1]) ⊗ ··· ⊗ L2([0,1]) ),   (4.39)

with (i − 1) factors L2([0,1]) to the left and (d − i) factors to the right of
H^s([0,1]) in the i-th term of the intersection.

The Sobolev space H^s_mix([0,1]^d), s ≥ 0, on the other hand, is defined as the
simple tensor product

    H^s_mix([0,1]^d) = H^s([0,1]) ⊗ ··· ⊗ H^s([0,1])        (d times).

Now, for the different components of the intersection, we obtain two-sided
norm estimates, if the univariate multiscale functions φ_{l_j,i_j} of (3.8)
allow two-sided norm estimates for both H^s and L2.

Theorem 4.10. Let the univariate multiscale basis {φ_{l_j,i_j}} satisfy the
norm equivalence

    ‖u‖²_{H^s} ∼ Σ_{l_j} Σ_{i_j} 2^{−l_j} · 2^{2 l_j s} · |u_{l_j,i_j}|²,        (4.40)

where u(x_j) = Σ_{l_j,i_j} u_{l_j,i_j} φ_{l_j,i_j}(x_j), and for −γ1 < s < γ2, γ1, γ2 > 0.
Then, the multivariate basis functions {φ_{l,i}(x)} fulfil the norm equivalences

    ‖u‖²_{H^s}     = ‖ Σ_{l,i} u_{l,i} φ_{l,i} ‖²_{H^s}     ∼ Σ_{l,i} 2^{2s|l|_∞} · |u_{l,i}|² · 2^{−|l|_1},
    ‖u‖²_{H^s_mix} = ‖ Σ_{l,i} u_{l,i} φ_{l,i} ‖²_{H^s_mix} ∼ Σ_{l,i} 2^{2s|l|_1} · |u_{l,i}|² · 2^{−|l|_1},

where u(x) = Σ_{l,i} u_{l,i} φ_{l,i}(x).

Here, ∼ denotes a two-sided equivalence, i.e., a ∼ b means that there
exist positive constants c1, c2 such that c1 · b ≤ a ≤ c2 · b. Note the distinct
difference in the quality of the two estimates. It is exactly the difference
between |l|_∞ and |l|_1 which leads to a significant reduction in ε-complexity,
i.e., it allows us to use substantially fewer degrees of freedom in the
H^s_mix-norm than in the H^s-norm to reach the same truncation error.

For the proof and further details, see Griebel and Oswald (1995b), Knapek
(2000a) and Koster (2002). Here, the bounds γ1 , γ2 for the range of the
regularity parameter s depend on the specific choice of the mother func-
tion φ. The value γ2 is determined by the Sobolev regularity of φ, i.e.,
γ2 = sup{s : φ ∈ H^s}. The theory here works with biorthogonality arguments
and dual spaces: our univariate multiscale basis {φ_{l_j,i_j}} possesses a dual
basis {φ̃_{l_j,i_j}}, in the sense that (for our hierarchical functions⁴) the
biorthogonality relations (Cohen et al. 1992)

    ⟨φ̃_{l_j,i_j}, φ_{l_j,k_j}⟩ = δ_{i_j,k_j},    ⟨φ̃_{l_j−1,i_j}, φ_{l_j,k_j}⟩ = δ_{i_j,k_j}

hold. If the primal and dual functions are at least in L2, then ⟨·, ·⟩ is the
usual L2-inner product. The value of γ1 is here just the Sobolev regularity
of the dual mother function φ̃.

⁴ In the general setting, there is a multiresolution analysis with spaces spanned by scaling
functions and difference spaces spanned by wavelets. Then, biorthogonality conditions
must be fulfilled between all these function systems. In the hierarchical approach,
certain scaling functions are simply wavelets, and the biorthogonality conditions are
reduced.
Note that, with γ1 , γ2 > 0, the 1D norm equivalence for s = 0, and
thus the L2 -stability of the 1D hierarchical basis {φlj ,ij }, is a prerequisite
in Theorem 4.10. However, the linear hierarchical bases of Section 3, the
higher-order Lagrangian interpolation functions of Section 4.2 and the inter-
polets of Section 4.3 do not fulfil this prerequisite, so they are not L2 -stable.5
Then we obtain simply one-sided estimates. For these function systems, we
get the same upper estimates with our simple tools as the triangle inequal-
ity and the geometric series arguments, but we can pinpoint, additionally,
quite sharp bounds for the constants. The one-sided estimates still allow
us to use the hierarchical coefficients |ul,i | (e.g., after suitable weighting) as
error indicators in a refinement procedure, but give us no true error estim-
ator. To this end, grids are constructed in an adaptation process steered
by the indicator such that a prescribed global error tolerance is reached.
This aim, however, might be reached by using more points than necessary.
For the function systems built from the wavelet-type mother functions, the
prerequisite of L2 -stability is fulfilled, and we obtain two-sided estimates.
Thus, the wavelet coefficients |ul,i | can serve as local error estimators, and
the lower estimate part then gives us efficiency of the corresponding ad-
aptive scheme, which means that only the necessary number of points (up
to a constant) to reach a prescribed error tolerance is employed in a grid
adaptation process. Furthermore, a fast solver can now be gained with level-
independent convergence rate by simple diagonal scaling as a preconditioner
in the multiscale system.
Note that from relation (4.39) it also becomes clear that the constants in
the multivariate norm equivalence are more or less given by the d th power
of the constants in the univariate equivalence. This causes rather large
constants for higher dimensionality d.

4.5. Sparse grid applications


PDE discretization techniques
Since its introduction, the sparse grid concept has been applied to most
of the relevant discretization schemes for PDEs. These are finite element
methods and Galerkin techniques, finite differences, finite volumes, spectral
methods, and splitting extrapolation, which leads to the combination
technique.

5
The reason is that the dual ‘functions’ are Dirac functionals. Then γ1 = −1/2 and the
norm equivalence (4.40) is only valid for s ∈ ]1/2, γ2[, i.e., it does not hold for s = 0.

https://doi.org/10.1017/S0962492904000182 Published online by Cambridge University Press

Figure 4.13. Laplace equation with a singularity on the boundary:
adaptive sparse grid (left) and corresponding solution (right).
The first main focus of the development of sparse grid discretization
methods was on piecewise linear finite elements. In the pioneering work
of Zenger (1991) and Griebel (1991b), the foundations for adaptive re-
finement, multilevel solvers, and parallel algorithms for sparse grids were
laid. Subsequent studies included the solution of the 3D Poisson equation
(Bungartz 1992a, 1992b), the generalization to arbitrary dimensionality d
(Balder 1994), and the treatment of more general equations: the Helmholtz
equation (Balder and Zenger 1996), parabolic problems using a time–space
discretization (Balder, Rüde, Schneider and Zenger 1994), the biharmonic
equation (Störtkuhl 1995), and general linear elliptic operators of second
order in 2D (Pflaum 1996, Dornseifer and Pflaum 1996). As a next step, the solution
of general linear elliptic differential equations and, via mapping techniques,
the treatment of more general geometries was implemented (Bungartz and
Dornseifer 1998, Dornseifer 1997) (see Figure 4.13). Since then, algorithmic
improvements for the general linear elliptic operator of second order have
been studied in detail (Schneider 2000, Achatz 2003a, Achatz 2003b).
In the first experiments with polynomial bases of higher-order, bicubic
Hermite bases for the biharmonic equation (Störtkuhl 1995) and hierarch-
ical Lagrange bases of degree two (Bungartz 1996) as well as of arbitrary
degree (Bungartz and Dornseifer 1998, Bungartz 1997) were studied. The
general concept of the hierarchical Lagrangian interpolation on sparse grids
was introduced in Bungartz (1998). Afterwards, the higher-order concept
was combined with a duality-based adaptivity approach (Schneider 2000)
and with operator extension (Achatz 2003a). In addition to that, aspects
of complexity, especially for the solution of the Poisson equation (Bungartz



and Griebel 1999), were discussed with regard to Werschulz (1995). Fur-
thermore, besides the classical hierarchical bases according to Section 3,
prewavelet bases (Oswald 1994, Griebel and Oswald 1995b) and wavelet de-
compositions (Sprengel 1997a) were considered. Also, the parallelization of
the sparse grid algorithms was pursued: see Pfaffinger (1997), Hahn (1990),
Griebel (1991a, 1991b).
Another approach to the discretization of PDEs on sparse grids, using
second-order finite differences, was derived and implemented in Griebel
(1998), Griebel and Schiekofer (1999), Schiekofer (1998), and Schiekofer
and Zumbusch (1998). There, based on Taylor expansions and sufficient
regularity assumptions, the same orders of consistency can be shown as
are known for the comparable full grid operators. This means, in
particular, that the typical log-term does not occur. Generalizations to
higher-order finite difference schemes using conventional Taylor expansion
(Schiekofer 1998) and interpolets (Koster 2002) as discussed in Section 4.3
were also developed. The conceptual simplicity of finite difference schemes
allowed for straightforward adaptive refinement strategies based on hashing
(Griebel 1998, Schiekofer 1998), for the application of sparse grids to elliptic
problems with general coefficient functions, and for the handling of general
domains and nonlinear systems of parabolic PDEs of reaction–diffusion type
(Schiekofer 1998). This opened the way to a first solution of the Navier–
Stokes equations on sparse grids: see Schiekofer (1998), Griebel and Koster
(2000), Koster (2002) and Griebel and Koster (2003).
Furthermore, the so-called combination technique (Griebel, Schneider and
Zenger 1992), an extrapolation-type sparse grid variant, has been discussed
in several papers (Bungartz, Griebel, Röschke and Zenger 1994b, 1994c,
Bungartz, Griebel and Rüde 1994a, Pflaum and Zhou 1999). Because a
sparse grid can be represented as a superposition of several (much coarser)
full grids Ωl (see Figure 4.14 for the 2D case), a sparse grid solution for some
PDEs can be obtained from the linear combination of solutions ul computed
on the respective coarse grids. Since the latter ones are regular full grids,
existing solvers can be used without any need for an explicit discretization
on a sparse grid. For the 2D case, we get

    u_n^(c)(x) := Σ_{|l|_1 = n+1} u_l(x) − Σ_{|l|_1 = n} u_l(x),                (4.41)

and the 3D combination formula is given by

    u_n^(c)(x) := Σ_{|l|_1 = n+2} u_l(x) − 2 · Σ_{|l|_1 = n+1} u_l(x) + Σ_{|l|_1 = n} u_l(x).   (4.42)

Note that, in contrast to (3.19), where ul ∈ Wl , here ul denotes the
approximate solution of the underlying PDE on the regular full grid Ωl .
Compared with the direct discretization on sparse grids, the combination
technique

Figure 4.14. Combination technique for d = 2:
combine coarse full grids Ωl , |l|1 ∈ {n, n + 1}, with
mesh widths 2^(−l1) and 2^(−l2) to get a sparse grid Ωn^(1)
corresponding to Vn^(1).

turns out to be advantageous in two respects. First, the possibility of using
existing codes allows the straightforward application of the combination
technique to complicated problems. Second, since the different subproblems
can be solved fully in parallel, there is a very elegant and efficient
inherent coarse-grain parallelism that makes the combination technique
perfectly suited to modern high-performance computers (Griebel 1992, 1993).
This has been shown in a series of papers dealing with problems from com-
putational fluid dynamics, including turbulence simulation (Bungartz and
Huber 1995, Huber 1996b, Griebel and Thurner 1993, Griebel, Huber and
Zenger 1993a, Griebel 1993, Griebel and Thurner 1995, Griebel and Huber
1995, Griebel, Huber and Zenger 1996, Huber 1996a). Furthermore, note
that there are close relations to the above-mentioned Boolean and blending
methods (Delvos 1982, Delvos and Schempp 1989, Gordon 1969, Gordon
and Hall 1973, Baszenski et al. 1992, Hennart and Mund 1988) as well as
to the so-called splitting extrapolation method (Liem, Lu and Shih 1995).
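To make the combination formula (4.41) concrete, the following sketch (in Python, with helper names of our own choosing) uses bilinear interpolation of a given function as a stand-in for the full grid PDE solves, since any existing full grid solver could be plugged in at that point:

```python
import numpy as np

def interp_full_grid(f, l1, l2, x, y):
    # Bilinear interpolation of f on the regular full grid Omega_(l1,l2)
    # with mesh widths 2^-l1 and 2^-l2 -- a stand-in for a PDE solve on
    # that grid; a real full grid solver could be substituted here.
    h1, h2 = 2.0**-l1, 2.0**-l2
    i = min(int(x / h1), 2**l1 - 1)
    j = min(int(y / h2), 2**l2 - 1)
    s, t = x / h1 - i, y / h2 - j
    g = lambda a, b: f(a * h1, b * h2)   # grid point values
    return ((1 - s) * (1 - t) * g(i, j) + s * (1 - t) * g(i + 1, j)
            + (1 - s) * t * g(i, j + 1) + s * t * g(i + 1, j + 1))

def combination_2d(f, n, x, y):
    # 2D combination technique (4.41): add the solutions on the grids
    # with |l|_1 = n + 1 and subtract those with |l|_1 = n (l_i >= 1).
    val = 0.0
    for l1 in range(1, n + 1):           # l1 + l2 = n + 1
        val += interp_full_grid(f, l1, n + 1 - l1, x, y)
    for l1 in range(1, n):               # l1 + l2 = n
        val -= interp_full_grid(f, l1, n - l1, x, y)
    return val
```

At a point shared by all component grids, such as (0.5, 0.5), every term reproduces f exactly, so the n positive and n − 1 negative contributions combine to f itself; at general points, the usual sparse grid accuracy is observed.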
Concerning alternative discretization schemes, there are ongoing investig-
ations on sparse grid finite volume methods for the Euler equations (Hemker
1995, Hemker and de Zeeuw 1996, Koren, Hemker and de Zeeuw 1996). In
this context, a variant – the so-called sets of grids or grids of grids – has been
developed (Hemker and Pflaum 1997, Hemker, Koren and Noordmans 1998).
This approach goes back to the notion of semi-coarsening (Mulder 1989,
Naik and van Rosendale 1993). Furthermore, spectral methods have been



implemented on sparse grids and studied for PDEs with periodic boundary
conditions (Kupka 1997).

Adaptivity and fast solvers


Especially in the context of PDEs, two ingredients are essential for each kind
of discretization scheme: adaptive mesh refinement and fast linear solvers.
Concerning the first, grid adaptation based on local error estimators (typ-
ically closely related to the hierarchical surplus, i.e., the hierarchical basis
coefficients) is a very widely used approach (Zenger 1991, Bungartz 1992a,
1992b, Griebel 1998, Koster 2002, Griebel and Koster 2000). However, in
Schneider (2000), the dual problem approach (Becker and Rannacher 1996,
Eriksson, Estep, Hansbo and Johnson 1996) has been successfully applied to
sparse grids, allowing for global error estimation too. As a result, the error
control can be governed by some appropriate linear error functional, and
the respective refinement strategies include l2 -driven as well as point-driven
adaptation.
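As a minimal 1D illustration of surplus-driven refinement (our own sketch in the piecewise linear setting of Section 3, not one of the cited implementations):

```python
def hierarchical_surplus(f, x, h):
    # Surplus of f at grid point x with mesh width h: the deviation of f(x)
    # from the linear interpolant of the hierarchical neighbours x-h and x+h.
    return f(x) - 0.5 * (f(x - h) + f(x + h))

def refine_adaptively(f, tol, max_level=12):
    # Grid adaptation steered by |surplus| as error indicator: a point's
    # two children on the next level are created only while its surplus
    # exceeds the tolerance (an indicator, not a true error estimator).
    grid = []
    def visit(x, h, level):
        s = hierarchical_surplus(f, x, h)
        grid.append((x, s))
        if abs(s) > tol and level < max_level:
            visit(x - h / 2, h / 2, level + 1)
            visit(x + h / 2, h / 2, level + 1)
    visit(0.5, 0.5, 1)          # root of the 1D hierarchy on [0, 1]
    return grid
```

For f(x) = x², the surplus at mesh width h is exactly −h², so a tolerance of 10⁻³ refines the full tree down to mesh width 2⁻⁵.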
Concerning the efficient multilevel solution of the resulting linear systems,
a number of contributions using additive and multiplicative subspace correc-
tion schemes have been made (Griebel 1991a, 1991b, Pflaum 1992, Griebel,
Zenger and Zimmer 1993b, Griebel 1994b, Griebel and Oswald 1994, 1995a,
1995b, Bungartz 1997, Pflaum 1998), showing the availability of fast al-
gorithms for the solution of the linear systems resulting from sparse grid
discretizations.

Applications
Flow problems were the first focus for the use of sparse grid PDE solv-
ers. Now, however, sparse grids are also used for problems from quantum
mechanics (Garcke 1998, Garcke and Griebel 2000, Yserentant 2004, Hack-
busch 2001), for problems in the context of stochastic differential equations
(Schwab and Todor 2002, 2003), or for the discretization of differential forms
arising from Maxwell’s equations (Hiptmair and Gradinaru 2003).
Aside from the field of PDEs, sparse grids are being applied to a variety
of problems that should not go unmentioned here. Among these problems
are integral equations (Frank, Heinrich and Pereverzev 1996, Griebel, Os-
wald and Schiekofer 1999, Knapek and Koster 2002, Knapek 2000b), general
operator equations (Griebel and Knapek 2000, Knapek 2000a, Hochmuth,
Knapek and Zumbusch 2000), eigenvalue problems (Garcke 1998, Garcke
and Griebel 2000), periodic interpolation (Pöplau and Sprengel 1997), in-
terpolation on Gauss–Chebyshev grids (Sprengel 1997b), Fourier transforms
(Hallatschek 1992), tabulation of reduced chemical systems (Heroth 1997),
digital elevation models and terrain representation (Gerstner 1995, 1999),
audio and image compression (Frank 1995, Paul 1995) (see Figure 4.15),
and possibly others not listed here.



Figure 4.15. Data compression with
sparse grids: Mozart’s autograph of
the fourth movement of KV 525
‘Eine kleine Nachtmusik’. Top left:
original image. Top right: compressed
version. Right: corresponding adaptive
sparse grid.

Of course, owing to the historic background described in Section 4.1,


numerical quadrature has always been a hot topic in sparse grid research.
Starting from the explicit use of the piecewise linear basis functions to
calculate integrals (Bonk 1994b, 1994a), quadrature formulas based on the
midpoint rule (Baszenski and Delvos 1993), the rectangle rule (Paskov 1993),
the trapezoidal rule (Bonk 1994a), the Clenshaw–Curtis rule (Cools and
Maerten 1997, Novak and Ritter 1998), the Gauss rules (Novak and Ritter
1997), and the Gauss–Patterson rules (Gerstner and Griebel 1998, Petras
2000) were used as 1D input for Smolyak’s principle according to (4.1). Fur-
thermore, the higher-order basis polynomials from Section 4.2 were used for
an adaptive quadrature algorithm (Bungartz and Dirnstorfer 2003). These
techniques can be used advantageously for the computation of stiffness mat-
rix entries in conventional finite element methods and especially in partition
of unity methods (Griebel and Schweitzer 2002, Schweitzer 2003). Gerstner



and Griebel (2003) developed a generalization of the conventional sparse
grid approach which is able to adaptively assess the dimensions according
to their importance. For further information on numerical quadrature on
sparse grids and on data mining, another interesting field of application, we
refer to the respective numerical examples in Section 5.3.
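As a small illustration of Smolyak's construction with the trapezoidal rule as 1D input (a hedged sketch with helper names of our own; production codes use the more accurate 1D rules cited above):

```python
import itertools
import numpy as np

def trap_rule(l):
    # 1D trapezoidal rule on [0, 1] with 2^l + 1 points (level l >= 0).
    m = 2**l + 1
    x = np.linspace(0.0, 1.0, m)
    w = np.full(m, 1.0 / (m - 1))
    w[0] *= 0.5
    w[-1] *= 0.5
    return x, w

def delta_rule(l):
    # Difference rule Delta_l = Q_l - Q_{l-1}, expressed on the points of Q_l.
    x, w = trap_rule(l)
    if l > 0:
        _, wc = trap_rule(l - 1)
        w = w.copy()
        w[::2] -= wc          # coarse points are every second fine point
    return x, w

def smolyak(f, d, n):
    # Smolyak rule: sum the tensor products Delta_{l1} x ... x Delta_{ld}
    # over all level multi-indices with |l|_1 <= n.
    total = 0.0
    for l in itertools.product(range(n + 1), repeat=d):
        if sum(l) > n:
            continue
        rules = [delta_rule(li) for li in l]
        for idx in itertools.product(*(range(len(x)) for x, _ in rules)):
            pt = [rules[j][0][idx[j]] for j in range(d)]
            wt = np.prod([rules[j][1][idx[j]] for j in range(d)])
            total += wt * f(pt)
    return total
```

Since the trapezoidal rule is exact for linear functions, every difference rule with l ≥ 1 annihilates them, so the sketch integrates, e.g., x1 · x2 exactly, while smooth integrands show the typical sparse grid accuracy.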

5. Numerical experiments
In this section, we present a collection of numerical results for different
problems solved on sparse grids. We start with the discussion of the basic
features of sparse grid methods for PDEs, applied to simpler 2D and 3D
model problems. Then we turn to the solution of the Navier–Stokes equa-
tions on sparse grids. Finally, we illustrate the potential of sparse grids
for problems of a higher dimensionality, here in the context of numerical
quadrature and data mining.

5.1. PDE model problems


We start with some PDE model problems to demonstrate the interpolation
properties of the L2 -based sparse grid spaces presented as well as their
behaviour concerning the approximation of a PDE’s solution. We omit
examples with energy-based sparse grids, since the situation with respect
to the energy norm in a finite element context is completely clear owing to
Céa’s lemma, whereas the L2 -quality of the finite element solution is still
an open question.
Hence we choose a standard Ritz–Galerkin finite element setting and use
the hierarchical Lagrangian basis polynomials from Section 4.2 in the p-
regular way, i.e., choosing the local degree as high as possible according to
(4.16) up to some given maximum value for p. Note that a Petrov–Galerkin
approach, with the higher-order basis functions appearing in the approximation
space only, leads to the same approximation properties. We will
not discuss the algorithmic scheme which is applied to ensure the linear
complexity of the matrix–vector product, the so-called unidirectional prin-
ciple (Bungartz 1998), allowing us to treat the single dimensions separately.
However, note that, especially in the case of more general operators, this
is actually the challenging part of a sparse grid finite element implementation;
see Dornseifer (1997), Bungartz and Dornseifer (1998), Schneider
(2000), and Achatz (2003a, 2003b). Concerning the iterative solution of the
resulting linear systems, several multilevel schemes are available, most of
them based upon the semi-coarsening approach of Mulder (1989) and Naik
and van Rosendale (1993) and its sparse grid extension (Griebel 1991b,
1991a, Griebel et al. 1993b, Griebel and Oswald 1995b, Pflaum 1992, 1998,
Bungartz 1997). In the context of solvers, it is important that the influence



Figure 5.1. Condition numbers of the (diagonally preconditioned)
stiffness matrix for the sparse grid spaces Vn^(p,1).

of the polynomial degree starting from p = 2 be only moderate, which is
in contrast to the behaviour known from hierarchical polynomials in a p-
or h-p-version context (cf. Zumbusch (1996), for example). Figure 5.1 hints
at this, showing spectral condition numbers of the stiffness matrix.
Our model problems cover regular and adaptively refined sparse grids in
two and three dimensions, mainly using the Laplacian as the operator, but
also the second-order linear operator with more general coefficients. For
measuring the error, we consider the error’s discrete maximum norm, the
discrete l2 -norm, and the error in the centre point P = (0.5, . . . , 0.5). Where
grid adaptation is used, it is based on a sparse grid extension of the dual
problem approach (Becker and Rannacher 1996, Eriksson et al. 1996); see
Schneider (2000).

The 2D Laplace equation


On Ω̄ = [0, 1]², let

    ∆u(x) = 0 in Ω,                                              (5.1)
    u(x) = sin(πx1) · sinh(πx2)/sinh(π) on ∂Ω.
Figure 5.2 shows the smooth solution u(x) of (5.1). We study the accuracy
of the hierarchical Lagrangian approach for the regular sparse grid spaces
Vn^(p,1) with polynomial degrees p ∈ {1, . . . , 6}. Figure 5.3 illustrates the
maximum and the l2 -error. The effect of the higher-order approximation can
be seen clearly. For p = 6, we already run into trouble with machine accuracy
because of the use of standard double precision floating point numbers.



Figure 5.2. Solution u(x) = sin(πx1) · sinh(πx2)/sinh(π) of (5.1).

Figure 5.3. Example (5.1): maximum and l2 -error for regular
sparse grid spaces Vn^(p,1), p ∈ {1, . . . , 6}.

In Figure 5.4, a summary of the convergence behaviour with respect to all
error norms considered and for p ∈ {1, . . . , 6} is provided. In addition to the
error plots, we show the curves of expected sparse grid convergence (sgc)
due to the results (3.88) and (4.32), stating

    ε∞(N) = O(N^(−2) · |log2 N|^(3·(d−1))),                      (5.2)
    ε∞^(p)(N) = O(N^(−(p+1)) · |log2 N|^((p+2)·(d−1))).
These curves indicate the (asymptotic) behaviour of the ε-complexity of
sparse grids with respect to the problem of interpolating a given function.
For all polynomial degrees presented, we observe a rather striking
correspondence between theoretical results concerning the mere quality of
interpolation and experimental results for the accuracy of calculating
approximate solutions of PDEs. Note that this correspondence is still an open question with
respect to the L2 - or maximum norm, in contrast to the energy norm for
which the answer is provided by Céa’s lemma.
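The O(N · (log N)^(d−1)) growth behind these complexity statements can be checked by simply counting degrees of freedom; a small sketch (helper functions of our own choosing):

```python
import itertools

def sparse_grid_size(d, n):
    # Interior points of the regular sparse grid of level n in d dimensions:
    # hierarchical increments W_l with l_i >= 1 and |l|_1 <= n + d - 1,
    # each contributing prod_i 2^(l_i - 1) points.
    total = 0
    for l in itertools.product(range(1, n + 1), repeat=d):
        if sum(l) <= n + d - 1:
            c = 1
            for li in l:
                c *= 2**(li - 1)
            total += c
    return total

def full_grid_size(d, n):
    # Interior points of the full grid with mesh width 2^-n.
    return (2**n - 1)**d
```

For d = 2 and n = 10 this yields 9 217 sparse grid points, compared with 1 046 529 points of the corresponding full grid.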



Figure 5.4. Example (5.1): convergence on regular sparse grid
spaces Vn^(p,1) for p ∈ {1, 2, 3} (left) and p ∈ {4, 5, 6} (right);
the solid lines indicate the respective expected sparse grid
convergence (sgc) due to the interpolation accuracy.



Figure 5.5. Solution u(x) = 0.1·(x1 + 0.1)/((x1 + 0.1)² + (x2 − 0.5)²) of example (5.3).

Figure 5.6. Example (5.3): convergence on Vn^(p,1) for p = 1 (left)
and p = 2 (right); the solid lines indicate the respective
expected sparse grid convergence (sgc; position of curve chosen
for clarity).

Next, we turn to adaptively refined sparse grids. On Ω̄ = [0, 1]², let

    ∆u(x) = 0 in Ω,                                              (5.3)
    u(x) = 0.1·(x1 + 0.1)/((x1 + 0.1)² + (x2 − 0.5)²) on ∂Ω.
At x = (−0.1, 0.5), u(x) has a singularity which is outside Ω̄, but close to
the boundary ∂Ω. Figure 5.5 shows the solution u(x).
Figure 5.6 illustrates the case of the regular sparse grid spaces Vn^(1,1) (left)
and Vn^(2,1) (right). Obviously, the results are not too convincing for reasonable
N. However, decisive progress can be made if we apply our l2 -adaptivity.



Figure 5.7. Example (5.3): maximum and l2 -error for adaptive
refinement, p ∈ {1, 2, 6}.

Figure 5.8. Example (5.3): error on regular (left) and adaptive
sparse grid (right).

Figure 5.7 shows a gain in accuracy with higher p that is comparable to the
smooth situation of (5.1).
In Figure 5.8, we compare the error on a regular sparse grid and on an
adaptively refined one. As expected, the l2 -adaptation process reduces the
error equally over the whole domain. In contrast to that, regular sparse
grids show large errors near the singularity.
As for (5.1), the achieved accuracy is compared with the theoretical
results concerning the ε-complexity of interpolation on sparse grids. In
Figure 5.9, again for p = 1, the correspondence is striking. For p = 6, it
seems that the asymptotic behaviour requires larger values of N to appear.
This was to be expected, since our hierarchical Lagrangian basis polynomials
of degree p = 6 need at least level 5 to enter the game. However,
the adaptation process causes a delayed creation of new grid points in the



Figure 5.9. Example (5.3): convergence with adaptive mesh
refinement for p = 1 (left) and p = 6 (right); the solid lines
indicate the respective expected sparse grid convergence (sgc;
position of curve chosen for clarity).

Figure 5.10. Example (5.3): adaptive grids for p = 1 (left) and
p = 6 (right).

smooth areas. Nevertheless, as the adaptive refinement advances, the
higher-order accuracy comes to fruition.
Finally, to get an impression of the adaptation process, Figure 5.10 shows
two adaptively refined grids with 7 641 grid points (p = 1, left), and 10 965
grid points (p = 6, right).
With these results supporting the efficiency of the combination of higher-
order approximation and adaptive mesh refinement on sparse grids, we close
the discussion of the 2D Laplacian and turn to 3D problems.



The 3D Laplace equation
For the 3D case, we restrict ourselves to a smooth model problem. On
Ω̄ = [0, 1]3 , let

    ∆u(x) = 0 in Ω,                                              (5.4)
    u(x) = sinh(√2 πx1)/sinh(√2 π) · sin(πx2) · sin(πx3) on ∂Ω.

For the polynomial degrees p ∈ {1, . . . , 4}, Figure 5.11 compares the
accuracy with respect to the error's maximum norm and its l2 -norm,
respectively. Again, the effects of the improved approximation properties of our
hierarchical polynomial bases are evident.
Figure 5.12 shows that we already come quite close to the asymptotic
behaviour predicted for the quality of mere interpolation. This is true in
spite of the fact that up to 100 000 degrees of freedom are not excessive
for a 3D problem. Remember that, although the curse of dimensionality is
lessened significantly by the L2 - or L∞ -based sparse grid spaces Vn^(1) and
Vn^(p,1), there is still some d-dependence. Evaluating the respective order
terms of the ε-complexity derived before and given once more in (5.2) for
the 3D case, we observe an exponent of (p + 2) · (d − 1) = 2p + 4 in the
log-factor, i.e., an exponent of 12 for polynomial degree p = 4, for example.
Nevertheless, as in the 2D cases (5.1) and (5.3), the benefit of
higher polynomial degrees in combination with sparse grid discretization is
again rather impressive.

Figure 5.11. Example (5.4): maximum and l2 -error for regular
sparse grid spaces Vn^(p,1), p ∈ {1, 2, 3, 4}.



Figure 5.12. Example (5.4): convergence on Vn^(p,1), p ∈ {1, 2}
(left) and p ∈ {3, 4} (right); the solid lines indicate the
respective expected sparse grid convergence (sgc; position of
curve chosen for clarity).

Towards greater generality


Figure 5.13 illustrates the effect of using the point functional with respect to
the midpoint P of Ω for the dual problem-based control of adaptive refinement,
instead of the l2 -adaptation considered up to now. It can be shown that
the resulting error estimator is strongly related to the energy of the basis
function living in P. Since we have an O(h^p)-behaviour of the energy norm,
the energy decreases with order O(h^(2p)). Consequently, a convergence
behaviour of O(h^(2p)) for the error in the midpoint can be expected. Indeed,
for the quadratic case, this can be seen from Figure 5.13, both for example
(5.1) and even for the root singularity Re(z^(1/2)) on Ω := ]0, 1[ × ]−1/2, 1/2[.
Note that the solid line now indicates an N^(−4)-behaviour. In addition to
(5.3), the results for the root singularity show that we can tackle truly
singular problems too.



Figure 5.13. Reducing the error in the midpoint for p = 2:
example (5.1) (left) and root singularity Re(z^(1/2)) (right); the
solid lines indicate the expected convergence N^(−4) (position of
curve chosen for clarity).

In Figure 5.14, we present the error resulting from adaptive refinement
based on the point functional for example (5.1). Obviously, the error is
primarily reduced near the midpoint. The right-hand side of Figure 5.14
shows the underlying adaptive grid, consisting of 12 767 grid points.
Since all examples have so far been treated with a p-regular approach, we
want to present at least one result for a p-asymptotic proceeding, i.e., for
the scenario where p is increased with each new level l of grid points. On
Ω = ]0, 1[², let

    −∆u(x) = −2 · x1·(x1²x2² − 3x2² + x1⁴ + 2x1² + 1)/(x1² + 1)³ in Ω,   (5.5)
    u(x) = x1x2²/(x1² + 1) on ∂Ω.
According to the p-asymptotic strategy, in Figure 5.15, the polynomial
degree is chosen as p := l + 1 up to p = 12. In order to avoid trouble with
machine accuracy, we switch to quadruple precision. Of course, striving for
twenty-five or more correct digits is not particularly useful. Nevertheless,
we obtain a convincing diagram that shows the sub-exponential behaviour
of the p-asymptotic strategy which, by the way, performs better than theory
had suggested.
The last finite element example of this section demonstrates that our ap-
proach is not limited to simple operators such as the Laplacian. Here we
turn to a variable diffusion coefficient. For a detailed discussion of the gen-
eral linear elliptic operator of second order, see Achatz (2003b), for instance.



Figure 5.14. Example (5.1): reducing the error in the midpoint
for p = 2; error (left) and adaptively refined grid (right).

Figure 5.15. Example (5.5): regular sparse grid spaces Vn^(p,1),
p-asymptotic proceeding up to p = 12 with quadruple
precision.

On the unit square, let

    −∇ · (A(x)∇u(x)) = f(x) in Ω,                                (5.6)
where
    A(x) := ( 4 + sin(2πx1) + sin(2πx2)      1 + 4x2·(1 − x2)·sin(2πx1) )
            ( 1 + 4x2·(1 − x2)·sin(2πx1)     4 + sin(2πx1x2)            ),
and where f (x) and the Dirichlet boundary conditions are chosen such that
u(x) := sin(πx1 ) · sin(πx2 ) (5.7)



Figure 5.16. Example (5.6): maximum and l2 -error for the
regular sparse grid spaces Vn^(p,1), p ∈ {2, 4, 6}.

Figure 5.17. Example (5.6): convergence on Vn^(p,1) for
p = 2 (left) and p = 6 (right); the solid lines indicate the
respective expected sparse grid convergence (sgc; position of
curve chosen for clarity).

is the solution. We present results for p ∈ {2, 4, 6}. Note that, owing to the
smoothness of u(x), we restrict ourselves to the regular sparse grid spaces
Vn^(p,1). Figure 5.16 shows the maximum and the l2 -error.
Obviously, the higher-order approximation of our hierarchical Lagrangian
basis polynomials comes to fruition in the more general situation of example
(5.6) too. Figure 5.17 illustrates that we come close to the expected asymp-
totic behaviour already for moderate values of N .



So far, all problems have been tackled with a finite element approach.
However, finite differences have been successfully used for the numerical
solution of PDEs too (see Griebel and Schiekofer (1999), Schiekofer (1998),
Schiekofer and Zumbusch (1998), for example). Hence we want to present
one model problem that has been solved with finite differences on sparse
grids, i.e., on Vn^(1), to be precise. On the unit cube, define the matrix

    Â(x) := ( x1²·exp(πx2 − 1)   x1²·π     −3               )
            ( 0                  π         0                )
            ( 0                  sin(x1)   0.1·(x1² + 2x2)  ),
the vector

    b̂(x) := (b̂1(x), b̂2(x), b̂3(x))^T := ( x1/(x1x3 + 0.1),  cos(exp(x1 + x2x3)),  x1x2² )^T,
and the scalar function
ĉ(x) := π · (x1 + x2 + x3 ).
Furthermore, choose the right-hand side f(x) such that

    u(x) = arctan( 100 · ( (x1 + x2 + x3)/√3 − 4/5 ) ) · Π_{i=1}^{3} (xi − xi²)   (5.8)

solves

    −∇ · (Â(x)∇u(x)) + b̂(x) · ∇u(x) + ĉ(x)u(x) = f(x)           (5.9)
on the unit cube with Dirichlet boundary conditions. Figure 5.19 (overleaf)
illustrates the solution u(x) due to (5.8) for the two x3 -values 0.5 and 0.875.
Numerical results are presented in Figure 5.18.


Figure 5.18. Example (5.9), finite differences, regular sparse
grid spaces Vn^(1): maximum and l2 -error as well as errors in
P1 = (0.5, 0.5, 0.5) and P2 = (1/π, 1/π, 1/π).



Figure 5.19. Example (5.9), finite differences, regular sparse grid
spaces Vn^(1): solution u(x) and its contour lines for x3 = 0.5 (top)
and x3 = 0.875 (bottom).



5.2. Flow problems and the Navier–Stokes equations
We now consider the application of an adaptive sparse grid method to the
solution of incompressible flow problems. The governing relations are given
by the Navier–Stokes equations
∂t u + ∇ · (u ⊗ u) = f − ∇p + ν∆u, (x, t) ∈ Ω × [0, T ], (5.10)
∇ · u = 0,
u(x, 0) = u0 (x),
here in primitive variables, with velocity u = (u, v, w)T and pressure p. The
parameter ν denotes the kinematic viscosity; f is a prescribed volume force.
The system has to be completed by proper initial and boundary conditions.
We apply the pressure correction scheme (Chorin 1968, Temam 1969,
Bell, Colella and Glaz 1989). To this end, the discretization of the Navier–
Stokes equations is split into two subproblems.
Transport step. Calculate the auxiliary velocity û^(n+1) by

    (û^(n+1) − u^n)/k + C(u^n, u^(n−1), . . .) = f^(n+1) − ∇p^n + ν∆û^(n+1).   (5.11)

Here, C(u^n, u^(n−1), . . .) represents a stabilized, space-adaptive discretization
of ∇ · (u ⊗ u) by means of a Petrov–Galerkin/collocation approach using
sparse grid interpolets. Adaptive refinement or coarsening takes place in
every time-step steered by the size of the actual coefficients of the multiscale
representation of the current solution approximation; see Griebel (1998),
Koster (2002) and Griebel and Koster (2000) for further details.
Projection step. Calculate u^(n+1), p^(n+1) as the solution of

    (u^(n+1) − û^(n+1))/k = −∇(p^(n+1) − p^n),                   (5.12)
    ∇ · u^(n+1) = 0.
Instead of the simple Euler scheme, higher-order time discretization tech-
niques can be applied analogously. Of course, both subproblems have to
be augmented by boundary conditions – not only for u and û, but also for
the pressure p. The (in general non-physical) pressure boundary conditions
especially are the subject of controversy (cf. Gresho and Sani (1987), Kar-
niadakis and Sherwin (1999), Prohl (1997)), which is beyond the scope of
this paper. In the following, we will assume periodic boundary conditions
for u and p. The saddle point problem (5.12) is treated by solving the Schur
complement equation
    ∇ · ∇(p^(n+1) − p^n) = (1/k) ∇ · û^(n+1),                    (5.13)



followed by the correction of the velocity
    u^(n+1) = û^(n+1) − k∇(p^(n+1) − p^n).                       (5.14)
To this end, a weakly divergence-free adaptive discretization of ∇ · ∇ is
applied together with sparse grid interpolets. The solver involves a transform
to the lifting interpolets of Section 4.4, a diagonal scaling, and a
backtransform; see Koster (2002) and Griebel and Koster (2000) for further details.
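Under the periodic boundary conditions assumed here, the projection step can be sketched on a uniform full grid with FFTs, where the Schur complement (5.13) becomes a diagonal solve in Fourier space. This is a plain spectral stand-in for illustration, with names of our own choosing, not the adaptive interpolet solver cited above:

```python
import numpy as np

def projection_step(u_aux, v_aux, k_step, h):
    # Solve (5.13) for q = p^{n+1} - p^n in Fourier space, where the
    # periodic Laplacian is diagonal, then correct the velocity by (5.14).
    N = u_aux.shape[0]
    freq = 2.0 * np.pi * np.fft.fftfreq(N, d=h)
    KX, KY = np.meshgrid(freq, freq, indexing='ij')
    k2 = KX**2 + KY**2
    k2[0, 0] = 1.0                      # the mean of q is arbitrary
    uf, vf = np.fft.fft2(u_aux), np.fft.fft2(v_aux)
    div_f = 1j * (KX * uf + KY * vf)    # divergence of auxiliary velocity
    q_f = -div_f / (k_step * k2)        # (5.13): Laplacian q = (1/k) div u_aux
    q_f[0, 0] = 0.0
    uf = uf - k_step * 1j * KX * q_f    # (5.14): u^{n+1} = u_aux - k grad q
    vf = vf - k_step * 1j * KY * q_f
    return np.real(np.fft.ifft2(uf)), np.real(np.fft.ifft2(vf))
```

The corrected velocity is divergence-free up to roundoff; with a non-spectral discretization of ∇ · ∇, divergence-freedom holds only in the weak sense mentioned above.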

Merging of modons
Now, we apply the adaptive version (Griebel and Koster 2000, Koster 2002,
Griebel and Koster 2003) of this sparse grid interpolet solver to the model
problem of the interaction of three vortices in a 2D flow. Here we use the
interpolets of Section 4.3 with N = 6.
The initial velocity is induced by three vortices, each with a Gaussian
vorticity profile

    ω(x, 0) = ω0 + Σ_{i=1}^{3} ωi exp(−‖x − xi‖²/σi²),    x ∈ [0, 1]².

The first two vortices have the same positive sign ω1 , ω2 > 0, and the third
has a negative sign; see Figure 5.20.
The different parameters ω0, ωi, and σi are chosen such that the mean
value of ω(·, 0) vanishes and that ω(·, 0)|_{∂[0,1]²} is almost constant,
equal to ω0, to allow for periodic boundary conditions.
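Such an initial field can be sketched directly from the formula above; the vortex centres, amplitudes, and widths below are hypothetical placeholders, and the constant ω0 is realized by subtracting the mean, so that the compatibility condition for the periodic pressure problem holds.

```python
import numpy as np

def three_vortex_vorticity(n, centers, amps, sigmas):
    """Initial vorticity: superposition of Gaussian vortices on [0,1]^2
    plus a constant omega_0 chosen so that the mean vanishes."""
    h = 1.0 / n
    x = (np.arange(n) + 0.5) * h               # cell-centred grid
    X, Y = np.meshgrid(x, x, indexing="ij")
    omega = np.zeros((n, n))
    for (cx, cy), w, s in zip(centers, amps, sigmas):
        omega += w * np.exp(-((X - cx) ** 2 + (Y - cy) ** 2) / s ** 2)
    return omega - omega.mean()                # the subtracted shift is omega_0

# illustrative parameters: two positive vortices, one negative
omega = three_vortex_vorticity(
    128,
    centers=[(0.4, 0.5), (0.6, 0.5), (0.5, 0.75)],
    amps=[1.0, 1.0, -0.5],
    sigmas=[0.05, 0.05, 0.05],
)
```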
Owing to the induced velocity field, the three vortices start to rotate
around each other. In a later stage, the two same-sign vortices merge, which
leads to a configuration of two counter-rotating vortices. This process is the
basic mechanism in 2D turbulence, and it takes place, e.g., in the shear
layer problem of the next subsection or during the convergence of ω to
the solution of the Joyce–Montgomery equation (Chorin 1998) for random
initial vorticity fields.

Figure 5.20. Initial configuration for the three vortices' interaction.



Figure 5.21. Isolines of ω at t = 5, 10, 15, 35 (from top to
bottom). Left: adaptive wavelet solver (ε = 4 · 10−4). Right:
adaptive sparse grid.



In our numerical experiments, the viscosity was set to ν = 3.8 · 10−6 . The
maximum velocity of the initial condition is 7 · 10−2 , which corresponds to
a Reynolds number of ≈ 55200. For the time discretization in the Chorin
projection scheme, we used the third-order Adams–Bashforth scheme. The
time-step used was dt = 10−2. The values of the threshold parameter ε
were set to {8, 4, 1} · 10−4, and the finest level l was limited to (10, 10) to
avoid very fine time-steps due to the CFL condition.
In Figure 5.21 (page 235), the contour lines of the vorticity of the adaptive
solutions are shown. The contour levels are equally spaced from −1.5 to 3,
which is approximately the minimum/maximum value of ω(. , 0). Obviously,
our method recognizes the arising complicated flow patterns in space and
time (see Figure 5.21) and devotes many more degrees of freedom to these
than to smooth regions. Note that the left-hand pictures of Figure 5.21
are enhanced by 20%. Further comparisons of these results with a con-
ventional Fourier spectral solution showed that the results of the adaptive
wavelet solver are quite accurate. But to achieve the same accuracy, only a
small number of degrees of freedom was needed by our adaptive sparse grid
approach – less than 1% compared with the Fourier spectral technique.

2D shear layer flow


Now we apply the adaptive wavelet solver to a shear layer model problem.
The initial configuration of the temporally developing shear layer is a
velocity field with a hyperbolic-tangent profile u(y) = U tanh(2y/δ), v = 0,
where δ is the vorticity shear layer thickness, δ = 2U/ max_y |du/dy|; see
Figure 5.22. This initial condition is an exact stationary solution of the
Navier–Stokes equations for the case ν = 0, i.e., for inviscid fluids. How-
ever, from linear stability analysis this initial configuration is known to be

Figure 5.22. Velocity profile of the initial condition for the
shear flow model problem. The thickness of the boundary
layer is defined as δ = 2U/ max_y |∂_y u(y)|.
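The definition of δ can be verified numerically for the tanh profile, since max_y |∂_y u| = 2U/δ is attained at y = 0. The sketch below uses the U from the experiments and a hypothetical δ.

```python
import numpy as np

def shear_profile(y, U, delta):
    """Hyperbolic-tangent shear layer profile u(y) = U tanh(2y/delta)."""
    return U * np.tanh(2.0 * y / delta)

# check delta = 2U / max_y |du/dy| on a fine grid (illustrative parameters)
U, delta = 1.67e-2, 0.05
y = np.linspace(-0.5, 0.5, 20001)
dudy = np.gradient(shear_profile(y, U, delta), y)
delta_numerical = 2.0 * U / np.max(np.abs(dudy))
```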



Figure 5.23. Isolines of the rotation for the values {0.1, 0.2,
0.3, 0.5, 0.75, 1.0, . . . , 2.25, 2.5} at time t = 4, 8, 16, 36.
Left: isolines of computed rotation. Right: adaptive sparse grid.



inviscidly unstable. Slight perturbations are amplified by Kelvin–Helmholtz
instabilities and vortex roll-up occurs (Michalke 1964).
In our numerical experiments, the vorticity thickness δ was chosen so
that ten vortices should develop in the computational domain [0, 1]2 . The
maximum velocity was U = 1.67 · 10−2 , and the viscosity was ν = 3.8 · 10−6 .
The instabilities were triggered by a superimposed weak white noise in the
shear layer.
The numerical simulations were applied to the periodized version of the
problem with two shear layers on [0, 1] × [0, 2]. We used the same adaptive
sparse grid solver as in the previous experiment, based on interpolets with
N = 6, a stabilized Petrov–Galerkin collocation, and Chorin’s projection
method. The values of the threshold parameter ε were set to {12, 8, 4} · 10−4
and the finest level l was limited to (10, 10). The time-step was 5 · 10−3.
On the left-hand side of Figure 5.23 (page 237), the resulting rotation
ω = ∇ × u for ε = 8 · 10−4 is given. The evolution of the sparse grid points
is shown on the right-hand side of Figure 5.23. The initial velocity is not
very smooth owing to the white noise added for triggering the instability.
Therefore, a large number of degrees of freedom (DOF) is required to
resolve the initial velocity in the shear layer sufficiently well. Then, in a
first phase, diffusion smooths the velocity very quickly, which leads to a
strong decay of the number of DOF. This process is stopped by the
development of the Kelvin–Helmholtz instabilities leading to an increase of
the number of DOF (4 < t < 10). In the last phase (t > 10), the number
of coherent vortices constantly decreases by successive merging. The final
state comprises two counter-rotating vortices which dissipate.
Besides the generation and the roll-up of Kelvin–Helmholtz vortices, we
also see that the vortices merge with time. This process, in which a few
large vortices are created from many small ones, is typical for 2D flows;
cf. Chapter 4.6 of Chorin (1998). It comes with an energy transfer from
fine to coarse structures, i.e., from fine to coarse levels in the sense of an
isotropic multiscale representation (Novikov 1976). Thus 2D turbulence
behaves fundamentally differently to 3D turbulence, where predominantly
an energy transfer from coarse to fine levels occurs, that is, coarse structures
decay into fine structures. The maximal level was limited to 10 to avoid
overly small time-steps resulting from the CFL condition.

3D shear layer flow


As seen in the previous section, vortices with the same rotation direction
merge successively in 2D flows. In 3D flows, however, this effect is no longer
present. Here the additional third dimension allows for vortex tubes. They
can be bent, folded, and stretched over time until they break apart. Their
diameter reduces during stretching, and therefore smaller-scale structures



Figure 5.24. Iso-surfaces of the rotation for the 3D turbulent
shear layer problem at time 0, 15, 30, 60, 90, 150.



are created. In this way energy is transported to finer scales. This energy
cascade proceeds recursively until the structures are so tiny and the asso-
ciated relative Reynolds number is so small that energy is annihilated by
dissipation. The different phases of a developing 3D shear layer are shown
in Figure 5.24 (page 239), from which this increase of complexity in the
flow becomes clear. The initial conditions for this calculation are the 3D
analogues of the initial conditions for the previous 2D example. However,
the viscosity was increased, i.e., we set ν = 5.04 · 10−6. The remaining
parameters were the same as for the 2D calculation.

5.3. Problems of high dimensionality


We now turn to higher-dimensional problems. To this end, a few general
remarks are helpful. First, the sparse grid approach is limited to (topolo-
gically) quadrilateral domains due to its tensor product nature. Complic-
ated domains must be built up by gluing together quadrilateral patches on
which locally sparse grids are employed, or by appropriate mapping tech-
niques. Now, in higher dimensionalities, this question of the shape of the
domain is not as important as in the 2D and 3D case, since complicated
domains typically do not appear in applications. Conceptually, besides Rd
itself, we use mainly hypercubes [−a, a]d , a > 0, and their straightforward
generalizations using different values of a for each coordinate direction, as
well as the corresponding structures in polar coordinates. These domains
are of tensor product structure and cause no difficulties to the sparse grid
approach. Second, complexity issues are still crucial in higher-dimensional
applications. In the simplest case of a sufficiently smooth function with
bounded mixed derivatives, the cost and error estimates for the Lp -norms
still possess (log N )d−1 -terms which depend exponentially on the dimension-
ality d. This limits the practical applications of the sparse grid method for
PDEs to at most 18 dimensions at present. Furthermore, in the nonsmooth
case, adaptivity towards a singularity is a difficult task in higher dimen-
sions. In principle, it can be done, as demonstrated in the examples of the
previous sections. However, an implementation of refinement strategies, er-
ror indicators, and solvers with optimal complexity is more difficult than in
the low-dimensional case. Hence, utmost care has to be taken in the im-
plementation not to obtain order constants in the work count that depend
exponentially on d. For a simple example of this problem, just consider
the stiffness matrix for d-linear finite elements on a uniform grid. There,
the number of nonzero entries grows like 3d for a second-order elliptic PDE
with general coefficient functions. Also the work count for the computa-
tion of each entry by a conventional quadrature scheme involves terms that
grow exponentially with d. Here, at least in certain simple cases like the
Laplacian, an algorithmic scheme can be applied, which is based on the



so-called unidirectional principle (Bungartz 1998). It allows us to treat the
single dimensions separately and results in linear complexity in both the de-
grees of freedom and d, for the matrix–vector product in a Galerkin sparse
grid approach. However, note that, especially in the case of more general
nonseparable operators, this is the challenging part of a sparse grid finite
element implementation.
The next question concerns the sources of high-dimensional problems
and PDEs. Here, besides pure integration problems stemming from phys-
ics and finance, typically models from the stochastic and data analysis
world show up. For example, high-dimensional Laplace/diffusion prob-
lems and high-dimensional convection–diffusion problems result from dif-
fusion approximation techniques and the Fokker–Planck equation. Ex-
amples are the description of queueing networks (Mitzlaff 1997, Shen, Chen,
Dai and Dai 2002), reaction mechanisms in molecular biology (Sjöberg
2002, Elf, Lötstedt and Sjöberg 2001), the viscoelasticity in polymer flu-
ids (Rouse 1953, Prakash and Öttinger 1999, Prakash 2000), or various
models for the pricing of financial derivatives (Reisinger 2003). Further-
more, homogenization with multiple scales (Allaire 1992, Cioranescu, Dam-
lamian and Griso 2002, Matache 2001, Matache and Schwab 2001, Hoang
and Schwab 2003) as well as stochastic elliptic equations (Schwab and Todor
2002, 2003) result in high-dimensional PDEs. Next, we find quite high-
dimensional problems in quantum mechanics. Here, the dimensionality
of the Schrödinger equation grows with the number of considered atoms.
Sparse grids have been used in this context by Garcke (1998), Garcke and
Griebel (2000), Yserentant (2004) and Hackbusch (2001).
In the following, we illustrate the potential of sparse grids for problems
of a higher dimensionality in the context of numerical quadrature and data
mining.

Classification and regression in data mining


Data mining is the process of finding hidden patterns, relations, and trends in
large data sets. It plays an increasing role in commerce and science. Typical
scientific applications are the post-processing of data in medicine (e.g., CT
data), the evaluation of data in astrophysics (e.g., telescope and observat-
ory data), and the grouping of seismic data, or the evaluation of satellite
pictures (e.g., NASA earth observing system). Financial and commercial
applications are perhaps of greater importance. With the development of
the internet and e-commerce, there are huge data sets collected, more or
less automatically, which can be used for business decisions and further
strategic planning. Here, applications range from contract management to
risk assessment, from the segmentation of customers for marketing to fraud
detection, stock analysis and turnover prediction.



Usually, the process of data mining (or knowledge discovery) can be sep-
arated into the planning step, the preparation phase, the mining phase (i.e.,
machine learning) and the evaluation. To this end, tasks such as association
analysis, classification, clustering, and prognostics are to be performed.
overview of the various tasks arising in the data mining process, see Berry
and Linoff (2000) and Cios, Pedrycz and Swiniarski (1998).
In the following, we consider the classification problem in detail. Here,
a set of data points in d-dimensional feature space is given together with a
class label in {−1, 1}, for example. From these data, a classifier must be con-
structed which allows us to predict the class of any newly given data point
for future decision making. Widely used approaches are nearest neighbour
methods, decision tree induction, rule learning, and memory-based reason-
ing. There are also classification algorithms based on adaptive multivariate
regression splines, neural networks, support vector machines, and regulariz-
ation networks. Interestingly, these latter techniques can be interpreted in
the framework of regularization networks (Girosi, Jones and Poggio 1995).
This approach allows a direct description of the most important neural net-
works, and it also allows for an equivalent description of support vector
machines and n-term approximation schemes (Girosi 1998).
We follow Garcke, Griebel and Thess (2001) and consider a given set of
already classified data (the training set):
    S = {(x_i, y_i) ∈ R^d × R}_{i=1}^{M}.
We now assume that these data have been obtained by sampling of an
unknown function f which belongs to some function space V defined over
Rd . The sampling process was disturbed by noise. The aim is now to recover
the function f from the given data as well as possible. This is clearly an
ill-posed problem since infinitely many solutions are possible. To get a
well-posed, uniquely solvable problem, we have to restrict f . To this end,
regularization theory (Tikhonov and Arsenin 1977, Wahba 1990) imposes
an additional smoothness constraint on the solution of the approximation
problem, and the regularization network approach considers the variational
problem
    min_{f ∈ V} R(f)

with
    R(f) = (1/M) Σ_{i=1}^{M} C(f(x_i), y_i) + λΦ(f).    (5.15)
Here, C(. , .) denotes an error cost function which measures the interpolation
error and Φ(f ) is a smoothness functional which must be well defined for
f ∈ V . The first term enforces closeness of f to the data, the second term
enforces smoothness of f , and the regularization parameter λ balances these



two terms. We consider the case

    C(x, y) = (x − y)²  and  Φ(f) = ‖P f‖²_{L2}  with  P f = ∇f.

The value of λ can be chosen according to cross-validation techniques (Allen
1972, Golub, Heath and Wahba 1979, Utreras 1979, Wahba 1985) or to some
other principle, such as structural risk minimization (Vapnik 1982). We find
exactly this type of formulation in the case d = 2, 3 in many scattered data
approximation methods (see Arge, Dæhlen and Tveito (1995) and Hoschek
and Lasser (1992)), where the regularization term is usually physically mo-
tivated.
We now restrict the problem to a finite-dimensional subspace V_N ⊂ V.
The function f is then replaced by

    f_N = Σ_{j=1}^{N} α_j ϕ_j(x).    (5.16)

Here, {ϕ_j}_{j=1}^{N} should span V_N and preferably should form a basis for V_N.
The coefficients {α_j}_{j=1}^{N} denote the degrees of freedom. Note that the restriction
to a suitably chosen finite-dimensional subspace involves some additional
regularization (regularization by discretization) which depends on the choice
of VN . In this way we obtain from the minimization problem a feasible linear
system. We thus have to minimize

    R(f_N) = (1/M) Σ_{i=1}^{M} (f_N(x_i) − y_i)² + λ‖P f_N‖²_{L2},    f_N ∈ V_N,    (5.17)

in the finite-dimensional space VN . We plug (5.16) into (5.17) and obtain


    R(f_N) = (1/M) Σ_{i=1}^{M} ( Σ_{j=1}^{N} α_j ϕ_j(x_i) − y_i )² + λ ‖P Σ_{j=1}^{N} α_j ϕ_j‖²_{L2}    (5.18)

           = (1/M) Σ_{i=1}^{M} ( Σ_{j=1}^{N} α_j ϕ_j(x_i) − y_i )² + λ Σ_{i=1}^{N} Σ_{j=1}^{N} α_i α_j (P ϕ_i, P ϕ_j)_{L2}.    (5.19)

Differentiation with respect to α_k, k = 1, . . . , N, gives

    0 = ∂R(f_N)/∂α_k
      = (2/M) Σ_{i=1}^{M} ( Σ_{j=1}^{N} α_j ϕ_j(x_i) − y_i ) · ϕ_k(x_i) + 2λ Σ_{j=1}^{N} α_j (P ϕ_j, P ϕ_k)_{L2}.    (5.20)



This is equivalent to (k = 1, . . . , N)

    λ Σ_{j=1}^{N} α_j (P ϕ_j, P ϕ_k)_{L2} + (1/M) Σ_{j=1}^{N} Σ_{i=1}^{M} α_j ϕ_j(x_i) · ϕ_k(x_i) = (1/M) Σ_{i=1}^{M} y_i ϕ_k(x_i),    (5.21)
and we obtain (k = 1, . . . , N)

    Σ_{j=1}^{N} α_j [ Mλ (P ϕ_j, P ϕ_k)_{L2} + Σ_{i=1}^{M} ϕ_j(x_i) · ϕ_k(x_i) ] = Σ_{i=1}^{M} y_i ϕ_k(x_i).    (5.22)

In matrix notation, we end up with the linear system

    (λC + B · B^T) α = B y.    (5.23)

Here, C is a square N × N matrix with entries C_{j,k} = M · (P ϕ_j, P ϕ_k)_{L2},
j, k = 1, . . . , N, and B is a rectangular N × M matrix with entries B_{j,i} =
ϕj (xi ), i = 1, . . . , M, j = 1, . . . , N . The vector y contains the data yi and
has length M . The unknown vector α contains the degrees of freedom
αj and has length N . With the gradient P = ∇ in the regularization
expression in (5.15), we obtain a Poisson problem with an additional term
that resembles the interpolation problem. The natural boundary conditions
for such a differential equation in Ω = [0, 1]d , for instance, are Neumann
conditions. The discretization (5.16) gives us then the linear system (5.23)
where C corresponds to a discrete Laplacian. To obtain the classifier fN ,
we now have to solve this system.
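For illustration, the system (5.23) can be assembled and solved in a few lines in the simplest nontrivial setting: hat functions on a uniform 1D grid with P = d/dx. This is only a sketch of the regularization network itself, not of the d-dimensional sparse grid spaces or the combination technique used below; all parameters are illustrative.

```python
import numpy as np

def regularization_network_1d(x, y, N, lam):
    """Solve (lam*C + B B^T) alpha = B y, cf. (5.23), for hat functions
    on a uniform grid of N nodes over [0,1].

    x, y: arrays of the M data points and values; lam: regularization
    parameter. Returns the grid nodes and the coefficients alpha.
    """
    M = len(x)
    h = 1.0 / (N - 1)
    nodes = np.linspace(0.0, 1.0, N)

    # B_{j,i} = phi_j(x_i): values of the hat functions at the data points
    B = np.maximum(0.0, 1.0 - np.abs(x[None, :] - nodes[:, None]) / h)

    # C_{j,k} = M * (phi_j', phi_k')_{L2}: scaled 1D stiffness matrix
    # with natural (Neumann) boundary conditions
    C = (M / h) * (2.0 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1))
    C[0, 0] = C[-1, -1] = M / h

    alpha = np.linalg.solve(lam * C + B @ B.T, B @ y)
    return nodes, alpha
```

With P = ∇, the matrix C is (up to scaling) the stiffness matrix of the Laplacian with natural boundary conditions, in accordance with the remark above.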
Again, the curse of dimensionality prohibits us from using conventional
finite element spaces living on a uniform grid for VN. The complexity would
grow exponentially with d. Instead, we used a sparse grid approach, namely
the combination method, as already described for the 2D and 3D case in
(4.41), (4.42), and Figure 4.14. These formulae can easily be generalized to
the d-dimensional case. Here we consider the minimization problem on a
sequence of grids, solve the resulting linear systems (5.23), and combine the
solutions accordingly. With respect to the number of levels n, the complexity
of the method is, as usual for regular sparse grids, of the order O(n^{d−1} 2^n).
With respect to the number M of training data, the cost complexity scales
linearly in M for a clever implementation. This is a substantial advantage
in comparison to most neural networks and support vector machines, which
generally scale at least quadratically in M and are therefore not suited to
problems with very large data sets.
We now apply our approach to different test data sets. Here we use
both synthetic data generated by DatGen (Melli 2003) and real data from
practical data mining applications. All the data sets are rescaled to [0, 1]d .
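The rescaling to [0, 1]^d is a componentwise min–max transformation; a minimal sketch (rows are data points, columns are features):

```python
import numpy as np

def rescale_to_unit_cube(X):
    """Componentwise min-max rescaling of a data set to [0,1]^d.
    Constant features are mapped to 0 to avoid division by zero."""
    lo = X.min(axis=0)
    rng = X.max(axis=0) - lo
    return (X - lo) / np.where(rng > 0, rng, 1.0)
```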



Figure 5.25. Left: chessboard data set, combination technique
with level 10, λ = 4.53999 · 10−5. Right: plot of the 10-fold
correctness (in %, for training and testing data on levels 7–10)
in dependence on λ (in logscale).

To evaluate our method we give the correctness rates on testing data sets
if available and the 10-fold cross-validation results on training and testing
data sets. For the 10-fold cross-validation we proceed as follows. We divide
the training data into 10 equally sized disjoint subsets. For i = 1 to 10, we
pick the ith of these subsets as a further testing set and build the sparse grid
combination classifier with the data from the remaining nine subsets. We
then evaluate the correctness rates of the current training and testing set. In
this way we obtain ten different training and testing correctness rates. The
10-fold cross-validation result is just the average of these ten correctness
rates. For further details, see Stone (1974). For a critical discussion on the
evaluation of the quality of classifier algorithms, see Salzberg (1997).
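The procedure just described can be sketched independently of the underlying classifier; `train_and_predict` below stands in for the sparse grid combination classifier (its interface here is an assumption for illustration, mapping training data to a prediction function):

```python
import numpy as np

def ten_fold_correctness(X, y, train_and_predict, folds=10):
    """10-fold cross-validation: split the data into `folds` disjoint
    subsets, train on the remaining nine, and average the training and
    testing correctness rates over all folds."""
    idx = np.array_split(np.arange(len(y)), folds)
    train_rates, test_rates = [], []
    for i in range(folds):
        test = idx[i]
        train = np.concatenate([idx[j] for j in range(folds) if j != i])
        predict = train_and_predict(X[train], y[train])
        train_rates.append(np.mean(predict(X[train]) == y[train]))
        test_rates.append(np.mean(predict(X[test]) == y[test]))
    return np.mean(train_rates), np.mean(test_rates)
```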
We first consider 2D problems with small sets of data that correspond to
certain structures. Then we treat problems with huge sets of synthetic data
with up to 5 million points.
The first example is taken from Ho and Kleinberg (1996) and Kaufman
(1999). Here, 1000 training data points were given which are more or less
uniformly distributed in Ω = [0, 1]2 . The associated data values are plus one
or minus one depending on their location in Ω such that a 4 × 4 chessboard
structure appears: see Figure 5.25 (left). We computed the 10-fold cross-
validated training and testing correctness with the sparse grid combination
method for different values of the regularization parameter λ and different
levels n. The results are shown in Figure 5.25 (right).
We see that the 10-fold testing correctness is well around 95% for values
of λ between 3 · 10−5 and 5 · 10−3 . Our best 10-fold testing correctness was
96.20% on level 10 with λ = 4.54 · 10−5 . The chessboard structure is thus
reconstructed with less than 4% error.



Figure 5.26. Spiral data set, sparse grid with level 6 (left)
and 8 (right), λ = 0.001.

Another 2D example with structure is the spiral data set, first proposed
by Wieland; see also Fahlmann and Lebiere (1990). Here, 194 data points
describe two intertwined spirals: see Figure 5.26. This is surely an artificial
problem, which does not appear in practical applications. However, it serves
as a hard test case for new data mining algorithms. It is known that neural
networks can have severe problems with this data set, and some neural
networks cannot separate the two spirals at all. In Figure 5.26 we give the
results obtained with our sparse grid combination method with λ = 0.001
for n = 6 and n = 8. Already for level 6, the two spirals are clearly detected
and resolved. Note that here only 577 grid points are contained in the sparse
grid. For level 8 (2817 sparse grid points), the shape of the two reconstructed
spirals gets smoother and the reconstruction gets more precise.
The BUPA Liver Disorders data set from the Irvine Machine Learning
Database Repository (Blake and Merz 1998) consists of 345 data points with
six features plus a selector field used to split the data into two sets with
145 instances and 200 instances, respectively. Here we only have training
data and can therefore only report our 10-fold cross-validation results. No
comparison with unused test data is possible.
We compare with the two best results from Lee and Mangasarian (2001),
the smoothed support vector machine (SSVM) introduced therein, and the
feature selection concave minimization (FSV) algorithm due to Bradley and
Mangasarian (1998). Table 5.1 gives the results for the 10-fold correctness.
Our sparse grid combination approach achieves 69.23% 10-fold testing
correctness on level 3 with λ = 0.0625, and our other results were also in
this range. Our method performs only slightly worse here than the SSVM but



Table 5.1. Results for the BUPA liver disorders data set.

                              sparse grid combination method
           SSVM    FSV     level 1     level 2    level 3     level 4
                           λ = 0.0001  λ = 0.1    λ = 0.0625  λ = 0.625

train. %   70.37   68.18   83.28       79.54      90.20       88.66
test. %    70.33   65.20   66.89       66.38      69.23       68.74

Table 5.2. Results for a 6D synthetic massive data set, λ = 0.01.

                       training  testing  total       data matrix  # of
         # of points   corr.     corr.    time (sec)  time (sec)   iterat.

level 1  50 000        87.9      88.1     158         152          41
         500 000       88.0      88.1     1570        1528         44
         5 million     88.0      88.0     15933       15514        46

level 2  50 000        89.2      89.3     1155        1126         438
         500 000       89.4      89.2     11219       11022        466
         5 million     89.3      89.2     112656      110772       490

clearly better than FSV. Note that the results for the robust linear program
(RLP) algorithm (Bennett and Mangasarian 1992), the support vector
machine using the 1-norm approach (SVM_{‖·‖_1}), and the classical support
vector machine (SVM_{‖·‖_2²}) (Bradley and Mangasarian 1998, Cherkassky and
Mulier 1998, Vapnik 1995) were reported to be somewhat worse in Lee and
Mangasarian (2001).
Next, we produced a 6D data set with 5 million training points and 20 000
testing points with DatGen (Melli 2003). We used the call
datgen -r1 -X0/100,R,O:0/100,R,O:0/100,R,O:0/100,R,O:0/200,R,O:0/200,R,O
    -R2 -C2/4 -D2/5 -T10/60 -O5020000 -p -e0.15.
The results are given in Table 5.2. On level one, a testing correctness of
88% was achieved already, which is quite satisfying for this data. We see
that really huge data sets of 5 million points could be handled. We also
give the CPU time which is needed for the computation of the matrices
G_l = B_l · B_l^T. Here, more than 96% of the computing time is spent on the
matrix assembly. Again, the execution times scale linearly with the number
of data points.



Analogously, we can at present deal with up to 18-dimensional prob-
lems before storage limitations on our available computers stop us. This
moderate number of dimensions, however, is still sufficient for many prac-
tical applications. In very high-dimensional data, there exist mostly strong
correlations and dependencies between the dimensions. Then, in a prepro-
cessing step, the effective dimensions can be determined, for instance, by
means of principal component analysis, and the dimension can be reduced
in many cases to 18 or less.
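Such a preprocessing step can be sketched with a standard SVD-based principal component analysis; the variance threshold is an illustrative choice, not a prescription from the text:

```python
import numpy as np

def pca_reduce(X, tol=0.99):
    """Determine the effective dimensions of a data set by principal
    component analysis and project onto them.

    tol: fraction of the total variance to retain. Returns the reduced
    data and the number k of effective dimensions.
    """
    Xc = X - X.mean(axis=0)                    # centre the data
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    var = s**2 / np.sum(s**2)                  # variance fractions
    k = int(np.searchsorted(np.cumsum(var), tol)) + 1
    return Xc @ Vt[:k].T, k
```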
To reach 18 dimensions, a generalization of the sparse grid combination
technique to simplicial basis functions (Garcke and Griebel 2001a, 2002) is
needed. Note that the sparse grid combination technique can be parallelized
in a straightforward way (Garcke and Griebel 2001b, Garcke, Hegland and
Nielsen 2003). Finally, the sparse grid technique can be used in a dimension-
adaptive fashion (Hegland 2002, 2003), which further enhances the method’s
capabilities. This approach will be discussed for the example of numerical
integration now.

Integration
The computation of high-dimensional integrals is a central part of computer
simulations in many application areas such as statistical mechanics, finan-
cial mathematics, and computational physics. Here, the arising integrals
usually cannot be solved analytically, and thus numerical approaches are
required. Furthermore, often a high-accuracy solution is needed, and thus
such problems can be computationally quite challenging even for parallel
supercomputers. Conventional algorithms for the numerical computation of
such integrals are usually limited by the curse of dimensionality. However,
for special function classes, such as spaces of functions which have bounded
mixed derivatives, Smolyak’s construction (Smolyak 1963) (see (4.1)) can
overcome this curse to a certain extent. In this approach, multivariate
quadrature formulas are constructed using combinations of tensor products
of appropriate 1D formulas. In this way, the number of function evalu-
ations and the numerical accuracy become independent of the dimension
of the problem up to logarithmic factors. Smolyak’s construction is simply
our sparse grid approach. It has been applied to numerical integration
by several authors, using the midpoint rule (Baszenski and Delvos 1993),
the rectangle rule (Paskov 1993), the trapezoidal rule (Bonk 1994a), the
Clenshaw–Curtis rule (Cools and Maerten 1997, Novak and Ritter 1998),
the Gauss rules (Novak and Ritter 1997), and the Gauss–Patterson rules
(Gerstner and Griebel 1998, Petras 2000) as the 1D basis integration pro-
cedure. The latter approach, in particular, achieves the highest polynomial
exactness of all nested quadrature formulas and shows very good results
for sufficiently smooth multivariate integrands. Further studies have been



made concerning extrapolation methods (Bonk 1994a), discrepancy meas-
ures (Frank and Heinrich 1996), and complexity questions (Wasilkovski and
Woźniakowski 1999).
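As an illustration of Smolyak's construction in its combination form, the following sketch builds a sparse grid quadrature rule on [0, 1]^d from nested 1D trapezoidal rules; the choice of the trapezoidal rule as the 1D basis procedure and the level parameters are for illustration only.

```python
import numpy as np
from itertools import product
from math import comb

def trapez_1d(level):
    """Nested 1D trapezoidal rule on [0,1] with 2^level + 1 points."""
    m = 2**level + 1
    x = np.linspace(0.0, 1.0, m)
    w = np.full(m, 1.0 / 2**level)
    w[0] = w[-1] = 0.5 / 2**level
    return x, w

def smolyak_quadrature(f, d, n):
    """Smolyak sparse grid quadrature via the combination formula:
    a signed sum of product rules over all levels l with
    n - d + 1 <= |l|_1 <= n (levels >= 1 in each direction)."""
    total = 0.0
    for l in product(range(1, n + 1), repeat=d):
        q = sum(l)
        if not (n - d + 1 <= q <= n):
            continue
        coeff = (-1) ** (n - q) * comb(d - 1, n - q)
        pts, wts = zip(*(trapez_1d(li) for li in l))
        grids = np.meshgrid(*pts, indexing="ij")
        W = np.ones_like(grids[0])
        for k in range(d):           # tensor product of the 1D weights
            shape = [1] * d
            shape[k] = -1
            W = W * wts[k].reshape(shape)
        total += coeff * np.sum(W * f(*grids))
    return total
```

For d = 2 and level n, this combines all product rules with n − 1 ≤ l1 + l2 ≤ n with coefficients −1 and +1, respectively.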
There is also a large variety of other methods for the numerical integration
of multivariate functions such as Monte Carlo and quasi-Monte Carlo meth-
ods (Niederreiter 1992), lattice rules (Sloan and Joe 1994), adaptive subdivi-
sion methods (Genz and Malik 1980, Dooren and Ridder 1976), and approx-
imation methods based on neural networks (Barron 1994, Mhaskar 1996).
Each of these methods is particularly suitable for functions from a certain
function class and has a complexity which is then also independent or nearly
independent of the problem’s dimensionality.
Despite the large improvements of the quasi-Monte Carlo and sparse grid
methods over the Monte Carlo method, their convergence rates will suffer
more and more with rising dimension owing to their respective dependence
on the dimension in the logarithmic terms. Therefore, one aim of recent
numerical approaches has been to reduce the dimension of the integration
problem without affecting the accuracy unduly.
In some applications, the different dimensions of the integration problem
are not equally important. For example, in path integrals the number of
dimensions corresponds to the number of time-steps in the time discretiza-
tion. Typically, the first steps in the discretization are more important than
the last steps since they determine the outcome more substantially. In other
applications, although the dimensions seem to be of the same importance
at first sight, the problem can be transformed into an equivalent one where
the dimensions are not. Examples are the Brownian bridge discretization
or the Karhunen–Loève decomposition of stochastic processes.
Intuitively, problems where the different dimensions are not of equal
importance might be easier to solve: numerical methods could concen-
trate on the more important dimensions. Interestingly, complexity the-
ory also reveals that integration problems with weighted dimensions can
become tractable even if the unweighted problem is not (Wasilkovski and
Woźniakowski 1999). Unfortunately, classical adaptive numerical integra-
tion algorithms (Genz and Malik 1980, Dooren and Ridder 1976) cannot be
applied to high-dimensional problems, since the work overhead in order to
find and adaptively refine in important dimensions would be too large.
To this end, a variety of algorithms have been developed that try to
find and quantify important dimensions. Often, the starting point of these
algorithms is Kolmogorov’s superposition theorem: see Kolmogorov (1956,
1957). Here, a high-dimensional function is approximated by sums of lower-
dimensional functions. A survey of this approach from the perspective of
approximation theory is given in Khavinson (1997). Further results can
be found in Rassias and Simsa (1995) and Simsa (1992). Analogous ideas
are followed in statistics for regression problems and density estimation.

https://doi.org/10.1017/S0962492904000182 Published online by Cambridge University Press
Figure 5.27. The resulting index sets and corresponding sparse grids for TOL = 10^{−15} for some isotropic test functions: 10·e^{−x²} + 10·e^{−y²}, e^x + e^y + e^{xy}, and e^{−10x²−10y²}.

Here, examples are so-called additive models (Hastie and Tibshirani 1990),
multivariate adaptive regression splines (MARS) (Friedman 1991), and the
ANOVA decomposition (Wahba 1990, Yue and Hickernell 2002); see also
Hegland and Pestov (1999). Other interesting techniques for dimension
reduction are presented in He (2001). If the importance of the dimensions
is known a priori, techniques such as importance sampling can be applied
in Monte Carlo methods (Kalos and Whitlock 1986). For the quasi-Monte
Carlo method, a sorting of the dimensions according to their importance
leads to a better convergence rate (yielding a reduction of the effective
dimension). The reason for this is the better distributional behaviour of
low-discrepancy sequences in lower dimensions than in higher ones (Caflisch,
Morokoff and Owen 1997). The sparse grid method, however, a priori treats
all dimensions equally and thus gains no immediate advantage for problems
where dimensions are of different importance.
In Gerstner and Griebel (2003), we developed a generalization of the
conventional sparse grid approach which is able to adaptively assess the
dimensions according to their importance, and thus reduces the dependence
of the computational complexity on the dimension. This is quite in the spirit
of Hegland (2002, 2003). The dimension-adaptive algorithm tries to find

Figure 5.28. The resulting index sets and corresponding sparse grids for TOL = 10^{−15} for some anisotropic test functions: e^{−x²} + 10·e^{−y²}, e^x + 10·e^y + 10·e^{xy}, and e^{−10x²−5y²}.

important dimensions automatically and adapts by placing more integration
points in those dimensions. To achieve this efficiently, a data structure for
fast bookkeeping and searching of generalized sparse grid index sets is
necessary.
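This bookkeeping can be sketched compactly. The following is a minimal Python version of the active/old index-set management in the spirit of Gerstner and Griebel (2003); the per-index error indicator is a caller-supplied placeholder (the actual algorithm derives it from differences of quadrature rules), so this is an illustrative sketch rather than the authors' implementation.

```python
import heapq

def dimension_adaptive_indices(d, indicator, tol, max_steps=1000):
    """Greedy growth of a generalized sparse grid index set.

    Keeps an 'old' set of accepted multi-indices and an 'active' set of
    candidates; repeatedly moves the candidate with the largest estimated
    error contribution to the old set and inserts its admissible forward
    neighbours.  'indicator(k)' is a placeholder error estimate.
    """
    start = (1,) * d
    old = set()
    active = {start: indicator(start)}
    heap = [(-active[start], start)]            # max-heap via negated keys
    while active and sum(active.values()) > tol and max_steps > 0:
        max_steps -= 1
        _, k = heapq.heappop(heap)
        if k not in active:                     # stale heap entry
            continue
        del active[k]
        old.add(k)
        for j in range(d):                      # forward neighbours k + e_j
            nb = k[:j] + (k[j] + 1,) + k[j + 1:]
            # admissible iff every backward neighbour is already accepted
            admissible = all(
                nb[i] == 1 or (nb[:i] + (nb[i] - 1,) + nb[i + 1:]) in old
                for i in range(d))
            if admissible and nb not in active and nb not in old:
                active[nb] = indicator(nb)
                heapq.heappush(heap, (-active[nb], nb))
    return old, set(active)
```

With an anisotropic indicator such as `lambda k: 4.0 ** (-(k[0] + 2 * k[1]))`, the returned index set is refined considerably deeper in the first dimension, mirroring the behaviour seen in Figure 5.28.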
We will now show the performance of the dimension-adaptive algorithm in
numerical examples. First, we consider some 2D test functions, which allows
us to show the resulting grids and level index sets. In these cases, the exact
value of the integral is known (or can be computed quickly). The second
example is a path integral of 32 dimensions in which the integral value is
also known beforehand. The third example is a 256-dimensional application
problem from finance where the exact value is unknown. We use the well-
known Patterson formulas (Patterson 1986) for univariate quadrature in
all examples. These were shown to be a good choice for the sparse grid
construction by Gerstner and Griebel (1998).
Let us first consider simple combinations of exponential functions defined
over [0, 1]2 . In Figures 5.27 and 5.28 we depict the level index sets used
by the algorithm as well as the resulting dimension-adapted sparse grids
for some isotropic and some anisotropic functions, respectively. In these
examples, the selected error threshold is TOL = 10−15 .

The first example is a sum of 1D functions. The dimension-adaptive
algorithm correctly selects no indices in joint dimensions. Also, more points
are placed in the x-direction than in the y-direction in the anisotropic case.
Clearly, here the conventional sparse grid would use too many points in joint
directions.
The second example is not separable, nor does it have product structure.
The resulting level index set is almost triangular, like the conventional sparse
grid. However, the dimension-adaptive algorithm chooses to select more
points on the axes, while the conventional sparse grid would have used too
many points in the interior. In our experience, many application problems
fall in this category, which we would call nearly additive.
The third example is the well-known Gaussian hat function, and has
product structure. In this example, many points in joint dimensions are
required, and the conventional sparse grid would have placed too few points
there. At first sight, this is a surprising result, since product functions
should be more easily integrable by a tensor product approach. However,
the mixed derivatives of the Gaussian can become large even if they are
bounded, which reduces the efficiency of both the conventional sparse grid
and the dimension-adaptive approaches.
Let us now approach some higher-dimensional problems. We will first
consider an initial value problem given by the linear PDE

∂u/∂t (x, t) = (1/2) · ∂²u/∂x² (x, t) + v(x, t) · u(x, t),

with initial condition u(x, 0) = f(x). The solution of this problem can be
obtained with the Feynman–Kac formula as

u(x, t) = E_{x,0} [ f(ξ(t)) · exp( ∫₀ᵗ v(ξ(r), t − r) dr ) ],

where ξ represents a Wiener path starting at ξ(0) = x. The expectation
E_{x,0} can be approximated by a discretization of time using a finite number
of time-steps t_i = i · ∆t with ∆t = t/d. The integral in the exponent is
approximated by a 1D quadrature formula such as a sufficiently accurate
trapezoidal rule.
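For one fixed discretized path, the quantity inside the expectation can be evaluated directly. The following sketch (function and variable names are ours, not from the text) uses the trapezoidal rule for the integral in the exponent, as suggested above:

```python
import math

def feynman_kac_integrand(path, f, v, t):
    """Evaluate f(xi(t)) * exp(integral of v(xi(r), t - r) dr) along a
    discretized path xi(0), xi(dt), ..., xi(t), approximating the exponent
    integral by the trapezoidal rule on the time grid t_i = i * dt."""
    d = len(path) - 1
    dt = t / d
    vals = [v(path[i], t - i * dt) for i in range(d + 1)]
    exponent = dt * (0.5 * vals[0] + sum(vals[1:d]) + 0.5 * vals[d])
    return f(path[d]) * math.exp(exponent)
```

Averaging this integrand over many discretized paths approximates the expectation E_{x,0}; in the quadrature context it is the d-dimensional integrand to which the sparse grid rule is applied.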
The most natural way to discretize the Wiener path is by a random walk,
i.e., by the recursive formula

ξ_k = ξ_{k−1} + √∆t · z_k,

where ξ_0 = x and the z_k are normally distributed random variables with mean
zero and variance one. The dimensions in the random walk discretization
are all of the same importance, since all the variances are identical to ∆t.
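A minimal sketch of this random walk discretization follows; the `rng` argument is an illustrative choice, since in the quadrature context the z_k are integration variables rather than random draws:

```python
import math
import random

def wiener_random_walk(x, t, d, rng):
    """Random walk discretization of a Wiener path on [0, t]:
    d increments, each with variance dt = t / d."""
    dt = t / d
    path = [x]
    for _ in range(d):
        path.append(path[-1] + math.sqrt(dt) * rng.gauss(0.0, 1.0))
    return path
```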

Figure 5.29. Computational results for the path integral (d = 32):
integration error vs. number of function evaluations (left) and
maximum level over all dimensions (sorted) for the dimension-
adaptive algorithm with Brownian bridge discretization (right).
The left panel compares the conventional sparse grid and the
dimension-adaptive method, each with the random walk and the
Brownian bridge discretization.

In the Brownian bridge discretization (Caflisch et al. 1997), however, the
path is discretized using a future and a past value,

ξ_k = (ξ_{k−h} + ξ_{k+h})/2 + √(h · ∆t / 2) · z_k.

Starting with ξ_0 := x and ξ_d := x + √t · z_d, the subsequent values to
be computed are ξ_{d/2}, ξ_{d/4}, ξ_{3d/4}, ξ_{d/8}, ξ_{3d/8}, ξ_{5d/8}, ξ_{7d/8}, ξ_{d/16}, ξ_{3d/16}, . . . ,
with corresponding h = d/2, d/4, d/4, d/8, d/8, d/8, d/8, d/16, d/16, . . . .
The Brownian bridge leads to a concentration of the total variance in the
first few steps of the discretization and thus to a weighting of the dimensions.
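The construction order above can be coded as follows; this is a sketch assuming d is a power of two, with h measured in time-steps, so that the conditional variance of each midpoint is h·∆t/2:

```python
import math

def wiener_brownian_bridge(x, t, d, z):
    """Brownian bridge discretization of a Wiener path on [0, t].

    d must be a power of two; z is a list of d standard normal values,
    ordered from most to least important (endpoint first, then the
    successively refined midpoints)."""
    dt = t / d
    xi = [0.0] * (d + 1)
    xi[0] = x
    xi[d] = x + math.sqrt(t) * z[0]
    used = 1
    h = d // 2
    while h >= 1:
        for mid in range(h, d, 2 * h):          # d/2, then d/4, 3d/4, ...
            xi[mid] = 0.5 * (xi[mid - h] + xi[mid + h]) \
                + math.sqrt(h * dt / 2.0) * z[used]
            used += 1
        h //= 2
    return xi
```

Since z[0] alone carries variance t and later entries carry geometrically less, the early components dominate the path, which is exactly the weighting of dimensions exploited by the dimension-adaptive algorithm.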
Let us now consider the concrete example (Morokoff and Caflisch 1995)

v(x, t) = 1/(t + 1) + 1/(x² + 1) − 4x²/(x² + 1)²,

with initial condition u(x, 0) = 1/(x² + 1). The exact solution is then

u(x, t) = (t + 1)/(x² + 1).
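That this u indeed solves the PDE can be verified by a quick finite-difference sanity check (the step size h = 1e-4 and the function names are our own choices):

```python
def u_exact(x, t):
    """Exact solution u(x, t) = (t + 1) / (x^2 + 1)."""
    return (t + 1.0) / (x * x + 1.0)

def v_potential(x, t):
    """Potential v(x, t) = 1/(t+1) + 1/(x^2+1) - 4x^2/(x^2+1)^2."""
    return 1.0 / (t + 1.0) + 1.0 / (x * x + 1.0) \
        - 4.0 * x * x / (x * x + 1.0) ** 2

def pde_residual(x, t, h=1e-4):
    """Residual of u_t - 0.5 * u_xx - v * u via central differences;
    it should vanish up to discretization and rounding error."""
    u_t = (u_exact(x, t + h) - u_exact(x, t - h)) / (2.0 * h)
    u_xx = (u_exact(x + h, t) - 2.0 * u_exact(x, t)
            + u_exact(x - h, t)) / (h * h)
    return u_t - 0.5 * u_xx - v_potential(x, t) * u_exact(x, t)
```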
The results for d = 32, t = 0.02 and x = 0 are shown on the left-hand side of
Figure 5.29. We see the integration error plotted against the number of func-
tion evaluations in a log-log scale. Here, the conventional sparse grid method
is compared with the dimension-adaptive algorithm for the random walk and
Brownian bridge discretizations. In this example, the conventional sparse
grid is obviously close to the optimum for the random walk discretization,
since the dimension-adaptive method cannot improve on its performance.
The conventional sparse grid gains no advantage from the Brownian bridge
discretization, but the convergence rate of the dimension-adaptive algorithm

is dramatically improved. Note that the convergence rate of the quasi-Monte
Carlo method (with Brownian bridge) is comparable to that of the con-
ventional sparse grid approach (Morokoff and Caflisch 1995, Gerstner and
Griebel 1998). On the right-hand side of Figure 5.29, we plot the maximum
level per dimension of the final level index set of the dimension-adaptive
method with and without the Brownian bridge discretization. Here the di-
mensions are sorted according to this quantity. For the Brownian bridge
discretization, the maximum level decays with the dimension. This shows
that only a few dimensions are important and thus contribute substantially
to the total integral while the other dimensions add significantly less.
Let us now consider a typical collateralized mortgage obligation (CMO)
problem, involving several tranches that in turn derive their cash flows from an
underlying pool of mortgages (Caflisch et al. 1997, Paskov and Traub 1995).
The problem is to estimate the expected value of the sum of present values of
future cash flows for each tranche. Let us assume that the pool of mortgages
has a 21 1/3 year maturity and cash flows are obtained monthly. Then the
expected value requires the evaluation of an integral of dimension d = 256
for each tranche,

∫_{R^d} v(ξ_1, . . . , ξ_d) · g(ξ_1) · . . . · g(ξ_d) dξ_1 · · · dξ_d,

with Gaussian weights g(ξ_i) = (2πσ²)^{−1/2} e^{−ξ_i²/(2σ²)}. The sum of the future
cash flows v is basically a function of the interest rates i_k (for month k),

i_k := K_0 · e^{ξ_1+···+ξ_k} · i_0,
with a certain normalizing constant K0 and an initial interest rate i0 (for
details see the first example in Caflisch et al. (1997) and compare Gerst-
ner and Griebel (1998) and Paskov and Traub (1995)). Again the interest
rates can be discretized using either a random walk or the Brownian bridge
construction. For the numerical computation, the integral over Rd is trans-
formed into an unweighted integral on [0, 1]d with the help of the inverse
normal distribution.
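This substitution can be sketched generically. The helper below (names are illustrative) uses the inverse normal CDF from Python's standard library; substituting ξ_i = Φ^{−1}(y_i) introduces a Jacobian 1/g(ξ_i) that cancels the Gaussian weight exactly, leaving an unweighted integrand on (0, 1)^d:

```python
from statistics import NormalDist

def to_unit_cube(v, sigma):
    """Turn a Gaussian-weighted integrand v on R^d into an unweighted
    integrand on the open unit cube (0, 1)^d, via the componentwise
    inverse CDF of the N(0, sigma^2) distribution."""
    phi_inv = NormalDist(mu=0.0, sigma=sigma).inv_cdf
    def w(y):
        # y is a point in (0,1)^d; map each coordinate back to R
        return v([phi_inv(yi) for yi in y])
    return w
```

The resulting integrand w can then be handed to any quadrature rule on the unit cube, such as the sparse grid Patterson construction used here.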
In Figure 5.30, we again compare the conventional sparse grid method
with the dimension-adaptive method for the random walk and the Brownian
bridge discretization. The error is computed against an independent quasi-
Monte Carlo calculation. Note also that in this example the convergence
rate of the conventional sparse grid approach is comparable to the quasi-
Monte Carlo method (Gerstner and Griebel 1998).
We see again that a weighting of the dimensions does not influence the
convergence of the conventional sparse grid method. But for the dimension-
adaptive method the amount of work is again substantially reduced (by
several orders of magnitude) for the same accuracy when the Brownian
bridge discretization is used, and thus higher accuracies can be obtained.

Figure 5.30. Computational results for the CMO problem
(d = 256): integration error vs. number of function evaluations
(left) and maximum level over all dimensions (sorted) for the
dimension-adaptive algorithm with and without Brownian bridge
discretization (right).

In this example the dimension-adaptive method also gives better results
than the conventional sparse grid method for the random walk discretiza-
tion. This implies that the conventional sparse grid uses too many points in
mixed dimensions for this problem. The problem seems to be intrinsically
lower-dimensional and nearly additive (Caflisch et al. 1997).
At present, we are working to carry this dimension-adaptive approach
over to PDEs.

6. Concluding remarks
In this contribution we have given an overview of the basic principles and
properties of sparse grids as well as a report on the state of the art con-
cerning sparse grid applications. Starting from the dominant motivation –
breaking the curse of dimensionality – we discussed the underlying tensor
product approach, based upon different 1D multiscale bases such as the
classical piecewise linear hierarchical basis, general hierarchical polynomial
bases, interpolets, or wavelets. We then presented various resulting sparse
grid constructions and discussed their properties with respect to computa-
tional complexity, discretization error, and smoothness requirements. The
approach can be extended to nonsmooth solutions by adaptive refinement
methods. We demonstrated the effectiveness of sparse grids in a series of ap-
plications. The presented numerical results include 2D and 3D PDE model
problems, flow problems, and even two non-PDE applications in higher di-
mensions, namely numerical quadrature and data mining.
Since their introduction slightly more than a decade ago, sparse grids have
seen a very successful development and a variety of different applications.

Especially for higher-dimensional scenarios, we are convinced that sparse
grids, together with dimension adaptivity, will also have a thriving future.
For readers who want to stay up-to-date on sparse grid research, we
refer to the sparse grid bibliography at www.ins.uni-bonn.de/info/sgbib,
which gives roughly 300 articles from the past 40 years.

REFERENCES
S. Achatz (2003a), Adaptive finite Dünngitter-Elemente höherer Ordnung für ellip-
tische partielle Differentialgleichungen mit variablen Koeffizienten, Disserta-
tion, Institut für Informatik, TU München.
S. Achatz (2003b), ‘Higher order sparse grids methods for elliptic partial differential
equations with variable coefficients’, Computing 71, 1–15.
G. Allaire (1992), ‘Homogenization and two-scale convergence’, SIAM J. Math.
Anal. 21, 1482–1516.
A. Allen (1972), Regression and the Moore–Penrose Pseudoinverse, Academic
Press, New York.
E. Arge, M. Dæhlen and A. Tveito (1995), ‘Approximation of scattered data using
smooth grid functions’, J. Comput. Appl. Math. 59, 191–205.
K. Babenko (1960), ‘Approximation by trigonometric polynomials in a certain
class of periodic functions of several variables’, Soviet Math. Dokl. 1, 672–
675. Russian original in Dokl. Akad. Nauk SSSR 132 (1960), 982–985.
R. Balder (1994), Adaptive Verfahren für elliptische und parabolische Differen-
tialgleichungen auf dünnen Gittern, Dissertation, Institut für Informatik, TU
München.
R. Balder and C. Zenger (1996), ‘The solution of multidimensional real Helmholtz
equations on sparse grids’, SIAM J. Sci. Comp. 17, 631–646.
R. Balder, U. Rüde, S. Schneider and C. Zenger (1994), Sparse grid and extra-
polation methods for parabolic problems, in Proc. International Conference
on Computational Methods in Water Resources, Heidelberg 1994 (A. Peters
et al., eds), Kluwer Academic, Dordrecht, pp. 1383–1392.
A. Barron (1993), ‘Universal approximation bounds for superpositions of a sig-
moidal function’, IEEE Trans. Inform. Theory 39, 930–945.
A. Barron (1994), ‘Approximation and estimation bounds for artificial neural net-
works’, Machine Learning 14, 115–133.
G. Baszenski and F. Delvos (1993), Multivariate Boolean midpoint rules, in Nu-
merical Integration IV (H. Brass and G. Hämmerlin, eds), Vol. 112 of Inter-
national Series of Numerical Mathematics, Birkhäuser, Basel, pp. 1–11.
G. Baszenski, F. Delvos and S. Jester (1992), Blending approximations with sine
functions, in Numerical Methods of Approximation Theory 9 (D. Braess and
L. Schumaker, eds), Vol. 105 of International Series of Numerical Mathemat-
ics, Birkhäuser, Basel, pp. 1–19.
B. J. C. Baxter and A. Iserles (2003), ‘On the foundations of computational math-
ematics’, in Handbook of Numerical Analysis, Vol. 11 (F. Cucker, ed.), El-
sevier, pp. 3–35.

R. Becker and R. Rannacher (1996), ‘A feed-back approach to error control in finite
element methods: Basic analysis and examples’, East–West J. Numer. Math.
4, 237–264.
J. Bell, P. Colella and H. Glaz (1989), ‘A second order projection method for the
incompressible Navier–Stokes equations’, J. Comput. Phys. 85, 257–283.
R. Bellman (1961), Adaptive Control Processes: A Guided Tour, Princeton Uni-
versity Press.
K. Bennett and O. Mangasarian (1992), ‘Robust linear programming discrimin-
ation of two linearly inseparable sets’, Optimiz. Methods and Software 1,
23–34.
M. Berry and G. Linoff (2000), Mastering Data Mining, Wiley.
C. Blake and C. Merz (1998), ‘UCI repository of machine learning databases’.
www.ics.uci.edu/~mlearn/MLRepository.html
T. Bonk (1994a), Ein rekursiver Algorithmus zur adaptiven numerischen Quad-
ratur mehrdimensionaler Funktionen, Dissertation, Institut für Informatik,
TU München.
T. Bonk (1994b), A new algorithm for multi-dimensional adaptive numerical
quadrature, in Adaptive Methods: Algorithms, Theory, and Applications
(W. Hackbusch and G. Wittum, eds), Vol. 46 of Notes on Numerical Fluid
Mechanics, Vieweg, Braunschweig/Wiesbaden, pp. 54–68.
P. Bradley and O. Mangasarian (1998), Feature selection via concave minimization
and support vector machines, in Machine Learning: Proc. 15th International
Conference; ICML ’98 (J. Shavlik, ed.), Morgan Kaufmann, pp. 82–90.
H.-J. Bungartz (1992a), An adaptive Poisson solver using hierarchical bases and
sparse grids, in Iterative Methods in Linear Algebra (P. de Groen and R. Beau-
wens, eds), Elsevier, Amsterdam, pp. 293–310.
H.-J. Bungartz (1992b), Dünne Gitter und deren Anwendung bei der adaptiven
Lösung der dreidimensionalen Poisson-Gleichung, Dissertation, Institut für
Informatik, TU München.
H.-J. Bungartz (1996), Concepts for higher order finite elements on sparse grids, in
Proc. International Conference on Spectral and High Order Methods, Houston
1995 (A. Ilin and L. Scott, eds), Houston J. Math., pp. 159–170.
H.-J. Bungartz (1997), ‘A multigrid algorithm for higher order finite elements on
sparse grids’, ETNA 6, 63–77.
H.-J. Bungartz (1998), Finite elements of higher order on sparse grids, Habilita-
tionsschrift, Institut für Informatik, TU München and Shaker Verlag, Aachen.
H.-J. Bungartz and S. Dirnstorfer (2003), ‘Multivariate quadrature on adaptive
sparse grids’, Computing 71, 89–114.
H.-J. Bungartz and T. Dornseifer (1998), Sparse grids: Recent developments for
elliptic partial differential equations, in Multigrid Methods V (W. Hackbusch
and G. Wittum, eds), Vol. 3 of Lecture Notes in Computational Science and
Engineering, Springer, Berlin/Heidelberg.
H.-J. Bungartz and M. Griebel (1999), ‘A note on the complexity of solving Pois-
son’s equation for spaces of bounded mixed derivatives’, J. Complexity 15,
167–199.
H.-J. Bungartz and W. Huber (1995), First experiments with turbulence simula-
tion on workstation networks using sparse grid methods, in Computational

Fluid Dynamics on Parallel Systems (S. Wagner, ed.), Vol. 50 of Notes on
Numerical Fluid Mechanics, Vieweg, Braunschweig/Wiesbaden.
H.-J. Bungartz, M. Griebel and U. Rüde (1994a), ‘Extrapolation, combination, and
sparse grid techniques for elliptic boundary value problems’, Comput. Meth.
Appl. Mech. Eng. 116, 243–252.
H.-J. Bungartz, M. Griebel, D. Röschke and C. Zenger (1994b), ‘Pointwise conver-
gence of the combination technique for the Laplace equation’, East–West J.
Numer. Math. 2, 21–45.
H.-J. Bungartz, M. Griebel, D. Röschke and C. Zenger (1994c), Two proofs of con-
vergence for the combination technique for the efficient solution of sparse grid
problems, in Domain Decomposition Methods in Scientific and Engineering
Computing (D. Keyes and J. Xu, eds), Vol. 180 of Contemporary Mathemat-
ics, AMS, Providence, RI, pp. 15–20.
H.-J. Bungartz, M. Griebel, D. Röschke and C. Zenger (1996), ‘A proof of con-
vergence for the combination technique using tools of symbolic computation’,
Math. Comp. Simulation 42, 595–605.
R. Caflisch, W. Morokoff and A. Owen (1997), ‘Valuation of mortgage backed
securities using Brownian bridges to reduce effective dimension’, J. Comput.
Finance 1, 27–46.
J. Carnicer, W. Dahmen and J. Pena (1996), ‘Local decomposition of refinable
spaces’, Appl. Comp. Harm. Anal. 3, 127–153.
V. Cherkassky and F. Mulier (1998), Learning from Data: Concepts, Theory and
Methods, Wiley.
A. Chorin (1968), ‘Numerical solution of the Navier–Stokes equations’, Math.
Comp. 22, 745–762.
A. Chorin (1998), Vorticity and Turbulence, Springer.
C. Chui (1992), An Introduction to Wavelets, Academic Press, Boston.
C. Chui and Y. Wang (1992), ‘A general framework for compactly supported splines
and wavelets’, J. Approx. Theory 71, 263–304.
D. Cioranescu, A. Damlamian and G. Griso (2002), ‘Periodic unfolding and homo-
genization’, CR Acad. Sci. Paris, Ser. I 335, 99–104.
K. Cios, W. Pedrycz and R. Swiniarski (1998), Data Mining Methods for Knowledge
Discovery, Kluwer.
A. Cohen (2003), Numerical Analysis of Wavelet Methods, Vol. 32 of Studies in
Mathematics and its Applications, North-Holland.
A. Cohen and I. Daubechies (1996), ‘A new technique to estimate the regularity
of refinable functions’, Rev. Mat. Iberoamer. 12, 527–591.
A. Cohen, W. Dahmen and R. DeVore (2001), ‘Adaptive wavelet methods for
elliptic operator equations’, Math. Comp. 70, 27–75.
A. Cohen, I. Daubechies and J. Feauveau (1992), ‘Biorthogonal bases of compactly
supported wavelets’, Comm. Pure Appl. Math. 45, 485–560.
R. Cools and B. Maerten (1997), Experiments with Smolyak’s algorithm for integ-
ration over a hypercube, Technical Report, Department of Computer Science,
Katholieke Universiteit Leuven.
W. Dahmen and A. Kunoth (1992), ‘Multilevel preconditioning’, Numer. Math.
63, 315–344.

https://doi.org/10.1017/S0962492904000182 Published online by Cambridge University Press


I. Daubechies (1988), ‘Orthogonal bases of compactly supported wavelets’, Comm.
Pure Appl. Math. 41, 909–996.
I. Daubechies (1992), Ten Lectures on Wavelets, Vol. 61 of CBMS–NSF Regional
Conf. Series in Appl. Math., SIAM.
I. Daubechies (1993), ‘Orthonormal bases of compactly supported wavelets II’,
SIAM J. Math. Anal. 24, 499–519.
I. Daubechies and W. Sweldens (1998), ‘Factoring wavelet transforms into lifting
steps’, J. Fourier Anal. Appl. 4, 245–267.
F. Delvos (1982), ‘d-variate Boolean interpolation’, J. Approx. Theory 34, 99–114.
F. Delvos (1990), ‘Boolean methods for double integration’, Math. Comp. 55, 683–
692.
F. Delvos and W. Schempp (1989), Boolean Methods in Interpolation and Ap-
proximation, Vol. 230 of Pitman Research Notes in Mathematics, Longman
Scientific and Technical, Harlow.
G. Deslauriers and S. Dubuc (1989), ‘Symmetric iterative interpolation processes’,
Constr. Approx. 5, 49–68.
R. DeVore (1998), Nonlinear approximation, in Acta Numerica, Vol. 7, Cambridge
University Press, pp. 51–150.
R. DeVore, S. Konyagin and V. Temlyakov (1998), ‘Hyperbolic wavelet approxim-
ation’, Constr. Approx. 14, 1–26.
J. Dick, I. Sloan, X. Wang and H. Woźniakowski (2003), Liberating the weights,
Technical Report AMR03/04, University of New South Wales.
D. Donoho (2000), ‘High-dimensional data analysis: The curses and blessings of
dimensionality’. Aide-Memoire.
D. Donoho and P. Yu (1999), Deslauriers–Dubuc: Ten years after, in Vol. 18 of
CRM Proceedings and Lecture Notes (G. Deslauriers and S. Dubuc, eds).
P. V. Dooren and L. D. Ridder (1976), ‘An adaptive algorithm for numerical in-
tegration over an n-dimensional cube’, J. Comp. Appl. Math. 2, 207–217.
T. Dornseifer (1997), Diskretisierung allgemeiner elliptischer Differentialgleichun-
gen in krummlinigen Koordinatensystemen auf dünnen Gittern, Dissertation,
Institut für Informatik, TU München.
T. Dornseifer and C. Pflaum (1996), ‘Elliptic differential equations on curvilinear
bounded domains with sparse grids’, Computing 56, 607–615.
J. Elf, P. Lötstedt and P. Sjöberg (2001), Problems of high dimension in molecular
biology, in 17th Gamm Seminar, Leipzig 2001 (W. Hackbusch, ed.), pp. 1–10.
K. Eriksson, D. Estep, P. Hansbo and C. Johnson (1996), Adaptive Finite Elements,
Springer, Berlin/Heidelberg.
G. Faber (1909), ‘Über stetige Funktionen’, Mathematische Annalen 66, 81–94.
S. Fahlmann and C. Lebiere (1990), The cascade-correlation learning architecture,
in Advances in Neural Information Processing Systems, Vol. 2 (Touretzky,
ed.), Morgan-Kaufmann.
A. Frank (1995), Hierarchische Polynombasen zum Einsatz in der Datenkompres-
sion mit Anwendung auf Audiodaten, Diplomarbeit, Institut für Informatik,
TU München.
K. Frank and S. Heinrich (1996), ‘Computing discrepancies of Smolyak quadrature
rules’, J. Complexity 12, 287–314.

https://doi.org/10.1017/S0962492904000182 Published online by Cambridge University Press


K. Frank, S. Heinrich and S. Pereverzev (1996), ‘Information complexity of mul-
tivariate Fredholm integral equations in Sobolev classes’, J. Complexity 12,
17–34.
J. Friedman (1991), ‘Multivariate adaptive regression splines’, Ann. Statist. 19,
1–141. With discussion and a rejoinder by the author.
J. Garcke (1998), Berechnung von Eigenwerten der stationären Schrödinger-
gleichung mit der Kombinationstechnik, Diplomarbeit, Institut für Ange-
wandte Mathematik, Universität Bonn.
J. Garcke and M. Griebel (2000), ‘On the computation of the eigenproblems of
hydrogen and helium in strong magnetic and electric fields with the sparse
grid combination technique’, J. Comput. Phys. 165, 694–716.
J. Garcke and M. Griebel (2001a), Data mining with sparse grids using simpli-
cial basis functions, in Proc. 7th ACM SIGKDD International Conference on
Knowledge Discovery and Data Mining, San Francisco, USA (F. Provost and
R. Srikant, eds), pp. 87–96.
J. Garcke and M. Griebel (2001b), On the parallelization of the sparse grid approach
for data mining, in Large-Scale Scientific Computations, Third International
Conference, LSSC 2001, Sozopol, Bulgaria (S. Margenov, J. Wasniewski and
P. Yalamov, eds), Vol. 2179 of Lecture Notes in Computer Science, pp. 22–32.
J. Garcke and M. Griebel (2002), ‘Classification with sparse grids using simplicial
basis functions’, Intelligent Data Analysis 6, 483–502.
J. Garcke, M. Griebel and M. Thess (2001), ‘Data mining with sparse grids’, Com-
puting 67, 225–253.
J. Garcke, M. Hegland and O. Nielsen (2003), Parallelisation of sparse grids for
large scale data analysis, in Proc. International Conference on Computational
Science 2003 (ICCS 2003 ), Melbourne, Australia (P. Sloot, D. Abramson,
A. Bogdanov, J. Dongarra, A. Zomaya and Y. Gorbachev, eds), Vol. 2659 of
Lecture Notes in Computer Science, Springer, pp. 683–692.
A. Genz and A. Malik (1980), ‘An adaptive algorithm for numerical integration
over an n-dimensional rectangular region’, J. Comp. Appl. Math. 6, 295–302.
T. Gerstner (1995), Ein adaptives hierarchisches Verfahren zur Approximation und
effizienten Visualisierung von Funktionen und seine Anwendung auf digitale
3D Höhenmodelle, Diplomarbeit, Institut für Informatik, TU München.
T. Gerstner (1999), Adaptive hierarchical methods for landscape representation
and analysis, in Process Modelling and Landform Evolution, Vol. 78 of Lecture
Notes in Earth Sciences, Springer.
T. Gerstner and M. Griebel (1998), ‘Numerical integration using sparse grids’,
Numer. Alg. 18, 209–232.
T. Gerstner and M. Griebel (2003), ‘Dimension-adaptive tensor-product quadrat-
ure’, Computing 71, 65–87.
F. Girosi (1998), ‘An equivalence between sparse approximation and support vector
machines’, Neural Computation 10, 1455–1480.
F. Girosi, M. Jones and T. Poggio (1995), ‘Regularization theory and neural net-
works architectures’, Neural Computation 7, 219–265.
G. Golub, M. Heath and G. Wahba (1979), ‘Generalized cross validation as a
method for choosing a good ridge parameter’, Technometrics 21, 215–224.

W. Gordon (1969), Distributive lattices and the approximation of multivari-
ate functions, in Approximation with Special Emphasis on Spline Functions
(I. Schoenberg, ed.), Academic Press, New York, pp. 223–277.
W. Gordon (1971), ‘Blending function methods of bivariate and multivariate in-
terpolation and approximation’, SIAM J. Numer. Anal. 8, 158–177.
W. Gordon and C. Hall (1973), ‘Transfinite element methods: Blending-function
interpolation over arbitrary curved element domains’, Numer. Math. 21, 109–
129.
R. Graham, D. Knuth and O. Patashnik (1994), Concrete Mathematics, Addison-
Wesley, Reading.
P. Gresho and R. Sani (1987), ‘On pressure boundary conditions for the incom-
pressible Navier–Stokes equations’, Int. J. Numer. Meth. Fluids 7, 371–394.
M. Griebel (1991a), Parallel multigrid methods on sparse grids, in Multigrid Meth-
ods III (W. Hackbusch and U. Trottenberg, eds), Vol. 98 of International
Series of Numerical Mathematics, Birkhäuser, Basel, pp. 211–221.
M. Griebel (1991b), A parallelizable and vectorizable multi-level algorithm
on sparse grids, in Parallel Algorithms for Partial Differential Equations
(W. Hackbusch, ed.), Vol. 31 of Notes on Numerical Fluid Mechanics, Vieweg,
Braunschweig/Wiesbaden, pp. 94–100.
M. Griebel (1992), ‘The combination technique for the sparse grid solution of PDEs
on multiprocessor machines’, Parallel Processing Letters 2, 61–70.
M. Griebel (1993), Sparse grid multilevel methods, their parallelization and their
application to CFD, in Proc. Parallel Computational Fluid Dynamics 1992
(J. Häser, ed.), Elsevier, Amsterdam, pp. 161–174.
M. Griebel (1994a), ‘Multilevel algorithms considered as iterative methods on semi-
definite systems’, SIAM J. Sci. Statist. Comput. 15, 547–565.
M. Griebel (1994b), Multilevelmethoden als Iterationsverfahren über Erzeugenden-
systemen, Teubner Skripten zur Numerik, Teubner, Stuttgart.
M. Griebel (1998), ‘Adaptive sparse grid multilevel methods for elliptic PDEs based
on finite differences’, Computing 61, 151–179.
M. Griebel and W. Huber (1995), Turbulence simulation on sparse grids using
the combination method, in Parallel Computational Fluid Dynamics, New
Algorithms and Applications (N. Satofuka, J. Périaux and A. Ecer, eds),
Elsevier, Amsterdam, pp. 75–84.
M. Griebel and S. Knapek (2000), ‘Optimized tensor-product approximation
spaces’, Constr. Approx. 16, 525–540.
M. Griebel and F. Koster (2000), Adaptive wavelet solvers for the unsteady in-
compressible Navier–Stokes equations, in Advances in Mathematical Fluid
Mechanics (J. Malek, J. Necas and M. Rokyta, eds), Springer, pp. 67–118.
M. Griebel and F. Koster (2003), Multiscale methods for the simulation of turbulent
flows, in Numerical Flow Simulation III (E. Hirschel, ed.), Vol. 82 of Notes on
Numerical Fluid Mechanics and Multidisciplinary Design, Springer, pp. 203–
214.
M. Griebel and P. Oswald (1994), ‘On additive Schwarz preconditioners for sparse
grid discretizations’, Numer. Math. 66, 449–463.
M. Griebel and P. Oswald (1995a), ‘On the abstract theory of additive and multi-
plicative Schwarz algorithms’, Numer. Math. 70, 161–180.

https://doi.org/10.1017/S0962492904000182 Published online by Cambridge University Press


M. Griebel and P. Oswald (1995b), ‘Tensor product type subspace splittings and
multilevel iterative methods for anisotropic problems’, Adv. Comput. Math.
4, 171–206.
M. Griebel and T. Schiekofer (1999), An adaptive sparse grid Navier–Stokes solver
in 3D based on the finite difference method, in Proc. ENUMATH97 (H. Bock,
G. Kanschat, R. Rannacher, F. Brezzi, R. Glowinski, Y. Kuznetsov and
J. Périaux, eds), World Scientific, Heidelberg.
M. Griebel and M. Schweitzer (2002), ‘A particle-partition of unity method, part II:
Efficient cover construction and reliable integration’, SIAM J. Sci. Comp. 23,
1655–1682.
M. Griebel and V. Thurner (1993), ‘Solving CFD-problems efficiently by the com-
bination method’, CFD-News 3, 19–31.
M. Griebel and V. Thurner (1995), ‘The efficient solution of fluid dynamics prob-
lems by the combination technique’, Int. J. Numer. Meth. Heat Fluid Flow
5, 251–269.
M. Griebel, W. Huber and C. Zenger (1993a), A fast Poisson solver for turbu-
lence simulation on parallel computers using sparse grids, in Flow Simulation
on High-Performance Computers I (E. Hirschel, ed.), Vol. 38 of Notes on
Numerical Fluid Mechanics, Vieweg, Braunschweig/Wiesbaden.
M. Griebel, W. Huber and C. Zenger (1996), Numerical turbulence simulation
on a parallel computer using the combination method, in Flow Simulation
on High-Performance Computers II (E. Hirschel, ed.), Vol. 52 of Notes on
Numerical Fluid Mechanics, Vieweg, Braunschweig/Wiesbaden, pp. 34–47.
M. Griebel, P. Oswald and T. Schiekofer (1999), ‘Sparse grids for boundary integral
equations’, Numer. Math. 83, 279–312.
M. Griebel, M. Schneider and C. Zenger (1992), A combination technique for
the solution of sparse grid problems, in Iterative Methods in Linear Algebra
(P. de Groen and R. Beauwens, eds), Elsevier, Amsterdam, pp. 263–281.
M. Griebel, C. Zenger and S. Zimmer (1993b), ‘Multilevel Gauss–Seidel-algorithms
for full and sparse grid problems’, Computing 50, 127–148.
M. Gromov (1999), Metric Structures for Riemannian and Non-Riemannian
Spaces, Vol. 152 of Progress in Mathematics, Birkhäuser.
W. Hackbusch (1985), Multigrid Methods and Applications, Springer, Berlin/
Heidelberg.
W. Hackbusch (1986), Theorie und Numerik elliptischer Differentialgleichungen,
Teubner, Stuttgart.
W. Hackbusch (2001), ‘The efficient computation of certain determinants arising
in the treatment of Schrödinger’s equation’, Computing 67, 35–56.
W. Hahn (1990), Parallelisierung eines adaptiven hierarchischen Dünngitterver-
fahrens, Diplomarbeit, Institut für Informatik, TU München.
K. Hallatschek (1992), ‘Fouriertransformation auf dünnen Gittern mit hierarchis-
chen Basen’, Numer. Math. 63, 83–97.
T. Hastie and R. Tibshirani (1990), Generalized Additive Models, Chapman and
Hall.
T. He (2001), Dimensionality Reducing Expansion of Multivariate Integration,
Birkhäuser.



M. Hegland (2002), Additive sparse grid fitting, in Proc. 5th International Confer-
ence on Curves and Surfaces, Saint-Malo, France 2002. Submitted.
M. Hegland (2003), Adaptive sparse grids, in Proc. 10th Computational Techniques
and Applications Conference, CTAC-2001 (K. Burrage and R. Sidje, eds),
Vol. 44 of ANZIAM J., pp. C335–C353.
M. Hegland and V. Pestov (1999), Additive models in high dimensions, Technical
Report 99-33, MCS-VUW research report.
P. Hemker (1995), ‘Sparse-grid finite-volume multigrid for 3D problems’, Adv.
Comput. Math. 4, 83–110.
P. Hemker and P. de Zeeuw (1996), BASIS3: A data structure for 3-dimensional
sparse grids, in Euler and Navier–Stokes Solvers Using Multi-Dimensional
Upwind Schemes and Multigrid Acceleration (H. Deconinck and B. Koren,
eds), Vol. 56 of Notes on Numerical Fluid Mechanics, Vieweg, Braunsch-
weig/Wiesbaden, pp. 443–484.
P. Hemker and C. Pflaum (1997), ‘Approximation on partially ordered sets of
regular grids’, Appl. Numer. Math. 25, 55–87.
P. Hemker, B. Koren and J. Noordmans (1998), 3D multigrid on partially ordered
sets of grids, in Multigrid Methods V, Vol. 3 of Lecture Notes in Computational
Science and Engineering, Springer, Berlin/Heidelberg, pp. 105–124.
J. Hennart and E. Mund (1988), ‘On the h- and p-versions of the extrapolated Gor-
don’s projector with applications to elliptic equations’, SIAM J. Sci. Statist.
Comput. 9, 773–791.
J. Heroth (1997), Are sparse grids suitable for the tabulation of reduced chemical
systems?, Technical Report, TR 97-2, Konrad-Zuse-Zentrum für Information-
stechnik Berlin.
F. Hickernell, I. Sloan and G. Wasilkowski (2003), On tractability of weighted integ-
ration for certain Banach spaces of functions, Technical Report AMR03/08,
University of New South Wales.
R. Hiptmair and V. Gradinaru (2003), ‘Multigrid for discrete differential forms on
sparse grids’, Computing 71, 17–42.
E. Hlawka (1961), ‘Funktionen von beschränkter Variation in der Theorie der
Gleichverteilung’, Ann. Mat. Pura Appl. 54, 325–333.
T. Ho and E. Kleinberg (1996), ‘Checkerboard dataset’.
www.cs.wisc.edu/math-prog/mpml.html
V. Hoang and C. Schwab (2003), High-dimensional finite elements for elliptic prob-
lems with multiple scales, Technical Report 2003-14, Seminar für Angewandte
Mathematik, ETH Zürich.
R. Hochmuth (1999), Wavelet bases in numerical analysis and restricted nonlinear
approximation, Habilitationsschrift, Freie Universität Berlin.
R. Hochmuth, S. Knapek and G. Zumbusch (2000), Tensor products of Sobolev
spaces and applications, Technical Report 685, SFB 256, Universität Bonn.
J. Hoschek and D. Lasser (1992), Grundlagen der Geometrischen Datenverarbei-
tung, Teubner, chapter 9.
W. Huber (1996a), Numerical turbulence simulation on different parallel computers
using the sparse grid combination method, in Proc. EuroPar ’96, Lyon 1996
(L. Bougé, P. Fraigniaud, A. Mignotte and Y. Robert, eds), Vol. 1124 of
Lecture Notes in Computer Science, Springer, Berlin/Heidelberg, pp. 62–65.



W. Huber (1996b), Turbulenzsimulation mit der Kombinationsmethode auf Work-
station-Netzen und Parallelrechnern, Dissertation, Institut für Informatik,
TU München.
M. Kalos and P. Whitlock (1986), Monte Carlo Methods, Wiley.
G. Karniadakis and S. Sherwin (1999), Spectral/hp Element Methods for CFD,
Oxford University Press.
L. Kaufman (1999), Solving the quadratic programming problem arising in support
vector classification, in Advances in Kernel Methods: Support Vector Learning
(B. Schölkopf, C. Burges and A. Smola, eds), MIT Press, pp. 146–167.
S. Khavinson (1997), Best Approximation by Linear Superposition (Approximate
Nomography), Vol. 159 of AMS Translations of Mathematical Monographs,
AMS, Providence, RI.
S. Knapek (2000a), Approximation und Kompression mit Tensorprodukt-Multi-
skalen-Approximationsräumen, Dissertation, Institut für Angewandte Math-
ematik, Universität Bonn.
S. Knapek (2000b), Hyperbolic cross approximation of integral operators with
smooth kernel, Technical Report 665, SFB 256, Universität Bonn.
S. Knapek and F. Koster (2002), ‘Integral operators on sparse grids’, SIAM J.
Numer. Anal. 39, 1794–1809.
A. Kolmogorov (1956), ‘On the representation of continuous functions of several
variables by superpositions of continuous functions of fewer variables’, Dokl.
Akad. Nauk SSSR 108, 179–182. In Russian; English Translation in Amer.
Math. Soc. Transl. (2) 17 (1961), 369–373.
A. Kolmogorov (1957), ‘On the representation of continuous functions of several
variables by superpositions of continuous functions of one variable and addi-
tion’, Dokl. Akad. Nauk SSSR 114, 953–956. In Russian; English Translation
in Amer. Math. Soc. Transl. (2) 28 (1963), 55–59.
B. Koren, P. Hemker and P. de Zeeuw (1996), Semi-coarsening in three directions
for Euler-flow computations in three dimensions, in Euler and Navier–Stokes
Solvers Using Multi-Dimensional Upwind Schemes and Multigrid Accelera-
tion (H. Deconinck and B. Koren, eds), Vol. 56 of Notes on Numerical Fluid
Mechanics, Vieweg, Braunschweig/Wiesbaden, pp. 547–567.
F. Koster (2002), Multiskalen-basierte Finite Differenzen Verfahren auf adaptiven
dünnen Gittern, Dissertation, Institut für Angewandte Mathematik, Uni-
versität Bonn.
A. Krommer and C. Ueberhuber (1994), Numerical Integration on Advanced Com-
puter Systems, Vol. 848 of Lecture Notes in Computer Science, Springer, Ber-
lin/Heidelberg.
F. Kupka (1997), Sparse Grid Spectral Methods for the Numerical Solution of Par-
tial Differential Equations with Periodic Boundary Conditions, Dissertation,
Institut für Mathematik, Universität Wien.
Y. Lee and O. Mangasarian (2001), ‘SSVM: A smooth support vector machine for
classification’, Comput. Optimiz. Appl. 20, 5–22.
P. Lemarié (1988), ‘Ondelettes à localisation exponentielle’, J. Math. Pures Appl.
67, 222–236.
C. Liem, T. Lu and T. Shih (1995), The Splitting Extrapolation Method, World
Scientific, Singapore.



R. Lorentz and P. Oswald (1998), Multilevel finite element Riesz bases in So-
bolev spaces, in Proc. 9th International Conference on Domain Decomposition
(P. Bjoerstad et al., eds), Domain Decomposition Press, Bergen, pp. 178–187.
S. Martello and P. Toth (1990), Knapsack Problems: Algorithms and Computer
Implementations, Wiley, Chichester.
A. Matache (2001), Sparse two-scale FEM for homogenization problems, Technical
Report 2001-09, Seminar für Angewandte Mathematik, ETH Zürich.
A. Matache and C. Schwab (2001), Two-scale FEM for homogenization prob-
lems, Technical Report 2001-06, Seminar für Angewandte Mathematik, ETH
Zürich.
G. Melli (2003), ‘DatGen: A program that creates structured data’, web site:
www.datasetgenerator.com
Y. Meyer (1992), Wavelets and Operators, Cambridge University Press.
H. Mhaskar (1996), ‘Neural networks and approximation theory’, Neural Networks
9, 711–722.
A. Michalke (1964), ‘On the inviscid instability of the hyperbolic tangent velocity
profile’, J. Fluid Mech. 19, 543–556.
V. Milman (1988), ‘The heritage of P. Levy in geometrical functional analysis’,
Asterisque 157–158, 273–301.
V. Milman and G. Schechtman (1986), Asymptotic Theory of Finite-Dimensional
Normed Spaces, Vol. 1200 of Lecture Notes in Mathematics, Springer.
U. Mitzlaff (1997), Diffusionsapproximation von Warteschlangensystemen, Disser-
tation, Institut für Mathematik, Technische Universität Clausthal.
W. Morokoff and R. Caflisch (1995), ‘Quasi-Monte Carlo integration’, J. Comput.
Phys. 122, 218–230.
W. Mulder (1989), ‘A new multigrid approach to convection problems’, J. Comput.
Phys. 83, 303–323.
N. Naik and J. van Rosendale (1993), ‘The improved robustness of multigrid elliptic
solvers based on multiple semicoarsened grids’, SIAM J. Numer. Anal. 30,
215–229.
H. Niederreiter (1992), Random Number Generation and Quasi-Monte-Carlo Meth-
ods, SIAM, Philadelphia.
P. Niyogi and F. Girosi (1998), ‘Generalization bounds for function approximation
from scattered noisy data’, Adv. Comput. Math. 10, 51–80.
E. Novak and K. Ritter (1996), ‘High dimensional integration of smooth functions
over cubes’, Numer. Math. 75, 79–98.
E. Novak and K. Ritter (1997), Simple cubature formulas for d-dimensional in-
tegrals with high polynomial exactness and small error, Technical Report,
Institut für Mathematik, Universität Erlangen–Nürnberg.
E. Novak and K. Ritter (1998), The curse of dimension and a universal
method for numerical integration, in Multivariate Approximation and Splines
(G. Nürnberger, J. Schmidt and G. Walz, eds), International Series in Nu-
merical Mathematics, Birkhäuser, Basel, pp. 177–188.
E. Novikov (1976), ‘Dynamics and statistics of a system of vortices’, Sov. Phys.
JETP 41(5), 937–943.
P. Oswald (1994), Multilevel Finite Element Approximation, Teubner Skripten zur
Numerik, Teubner, Stuttgart.



S. Paskov (1993), ‘Average case complexity of multivariate integration for smooth
functions’, J. Complexity 9, 291–312.
S. Paskov and J. Traub (1995), ‘Faster valuation of financial derivatives’, J. Port-
folio Management 22, 113–120.
T. Patterson (1968), ‘The optimum addition of points to quadrature formulae’,
Math. Comp. 22, 847–856.
A. Paul (1995), Kompression von Bildfolgen mit hierarchischen Basen, Diplo-
marbeit, Institut für Informatik, TU München.
A. Peano (1976), ‘Hierarchies of conforming finite elements for plane elasticity and
plate bending’, Comp. Math. Appl. 2, 211–224.
K. Petras (2000), ‘On the Smolyak cubature error for analytic functions’, Adv.
Comput. Math. 12, 71–93.
A. Pfaffinger (1997), Funktionale Beschreibung und Parallelisierung von Algorith-
men auf dünnen Gittern, Dissertation, Institut für Informatik, TU München.
C. Pflaum (1992), Anwendung von Mehrgitterverfahren auf dünnen Gittern, Dip-
lomarbeit, Institut für Informatik, TU München.
C. Pflaum (1996), Diskretisierung elliptischer Differentialgleichungen mit dünnen
Gittern, Dissertation, Institut für Informatik, TU München.
C. Pflaum (1998), ‘A multilevel algorithm for the solution of second order elliptic
differential equations on sparse grids’, Numer. Math. 79, 141–155.
C. Pflaum and A. Zhou (1999), ‘Error analysis of the combination technique’,
Numer. Math. 84, 327–350.
G. Pöplau and F. Sprengel (1997), Some error estimates for periodic interpolation
on full and sparse grids, in Curves and Surfaces with Applications in CAGD
(A. Le Méhauté, C. Rabut and L. Schumaker, eds), Vanderbilt University
Press, Nashville, Tennessee, pp. 355–362.
J. Prakash (2000), Rouse chains with excluded volume interactions: Linear
viscoelasticity, Technical Report 221, Berichte der Arbeitsgruppe Techno-
mathematik, Universität Kaiserslautern.
J. Prakash and H. Öttinger (1999), ‘Viscometric functions for a dilute solution of
polymers in a good solvent’, Macromolecules 32, 2028–2043.
A. Prohl (1997), Projection and Quasi-Compressibility Methods for Solving the In-
compressible Navier–Stokes Equations, Advances in Numerical Mathematics,
B. G. Teubner.
T. Rassias and J. Simsa (1995), Finite Sums Decompositions in Mathematical Ana-
lysis, Wiley.
C. Reisinger (2003), Numerische Methoden für hochdimensionale parabolische
Gleichungen am Beispiel von Optionspreisaufgaben, Dissertation, Universität
Heidelberg.
P. Rouse (1953), ‘A theory of the linear viscoelastic properties of dilute solutions
of coiling polymers’, J. Chem. Phys. 21, 1272–1280.
S. Salzberg (1997), ‘On comparing classifiers: Pitfalls to avoid and a recommended
approach’, Data Mining and Knowledge Discovery 1, 317–327.
T. Schiekofer (1998), Die Methode der finiten Differenzen auf dünnen Gittern zur
Lösung elliptischer und parabolischer partieller Differentialgleichungen, Dis-
sertation, Institut für Angewandte Mathematik, Universität Bonn.



T. Schiekofer and G. Zumbusch (1998), Software concepts of a sparse grid finite
difference code, in Proc. 14th GAMM-Seminar on Concepts of Numerical
Software (W. Hackbusch and G. Wittum, eds), Notes on Numerical Fluid
Mechanics, Vieweg, Braunschweig/Wiesbaden.
S. Schneider (2000), Adaptive Solution of Elliptic PDE by Hierarchical Tensor
Product Finite Elements, Dissertation, Institut für Informatik, TU München.
C. Schwab and R. Todor (2002), Sparse finite elements for stochastic elliptic prob-
lems, Technical Report 2002-05, Seminar für Angewandte Mathematik, ETH
Zürich.
C. Schwab and R. Todor (2003), ‘Sparse finite elements for stochastic elliptic prob-
lems: Higher order moments’, Computing 71, 43–63.
M. Schweitzer (2003), A Parallel Multilevel Partition of Unity Method for Elliptic
Partial Differential Equations, Vol. 29 of Lecture Notes in Computational
Science and Engineering, Springer.
R. Sedgewick and P. Flajolet (1996), Analysis of Algorithms, Addison-Wesley,
Reading.
X. Shen, H. Chen, J. Dai and W. Dai (2002), ‘The finite element method for
computing the stationary distribution on an SRBM in a hypercube with ap-
plications to finite buffer queueing networks’, Queuing Systems 42, 33–62.
J. Simsa (1992), ‘The best L2 -approximation by finite sums of functions with sep-
arable variables’, Aequationes Mathematicae 43, 284–293.
P. Sjöberg (2002), Numerical solution of the master equation in molecular biology,
Master’s Thesis, Department of Scientific Computing, Uppsala University.
I. Sloan (2001), QMC integration: Beating intractability by weighting the coordin-
ate directions, Technical Report AMR01/12, University of New South Wales.
I. Sloan and S. Joe (1994), Lattice Methods for Multiple Integration, Oxford Uni-
versity Press.
I. Sloan and H. Woźniakowski (1998), ‘When are quasi-Monte Carlo algorithms
efficient for high dimensional integrals?’, J. Complexity 14, 1–33.
S. Smolyak (1963), ‘Quadrature and interpolation formulas for tensor products of
certain classes of functions’, Soviet Math. Dokl. 4, 240–243. Russian original
in Dokl. Akad. Nauk SSSR 148 (1963), 1042–1045.
F. Sprengel (1997a), Interpolation and Wavelet Decomposition of Multivariate Peri-
odic Functions, Dissertation, FB Mathematik, Universität Rostock.
F. Sprengel (1997b), ‘A unified approach to error estimates for interpolation on full
and sparse Gauss–Chebyshev grids’, Rostocker Math. Kolloq. 51, 51–64.
R. Stevenson (1996), Piecewise linear (pre)-wavelets on non-uniform meshes, in
Multigrid Methods IV (W. Hackbusch and G. Wittum, eds), Vol. 3 of Lecture
Notes in Computational Science and Engineering, Springer.
M. Stone (1974), ‘Cross-validatory choice and assessment of statistical predictions’,
J. Royal Statist. Soc. 36, 111–147.
T. Störtkuhl (1995), Ein numerisches adaptives Verfahren zur Lösung der biharmo-
nischen Gleichung auf dünnen Gittern, Dissertation, Institut für Informatik,
TU München.
W. Sweldens (1997), ‘The lifting scheme: A construction of second generation
wavelets’, SIAM J. Math. Anal. 29, 511–546.



M. Talagrand (1995), ‘Concentration of measure and isoperimetric inequalities in
product spaces’, Publ. Math. IHES 81, 73–205.
R. Temam (1969), ‘Sur l’approximation de la solution des équations de Navier–
Stokes par la méthode des pas fractionnaires (II)’, Arch. Rat. Mech. Anal. 33,
377–385.
V. Temlyakov (1989), Approximation of Functions with Bounded Mixed Derivative,
Vol. 178 of Proc. Steklov Inst. of Math., AMS, Providence, RI.
V. Temlyakov (1993), ‘On approximate recovery of functions with bounded mixed
derivative’, J. Complexity 9, 41–59.
V. Temlyakov (1994), Approximation of Periodic Functions, Nova Science, Com-
mack, New York.
A. Tikhonov and V. Arsenin (1977), Solutions of Ill-Posed Problems, W. H. Win-
ston, Washington DC.
J. Traub and H. Woźniakowski (1980), A General Theory of Optimal Algorithms,
Academic Press, New York.
J. Traub, G. Wasilkowski and H. Woźniakowski (1983), Information, Uncertainty,
Complexity, Addison-Wesley, Reading.
J. Traub, G. Wasilkowski and H. Woźniakowski (1988), Information-Based Com-
plexity, Academic Press, New York.
H. Triebel (1992), Theory of Function Spaces II, Birkhäuser.
F. Utreras (1979), Cross-validation techniques for smoothing spline functions in one
or two dimensions, in Smoothing Techniques for Curve Estimation (T. Gasser
and M. Rosenblatt, eds), Springer, Heidelberg, pp. 196–231.
V. Vapnik (1982), Estimation of Dependences Based on Empirical Data, Springer,
Berlin.
V. Vapnik (1995), The Nature of Statistical Learning Theory, Springer.
G. Wahba (1985), ‘A comparison of GCV and GML for choosing the smoothing
parameter in the generalized splines smoothing problem’, Ann. Statist. 13,
1378–1402.
G. Wahba (1990), Spline Models for Observational Data, Vol. 59 of Series in Applied
Mathematics, SIAM, Philadelphia.
G. Wasilkowski and H. Woźniakowski (1995), ‘Explicit cost bounds of algorithms
for multivariate tensor product problems’, J. Complexity 11, 1–56.
G. Wasilkowski and H. Woźniakowski (1999), ‘Weighted tensor product algorithms
for linear multivariate problems’, J. Complexity 15, 402–447.
A. Werschulz (1995), The complexity of the Poisson problem for spaces of bounded
mixed derivatives, Technical Report CUCS-016-95, Columbia University.
H. Woźniakowski (1985), ‘A survey of information-based complexity’, J. Complexity
1, 11–44.
H. Yserentant (1986), ‘On the multi-level splitting of finite element spaces’, Numer.
Math. 49, 379–412.
H. Yserentant (1990), ‘Two preconditioners based on the multi-level splitting of
finite element spaces’, Numer. Math. 58, 163–184.
H. Yserentant (1992), Hierarchical bases, in Proc. ICIAM ’91, Washington 1991
(R. O’Malley et al., eds), SIAM, Philadelphia.
H. Yserentant (2004), On the regularity of the electronic Schrödinger equation in
Hilbert spaces of mixed derivatives. Numer. Math., in press.



R. Yue and F. Hickernell (2002), ‘Robust designs for smoothing spline ANOVA
models’, Metrika 55, 161–176.
C. Zenger (1991), Sparse grids, in Parallel Algorithms for Partial Differential Equa-
tions (W. Hackbusch, ed.), Vol. 31 of Notes on Numerical Fluid Mechanics,
Vieweg, Braunschweig/Wiesbaden.
O. Zienkiewicz, D. Kelly, J. Gago and I. Babuška (1982), Hierarchical finite element
approaches, error estimates, and adaptive refinement, in The Mathematics of
Finite Elements and Applications IV (J. Whiteman, ed.), Academic Press,
London.
G. Zumbusch (1996), Simultaneous hp Adaptation in Multilevel Finite Elements,
Shaker, Aachen.
