Sparse Grids
Hans-Joachim Bungartz
IPVS, Universität Stuttgart,
Universitätsstraße 38, D-70569 Stuttgart, Germany
E-mail: [email protected]
Michael Griebel
Institut für Numerische Simulation, Universität Bonn,
Wegelerstraße 6, D-53113 Bonn, Germany
E-mail: [email protected]
1. Introduction
The discretization of PDEs by conventional methods is limited to problems with up to three or four dimensions, due to storage requirements and computational complexity. The reason is the so-called curse of dimensionality, a term coined in Bellman (1961). Here, the cost of computing and representing an approximation with a prescribed accuracy ε depends exponentially on the dimensionality d of the considered problem. We encounter complexities of the order $O(\varepsilon^{-d/\alpha})$ with α > 0 depending on the respective approach, the smoothness of the function under consideration, and the details of the implementation. If we consider simple uniform grids with piecewise d-polynomial functions over a bounded domain in a finite element or finite difference approach, for instance, this complexity estimate translates to $O(N^d)$ grid points or degrees of freedom for which approximation accuracies of the order $O(N^{-\alpha})$ are achieved, where α depends on the smoothness of the function under consideration and the polynomial degree of the approximating functions.¹ Thus, the computational cost and storage requirements grow exponentially with the dimensionality of the problem, which is the reason for the dimensional restrictions mentioned above, even on the most powerful machines presently available.
The curse of dimensionality can be circumvented to some extent by restricting the class of functions under consideration. If we make a stronger assumption on the smoothness of the solution such that the order of accuracy grows with d, i.e., it behaves like $O(N^{-\beta\cdot d})$, we directly see that the cost–benefit ratio is independent of d: in terms of the total number $M = N^d$ of degrees of freedom, the accuracy is of the order $O((N^d)^{-\beta}) = O(M^{-\beta})$, with β independent of d. This way, the curse of dimensionality can be broken easily. However, such an assumption is somewhat unrealistic.
¹ If the solution is not smooth but possesses singularities, the order α of accuracy deteriorates. Adaptive refinement/nonlinear approximation is employed with success; in the best case, the cost–benefit ratio of a smooth solution can be recovered.
² The constant in the corresponding complexity estimates, however, still depends exponentially on d. This still limits the approach for PDEs to moderate-dimensional problems. At present we are able to deal with 18-dimensional PDEs.
$$\|f - f_n\| = O(n^{-r/d}),$$

where r and d denote the isotropic smoothness of the function f and the problem's dimensionality, respectively. This is one of the main obstacles in the treatment of high-dimensional problems. Therefore, the question is whether we can find situations, i.e., either function spaces or error norms, for which the curse of dimensionality can be broken. At first glance, there is an easy way out: if we make a stronger assumption on the smoothness of the function f such that r = O(d), then we directly obtain $\|f - f_n\| = O(n^{-c})$ with a constant c > 0. Of course, such an assumption is completely unrealistic.
However, about ten years ago, Barron (1993) found an interesting result. Denote by $F_{L_1}$ the class of functions with Fourier transforms in $L_1$. Then, consider the class of functions on $\mathbb{R}^d$ with

$$\nabla f \in F_{L_1}.$$

We expect an approximation rate

$$\|f - f_n\| = O(n^{-1/d}),$$

since $\nabla f \in F_{L_1}$ corresponds roughly to an isotropic smoothness of r = 1. However, Barron was able to show

$$\|f - f_n\| = O(n^{-1/2}),$$

independent of d. Meanwhile, other function classes are known with such properties. These comprise certain radial basis schemes, stochastic sampling techniques, and approaches that work with spaces of functions with bounded mixed derivatives.
A better understanding of these results is possible with the help of harmonic analysis (Donoho 2000). Here we resort to the approach of the $L_1$-combination of $L_\infty$-atoms; see also Triebel (1992) and DeVore (1998). Consider the class of functions $F(M)$ with integral representation

$$f(x) = \int A(x,t)\, d\mu(t) \quad\text{with}\quad \int d|\mu|(t) \le M, \qquad (2.1)$$

where, for fixed t, we call $A(x,t) = A_t(x)$ an $L_\infty$-atom if $|A_t(x)| \le 1$ holds. Then there are results by Maurey for Banach spaces and by Stechkin where

$$\|f - f_n\|_\infty \le C \cdot n^{-1/2}$$

with C independent of d. In the following, we consider examples of such spaces.
Example 1. (Radial basis schemes) Consider superpositions of Gaussian bumps. These resemble the space $F(M, \text{Gaussians})$ with $t := (x_0, s)$ and Gaussian atoms $A(x,t) = \exp(-\|x - x_0\|^2/s^2)$, where $\|\cdot\|$ denotes the Euclidean norm. Now, if the sum of the heights of all Gaussians is bounded by M, Niyogi and Girosi (1998) showed that the resulting approximation rate is independent of d for the corresponding radial basis schemes. There is no further condition on the widths or positions of the bumps. Note that this corresponds to a ball in the Besov space $B^d_{1,1}(\mathbb{R}^d)$, which is just Meyer's bump algebra.
Subspace decomposition
Let us start with some notation and with the preliminaries that are necessary for a detailed discussion of sparse grids for purposes of interpolation or approximation, respectively. On the d-dimensional unit interval $\bar\Omega := [0,1]^d$, we consider multivariate functions u, $u(x)\in\mathbb R$, $x := (x_1,\dots,x_d)\in\bar\Omega$, with (in some sense) bounded weak mixed derivatives

$$D^{\boldsymbol\alpha} u := \frac{\partial^{|\boldsymbol\alpha|_1} u}{\partial x_1^{\alpha_1}\cdots\partial x_d^{\alpha_d}} \qquad (3.1)$$

up to some given order $r\in\mathbb N_0$. Here, $\boldsymbol\alpha\in\mathbb N_0^d$ denotes a d-dimensional multi-index with the two norms

$$|\boldsymbol\alpha|_1 := \sum_{j=1}^d \alpha_j \quad\text{and}\quad |\boldsymbol\alpha|_\infty := \max_{1\le j\le d}\alpha_j.$$
Figure 3.1. Tensor product approach for piecewise bilinear basis functions.
Thus, here and in the following, the multi-index $\mathbf l$ indicates the level (of a grid, a point, or, later on, a basis function, respectively), whereas the multi-index $\mathbf i$ denotes the location of a given grid point $x_{\mathbf l,\mathbf i}$ in the respective grid $\Omega_{\mathbf l}$.
Next, we have to define discrete approximation spaces and sets of basis
functions that span those discrete spaces. In a piecewise linear setting, the
simplest choice of a 1D basis function is the standard hat function φ(x),
$$\phi(x) := \begin{cases} 1 - |x|, & \text{if } x\in[-1,1],\\ 0, & \text{otherwise}.\end{cases} \qquad (3.7)$$

This mother of all piecewise linear basis functions can be used to generate an arbitrary $\phi_{l_j,i_j}(x_j)$ with support $[x_{l_j,i_j} - h_{l_j},\, x_{l_j,i_j} + h_{l_j}] = [(i_j-1)h_{l_j},\,(i_j+1)h_{l_j}]$ by dilation and translation, that is,

$$\phi_{l_j,i_j}(x_j) := \phi\left(\frac{x_j - i_j\cdot h_{l_j}}{h_{l_j}}\right). \qquad (3.8)$$
The resulting 1D basis functions are the input of the tensor product construction, which provides a suitable piecewise d-linear basis function in each grid point $x_{\mathbf l,\mathbf i}$ (see Figure 3.1):

$$\phi_{\mathbf l,\mathbf i}(x) := \prod_{j=1}^d \phi_{l_j,i_j}(x_j). \qquad (3.9)$$
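To make the construction (3.7)–(3.9) concrete, here is a minimal Python sketch (our illustration, not code from the original work) that evaluates a piecewise d-linear basis function at a point:

```python
import numpy as np

def phi(x):
    """Standard hat function (3.7): 1 - |x| on [-1, 1], zero elsewhere."""
    return np.maximum(0.0, 1.0 - np.abs(x))

def phi_1d(l, i, x):
    """1D basis function (3.8): dilation/translation of the hat function,
    centred at x_{l,i} = i * 2**-l with support of width 2 * 2**-l."""
    h = 2.0 ** (-l)
    return phi((x - i * h) / h)

def phi_d(l, i, x):
    """d-linear basis function (3.9): tensor product over all directions."""
    return np.prod([phi_1d(lj, ij, xj) for lj, ij, xj in zip(l, i, x)])

# The basis function of level l = (2, 1), index i = (3, 1) has value 1 at
# its grid point x_{l,i} = (3/4, 1/2) and vanishes outside its support.
print(phi_d((2, 1), (3, 1), (0.75, 0.5)))   # -> 1.0
```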
Since we deal with homogeneous boundary conditions (i.e., with $X_0^{q,2}(\bar\Omega)$), only those $\phi_{\mathbf l,\mathbf i}(x)$ that correspond to inner grid points of $\Omega_{\mathbf l}$ are taken into account for the definition of

$$V_{\mathbf l} := \mathrm{span}\left\{\phi_{\mathbf l,\mathbf i} : \mathbf 1\le\mathbf i\le 2^{\mathbf l}-\mathbf 1\right\}, \qquad (3.10)$$

the space of piecewise d-linear functions with respect to the interior of $\Omega_{\mathbf l}$. Obviously, the $\phi_{\mathbf l,\mathbf i}$ form a basis of $V_{\mathbf l}$, with one basis function $\phi_{\mathbf l,\mathbf i}$ of a support of the fixed size $2\cdot h_{\mathbf l}$ for each inner grid point $x_{\mathbf l,\mathbf i}$ of $\Omega_{\mathbf l}$, and this basis $\{\phi_{\mathbf l,\mathbf i}\}$ is just the standard nodal point basis of the finite-dimensional space $V_{\mathbf l}$.
can be easily seen. Note that the supports of all basis functions $\phi_{\mathbf l,\mathbf i}$ spanning $W_{\mathbf l}$ are mutually disjoint. Thus, with the index set

$$I_{\mathbf l} := \left\{\mathbf i\in\mathbb N^d : \mathbf 1\le\mathbf i\le 2^{\mathbf l}-\mathbf 1,\ i_j \text{ odd for all } 1\le j\le d\right\}, \qquad (3.13)$$

we get another basis of $V_{\mathbf l}$, the hierarchical basis

$$\left\{\phi_{\mathbf k,\mathbf i} : \mathbf i\in I_{\mathbf k},\ \mathbf k\le\mathbf l\right\}, \qquad (3.14)$$

which generalizes the well-known 1D basis shown in Figure 3.2 to the d-dimensional case by means of a tensor product approach. With these hierarchical difference spaces $W_{\mathbf l}$, we can define

$$V := \bigoplus_{l_1=1}^\infty\cdots\bigoplus_{l_d=1}^\infty W_{(l_1,\dots,l_d)} = \bigoplus_{\mathbf l\in\mathbb N^d} W_{\mathbf l}, \qquad (3.15)$$
the limit

$$\lim_{n\to\infty} V_n^{(\infty)} = \lim_{n\to\infty}\bigoplus_{|\mathbf l|_\infty\le n} W_{\mathbf l} := \bigcup_{n=1}^\infty V_n^{(\infty)} = V \qquad (3.18)$$

exists due to $V_n^{(\infty)}\subset V_{n+1}^{(\infty)}$. Hence, any function $u\in H_0^1(\bar\Omega)$ and, consequently, any $u\in X_0^{q,2}(\bar\Omega)$ can be uniquely split by

$$u(x) = \sum_{\mathbf l} u_{\mathbf l}(x),\qquad u_{\mathbf l}(x) = \sum_{\mathbf i\in I_{\mathbf l}} v_{\mathbf l,\mathbf i}\cdot\phi_{\mathbf l,\mathbf i}(x)\in W_{\mathbf l}, \qquad (3.19)$$

where the $v_{\mathbf l,\mathbf i}\in\mathbb R$ are the coefficient values of the hierarchical product basis representation of u, also called the hierarchical surplus.
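For illustration, the surpluses in (3.19) can be computed in 1D by a single fine-to-coarse sweep: at every odd grid point of a level, subtract the mean of its two coarser neighbours. A minimal sketch (ours, assuming homogeneous boundary values as above):

```python
import numpy as np

def hierarchize_1d(u_nodal, n):
    """Hierarchical surpluses v_{l,i} from the nodal values of u at the inner
    points x = i * 2**-n, i = 1, ..., 2**n - 1 (homogeneous boundary).
    Processing fine to coarse, the neighbours of a level-l point still hold
    nodal values when its surplus is formed."""
    v = np.concatenate(([0.0], np.asarray(u_nodal, dtype=float), [0.0]))
    for l in range(n, 0, -1):
        step = 2 ** (n - l)                      # index distance on level l
        for i in range(step, 2 ** n, 2 * step):  # odd multiples of step
            v[i] -= 0.5 * (v[i - step] + v[i + step])
    return v[1:-1]

n = 3
x = np.arange(1, 2 ** n) / 2 ** n
print(hierarchize_1d(x * (1 - x), n))
```

For the quadratic u(x) = x(1 − x), the sweep returns the surplus $h_l^2$ at every point of level l, in agreement with the $2^{-2\cdot|\mathbf l|_1}$ decay of the bounds derived below.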
Before we turn to finite-dimensional approximation spaces for $X_0^{q,2}(\bar\Omega)$, we summarize the most important properties of the hierarchical subspaces $W_{\mathbf l}$ according to Bungartz (1992b) and Bungartz and Griebel (1999).
which is equivalent to the $H^1$-norm in $H_0^1(\bar\Omega)$. For the Laplacian, (3.21) indeed indicates the energy norm in finite element terminology. First we look at the different hierarchical basis functions $\phi_{\mathbf l,\mathbf i}(x)$.

since $\psi_{l,i}(x_{l,i}-h_l) = \psi_{l,i}(x_{l,i}+h_l) = 0$ and since $\partial\psi_{l,i}(x)/\partial x\in\{\frac12,-\frac12\}$ due to the construction of $\psi_{l,i}$ and $\phi_{l,i}$. Finally, the tensor product approach according to the operator product given in (3.23) leads to a straightforward generalization to d > 1.
The above lemma and its proof show the close relations of our hierarchical basis approach to integral transforms like wavelet transforms. Applying successive partial integration to (3.24), twice for d = 1 and 2d times for general dimensionality, we get

$$v_{\mathbf l,\mathbf i} = \int_\Omega \psi_{\mathbf l,\mathbf i}(x)\cdot D^{\mathbf 2}u(x)\,dx = \int_\Omega \hat\psi_{\mathbf l,\mathbf i}(x)\cdot u(x)\,dx, \qquad (3.25)$$

where $\hat\psi_{\mathbf l,\mathbf i}(x)$ equals $D^{\mathbf 2}\psi_{\mathbf l,\mathbf i}(x)$ in a weak sense (i.e., in the sense of distributions) and is a linear combination of $3^d$ Dirac pulses of alternating sign. Thus, the hierarchical surplus $v_{\mathbf l,\mathbf i}$ can be interpreted as the coefficient resulting from an integral transform with respect to a function $\hat\psi_{\mathbf l,\mathbf i}(x)$ of an oscillating structure.
Starting from (3.24), we are now able to give bounds for the hierarchical
coefficients with respect to the different seminorms introduced in (3.3).
Proof. With (3.22), (3.24), and with the definition of $\psi_{\mathbf l,\mathbf i}$, we get

$$|v_{\mathbf l,\mathbf i}| = \left|\int_\Omega \psi_{\mathbf l,\mathbf i}(x)\cdot D^{\mathbf 2}u(x)\,dx\right| \le \|\psi_{\mathbf l,\mathbf i}\|_1\cdot\left\|D^{\mathbf 2}u|_{\mathrm{supp}(\phi_{\mathbf l,\mathbf i})}\right\|_\infty = 2^{-d}\cdot2^{-|\mathbf l|_1}\cdot\|\phi_{\mathbf l,\mathbf i}\|_1\cdot\left|u|_{\mathrm{supp}(\phi_{\mathbf l,\mathbf i})}\right|_{2,\infty} \le 2^{-d}\cdot2^{-2\cdot|\mathbf l|_1}\cdot|u|_{2,\infty}.$$
In the next section, the information gathered above will be used to construct finite-dimensional approximation spaces U for V or $X_0^{q,2}(\bar\Omega)$, respectively. Such a U shall be based on a subspace selection $I\subset\mathbb N^d$,

$$U := \bigoplus_{\mathbf l\in I} W_{\mathbf l}. \qquad (3.28)$$

The estimate

$$\|u - u_U\| = \Big\|\sum_{\mathbf l} u_{\mathbf l} - \sum_{\mathbf l\in I} u_{\mathbf l}\Big\| \le \sum_{\mathbf l\notin I}\|u_{\mathbf l}\| \le \sum_{\mathbf l\notin I} b(\mathbf l)\cdot|u| \qquad (3.30)$$
of subspaces $W_{\mathbf l}$ as shown in Figure 3.3 for the 2D case, $V_n^{(\infty)}$ corresponds to a square sector of subspaces: see Figure 3.4.

Obviously, the dimension of $V_n^{(\infty)}$ (i.e., the number of inner grid points in the underlying grid) is

$$\left|V_n^{(\infty)}\right| = \left(2^n - 1\right)^d = O(2^{d\cdot n}) = O(h_n^{-d}). \qquad (3.31)$$

For the error $u - u_n^{(\infty)}$ of the interpolant $u_n^{(\infty)}\in V_n^{(\infty)}$ of a given function $u\in X_0^{q,2}(\bar\Omega)$ with respect to the different norms we are interested in, the following lemma states the respective results.
Figure 3.4. The (full) grid of $V_3^{(\infty)}$, d = 2, and the assignment of grid points to subspaces.
Lemma 3.5. For $u\in X_0^{q,2}(\bar\Omega)$, the following estimates for the different norms of the interpolation error $u - u_n^{(\infty)}$, $u_n^{(\infty)}\in V_n^{(\infty)}$, hold:

$$\left\|u - u_n^{(\infty)}\right\|_\infty \le \frac{d}{6^d}\cdot2^{-2n}\cdot|u|_{2,\infty} = O(h_n^2), \qquad (3.32)$$
$$\left\|u - u_n^{(\infty)}\right\|_2 \le \frac{d}{9^d}\cdot2^{-2n}\cdot|u|_{2,2} = O(h_n^2),$$
$$\left\|u - u_n^{(\infty)}\right\|_E \le \frac{d^{3/2}}{2\cdot3^{(d-1)/2}\cdot6^{d-1}}\cdot2^{-n}\cdot|u|_{2,\infty} = O(h_n),$$
$$\left\|u - u_n^{(\infty)}\right\|_E \le \frac{d^{3/2}}{\sqrt3\cdot9^{d-1}}\cdot2^{-n}\cdot|u|_{2,2} = O(h_n).$$
$$\left\|u - u_n^{(\infty)}\right\|_E \le \frac{d^{1/2}}{2\cdot12^{(d-1)/2}}\cdot|u|_{2,\infty}\cdot\sum_{|\mathbf l|_\infty>n} 2^{-2\cdot|\mathbf l|_1}\cdot\max_{1\le j\le d} 2^{l_j}$$
$$\le \frac{d^{3/2}}{2\cdot12^{(d-1)/2}}\cdot|u|_{2,\infty}\cdot\sum_{|\mathbf l|_\infty = l_1 > n} 2^{-2\cdot|\mathbf l|_1}\cdot 2^{l_1}$$
$$= \frac{d^{3/2}}{2\cdot12^{(d-1)/2}}\cdot|u|_{2,\infty}\cdot\sum_{l_1>n} 2^{-l_1}\cdot\left(\sum_{l_j=1}^{l_1} 4^{-l_j}\right)^{d-1}$$
$$\le \frac{d^{3/2}}{2\cdot12^{(d-1)/2}}\cdot|u|_{2,\infty}\cdot\frac{1}{3^{d-1}}\cdot 2^{-n},$$

and an analogous argument provides the second estimate.
$$Q_n u := \frac1n\cdot\sum_{i=1}^n u(x_i), \qquad (3.34)$$

that is used to get an approximation to $Iu := \int_{\bar\Omega} u(x)\,dx$, a sharp error bound is given by

$$|Q_n u - Iu| \le V(u)\cdot D_n^*(x_1,\dots,x_n). \qquad (3.35)$$

Here, V(u) is the so-called variation of u in the sense of Hardy and Krause, a property of u indicating the global smoothness of u: the smoother u is on $\bar\Omega$, the smaller the values of V(u). $D_n^*(x_1,\dots,x_n)$ denotes the so-called star discrepancy of the grid $(x_1,\dots,x_n)$, which measures the deviation of the point set from the uniform distribution.
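To make the role of the discrepancy in (3.35) tangible, a small sketch (ours): the same estimator $Q_n u$, once with pseudo-random points and once with a low-discrepancy (Halton) point set; integrand and parameters are arbitrary choices.

```python
import numpy as np

def halton(n, d):
    """First n points of the d-dimensional Halton sequence (radical inverse
    in the first d prime bases), a standard low-discrepancy point set."""
    primes = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29][:d]
    pts = np.empty((n, d))
    for j, b in enumerate(primes):
        for k in range(n):
            f, x, idx = 1.0, 0.0, k + 1
            while idx > 0:
                f /= b
                x += f * (idx % b)
                idx //= b
            pts[k, j] = x
    return pts

d, n = 5, 4096
f = lambda x: np.prod(1.0 + 0.5 * (x - 0.5), axis=-1)   # exact integral: 1
rng = np.random.default_rng(0)
mc  = f(rng.random((n, d))).mean()   # Q_n u with random points
qmc = f(halton(n, d)).mean()         # Q_n u with low-discrepancy points
print(abs(mc - 1.0), abs(qmc - 1.0)) # the QMC error is typically far smaller
```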
Continuous optimization
For the following, a grid and its representation I – formerly a finite set of multi-indices – is nothing but a bounded subset of $\mathbb R_+^d$, and a hierarchical subspace $W_{\mathbf l}$ just corresponds to a point $\mathbf l\in\mathbb R_+^d$.

First we have to formulate the optimization problem (3.33). To this end, and inspired by (3.20), the local cost function $c(\mathbf l)$ is defined as a straightforward generalization of the number of degrees of freedom involved:

$$c(\mathbf l) := 2^{|\mathbf l|_1 - d} = 2^{l_1+\cdots+l_d-d}. \qquad (3.37)$$

For the local benefit function $b(\mathbf l)$, we use the squared upper bounds for $\|u_{\mathbf l}\|$ according to (3.27). At the moment, we do not fix the norm to be used here. Obviously, the search for an optimal $I\subset\mathbb R_+^d$ can be restricted to $I\subset I^{(\max)} := [0,N]^d$ for a sufficiently large N without loss of generality.
Based on the two local quantities c(l) and b(l), the global cost C(I) and the
For the solution of (3.39), we start from an arbitrary $I\subset I^{(\max)}$ that has a sufficiently smooth boundary ∂I. With a sufficiently smooth mapping τ,

$$\tau:\mathbb R_+^d\to\mathbb R_+^d,\qquad \tau(\mathbf l) = 0\ \text{for}\ \mathbf l\in\partial\mathbb R_+^d, \qquad (3.40)$$

we define a small disturbance $\varphi_{\varepsilon,\tau}$ of the grid I:

$$\varphi_{\varepsilon,\tau}: I\to I_{\varepsilon,\tau}\subset I^{(\max)},\qquad \varphi_{\varepsilon,\tau}(\mathbf l) := \mathbf l + \varepsilon\cdot\tau(\mathbf l),\quad \varepsilon\in\mathbb R. \qquad (3.41)$$

For the global cost of the disturbed grid $I_{\varepsilon,\tau}$, we get

$$C(I_{\varepsilon,\tau}) = \int_{I_{\varepsilon,\tau}} c(\mathbf k)\,d\mathbf k = \int_I c(\mathbf l+\varepsilon\cdot\tau(\mathbf l))\cdot|\det D\varphi_{\varepsilon,\tau}|\,d\mathbf l. \qquad (3.42)$$
Similar arguments hold for the global benefit B(I) and result in

$$\frac{\partial B(I_{\varepsilon,\tau})}{\partial\varepsilon}\bigg|_{\varepsilon=0} = \lim_{\varepsilon\to0}\frac{B(I_{\varepsilon,\tau}) - B(I)}{\varepsilon} = \int_{\partial I} b(\mathbf l)\cdot\tau(\mathbf l)\,dS. \qquad (3.47)$$

Now, starting from the optimal grid $I^{(\mathrm{opt})}$, Lagrange's principle for optimization under a constraint can be applied, and we get

$$\lambda\cdot\int_{\partial I^{(\mathrm{opt})}} c(\mathbf l)\cdot\tau(\mathbf l)\,dS = \int_{\partial I^{(\mathrm{opt})}} b(\mathbf l)\cdot\tau(\mathbf l)\,dS. \qquad (3.48)$$
Discrete optimization
Since the above continuous optimization process with its roundabout way
of generalizing integer multi-indices to real ones is a bit unnatural, (3.33) is
now formulated as a discrete optimization problem.
First of all, we redefine the local functions $c(\mathbf l)$ and $b(\mathbf l)$, now for multi-indices $\mathbf l\in\mathbb N^d$ only. According to (3.20), the local cost $c(\mathbf l)$ is defined by

$$c(\mathbf l) := |W_{\mathbf l}| = 2^{|\mathbf l-\mathbf 1|_1}, \qquad (3.51)$$

which is exactly the same as (3.37) restricted to $\mathbf l\in\mathbb N^d$. Obviously, $c(\mathbf l)\in\mathbb N$ holds for all $\mathbf l\in\mathbb N^d$. Concerning the local benefit function, we define
$$b(\mathbf l) := \gamma\cdot\beta(\mathbf l), \qquad (3.52)$$

where $\beta(\mathbf l)$ is an upper bound for $\|u_{\mathbf l}\|^2$ according to (3.27), and γ is a factor depending on the problem's dimensionality d and on the smoothness of the data, i.e., of u, but constant with respect to $\mathbf l$, such that $b(\mathbf l)\in\mathbb N$. The respective bounds in (3.27) show that such a choice of γ is possible for each of the three norms that are of interest in our context. Note that, as in the continuous case, we do not make any decision concerning the actual choice of norm to be used for $b(\mathbf l)$ for the moment.

Again, the search for an optimal grid $I\subset\mathbb N^d$ can be restricted to $I\subset I^{(\max)} := \{1,\dots,N\}^d$ for a sufficiently large N without loss of generality.
where

$$x(\mathbf l) := \begin{cases} 0, & \mathbf l\notin I,\\ 1, & \mathbf l\in I.\end{cases} \qquad (3.54)$$
Figure 3.5. The sparse grid of $V_3^{(1)}$, d = 2, and the assignment of grid points to subspaces.
That is, we fix d subspaces on the axes in the subspace scheme of Figure 3.3 and search for all $W_{\mathbf l}$ whose cost–benefit ratio is equal or better. Thus, applying the criterion $\mathrm{cbr}_\infty(\mathbf l)\ge\sigma_\infty(n)$ or $\mathrm{cbr}_2(\mathbf l)\ge\sigma_2(n)$, respectively, we get the relation

$$|\mathbf l|_1 \le n + d - 1. \qquad (3.60)$$
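The selection rule (3.60) is straightforward to operationalize. The following sketch (ours) enumerates the admissible levels and reproduces the dimensions stated in Lemma 3.6 below, e.g. 17 points for d = 2, n = 3 (cf. Figure 3.5):

```python
from itertools import product
from math import comb

def sparse_levels(d, n):
    """All level multi-indices l >= 1 with |l|_1 <= n + d - 1, cf. (3.60)."""
    return [l for l in product(range(1, n + 1), repeat=d)
            if sum(l) <= n + d - 1]

def dim_sparse(d, n):
    """|V_n^(1)| by summing |W_l| = 2**|l - 1|_1 over the selected levels."""
    return sum(2 ** (sum(l) - d) for l in sparse_levels(d, n))

def dim_sparse_closed(d, n):
    """Closed form (3.62): sum_i 2**i * C(d-1+i, d-1)."""
    return sum(2 ** i * comb(d - 1 + i, d - 1) for i in range(n))

print(dim_sparse(2, 3), dim_sparse_closed(2, 3))    # 17 17
print(dim_sparse(2, 10), dim_sparse_closed(2, 10))  # 9217 9217
print(dim_sparse(3, 10), dim_sparse_closed(3, 10))  # 47103 47103
```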
Now, let us turn to the basic properties of the sparse grid approximation spaces $V_n^{(1)}$.

Lemma 3.6. The dimension of the space $V_n^{(1)}$, i.e., the number of degrees of freedom or inner grid points, is given by

$$\left|V_n^{(1)}\right| = \sum_{i=0}^{n-1} 2^i\cdot\binom{d-1+i}{d-1} \qquad (3.62)$$
$$= (-1)^d + 2^n\cdot\sum_{i=0}^{d-1}\binom{n+d-1}{i}\cdot(-2)^{d-1-i}$$
$$= 2^n\cdot\left(\frac{n^{d-1}}{(d-1)!} + O(n^{d-2})\right).$$

Thus, we have

$$\left|V_n^{(1)}\right| = O\!\left(h_n^{-1}\cdot|\log_2 h_n|^{d-1}\right). \qquad (3.63)$$
Proof. With (3.20) and (3.61), we get

$$\left|V_n^{(1)}\right| = \sum_{|\mathbf l|_1\le n+d-1} |W_{\mathbf l}| = \sum_{|\mathbf l|_1\le n+d-1} 2^{|\mathbf l-\mathbf 1|_1} = \sum_{i=d}^{n+d-1} 2^{i-d}\cdot\sum_{|\mathbf l|_1=i} 1$$
$$= \sum_{i=d}^{n+d-1} 2^{i-d}\cdot\binom{i-1}{d-1} = \sum_{i=0}^{n-1} 2^i\cdot\binom{d-1+i}{d-1}$$
$$= \frac{1}{(d-1)!}\cdot\left(\sum_{i=0}^{n-1} x^{i+d-1}\right)^{(d-1)}\Bigg|_{x=2} = \frac{1}{(d-1)!}\cdot\left(x^{d-1}\cdot\frac{1-x^n}{1-x}\right)^{(d-1)}\Bigg|_{x=2}$$
$$= \frac{1}{(d-1)!}\cdot\sum_{i=0}^{d-1}\binom{d-1}{i}\left(x^{d-1} - x^{n+d-1}\right)^{(i)}\cdot\left(\frac{1}{1-x}\right)^{(d-1-i)}\Bigg|_{x=2}$$
$$= (-1)^d + 2^n\cdot\sum_{i=0}^{d-1}\binom{n+d-1}{i}\cdot(-2)^{d-1-i},$$

from which the result concerning the order and the leading coefficient follows immediately.
The above lemma shows the order $O(2^n\cdot n^{d-1})$ or, with $h_n = 2^{-n}$, $O(h_n^{-1}\cdot|\log_2 h_n|^{d-1})$, which is a significant reduction of the number of degrees of freedom and, thus, of the computational and storage requirements compared with the order $O(h_n^{-d})$ of $V_n^{(\infty)}$.

The other question to be discussed concerns the interpolation accuracy that can be obtained on sparse grids. For that, we look at the interpolation error $u - u_n^{(1)}$ of the sparse grid interpolant $u_n^{(1)}\in V_n^{(1)}$ which, due to (3.19) and (3.61), can be written as

$$u - u_n^{(1)} = \sum_{\mathbf l} u_{\mathbf l} - \sum_{|\mathbf l|_1\le n+d-1} u_{\mathbf l} = \sum_{|\mathbf l|_1>n+d-1} u_{\mathbf l}.$$
The following lemma provides a prerequisite for the estimates of the interpolation error with respect to the different norms we are interested in. For $d, n\in\mathbb N$, we define

$$A(d,n) := \sum_{k=0}^{d-1}\binom{n+d-1}{k} = \frac{n^{d-1}}{(d-1)!} + O(n^{d-2}). \qquad (3.65)$$

$$\sum_{|\mathbf l|_1>n+d-1} 2^{-s\cdot|\mathbf l|_1} = 2^{-s\cdot n}\cdot2^{-s\cdot d}\cdot\sum_{i=0}^\infty 2^{-s\cdot i}\cdot\binom{n+i+d-1}{d-1}, \qquad (3.66)$$

since

$$\sum_{|\mathbf l|_1>n+d-1} 2^{-s\cdot|\mathbf l|_1} = \sum_{i=n+d}^\infty 2^{-s\cdot i}\cdot\sum_{|\mathbf l|_1=i} 1 = \sum_{i=n+d}^\infty 2^{-s\cdot i}\cdot\binom{i-1}{d-1} = 2^{-s\cdot n}\cdot2^{-s\cdot d}\cdot\sum_{i=0}^\infty 2^{-s\cdot i}\cdot\binom{n+i+d-1}{d-1}.$$
Since

$$\sum_{i=0}^\infty x^i\cdot\binom{n+i+d-1}{d-1} = \frac{x^{-n}}{(d-1)!}\cdot\left(\sum_{i=0}^\infty x^{n+i+d-1}\right)^{(d-1)} \qquad (3.67)$$
$$= \frac{x^{-n}}{(d-1)!}\cdot\left(x^{n+d-1}\cdot\frac{1}{1-x}\right)^{(d-1)}$$
$$= \frac{x^{-n}}{(d-1)!}\cdot\sum_{k=0}^{d-1}\binom{d-1}{k}\left(x^{n+d-1}\right)^{(k)}\cdot\left(\frac{1}{1-x}\right)^{(d-1-k)}$$
$$= \sum_{k=0}^{d-1}\binom{n+d-1}{k}\cdot\left(\frac{x}{1-x}\right)^{d-1-k}\cdot\frac{1}{1-x},$$

we obtain, for $x = 2^{-s}\le\frac12$,

$$\sum_{i=0}^\infty 2^{-s\cdot i}\cdot\binom{n+i+d-1}{d-1} \le 2\cdot\sum_{k=0}^{d-1}\binom{n+d-1}{k} = 2\cdot A(d,n).$$
With the above lemma, we obtain the desired result concerning the interpolation quality of standard sparse grid spaces $V_n^{(1)}$.
$$= \frac{|u|_{2,\infty}}{2\cdot12^{(d-1)/2}}\cdot\sum_{i=n+d}^\infty 4^{-i}\cdot\sum_{|\mathbf l|_1=i}\left(\sum_{j=1}^d 4^{l_j}\right)^{1/2}$$
$$\le \frac{|u|_{2,\infty}}{2\cdot12^{(d-1)/2}}\cdot d\cdot\sum_{i=n+d}^\infty 2^{-i}$$
$$= \frac{d\cdot|u|_{2,\infty}}{2\cdot3^{(d-1)/2}\cdot4^{d-1}}\cdot2^{-n},$$

because

$$\sum_{|\mathbf l|_1=i}\left(\sum_{j=1}^d 4^{l_j}\right)^{1/2} \le d\cdot2^i,$$
$$\mathrm{cbr}_E(\mathbf l) = \frac{3}{6^d}\cdot2^{-5\cdot|\mathbf l|_1}\cdot\sum_{j=1}^d 4^{l_j}\cdot|u|^2_{2,\infty}$$
as the local cost–benefit ratio. Again, instead of $\|u_{\mathbf l}\|_E$ itself, only an upper bound for the squared energy norm of $u_{\mathbf l}$ is used. The resulting optimal grid $I^{(\mathrm{opt})}$ will consist of all those multi-indices $\mathbf l$ or their respective hierarchical subspaces $W_{\mathbf l}$ that fulfil $\mathrm{cbr}_E(\mathbf l)\ge\sigma_E(n)$ for some given constant threshold $\sigma_E(n)$. As before, $\sigma_E(n)$ is defined via the cost–benefit ratio of $W_{\bar{\mathbf l}}$ with $\bar{\mathbf l} := (n,1,\dots,1)$:
$$\sigma_E(n) := \mathrm{cbr}_E(\bar{\mathbf l}) = \frac{3}{6^d}\cdot2^{-5\cdot(n+d-1)}\cdot\left(4^n + 4\cdot(d-1)\right)\cdot|u|^2_{2,\infty}. \qquad (3.70)$$

Thus, applying the criterion $\mathrm{cbr}_E(\mathbf l)\ge\sigma_E(n)$, we come to an alternative sparse grid approximation space $V_n^{(E)}$, which is based on the energy norm:

$$V_n^{(E)} := \bigoplus_{|\mathbf l|_1 - \frac15\log_2\left(\sum_{j=1}^d 4^{l_j}\right)\ \le\ (n+d-1) - \frac15\log_2\left(4^n+4d-4\right)} W_{\mathbf l}. \qquad (3.71)$$
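Analogously, the criterion in (3.71) can be evaluated directly. In the following sketch (ours), the small tolerance guards the borderline subspace $\bar{\mathbf l} = (n,1,\dots,1)$, which meets the bound with equality, against floating-point round-off; for d = 2 and n = 10 the count is 3841, matching the dimensions tabulated below.

```python
from itertools import product
from math import log2

def energy_levels(d, n):
    """Level multi-indices of V_n^(E) according to the criterion in (3.71)."""
    thresh = (n + d - 1) - log2(4 ** n + 4 * d - 4) / 5.0
    eps = 1e-9
    return [l for l in product(range(1, n + 1), repeat=d)  # V_n^(E) lies in V_n^(1)
            if sum(l) - log2(sum(4 ** lj for lj in l)) / 5.0 <= thresh + eps]

def dim_energy(d, n):
    """|V_n^(E)| by summing |W_l| = 2**|l - 1|_1 over the selected levels."""
    return sum(2 ** (sum(l) - d) for l in energy_levels(d, n))

print(dim_energy(2, 10))   # -> 3841
```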
First, we look at the number of grid points of the underlying sparse grids.

Lemma 3.9. The energy-based sparse grid space $V_n^{(E)}$ is a subspace of $V_n^{(1)}$, and its dimension fulfils

$$\left|V_n^{(E)}\right| \le 2^n\cdot\frac d2\cdot e^d = O(h_n^{-1}). \qquad (3.72)$$
$$= 2^n\cdot\frac12\cdot\sum_{i=0}^{n-1} 2^{-i}\cdot\sum_{\substack{|\mathbf l|_1 = n+d-1-i,\\ \sum_{j=1}^d 4^{l_j}\,\ge\,(4^n+4d-4)/32^i}} 1$$
$$\le 2^n\cdot\frac12\cdot\lim_{n\to\infty}\sum_{i=0}^{n-1} 2^{-i}\cdot\sum_{\substack{|\mathbf l|_1 = n+d-1-i,\\ \sum_{j=1}^d 4^{l_j}\,\ge\,(4^n+4d-4)/32^i}} 1$$
$$= 2^n\cdot\frac12\cdot\lim_{n\to\infty}\sum_{i=0}^{n-1} 2^{-i}\cdot d\cdot\binom{d-1+\lfloor1.5\,i\rfloor}{d-1},$$
since it can be shown that, for n → ∞, our energy-based sparse grid and the grid resulting from the second condition $|\mathbf l|_\infty\ge n - 2.5\,i$ for the inner sum, instead of

$$\sum_{j=1}^d 4^{l_j} \ge \frac{4^n+4d-4}{32^i},$$

coincide. The following table compares the dimensions of the three approximation spaces for d = 2, 3, and 4:

                        d = 2                      d = 3                      d = 4
                  n = 10      n = 20         n = 10      n = 20         n = 10      n = 20
  |V_n^(∞)|     1.05·10^6   1.10·10^12     1.07·10^9   1.15·10^18     1.10·10^12  1.21·10^24
  |V_n^(1)|     9217        1.99·10^7      47103       2.00·10^8      1.78·10^5   1.41·10^9
  |V_n^(E)|     3841        4.72·10^6      10495       1.68·10^7      24321       5.27·10^7

Figure 3.7. Scheme of subspaces for $V_{30}^{(1)}$ (left) and $V_{30}^{(E)}$ (right), d = 2.
$$= \frac{|u|_{2,\infty}}{2\cdot12^{(d-1)/2}}\cdot4^{-n-d+1}\cdot\sum_{i=0}^{i^*} 4^{i}\cdot d\cdot\sum_{k=1}^{n-1-\lceil2.5\,i\rceil}\binom{d-2+\lceil1.5\,i\rceil+k}{d-2}\cdot2^{n-\lceil2.5\,i\rceil-k}$$
Though we have only derived upper bounds for the energy norm of the interpolation error, it is helpful to compare the respective results (3.32), (3.68) and (3.73) for the three approximation spaces $V_n^{(\infty)}$, $V_n^{(1)}$ and $V_n^{(E)}$. Table 3.2 shows that there is no asymptotic growth with respect to d, either for the full grid case or for our two sparse grid spaces.
The crucial result of this section is that, with the energy-based sparse grid spaces $V_n^{(E)}$, the curse of dimensionality can be overcome. In both (3.72) and (3.73), the n-dependent terms are free of any d-dependencies. There is an order of $O(2^n)$ for the dimension and $O(2^{-n})$ for the interpolation error. Especially, there is no longer any polynomial term in n like $n^{d-1}$. That is, apart from the factors that are constant with respect to n, there is no d-dependence of either $|V_n^{(E)}|$ or $\|u - u_n^{(E)}\|_E$, and thus no deterioration in complexity for higher-dimensional problems. Furthermore, the growth of the d-dependent terms in d is not too serious, since we have a factor of $\frac d2\cdot e^d$ in the upper bound of $|V_n^{(E)}|$ and (in the best case; see Table 3.2) $\frac{d}{3^{(d-1)/2}\cdot4^{d-1}}\cdot\left(\frac12+\frac52\right)^{d-1}$ in the upper bound of $\|u - u_n^{(E)}\|_E$.

Table 3.2. d-dependent constants in the bounds for $\|u - u_n^{(\cdot)}\|_E$ (multiply with $|u|_{2,\infty}\cdot2^{-n}$ (first row) or $|u|_{2,2}\cdot2^{-n}$ (second row) to get the respective bounds).

                       $V_n^{(\infty)}$                                   $V_n^{(1)}$                            $V_n^{(E)}$
  $|u|_{2,\infty}$:   $\frac{d^{3/2}}{2\cdot3^{(d-1)/2}\cdot6^{d-1}}$   $\frac{d}{2\cdot3^{(d-1)/2}\cdot4^{d-1}}$   $\frac{d}{3^{(d-1)/2}\cdot4^{d-1}}\cdot\left(\frac12+\frac52\right)^{d-1}$
  $|u|_{2,2}$:        $\frac{d^{3/2}}{\sqrt3\cdot9^{d-1}}$              $\frac{d}{\sqrt3\cdot6^{d-1}}$              $\frac{2d}{\sqrt3\cdot6^{d-1}}\cdot\left(\frac12+\frac52\right)^{d-1}$
Recurrence formulas
For the following, we restrict ourselves to the sparse grid spaces $V_n^{(1)}$. In (3.61), they were introduced with the help of an explicit formula. Now we study their recursive character to obtain further results concerning their complexity and asymptotic properties. First, starting from (3.62), one can show a recurrence relation for $|V_n^{(1)}|$, the number of (inner) grid points of a sparse grid. Note that $|V_n^{(1)}|$ depends on two parameters: the dimensionality d and the resolution n. Defining

$$a_{n,d} := \left|V_n^{(1)}\right|, \qquad (3.74)$$
we get

$$a_{n,d} = a_{n,d-1} + 2\cdot a_{n-1,d}. \qquad (3.75)$$

That is, the d-dimensional sparse grid of resolution (or depth) n consists of a (d − 1)-dimensional one of the same depth n (the separator) and of two d-dimensional sparse grids of depth n − 1 (cf. Figure 3.8, left). If we continue with this decomposition in a recursive way, we finally obtain a full-history version of (3.75) with respect to n,

$$a_{n,d} = \sum_{i=0}^{n-1} 2^i\cdot a_{n-i,d-1}, \qquad (3.76)$$

since $a_{1,d} = 1$ for all $d\in\mathbb N$ due to (3.74). Thus, a sparse grid $V_n^{(1)}$ of dimensionality d and depth n can be completely reduced to sparse grids of dimensionality d − 1 and depth k, k = 1, …, n (cf. Figure 3.8, right).
In addition to $a_{n,d}$, we shall now deal with sparse grids with grid points on the boundary. To this end, let $b_{n,d}$ be the number of overall grid points of the $L_2$-based sparse grid of parameters d and n, i.e., in the interior and on the boundary of $\bar\Omega$. On $\partial\bar\Omega$, we assume sparse grids of the same resolution n, but of a reduced dimensionality. Since the boundary of the d-dimensional unit interval $\bar\Omega$ consists of $\binom dj\cdot2^{d-j}$ j-dimensional unit intervals (j = 0, …, d), we get

$$b_{n,d} := \sum_{j=0}^d\binom dj\cdot2^{d-j}\cdot a_{n,j}, \qquad (3.77)$$

where $a_{n,0} := 1$ for all $n\in\mathbb N$. With the help of (3.75) and (3.77), the following recurrence relation for the $b_{n,d}$ can be derived:

$$b_{n,d} = 2\cdot b_{n-1,d} + 3\cdot b_{n,d-1} - 4\cdot b_{n-1,d-1}, \qquad (3.78)$$

with its full-history version with respect to n,

$$b_{n,d} = 3\cdot b_{n,d-1} + \sum_{i=1}^{n-1} 2^i\cdot b_{n-i,d-1} = 2\cdot b_{n,d-1} + \sum_{i=0}^{n-1} 2^i\cdot b_{n-i,d-1}, \qquad (3.79)$$

where the first term stands for the boundary faces $x_d\in\{0,1\}$, whereas the sum denotes the inner part of the grid with respect to direction $x_d$. Figure 3.9 illustrates the recursive structure of $b_{n,d}$ in the 2D case.
Finally, a third quantity $c_{p,d}$ shall be introduced that motivates the sparse grid pattern from a perspective of approximation with polynomials, thus anticipating, to some extent, the higher-order approaches to be discussed

$$c_{p,d} = c_{p,d-1} + c_{p-1,d-1} + \sum_{i=0}^{p-2} 2^i\cdot c_{p-2-i,d-1}, \qquad (3.80)$$
In the following, we present some properties of the $a_{n,d}$, $b_{n,d}$, and $c_{p,d}$. Though we do not want to go into detail here, note that, in contrast to many recurrences studied for the analysis of algorithms (cf. Graham, Knuth and Patashnik (1994), Sedgewick and Flajolet (1996), for example), these quantities, and thus the overall storage requirement and computational cost connected with sparse grids, depend on two parameters. Let $d\in\mathbb N$, $n\in\mathbb N$, and $p\in\mathbb N_0$. The initial conditions are

$$a_{1,d} = 1\ \ \forall d\in\mathbb N,\qquad a_{n,1} = 2^n - 1\ \ \forall n\in\mathbb N, \qquad (3.82)$$
$$b_{1,d} = 3^d\ \ \forall d\in\mathbb N,\qquad b_{n,1} = 2^n + 1\ \ \forall n\in\mathbb N.$$
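The recurrences and initial conditions translate directly into a memoized implementation; a sketch (ours):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def a(n, d):
    """Inner grid points a_{n,d} = |V_n^(1)| via the recurrence (3.75)
    with the initial conditions (3.82)."""
    if n == 1:
        return 1
    if d == 1:
        return 2 ** n - 1
    return a(n, d - 1) + 2 * a(n - 1, d)

@lru_cache(maxsize=None)
def b(n, d):
    """Overall grid points (interior plus boundary) via (3.78) and (3.82)."""
    if n == 1:
        return 3 ** d
    if d == 1:
        return 2 ** n + 1
    return 2 * b(n - 1, d) + 3 * b(n, d - 1) - 4 * b(n - 1, d - 1)

print(a(3, 2))             # 17, cf. Figure 3.5
print(a(10, 2), a(10, 3))  # 9217 47103, cf. the dimensions tabulated above
print(b(2, 2))             # 21, equal to sum_j C(2,j) 2**(2-j) a_{2,j} from (3.77)
```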
Lemma 3.11. The following relations are valid for the n- and d-asymptotic behaviour of $a_{n,d}$ and $b_{n,d}$:

$$\frac{a_{n,d+1}}{a_{n,d}}\xrightarrow{d\to\infty} 1,\qquad \frac{a_{n,d+1}}{a_{n,d}}\xrightarrow{n\to\infty}\infty, \qquad (3.83)$$
$$\frac{a_{n+1,d}}{a_{n,d}}\xrightarrow{n\to\infty} 2,\qquad \frac{a_{n+1,d}}{a_{n,d}}\xrightarrow{d\to\infty}\infty,$$
$$\frac{b_{n,d+1}}{b_{n,d}}\xrightarrow{d\to\infty} 3,\qquad \frac{b_{n,d+1}}{b_{n,d}}\xrightarrow{n\to\infty}\infty,$$
$$\frac{b_{n+1,d}}{b_{n,d}}\xrightarrow{n\to\infty} 2,\qquad \frac{b_{n+1,d}}{b_{n,d}}\xrightarrow{d\to\infty}\infty.$$
ε-complexity
An approach that is closely related to the cost–benefit setting used for the derivation of the sparse grid approximation spaces is the concept of the ε-complexity (Traub and Woźniakowski 1980, Traub, Wasilkowski and Woźniakowski 1983, Traub, Wasilkowski and Woźniakowski 1988, Woźniakowski 1985). The ε-complexity of a numerical method or algorithm indicates the computational work that is necessary to produce an approximate solution of some prescribed accuracy ε. In particular, for the complexity of general multivariate tensor product problems, see Wasilkowski and Woźniakowski (1995). We consider the ε-complexity of the different discrete approximation spaces $V_n^{(\infty)}$, $V_n^{(1)}$, and $V_n^{(E)}$ for the problem of representing a function $u\in X_0^{q,2}(\bar\Omega)$ on a grid, i.e., the problem of constructing the interpolant $u_n^{(\infty)}\in V_n^{(\infty)}$, $u_n^{(1)}\in V_n^{(1)}$, or $u_n^{(E)}\in V_n^{(E)}$, respectively. To this end, the overall computational cost caused by the interpolation in one of the
Lemma 3.12. For the ε-complexities $N_\infty(\varepsilon)$, $N_2(\varepsilon)$, and $N_E(\varepsilon)$ of the problem of computing the interpolant $u_n^{(\infty)}\in V_n^{(\infty)}$ with respect to the $L_\infty$-, the $L_2$-, and the energy norm for some prescribed accuracy ε, the following relations hold:

$$N_\infty(\varepsilon) = O\!\left(\varepsilon^{-\frac d2}\right),\qquad N_2(\varepsilon) = O\!\left(\varepsilon^{-\frac d2}\right),\qquad N_E(\varepsilon) = O\!\left(\varepsilon^{-d}\right). \qquad (3.85)$$

Conversely, given a number N of grid points, the following accuracies can be obtained with respect to the different norms:

$$\varepsilon_{L_\infty}(N) = O\!\left(N^{-\frac2d}\right),\qquad \varepsilon_{L_2}(N) = O\!\left(N^{-\frac2d}\right),\qquad \varepsilon_E(N) = O\!\left(N^{-\frac1d}\right). \qquad (3.86)$$

Proof. The statements follow directly from (3.31) and (3.32).
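For instance, the first relation in (3.85) follows by a one-line inversion of the two estimates:

$$\varepsilon = O(2^{-2n})\ \Longrightarrow\ 2^n = O\!\left(\varepsilon^{-1/2}\right)\ \Longrightarrow\ N_\infty(\varepsilon) = O\!\left(2^{d\cdot n}\right) = O\!\left(\varepsilon^{-d/2}\right);$$

in the same way, the energy norm accuracy $O(2^{-n})$ yields $N_E(\varepsilon) = O(\varepsilon^{-d})$.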
Next we turn to the $L_2$-based sparse grid space $V_n^{(1)}$. As we have already seen, $V_n^{(1)}$ lessens the curse of dimensionality, but does not yet overcome it.

Lemma 3.13. For the ε-complexities $N_\infty(\varepsilon)$, $N_2(\varepsilon)$, and $N_E(\varepsilon)$ of the problem of computing the interpolant $u_n^{(1)}\in V_n^{(1)}$ with respect to the $L_\infty$-, the $L_2$-, and the energy norm for some prescribed accuracy ε, the following relations hold:

$$N_\infty(\varepsilon),\ N_2(\varepsilon) = O\!\left(\varepsilon^{-\frac12}\cdot|\log_2\varepsilon|^{\frac32\cdot(d-1)}\right),\qquad N_E(\varepsilon) = O\!\left(\varepsilon^{-1}\cdot|\log_2\varepsilon|^{d-1}\right). \qquad (3.87)$$
4.1. Ancestors
The discussion of hierarchical finite elements (Peano 1976, Zienkiewicz, Kelly, Gago and Babuška 1982) and, in particular, the series of articles by Yserentant (1986, 1990, 1992) introducing the use of hierarchical bases for the numerical solution of PDEs, both for purposes of an explicit discretization and for the construction of preconditioners, was the starting point of Zenger's sparse grid concept (Zenger 1991). The generalization of Yserentant's hierarchical bases to a strict tensor product approach with its underlying hierarchical subspace splitting discussed in Section 3 allowed the a priori identification of more and of less important subspaces and grid points. As we have seen, it is this characterization of subspaces that the definition of sparse grids is based on. With the sparse grid approach, for the first time, a priori optimized and fully structured grid patterns were integrated into existing and well-established discretization schemes for PDEs such as finite elements, and were combined with a very straightforward access to adaptive grid refinement.
Even if the sparse grid concept was new in the context of PDEs, very closely related techniques had been studied before for purposes of approximation, recovery, or numerical integration of smooth functions. For instance, the generalization of Archimedes' well-known hierarchical quadrature of $1 - x^2$ on $[-1, 1]$ to the d-dimensional case via Cavalieri's principle (see Figure 4.1) is a very prominent example of an (indeed early) hierarchical tensor product approach. Much later, Faber (1909) discussed the hierarchical representation of functions.
Finally, once more, the Russian literature turns out to be helpful for exploring the roots of a new approach in numerical mathematics. Two names that have to be mentioned here are those of Smolyak and Babenko. Smolyak (1963) studied classes of quadrature formulas of the type

$$Q_n^{(d)} f := \sum_{i=0}^n\left(Q_i^{(1)} - Q_{i-1}^{(1)}\right)\otimes Q_{n-i}^{(d-1)} f \qquad (4.1)$$
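For illustration, a minimal Python sketch (ours) of a Smolyak quadrature built from nested trapezoidal rules; we use the equivalent representation as a sum, over levels with $|\mathbf l|_1\le n$, of tensor products of 1D difference rules, with the convention $Q_{-1}^{(1)} := 0$ (the choice of 1D rule and of the test integrand are ours):

```python
import numpy as np
from itertools import product

def trap_rule(l):
    """Nested 1D trapezoidal rule Q_l on [0, 1] with 2**l + 1 points."""
    m = 2 ** l
    x = np.arange(m + 1) / m
    w = np.full(m + 1, 1.0 / m)
    w[0] *= 0.5
    w[-1] *= 0.5
    return x, w

def diff_rule(l):
    """Difference rule Q_l - Q_{l-1} on the level-l points (Q_{-1} := 0)."""
    x, w = trap_rule(l)
    if l > 0:
        _, w_coarse = trap_rule(l - 1)
        w = w.copy()
        w[::2] -= w_coarse   # coarse points sit at the even fine indices
    return x, w

def smolyak(f, d, n):
    """Sum of tensor products of 1D difference rules over |l|_1 <= n."""
    total = 0.0
    for lvl in product(range(n + 1), repeat=d):
        if sum(lvl) > n:
            continue
        rules = [diff_rule(l) for l in lvl]
        for idx in product(*[range(len(x)) for x, _ in rules]):
            pt = np.array([rules[j][0][idx[j]] for j in range(d)])
            wt = np.prod([rules[j][1][idx[j]] for j in range(d)])
            total += wt * f(pt)
    return total

f = lambda x: np.exp(np.sum(x))            # integral over [0,1]^3: (e-1)^3
print(smolyak(f, 3, 6), (np.e - 1) ** 3)   # close agreement
```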
freedom per element or per grid point for a higher p. However, to determine a polynomial $u^{(p)}(x)$ of degree p uniquely on $[x_{l,i}-h_l,\, x_{l,i}+h_l]$, we need p + 1 conditions that $u^{(p)}(x)$ has to fulfil. In the linear case, the interpolant resulting from the hierarchically higher levels is defined by its values in the two boundary points $x_{l,i}\pm h_l$. For p ≥ 2, these two conditions are no longer sufficient. Therefore, we profit from the hierarchical history of $x_{l,i}$. Figure 4.4 shows the hierarchical relations of the grid points according to the hierarchical subspace splitting of Section 3.1. Apart from $x_{l,i}\pm h_l$, which mark the boundary of the support of $\phi^{(p)}_{l,i}$, $x_{l,i}$ may have hierarchical ancestors that are all located outside this support. Consequently, for the definition of such a local interpolant $u^{(p)}(x)$, it is reasonable and, for the construction of a hierarchical basis, essential to take the values of u in $x_{l,i}\pm h_l$ (as in the linear case) and, in addition, in a sufficient number of hierarchically next ancestors of $x_{l,i}$. These considerations lead us to the following definition.
Let $u\in C^{p+1}([0,1])$, $1\le p\le l$, and let $\Omega_l$ denote the 1D grid of mesh width $h_l = 2^{-l}$ with grid points $x_{l,i}$ according to (3.4)–(3.6). Then, the hierarchical Lagrangian interpolant $u^{(p)}(x)$ of degree p of u(x) with respect to $\Omega_l$ is defined on $[x_{l,i}-h_l,\, x_{l,i}+h_l]$, i odd, as the polynomial interpolant of $(x_k, u(x_k))$, $k = 1,\dots,p+1$, where the $x_k$ are just $x_{l,i}\pm h_l$ and the p − 1 next hierarchical ancestors of $x_{l,i}$. Note that $u^{(p)}(x)$ is continuous on $\bar\Omega$, piecewise of polynomial degree p with respect to the grid $\Omega_{l-1}$, and it interpolates u(x)
Lemma 4.1. Let $u\in C^{p+1}([0,1])$, $1\le p\le l$, and let $x_1 < \cdots < x_{p+1}$ be the ancestors of $x_{l,i}$ on level l, i odd, taken for the construction of the hierarchical Lagrangian interpolant $u^{(p)}$ of u in $[x_{l,i}-h_l,\, x_{l,i}+h_l]$. Then the hierarchical surplus $v^{(p)}(x_{l,i})$ in $x_{l,i}$ fulfils

$$v^{(p)}(x_{l,i}) := u(x_{l,i}) - u^{(p)}(x_{l,i}) = \frac{1}{(p+1)!}\cdot D^{p+1}u(\xi)\cdot\prod_{k=1}^{p+1}\left(x_{l,i} - x_k\right) \qquad (4.5)$$
$$\left|v^{(p)}(x_{l,i})\right| \le \frac{1}{(p+1)!}\cdot\left|D^{p+1}u(\xi)\right|\cdot h_l^{p+1}\cdot\prod_{k=1}^{p}\left(2^k - 1\right)$$
$$\le \frac{1}{(p+1)!}\cdot\left|D^{p+1}u(\xi)\right|\cdot h_l^{p+1}\cdot2^{p\cdot(p+1)/2-1},$$

which is (4.6).
Hence, we obtain the desired increase in the order of approximation, for the price of a factor growing exponentially in $p^2$. This is a hint that increasing p on each new level will not be the best strategy.
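This construction can be mimicked in a few lines (our sketch): gather $x_{l,i}\pm h_l$ and the next hierarchical ancestors, interpolate u there with a single polynomial, and take the surplus as the difference at $x_{l,i}$; the ancestor walk and the use of numpy.polyfit are our implementation choices.

```python
import numpy as np

def ancestor_chain(l, i):
    """Coordinates of the hierarchical ancestors of x_{l,i} (i odd), from
    the direct parent upwards; the boundary points 0 and 1 close the chain."""
    pts = []
    while l > 1:
        # of the two candidate parent indices, exactly one is odd
        i = (i - 1) // 2 if ((i - 1) // 2) % 2 else (i + 1) // 2
        l -= 1
        pts.append(i * 2.0 ** (-l))
    pts.extend([0.0, 1.0])
    return pts

def hierarchical_surplus(u, l, i, p):
    """Degree-p hierarchical Lagrangian surplus at x_{l,i}: interpolate u at
    x_{l,i} +- h_l and the p - 1 next ancestors, evaluate at x_{l,i}."""
    h = 2.0 ** (-l)
    x0 = i * h
    nodes = [x0 - h, x0 + h]
    for a in ancestor_chain(l, i):
        if len(nodes) == p + 1:
            break
        if a not in nodes:
            nodes.append(a)
    coeff = np.polyfit(nodes, [u(t) for t in nodes], len(nodes) - 1)
    return u(x0) - np.polyval(coeff, x0)

u = lambda x: x ** 3
print(hierarchical_surplus(u, 3, 3, 1))  # piecewise linear surplus
print(hierarchical_surplus(u, 3, 3, 3))  # cubic interpolant: surplus ~ 1e-16
```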
As an analogue to (3.24), an integral representation can be shown for the general order hierarchical surplus, too. For that, define $s^{(p)}_{l,i}(t)$ as the minimum support spline with respect to $x_{l,i}$ and its p + 1 direct hierarchical ancestors (renamed $x_0,\dots,x_{p+1}$ in increasing order):

$$s^{(p)}_{l,i}(t) := \left[x_0,\dots,x_{p+1}\right](x-t)^p_+ = \sum_{k=0}^{p+1}\frac{(x_k-t)^p_+}{w'_{l,i}(x_k)}. \qquad (4.7)$$

Here, $\left[x_0,\dots,x_{p+1}\right]f(x)$ just denotes the divided difference of f with respect to x at the points $x_0,\dots,x_{p+1}$,

$$(x-t)^p_+ := \begin{cases} (x-t)^p, & \text{for } x-t\ge0,\\ 0, & \text{otherwise},\end{cases} \qquad (4.8)$$

and

$$w_{l,i}(x) := \prod_{j=0}^{p+1}(x-x_j). \qquad (4.9)$$
Lemma 4.2. With the above definitions, we get the following integral representation for the hierarchical surplus $v^{(p)}(x_{l,i})$:

$$v^{(p)}(x_{l,i}) = \frac{w'_{l,i}(x_{l,i})}{p!}\cdot\int_{-\infty}^\infty s^{(p)}_{l,i}(t)\cdot D^{p+1}u(t)\,dt. \qquad (4.10)$$
$$D^{p-1}s^{(p)}_{l,i}(t) = (-1)^{p-1}\cdot\sum_{k=0}^{p+1}\frac{(x_k-t)_+}{w'_{l,i}(x_k)} \qquad (4.12)$$
$$\sigma^{(p)}_{\mathbf l,\mathbf i}(x) := \prod_{j=1}^d\frac{w'_{l_j,i_j}(x_{l_j,i_j})}{p_j!}\cdot s^{(p_j)}_{l_j,i_j}(x_j), \qquad (4.21)$$

where $w_{l_j,i_j}(x_j)$ and $s^{(p_j)}_{l_j,i_j}(x_j)$ are defined exactly as in (4.9) and (4.7), but now for direction $x_j$ and based on the respective hierarchical ancestors. With the help of (4.21) we obtain an integral representation analogous to (3.24).
Lemma 4.5. For $u\in X_0^{\mathbf p+\mathbf 1,q}$, the hierarchical surplus $v^{(p)}_{\mathbf l,\mathbf i}$ fulfils

$$v^{(p)}_{\mathbf l,\mathbf i} = \int_\Omega\sigma^{(p)}_{\mathbf l,\mathbf i}(x)\cdot D^{\mathbf p+\mathbf 1}u(x)\,dx. \qquad (4.22)$$

Again, note the close relations between the hierarchical approach and integral transforms. Applying successive partial integration to (4.22), we get

$$v^{(p)}_{\mathbf l,\mathbf i} = \int_\Omega\sigma^{(p)}_{\mathbf l,\mathbf i}(x)\,D^{\mathbf p+\mathbf 1}u(x)\,dx = (-1)^{|\mathbf p+\mathbf 1|_1}\int_\Omega\hat\sigma^{(p)}_{\mathbf l,\mathbf i}(x)\,u(x)\,dx, \qquad (4.23)$$

where $\hat\sigma^{(p)}_{\mathbf l,\mathbf i}(x)$ equals $D^{\mathbf p+\mathbf 1}\sigma^{(p)}_{\mathbf l,\mathbf i}(x)$ in a weak sense and is a linear combination of $(p+2)^d$ Dirac pulses of alternating sign. Thus, again, the surplus can be interpreted as the coefficient resulting from an integral transform based on the function $\hat\sigma^{(p)}_{\mathbf l,\mathbf i}(x)$.
Next, (4.22) leads us to upper bounds for the d-dimensional surplus of degree p and for $W_{\mathbf l}$'s contribution $u^{(p)}_{\mathbf l}$ to the hierarchical representation of u.

and with the seminorms $|u|_{\boldsymbol\alpha,\infty}$ and $|u|_{\boldsymbol\alpha,2}$ defined in (3.3), the following estimates hold for the d-dimensional hierarchical surplus $v^{(p)}_{\mathbf l,\mathbf i}$:

$$\left|v^{(p)}_{\mathbf l,\mathbf i}\right| \le \left(\frac12\right)^d\cdot c(\mathbf p)\cdot2^{-|\mathbf l\cdot(\mathbf p+\mathbf 1)|_1}\cdot|u|_{\mathbf p+\mathbf 1,\infty}, \qquad (4.25)$$
$$\left|v^{(p)}_{\mathbf l,\mathbf i}\right| \le \left(\frac16\right)^{d/2}\cdot c(\mathbf p)\cdot2^{-|\mathbf l\cdot(\mathbf p+\mathbf 1)|_1}\cdot2^{|\mathbf l|_1/2}\cdot\left\|D^{\mathbf p+\mathbf 1}u|_{\mathrm{supp}(\phi_{\mathbf l,\mathbf i})}\right\|_2.$$
Proof. Owing to (4.21), (4.22), and $s^{(p_j)}_{l_j,i_j}(x_j)\ge0$, we have

$$\left|v^{(p)}_{\mathbf l,\mathbf i}\right| = \left|\int_\Omega\sigma^{(p)}_{\mathbf l,\mathbf i}(x)\cdot D^{\mathbf p+\mathbf 1}u(x)\,dx\right| = \left|\int_\Omega\prod_{j=1}^d\frac{w'_{l_j,i_j}(x_{l_j,i_j})}{p_j!}\cdot s^{(p_j)}_{l_j,i_j}(x_j)\cdot D^{\mathbf p+\mathbf 1}u(x)\,dx\right|.$$

Because of (4.10), each of the d factors in the above product is the absolute value of the hierarchical surplus of $x_j^{p_j+1}/(p_j+1)!$, for $j = 1,\dots,d$, which is bounded by

$$\frac{1}{(p_j+1)!}\cdot h_{l_j}^{p_j+1}\cdot2^{p_j\cdot(p_j+1)/2-1},$$

so that

$$\left|v^{(p)}_{\mathbf l,\mathbf i}\right| \le \prod_{j=1}^d\frac{1}{(p_j+1)!}\cdot h_{l_j}^{p_j+1}\cdot2^{p_j\cdot(p_j+1)/2-1}\cdot|u|_{\mathbf p+\mathbf 1,\infty}.$$
According to (4.21), and since $s^{(p_j)}_{l_j,i_j}(x_j)\ge0$,

$$\left\|\sigma^{(p)}_{\mathbf l,\mathbf i}\right\|_2 = \prod_{j=1}^d\left\|\frac{w'_{l_j,i_j}(x_{l_j,i_j})}{p_j!}\cdot s^{(p_j)}_{l_j,i_j}\right\|_2 = \prod_{j=1}^d\left(\int_{[0,1]}\left(\frac{w'_{l_j,i_j}(x_{l_j,i_j})}{p_j!}\cdot s^{(p_j)}_{l_j,i_j}(x_j)\right)^2 dx_j\right)^{1/2}$$
$$\le \prod_{j=1}^d\left(\max_{x_j\in[0,1]}\left|\frac{w'_{l_j,i_j}(x_{l_j,i_j})}{p_j!}\cdot s^{(p_j)}_{l_j,i_j}(x_j)\right|\cdot\frac{1}{(p_j+1)!}\cdot h_{l_j}^{p_j+1}\cdot2^{p_j\cdot(p_j+1)/2-1}\right)^{1/2}$$
$$\le \prod_{j=1}^d\left(\max_{x_j\in[0,1]} s^{(p_j)}_{l_j,i_j}(x_j)\cdot\frac{1}{p_j!\cdot(p_j+1)!}\cdot h_{l_j}^{2\cdot(p_j+1)}\cdot2^{p_j\cdot(p_j+1)-2}\right)^{1/2}$$
$$\left\|u^{(p)}_{\mathbf l}\right\|_E \le 3.257\cdot\left(\frac5{12}\right)^{d/2}\cdot c(\mathbf p)\cdot2^{-|\mathbf l\cdot(\mathbf p+\mathbf 1)|_1}\cdot\left(\sum_{j=1}^d 2^{2l_j}\right)^{1/2}\cdot|u|_{\mathbf p+\mathbf 1,2}.$$

Proof. All results follow from the previous lemmata, with arguments completely analogous to the piecewise linear case.
Now, we are ready for the optimization process studied in detail for the piecewise linear case in Section 3. For the p-regular scenario, a slight simplification allows us to use the diagonal subspace pattern again, that is,

$$V_n^{(1,1)} := V_n^{(1)},\qquad V_n^{(p,1)} := \bigoplus_{|\mathbf l|_1\le n+d-1} W^{(p)}_{\mathbf l}\quad\text{for } p > 1. \qquad (4.27)$$

As before, note that $\mathbf p$ is not constant, owing to (4.16). The following theorem deals with the approximation quality of these sparse grid spaces $V_n^{(p,1)}$.
Theorem 4.8. For the $L_\infty$-, $L_2$-, and energy norm, the following bounds for the error of the interpolant $u_n^{(p,1)}\in V_n^{(p,1)}$ of $u\in X_0^{p+1,q}$ hold:

$$\left\|u - u_n^{(p,1)}\right\|_\infty \le \frac{0.5585^d}{2^{p+1}}\cdot c(p\cdot\mathbf 1)\cdot|u|_{(p+1)\cdot\mathbf 1,\infty}\cdot A(d,n)\cdot h_n^{p+1} + O(h_n^{p+1}) = O\!\left(h_n^{p+1}\cdot n^{d-1}\right), \qquad (4.28)$$
Proof. Actually, there is just one major difference compared to the proof of (3.68) dealing with the piecewise linear case. Now, owing to (4.16), the polynomial degree $\mathbf p$ is not constant for all subspaces $W^{(p)}_{\mathbf l}$ neglected in $V_n^{(p,1)}$, but depends on the respective level $\mathbf l$. However, the influence of all subspaces with $\mathbf p < p\cdot\mathbf 1$ can be collected in a term of the order $O(h_n^{p+1})$ with respect to the $L_\infty$- and $L_2$-norm, or of the order $O(h_n^p)$ with respect to the energy norm, if $n\ge p-1$: for sufficiently large n, each of those subspaces involves at least one coordinate direction $x_j$ with $l_j = O(n)$ and $p_j = p$.

Therefore we can proceed as in the proof of (3.68) and assume a constant degree $\mathbf p = p\cdot\mathbf 1$. With (4.26) and (3.66) for $s = p+1$, we get

$$\left\|u - u_n^{(p,1)}\right\|_\infty \le \sum_{|\mathbf l|_1>n+d-1}\left\|u^{(p)}_{\mathbf l}\right\|_\infty$$
$$\le 0.5585^d\cdot c(p\cdot\mathbf 1)\cdot\sum_{|\mathbf l|_1>n+d-1} 2^{-(p+1)\cdot|\mathbf l|_1}\cdot|u|_{(p+1)\cdot\mathbf 1,\infty} + O(h_n^{p+1})$$
$$= 0.5585^d\cdot c(p\cdot\mathbf 1)\cdot|u|_{(p+1)\cdot\mathbf 1,\infty}\cdot\sum_{|\mathbf l|_1>n+d-1} 2^{-(p+1)\cdot|\mathbf l|_1} + O(h_n^{p+1})$$
and, as for the linear case,

$$\left\|u - u_n^{(p,1)}\right\|_E \le 3.257\cdot\left(\frac58\right)^{d/2}\cdot c(p\cdot\mathbf 1)\cdot|u|_{(p+1)\cdot\mathbf 1,\infty}\cdot\sum_{i=n+d}^\infty 2^{-(p+1)\cdot i}\cdot\sum_{|\mathbf l|_1=i}\left(\sum_{j=1}^d 4^{l_j}\right)^{1/2} + O(h_n^p)$$
$$\le 3.257\cdot\left(\frac58\right)^{d/2}\cdot c(p\cdot\mathbf 1)\cdot|u|_{(p+1)\cdot\mathbf 1,\infty}\cdot d\cdot\sum_{i=n+d}^\infty 2^{-p\cdot i} + O(h_n^p)$$
$$\le 3.257\cdot\frac{\sqrt5^{\,d}}{\sqrt2\cdot2^{p+1}}\cdot c(p\cdot\mathbf 1)\cdot\frac{d\cdot|u|_{(p+1)\cdot\mathbf 1,\infty}}{1-2^{-p}}\cdot h_n^p + O(h_n^p) = O(h_n^p),$$

because $\sum_{|\mathbf l|_1=i}\left(\sum_{j=1}^d 4^{l_j}\right)^{1/2}\le d\cdot2^i$ as in the proof of (3.68). The second energy estimate can be obtained in an analogous way.
This theorem shows that our approach, indeed, leads to a sparse grid approximation of higher order. For the space $V_n^{(p,1)}$, i.e., for a maximum degree of p in each direction, we get an interpolation error of the order $O(h_n^{p+1}\cdot|\log_2 h_n|^{d-1})$ with respect to both the $L_\infty$- and the $L_2$-norm. For the energy norm, the result is an error of the order $O(h_n^p)$.
Of course, the above optimization process can be based on the energy norm, too. For that, we start from (4.26) and define the local cost–benefit ratio, now with respect to the energy norm (cf. (3.69)):

$$\mathrm{cbr}_E(\mathbf l) := \frac{b_E(\mathbf l)}{c(\mathbf l)} = 10.608\cdot\left(\frac54\right)^d\cdot c^2(\mathbf p)\cdot2^{-|2\cdot\mathbf l\cdot\mathbf p+3\cdot\mathbf l|_1}\cdot\sum_{j=1}^d 4^{l_j}\cdot|u|^2_{\mathbf p+\mathbf 1,\infty}. \qquad (4.29)$$
Finally, for the same problem tackled by the energy-based sparse grid approximation space $V_n^{(p,E)}$, we get

$$N_E^{(p)}(\varepsilon) = O\!\left(\varepsilon^{-\frac1p}\right)\quad\text{and}\quad\varepsilon_E^{(p)}(N) = O\!\left(N^{-p}\right). \qquad (4.33)$$
Proof. The proof follows exactly the argumentation of the linear case.
In comparison with the full grid case, where we get $N_\infty^{(p)}(\varepsilon) = O(\varepsilon^{-d/(p+1)})$ and $\varepsilon_\infty^{(p)}(N) = O(N^{-(p+1)/d})$ with respect to the $L_\infty$- or $L_2$-norm and $N_E^{(p)}(\varepsilon) = O(\varepsilon^{-d/p})$ and $\varepsilon_E^{(p)}(N) = O(N^{-p/d})$ with respect to the energy norm, as before, the sparse grid space $V_n^{(p,1)}$ lessens the curse of dimensionality in a significant manner; $V_n^{(p,E)}$, however, completely overcomes it.
F is the interpolant for the data $y(s) = \delta_{0,s}$, $s\in\mathbb Z$. Now, the interpolet mother function of order 2n is just φ := F. A hierarchical multiscale basis is formed from that φ by dilation and translation as in (3.8). The function φ has the following properties.

Compact support.
³ This is the case for stable splittings with wavelets. Then a simple fast solver results from the diagonal scaling preconditioner: see Dahmen and Kunoth (1992) and Oswald (1994).
on finer levels $l_j > 0$. The basic idea is to choose the weights $Q^{l_j}_{s,i_j}$ in this linear combination in such a way that $\hat\phi_{l_j,i_j}$ has more vanishing moments than $\phi_{l_j,i_j}$, and thus to obtain a stabilization effect. If we apply this approach to the hierarchical interpolet basis of Section 4.3 so that we achieve two vanishing moments in the lifting wavelet basis, we end up with

$$\hat\phi_{l_j,2i_j+1} = \phi_{(l_j,2i_j+1)} - \frac14\left(\phi_{(l_j-1,i_j)} + \phi_{(l_j-1,i_j+1)}\right)$$

for the odd indices $2i_j+1$. The corresponding mother function $\hat\phi$ is shown in Figure 4.12.
Again, as in our introductory example of piecewise linear hierarchical bases of Section 3.1, we can use these 1D multilevel basis functions as the input of the tensor product construction which provides a suitable piecewise d-dimensional basis function $\phi_{\mathbf l,\mathbf i}(x)$ in each grid point $x_{\mathbf l,\mathbf i}$. As in Sections 3 and 4.2, we can derive sparse grids based on wavelets. Also, most other considerations regarding the estimates for the cost and error complexities carry over accordingly. Further information on wavelets and sparse grids can be found in Griebel and Oswald (1995b), Hochmuth (1999), DeVore, Konyagin and Temlyakov (1998), Koster (2002), Knapek (2000a), Griebel and Knapek (2000) and the references cited therein.
At the end of this subsection, let us delve a bit into theory, and give an argument why there is a difference between hierarchical polynomials and interpolets on the one hand and prewavelets and pre-prewavelets on the other.
For all mother functions and the resulting product multiscale bases and sparse grid subspaces, we can, in principle, follow the arguments and proofs of Sections 3 and 4.2, respectively, to obtain cost complexity estimates and upper error bounds with relatively sharp estimates for the order constants. The main tools in the proofs are the simple triangle inequality and geometric series arguments. However, as already mentioned, a lower bound for the error cannot be obtained so easily. An alternative approach is that of Griebel and Oswald (1995b) and Knapek (2000a), where we developed a technique by which two-sided error norm estimates for the 1D situation can be carried over to the higher-dimensional case. The approach is based on
the representation of Sobolev spaces $H^s([0,1]^d)$, $s\ge0$, as

$$H^s([0,1]^d) = \bigcap_{i=1}^d \underbrace{L_2([0,1])\otimes\cdots\otimes L_2([0,1])}_{(i-1)\ \text{times}}\otimes\, H^s([0,1])\,\otimes\, L_2([0,1])\otimes\cdots\otimes L_2([0,1]). \qquad (4.39)$$
For the proof and further details, see Griebel and Oswald (1995b), Knapek (2000a) and Koster (2002). Here, the bounds $\gamma_1$, $\gamma_2$ for the range of the regularity parameter s depend on the specific choice of the mother function φ. The value $\gamma_2$ is determined by the Sobolev regularity of φ, i.e., $\gamma_2 = \sup\{s : \phi\in H^s\}$. The theory here works with biorthogonality arguments and dual spaces: our univariate multiscale basis $\{\phi_{l_j,i_j}\}$ possesses a dual basis $\{\tilde\phi_{l_j,i_j}\}$, in the sense that (for our hierarchical functions⁴) the
⁴ In the general setting, there is a multiresolution analysis with spaces spanned by scaling functions and difference spaces spanned by wavelets. Then, biorthogonality conditions must be fulfilled between all these function systems. In the hierarchical approach, certain scaling functions are simply wavelets, and the biorthogonality conditions are reduced.
⁵ The reason is that the dual 'functions' are Dirac functionals. Then $\gamma_1 = -\frac12$ and the norm equivalency (4.40) is only valid for $s\in\left]\frac12,\gamma_2\right[$, i.e., it does not hold for s = 0.
Applications
Flow problems were the first focus for the use of sparse grid PDE solvers. Now, however, sparse grids are also used for problems from quantum mechanics (Garcke 1998, Garcke and Griebel 2000, Yserentant 2004, Hackbusch 2001), for problems in the context of stochastic differential equations (Schwab and Todor 2002, 2003), or for the discretization of differential forms arising from Maxwell's equations (Hiptmair and Gradinaru 2003).

Aside from the field of PDEs, sparse grids are being applied to a variety of problems that shall not be forgotten here. Among these problems are integral equations (Frank, Heinrich and Pereverzev 1996, Griebel, Oswald and Schiekofer 1999, Knapek and Koster 2002, Knapek 2000b), general operator equations (Griebel and Knapek 2000, Knapek 2000a, Hochmuth, Knapek and Zumbusch 2000), eigenvalue problems (Garcke 1998, Garcke and Griebel 2000), periodic interpolation (Pöplau and Sprengel 1997), interpolation on Gauss–Chebyshev grids (Sprengel 1997b), Fourier transforms (Hallatschek 1992), tabulation of reduced chemical systems (Heroth 1997), digital elevation models and terrain representation (Gerstner 1995, 1999), audio and image compression (Frank 1995, Paul 1995) (see Figure 4.15), and possibly others not listed here.
5. Numerical experiments
In this section, we present a collection of numerical results for different problems solved on sparse grids. We start with the discussion of the basic features of sparse grid methods for PDEs, applied to simpler 2D and 3D model problems. Then we turn to the solution of the Navier–Stokes equations on sparse grids. Finally, we illustrate the potential of sparse grids for problems of a higher dimensionality, here in the context of numerical quadrature and data mining.
Figure 5.6. Example (5.3): convergence on $V_n^{(p,1)}$ for p = 1 (left) and p = 2 (right); the solid lines indicate the respective expected sparse grid convergence (sgc; position of curve chosen for clarity).
Figure 5.7 shows a gain in accuracy with higher p that is comparable to the smooth situation of (5.1).

In Figure 5.8, we compare the error on a regular sparse grid and on an adaptively refined one. As expected, the $l_2$-adaptation process reduces the error equally over the whole domain. In contrast to that, regular sparse grids show large errors near the singularity.

As for (5.1), the achieved accuracy will be compared with the theoretical results concerning the ε-complexity of interpolation on sparse grids. In Figure 5.9, again for p = 1, the correspondence is striking. For p = 6, it seems that the asymptotic behaviour needs larger values of N to appear. This was to be expected, since our hierarchical Lagrangian basis polynomials of degree p = 6 need at least level 5 to enter the game. However, the adaptation process causes a delayed creation of new grid points in the
$$\Delta u(x) = 0\ \ \text{in } \Omega,\qquad u(x) = \frac{\sinh(\sqrt2\,\pi x_1)}{\sinh(\sqrt2\,\pi)}\cdot\sin(\pi x_2)\cdot\sin(\pi x_3)\ \ \text{on } \partial\Omega. \qquad (5.4)$$
For the polynomial degrees p ∈ {1, …, 4}, Figure 5.11 compares the accuracy with respect to the error's maximum norm or its $l_2$-norm, respectively. Again, the effects of the improved approximation properties of our hierarchical polynomial bases are evident.

Figure 5.12 shows that we do already come quite close to the asymptotic behaviour predicted for the quality of mere interpolation. This is true in spite of the fact that up to 100 000 degrees of freedom are not excessive for a 3D problem. Remember that, although the curse of dimensionality is lessened significantly by the $L_2$- or $L_\infty$-based sparse grid spaces $V_n^{(1)}$ and $V_n^{(p,1)}$, there is still some d-dependence. Evaluating the respective order terms of the ε-complexity derived before and given once more in (5.2) for the 3D case, we observe an exponent of (p + 2)·(d − 1) = 2p + 4 in the log-factor, i.e., an exponent of 12 for polynomial degree p = 4, for example. Nevertheless, as in the 2D case of (5.1) and (5.3), the benefit caused by higher polynomial degrees in combination with sparse grid discretization is again rather impressive.
Figure 5.15. Example (5.5): regular sparse grid spaces $V_n^{(p,1)}$, p-asymptotic proceeding up to p = 12 with quadruple precision.
Figure 5.17. Example (5.6): convergence on $V_n^{(p,1)}$ for p = 2 (left) and p = 6 (right); the solid lines indicate the respective expected sparse grid convergence (sgc; position of curve chosen for clarity).
is the solution. We present results for p ∈ {2, 4, 6}. Note that, owing to the smoothness of u(x), we restrict ourselves to the regular sparse grid spaces $V_n^{(p,1)}$. Figure 5.16 shows the maximum and the $l_2$-error.

Obviously, the higher-order approximation of our hierarchical Lagrangian basis polynomials comes to fruition in the more general situation of example (5.6), too. Figure 5.17 illustrates that we come close to the expected asymptotic behaviour already for moderate values of N.
solves

$$-\nabla\cdot\left(\hat A(x)\nabla u(x)\right) + \hat b(x)\cdot\nabla u(x) + \hat c(x)\,u(x) = f(x) \qquad (5.9)$$

on the unit cube with Dirichlet boundary conditions. Figure 5.19 (overleaf) illustrates the solution u(x) due to (5.8) for the two $x_3$-values 0.5 and 0.875.
Numerical results are presented in Figure 5.18.
Merging of modons
Now, we apply the adaptive version (Griebel and Koster 2000, Koster 2002, Griebel and Koster 2003) of this sparse grid interpolet solver to the model problem of the interaction of three vortices in a 2D flow. Here we use the interpolets of Section 4.3 with N = 6.

The initial velocity is induced by three vortices, each with a Gaussian vorticity profile

$$\omega(x,0) = \omega_0 + \sum_{i=1}^3\omega_i\,\exp\!\left(\frac{-\|x - x_i\|^2}{\sigma_i^2}\right),\qquad x\in[0,1]^2.$$

The first two vortices have the same positive sign, $\omega_1,\omega_2 > 0$, and the third has a negative sign; see Figure 5.20.

The different parameters $\omega_0$, $\omega_i$, and $\sigma_i$ are chosen such that the mean value of ω(·, 0) vanishes and that $\omega(\cdot,0)|_{\partial[0,1]^2}$ is almost $\omega_0$, to allow for periodic boundary conditions.
Owing to the induced velocity field, the three vortices start to rotate
around each other. In a later stage, the two same-sign vortices merge, which
leads to a configuration of two counter-rotating vortices. This process is the
basic mechanism in 2D turbulence, and it takes place, e.g., in the shear
layer problem of the next subsection or during the convergence of ω to
the solution of the Joyce–Montgomery equation (Chorin 1998) for random
initial vorticity fields.
Figure 5.22. Velocity profile $u(y) = U\tanh(y/\delta)$ of the initial condition for the shear flow model problem. The thickness of the boundary layer is defined as $\delta = 2U/\max_y|\partial_y u(y)|$.
with

$$R(f) = \frac1M\sum_{i=1}^M C\big(f(x_i), y_i\big) + \lambda\,\Phi(f). \qquad (5.15)$$

Here, C(·, ·) denotes an error cost function which measures the interpolation error and Φ(f) is a smoothness functional which must be well defined for f ∈ V. The first term enforces closeness of f to the data, the second term enforces smoothness of f, and the regularization parameter λ balances these two terms.
$$f_N = \sum_{j=1}^N\alpha_j\,\varphi_j(x). \qquad (5.16)$$

Here $\{\varphi_j\}_{j=1}^N$ should span $V_N$ and preferably should form a basis for $V_N$. The coefficients $\{\alpha_j\}_{j=1}^N$ denote the degrees of freedom. Note that the restriction to a suitably chosen finite-dimensional subspace involves some additional regularization (regularization by discretization) which depends on the choice of $V_N$. In this way we obtain from the minimization problem a feasible linear system. We thus have to minimize

$$R(f_N) = \frac1M\sum_{i=1}^M\big(f_N(x_i) - y_i\big)^2 + \lambda\,\|P f_N\|^2_{L_2},\qquad f_N\in V_N, \qquad (5.17)$$
$$\lambda\sum_{j=1}^N\alpha_j\,(P\varphi_j, P\varphi_k)_{L_2} + \frac1M\sum_{j=1}^N\alpha_j\sum_{i=1}^M\varphi_j(x_i)\cdot\varphi_k(x_i) = \frac1M\sum_{i=1}^M y_i\,\varphi_k(x_i), \qquad (5.21)$$

and we obtain (k = 1, …, N)

$$\sum_{j=1}^N\alpha_j\left[M\lambda\,(P\varphi_j, P\varphi_k)_{L_2} + \sum_{i=1}^M\varphi_j(x_i)\cdot\varphi_k(x_i)\right] = \sum_{i=1}^M y_i\,\varphi_k(x_i). \qquad (5.22)$$
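In matrix form, (5.22) reads $(M\lambda\,C + B B^T)\,\alpha = B y$ with $C_{jk} = (P\varphi_j, P\varphi_k)_{L_2}$ and $B_{ji} = \varphi_j(x_i)$. A minimal sketch (ours) with 1D nodal hat functions and the simplifying assumption P = Id, so that C is the mass matrix; the actual method uses sparse grid basis functions and other choices of P:

```python
import numpy as np

def hat(j, N, x):
    """Nodal hat function centred at x_j = j/(N+1) on a uniform 1D grid."""
    h = 1.0 / (N + 1)
    return np.maximum(0.0, 1.0 - np.abs(x / h - j))

def fit_regularized(x, y, N, lam):
    """Solve (M*lam*C + B @ B.T) alpha = B @ y, cf. (5.22), with P = Id."""
    M = len(x)
    B = np.array([hat(j, N, x) for j in range(1, N + 1)])  # B[j-1, i] = phi_j(x_i)
    h = 1.0 / (N + 1)
    # Mass matrix of 1D hat functions: 2h/3 on the diagonal, h/6 next to it.
    C = h / 6.0 * (4.0 * np.eye(N) + np.eye(N, k=1) + np.eye(N, k=-1))
    return np.linalg.solve(M * lam * C + B @ B.T, B @ y)

rng = np.random.default_rng(1)
x = rng.random(200)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(200)
alpha = fit_regularized(x, y, N=15, lam=1e-4)   # coefficients of f_N
```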
[Figure: 10-fold correctness (%) on training and testing data plotted against the regularization parameter λ, for levels 7 to 10.]
To evaluate our method we give the correctness rates on testing data sets
if available and the 10-fold cross-validation results on training and testing
data sets. For the 10-fold cross-validation we proceed as follows. We divide
the training data into 10 equally sized disjoint subsets. For i = 1 to 10, we
pick the ith of these subsets as a further testing set and build the sparse grid
combination classifier with the data from the remaining nine subsets. We
then evaluate the correctness rates of the current training and testing set. In
this way we obtain ten different training and testing correctness rates. The
10-fold cross-validation result is just the average of these ten correctness
rates. For further details, see Stone (1974). For a critical discussion on the
evaluation of the quality of classifier algorithms, see Salzberg (1997).
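The procedure is easy to code; in the following sketch (ours), `train` and `accuracy` are placeholder callables for an arbitrary classifier — e.g., the sparse grid combination classifier of this section — and not an API of the original work:

```python
import numpy as np

def ten_fold_cv(X, y, train, accuracy, seed=0):
    """10-fold cross-validation as described above: split the training data
    into 10 disjoint parts, hold out each part once, and average the
    training and testing correctness rates over the ten runs."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), 10)
    train_acc, test_acc = [], []
    for i in range(10):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(10) if j != i])
        model = train(X[train_idx], y[train_idx])
        train_acc.append(accuracy(model, X[train_idx], y[train_idx]))
        test_acc.append(accuracy(model, X[test_idx], y[test_idx]))
    return np.mean(train_acc), np.mean(test_acc)
```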
We first consider 2D problems with small sets of data that correspond to
certain structures. Then we treat problems with huge sets of synthetic data
with up to 5 million points.
The first example is taken from Ho and Kleinberg (1996) and Kaufman
(1999). Here, 1000 training data points were given which are more or less
uniformly distributed in Ω = [0, 1]2 . The associated data values are plus one
or minus one depending on their location in Ω such that a 4 × 4 chessboard
structure appears: see Figure 5.25 (left). We computed the 10-fold cross-
validated training and testing correctness with the sparse grid combination
method for different values of the regularization parameter λ and different
levels n. The results are shown in Figure 5.25 (right).
We see that the 10-fold testing correctness is consistently around 95% for values of λ between 3·10⁻⁵ and 5·10⁻³. Our best 10-fold testing correctness was 96.20% on level 10 with λ = 4.54·10⁻⁵. The chessboard structure is thus reconstructed with less than 4% error.
Another 2D example with structure is the spiral data set, first proposed by Wieland; see also Fahlman and Lebiere (1990). Here, 194 data points describe two intertwined spirals: see Figure 5.26. This is surely an artificial problem which does not appear in practical applications. However, it serves as a hard test case for new data mining algorithms. It is known that neural networks can have severe problems with this data set, and some neural networks cannot separate the two spirals at all. In Figure 5.26 we give the results obtained with our sparse grid combination method with λ = 0.001 for n = 6 and n = 8. Already for level 6, the two spirals are clearly detected and resolved. Note that here only 577 grid points are contained in the sparse grid. For level 8 (2817 sparse grid points), the shape of the two reconstructed spirals gets smoother and the reconstruction gets more precise.
The BUPA Liver Disorders data set from the Irvine Machine Learning
Database Repository (Blake and Merz 1998) consists of 345 data points with
six features plus a selector field used to split the data into two sets with
145 instances and 200 instances, respectively. Here we only have training
data and can therefore only report our 10-fold cross-validation results. No
comparison with unused test data is possible.
We compare with the two best results from Lee and Mangasarian (2001), the smoothed support vector machine (SSVM) introduced therein, and the feature selection concave minimization (FSV) algorithm due to Bradley and Mangasarian (1998). Table 5.1 gives the results for the 10-fold correctness. Our sparse grid combination approach performs on level 3 with λ = 0.0625 at 69.23% 10-fold testing correctness. But our other results were also in this range. Our method performs only slightly worse here than the SSVM but clearly better than FSV. Note that the results for the robust linear program (RLP) algorithm (Bennett and Mangasarian 1992), the support vector machine using the 1-norm approach ($\mathrm{SVM}_{\|\cdot\|_1}$), and the classical support vector machine ($\mathrm{SVM}_{\|\cdot\|_2^2}$) (Bradley and Mangasarian 1998, Cherkassky and Mulier 1998, Vapnik 1995) were reported to be somewhat worse in Lee and Mangasarian (2001).
Next, we produced a 6D data set with 5 million training points and 20 000
points with DatGen (Melli 2003) for testing. We used the call
datgen -r1 -X0/100,R,O :0/100,R,O:0/100,R,O:0/100,R,O:0/200,
R,O:0/200,R,O -R2 -C2/4 -D2/5 -T10/60 -O502 0000 -p -e0.15.
The results are given in Table 5.2. On level one, a testing correctness of 88% was achieved already, which is quite satisfying for this data. We see that really huge data sets of 5 million points could be handled. We also give the CPU time which is needed for the computation of the matrices $G_l = B_l\cdot B_l^T$. Here, more than 96% of the computing time is spent on the matrix assembly. Again, the execution times scale linearly with the number of data points.
Integration
The computation of high-dimensional integrals is a central part of computer simulations in many application areas such as statistical mechanics, financial mathematics, and computational physics. Here, the arising integrals usually cannot be solved analytically, and thus numerical approaches are required. Furthermore, often a high-accuracy solution is needed, and thus such problems can be computationally quite challenging even for parallel supercomputers. Conventional algorithms for the numerical computation of such integrals are usually limited by the curse of dimensionality. However, for special function classes, such as spaces of functions which have bounded mixed derivatives, Smolyak's construction (Smolyak 1963) (see (4.1)) can overcome this curse to a certain extent. In this approach, multivariate quadrature formulas are constructed using combinations of tensor products of appropriate 1D formulas. In this way, the number of function evaluations and the numerical accuracy become independent of the dimension of the problem up to logarithmic factors. Smolyak's construction is simply our sparse grid approach. It has been applied to numerical integration by several authors, using the midpoint rule (Baszenski and Delvos 1993), the rectangle rule (Paskov 1993), the trapezoidal rule (Bonk 1994a), the Clenshaw–Curtis rule (Cools and Maerten 1997, Novak and Ritter 1998), the Gauss rules (Novak and Ritter 1997), and the Gauss–Patterson rules (Gerstner and Griebel 1998, Petras 2000) as the 1D basis integration procedure. The latter approach, in particular, achieves the highest polynomial exactness of all nested quadrature formulas and shows very good results for sufficiently smooth multivariate integrands. Further studies have been
Here, examples are so-called additive models (Hastie and Tibshirani 1990),
multivariate adaptive regression splines (MARS) (Friedman 1991), and the
ANOVA decomposition (Wahba 1990, Yue and Hickernell 2002); see also
Hegland and Pestov (1999). Other interesting techniques for dimension
reduction are presented in He (2001). If the importance of the dimensions
is known a priori , techniques such as importance sampling can be applied
in Monte Carlo methods (Kalos and Whitlock 1986). For the quasi-Monte
Carlo method, a sorting of the dimensions according to their importance
leads to a better convergence rate (yielding a reduction of the effective
dimension). The reason for this is the better distributional behaviour of
low-discrepancy sequences in lower dimensions than in higher ones (Caflisch,
Morokoff and Owen 1997). The sparse grid method, however, a priori treats
all dimensions equally and thus gains no immediate advantage for problems
where dimensions are of different importance.
In Gerstner and Griebel (2003), we developed a generalization of the
conventional sparse grid approach which is able to adaptively assess the
dimensions according to their importance, and thus reduces the dependence
of the computational complexity on the dimension. This is quite in the spirit
of Hegland (2002, 2003). The dimension-adaptive algorithm tries to find
$$\frac{\partial u}{\partial t} = \frac12\cdot\frac{\partial^2 u}{\partial x^2}(x,t) + v(x,t)\cdot u(x,t),$$

with initial condition u(x, 0) = f(x). The solution of this problem can be obtained with the Feynman–Kac formula as

$$u(x,t) = E_{x,0}\!\left[f(\xi(t))\cdot e^{\int_0^t v(\xi(r),\,t-r)\,dr}\right],$$
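A hedged sketch (ours) of how this expectation can be estimated: simulate Brownian paths ξ started at x with the Euler–Maruyama scheme and average the exponentially weighted terminal values; step counts and the sanity check are arbitrary choices.

```python
import numpy as np

def feynman_kac(f, v, x, t, n_paths=100_000, n_steps=200, seed=0):
    """Monte Carlo estimate of u(x,t) = E[f(xi(t)) * exp(int_0^t v(xi(r), t-r) dr)]
    where xi is standard Brownian motion started at x, cf. the formula above."""
    rng = np.random.default_rng(seed)
    dt = t / n_steps
    xi = np.full(n_paths, float(x))
    integral = np.zeros(n_paths)
    for k in range(n_steps):
        integral += v(xi, t - k * dt) * dt   # left-point rule for the exponent
        xi += np.sqrt(dt) * rng.standard_normal(n_paths)
    return np.mean(f(xi) * np.exp(integral))

# Sanity check with v = 0: u(x,t) = E[f(x + W_t)]; for f(s) = s**2 this is x**2 + t.
print(feynman_kac(lambda s: s ** 2, lambda s, r: np.zeros_like(s), x=0.3, t=1.0))  # ~1.09
```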
[Figures: integration error versus number of function evaluations (left), and the level chosen by the dimension-adaptive algorithm versus dimension (right).]
6. Concluding remarks
In this contribution we have given an overview of the basic principles and properties of sparse grids, as well as a report on the state of the art concerning sparse grid applications. Starting from the dominant motivation – breaking the curse of dimensionality – we discussed the underlying tensor product approach, based upon different 1D multiscale bases such as the classical piecewise linear hierarchical basis, general hierarchical polynomial bases, interpolets, or wavelets. We then presented various resulting sparse grid constructions and discussed their properties with respect to computational complexity, discretization error, and smoothness requirements. The approach can be extended to nonsmooth solutions by adaptive refinement methods. We demonstrated the effectiveness of sparse grids in a series of applications. The presented numerical results include 2D and 3D PDE model problems, flow problems, and even two non-PDE applications in higher dimensions, namely numerical quadrature and data mining.

Since their introduction slightly more than a decade ago, sparse grids have seen a very successful development and a variety of different applications.
REFERENCES
S. Achatz (2003a), Adaptive finite Dünngitter-Elemente höherer Ordnung für ellip-
tische partielle Differentialgleichungen mit variablen Koeffizienten, Disserta-
tion, Institut für Informatik, TU München.
S. Achatz (2003b), ‘Higher order sparse grids methods for elliptic partial differential
equations with variable coefficients’, Computing 71, 1–15.
G. Allaire (1992), ‘Homogenization and two-scale convergence’, SIAM J. Math.
Anal. 21, 1482–1516.
A. Allen (1972), Regression and the Moore–Penrose Pseudoinverse, Academic
Press, New York.
E. Arge, M. Dæhlen and A. Tveito (1995), ‘Approximation of scattered data using
smooth grid functions’, J. Comput. Appl. Math. 59, 191–205.
K. Babenko (1960), ‘Approximation by trigonometric polynomials in a certain
class of periodic functions of several variables’, Soviet Math. Dokl. 1, 672–
675. Russian original in Dokl. Akad. Nauk SSSR 132 (1960), 982–985.
R. Balder (1994), Adaptive Verfahren für elliptische und parabolische Differen-
tialgleichungen auf dünnen Gittern, Dissertation, Institut für Informatik, TU
München.
R. Balder and C. Zenger (1996), ‘The solution of multidimensional real Helmholtz
equations on sparse grids’, SIAM J. Sci. Comp. 17, 631–646.
R. Balder, U. Rüde, S. Schneider and C. Zenger (1994), Sparse grid and extra-
polation methods for parabolic problems, in Proc. International Conference
on Computational Methods in Water Resources, Heidelberg 1994 (A. Peters
et al., eds), Kluwer Academic, Dordrecht, pp. 1383–1392.
A. Barron (1993), ‘Universal approximation bounds for superpositions of a sig-
moidal function’, IEEE Trans. Inform. Theory 39, 930–945.
A. Barron (1994), ‘Approximation and estimation bounds for artificial neural net-
works’, Machine Learning 14, 115–133.
G. Baszenski and F. Delvos (1993), Multivariate Boolean midpoint rules, in Nu-
merical Integration IV (H. Brass and G. Hämmerlin, eds), Vol. 112 of Inter-
national Series of Numerical Mathematics, Birkhäuser, Basel, pp. 1–11.
G. Baszenski, F. Delvos and S. Jester (1992), Blending approximations with sine
functions, in Numerical Methods of Approximation Theory 9 (D. Braess and
L. Schumaker, eds), Vol. 105 of International Series of Numerical Mathemat-
ics, Birkhäuser, Basel, pp. 1–19.
B. J. C. Baxter and A. Iserles (2003), ‘On the foundations of computational math-
ematics’, in Handbook of Numerical Analysis, Vol. 11 (F. Cucker, ed.), El-
sevier, pp. 3–35.