Operator Methods For Integrals and Differential Equations
Dan Piponi
October 23, 2017
Abstract
Some unpolished notes on abusing the differential operator $D = \frac{d}{dx}$ for both symbolic
and numeric integration and differential equation solving. Starts with some methods we were
taught at high school(!) for solving differential equations which appear not to be well known
among students today. Apologies for some inconsistencies in notation.
1 Warm up
1.1 A differential equation
Suppose we are given the differential equation:
$$\frac{df}{dx} + f = x^2$$
Let's use the shorthand $D = \frac{d}{dx}$ and rewrite this as
$$(D + 1)f = x^2$$
The standard approach is first to solve the homogeneous form of the equation
$$(D + 1)f = 0$$
giving $f(x) = A\exp(-x)$ for some constant $A$. We then have to find a particular integral, i.e.
any solution to the original equation. The full solution to the original equation is the particular
integral plus $A\exp(-x)$.
So how do we find a particular integral? In my experience students are encouraged to use
educated guesswork. Anything goes as long as you prove that what you have found is indeed a
solution. So in this case a popular approach might be to assume $f(x) = Ax^2 + Bx + C$, substitute
into the equation, and solve for $A$, $B$ and $C$.
Here’s a more direct way:
$$
\begin{aligned}
(D + 1)f &= x^2 \\
f &= \frac{1}{1+D}\,x^2 \\
&= (1 - D + D^2 - D^3 + \dots)x^2 \\
&= x^2 - 2x + 2
\end{aligned}
$$
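If you want to check this mechanically, here is a minimal sympy sketch: apply the truncated series and confirm the result satisfies the equation.

```python
# A minimal check of the operator calculation above using sympy: apply the
# truncated series 1 - D + D^2 to x^2 (D^3 kills a quadratic, so that's enough).
import sympy as sp

x = sp.symbols('x')
rhs = x**2
f = rhs - sp.diff(rhs, x) + sp.diff(rhs, x, 2)
print(f)                                  # x**2 - 2*x + 2
print(sp.simplify(sp.diff(f, x) + f))     # x**2, confirming (D + 1)f = x^2
```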
1.2 An integral
Another example. Suppose we wish to find the integral:
$$\int x^3\exp(2x)\,dx$$
Writing the integral as $\frac{1}{D}\,x^3\exp(2x)$ we can compute
$$
\begin{aligned}
\frac{1}{D}\,x^3\exp(2x) &= \exp(2x)\frac{1}{D+2}\,x^3 \\
&= \exp(2x)\frac{1}{2}\Big(1 - \frac{1}{2}D + \frac{1}{4}D^2 - \frac{1}{8}D^3\Big)x^3 \\
&= \exp(2x)\Big(\frac{x^3}{2} - \frac{3x^2}{4} + \frac{3x}{4} - \frac{3}{8}\Big)
\end{aligned}
$$
We have used the same division and power series operations as above, as well as a "shift rule" that is justified below. And of course for the full
answer we need to add a constant $C$.
My goal here is to sketch (without full rigour) why you might expect such methods to work
and give many more examples. I also want to show how you can stretch these methods to give
some hand-wavey arguments for some well-known theorems.
2 Some justification
2.1 Truncated power series
Let $P(x)$ be a polynomial in $x$. Suppose we can expand $P(x)^{-1}$ as a power series with some
non-empty circle of convergence:
$$\frac{1}{P(x)} = \sum_{i=0}^{\infty} a_i x^i$$
We have that
$$P(x)\sum_{i=0}^{\infty} a_i x^i = 1$$
so
$$P(x)\Big(\sum_{i=0}^{n-1} a_i x^i + \sum_{i=n}^{\infty} a_i x^i\Big) = 1$$
and
$$P(x)\sum_{i=0}^{n-1} a_i x^i = 1 - P(x)\sum_{i=n}^{\infty} a_i x^i$$
For example,
$$(1+x)(1-x+x^2-x^3+x^4) = 1 - x + x^2 - x^3 + x^4 + x - x^2 + x^3 - x^4 + x^5 = 1 + x^5$$
This tells us that we can use truncated power series to compute reciprocals of polynomials as
long as we don’t mind some higher order terms. The nice thing about this is that we know that
for any polynomial Q(x), Dn Q(x) = 0 for n large enough. So we can use truncated power series
in D to obtain exact results when applied to polynomials in x.
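This is easy to mechanise. Below is a small sympy sketch; the helper name `apply_inverse` is my own, not standard sympy.

```python
# Sketch: apply 1/P(D) to a polynomial by expanding 1/P as a truncated
# power series in D. Valid because D^n kills polynomials of degree < n.
import sympy as sp

x, t = sp.symbols('x t')

def apply_inverse(P, poly):
    n = sp.degree(poly, x) + 1
    series = sp.series(1 / P(t), t, 0, n).removeO()   # 1/P(t) up to t^(n-1)
    return sp.expand(sum(series.coeff(t, k) * sp.diff(poly, x, k)
                         for k in range(n)))

print(apply_inverse(lambda t: 1 + t, x**2))   # x**2 - 2*x + 2
```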
Let me rewrite the first argument above in a more rigorous way:
$$
\begin{aligned}
(D + 1)f &= x^2 \\
(1 - D + D^2)(D + 1)f &= (1 - D + D^2)x^2 \\
(1 + D^3)f &= x^2 - 2x + 2
\end{aligned}
$$
so $f = x^2 - 2x + 2$ (assuming that $f$ has no terms beyond $x^2$, so that $D^3 f = 0$).
We can see this as justifying the use of the operator $(D + 1)^{-1}$ as long as we remember that
the original argument is just a kind of shorthand for the rigorous one.
2.2 The shift rule
The other ingredient we used is the shift rule:
$$f(D)\big(\exp(ax)g(x)\big) = \exp(ax)\,f(D + a)\big(g(x)\big)$$
for any polynomial $f$. We also expect this to hold in situations where $f$ is a power series but we
know it's being applied to a polynomial so terms beyond a certain point contribute zero.
Let me make the integral argument rigorous too. We seek $f$ with $Df = x^3\exp(2x)$.
Writing $g(x) = \exp(-2x)f(x)$ we can now use the differential equation solving methods above to
solve $(D + 2)g = x^3$ and our final integral is given by $\exp(2x)g(x)$. Again, we can see the original
argument as being shorthand for this more rigorous argument.
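As a sanity check, here is the same computation in sympy; truncating the series at $D^3$ is enough because $D^4 x^3 = 0$.

```python
# Checking the shift-rule route to the integral of x^3 exp(2x) with sympy.
import sympy as sp

x = sp.symbols('x')

# Solve (D + 2)g = x^3 using 1/(2 + D) = (1/2)(1 - D/2 + D^2/4 - D^3/8 + ...)
g = sum((-sp.Rational(1, 2))**k / 2 * sp.diff(x**3, x, k) for k in range(4))
f = sp.exp(2*x) * g                       # shift back: f = exp(2x) g

print(sp.expand(g))                                      # x**3/2 - 3*x**2/4 + 3*x/4 - 3/8
print(sp.simplify(sp.diff(f, x) - x**3 * sp.exp(2*x)))   # 0, so f is an antiderivative
```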
3 Diagonalisation
The functions sin, cos and exp “diagonalise” various differential operators. This means that dif-
ferential operators act on these functions just like multiplication by some number. (That’s what
diagonalisation means - finding elements that operators act on like real numbers.) For example
$D\exp(ax) = a\exp(ax)$. But we also have $D^n\exp(ax) = a^n\exp(ax)$ and so, for any polynomial $f$,
$$f(D)\exp(ax) = f(a)\exp(ax)$$
You can see this is a special case of the shift rule applied to f (D)(exp(ax) × 1).
We also have $D^2\sin(ax) = -a^2\sin(ax)$ and $D^2\cos(ax) = -a^2\cos(ax)$. So for any polynomial $f$
we have
$$f(D^2)\sin(ax) = f(-a^2)\sin(ax), \qquad f(D^2)\cos(ax) = f(-a^2)\cos(ax)$$
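A quick sympy check of this, using the arbitrary choices $f(t) = t^2 + 2t + 5$ and $a = 3$:

```python
# Diagonalisation in action: a polynomial in D^2 acts on sin(ax) as
# multiplication by the same polynomial evaluated at -a^2.
import sympy as sp

x = sp.symbols('x')
u = sp.sin(3*x)

lhs = sp.diff(u, x, 4) + 2*sp.diff(u, x, 2) + 5*u   # f(D^2)u with f(t) = t^2 + 2t + 5
rhs = ((-9)**2 + 2*(-9) + 5) * u                    # f(-a^2)u with a = 3
print(sp.simplify(lhs - rhs))                       # 0
```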
4 Lots of examples
Example 4.1. Find a solution to
$$\frac{df}{dx} + f = \sin x$$
Unfortunately we have an odd power of D applied to the sin function so we can’t directly use the
diagonalisation technique. Instead we write sin x using Euler’s formula:
$$
\begin{aligned}
\frac{1}{1+D}\sin x &= \frac{1}{1+D}\,\Im(\exp ix) \\
&= \Im\Big(\frac{1}{1+D}\exp ix\Big) && \text{(using the fact that } \Im(D(f)) = D(\Im(f))\text{)} \\
&= \Im\Big(\frac{1}{1+i}\exp ix\Big) && \text{(using diagonalisation for } \exp\text{)} \\
&= \Im\Big(\frac{1-i}{2}\exp ix\Big) \\
&= \frac{1}{2}(\sin x - \cos x)
\end{aligned}
$$
Example 4.2. Find a solution to
$$\frac{d^3 f}{dx^3} - f = \sin x$$
We want
$$\frac{-1}{1 - D^3}\sin x$$
We can use the fact that $D^2\sin x = -\sin x$ to reduce this to:
$$\frac{-1}{1 + D}\sin x$$
The solution is just minus the previous example:
$$f(x) = \frac{1}{2}(\cos x - \sin x)$$
Example 4.3. Find a solution to
$$\frac{d^2 f}{dx^2} - 2\frac{df}{dx} + f = e^x$$
The only slight subtlety here is noticing that when we use the shift rule, we slide f (D) past exp x
leaving behind a 1 that needs to be integrated twice.
$$
\begin{aligned}
(D^2 - 2D + 1)f &= e^x \\
f &= \frac{1}{(D-1)^2}\,e^x \\
&= e^x\,\frac{1}{D^2}\,1 \\
&= e^x\,\frac{x^2}{2}
\end{aligned}
$$
Example 4.4. Find a solution to
$$4\frac{d^2 f}{dx^2} + f = x\exp(-x)$$
Solution:
$$
\begin{aligned}
\frac{1}{1 + 4D^2}\,x\exp(-x) &= \exp(-x)\frac{1}{1 + 4(D-1)^2}\,x \\
&= \exp(-x)\frac{1}{4D^2 - 8D + 5}\,x \\
&= \exp(-x)\frac{1}{5}\cdot\frac{1}{1 - 8D/5}\,x && \text{(using } D^2 x = 0\text{)} \\
&= \frac{1}{5}\exp(-x)\Big(1 + \frac{8}{5}D\Big)x \\
&= \frac{1}{25}\exp(-x)(5x + 8)
\end{aligned}
$$
Example 4.5. Find a solution to
$$\frac{d^2 f}{dx^2} + f = xe^{-x}\sin(2x)$$
No avoiding some messy complex number arithmetic here. We want
$$
\begin{aligned}
\frac{1}{D^2+1}\,xe^{-x}\sin(2x) &= \Im\Big[\frac{1}{D^2+1}\,e^{(2i-1)x}x\Big] \\
&= e^{-x}\,\Im\Big[e^{2ix}\,\frac{1}{(D+2i-1)^2+1}\,x\Big] \\
&= e^{-x}\,\Im\Big[e^{2ix}\,\frac{-1}{2+4i}\cdot\frac{1}{1 - \frac{2(2i-1)}{2+4i}D}\,x\Big] \\
&= e^{-x}\,\Im\Big[e^{2ix}\,\frac{-1}{2+4i}\Big(x + \frac{2(2i-1)}{2+4i}\Big)\Big] \\
&= \frac{1}{50}e^{-x}\,\Im\Big[(\cos(2x) + i\sin(2x))\big((10i-5)x - (11-2i)\big)\Big] \\
&= \frac{1}{50}e^{-x}\big((2+10x)\cos(2x) - (11+5x)\sin(2x)\big)
\end{aligned}
$$
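The complex arithmetic above is easy to get wrong, so here is a sympy check that the residual really vanishes:

```python
# Verify that the result solves f'' + f = x exp(-x) sin(2x).
import sympy as sp

x = sp.symbols('x')
f = sp.exp(-x)/50 * ((2 + 10*x)*sp.cos(2*x) - (11 + 5*x)*sp.sin(2*x))
residual = sp.diff(f, x, 2) + f - x*sp.exp(-x)*sp.sin(2*x)
print(sp.simplify(residual))   # 0
```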
Example 4.6. Compute
$$\int_0^\infty x^n\exp(-x)\,dx$$
Solution: the integral is
$$\Big[\frac{1}{D}\,x^n\exp(-x)\Big]_0^\infty = \Big[-\exp(-x)\big(1 + D + D^2 + \dots + D^n\big)x^n\Big]_0^\infty$$
using the shift rule and $\frac{1}{D-1} = -(1 + D + D^2 + \dots)$.
We're going to be evaluating this at zero and in the limit as $x$ goes to infinity. All of these terms
vanish at infinity. All of the non-constant derivatives of $x^n$ vanish at zero. So we're left with
$$\Big[-\exp(-x)D^n x^n\Big]_0^\infty$$
This is $n!$.
Example 4.7. Find a solution to
$$\frac{df}{dx} + af(x) = g(x)$$
This becomes
$$
\begin{aligned}
f(x) &= \frac{1}{a+D}\,g(x) \\
&= e^{-ax}\,\frac{1}{D}\,e^{ax}g(x) \\
&= e^{-ax}\int^x e^{ay}g(y)\,dy \\
&= \int_{-\infty}^x e^{a(y-x)}g(y)\,dy && \text{(choosing a particular integral)}
\end{aligned}
$$
The value at $x$ is essentially a weighted sum of the history of $g$ before $x$. You can think of $f$ as a
"leaky" integral of $g$: it would be the integral, but $f$ "leaks" at a rate proportional to $f$ itself, with constant $a$.
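Here is a numeric sketch of that picture; the step size and the choice $g(x) = \sin x$ are arbitrary:

```python
# Leaky integral: Euler-stepping f' = g - a f from f(0) = 0 matches the
# convolution formula truncated to start at 0 (the particular integral
# with f(0) = 0). h and g are illustrative choices.
import numpy as np

a, h = 0.5, 1e-3
xs = np.arange(0.0, 10.0, h)
g = np.sin(xs)

f = np.zeros_like(xs)
for i in range(len(xs) - 1):
    f[i + 1] = f[i] + h * (g[i] - a * f[i])

i = len(xs) - 1
conv = h * np.sum(np.exp(a * (xs[:i] - xs[i])) * g[:i])
print(f[i], conv)   # agree up to O(h) discretisation error
```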
5 Exponentials of D
Taylor's theorem tells us that
$$f(x+a) = f(x) + af'(x) + \frac{a^2}{2!}f''(x) + \frac{a^3}{3!}f^{(3)}(x) + \dots + \frac{a^n}{n!}f^{(n)}(x) + \text{a remainder term}$$
Figure 1: These techniques are old. This snippet is from Boole’s A Treatise on Differential Equa-
tions from 1859.
The form of the remainder depends on the class of function f . For polynomials the remainder is
precisely zero for n large enough. For analytic functions the remainder goes to zero as n goes to
infinity so we can write
$$f(x+a) = \sum_{n=0}^{\infty} a^n\,\frac{D^n}{n!}(f)(x)$$
We can now write this as
$$f(x+a) = \Big(\sum_{n=0}^{\infty} a^n\,\frac{D^n}{n!}\Big)(f)(x)$$
and therefore as
$$f(x+a) = \exp(aD)(f)(x)$$
In other words, exp(aD) is the operator that shifts a function by a.
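For polynomials the exponential series terminates, so the shift is exact. A sympy check on an arbitrary cubic:

```python
# exp(aD) as a shift: for a cubic, four terms of the series give f(x + a) exactly.
import sympy as sp

x, a = sp.symbols('x a')
f = x**3 - x + 2

shifted = sum(a**n / sp.factorial(n) * sp.diff(f, x, n) for n in range(4))
print(sp.expand(shifted - f.subs(x, x + a)))   # 0
```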
6 Summation
Example 6.1. Solve
$$f(x+1) - f(x) = x^3$$
Rewriting in terms of $D$:
$$
\begin{aligned}
\exp(D)f - f &= x^3 \\
(\exp(D) - 1)f &= x^3 \\
f &= \frac{1}{\exp(D) - 1}\,x^3 \\
&= \frac{1}{D + \frac{1}{2}D^2 + \frac{1}{3!}D^3 + \frac{1}{4!}D^4 + \frac{1}{5!}D^5}\,x^3 \\
&= \frac{1}{1 + \frac{1}{2}D + \frac{1}{3!}D^2 + \frac{1}{4!}D^3 + \frac{1}{5!}D^4}\int x^3\,dx \\
&= \frac{1}{1 + \frac{1}{2}D + \frac{1}{3!}D^2 + \frac{1}{4!}D^3 + \frac{1}{5!}D^4}\,\frac{x^4}{4}
\end{aligned}
$$
There are tricks we can use to minimise the work here although I’m going to be a bit more explicit
than needed so everything is clear. Analogously to solving differential equations, we are going
to end up with a “particular sum”. The full solution is going to require determining a constant
term. But it’s clear that the sum needs to be zero for x = 0 so the constant term should be zero.
Applying D4 to x4 is going to give us a constant term. So we only need to keep terms up to D3 .
We get
$$
\begin{aligned}
f &= \frac{1}{4}\Big(1 - \big(\tfrac{1}{2}D + \tfrac{1}{3!}D^2 + \tfrac{1}{4!}D^3\big) + \big(\tfrac{1}{2}D + \tfrac{1}{3!}D^2\big)^2 - \big(\tfrac{1}{2}D\big)^3\Big)x^4 \\
&= \frac{1}{4}\Big(1 - \tfrac{1}{2}D - \tfrac{1}{6}D^2 - \tfrac{1}{24}D^3 + \tfrac{1}{4}D^2 + \tfrac{1}{6}D^3 - \tfrac{1}{8}D^3\Big)x^4 \\
&= \frac{1}{4}\Big(1 - \tfrac{1}{2}D + \tfrac{1}{12}D^2\Big)x^4 && \text{(nice for us, the cubic term vanishes)} \\
&= \frac{x^4}{4} - \frac{x^3}{2} + \frac{x^2}{4}
\end{aligned}
$$
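A quick check that this is the familiar sum of cubes, $f(n) = 0^3 + 1^3 + \dots + (n-1)^3$:

```python
# f(x) = x^4/4 - x^3/2 + x^2/4 is the "particular sum" of x^3.
f = lambda x: x**4 / 4 - x**3 / 2 + x**2 / 4

for n in range(8):
    assert f(n) == sum(k**3 for k in range(n))
print("ok")
```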
This may seem borderline magical. One way to think about it is that if we know $f$ is a degree 4
polynomial, then $f(x+1) = f(x) + f'(x) + \frac{1}{2}f''(x) + \frac{1}{3!}f^{(3)}(x) + \frac{1}{4!}f^{(4)}(x)$ exactly. So solving the
summation is equivalent to solving the differential equation
$$\frac{1}{4!}\frac{d^4f}{dx^4} + \frac{1}{3!}\frac{d^3f}{dx^3} + \frac{1}{2}\frac{d^2f}{dx^2} + \frac{df}{dx} = x^3$$
It is entirely reasonable to solve this using differential equation methods. If you extend this
approach beyond polynomials you rediscover the Euler-Maclaurin summation formula.
Example 6.2. Solve
$$a_{n+2} = a_{n+1} + a_n + n^2 \quad\text{with } a_0 = 0,\ a_1 = 0$$
In this case the problem requires finding a solution with specific initial conditions. So we're going
to need both a "particular sum" and a solution to the homogeneous equation. It's well known that
the solutions to the homogeneous equation are the Fibonacci numbers $F_n$ and the Lucas numbers
$L_n$. Any other solution is a linear combination of these. The Fibonacci numbers start with $F_0 = 0$
and $F_1 = 1$ and the Lucas numbers start with $L_0 = 2$ and $L_1 = 1$.
Let's write $a_x = f(x) + AF_x + BL_x$ so our notation matches what we used earlier. We have
$$
\begin{aligned}
(e^{2D} - e^D - 1)f &= x^2 \\
f &= \frac{1}{e^{2D} - e^D - 1}\,x^2 \\
&= \Big({-1} - D - \frac{5}{2}D^2\Big)x^2 \\
&= -x^2 - 2x - 5
\end{aligned}
$$
Because $F_0 = 0$, the initial condition at $x = 0$ immediately implies that $B = \frac{5}{L_0} = \frac{5}{2}$. The initial
condition at $x = 1$ now gives $A = \frac{11}{2}$. The complete solution is
$$a_n = \frac{5}{2}L_n + \frac{11}{2}F_n - n^2 - 2n - 5$$
By the way, Mathematica makes a real mess of this. I’ve seen similar methods in a work at least
a century old. Maybe Boole.
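Here is a check of the closed form against the recurrence, in exact rational arithmetic to avoid any floating point doubts:

```python
# Verify a_n = (5/2) L_n + (11/2) F_n - n^2 - 2n - 5 against
# a_{n+2} = a_{n+1} + a_n + n^2 with a_0 = a_1 = 0.
from fractions import Fraction

F, L = [0, 1], [2, 1]
for _ in range(20):
    F.append(F[-1] + F[-2])
    L.append(L[-1] + L[-2])

a = [Fraction(5, 2)*L[n] + Fraction(11, 2)*F[n] - n*n - 2*n - 5
     for n in range(22)]
assert a[0] == 0 and a[1] == 0
assert all(a[n+2] == a[n+1] + a[n] + n*n for n in range(20))
print("ok")
```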
7 Numerical methods
7.1 Derivatives
The relation $(e^{hD} - 1)f(x) = f(x+h) - f(x)$ expresses a finite difference in terms of the differ-
ential operator. We can do this in reverse and derive formulae for derivatives in terms of finite
differences. A good application of this is to derive approximations of derivatives that we can use
in numerical methods on a grid. Define $E_h = e^{hD}$. We'd like to write $D$ in terms of $E_h$. That
seems straightforward. We expect
$$D = \frac{1}{h}\log E_h$$
Figure 2: A snippet from Arbogast's Du calcul des dérivations, published in 1800

The problem is, we can't simply apply a Taylor expansion to $\log E_h$ about 0. Suppose we'd like
our numerical methods to work well on polynomials, typically giving exact results on low order
polynomials. The operator $E_h - 1$ corresponds to finite differencing and we know that repeated
finite differences applied to polynomials eventually give zero. So if we seek a power series in
$E_h - 1$ we can guarantee convergence on polynomials. Let's define the finite difference operator
$\Delta_h = e^{hD} - 1 = E_h - 1$. So we should expand $\log E_h$ around 1 and use
$$
\begin{aligned}
E_h\,hD &= E_h\log E_h \\
&= (1 + \Delta_h)\log(1 + \Delta_h) \\
&\approx (1 + \Delta_h)\Big(\Delta_h - \frac{1}{2}\Delta_h^2\Big) \\
&\approx \Delta_h + \frac{1}{2}\Delta_h^2 \\
&= E_h - 1 + \frac{1}{2}(E_h - 1)^2 \\
&= -\frac{1}{2} + \frac{1}{2}E_h^2
\end{aligned}
$$
So $f'(x+h) \approx \frac{1}{2h}\big(f(x+2h) - f(x)\big)$, or
$$f'(x) \approx \frac{f(x+h) - f(x-h)}{2h},$$
which is the usual central difference formula.
More generally, we can use the power series of $(1+\Delta_h)^m\log(1+\Delta_h)$ up to terms in $\Delta_h^k$ to generate a
$k$th order estimate for $D$. Even more generally, we can use the power series of $(1+\Delta_h)^m(\log(1+\Delta_h))^n$ to
estimate $D^n$.
Example 7.1. Derive a 3rd order upwind estimate for $D$ using $f(x-h), \dots, f(x+2h)$. Solution:
$$
\begin{aligned}
(1+\Delta_h)\log(1+\Delta_h) &\approx (1+\Delta_h)\Big(\Delta_h - \frac{1}{2}\Delta_h^2 + \frac{1}{3}\Delta_h^3\Big) \\
&= \Delta_h - \frac{1}{2}\Delta_h^2 + \frac{1}{3}\Delta_h^3 + \Delta_h^2 - \frac{1}{2}\Delta_h^3 \\
&= \Delta_h + \frac{1}{2}\Delta_h^2 - \frac{1}{6}\Delta_h^3 \\
&= E_h - 1 + \frac{1}{2}(E_h - 1)^2 - \frac{1}{6}(E_h - 1)^3 \\
&= E_h - 1 + \frac{1}{2}(E_h^2 - 2E_h + 1) - \frac{1}{6}(E_h^3 - 3E_h^2 + 3E_h - 1) \\
&= -\frac{1}{3} - \frac{1}{2}E_h + E_h^2 - \frac{1}{6}E_h^3
\end{aligned}
$$
Since $(1+\Delta_h)\log(1+\Delta_h) = E_h\,hD$, applying this to $f(x-h)$ gives
$$f'(x) \approx \frac{-2f(x-h) - 3f(x) + 6f(x+h) - f(x+2h)}{6h}$$
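We can see the claimed order empirically: halving $h$ should divide the error by about $2^3 = 8$. A small numpy sketch, using $\sin$ as an arbitrary test function:

```python
# Empirical order check for the upwind scheme above.
import numpy as np

f, fprime, x = np.sin, np.cos, 1.0
for h in [0.1, 0.05, 0.025]:
    est = (-2*f(x-h) - 3*f(x) + 6*f(x+h) - f(x+2*h)) / (6*h)
    print(h, abs(est - fprime(x)))
# errors scale like h^3
```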
Example 7.2. Derive a symmetric 4th order estimate for the third derivative. Solution:
$$
\begin{aligned}
(1+\Delta_h)^2(\log(1+\Delta_h))^3 &\approx (1+\Delta_h)^2\Big(\Delta_h - \frac{1}{2}\Delta_h^2 + \frac{1}{3}\Delta_h^3 - \frac{1}{4}\Delta_h^4\Big)^3 \\
&\approx \Delta_h^3 + \frac{1}{2}\Delta_h^4 \\
&= \frac{1}{2}\big({-1} + 2E_h - 2E_h^3 + E_h^4\big)
\end{aligned}
$$
Since this operator is $E_h^2\,(hD)^3$, applying it to $f(x-2h)$ gives
$$f^{(3)}(x) \approx \frac{-f(x-2h) + 2f(x-h) - 2f(x+h) + f(x+2h)}{2h^3}$$
7.2 Tables of coefficients
The schemes of every order can be generated at once. The key fact is that dividing a power series by $1 - k$ replaces its coefficients by their partial sums:
$$\frac{a_0 + a_1xk + a_2x^2k^2 + \dots}{1-k} = a_0 + k(a_0 + a_1 x) + k^2(a_0 + a_1 x + a_2 x^2) + \dots$$
The $n$th order one-sided derivative is derived using $n$ terms from $\log E_h = \log(1 + \Delta_h)$. So the
coefficients for the $n$th order scheme are the coefficients of the polynomial in $x$ that forms the
coefficient of $k^n$ in $\frac{\log(1+(x-1)k)}{1-k}$. We can tabulate this as
Order   Coefficients
0       0
1       -1, 1
2       -3/2, 2, -1/2
3       -11/6, 3, -3/2, 1/3
4       -25/12, 4, -3, 4/3, -1/4
5       -137/60, 5, -5, 10/3, -5/4, 1/5
This reproduces the table in [1]. All of the results in that paper can be reproduced by forming
power series from expressions of the form $(1+x)^m(\log(1+x))^n$, with $m$ a half-integer in some
cases.
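Here is a sympy sketch that regenerates the table from the generating function:

```python
# The n-th row of the table is the coefficient of k^n in
# log(1 + (x-1) k)/(1 - k), read off as a polynomial in x.
import sympy as sp

x, k = sp.symbols('x k')
gen = sp.series(sp.log(1 + (x - 1)*k) / (1 - k), k, 0, 6).removeO()
gen = sp.expand(gen)
for n in range(6):
    row = sp.Poly(gen.coeff(k, n), x).all_coeffs()[::-1]
    print(n, row)   # coefficients of f(x), f(x+h), f(x+2h), ...
```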
Incidentally, I used the second order scheme from this table in ILM’s GPU based fluid simulator
basing my work on [2]. I derived it using the techniques described above but it disagreed with the
paper. It turned out that the paper had an error. The moral is: it’s worth knowing how to derive
these schemes even if they appear to be available already in publications. (That paper is otherwise
excellent and contains many powerful methods.)
7.3 Caveat
Before using any of these methods in any kind of differential equation solver please consider doing
a von Neumann stability analysis. It’s sometimes hard to guess which methods are and aren’t
stable.
8 Numerical integration
We can play the same game with integration, if you can make yourself believe that $D$ can behave like an ordinary number in an integration.
Again we use the technique of writing $D = \frac{1}{h}\log E_h$.
Taking $h = 1$ for a grid of unit spacing, and writing $E = E_1$, $\Delta = \Delta_1$, we have
$$\int_0^2 f(x)\,dx = \Big[\frac{1}{D}f\Big]_0^2 = \frac{E^2 - 1}{\log E}\,f(0)$$
and
$$
\begin{aligned}
\frac{E^2 - 1}{\log E} &= \frac{(\Delta + 1)^2 - 1}{\Delta - \frac{1}{2}\Delta^2 + \frac{1}{3}\Delta^3} \\
&= \frac{\Delta^2 + 2\Delta}{\Delta - \frac{1}{2}\Delta^2 + \frac{1}{3}\Delta^3} \\
&= \frac{\Delta + 2}{1 - \frac{1}{2}\Delta + \frac{1}{3}\Delta^2} \\
&= (\Delta + 2)\Big(1 + \frac{1}{2}\Delta - \frac{1}{3}\Delta^2 + \big(\tfrac{1}{2}\Delta\big)^2\Big) \\
&= (\Delta + 2)\Big(1 + \frac{1}{2}\Delta - \frac{1}{12}\Delta^2\Big) \\
&= \Delta + \frac{1}{2}\Delta^2 + 2 + \Delta - \frac{1}{6}\Delta^2 \\
&= 2 + 2\Delta + \frac{1}{3}\Delta^2
\end{aligned}
$$
So
$$\int_0^2 f(x)\,dx \approx \frac{1}{3}\big(f(0) + 4f(1) + f(2)\big)$$
Rescaling the x-axis gives:
$$\int_a^b f(x)\,dx \approx \frac{b-a}{6}\big(f(x_0) + 4f(x_1) + f(x_2)\big)$$
where $x_0 = a$, $x_1 = \frac{a+b}{2}$, $x_2 = b$. This is the usual Simpson rule. Note that we could have kept
terms up to $\Delta^3$ in the derivation above and we would have had the same result. So Simpson's rule
is good for cubics as well as quadratics.
Keeping more terms gives higher order rules. For example, integrating over four unit intervals leads to Boole's rule
$$\int_a^b f(x)\,dx \approx \frac{b-a}{90}\big(7f(x_0) + 32f(x_1) + 12f(x_2) + 32f(x_3) + 7f(x_4)\big)$$
with the $x_i$ equally spaced from $a$ to $b$. This is probably the method used by Boole to derive these
coefficients.
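A quick check of the exactness-on-cubics claim, on an arbitrary cubic:

```python
# Simpson's rule is exact on cubics: check on f(x) = x^3 - x + 1 over [0, 2].
f = lambda x: x**3 - x + 1
a, b = 0.0, 2.0
simpson = (b - a)/6 * (f(a) + 4*f((a + b)/2) + f(b))
exact = (b**4/4 - b**2/2 + b) - (a**4/4 - a**2/2 + a)
print(simpson, exact)   # both 4.0
```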
Figure 3: A snippet from Boole’s Calculus of Finite Differences
Now suppose we wish to compute an integral
$$I(f) = \int_D f(x, y)\,dx\,dy$$
where the integral is over some domain $D$ in two dimensions, say. We'd like to compute this in
terms of samples of $f$ at points on some grid. How can we derive a suitable method? As we're
working in two dimensions, let's define $X = \frac{\partial}{\partial x}$ and $Y = \frac{\partial}{\partial y}$. Write
$$I(f) = \int_D f(x', y')\,dx'\,dy' = \int_D e^{x'X}e^{y'Y}\,dx'\,dy'\,f(0)$$
First we compute the integral
$$U(a, b) = \frac{1}{\pi}\int_D \exp(ax + by)\,dx\,dy = \frac{1}{\pi}\int_0^{2\pi}\!\!\int_0^2 \exp(ar\cos\theta + br\sin\theta)\,r\,dr\,d\theta$$
This is in fact a slight variation on the classic integration that gives us the Airy disk in optics.
The result is
$$U(a, b) = \frac{4I_1\big(2\sqrt{a^2 + b^2}\big)}{\sqrt{a^2 + b^2}}$$
where I1 is the first order modified Bessel function of the first kind.
Scaling by $\frac{1}{4}$ so as to get coefficients appropriate for the unit circle, we compute the Taylor
series up to 4th order in $\Delta_x$ and $\Delta_y$ in
$$\frac{1}{4}(1 + \Delta_x)^2(1 + \Delta_y)^2\,U\big(\log(1 + \Delta_x), \log(1 + \Delta_y)\big)$$
When this expansion is written in terms of $E_x$ and $E_y$ we get the grid of coefficients:
$$\frac{1}{4320}
\begin{pmatrix}
-1 & 34 & 114 & 34 & -1 \\
34 & 464 & 444 & 464 & 34 \\
114 & 444 & -36 & 444 & 114 \\
34 & 464 & 444 & 464 & 34 \\
-1 & 34 & 114 & 34 & -1
\end{pmatrix}$$
We can try using these coefficients to integrate an example function. First, an approximation
computed by other means:
$$\int_D \cos(2x)\log(2 + y^2)\,dx\,dy \approx 0.477976.$$
Using the coefficients above we get 0.477383.
Note that this example is sub-optimal. For example, it uses values outside of the integration
region. But for smooth enough functions it gives good results and the method can be adapted to
many other problems.
9 Exponentials of D²
We assumed that $\int_{-\infty}^{\infty}\exp(-(x-a)^2/2)\,dx = \sqrt{2\pi}$ even when $a$ is a differential operator.
So
$$\exp\Big(\frac{aD^2}{2}\Big)f(x) = \frac{1}{\sqrt{2\pi a}}\int_{-\infty}^{\infty}\exp\Big(-\frac{y^2}{2a}\Big)f(x - y)\,dy$$
This operation is sometimes called the Weierstrass transform, but it is better known in the graphics
and image processing world as Gaussian blur.
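Here is a sympy check on a polynomial, where the operator series terminates: $\exp(aD^2/2)x^2 = x^2 + a$, and the Gaussian integral (with the $1/\sqrt{2\pi a}$ normalisation) agrees. The value $a = 1/3$ is an arbitrary choice.

```python
# exp(a D^2/2) on x^2: operator series vs Gaussian blur integral.
import sympy as sp

x, y = sp.symbols('x y')
a = sp.Rational(1, 3)                       # any a > 0

series = x**2 + a/2 * sp.diff(x**2, x, 2)   # higher terms vanish: x^2 + a
blur = sp.integrate(sp.exp(-y**2/(2*a)) * (x - y)**2,
                    (y, -sp.oo, sp.oo)) / sp.sqrt(2*sp.pi*a)
print(sp.simplify(series - blur))           # 0
```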
10 Dirac deltas
We can even make sense of a Dirac delta applied to $iD$. Write $\tilde{F}$ for the Fourier transform of $f$,
$\tilde{F}(\omega) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\exp(-iy\omega)f(y)\,dy$, and use the integral representation
$\delta(s) = \frac{1}{2\pi}\int_{-\infty}^{\infty}\exp(isy)\,dy$. So we have
$$
\begin{aligned}
2\pi\,\delta(iD - \omega)f(x) &= \int_{-\infty}^{\infty}\exp(iy(iD - \omega))\,dy\,f(x) \\
&= \int_{-\infty}^{\infty}\exp(-iy\omega)\exp(-yD)f(x)\,dy \\
&= \int_{-\infty}^{\infty}\exp(-iy\omega)f(x - y)\,dy \\
&= \int_{-\infty}^{\infty}\exp(-i(x - y)\omega)f(y)\,dy \\
&= \exp(-ix\omega)\int_{-\infty}^{\infty}\exp(iy\omega)f(y)\,dy \\
&= \sqrt{2\pi}\,\exp(-ix\omega)\tilde{F}(-\omega)
\end{aligned}
$$
Therefore
$$\delta(iD - \omega)f(x) = \frac{1}{\sqrt{2\pi}}\exp(-ix\omega)\tilde{F}(-\omega)$$
Notice what it's doing. It's projecting the original $f$ to a plane wave scaled by the Fourier transform
at $-\omega$. It's the projection of $f$ onto the Fourier component corresponding to $-\omega$. We expect to be
able to reassemble the original function $f$ by summing these projections back up again. First on
the right hand side (after substituting $\omega \to -\omega$):
$$\int_{-\infty}^{\infty}\frac{1}{\sqrt{2\pi}}\exp(ix\omega)\tilde{F}(\omega)\,d\omega = f(x)$$
This is the usual statement of how to invert the Fourier transform. Now on the left hand side:
$$
\begin{aligned}
\int_{-\infty}^{\infty}\delta(iD - \omega)\,d\omega\,f(x) &= \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\!\!\int_{-\infty}^{\infty}\delta(iD - \omega)\,\tilde{F}(\nu)\exp(i\nu x)\,d\nu\,d\omega \\
&= \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\!\!\int_{-\infty}^{\infty}\tilde{F}(\nu)\,\delta(iD - \omega)\exp(i\nu x)\,d\omega\,d\nu \\
&= \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\!\!\int_{-\infty}^{\infty}\tilde{F}(\nu)\,\delta(i(i\nu) - \omega)\exp(i\nu x)\,d\omega\,d\nu && \text{(diagonalisation)} \\
&= \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\!\!\int_{-\infty}^{\infty}\tilde{F}(\nu)\,\delta(-\nu - \omega)\exp(i\nu x)\,d\omega\,d\nu \\
&= \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\tilde{F}(-\omega)\exp(-i\omega x)\,d\omega && \text{(standard property of }\delta\text{)} \\
&= \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\tilde{F}(\omega)\exp(i\omega x)\,d\omega \\
&= f(x)
\end{aligned}
$$
This illustrates how we’re able to work with a Dirac delta of a differential operator much like how
you can work with a conventional Dirac delta.
On its own the Dirac delta of D isn’t so useful. It gets more interesting when you consider
δ(iD + ax − ω) but that’s for another time.
11 Summary

Operator          Name                    Action on f
1                 identity                f(x)
D^n               nth derivative          f^(n)(x)
exp(aD)           shift by a              f(x + a)
1/D               integration             ∫ f(x) dx
1/(D + a)         leaky integration       ∫_{-∞}^x e^{a(y-x)} f(y) dy
sinh(aD)          finite difference       (1/2)(f(x + a) - f(x - a))
exp(aD²/2)        Weierstrass transform   (1/√(2πa)) ∫_{-∞}^{∞} exp(-y²/(2a)) f(x + y) dy
1/(exp(D) - 1)    Euler-Maclaurin sum     Σ^x f(x)
12 More
So far I’ve looked at functions of D but it’s also possible look at functions of both x and D. For
example exp(xD) is an operator that appears in many places. However, I have to draw the line
somewhere. So in these notes I’ve chosen to stick with functions of just D and leave the larger
class of operators to a sequel.
I’ve left out any mention of fractional powers of D. Whenever I try to read about this subject
I mostly find fractional powers of D being used to solve problems about fractional powers of D.
But there is a large literature on the subject out there.
13 Final thoughts
There are a couple of other uniform approaches to much of what I’ve said above.
One is to use the shift rule to turn your problem into some kind of constraint on a polynomial.
Write the polynomial in general form as a0 + a1 x + . . . + an xn and write the constraint as a linear
system in the vector (a0 , . . . , an ). This is the traditional method taught to students when solving
differential equations. Guess something general enough and solve for the coefficients.
Another approach, relevant to the numerical methods, is to again assume your problem is about
polynomials, and use Lagrange interpolation to fit a polynomial to your data. You now integrate
or differentiate the interpolating polynomial and relate this back as a linear operation on your
original data.
Both of these miss out on the hidden structure given by the rational and transcendental func-
tions of D I’ve written about here.
References
[1] Bengt Fornberg. Generation of finite difference formulas on arbitrarily spaced grids. Mathe-
matics of Computation, 51(184):699–706, 1988.
[2] Jeroen Molemaker, Jonathan M. Cohen, Sanjit Patel, and Junyong Noh. Low Viscosity
Flow Simulations for Animation. In Markus Gross and Doug James, editors, Eurograph-
ics/SIGGRAPH Symposium on Computer Animation. The Eurographics Association, 2008.