Numerical Analysis Lecture Notes
Jean C. Cortissoz I.
The previous lemma makes the bisection method a useful tool. Indeed, assume you want to compute a root $r \in [a, b]$ of a continuous function f with a precision given by $\varepsilon > 0$, that is, you want to produce an $r^*$ such that $|r^* - r| < \varepsilon$. Then choose a positive integer N such that
$$\frac{1}{2^N}(b - a) < \varepsilon,$$
and apply the bisection procedure N times: your desired $r^*$ is just $r_N$. Notice that we can actually compute N from the initial data: $N = \left\lceil \log_2\frac{b-a}{\varepsilon}\right\rceil$.
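The procedure just described can be sketched in code; the function, interval, and tolerance below are illustrative choices, not part of the notes.

```python
import math

def bisect(f, a, b, eps):
    """Return r* with |r* - r| < eps for some root r of f in [a, b]."""
    assert f(a) * f(b) <= 0, "f must change sign on [a, b]"
    # Number of halvings so that (b - a)/2**N < eps.
    N = math.ceil(math.log2((b - a) / eps))
    for _ in range(N):
        m = (a + b) / 2
        if f(a) * f(m) <= 0:
            b = m
        else:
            a = m
    return (a + b) / 2

r_star = bisect(lambda x: x * x - 2, 1.0, 2.0, 1e-6)
```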
$$x_{n+1} = f(x_n).$$
We will show that the sequence thus constructed is a Cauchy sequence, and given the completeness of X, it is convergent. Notice first that
$$|x_{m+1} - x_m| \le \alpha^m |x_1 - x_0|.$$
From this it follows that the sequence $(x_k)_{k=1,2,3,\dots}$ is a Cauchy sequence, and hence convergent. Let $x^* = \lim_{n\to\infty} x_n$. Using the continuity of f, we have that
$$x^* = \lim_{n\to\infty} x_{n+1} = \lim_{n\to\infty} f(x_n) = f(x^*),$$
so $x^*$ is a fixed point of f.
For the uniqueness part, assume that $x^*$ and $x^{**}$ are both fixed points of f. Then we have that
$$|x^* - x^{**}| = |f(x^*) - f(x^{**})| \le \alpha\,|x^* - x^{**}|,$$
and since $\alpha < 1$, this forces $x^* = x^{**}$.
We are now ready for the applications. Say we want to find a root r of a differentiable function f. Consider the function
$$g(x) = x - f(x),$$
so that a fixed point of g is precisely a root of f. For $f(x) = x^2 - 2$ on $[1,2]$ this would give
$$g(x) = x - \left(x^2 - 2\right),$$
but this g need not be a contraction there, since $f'(x) = 2x$ can be as large as 4. But not all is lost, because we can define
$$g(x) = x - \frac{1}{4}\left(x^2 - 2\right),$$
and a fixed point of this new g will still be a root of f. Also, if we compute $g'(x) = 1 - \frac{x}{2}$, we have the stars aligned in our favor, since it is easy to see that $|g'(x)| \le \frac{1}{2}$ on $[1,2]$. All that is left to do to make sure that the Fixed Point Method can be applied is to check that $g([1,2]) \subseteq [1,2]$. This is left as an exercise, and as a hint notice that g is an increasing function.
Using the Fixed Point Method we obtain the following successive approximations for $\sqrt{2}$, starting with $x_0 = 2$: $2,\ \frac{3}{2},\ \frac{23}{16}$.
The question you should be asking yourself, dear reader, is how good these approximations are. Say, how close is $\frac{23}{16}$ to the right answer? To estimate how far $\frac{23}{16}$ is from the right answer, notice that this number corresponds to $x_2$ and that $\alpha = \frac{1}{4}$, and hence
$$\left|\sqrt{2} - \frac{23}{16}\right| \le \frac{1}{4^2}\cdot\frac{4}{3}\cdot\frac{1}{2} = \frac{1}{24}.$$
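A short computation confirms the iterates and the error bound above; the use of exact rational arithmetic is our choice, for clarity.

```python
from fractions import Fraction

# Iterate g(x) = x - (x^2 - 2)/4 from x0 = 2, as in the notes.
def g(x):
    return x - (x * x - 2) / Fraction(4)

x = Fraction(2)
iterates = [x]
for _ in range(2):
    x = g(x)
    iterates.append(x)

# Actual error of x2 = 23/16; it sits below the a priori bound 1/24.
error = abs(float(iterates[-1]) - 2 ** 0.5)
```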
Sometimes, when the Fixed Point Method cannot be directly applied, a Fixed Point Iteration scheme can be produced by massaging the original equation to be solved. Let us give an example again. We start from
$$x^2 - 2 = 0,$$
which is the equation that we want to solve. Then we can rewrite this equation successively as follows:
$$x^2 - 1 = 1, \qquad (x - 1)(x + 1) = 1,$$
and finally
$$x = 1 + \frac{1}{1 + x}.$$
So in this case $g(x) = 1 + \frac{1}{1+x}$. Again we must verify that $g([1,2]) \subseteq [1,2]$, and now notice that
$$|g'(x)| = \frac{1}{(1+x)^2} \le \frac{1}{4} \quad \text{on } [1,2].$$
If we start with $x_0 = 1$ we generate the following sequence: $\frac{3}{2}, \frac{7}{5}, \frac{17}{12}, \frac{41}{29}, \frac{99}{70}$, etc. Of course, we can also estimate how far our approximations are from $\sqrt{2}$. Employing the formula from the proof of the Fixed Point Theorem, we see that
$$\left|\sqrt{2} - \frac{99}{70}\right| \le \frac{1}{4^5}\cdot\frac{1}{1 - \frac14}\cdot\frac{1}{2} = \frac{1}{1536}.$$
Problem. If N steps are needed in a fixed point algorithm to get an answer with a precision of m digits, how many steps are required to double such precision?
Problem. Using the second method described above to find $\sqrt{2}$, find an algorithm to compute the square root of any positive integer. You must give the seed to start the scheme, you must show that it does converge, and you must give an estimate of how good your approximation is after n iterations.
Problem. Start with the equation you want to solve,
$$f(x) = 0 \quad \text{on } [a,b],$$
$$|x_{n+1} - r| \le \frac{M}{2\eta}\,|x_n - r|^2.$$
Let us call $\beta = \frac{M}{2\eta}$, and let $\rho = |x_0 - r|$; then we have
$$|x_n - r| \le (\beta\rho)^{2^n}\beta^{-1}, \qquad (1.2)$$
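An estimate of the form (1.2) is the signature of quadratic convergence, as produced for example by Newton's method $x_{n+1} = x_n - f(x_n)/f'(x_n)$; assuming that is the scheme intended here, a minimal sketch (the test function and seed are illustrative):

```python
# Newton iteration sketch; converges quadratically near a simple root.
def newton(f, fprime, x0, n_steps):
    x = x0
    for _ in range(n_steps):
        x = x - f(x) / fprime(x)
    return x

root = newton(lambda x: x * x - 2, lambda x: 2 * x, 2.0, 5)
```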
We call the $L_i$'s Lagrange polynomials, and they are quite easy to construct. First define
$$M_i(x) = \prod_{k\neq i}(x - x_k),$$
and we are already more than half way done. Indeed, notice that
$$M_i(x_j) = 0 \quad \text{whenever } j \neq i,$$
whereas
$$M_i(x_i) = \prod_{k\neq i}(x_i - x_k) \neq 0,$$
so we may define
$$L_i(x) := \frac{M_i(x)}{M_i(x_i)} = \prod_{k\neq i}\frac{x - x_k}{x_i - x_k}.$$
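The construction of the $L_i$'s translates directly into code; the names and sample nodes below are illustrative.

```python
# Lagrange basis L_i(x) = prod_{k != i} (x - x_k)/(x_i - x_k).
def lagrange_basis(nodes, i, x):
    Li = 1.0
    for k, xk in enumerate(nodes):
        if k != i:
            Li *= (x - xk) / (nodes[i] - xk)
    return Li

def interpolate(nodes, values, x):
    return sum(values[i] * lagrange_basis(nodes, i, x) for i in range(len(nodes)))

nodes = [0.0, 1.0, 2.0]
values = [1.0, 3.0, 7.0]            # samples of f(x) = x^2 + x + 1
p = interpolate(nodes, values, 1.5)  # exact for quadratics
```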
$$0 = g^{(n+1)}(\xi) = f^{(n+1)}(\xi) - (n+1)!\,\frac{f(x) - I_n f(x)}{q_{n+1}(x)},$$
so we obtain
$$f(x) - I_n f(x) = \frac{f^{(n+1)}(\xi)}{(n+1)!}\,q_{n+1}(x).$$
Exercise. Assume you are approximating the sine function in the interval $[0, h]$ using (a) a linear polynomial with interpolation points $x_0 = 0$ and $x_1 = h$; (b) a quadratic polynomial with $x_0 = 0$, $x_1 = \frac{h}{2}$ and $x_2 = h$. Find in both cases a bound on the error between the sine function and its interpolating polynomial. In case (b), is there an optimal way of choosing $x_1$? (Think of minimizing the maximum possible error.)
$$\frac{(x - x_i)(x - x_{i+2})}{(x_{i+1} - x_i)(x_{i+1} - x_{i+2})} = -\frac{(x - x_i)(x - x_{i+2})}{h^2},$$
$$\frac{(x - x_{i+1})(x - x_i)}{(x_{i+2} - x_{i+1})(x_{i+2} - x_i)} = \frac{(x - x_{i+1})(x - x_i)}{2h^2},$$
$$P_2(x) = y_i\,\frac{(x - x_{i+1})(x - x_{i+2})}{2h^2} - y_{i+1}\,\frac{(x - x_i)(x - x_{i+2})}{h^2} + y_{i+2}\,\frac{(x - x_{i+1})(x - x_i)}{2h^2}.$$
You are invited to use $P_2$ to compute an approximation of $\int_{x_i}^{x_{i+2}} f(x)\,dx$. We will try another way. To simplify a little, we will choose $x_i = 0$, $x_{i+1} = \frac12$ and $x_{i+2} = 1$. The interpolating polynomial $ax^2 + bx + c$ must satisfy
$$c = y_0,$$
$$\frac{a}{4} + \frac{b}{2} = y_1 - y_0,$$
and
$$a + b = y_2 - y_0.$$
Solving this system we obtain
$$b = 4y_1 - 3y_0 - y_2.$$
Hence, the area under the parabola is given by
$$\int_0^1 \left(ax^2 + bx + c\right)dx = \frac{1}{6}\left(y_2 + 4y_1 + y_0\right).$$
So the previous formula gives us the area under the parabola when $h = 1$. For arbitrary h, using a change of variables, we obtain
$$h\int_0^1 \left(ax^2 + bx + c\right)dx = \frac{h}{6}\left(y_2 + 4y_1 + y_0\right).$$
If we divide the interval $[a, b]$ into 2n subintervals, then Simpson's rule becomes
$$\int_a^b f(x)\,dx \sim \frac{h}{6}\left(y_0 + 4y_1 + 2y_2 + 4y_3 + 2y_4 + \cdots + 4y_{2n-1} + y_{2n}\right).$$
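Composite Simpson, in the notes' convention (h is the width of one parabola panel, i.e. of two consecutive subintervals), can be sketched as follows; the test integrand is our choice.

```python
import math

# Composite Simpson: n panels of width h = (b - a)/n, each using its
# endpoints and midpoint with weights 1, 4, 1 (times h/6).
def simpson(f, a, b, n):
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        x0 = a + i * h
        total += (h / 6) * (f(x0) + 4 * f(x0 + h / 2) + f(x0 + h))
    return total

approx = simpson(math.sin, 0.0, math.pi, 50)   # exact value is 2
```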
Let us estimate the error given by this formula. We first compute the error on $[x_i, x_{i+2}]$. We use Theorem 2 to obtain
$$|f(x) - I_n f(x)| \le \frac{M}{6}\,\left|(x - x_i)(x - x_{i+1})(x - x_{i+2})\right| \le \frac{M}{72\sqrt{3}}\,h^3,$$
and hence, summing over all the subintervals, the total error is bounded by
$$\frac{M(b-a)}{72\sqrt{3}}\,h^3.$$
Exercise. Find a formula to compute $\int_a^b f(x)\,dx$ using interpolation with polynomials of degree 3. Find an estimate for the error.
Exercise. Show that for Simpson's rule the error estimate can be improved to
$$\frac{M(b-a)}{192}\,h^3,$$
$$f(\beta) = P(\beta) + \frac{f^{(n)}(\xi)}{n!}\,(\beta - \alpha)^n.$$
Proof. ...
We must worry now about how good this approximation is. To do this we use Taylor's theorem (actually we only need the mean value theorem).
$$\left|\int_{x_{i-1}}^{x_i} f(x)\,dx - f(x_{i-1})h\right| = \left|\int_{x_{i-1}}^{x_i}\left(f(x) - f(x_{i-1})\right)dx\right|$$
$$\le \int_{x_{i-1}}^{x_i}\left|f(x) - f(x_{i-1})\right|dx = \int_{x_{i-1}}^{x_i}\left|f'(\xi_x)\right|(x - x_{i-1})\,dx \le \frac{M}{2}h^2,$$
where M is a uniform bound on $f'$ in the interval $[a, b]$. Hence we have the useful estimate
$$\left|\int_a^b f(x)\,dx - \sum_{i=1}^n f(x_{i-1})h\right| \le \frac{M(b-a)}{2}\,h.$$
The approximation given by the midpoint rule is in general better than the one given by the left-point rule. Indeed, we can estimate the error as follows.
$$\left|\int_{x_{i-1}}^{x_i} f(x)\,dx - f\left(x^*_{i-1}\right)h\right| = \left|\int_{x_{i-1}}^{x_i}\left(f(x) - f\left(x^*_{i-1}\right)\right)dx\right|$$
$$\le \int_{x_{i-1}}^{x^*_{i-1}}\left|f'(\xi_x)\right|\left(x^*_{i-1} - x\right)dx + \int_{x^*_{i-1}}^{x_i}\left|f'(\xi_x)\right|\left(x - x^*_{i-1}\right)dx$$
$$\le \frac{M}{2}\left(\frac{h}{2}\right)^2 + \frac{M}{2}\left(\frac{h}{2}\right)^2 = \frac{M}{4}h^2,$$
where M is a uniform bound on $f'$ in the interval $[a, b]$. Hence we have
$$\left|\int_a^b f(x)\,dx - \sum_{i=1}^n f\left(x^*_{i-1}\right)h\right| \le \frac{M(b-a)}{4}\,h.$$
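A quick experiment comparing the two rules; the integrand is an illustrative choice. (The midpoint rule is in fact second order, as the sharper constant discussed next suggests.)

```python
# Left-point vs midpoint composite rules on [a, b] with n subintervals.
def leftpoint(f, a, b, n):
    h = (b - a) / n
    return h * sum(f(a + i * h) for i in range(n))

def midpoint(f, a, b, n):
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

exact = 1.0 / 3.0                                        # integral of x^2 on [0, 1]
err_left = abs(leftpoint(lambda x: x * x, 0.0, 1.0, 100) - exact)
err_mid = abs(midpoint(lambda x: x * x, 0.0, 1.0, 100) - exact)
```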
where the implicit constant is given by $\frac{M}{24}$.
Exercise (a higher order midpoint rule). Given a partition
$$a = x_0 < x_1 < \cdots < x_{n-1} < x_n = b$$
with $h = x_{i+1} - x_i$, let $x^*_i$ be the midpoint of the interval $J_i = [x_i, x_{i+1}]$. Use the Taylor expansion of f centered at $x^*_i$, given by
$$f(x) = f(x^*_i) + f'(x^*_i)(x - x^*_i) + \frac{f''(x^*_i)}{2}(x - x^*_i)^2 + \frac{f'''(x^*_i)}{3!}(x - x^*_i)^3 + \frac{f^{(4)}(\xi_x)}{4!}(x - x^*_i)^4,$$
to show that
$$\left|\int_{x_i}^{x_{i+1}} f(x)\,dx - \left(f(x^*_i)h + \frac{f''(x^*_i)}{24}h^3\right)\right| \le \frac{M_4}{80\times 4!}\,h^5,$$
where $M_4$ is a bound on the fourth derivative of f on the interval $[a, b]$. Use this to develop a numerical method to compute $\int_a^b f(x)\,dx$.
where the implicit constant is given by $\frac{M}{24} + \frac{M}{8} = \frac{M}{6}$. From this we deduce that
$$\int_a^b f(x)\,dx = \frac{h}{2}\sum_{i=0}^{n-1}\left[f(x_i) + f(x_{i+1})\right] + O\left(h^2\right),$$
Exercise. Complete the details of the derivation of the estimate for the size of the error in the trapezoidal rule.
Exercise. Can you improve on the implicit constant given for the Trapezoidal
Rule?
But how do we choose the $c_i$'s (called weights) and the $x_i$'s? Well, the idea is that the approximation given above becomes an equality when f is a polynomial of degree at most $2n - 1$. Notice then that establishing the values of the $c_i$'s and the $x_i$'s becomes a problem of solving 2n nonlinear equations in 2n unknowns.
To see how this method works, we shall work out the case $n = 2$. In this case, we must produce a formula
$$\int_{-1}^1 f(x)\,dx \sim c_1 f(x_1) + c_2 f(x_2).$$
But then, for this formula to be exact for a polynomial of degree at most $2\times 2 - 1 = 3$, we must have
$$\int_{-1}^1 1\,dx = 2 = c_1 + c_2,$$
$$\int_{-1}^1 x\,dx = 0 = c_1 x_1 + c_2 x_2,$$
$$\int_{-1}^1 x^2\,dx = \frac{2}{3} = c_1 x_1^2 + c_2 x_2^2,$$
$$\int_{-1}^1 x^3\,dx = 0 = c_1 x_1^3 + c_2 x_2^3.$$
By symmetry considerations, we have that $c_1 = 1 = c_2$ and $x_1 = -x_2$. Using the third equation then yields
$$x_2 = \sqrt{\frac{1}{3}}, \qquad x_1 = -\sqrt{\frac{1}{3}}.$$
Therefore, the formula for the Gaussian quadrature with $n = 2$ is given by
$$\int_{-1}^1 f(x)\,dx \sim f\left(-\sqrt{\frac13}\right) + f\left(\sqrt{\frac13}\right).$$
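The two-point formula is exact for cubics, which a sketch can verify directly; the test polynomial is our choice.

```python
# Two-point Gaussian quadrature on [-1, 1]; both weights equal 1.
def gauss2(f):
    x = (1.0 / 3.0) ** 0.5
    return f(-x) + f(x)

# Integral of 7x^3 + 3x^2 - x + 2 on [-1, 1] is 3*(2/3) + 2*2 = 6.
cubic = lambda x: 7 * x**3 + 3 * x**2 - x + 2
approx = gauss2(cubic)
```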
We are now concerned with estimating the error when using Gaussian quadratures. We can do this via the ever useful Taylor's theorem. Indeed, when working with a Gaussian quadrature of degree n, by its very definition, if we use the Taylor approximation of the function centered at 0 (so we must assume that it is at least 2n-times differentiable), call it $T_{2n-1}(x)$, we know that
$$f(x) = T_{2n-1}(x) + \frac{f^{(2n)}(\xi_x)}{(2n)!}\,x^{2n}.$$
So if we denote by $Q_j(f)$ the Gaussian quadrature of order j applied to the function f, and we want to estimate
$$\left|\int_{-1}^1 f(x)\,dx - Q_n(f)\right|,$$
we can do it as follows. We assume a bound $\left|f^{(2n)}(\xi)\right| \le M_{2n}$ on $[-1,1]$. Then
$$\left|\int_{-1}^1 f(x)\,dx - Q_n(f)\right| = \left|\int_{-1}^1 f(x)\,dx - \int_{-1}^1 T_{2n-1}(x)\,dx + Q_n(T_{2n-1}) - Q_n(f)\right|$$
$$\le \left|\int_{-1}^1 \frac{f^{(2n)}(\xi_x)}{(2n)!}\,x^{2n}\,dx\right| + \left|Q_n(f) - Q_n(T_{2n-1})\right|$$
$$\le \frac{2M_{2n}}{(2n+1)!} + \frac{M_{2n}}{(2n)!}\sum_i |c_i|\,x_i^{2n}.$$
and the error expected from using this formula is at most $\frac{M_6}{2520}$. So if we apply this formula to computing $\int_{-1}^1 e^x\,dx$, we obtain from our efforts that
$$\int_{-1}^1 e^x\,dx \sim 2.350336929,$$
and this value is at most $\frac{1}{840}$ away from the actual value, which for your information is approximately $2.35040\dots$.
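The computation can be reproduced with the standard three-point nodes and weights, $\pm\sqrt{3/5}$ and 0 with weights $\frac59, \frac89, \frac59$ (deriving them is the exercise below); a sketch:

```python
import math

# Three-point Gaussian quadrature on [-1, 1] with the standard nodes/weights.
def gauss3(f):
    x = math.sqrt(3.0 / 5.0)
    return (5.0 / 9.0) * f(-x) + (8.0 / 9.0) * f(0.0) + (5.0 / 9.0) * f(x)

approx = gauss3(math.exp)          # the notes report 2.350336929
exact = math.e - 1.0 / math.e      # = 2.35040...
```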
Exercise. Deduce the points and weights for the Gaussian quadrature formula
with n = 3.
Exercise. Develop a formula using the Taylor polynomial of degree 1 to ap-
proximate integrals numerically. Find an estimate for the error of your approx-
imation.
2.5.1
In general, to apply the Gaussian quadrature technique to a function on a general interval, that is, to
$$\int_a^b f(x)\,dx,$$
we first change variables. Also, to increase the accuracy of the method, we can divide the interval $[a, b]$ into n pieces of the same size, and then use Gaussian quadrature in each of the subintervals $J_{i-1} = [x_{i-1}, x_i]$. We do this as follows:
$$\int_{x_{i-1}}^{x_i} f(x)\,dx = \int_{-h/2}^{h/2} f\left(\frac{x_{i-1}+x_i}{2} + z\right)dz.$$
Let us write $g(z) = f\left(\frac{x_{i-1}+x_i}{2} + z\right)$. When we use Gaussian quadratures to compute an integral of the form
$$\int_{-h/2}^{h/2} g(x)\,dx,$$
And then comes the computation of the errors. This time we have
$$\left|\int_{-h/2}^{h/2} f(x)\,dx - Q_n(f)\right| = \left|\int_{-h/2}^{h/2} f(x)\,dx - \int_{-h/2}^{h/2} T_{2n-1}(x)\,dx + Q_n(T_{2n-1}) - Q_n(f)\right|$$
$$\le \left|\int_{-h/2}^{h/2}\frac{f^{(2n)}(\xi_x)}{(2n)!}\,x^{2n}\,dx\right| + \left|Q_n(f) - Q_n(T_{2n-1})\right|$$
$$\le \frac{2M_{2n}}{(2n+1)!}\left(\frac{h}{2}\right)^{2n+1} + \frac{h}{2}\sum_i |c_i|\left|f\left(\frac{h}{2}x_i\right) - T_{2n-1}\left(\frac{h}{2}x_i\right)\right|$$
$$\le \frac{2M_{2n}}{(2n+1)!}\left(\frac{h}{2}\right)^{2n+1} + \frac{M_{2n}}{(2n)!}\left(\frac{h}{2}\right)^{2n+1}\sum_i |c_i|\,x_i^{2n}.$$
Say we apply Gaussian quadrature of degree n on $J_{i-1}$. Then the error on that subinterval, assuming the $c_i$ are positive, is bounded by
$$\frac{2M_{2n}}{(2n)!}\left(\frac{h}{2}\right)^{2n+1},$$
and hence, using this procedure on all the subintervals, we obtain that the error of the approximation is at most
$$\frac{M_{2n}(b-a)}{(2n)!}\left(\frac{h}{2}\right)^{2n}.$$
In the case $n = 3$, we obtain that the error when using Gaussian quadrature is at most
$$\frac{M_6}{1260}\,h^5,$$
where $M_6$ is a bound on the sixth derivative of f on $[a, b]$.
Miniproject
Given a curve $y = y(x)$, a function of x, joining the points $(0, 0)$ and $(1, 1)$, show that the time of descent of a particle from $(1, 1)$ to $(0, 0)$ is given by the integral
$$\int_0^1 \sqrt{\frac{1 + \dot y^2}{2g(1 - y)}}\,dx,$$
$$P\left(\bigcup_{k=1}^\infty A_k\right) = \sum_{k=1}^\infty P(A_k).$$
By linearity we can extend this definition to finite linear combinations of indicator functions, which we shall call simple functions:
$$\int_\Omega \sum_{j=1}^n c_j \mathbf{1}_{A_j}(\omega)\,dP(\omega) := \sum_{j=1}^n c_j P(A_j).$$
We can now extend this definition to positive functions. Given a function $X \ge 0$, we define
$$\int_\Omega X(\omega)\,dP(\omega) = \sup_{s\in S,\,0\le s\le X}\int_\Omega s(\omega)\,dP(\omega),$$
where S is the set of all simple functions. And for a general random variable, first define
$$X^+ = \max(X, 0), \qquad X^- = \max(-X, 0),$$
and thus
$$\int_\Omega X(\omega)\,dP(\omega) := \int_\Omega X^+(\omega)\,dP(\omega) - \int_\Omega X^-(\omega)\,dP(\omega).$$
Now assume that the die is not fair, with respective probabilities $P(\{1\}) = P(\{6\}) = \frac14$ and $P(\{j\}) = \frac18$ for $j = 2, 3, 4, 5$. We want to compute $\int_\Omega X(\omega)\,dP(\omega)$. In order to do this we can compute from the definition of the integral of a simple random variable:
$$\int_\Omega X(\omega)\,dP(\omega) = 20P(\{2\}) + 10P(\{4\}) - 30P(\{1,3,5,6\});$$
$$P(\{1,3,5,6\}) = P(\{1\}) + P(\{3\}) + P(\{5\}) + P(\{6\}) = \frac14 + \frac18 + \frac18 + \frac14 = \frac34,$$
and hence
$$\int_\Omega X(\omega)\,dP(\omega) = \frac{5}{2} + \frac{5}{4} - \frac{45}{2} = -\frac{75}{4}.$$
With these definitions at hand, we can define the expectation (or expected value) of a random variable:
$$E(X) = \langle X\rangle = \int_\Omega X(\omega)\,dP(\omega).$$
And the variance $\sigma^2$ is given by
$$\sigma^2 = \left\langle\left(X - \langle X\rangle\right)^2\right\rangle.$$
and notice that the integral gives the area under the curve $y = f(x)$. X represents the following experiment: choose a point in $[0, a]\times[0, b]$, and give yourself a point if it lies under the curve $y = f(x)$ and none if it lies above. We take copies of the random variable X and denote them by $X_i$, $i = 1, 2, 3, \dots, N$; these are independent copies in the following sense: what happens in the i-th instance of the experiment has nothing to do with what happened or will happen at any other instance of the experiment. Our approximation to the integral we want to compute is then $Z : ([0, a]\times[0, b])^N \longrightarrow [0, 1]$, defined by
$$Z(\omega_1, \dots, \omega_N) = \frac{1}{N}\sum_{i=1}^N X_i(\omega_i).$$
Now we are interested in knowing how good the approximation given by this method is. In order to do this, we estimate the variance of Z and use Chebyshev's inequality. First notice that, the $X_i$ being independent, the variance of Z can be computed as
$$\sigma_Z^2 = \frac{1}{N^2}\sum_{i=1}^N \sigma_{X_i}^2 = \frac{1}{N}\sigma_X^2.$$
Therefore for any $\varepsilon > 0$, we have that
$$P\left(|Z - \langle Z\rangle| \ge \varepsilon\right) \le \frac{1}{N}\,\frac{\sigma_X^2}{\varepsilon^2}.$$
Let us illustrate with an example. Assume we want to compute
$$\int_0^1 x^3\,dx.$$
We do not need to compute the exact variance, we only need an estimate! Hence, for an accuracy of $10^{-3}$, we want
$$\frac{1}{N}\cdot\frac{4}{10^{-6}} \le 0.1,$$
and hence $N \ge 4\times 10^7$.
Exercise. Write an algorithm to compute $\int_0^1 \sqrt{x}\,dx$ using a Monte Carlo method. How many instances of the experiment must be performed to obtain an accuracy of $10^{-3}$ with a probability higher than 0.85?
Exercise. Write a Monte Carlo algorithm to compute $\int\!\!\int_D \left(x^2 + y^4\right)dx\,dy$, where D is the part of the unit disc centered at the origin in the xy-plane located in the first quadrant. Estimate the number of iterations in your algorithm to reach an accuracy of $10^{-3}$ with probability 0.85.
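A hit-or-miss Monte Carlo sketch for the example $\int_0^1 x^3\,dx$ discussed above; the sample size and fixed seed are illustrative choices.

```python
import random

random.seed(0)  # fixed seed so the run is reproducible

# Hit-or-miss Monte Carlo on the box [0, a] x [0, b]: count points under
# the graph of f and scale by the area of the box.
def montecarlo(f, a, b, N):
    hits = 0
    for _ in range(N):
        x = random.uniform(0.0, a)
        y = random.uniform(0.0, b)
        if y <= f(x):
            hits += 1
    return a * b * hits / N

approx = montecarlo(lambda x: x**3, 1.0, 1.0, 100_000)
```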
Then
$$P\left(|Z - \langle Z\rangle| \ge \varepsilon\right) \le 2\exp\left(-\frac{2N^2\varepsilon^2}{\sum_{i=1}^N (b_i - a_i)^2}\right).$$
Using Hoeffding's inequality to find N so that the approximation of the integral $\int_0^1 x^3\,dx$ is within $10^{-3}$ of the exact value with probability at least 0.90 gives $N \sim 1{,}500{,}000$.
Chapter 3
In this chapter we will discuss a simple method that can be used to find the minimum of a given function. This method is based on the fact that for any differentiable function, minus its gradient points in the direction of maximum decrease. So, quite appropriately, the name of the method we shall discuss is the gradient method. As an application of the techniques introduced in this chapter we will discuss neural networks.
$$f(x_{n+1}) = f(x_n) - t\,\|\nabla f(x_n)\|^2 + \frac12\,\nabla f(x_n)^T \operatorname{Hess}(f)(\theta)\,\nabla f(x_n)\,t^2.$$
We can bound the right-hand side from above by
$$f(x_n) - t\,\|\nabla f(x_n)\|^2 + \frac{t^2}{2}\,\|\operatorname{Hess}(f)(\theta)\|_2\,\|\nabla f(x_n)\|^2,$$
where for a matrix $A = (a_{ij})_{i=1,\dots,m;\,j=1,\dots,n}$ the expression $\|A\|_2$ is given by
$$\|A\|_2 = \left(\sum_{i=1}^m\sum_{j=1}^n a_{ij}^2\right)^{1/2}.$$
Hence, if we want that $f(x_{n+1}) < f(x_n)$, assuming a bound $\|\operatorname{Hess}(f)(\theta)\|_2 \le \beta$, we obtain that the stepsize t must satisfy
$$t < \frac{2}{\beta}.$$
In conclusion, we can control the sum
$$\sum_{j=1}^n t_j\,\|\nabla f(x_j)\|^2.$$
$$\left.\frac{d^2}{dt^2}\right|_{t=0} f(z + tu) = u^T \operatorname{Hess} f(z)\,u \ge \eta > 0,$$
$$u = \frac{h}{\|h\|}.$$
Since
$$\left.\frac{d}{dt}\right|_{t=\|h\|} f(z + tu) = \nabla f(z + h)\cdot u,$$
we arrive at an estimate. The conclusion we can draw is the following. Let $\varepsilon > 0$ and assume that for all $n \ge N$ we have that $\|\nabla f(x_n)\| \le \varepsilon$; then the distance from $x_n$ to z must satisfy
$$\|x_n - z\| \le \frac{\varepsilon}{\eta},$$
and the sequence provided by the gradient method will converge towards z.
Proof. Let x, y be arbitrary distinct points. Let $t \in (0, 1)$, define $z = ty + (1-t)x$, and consider the Taylor expansion around z. Using the fact that $f'' > 0$, we have
$$f(y) > f(z) + \nabla f(z)\cdot\left[(1-t)(y - x)\right]$$
and
$$f(x) > f(z) + \nabla f(z)\cdot\left[t(x - y)\right].$$
Now multiply the first inequality by t and the second by $1 - t$, and add them to conclude that
$$f(z) < t f(y) + (1-t) f(x).$$
The lemma above implies that a strongly convex function can only have one local minimum. In fact, assume there are two local minima, say $x_1$ and $x_2$, and assume, without loss of generality, that $f(x_1) \le f(x_2)$. If the inequality is strict, then $x_2$ cannot be a local minimum. Indeed, by the previous lemma, we would have that for any $0 < t < 1$
$$f\left(t x_2 + (1-t)x_1\right) < t f(x_2) + (1-t) f(x_1) < f(x_2),$$
so by taking t very close to 1, we see that $x_2$ does not satisfy the definition of local minimum. Hence we must have $f(x_1) = f(x_2)$, and hence f is constant along the segment $[x_1, x_2]$, which contradicts the previous lemma, as we would have for any $0 < t < 1$
$$f\left(t x_2 + (1-t)x_1\right) = t f(x_2) + (1-t) f(x_1).$$
and hence for $R > 0$ large enough, if $\|x\| > R$ we have that $f(x) > f(0)$. From this we immediately conclude that the minimum of f in the closed ball centered at 0 of radius R, which exists by compactness, is a global minimum of f. This global minimum is of course a local minimum, and by the previous considerations we can conclude that it is unique. Also, it is clear that at the point where the minimum is reached, $\nabla f = 0$.
Finally, for a strongly convex function we can guarantee that the steepest
descent method with small enough step size converges towards its unique global
minimum.
We shall prove that if we select $x_0 > \sqrt{2}$ and $t \le \frac{2}{12x_0^2 - 8}$, the algorithm converges towards $\sqrt{2}$, though quite slowly. Indeed, the fact that $f(x_{n+1}) < f(x_n)$ implies that the sequence must remain in a bounded set. This implies that there is a convergent subsequence $x_{n_k}$. This subsequence converges towards a point z for which $f'(z) = 0$. But close enough to a minimum, the iteration map g is monotonic. This then implies that the original sequence $x_n$ is eventually monotonic, and being bounded, it is convergent.
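The example can be sketched numerically; the starting point and the number of steps are illustrative choices, with the stepsize taken from the bound above.

```python
# Gradient descent for f(x) = (x^2 - 2)^2, whose minimizer is sqrt(2).
# Stepsize t = 2/(12 x0^2 - 8), the bound from the notes.
def grad_descent(x0, steps):
    t = 2.0 / (12.0 * x0**2 - 8.0)
    x = x0
    for _ in range(steps):
        x -= t * 4.0 * x * (x * x - 2.0)   # f'(x) = 4x(x^2 - 2)
    return x

x = grad_descent(1.5, 2000)
```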
∇f (x) = Ax − b.
On the other hand, $\operatorname{Hess} f(x) = A$ and f is strongly convex, so we can conclude that the gradient method (steepest descent) will provide the minimizer of f, say $x_0$, which in turn satisfies
$$b = Ax_0.$$
Of course, we must be careful with the stepsize. In this case, it must be less than $2/\Lambda$, where $\Lambda$ is the largest absolute value of an eigenvalue of A.
What can we then say about $\operatorname{Hess}(J)$? Is it positive definite? Notice that the upper left corner $J_{aa} > 0$, and the determinant is strictly positive (unless all the $x_i$'s are equal). This is justified by the Cauchy-Schwarz inequality
$$\left|\frac{\partial^2 J}{\partial a\,\partial b}\right| \le \sqrt{m}\,\|x\|,$$
with equality if and only if all the $x_i$'s are equal. The determinant of the Hessian is then
$$\left|\operatorname{Hess}(J)\right| = J_{aa}J_{bb} - (J_{ab})^2 > m\|x\|^2 - m\|x\|^2 = 0.$$
The error function J is then strongly convex. The existence of a minimum is guaranteed, as is the convergence of the gradient method.
gaps between neurons. In this gap, a given neuron, from the terminal parts of its axon, releases chemical substances called neurotransmitters which bind to receptors in the neuron across the synapse; the releasing neuron is called presynaptic, and the receiving neuron is called postsynaptic. These neurotransmitters generate a reaction in the receiving neuron, but in order for this reaction to be strong enough so that the receiving neuron in turn generates a pulse and releases its own neurotransmitters, it might be required that many neurons release neurotransmitters that bind to the receptors of the postsynaptic neuron; hence a threshold must be surpassed before a neuron really reacts. This fact will be important when modelling a neuron in a neural network.
Once the neuron reacts, a pulse is created which will run along its axon. This is the action potential. Let us explain how this works. The fluid across the membrane of the axon is polarised: the interior of the neuron is at a lower electric potential with respect to the exterior of the neuron. Once the neuron reacts and "decides" to create its action potential, a series of voltage-activated gates open and let a flow of positive ions enter the neuron, and this causes a change in polarization at the membrane, a change that goes running along the whole axon: once the process of polarization reaches a point in the axon, a voltage-operated gate opens, allowing the positive ions to flow in and the process of polarization to keep going. Once the potential reaches the end of the axon, some bubbles, called vesicles, containing the neurotransmitters come into contact with the cell membrane and "open up", releasing the neurotransmitter into the synapse, which goes on to a dendrite, a muscle, or another cell.
$$-A^- = \{-a : a \in A^-\},$$
We start with the weight vector $w(0) = 0$. Then we pick the k-th element of X, and denote it by $p(k)$. If $w(k)^T p(k) \le 0$, we change the weight as follows:
$$w(k+1) = w(k) + p(k).$$
If we finish with the training set, we start over again, following the same order. Interestingly enough, if there is a solution to the classification problem (in this case, if there exists a $w_0$ such that for any $x \in X$ we have $w_0^T x > 0$), the algorithm described above finds a solution in a finite number of iterations. Let us show this. In order to proceed, we define two parameters:
$$\alpha = \min_{j=1,\dots,m} p(j)^T w_0,$$
and
$$\beta^2 = \max_{j=1,\dots,m} \|p(j)\|^2.$$
From this formula, taking the scalar product with $w_0$ and using the definition of $\alpha$, we obtain
$$w(k+1)^T w_0 = \sum_{n=0}^k p(n)^T w_0 \ge k\alpha,$$
and thus
$$\|w(k+1)\|^2 \le \sum_{j=0}^k \|p(j)\|^2 \le \beta^2 k,$$
or
$$\|w(k+1)\| \le \beta\sqrt{k}.$$
These two inequalities can be read as follows: on the one hand, the weight vector grows at least linearly in the number of training examples it classifies wrongly, but at most as the square root of this same number of examples times a constant. Clearly, this situation is untenable, unless at some point the weight vector classifies all examples correctly.
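The training loop above can be sketched as follows; the toy data set is illustrative, and we use $w^T p \le 0$ as the update test so that the iteration can leave $w = 0$.

```python
# Perceptron sketch: cycle through the (already sign-normalized) training
# vectors p, adding p to w whenever w^T p <= 0.
def train_perceptron(points, max_passes=100):
    w = [0.0] * len(points[0])
    for _ in range(max_passes):
        updated = False
        for p in points:
            if sum(wi * pi for wi, pi in zip(w, p)) <= 0:
                w = [wi + pi for wi, pi in zip(w, p)]
                updated = True
        if not updated:          # every example satisfies w^T p > 0
            return w
    return w

# Linearly separable toy set (vectors from the "negative" class negated):
X = [(1.0, 0.2), (0.8, -0.1), (0.5, 0.4)]
w = train_perceptron(X)
```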
$$l_k = \left(N^k_1, \dots, N^k_{m_k}\right),$$
where the $N^k_j$ are the nodes or neurons. The last layer we will denote by $l_d$, the d as a reminder of depth. Neurons from layer $l_{k-1}$ are connected with neurons in layer $l_k$. The output of the i-th neuron in the k-th layer will be denoted by $o^k_i$. The total input into the j-th neuron in layer k is given by
$$x^k_j = \sum_i w^{k,i}_j\,o^{k-1}_i,$$
where the $w^{k,i}_j$ are called weights and they represent the strength of the connection between neuron i in layer $k-1$ and neuron j in layer k, that is, the weight of the connection between $N^{k-1}_i$ and $N^k_j$. This is the input received by the j-th neuron in the k-th layer, but as the neuron depends on an activation function, we will have by definition that
$$o^k_j = S\left(x^k_j\right), \qquad S(x) = \frac{1}{1 + e^{-x}}.$$
For the network to learn, it will be trained on a set of data $(x_a, y_a)$, where $x_a$ is the given input and $y_a$ the corresponding expected output. In a training scenario, there is a total error
$$J = \frac12\sum_a \left(o^d_a - y_a\right)^2,$$
and the objective of the learning algorithm is to find weights $w^{k,i}_j$ that minimize J. Of course, the reader must notice that J depends on the family of weights employed, and this in principle could be seen if we expanded the outputs $o^d_a$ in all their glory. The art here is then not having to do this expansion. The first weights that can be updated at time $t+1$, knowing the weights at time t, are the weights coming into the last layer.
To minimize J (or at least try to do so) we employ a steepest descent algorithm, so we adjust the weights as follows:
$$w^{d,i}_j(t+1) = w^{d,i}_j(t) - \alpha\,\frac{\partial J}{\partial w^{d,i}_j}, \qquad \alpha > 0.$$
We must then compute the partial derivative in the previous expression. The following fact about the sigmoid function will be useful: $S'(x) = S(x)(1 - S(x))$. Then
$$\frac{\partial J}{\partial w^{d,i}_j} = \frac{\partial J}{\partial o^d_j}\,\frac{\partial o^d_j}{\partial x^d_j}\,\frac{\partial x^d_j}{\partial w^{d,i}_j} = \left(o^d_j - y_j\right)S\left(x^d_j\right)\left(1 - S\left(x^d_j\right)\right)o^{d-1}_i.$$
The function $S(x)(1 - S(x))$ will appear often, so we shall name it $B(x)$, and thus we can rewrite the previous identity as
$$\frac{\partial J}{\partial w^{d,i}_j} = \left(o^d_j - y_j\right)B\left(x^d_j\right)o^{d-1}_i.$$
We keep
$$\frac{\partial J}{\partial x^d_j} = \frac{\partial J}{\partial o^d_j}\,\frac{\partial o^d_j}{\partial x^d_j} = \left(o^d_j - y_j\right)\frac{\partial o^d_j}{\partial x^d_j}$$
in memory for the next calculation; in a more general fashion, once we have updated the weights coming into layer $k+1$, we will need to keep in memory
$$\frac{\partial J}{\partial x^{k+1}_j}.$$
Next, we occupy ourselves with finding how to update the weights of the inner layers. Assume we have updated the weights of the links coming into the $(k+1)$-th layer; our task is to update the weights coming into the k-th layer, that is, the $w^{k,i}_j$. As before, we use a steepest descent method,
$$w^{k,i}_j(t+1) = w^{k,i}_j(t) - \alpha\,\frac{\partial J}{\partial w^{k,i}_j}.$$
We must compute $\frac{\partial J}{\partial w^{k,i}_j}$, and to do this we resort to the chain rule:
$$\frac{\partial J}{\partial w^{k,i}_j} = \frac{\partial J}{\partial x^k_j}\,\frac{\partial x^k_j}{\partial w^{k,i}_j},$$
and we need to compute $\frac{\partial J}{\partial x^k_j}$, which is given by
$$\frac{\partial J}{\partial x^k_j} = \sum_l \frac{\partial J}{\partial x^{k+1}_l}\,\frac{\partial x^{k+1}_l}{\partial x^k_j},$$
and then
$$\frac{\partial x^{k+1}_l}{\partial x^k_j} = \frac{\partial x^{k+1}_l}{\partial o^k_j}\,\frac{\partial o^k_j}{\partial x^k_j} = w^{k+1,j}_l\,S\left(x^k_j\right)\left(1 - S\left(x^k_j\right)\right).$$
As the reader should notice, in this way we can proceed until we reach the first layer. This is the backpropagation algorithm.
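A minimal numerical sketch of backpropagation for a tiny 2-2-1 sigmoid network on one training pair; the weights and data are illustrative, and the gradients follow the formulas above.

```python
import math

def S(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward_backward(W1, W2, x, y):
    # Forward pass: hidden layer, then single output neuron.
    x1 = [sum(W1[j][i] * x[i] for i in range(2)) for j in range(2)]
    o1 = [S(v) for v in x1]
    x2 = sum(W2[i] * o1[i] for i in range(2))
    o2 = S(x2)
    J = 0.5 * (o2 - y) ** 2
    # Backward pass: dJ/dx at the output, propagated to the hidden layer.
    d2 = (o2 - y) * o2 * (1.0 - o2)                       # uses B(x) = S(1 - S)
    gW2 = [d2 * o1[i] for i in range(2)]
    d1 = [d2 * W2[j] * o1[j] * (1.0 - o1[j]) for j in range(2)]
    gW1 = [[d1[j] * x[i] for i in range(2)] for j in range(2)]
    return J, gW1, gW2

W1 = [[0.1, -0.2], [0.3, 0.4]]
W2 = [0.5, -0.6]
x, y = [1.0, 2.0], 1.0
J, gW1, gW2 = forward_backward(W1, W2, x, y)
```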
Chapter 4
In this part of the notes, we shall find numerical methods for solving linear systems. This is important not only per se, but we should look at it with applications to solving differential equations numerically in sight. One of our main tools will be the Fixed Point Theorem. In order to apply this important result, we must learn how to measure matrix norms.
$$\|T\|_{\nu,\mu} := \sup_{x\ne 0}\frac{\|Tx\|_\mu}{\|x\|_\nu} = \sup_{\|x\|_\nu = 1}\|Tx\|_\mu.$$
Exercise. Show that the second equality in the previous definition holds.
Exercise. Show that kT kν,µ defines a norm.
An aside on notation: whenever n = m and µ = ν, we shall denote the norm
simply as kT kν .
Let us show a few examples where we compute different norms of a given operator. We shall consider $T : \mathbb{R}^2 \longrightarrow \mathbb{R}^2$ represented by the matrix
$$T = \begin{pmatrix} 2 & -5 \\ 1 & 3 \end{pmatrix}.$$
Notice now that the quantity inside each parenthesis is just the expectation of a random variable: the one in the first parenthesis takes the value 2 with probability $|x_1|$ and the value 5 with probability $|x_2| = 1 - |x_1|$, whereas the one in the second parenthesis takes on the value 1 with probability $|x_1|$ and the value 3 with probability $|x_2| = 1 - |x_1|$. Hence, it is not difficult to estimate
$$\|Tx\|_1 \le 5 + 3 = 8.$$
On the other hand, if we choose x with $x_1 = 0$ and $x_2 = 1$, it is not difficult to see that $\|Tx\|_1 = 8$. We conclude then that $\|T\|_1 = 8$.
Next we consider $\mathbb{R}^2$ endowed with the norm
$$\|x\|_\infty = \max\{|x_1|, |x_2|\}.$$
Our goal now is to compute $\|T\|_\infty$. We have
$$\|Tx\|_\infty = \max\{|2x_1 - 5x_2|,\ |x_1 + 3x_2|\} \le \max\{2|x_1| + 5|x_2|,\ |x_1| + 3|x_2|\} \le 2|x_1| + 5|x_2| \le (5+2)\max\{|x_1|, |x_2|\},$$
and if $\|x\|_\infty = 1$, then we have the estimate $\|T\|_\infty \le 7$. On the other hand, notice that if we take x such that $x_1 = 1$ and $x_2 = -1$ ($= \operatorname{sign}(-5)$), then $\|Tx\|_\infty = 7$. Therefore $\|T\|_\infty = 7$.
The previous computations are particular cases of the following theorem.
Theorem 7. Given $T : \mathbb{R}^n \longrightarrow \mathbb{R}^n$, we have that
$$\|T\|_1 = \max_{k=1,2,\dots,n}\sum_{j=1}^n |T_{jk}|, \qquad (4.1)$$
and
$$\|T\|_\infty = \max_{j=1,2,\dots,n}\sum_{k=1}^n |T_{jk}|. \qquad (4.2)$$
Proof. We shall show (4.1). Pick any vector $x = (x_1, \dots, x_n)$ with $\|x\|_1 = 1$. Then we have
$$\|Tx\|_1 = \sum_j \left|(Tx)_j\right| \le \sum_j\sum_i |T_{ji}|\,|x_i|.$$
Now notice that $\sum_i |x_i| = 1$, so, if we define $L_i = \sum_j |T_{ji}|$, then the expression
$$\sum_i L_i\,|x_i|$$
To get some intuition about how the spectral radius is related to the norm of matrices, we prove the following.
$$J(x) = \langle Ax, x\rangle,$$
restricted to $\|x\|^2 = 1$. First notice that
$$\frac{\partial}{\partial x_i}J(x) = \frac{\partial}{\partial x_i}\sum_{k,j}A_{kj}x_k x_j = \sum_{k,j}\left(A_{kj}\delta_{ki}x_j + A_{kj}\delta_{ji}x_k\right) = \sum_j A_{ij}x_j + \sum_k A_{ki}x_k;$$
notice that k and j are dummy summation indices, hence, using that A is symmetric, we get
$$\frac{\partial}{\partial x_i}J(x) = 2\sum_k A_{ik}x_k,$$
$$Ax = \lambda x.$$
Hence, if the minimum and maximum eigenvalues of A are $\lambda_{\min}$ and $\lambda_{\max}$, then for $\|x\| = 1$ we have $\lambda_{\min} \le \langle Ax, x\rangle \le \lambda_{\max}$, and hence
$$\|A\|_2 \le \max\{|\lambda_{\min}|, |\lambda_{\max}|\},$$
from which we deduce that $\|A\|_2 \le \rho(A)$. The other inequality is left as an exercise.
Theorem 9. Let A be a matrix whose spectral radius is $\rho(A)$, and let $\varepsilon > 0$. There exists a matrix norm $\|\cdot\|^*$ such that
$$\|A\|^* \le \rho(A) + \varepsilon.$$
kAkν < 1.
$$x = -D^{-1}(A - D)x + D^{-1}b,$$
The map $Tx = -D^{-1}(A - D)x + D^{-1}b$ will have a fixed point as long as, in a given norm, $\alpha_\mu = \left\|D^{-1}(A - D)\right\|_\mu < 1$; notice also that this condition implies that the fixed point scheme will converge in the $\|\cdot\|_\mu$ norm towards the fixed point. Besides, we have the estimate
$$\|x_n - x^*\| \le \frac{\alpha_\mu^n}{1 - \alpha_\mu}\,\|x_1 - x_0\|_\mu.$$
Intuitively, Jacobi's method will work when the elements on the diagonal are much larger than the other elements of the matrix A. We then have the following result, which guarantees in some particular cases that one of the norms described above is strictly less than 1 for a matrix:
$$\max_i \frac{1}{|a_{ii}|}\sum_{j\ne i}|a_{ij}| < 1.$$
Proof. Just notice that the expression above corresponds to the norm $\|\cdot\|_\infty$ of the transformation
$$D^{-1}(A - D).$$
4.2.1 Example.
We will solve iteratively the equation $Ax = b$, with
$$A = \begin{pmatrix}10 & 1\\ -2 & 12\end{pmatrix}, \qquad b = \begin{pmatrix}1\\ -1\end{pmatrix}.$$
In this case
$$D = \begin{pmatrix}10 & 0\\ 0 & 12\end{pmatrix}.$$
The iteration scheme is
$$x_{n+1} = Ux_n + c,$$
where
$$U = \begin{pmatrix}0 & -\frac{1}{10}\\[2pt] \frac{1}{6} & 0\end{pmatrix}, \qquad c = \begin{pmatrix}\frac{1}{10}\\[2pt] -\frac{1}{12}\end{pmatrix}.$$
We take as seed $x_0 = 0$, and hence
$$x_1 = \begin{pmatrix}\frac{1}{10}\\[2pt] -\frac{1}{12}\end{pmatrix}, \qquad x_2 = \begin{pmatrix}\frac{13}{120}\\[2pt] -\frac{1}{15}\end{pmatrix},$$
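The example can be reproduced with a short Jacobi sketch; the number of iterations is an illustrative choice.

```python
# Jacobi iteration: x_i <- (b_i - sum_{j != i} a_ij x_j)/a_ii, all i at once.
def jacobi(A, b, steps):
    n = len(b)
    x = [0.0] * n
    for _ in range(steps):
        x = [(b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
             for i in range(n)]
    return x

A = [[10.0, 1.0], [-2.0, 12.0]]
b = [1.0, -1.0]
x = jacobi(A, b, 50)
```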
4.3 Gauss-Seidel
In this case we use the following decomposition:
$$(D + A_R)x = -A_L x + b,$$
$$\langle u, v\rangle_A := u^t A v = 0.$$
Our goal is to solve the system $Ax = b$. Assume then that we have a set $P = \{p_1, \dots, p_n\}$ of A-conjugate vectors. It is not difficult to prove that P forms a basis of $\mathbb{R}^n$; this is left as an exercise for the reader. A solution to the system is then given by
$$x = \sum_i \alpha_i p_i,$$
where it is not difficult to compute the coefficients $\alpha_i$:
$$\alpha_i = \frac{\langle x, p_i\rangle_A}{\langle p_i, p_i\rangle_A}. \qquad (4.3)$$
$$r_0 = b, \quad x^{(1)} = b, \quad p_1 = b, \quad p_0 = 0.$$
Then we define
Of course, it is not obvious that $\alpha_j$ exists; this must be proven. That is the purpose of the next lemma.
Lemma 4. With the iterations defined as above,
$$\langle p_1, \dots, p_k\rangle = \left\langle b, Ab, \dots, A^{k-1}b\right\rangle, \qquad x^{(k)} \in \left\langle b, Ab, \dots, A^{k-1}b\right\rangle.$$
$$\langle p_{j+1}, p_m\rangle_A = \langle r_j, p_m\rangle_A + \alpha_j\langle p_j, p_m\rangle_A + \sum_{l=1}^{j-1}\alpha_l\langle p_l, p_m\rangle_A.$$
$$\langle r_j, p_m\rangle_A = 0.$$
$$r_{k+1} = b - Ax^{(k+1)},$$
all we must show is that $x^{(k+1)} \in \left\langle b, \dots, A^k b\right\rangle$. Since $x^{(k+1)} = x^{(k)} + \alpha_{k+1}p_{k+1}$ and
$$p_{k+1} = r_k + \alpha_k p_k,$$
and both $r_k$ and $p_k$ belong to $\left\langle b, \dots, A^k b\right\rangle$, we can conclude that so does $p_{k+1}$, and hence $x^{(k+1)}$. The lemma follows.
and $\nabla f\left(x^{(j)}\right) = Ax^{(j)} - b = -r_j$. Thus $r_j \perp p_j$.
On the other hand,
$$r_j = b - Ax^{(j)} = b - A\left(x^{(j-1)} + \alpha_j p_j\right),$$
and hence
$$r_j = b - Ax^{(j-1)} - \alpha_j A p_j = r_{j-1} - \alpha_j A p_j.$$
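A sketch of the conjugate gradient iteration in its usual algorithmic form; the residual update follows the recurrence $r_j = r_{j-1} - \alpha_j Ap_j$ above, while the particular formulas for the coefficients are the standard ones (an assumption on our part). The test system is our choice; for a 2×2 symmetric positive definite matrix the method is exact after two steps, up to roundoff.

```python
# Conjugate gradient for symmetric positive definite A, with plain lists.
def cg(A, b, steps):
    n = len(b)
    x = [0.0] * n
    r = b[:]                  # r_0 = b - A x_0 = b, since x_0 = 0
    p = r[:]
    for _ in range(steps):
        Ap = [sum(A[i][j] * p[j] for j in range(n)) for i in range(n)]
        rr = sum(ri * ri for ri in r)
        alpha = rr / sum(p[i] * Ap[i] for i in range(n))
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Ap[i] for i in range(n)]   # r_j = r_{j-1} - alpha A p
        beta = sum(ri * ri for ri in r) / rr
        p = [r[i] + beta * p[i] for i in range(n)]     # new conjugate direction
    return x

A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x = cg(A, b, 2)   # exact solution is (1/11, 7/11)
```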
Differential Equations:
Iterative Methods
$$y(t_{n+1}) - y_{n+1} = y(t_n) - y_n + h\left(f(t_n, y(t_n)) - f(t_n, y_n)\right) + \frac{y''(\xi_n)}{2}\,h^2.$$
The mean value theorem gives
$$f(t_n, y(t_n)) - f(t_n, y_n) = \frac{\partial f}{\partial y}(t_n, \zeta_n)\left(y(t_n) - y_n\right).$$
Let us write
$$E_n = |y(t_n) - y_n|,$$
so we get
$$E_{n+1} \le E_n + AhE_n + \frac{B}{2}h^2,$$
where A is a bound on $\frac{\partial f}{\partial y}(t, y)$, and B is a bound on $y''$. To obtain this last bound, we work out $y''$ in terms of f and its derivatives:
$$y'' = \frac{\partial f}{\partial t} + f\,\frac{\partial f}{\partial y}.$$
Assume then that the bound A also works for f and for $\frac{\partial f}{\partial t}$. Then we can bound, for all t,
$$|y''(t)| \le A + A^2.$$
Hence we have
$$E_{n+1} \le E_n(1 + Ah) + \frac{A + A^2}{2}h^2.$$
Therefore, if we solve the recurrence
$$E_{n+1} = \theta E_n + \gamma h^2, \qquad \theta = 1 + Ah, \quad \gamma = \frac{A + A^2}{2},$$
$$E_j = \gamma h^2\sum_{k=0}^{j-1}\theta^k = \gamma h^2\,\frac{\theta^j - 1}{\theta - 1} = \frac{\gamma}{A}\left(\theta^j - 1\right)h.$$
Just for the sake of simplicity, we want to drop the dependence on j in the previous formula. Thus we have (as $0 \le j \le N$)
$$\theta^j - 1 = (1 + Ah)^j - 1 \le \left(1 + \frac{AL}{N}\right)^N - 1 \le e^{AL} - 1.$$
Theorem 12. Given the differential equation (5.1) and the approximate solution by Euler's method, we have that
$$|y(t_n) - y_n| \le \frac{1 + A}{2}\left(e^{AL} - 1\right)h,$$
where A is a bound on $|f|$, $\left|\frac{\partial f}{\partial t}\right|$ and $\left|\frac{\partial f}{\partial y}\right|$.
y_{n+1} = y_n + hf(t_n, y_n) + (h²/2) (∂f/∂t(t_n, y_n) + f(t_n, y_n) ∂f/∂y(t_n, y_n)).

Find an estimate for |y(t_n) − y_n|. Implement this method for the differential
equation

dy/dt = 1/(1 + t²y²),  y(0) = 1,  t ∈ [0, 1],
and compare its performance with Euler’s method.
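A minimal sketch of such an implementation follows; the function names are mine, and the partial derivatives of f(t, y) = 1/(1 + t²y²) are computed by hand:

```python
def f(t, y):
    return 1.0 / (1.0 + t**2 * y**2)

def f_t(t, y):  # partial derivative of f with respect to t
    return -2.0 * t * y**2 / (1.0 + t**2 * y**2)**2

def f_y(t, y):  # partial derivative of f with respect to y
    return -2.0 * t**2 * y / (1.0 + t**2 * y**2)**2

def euler(N, L=1.0, y0=1.0):
    """Euler's method: y_{n+1} = y_n + h f(t_n, y_n)."""
    h, t, y = L / N, 0.0, y0
    for _ in range(N):
        y += h * f(t, y)
        t += h
    return y

def taylor2(N, L=1.0, y0=1.0):
    """The second-order Taylor scheme from the exercise."""
    h, t, y = L / N, 0.0, y0
    for _ in range(N):
        y += h * f(t, y) + 0.5 * h**2 * (f_t(t, y) + f(t, y) * f_y(t, y))
        t += h
    return y
```

Comparing `euler(N)` and `taylor2(N)` while doubling N illustrates the difference between first- and second-order convergence.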
5.2 Runge-Kutta
Euler’s method is what we would call an O (h) or first order method, as the
error between the exact and the approximate solution is of order h.
A simple technique, which we shall illustrate below, can give a method where
the error is or order h2 . Let us begin. The idea is to introduce some extra
parameters that will allow us some freedom of choice. So we set up the method
as follows L
tn+1 = tn + h, h = N ,
k1 = hf (tn , yn ) ,
k2 = hf (tn + αh, yn + βk1 ) ,
y = yn + ak1 + bk2 ,
n+1
y0 = y (0) , t0 = 0.
As before, our aim is to obtain a recurrence to estimate |y(t_n) − y_n|.
First we expand y_{n+1} using Taylor's theorem:

y_{n+1} = y_n + ak_1 + bh (f(t_n, y_n) + αh ∂f/∂t(t_n, y_n) + βk_1 ∂f/∂y(t_n, y_n)) + R_n,

where

R_n = (bh/2) u_nᵀ Hess f(θ_n) u_n,  u_n = (αh, βk_1)ᵀ,

and here θ_n is a point in the interior of the segment that joins the point (t_n, y_n)
to the point (t_n + αh, y_n + βk_1), and

y(t_{n+1}) = y(t_n) + hf(t_n, y(t_n)) + (h²/2) (∂f/∂t + f ∂f/∂y)(t_n, y(t_n)) + R̃_n,
where

R̃_n = (1/6) y‴(η_n) h³.
As an aside that will be useful later, we can compute y‴ using (5.1):

y‴(t) = (∂²f/∂t² + f ∂²f/∂y∂t + ∂f/∂t ∂f/∂y + f ∂²f/∂t∂y + f (∂f/∂y)² + f² ∂²f/∂y²)(t, y(t)).
We compute D_{n+1} = y(t_{n+1}) − y_{n+1}, and using the mean value theorem con-
veniently, we obtain

D_{n+1} = D_n + h ∂f/∂y(t_n, θ_n) D_n + (h²/2) (∂²f/∂y∂t + (∂f/∂y)² + f ∂²f/∂y²)(t_n, ξ_n) D_n + R̃_n − R_n.
Assuming that there is an A which gives a common bound for |f|, |∂f/∂t|, |∂f/∂y|,
|∂²f/∂t²|, |∂²f/∂t∂y|, |∂²f/∂y²|, and writing E_n = |y(t_n) − y_n|, we can estimate

E_{n+1} ≤ E_n + hAE_n + (h²/2)(A + 2A²) E_n + |R̃_n| + |R_n|.
Next we show that R_n and R̃_n are both O(h³). Using the expression
for R̃_n given above, and the assumed bounds on f and its derivatives, it is not
difficult to get the estimate

|R̃_n| ≤ (1/6)(A + 3A² + 2A³) h³.
For R_n, we use the Cauchy-Schwarz inequality and the definition of the norm ‖·‖₂ to obtain

|R_n| ≤ (bh/2) ‖Hess f(θ_n)‖₂ ‖u_n‖₂².

Since the Hessian is symmetric, we can bound its ‖·‖₂ as follows:

‖Hess f(θ_n)‖₂ ≤ √((∂²f/∂t²)² + 2(∂²f/∂t∂y)² + (∂²f/∂y²)²) ≤ 2A,

while ‖u_n‖₂² = α²h² + β²k_1² ≤ (α² + β²A²) h².
Thus,

|R_n| ≤ bA (α² + β²A²) h³.

To finish our illustration of the Runge-Kutta technique, let us make a choice for
a, b, α and β. We choose

a = b = α = β = 1/2.
This allows us to conclude that a solution to the recurrence

E_{n+1} = ρE_n + ωh³,  E_0 = 0,

will give a bound from above for the error |y(t_n) − y_n|. Here

ρ = 1 + Ah + (h²/2)(A + 2A²),  ω = (1/6)(A + 3A² + 2A³) + (1/16) A(1 + A²).
Solving this recurrence gives

E_j ≤ ωh³ (ρ^j − 1)/(ρ − 1) = (ω / (A + (h/2)(A + 2A²))) h² (ρ^j − 1).

If we also assume that h ≤ 1/A, this expression can be simplified a bit further,
to give

E_j ≤ (ω/A) h² (e^{2A + 1/2} − 1).
Thus we have developed a method of solving (5.1) where the error goes as
O(h²): this is a Runge-Kutta method of order 2. We summarize what we have
done in this section in the following theorem.
Theorem 13. Given the differential equation (5.1) and the approximate solution
by the Runge-Kutta method as above with a = b = α = β = 1/2, we have that

|y(t_n) − y_n| ≤ (11/48 + (1/2)A + (19/48)A²) (e^{2A + 1/2} − 1) h²,

where A is a bound on |f|, |∂f/∂t|, |∂f/∂y|, |∂²f/∂t²|, |∂²f/∂t∂y|, and |∂²f/∂y²|.
Exercise. Develop a Runge-Kutta method of order 3 (i.e., the error is of order
O(h³)).
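The two-stage scheme of this section is easy to implement. Below is a minimal sketch (the function names are my own); the classical midpoint rule corresponds to the alternative parameter choice a = 0, b = 1, α = β = 1/2:

```python
def rk2_step(f, t, y, h, a=0.5, b=0.5, alpha=0.5, beta=0.5):
    """One step of the two-stage scheme y_{n+1} = y_n + a*k1 + b*k2."""
    k1 = h * f(t, y)
    k2 = h * f(t + alpha * h, y + beta * k1)
    return y + a * k1 + b * k2

def rk2_solve(f, y0, L, N, **params):
    """March from t = 0 to t = L in N steps of size h = L/N."""
    t, y, h = 0.0, y0, L / N
    for _ in range(N):
        y = rk2_step(f, t, y, h, **params)
        t += h
    return y

# Sanity check on y' = y, y(0) = 1: the exact value at t = 1 is e = 2.71828...
approx = rk2_solve(lambda t, y: y, 1.0, 1.0, 1000, a=0.0, b=1.0)  # midpoint choice
```

Running the same problem with different parameter choices and step sizes is a quick way to see how the constants in Theorem 13 manifest in practice.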
Differential Equations:
Finite Difference Method
We use the equations in the previous section to obtain the following expression
for the differential equation:

(y(t_{i+1}) + y(t_{i−1}) − 2y(t_i))/h² − y(t_i) = O(h²),

where

O(h²) = (1/24)(y⁽⁴⁾(ξ_{i+1}) + y⁽⁴⁾(ξ_{i−1})) h²,

and ξ_{i+1} is an intermediate point between t_i and t_{i+1}, and ξ_{i−1} is analogously
defined. So we propose the following scheme for a numerical solution to the
problem above:

(y_{i+1} + y_{i−1} − 2y_i)/h² − y_i = 0,  y_0 = 1,  y_N = 2.    (6.7)
So if we call y the column vector whose i-th component is y_i, we call b the
vector such that b_0 = 1, b_N = 2 and b_j = 0 if j ≠ 0, N, and we form the matrix
A with components

a_00 = 1 = a_NN,  a_{i,i−1} = 1/h² = a_{i,i+1},  a_{jj} = −(2 + h²)/h²,

then we can rewrite the scheme as the linear system

Ay = b,  whose solution is  y = A⁻¹b.
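This construction can be sketched in a few lines of code for the model problem y″ − y = 0, y(0) = 1, y(1) = 2; the function names are mine, and the closed-form solution used for comparison is obtained by solving the characteristic equation:

```python
import numpy as np

def solve_bvp(N):
    """Finite-difference solve of y'' - y = 0, y(0) = 1, y(1) = 2."""
    h = 1.0 / N
    A = np.zeros((N + 1, N + 1))
    b = np.zeros(N + 1)
    A[0, 0] = A[N, N] = 1.0       # boundary rows enforce y_0 = 1, y_N = 2
    b[0], b[N] = 1.0, 2.0
    for i in range(1, N):
        A[i, i - 1] = A[i, i + 1] = 1.0 / h**2
        A[i, i] = -(2.0 + h**2) / h**2
    return np.linalg.solve(A, b)

def exact(t):
    """Exact solution y = c1*e^t + c2*e^{-t} fitted to the boundary data."""
    c1 = (2.0 - np.exp(-1.0)) / (np.exp(1.0) - np.exp(-1.0))
    return c1 * np.exp(t) + (1.0 - c1) * np.exp(-t)

y = solve_bvp(100)
t = np.linspace(0.0, 1.0, 101)
err = np.max(np.abs(y - exact(t)))   # shrinks like h^2 as N grows
```

Doubling N and watching `err` drop by roughly a factor of four is a direct check of the O(h²) claim below.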
Now we need to find out how good the approximation is. We shall use the
following notation:

E_j ≡ y_j − y(t_j).

So if we subtract the differential equation (using the approximations above)
from the scheme, we obtain an equation for the error vector of the form

E = h⁻² A⁻¹ τ,

where τ collects the truncation terms, and hence an estimate

‖E‖₂ ≤ M ‖B‖₂ h⁴.
We delay the proof of this beautiful result to show how to use it. In the
example we are studying, g(x) = 0 and h(x) = −1 ≤ 0, hence the hypotheses
of the theorem hold. Therefore, y reaches its maximum at the boundary (indeed,
if the maximum were reached in the interior of the interval, it would be greater
than 2, hence positive, and thus y would be constant, which it is not!). We can
conclude then that

y ≤ 2,  so  y⁽⁴⁾ = y″ = y ≤ 2.
Now let us show that y is nonnegative. To do so, assume the opposite, that
is, that there is a c ∈ (a, b) such that y(c) < 0. Notice that if we define u = −y,
then u satisfies u″ − u = 0, so it cannot have a positive interior maximum
without being constant. But u(c) = −y(c) > 0, while u ≤ 0 at the boundary, so u
would have a positive interior maximum, a contradiction. In consequence we must
have y ≥ 0. Summarizing, we have shown that 0 ≤ y ≤ 2. By the Mean Value
Theorem, there is a point c ∈ (0, 1) such that

y′(c) = (y(1) − y(0))/(1 − 0) = 1.
Having a bound on the first and second derivatives then allows us to bound any
other derivative of y that we wish or need.
y″ + p(t) y′ + q(t) y = 0.
Now let η > 0 be any number such that Pη < 1. We are going to bound |y′| on
[c, c + η] (to bound it on [c − η, c] we proceed in a similar way). Writing
B_1 = max_{t∈[c,c+η]} |y′(t)| and B_0 = |y′(c)|, and noting that y′(t) = y′(c) + ∫_c^t y″(s) ds
with |y″| ≤ P|y′| + Q|y|, we obtain

B_1 ≤ B_0 + PηB_1 + QMη,

and hence

B_1 ≤ B_0/(1 − Pη) + QMη/(1 − Pη).
Proceeding inductively, if B_{k+1} denotes the maximum of |y′| on
[c + (k + 1)η, c + (k + 2)η], then

B_{k+1} ≤ B_k + PηB_{k+1} + QMη,

and hence

B_{k+1} ≤ B_k/(1 − Pη) + A(η),

where A(η) = QMη/(1 − Pη). If we have that
η = (b − c)/N,

for N large, then

B_N = θ^N B_0 + A(η) (θ^N − 1)/(θ − 1),  θ = 1/(1 − Pη),

will give a global bound on y′ on [c, b]. It is not difficult to prove that

lim_{N→∞} θ^N = e^{P(b−c)},

and

lim_{N→∞} A(η) (θ^N − 1)/(θ − 1) = (QM/P)(e^{P(b−c)} − 1),
and hence

max_{t∈[c,b]} |y′(t)| ≤ e^{P(b−c)} |y′(c)| + (QM/P)(e^{P(b−c)} − 1)
                     ≤ e^{P(b−a)} |y′(c)| + (QM/P)(e^{P(b−a)} − 1).

In the same way we get

max_{t∈[a,c]} |y′(t)| ≤ e^{P(b−a)} |y′(c)| + (QM/P)(e^{P(b−a)} − 1),

which gives as a bound:

max_{t∈[a,b]} |y′(t)| ≤ e^{P(b−a)} |y′(c)| + (QM/P)(e^{P(b−a)} − 1).
If we wisely choose c so that y′(c) = (y(b) − y(a))/(b − a), we finally obtain

max_{t∈[a,b]} |y′(t)| ≤ e^{P(b−a)} |y(b) − y(a)|/(b − a) + (QM/P)(e^{P(b−a)} − 1).
We state the previous estimate in the following theorem.

Theorem 15. Let y be a smooth solution to the boundary value problem

y″ + p(t) y′ + q(t) y = 0,
y(a) = y_a,  y(b) = y_b.

Let M, P and Q be such that

max_{t∈[a,b]} |y(t)| ≤ M,  max_{t∈[a,b]} |p(t)| ≤ P,  and  max_{t∈[a,b]} |q(t)| ≤ Q.

Then

max_{t∈[a,b]} |y′(t)| ≤ e^{P(b−a)} |y(b) − y(a)|/(b − a) + (QM/P)(e^{P(b−a)} − 1).

When P = 0, the estimate becomes

max_{t∈[a,b]} |y′(t)| ≤ |y(b) − y(a)|/(b − a) + QM(b − a).
(Gronwall's inequality) If u satisfies

u(t) ≤ α + β ∫_0^t u(s) ds,

with α, β ≥ 0, then

u(t) ≤ αe^{βt}.
• w > 0,
• (L + h)[w] ≤ 0.

Theorem 16. Assume that u defined in [a, b] satisfies the differential inequality
(L + h)[u] ≥ 0. Then the function v = u/w satisfies the differential inequality

v″ + (2 w′/w + g) v′ + (1/w)(L + h)[w] v ≥ 0.
y″ + y = 0 in [0, 1],
y(0) = 1,  y(1) = 2.    (6.8)

Notice that here h = 1 ≥ 0, and hence we cannot work out this problem as we did in
the previous section. However, use the extended Maximum Principle to bound
the error when using a discretization scheme with N = 4, that is, stepsize 0.25. I
suggest using w = cos t. And a question arises: not for every interval [0, a]
will you find such a w... for which a's can you guarantee the existence of a w?
6.3 Problem
For this problem you will have to do a little bit of research. Find out about the
Maximum Principle for the heat equation, and about compatibility conditions for
solutions to the heat equation (or parabolic equations).
In this section you are asked to work out by yourself the problem of solving
numerically the Boundary Value Problem

∂u/∂t = ∂²u/∂x²  in (0, 1) × (0, ∞),
u(0, t) = 0 = u(1, t),
u(x, 0) = sin(2πx).

You will approximate u(0.3, 0.2). To do so, choose stepsizes ∆t = 0.05 and
∆x = 0.1, and to discretize the time derivative at (x_j, t_i) use
7.1 Introduction
To introduce the method we shall work out an example (a very simple one!).
First we shall work out the approximation and then we will perform the error
analysis.
−y″ + y = 1 in (0, 1),
y(0) = 0 = y(1).    (7.1)

Multiply (7.1) by a differentiable function ϕ vanishing at the endpoints, and
integrate by parts to obtain

∫_0^1 (y′(τ) ϕ′(τ) + y(τ) ϕ(τ)) dτ = ∫_0^1 ϕ(τ) dτ.
[tj , tj+1 ] , j = 0, . . . , N − 1,
and in what follows we shall denote by h the common length of each of these
subintervals.
We define the piecewise linear functions e_{j,N}, j = 1, . . . , N − 1, as

e_{j,N}(t) = N(t − t_{j−1})  if t_{j−1} ≤ t < t_j,
             N(t_{j+1} − t)  if t_j ≤ t < t_{j+1},    (7.2)
             0               otherwise.

The N in front of each of the linear pieces in the previous definition is there so
that

N(t_j − t_{j−1}) = Nh = N × (1 − 0)/N = 1,
so it has to be appropriately modified according to the interval where the
boundary value problem is posed. In general, if the interval we are working with
is [a, b], the coefficient multiplying each of the linear parts should be N/(b − a).
Define the space V_N as the finite dimensional vector space generated by the
e_{j,N}. The Finite Element Method consists in finding an approximation to the
solution within the space V_N.
Working out the proposed example, with N = 4 we obtain the following system:

(  49/6   −95/24    0    ) (c_1)   (1/4)
( −95/24   49/6   −95/24 ) (c_2) = (1/4)
(   0     −95/24   49/6  ) (c_3)   (1/4)
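A sketch of solving this system numerically follows; the entries assume the values a(e_i, e_i) = 8 + 1/6 = 49/6 and a(e_i, e_{i±1}) = −4 + 1/24 = −95/24 that the bilinear form gives for N = 4, and the exact solution of (7.1) used for comparison is my own closed-form computation:

```python
import numpy as np

# Matrix of a(e_i, e_j) for N = 4: diagonal 49/6, off-diagonal -95/24;
# each right-hand side entry is the integral of e_i, namely h = 1/4.
A = np.array([[49/6, -95/24, 0.0],
              [-95/24, 49/6, -95/24],
              [0.0, -95/24, 49/6]])
rhs = np.array([0.25, 0.25, 0.25])
c = np.linalg.solve(A, rhs)          # coefficients of y_N in the basis e_{j,4}

def exact(t):
    """Exact solution of -y'' + y = 1, y(0) = y(1) = 0."""
    return 1.0 - np.cosh(t - 0.5) / np.cosh(0.5)

nodes = np.array([0.25, 0.5, 0.75])  # interior mesh points
```

At the nodes, the finite element coefficients `c` already agree with `exact(nodes)` to a few decimal places, illustrating the best-fit property proved below.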
Assume this bilinear form is positive definite (it is not always so); it then defines
a norm

‖u‖_a = √(a(u, u)).
Then we have the following.
Lemma 6. The approximation yN given by the finite element method is the
best fit approximation to y in the space VN given by the norm induced by the
bilinear form a. In other words, for any w ∈ VN
ky − yN ka ≤ ky − wka .
Proof. Let us prove the first inequality; the second one is left as an exercise.
The fact that ζ(c) = 0 = ζ(d) implies that we can write ζ as

ζ(t) = Σ_{k=1}^∞ a_k sin(kπ(t − c)/L),  L = d − c.
Differentiating, we obtain

ζ′(t) = Σ_{k=1}^∞ k (π/L) a_k cos(kπ(t − c)/L),

and

ζ″(t) = − Σ_{k=1}^∞ k² (π/L)² a_k sin(kπ(t − c)/L).
But then we have

∫_c^d (ζ′(τ))² dτ = (L/2) Σ_{k=1}^∞ k² (π/L)² |a_k|²,

and

∫_c^d (ζ″(τ))² dτ = (L/2) Σ_{k=1}^∞ k⁴ (π/L)⁴ |a_k|²,

and since k² ≤ k⁴, comparing these two expressions term by term yields the first
inequality.
We must estimate ∫_0^1 (y″(τ))² dτ. In order to do so, from (7.1) we obtain

∫_0^1 (y″(τ))² dτ = ∫_0^1 (1 − y)² dτ.
Using the inequality 2ab ≤ a² + b², we can estimate the right hand side of the
previous equality as

∫_0^1 (1 − y)² dτ = ∫_0^1 (1 − 2y + y²) dτ ≤ ∫_0^1 (2 + 2y²) dτ,
and all that is left to do is to estimate ∫_0^1 y² dτ. But then notice that from

∫_0^1 ((y′(τ))² + (y(τ))²) dτ = ∫_0^1 y(τ) dτ,
7.2.1 Problem
Perform the error analysis for the finite element method when applied to

−((2t² + 2) y′)′ + ty = t  in (0, 1),
y(0) = 0 = y(1).    (7.3)

Before you begin your analysis, show that the bilinear form induced by the
equation is positive definite!
Chapter 8
Monte Carlo Methods
We will use the finite difference method, so we discretize the second derivative
as

(y_{i+1} + y_{i−1} − 2y_i)/h²,

and the first derivative as

(y_{i+1} − y_{i−1})/(2h).
Then we obtain a linear system for the y_i.
For pedagogical reasons, let us only divide the interval into three subintervals.
Then, the system we must solve becomes

Ay = b,

where

A = ( 1             0             0             0           )
    ( 0.5 − 0.25h  −1             0.5 + 0.25h   0           )
    ( 0             0.5 − 0.25h  −1             0.5 + 0.25h )
    ( 0             0             0             1           ),
y = (1, y_1, y_2, 2)ᵀ,  b = (1, 0, 0, 2)ᵀ.
We can rewrite the previous system as a fixed point problem

y = P y,

where P = (p_{ij})_{i,j=0,...,3} is the matrix

( 1             0             0             0           )
( 0.5 − 0.25h   0             0.5 + 0.25h   0           )
( 0             0.5 − 0.25h   0             0.5 + 0.25h )
( 0             0             0             1           ).

This problem can be solved by successive approximations, and we get that the
solution would be given by

y = P_∞ y,

where P_∞ = lim_{n→∞} P^n, provided that this limit exists. We shall argue, using
probabilistic arguments, that P_∞ does exist, and that it has a natural
interpretation.
Notice then that the previous matrix can be interpreted as the transition matrix
of a random walk (a Markov chain), as follows: p_{ij} is the probability
of going from y_i to y_j. The fact that the 00 and 33 entries of the matrix are
both 1 reflects the fact that there is absorption at the boundary, i.e., once either
0 or 1, the boundary points of the interval, is reached, you remain there.
Hence, the ij entry of the matrix P^n gives the probability of starting at
y_i and ending up at y_j after n steps. To compute the limit of P^n as n goes to
∞ we will need the following lemma.
Lemma 8. Let A > 0 be fixed. Then, for a random walk starting at 0, of stepsize
1, the probability that after n steps it is at distance less than A from the starting
point goes to 0 as n → ∞.

Before we begin with the proof of this lemma, we shall need the following
elementary inequality (here C(2k, k) denotes the binomial coefficient):

C(2k, k) ≤ 4^k / √(2k + 1).
We proceed by induction. Assuming the bound at k = n, at k = n + 1 we have

C(2n + 2, n + 1) = ((2n + 2)(2n + 1)/(n + 1)²) C(2n, n) ≤ 2 · 4ⁿ √(2n + 1)/(n + 1),

and hence, in order to have that

C(2n + 2, n + 1) ≤ 4^{n+1} / √(2n + 3),

all we need to check is that

√(2n + 1) √(2n + 3) ≤ 2(n + 1),

or equivalently

4n² + 8n + 3 ≤ 4n² + 8n + 4,

which is obviously true. The stated bound follows then by induction.
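The inequality is also easy to check numerically; a quick sketch:

```python
from math import comb, sqrt

# Verify C(2k, k) <= 4^k / sqrt(2k + 1) for a range of k.
for k in range(1, 60):
    assert comb(2 * k, k) <= 4**k / sqrt(2 * k + 1)
```

Such a loop is no substitute for the induction above, but it is a cheap sanity check on the algebra.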
Proof of Lemma 8. Assume that we go to the right with probability p and to
the left with probability 1 − p. Hence, the probability that the walk remains
within distance A from the starting point is given by

Σ_{(n−A)/2 ≤ m ≤ (n+A)/2} C(n, m) p^m (1 − p)^{n−m}.
if n is large enough.
From the previous estimates we find that this probability is bounded above by

(A/√n) e^{A²/n} (1 + A/2),

which tends to 0 as n → ∞, proving the lemma.
From the proof of the previous lemma, we can extract the following estimate.

Corollary 1. Consider a random walk starting at the origin of the real line, with
stepsize h. Then the probability that after n steps the walk is within distance A
from the origin is at most

(A/(h√n)) e^{A²/(nh²)} (1 + Ah/2).
The previous lemma shows that eventually any random walk hits the bound-
ary of the interval. This shows that, if p_{n,ij} is the ij entry of P^n and
j ≠ 0, N, then p_{n,ij} goes to zero as n goes to ∞. In other words, all the columns of the
matrix P_∞ = lim_{n→∞} P^n are zero except the first and the last, and the i-th
components of the 0-th and N-th columns give the probability of arriving at the
left and right endpoints of the interval, respectively, when starting from the i-th
point of the mesh. We shall denote by p_{i,a} the probability of reaching
the point a when starting from t_i, and by p_{i,b} the probability of reaching
the point b when starting also from t_i. Our discussion shows that

p_{i,a} + p_{i,b} = 1.

Hence, if we define a random variable W_i which takes the value y_a with proba-
bility p_{i,a} and y_b with probability p_{i,b}, then

y_i = E[W_i],
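This identity is exactly what makes a Monte Carlo approximation possible: simulate many absorbing walks from t_i and average the boundary values they hit. A minimal sketch for the example above (N = 3, y_a = 1, y_b = 2, with the transition probability 0.5 − 0.25h read off the matrix P; function names, trial count and seed are my own choices):

```python
import random

def walk(i, N, p_left, y_a, y_b, rng):
    """One absorbing random walk on the mesh 0, 1, ..., N starting at i;
    returns the boundary value at the endpoint where it is absorbed."""
    while 0 < i < N:
        i += -1 if rng.random() < p_left else 1
    return y_a if i == 0 else y_b

def monte_carlo(i, N, trials=20000, y_a=1.0, y_b=2.0, seed=0):
    """Estimate y_i = E[W_i] by averaging many simulated walks."""
    h = 1.0 / N
    p_left = 0.5 - 0.25 * h   # transition probability from the matrix P
    rng = random.Random(seed)
    return sum(walk(i, N, p_left, y_a, y_b, rng) for _ in range(trials)) / trials
```

For instance, `monte_carlo(1, 3)` estimates y_1; by the variance computation below, its standard error decreases like 1/√J with the number of trials J.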
S = {−h, h},

and then

S_M = ∪_{i=1}^M S^i.
and a similar estimate holds for p_{i,b} − p_{i,b;M}. Using the definition of h, we
obtain

p_{i,a} − p_{i,a;M} ≤ (N/√M) e^{N²/M} (1 + N/2).

This last estimate is quite helpful when trying to estimate the difference
E[W_i] − E[W_i^M].
To get an estimate of y_i we now use the random variable W_i^M. We use
J copies of W_i^M, which we shall denote by W_{i,k}^M, k = 1, 2, . . . , J, and define the
random variable

Z : (S_M)^J −→ R

as

Z(ω_1, . . . , ω_J) = (1/J) Σ_{k=1}^J W_{i,k}^M(ω_k).
It is easy to compute the mean and the variance of Z. In fact we have that

μ_Z = E[W_i^M],  and  σ_Z² = σ²/J,

where σ² is the variance of W_i^M. We can easily bound σ², as
u_{i,j} = (k²/(2(h² + k²))) (u_{i+1,j} + u_{i−1,j}) + (h²/(2(h² + k²))) (u_{i,j+1} + u_{i,j−1}).
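This mean-value relation can be turned directly into an iterative solver: sweep the grid, repeatedly replacing each interior value by the weighted average of its four neighbors (a Jacobi iteration). A sketch on a uniform grid, with a boundary condition of my own choosing for illustration:

```python
import numpy as np

def jacobi_laplace(u, h, k, iters=5000):
    """Jacobi iteration for the discrete Laplace equation: each sweep
    replaces every interior value by the weighted average above."""
    ck = k**2 / (2.0 * (h**2 + k**2))   # weight for u_{i+1,j} and u_{i-1,j}
    ch = h**2 / (2.0 * (h**2 + k**2))   # weight for u_{i,j+1} and u_{i,j-1}
    for _ in range(iters):
        new = u.copy()                  # boundary rows/columns stay fixed
        new[1:-1, 1:-1] = (ck * (u[2:, 1:-1] + u[:-2, 1:-1])
                           + ch * (u[1:-1, 2:] + u[1:-1, :-2]))
        u = new
    return u

# Example: a 21 x 21 grid with boundary value 1 on one side, 0 on the others.
u0 = np.zeros((21, 21))
u0[:, -1] = 1.0
u = jacobi_laplace(u0, h=0.05, k=0.05)
```

By symmetry, the converged value at the center of the square is 1/4 for this boundary data, which makes a convenient sanity check.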