Solution Manual for

Scientific Computing
with Case Studies
Dianne P. O’Leary
©2008

January 13, 2009


Unit I

SOLUTIONS: Preliminaries:
Mathematical Modeling,
Errors, Hardware and Software

Chapter 1

Solutions: Errors and
Arithmetic

CHALLENGE 1.1.
(a) This is true. The integers are equally spaced, with distance equal to 1, and you
can easily generate examples.
(b) This is only approximately true.
• For example, with a 53-bit mantissa, if $r = 1$, then $f(r) = 1 + 2^{-52}$, for a
relative distance of $2^{-52}$.
• Similarly, if $r = 8 = 1000_2$, then $f(r) = 1000_2 + 2^{-52+3}$, for a relative distance
of $2^{-49}/2^3 = 2^{-52}$.
• But if $r = 1.25 = 1.01_2$, then $f(r) = 1.25 + 2^{-52}$, for a relative distance of
$2^{-52}/1.25$.

In general, suppose we have a machine-representable number $r$ with positive
mantissa $z$ and exponent $p$. Then $f(r) = (z + 2^{-52}) \times 2^p$, so the relative distance
is
\[ \frac{(z + 2^{-52}) \times 2^p - z \times 2^p}{z \times 2^p} = \frac{2^{-52}}{z}. \]
Because $1 \le z < 2$, the relative distance is always between $2^{-53}$ and $2^{-52}$, constant
within a factor of 2. A similar argument holds for negative mantissas.
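
The three examples in part (b) can be checked numerically. The following is a minimal sketch in Python (not the MATLAB used elsewhere in this manual), assuming IEEE double precision and Python 3.9 or later for `math.ulp`; the helper name `relative_gap` is ours, introduced only for illustration.

```python
import math

def relative_gap(r):
    """Relative distance from a positive double r to the next larger double f(r)."""
    return math.ulp(r) / r

gap_1 = relative_gap(1.0)       # r = 1
gap_8 = relative_gap(8.0)       # r = 8 = 1000 in binary
gap_1_25 = relative_gap(1.25)   # r = 1.25 = 1.01 in binary

print(gap_1 == 2.0 ** -52)
print(gap_8 == 2.0 ** -52)
print(gap_1_25 == 2.0 ** -52 / 1.25)
```

The relative gap depends only on the mantissa, which is why it stays within a factor of 2 of $2^{-52}$.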

CHALLENGE 1.2.
(a) The machine number just larger than 1.0000 is 1.0001, so machine epsilon is
$10^{-4}$.
(b) The smallest positive normalized mantissa is 1.0000, and the smallest exponent
is −9999, so the number is $1 \times 10^{-9999}$. (Note that this is much smaller than machine
epsilon.)


CHALLENGE 1.3.
(a) The machine number just larger than 1.00000 is $1.00001_2$. Therefore, for this
machine, machine epsilon is $0.00001_2 = 2^{-5}$.
(b) The smallest positive normalized mantissa is $1.00000_2$, and the smallest
exponent is $-1111_2 = -15$, so the smallest positive number is $2^{-15}$.
(c) $1/10 = 1.1001100\ldots_2 \times 2^{-4}$, so the mantissa is +1.10011 and the exponent is
$-4 = -0100_2$.

CHALLENGE 1.4.
(a) The number delta is represented with a mantissa of 1 and an exponent of −53.
When this is added to 1 the first time through the loop, the sum has a mantissa
with 54 digits, but only 53 can be stored, so the low-order 1 is dropped and the
answer is stored as 1. This is repeated $2^{20}$ times, and the final value of x is still 1.
(b) By mathematical reasoning, we have a loop that never terminates. In floating-
point arithmetic, the loop is executed 1024 times, since eventually both x and twox
are equal to the floating-point value Inf.
(c) There are very many possible answers. For example, for the associative case, we
might choose x = 1 and choose y = z so that x + y = 1 but x + 2 y > 1. Then
(x + y) + z < x + (y + z).
(d) Again there are very many possible examples, including 0/0 = NaN and -1/0
= -Inf.
(e) If x is positive, then the next floating-point number bigger than x is produced
by adding 1 to the lowest-order bit in the mantissa of x. This is $\epsilon_m$ (machine epsilon)
times 2 to the exponent of x, or approximately $\epsilon_m$ times x.
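
Parts (a) through (c) can be reproduced in any IEEE double-precision environment. The sketch below uses Python rather than MATLAB; the loop shapes are our assumption about the programs in the challenge statement, which is not reproduced here, and the shortened loop count in (a) is chosen only to keep the run fast.

```python
# (a) Adding 2^-53 to 1 leaves 1 unchanged, no matter how often it is repeated.
delta = 2.0 ** -53
x = 1.0
for _ in range(1000):          # shortened from 2^20; the count does not matter
    x = x + delta
unchanged = (x == 1.0)

# (b) Repeated doubling stops once both x and twox have overflowed to Inf.
x, twox, doublings = 1.0, 2.0, 0
while twox > x:
    x = twox
    twox = 2.0 * x
    doublings += 1

# (c) Addition is not associative: take x = 1 and y = z = 2^-53.
y = z = 2.0 ** -53
left = (1.0 + y) + z           # each partial sum rounds back to 1
right = 1.0 + (y + z)          # y + z = 2^-52 is large enough to register

print(unchanged, doublings, left < right)
```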

CHALLENGE 1.5. The problem is:


A = [2 1; 1.99 1];
b = [1;-1];
x = A \ b;
The problem with different units is:
C = [A(1,:)/100; A(2,:)]
d = [b(1)/100; b(2)]
z = C \d
The difference is x-z = 1.0e-12 * [-0.4263; 0.8527]
The two linear systems have exactly the same solution vector. The reason that the
computed solution changed is that rescaling
• increased the rounding error in the first row of the data.

• changed the pivot order for Gauss elimination, so the computer performed a
different sequence of arithmetic operations.

The quantity cond(C) $\epsilon_{mach}$ times the size of b is an estimate for the size of the
change.

CHALLENGE 1.6.
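(a)
\[ \frac{|\tilde{x} - x|}{|x|} = \frac{|x(1 - r) - x|}{|x|} = |r|. \]
The computation for y is similar.
(b)
\[ \left| \frac{\tilde{x}\tilde{y} - xy}{xy} \right| = \left| \frac{x(1 - r)\,y(1 - s) - xy}{xy} \right| = \left| \frac{xy(rs - r - s)}{xy} \right| \le |r| + |s| + |rs|. \]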

CHALLENGE 1.7. No.

• .1 is not represented exactly, so error occurs in each use of it.

• If we repeatedly add the machine value of .1, the exact machine value for the
answer does not always fit in the same number of bits, so additional error is
made in storing the answer. (Note that this error would occur even if .1 were
represented exactly.)

(This was the issue in the Patriot Missile failure
http://www.ima.umn.edu/~arnold/disasters/patriot.html.)
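
A quick way to see the effect is to add the machine value of .1 ten times and compare with 1. This is a small Python sketch assuming IEEE double precision; it is an illustration, not the program from the challenge statement.

```python
total = 0.0
for _ in range(10):
    total += 0.1              # every use of 0.1 carries its representation error

print(total == 1.0)           # the accumulated sum is not exactly 1
print(abs(total - 1.0))       # but the discrepancy is on the order of machine epsilon
```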

CHALLENGE 1.8. No answer provided.



CHALLENGE 1.9.
(a) Ordinarily, relative error bounds add when we do multiplication, but the domi-
nant error in this computation was the rounding of the answer from 3.2 × 4.5 = 14.4
to 14. (Perhaps we were told that we could store only 2 decimal digits.) Therefore,
one way to express the forward error bound is that the true answer lies between
13.5 and 14.5.
(b) There are many correct answers. For example, we have exactly solved the
problem 3.2 × (14/3.2), or approximately 3.2 × 4.37, so we have changed the second
piece of data by about 0.13.

CHALLENGE 1.10. Notice that $x_c$ solves the linear system
\[ \begin{bmatrix} 2 & 1 \\ 3 & 6 \end{bmatrix} x_c = \begin{bmatrix} 5 \\ 21 \end{bmatrix}, \]
so we have solved a linear system whose right-hand side is perturbed by
\[ r = \begin{bmatrix} 0.244 \\ 0.357 \end{bmatrix}. \]
The norm of r gives a bound on the change in the data, so it is a backward error
bound.
(The true solution is $x_{true} = [1.123, 2.998]^T$, and a forward error bound would be
computed from $x_{true} - x_c$.)
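
The numbers in this solution can be cross-checked with a few lines of arithmetic. In the Python sketch below, the original right-hand side is not quoted from the challenge statement; it is inferred as the product of the matrix with the stated true solution, and the helper names `matvec` and `solve2` are ours.

```python
A = [[2.0, 1.0], [3.0, 6.0]]
x_true = [1.123, 2.998]

def matvec(M, v):
    """Matrix-vector product for small dense lists."""
    return [sum(m * vj for m, vj in zip(row, v)) for row in M]

def solve2(M, rhs):
    """Solve a 2-by-2 system by Cramer's rule (adequate for this tiny example)."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [(rhs[0] * M[1][1] - M[0][1] * rhs[1]) / det,
            (M[0][0] * rhs[1] - rhs[0] * M[1][0]) / det]

x_c = solve2(A, [5.0, 21.0])      # the computed solution solves this nearby system
b = matvec(A, x_true)             # inferred original right-hand side (assumption)
r = [bi - ai for bi, ai in zip(b, matvec(A, x_c))]          # backward error
forward = [t - c for t, c in zip(x_true, x_c)]              # forward error

print(x_c, r, forward)
```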

CHALLENGE 1.11. The estimated volume is $3^3 = 27$ m$^3$.
The relative error in a side is bounded by $z = .005/2.995$.
Therefore, the relative error in the volume is bounded by $3z$ (if we ignore the high-
order terms), so the absolute error is bounded by $27 \cdot 3z \approx 27 \cdot .005 = .135$ m$^3$.

CHALLENGE 1.12. Throwing out the imaginary part or taking ± the absolute
value is dangerous, unless the imaginary part is within the desired error tolerance.
You could check how well the real part of the computed solution satisfies the
equation. It is probably best to try a different method; you have an answer that you
believe is close to the true one, so you might, for example, use your favorite
algorithm to solve $\min_x (f(x))^2$ using the real part of the previously computed solution
as a starting guess.
Chapter 2

Solutions: Sensitivity
Analysis: When a Little
Means a Lot

CHALLENGE 2.1.
(a)
\[ 2x\,dx + b\,dx + x\,db = 0, \]
so
\[ \frac{dx}{db} = -\frac{x}{2x + b}. \]

(b) Differentiating we obtain
\[ \frac{dx}{db} = -\frac{1}{2} \pm \frac{1}{2}\,\frac{b}{\sqrt{b^2 - 4c}}. \]

Substituting the roots $x_{1,2}$ into the expression obtained in (a) gives the same result
as this.
(c) The roots will be most sensitive when the derivative is large, which occurs when
the discriminant $b^2 - 4c$ is almost zero, and the two roots almost coincide. In
contrast, a root will be insensitive when the derivative is close to zero. In this case,
the root itself may be close to zero, so although the absolute change will be small,
the relative change may be large.
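
The agreement between (a) and (b), and the blow-up near a double root described in (c), can be checked numerically. This Python sketch assumes the quadratic $x^2 + bx + c = 0$ implied by the differential in (a); the sample coefficients and the function names are ours.

```python
import math

def roots(b, c):
    """Both roots of x^2 + b x + c = 0, assuming a positive discriminant."""
    d = math.sqrt(b * b - 4.0 * c)
    return (-b + d) / 2.0, (-b - d) / 2.0

def slope_implicit(x, b):
    """dx/db from part (a)."""
    return -x / (2.0 * x + b)

def slopes_explicit(b, c):
    """dx/db from part (b), for the + and - roots respectively."""
    t = 0.5 * b / math.sqrt(b * b - 4.0 * c)
    return -0.5 + t, -0.5 - t

# Well-separated roots (-1 and -2): the two formulas agree.
b, c = 3.0, 2.0
x_plus, x_minus = roots(b, c)
s_plus, s_minus = slopes_explicit(b, c)
print(slope_implicit(x_plus, b), s_plus)
print(slope_implicit(x_minus, b), s_minus)

# Nearly coincident roots: the derivative becomes large.
near_plus, near_minus = slopes_explicit(2.0, 0.999999)
print(near_plus, near_minus)
```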

CHALLENGE 2.2. The solution is given in exlinsys.m, found on the website,
and the results are shown in Figure 2.1. From the top graphs, we see that if we
“wiggle” the coefficients of the first linear system a little bit, then the intersection
of the two lines does not change much; in contrast, since the two equations for the
second linear system almost coincide, small changes in the coefficients can move the
intersection point a great deal.


The middle graphs show that despite this sensitivity, the solutions to the
perturbed systems satisfy the original systems quite well – to within residuals of
$5 \times 10^{-4}$. This means that the backward error in the solution is small; we have solved
a nearby problem Ax = b + r where the norm of r is small. This is characteristic
of Gauss elimination, even on ill-conditioned problems.
The bottom graphs are quite different, though. The changes in the solutions
x for the first system are all of the same order as the residuals, but for the second
system they are nearly 500 times as big as the perturbation. Note that for the well-
conditioned problem, the solutions give a rather circular cloud of points, whereas for
the ill-conditioned problem, there is a direction, corresponding to the right singular
vector for the small singular value, for which large perturbations occur.
The condition number of each matrix captures this behavior; it is about 2.65
for the first matrix and 500 for the second, so we expect that changes in the right-
hand side for the second problem might produce a relative change 500 times as big
in the solution x.

CHALLENGE 2.3. The solution is given in exlinpro.m, and the results are
shown in Figure 2.2. For the first example, the Lagrange multiplier predicts that
the change in $c^T x$ should be about 3 times the size of the perturbation, and that
is confirmed by the Monte Carlo experiments. The Lagrange multipliers for the
other two examples (1.001 and 100 respectively) are also good predictors of the
change in the function value. But note that something odd happens in the second
example. Although the function value is not very sensitive to perturbations, the
solution vector x is quite sensitive; it is sometimes close to [0, 1] and sometimes
close to [1, 0]! The solution to a (nondegenerate) linear programming problem must
occur at a vertex of the feasible set. In our unperturbed problem there are three
vertices: [0, 1], [1, 0], and [0, 0]. Since the gradient of $c^T x$ is almost parallel to the
constraint Ax ≤ b, we sometimes find the solution at the first vertex and sometimes
at the second.
Therefore, in optimization problems, even if the function value is relatively
stable, we may encounter situations in which the solution parameters have very
large changes.

CHALLENGE 2.4. The results are computed by exode.m and shown in Figure
2.3. The Monte Carlo results predict that the growth is likely to be between 1.4
and 1.5. The two black curves, the solution to part (a), give very pessimistic upper
and lower bounds on the growth: 1.35 and 1.57. This is typical of forward error
bounds. Notice that the solution is the product of exponentials,
\[ y(50) = \prod_{\tau=1}^{49} e^{a(\tau)}, \]

[Figure 2.1 (plots omitted). Panels, left and right: "Equations for Linear System 1" and "Equations for Linear System 2" (axes x1, x2); "Residuals for Linear System 1" and "Residuals for Linear System 2" (axes r1, r2); "Changes in x for Linear System 1" and "Changes in x for Linear System 2" (axes δx1, δx2).]

Figure 2.1. Results of Challenge 2.2. The example on the left is typical
of well-conditioned problems, while the example on the right is ill-conditioned, so
the solution is quite sensitive to noise in the data. The graphs at top plot the linear
equations, those in the middle plot residuals to perturbed systems, and those on the
bottom plot the corresponding solution vectors.

[Figure 2.2 (plots omitted). For each of LP Examples 1, 2, and 3: a panel "perturbed solutions" (axes x1, x2) on the left and a panel "perturbed function values" (horizontal axis $c^T x$) on the right.]

Figure 2.2. Results of Challenge 2.3. The graphs on the left plot the
perturbed solutions to the three problems, while those on the right plot the optimal
function values. The optimal function values for the three problems are increasingly
sensitive to small changes in the data. Note the vastly different vertical scales in
the graphs on the left.

[Figure 2.3 (plot omitted). "Solutions to the differential equation", plotted against t from 0 to 50.]

Figure 2.3. Results of Challenge 2.4. The black curves result from setting
a = 0.006 and a = 0.009. The blue curves have random rates chosen for each year.
The red curves are the results of trials with the random rates ordered from largest
to smallest. For the green curves, the rates were ordered from smallest to largest.

where a(τ ) is the growth rate in year τ . Since exponentials commute, the final
population is invariant with respect to the ordering of the rates, but the intermediate
population (and thus the demand for social services and other resources) is quite
different under the two assumptions.
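
The invariance of the final value, and the difference in the intermediate values, can be illustrated with a short experiment. This Python sketch is not exode.m; it assumes yearly rates drawn uniformly between the two black-curve values 0.006 and 0.009 from Figure 2.3, with a fixed seed so the run is repeatable.

```python
import math
import random

rng = random.Random(0)
rates = [rng.uniform(0.006, 0.009) for _ in range(49)]   # hypothetical yearly rates

def trajectory(rate_sequence):
    """Population after each year, starting from 1, as a running product of exponentials."""
    y, path = 1.0, [1.0]
    for a in rate_sequence:
        y *= math.exp(a)
        path.append(y)
    return path

largest_first = trajectory(sorted(rates, reverse=True))
smallest_first = trajectory(sorted(rates))

print(largest_first[-1], smallest_first[-1])   # equal up to rounding
print(largest_first[25], smallest_first[25])   # noticeably different midway
```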

CHALLENGE 2.5. The solution is given in exlinsys.m. The confidence
intervals for the first example are

x1 ∈ [−1.0228, −1.0017], x2 ∈ [1.0018, 1.0022]

and for the second example are

x1 ∈ [0.965, 1.035], x2 ∈ [−1.035, −0.965].

Those for the second example are 20 times larger than for the first, since they are
related to the size of $A^{-1}$, but in both cases about 95% of the samples lie within
the intervals, as expected.
Remember that these intervals should be calculated using a Cholesky decom-
position or the backslash operator. Using inv or raising a matrix to the −1 power
is slower when n is large, and generally is less accurate, as discussed in Chapter 5.
Chapter 3

Solutions: Computer
Memory and Arithmetic:
A Look Under the Hood

CHALLENGE 3.1. See problem1.m on the website.

CHALLENGE 3.2. The counts of the number of blocks moved are summarized
in the following table:

Storage           Array   Dot-product   Total   Saxpy        Total

column-oriented   A       32 x 128              128/8 x 32
                  x       32/8 x 128    4624    32/8 x 1     1028
                  y       128/8 x 1             128/8 x 32

row-oriented      A       32/8 x 128            128 x 32
                  x       32/8 x 128    1040    32/8 x 1     4612
                  y       128/8 x 1             128/8 x 32

Therefore, for good performance, we should use the dot-product formulation
if storage is row-oriented and the saxpy formulation if storage is column-oriented.
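
The four totals follow from the table entries by simple arithmetic. The Python sketch below just evaluates those entries, reading "a x b" as a product and assuming 8 words per block, as the 128/8 and 32/8 entries suggest; it does not simulate a cache.

```python
WORDS_PER_BLOCK = 8   # block size implied by the 128/8 and 32/8 entries

# Blocks moved for A, x, and y, copied from the table entries.
blocks = {
    ("column-oriented", "dot-product"): [32 * 128,
                                         (32 // WORDS_PER_BLOCK) * 128,
                                         128 // WORDS_PER_BLOCK],
    ("column-oriented", "saxpy"):       [(128 // WORDS_PER_BLOCK) * 32,
                                         32 // WORDS_PER_BLOCK,
                                         (128 // WORDS_PER_BLOCK) * 32],
    ("row-oriented", "dot-product"):    [(32 // WORDS_PER_BLOCK) * 128,
                                         (32 // WORDS_PER_BLOCK) * 128,
                                         128 // WORDS_PER_BLOCK],
    ("row-oriented", "saxpy"):          [128 * 32,
                                         32 // WORDS_PER_BLOCK,
                                         (128 // WORDS_PER_BLOCK) * 32],
}

totals = {key: sum(parts) for key, parts in blocks.items()}
for key, total in totals.items():
    print(key, total)
```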

CHALLENGE 3.3. No answer provided.

CHALLENGE 3.4. Consider m = 16, s = 1. We access each of the 16 elements
16 times, and we have 2 cache misses, one for each block of 8 elements. So the total
time is 256 + 2 ∗ 16 ns for the 256 accesses, for an average of 1.125 ns. When s is
increased to 16, we access only z(1), so the total time drops to 256 + 16 ns.
For m = 64, the array no longer fits in cache and each block that we use must
be reloaded for each cycle. For s = 4, we have a cache miss for every other access
to the array, so the average access time is (1 + 16/2) = 9 ns.
The other entries are similar.


CHALLENGE 3.5. The data cannot be fully explained by our simple model,
since, for example, this machine uses prefetching, two levels of cache, and a more
complicated block replacement strategy than the least-recently-used one that we
discussed. Some of the parameters can be extracted using our model, though.
The discontinuity in times as we pass from $m = 2^{14}$ to $m = 2^{15}$ indicates that
the capacity of the cache is $2^{14}$ (single-precision) words ($2^{16}$ bytes).
The discontinuity between stride $s = 2^3$ and $s = 2^4$ says that $\ell = 2^3$ words,
so $b = 2^{11}$ words.
The elements toward the bottom left corner of the table indicate that α ≈ 3
ns.
The block of entries for $m \ge 2^{15}$ and $2^6 \le s \le 2^{10}$ indicates that perhaps
μ ≈ 18 − 3 = 15 ns.
To further understand the results, consult a textbook on computer
organization and the UltraSPARC III Cu User’s Manual at
http://www.sun.com/processors/manuals/USIIIv2.pdf.

CHALLENGE 3.6. On my machine, the time for floating-point arithmetic is
of the order of the time μ for a cache miss penalty. This is why misses noticeably
slow down the execution time for matrix operations.

CHALLENGE 3.7. No answer provided.

CHALLENGE 3.8. No answer provided.


Chapter 4

Solutions: Design of
Computer Programs:
Writing Your Legacy

CHALLENGE 4.1. See posteddoc.m on the website.

CHALLENGE 4.2.
• Data that a function needs should be specified in variables, not constants.
This is fine; C is a variable.
• Code should be modular, so that a user can pull out one piece and substi-
tute another when necessary. The program posted factors a matrix into the
product of two other matrices, and it would be easy to substitute a different
factorization algorithm.
• On the other hand, there is considerable overhead involved in function calls,
so each module should involve a substantial computation in order to mask this
overhead. This is also satisfied; posted performs a significant computation
($O(mn^2)$ operations).
• Input parameters should be tested for validity, and clear error messages should
be generated for invalid input. The factorization can be performed for any
matrix or scalar, so input should be tested to be sure it is not a string, cell
variable, etc.
• “Spaghetti code” should be avoided. In other words, the sequence of instruc-
tions should be top-to-bottom (including loops), without a lot of jumps in
control. This is fine, although there is a lot of nesting of loops.
• The names of variables should be chosen to remind the reader of their purpose.
The letter q is often used for an orthogonal matrix, and r is often used for
an upper triangular one, but it would probably be better practice to use
uppercase names for these matrices.


[Figure 4.1 (plot omitted). "Times for matrices with 200 rows": time (sec), on a logarithmic scale, versus number of columns (50 to 200), with one curve for the original algorithm and one for the modified algorithm.]

Figure 4.1. Time taken by the two algorithms for matrices with 200 rows.

CHALLENGE 4.3.
(a) This program computes a QR decomposition of the matrix C using the modified
Gram-Schmidt algorithm.
(b) See website.
(c) This is corrected in postedfact.m on the website. The columns of q should be
mutually orthogonal, but the number of columns in q should be the minimum of
the row and column dimensions of C. Nonzero columns after that are just the result
of rounding errors.

CHALLENGE 4.4. The resulting program is on the website, and the timing
results are shown in Figure 4.1. The program posted has been modified in postedfact
to use vector operations, use internal functions like norm when possible, and
preallocate storage for Q and R. The postedfact function runs 150-200 times faster
than posted on matrices with 200 rows, using a Sun UltraSPARC-III with clock
speed 750 MHz running MATLAB 6. It is an interesting exercise to determine the
relative importance of the three changes.
You also might think about how an efficient implementation in your favorite
programming language might differ from this one.
Unit II

SOLUTIONS: Dense Matrix
Computations

Chapter 5

Solutions: Matrix
Factorizations

CHALLENGE 5.1.

s = zeros(m,1);
for j=1:n,
s = s + abs(A(:,j));
end

Compare with:

for i=1:m,
s(i) = norm(A(i,:),1);
end

CHALLENGE 5.2.

\[ 3x_2 = 6 \rightarrow x_2 = 2, \]
\[ 2x_1 + 5x_2 = 8 \rightarrow x_1 = \tfrac{1}{2}(8 - 10) = -1. \]

The determinant is 2 ∗ 3 = 6.

CHALLENGE 5.3.

x = b;
detA = 1;
for i=1:n,
x(i) = x(i) / A(i,i);
x(i+1:n) = x(i+1:n) - A(i+1:n,i)*x(i);
detA = detA * A(i,i);
end

CHALLENGE 5.4.
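(a)
\[ \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}
 = \begin{bmatrix} \ell_{11} & 0 & 0 \\ \ell_{21} & \ell_{22} & 0 \\ \ell_{31} & \ell_{32} & \ell_{33} \end{bmatrix}
   \begin{bmatrix} \ell_{11} & \ell_{21} & \ell_{31} \\ 0 & \ell_{22} & \ell_{32} \\ 0 & 0 & \ell_{33} \end{bmatrix} \]
\[ = \begin{bmatrix} \ell_{11}^2 & \ell_{11}\ell_{21} & \ell_{11}\ell_{31} \\
 \ell_{21}\ell_{11} & \ell_{21}^2 + \ell_{22}^2 & \ell_{21}\ell_{31} + \ell_{22}\ell_{32} \\
 \ell_{31}\ell_{11} & \ell_{31}\ell_{21} + \ell_{32}\ell_{22} & \ell_{31}^2 + \ell_{32}^2 + \ell_{33}^2 \end{bmatrix} \]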

(b) The MATLAB function should use the following algorithm.

for i = 1 : n
    $\ell_{ii} = \sqrt{a_{ii} - \sum_{j=1}^{i-1} \ell_{ij}^2}$
    for j = i + 1 : n
        $\ell_{ji} = \left(a_{ji} - \sum_{k=1}^{i-1} \ell_{jk}\ell_{ik}\right) / \ell_{ii}$
    end
end
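
As an illustration of the algorithm in (b), here is a minimal Python sketch (the challenge itself asks for a MATLAB function, which is not reproduced here). The function name `cholesky_lower`, the positive-definiteness check, and the test matrix are our own choices.

```python
import math

def cholesky_lower(a):
    """Return lower-triangular L with A = L L^T, for a symmetric positive definite A."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        diag = a[i][i] - sum(L[i][j] ** 2 for j in range(i))
        if diag <= 0.0:
            raise ValueError("matrix is not positive definite")
        L[i][i] = math.sqrt(diag)
        for j in range(i + 1, n):
            # Entry (j, i) of column i, using the rows of L computed so far.
            L[j][i] = (a[j][i] - sum(L[j][k] * L[i][k] for k in range(i))) / L[i][i]
    return L

A = [[4.0, 2.0, 2.0],
     [2.0, 5.0, 3.0],
     [2.0, 3.0, 6.0]]
L = cholesky_lower(A)
for row in L:
    print(row)
```

Only the lower triangle of A is read, matching the entries $a_{ii}$ and $a_{ji}$ with $j > i$ that the algorithm references.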

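CHALLENGE 5.5. We compute a Givens matrix by setting
\[ c = \frac{3}{\sqrt{9 + 16}}, \quad s = \frac{4}{\sqrt{25}}. \]
Then if
\[ G = \begin{bmatrix} 3/5 & 4/5 \\ -4/5 & 3/5 \end{bmatrix}, \]
then
\[ G \begin{bmatrix} 3 \\ 4 \end{bmatrix} = \begin{bmatrix} 5 \\ 0 \end{bmatrix}, \]
so z = 5, the norm of the original vector.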
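
The same computation can be written as a small routine. This is a Python sketch for real vectors only; the function name `givens` and the convention for a zero vector are our assumptions.

```python
import math

def givens(a, b):
    """Return (c, s) so that [[c, s], [-s, c]] maps (a, b) to (norm, 0)."""
    norm = math.hypot(a, b)
    if norm == 0.0:
        return 1.0, 0.0        # nothing to rotate; use the identity
    return a / norm, b / norm

c, s = givens(3.0, 4.0)
rotated = (c * 3.0 + s * 4.0, -s * 3.0 + c * 4.0)
print(c, s, rotated)
```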

CHALLENGE 5.6.

for i=1:3,
W = planerot(A(i:i+1,i));
% Note that the next instruction just operates on the part
% of A that changes. It is wasteful to do multiplications
% on the rest.
A(i:i+1,i:n) = W * A(i:i+1,i:n);
end

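CHALLENGE 5.7. For G to be unitary, we need
\[ I = G^* G = \begin{bmatrix} c & -s \\ \bar{s} & c \end{bmatrix} \begin{bmatrix} c & s \\ -\bar{s} & c \end{bmatrix}
 = \begin{bmatrix} |c|^2 + |s|^2 & cs - sc \\ \bar{s}c - c\bar{s} & |c|^2 + |s|^2 \end{bmatrix}. \]

Therefore, it is sufficient that $|c|^2 + |s|^2 = 1$.

Now
\[ Gz = \begin{bmatrix} c z_1 + s z_2 \\ -\bar{s} z_1 + c z_2 \end{bmatrix}
 = \begin{bmatrix} \frac{|z_1|}{\|z\|} z_1 + s z_2 \\ -\bar{s} z_1 + \frac{|z_1|}{\|z\|} z_2 \end{bmatrix}. \]
Therefore, if $z_1 \ne 0$, we make the second entry zero by setting
\[ \bar{s} = \frac{|z_1|}{\|z\|} \frac{z_2}{z_1}. \]

If $z_1 = 0$, we can take $s = \bar{z}_2 / |\bar{z}_2|$. (The only restriction on s in this case is that its
absolute value equals 1.)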

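CHALLENGE 5.8. Using Givens, $c = s = 3/\sqrt{18} = 1/\sqrt{2}$, so
\[ G = Q^T = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 1 \\ -1 & 1 \end{bmatrix}, \]
\[ R = Q^T A = \frac{1}{\sqrt{2}} \begin{bmatrix} 6 & 4 \\ 0 & -2 \end{bmatrix}. \]

Alternatively, using Gram-Schmidt orthogonalization,
\[ r_{11} = \sqrt{3^2 + 3^2} = 3\sqrt{2}, \]
\[ q_1 = \frac{1}{3\sqrt{2}} \begin{bmatrix} 3 \\ 3 \end{bmatrix}. \]
Then
\[ r_{12} = q_1^T \begin{bmatrix} 3 \\ 1 \end{bmatrix} = 4/\sqrt{2}, \]
\[ \hat{q}_2 = \begin{bmatrix} 3 \\ 1 \end{bmatrix} - (4/\sqrt{2})\, q_1, \]
and $r_{22}$ = the norm of this vector = $\sqrt{2}$, so $q_2 = \hat{q}_2/\sqrt{2}$. If we complete the
arithmetic, we get the same QR as above, up to choice of signs for columns of Q
and rows of R.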
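
The Gram-Schmidt arithmetic can be confirmed with a short program. The Python sketch below uses the modified Gram-Schmidt ordering on the columns [3, 3] and [3, 1] that appear in the computation above; the function names are ours.

```python
import math

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def gram_schmidt(columns):
    """Modified Gram-Schmidt: returns orthonormal columns q and upper-triangular r."""
    n = len(columns)
    q = [list(col) for col in columns]
    r = [[0.0] * n for _ in range(n)]
    for k in range(n):
        r[k][k] = math.sqrt(dot(q[k], q[k]))
        q[k] = [x / r[k][k] for x in q[k]]
        for j in range(k + 1, n):
            r[k][j] = dot(q[k], q[j])
            q[j] = [x - r[k][j] * y for x, y in zip(q[j], q[k])]
    return q, r

q, r = gram_schmidt([[3.0, 3.0], [3.0, 1.0]])
print(r[0][0], r[0][1], r[1][1])
print(q[0], q[1])
```

The signs in the second column of Q and second row of R differ from the Givens result, as the solution notes.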

CHALLENGE 5.9. Note that after we finish the iteration i = 1, we have
$q_{k+1}^{new} = q_{k+1}^{old} - r_{1,k+1} q_1$, so
\[ q_1^* q_{k+1}^{new} = q_1^* q_{k+1}^{old} - r_{1,k+1} q_1^* q_1 = 0 \]
by the definition of $r_{1,k+1}$ and the fact that $q_1^* q_1 = 1$.
Assume that after we finish iteration i = j − 1, for a given value of k, we have
$q_\ell^* q_{k+1} = 0$ for $\ell \le j - 1$ and $q_j^* q_\ell = 0$ for $j < \ell \le k$. After we finish iteration
i = j for that value of k, we have $q_j^* q_{k+1}^{new} = 0$ by the same argument we used above,
and we also have that $q_\ell^* q_{k+1}^{new} = 0$, for $\ell \le j - 1$, since all we have done to $q_{k+1}$ is
to add a multiple of $q_j$ to it, and $q_j$ is orthogonal to $q_\ell$. Thus, after iteration j,
$q_\ell^* q_{k+1}^{new} = 0$ for $\ell \le j$, and the induction is complete when j = k and k = n − 1.

CHALLENGE 5.10.
(a) We verify that Q is unitary by showing that its conjugate transpose is its inverse:

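\[ Q^* Q = (I - 2uu^*)(I - 2uu^*) = I - 4uu^* + 4uu^*uu^* = I, \]

since $u^* u = 1$. For the second part, we compute
\[ \begin{aligned}
v^* z &= (z^* - \alpha^* e_1^T) z \\
 &= z^* z - \alpha^* z_1 \\
 &= z^* z - e^{-i\theta}\|z\| e^{i\theta}\zeta \\
 &= z^* z - \|z\|\zeta,
\end{aligned} \]
and
\[ \begin{aligned}
\|v\|^2 &= (z^* - \alpha^* e_1^T)(z - \alpha e_1) \\
 &= z^* z - \alpha^* z_1 - \alpha z_1^* + \alpha^*\alpha \\
 &= z^* z - e^{-i\theta}\|z\| e^{i\theta}\zeta - e^{i\theta}\|z\| e^{-i\theta}\zeta + \|z\|^2 \\
 &= 2z^* z - 2\|z\|\zeta.
\end{aligned} \]
Then
\[ \begin{aligned}
Qz &= (I - 2uu^*) z \\
 &= \left(I - \frac{2}{\|v\|^2} vv^*\right) z \\
 &= z - \frac{2v^* z}{\|v\|^2} v \\
 &= z - v \\
 &= \alpha e_1.
\end{aligned} \]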

(b) Let the second column of $A_1$ be $[a, v_1, \ldots, v_{n-1}]^T$. Use the vector v to form the
Householder transformation. Then the product $QA_1$ leaves the first column of $A_1$
unchanged and puts zeros below the main diagonal in the second column.
(c) Assume m > n. (The other case is similar and left to the reader.)
Initially, let R = A and Q = I (dimension m).
for j = 1 : n,
(1) Let $z = [r_{jj}, \ldots, r_{mj}]^T$.
(2) Let the polar coordinate representation of $z_1$ be $e^{i\theta}\zeta$.
(3) Define $v = z - \alpha e_1$ where $\alpha = -e^{i\theta}\|z\|$.
(4) Let $u = v/\|v\|$, and define the Householder transformation by $\hat{Q} = I - 2uu^*$.
(5) Apply the transformation to R by setting $R(j:m, j:n) = R(j:m, j:n) - 2u(u^* R(j:m, j:n))$.
(6) Update the matrix Q by $Q(j:m, j:m) = Q(j:m, j:m) - 2(Q(j:m, j:m)u)u^*$.
end
Note that we used the associative law in the updates to Q and R to avoid
ever forming $\hat{Q}$ and to reduce the arithmetic cost.

(d) Let k = m − j + 1. Then the cost per step is:

(1) No multiplications.
(2) O(1) multiplications.
(3) k multiplications.
(4) k divisions and k multiplications to form u.
(5) 2k(n − j + 1) multiplications.
(6) $3k^2$ multiplications.

We need to sum this from j = 1 to n, but we can neglect all but the highest order
terms ($mn^2$ and $n^3$), so only the cost of steps (5) and (6) are significant. For (5)
we get
\[ \sum_{j=1}^{n} 2(m - j + 1)(n - j + 1) \approx mn^2 - \frac{1}{3} n^3, \]
since $\sum_{j=1}^{n} j \approx n^2/2$ and $\sum_{j=1}^{n} j^2 \approx n^3/3$. When m = n, this reduces to
$2n^3/3 + O(n^2)$ multiplications. Determining the cost of (6) is left to the reader.


For completeness, we include the operations counts for the other algorithms:
Householder (R only): for columns i = 1 : n, each entry of the submatrix of A is
used once, and then we compute an outer product of the same size.
\[ \sum_{i=1}^{n} 2(m - i)(n - i) \approx mn^2 - n^3/3. \]

Givens (R only): for columns j = 1 : n, for rows i = j + 1 : m, we count the cost
of multiplying a matrix with 2 rows and (n − j) columns by a Givens matrix.
\[ \sum_{j=1}^{n} \sum_{i=j+1}^{m} 4(n - j) \approx 2mn^2 - 2n^3/3. \]

Gram-Schmidt: At each step k = 1 : n − 1 there is one inner product and one
axpy of vectors of length m.
\[ \sum_{k=1}^{n-1} \sum_{i=1}^{k} 2m \approx mn^2. \]
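
The leading-order approximations can be compared with the exact sums for a particular size. In this Python sketch the dimensions m = 2000 and n = 1000 are arbitrary choices of ours, and the inner sums over i are replaced by their term counts.

```python
m, n = 2000, 1000

# Householder (R only): sum over columns of 2(m - i)(n - i).
householder = sum(2 * (m - i) * (n - i) for i in range(1, n + 1))

# Givens (R only): the inner sum over i = j+1 : m has (m - j) equal terms 4(n - j).
givens = sum((m - j) * 4 * (n - j) for j in range(1, n + 1))

# Gram-Schmidt: the inner sum over i = 1 : k has k equal terms 2m.
gram_schmidt = sum(k * 2 * m for k in range(1, n))

ratio_householder = householder / (m * n**2 - n**3 / 3)
ratio_givens = givens / (2 * m * n**2 - 2 * n**3 / 3)
ratio_gram_schmidt = gram_schmidt / (m * n**2)

print(ratio_householder, ratio_givens, ratio_gram_schmidt)
```

Each ratio approaches 1 as n grows, since the neglected terms are of lower order.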

CHALLENGE 5.11.
• Suppose $\|z\| = \|b - A\tilde{x}\| \le \|b - Ax\|$ for all values of x. Then by multiplying
this inequality by itself we see that $\|z\|^2 = \|b - A\tilde{x}\|^2 \le \|b - Ax\|^2$, so $\tilde{x}$ is also
a minimizer of the square of the norm.
• Since $QQ^* = I$, we see that $\|Q^* y\|_2^2 = (Q^* y)^*(Q^* y) = y^* QQ^* y = y^* y =
\|y\|_2^2$. Since norms are nonnegative quantities, take the square root and
conclude that $\|Q^* y\|_2 = \|y\|_2$.
• Suppose $y_1$ contains the first p components of the m-vector y. Then
\[ \|y\|_2^2 = \sum_{j=1}^{m} |y_j|^2 = \sum_{j=1}^{p} |y_j|^2 + \sum_{j=p+1}^{m} |y_j|^2 = \|y_1\|_2^2 + \|y_2\|_2^2. \]



CHALLENGE 5.12. Define
\[ c = Q^* b = \begin{bmatrix} c_1 \\ c_2 \end{bmatrix}, \quad R = \begin{bmatrix} R_1 \\ 0 \end{bmatrix}, \]
where $c_1$ is n × 1, $c_2$ is (m − n) × 1, $R_1$ is n × n, and 0 is (m − n) × n. Then
\[ \begin{aligned}
\|b - Ax\|^2 &= \|Q^*(b - Ax)\|^2 \\
 &= \|c - Rx\|^2 \\
 &= \|c_1 - R_1 x\|^2 + \|c_2 - 0x\|^2 \\
 &= \|c_1 - R_1 x\|^2 + \|c_2\|^2.
\end{aligned} \]

To minimize this quantity, we make the first term zero by taking x to be the solution
to the n × n linear system $R_1 x = c_1$, so we see that the minimum value of $\|b - Ax\|$
is $\|c_2\|$. Note that this derivation is based on the three fundamental facts proved
in the previous challenge.

CHALLENGE 5.13. The two zeros of the function
y = norm([.5 .4 a; a .3 .4; .3 .3 .3]) - 1 define the endpoints of the interval.
Plotting tells us that one is between 0 and 1 and the other is between 0 and -1.
MATLAB’s fzero can be used to find both roots. The roots are −0.5398 and 0.2389.

CHALLENGE 5.14. Suppose that $\{u_1, \ldots, u_k\}$ form a basis for S. Then any
vector in S can be expressed as $\alpha_1 u_1 + \ldots + \alpha_k u_k$. Since $Au_i = \lambda_i u_i$, we see that
$A(\alpha_1 u_1 + \ldots + \alpha_k u_k) = \lambda_1\alpha_1 u_1 + \ldots + \lambda_k\alpha_k u_k$ is also in S, since it is a linear
combination of the basis vectors. Therefore, if a subset of the eigenvectors of A
form a basis for S, then S is an invariant subspace.
Now suppose that S is an invariant subspace for A, so for any x ∈ S, the
vector Ax is also in S. Suppose the dimension of S is k, and that some vector
x ∈ S has components of eigenvectors corresponding to more than k eigenvalues:
\[ x = \sum_{j=1}^{r} \alpha_j u_j, \]
where r > k and $\alpha_j \ne 0$ for j = 1, . . . , r. Consider the vectors $x, Ax, \ldots, A^{r-1}x$,
all of which are in S, since each is formed from taking A times the previous one.
Then
\[ \begin{bmatrix} x & Ax & A^2x & \ldots & A^{r-1}x \end{bmatrix} = \begin{bmatrix} u_1 & u_2 & u_3 & \ldots & u_r \end{bmatrix} DW, \]
where D is a diagonal matrix containing the values $\alpha_1, \ldots, \alpha_r$, and
\[ W = \begin{bmatrix}
1 & \lambda_1 & \lambda_1^2 & \ldots & \lambda_1^{r-1} \\
1 & \lambda_2 & \lambda_2^2 & \ldots & \lambda_2^{r-1} \\
\vdots & \vdots & \vdots & \ldots & \vdots \\
1 & \lambda_r & \lambda_r^2 & \ldots & \lambda_r^{r-1}
\end{bmatrix}. \]

Now W is a Vandermonde matrix and has full rank r, and so does the matrix formed
by the u vectors. Therefore the vectors $x, Ax, \ldots, A^{r-1}x$ must form a matrix of
rank r and therefore are linearly independent, which contradicts the statement that
S has dimension k < r. Therefore, every vector in S must have components of at
most k different eigenvectors, and we can take them as a basis.

CHALLENGE 5.15.
(a) Subtracting the relations
\[ x^{(k+1)} = Ax^{(k)} + b \]
and
\[ x_{true} = Ax_{true} + b, \]
we obtain
\[ e^{(k+1)} = x^{(k+1)} - x_{true} = A(x^{(k)} - x_{true}) = Ae^{(k)}. \]

(b) If k = 1, then the result holds by part (a). As an induction hypothesis, suppose
\[ e^{(k)} = A^k e^{(0)}. \]

Then $e^{(k+1)} = Ae^{(k)}$ by part (a), and substituting the induction hypothesis yields
$e^{(k+1)} = AA^k e^{(0)} = A^{k+1} e^{(0)}$. Therefore, the result holds for all k = 1, 2, . . ., by
mathematical induction.
(c) Following the hint, we express
\[ e^{(0)} = \sum_{j=1}^{n} \alpha_j u_j, \]
so, by part (b),
\[ e^{(k)} = \sum_{j=1}^{n} \alpha_j \lambda_j^k u_j. \]

Now, if all eigenvalues $\lambda_j$ lie within the unit circle, then $\lambda_j^k \to 0$ as $k \to \infty$, so
$e^{(k)} \to 0$. On the other hand, if some eigenvalue $\lambda_\ell$ is outside the unit circle, then
by choosing $x^{(0)}$ so that $e^{(0)} = u_\ell$, we see that $e^{(k)} = \lambda_\ell^k u_\ell$ does not converge to
zero, since its norm $|\lambda_\ell^k| \, \|u_\ell\| \to \infty$.
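
The dichotomy in part (c) is easy to observe on a 2-by-2 example. In this Python sketch the matrices, the true solution, and the starting guess are hypothetical choices of ours; both matrices are upper triangular, so their eigenvalues can be read off the diagonal.

```python
def matvec(M, v):
    return [sum(m * vj for m, vj in zip(row, v)) for row in M]

x_true = [1.0, 2.0]

def final_error(A, steps=50):
    """Run x <- A x + b from x = 0 and return the max-norm error after `steps` iterations."""
    # Choose b so that x_true is the fixed point: x_true = A x_true + b.
    b = [t - a for t, a in zip(x_true, matvec(A, x_true))]
    x = [0.0, 0.0]
    for _ in range(steps):
        x = [ax + bi for ax, bi in zip(matvec(A, x), b)]
    return max(abs(xi - ti) for xi, ti in zip(x, x_true))

err_convergent = final_error([[0.5, 0.1], [0.0, 0.3]])   # eigenvalues 0.5 and 0.3
err_divergent = final_error([[1.5, 0.1], [0.0, 0.3]])    # eigenvalue 1.5 lies outside the unit circle

print(err_convergent, err_divergent)
```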

CHALLENGE 5.16.

Form $y = U^* b$: $n^2$ multiplications
Form $z = \Sigma^{-1} y$: n multiplications ($z_i = y_i/\sigma_i$)
Form $x = Vz$: $n^2$ multiplications
Total: $2n^2 + n$ multiplications

CHALLENGE 5.17.
(a) The columns of U corresponding to nonzero singular values form such a basis,
since for any vector y,

\[ Ay = U\Sigma V^* y = \sum_{j=1}^{n} u_j (\Sigma V^* y)_j = \sum_{\sigma_j > 0} u_j (\Sigma V^* y)_j, \]

so any vector Ay can be expressed as a linear combination of these columns of U.
Conversely, any linear combination of these columns of U is in the range of A, so
they form a basis for exactly the range.
(b) Similar reasoning shows that the remaining columns of U form this basis.

CHALLENGE 5.18. We’ll work through (a) using the sum-of-rank-one-matrices
formulation, and (b) using the matrix formulation.
(a) The equation has a solution only if b is in the range of A, so b must be a linear
combination of the vectors $u_1, \ldots, u_p$. For definiteness, let $\beta_j = u_j^* b$, so that
\[ b = \sum_{j=1}^{p} \beta_j u_j. \]

Let $\beta_j$, j = p + 1, . . . , n be arbitrary. Then if we let
\[ x = \sum_{j=1}^{p} \frac{\beta_j}{\sigma_j} v_j + \sum_{j=p+1}^{n} \beta_j v_j, \]
we can verify that Ax = b.



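(b) Substituting the SVD our equation becomes
\[ A^* x = V \begin{bmatrix} \Sigma_1 & 0 \end{bmatrix} U^* x = b, \]
where $\Sigma_1$ is n × n with the singular values on the main diagonal. Letting $y = U^* x$,
we see that a particular solution is
\[ y_{good} = \begin{bmatrix} \Sigma_1^{-1} V^* b \\ 0 \end{bmatrix}, \]
so
\[ x_{good} = U \begin{bmatrix} \Sigma_1^{-1} V^* b \\ 0 \end{bmatrix} = \sum_{j=1}^{n} \frac{v_j^* b}{\sigma_j} u_j. \]

Every solution can be expressed as $x_{good} + U_2^* v$ for some vector v since $A^* U_2^* = 0$.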

CHALLENGE 5.19.
(a) Since we want to minimize
\[ (c_1 - \sigma_1 w_1)^2 + \ldots + (c_n - \sigma_n w_n)^2 + c_{n+1}^2 + \ldots + c_m^2, \]
we set $w_i = c_i/\sigma_i = u_i^* b_{true}/\sigma_i$.
(b) If we express $x - x_{true}$ as a linear combination of the vectors $v_1, \ldots, v_n$, then
multiplying by the matrix A stretches each component by the corresponding
singular value. Since $\sigma_n$ is the smallest singular value, $\|A(x - x_{true})\|$ is bounded below
by $\sigma_n \|x - x_{true}\|$. Therefore
\[ \sigma_n \|x - x_{true}\| \le \|b - b_{true} + r\|, \]
and the statement follows.
For any matrix C and any vector z, $\|Cz\| \le \|C\| \|z\|$. Therefore, $\|b_{true}\| =
\|Ax_{true}\| \le \|A\| \|x_{true}\|$.
(c) Using the given fact and the second statement, we see that
\[ \|x_{true}\| \ge \frac{1}{\sigma_1} \|b_{true}\|. \]

Dividing the first statement by this one gives
\[ \frac{\|x - x_{true}\|}{\|x_{true}\|} \le \frac{\sigma_1}{\sigma_n} \frac{\|b - b_{true} + r\|}{\|b_{true}\|} = \kappa(A) \frac{\|b - b_{true} + r\|}{\|b_{true}\|}. \]

CHALLENGE 5.20.
(a) Since $U\Sigma V^* x = b$, we have

$$x = V\Sigma^{-1} U^* b.$$

If we let $c = U^* b$, then $\alpha_1 = c_1/\sigma_1$ and $\alpha_2 = c_2/\sigma_2$.
(b) Here is one way to look at it. This system is very ill-conditioned. The condition
number is the ratio of the largest singular value to the smallest, so this must be
large. In other words, $\sigma_2$ is quite small compared to $\sigma_1$.
For the perturbed problems,

$$Ax^{(i)} = b - E^{(i)} x^{(i)},$$

so it is as if we solve the linear system with a slightly perturbed right-hand side.
So, letting $f^{(i)} = -U^* E^{(i)} x^{(i)}$, the computed solution is

$$x^{(i)} = \alpha_1^{(i)} v_1 + \alpha_2^{(i)} v_2,$$

with

$$\alpha_1^{(i)} = (c_1 + f_1^{(i)})/\sigma_1, \qquad \alpha_2^{(i)} = (c_2 + f_2^{(i)})/\sigma_2.$$

From the figure, we know that $f^{(i)}$ must be small, so $\alpha_1^{(i)} \approx \alpha_1$. But because $\sigma_2$ is
close to zero, $\alpha_2^{(i)}$ can be quite different from $\alpha_2$, so the solutions lie almost on a
straight line in the direction $v_2$.

CHALLENGE 5.21. Some possibilities:

• Any right eigenvector $u$ of $A$ corresponding to a zero eigenvalue satisfies $Au =
0u = 0$. With rounding, the computed eigenvalue is not exactly zero, so we
can choose the eigenvector of $A$ corresponding to the smallest magnitude
eigenvalue.

• Similarly, if $v$ is a right singular vector of $A$ corresponding to a zero singular
value, then $Av = 0$, so choose a singular vector corresponding to the smallest
singular value.

• Let $e_n$ be the $n$th column of the identity matrix. If we perform a rank-revealing
QR decomposition of $A^*$, so that $A^* P = QR$, and let $q_n$ be the
last column of $Q$, then $q_n^* A^* P = q_n^* QR = e_n^T R = r_{nn} e_n^T = 0$. Multiplying
through by $P^{-1}$ we see that $Aq_n = 0$, so choose $z = q_n$.

CHALLENGE 5.22. (Partial Solution)


(a) Find the null space of a matrix: QR (fast; relatively stable) or SVD (slower but
more reliable)
(b) Solve a least squares problem: QR when the matrix is well conditioned. Don’t
try QR if the matrix is not well-conditioned; use the SVD method.
(c) Determine the rank of a matrix: RR-QR (fast, relatively stable); SVD (slower
but more reliable).
(d) Find the determinant of a matrix: LU with pivoting.
(e) Determine whether a symmetric matrix is positive definite: Cholesky or
eigendecomposition (slower but more reliable). The $LL^T$ version of Cholesky breaks down
if the matrix has a negative eigenvalue by taking the square root of a negative
number, so it is a good diagnostic. If the matrix is singular (positive semi-definite),
then we get a 0 on the main diagonal, but with rounding error, this is impossible
to detect.
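
The Cholesky diagnostic in (e) can be sketched as follows. This is a minimal Python illustration under the assumption of a small dense symmetric matrix stored as nested lists; the function name `is_positive_definite` is hypothetical. It reports failure as soon as a pivot is not strictly positive, which is the point where the $LL^T$ algorithm would need the square root of a nonpositive number.

```python
import math

def is_positive_definite(A):
    """Attempt an LL^T Cholesky factorization of the symmetric matrix A.

    Returns True if every pivot is strictly positive, False otherwise.
    """
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # Pivot: A[j][j] minus the squares already accumulated in row j of L
        d = A[j][j] - sum(L[j][k] ** 2 for k in range(j))
        if d <= 0.0:
            return False  # sqrt of a nonpositive number: not positive definite
        L[j][j] = math.sqrt(d)
        for i in range(j + 1, n):
            L[i][j] = (A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))) / L[j][j]
    return True

pd_example = is_positive_definite([[2.0, -1.0], [-1.0, 8.0]])
indefinite_example = is_positive_definite([[1.0, 2.0], [2.0, 1.0]])
```

As the text warns, a singular matrix gives a pivot that is zero only in exact arithmetic; with rounding the computed pivot may be a tiny positive or negative number, so this test cannot reliably separate semi-definite from definite matrices.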
Chapter 6

Solutions: Case Study:


Image Deblurring: I Can
See Clearly Now

(coauthored by James G. Nagy)

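CHALLENGE 6.1. Observe that if $y$ is a $p \times 1$ vector and $z$ is a $q \times 1$ vector
then

$$\left\| \begin{bmatrix} y \\ z \end{bmatrix} \right\|_2^2 = \sum_{i=1}^{p} y_i^2 + \sum_{i=1}^{q} z_i^2 = \|y\|_2^2 + \|z\|_2^2.$$

Therefore,

$$\left\| \begin{bmatrix} g \\ 0 \end{bmatrix} - \begin{bmatrix} K \\ \alpha I \end{bmatrix} f \right\|_2^2 = \left\| \begin{bmatrix} g - Kf \\ -\alpha f \end{bmatrix} \right\|_2^2 = \|g - Kf\|_2^2 + \|\alpha f\|_2^2 = \|g - Kf\|_2^2 + \alpha^2 \|f\|_2^2.$$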

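CHALLENGE 6.2. First note that

$$Q \equiv \begin{bmatrix} U^T & 0 \\ 0 & V^T \end{bmatrix}$$

is an orthogonal matrix since $Q^T Q = I$, and recall that the 2-norm of a vector is
invariant under multiplication by an orthogonal matrix: $\|Qz\|_2 = \|z\|_2$. Therefore,
with $\hat{g} = U^T g$ and $\hat{f} = V^T f$,

$$\left\| \begin{bmatrix} g \\ 0 \end{bmatrix} - \begin{bmatrix} K \\ \alpha I \end{bmatrix} f \right\|_2^2
= \left\| \begin{bmatrix} g - Kf \\ -\alpha f \end{bmatrix} \right\|_2^2
= \left\| Q \begin{bmatrix} g - U\Sigma V^T f \\ -\alpha f \end{bmatrix} \right\|_2^2$$

$$= \left\| \begin{bmatrix} U^T g - \Sigma V^T f \\ -\alpha V^T f \end{bmatrix} \right\|_2^2
= \left\| \begin{bmatrix} \hat{g} - \Sigma \hat{f} \\ -\alpha \hat{f} \end{bmatrix} \right\|_2^2
= \left\| \begin{bmatrix} \hat{g} \\ 0 \end{bmatrix} - \begin{bmatrix} \Sigma \\ \alpha I \end{bmatrix} \hat{f} \right\|_2^2.$$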

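CHALLENGE 6.3. Let’s write the answer for a slightly more general case: $K$
of dimension $m \times n$ with $m \ge n$.

$$\left\| \begin{bmatrix} \hat{g} \\ 0 \end{bmatrix} - \begin{bmatrix} \Sigma \\ \alpha I \end{bmatrix} \hat{f} \right\|_2^2
= \|\hat{g} - \Sigma \hat{f}\|_2^2 + \alpha^2 \|\hat{f}\|_2^2
= \sum_{i=1}^{n} (\hat{g}_i - \sigma_i \hat{f}_i)^2 + \sum_{i=n+1}^{m} \hat{g}_i^2 + \alpha^2 \sum_{i=1}^{n} \hat{f}_i^2.$$

Setting the derivative with respect to $\hat{f}_i$ to zero we obtain

$$-2\sigma_i (\hat{g}_i - \sigma_i \hat{f}_i) + 2\alpha^2 \hat{f}_i = 0,$$

so

$$\hat{f}_i = \frac{\sigma_i \hat{g}_i}{\sigma_i^2 + \alpha^2}.$$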
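
The closed-form Tikhonov coefficients above are easy to check numerically. The following Python sketch uses made-up values for the singular values, for the transformed data, and for α (all hypothetical), and verifies that the derivative of the objective vanishes at the computed coefficients. Setting α = 0 recovers the unregularized coefficients, which blow up the component belonging to the tiny singular value.

```python
def tikhonov_coefficients(sigma, g_hat, alpha):
    """Return f_hat_i = sigma_i * g_hat_i / (sigma_i^2 + alpha^2)."""
    return [s * g / (s * s + alpha * alpha) for s, g in zip(sigma, g_hat)]

def objective_gradient(sigma, g_hat, f_hat, alpha):
    """Derivative of sum (g_i - s_i f_i)^2 + alpha^2 sum f_i^2 w.r.t. each f_i."""
    return [-2.0 * s * (g - s * f) + 2.0 * alpha ** 2 * f
            for s, g, f in zip(sigma, g_hat, f_hat)]

# Hypothetical data: one tiny singular value, as in an ill-posed problem
sigma = [3.0, 1.0, 0.01]
g_hat = [6.0, 1.0, 0.02]
alpha = 0.1

f_hat = tikhonov_coefficients(sigma, g_hat, alpha)
f_naive = tikhonov_coefficients(sigma, g_hat, 0.0)   # alpha = 0: g_hat_i / sigma_i
gradient = objective_gradient(sigma, g_hat, f_hat, alpha)
```

The regularized coefficient for the smallest singular value is much smaller in magnitude than the unregularized one, which is how the filter damps amplified noise.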

CHALLENGE 6.4. From Challenges 2 and 3 above, with $\alpha = 0$, the solution is

$$\hat{f}_i = \frac{\sigma_i \hat{g}_i}{\sigma_i^2} = \frac{\hat{g}_i}{\sigma_i}.$$

Note that $\hat{g}_i = u_i^T g$. Now, since $f = V\hat{f}$, we have

$$f = v_1 \hat{f}_1 + \dots + v_n \hat{f}_n$$

and the formula follows.

CHALLENGE 6.5. See the posted program. Some comments:



• One common bug in the TSVD section: zeroing out pieces of SA and SB . This
does not zero the smallest singular values, and although it is very efficient in
time, it gives worse results than doing it correctly.

• The data was generated by taking an original image F (posted on the website),
multiplying by K, and adding random noise using the MATLAB statement
G = B * F * A' + .001 * rand(256,256). Note that the noise prevents us
from completely recovering the initial image.

• In this case, the best way to choose the regularization parameter is by eye:
choose a detailed part of the image and watch its resolution for various choices
of the regularization parameter, probably using bisection to find the best pa-
rameter. Figure 6.1 shows data and two reconstructions. Although both
algorithms yield images in which the text can be read, noise in the data ap-
pears in the background of the reconstructions. Some nonlinear reconstruction
algorithms reduce this noise.

• In many applications, we need to choose the regularization parameter by
automatic methods rather than by eye. If the noise-level is known, then the
discrepancy principle is the best: choose the parameter to make the residual
$\|Kf - g\|$ close in norm to the expected norm of the noise. If the noise-level
is not known, then generalized cross validation and the L-curve are popular
methods. See [1,2] for discussion of such methods.
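
The discrepancy principle mentioned in the last comment can be sketched in SVD coordinates. This Python example is only an illustration under stated assumptions: $K$ is square with positive singular values, the Tikhonov residual component is $\alpha^2 \hat{g}_i/(\sigma_i^2 + \alpha^2)$ (which follows from the formula of Challenge 6.3), and the data values, the noise norm, and the names `residual_norm` and `discrepancy_alpha` are hypothetical. Because the residual norm increases monotonically with α, bisection finds the parameter that matches a given noise norm.

```python
import math

def residual_norm(sigma, g_hat, alpha):
    """2-norm of the Tikhonov residual, computed component by component."""
    return math.sqrt(sum((alpha ** 2 * g / (s * s + alpha ** 2)) ** 2
                         for s, g in zip(sigma, g_hat)))

def discrepancy_alpha(sigma, g_hat, noise_norm, lo=0.0, hi=1e3, iters=100):
    """Bisect on alpha until the residual norm matches noise_norm.

    Assumes the residual norm is below noise_norm at lo and above it at hi.
    """
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if residual_norm(sigma, g_hat, mid) < noise_norm:
            lo = mid   # residual too small: regularize more
        else:
            hi = mid   # residual too large: regularize less
    return 0.5 * (lo + hi)

# Hypothetical data and noise level
sigma = [3.0, 1.0, 0.01]
g_hat = [6.0, 1.0, 0.02]
noise_norm = 0.05
alpha = discrepancy_alpha(sigma, g_hat, noise_norm)
matched = residual_norm(sigma, g_hat, alpha)
```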

[1] Per Christian Hansen, James G. Nagy, and Dianne P. O’Leary. Deblurring
Images: Matrices, Spectra, and Filtering. SIAM Press, Philadelphia, 2006.
[2] Bert W. Rust and Dianne P. O’Leary, “Residual Periodograms for Choosing
Regularization Parameters for Ill-Posed Problems”, Inverse Problems, 24
(2008) 034005.
Figure 6.1. The original image, blurred image, and results of the two algorithms.
(Panels: Original Image; Blurred Image; Tikhonov with λ = 0.0015; TSVD with p = 2500.)
Chapter 7

Solutions: Case Study:


Updating and Downdating
Matrix Factorizations: A
Change in Plans

CHALLENGE 7.1.
(a) Set the columns of Z to be the differences between the old columns and the new
columns, and set the columns of V to be the 6th and 7th columns of the identity
matrix.
(b) The first column of Z can be the difference between the old column 6 and the
new one; the second can be the 4th column of the identity matrix. The first column
of V is then the 6th column of the identity matrix and the second is the difference
between the old row 4 and the new one but set the 6th element to zero.

CHALLENGE 7.2.
(a) This is verified by direct computation.
(b) We use several facts to get an algorithm that is $O(kn^2)$ instead of $O(n^3)$ for
dense matrices:
• $x = (A - ZV^T)^{-1} b = (A^{-1} + A^{-1} Z (I - V^T A^{-1} Z)^{-1} V^T A^{-1}) b$.
• Forming $A^{-1}$ from the LU decomposition takes $O(n^3)$ operations, but forming
$A^{-1} b$ as U\(L\b) uses forward- and back-substitution and just takes $O(n^2)$.
• $(I - V^T A^{-1} Z)$ is only $k \times k$, so factoring it is cheap: $O(k^3)$. (Forming it is
more expensive, with a cost that is $O(kn^2)$.)
• Matrix multiplication is associative.
Using MATLAB notation, once we have formed [L,U]=lu(A), the resulting algo-
rithm is
y = U \ (L \ b);
Zh = U \ (L \ Z);
t = (eye(k) - V'*Zh) \ (V'*y);
x = y + Zh*t;

Figure 7.1. Results of Challenge 4. The plot (title: Making Sherman-Morrison-Woodbury
time comparable to Backslash; horizontal axis: size of matrix; vertical axis: rank of
update) shows the rank k0 of the update for which the time for using the
Sherman–Morrison–Woodbury formula was approximately the same as the time for solving
using factorization. Sherman–Morrison–Woodbury was faster for n ≥ 40 when the rank of
the update was less than 0.25n.
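
The four-line algorithm of Challenge 7.2(b) can also be sketched in Python for readers without MATLAB. This is an illustrative version under simplifying assumptions: the helper `solve` is a hypothetical dense Gaussian elimination that refactors $A$ on every call (a real implementation would factor $A$ once and reuse $L$ and $U$, which is the whole point of the formula), and the 3 × 3 test data are made up.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small dense A)."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            m = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= m * M[k][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def smw_solve(A, Z, V, b):
    """Solve (A - Z V^T) x = b using solves with A and one k x k system."""
    n, k = len(Z), len(Z[0])
    y = solve(A, b)                                         # y = A^{-1} b
    Zh = [solve(A, [Z[i][j] for i in range(n)]) for j in range(k)]  # columns of A^{-1} Z
    # S = I - V^T (A^{-1} Z) is k x k; rhs = V^T y
    S = [[(1.0 if i == j else 0.0) - sum(V[r][i] * Zh[j][r] for r in range(n))
          for j in range(k)] for i in range(k)]
    rhs = [sum(V[r][i] * y[r] for r in range(n)) for i in range(k)]
    t = solve(S, rhs)
    return [y[r] + sum(Zh[j][r] * t[j] for j in range(k)) for r in range(n)]

# Hypothetical rank-one update of a 3 x 3 matrix
A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
Z = [[1.0], [0.0], [2.0]]
V = [[0.0], [1.0], [1.0]]
b = [1.0, 2.0, 3.0]

x_smw = smw_solve(A, Z, V, b)
A_new = [[A[i][j] - Z[i][0] * V[j][0] for j in range(3)] for i in range(3)]
x_direct = solve(A_new, b)
```

Both routes should give the same solution up to rounding, provided the small matrix $I - V^T A^{-1} Z$ is nonsingular.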

CHALLENGE 7.3. See sherman mw.m on the website.

CHALLENGE 7.4. The solution is given in problem4.m on the website, and
the results are plotted in the textbook. In this experiment (Sun UltraSPARC-III
with clock speed 750 MHz running MATLAB 6) Sherman-Morrison-Woodbury was
faster for n ≥ 40 when the rank of the update was less than 0.25n.

CHALLENGE 7.5.
(a) (Review)

$$Gz = \begin{bmatrix} c z_1 + s z_2 \\ s z_1 - c z_2 \end{bmatrix} = x e_1.$$

Multiplying the first equation by $c$, the second by $s$, and adding yields

$$(c^2 + s^2) z_1 = cx,$$

so

$$c = z_1/x.$$

Similarly, we can determine that

$$s = z_2/x.$$

Since $c^2 + s^2 = 1$, we conclude that

$$z_1^2 + z_2^2 = x^2,$$

so we can take

$$c = \frac{z_1}{\sqrt{z_1^2 + z_2^2}}, \qquad s = \frac{z_2}{\sqrt{z_1^2 + z_2^2}}.$$

(b) The first rotation matrix is chosen to zero a61 . The second zeros the resulting
entry in row 6, column 2, and the final one zeros row 6, column 3.
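
The formulas for $c$ and $s$ in part (a) translate directly into code. The Python sketch below is an illustration only: it uses the matrix $G = [c, s; s, -c]$ implied by the expression for $Gz$ above, and the function names and the test vector (3, 4) are assumptions made for the example.

```python
import math

def givens(z1, z2):
    """Return (c, s, x) with c = z1/x, s = z2/x and x = sqrt(z1^2 + z2^2)."""
    x = math.hypot(z1, z2)
    if x == 0.0:
        return 1.0, 0.0, 0.0   # nothing to zero out; any valid (c, s) works
    return z1 / x, z2 / x, x

def apply_givens(c, s, z1, z2):
    """Apply G = [[c, s], [s, -c]] to the vector (z1, z2)."""
    return c * z1 + s * z2, s * z1 - c * z2

c, s, x = givens(3.0, 4.0)
top, bottom = apply_givens(c, s, 3.0, 4.0)
```

The first component of the result equals $x$ and the second is zero up to rounding, which is exactly what is needed to zero an entry such as $a_{61}$ in part (b).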

CHALLENGE 7.6. See problem6.m and qrcolchange.m on the website.

CHALLENGE 7.7.
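$$A_{new} = \begin{bmatrix} A & 0 \\ 0 & 1 \end{bmatrix}
+ \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \\ 1 \end{bmatrix}
\begin{bmatrix} a_{n+1,1} & \dots & a_{n+1,n} & (a_{n+1,n+1} - 1) \end{bmatrix}
+ \begin{bmatrix} a_{1,n+1} \\ a_{2,n+1} \\ \vdots \\ a_{n,n+1} \\ 0 \end{bmatrix}
\begin{bmatrix} 0 & \dots & 0 & 1 \end{bmatrix}$$

$$= \begin{bmatrix} A & 0 \\ 0 & 1 \end{bmatrix}
+ \begin{bmatrix} 0 & a_{1,n+1} \\ 0 & a_{2,n+1} \\ \vdots & \vdots \\ 0 & a_{n,n+1} \\ 1 & (a_{n+1,n+1} - 1) \end{bmatrix}
\begin{bmatrix} a_{n+1,1} & \dots & a_{n+1,n} & 0 \\ 0 & \dots & 0 & 1 \end{bmatrix}.$$

So we can take

$$Z = -\begin{bmatrix} 0 & a_{1,n+1} \\ 0 & a_{2,n+1} \\ \vdots & \vdots \\ 0 & a_{n,n+1} \\ 1 & (a_{n+1,n+1} - 1) \end{bmatrix}; \qquad
V^T = \begin{bmatrix} a_{n+1,1} & \dots & a_{n+1,n} & 0 \\ 0 & \dots & 0 & 1 \end{bmatrix}.$$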
Chapter 8

Solutions: Case Study:


The Direction-of-Arrival
Problem: Coming at You

CHALLENGE 8.1. Let $w_k = SCz_k$, and multiply the equation $BA\Phi SCz_k =
\lambda_k BASCz_k$ by $(BA)^{-1}$ to obtain

$$\Phi w_k = \lambda_k w_k, \qquad k = 1, \dots, d.$$

Then, by the definition of eigenvalue, we see that $\lambda_k$ is an eigenvalue of $\Phi$
corresponding to the eigenvector $w_k$. Since $\Phi$ is a diagonal matrix, its eigenvalues are
its diagonal entries, so the result follows.

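CHALLENGE 8.2. Using the SVD of $U$, we see that

$$T^* [U_1\ U_2] = \Delta V^* = \Delta \begin{bmatrix} V_1^* & V_2^* \\ V_3^* & V_4^* \end{bmatrix}.$$

Now we compute the matrices from Problem 1:

$$BASC = [\Delta_1^{-1}, 0_{d \times (m-d)}]\, T^* U_1 [\Sigma_1, 0_{d \times (n-d)}] W^* W \begin{bmatrix} \Sigma_1^{-1} \\ 0_{(n-d) \times d} \end{bmatrix}
= [\Delta_1^{-1}, 0_{d \times (m-d)}]\, \Delta \begin{bmatrix} V_1^* \\ V_3^* \end{bmatrix}
= V_1^*.$$

$$BA\Phi SC = [\Delta_1^{-1}, 0_{d \times (m-d)}]\, T^* U_2 [\Sigma_1, 0_{d \times (n-d)}] W^* W \begin{bmatrix} \Sigma_1^{-1} \\ 0_{(n-d) \times d} \end{bmatrix}
= [\Delta_1^{-1}, 0_{d \times (m-d)}]\, \Delta \begin{bmatrix} V_2^* \\ V_4^* \end{bmatrix}
= V_2^*.$$

Thus, with this choice of $B$ and $C$, the eigenvalue problem of Challenge 1 reduces
to $V_2^* z_k = \lambda_k V_1^* z_k$.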

Figure 8.1. Results of Challenge 3: the true DOA (blue) and the DOA
estimated by rectangular windowing (red) as a function of time.
(Plot title: Results using rectangular windowing.)

CHALLENGE 8.3. The results are shown in Figure 8.1. The average error
in the angle estimate is 0.62 degrees, and the average relative error is 0.046. The
estimated DOAs are quite reliable except when the signals get very close.

CHALLENGE 8.4.

$$X_{new} X_{new}^* = \begin{bmatrix} fX & x \end{bmatrix} \begin{bmatrix} f X^* \\ x^* \end{bmatrix} = f^2 XX^* + xx^*.$$

The matrix $XX^*$ has $4m^2$ entries, so multiplying it by $f^2$ requires $O(m^2)$
multiplications. Forming $xx^*$ also requires $O(m^2)$ multiplications.
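
The update identity above can be checked with a small sketch. The Python example below assumes real-valued data, so the conjugate transpose reduces to the ordinary transpose; the matrix `X`, the new column `x_new`, the forgetting factor `f`, and the function names are all hypothetical.

```python
def gram(X):
    """Return X X^T for a matrix X stored as a list of rows."""
    rows, cols = len(X), len(X[0])
    return [[sum(X[i][k] * X[j][k] for k in range(cols)) for j in range(rows)]
            for i in range(rows)]

def update_window(C, x, f):
    """Return f^2 * C + x x^T, the exponentially windowed update of C = X X^T."""
    n = len(x)
    return [[f * f * C[i][j] + x[i] * x[j] for j in range(n)] for i in range(n)]

# Hypothetical data: two sensors, two old snapshots, one new snapshot
X = [[1.0, 2.0], [0.5, -1.0]]
x_new = [3.0, 1.0]
f = 0.9

C_updated = update_window(gram(X), x_new, f)

# Direct computation from X_new = [f X, x] for comparison
X_new = [[f * X[i][k] for k in range(2)] + [x_new[i]] for i in range(2)]
C_direct = gram(X_new)
```

The update touches each entry of the product once, so its cost is proportional to the number of entries and does not depend on how many snapshots have been accumulated.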

CHALLENGE 8.5. The results are shown in Figure 8.2. The average error
in the angle estimate is 0.62 degrees, and the average relative error is 0.046. The
results are quite similar to those for rectangular windowing.
Figure 8.2. Results of Problem 5: the true DOA (blue) and the DOA
estimated by exponential windowing (red) as a function of time.
(Plot title: Results using exponential windowing.)

CHALLENGE 8.6.
(a) The sum of the squares of the entries of $X$ is the square of the Frobenius norm
of $X$, and this norm is invariant under multiplication by an orthogonal matrix.
Therefore,

$$\|X\|_F^2 = \|\Sigma\|_F^2 = \sigma_1^2 + \dots + \sigma_m^2.$$

(b) The expected value of the square of each entry of $X$ is $\psi^2$, so the sum of these
$mn$ values has expected value $\psi^2 mn$.
(c) The expected value is now

$$\sum_{k=1}^{m} \sum_{j=1}^{n} f^{2j} E(x_{kj}^2) = m \sum_{j=1}^{n} f^{2j} \psi^2 \to \frac{m f^2 \psi^2}{1 - f^2}$$

for large $n$, where $E$ denotes expected value.

CHALLENGE 8.7. The software on the website varies κ between 2 and 6. For
rectangular windowing, a window size of 4 produced fewer d-failures than window
sizes of 6 or 8 at a price of increasing the average error to 0.75 degrees. As κ
increased, the number of d-failures also increased, but the average error when d was
correct decreased.
For exponential windowing, the fewest d-failures (8) occurred for f = 0.7 and
κ = 2, but the average error in this case was 1.02. As κ increased, the number of
d-failures increased but again the average error when d was correct decreased.

We have seen that matrix-based algorithms are powerful tools for signal pro-
cessing, but they must be used in the light of statistical theory and the problem’s
geometry.

CHALLENGE 8.8. No answer provided.


Unit III

SOLUTIONS: Optimization
and Data Fitting

Chapter 9

Solutions: Numerical
Methods for
Unconstrained
Optimization

CHALLENGE 9.1.
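$$f(x) = x_1^4 + x_2 (x_2 - 1),$$

$$g(x) = \begin{bmatrix} 4x_1^3 \\ 2x_2 - 1 \end{bmatrix}, \qquad H(x) = \begin{bmatrix} 12x_1^2 & 0 \\ 0 & 2 \end{bmatrix}.$$

Step 1:

$$p = -\begin{bmatrix} 12x_1^2 & 0 \\ 0 & 2 \end{bmatrix}^{-1} \begin{bmatrix} 4x_1^3 \\ 2x_2 - 1 \end{bmatrix}
= -\begin{bmatrix} 48 & 0 \\ 0 & 2 \end{bmatrix}^{-1} \begin{bmatrix} 32 \\ -3 \end{bmatrix}
= \begin{bmatrix} -32/48 \\ +3/2 \end{bmatrix},$$

so

$$x \leftarrow \begin{bmatrix} 2 \\ -1 \end{bmatrix} + \begin{bmatrix} -2/3 \\ +3/2 \end{bmatrix} = \begin{bmatrix} 4/3 \\ 1/2 \end{bmatrix}.$$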
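
The hand computation above can be reproduced exactly with rational arithmetic. This Python sketch is only a check of the arithmetic: it exploits the fact that the Hessian of this particular $f$ is diagonal, and the function names are assumptions made for the example.

```python
from fractions import Fraction

def gradient(x):
    """Gradient of f(x) = x1^4 + x2*(x2 - 1)."""
    x1, x2 = x
    return [4 * x1 ** 3, 2 * x2 - 1]

def hessian_diagonal(x):
    """Diagonal of the Hessian of f (the off-diagonal entries are zero)."""
    x1, _ = x
    return [12 * x1 ** 2, 2]

def newton_step(x):
    """Take one Newton step x + p with p = -H^{-1} g for the diagonal Hessian."""
    g = gradient(x)
    h = hessian_diagonal(x)
    p = [-gi / hi for gi, hi in zip(g, h)]
    return [xi + pi for xi, pi in zip(x, p)]

x0 = [Fraction(2), Fraction(-1)]
x1 = newton_step(x0)
```

Since $f$ is quadratic in $x_2$, the second component reaches its minimizer 1/2 in a single step, while the first component shrinks by the factor 2/3 at every Newton step.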

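CHALLENGE 9.2.

$$g(x) = \begin{bmatrix} e^{x_1+x_2}(1 + x_1) \\ e^{x_1+x_2} x_1 + 2x_2 \end{bmatrix} = \begin{bmatrix} 8 \\ 4.7726 \end{bmatrix};$$

$$H(x) = \begin{bmatrix} e^{x_1+x_2}(2 + x_1) & e^{x_1+x_2}(1 + x_1) \\ e^{x_1+x_2}(1 + x_1) & e^{x_1+x_2} x_1 + 2 \end{bmatrix} = \begin{bmatrix} 12 & 8 \\ 8 & 6 \end{bmatrix}.$$

Now $\det(H) = 72 - 64 = 8$, so

$$p = -H^{-1} g = \frac{1}{8} \begin{bmatrix} 6 & -8 \\ -8 & 12 \end{bmatrix} \begin{bmatrix} -8 \\ -4.7726 \end{bmatrix} = \begin{bmatrix} -1.2274 \\ 0.8411 \end{bmatrix}.$$

We’d never use this inverse formula on a computer, except possibly for 2x2 matrices.
Gauss elimination is generally better:

$$L = \begin{bmatrix} 1 & 0 \\ 2/3 & 1 \end{bmatrix}, \qquad U = \begin{bmatrix} 12 & 8 \\ 0 & 2/3 \end{bmatrix}.$$

Note that $p^T g = -5.8050 < 0$, so the direction is downhill.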


CHALLENGE 9.3.

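% Five Newton iterations starting from x = [1;2].
% (The gradient g and Hessian H below are consistent with
%  f(x) = (x1-5)^4 + (x2+1)^4 - x1*x2.)
x = [1;2];
for i=1:5
    % Gradient at the current x
    g = [4*(x(1) - 5)^3 - x(2);
         4*(x(2) + 1)^3 - x(1)];
    % Hessian at the current x
    H = [12*(x(1) - 5)^2, -1;
         -1, 12*(x(2) + 1)^2];
    p = -H \ g;   % Newton direction
    x = x + p;    % full Newton step (no linesearch)
end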

CHALLENGE 9.4. The Lagrangian function is

$$L(p, \lambda) = f(x) + p^T g + \frac{1}{2} p^T H p + \frac{\lambda}{2} (p^T p - h^2).$$

Setting the partial derivatives to zero yields

$$g + Hp + \lambda p = 0,$$
$$\frac{1}{2} (p^T p - h^2) = 0.$$

Thus the first equation is equivalent to $(H + \lambda I) p = -g$, or $\hat{H} p = -g$, as in the
Levenberg-Marquardt algorithm. The parameter $\lambda$ is chosen so that $p^T p = h^2$.

CHALLENGE 9.5. If $f$ is quadratic, then

$$H^{(k)} s^{(k)} = g(x^{(k+1)}) - g(x^{(k)}),$$

where $H$ is the Hessian matrix of $f$. Close to $x^{(k+1)}$, a quadratic model is a close fit
to any function, so we demand this property to hold for our approximation to $H$.

CHALLENGE 9.6. The formula for Broyden’s good method is

$$B^{(k+1)} = B^{(k)} - (B^{(k)} s^{(k)} - y^{(k)}) \frac{s^{(k)T}}{s^{(k)T} s^{(k)}}.$$

To verify the secant condition, compute

$$B^{(k+1)} s^{(k)} = B^{(k)} s^{(k)} - (B^{(k)} s^{(k)} - y^{(k)}) \frac{s^{(k)T} s^{(k)}}{s^{(k)T} s^{(k)}}
= B^{(k)} s^{(k)} - (B^{(k)} s^{(k)} - y^{(k)})
= y^{(k)},$$

as desired. If $v^T s^{(k)} = 0$, then

$$B^{(k+1)} v = B^{(k)} v - (B^{(k)} s^{(k)} - y^{(k)}) \frac{s^{(k)T} v}{s^{(k)T} s^{(k)}}
= B^{(k)} v,$$

as desired.
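
Both properties verified above can be confirmed numerically. The following Python sketch applies the update formula to a small made-up example (the matrix `B`, the vectors `s`, `y`, `v`, and the function names are hypothetical) and checks the secant condition and the invariance on vectors orthogonal to $s$.

```python
def matvec(B, v):
    """Return the matrix-vector product B v."""
    return [sum(bij * vj for bij, vj in zip(row, v)) for row in B]

def broyden_update(B, s, y):
    """Return B - (B s - y) s^T / (s^T s), Broyden's good update."""
    Bs = matvec(B, s)
    sts = sum(si * si for si in s)
    u = [bsi - yi for bsi, yi in zip(Bs, y)]
    n = len(s)
    return [[B[i][j] - u[i] * s[j] / sts for j in range(n)] for i in range(n)]

# Hypothetical data
B = [[2.0, 0.0], [1.0, 3.0]]
s = [1.0, 2.0]
y = [0.5, -1.0]
v = [2.0, -1.0]          # chosen so that v^T s = 0

B_new = broyden_update(B, s, y)
secant_lhs = matvec(B_new, s)      # should equal y
unchanged = matvec(B_new, v)       # should equal B v
```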

CHALLENGE 9.7.
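$$B^{(k+1)} s^{(k)} = B^{(k)} s^{(k)} - \frac{B^{(k)} s^{(k)} s^{(k)T} B^{(k)}}{s^{(k)T} B^{(k)} s^{(k)}} s^{(k)} + \frac{y^{(k)} y^{(k)T}}{y^{(k)T} s^{(k)}} s^{(k)}$$

$$= B^{(k)} s^{(k)} - \frac{B^{(k)} s^{(k)} s^{(k)T} B^{(k)} s^{(k)}}{s^{(k)T} B^{(k)} s^{(k)}} + \frac{y^{(k)} y^{(k)T} s^{(k)}}{y^{(k)T} s^{(k)}}$$

$$= B^{(k)} s^{(k)} - B^{(k)} s^{(k)} \frac{s^{(k)T} B^{(k)} s^{(k)}}{s^{(k)T} B^{(k)} s^{(k)}} + y^{(k)} \frac{y^{(k)T} s^{(k)}}{y^{(k)T} s^{(k)}}$$

$$= B^{(k)} s^{(k)} - B^{(k)} s^{(k)} + y^{(k)}$$

$$= y^{(k)}.$$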

CHALLENGE 9.8. Dropping superscripts for brevity, and taking advantage of
symmetry of $H$, we obtain

$$f(x^{(0)} + \alpha p^{(0)}) = \frac{1}{2} (x + \alpha p)^T H (x + \alpha p) - (x + \alpha p)^T b$$

$$= \frac{1}{2} x^T H x - x^T b + \alpha p^T H x + \frac{1}{2} \alpha^2 p^T H p - \alpha p^T b.$$

Differentiating with respect to $\alpha$ we obtain

$$p^T H x + \alpha p^T H p - p^T b = 0,$$

so

$$\alpha = \frac{p^T b - p^T H x}{p^T H p} = \frac{p^T r}{p^T H p},$$

where $r = b - Hx$.
If we differentiate a second time, we find that the second derivative of $f$ with respect
to $\alpha$ is $p^T H p > 0$ (when $p \ne 0$), so we have found a minimizer.

CHALLENGE 9.9.

function Hv = Htimes(x,v,h)

% Input:
% x is the current evaluation point.
% v is the direction of change in x.
% h is the stepsize for the change,
% a small positive parameter (e.g., h = 0.001).
% We use a function [f,g] = myfnct(x),
% which returns the function value f(x) and the gradient g(x).
%
% Output:
% Hv is a finite difference approximation to H*v,
% where H is the Hessian matrix of myfnct.
%
% DPO

[f, g ] = myfnct(x);
[fp,gp] = myfnct(x + h * v);
Hv = (gp - g)/h;
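
The same finite-difference idea can be tried out in Python on a function whose Hessian is known. In this sketch the gradient is that of $f(x) = x_1^4 + x_2(x_2 - 1)$ from Challenge 9.1; the function name `hessian_times`, the step size, and the test vectors are assumptions made for the illustration, and the gradient routine is passed in as an argument rather than called by a fixed name as in the MATLAB version.

```python
def gradient(x):
    """Gradient of f(x) = x1^4 + x2*(x2 - 1), as in Challenge 9.1."""
    return [4.0 * x[0] ** 3, 2.0 * x[1] - 1.0]

def hessian_times(grad, x, v, h=1e-6):
    """Forward-difference approximation to H*v: (g(x + h v) - g(x)) / h."""
    g = grad(x)
    gp = grad([xi + h * vi for xi, vi in zip(x, v)])
    return [(a - b) / h for a, b in zip(gp, g)]

# At x = (2, -1) the exact Hessian is diag(48, 2), so H*(1, 1) = (48, 2)
Hv = hessian_times(gradient, [2.0, -1.0], [1.0, 1.0])
```

The approximation costs one extra gradient evaluation per product and has an error proportional to $h$, so $h$ must balance truncation error against rounding error in the gradient difference.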

CHALLENGE 9.10. Here is one way to make the decision:

• If the function is not differentiable, use Nelder-Mead.
• If 2nd derivatives (Hessians) are cheaply available and there is enough storage
for them, use Newton.
• Otherwise, use quasi-Newton (with a finite-difference 1st derivative if
necessary).

CHALLENGE 9.11.
• Newton: often converges with a quadratic rate when started close enough to
a solution, but requires both first and second derivatives (or good approxima-
tions of them) as well as storage and solution of a linear system with a matrix
of size 2000 × 2000.

• Quasi-Newton: often converges superlinearly when started close enough to a
solution, but requires first derivatives (or good approximations of them) and
storage of a matrix of size 2000 × 2000, unless the matrix is accumulated
implicitly by saving the vectors s and y.
• Pattern search: converges only linearly, but has good global behavior and
requires only function values, no derivatives.

If first derivatives (or approximations) were available, I would use quasi-Newton,
with updating of the matrix decomposition (or a limited memory version).
Otherwise, I would use pattern search.

CHALLENGE 9.12. I would use pattern search to minimize $F(x) = -y(1)$ as a
function of $x$. When a function value $F(x)$ is needed, I would call one of MATLAB’s
stiff ODE solvers, since I don’t know whether the problem is stiff or not, and return
the value computed as $y(1)$. The value of $x$ would need to be passed to the function
that evaluates $f$ for the ODE solver.
I chose pattern search because it has proven convergence and does not require
derivatives of F with respect to x. Note that these derivatives are not available
for this problem: we can compute derivatives of y with respect to t but not with
respect to x. And since our value of y(1) is only an approximation, the use of finite
differences to estimate derivatives with respect to x would yield values too noisy to
be useful.

CHALLENGE 9.13.
Method               conv. rate     Storage   f evals/itn    g evals/itn   H evals/itn
Truncated Newton     > 1            O(n)      0              ≤ n + 1       0
Newton               2              O(n^2)    0 (note 1)     1             1
Quasi-Newton         > 1 (note 2)   O(n^2)    0 (note 1)     1             0
Steepest descent     1              O(n)      0 (note 1)     1             0
Conjugate gradients  1              O(n)      0 (note 1)     1             0

Notes on the table:

1. Once the counts for the linesearch are omitted, no function evaluations are
needed.
2. For a single step, Quasi-Newton is superlinear; it is n-step quadratic.
Chapter 10

Solutions: Numerical
Methods for Constrained
Optimization

CHALLENGE 10.1. The graphical solutions are left to the reader.
(a) The optimality condition is that the gradient should be zero. We calculate

$$g(x) = \begin{bmatrix} 2x_1 - x_2 + 5 \\ 8x_2 - x_1 + 3 \end{bmatrix}, \qquad
H(x) = \begin{bmatrix} 2 & -1 \\ -1 & 8 \end{bmatrix}.$$

Since $H$ is positive definite (Gerschgorin theorem), $f(x)$ has a unique minimizer
satisfying $g(x) = 0$, so

$$H(x)\, x = \begin{bmatrix} -5 \\ -3 \end{bmatrix},$$

and therefore $x = [-2.8667, -0.7333]^T$.
(b) Using a Lagrange multiplier we obtain

$$L(x, \lambda) = x_1^2 + 4x_2^2 - x_1 x_2 + 5x_1 + 3x_2 + 6 - \lambda (x_1 + x_2 - 2).$$

Differentiating gives the optimality conditions

$$2x_1 - x_2 + 5 - \lambda = 0,$$
$$8x_2 - x_1 + 3 - \lambda = 0,$$
$$x_1 + x_2 - 2 = 0.$$

Solving this linear system of equations yields $x = [1.3333, 0.6667]^T$, $\lambda = 7$.
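
The linear system in part (b) is small enough to solve exactly. The Python sketch below is an illustration (the helper `solve` is a hypothetical Gauss-Jordan elimination over rational numbers, not the textbook's software) with the unknowns ordered as $(x_1, x_2, \lambda)$.

```python
from fractions import Fraction

def solve(A, b):
    """Solve A x = b exactly by Gauss-Jordan elimination over the rationals."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(bi)] for row, bi in zip(A, b)]
    for k in range(n):
        # Pick any row at or below k with a nonzero entry in column k
        p = next(i for i in range(k, n) if M[i][k] != 0)
        M[k], M[p] = M[p], M[k]
        for i in range(n):
            if i != k:
                m = M[i][k] / M[k][k]
                M[i] = [a - m * c for a, c in zip(M[i], M[k])]
    return [M[i][n] / M[i][i] for i in range(n)]

# Optimality conditions of part (b), unknowns (x1, x2, lambda):
#   2 x1 -   x2 - lambda = -5
#  -  x1 + 8 x2 - lambda = -3
#     x1 +   x2          =  2
K = [[2, -1, -1], [-1, 8, -1], [1, 1, 0]]
rhs = [-5, -3, 2]
x1, x2, lam = solve(K, rhs)
```

The exact values are $x_1 = 4/3$, $x_2 = 2/3$, and $\lambda = 7$, which agree with the rounded values quoted in the text.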
(c) In this case, $A^T = I$, so the conditions are

$$\lambda = \begin{bmatrix} 2x_1 - x_2 + 5 \\ 8x_2 - x_1 + 3 \end{bmatrix},$$
$$\lambda \ge 0,$$
$$x \ge 0,$$
$$\lambda_1 x_1 + \lambda_2 x_2 = 0.$$

The solution is $x = 0$, $\lambda = [5, 3]^T$.
(d) Remembering to convert the first constraint to ≥ form, we get

$$\begin{bmatrix} 1 & 0 \\ 0 & 1 \\ -2x_1 & -2x_2 \end{bmatrix}^T \lambda = \begin{bmatrix} 2x_1 - x_2 + 5 \\ 8x_2 - x_1 + 3 \end{bmatrix},$$
$$\lambda \ge 0,$$
$$x \ge 0,$$
$$1 - x_1^2 - x_2^2 \ge 0,$$
$$\lambda_1 x_1 + \lambda_2 x_2 + \lambda_3 (1 - x_1^2 - x_2^2) = 0.$$

The solution is $x = 0$, $\lambda = [5, 3, 0]^T$.

CHALLENGE 10.2. (Partial Solution) We need to verify that $Z^T \nabla_{xx} L(x, \lambda) Z$
is positive semidefinite.
(a)
$$Z^T \nabla_{xx} L(x, \lambda) Z = H(x) = \begin{bmatrix} 2 & -1 \\ -1 & 8 \end{bmatrix}.$$

(b)
$$\nabla_{xx} L(x, \lambda) = \begin{bmatrix} 2 & -1 \\ -1 & 8 \end{bmatrix},$$

and we can take $Z^T = [1, -1]$.
(c)
$$\nabla_{xx} L(x, \lambda) = \begin{bmatrix} 2 & -1 \\ -1 & 8 \end{bmatrix}.$$

Both constraints are active at the optimal solution, so $Z$ is the empty matrix and
the optimality condition is satisfied trivially.
(d) The $x \ge 0$ constraints are active, so the solution is as in part (c).

CHALLENGE 10.3. The vector $[6, 1]^T$ is a particular solution to $x_1 - 2x_2 = 4$,
and the vector $[2, 1]^T$ is a basis for the nullspace of the matrix $A = [1, -2]$. (These
choices are not unique, so there are many correct answers.) Using our choices, any
solution to the equality constraint can be expressed as

$$x = \begin{bmatrix} 6 \\ 1 \end{bmatrix} + \begin{bmatrix} 2 \\ 1 \end{bmatrix} v = \begin{bmatrix} 6 + 2v \\ 1 + v \end{bmatrix}.$$

Therefore, our problem is equivalent to

$$\min_v \; 5(6 + 2v)^4 + (6 + 2v)(1 + v) + 6(1 + v)^2$$

subject to

$$6 + 2v \ge 0,$$
$$1 + v \ge 0.$$

Using a log barrier function for these constraints, we obtain the unconstrained
problem

$$\min_v \; B_\mu(v)$$

where

$$B_\mu(v) = 5(6 + 2v)^4 + (6 + 2v)(1 + v) + 6(1 + v)^2 - \mu \log(6 + 2v) - \mu \log(1 + v).$$

Notice that if $1 + v \ge 0$, then $6 + 2v \ge 0$. Therefore, the first log term can be
dropped from $B_\mu(v)$.

CHALLENGE 10.4.
(a) The central path is defined by

$$Ax = b,$$
$$A^* w + s = c,$$
$$\mu F'(x) + s = 0,$$

where $F'(x) = X^{-1} e$, $w$ is an $m \times 1$ vector, $e$ is the column vector of ones, and $X$
is diag($x$).
(b) The central path is defined by

$$\mathrm{trace}(a_i x) = b_i, \qquad i = 1, \dots, m,$$
$$\sum_{i=1}^{m} w_i a_i + s = c,$$
$$\mu F'(x) + s = 0,$$

where $w$ is an $m \times 1$ vector and $x$ and $s$ are symmetric $n \times n$ matrices. Since
$F(x) = \log(\det(x))$, we can compute $F'(x) = -x^{-1}$.

CHALLENGE 10.5.

(a) $K^*$ is the set of vectors that form nonnegative inner products with every vector
that has nonnegative entries. Therefore, $K = K^* = \{x : x \ge 0\}$, the positive
orthant in $R^n$.
(b) The dual problem is

$$\max_w \; w^T b$$

subject to $A^T w + s = c$ and $s \ge 0$.
(c) Since $s \ge 0$, the constraint $A^T w + s = c$ means that each component of $A^T w$
is less than or equal to the corresponding component of $c$, since we need to add $s$ on
in order to get equality. Therefore, $A^T w \le c$.
(d)
First order optimality conditions: The constraints are

x ≥ 0,
Ax − b = 0.

The derivative matrix for the constraints becomes

$$\begin{bmatrix} I \\ A \end{bmatrix}.$$

Using a bit of foresight, let’s call the Lagrange multipliers for the inequality con-
straints s and those for the equality constraints λ. Then the optimality conditions
are

$$Is + A^T \lambda = c,$$
$$s \ge 0,$$
$$x \ge 0,$$
$$Ax = b,$$
$$s^T x = 0.$$

The central path:

$$Ax = b,$$
$$A^* w + s = c,$$
$$\mu X^{-1} e + s = 0,$$

where $s, x \ge 0$ (since the feasible cone is the positive orthant).
Both sets of conditions have $Ax = b$. Setting $\lambda = w$ shows the equivalence of
$s + A^T \lambda = c$ and $A^* w + s = c$, since $A^* = A^T$. Multiplying $\mu X^{-1} e + s = 0$ by $X$,
we obtain $\mu e + Xs = 0$, which is equivalent to $s^T x = 0$ when $\mu = 0$, since $x$ and $s$
are nonnegative. Therefore the conditions are equivalent.

CHALLENGE 10.6. (Partial Solution) We minimize by making both factors
large in absolute value and with the same sign. The largest absolute values occur at
the corners, and the minimizers occur when $x_1 = x_2 = 1$ and when $x_1 = x_2 = 0$. In
both cases, $f(x) = -1/4$. By changing one of the 1/2 values to 1/3 (for example),
one of these becomes a local minimizer.
Chapter 11

Solutions: Case Study:


Classified Information:
The Data Clustering
Problem

(coauthored by Nargess Memarsadeghi)

CHALLENGE 11.1. The original image takes $mpb$ bits, while the clustered
image takes $kb$ bits to store the cluster centers and $mp\lceil \log_2 k \rceil$ bits to store the
cluster indices for all $mp$ pixels. For jpeg images with RGB (red, green, blue)
values ranging between 0 and 255, we need 8 bits for each of the $q = 3$ values (red,
green, and blue). Therefore, an RGB image with 250,000 pixels takes 24 · 250,000 =
6,000,000 bits, while the clustered image takes about $250{,}000 \lceil \log_2 4 \rceil = 500{,}000$ bits
if we have 3 or 4 clusters and 250,000 bits if we have 2 clusters. These numbers can
be further reduced by compression techniques such as run-length encoding.
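
The storage counts above follow from two short formulas. This Python sketch just evaluates them; the function names are hypothetical, and the bits needed for the cluster centers are reported separately because the text treats them as negligible ("about").

```python
import math

def original_bits(num_pixels, bits_per_pixel):
    """Bits needed to store every pixel value directly."""
    return num_pixels * bits_per_pixel

def clustered_bits(num_pixels, k, bits_per_pixel):
    """Return (bits for per-pixel cluster indices, bits for the k cluster centers)."""
    index_bits = num_pixels * math.ceil(math.log2(k))
    center_bits = k * bits_per_pixel
    return index_bits, center_bits

full = original_bits(250_000, 24)               # 8 bits for each of 3 color values
four_clusters = clustered_bits(250_000, 4, 24)
three_clusters = clustered_bits(250_000, 3, 24)
two_clusters = clustered_bits(250_000, 2, 24)
```

Three clusters cost as many index bits as four because an index must occupy a whole number of bits.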

CHALLENGE 11.2.

(a) Neither D nor R is convex everywhere. Figure 11.1 plots these functions for a
particular choice of points as one of the cluster centers is moved. We fix the data
points at 0 and 1 and one of the centers at 1.2, and plot D and R as a function of
the second center c. For c < −1.2 and c > 1.2, the function D is constant, since the
second cluster is empty, while for −1.2 < c < 1.2, the function is quadratic. Since
each function is above some of its secants (the line connecting two points on the
graph), each function fails to be convex.

(b) Neither D nor R is differentiable everywhere. Again see Figure 11.1. The
function D fails to be differentiable at c = −1.2 and c = 1.2. Trouble occurs at the
points where a data value moves from one cluster to another.

Figure 11.1. The functions R and D for a particular dataset.
(Curves R(c) and D(c) plotted against c.)

(c) We want to minimize the function

$$D(c) = \sum_{i=1}^{n} \|x_i - c\|^2$$

over all choices of $c$. Since there is only one center $c$, this function is convex
and differentiable everywhere, and the solution must be a zero of the gradient.
Differentiating with respect to $c$ we obtain

$$\sum_{i=1}^{n} 2(x_i - c) = 0,$$

so

$$c = \frac{1}{n} \sum_{i=1}^{n} x_i.$$

It is easy to verify that this is a minimizer, not a maximizer or a stationary point,
so the solution is to choose $c$ to be the centroid (mean value) of the data points.
(d) As one data point moves from the others, eventually a center will “follow” it,
giving it its own cluster. So the clustering algorithm will fit k − 1 clusters to the
remaining n − 1 data points.

CHALLENGE 11.3. A sample program is given on the website. Note that
when using a general purpose optimization function, it is important to match the
termination criteria to the scaling of the variables and the function to be minimized;
otherwise the function might terminate prematurely (even with negative entries in
the cluster centers if, for example, the objective function values are all quite close
together) or never terminate (if small changes in the variables lead to large changes
in the function). We chose to scale R and D by dividing by 256 and by the number
of points in the summation.

Figure 11.2. The images resulting from minimizing R.
The solution is very sensitive to the initial guess, since there are many local
minimizers.
(a) The number of variables is kq.
(b) Although the number of variables is quite small (9 for k = 3 and 15 for k = 5),
evaluating the function R or D is quite expensive, since it involves mapping
each of the 1,000 pixels to a cluster. Therefore, the overhead of the algorithm is
insignificant and the time is proportional to the number of function evaluations.
The functions are not differentiable, so modeling as a quadratic function is not so
effective. This slows the convergence rate, although only 15-25 iterations are used.
This was enough to converge when minimizing D, but not enough for R to converge.
Actually, a major part of the time in the sample implementation is postpro-
cessing: the construction of the resulting image!
Figure 11.3. The images resulting from minimizing D.

(c) Figures 11.2 and 11.3 show the results with k = 3, 4, 5 clusters. The solution
is very dependent on the initial guess, but the rather unlikely choice that we made
(all zeros in the green coordinate) gave some of the best results.
Our first evaluation criterion should be how the image looks, sometimes called
the “eyeball norm”. In the results for minimizing D, it is harder to differentiate the
dog from the background. For minimizing R with k = 3, his white fur is rendered as
green and the image quality is much worse than for minimizing D or using k-means.
For k = 4 or k = 5, though, minimizing R yields a good reconstruction, with good
shading in the fur and good rendering of the table legs in the background, and the
results look better than those for minimizing D. (Note that the table legs were not
part of the sample that determined the cluster centers.)
We can measure the quantitative change in the images, too. Each pixel value
$x_i$ in the original or the clustered image is a vector with $q$ dimensions, and we can
measure the relative change in the image as

$$\left( \frac{\sum_{i=1}^{n} \|x_i^{original} - x_i^{clustered}\|^2}{\sum_{i=1}^{n} \|x_i^{original}\|^2} \right)^{1/2}.$$

This measure is usually smaller when minimizing R rather than D: .363 vs .271 for
k = 3, .190 vs .212 for k = 4, and .161 vs .271 for k = 5. The optimization program
sometimes stops with negative coordinates for a cluster center or no points in the
cluster; for example, for k = 5, minimizing D produced only 3 nonempty clusters,
and for k = 2, minimizing R produced only 2 nonempty clusters.

Figure 11.4. The images resulting from k-means.
(d) If q < 4, then k might be chosen by plotting the data. For larger values of q,
we might try increasing values of k, stopping when the cluster radii fall below the
noise level in the data or when the cluster radii stay relatively constant.
Only one choice of data values appears in the sample program, but we can
easily modify the program to see how sensitive the solution is to the choice of data.

CHALLENGE 11.4. A sample program is given on the website and results are
shown in Figure 11.4. This k-Means function is much faster than the algorithm for
Challenge 3. The best results for k = 3 and k = 5 are those from k-means, but
the k = 4 result from minimizing R seems somewhat better than the other k = 4
results. The quantitative measures are mixed: the 2-norm of the relative change is
.247, .212, and .153 for k = 3, 4, 5 respectively, although the algorithm was not run
to convergence.
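
For readers who want to experiment, the basic k-means iteration can be sketched in a few lines. This Python version is an illustration only, not the program posted on the website: it works on one-dimensional data, uses made-up points and starting centers, and relies on the result of Challenge 11.2(c) that the best single center for a cluster is its centroid.

```python
def kmeans_1d(points, centers, iterations=20):
    """Alternate between assigning points to the nearest center and
    moving each center to the mean (centroid) of its assigned points."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda j: (p - centers[j]) ** 2)
            clusters[nearest].append(p)
        # An empty cluster keeps its old center
        centers = [sum(c) / len(c) if c else old
                   for c, old in zip(clusters, centers)]
    return centers, clusters

# Hypothetical data with two well separated groups
points = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
centers, clusters = kmeans_1d(points, [0.0, 1.0])
```

Each pass costs one distance computation per point per center, with no function or gradient evaluations of a general-purpose optimizer, which is consistent with the speed advantage reported above. Like the optimization approach, the result depends on the initial centers.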
Figure 11.5. Clustering the data for Challenge 5 using the first initialization
of centers. (Panels: Original Data; 2 clusters; 3 clusters; 4 clusters.)

CHALLENGE 11.5. The website contains a sample program, and Figures 11.5
and 11.6 display the results. Each datapoint is displayed with a color and symbol
that represent its cluster. An intuitive clustering of this data is to make two vertical
clusters, as determined by the algorithm with the first initialization and k = 2.
Note, however, that the distance between the top and bottom data points in each
cluster is the same as the distance between the clusters (measured by the minimum
distance between points in different clusters)! The two clusters determined by the
second initialization have somewhat greater radii, but are not much worse. What
is worse about them, though, is that there is less distance between clusters.
If we choose to make too many clusters (k > 2), we add artificial distinctions
between data points.

Figure 11.6. Clustering the data for Challenge 5 using the second initialization
of centers. (Panels: Original Data; 2 clusters; 3 clusters; 4 clusters.)

CHALLENGE 11.6. The website contains a sample program, and Figures 11.7
and 11.8 display the results.

Coordinate scaling definitely changes the merits of the resulting clusters. The
clusters produced by the second initialization have much smaller radii. Nonlinear
scalings of the data also affect clustering; for example, the results of clustering the
pixels in the Charlie image could be very different if we represented the image in
coordinates other than RGB.

[Figure 11.7: four scatter-plot panels titled "Original Data", "2 clusters", "3 clusters", and "4 clusters"; horizontal axes run from −1 to 1, vertical axes from −100 to 100.]

Figure 11.7. Clustering the data for Challenge 6 using the first initialization of centers. Note that the vertical scale is far different from the horizontal.

[Figure 11.8: four scatter-plot panels titled "Original Data", "2 clusters", "3 clusters", and "4 clusters"; horizontal axes run from −1 to 1, vertical axes from −100 to 100.]

Figure 11.8. Clustering the data for Challenge 6 using the second initialization of centers. Note that the vertical scale is far different from the horizontal.

Chapter 12

Solutions: Case Study: Achieving a Common Viewpoint: Yaw, Pitch, and Roll

(coauthored by David A. Schug)

CHALLENGE 12.1.
(a) First we rotate the object by an angle $\phi$ in the $xy$-plane. Then we rotate by an angle $-\theta$ in the new $xz$-plane, and finish with a rotation of $\psi$ in the resulting $yz$-plane.
(b) We will use the QR decomposition of a matrix; any nonsingular matrix can be expressed as the product of an orthogonal matrix times an upper triangular one. One way to compute this is to use plane rotations to reduce elements below the diagonal of our matrix to zero. Let's apply this to the matrix $Q^T$. Then by choosing $\phi$ appropriately, we can make $Q_{yaw} Q^T$ have a zero in row 2, column 1. Similarly, by choosing $\theta$, we can force a zero in row 3, column 1 of $Q_{pitch} Q_{yaw} Q^T$ (without ruining our zero in row 2, column 1). (Note that since we require $-\pi/2 < \theta < \pi/2$, if $\cos\theta$ turns out to be negative, we need to change the signs on $\cos\theta$ and $\sin\theta$ to compensate.) Finally, we can choose $\psi$ to force $Q_{roll} Q_{pitch} Q_{yaw} Q^T$ to be upper triangular. Since the product of orthogonal matrices is orthogonal, and the only upper triangular orthogonal matrices are diagonal, we conclude that $Q_{roll} Q_{pitch} Q_{yaw}$ is a diagonal matrix (with entries $\pm 1$) times $(Q^T)^{-1}$. Now convince yourself that the angles can be chosen so that the diagonal matrix is the identity. This method for proving this property is particularly nice because it leads to a fast algorithm that we can use in Challenge 4 to recover the Euler angles given an orthogonal matrix $Q$.

CHALLENGE 12.2. A sample MATLAB program to solve this problem is available on the website. The results are shown in Figure 12.1. In most cases, 2-4 digit accuracy is achieved for the angles and positions, but trouble is observed when the pitch is close to vertical ($\pm\pi/2$).

[Figure 12.1: four panels. Top row: "Problem 2" and "Problem 4", angle (radians) vs. sample number. Bottom row: "Problem 2" and "Problem 4", error (logarithmic scale) vs. sample number.]

Figure 12.1. Results of Challenge 2 (left) and Challenge 4 (right). The top graphs show the computed yaw (blue plusses), pitch (green circles), and roll (red x's), and the bottom graphs show the error in Q (blue plusses) and the error in the rotated positions (green circles).

CHALLENGE 12.3.

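(a) Suppose that $C$ is $m \times n$. The first fact follows from

$$\mathrm{trace}(C^T C) = \sum_{k=1}^{n} \left( \sum_{i=1}^{m} c_{ik}^2 \right) = \sum_{k=1}^{n} \sum_{i=1}^{m} c_{ik}^2 = \|C\|_F^2 .$$

To prove the second fact (with $D$ of size $n \times m$), note that

$$\mathrm{trace}(CD) = \sum_{k=1}^{m} \left( \sum_{i=1}^{n} c_{ki} d_{ik} \right),$$

while

$$\mathrm{trace}(DC) = \sum_{i=1}^{n} \left( \sum_{k=1}^{m} d_{ik} c_{ki} \right),$$

which is the same.

(b) Note that

$$\|B - QA\|_F^2 = \mathrm{trace}((B - QA)^T (B - QA)) = \mathrm{trace}(B^T B + A^T A) - 2\,\mathrm{trace}(A^T Q^T B),$$

so we can minimize the left-hand side by maximizing $\mathrm{trace}(A^T Q^T B)$.

(c) We compute

$$\mathrm{trace}(Q^T B A^T) = \mathrm{trace}(Q^T U \Sigma V^T) = \mathrm{trace}(V^T Q^T U \Sigma) = \mathrm{trace}(Z \Sigma) = \sum_{i=1}^{m} \sigma_i z_{ii} \le \sum_{i=1}^{m} \sigma_i , \qquad (12.1)$$

where the inequality follows from the fact that elements of an orthogonal matrix lie between $-1$ and $1$.
(d) Since $Z = V^T Q^T U$, we have $Z = I$ if $Q = U V^T$.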

CHALLENGE 12.4. The results are shown in Figure 12.1. The computed results are much better than those of Challenge 2, with errors at most $10^{-14}$ and no trouble when the pitch is close to vertical.

CHALLENGE 12.5. We compute

$$\|B - QA - t e^T\|_F^2 = \sum_{i=1}^{m} \left( \sum_{j=1}^{n} \left[ (B - QA)_{ij}^2 - 2 t_i (B - QA)_{ij} \right] + n t_i^2 \right),$$

and setting the partial derivative with respect to $t_i$ to zero yields

$$t_i = \frac{1}{n} \sum_{j=1}^{n} (B - QA)_{ij} .$$

Therefore,

$$t = \frac{1}{n} \sum_{j=1}^{n} b_j - Q \, \frac{1}{n} \sum_{j=1}^{n} a_j = c_B - Q c_A .$$

This very nice observation was made by Hanson and Norris [1].
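
The two results above (the rotation $Q = UV^T$ from Challenge 12.3 and the translation $t = c_B - Q c_A$) can be combined in a few lines. The following is only a sketch under stated assumptions: the three plane rotations, the points in `A`, and the translation `ttrue` are made-up test data, not the data or the rotation conventions of the posted programs.

```matlab
% Sketch: recover Q and t from B = Q*A + t*e' using centroids and the SVD.
% The angles, points, and translation below are made-up test data.
phi = 0.3; theta = -0.2; psi = 0.5;
Rz = [cos(phi) -sin(phi) 0; sin(phi) cos(phi) 0; 0 0 1];
Ry = [cos(theta) 0 sin(theta); 0 1 0; -sin(theta) 0 cos(theta)];
Rx = [1 0 0; 0 cos(psi) -sin(psi); 0 sin(psi) cos(psi)];
Qtrue = Rx*Ry*Rz;                 % an orthogonal test matrix
ttrue = [1; -2; 0.5];

A = [0 1 0 0 1; 0 0 1 0 1; 0 0 0 1 2];   % 3 x n, one point per column
n = size(A, 2);
e = ones(n, 1);
B = Qtrue*A + ttrue*e';

cA = A*e/n;  cB = B*e/n;          % centroids of the two point sets
A0 = A - cA*e';  B0 = B - cB*e';  % centered data

[U, S, V] = svd(B0*A0');          % B0*A0' = U*S*V'
Q = U*V';                         % maximizes trace(Q'*B0*A0')
t = cB - Q*cA;

errQ = norm(Q - Qtrue);
errt = norm(t - ttrue);
fprintf('error in Q: %g, error in t: %g\n', errQ, errt);
```

With noisy data the matrix $UV^T$ is still orthogonal but is not guaranteed to have determinant $+1$, so a check of `det(Q)` may be worthwhile before interpreting it as a rotation.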

CHALLENGE 12.6. The results are shown in Figure 12.2. With no perturbation, the errors in the angles, the error in the matrix $Q$, and the RMSD are all less than $10^{-15}$. With perturbation in each element uniformly distributed between $-10^{-3}$ and $10^{-3}$, the errors rise to about $10^{-4}$.
Comparison of the SVD method with other methods can be found in [2] and
[3], although none of these authors knew that the method was due to Hanson and
Norris.

CHALLENGE 12.7.
(a) Yes. Since in this case the rank of matrix $A$ is 1, we have two singular values $\sigma_2 = \sigma_3 = 0$. Therefore we only need $z_{11} = 1$ in equation (12.1) and we don't care about the values of $z_{22}$ or $z_{33}$.
(b) Degenerate cases result from unfortunate choices of the points in $A$ and $B$. If all of the points in $A$ lie on a line or a plane, then there exist multiple solution matrices $Q$. Additionally, if two singular values of the matrix $B A^T$ are nonzero but equal, then small perturbations in the data can create large changes in the matrix $Q$. See [1].
(c) A degenerate case and a case of gimbal lock are illustrated on the website.

[1] Richard J. Hanson and Michael J. Norris, "Analysis of measurements based on the singular value decomposition," SIAM J. Scientific and Statistical Computing, 2(3):363-373, 1981.
[2] Kenichi Kanatani, "Analysis of 3-d rotation fitting," IEEE Transactions on Pattern Analysis and Machine Intelligence, 16(5):543-549, May 1994.
[3] D.W. Eggert, A. Lorusso, and R.B. Fisher, "Estimating 3-d rigid body transformations: a comparison of four major algorithms," Machine Vision and Applications, 9:272-290, 1997.

[Figure 12.2: four panels. Top row: "sigma = 0.000" and "sigma = 0.001", angle (radians) vs. sample number. Bottom row: "sigma = 0.000" and "sigma = 0.001", error (logarithmic scale) vs. sample number.]

Figure 12.2. Results of Challenge 6. The left column shows the result with no perturbation and the right column with perturbation of order $10^{-3}$. The top graphs show the computed yaw (blue plusses), pitch (green circles), and roll (red x's), and the bottom graphs show the error in Q (blue plusses) and the error in the rotated positions (green circles).

Chapter 13

Solutions: Case Study: Fitting Exponentials: An Interest in Rates

CHALLENGE 13.1. Sample MATLAB programs to solve this problem (and the others in this chapter) are available on the website. The results are shown in Figures 13.1 and 13.2. Note that the shapes of the w clusters are rather circular; the sensitivity in the two components is approximately equal. This is not true of the x clusters; they are elongated in the direction of the singular vector corresponding to the smallest singular value, since small changes in the data in this direction cause large changes in the solution. The length of the x cluster (and thus the sensitivity of the solution) is greater in Figure 13.2 because the condition number is larger.

CHALLENGE 13.2. The results are shown in Figure 13.3. One thing to note is that the sensitivity is not caused by the conditioning of the linear parameters; as $t_{final}$ is varied, the condition number $\kappa(A)$ varies from 62 to 146, which is quite small. But the plots dramatically illustrate the fact that a wide range of $\alpha$ values produces small residuals for this problem. This is an inherent limitation in the problem and we cannot change it. It means, though, that we need to be very careful in computing and reporting results of exponential fitting.
One important requirement on the data is that there be a sufficiently large number of points in the range where each of the exponential terms is large.
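
To see how a residual norm can be evaluated as a function of $\alpha$ alone, one can eliminate the linear coefficients by solving a small least squares problem for each trial $\alpha$. The sketch below assumes a two-term model $x_1 e^{\alpha_1 t} + x_2 e^{\alpha_2 t}$, noise-free data, and illustrative values for the time grid and coefficients; it is not the posted program and does not attempt to reproduce the condition numbers quoted above.

```matlab
% Sketch: for fixed alpha, the best linear coefficients x solve a least
% squares problem, so the residual norm becomes a function of alpha only.
% All data values here are illustrative assumptions.
t = linspace(0, 6, 100)';
alpha_true = [-0.3, -0.4];
x_true = [0.5; 0.5];
y = exp(t*alpha_true)*x_true;            % noise-free data

% exp(t*alpha) is the 100 x 2 matrix with columns exp(alpha(1)*t), exp(alpha(2)*t)
resnorm = @(alpha) norm(y - exp(t*alpha)*(exp(t*alpha)\y));

r_true = resnorm(alpha_true);
r_off  = resnorm([-0.25, -0.45]);        % a noticeably different pair of rates
fprintf('kappa(A) at the true alpha: %g\n', cond(exp(t*alpha_true)));
fprintf('residual at the true alpha: %g\n', r_true);
fprintf('residual at [-0.25, -0.45]: %g\n', r_off);
```

Evaluating `resnorm` over a grid of $(\alpha_1, \alpha_2)$ values and passing the result to `contour` gives plots of the kind described in this challenge.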

CHALLENGE 13.3. When the true α = [−0.3, −0.4], the computations with 4
parameters produced unreliable results: [−0.343125, −2.527345] for the first guess
and [−0.335057, −0.661983] for the second. The results for 2 parameters were some-
what better but still unreliable: [−0.328577, −0.503422] for the first guess and
[−0.327283, −0.488988] for the second. Note that all of the runs produced one
significant figure for the larger of the rate constants but had more trouble with the
smaller.


[Figure 13.1: three rows of two scatter plots. Left column: change in x2 vs. change in x1. Right column: change in w2 vs. change in w1.]

Figure 13.1. Challenge 1, with $\alpha = -[0.3, 0.4]$ and $\eta = 10^{-6}$ (top row), $\eta = 10^{-4}$ (middle row), and $\eta = 10^{-2}$ (bottom row). On the left, we plot the two components of $x - x_{true}$ and on the right $w - w_{true}$.

For the harder problem, when the true α = [−0.30, −0.31], the computations with 4 parameters produced [−0.304889, −2.601087] for the first guess and [−0.304889, −2.601087] for the second. The results for 2 parameters were again better but unreliable for the smaller rate constant: [−0.304889, −0.866521] for the first guess and [−0.304889, −0.866521] for the second.
The residuals for each of these fits are plotted in Figures 13.4 and 13.5. From the fact that none of the residuals from our computed solutions for the first problem resemble white noise, we can note that the solutions are not good approximations to the data. Troubles in the second problem are more difficult to diagnose, since the residual looks rather white. A single exponential function gives a good approximation to this data and the second term has very little effect on the residual. This is true to a lesser extent for the first dataset.

[Figure 13.2: three rows of two scatter plots. Left column: change in x2 vs. change in x1. Right column: change in w2 vs. change in w1.]

Figure 13.2. Challenge 1, with $\alpha = -[0.30, 0.31]$ and $\eta = 10^{-6}$ (top row), $\eta = 10^{-4}$ (middle row), and $\eta = 10^{-2}$ (bottom row). On the left, we plot the two components of $x - x_{true}$ and on the right $w - w_{true}$.

Note that results will vary with the particular sequence of random errors
generated.

[Figure 13.3: six contour plots in the (α1, α2) plane, both axes from −0.8 to −0.2, titled "tfinal = 1" through "tfinal = 6".]

Figure 13.3. Challenge 2. Contour plots of the residual norm as a function of the estimates of $\alpha$ for various values of $t_{final}$. The contours marked are $10^{-2}$, $10^{-6}$, and $10^{-10}$.

CHALLENGE 13.4. We solved this problem using MATLAB's lsqnonlin and the two parameters α using several initial guesses: [−1, −2], [−5, −6], [−2, −6], [0, −6], and [−1, −3]. All runs except the fourth produced values α = [−1.6016, −2.6963] and a residual of 0.0024011. The fourth run produced a residual of 0.49631. The residuals for the five runs are shown in Figure 13.6. The four "good" residuals look like white noise of size about $10^{-4}$, giving some confidence in the fit.
We tested the sensitivity of the residual norm to changes in the parameters by creating a contour plot in the neighborhood of the optimal values computed above, shown in Figure 13.7. If the contours were square, then reporting the uncertainty in α as ± some value would be appropriate, but as we can see, this is far from the case. The log = −2.6 contour outlines a set of α values that changes the residual norm by less than 1%, the log = −2.5 contour denotes a change of less than 5%, and the log = −2.36 contour corresponds to a 10% change. The best value found was α = [−1.601660, −2.696310], with residual norm $0.002401137 = 10^{-2.6196}$. Our uncertainty in the rate constants is rather large.
The "true solution," the value used to generate the data, was α = [−1.6, −2.7] with $x_1 = -x_2 = 0.75$, and the standard deviation of the white noise was $10^{-4}$.
Variants of Prony's method [1] provide alternate approaches to exponential fitting.
Exponential fitting is a very difficult problem, even when the number of terms n is known. It becomes even easier to be fooled when determining n is part of the problem!

[Figure 13.4: five residual plots vs. t on [0, 6]. Top row: 1st guess. Middle row: 2nd guess. Left: 4 parameters. Right: 2 parameters. Bottom row: true parameters.]

Figure 13.4. Challenge 3. Residuals produced for the data with true α =
[−0.3, −0.4] by minimizing with 2 or 4 parameters and two initial guesses, and the
residual provided by the true parameter values.

[1] M. R. Osborne and G. K. Smyth, "A modified Prony algorithm for exponential function fitting," SIAM J. Scientific Computing, 16(1):119-138, 1995.

[Figure 13.5: five residual plots vs. t on [0, 6]. Top row: 1st guess. Middle row: 2nd guess. Left: 4 parameters. Right: 2 parameters. Bottom row: true parameters.]


Figure 13.5. Challenge 3. Residuals produced for the data with true α =
[−0.30, −0.31] by minimizing with 2 or 4 parameters and two initial guesses, and
the residual provided by the true parameter values.

[Figure 13.6: six plots vs. t on [0, 6], arranged in three rows of two.]

Figure 13.6. Challenge 4. Residuals for the five computed solutions (residual component vs t), and, in the lower right, the data.

[Figure 13.7: contour plot with α1 from −1.64 to −1.57 on the horizontal axis and α2 from −2.73 to −2.66 on the vertical axis; contours labeled −2.6, −2.55, −2.5, and −2.36.]


Figure 13.7. Challenge 4. Contour plot of log10 of the norm of the residual
for various values of the α parameters.

Chapter 14

Solutions: Case Study: Blind Deconvolution: Errors, Errors Everywhere

CHALLENGE 14.1. See the posted program problem1_and_3.m. The program is not difficult, but it is important to make sure that you do the SVD only once (at a cost of $O(mn^3)$) and then form each of the trial solutions at a cost of $O(n^2)$. This requires using the associative law of multiplication.
In fact, it is possible to form each solution by updating a previous one (by adding the newly added term) at a cost of $O(n)$, and this would be an even better algorithm, left as an exercise.
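
One possible shape for the updating idea is sketched below. It assumes the trial solutions are the truncated-SVD solutions $\sum_{i \le k} (u_i^T g / \sigma_i) v_i$; the matrix `K` and vector `g` are made-up test data, and the variable names are not taken from the posted program.

```matlab
% Sketch: form all truncated-SVD solutions by updating, with O(n) work per
% added term once the SVD and the coefficients U'*g are available.
% K and g are made-up test data.
m = 8; n = 5;
K = zeros(m, n);
for i = 1:m
  for j = 1:n
    K(i, j) = 1/(i + j - 1) + (i == j);   % arbitrary full-rank test matrix
  end
end
g = (1:m)';

[U, S, V] = svd(K, 0);        % economy-size SVD, computed only once
sigma = diag(S);
c = (U'*g) ./ sigma;          % coefficients c_i = (u_i'*g)/sigma_i

Fsol = zeros(n, n);           % column k holds the solution with k terms retained
f = zeros(n, 1);
for k = 1:n
  f = f + c(k)*V(:, k);       % add only the newly retained term
  Fsol(:, k) = f;
end

err = norm(Fsol(:, n) - K\g);
fprintf('difference from the full least squares solution: %g\n', err);
```

Retaining all $n$ terms reproduces the ordinary least squares solution, which gives a simple check on the update loop.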

CHALLENGE 14.2.
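(a) We know that

$$\begin{bmatrix} K & g \end{bmatrix} = \sum_{i=1}^{n+1} \tilde{\sigma}_i \tilde{u}_i \tilde{v}_i^T ,$$

so using the formula for $\begin{bmatrix} E & r \end{bmatrix}$ we see that

$$\begin{bmatrix} K & g \end{bmatrix} + \begin{bmatrix} E & r \end{bmatrix} = \sum_{i=1}^{n} \tilde{\sigma}_i \tilde{u}_i \tilde{v}_i^T .$$

Now, since $\tilde{v}_{n+1}$ is orthogonal to $\tilde{v}_i$ for $i = 1, \ldots, n$, it follows that

$$\left( \begin{bmatrix} K & g \end{bmatrix} + \begin{bmatrix} E & r \end{bmatrix} \right) \begin{bmatrix} f \\ -1 \end{bmatrix} = -\frac{1}{\tilde{v}_{n+1,n+1}} \sum_{i=1}^{n} \tilde{\sigma}_i \tilde{u}_i \tilde{v}_i^T \tilde{v}_{n+1} = 0 .$$

Note that $\|[E, r]\|_F = \tilde{\sigma}_{n+1}$.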


(b) This can be proven using the fact that $\|A\|_F^2 = \mathrm{trace}(A^T A)$ where $\mathrm{trace}(B)$ is the trace of the matrix $B$, equal to the sum of its diagonal elements (or the sum of its eigenvalues). We can use the fact that $\mathrm{trace}(AB) = \mathrm{trace}(BA)$. (See Challenge 12.3.)
It can also be proven just from the definition of the Frobenius norm and the fact that $\|Ux\|_2 = \|x\|_2$ for all vectors $x$ and orthogonal matrices $U$. Using this fact, and letting $a_i$ be the $i$th column of $A$, we see that

$$\|UA\|_F^2 = \sum_{i=1}^{n} \|U a_i\|_2^2 = \sum_{i=1}^{n} \|a_i\|_2^2 = \|A\|_F^2 .$$

Similarly, letting $\tilde{a}_i^T$ be the $i$th row of $A$,

$$\|AV\|_F^2 = \sum_{i=1}^{m} \|\tilde{a}_i^T V\|_2^2 = \sum_{i=1}^{m} \|\tilde{a}_i\|_2^2 = \|A\|_F^2 ,$$

and the conclusion follows.


(c) From the constraint

$$\begin{bmatrix} K + E & g + r \end{bmatrix} \begin{bmatrix} f \\ -1 \end{bmatrix} = 0 ,$$

we see that

$$\tilde{U}^T \begin{bmatrix} K + E & g + r \end{bmatrix} \tilde{V} \tilde{V}^T \begin{bmatrix} f \\ -1 \end{bmatrix} = 0 ,$$

so

$$(\tilde{\Sigma} + \tilde{E}) \tilde{f} = 0 ,$$

where $\tilde{E} = \tilde{U}^T [E, r] \tilde{V}$ and $\tilde{f} = \tilde{V}^T \begin{bmatrix} f \\ -1 \end{bmatrix}$. From part (b) we know that minimizing $\|[E, r]\|_F$ is the same as minimizing $\|\tilde{E}\|_F$.
Therefore, to solve our problem, we want to make the smallest change to $\tilde{\Sigma}$ that makes the matrix $\tilde{\Sigma} + \tilde{E}$ rank deficient, so that the constraint can be satisfied by a nonzero $\tilde{f}$. Changing the $(n+1, n+1)$ element of $\tilde{\Sigma}$ from $\tilde{\sigma}_{n+1}$ to 0 certainly makes the constraint feasible (by setting the last component of $\tilde{f}$ nonzero and the other components zero). Any other change gives a bigger $\|\tilde{E}\|_F$. Thus the smallest value of the minimization function is $\tilde{\sigma}_{n+1}$, and since we verified in part (a) that our solution has this value, we are finished.
If you don't find that argument convincing, we can be more precise. We use a fact found in the first pointer in Chapter 2: for any matrix $B$ and vector $z$ for which $Bz$ is defined, $\|Bz\|_2 \le \|B\|_2 \|z\|_2$, where $\|B\|_2$ is defined to be the largest singular value of $B$. Therefore,

• $\|B\|_2 \le \|B\|_F$, since we can see from part (b) and the singular value decomposition of $B$ that the Frobenius norm of $B$ is just the square root of the sum of the squares of its singular values.

• If $(\tilde{\Sigma} + \tilde{E}) \tilde{f} = 0$, then $\tilde{\Sigma} \tilde{f} = -\tilde{E} \tilde{f}$.



• $\tilde{\sigma}_{n+1}^2 \|\tilde{f}\|_2^2 = \sum_{i=1}^{n+1} \tilde{\sigma}_{n+1}^2 \tilde{f}_i^2 \le \sum_{i=1}^{n+1} \tilde{\sigma}_i^2 \tilde{f}_i^2 = \|\tilde{\Sigma} \tilde{f}\|_2^2$.

• Therefore, $\tilde{\sigma}_{n+1} \|\tilde{f}\|_2 \le \|\tilde{\Sigma} \tilde{f}\|_2 = \|\tilde{E} \tilde{f}\|_2 \le \|\tilde{E}\|_2 \|\tilde{f}\|_2 \le \|\tilde{E}\|_F \|\tilde{f}\|_2$, so we conclude that $\|\tilde{E}\|_F \ge \tilde{\sigma}_{n+1}$, and we have found a minimizing solution.

[Figure 14.1: four panels titled "Retaining 12 singular values", "Retaining 15 singular values", "Retaining 17 singular values", and "Retaining 21 singular values"; horizontal axes from 1 to 5.]

Figure 14.1. Computed least squares solutions (counts vs. energies) for various values of the cutoff parameter $\tilde{n}$.
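
The construction in part (a) can be checked numerically. The sketch below uses a small made-up `K` and `g` (assumptions, not the challenge data), forms $[E\ r]$ by removing the last term of the SVD of $[K\ g]$, and confirms both the constraint and the norm $\tilde{\sigma}_{n+1}$.

```matlab
% Sketch of the total least squares construction from part (a).
% K and g are made-up data; the names E, r, f follow the text.
K = [1 0; 0 1; 1 1; 1 -1];
g = [1.1; 1.9; 3.05; -0.98];           % roughly K*[1; 2] plus a small perturbation
n = size(K, 2);

[U, S, V] = svd([K g], 0);
s = diag(S);
Er = -s(n+1)*U(:, n+1)*V(:, n+1)';     % [E r]: cancels the last SVD term
f = -V(1:n, n+1)/V(n+1, n+1);          % makes [f; -1] parallel to v_{n+1}

constraint = norm(([K g] + Er)*[f; -1]);
normEr = norm(Er, 'fro');
fprintf('constraint residual: %g\n', constraint);
fprintf('||[E r]||_F = %g, sigma_{n+1} = %g\n', normEr, s(n+1));
```

The division by `V(n+1, n+1)` fails when that entry is zero, which is the nongeneric case in which no solution of this form exists.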

CHALLENGE 14.3. See the program problem1_and_3.m on the website.

CHALLENGE 14.4.
Model 1: Least squares.
To estimate the variance of the error, note that in the least squares model, the last 5 components of the right-hand side $U^T g$ cannot be zeroed by any choice of $f$, so if we believe the model, we believe that these are entirely due to error. All other components should have at least some data in addition to noise. Therefore, estimate the variance using the last 5 to get $\delta^2 = 1.2349 \times 10^{-4}$.

[Figure 14.2: plot titled "L-curve for LS"; solution norm vs. residual norm, both on logarithmic scales.]

Figure 14.2. The L-curve for least squares solutions.

[Figure 14.3: four panels titled "Retaining 12 singular values", "Retaining 15 singular values", "Retaining 17 singular values", and "Retaining 21 singular values"; horizontal axes from 1 to 5.]

Figure 14.3. Computed total least squares solutions (counts vs. energies) for various values of the cutoff parameter $\tilde{n}$.

[Figure 14.4: plot titled "L-curve for TLS"; solution norm vs. residual norm, both on logarithmic scales.]

Figure 14.4. The L-curve for total least squares solutions.

The condition number of the matrix, the ratio of largest to smallest singular value, is 61.8455. This is a well-conditioned matrix! Most spectroscopy problems have a very ill-conditioned matrix. (An ill-conditioned one would have a condition number of $10^3$ or more.) This is a clue that there is probably error in the matrix, moving the small singular values away from zero.
We try various choices of $\tilde{n}$, the number of singular values retained, and show the results in Figure 14.1 (blue solid curves). The discrepancy principle predicts the residual norm to be $\delta \sqrt{m} = 0.0577$. This is most closely matched by retaining 21 singular values, which gives 7 peaks, contradicting the given information that there are at most 5 peaks. It also produces some rather large magnitude negative peaks, and we know that counts need to be nonnegative. So the least squares model does not seem to fit the data well.
An alternate way to pick ñ is to use the L-curve. This is a plot of the log of
the solution norm vs the log of the residual norm. It is called an L-curve because its
shape often resembles that of the letter L. What we really want is a small residual
and a solution norm that is not unreasonably big. So it has been proposed that we
take the value of ñ at the corner of the L-curve, since if we take a smaller ñ, the
residual norm increases fast, and if we take a larger one, the solution norm increases
fast. This plot, shown in Figure 14.2, advises that we should retain 15-17 singular
values, and referring to Figure 14.1, this yields 4 peaks, consistent with our given
information.
(The theoretical properties of the L-curve, as well as any other method that does not require the variance to be given in advance, are not good, but often this method is more robust to errors in assumptions about the model, such as underestimating the variance or not accounting for errors in the matrix.)
An alternate heuristic is to look for a value of ñ that makes the residual look
most like white noise, but since our error is not normally distributed, this heuristic
doesn’t have much meaning for our problem.
An excellent way to approach this problem is to generate your own test data,
for which you know the true solution, and use it to gain insight into the choice of
ñ.
Model 2: TLS.
Sample solutions are shown in Figure 14.3. The discrepancy principle doesn’t
give much insight for TLS, so we use more heuristic methods, giving us even less
confidence in our choice.
For example, from Figure 14.4 we see that the L-curve corner occurs when 15
singular values are retained, giving a solution that looks very much like the L-curve
least squares solution. Because the number of peaks is reasonable, and because
there are only a small number of negative values in the solution, and these have
small magnitude, we might accept this solution.
Now we need to extract the energies and estimated counts for the 4 types of
particles. I have normalized so that the count for the lowest energy peak is 1.

The Computed Estimate to Energy Levels and Counts

bin centers        2.55   3.25   3.55   3.85
relative counts    1.00   1.39   1.91   0.90

A spectroscopist would actually estimate the counts by taking the integral under each of the 4 peaks, and estimate the energy by the centroid of the peak, but this is difficult since three of the peaks are not well separated.
The Truth.
The program used to generate the data is posted. The variance of the error is $10^{-4}$.

The True Energy Levels and Counts

energy             2.54   3.25   3.53   3.85
relative counts    1      1.5    2      1

So, despite all of the errors, our computed solution estimates the energy levels
to 2 digits and the relative counts to within 10%.

Chapter 15

Solutions: Case Study: Blind Deconvolution: A Matter of Norm

CHALLENGE 15.1. Writing out the expressions $Ef$ and $F\hat{e}$ component by component, we find that $F$ is a Toeplitz matrix of size $m \times (m + n - 1)$ with first row equal to $[f_n, f_{n-1}, \ldots, f_1, 0, \ldots, 0]$ and first column equal to $[f_n, 0, \ldots, 0]$.
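
A quick numerical check of this structure is sketched below. It relies on the indexing $E_{ij} = \hat{e}_{n+i-j}$ noted in Challenge 15.2; the sizes and the entries of `f` and `ehat` are made-up values chosen only for illustration.

```matlab
% Sketch: build E from ehat (E(i,j) = ehat(n+i-j)) and the Toeplitz matrix F
% described above, then check that E*f equals F*ehat. Data values are made up.
m = 5; n = 3;
f = [2; -1; 3];
ehat = (1:m+n-1)'/10;

% toeplitz(c, r) takes the first column c and the first row r.
E = toeplitz(ehat(n:n+m-1), ehat(n:-1:1));             % m x n
F = toeplitz([f(n); zeros(m-1, 1)], ...
             [f(n:-1:1); zeros(m-1, 1)]);              % m x (m+n-1)

diffEF = norm(E*f - F*ehat);
fprintf('||E*f - F*ehat|| = %g\n', diffEF);
```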

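CHALLENGE 15.2. Let $d = [1, \sqrt{2}, \ldots, \sqrt{n}, \ldots, \sqrt{n}, \sqrt{n-1}, \ldots, 1]$ be a vector of length $m + n - 1$ and let $D$ be the diagonal matrix with entries $d$. Then

$$\begin{aligned}
F(\hat{e}, f) &\equiv \frac{1}{2} \|E\|_F^2 + \frac{1}{2} \|r\|_2^2 \\
&= \frac{1}{2} \sum_{i=1}^{m+n-1} d_i^2 \hat{e}_i^2 + \frac{1}{2} \sum_{i=1}^{m} r_i^2 \\
&= \frac{1}{2} \sum_{i=1}^{m+n-1} d_i^2 \hat{e}_i^2 + \frac{1}{2} \sum_{i=1}^{m} \Big( g_i - \sum_{j=1}^{n} (k_{ij} + e_{ij}) f_j \Big)^2 .
\end{aligned}$$

We need the gradient and Hessian matrix of this function. Noting that $E_{ij} = \hat{e}_{n+i-j}$, and letting $\delta_{ij} = 0$ if $i \ne j$ and 1 if $i = j$, we compute

$$\frac{\partial F(\hat{e}, f)}{\partial \hat{e}_\ell} = d_\ell^2 \hat{e}_\ell - \sum_{i=1}^{m} r_i f_{n+i-\ell} = (D^2 \hat{e} - F^T r)_\ell ,$$

$$\frac{\partial F(\hat{e}, f)}{\partial f_\ell} = -\sum_{i=1}^{m} r_i (k_{i\ell} + e_{i\ell}) = -((K + E)^T r)_\ell ,$$

$$\frac{\partial^2 F(\hat{e}, f)}{\partial \hat{e}_\ell \, \partial \hat{e}_q} = \delta_{\ell,q} d_\ell^2 + \sum_{i=1}^{m} f_{n+i-\ell} f_{n+i-q} = (D^2 + F^T F)_{\ell q} ,$$

$$\frac{\partial^2 F(\hat{e}, f)}{\partial \hat{e}_\ell \, \partial f_q} = r_{\ell+q} + \sum_{i=1}^{m} (k_{iq} + e_{iq}) f_{n+i-\ell} = (R + (K + E)^T F)_{q\ell} ,$$

$$\frac{\partial^2 F(\hat{e}, f)}{\partial f_\ell \, \partial f_q} = \sum_{i=1}^{m} (k_{i\ell} + e_{i\ell})(k_{iq} + e_{iq}) = ((K + E)^T (K + E))_{\ell q} ,$$

where out-of-range entries in summations are assumed to be zero and $R$ is a matrix whose nonzero entries are components of $r$. So

$$g = \nabla F(\hat{e}, f) = \begin{bmatrix} D^2 \hat{e} - F^T r \\ -(K + E)^T r \end{bmatrix} ,$$

$$H(\hat{e}, f) = \begin{bmatrix} D^2 + F^T F & R^T + F^T (K + E) \\ R + (K + E)^T F & (K + E)^T (K + E) \end{bmatrix} .$$

The Newton direction is the solution to $H(\hat{e}, f) p = -g$.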

CHALLENGE 15.3. The least squares problem is of the form

$$\min_x \|Ax - b\|_2 ,$$

where

$$x = \begin{bmatrix} \Delta \hat{e} \\ \Delta f \end{bmatrix}$$

and $A$ and $b$ are the given matrix and vector. So to minimize $\|Ax - b\|_2^2 = (Ax - b)^T (Ax - b)$, we set the derivative equal to zero, obtaining

$$A^T A x - A^T b = 0 .$$

The solution to this equation is a minimizer if the second derivative $A^T A$ is positive definite (which requires that $A$ have full column rank). Returning to our original notation, we obtain

$$\begin{bmatrix} F & K + E \\ D & 0 \end{bmatrix}^T \begin{bmatrix} F & K + E \\ D & 0 \end{bmatrix} \begin{bmatrix} \Delta \hat{e} \\ \Delta f \end{bmatrix} = -\begin{bmatrix} F & K + E \\ D & 0 \end{bmatrix}^T \begin{bmatrix} -r \\ D \hat{e} \end{bmatrix} ,$$

and this matches the expression $Hp = -g$ from Challenge 2 except that the matrix $R$ (which should be small if the model is good) is omitted.

CHALLENGE 15.4. See the MATLAB program posted on the website.

CHALLENGE 15.5.
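
(a) Given any $\Delta \hat{e}$, $\Delta f$, let

$$\begin{bmatrix} \bar{\sigma}_1 \\ \bar{\sigma}_2 \\ \bar{\sigma}_3 \end{bmatrix} = \left| \begin{bmatrix} F & K + E \\ D & 0 \\ 0 & \lambda I \end{bmatrix} \begin{bmatrix} \Delta \hat{e} \\ \Delta f \end{bmatrix} + \begin{bmatrix} -r \\ D \hat{e} \\ \lambda f \end{bmatrix} \right| ,$$

where the absolute value is taken componentwise. Then $\Delta \hat{e}$, $\Delta f$, $\bar{\sigma}_1$, $\bar{\sigma}_2$, and $\bar{\sigma}_3$ form a feasible solution to the linear programming problem, and

$$\bar{\sigma} = \sum_{i=1}^{m} \bar{\sigma}_{1i} + \sum_{i=1}^{q} \bar{\sigma}_{2i} + \sum_{i=1}^{n} \bar{\sigma}_{3i} = \left\| \begin{bmatrix} F & K + E \\ D & 0 \\ 0 & \lambda I \end{bmatrix} \begin{bmatrix} \Delta \hat{e} \\ \Delta f \end{bmatrix} + \begin{bmatrix} -r \\ D \hat{e} \\ \lambda f \end{bmatrix} \right\|_1 .$$

Therefore, a solution to the linear programming problem minimizes the norm, and a minimizer of the norm is a solution to the linear programming problem, so the two are equivalent.
(b) By similar reasoning, we obtain

$$\min_{\Delta \hat{e}, \Delta f, \bar{\sigma}} \bar{\sigma}$$

subject to

$$\begin{aligned}
-\bar{\sigma} \mathbf{1} &\le F \Delta \hat{e} + (K + E) \Delta f - r \le \bar{\sigma} \mathbf{1} , \\
-\bar{\sigma} \mathbf{1} &\le D \Delta \hat{e} + D \hat{e} \le \bar{\sigma} \mathbf{1} , \\
-\bar{\sigma} \mathbf{1} &\le \lambda \Delta f + \lambda f \le \bar{\sigma} \mathbf{1} ,
\end{aligned}$$

where $\mathbf{1}$ is a column vector with each entry equal to 1, and of dimension $m$ in the first two inequalities, $q$ in the second two, and $n$ in the last two.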

CHALLENGE 15.6. See the MATLAB program posted on the website.

CHALLENGE 15.7. Results for various values of λ are shown in Figures 15.1 and 15.2. The estimated counts are summarized in the following table:

The Computed Estimate to Energy Levels and Counts

bin centers      2.55   3.25   3.55   3.85
True counts      1.00   1.50   2.00   1.00
Least Squares    1.00   1.39   1.91   0.90
STLS             1.00   1.20   1.59   0.64
STLN, 1-Norm     1.00   0.96   1.36   0.60

STLN using the ∞-norm produced counts that were sometimes quite negative; nonnegativity constraints could be added to improve the results. All of the structured algorithms had a linear convergence rate, rather than the quadratic rate expected from Newton's method, because the residual r in this problem is large, so the approximate Newton direction is not very accurate.
Least squares works best on this dataset, because the Toeplitz assumption used by the structured algorithms STLS and STLN is violated by the way the data was generated. It is worthwhile to generate a new data set, satisfying this assumption, and experiment further.

[Figure 15.1: four panels titled "2-norm, lambda = 0.02", "2-norm, lambda = 0.06", "2-norm, lambda = 0.16", and "2-norm, lambda = 0.40"; horizontal axes from 1 to 5.]

Figure 15.1. Results from the Structured Total Least Squares algorithm for various values of λ.

[Figure 15.2: four panels titled "Infinity norm, lambda = 0.00", "Infinity norm, lambda = 0.06", "1-norm, lambda = 0.02", and "1-norm, lambda = 0.06"; horizontal axes from 1 to 5.]

Figure 15.2. Results from the Structured Total Least Norm algorithm, using the 1-norm and the ∞-norm, for various values of λ.

Unit IV

SOLUTIONS: Monte Carlo Computations

Chapter 16

Solutions: Monte Carlo Principles

CHALLENGE 16.1. The mean is the sum of the samples divided by the number of samples: $\mu_6 = 24/6 = 4$. The variance is

$$\sigma_6^2 = \frac{1}{6} \left[ (1-4)^2 + (2-4)^2 + (5-4)^2 + (8-4)^2 + (6-4)^2 + (2-4)^2 \right] = \frac{19}{3} .$$
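
The arithmetic can be confirmed with a few lines of MATLAB. This is only a check of the numbers above, using the six sample values that appear in the variance formula.

```matlab
% Check of the sample mean and variance (dividing by the number of samples).
x = [1 2 5 8 6 2];
mu = sum(x)/numel(x);
sigma2 = sum((x - mu).^2)/numel(x);
fprintf('mean = %g, variance = %g\n', mu, sigma2);
```

Note that MATLAB's `var(x)` divides by the number of samples minus one by default; `var(x, 1)` divides by the number of samples, as is done here.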

CHALLENGE 16.2. The probability of drawing each card is 1/10, so the mean of the distribution is

$$\mu = \frac{1}{10} (1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 10 + 10) = 5.6 .$$

The variance is

$$\begin{aligned}
\sigma^2 = \frac{1}{10} \big[ & (1 - 5.6)^2 + (2 - 5.6)^2 + (3 - 5.6)^2 + (4 - 5.6)^2 + (5 - 5.6)^2 + (6 - 5.6)^2 \\
& + (7 - 5.6)^2 + (8 - 5.6)^2 + (10 - 5.6)^2 + (10 - 5.6)^2 \big] = 9.04 .
\end{aligned}$$
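
The same quantities can be computed as probability-weighted sums, which is how the mean and variance of a discrete distribution are defined. The variable names below are illustrative.

```matlab
% Mean and variance of a discrete distribution with equal probabilities.
cards = [1 2 3 4 5 6 7 8 10 10];
p = ones(size(cards))/numel(cards);   % probability 1/10 for each card
mu = sum(p.*cards);
sigma2 = sum(p.*(cards - mu).^2);
fprintf('mean = %g, variance = %g\n', mu, sigma2);
```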

CHALLENGE 16.3. Clearly $f(x) \ge 0$, and

$$\int_0^1 3x^2 \, dx = x^3 \Big|_0^1 = 1 .$$

We calculate

$$\mu = \int_0^1 x \, (3x^2) \, dx = \frac{3x^4}{4} \Big|_0^1 = \frac{3}{4}$$

and

$$\sigma^2 = \int_0^1 (x - 3/4)^2 (3x^2) \, dx = 0.0375 .$$
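
One way to evaluate the variance integral by hand is to use the identity $\sigma^2 = E[x^2] - \mu^2$, which avoids expanding the square:

```latex
\sigma^2 = \int_0^1 x^2 (3x^2)\,dx - \mu^2
         = \frac{3}{5} - \left(\frac{3}{4}\right)^2
         = \frac{48}{80} - \frac{45}{80}
         = \frac{3}{80} = 0.0375 .
```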


CHALLENGE 16.4.

function y = strange_random()

% We subtract 2 from the average sample value for randmy, to make the mean 0.
% Then we divide by the standard deviation, to make the resulting variance 1.

y = sqrt(1000)*(sum(randmy(1000))/1000 - 2)/sqrt(5);

CHALLENGE 16.5. In this program, z is a sample from a uniform distribution on [0,1] and y is a sample from the desired distribution.
z = rand(1);
if (z < .6)
    y = 0;
else
    y = 1;
end

Chapter 17

Case Study: Monte-Carlo Minimization and Counting: One, Two, . . . , Too Many

(coauthored by Isabel Beichl and Francis Sullivan)

CHALLENGE 17.1. The programs myfmin.m and myfminL.m on the website solve this problem but do not make the graph.

CHALLENGE 17.2. (Partial Solution) The program sim_anneal.m on the website is one implementation of simulated annealing, and it can be run using problem1_and_2.m. To finish the problem, experiment with the program. Be sure to measure reliability as well as cost, and run multiple experiments to account for the fact that the method is randomized. Also comment on the number of runs that converge to x = 1.7922, which is a local minimizer with a function value not much worse than the global minimizer.
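
For readers without access to the posted program, the basic accept/reject loop of simulated annealing can be sketched as follows. This is not sim_anneal.m: the objective function, starting point, step size, fixed temperature, and number of trials are all assumptions chosen for illustration.

```matlab
% Minimal simulated annealing sketch for a 1-D function (not the posted
% sim_anneal.m). The objective, step size, and temperature are assumptions.
fun = @(x) x.^2/10 + sin(5*x);
x = 3;  fx = fun(x);
f0 = fx;                                 % starting value, kept for comparison
xbest = x;  fbest = fx;
T = 0.1;  step = 0.5;  ntrials = 5000;

for trial = 1:ntrials
  xnew = x + step*(2*rand - 1);          % random proposed move
  fnew = fun(xnew);
  % Always accept downhill moves; accept uphill moves with
  % probability exp(-(increase)/T).
  if fnew <= fx || rand < exp(-(fnew - fx)/T)
    x = xnew;  fx = fnew;
  end
  if fx < fbest                          % remember the best point seen
    xbest = x;  fbest = fx;
  end
end
fprintf('best x = %g, f = %g (started at f = %g)\n', xbest, fbest, f0);
```

Because the method is randomized, repeated runs give different answers; counting how often the best point lands near each local minimizer is one simple way to measure reliability.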

CHALLENGE 17.3.
(a) Experiments with MATLAB’s travel program show that it works well for up
to 50 cities but, as is to be expected, slows down for larger sets. It’s interesting and
important to note that the solution is always a tour that does not cross itself. We’ll
return to this point shortly.
(b) Figures 17.1–17.3 show the results of simulated annealing for 100 random locations with temperature T = 1, 0.1, 0.01, where "score" is the length of the tour. The actual tours for T = 0.1 and T = 0.01 are shown in Figures 17.4 and 17.5. Note that the result for 0.01 looks pretty good but not that much better than the output for T = 0.1. However, the T = 1 result looks completely random and gets nowhere near a low cost tour. This demonstrates that lowering the temperature really does give a better approximation. However, because the T = 0.01 tour crosses itself, we know that it's still not the true solution. And we don't know the true minimum score (distance) or an algorithm for setting and changing T. Figuring out how to vary T is called determining the cooling schedule. One generally wants to use a lower value of T as the solution is approached. The idea is to avoid a move that would bounce away from the solution when we're almost there.
How one designs a cooling schedule depends on analysis of the problem at hand. Some general techniques have been proposed but cooling schedules are still an active research topic.

[Figure 17.1: plot titled "TSP by simulated annealing, T=1"; score vs. trial, for trials 0 to 10000.]

Figure 17.1. TSP by simulated annealing, T = 1.

[Figure 17.2: plot titled "TSP by simulated annealing, T=0.1"; score vs. trial, for trials 0 to 10000.]

Figure 17.2. TSP by simulated annealing, T = 0.1.

[Figure 17.3: plot titled "TSP by simulated annealing, T=0.01"; score vs. trial, for trials 0 to 10000.]

Figure 17.3. TSP by simulated annealing, T = 0.01.

[Figure 17.4: plot titled "Tour after 10000 trials, T=0.1"; tour drawn in the unit square.]

Figure 17.4. Tour produced for TSP by simulated annealing, T = 0.1.
Figure 17.5. Tour produced for TSP by simulated annealing, T = 0.01. (Tour after 100000 trials.)

Figure 17.6. TSP scores by simulated annealing, T = 0.0215, logarithmic cooling
schedule. (Late stage score improvement, 100 cities, T0 = 0.0215, 86 temperature
changes; plot of score versus trial number, 10^5 trials.)

The most popular general approach to setting a cooling schedule is to change
T whenever a proposed move is accepted. Suppose that the initial temperature is
T0 and a proposed move is finally accepted after k trials. Then the temperature
is reset to T0 / log(k). The idea behind the use of log(k) in the denominator is
that the number of trials k required before generating a random number less than
exp(−1/T) is exp(1/T) on average, and so 1/T should look something like log(k).
This is the famous logarithmic cooling schedule [1].

Figure 17.7. TSP scores by simulated annealing, T = 0.0046, logarithmic cooling
schedule. (Very late stage score, 100 cities, T0 = 0.0046, 77 temperature changes;
plot of score versus trial number, 10^5 trials.)

Figures 17.6, 17.7 and 17.8 illustrate application of simulated annealing with
a logarithmic cooling schedule to a TSP with 100 random locations. The first two
graphs show how the score evolves over 100,000 trials at a low temperature. Note
that not many proposed moves that increase the score are accepted and that the
score does not improve very much. The last figure is a picture of the best tour
obtained. Because it crosses itself, it’s not the optimal tour. Getting that requires
more computation and/or more sophisticated cooling schedules. Solving TSP for
100 random locations is really quite difficult!
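
The logarithmic schedule described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the program used for the figures: the segment-reversal move, the default T0, and the guard `max(k, 3)` (which avoids dividing by log(1) = 0) are all choices made for this example.

```python
import math
import random

def tour_length(pts, tour):
    """Length of the closed tour visiting pts in the given order."""
    return sum(math.dist(pts[tour[i]], pts[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))

def anneal_tsp(pts, t0=0.1, n_trials=20000, seed=0):
    """Simulated annealing for the TSP; returns the best tour seen and its length."""
    rng = random.Random(seed)
    n = len(pts)
    tour = list(range(n))
    score = tour_length(pts, tour)
    best_tour, best_score = tour[:], score
    temp, k = t0, 0            # k counts trials since the last accepted move
    for _ in range(n_trials):
        k += 1
        i, j = sorted(rng.sample(range(n), 2))
        cand = tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]   # reverse a segment
        cand_score = tour_length(pts, cand)
        delta = cand_score - score
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            tour, score = cand, cand_score
            # Logarithmic schedule: reset T to T0 / log(k) after an acceptance.
            # Using max(k, 3) keeps log(k) > 1, so T stays below T0 (an assumption).
            temp = t0 / math.log(max(k, 3))
            k = 0
            if score < best_score:
                best_tour, best_score = tour[:], score
    return best_tour, best_score
```

Because the best tour seen is stored separately, an uphill move accepted late in the run cannot make the reported score worse than the starting tour.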
If you think this use of simulated annealing to attack the TSP seems quite
informal and heuristic rather than analytic, you’re right. In fact, some have argued
that simulated annealing is not really an optimization method but rather a
collection of heuristic techniques that help in some cases. However, there is an
important, recently discovered connection between the central idea of simulated
annealing and use of Monte Carlo to approximate solutions to NP-hard problems,
including determining the volume of a bounded convex region K in $R^n$. If n is
large, finding Vol(K) can be a very hard problem. The most well-developed approach
is to define a sequence of convex sets:

$$K_0 \subset K_1 \subset K_2 \subset \dots \subset K_m = K$$

where $Vol(K_0)$ is easy to evaluate. For each i, perform a random walk in $K_i$
and count how many walkers happen to be in $K_{i-1}$. This gives an estimate of
$Vol(K_{i-1})/Vol(K_i)$, and the product of these estimates for all i, combined
with the known value of $Vol(K_0)$, gives an estimate for Vol(K).

Figure 17.8. Tour produced for TSP by simulated annealing, logarithmic cooling
schedule. (Late stage TSP for 100 cities, log cooling schedule.)

The connection to simulated annealing comes in a couple of ways. For one
thing, the random walk can be done using a Metropolis algorithm with a different
rejection rate (i.e., a different temperature) for each i. A more recent idea is to
recognize that the volume is the integral of the characteristic function of the set K
so we can try to approach this integral by integrating a sequence of other, easier
functions instead. In particular we can embed the problem in $R^{n+1}$ by adding
an extra coefficient $x_0$ to the points in $R^n$ and then choose functions
$f_0 < f_1 < f_2 < \dots < f_m$ where $f_m$ is the characteristic function of K
but the others look like $\exp(-x_0/T)$ in the extra coefficient, $x_0$.
Another example of use of the idea of simulated annealing is the KRS algorithm
used in the next challenge. Those who have become fascinated by this subject
might want to try to identify the “temperature” in this case in order to understand
why KRS is a form of simulated annealing.

CHALLENGE 17.4.

(a) Here are some explicit counts, some done by hand and some by latticecount.m
by Thomas DuBois.

Figure 17.9. Counts obtained by the KRS algorithm and by explicit counting
for a 4 × 4 lattice. For KRS we set the probabilities to 0.5, the number of steps
between records to ℓ = 4, and the total number of steps to $10^5$. Because ℓ was so
small, the samples were highly correlated, but the estimates are still quite good.
(Plot of C(k) versus k for KRS and for the explicit count.)

Lattice   C(0)   C(1)   C(2)    C(3)     C(4)      C(5)       C(6)       C(7)        C(8)
2×2          1      4      2
2×3          1      7     11       3
3×3          1     12     44      56       18
4×4          1     24    224    1044     2593      3388       2150        552          36
6×6          1     60   1622   26172   281514   2135356   11785382   48145820   146702793
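
For small lattices the explicit counts can be reproduced by brute-force enumeration. The following Python sketch is not latticecount.m; it is a simple recursive enumeration, written for illustration, that counts every set of non-overlapping dimers exactly once by adding edges in increasing index order.

```python
def grid_edges(rows, cols):
    """Edges of a rows x cols lattice, with nodes numbered row by row from 0."""
    edges = []
    for r in range(rows):
        for c in range(cols):
            v = r * cols + c
            if c + 1 < cols:
                edges.append((v, v + 1))      # horizontal edge
            if r + 1 < rows:
                edges.append((v, v + cols))   # vertical edge
    return edges

def dimer_counts(rows, cols):
    """Return [C(0), C(1), ...], where C(k) counts coverings with k dimers."""
    edges = grid_edges(rows, cols)
    counts = [0] * (rows * cols // 2 + 1)

    def extend(start, used, k):
        counts[k] += 1                        # the current set of k dimers is valid
        for idx in range(start, len(edges)):
            a, b = edges[idx]
            if a not in used and b not in used:
                used.add(a)
                used.add(b)
                extend(idx + 1, used, k + 1)
                used.discard(a)
                used.discard(b)

    extend(0, set(), 0)
    return counts
```

The cost grows with the total number of coverings, so this approach is only practical for small lattices; that growth is the motivation for a sampling method such as KRS.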

(b) One of the more interesting programming issues in this problem is the data
structure.
• If we keep track of each edge of the lattice, then we need to enumerate rules
for deciding whether two edges can be covered at the same time. For example,
in our 2 × 2 lattice, we cannot simultaneously have a dimer on a vertical edge
and one on a horizontal edge.
• If we keep track of each node of the lattice, then we need to know whether it
is occupied by a dimer, so our first idea might be to represent a monomer by
a zero and a dimer by a 1. But we need more information – whether its dimer
partner is above, below, left, or right. Without this additional information,
the array
$$\begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}$$
tells us that the 2 × 2 lattice has two dimers on it, but we can’t tell whether
they are horizontal or vertical.

• A third alternative is to keep track of both edges and nodes. Think of it as a
matching problem: each node can be matched with any of its four neighbors
in a dimer, or it can be a monomer. We maintain an array of nodes, where
the jth value is 0 if the node is a monomer, and equal to k if (k, j) is a dimer.
We store the edges in an $n^2 \times 4$ array, where the row index indicates the node
at the beginning of the edge, and the nonzero entries in the row record the
indices of the neighboring nodes. Thus, each physical edge has two entries in
the array (in rows corresponding to its two nodes), and a few of the entries at
the edges are 0, since some nodes have fewer than 4 edges. We can generate a
KRS change by picking an edge from this array, and we update the node array
after we decide whether an addition, deletion, or swap should be considered.
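
A minimal sketch of the third data structure, in Python, may make the bookkeeping concrete. The function names and the (up, down, left, right) column order are assumptions of this illustration; nodes are numbered from 1 so that 0 can mean "no neighbor" or "monomer", as in the description above.

```python
def build_neighbors(n):
    """n^2 x 4 table: row j-1 lists the neighbors of node j (1-based); 0 means none."""
    nbrs = []
    for j in range(1, n * n + 1):
        r, c = divmod(j - 1, n)
        up = j - n if r > 0 else 0
        down = j + n if r < n - 1 else 0
        left = j - 1 if c > 0 else 0
        right = j + 1 if c < n - 1 else 0
        nbrs.append([up, down, left, right])
    return nbrs

def add_dimer(partner, j, k):
    """Cover edge (j, k) if both nodes are monomers; return True on success."""
    if partner[j] == 0 and partner[k] == 0:
        partner[j], partner[k] = k, j
        return True
    return False

def remove_dimer(partner, j):
    """Remove the dimer covering node j, if there is one."""
    k = partner[j]
    if k != 0:
        partner[j] = 0
        partner[k] = 0
```

Here `partner` is a list of length n^2 + 1 whose entry 0 is unused. A swap can be built from a removal followed by an addition on an adjacent edge.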

The program KRS.m, by Sungwoo Park, on the website, is an efficient
implementation of the second alternative. Sample results are shown in Figure 17.9.
Please refer to the original paper [2] for information on how to set the parameters
to KRS. Kenyon, Randall, and Sinclair showed that the algorithm samples well
if both the number of steps and the interval between records are very large, but in
practice the algorithm is considerably less sensitive than the analysis predicts.

[1] D. Bertsimas and J. Tsitsiklis, “Simulated annealing,” Statistical Science
8(1):10-15, 1993.
[2] C. Kenyon, D. Randall, and A. Sinclair, “Approximating the number of
monomer-dimer coverings of a lattice,” J. Stat. Phys. 83(3-4):637-659, 1996.
Chapter 18

Solutions: Case Study: Multidimensional Integration: Partition and Conquer

CHALLENGE 18.1. A sample program is given on the website. Method 2 gives
somewhat better results, since it averages the function values themselves rather than
just using them to decide whether a point is inside or outside the region. Three
digit accuracy is achieved for 100000 points in Method 1 and for 1000 and 100000
points for Method 2. The convergence rate for Method 1 is consistent with
$1/\sqrt{n}$, since the product of the error with $\sqrt{n}$ is approximately constant
for large n, but for Method 2, the results are somewhat more variable. MATLAB’s
function quad uses 13 function evaluations to get three digit accuracy.
Clearly, for low dimensional integration of smooth functions, Monte Carlo
methods are not the methods of choice! Their value becomes apparent only when
the dimension d is large so that methods like quad would be forced to use a lot of
function evaluations.
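
The difference between the two estimators can be sketched in Python as follows. The integrand exp(-x^2) on [0, 1] is a hypothetical stand-in for the one in the challenge, chosen because its values lie between 0 and 1 and its integral is known in closed form.

```python
import math
import random

def f(x):
    # Hypothetical integrand with 0 <= f(x) <= 1 on [0, 1].
    return math.exp(-x * x)

def method1(n, rng):
    """Hit-or-miss: fraction of random points in the unit square lying under the curve."""
    hits = 0
    for _ in range(n):
        x = rng.random()
        y = rng.random()
        if y <= f(x):
            hits += 1
    return hits / n

def method2(n, rng):
    """Sample mean: average of f at n random points in [0, 1]."""
    return sum(f(rng.random()) for _ in range(n)) / n

# Exact value of the integral, used only to measure the error.
exact = 0.5 * math.sqrt(math.pi) * math.erf(1.0)
```

Method 1 uses each function value only for a yes/no decision, while Method 2 uses the value itself, which is why its variance is smaller for the same n.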

CHALLENGE 18.2. See challenge2.m on the website.

CHALLENGE 18.3. A sample program is available on the website. Importance
sampling produces better estimates at lower cost: see the answer to Challenge 4 for
detailed results.

CHALLENGE 18.4. The results are shown in Figure 18.1. The pseudo-random
points from MATLAB’s rand are designed to have good statistical properties, but
they leave large gaps in space. The quasi-random points are both more predictable
and more evenly distributed. They tend to lie on diagonal lines, with longer strings
as the coordinate number increases. Other algorithms for generating quasi-random
points avoid this defect.

Figure 18.1. 500 pseudo-random points (upper left). 500 quasi-random
points: coordinates 1-2 (upper right), coordinates 3-4 (lower left), coordinates 5-6
(lower right). (Panel titles: Pseudorandom samples; First two quasirandom
coordinates; Second two quasirandom coordinates; Third two quasirandom coordinates.)

CHALLENGE 18.5. The following table gives the absolute value of the errors
in the estimates from each of the four methods.

n         Method 1    Method 2    Method 3    Method 4
10        3.65e-03    1.11e-02    4.67e-03    1.50e-02
100       1.35e-03    3.38e-03    1.02e-03    2.49e-03
1000      2.85e-03    2.38e-04    1.22e-05    3.00e-04
10000     1.57e-03    1.14e-03    1.75e-04    4.10e-05
100000    4.97e-04    1.72e-04    1.44e-05    5.14e-06

Convergence rates can be determined from the slope of a straight line fit to
the logs of each set of errors.
The best results were obtained by Method 4, using quasi-random numbers in
Method 2. Method 3, importance sampling, was also quite good.
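
The straight-line fit mentioned above can be sketched in Python using the tabulated errors. The least-squares formula is standard; the function name and the use of base-10 logarithms are choices made for this illustration.

```python
import math

n_values = [10, 100, 1000, 10000, 100000]
errors = {
    "Method 1": [3.65e-03, 1.35e-03, 2.85e-03, 1.57e-03, 4.97e-04],
    "Method 2": [1.11e-02, 3.38e-03, 2.38e-04, 1.14e-03, 1.72e-04],
    "Method 3": [4.67e-03, 1.02e-03, 1.22e-05, 1.75e-04, 1.44e-05],
    "Method 4": [1.50e-02, 2.49e-03, 3.00e-04, 4.10e-05, 5.14e-06],
}

def fitted_slope(ns, errs):
    """Slope of the least-squares line through (log10 n, log10 error)."""
    xs = [math.log10(n) for n in ns]
    ys = [math.log10(e) for e in errs]
    xbar = sum(xs) / len(xs)
    ybar = sum(ys) / len(ys)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    return sxy / sxx

slopes = {name: fitted_slope(n_values, errs) for name, errs in errors.items()}
```

A slope near -1/2 corresponds to the $1/\sqrt{n}$ behavior expected of plain Monte Carlo, and a slope closer to -1 indicates faster convergence. With only five noisy data points per method, the fitted slopes should be read as rough indications.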

CHALLENGE 18.6. No answer provided.


Chapter 19

Solutions: Case Study: Models of Infection: Person to Person

CHALLENGE 19.1. See the solution to Challenge 3.

CHALLENGE 19.2. See the solution to Challenge 3.

CHALLENGE 19.3. The results of a simulation of each of these three models
are given in Figures 19.1-19.3. The MATLAB program that generated these results
can be found on the website. In general, mobility increases the infection rate and
vaccination decreases it dramatically. In our sample runs, the infection peaks around
day 18 with no mobility, and around day 15 when patients are moved. Individual
runs may vary.

CHALLENGE 19.4. The histograms for ν = 0, 0.1, 0.2, and 0.3 are shown
in Figure 19.4. The mean percent of the population infected drops from 73.6% for
ν = 0 (with a variance of 4.5%), to 4.1% for ν = 0.3 (with a variance of only 0.06%).

CHALLENGE 19.5. From Challenge 4, we know that a very low vaccination rate
is sufficient to dramatically reduce the infection rate: somewhat less than ν = 0.1.
But using a nonlinear equation solver on a noisy function is quite dangerous; it
is easily fooled by outliers, and by changing the starting guess, you can make it
produce almost any value.

Figure 19.1. Proportion of individuals infected by day in a 10 × 10 grid
of hospital beds, with infection rate τ = 0.2. (Plot of the proportion of individuals
infected, susceptible, and recovered versus day.)

CHALLENGE 19.6.

(a) The transition probabilities are given in Figure 19.5, and the matrix is given in
the MATLAB program found on the website.

(b) $Ae_1$ is equal to column 1 of A, which contains the probabilities of transitioning
from state 1 to any of the other states. More generally, if p is a vector of probabilities
of initially being in each of the states, then Ap is the vector of probabilities of being
in them at time 1.

Figure 19.2. Proportion of individuals infected by day in a 10 × 10 grid
of hospital beds, with infection rate τ = 0.2 and mobility rate δ = 0.01. (Plot of the
proportion of individuals infected, susceptible, and recovered versus day.)
(c) If A is a dense matrix, then computing $A(Ae_1)$ costs $2s^2$ multiplications, where
s is the number of states. Computing $(A^2)e_1$ costs $s^3 + s^2$ multiplications, and this
is quite a bit more when s is large. (We should also take advantage of the zeros in
A and avoid multiplying by them. If we do this for our matrix, A has 21 nonzero
elements while $A^2$ has 23, so again it takes more multiplications to form $(A^2)e_1$
than to form $A(Ae_1)$. We should also note that the product $Ae_1$ is just the first
column of A, so it could be computed without multiplications.)

Figure 19.3. Proportion of individuals infected by day in a 10 × 10 grid
of hospital beds, with infection rate τ = 0.2, mobility rate δ = 0.01, and vaccination
rate ν = 0.1. (Plot of the proportion of individuals infected, susceptible, recovered,
and vaccinated versus day.)


(d) In this experiment, it took 280 Monte Carlo simulations to get 2 digits of
accuracy. Asking for 3 digits raises the number of trials into the ten thousands,
since the variance is high relative to this threshold.

Figure 19.4. Results of 1000 trials for a 10 × 10 grid of hospital beds, with
infection rate τ = 0.2 and vaccination rate ν, with ν varying. (Histograms of the
number of trials versus percent infected for ν = 0, 0.1, 0.2, and 0.3.)

(e) There is only one path to state Q, corresponding to a single infection, and the
product of the probabilities of transitions along this path is $(1 - \tau)^4$. There are
2 paths to state S, and summing the product of the probabilities along the paths
gives $\tau(1 - \tau)^2 + \tau(1 - \tau)^3$. The probability of reaching state P is the same, so
the probability of 2 infections is twice this number. Similarly, the probability of
reaching state R, corresponding to 3 infections, is
$\tau^2 + 2\tau^2(1 - \tau) + (1 - \tau)^2\tau^2$. The probabilities of reaching states
P, Q, R, and S sum to one, since these are the only possible outcomes.
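
The claim that these probabilities sum to one is easy to check numerically. The short Python sketch below simply evaluates the expressions given in part (e); the function name is an assumption of this illustration.

```python
def outcome_probabilities(tau):
    """Probabilities of 1, 2, and 3 infections, from the path sums in part (e)."""
    q = 1.0 - tau
    p_one = q ** 4                                   # state Q
    p_two = 2.0 * (tau * q ** 2 + tau * q ** 3)      # states S and P together
    p_three = tau ** 2 + 2.0 * tau ** 2 * q + q ** 2 * tau ** 2   # state R
    return p_one, p_two, p_three
```

For example, `sum(outcome_probabilities(0.2))` should equal 1 up to rounding error.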
Figure 19.5. Transition probabilities for the Markov chain model of the
epidemic. (State-transition diagram; the states are labeled A through S.)

CHALLENGE 19.7.

• The probabilities are clearly nonnegative and sum to 1.


• Note that the jth component of $e^T A$ is the sum of the elements in column j,
and this is 1, so $e^T A = e^T$.

• Therefore, $e^T A = 1 e^T$, and this means that the vector $e^T$ is unchanged in
direction when multiplied on the right by A. This is the definition of a left
eigenvector of A, and the eigenvalue is 1.

• Apply the Gerschgorin circle theorem to $A^T$, which has the same eigenvalues
as A. If the main diagonal element of $A^T$ is 0 < α < 1, then the off-diagonal
elements are nonnegative and sum to 1 − α. Therefore, the Gerschgorin circle
is centered at α with radius 1 − α. This circle touches the unit circle at
the point (1, 0) but lies inside of it. The eigenvalues lie in the union of the
Gerschgorin circles, so all eigenvalues lie inside the unit circle.

• If A were irreducible then the eigenvalue at 1 would be simple; see, for
example, [1].

• Let the eigensystem of A be defined by $A u_j = \lambda_j u_j$, and let

$$e_1 = \sum_{j=1}^{n} \alpha_j u_j,$$

where $u_1, \dots, u_4$ are a basis for the eigenspace corresponding to the eigenvalue
1. Then verify that

$$A^k e_1 = \sum_{j=1}^{n} \alpha_j \lambda_j^k u_j.$$

Since $\lambda_j^k \to 0$ as $k \to \infty$ except for the eigenvalue 1, we see that

$$A^k e_1 \to \alpha_1 u_1 + \alpha_2 u_2 + \alpha_3 u_3 + \alpha_4 u_4.$$

• Therefore, we converge to a multiple of the stationary vector.
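
The convergence of $A^k e_1$ can be observed on a small example. The 4-state column-stochastic matrix below is hypothetical (it is not the matrix from Challenge 19.6); it has two transient states and two absorbing states, so the eigenvalue 1 is repeated, as in the argument above.

```python
# Hypothetical column-stochastic matrix: entry A[i][j] is the probability
# of moving from state j to state i. States 2 and 3 (0-based) are absorbing.
A = [
    [0.2, 0.0, 0.0, 0.0],
    [0.5, 0.4, 0.0, 0.0],
    [0.3, 0.1, 1.0, 0.0],
    [0.0, 0.5, 0.0, 1.0],
]

def matvec(A, p):
    """Matrix-vector product A p for a list-of-rows matrix."""
    return [sum(a * x for a, x in zip(row, p)) for row in A]

def propagate(A, p, k):
    """Return A^k p, computed by k successive matrix-vector products."""
    for _ in range(k):
        p = matvec(A, p)
    return p

limit = propagate(A, [1.0, 0.0, 0.0, 0.0], 200)
```

The limit is unchanged by a further multiplication by A, so it lies in the eigenspace for the eigenvalue 1, and its entries still sum to 1.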

[1] Richard Varga, Matrix Iterative Analysis, Prentice Hall, Englewood Cliffs, NJ,
1962.
Unit V

SOLUTIONS: Solution of Differential Equations

Chapter 20

Solutions: Solution of Ordinary Differential Equations

CHALLENGE 20.1. (Partial Solution) See the programs on the website.

CHALLENGE 20.2. We need the real parts of all eigenvalues to be negative.
This means $4 - t^2 < 0$ and $-t < 0$, so the equation is stable when t > 2.

CHALLENGE 20.3. The polynomial is

$$p(t) = y_n + (t - t_n) f_n,$$

so we compute

$$p(t_{n+1}) = y_n + (t_{n+1} - t_n) f_n.$$

CHALLENGE 20.4. The true solution is y(t) = t. We compute:

t_n     Euler approximation
0       0
0.1     0 + 1 ∗ .1 = 0.1
0.2     .1 + 1 ∗ .1 = 0.2
...     ...
1.0     .9 + 1 ∗ .1 = 1.0

Euler’s method is exact for this problem.

CHALLENGE 20.5. Since f(t, y) = −y, the backward Euler formula is

$$y_{n+1} = y_n + h_n f(t_{n+1}, y_{n+1}) = y_n - h_n y_{n+1}.$$

Therefore,
$$(1 + h_n) y_{n+1} = y_n,$$
so
$$y_{n+1} = \frac{1}{1 + h_n} y_n.$$
We compute:

t_n     y_n                    y(t_n)
0       1                      1
0.1     1/1.1 = 0.9091         0.9048
0.2     (1/1.1)^2 = 0.8264     0.8187
0.3     (1/1.1)^3 = 0.7513     0.7408
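
The table can be reproduced with a few lines of Python. This sketch hard-codes the closed-form update derived above for y' = −y; for a general f, backward Euler would instead require solving an equation at each step.

```python
import math

def backward_euler_decay(h, n_steps, y0=1.0):
    """Backward Euler for y' = -y: each step divides the current value by (1 + h)."""
    ys = [y0]
    for _ in range(n_steps):
        ys.append(ys[-1] / (1.0 + h))
    return ys

approx = backward_euler_decay(0.1, 3)
exact = [math.exp(-0.1 * n) for n in range(4)]
```

Comparing `approx` with `exact` shows that the computed values lie slightly above the true solution, consistent with the table.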

CHALLENGE 20.6. Rearranging, we get

$$(1 + ha/2) y_{n+1} = (1 - ha/2) y_n,$$

so
$$y_{n+1} = \frac{1 - ha/2}{1 + ha/2} y_n.$$
Apply Taylor series expansion to the differential equation to get
$$\begin{aligned}
y(t_{n+1}) &= y(t_n) + h y'(t_n) + \frac{h^2}{2} y''(\xi) \\
           &= y(t_n) - ha\, y(t_n) + \frac{h^2}{2} y''(\xi) \\
           &= (1 - ha) y(t_n) + \frac{h^2}{2} y''(\xi),
\end{aligned}$$
where ξ is a point between $t_n$ and $t_{n+1}$. Let $e_n = y_n - y(t_n)$, and subtract
our two expressions to obtain
$$e_{n+1} = \frac{1 - ha/2}{1 + ha/2} e_n - \left(1 - ha - \frac{1 - ha/2}{1 + ha/2}\right) y(t_n) - \frac{h^2}{2} y''(\xi).$$

Now, since $y'' = a^2 y$ and
$$\left(1 - ha - \frac{1 - ha/2}{1 + ha/2}\right) y(t_n) = -\frac{h^2 a^2}{2 + ha} y(t_n) = -\frac{h^2}{2 + ha} y''(t_n),$$
we see that
$$e_{n+1} = \frac{1 - ha/2}{1 + ha/2} e_n + \frac{h^2}{2 + ha} y''(t_n) - \frac{h^2}{2} y''(\xi).$$

The last two terms can be combined and represent an effective local error. Therefore,
the global error is magnified if |(1 − ha/2)/(1 + ha/2)| > 1. Conversely, the method
is stable when
$$\left| \frac{1 - ha/2}{1 + ha/2} \right| < 1,$$
which holds for all h > 0 when a > 0.
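
The stability condition can be checked numerically. The Python sketch below evaluates the amplification factor derived above and, for comparison, the factor 1 − ha of the forward Euler method; the value a = 10 used in the check is an arbitrary illustrative choice.

```python
def trapezoid_factor(h, a):
    """Amplification factor (1 - ha/2) / (1 + ha/2) derived above."""
    return (1.0 - h * a / 2.0) / (1.0 + h * a / 2.0)

def euler_factor(h, a):
    """Amplification factor of forward Euler for y' = -a y, shown for comparison."""
    return 1.0 - h * a

a = 10.0
factors = {h: trapezoid_factor(h, a) for h in (0.01, 0.1, 1.0, 100.0)}
```

Every entry of `factors` has magnitude less than 1, whereas `euler_factor(1.0, a)` is −9, so forward Euler would magnify errors at that stepsize.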

CHALLENGE 20.7. Recall that Euler’s method is

$$y_{n+1} = y_n + h f(t_n, y_n),$$

and backward Euler is

$$y_{n+1} = y_n + h f(t_{n+1}, y_{n+1}).$$

P : y = 1 + .1(1^2) = 1.1,
E : f = (1.1)^2 − .5 = 0.71,
C : y = 1 + .1 ∗ .71 = 1.071,
E : f = (1.071)^2 − 0.5.

The predicted value is quite close to the corrected value; this is an indication
that the stepsize is small enough to obtain some accuracy in the computed solution.

CHALLENGE 20.8. $f(t, y) = 10y^2 - 20$.

P: $y^P = y(0) + .1 f(0, y(0)) = 1 + .1(-10) = 0$.
E: $f^P = f(.1, y^P) = 10 * 0 - 20 = -20$.
C: $y^C = y(0) + .1 f^P = 1 - 2 = -1$.
E: $f^C = f(.1, y^C) = 10 - 20 = -10$.

Note that the predicted and corrected values are quite different, so neither can be
trusted; we should reduce the stepsize and recompute. The true value is y(.1) ≈
−0.69.
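
Both PECE computations can be reproduced with one small Python function. For Challenge 20.7 the right-hand side $f(t, y) = y^2 - 5t$ is an assumption inferred from the arithmetic shown (it gives f = 1 at the start and 0.71 after the predictor); the closed-form solution used for Challenge 20.8 follows from separating variables and is included only as a check.

```python
import math

def pece_step(f, t, y, h):
    """One PECE step: Euler predictor, backward Euler corrector (a single pass)."""
    y_pred = y + h * f(t, y)           # P
    f_pred = f(t + h, y_pred)          # E
    y_corr = y + h * f_pred            # C
    f_corr = f(t + h, y_corr)          # E
    return y_pred, f_pred, y_corr, f_corr

def f_207(t, y):
    return y * y - 5.0 * t             # assumed form for Challenge 20.7

def f_208(t, y):
    return 10.0 * y * y - 20.0         # given in Challenge 20.8

def exact_208(t):
    """Solution of y' = 10 y^2 - 20 with y(0) = 1."""
    s = math.sqrt(2.0)
    return -s * math.tanh(10.0 * s * t - math.atanh(1.0 / s))

step_207 = pece_step(f_207, 0.0, 1.0, 0.1)
step_208 = pece_step(f_208, 0.0, 1.0, 0.1)
```

The gap between predictor and corrector is 0.029 in the first case and 1 in the second, which is the signal used above to judge whether the stepsize is adequate.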

CHALLENGE 20.9. Suppose $\hat{y}$ is the result of the predictor and $\tilde{y}$ is the result
of the corrector. Assuming $\tilde{y}$ is much more accurate than $\hat{y}$,

$$\|\hat{y} - y_{true}\| \approx \|\hat{y} - \tilde{y}\| \equiv \delta.$$

If δ > τ, reduce h and retake the step:

• perhaps h = h/2.

• perhaps $h = h/2^p$ where, since we need $\delta 2^{-5p} \approx \tau$, we define
$p = (\log \delta - \log \tau)/(5 \log 2)$.

CHALLENGE 20.10. We know that if our old values are correct,

$$y^P_{n+1} - y(t_{n+1}) = \frac{3h^4}{8} y^{(4)}(\eta),$$
$$y^C_{n+1} - y(t_{n+1}) = -\frac{h^4}{24} y^{(4)}(\nu).$$
Subtracting, we obtain

$$y^P_{n+1} - y^C_{n+1} = \frac{3h^4}{8} y^{(4)}(\eta) - \left(-\frac{h^4}{24} y^{(4)}(\nu)\right),$$
where η, ν are in the interval containing $y^P_{n+1}$, $y^C_{n+1}$, and the true value. Since
3/8 + 1/24 = 10/24, the error in the corrector can be estimated as
$\epsilon = |y^P_{n+1} - y^C_{n+1}|/10$.
Now, if ε > τ, we might reduce h by a factor of 2 and retake the step. If ε << τ,
we might double h in preparation for the next step (expecting that the local error
might increase by a factor of $2^4$).
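
The stepsize logic described above can be sketched in Python. The threshold used to decide that the estimate is "much smaller" than the tolerance (here 1% of it) is an assumption of this illustration; the text does not prescribe a value.

```python
def corrector_error_estimate(y_pred, y_corr):
    """Estimate of the corrector error: |y^P - y^C| / 10."""
    return abs(y_pred - y_corr) / 10.0

def next_stepsize(h, eps, tol, small=0.01):
    """Return (new_h, accept_step) from the error estimate eps and tolerance tol."""
    if eps > tol:
        return h / 2.0, False      # too inaccurate: halve h and retake the step
    if eps < small * tol:
        return 2.0 * h, True       # very accurate: accept and double h
    return h, True                 # accept and keep h
```

A driver loop would call `corrector_error_estimate` after each predictor-corrector pair and use `next_stepsize` to decide whether to advance.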

CHALLENGE 20.11. No answer provided.

CHALLENGE 20.12.

$$\begin{aligned}
y'(t) &= D \nabla_y H(y)(t) \\
&= \begin{bmatrix} 0 & 1 & 0 & 0 \\ -1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & -1 & 0 \end{bmatrix}
\begin{bmatrix}
y_1(t) + y_1(t)\, y_2^2(t)\, y_3^2(t)\, y_4^2(t) \\
y_2(t) + y_1^2(t)\, y_2(t)\, y_3^2(t)\, y_4^2(t) \\
y_3(t) + y_1^2(t)\, y_2^2(t)\, y_3(t)\, y_4^2(t) \\
y_4(t) + y_1^2(t)\, y_2^2(t)\, y_3^2(t)\, y_4(t)
\end{bmatrix} \\
&= \begin{bmatrix}
y_2(t) + y_1^2(t)\, y_2(t)\, y_3^2(t)\, y_4^2(t) \\
-(y_1(t) + y_1(t)\, y_2^2(t)\, y_3^2(t)\, y_4^2(t)) \\
y_4(t) + y_1^2(t)\, y_2^2(t)\, y_3^2(t)\, y_4(t) \\
-(y_3(t) + y_1^2(t)\, y_2^2(t)\, y_3(t)\, y_4^2(t))
\end{bmatrix}.
\end{aligned}$$

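CHALLENGE 20.13.
$$y(t) = \begin{bmatrix} u(t) \\ v(t) \\ w(t) \end{bmatrix}, \quad
M(t) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}, \quad
A(t) = \begin{bmatrix} 7 & -6 & 0 \\ 4 & -2 & 0 \\ 1 & 1 & 1 \end{bmatrix}, \quad
f(t) = \begin{bmatrix} 4t \\ 0 \\ -24 \end{bmatrix}.$$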

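CHALLENGE 20.14. We calculate

$$y(t) = \begin{bmatrix} u(t) \\ p(t) \end{bmatrix}, \quad
M = \begin{bmatrix} I_{n_u} & 0 \\ 0 & 0 \end{bmatrix}, \quad
A = \begin{bmatrix} C & B \\ B^T & 0 \end{bmatrix}, \quad f(t) = 0.$$

Therefore,
$$P_1 = \begin{bmatrix} I_{n_u} & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ -C & -B & I_{n_u} & 0 \\ -B^T & 0 & 0 & 0 \end{bmatrix}, \quad
N_1 = \begin{bmatrix} C & B & 0 & 0 \\ B^T & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix},$$
so $rank(P_1) = 2(n_u + n_p) - 2n_p$. Therefore we take $n_a = 2n_p$. We need a basis for
the nullspace of $P_1^T$, and we can take, for example,
$$Z^T = \begin{bmatrix} 0 & I_{n_p} & 0 & 0 \\ B^T & 0 & 0 & I_{n_p} \end{bmatrix}.$$

Now we calculate
$$\hat{N}_1 = \begin{bmatrix} B^T & 0 \\ B^T C & B^T B \end{bmatrix},$$
so we can take, for example,
$$T = \begin{bmatrix} X \\ -(B^T B)^{-1} B^T C X \end{bmatrix},$$

which has $n_d = n_u - n_p$ columns. Then

$$MT = \begin{bmatrix} I_{n_u} & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} X \\ -(B^T B)^{-1} B^T C X \end{bmatrix} = \begin{bmatrix} X \\ 0 \end{bmatrix},$$

so we can take
$$W^T = \begin{bmatrix} X^T, & 0 \end{bmatrix},$$
which makes $W^T M T = X^T X$, which has rank $n_d = n_u - n_p$, as desired.

All the matrices are constant with respect to time, so the differentiability
assumptions are satisfied.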

CHALLENGE 20.15.
• Using the notation of the pointer, we let a(t) = 1 > 0, b(t) = 8.125π cot((1 +
t)π/8), $c(t) = \pi^2 > 0$, and $f(t) = -3\pi^2$. These are all smooth functions on
[0, 1].
• Since $c(t) = \pi^2 > 0$ and
$$\int_0^1 [f(t)]^2 \, dt = 9\pi^4$$
is finite, the solution exists and is unique.
• Since f(t) < 0, the Maximum Principle tells us that

$$\max_{t \in [0,1]} u(t) \le \max(-2.0761, -2.2929, 0) = 0.$$

• Letting v(t) = −3, we see that

$$-v''(t) + 8.125\pi \cot((1 + t)\pi/8)\, v'(t) + \pi^2 v(t) = -3\pi^2$$

and v(0) = v(1) = −3. Therefore the Monotonicity Theorem says that u(t) ≥
v(t) for t ∈ [0, 1].
• Therefore we conclude −3 ≤ u(t) ≤ 0 for t ∈ [0, 1].
Note on how I constructed the problem: The true solution to the problem
is u(t) = cos((1 + t)π/8) − 3, which does indeed have the properties we proved
about it. But we can obtain a lot of information about the solution (as illustrated
in this problem) without ever evaluating it!
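
The stated true solution can be checked against the differential equation and the bounds with a short Python sketch. It uses only the coefficients given above, with the derivatives of u computed by hand; the function names are choices made for this illustration.

```python
import math

def u(t):
    return math.cos((1.0 + t) * math.pi / 8.0) - 3.0

def du(t):
    return -(math.pi / 8.0) * math.sin((1.0 + t) * math.pi / 8.0)

def d2u(t):
    return -((math.pi / 8.0) ** 2) * math.cos((1.0 + t) * math.pi / 8.0)

def residual(t):
    """-u'' + b(t) u' + c(t) u - f(t), with b, c, f as given in the first bullet."""
    b = 8.125 * math.pi / math.tan((1.0 + t) * math.pi / 8.0)
    c = math.pi ** 2
    f = -3.0 * math.pi ** 2
    return -d2u(t) + b * du(t) + c * u(t) - f

grid = [i / 100.0 for i in range(101)]
max_residual = max(abs(residual(t)) for t in grid)
```

The residual is zero up to rounding error, and the values of u on the grid stay between −3 and 0, in agreement with the bounds derived above.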

CHALLENGE 20.16. Let $y_{(1)}(t) = a(t)$, $y_{(2)}(t) = a'(t)$. Then our system is
$$y'(t) = \begin{bmatrix} y_{(2)}(t) \\ y_{(1)}^2(t) - 5 y_{(2)}(t) \end{bmatrix}.$$

function [t,y,z] = solvebvp()

z = fzero(@fvalue,2);
% Now the solution can be obtained by using ode45 with
% initial conditions [5,z]. For example,
[t,y] = ode45(@yprime,[0:.1:1],[5 z]);

% End of solvebvp

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

function yp = yprime(t,y)
yp = [y(2); y(1)^2 - 5 * y(2)];

% End of yprime

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

function f = fvalue(z)

[t,y] = ode45(@yprime,[0 1],[5,z]);


f = y(end,1)-2;

% End of fvalue

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

CHALLENGE 20.17.

function [t,y,z] = solvebvp()

z = fzero(@evalshoot,[-1,1]);

[t,y] = ode45(@yprime,[0,1],[1,z]);

% The true solution is ...

utrue = cos(pi*t/2) + t.^2;

plot(t,y(:,1),t,utrue)
legend('Computed solution','True solution')
xlabel(’t’)
ylabel(’u’)

% end of solvebvp

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

function f = evalshoot(z)

% Given a value z for y(2) at time t=0, see how close
% y(1) is to b_desired at t=1.

b_desired = 1;
[t,y] = ode45(@yprime,[0,1],[1,z]);
f = y(end,1)-b_desired;

% end of evalshoot

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

function yp = yprime(t,y)

yp = [ y(2); -(pi/2)^2*y(1)+(pi/2)^2*t.^2 + 2];

% end of yprime

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

CHALLENGE 20.18. No answer provided.

CHALLENGE 20.19. (Partial Solution.) We use Taylor series expansions to
derive the second formula:
$$u(t + h) = u(t) + h u'(t) + \frac{h^2}{2} u''(t) + \frac{h^3}{6} u'''(t) + \frac{h^4}{24} u''''(\eta_1), \quad \eta_1 \in [t, t + h],$$
$$u(t - h) = u(t) - h u'(t) + \frac{h^2}{2} u''(t) - \frac{h^3}{6} u'''(t) + \frac{h^4}{24} u''''(\eta_2), \quad \eta_2 \in [t - h, t].$$

Adding, we obtain

$$u(t + h) + u(t - h) = 2u(t) + h^2 u''(t) + \frac{h^4}{24} \left[ u''''(\eta_1) + u''''(\eta_2) \right].$$

Using the Mean Value Theorem on the last term and solving for u''(t) gives

$$u''(t) = \frac{u(t - h) - 2u(t) + u(t + h)}{h^2} - \underbrace{\frac{h^4}{h^2 \cdot 24}\, 2 u''''(\eta)}_{O(h^2)}, \quad \eta \in [t - h, t + h].$$
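
The $O(h^2)$ behavior can be observed numerically. In the Python sketch below, the test function sin(t) and the evaluation point t = 1 are arbitrary choices made for illustration; halving h should reduce the error by a factor of about 4.

```python
import math

def second_difference(u, t, h):
    """Central difference approximation to u''(t)."""
    return (u(t - h) - 2.0 * u(t) + u(t + h)) / (h * h)

t0 = 1.0
exact = -math.sin(t0)                 # second derivative of sin at t0
steps = (0.1, 0.05, 0.025)
errs = [abs(second_difference(math.sin, t0, h) - exact) for h in steps]
ratios = [errs[i] / errs[i + 1] for i in range(len(errs) - 1)]
```

Much smaller values of h would eventually show rounding error, since the numerator is a difference of nearly equal numbers divided by $h^2$.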

CHALLENGE 20.20. Let $u_j$ approximate u(jh). Then

$$u_0 = 2,$$
$$\frac{u_{j-1} - 2u_j + u_{j+1}}{h^2} = \frac{u_{j+1} - u_{j-1}}{2h} + 6u_j,$$
$$u_5 = 3,$$

where j = 1, 2, 3, 4.
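
These equations form a tridiagonal linear system for $u_1, \dots, u_4$, which can be solved as in the following Python sketch. The stepsize h = 0.2 (that is, an interval of length 1) is an assumption, since the interval is not restated here; the elimination is the standard tridiagonal (Thomas) algorithm without pivoting.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Solve a tridiagonal system; lower[i] multiplies x[i-1], upper[i] multiplies x[i+1]."""
    n = len(diag)
    d = diag[:]
    r = rhs[:]
    for i in range(1, n):
        m = lower[i] / d[i - 1]
        d[i] -= m * upper[i - 1]
        r[i] -= m * r[i - 1]
    x = [0.0] * n
    x[-1] = r[-1] / d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = (r[i] - upper[i] * x[i + 1]) / d[i]
    return x

def solve_bvp(h=0.2, u_left=2.0, u_right=3.0, n_interior=4):
    """Assemble and solve the difference equations; returns [u_0, ..., u_5]."""
    a = 1.0 / h ** 2 + 1.0 / (2.0 * h)     # coefficient of u_{j-1}
    d = -2.0 / h ** 2 - 6.0                # coefficient of u_j
    c = 1.0 / h ** 2 - 1.0 / (2.0 * h)     # coefficient of u_{j+1}
    rhs = [0.0] * n_interior
    rhs[0] -= a * u_left                   # move the known boundary values
    rhs[-1] -= c * u_right                 # to the right-hand side
    interior = solve_tridiagonal([a] * n_interior, [d] * n_interior,
                                 [c] * n_interior, rhs)
    return [u_left] + interior + [u_right]
```

For this stepsize the matrix is diagonally dominant, so elimination without pivoting is safe.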

CHALLENGE 20.21. For j = 1, . . . , 99,

$$F_j(u) = \frac{u_{j-1} - 2u_j + u_{j+1}}{h^2} - \frac{u_{j+1} - u_{j-1}}{2h} + jh\, u_j - e^{u_j},$$
where $u_0 = 1$ and $u_{100} = 0$.
Chapter 21

Solutions: Case Study: More Models of Infection: It’s Epidemic

CHALLENGE 21.1. Sample programs are given on the website. The results
are shown in Figure 21.1. 95.3% of the population becomes infected.

CHALLENGE 21.2. The results are shown in Figure 21.1 and, as expected, are
indistinguishable from those of Model 1.

CHALLENGE 21.3. The results are shown in Figure 21.2. 94.3% of the
population becomes infected, slightly less than in the first models, and the epidemic
dies out in roughly half the time.

CHALLENGE 21.4. Let’s use subscripts x to denote partial derivatives with
respect to x, so that $I_{xx}(t, x, y) = \partial^2 I(t, x, y)/\partial x^2$.
(a) Since Taylor series expansion yields

$$I(t)_{i-1,j} = I(t, x, y) - h I_x(t, x, y) + \frac{h^2}{2} I_{xx}(t, x, y) - \frac{h^3}{6} I_{xxx}(t, x, y) + O(h^4),$$
$$I(t)_{i+1,j} = I(t, x, y) + h I_x(t, x, y) + \frac{h^2}{2} I_{xx}(t, x, y) + \frac{h^3}{6} I_{xxx}(t, x, y) + O(h^4),$$
we see that
$$\frac{I(t)_{i-1,j} - 2I(t)_{ij} + I(t)_{i+1,j}}{h^2} = \frac{h^2 I_{xx}(t, x, y) + O(h^4)}{h^2} = I_{xx}(t, x, y) + O(h^2).$$

Figure 21.1. Proportion of individuals infected by the epidemic from the
ode Model 1 or the dae Model 2. (Solution from Ordinary Differential Equation
Model; plot of the proportion of the population infected, susceptible, and recovered
versus time.)

(b) The matrix A can be expressed as

$$A = T \otimes I + I \otimes T,$$

where
$$T = \frac{1}{h^2} \begin{bmatrix}
-2 & 2 & & & \\
1 & -2 & 1 & & \\
 & \ddots & \ddots & \ddots & \\
 & & 1 & -2 & 1 \\
 & & & 2 & -2
\end{bmatrix},$$
and T and the identity matrix I are matrices of dimension n × n. (The notation
C ⊗ D denotes the matrix whose (i, j)-th block is $c_{ij} D$. The MATLAB command
to form this matrix is kron(C,D), which means Kronecker product of C and D. See
Chapter 6.)
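
The construction of A can be sketched in plain Python with nested lists. The helper names are assumptions of this illustration; `kron` follows the block definition just given, so it plays the role of MATLAB's kron(C,D) for small dense matrices.

```python
def build_T(n, h):
    """The n x n matrix T shown above (requires n >= 2)."""
    T = [[0.0] * n for _ in range(n)]
    for i in range(n):
        T[i][i] = -2.0
        if i > 0:
            T[i][i - 1] = 1.0
        if i < n - 1:
            T[i][i + 1] = 1.0
    T[0][1] = 2.0              # first row is [-2, 2, 0, ...]
    T[n - 1][n - 2] = 2.0      # last row is [..., 0, 2, -2]
    return [[v / h ** 2 for v in row] for row in T]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def kron(C, D):
    """Kronecker product: the (i, j)-th block of the result is C[i][j] * D."""
    return [[c * d for c in crow for d in drow] for crow in C for drow in D]

def build_A(n, h):
    """A = T (x) I + I (x) T, an n^2 x n^2 matrix."""
    T, I = build_T(n, h), identity(n)
    left, right = kron(T, I), kron(I, T)
    return [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(left, right)]
```

Each row of T sums to zero, so each row of A does as well. A practical code would store A in sparse form, since most of its entries are zero.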

CHALLENGE 21.5. The results of (a) are given in Figure 21.3, and those
for (b) are given in Figure 21.4. The infection rate without vaccination is 95.3%
(very similar to Model 1) while with vaccination it drops to 38.9%. Vaccination
also significantly shortens the duration of the epidemic.
Figure 21.2. Proportion of individuals infected by the epidemic from the
dde Model 3. (Solution from Delay Differential Equation Model; plot of the
proportion of the population infected, susceptible, and recovered versus time.)

CHALLENGE 21.6. No answer provided.


Figure 21.3. Proportion of individuals infected by the epidemic from the
differential equation of Model 5a. (Solution from Differential Equation Model; plot
of the infected and recovered proportions versus time.)

Figure 21.4. Proportion of individuals infected by the epidemic from the
differential equation of Model 5b, including vaccinations. (Solution from Differential
Equation Model with Vaccination; plot of the infected and recovered proportions
versus time.)
Chapter 22

Solutions: Case Study: Robot Control: Swinging Like a Pendulum

(coauthored by Yalin E. Sagduyu)

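CHALLENGE 22.1. Under the transformation, equation (1) becomes

$$\begin{bmatrix} 1 & 0 \\ c & m\ell \end{bmatrix}
\begin{bmatrix} y'_{(1)}(t) \\ y'_{(2)}(t) \end{bmatrix} =
\begin{bmatrix} y_{(2)}(t) \\ -mg \sin(y_{(1)}(t)) \end{bmatrix},$$

or
$$\begin{bmatrix} y'_{(1)}(t) \\ y'_{(2)}(t) \end{bmatrix} =
\begin{bmatrix} 1 & 0 \\ -c/(m\ell) & 1/(m\ell) \end{bmatrix}
\begin{bmatrix} y_{(2)}(t) \\ -mg \sin(y_{(1)}(t)) \end{bmatrix}.$$

Replacing $\sin(y_{(1)}(t))$ by $y_{(1)}(t)$ gives the system

$$y' = \begin{bmatrix} y'_{(1)}(t) \\ y'_{(2)}(t) \end{bmatrix} =
\begin{bmatrix} 0 & 1 \\ -g/\ell & -c/(m\ell) \end{bmatrix}
\begin{bmatrix} y_{(1)}(t) \\ y_{(2)}(t) \end{bmatrix} = Ay.$$

The eigenvalues of the matrix A are the roots of det(A − λI) = 0, or the roots of
$\lambda^2 + \lambda c/(m\ell) + g/\ell = 0$, and these are

$$\lambda_{1,2} = -\frac{c}{2m\ell} \pm \sqrt{\frac{c^2}{4m^2\ell^2} - \frac{g}{\ell}}.$$
For the undamped case, c = 0, so the real part of each eigenvalue is zero and the
system is unstable. The real part of each eigenvalue is negative if c > 0, so in the
damped case, the system is stable.
If $\lambda_1 \ne \lambda_2$, the eigenvectors of the matrix are
$$\begin{bmatrix} 1 \\ \lambda_1 \end{bmatrix}, \quad \begin{bmatrix} 1 \\ \lambda_2 \end{bmatrix},$$
so the solution to the differential equation is

$$y(t) = \alpha_1 \begin{bmatrix} 1 \\ \lambda_1 \end{bmatrix} e^{\lambda_1 t} + \alpha_2 \begin{bmatrix} 1 \\ \lambda_2 \end{bmatrix} e^{\lambda_2 t},$$
where $\alpha_1$ and $\alpha_2$ are constants determined by two additional conditions. If the
discriminant satisfies $c^2/(4m^2\ell^2) - g/\ell > 0$, then the solution decays; otherwise it
can have an oscillatory component in addition to a decaying one.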
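
The eigenvalue formula can be evaluated directly, as in the following Python sketch. The parameter values used in the checks (m = 1, ℓ = 1, g = 9.81, and several values of c) are hypothetical and serve only to illustrate the undamped, underdamped, and overdamped cases.

```python
import cmath

def pendulum_eigenvalues(m, ell, g, c):
    """Roots of lambda^2 + lambda c/(m ell) + g/ell = 0 (complex in general)."""
    center = -c / (2.0 * m * ell)
    disc = c * c / (4.0 * m * m * ell * ell) - g / ell
    root = cmath.sqrt(disc)
    return center + root, center - root

def char_poly(lam, m, ell, g, c):
    """Characteristic polynomial of the linearized system, evaluated at lam."""
    return lam * lam + lam * c / (m * ell) + g / ell
```

With c = 0 the eigenvalues are purely imaginary; with a small positive c they form a complex pair with negative real part (decaying oscillation); with a large c both are real and negative.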

CHALLENGE 22.2. We note that v(0, 0) = 0, and it is easy to see that v > 0
for all other values of its arguments.
We differentiate:
$$\begin{aligned}
\frac{d}{dt} v(y(t)) &= \frac{g \sin\theta(t)}{\ell} \frac{d\theta(t)}{dt} + \frac{d\theta(t)}{dt} \frac{d^2\theta(t)}{dt^2} \\
&= \frac{g \sin\theta(t)}{\ell} \frac{d\theta(t)}{dt} - \frac{d\theta(t)}{dt} \frac{1}{m\ell} \left( c \frac{d\theta(t)}{dt} + mg \sin(\theta(t)) \right) \\
&= \frac{-c}{m\ell} \left( \frac{d\theta(t)}{dt} \right)^2 \le 0.
\end{aligned}$$
Therefore, we can now conclude that the point θ = 0, dθ/dt = 0 is stable for both the
damped (c > 0) and undamped (c = 0) cases. For the undamped case, dv(y(t))/dt
is identically zero, and we cannot conclude that we have asymptotic stability. For
the damped case, we note that the set defined by dv(y(t))/dt = 0 contains all points
(θ, dθ/dt = 0), and the only invariant set is the one containing the single point (0, 0)
so this point is asymptotically stable.

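CHALLENGE 22.3. From Challenge 1, we see that

$$A = \begin{bmatrix} 0 & 1 \\ -g/\ell & -c/(m\ell) \end{bmatrix}, \quad
B = \begin{bmatrix} 0 \\ 1/(m\ell) \end{bmatrix}.$$
Our dimensions are n = 2, m = 1, so the controllability matrix is
$$\begin{bmatrix} B & AB \end{bmatrix} = \begin{bmatrix} 0 & 1/(m\ell) \\ 1/(m\ell) & -c/(m\ell)^2 \end{bmatrix}.$$
This matrix has rank 2, independent of c, so the system is controllable.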
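
A quick numerical check of the rank claim: for a 2 × 2 matrix, rank 2 is equivalent to a nonzero determinant. The Python sketch below forms [B AB] for hypothetical parameter values and evaluates the determinant, which works out to $-1/(m\ell)^2$ regardless of c.

```python
def controllability_matrix(m, ell, g, c):
    """Return the 2 x 2 matrix [B, AB] for the linearized pendulum."""
    A = [[0.0, 1.0], [-g / ell, -c / (m * ell)]]
    B = [0.0, 1.0 / (m * ell)]
    AB = [A[0][0] * B[0] + A[0][1] * B[1],
          A[1][0] * B[0] + A[1][1] * B[1]]
    return [[B[0], AB[0]], [B[1], AB[1]]]

def det2(M):
    """Determinant of a 2 x 2 matrix stored as a list of rows."""
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]
```

Since the determinant does not involve c, damping has no effect on controllability in this model.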

CHALLENGE 22.4. See the program problem4.m on the website. The results
are shown in Figures 22.1 and 22.2. The models for the undamped undriven
pendulum quickly show a phase difference in their results, while the damped
undriven pendulum results are quite similar. For the driven pendulum, the linear
and nonlinear results differ more as the angle θf gets bigger, and the linear
models do not converge to θf.

Figure 22.1. The linear and nonlinear undriven models. (Panels: Undamped
Undriven Pendulum and Damped Undriven Pendulum; each plots θ(t) against
time t for the nonlinear and linear models.)

Figure 22.2. The linear and nonlinear driven models. (Panels: Damped Driven
Pendulum with θf = 0.392699, 0.785398, and 1.047198; each plots θ(t) against
time t for the nonlinear and linear models.)

CHALLENGE 22.5. See the program problem5.m on the website. The θ(t)
results for the original solution, shooting method, and finite difference method differ
by at most 0.004.

Figure 22.3. The path of the robot arm with optimal control. (Damped Driven
Pendulum with Optimal Control Parameter; θ(t) plotted against time t.)

CHALLENGE 22.6. See the program problem6.m on the website. The energy
function returns the energy as specified above plus a large positive penalty term in
case the parameter is unsuccessful; the penalty keeps the minimizer from choosing
an unsuccessful parameter. For b = −1.7859, the total energy consumed is about
43.14 Joules. The motion of the pendulum is shown in Figure 22.3.
Note that it is always a good idea to sketch the function to be minimized to
see if the reported solution is reasonable.

Chapter 23

Solutions: Case Study: Finite Differences and Finite Elements: Getting to Know You

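CHALLENGE 23.1.

\[ \frac{1}{h^2} \begin{bmatrix} 2 & -1 & 0 & 0 \\ -1 & 2 & -1 & 0 \\ 0 & -1 & 2 & -1 \\ 0 & 0 & -1 & 2 \end{bmatrix} \begin{bmatrix} u_1 \\ u_2 \\ u_3 \\ u_4 \end{bmatrix} = \begin{bmatrix} f_1 \\ f_2 \\ f_3 \\ f_4 \end{bmatrix}, \]

where h = 1/5, $u_j \approx u(jh)$, and $f_j = f(jh)$.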
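
To see the 4 × 4 system in action, the Python sketch below assembles it and solves it with a tridiagonal (Thomas) elimination. It assumes the model problem −u″ = f on (0, 1) with u(0) = u(1) = 0 and the manufactured right-hand side f(x) = π² sin(πx), whose true solution is sin(πx); that test function and the helper names are assumptions chosen for illustration.

```python
import math

def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm: lower[i-1] = A[i][i-1], upper[i] = A[i][i+1]; no pivoting."""
    n = len(diag)
    d = diag[:]
    r = rhs[:]
    for i in range(1, n):
        w = lower[i - 1] / d[i - 1]
        d[i] -= w * upper[i - 1]
        r[i] -= w * r[i - 1]
    x = [0.0] * n
    x[-1] = r[-1] / d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = (r[i] - upper[i] * x[i + 1]) / d[i]
    return x

M = 4                      # interior grid points, as in the challenge
h = 1.0 / (M + 1)          # h = 1/5

def f(x):
    # Assumed right-hand side with known solution u(x) = sin(pi x).
    return math.pi**2 * math.sin(math.pi * x)

diag = [2.0 / h**2] * M
off = [-1.0 / h**2] * (M - 1)
rhs = [f((j + 1) * h) for j in range(M)]

u = solve_tridiagonal(off, diag, off, rhs)
max_error = max(abs(u[j] - math.sin(math.pi * (j + 1) * h)) for j in range(M))
print("computed u:", u)
print("max error at grid points:", max_error)
```

On such a coarse grid the error is a few percent; it should shrink roughly like h² as the grid is refined.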

CHALLENGE 23.2. Documentation is posted on the website for the program
finitediff2.m of Problem 3, which is very similar to finitediff1.m but more
useful. Please remember that if you use a program like finitediff1.m and fail
to include the name of the program’s author, or at least a reference to the
website from which you obtained it, it is plagiarism. Similarly, your implementation
of finitediff2.m should probably include a statement like, “Derived from
finitediff1.m by Dianne O’Leary.”

CHALLENGE 23.3. See finitediff2.m on the website.

CHALLENGE 23.4.
(a) First notice that if α and β are constants and v and z are functions of x, then

a(u, αv + βz) = αa(u, v) + βa(u, z),

since we can compute the integral of a sum as the sum of the integrals and then
move the constants outside the integrals. Therefore,

\[
\begin{aligned}
a(u_h, v_h) &= a\Big(u_h, \sum_{j=1}^{M-2} v_j \phi_j\Big) \\
&= \sum_{j=1}^{M-2} v_j \, a(u_h, \phi_j) \\
&= \sum_{j=1}^{M-2} v_j \, (f, \phi_j) \\
&= \Big(f, \sum_{j=1}^{M-2} v_j \phi_j\Big) \\
&= (f, v_h).
\end{aligned}
\]

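(b) We compute

\[
\begin{aligned}
a(\phi_j, \phi_j) &= \int_0^1 (\phi_j'(t))^2 \, dt \\
&= \int_{(j-1)h}^{(j+1)h} (\phi_j'(t))^2 \, dt \\
&= 2 \int_{(j-1)h}^{jh} \frac{1}{h^2} \, dt \\
&= \frac{2}{h},
\end{aligned}
\]

and

\[
\begin{aligned}
a(\phi_j, \phi_{j+1}) &= \int_0^1 \phi_j'(t) \phi_{j+1}'(t) \, dt \\
&= \int_{jh}^{(j+1)h} \phi_j'(t) \phi_{j+1}'(t) \, dt \\
&= \int_{jh}^{(j+1)h} \frac{(-1)}{h} \frac{1}{h} \, dt \\
&= -\frac{1}{h}.
\end{aligned}
\]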
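So our system becomes

\[ \frac{1}{h} \begin{bmatrix} 2 & -1 & 0 & 0 \\ -1 & 2 & -1 & 0 \\ 0 & -1 & 2 & -1 \\ 0 & 0 & -1 & 2 \end{bmatrix} \begin{bmatrix} u_1 \\ u_2 \\ u_3 \\ u_4 \end{bmatrix} = \begin{bmatrix} f_1 \\ f_2 \\ f_3 \\ f_4 \end{bmatrix} \]

where $u_j$ is the coefficient of $\phi_j$ in the representation of $u_h$ and

\[ f_j = \int_0^1 f(t) \phi_j(t) \, dt = \int_{(j-1)h}^{(j+1)h} f(t) \phi_j(t) \, dt, \]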

which is h times a weighted average of f over the jth interval. The only difference
between the finite difference system and this system is that we have replaced point
samples of f by average values. Note that if a(t) is not constant, then the systems
look even more different.
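
The matrix entries 2/h and −1/h can be confirmed by integrating products of hat-function derivatives numerically. The Python sketch below does this with a midpoint rule whose sample points never land on a mesh node; it assumes a(t) = 1 and a uniform mesh with h = 1/5, and the function names are hypothetical.

```python
def hat_derivative(j, t, h):
    """Derivative of the hat function centered at node j*h (piecewise constant)."""
    if (j - 1) * h < t < j * h:
        return 1.0 / h
    if j * h < t < (j + 1) * h:
        return -1.0 / h
    return 0.0

def stiffness_entry(i, j, h, cells, points_per_cell=4):
    """Midpoint-rule value of the integral over [0, 1] of phi_i'(t) * phi_j'(t)."""
    step = h / points_per_cell
    total = 0.0
    for k in range(cells * points_per_cell):
        t = (k + 0.5) * step  # midpoints avoid the mesh nodes
        total += hat_derivative(i, t, h) * hat_derivative(j, t, h) * step
    return total

M = 4
h = 1.0 / (M + 1)
cells = M + 1

print("a(phi_2, phi_2) =", stiffness_entry(2, 2, h, cells), " expected 2/h =", 2.0 / h)
print("a(phi_2, phi_3) =", stiffness_entry(2, 3, h, cells), " expected -1/h =", -1.0 / h)
print("a(phi_1, phi_3) =", stiffness_entry(1, 3, h, cells), " expected 0")
```

Because the integrands are piecewise constant on each mesh cell, the midpoint rule reproduces the entries up to rounding error.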

CHALLENGE 23.5. See the program posted on the website.

CHALLENGE 23.6. See the program posted on the website.

CHALLENGE 23.7. Here are the results, in dull tables with interesting entries:

PROBLEM 1
Using coefficient functions a(1) and c(1) with true solution u(1)
Infinity norm of the error at the grid points
for various methods and numbers of interior grid points M

M = 9 99 999
1st order finite difference 2.1541e-03 2.1662e-05 2.1662e-07
2nd order finite difference 2.1541e-03 2.1662e-05 2.1662e-07
Linear finite elements 1.3389e-13 1.4544e-14 1.4033e-13
Quadratic finite elements 3.1004e-05 3.5682e-09 3.6271e-13

PROBLEM 2
Using coefficient functions a(1) and c(2) with true solution u(1)
Infinity norm of the error at the grid points
for various methods and numbers of interior grid points M

M = 9 99 999
1st order finite difference 1.7931e-03 1.8008e-05 1.8009e-07
2nd order finite difference 1.7931e-03 1.8008e-05 1.8009e-07
Linear finite elements 6.1283e-04 6.1378e-06 6.1368e-08
Quadratic finite elements 2.7279e-05 3.5164e-09 1.7416e-12

PROBLEM 3
Using coefficient functions a(1) and c(3) with true solution u(1)
Infinity norm of the error at the grid points
for various methods and numbers of interior grid points M

M = 9 99 999
1st order finite difference 1.9405e-03 1.9529e-05 1.9530e-07
2nd order finite difference 1.9405e-03 1.9529e-05 1.9530e-07
Linear finite elements 4.3912e-04 4.3908e-06 4.3906e-08
Quadratic finite elements 2.8745e-05 3.5282e-09 3.6134e-13

PROBLEM 4
Using coefficient functions a(2) and c(1) with true solution u(1)
Infinity norm of the error at the grid points
for various methods and numbers of interior grid points M

M = 9 99 999
1st order finite difference 1.5788e-02 1.8705e-03 1.8979e-04
2nd order finite difference 3.8465e-03 3.8751e-05 3.8752e-07
Linear finite elements 1.3904e-03 1.3930e-05 1.3930e-07
Quadratic finite elements 1.6287e-04 1.9539e-08 1.9897e-12

PROBLEM 5
Using coefficient functions a(3) and c(1) with true solution u(1)
Infinity norm of the error at the grid points
for various methods and numbers of interior grid points M

M = 9 99 999
1st order finite difference 1.1858e-02 1.4780e-03 1.5065e-04
2nd order finite difference 3.6018e-03 3.6454e-05 3.6467e-07
Linear finite elements 8.3148e-04 8.2486e-06 1.2200e-06
Quadratic finite elements 1.0981e-04 1.6801e-06 2.5858e-06

PROBLEM 6
Using coefficient functions a(1) and c(1) with true solution u(2)
Infinity norm of the error at the grid points
for various methods and numbers of interior grid points M

M = 9 99 999
1st order finite difference 8.9200e-02 9.5538e-02 9.6120e-02
2nd order finite difference 8.9200e-02 9.5538e-02 9.6120e-02
Linear finite elements 8.6564e-02 9.5219e-02 9.6086e-02
Quadratic finite elements 8.6570e-02 9.5224e-02 9.6088e-02

PROBLEM 7
Using coefficient functions a(1) and c(1) with true solution u(3)
Infinity norm of the error at the grid points
for various methods and numbers of interior grid points M

M = 9 99 999
1st order finite difference 1.5702e-01 1.6571e-01 1.6632e-01
2nd order finite difference 1.5702e-01 1.6571e-01 1.6632e-01
Linear finite elements 1.4974e-01 1.6472e-01 1.6622e-01
Quadratic finite elements 1.4975e-01 1.6472e-01 1.6622e-01

Discussion:
Clearly, the finite difference methods are easier to program and therefore are
almost always used when x is a single variable. Finite elements become useful,
though, when x has 2 or more components and the shape of the domain is nontrivial.
The bulk of the work in these methods is in function evaluations. We need
O(M ) evaluations of a, c, and f in order to form each matrix. For finite differences,
the constant is close to 1, but quad (the numerical integration routine) uses many
function evaluations per call (on the order of 10), making formation of the finite
element matrices about 10 times as expensive.
The experimental rate of convergence should be calculated as the log10 of
the ratio of the successive errors (since we increase the number of grid points by
a factor of 10 each time). There are several departures from the expected rate of
convergence:
• finitediff1 is expected to have a linear convergence rate (r = 1), but has
r = 2 for the first three problems because a′ = 0 and the approximation is
the same as that in finitediff2.
• The quadratic finite element approximation has r = 4 on Test Problems 1-4,
better than the r = 3 we might expect. This is called superconvergence and
happens because we only measured the error at the grid points, whereas the
r = 3 result was for the average value of the error over the entire interval.
• Linear finite elements give almost an exact answer to Test Problem 1 at the
grid points (but not between the grid points). This occurs because our finite
element equations demand that

\[ a(u_h, \phi_j) = (u_h', \phi_j') = [-u_h(t_{j-1}) + 2u_h(t_j) - u_h(t_{j+1})]/h = (f, \phi_j), \]

and our true solution also satisfies this relation.


• In Test Problem 5, the coefficient function a has a discontinuous derivative at
x = 1/3. The matrix entries computed by the numerical integration routine
are not very accurate, so the finite element methods appear to have slow
convergence. This can be fixed by extra calls to quad so that it never tries to
integrate across the discontinuity.
• The “solution” to Test Problem 6 has a discontinuous derivative, and the
“solution” to Test Problem 7 is discontinuous. None of our methods compute
good approximations, although all of them return a reasonable answer (see
Figure 23.1) that could be mistaken for what we are looking for. The finite
difference approximations lose accuracy because their error term depends on
derivatives of u. The finite element equations were derived from the integrated
(weak) formulation of our problem, and when we used integration by parts, we left
off the boundary term that we would have gotten at x = 2/3, so our equations
are wrong. This is a case of, “Be careful what you ask for.”

• The entries in the finite element matrices are only approximations to the true
values, due to inaccuracy in estimation of the integrals. This means that as
the grid size is decreased, we need to reduce the tolerance that we send to
quad in order to keep the matrix accurate enough.

• The theoretical convergence rate only holds down to the rounding level of the
machine, so if we took even finer grids (much larger M ), we would fail to see
the expected rate.
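
The experimental rates discussed above can be computed directly from the tables. The Python sketch below applies the log10-of-ratio rule to the Problem 4 errors copied from the table; the function name is hypothetical, and the rule relies on the grid spacing shrinking by a factor of 10 between columns (M = 9, 99, 999).

```python
import math

def observed_rates(errors):
    """log10 of the ratio of successive errors, for grids refined by a factor of 10."""
    return [math.log10(errors[k] / errors[k + 1]) for k in range(len(errors) - 1)]

# Errors copied from the PROBLEM 4 table (M = 9, 99, 999).
problem4 = {
    "1st order finite difference": [1.5788e-02, 1.8705e-03, 1.8979e-04],
    "2nd order finite difference": [3.8465e-03, 3.8751e-05, 3.8752e-07],
    "Linear finite elements": [1.3904e-03, 1.3930e-05, 1.3930e-07],
    "Quadratic finite elements": [1.6287e-04, 1.9539e-08, 1.9897e-12],
}

rates = {name: observed_rates(errs) for name, errs in problem4.items()}
for name, r in rates.items():
    print(f"{name:30s}", ["%.2f" % value for value in r])
```

The same function can be applied to any other row of the tables to spot the departures from the expected rates listed in the bullets.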

On these simple 1-dimensional examples, we uncovered many pitfalls in naive
use of finite differences and finite elements. Nevertheless, both methods are quite
useful when used with care.

Figure 23.1. The “solution” to the seventh test problem. We compute an
accurate answer to a different problem. (Computed solution and true solution u
plotted against t.)

Unit VI

SOLUTIONS: Nonlinear Systems and Continuation Methods

Chapter 24

Solutions: Nonlinear Systems
CHALLENGE 24.1.

\[ F(x) = \begin{bmatrix} x^2 y^3 + xy - 2 \\ 2xy^2 + x^2 y + xy \end{bmatrix} \]

and

\[ J(x) = \begin{bmatrix} 2xy^3 + y & 3x^2 y^2 + x \\ 2y^2 + 2xy + y & 4xy + x^2 + x \end{bmatrix}. \]

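% Five Newton iterations for F(x) = 0, starting from x = [5;4]
x = [5;4];
for i=1:5,
    % Evaluate F at the current iterate
    F = [x(1)^2*x(2)^3 + x(1)*x(2) - 2;
         2*x(1)*x(2)^2 + x(1)^2*x(2) + x(1)*x(2)];
    % Evaluate the Jacobian matrix at the current iterate
    J = [2*x(1)*x(2)^3 + x(2), 3*x(1)^2*x(2)^2 + x(1);
         2*x(2)^2 + 2*x(1)*x(2) + x(2), 4*x(1)*x(2) + x(1)^2 + x(1)];
    % Solve J*p = -F for the Newton step and update x
    p = - ( J \ F );
    x = x + p;
end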

CHALLENGE 24.2.

• The first line does n² divisions. It would be better to add parentheses to drop
this to n: B = B + (y-B*s)*(s'/(s'*s)) .

• It is silly to refactor the matrix B each time, when it is just a rank-1 update
of the previous matrix. Instead, update a decomposition or (less desirable)
update the inverse using the techniques of Chapter 7.

CHALLENGE 24.3. For some vector r, we need to form

\[ (A - ZV^T)^{-1} r = A^{-1} r + A^{-1} Z (I - V^T A^{-1} Z)^{-1} V^T A^{-1} r, \]

where $A = B^{(k)}$, $Z = y - B^{(k)} s$, and $V = -s/(s^T s)$. Thus we need to

• form $t = A^{-1} r$ and $u = A^{-1} Z$, at a cost of 2p multiplications,

• form $\alpha = 1 - V^T u$ at a cost of n multiplications,

• form $w = ((V^T t)/\alpha) u$ at a cost of 2n multiplications and 1 division,

• add t and w.

The total number of multiplications is 2p + 3n + 1. Again, updating a matrix
decomposition has many advantages over this approach!
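
The four steps above can be sketched in Python and compared against a direct solve with the updated matrix. The 2 × 2 matrix B and the vectors s, y, and r are made-up illustrative data, and the dense `solve` helper merely stands in for whatever stored factorization of B would supply the solves at cost p each; all names here are hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def solve(A, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(A)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(aug[i][k]))
        aug[k], aug[p] = aug[p], aug[k]
        for i in range(k + 1, n):
            mult = aug[i][k] / aug[k][k]
            for j in range(k, n + 1):
                aug[i][j] -= mult * aug[k][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x

def broyden_update_solve(B, s, y, r):
    """Solve (B - Z V^T) x = r using only solves with B (rank-1 update formula)."""
    Z = [yi - dot(row, s) for yi, row in zip(y, B)]   # Z = y - B s
    V = [-si / dot(s, s) for si in s]                 # V = -s / (s^T s)
    t = solve(B, r)                                   # t = B^{-1} r
    u = solve(B, Z)                                   # u = B^{-1} Z
    alpha = 1.0 - dot(V, u)
    scale = dot(V, t) / alpha
    return [ti + scale * ui for ti, ui in zip(t, u)]  # x = t + w

# Assumed illustrative data.
B = [[4.0, 1.0], [2.0, 3.0]]
s = [1.0, 2.0]
y = [3.0, 1.0]
r = [1.0, 2.0]

x_update = broyden_update_solve(B, s, y, r)

# Direct solve with the explicitly updated matrix B + (y - B s) s^T / (s^T s).
Z = [yi - dot(row, s) for yi, row in zip(y, B)]
B_new = [[B[i][j] + Z[i] * s[j] / dot(s, s) for j in range(2)] for i in range(2)]
x_direct = solve(B_new, r)
print("via update formula:", x_update)
print("via direct solve:  ", x_direct)
```

The two results should agree to rounding error as long as α is not close to zero.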

CHALLENGE 24.4. (Partial Solution.)

(a) We compute the partial of $\rho_a(\lambda, x)$ with respect to λ:

\[ s = \begin{bmatrix} x^2 y^3 + xy - 2 - (x - a_1) \\ 2xy^2 + x^2 y + xy - (y - a_2) \end{bmatrix}. \]

Then the Jacobian of $\rho_a$ is the 2 × 3 matrix

\[ \hat{J}(\lambda, x) = \begin{bmatrix} s, & (1 - \lambda) I + \lambda J(x) \end{bmatrix}, \]

where J(x) is the matrix from Problem 1.

(b) In order for the function to be transversal to zero, the matrix $\hat{J}(\lambda, x)$ must be
full rank (i.e., rank-2) at every point λ ∈ [0, 1), x, y ∈ (−∞, ∞).
The matrix J(x) has two eigenvalues; call them α1 and α2. The matrix K =
(1 − λ)I + λJ(x) has eigenvalues (1 − λ) + λαi, so it is singular only if λ = 1/(1 − α1)
or λ = 1/(1 − α2). Even if that happens, it is likely that the vector s will point in
a different direction, making the rank of $\hat{J}(\lambda, x)$ equal to 2.

CHALLENGE 24.5. Using the Lagrange form of the interpolating polynomial,
we can write

\[ p(f) = L_1(f) t_1 + L_2(f) t_2 + L_3(f) t_3, \]

where

\[
\begin{aligned}
L_1(f) &= \frac{(f - f_2)(f - f_3)}{(f_1 - f_2)(f_1 - f_3)}, \\
L_2(f) &= \frac{(f - f_1)(f - f_3)}{(f_2 - f_1)(f_2 - f_3)}, \\
L_3(f) &= \frac{(f - f_1)(f - f_2)}{(f_3 - f_1)(f_3 - f_2)}.
\end{aligned}
\]

It is easy to verify that $L_j(f_j) = 1$ and $L_j(f_k) = 0$ if $j \ne k$. Therefore, $p(f_1) = t_1$,
$p(f_2) = t_2$, and $p(f_3) = t_3$, as desired.
Now, we want to estimate the value of t so that f(t) = 0, and we take this estimate
to be p(0). We calculate:
L(1) = f(2)*f(3)/((f(1)-f(2))*(f(1)-f(3)));
L(2) = f(1)*f(3)/((f(2)-f(1))*(f(2)-f(3)));
L(3) = f(1)*f(2)/((f(3)-f(1))*(f(3)-f(2)));
testimated = L*t;
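
For a concrete run of the same calculation, the Python sketch below applies the three Lagrange weights at f = 0 to an assumed test function f(t) = t² − 2 sampled at t = 1, 1.5, 2, so the estimate can be compared with √2. The test function, sample points, and function name are illustrative assumptions.

```python
def inverse_quadratic_estimate(t, f):
    """Evaluate at f = 0 the quadratic in f that interpolates the points (f[i], t[i])."""
    L1 = f[1] * f[2] / ((f[0] - f[1]) * (f[0] - f[2]))
    L2 = f[0] * f[2] / ((f[1] - f[0]) * (f[1] - f[2]))
    L3 = f[0] * f[1] / ((f[2] - f[0]) * (f[2] - f[1]))
    return L1 * t[0] + L2 * t[1] + L3 * t[2]

# Assumed example: f(t) = t^2 - 2, whose positive root is sqrt(2).
t = [1.0, 1.5, 2.0]
f = [ti**2 - 2.0 for ti in t]

estimate = inverse_quadratic_estimate(t, f)
print("estimated root:", estimate)
print("sqrt(2):       ", 2.0 ** 0.5)
```

The three function values must be distinct, since the denominators are differences of the f values.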

Chapter 25

Solutions: Case Study: Variable-Geometry Trusses: What’s Your Angle?

CHALLENGE 25.1.
See Figures 25.1 – 25.4.

CHALLENGE 25.2.
See the programs on the website.

CHALLENGE 25.3.
See the programs on the website and the figure in the book. Compare your
results with Arun, who found 8 solutions for Platform A, 8 for Platform B, 4 for
Platform C, and 16 for Platform D.
The continuation method was much slower than fsolve and gave no more solutions.
A trick called homogenization can be used to improve the homotopy. This involves
replacing the variables by their inverses in order to take solutions that are at infinity
and map them to zero. See [1,2,3] for more details.

1. V. Arun, The Solution of Variable-Geometry Truss Problems Using New
Homotopy Continuation Methods, PhD thesis, Mechanical Engineering Department,
Virginia Polytechnic Institute and State University, Blacksburg, Virginia,
September 1990.
2. V. Arun, C. F. Reinholtz, and L. T. Watson, Application of new homotopy
continuation techniques to variable geometry trusses, Trans. of the ASME,
114:422–428, September 1992.
3. A. Morgan, Solving Polynomial Systems Using Continuation For Engineering
and Scientific Problems, Prentice-Hall, Englewood Cliffs, NJ, 1987.


Figure 25.1. The triangle at the base of the platform (in the xy-plane)
and the midpoints of each side. (Vertices A1, B1, C1, each with angle π/3;
midpoints MAB, MBC, MAC.)

Figure 25.2. The distance from MAB (located in the xy-plane) to C2 is
hC, and the side of the triangle with length dC = hC cos(θC) points in the positive
x-direction. Therefore, the coordinates of C2 are those of MAB plus dC in the
x-direction, 0 in the y-direction, and hC sin(θC) in the z-direction.

Figure 25.3. The endpoint of the side labeled dA determines the x and
y coordinates of A2. The length is dA = hA cos(θA), so the x-displacement from
MBC is dA cos(π/3) and the y-displacement is dA sin(π/3). The z-displacement is
hA sin(θA).

Figure 25.4. The endpoint of the side labeled dB determines the x and
y coordinates of B2. The length is dB = hB cos(θB), so the x-displacement from
MAC is dB cos(−π/3) and the y-displacement is dB sin(−π/3). The z-displacement
is hB sin(θB).

Chapter 26

Solutions: Case Study: Beetles, Cannibalism, and Chaos: Analyzing a Dynamical System Model

CHALLENGE 26.1. The results are shown in Figure 26.1. When μA = 0.1,
the solution eventually settles into a cycle, oscillating between two different values:
18.7 and 321.6 larvae, 156.7 and 9.1 pupae, and 110.1 and 121.2 adults. Thus the
population at 4 week intervals is constant. Note that the peak pupae population
lags 2 weeks behind the peak larvae population, and that the oscillation of the adult
population is small compared to the larvae and pupae.
For μA = 0.6, the population eventually approaches a fixed point: 110.7 larvae,
54.0 pupae, and 42.3 adults.
In the third case, μA = 0.9, there is no regular pattern for the solution, and
it is called chaotic. The number of larvae varies between 18 and 242, the number
of pupae between 8 and 117, and the number of adults between 9 and 94.
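
The three regimes can be reproduced with a short simulation. The Python sketch below assumes the usual LPA update equations of Dennis et al. (they are not restated in this solution, so treat their exact form as an assumption) and uses the parameter values quoted in the caption of Figure 26.1; the function names are hypothetical.

```python
import math

def lpa_step(L, P, A, b, mu_L, mu_A, c_el, c_ea, c_pa):
    """One two-week step of the (assumed) LPA model for larvae, pupae, and adults."""
    L_next = b * A * math.exp(-c_ea * A - c_el * L)
    P_next = (1.0 - mu_L) * L
    A_next = P * math.exp(-c_pa * A) + (1.0 - mu_A) * A
    return L_next, P_next, A_next

def simulate(mu_A, steps, state=(70.0, 30.0, 70.0)):
    """Iterate the model from L(0) = 70, P(0) = 30, A(0) = 70; return all states."""
    b, mu_L = 11.6772, 0.5129
    c_el, c_ea, c_pa = 0.0093, 0.0110, 0.0178
    history = [state]
    for _ in range(steps):
        state = lpa_step(*state, b, mu_L, mu_A, c_el, c_ea, c_pa)
        history.append(state)
    return history

for mu_A in (0.1, 0.6, 0.9):
    tail = simulate(mu_A, 500)[-4:]
    print("mu_A =", mu_A)
    for L, P, A in tail:
        print("   L = %7.1f  P = %7.1f  A = %7.1f" % (L, P, A))
```

Printing the last few states makes the two-cycle, the fixed point, and the irregular behavior easy to tell apart.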

CHALLENGE 26.2. The results are shown in Figure 26.2. For the stable
solutions, if the model is initialized with population values near $A_{fixed}$, $L_{fixed}$, and
$P_{fixed}$, it will converge to these equilibrium values.

CHALLENGE 26.3. The bifurcation diagram is shown in Figure 26.3. The
largest tested value of μA that gives a stable solution is 0.58. If the computation
were performed in exact arithmetic, the graph would just be a plot of $L_{fixed}$ vs. μA.
When the solution is stable, rounding error in the computation produces a nearby
point from which the iteration tends to return to the fixed point. When the solution
is unstable, rounding error in the computation can cause the computed solution to
drift away. Sometimes it produces a solution that oscillates between two values (for
example, when μA = 0.72) and sometimes the solution becomes chaotic or at least
has a long cycle (for example, when μA = 0.94).


Figure 26.1. Model predictions for b = 11.6772, μL = 0.5129, cel =
0.0093, cea = 0.0110, cpa = 0.0178, L(0) = 70, P(0) = 30, A(0) = 70, and μA = 0.1
(top), 0.6 (middle), and 0.9 (bottom). Number of larvae (blue dotted), pupae (green
solid), and adults (red dashed). (Plot title: Results of the LPA model with three
different choices of μA; horizontal axis: time in two-week periods.)

Figure 26.2. Equilibrium populations for μL = 0.5, μA = 0.5, cel = 0.01,
cea = 0.01, and cpa = 0.01, b = 1.0, 1.5, 2.0, . . . , 20.0. Stable solutions are marked
with plusses. (Plot title: Equilibrium population as a function of b; curves for
Larvae, Pupae, and Adults.)


colony    cel       cea       cpa       b      μL        μA        residual
new: 1    0.018664  0.008854  0.020690   5.58  0.144137  0.036097   5.04
old: 1    0.009800  0.017500  0.019800  23.36  0.472600  0.093400  17.19
new: 2    0.004212  0.013351  0.028541   6.77  0.587314  0.000005   7.25
old: 2    0.010500  0.008700  0.017400  11.24  0.501400  0.093000  14.24
new: 3    0.018904  0.006858  0.035082   6.47  0.288125  0.000062   4.37
old: 3    0.008000  0.004400  0.018000   5.34  0.508200  0.146800   4.66
new: 4    0.017520  0.012798  0.023705   6.79  0.284414  0.005774   6.47
old: 4    0.008000  0.006800  0.016200   7.20  0.564600  0.109900   7.42

Table 26.1. Parameter estimates computed in Challenge 4 for our
minimization function (“new”) and that of Dennis et al. (“old”).

Colony     Norm of data vector   New residual   Old residual
Colony 1   33.55                  5.04          17.19
Colony 2   33.70                  7.25          14.24
Colony 3   33.44                  4.37           4.66
Colony 4   33.68                  6.47           7.42

Table 26.2. Residual norms computed in Challenge 4 for our minimization
function (“new”) and that of Dennis et al. (“old”).

CHALLENGE 26.4. The bounds used were 0 and 1 for all parameters except
b. The value of b was confined to the interval [0.1, 9.0]. The results are summarized
in Tables 26.1 and 26.2.

CHALLENGE 26.5. When the data is randomly perturbed, the estimate of b
for the second colony ranges from 4.735941 to 6.831328.
A larger upper bound for b tended to cause the minimizer to converge to a
local solution with a much larger residual.
There are many ways to measure sensitivity:
• We might ask how large a change we see in b when the data is perturbed a
bit. This is a forward error result.
• We might ask how large a change we see in the residual when the value of b
is perturbed a bit. This is a backward error result.
To estimate the forward error, I repeated the fit, after adding 50 samples of
normally distributed error (mean 0, standard deviation 1) to the log of the counts.
This is only an approximation to the error assumption usually made for counts,
Poisson error, but by using the log function in their minimization, the authors are
assuming that this is how the error behaves. Even so, the estimates, shown in
Figure 26.8, range from 1.00 to 9.00, quite a large change.
To estimate the backward error, I varied b, keeping the other parameters at
their optimal values, and plotted the resulting residual vs. b in Figure 26.9. We see
that the residual is not very sensitive to changes in b.

Figure 26.3. The bifurcation diagram for the data in Problem 3. (Plot title:
Results of the LPA model with different choices of μA; panels for Larvae, Pupae,
and Adults.)

Figure 26.4. Model predictions for Colony 1. (Predictions by Dennis et al. (+)
and our calculations (square); panels for Larvae, Pupae, and Adults.)

Figure 26.5. Model predictions for Colony 2. (Predictions by Dennis et al. (+)
and our calculations (square); panels for Larvae, Pupae, and Adults.)

Figure 26.6. Model predictions for Colony 3. (Predictions by Dennis et al. (+)
and our calculations (square); panels for Larvae, Pupae, and Adults.)

Then I minimized the residual as a function of the 5 parameters remaining
after setting b to fixed values. From Figure 26.10, we conclude that for any value
of b between 1 and 50 we can obtain a residual norm within 10% of the computed
minimum over all choices of b. This model seems to give no insight into the true
value of b.

Figure 26.7. Model predictions for Colony 4. (Predictions by Dennis et al. (+)
and our calculations (square); panels for Larvae, Pupae, and Adults.)

But as a final attempt, I used a continuation algorithm, repeating the
computations from Figure 26.10, but starting each minimization from the optimal point
found for the previous value of b. The resulting residuals, shown in Figure 26.11,
are much smaller, and the b value is somewhat better determined, probably
between 5 and 10. Even more interesting, the fitted model finally gives a reasonable
approximation of the data; see Figure 26.12.
To check the reliability of these estimates, it would be a good idea to repeat
the experiment for the data for the other three colonies, and to repeat the least
squares calculations using a variety of initial guesses.

Figure 26.8. Values of b computed for Colony 2 with 250 random
perturbations of the log of the data, drawn from a normal distribution with mean 0 and
standard deviation 1. (Plot title: Results of random perturbations on data for the
second colony; horizontal axis: b estimates.)

Figure 26.9. Changes in the residual as b is changed for Colony 2, leaving
the other parameters fixed. (Plot title: How sensitive is the residual to changes
in b?; residual norm plotted against b.)

Figure 26.10. Best (smallest) residuals for Colony 2 computed as a
function of the parameter b (blue circles) compared with the red dotted line, indicating
a 10% increase over the minimal computed residual. (Plot title: Comparison of
optimal residual norm with best found for fixed b; legend: Best, 1.05*Optimal.)

Figure 26.11. Best (smallest) residuals for Colony 2 computed as a
function of the parameter b (blue circles) compared with the red dotted line, indicating
a 10% increase over the minimal computed residual, using continuation. (Plot title:
Comparison of optimal residual norm with best found for fixed b; legend: Best,
1.05*Optimal.)

Figure 26.12. Revised Model predictions for Colony 2, with parameters
cel = 0.008930, cea = cpa = 0, b = 7.5, μL = 0.515596, μA = 0.776820.
(Predictions by Dennis et al. (+) and our calculations (square); panels for Larvae,
Pupae, and Adults.)

Unit VII

SOLUTIONS: Sparse Matrix Computations, with Application to Partial Differential Equations

Chapter 27

Solutions: Solving Sparse Linear Systems: Taking the Direct Approach
CHALLENGE 27.1.
(a) We notice that in Gauss elimination, we need only 5 row operations to zero
elements in the lower triangle of the matrix, and the only row of the matrix that
is changed is the last row. Since this row has no zeros, no new nonzeros can be
produced.
(b) Since $P^T P = I$, we see that

\[ Ax = b \iff PAx = Pb \iff PAP^T(Px) = Pb, \]

and this verifies that the reordered system has the same solution as the original
one.
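
Part (a)'s no-fill observation, and the payoff of a symmetric reordering as in part (b), can be illustrated on a small example. The Python sketch below builds a 6 × 6 "arrow" matrix (diagonal plus one dense row and column) and counts the nonzeros left after Gauss elimination without pivoting, once with the dense row last and once with it first. The specific entries (10 on the diagonal, 1 elsewhere) are assumptions chosen to keep the pivots safely nonzero, and the function names are hypothetical.

```python
def arrow_matrix(n, dense_index):
    """Diagonal matrix plus one dense row and column at position dense_index."""
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                A[i][j] = 10.0
            elif i == dense_index or j == dense_index:
                A[i][j] = 1.0
    return A

def nonzeros_after_elimination(A, tol=1e-12):
    """Gauss elimination without pivoting; multipliers overwrite the zeroed entries."""
    n = len(A)
    W = [row[:] for row in A]
    for k in range(n - 1):
        for i in range(k + 1, n):
            if W[i][k] != 0.0:
                mult = W[i][k] / W[k][k]
                for j in range(k + 1, n):
                    W[i][j] -= mult * W[k][j]
                W[i][k] = mult
    return sum(1 for row in W for value in row if abs(value) > tol)

n = 6
original_nonzeros = n + 2 * (n - 1)
fill_dense_last = nonzeros_after_elimination(arrow_matrix(n, n - 1))
fill_dense_first = nonzeros_after_elimination(arrow_matrix(n, 0))

print("nonzeros in A:                       ", original_nonzeros)
print("nonzeros in factors, dense row last: ", fill_dense_last)
print("nonzeros in factors, dense row first:", fill_dense_first)
```

The two matrices differ only by the symmetric permutation that reverses the ordering of the unknowns, so they describe the same linear system; only the amount of fill changes.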

CHALLENGE 27.2. For part (b), the important observation is that if element
k is the first nonzero in row ℓ, then we start the elimination on row ℓ by a pivot
operation with row k, after row k already has zeros in its first k − 1 positions.
Therefore, an induction argument shows that no new nonzeros can be created before
the first nonzero in a row. A similar argument works for the columns. Part (a) is
a special case of this.

CHALLENGE 27.3. The graph is shown in Figure 27.1. The given matrix
is a permutation of a band matrix with bandwidth 2, and Reverse Cuthill-McKee
was able to determine this and produce an optimal ordering. The reorderings and
number of nonzeros in the Cholesky factor (nz(L)) are

Method                         Ordering                nz(L)
Original:                      1 2 3 4 5 6 7 8 9 10    27
Reverse Cuthill-McKee:         1 5 3 9 7 4 6 10 2 8    22
Minimum degree:                2 8 10 6 1 3 5 9 4 7    24
Nested dissection (1 level):   8 2 10 6 4 9 3 5 1 7    25
Eigenpartition (1 level):      1 3 5 9 2 4 6 7 8 10    25

(Note that these answers are not unique, due to tiebreaking.) The resulting matrices
and factors are shown in Figures 27.2-27.6.

Figure 27.1. The graph of the matrix of Problem 3. (Nodes drawn in the
order 8, 2, 10, 6, 4, 7, 9, 3, 5, 1.)

CHALLENGE 27.4. Using a double precision word (2 words, or 8 bytes) as the
unit of storage and seconds as the unit of time, here are the results:

Solving Laplace equation on circle sector with n = 1208

Algorithm                    storage    time       residual norm
Cholesky                     660640     1.14e+00   4.04e-15
Cholesky, R-Cuthill-McKee    143575     7.21e-02   2.82e-15
Cholesky, minimum degree     92008      5.18e-02   1.96e-15
Cholesky, approx. mindeg     76912      1.70e-01   1.68e-15
Cholesky, eigenpartition     90232      4.59e+00   1.86e-15

Figure 27.2. Results of using original ordering. (Sparsity patterns of S,
nz = 34, and of the Cholesky decomposition of S, nz = 27.)

Figure 27.3. Results of reordering using reverse Cuthill-McKee. (Sparsity
patterns of S(r,r), nz = 34, and of chol(S(r,r)), nz = 22.)

Solving Laplace equation on circle sector with n = 4931

Algorithm                    storage    time       residual norm
Cholesky                     6204481    3.21e+01   7.73e-15
Cholesky, R-Cuthill-McKee    1113694    7.08e-01   5.30e-15
Cholesky, minimum degree     486751     2.78e-01   2.85e-15
Cholesky, approx. mindeg     444109     2.34e-01   2.81e-15

Figure 27.4. Results of reordering using minimum degree. (Sparsity patterns
of S(r,r), nz = 34, and of chol(S(r,r)), nz = 24.)

Figure 27.5. Results of reordering using nested dissection. (Sparsity patterns
of S(r,r), nz = 34, and of chol(S(r,r)), nz = 25.)

(There were too many recursions in the eigenpartition method specnd from the
Mesh Partitioning and Graph Separator Toolbox of Gilbert and Teng,
http://www.cerfacs.fr/algor/Softs/MESHPART/.)

Figure 27.6. Results of reordering using eigenpartitioning. (Sparsity patterns
of S(r,r), nz = 34, and of chol(S(r,r)), nz = 25.)

Solving Laplace equation on box, with n = 15625

Algorithm                    storage     time       residual norm
Cholesky                     28565072    1.02e+02   6.98e-14
Cholesky, R-Cuthill-McKee    16773590    3.79e+01   6.10e-14
Cholesky, minimum degree     8796896     4.08e+01   4.39e-14
Cholesky, approx. mindeg     7549652     3.08e+01   3.66e-14

(There were too many recursions in eigenpartition method specnd.)
All algorithms produced solutions with small residual norm. On each problem,
the approximate minimum degree algorithm gave factors requiring the lowest stor-
age, preserving sparsity the best, and on the last two problems, it used the least time
as well. (Note that local storage used within MATLAB’s symrcm, symmmd, symamd,
and the toolbox specnd was not counted in this tabulation.) It is quite expensive
to compute the eigenpartition ordering, and this method should only be used if the
matrices will be used multiple times so that the cost can be amortized. To complete
this study, it would be important to try different values of n, to determine the rate
of increase of the storage and time as n increased.
To judge performance, several hardware parameters are significant, including
computer (Sun Blade 1000 Model 1750), processor (Sun UltraSPARC-III), clock
speed (750 MHz), and amount of RAM (1 Gbyte). The software specifications
of importance include the operating system (Solaris 8) and the MATLAB version
(6.5.1). Benchmarking is a difficult task, depending on the choice of hardware,
software, and test problems, and our results on this problem should certainly raise
more questions than they answer.

Chapter 28

Solutions: Iterative Methods for Linear Systems

CHALLENGE 28.1. See Solution to Challenge 6.

CHALLENGE 28.2. See the answer to Challenge 6.

CHALLENGE 28.3. See the answer to Challenge 6.

CHALLENGE 28.4. Consider our stationary iterative method

\[ M x^{(k+1)} = N x^{(k)} + b \]

or

\[ x^{(k+1)} = M^{-1} N x^{(k)} + M^{-1} b. \]

Manipulating these equations a bit, we get

\[
\begin{aligned}
x^{(k+1)} &= x^{(k)} + (M^{-1} N - I) x^{(k)} + M^{-1} b \\
&= x^{(k)} + M^{-1} (N - M) x^{(k)} + M^{-1} b \\
&= x^{(k)} + M^{-1} (b - A x^{(k)}) \\
&= x^{(k)} + M^{-1} r^{(k)}.
\end{aligned}
\]
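
The equivalence of the two forms can be checked on a small example. The Python sketch below uses the Jacobi splitting (M is the diagonal of A and N = M − A) on an assumed 3 × 3 diagonally dominant system whose solution is (1, 2, 3); the matrix, right-hand side, and function names are illustrative assumptions.

```python
def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def splitting_step(A, b, x):
    """Solve M x_new = N x + b with M = diag(A) and N = M - A (Jacobi splitting)."""
    n = len(b)
    return [(b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
            for i in range(n)]

def residual_step(A, b, x):
    """x_new = x + M^{-1} r with r = b - A x and M = diag(A)."""
    r = [bi - axi for bi, axi in zip(b, matvec(A, x))]
    return [x[i] + r[i] / A[i][i] for i in range(len(b))]

# Assumed example system with exact solution (1, 2, 3).
A = [[4.0, -1.0, 0.0], [-1.0, 4.0, -1.0], [0.0, -1.0, 4.0]]
b = [2.0, 4.0, 10.0]

x_split = [0.0, 0.0, 0.0]
x_resid = [0.0, 0.0, 0.0]
max_difference = 0.0
for k in range(50):
    x_split = splitting_step(A, b, x_split)
    x_resid = residual_step(A, b, x_resid)
    max_difference = max(max_difference,
                         max(abs(p - q) for p, q in zip(x_split, x_resid)))

print("splitting form:        ", x_split)
print("residual-update form:  ", x_resid)
print("largest difference seen:", max_difference)
```

The residual-update form is convenient in practice because the residual is usually needed anyway for a convergence test.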

CHALLENGE 28.5. See the answer to Challenge 6.

CHALLENGE 28.6. The solution to these five problems is given on the website
in solution20.m. The results for the square domain are shown in Figures 28.1
and 28.2. Gauss-Seidel took too many iterations to be competitive. The parameter
cut is the drop-tolerance for the incomplete Cholesky decomposition. The AMD-
Cholesky decomposition was the fastest algorithm for this problem, but it required
5.4 times the storage of cg and 2.6 times the storage of the pcg algorithm with
incomplete Cholesky preconditioner for the problem of size 16129. Without reorder-
ing, Cholesky was slow and very demanding of storage, requiring almost 30 million
double-precision words for the largest problem (almost 70 times as much as for the
AMD reordering).
Gauss-Seidel took a large amount of time per iteration. This is an artifact of
the implementation, since it is a bit tricky to get MATLAB to avoid working with
the zero elements when accessing a sparse matrix row-by-row. Challenge: look at
the program in gauss_seidel.m and try to speed it up. A better version is provided
in the solution to Challenge 32.4.
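To illustrate what avoiding the zero elements means, here is a minimal sketch of one Gauss-Seidel sweep over a matrix held in compressed sparse row (CSR) form. It is written in Python rather than MATLAB and is not the website's gauss_seidel.m or smooth.m; the array names `indptr`, `indices`, and `data` and the small test problem are assumptions made for the example.

```python
def gauss_seidel_sweep(indptr, indices, data, b, x):
    """One in-place Gauss-Seidel sweep for A x = b with A stored in CSR form.

    The nonzeros of row i are data[indptr[i]:indptr[i + 1]], located in
    columns indices[indptr[i]:indptr[i + 1]]; zeros are never visited.
    """
    n = len(b)
    for i in range(n):
        diag = 0.0
        s = b[i]
        for p in range(indptr[i], indptr[i + 1]):
            j = indices[p]
            if j == i:
                diag = data[p]
            else:
                s -= data[p] * x[j]  # x[j] is already updated when j < i
        x[i] = s / diag
    return x


# Illustrative 1-D model problem: tridiagonal stencil (-1, 2, -1), n = 4
n = 4
indptr, indices, data = [0], [], []
for i in range(n):
    for j, v in ((i - 1, -1.0), (i, 2.0), (i + 1, -1.0)):
        if 0 <= j < n:
            indices.append(j)
            data.append(v)
    indptr.append(len(indices))

b = [1.0, 0.0, 0.0, 1.0]  # chosen so that the exact solution is all ones
x = [0.0] * n
for _ in range(200):
    gauss_seidel_sweep(indptr, indices, data, b, x)
print(x)
```

Each sweep costs work proportional to the number of nonzeros, which is the behavior a row-by-row implementation should aim for.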
Results for the domain with the circle cut out were similar; see Figures 28.3
and 28.4.
[Figure 28.1: two log-log plots for Problem 1, "Number of iterations, Original ordering" and "Number of iterations, AMD ordering". Horizontal axis: number of unknowns; vertical axis: number of iterations. Legend: Chol, CG, GS, cut=.05, cut=.5.]

Figure 28.1. Number of iterations for the various methods applied to the
square domain.

[Figure 28.2: two log-log plots for Problem 1, "Time, Original ordering" and "Time, AMD ordering". Horizontal axis: number of unknowns; vertical axis: seconds. Legend: Chol, CG, GS, cut=.05, cut=.5.]

Figure 28.2. Timings for the various methods applied to the square domain.
[Figure 28.3: two log-log plots for Problem 2, "Number of iterations, Original ordering" and "Number of iterations, AMD ordering". Horizontal axis: number of unknowns; vertical axis: number of iterations. Legend: Chol, CG, GS, cut=.05, cut=.5.]

Figure 28.3. Number of iterations for the various methods applied to the
domain with the circle cut out.

[Figure 28.4: two log-log plots for Problem 2, "Time, Original ordering" and "Time, AMD ordering". Horizontal axis: number of unknowns; vertical axis: seconds. Legend: Chol, CG, GS, cut=.05, cut=.5.]

Figure 28.4. Timings for the various methods applied to the domain with
the circle cut out.
Chapter 29

Solutions: Case Study:


Elastoplastic Torsion:
Twist and Stress

CHALLENGE 29.1. A sample MATLAB program is available on the website.
We can estimate the error in E(u) by computing estimates with finer and finer
grids, using the finest one as an approximation to truth. We expect the error in
the estimates to drop by a factor of 4 each time the mesh size is halved (since
the error is proportional to $h^2$), and that is what we observe. The mesh of Figure
29.1 produces an energy estimate with estimated error less than 0.1; the resulting
solution is shown in Figure 29.2.
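The refinement test described above can be sketched on a stand-in problem. The example below does not compute the energy E(u); as an assumption made purely for illustration, it uses the composite trapezoid rule for an integral, whose error is also proportional to the square of the mesh size, and treats the finest estimate as the truth.

```python
import math


def trapezoid(f, a, b, n):
    """Composite trapezoid rule with n subintervals; the error is O(h^2)."""
    h = (b - a) / n
    interior = sum(f(a + i * h) for i in range(1, n))
    return h * (0.5 * f(a) + interior + 0.5 * f(b))


# Stand-in quantity (an assumption): the integral of sin on [0, pi]
levels = [4, 8, 16, 32, 64, 128, 1024]
estimates = [trapezoid(math.sin, 0.0, math.pi, n) for n in levels]

reference = estimates[-1]  # finest grid used as an approximation to truth
errors = [abs(e - reference) for e in estimates[:-1]]

# Ratio of successive estimated errors; a value near 4 indicates O(h^2)
ratios = [errors[i] / errors[i + 1] for i in range(len(errors) - 1)]
print(ratios)
```

The ratios drift slightly away from 4 on the finer grids because the reference value is itself only an estimate, which is worth keeping in mind when reading such a table.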

[Figure 29.1: plot of the triangular mesh on the unit disk; axes x and y.]

Figure 29.1. Mesh used for a circular cross-section.

[Figure 29.2: plot of the solution on the unit disk, with color scale from 0 to 2.5; axes x and y.]

Figure 29.2. Solution for the elastic model using a circular cross-section.

CHALLENGE 29.2. We set up the Lagrangian function

\[ L(x, y, \lambda) = (x - z_1)^2 + (y - z_2)^2 - \lambda \left( \left(\frac{x}{\alpha}\right)^2 + \left(\frac{y}{\beta}\right)^2 - 1 \right), \]

where the scalar $\lambda$ is the Lagrange multiplier for the constraint. Setting the three
partial derivatives to zero yields

\begin{align*}
2(x - z_1) - 2\lambda \frac{x}{\alpha^2} &= 0, \\
2(y - z_2) - 2\lambda \frac{y}{\beta^2} &= 0, \\
\left(\frac{x}{\alpha}\right)^2 + \left(\frac{y}{\beta}\right)^2 - 1 &= 0.
\end{align*}

We conclude that

\[ x = \frac{\alpha^2 z_1}{\alpha^2 - \lambda}, \tag{29.1} \]
\[ y = \frac{\beta^2 z_2}{\beta^2 - \lambda}, \tag{29.2} \]

as long as the denominators are nonzero. Since $|x| \le \alpha$ and $|y| \le \beta$, we conclude
that the solution we seek has $\lambda$ satisfying $0 \le \lambda \le \min(\alpha^2, \beta^2)$. So we can solve our
problem by solving the nonlinear equation

\[ f(\lambda) = \left(\frac{x}{\alpha}\right)^2 + \left(\frac{y}{\beta}\right)^2 - 1 = 0 \]

using (29.1) and (29.2) to define $x(\lambda)$ and $y(\lambda)$.


These formulas fail when $z_1 = 0$ or $z_2 = 0$. There are two points to check,
depending on whether it is shorter to move horizontally or vertically to the boundary.
When $\mathbf{z} = 0$, for example, the solution is either $(x, y) = (0, \beta)$ or $(\alpha, 0)$,
depending on whether $\beta$ or $\alpha$ is smaller. Full details are given in the sample program
for Challenge 3 and also in a description by David Eberly [1].
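The generic case can be sketched as follows. This is not the website's sample program: it is a minimal Python illustration that assumes both coordinates of z are nonzero and that z lies strictly inside the ellipse, so that f is negative at λ = 0 and increases without bound as λ approaches min(α², β²). Bisection is one possible choice of root finder, and the function name is hypothetical.

```python
import math


def closest_point_on_ellipse(z1, z2, alpha, beta, iters=200):
    """Closest point on (x/alpha)^2 + (y/beta)^2 = 1 to an interior point z.

    Assumes z1 != 0, z2 != 0, and z strictly inside the ellipse; the
    special cases discussed in the text are not handled here.
    """
    def point(lam):
        return (alpha**2 * z1 / (alpha**2 - lam),  # equation (29.1)
                beta**2 * z2 / (beta**2 - lam))    # equation (29.2)

    def f(lam):
        x, y = point(lam)
        return (x / alpha)**2 + (y / beta)**2 - 1.0

    lo, hi = 0.0, min(alpha**2, beta**2)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mid <= lo or mid >= hi:  # interval can no longer be split
            break
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return point(lo)


# Check on the unit circle, where the answer is z scaled to unit length
x, y = closest_point_on_ellipse(0.3, 0.4, 1.0, 1.0)
print(x, y, math.hypot(x - 0.3, y - 0.4))

# A genuinely elliptical case: the result should lie on the boundary
xe, ye = closest_point_on_ellipse(0.5, 0.5, 2.0, 1.0)
on_boundary = (xe / 2.0)**2 + (ye / 1.0)**2 - 1.0
print(xe, ye, on_boundary)
```

A faster root finder such as Newton's method could replace the bisection loop; bisection is used here only because it needs no derivative and cannot leave the bracketing interval.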

CHALLENGE 29.3. A sample program appears on the website as dist_to_ellipse.m.
The testing program plots the distances on a grid of points in the ellipse. Note that
it is important to test points that are near zero. To validate the program, we might
repeat the runs with various values of $\alpha$ and $\beta$, and also test the program for a
point $\mathbf{z}$ outside the ellipse.

CHALLENGE 29.4. The results are shown in Figures 29.3 and 29.4, created
with a program on the website. The meshes we used had the same refinement
as that determined for the circular domain of Challenge 1. A sensitivity analysis
should be done by refining the mesh once to see how much the solution changes in
order to obtain an error estimate.
Note that it would be more computationally efficient to take advantage of the
sequence of problems being solved by using the solution at the previous value of αθ
as an initial guess for the next value. See Chapter 24 for more information on such
continuation methods.

[1] David Eberly, Distance from a Point to an Ellipse in 2D, Magic Software, Inc.
www.magic-software.com/Documentation/DistanceEllipse2Ellipse2.pdf
Figure 29.3. Elasto-plastic solutions for various cross-sections. On the
left, $\alpha\theta = 0.5$; on the right, $\alpha\theta = 1.0$.

[Figure 29.4: plot of Torque/($\sigma_0 \alpha^3$) versus $G \alpha \theta / \sigma_0$ for b/a = 1.00, 0.80, 0.65, 0.50, and 0.20.]

Figure 29.4. Torque computed for various cross-sections as $\theta$ is increased.
The red stars mark the boundary between elastic solutions and elastoplastic solutions.
Chapter 30

Solutions: Case Study:


Fast Solvers and Sylvester
Equations: Both Sides
Now

CHALLENGE 30.1. Equating the $(j, k)$ element on each side of the equation

\[ B_y U + U B_x = F, \]

we obtain

\[ f(x_j, y_k) = \frac{1}{h^2} \bigl( -u(x_{j-1}, y_k) + 2u(x_j, y_k) - u(x_{j+1}, y_k) - u(x_j, y_{k-1}) + 2u(x_j, y_k) - u(x_j, y_{k+1}) \bigr), \]

which is the same as equation $(k - 1)n + j$ of

\[ (A_x + A_y)\mathbf{u} = \mathbf{f}. \]

CHALLENGE 30.2.
(a) Using MATLAB notation for subvectors, the algorithm is:

    for i = 1 : n,
        for j = 1 : n,
            U(i, j) = (C(i, j) - L(i, 1 : i-1) * U(1 : i-1, j) ...
                       - U(i, 1 : j-1) * R(1 : j-1, j)) / (L(i, i) + R(j, j));
        end
    end

The number of multiplications is

\[ \sum_{i=1}^{n} \sum_{j=1}^{n} (i - 1 + j - 1) = n^2 (n - 1), \]

and the other operations are also easy to count.
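The loop and the operation count can be checked with a small script. The following is a Python sketch of the same recurrence, assuming L is lower triangular and R is upper triangular; the 3-by-3 test matrices and the function name are illustrative assumptions.

```python
def solve_triangular_sylvester(L, R, C):
    """Solve L U + U R = C, with L lower and R upper triangular (lists of rows).

    Returns U together with the number of multiplications performed.
    """
    n = len(C)
    U = [[0.0] * n for _ in range(n)]
    mults = 0
    for i in range(n):
        for j in range(n):
            s = C[i][j]
            for k in range(i):  # L(i, 1:i-1) * U(1:i-1, j)
                s -= L[i][k] * U[k][j]
                mults += 1
            for k in range(j):  # U(i, 1:j-1) * R(1:j-1, j)
                s -= U[i][k] * R[k][j]
                mults += 1
            U[i][j] = s / (L[i][i] + R[j][j])
    return U, mults


# Illustrative data (an assumption); L(i,i) + R(j,j) is nonzero for all i, j
L = [[2.0, 0.0, 0.0], [1.0, 3.0, 0.0], [0.5, -1.0, 4.0]]
R = [[1.0, 2.0, -1.0], [0.0, 2.0, 0.5], [0.0, 0.0, 3.0]]
C = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
U, mults = solve_triangular_sylvester(L, R, C)

n = len(C)
residual = max(
    abs(sum(L[i][k] * U[k][j] + U[i][k] * R[k][j] for k in range(n)) - C[i][j])
    for i in range(n) for j in range(n)
)
print(mults, n * n * (n - 1), residual)
```

Row i of U depends only on earlier rows and on earlier entries of the same row, which is why the simple double loop suffices.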


(b) The algorithm fails if L(i, i) + R(j, j) = 0 for some value of i and j. The main
diagonal elements of triangular matrices are the eigenvalues of the matrix, so it is
necessary and sufficient that L and −R have no common eigenvalues.
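(c) If $AU + UB = C$, then
\[ W L W^* U + U Y R Y^* = C. \]
Multiplying on the left by $W^*$ and on the right by $Y$, we obtain $L\widehat{U} + \widehat{U}R = \widehat{C}$,
where $\widehat{U} = W^* U Y$ and $\widehat{C} = W^* C Y$.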


CHALLENGE 30.3. The algorithm of Challenge 2(a) reduces to
$U(i, j) = F(i, j)/(L(i, i) + R(j, j))$ for $i, j = 1, \ldots, n$, which requires $n^2$ additions and divisions.

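CHALLENGE 30.4.
(a) Recall the identities
\[ \sin(a \pm b) = \sin a \cos b \pm \cos a \sin b. \]
If we form $B_x$ times the $j$th column of $V$, then the $k$th element is

\begin{align*}
\frac{-v_{k-1,j} + 2v_{k,j} - v_{k+1,j}}{h^2}
&= \frac{\alpha_j}{h^2} \left( -\sin\frac{(k-1)j\pi}{n+1} + 2\sin\frac{kj\pi}{n+1} - \sin\frac{(k+1)j\pi}{n+1} \right) \\
&= \frac{\alpha_j}{h^2} \left( -\sin\frac{kj\pi}{n+1}\cos\frac{j\pi}{n+1} + \cos\frac{kj\pi}{n+1}\sin\frac{j\pi}{n+1} + 2\sin\frac{kj\pi}{n+1} \right. \\
&\qquad\qquad \left. - \sin\frac{kj\pi}{n+1}\cos\frac{j\pi}{n+1} - \cos\frac{kj\pi}{n+1}\sin\frac{j\pi}{n+1} \right) \\
&= \frac{\alpha_j}{h^2} \left( 2 - 2\cos\frac{j\pi}{n+1} \right) \sin\frac{kj\pi}{n+1} \\
&= \frac{1}{h^2} \left( 2 - 2\cos\frac{j\pi}{n+1} \right) v_{k,j} \\
&= \lambda_j v_{k,j}.
\end{align*}

Stacking these elements we obtain $B_x \mathbf{v}_j = \lambda_j \mathbf{v}_j$.
(b) This follows by writing the $k$th component of $V\mathbf{y}$.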
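The eigenvalue relation in part (a) is easy to confirm numerically. This is a minimal sketch with assumptions stated in the comments: the scaling constant is taken to be 1, the mesh size is taken to be 1/(n+1), and n = 6 is an arbitrary choice.

```python
import math

n = 6
h = 1.0 / (n + 1)  # assumed mesh size for n interior points on (0, 1)


def v(k, j):
    """k-th entry of the j-th eigenvector with alpha_j = 1 (an assumption).

    The formula also gives the zero boundary values at k = 0 and k = n + 1.
    """
    return math.sin(k * j * math.pi / (n + 1))


max_err = 0.0
for j in range(1, n + 1):
    lam = (2.0 - 2.0 * math.cos(j * math.pi / (n + 1))) / h**2
    for k in range(1, n + 1):
        # k-th element of B_x v_j: second-difference stencil (-1, 2, -1) / h^2
        bx_v = (-v(k - 1, j) + 2.0 * v(k, j) - v(k + 1, j)) / h**2
        max_err = max(max_err, abs(bx_v - lam * v(k, j)))
print(max_err)  # should be at rounding-error level
```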

CHALLENGE 30.5. See the website for the programs. The results are shown
in Figures 30.1 and 30.2. All of the algorithms give accurate results, but as n gets
large, the efficiency of the fast algorithm of Challenge 4 becomes more apparent.
[Figure 30.1: two plots of time (sec) versus n for n up to 600, the second with a logarithmic vertical axis. Legend: Matlab backslash, Schur algorithm, Fast sin transform.]

Figure 30.1. The time (seconds on a Sun UltraSPARC-III with clock speed
750 MHz running MATLAB 6) taken by the three algorithms as a function of n.
The bottom plot uses logscale to better display the times for the fast sine transform.

[Figure 30.2: two plots versus n for n up to 600, showing (error norm)/n and (residual norm)/n. Legend: Matlab backslash, Schur algorithm, Fast sin transform.]

Figure 30.2. The accuracy of the three algorithms as a function of n.


Chapter 31

Solutions: Case Study:


Eigenvalues: Valuable
Principles

CHALLENGE 31.1. We can verify by direct computation that $w_{m\ell}$ satisfies the
boundary condition and that $-\partial^2 w_{m\ell}/\partial x^2 - \partial^2 w_{m\ell}/\partial y^2$ is $(m^2 + \ell^2)\pi^2/b^2$ times
$w_{m\ell}$, so $\lambda_{m\ell} = (m^2 + \ell^2)\pi^2/b^2$.

CHALLENGE 31.2.
(a) The eigenvalues are
\[ \lambda_{jk} = \frac{j^2 + k^2}{4} \pi^2 \]
for $j, k = 1, 2, \ldots$. One expression for the eigenfunction is

\[ v_{jk} = \sin(j\pi(x + 1)/2) \sin(k\pi(y + 1)/2). \]

This is not unique, since some of the eigenvalues are multiple. So, for example, $\lambda_{12} = \lambda_{21}$,
and any function $a v_{12} + b v_{21}$, for arbitrary scalars $a$ and $b$, is an eigenfunction.
Even for simple eigenvalues, the function $v_{jj}$ can be multiplied by an arbitrary
constant, positive or negative.
The first six vjk are plotted in Figure 31.1, and it is an interesting exercise
to describe them in words. Note that as the eigenvalue increases, the number
of oscillations in the eigenfunction increases. In order to capture this behavior
in a piecewise linear approximation, we need a finer mesh for the eigenfunctions
corresponding to larger eigenvalues than we do for those corresponding to smaller
eigenvalues.
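As a quick check of the formula in part (a), the smallest eigenvalues can be enumerated directly. This is a minimal sketch; the cutoff of 5 on the indices j and k is an arbitrary assumption that is large enough to capture the six smallest values.

```python
import math

# Enumerate (j^2 + k^2) * pi^2 / 4 over a small index range and sort
pairs = [(j, k) for j in range(1, 6) for k in range(1, 6)]
eigs = sorted((j * j + k * k) * math.pi**2 / 4.0 for j, k in pairs)

first_six = [round(lam, 4) for lam in eigs[:6]]
print(first_six)
```

The repeated values in the output correspond to the index pairs (1, 2), (2, 1) and (1, 3), (3, 1), which is the multiplicity mentioned above.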
Figure 31.1. Eigenfunctions corresponding to the eigenvalues $\lambda = 4.9348, 12.3370, 12.3370$ (top row) and $\lambda = 19.7392, 24.6740, 24.6740$ (bottom row).

(b) When using piecewise linear finite elements, the $j$th computed eigenvalue lies
in an interval $[\lambda_j, \lambda_j + C_j h^2]$, where $h$ is the mesh size used in the triangulation.
This is observed in our computation using the program problem1b.m, found on
the website. The error plots are shown in Figure 31.2. The horizontal axis is the
number of triangles, which is approximately proportional to $1/h^2$. The errors in
the approximate eigenvalues are as follows:

    λ_j       Mesh 1      Mesh 2      Mesh 3      Mesh 4
    j = 1     5.03e-02    1.27e-02    3.20e-03    8.02e-04
    j = 6     1.29e+00    3.25e-01    8.15e-02    2.04e-02
    j = 11    4.17e+00    1.04e+00    2.60e-01    6.50e-02
    j = 16    8.54e+00    2.12e+00    5.28e-01    1.32e-01
    j = 21    1.49e+01    3.67e+00    9.15e-01    2.29e-01
[Figure 31.2: log-log plot titled "Errors in eigenvalues as a function of $1/h^2$". Horizontal axis: (approx) $1/h^2$; vertical axis: error in eigenvalue. Legend: $\lambda_1$, $\lambda_6$, $\lambda_{11}$, $\lambda_{16}$, $\lambda_{21}$.]

Figure 31.2. The errors in the eigenvalue approximations.

The error ratios are as follows:

    λ_j       Mesh 1 vs. 2    Mesh 2 vs. 3    Mesh 3 vs. 4
    j = 1     3.95e+00        3.98e+00        3.99e+00
    j = 6     3.98e+00        3.98e+00        3.99e+00
    j = 11    4.02e+00        4.00e+00        4.00e+00
    j = 16    4.04e+00        4.01e+00        4.00e+00
    j = 21    4.05e+00        4.01e+00        4.00e+00

Therefore, the error is reduced by a factor of 4 as the side of each triangle is reduced
by a factor of 2, so the error is $O(h^2)$, as expected, but the larger the eigenvalue,
the finer the mesh necessary to achieve a given accuracy.
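The ratios and the implied convergence order can be recomputed from the tabulated errors. The sketch below copies the rounded error values from the table above, so its ratios differ slightly in the last digit from those listed, which were presumably computed from unrounded errors. It also assumes that each mesh halves the triangle side of the previous one.

```python
import math

# Errors copied from the table above: key j, values for Mesh 1 through Mesh 4
errors = {
    1:  [5.03e-02, 1.27e-02, 3.20e-03, 8.02e-04],
    6:  [1.29e+00, 3.25e-01, 8.15e-02, 2.04e-02],
    11: [4.17e+00, 1.04e+00, 2.60e-01, 6.50e-02],
    16: [8.54e+00, 2.12e+00, 5.28e-01, 1.32e-01],
    21: [1.49e+01, 3.67e+00, 9.15e-01, 2.29e-01],
}

orders = {}
for j, e in errors.items():
    ratios = [e[i] / e[i + 1] for i in range(len(e) - 1)]
    # If error ~ C h^p and h is halved (assumed), then p = log2(ratio)
    orders[j] = [math.log2(r) for r in ratios]
    print(j, [round(r, 2) for r in ratios], [round(p, 2) for p in orders[j]])
```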

CHALLENGE 31.3.
(a) Suppose (for convenience of notation) that $\Omega \subset \mathbb{R}^2$. (Other dimensions are just
as easy.) First we apply integration by parts (with zero boundary conditions) to
see that if $w \ne 0$,

\begin{align*}
(w, Aw) &= -\iint_\Omega w \, \nabla \cdot (a \nabla w) \, dx \, dy \\
        &= \iint_\Omega \nabla w \cdot (a \nabla w) \, dx \, dy \\
        &= \iint_\Omega \left[ a_1(x, y) \left(\frac{\partial w}{\partial x}\right)^2 + a_2(x, y) \left(\frac{\partial w}{\partial y}\right)^2 \right] dx \, dy \\
        &\ge 0,
\end{align*}

since $a_1(x, y), a_2(x, y) > 0$.
Suppose $Aw = \lambda w$. Then

\[ 0 \le (w, Aw) = \lambda (w, w), \]

so $\lambda \ge 0$.
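(b) We know that
\[ \lambda_1(\Omega) = \min_{w \ne 0} \frac{(w, Aw)}{(w, w)}, \]
where the integrals are taken over $\Omega$ and $w$ is constrained to be zero on the boundary
of $\Omega$. Suppose that the $w$ that minimizes the function is $\tilde{w}$ and let's extend $\tilde{w}$ to
make it zero over the part of $\tilde{\Omega}$ not contained in $\Omega$. Then

\[ \lambda_1(\tilde{\Omega}) = \min_{w \ne 0} \frac{(w, Aw)}{(w, w)} \le \frac{(\tilde{w}, A\tilde{w})}{(\tilde{w}, \tilde{w})} = \lambda_1(\Omega). \]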

CHALLENGE 31.4. From Challenge 1, we know that the smallest eigenvalue
for a square with dimension $b$ is $2\pi^2/b^2$, so we want $b = \sqrt{2}/2$. Using MATLAB’s
PDE Toolbox interactively, we discover that $\alpha \approx 1.663$.
Chapter 32

Solutions: Multigrid
Methods: Managing
Massive Meshes

CHALLENGE 32.1. The V-cycle performs the following operations:

    η1 Gauss–Seidel iterations for h = 1/16:      15η1 multiplications (by h^2/2)
    Residual evaluation for h = 1/16:             16 multiplications
    Multiplication by R_{1/8}:                    14 multiplications

    η1 Gauss–Seidel iterations for h = 1/8:       7η1 multiplications
    Residual evaluation for h = 1/8:              8 multiplications
    Multiplication by R_{1/4}:                    6 multiplications

    η1 Gauss–Seidel iterations for h = 1/4:       3η1 multiplications
    Residual evaluation for h = 1/4:              4 multiplications
    Multiplication by R_{1/2}:                    2 multiplications

    Direct solution of the system for h = 1/2:    1 multiplication

    Multiplication by P_{1/2}:                    2 multiplications
    η2 Gauss–Seidel iterations for h = 1/4:       3η2 multiplications

    Multiplication by P_{1/4}:                    6 multiplications
    η2 Gauss–Seidel iterations for h = 1/8:       7η2 multiplications

    Multiplication by P_{1/8}:                    14 multiplications
    η2 Gauss–Seidel iterations for h = 1/16:      15η2 multiplications

The total cost is less than the cost of 2(η1 + η2) iterations, plus 2 residual
calculations, all done on the h = 1/16 grid, plus 4 multiplications by R_{1/8}.
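The bound can be checked by adding up the counts listed above. The following is a minimal sketch; the function names are hypothetical, and the per-operation counts are simply those tabulated in this solution.

```python
def v_cycle_cost(eta1, eta2):
    """Multiplications in one V-cycle on grids h = 1/16, 1/8, 1/4, 1/2."""
    gs = [15, 7, 3]        # one Gauss-Seidel iteration for h = 1/16, 1/8, 1/4
    residual = [16, 8, 4]  # residual evaluation on the same three grids
    transfer = [14, 6, 2]  # one multiplication by R_h (same counts for P_h)
    direct = 1             # direct solve for h = 1/2
    down = eta1 * sum(gs) + sum(residual) + sum(transfer)
    up = sum(transfer) + eta2 * sum(gs)
    return down + direct + up


def fine_grid_bound(eta1, eta2):
    """2(eta1 + eta2) iterations and 2 residuals on h = 1/16, plus 4 uses of R_{1/8}."""
    return 2 * (eta1 + eta2) * 15 + 2 * 16 + 4 * 14


for eta1, eta2 in [(1, 1), (2, 1), (3, 3)]:
    print(eta1, eta2, v_cycle_cost(eta1, eta2), fine_grid_bound(eta1, eta2))
```

In closed form the cost is 25(η1 + η2) + 73 multiplications, against a bound of 30(η1 + η2) + 88.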


CHALLENGE 32.2. The right-hand sides have

15 + 7 + 3 + 1 = 16(1 + 1/2 + 1/4 + 1/8) − 4,

elements, which is less than twice the storage necessary for the right-hand side for
the finest grid. The same is true for the solution vectors. Similarly, each matrix Ah
has at most half of the number of nonzeros of the one for the next finer grid, so the
total matrix storage is less than 2 times that for A1/16 .
The matrices Ph can be stored as sparse matrices or, since we only need to form
their products with vectors, we can just write a function to perform multiplication
without explicitly storing them.
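Both points can be illustrated briefly. The sketch below first checks the storage count and then applies an interpolation operator as a function, without storing a matrix. It assumes that the operator is linear interpolation on a 1-D grid with zero boundary values; that choice and the function name are assumptions made for the example, not a statement of how the website's code is written.

```python
# Storage for the right-hand sides on grids h = 1/16, 1/8, 1/4, 1/2
sizes = [15, 7, 3, 1]
total = sum(sizes)
print(total, 2 * sizes[0])  # total is less than twice the finest-grid storage


def prolong(coarse):
    """Map m interior coarse-grid values to 2m + 1 fine-grid values.

    Assumes linear interpolation with zero boundary values (an assumption).
    """
    m = len(coarse)
    padded = [0.0] + list(coarse) + [0.0]
    fine = []
    for i in range(m + 1):
        # New fine point halfway between two coarse neighbors
        fine.append(0.5 * (padded[i] + padded[i + 1]))
        if i < m:
            fine.append(coarse[i])  # fine point coinciding with a coarse point
    return fine


fine_from_one = prolong([1.0])
fine_from_three = prolong([1.0, 2.0, 3.0])
print(fine_from_one, fine_from_three)
```

A restriction operator can be applied in the same matrix-free way, so only the grid values themselves need to be stored.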

CHALLENGE 32.3. In Chapter 27 we saw that the fastest algorithm for the
finest grid for myproblem=1 was the AMD-Cholesky algorithm, which, on my
computer, took about 0.2 seconds and required storage about 5 times that for the matrix. The
fastest iterative method, conjugate gradients with an incomplete Cholesky
preconditioner, took 0.9 seconds. My implementation of multigrid for this problem took
4 iterations and 8.2 seconds. The virtue of multigrid, though, is that if we want a finer
grid, we will probably still get convergence in about 4 iterations, while the number
of iterations of the other algorithms increases as h decreases, so eventually multigrid will
win.

CHALLENGE 32.4. The number of iterations remained 4 for κ = 10, 100, and
−10, but for κ = −100, multigrid failed to converge. As noted in the challenge, a
more complicated algorithm is necessary.
Note that the function smooth.m is a much faster implementation of Gauss–
Seidel than that given in the solution to Challenge 28.6 in Chapter 28.
