EijkhoutProgrammingProjects Book
OTHER LANGUAGES
VICTOR EIJKHOUT
2020
2 Introduction to Scientific Programming
Contents
I SIMPLE PROJECTS
1 Prime numbers
1.1 Arithmetic
1.2 Conditionals
1.3 Looping
1.4 Functions
1.5 While loops
1.6 Classes and objects
1.6.1 Exceptions
1.6.2 Prime number decomposition
1.7 Other
1.8 Eratosthenes sieve
1.8.1 Arrays implementation
1.8.2 Streams implementation
1.9 Range implementation
1.10 User-friendliness
2 Geometry
2.1 Basic functions
2.2 Point class
2.3 Using one class in another
2.4 Is-a relationship
2.5 Pointers
2.6 More stuff
3 Zero finding
3.1 Root finding by bisection
3.1.1 Simple implementation
3.1.2 Polynomials
3.1.3 Left/right search points
3.1.4 Root finding
3.1.5 Object implementation
3.1.6 Templating
3.2 Newton’s method
3.2.1 Function implementation
3.2.2 Using lambdas
4 Eight queens
II RESEARCH PROJECTS
5 Infectious disease simulation
5.1 Model design
5.1.1 Other ways of modeling
5.2 Coding
5.2.1 The basics
5.2.2 Population
5.2.3 Contagion
5.2.4 Spreading
5.2.5 Mutation
5.2.6 Diseases without vaccine: Ebola and Covid-19
5.3 Ethics
5.4 Project writeup and submission
5.4.1 Program files
5.4.2 Writeup
5.5 Bonus: mathematical analysis
6 Google PageRank
6.1 Basic ideas
6.2 Clicking around
6.3 Graph algorithms
6.4 Page ranking
6.5 Graphs and linear algebra
7 Redistricting
7.1 Basic concepts
7.2 Basic functions
7.2.1 Voters
7.2.2 Populations
7.2.3 Districting
7.3 Strategy
7.4 Efficiency: dynamic programming
7.5 Extensions
7.6 Ethics
8 Amazon delivery truck scheduling
8.1 Problem statement
8.2 Coding up the basics
8.2.1 Address list
8.2.2 Add a depot
8.2.3 Greedy construction of a route
8.3 Optimizing the route
8.4 Multiple trucks
III APPENDIX
13 Style guide for project submissions
13.1 General approach
13.2 Style
13.3 Structure of your writeup
13.3.1 Introduction
SIMPLE PROJECTS
Chapter 1
Prime numbers
In this chapter you will do a number of exercises regarding prime numbers that build on each other. Each
section lists the required prerequisites. Conversely, the exercises here are also referenced from the earlier
chapters.
1.1 Arithmetic
Exercise 1.1. Read two numbers and print out their modulus. The modulus operator is x%y.
• Can you also compute the modulus without the operator?
• What do you get for negative inputs, in both cases?
• Assign all your results to a variable before outputting them.
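One possible approach to the first bullet, as a sketch only: since integer division in C++ truncates toward zero, the remainder can be rebuilt from the quotient. The function name manual_mod is an assumption made for this illustration.

```cpp
#include <cassert>

// Remainder of x divided by y, computed without the % operator.
// Hypothetical helper; relies on integer division truncating toward zero.
int manual_mod(int x, int y) {
  assert(y != 0);            // division by zero is undefined
  int quotient = x / y;      // truncated quotient
  return x - quotient * y;   // what is left after removing whole multiples of y
}
```

Comparing manual_mod(x,y) with x%y for a few negative inputs is a quick way to explore the second bullet.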
1.2 Conditionals
Exercise 1.2. Read two numbers and print a message stating whether the second is a divisor of the
first:
Code:
int number,divisor;
bool is_a_divisor;
/* ... */
if (
    /* ... */
   ) {
  cout << "Indeed, " << divisor
       << " is a divisor of "
       << number << '\n';
} else {
  cout << "No, " << divisor
       << " is not a divisor of "
       << number << '\n';
}
Output [primes] division:
( echo 6 ; echo 2 ) | divisiontest
Enter a number:
Enter a trial divisor:
Indeed, 2 is a divisor of 6
( echo 9 ; echo 2 ) | divisiontest
Enter a number:
Enter a trial divisor:
No, 2 is not a divisor of 9
1.3 Looping
Exercise 1.3. Read an integer and set a boolean variable to determine whether it is prime by testing
for the smaller numbers if they divide that number.
Print a final message
Your number is prime
or
Your number is not prime: it is divisible by ....
Exercise 1.4. Rewrite the previous exercise with a boolean variable to represent the primeness of the
input number.
Exercise 1.5. Read in an integer r. If it is prime, print a message saying so. If it is not prime, find
integers p ≤ q so that r = p · q and so that p and q are as close together as possible. For instance, for
r = 30 you should print out 5,6, rather than 3,10. You are allowed to use the function sqrt.
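The search for the closest factor pair can be sketched as follows: start at the integer part of the square root and walk downward until a divisor is found. This is one possible approach, not the only one; the name closest_factors and the choice to return a pair are assumptions.

```cpp
#include <cassert>
#include <cmath>
#include <utility>

// Returns {p,q} with p <= q, p*q == r, and p as close to sqrt(r) as possible.
// For a prime r the result is {1,r}. Hypothetical helper name.
std::pair<int,int> closest_factors(int r) {
  assert(r >= 1);
  int p = static_cast<int>(std::sqrt(static_cast<double>(r)));
  while (p * p > r) --p;      // guard in case sqrt was rounded upward
  while (r % p != 0) --p;     // stops at p == 1 at the latest
  return {p, r / p};
}
```

A result with p equal to 1 signals that r is prime (or that r is 1).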
1.4 Functions
Above you wrote several lines of code to test whether a number was prime.
Exercise 1.6. Write a function is_prime that has an integer parameter, and returns a boolean
corresponding to whether the parameter was prime.
int main() {
bool isprime;
isprime = is_prime(13);
Read the number in, and print the value of the boolean.
Does your function have one or two return statements? Can you imagine what the other possibility
looks like? Do you have an argument for or against it?
Exercise 1.7. Take your prime number testing function is_prime, and use it to write a program
that prints multiple primes:
• Read an integer how_many from the input, indicating how many (successive) prime numbers
should be printed.
• Print that many successive primes, each on a separate line.
• (Hint: keep a variable number_of_primes_found that is increased whenever a new
prime is found.)
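As a minimal sketch of these two exercises, assuming trial division is acceptable: the version of is_prime below returns early as soon as a divisor is found, which is one of the two designs the question about return statements alludes to. The function first_primes is a hypothetical name; it returns the primes in a vector rather than printing them, so that the result can be checked.

```cpp
#include <cassert>
#include <vector>

// Trial division: only divisors up to the square root need testing.
bool is_prime(int number) {
  if (number < 2) return false;            // 0, 1 and negatives are not prime
  for (int divisor = 2; divisor <= number / divisor; ++divisor) {
    if (number % divisor == 0)
      return false;                        // early return: a divisor was found
  }
  return true;                             // no divisor found
}

// Collect the first how_many primes, counting them as the hint suggests.
std::vector<int> first_primes(int how_many) {
  assert(how_many >= 0);
  std::vector<int> primes;
  int number_of_primes_found = 0;
  for (int candidate = 2; number_of_primes_found < how_many; ++candidate) {
    if (is_prime(candidate)) {
      primes.push_back(candidate);
      ++number_of_primes_found;
    }
  }
  return primes;
}
```

Printing each element of the returned vector on its own line gives the output the exercise asks for.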
In the previous exercise you defined the primegenerator class, and you made one object of that
class:
primegenerator sequence;
But you can make multiple generators, that all have their own internal data and are therefore independent
of each other.
Exercise 1.9. The Goldbach conjecture says that every even number, from 4 on, is the sum of two
primes p + q. Write a program to test this for the even numbers up to a bound that you read in. Use the
primegenerator class you developed in exercise 1.8.
This is a great exercise for a top-down approach!
1. Make an outer loop over the even numbers e.
2. For each e, generate all primes p.
3. From p + q = e, it follows that q = e − p must also be prime: test whether that q is prime.
For each even number e then print e,p,q, for instance:
The number 10 is 3+7
If multiple possibilities exist, only print the first one you find.
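The three steps above can be sketched as follows. Note the assumption: instead of the primegenerator class, this illustration uses a plain is_prime function so that it is self-contained; goldbach_pair is a hypothetical name.

```cpp
#include <cassert>
#include <utility>

// Trial division primality test (stand-in for the primegenerator class).
bool is_prime(int number) {
  if (number < 2) return false;
  for (int divisor = 2; divisor <= number / divisor; ++divisor)
    if (number % divisor == 0) return false;
  return true;
}

// First decomposition e = p + q with p <= q and both prime.
// Returns {0,0} if none exists, which would contradict the conjecture.
std::pair<int,int> goldbach_pair(int e) {
  assert(e >= 4 && e % 2 == 0);
  for (int p = 2; p <= e / 2; ++p) {
    if (is_prime(p) && is_prime(e - p))
      return {p, e - p};        // only the first pair found is reported
  }
  return {0, 0};
}
```

The outer loop over the even numbers e then only needs to call goldbach_pair(e) and print the result.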
An interesting corollary of the Goldbach conjecture is that each prime (start at 5) is equidistant between
two other primes.
The Goldbach conjecture says that every even number 2n (starting at 4), is the sum of two primes
p + q:
2n = p + q.
1.6.1 Exceptions
Exercise 1.11. Revisit the prime generator class (exercise 1.8) and
let it throw an exception once the candidate number is too large. (You can hardwire this maximum, or
use a limit; section ??.)
Code:
try {
  do {
    auto cur = primes.nextprime();
    cout << cur << '\n';
  } while (true);
} catch ( string s ) {
  cout << s << '\n';
}
Output [primes] genx:
9931
9941
9949
9967
9973
Reached max int
You can implement this decomposition itself as a vector (the i-th location stores the exponent of the i-th
prime), but let’s use a map instead.
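A minimal sketch of the map-based representation, under assumptions: the class name Integer and the methods as_string and as_int follow the exercises below, while the internal layout (a std::map from prime to exponent, filled by repeated division) is only one possible choice.

```cpp
#include <cassert>
#include <map>
#include <string>

class Integer {
  std::map<int,int> exponents_;   // prime -> exponent
public:
  explicit Integer(int n) {
    assert(n >= 2);
    for (int p = 2; p <= n / p; ++p)
      while (n % p == 0) { ++exponents_[p]; n /= p; }
    if (n > 1) ++exponents_[n];   // whatever remains is itself prime
  }
  // Render the decomposition, for instance "2^2 3^2 5^1".
  std::string as_string() const {
    std::string result;
    for (const auto& entry : exponents_) {
      if (!result.empty()) result += ' ';
      result += std::to_string(entry.first) + "^" + std::to_string(entry.second);
    }
    return result;
  }
  // Multiply the prime powers back together.
  int as_int() const {
    int value = 1;
    for (const auto& entry : exponents_)
      for (int i = 0; i < entry.second; ++i) value *= entry.first;
    return value;
  }
};
```

Because std::map keeps its keys sorted, the primes come out in increasing order without extra work.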
Exercise 1.12. Write a constructor of an Integer from an int, and methods as_int / as_string
that convert the decomposition back to something classical. Start by assuming that each prime factor
appears only once.
Code:
Integer i2(2);
cout << i2.as_string() << ": "
     << i2.as_int() << '\n';
Integer i6(6);
cout << i6.as_string() << ": "
     << i6.as_int() << '\n';
Output [primes] decomposition26:
2^1 : 2
2^1 3^1 : 6
Exercise 1.13. Extend the previous exercise to having multiplicity > 1 for the prime factors.
Code:
Integer i180(180);
cout << i180.as_string() << ": "
     << i180.as_int() << '\n';
Output [primes] decomposition180:
2^2 3^2 5^1 : 180
1.7 Other
The following exercise requires std::optional, which you can learn about in section ??.
Exercise 1.14. Write a function first_factor that optionally returns the smallest factor of a given
input.
auto factor = first_factor(number);
if (factor.has_value())
  cout << "Found factor: " << factor.value() << '\n';
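A possible shape for such a function, as a sketch: it assumes that 'no factor' means the input is prime, so that only factors strictly between 1 and the number itself count. It requires C++17 for std::optional.

```cpp
#include <cassert>
#include <optional>

// Smallest factor strictly between 1 and number, if there is one.
std::optional<int> first_factor(int number) {
  assert(number >= 2);
  for (int factor = 2; factor <= number / factor; ++factor)
    if (number % factor == 0)
      return factor;      // implicitly wrapped in an optional
  return {};              // empty optional: number is prime
}
```

The caller distinguishes the two cases with has_value(), exactly as in the fragment above.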
Exercise 1.15. Read in an integer that denotes the largest number you want to test. Make an array
of integers that long. Set the elements to the successive integers. Apply the sieve algorithm to find the
prime numbers.
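The array version of the sieve can be sketched as follows. Assumptions: a std::vector stands in for the array, a zero entry marks a crossed-out number, and the function name sieve is hypothetical.

```cpp
#include <cassert>
#include <vector>

// Eratosthenes sieve on the integers 0..largest; returns the primes found.
std::vector<int> sieve(int largest) {
  assert(largest >= 2);
  std::vector<int> numbers(largest + 1);
  for (int i = 0; i <= largest; ++i) numbers[i] = i;   // successive integers
  numbers[1] = 0;                                      // 1 is not prime
  for (int p = 2; p <= largest / p; ++p) {
    if (numbers[p] == 0) continue;                     // already crossed out
    for (int multiple = p * p; multiple <= largest; multiple += p)
      numbers[multiple] = 0;                           // cross out multiples of p
  }
  std::vector<int> primes;
  for (int value : numbers)
    if (value != 0) primes.push_back(value);
  return primes;
}
```

Starting the inner loop at p*p is safe because smaller multiples of p were already crossed out by smaller primes.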
Exercise 1.16. Write a stream class that generates integers and use it through a pointer.
Code:
for (int i=0; i<7; i++)
  cout << "Next int: "
       << the_ints->next() << '\n';
Output [sieve] ints:
Next int: 2
Next int: 3
Next int: 4
Next int: 5
Next int: 6
Next int: 7
Next int: 8
Next, we need a stream that takes another stream as input, and filters out values from it. Such a
filtered_stream class
1. implements next, giving filtered values,
2. by calling the next method of the input stream and filtering out values.
Code:
auto integers =
  make_shared<stream>();
auto odds =
  shared_ptr<stream>
    ( new filtered_stream(2,integers) );
for (int step=0; step<5; step++)
  cout << "next odd: "
       << odds->next() << '\n';
Output [sieve] odds:
next odd: 3
next odd: 5
next odd: 7
next odd: 9
next odd: 11
Now you can implement the Eratosthenes sieve by making a filtered_stream for each prime number.
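A sketch of how the stream classes could fit together, under assumptions: the class names stream and filtered_stream and the constructor arguments follow the code fragments above, but the use of inheritance with a virtual next method and the helper sieve_primes are choices made for this illustration.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Source stream: yields 2, 3, 4, ...
class stream {
  int last_ = 1;
public:
  virtual ~stream() = default;
  virtual int next() { return ++last_; }
};

// Passes on the values of another stream, except multiples of the filter.
class filtered_stream : public stream {
  int filter_;
  std::shared_ptr<stream> input_;
public:
  filtered_stream(int filter, std::shared_ptr<stream> input)
    : filter_(filter), input_(input) {
    assert(filter_ > 1 && input_ != nullptr);
  }
  int next() override {
    int candidate = input_->next();
    while (candidate % filter_ == 0)
      candidate = input_->next();
    return candidate;
  }
};

// Sieve: the first value surviving all filters so far is the next prime,
// and each prime found adds one more filter on top of the chain.
std::vector<int> sieve_primes(int how_many) {
  std::shared_ptr<stream> current = std::make_shared<stream>();
  std::vector<int> primes;
  for (int found = 0; found < how_many; ++found) {
    const int prime = current->next();
    primes.push_back(prime);
    current = std::make_shared<filtered_stream>(prime, current);
  }
  return primes;
}
```

Each new filter keeps the previous chain alive through its shared pointer, so no stream is destroyed while it is still needed.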
1.10 User-friendliness
Use the cxxopts package (section ??) to add commandline options to some primality programs.
Exercise 1.20. Take your old prime number testing program, and add commandline options:
• the -h option should print out usage information;
• specifying a single int --test 1001 should print out all primes under that number;
• specifying a set of ints --tests 57,125,1001 should test primeness for those.
Chapter 2
Geometry
In this set of exercises you will write a small ‘geometry’ package: code that manipulates points, lines,
shapes. These exercises mostly use the material of section ??.
Exercise 2.1. Write a function with (float or double) inputs x, y that returns the distance of point
(x, y) to the origin.
Test the following pairs: 1, 0; 0, 1; 1, 1; 3, 4.
Exercise 2.2. Write a function with inputs x, y, θ that alters x and y corresponding to rotating the
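point (x, y) over an angle θ.
\[
\begin{pmatrix} x' \\ y' \end{pmatrix}
=
\begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}
\begin{pmatrix} x \\ y \end{pmatrix}
\]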
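The two basic functions can be sketched as follows. The names distance_to_origin and rotate, and the decision to pass x and y by reference, are assumptions of this illustration. Note that both new coordinates must be computed from the old ones before either is overwritten.

```cpp
#include <cassert>
#include <cmath>

// Distance of the point (x,y) to the origin.
double distance_to_origin(double x, double y) {
  return std::sqrt(x * x + y * y);
}

// Rotate (x,y) about the origin over an angle theta (in radians), in place.
void rotate(double& x, double& y, double theta) {
  assert(std::isfinite(theta));
  const double c = std::cos(theta), s = std::sin(theta);
  const double new_x = c * x - s * y;   // first row of the rotation matrix
  const double new_y = s * x + c * y;   // second row, still using the old x
  x = new_x;
  y = new_y;
}
```

Rotating the point (1,0) over a quarter turn should give (0,1) up to rounding, which makes a convenient first test.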
Exercise 2.4. Extend the Point class of the previous exercise with a method: distance that
computes the distance between this point and another: if p,q are Point objects,
p.distance(q)
Exercise 2.5. Write a method halfway that, given two Point objects p,q, constructs the Point
halfway, that is, (p + q)/2:
Point p(1,2.2), q(3.4,5.6);
Point h = p.halfway(q);
You can write this function directly, or you could write functions Add and Scale and combine these.
(Later you will learn about operator overloading.)
How would you print out a Point to make sure you compute the halfway point correctly?
Exercise 2.7. Revisit exercise 2.2 using the Point class. Your code
should now look like:
newpoint = point.rotate(alpha);
Exercise 2.8. Advanced. Can you make a Point class that can accommodate any number of space
dimensions? Hint: use a vector; section ??. Can you make a constructor where you do not specify
the space dimension explicitly?
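One way this could look, as a sketch rather than a definitive design: the coordinates are stored in a std::vector, and the dimension is inferred from the number of coordinates passed to the constructor. The accessor names dimension and coordinate are assumptions; distance and halfway follow the earlier exercises.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

class Point {
  std::vector<double> coordinates_;
public:
  // The space dimension is implied by the number of coordinates.
  Point(std::vector<double> coordinates) : coordinates_(coordinates) {}
  std::size_t dimension() const { return coordinates_.size(); }
  double coordinate(std::size_t i) const { return coordinates_.at(i); }

  double distance(const Point& other) const {
    assert(dimension() == other.dimension());
    double sum = 0;
    for (std::size_t i = 0; i < dimension(); ++i) {
      const double difference = coordinates_[i] - other.coordinates_[i];
      sum += difference * difference;
    }
    return std::sqrt(sum);
  }

  // The point (p+q)/2, computed coordinate by coordinate.
  Point halfway(const Point& other) const {
    assert(dimension() == other.dimension());
    std::vector<double> middle(dimension());
    for (std::size_t i = 0; i < dimension(); ++i)
      middle[i] = (coordinates_[i] + other.coordinates_[i]) / 2;
    return Point(middle);
  }
};
```

With this constructor, Point p({1,2.2}) makes a two-dimensional point and Point r({0,0,0}) a three-dimensional one, without the dimension being stated explicitly.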
Exercise 2.11. Revisit exercises 2.2 and 2.7, introducing
a Matrix class. Your code can now look like
newpoint = point.apply(rotation_matrix);
or
newpoint = rotation_matrix.apply(point);
Suppose you want to write a Rectangle class, which could have methods such as float Rectangle::area()
or bool Rectangle::contains(Point). Since a rectangle has four corners, you could store four
Point objects in each Rectangle object. However, there is redundancy there: you only need three
points to infer the fourth. Let’s consider the case of a rectangle with sides that are horizontal and vertical;
then you need only two points.
Intended API:
float Rectangle::area();
Exercise 2.12.
1. Make a class Rectangle (sides parallel to axes) with a constructor:
Rectangle(Point botleft,float width,float height);
The logical implementation is to store these quantities. Implement methods:
float area(); float rightedge_x(); float topedge_y();
and write a main program to test these.
2. Add a second constructor
Rectangle(Point botleft,Point topright);
Can you figure out how to use member initializer lists for the constructors?
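A sketch of both constructors with member initializer lists, under assumptions: Point is reduced here to a plain struct with public x and y so that the example is self-contained, and the second constructor delegates to the first rather than repeating its initializers.

```cpp
#include <cassert>

// Stand-in for the Point class of the earlier exercises (an assumption).
struct Point { float x, y; };

class Rectangle {
  Point botleft_;
  float width_, height_;
public:
  // Members are initialized in the initializer list, not assigned in the body.
  Rectangle(Point botleft, float width, float height)
    : botleft_(botleft), width_(width), height_(height) {
    assert(width >= 0 && height >= 0);
  }
  // Second constructor: compute width and height, then delegate.
  Rectangle(Point botleft, Point topright)
    : Rectangle(botleft, topright.x - botleft.x, topright.y - botleft.y) {}

  float area() const { return width_ * height_; }
  float rightedge_x() const { return botleft_.x + width_; }
  float topedge_y() const { return botleft_.y + height_; }
};
```

Because the three methods are the only access to the stored data, a main program written against them does not need to know which quantities are stored.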
Exercise 2.13. Make a copy of your solution of the previous exercise, and redesign your class so that
it stores two Point objects. Your main program should not change.
The previous exercise illustrates an important point: for well-designed classes you can change the
implementation (for instance motivated by efficiency) while the program that uses the class does not change.
Exercise 2.14. Take your code where a Rectangle was defined from one point, width, and height.
Make a class Square that inherits from Rectangle. It should have the function area defined,
inherited from Rectangle.
First ask yourself: what should the constructor of a Square look like?
Exercise 2.15. Revisit the LinearFunction class. Add methods slope and intercept.
Now generalize LinearFunction to StraightLine class. These two are almost the same except
for vertical lines. The slope and intercept do not apply to vertical lines, so design StraightLine
so that it stores the defining points internally. Let LinearFunction inherit.
2.5 Pointers
The following exercise is a little artificial.
Exercise 2.16. Make a DynRectangle class, which is constructed from two shared-pointers-to-Point
objects:
auto
origin = make_shared<Point>(0,0),
fivetwo = make_shared<Point>(5,2);
DynRectangle lielow( origin,fivetwo );
Calculate the area, scale the top-right point, and recalculate the area:
Code:
cout << "Area: " << lielow.area() << '\n';
/* ... */
// scale the ‘fivetwo’ point by two
cout << "Area: " << lielow.area() << '\n';
Output [pointer] dynrect:
Area: 10
Area: 40
You can base this off the file pointrectangle.cxx in the repository
to the Rectangle class. The result is an array of all four corners, not in any order. Show by a compiler
error that the array can not be altered.
Exercise 2.18. Revisit exercise 2.5 and replace the add and scale functions by overloaded operators.
Hint: for the add function you may need ‘this’.
Chapter 3
Zero finding
For many functions f , finding their zeros, that is, the values x for which f (x) = 0, can not be done
analytically. You then have to resort to numerical root finding schemes. In this project you will develop
gradually more complicated implementations of a simple scheme: root finding by bisection.
In this scheme, you start with two points where the function has opposite signs, and move either the left
or right point to the mid point, depending on what sign the function has there. See figure 3.1.
In section 3.2 we will then look at Newton’s method.
Here we will not be interested in mathematical differences between the methods, though these are
important: we will use these methods to exercise some programming techniques.
3.1.2 Polynomials
First of all, we need to have a way to represent polynomials. For a polynomial of degree d we need d + 1
coefficients:
We implement this by storing the coefficients in a vector<double>. We make the following arbitrary
decisions
1. let the first element of this vector be the coefficient of the highest power, and
2. for the coefficients to properly define a polynomial, this leading coefficient has to be nonzero.
Let’s start by having a fixed test polynomial, provided by a function set_coefficients. For this func-
tion to provide a proper polynomial, it has to satisfy the following test:
TEST_CASE( "coefficients represent polynomial","[1]") {
vector<double> coefficients = { 1.5, 0., -3 };
REQUIRE( coefficients.size()>0 );
REQUIRE( coefficients.front()!=0. );
}
Above we postulated two conditions that an array of numbers should satisfy to qualify as the coeffi-
cients of a polynomial. Your code will probably be testing for this, so let’s introduce a boolean function
is_proper_polynomial:
• This function returns true if the array of numbers satisfies the two conditions;
• it returns false if either condition is not satisfied.
In order to test your function is_proper_polynomial you should check that
• it recognizes correct polynomials, and
• it fails for improper coefficients that do not properly define a polynomial.
Exercise 3.2. Write a function is_proper_polynomial as described, and write unit tests for it, both
passing and failing:
vector<double> good = /* proper coefficients */ ;
REQUIRE( is_proper_polynomial(good) );
vector<double> notso = /* improper coefficients */ ;
REQUIRE( not is_proper_polynomial(notso) );
Next we need polynomial evaluation. We will build a function evaluate_at with the following defini-
tion:
double evaluate_at( const std::vector<double>& coefficients,double x);
You can interpret the array of coefficients in (at least) two ways, but with equation (3.1) we prescribed
one particular interpretation.
So we need a test that the coefficients are indeed interpreted with the leading coefficient first, and not
with the leading coefficient last. For instance:
polynomial second( {2,0,1} );
// correct interpretation: 2x^2 + 1
REQUIRE( second.is_proper() );
REQUIRE( second.evaluate_at(2) == Catch::Approx(9) );
// wrong interpretation: 1x^2 + 2
REQUIRE( second.evaluate_at(2) != Catch::Approx(6) );
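With the leading coefficient stored first, evaluation can be done in a single pass. The sketch below uses the free-function signature given earlier and Horner's rule; Horner's rule is a choice made for this illustration, since the text does not prescribe an evaluation scheme.

```cpp
#include <cassert>
#include <vector>

// Leading coefficient first: {2,0,1} means 2x^2 + 0x + 1.
double evaluate_at(const std::vector<double>& coefficients, double x) {
  assert(!coefficients.empty());
  double value = 0.;
  for (double c : coefficients)
    value = value * x + c;     // Horner's rule: fold in one coefficient per step
  return value;
}
```

Traversing the vector in the opposite direction would give the 'wrong interpretation' that the test above rules out.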
y ← f (x).
that is, the function values in the left and right point are of opposite sign. Then there is a zero in the
interval (x−, x+); see figure 3.1.
Now we can find x−, x+: start with some interval and move the end points out until the function values
have opposite sign.
Make sure your code passes these tests. What test do you need to add for the function values?
• moves one of the bounds to the mid point, such that the function again has opposite signs in the
left and right search point.
The structure of the code is as follows:
double find_zero( /* something */ ) {
while ( /* left and right too far apart */ ) {
// move bounds left and right closer together
}
return something;
}
Again, we test all the functionality separately. In this case this means that moving the bounds should be
a testable step.
Design unit tests, including on the precision attained, and make sure your code passes them.
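The loop structure above can be filled in along the following lines. This is a sketch under assumptions: the objective is passed as a std::function instead of a coefficient vector, and the parameter names and the default precision are invented for the illustration.

```cpp
#include <cassert>
#include <functional>

// Bisection on [left,right]; the function must change sign on the interval.
double find_zero(const std::function<double(double)>& f,
                 double left, double right, double precision = 1e-8) {
  double f_left = f(left);
  assert(left < right && f_left * f(right) <= 0);   // opposite signs required
  while (right - left > precision) {
    const double mid = left + (right - left) / 2;
    const double f_mid = f(mid);
    if (f_left * f_mid <= 0) {
      right = mid;                 // the sign change is in the left half
    } else {
      left = mid;                  // the sign change is in the right half
      f_left = f_mid;
    }
  }
  return left + (right - left) / 2;
}
```

The precision should stay well above the spacing of floating point numbers near the root; otherwise the midpoint can coincide with an end point and the interval stops shrinking.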
Revisit the exercises of section 3.1.1 and introduce a polynomial class that stores the polynomial coef-
ficients. Several functions now become members of this class.
How can you generalize the polynomial class, for instance to the case of special forms such as (1 + x)^n?
3.1.6 Templating
In the implementations so far we used double for the numerical type. Make a templated version that
works both with float and double.
Can you see a difference in attainable precision between the two types?
\[ x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)} \]
\[ f(x) = x^2 - n, \qquad f'(x) = 2x \]
which has the effect that, if we find an x such that f(x) = 0, we have
\[ x = \sqrt{n}. \]
It is of course simple to code this specific case; it should take you about 10 lines. However, we want to
have a general code that takes any two functions f, f ′ , and then uses Newton’s method to find a zero
of f .
Early computers had no hardware for computing a square root. Instead, they used Newton’s method.
Suppose you have a value y and you want to compute x = √y. This is equivalent to finding the
zero of
\[ f(x) = x^2 - y \]
where y is fixed. To indicate this dependence on y, we will write f_y(x). Newton’s method then finds
the zero by evaluating
\[ x_{n+1} = x_n - \frac{f_y(x_n)}{f_y'(x_n)}. \]
Exercise 3.8.
• Write functions f(x,y) and deriv(x,y), that compute f_y(x) and f_y′(x) for the definition
of f_y above.
• Read a value y and iterate until |f(x, y)| < 10^−5. Print x.
• Second part: write a function newton_root that computes √y.
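A sketch of the specific square-root case. The function names follow the exercise; the starting guess, which is not specified in the text, is an assumption chosen so that the iteration starts at or above the root.

```cpp
#include <cassert>
#include <cmath>

double f(double x, double y) { return x * x - y; }        // f_y(x)
double deriv(double x, double /* y */) { return 2 * x; }  // f_y'(x)

// Newton iteration for the square root of y.
double newton_root(double y) {
  assert(y >= 0);
  double x = (y > 1) ? y : 1.;           // assumed starting guess, x >= sqrt(y)
  while (std::abs(f(x, y)) >= 1e-5)      // stopping test from the exercise
    x = x - f(x, y) / deriv(x, y);
  return x;
}
```

The loop stops on the size of f, not on the error in x, so the attained accuracy in x depends on y.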
Exercise 3.9. The Newton method (HPC book, section 17) for finding the zero of a function f , that
is, finding the x for which f (x) = 0, can be programmed by supplying the function and its derivative:
double f(double x) { return x*x-2; };
double fprime(double x) { return 2*x; };
Next, we make the code modular by writing a general function newton_root, that contains the Newton
method of the previous exercise. Since it has to work for any functions f, f ′ , you have to pass the
objective function and the derivative as arguments:
double root = newton_root( f,fprime );
Exercise 3.10. Rewrite the Newton exercise above to use a function with prototype
double root = newton_root( f,fprime );
Next we extend functionality, but not by changing the root finding function: instead, we use a more
general way of specifying the objective function and derivative.
However, the newton_root function takes a function of only a real argument. Use a capture to make f
dependent on the integer parameter.
Exercise 3.12. You don’t need the gradient as an explicit function: you can approximate it as
\[ f'(x) = \bigl( f(x+h) - f(x) \bigr) / h \]
that uses this. You can use a fixed value h=1e-6. Do not reimplement the whole newton method: instead
create a lambda for the gradient and pass it to the function newton_root you coded earlier.
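Both ideas, the capture and the approximate gradient, can be combined as in the following sketch. Assumptions: newton_root takes std::function arguments plus a starting point and tolerance with default values, and the iteration cap and the helper name sqrt_by_newton are invented for this illustration.

```cpp
#include <cassert>
#include <cmath>
#include <functional>

using real_function = std::function<double(double)>;

// General Newton iteration on an objective and its derivative.
double newton_root(const real_function& f, const real_function& fprime,
                   double x = 1., double tolerance = 1e-5) {
  assert(tolerance > 0);
  const int max_iterations = 100;      // assumed safety bound
  for (int it = 0; it < max_iterations && std::abs(f(x)) >= tolerance; ++it)
    x = x - f(x) / fprime(x);
  return x;
}

// Square root of n: n enters the objective through a capture, and the
// derivative is a finite difference with fixed h.
double sqrt_by_newton(int n) {
  assert(n > 0);
  const double h = 1e-6;
  auto f = [n](double x) { return x * x - n; };
  auto gradient = [f, h](double x) { return (f(x + h) - f(x)) / h; };
  return newton_root(f, gradient);
}
```

The root finder itself is unchanged: only the way the two functions are specified differs from the version with named functions.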
Chapter 4
Eight queens
Exercise 4.1. This algorithm will generate all 8^8 boards. Do you see at least one way to speed up the
search?
Since the number eight is, for now, fixed, you could write this code as an eight-deep loop nest. However
that is not elegant. For example, the only reason for the number 8 in the above exposition is that this is
the traditional size of a chess board. The problem, stated more abstractly as placing n queens on an n × n
board, has solutions for n ≥ 4.
This routine returns either a solution, or an indication that no solution was possible.
In the next section we will develop a solution systematically in a TDD manner.
The board We start by constructing a board, with a constructor that only indicates the size of the
problem:
ChessBoard(int n);
Bookkeeping: what’s the next row? Assuming that we fill in the board row-by-row, we have an
auxiliary function that returns the next row to be filled:
int next_row_to_be_filled()
This gives us our first simple test: on an empty board, the row to be filled in is row zero.
Exercise 4.3. Write this method and make sure that it passes the test for an empty board.
TEST_CASE( "empty board","[1]" ) {
constexpr int n=10;
ChessBoard empty(n);
REQUIRE( empty.next_row_to_be_filled()==0 );
}
By the rules of TDD you can actually write the method so that it only satisfies the test for the empty
board. Later, we will test that this method gives the right result after we have filled in a couple of rows,
and then of course your implementation needs to be general.
Place one queen Next, we have a function to place the next queen, whether this gives a feasible board
(meaning that no pieces can capture each other) or not:
void place_next_queen_at_column(int i);
This method should first of all catch incorrect indexing: we assume that the placement routine throws an
exception for invalid column numbers.
ChessBoard::place_next_queen_at_column( int c ) {
if ( /* c is outside the board */ )
throw(1); // or some other exception.
(Suppose you didn’t test for incorrect indexing. Can you construct a simple ‘cheating’ solution at any
size?)
Exercise 4.4. Write this method, and make sure it passes the following test for valid and invalid
column numbers:
REQUIRE_THROWS( empty.place_next_queen_at_column(-1) );
REQUIRE_THROWS( empty.place_next_queen_at_column(n) );
REQUIRE_NOTHROW( empty.place_next_queen_at_column(0) );
REQUIRE( empty.next_row_to_be_filled()==1 );
Is a (partial) board feasible? If you have a board, even partial, you want to test if it’s feasible, meaning
that the queens that have been placed can not capture each other.
The prototype of this method is:
bool feasible()
This test has to work for simple cases to begin with: an empty board is feasible, as is a board with only
one piece.
ChessBoard empty(n);
REQUIRE( empty.feasible() );
Exercise 4.5. Write the method and make sure it passes these tests.
We shouldn’t only do successful tests, sometimes referred to as the ‘happy path’ through the code. For
instance, if we put two queens in the same column, the test should fail.
Exercise 4.6. Take the above initial attempt with a queen in position (0, 0), and add another queen in
column zero of the next row. Check that it passes the test:
ChessBoard collide = one;
// place a queen in a ‘colliding’ location
collide.place_next_queen_at_column(0);
// and test that this is not feasible
REQUIRE( not collide.feasible() );
Add a few more tests of your own. (These will not be exercised by the submission script, but you may
find them useful anyway.)
Testing configurations If we want to test the feasibility of non-trivial configurations, it is a good idea
to be able to ‘create’ solutions. For this we need a second type of constructor where we construct a fully
filled chess board from the locations of the pieces.
ChessBoard( int n,vector<int> cols );
ChessBoard( vector<int> cols );
• If the constructor is called with only a vector, this describes a full board.
• Adding an integer parameter indicates the size of the board, and the vector describes only the
rows that have been filled in.
Exercise 4.7. Write these constructors, and test that an explicitly given solution is a feasible board:
ChessBoard five( {0,3,1,4,2} );
REQUIRE( five.feasible() );
For an elegant approach to implementing this, see delegating constructors; section ??.
Ultimately we have to write the tricky stuff: the method place_queens
takes a board, empty or not, and tries to fill the remaining rows.
One problem is that this method needs to be able to communicate that, given some initial configuration,
no solution is possible. For this, we let the return type of place_queens be optional<ChessBoard>:
• if it is possible to finish the current board resulting in a solution, we return that filled board;
• otherwise we return {}, indicating that no solution was possible.
With the recursive strategy discussed in section 4.2, this placement method has roughly the following
structure:
place_queens() {
for ( int col=0; col<n; col++ ) {
ChessBoard next = *this;
// put a queen in column col on the ‘next’ board
// if this is feasible and full, we have a solution
// if it is feasible but not full, recurse
}
}
The line
ChessBoard next = *this;
The final step Above you coded the method feasible that tested whether a board is still a candidate
for a solution. Since this routine works for any partially filled board, you also need a method to test if
you’re done.
Because the function place_queens is recursive, it is a little hard to test in its entirety.
We start with a simpler test: if you almost have the solution, it can do the last step.
to generate a board that has all but the last row filled in, and that is still feasible. Test that you can find
the solution:
ChessBoard almost( 4, {1,3,0} );
Since this test only fills in the last row, it only does one loop, so printing out diagnostics is possible,
without getting overwhelmed in tons of output.
Solutions and non-solutions Now that you have the solution routine, test that it works starting from
an empty board. For instance, confirm there are no 3 × 3 solutions:
TEST_CASE( "no 3x3 solutions","[9]" ) {
ChessBoard three(3);
auto solution = three.place_queens();
REQUIRE( not solution.has_value() );
}
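Putting the pieces of this chapter together, a compact ChessBoard could look like the sketch below. The method names follow the text; the storage (one column index per filled row), the helper filled, and the exception type are assumptions of this illustration. It requires C++17 for std::optional.

```cpp
#include <cassert>
#include <cstdlib>
#include <optional>
#include <stdexcept>
#include <vector>

class ChessBoard {
  int n_;
  std::vector<int> cols_;  // cols_[row] is the column of the queen in that row
public:
  explicit ChessBoard(int n) : n_(n) { assert(n > 0); }
  ChessBoard(int n, std::vector<int> cols) : n_(n), cols_(cols) {
    assert(static_cast<int>(cols_.size()) <= n_);
  }
  // A vector alone describes a full board: delegate to the constructor above.
  ChessBoard(std::vector<int> cols)
    : ChessBoard(static_cast<int>(cols.size()), cols) {}

  int next_row_to_be_filled() const { return static_cast<int>(cols_.size()); }
  bool filled() const { return next_row_to_be_filled() == n_; }

  void place_next_queen_at_column(int c) {
    if (c < 0 || c >= n_ || filled())
      throw std::out_of_range("no such square for the next queen");
    cols_.push_back(c);
  }

  // No two queens share a column or a diagonal (rows are distinct by design).
  bool feasible() const {
    const int rows = next_row_to_be_filled();
    for (int r1 = 0; r1 < rows; ++r1)
      for (int r2 = r1 + 1; r2 < rows; ++r2) {
        const int dc = std::abs(cols_[r1] - cols_[r2]);
        if (dc == 0 || dc == r2 - r1) return false;
      }
    return true;
  }

  // Try to complete this board; an empty optional means no solution exists.
  std::optional<ChessBoard> place_queens() const {
    if (!feasible()) return {};
    if (filled()) return *this;
    for (int col = 0; col < n_; ++col) {
      ChessBoard next = *this;                 // work on a copy
      next.place_next_queen_at_column(col);
      if (!next.feasible()) continue;
      if (next.filled()) return next;
      auto solution = next.place_queens();     // recurse on the next row
      if (solution.has_value()) return solution;
    }
    return {};
  }
};
```

Because each recursion level works on its own copy, backtracking needs no explicit undo step: an infeasible copy is simply discarded.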
Exercise 4.11. (Optional) Can you modify your code so that it counts all the possible solutions?
Exercise 4.12. (Optional) How does the time to solution behave as function of n?
RESEARCH PROJECTS
Chapter 5
Infectious disease simulation
This section contains a sequence of exercises that builds up to a somewhat realistic simulation of the
spread of infectious diseases.
fact, they can be combined. We can consider a country as a set of cities, where people travel between any
pair of cities. We then use a compartmental model inside a city, and a contact network between cities.
In this project we will only use the network model.
5.2 Coding
5.2.1 The basics
We start by writing code that models a single person. The main methods serve to infect a person, and to
track their state. We need to have some methods for inspecting that state.
The intended output looks something like:
On day 10, Joe is susceptible
On day 11, Joe is susceptible
On day 12, Joe is susceptible
On day 13, Joe is susceptible
On day 14, Joe is sick (5 to go)
On day 15, Joe is sick (4 to go)
On day 16, Joe is sick (3 to go)
On day 17, Joe is sick (2 to go)
On day 18, Joe is sick (1 to go)
On day 19, Joe is recovered
int step = 1;
for ( ; ; step++) {
joe.update();
float bad_luck = (float) rand()/(float)RAND_MAX;
if (bad_luck>.95)
joe.infect(5);
cout << "On day " << step << ", Joe is "
<< joe.status_string() << '\n';
if (joe.is_stable())
break;
}
Here is a suggestion how you can model disease status. Use a single integer with the following interpre-
tation:
Remark 2 Consider a point of programming style. Now that you’ve modeled the state of a person with
an integer, you can use that as
void infect(int n) {
  if (state==0)
    state = n;
}
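A minimal sketch of such a Person class. The integer encoding used here is an assumption made for the illustration: 0 means susceptible, a positive value n means sick with n days to go, and -1 means recovered. The method names follow the main program shown earlier.

```cpp
#include <cassert>
#include <string>

class Person {
  // Assumed encoding: 0 = susceptible, n > 0 = sick with n days to go,
  // -1 = recovered.
  int state_ = 0;
public:
  void infect(int days) {
    assert(days > 0);
    if (state_ == 0) state_ = days;      // only susceptible people get sick
  }
  void update() {                        // advance by one day
    if (state_ > 1) --state_;
    else if (state_ == 1) state_ = -1;   // the last sick day is over
  }
  bool is_stable() const { return state_ == -1; }
  std::string status_string() const {
    if (state_ == 0) return "susceptible";
    if (state_ == -1) return "recovered";
    return "sick (" + std::to_string(state_) + " to go)";
  }
};
```

Hiding the integer behind methods such as is_stable means the encoding can change later without touching the main program.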
5.2.2 Population
Next we need a Population class. Implement a population as a vector consisting of Person
objects. Initially we only infect one person, and there is no transmission of the disease.
The trace output should look something like:
Size of population?
In step 1 #sick: 1 : ? ? ? ? ? ? ? ? ? ? + ? ? ? ? ? ? ? ? ?
In step 2 #sick: 1 : ? ? ? ? ? ? ? ? ? ? + ? ? ? ? ? ? ? ? ?
In step 3 #sick: 1 : ? ? ? ? ? ? ? ? ? ? + ? ? ? ? ? ? ? ? ?
In step 4 #sick: 1 : ? ? ? ? ? ? ? ? ? ? + ? ? ? ? ? ? ? ? ?
In step 5 #sick: 1 : ? ? ? ? ? ? ? ? ? ? + ? ? ? ? ? ? ? ? ?
In step 6 #sick: 0 : ? ? ? ? ? ? ? ? ? ? - ? ? ? ? ? ? ? ? ?
Disease ran its course by step 6
Remark 3 Such a display is good for a sanity check on your program behavior. If you include such
displays in your writeup, make sure to use a monospace font, and don’t use a population size that needs
line wrapping. In further testing, you should use large populations, but do not include these displays.
Population population(npeople);
For the ‘random’ part you can use the C language random number generator (section ??), or
the new Standard Template Library (STL) one in section ??.
• Write a method count_infected that counts how many people are infected.
• Write an update method that updates all persons in the population.
• Loop the update method until no people are infected: the Population::update method
should apply Person::update to all persons in the population.
Write a routine that displays the state of the population, using for instance: ? for susceptible, + for infected,
- for recovered.
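The population methods described above can be sketched as follows. Assumptions: the Person class is reduced to the minimum needed here (with the encoding 0 = susceptible, n > 0 = sick, -1 = recovered), the names random_infection, symbol and display are hypothetical, and the STL generator std::mt19937 with a fixed default seed supplies the randomness.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <string>
#include <vector>

// Minimal stand-in for the Person class (assumed integer encoding).
class Person {
  int state_ = 0;
public:
  void infect(int days) { if (state_ == 0) state_ = days; }
  void update() {
    if (state_ > 1) --state_;
    else if (state_ == 1) state_ = -1;
  }
  bool is_sick() const { return state_ > 0; }
  char symbol() const { return state_ > 0 ? '+' : (state_ == 0 ? '?' : '-'); }
};

class Population {
  std::vector<Person> people_;
  std::mt19937 generator_;
public:
  explicit Population(std::size_t npeople, unsigned seed = 1)
    : people_(npeople), generator_(seed) {
    assert(npeople > 0);
  }
  // Infect one randomly chosen person for the given number of days.
  void random_infection(int days) {
    std::uniform_int_distribution<std::size_t> pick(0, people_.size() - 1);
    people_[pick(generator_)].infect(days);
  }
  int count_infected() const {
    int count = 0;
    for (const Person& person : people_)
      if (person.is_sick()) ++count;
    return count;
  }
  void update() {
    for (Person& person : people_) person.update();
  }
  // One symbol per person, separated by spaces.
  std::string display() const {
    std::string line;
    for (const Person& person : people_) {
      if (!line.empty()) line += ' ';
      line += person.symbol();
    }
    return line;
  }
};
```

The main loop then alternates display and update until count_infected returns zero.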
5.2.3 Contagion
This past exercise was too simplistic: the original patient zero was the only one who ever got sick. Now
let’s incorporate contagion, and investigate the spread of the disease from a single infected person.
We start with a very simple model of infection.
Exercise 5.3. Read in a number 0 ≤ p ≤ 1 representing the probability of disease transmission upon
contact.
population.set_probability_of_transfer(probability);
Incorporate this into the program: in each step the direct neighbours of an infected person can now get
sick themselves. Run a number of simulations with different population sizes and contagion probabilities. Are
there cases where people escape getting sick?
Exercise 5.4. Incorporate vaccination: read another number representing the percentage of people
that has been vaccinated. Choose those members of the population randomly.
Describe the effect of vaccinated people on the spread of the disease. Why is this model unrealistic?
5.2.4 Spreading
To make the simulation more realistic, we let every sick person come into contact with a fixed number
of random people every day. This gives us more or less the SIR model; https://en.wikipedia.org/wiki/Epidemic_model.
Set the number of people that a person comes into contact with, per day, to 6 or so. (You can also let this
be an upper bound for a random value, but that does not essentially change the simulation.) You have
already programmed the probability that a person who comes in contact with an infected person gets sick
themselves. Again start the simulation with a single infected person.
Exercise 5.5. Code the random interactions. Now run a number of simulations varying
• The percentage of people vaccinated, and
• the chance the disease is transmitted on contact.
Record how long the disease runs through the population. With a fixed number of contacts and probability
of transmission, how is this number a function of the percentage that is vaccinated?
Report this function as a table or graph. Make sure you have enough data points for a meaningful
conclusion. Use a realistic population size. You can also do multiple runs and report the average, to
even out the effect of the random number generator.
Exercise 5.6. Investigate the matter of ‘herd immunity’: if enough people are vaccinated, then some
people who are not vaccinated will still never get sick. Let’s say you want to have this probability
over 95 percent. Investigate the percentage of vaccination that is needed for this as a function of the
contagiousness of the disease.
As in the previous exercise, make sure your data set is large enough.
Remark 4 The screen output you used above is good for sanity checks on small problems. However,
for realistic simulations you have to think what is a realistic population size. If your university campus
is a population where random people are likely to meet each other, what would be a population size to
model that? How about the city where you live?
Likewise, if you test different vaccination rates, what granularity do you use? With increases of 5 or 10
percent you can print all results to your screen, but you may miss things. Don’t be afraid to generate large
amounts of data and feed them directly to a graphing program.
5.2.5 Mutation
The Covid years have shown how important mutations of an original virus can be. Next, you can include
mutation in your project. We model this as follows:
• Every so many transmissions, a virus will mutate into a new variant.
• A person who has recovered from one variant is still susceptible to other variants.
• For simplicity, assume that each variant leaves a person sick for the same number of days.
• Vaccination is all-or-nothing: one vaccine is enough to protect against all variants.
• On the other hand, having recovered from one variant is not protection against others.
Implementation-wise, we model this as follows. First of all, we need a Disease class, so that
we can infect a person with an explicit virus:
void infect(int);
void infect(Disease);
A Disease object now carries information such as the chance of transmission, or how long a person
stays under the weather. Modeling mutation is a little tricky. You could do it as follows:
• There is a global variants counter for new virus variants, and a global transmissions counter.
• Every time a person infects another, the newly infected person gets a new Disease object, with
the current variant, and the transmissions counter is updated.
• There is a parameter that determines after how many transmissions the disease mutates. If there
is a mutation, the global variants counter is updated, and from that point on, every infection
is with the new variant. (Note: this is not very realistic. You are free to come up with a better
model.)
• Each Person object has a vector of variants that they have recovered from; recovery from one
variant only makes them immune to that specific variant, not to others.
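The bookkeeping in the list above can be sketched as follows. All names here (Outbreak, transmit, is_immune_to, recover_from, and the fields of Disease) are hypothetical; the sketch also collects the two 'global' counters in one object rather than using true globals, which is a design choice and not something the text prescribes.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical layout of a Disease: only the general idea comes from the text.
struct Disease {
  int variant = 0;
  double transmission_chance = 0.;
  int days_sick = 5;
};

// Holds the variants counter and the transmissions counter in one place.
class Outbreak {
public:
  explicit Outbreak(int transmissions_per_mutation)
      : mutate_after(transmissions_per_mutation) {
    assert(mutate_after > 0);
  }
  // Record one transmission; return the Disease that the new patient gets.
  Disease transmit(const Disease& source) {
    Disease passed_on = source;
    passed_on.variant = current_variant;
    // After every 'mutate_after' transmissions, later infections use a new variant.
    if (++transmissions % mutate_after == 0) ++current_variant;
    return passed_on;
  }
  int variant() const { return current_variant; }
private:
  int mutate_after;
  int transmissions = 0;
  int current_variant = 0;
};

class Person {
public:
  // Immunity is per variant: recovery from one variant does not protect against others.
  bool is_immune_to(int variant) const {
    return std::find(recovered_from.begin(), recovered_from.end(), variant)
           != recovered_from.end();
  }
  void recover_from(int variant) { recovered_from.push_back(variant); }
private:
  std::vector<int> recovered_from;
};
```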
Exercise 5.7. Add mutation to your model. Experiment with the mutation rate: as the mutation rate
increases, the disease should stay in the population longer. Does the relation with the vaccination rate
that you observed before change?
5.3 Ethics
The subject of infectious diseases and vaccination is full of ethical questions. The main one is: ‘The
chances of something happening to me are very small, so why shouldn’t I bend the rules a little?’ This
reasoning is most often applied to vaccination, where people for some reason or other refuse to get
vaccinated.
Explore this question and others you may come up with: it is clear that everyone bending the rules will
have disastrous consequences, but what if only a few people did this?
2. The better solution requires you to use separate compilation for building the program, and you
need a header file. You would now have infect_lib.cc which is compiled separately, and
infect_lib.h which is included both in the library file and the main program:
#include "infect_lib.h"
5.4.2 Writeup
In the writeup, describe the ‘experiments’ you have performed and the conclusions you draw from them.
The exercises above give you a number of questions to address.
For each main program, include some sample output, but note that this is no substitute for writing out
your conclusions in full sentences.
The exercises in section 5.2.4 ask you to explore the program behavior as a
function of one or more parameters. Include a table to report on the behavior you found. You can use
Matlab or Matplotlib in Python (or even Excel) to plot your data, but that is not required.
$S_{i+1} = S_i (1 - \lambda_i\,dt)$
where $\lambda_i$ is the product of the number of infected people and a constant that reflects the number
of meetings and the infectiousness of the disease. We write:
2. The number of infected people similarly increases by $\lambda S_i I_i$, but it also decreases by people
recovering (or dying):
$R_{i+1} = R_i (1 + \gamma I_i)$.
Exercise 5.8. Code this scheme. What is the effect of varying dt?
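For orientation, here is a minimal sketch of an explicit time-stepping scheme for the standard SIR model, with S, I, R as fractions of the population. This standard form is an assumption and may differ in detail from the scheme described in the text; the names SIR, sir_step and sir_run are hypothetical.

```cpp
#include <cassert>

struct SIR { double S, I, R; };  // susceptible, infected, recovered fractions

// One explicit time step of the standard SIR model.
// 'lambda' is the infection rate constant, 'gamma' the recovery rate.
SIR sir_step(SIR now, double lambda, double gamma, double dt) {
  assert(dt > 0.);
  const double newly_infected = lambda * now.S * now.I * dt;
  const double newly_recovered = gamma * now.I * dt;
  return { now.S - newly_infected,
           now.I + newly_infected - newly_recovered,
           now.R + newly_recovered };
}

// Apply 'nsteps' time steps in sequence.
SIR sir_run(SIR state, double lambda, double gamma, double dt, int nsteps) {
  for (int step = 0; step < nsteps; ++step)
    state = sir_step(state, lambda, gamma, dt);
  return state;
}
```

Every person who leaves one compartment enters another, so S + I + R stays constant up to rounding; checking this is a cheap sanity test when you vary dt.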
Exercise 5.9. For the disease to become an epidemic, the number of newly infected has to be larger
than the number of recovered. That is,
Google PageRank
Exercise 6.1. Make a class Page which initially just contains the name of the page. Write a method
to display the page. Since we will be using pointers quite a bit, let this be the intended code for testing:
auto homepage = make_shared<Page>("My Home Page");
cout << "Homepage has no links yet:" << '\n';
cout << homepage->as_string() << '\n';
Next, add links to the page. A link is a pointer to another page, and since there can be any number of
them, you will need a vector of them. Write a method click that follows the link. Intended code:
auto utexas = make_shared<Page>("University Home Page");
homepage->add_link(utexas);
auto searchpage = make_shared<Page>("google");
homepage->add_link(searchpage);
cout << homepage->as_string() << '\n';
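A minimal sketch of a Page class that would support the intended code above. The method names as_string and add_link come from the text; the signature of click (taking a link number) and the exact output format of as_string are assumptions made for this illustration.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

class Page {
public:
  explicit Page(std::string name) : name(std::move(name)) {}
  void add_link(std::shared_ptr<Page> target) {
    links.push_back(std::move(target));
  }
  // Follow link number 'ilink' (assumed signature).
  std::shared_ptr<Page> click(int ilink) const {
    assert(ilink >= 0 && ilink < static_cast<int>(links.size()));
    return links[ilink];
  }
  // Page name followed by the names of all linked pages.
  std::string as_string() const {
    std::string text = name + " (links:";
    for (const auto& link : links) text += " " + link->name;
    return text + ")";
  }
private:
  std::string name;
  std::vector<std::shared_ptr<Page>> links;
};
```

Note that pages linking to each other through shared_ptr form reference cycles, which are never freed automatically; for a short simulation this is usually acceptable.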
Exercise 6.2. Add some more links to your homepage. Write a method random_click for the
Page class. Intended code:
for (int iclick=0; iclick<20; iclick++) {
Exercise 6.3. Now make a class Web which foremost contains a bunch (technically: a vector) of
pages. Or rather: of pointers to pages. Since we don’t want to build a whole internet by hand, let’s
have a method create_random_links which makes a random number of links to random pages.
Intended code:
Web internet(netsize);
internet.create_random_links(avglinks);
Now we can start our simulation. Write a method Web::random_walk that takes a page, and the
length of the walk, and simulates the result of randomly clicking that many times on the current page.
(Current page. Not the starting page.)
Let’s start working towards PageRank. First we see if there are pages that are more popular than others.
You can do that by starting a random walk once on each page. Or maybe a couple of times.
Exercise 6.4. Apart from the size of your internet, what other design parameters are there for your
tests? Can you give a back-of-the-envelope estimation of their effect?
Exercise 6.5. Your first simulation is to start on each page a number of times, and count where that
lands you. Intended code:
vector<int> landing_counts(internet.number_of_pages(),0);
for ( auto page : internet.all_pages() ) {
for (int iwalk=0; iwalk<5; iwalk++) {
auto endpage = internet.random_walk(page,2*avglinks,tracing);
landing_counts.at(endpage->global_ID())++;
}
}
Display the results and analyze. You may find that you finish on certain pages too many times. What’s
happening? Fix that.
Exercise 6.6. Code the above algorithm, keeping track of how many steps it takes to reach each
vertex w. This is the Single Source Shortest Path algorithm (for unweighted graphs).
The diameter is defined as the maximal shortest path. Code this.
Code:
ProbabilityDistribution random_state(internet.number_of_pages());
random_state.set_random();
cout << "Initial distribution: " << random_state.as_string() << '\n';
Output [google] pdfsetup:
Initial distribution:
0:0.00, 1:0.02, 2:0.07, 3:0.05, 4:0.06, 5:0.08,
6:0.04, 7:0.04, 8:0.04, 9:0.01, 10:0.07, 11:0.05,
12:0.01, 13:0.04, 14:0.08, 15:0.06,
16:0.10, 17:0.06, 18:0.11, 19:0.01,
Next we need a method that given a probability distribution, gives you the new distribution corresponding
to performing a single click. (This is related to Markov chains; see HPC book, section 9.2.1.)
Test it as follows:
• start with a distribution that is nonzero in exactly one page;
• print the new distribution corresponding to one click;
• do this for several pages and inspect the result visually.
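The single-click update can be sketched on a bare adjacency-list representation. The function name one_click and the layout of links are hypothetical, not the Web or ProbabilityDistribution interface of this project. The sketch also assumes one possible convention for pages without outgoing links: their probability is spread evenly over all pages.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// links[p] lists the pages that page p links to (hypothetical layout).
// Returns the probability distribution after one click.
std::vector<double> one_click(const std::vector<std::vector<int>>& links,
                              const std::vector<double>& current) {
  const std::size_t npages = links.size();
  assert(current.size() == npages);
  std::vector<double> next(npages, 0.);
  for (std::size_t page = 0; page < npages; ++page) {
    if (links[page].empty()) {
      // Assumed convention: a page without links jumps to a random page.
      for (auto& prob : next) prob += current[page] / npages;
    } else {
      // Each outgoing link is clicked with equal probability.
      const double share = current[page] / links[page].size();
      for (int target : links[page]) next[target] += share;
    }
  }
  return next;
}
```

Starting from a distribution that is 1 on a single page, the result should be nonzero exactly on the pages that page links to, and the entries should still sum to 1.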
Then start with a random distribution and run a couple of iterations. How fast does the process converge?
Compare the result to the random walk exercise above.
Exercise 6.9. In the random walk exercise you had to deal with the fact that some pages have no
outgoing links. In that case you transitioned to a random page. That mechanism is lacking in the
globalclick method. Figure out a way to incorporate this.
Exercise 6.10. Add a page that you will artificially make look important: add a number of pages (for
instance four times the average number of links) that all link to this page, but no one links to them.
(Because of the random clicking they will still sometimes be reached.)
Compute the rank of the artificially hyped page. Did you manage to trick Google into ranking this page
high?
Exercise 6.11. Add the matrix representation of the Web object and reimplement the globalclick
method. Test for correctness.
Do a timing comparison.
The iteration you did above to find a stable probability distribution corresponds to the ‘power method’
in linear algebra. Look up the Perron-Frobenius theory and see what it implies for page ranking.
Redistricting
In this project you can explore ‘gerrymandering’, the strategic drawing of districts to give a minority
population a majority of districts [1].
[1] This project is obviously based on the North American political system. Hopefully the explanations here are clear
enough. Please contact the author if you know of other countries that have a similar system.
Exercise 7.1. Implement a Voter class. You could for instance let ±1 stand for A/B, and 0 for
undecided.
cout << "Voter 5 is positive:" << '\n';
Voter nr5(5,+1);
cout << nr5.print() << '\n';
7.2.2 Populations
Code:
cout << "Making district with one B voter" << '\n';
Voter nr5(5,+1);
District nine( nr5 );
cout << ".. size: " << nine.size() << '\n';
cout << ".. lean: " << nine.lean() << '\n';
/* ... */
cout << "Making district ABA" << '\n';
District nine( vector<Voter>
               { {1,-1},{2,+1},{3,-1} } );
cout << ".. size: " << nine.size() << '\n';
cout << ".. lean: " << nine.lean() << '\n';
Output [gerry] district:
Making district with one B voter
.. size: 1
.. lean: 1
Making district ABA
.. size: 3
.. lean: -1
Exercise 7.3. Implement a Population class that will initially model a whole state.
Code:
string pns( "-++--" );
Population some(pns);
cout << "Population from string " << pns << '\n';
cout << ".. size: " << some.size() << '\n';
cout << ".. lean: " << some.lean() << '\n';
Population group=some.sub(1,3);
cout << "sub population 1--3" << '\n';
cout << ".. size: " << group.size() << '\n';
cout << ".. lean: " << group.lean() << '\n';
Output [gerry] population:
Population from string -++--
.. size: 5
.. lean: -1
sub population 1--3
.. size: 2
.. lean: 1
In addition to an explicit creation, also write a constructor that specifies how many people and what the
majority is:
Population( int population_size,int majority,bool trace=false )
7.2.3 Districting
The next level of complication is to have a set of districts. Since we will be creating this incrementally,
we need some methods for extending it.
Exercise 7.4. Write a class Districting that stores a vector of District objects. Write
size and lean methods:
Code:
cout << "Making single voter population B" << '\n';
Population people( vector<Voter>{ Voter(0,+1) } );
cout << ".. size: " << people.size() << '\n';
cout << ".. lean: " << people.lean() << '\n';
Districting gerry;
cout << "Start with empty districting:" << '\n';
cout << ".. number of districts: " << gerry.size() << '\n';
Output [gerry] gerryempty:
Making single voter population B
.. size: 1
.. lean: 1
Start with empty districting:
.. number of districts: 0
7.3 Strategy
Now we need a method for districting a population:
Districting Population::minority_rules( int ndistricts );
Rather than generating all possible partitions of the population, we take an incremental approach (this is
related to the solution strategy called dynamic programming):
• The basic question is to divide a population optimally over n districts;
• We do this recursively by first solving a division of a subpopulation over n − 1 districts,
• and extending that with the remaining population as one district.
This means that you need to consider all the ways of putting the ‘remaining’ population into one district,
and that means that you will have a loop over all ways of splitting the population, outside of your
recursion; see figure 7.1.
• For all p = 0, . . . , n − 1, consider splitting the state into 0, . . . , p − 1 and p, . . . , n − 1.
• Use the best districting of the first group, and make the last group into a single district.
• Keep the districting that gives the strongest minority rule, over all values of p.
You can now realize the above simple example:
AAABB => AAA|B|B
Note: the range for p given above is not quite correct: for instance, the initial part of the population
needs to be big enough to accommodate n − 1 districts.
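The splitting strategy can be sketched on a plain string of voters, without the Voter, District and Population classes. This sketch only counts how many districts a party can win; it does not return the districting itself, and it fills a table of sub-results instead of recursing. The names wins and most_districts_won are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// True if 'party' has a strict majority among voters [first,last).
bool wins(const std::string& voters, int first, int last, char party) {
  const int ours = static_cast<int>(
      std::count(voters.begin() + first, voters.begin() + last, party));
  return 2 * ours > last - first;
}

// Largest number of districts 'party' can win when 'voters' is cut into
// exactly 'ndistricts' contiguous, nonempty districts.
int most_districts_won(const std::string& voters, int ndistricts, char party) {
  const int n = static_cast<int>(voters.size());
  assert(1 <= ndistricts && ndistricts <= n);
  // best[d][k]: best result for the first k voters in d districts;
  // -1 marks combinations that are not possible.
  std::vector<std::vector<int>> best(ndistricts + 1, std::vector<int>(n + 1, -1));
  best[0][0] = 0;
  for (int d = 1; d <= ndistricts; ++d)
    for (int k = d; k <= n; ++k)
      for (int p = d - 1; p < k; ++p)  // the last district is voters [p,k)
        if (best[d - 1][p] >= 0)
          best[d][k] = std::max(
              best[d][k],
              best[d - 1][p] + (wins(voters, p, k, party) ? 1 : 0));
  return best[ndistricts][n];
}
```

For the example AAABB with three districts, this finds that party B can win two districts, which corresponds to the split AAA|B|B. The loop bounds on p also show the corrected range: the first p voters must be able to hold d − 1 nonempty districts.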
Exercise 7.7. Test multiple population sizes; how much majority can you give party B while still
giving party A a majority of the districts?
Exercise 7.8. Improve your implementation by storing and reusing results for the initial sub-populations.
In a way, we solved the problem backward: we looked at making a district out of the last so-many voters,
and then recursively solving a smaller problem for the first however-many voters. But in that process, we
decided what is the best way to assign districts to the first 1 voter, first 2, first 3, et cetera. Actually, for
more than one voter, say five voters, we found the result on the best attainable minority rule assigning
these five voters to one, two, three, four districts.
The process of computing the ‘best’ districting forward is known as dynamic programming. The
fundamental assumption here is that you can use intermediate results and extend them, without having to
reconsider the earlier problems.
Consider for instance that you’ve considered districting ten voters over up to five districts. Now the
majority for eleven voters and five districts is the minimum of
• ten voters and five districts, and the new voter is added to the last district; or
• ten voters and four districts, and the new voter becomes a new district.
7.5 Extensions
The project so far has several simplifying assumptions.
• Congressional districts need to be approximately the same size. Can you put a limit on the ratio
between sizes? Can the minority still gain a majority?
Exercise 7.10. The biggest assumption is of course that we considered a one-dimensional state. With
two dimensions you have more degrees of freedom of shaping the districts. Implement a two-dimensional
scheme; use a completely square state, where the census districts form a regular grid. Limit the shape
of the congressional districts to be convex.
Exercise 7.11. Look up the definition of efficiency gap (and ‘wasted votes’), and implement it in your
code.
7.6 Ethics
The activity of redistricting was intended to give people a fair representation. In its degenerate form of
Gerrymandering this concept of fairness is violated because the explicit goal is to give the minority a
majority of votes. Explore ways that this unfairness can be undone.
In your explorations above, the only characteristic of a voter was their preference for party A or B.
However, in practice voters can be considered part of communities. The Voting Rights Act is concerned
about ‘minority vote dilution’. Can you show examples that a color-blind districting would affect some
communities negatively?
This section contains a sequence of exercises that builds up to a simulation of delivery truck scheduling.
Exercise 8.1. Code a class Address with the above functionality, and test it.
Code:
Address one(1.,1.),
  two(2.,2.);
cerr << "Distance: "
     << one.distance(two)
     << '\n';
Output [amazon] address:
Address
Distance: 1.41421
.. address
Address 1 should be closest to the depot. Check: 1
Square5
Travel in order: 24.1421
Square route: (0,0) (0,5) (5,5) (5,0) (0,0)
 has length 20
.. square5
Hundred houses
Route in order has length 25852.6
TSP based on mere listing has length: 2751.99 over naive 25852.6
Single route has length: 2078.43
.. new route accepted with length 2076.65
Final route has length 2076.65 over initial 2078.43
TSP route has length 1899.4 over initial 2078.43
Two routes
Route1: (0,0) (2,0) (3,2) (2,3) (0,2) (0,0)
route2: (0,0) (3,1) (2,1) (1,2) (1,3) (0,0)
total length 19.6251
start with 9.88635,9.73877
Pass 0
.. down to 9.81256,8.57649
Pass 1
Pass 2
Pass 3
Pass 4
TSP Route1: (0,0) (3,1) (3,2) (2,3) (0,2) (0,0)
route2: (0,0) (2,0) (2,1) (1,2) (1,3) (0,0)
total length 18.389
Exercise 8.2. Implement a class AddressList; it probably needs the following methods:
• add_address for constructing the list;
• length to give the distance one has to travel to visit all addresses in order;
• index_closest_to that gives you the address on the list closest to another address,
presumably not on the list.
that constructs a new address list, containing the same addresses, but arranged to give a shorter length to
travel.
Exercise 8.3. Write the greedy_route method for the AddressList class.
1. Assume that the route starts at the depot, which is located at (0, 0). Then incrementally
construct a new list as follows:
2. Maintain an Address variable we_are_here of the current location;
3. Repeatedly find the address closest to we_are_here.
Extend this to a method for the Route class by working on the subvector that does not contain the final
element.
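A minimal sketch of the greedy construction, using a plain vector of addresses rather than the AddressList and Route classes. The struct layout and the free functions dist, greedy_route and round_trip_length are hypothetical; the depot is passed in explicitly instead of being stored in the list.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Address { double x, y; };

// Euclidean distance between two addresses.
double dist(const Address& a, const Address& b) {
  return std::hypot(a.x - b.x, a.y - b.y);
}

// Greedy (nearest-neighbour) ordering of 'to_visit', starting at 'depot'.
std::vector<Address> greedy_route(std::vector<Address> to_visit, Address depot) {
  std::vector<Address> route;
  Address we_are_here = depot;
  while (!to_visit.empty()) {
    // Find the remaining address closest to the current location.
    std::size_t closest = 0;
    for (std::size_t i = 1; i < to_visit.size(); ++i)
      if (dist(we_are_here, to_visit[i]) < dist(we_are_here, to_visit[closest]))
        closest = i;
    we_are_here = to_visit[closest];
    route.push_back(we_are_here);
    to_visit.erase(to_visit.begin() + static_cast<std::ptrdiff_t>(closest));
  }
  return route;
}

// Length of the trip depot -> all stops in order -> depot.
double round_trip_length(const std::vector<Address>& route, Address depot) {
  double length = 0.;
  Address previous = depot;
  for (const auto& stop : route) {
    length += dist(previous, stop);
    previous = stop;
  }
  return length + dist(previous, depot);
}
```

On ties this sketch keeps the address that comes first in the list, which is an arbitrary choice.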
Test it on this example:
Code:
Route deliveries;
deliveries.add_address( Address(0,5) );
deliveries.add_address( Address(5,0) );
deliveries.add_address( Address(5,5) );
cerr << "Travel in order: " << deliveries.length() << '\n';
assert( deliveries.size()==5 );
auto route = deliveries.greedy_route();
assert( route.size()==5 );
auto len = route.length();
cerr << "Square route: " << route.as_string()
     << "\n has length " << len << '\n';
Output [amazon] square5:
Travel in order: 24.1421
Square route: (0,0) (0,5) (5,5) (5,0) (0,0)
 has length 20
However, you can approximate the solution heuristically. One method, the Kernighan-Lin algorithm [8],
is based on the opt2 idea: if you have a path that ‘crosses itself’, you can make it shorter by reversing
part of it. Figure 8.1 shows that the
path 1--2--3--4 can be made shorter by reversing part of it, giving 1--3--2--4.
Since recognizing where a path crosses itself can be hard, or even impossible for graphs that don’t have
Cartesian coordinates associated, we adopt a scheme:
for all nodes m<n on the path [1..N]:
make a new route from
[1..m-1] + [m--n].reversed + [n+1..N]
if the new route is shorter, keep it
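The scheme above can be sketched as follows, on a plain vector of points. The names Point, path_length and opt2 are hypothetical. Two choices in this sketch go beyond the pseudocode and are assumptions: the first and last point are kept in place, and the sweep is repeated until no reversal gives an improvement.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Point { double x, y; };

// Total length of the path, visiting the points in order.
double path_length(const std::vector<Point>& path) {
  double length = 0.;
  for (std::size_t i = 1; i < path.size(); ++i)
    length += std::hypot(path[i].x - path[i-1].x, path[i].y - path[i-1].y);
  return length;
}

// Try reversing every segment [m,n] that leaves both end points in
// place; keep a reversal whenever it shortens the path.
std::vector<Point> opt2(std::vector<Point> path) {
  if (path.size() < 4) return path;  // nothing to reverse
  bool improved = true;
  while (improved) {
    improved = false;
    for (std::size_t m = 1; m + 2 < path.size(); ++m)
      for (std::size_t n = m + 1; n + 1 < path.size(); ++n) {
        std::vector<Point> candidate = path;
        std::reverse(candidate.begin() + m, candidate.begin() + n + 1);
        // Small tolerance so that rounding noise is not seen as improvement.
        if (path_length(candidate) < path_length(path) - 1e-12) {
          path = candidate;
          improved = true;
        }
      }
  }
  return path;
}
```

Recomputing the full path length for each candidate is simple but wasteful: only the two edges at the ends of the reversed segment change, so a faster version would compare just those.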
Exercise 8.4. Code the opt2 heuristic: write a method to reverse part of the route, and write the loop
that tries this with multiple starting and ending points. Try it out on some simple test cases to convince
yourself that your code works as intended.
Exercise 8.6. Earlier you had programmed the greedy heuristic. Compare the improvement you get
from the opt2 heuristic, starting both with the given list of addresses, and with a greedy traversal of it.
Exercise 8.7. Write a function that optimizes two paths simultaneously using the multi-path version
of the opt2 heuristic. For a test case, see figure 8.3.
You have quite a bit of freedom here:
• The start points of the two segments should be chosen independently;
• the lengths can be chosen independently, but need not; and finally
• each segment can be reversed.
More flexibility also means a longer runtime of your program. Does it pay off? Do some tests and report
results.
Based on the above description there will be a lot of code duplication. Make sure to introduce functions
and methods for various operations.
Exercise 8.8. Explore a scenario where there are two trucks, and each has a number of addresses
that cannot be exchanged with the other route. How much longer is the total distance? Experiment with
the ratio of prime to non-prime addresses.
8.6 Dynamicism
So far we have assumed that the list of addresses to be delivered to is given. This is of course not true:
new deliveries will need to be scheduled continuously.
Exercise 8.9. Implement a scenario where every day a random number of new deliveries is added to
the list. Explore strategies and design choices.
8.7 Ethics
People sometimes criticize Amazon’s labor policies, including regarding its drivers. Can you make any
observations from your simulations in this respect?
Linear algebra operations such as the matrix-matrix product are easy to code in a naive way. However,
this does not lead to high performance. In these exercises you will explore the basics of a strategy for
high performance.
However, this is not the only way to code this operation. The loops can be permuted, giving a total of six
implementations.
Exercise 9.1. Code one of the permuted algorithms and test its correctness. If the reference algorithm
above can be said to be ‘inner-product based’, how would you describe your variant?
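Yet another implementation is based on a block partitioning. Let A, B, C be split in 2 × 2 block form:
$$A=\begin{pmatrix}A_{11}&A_{12}\\ A_{21}&A_{22}\end{pmatrix},\quad
B=\begin{pmatrix}B_{11}&B_{12}\\ B_{21}&B_{22}\end{pmatrix},\quad
C=\begin{pmatrix}C_{11}&C_{12}\\ C_{21}&C_{22}\end{pmatrix}$$
Then
$$\begin{aligned}
C_{11}&=A_{11}B_{11}+A_{12}B_{21},\\
C_{12}&=A_{11}B_{12}+A_{12}B_{22},\\
C_{21}&=A_{21}B_{11}+A_{22}B_{21},\\
C_{22}&=A_{21}B_{12}+A_{22}B_{22}
\end{aligned}\tag{9.1}$$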
(Convince yourself that this actually computes the same product C = A · B.)
Remark 5 Historically, linear algebra software such as the Basic Linear Algebra Subprograms (BLAS)
has used columnwise storage, meaning that the location of an element (i, j) is computed as i + j · M.
(We will use zero-based indexing throughout this project, both for code and mathematical expressions.)
The reason for this stems from the origins of the BLAS in the Fortran language, which uses
column-major ordering of array elements. On the other hand, static arrays (such as x[5][6][7]) in the C/C++
languages have row-major ordering, where element (i, j) is stored in location j + i · N.
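The two index formulas of this remark can be written out directly for a full M × N matrix (no leading dimension yet). The function names are hypothetical.

```cpp
#include <cassert>
#include <vector>

// Location of element (i,j) in an M-by-N matrix, zero-based.
inline int column_major(int i, int j, int M) { return i + j * M; }
inline int row_major(int i, int j, int N) { return j + i * N; }

// Store the value 10*i+j at element (i,j), using column-major storage.
std::vector<int> fill_column_major(int M, int N) {
  assert(M > 0 && N > 0);
  std::vector<int> data(M * N);
  for (int j = 0; j < N; ++j)      // columns in the outer loop,
    for (int i = 0; i < M; ++i)    // so that memory is traversed contiguously
      data[column_major(i, j, M)] = 10 * i + j;
  return data;
}
```

With M = 2 and N = 3, consecutive memory locations hold elements (0,0), (1,0), (0,1), (1,1), (0,2), (1,2): each column is stored contiguously.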
Above, you saw the idea of block algorithms, which requires taking submatrices. For efficiency, we don’t
want to copy elements into a new array, so we want the submatrix to correspond to a subarray.
Now we have a problem: only a submatrix that consists of a sequence of columns is contiguous. For
this reason, linear algebra software treats each matrix like a submatrix, described by three parameters
M, N, LDA, where ‘LDA’ stands for ‘leading dimension of A’ (see BLAS [6], and Lapack [1]). This is
illustrated in figure 9.1.
Figure 9.1: Submatrix out of a matrix, with m,n,lda of the submatrix indicated
Exercise 9.2. In terms of M, N, LDA, what is the location of the (i, j) element?
Implementationwise we also have a problem. If we use std::vector for storage, it is not possible to take
subarrays, since C++ insists that a vector has its own storage. The solution is to use span; section ??.
We could have two types of matrices: top level matrices that store a vector<double>, and submatrices
that store a span<double>, but that is a lot of complication. It could be done using std::variant
(section ??), but let’s not.
Instead, let’s adopt the following idiom, where we create a vector at the top level, and then create matrices
from its memory.
// example values for M,LDA,N
M = 2; LDA = M+2; N = 3;
// create a vector to contain the data
vector<double> one_data(LDA*N,1.);
// create a matrix using the vector data
Matrix one(M,LDA,N,one_data.data());
(If you have not previously programmed in C, you need to get used to the double* mechanism. Read up
on section ??.)
Write a method
double& Matrix::at(int i,int j);
Exercise 9.4. Write a method for adding matrices. Test it on matrices that have the same M, N , but
different LDA.
Use of the at method is great for debugging, but it is not efficient. Use the preprocessor (chapter ??) to
introduce alternatives:
#ifdef DEBUG
c.at(i,j) += a.at(i,k) * b.at(k,j);
#else
cdata[ /* expression with i,j */ ] += adata[ ... ] * bdata[ ... ];
#endif
Exercise 9.5. Implement this. Use a cpp #define macro for the optimized indexing expression. (See
section ??.)
9.2.1 Submatrices
Next we need to support constructing actual submatrices. Since we will mostly aim for decomposition
in 2 × 2 block form, it is enough to write four methods:
Matrix Left(int j);
Matrix Right(int j);
Matrix Top(int i);
Matrix Bot(int i);
9.3 Multiplication
You can now write a first multiplication routine, for instance with a prototype
void Matrix::MatMult( Matrix& other,Matrix& out );
Next, write
void Matrix::BlockedMatMult( Matrix& other,Matrix& out );
which
• Executes the 2 × 2 block product, using again RecursiveMatMult for the blocks.
• When the block is small enough, use the regular MatMult product.
Exercise 9.8. Read up on cache memory, and argue that the naive matrix-matrix product
implementation is unlikely actually to reuse data.
Explain why the recursive strategy does lead to data reuse.
Above, you set a cutoff point for when to switch from the recursive to the regular product.
Exercise 9.9. Argue that continuing to recurse will not have much benefit once the product is
contained in the cache. What are the cache sizes of your processor?
Do experiments with various cutoff points. Can you relate this to the cache sizes?
cblas_dgemm
( CblasColMajor, CblasNoTrans, CblasNoTrans,
m,other.n,n, alpha,adata,lda,
bdata,other.lda,
beta,cdata,out.lda);
Exercise 9.10. Use another cpp conditional to implement MatMult through a call to cblas_dgemm.
What performance do you now get?
You see that your recursive implementation is faster than the naive one, but not nearly as fast as the
CBlas one. This is because
• the CBlas implementation is probably based on an entirely different strategy [5], and
• it probably involves a certain amount of assembly coding.
Graph algorithms
In this project you will explore some common graph algorithms, and their various possible
implementations. The main theme here will be that the common textbook exposition of algorithms is not necessarily
the best way to phrase them computationally.
As background knowledge for this project, you are encouraged to read HPC book, chapter 9; for an
elementary tutorial on graphs, see HPC book, chapter 19.
Exercise 10.1. Finish the Dag class. In particular, add a method to generate example graphs:
• For testing the ‘circular’ graph is often useful: connect edges
0 → 1 → · · · → N − 1 → 0.
where you use some convention, such as negative distance, to indicate that a node has been removed
from the set.
However, C++ has an actual set container with methods for adding an element, finding it, and removing
it; see section ??. This makes for a more direct expression of our algorithms. In our case, we’d need a
set of int/int or int/float pairs, depending on the graph algorithm. (It is also possible to use a map, using
an int as lookup key, and int or float as values.)
For the unweighted graph we only need a set of finished nodes, and we insert node 0 as our starting
point:
using node_info = std::pair<unsigned,unsigned>;
std::set< node_info > distances;
distances.insert( {0,0} );
For Dijkstra’s algorithm we need both a set of finished nodes, and nodes that we are still working on. We
again set the starting node, and we set the distance for all unprocessed nodes to infinity:
const unsigned inf = std::numeric_limits<unsigned>::max();
using node_info = std::pair<unsigned,unsigned>;
std::set< node_info > distances,to_be_done;
to_be_done.insert( {0,0} );
for (unsigned n=1; n<graph_size; n++)
to_be_done.insert( {n,inf} );
(Why do we need that second set here, while it was not necessary for the unweighted graph case?)
Exercise 10.2. Write a code fragment that tests if a node is in the distances set.
• You can of course write a loop for this. In that case know that iterating over a set gives you the
key/value pairs. Use structured bindings; section ??.
• But it’s better to use an ‘algorithm’, in the technical sense of ‘algorithms built into the standard
Exercise 10.3. Finish the program that computes the SSSP algorithm and test it.
This code has an obvious inefficiency: for each level we iterate through all finished nodes, even if all
their neighbors may already have been processed.
Exercise 10.4. Maintain a set of ‘current level’ nodes, and only investigate these to find the next level.
Time the two variants on some large graphs.
for (;;) {
if (to_be_done.size()==0) break;
/*
* Find the node with least distance
*/
/* ... */
cout << "min: " << nclose << " @ " << dclose << '\n';
/*
* Move that node to done,
*/
to_be_done.erase(closest_node);
distances.insert( *closest_node );
/*
* set neighbors of nclose to have that distance + 1
*/
const auto& nbors = graph.neighbors(nclose);
for ( auto n : nbors ) {
// find ‘n’ in distances
/* ... */
{
/*
* if ‘n’ does not have known distance,
* find where it occurs in ‘to_be_done’ and update
*/
/* ... */
to_be_done.erase( cfind );
to_be_done.insert( {n,dclose+1} );
/* ... */
}
(Note that we erase a record in the to_be_done set, and then re-insert the same key with a new value.
We could have done a simple update if we had used a map instead of a set.)
The various places where you find nodes in the finished / unfinished sets are up to you to implement.
You can use simple loops, or use find_if to find the elements matching the node numbers.
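As an illustration of the find_if approach, here is a sketch of looking up a node in a set of {node, distance} pairs, and of the erase-then-insert update mentioned above. The helper names find_node and update_distance are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <utility>

using node_info = std::pair<unsigned, unsigned>;  // {node, distance}

// Iterator to the record for 'node', or records.end() if it is absent.
std::set<node_info>::const_iterator
find_node(const std::set<node_info>& records, unsigned node) {
  return std::find_if(
      records.begin(), records.end(),
      [node](const node_info& record) { return record.first == node; });
}

// Replace the distance stored for 'node': set elements cannot be
// modified in place, so erase the old record and insert a new one.
void update_distance(std::set<node_info>& records,
                     unsigned node, unsigned distance) {
  auto found = find_node(records, node);
  if (found != records.end()) records.erase(found);
  records.insert({node, distance});
}
```

Since find_if walks the set linearly, each lookup costs time proportional to the set size; with a map keyed on the node number the lookup would be logarithmic.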
Exercise 10.5. Fill in the details of the above outline to realize Dijkstra’s algorithm.
Remark 6 In general it’s not a good idea to store a matrix as a vector-of-vectors, but in this case we
need to be able to return a matrix row, so it is convenient.
This is the simplest solution, but not necessarily the most efficient one, as it creates a new vector object
for each matrix-vector multiply.
As explained in the theory background, graph algorithms can be formulated as matrix-vector
multiplications with unusual add/multiply operations. Thus, the core of the multiplication routine could look
like
for ( int row=0; row<n; row++ ) {
for ( int col=0; col<n; col++) {
result[col] = add( result[col], mult( left[row],adjacency[row][col] ) );
}
}
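To make the 'unusual add/multiply' idea concrete, here is one possible choice of operations for unweighted graphs: add takes the minimum, and mult adds one step along an existing edge. This is a sketch under assumptions, not necessarily the formulation intended by the text: unreached nodes are marked with the largest unsigned value, and the result vector starts as a copy of the input so that known distances are kept. The name step is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Marker for 'not reached yet' (assumed convention).
const unsigned inf = std::numeric_limits<unsigned>::max();

// 'Addition' keeps the shorter of two distances.
unsigned add(unsigned x, unsigned y) { return std::min(x, y); }

// 'Multiplication' extends a known distance by one step along an edge;
// 'edge' is the adjacency entry, where nonzero means there is an edge.
unsigned mult(unsigned distance, unsigned edge) {
  if (distance == inf || edge == 0) return inf;
  return distance + 1;
}

// One generalized product of the distance vector with the adjacency matrix.
std::vector<unsigned> step(const std::vector<unsigned>& left,
                           const std::vector<std::vector<unsigned>>& adjacency) {
  const std::size_t n = left.size();
  assert(adjacency.size() == n);
  std::vector<unsigned> result = left;  // keep distances that are already known
  for (std::size_t row = 0; row < n; ++row)
    for (std::size_t col = 0; col < n; ++col)
      result[col] = add(result[col], mult(left[row], adjacency[row][col]));
  return result;
}
```

The explicit test for inf in mult avoids unsigned overflow, which would otherwise wrap around to a small number and look like a short distance.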
Exercise 10.6. Implement the add / mult routines to make the SSSP algorithm on unweighted graphs
work.
The shortest distance 0 → 4 is 4, but in the first step a larger distance of 5 is discovered. Your algorithm
should show an output similar to this for the successive updates to the known shortest distances:
Input : 0 . . . .
step 0: 0 1 . . 5
step 1: 0 1 2 . 5
step 2: 0 1 2 3 5
step 3: 0 1 2 3 4
Exercise 10.7. Implement new versions of the add / mult routines to make the matrix-vector
multiplication correspond to Dijkstra’s algorithm for SSSP on weighted graphs.
Climate change
that contain temperature deviations from the 1951–1980 average. Deviations are given for each month
of each year 1880–2018. These data files and more can be found at https://data.giss.nasa.gov/gistemp/.
Exercise 11.1. Start by making a listing of the available years, and an array monthly_deviation
of size 12 × nyears, where nyears is the number of full years in the file. Use formats and array notation.
The text files contain lines that do not concern you. Do you filter them out in your program, or are you
using a shell script? Hint: a judicious use of grep will make the Fortran code much easier.
each point has a chance of 1/n to be a record high. Since over n + 1 years each year has a chance of
1/(n + 1) of holding the record, the (n + 1)-st year has a chance of 1/(n + 1) of being a record.
We conclude that, as a function of n, the chance of a record high (or low, but let’s stick with highs) goes
down as 1/n, and that the gap between successive highs is approximately a linear function of the year (see footnote 1).
This is something we can test.
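Before turning to the temperature data, the 1/n claim itself can be checked by brute force. If the yearly values are independent draws from one stationary distribution, every ordering of n distinct values is equally likely, so it suffices to count the orderings in which the last value is the largest. The sketch below does this for small n; the function name `last_is_record` is invented for this illustration, and n is assumed to be at least 1.

```cpp
#include <algorithm>
#include <numeric>
#include <utility>
#include <vector>

// Returns { number of orderings in which the last value is a record high,
//           total number of orderings } for n distinct values.
std::pair<long,long> last_is_record( int n ) {
  std::vector<int> values(n);
  std::iota( values.begin(),values.end(),1 );  // 1,2,...,n in sorted order
  long records=0, total=0;
  do {
    total++;
    // the last value is a record if it is the maximum of the sequence
    if ( values.back()==n ) records++;
  } while ( std::next_permutation( values.begin(),values.end() ) );
  return {records,total};
}
```

For n = 4 this counts 6 out of 24 orderings, that is, a fraction of 1/4.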
Again, use array notation. This is also a great place to use the Where clause.
Exercise 11.3. Now take each month, and find the gaps between records. This gives you two arrays:
gapyears for the years where a gap between record highs starts, and gapsizes for the length of
that gap.
This function, since it is applied individually to each month, uses no array notation.
The hypothesis is now that the gapsizes are a linear function of the year, for instance measured as distance
from the starting year. Of course they are not exactly a linear function, but maybe we can fit a linear
function through them by linear regression.
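For reference, an ordinary least-squares fit of a straight line needs only a handful of sums. The sketch below is written in C++ rather than in the Fortran that this chapter asks for, and the name `linear_fit` is an assumption; it expects two arrays of equal length with at least two points, not all at the same x value.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Least-squares fit of y = slope*x + intercept.
// Returns { slope, intercept }.
std::pair<double,double> linear_fit
    ( const std::vector<double>& x, const std::vector<double>& y ) {
  const double n = static_cast<double>( x.size() );
  double sx=0, sy=0, sxx=0, sxy=0;
  for ( std::size_t i=0; i<x.size(); i++ ) {
    sx  += x[i];      sy  += y[i];
    sxx += x[i]*x[i]; sxy += x[i]*y[i];
  }
  const double slope = ( n*sxy - sx*sy ) / ( n*sxx - sx*sx );
  const double intercept = ( sy - slope*sx ) / n;
  return {slope,intercept};
}
```

Here x would hold the gapyears (as distance from the starting year) and y the gapsizes; the size of the residuals then indicates how well a linear model describes the gaps.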
You’ll find that the gaps are decidedly not linearly increasing. So is this negative result the end of the
story, or can we do more?
Exercise 11.5. Can you turn this exercise into a test of global warming? Can you interpret the deviations
as the sum of a yearly increase in temperature plus a stationary distribution, rather than a stationary
distribution by itself?
1. Technically, we are dealing with a uniform distribution of temperatures, which makes the maxima and minima have a
beta-distribution.
In this set of exercises you will write a ‘desk calculator’: a small interactive calculator that combines
numerical and symbolic calculation.
These exercises mostly use the material of chapters ??, ??, ??.
A named variable has a value, and a string field that is the expression that generated the variable. When
you create the variable, the expression can be anything.
type(namedvar) :: x,y,z,a
x = namedvar("x",1 )
y = namedvar("yvar",2 )
Next we are going to do calculations with these type objects. For instance, adding two objects
• adds their values, and
• concatenates their expression fields, giving the expression corresponding to the sum value.
Your first assignment is to write varadd and varmult functions that get the following program working
with the indicated output. This uses string manipulation from sections ?? and ??.
Exercise 12.1. The following main program should give the corresponding output:
Code:
print *,x
print *,y
z = varadd(x,y)
print *,z
a = varmult(x,z)
print *,a
Output [structf] varhandling:
x 1
yvar 2
(x)+(yvar) 3
(x)*((x)+(yvar)) 3
12. Desk Calculator Interpreter
(To be clear: the two routines need to do both numeric and string ‘addition’ and ‘multiplication’.)
You can base this off the file namedvar.cxx in the repository.
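To make the combined numeric and string behaviour concrete, here is a sketch of the same idea in C++ rather than in Fortran. The field names `expression` and `value` follow the ones used elsewhere in this chapter; everything else is an assumption made for the illustration, and it is not meant as the reference solution to the exercise.

```cpp
#include <string>

// Assumption: field names mirror the Fortran type used in this chapter.
struct namedvar {
  std::string expression;
  int value;
};

// Numeric sum, plus the parenthesized concatenation of the expressions.
namedvar varadd( const namedvar& x, const namedvar& y ) {
  return { "(" + x.expression + ")+(" + y.expression + ")", x.value+y.value };
}

// Numeric product, with the same treatment of the expressions.
namedvar varmult( const namedvar& x, const namedvar& y ) {
  return { "(" + x.expression + ")*(" + y.expression + ")", x.value*y.value };
}
```

Wrapping each operand in parentheses is what keeps nested expressions unambiguous without having to reason about operator precedence.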
Exercise 12.2. Create a module (suggested name: VarHandling) and move the namedvar type definition
and the routines varadd, varmult into it.
Exercise 12.3. Also create a module (suggested name: InputHandling) that contains the routines
islower, isdigit from the character exercises in chapter ??. You will also need an isop routine to
recognize arithmetic operations.
Exercise 12.4. Write a loop that accepts character input, and only prints out what kind of character
was encountered: a lowercase character, a digit, or a character denoting an arithmetic operation +-*/.
Code:
do
  read *,input
  if (input .eq. '0') then
    exit
  else if ( isdigit(input) ) then
Output [structf] interchar:
Inputs: 4 x 3 + 0
4 is a digit
x is a lowercase
3 is a digit
+ is an operator
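The classification logic can be sketched in C++ as follows. This is an illustration only: `classify` is a hypothetical helper, and the digit and lowercase tests lean on the standard `<cctype>` functions instead of hand-written routines.

```cpp
#include <cctype>
#include <string>

// True for the four arithmetic operators + - * /
bool isop( char c ) {
  return c=='+' || c=='-' || c=='*' || c=='/';
}

std::string classify( char c ) {
  // the <cctype> functions require a value representable as unsigned char
  const unsigned char u = static_cast<unsigned char>(c);
  if ( std::isdigit(u) ) return "digit";
  if ( std::islower(u) ) return "lowercase";
  if ( isop(c) ) return "operator";
  return "other";
}
```

The explicit "other" case is a design choice: it makes unexpected input visible instead of silently ignoring it.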
12.3.1 Stack
Next, we are going to store values in namedvar types on a stack. A stack is a data structure where new
elements go on the top, so we need to indicate with a stack pointer that top element. Equivalently, the
stack pointer indicates how many elements there already are:
type(namedvar),dimension(10) :: stack
integer :: stackpointer=0
Since we are using modules, let’s keep the stack out of the main program and put it in the appropriate
module.
Exercise 12.5. Add the stack variable and the stack pointer to the VarHandling module.
Since Fortran uses 1-based indexing, a starting value of zero is correct. For C/C++ it would have been -1.
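For comparison, the zero-based variant might look as follows in C++. This is a sketch that assumes a stack of plain integers with a fixed capacity of 10; the helper `stack_size` is not part of the exercises.

```cpp
#include <array>

// Assumption: a stack of plain integers with a fixed capacity of 10.
std::array<int,10> stack;
int stackpointer = -1;   // index of the top element; -1 means empty

// Puts a value on top of the stack; returns false if the stack is full.
bool stack_push( int value ) {
  if ( stackpointer+1 >= static_cast<int>( stack.size() ) ) return false;
  stackpointer++;
  stack[stackpointer] = value;
  return true;
}

// Number of elements currently on the stack.
int stack_size() { return stackpointer+1; }
```

With zero-based indexing the pointer and the element count differ by one, which is why the element count is computed rather than read off directly as in the Fortran version.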
Next we will start implementing stack operations, such as putting namedvar objects on the stack.
(You have already coded isdigit in exercise ??.) but a cleaner design uses a function call to a
method in the VarHandling module:
Note that the stack_push routine does not have the stack or stack pointer as arguments: since
they are all in the same module, they are accessible as global variables.
Finally,
4. if it is a letter indicating an operation +, −, ×, /,
(a) take the two top entries from the stack, lowering the stack pointer;
(b) apply that operation to the operands; and
(c) push the result onto the stack.
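Steps (a) through (c) can be sketched in C++ as follows. The assumptions here are that the stack is a `std::vector`, so that its size plays the role of the stack pointer, and that the entries carry `expression` and `value` fields; the return value signalling failure is an addition made for this sketch.

```cpp
#include <string>
#include <vector>

struct namedvar { std::string expression; int value; };

// Applies 'op' to the two top entries of the stack.
// Returns false, leaving the stack untouched, if there are fewer than
// two operands, if 'op' is not one of + - * /, or on division by zero.
bool stack_op( std::vector<namedvar>& stack, char op ) {
  if ( stack.size()<2 ) return false;
  // the topmost entry is the right operand, the one below it the left
  const namedvar right = stack[ stack.size()-1 ];
  const namedvar left  = stack[ stack.size()-2 ];
  int value;
  switch (op) {
  case '+' : value = left.value + right.value; break;
  case '-' : value = left.value - right.value; break;
  case '*' : value = left.value * right.value; break;
  case '/' :
    if ( right.value==0 ) return false;
    value = left.value / right.value;   // integer division
    break;
  default : return false;
  }
  // (a) remove the two operands, (c) push the result
  stack.pop_back(); stack.pop_back();
  stack.push_back
    ( { "(" + left.expression + ")" + op + "(" + right.expression + ")", value } );
  return true;
}
```

Reading the operands before removing them means that a failed operation leaves the stack exactly as it was.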
The auxiliary function stack_display is a little tricky, so you get that here. Note that the stack array
and the stackpointer act like global variables. The routine uses string formatting (section ??) and
implied do loops (section ??):
subroutine stack_display()
  implicit none
  ! local variables
  integer :: istck
  ! nothing to print for an empty stack
  if (stackpointer.eq.0) return
  print '( 10( a,a, a,i0,"; ") )', ( &
       " expr=",trim(stack(istck)%expression), &
       " val=",stack(istck)%value, &
       istck=1,stackpointer )
end subroutine stack_display
Exercise 12.6. Make your event loop accept digits, creating a new entry:
Code:
else if ( isdigit(input) ) then
  call stack_push(input)
Output [structf] internum:
Inputs: 4 5 6 0
expr=4 val=4;
expr=4 val=4; expr=5 val=5;
expr=4 val=4; expr=5 val=5; expr=6
Next we integrate the operations: if the input character corresponds to an arithmetic operator, we call
stack_op with that character. That routine in turn calls the appropriate operation depending on what the
character was.
Exercise 12.7. Add a clause to your event loop to handle characters that stand for arithmetic operations:
Code:
else if ( isop(input) ) then
  call stack_op(input)
Output [structf] internumop:
Inputs: 4 5 6 + + 0
expr=4 val=4;
expr=4 val=4; expr=5 val=5;
expr=4 val=4; expr=5 val=5; expr=6
expr=4 val=4; expr=(5)+(6) val=11;
expr=(4)+((5)+(6)) val=15;
Exercise 12.8. Add the id field to the namedvar, and make sure your program still compiles and
runs.
The event loop is now extended with an extra step. If the input character is a lowercase letter, it is used
as the id of a namedvar as follows.
• If there is already a stack entry with that id, it is duplicated on top of the stack;
Exercise 12.9. Write the missing function and its clause in the event loop:
Code:
stacksearch = find_on_stack(stack,stackpointer,input)
if ( stacksearch>=1 ) then
  stackpointer = stackpointer+1
  stack(stackpointer) = stack(stacksearch)
Output [structf] stackfind:
Inputs: 1 x 2 y x y + z 0
id:. expr=1 val=1;
id:x expr=1 val=1;
id:x expr=1 val=1; id:. expr=2 val=2;
id:x expr=1 val=1; id:y expr=2 val=2;
id:x expr=1 val=1; id:y expr=2 val=2;
id:x expr=1 val=1; id:y expr=2 val=2;
val=2;
id:x expr=1 val=1; id:y expr=2 val=2;
id:x expr=1 val=1; id:y expr=2 val=2;
12.4 Modularizing
With the modules and the functions you have developed so far, you have a very clean main program:
do
  call stack_display()
  read *,input
  if (input .eq. '0') exit
  if ( isdigit(input) ) then
    call stack_push(input)
  else if ( isop(input) ) then
    call stack_op(input)
  else if ( islower(input) ) then
    call stack_name(input)
  end if
end do
You see that by moving the stack into the module, neither the stack variable nor the stack pointer is
visible in the main program anymore.
But there is an important limitation to this design: there is exactly one stack, declared as a sort of global
variable, accessible through a module.
Whether having global data is good practice is another matter. In this case it’s defensible: in a calculator
app there will be exactly one stack.
Exercise 12.10. Change the event loop so that it calls methods of the stackstruct type, rather than
functions that take the stack as input.
For instance, the push function is called as:
if ( isdigit(input) ) then
call thestack%push(input)
et cetera.
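In C++ terms, the same redesign turns the stack into a class, so that a program can hold several independent stacks. The sketch below borrows the name stackstruct from the exercise; the members and the restriction to integer entries are assumptions made for brevity.

```cpp
#include <vector>

class stackstruct {
private:
  std::vector<int> values;   // assumption: integer entries only
public:
  // Puts a value on top of the stack.
  void push( int v ) { values.push_back(v); }
  // Removes and returns the top element; the stack must not be empty.
  int pop() {
    int top = values.back();
    values.pop_back();
    return top;
  }
  // Number of elements currently on the stack.
  int size() const { return static_cast<int>( values.size() ); }
};
```

Because the data now lives inside each object, two variables of this type do not interfere with each other, which removes the one-stack limitation of the module-level design.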
APPENDIX
Chapter 13
13.2 Style
Your report should obey the rules of proper English.
• Observing correct spelling and grammar goes without saying.
• Use full sentences.
• Try to avoid verbiage that is disparaging or otherwise inadvisable. The academic XSEDE has
the following guidelines: https://www.xsede.org/terminology; much longer and
more extensive, the Google developer documentation style guide [4] is also a great resource.
• Introductory section that is extremely high level: what is the problem, what did you do, what
did you find.
• Conclusion: what do your findings mean, what are limitations, opportunities for future extensions.
• Bibliography.
13.3.1 Introduction
The reader of your document need not be familiar with the project description, or even the problem it
addresses. Indicate what the problem is, give theoretical background if appropriate, possibly sketch a
historic background, and describe in global terms how you set out to solve the problem, as well as a brief
statement of your findings.
13.4 Experiments
You should not expect your program to run once and give you a final answer to your research question.
Ask yourself: what parameters can be varied, and then vary them! This allows you to generate graphs or
multi-dimensional plots.
If you vary a parameter, think about what granularity you use. Do ten data points suffice, or do you get
insight from using 10,000?
Above all: computers are very fast; they do on the order of a billion operations per second. So don’t be shy about using
long program runs. Your program is not a calculator where a press of the button immediately gives the
answer: you should expect program runs to take seconds, maybe minutes.
13.5.2 Code
Your report should describe in a global manner the algorithms you developed, and you should include
relevant code snippets. If you want to include full listings, relegate them to an appendix: code snippets in
the text should only be used to illustrate especially salient points.
Do not use screen shots of your code: at the very least use a monospaced font, for instance through the verbatim
environment, but using the listings package (used in this book) is very much recommended.
Amazon
delivery truck, 59
prime, 59, 65
bisection, 23
compilation
separate, 45
connected components, see graph, connected
constructor
delegating, 34
covid-19, 44
ebola, 44
efficiency gap, 56
eight queens, 31
gerrymandering, 51
Goldbach conjecture, 11
Google, 47
developer documentation style guide, 89
graph
connected, 48
diameter, 49
greedy search, see search, greedy
header, 45
Horner’s rule, 25
INDEX
makefile, 45
Manhattan distance, 59
Markov chain, 49
memoization, 56
memory
bottleneck, 71
Newton’s method, 28
NP-hard, 63
operator
overloading, 86
opt2, 63
Pagerank, 47
programming
dynamic, 54
root finding, 23
search
greedy, 62, 63
Single Source Shortest Path, 49
SIR model, 42
stack, 82
pointer, 82
structured bindings, 74
variable
global
in Fortran module, 83
vector, 12
XSEDE, 89
Bibliography