Shortest Paths
In this chapter we will cover problems involving finding the shortest path between vertices in a
graph with weights (lengths) on the edges. One obvious application is finding the shortest
route from one address to another; shortest paths, however, have many other applications. We
will look at a version of the problem in which we find distances from a single source to
all other vertices in the graph, and two variants depending on whether we allow negative edge
weights or not. We first introduce weighted graphs.
Many applications of graphs require associating weights or other values with the edges of a
graph. Such graphs can be formally defined as follows.
In a graph, if the data associated with the edges are real numbers (L ⊆ R), we often use the
term “weight” to refer to the values, and use the term “weighted graph” to refer to the graph. In
the general case, we use the terms “edge label” or “edge values” to refer to the values. Weights
or other values on edges could represent many things, such as a distance, or a capacity, or the
strength of a relationship.
[Figure: the weighted graph of Example 13.2, on vertices 1, 2, 3, and 4, with edge weights 0.7, −1.5, −2.0, and 3.0.]
Chapter 9 described three different representations of graphs suitable for parallel algorithms:
edge sets, adjacency tables, and adjacency sequences.
Question 13.3. Can you see how we can extend these representations to support edge
values?
We can extend all of these representations to support edge values by separately representing
the function from edges to values using a table (mapping)—the table maps each edge (or arc) to
its value. This representation allows looking up the edge value of an edge e = (u, v) by using a
table lookup. We call this an edge table.
Example 13.4. For the weighted graph in Example 13.2, the edge table is:
A nice property of an edge table is that it works uniformly with all representations of
the structure of a graph, and it is clean since it separates the edge values from the structural
information. However, keeping the edge table separate creates redundancy, wasting space and
possibly requiring extra work to access the edge values. The redundancy can be avoided by
storing the edge values directly with the edge information. For example, an edge table makes
a separate edge set redundant, so there is typically no need to keep one. Adjacency tables
can be extended to include the edge values by replacing the set of neighbors of a vertex v with a
table that maps each neighbor u to the edge value w(v, u). Adjacency sequences can be extended
by creating a sequence of neighbor-value pairs for each out edge of a vertex.
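As a sketch of these two extensions, using a small made-up digraph (the edges and weights are illustrative assumptions, not a graph from the text):

```python
# Adjacency table: each vertex maps to a table from out-neighbor to weight.
adj_table = {
    "a": {"b": 0.7},
    "b": {"a": -1.5, "d": 3.0},
    "c": {"d": -2.0},
    "d": {},
}

# Adjacency sequence for the same graph, assuming vertices are renumbered
# 0..n-1 (a=0, b=1, c=2, d=3): each entry is the sequence of
# (out-neighbor, weight) pairs for that vertex.
adj_seq = [
    [(1, 0.7)],             # out-edges of vertex 0
    [(0, -1.5), (3, 3.0)],  # out-edges of vertex 1
    [(3, -2.0)],            # out-edges of vertex 2
    [],                     # vertex 3 has no out-edges
]
```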
13.2 Shortest Weighted Paths
Example 13.5. For the weighted graph in Example 13.2, the adjacency table representation is:
Consider a weighted graph G = (V, E, w), with w : E → R. The graph can be either directed or
undirected. For convenience we define w(u, v) = ∞ if (u, v) ∉ E. We define the weight of a
path as the sum of the weights of the edges along that path.
Example 13.6. In the following graph the weight of the path ⟨s, a, b, e⟩ is 6. The weight
of the path ⟨s, a, b, s⟩ is 10.
[Figure: a graph with edges (s, a), (a, b), (b, e), and (b, s) of weights 1, 2, 3, and 7, plus edges through c and d with weights 4, 5, and 6.]
For a weighted graph G = (V, E, w), a shortest (weighted) path from vertex u to vertex v is a
path from u to v with minimum weight. There might be multiple paths with equal weight, and
if so they are all shortest weighted paths from u to v. We use δG (u, v) to indicate the weight
of a shortest path from u to v. In shortest-path problems, we are required to find the shortest
(weighted) path between vertices, or perhaps just the weight of such paths.
In the graph above, the shortest path from s to e is ⟨s, a, b, e⟩ with weight 6.
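The weight-of-a-path definition translates directly into code. The sketch below uses only the four edge weights that can be read off the two example paths; the rest of the graph is not needed:

```python
def path_weight(path, w):
    # Sum the weights of consecutive arcs along the path; an arc that is
    # not in the graph contributes infinite weight.
    return sum(w.get((u, v), float("inf")) for u, v in zip(path, path[1:]))

# Edge weights taken from Example 13.6.
w = {("s", "a"): 1, ("a", "b"): 2, ("b", "e"): 3, ("b", "s"): 7}
```

With these weights, path_weight(["s", "a", "b", "e"], w) is 6 and path_weight(["s", "a", "b", "s"], w) is 10, matching the example.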
Question 13.8. What happens if we change the weight of the edge (b, s) from 7 to −7?
If we change the weight of the edge (b, s) to −7, the shortest path between s and e has weight
−∞, since a path can keep going around the cycle ⟨s, a, b, s⟩, reducing its weight by 4 each time
around. Recall that a path allows repeated vertices; a simple path does not. For this reason,
when computing shortest paths we will need to be careful about cycles of negative weight. As
we will see, even if there are no negative-weight cycles, negative edge weights make finding
shortest paths more difficult. We will therefore first consider the problem of finding shortest paths
when there are no negative edge weights.
Computing shortest paths is important in many practical applications. In fact, there are
several variants of this problem such as the “single source” and the “multiple source” versions,
both with or without negative weights.
Although there can be many equal-weight shortest paths between two vertices, the problem
only requires finding one. Also, sometimes we only care about the weight of the shortest path
to each vertex and not the path itself. We will refer to this variant of the single-source shortest
path (SSSP) problem as the SSSPδ problem.
Exercise 13.10. For a weighted graph G = (V, E, w), assume you are given the distances
δG (s, v) for all vertices v ∈ V . For a particular vertex u, describe how you could find a shortest
path from s to u by only looking at the in-neighbors of each vertex on the path. Hint:
start at u.
In Chapter 11 we saw how Breadth-First Search (BFS) can be used to solve the single-source
shortest path problem on graphs without edge weights, or, equivalently, where all edges have
weight 1.
Question 13.11. Can we use BFS to solve the single-source shortest path problem on
weighted graphs?
BFS does not work on weighted graphs, unfortunately, because it does not take edge weights
into account.
13.3 Dijkstra's Algorithm
Example 13.12. To see why BFS does not work, consider the following directed graph
with 3 vertices:
[Figure: a directed graph with edges (s, b) of weight 3, (s, a) of weight 1, and (a, b) of weight 1.]
In this example, BFS first visits b and then a. When it visits b, it assigns it an incorrect
weight of 3. Since BFS never visits b again, it will not find the shortest path going through a,
which happens to be shorter.
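A small sketch makes the failure concrete. Running plain BFS on this graph, carrying edge weights along but never reconsidering a visited vertex, leaves b with weight 3 even though the path through a has weight 2:

```python
from collections import deque

def bfs_weights(adj, s):
    # Plain BFS: each vertex gets the weight of the first path that reaches
    # it, i.e. a fewest-edges path, not a minimum-weight path.
    dist = {s: 0}
    frontier = deque([s])
    while frontier:
        u = frontier.popleft()
        for v, wt in adj[u]:
            if v not in dist:      # BFS never visits a vertex twice
                dist[v] = dist[u] + wt
                frontier.append(v)
    return dist

# The graph of Example 13.12: (s, b) has weight 3, (s, a) and (a, b) weight 1.
adj = {"s": [("b", 3), ("a", 1)], "a": [("b", 1)], "b": []}
```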
Example 13.13. Another graph where BFS fails to find the shortest paths correctly:
[Figure: a directed graph on vertices s, a, b, c, and d with edge weights 1, 1, 1, 2, and 2.]
The reason why BFS works on unweighted graphs is quite interesting and helpful for
understanding other shortest-path algorithms. The key idea behind BFS is to visit vertices in
order of their distance from the source, visiting the closest vertices first, then the next closest, and so
on. More specifically, for each frontier Fi , BFS has the correct distance from the source to each
vertex in the frontier. It can then determine the correct distance for unencountered neighbors that
are distance one further away (on the next frontier).
Dijkstra's algorithm solves the SSSP problem when all the weights on the edges are non-negative
(i.e. w : E → R∗ ). We will refer to this variant of SSSP as the SSSP+ problem. Dijkstra's is
an important algorithm both because it is an efficient algorithm for an important problem and
because it is a very elegant example of an efficient greedy algorithm that generates optimal
solutions on a nontrivial task.
In this section, we are going to (re-)discover this algorithm by taking advantage of properties
of graphs and shortest paths. Before going further, we note that since no edges have negative
weights, there cannot be a negative-weight cycle. Therefore one can never make a path shorter
by visiting a vertex twice; that is, a path that cycles back to a vertex cannot have smaller weight
than the path that ends at the first visit to the vertex. This means we only need to consider simple
paths when searching for a shortest path.
Question 13.14. Can you think of a brute-force algorithm for finding the shortest path?
Let's start with a brute-force algorithm for the SSSP+ problem that, for each vertex, considers
all simple paths between the source and the vertex and selects the shortest such path.
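The brute force can be sketched as a recursive enumeration of simple paths (the function names here are ours, not from the text):

```python
def simple_path_weights(adj, s, t, visited=frozenset()):
    # Yield the weight of every simple path from s to t. The number of
    # such paths can be exponential in the size of the graph.
    if s == t:
        yield 0
        return
    for v, wt in adj[s]:
        if v not in visited:
            for rest in simple_path_weights(adj, v, t, visited | {s}):
                yield wt + rest

def brute_force_distance(adj, s, t):
    # The shortest-path weight, or infinity if t is unreachable from s.
    return min(simple_path_weights(adj, s, t), default=float("inf"))
```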
Question 13.15. How many simple paths can there be between two vertices in a graph?
Unfortunately there can be an exponential number of simple paths between any pair of vertices, so
any algorithm that tries to look at all paths is not likely to scale beyond very small instances.
We therefore have to try to reduce the work. Toward this end we note that the sub-paths of a
shortest path must also be shortest paths between their end vertices, and we look at how this can
help. This sub-paths property is a key property of shortest paths.
Example 13.16. If a shortest path from Pittsburgh to San Francisco goes through
Chicago, then that shortest path includes the shortest path from Pittsburgh to Chicago.
Exercise 13.17. Prove that the sub-paths property holds for any graph, also in the
presence of negative weights.
We are going to use this property both now, to derive Dijkstra's algorithm, and again
in the next section, to derive the Bellman-Ford algorithm for the SSSP problem on graphs that do
allow negative edge weights. To see how this property can be helpful, suppose an oracle has told
you the shortest paths to all vertices except for one vertex, v.
Question 13.18. Using the sub-paths property, can you find the shortest path to v?
We can find the shortest path to v because we know the shortest subpath from the source to
the vertex u immediately before v. All we need to do is find the u among the in-neighbors of
v that minimizes the weight of the path to v, i.e. δG (s, u) plus the weight of the edge from u to v.
We note that this argument relies on the fact that a shortest path must be simple, so it cannot go
through v itself.
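In code form, this oracle step is a single minimum over the in-neighbors of v. The distances and in-edges in this sketch are hypothetical:

```python
def settle_last_vertex(dist, in_edges_of_v):
    # dist holds the (oracle-provided) shortest-path weights for every
    # vertex except v; in_edges_of_v lists (u, w(u, v)) pairs.
    return min(dist[u] + wt for u, wt in in_edges_of_v)

# Hypothetical distances and in-edge weights, in the spirit of Example 13.19.
dist = {"a": 2, "b": 0, "c": 1}
in_edges_of_v = [("a", 3), ("b", 6), ("c", 5)]
```

Here the minimum of 2 + 3, 0 + 6, and 1 + 5 is 5, achieved through a.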
Example 13.19. In the following graph G, suppose that we have found the shortest paths
from the source s to all the other vertices except for v. The weight of the shortest path
to v is min(δG (s, a) + 3, δG (s, b) + 6, δG (s, c) + 5). The shortest path goes through the
vertex (a, b, or c) that minimizes the weight.
[Figure: the visited set contains s, a, b, and c; the edges into v from a, b, and c have weights 3, 6, and 5 respectively.]
Let's try to generalize the argument to the case where, instead of the oracle telling us the
shortest paths from s to all but one of the vertices, it tells us the shortest paths from s to some
subset of the vertices X ⊂ V with s ∈ X. Also let's define Y to be the vertices not in X, i.e.
V \ X. The question is whether we can efficiently determine the shortest path to some vertex in
Y . If we could do this, then we would have a crank to repeatedly add new vertices to X until we
are done. As in graph search, we call the set of vertices that are neighbors of X but not in X,
i.e. N + (X) \ X, the frontier. It should be clear that any path that leaves X has to go through a
frontier vertex on the way out. Therefore for every v ∈ Y the shortest path from s must start in
X, since s ∈ X, and then leave X via a vertex in the frontier.
Question 13.20. Do you see how we can use this property to identify a vertex v ∈ Y
that is at least as close to the source as any other vertex in Y ?
Basically the intuition is that at least one path that goes from X directly to a vertex on
the frontier is an overall shortest path to some vertex in Y . This is because all paths to Y must
go through the frontier when exiting X, and since edge weights are non-negative, a subpath cannot be
longer than the full path.
Example 13.21. In the following graph, suppose that we have found the shortest paths
from the source s to all the vertices in X (marked by the numbers next to the vertices). The
shortest route from X directly to a vertex in Y is the path to d followed
by the edge (d, v), with total weight 9. If edge weights are non-negative there cannot be any
shorter way to get to v, whatever w(u, v) is, so we know that δ(s, v) = 9.
[Figure: the vertices of X = {s, a, b, c, d} labeled with their shortest-path distances, and vertices u, v ∈ Y = V \ X; the path to d followed by the edge (d, v) has total weight 9, while the weight w(u, v) is unknown.]
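One turn of this crank can be sketched as a scan over all edges leaving X; the vertex names and weights used below are made up:

```python
def cheapest_exit(dist, adj, X):
    # Among all edges (x, y) with x in X and y outside X, return the y
    # minimizing dist[x] + w(x, y). With non-negative edge weights this
    # value is the true shortest-path distance to y.
    best = (None, float("inf"))
    for x in X:
        for y, wt in adj[x]:
            if y not in X and dist[x] + wt < best[1]:
                best = (y, dist[x] + wt)
    return best
```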
This intuition is formalized in Lemma 13.22 along with a proof. The Lemma tells us that
once we know the shortest paths to a set X we can add more vertices to X with their shortest
paths. This gives us a crank that can churn out at least one new vertex on each round. Note that
this approach is actually an instance of priority-first search.
Recall that priority-first search is a graph search in which on each step we visit the frontier
vertices with the highest "priority". In our case the visited set is X, and the priority of a frontier
vertex v is defined in terms of p(v), the minimum weight of a path that consists of a path to some
x ∈ X followed by the single edge from x to v. This gives us the following rather concise definition of Dijkstra's algorithm.
Note that Dijkstra’s algorithm will visit vertices in non-decreasing shortest-path weight since
on each step it visits unvisited vertices that have the minimum shortest-path weight from s.
Remark 13.25. It may be tempting to think that Dijkstra’s algorithm visits vertices
strictly in increasing order of shortest-path weight from the source, visiting vertices with
equal shortest-path weight on the same step. This is not true. To see this consider the
example below and convince yourself that it does not contradict our reasoning.
[Figure: a graph with edge (s, a) of weight 1 and edges (a, b) and (a, c) of weight 0; a, b, and c all have shortest-path weight 1, yet b and c are visited on later steps than a.]
Lemma 13.26. Dijkstra's algorithm returns d(v) = δG (s, v) for each v reachable from s.
Proof. We maintain the invariant that for all x ∈ X (the visited set), d(x) =
δG (s, x). This is true at the start since X = {s} and d(s) = 0. On each step the search
adds a vertex v that minimizes p(v) = minx∈X (d(x) + w(x, v)). By our assumption
we have that d(x) = δG (s, x) for x ∈ X. By Lemma 13.22, p(v) = δG (s, v), giving
d(v) = δG (s, v) for the newly added vertex, maintaining the invariant. As with all
priority-first searches, it will eventually visit all reachable v.
Exercise 13.27. Argue that if all edges in a graph have weight 1 then Dijkstra’s algorithm
as described visits exactly the same set of vertices in each round as BFS does.
We now discuss how to implement this abstract algorithm efficiently using a priority queue to
maintain p(v). We use a priority queue that supports deleteMin and insert. The priority-queue
based algorithm is given in Algorithm 13.28. This variant of the algorithm only adds one
vertex at a time, even if there are multiple vertices with equal distance that could be added in
parallel. It can be made parallel by generalizing priority queues, but we leave this as an exercise.
The algorithm maintains the visited set X as a table mapping each visited vertex u to
d(u) = δG (s, u). It also maintains a priority queue Q that keeps the frontier prioritized based on
the shortest distance from s directly from vertices in X. On each round, the algorithm selects the
vertex x with least distance d in the priority queue (line 8 in the code) and, if it hasn't already
been visited, visits it by adding (x ↦ d) to the table of visited vertices (line 15), and then adds
all its neighbors v to Q along with the priority d(x) + w(x, v) (i.e. the distance to v through x)
(lines 17 and 18). Note that a neighbor might already be in Q, since it could have been added
by another of its in-neighbors. Q can therefore contain duplicate entries for a vertex, but what
is important is that the minimum distance will always be pulled out first. Line 10 checks
whether a vertex pulled from the priority queue has already been visited and discards it if it has.
This algorithm is just a concrete implementation of the previously described Dijkstra's algorithm.
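A compact sketch of this priority-queue version, using a binary heap and the same lazy treatment of duplicate entries; the graph used in the usage note is a small hypothetical one, not the graph of the figures:

```python
import heapq

def dijkstra(adj, s):
    X = {}           # visited vertices mapped to their distance d(u)
    Q = [(0, s)]     # priority queue of (tentative distance, vertex) pairs
    while Q:
        d, u = heapq.heappop(Q)
        if u in X:   # duplicate entry for an already-visited vertex: discard
            continue
        X[u] = d
        for v, wt in adj[u]:
            if v not in X:
                heapq.heappush(Q, (d + wt, v))
    return X
```

Vertices unreachable from s never enter the queue, so they never appear in the returned table.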
We note that there are a couple of variants on Dijkstra's algorithm using priority queues. First,
we could check inside the relax function whether u is already in X and, if so, not
insert it into the priority queue. This does not affect the asymptotic work bounds but would probably
give some improvement in practice. Another variant is to decrease the priority of the
neighbors instead of adding duplicates to the priority queue. This requires a more powerful
priority queue that supports a decreaseKey function.
Costs of Dijkstra's Algorithm. We now consider the work of the priority-queue version of
Dijkstra's algorithm. The algorithm is sequential, so the span is the same as the work. We analyze the work
by counting up the number of operations. The algorithm (Algorithm 13.28) includes a box around
each operation on the graph G, the set of visited vertices X, or the priority queue PQ. The
PQ.insert in line 21 is called only once, so we can ignore it. Of the remaining operations,
the iter and NG (v) on line 18 are on the graph, lines 10 and 15 are on the table of visited
vertices, and lines 8 and 17 are on the priority queue. For the priority queue operations, we
have only discussed one cost model, which, for a queue of size n, requires O(log n) work and
span for each of PQ.insert and PQ.deleteMin. We have no need for a meld operation
here. For the graph, we can either use a tree-based table or an array to access the neighbors.¹
There is no need for a single-threaded array since we are not updating the graph. For the table of
distances to visited vertices we can use a tree table, an array sequence, or a single-threaded array
sequence. The following table summarizes the costs of the operations, along with the number
of calls made to each operation. There is no parallelism in the algorithm, so we only need to
consider the sequential execution of the calls.
¹We could also use a hash table, but we have not yet discussed hash tables.
Example 13.29. An example run of Dijkstra’s algorithm. Note that after visiting s, a,
and b, the queue Q contains two distances for c corresponding to the two paths from s
to c discovered thus far. The algorithm takes the shortest distance and adds it to X. A
similar situation arises when c is visited, but this time for d. Note that when c is visited,
an additional distance for a is added to the priority queue even though it is already
visited. Redundant entries for both are removed next before visiting d. The vertex e is
never visited as it is unreachable from s. Finally, notice that the distances in X never
decrease.
[Figure: six snapshots of the run, showing the visited set X and the priority queue Q after each round. X grows from {s@0} to {s@0, a@1, b@2, c@4, d@7}; along the way Q holds duplicate entries such as c@5 alongside c@4, a@6 alongside the visited a@1, and d@8 alongside d@7, and the smaller distance is always removed first. The vertex e is never added.]
13.4 The Bellman-Ford Algorithm
We now turn to solving the single source shortest path problem in the general case where we
allow negative weights in the graph. One might ask how negative weights make sense. If talking
about distances on a map, they probably do not, but various other problems reduce to shortest
paths, and in these reductions negative weights show up. Before proceeding we note that if
there is a negative weight cycle (the sum of weights on the cycle is negative) reachable from the
source, then there cannot be a solution. This is because every time we go around the cycle we
get a shorter path, so to find a shortest path we would just go around forever. In the case that
a negative weight cycle can be reached from the source vertex, we would like solutions to the
S SSP problem to return some indicator that such a cycle exists and terminate.
Exercise 13.30. Consider the following currency exchange problem: given a set of
currencies, a set of exchange rates between them, and a source currency s, find for each
other currency v the best sequence of exchanges to get from s to v. Hint: how can you
convert multiplication to addition?
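One way to act on the hint (a sketch only; the rates below are made-up numbers): taking logarithms turns the product of exchange rates along a path into a sum, so maximizing the product is the same as minimizing the sum of edge weights w(u, v) = −log(rate(u, v)).

```python
import math

# Hypothetical exchange rates on three currencies.
rates = {("USD", "EUR"): 0.9, ("EUR", "JPY"): 160.0, ("USD", "JPY"): 150.0}

# Edge weights: maximizing a product of rates = minimizing a sum of -logs.
weights = {edge: -math.log(rate) for edge, rate in rates.items()}
```

With these numbers, exchanging USD to JPY directly (rate 150) beats going through EUR (0.9 × 160 = 144), and correspondingly the direct edge has the smaller total weight. Note that rates greater than 1 produce negative edge weights, which is one reason negative weights arise in practice.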
Exercise 13.31. In your solution to the previous exercise can you get negative weight
cycles? If so, what does this mean?
Question 13.32. Why can't we use Dijkstra's algorithm to compute shortest paths when
there are negative edge weights?
Recall that in our development of Dijkstra's algorithm we assumed non-negative edge weights.
This allowed us to consider only simple paths (with no cycles), and, more importantly, it played
a critical role in arguing the correctness of Dijkstra's property. To see where Dijkstra's property
fails with negative edge weights, consider the following very simple example:
[Figure: three snapshots of Dijkstra's algorithm on a graph with edges (s, a) of weight 3, (s, b) of weight 2, and (a, b) of weight −2; the tentative distances settle at a = 3 and b = 2.]
Dijkstra’s algorithm would visit b then a and leave b with a distance of 2 instead of the correct
distance 1. The problem is that the overall shortest path directly from the visited set to the
frontier is not necessarily the shortest path to that vertex. There can be a shorter path that first
steps further away (to a in the example), and then reduces the length with the negative edge
weight.
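Running a sketch of Dijkstra's algorithm on exactly this example shows the failure: b is settled at distance 2 before the cheaper route through a is discovered.

```python
import heapq

def dijkstra(adj, s):
    # Standard Dijkstra with lazy deletion of duplicate queue entries;
    # it is only correct when all edge weights are non-negative.
    X, Q = {}, [(0, s)]
    while Q:
        d, u = heapq.heappop(Q)
        if u in X:
            continue
        X[u] = d
        for v, wt in adj[u]:
            if v not in X:
                heapq.heappush(Q, (d + wt, v))
    return X

# The example above: w(s, a) = 3, w(s, b) = 2, w(a, b) = -2.
adj = {"s": [("a", 3), ("b", 2)], "a": [("b", -2)], "b": []}
```

The algorithm reports distance 2 for b, although the path ⟨s, a, b⟩ has weight 1.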
Question 13.33. How can we find shortest paths on a graph with negative weights?
Question 13.34. Recall that for Dijkstra’s algorithm, we started with the brute-force
algorithm and realized a key property of shortest paths. Do you recall the property?
A property we can still take advantage of is that the subpaths of a shortest path are themselves
shortest paths. Using this property, we can build shortest paths in a slightly different way: by
bounding the number of edges on a path.
Question 13.35. Suppose that you have computed the shortest paths with ℓ or fewer edges
from s to all vertices. Can you come up with an algorithm to update the shortest paths
for ℓ + 1 edges?
4 function BF(D, k) =
5 let
6   D′ = {v ↦ min(Dv , min_{u ∈ N⁻G(v)} (Du + w(u, v))) : v ∈ V}
7 in
8   if (k = |V|) then ⊥
9   else if (all {Dv = D′v : v ∈ V}) then D
10  else BF(D′, k + 1)
11 end
12 D = {v ↦ if v = s then 0 else ∞ : v ∈ V}
13 in BF(D, 0) end
If we have computed the shortest paths with ℓ or fewer edges from s to all vertices, we can
compute the shortest paths with ℓ + 1 edges by considering extending all paths by one edge. To
do this, all we need to do is consider all incoming edges of each vertex. This is the idea behind
the Bellman-Ford algorithm.
We define the following:

    δ^ℓ_G(s, t) = the weight of a shortest path from s to t using at most ℓ edges.

We can start by determining δ⁰_G(s, v) for all v ∈ V , which is infinite for all vertices except s itself,
for which it is 0. Then perhaps we can use this to determine δ¹_G(s, v) for all v ∈ V . In general
we want to determine δ^{k+1}_G(s, v) based on all the δ^k_G(s, v). The question is how we calculate this.
It turns out to be easy, since to determine the shortest path with at most k + 1 edges to a vertex v,
all that is needed is the shortest path with at most k edges to each of its in-neighbors, and then to add in
the weight of the one additional edge. This gives us

    δ^{k+1}(v) = min(δ^k(v), min_{x ∈ N⁻(v)} (δ^k(x) + w(x, v))).
[Figure omitted: five snapshots of the distance values on a graph with vertices s, a, b, c, and d, including a negative-weight edge (a, b) of weight −2.]
Figure 13.1: Steps of the Bellman-Ford algorithm. The numbers with red squares indicate what
changed on each step.
Proof. By induction on the number of edges k in a path. The base case is correct since Ds = 0.
For all v ∈ V \ {s}, on each step a shortest (s, v) path with up to k + 1 edges must consist of a
shortest (s, u) path of up to k edges followed by a single edge (u, v). Therefore if we take the
minimum of these we get the overall shortest path with up to k + 1 edges. For the source, the self
edge will maintain Ds = 0. The algorithm can only proceed to n rounds if there is a reachable
negative-weight cycle, since otherwise a shortest path to every v is simple and can consist of at most
n vertices, and hence n − 1 edges.
Cost of Bellman-Ford. We now analyze the cost of the algorithm. First we assume the graph
is represented as a table of tables, as we suggested for the implementation of Dijkstra's algorithm. We then
consider representing it as a sequence of sequences.
For a table of tables, we assume the graph G is represented as an (R vtxTable) vtxTable,
where vtxTable maps vertices to values. The R are the real-valued weights on the edges. We
assume the distances D are represented as an R vtxTable. Let's consider the cost of one call
to BF, not including the recursive calls. The only nontrivial computations are on lines 6 and 9.
Line 6 consists of a tabulate over the vertices. As the cost specification for tables indicates, to
calculate the work we take the sum of the work across the vertices, and for the span we take the
maximum of the spans and add O(log n). Now consider what the algorithm does for each vertex.
First it has to find the vertex's neighbors in the graph (using a find G v). This requires O(log |V|)
work and span. Then it involves a map over the neighbors. Each instance of this map requires
a find in the distance table to get Du and an addition of the weight. The find takes O(log |V|)
work and span. Finally there is a reduce that takes O(1 + |NG (v)|) work and O(log |NG (v)|)
span. Using n = |V| and m = |E|, the overall work and span are therefore

    W = O( Σ_{v∈V} ( log n + |NG (v)| + Σ_{u∈NG (v)} (1 + log n) ) )
      = O((n + m) log n)

    S = O( max_{v∈V} ( log n + log |NG (v)| + max_{u∈NG (v)} (1 + log n) ) )
      = O(log n)
Line 9 is simpler to analyze since it only involves a tabulate and a reduction. It requires O(n log n)
work and O(log n) span.
Now the number of calls to BF is bounded by n, as discussed earlier. These calls are done
sequentially so we can multiply the work and span for each call by the number of calls giving:
    W(n, m) = O(nm log n)
    S(n, m) = O(n log n)
Cost of Bellman-Ford using Sequences. If we assume the vertices are the integers {0, 1, . . . , |V| − 1},
then we can use array sequences to implement a vtxTable. Instead of using a find, which
requires O(log n) work, we can use nth, requiring only O(1) work. This improvement in costs
can be applied both for looking up a vertex's neighbors in the graph and for looking up its current
distance in the distance table. By using the improved costs we get:
    W = O( Σ_{v∈V} ( 1 + |NG (v)| + Σ_{u∈NG (v)} 1 ) )
      = O(m)

    S = O( max_{v∈V} ( 1 + log |NG (v)| + max_{u∈NG (v)} 1 ) )
      = O(log n)
and hence the overall complexity for BellmanFord with array sequences is:

    W(n, m) = O(nm)
    S(n, m) = O(n log n)