
Computer Science and Software Engineering, 2008

CITS3210 Algorithms
Lecture Notes

Notes by CSSE, Comics by xkcd.com


CITS3210 Algorithms
Introduction

Computer Science and Software Engineering, 2011

Overview

1. Introduction

(a) What are Algorithms?

(b) Design of Algorithms.

(c) Types of Algorithms.

2. Complexity

(a) Growth rates.

(b) Asymptotic analysis, O and Θ.

(c) Average case analysis.

(d) Recurrence relations.

3. Sorting

(a) Insertion Sort.

(b) Merge Sort.

(c) QuickSort.

What will we be studying?

We will study a collection of algorithms, examining their design, analysis and sometimes even implementation. The topics we will cover will be taken from the following list:

1. Specifying and implementing algorithms.

2. Basic complexity analysis.

3. Sorting Algorithms.

4. Graph algorithms.

5. Network flow algorithms.

6. Computational Geometry.

7. String algorithms.

8. Greedy/Dynamic algorithms.

9. Optimization Algorithms.

What you should already know?

This unit will require the following basic knowledge:

1. Java Programming: classes, control structures, recursion, testing, etc.

2. Data Structures: stacks, queues, lists, trees, etc.

3. Complexity: definition of “big O”, Θ notation, amortized analysis etc.

4. Some maths: proof methods, such as proof by induction, some understanding of continuous functions.

What are the outcomes of this unit?

At the end of the unit you will:

1. be able to identify and abstract computational problems.

2. know important algorithmic techniques and a range of useful algorithms.

3. be able to implement algorithms as a solution to any solvable problem.

4. be able to analyse the complexity and correctness of algorithms.

5. be able to design correct and efficient algorithms.

The course will proceed by covering a number of algorithms; as they are covered, the general algorithmic technique involved will be highlighted, and the role of appropriate data structures and efficient implementation considered.

What are algorithms?

An algorithm is a well-defined finite set of rules that specifies a sequential series of elementary operations to be applied to some data called the input, producing after a finite amount of time some data called the output.

An algorithm solves some computational problem.

Algorithms (along with data structures) are the fundamental “building blocks” from which programs are constructed. Only by fully understanding them is it possible to write very effective programs.

Design and Analysis

An algorithmic solution to a computational problem will usually involve designing an algorithm, and then analysing its performance.

Design A good algorithm designer must have a thorough background knowledge of algorithmic techniques, but especially substantial creativity and imagination. Often the most obvious way of doing something is inefficient, and a better solution will require thinking “out of the box”. In this respect, algorithm design is as much an art as a science.

Analysis A good algorithm analyst must be able to carefully estimate or calculate the resources (time, space or other) that the algorithm will use when running. This requires logic, care and often some mathematical ability.

The aim of this course is to give you sufficient background to understand and appreciate the issues involved in the design and analysis of algorithms.

Design and Analysis

In designing and analysing an algorithm we should consider the following questions:

1. What is the problem we have to solve?

2. Does a solution exist?

3. Can we find a solution (algorithm), and is there more than one solution?

4. Is the algorithm correct?

5. How efficient is the algorithm?
The importance of design

By far the most important thing in a program is the design of the algorithm. It is far more significant than the language the program is written in, or the clock speed of the computer.

To demonstrate this, we consider the problem of computing the Fibonacci numbers. The Fibonacci sequence is the sequence of integers starting

1, 1, 2, 3, 5, 8, 13, 21, 34, 55, . . .

which is formally defined by

F1 = F2 = 1 and Fn = Fn−1 + Fn−2.

Let us devise an algorithm to compute Fn.

The naive solution

The naive solution is to simply write a recursive method that directly models the problem.

static int fib(int n) {
    return (n < 3 ? 1 : fib(n-1) + fib(n-2));
}

Is this a good algorithm/program in terms of resource usage? Timing it on a (2005) iMac gives the following results (the time is in seconds and is for a loop calculating Fn 10000 times).

Value   Time      Value   Time
F20     1.65      F24     9.946
F21     2.51      F25     15.95
F22     3.94      F26     25.68
F23     6.29      F27     41.40

How long will it take to compute F30, F40 or F50?

Experimental results

Make a plot of the times taken.

[Figure: the measured times (roughly 10 to 40 seconds) plotted against n for n = 22, 24, 26, showing rapid growth.]

Theoretical results

Each method call to fib() does roughly the same amount of work (just two comparisons and one addition), so we will have a very rough estimate of the time taken if we count how many method calls are made.

Exercise: Show the number of method calls made to fib() is 2Fn − 1.
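A short sketch of the exercise (this derivation is not in the original notes): let C(n) be the number of calls made by fib(n). Then C(1) = C(2) = 1 = 2F1 − 1, and for n ≥ 3 each call makes two recursive calls, so by induction

C(n) = 1 + C(n−1) + C(n−2) = 1 + (2Fn−1 − 1) + (2Fn−2 − 1) = 2Fn − 1.
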
Re-design the algorithm

We can easily re-design the algorithm as an iterative algorithm.

static int fib(int n) {
    int f_2;        /* F(i+2) */
    int f_1 = 1;    /* F(i+1) */
    int f_0 = 1;    /* F(i)   */

    for (int i = 1; i < n; i++) {
        /* F(i+2) = F(i+1) + F(i) */
        f_2 = f_1 + f_0;

        /* F(i) = F(i+1); F(i+1) = F(i+2) */
        f_0 = f_1;
        f_1 = f_2;
    }

    return f_0;
}

An Iterative Algorithm

An iterative algorithm gives the following times:

Value   Time      Value        Time
F20     0.23      F1000        0.25
F21     0.23      F10000       0.48
F22     0.23      F100000      2.20
F23     0.23      F1000000     20.26

Recurrence Relations

Recurrence relations can be a useful way to specify the complexity of recursive functions. For example the linear homogeneous recurrence relation

F(n) = 1                          if n = 1, 2;
F(n) = F(n − 1) + F(n − 2)        otherwise

specifies the sequence 1, 1, 2, 3, 5, 8, 13, . . .

In general a linear homogeneous recurrence relation is given as:

F(1) = c1
F(2) = c2
...
F(k) = ck
F(n) = a1 F(n − 1) + ... + ak F(n − k)

For example

F(n) = 1                          if n = 1, 2;
F(n) = 2F(n − 1) + F(n − 2)       otherwise

specifies the sequence 1, 1, 3, 7, 17, 41, . . .

See CLRS, Chapter 4.

Another solution?

The Fibonacci sequence is specified by the homogeneous recurrence relation

F(n) = 1                          if n = 1, 2;
F(n) = F(n − 1) + F(n − 2)        otherwise.

In general we can define a closed form for these recurrence equations:

F(n) = Aα^n + Bβ^n

where α, β are the roots of x² − x − 1 = 0.

• You need to be able to derive a recurrence relation that describes an algorithm’s complexity.

• You need to be able to recognize that linear recurrence relations specify exponential functions.

Solving the recurrence

All linear homogeneous recurrence relations specify exponential functions. We can find a closed form for the recurrence relation as follows:

Suppose that F(n) = r^n. Then r^n = a1 r^(n−1) + ... + ak r^(n−k). We divide both sides of the equation by r^(n−k). Then r^k = a1 r^(k−1) + ... + ak.

To find r we can solve the polynomial equation

r^k − a1 r^(k−1) − ... − ak = 0.

There are k solutions, r1, ..., rk, to this equation, and each satisfies the recurrence

F(n) = a1 F(n − 1) + a2 F(n − 2) + ... + ak F(n − k).

We also have to satisfy the rest of the recurrence relation, F(1) = c1 etc. To do this we can use a linear combination of the solutions r_i^n. That is, we must find α1, ..., αk such that

F(n) = α1 r1^n + ... + αk rk^n.

This can be done by solving linear equations.

Solving the recurrence (cont.)

For the Fibonacci recurrence the roots of the polynomial x² − x − 1 = 0 are

(−b ± √(b² − 4ac)) / 2a = (1 ± √5) / 2

and so the solution is

U(n) = A ((1 + √5)/2)^n + B ((1 − √5)/2)^n.

If we substitute n = 1 and n = 2 into the equation we get

A = 1/√5    and    B = −1/√5.

Thus

F(n) = (1/√5) ((1 + √5)/2)^n − (1/√5) ((1 − √5)/2)^n.
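As a quick sanity check of this closed form, here is a small Java sketch (not part of the original notes; the class and method names are illustrative) comparing the iterative values of F(n) with the formula above:

public class FibCheck {

    /* Iterative Fibonacci: a = F(i), b = F(i+1), starting from i = 1. */
    static long fibIterative(int n) {
        long a = 1, b = 1;
        for (int i = 1; i < n; i++) {
            long next = a + b;
            a = b;
            b = next;
        }
        return a;
    }

    /* Closed form A alpha^n + B beta^n with A = 1/sqrt(5), B = -1/sqrt(5).
       Floating point round-off limits this to moderate values of n. */
    static long fibClosedForm(int n) {
        double sqrt5 = Math.sqrt(5.0);
        double alpha = (1 + sqrt5) / 2, beta = (1 - sqrt5) / 2;
        return Math.round((Math.pow(alpha, n) - Math.pow(beta, n)) / sqrt5);
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 40; n++)
            if (fibIterative(n) != fibClosedForm(n))
                System.out.println("mismatch at n = " + n);
    }
}
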

What is an algorithm?

We need to be more precise now about what we mean by a problem, a solution and how we shall judge whether or not an algorithm is a good solution to the problem.

A computational problem consists of a general description of a question to be answered, usually involving some free variables or parameters.

An instance of a computational problem is a specific question obtained by assigning values to the parameters of the problem.

An algorithm solves a computational problem if when presented with any instance of the problem as input, it produces the answer to the question as its output.

A computational problem: Sorting

Instance: A sequence L of comparable objects.

Question: What is the sequence obtained when the elements of L are placed in ascending order?

An instance of Sorting is simply a specific list of comparable items, such as

L = [25, 15, 11, 30, 101, 16, 21, 2]

or

L = [“dog”, “cat”, “aardvark”, “possum”].
A computational problem: Travelling Salesman

Instance: A set of “cities” X together with a “distance” d(x, y) between any pair x, y ∈ X.

Question: What is the shortest circular route that starts and ends at a given city and visits all the cities?

An instance of Travelling Salesman is a list of cities, together with the distances between the cities, such as X = {A, B, C, D, E, F} and

        A   B   C   D   E   F
    A   0   2   4   ∞   1   3
    B   2   0   6   2   1   4
d = C   4   6   0   1   2   1
    D   ∞   2   1   0   6   1
    E   1   1   2   6   0   3
    F   3   4   1   1   3   0

An algorithm for Sorting

One simple algorithm for Sorting is called Insertion Sort. The basic principle is that it takes a series of steps such that after the i-th step, the first i objects in the array are sorted. Then the (i + 1)-th step inserts the (i + 1)-th element into the correct position, so that now the first i + 1 elements are sorted.

procedure INSERTION-SORT(A)
  for j ← 2 to length[A]
    do key ← A[j]
       ! Insert A[j] into the sorted sequence A[1 . . . j − 1]
       i ← j − 1
       while i > 0 and A[i] > key
         do A[i + 1] ← A[i]
            i ← i − 1
       A[i + 1] ← key
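For reference, a direct Java translation of this pseudocode, as a sketch (not from the notes; Java arrays are 0-indexed, so the outer loop starts at index 1 rather than 2):

/* Insertion sort: after the j-th step the first j+1 entries of a are sorted. */
static void insertionSort(int[] a) {
    for (int j = 1; j < a.length; j++) {
        int key = a[j];
        int i = j - 1;
        while (i >= 0 && a[i] > key) {   // shift larger elements one place right
            a[i + 1] = a[i];
            i--;
        }
        a[i + 1] = key;                  // insert the old a[j] into its correct position
    }
}
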

Pseudo-code

Pseudo-code provides a way of expressing algorithms in a way that is independent of any programming language. It abstracts away other program details such as the type system and declaring variables and arrays. Some points to note are:

• The statement blocks are determined by indentation, rather than { and } delimiters as in Java.

• Control statements, such as if... then...else and while have similar interpretations to Java.

• The character ! is used to indicate a comment line.

Pseudo-code (contd)

• A statement v ← e implies that expression e should be evaluated and the resulting value assigned to variable v. Or, in the case of v1 ← v2 ← e, to variables v1 and v2.

• All variables should be treated as local to their procedures.

• Array indexing is denoted by A[i] and arrays are assumed to be indexed from 1 to N (rather than 0 to N − 1, the approach followed by Java).

See CLRS (page 19-20) for more details.

But to return to the insertion sort: What do we actually mean by a good algorithm?
Evaluating Algorithms

There are many considerations involved in this question.

• Correctness

  1. Theoretical correctness
  2. Numerical stability

• Efficiency

  1. Complexity
  2. Speed

Correctness of insertion sort

Insertion sort can be shown to be correct by a proof by induction.

procedure INSERTION-SORT(A)
  for j ← 2 to length[A]
    do key ← A[j]
       ! Insert A[j] into the sorted sequence A[1 . . . j − 1]
       i ← j − 1
       while i > 0 and A[i] > key
         do A[i + 1] ← A[i]
            i ← i − 1
       A[i + 1] ← key

We do the induction over the loop variable j.

The base case of the induction is:
“the first element is sorted”,

and the inductive step is:
“given the first j elements are sorted after the j-th iteration, the first j + 1 elements will be sorted after the (j + 1)-th iteration”.


Proof by Induction

To show insertion sort is correct, let p(n) be the statement “after the nth iteration, the first n + 1 elements of the array are sorted”.

To show p(0) we simply note that a single element is always sorted.

Given p(i) is true for all i < n, we must show that p(n) is true:

After the (n − 1)th iteration the first n elements of the array are sorted. The nth iteration takes the (n + 1)th element and inserts it after the last element that a) comes before it, and b) is less than it. Therefore after the nth iteration, the first n + 1 elements of the array are sorted.

Aside: Proof by Contradiction

Another proof technique you may need is proof by contradiction. Here, if you want to show some property p is true, you assume p is not true, and show this assumption leads to a contradiction (something we know is not true, like i < i).

For example: two sorted arrays of integers containing exactly the same elements must be identical.

Proof by contradiction: Suppose M ≠ N are two distinct, sorted arrays containing the same elements. Let i be the least number such that M[i] ≠ N[i]. Suppose a = M[i] < N[i]. Since M and N contain the same elements, and M[j] = N[j] for all j < i, we must have a = N[k] for some k > i. But then N[k] < N[i] so N is not sorted: contradiction.

Complexity of insertion sort

For simple programs, we can directly calculate the number of basic operations that will be performed:

procedure INSERTION-SORT(A)
1  for j ← 2 to length[A]
2    do key ← A[j]
     ! Insert A[j] into the sorted sequence A[1 . . . j − 1]
3    i ← j − 1
4    while i > 0 and A[i] > key
5      do A[i + 1] ← A[i]
6         i ← i − 1
7    A[i + 1] ← key

The block containing lines 2-7 will be executed length[A] − 1 times, and contains 3 basic operations.

In the worst case the block containing lines 5-7 will be executed j − 1 times, and contains 2 basic operations.

In the worst case the algorithm will take

(N − 1) · 3 + 2(2 + 3 + ... + N) = N² + 4N − 5

operations, where length[A] = N (since 2(2 + 3 + ... + N) = N² + N − 2 and 3(N − 1) = 3N − 3).

Correctness

An algorithm is correct if, when it terminates, the output is a correct answer to the given question. Incorrect algorithms or implementations abound, and there are many costly and embarrassing examples:

• Intel’s Pentium division bug — a scientist discovered that the original Pentium chip gave incorrect results on certain divisions. Intel only reluctantly replaced the chips.

• USS Yorktown — after switching their systems to Windows NT, a “division by zero” error crashed every computer on the ship, causing a multi-million dollar warship to drift helplessly for several hours.

• Others...?

Types of Algorithm
Theoretical correctness

For all solvable problems, you should (already!)


It is usually possible to give a mathematical
be able to produce a correct algorithm. The
proof that an algorithm in the abstract is
brute force approach simply requires you to
correct, but proving that an implementation
(that is, actual code) is correct is much more
difficult. 1. enumerate all possible solutions to the
problem, and
This is the province of an area known as
software verification, which attempts to
provide logical tools that allow specification of 2. iterate through them until you find one
programs and reasoning about programs to be that works.
done in a rigorous fashion.
This is rarely practical. Other strategies to
The alternative to formal software verification consider are:
is testing; although thorough testing is vital for
any program, one can never be certain that
everything possible has been covered. • Divide and conquer - Divide the
problem into smaller problems to solve.
Even with vigorous testing, there is always the • Dynamic programming.
possibility of hardware error—mission critical • Greedy algorithms.
software must take this into account. • Tree traversals/State space search

31 32
Numerical Stability

You can be fairly certain of exact results from a computer program provided all arithmetic is done with the integers Z = {. . . , −3, −2, −1, 0, 1, 2, 3, . . .} and you guard carefully against any overflow.

However the situation is entirely different when the problem involves real numbers, because there is necessarily some round-off error when real numbers are stored in a computer. A floating point representation of a number in base β with precision p is a representation of the form

d.ddddd × β^e

where d.ddddd has exactly p digits.

Accumulation of errors

Performing repeated calculations will take the small truncation errors and cause them to accumulate. The resulting error is known as roundoff error. If we are careful or lucky, the roundoff error will tend to behave randomly, both positive and negative, and the growth of error will be slow.

Certain calculations however vastly increase roundoff error and can cause errors to grow catastrophically to the point where they completely swamp the real result. Two particular operations that can cause numerical instability are

• Subtraction of nearly equal quantities
• Division by numbers that are nearly zero

It is important to be aware of the possibility for roundoff error and to alter your algorithm appropriately.
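A tiny Java illustration of the first of these, subtraction of nearly equal quantities (this example is not from the notes):

public class Cancellation {
    public static void main(String[] args) {
        /* 1.0 + 1e-15 and 1.0 agree in almost all of their bits, so the
           computed difference keeps only a few significant digits of the
           true answer 1e-15. */
        double x = 1.0 + 1e-15;
        double y = 1.0;
        System.out.println(x - y);   // prints roughly 1.11e-15, not 1.0e-15
    }
}
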

Efficiency

An algorithm is efficient if it uses as few resources as possible. Typically the resources which we are interested in are

• Time, and
• Space (memory)

Other resources are important in practical terms, but are outside the scope of the design and analysis of algorithms.

In many situations there is a trade-off between time and space, in that an algorithm can be made faster if it uses more space or smaller if it takes longer.

Although a thorough analysis of an algorithm should consider both time and space, time is considered more important, and this course will focus on time complexity.

Measuring time

How should we measure the time taken by an algorithm?

We can do it experimentally by measuring the number of seconds it takes for a program to run — this is often called benchmarking and is often seen in popular magazines. This can be useful, but depends on many factors:

• The machine on which it is running.
• The language in which it is written.
• The skill of the programmer.
• The instance on which the program is being run, both in terms of size and which particular instance it is.

So it is not an independent measure of the algorithm, but rather a measure of the implementation, the machine and the instance.

Complexity

The complexity of an algorithm is a “device-independent” measure of how much time it consumes. Rather than expressing the time consumed in seconds, we attempt to count how many “elementary operations” the algorithm performs when presented with instances of different sizes.

The result is expressed as a function, giving the number of operations in terms of the size of the instance. This measure is not as precise as a benchmark, but much more useful for answering the kind of questions that commonly arise:

• I want to solve a problem twice as big. How long will that take me?
• We can afford to buy a machine twice as fast. What size of problem can we solve in the same time?

The answers to questions like this depend on the complexity of the algorithm.

Example

Suppose you run a small business and have a program to keep track of your 1024 customers. The list of customers is changing frequently and you often need to sort it. Your two programmers Alice and Bob both come up with algorithms.

Alice presents an algorithm that will sort n names using 256 n lg n comparisons and Bob presents an algorithm that uses n² comparisons. (Note: lg n ≡ log2 n)

Your current computer system takes 10⁻³ seconds to make one comparison, and so when your boss benchmarks the algorithms he concludes that clearly Bob’s algorithm is better.

Size    Alice   Bob
1024    2621    1049

But is he right?

Hardware improvement

A time-critical application requires you to sort


Expansion as many items as possible in an hour. How
many can you sort?
Alice however points out that the business is
expanding and that using Bob’s algorithm An hour has 3600 seconds, so we can make
3600000 comparisons. Thus if Alice’s
could be a mistake. As the business expands,
algorithm can sort nA items, and Bob’s nB
her algorithm becomes more competitive, and
items, then
soon overtakes Bob’s.
3600000 = 256nA lg nA = n2
B,
Size Alice Bob which has the solution
1024 2621 1049
2048 5767 4194 nA = 1352 nB = 1897.
4096 12583 16777
8192 27263 67109 But suppose that we replace the machines with
ones that are four times as fast. Now each
So Alice’s algorithm is much better placed for comparison takes 14 × 10
−3 seconds so we can

expansion. make 14400000 comparisons in the same time.


Solving
A benchmark only tells you about the situation 14400000 = 256nA lg nA = n2
B,
today, whereas a software developer should be yields
thinking about the situation both today and
nA = 4620 nB = 3794.
tomorrow!

Notice that Alice’s algorithm gains much more


from the faster machines than Bob’s.
39 40
Different instances of the same size

So far we have assumed that the algorithm takes the same amount of time on every instance of the same size. But this is almost never true, and so we must decide whether to do best case, worst case or average case analysis.

In best case analysis we consider the time taken by the algorithm to be the time it takes on the best input of size n.

In worst case analysis we consider the time taken by the algorithm to be the time it takes on the worst input of size n.

In average case analysis we consider the time taken by the algorithm to be the average of the times taken on inputs of size n.

Best case analysis has only a limited role, so normally the choice is between a worst case analysis or attempting to do an average case analysis.

Worst case analysis

Most often, algorithms are analysed by their worst case running times — the reasons for this are:

• This is the only “safe” analysis that provides a guaranteed upper bound on the time taken by the algorithm.

• Average case analysis requires making some assumptions about the probability distribution of the inputs.

• Average case analysis is much harder to do.


Big-O notation

Our analysis of insertion sort showed that it took about n² + 4n − 5 operations, but this is more precise than necessary. As previously discussed, the most important thing about the time taken by an algorithm is its rate of growth. The fact that it is n²/2 rather than 2n² or n²/10 is considered irrelevant. This motivates the traditional definition of Big-O notation.

Definition A function f(n) is said to be O(g(n)) if there are constants c and N such that

f(n) ≤ c g(n)    ∀ n ≥ N.

Thus by taking g(n) = n², c = 2 and N = 1 we conclude that the running time of Insertion Sort is O(n²), and moreover this is the best bound that we can find. (In other words Insertion Sort is not O(n) or O(n lg n).)

Big-Theta notation

Big-O notation defines an asymptotic upper bound for a function f(n). But sometimes we can define a lower bound as well, allowing a tighter constraint to be defined. In this case we use an alternative notation.

Definition A function f(n) is said to be Θ(g(n)) if there are constants c1, c2 and N such that

0 ≤ c1 g(n) ≤ f(n) ≤ c2 g(n)    ∀ n ≥ N.

If we say that f(n) = Θ(n²) then we are implying that f(n) is approximately proportional to n² for large values of n.

See CLRS (section 3) for a more detailed description of the O and Θ notation.

Why is big-O notation useful?

In one sense, big-O notation hides or loses a lot of useful information. For example, the functions

f(n) = n²/1000
g(n) = 100 n²
h(n) = 10^10 n²

are all O(n²) despite being quite different.

However in another sense, the notation contains the essential information, in that it completely describes the asymptotic rate of growth of the function. In particular it contains enough information to give answers to the questions:

• Which algorithm will ultimately be faster as the input size increases?
• If I buy a machine 10 times as fast, what size problems can I solve in the same time?

An asymptotically better sorting algorithm

procedure MERGE-SORT(A, p, r)
  if p < r
    then q ← ⌊(p + r)/2⌋
         MERGE-SORT(A, p, q)
         MERGE-SORT(A, q + 1, r)
         MERGE(A, p, q, r)

procedure MERGE(A, p, q, r)
  n1 ← q − p + 1; n2 ← r − q
  allocate arrays L[1 . . . n1 + 1] and R[1 . . . n2 + 1]
  for i ← 1 to n1
    do L[i] ← A[p + i − 1]
  for j ← 1 to n2
    do R[j] ← A[q + j]
  L[n1 + 1] ← ∞; R[n2 + 1] ← ∞
  i ← 1; j ← 1
  for k ← p to r
    do if L[i] ≤ R[j]
         then A[k] ← L[i]
              i ← i + 1
         else A[k] ← R[j]
              j ← j + 1


Merge-sort complexity

The complexity of Merge Sort can be shown to be Θ(n lg n). Merge Sort’s complexity can be described by the recurrence relation:

F(n) = 2F(n/2) + n, where F(1) = 1.

The Master Theorem

As this variety of recurrence relation appears frequently in divide and conquer algorithms it is useful to have a method to find the asymptotic complexity of these functions.

The Master Theorem: Let f(n) be a function described by the recurrence

f(n) = a f(n/b) + c n^d

where a, b ≥ 1, d ≥ 0 and c > 0 are constants. Then

f(n) is O(n^d)          if a < b^d
f(n) is O(n^d lg n)     if a = b^d
f(n) is O(n^(log_b a))  if a > b^d

See CLRS, 4.3.
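As a quick check (this worked instance is not in the original notes), merge sort's recurrence F(n) = 2F(n/2) + n has a = 2, b = 2, c = 1 and d = 1, so

b^d = 2^1 = 2 = a,

and the middle case applies, giving F(n) = O(n^d lg n) = O(n lg n), consistent with the Θ(n lg n) claim above.
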
Average case analysis

The major problem with average case analysis is that we must make an assumption about the probability distribution of the inputs. For a problem like Sorting there is at least a theoretically reasonable choice — assume that every permutation of length n has an equal chance of occurring (already we are assuming that the list has no duplicates).

For example, we can consider each of the 24 permutations when sorting four inputs with insertion sort:

Comparisons   Inputs
3             1234, 2134
4             1243, 1324, 2143, 2314, 3124, 3214
5             1342, 1423, 2341, 2413, 3142, 3241, 4123, 4213
6             1432, 2431, 3412, 3421, 4132, 4231, 4312, 4321

So the weighted average of comparisons is

((3 × 2) + (4 × 6) + (5 × 8) + (6 × 8)) / 24 = 4.916

(recall that the best case for four inputs is 3, whereas the worst case is 6).

Inversions

Definition An inversion in a permutation σ is an ordered pair (i, j) such that i < j and σi > σj.

For example, the permutation σ = 1342 has two inversions, while σ = 2431 has four.

It is straightforward to see that the number of comparisons that a permutation requires to be sorted is equal to the number of inversions in it (check this!) plus a constant, c. (For sorting four inputs, c = 3.)

So the average number of comparisons required is equal to the average number of inversions in all the permutations of length n.

Theorem The average number of inversions among all the permutations of length n is n(n − 1)/4.

Thus Insertion Sort takes O(n²) time on average.
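A small Java sketch (not from the notes) that counts inversions by checking every pair; it can be used to verify the small examples above:

/* Count inversions in a permutation by examining every pair (O(n^2) time). */
static int inversions(int[] sigma) {
    int count = 0;
    for (int i = 0; i < sigma.length; i++)
        for (int j = i + 1; j < sigma.length; j++)
            if (sigma[i] > sigma[j])
                count++;                    // the pair (i, j) is out of order
    return count;
}

For instance, inversions(new int[]{1, 3, 4, 2}) returns 2 and inversions(new int[]{2, 4, 3, 1}) returns 4, matching σ = 1342 and σ = 2431 above.
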

An asymptotically worse algorithm

Quicksort is Θ(n²), but its average complexity is better than Merge-sort! (CLRS Chapter 7)

procedure QUICKSORT(A, p, r)
  if p < r
    then q ← PARTITION(A, p, r)
         QUICKSORT(A, p, q − 1)
         QUICKSORT(A, q + 1, r)

procedure PARTITION(A, p, r)
  x ← A[r]
  i ← p − 1
  for j ← p to r − 1
    do if A[j] ≤ x
         then i ← i + 1
              exchange A[i] ↔ A[j]
  exchange A[i + 1] ↔ A[r]
  return i + 1

Input size

The complexity of an algorithm is a measure of how long it takes as a function of the size of the input. For Sorting we took the number of items, n, as a measure of the size of the input.

This is only true provided that the actual size of the items does not grow as their number increases. As long as they are all some constant size K, then the input size is Kn. The actual value of the constant does not matter, as we are only expressing the complexity in big-O notation, which suppresses all constants.

But what is an appropriate input parameter for Travelling Salesman? If the instance has n cities, then the input itself has size Kn² — this is because we need to specify the distance between each pair of cities.

Therefore you must be careful about what parameter most accurately reflects the size of the input.

Travelling Salesman

Naive solution: Consider every permutation of the n cities, and compute the length of the resulting tour, saving the shortest length.

How long will this take? We count two main operations:

• Generating the n! permutations.
• Evaluating each tour at cost of O(n).

If we assume that by clever programming, we can compute each permutation in constant time, then the total time is O(n · n!).

Is this a good algorithm?

Good Algorithms

Theoretical computer scientists use a very broad brush to distinguish between good and bad algorithms.

An algorithm is good if it runs in time that is a polynomial function of the size of the input, otherwise it is bad.

Good:   O(1)        constant time
        O(n)        linear
        O(n lg n)   loglinear
        O(n²)       quadratic
        ...
        O(n^k)      “polynomial”
        ...
Bad:    2^n         exponential
        c^n         exponential
        n!          factorial

A problem for which no polynomial time algorithm can be found is called intractable. As far as we know, Travelling Salesman is an intractable problem, though no-one has proved this.


Summary

1. An algorithm is a well defined set of rules for solving a computational problem.

2. A well designed algorithm should be efficient for problems of all sizes.

3. Algorithms are generally evaluated with respect to correctness, stability, and efficiency (for space and speed).

4. Theoretical correctness can be established using mathematical proof.

5. Numerical stability is required for algorithms to give accurate answers.

Summary (cont.)

6. Different kinds of algorithms have been defined, including brute-force algorithms, divide and conquer algorithms, greedy algorithms, dynamic algorithms, and tree traversals.

7. The efficiency of an algorithm is a measure of complexity that indicates how long an algorithm will take.

8. Big “O” is a measure of complexity that is the asymptotic worst case upper bound.

9. Θ (big theta) is a measure of complexity that is the asymptotic worst case tight bound.

10. Average case analysis attempts to measure how fast an algorithm is for an average (typical) input.

Summary (cont.)

11. Insertion sort is a sorting algorithm that runs in time O(n²).

12. Merge sort is a sorting algorithm that runs in time O(n lg n).

13. Quicksort is a sorting algorithm that runs in time O(n²) but is faster than Merge sort in the average case.

14. Polynomial algorithms (e.g. O(n), O(n lg n), O(n^k)) are regarded as feasible.

15. Exponential algorithms (e.g. O(2^n), O(n!)) are regarded as infeasible.
CITS3210 Algorithms
Graph Algorithms

Computer Science and Software Engineering, 2011
Notes by CSSE, Comics by xkcd.com

Overview

1. Introduction
   - Terminology and definitions
   - Graph representations

2. Tree Search
   - Breadth first search
   - Depth first search
   - Topological sort

3. Minimum Spanning Trees
   - Kruskal’s algorithm
   - Prim’s algorithm
   - Implementations
   - Priority first search

4. Shortest Path Algorithms
   - Dijkstra’s algorithm
   - Bellman-Ford algorithm
   - Dynamic Programming

What is a graph?

Definition A graph G consists of a set V(G) called vertices together with a collection E(G) of pairs of vertices. Each pair {x, y} ∈ E(G) is called an edge of G.

Example If

V(G) = {A, B, C, D}

and

E(G) = {{A, B}, {C, D}, {A, D}, {B, C}, {A, C}}

then G is a graph with 4 vertices and 5 edges.

[Figure: a drawing of this graph on the vertices A, B, C, D.]

Isomorphisms

Consider the following two graphs:

[Figure: two drawings of the same graph, one on the vertices A, B, C, D and one on the vertices 1, 2, 3, 4.]

Apart from the “names” of the vertices and the geometric positions it is clear that these two graphs are basically the same — in this situation we say that they are isomorphic.

Definition Two graphs G1 and G2 are isomorphic if there is a one-one mapping φ : V(G1) → V(G2) such that {φ(x), φ(y)} ∈ E(G2) if and only if {x, y} ∈ E(G1).

In this case the isomorphism is given by the mapping

φ(A) = 2    φ(B) = 3    φ(C) = 4    φ(D) = 1
What are graphs used for?

Graphs are used to model a wide range of commonly occurring situations, enabling questions about a particular problem to be reduced to certain well-studied “standard” graph theory questions.

For examples consider the three graphs G1, G2 and G3 defined as follows:

V(G1) = all the telephone exchanges in Australia, and {x, y} ∈ E(G1) if exchanges x and y are physically connected by fibre-optic cable.

V(G2) = all the airstrips in the world, and {x, y} ∈ E(G2) if there is a direct passenger flight from x to y.

V(G3) = all the people who have ever published a paper in a refereed journal in the world, and {x, y} ∈ E(G3) if x and y have been joint authors on a paper.

More examples

In computing: A graph can be used to represent processors that are connected via a communication link in a parallel computer system.

In chemistry: The vertices of a graph can be used to represent the carbon atoms in a molecule, and an edge between two vertices represents the bond between the corresponding atoms.

[Figure: a molecule drawn as a graph.]


In games: The vertices can be the 64 squares on a chessboard, and the edge that joins two squares can be used to denote the valid movement of a knight from one square to the other.

[Figure: a chessboard with the knight’s-move graph drawn on it.]

An example graph

Consider the following graph G4.

[Figure: the graph G4 on the vertices 1–7, with edges {1,2}, {2,3}, {2,5}, {3,5}, {3,6}, {4,5}, {5,6}, {5,7} and {6,7}.]

The graph G4 has 7 vertices and 9 edges.
Basic properties of graphs

Let us consider some of the basic terminology of graphs:

Adjacency If {x, y} ∈ E(G), we say that x and y are adjacent to each other, and sometimes write x ∼ y. The number of vertices adjacent to v is called the degree or valency of v. The sum of the degrees of the vertices of a graph is even.

Paths A path of length n in a graph is a sequence of vertices v1 ∼ v2 ∼ · · · ∼ vn+1 such that (vi, vi+1) ∈ E(G) and vertices {v1, v2, . . . , vn+1} are distinct.

Cycles A cycle of length n is a sequence of vertices v1 ∼ v2 ∼ · · · ∼ vn ∼ vn+1 such that v1 = vn+1, (vi, vi+1) ∈ E(G) and therefore only vertices {v1, v2, . . . , vn} are distinct.

Distance The distance between two vertices x and y in a graph is the length of the shortest path between them.

Subgraphs

If G is a graph, then a subgraph H is a graph such that

V(H) ⊆ V(G)

and

E(H) ⊆ E(G)

A spanning subgraph H has the property that V(H) = V(G) — in other words H has been obtained from G only by removing edges.

An induced subgraph H must contain every edge of G whose endpoints lie in V(H) — in other words H has been obtained from G by removing vertices and their adjoining edges.

Counting Exercises

In the graph G4:


Connectivity, forests and trees

• How many paths are there from 1 to


Connected A graph G is connected if there is
7?
a path between any two vertices. If the graph
• How many cycles are there?
is not connected then its connected
• How many spanning subgraphs are
components are the maximal induced
there?
subgraphs that are connected.
• How many induced subgraphs are
there?
Forests A forest is a graph that has no cycles.

Trees A tree is a forest with only one


connected component. It is easy to see that a
1 4 tree with n vertices must have exactly n − 1
edges.

The vertices of degree 1 in a tree are called


2 5
!
! "
"
the leaves of the tree.
! "
! "
! "
! "
! "
! "
3 6 7

11 12
Directed and weighted graphs

There are two important extensions to the basic definition of a graph.

Directed graphs In a directed graph, an edge is an ordered pair of vertices, and hence has a direction. In directed graphs, edges are often called arcs.

Directed Tree Each vertex has at most one directed edge leading into it, and there is one vertex (the root) which has a path to every other vertex.

Weighted graphs In a weighted graph, each of the edges is assigned a weight (usually a non-negative integer). More formally we say that a weighted graph is a graph G together with a weight function w : E(G) → R (then w(e) represents the weight of the edge e).

Distance in weighted graphs

When talking about weighted graphs, we need to extend the concept of distance.

Definition In a weighted graph X a path

x = x0 ∼ x1 ∼ · · · ∼ xn = y

has weight

w(x0, x1) + w(x1, x2) + · · · + w(xn−1, xn).

The shortest path between two vertices x and y is the path of minimum weight.


Representation of graphs

There are two main ways to represent a graph


— adjacency lists or an adjacency matrix.

Adjacency lists The graph G is represented


For comparison...
by an array of |V (G)| linked lists, with each list
containing the neighbours of a vertex.
...the graph G4.
Therefore we would represent G4 as follows:

!
1 2 !
! ! !
2 1 3 5 !
3 !
2 !
5 !
6 ! 1 4
!
4 5 !
! ! ! ! !
5 2 3 4 6 7 !
! ! !
6 3 5 7 ! 2 5
" #
! ! " #
7 5 6 ! " #
" #
" #
" #
" #
This representation requires two list elements " #
3 6 7
for each edge and therefore the space required
is Θ(|V (G)| + |E(G)|).

Note: In general to avoid writing |V (G)| and


|E(G)| we shall simply put V = |V (G)| and
E = |E(G)|.
15 16
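A minimal Java sketch of the adjacency-list representation, assuming vertices are numbered 1 to V as in the notes (the class and method names are illustrative, not from the notes):

import java.util.ArrayList;
import java.util.List;

/* Adjacency-list representation of an undirected graph with vertices 1..V. */
class Graph {
    final List<List<Integer>> adj;            // adj.get(v) holds the neighbours of v

    Graph(int vertexCount) {
        adj = new ArrayList<>();
        for (int v = 0; v <= vertexCount; v++)    // index 0 unused so vertices match 1..V
            adj.add(new ArrayList<>());
    }

    void addEdge(int u, int v) {              // each undirected edge is stored twice
        adj.get(u).add(v);
        adj.get(v).add(u);
    }
}

Building G4 then takes nine addEdge calls, one per edge in the list given earlier.
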
Adjacency matrix

The adjacency matrix of a graph G is a V × V matrix A where the rows and columns are indexed by the vertices and such that Aij = 1 if and only if vertex i is adjacent to vertex j.

For graph G4 we have the following adjacency matrix:

        0 1 0 0 0 0 0
        1 0 1 0 1 0 0
        0 1 0 0 1 1 0
A =     0 0 0 0 1 0 0
        0 1 1 1 0 1 1
        0 0 1 0 1 0 1
        0 0 0 0 1 1 0

The adjacency matrix representation uses Θ(V²) space.

For a sparse graph E is much less than V², and hence we would normally prefer the adjacency list representation.

For a dense graph E is close to V² and the adjacency matrix representation is preferred.

More on the two representations

For small graphs or those without weighted edges it is often better to use the adjacency matrix representation anyway.

It is also easy and more intuitive to define adjacency matrix representations for directed and weighted graphs.

However your final choice of representation depends on precisely what questions you will be asking. Consider how you would answer the following questions in both representations (in particular, how much time it would take).

Is vertex v adjacent to vertex w in an undirected graph?

What is the out-degree of a vertex v in a directed graph?

What is the in-degree of a vertex v in a directed graph?

Recursive Representation Breadth-first search

A third representation to consider is a recursive Searching through a graph is one of the most
representation. In this representation you may fundamental of all algorithmic tasks, and
not have access to a list of all vertices in the therefore we shall examine several techniques
graph. Instead you have access to a single for doing so.
vertex, and from that vertex you can deduce
the adjacent vertices. Breadth-first search is a simple but extremely
important technique for searching a graph.
The following java class is an example of such This search technique starts from a given
a representation: vertex v and constructs a spanning tree for G,
called the breadth-first tree. It uses a (first-in,
abstract class Vertex{ first-out) queue as its main data structure.

int data; Following CLRS (section 22.2), as the search


progresses, we will divide the vertices of the
Vertex[] getAdjacentVertices(){} graph into three categories, black vertices
which are the vertices that have been fully
} examined and incorporated into the tree, grey
vertices which are the vertices that have been
This type of data structure is likely to arise if seen (because they are adjacent to a tree
you consider, for example, graphs of all states vertex) and placed on the queue, and white
in a chess game, or communication networks. vertices, which have not yet been examined.

19 20
Queues

Recall that a queue is a first-in-first-out buffer. Items are pushed (or enqueued) onto the end of the queue, and items can be popped (or dequeued) from the front of the queue.

A Queue is commonly implemented using either a block representation, or a linked representation.

We will assume that the push and pop operations can be performed in constant time. You may also assume that we can examine the first element of the queue, and decide if the queue is empty, all in constant time (i.e. Θ(1)).

Breadth-first search initialization

The final breadth-first tree will be stored as an array called π where π[x] is the immediate parent of x in the spanning tree. Of course, as v is the root of this tree, π[v] will remain undefined (or nil in CLRS).

To initialize the search we mark the colour of every vertex as white and the queue is empty. Then the first step is to mark the colour of v to be grey, and put π[v] to be undefined.


Breadth-first search repetitive step Queues revisited

Then the following procedure is repeated until


Recall that a queue is a data structure whereby
the queue, Q, is empty. the element taken off the data structure is the
element that has been on the queue for the
procedure BFS(v) longest time.
Push v on to the tail of Q
while Q is not empty
If the maximum length of the queue is known
Pop vertex w from the head of Q in advance (and is not too great) then a queue
for each vertex x adjacent to w do
can be very efficiently implemented by simply
if colour[x] is white then using an array.
π[x] ← w
colour[x] ← grey
An array of n elements is initialized, and two
Push x on to the tail of Q
pointers called head and tail are maintained —
end if
the head gives the location of the next element
end for
to be removed, while the tail gives the location
colour[w] ← black
of the first empty space in the array.
end while

It is trivial to see that both enqueueing and


At the end of the search, every vertex in the
dequeueing operations take Θ(1) time.
graph will have colour black and the parent or
predecessor array π will contain the details of
See CLRS (section 10.1) for further details.
the breadth-first search tree.

23 24
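Returning to the search itself, the BFS pseudocode above translates fairly directly into Java. This is a sketch assuming the adjacency-list Graph class sketched earlier (names are illustrative, not from the notes):

import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

/* Breadth-first search from source v on the Graph class sketched earlier.
   Returns the predecessor array pi; pi[v] stays 0 (undefined) for the root. */
static int[] bfs(Graph g, int v) {
    int n = g.adj.size() - 1;                 // vertices are numbered 1..n
    char[] colour = new char[n + 1];          // 'w' = white, 'g' = grey, 'b' = black
    Arrays.fill(colour, 'w');
    int[] pi = new int[n + 1];
    Deque<Integer> queue = new ArrayDeque<>();

    colour[v] = 'g';
    queue.addLast(v);                         // push v onto the tail of Q
    while (!queue.isEmpty()) {
        int w = queue.removeFirst();          // pop from the head of Q
        for (int x : g.adj.get(w)) {
            if (colour[x] == 'w') {
                pi[x] = w;
                colour[x] = 'g';
                queue.addLast(x);
            }
        }
        colour[w] = 'b';
    }
    return pi;
}
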
Example of breadth-first search

The following stages trace breadth-first search on the graph G4 from source vertex 1. Each stage shows the queue (with its head marked) and the colour and π arrays. (The accompanying figures, omitted here, show G4 with the vertices discovered so far highlighted.)

Initial state

Queue: 1   (head at 1)

x        1      2      3      4      5      6      7
colour   grey   white  white  white  white  white  white
π[x]     undef  -      -      -      -      -      -

After visiting vertex 1

Queue: 1 2   (head at 2)

x        1      2      3      4      5      6      7
colour   black  grey   white  white  white  white  white
π[x]     undef  1      -      -      -      -      -

After visiting vertex 2

Queue: 1 2 3 5   (head at 3)

x        1      2      3      4      5      6      7
colour   black  black  grey   white  grey   white  white
π[x]     undef  1      2      -      2      -      -

After visiting vertex 3

Queue: 1 2 3 5 6   (head at 5)

x        1      2      3      4      5      6      7
colour   black  black  black  white  grey   grey   white
π[x]     undef  1      2      -      2      3      -

After visiting vertex 5

Queue: 1 2 3 5 6 4 7   (head at 6)

x        1      2      3      4      5      6      7
colour   black  black  black  grey   black  grey   grey
π[x]     undef  1      2      5      2      3      5

After visiting vertex 6

Queue: 1 2 3 5 6 4 7   (head at 4)

x        1      2      3      4      5      6      7
colour   black  black  black  grey   black  black  grey
π[x]     undef  1      2      5      2      3      5

After visiting vertex 4

Queue: 1 2 3 5 6 4 7   (head at 7)

x        1      2      3      4      5      6      7
colour   black  black  black  black  black  black  grey
π[x]     undef  1      2      5      2      3      5

After visiting vertex 7

Queue: 1 2 3 5 6 4 7   (queue now empty)

x        1      2      3      4      5      6      7
colour   black  black  black  black  black  black  black
π[x]     undef  1      2      5      2      3      5
At termination

At the termination of breadth-first search every vertex in the same connected component as v is a black vertex and the array π contains details of a spanning tree for that component — the breadth-first tree.

The vertices of G are examined in order of increasing distance from v — first v, then its neighbours, then the vertices at distance 2 from v and so on. The spanning tree constructed provides a shortest path from any vertex back to v just by following the array π.

Therefore it is simple to modify the breadth-first search to provide an array of distances dist where dist[u] is the distance of the vertex u from the source vertex v.

Uses of BFS

Breadth-first search is particularly useful for certain simple tasks such as determining whether a graph is connected, or finding the distance between two vertices.

Time analysis

During the breadth-first search each vertex is enqueued once and dequeued once. As each enqueueing/dequeueing operation takes constant time, the queue manipulation takes Θ(V) time. At the time the vertex is dequeued, the adjacency list of that vertex is completely examined. Therefore we take Θ(E) time examining all the adjacency lists and the total time is Θ(V + E).


Breadth-first search finding distances


Depth-first search

To initialize the search we mark the colour of


Depth-first search is another important
every vertex as white and the queue is empty.
technique for searching a graph. Similarly to
Then the first step is to mark the colour of v
breadth-first search it also computes a
to be grey, set π[v] to be undefined, set dist[v]
spanning tree for the graph, but the tree is
to be 0, and add v to the queue, Q. Then we
very different.
repeat the following procedure.

The structure of depth-first search is naturally


while Q is not empty
recursive so we will give a recursive description
Pop vertex w from the head of Q
of it. Nevertheless it is useful and important to
for each vertex x adjacent to w do
consider the non-recursive implementation of
if colour[x] is white then
the search.
dist[x] ← dist[w]+1
π[x] ← w
The fundamental idea behind depth-first search
colour[x] ← grey
is to visit the next unvisited vertex, thus
Push x on to the tail of Q
extending the current path as far as possible.
end if
When the search gets stuck in a “corner” we
end for
back up along the path until a new avenue
colour[w] ← black
presents itself (this is called backtracking).
end while

35 36
Basic recursive depth-first search

The following recursive program computes the depth-first search tree for a graph G starting from the source vertex v.

To initialize the search we mark the colour of every vertex as white. Then we call the recursive routine DFS(v) where v is the source vertex.

procedure DFS(w)
  colour[w] ← grey
  for each vertex x adjacent to w do
    if colour[x] is white then
      π[x] ← w
      DFS(x)
    end if
  end for
  colour[w] ← black

At the end of this depth-first search procedure we have produced a spanning tree containing every vertex in the connected component containing v.

A Non-recursive DFS

All recursive algorithms can be implemented as non-recursive algorithms. A non-recursive DFS requires a stack to record the previously visited vertices.

procedure DFS(w)
  initialize stack S
  push w onto S
  while S not empty do
    x ← pop off S
    if colour[x] = white then
      colour[x] ← black
      for each vertex y adjacent to x do
        if colour[y] is white then
          push y onto S
          π[y] ← x
        end if
      end for
    end if
  end while
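For concreteness, a Java sketch of the recursive version, again assuming the adjacency-list Graph class sketched earlier (illustrative names, not from the notes):

/* Recursive depth-first search on the Graph class sketched earlier.
   colour: 'w' = white, 'g' = grey, 'b' = black; pi[x] records the DFS tree parent.
   The caller fills colour with 'w' and then calls dfs(g, v, colour, pi). */
static void dfs(Graph g, int w, char[] colour, int[] pi) {
    colour[w] = 'g';
    for (int x : g.adj.get(w)) {
        if (colour[x] == 'w') {
            pi[x] = w;
            dfs(g, x, colour, pi);
        }
    }
    colour[w] = 'b';
}
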

Example of depth-first search Immediately prior to calling DFS(2)

'(
#$
!"
1 4 1
%&
4

2 5 2 5
! " ! "
! " ! "
! " ! "
! " ! "
! " ! "
! " ! "
! " ! "
! " ! "
3 6 7 3 6 7

x colour[x] π[x] x colour[x] π[x]


1 white undef 1 grey undef
2 white 2 white 1
3 white 3 white
4 white 4 white
5 white 5 white
6 white 6 white
7 white 7 white
39 40
Immediately prior to calling DFS(3) Immediately prior to calling DFS(5)

'(
#$ '(
#$
!"
1
%&
4 !"
1
%&
4

'(
#$ '(
#$
!"
2
%&
5 !"
2
%&
5
! " !!
!
! "
! " !
!
! "
! " !
!! "
! " !!
! "
! " !
!
! "
! " !
!! "
'(
! " !!
! "
! " #$
!
!
! "
3 6 7 !"
3
%&
6 7

x colour[x] π[x] x colour[x] π[x]


1 grey undef 1 grey undef
2 grey 1 2 grey 1
3 white 2 3 grey 2
4 white 4 white
5 white 5 white 3
6 white 6 white
7 white 7 white
41 42

Immediately prior to calling DFS(4) Immediately prior to calling DFS(6)

Now the call to DFS(4) actually finishes


without making any more recursive calls so we
'(
return to examining the neighbours of vertex 5,
#$
!"
1
%&
4 the next of which is vertex 6.

'(
#$ '(
#$
!"
2
%& !"
5 '(
!!%&
!
! "
#$
!
!
! " !"
1
%&
4
!
!! "
!!
! "
!
!
! "
!
!! "
'(
!!
! "
#$
!
!
! "

!"
3
%&
6 7 '(
#$ '(
#$
!"
2
%& !" 5
!!%&
!
! "
!
!
! "
!
!! "
!!
! "
!
!
! "
!
!! "
'(
!!
! "
#$
!
!
! "

!"
3
%&
6 7

x colour[x] π[x] x colour[x] π[x]


1 grey undef 1 grey undef
2 grey 1 2 grey 1
3 grey 2 3 grey 2
4 white 5 4 black 5
5 grey 3 5 grey 3
6 white 6 white 5
7 white 7 white
43 44
Immediately prior to calling DFS(7)
The depth-first search tree

After completion of the search we can draw


'(
#$ the depth-first search tree for this graph:
!"
1
%&
4

1
'(
#$ '(
#$
!"
2
%&
5 !"
!%& *
!
!
!
!!
! !
!
2
!
!! !
!!
! !
!
!
! ! )
!
!! ! 3
'( '(
!!
! !
#$
!
!
! #$ !

!"
3
%& !"
6
%&
7 + )
5
! "
! "
! "
4 6
"
"
"
7

x colour[x] π[x] In this picture the slightly thicker straight


1 grey undef edges are the tree edges (see later) and the
2 grey 1
remaining edges are the back edges — the
3 grey 2
4 black 5 back edges arise when we examine an edge
5 grey 3 (u, v) and discover that its endpoint v no
6 grey 5 longer has the colour white
7 white 6
45 46

Discovery and finish times

The operation of depth-first search actually


Analysis of DFS
gives us more information than simply the
depth-first search tree; we can assign two
The running time of DFS is easy to analyse as
times to each vertex.
follows.

Consider the following modification of the


First we observe that the routine DFS(w) is
search, where time is a global variable that
called exactly once for each vertex w; during
starts at time 1.
the execution of this routine we perform only
constant time array accesses, and run through
procedure DFS(w)
the adjacency list of w once.
colour[w] ← grey
discovery[w] ← time
Running through the adjacency list of each
time ← time+1
vertex exactly once takes Θ(E) time overall,
for each vertex x adjacent to w do
and hence the total time taken is Θ(V + E).
if colour[x] is white then
π[x] ← w
In fact, we can say more and observe that
DFS(x)
because every vertex and every edge are
end if
examined precisely once in both BFS and DFS,
end for
the time taken is Θ(V + E).
colour[w] ← black
finish[w] ← time
time ← time+1

47 48
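In Java, this modification can be sketched by extending the earlier DFS example (again illustrative, not from the notes); the time counter is passed as a one-element array so that the increment is shared across the recursive calls:

/* Depth-first search recording discovery and finish times. */
static void dfsTimes(Graph g, int w, char[] colour, int[] pi,
                     int[] discovery, int[] finish, int[] time) {
    colour[w] = 'g';
    discovery[w] = time[0]++;
    for (int x : g.adj.get(w)) {
        if (colour[x] == 'w') {
            pi[x] = w;
            dfsTimes(g, x, colour, pi, discovery, finish, time);
        }
    }
    colour[w] = 'b';
    finish[w] = time[0]++;
}
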
The parenthesis property

This modification assigns to each vertex a discovery time, which is the time at which it is first discovered, and a finish time, which is the time at which all its neighbours have been searched and it no longer plays any further role in the search.

The discovery and finish times satisfy a property called the parenthesis property.

Imagine writing down an expression consisting entirely of labelled parentheses — at the time of discovering vertex u we open a parenthesis (u and at the time of finishing with u we close the parenthesis u).

Then the resulting expression is a well-formed expression with correctly nested parentheses.

For our example depth-first search we get:

(1 (2 (3 (5 (4 4) (6 (7 7) 6) 5) 3) 2) 1)

Depth-first search for directed graphs

A depth-first search on an undirected graph produces a classification of the edges of the graph into tree edges, or back edges. For a directed graph, there are further possibilities. The same depth-first search algorithm can be used to classify the edges into four types:

tree edges If the procedure DFS(u) calls DFS(v) then (u, v) is a tree edge.

back edges If the procedure DFS(u) explores the edge (u, v) but finds that v is an already visited ancestor of u, then (u, v) is a back edge.

forward edges If the procedure DFS(u) explores the edge (u, v) but finds that v is an already visited descendant of u, then (u, v) is a forward edge.

cross edges All other edges are cross-edges.


Topological sort Example of a dag to be topologically sorted

We shall consider a classic simple application For example, consider this dag describing the
of depth-first search. stages of getting dressed and the dependency
between items of clothing (from CLRS, page
Definition A directed acyclic graph (dag) is a 550).
directed graph with no directed cycles.
underpants
socks
Theorem In a depth-first search of a dag there
are no back edges.
trousers
shoes
Consider now some complicated process in
which various jobs must be completed before
watch
others are started. We can model this by a shirt
belt
graph D where the vertices are the jobs to be
completed and there is an edge from job u to
job v if job u must be completed before job v is tie
started. Our aim is to find some linear ordering
of the jobs such that they can be completed jacket
without violating any of the constraints.
What is the appropriate linear order in which to
This is called finding a topological sort of the do these jobs so that all the precedences are
dag D. satisfied.

51 52
Doing the topological sort

[Figure: the example dag, drawn on the sixteen vertices A to P arranged in a 4 × 4 grid with directed edges between them.]

Algorithm for TOPOLOGICAL SORT

The algorithm for topological sort is an extremely simple application of depth-first search.

Algorithm

Apply the depth-first search procedure to find the finishing times of each vertex. As each vertex is finished, put it onto the front of a linked list.

At the end of the depth-first search the linked list will contain the vertices in topologically sorted order.
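A Java sketch of this method, assuming a directed graph stored with the adjacency-list Graph class used earlier (with addEdge adding only the arc u → v); the names are illustrative, not from the notes:

import java.util.ArrayDeque;
import java.util.Deque;

/* Topological sort by DFS finishing times. Vertices are added to the front
   of a list (here a deque) as they finish, so the front-to-back order of the
   result is a topological order of the dag. */
static Deque<Integer> topologicalSort(Graph g) {
    int n = g.adj.size() - 1;
    boolean[] visited = new boolean[n + 1];
    Deque<Integer> order = new ArrayDeque<>();
    for (int v = 1; v <= n; v++)                   // restart until every vertex is examined
        if (!visited[v])
            dfsFinish(g, v, visited, order);
    return order;
}

static void dfsFinish(Graph g, int w, boolean[] visited, Deque<Integer> order) {
    visited[w] = true;
    for (int x : g.adj.get(w))
        if (!visited[x])
            dfsFinish(g, x, visited, order);
    order.addFirst(w);                             // w is finished: put it on the front
}
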

After the entire search

After the first depth-first search


4/7 5/6 18/21 19/20
! " !
D " H ! L " P
" "
4/7 5/6 18/21 19/20 " "
" "
! " !
D " H ! L " P "!
"
"
"
" "!
" " " "
" " " "
" " 3/26 "
8/25 "
17/22 29/32
"! " " "! ! "
"
"
"
" C " G K ! O
3/26 " 8/25 " 17/22
" "
! " !" !"
C " G K ! O "! "!

!" !"
2/27 9/24 10/23 30/31
"! "! ! ! "
B " F " J ! N
" !!
!
!
2/27 9/24 10/23 " !
!
!
" !!
!
! ! " !"
B " F " J ! N "! "
"
!!
!
!
!
!
" !!
!
! " !
!
!
" !
!
! " !!
" !! 1/28 11/12 ! 13/16 14/15
! !" " !
!
!
" !
!
! " !
"!
"
"
!
!
!!
! A E ! I " M
!
" !!
!
1/28 " 11/12 !
!
!
13/16 14/15
" !
A E ! I " M
As the vertices were placed at the front of a
linked list as they became finished the final
Notice that there is a component that has not
topological sort is: O − N − A − B − C − G − F −
been reached by the depth-first search. To
J −K −L−P −I −M −E−D−H
complete the search we just repeatedly perform
depth-first searches until all vertices have been
A topologically sorted dag has the property
examined.
that any edges drawn in the above diagram will
got from left-to-right.

55 56
Analysis and correctness

Time analysis of the algorithm is very easy — to the Θ(V + E) time for the depth-first
search we must add Θ(V ) time for the manipulation of the linked list. Therefore the
total time taken is again Θ(V + E).

Proof of topological sort

Suppose DFS has calculated the finish times of a dag G = (V, E). For any pair of
adjacent vertices u, v ∈ V (implying (u, v) ∈ E) we just need to show f [v] < f [u]
(the destination vertex v must finish first).

For each edge (u, v) explored by DFS of G consider the colour of vertex v.

GREY: v can never be grey since v would therefore be an ancestor of u and so the graph
would be cyclic.

Proof (contd)

WHITE: v is a descendant of u so we will set its finish time now, but we are still
exploring u so we will set u's finish time at some point in the future (and so
therefore f [v] < f [u]). (Refer back to the pseudocode.)

BLACK: v has already been visited and so its finish time must have been set earlier,
whereas we are still exploring u and so we will set its finish time in the future (and
so again f [v] < f [u]).

Since for every edge in G there are two possible destination vertex colours and in
each case we can show f [v] < f [u], we have shown that this property applies to every
connected vertex in G.

See CLRS (theorem 22.11) for a more thorough treatment.

Other uses for DFS

DFS is the standard algorithmic method for solving the following two problems:

Strongly connected components In a directed graph D a strongly connected component is
a maximal subset S of the vertices such that for any two vertices u, v ∈ S there is a
directed path from u to v and from v to u.

Depth-first search can be used on a digraph to find strongly connected components in
time Θ(V + E).

Articulation points In a connected, undirected graph, an articulation point is a
vertex whose removal disconnects the graph.

Depth-first search can be used on a graph to find all the articulation points in time
Θ(V + E).

Minimum spanning tree (MST)

Consider a group of villages in a remote area that are to be connected by telephone
lines. There is a certain cost associated with laying the lines between any pair of
villages, depending on their distance apart and the terrain, and some pairs just
cannot be connected.

Our task is to find the minimum possible cost in laying lines that will connect all
the villages.

This situation can be modelled by a weighted graph W , in which the weight on each
edge is the cost of laying that line. A minimum spanning tree in a graph is a subgraph
that is (1) a spanning subgraph, (2) a tree, and (3) has weight no larger than any
other spanning tree.

It is clear that finding an MST for W is the solution to this problem.
The greedy method

Definition A greedy algorithm is an algorithm in which at each stage a locally optimal
choice is made.

A greedy algorithm is therefore one in which no overall strategy is followed, but you
simply do whatever looks best at the moment.

For example a mountain climber using the greedy strategy to climb Everest would at
every step climb in the steepest direction. From this analogy we get the computational
search technique known as hill-climbing.

In general greedy methods have limited use, but fortunately, the problem of finding a
minimum spanning tree can be solved by a greedy method.

Kruskal's method

Kruskal invented the following very simple method for building a minimum spanning
tree. It is based on building a forest of lowest possible weight and continuing to add
edges until it becomes a spanning tree.

Kruskal's method

Initialize F to be the forest with all the vertices of G but none of the edges.

repeat
  Pick an edge e of minimum possible weight among the edges not yet considered
  if F ∪ {e} is a forest then
    F ← F ∪ {e}
  end if
until F contains n − 1 edges

Therefore we just keep on picking the smallest possible edge, and adding it to the
forest, providing that we never create a cycle along the way.

Example

(Diagram: a weighted graph with edge weights between 1 and 8, used to trace Kruskal's
algorithm.)

After using edges of weight 1

(Diagram: the same graph with all of the weight-1 edges added to the forest.)

After using edges of weight 2

(Diagram: the forest after adding those weight-2 edges that do not create a cycle.)

The final MST

(Diagram: the completed minimum spanning tree.)

Prim's algorithm

Prim's algorithm is another greedy algorithm for finding a minimum spanning tree.

The idea behind Prim's algorithm is to grow a minimum spanning tree edge-by-edge by
always adding the shortest edge that touches a vertex in the current tree.

Notice the difference between the algorithms:

Kruskal's algorithm always maintains a spanning subgraph which only becomes a tree at
the final stage.

On the other hand, Prim's algorithm always maintains a tree which only becomes
spanning at the final stage.

Prim's algorithm in action

(Diagram: the same example weighted graph, used to trace Prim's algorithm.)
One solution

(Diagram: a minimum spanning tree of the example weighted graph.)

Problem solved?

As far as a mathematician is concerned the problem of a minimum spanning tree is
well-solved. We have two simple algorithms both of which are guaranteed to find the
best solution. (After all, a greedy algorithm must be one of the simplest possible).

In fact, the reason why the greedy algorithm works in this case is well understood —
the collection of all the subsets of the edges of a graph that do not contain a cycle
forms what is called a (graphic) matroid (see CLRS, section 16.4).

Loosely speaking, a greedy algorithm always works on a matroid and never works
otherwise.

Implementation issues

In fact the problem is far from solved because we have to decide how to implement the
two greedy algorithms.

The details of the implementation of the two algorithms are interesting because they
use (and illustrate) two important data structures — the partition and the priority
queue.

Implementation of Kruskal

The main problem in the implementation of Kruskal is to decide whether the next edge
to be added is allowable — that is, does it create a cycle or not.

Suppose that at some stage in the algorithm the next shortest edge is {x, y}. Then
there are two possibilities:

x and y lie in different trees of F : In this case adding the edge does not create any
new cycles, but merges together two of the trees of F

x and y lie in the same tree of F : In this case adding the edge creates a cycle and
the edge should not be added to F

Therefore we need data structures that allow us to quickly find the tree to which an
element belongs and quickly merge two trees.
Partitions or disjoint sets

The appropriate data structure for this problem is the partition (sometimes known
under the name disjoint sets). Recall that a partition of a set is a collection of
disjoint subsets (called cells) that cover the entire set.

At the beginning of Kruskal's algorithm we have a partition of the vertices into the
discrete partition where each cell has size 1. As each edge is added to the minimum
spanning tree, the number of cells in the partition decreases by one.

The operations that we need for Kruskal's algorithm are

Union(cell,cell) Creates a new partition by merging two cells

Find(element) Finds the cell containing a given element

Naive partitions

One simple way to represent a partition is simply to choose one element of each cell
to be the "leader" of that cell. Then we can simply keep an array π of length n where
π(x) is the leader of the cell containing x.

Example Consider the partition of 8 elements into 3 cells as follows:

{0, 2 | 1, 3, 5 | 4, 6, 7}

We could represent this as an array as follows

x      0 1 2 3 4 5 6 7
π(x)   0 1 0 1 4 1 4 4

Then certainly the operation Find is straightforward — we can decide whether x and y
are in the same cell just by comparing π(x) with π(y).

Thus Find has complexity Θ(1).

Updating the partition

Suppose now that we wish to update the partition by merging the first two cells to
obtain the partition

{0, 1, 2, 3, 5 | 4, 6, 7}

We could update the data structure by running through the entire array π and updating
it as necessary.

x      0 1 2 3 4 5 6 7
π(x)   0 0 0 0 4 0 4 4

This takes time Θ(n), and hence the merging operation is rather slow.

Can we improve the time of the merging operation?

The disjoint sets forest

Consider the following graphical representation of the data structure above, where
each element points (upwards) to the "leader" of the cell.

(Diagram: three trees with roots 0, 1 and 4; element 2 points to 0, elements 3 and 5
point to 1, and elements 6 and 7 point to 4.)

Merging two cells is accomplished by adjusting the pointers so they point to the new
leader.

(Diagram: after the naive merge, elements 1, 2, 3 and 5 all point directly to 0, while
6 and 7 still point to 4.)

However we can achieve something similar by just adjusting one pointer — suppose we
simply change the pointer for the element 1, by making it point to 0 instead of
itself.
The new data structure

Just adjusting this one pointer results in the following data structure.

(Diagram: element 1 now points to 0, so the tree rooted at 0 has children 1 and 2,
elements 3 and 5 still point to 1, and 6 and 7 point to 4.)

This new improved merging has complexity only Θ(1). However we have now lost the
ability to do the Find properly. In order to correctly find the leader of the cell
containing an element we have to run through a little loop:

procedure Find(x)
  while x != π(x)
    x = π(x)
  return x

This new find operation may take time O(n) so we seem to have gained nothing.

Union-by-rank heuristic

There are two heuristics that can be applied to the new data structure, that speed
things up enormously at the cost of maintaining a little extra data.

Let the rank of a root node of a tree be the height of that tree (the maximum distance
from a leaf to the root).

The union-by-rank heuristic tries to keep the trees balanced at all times. When a
merging operation needs to be done, the root of the shorter tree is made to point to
the root of the taller tree. The resulting tree therefore does not increase its height
unless both trees are the same height, in which case the height increases by one.

Path compression heuristic

The path compression heuristic is based on the idea that when we perform a Find(x)
operation we have to follow a path from x to the root of the tree containing x.

After we have done this why do we not simply go back down through this path and make
all these elements point directly to the root of the tree, rather than in a long chain
through each other?

This is reminiscent of our naive algorithm, where we made every element point directly
to the leader of its cell, but it is much cheaper because we only alter things that we
needed to look at anyway.

Complexity of Kruskal

In the worst case, we will perform E operations on the partition data structure which
has size V . By the complicated argument in CLRS we see that the total time for these
operations if we use both heuristics is O(E lg∗ V ). (Note: lg∗ x is the iterated log
function, which grows extremely slowly with x; see CLRS, page 56.)

However we must add to this the time that is needed to sort the edges — because we
have to examine the edges in order of length. This time is O(E lg E) if we use a
sorting technique such as mergesort, and hence the overall complexity of Kruskal's
algorithm is O(E lg E).
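To make the data structure concrete, here is a minimal Java sketch of a disjoint-sets
forest with both heuristics, used inside Kruskal's algorithm. The array-based edge
representation and the method names are illustrative assumptions rather than anything
prescribed by the notes.

    import java.util.*;

    // Sketch: Kruskal's algorithm using a disjoint-sets forest with
    // union by rank and path compression.
    public class Kruskal {

        static int[] parent, rank;

        static int find(int x) {                 // follow pointers to the root,
            if (parent[x] != x)                  // then make x point straight at it
                parent[x] = find(parent[x]);     // (path compression)
            return parent[x];
        }

        static boolean union(int x, int y) {     // false if x and y are already in the same tree
            int rx = find(x), ry = find(y);
            if (rx == ry) return false;
            if (rank[rx] < rank[ry]) { int t = rx; rx = ry; ry = t; }
            parent[ry] = rx;                     // shorter tree points to taller tree
            if (rank[rx] == rank[ry]) rank[rx]++;
            return true;
        }

        // edges[i] = {u, v, weight}; returns the edges of a minimum spanning tree
        static List<int[]> mst(int n, int[][] edges) {
            parent = new int[n];
            rank = new int[n];
            for (int i = 0; i < n; i++) parent[i] = i;
            Arrays.sort(edges, (e1, e2) -> Integer.compare(e1[2], e2[2]));  // O(E lg E) sort
            List<int[]> tree = new ArrayList<>();
            for (int[] e : edges) {
                if (union(e[0], e[1])) tree.add(e);   // add e unless it would create a cycle
                if (tree.size() == n - 1) break;
            }
            return tree;
        }
    }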
Implementation of Prim

For Prim's algorithm we repeatedly have to select the next vertex that is closest to
the tree that we have built so far. Therefore we need some sort of data structure that
will enable us to associate a value with each vertex (being the distance to the tree
under construction) and rapidly select the vertex with the lowest value.

From our study of Data Structures we know that the appropriate data structure is a
priority queue and that a priority queue is implemented by using a heap.

The priority queue ADT

Recall that a priority queue is an abstract data type that stores objects with an
associated key. The priority of the object depends on the value of the key.

The operations associated with this data type include

insert(queue,entry,key) Places an entry with its associated key into the data
structure

change(queue,entry,newkey) Changes the value of the key associated with a given entry

max(queue) Returns the element with the highest priority

extractmax(queue) Returns the element with the highest priority from the queue and
deletes it from the queue

Heaps

A heap is a complete binary tree such that the key associated with any node is larger
than (or equal to) the key associated with either of its children. This means that the
root of the binary tree has the largest key.

(Diagram: a heap with root 99, second level 23 and 78, third level 15, 7, 55 and 23,
and leaves 5, 1, 6, 2, 50, 40 and 11.)

We can insert items into a heap, change the key value of an item in the heap, and
remove the item at the root from a heap (always maintaining the heap property) in time
O(log n), where n is the size of the heap.

A heap can be used to implement all priority queue operations in time O(log n).

Prim's algorithm

It is now easy to see how to implement Prim's algorithm.

We first initialize our priority queue Q to empty. We then select an arbitrary vertex
s to grow our minimum spanning tree A and set the key value of s to 0. In addition, we
maintain an array, π, where each element π[v] contains the vertex that connects v to
the spanning tree being grown.

Here we want low key values to represent high priorities, so we will rename our two
last priority queue operations to min(queue) and extractmin(queue).

Next, we add each vertex v != s to Q and set the key value key[v] using the following
criteria:

key[v] = weight (v, s)   if (v, s) ∈ E
key[v] = ∞               otherwise
Each time an element is added to the priority queue Q, a heapify is carried out to
maintain the heap property of Q. Since low key values represent high priorities, the
heap for Q is so maintained that the key associated with any node is smaller (rather
than larger) than the key of any of its children. This means that the root of the
binary tree always has the smallest key.

We store the following information in the minimum spanning tree A: (v, key[v], π[v]).
Thus, at the beginning of the algorithm, A = {(s, 0, undef)}.

At each stage of the algorithm:

1. We extract the vertex u that has the highest priority (that is, the lowest key
   value!). With the binary tree being heapified, u is simply the root of the tree.

2. We add (u, key[u], π[u]) to A and carry out extractmin(Q).

3. We then examine the neighbours of u. For each neighbour v, there are two
   possibilities:

   (a) If v is already in the spanning tree A being constructed then we do not
       consider it further.

   (b) If v is currently on the priority queue Q, then we see whether this new edge
       (u, v) should cause an update in the priority of v. If the value weight (u, v)
       is lower than the current key value of v, then we change key[v] to
       weight (u, v) and set π[v] = u. Note that each time the key value of a vertex
       in Q is updated, a heapify is carried out to maintain the heap property of Q.

At the termination of the algorithm, Q = ∅ and the spanning tree A contains all the
edges, together with their weights, that span the tree.
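A minimal Java sketch of this procedure is given below. The adjacency-list
representation and the names are our own assumptions. Since java.util.PriorityQueue
has no change (decrease-key) operation, the sketch inserts a fresh (key, vertex) pair
whenever a key improves and simply skips stale entries when they are extracted; this
is a common substitute for the heapify-on-update described above.

    import java.util.*;

    // Sketch of Prim's algorithm using a binary-heap priority queue.
    public class Prim {

        // adj.get(u) holds pairs {v, weight} for each edge (u, v).
        static int[] mstParent(List<List<int[]>> adj, int s) {
            int n = adj.size();
            int[] key = new int[n], pi = new int[n];
            boolean[] inTree = new boolean[n];
            Arrays.fill(key, Integer.MAX_VALUE);
            Arrays.fill(pi, -1);
            key[s] = 0;
            PriorityQueue<int[]> q =
                new PriorityQueue<>((a, b) -> Integer.compare(a[0], b[0]));
            q.add(new int[]{0, s});
            while (!q.isEmpty()) {
                int u = q.poll()[1];
                if (inTree[u]) continue;          // stale entry: u was already extracted
                inTree[u] = true;
                for (int[] e : adj.get(u)) {
                    int v = e[0], w = e[1];
                    if (!inTree[v] && w < key[v]) {   // shorter edge into the tree found
                        key[v] = w;
                        pi[v] = u;
                        q.add(new int[]{w, v});
                    }
                }
            }
            return pi;    // pi[v] is the tree vertex that the MST joins to v
        }
    }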

Complexity of Prim

The complexity of Prim's algorithm is dominated by the heap operations.

Every vertex is extracted from the priority queue at some stage, hence the extractmin
operations in the worst case take time O(V lg V ).

Also, every edge is examined at some stage in the algorithm and each edge examination
potentially causes a change operation. Hence in the worst case these operations take
time O(E lg V ).

Therefore the total time is

O(V lg V + E lg V ) = O(E lg V )

Which is the better algorithm: Kruskal's or Prim's?

Priority-first search

Let us generalize the ideas behind this implementation of Prim's algorithm.

Consider the following very general graph-searching algorithm. We will later show that
by choosing different specifications of the priority we can make this algorithm do
very different things. This algorithm will produce a priority-first search tree.

The key-values or priorities associated with each vertex are stored in an array called
key.

Initially we set key [v] to ∞ for all the vertices v ∈ V (G) and build a heap with
these keys — this can be done in time O(V ).

Then we select the source vertex s for the search and perform change(s,0) to change
the key of s to 0, thus placing s at the top of the priority queue.
The operation of PFS

After initialization the operation of PFS is as follows:

procedure PFS(s)
  change(s,0)
  while Q != ∅
    u ← extractmin(Q)
    for each v adjacent to u do
      if v ∈ Q ∧ PRIORITY < key [v] then
        π[v] ← u
        change(Q,v,PRIORITY)
      end if
    end for
  end while

It is important to notice how the array π is managed — for every vertex v ∈ Q with a
finite key value, π[v] is the vertex not in Q that was responsible for the key of v
reaching the highest priority it has currently reached.

Complexity of PFS

The complexity of this search is easy to calculate — the main loop is executed V
times, and each extractmin operation takes O(lg V ), yielding a total time of
O(V lg V ) for the extraction operations.

During all V operations of the main loop we examine the adjacency list of each vertex
exactly once — hence we make E calls, each of which may cause a change to be
performed. Hence we do at most O(E lg V ) work on these operations.

Therefore the total is

O(V lg V + E lg V ) = O(E lg V ).

Prim's algorithm is a PFS

Prim's algorithm can be expressed as a priority-first search by observing that the
priority of a vertex is the weight of the shortest edge joining the vertex to the rest
of the tree.

This is achieved in the code above by simply replacing the string PRIORITY by

weight (u, v)

At any stage of the algorithm:

• The vertices not in Q form the tree so far.

• For each vertex v ∈ Q, key [v] gives the length of the shortest edge from v to a
  vertex in the tree, and π[v] shows which tree vertex that is.

Shortest paths

Let G be a directed weighted graph. The shortest path between two vertices v and w is
the path from v to w for which the sum of the weights on the path-edges is lowest.
Notice that if we take an unweighted graph to be a special instance of a weighted
graph, but with all edge weights equal to 1, then this coincides with the normal
definition of shortest path.

The weight of the shortest path from v to w is denoted by δ(v, w).

Let s ∈ V (G) be a specified vertex called the source vertex.

The single-source shortest paths problem is to find the shortest path from s to every
other vertex in the graph (as opposed to the all-pairs shortest paths problem, where
we must find the distance between every pair of vertices).
Dijkstra's algorithm

Dijkstra's algorithm is a famous single-source shortest paths algorithm suitable for
the cases when the weights are all non-negative.

Dijkstra's algorithm can be implemented as a priority-first search by taking the
priority of a vertex v ∈ Q to be the shortest path from s to v that consists entirely
of vertices in the priority-first search tree (except of course for v).

This can be implemented as a PFS by replacing PRIORITY with

key [u] + weight (u, v)

At the end of the search, the array key [] contains the lengths of the shortest paths
from the source vertex s.

Dijkstra's algorithm in action

(Diagram: the example weighted graph again, used to trace Dijkstra's algorithm.)
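The following Java sketch implements this priority-first search with the priority
key[u] + weight(u, v). As in the Prim sketch, the graph representation and the use of
lazy deletion in place of a change operation are our own assumptions.

    import java.util.*;

    // Sketch of Dijkstra's algorithm; assumes all edge weights are non-negative.
    public class Dijkstra {

        // adj.get(u) holds pairs {v, weight}; returns shortest distances from s.
        static long[] shortestPaths(List<List<int[]>> adj, int s) {
            int n = adj.size();
            long[] key = new long[n];
            Arrays.fill(key, Long.MAX_VALUE);
            key[s] = 0;
            boolean[] done = new boolean[n];
            PriorityQueue<long[]> q =
                new PriorityQueue<>((a, b) -> Long.compare(a[0], b[0]));
            q.add(new long[]{0, s});
            while (!q.isEmpty()) {
                int u = (int) q.poll()[1];
                if (done[u]) continue;            // skip stale queue entries
                done[u] = true;
                for (int[] e : adj.get(u)) {
                    int v = e[0], w = e[1];
                    if (!done[v] && key[u] + w < key[v]) {   // relax the edge (u, v)
                        key[v] = key[u] + w;
                        q.add(new long[]{key[v], v});
                    }
                }
            }
            return key;
        }
    }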

Proof of correctness

It is possible to prove that Dijkstra's algorithm is correct by proving the following
claim (assuming T = V (G) − Q is the set of vertices that have already been removed
from Q).

At the time that a vertex u is removed from Q and placed into T, key [u] = δ(s, u).

This is a proof by contradiction: we assume key [u] ≠ δ(s, u) and show that this leads
to a contradiction, thereby proving the opposite.

Assuming u ≠ s, then T ≠ ∅ and there exists a path p from s to u. We can decompose the
path into three sections:

1. A path p1 from s to vertex x, such that x ∈ T and the path is of length 0 or more.

2. An edge between x and y, such that y ∈ Q and (x, y) ∈ E(G).

3. A path p2 from y to u of length 0 or more.

Proof (contd)

The decomposed path may be illustrated thus.

(Diagram: the set T = V (G) − Q containing s and x, with the path p1 running from s to
x inside T, the edge (x, y) crossing into Q, and the path p2 running from y to u.)

Firstly, we know key [y] = δ(s, y) since the edge (x, y) will have been examined when
x was added to T .

Furthermore, we know that y is before u on path p and therefore δ(s, y) ≤ δ(s, u).
This implies key [y] ≤ key [u] (inequality A).
Proof (contd)

But we also know that u was chosen from Q before y, which implies key [u] ≤ key [y]
(inequality B), since the priority queue always returns the vertex with the smallest
key.

Inequalities A and B can only be satisfied if key [u] = key [y], but this implies

key [u] = δ(s, u) = δ(s, y) = key [y]

But our initial assumption was that key [u] ≠ δ(s, u), giving rise to the
contradiction.

Hence we have proved that key [u] = δ(s, u) at the time that u enters T .

Relaxation

Consider the following property of Dijkstra's algorithm.

• At any stage of Dijkstra's algorithm the following inequality holds:

  δ(s, v) ≤ key [v]

This is saying that the key[] array always holds a collection of upper bounds on the
actual values that we are seeking. We can view these values as being our "best
estimate" of the value so far, and Dijkstra's algorithm as a procedure for
systematically improving our estimates to the correct values.

The fundamental step in Dijkstra's algorithm, where the bounds are altered, is when we
examine the edge (u, v) and do the following operation

key [v] ← min(key [v], key [u] + weight (u, v))

This is called relaxing the edge (u, v).

Relaxation schedules

Consider now an algorithm that is of the following general form:

• Initially an array d[] is initialized to have d[s] = 0 for some source vertex s and
  d[v] = ∞ for all other vertices

• A sequence of edge relaxations is performed, possibly altering the values in the
  array d[].

We observe that the value d[v] is always an upper bound for the value δ(s, v) because
relaxing the edge (u, v) will either leave the upper bound unchanged or replace it by
a better estimate from an upper bound on a path from s → u → v.

Dijkstra's algorithm is a particular schedule for performing the edge relaxations that
guarantees that the upper bounds converge to the exact values.

Negative edge weights

Dijkstra's algorithm cannot be used when the graph has some negative edge-weights (why
not? find an example).

In general, no algorithm for shortest paths can work if the graph contains a cycle of
negative total weight (because a path could be made arbitrarily short by going round
and round the cycle). Therefore the question of finding shortest paths makes no sense
if there is a negative cycle.

However, what if there are some negative edge weights but no negative cycles?

The Bellman-Ford algorithm is a relaxation schedule that can be run on graphs with
negative edge weights. It will either fail, in which case the graph has a negative
cycle and the problem is ill-posed, or will finish with the single-source shortest
paths in the array d[].
Bellman-Ford algorithm

The initialization step is as described above. Let us suppose that the weights on the
edges are given by the function w.

Then consider the following relaxation schedule:

for i = 1 to |V (G)| − 1 do
  for each edge (u, v) ∈ E(G) do
    d[v] ← min(d[v], d[u] + w(u, v))
  end for each
end for

Finally we make a single check to determine if we have a failure:

for each edge (u, v) ∈ E(G) do
  if d[v] > d[u] + w(u, v) then
    FAIL
  end if
end for each

Complexity of Bellman-Ford

The complexity is particularly easy to calculate in this case because we know exactly
how many relaxations are done — namely E(V − 1), and adding that to the final failure
check loop and the initialization loop we see that Bellman-Ford is O(EV ).

There remains just one question — how does it work?
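A direct Java sketch of this relaxation schedule is shown below. The edge-list
representation and the use of null to signal the failure case are our own illustrative
choices.

    import java.util.*;

    // Sketch of the Bellman-Ford relaxation schedule.
    // Returns null if the final check detects a negative cycle.
    public class BellmanFord {

        static final long INF = Long.MAX_VALUE / 4;

        // edges[i] = {u, v, w(u, v)}
        static long[] shortestPaths(int n, int[][] edges, int s) {
            long[] d = new long[n];
            Arrays.fill(d, INF);
            d[s] = 0;
            for (int i = 1; i <= n - 1; i++)                   // |V| - 1 passes
                for (int[] e : edges)
                    if (d[e[0]] != INF && d[e[0]] + e[2] < d[e[1]])   // relax edge (u, v)
                        d[e[1]] = d[e[0]] + e[2];
            for (int[] e : edges)                              // single failure check
                if (d[e[0]] != INF && d[e[0]] + e[2] < d[e[1]])
                    return null;                               // negative cycle: problem ill-posed
            return d;
        }
    }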

Correctness of Bellman-Ford

Let us consider some of the properties of relaxation in a graph with no negative
cycles.

Property 1 Consider an edge (u, v) that lies on the shortest path from s to v. If the
sequence of relaxations includes relaxing (u, v) at a stage when d[u] = δ(s, u), then
d[v] is set to δ(s, v) and never changes after that.

Once convinced that Property 1 holds we can show that the algorithm is correct for
graphs with no negative cycles, as follows.

Consider any vertex v and let us examine the shortest path from s to v, namely

s ∼ v1 ∼ v2 · · · vk ∼ v

Now at the initialization stage d[s] = 0 and it always remains the same. After one
pass through the main loop the edge (s, v1) is relaxed and by Property 1,
d[v1] = δ(s, v1) and it remains at that value. After the second pass the edge (v1, v2)
is relaxed and after this relaxation we have d[v2] = δ(s, v2) and it remains at this
value.

As the number of edges in the path is at most |V (G)| − 1, after all the loops have
been performed d[v] = δ(s, v).

Note that this is an inductive argument where the induction hypothesis is "after n
iterations, all shortest paths of length n have been found".
All-pairs shortest paths

Now we turn our attention to constructing a complete table of shortest distances,
which must contain the shortest distance between any pair of vertices.

If the graph has no negative edge weights then we could simply make V runs of
Dijkstra's algorithm, at a total cost of O(V E lg V ), whereas if there are negative
edge weights then we could make V runs of the Bellman-Ford algorithm at a total cost
of O(V²E).

The two algorithms we shall examine both use the adjacency matrix representation of
the graph, hence are most suitable for dense graphs. Recall that for a weighted graph
the weighted adjacency matrix A has weight (i, j) as its ij-entry, where
weight (i, j) = ∞ if i and j are not adjacent.

A dynamic programming method

Dynamic programming is a general algorithmic technique for solving problems that can
be characterised by two features:

• The problem is broken down into a collection of smaller subproblems

• The solution is built up from the stored values of the solutions to all of the
  subproblems

For the all-pairs shortest paths problem we define the simpler problem to be

"What is the length of the shortest path from vertex i to j that uses at most m
edges?"

We shall solve this for m = 1, then use that solution to solve for m = 2, and so on...

The initial step

We shall let d_ij^(m) denote the distance from vertex i to vertex j along a path that
uses at most m edges, and define D(m) to be the matrix whose ij-entry is the value
d_ij^(m).

As a shortest path between any two vertices can contain at most V − 1 edges, the
matrix D(V−1) contains the table of all-pairs shortest paths.

Our overall plan therefore is to use D(1) to compute D(2), then use D(2) to compute
D(3), and so on.

The case m = 1

Now the matrix D(1) is easy to compute — the length of a shortest path using at most
one edge from i to j is simply the weight of the edge from i to j. Therefore D(1) is
just the adjacency matrix A.

The inductive step

What is the smallest weight of the path from vertex i to vertex j that uses at most m
edges? Now a path using at most m edges can either be

(1) A path using less than m edges

(2) A path using exactly m edges, composed of a path using m − 1 edges from i to an
    auxiliary vertex k and the edge (k, j).

We shall take the entry d_ij^(m) to be the lowest weight path from the above choices.

Therefore we get

d_ij^(m) = min( d_ij^(m−1), min_{1≤k≤V} { d_ik^(m−1) + w(k, j) } )
         = min_{1≤k≤V} { d_ik^(m−1) + w(k, j) }
Example

Consider the weighted graph with the following weighted adjacency matrix:

A = D(1) =
   0   ∞  11   2   6
   1   0   4   ∞   ∞
  10   ∞   0   ∞   ∞
   ∞   2   6   0   3
   ∞   ∞   6   ∞   0

Let us see how to compute an entry in D(2); suppose we are interested in the (1, 3)
entry. Then we see that

1 → 1 → 3 has cost 0 + 11 = 11
1 → 2 → 3 has cost ∞ + 4 = ∞
1 → 3 → 3 has cost 11 + 0 = 11
1 → 4 → 3 has cost 2 + 6 = 8
1 → 5 → 3 has cost 6 + 6 = 12

The minimum of all of these is 8, hence the (1, 3) entry of D(2) is set to 8.

Computing D(2)

Computing every entry of D(2) from D(1) and A in this way gives

D(2) =
   0   4   8   2   5
   1   0   4   3   7
  10   ∞   0  12  16
   3   2   6   0   3
  16   ∞   6   ∞   0

If we multiply two matrices AB = C, then we compute

c_ij = Σ_{k=1..V} a_ik b_kj

If we replace the multiplication a_ik b_kj by the addition a_ik + b_kj and replace the
summation Σ by the minimum min then we get

c_ij = min_{k=1..V} (a_ik + b_kj)

which is precisely the operation we are performing to calculate our matrices.

The remaining matrices

Proceeding to compute D(3) from D(2) and A, and then D(4) from D(3) and A, we get:

D(3) =
   0   4   8   2   5
   1   0   4   3   6
  10  14   0  12  15
   3   2   6   0   3
  16   ∞   6  18   0

D(4) =
   0   4   8   2   5
   1   0   4   3   6
  10  14   0  12  15
   3   2   6   0   3
  16  20   6  18   0

A new matrix "product"

Recall the method for computing d_ij^(m), the (i, j) entry of the matrix D(m), using
the method similar to matrix multiplication.

d_ij^(m) ← ∞
for k = 1 to V do
  d_ij^(m) = min(d_ij^(m), d_ik^(m−1) + w(k, j))
end for

Let us use ⊗ to denote this new matrix product. Then we have

D(m) = D(m−1) ⊗ A

Hence it is an easy matter to see that we can compute as follows:

D(2) = A ⊗ A    D(3) = D(2) ⊗ A    . . .
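A small Java sketch of this ⊗ operation and its repeated application is shown below;
INF stands in for ∞, and the class and method names are illustrative assumptions.

    // Sketch of the min-plus matrix "product" D ⊗ A described above.
    public class MinPlus {

        static final long INF = Long.MAX_VALUE / 4;

        // c[i][j] = min over k of (a[i][k] + b[k][j])
        static long[][] product(long[][] a, long[][] b) {
            int n = a.length;
            long[][] c = new long[n][n];
            for (int i = 0; i < n; i++)
                for (int j = 0; j < n; j++) {
                    c[i][j] = INF;
                    for (int k = 0; k < n; k++)
                        if (a[i][k] != INF && b[k][j] != INF)
                            c[i][j] = Math.min(c[i][j], a[i][k] + b[k][j]);
                }
            return c;
        }

        // D(V-1) = A ⊗ A ⊗ ... ⊗ A, the table of all-pairs shortest distances
        static long[][] allPairs(long[][] a) {
            long[][] d = a;
            for (int m = 2; m <= a.length - 1; m++)
                d = product(d, a);
            return d;
        }
    }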
Complexity of this method

The time taken for this method is easily seen to be Θ(V⁴) as it performs V matrix
"multiplications", each of which involves a triply nested for loop with each variable
running from 1 to V .

However we can reduce the complexity of the algorithm by remembering that we do not
need to compute all the intermediate products D(1), D(2) and so on, but we are only
interested in D(V−1). Therefore we can simply compute:

D(2) = A ⊗ A
D(4) = D(2) ⊗ D(2)
D(8) = D(4) ⊗ D(4)

Therefore we only need to do this operation at most lg V times before we reach the
matrix we want. The time required is therefore actually Θ(V³⌈lg V⌉).

Floyd-Warshall

The Floyd-Warshall algorithm uses a different dynamic programming formalism.

For this algorithm we shall define d_ij^(k) to be the length of the shortest path from
i to j whose intermediate vertices all lie in the set {1, . . . , k}.

As before, we shall define D(k) to be the matrix whose (i, j) entry is d_ij^(k).

The initial case

What is the matrix D(0) — the entry d_ij^(0) is the length of the shortest path from i
to j with no intermediate vertices. Therefore D(0) is simply the adjacency matrix A.

The inductive step

For the inductive step we assume that we have already constructed the matrix D(k−1)
and wish to use it to construct the matrix D(k).

Let us consider all the paths from i to j whose intermediate vertices lie in
{1, 2, . . . , k}. There are two possibilities for such paths

(1) The path does not use vertex k

(2) The path does use vertex k

The shortest possible length of all the paths in category (1) is given by d_ij^(k−1),
which we already know.

If the path does use vertex k then it must go from vertex i to k and then proceed on
to j, and the length of the shortest path in this category is d_ik^(k−1) + d_kj^(k−1).

The overall algorithm

The overall algorithm is then simply a matter of running V times through a loop, with
each entry being assigned as the minimum of two possibilities. Therefore the overall
complexity of the algorithm is just O(V³).

D(0) ← A
for k = 1 to V do
  for i = 1 to V do
    for j = 1 to V do
      d_ij^(k) = min(d_ij^(k−1), d_ik^(k−1) + d_kj^(k−1))
    end for j
  end for i
end for k

At the end of the procedure we have the matrix D(V) whose (i, j) entry contains the
length of the shortest path from i to j, all of whose intermediate vertices lie in
{1, 2, . . . , V } — in other words, the shortest path in total.
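Here is a compact Java sketch of this procedure. It updates a single matrix in place,
which is safe because row k and column k do not change during iteration k; the INF
sentinel and the names are our own assumptions.

    // Sketch of the Floyd-Warshall all-pairs shortest paths algorithm.
    public class FloydWarshall {

        static final long INF = Long.MAX_VALUE / 4;

        // d starts as the weighted adjacency matrix
        // (0 on the diagonal, INF where vertices are not adjacent)
        static void allPairsShortestPaths(long[][] d) {
            int n = d.length;
            for (int k = 0; k < n; k++)
                for (int i = 0; i < n; i++)
                    for (int j = 0; j < n; j++)
                        if (d[i][k] != INF && d[k][j] != INF
                                && d[i][k] + d[k][j] < d[i][j])
                            d[i][j] = d[i][k] + d[k][j];   // shorter route through vertex k
        }
    }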
Example

Consider the weighted directed graph with the following adjacency matrix:

D(0) =
   0   ∞  11   2   6
   1   0   4   ∞   ∞
  10   ∞   0   ∞   ∞
   ∞   2   6   0   3
   ∞   ∞   6   ∞   0

Let us see how to compute D(1)

D(1) =
   0   ∞  11   2   6
   1   0   4   3   7
  10   ∞   0  12  16
   ∞   2   6   0   3
   ∞   ∞   6   ∞   0

To find the (2, 4) entry of this matrix we have to consider the paths through the
vertex 1 — is there a path 2 – 1 – 4 that has a better value than the current path? If
so, then that entry is updated.

The entire sequence of matrices

D(2) =
   0   ∞  11   2   6
   1   0   4   3   7
  10   ∞   0  12  16
   3   2   6   0   3
   ∞   ∞   6   ∞   0

D(3) =
   0   ∞  11   2   6
   1   0   4   3   7
  10   ∞   0  12  16
   3   2   6   0   3
  16   ∞   6  18   0

D(4) =
   0   4   8   2   5
   1   0   4   3   6
  10  14   0  12  15
   3   2   6   0   3
  16  20   6  18   0

D(5) =
   0   4   8   2   5
   1   0   4   3   6
  10  14   0  12  15
   3   2   6   0   3
  16  20   6  18   0

Finding the actual shortest paths

In both of these algorithms we have not addressed the question of actually finding the
paths themselves.

For the Floyd-Warshall algorithm this is achieved by constructing a further sequence
of arrays P(k) whose (i, j) entry contains a predecessor of j on the path from i to j.
As the entries are updated the predecessors will change — if the matrix entry is not
changed then the predecessor does not change, but if the entry does change, because
the path originally from i to j becomes re-routed through the vertex k, then the
predecessor of j becomes the predecessor of j on the path from k to j.

Summary

1. A graph G is fully described by a set of vertices V (G) and a set of edges E(G).

2. Graphs may be directed so that the edges correspond to one-directional arcs:
   (u, v) ∈ E(G) does not imply (v, u) ∈ E(G).

3. Graphs may be weighted when an additional weight value is associated with each
   edge: w : E(G) → R.

4. Graphs may be represented as adjacency list or adjacency matrix data structures.

5. Searching may occur breadth first (BFS) or depth first (DFS).

6. DFS and BFS create a spanning tree from any graph.
Summary (contd)

7. BFS visits the vertices nearest to the source first. It can be used to determine
   whether a graph is connected.

8. DFS explores as far from the source as possible before backtracking. It can be
   used to perform a topological sort.

9. Kruskal's and Prim's methods are two greedy algorithms for determining the minimum
   spanning tree of a graph.

10. Dijkstra's method determines the shortest path between any two vertices in a
    directed graph so long as all the weights are non-negative.

11. When directed graphs have negative edge weights the Bellman-Ford algorithm may be
    used (but it will fail if the graph has a negative cycle).

Summary (contd)

12. Dynamic Programming is a general approach for solving problems which can be
    decomposed into sub-problems and where solutions to sub-problems can be combined
    to solve the main problem.

13. Dynamic Programming can be used to solve the shortest path problem directly or via
    the Floyd-Warshall formulation.

14. The minimum path problem can be used for motion planning of robots through large
    graphs using a priority first search.

Recommended reading:
CLRS, Chapters 22–25
Computer Science and Software Engineering, 2011

CITS3210 Algorithms
Network Flow

Notes by CSSE, Comics by xkcd.com

Flow networks

In this section we see how our fundamental graph theoretic algorithms can be combined
to solve a more complex problem.

A flow network is a directed graph in which each directed edge (u, v) has a
non-negative capacity c (u, v) ≥ 0. The flow network has two special vertices — a
source s and a sink t.

(Diagram: an example flow network with a capacity marked on each directed edge, the
source s at the bottom left and the sink t at the top right.)

A flow

A flow in a flow network is a function

f : V × V → R

that satisfies the following properties.

Capacity constraint

For each edge (u, v)

f (u, v) ≤ c (u, v)

Skew symmetry

For each pair of vertices u, v

f (u, v) = −f (v, u)

Flow conservation

For all vertices u ∈ V − {s, t} we have

Σ_{v∈V} f (u, v) = 0

The MAX FLOW problem

MAX FLOW
Instance. A flow network G with source s and sink t.
Question. What is the maximum flow from s to t?

The most convenient mental model for the network flow problem is to think of the edges
of the capacity graph as representing pipelines of various capacities.

The source is to be viewed as a producer of some sort of fluid (maybe an oil well),
and the sink as a consumer of some sort of fluid (maybe an oil refinery).

The network flow problem is then the problem of deciding how much of the fluid to
route along each of the pipelines in order to achieve the maximum flow of fluid from
the source to the sink.
An example flow

This diagram shows a flow in the above flow network. Each edge is labelled with the
amount of flow passing along that edge.

(Diagram: the example flow network with a flow value on each edge; by skew symmetry
each edge also carries the negated value in the opposite direction.)

The value of a flow f is defined to be the total flow leaving the source vertex

|f | = Σ_{v∈V} f (s, v)

This flow has value 4.

Interpreting negative flows

The concept of a negative flow often appears to be a little confusing.

However it is fundamentally no more difficult than the concept of an overdraft at a
bank being a negative balance.

If there are 3 units of flow moving from A to B, then we can equally view that as
being −3 units of flow from B to A. This is the concept that is being captured by
skew-symmetry.

If we increase the flow from A to B by one unit then the new flow will be 4 units from
A to B (same as −4 units from B to A), whereas if we increase the flow from B to A by
one unit then the new flow will be −2 units from B to A (same as 2 units from A to B).

The residual network

Consider the same flow, but this time also including the (original) capacities of the
edges on the same diagram.

(Diagram: the example network with each edge labelled flow/capacity.)

It is clear that some of the pipes have got some residual capacity in that they are
not being fully used.

The residual network

The residual network is the network where we just list the "unused capacities" of the
pipes. Given a capacity graph G and a flow f the residual network is called Gf, where
Gf has the same vertex set as G and capacities cf (u, v) given by

cf (u, v) = c (u, v) − f (u, v)

(Diagram: the residual network for this flow, with each edge labelled by its unused
capacity.)
Augmenting flows

If we can find a flow f ′ in the residual network Gf , then we can form a new flow f ∗
in the original network G where

f ∗(u, v) = f (u, v) + f ′(u, v)

Such a flow in the residual network is called an augmenting flow.

(Diagram: an augmenting flow of value 3 in the residual network.)

The total flow so far

Adding this flow to the original flow gives us a new flow with a higher value — the
new flow has a value of 7.

(Diagram: the example network carrying the combined flow of value 7.)

The new residual network

(Diagram: the residual network after the augmenting flow has been added; the edges
into t now have zero residual capacity.)

We see that the pipes into t now have no unused capacity — they are currently
saturated. Therefore we cannot increase this flow any further by finding an augmenting
flow.

The question is: have we found the maximum flow, or did we go wrong at an earlier
stage?

Ford-Fulkerson method

The Ford-Fulkerson method is an iterative method for solving the maximum flow problem.
It proceeds by starting with the zero valued flow (where f (u, v) = 0 for all
u, v ∈ V ).

At each stage in the method an augmenting path is found — that is, a path from s to t
along which we may push some additional flow. Given an augmenting path the bottleneck
capacity b is the smallest residual capacity of the edges along the path.

We can construct a flow of value b in the residual network by taking flows of b along
the edges of the path, and zeros elsewhere.

This process continues until there are no augmenting paths left.

For a given flow network G = (V, E), a source vertex s and a sink vertex t, the
Ford-Fulkerson method can be summarised as follows:

Ford-Fulkerson(G, s, t)
  for each edge (u, v) ∈ E do
    f (u, v) ← 0
    f (v, u) ← 0
  while there exists a path p from s to t in the residual network Gf do
    cf (p) ← min{cf (u, v) : (u, v) is in p}
    for each edge (u, v) in p do
      f (u, v) ← f (u, v) + cf (p)
      f (v, u) ← −f (u, v)

Cuts

An s,t-cut is a partition of V into two subsets S and T such that s ∈ S and t ∈ T .

(Diagram: the example flow network with each vertex marked S or T and a line drawn
between the two sets.)

Here the vertices marked S are in S and the ones marked T are in T . The line draws
the boundary between S and T .

The capacity of a cut

The capacity of a cut is the sum of the capacities of all of the edges that go between
S and T .

More formally

c (S, T ) = Σ_{u∈S, v∈T} c (u, v)

(Diagram: the example network and cut again, with the capacities of the edges crossing
from S to T highlighted.)

Therefore the capacity of this cut is

3 + 2 + 5 + 6 = 16

Flow across a cut

Now let us compute the flow across this cut when the network is carrying the flow of
value 7 that we found earlier.

(Diagram: the same cut drawn on the network carrying the flow of value 7.)

The flow across the cut is

2 + 1 + 3 + 1 = 7
Flow across another cut

Now let us compute the flow across a different cut.

(Diagram: a different s,t-cut drawn on the network carrying the flow of value 7.)

The flow across this cut is

3 + 3 + 1 = 7

Flow across all cuts

Theorem. The flow across every cut has the same value.

In order to prove this theorem our strategy will be to show that moving a single
vertex from one side of a cut to the other does not affect the flow across that cut.

This will then show that any two cuts have the same flow across them because we can
shift any number of vertices from one side of the cut to the other without affecting
the flow across the cut.

Proof. Suppose S, T is a cut such that u ∈ S. We show that we can move u to T without
altering the flow across the cut by considering the value

f (S, T ) − f (S − {u}, T + {u})

Proof continued

The contribution that vertex u makes to the flow f (S, T ) is

Σ_{w∈T} f (u, w)

whereas the contribution it makes to the flow f (S − {u}, T + {u}) is

Σ_{w∈S} f (w, u)

Therefore

f (S, T ) − f (S − {u}, T + {u})
  = Σ_{w∈T} f (u, w) − Σ_{w∈S} f (w, u)
  = Σ_{w∈T} f (u, w) + Σ_{w∈S} f (u, w)      (by skew symmetry)
  = Σ_{w∈V} f (u, w)
  = 0                                        (by flow conservation)

Minimum cut

For any s, t-cut S, T it is clear that

f (S, T ) ≤ c (S, T )

Therefore the value of the flow is at most the capacity of the cut. Therefore we can
consider the cut with the lowest possible capacity — the minimum cut, and it is clear
that the capacity of this cut is an upper bound for the maximum flow.

Therefore

max flow ≤ min cut

Example

For our example, the cut S = V − {t}, T = {t} has capacity 7, so the maximum flow has
value at most 7. (As we have already found a flow of value 7 we can be sure that this
is indeed the maximum).
Max-flow min-cut theorem

The important max-flow min-cut theorem tells us that the inequality of the previous
slide is actually an equality.

Theorem

If f is a flow in the flow network G with source s and sink t, then the following
three conditions are equivalent.

1. f is a maximum flow in G

2. The residual network Gf contains no augmenting paths from s to t

3. |f | = c (S, T ) for some s,t-cut S, T

The max-flow min-cut theorem is an instance of duality that is used in linear
optimization.

Justification

Condition (1) implies Condition (2)

If f is a maximum flow then clearly we cannot find any augmenting paths.

Condition (2) implies Condition (3)

If the residual network contains no paths from s to t, then let S be all the vertices
reachable from s, and let T be the remaining vertices. There are no edges in Gf from S
to T and hence all of these edges are saturated and f (S, T ) = c (S, T ).

Condition (3) implies Condition (1)

Obvious.

Finding the minimum cut

Let us find the minimum cut in our previous example.

(Diagram: the example flow network with its capacities, on which the minimum cut is to
be identified.)

Ford-Fulkerson is correct

The main significance of the max-flow min-cut theorem is that it tells us that if our
current flow is not the maximum flow, then we are guaranteed that there will be an
augmenting path.

This means that the Ford-Fulkerson method is always guaranteed to find a maximum flow,
regardless of how we choose augmenting paths.

However it is possible to make unfortunate choices of augmenting paths in such a way
that the algorithm may take an inordinate amount of time to finish, and indeed
examples can be constructed so that if a bad augmenting path is chosen the algorithm
will never finish.
Complexity of Ford-Fulkerson

If the capacities of the edges are all integers, then at each step the algorithm
augments the flow by at least 1 unit. Finding each augmenting path can be done in time
O(E) (where E is the number of edges of the original network that have non-zero
capacity).

Therefore the complexity is O(E|f ∗|) where |f ∗| is the value of the maximum flow.

It is easy to construct examples where it would actually take this long.

(Diagram: a four-vertex network in which s has two edges of capacity 1000 leaving it,
t has two edges of capacity 1000 entering it, and the two middle vertices are joined
by an edge of capacity 1.)

Improving the performance

Edmonds and Karp suggested two heuristics for improving the performance of the
algorithm by choosing better augmenting paths.

Their first heuristic seems natural enough:

• Always augment by a path of maximum bottleneck capacity

The path of maximum bottleneck capacity can be found by a priority-first search —
similar to Dijkstra's algorithm except that the priority is based on the bottleneck
capacity of the path so far, rather than distances.

This can be implemented in time O(E ln |f ∗|), but we will not consider the derivation
of this bound.

The second heuristic

Edmonds and Karp's second heuristic produces an asymptotic complexity which is
independent of the edge capacities.

• Always augment by a path with the fewest number of edges

Suppose we perform a breadth-first search on the residual network G to find the
shortest path p from s to t. After augmenting the flow along p by the bottleneck
capacity, consider the new residual network G′. The shortest path from s to t in G′
must be at least as long as that in G, because any new edges in G′ that are not in G
must point back along the path p, and hence cannot contribute to a shorter path.

This shows that the lengths of the shortest paths found at each stage are always
constant or increasing.

Analysis

We can view the Edmonds-Karp heuristic as operating in several "stages" where each
stage deals with all the augmenting paths of a given length.

How many augmenting paths of a given length can there be?

There can be at most E augmenting paths of a given length, and if performed
efficiently, each augmentation can be done in time E.

As there are at most V stages of the algorithm we get a total time of O(V E²).
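The sketch below combines the Ford-Fulkerson method with this second heuristic: each
augmenting path is found by breadth-first search on the residual network, so it always
has the fewest possible edges. The adjacency-matrix representation of capacities and
the names used here are illustrative assumptions.

    import java.util.*;

    // Sketch: Ford-Fulkerson with shortest (fewest-edge) augmenting paths,
    // found by BFS on the residual network.
    public class MaxFlow {

        // cap[u][v] is the capacity c(u, v); returns the value of a maximum flow.
        static long maxFlow(long[][] cap, int s, int t) {
            int n = cap.length;
            long[][] f = new long[n][n];            // current flow (skew symmetric)
            long value = 0;
            while (true) {
                int[] pred = new int[n];            // BFS tree on the residual network
                Arrays.fill(pred, -1);
                pred[s] = s;
                Deque<Integer> q = new ArrayDeque<>();
                q.add(s);
                while (!q.isEmpty() && pred[t] == -1) {
                    int u = q.poll();
                    for (int v = 0; v < n; v++)
                        if (pred[v] == -1 && cap[u][v] - f[u][v] > 0) {
                            pred[v] = u;
                            q.add(v);
                        }
                }
                if (pred[t] == -1) return value;    // no augmenting path left
                long b = Long.MAX_VALUE;            // bottleneck capacity of the path
                for (int v = t; v != s; v = pred[v])
                    b = Math.min(b, cap[pred[v]][v] - f[pred[v]][v]);
                for (int v = t; v != s; v = pred[v]) {
                    f[pred[v]][v] += b;             // push b units along the path
                    f[v][pred[v]] -= b;             // skew symmetry
                }
                value += b;
            }
        }
    }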
Applications of network flow

One interesting application of network flow is to solve the bipartite matching
problem.

A matching in a graph G is a set of edges that do not share any vertices.

The maximum matching problem is to find the largest possible matching in a graph.

Although the problem can be solved in polynomial time, the algorithm required to solve
it in the general case (when G can be any graph) is horrendously complicated.

The situation is much simpler for a bipartite graph, where network flow can directly
be used.

Matchings for bipartite graphs can be considered to be simple versions of the stable
marriage problem.

Bipartite graph

A bipartite graph is an undirected graph G = (V, E) in which V can be partitioned into
two sets V1 and V2 such that (u, v) ∈ E implies either u ∈ V1 and v ∈ V2 or u ∈ V2 and
v ∈ V1. That is, all edges go between the two sets V1 and V2.

Example of a bipartite graph

(Diagram: vertices A to E in V1 and F to I in V2, with edges only between the two
sets. Vertices in V1 may represent a number of machines and vertices in V2 may
represent a number of tasks.)

Finding a maximum bipartite matching using Ford-Fulkerson method

(Diagram: the bipartite graph above with an added source s joined to every vertex of
V1 and an added sink t joined from every vertex of V2. Assume all edge capacities are
1.)

Recommended reading: CLRS, Chapter 26

The stable marriage problem

The stable marriage problem is not a network flow problem, but it is a matching
problem. The scenario is as follows:

We are given two sets, VM and VF (male and female), of the same size. Also, every man
v ∈ VM ranks every woman u ∈ VF and every woman u ∈ VF ranks every man in VM .

We will write u <v u′ if v would rather marry u than u′.

The stable marriage problem is to find a matching E ⊂ VM × VF such that if (v, u) and
(w, z) are in E, then either u <v z or w <z v.
Solution

The Gale-Shapley algorithm is a solution to the stable marriage problem that involves
a number of rounds. Heuristically, in each round every "unengaged" man proposes to his
most preferred woman that he has not already proposed to. If this woman is unengaged
or engaged to someone she prefers less than her new suitor, she breaks off her current
engagement and accepts the new proposal.

These iterations continue until everybody is engaged. It can be shown that at this
point we have a solution to the stable marriage problem.

Pseudo-Code

Let P be a listing of everybody's preferences.

procedure StableMarriage(VM , VF , P )
  E ← ∅
  while |E| < n
    for each v ∈ VM where ∀u, (v, u) ∉ E
      u ← v's next preference
      if (w, u) ∈ E and v <u w
        E ← E − {(w, u)} ∪ {(v, u)}
      else if ∀w, (w, u) ∉ E
        E ← E ∪ {(v, u)}
  return E
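Here is one possible Java sketch of the Gale-Shapley algorithm with the men proposing.
The preference-array representation and the names are illustrative assumptions; the
rankOf table lets a woman compare two suitors in constant time.

    import java.util.*;

    // Sketch of the Gale-Shapley stable marriage algorithm.
    // menPref[v] lists the women in man v's order of preference;
    // womenPref[u] likewise. Returns match[u] = the man engaged to woman u.
    public class StableMarriage {

        static int[] stableMarriage(int[][] menPref, int[][] womenPref) {
            int n = menPref.length;
            int[][] rankOf = new int[n][n];           // rankOf[u][v] = position of man v in u's list
            for (int u = 0; u < n; u++)
                for (int r = 0; r < n; r++)
                    rankOf[u][womenPref[u][r]] = r;

            int[] next = new int[n];                  // next[v] = index of v's next proposal
            int[] match = new int[n];                 // match[u] = current fiancé of woman u, or -1
            Arrays.fill(match, -1);
            Deque<Integer> free = new ArrayDeque<>(); // unengaged men
            for (int v = 0; v < n; v++) free.add(v);

            while (!free.isEmpty()) {
                int v = free.poll();
                int u = menPref[v][next[v]++];        // v proposes to his next preference
                if (match[u] == -1) {
                    match[u] = v;                     // u was unengaged: accept
                } else if (rankOf[u][v] < rankOf[u][match[u]]) {
                    free.add(match[u]);               // u prefers v: break the old engagement
                    match[u] = v;
                } else {
                    free.add(v);                      // u rejects v; v remains unengaged
                }
            }
            return match;
        }
    }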

Summary

1. A flow network is a directed, weighted graph with a source and a sink.

2. A flow assigns a real value to each edge in the flow network and satisfies the
   capacity constraint, skew symmetry and flow conservation.

3. The Ford-Fulkerson method solves the maximum flow problem by iteratively searching
   for augmenting paths in the residual flow network.

4. A cut is a partition of the vertices that separates the source from the sink.

Summary cont.

5. The max-flow min-cut theorem states that the maximum flow from the source to the
   sink is equal to the minimum capacity over all cuts.

6. Edmonds' and Karp's heuristics improve the performance of the Ford-Fulkerson
   method.

7. Flow networks can be used to solve some simple matching problems in bipartite
   graphs.

8. The Gale-Shapley algorithm can be used to solve the stable marriage matching
   problem.
Computer Science and Software Engineering, 2011

CITS3210 Algorithms
Computational Geometry

Notes by CSSE, Comics by xkcd.com

Computational Geometry

Computational Geometry is the study of algorithms for solving geometric problems. This
has applications in many areas such as graphics, robotics, molecular modelling,
forestry, statistics, meteorology,... basically any field that stores data as a set of
points in space.

In this topic we will restrict our attention to geometric problems in the
2-dimensional plane.

Geometric Objects

The types of objects we will use are:

• Points: a point (x, y) is represented using two Cartesian coordinates: x, y ∈ R. The
  origin (0, 0) is a point.

• Lines: a line p = (p1, p2) is represented by two points p1 ≠ p2 which represent the
  two end points of a line. Sometimes we may represent a line using only one point,
  with the understanding that the second point is the origin.

• Vectors: A vector v = (v1, v2) is a point interpreted as a direction. A vector v and
  a point (x, y) may represent the (directed) line segment (x, y), (x + v1, y + v2).

• Polygons: A polygon is a closed shape made up of straight edges (e1, e2, ..., ek)
  where ei is a line for i = 1, ..., k and the start point of e1 is the end point of
  ek.

(Diagram: examples of a line, a vector and a polygon.)
Geometric Problems

Given these basic structures we may be interested in:

• whether a point is on a line?

• how close is a point to a line?

• whether two lines intersect?

• where do two lines intersect?

• whether a point is inside a polygon?

• what is the area of a polygon?

• what is the smallest polygon that contains a number of points?

Is this point on a line?

(Diagram: a point and a line segment.)

Do these two lines intersect?

(Diagram: two line segments.)

Is this point inside the polygon?

(Diagram: a point and a polygon.)

Do any of these lines intersect?

(Diagram: a collection of line segments.)

The problem with geometry

The difficulty with geometric algorithms is that the "human" approach is so different
to the algorithmic approach. As humans we are able to represent the geometric objects
in a 2-dimensional plane, and a single glance at that plane is enough to determine
whether or not a point is on a line, or in a polygon.

However there is no algorithmic variation of a "glance", so we are required to look
for a mathematical approach to solving these problems.

It turns out that surprisingly little mathematics is required for the algorithms we
will be using, although some basic linear algebra is assumed. (You should know how to
add and subtract points in the plane, perform scalar multiplication, calculate
Euclidean distances and solve systems of linear equations).
The Cross-Product

In fact the geometric mathematics in the problems we have mentioned can generally be
reduced to the question:

With respect to point A, is point B to the left or right of point C?

Suppose p1 = (x1, y1) and p2 = (x2, y2) are two vectors (i.e. lines that start at the
origin and end at the given point). The cross product of these two vectors is the (3D)
vector that is perpendicular to both vectors, and has magnitude

p1 × p2 = x1y2 − x2y1

which in turn is equal to the signed area of the parallelogram with edges p1 and p2.

(Diagram: the parallelogram spanned by the vectors p1 and p2.)

Simplifying the cross product

The mathematical definition and properties of the cross product are all very
interesting, but the only thing we need to worry about is the sign of p1 × p2: if it
is positive, then p1 is to the right of p2; if it is 0, p1 and p2 are on the same
line; and if it is negative p1 is to the left of p2 (all with respect to the origin).

To this end, let's define a function for the direction of an angle

Dir(p0, p1, p2) = (p1 − p0) × (p2 − p0)

The only thing you need to be careful of is that you get the order of the points
right.

The right hand rule

(Diagram: the right hand rule for the sign of the cross product, with positive values
on one side of the reference vector and negative values on the other.)

(Which version of the rule you use isn't important as long as it is applied
consistently).

Using the Cross-Product

The first question we can answer is whether a point, p0, is on a line, (p1, p2).

procedure OnLine(p0, p1, p2)
  if Dir(p0, p1, p2) = 0
    if min(p1.x, p2.x) ≤ p0.x ≤ max(p1.x, p2.x)
       and min(p1.y, p2.y) ≤ p0.y ≤ max(p1.y, p2.y)
      return true
    else return false
  else return false

(Diagram: a point p0 lying on the segment from p1 to p2.)

You should always check the specification carefully to see if the end-points of a
segment should be included. There are similar boundary conditions for most of the
following algorithms.
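A small Java sketch of these primitives is given below. The double coordinates and the
method names are our own assumptions; a real implementation would typically need an
epsilon or exact arithmetic for the collinearity test.

    // Sketch of the cross-product primitives used throughout this topic.
    public class Geometry {

        // cross product (p1 - p0) x (p2 - p0): positive, zero or negative
        // according to the turn made at p0 from p1 towards p2
        static double dir(double[] p0, double[] p1, double[] p2) {
            return (p1[0] - p0[0]) * (p2[1] - p0[1])
                 - (p2[0] - p0[0]) * (p1[1] - p0[1]);
        }

        // is p0 on the segment with end points p1 and p2?
        static boolean onLine(double[] p0, double[] p1, double[] p2) {
            if (dir(p0, p1, p2) != 0) return false;   // not collinear
            return Math.min(p1[0], p2[0]) <= p0[0] && p0[0] <= Math.max(p1[0], p2[0])
                && Math.min(p1[1], p2[1]) <= p0[1] && p0[1] <= Math.max(p1[1], p2[1]);
        }
    }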
Closest point on a line

A more general version of the problem above is: what is the closest point on a line
segment to a given point? This can be solved in a number of ways, including binary
search, or using trigonometry. The following solution makes use of some very basic
calculus to determine the closest point on the line (p1, p2) to the point p0.

procedure ClosestPoint(p0, p1, p2)
  a ← p2.x − p1.x, b ← p2.y − p1.y
  t ← (a(p0.x − p1.x) + b(p0.y − p1.y)) / (a² + b²)
  if t > 1
    return p2
  else if t < 0
    return p1
  else
    return ((p1.x + ta), (p1.y + tb))

Diagram

(Diagram: the point p0, the segment from p1 to p2, and the foot of the perpendicular
from p0 onto the segment at parameter t.)
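The same calculation in Java might look as follows; points are represented as {x, y}
arrays, which is an assumption made for this sketch, and p1 ≠ p2 is assumed as in the
definition of a line.

    // Sketch of ClosestPoint: project p0 onto the line through p1 and p2
    // and clamp the parameter t to the segment.
    public class ClosestPoint {

        static double[] closestPoint(double[] p0, double[] p1, double[] p2) {
            double a = p2[0] - p1[0], b = p2[1] - p1[1];
            double t = (a * (p0[0] - p1[0]) + b * (p0[1] - p1[1])) / (a * a + b * b);
            if (t > 1) return p2;                 // beyond the p2 end of the segment
            if (t < 0) return p1;                 // beyond the p1 end of the segment
            return new double[]{p1[0] + t * a, p1[1] + t * b};
        }
    }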

Dot products

A related concept is the dot product of two vectors v = (x1, y1) and u = (x2, y2), which is calculated as x1x2 + y1y2.

The dot product is the length of the vector v when it is projected orthogonally onto the vector u multiplied by the length of u (or vice versa).

(Figure: vectors u and v; the orthogonal projection has length u·v/|v|.)

It is useful to know, but be careful not to confuse it with the cross product.

Intersecting Lines

The next question we can answer is whether two lines, (p0, p1) and (p2, p3) intersect. (To simplify the code we will suppose intersect means the lines cross properly, rather than touch at an end-point, and the endpoints of each line are sorted lexicographically).

procedure Crosses(p0, p1, p2, p3)
    d ← Dir(p0, p1, p2) × Dir(p0, p1, p3)
    if d < 0
        return true
    else return false

Crosses is true if p2 and p3 are not on the same side of (p0, p1). However this is not enough to decide whether the lines intersect.
15 16
Intersecting Line Segments

Crosses(p, q) actually returns whether the line segment q intersects the infinite extension of p. We can reuse this method to see if the two segments intersect.

procedure Intersects(p0, p1, p2, p3)
    if Crosses(p0, p1, p2, p3)
        if Crosses(p2, p3, p0, p1)
            return true
        else return false
    else return false

(Figure: the segments (p0, p1) and (p2, p3) crossing properly.)

Any Segments Intersect

The methods we have seen so far barely deserve to be called algorithms. They are simply achieved through basic arithmetic and comparisons.

However, as with all things computer, once we know how to do one thing well we want to know if we can do it well many times all at once....

Now suppose we are given a set of lines, and we would like to know whether any two lines in the set intersect.

Here we adopt something akin to a human approach, where we scan all of the points making up the line segments, inspecting each to see if there is an intersection.
17 18
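In Java the two tests might look like the sketch below, building on the dir sketch from earlier; the names are again ours for illustration. As the notes say, these only detect proper crossings and ignore segments that merely touch.

    // True if p2 and p3 lie strictly on opposite sides of the
    // infinite line through p0 and p1.
    static boolean crosses(Point p0, Point p1, Point p2, Point p3) {
        return dir(p0, p1, p2) * dir(p0, p1, p3) < 0;
    }

    // True if the segments (p0, p1) and (p2, p3) cross properly.
    static boolean intersects(Point p0, Point p1, Point p2, Point p3) {
        return crosses(p0, p1, p2, p3) && crosses(p2, p3, p0, p1);
    }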

The sweep line

To process all the line segments efficiently we imagine a sweep line moving from left to right.

Every time it reaches the start or the end of a line segment it triggers an event, where we must check to see if an intersection has occurred.

(Figure: segments a–h crossed by a vertical sweep line; the segments currently crossing it, from top to bottom, are ⟨d, e, f, g⟩.)

As the sweep line progresses from left to right, we are required to maintain an ordered list of the line segments that cross the sweep line, ordered from top to bottom (with respect to their first point).

19 20
Events

There are two types of events we must consider: when we encounter the first point of a line segment, and when we encounter the second point of a line segment:

• When we encounter the first point of a line segment, we insert the line segment into our ordered list of line segments. When we do this we should check to see if the new line segment intersects the line segment directly above it or below it in the list.

• When we encounter the end point of a line segment, we remove the segment from the ordered list. This will cause the line segment above and below the removed segment to become adjacent, so we must check if they intersect.

Pseudo-code

Suppose that S = {s1, s2, . . . , sn} is a set of line segments, where si = (pi, qi), and let T be an ordered list.

Sort the set of points {pi, qi | si ∈ S}
for each point r in the sorted list
    do if r = pi
        then INSERT(T, si)
            if ABOVE(T, si) exists and intersects si
                then return true
            if BELOW(T, si) exists and intersects si
                then return true
    else if r = qi
        then if ABOVE(T, si) and BELOW(T, si) exist and intersect
            then return true
        DELETE(T, si)
return false

21 22
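A possible Java sketch of this any-intersection test is given below. It follows the simplified description above: segments are kept in a java.util.TreeMap keyed by the y-coordinate of their first (left) end-point, and ABOVE/BELOW are the map's neighbouring entries. The Segment class and all names are ours, the Point class and intersects method are the earlier sketches, and the boundary cases raised on the next slide (shared end-points, equal keys, vertical segments, coincident event x-values) are ignored; a full implementation would order segments by their y-value at the current sweep position rather than at their first point.

    import java.util.*;

    class Segment {
        Point p, q;                                  // p is the left (first) end-point
        Segment(Point p, Point q) { this.p = p; this.q = q; }
    }

    // True if any two of the given segments cross properly.
    static boolean anyIntersect(List<Segment> s) {
        List<double[]> events = new ArrayList<>();   // {x, 0 = left / 1 = right, index}
        for (int i = 0; i < s.size(); i++) {
            events.add(new double[]{s.get(i).p.x, 0, i});
            events.add(new double[]{s.get(i).q.x, 1, i});
        }
        events.sort(Comparator.comparingDouble(e -> e[0]));

        TreeMap<Double, Segment> t = new TreeMap<>();      // segments on the sweep line
        for (double[] e : events) {
            Segment si = s.get((int) e[2]);
            double key = si.p.y;
            if (e[1] == 0) {                               // left end-point: INSERT
                t.put(key, si);
                if (meets(t.higherEntry(key), si) || meets(t.lowerEntry(key), si))
                    return true;
            } else {                                       // right end-point: DELETE
                Map.Entry<Double, Segment> a = t.higherEntry(key);
                Map.Entry<Double, Segment> b = t.lowerEntry(key);
                if (a != null && b != null
                        && intersects(a.getValue().p, a.getValue().q,
                                      b.getValue().p, b.getValue().q))
                    return true;
                t.remove(key);
            }
        }
        return false;
    }

    static boolean meets(Map.Entry<Double, Segment> n, Segment si) {
        return n != null && intersects(n.getValue().p, n.getValue().q, si.p, si.q);
    }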

Correctness

To see the correctness of this algorithm we need to show the following three statements:

1. If two line segments si and sj intersect, there is some event point x, such that at x si is next to sj in the list (say si and sj are neighbours).

2. In between event points, the only way new neighbours may arise is if already neighbouring line segments intersect.

3. At an event point, the only way new neighbours may arise is by adding a line segment (creating up to two new pairs of neighbours), or by removing a line segment (creating up to one new pair of neighbours).

Other considerations?

From the previous statements we are able to conclude the algorithm is correct. However:

• What if lines start and end at the same sweep line. Does the proof still work?

• What simplifying assumption does this algorithm make?

• How would you change this algorithm if you actually needed to return all intersections?

23 24
Complexity

Assuming there are N line segments there are 2N event-points to deal with.

• These points must be sorted (using, say heapsort). This takes O(N lg N) time.

• Then, for each point we need to either:

    1. insert the corresponding segment into an ordered list (finding the segments above and below) O(lg N), and calculate whether the segments intersect O(1); or

    2. find the segments above and below (O(lg N)), calculate whether they intersect (O(1)), and delete the segment from the ordered list (O(lg N)).

This gives overall performance of O(N lg N).

Note that the insertions, deletions and find operations on the ordered list require an efficient implementation, such as a Red-Black tree (java.util.TreeMap) or an AVL tree.

Convex Hull

The problem of the convex hull is to find the smallest convex polygon that contains a set of points.

A shape is convex if any line segment between two points inside the shape remains inside the shape.

The convex hull algorithm we will examine is known as Graham's scan. Like the all-segments intersection algorithm, it is based on a sweep, but this time it is a rotational sweep, rather than a linear sweep.

The algorithm is based on the fact that you can walk clockwise round the perimeter of a convex hull, and every point of the set will always be on your right.
25 26

The Convex Hull

Graham's Scan

Graham's scan starts with the (bottom) left-most point of the set, and processes the points in order from left to right (with respect to that point). Then each point is examined and if there is no known point to its left (with respect to any point examined so far) it is a candidate for the convex hull. These candidate points are kept on a stack. As every new point is examined, either:

1. there is only one point already in the stack


in which case the new point is added.

2. it is to the right of the line segment


through the top two points on the stack, in
which case it is pushed on to the stack; or

3. it is to the left of the line segment through


the top two points on the stack, in which
case the top element of the stack is
popped off, and we repeat the test.

After every point has been processed, the


remaining points on the stack form the convex
hull.
27 28
Graham's Scan

(Figure: ten points labelled 0–9 in order of their angle around the left-most point 0.)

Pseudo-code

Let P be a set of points {p1, ..., pn} and S be a stack.

procedure Graham-Scan(P)
    Find the left-most point p0 in P
    Sort the set of points P − {p0}
        according to their angle around p0
    for each point p in the sorted list
        do if |S| = 1
            then PUSH(S, p)
        else q1 ← POP(S), q0 ← POP(S)
            while DIR(q0, q1, p) < 0
                do q1 ← q0, q0 ← POP(S)
            PUSH(S, q0), PUSH(S, q1), PUSH(S, p)
    RETURN S.

29 30
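For comparison, here is a hedged Java sketch of Graham's scan using an ArrayDeque as the stack and the earlier dir sketch. To keep the sign reasoning simple it builds the hull counterclockwise from the bottom-most point, which is the mirror image of the clockwise walk described in the notes; all names are ours, ties in angle are broken by distance, and degenerate inputs (all points collinear, repeated points) are not handled carefully.

    import java.util.*;

    static List<Point> grahamScan(List<Point> pts) {
        // Pivot: the bottom-most (then left-most) point, which must be on the hull.
        Point p0 = Collections.min(pts, Comparator
                .comparingDouble((Point p) -> p.y).thenComparingDouble(p -> p.x));

        // Sort the other points by angle around p0 without computing any angles:
        // dir(p0, a, b) > 0 means b is further round (counterclockwise) than a.
        List<Point> sorted = new ArrayList<>(pts);
        sorted.remove(p0);
        sorted.sort((a, b) -> {
            double d = dir(p0, a, b);
            if (d > 0) return -1;
            if (d < 0) return 1;
            return Double.compare(dist2(p0, a), dist2(p0, b));
        });

        Deque<Point> stack = new ArrayDeque<>();
        stack.push(p0);
        for (Point p : sorted) {
            while (stack.size() >= 2) {
                Point top = stack.pop();
                Point under = stack.peek();
                if (dir(under, top, p) > 0) {     // left turn: top stays on the hull
                    stack.push(top);
                    break;
                }                                 // otherwise discard top and re-test
            }
            stack.push(p);
        }
        return new ArrayList<>(stack);            // hull vertices, most recent first
    }

    static double dist2(Point a, Point b) {
        double dx = a.x - b.x, dy = a.y - b.y;
        return dx * dx + dy * dy;
    }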

Correctness (sketch)

The algorithm returns a stack of points. If we list these points in order (wrapping back to the start vertex) we get the edges of a polygon.

We now must show two things:

1. The polygon is convex: This follows from the fact that the algorithm ensures that each corner of the polygon turns right (for a clockwise direction).

2. The polygon contains every point in P: Every point p is added to S, and points are only removed if we find an edge p1p2 such that the triangle p0p1p2 contains p. As p0, p1, p2 will then appear in the stack, p will be contained in the polygon.

Complexity

We can find the left-most point in time O(n).

We can sort the points according to their angle around p0 in time O(n lg n). Note we do not have to calculate the angle to do this. We can just do a normal sort, but instead of using < for comparison, we can use the DIR function.

The algorithm then has two nested loops, each potentially going through n iterations. However, we may note that the inner loop is popping elements off the stack. As each element is added to the stack exactly once, this operation cannot be performed more than n times.

Therefore the total complexity is O(n lg n)

31 32
Closest pair of points

Finding the closest pair of points in a set of points is another useful algorithm that demonstrates a useful technique for computational geometry - divide and conquer.

The problem assumes we are given a set of points P, and we are required to return the two points that are closest to one another with respect to Euclidean distance.

The idea of the algorithm is to split the set of points P into two halves, PL and PR. We can then recursively solve the problem for the two halves and then combine the solution (take the minimum pairing). But what if there is one point in PL and one point in PR that are closer to one another? We need a way to quickly check if there are any closer pairs crossing the partition.

(Figure: the points split into PL and PR, with a strip of width d on either side of the dividing line.)

33 34

Checking Points Across Partitions

If we have solved the Closest Pair of Points problem for PL and PR then we know the minimum distance between any pair of points in either partition. Let this distance be δ. Therefore, we only need to check if points within a δ-width strip on either side of the divide are closer.

Furthermore, we know all the points on either side of the 2δ-width strip must be at least a distance of δ from one another. We can use this fact to show that each point in the strip only needs to be compared to the 5 subsequent points (ordered from top to bottom).

Pseudo-code

Let P = {p1, ..., pn} be a set of points. We will just give the method for finding the closest distance. We will assume that P is sorted by x-coordinate, and also that we have access to a copy of P, Q, that is sorted by y-coordinate (with an inverse index).

procedure ClosestPair(P)
    Split P into PL (the n/2 leftmost points)
        and PR (the other points)
    δ = min{ClosestPair(PL), ClosestPair(PR)}
    For each point p in P
        if p.x is within δ of pn/2.x
            then add p to A
    For each point qi in A in order of qi.y
        For j = 1, ..., 5
            δ = min{δ, DIST(qi, qi+j)}
    RETURN δ

35 36
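The sketch below is a possible Java version of this scheme. To stay short it re-sorts the strip by y-coordinate at every level, which gives O(n lg² n) rather than the O(n lg n) obtained by keeping the presorted copy Q as the notes describe; all names are ours, and the constant number of comparisons per strip point is enforced by the "within δ in y" test rather than by counting 5 neighbours.

    import java.util.*;

    static double closestPair(List<Point> pts) {
        List<Point> byX = new ArrayList<>(pts);
        byX.sort(Comparator.comparingDouble(p -> p.x));
        return closest(byX, 0, byX.size());
    }

    // Closest distance among byX[lo..hi), where byX is sorted by x.
    static double closest(List<Point> byX, int lo, int hi) {
        int n = hi - lo;
        if (n <= 1) return Double.POSITIVE_INFINITY;
        if (n <= 3) {                                    // brute force small cases
            double d = Double.POSITIVE_INFINITY;
            for (int i = lo; i < hi; i++)
                for (int j = i + 1; j < hi; j++)
                    d = Math.min(d, dist(byX.get(i), byX.get(j)));
            return d;
        }
        int mid = lo + n / 2;
        double midX = byX.get(mid).x;
        double delta = Math.min(closest(byX, lo, mid), closest(byX, mid, hi));

        // Points within delta of the dividing line, sorted by y.
        List<Point> strip = new ArrayList<>();
        for (int i = lo; i < hi; i++)
            if (Math.abs(byX.get(i).x - midX) < delta) strip.add(byX.get(i));
        strip.sort(Comparator.comparingDouble(p -> p.y));

        // Each strip point only needs to be checked against the few points
        // that follow it within delta in the y direction.
        for (int i = 0; i < strip.size(); i++)
            for (int j = i + 1; j < strip.size()
                    && strip.get(j).y - strip.get(i).y < delta; j++)
                delta = Math.min(delta, dist(strip.get(i), strip.get(j)));
        return delta;
    }

    static double dist(Point a, Point b) {
        return Math.sqrt((a.x - b.x) * (a.x - b.x) + (a.y - b.y) * (a.y - b.y));
    }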
Analysis

The divide and conquer strategy is easily seen to be valid, however the algorithm described above uses a number of optimizations, such as presorting and examining a relatively small set of pairs of points. See CLRS chapter 33 for a justification of these optimizations.

The complexity of O(n lg n) can be shown using the recurrence:

T(n) = 2T(n/2) + O(n)

(see for example merge-sort). What optimizations do we require to ensure that the divide and merge can be performed in time O(n)?

Point inside a polygon

Suppose we are given a polygon (as a set of points P = (p0, p1, ...pn) where p0 = pn) and we are required to determine whether a point q is inside the polygon or not.

This algorithm is relatively simple. We take a line segment starting at q and ending at a point we know to be outside the polygon. Then we count the number of times the line segment crosses an edge of the polygon. Every time it crosses an edge, the line segment goes from inside the polygon to outside, or from outside the polygon to inside. Therefore if there are an odd number of such crossings, the point is inside, otherwise it is outside.

37 38

Ray casting

(Figure: a ray cast from q to a point outside the polygon; its crossings with the boundary are numbered 1–5.)

Pseudo-code

Given a simple polygon P = (p0, p1, ..., pn) where p0 = pn and a point q we determine whether q is in P as follows:

procedure Pt-In-Poly(P, q)
    Find a point r outside the polygon
        e.g. ((min X) − 1, 0)
    set c ← 0
    for i = 1 to n
        do if Intersects(r, q, pi−1, pi)
            then c ← c + 1
    if c is even return false
    else return true.

You should be careful how the Intersects method treats lines touching at a point, or along an interval. The Intersects method provided in these notes requires that the lines properly cross, but the rare case where the ray passes through the corner of a polygon still requires careful treatment to decide what is appropriate in this context.

This special case, along with correctness and completeness, are left as exercises.
39 40
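A minimal Java sketch of Pt-In-Poly, reusing the earlier Point and intersects sketches (the names are ours). As the notes warn, a ray that passes exactly through a polygon corner is not treated specially here.

    // Point-in-polygon by ray casting. poly holds p0..pn with poly[0]
    // equal to poly[n].
    static boolean pointInPolygon(Point[] poly, Point q) {
        double minX = poly[0].x;
        for (Point p : poly) minX = Math.min(minX, p.x);
        Point r = new Point(minX - 1, 0);       // a point guaranteed to be outside
        int c = 0;
        for (int i = 1; i < poly.length; i++)
            if (intersects(r, q, poly[i - 1], poly[i])) c++;
        return c % 2 == 1;                      // odd number of crossings: inside
    }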
Area of a polygon

We will assume that we are given a polygon P that is not self intersecting, and furthermore we shall suppose that the polygon is represented by a sequence of points {p0, p1, ..., pn} where (pi, pi+1) are the edges in clockwise order, and p0 = pn.

The area of a polygon can be computed by breaking it up into a series of triangles. The area may then be computed by calculating the sum:

area = 1/2 Σ (xi+1yi − xiyi+1), the sum running over i = 0, . . . , n − 1

The derivation follows from the characterisation of the cross-product as the area of the parallelogram created by the adjacent vectors.

A final note

We can see computational geometry has as much in common with graph algorithms as it does with maths. However, a main point of difference is the number of boundary cases to consider:

Always check:

1. Can you rearrange a sum to reduce round off error?

2. Do you really need to do (real valued) division?

3. Have you considered all boundary cases: points touching, co-linear points, 0-area polygons?

4. What degree of accuracy is required?
41 42

Summary

We have examined:

• 2D Geometric objects.

• Cross-products, dot products and the right


hand rule.

• Closest point to a line, intersecting line


segments.

• Sweep lines:

– All intersecting segments.

– Convex hull.

• Divide and Conquer: closest pair of points.

• Ray casting: Point inside a polygon.

• Useful functions: Area of a polygon

43
Computer Science and Software Engineering, 2011

Overview

In this topic we will look at pattern-matching


algorithms for strings. Particularly, we will look
CITS3210 Algorithms at the Rabin-Karp algorithm, the
Knuth-Morris-Pratt algorithm, and the
String and File Processing Boyer-Moore algorithm.

We will also consider a dynamic programming


solution to the Longest Common Subsequence
problem.

Finally we will examine some file compression


algorithms, including Huffman coding, and the
Ziv Lempel algorithms.

Notes by CSSE, Comics by xkcd.com


1 2

Pattern Matching

We consider the following problem. Suppose T is a string of length n over a finite alphabet Σ, and that P is a string of length m over Σ.

The pattern-matching problem is to find occurrences of P within T. Analysis of the problem varies according to whether we are searching for all occurrences of P or just the first occurrence of P.

For example, suppose that we have Σ = {a, b, c} and

T = abaaabacccaabbaccaababacaababaac
P = aab

Our aim is to find all the substrings of the text that are equal to aab.

Matches

String-matching clearly has many important applications — text editing programs being only the most obvious of these. Other applications include searching for patterns in DNA sequences or searching for particular patterns in bit-mapped images.

We can describe a match by giving the number of characters s that the pattern must be shifted along the text in order for every character in the shifted pattern to match the corresponding text characters. We call this number a valid shift.

abaaabacccaabbaccaababacaababaac
   aab

Here we see that s = 3 is a valid shift.

3 4
The naive method

The naive pattern matcher simply considers every possible shift s in turn, using a simple loop to check if the shift is valid.

When s = 0 we have

abaaabacccaabbaccaababacaababaac
aab

which fails at the second character of the pattern.

When s = 1 we have

abaaabacccaabbaccaababacaababaac
 aab

which fails at the first character of the pattern.

Eventually this will succeed when s = 3.

Analysis of the naive method

In the worst case, we might have to examine each of the m characters of the pattern at every candidate shift.

The number of possible shifts is n − m + 1 so the worst case takes

m(n − m + 1)

comparisons.

The naive string matcher is inefficient because when it checks the shift s it makes no use of any information that might have been found earlier (when checking previous shifts).

For example if we have

000000001000001000000010000000
000000001

then it is clear that no shift s ≤ 9 can possibly work.
5 6

Rabin-Karp algorithm
Rabin-Karp continued
The naive algorithm basically consists of two
nested loops — the outermost loop runs
Thus to try the shift s = 0, instead of
through all the n − m + 1 possible shifts, and
comparing
for each such shift the innermost loop runs
through the m characters seeing if they match. 1−7−6
against
Rabin and Karp propose a modified algorithm
that tries to replace the innermost loop with a 1−2−2
single comparison as often as possible. character by character, we simply do one
operation comparing 176 against 122.
Consider the following example, with alphabet
being decimal digits. It takes time O(m) to compute the value 176
from the string of characters in the pattern P .
122938491281760821308176283101
176
However it is possible to compute all the
n − m + 1 decimal values from the text just in
Suppose now that we have computer words
time O(n), because it takes a constant number
that can store decimal numbers of size less
of operations to get the “next” value from the
than 1000 in one word (and hence compare
previous.
such numbers in one operation).

Then we can view the entire pattern as a To go from 122 to 229 only requires dropping
single decimal number and the substrings of the 1, multiplying by 10 and adding the 9.
the text of length m as single numbers.
7 8
Rabin-Karp formalized

Being a bit more formal, let P[1..m] be an array holding the pattern and T[1..n] be an array holding the text.

We define the values

p = P[m] + 10P[m − 1] + · · · + 10^(m−1)P[1]

ts = T[s + m] + 10T[s + m − 1] + · · · + 10^(m−1)T[s + 1]

Then clearly the pattern matches the text with shift s if and only if ts = p.

The value ts+1 can be calculated from ts easily by the operation

ts+1 = 10(ts − 10^(m−1)T[s + 1]) + T[s + m + 1]

If the alphabet is not decimal, but in fact has size d, then we can simply regard the values as d-ary integers and proceed as before.

But what if the pattern is long?

This algorithm works well, but under the unreasonable restriction that m is sufficiently small that the values p and {ts | 0 ≤ s ≤ n − m} all fit into a single word.

To make this algorithm practical Rabin and Karp suggested using one-word values related to p and ts and comparing these instead. They suggested using the values

p′ = p mod q

and

t′s = ts mod q

where q is some large prime number but still sufficiently small that dq fits into one word.

Again it is easy to see that t′s+1 can be computed from t′s in constant time.

9 10
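A hedged Java sketch of the whole Rabin-Karp matcher follows (the next slides give the pseudo-code and a worked example). The base d = 256 treats characters as bytes and the particular prime q is an arbitrary choice of ours; a spurious hit is eliminated by the explicit regionMatches check.

    import java.util.*;

    // Returns every valid shift of pattern P in text T.
    static List<Integer> rabinKarp(String T, String P) {
        final int d = 256;
        final long q = 1_000_000_007L;           // a large prime, chosen arbitrarily
        int n = T.length(), m = P.length();
        List<Integer> shifts = new ArrayList<>();
        if (m == 0 || m > n) return shifts;

        long h = 1;                              // h = d^(m-1) mod q
        for (int i = 0; i < m - 1; i++) h = (h * d) % q;

        long p = 0, t = 0;                       // p' and t'_0
        for (int i = 0; i < m; i++) {
            p = (p * d + P.charAt(i)) % q;
            t = (t * d + T.charAt(i)) % q;
        }
        for (int s = 0; s <= n - m; s++) {
            // Possible hit: verify character by character to rule out spurious hits.
            if (p == t && T.regionMatches(s, P, 0, m)) shifts.add(s);
            if (s < n - m)                       // compute t'_{s+1} from t'_s
                t = ((t - T.charAt(s) * h % q + q) % q * d + T.charAt(s + m)) % q;
        }
        return shifts;
    }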

The whole algorithm

If t′s ≠ p′ then the shift s is definitely not valid, and can thus be rejected with only one comparison. If t′s = p′ then either ts = p and the shift s is valid, or ts ≠ p and we have a spurious hit.

The entire algorithm is thus:

Compute p′ and t′0
for s ← 0 to n − m do
    if p′ = t′s then
        if T[s + 1..s + m] = P[1..m] then
            output "shift s is valid"
        end if
    end if
    Compute t′s+1 from t′s
end for

The worst case time complexity is the same as for the naive algorithm, but in practice where there are few matches, the algorithm runs quickly.

Example

Suppose we have the following text and pattern

5 4 1 4 2 1 3 5 6 2 1 4 1 4
4 1 4

Suppose we use the modulus q = 13, then p′ = 414 mod 13 = 11.

What are the values t′0, t′1, etc associated with the text?

[5 4 1] 4 2 1 3 5 6 2 1 4 1 4    t′0 = 8
5 [4 1 4] 2 1 3 5 6 2 1 4 1 4    t′1 = 11

This is a genuine hit, so s = 1 is a valid shift.

5 4 [1 4 2] 1 3 5 6 2 1 4 1 4    t′2 = 12

We get one spurious hit in this search:

5 4 1 4 2 1 3 5 6 2 [1 4 1] 4    t′10 = 11
11 12
A finite automaton
Finite automata
Consider the following 4-state automaton:
Recall that a finite automaton M is a 5-tuple
(Q, q0, A, Σ, δ) where
Q = {q0, q1, q2, q3}
A = {q3}
• Q is a finite set of states Σ = {0, 1}
δ is given by the following table
• q0 ∈ Q is the start state

• A ⊆ Q is a distinguished set of accepting q 0 1


states q0 q0 q1
q1 q0 q3
• Σ is a finite input alphabet q2 q0 q3
q3 q0 q0
• δ : Q × Σ → Q is a function called the
transition function

q2 1" q3
!

Initially the finite automaton is in the start !


!
!
state q0. It reads characters from the input !
!
string x one at a time, and changes states 0 !
!
"! ! 01 !"
according to the transition function. Whenever !
! 1
!
the current state is in A, the set of accepting !
!
!
states, we say that M has accepted the string !
! 0
!
"
read so far. 0 q0 q1
!"
"
"
!
1
13 14

A string matching automaton


Skipping invalid shifts
We shall devise a string matching automaton
such that M accepts any string that ends with The reason that we can immediately eliminate
the pattern P . Then we can run the text T the shift s + 1 is that we have already
through the automaton, recording every time examined the following characters (while trying
the machine enters an accepting state, thereby the shift s)
determining every occurrence of P within T .
a b
! b "#
a b b$ a b a a
To see how we should devise the string
matching automaton, let us consider the naive and it is immediate that the pattern does not
algorithm at some stage of its operation, when start like this, and hence this shift is invalid.
trying to find the pattern abbabaa.
To determine the smallest shift that is
a b b a b b a b a a
consistent with the characters examined so far
a b b a b a a
we need to know the answer to the question:
Suppose we are maintaining a counter
indicating how many pattern characters have “What is the longest suffix of this string that is
matched so far — this shift s fails at the 6th also a prefix of the pattern P ?”
character. Although the naive algorithm would
suggest trying the shift s + 1 we should really In this instance we see that the last 3
try the shift s + 3 next. characters of this string match the first 3 of
the pattern, so the next feasible shift is
a b b a b b a b a a s + 6 − 3 = s + 3.
a b b a b a a
15 16
The states

For a pattern P of length m we devise a string


matching automaton as follows: The transition function

The states will be Now suppose, for example, that the automaton
is given the string
Q = {0, 1, . . . , m}
where the state i corresponds to Pi, the a b b a b b ···
leading substring of P of length i.
The first five characters match the pattern, so
The start state q0 = 0 and the only accepting the automaton moves from state 0, to 1, to 2,
state is m. to 3, to 4 and then 5. After receiving the sixth
character b which does not match the pattern,
a b b a b a a what state should the automaton enter?
! ! ! ! ! ! !
0 " 1 " 2 " 3 " 4 " 5 " 6 " 7
As we observed earlier, the longest suffix of this
This is only a partially specified automaton, string that is a prefix of the pattern abbabaa
but it is clear that it will accept the pattern P . has length 3, so we should move to state 3,
indicating that only the last 3 characters read
We will specify the remainder of the match the beginning of the pattern.
automaton so that it is in state i if the last i
characters read match the first i characters of
the pattern.

17 18

The entire automaton Using the automaton

We can express this more formally: The automaton has the following transition
function:
If the machine is in state q and receives a q a b
character c, then the next state should be q ! 0 1 0
where q ! is the largest number such that Pq! is 1 1 2
a suffix of Pq c. 2 1 3
3 4 0
Applying this rule we get the following finite 4 1 5
5 6 3
state automaton to match the string abbabaa.
6 7 2
7 1 2
! "
b
! " "
! "
a Use it on the following string
a b
a b b a b a a ababbbabbabaabbabaabaaabababbabaabbabbaa
0 1 2 3 4 5 6 7
#$ b #$ a a b b
% & Character a b a b b b a b b a b
% & Old state 0 1 2 1 2 3 0 1 2 3 4
% &
New state 1 2 1 2 3 0 1 2 3 4 5

Compressing this information:


By convention here all the horizontal edges are
pointing to the right, while all the curved line
ababbbabbabaabbabaabaaabababbabaabbabbaa
segments are pointing to the left.
1212301234567234567211121212345672345341
19 20
Analysis and implementation

Given a pattern P we must first compute the Regular expressions


transition function. Once this is computed the
time taken to find all occurrences of the
pattern in a text of length n is just O(n) —
each character is examined precisely once, and
no “backing-up” in the text is required. This
makes it particularly suitable when the text
must be read in from disk or tape and cannot
be totally stored in an array.

The time taken to compute the transition


function depends on the size of the alphabet,
but can be reduced to O(m|Σ|), by a clever
implementation.

Therefore the total time taken by the program


is O(n + m|Σ|)

Recommended reading: CLRS, Chapter 32,


pages 906–922

21 22

Knuth-Morris-Pratt The prefix function

The Knuth-Morris-Pratt algorithm is a Let us return to our example where we are


variation on the string matching automaton matching the pattern abbabaa.
that works in a very similar fashion, but
eliminates the need to compute the entire
Suppose as before that we are matching this
transition function.
against some text and that we detect a
mismatch on the sixth character.
In the string matching automaton, for any
state the transition function gives |Σ| possible a b b a b x y z
destinations—one for each of the |Σ| possible a b b a b a a
characters that may be read next.
In the string-matching automaton we used
The KMP algorithm replaces this by just two information about what the actual value of x
possible destinations — depending only on was, and moved to the appropriate state.
whether the next character matches the
pattern or does not match the pattern. In KMP we do exactly the same thing except
that we do not use the information about the
As we already know that the action for a value of x — except that it does not match
matching character is to move from state q to the pattern. So in this case we simply consider
q + 1, we only need to store the state changes how far along the pattern we could be after
required for a non-matching character. This reading abbab — in this case if we are not at
takes just one array of length m, and we shall position 5 the next best option is that we are
see that it can be computed in time O(m). at position 2.

23 24
The KMP algorithm

The prefix function π then depends entirely on the pattern and is defined as follows: π(q) is the largest k < q such that Pk is a suffix of Pq.

The KMP algorithm then proceeds simply:

q ← 0
for i from 1 to n do
    while q > 0 and T[i] ≠ P[q + 1]
        q ← π(q)
    end while
    if P[q + 1] = T[i] then
        q ← q + 1
    end if
    if q = m then
        output "shift of i − m is valid"
        q ← π(q)
    end if
end for

This algorithm has nested loops. Why is it linear rather than quadratic?

Heuristics

Although the KMP algorithm is asymptotically linear, and hence best possible, there are certain heuristics which in some commonly occurring cases allow us to do better.

These heuristics are particularly effective when the alphabet is quite large and the pattern quite long, because they enable us to avoid even looking at many text characters.

The two heuristics are called the bad character heuristic and the good suffix heuristic.

The algorithm that incorporates these two independent heuristics is called the Boyer-Moore algorithm.

25 26
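A compact Java sketch of KMP is given below, with the prefix function stored 1-indexed so that it matches the definition above; the method names are ours.

    import java.util.*;

    // pi[q] is the length of the longest proper prefix of P that is a
    // suffix of the first q characters of P.
    static int[] prefixFunction(String P) {
        int m = P.length();
        int[] pi = new int[m + 1];               // pi[0] and pi[1] are 0
        int k = 0;
        for (int q = 2; q <= m; q++) {
            while (k > 0 && P.charAt(k) != P.charAt(q - 1)) k = pi[k];
            if (P.charAt(k) == P.charAt(q - 1)) k++;
            pi[q] = k;
        }
        return pi;
    }

    static List<Integer> kmpMatch(String T, String P) {
        int n = T.length(), m = P.length();
        int[] pi = prefixFunction(P);
        List<Integer> shifts = new ArrayList<>();
        int q = 0;                               // number of characters matched
        for (int i = 1; i <= n; i++) {
            while (q > 0 && T.charAt(i - 1) != P.charAt(q)) q = pi[q];
            if (T.charAt(i - 1) == P.charAt(q)) q++;
            if (q == m) {                        // a match ends at position i
                shifts.add(i - m);
                q = pi[q];
            }
        }
        return shifts;
    }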

The algorithm without the heuristics

The algorithm before the heuristics are applied


is simply a version of the naive algorithm, in Bad characters
which each possible shift s = 0, 1, . . . is tried in
turn. Consider the following situation:

However when testing a given shift, the o n c e _ w e _ n o t i c e d _ t h a t


characters in the pattern and text are i m b a l a n c e
compared from right to left. If all the
characters match then we have found a valid The two last characters ce match the text but
shift. the i in the text is a bad character.

If a mismatch is found, then the shift s is not Now as soon as we detect the bad character i
valid, and we try the next possible shift by we know immediately that the next shift must
setting be at least 6 places or the i will simply not
match.
s←s+1
and starting the testing loop again. Notice that advancing the shift by 6 places
means that 6 text characters are not examined
The two heuristics both operate by providing a at all.
number other than 1 by which the current shift
can be incremented without missing any
matches.

27 28
The bad character heuristic

The bad-character heuristic involves Good suffixes


precomputing a function
Consider the following situation:
λ : Σ → {0, 1, . . . , m}
such that for a character c, λ(c) is the t h e _ l a t e _ e d i t i o n _ o f
right-most position in P where c occurs (and 0 e d i t e d
if c does not occur in P ).

Then if a mismatch is detected when scanning The characters of the text that do match with
position j of the pattern (remember we are the pattern are called the good suffix. In this
going from right-to-left so j goes from m to case the good suffix is ed. Any shift of the
1), the bad character heuristic proposes pattern cannot be valid unless it matches at
advancing the shift by the equation: least the good suffix that we have already
found. In this case we must move the pattern
s ← s + (j − λ(T [s + j]))
at least 4 spaces in order that the ed at the
beginning of the pattern matches the good
Notice that the bad-character heuristic might suffix.
occasionally propose altering the shift to the
left, so it cannot be used alone.

29 30

The Boyer-Moore algorithm

The good-suffix heuristic


The Boyer-Moore algorithm simply involves
taking the larger of the two advances in the
The good-suffix heuristic involves
shift proposed by the two heuristics.
precomputing a function

γ : {1, . . . , m} → {1, . . . , m} Therefore, if a mismatch is detected at


character j of the pattern when examining shift
where γ(j) is the smallest positive shift of P so
s, we advance the shift according to:
that it matches with all the characters in
P [j + 1..m] that it still overlaps. s ← s + max(γ(j), j − λ(T [s + j]))

We notice that this condition can always be


The time taken to precompute the two
vacuously satisfied by taking γ(j) to be m, and
functions γ and λ can be shown to be O(m)
hence γ(j) > 0.
and O(|Σ| + m) respectively.

Therefore if a mismatch is detected at


Like the naive algorithm the worst case is when
character j in the pattern, the good-suffix
the pattern matches every time, and in this
heuristic proposes advancing the shift by
case it will take just as much time as the naive
s ← s + γ(j) algorithm. However this is rarely the case and
in practice the Boyer-Moore algorithm
performs well.

31 32
Example Example continued

Consider the pattern: What is γ(22)? This is the smallest shift of P


that will match the 1 character P [23], and this
o n e _ s h o n e _ t h e _ o n e _ p h o n e is 6.

o n e _ s h o n e _ t h e _ o n e _ p h o n e
What is the last occurrence function λ? o n e _ s h o n e _ t h e _ o n e

c λ(c) c λ(c) c λ(c) c λ(c)


a 0 h 20 o 21 v 0 The smallest shift that matches P [22..23] is
b 0 i 0 p 19 w 0 also 6.
c 0 j 0 q 0 x 0
d 0 k 0 r 0 y 0 o n e _ s h o n e _ t h e _ o n e _ p h o n e
e 23 l 0 s 5 z 0 o n e _ s h o n e _ t h e _ o n e
f 0 m 0 t 11 - 0
g 0 n 22 u 0 18
so γ(21) = 6.

The smallest shift that matches P [21..23] is


also 6

o n e _ s h o n e _ t h e _ o n e _ p h o n e
o n e _ s h o n e _ t h e _ o n e

so γ(20) = 6.
33 34

Longest Common Subsequence


However the smallest shift that matches
P [20..23] is 14 Consider the following problem

o n e _ s h o n e _ t h e _ o n e _ p h o n e
o n e _ s h o n e LONGEST COMMON
SUBSEQUENCE
Instance: Two sequences X and Y
so γ(19) = 14. Question: What is a longest common
subsequence of X and Y
What about γ(18)? What is the smallest shift
that can match the characters p h o n e? A Example
shift of 20 will match all those that are still
left.
If

o n e _ s h o n e _ t h e _ o n e _ p h o n e X = "A, B, C, B, D, A, B#
o n e and
Y = "B, D, C, A, B, A#
This then shows us that γ(j) = 20 for all
j ≤ 18, so then a longest common subsequence is either

 6
 20 ≤ j ≤ 22 "B, C, B, A#
γ(j) = 14 j = 19 or

 20 1 ≤ j ≤ 18
"B, D, A, B#

35 36
A recursive relationship

As is usual for dynamic programming problems we start by finding an appropriate recursion, whereby the problem can be solved by solving smaller subproblems.

Suppose that

X = ⟨x1, x2, . . . , xm⟩
Y = ⟨y1, y2, . . . , yn⟩

and that they have a longest common subsequence

Z = ⟨z1, z2, . . . , zk⟩

If xm = yn then zk = xm = yn and Zk−1 is a LCS of Xm−1 and Yn−1.

Otherwise Z is either a LCS of Xm−1 and Y or a LCS of X and Yn−1.

(This depends on whether zk ≠ xm or zk ≠ yn respectively — at least one of these two possibilities must arise.)

A recursive solution

This can easily be turned into a recursive algorithm as follows.

Given the two sequences X and Y we find the LCS Z as follows:

If xm = yn then find the LCS Z′ of Xm−1 and Yn−1 and set Z = Z′xm.

If xm ≠ yn then find the LCS Z1 of Xm−1 and Y, and the LCS Z2 of X and Yn−1, and set Z to be the longer of these two.

It is easy to see that this algorithm requires the computation of the LCS of Xi and Yj for all values of i and j. We will let l(i, j) denote the length of the longest common subsequence of Xi and Yj.

Then we have the following relationship on the lengths

l(i, j) = 0                                 if ij = 0
l(i, j) = l(i − 1, j − 1) + 1               if xi = yj
l(i, j) = max(l(i − 1, j), l(i, j − 1))     if xi ≠ yj
37 38
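A minimal Java sketch of the bottom-up table for the lengths l(i, j) follows; recovering the subsequence itself also needs the arrows recorded in the tables that follow. The method name is ours.

    // Length of a longest common subsequence of X and Y, filling the
    // (m+1) x (n+1) table of l(i, j) values bottom-up.
    static int lcsLength(String X, String Y) {
        int m = X.length(), n = Y.length();
        int[][] l = new int[m + 1][n + 1];       // the border row and column stay 0
        for (int i = 1; i <= m; i++)
            for (int j = 1; j <= n; j++)
                if (X.charAt(i - 1) == Y.charAt(j - 1))
                    l[i][j] = l[i - 1][j - 1] + 1;
                else
                    l[i][j] = Math.max(l[i - 1][j], l[i][j - 1]);
        return l[m][n];
    }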

The dynamic programming table

We have the choice of memoizing the above


Memoization
algorithm or constructing a bottom-up
dynamic programming table.
The simplest way to turn a top-down recursive
algorithm into a sort of dynamic programming
In this case our table will be an
routine is memoization. The idea behind this is
(m + 1) × (n + 1) table where the (i, j) entry is
that the return values of the function calls are
the length of the LCS of Xi and Yj .
simply stored in an array as they are computed.

Therefore we already know the border entries


The function is changed so that its first step is
of this table, and we want to know the value of
to look up the table and see whether l(i, j) is
l(m, n) being the length of the LCS of the
already known. If so, then it just returns the
original two sequences.
value immediately, otherwise it computes the
value in the normal way.
In addition to this however we will retain some
additional information in the table - namely
Alternatively, we can simply accept that we
each entry will contain either a left-pointing
must at some stage compute all the O(n2)
arrow ←, a upward-pointing arrow ↑ or a
values l(i, j) and try to schedule these
diagonal arrow ).
computations as efficiently as possible, using a
dynamic programming table.
These arrows will tell us which of the subcases
was responsible for the entry getting that
value.
39 40
The first table
Our example
First we fill in the border of the table with the
zeros.
For our worked example we will use the
sequences j 0 1 2 3 4 5 6
i yj 1 1 0 1 1 0
0 xi 0 0 0 0 0 0 0
1 0 0
X = !0, 1, 1, 0, 1, 0, 0, 1" 2 1 0
3 1 0
and
4 0 0
Y = !1, 1, 0, 1, 1, 0" 5 1 0
5 0 0
7 0 0
Then our initial empty table is: 8 1 0

j 0 1 2 3 4 5 6 Now each entry (i, j) depends on xi, yj and the


i yj 1 1 0 1 1 0 values to the left (i, j − 1), above (i − 1, j), and
0 xi above-left (i − 1, j − 1).
1 0
2 1 In particular, we proceed as follows:
3 1
4 0 If xi = yj then put the symbol $ in the square,
5 1 together with the value l(i − 1, j − 1) + 1.
5 0
7 0 Otherwise put the greater of the values
8 1 l(i − 1, j) and l(i, j − 1) into the square with the
appropriate arrow.
41 42

The final array

After filling it in row by row we eventually


The first row reach the final array:

It is easy to compute the first row, starting in j 0 1 2 3 4 5 6


the (1, 1) position: i yj 1 1 0 1 1 0
0 xi 0 0 0 0 0 0 0
1 0 0 ↑0 ↑0 $1 ←1 ←1 $1
j 0 1 2 3 4 5 6
2 1 0 $1 $1 ↑1 $2 $2 ←2
i yj 1 1 0 1 1 0
3 1 0 $1 $2 ←2 $2 $3 ←3
0 xi 0 0 0 0 0 0 0
4 0 0 ↑1 ↑2 $3 ←3 ↑3 $4
1 0 0 ↑0 ↑0 $1 ←1 ←1 $1
5 1 0 $1 $2 ↑3 $4 $4 ↑4
2 1 0
6 0 0 ↑1 ↑2 $3 ↑4 ↑4 $5
3 1 0
7 0 0 ↑1 ↑2 $3 ↑4 ↑4 $5
4 0 0
8 1 0 ↑1 $2 ↑3 $4 $5 ↑5
5 1 0
6 0 0
7 0 0 This then tells us that the LCS of X = X8 and
8 1 0 Y = Y6 has length 5 — because the entry
l(8, 6) = 5.
Computation proceeds as described above.
This time we have kept enough information,
via the arrows, for us to compute what the
LCS of X and Y is.

43 44
Finding the LCS
Finding the LCS
The LCS can be found (in reverse) by tracing
the path of the arrows from l(m, n). Each We can trace back the arrows in our final array,
diagonal arrow encountered gives us another in the manner just described, to determine that
element of the LCS. the LCS is 11010 and see which elements
within the two sequences match.
As l(8, 6) points to l(7, 6) so we know that the
LCS is the LCS of X7 and Y6.
j 0 1 2 3 4 5 6
Now l(7, 6) has a diagonal arrow, pointing to i yj 1 1 0 1 1 0
l(6, 5) so in this case we have found the last 0 xi 0 0 0 0 0 0 0
entry of the LCS — namely it is x7 = y6 = 0. 1 0 0 ↑0 ↑0 "1 ←1 ←1 "1
2 1 0 "1 "1 ↑1 "2 "2 ←2
Then l(6, 5) points (upwards) to l(5, 5), which 3 1 0 "1 "2 ←2 "2 "3 ←3
points diagonally to l(4, 4) and hence 1 is the 4 0 0 ↑1 ↑2 "3 ←3 ↑3 "4
second-last entry of the LCS.
5 1 0 "1 "2 ↑3 "4 "4 ↑4
6 0 0 ↑1 ↑2 "3 ↑4 ↑4 "5
Proceeding in this way, we find that the LCS is
7 0 0 ↑1 ↑2 "3 ↑4 ↑4 "5
11010 8 1 0 ↑1 "2 ↑3 "4 "5 ↑5

Notice that if at the very final stage of the A match occurs whenever we encounter a
algorithm (where we had a free choice) we had diagonal arrow along the reverse path.
chosen to make l(8, 6) point to l(8, 5) we
would have found a different LCS See section 15.4 of CLRS for the pseudo-code
11011 for this algorithm.

45 46

Data Compression Algorithms

Data compression algorithms exploit patterns


Analysis in data files to compress the files. Every
compression algorithm should have a
The analysis for longest common subsequence corresponding decompression algorithm that
is particularly easy. can recover (most of) the original data.

After initialization we simply fill in mn entries Data compression algorihtms are used by
in the table — with each entry costing only a programs such as WinZip, pkzip and zip. They
constant number of comparisons. Therefore are also used in the definition of many data
the cost to produce the table is Θ(mn) formats such as pdf, jpeg, mpeg and .doc.

Following the trail back to actually find the Data compression algorithms can either be
LCS takes time at most O(m + n) and lossless (e.g. for archiving purposes) or lossy
therefore the total time taken is Θ(mn). (e.g. for media files).

We will consider some lossless algorithms


below.

47 48
Simplification

Let C be the set of characters we are working


with. To simplify things, let us suppose that
we are storing only the 10 numeric characters
Huffman coding 0, 1, . . ., 9. That is, set C = {0, 1, · · · , 9}.

A nice application of a greedy algorithm is A fixed length code to store these 10


characters would require at least 4 bits per
found in an approach to data compression
character. For example we might use a code
called Huffman coding. like this:

Suppose that we have a large amount of text Char Code


0 0000
that we wish to store on a computer disk in an
1 0001
efficient way. The simplest way to do this is 2 0010
simply to assign a binary code to each 3 0011
character, and then store the binary codes 4 0100
consecutively in the computer memory. 5 0101
6 0110
7 0111
The ASCII system for example, uses a fixed
8 1000
8-bit code to represent each character. Storing 9 1001
n characters as ASCII text requires 8n bits of
memory. However in any non-random piece of text,
some characters occur far more frequently than
others, and hence it is possible to save space
by using a variable length code where the more
frequently occurring characters are given
shorter codes.
49 50

A good code

What would happen if we used the following


Non-random data code to store the data rather than the fixed
length code?
Consider the following data, which is taken
from a Postscript file. Char Code
0 1
Char Freq 1 010
5 1294 2 01111
9 1525 3 0011
6 2260 4 00101
4 2561 5 011100
2 4442 6 00100
3 5960 7 0110
7 6878 8 000
8 8865 9 011101
1 11610
0 70784 To store the string 0748901 we would get

0000011101001000100100000001
Notice that there are many more occurrences
of 0 and 1 than the other characters. using the fixed length code and

10110001010000111011010
using the variable length code.

51 52
Prefix codes

In order to be able to decode the variable


length code properly it is necessary that it be a
prefix code — that is, a code in which no Cost of a tree
codeword is a prefix of any other codeword.
Now assign to each leaf of the tree a value,
Decoding such a code is done using a binary f (c), which is the frequency of occurrence of
tree. the character c represented by the leaf.

Let dT (c) be the depth of character c’s leaf in


""$
$$
0""
"" $$
$$
1 the tree T .
"" $

"$$
0
"
0
""
"" $$
$$ 1
"" $$
Then the number of bits required to encode a
!!# !!#
file is
0
!
#
#
1 0!
#
#
1 !
8 1 B(T ) = f (c)dT (c)
0!!!###1 0!!!###1 c ∈C
3 7 which we define as the cost of the tree T .
0!!!###1 0!!!###1
6 4 2
0!!!###1
5 9

53 54

For example, the number of bits required to


store the string 0748901 can be computed from
the tree T : Optimal trees

d=0 A tree representing an optimal code for a file is


""$
$$
"" $$
"" $$ always a full binary tree — namely, one where
"" $

"$$$
0:2 d=1 every node is either a leaf or has precisely two
"
"" $$
"" $$
"" $ children.
d=2
!# !#
! # ! #
! # ! #
8:1 1:1 d=3 Therefore if we are dealing with an alphabet of
!#
! #
!#
! # s symbols we can be sure that our tree has
! # ! #
3:0 7:1 d=4 precisely s leaves and s − 1 internal nodes, each
!!#
# !!##
! # ! # with two children.
6:0 4:1 2:0 d=5
!!#
#
! # Huffman invented a greedy algorithm to
5:0 9:1 d=6
construct such an optimal tree.

giving The resulting code is called a Huffman code


B(T ) = 2×1+1×3+1×3+1×4+1×5+1×6 = 23. for that file.

Thus, the cost of the tree T is 23.

55 56
Huffman’s algorithm

The first few steps


The algorithm starts by creating a forest of s
single nodes, each representing one character,
and each with an associated value, being the Given the data above, the first two entries off
the priority queue are 5 and 9 so we create a
frequency of occurrence of that character.
These values are placed into a priority queue new node
(implemented as a linear array).
2819
! "
! "
5:1294 9:1525 6:2260 4:2561 2:4442 5:1294 9:1525

3:5960 7:6878 8:8865 1:11610 0:70784


The priority queue is now one element shorter,
as shown below:
Then repeat the following procedure s − 1
times: 6:2260 4:2561 2819 2:4442 ...
!! ""
! "
Remove from the priority queue the two nodes 5:1294 9:1525
L and R with the lowest values, and create a
internal node of the binary tree whose left child The next two are 6 and 4 yielding
is L and right child R.
2819 2:4442 4821 · · ·
! " ! "
! " ! "
Compute the value of the new node as the 5:1294 9:1525 6:2260 4:2561
sum of the values of L and R and insert this
into the priority queue.

57 58
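A hedged Java sketch of Huffman's algorithm is shown below, using java.util.PriorityQueue in place of the linear-array priority queue; the Node class and all names are ours. It returns the code word for each character rather than the tree itself.

    import java.util.*;

    class Node {
        char ch; long freq; Node left, right;
        Node(char ch, long freq) { this.ch = ch; this.freq = freq; }
        Node(Node l, Node r) { this.freq = l.freq + r.freq; left = l; right = r; }
        boolean isLeaf() { return left == null; }
    }

    static Map<Character, String> huffmanCodes(Map<Character, Long> freq) {
        PriorityQueue<Node> pq =
                new PriorityQueue<>(Comparator.comparingLong(n -> n.freq));
        for (Map.Entry<Character, Long> e : freq.entrySet())
            pq.add(new Node(e.getKey(), e.getValue()));
        while (pq.size() > 1)                     // s - 1 merges of the two smallest
            pq.add(new Node(pq.poll(), pq.poll()));
        Map<Character, String> codes = new HashMap<>();
        assign(pq.poll(), "", codes);
        return codes;
    }

    static void assign(Node node, String prefix, Map<Character, String> codes) {
        if (node == null) return;
        if (node.isLeaf()) { codes.put(node.ch, prefix.isEmpty() ? "0" : prefix); return; }
        assign(node.left, prefix + "0", codes);   // left edges are labelled 0
        assign(node.right, prefix + "1", codes);  // right edges are labelled 1
    }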

Why does it work?

In order to show that Huffman’s algorithm


Now the smallest two nodes are 2 and the works, we must show that there can be no
internal node with value 2819, hence we now prefix codes that are better than the one
get: produced by Huffman’s algorithm.

4821 3:5960 7:6878 7261 · · · The proof is divided into two steps:
! " ! "
! " ! "
6:2260 4:2561 2819 2:4442 First it is necessary to demonstrate that the
! "
! "
5:1294 9:1525 first step (merging the two lowest frequency
characters) cannot cause the tree to be
Notice how we are growing sections of the tree non-optimal. This is done by showing that any
from the bottom-up (compare with the tree on optimal tree can be reorganised so that these
slide 16). two characters have the same parent node.
(see CLRS, Lemma 16.2, page 388)
See CLRS (page 388) for the pseudo-code
corresponding to this algorithm. Secondly we note that after making an optimal
first choice, the problem can be reduced to
finding a Huffman code for a smaller alphabet.
(see CLRS, Lemma 16.3, page 391)

59 60
Adaptive Huffman Coding

Huffman Coding allows us to encode frequently occurring characters in a lesser number of bits than rarely occurring characters. Huffman coding requires that we have accurate estimates of the probabilities of each character occurring.

In general, we can make estimates of the frequencies of characters occurring in English text, but these estimates are not useful when we consider other data formats.

Adaptive Huffman coding calculates character frequencies on the fly and uses these dynamic frequencies to encode characters. This technique can be applied to binary files as well as text files.

Algorithms: Adaptive Huffman Coding

The Adaptive Huffman Coding algorithms seek to create a Huffman tree on the fly. Adaptive Huffman Coding determines the Huffman Tree only from the frequencies of the characters already read.

Recall that prefix codes are defined using a binary tree. It can be shown that a prefix code is optimal if and only if the binary tree has the sibling property.

A binary tree recording the frequency of characters has the sibling property iff

1. every node except the root has a sibling.

2. each right-hand sibling (including non-leaf nodes) has at least as high a frequency as its left-hand sibling

(The frequency of non-leaf nodes is the sum of the frequency of its children).
61 62

Adaptive Huffman Coding

Adaptive Huffman Coding As characters are read it is possible to


efficiently update the frequencies, and modify
the binary tree so that the sibling property is
preserved. It is also possible to do this in a
60
0
""
""$$$
$$ 1 deterministic way so that a similar process can
"" $$
"" $
27 33 decompress the code.
" $$ "$
0
""
"" $$
$$
1
"" $
12 15 See
0!!!###1 0!!!###1 http://www.cs.duke.edu/ jsv/Papers/Vit87.jacmACMversion.pdf
6 6 7 8 for more details.
0!!!###1 0!!!###1
3 3 4 4
0!!!###1 0!!!###1 As opposed to the LZ algorithms that follow,
1 2 2 2 Huffman methods only encode one character
0!!!###1 at a time. However, best performance often
1 1 comes from combining compression algorithms
(for example, gzip combines LZ77 and
Adaptive Huffman Coding).

63 64
Ziv-Lempel compression algorithms Algorithms: LZ77

The Ziv-Lempel compression algorithms are a The LZ77 algorithms use a sliding window.
family of compression algorithms that can be The sliding window is a buffer consisting of the
applied to arbitrary file types. last m letters encoded (a0...am−1) and the next
n letters to be encoded (b0...bn−1).
The Ziv-Lempel algorithms represent recurring
strings with abbreviated codes. There are two Initially we let a0 = a1 = ... = an−1 = w0 and
main types: output "0, 0, w# where w0 is the first letter of
the word to be compressed

• LZ77 variants use a buffer to look for


The algorithm looks for the longest prefix of
recurring strings in a small section of the
b0...bn−1 appearing in a0...am−1. If the longest
file.
prefix found is b0...bk−1 = ai...ai+k−1, then the
entire prefix is encoded as the tuple
• LZW variants dynamically create a "i, k, bk #
dictionary of recurring strings, and assigns
a simple code to each such string. where i is the offset, k is the length and bk is
the next character.

65 66

LZ77 Example cont.

LZ77 Example To decompress the code we can reconstruct


the sliding window at each step of the
Suppose that m = n = 4 and we would like to algorithm. Eg, given
compress the word w = aababacbaa
"0, 0, a#"0, 2, b#"2, 3, c#"1, 2, a#

Word Window Output Input Window Output


aababacbaa "0, 0, a# "0, 0, a#

aababacbaa aaaa aaba "0, 2, b# "0, 2, b# aaaa aab? aab

abacbaa aaab a abac "2, 3, c# "2, 3, c# aaab a abac abac

baa abac baa "1, 2, a# "1, 2, a# abac baa? baa

This outputs Note the trick with the third triple "2, 3, c# that
allows the look-back buffer to overflow into the
"0, 0, a#"0, 2, b#"2, 3, c#"1, 2, a#
look ahead buffer. See
http://en.wikipedia.org/wiki/LZ77 and LZ78 for
more information.

67 68
Algorithms: LZW

The LZW algorithms use a dynamic dictionary. The dictionary maps words to codes and is initially defined for every byte (0-255). The compression algorithm is as follows:

w = null
while(k = next byte)
    if wk in the dictionary
        w = wk
    else
        add wk to dictionary
        output code for w
        w = k
output code for w

Algorithms: LZW

The decompression algorithm is as follows:

k = next byte
output k
w = k
while(k = next byte)
    if there's no dictionary entry for k
        entry = w + first letter of w
    else
        entry = dictionary entry for k
    output entry
    add w + first letter of entry to dictionary
    w = entry

69 70
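Below is a possible Java sketch of the LZW compression loop above. The dictionary here maps strings of byte-valued characters to integer codes, and the names are ours; a real implementation would also bound the dictionary size and pack the output codes into bits.

    import java.util.*;

    // LZW compression: the dictionary starts with all single bytes (0-255).
    static List<Integer> lzwCompress(byte[] input) {
        Map<String, Integer> dict = new HashMap<>();
        for (int i = 0; i < 256; i++) dict.put("" + (char) i, i);
        int nextCode = 256;

        List<Integer> out = new ArrayList<>();
        String w = "";
        for (byte b : input) {
            char k = (char) (b & 0xff);
            String wk = w + k;
            if (dict.containsKey(wk)) {
                w = wk;                           // keep extending the current string
            } else {
                out.add(dict.get(w));             // emit the code for w
                dict.put(wk, nextCode++);         // remember the new string wk
                w = "" + k;
            }
        }
        if (!w.isEmpty()) out.add(dict.get(w));   // flush the final string
        return out;
    }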

LZW Example cont.


LZW Example

To decompress the code "00140250# we


Consider the word w = aababa, and a dictionary
initialize the dictionary as before. Then
D where D[0] = a, D[1] = b and D[2] = c. The
compression algorithm proceeds as follows:
Read Do Output
Read Do Output
0 w=a a
a w=a −
0 w = a, D[3] = aa a
a w = a, D[3] = aa 0
1 w = b, D[4] = ab b
b w = b, D[4] = ab 0
4 w = ab, D[5] = ba ab
a w = a, D[5] = ba 1
0 w = a, D[6] = aba a
b w = ab −
2 w = c, D[7] = ac c
a w = a, D[6] = aba 4
5 w = ba, D[8] = cb ba
c w = c, D[7] = ac 0
0 w = a, D[9] = baa a
b w = b, D[8] = cb 2
a w = ba −
a w = a, D[9] = baa 5 See
0 http://en.wikipedia.org/wiki/LZ77 and LZ78,
also.

71 72
Summary

1. String matching is the problem of finding all matches for a given pattern, in a given sample of text.

2. The Rabin-Karp algorithm uses prime numbers to find matches in linear time in the expected case.

3. A string matching automaton works in linear time, but requires a significant amount of precomputing.

4. The Knuth-Morris-Pratt algorithm uses the same principle as a string matching automaton, but reduces the amount of precomputation required.

Summary cont.

5. The Boyer-Moore algorithm uses the bad character and good suffix heuristics to give the best performance in the expected case.

6. The longest common subsequence problem can be solved using dynamic programming.

7. Dynamic programming can improve the efficiency of divide and conquer algorithms by storing the results of sub-computations so they can be reused later.

8. Data compression algorithms use pattern matching to find efficient ways to compress files.

73 74

Summary cont.

9. Huffman coding uses a greedy approach to


recode the alphabet with a more efficient
binary code.

10. Adaptive Huffman coding uses the same


approach, but without the overhead of
precomputing the code.

11. LZ77 uses pattern matching to express


segments of the file in terms of recently
occurring segments.

12. LZW uses a hash function to store


commonly occurring strings so it can refer
to them by their key.

75
Computer Science and Software Engineering, 2011

Algorithm Design

In this section we will consider some general


algorithmic techniques for optimization
problems — namely greedy algorithms,
dynamic programming and Approximation
CITS3210 Algorithms Algorithms.

Optimization Algorithms A greedy algorithm proceeds by making a


single choice at each stage of the computation
— at each stage the algorithm chooses the
“best” move to make based on purely local
information. Previously seen examples include
Kruskal’s algorithm, Prim’s algorithm and
Huffman Coding.

Greedy algorithms are usually extremely


efficient, but they can only be applied to a
small number of problems.

Notes by CSSE, Comics by xkcd.com


1 2

Greedy Algorithms Intervals

Consider the following simple computational There is an obvious relationship between


activities and intervals on the real line.
problem.
An interval of the real line consists of the real
ACTIVITY SELECTION numbers lying between two reals called the
Instance: A set S = {t1, t2, . . . , tn} of endpoints of the interval.
“activities” where each activity ti has an
(a, b) = {x ∈ R | a < x < b}
associated start time si and finish time fi .
Question: Select the largest possible number
If the interval includes its endpoints then it is
of tasks from S that can be completed without
said to be closed, otherwise open. It can also
incompatibilities (two activities are
be open at one endpoint and closed at the
incompatible iff they overlap).
other.
(a, b) = {x ∈ R | a < x < b}
Example Consider the following set of
activities [a, b) = {x ∈ R | a ≤ x < b}
{(6, 9), (1, 10), (2, 4), (1, 7), (5, 6), (8, 11), (9, 11)} (a, b] = {x ∈ R | a < x ≤ b}

[a, b] = {x ∈ R | a ≤ x ≤ b}
The following schedules are all allowable

For definiteness we will henceforth make the


(1, 10) assumption that all the activity intervals are
(1, 7), (8, 11) closed on the left and open on the right.
(2, 4), (5, 6), (9, 11) ti = [si, fi)
3 4
Problem reduction Greedy approach

It is easy to see that choosing [1,7) was a bad


To solve this problem we must make some choice, which raises the question of what
choice of the first interval, then the second would be a good choice?
interval and so on. Clearly the later choices
depend on the earlier ones in that some A greedy algorithm simply chooses what is
locally the best option at every stage. There
time-slots are no longer available.
are various possible ways to be greedy,
including
Suppose that we (arbitrarily) select the interval
[1, 7) from the collection
• Choose the shortest interval
{[6, 9), [1, 10), [2, 4), [1, 7), [5, 6), [8, 11), [9, 11)}.
Then all the intervals that overlap with this • Choose the interval starting the first
one can no longer be scheduled, leaving the set
• Choose the interval finishing the first
{[8, 11), [9, 11)}
from which we must choose the largest • Choose the interval that intersects with the
possible set of pairwise disjoint intervals — in fewest others
this case just one of the remaining intervals.
The greedy approach can be viewed as a very
This is simply a smaller instance of the same local procedure — making the best choice for
problem ACTIVITY SELECTION. Therefore the current moment without regard for any
possible future consequences of that choice.
an algorithm for the problem can be expressed
recursively simply by specifying a rule for Sometimes a greedy approach yields an optimal
choosing one interval. solution, but frequently it does not.
5 6

Activity Selection
Algorithm
Consider the greedy approach of selecting the
interval that finishes first from the collection As a precondition the list of tasks must be
{[6, 9), [1, 10), [2, 4), [1, 7), [5, 6), [8, 11), [9, 11)} sorted into ascending order of their finish times
to ensure
Then we would choose [2, 4) as the first finish(t1) ≤ finish(t2) ≤ finish(t3) ≤ . . .
interval, and after eliminating clashes we are
left with the task of finding the largest set of
mutually disjoint intervals from the set The pseudo-code will then process the sorted
list of tasks t:
{[6, 9), [5, 6), [8, 11), [9, 11)}.
procedure GREEDY-ACTIVITY-SEL(t)
At this stage, we simply apply the algorithm A ← {t1}
recursively. Therefore being greedy in the same i←1
way we select [5, 6) as the next interval, and for m ← 2 to length(t) do
after eliminating clashes (none in this case) we if start(tm) ≥ finish(ti ) then
are left with.
A ← A ∪ {tm}
{[6, 9), [8, 11), [9, 11)}. i←m
end if
Continuing in this way gives the ultimate result end for
that the largest possible collection of return A
non-intersecting intervals is
It returns A, a subset of compatible activities.
[2,4) then [5,6) then [6,9) then [9,11).
7 8
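A small Java sketch of this greedy selection follows (all names are ours): tasks are (start, finish) pairs, they are sorted by finish time, and a task is kept whenever it starts no earlier than the finish of the last task kept. On the example above it selects [2,4), [5,6), [6,9) and [9,11).

    import java.util.*;

    // Greedy activity selection: each task is an int[]{start, finish}.
    static List<int[]> greedyActivitySelection(List<int[]> tasks) {
        List<int[]> sorted = new ArrayList<>(tasks);
        sorted.sort(Comparator.comparingInt(t -> t[1]));   // ascending finish time
        List<int[]> chosen = new ArrayList<>();
        int lastFinish = Integer.MIN_VALUE;
        for (int[] t : sorted) {
            if (t[0] >= lastFinish) {      // compatible with everything chosen so far
                chosen.add(t);
                lastFinish = t[1];
            }
        }
        return chosen;
    }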
Does it work?
Intuitive Proof
The greedy algorithm gives us a solution to the
activity scheduling problem — but is it actually The formal proof that we can use t1 as the
the best solution, or could we do better by first task and be certain that we will not
considering the global impact of our choices
change the number of compatible tasks is
more carefully.
rather involved and you are referred to the text
book (see CLRS, pages 373-375).
For the problem ACTIVITY SELECTION
we can show that the greedy algorithm always
finds an optimal solution. However the basic idea is a proof by
contradiction. Assume using t1 results in a
We suppose first that the activities are ordered sub-optimal solution and therefore we can find
by finishing time - so that a compatible solution with (k + 1) tasks. This
would only be possible if we can find two tasks
f1 ≤ f2 ≤ · · · ≤ fn
t"1 and t""1 which occupy the same interval as t1.
But this would imply
Now consider some optimal solution for the
problem consisting of k tasks (s1 ≤ (s"1 < f1" ) ≤ (s""1 < f1"") ≤ f1)
and hence that f1" < f1 but we know that the
ti1 , ti2 , . . . , tik
tasks are sorted in order of ascending finish
times, so no task can have a finish time less
Then
that that of t1, leading to a contradiction.
t1, ti2 , . . . , tik Hence using t1 as the first task must lead to
is also an optimal solution since it will also an optimal solution with k tasks.
consist of k tasks.
9 10

Running time

The running time for this algorithm is dominated by the time taken to sort the n inputs at the start of the algorithm.

Using quicksort this can be accomplished in an average time of O(n lg n).

As greedy algorithms are so simple, they always have low degree polynomial running times. Because they are so quick, we might be tempted to ask why we should not always use greedy algorithms.

Unfortunately, greedy algorithms only work for a certain narrow range of problems; most problems cannot be solved by a greedy algorithm.

Vertex Cover

A vertex cover for a graph G is a set of vertices V′ ⊆ V(G) such that every edge has at least one end in V′ (the set of vertices covers all the edges).

The following graph

[Figure: a small example graph]

has a vertex cover of size 4.

[Figure: the same graph with a vertex cover of size 4 marked]

The VERTEX COVER problem is to find the smallest vertex cover for a graph G.
11 12
A greedy algorithm

One greedy algorithm is to cover as many edges as possible with each choice, by choosing the vertex of highest degree at each stage and then deleting the covered edges.

For this graph

[Figure: a small example graph]

the greedy algorithm gives

[Figure: the cover chosen by the greedy algorithm]

while the true solution is

[Figure: a smaller, optimal vertex cover]

Greed is not always good

The previous example shows that it does not always pay to be greedy. Although choosing the vertex of highest degree does cover the greatest number of edges, that choice makes our later choices worse.

In problems where the greedy algorithm works, the earlier choices do not interfere negatively with the later choices.

Unfortunately, most problems are not amenable to the greedy algorithm.

VERTEX COVER is actually a very hard problem, and there is no known algorithm that is essentially better than just enumerating all the possible subsets of vertices. (Technically speaking, it is an example of a problem that is known to be NP-hard.)
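For concreteness, here is a rough Java sketch of the highest-degree greedy heuristic just described; the adjacency-matrix representation and the name greedyCover are illustrative choices, not part of the notes:

  import java.util.*;

  class GreedyVertexCover {
      // graph[u][v] == true means there is an edge {u, v}.
      static Set<Integer> greedyCover(boolean[][] graph) {
          int n = graph.length;
          boolean[][] g = new boolean[n][n];
          for (int u = 0; u < n; u++) g[u] = graph[u].clone(); // work on a copy
          Set<Integer> cover = new HashSet<>();
          while (true) {
              // Find the vertex covering the most remaining edges.
              int best = -1, bestDeg = 0;
              for (int u = 0; u < n; u++) {
                  int deg = 0;
                  for (int v = 0; v < n; v++) if (g[u][v]) deg++;
                  if (deg > bestDeg) { bestDeg = deg; best = u; }
              }
              if (best == -1) return cover;       // no edges left: done
              cover.add(best);
              for (int v = 0; v < n; v++) {       // delete all edges covered by best
                  g[best][v] = false;
                  g[v][best] = false;
              }
          }
      }
  }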
13 14

More NP-problems

Consider the following two problems:

TRAVELLING SALESMAN
Instance: A finite set C = {c1, c2, . . . , cn} of "cities", a "distance" d(ci, cj) ∈ R+ between each pair of cities.
Question: What is the shortest circular tour visiting each city exactly once?

DOMINATING SET
Instance: A graph G
Question: What is the smallest dominating set for G?

(A dominating set of a graph is a set of vertices V′ ⊆ V such that every vertex of G has distance at most 1 from some vertex in V′.)

Non-deterministic polynomial time

A computational problem is in the class P (the polynomial time problems) if there is a deterministic algorithm that solves the problem and runs in time O(n^k) where k is some integer. These problems are generally considered feasible.

A computational problem is in the class NP (the non-deterministic polynomial time problems) if there is a non-deterministic algorithm that can solve the problem in polynomial time.

That is, an NP algorithm requires lucky guesses to work efficiently (i.e. guessing what the optimal vertex cover is).
15 16
How hard are these problems?

There are no algorithms known for these problems whose time complexity is a polynomial function of the size of the input. This means that the only known algorithms take time that is exponential in the size of the input.

There is a large class of problems, known as NP-hard problems, which have the following properties:

• There is no polynomial time algorithm known for the problem
• If you could solve one of these problems in polynomial time, then you could solve them all in polynomial time

Both TRAVELLING SALESMAN and DOMINATING SET are NP-hard.

The most important problem in theoretical computer science is whether or not this class of problems can be solved in polynomial time.

The 0-1 Knapsack Problem

Suppose we are given a knapsack of a given capacity, and a selection of items, each with a given weight and value. The 0-1 knapsack problem is to select the combination of items with the greatest value that will fit into the knapsack.

Formally, if W is the size of the knapsack and {1, ..., n} is a set of items where the weight of i is wi and the value of i is vi, then the problem is to:

Select T ⊆ {1, ..., n} that maximizes Σi∈T vi, given Σi∈T wi ≤ W.

For example, W might be the amount of memory on an MP3 player, wi may be the size of song i, and vi may reflect how much you like song i.
17 18

The Fractional Knapsack Problem

The fractional knapsack problem is similar, except that rather than choosing which items to take, you are able to choose how much of each item you will take. That is, the problem is to find a function T : {1, ..., n} → [0, 1] that maximizes Σi T(i)vi, given Σi T(i)wi ≤ W.

It is easy to see that the fractional knapsack problem can be solved by a greedy algorithm. However the 0-1 knapsack problem is much harder, and has been shown to be NP-complete.

While there is no known "feasible" solution for the 0-1 knapsack problem we will examine a dynamic programming solution that can give reasonable performance.

A dynamic programming solution

The structure of a dynamic programming algorithm is to:

1. define the solution to the problem in terms of solutions to sub-problems;

2. recursively solve the smaller sub-problems, recording the solutions in a table;

3. construct the solution to the original problem from the table.
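As an aside, the usual greedy rule for the fractional problem is to take items in decreasing order of value per unit weight, taking a fraction of the last item if necessary. A small Java sketch of that rule follows (the rule itself and all names here are my own illustrative choices, not stated in the notes):

  import java.util.*;

  class FractionalKnapsack {
      // Returns the maximum total value achievable with capacity W.
      static double maxValue(double[] w, double[] v, double W) {
          int n = w.length;
          Integer[] order = new Integer[n];
          for (int i = 0; i < n; i++) order[i] = i;
          // Sort item indices by value/weight, best ratio first.
          Arrays.sort(order, (a, b) -> Double.compare(v[b] / w[b], v[a] / w[a]));
          double value = 0, remaining = W;
          for (int i : order) {
              if (remaining <= 0) break;
              double take = Math.min(w[i], remaining);  // take as much of item i as fits
              value += v[i] * (take / w[i]);
              remaining -= take;
          }
          return value;
      }
  }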
19 20
A recursive solution

Given the 0-1 knapsack problem specified by the triple ({w1, ..., wn}, {v1, ..., vn}, W), we will consider the solution to the sub-problems specified by the triples ({w1, ..., wm}, {v1, ..., vm}, w) where m < n and w < W.

Let V(m, w) be the value of the optimal solution to this subproblem. Then for any m and any w, we can see

V(m, w) = max{V(m−1, w), vm + V(m−1, w − wm)}

(where the second option is only available when wm ≤ w). Since V(0, w) = 0 for all w this allows us to define a (very inefficient) recursive algorithm.

A dynamic programming solution

Often inefficient recursive algorithms can be made more efficient by using dynamic programming. The structure of a dynamic programming algorithm is to:

1. define a recursive solution to the problem in terms of solutions to sub-problems;

2. recursively solve the smaller sub-problems, recording the solutions in a table;

3. construct the solution to the original problem from the table.

For the 0-1 knapsack problem we will construct a table where the entries are V(i, j) for i = 0, ..., n and j = 0, ..., W.
21 22

Example

Suppose W = 5 and we are given three items where

  i    1  2  3
  vi   2  3  4
  wi   1  2  3

The table initially looks like

  i\w   0  1  2  3  4  5
   0    0  0  0  0  0  0
   1
   2
   3

Pseudo-code

  Knapsack({w1, ..., wn}, {v1, ..., vn}, W)
    for w from 0 to W do
      V(0, w) ← 0
    for i from 1 to n do
      for w from 0 to W do
        if wi > w or V(i − 1, w) > vi + V(i − 1, w − wi) then
          V(i, w) ← V(i − 1, w)
        else
          V(i, w) ← vi + V(i − 1, w − wi)
    return V(n, W)

It is clear that the complexity of this algorithm is O(nW). Note that this is not a polynomial solution to an NP-complete problem. Why not? (The size of the input is measured in bits, and W can be exponentially large in the number of bits needed to write it down.)
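A direct Java translation of this pseudo-code is sketched below; it also reads the chosen items back out of the table, in the way the following example describes. Class and method names are illustrative only.

  import java.util.*;

  class Knapsack01 {
      // Fill the table V[i][w] row by row, exactly as in the pseudo-code.
      static int[][] fillTable(int[] w, int[] v, int W) {
          int n = w.length;                       // items are 1..n, with weight w[i-1]
          int[][] V = new int[n + 1][W + 1];      // row 0 is already all zeros
          for (int i = 1; i <= n; i++) {
              for (int cap = 0; cap <= W; cap++) {
                  V[i][cap] = V[i - 1][cap];      // option 1: leave item i out
                  if (w[i - 1] <= cap)            // option 2: put item i in, if it fits
                      V[i][cap] = Math.max(V[i][cap],
                                           v[i - 1] + V[i - 1][cap - w[i - 1]]);
              }
          }
          return V;
      }

      // Recover the set of items achieving V[n][W] by walking back through the table.
      static List<Integer> chosenItems(int[][] V, int[] w, int W) {
          List<Integer> items = new ArrayList<>();
          int cap = W;
          for (int i = V.length - 1; i >= 1; i--) {
              if (V[i][cap] != V[i - 1][cap]) {   // item i must have been taken
                  items.add(i);
                  cap -= w[i - 1];
              }
          }
          Collections.reverse(items);
          return items;
      }

      public static void main(String[] args) {
          int[] w = {1, 2, 3}, v = {2, 3, 4};
          int[][] V = fillTable(w, v, 5);
          System.out.println(V[3][5] + " using items " + chosenItems(V, w, 5));
          // Prints 7 using items [2, 3].
      }
  }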
23 24
Example

  i\w   0  1  2  3  4  5
   0    0  0  0  0  0  0
   1    0  2  2  2  2  2
   2    0  2  3  5  5  5
   3    0  2  3  5  6  7

Note that the actual items contributing to the solution (that is, items 2 and 3) can be found by examination of the table. If T(i, w) is the set of items that produce the solution V(i, w), then

  T(i, w) = T(i − 1, w)                if V(i, w) = V(i − 1, w)
  T(i, w) = {i} ∪ T(i − 1, w − wi)     otherwise.

Linear Programming

The fractional knapsack problem is an example of a linear programming problem. A linear programme is an optimization problem of the form:

  Find real numbers x1, ..., xn
  that maximize c1x1 + · · · + cnxn
  subject to a1j x1 + · · · + anj xn ≤ bj for j = 1, ..., m
  and xi ≥ 0 for i = 1, ..., n.

Therefore a linear programme is parameterized by the cost vector (c1, ..., cn), an n × m array of constraint coefficients aij, and a bounds vector (b1, ..., bm).

It is clear the fractional knapsack problem can be presented as a linear programme.
25 26

Applications of linear programming

Many natural optimization problems can be expressed as a linear programme.

For example, given a weighted, directed graph G = (V, E), the length of the shortest path from s to t can be described using a linear programme. Using the distance array from the Bellman-Ford algorithm, we have the programme:

  Maximize d[t]
  subject to d[v] − d[u] ≤ w(u, v) for each edge (u, v) ∈ E
  and d[s] = 0.

Maximum flow problems can also be easily converted into linear programmes.

Solving linear programmes

All linear programmes can be solved by the simplex algorithm, which requires exponential time in the worst case, but is generally feasible in practice.

The simplex algorithm is effectively a hill-climbing algorithm that incrementally improves the solution until no further improvements can be made.

There are also polynomial time interior point methods to solve linear programmes.

We won't examine these algorithms. Rather we will simply consider the technique of converting problems into linear programmes.
27 28
Example

[Figure: the feasible region of the linear programme below, a polygon in the (x, y) plane bounded by the lines x + 2y = 4, y − x = 1, x − y = 1 and the two axes.]

  Maximize x + y, where  x + 2y ≤ 4
                         y − x ≤ 1
                         x − y ≤ 1
                         x, y ≥ 0

(The maximum value x + y = 3 is attained at the corner x = 2, y = 1.)

Integer Linear Programming

Adding the constraint that all solutions to a linear programme be integer values gives an integer linear programme.

The 0-1 knapsack problem can be written as an integer linear programme, as can the travelling salesman problem.

Therefore we should not expect to find a feasible algorithm to solve the integer linear programming problem.
29 30

Approximation Algorithms

An approximation algorithm is an algorithm that produces some feasible solution but with no guarantee that the solution is optimal.

Therefore an approximation algorithm for the travelling salesman problem would produce some valid circular tour, but it may not be the shortest tour.

An approximation algorithm for the minimum dominating set problem would produce some dominating set for G, but it may not be the smallest possible dominating set.

The performance of an approximation algorithm on a given instance I is measured by the ratio

  A(I)/OPT(I)

where A(I) is the value given by the approximation algorithm and OPT(I) is the true optimum value.

Standard Instances

Both TRAVELLING SALESMAN and DOMINATING SET have been fairly extensively studied, and a number of algorithms for their solution have been proposed.

In each case there are some standard instances for would-be solvers to test their code on. A package called TSPLIB provides a variety of standard travelling salesman problems. Some of them have known optimal solutions, while others are currently unsolved and TSPLIB just records the best known solution.

There are problems with around 2000 cities for which the best solution is not known, but this problem has been very heavily studied by brilliant groups of researchers using massive computer power and very sophisticated techniques.
31 32
The football pool problem

In many European countries a popular form of lottery is the "football pools", which are based on the results of soccer matches. Each player picks the results of n matches, where the result can be either a Home Win, Away Win or Draw.

By assigning three values as follows

  0 for Home Win
  1 for Away Win
  2 for Draw

we can think of this choice as a word of length n with entries from the alphabet {0, 1, 2}.

For example

  020201

would mean that the player had picked Home Wins for matches 1, 3 and 5, Away Win for match 6 and Draws for matches 2 and 4.

Winning 2nd prize

Now there are a total of 729 possible outcomes for the 6 matches. To guarantee winning the first prize we would need to make 729 different entries to cover every possible outcome.

Suppose however that getting all but one of the predictions correct results in winning second prize. So for example if our entry was 020201 and the actual outcome was 010201 then we would have 5 out of 6 correct and would win second prize.

In trying to generate pools "systems" we want to be able to answer the question

"How many entries do we need to make in order to guarantee winning at least second prize?"
33 34

A graph domination problem

We can define a graph F6 as follows:

  The vertices of F6 are the 729 words of length 6 over {0, 1, 2}.
  Two vertices are adjacent if the corresponding words differ in only one coordinate position.

Then we are seeking a minimum dominating set for the graph F6.

More generally, we can define a series of graphs Fn where the vertices are the 3^n words of length n with entries from {0, 1, 2}, with the same rule for determining adjacency.

This collection of graphs is called the football pool graphs and has been quite extensively studied with regard to the size of the minimum dominating set.

Known records

The following are the best known values for a minimum dominating set for Fn.

  n   Number of vertices   Best known dominating set
  2          9                        3
  3         27                        5
  4         81                        9
  5        243                       27
  6        729                     ≤ 73
  7       2187                    ≤ 186
  8       6561                    ≤ 486

Notice that the minimum dominating set for F4 is perfect: each vertex is adjacent to 8 others, so that each vertex of the dominating set dominates 9 vertices. As there are 81 vertices in F4 this means every vertex is dominated by exactly one vertex in the dominating set.

This is usually called a perfect code.
35 36
A greedy approximation algorithm

There is a natural greedy approximation algorithm for the minimum dominating set problem.

Start by selecting a vertex of maximum degree (so it dominates the greatest number of vertices). Then mark or delete all of the dominated vertices, and select the next vertex that dominates the greatest number of currently undominated vertices. Repeat until all vertices are dominated.

The graph P5 (a path with 5 vertices) shows that this algorithm does not always find the optimal solution.

Types of Travelling Salesman Instance

Consider a travelling salesman problem defined in the following way. The "cities" are n randomly chosen points ci = (xi, yi) on the Euclidean plane, and the "distances" are defined by the normal Euclidean distance

  d(ci, cj) = √((xi − xj)² + (yi − yj)²)

or the Manhattan distance

  d(ci, cj) = |xi − xj| + |yi − yj|

Instances of the travelling salesman problem that arise in this fashion are called geometric travelling salesman problems. Here the "distance" between the cities is actually the geometric distance between the corresponding points under some metric.
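A sketch of the greedy dominating-set heuristic in Java (adjacency-matrix graph; all names are my own) might look like this:

  import java.util.*;

  class GreedyDomination {
      // adj[u][v] == true when u and v are adjacent; every vertex dominates itself.
      static Set<Integer> greedyDominatingSet(boolean[][] adj) {
          int n = adj.length;
          boolean[] dominated = new boolean[n];
          Set<Integer> chosen = new HashSet<>();
          int remaining = n;
          while (remaining > 0) {
              int best = -1, bestGain = -1;
              for (int u = 0; u < n; u++) {
                  int gain = dominated[u] ? 0 : 1;            // u itself
                  for (int v = 0; v < n; v++)
                      if (adj[u][v] && !dominated[v]) gain++; // plus undominated neighbours
                  if (gain > bestGain) { bestGain = gain; best = u; }
              }
              chosen.add(best);
              if (!dominated[best]) { dominated[best] = true; remaining--; }
              for (int v = 0; v < n; v++)
                  if (adj[best][v] && !dominated[v]) { dominated[v] = true; remaining--; }
          }
          return chosen;
      }
  }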
37 38

Properties of geometric instances

All geometric instances have the properties that they are symmetric and satisfy the triangle inequality.

If

  d(ci, cj) = d(cj, ci)

for all pairs of cities in an instance of TRAVELLING SALESMAN then we say that the instance is symmetric.

If

  d(ci, ck) ≤ d(ci, cj) + d(cj, ck)

for all triples of cities in an instance of TRAVELLING SALESMAN then we say that the instance satisfies the triangle inequality.

Non-geometric instances

Of course it is easy to define instances that are not geometric.

Let X = {A, B, C, D, E, F} and let d be given by

        A  B  C  D  E  F
    A   0  2  4  ∞  1  3
    B   2  0  6  2  1  4
    C   4  ∞  0  1  2  1
    D   ∞  2  1  0  9  1
    E   1  1  2  6  0  3
    F   3  4  1  1  3  0

Many approximation algorithms only work for geometric instances because it is such an important special case, but remember that it is only a special case!
39 40
Nearest Neighbour

One example of an approximation algorithm is the following greedy algorithm known as Nearest Neighbour (NN).

• Start at a randomly chosen vertex
• At each stage visit the closest currently unvisited city

For an n-city instance of TRAVELLING SALESMAN this algorithm takes time O(n^2).

For any instance I, let NN(I) be the length of the tour found by NN and let OPT(I) be the length of the optimal tour. Then NN(I)/OPT(I) is a measure of how good this algorithm is on a given instance.

Unfortunately this is not very good.

A geometric instance of NN

The best case gave a tour of length 636.28.

[Figure: the best Nearest Neighbour tour on the example instance, starting at city 38; tour length 636.28.]
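A minimal Java sketch of Nearest Neighbour on a distance matrix follows; the two nested loops make the O(n^2) running time visible, and the names are illustrative:

  class NearestNeighbour {
      // Returns a tour (a permutation of 0..n-1) built from the given starting city.
      static int[] tour(double[][] d, int start) {
          int n = d.length;
          boolean[] visited = new boolean[n];
          int[] tour = new int[n];
          tour[0] = start;
          visited[start] = true;
          for (int step = 1; step < n; step++) {
              int current = tour[step - 1], next = -1;
              for (int city = 0; city < n; city++)      // closest unvisited city
                  if (!visited[city] && (next == -1 || d[current][city] < d[current][next]))
                      next = city;
              tour[step] = next;
              visited[next] = true;
          }
          return tour;
      }

      // Length of the closed tour, wrapping around to the start.
      static double length(double[][] d, int[] tour) {
          double len = 0;
          for (int i = 0; i < tour.length; i++)
              len += d[tour[i]][tour[(i + 1) % tour.length]];
          return len;
      }
  }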
41 42

A geometric instance of NN

The worst case gave a tour of length 842.94.

[Figure: the worst Nearest Neighbour tour on the same instance, starting at city 12; tour length 842.94.]

Approximation algorithms

Theorem For any constant k > 1 there are instances of TRAVELLING SALESMAN such that NN(I) ≥ k OPT(I).

Even more seriously, this is not just because NN is not sufficiently sophisticated; we cannot expect good behaviour from any polynomial time heuristic.

Theorem Suppose A is a polynomial time approximation algorithm for TRAVELLING SALESMAN such that A(I) ≤ k OPT(I) for some constant k. Then there is a polynomial time algorithm to solve TRAVELLING SALESMAN.

Therefore it seems hopeless to try to find decent approximation algorithms for TRAVELLING SALESMAN.
43 44
Minimum spanning tree

Suppose that we have an instance I of TRAVELLING SALESMAN that is symmetric and satisfies the triangle inequality. Then the following algorithm, called MST, is guaranteed to find a tour that is at most twice the optimal length.

• Find a minimum spanning tree for I
• Do a depth-first search on the tree
• Visit the vertices in order of discovery time

Then

  MST(I) ≤ 2 OPT(I).

In order to see why this works, we first observe that removing one edge from the optimal tour yields a spanning tree for I, and therefore the weight of the minimum spanning tree is less than the length of the shortest tour.

Search the tree . . .

Perform a depth first search on the minimum spanning tree.

[Figure: a 16-city example; the minimum spanning tree is drawn and the cities are labelled 1 to 16 in order of their depth-first discovery times.]
45 46

. . . and take shortcuts

If we were to simply follow the path of the depth-first search algorithm, including the backtracking, we would walk along each edge exactly once in each direction, creating a tour that has length exactly twice the weight of the minimum spanning tree, but is illegal because it visits some vertices twice.

The simple solution is to just take "shortcuts" according to the ordering of the vertices.

[Figure: the same 16-city example, with the tour that visits the cities in discovery order 1, 2, ..., 16, taking shortcuts past already-visited cities.]

Coalesced simple paths

The method of coalesced simple paths uses a greedy method to build up a tour edge by edge. At every stage the "partial tour" is a collection of simple paths.

• Sort the edges into increasing weight
• At each stage add the lowest weight edge possible without creating a cycle or a vertex of degree 3
• Join the ends of the path to form a cycle

This algorithm proceeds very much like Kruskal's algorithm, but the added simplicity means that the complicated union-find data structure is unnecessary.
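Putting the two steps together, a rough Java sketch of the MST heuristic is given below. It uses Prim's algorithm to build the tree (my choice for this sketch; any minimum spanning tree algorithm would do) and then records a depth-first discovery order, which takes the shortcuts automatically. It assumes a symmetric distance matrix satisfying the triangle inequality, and all names are illustrative.

  import java.util.*;

  class MstTour {
      static int[] tour(double[][] d) {
          int n = d.length;
          // Prim's algorithm: grow a minimum spanning tree from city 0.
          int[] parent = new int[n];
          double[] best = new double[n];
          boolean[] inTree = new boolean[n];
          Arrays.fill(best, Double.POSITIVE_INFINITY);
          best[0] = 0;
          parent[0] = -1;
          for (int k = 0; k < n; k++) {
              int u = -1;
              for (int v = 0; v < n; v++)
                  if (!inTree[v] && (u == -1 || best[v] < best[u])) u = v;
              inTree[u] = true;
              for (int v = 0; v < n; v++)
                  if (!inTree[v] && d[u][v] < best[v]) { best[v] = d[u][v]; parent[v] = u; }
          }
          // Children lists of the spanning tree.
          List<List<Integer>> children = new ArrayList<>();
          for (int v = 0; v < n; v++) children.add(new ArrayList<>());
          for (int v = 1; v < n; v++) children.get(parent[v]).add(v);
          // Depth-first search from the root, recording discovery order;
          // visiting the cities in this order is the tour with the shortcuts taken.
          int[] order = new int[n];
          int idx = 0;
          Deque<Integer> stack = new ArrayDeque<>();
          stack.push(0);
          while (!stack.isEmpty()) {
              int u = stack.pop();
              order[idx++] = u;
              for (int c : children.get(u)) stack.push(c);
          }
          return order;
      }
  }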
47 48
Insertion methods

There is a large class of methods called insertion methods which maintain a closed cycle as a partial tour and at each stage of the procedure insert an extra vertex into the partial tour.

Suppose that we are intending to insert the new vertex x into the partial tour C (called C because it is a cycle).

In turn we consider each edge {u, v} of the partial tour C, and we find the edge such that

  d(u, x) + d(x, v) − d(u, v)

is a minimum.

Then the edge {u, v} is deleted, and edges {u, x} and {x, v} added, hence creating a tour with one additional edge.

Three insertion techniques

Random insertion

At each stage the next vertex x is chosen randomly from the untouched vertices.

Nearest insertion

At each stage the vertex x is chosen to be the one closest to C.

Farthest insertion

At each stage the vertex x is chosen to be the one farthest from C.

(In all three insertion methods the vertex x is chosen first and then it is inserted in the best position.)
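The insertion step itself is a short computation. A possible Java sketch (names are my own) that splices a chosen vertex x into its best position in the current cycle:

  import java.util.*;

  class Insertion {
      static void insertAtBestPosition(List<Integer> tour, int x, double[][] d) {
          int bestPos = 0;
          double bestExtra = Double.POSITIVE_INFINITY;
          for (int i = 0; i < tour.size(); i++) {
              int u = tour.get(i);
              int v = tour.get((i + 1) % tour.size());      // edge {u, v} of the cycle
              double extra = d[u][x] + d[x][v] - d[u][v];   // cost of detouring via x
              if (extra < bestExtra) { bestExtra = extra; bestPos = i + 1; }
          }
          tour.add(bestPos, x);   // x now sits between the chosen u and v
      }
  }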
49 50

Tour found by nearest insertion

Nearest insertion tours ranged from 631 to 701 on the above example.

[Figure: an example of a tour found by nearest insertion.]

Tour found by farthest insertion

Farthest insertion tours ranged from 594 to 679 on the above example.

[Figure: an example of a tour found by farthest insertion.]
51 52
Tour found by random insertion

Random insertion tours ranged from 607 to 667 on the above example.

[Figure: an example of a tour found by random insertion.]

A fourth insertion technique

Cheapest insertion

This method is a bit more expensive than the other methods in that we search through all the edges {u, v} in C and all the vertices x ∉ C trying to find the vertex and edge which minimizes

  d(u, x) + d(x, v) − d(u, v)

The other three methods can all be programmed in time O(n^2) whereas this method seems to require at least an additional factor of lg n.

Nearest insertion and cheapest insertion can be shown to produce tours of length no greater than twice the optimal tour length by their close relationship to minimum spanning tree algorithms.
53 54

Iterative Improvement

One common feature of the tours produced by the greedy heuristics that we have seen is that it is immediately easy to see how they can be improved, just by changing a few edges here and there.

The procedure of iterative improvement refers to the process of starting with a feasible solution to a problem and changing it slightly in order to improve it.

An iterative improvement algorithm involves two things:

• A rule for changing one feasible solution to another
• A schedule for deciding which moves to make

Improving TRAVELLING SALESMAN tours

A basic move for TRAVELLING SALESMAN problems involves deleting two edges in the tour, and replacing them with two non-edges, as follows.

[Figure: four cities A, B, C, D on the current tour.]

Suppose the tour uses the edge AD, then a path from D to C, then the edge CB, then a path from B back to A. Deleting AD and CB, we replace them with AC and DB.

[Figure: the same four cities after the exchange, with the new edges AC and DB.]
55 56
2-OPT

Consider now an iterative improvement algorithm that proceeds by examining every pair of edges, and performing an exchange if the tour can be improved.

This procedure must eventually terminate, and the resulting tour is called 2-optimal.

There are more complicated "moves" that involve deleting 3 edges and reconnecting the tour, and in general deleting k edges and then reconnecting the tour.

A tour that cannot be improved by a k edge exchange is called k-optimal. In practice it is rare to compute anything beyond 2-optimal or 3-optimal tours.

A state space graph

We can view this process in a more abstract sense as a heuristic search on a huge graph called the state space graph.

We define the state space graph S(I) for an instance of TRAVELLING SALESMAN as follows.

  The vertices of S(I) consist of all the feasible tours for the instance I.
  Two feasible tours T1 and T2 are neighbours if they can be obtained from each other by the edge exchange process above.

Each vertex T has a cost c(T) associated with it, being the length of the tour T.

To completely solve TRAVELLING SALESMAN requires finding which of the (n − 1)! vertices of S(I) has the lowest cost.
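A sketch of repeated 2-OPT improvement in Java is given below; reversing the segment of the tour between the two deleted edges performs the exchange. Names and the tolerance value are illustrative choices.

  class TwoOpt {
      // Repeatedly apply improving 2-OPT exchanges until the tour is 2-optimal.
      static void improve(int[] tour, double[][] d) {
          int n = tour.length;
          boolean improved = true;
          while (improved) {
              improved = false;
              for (int i = 0; i < n - 1; i++) {
                  for (int j = i + 2; j < n; j++) {
                      if (i == 0 && j == n - 1) continue;      // these edges share a city
                      int a = tour[i], b = tour[i + 1];
                      int c = tour[j], e = tour[(j + 1) % n];
                      // Replace edges (a,b) and (c,e) by (a,c) and (b,e)?
                      double delta = d[a][c] + d[b][e] - d[a][b] - d[c][e];
                      if (delta < -1e-9) {                     // exchange shortens the tour
                          for (int lo = i + 1, hi = j; lo < hi; lo++, hi--) {
                              int tmp = tour[lo]; tour[lo] = tour[hi]; tour[hi] = tmp;
                          }
                          improved = true;
                      }
                  }
              }
          }
      }
  }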
57 58

Searching the state space graph

In general S(I) is so vast that it is totally impossible to write down the entire graph.

The greedy insertion methods all provide us with a single vertex in S(I) (a single tour), and the iterative improvement heuristics all involve doing a walk in S(I), moving along edges from tour to neighbouring tour attempting to find the lowest cost vertex.

In this type of state space searching we have the concept of a "current" tour T and at each stage of the search we generate a neighbour T′ of T and decide whether the search should proceed to T′ or not.

Hill-climbing

The simplest heuristic state-space search is known as hill-climbing.

The rule for proceeding from one state to another is very easy:

• Systematically generate neighbours T′ of T and move to the first neighbour of lower cost than T.

This procedure will terminate when T has no neighbours of lower cost; in this case T is a 2-optimal tour.

An obvious variant of this is to always choose the best move at each step.
59 60
A local optimum

A hill-climb will always finish on a vertex of lower cost than all its neighbours; such a vertex is a local minimum.

Unfortunately the state space graph has an enormous number of local minima, each of them possibly tremendously different from the global minimum.

If we mentally picture the state space graph as a kind of "landscape" where costs are represented by heights, then S(I) is a savagely jagged landscape of enormously high dimension.

Hill climbing merely moves directly into the nearest local optimum and cannot proceed from there.

State-space for DOMINATING SET

We can apply similar methods to the graph domination problem provided that we define the state-space graph carefully.

Suppose that we are trying to see whether a graph G has a dominating set of size k. Then the "states" in the state space graph are all the possible subsets of V(G) of size k. The "cost" of each can be taken to be the number of vertices not dominated by the corresponding k-subset. The solution that we are seeking is then a state of cost 0.

Now we must define some concept of "neighbouring states". In this situation a natural way to define a neighbouring state is the state that results from moving one of the k vertices to a different position.
61 62

Heuristic search for graph domination

We can now apply the hill-climbing procedure to this state space graph.

In this fashion the search "wanders" around the state-space graph, but again it will inevitably end up in a local minimum from which there is no escape.

Hill climbing is unsatisfactory because it has no mechanism for escaping locally optimum solutions. Ideally we want a heuristic search technique that tries to improve the current solution but has some method for escaping local optima.

Two techniques that have been proposed and extensively investigated in the last decade or so are

• Simulated Annealing
• Tabu Search

Annealing

Annealing is a physical process used in forming crystalline solids.

At a high temperature the solid is molten, and the molecules are moving fast and randomly. If the mixture is very gradually cooled, then as the temperature drops the mixture becomes more ordered, with molecules beginning to align into a crystalline structure. If the cooling is sufficiently slow, then at freezing point the resulting solid has a perfect regular crystalline structure.

The crystalline structure has the lowest potential energy, so we can regard the process as trying to find the configuration of a group of molecules with a global minimum potential energy.

Annealing is successful because the slow cooling allows the physical system to escape from local minima.
63 64
Simulated annealing

Simulated annealing is an attempt to apply these same principles to problems of combinatorial optimization.

For TRAVELLING SALESMAN we regard the optimal tour as the "crystal" for which we are searching and the other tours, being less perfect, as the flawed semi-molten crystals, while for GRAPH DOMINATION we regard the states with cost 0 (that is, genuine dominating sets) as the "crystals".

The overall structure of simulated annealing is:

• Randomly generate a neighbour T′ of the vertex T
• If c(T′) ≤ c(T) then accept the move to T′
• If c(T′) > c(T) then with a certain probability p accept the move to T′

The probability p of accepting an uphill move is dynamically altered throughout the algorithm.

Uphill moves

Dynamically altering p is usually done by maintaining a temperature variable t which is gradually lowered throughout the operation of the algorithm, and applying the following rules.

Suppose that we are currently at a vertex T with a cost c(T). The randomly generated neighbour T′ of T has cost c(T′) and so if the move is made then the difference will be

  ∆c = c(T′) − c(T)

Then the probability of accepting the move is taken to be

  p = exp(−∆c/t)

If ∆c < 0, then p > 1, so this corresponds to accepting all moves to a lower cost neighbour. Otherwise, if t is high, then −∆c/t is close to zero and p ≈ 1. If t is small then −∆c/t will be large and negative and p ≈ 0.
65 66

Cooling schedule

Therefore at high temperatures almost all moves are accepted, good or bad, whereas as the temperature reduces, fewer bad moves are accepted and the procedure settles down again. When t ≈ 0 the procedure reverts to a hill-climb.

The value of the initial temperature and the way in which it is reduced is called a cooling schedule:

• Start with some initial temperature t0
• Perform N iterations at each temperature
• Reduce the temperature by a constant multiplicative factor t ← Kt

For example the values t0 = 1, N = 1000, K = 0.95 might be suitable.

Performance of this algorithm is highly problem-specific and cooling schedule-specific.

How good is it?

Simulated annealing has had success in several areas of combinatorial optimization, particularly in problems with continuous variables.

In general it seems to work considerably better than hill-climbing, though it is not clear whether it works much better than multiple hill-climbs.

Each of these combinatorial optimization heuristics has its own adherents, and something akin to religious wars can erupt if anyone is rash enough to say "X is better than Y".

Experimentation is fraught with problems also, in that an empirical comparison of techniques depends so heavily on the test problems that almost any desired result can be convincingly produced by careful enough choice.

Nonetheless the literature is liberally dotted with "An empirical comparison of . . . and . . .".
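A compact Java sketch of the annealing loop with this geometric cooling schedule might look as follows; the Problem interface, the stopping temperature and the parameter values are illustrative assumptions, not part of the notes.

  import java.util.Random;

  class SimulatedAnnealing {
      interface Problem {
          double cost(int[] state);
          int[] randomNeighbour(int[] state, Random rng);
      }

      static int[] anneal(Problem p, int[] start, double t0, double K, int N,
                          double tMin, Random rng) {
          int[] current = start.clone();
          double currentCost = p.cost(current);
          for (double t = t0; t > tMin; t *= K) {        // cooling schedule t <- K*t
              for (int i = 0; i < N; i++) {              // N iterations per temperature
                  int[] next = p.randomNeighbour(current, rng);
                  double delta = p.cost(next) - currentCost;
                  // Accept all downhill moves; accept uphill moves with probability exp(-delta/t).
                  if (delta <= 0 || rng.nextDouble() < Math.exp(-delta / t)) {
                      current = next;
                      currentCost += delta;
                  }
              }
          }
          return current;
      }
  }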
67 68
Tabu search

The word tabu (or taboo) means something prohibited or forbidden.

Tabu search is another combinatorial search heuristic that combines some of the features of hill-climbing and simulated annealing. However it can only be used in slightly more restricted circumstances.

Tabu search attempts to combat two obvious weaknesses of hill-climbing and simulated annealing: the inability of hill-climbing to escape from local minima, and the early waste of time in simulated annealing where the temperature is very high and the search is proceeding almost randomly with almost no pressure to improve the solution quality.

Tabu search attempts to spend almost all of its time close to local minima, while still having the facility to escape them.

The basic idea

The basic idea of a tabu search is that it always maintains a tabu list detailing the last h vertices that it has visited.

• Select the best possible neighbour T′ of T.
• If T′ is not on the tabu list, then move to it and update the tabu list accordingly.

We notice that the tabu search is very aggressive: it always seeks to move in the best possible direction. Without a tabu list this process would always end in a cycle of length 2, with the algorithm flipping between a local minimum and its nearest neighbour.

The tabu list prevents the search from immediately returning to a recently visited tour and (hopefully) forces it to take a different track out of that local minimum.
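A rough Java sketch of this loop, keeping the last h visited states in a first-in first-out tabu list, is given below; the Neighbourhood interface, the iteration limit and the choice to store whole states on the list are illustrative assumptions.

  import java.util.*;

  class TabuSearch {
      interface Neighbourhood {
          List<List<Integer>> neighbours(List<Integer> state);
          double cost(List<Integer> state);
      }

      static List<Integer> search(Neighbourhood nb, List<Integer> start, int h, int maxIters) {
          List<Integer> current = start, best = start;
          Deque<List<Integer>> tabu = new ArrayDeque<>();   // the last h visited states
          for (int iter = 0; iter < maxIters; iter++) {
              List<Integer> bestNeighbour = null;
              for (List<Integer> cand : nb.neighbours(current)) {
                  if (tabu.contains(cand)) continue;        // forbidden for now
                  if (bestNeighbour == null || nb.cost(cand) < nb.cost(bestNeighbour))
                      bestNeighbour = cand;
              }
              if (bestNeighbour == null) break;             // whole neighbourhood is tabu
              current = bestNeighbour;                      // move even if it is uphill
              tabu.addLast(current);
              if (tabu.size() > h) tabu.removeFirst();      // keep only the last h states
              if (nb.cost(current) < nb.cost(best)) best = current;
          }
          return best;
      }
  }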
69 70

Practical considerations

The main problem of tabu search is that at each iteration it requires complete enumeration of the neighbourhood of a vertex; this may be prohibitively expensive.

Similarly to choosing a cooling schedule for simulated annealing, a tabu schedule must be chosen for tabu search. It is important to choose the length of the tabu list very carefully; this is again very problem-specific.

On the positive side, tabu search manages to examine many more "close-to-optimum" solutions than simulated annealing.

Another positive feature of tabu search is that provided care is taken to prevent cycling, the search can be left running for as long as resources allow, while the length of a simulated annealing run is usually fixed in advance.

Tabu search for graph domination

The best dominating sets for the football pool graphs were largely constructed by tabu search techniques, together with a mathematical construction that reduces the search to smaller but denser graphs.

There are many practical considerations in implementing a tabu search: firstly it is necessary to be very efficient in evaluating the cost function on the neighbouring states.

There are also many variants on a tabu search, for example only searching a portion of the neighbourhood of a given state, maybe by concentrating on the moves that are likely to result in an improvement rather than all possible moves.
71 72
Genetic algorithms

Genetic algorithms provide an entirely different approach to the problems of combinatorial optimization.

Like simulated annealing, genetic algorithms try to model a physical process that improves "quality"; in this case the physical process is evolution.

A genetic algorithm proceeds by maintaining a pool containing many feasible solutions, each with its associated fitness.

At each iteration, a new population of solutions is created by breeding and mutation, with the fitter solutions being more likely to procreate.

As usual there are several parameters to fine-tune the algorithm such as population size, mutation frequency and so on.

A glimpse of GAs

Each solution is encoded as a string. Breeding two strings involves selecting a position at random, breaking the strings into a head and tail at that point, and swapping tails. This operation is referred to as cross-over:

  Parents:   ABCDEFGHIJKL    abcdefghijkl
  Children:  ABCDefghijkl    abcdEFGHIJKL

Parents are chosen in direct proportion to their fitness so that the fitter strings breed more often.

Mutation involves arbitrarily altering one of the elements of the string.
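A small Java sketch of one-point cross-over and mutation on equal-length strings, matching the example above (class and method names are my own):

  import java.util.Random;

  class CrossOver {
      // Returns the two children obtained by swapping tails after a random cut point.
      // The two parent strings are assumed to have the same length.
      static String[] crossOver(String mum, String dad, Random rng) {
          int cut = 1 + rng.nextInt(mum.length() - 1);      // cut strictly inside the string
          String child1 = mum.substring(0, cut) + dad.substring(cut);
          String child2 = dad.substring(0, cut) + mum.substring(cut);
          return new String[] { child1, child2 };
      }

      // Mutation: arbitrarily alter one element of the string.
      static String mutate(String s, char[] alphabet, Random rng) {
          char[] chars = s.toCharArray();
          chars[rng.nextInt(chars.length)] = alphabet[rng.nextInt(alphabet.length)];
          return new String(chars);
      }
  }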
73 74

GAs for combinatorial optimization?

Although GAs have their adherents it may not be easy to adapt them successfully to combinatorial optimization problems such as TRAVELLING SALESMAN and GRAPH DOMINATION.

The problem here seems to be that there is no way one can arbitrarily combine two tours to create a third tour; simply hacking two tours apart and joining the bits together will not work in general.

Similarly, it is hard to come up with a good representation for a candidate dominating set in such a way that arbitrary cross-over does not destroy all its good properties.

The crucial distinction seems to be that hill-climbing, simulated annealing and tabu search are all local search methods whereas a genetic algorithm is not.

Summary

1. Greedy algorithms solve optimization problems by searching in the best local direction. They are applied in the activity selection problem, Huffman coding and some graph algorithms.

2. Vertex cover, travelling salesman and the 0-1 knapsack problem are all instances of NP-complete problems (i.e. problems for which no feasible algorithm is known).

3. A dynamic programming solution exists for the 0-1 knapsack problem.

4. Linear programmes are problems of optimizing a linear cost function, subject to linear constraints. They can be applied in many optimization problems, and may be solved by the simplex algorithm.

Recommended reading: CLRS, Chapter 35
75 76
Summary cont.

5. Heuristic algorithms can be applied to approximate optimal solutions to geometric instances of the travelling salesman problem.

6. Other heuristic methods include hill-climbing, simulated annealing, tabu search and genetic algorithms.

The End?
77 78
