Merged Lecture Notes
Algorithmics I
Algorithmics I, 2022

1 Sorting and Tries
Sorting - Comparison based sorting
Claim: no sorting algorithm that is based on pairwise comparison
of values can be better than O(n log n)
[Figure: decision tree for comparison-based sorting: the root compares a1 > b1; its "no" branch compares a2 > b2, its "yes" branch compares a3 > b3, and so on]
We have shown the decision tree has at least n! leaf nodes
We have shown: the worst case requires h comparisons, and 2^(h+1) ≥ n!
− h is the height of the decision tree
− n is the number of items to be sorted
Therefore:
h+1 ≥ log2(n!)
     > log2((n/2)^(n/2))              (since n! > (n/2)^(n/2))
     = (n/2) log2(n/2)                (since log a^b = b log a)
     = (n/2) log2 n − (n/2) log2 2    (since log(a/b) = log a − log b)
     = (n/2) log2 n − n/2             (since log2 2 = 1)
so h = Ω(n log n), which proves the claim.
Sorting – Radix sorting
We have shown that no sorting algorithm based on pairwise
comparisons can be better than O(n log n) in the worst case
− therefore to improve on this worst case bound, we have to devise a
method based on something other than comparisons
Sorting – Radix sorting - Algorithm
Each item has bit positions labelled 0,1,…,m-1
− bit 0 being the least significant (i.e. the right-most)
item = 0010100100110001
The algorithm uses m/b iterations
− in each iteration the items are distributed into 2^b buckets
− a bucket is just a list
− the buckets are labelled 0,1,…,2^b−1 (or, equivalently, 00...0 to 11...1)
− during the ith iteration, an item is placed in the bucket corresponding to
the integer represented by the bits in positions b×i−1,…,b×(i−1)
• e.g. for b=4 and i=2, consider the bits in positions 7,…,4
item = 0010100100110001
• these bits are 0011, which represents the integer 3
• so the item is placed in the bucket labelled 3 (or, equivalently, 0011)
− at the end of an iteration the buckets are concatenated to give a new
sequence, which is used as the starting point of the next iteration
Sorting – Radix sorting - Example
Suppose we want to sort the following sequence with Radix sort (taking b = 4)
15 43 5 27 60 18 26 2
Sequence: 15 43 5 27 60 18 26 2
New sequence: 18 2 5 26 43 27 60 15
Sorting – Radix sorting - Pseudocode
// suppose that:
// a is the sequence to be sorted
// m is the number of bits in each item of the sequence a
// b is the ‘block length’ of radix sort
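The pseudocode body itself does not survive in these notes. A minimal Java sketch of the algorithm described above (assuming non-negative integer items; the class and method names are illustrative, not from the lecture):

```java
import java.util.ArrayList;
import java.util.List;

public class RadixSort {
    /** Sort a (non-negative m-bit integers) using blocks of b bits, where b divides m. */
    public static int[] radixSort(int[] a, int m, int b) {
        int numBuckets = 1 << b;                     // 2^b buckets
        for (int i = 1; i <= m / b; i++) {           // m/b iterations
            List<List<Integer>> buckets = new ArrayList<>();
            for (int j = 0; j < numBuckets; j++) buckets.add(new ArrayList<>());
            for (int item : a) {
                // bits in positions b*i-1, ..., b*(i-1) give the bucket label
                int label = (item >> (b * (i - 1))) & (numBuckets - 1);
                buckets.get(label).add(item);
            }
            int k = 0;                               // concatenate the buckets
            for (List<Integer> bucket : buckets)
                for (int item : bucket) a[k++] = item;
        }
        return a;
    }
}
```

With m = 8 and b = 4, the example sequence 15 43 5 27 60 18 26 2 becomes 18 2 5 26 43 27 60 15 after the first iteration, and is fully sorted after the second.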
Sorting – Radix sorting - Correctness
Let x and y be two items with x < y
− we need to show that x precedes y in the final sequence
Suppose j is the last iteration in which the relevant bits of x and y differ
− since x < y, and j is the last iteration in which they differ,
the relevant bits of x must be smaller than those of y
− therefore x goes into an ‘earlier’ bucket than y,
and hence x precedes y in the sequence after this iteration
− in every later iteration the relevant bits of x and y are equal, so they go into
the same bucket; items are appended to buckets in sequence order, so x stays ahead of y
Sorting – Radix sorting - Complexity
Number of iterations is m/b and number of buckets is 2^b
Time-space trade-off
− the larger the value of b, the smaller the multiplicative constant (m/b) in
the complexity function, and so the faster the algorithm becomes
− however, an array of size 2^b is required for the buckets,
so increasing b increases the space requirements
Tries (retrieval)
Binary search trees are comparison-based data structures
− a trie instead branches on the individual symbols of a key,
rather than comparing whole keys
Example: use a trie to store items whose key value is a string
− say the words in a dictionary
Tries - Examples
An example trie containing words from a 4-letter alphabet
[Figure: the trie; branches are labelled with letters from {a, e, r, t}, and words
correspond to paths from the root ending at word nodes]
Tries – Search algorithm (pseudo code)
// searching for a word w in a trie t
Node n = root of t;  // current node (start at root)
int i = 0;           // current position in word w (start at beginning)
while (true) {
    if (n has a child c labelled w.charAt(i)) {
        // can match the character of the word in the current position
        if (i == w.length()-1) {  // end of word
            if (c is an 'intermediate' node) return "absent";
            else return "present";
        }
        else {       // not at end of word
            n = c;   // move to child node
            i++;     // move to next character of word
        }
    }
    else return "absent";  // cannot match current character
}
Tries – Insertion algorithm (pseudo code)
Tries - Algorithms
Deletion of a string from a trie
− exercise
Tries - Implementation
Various possible implementations
− using an array of pointers to represent the children of each node
− using a linked list to represent the children of each node
− time/space trade-off
List implementation
[Figure: a trie node's four children (one per letter a, e, r, t) become a linked
list of nodes, each holding a letter and an is-word flag]
Tries – Class to represent dictionary tries
public class Node { // node of a trie
    private char letter;     // label on incoming branch
    private boolean isWord;  // true when node represents a word
    private Node sibling;    // next sibling (when it exists)
    private Node child;      // first child (when it exists)
    // ... plus a constructor and get/set methods for each field
}
Tries – Method to search
private enum Outcomes {PRESENT, ABSENT, UNKNOWN}

/** search trie for word w */
public boolean search(String w) {
    Outcomes outcome = Outcomes.UNKNOWN;
    int i = 0;                       // position in word so far searched (start at beginning)
    Node current = root.getChild();  // start with first child of root
    while (outcome == Outcomes.UNKNOWN) {
        if (current == null) outcome = Outcomes.ABSENT;       // dead-end
        else if (current.getLetter() == w.charAt(i)) {        // positions match
            if (i == w.length()-1) outcome = Outcomes.PRESENT; // matched word
            else {                                // descend one level…
                current = current.getChild();     // in trie
                i++;                              // in word being searched
            }
        }
        else current = current.getSibling();      // try next sibling
    }
    if (outcome != Outcomes.PRESENT) return false;
    else return current.getIsWord();  // true if current node represents a word
}
Tries – Method to insert
public void insert(String w) { // insert word w into trie
    int i = 0;                           // position in word (start at beginning)
    Node current = root;                 // current node of trie (start at root)
    Node next = current.getChild();      // child of current node we are testing
    while (i < w.length()) {             // not reached the end of the word
        if (next == null) {              // no more children to try: need a new node
            Node x = new Node(w.charAt(i));    // label with ith character of word
            x.setSibling(current.getChild());  // its sibling: first child of current
            current.setChild(x);         // make it the first child of current node
            current = x;                 // move to the new node
            next = current.getChild();   // update child node (null)
            i++;                         // next position in word
        } else if (next.getLetter() == w.charAt(i)) { // chars match: descend a level
            current = next;              // update current to the child node
            next = current.getChild();   // update child node
            i++;                         // next position in word
        } else next = next.getSibling(); // try next sibling
    }
    current.setIsWord(true);             // current node now represents word w
}
2 Graphs and Graph Algorithms

Graph basics

A graph G = (V, E), for example:
V = {a,b,c,x,y,z}
E = { {a,x},{a,y},{a,z},
      {b,x},{b,y},{b,z},
      {c,x},{c,y},{c,z} }
[Figure: this graph drawn in two different ways]
In this graph:
− vertices a & z are adjacent, that is, {a,z} is an element of the edge set E
− vertices a & b are non-adjacent, that is, {a,b} is not an element of E
− vertex a is incident to edge {a,x}
− a➝x➝b➝y➝c is a path of length 4 (number of edges)
− a➝x➝b➝y➝a is a cycle of length 4
− all vertices have degree 3
• i.e. all vertices are incident to three edges
Graph basics - Definitions
A graph is: connected, if every pair of vertices is joined by a path
[Figure: an example graph on vertices u, v, w, x, y, z]
A non-connected graph has two or more connected components
Graph basics – Directed graphs
A directed graph (digraph) D = (V,E)
− V is the finite set of vertices and E is the finite set of edges
− here each edge is an ordered pair (x,y) of vertices
[Figure: a digraph on vertices u, v, w, x, y, z]
− for example (u,v), (w,y), (y,w) ∈ E
− u is adjacent to v and v is adjacent from u
− y has in-degree 2 and out-degree 1
Graph representations – Undirected graphs
Undirected graph: Adjacency matrix
− one row and column for each vertex
− row i, column j contains a 1 if the ith and jth vertices are adjacent, and 0 otherwise
Undirected graph G
[Figure: graph G on vertices u, v, w, x, y, z]

Adjacency matrix:        Adjacency lists:
   u v w x y z
u: 0 1 0 1 0 0           u: v➝x
v: 1 0 1 1 1 0           v: u➝w➝x➝y
w: 0 1 0 1 1 0           w: v➝x➝y
x: 1 1 1 0 1 0           x: u➝v➝w➝y
y: 0 1 1 1 0 1           y: v➝w➝x➝z
z: 0 0 0 0 1 0           z: y

|V|×|V| array            2×|E| entries in all
Graph representations – Directed graphs
Directed graph: Adjacency matrix
− one row and column for each vertex
− row i, column j contains a 1 if there is an edge from i to j
and 0 otherwise
Graph representations – Directed graphs
Directed graph D v x
u
w z
y
Algorithmics I, 2022 11
Implementation – Adjacency lists
Recall adjacency list for an undirected graph
− one list for each vertex
− list i contains an element for j if the vertices i and j are adjacent
import java.util.LinkedList; // we require the linked list class

/** class to represent a graph */
public class Graph {
    // ... fields and methods omitted
}
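The class bodies are not reproduced in these notes. A minimal sketch of an adjacency-list representation, with field and method names chosen to be consistent with the DFS slides that follow (they are guesses, not the lecture's actual code):

```java
import java.util.LinkedList;

/** Node of an adjacency list: stores the index of an adjacent vertex. */
class AdjListNode {
    private int vertexIndex;
    public AdjListNode(int vertexIndex) { this.vertexIndex = vertexIndex; }
    public int getVertexIndex() { return vertexIndex; }
}

/** A vertex, with the bookkeeping fields used by the traversal slides. */
class Vertex {
    private LinkedList<AdjListNode> adjList = new LinkedList<>();
    private boolean visited;  // used by traversals
    private int pred;         // predecessor index in a traversal
    private int index;        // index of this vertex
    public Vertex(int index) { this.index = index; this.pred = -1; }
    public LinkedList<AdjListNode> getAdjList() { return adjList; }
    public void addToAdjList(int j) { adjList.addLast(new AdjListNode(j)); }
    public boolean getVisited() { return visited; }
    public void setVisited(boolean b) { visited = b; }
    public void setPred(int p) { pred = p; }
    public int getPred() { return pred; }
    public int getIndex() { return index; }
}

/** A graph as an array of vertices, each with its adjacency list. */
public class Graph {
    private Vertex[] vertices;
    public Graph(int n) {
        vertices = new Vertex[n];
        for (int i = 0; i < n; i++) vertices[i] = new Vertex(i);
    }
    public Vertex getVertex(int i) { return vertices[i]; }
    /** add undirected edge {i, j} */
    public void addEdge(int i, int j) {
        vertices[i].addToAdjList(j);
        vertices[j].addToAdjList(i);
    }
    public int size() { return vertices.length; }
}
```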
Graph search and traversal algorithms
− a systematic way to explore a graph (when starting from some vertex)
[Figure: a digraph on vertices u, v, w, x, y, z]
Example: web crawler collects data from hypertext documents
by traversing a directed graph D where
− vertices are hypertext documents
− (u,v) is an edge if document u contains a hyperlink to document v
Depth first search/traversal (DFS)
From starting vertex
− follow a path of unvisited vertices until path can be extended no further
− then backtrack along the path until an unvisited vertex can be reached
− continue until we cannot find any unvisited vertices
Repeat for other components (if any)
Depth first traversal - Example
Undirected graph G
[Figure: an undirected graph with numbered vertices showing a depth-first traversal]
Implementation – DFS – Add to vertex class
Implementation – DFS – Add to graph class
/** visit vertex v, with predecessor index p, during a dfs */
private void visit(Vertex v, int p) {
    v.setVisited(true);  // update as now visited
    v.setPred(p);        // set predecessor (indicates edge used to find vertex)
    LinkedList<AdjListNode> L = v.getAdjList();  // get adjacency list
    for (AdjListNode node : L) {                 // consider each adjacent vertex
        Vertex u = getVertex(node.getVertexIndex());
        if (!u.getVisited()) visit(u, v.getIndex()); // recurse on unvisited ones
    }
}
Analysis – Depth first search
Each vertex is visited, and each element in the adjacency lists is
processed, so overall O(n+m)
− where n is the number of vertices and m the number of edges
Some applications
− to determine if a given graph is connected
− to identify the connected components of a graph
− to determine if a given graph contains a cycle (see tutorial questions)
− to determine if a given graph is bipartite (see tutorial questions)
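The first two applications can be sketched in a few lines (a self-contained version using plain int adjacency lists rather than the Vertex class; names are illustrative):

```java
public class Dfs {
    /** Number of connected components of an undirected graph given by
     *  adjacency lists adj; the graph is connected iff the result is 1. */
    public static int components(int[][] adj) {
        boolean[] visited = new boolean[adj.length];
        int count = 0;
        for (int v = 0; v < adj.length; v++)
            if (!visited[v]) {   // unvisited: start a new traversal
                count++;         // one component per restart
                visit(adj, visited, v);
            }
        return count;
    }

    private static void visit(int[][] adj, boolean[] visited, int v) {
        visited[v] = true;       // mark v as visited
        for (int w : adj[v])     // recurse on each unvisited neighbour
            if (!visited[w]) visit(adj, visited, w);
    }
}
```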
Breadth first search/traversal (BFS)
Search fans out as widely as possible at each vertex
− from the current vertex, visit all the adjacent vertices
this is referred to as processing the current vertex
− vertices are processed in the order in which they are visited
− continue until all vertices in current component have been processed
− then repeat for other components (if there are any)
Breadth first traversal - Example
Undirected graph G
[Figure: an undirected graph on vertices a, b, c, d, e, f, g, h, showing a
breadth-first traversal]
Analysis – Breadth first search
Complexity
− each vertex is visited and queued exactly once
− each adjacency list is traversed once
− so overall O(n+m) (n is the number of vertices and m the number of edges)
− can adapt to the adjacency matrix representation, but then O(n^2) (as for DFS)
Example application
− finding the distance between two vertices, say v and w, in a graph
− the distance is the number of edges in the shortest path from v to w
− assign distance to v to be 0
− carry out a breadth-first search from v
− when visiting a new vertex for first time, assign its distance to be
1 + the distance to its predecessor in the BF spanning tree
− stop when w is reached
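The distance computation just described can be sketched as follows (self-contained, with adjacency lists as plain int arrays; class and method names are illustrative):

```java
import java.util.ArrayDeque;
import java.util.Queue;

public class BfsDistance {
    /** Distance (number of edges) from v to w in an unweighted graph given
     *  by adjacency lists adj; returns -1 if w is unreachable from v. */
    public static int distance(int[][] adj, int v, int w) {
        int n = adj.length;
        int[] dist = new int[n];
        java.util.Arrays.fill(dist, -1);   // -1 marks "not yet visited"
        dist[v] = 0;                       // assign distance to v to be 0
        Queue<Integer> queue = new ArrayDeque<>();
        queue.add(v);
        while (!queue.isEmpty()) {
            int u = queue.remove();        // process vertices in visit order
            if (u == w) return dist[u];    // stop when w is reached
            for (int x : adj[u])
                if (dist[x] == -1) {       // first visit: 1 + predecessor's distance
                    dist[x] = dist[u] + 1;
                    queue.add(x);
                }
        }
        return dist[w];
    }
}
```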
Distance between two vertices - Example
Distance between v and w
− assign distance to v to be 0
− carry out a breadth-first search from v
− when visiting a new vertex for the first time,
assign its distance to be 1 + the distance
to its predecessor in the BF spanning tree
[Figure: a graph with each vertex labelled by its distance from v (0, 1, 2, …);
a shortest path from v to w has length 2]
Weighted graphs
Each edge e has an integer weight given by wt(e)>0
− graph may be undirected or directed
− weight may represent length, cost, capacity, etc
− if an edge is not part of the graph its weight is infinity
[Figure: a weighted graph on vertices u, v, w, x, y, z]

Adjacency matrix (0 denotes no edge):     Adjacency lists:
  u v w x y z
u 0 4 5 7 0 0                             u: v(4)➝w(5)➝x(7)
v 4 0 5 6 0 0                             v: u(4)➝w(5)➝x(6)
w 5 5 0 0 4 5                             w: u(5)➝v(5)➝y(4)➝z(5)
x 7 6 0 0 8 6                             x: u(7)➝v(6)➝y(8)➝z(6)
y 0 0 4 8 0 7                             y: w(4)➝x(8)➝z(7)
z 0 0 5 6 7 0                             z: w(5)➝x(6)➝y(7)
Weighted graphs - Shortest Paths
Given a weighted directed or undirected graph and two vertices u and v,
find a shortest path between u and v (for a directed graph, from u to v)
− where the length of a path is the sum of the weights of its edges
Applications include:
− flight reservations
− internet packet routing
− driving directions
Edsger Dijkstra, in an interview in 2010...
"… the algorithm for the shortest path, which I designed in
about 20 minutes. One morning I was shopping in Amsterdam
with my young fiancée, and tired, we sat down on the cafe
terrace to drink a cup of coffee, and I was just thinking about
whether I could do this, and I then designed the algorithm
for the shortest path."
Dijkstra, E.W. A note on two problems in Connexion with graphs.
Numerische Mathematik 1, 269–271 (1959)
Dijkstra describes the algorithm in English in 1956 (he was 26 years old)
− most people were programming in assembly language
− only one high-level language: Fortran by John Backus at IBM and not quite finished
No big O notation in 1959, in the paper, Dijkstra says: “my solution is preferred
to another one … the amount of work to be done seems considerably less.”
Dijkstra’s algorithm (as seen in NOSE2)
The algorithm finds shortest paths between one vertex u and all others
− based on maintaining a set S containing all vertices for which the shortest
path from u is currently known
− S initially contains only u (obviously the shortest path from u to u has length 0)
− eventually S contains all the vertices (so all shortest paths are known)
[Figure: the set S containing u, with labels d(v) and d(w) on vertices v and w outside S]
Dijkstra’s algorithm – Edge relaxation
Each vertex v has a label d(v) indicating the length of a shortest
path between u and v passing only through vertices in S
− suppose v and w are not in S; then we know
• the shortest path between u and v passing only through S equals d(v)
• the shortest path between u and w passing only through S equals d(w)
− now suppose v is added to S and the edge e = {v,w} has weight wt(e)
− calculate the shortest path between u and w passing only through S∪{v}:
its length is min( d(w), d(v) + wt(e) ), so set d(w) to this value (edge relaxation)
S = {u}; // initialise S
for (each vertex w) d(w) = wt(u,w); // initialise lengths
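The rest of the pseudocode body does not survive in these notes; a minimal O(n^2) Java sketch of the algorithm as described, using a weight matrix (names are illustrative, not the lecture's code):

```java
public class Dijkstra {
    static final int INF = Integer.MAX_VALUE / 2;  // "infinity" (safe to add to)

    /** Shortest-path lengths from vertex u to every vertex, where wt[i][j]
     *  is the weight of edge {i,j} (INF if the edge is absent). */
    public static int[] shortestPaths(int[][] wt, int u) {
        int n = wt.length;
        boolean[] inS = new boolean[n];   // S: vertices with known shortest path
        int[] d = new int[n];
        inS[u] = true;                    // S = {u}
        for (int w = 0; w < n; w++) d[w] = wt[u][w];  // initialise lengths
        d[u] = 0;
        for (int k = 1; k < n; k++) {     // until S contains all vertices
            int v = -1;                   // find closest vertex not in S
            for (int w = 0; w < n; w++)
                if (!inS[w] && (v == -1 || d[w] < d[v])) v = w;
            inS[v] = true;                // add v to S
            for (int w = 0; w < n; w++)   // relax edges {v, w}
                if (!inS[w] && d[v] + wt[v][w] < d[w])
                    d[w] = d[v] + wt[v][w];
        }
        return d;
    }
}
```

On the weighted example graph shown earlier (vertices u, v, w, x, y, z in that order), this yields distances 0, 4, 5, 7, 9, 10 from u.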
Dijkstra’s algorithm – Complexity

Dijkstra’s algorithm – Pseudo code
Applications (of minimum spanning trees, the next topic) include:
− design of networks for computer, telecommunications, transportation,
gas, electricity, ...
− clustering, approximating the travelling salesman problem
Weighted graphs – Example – Spanning tree
Weighted graph G
[Figure: the weighted graph G]
− spanning tree: a subgraph which is both a tree and ‘spans’ every vertex

Spanning tree for G
− weight 28
[Figure: a spanning tree of G obtained by deleting edges while the remaining edges
still ‘span’ the vertices; when no more edges can be deleted, we have a tree]
Minimum weight spanning tree problem
An example of a problem in combinatorial optimisation
− find ‘best’ way of doing something among a (large) number of candidates
− can always be solved, at least in theory, by exhaustive search
− however this may be infeasible
− typically an exponential-time algorithm
The Prim-Jarnik algorithm
The minimum spanning tree is constructed by choosing a sequence of edges:

set an arbitrary vertex r to be a tree-vertex (tv);
set all other vertices to be non-tree-vertices (ntv);
while (number of ntv > 0) {
    find edge e = {p,q} of graph such that
        p is a tv;
        q is an ntv;
        wt(e) is minimised over such edges;
    adjoin edge e to the (spanning) tree;
    make q a tv;
}

Analysis (n is the number of vertices)
− initialisation O(n) (n operations to set vertices to be tv or ntv)
− the outer loop is executed n-1 times
− the inner loop checks all edges from a tree-vertex to a non-tree-vertex
− there can be O(n^2) of these each time, so overall the algorithm is O(n^3)
The Prim-Jarnik algorithm – Example
Weighted graph G
[Figure: the weighted graph G on vertices u, v, w, x, y, z]

Minimum spanning tree for G
− weight 24
[Figure: the minimum spanning tree, containing edges of weights 4, 5, 6, 5 and 4]
Dijkstra’s refinement
Introduce an attribute bestTV for each non-tree-vertex (ntv) q
− bestTV is set to the tree-vertex (tv) p for which wt({p,q}) is minimised
Dijkstra’s refinement – Analysis
set an arbitrary vertex r to be a tree-vertex (tv);
set all other vertices to be non-tree-vertices (ntv);
for (each ntv s) set s.bestTV = r; // r is the only tv

− initialisation is O(n)
− the while loop is executed n-1 times
− the first part takes O(n)
• O(n) to find the minimal ntv, and O(1) to adjoin the edge and update
− the second part (the inner loop) takes O(n)
• for each ntv s we only need to compare weights for s.bestTV and the newly
added tv (i.e. q) to update the value of s.bestTV
− so the first and second parts each take O(n), and overall the algorithm is O(n^2)
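Dijkstra's refinement can be sketched as follows (weight-matrix representation, O(n^2); a minimal sketch with illustrative names, not the lecture's code):

```java
public class PrimJarnik {
    static final int INF = Integer.MAX_VALUE;   // "no edge"

    /** Total weight of a minimum spanning tree of a connected graph with
     *  weight matrix wt (wt[i][j] = INF when there is no edge {i,j}). */
    public static int mstWeight(int[][] wt) {
        int n = wt.length;
        boolean[] isTv = new boolean[n];        // tree-vertices
        int[] bestTV = new int[n];              // best tree-vertex for each ntv
        isTv[0] = true;                         // arbitrary start vertex r = 0
        for (int s = 1; s < n; s++) bestTV[s] = 0;  // r is the only tv
        int total = 0;
        for (int k = 1; k < n; k++) {           // while loop: n-1 iterations
            int q = -1;                         // first part: find minimal ntv q
            for (int s = 0; s < n; s++)
                if (!isTv[s] && (q == -1 || wt[bestTV[s]][s] < wt[bestTV[q]][q]))
                    q = s;
            total += wt[bestTV[q]][q];          // adjoin edge {bestTV[q], q}
            isTv[q] = true;                     // make q a tv
            for (int s = 0; s < n; s++)         // second part: update bestTV
                if (!isTv[s] && wt[q][s] < wt[bestTV[s]][s])
                    bestTV[s] = q;              // new tv q is a better neighbour
        }
        return total;
    }
}
```

On the weighted example graph (vertices u, v, w, x, y, z in that order), this returns 24, the weight found in the slides.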
Dijkstra’s refinement - Example
Weighted graph G
[Figure: the weighted graph G on vertices u, v, w, x, y, z]

Minimum spanning tree for G
− weight 24

q   q.bestTV   wt({q.bestTV,q})
u   -          -
v   -          -
w   -          -
x   -          -
y   -          -
z   -          -
The Prim-Jarnik algorithm – Correctness
Is the algorithm correct?
− i.e. does it return a minimum weight spanning tree for any graph G?
Proof:
− suppose for graph G the algorithm returns the tree T
− compare T with a minimum spanning tree X of G
− if they are the same we are happy (it is a minimum weight spanning tree)
− therefore remains to consider the case when they are different…
Suppose that T and X are different
− T is the tree returned by the algorithm and X a minimum spanning tree of G
− let e be the first edge chosen to be in T that is not in X
− adding e to X, we get a cycle C (since X is a spanning tree)
− when the algorithm chose e, e joined a vertex in S (the tree-vertices chosen so
far) to a vertex not in S
− the cycle C must therefore contain another edge f joining a vertex in S to a
vertex not in S
− we also have wt(f) ≥ wt(e), since the algorithm picks e and not f
− so we can replace f by e in X to get another spanning tree Y
− since wt(f) ≥ wt(e), the weight of Y cannot be greater than that of X, and
since X is minimal, Y is also minimal
− continuing the process, we can convert X to T maintaining minimality,
which proves that T is indeed a minimum weight spanning tree
− hence the algorithm is correct
[Figures: T and X with edge e, the cycle C created by adding e to X, and the
edge f of C crossing from S to outside S]
Directed Acyclic Graphs - Topological ordering
A Directed Acyclic Graph (DAG) is a directed graph with no cycles
Basic fact: a DAG has at least one source and at least one sink
− forms the basis of a topological ordering algorithm
Directed Acyclic Graphs - Example
Directed acyclic graph D
− with more than one source and more than one sink
[Figure: the DAG D, with its vertices labelled 1–9 in a topological ordering]
− source vertex: in-degree equals 0
− sink vertex: out-degree equals 0
Topological ordering algorithm
// assume each vertex has 2 integer attributes: label and count
// count is the number of incoming edges from unlabelled vertices
// label will give the topological ordering
for (each vertex v)  // add vertices with no incoming edges to the queue
    if (v.getCount() == 0) add v to sourceQueue;  // i.e. source vertices

[Figure: worked example on a DAG with vertices r, s, t, u, v, w, x, y, z; source
vertices are added to the queue, then repeatedly removed and labelled 1, 2, 3, …,
giving a topological ordering of D]
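The remainder of the pseudocode is not reproduced in these notes; a minimal Java sketch of the queue-based algorithm (adjacency lists as int arrays; names are illustrative):

```java
import java.util.ArrayDeque;
import java.util.Queue;

public class TopologicalOrder {
    /** Topological labels (1..n) for a DAG given by adjacency lists adj,
     *  where adj[v] lists the vertices with an edge from v. */
    public static int[] order(int[][] adj) {
        int n = adj.length;
        int[] count = new int[n];           // incoming edges from unlabelled vertices
        for (int[] list : adj)
            for (int w : list) count[w]++;
        Queue<Integer> sourceQueue = new ArrayDeque<>();
        for (int v = 0; v < n; v++)         // all sources join the queue
            if (count[v] == 0) sourceQueue.add(v);
        int[] label = new int[n];
        int next = 1;                       // next label to assign
        while (!sourceQueue.isEmpty()) {
            int v = sourceQueue.remove();
            label[v] = next++;              // label v
            for (int w : adj[v])            // v no longer counts as unlabelled
                if (--count[w] == 0) sourceQueue.add(w);  // w became a source
        }
        return label;                       // a 0 remaining here indicates a cycle
    }
}
```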
Topological ordering algorithm - Correctness
A vertex is given a label only when the number of incoming edges
from unlabelled vertices is zero
− all predecessor vertices must already be labelled with smaller numbers
− this depends on using a queue (first in, first out) for the labelling
Topological ordering algorithm - Analysis
Analysis (n vertices, m edges)
• for the adjacency lists representation
− finding the in-degree of each vertex is O(n+m) (scan the adjacency lists)
− the main loop is executed n times, and within it one adjacency list is scanned
(and the same list is never scanned twice)
− so every list is scanned once more, and the overall algorithm is O(n+m)
Deadlock detection
Determining whether a digraph contains a cycle
− the topological ordering algorithm labels every vertex if and only if the digraph
is acyclic, so it can be used to detect a cycle (and hence potential deadlock)
[Figure: a compression algorithm transforms an original file into a smaller compressed file]
3 Strings and text algorithms
Text compression
Examples of text compression
− compress, gzip in Unix, ZIP utilities for Windows, …
− two main approaches: statistical and dictionary
Huffman tree construction - Example
Space E A T I S R O N U H C D
Character frequencies: 15 11 9 8 7 7 7 6 4 3 2 1 1
Huffman code - Example
Huffman tree:
[Figure: the Huffman tree; internal nodes carry combined frequencies (root 81),
and the characters appear at the leaves]

Huffman code:
Space 10    I 0000    U 00101
E 010       S 0001    H 001001
A 111       R 0011    C 0010000
T 110       O 0110    D 0010001
            N 0111
The code has the prefix property: no codeword is a prefix of another
− equivalently: no path to one character is a prefix of the path to another
(since characters are only found at leaves)
Huffman encoding - Optimality
Weighted path length (WPL) of a tree T
− ∑ (weight)×(distance from root), where the sum is over all leaf nodes
− for the example tree, WPL = 7×4 + 7×4 + 1×7 + 1×7 + 2×6 +
3×5 + 7×4 + 11×3 + 6×4 + 4×4 + 15×2 + 8×3 + 9×3 = 279
A Huffman tree has minimum WPL over all binary trees with the
given leaf weights
− the Huffman tree need not be unique (e.g. when more than two nodes share the
minimum weight, different merge choices are possible)
− however, all Huffman trees for a given set of frequencies have the same WPL
− so what?
− the weighted path length is the number of bits in the compressed file
• bits = sum over characters of (frequency of character × code length of character)
− so a Huffman tree minimises this number
− hence Huffman coding is optimal among all codes built in this way
Huffman encoding – Algorithmic requirements
Building the Huffman tree
− if the text length equals n and there are m distinct characters in the text
− O(n) time to find the frequencies
− O(m log m) time to construct the code, for example using a (min-)heap
to store the parentless nodes and their weights
• initially build a heap whose nodes correspond to the m characters, labelled
by their frequencies; building the heap takes O(m) time
• one iteration takes O(log m) time:
find and remove (O(log m) each) the two minimum weights,
then insert (O(log m)) the new weight (the sum of the two minima)
• there are m-1 iterations, since each iteration decreases the size of the heap by 1
− so O(n + m log m) overall
− in fact, m is essentially a constant, so it is really O(n)
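The heap-based construction just described might be sketched as follows (a minimal sketch using java.util.PriorityQueue in place of a hand-built heap; the node fields and method names are illustrative):

```java
import java.util.PriorityQueue;

public class Huffman {
    /** Node of a Huffman tree. */
    static class Node {
        int weight;          // frequency, or sum of the children's frequencies
        char ch;             // character (leaf nodes only)
        Node left, right;
        Node(int weight, char ch) { this.weight = weight; this.ch = ch; }
        Node(Node left, Node right) {
            this.weight = left.weight + right.weight;
            this.left = left; this.right = right;
        }
    }

    /** Build a Huffman tree from characters and their frequencies. */
    public static Node buildTree(char[] chars, int[] freq) {
        PriorityQueue<Node> heap = new PriorityQueue<>((a, b) -> a.weight - b.weight);
        for (int i = 0; i < chars.length; i++)      // O(m) leaf insertions
            heap.add(new Node(freq[i], chars[i]));
        while (heap.size() > 1) {                   // m-1 iterations
            Node x = heap.remove();                 // two minimum weights...
            Node y = heap.remove();
            heap.add(new Node(x, y));               // ...merged under a new parent
        }
        return heap.remove();                       // root; weight = total frequency
    }

    /** Weighted path length: sum over leaves of weight × depth. */
    public static int wpl(Node t, int depth) {
        if (t.left == null) return t.weight * depth;
        return wpl(t.left, depth + 1) + wpl(t.right, depth + 1);
    }
}
```

On the frequencies of the example above, any Huffman tree built this way has root weight 81 and WPL 279.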
Compression & decompression are both O(n) time
− assuming m is constant
Problem: some representation of the Huffman tree must be stored
with the compressed file
− otherwise decompression would be impossible
Alternatives
− use a fixed set of frequencies based on typical values for text
• but this will usually reduce the compression ratio
− use adaptive Huffman coding: the (same) tree is built and adapted by the
compressor and by the decompressor as characters are encoded/decoded
• this slows down compression and decompression (but not by much if
done in a clever way)
Algorithmics I, 2022 13
3 Strings and text algorithms 113
LZW compression
A popular dictionary-based method
− the basis of compress and gzip in Unix; also used in the gif and tiff formats
− due to Lempel, Ziv and Welch
− the algorithm was patented by Unisys (but the patent has now expired)
Algorithmics I, 2022 14
3 Strings and text algorithms 114
LZW compression
The dictionary is built dynamically during compression
− and also during decompression
Algorithmics I, 2022 15
3 Strings and text algorithms 115
LZW compression
Key question: how many bits are in a codeword?
− in the most used version of the algorithm, this value changes as the
compression (or decompression) algorithm proceeds
Algorithmics I, 2022 16
3 Strings and text algorithms 116
LZW compression – Pseudo code
set current text position i to 0;
initialise codeword length k (say to 8);
initialise the dictionary d;
Algorithmics I, 2022 17
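A minimal Java sketch of the compression step (names are our own, not from the notes). It makes two simplifying assumptions: the dictionary is a HashMap initialised with codes 0–255 for single characters, and the codeword-length bookkeeping (the value k in the pseudocode) is omitted, so codes simply grow without bound.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LZWCompress {
    /** LZW compression sketch: returns the sequence of dictionary codes. */
    static List<Integer> compress(String text) {
        Map<String, Integer> d = new HashMap<>();
        for (int c = 0; c < 256; c++) d.put("" + (char) c, c); // initial dictionary
        int next = 256;                       // next free code
        List<Integer> out = new ArrayList<>();
        int i = 0;                            // current text position
        while (i < text.length()) {
            // find the longest dictionary entry matching the text at position i
            int j = i + 1;
            while (j < text.length() && d.containsKey(text.substring(i, j + 1))) j++;
            out.add(d.get(text.substring(i, j)));     // output its codeword
            if (j < text.length())                    // add match + next char
                d.put(text.substring(i, j + 1), next++);
            i = j;                                    // continue after the match
        }
        return out;
    }
}
```

For the special-case text A A B A B A B A A used later in the notes, this produces the codes 65, 65, 66, 257, 259, 65 (65 = 'A', 66 = 'B').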
3 Strings and text algorithms 117
LZW compression - Variants
Constant codeword length: fix the codeword length for all time
− the dictionary has fixed capacity: when full, just stop adding to it
Algorithmics I, 2022 18
3 Strings and text algorithms 118
LZW compression - Example
Text = G A C G A T A C G A T A C G
File size = 14 bytes, or 28 bits if 2 bits/char
Algorithmics I, 2022 19
3 Strings and text algorithms 119
LZW decompression
Decompression algorithm builds same dictionary as compression
algorithm
− but one step out of phase
Algorithmics I, 2022 20
3 Strings and text algorithms 120
LZW decompression – Pseudo code
initialise codeword length k;
initialise the dictionary;
read the first codeword x from the compressed file f; // i.e. read k bits
String s = d.lookUp(x); // look up codeword in dictionary
output s; // output decompressed string
Algorithmics I, 2022 21
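The decompression loop can be sketched in the same style (again, a minimal sketch with our own names, reading a list of integer codes rather than k-bit codewords from a file). It rebuilds the dictionary one step behind the compressor, including the special case, discussed below, of a codeword that is not yet in the dictionary.

```java
import java.util.ArrayList;
import java.util.List;

public class LZWDecompress {
    /** LZW decompression sketch: rebuilds the compressor's dictionary,
        one step out of phase. */
    static String decompress(List<Integer> codes) {
        List<String> d = new ArrayList<>();           // dictionary: code -> string
        for (int c = 0; c < 256; c++) d.add("" + (char) c);
        StringBuilder out = new StringBuilder();
        String prev = d.get(codes.get(0));            // first codeword: direct lookup
        out.append(prev);
        for (int k = 1; k < codes.size(); k++) {
            int x = codes.get(k);
            String s = (x < d.size()) ? d.get(x)      // normal case
                     : prev + prev.charAt(0);         // special case: not yet in dictionary
            out.append(s);
            d.add(prev + s.charAt(0));                // the entry the compressor added
            prev = s;
        }
        return out.toString();
    }
}
```

Running it on the codes 65, 65, 66, 257, 259, 65 recovers A A B A B A B A A; the code 259 exercises the special case.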
3 Strings and text algorithms 121
LZW decompression - Example
Compressed file: 10000001100011010101111001
file size = 26 bits
Uncompressed Text = G A C G A T A C G A T A C G
Algorithmics I, 2022 22
3 Strings and text algorithms 122
LZW decompression – Special case
It is possible to encounter a codeword that is not (yet) in the
dictionary
− because decompression is ‘out of phase’ with compression
− but in that case it is possible to deduce what string it must represent
− consider: A A B A B A B A A
and work through compression and decompression for this text
Algorithmics I, 2022 23
3 Strings and text algorithms 123
LZW decompression
Appropriate data structure for decompression is a simple table
Algorithmics I, 2022 24
3 Strings and text algorithms 124
Strings - Notation
For a string s = s[0]s[1]…s[m-1]
− m is the length of the string
− s[i] is the (i+1)th element of the string
− s[i..j] is the substring from the ith to the jth position, i.e. s[i]s[i+1]…s[j]
Algorithmics I, 2022 25
3 Strings and text algorithms 125
String comparison
Fundamental question: how similar, or how different, are 2 strings?
− applications include:
• biology (DNA and protein sequences)
• file comparison (diff in Unix, and other similar file utilities)
• spelling correction, speech recognition,…
Algorithmics I, 2022 26
3 Strings and text algorithms 126
String comparison – String distance
The distance between s and t is defined to be the smallest
number of basic operations needed to transform s to t
− for example consider the strings s and t
s: a b a d c d b
t: a c b a c a c b
s: a - b a d c d - b
t: a c b a - c a c b
operations: insert ‘c’, delete ‘d’, substitute ‘a’ for ‘d’, insert ‘c’ (4 in total)
Algorithmics I, 2022 27
3 Strings and text algorithms 128
String comparison – String distance
More complex models are possible
− e.g. we can allocate a cost to each basic operation
− our methods adapt easily but we will stick to the unit-cost model
Algorithmics I, 2022 29
3 Strings and text algorithms 129
String distance – Dynamic programming
Recall the ith prefix of string s is the first i characters of s
− let d(i,j) be the distance between ith prefix of s and the jth prefix of t
− distance between s and t is then d(m,n)
(where s and t have lengths m and n)
− in the base cases we set d(i,0)=i and d(0,j)=j for all i≤m and j≤n
− since the distance from/to an empty string to/from a string of length k
is equal to k (we require k insertions/deletions)
Algorithmics I, 2022 30
3 Strings and text algorithms 130
String distance – Dynamic programming
In an optimal alignment of the ith prefix of s with the jth prefix of t
the last position of the alignment must have one of three forms:
− a match or substitution: s[i-1] aligned with t[j-1]
− an insertion: a gap in s aligned with t[j-1]
− a deletion: s[i-1] aligned with a gap in t
d(i,j) = d(i-1,j-1) if s[i-1] = t[j-1]
Algorithmics I, 2022 31
3 Strings and text algorithms 131
String distance – Dynamic programming
In an optimal alignment of the ith prefix of s with the jth prefix of t
the last position of the alignment is a match/substitution, an insertion
or a deletion
In the insertion case, an element is inserted into s, and the distance is
1 (for the insertion) plus the distance between the ith prefix of s
and the (j-1)th prefix of t
d(i,j) = d(i-1,j-1) if s[i-1] = t[j-1]
d(i,j) = 1 + min{ d(i,j−1), … } otherwise
Algorithmics I, 2022 32
3 Strings and text algorithms 135
String distance – Dynamic programming
The complete recurrence relation is given by:
d(i,j) = d(i-1,j-1) if s[i-1] = t[j-1]
d(i,j) = 1 + min{ d(i,j−1), d(i−1,j), d(i−1,j−1) } otherwise
Algorithmics I, 2022 36
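The recurrence translates directly into a bottom-up table computation. A minimal Java sketch (class and method names are our own, not from the notes), running in O(mn) time and space:

```java
public class EditDistance {
    /** d(m,n) from the recurrence above, computed bottom-up. */
    static int distance(String s, String t) {
        int m = s.length(), n = t.length();
        int[][] d = new int[m + 1][n + 1];
        for (int i = 0; i <= m; i++) d[i][0] = i;   // base cases: deletions
        for (int j = 0; j <= n; j++) d[0][j] = j;   // base cases: insertions
        for (int i = 1; i <= m; i++)
            for (int j = 1; j <= n; j++)
                if (s.charAt(i - 1) == t.charAt(j - 1))
                    d[i][j] = d[i - 1][j - 1];                  // match
                else
                    d[i][j] = 1 + Math.min(d[i][j - 1],         // insertion
                              Math.min(d[i - 1][j],             // deletion
                                       d[i - 1][j - 1]));       // substitution
        return d[m][n];
    }
}
```

For the running example, distance("abadcdb", "acbacacb") is 4, matching the bottom-right entry of the worked table in the notes.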
3 Strings and text algorithms 137
String distance - Example
s\t 0 1 2 3 4 5 6 7 8
a c b a c a c b
0 0 1 2 3 4 5 6 7 8
1 a 1 0 1 2 3 4 5 6 7
2 b 2 1 1 1 2 3 4 5 6
3 a 3 2 2 2 1 2 3 4 5
4 d 4 3 3 3 2 2 3 4 5
5 c 5 4 3 4 3 2 3 3 4
6 d 6 5 4 4 4 3 3 4 4
7 b 7 6 5 4 5 4 4 4 4
Algorithmics I, 2022 38
3 Strings and text algorithms 138
String distance – Dynamic programming
The traceback phase is used to construct an optimal alignment
− trace a path in the table from bottom right to top left
− draw an arrow from an entry to the entry that led to its value
Interpretation
− vertical steps as deletions
− horizontal steps as insertions
− diagonal steps as matches or substitutions
• a match if the distance does not change and a substitution otherwise
Algorithmics I, 2022 39
3 Strings and text algorithms 139
String distance – Example (traceback)
s\t 0 1 2 3 4 5 6 7 8
a c b a c a c b
0 0 1 2 3 4 5 6 7 8
1 a 1 0 1 2 3 4 5 6 7
2 b 2 1 1 1 2 3 4 5 6
3 a 3 2 2 2 1 2 3 4 5
4 d 4 3 3 3 2 2 3 4 5
5 c 5 4 3 4 3 2 3 3 4
6 d 6 5 4 4 4 3 3 4 4
7 b 7 6 5 4 5 4 4 4 4
Corresponding alignment:
s: a - b a d - c d b
t: a c b a c a c - b
step: d h d d d h d v d
(d = diagonal, v = vertical, h = horizontal)
Algorithmics I, 2022 40
3 Strings and text algorithms 140
String/pattern search
Searching a (long) text for a (short) string/pattern
− many applications including
• information retrieval
• text editing
• computational biology
Algorithmics I, 2022 42
3 Strings and text algorithms 142
String search – Brute force algorithm
/** return smallest k such that s occurs in t starting at position k */
public int bruteForce (char[] s, char[] t){
int m = s.length; // length of string/pattern
int n = t.length; // length of text
int sp = 0; // starting position in text t
int i = 0; // curr position in text
int j = 0; // curr position in string/pattern s
while (sp <= n-m && j < m) { // not reached end of text/string
if (t[i] == s[j]){ // chars match
i++; // move on in text
j++; // move on in string/pattern
} else { // a mismatch
j = 0; // start again in string
sp++; // advance starting position
i = sp; // back up in text to new starting position
}
}
if (j == m) return sp; // occurrence found (reached end of string)
else return -1; // no occurrence (reached end of text)
}
Algorithmics I, 2022 43
3 Strings and text algorithms 143
String search – Brute force algorithm
Worst case is no better than O(mn)
− e.g. search for s = aa…ab (length m) in t = aa…aa…ab (length n)
Algorithmics I, 2022 44
3 Strings and text algorithms 144
String search – KMP algorithm
The Knuth-Morris-Pratt (KMP) algorithm
− addresses the first challenge: linear (O(n+m)) in the worst case
It is an on-line algorithm
− i.e., it removes the need to back up in the text
− involves pre-processing the string to build a border table
− border table: an array b with entry b[j] for each position j of the string
Algorithmics I, 2022 45
3 Strings and text algorithms 145
String search – KMP algorithm
A substring of string s is a sequence of consecutive characters of s
− if s has length n, then s[i..j] is a substring for i and j with 0≤i≤j≤n-1
Algorithmics I, 2022 46
3 Strings and text algorithms 146
String search – Border table
KMP algorithm requires the border table of the string pattern
− a border of a string s is a substring that is both a prefix and a suffix and
cannot be the string itself
Border table b: array which has the same size as the string
− b[j] = the length of the longest border of s[0..j-1]
= max { k | s[0..k-1] = s[j-k..j-1] ∧ k<j }
Example
string/pattern s a b a b a c a
j 0 1 2 3 4 5 6
b[j] 0 0 0 1 2 3 0
Algorithmics I, 2022 47
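The border table can be computed in O(m) time by trying to extend the border of the previous prefix and falling back to shorter borders on a mismatch. A minimal Java sketch (names are our own, not from the notes):

```java
public class BorderTable {
    /** b[j] = length of the longest border of s[0..j-1], as defined above. */
    static int[] borderTable(char[] s) {
        int m = s.length;
        int[] b = new int[m];                          // b[0] = b[1] = 0
        for (int j = 2; j < m; j++) {
            int k = b[j - 1];                          // try to extend previous border
            while (k > 0 && s[j - 1] != s[k]) k = b[k]; // fall back to shorter borders
            b[j] = (s[j - 1] == s[k]) ? k + 1 : 0;     // extend by one, or no border
        }
        return b;
    }
}
```

On the example string a b a b a c a this yields 0 0 0 1 2 3 0, matching the table above; on a g a g c a g a g a g c a g (used in the next example), b[9] is 4.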
3 Strings and text algorithms 147
String search – Brute force versus KMP
Example - Mismatch between s and t at position 9 in s
jnew j
0 1 2 3 4 5 6 7 8 9 10 11 12 13
string/pattern s a g a g c a g a g a g c a g
text t a g a g c a g a g t * * * * …
inew i
Applying the brute force algorithm, after the mis-match:
− s has to be ‘moved along’ one position relative to t
− then we start again at position 0 in s and jump back j-1 positions in t
Algorithmics I, 2022 48
3 Strings and text algorithms 148
String search – Brute force versus KMP
Example - Mismatch between s and t at position 9 in s
j
0 1 2 3 4 5 6 7 8 9 10 11 12 13
string/pattern s a g a g c a g a g a g c a g
text t a g a g c a g a g t * * * * …
i
Applying the KMP algorithm, after the mis-match:
− s has to be ‘moved along’ until the characters to the left of i again match
Algorithmics I, 2022 49
3 Strings and text algorithms 149
String search – Brute force versus KMP
mis-match
j
string/pattern s s[0..j-1] $ …
text t … s[0..j-1] * …
i
Need to move s along until the characters to the left of i match
therefore need start of s[0..j-1] to match end of s[0..j-1]
− therefore use longest border of s[0..j-1]
− i.e. longest substring that is both a prefix and a suffix of s[0..j-1]
string/pattern s $ …
text t … * …
Algorithmics I, 2022 50
3 Strings and text algorithms 150
String search – Brute force versus KMP
Example - Mismatch between s and t at position 9 in s
jnew j
0 1 2 3 4 5 6 7 8 9 10 11 12 13
string/pattern s a g a g c a g a g a g c a g
text t a g a g c a g a g * * * * * …
i
Applying the KMP algorithm, after the mis-match:
− s has to be ‘moved along’ until the characters to the left of i again match
− this determines the new value of j, the value of i is unchanged
− length of the longest border of s[0..j-1] is 4 in this case
• i.e. longest substring that is both a prefix and a suffix of s[0..j-1]
− so the new value of j is 4
Algorithmics I, 2022 51
3 Strings and text algorithms 151
String search – Brute force versus KMP
Example - Mismatch between s and t at position 9 in s
jnew j
0 1 2 3 4 5 6 7 8 9 10 11
string/pattern s t g a g c a g a g a g c
text t t g a g c a g a g t * * * * …
i
Applying the KMP algorithm, after the mis-match:
− s has to be ‘moved along’ until the characters to the left of i again match
Algorithmics I, 2022 52
3 Strings and text algorithms 152
String search – Brute force versus KMP
Example - Mismatch between s and t at position 0 in s
j
0 1 2 3 4 5 6 7 8 9 10 11 12 13
string/pattern s t g a g c a g a g a g c a g
text t a g a g c a g a g t * * * * …
i inew
Applying the KMP algorithm, after the mis-match:
− s has to be ‘moved along’ until the characters to the left of i again match
Algorithmics I, 2022 53
3 Strings and text algorithms 153
KMP search - Implementation
/** return smallest k such that s occurs from position k in t or -1 if no k exists */
public int kmp(char[] t, char[] s) {
int m = s.length; // length of string/pattern
int n = t.length; // length of text
int i = 0; // current position in text
int j = 0; // current position in string s
int [] b = new int[m]; // create border table
setUp(s, b); // set up the border table from the string
while (i < n) { // not reached end of text
if (t[i] == s[j]){ // if positions match
i++; // move on in text
j++; // move on in string
if (j == m) return i - j; // reached end of string so a match
} else { // mismatch: adjust current position in string using the border table
if (b[j] > 0) // there is a common prefix/suffix
j = b[j]; // change position in string (position in text unchanged)
else { // no common prefix/suffix
if (j == 0) i++; // move forward one position in text if not advanced
else j = 0; // else start from beginning of the string
}
}
}
}
return -1; // no occurrence
}
Algorithmics I, 2022 54
3 Strings and text algorithms 154
KMP - Example
string/pattern s a b a b a c a
text t b a c b a b a b a b a c a a b
string/pattern s a b a b a c a
j 0 1 2 3 4 5 6
b[j] 0 0 0 1 2 3 0
Algorithmics I, 2022 55
3 Strings and text algorithms 155
KMP search - Analysis
while (i < n) {
if (t[i] == s[j]) {
i++; j++;
} else {
if (b[j] > 0) j = b[j];
else {
if (j == 0) i++;
else j = 0;
}
}
}
Algorithmics I, 2022 56
3 Strings and text algorithms 162
Boyer-Moore Algorithm
Challenge 1: can we find a solution that is linear in the worst case?
Yes: KMP
Algorithmics I, 2022 63
3 Strings and text algorithms 163
Boyer-Moore Algorithm – Example
Search for ‘pill’ in ‘the caterpillar’
the caterpillar
pill
^
Algorithmics I, 2022 64
3 Strings and text algorithms 164
Boyer-Moore Algorithm – Example
Search for ‘pill’ in ‘the caterpillar’
the caterpillar
pill
^
Algorithmics I, 2022 65
3 Strings and text algorithms 165
Boyer-Moore Algorithm – Simplified version
The string is scanned right-to-left
− text character involved in a mismatch is used to decide next comparison
− involves pre-processing the string to record the position of the last
occurrence of each character c in the alphabet
− therefore the alphabet must be fixed in advance of the search
Algorithmics I, 2022 66
3 Strings and text algorithms 166
Boyer-Moore Algorithm – Simplified version
In our pseudocode we assume an array p[c] indexed by characters
− the characters range over the underlying alphabet of the text
− p[c] records the position in the string of the last occurrence of char c
− if the character c is absent from the string s, then let p[c]=-1
string/pattern s . . a . . . b . .
p[t[i]] j
reminder: p[t[i]] records the position
in s of the last occurrence of character t[i]
Algorithmics I, 2022 68
3 Strings and text algorithms 168
Boyer-Moore Algorithm – Jump step – Case 1
Assume a mismatch between position s[j] and position t[i]
Case 1: the last position of character t[i] in s is before position j
sp i inew
text t * * * * * * a * * * * * * …
string/pattern s . . a . . . b . .
j
(m-1)-p[t[i]]
p[t[i]]
m-1
− i records the current position in the text we are checking
− new value of i equals i+(m-1)-p[t[i]]
Algorithmics I, 2022 69
3 Strings and text algorithms 169
Boyer-Moore Algorithm – Jump step – Case 1
Assume a mismatch between position s[j] and position t[i]
Case 1: the last position of character t[i] in s is before position j
sp i
text t * * * * * * a * * * * * * …
string/pattern s . . a . . . b . .
jnew
p[t[i]] j
Algorithmics I, 2022 70
3 Strings and text algorithms 170
Boyer-Moore Algorithm – Jump step – Case 1
Assume a mismatch between position s[j] and position t[i]
Case 1: the last position of character t[i] in s is before position j
sp spnew
i
text t * * * * * * a * * * * * * …
string/pattern s . . a . . . b . .
p[t[i]] j
j-p[t[i]]
Algorithmics I, 2022 71
3 Strings and text algorithms 171
Boyer-Moore Algorithm – Jump step – Case 2
Assume a mismatch between position s[j] and position t[i]
Case 2: last position of character t[i] in s is at least at position j
sp i
text t * * * * a * * * * * * * …
string/pattern s . . b . . . a . .
j p[t[i]]
move string along by one place and start again from the end of the string
Algorithmics I, 2022 72
3 Strings and text algorithms 172
Boyer-Moore Algorithm – Jump step – Case 2
Assume a mismatch between position s[j] and position t[i]
Case 2: last position of character t[i] in s is at least at position j
sp i inew
text t * * * * a * * * * * * * …
string/pattern s . . b . . . a . .
j position m-1
position j-1
(m-1)–(j-1)
Algorithmics I, 2022 73
3 Strings and text algorithms 173
Boyer-Moore Algorithm – Jump step – Case 2
Assume a mismatch between position s[j] and position t[i]
Case 2: last position of character t[i] in s is at least at position j
sp i
text t * * * * a * * * * * * * …
string/pattern s . . b . . . a . .
jnew
j
Algorithmics I, 2022 74
3 Strings and text algorithms 174
Boyer-Moore Algorithm – Jump step – Case 2
Assume a mismatch between position s[j] and position t[i]
Case 2: last position of character t[i] in s is at least at position j
sp spnew
i
text t * * * * a * * * * * * * …
string/pattern s . . b . . . a . .
j
Algorithmics I, 2022 75
3 Strings and text algorithms 175
Boyer-Moore Algorithm – Jump step – Case 3
Assume a mismatch between position s[j] and position t[i]
Case 3: character t[i] does not appear in s (i.e. we have p[t[i]]=-1)
sp i
text t * * * * * * a * * * * * * * * …
string/pattern s . . . . . . b . .
Algorithmics I, 2022 76
3 Strings and text algorithms 176
Boyer-Moore Algorithm – Jump step – Case 3
Assume a mismatch between position s[j] and position t[i]
Case 3: character t[i] does not appear in s (i.e. we have p[t[i]]=-1)
sp i inew
text t * * * * * * a * * * * * * * * …
string/pattern s . . . . . . b . .
m-1
Algorithmics I, 2022 77
3 Strings and text algorithms 179
Boyer-Moore Algorithm – Jump step – Case 3
Assume a mismatch between position s[j] and position t[i]
Case 3: character t[i] does not appear in s (i.e. we have p[t[i]]=-1)
sp spnew
i inew
text t * * * * * * a * * * * * * * * …
string/pattern s . . . . . . b . .
j+1
− sp records the current starting position of string in the text
− new value of sp equals sp+(j+1) as this is the amount the pattern/
string has been moved forward
Algorithmics I, 2022 80
3 Strings and text algorithms 180
Boyer-Moore Algorithm – All cases
Case 1: p[t[i]]<j and p[t[i]]≥0
− new value of i equals i+m-1-p[t[i]]
− new value of j equals m-1
− new value of sp equals sp+j-p[t[i]]
Case 2: p[t[i]]>j
− new value of i equals i+m-j
− new value of j equals m-1
− new value of sp equals sp+1
Case 3: p[t[i]]=-1
− new value of i equals i+m
− new value of j equals m-1
− new value of sp equals sp+j+1
(note p[t[i]] cannot equal j, since p[t[i]] is the last position of character
t[i] in s and there is a mismatch between t[i] and s[j])
Algorithmics I, 2022 81
3 Strings and text algorithms 181
Boyer-Moore Algorithm – All cases
We find that we can express these updates as follows:
− new value of i equals i + m – min(1+p[t[i]],j)
− new value of j equals m-1
− new value of sp equals sp + max(j-p[t[i]],1)
You do not need to learn these updates, just how the algorithm works
− this is sufficient for running it on an example (as you saw)
− and for working out what the updates are if needed (again as you saw)
Algorithmics I, 2022 82
3 Strings and text algorithms 182
Boyer-Moore Algorithm - Implementation
/** return smallest k such that s occurs at k in t or -1 if no k exists */
public int bm(char[] t, char[] s) {
int m = s.length; // length of string/pattern
int n = t.length; // length of text
int sp = 0; // current starting position of string in text
int i = m-1; // current position in text
int j = m-1; // current position in string/pattern
int[] p = new int[256]; // last occurrence array, e.g. one entry per character of a fixed 8-bit alphabet
setUp(s, p); // set up the last occurrence array
while (sp <= n-m && j >= 0) {
if (t[i] == s[j]){ // current characters match
i--; // move back in text
j--; // move back in string
} else { // current characters do not match
sp += Math.max(1, j - p[t[i]]);
i += m - Math.min(j, 1 + p[t[i]]);
j = m-1; // return to end of string
}
}
if (j < 0) return sp; else return -1; // occurrence found yes/no
}
Algorithmics I, 2022 83
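The notes leave the setUp step abstract; one possible implementation is sketched below (a sketch with our own names, assuming a 256-character alphabet; the slides only require that the alphabet is fixed in advance). A single left-to-right pass suffices, since later occurrences overwrite earlier ones.

```java
import java.util.Arrays;

public class LastOccurrence {
    /** p[c] = position in s of the last occurrence of character c,
        or -1 if c does not occur in s. */
    static int[] setUp(char[] s) {
        int[] p = new int[256];                  // one entry per alphabet character
        Arrays.fill(p, -1);                      // absent characters get -1
        for (int j = 0; j < s.length; j++)
            p[s[j]] = j;                         // later occurrences overwrite earlier
        return p;
    }
}
```

For the string ‘pill’ this gives p['p']=0, p['i']=1, p['l']=3, and -1 for every other character.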
3 Strings and text algorithms 183
Boyer-Moore Algorithm - Complexity
Worst case is no better than O(mn)
− e.g. search for s = ba…aa (length m) in t = aa…aa (length n)
Algorithmics I, 2022 84
3 Strings and text algorithms 184
Algorithmics I 2022
Algorithmics I
Section 4 – NP completeness
4 NP Completeness 185
Some efficient algorithms we have seen
Algorithmics I, 2022 2
4 NP Completeness 186
Recall the Eulerian cycle problem (AF2)
[figure: an example graph with an Eulerian cycle, edges numbered in order of traversal]
Algorithmics I, 2022 3
4 NP Completeness 187
Recall the Hamiltonian cycle problem (AF2)
Algorithmics I, 2022 4
4 NP Completeness 188
Recall the Hamiltonian cycle problem (AF2)
Brute force algorithm:
− generate all permutations of vertices
− check each one to see if it is a cycle, i.e. corresponding edges are present
Algorithmics I, 2022 5
4 NP Completeness 189
Polynomial versus exponential time
Table shows running time of algorithms with various complexities
(assuming 109 operations per second)
        20          40          50          60          70
n       .00001 sec  .00003 sec  .00004 sec  .00005 sec  .00006 sec
n2      .0001 sec   .0009 sec   .0016 sec   .0025 sec   .0036 sec
n3      .001 sec    .027 sec    .064 sec    .125 sec    .216 sec
2n      .001 sec    17.9 mins   12.7 days   35.7 years  366 centuries
3n      .059 sec    6.5 years   3855 centuries      2×108 centuries     1.3×1013 centuries
n!      3.6 secs    8.4×1016 centuries  2.6×1032 centuries  9.6×1048 centuries  2.6×1066 centuries
Algorithmics I, 2022 6
4 NP Completeness 190
Polynomial versus exponential time
This behaviour still applies even with increases in computing power
− sizes of largest instance solvable in 1 hour on a current computer
− what happens when computers become faster?
        current computer    computer 100 times faster   computer 1000 times faster
n       N1                  100 N1                      1000 N1
n2      N2                  10 N2                       31.6 N2
n3      N3                  4.64 N3                     10 N3
n5      N4                  2.5 N4                      3.98 N4
2n      N5                  N5 + 6.64                   N5 + 9.97
3n      N6                  N6 + 4.19                   N6 + 6.29
n!      N7                  ≤ N7 + 1                    ≤ N7 + 1
Algorithmics I, 2022 7
4 NP Completeness 191
Polynomial versus exponential time
The message:
• Exponential-time algorithms are in general “bad”
− increases in processor speed do not lead to significant changes in
behaviour when the input size is large
• Polynomial-time algorithms are in general “good”
Algorithmics I, 2022 8
4 NP Completeness 192
A brief interlude
You are asked to find a polynomial-time algorithm for the
Hamiltonian cycle problem
− this could be a difficult task, you do not want to have to report:
Algorithmics I, 2022 9
4 NP Completeness 193
A brief interlude
Definition: a problem Π is intractable if there does not exist a
polynomial-time algorithm that solves Π
− you could try to prove that the Hamiltonian Cycle problem is intractable
Algorithmics I, 2022 10
4 NP Completeness 194
A brief interlude
You could try to prove that the Hamiltonian cycle problem is “just as
hard” as a whole family of other difficult problems
“I cannot find an efficient algorithm, but neither can all these famous people!”
Algorithmics I, 2022 11
4 NP Completeness 195
A brief interlude
State of the Art for Hamiltonian cycle
− no polynomial-time algorithm has been found
− similarly, no proof of intractability has been found
− the problem is known to be an NP-complete problem
Algorithmics I, 2022 12
4 NP Completeness 196
NP-complete problems
No polynomial-time algorithm is known for an NP-complete problem
− however, if one of them is solvable in polynomial time, then they all are
Algorithmics I, 2022 13
4 NP Completeness 197
Intractable problems
Two different causes of intractability (no polynomial algorithm):
1. polynomial time is not sufficient in order to discover a solution
2. solution itself is so large that exponential time is needed to output it
Example of case 2:
− consider problem of generating all cycles for a given graph
Algorithmics I, 2022 14
4 NP Completeness 198
Intractable problems - Roadblock
A decidable problem that is intractable: Roadblock
− there are two players: A and B
− there is a network of roads, comprising intersections connected by roads
− each road is coloured either black, blue or green
− some intersections are marked either “A wins” or “B wins”
− a player has a fleet of cars located at intersections
• at most one per intersection
[figure: an example Roadblock network, with cars for A and B and intersections
marked “A wins” / “B wins”]
Algorithmics I, 2022 16
4 NP Completeness 200
Intractable problems – Roadblock - Example
[figure: an example Roadblock position]
Algorithmics I, 2022 17
4 NP Completeness 201
Summary
[diagram: polynomial-time solvable problems | ? | NP-complete problems | ? |
intractable problems]
One of the question marks must be an ’equals’ sign, while the other
must be a ’not-equals’ sign
Algorithmics I, 2022 18
4 NP Completeness 202
Problem and problem instances
A problem is usually characterised by (unspecified) parameters
− typically there are infinitely many instances for a given problem
A problem instance is created by giving these parameters values
An NP-complete problem:
− Name: Hamiltonian Cycle (HC)
− Instance: a graph G
− Question: does G contain a cycle that visits
each vertex exactly once?
Algorithmics I, 2022 19
4 NP Completeness 203
Other NP-complete problems
Name: Travelling Salesman Decision Problem (TSDP)
Instance: a set of n cities and integer distance d(i,j) between each
pair of cities i, j, and a target integer K
Question: is there a permutation p1p2…pn-1pn of 1,2,…,n such that
d(p1,p2) + d(p2,p3) + … + d(pn-1,pn) + d(pn,p1) ≤ K ?
− i.e. is there a ‘travelling salesman tour’ of length ≤ K
Example: [figure: four cities c1, c2, c3, c4 with pairwise distances]
− there is a travelling salesman tour of length 29
• d(1,3)+d(3,2)+d(2,4)+d(4,1) = 5+6+9+9 = 29
− there is no tour of length < 29
The travelling salesman decision problem is NP-complete
Algorithmics I, 2022 20
4 NP Completeness 204
Other NP-complete problems
Name: Clique Problem (CP)
Instance: a graph G and a target integer K
Question: does G contain a clique of size K?
− i.e. a set of K vertices for which there is an edge between all pairs
Example:
− there is a clique of size 4
− there is no clique of size 5
Algorithmics I, 2022 21
4 NP Completeness 205
Other NP-complete problems
Name: Graph Colouring Problem (GCP)
Instance: a graph G and a target integer K
Question: can one of K colours be attached to each vertex of G so
that adjacent vertices always have different colours?
Example:
− there is a colouring using 3 colours
− there is no colouring using 2 colours
Algorithmics I, 2022 23
4 NP Completeness 207
Optimisation and search problems
An optimisation problem: find the maximum or minimum value
− e.g. the travelling salesman optimisation problem (TSOP) is to find the
minimum length of a tour
Algorithmics I, 2022 24
4 NP Completeness 208
The class P
P is the class of all decision problems that can be solved in
polynomial time
Algorithmics I, 2022 25
4 NP Completeness 209
The class NP
Algorithmics I, 2022 26
4 NP Completeness 210
Non-deterministic algorithms (NDAs)
Such an algorithm has an extra operation: non-deterministic choice
int nonDeterministicChoice(int n)
// returns a positive integer chosen from the range 1,…,n
Algorithmics I, 2022 28
4 NP Completeness 212
Non-deterministic algorithms - Example
Graph colouring
− “guess” a colour for each vertex
− “verify” the colouring
Algorithmics I, 2022 29
4 NP Completeness 213
Non-deterministic algorithms
A non-deterministic algorithm can be viewed as
− a guessing stage (non-deterministic)
− a checking stage (deterministic and polynomial time)
Algorithmics I, 2022 30
4 NP Completeness 214
Polynomial time reductions
A polynomial-time reduction (PTR) is a mapping f from a decision
problem Π1 to a decision problem Π2 such that:
Algorithmics I, 2022 31
4 NP Completeness 215
Polynomial time reductions - Properties
Transitivity: Π1 ∝ Π2 and Π2 ∝ Π3 implies that Π1 ∝ Π3
Algorithmics I, 2022 32
4 NP Completeness 217
Polynomial time reductions - Properties
Relevance to P: Π1 ∝ Π2 and Π2∈P implies that Π1∈P
− to solve an instance of Π1, reduce it to an instance of Π2
− roughly speaking, Π1 ∝ Π2 means that Π1 is ‘no harder’ than Π2
i.e. if we can solve Π2, then we can solve Π1 without much more effort
• we just need to additionally perform a polynomial-time reduction
Algorithmics I, 2022 34
4 NP Completeness 218
Polynomial time reductions - Example
Reducing Hamiltonian cycle problem to travelling salesman problem
− is there a tour with d(p1,p2)+d(p2,p3)+…+d(pn-1,pn)+d(pn,p1) ≤ K ?
Algorithmics I, 2022 35
4 NP Completeness 219
Polynomial time reductions - Example
Reducing Hamiltonian cycle problem to travelling salesman problem
− G = (V,E) is an instance of HC
− construct TSDP instance f(G) where
• cities = V
• d(u,v)=1 if {u,v}∈E and 2 otherwise (i.e. {u,v} is not an edge of G)
• K = |V|
[figure: graph G on vertices a,…,e, and f(G) with weight-1 edges where G has
an edge and weight-2 edges elsewhere]
Algorithmics I, 2022 36
4 NP Completeness 220
Polynomial time reductions - Example
Reducing Hamiltonian cycle problem to travelling salesman problem
− G = (V,E) is an instance of HC
− construct TSDP instance f(G)
[diagram: G and f(G) on vertices a, b, c, d, e; edges of G have weight 1 in f(G) and non-edges have weight 2]
− f(G) can be constructed in polynomial time
− f(G) has a tour of length ≤|V| if and only if G has a Hamiltonian cycle
(tour includes |V| edges so cannot take any of the edges with weight 2)
− therefore TSDP∈P implies that HC∈P
− equivalently HC∉P implies that TSDP∉P (contrapositive)
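The construction of f(G) is mechanical enough to write out. A Java sketch (the encoding and names are assumed for illustration, not from the notes): build the distance matrix, then measure a candidate tour against the target K = |V|.

```java
// Sketch of the reduction f: an HC instance G = (V,E) becomes a TSDP instance
// with d(u,v) = 1 for edges of G, 2 for non-edges, and target K = |V|.
public class HcToTsdp {
    static int[][] distances(int n, int[][] edges) {
        int[][] d = new int[n][n];
        for (int u = 0; u < n; u++)
            for (int v = 0; v < n; v++)
                if (u != v) d[u][v] = 2;             // non-edges get weight 2
        for (int[] e : edges) {
            d[e[0]][e[1]] = 1;                        // edges of G get weight 1
            d[e[1]][e[0]] = 1;
        }
        return d;
    }

    // length of the tour visiting the cities in the given order and returning home
    static int tourLength(int[][] d, int[] tour) {
        int len = 0;
        for (int i = 0; i < tour.length; i++)
            len += d[tour[i]][tour[(i + 1) % tour.length]];
        return len;
    }

    public static void main(String[] args) {
        // a 4-cycle is Hamiltonian, so f(G) has a tour of length K = 4
        int[][] d = distances(4, new int[][]{{0, 1}, {1, 2}, {2, 3}, {3, 0}});
        System.out.println(tourLength(d, new int[]{0, 1, 2, 3})); // 4: uses only weight-1 edges
        System.out.println(tourLength(d, new int[]{0, 2, 1, 3})); // 6: uses weight-2 non-edges
    }
}
```

A tour of length |V| must use only weight-1 edges, which is exactly a Hamiltonian cycle of G.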
Algorithmics I, 2022 37
4 NP Completeness 221
NP-completeness
A decision problem Π is NP-complete if
1. Π∈NP
2. for every problem Π’ in NP: Π’ is polynomial-time reducible to Π
Consequences of definition
− if Π is NP-complete and Π∈P, then P = NP
− every problem in NP can be solved in polynomial time by reduction to Π
− supposing P ≠ NP, if Π is NP-complete, then Π∉P
The structure of NP if P ≠ NP
[diagram: P and the NP-complete problems are disjoint subsets of NP]
Algorithmics I, 2022 38
4 NP Completeness 222
Proving NP-completeness
A decision problem Π is NP-complete if
1. Π∈NP
2. for every problem Π’ in NP: Π’ is polynomial-time reducible to Π
Algorithmics I, 2022 39
4 NP Completeness 224
Proving NP-completeness
The first NP-complete problem?
Example:
− B = (x1∨x2∨¬x3)∧(¬x1∨x3∨¬x4)∧(¬x2∨x4)∧(x2∨¬x3∨x4)
− B is satisfiable: x1=true, x2=false, x3=true, x4=true
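The claimed assignment can be checked mechanically. A small Java sketch (my own, for illustration) that evaluates B clause by clause:

```java
// Evaluating B = (x1∨x2∨¬x3)∧(¬x1∨x3∨¬x4)∧(¬x2∨x4)∧(x2∨¬x3∨x4)
public class SatCheck {
    static boolean evaluate(boolean x1, boolean x2, boolean x3, boolean x4) {
        return (x1 || x2 || !x3)        // clause 1
            && (!x1 || x3 || !x4)       // clause 2
            && (!x2 || x4)              // clause 3
            && (x2 || !x3 || x4);       // clause 4
    }

    public static void main(String[] args) {
        // the assignment from the notes: x1=true, x2=false, x3=true, x4=true
        System.out.println(evaluate(true, false, true, true)); // true: B is satisfied
    }
}
```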
Algorithmics I, 2022 41
4 NP Completeness 226
Clique is NP-complete
Name: Clique Problem (CP)
Instance: a graph G and a target integer K
Question: does G contain a clique of size K?
− i.e. a set of K vertices for which there is an edge between all pairs
Algorithmics I, 2022 43
4 NP Completeness 227
Clique is NP-complete
To complete the proof we need to show SAT ∝ CP
− i.e. a polynomial time reduction from SAT to CP
Algorithmics I, 2022 44
4 NP Completeness 229
Clique is NP-complete
To prove it is a polynomial time reduction we can show:
Algorithmics I, 2022 46
4 NP Completeness 230
Clique is NP-complete
Why does the construction work?
Algorithmics I, 2022 47
4 NP Completeness 231
Clique is NP-complete - Example
B = (x1∨x2∨¬x3)∧(¬x1∨x3∨¬x4)∧(¬x2∨x4)∧(x2∨¬x3∨x4)
− there are K = 4 clauses
The graph G
− vertices of G are pairs (l,C) where l is a literal in clause C
− {(l,C),(m,D)} is an edge if and only if l≠¬m and C≠D
[diagram: G has a vertex for each literal occurrence, grouped by clauses C1–C4]
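The construction can be sketched in a few lines of Java (an illustration; the encoding is assumed, with a literal represented as a signed integer, +i for xi and -i for ¬xi):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the SAT → CP construction: vertices are (literal, clause) pairs,
// with an edge when the literals are not complementary and the clauses differ.
public class SatToClique {
    // one vertex per literal occurrence, recorded as {literal, clause index}
    static List<int[]> vertices(int[][] clauses) {
        List<int[]> vs = new ArrayList<>();
        for (int c = 0; c < clauses.length; c++)
            for (int lit : clauses[c]) vs.add(new int[]{lit, c});
        return vs;
    }

    // edge iff l ≠ ¬m and C ≠ D
    static boolean edge(int[] a, int[] b) {
        return a[0] != -b[0] && a[1] != b[1];
    }

    public static void main(String[] args) {
        // B from the example: literals encoded as signed integers
        int[][] b = {{1, 2, -3}, {-1, 3, -4}, {-2, 4}, {2, -3, 4}};
        System.out.println(vertices(b).size());                     // 11: one per literal occurrence
        System.out.println(edge(new int[]{1, 0}, new int[]{-1, 1})); // false: x1 vs ¬x1
    }
}
```

The construction is clearly polynomial: the graph has one vertex per literal occurrence and at most quadratically many edges.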
Algorithmics I, 2022 48
4 NP Completeness 232
Clique is NP-complete
B = (x1∨x2∨¬x3)∧(¬x1∨x3∨¬x4)∧(¬x2∨x4)∧(x2∨¬x3∨x4)
− there are K = 4 clauses
The graph G
[diagram: the same graph, with a clique of size 4 highlighted]
− satisfying assignment ↔ clique of size 4
Algorithmics I, 2022 49
4 NP Completeness 233
Problem restrictions
A restriction of a problem consists of a subset of the instances of the
original problem
− if a restriction of a given decision problem Π is NP-complete, then so is Π
− given NP-complete problem Π, a restriction of Π might be NP-complete
Algorithmics I, 2022 50
4 NP Completeness 234
Problem restrictions
K-colouring
− restriction of Graph Colouring for a fixed number K of colours
− 2-colouring is in P (it reduces to checking the graph is bipartite)
− 3-colouring is NP-complete
K-SAT
− restriction of SAT in which every clause contains exactly K literals
− 2-SAT is in P (proof is a tutorial exercise)
− 3-SAT is NP-complete
− showing 3-SAT ∈ NP is easy; we will just show SAT ∝ 3-SAT
Algorithmics I, 2022 51
4 NP Completeness 235
SAT ∝ 3-SAT
Given an instance B of SAT we will construct an instance B’ of 3-SAT
For each clause C of B we construct a number of clauses of B’
Algorithmics I, 2022 52
4 NP Completeness 236
SAT ∝ 3-SAT
Given an instance B of SAT we will construct an instance B’ of 3-SAT
For each clause C of B we construct a number of clauses of B’
− e.g. a clause C = (l1∨l2) with two literals becomes (l1∨l2∨y)∧(l1∨l2∨¬y) for a
new variable y: C holds if and only if both the clauses (l1∨l2∨y) and (l1∨l2∨¬y) hold
Algorithmics I, 2022 53
4 NP Completeness 240
Coping with NP-completeness
What to do if faced with an NP-complete problem?
Maybe only a restricted version is of interest (which may be in P)
− e.g. 2-SAT, 2-colouring are in P
Seek an exponential-time algorithm improving on exhaustive search
− e.g. backtracking (as in the assessed exercise), branch-and-bound
− should extend the set of solvable instances
For an optimisation problem (e.g. calculating min/max value)
− settle for an approximation algorithm that runs in polynomial time
− especially if it gives a provably good result (within some factor of optimal)
− use a heuristic
• e.g. genetic algorithms, simulated annealing, neural networks
For a decision problem
− settle for a probabilistic algorithm that gives the correct answer with high probability
Algorithmics I, 2022 57
4 NP Completeness 241
Algorithmics I 2022
Algorithmics I
Section 5 - Computability
5 Computability 242
Introduction to Computability
What is a computer?
Algorithmics I, 2022 2
5 Computability 243
Unsolvable problems
Some problems cannot be solved by a computer
− even with unbounded time
Example: The Tiling Problem
− a tile is a 1×1 square, divided into 4 triangles by its diagonals, with each
triangle given a colour
− each tile has a fixed orientation (no rotations allowed)
− example tiles:
Algorithmics I, 2022 3
5 Computability 244
Tiling problem - Tiling a 5×5 square
Available tiles:
Algorithmics I, 2022 4
5 Computability 245
Tiling problem - Extending to a larger region
Overlap the top two rows with
the bottom two rows
− obtain an 8×5 tiled area
Next place two of
these 8×5 rectangles
side by side
− with the right hand
rectangle one row
above the left hand
rectangle
By repeating this pattern it
follows that any finite area
can be tiled
Algorithmics I, 2022 5
5 Computability 246
Tiling problem - Altering the tiles
Original tiles:
New tiles:
There are 3^9 = 19,683 possibilities if you want to try them all out…
Algorithmics I, 2022 6
5 Computability 247
Tiling problem
Tiling problem: given a set of tile descriptions, can any finite area, of
any size, be completely ‘tiled’ using only tiles from this set?
Algorithmics I, 2022 7
5 Computability 248
Undecidable problems
A problem Π that admits no algorithm is called non-computable or
unsolvable
Algorithmics I, 2022 8
5 Computability 249
Post’s correspondence problem (PCP)
A word is a finite string over some given finite alphabet
Example: n=5
− X1=abb, X2=a, X3=bab, X4=baba, X5=aba
− Y1=bbab, Y2=aa, Y3=ab, Y4=aa, Y5=a
− correspondence is given by the sequence 2, 1, 1, 4, 1, 5
• word constructed from Xi’s: aabbabbbabaabbaba
• word constructed from Yi’s: aabbabbbabaabbaba
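The correspondence above can be verified mechanically by concatenating the chosen X and Y words. A small Java sketch (the names are my own):

```java
// Verifying the PCP correspondence 2, 1, 1, 4, 1, 5 for the example instance
public class PcpCheck {
    static String build(String[] words, int[] seq) {
        StringBuilder sb = new StringBuilder();
        for (int i : seq) sb.append(words[i - 1]); // the sequence uses 1-based indices
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] x = {"abb", "a", "bab", "baba", "aba"};
        String[] y = {"bbab", "aa", "ab", "aa", "a"};
        int[] seq = {2, 1, 1, 4, 1, 5};
        System.out.println(build(x, seq));                        // aabbabbbabaabbaba
        System.out.println(build(x, seq).equals(build(y, seq)));  // true: the words agree
    }
}
```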
Algorithmics I, 2022 9
5 Computability 250
Post’s correspondence problem (PCP)
A word is a finite string over some given finite alphabet
Algorithmics I, 2022 10
5 Computability 251
The halting problem
An impossible project: write a program Q that takes as input
− a legal program X (say in Java)
− an input string S for program X
and returns as output
− yes if program X halts when run with input S
− no if program X enters an infinite loop when run with input S
Algorithmics I, 2022 11
5 Computability 252
The halting problem
Example (small) programs
Algorithmics I, 2022 12
5 Computability 253
The halting problem
Example (small) programs
22, 11, 34, 17, 52, 26, 13, 40, 20, 10, 5, 16, 8, 4, 2, 1
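The sequence shown is the 3n+1 ("Collatz") sequence starting from 22, a classic example of a program whose halting behaviour is far from obvious. A sketch of such a program in Java (an illustration; the notes' actual example program is not reproduced here):

```java
// The 3n+1 iteration: halve even numbers, map odd n to 3n+1, stop at 1.
// Whether this loop terminates for every starting value is a famous open problem.
public class Collatz {
    static int steps(long n) {
        int count = 0;
        while (n != 1) {
            n = (n % 2 == 0) ? n / 2 : 3 * n + 1;
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(steps(22)); // 15 steps: 22, 11, 34, ..., 4, 2, 1
    }
}
```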
Algorithmics I, 2022 13
5 Computability 254
The halting problem - Undecidability
A formal definition of the halting problem (HP)
Instance: a legal Java program X and an input string S for X
− can substitute any language for Java
Question: does X halt when run on S?
[diagram: Q takes a program X and an input string S, decides "does X halt on S?", and outputs "yes" or "no"]
Algorithmics I, 2022 14
5 Computability 255
The halting problem - Undecidability
Define a new program P with input a legal program W in Java
program P(W): run Q with program W and input string W ("does W halt on W?");
if Q answers yes: while (true) null; if Q answers no: exit
Algorithmics I, 2022 15
5 Computability 256
The halting problem - Undecidability
Define a new program P with input a legal program W in Java
program P(W): run Q with program W and input string W ("does W halt on W?");
if Q answers yes: while (true) null; if Q answers no: exit
program P(P): run Q with program P and input string P ("does P halt on P?");
if Q answers yes: while (true) null; if Q answers no: exit
Algorithmics I, 2022 16
5 Computability 257
The halting problem - Undecidability
Now let the input W to P be the program P itself
program P(P): run Q with program P and input string P ("does P halt on P?");
if Q answers yes: while (true) null; if Q answers no: exit
P calls Q(P,P)
− Q terminates by assumption, returning either "yes" or "no"
− recall we have assumed Q solves the halting problem
− suppose Q returns "yes", then by definition of Q this means P terminates
− but this also means P does not terminate (it enters the infinite loop)
− this is a contradiction therefore Q must return "no"
Algorithmics I, 2022 17
5 Computability 258
The halting problem - Undecidability
Now let the input W to P be the program P itself
program P(P): run Q with program P and input string P ("does P halt on P?");
if Q answers yes: while (true) null; if Q answers no: exit
P calls Q(P,P)
− Q terminates by assumption, returning either "yes" or "no"
− recall we have assumed Q solves the halting problem
− therefore Q must return "no"
− this means by definition of Q that P does not terminate
− but this also means P does terminate
− so again a contradiction
Algorithmics I, 2022 18
5 Computability 259
The halting problem - Undecidability
Now let the input W to P be the program P itself
program P(P): run Q with program P and input string P ("does P halt on P?");
if Q answers yes: while (true) null; if Q answers no: exit
P calls Q(P,P)
− Q terminates by assumption, returning either "yes" or "no"
− recall we have assumed Q solves the halting problem
− therefore Q can return neither "yes" nor "no"
− meaning no such program Q can exist
− if no such Q can exist, then no algorithm can solve the halting problem
− hence the problem is undecidable
Algorithmics I, 2022 19
5 Computability 260
The halting problem - Undecidability
To summarise the proof
− we assumed the existence of an algorithm A that solved HP
− implemented this algorithm as the program Q
− then constructed a program P which contains Q as a subroutine
− showing that if Q gives the answer "yes", we reach a contradiction
− so Q must give the answer "no", but this also leads to a contradiction
− the contradiction stems from assuming that Q, and hence A, exists
− therefore no algorithm A exists and HP is undecidable
Algorithmics I, 2022 20
5 Computability 261
Proving undecidability by reduction
Suppose we can reduce any instance I of Π1 into an instance J of Π2
such that
− I has a "yes"-answer for Π1 if and only if J has a "yes"-answer for Π2
(like PTRs but no need for J to be constructed in polynomial time)
Algorithmics I, 2022 21
5 Computability 262
Hierarchy of decision problems
Undecidable: e.g. Tiling Problem, Halting Problem
Intractable: e.g. Roadblock
NP-complete: e.g. SAT, HC, TSDP
Polynomial-time solvable: e.g. String distance, Eulerian cycle
(exactly one of the lines separating the last two classes is real; which one depends on whether P equals NP)
Algorithmics I, 2022 22
5 Computability 263
Models of computation
Algorithmics I, 2022 23
5 Computability 264
Deterministic finite-state automata
Simple machines with limited memory which recognise input on
a read-only tape
A DFA consists of
− a finite input alphabet Σ
− a finite set of states Q
− an initial/start state q0 ∈ Q and a set of accepting states F ⊆ Q
− control/program or transition relation T ⊆ (Q × Σ) × Q
• ((q,a),q’) ∈ T means if in state q and read a, then move to state q’
Algorithmics I, 2022 24
5 Computability 265
Deterministic finite-state automata
Simple machines with limited memory which recognise input on
a read-only tape
[diagram: the DFA reading the tape a a b b b and moving through states q0–q3; the string is accepted]
Algorithmics I, 2022 26
5 Computability 267
Deterministic finite-state automata
A DFA defines a language
− determines whether the string on the input tape belongs to that language
− in other words, it solves a decision problem
[diagram: the same DFA reading the tape a a b b b a; the string is not accepted]
Algorithmics I, 2022 27
5 Computability 268
Deterministic finite-state automata
A DFA defines a language
− determines whether the string on the input tape belongs to that language
− in other words, it solves a decision problem
Algorithmics I, 2022 28
5 Computability 269
Deterministic finite-state automata
Recognises the language of strings containing two
consecutive a’s
[diagram: a takes q0 to q1 and q1 to q2; b returns q0 and q1 to q0; q2 is accepting and loops on both a and b]
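This DFA is easy to simulate directly. A Java sketch (the transition table is reconstructed from the diagram as described above; names are my own):

```java
// Simulating the DFA for "contains two consecutive a's":
// q0 --a--> q1, q1 --a--> q2; a b from q0 or q1 returns to q0; q2 loops on a and b.
public class TwoAs {
    static boolean accepts(String input) {
        int state = 0;
        for (char c : input.toCharArray()) {
            if (state == 0) state = (c == 'a') ? 1 : 0;
            else if (state == 1) state = (c == 'a') ? 2 : 0;
            // state 2 is absorbing: once "aa" has been seen, the string is accepted
        }
        return state == 2; // q2 is the only accepting state
    }

    public static void main(String[] args) {
        System.out.println(accepts("abaab"));  // true: contains "aa"
        System.out.println(accepts("ababab")); // false
    }
}
```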
Algorithmics I, 2022 29
5 Computability 270
Another example
[diagram: an NFA over {a,b} with states q0, q1, q2]
Algorithmics I, 2022 30
5 Computability 271
Another example
a
b b
q0 q1 q2
b
Algorithmics I, 2022 31
5 Computability 272
NFA to DFA reduction
Can reduce an NFA to a DFA using the subset construction
− states of the DFA are sets of states of the NFA
− construction can cause a blow-up in the number of states
• in the worst case from N states to 2^N states
− NFA: [diagram: the NFA with states q0, q1, q2]
− DFA: [diagram: the resulting DFA with states {q0}, {q1}, {q1,q2}]
Algorithmics I, 2022 32
5 Computability 273
Regular languages and regular expressions
The languages that can be recognised by finite-state automata
are called the regular languages
Algorithmics I, 2022 33
5 Computability 274
Regular expressions
Order of precedence (highest first)
− closure (*) then concatenation then choice (|)
− as mentioned brackets can be used to override this order
Additional operations
− complement ¬x
• equivalent to the 'or' of all characters in Σ except x
− any single character ?
• equivalent to the 'or' of all characters
Algorithmics I, 2022 34
5 Computability 275
Regular expressions - Examples
The examples from earlier
1) the language comprising one or more a's followed by one or more b’s
− aa*bb*
3) the language of strings that do not contain two consecutive a’s (harder)
− b*(abb*)*(ε|a)
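Both expressions can be tried with Java's regex engine, which supports the same core operators (an illustration; ε|a is written here as a?):

```java
import java.util.regex.Pattern;

// Checking the two example regular expressions against sample strings
public class RegexExamples {
    static boolean matches(String regex, String s) {
        return Pattern.matches(regex, s); // true iff the whole string matches
    }

    public static void main(String[] args) {
        // 1) one or more a's followed by one or more b's
        System.out.println(matches("aa*bb*", "aaabb"));      // true
        System.out.println(matches("aa*bb*", "abab"));       // false
        // 3) no two consecutive a's
        System.out.println(matches("b*(abb*)*a?", "ababa")); // true
        System.out.println(matches("b*(abb*)*a?", "baab"));  // false: contains "aa"
    }
}
```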
Algorithmics I, 2022 35
5 Computability 276
Regular expressions - Closure
To clarify what R* means
− corresponds to 0 or more copies of the regular expression R
Algorithmics I, 2022 36
5 Computability 277
Regular expressions - Example
Consider the language (aa*bb*)*
− i.e. zero or more sequences which consist of a non-zero number of a’s
followed by a non-zero number of b’s
Corresponding DFA:
[diagram: states q0–q3; a takes q0 to q1, q1 loops on a, b takes q1 to q2, q2 loops on b, a takes q2 back to q1, and b from q0 leads to the non-accepting trap state q3, which loops on a and b]
Algorithmics I, 2022 37
5 Computability 278
Regular expressions - Example
A DFA cannot recognise { r* | r∈L(aa*bb*) }
− i.e. { (a^m b^n)* | m>0 and n>0 }
− the problem is the DFA would need to remember m and n to check
that a string is in the language
− but there are infinitely many values for m and n
− hence the DFA would need infinitely many states
− and we only have a finite number (DFA = deterministic finite automaton)
Algorithmics I, 2022 38
5 Computability 279
Regular expressions - Example
How can we recognise strings of the form a^n b^n?
− i.e. a number of a's followed by the same number of b's
It turns out that there is no DFA that can recognise this language
− it cannot be done without some form of memory, e.g. a stack
Idea: as you read a’s, push them onto a stack, then pop the stack as
you read b’s, i.e. the stack works like a counter
So there are some functions (languages) that we would regard as
computable that cannot be computed by a finite-state automaton
− DFAs are not an adequate model of a general-purpose computer
i.e. our 'black box’
Algorithmics I, 2022 39
5 Computability 280
Pushdown automata
A pushdown automaton (PDA) consists of:
− a finite input alphabet Σ, a finite set of stack symbols Γ
− a finite set of states Q, including a start state and a set of accepting states
− control or transition relation T ⊆ (Q × (Σ∪{ε}) × (Γ∪{ε})) × (Q × (Γ∪{ε}))
where ε is the empty string
− a transition maps (current state, tape symbol or ε, old top-of-stack symbol or ε)
to (new state, new stack symbol or ε)
Algorithmics I, 2022 40
5 Computability 281
Pushdown automata
Transition relation T ⊆ (Q × (Σ∪{ε}) × (Γ∪{ε})) × (Q × (Γ∪{ε}))
[diagram: the PDA’s control with its head reading the tape a b a b a, and a stack holding w (on top) and v]
Algorithmics I, 2022 41
5 Computability 282
Pushdown automata
A PDA accepts an input if and only if after the input has been read,
the stack is empty and control is in an accepting state
Algorithmics I, 2022 42
5 Computability 283
Pushdown automata
There is no explicit test that the stack is empty
− this can be achieved by adding a special symbol ($) to the stack at the
start of the computation
− i.e. we add the symbol to the stack when we know the stack is empty
and we never add $ at any other point during the computation
• unless we pop it from the stack, as at this point we again know it's empty
− then can check for emptiness by checking $ is on top of the stack
Algorithmics I, 2022 43
5 Computability 284
Pushdown automata
Note PDA defined here are non-deterministic (NDPDA)
− deterministic PDAs (DPDAs) are less powerful
− this differs from DFAs where non-determinism does not add power
− i.e. there are languages that can be recognised by a NDPDA but
not by a DPDA, e.g. the language of palindromes
• palindromes: strings that read the same forwards and backwards
Algorithmics I, 2022 44
5 Computability 285
Pushdown automata - Palindromes
Palindromes are sequences of characters that read the same forwards
and backwards (second half is the reverse of the first half)
Algorithmics I, 2022 45
5 Computability 286
Pushdown automata - Example
Consider the following PDA program (alphabet is {a,b})
− q0 is the start state and q0 and q3 are the only accepting states
− (q0,ε,ε)➝(q1,$) move to q1 and push $ onto stack ($ - special symbol)
− (q1,a,ε)➝(q1,1) read a & push 1 onto stack
− (q1,b,1)➝(q2,ε) read b & 1 is top of stack, pop stack & move to q2
− (q2,b,1)➝(q2,ε) read b & 1 is top of stack, pop stack
− (q2,ε,$)➝(q3,ε) if $ is the top of the stack, pop stack & move to q3
[diagram: the PDA with states q0–q3 and the transitions above, its head at the start of the tape a a b b and the stack initially empty]
Algorithmics I, 2022 46
5 Computability 287
Pushdown automata - Example
Consider the following PDA program (alphabet is {a,b})
− q0 is the start state and q0 and q3 are the only accepting states
− (q0,ε,ε)➝(q1,$) move to q1 and push $ onto stack ($ - special symbol)
− (q1,a,ε)➝(q1,1) read a & push 1 onto stack
− (q1,b,1)➝(q2,ε) read b & 1 is top of stack, pop stack & move to q2
− (q2,b,1)➝(q2,ε) read b & 1 is top of stack, pop stack
− (q2,ε,$)➝(q3,ε) if $ is the top of the stack, pop stack & move to q3
Example Inputs
− if you try to recognise aabb, all of the input is read, as we have just seen
end up in an accepting state, and the stack is empty
− if you try to recognise aaabb, all the input is read, you end up in state q2
and the stack is not empty
− if you try to recognise aabbb, you are left with b on the tape, which
cannot be read because of an empty stack
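This PDA happens to behave deterministically on every input, so it can be simulated with one hard-coded transition per configuration. A Java sketch (as assumptions of this sketch, the ε-move from q0 is taken up front on non-empty input, and the empty string is accepted directly in q0):

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Direct simulation of the PDA: accept iff all input is read,
// the stack is empty, and control is in an accepting state (q0 or q3).
public class AnBnPda {
    static boolean accepts(String input) {
        if (input.isEmpty()) return true;   // q0 is accepting and the stack starts empty
        Deque<Character> stack = new ArrayDeque<>();
        stack.push('$');                    // (q0,ε,ε) → (q1,$)
        int state = 1;
        for (char c : input.toCharArray()) {
            if (state == 1 && c == 'a') {
                stack.push('1');            // (q1,a,ε) → (q1,1)
            } else if (state == 1 && c == 'b' && stack.peek() == '1') {
                stack.pop(); state = 2;     // (q1,b,1) → (q2,ε)
            } else if (state == 2 && c == 'b' && stack.peek() == '1') {
                stack.pop();                // (q2,b,1) → (q2,ε)
            } else {
                return false;               // no transition applies: input cannot be read
            }
        }
        if (state == 2 && stack.peek() == '$') {
            stack.pop(); state = 3;         // (q2,ε,$) → (q3,ε)
        }
        return stack.isEmpty() && state == 3;
    }

    public static void main(String[] args) {
        System.out.println(accepts("aabb"));  // true
        System.out.println(accepts("aaabb")); // false: stack not empty at the end
        System.out.println(accepts("aabbb")); // false: a b is left that cannot be read
    }
}
```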
Algorithmics I, 2022 47
5 Computability 289
Pushdown automata
Pushdown automata are more powerful than finite-state automata
− a PDA can recognise some languages that cannot be recognised by a DFA
− e.g. {a^n b^n | n≥0} is recognised by the PDA example
Algorithmics I, 2022 49
5 Computability 290
Turing machines
A Turing Machine T to recognise a particular language consists of
Algorithmics I, 2022 50
5 Computability 291
Turing machines - Computation
The transition function is of the form
f : ((S∖{sY,sN}) × Σ) ➝ (S × Σ × {Left, Right})
Algorithmics I, 2022 52
5 Computability 292
Turing machines - Computation
The (finite) input string is placed on the tape
− assume initially all other squares of the tape contain blanks
Algorithmics I, 2022 53
5 Computability 293
The palindrome problem
Instance: a finite string Y
Question: is Y a palindrome, i.e. is Y equal to the reverse of itself
− simple Java method to solve the above:
For simplicity, we assume that the string is composed of a's and b's
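The Java method referred to above is not reproduced in these notes; one possible version (an assumption, not necessarily the original) is:

```java
// A simple palindrome test: compare characters pairwise from both ends.
public class Palindrome {
    static boolean isPalindrome(String y) {
        for (int i = 0, j = y.length() - 1; i < j; i++, j--) {
            if (y.charAt(i) != y.charAt(j)) return false; // mismatched pair
        }
        return true; // every character matched its mirror image
    }

    public static void main(String[] args) {
        System.out.println(isPalindrome("abba")); // true
        System.out.println(isPalindrome("abab")); // false
    }
}
```

This runs in linear time with random access to the string; the interest of the Turing machine version is doing the same job with only a sequentially scanned tape.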
Algorithmics I, 2022 54
5 Computability 294
The palindrome problem – Turing machine
Formally defining a Turing Machine for even simple problems is hard
− much easier to design a pseudocode version
Algorithmics I, 2022 55
5 Computability 295
The palindrome problem – Turing machine
Formally defining a Turing Machine for even simple problems is hard
− much easier to design a pseudocode version
TM Algorithm for the Palindrome problem
read the symbol in the current square;
erase this symbol;
enter a state that 'remembers' it;
move tape head to the end of the input;
if (only blank characters remain)
enter the accepting state and halt;
else if (last character matches the one erased)
erase it too;
else
enter rejecting state and halt;
if (no input left)
enter accepting state and halt;
else
move to start of remaining input;
repeat from first step;
Algorithmics I, 2022 56
5 Computability 296
The palindrome problem – Turing machine
We need the following states (assuming alphabet is Σ={#,a,b}):
− s1, s2 moving right to look for the end, remembering the symbol erased
• i.e. s1 when read (and erased) a and s2 when read (and erased) b
Algorithmics I, 2022 57
5 Computability 297
The palindrome problem – Turing machine
Transitions:
− from s0, we enter sY if a blank is read, or move to s1 or s2 depending on
whether an a or b is read, erasing it in either case
− we stay in s1/s2 moving right until a blank is read, at which point we
enter s3/s4 and move left
− from s3/s4 we enter sY if a blank is read, sN if the 'wrong' symbol is read,
otherwise erase it, enter s5, and move left
− in s5 we move left until a blank is read, then move right and enter s0
States:
− s0 reading, erasing and remembering the leftmost symbol
− s1, s2 moving right to look for the end, remembering the symbol erased
− s3, s4 testing for the appropriate rightmost symbol
− s5 moving back to the leftmost symbol
Algorithmics I, 2022 58
5 Computability 298
The palindrome problem – Turing machine
A Turing machine can be described by its state transition diagram
which is a directed graph where
− each state is represented by a vertex
− f(s,σ) = (s′,σ′,d) is represented by an edge from vertex s to vertex s′,
labelled (σ➝σ′,d)
• edge from s to s′ represents moving to state s′
• σ➝σ′ represents overwriting the symbol σ on the tape with the symbol σ′
• d represents moving the tape head one square in direction d
Algorithmics I, 2022 59
5 Computability 299
The palindrome problem – Turing machine
The diagram’s transitions, written out:
− s0: (a➝#,R) to s1; (b➝#,R) to s2; (#➝#,L) to sY
− s1: (a➝a,R) and (b➝b,R) loop; (#➝#,L) to s3
− s2: (a➝a,R) and (b➝b,R) loop; (#➝#,L) to s4
− s3: (a➝#,L) to s5; (b➝b,L) to sN; (#➝#,L) to sY
− s4: (b➝#,L) to s5; (a➝a,L) to sN; (#➝#,L) to sY
− s5: (a➝a,L) and (b➝b,L) loop; (#➝#,R) to s0
Algorithmics I, 2022 61
5 Computability 301
Turing machines – Functions - Example
Design a Turing machine to compute the function f(k) = k+1
− where the input is in binary
Example 1
− input: 1 0 0 0 1 0
− output: 1 0 0 0 1 1
Example 2
− input: 1 0 0 1 1 1
− output: 1 0 1 0 0 0
Example 3 (special case: no right-most 0, i.e. the input is all 1’s)
− input: 1 1 1 1 1
− output: 1 0 0 0 0 0
Pattern: replace the right-most 0 with 1; then, moving right, replace each 1 with 0; halt on a blank
Special-case pattern: replace the first blank before the input with 1; then, moving right, replace each 1 with 0; halt on a blank
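The pattern can be mirrored directly on a character array. A Java sketch (an illustration of the pattern, not a simulation of the machine itself):

```java
// Binary increment, following the TM's pattern: flip the right-most 0 to 1
// and zero every 1 to its right; if the input is all 1s, prepend a 1 instead.
public class Increment {
    static String addOne(String bits) {
        char[] t = bits.toCharArray();
        for (int i = t.length - 1; i >= 0; i--) {
            if (t[i] == '0') {
                t[i] = '1';                                        // right-most 0 becomes 1
                for (int j = i + 1; j < t.length; j++) t[j] = '0'; // 1s to its right become 0
                return new String(t);
            }
        }
        // special case: all 1s, so write a 1 in front and zero everything
        StringBuilder sb = new StringBuilder("1");
        for (int j = 0; j < t.length; j++) sb.append('0');
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(addOne("100010")); // 100011
        System.out.println(addOne("100111")); // 101000
        System.out.println(addOne("11111"));  // 100000
    }
}
```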
Algorithmics I, 2022 62
5 Computability 302
Turing machines – Functions - Example
Design a Turing machine to compute the function f(k) = k+1
− where the input is in binary
TM Algorithm for the function f(k) = k+1
Algorithmics I, 2022 63
5 Computability 303
Turing machines – Functions - Example
We need the following states
− s0: (start state) moving right seeking start of the input (first blank)
− s1: moving left to right-most 0 or blank
− s2: find the first 0 or blank, change it to 1, then move right changing 1s to 0s
− s3: the halt state
Algorithmics I, 2022 64
5 Computability 304
Transition state diagram
[state transition diagram for s0–s3; the legible transitions include (0➝1,R), (#➝#,L), (#➝1,R) and (0➝0,R)]
Algorithmics I, 2022 65
5 Computability 305
Turing recognizable and decidable
Algorithmics I, 2022 67
5 Computability 307
Enhanced Turing machines
A Turing machine may be enhanced in various ways:
− two or more tapes, rather than just one, may be available
− a 2-dimensional 'tape' may be available
− the TM may operate non-deterministically
• i.e. the transition 'function’ may be a relation rather than a function
− and many more …
Algorithmics I, 2022 68
5 Computability 308
Turing machines – P and NP
The class P is often introduced as the class of decision problems
solvable by a Turing machine in polynomial time
Algorithmics I, 2022 69
5 Computability 309
Counter programs
A completely different model of computation
− all general purpose programming languages have essentially the
same computational power
− a program written in one language could be translated (or compiled) into
a functionally equivalent program in any other
Algorithmics I, 2022 70
5 Computability 310
Counter programs
Counter programs have
Algorithmics I, 2022 71
5 Computability 311
Counter programs - Example
A counter program to evaluate the product x·y
(A, B and C are labels)
// initialise some variables
u = 0;
z = 0; // this will be the product of x and y when we finish
C: halt;
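The elided body of the program (labels A and B) is not reproduced here. As an illustration only, the idea, multiplication by repeated incrementing, can be sketched in Java using nothing beyond the operations a counter program offers (set to zero, add one, subtract one, test for zero):

```java
// The counter-program idea for z = x·y, written with while loops standing in
// for the labelled gotos. (An illustration, not the notes' elided program.)
public class Product {
    static int product(int x, int y) {
        int z = 0;
        int u = y;
        while (u != 0) {        // outer loop: repeat y times...
            int v = x;
            while (v != 0) {    // inner loop: ...adding x to z one step at a time
                z = z + 1;
                v = v - 1;
            }
            u = u - 1;
        }
        return z;               // halt with z = x·y
    }

    public static void main(String[] args) {
        System.out.println(product(3, 4)); // 12
    }
}
```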
Algorithmics I, 2022 72
5 Computability 312
The Church-Turing Thesis
So is the Turing machine an appropriate model for the ‘black box’?
Algorithmics I, 2022 73
5 Computability 313
The Church-Turing Thesis
So is the Turing machine an appropriate model for the ‘black box’?
Algorithmics I, 2022 74
5 Computability 314