Algorithms and Data Structures
Alfred Strohmeier
http://lglwww.epfl.ch
March 2000
Insertion Sort
Straight (Linear) Insertion Sort
Binary Insertion Sort
Shell Sort
Exchange Sort
Straight Exchange Sort (Bubble Sort)
Shaker Sort
Quick Sort
Radix Sort
Tree Sort
Binary Tree Sort
Heap Sort
Merge Sort
External Sorting
Sort-Merge
Polyphase Merge
generic
procedure Sort_G
(Table: in out Table_Type);
-- Sort in increasing order of "<".
begin -- Sort_Demo
   Ada.Text_IO.Put_Line
      ("Before Sorting: " & My_String);
   Sort_String (My_String);
   Ada.Text_IO.Put_Line
      ("After Sorting: " & My_String);
end Sort_Demo;
[Figure: table divided into a sorted part and an unsorted part]
Basic Idea:
• Find the index Small of the smallest element
in the unsorted part;
• Exchange the values located at Start and
Small;
• Advance Start.
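The three steps above can be sketched in Ada as follows. This is only an illustration: the array type Table_Type over Integer and the demo values are assumptions, not the generic Sort_G of the course.

```ada
with Ada.Text_IO;

procedure Selection_Sort_Demo is

   -- Assumed table type for the sketch.
   type Table_Type is array (Positive range <>) of Integer;

   procedure Sort (Table: in out Table_Type) is
      Small: Positive;
      Temp: Integer;
   begin
      for Start in Table'First .. Table'Last - 1 loop
         -- Find the index Small of the smallest element in Start .. Last.
         Small := Start;
         for I in Start + 1 .. Table'Last loop
            if Table (I) < Table (Small) then
               Small := I;
            end if;
         end loop;
         -- Exchange the values located at Start and Small.
         Temp := Table (Start);
         Table (Start) := Table (Small);
         Table (Small) := Temp;
         -- Advancing Start is done by the for loop.
      end loop;
   end Sort;

   Data: Table_Type := (40, 15, 30, 25, 20, 10);

begin
   Sort (Data);
   for I in Data'Range loop
      Ada.Text_IO.Put (Integer'Image (Data (I)));
   end loop;
   Ada.Text_IO.New_Line;
end Selection_Sort_Demo;
```

The part Table (Table'First .. Start - 1) is the sorted part; it grows by one element per iteration of the outer loop.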
[Figure: the sorted part in an array and in a linked list]
Interior loop:
Best case: 0
Worst case: 1+2+...+(n-1) = (1/2)*n*(n-1)
On average: One must walk through half of
the list before finding the location where to
insert the element: (1/4)*n*(n-1)
             Comparisons      Exchanges
Best Case    n-1              2*(n-1)
Average      (1/4)*n*(n-1)    (1/4)*n*(n-1) + 2*(n-1)
Worst Case   (1/2)*n*(n-1)    (1/2)*n*(n-1) + 2*(n-1)
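The best-case and worst-case comparison counts can be checked with a small experiment. The sketch below is a straight insertion sort that counts element comparisons in a global variable; the names Comparisons, Sorted and Reversed are illustrative assumptions.

```ada
with Ada.Text_IO;

procedure Insertion_Sort_Demo is

   type Table_Type is array (Positive range <>) of Integer;

   Comparisons: Natural := 0;  -- number of element comparisons so far

   procedure Sort (Table: in out Table_Type) is
      Temp: Integer;
      J: Integer;
   begin
      for I in Table'First + 1 .. Table'Last loop
         Temp := Table (I);
         J := I - 1;
         -- Walk backwards through the sorted part Table (First .. I - 1).
         while J >= Table'First loop
            Comparisons := Comparisons + 1;
            exit when Table (J) <= Temp;
            Table (J + 1) := Table (J);
            J := J - 1;
         end loop;
         Table (J + 1) := Temp;
      end loop;
   end Sort;

   Sorted: Table_Type := (1, 2, 3, 4, 5);
   Reversed: Table_Type := (5, 4, 3, 2, 1);

begin
   Sort (Sorted);
   Ada.Text_IO.Put_Line ("Sorted input:  " & Natural'Image (Comparisons));
   Comparisons := 0;
   Sort (Reversed);
   Ada.Text_IO.Put_Line ("Reversed input:" & Natural'Image (Comparisons));
end Insertion_Sort_Demo;
```

For n = 5 the sorted input needs n-1 = 4 comparisons and the reversed input needs (1/2)*n*(n-1) = 10, in line with the table.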
[Figure: table with its sorted part; the adjacent elements at J and J+1 are compared]
Basic idea:
• walk through the unsorted part from the
end;
• exchange adjacent elements if not in
order;
• increase the sorted part, decrease the
unsorted part by one element.
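A minimal sketch of this idea, assuming a table of Integer values and a sorted part that grows from the front of the table (Table_Type and the demo data are assumptions made for the example):

```ada
with Ada.Text_IO;

procedure Bubble_Sort_Demo is

   type Table_Type is array (Positive range <>) of Integer;

   procedure Sort (Table: in out Table_Type) is
      Temp: Integer;
   begin
      for I in Table'First .. Table'Last - 1 loop
         -- Table (First .. I - 1) is the sorted part.
         -- Walk through the unsorted part from the end.
         for J in reverse I .. Table'Last - 1 loop
            -- Exchange adjacent elements if not in order.
            if Table (J + 1) < Table (J) then
               Temp := Table (J);
               Table (J) := Table (J + 1);
               Table (J + 1) := Temp;
            end if;
         end loop;
         -- The smallest element of the unsorted part is now at I.
      end loop;
   end Sort;

   Data: Table_Type := (40, 15, 30, 25, 20, 10);

begin
   Sort (Data);
   for I in Data'Range loop
      Ada.Text_IO.Put (Integer'Image (Data (I)));
   end loop;
   Ada.Text_IO.New_Line;
end Bubble_Sort_Demo;
```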
end loop;
end Sort_G;
end Sort_G;
40 15 30 25 20 10 75 45 65 35 50 60 70
40 15 30 25 20 10 35 45 65 75 50 60 70
35 15 30 25 20 10 40 45 65 75 50 60 70
Table 9: Partitioning
Pivot_Index: Index_Type;
procedure Partition
(Table: in out Table_Type;
Pivot_Index: out Index_Type) is separate;
end if;
end Sort_G;
Storage space:
In the worst case, the procedure calls itself n-1
times, and the requirement for storage is therefore
proportional to n. This is unacceptable.
Solution: Choose as the pivot the median of
the first, last and middle elements of the table.
Place the median value at the first position of
the table and use the algorithm as shown
(Median-of-Three Partitioning).
Execution time:
The execution of Partition for a sequence of
length k needs k comparisons. Execution
time is therefore proportional to $n^2$.
Suppose $n = 2^m$.
Quicksort for a sequence of size $2^m$ calls itself
twice with a sequence of size $2^{m-1}$.
Storage space:
$S_{2^m} = S_{2^{m-1}} + 1$
(the maximum for a recursive descent)
therefore:
$S_{2^m} \approx m$ and hence $S_n = O(\log n)$
Time behavior:
$C_{2^m} = 2 C_{2^{m-1}} + 2^m$
(The $2^m$ elements must be compared
with the pivot)
therefore:
$C_{2^m} \approx 2^m (m+1)$ and hence $C_n = O(n \log n)$
Parameter passing:
Beware of passing the Table parameter of
Sort_G by copy!
Solution in Ada:
Write local procedures which use the index
bounds of the table as parameters, and
therefore work on the global variable Table.
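The advice above can be sketched as follows. The local procedure Sort_Part receives only index bounds and works on the Table parameter of the enclosing Sort, so the table is never copied by the recursive calls. The partitioning loop is a simple variant written for this example, not necessarily the Partition procedure of the course; Swap, Sort_Part and Border are hypothetical names.

```ada
with Ada.Text_IO;

procedure Quick_Sort_Demo is

   type Table_Type is array (Positive range <>) of Integer;

   procedure Sort (Table: in out Table_Type) is

      procedure Swap (I, J: in Positive) is
         Temp: constant Integer := Table (I);
      begin
         Table (I) := Table (J);
         Table (J) := Temp;
      end Swap;

      -- Works on Table (First .. Last) of the enclosing procedure.
      procedure Sort_Part (First, Last: in Integer) is
         Middle: Integer;
         Pivot: Integer;
         Border: Integer;  -- last index holding a value < Pivot
      begin
         if First >= Last then
            return;
         end if;
         -- Median-of-three: order the first, middle and last element,
         -- then place the median at the first position.
         Middle := First + (Last - First) / 2;
         if Table (Middle) < Table (First) then Swap (Middle, First); end if;
         if Table (Last) < Table (First) then Swap (Last, First); end if;
         if Table (Last) < Table (Middle) then Swap (Last, Middle); end if;
         Swap (First, Middle);
         -- Partition around the pivot stored at First.
         Pivot := Table (First);
         Border := First;
         for I in First + 1 .. Last loop
            if Table (I) < Pivot then
               Border := Border + 1;
               Swap (Border, I);
            end if;
         end loop;
         Swap (First, Border);
         Sort_Part (First, Border - 1);
         Sort_Part (Border + 1, Last);
      end Sort_Part;

   begin
      Sort_Part (Table'First, Table'Last);
   end Sort;

   Data: Table_Type := (40, 15, 30, 25, 20, 10, 75, 45, 65, 35, 50, 60, 70);

begin
   Sort (Data);
   for I in Data'Range loop
      Ada.Text_IO.Put (Integer'Image (Data (I)));
   end loop;
   Ada.Text_IO.New_Line;
end Quick_Sort_Demo;
```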
Stack (Pile)
Queue (Queue, File d'attente)
Deque (Double-Entry Queue,
Queue à double entrée)
Priority Queue (Queue de priorité)
Set (Ensemble)
Bag (Multiset, Multi-ensemble)
Vector (Vecteur)
Matrix (Matrice)
String (Chaîne)
(Linked) List (Liste chaînée)
Linear List (Liste linéaire)
Circular List (Liste circulaire)
Doubly-linked List (Liste doublement
chaînée)
Ring (Anneau)
Tree (Arbre)
Ordered Tree (Arbre ordonné)
(children are ordered)
2-Tree (Arbre d'ordre 2)
(every node has 0 or 2 children)
Trie (from retrieval)
(also called "Lexicographic Search
Tree")
(a trie of order m is empty or is a
sequence of m tries)
Binary Tree (Arbre binaire)
Binary Search Tree (Arbre de recherche)
AVL-Tree (Arbre équilibré)
Heap (Tas)
Multiway Search Tree
B-Tree
Graph (Graphe)
Directed Graph (Graphe orienté)
Undirected Graph (Graphe non orienté)
Weighted Graph (Graphe valué)
DAG (Directed Acyclic Graph, Graphe
orienté acyclique)
Map (Mappe, Table associative)
Hash Table (Table de hachage)
File (Fichier)
Sequential File (Fichier séquentiel)
Direct Access File (Fichier à accès direct)
Indexed File (Fichier indexé, fichier en
accès par clé)
Indexed-Sequential File (ISAM) (Fichier
indexé trié)
[Figure: a list with the elements b, a, c, shown as a logical structure and as a representation structure (linked cells ending in null); further figures: circular list, doubly-linked list]
Constructors:
• Create, build, and initialize an object.
Selectors:
• Retrieve information about the state of an
object.
Modifiers:
• Alter the state of an object.
Destructors:
• Destroy an object.
Index  Item
8      38     <- Top
7      4927
6      11
5      315
4      5
3      2352
2      11
1      325
First Definition
An operation is said to be primitive if it cannot
be decomposed.
Example
• procedure Pop
(S: in out Stack; E: out Element);
Second Definition
An operation is said to be primitive if it cannot
be implemented efficiently without access to
the internal representation of the data
structure.
Example
It is possible to compute the size of a stack by
popping off all its elements and then
reconstructing it. Such an approach is highly
inefficient.
Definition
A set of primitive operations is sufficient if it
covers the usual usages of the data structure.
Example
A stack with a Push operation but lacking a
Pop operation is of limited value.
Is a stack without an iterator usable?
Definition
A complete set of operations is a set of
primitive operations including a sufficient set
of operations and covering all possible
usages of the data structure; otherwise
stated, a complete set is a "reasonable"
extension of a sufficient set of operations.
Example
Push, Pop, Top, Size and Iterate form a
complete set of operations for a stack.
generic
   Max: Natural := 100;
   type Item_Type is private;
package Stack_Class_G is
   type Stack_Type is limited private;
   procedure Push (Stack: in out Stack_Type;
                   Item: in Item_Type);
   procedure Pop (Stack: in out Stack_Type);
   function Top (Stack: Stack_Type)
      return Item_Type;
   generic
      with procedure Action
         (Item: in out Item_Type);
   procedure Iterate (Stack: in Stack_Type);
   Empty_Error: exception;
   -- raised when an item is accessed or popped from an empty stack.
   Full_Error: exception;
   -- raised when an item is pushed on a full stack.
private
   type Table_Type is array (1..Max)
      of Item_Type;
   type Stack_Type is record
      Table: Table_Type;
      Top: Integer range 0..Max := 0;
   end record;
end Stack_Class_G;
Kinds of trees
Binary tree
Traversal of a binary tree
Search tree
Expression tree
Polish forms
Strictly binary tree
Almost complete binary tree
Heap
E is a finite set
(i) E is empty
or
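[Figure: example binary tree with the nodes A to I]
[Figure: two binary trees with root A and a single child B]
[Figure: binary tree used for the traversals below: A has the children B and C; B has the left child D; D has the right child G; C has the children E and F; E has the children H and I]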
Preorder: ABDGCEHIF
Inorder: DGBAHEICF
Postorder: GDBHIEFCA
By level: ABCDEFGHI
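The first three traversal orders can be reproduced with three short recursive procedures. The sketch below builds a tree whose shape is derived from the traversal sequences above (A with children B and C, D as left child of B, G as right child of D, E and F as children of C, H and I as children of E). The type names and the helper function Node are assumptions made for the example.

```ada
with Ada.Text_IO;

procedure Tree_Traversal_Demo is

   type Node_Type;
   type Tree_Type is access Node_Type;
   type Node_Type is record
      Value: Character;
      Left: Tree_Type;
      Right: Tree_Type;
   end record;

   -- Hypothetical helper that allocates one node.
   function Node (Value: Character;
                  Left, Right: Tree_Type := null) return Tree_Type is
   begin
      return new Node_Type'(Value, Left, Right);
   end Node;

   procedure Preorder (Tree: in Tree_Type) is
   begin
      if Tree /= null then
         Ada.Text_IO.Put (Tree.Value);   -- root first
         Preorder (Tree.Left);
         Preorder (Tree.Right);
      end if;
   end Preorder;

   procedure Inorder (Tree: in Tree_Type) is
   begin
      if Tree /= null then
         Inorder (Tree.Left);
         Ada.Text_IO.Put (Tree.Value);   -- root between the subtrees
         Inorder (Tree.Right);
      end if;
   end Inorder;

   procedure Postorder (Tree: in Tree_Type) is
   begin
      if Tree /= null then
         Postorder (Tree.Left);
         Postorder (Tree.Right);
         Ada.Text_IO.Put (Tree.Value);   -- root last
      end if;
   end Postorder;

   Root: constant Tree_Type :=
     Node ('A',
           Node ('B', Left => Node ('D', Right => Node ('G'))),
           Node ('C', Node ('E', Node ('H'), Node ('I')), Node ('F')));

begin
   Preorder (Root);  Ada.Text_IO.New_Line;  -- ABDGCEHIF
   Inorder (Root);   Ada.Text_IO.New_Line;  -- DGBAHEICF
   Postorder (Root); Ada.Text_IO.New_Line;  -- GDBHIEFCA
end Tree_Traversal_Demo;
```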
[Figure: binary tree for the exercise below; surviving node labels: C to L]
Preorder:
Inorder:
Postorder:
By level:
[Figure: binary search tree, by level: 14 / 4 15 / 3 9 18 / 7 16 20 / 5 17]
Inorder: 3 4 5 7 9 14 15 16 17 18 20
Application: Sorting
Input: 14, 15, 4, 9, 7, 18, 3, 5, 16, 20, 17
Processing: Build the tree
Result: Traverse in inorder
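The sorting application can be sketched in a few lines: insert the input values one by one into a binary search tree, then print the tree in inorder. The type names and the procedures Insert and Print_Inorder are assumptions made for this example; equal values are sent to the right subtree, which is one possible convention.

```ada
with Ada.Text_IO;

procedure Tree_Sort_Demo is

   type Node_Type;
   type Tree_Type is access Node_Type;
   type Node_Type is record
      Value: Integer;
      Left, Right: Tree_Type;
   end record;

   type Table_Type is array (Positive range <>) of Integer;

   -- Processing: build the tree.
   procedure Insert (Tree: in out Tree_Type; Value: in Integer) is
   begin
      if Tree = null then
         Tree := new Node_Type'(Value, null, null);
      elsif Value < Tree.Value then
         Insert (Tree.Left, Value);
      else
         Insert (Tree.Right, Value);
      end if;
   end Insert;

   -- Result: traverse in inorder.
   procedure Print_Inorder (Tree: in Tree_Type) is
   begin
      if Tree /= null then
         Print_Inorder (Tree.Left);
         Ada.Text_IO.Put (Integer'Image (Tree.Value));
         Print_Inorder (Tree.Right);
      end if;
   end Print_Inorder;

   Input: constant Table_Type :=
     (14, 15, 4, 9, 7, 18, 3, 5, 16, 20, 17);
   Root: Tree_Type := null;

begin
   for I in Input'Range loop
      Insert (Root, Input (I));
   end loop;
   Print_Inorder (Root);   -- 3 4 5 7 9 14 15 16 17 18 20
   Ada.Text_IO.New_Line;
end Tree_Sort_Demo;
```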
Application: Searching
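Expression trees:
(i) a+b*c: root "+" with the children a and "*"; "*" has the children b and c.
(ii) (a+b)*c: root "*" with the children "+" and c; "+" has the children a and b.
(iii) log x: root "log" with the child x.
(iv) n!: root "!" with the child n.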
[Figure: the expression trees (i) and (ii)]
Form       (i)          (ii)
Prefix     + a * b c    * + a b c
Infix      a + b * c    (a + b) * c
Postfix    a b c * +    a b + c *
[Figure: expression tree of an assignment to x; the surviving labels suggest x := (-b + (b^2 - 4*a*c)^0.5) / (2*a)]
Heap (Tas)
(i) A heap is an almost complete binary
tree.
(ii) The content of a node is always
smaller than or equal to that of its parent node.
Definitions
Oriented Graph (example and definitions)
Undirected Graph (example and definitions)
Representations
Adjacency Matrix
Adjacency Sets
Linked Lists
Contiguous Lists (matrices)
"Combination"
Abstract Data Type
List of Algorithms
Traversal
Shortest path
Representation of a weighted graph
Dijkstra’s Algorithm
Principle of dynamic programming
[Figure: directed graph with the vertices a, b, c, d]
V = {a, b, c, d}
E = {(a, a), (a, c), (c, d), (d, c)}
• (a, a) is a self-loop (boucle)
• multiple parallel arcs are prohibited (E is a
set!)
[Figures: three example graphs on the vertices a, b, c, d]
V = {a, b, c, d, e}
E = {{a, c}, {a, d}, {c, d}, {d, e}}
• self-loops (boucle) are prohibited.
• multiple parallel edges are prohibited (E is
a set!).
[Figure: directed graph with the vertices 1, 2, 3, 4]
• Adjacency matrix
• Adjacency sets (or lists)
• Linked lists
• Contiguous lists (matrices)
• "Combinations"
[Figure: directed graph with the vertices 1, 2, 3, 4]
     1  2  3  4
1       T  T
2          T  T
3
4    T  T  T
a (i, j) = T  <=>  (i, j) is an arc
type Matrix_Type is
array (Vertex_Type range <>,
Vertex_Type range <>) of Boolean;
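As an illustration, the example graph with the arcs 1 to 2 and 3, 2 to 3 and 4, and 4 to 1, 2 and 3 can be stored in such a matrix. In this sketch Vertex_Type is assumed to be a subtype of Positive, and T and F are just local abbreviations.

```ada
with Ada.Text_IO;

procedure Adjacency_Matrix_Demo is

   subtype Vertex_Type is Positive;  -- assumption for the sketch

   type Matrix_Type is
      array (Vertex_Type range <>,
             Vertex_Type range <>) of Boolean;

   T: constant Boolean := True;
   F: constant Boolean := False;

   -- Row I, column J is True if and only if (I, J) is an arc.
   Graph: constant Matrix_Type (1 .. 4, 1 .. 4) :=
     ((F, T, T, F),
      (F, F, T, T),
      (F, F, F, F),
      (T, T, T, F));

begin
   -- Print, for every vertex, the vertices it has an arc to.
   for I in Graph'Range (1) loop
      Ada.Text_IO.Put (Integer'Image (I) & ":");
      for J in Graph'Range (2) loop
         if Graph (I, J) then
            Ada.Text_IO.Put (Integer'Image (J));
         end if;
      end loop;
      Ada.Text_IO.New_Line;
   end loop;
end Adjacency_Matrix_Demo;
```

Testing whether an arc exists takes constant time, but the matrix needs n*n entries even when the graph has few arcs.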
[Figure: directed graph with the vertices 1, 2, 3, 4]
Vertex   Adjacent vertices
1        2, 3
2        3, 4
3        (none)
4        1, 2, 3
{(1, {2, 3}), (2, {3, 4}), (3, ∅), (4, {1, 2, 3})}
type Adjacency_Set_Type is
   array (Vertex_Type range <>)
      of Set_of_Vertices;
[Figure: linked-list representation of the graph]
type Vertex_Type;
type Edge_Type;
type Vertex_Access_Type is
access Vertex_Type;
type Edge_Access_Type is
access Edge_Type;
type Graph_Type is
new Vertex_Access_Type;
vertex 1 -> 2 -> 3 -> null
vertex 2 -> 3 -> 4 -> null
vertex 3 -> null
vertex 4 -> 1 -> 2 -> 3 -> null
vertex 5 -> null
vertex 6 -> null
vertex 7 -> null
generic
   type Vertex_Value_Type is private;
   type Edge_Value_Type is private;
package Graph_G is
   type Graph_Type is limited private;
   type Vertex_Type is private;
   type Edge_Type is private;
   procedure Add
      (Vertex: in out Vertex_Type;
       Graph: in out Graph_Type);
   procedure Remove
      (Vertex: in out Vertex_Type;
       Graph: in out Graph_Type);
   procedure Add
      (Edge: in out Edge_Type;
       Graph: in out Graph_Type;
       Source,
       Destination: in Vertex_Type);
   procedure Remove
      (Edge: in out Edge_Type;
       Graph: in out Graph_Type);
   function Is_Empty
      (Graph: Graph_Type)
      return Boolean;
   function Number_of_Vertices
      (Graph: Graph_Type)
      return Natural;
   function Source
      (Edge: Edge_Type)
      return Vertex_Type;
   function Destination
      (Edge: Edge_Type)
      return Vertex_Type;
   generic
      with procedure Process
         (Vertex: in Vertex_Type;
          Continue: in out Boolean);
   procedure Visit_Vertices
      (Graph: in Graph_Type);
   generic
      with procedure Process
         (Edge: in Edge_Type;
          Continue: in out Boolean);
   procedure Visit_Edges
      (Graph: in Graph_Type);
   generic
      with procedure Process
         (Edge: in Edge_Type;
          Continue: in out Boolean);
   procedure Visit_Adj_Edges
      (Vertex: in Vertex_Type
       [; Graph: in Graph_Type]);
   ...
end Graph_G;
Depth-first search
Breadth-first search
Connectivity problems
Minimum Spanning Trees
Path-finding problems
Shortest path
Topological sorting
Transitive Closure
The Network Flow problem
(Ford-Fulkerson)
Matching
Stable marriage problem
Travelling Salesperson problem
Planarity problem
Graph isomorphism problem
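[Figure: example graph with the vertices A to I, drawn by level: A / B C D / E F / G H / I]
Depth-first search, visiting order:
1 A, 2 B, 3 E, 4 G, 5 C, 6 F, 7 H, 8 I, 9 D
(A, B, E, G, C, F, H, I, D)
Breadth-first search, visiting order:
1 A, 2 B, 3 C, 4 D, 5 E, 6 F, 7 G, 8 H, 9 I
(A, B, C, D, E, F, G, H, I)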
generic
   with procedure Visit (Vertex: in Vertex_Type);
procedure Depth_First (Graph: in Graph_Type);
package Queue is
   new Queue_G
      (Element_Type => Vertex_Type);
type Queue_of_Vertices is
   new Queue.Queue_Type;
generic
   with procedure Visit
      (Vertex: in Vertex_Type);
procedure Breadth_First
   (Graph: in Graph_Type);
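Both traversals can be sketched on a small adjacency matrix, without the generic packages of the course. The edge set below is an assumption: it was chosen so that the two procedures reproduce the visiting orders (A, B, E, G, C, F, H, I, D) and (A, B, C, D, E, F, G, H, I) given earlier. The queue is a plain array, which is sufficient because every vertex is enqueued at most once.

```ada
with Ada.Text_IO;

procedure Graph_Traversal_Demo is

   subtype Vertex_Type is Character range 'A' .. 'I';
   type Matrix_Type is array (Vertex_Type, Vertex_Type) of Boolean;
   type Mark_Type is array (Vertex_Type) of Boolean;

   Graph: Matrix_Type := (others => (others => False));

   -- Adds an undirected edge (hypothetical helper).
   procedure Connect (V, W: in Vertex_Type) is
   begin
      Graph (V, W) := True;
      Graph (W, V) := True;
   end Connect;

   procedure Depth_First (Start: in Vertex_Type) is
      Visited: Mark_Type := (others => False);

      procedure Visit_From (V: in Vertex_Type) is
      begin
         Visited (V) := True;
         Ada.Text_IO.Put (V);
         for W in Vertex_Type loop
            if Graph (V, W) and then not Visited (W) then
               Visit_From (W);
            end if;
         end loop;
      end Visit_From;

   begin
      Visit_From (Start);
      Ada.Text_IO.New_Line;
   end Depth_First;

   procedure Breadth_First (Start: in Vertex_Type) is
      Visited: Mark_Type := (others => False);
      Queue: array (1 .. 9) of Vertex_Type;  -- one slot per vertex
      Head: Positive := 1;
      Tail: Natural := 0;
      V: Vertex_Type;
   begin
      Visited (Start) := True;
      Tail := Tail + 1;
      Queue (Tail) := Start;
      while Head <= Tail loop
         V := Queue (Head);
         Head := Head + 1;
         Ada.Text_IO.Put (V);
         for W in Vertex_Type loop
            if Graph (V, W) and then not Visited (W) then
               Visited (W) := True;
               Tail := Tail + 1;
               Queue (Tail) := W;
            end if;
         end loop;
      end loop;
      Ada.Text_IO.New_Line;
   end Breadth_First;

begin
   Connect ('A', 'B'); Connect ('A', 'C'); Connect ('A', 'D');
   Connect ('B', 'E'); Connect ('E', 'G');
   Connect ('C', 'F'); Connect ('F', 'H'); Connect ('H', 'I');
   Depth_First ('A');    -- ABEGCFHID
   Breadth_First ('A');  -- ABCDEFGHI
end Graph_Traversal_Demo;
```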
Statement 1:
Given a vertex Start and a vertex Target, find
the shortest path from Start to Target.
Statement 2:
Given a vertex Start, find the shortest paths
from Start to all other vertices.
2. Loop
2.1. Extract from Q the vertex C having the
smallest distance:
d(C) = min (d(V); V ∈ Q)
2.2. Add C to S (see Justification)
2.3. Add the vertices adjacent to C to Q, and
update their distances:
For every W adjacent to C:
• if W ∉ Q: d(W) := d(C) + weight (C, W)
• if W ∈ Q: d(W) :=
min (d(W), d(C) + weight (C, W))
3. Stop condition
• Q is empty
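The loop above can be sketched with plain arrays instead of a priority queue. In this sketch the set Q is implicit: it consists of the vertices that are not yet in S and already have a finite distance. The graph, its weights and all identifiers are assumptions made for the example; Infinity stands for "no arc".

```ada
with Ada.Text_IO;

procedure Dijkstra_Demo is

   type Vertex_Type is (A, B, C, D, E);
   subtype Weight_Type is Natural;
   Infinity: constant Weight_Type := Weight_Type'Last;  -- no arc

   type Weight_Matrix is array (Vertex_Type, Vertex_Type) of Weight_Type;
   type Distance_Table is array (Vertex_Type) of Weight_Type;
   type Vertex_Set is array (Vertex_Type) of Boolean;

   Weight: Weight_Matrix := (others => (others => Infinity));
   Dist: Distance_Table := (others => Infinity);
   In_S: Vertex_Set := (others => False);
   Closest: Vertex_Type := Vertex_Type'First;
   Found: Boolean;

begin
   -- Hypothetical example graph.
   Weight (A, B) := 10;  Weight (A, C) := 3;  Weight (C, B) := 4;
   Weight (B, D) := 2;   Weight (C, D) := 8;  Weight (D, E) := 7;

   Dist (A) := 0;  -- A plays the role of Start
   loop
      -- 2.1. Extract the vertex of Q having the smallest distance.
      Found := False;
      for V in Vertex_Type loop
         if not In_S (V) and then Dist (V) < Infinity
           and then (not Found or else Dist (V) < Dist (Closest))
         then
            Closest := V;
            Found := True;
         end if;
      end loop;
      exit when not Found;  -- 3. Q is empty
      -- 2.2. Add the vertex to S.
      In_S (Closest) := True;
      -- 2.3. Update the distances of the adjacent vertices.
      for W in Vertex_Type loop
         if Weight (Closest, W) < Infinity
           and then Dist (Closest) + Weight (Closest, W) < Dist (W)
         then
            Dist (W) := Dist (Closest) + Weight (Closest, W);
         end if;
      end loop;
   end loop;

   for V in Vertex_Type loop
      Ada.Text_IO.Put_Line
        (Vertex_Type'Image (V) & ":" & Weight_Type'Image (Dist (V)));
   end loop;
end Dijkstra_Demo;
```

For this graph the distances from A are 0, 7, 3, 9 and 16 for A, B, C, D and E. Selecting the minimum by a linear scan costs time proportional to the number of vertices per iteration; a priority queue would reduce that cost.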
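[Figures: weighted example graphs for Dijkstra's algorithm, with the vertices Start, A, B, C, D, E]
[Figure: sketch for the justification, showing the set S and the vertices Start, C, P and X]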
Precondition:
Weight (V, W) = ∞ if there is no arc from V to W.
Q: Priority_Queue_Type;
C: Vertex_Type;
Precondition:
Weight (V, W) = ∞ if there is no arc between
V and W; and Weight (V, W) = 0 if V = W.
S: Set_of_Vertices;
Start, C: Vertex_Type;
Min_Dist: Weight_Type;
Found: Boolean;
Representation of a path
• For each vertex on the path, store its
predecessor (on the path).
[Figure: path Start, I1, I2, End]
Searching
Sequential Searching, Binary Search,
Tree Search, Hashing, Radix Searching
String Processing
String Searching
Knuth-Morris-Pratt, Boyer-Moore,
Rabin-Karp
Pattern Matching
Parsing (Top-Down, Bottom-Up,
Compilers)
Compression
Huffman Code
Cryptology
Image Processing
Worst-case analysis:
complexity for problems the algorithm is in
trouble dealing with.
Average-case analysis:
complexity for "average" problems.
f, g: N+ → R+
3. Sum (somme)
If f1 is O(g1) and f2 is O(g2),
then f1 + f2 is O(max (g1, g2)), where
max (g1, g2) (x) = max (g1 (x), g2 (x)) for all x.
4. Product (produit)
If f1 is O(g1) and f2 is O(g2),
then f1·f2 is O(g1·g2).
Sum
$O(2n^3 + 5n^2 + 3) = O(\max(2n^3, 5n^2 + 3))$
$= O(2n^3)$
Scaling:
$O(2n^3) = O(n^3)$
Transitivity:
$O(2n^3 + 5n^2 + 3) = O(2n^3)$
and
$O(2n^3) = O(n^3)$
therefore
$O(2n^3 + 5n^2 + 3) = O(n^3)$
$C_1 = 1$
$C_n = C_{n-1} + n$, for $n \ge 2$
therefore
$C_n = C_{n-2} + (n-1) + n$
...
$= 1 + 2 + \dots + n$
$= \frac{1}{2} n (n+1)$
$C_1 = 0$
$C_n = C_{n/2} + 1$, for $n \ge 2$
Approximation with $n = 2^m$:
$C_{2^m} = C_{2^{m-1}} + 1$
$= C_{2^{m-2}} + 2$
...
$= C_{2^0} + m$
$= m$
$C_1 = 1$
$C_n = C_{n/2} + n$, for $n \ge 2$
Approximation with $n = 2^m$:
$C_{2^m} = C_{2^{m-1}} + 2^m$
$= C_{2^{m-2}} + 2^{m-1} + 2^m$
$= 1 + 2^1 + 2^2 + \dots + 2^m$
$= 2^{m+1} - 1$
hence
$C_n = 2n - 1$
Example: ??
traverse n elements
Approximation: $n = 2^m$
$C_{2^m} = 2 \cdot C_{2^{m-1}} + 2^m$
$\frac{C_{2^m}}{2^m} = \frac{C_{2^{m-1}}}{2^{m-1}} + 1 = m + 1$
$C_{2^m} = 2^m \cdot (m + 1)$
hence:
$C_n \cong n \cdot \log n$
Example: Quick sort
begin
   R := some evident value;
   while P /= empty loop
      Select X in P;
      Delete X in P;
      Modify R based on X;
   end loop;
end Solve;
Red + Blue = ??
Combine: Seems hard!
[Figure: point set split into the parts L and R by the line joining A (max) and B (min)]
Divide:
• Find the points with the largest and
smallest Y coordinates, called A and B.
• Allocate points to L or R depending on
which side of the line joining A and B, left
or right, they are.
Solve L and R
Combine:
• Connect both A and B to the "right"
vertices of the convex hulls of L and R.
Principle of divide-and-conquer:
To solve a large problem, divide it
into smaller problems that can be solved
independently of one another.
Dynamic programming
When one does not know exactly which
subproblems to solve, one solves them all,
and one stores the results for using them
later on for solving larger problems.
List of goods:
Name     A    B    C    D    E
Size     3    4    7    8    9
Value    4    5   10   11   13
Problem
Pack goods of the highest total value in the
knapsack, up to its capacity.
Goods      k      1   2   3   4   5   6   7   8   9  10  11  12
A          Obj    0   0   4   4   4   8   8   8  12  12  12  16
           Best           A   A   A   A   A   A   A   A   A   A
A to B     Obj    0   0   4   5   5   8   9  10  12  13  14  16
           Best           A   B   B   A   B   B   A   B   B   A
A to C     Obj    0   0   4   5   5   8  10  10  12  14  15  16
           Best           A   B   B   A   C   B   A   C   C   A
A to D     Obj    0   0   4   5   5   8  10  11  12  14  15  16
           Best           A   B   B   A   C   D   A   C   C   A
A to E     Obj    0   0   4   5   5   8  10  11  13  14  15  17
           Best           A   B   B   A   C   D   E   C   C   E
end loop;
end loop;
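The table of Obj and Best values can be reproduced with two nested loops. This is a minimal sketch, assuming that each good may be packed several times and that the capacity is 12, as in the table; the identifiers Good_Type, Capacity and Remaining are assumptions made for the example.

```ada
with Ada.Text_IO;

procedure Knapsack_Demo is

   type Good_Type is (A, B, C, D, E);
   Capacity: constant := 12;

   Size: constant array (Good_Type) of Positive := (3, 4, 7, 8, 9);
   Value: constant array (Good_Type) of Positive := (4, 5, 10, 11, 13);

   -- Obj (K): highest value reachable with capacity K.
   Obj: array (0 .. Capacity) of Natural := (others => 0);
   -- Best (K): last good added to reach Obj (K).
   Best: array (1 .. Capacity) of Good_Type := (others => A);

   Remaining: Natural := Capacity;

begin
   for J in Good_Type loop
      for K in 1 .. Capacity loop
         -- Does adding good J improve the result for capacity K?
         if K >= Size (J)
           and then Obj (K) < Obj (K - Size (J)) + Value (J)
         then
            Obj (K) := Obj (K - Size (J)) + Value (J);
            Best (K) := J;
         end if;
      end loop;
   end loop;

   Ada.Text_IO.Put_Line ("Best value:" & Natural'Image (Obj (Capacity)));

   -- Recover the content of the knapsack by following Best.
   Ada.Text_IO.Put ("Goods:");
   while Obj (Remaining) > 0 loop
      Ada.Text_IO.Put (" " & Good_Type'Image (Best (Remaining)));
      Remaining := Remaining - Size (Best (Remaining));
   end loop;
   Ada.Text_IO.New_Line;
end Knapsack_Demo;
```

After the outer loop has processed good E, Obj (12) is 17, obtained with one E and one A, which matches the last row of the table.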
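[Figure: classification of problems: computable versus undecidable/unsolvable; among the computable problems, polynomial-time versus intractable; where do the NP-complete problems lie?]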
Computability:
Whether or not it is possible to solve a
problem on a machine.
Machine:
• Turing Machine
Examples:
• Halting problem
• Trisect an arbitrary angle with a compass
and a straight edge.
Intractable Problems
There is an algorithm to solve the problem.
Any algorithm requires at least exponential
time.
Efficiency of an algorithm:
• Is a function of the problem size.
Deterministic algorithm/machine:
• At any time, whatever the algorithm/
machine is doing, there is only one thing
that it could do next.
P:
• The set of problems that can be solved by
deterministic algorithms in polynomial
time.
Non-deterministic algorithm/machine:
• To solve the problem, "guess" the solution,
then verify that the solution is correct.
NP:
• The set of problems that can be solved by
non-deterministic algorithms in polynomial
time.
Otherwise stated:
• There is no known polynomial-time
algorithm.
• It has not been proven that the problem is
intractable.
• It is easy to check that a given solution is
valid.
Hamilton circuit
• Does a (directed or undirected) graph have a
Hamilton circuit (cycle), i.e. a circuit (cycle)
containing every vertex exactly once?
Colorability
Is an undirected graph k-colorable? (no two
adjacent vertices are assigned the same
color)
Multiprocessor scheduling
• Given a deadline and a set of tasks of
varying length to be performed on two
identical processors, can the tasks be
arranged so that the deadline is met?
3. "Approximation"
The problem is changed. The algorithm does
not find the best solution, but a solution
guaranteed to be close to the best (e.g. value
≥ 95% of best value).