Lecture Notes For Design and Analysis of Algorithms
What is a problem?
A problem Q is defined as a relation on a set I of problem instances and a set S of problem
solutions.
For example, let Q be the problem "Is n divisible by 2?". Its instances include I1: "Is 2 divisible
by 2?", I2: "Is 3 divisible by 2?", and so forth, and its solution set is S = {yes, no}.
Note: No single data structure works well for all purposes, so it is important to know the
strengths and limitations of several of them.
What is an Algorithm?
An algorithm is any well-defined computational procedure that takes some value, or set of
values, as input and produces some value, or set of values, as output. That is, an algorithm is
a sequence of computational steps that transform the input into the output.
The running time of an algorithm on a particular input is the number of steps executed.
Example: The running time of Bubble sort, Selection sort, and Insertion sort is O(n²).
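As a concrete illustration (a sketch added here, not part of the original notes), the following Python function counts the comparisons Bubble sort performs; doubling the input size roughly quadruples the count, as the O(n²) bound suggests.

```python
def bubble_sort_steps(a):
    """Sort a copy of a with Bubble sort, counting key comparisons."""
    a = list(a)
    n = len(a)
    comparisons = 0
    for i in range(n - 1):
        # after pass i, the last i+1 elements are in their final place
        for j in range(n - 1 - i):
            comparisons += 1
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
    return a, comparisons
```

This version always performs n(n-1)/2 comparisons, so 100 elements cost 4950 comparisons and 200 elements cost 19900, about four times as many.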
Asymptotic Notation
We are studying the asymptotic efficiency of algorithms. That is, we are concerned with how the
running time of an algorithm increases with the size of the input in the limit, as the size of the
input increases without bound.
Big-O Notation (O)
The function f(n)=O(g(n)) iff there exist positive constants n0 and c such that f(n) ≤ c·g(n) for all
n ≥ n0.
Examples
3n² + 4n + 1 = O(n²)
7n³ + 5n² + 6n + 2 = O(n³)
Big-Omega Notation (Ω)
The function f(n)=Ω(g(n)) iff there exist positive constants n0 and c such that f(n) ≥ c·g(n) for all
n ≥ n0.
Examples
3n² + 4n + 1 = Ω(n²)
7n³ + 5n² + 6n + 2 = Ω(n³)
Big-Theta Notation (Θ)
The function f(n)=Θ(g(n)) iff there exist positive constants n0, c1, and c2 such that
c1·g(n) ≤ f(n) ≤ c2·g(n) for all n ≥ n0.
Examples
3n² + 4n + 1 = Θ(n²)
7n³ + 5n² + 6n + 2 = Θ(n³)
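To make the constants in these definitions concrete, here is a quick numeric check (a sketch added here, not part of the original notes, and a finite-range check rather than a proof): for f(n) = 3n² + 4n + 1, the choices c = 4, n0 = 5 witness f(n) = O(n²), and c = 3, n0 = 1 witness f(n) = Ω(n²); together they witness Θ(n²).

```python
def f(n):
    return 3 * n**2 + 4 * n + 1

# O(n^2): f(n) <= 4*n^2 for all n >= 5 (since 4n + 1 <= n^2 once n >= 5)
assert all(f(n) <= 4 * n**2 for n in range(5, 1000))

# Omega(n^2): f(n) >= 3*n^2 for all n >= 1 (the lower-order terms only add)
assert all(f(n) >= 3 * n**2 for n in range(1, 1000))
```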
Red-Black Trees
Motivation for Red-Black Tree
A set manipulated by algorithms can grow, shrink, or otherwise change over time; we call such
sets dynamic.
A Binary Search Tree of height h can implement any of the basic dynamic-set operations - such as
Insert, Delete, Minimum, Maximum - in O(h) time, which is O(n) in the worst case.
Figure-a: A binary search tree on 6 nodes with height 2. Figure-b: A less efficient binary search tree
with height 4 that contains the same keys.
The set operations are fast if the height of the search tree is small, but if its height is large,
their performance may be no better than with a linked list.
Red-black trees (or RBTs) are one of many search-tree schemes that are "balanced" in order to
guarantee that basic dynamic-set operations take O(lg n) time in the worst case.
Example:
o Minimum(), Maximum()
o Successor(), Predecessor()
o Search()
o Insert() and Delete(): will also take O(lg n) time, but need special care
since they modify the tree.
The modified tree may violate the properties of the Red-Black tree, and we have to
do the following things to restore them:
(a) we must change the colors of some of the nodes in the tree;
(b) we must change some pointers, i.e., we change the pointer structure through
rotation.
Rotation
We have two kinds of rotations: left rotations and right rotations. When we do a left rotation on a node
x, we assume that its right child y is not nil[T]. The left rotation "pivots" around the link from x to y. It
makes y the new root of the subtree, with x as y's left child and y's left child as x's right
child. A right rotation is symmetric; both rotations are summarized below.
Left-Rotation (around x)                  Right-Rotation (around y)
x keeps its left child                    y keeps its right child
y keeps its right child                   x keeps its left child
y's left child becomes x's right child    x's right child becomes y's left child
x's and y's parents change                x's and y's parents change
Basic steps:
(1) Use Tree-Insert from BST (slightly modified) to insert a node z into T.
Procedure RB-Insert(T, z).
Color the node z red.
(2) Fix the modified tree by re-coloring nodes and performing rotations to preserve the RB-tree
properties.
Procedure RB-Insert-Fixup.
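As a sanity aid (a sketch added here, not part of the original notes), the following Python checker tests the two core red-black properties that RB-Insert-Fixup must restore: no red node has a red child, and every root-to-leaf path contains the same number of black nodes. The RBNode class is a hypothetical minimal stand-in:

```python
class RBNode:
    def __init__(self, key, color, left=None, right=None):
        self.key, self.color = key, color        # color is 'R' or 'B'
        self.left, self.right = left, right

def black_height(node):
    """Return the black-height of the subtree rooted at node, or raise
    ValueError if a red-black property is violated. None plays the role
    of a black nil leaf."""
    if node is None:
        return 1
    if node.color == 'R':
        for child in (node.left, node.right):
            if child is not None and child.color == 'R':
                raise ValueError("red node with a red child")
    lh = black_height(node.left)
    rh = black_height(node.right)
    if lh != rh:
        raise ValueError("unequal black-heights")
    return lh + (1 if node.color == 'B' else 0)
```

For example, the tree with black root 10 and red children 5 and 15 passes with black-height 2, while attaching a red child under a red node raises an error.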
B-Trees
Motivation for B-Tree
(1) Search, Maximum, Minimum, Insert, etc. take O(n) worst-case time on a Binary
Search Tree, but O(log n) time on a Red-Black tree.
(2) Data access and modification operations are therefore faster on a Red-Black tree than on a
Binary Search Tree.
(3) B-trees are balanced search trees designed to work well on magnetic disks or other
direct-access secondary storage devices.
(4) B-trees are similar to red-black trees, but they are better at minimizing disk I/O operations.
Many database systems use B-trees to store information.
B-Trees are a variation on binary search trees that allow quick searching in files on disk. Instead of
storing one key and having two children, B-tree nodes have n keys and n+1 children, where n can be
large.
1. Every node x has the following fields:
o n[x], the number of keys currently stored in node x. For example, n[|40|50|] in the
above example B-tree is 2, and n[|70|80|90|] is 3.
o The n[x] keys themselves, stored in nondecreasing order: key1[x] <= key2[x] <= ... <=
keyn[x][x]. For example, the keys in |70|80|90| are ordered.
o leaf[x], a boolean value that is True if x is a leaf and False if x is an internal node.
2. If x is an internal node, it also contains n[x]+1 pointers c1[x], c2[x], ..., cn[x]+1[x] to its
children. For example, in the above B-tree, the root node has two keys, thus three
children. Leaf nodes have no children, so their ci fields are undefined.
3. The keys keyi[x] separate the ranges of keys stored in each subtree: if ki is any key stored
in the subtree with root ci[x], then
k1 <= key1[x] <= k2 <= key2[x] <= ... <= keyn[x][x] <= kn[x]+1.
For example, everything in the far left subtree of the root is numbered less than 30.
Everything in the middle subtree is between 30 and 60, while everything in the far right
subtree is greater than 60. The same property can be seen at each level for all keys in
non-leaf nodes.
4. Every leaf has the same depth, which is the tree's height h. In the above example, h=2.
5. There are lower and upper bounds on the number of keys a node can contain. These
bounds can be expressed in terms of a fixed integer t >= 2 called the minimum degree of
the B-tree:
o Every node other than the root must have at least t-1 keys. Every internal node
other than the root thus has at least t children. If the tree is nonempty, the root
must have at least one key.
o Every node can contain at most 2t-1 keys. Therefore, an internal node can have at
most 2t children. We say that a node is full if it contains exactly 2t-1 keys.
Searching a B-tree
Searching a B-tree is much like searching a binary search tree, only the binary decision
whether to go "left" or "right" is replaced by the decision among children 1, 2, ..., n[x]+1.
The following procedure, B-Tree-Search, should be called with the root node as its first parameter. It
returns the block where the key k was found along with the index of the key in the block, or "null" if the
key was not found:
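The pseudocode for the search is not reproduced in these notes, so here is a minimal Python sketch (node fields are modeled as a dict with 0-indexed lists; on a disk-resident tree a Disk-Read would precede each descent):

```python
def btree_search(x, k):
    """Return (node, index) locating key k, or None if k is absent."""
    i = 0
    while i < x['n'] and k > x['key'][i]:
        i += 1                       # scan past keys smaller than k
    if i < x['n'] and k == x['key'][i]:
        return (x, i)                # found k in this node
    if x['leaf']:
        return None                  # nowhere left to descend
    # Disk-Read(ci[x]) would happen here on a disk-resident tree
    return btree_search(x['c'][i], k)
```

For instance, on a tree with root |30|60| and leaf children |10|20|, |40|50|, |70|80|90| (the keys used in the examples above), searching for 50 descends into the middle child and returns it with index 1.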
B-Tree-Create (T)
x = allocate-node ();
leaf[x] = True
n[x] = 0
Disk-Write (x)
root[T] = x
This assumes there is an allocate-node function that returns a node with key, c, leaf fields, etc.,
and that each node has a unique "address" on the disk.
Clearly, the running time of B-Tree-Create is O(1), dominated by the time it takes to write the
node to disk.
Inserting into a B-tree is a bit more complicated than inserting into an ordinary binary search
tree. We have to find a place to put the new key. We would prefer to put it in the root, since that
is kept in RAM and so we don't have to do any disk accesses. If that node is not full (i.e., n[x] for
that node is not 2t-1), then we can just stick the new key in, shift around some pointers and keys,
write the results back to disk, and we're done. Otherwise, we will have to split the root and do
something with the resulting pair of nodes, maintaining the properties of the definition of a B-
tree.
Here is the general algorithm for inserting a key k into a B-tree T. It calls two other
procedures, B-Tree-Split-Child, that splits a node, and B-Tree-Insert-Nonfull, that
handles inserting into a node that isn't full.
B-Tree-Insert (T, k)
r = root[T]
if n[r] = 2t - 1 then
// uh-oh, the root is full, we have to split it
s = allocate-node ()
root[T] = s // new root node
leaf[s] = False // will have some children
n[s] = 0 // for now
c1[s] = r // child is the old root node
B-Tree-Split-Child (s, 1, r) // r is split
B-Tree-Insert-Nonfull (s, k) // s is clearly not full
else
B-Tree-Insert-Nonfull (r, k)
endif
Let's look at the non-full case first: this procedure is called by B-Tree-Insert to insert a key
into a node that isn't full. In a B-tree with a large minimum degree, this is the common case.
Before looking at the pseudocode, let's look at a more English explanation of what's going to
happen:
To insert the key k into the node x, there are two cases:
x is a leaf node. Then we find where k belongs in the array of keys, shift everything greater
than k one slot to the right, and stick k in there.
x is not a leaf node. We can't just stick k into x: new keys always go into leaves, and new
nodes are created only when we split, so the tree stays balanced. Instead we find
the child of x where k belongs and (recursively) insert k there. We read that child in from disk. If that
child is full, we split it first and figure out which of the two halves k belongs in. Then we recursively
insert k into that child (which we know is non-full, because if it were full, we would have just split it).
B-Tree-Insert-Nonfull (x, k)
 i = n[x]
 if leaf[x] then
  // shift keys greater than k one slot right, then drop k in
  while i >= 1 and k < keyi[x] do keyi+1[x] = keyi[x]; i--
  keyi+1[x] = k
  n[x]++
  Disk-Write (x)
 else
  // find child where new key belongs:
  while i >= 1 and k < keyi[x] do i--
  i++
  Disk-Read (ci[x])
  if n[ci[x]] = 2t - 1 then
   B-Tree-Split-Child (x, i, ci[x])
   if k > keyi[x] then i++
  end if
  B-Tree-Insert-Nonfull (ci[x], k)
 end if
Now let's see how to split a node. When we split a node, we always do it with respect to its
parent; two new nodes appear and the parent has one more child than it did before. Again, let's
see some English before we have to look at the pseudocode:
We will split a node y that is the ith child of its parent x. Node x will end up having one more
child we'll call z, and we'll make room for it in the ci[x] array right next to y.
We know y is full, so it has 2t-1 keys. We'll "cut" y in half, copying keyt+1[y] through key2t-1[y]
into the first t-1 keys of this new node z.
If the node isn't a leaf, we'll also have to copy over the child pointers ct+1[y] through c2t[y] (one
more child than keys) into the first t children of z.
Then we have to shift the keys and children of x over one position, starting at index i+1, to
accommodate the new node z, and then update the n[] counts on x, y and z, finally writing them to disk.
B-Tree-Split-Child (x, i, y)
 z = allocate-node ()
 leaf[z] = leaf[y]
 n[z] = t - 1
 // copy the top t-1 keys of y into z
 for j in 1..t-1 do
  keyj[z] = keyj+t[y]
 end for
 // if y is internal, also copy its top t children into z
 if not leaf[y] then
  for j in 1..t do
   cj[z] = cj+t[y]
  end for
 end if
 // having "chopped off" the right half of y, it now has t-1 keys
 n[y] = t - 1
 // shift everything in x over from i+1, then stick the new child in x;
 // y will be half its former self as ci[x] and z will
 // be the other half as ci+1[x]
 for j in n[x]+1 downto i+1 do
  cj+1[x] = cj[x]
 end for
 ci+1[x] = z
 // shift the keys of x from i up to accommodate the new key we're
 // bringing in from the middle of y (since (t-1) + (t-1) = 2t-2, the
 // missing key is the median, keyt[y], and it's coming into x)
 for j in n[x] downto i do
  keyj+1[x] = keyj[x]
 end for
 keyi[x] = keyt[y]
 n[x]++
 Disk-Write (y); Disk-Write (z); Disk-Write (x)
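The three procedures translate into runnable Python as follows (a sketch using 0-indexed lists in place of the 1-indexed keyi[x] and ci[x] fields, and omitting Disk-Read/Disk-Write since everything is in memory):

```python
import bisect

class BTreeNode:
    def __init__(self, leaf=True):
        self.keys = []          # at most 2t-1 keys, kept sorted
        self.c = []             # len(keys)+1 children when internal
        self.leaf = leaf

def split_child(x, i, t):
    """Split the full child x.c[i] in half, moving its median key into x."""
    y = x.c[i]
    z = BTreeNode(leaf=y.leaf)
    z.keys = y.keys[t:]                 # top t-1 keys go to z
    if not y.leaf:
        z.c, y.c = y.c[t:], y.c[:t]     # top t children go to z
    median = y.keys[t - 1]
    y.keys = y.keys[:t - 1]             # y keeps the bottom t-1 keys
    x.c.insert(i + 1, z)
    x.keys.insert(i, median)

def insert_nonfull(x, k, t):
    if x.leaf:
        bisect.insort(x.keys, k)        # shift-and-place in one call
    else:
        i = bisect.bisect_right(x.keys, k)   # child where k belongs
        if len(x.c[i].keys) == 2 * t - 1:
            split_child(x, i, t)
            if k > x.keys[i]:
                i += 1                  # k belongs in the right half
        insert_nonfull(x.c[i], k, t)

def insert(root, k, t):
    """Insert k into the tree; returns the (possibly new) root."""
    if len(root.keys) == 2 * t - 1:     # full root: split it first
        s = BTreeNode(leaf=False)
        s.c = [root]
        split_child(s, 0, t)
        insert_nonfull(s, k, t)
        return s
    insert_nonfull(root, k, t)
    return root

def inorder(x):
    if x.leaf:
        return list(x.keys)
    out = []
    for i, key in enumerate(x.keys):
        out += inorder(x.c[i]) + [key]
    return out + inorder(x.c[-1])
```

With t = 2, inserting the sequence 5 9 3 7 1 2 8 6 0 4 used in the worked example leaves a root holding the single key 5, with children |2| and |8|, matching the final tree shown in the example.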
Example of Insertion
Let's look at an example of inserting into a B-tree. For preservation of sanity, let t = 2. So a node
is full if it has 2(2)-1 = 3 keys in it, and each node can have up to 4 children. We'll insert the
sequence 5 9 3 7 1 2 8 6 0 4 into the tree:
Step 1: Insert 5
 ___
|_5_|
Step 2: Insert 9
B-Tree-Insert simply calls B-Tree-Insert-Nonfull, putting 9 to the
right of 5:
 _______
|_5_|_9_|
Step 3: Insert 3
Again, B-Tree-Insert-Nonfull is called:
 ___________
|_3_|_5_|_9_|
Step 4: Insert 7
The root is full. We allocate a new (empty) node, make it the root, split
the former root, pull 5 into the new root, and then 7 goes in with 9:
      ___
     |_5_|
    /     \
 ___       _______
|_3_|     |_7_|_9_|
Step 5: Insert 1
It goes in with 3:
        ___
       |_5_|
      /     \
 _______     _______
|_1_|_3_|   |_7_|_9_|
Step 6: Insert 2
It goes in with 1 and 3:
            ___
           |_5_|
          /     \
 ___________     _______
|_1_|_2_|_3_|   |_7_|_9_|
Step 7: Insert 8
It goes in with 7 and 9:
            ___
           |_5_|
          /     \
 ___________     ___________
|_1_|_2_|_3_|   |_7_|_8_|_9_|
Step 8: Insert 6
It would go in with |7|8|9|, but that node is full. So we split it,
bringing its middle key 8 into the root; 6 then goes in with 7:
          _______
         |_5_|_8_|
        /    |    \
 ___________   _______   ___
|_1_|_2_|_3_| |_6_|_7_| |_9_|
Step 9: Insert 0
0 would go in with |1|2|3|, which is full, so we split it, sending the
middle key 2 up to the root; 0 then goes in with 1:
        ___________
       |_2_|_5_|_8_|
      /    |    |    \
 _______  ___  _______  ___
|_0_|_1_||_3_||_6_|_7_||_9_|
Step 10: Insert 4
The root |2|5|8| is itself full, so we split it before descending: 5
becomes the key of a new root; 4 then goes down and in with 3:
              ___
             |_5_|
            /     \
        ___         ___
       |_2_|       |_8_|
      /     \     /     \
 _______  _______  _______  ___
|_0_|_1_||_3_|_4_||_6_|_7_||_9_|
The Knuth-Morris-Pratt (KMP) Algorithm
Motivation for KMP algorithm:
The KMP algorithm is designed around the prefix function next(q); computing next(q) is the
preprocessing step, and it takes O(m) time. The next(q) function is computed by a
procedure called COMPUTE-PREFIX-FUNCTION(P).
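A Python sketch of COMPUTE-PREFIX-FUNCTION (0-indexed; next(q) is often written π(q), the length of the longest proper prefix of P that is also a suffix of P[0..q]):

```python
def compute_prefix_function(P):
    """pi[q] = length of the longest proper prefix of P that is also a
    suffix of P[:q+1]. Runs in O(m) amortized time."""
    m = len(P)
    pi = [0] * m
    k = 0                       # length of the currently matched prefix
    for q in range(1, m):
        while k > 0 and P[k] != P[q]:
            k = pi[k - 1]       # fall back to the next shorter border
        if P[k] == P[q]:
            k += 1
        pi[q] = k
    return pi
```

For example, compute_prefix_function("ababaca") returns [0, 0, 1, 2, 3, 0, 1]: after matching "ababa", the longest prefix of the pattern that is also a suffix has length 3.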
The Floyd-Warshall Algorithm (All-Pairs Shortest Paths)
Rough Idea:
(1) Initialize the solution matrix same as the input graph matrix as a first step.
(2) We update the solution matrix by considering each vertex as an intermediate vertex: pick
the vertices one by one and update every shortest path that includes the picked vertex as an
intermediate vertex.
(3) When we pick vertex number k as an intermediate vertex, we have already considered vertices {1,
2, ..., k-1} as intermediate vertices. For every pair (i, j) of source and destination
vertices respectively, there are two possible cases.
(a) k is not an intermediate vertex in the shortest path from i to j. We keep the value of
d[i][j] as it is.
(b) k is an intermediate vertex in the shortest path from i to j. We update the value of d[i][j]
to d[i][k] + d[k][j] if d[i][j] > d[i][k] + d[k][j].
The following figure shows the above optimal substructure property in the all-pairs shortest path
problem.
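This rough idea (the Floyd-Warshall algorithm for all-pairs shortest paths) can be sketched directly in Python; the vertex numbering here is 0-based and INF marks a missing edge (both are conventions of this sketch, not of the notes):

```python
INF = float('inf')

def floyd_warshall(w):
    """All-pairs shortest paths. w[i][j] is the edge weight (INF if no
    edge, 0 on the diagonal). Returns the distance matrix d."""
    n = len(w)
    d = [row[:] for row in w]           # step (1): copy the input matrix
    for k in range(n):                  # steps (2)-(3): intermediate vertex k
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:   # case (b): go through k
                    d[i][j] = d[i][k] + d[k][j]
    return d
```

On the 4-vertex graph with edges 0→1 (weight 5), 1→2 (3), 2→3 (1) and 0→3 (10), the direct edge 0→3 of weight 10 is beaten by the path 0→1→2→3 of weight 9.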
Topological sort
Depth-first search can be used to perform a topological sort of a Directed Acyclic
Graph (or DAG). A topological sort of a DAG G=(V, E) is a linear ordering of all its
vertices such that if G contains edge (u, v), then u appears before v in the ordering.
If the graph is cyclic, then no linear ordering is possible.
A topological sort of a graph can be viewed as an ordering of its vertices along a
horizontal line so that all directed edges go from left to right.
Depth-first Search (DFS)
Explore edges out of the most recently discovered vertex v.
When all edges of v have been explored, backtrack to explore other edges leaving the
vertex from which v was discovered (its predecessor).
“Search as deep as possible first.”
Continue until all vertices reachable from the original source are discovered.
If any undiscovered vertices remain, then one of them is chosen as a new source and the
search is repeated from that source.
TOPOLOGICAL-SORT(G)
1. call DFS(G) to compute finishing times f[v] for each vertex v.
2. as each vertex is finished, insert it onto the front of a linked list
3. return the linked list of vertices
Topological sorting for the example: the vertices are arranged from left to right in order of
decreasing finishing time. The topologically sorted order for the shown graph is w, z, u, v, y, x.
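TOPOLOGICAL-SORT can be sketched in Python as follows; since the figure with the example graph is not reproduced in these notes, the usage below runs on a small hypothetical DAG over the same vertex names and checks the defining property (every edge goes from left to right in the output):

```python
def topological_sort(vertices, adj):
    """DFS-based topological sort. Each vertex is appended when it
    finishes (all outgoing edges explored), so reversing the finish
    list gives vertices in order of decreasing finishing time."""
    finished, visited = [], set()
    def dfs(u):
        visited.add(u)
        for v in adj.get(u, []):
            if v not in visited:
                dfs(v)
        finished.append(u)              # u is finished
    for u in vertices:
        if u not in visited:
            dfs(u)                      # restart from a new source
    return finished[::-1]

# hypothetical DAG on the vertices named in the notes
adj = {'w': ['u', 'x'], 'u': ['v'], 'v': ['y'], 'y': ['x']}
order = topological_sort(['u', 'v', 'w', 'x', 'y', 'z'], adj)
```

Appending at finish time and reversing is equivalent to the pseudocode's "insert onto the front of a linked list", but avoids a linked-list structure.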