Advanced Business Data Structures
TABLE OF CONTENTS

FOUNDATIONS TO DATA STRUCTURES
  Basic Definitions
  Structural and Behavioral Definitions
  Abstract Data Types (ADT)
  Categories of data types
  Structural Relationships
  Why study Data structures
INTRODUCTION TO DESIGN AND ALGORITHM ANALYSIS
  The Classic Multiplication Algorithm
  Algorithm's Performance
  Θ-Notation (Same order)
  Ο-Notation (Upper Bound)
  Ω-Notation (Lower Bound)
  Algorithm Analysis
  Optimality
  Reduction
MATHEMATICS FOR ALGORITHMICS
  Sets
    Union of Sets
    Symmetric difference
    Sequences
  Linear Inequalities and Linear Equations
    Inequalities
    Fundamental Properties of Inequalities
    Solution of Inequality
    Geometric Interpretation of Inequalities
    One Unknown
    Two Unknowns
    n Equations in n Unknowns
    Solution of a Triangular System
    Back Substitution Method
    Gaussian Elimination
    Second Part
    Determinants and systems of linear equations
ALGORITHMS DESIGN & ANALYSIS TECHNIQUES: Greedy algorithm
  Greedy Approach
  Characteristics and Features of Problems solved by Greedy Algorithms
  Definitions of feasibility
  1. An Activity Selection Problem
    Problem Statement
    Greedy Algorithm for Selection Problem
  2. Minimum Spanning Tree
  3. Kruskal's Algorithm
  4. Prim's Algorithm
  5. Dijkstra's Algorithm
    Analysis
    Example: Step by Step operation of Dijkstra algorithm
ALGORITHMS DESIGN & ANALYSIS TECHNIQUES: Divide & Conquer Algorithm
  Binary Search (simplest application of divide-and-conquer)
  Sequential Search
    Analysis
  Binary Search
    Analysis
  Iterative Version of Binary Search
    Analysis
ALGORITHMS DESIGN & ANALYSIS TECHNIQUES: Dynamic Programming Algorithm
  The Principle of Optimality
  1. Matrix-chain Multiplication Problem
  2. 0-1 Knapsack Problem
  3. Knapsack Problem
ALGORITHMS DESIGN & ANALYSIS TECHNIQUES: Amortized Analysis
  1. Aggregate Method
    Aggregate Method Characteristics
  2. Accounting Method
  3. Potential Method
GRAPH ALGORITHMS
  Introduction to Graphs
  Definitions
    Graphs, vertices and edges
    Undirected and directed graphs
    Neighbours and adjacency
    An example
    Mathematical definition
  Digraph
  1. Transpose
  2. Square
  3. Incidence Matrix
  Types of Graph Algorithms
  1. Breadth-First Search Traversal Algorithm
  2. Depth-First Search
  3. Strongly Connected Components
  4. Euler Tour
    Running Time of Euler Tour
AUTOMATA THEORY
  What is Automata Theory?
  The Central Concepts of Automata Theory
  Languages
  Structural expressions
  Proofs
    Terminology
    Hints for Finding Proofs
  Proving techniques
    By contradiction
    By induction
    Proof by Induction: Example
    Proof by Construction
  "If-and-Only-If" statements
REFERENCES
FOUNDATIONS TO DATA STRUCTURES
Basic Definitions
Data structures
This is the study of methods of representing objects, the design of algorithms to manipulate the
representations, the proper encapsulation of objects in a reusable form, and the evaluation of the
cost of the implementation, including the measurement of the complexity of the time and space
requirements.
Abstraction
This is the separation between what a data structure represents and what an algorithm
accomplishes from the implementation details of how things are actually carried out, i.e., hiding
the unnecessary details.
Data Abstraction
Hiding of the representational details
Data Types
A data type consists of: a domain (= a set of values) and a set of operations; the kind of data
variables may “hold”.
Example:
The data type fraction. How can we specify the domain and operations that define fractions? It
seems straightforward to name the operations; fractions are numbers so all the normal arithmetic
operations apply, such as addition, multiplication, and comparison. In addition there might be
some fraction-specific operations such as normalizing a fraction by removing common terms
from its numerator and denominator - for example, if we normalized 6/9 we'd get 2/3.
But how do we specify the domain for fractions, i.e. the set of possible values for a fraction?
An alternative approach to defining the set of values for fractions imposes no internal
structure on them. Instead it just adds an operation that creates fractions out of other things, such
as
CREATE_FRACTION(N,D)
The values of type fraction are defined to be the values that are produced by this function for any
valid combination of inputs.
But what stops CREATE_FRACTION from being just any function? The answer is that we have
to constrain its behavior by relating it to the other operations on fractions. For example, one of
the key properties of fraction multiplication is that:
CREATE_FRACTION(a, b) * CREATE_FRACTION(c, d) = CREATE_FRACTION(a * c, b * d)
So you see CREATE_FRACTION cannot be any old function; its behavior is highly constrained,
because we can write down lots and lots of constraints like this.
And that is the reason we call this sort of definition behavioral: the definition is strictly in
terms of a set of operations and constraints, or axioms, relating the behavior of the operations to
one another.
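As an illustration, the behavioral constraint above can be sketched in Python. The function name create_fraction and the normalized-pair representation are illustrative choices, not part of the text:

```python
from math import gcd

def create_fraction(n, d):
    """A hypothetical CREATE_FRACTION: represent a fraction as a
    normalized (numerator, denominator) pair."""
    if d == 0:
        raise ValueError("denominator must be nonzero")
    g = gcd(n, d)
    if d < 0:            # keep the sign in the numerator
        n, d = -n, -d
    return (n // g, d // g)

def multiply(a, b):
    # The usual rule: (n1/d1) * (n2/d2) = (n1*n2)/(d1*d2)
    return create_fraction(a[0] * b[0], a[1] * b[1])

# One axiom constraining CREATE_FRACTION's behavior:
# multiply(create_fraction(a, b), create_fraction(c, d)) == create_fraction(a*c, b*d)
assert multiply(create_fraction(6, 9), create_fraction(1, 2)) == create_fraction(6, 18)
assert create_fraction(6, 9) == (2, 3)   # normalization: 6/9 -> 2/3
```

Because the representation is hidden behind create_fraction and multiply, any other representation satisfying the same axioms would serve equally well, which is exactly the point of a behavioral definition.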
Categories of data types
o Atomic/Basic data types
o Structured data types
o Abstract data types
Atomic/Basic data types are data types that are defined without imposing any structure on their
values.
Example
o Boolean
o Integer
o Character
o Double
They are used to implement structured data types.
Think of an array as a structured type, with each position in the array being a component; then
there is a structural relationship of `followed by': we say that component N is followed by
component N+1.
Structural Relationships
Many structured data types do have an internal structural relationship, and these can be
classified according to the properties of this relationship.
Linear Structure:
The most common organization for components is a linear structure. A structure is linear if it has
these 2 properties:
Property P1: Each element is `followed by' at most one other element.
Property P2: No two elements are `followed by' the same element.
An array is an example of a linearly structured data type. We generally write a linearly structured
data type like this: A->B->C->D (this is one value with 4 parts).
Counter example 1 (violates P1): A points to B and C B<-A->C
Counter example 2 (violates P2): A and B both point to C A->C<-B
Tree Structure
In a tree structure, an element can point to more than one other element, but no two elements can
point to the same element. That is:
Dropping Constraint P1: If we drop the first constraint and keep the second we get a tree
structure or hierarchy: no two elements are followed by the same element. This is a very
common structure too, and extremely useful.
A is followed by B, C, and D; B by E and F; and C by G. We are not allowed to add any more
arcs that point to any of these nodes (except possibly A; see cyclic structures below).
Graph Structure
A graph is a non-linear structure in which a component may have more than one predecessor and
more than one successor.
If we drop both constraints, we get a graph. In a graph, there are no constraints on the relations
we can define.
Cyclic Structures:
All the examples we've seen are acyclic. This means that there is no sequence of arrows that
leads back to where it started. Linear structures are usually acyclic, but cyclic ones are not
uncommon.
Example of a cyclic linear structure: A->B->C->D->A
Graphs are often cyclic, although the special properties of acyclic graphs make them an
important topic of study.
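The two properties P1 and P2 can be checked mechanically. In this minimal Python sketch, each structure is a hypothetical successor map (the node names mirror the examples above):

```python
# Each structure maps an element to the elements it is 'followed by'.
linear = {"A": ["B"], "B": ["C"], "C": ["D"], "D": []}            # A->B->C->D
tree = {"A": ["B", "C", "D"], "B": ["E", "F"], "C": ["G"],
        "D": [], "E": [], "F": [], "G": []}                       # P1 dropped
graph = {"A": ["B", "C"], "B": ["C"], "C": []}                    # both dropped

def satisfies_p1(structure):
    """P1: each element is followed by at most one other element."""
    return all(len(succ) <= 1 for succ in structure.values())

def satisfies_p2(structure):
    """P2: no two elements are followed by the same element."""
    followers = [s for succ in structure.values() for s in succ]
    return len(followers) == len(set(followers))

assert satisfies_p1(linear) and satisfies_p2(linear)      # linear: both hold
assert not satisfies_p1(tree) and satisfies_p2(tree)      # tree: only P2 holds
assert not satisfies_p1(graph) and not satisfies_p2(graph)  # graph: neither holds
```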
The study of data structures exposes designers and students to a vast collection of tried and
proven methods used for designing efficient programs.
INTRODUCTION TO DESIGN AND ALGORITHM ANALYSIS
An algorithm, named after the ninth-century scholar Abu Ja'far Muhammad ibn Musa al-
Khwarizmi, is defined as follows:
An algorithm is a set of rules for carrying out calculations either by hand or on a machine.
An algorithm is a finite step-by-step procedure to achieve a required result.
An algorithm is a sequence of computational steps that transform the input into the
output.
An algorithm is a sequence of operations performed on data that have to be organized in
data structures.
An algorithm is an abstraction of a program to be executed on a physical machine (model
of Computation).
The most famous algorithm in history dates from well before the time of the ancient Greeks: this is
Euclid's algorithm for calculating the greatest common divisor of two integers. It
appeared as the solution to Proposition 2 in Book VII of Euclid's "Elements." Euclid's
"Elements" consists of thirteen books, which contain a total of 465 propositions.
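Euclid's algorithm is short enough to sketch directly. This Python version uses the modern remainder formulation rather than Euclid's original repeated subtraction:

```python
def euclid_gcd(a, b):
    """Euclid's algorithm: repeatedly replace (a, b) by (b, a mod b)
    until the remainder is zero; the last nonzero value is the gcd."""
    while b != 0:
        a, b = b, a % b
    return a

assert euclid_gcd(48, 18) == 6
assert euclid_gcd(465, 13) == 1   # 465 and 13 are coprime
```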
Algorithmics is a branch of computer science that consists of designing and analyzing computer
algorithms.
1. The "design" pertains to:
i. The description of the algorithm at an abstract level by means of a pseudo-language, and
ii. A proof of correctness, that is, that the algorithm solves the given problem in all cases.
2. The "analysis" deals with performance evaluation (complexity analysis).
We start with defining the model of computation, which is usually the Random Access Machine
(RAM) model, but other models of computation can be used, such as PRAM. Once the model of
computation has been defined, an algorithm can be described using a simple language (or pseudo-
language) whose syntax is close to a programming language such as C or Java.
Algorithm's Performance
Two important ways to characterize the effectiveness of an algorithm are its space complexity
and time complexity. Time complexity of an algorithm concerns determining an expression of
the number of steps needed as a function of the problem size. Since the step count measure is
somewhat coarse, one does not aim at obtaining an exact step count. Instead, one attempts only
to get asymptotic bounds on the step count. Asymptotic analysis makes use of the O (Big Oh)
notation. Two other notational constructs used by computer scientists in the analysis of
algorithms are Θ (Big Theta) notation and Ω (Big Omega) notation.
The performance evaluation of an algorithm is obtained by totaling the number of occurrences of
each operation when running the algorithm. The performance of an algorithm is evaluated as a
function of the input size n and is to be considered modulo a multiplicative constant.
The following notations are commonly used in performance analysis to characterize the
complexity of an algorithm.
Θ-Notation (Same order)
For a given function g(n), Θ(g(n)) denotes the set of functions
Θ(g(n)) = {f(n) : there exist positive constants c1, c2, and n0 such that
0 ≤ c1 g(n) ≤ f(n) ≤ c2 g(n) for all n ≥ n0}.
Graphically, for all values of n to the right of n0, the value of f(n) lies at or above c1 g(n) and at or
below c2 g(n). In other words, for all n ≥ n0, the function f(n) is equal to g(n) to within a constant
factor. We say that g(n) is an asymptotically tight bound for f(n).
In set terminology, f(n) is said to be a member of the set Θ(g(n)) of functions. In other words,
because Θ(g(n)) is a set, we could write
f(n) ∈ Θ(g(n))
to indicate that f(n) is a member of Θ(g(n)). Instead, we write
f(n) = Θ(g(n))
to express the same notion.
Historically, the notation is "f(n) = Θ(g(n))", although the idea that f(n) is equal to something
called Θ(g(n)) is misleading.
Example: n²/2 − 2n = Θ(n²), with c1 = 1/4, c2 = 1/2, and n0 = 8.
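The constants in this example can be checked numerically. The sketch below verifies the two bounds over a finite range of n; this is a spot check of the claimed constants, not a proof:

```python
# f(n) = n^2/2 - 2n should lie between c1*n^2 and c2*n^2 for all n >= n0.
c1, c2, n0 = 1/4, 1/2, 8

def f(n):
    return n * n / 2 - 2 * n

# At n0 = 8 the lower bound is tight: f(8) = 16 = (1/4) * 64.
assert f(8) == c1 * 8 * 8
assert all(c1 * n * n <= f(n) <= c2 * n * n for n in range(n0, 10_000))
```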
Ο-Notation (Upper Bound)
For a given function g(n), Ο(g(n)) denotes the set of functions
Ο(g(n)) = {f(n) : there exist positive constants c and n0 such that
0 ≤ f(n) ≤ c g(n) for all n ≥ n0}.
Graphically, for all values of n to the right of n0, the value of the function f(n) is on or below
c g(n). We write f(n) = O(g(n)) to indicate that a function f(n) is a member of the set Ο(g(n)), i.e.
f(n) ∈ Ο(g(n))
Note that f(n) = Θ(g(n)) implies f(n) = Ο(g(n)), since Θ-notation is a stronger notation than Ο-
notation.
Example: 2n² = Ο(n³), with c = 1 and n0 = 2.
Historical Note: The Ο-notation was introduced in 1892 by the German mathematician Paul
Bachmann.
Ω-Notation (Lower Bound)
For a given function g(n), Ω(g(n)) denotes the set of functions
Ω(g(n)) = {f(n) : there exist positive constants c and n0 such that
0 ≤ c g(n) ≤ f(n) for all n ≥ n0}.
We write f(n) = Ω(g(n)) to indicate that f(n) is bounded below by g(n) to within a constant
factor; Ω-notation gives an asymptotic lower bound.
Algorithm Analysis
The complexity of an algorithm is a function g(n) that gives the upper bound of the number of
operations (or running time) performed by an algorithm when the input size is n.
There are two interpretations of upper bound.
Worst-case Complexity
The running time for any input of a given size will not exceed this bound; for some
inputs of that size the maximum may actually be reached.
Average-case Complexity
The running time for a given input size is the average number of operations over
all problem instances of that size.
Because it is quite difficult to estimate the statistical behavior of the input, most of the time we
content ourselves with the worst-case behavior. Most of the time, the complexity g(n) is
approximated by a family Ο(f(n)), where f(n) is one of the following functions: n (linear
complexity), log n (logarithmic complexity), nᵃ where a ≥ 2 (polynomial complexity), aⁿ
(exponential complexity).
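The gap between worst-case and average-case counts can be made concrete by counting operations directly. The sequential-search example below is an assumed illustration, not taken from this section:

```python
def comparisons_to_find(lst, target):
    """Number of comparisons sequential search makes before finding target."""
    for i, x in enumerate(lst):
        if x == target:
            return i + 1
    return len(lst)

data = list(range(100))

# Worst case: the target is the last element, so all n comparisons are made.
worst = comparisons_to_find(data, 99)

# Average case over all successful searches: (1 + 2 + ... + n)/n = (n + 1)/2.
average = sum(comparisons_to_find(data, t) for t in data) / len(data)

assert worst == 100
assert average == 50.5
```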
Optimality
Once the complexity of an algorithm has been estimated, the question arises whether this
algorithm is optimal. An algorithm for a given problem is optimal if its complexity reaches the
lower bound over all the algorithms solving this problem. For example, any algorithm solving
"the intersection of n segments" problem will execute at least n² operations in the worst case,
even if it does nothing but print the output. This is abbreviated by saying that the problem has
Ω(n²) complexity. If one finds an Ο(n²) algorithm that solves this problem, it will be optimal and
of complexity Θ(n²).
Reduction
Another technique for estimating the complexity of a problem is the transformation of problems,
also called problem reduction. As an example, suppose we know a lower bound for a problem A,
and that we would like to estimate a lower bound for a problem B. If we can transform A into B
by a transformation step whose cost is less than that for solving A, then B has the same bound as
A.
The convex hull problem nicely illustrates the "reduction" technique. A lower bound for the
convex hull problem is established by reducing the sorting problem (complexity Θ(n log n)) to
the convex hull problem.
MATHEMATICS FOR ALGORITHMICS
Sets
A set is a collection of different things (distinguishable or distinct objects) represented as
a unit. The objects in a set are called its elements or members. If an object x is a member of a set
S, we write x ∈ S. On the other hand, if x is not a member of S, we write x ∉ S. A set cannot
contain the same object more than once, and its elements are not ordered.
For example, consider the set S = {7, 21, 57}. Then 7 ∈ {7, 21, 57} and 8 ∉ {7, 21, 57} or,
equivalently, 7 ∈ S and 8 ∉ S.
We can also describe a set containing elements according to some rule. We write
{n : rule about n}
Thus, {n : n = m² for some m ∈ ℕ} denotes the set of perfect squares.
Set Cardinality
The number of elements in a set is called the cardinality or size of the set, denoted |S| or sometimes
n(S). Two sets have the same cardinality if their elements can be put into a one-to-one
correspondence. It is easy to see that the cardinality of an empty set is zero, i.e., |∅| = 0.
Multiset
If we do want to take the number of occurrences of members into account, we call the group a
multiset.
For example, {7} and {7, 7} are identical as sets, but {7} and {7, 7} are different as multisets.
Infinite Set
A set that contains infinitely many elements. For example, the set of negative integers, the set of
integers, etc.
Empty Set
A set that contains no members, denoted ∅ or {}.
Subset
For two sets A and B, we say that A is a subset of B, written A ⊆ B, if every member of A also is
a member of B.
Formally, A ⊆ B if
x ∈ A implies x ∈ B,
written
x ∈ A => x ∈ B.
Proper Subset
Set A is a proper subset of B, written A ⊂ B, if A is a subset of B and not equal to B. That is,
A ⊆ B and A ≠ B.
Equal Sets
The sets A and B are equal, written A = B, if each is a subset of the other. Rephrasing the
definition: let A and B be sets. A = B if A ⊆ B and B ⊆ A.
Power Set
Let A be a set. The power set of A, written P(A) or 2^A, is the set of all subsets of A. That is,
P(A) = {B : B ⊆ A}.
For example, consider A = {0, 1}. The power set of A is {{}, {0}, {1}, {0, 1}}. By contrast, the
set of all pairs (2-tuples) whose elements are 0 and 1 is {(0, 0), (0, 1), (1, 0), (1, 1)}.
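A possible Python sketch of the power set, built with the standard itertools module; subsets are represented as frozensets so that they can themselves be set members:

```python
from itertools import combinations

def power_set(a):
    """All subsets of a, as frozensets. |P(A)| = 2^|A|."""
    elems = list(a)
    return {frozenset(c)
            for r in range(len(elems) + 1)
            for c in combinations(elems, r)}

# The example from the text: P({0, 1}) has four members.
assert power_set({0, 1}) == {frozenset(), frozenset({0}),
                             frozenset({1}), frozenset({0, 1})}
assert len(power_set({1, 2, 3})) == 2 ** 3
```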
Disjoint Sets
Let A and B be sets. A and B are disjoint if A ∩ B = ∅.
Union of Sets
The union of A and B, written A ∪ B, is the set we get by combining all elements of A and B into
a single set. That is,
A ∪ B = {x : x ∈ A or x ∈ B}.
For two finite sets A and B, we have the identity
|A ∪ B| = |A| + |B| − |A ∩ B|
We can conclude that
|A ∪ B| ≤ |A| + |B|
That is,
if |A ∩ B| = 0 then |A ∪ B| = |A| + |B|, and if A ⊆ B then |A| ≤ |B|.
Intersection of Sets
The intersection of sets A and B, written A ∩ B, is the set of elements that are both in A and
in B. That is,
A ∩ B = {x : x ∈ A and x ∈ B}.
Partition of a Set
A collection S = {Si} of nonempty sets forms a partition of a set S if
i. the sets are pairwise disjoint, that is, Si, Sj ∈ S and i ≠ j imply Si ∩ Sj = ∅, and
ii. their union is S, that is, S = ∪ Si.
Difference of Sets
Let A and B be sets. The difference of A and B is
A − B = {x : x ∈ A and x ∉ B}.
For example, let A = {1, 2, 3} and B = {2, 4, 6, 8}. The set difference A − B = {1, 3}, while
B − A = {4, 6, 8}.
Complement of a Set
All sets under consideration are subsets of some large set U called the universal set. Given a
universal set U, the complement of A, written A', is the set of all elements under consideration
that are not in A.
Formally, let A be a subset of the universal set U. The complement of A in U is
A' = U − A
OR
A' = {x : x ∈ U and x ∉ A}.
For any set A ⊆ U, we have the following laws:
i. A'' = A
ii. A ∩ A' = ∅
iii. A ∪ A' = U
Symmetric difference
Let A and B be sets. The symmetric difference of A and B is
A ⊕ B = (A ∪ B) − (A ∩ B)
As an example, consider the two sets A = {1, 2, 3} and B = {2, 4, 6, 8}. The symmetric
difference A ⊕ B = {1, 3, 4, 6, 8}.
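Python's built-in set type implements all of the operations above directly; a quick sketch using the running examples:

```python
A = {1, 2, 3}
B = {2, 4, 6, 8}

assert A | B == {1, 2, 3, 4, 6, 8}     # union
assert A & B == {2}                    # intersection
assert A - B == {1, 3}                 # difference
assert B - A == {4, 6, 8}
assert A ^ B == (A | B) - (A & B)      # symmetric difference
assert A ^ B == {1, 3, 4, 6, 8}

# Inclusion-exclusion identity: |A ∪ B| = |A| + |B| - |A ∩ B|
assert len(A | B) == len(A) + len(B) - len(A & B)
```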
Sequences
A sequence of objects is a list of objects in some order. For example, the sequence 7, 21, 57
would be written as (7, 21, 57). In a set the order does not matter, but in a sequence it does.
Hence, (7, 21, 57) ≠ (57, 7, 21), but {7, 21, 57} = {57, 7, 21}.
Repetition is not permitted in a set, but repetition is permitted in a sequence. So (7, 7, 21, 57) is
different from (7, 21, 57), but {7, 7, 21, 57} = {7, 21, 57}.
Tuples
Finite sequences often are called tuples. For example,
(7, 21) 2-tuple or pair
(7, 21, 57) 3-tuple
(7, 21, ..., k) k-tuple
An ordered pair of two elements a and b is denoted (a, b) and can be defined as (a, b) = {a, {a, b}}.
The Cartesian product of two sets A and B, written A × B, is the set of all ordered pairs whose
first element is a member of A and whose second element is a member of B.
For example, let A = {1, 2} and B = {x, y, z}. Then A × B = {(1, x), (1, y), (1, z), (2, x), (2, y),
(2, z)}.
n-tuples
The Cartesian product of n sets A1, A2, ..., An is the set of n-tuples
A1 × A2 × ... × An = {(a1, ..., an) : ai ∈ Ai, i = 1, 2, ..., n}
whose cardinality is
|A1 × A2 × ... × An| = |A1| · |A2| ⋯ |An|
if all sets are finite. We denote an n-fold Cartesian product over a single set A by the set
Aⁿ = A × A × ... × A
whose cardinality is
|Aⁿ| = |A|ⁿ if A is finite.
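A sketch of Cartesian products using the standard itertools.product; the sets are the ones from the examples above:

```python
from itertools import product

A = {1, 2}
B = {"x", "y", "z"}

# A x B: all ordered pairs with first element from A, second from B.
AxB = set(product(A, B))
assert AxB == {(1, "x"), (1, "y"), (1, "z"), (2, "x"), (2, "y"), (2, "z")}
assert len(AxB) == len(A) * len(B)        # |A x B| = |A| * |B|

# n-fold product over a single set: |A^n| = |A|^n.
A3 = set(product(A, repeat=3))
assert len(A3) == len(A) ** 3
```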
Linear Inequalities and Linear Equations
Inequalities
The term inequality is applied to any statement involving one of the symbols <, >, ≤, ≥.
Examples of inequalities are:
i. x ≤ 1
ii. x + y + 2z > 16
iii. p² + q² ≤ 1/2
iv. a² + ab > 1
Solution of an Inequality
By a solution of the one-variable inequality 2x + 3 ≤ 7 we mean any number which, substituted
for x, yields a true statement.
For example, 1 is a solution of 2x + 3 ≤ 7 since 2(1) + 3 = 5 and 5 is less than or equal to 7.
By a solution of the two-variable inequality x − y ≤ 5 we mean any ordered pair of numbers
which, when substituted for x and y, respectively, yields a true statement.
For example, (2, 1) is a solution of x − y ≤ 5 because 2 − 1 = 1 and 1 ≤ 5.
By a solution of the three-variable inequality 2x − y + z ≤ 3 we mean an ordered triple of
numbers which, when substituted for x, y and z respectively, yields a true statement.
A solution of an inequality is said to satisfy the inequality. For example, (2, 1) satisfies
x − y ≤ 5.
Two or more inequalities, each with the same variables, considered as a unit, are said to form a
system of inequalities. For example,
x ≥ 0
y ≥ 0
2x + y ≤ 4
Note that the notion of a solution of a system of inequalities is analogous to that of a solution of
a system of equations.
Any solution common to all of the inequalities of a system of inequalities is said to be a solution
of that system of inequalities. A system of inequalities, each of whose members is linear, is said
to be a system of linear inequalities.
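A tiny sketch for checking candidate solutions of the sample system; the directions of the inequality signs (x ≥ 0, y ≥ 0, 2x + y ≤ 4) are assumed here, since the symbols were lost in the source:

```python
def satisfies_system(x, y):
    """Check (x, y) against the sample system x >= 0, y >= 0, 2x + y <= 4.
    The inequality directions are an assumption, not from the text."""
    return x >= 0 and y >= 0 and 2 * x + y <= 4

assert satisfies_system(1, 1)        # 2*1 + 1 = 3 <= 4: a solution
assert not satisfies_system(3, 0)    # 2*3 + 0 = 6 > 4: not a solution
assert not satisfies_system(-1, 2)   # violates x >= 0
```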
Linear Equations
One Unknown
A linear equation in one unknown can always be stated into the standard form
ax = b
where x is an unknown and a and b are constants. If a is not equal to zero, this equation has a
unique solution
x = b/a
Two Unknowns
A linear equation in two unknowns, x and y, can be put into the form
ax + by = c
where x and y are the unknowns and a, b, c are real numbers, with a and b not both zero. Two such
equations considered together,
a1x + b1y = c1
a2x + b2y = c2
where a1, b1 and a2, b2 are not both zero, form a system. A pair of numbers which satisfies both equations is called a
simultaneous solution of the given equations or a solution of the system of equations.
1. If the system has exactly one solution, the graphs of the linear equations intersect in one
point.
2. If the system has no solutions, the graphs of the linear equations are parallel.
3. If the system has an infinite number of solutions, the graphs of the linear equations
coincide.
The special cases (2) and (3) can only occur when the coefficients of x and y in the two linear
equations are proportional.
ALGORITHM
Step 1 Multiply the two equations by two numbers which are such that
the resulting coefficients of one of the unknowns are negatives of
each other.
Step 2 Add the equations obtained in Step 1.
The output of this algorithm is a linear equation in one unknown. This equation may be solved
for that unknown, and the solution may be substituted in one of the original equations yielding
the value of the other unknown.
As an example, consider the following system
3x + 2y = 8 ------------ (1)
2x - 5y = -1 ------------ (2)
Step 1: Multiply equation (1) by 2 and equation (2) by -3
6x + 4y = 16
-6x + 15y = 3
Step 2: Add equations, output of Step 1
19y = 19
Thus, we obtain an equation involving only the unknown y. We solve for y to obtain
y=1
Next, we substitute y =1 in equation (1) to get
x=2
Therefore, x = 2 and y = 1 is the unique solution to the system.
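The two-step elimination above can be sketched in Python. This is a hedged sketch; the function name and the determinant shortcut (which is just the result of the scale-and-add steps done symbolically) are mine, not part of the text:

```python
def solve_2x2(a1, b1, c1, a2, b2, c2):
    """Solve a1*x + b1*y = c1 and a2*x + b2*y = c2 by elimination.

    Returns (x, y), or None when the lines are parallel or coincide
    (cases (2) and (3) above, where the coefficients are proportional).
    """
    det = a1 * b2 - a2 * b1   # zero exactly when the coefficients are proportional
    if det == 0:
        return None
    # Steps 1-2: scaling to cancel x and adding yields a one-unknown equation in y.
    y = (a1 * c2 - a2 * c1) / det
    # Substitute y back into one of the original equations.
    x = (c1 - b1 * y) / a1 if a1 != 0 else (c2 - b2 * y) / a2
    return x, y

# The worked example: 3x + 2y = 8 and 2x - 5y = -1
print(solve_2x2(3, 2, 8, 2, -5, -1))  # (2.0, 1.0)
```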
n Equations in n Unknowns
Now, consider a system of n linear equations in n unknowns
a11x1 + a12x2 + . . . + a1nxn = b1
a21x1 + a22x2 + . . . + a2nxn = b2
.........................
an1x1 + an2x2 + . . . + annxn = bn
where the aij and bi are real numbers. The number aij is called the coefficient of xj in the ith equation,
and the number bi is called the constant of the ith equation. A list of values for the unknowns,
x1 = k1, x2 = k2, . . . , xn = kn
or equivalently, a list of n numbers
u = (k1, k2, . . . , kn)
is called a solution of the system if, with kj substituted for xj, the left hand side of each equation
in fact equals the right hand side.
The above system is equivalent to the matrix equation Ax = b, where A = (aij) is the n × n matrix of
coefficients, x = (x1, . . . , xn) and b = (b1, . . . , bn). The matrix A is called the coefficient matrix of the
system of n linear equations in n unknowns.
Note for algorithmic nerds: we store a system in the computer as its augmented matrix.
Specifically, the system is stored in the computer as an N × (N+1) matrix array A, the augmented
matrix of the system. Therefore, the constants b1, b2, . . . , bn are
respectively stored as A1,N+1, A2,N+1, . . . , AN,N+1.
Solution of a Triangular System
If aij = 0 for i > j, then the system of n linear equations in n unknowns assumes the triangular form.
In that case |A| = a11a22 . . . ann; if none of the diagonal entries a11, a22, . . . , ann is zero, the system has
a unique solution.
1. First, we solve the last equation for the last unknown, xn:
xn = bn/ann
2. Second, we substitute the value of xn in the next-to-last equation and solve it for the next-to-
last unknown, xn-1:
xn-1 = (bn-1 - an-1,n xn) / an-1,n-1
3. Third, we substitute these values for xn and xn-1 in the third-from-last equation and solve it for
the third-from-last unknown, xn-2, and so on.
In general, we determine xk by substituting the previously obtained values of xn, xn-1, . . . , xk+1 in
the kth equation.
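The back-substitution procedure above can be sketched in Python. This is a sketch under the stated assumption that the diagonal entries are nonzero; the function name is mine:

```python
def back_substitute(a, b):
    """Solve an upper-triangular system (a[i][j] = 0 for i > j) by back substitution.

    `a` is an n-by-n upper-triangular coefficient matrix with nonzero diagonal,
    `b` is the list of constants. Works from the last equation up, as in steps 1-3.
    """
    n = len(b)
    x = [0.0] * n
    for k in range(n - 1, -1, -1):
        # Substitute the already-known x[k+1..n-1] into the kth equation.
        s = sum(a[k][j] * x[j] for j in range(k + 1, n))
        x[k] = (b[k] - s) / a[k][k]
    return x

# The triangular system from the Gaussian-elimination example:
# x - 3y - 2z = 6,  2y + 6z = 6,  6z = 12
print(back_substitute([[1, -3, -2], [0, 2, 6], [0, 0, 6]], [6, 6, 12]))  # [1.0, -3.0, 2.0]
```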
Gaussian Elimination
Gaussian elimination is a method for finding the solution of a system of linear equations.
The method consists of two parts.
1. The first part reduces the system, step by step, to triangular form.
2. The second part solves the triangular system by back substitution.
As an example, consider the system
x - 3y - 2z = 6 --- (1)
2x - 4y + 2z = 18 --- (2)
-3x + 8y + 9z = -9 --- (3)
First Part
Eliminate the first unknown, x, from equations (2) and (3).
(a) Multiply equation (1) by -2 and add it to equation (2). Equation (2) becomes
2y + 6z = 6
(b) Multiply equation (1) by 3 and add it to equation (3). Equation (3) becomes
-y + 3z = 9
And the original system is reduced to the system
x - 3y - 2z = 6
2y + 6z = 6
-y + 3z = 9
Now we eliminate the second unknown, y, from the new equation (3), using only the new
equations (2) and (3) above.
(a) Multiply equation (2) by 1/2 and add it to equation (3). Equation (3) becomes 6z = 12.
Therefore, our given system of three linear equations in 3 unknowns is reduced to the triangular
system
x - 3y - 2z = 6
2y + 6z = 6
6z = 12
Second Part
In the second part, we solve the equations by back substitution and get
x = 1, y = -3, z = 2
In the first stage of the algorithm, the coefficient of x in the first equation is called the pivot, and
in the second stage of the algorithm, the coefficient of y in the second equation is the pivot.
Clearly, the algorithm cannot work if either pivot is zero. In such a case one must interchange
equations so that a pivot is not zero. In fact, if one would like to code this algorithm, the
greatest accuracy is attained when the pivot is as large in absolute value as possible. For
example, we would like to interchange equations (1) and (2) in the original system of the
above example before eliminating x from the second and third equations:
2x - 4y + 2z = 18
x - 3y - 2z = 6
-3x + 8y + 9z = -9
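Putting the two parts together, here is a hedged Python sketch of Gaussian elimination with the partial pivoting just described, i.e., always choosing the pivot that is largest in absolute value:

```python
def gaussian_elimination(a, b):
    """Solve Ax = b by Gaussian elimination with partial pivoting.

    a: n-by-n matrix (list of lists), b: length-n list of constants.
    Part 1 reduces to triangular form; part 2 back-substitutes.
    """
    n = len(b)
    a = [row[:] for row in a]   # work on copies
    b = b[:]
    for col in range(n - 1):
        # Partial pivoting: pick the row whose pivot is largest in absolute value.
        p = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[p] = a[p], a[col]
        b[col], b[p] = b[p], b[col]
        for row in range(col + 1, n):
            m = a[row][col] / a[col][col]
            for j in range(col, n):
                a[row][j] -= m * a[col][j]
            b[row] -= m * b[col]
    # Second part: back substitution on the triangular system.
    x = [0.0] * n
    for k in range(n - 1, -1, -1):
        s = sum(a[k][j] * x[j] for j in range(k + 1, n))
        x[k] = (b[k] - s) / a[k][k]
    return x

# The worked example; the result is (up to rounding) x = 1, y = -3, z = 2.
print(gaussian_elimination([[1, -3, -2], [2, -4, 2], [-3, 8, 9]], [6, 18, -9]))
```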
Let D denote the determinant of the matrix A = (aij) of coefficients; that is, let D = |A|. Also, let Ni
denote the determinant of the matrix obtained by replacing the ith column of A by the column of
constants.
Theorem. If D ≠ 0, the above system of linear equations has the unique solution
xi = Ni/D, for i = 1, 2, . . . , n
This theorem is widely known as Cramer's rule. It is important to note that Gaussian elimination
is usually much more efficient for solving systems of linear equations than is the use of
determinants.
ALGORITHMS DESIGN & ANALYSIS TECHNIQUES : Greedy algorithm
Greedy algorithms are simple and straightforward. They are shortsighted in their approach in the
sense that they make decisions on the basis of the information at hand, without worrying about the
effect these decisions may have in the future. They are easy to invent, easy to implement, and
most of the time quite efficient. However, many problems cannot be solved correctly by the greedy
approach. Greedy algorithms are used to solve optimization problems.
Greedy Approach
A greedy algorithm works by making the decision that seems most promising at any moment; it
never reconsiders this decision, whatever situation may arise later.
As an example consider the problem of "Making Change".
Coins available are:
dollars (100 cents)
quarters (25 cents)
dimes (10 cents)
nickels (5 cents)
pennies (1 cent)
Problem Make change for a given amount using the smallest possible number of coins.
Informal Algorithm
Start with nothing.
At every stage, without passing the given amount,
o add the largest coin available to the coins already chosen.
Formal Algorithm
Make change for n units using the least possible number of coins.
MAKE-CHANGE (n)
C ← {100, 25, 10, 5, 1} // constant: the available denominations
S ← {} // set that will hold the solution
sum ← 0 // sum of the items in the solution set
WHILE sum ≠ n
x ← largest item in set C such that sum + x ≤ n
IF no such item THEN
RETURN "No Solution"
S ← S ∪ {x}
sum ← sum + x
RETURN S
Example Make change for $2.89 (289 cents). Here n = 289, and the solution contains 2 dollars,
3 quarters, 1 dime and 4 pennies. The algorithm is greedy because at every stage it chooses the
largest coin without worrying about the consequences. Moreover, it never changes its mind, in the
sense that once a coin has been included in the solution set, it remains there.
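A minimal Python sketch of MAKE-CHANGE for this coin system (iterating over the denominations in decreasing order is equivalent to repeatedly taking the largest coin that still fits):

```python
def make_change(n, coins=(100, 25, 10, 5, 1)):
    """Greedy change-making: repeatedly take the largest coin that still fits."""
    solution = []
    total = 0
    for c in coins:                 # denominations in decreasing order
        while total + c <= n:
            solution.append(c)      # commit to c and never reconsider
            total += c
    if total != n:
        return None                 # greedy got stuck (cannot happen with 1-cent coins)
    return solution

print(make_change(289))  # [100, 100, 25, 25, 25, 10, 1, 1, 1, 1]
```

Note that the greedy rule is only optimal for some coin systems: with denominations {4, 3, 1} and n = 6 it returns 4+1+1, even though 3+3 uses fewer coins.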
To construct the solution in an optimal way, the algorithm maintains two sets: one contains the chosen
items, and the other contains the rejected items.
A greedy algorithm consists of four (4) functions.
1. A function that checks whether a chosen set of items provides a solution.
2. A function that checks the feasibility of a set.
3. A selection function, which tells which of the candidates is the most promising.
4. An objective function, which does not appear explicitly, but gives the value of a solution.
Definitions of feasibility
A feasible set (of candidates) is promising if it can be extended to produce not merely a solution,
but an optimal solution to the problem. In particular, the empty set is always promising. Why?
Because an optimal solution always exists.
Unlike Dynamic Programming, which solves the sub problems bottom-up, a greedy strategy
usually progresses in a top-down fashion, making one greedy choice after another, reducing each
problem to a smaller one.
Greedy-Choice Property
The "greedy-choice property" and "optimal substructure" are the two ingredients in a problem that
lend themselves to a greedy strategy.
Greedy-Choice Property
It says that a globally optimal solution can be arrived at by making a locally optimal choice.
As a running example, consider the activity-selection problem: given a set S of n activities, where
the ith activity has start time si and finish time fi, find a maximum-size set of mutually compatible
activities.
Compatible Activities
Activities i and j are compatible if the half-open intervals [si, fi) and [sj, fj)
do not overlap; that is, i and j are compatible if si ≥ fj or sj ≥ fi.
Greedy Algorithm for Selection Problem
I. Sort the input activities by increasing finishing time.
f1 ≤ f2 ≤ . . . ≤ fn
II. Call GREEDY-ACTIVITY-SELECTOR (s, f)
1. n = length [s]
2. A = {1}
3. j = 1
4. for i = 2 to n
5. do if si ≥ fj
6. then A= AU{i}
7. j=i
8. return set A
Analysis
Part I requires O(n lg n) time (using merge sort or heapsort).
Part II requires θ(n) time, assuming that the activities were already sorted in Part I by their finish
times.
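GREEDY-ACTIVITY-SELECTOR translates almost line for line into Python (0-indexed here; the input is assumed already sorted by finish time, as in Part I; the sample activities at the end are mine):

```python
def greedy_activity_selector(s, f):
    """Return indices of a maximum-size set of mutually compatible activities.

    s[i], f[i] are the start and finish times of activity i; f must be sorted
    in nondecreasing order (Part I of the algorithm).
    """
    n = len(s)
    A = [0]                  # activity with the earliest finish time
    j = 0                    # index of the last activity added
    for i in range(1, n):
        if s[i] >= f[j]:     # compatible with the last chosen activity
            A.append(i)
            j = i
    return A

# Sample activities (start, finish), already sorted by finish time.
s = [1, 3, 0, 5, 3, 5, 6, 8]
f = [4, 5, 6, 7, 8, 9, 10, 11]
print(greedy_activity_selector(s, f))  # [0, 3, 7]
```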
Correctness
Note that greedy algorithms do not always produce optimal solutions, but GREEDY-ACTIVITY-
SELECTOR does.
Theorem Algorithm GREEDY-ACTIVITY-SELECTOR produces a solution of maximum size for
the activity-selection problem.
Proof
I. Let S = {1, 2, . . . , n} be the set of activities. Since the activities are in order by finish time,
activity 1 has the earliest finish time.
Suppose A ⊆ S is an optimal solution, and let the activities in A be ordered by finish time.
Suppose the first activity in A is k.
If k = 1, then A begins with the greedy choice and we are done (or, to be very precise, there is
nothing to prove here).
If k ≠ 1, we want to show that there is another optimal solution B that begins with the greedy
choice, activity 1.
Let B = (A - {k}) ∪ {1}. Because f1 ≤ fk, the activities in B are disjoint, and since B has the
same number of activities as A, i.e., |A| = |B|, B is also optimal.
II. Once the greedy choice is made, the problem reduces to finding an optimal solution for
the remaining subproblem. If A is an optimal solution to the original problem S, then A′ = A - {1} is an
optimal solution to the activity-selection problem S′ = {i ∈ S : si ≥ f1}.
Why? Because if we could find a solution B′ to S′ with more activities than A′, adding activity 1
to B′ would yield a solution B to S with more activities than A, thereby contradicting the
optimality of A.
As an example, consider the problem of scheduling a set of activities among lecture halls: schedule
all the activities using as few lecture halls as possible.
In order to determine which activity should use which lecture hall, the algorithm uses the
GREEDY-ACTIVITY-SELECTOR to calculate the activities in the first lecture hall. If there are
some activities yet to be scheduled, a new lecture hall is selected and GREEDY-ACTIVITY-
SELECTOR is called again. This continues until all activities have been scheduled.
LECTURE-HALL-ASSIGNMENT (s, f)
n = length [s]
for i = 1 to n
do HALL [i] = NIL
k = 1
while (Not empty (s))
do HALL [k] = GREEDY-ACTIVITY-SELECTOR (s, f)
k = k + 1
return HALL
Correctness
The algorithm can be shown to be correct and optimal. For a contradiction, assume the number of
lecture halls used is not optimal; that is, the algorithm allocates more halls than necessary. Then
there exists a set of activities B which have been wrongly allocated: an activity b belonging to B
which has been allocated to hall H[i] should optimally have been allocated to H[k]. This implies
that the activities for lecture hall H[k] have not been allocated optimally, contradicting the fact
that GREEDY-ACTIVITY-SELECTOR produces the optimal set of activities for a particular lecture hall.
Analysis
In the worst case, the number of lecture halls required is n. GREEDY-ACTIVITY-SELECTOR runs
in θ(n) time. The running time of this algorithm is O(n²).
Two important Observations
Choosing the activity of least duration will not always produce an optimal solution. For
example, given the set of activities {(3, 5), (6, 8), (1, 4), (4, 7), (7, 10)}, either (3,
5) or (6, 8) will be picked first, which will prevent the optimal
solution of {(1, 4), (4, 7), (7, 10)} from being found.
Choosing the activity with the least overlap will not always produce an optimal solution. For
example, we have a set of activities {(0, 4), (4, 6), (6, 10), (0, 1), (1, 5), (5, 9), (9, 10), (0,
3), (0, 2), (7, 10), (8, 10)}. Here the one with the least overlap with other activities is (4,
6), so it will be picked first. But that would prevent the optimal solution of {(0, 1), (1, 5),
(5, 9), (9, 10)} from being found.
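Both observations are easy to check against the earliest-finish-first rule, which does recover the optimal schedule on these instances. A small sketch (the helper name is mine; it re-sorts by finish time first):

```python
def earliest_finish_first(activities):
    """Pick activities greedily by earliest finish time (the provably optimal rule)."""
    chosen = []
    last_finish = float("-inf")
    for s, f in sorted(activities, key=lambda a: a[1]):  # sort by finish time
        if s >= last_finish:        # compatible with the last chosen activity
            chosen.append((s, f))
            last_finish = f
    return chosen

# The first observation's instance: shortest-duration-first would pick (3, 5) or
# (6, 8), but earliest-finish-first recovers the optimal three-activity schedule.
print(earliest_finish_first([(3, 5), (6, 8), (1, 4), (4, 7), (7, 10)]))
# [(1, 4), (4, 7), (7, 10)]
```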
Spanning Tree
A spanning tree of a graph is any tree that includes every vertex in the graph. A little more
formally, a spanning tree of a graph G is a subgraph of G that is a tree and contains all the
vertices of G. An edge of a spanning tree is called a branch; an edge in the graph that is not in the
spanning tree is called a chord. We construct spanning trees whenever we want to find a simple,
cheap and yet efficient way to connect a set of terminals (computers, cities, factories, etc.).
Spanning trees are important because of following reasons.
Spanning trees construct a sparse subgraph that tells a lot about the original graph.
Spanning trees are very important in designing efficient routing algorithms.
Some hard problems (e.g., the Steiner tree problem and the traveling salesman problem) can be
solved approximately by using spanning trees.
Spanning trees have wide applications in many areas, such as network design.
One of the most elegant spanning tree algorithms that I know of is as follows:
Examine the edges in graph in any arbitrary sequence.
Decide whether each edge will be included in the spanning tree.
Note that each time a step of the algorithm is performed, one edge is examined. If there is only a
finite number of edges in the graph, the algorithm must halt after a finite number of steps. Thus,
the time complexity of this algorithm is clearly O(n), where n is the number of edges in the
graph.
Greediness It is easy to see that this algorithm has the property that each edge is examined at
most once. Algorithms, like this one, which examine each entity at most once and decide its fate
once and for all during that examination are called greedy algorithms. The obvious advantage of
greedy approach is that we do not have to spend time reexamining entities.
Consider the problem of finding a spanning tree with the smallest possible weight or the largest
possible weight, respectively called a minimum spanning tree and a maximum spanning tree. It is
easy to see that if a graph possesses a spanning tree, it must have a minimum spanning tree and
also a maximum spanning tree. These spanning trees can be constructed by performing the
spanning tree algorithm (e.g., above mentioned algorithm) with an appropriate ordering of the
edges.
Minimum Spanning Tree Algorithm
Perform the spanning tree algorithm (above) by examining the edges in order of non-
decreasing weight (smallest first, largest last). If two or more edges have the same weight, order
them arbitrarily.
Maximum Spanning Tree Algorithm
Perform the spanning tree algorithm (above) by examining the edges in order of non-
increasing weight (largest first, smallest last). If two or more edges have the same weight, order
them arbitrarily.
Let G = (V, E) be a connected, undirected graph, where V is the set of vertices (nodes) and E is the
set of edges. Each edge has a given nonnegative length.
Problem Find a subset T of the edges of G such that all the vertices remain connected when
only the edges T are used, and the sum of the lengths of the edges in T is as small as possible.
Let G′ = (V, T) be the partial graph formed by the vertices of G and the edges in T. [Note: a
connected graph with n vertices must have at least n-1 edges, AND more than n-1 edges implies at
least one cycle.] So n-1 is the minimum number of edges in T. Hence, if G′ is connected and
T has more than n-1 edges, we can remove at least one of these edges without disconnecting G′
(choose an edge that is part of a cycle). This will decrease the total length of the edges in T.
Again, let G′ = (V, T), where T is a subset of E. If G′ is connected and T has more than n-1
edges, then T contains at least one cycle. Remove an edge from T without disconnecting G′
(i.e., remove an edge that is part of the cycle). This will decrease the total length of the edges in
T; therefore, the new solution is preferable to the old one.
Thus, a T on n vertices with more than n-1 edges cannot be an optimal solution. It follows that T must
have n-1 edges, and since G′ is connected, it must be a tree. This G′ is called a Minimum Spanning Tree
(MST).
3. Kruskal's Algorithm
This minimum spanning tree algorithm was first described by Kruskal in 1956, in the same paper
where he rediscovered Jarnik's algorithm. The algorithm was also rediscovered in 1957 by
Loberman and Weinberger, but somehow avoided being renamed after them. The basic idea of
Kruskal's algorithm is as follows: scan all edges in increasing weight order; if an edge is
safe, keep it (i.e., add it to the set A).
Overall Strategy
Kruskal's Algorithm, as described in CLRS, is directly based on the generic MST algorithm. It
builds the MST as a forest. Initially, each vertex is in its own tree in the forest. Then the algorithm
considers each edge in turn, in order of increasing weight. If an edge (u, v) connects two different
trees, then (u, v) is added to the set of edges of the MST, and the two trees connected by the edge (u,
v) are merged into a single tree. On the other hand, if an edge (u, v) connects two vertices in the
same tree, then edge (u, v) is discarded.
A little more formally, given a connected, undirected, weighted graph with a weight function w : E → R, the algorithm:
Starts with each vertex being its own component.
Repeatedly merges two components into one by choosing the light edge that connects
them (i.e., the light edge crossing the cut between them).
Scans the set of edges in monotonically increasing order by weight.
Uses a disjoint-set data structure to determine whether an edge connects vertices in
different components.
Data Structure
Before formalizing the above idea, let's quickly review the disjoint-set data structure from
Chapter 21.
MAKE-SET(v): Creates a new set whose only member is (pointed to by) v. Note that for
this operation, v must not already be in some other set.
FIND-SET(v): Returns a pointer to the representative of the set containing v.
UNION(u, v): Unites the dynamic sets that contain u and v into a new set that is the union
of these two sets.
Algorithm
Start with an empty set A, and select at every stage the shortest edge that has not been chosen or
rejected, regardless of where this edge is situated in the graph.
KRUSKAL(V, E, w)
A ← {} ▷ Set A will ultimately contain the edges of the MST
for each vertex v in V
do MAKE-SET(v)
sort E into nondecreasing order by weight w
for each (u, v) taken from the sorted list
do if FIND-SET(u) ≠ FIND-SET(v)
then A ← A ∪ {(u, v)}
UNION(u, v)
return A
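A hedged Python sketch of KRUSKAL, with a small disjoint-set structure (union by rank plus path compression) standing in for the Chapter 21 one; the example graph at the end is mine, not the one in the figures:

```python
def kruskal(vertices, edges):
    """Kruskal's algorithm. edges: list of (weight, u, v). Returns the MST edge list."""
    parent = {v: v for v in vertices}   # MAKE-SET for every vertex
    rank = {v: 0 for v in vertices}

    def find(v):                        # FIND-SET with path compression
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    def union(u, v):                    # UNION by rank
        ru, rv = find(u), find(v)
        if rank[ru] < rank[rv]:
            ru, rv = rv, ru
        parent[rv] = ru
        if rank[ru] == rank[rv]:
            rank[ru] += 1

    A = []
    for w, u, v in sorted(edges):       # nondecreasing order by weight
        if find(u) != find(v):          # edge connects two different trees: safe
            A.append((u, v))
            union(u, v)
    return A

edges = [(1, "a", "b"), (4, "a", "c"), (3, "b", "c"), (2, "c", "d"), (5, "b", "d")]
print(kruskal("abcd", edges))  # [('a', 'b'), ('c', 'd'), ('b', 'c')]
```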
Illustrative Examples
Let's run through the following graph quickly to see how Kruskal's algorithm works on it:
Edge (c, f) : safe
Edge (g, i) : safe
Edge (e, f) : safe
Edge (c, e) : reject
Edge (d, h) : safe
Edge (f, h) : safe
Edge (e, d) : reject
Edge (b, d) : safe
Edge (d, g) : safe
Edge (b, c) : reject
Edge (g, h) : reject
Edge (a, b) : safe
At this point, we have only one component, so all other edges will be rejected. [We could add a
test to the main loop of KRUSKAL to stop once |V| − 1 edges have been added to A.]
Note Carefully: Suppose we had examined (c, e) before (e, f). Then we would have found (c, e)
safe and would have rejected (e, f).
Step 2. The edge (c, i) creates the second tree. Choose vertex c as the representative for the second tree.
Step 3. Edge (g, g) is the next shortest edge. Add this edge and choose vertex g as representative.
Step 5. Add edge (c, f) and merge the two trees. Vertex c is chosen as the representative.
Step 6. Edge (g, i) is the next cheapest, but if we add this edge a cycle would be created.
Vertex c is the representative of both.
Step 8. If we added edge (h, i), edge (h, i) would make a cycle.
Step 10. Again, if we add edge (b, c), it would create a cycle. Add edge (d, e) instead to complete
the spanning tree. In this spanning tree all trees are joined and vertex c is the sole representative.
Analysis
Initialize the set A: O(1)
First for loop: |V| MAKE-SETs
Sort E: O(E lg E)
Second for loop: O(E) FIND-SETs and UNIONs
Assuming the implementation of disjoint-set data structure, already seen in Chapter 21,
that uses union by rank and path compression: O((V + E) α(V)) + O(E lg E)
Since G is connected, |E| ≥ |V| − 1⇒ O(E α(V)) + O(E lg E).
α(|V|) = O(lg V) = O(lg E).
Therefore, total time is O(E lg E).
|E| ≤ |V|2 ⇒lg |E| = O(2 lg V) = O(lg V).
Therefore, O(E lg V) time. (If edges are already sorted, O(E α(V)), which is almost
linear.)
MST_KRUSKAL(G)
for each vertex v in V[G]
do define set S(v) ← {v}
Initialize priority queue Q that contains all edges of G, using the weights as keys
A ← {} ▷ A will ultimately contain the edges of the MST
while A has less than n − 1 edges
do Extract the minimum-weight edge (u, v) from Q
Let S(v) be the set containing v, and S(u) the set containing u
if S(v) ≠ S(u)
then Add edge (u, v) to A
Merge S(v) and S(u) into one set (i.e., union)
return A
Analysis
Edge weights can be compared in constant time. Initialization of the priority queue takes O(E lg
E) time by repeated insertion. At each iteration of the while-loop, the minimum edge can be removed in
O(lg E) time, which is O(lg V), since the graph is simple. The total running time is O((V + E) lg
V), which is O(E lg V) since the graph is simple and connected.
4. Prim's Algorithm
This algorithm was first proposed by Jarnik, but is typically attributed to Prim. It starts from an
arbitrary vertex (the root) and, at each stage, adds a new branch (edge) to the tree already constructed;
the algorithm halts when all the vertices in the graph have been reached. This strategy is greedy
in the sense that at each step the partial spanning tree is augmented with an edge that is the
smallest among all possible adjacent edges.
MST-PRIM
T = {}
Let r be an arbitrarily chosen vertex from V.
U = {r}
WHILE |U| < n
DO
Find u in U and v in V − U such that the edge (u, v) is a smallest edge between U and V − U.
T = T ∪ {(u, v)}
U = U ∪ {v}
Analysis
The algorithm spends most of its time in finding the smallest edge, so the running time of the algorithm
basically depends on how we search for this edge.
Straightforward method
Just find the smallest edge by searching the adjacency lists of the vertices in V. In this case, each
iteration costs O(m) time, yielding a total running time of O(mn).
Binary heap
By using binary heaps, the algorithm runs in O(m log n).
Fibonacci heap
By using Fibonacci heaps, the algorithm runs in O(m + n log n) time.
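A binary-heap version of MST-PRIM can be sketched with Python's heapq. Stale heap entries are skipped lazily rather than decrease-keyed, which keeps the same O(m log n) bound; the example graph is mine:

```python
import heapq

def prim(graph, root):
    """Prim's algorithm with a binary heap.

    graph: {u: [(weight, v), ...]} adjacency lists of an undirected graph.
    Grows a single tree from `root`, always adding the smallest edge that
    crosses from the tree U to the rest of the vertices.
    """
    in_tree = {root}
    tree_edges = []
    frontier = [(w, root, v) for w, v in graph[root]]  # edges leaving the tree
    heapq.heapify(frontier)
    while frontier:
        w, u, v = heapq.heappop(frontier)   # smallest candidate edge
        if v in in_tree:
            continue                        # stale entry: both endpoints already in U
        in_tree.add(v)
        tree_edges.append((u, v))
        for wx, x in graph[v]:
            if x not in in_tree:
                heapq.heappush(frontier, (wx, v, x))
    return tree_edges

graph = {
    "a": [(1, "b"), (4, "c")],
    "b": [(1, "a"), (3, "c"), (5, "d")],
    "c": [(4, "a"), (3, "b"), (2, "d")],
    "d": [(5, "b"), (2, "c")],
}
print(prim(graph, "a"))  # [('a', 'b'), ('b', 'c'), ('c', 'd')]
```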
5. Dijkstra's Algorithm
Dijkstra's algorithm solves the single-source shortest-path problem when all edges have non-
negative weights. It is a greedy algorithm, similar to Prim's algorithm. The algorithm starts at the
source vertex, s, and grows a tree, T, that ultimately spans all vertices reachable from s. Vertices
are added to T in order of distance: first s, then the vertex closest to s, then the next closest,
and so on. The following implementation assumes that graph G is represented by adjacency lists.
DIJKSTRA (G, w, s)
1. INITIALIZE SINGLE-SOURCE (G, s)
2. S ← { } // S will ultimately contain the vertices whose final shortest-path weights from s are known
3. Initialize priority queue Q i.e., Q ← V[G]
4. while priority queue Q is not empty do
5. u ← EXTRACT_MIN(Q) // Pull out new vertex
6. S ← S ∪ {u}
// Perform relaxation for each vertex v adjacent to u
7. for each vertex v in Adj[u] do
8. Relax (u, v, w)
Analysis
Like Prim's algorithm, Dijkstra's algorithm runs in O(|E|lg|V|) time.
Step 1. Given initial graph G = (V, E). All nodes have infinite cost except the source node,
s, which has 0 cost.
Step 2. First we choose the node, which is closest to the source node, s. We initialize d[s] to 0.
Add it to S. Relax all nodes adjacent to source, s. Update predecessor (see red arrow in diagram
below) for all nodes updated.
Step 3. Choose the closest node, x. Relax all nodes adjacent to node x. Update predecessors for
nodes u, v and y (again notice red arrows in diagram below).
Step 4. Now, node y is the closest node, so add it to S. Relax node v and adjust its predecessor
(red arrows remember!).
Step 5. Now we have node u that is closest. Choose this node and adjust its neighbor node v.
Step 6. Finally, add node v. The predecessor list now defines the shortest path from each node to
the source node, s.
Q as a linear array
EXTRACT_MIN takes O(V) time and there are |V| such operations. Therefore, the total time for
EXTRACT_MIN in the while-loop is O(V²). Since the total number of edges in all the adjacency lists
is |E|, the for-loop iterates |E| times in total, with each iteration taking O(1) time. Hence, the
running time of the algorithm with the array implementation is O(V² + E) = O(V²).
Q as a Fibonacci heap
In this case, the amortized cost of each of the |V| EXTRACT_MIN operations is O(lg V).
The operation DECREASE_KEY in the subroutine RELAX now takes only O(1) amortized time for
each of the |E| edges.
As we have mentioned above, Dijkstra's algorithm does not work on digraphs with
negative-weight edges. We now give a simple example to show that Dijkstra's algorithm
produces incorrect results in this situation. Consider the digraph consisting of V = {s, a, b} and E =
{(s, a), (s, b), (b, a)}, where w(s, a) = 1, w(s, b) = 2, and w(b, a) = -2.
Dijkstra's algorithm gives d[a] = 1, d[b] = 2. But due to the negative-weight edge (b, a), the true
shortest distance from vertex s to vertex a is 2 + (-2) = 0, which is less than 1.
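The failure is easy to reproduce. The sketch below is a lazy-deletion Dijkstra (not CLRS's RELAX-based version); on the digraph above it finalizes d[a] = 1 before the cheaper path through b is ever considered:

```python
import heapq

def dijkstra(graph, source):
    """Lazy-deletion Dijkstra. graph: {u: [(v, w), ...]}. Returns distance dict."""
    dist = {source: 0}
    done = set()
    pq = [(0, source)]
    while pq:
        d, u = heapq.heappop(pq)
        if u in done:
            continue
        done.add(u)                      # u's distance is now final
        for v, w in graph.get(u, []):
            if v not in done and d + w < dist.get(v, float("inf")):
                dist[v] = d + w          # relax edge (u, v)
                heapq.heappush(pq, (dist[v], v))
    return dist

# The negative-weight digraph above: d[a] is finalized at 1, even though the
# path s -> b -> a has total weight 2 + (-2) = 0.
graph = {"s": [("a", 1), ("b", 2)], "b": [("a", -2)]}
print(dijkstra(graph, "s"))  # {'s': 0, 'a': 1, 'b': 2}
```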
ALGORITHMS DESIGN & ANALYSIS TECHNIQUES: Divide & Conquer
Algorithm
Problem Let A[1 . . . n] be an array sorted in non-decreasing order; that is, A[i] ≤ A[j]
whenever 1 ≤ i ≤ j ≤ n. Let 'q' be the query point. The problem consists of finding 'q' in the
array A. If q is not in A, then find the position where 'q' might be inserted.
Formally, find the index i such that 1 ≤ i ≤ n+1 and A[i-1] < q ≤ A[i].
Sequential Search
Look sequentially at each element of A until either we reach the end of the array A or find an
item no smaller than 'q'.
Sequential search for 'q' in array A
for i = 1 to n do
if A [i] ≥ q then
return index i
return n + 1
Analysis
This algorithm clearly takes θ(r) time, where r is the index returned. This is Ω(n) in the worst case
and O(1) in the best case.
If the elements of the array A are distinct and the query point q is indeed in the array, then the loop
executes (n + 1)/2 times on average. On average (as well as in the worst case), sequential
search takes θ(n) time.
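The sequential search pseudocode, written 1-based as above, in Python (the sample array is mine):

```python
def sequential_search(A, q):
    """Return the first 1-based index i with A[i] >= q, else n + 1."""
    for i, value in enumerate(A, start=1):
        if value >= q:
            return i
    return len(A) + 1

print(sequential_search([2, 4, 4, 7, 9], 5))  # 4  (A[4] = 7 is the first item >= 5)
```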
Binary Search
Look for 'q' either in the first half or in the second half of the array A. Compare 'q' to the element
in the middle, ⌊n/2⌋, of the array. Let k = ⌊n/2⌋. If q ≤ A[k], then search in A[1 . . k];
otherwise search A[k+1 . . n] for 'q'. Binary search for q in subarray A[i . . j] with the promise
that
A[i-1] < q ≤ A[j]
If i = j then
return i (index)
k = ⌊(i + j)/2⌋
if q ≤ A [k]
then return BinarySearch(A[i . . k], q)
else return BinarySearch(A[k+1 . . j], q)
Analysis
Binary search can be accomplished in logarithmic time in the worst case, i.e., T(n) = θ(log n).
This version of binary search also takes logarithmic time in the best case. An iterative version of
the algorithm follows.
if q > A [n]
then return n + 1
i = 1;
j = n;
while i < j do
k = ⌊(i + j)/2⌋
if q ≤ A [k]
then j = k
else i = k + 1
return i (the index)
Analysis
The analysis of the iterative algorithm is identical to that of its recursive counterpart.
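The iterative binary search in Python, keeping the 1-based positions of the pseudocode (the test values are mine):

```python
def binary_search(A, q):
    """Return the smallest 1-based position i with A[i-1] < q <= A[i];
    returns n + 1 when q exceeds every element."""
    if not A or q > A[-1]:
        return len(A) + 1
    i, j = 1, len(A)
    while i < j:
        k = (i + j) // 2          # floor((i + j) / 2)
        if q <= A[k - 1]:         # A[k] in the 1-based pseudocode
            j = k
        else:
            i = k + 1
    return i

A = [2, 4, 4, 7, 9]
print(binary_search(A, 5))   # 4  (same answer as sequential search)
print(binary_search(A, 10))  # 6  (n + 1: q is larger than every element)
```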
ALGORITHMS DESIGN & ANALYSIS TECHNIQUES: Dynamic Programming
Algorithm
Dynamic programming is a fancy name for using the divide-and-conquer technique with a table.
Compared to divide-and-conquer, dynamic programming is a more powerful and subtle design
technique. It is not a specific algorithm, but a meta-technique (like divide-and-conquer).
The technique was developed back in the days when "programming" meant "tabular method"
(as in linear programming).
It does not really refer to computer programming. Here in our advanced algorithm course, we'll
also think of "programming" as a "tableau method" and certainly not writing code. Dynamic
programming is a stage-wise search method suitable for optimization problems whose solutions
may be viewed as the result of a sequence of decisions. The most attractive property of this
strategy is that during the search for a solution it avoids full enumeration by pruning early partial
decision solutions that cannot possibly lead to optimal solution. In many practical situations, this
strategy hits the optimal solution in a polynomial number of decision steps. However, in the
worst case, such a strategy may end up performing full enumeration.
Dynamic programming takes advantage of this duplication and arranges to solve each subproblem
only once, saving the solution (in a table, or in a globally accessible place) for later use. The
underlying idea of dynamic programming is: avoid calculating the same thing twice, usually by
keeping a table of known results of subproblems. Unlike divide-and-conquer, which solves the
subproblems top-down, dynamic programming is a bottom-up technique. The dynamic
programming technique is related to divide-and-conquer, in the sense that it breaks a problem
down into smaller problems and solves them recursively. However, because of the somewhat
different nature of dynamic programming problems, standard divide-and-conquer solutions are
not usually efficient.
Dynamic programming is among the most powerful techniques for designing algorithms for
optimization problems. This is true for two reasons. Firstly, dynamic programming solutions are
based on a few common elements. Secondly, dynamic programming problems are typical
optimization problems, i.e., find the minimum- or maximum-cost solution, subject to various
constraints.
In other words, this technique is used for optimization problems:
Find a solution to the problem with the optimal value.
Then perform minimization or maximization. (We'll see examples of both in CLRS.)
There are three basic elements that characterize a dynamic programming algorithm:
1. Substructure
Decompose the given problem into smaller (and hopefully simpler) subproblems. Express the
solution of the original problem in terms of solutions for smaller problems. Note that unlike
divide-and-conquer problems, it is not usually sufficient to consider one decomposition, but
many different ones.
2. Table-Structure
After solving the subproblems, store the answers (results) to the subproblems in a table. This is
done because (typically) subproblem solutions are reused many times, and we do not want to
repeatedly solve the same problem over and over again.
3. Bottom-up Computation
Using the table (or some other structure), combine the solutions of smaller subproblems to solve
larger subproblems, and eventually arrive at a solution to the complete problem. The idea of
bottom-up computation is as follows:
Bottom-up means
i. Start with the smallest subproblems.
ii. Combine their solutions to obtain the solutions to subproblems of increasing size.
iii. Continue until we arrive at the solution of the original problem.
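The three steps above can be illustrated on a toy example. The following sketch is ours, in Python; Fibonacci numbers are not discussed in this text, but they show substructure, table-structure, and bottom-up computation in miniature.

```python
# Bottom-up dynamic programming illustrated on Fibonacci numbers
# (an illustrative mini-example; F(n) = F(n-1) + F(n-2)).

def fib_bottom_up(n):
    # 1. Substructure: F(n) depends on the smaller subproblems F(n-1), F(n-2).
    # 2. Table-structure: table[i] stores the solution to subproblem i.
    table = [0] * (n + 1)
    if n >= 1:
        table[1] = 1
    # 3. Bottom-up computation: smallest subproblems first.
    for i in range(2, n + 1):
        table[i] = table[i - 1] + table[i - 2]
    return table[n]

print(fib_bottom_up(10))  # prints 55; each subproblem solved exactly once
```

Contrast this with the naive recursive version, which recomputes the same subproblems exponentially many times.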
Once we have decided that we are going to attack the given problem with the dynamic
programming technique, the most important step is the formulation of the problem. In other
words, the most important question in designing a dynamic programming solution to a problem
is how to set up the subproblem structure.
If we cannot apply dynamic programming to every optimization problem, then the question is:
what should we look for in order to apply this technique? Well! The answer is that there are two
important elements that a problem must have in order for the dynamic programming technique
to be applicable (look for those!).
1. Optimal Substructure
Show that a solution to the problem consists of making a choice, which leaves one or more
subproblems to solve. Now suppose that you are told the last choice made in an optimal solution.
[Students often have trouble understanding the relationship between optimal substructure and
determining which choice is made in an optimal solution. One way to understand optimal
substructure is to imagine that "God" tells you what the last choice made in an optimal solution
was.] Given this choice, determine which subproblems arise and how to characterize the
resulting space of subproblems. Show that the solutions to the subproblems used within the
optimal solution must themselves be optimal (optimality principle). You usually use a
cut-and-paste argument:
Suppose that one of the subproblem solutions is not optimal.
Cut it out.
Paste in an optimal solution.
Get a better solution to the original problem. This contradicts the optimality of the problem
solution.
That was optimal substructure.
You need to ensure that you consider a wide enough range of choices and subproblems that you
get them all. ["God" is too busy to tell you what that last choice really was.] Try all the choices,
solve all the subproblems resulting from each choice, and pick the choice whose solution, along
with the subproblem solutions, is best.
We have used the "optimality principle" a couple of times. Now a word about this beast: the
optimal solution to the problem contains within it optimal solutions to subproblems. This is
sometimes called the principle of optimality.
The Principle of Optimality
Dynamic programming relies on a principle of optimality. This principle states that in an
optimal sequence of decisions or choices, each subsequence must also be optimal. For example,
in the matrix chain multiplication problem, not only is the value we are interested in optimal, but
all the other entries in the table also represent optimal solutions. The principle can be stated as
follows: the optimal solution to a problem is a combination of optimal solutions to some of its
subproblems. The difficulty in turning the principle of optimality into an algorithm is that it is
not usually obvious which subproblems are relevant to the problem under consideration.
Informally, the running time of a dynamic programming algorithm depends on the overall
number of subproblems times the number of choices per subproblem. For example, in the
assembly-line scheduling problem, there are Θ(n) subproblems and 2 choices for each, implying
a Θ(n) running time. In the case of the longest common subsequence problem, there are Θ(mn)
subproblems and at least 2 choices for each, implying Θ(mn) running time. Finally, in the case of
the optimal binary search tree problem, we have Θ(n²) subproblems and Θ(n) choices for each,
implying Θ(n³) running time.
When we look at greedy algorithms, we'll see that they work in a top-down fashion:
First make a choice that looks best.
Then solve the resulting subproblem.
Warning! It is not correct to think that optimal substructure applies to all optimization problems.
IT DOES NOT. Dynamic programming is not applicable to all optimization problems. Consider,
for example, the shortest-path and longest-simple-path problems. In both problems, we are given
an unweighted, directed graph G = (V, E), and our job is to find a path (sequence of connected
edges) from vertex u in V to vertex v in V; the first problem has optimal substructure, while the
second does not.
Subproblem Dependencies
It is easy to see that the subproblems in our examples above are independent subproblems. For
example, in the assembly-line problem, there is only 1 subproblem, so it is trivially independent.
Similarly, in the longest common subsequence problem, again we have only 1 subproblem, so it
is automatically independent. On the other hand, in the optimal binary search tree problem, we
have two subproblems, ki, . . . , kr − 1 and kr + 1, . . . , kj, which are clearly independent.
Store, don't recompute
Make a table indexed by subproblem.
When solving a subproblem:
o Look it up in the table.
o If the answer is there, use it.
o Otherwise, compute the answer, then store it.
In dynamic programming, we go one step further. We determine in what order we would want to
access the table, and fill it in that way.
Matrix-chain Multiplication
Knapsack Problem DP Solution
Activity Selection Problem DP Solution
The chain matrix multiplication problem is perhaps the most popular example of dynamic
programming used in upper-level undergraduate courses (or to review basic issues of dynamic
programming in an advanced algorithms class).
The chain matrix multiplication problem involves determining the optimal sequence for
performing a series of operations. This general class of problem is important in compiler design
for code optimization and in databases for query optimization. We will study the problem in a
very restricted instance, where the dynamic programming issues are clear. Suppose that our
problem is to multiply a chain of n matrices A1 A2 ... An. Recall (from your discrete structures
course) that matrix multiplication is an associative but not a commutative operation. This means
that we are free to parenthesize the above multiplication however we like, but we are not free to
rearrange the order of the matrices. Also, recall that when two (non-square) matrices are being
multiplied, there are restrictions on the dimensions.
Suppose matrix A has p rows and q columns, i.e., the dimension of matrix A is p × q. You can
multiply a matrix A of dimensions p × q by a matrix B of dimensions q × r, and the result will be
a matrix C with dimensions p × r. That is, you can multiply two matrices if they are compatible:
the number of columns of A must equal the number of rows of B.
In particular, for 1 ≤ i ≤ p and 1 ≤ j ≤ r, we have
C[i, j] = ∑1 ≤ k ≤ q A[i, k] · B[k, j].
There are p · r total entries in C and each takes O(q) time to compute; thus the total time to
multiply these two matrices is dominated by the number of scalar multiplications, which is
p · q · r.
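As a sanity check of the p · q · r count, here is a small Python sketch of ours that multiplies two compatible matrices with the naive triple loop and counts the scalar multiplications as it goes:

```python
# Naive multiplication of a p x q matrix A by a q x r matrix B:
# C has p*r entries, each needing q scalar multiplications,
# so the total is p*q*r scalar multiplications.

def mat_mult(A, B):
    p, q, r = len(A), len(B), len(B[0])
    assert len(A[0]) == q, "A's columns must equal B's rows (compatibility)"
    C = [[0] * r for _ in range(p)]
    muls = 0
    for i in range(p):
        for j in range(r):
            for k in range(q):
                C[i][j] += A[i][k] * B[k][j]
                muls += 1  # one scalar multiplication
    return C, muls

C, muls = mat_mult([[1, 2], [3, 4]], [[5, 6], [7, 8]])
print(C, muls)  # [[19, 22], [43, 50]] 8  (2*2*2 = 8 multiplications)
```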
Problem Formulation
Note that any legal parenthesization will lead to a valid result, but not all parenthesizations
involve the same number of operations. To understand this point, consider the problem of a
chain A1, A2, A3 of three matrices and suppose
A1 is of dimension 10 × 100
A2 is of dimension 100 × 5
A3 is of dimension 5 × 50
Then,
MultCost[((A1 A2) A3)] = (10 · 100 · 5) + (10 · 5 · 50) = 7,500 scalar multiplications.
MultCost[(A1 (A2 A3))] = (100 · 5 · 50) + (10 · 100 · 50) = 75,000 scalar multiplications.
It is easy to see that even for this small example, computing the product according to the first
parenthesization is 10 times faster.
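The two costs above can be recomputed mechanically; the following Python fragment (ours) derives them from the stated dimensions:

```python
# Scalar-multiplication cost of the two parenthesizations of A1 A2 A3
# with dimensions A1: 10x100, A2: 100x5, A3: 5x50.

def cost(p, q, r):
    # multiplying a p x q matrix by a q x r matrix costs p*q*r
    return p * q * r

# ((A1 A2) A3): first A1 A2 (10x100 by 100x5), then the 10x5 result by A3
first = cost(10, 100, 5) + cost(10, 5, 50)
# (A1 (A2 A3)): first A2 A3 (100x5 by 5x50), then A1 by the 100x50 result
second = cost(100, 5, 50) + cost(10, 100, 50)
print(first, second)  # 7500 75000
```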
Equivalent formulation (perhaps easier to work with!)
Given n matrices A1, A2, ..., An, where for 1 ≤ i ≤ n, Ai is a p(i−1) × p(i) matrix, parenthesize the
product A1 A2 ... An so as to minimize the total cost, assuming that the cost of multiplying a
p(i−1) × p(i) matrix by a p(i) × p(i+1) matrix using the naive algorithm is p(i−1) · p(i) · p(i+1).
Note that this algorithm does not perform the multiplications; it just figures out the best order in
which to perform the multiplication operations.
Naive Algorithm
Well, let's start from the obvious! Suppose we are given a list of n matrices. Let's attack the
problem with brute force and try all possible parenthesizations. It is easy to see that the number
of ways of parenthesizing an expression is very large. For instance, if you have just one item in
the list, then there is only one way to parenthesize. Similarly, if you have n items in the list, then
there are n − 1 places where you could split the list with the outermost pair of parentheses,
namely just after the first item, just after the second item, and so on, up to just after the (n − 1)th
item in the list.
On the other hand, when we split the given list just after the kth item, we create two sublists to
be parenthesized, one with k items and the other with n − k items. After splitting, we consider all
the ways of parenthesizing these sublists (brute force in action). If there are L ways to
parenthesize the left sublist and R ways to parenthesize the right sublist, then, since these are
independent choices, the total is L times R. This suggests the following recurrence for P(n), the
number of different ways of parenthesizing n items:
P(n) = 1 if n = 1
P(n) = ∑ k=1..n−1 P(k) · P(n − k) if n ≥ 2
This recurrence is related to a famous function in combinatorics called the Catalan numbers,
which in turn is related to the number of different binary trees on n nodes. The solution to this
recurrence is the sequence of Catalan numbers. In particular, P(n) = C(n − 1), where C(n) is the
nth Catalan number:
C(n) = (1/(n + 1)) · (2n choose n).
By applying Stirling's formula, we get the lower bound C(n) = Ω(4^n / n^(3/2)). Since 4^n is
exponential and n^(3/2) is just a polynomial, the exponential will dominate the expression,
implying that the function grows very fast. Thus, the number of solutions is exponential in n, and
the brute-force method of exhaustive search is a poor strategy for determining the optimal
parenthesization of a matrix chain. Therefore, the naive algorithm will not be practical except for
very small n.
Therefore, the problem of determining the optimal sequence of multiplications is broken up into
two questions:
Question 1: How do we decide where to split the chain? (What is k?)
Question 2: How do we parenthesize the subchains A1..k Ak+1..n?
The subchain problems can be solved by recursively applying the same scheme. On the other
hand, to determine the best value of k, we will consider all possible values of k, and pick the best
of them. Notice that this problem satisfies the principle of optimality, because once we decide to
break the sequence into the product A1..k · Ak+1..n, we should compute each subsequence
optimally. That is, for the global problem to be solved optimally, the subproblems must be
solved optimally as well. The key observation is that the parenthesization of the "prefix"
subchain A1..k within an optimal parenthesization of A1..n must itself be an optimal
parenthesization of A1..k.
Step: If i ≠ j, then we are asking about the product of the subchain Ai..j, and we take advantage
of the structure of an optimal solution. We assume that the optimal parenthesization splits the
product Ai..j as Ai..k · Ak+1..j for some value of k, i ≤ k < j.
The optimum time to compute Ai..k is m[i, k], and the optimum time to compute Ak+1..j is
m[k + 1, j]. We may assume that these values have been computed previously and stored in our
array. Since Ai..k is a p(i−1) × p(k) matrix and Ak+1..j is a p(k) × p(j) matrix, the time to
multiply them is p(i−1) · p(k) · p(j). This suggests the following recursive rule for computing
m[i, j]:
m[i, j] = 0 if i = j
m[i, j] = min over i ≤ k < j of { m[i, k] + m[k + 1, j] + p(i−1) · p(k) · p(j) } if i < j
To keep track of optimal subsolutions, we store the value of k in a table s[i, j]. Recall, k is the
place at which we split the product Ai..j to get an optimal parenthesization. That is,
s[i, j] = k such that m[i, j] = m[i, k] + m[k + 1, j] + p(i−1) · p(k) · p(j).
Implementing the Rule
The third step of the dynamic programming paradigm is to construct the value of an optimal
solution in a bottom-up fashion. It is pretty straightforward to translate the above recurrence into
a procedure. As we remarked in the introduction, dynamic programming is nothing but a fancy
name for divide-and-conquer with a table. But here in dynamic programming, as opposed to
divide-and-conquer, we solve the subproblems sequentially. The trick is to solve them in the
right order, so that whenever the solution to a subproblem is needed, it is already available in the
table.
Consequently, in our problem the only tricky part is arranging the order in which to compute the
values (so that a value is readily available when we need it). In the process of computing m[i, j]
we will need to access the values m[i, k] and m[k + 1, j] for each value of k lying between i and
j. This suggests that we should organize our computation according to the number of matrices in
the subchain. So, let's work on the subchain:
Let L = j − i + 1 denote the length of the subchain being multiplied. The subchains of length 1
(m[i, i]) are trivial. Then we build up by computing the subchains of length 2, 3, ..., n. The final
answer is m[1, n].
Now set up the loop: observe that if a subchain of length L starts at position i, then j = i + L − 1.
Since we would like to keep j in bounds, we want j ≤ n; this, in turn, means that we want
i + L − 1 ≤ n, i.e., i ≤ n − L + 1. This gives us the closed interval for i: the loop for i runs from 1
to n − L + 1.
Matrix-Chain(array p, int n) {
    for i = 1 to n do m[i, i] = 0;        // initialize
    for L = 2 to n do {                   // L = length of subchain
        for i = 1 to n − L + 1 do {
            j = i + L − 1;
            m[i, j] = infinity;           // initialize
            for k = i to j − 1 do {       // try all possible splits
                q = m[i, k] + m[k + 1, j] + p[i − 1] p[k] p[j];
                IF (q < m[i, j]) {        // update if better
                    m[i, j] = q;
                    s[i, j] = k;
                }
            }
        }
    }
    return m[1, n] (final cost) and s (splitting markers);
}
Example [on page 337 in CLRS]: The m-table computed by the Matrix-Chain procedure for
n = 6 matrices A1, A2, A3, A4, A5, A6 with dimension sequence 30, 35, 15, 5, 10, 20, 25.
Note that the m-table is rotated so that the main diagonal runs horizontally. Only the main
diagonal and upper triangle are used.
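The bottom-up procedure can be transcribed directly into Python; the sketch below (ours) reproduces the final entry m[1, 6] = 15,125 of the CLRS m-table for the dimension sequence above:

```python
# Bottom-up chain matrix order for the CLRS example: six matrices with
# dimension sequence p = [30, 35, 15, 5, 10, 20, 25] (Ai is p[i-1] x p[i]).

def matrix_chain_order(p):
    n = len(p) - 1
    INF = float("inf")
    # m[i][j] = minimum scalar multiplications for Ai..Aj (1-indexed)
    m = [[0] * (n + 1) for _ in range(n + 1)]
    s = [[0] * (n + 1) for _ in range(n + 1)]  # best split points
    for L in range(2, n + 1):            # L = length of the subchain
        for i in range(1, n - L + 2):    # i runs from 1 to n - L + 1
            j = i + L - 1
            m[i][j] = INF
            for k in range(i, j):        # try all splits Ai..k * Ak+1..j
                q = m[i][k] + m[k + 1][j] + p[i - 1] * p[k] * p[j]
                if q < m[i][j]:
                    m[i][j] = q          # update if better
                    s[i][j] = k
    return m, s

m, s = matrix_chain_order([30, 35, 15, 5, 10, 20, 25])
print(m[1][6])  # 15125, as in the CLRS m-table
```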
Complexity Analysis
Clearly, the space complexity of this procedure is Ο(n²), since the tables m and s require Ο(n²)
space. As far as the time complexity is concerned, a simple inspection of the for-loop structure
gives us the running time of the procedure: the three for-loops are nested three deep, and each of
them iterates at most n times (that is to say, the indices L, i, and j each take on at most n − 1
values). Therefore, the running time of this procedure is Ο(n³).
The following procedure uses the s-table to carry out the multiplications in the optimal order:
Mult(i, j) {
    if (i = = j) return A[i];   // basis
    else {
        k = s[i, j];
        X = Mult(i, k);         // X = A[i] … A[k]
        Y = Mult(k + 1, j);     // Y = A[k+1] … A[j]
        return XY;              // multiply matrices X and Y
    }
}
Again, we rotate the s-table so that the main diagonal runs horizontally, but in this table we use
only the upper triangle (and not the main diagonal).
In the example, the procedure computes the chain matrix product according to the
parenthesization ((A1(A2 A3))((A4 A5) A6)).
Recursive Implementation
Here we will implement the recurrence in the following recursive procedure, which determines
m[i, j], the minimum number of scalar multiplications needed to compute the chain matrix
product Ai..j. The recursive formulation has been set up in a top-down manner. Now consider
the following recursive implementation of the chain matrix multiplication algorithm. The call
Rec-Matrix-Chain(p, i, j) computes and returns the value of m[i, j]. The initial call is
Rec-Matrix-Chain(p, 1, n). We only consider the cost here.
Rec-Matrix-Chain(array p, int i, int j) {
    if (i = = j) m[i, i] = 0;   // basis case
    else {
        m[i, j] = infinity;     // initialize
        for k = i to j − 1 do { // try all possible splits
            cost = Rec-Matrix-Chain(p, i, k) + Rec-Matrix-Chain(p, k + 1, j) + p[i − 1] p[k] p[j];
            if (cost < m[i, j]) then
                m[i, j] = cost; // update if better
        }
    }
    return m[i, j];             // return final cost
}
This version, which is based directly on the recurrence (the recursive formulation that we gave
for the chain matrix problem), seems much simpler. So, what is wrong with it? The answer is
that its running time is much higher than that of the algorithm we gave before. In fact, we will
see that its running time is exponential in n, which is unacceptably slow.
Let T(n) be the running time of this algorithm on a sequence of matrices of length n, where
n = j − i + 1.
If i = j, then we have a sequence of length 1, and the time is Θ(1). Otherwise, we do Θ(1) work
and then consider all possible ways of splitting the sequence of length n into two sequences, one
of length k and the other of length n − k, and invoke the procedure recursively on each one. So,
we get the following recurrence, defined for n ≥ 1:
T(1) ≥ 1
T(n) ≥ 1 + ∑ k=1..n−1 (T(k) + T(n − k) + 1) for n > 1
Its solution satisfies T(n) ≥ 2^(n−1), so the running time is Ω(2^n).
Now, from a very practical viewpoint, we would like to have the nice top-down structure of the
recursive algorithm with the efficiency of the bottom-up dynamic programming algorithm. The
question is: is it possible? The answer is yes, using a technique called memoization.
The fact that our recursive algorithm runs in exponential time is simply due to the spectacular
redundancy in the number of times it issues recursive calls. Now our problem is: how can we
eliminate all this redundancy? We could store the value of "cost" in a globally accessible place
the first time we compute it and then simply use this precomputed value in place of all future
recursive calls. This technique of saving values that have already been computed is referred to as
memoization.
The idea is as follows. Let's reconsider the function Rec-Matrix-Chain() given above. Its job is
to compute m[i, j] and return its value. The main problem with the procedure is that it
recomputes the same entries over and over. So, we will fix this by allowing the procedure to
compute each entry exactly once. One way to do this is to initialize every entry to some special
value (e.g., UNDEFINED). Once an entry's value has been computed, it is never recomputed.
In essence, what we are doing here is we are maintaining a table with subproblem solution (like
dynamic programming algorithm), but filling up the table more like recursive algorithm. In other
words, we would like to have best of both worlds!
Mem-Matrix-Chain(array p, int i, int j) {
if (m[i, j] != UNDEFINED) then
return m[i, j]; // already defined
else if ( i = = j) then
m[i, i] = 0; // basic case
else {
m[i, j] = infinity; // initialize
    for k = i to j − 1 do { // try all splits
        cost = Mem-Matrix-Chain(p, i, k) + Mem-Matrix-Chain(p, k + 1, j) + p[i − 1] p[k] p[j];
        if (cost < m[i, j]) then // update if better
            m[i, j] = cost;
    }
}
return m[i, j]; // return final cost
}
Like the dynamic programming algorithm, this version runs in time Ο(n³). Intuitively, the reason
is this: when we see a subproblem for the first time, we compute its solution and store it in the
table. After that, whenever we see the subproblem again, we simply look it up in the table and
return the stored solution. So, we compute each of the Ο(n²) table entries once, and the work
needed to compute one table entry (most of it in the for-loop) is at most Ο(n). So, memoization
turns an Ω(2^n)-time algorithm into an Ο(n³)-time algorithm.
As a matter of fact, in general, memoization is slower than the bottom-up method, so it is not
usually used in practice. However, in some dynamic programming problems, many of the table
entries are simply not needed, and so bottom-up computation may compute entries that are never
needed. In these cases, we use memoization to compute each needed table entry once. If you
know that most of the table will not be needed, here is a way to save space: rather than storing
the whole table explicitly as an array, you can store the "defined" entries of the table in a hash
table, using the index pair (i, j) as the hash key.
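A hash-table-backed memoized version can be sketched in Python (our illustration; the dictionary m plays the role of the table of "defined" entries keyed by (i, j)):

```python
# Memoized (top-down) chain matrix cost: the same O(n^3) result as the
# bottom-up table, but entries are filled in only on demand.

def mem_matrix_chain(p):
    n = len(p) - 1
    m = {}  # "defined" entries only, keyed by the index pair (i, j)

    def lookup(i, j):
        if (i, j) in m:            # already computed: reuse it
            return m[(i, j)]
        if i == j:
            m[(i, j)] = 0          # basis: a single matrix costs nothing
        else:
            best = float("inf")
            for k in range(i, j):  # try all splits
                cost = (lookup(i, k) + lookup(k + 1, j)
                        + p[i - 1] * p[k] * p[j])
                best = min(best, cost)
            m[(i, j)] = best       # stored exactly once
        return m[(i, j)]

    return lookup(1, n)

print(mem_matrix_chain([30, 35, 15, 5, 10, 20, 25]))  # 15125
```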
Fractional knapsack problem
Exhibits the greedy-choice property. ⇒ A greedy algorithm exists.
Exhibits the optimal-substructure property.
0-1 knapsack problem
Exhibits no greedy-choice property. ⇒ No greedy algorithm exists.
Exhibits the optimal-substructure property. ⇒ Only a dynamic programming algorithm exists.
3. Knapsack Problem
Let i be the highest-numbered item in an optimal solution S for weight limit W. Then
S' = S − {i} is an optimal solution for weight limit W − wi, and the value of the solution S is vi
plus the value of the subproblem solution.
We can express this fact in the following formula: define c[i, w] to be the value of the solution
for items 1, 2, . . . , i and maximum weight w. Then
c[i, w] = 0 if i = 0 or w = 0
c[i, w] = c[i − 1, w] if wi > w
c[i, w] = max{vi + c[i − 1, w − wi], c[i − 1, w]} if i > 0 and w ≥ wi
This says that the value of the solution for i items either includes the ith item, in which case it is
vi plus a subproblem solution for (i − 1) items and the weight excluding wi, or does not include
the ith item, in which case it is a subproblem's solution for (i − 1) items and the same weight.
That is, if the thief picks item i, the thief takes vi value, can then choose from items
1, 2, . . . , i − 1 up to the weight limit w − wi, and gets c[i − 1, w − wi] additional value. On the
other hand, if the thief decides not to take item i, the thief can choose from items 1, 2, . . . , i − 1
up to the weight limit w, and gets c[i − 1, w] value. The better of these two choices should be
made.
In the 0-1 knapsack problem, the above formula for c is similar to the LCS formula: boundary
values are 0, and other values are computed from the input and "earlier" values of c. So the 0-1
knapsack algorithm is like the LCS-length algorithm given in CLR for finding a longest
common subsequence of two sequences.
The algorithm takes as input the maximum weight W, the number of items n, and the two
sequences v = <v1, v2, . . . , vn> and w = <w1, w2, . . . , wn>. It stores the c[i, j] values in a table,
that is, a two-dimensional array c[0 . . n, 0 . . W], whose entries are computed in row-major
order. That is, the first row of c is filled in from left to right, then the second row, and so on. At
the end of the computation, c[n, W] contains the maximum value that can be packed into the
knapsack.
Dynamic-0-1-Knapsack(v, w, n, W)
FOR w = 0 TO W
    DO c[0, w] = 0
FOR i = 1 TO n
    DO c[i, 0] = 0
       FOR w = 1 TO W
           DO IF wi ≤ w
                 THEN IF vi + c[i-1, w-wi] > c[i-1, w]
                         THEN c[i, w] = vi + c[i-1, w-wi]
                         ELSE c[i, w] = c[i-1, w]
                 ELSE c[i, w] = c[i-1, w]
The set of items to take can be deduced from the table, starting at c[n, W] and tracing backwards
where the optimal values came from. If c[i, w] = c[i-1, w], item i is not part of the solution, and
we continue tracing with c[i-1, w]. Otherwise item i is part of the solution, and we continue
tracing with c[i-1, w-wi].
Analysis
This Dynamic-0-1-Knapsack algorithm takes Θ(nW) time, broken up as follows: Θ(nW) time to
fill the c-table, which has (n + 1) · (W + 1) entries, each requiring Θ(1) time to compute, and
O(n) time to trace the solution, because the tracing process starts in row n of the table and moves
up one row at each step.
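The table-filling and traceback can be transcribed into Python as follows (our sketch; the item values and weights in the example are made up for illustration):

```python
# 0-1 knapsack DP, filling c row by row; c[i][w] is the best value
# achievable using items 1..i with capacity w.

def knapsack(v, w, W):
    n = len(v)
    c = [[0] * (W + 1) for _ in range(n + 1)]  # row 0 / column 0 stay 0
    for i in range(1, n + 1):
        for cap in range(1, W + 1):
            if w[i - 1] <= cap and v[i - 1] + c[i - 1][cap - w[i - 1]] > c[i - 1][cap]:
                c[i][cap] = v[i - 1] + c[i - 1][cap - w[i - 1]]  # take item i
            else:
                c[i][cap] = c[i - 1][cap]                        # skip item i
    # trace back which items were taken, starting at c[n][W]
    items, cap = [], W
    for i in range(n, 0, -1):
        if c[i][cap] != c[i - 1][cap]:   # item i is part of the solution
            items.append(i)
            cap -= w[i - 1]
    return c[n][W], sorted(items)

best, items = knapsack([60, 100, 120], [10, 20, 30], 50)
print(best, items)  # 220 [2, 3]
```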
ALGORITHMS DESIGN & ANALYSIS TECHNIQUES: Amortized Analysis
In an amortized analysis, the time required to perform a sequence of data structure operations is
averaged over all operations performed. Amortized analysis can be used to show that the
average cost of an operation is small, if one averages over a sequence of operations, even though
a single operation might be expensive. Unlike average-case analysis, which relies on a
probability distribution over inputs, amortized analysis guarantees the 'average' performance of
each operation in the worst case.
CLR covers the three most common techniques used in amortized analysis. The main difference
is the way the cost is assigned.
1. Aggregate Method
o Computes an upper bound T(n) on the total cost of a sequence of n operations.
2. Accounting Method
o Overcharge some operations early in the sequence. This 'overcharge' is used later
in the sequence to pay for operations that are charged less than they actually cost.
3. Potential Method
o Maintain credit as 'potential energy' to pay for future operations.
1. Aggregate Method
Consider a stack with the operations PUSH(s, x) and POP(s), each of cost 1, together with
MULTIPOP(s, k), which pops the top k objects of the stack (or the whole stack if it contains
fewer than k objects):
MULTIPOP(s, k)
    while not STACK-EMPTY(s) and k ≠ 0
        do POP(s)
           k = k − 1
Analysis
i. The worst-case cost of a single MULTIPOP is O(n), so n successive calls to MULTIPOP
would appear to cost O(n²). This O(n²) bound is not tight, because it ignores the fact that
each item can be popped at most once for each time it is pushed.
ii. In a sequence of n mixed operations, the number of items popped is bounded by the
number of items pushed, which is at most n. Since the cost of PUSH and POP is O(1), the
cost of n stack operations is O(n). Therefore, the amortized cost of an operation is the
average: O(n)/n = O(1).
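The aggregate bound can be checked empirically; this Python sketch (ours) runs a random mix of n operations and confirms that the total actual cost never exceeds 2n:

```python
# Aggregate-method check: over any sequence of n PUSH/POP/MULTIPOP
# operations on an initially empty stack, the total actual cost is at
# most 2n, since every popped item was pushed exactly once.

import random

def simulate(num_ops, seed=1):
    random.seed(seed)  # deterministic run for reproducibility
    stack, total_cost = [], 0
    for _ in range(num_ops):
        op = random.choice(["push", "pop", "multipop"])
        if op == "push":
            stack.append(0)
            total_cost += 1
        elif op == "pop":
            if stack:
                stack.pop()
            total_cost += 1                  # POP costs 1
        else:
            k = random.randint(1, 5)
            popped = min(k, len(stack))      # MULTIPOP costs min(k, s)
            del stack[len(stack) - popped:]
            total_cost += popped
    return total_cost

n = 1000
print(simulate(n), "<=", 2 * n)
```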
Next, consider a k-bit binary counter, implemented as an array A[0 . . k − 1] of bits, with an
INCREMENT operation that adds 1 to the counter. A single execution of INCREMENT takes
O(k) in the worst case, when A contains all 1's. Thus, a sequence of n INCREMENT operations
on an initially zero counter takes O(nk) time in the worst case. This bound is correct but not
tight.
Amortized Analysis
We can tighten the analysis to get the worst-case cost for a sequence of n INCREMENTs by
observing that not all bits flip each time INCREMENT is called:
Bit A[0] flips n times (every call).
Bit A[1] flips ⌊n/2¹⌋ times (every other call).
Bit A[2] flips ⌊n/2²⌋ times.
.
.
.
Bit A[i] flips ⌊n/2^i⌋ times.
For i > lg n, bit A[i] never flips at all. The total number of flips in the sequence is thus
∑ i=0..⌊lg n⌋ ⌊n/2^i⌋ < n ∑ i=0..∞ 1/2^i = 2n.
Therefore, the worst-case time for a sequence of n INCREMENT operations on an initially zero
counter is O(n), so the amortized cost of each operation is O(n)/n = O(1).
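The flip-counting argument can be verified directly; this Python sketch (ours) performs n INCREMENTs from a zero counter and counts every bit flip:

```python
# Aggregate analysis of the binary counter: n INCREMENTs starting from
# zero cause fewer than 2n bit flips in total.

def total_flips(n, k=32):
    A = [0] * k          # k-bit counter, initially all zeros
    flips = 0
    for _ in range(n):
        i = 0
        while i < k and A[i] == 1:   # reset the trailing 1s
            A[i] = 0
            flips += 1
            i += 1
        if i < k:
            A[i] = 1                 # set the lowest 0 bit
            flips += 1
    return flips

n = 1000
print(total_flips(n), "< 2n =", 2 * n)
```

For n = 1000 this gives 1994 flips, matching the bound ∑⌊n/2^i⌋ < 2n.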
2. Accounting Method
In this method, we assign charges to different operations, with some operations charged more or
less than they actually cost. In other words, we assign artificial charges to different operations.
Any overcharge for an operation on an item is stored (in a bank account) reserved for
that item.
Later, a different operation on that item can pay for its cost with the credit for that item.
The balance in the (bank) account is not allowed to become negative.
The sum of the amortized costs for any sequence of operations is an upper bound on the
actual total cost of these operations.
The amortized cost of each operation must be chosen wisely, in order to pay for each
operation at or before the cost is incurred.
Application 1: Stack Operation
Recall the actual costs of stack operations were:
PUSH (s, x) 1
POP (s) 1
MULTIPOP (s, k) min(k,s)
For the stack, assign the amortized costs: PUSH 2, POP 0, MULTIPOP 0. When we push an
item, one dollar pays for the actual PUSH and the second dollar is stored as credit on the item;
that credit later pays for popping it (whether by POP or by MULTIPOP). Hence the credit never
goes negative, and for any sequence of n PUSH, POP, and MULTIPOP operations the total
amortized cost, 2n = O(n), is an upper bound on the total actual cost. Since the total amortized
cost is O(n), so is the total actual cost.
As an example, consider a sequence of n operations performed on a data structure, where the ith
operation costs i if i is an exact power of 2, and 1 otherwise. The accounting method determines
the amortized cost per operation as follows. Let the amortized cost per operation be 3. Then the
credit Ci after the ith operation is
Ci = ∑ j=1..i 3 − ∑ j=1..i cj(actual) = 3i − (2^(⌊lg i⌋+1) + i − ⌊lg i⌋ − 2).
If i = 2^k, where k ≥ 0, then
Ci = 3i − (2^(k+1) + i − k − 2) = k + 2.
If i = 2^k + j, where k ≥ 0 and 1 ≤ j < 2^k, then
Ci = 3i − (2^(k+1) + i − k − 2) = 2j + k + 2.
Since k ≥ 0 and j ≥ 1, the credit Ci is always greater than zero. Hence, the total amortized cost,
3n = O(n), is an upper bound on the total actual cost. Therefore, the amortized cost of each
operation is O(n)/n = O(1).
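The accounting argument can be checked by simulation; this Python sketch (ours) charges a flat 3 units per operation for the power-of-2 cost sequence and tracks the bank balance:

```python
# Accounting-method check for the "cost i if i is a power of 2" example:
# charging a flat 3 per operation, the bank balance never goes negative.

def is_power_of_two(i):
    return i & (i - 1) == 0   # valid for i >= 1

def credits(n):
    balance, history = 0, []
    for i in range(1, n + 1):
        actual = i if is_power_of_two(i) else 1
        balance += 3 - actual   # charge 3, pay the actual cost
        history.append(balance)
    return history

hist = credits(64)
print(min(hist))  # 2: the credit after every operation stays positive
```

The minimum balance of 2 occurs right after operation 1, matching the formula Ci = k + 2 at i = 2^0.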
As another example, consider a sequence of stack operations on a stack whose size never
exceeds k. After every k operations, a copy of the entire stack is made. We must show that the
cost of n stack operations, including copying the stack, is O(n), by assigning suitable amortized
costs to the various stack operations.
There are, of course, many ways to assign amortized costs to stack operations. One way is:
PUSH 4,
POP 0,
MULTIPOP 0,
STACK-COPY 0.
Every time we PUSH, we pay one dollar (unit) to perform the actual operation and store one
dollar in the bank. That leaves us with 2 dollars, which are placed on the pushed element, say x.
When we POP x off the stack, one of the two dollars is used to pay for the POP operation and
the other one is again put into the bank account. The money in the bank is used to pay for the
STACK-COPY operations: after every k operations, there are at least k dollars in the bank, and
since the stack size never exceeds k, there are enough dollars (units) in the bank (storage) to pay
for the STACK-COPY operation. The cost of n stack operations, including copying the stack, is
therefore O(n).
Application 2: Binary Counter
INCREMENT(A)
1. i = 0
2. while i < length[A] and A[i] = 1
3.     do A[i] = 0
4.        i = i + 1
5. if i < length[A]
6.     then A[i] = 1
For the accounting method, charge 2 dollars to set a bit to 1: one dollar pays for actually setting
the bit, and the other is placed on the bit as credit. Within the while loop, the cost of resetting the
bits is paid for by the dollars on the bits that are reset. At most one bit is set, in line 6 above, and
therefore the amortized cost of an INCREMENT operation is at most 2 dollars (units). Thus, for
n INCREMENT operations, the total amortized cost is O(n), which bounds the total actual cost.
Consider a Variant
Let us implement a binary counter as a bit vector so that any sequence of n INCREMENT and
RESET operations takes O(n) time on an initially zero counter. The goal here is not only to
increment the counter but also to reset it to zero, that is, to make all bits in the binary counter
zero.
A new field, max[A], holds the index of the high-order 1 in A. Initially, max[A] is set to -1. We
update max[A] appropriately when the counter is incremented (or reset). By doing so, we limit
the cost of RESET to an amount that can be covered by credit from earlier INCREMENTs.
INCREMENT(A)
1. i = 0
2. while i < length[A] and A[i] = 1
3.     do A[i] = 0
4.        i = i + 1
5. if i < length[A]
6.     then A[i] = 1
7.          if i > max[A]
8.             then max[A] = i
9.     else max[A] = -1
Note that lines 7, 8, and 9 are added to the CLR algorithm for the binary counter.
RESET(A)
For i = 0 to max[A]
do A[i] = 0
max[A] = -1
For the counter in CLR we assume that it costs 1 dollar to flip a bit. In addition, we assume that
we need 1 dollar to update max[A]. Setting and resetting of bits work exactly as in the binary
counter in CLR: pay 1 dollar to set a bit to 1 and place another dollar on the same bit as credit,
so that the credit on each bit will pay to reset the bit during a later increment.
In addition, use 1 dollar to update max[A], and if max[A] increases, place 1 dollar as credit on
the new high-order 1. (If max[A] does not increase, we just waste that dollar.) Since RESET
manipulates only bits at positions at or below max[A], and each such bit acquired a dollar of
credit when it was set, every bit seen by RESET has one dollar of credit on it. So, the zeroing of
bits by RESET can be completely paid for by the credit stored on the bits; we just need one
more dollar to pay for resetting max[A].
Thus, charging 4 dollars for each INCREMENT and 1 dollar for each RESET is sufficient, so
any sequence of n INCREMENT and RESET operations takes O(n) amortized time.
3. Potential Method
This method stores pre-payments as potential, or 'potential energy', that can be released to pay
for future operations. The stored potential is associated with the entire data structure rather than
with specific objects within the data structure.
Notation:
D0 is the initial data structure (e.g., stack).
Di is the data structure after the ith operation.
ci is the actual cost of the ith operation.
The potential function Ψ maps each Di to its potential value Ψ(Di).
The amortized cost ĉi of the ith operation w.r.t. the potential function Ψ is defined by
ĉi = ci + Ψ(Di) − Ψ(Di−1) --------- (1)
The amortized cost of each operation is therefore
ĉi = [actual operation cost] + [change in potential].
By eq. (1), the total amortized cost of the n operations is
∑ i=1..n ĉi = ∑ i=1..n (ci + Ψ(Di) − Ψ(Di−1))
            = ∑ i=1..n ci + Ψ(Dn) − Ψ(D0) ----------- (2)
because the intermediate potentials Ψ(D1), Ψ(D2), . . . , Ψ(Dn−1) telescope: each appears once
with a plus sign and once with a minus sign.
If we define a potential function Ψ so that Ψ(Dn) ≥ Ψ(D0), then the total amortized cost
∑ i=1..n ĉi is an upper bound on the total actual cost.
As an example, consider a sequence of n operations performed on a data structure, where the ith
operation costs i if i is an exact power of 2, and 1 otherwise. The potential method determines
the amortized cost per operation as follows.
Let Ψ(Di) = 2i − 2^(⌊lg i⌋+1) + 1 for i ≥ 1, with Ψ(D0) = 0. Since 2^(⌊lg i⌋+1) ≤ 2i for i > 0,
we have Ψ(Di) ≥ 0 = Ψ(D0).
If i = 2^k, where k ≥ 0, then
2^(⌊lg i⌋+1) = 2^(k+1) = 2i and 2^(⌊lg(i−1)⌋+1) = 2^k = i, so
ĉi = ci + Ψ(Di) − Ψ(Di−1)
   = i + (2i − 2i + 1) − (2(i − 1) − i + 1)
   = 2.
If i = 2^k + j, where k ≥ 0 and 1 ≤ j < 2^k, then 2^(⌊lg i⌋+1) = 2^(⌊lg(i−1)⌋+1), and
ĉi = ci + Ψ(Di) − Ψ(Di−1) = 3.
Because ∑ i=1..n ĉi = ∑ i=1..n ci + Ψ(Dn) − Ψ(D0)
and Ψ(Dn) ≥ Ψ(D0), the total amortized cost of the n operations is an upper bound on the total
actual cost. Therefore, the total amortized cost of the sequence of n operations is O(n), and the
amortized cost per operation is O(n)/n = O(1).
Ψ(Di) ≥ 0 = Ψ(D0).
Therefore, the total amortized cost of n operations w.r.t. function Ψ represents an upper bound on
the actual cost.
Amortized costs of stack operations are:
PUSH
If the ith operation on a stack containing s objects is a PUSH operation, then the potential
difference is
Ψ(Di) - Ψ(Di-1) = (s + 1) - s = 1
In simple words, the stack after the (i-1)th operation holds one object fewer. By equation (1),
the amortized cost of this PUSH operation is
ĉi = ci + Ψ(Di) - Ψ(Di-1) = 1 + 1 = 2
MULTIPOP
If the ith operation on the stack is MULTIPOP(S, k), then k′ = min(k, s) objects are popped off
the stack. The actual cost of the operation is k′, and the potential difference is
Ψ(Di) - Ψ(Di-1) = -k′
Why is this negative? Because we are taking items off the stack. Thus, the amortized cost of
the MULTIPOP operation is
ĉi = ci + Ψ(Di) - Ψ(Di-1) = k′ - k′ = 0
POP
Similarly, the amortized cost of a POP operation is 0.
Analysis
Since the amortized cost of each of the three operations is O(1), the total amortized cost of
n operations is O(n), and this total amortized cost is an upper bound on the total
actual cost.
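The O(n) total can be observed concretely. This sketch (class and helper names are my own) tallies the actual cost of every PUSH and MULTIPOP; n pushes followed by one big MULTIPOP cost 2n in total, matching the amortized bound of at most 2 per operation.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Stack with MULTIPOP as analyzed above: PUSH costs 1, MULTIPOP(k) costs
// k' = min(k, s). The running actualCost total lets us confirm that any
// sequence of n operations costs O(n) overall.
public class MultipopStack {
    private final Deque<Integer> stack = new ArrayDeque<>();
    private long actualCost = 0;        // sum of actual operation costs

    public void push(int x) {
        stack.push(x);
        actualCost += 1;                // actual cost of PUSH is 1
    }

    public void multipop(int k) {
        int kPrime = Math.min(k, stack.size());
        for (int i = 0; i < kPrime; i++) stack.pop();
        actualCost += kPrime;           // actual cost is k' = min(k, s)
    }

    public long actualCost() { return actualCost; }

    // Total actual cost of `pushes` PUSHes followed by one MULTIPOP(k).
    public static long demoCost(int pushes, int k) {
        MultipopStack s = new MultipopStack();
        for (int i = 0; i < pushes; i++) s.push(i);
        s.multipop(k);
        return s.actualCost();
    }
}
```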
Lemma If the data structure is a binary heap: show that there is a potential function
Ψ(D) = Θ(n lg n) such that the amortized cost of EXTRACT-MIN is constant.
Proof
We know that the amortized cost ĉi of operation i is defined as
ĉi = ci + Ψ(Di) - Ψ(Di-1)
For the heap operations, this gives us
c1 lg n = c2 lg(n + c3) + Ψ(Di) - Ψ(Di-1) (INSERT) ------------(1)
c4 = c5 lg(n + c6) + Ψ(Di) - Ψ(Di-1) (EXTRACT-MIN) -----(2)
Consider the potential function Ψ(D) = lg(n!), where n is the number of items in D.
From equation (1), we have
c1 lg n - c2 lg(n + c3) = lg(n!) - lg((n-1)!) = lg n.
This clearly holds if c1 = c2 + 1 and c3 = 0.
From equation (2), we have
c4 - c5 lg(n + c6) = lg(n!) - lg((n+1)!) = -lg(n+1).
This clearly holds if c4 = 0 and c5 = c6 = 1.
Remember that Stirling's approximation tells us that lg(n!) = Θ(n lg n), so
Ψ(D) = Θ(n lg n)
And this completes the proof.
Define the potential of the counter after the ith INCREMENT operation to be bi, the number of 1's
in the counter after that operation.
Suppose the ith INCREMENT operation resets ti bits. Its actual cost is then at most ti + 1,
because in addition to resetting ti bits it also sets at most one bit to 1.
Therefore, the number of 1's in the counter after the ith operation is bi ≤ bi-1 - ti + 1, and
the potential difference is
Ψ(Di) - Ψ(Di-1) ≤ (bi-1 - ti + 1) - bi-1 = 1 - ti
Putting this value in equation (1), we get
ĉi = ci + Ψ(Di) - Ψ(Di-1)
   ≤ (ti + 1) + (1 - ti)
   = 2
If counter starts at zero, then Ψ(D0) = 0. Since Ψ(Di) ≥ 0 for all i, the total amortized cost of a
sequence of n INCREMENT operation is an upper bound on the total actual cost, and so the
worst-case cost of n INCREMENT operations is O(n).
If the counter does not start at zero, then there are b0 initial 1's.
After n INCREMENT operations the number of 1's is bn, where 0 ≤ b0, bn ≤ k.
Since Σi=1..n ĉi = Σi=1..n ci + Ψ(Dn) - Ψ(D0), we have
Σi=1..n ci = Σi=1..n ĉi - Ψ(Dn) + Ψ(D0)
We have ĉi ≤ 2 for all 1 ≤ i ≤ n. Since Ψ(D0) = b0 and Ψ(Dn) = bn, the total cost of n
INCREMENT operations is
Σi=1..n ci ≤ Σi=1..n 2 - bn + b0
           = 2n - bn + b0
Note that since b0 ≤ k, if we execute at least n = Ω(k) INCREMENT operations, the total actual
cost is O(n), no matter what the initial value of the counter is.
Implementation of a queue with two stacks, such that the amortized cost of each ENQUEUE and
each DEQUEUE operation is O(1): ENQUEUE pushes an object onto the first stack.
DEQUEUE pops an object off the second stack if it is not empty. If the second stack is empty,
DEQUEUE transfers all objects from the first stack to the second stack and
then pops off the first object. The goal is to show that this implementation has an O(1) amortized
cost for each ENQUEUE and DEQUEUE operation. Suppose Di denotes the state of the stacks
after the ith operation. Define Ψ(Di) to be the number of elements in the first stack. Clearly,
Ψ(D0) = 0 and Ψ(Di) ≥ Ψ(D0) for all i. If the ith operation is an ENQUEUE operation, then
Ψ(Di) - Ψ(Di-1) = 1.
Since the actual cost of an ENQUEUE operation is 1, the amortized cost of an ENQUEUE
operation is 2. If the ith operation is a DEQUEUE, then there are two cases to consider.
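The two-stack queue itself can be sketched as follows (a minimal version; the class and method names are my own, not from the text). Each element is pushed at most twice and popped at most twice over its lifetime, which is the intuition behind the O(1) amortized cost.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Queue built from two stacks: ENQUEUE pushes onto the first stack;
// DEQUEUE pops from the second, transferring everything from the first
// stack only when the second is empty.
public class TwoStackQueue<T> {
    private final Deque<T> in = new ArrayDeque<>();   // first stack
    private final Deque<T> out = new ArrayDeque<>();  // second stack

    public void enqueue(T x) { in.push(x); }

    public T dequeue() {
        if (out.isEmpty()) {
            // Transfer reverses the order, restoring FIFO behaviour.
            while (!in.isEmpty()) out.push(in.pop());
        }
        if (out.isEmpty()) throw new IllegalStateException("queue is empty");
        return out.pop();
    }

    // Small check that FIFO order is preserved across a mid-stream enqueue.
    public static String demoOrder() {
        TwoStackQueue<Integer> q = new TwoStackQueue<>();
        q.enqueue(1);
        q.enqueue(2);
        int a = q.dequeue();   // 1
        q.enqueue(3);
        int b = q.dequeue();   // 2
        int c = q.dequeue();   // 3
        return "" + a + b + c;
    }
}
```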
Suppose you have to make a series of decisions, among various choices, where:
1. You don't have enough information to know what to choose;
2. Each decision leads to a new set of choices; and
3. Some sequence of choices (possibly more than one) may be a solution to your problem.
Backtracking is a methodical way of trying out various sequences of decisions until you
find one that "works". Examples include solving a maze, coloring a map, and solving a puzzle.
Consider a peg puzzle in which all holes but one are filled with white pegs. You can jump over
one peg with another, and jumped pegs are removed. The object is to remove all but the last peg.
You don't have enough information to jump correctly, each choice leads to another set of
choices, and one or more sequences of choices may (or may not) lead to a solution. Many kinds
of puzzles can be solved with backtracking.
Full example: Map coloring
The Four Color Theorem states that any map in a plane can be colored with no more than
four colors, so that no two countries with a common border are the same color.
For most maps, finding a legal coloring is easy; for some maps, it can be fairly difficult to
find a legal coloring.
We will develop a complete Java program to solve this problem.
(The original slides walk through the program in four parts: the data structures, creating the
map, the main program, and checking if a color can be used.)
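A condensed sketch in the spirit of that program (the adjacency-matrix representation and all names here are my own, not the original program's): `solve` tries each of the four colors at a country and backtracks whenever no color is consistent with the already-colored neighbours.

```java
// Backtracking map coloring: color[v] = 0 means "uncolored"; solve(v)
// assigns colors to countries v, v+1, ... and undoes an assignment when
// it leads to a dead end.
public class MapColoring {
    static final int NUM_COLORS = 4;
    private final boolean[][] adjacent;  // adjacent[i][j]: common border?
    private final int[] color;           // 0 = uncolored, 1..4 = colors

    public MapColoring(boolean[][] adjacent) {
        this.adjacent = adjacent;
        this.color = new int[adjacent.length];
    }

    // Can country v take color c, given the colors assigned so far?
    private boolean colorOk(int v, int c) {
        for (int w = 0; w < adjacent.length; w++)
            if (adjacent[v][w] && color[w] == c) return false;
        return true;
    }

    public boolean solve(int v) {
        if (v == adjacent.length) return true;   // every country colored
        for (int c = 1; c <= NUM_COLORS; c++) {
            if (colorOk(v, c)) {
                color[v] = c;
                if (solve(v + 1)) return true;
                color[v] = 0;                    // backtrack
            }
        }
        return false;                            // no color works here
    }

    // Demo on K4 (four mutually bordering countries): a coloring exists
    // and uses four distinct colors.
    public static boolean demo() {
        boolean[][] k4 = new boolean[4][4];
        for (int i = 0; i < 4; i++)
            for (int j = 0; j < 4; j++) k4[i][j] = (i != j);
        MapColoring m = new MapColoring(k4);
        if (!m.solve(0)) return false;
        for (int i = 0; i < 4; i++)
            for (int j = 0; j < 4; j++)
                if (k4[i][j] && m.color[i] == m.color[j]) return false;
        return true;
    }
}
```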
GRAPH ALGORITHMS
Graph theory is an area of mathematics that deals with the following types of problems:
Connection problems
Scheduling problems
Transportation problems
Network analysis
Games and puzzles.
Graph theory also has important applications in critical path analysis, social psychology,
matrix theory, set theory, topology, group theory, molecular chemistry, and searching.
Introduction to Graphs
Definitions
Graphs, vertices and edges
A graph is a collection of nodes called vertices, and the connections between them, called edges.
Undirected and directed graphs
When the edges in a graph have a direction, the graph is called a directed graph or digraph, and
the edges are called directed edges or arcs. Here, I shall be exclusively concerned with directed
graphs, and so when I refer to an edge, I mean a directed edge. This is not a limitation, since an
undirected graph can easily be implemented as a directed graph by adding edges between
connected vertices in both directions.
A representation can often be simplified if it is only being used for undirected graphs, and I'll
mention in passing how this can be achieved.
Neighbours and adjacency
A vertex that is the end-point of an edge is called a neighbour of the vertex that is its starting-
point. The first vertex is said to be adjacent to the second.
An example
The following diagram shows a graph with 5 vertices and 7 edges. The edges between A and D
and B and C are pairs that make a bidirectional connection, represented here by a double-headed
arrow.
Mathematical definition
More formally, a graph is an ordered pair, G = <V, A>, where V is the set of vertices, and A, the
set of arcs, is itself a set of ordered pairs of vertices.
For example, the following expressions describe the graph shown above in set-theoretic
language:
V = {A, B, C, D, E}
A = {<A, B>, <A, D>, <B, C>, <C, B>, <D, A>, <D, C>, <D, E>}
Digraph
A directed graph, or digraph G consists of a finite nonempty set of vertices V, and a finite set of
edges E, where an edge is an ordered pair of vertices in V. Vertices are also commonly referred
to as nodes. Edges are sometimes referred to as arcs.
The definition of a graph implies that a graph can be drawn just from knowing its vertex-set and
its edge-set, as with our first example above.
This section covers following three important topics from algorithmic perspective.
1. Transpose
2. Square
3. Incidence Matrix
1. Transpose
If graph G = (V, E) is a directed graph, its transpose, GT = (V, ET), is the same as graph G with
all arrows reversed. We define the transpose of an adjacency matrix A = (aij) to be the adjacency
matrix AT = (aTij) given by aTij = aji. In other words, the rows of matrix A become the columns
of matrix AT, and the columns of matrix A become the rows of matrix AT. Since in an undirected
graph (u, v) and (v, u) represent the same edge, the adjacency matrix A of an undirected graph
is its own transpose: A = AT.
Formally, the transpose of a directed graph G = (V, E) is the graph GT = (V, ET), where
ET = {(u, v) ∈ V×V : (v, u) ∈ E}. Thus, GT is G with all its edges reversed.
We can compute GT from G in the adjacency matrix representations and adjacency list
representations of graph G.
Algorithm for computing GT from G in representation of graph G is
ALGORITHM MATRIX-TRANSPOSE (G, GT)
For i = 0 to i < V[G]
  For j = 0 to j < V[G]
    GT(j, i) = G(i, j)
    j = j + 1
  i = i + 1
To see why it works, notice that GT(j, i) is set equal to G(i, j) for every pair of indices,
which is exactly the transpose. The time complexity is clearly O(V^2).
For the adjacency-list representation, to see why it works, notice that if an edge exists from u
to v, i.e., v is in the adjacency list of u, then u is present in the adjacency list of v in the
transpose of G.
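The matrix version above translates almost line-for-line into code (a minimal sketch; the class and method names are my own):

```java
// MATRIX-TRANSPOSE on a boolean adjacency matrix: GT(j, i) = G(i, j) for
// every pair of indices, reversing every edge in O(V^2) time.
public class GraphTranspose {
    public static boolean[][] transpose(boolean[][] g) {
        int n = g.length;
        boolean[][] gt = new boolean[n][n];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                gt[j][i] = g[i][j];   // edge (i, j) becomes edge (j, i)
        return gt;
    }

    // Demo: a single edge 0 -> 1 becomes 1 -> 0 in the transpose.
    public static boolean demo() {
        boolean[][] g = new boolean[2][2];
        g[0][1] = true;
        boolean[][] gt = transpose(g);
        return gt[1][0] && !gt[0][1];
    }
}
```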
2. Square
The square of a directed graph G = (V, E) is the graph G2 = (V, E2) such that (a, b) ∈ E2 if and
only if for some vertex c ∈ V, both (a, c) ∈ E and (c, b) ∈ E. That is, G2 contains an edge
between vertex a and vertex b whenever G contains a path with exactly two edges between vertex a
and vertex b.
Algorithm for computing G2 from G in the adjacency-list representation of G:
Create a new array Adj'[A], indexed by V[G]
For each v in V[G] do
  For each u in Adj[v] do
    \\ v has a path of length 2 to each of the neighbors of u
    make a copy of Adj[u] and append it to Adj'[v]
Return Adj'[A]
For each vertex, we must make a copy of at most |E| list elements. The total time is O(|V| * |E|).
Algorithm for Computing G2 from G in the Adjacency-Matrix representation of G.
For i = 1 to V[G]
For j = 1 to V[G]
For k = 1 to V[G]
c[i, j] = c[i, j] + c[i, k] * c[k, j]
3. Incidence Matrix
The incidence matrix of a directed graph G = (V, E) is a V×E matrix B = (bij) such that
bij = -1 if edge j leaves vertex i,
      1 if edge j enters vertex i,
      0 otherwise.
If B is the incidence matrix and BT is its transpose, the diagonal of the product matrix BBT
represents the degrees of all the nodes, i.e., if P is the product matrix BBT, then P[i, i]
represents the degree of node i.
Specifically, we have
BBT(i, j) = Σe∈E bie bTej = Σe∈E bie bje
Now,
If i = j, then bie bje = 1 whenever edge e enters or leaves vertex i, and 0 otherwise.
If i ≠ j, then bie bje = -1 when e = (i, j) or e = (j, i), and 0 otherwise.
Therefore
BBT(i, j) = deg(i) = in_deg + out_deg if i = j
          = -(# of edges connecting i and j) if i ≠ j
To keep track of its progress, breadth-first search colors each vertex. Each vertex of the graph
is in one of three states:
1. Undiscovered;
2. Discovered but not fully explored; and
3. Fully explored.
The state of a vertex, u, is stored in a color variable as follows:
1. color[u] = White - for the "undiscovered" state,
2. color[u] = Gray - for the "discovered but not fully explored" state, and
3. color[u] = Black - for the "fully explored" state.
The BFS(G, s) algorithm develops a breadth-first search tree with the source vertex, s, as its root.
The parent or predecessor of any other vertex in the tree is the vertex from which it was first
discovered. For each vertex, v, the parent of v is placed in the variable π[v]. Another variable,
d[v], computed by BFS contains the number of tree edges on the path from s to v. The breadth-
first search uses a FIFO queue, Q, to store gray vertices.
16. π[v] ← u
17. ENQUEUE(Q, v)
18. DEQUEUE(Q)
19. color[u] ← BLACK
Example: The following figure (from CLRS) illustrates the progress of breadth-first search on
the undirected sample graph.
a. After initialization (paint every vertex white, set d[u] to infinity for each vertex u, and set the
parent of every vertex to be NIL), the source vertex is discovered in line 5. Lines 8-9 initialize Q
to contain just the source vertex s.
b. The algorithm discovers all vertices 1 edge from s i.e., discovered all vertices (w and r) at
level 1.
d. The algorithm discovers all vertices 2 edges from s i.e., discovered all vertices (t, x, and v) at
level 2.
g. The algorithm discovers all vertices 3 edges from s i.e., discovered all vertices (u and y) at
level 3.
i. The algorithm terminates when every vertex has been fully explored.
Analysis
The while-loop in breadth-first search is executed at most |V| times, because every vertex is
enqueued at most once. So, we have O(V).
The for-loop inside the while-loop is executed at most |E| times if G is a directed graph, or
2|E| times if G is undirected, because every vertex is dequeued at most once and we examine
edge (u, v) only when u is dequeued. Therefore, every edge is examined at most once if the graph
is directed, and at most twice if it is undirected. So, we have O(E).
Therefore, the total running time for breadth-first search traversal is O(V + E).
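The traversal can be sketched compactly over adjacency lists (names are my own; the colors of the text are represented implicitly: a vertex whose d value is still "infinity" is white, a vertex on the queue is gray, a dequeued vertex is black):

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;

// BFS(G, s) computing d[v], the number of tree edges on the path from s
// to v, in O(V + E) time using a FIFO queue of discovered vertices.
public class Bfs {
    public static int[] distances(List<List<Integer>> adj, int s) {
        int n = adj.size();
        int[] d = new int[n];
        Arrays.fill(d, Integer.MAX_VALUE);   // "white": not yet discovered
        d[s] = 0;
        Queue<Integer> q = new ArrayDeque<>();
        q.add(s);
        while (!q.isEmpty()) {
            int u = q.remove();              // u is now fully explored
            for (int v : adj.get(u)) {
                if (d[v] == Integer.MAX_VALUE) {  // discover white vertex v
                    d[v] = d[u] + 1;
                    q.add(v);
                }
            }
        }
        return d;
    }

    // Demo on the path 0 - 1 - 2: the shortest path from 0 to 2 has 2 edges.
    public static int demo() {
        List<List<Integer>> adj = List.of(List.of(1), List.of(0, 2), List.of(1));
        return distances(adj, 0)[2];
    }
}
```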
Lemma 22.3 (CLRS) At any time during the execution of BFS suppose that Q contains the
vertices {v1, v2, ..., vr} with v1 at the head and vr at the tail. Then d[v1] ≤ d[v2] ≤ ... ≤ d[vr] ≤ d[v1]
+ 1.
Let v be any vertex in V[G]. If v is reachable from s then let δ(s, v) be the minimum number of
edges in E[G] that must be traversed to go from vertex s to vertex v. If v is not reachable from s
then let δ(s, v) = ∞.
Theorem 22.5 (CLRS) If BFS is run on graph G from a source vertex s in V[G] then for all v
in V[G], d[v] = δ(s, v) and if v ≠ s is reachable from s then one of the shortest paths from s to v is
a shortest path from s to π[v] followed by the edge from π[v] to v.
BFS builds a tree called a breadth-first tree containing all vertices reachable from s. The set
of edges in the tree (called tree edges) contains (π[v], v) for all v where π[v] ≠ NIL.
If v is reachable from s then there is a unique path of tree edges from s to v. Print-Path(G, s, v)
prints the vertices along that path in O(|V|) time.
Print-Path(G, s, v)
  if v = s
    then print s
  else if π[v] = NIL
    then print "no path exists from " s " to " v
  else Print-Path(G, s, π[v])
    print v
In our course, we will use BFS in the following:
Prim's MST algorithm. (CLRS, Chapter 23.)
Dijkstra's single source shortest path algorithm. (CLRS, Chapter 24.)
if partition[u] = partition[v]
  then return 0
  else if color[v] = WHITE
    then color[v] ← GRAY
      d[v] ← d[u] + 1
      partition[v] ← 3 − partition[u]
      ENQUEUE(Q, v)
DEQUEUE(Q)
color[u] ← BLACK
return 1
Correctness
As Bipartite(G, S) traverses the graph, it labels the vertices with a partition number consistent
with the graph being bipartite. If at any vertex the algorithm detects an inconsistency, it
signals this with an invalid return value. The partition value of u will always be a valid
number, since u was enqueued at some point and its partition was assigned at that point. At line
19, the partition of v is left unchanged if it is already set; otherwise it is set to the value
opposite to that of vertex u.
Analysis
The lines added to BFS algorithm take constant time to execute and so the running time is the
same as that of BFS which is O(V + E).
2. Diameter of Tree
The diameter of a tree T = (V, E) is the largest of all shortest-path distances in the tree and
is given by max[dist(u, v)]. As we have mentioned, BFS can be used to compute, for every vertex
in a graph, a path with the minimum number of edges between the start vertex and the current
vertex. It is therefore quite easy to compute the diameter of a tree: for each vertex in the
tree, we use the BFS algorithm to get a shortest-path length, and using a global variable we
record the largest of all shortest-paths.
ALGORITHM: TREE_DIAMETER (T)
maxlength ← 0
for s ← 0 to s < |V[T]|
  do temp ← BFS(T, s)
    if maxlength < temp
      maxlength ← temp
    increment s by 1
return maxlength
Analysis
This clearly takes O(V(V + E)) time.
2. Depth-First Search
Depth-first search is a systematic way to find all the vertices reachable from a source vertex,
s. Historically, depth-first search was first stated formally hundreds of years ago as a method
for traversing mazes. Like breadth-first search, DFS traverses a connected component of a given
graph and defines a spanning tree. The basic idea of depth-first search is this: it methodically
explores every edge, starting over from different vertices as necessary. As soon as we discover
a vertex, DFS starts exploring from it (unlike BFS, which puts a vertex on a queue so that it
explores from it later).
way back to the original source vertex, s, it has built a DFS tree of all vertices reachable from
that source. If there are still undiscovered vertices in the graph, it selects one of them as the
source for another DFS tree. The result is a forest of DFS-trees.
Note that the edges that lead to new vertices are called discovery or tree edges, and the edges
that lead to already visited (painted) vertices are called back edges.
Like BFS, to keep track of progress depth-first-search colors each vertex. Each vertex of the
graph is in one of three states:
1. Undiscovered;
2. Discovered but not finished (not done exploring from it); and
3. Finished (have found everything reachable from it) i.e. fully explored.
The state of a vertex, u, is stored in a color variable as follows:
1. color[u] = White - for the "undiscovered" state,
2. color[u] = Gray - for the "discovered but not finished" state, and
3. color[u] = Black - for the "finished" state.
Like BFS, depth-first search uses π[v] to record the parent of vertex v. We have π[v] = NIL if and
only if vertex v is the root of a depth-first tree.
DFS time-stamps each vertex when its color is changed.
1. When vertex v is changed from white to gray the time is recorded in d[v].
2. When vertex v is changed from gray to black the time is recorded in f[v].
The discovery and the finish times are unique integers, where for each vertex the finish time is
always after the discovery time. That is, each time-stamp is an unique integer in the range of 1 to
2|V| and for each vertex v, d[v] < f[v]. In other words, the following inequalities hold:
1 ≤ d[v] < f[v] ≤ 2|V|
DFS(G)
1. for each vertex u in V[G]
2. do color[u] ← WHITE
3. π[u] ← NIL
4. time ← 0
5. for each vertex u in V[G]
6. do if color[u] = WHITE
7. then DFS-Visit(u) ▷ build a new DFS-tree from u
DFS-Visit(u)
1. color[u] ← GRAY ▷ discover u
2. time ← time + 1
3. d[u] ← time
4. for each vertex v adjacent to u ▷ explore (u, v)
5. do if color[v] = WHITE
6. then π[v] ← u
7. DFS-Visit(v)
8. color[u] ← BLACK
9. time ← time + 1
10. f[u] ← time ▷ we are done with u
Example (CLRS): In the following figure, the solid edge represents discovery or tree edge and
the dashed edge shows the back edge. Furthermore, each vertex has two time stamps: the first
time-stamp records when vertex is first discovered and second time-stamp records when the
search finishes examining adjacency list of vertex.
Analysis
The analysis is similar to that of BFS analysis. The DFS-Visit is called (from DFS or from itself)
once for each vertex in V[G] since each vertex is changed from white to gray once. The for-loop
in DFS-Visit is executed a total of |E| times for a directed graph or 2|E| times for an undirected
graph since each edge is explored once. Moreover, initialization takes Θ(|V|) time. Therefore, the
running time of DFS is Θ(V + E).
Note that it is Θ, not just O, since DFS is guaranteed to examine every vertex and edge.
Consider vertices u and v in V[G] after a DFS, and suppose vertex v is a descendant of vertex u
in some DFS-tree. Then we have d[u] < d[v] < f[v] < f[u] because of the following reasons:
1. Vertex u was discovered before vertex v; and
2. Vertex v was fully explored before vertex u was fully explored.
Note that the converse also holds: if d[u] < d[v] < f[v] < f[u], then vertex v is in the same
DFS-tree as vertex u and vertex v is a descendant of vertex u.
Suppose vertex u and vertex v are in different DFS-trees or suppose vertex u and vertex v are in
the same DFS-tree but neither vertex is the descendent of the other. Then one vertex was
discovered and fully explored before the other was discovered i.e., f[u] < d[v] or f[v] < d[u].
Parenthesis Theorem For all u, v, exactly one of the following holds:
1. d[u] < f[u] < d[v] < f[v] or d[v] < f[v] < d[u] < f[u] and neither of u and v is a descendant of the
other.
2. d[u] < d[v] < f[v] < f[u] and v is a descendant of u.
3. d[v] < d[u] < f[u] < f[v] and u is a descendant of v.
[Proof omitted.]
So, d[u] < d[v] < f[u] < f[v] cannot happen. Like parentheses: ( ) [], ( [ ] ), and [ ( ) ] are OK but
( [ ) ] and [ ( ] ) are not OK.
Corollary Vertex v is a proper descendant of u if and only if d[u] < d[v] < f[v] < f[u].
White-path Theorem Vertex v is a descendant of u if and only if at time d[u], there is a path u
to v consisting of only white vertices. (Except for u, which was just colored gray.)
[Proof omitted.]
Consider a directed graph G = (V, E). After a DFS of graph G we can put each edge into one of
four classes:
1. A tree edge is an edge in a DFS-tree.
2. A back edge connects a vertex to an ancestor in a DFS-tree. Note that a self-loop is a back
edge.
3. A forward edge is a non-tree edge that connects a vertex to a descendent in a DFS-tree.
4. A cross edge is any other edge in graph G. It connects vertices in two different DFS-tree or
two vertices in the same DFS-tree neither of which is the ancestor of the other.
Lemma 1 An Edge (u, v) is a back edge if and only if d[v] < d[u] < f[u] < f[v].
Proof
(=> direction) From the definition of a back edge, it connects vertex u to an ancestor vertex v in
a DFS-tree. Hence, vertex u is a descendent of vertex v. Corollary 22.8 in the CLRS (or see
above) states that vertex u is a proper descendent of vertex v if and only if d[v] < d[u] < f[u] <
f[v]. Hence proved forward direction.
(<= direction) Again by the Corollary 22.8 (CLRS), vertex u is a proper descendent of vertex v.
Hence if an edge (u, v) exists from u to v then it is an edge connecting a descendent vertex u to
its ancestor vertex v. Hence, it is a back edge. Hence proved backward direction.
Conclusion: Immediate from both directions.
Lemma 2 An edge (u, v) is a cross edge if and only if d[v] < f[v] < d[u] < f[u].
Proof
First take => direction.
Observation 1 For an edge (u, v), d[u] < f[u] and d[v] < f[v], since any vertex has to be
discovered before we can finish exploring it.
Observation 2 From the definition of a cross edge it is an edge which is not a tree edge, forward
edge or a backward edge. This implies that none of the relationships for forward edge {d[u] <
d[v] < f[v] < f[u]} or back edge {d[v] < d[u] < f[u] < f[v]} can hold for a cross edge.
From the above two observations we conclude that the only two possibilities are:
1. d[u] < f[u] < d[v] < f[v]
2. d[v] < f[v] < d[u] < f[u]
When the cross edge (u, v) is discovered we must be at vertex u and vertex v must be black. The
reason is that if v was white then edge (u, v) would be a tree edge and if v was gray edge (u, v)
would be a back edge. Therefore, d[v] < d[u] and hence possibility (2) holds true.
Now take <= direction.
We can prove this direction by eliminating the various possible edges that the given relation can
convey. If d[v] < f[v] < d[u] < f[u], then edge (u, v) cannot be a tree or a forward edge. Also,
it cannot be a back edge by Lemma 1. Since edge (u, v) is not a tree, forward, or back edge, it
must be a cross edge (if confused, please go above and look again at the definition of a cross
edge).
Conclusion: Immediately from both directions.
DFS-Visit can be modified to classify the edges of a directed graph during the depth first search:
DFS-Visit(u) ▷ with edge classification. G must be a directed graph
1. color[u] ← GRAY
2. time ← time + 1
3. d[u] ← time
4. for each vertex v adjacent to u
5. do if color[v] = BLACK
6. then if d[u] < d[v]
7. then Classify (u, v) as a forward edge
8. else Classify (u, v) as a cross edge
9. if color[v] = GRAY
10. then Classify (u, v) as a back edge
11. if color[v] = WHITE
12. then π[v] ← u
13. Classify (u, v) as a tree edge
14. DFS-Visit(v)
15. color[u] ← BLACK
16. time ← time + 1
17. f[u] ← time
Theorem In a depth-first search of an undirected graph G, every edge in E[G] is either a tree
edge or a back edge. No forward or cross edges.
[Proof omitted.]
Algorithms based on DFS
Based upon DFS, there are O(V + E)-time algorithms for the following problems:
Testing whether graph is connected.
Computing a spanning forest of G.
Computing the connected components of G.
Computing a path between two vertices of G or reporting that no such path exists.
Computing a cycle in G or reporting that no such cycle exists.
Application
As an application of DFS, let's determine whether or not an undirected graph contains a cycle.
It is not difficult to see that the algorithm for this problem is very similar to DFS(G), except
that a cycle is detected when the vertex at the other end of an edge is already GRAY. While doing
this, the algorithm also takes care not to report a cycle when the GRAY vertex was reached along
the tree edge from an ancestor to a descendant, i.e., the edge we just traversed.
if color[v] = WHITE
  do predecessor[v] ← u
    recursively DFS_visit(v)
color[u] ← BLACK
f[u] ← time ← time + 1
Correctness
To see why this algorithm works, suppose the node v to be visited is gray. Then there are two
possibilities. The first possibility is that node v is the parent of u, and we are going back
along the tree edge that we traversed when visiting u after visiting v. In that case, it is not
a cycle. The second possibility is that v has already been encountered during DFS_visit, and the
edge we are traversing now is a back edge; hence a cycle is detected.
Time Complexity
The maximum number of edges in the graph G if it does not have a cycle is |V| − 1. If G has a
cycle, then the number of edges exceeds this number. Hence, the algorithm detects a cycle at the
|V|th edge at the latest, if not before. Therefore, the algorithm runs in O(V) time.
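The two cases of the correctness argument map directly onto code. A sketch (names are my own; it assumes a simple graph, so skipping the parent suffices to ignore the tree edge we arrived by):

```java
import java.util.List;

// Cycle detection in an undirected graph via DFS: a cycle exists exactly
// when a gray (in-progress) vertex is reached by any edge other than the
// tree edge back to the parent.
public class CycleCheck {
    public static boolean hasCycle(List<List<Integer>> adj) {
        int[] color = new int[adj.size()];   // 0 white, 1 gray, 2 black
        for (int u = 0; u < adj.size(); u++)
            if (color[u] == 0 && visit(adj, color, u, -1)) return true;
        return false;
    }

    private static boolean visit(List<List<Integer>> adj, int[] color,
                                 int u, int parent) {
        color[u] = 1;                        // discover u (gray)
        for (int v : adj.get(u)) {
            if (v == parent) continue;       // the tree edge we arrived by
            if (color[v] == 1) return true;  // back edge: cycle found
            if (color[v] == 0 && visit(adj, color, v, u)) return true;
        }
        color[u] = 2;                        // u fully explored (black)
        return false;
    }
}
```

A triangle (0-1, 1-2, 2-0) reports a cycle; the path 0-1-2 does not.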
Decomposing a directed graph into its strongly connected components is a classic application of
depth-first search. The problem of finding connected components is at the heart of many graph
applications. Generally speaking, the connected components of a graph correspond to different
classes of objects. The first linear-time algorithm for strongly connected components is due to
Tarjan (1972). The algorithm in CLRS, due to Sharir and Kosaraju, is perhaps the easiest to
program for finding strongly connected components.
Given a digraph or directed graph G = (V, E), a strongly connected component (SCC) of G is a
maximal set of vertices C, a subset of V, such that for all u, v in C, both u ⇝ v and v ⇝ u;
that is, both u and v are reachable from each other. In other words, two vertices of a directed
graph are in the same component if and only if they are reachable from each other.
C1 C2 C3 C4
The above directed graph has 4 strongly connected components: C1, C2, C3 and C4. If G has an
edge from some vertex in Ci to some vertex in Cj where i ≠ j, then one can reach any vertex in Cj
from any vertex in Ci but not return. In the example, one can reach any vertex in C2 from any
vertex in C1 but cannot return to C1 from C2.
The algorithm in CLRS for finding the strongly connected components of G = (V, E) uses the
transpose of G, which is defined as:
GT = (V, ET), where ET = {(u, v): (v, u) in E}.
GT is G with all edges reversed.
From the given graph G, one can create GT in linear time (i.e., Θ(V + E)) using adjacency lists.
Observation:
The graphs G and GT have the same SCC's. This means that vertices u and v are reachable from
each other in G if and only if they are reachable from each other in GT.
Component Graph
The idea behind the computation of SCC comes from a key property of the component graph,
which is defined as follows:
GSCC = (VSCC, ESCC), where VSCC has one vertex for each SCC in G and ESCC has an edge if
there's an edge between the corresponding SCC's in G.
For our example (above) the GSCC is:
The key property of GSCC is that the component graph is a dag, which the following lemma
implies.
Lemma GSCC is a dag. More formally, let C and C' be distinct SCC's in G, let u, v be in C, let
u', v' be in C', and suppose there is a path u ⇝ u' in G. Then there cannot also be a path
v' ⇝ v in G.
Proof Suppose there is a path v' ⇝ v in G. Then there are paths u ⇝ u' ⇝ v' and v' ⇝ v ⇝ u in G.
Therefore, u and v' are reachable from each other, so they are not in separate SCC's.
This completes the proof.
ALGORITHM
A DFS(G) produces a forest of DFS-trees. Let C be any strongly connected component of G, let v
be the first vertex of C discovered by the DFS, and let T be the DFS-tree containing v. When
DFS-Visit(v) is called, all vertices in C are reachable from v along paths containing only white
vertices, so DFS-Visit(v) will visit every vertex in C and add it to T as a descendant of v.
STRONGLY-CONNECTED-COMPONENTS (G)
1. Call DFS(G) to compute finishing times f[u] for all u.
2. Compute GT
3. Call DFS(GT), but in the main loop, consider vertices in order of decreasing f[u] (as
computed in first DFS)
4. Output the vertices in each tree of the depth-first forest formed in second DFS as a separate
SCC.
Time: The algorithm takes linear time, i.e., Θ(V + E), to compute the SCC's of a digraph G.
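The four steps can be sketched as follows (a compact version with names of my own; the finish-time order of the first DFS is kept on a stack so the second DFS pops vertices in decreasing finish time):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// STRONGLY-CONNECTED-COMPONENTS: (1) DFS to order vertices by decreasing
// finish time, (2) build GT, (3) DFS GT in that order; each tree of the
// second DFS is one SCC. Every step is linear, so the whole is O(V + E).
public class Scc {
    public static List<List<Integer>> components(List<List<Integer>> adj) {
        int n = adj.size();
        boolean[] seen = new boolean[n];
        Deque<Integer> order = new ArrayDeque<>();  // top = latest finisher
        for (int u = 0; u < n; u++) if (!seen[u]) dfs(adj, u, seen, order);

        List<List<Integer>> rev = new ArrayList<>();  // GT: edges reversed
        for (int u = 0; u < n; u++) rev.add(new ArrayList<>());
        for (int u = 0; u < n; u++)
            for (int v : adj.get(u)) rev.get(v).add(u);

        boolean[] seen2 = new boolean[n];
        List<List<Integer>> sccs = new ArrayList<>();
        while (!order.isEmpty()) {              // decreasing finish time
            int u = order.pop();
            if (!seen2[u]) {
                Deque<Integer> tree = new ArrayDeque<>();
                dfs(rev, u, seen2, tree);       // one DFS-tree = one SCC
                sccs.add(new ArrayList<>(tree));
            }
        }
        return sccs;
    }

    private static void dfs(List<List<Integer>> adj, int u,
                            boolean[] seen, Deque<Integer> out) {
        seen[u] = true;
        for (int v : adj.get(u)) if (!seen[v]) dfs(adj, v, seen, out);
        out.push(u);                            // u finished: record it
    }
}
```

On the cycle 0 → 1 → 2 → 0 with an extra edge 2 → 3, this finds two components: {0, 1, 2} and {3}.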
From our Example (above):
1. Do DFS
2. GT
3. DFS (roots blackened)
Another Example (CLRS) Consider a graph G = (V, E).
1. Call DFS(G)
2. Compute GT
3. Call DFS(GT), but this time consider the vertices in order of decreasing finish time.
4. Output the vertices of each tree in the DFS-forest as a separate strongly connected
components.
{a, b, e}, {c, d}, {f, g}, and {h}
Idea By considering vertices in second DFS in decreasing order of finishing times from first
DFS, we are visiting vertices of the component graph in topological sort order.
To prove that it really works, first we deal with two notational issues:
We will be discussing d[u] and f[u]. These always refer to the first DFS in the above
algorithm.
We extend the notation for d and f to sets of vertices U, a subset of V:
o d(U) = min u in U {d[u]} (earliest discovery time of any vertex in U)
o f(U) = max u in U {f[u]} (latest finishing time of any vertex in U)
Lemma Let C and C' be distinct SCC's in G = (V, E). Suppose there is an edge (u, v) in E such
that u in C and v in C'. Then f(C) > f(C').
Proof There are two cases, depending on which SCC had the first discovered vertex during the
first DFS.
Case i. If d(C) < d(C'), let x be the first vertex discovered in C. At time d[x], all vertices in C and
C' are white. Thus, there exist paths of white vertices from x to all vertices in C and C'.
By the white-path theorem, all vertices in C and C' are descendants of x in the depth-first tree.
By the parenthesis theorem, we have f[x] = f(C) > f(C').
Case ii. If d(C) > d(C'), let y be the first vertex discovered in C'. At time d[y], all vertices in C'
are white and there is a white path from y to each vertex in C. This implies that all vertices in C'
become descendants of y. Again, f[y] = f(C').
At time d[y], all vertices in C are white.
By earlier lemma, since there is an edge (u, v), we cannot have a path from C' to C. So, no vertex
in C is reachable from y. Therefore, at time f[y], all vertices in C are still white. Therefore, for all
w in C, f[w] > f[y], which implies that f(C) > f(C').
This completes the proof.
Corollary Let C and C' be distinct SCC's in G = (V, E). Suppose there is an edge (u, v) in ET
where u in C and v in C'. Then f(C) < f(C').
Proof Edge (u, v) in ET implies (v, u) in E. Since SCC's of G and GT are the same, f(C') > f(C).
This completes the proof.
Corollary Let C and C' be distinct SCC's in G = (V, E), and suppose that f(C) > f(C'). Then
there cannot be an edge from C to C' in GT.
Now, we have the intuition to understand why the SCC procedure works.
When we do the second DFS, on GT, start with SCC C such that f(C) is maximum. The second
DFS starts from some x in C, and it visits all vertices in C. Corollary says that since f(C) > f(C')
for all C' ≠ C, there are no edges from C to C' in GT. Therefore, DFS will visit only vertices in C.
Which means that the depth-first tree rooted at x contains exactly the vertices of C.
The next root chosen in the second DFS is in SCC C' such that f(C') is maximum over all SCC's
other than C. DFS visits all vertices in C', but the only edges out of C' go to C, which we've
already visited.
Therefore, the only tree edges will be to vertices in C'.
We can continue the process.
Each time we choose a root for the second DFS, it can reach only
vertices in its own SCC (we get tree edges to these), and
vertices in SCCs already visited in the second DFS (we get no tree edges to these).
We are visiting the vertices of the component graph of GT in reverse topologically sorted order.
[CLRS has a formal proof.]
Before leaving strongly connected components, let's prove that the component graph of G = (V,
E) is a directed acyclic graph.
Proof (by contradiction) Suppose the component graph of G = (V, E) were not a DAG, i.e., it
contained a cycle through vertices v1, v2, . . . , vn, where each vi corresponds to a distinct SCC
Ci of G. If v1, v2, . . . , vn form a cycle, then every vertex in each Ci can reach, and be reached
from, every vertex in each Cj (i ≠ j), so C1, . . . , Cn would all have to belong to a single SCC.
But each vi was chosen from a different SCC of G. Hence, we have a contradiction! Therefore,
the component graph of G is a directed acyclic graph.
4. Euler Tour
The motivation for this section comes from the famous Königsberg bridge problem, solved by
Leonhard Euler in 1736. The 18th-century Prussian city of Königsberg was situated on the river
Pregel. Within a park built on the banks of the river, there were two islands joined by seven
bridges. The puzzle asks whether it is possible to take a tour through the park, crossing each
bridge exactly once.
An exhaustive search requires starting at every possible point and traversing all the possible
paths from that point - an O(n!) problem. However, Euler showed that an Eulerian path exists
if and only if:
it is possible to go from any vertex to any other by following the edges (the graph must
be connected), and
every vertex has an even number of edges connected to it, with at most two
exceptions (which constitute the starting and ending points).
It is easy to see that these are necessary conditions: to complete the tour, one needs to enter and
leave every point except the start and end points. The proof that these conditions are also
sufficient may be found in the literature. Thus, we now have an O(n) problem to determine
whether a path exists.
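Euler's two conditions can be checked directly. The sketch below assumes an undirected multigraph given as a list of edge pairs; it tests connectivity and counts odd-degree vertices. The Königsberg edge list is the standard one (two land banks plus two islands, seven bridges).

```python
from collections import defaultdict

def has_euler_path(edges):
    """Euler's test for an undirected multigraph given as (u, v) pairs:
    the graph must be connected and have at most two odd-degree vertices."""
    degree = defaultdict(int)
    adj = defaultdict(set)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        adj[u].add(v)
        adj[v].add(u)
    if not degree:
        return True
    # Connectivity: every vertex with an edge must be reachable from any one.
    start = next(iter(degree))
    seen, stack = {start}, [start]
    while stack:
        for v in adj[stack.pop()]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    if len(seen) != len(degree):
        return False
    odd = sum(d % 2 for d in degree.values())
    return odd in (0, 2)

# Königsberg: land masses A-D joined by seven bridges.
KONIGSBERG = [("A","B"), ("A","B"), ("A","C"), ("A","C"),
              ("A","D"), ("B","D"), ("C","D")]
```

For Königsberg all four vertices have odd degree (5, 3, 3, 3), so the check fails, matching Euler's conclusion.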
In order to get a solution, transform the map into a graph in which the nodes represent the "dry
land" points and the arcs represent the bridges.
We can now easily see that the Bridges of Königsberg problem has no solution. A quick
inspection shows, however, that the graph does have a Hamiltonian path.
Definition An Euler tour of a connected, directed graph G = (V, E) is a cycle that traverses each
edge of graph G exactly once, although it may visit a vertex more than once.
In the first part of this section we show that G has an Euler tour if and only if the in-degree of
every vertex equals its out-degree. In the second part, we describe an algorithm to find an Euler
tour of a graph if one exists.
Part 1 Show that G has an Euler tour if and only if in-degree(v) = out-degree(v) for each vertex
v ∈ V.
Proof
First we'll work with the => direction.
We will call a cycle simple if it visits each vertex no more than once, and complex if it can visit a
vertex more than once. Each vertex in a simple cycle has in-degree and out-degree one, and any
complex cycle can be expressed as a union of edge-disjoint simple cycles. This implies that any
vertex in a complex cycle (and in particular in an Euler tour) has in-degree equal to its out-degree.
Thus, if a graph has an Euler tour, then all of its vertices have equal in- and out-degrees.
Now look at the <= direction.
Suppose we have a connected graph in which the in-degree and out-degree of every vertex are
equal. Let C be the longest complex cycle within G. If C is not an Euler tour, then there is a
vertex v of G touched by C such that not all edges into and out of v are exhausted by C. We may
construct a cycle C' in G - C starting and ending at v by performing a walk in G - C. (The reason is
that G - C also has the property that in-degrees and out-degrees are equal.) But then the
complex cycle that starts at v, goes along the edges of C' (returning to v), and then goes along
the edges of C is a longer complex cycle than C. This contradicts our choice of C as the longest
complex cycle, which means that C must have been an Euler tour.
ALGORITHM
Given a starting vertex v0, the algorithm will first find a cycle C starting and ending at v0 such
that C contains all edges going into and out of v0. This can be performed by a walk in the graph.
As we discover vertices in cycle C, we create a linked list which contains the vertices in order
and such that the list begins and ends at vertex v0. We set the pointer "current" to the head of the
list. We now traverse the list by moving "current" to successive vertices until we find
a vertex which has an outgoing edge which has not been discovered. (If we reach the end of the
list, then we have already found the Euler tour.) Suppose we find such a vertex, vi, with an
undiscovered outgoing edge. We then take a walk beginning and ending at vi such that all
undiscovered edges containing vi are contained in the walk. We insert the new linked list into the
old linked list in place of vi and move "current" to the first node of the inserted list. We continue
this process until we reach the final node of the linked list, at which point the list contains an
Euler tour.
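The splicing idea described above is commonly implemented in a compact stack-based form (often attributed to Hierholzer): walk until stuck, then back up, splicing in detours as they are found. A sketch, assuming a directed graph in which a tour exists:

```python
from collections import defaultdict

def euler_tour(edges):
    """Directed Euler tour via the walk-and-splice idea, stack-based form.
    Assumes a tour exists (connected; in-degree == out-degree everywhere)."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
    stack, tour = [edges[0][0]], []
    while stack:
        u = stack[-1]
        if adj[u]:                 # an undiscovered outgoing edge: keep walking
            stack.append(adj[u].pop())
        else:                      # dead end: emit vertex and back up
            tour.append(stack.pop())
    return tour[::-1]              # vertices emitted in reverse tour order
```

For example, `euler_tour([(0, 1), (1, 2), (2, 0), (0, 3), (3, 0)])` returns a closed walk of six vertices that uses each of the five edges exactly once.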
AUTOMATA THEORY
What is Automata Theory?
Study of abstract computing devices, or “machines”
Automaton = an abstract computing device
Note: A “device” need not even be physical hardware!
A fundamental question in computer science:
Find out what different models of machines can do and cannot do
The theory of computation
Computability vs Complexity
The Central Concepts of Automata Theory
Alphabet
An alphabet is a finite, non-empty set of symbols
We use the symbol Σ (sigma) to denote an alphabet
Examples:
o Binary: Σ = {0,1}
o All lower case letters: Σ = {a,b,c,..z}
o Alphanumeric: Σ = {a-z, A-Z, 0-9}
o DNA molecule letters: Σ = {a,c,g,t}
Strings
A string or word is a finite sequence of symbols
chosen from Σ
Empty string is ε (or “epsilon”)
Length of a string w, denoted by “|w|”, is equal to the number of (non-ε) characters in
the string
E.g., x = 010100, |x| = 6
x = 01ε0ε1ε00, |x| = ? (still 6, since the ε’s do not count)
xy = concatenation of two strings x and y
Powers of an alphabet
Let Σ be an alphabet.
Σk = the set of all strings of length k over Σ (with Σ0 = {ε})
Σ* = Σ0 U Σ1 U Σ2U …
Σ+ = Σ1 U Σ2 U Σ3 U …
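For small k, the powers of an alphabet can be enumerated directly. A minimal Python sketch (the function name is illustrative, not from the text):

```python
from itertools import product

def sigma_k(alphabet, k):
    """Sigma^k: the set of all strings of length k over the given alphabet."""
    return {"".join(p) for p in product(alphabet, repeat=k)}
```

For the binary alphabet, `sigma_k("01", 2)` gives {"00", "01", "10", "11"}, and `sigma_k("01", 0)` gives {""}: Sigma^0 contains only the empty string ε.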
Languages
Finite Automata
Some Applications
Software for designing and checking the behavior of digital circuits
Lexical analyzer of a typical compiler
Software for scanning large bodies of text (e.g., web pages) for pattern finding
Software for verifying systems of all types that have a finite number of states (e.g., stock
market transactions, communication/network protocols)
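As a concrete illustration of a finite automaton, here is a tiny DFA simulator and a hypothetical two-state machine (an assumption, not an example from the text) accepting binary strings with an even number of 0s:

```python
def run_dfa(delta, start, accepting, s):
    """Simulate a DFA with transition table delta: {(state, symbol): state}.
    Returns True iff the string s is accepted."""
    state = start
    for ch in s:
        state = delta[(state, ch)]
    return state in accepting

# Hypothetical DFA over {0,1}: accept iff the number of 0s seen is even.
EVEN_ZEROS = {("even", "0"): "odd",  ("even", "1"): "even",
              ("odd",  "0"): "even", ("odd",  "1"): "odd"}
```

For instance, "1001" (two 0s) is accepted, while "10" (one 0) is rejected.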
Structural expressions
Grammars
Regular expressions
E.g., a Unix-style regular expression to capture city names such as “Palo Alto CA”:
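The expression itself appears to have been lost in extraction. One plausible pattern for such names (an assumption, not the slide's original) matches one or more capitalized words followed by a two-letter state code:

```python
import re

# Hypothetical pattern: one or more capitalized words, then a two-letter
# state code (not the slide's original expression).
CITY_STATE = re.compile(r"([A-Z][a-z]* )+[A-Z][A-Z]")
```

`CITY_STATE.fullmatch("Palo Alto CA")` succeeds, while an all-lowercase string does not match.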
Proofs
Terminology
No magic formula
Understand statement:
o Write in own words
o Consider parts separately
Recognize hidden parts:
o iff
o To prove that 2 sets are equal (ie A = B), prove that A ⊆ B and B ⊆ A
Think about why the statement must be true
Consider examples:
o Try examples that have the property
o Try to find examples that don't have the property
Attempt something easier (eg special case)
Write up ideas clearly
Come back to it; be patient
Deductive Proofs
From the given statement(s) to a conclusion statement (what we want to prove)
Logical progression by direct implications
Example for parsing a statement:
“If y ≥ 4, then 2^y ≥ y^2.”
Given: “y ≥ 4”. Conclusion: “2^y ≥ y^2”
(there are other ways of writing this).
Quantifiers
“For all” or “For every”
Universal proofs
Notation: ∀
“There exists”
Used in existential proofs
Notation: ∃
Implication is denoted by =>
E.g., “IF A THEN B” can also be written as “A=>B”
Proving techniques
By contradiction
To prove P, assume ~P and use it to derive a contradiction (i.e., another statement that is
obviously false). Then conclude that P must be true.
Example from the text, revised: Use the "fact" that if it's raining, then everyone who
comes in from the outside has an umbrella to prove it's not raining. First, assume that it
is raining. Therefore, from the "fact" given above, everyone who comes in will have an
umbrella. Someone just came in from outside without an umbrella. This is a
contradiction. Thus we conclude that it is not raining.
o The method of this proof is valid, but the conclusion is only valid if the "fact"
really is a fact!
Example from the net: Prove p: that there is no smallest positive rational number. First
assume p is false, that is, that there is a smallest positive rational. Call the smallest
rational number r...
Example from the text: square root of 2 is irrational
Careful: When using proof by contradiction, mistakes can lead to apparent
contradictions.
For homework: Prove there is no largest prime number. Please work on this yourself and don't
look up a solution.
Another Example
o Start with the statement contradictory to the given statement
o E.g., to prove (A => B), we start with (A and ~B) … and then show that this
could never happen
o What if you want to prove that “(A and B => C or D)”?
By induction
Proof by Induction: Example
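The worked example here was evidently a figure that did not survive extraction. As an illustration of the technique (the claim chosen is an assumption, not necessarily the slide's original), a standard induction proof:

```latex
\textbf{Claim.} For all $n \ge 1$, $\sum_{i=1}^{n} i = \frac{n(n+1)}{2}$.

\textbf{Base case} ($n = 1$): $\sum_{i=1}^{1} i = 1 = \frac{1 \cdot 2}{2}$.

\textbf{Inductive step:} assume the claim holds for $n = k$. Then
\[
  \sum_{i=1}^{k+1} i = \frac{k(k+1)}{2} + (k+1) = \frac{(k+1)(k+2)}{2},
\]
so the claim holds for $n = k+1$. By induction, it holds for all $n \ge 1$.
```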
By counter-example
o Show an example that disproves the claim
o Note: There is no such thing as a “proof by example”!
o So when asked to prove a claim, an example that satisfies that claim is not a proof
Proof by Construction
“If-and-Only-If” statements
o “A if and only if B” (A <==> B)
o (if part) if B then A ( <= )
o (only if part) A only if B ( => )
(same as “if A then B”)
“If and only if” is abbreviated as “iff”
i.e., “A iff B”
Example:
Theorem: Let x be a real number. Then floor of x = ceiling of x if and only if x is an integer.
Proofs for iff have two parts:
o One for the “if part” & another for the “only if part”
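For the floor/ceiling theorem above, the two parts might be sketched as follows (a proof sketch, not the text's original):

```latex
\textbf{(If)} Suppose $x$ is an integer. Then $\lfloor x \rfloor = x = \lceil x \rceil$.

\textbf{(Only if)} Suppose $\lfloor x \rfloor = \lceil x \rceil$. Since
$\lfloor x \rfloor \le x \le \lceil x \rceil$ always holds, equality of the two
endpoints forces $x = \lfloor x \rfloor$, which is an integer. $\blacksquare$
```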
REFERENCES
1. Michael T. Goodrich and Roberto Tamassia, Algorithm Design: Foundations, Analysis,
and Internet Examples.
2. Michael T. Goodrich and Roberto Tamassia, Data Structures and Algorithms in Java.
3. Michael T. Goodrich, Roberto Tamassia, and David M. Mount, Data Structures and
Algorithms in C++.
4. Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein,
Introduction to Algorithms.