Algorithms Made Easy: A beginner's Handbook to easily learn all algorithms and all types of data structures

Contents

About
Chapter 1: Getting started with algorithms
Section 1.1: A sample algorithmic problem
Section 1.2: Getting Started with Simple Fizz Buzz Algorithm in Swift
Chapter 2: Algorithm Complexity
Section 2.1: Big-Theta notation
Section 2.2: Comparison of the asymptotic notations
Section 2.3: Big-Omega Notation
Chapter 3: Big-O Notation
Section 3.1: A Simple Loop
Section 3.2: A Nested Loop
Section 3.3: O(log n) types of Algorithms
Section 3.4: An O(log n) example
Chapter 4: Trees
Section 4.1: Typical n-ary tree representation
Section 4.2: Introduction
Section 4.3: To check if two Binary trees are same or not
Chapter 5: Binary Search Trees
Section 5.1: Binary Search Tree - Insertion (Python)
Section 5.2: Binary Search Tree - Deletion(C++)
Section 5.3: Lowest common ancestor in a BST
Section 5.4: Binary Search Tree - Python
Chapter 6: Check if a tree is BST or not
Section 6.1: Algorithm to check if a given binary tree is BST
Section 6.2: If a given input tree follows Binary search tree property or not
Chapter 7: Binary Tree traversals
Section 7.1: Level Order traversal - Implementation
Section 7.2: Pre-order, Inorder and Post Order traversal of a Binary Tree
Chapter 8: Lowest common ancestor of a Binary Tree
Section 8.1: Finding lowest common ancestor
Chapter 9: Graph
Section 9.1: Storing Graphs (Adjacency Matrix)
Section 9.2: Introduction To Graph Theory
Section 9.3: Storing Graphs (Adjacency List)
Section 9.4: Topological Sort
Section 9.5: Detecting a cycle in a directed graph using Depth First Traversal
Section 9.6: Thorup's algorithm
Chapter 10: Graph Traversals
Section 10.1: Depth First Search traversal function
Chapter 11: Dijkstra’s Algorithm
Section 11.1: Dijkstra's Shortest Path Algorithm
Chapter 12: A* Pathfinding
Section 12.1: Introduction to A*
Section 12.2: A* Pathfinding through a maze with no obstacles
Section 12.3: Solving 8-puzzle problem using A* algorithm
Chapter 13: A* Pathfinding Algorithm
Section 13.1: Simple Example of A* Pathfinding: A maze with no obstacles
Chapter 14: Dynamic Programming
Section 14.1: Edit Distance
Section 14.2: Weighted Job Scheduling Algorithm
Section 14.3: Longest Common Subsequence
Section 14.4: Fibonacci Number
Section 14.5: Longest Common Substring
Chapter 15: Applications of Dynamic Programming
Section 15.1: Fibonacci Numbers
Chapter 16: Kruskal's Algorithm
Section 16.1: Optimal, disjoint-set based implementation
Section 16.2: Simple, more detailed implementation
Section 16.3: Simple, disjoint-set based implementation
Section 16.4: Simple, high level implementation
Chapter 17: Greedy Algorithms
Section 17.1: Human Coding
Section 17.2: Activity Selection Problem
Section 17.3: Change-making problem
Chapter 18: Applications of Greedy technique
Section 18.1: Oine Caching
Section 18.2: Ticket automat
Section 18.3: Interval Scheduling
Section 18.4: Minimizing Lateness
Chapter 19: Prim's Algorithm
Section 19.1: Introduction To Prim's Algorithm
Chapter 20: Bellman–Ford Algorithm
Section 20.1: Single Source Shortest Path Algorithm (Given there is a negative cycle in a graph)
Section 20.2: Detecting Negative Cycle in a Graph
Section 20.3: Why do we need to relax all the edges at most (V-1) times
Chapter 21: Line Algorithm
Section 21.1: Bresenham Line Drawing Algorithm
Chapter 22: Floyd-Warshall Algorithm
Section 22.1: All Pair Shortest Path Algorithm
Chapter 23: Catalan Number Algorithm
Section 23.1: Catalan Number Algorithm Basic Information
Chapter 24: Multithreaded Algorithms
Section 24.1: Square matrix multiplication multithread
Section 24.2: Multiplication matrix vector multithread
Section 24.3: merge-sort multithread
Chapter 25: Knuth Morris Pratt (KMP) Algorithm
Section 25.1: KMP-Example
Chapter 26: Edit Distance Dynamic Algorithm
Section 26.1: Minimum Edits required to convert string 1 to string 2
Chapter 27: Online algorithms
Section 27.1: Paging (Online Caching)
Chapter 28: Sorting
Section 28.1: Stability in Sorting
Chapter 29: Bubble Sort
Section 29.1: Bubble Sort
Section 29.2: Implementation in C & C++
Section 29.3: Implementation in C#
Section 29.4: Python Implementation
Section 29.5: Implementation in Java
Section 29.6: Implementation in Javascript
Chapter 30: Merge Sort
Section 30.1: Merge Sort Basics
Section 30.2: Merge Sort Implementation in Go
Section 30.3: Merge Sort Implementation in C & C#
Section 30.4: Merge Sort Implementation in Java
Section 30.5: Merge Sort Implementation in Python
Section 30.6: Bottom-up Java Implementation
Chapter 31: Insertion Sort
Section 31.1: Haskell Implementation
Chapter 32: Bucket Sort
Section 32.1: C# Implementation
Chapter 33: Quicksort
Section 33.1: Quicksort Basics
Section 33.2: Quicksort in Python
Section 33.3: Lomuto partition java implementation
Chapter 34: Counting Sort
Section 34.1: Counting Sort Basic Information
Section 34.2: Pseudocode Implementation
Chapter 35: Heap Sort
Section 35.1: C# Implementation
Section 35.2: Heap Sort Basic Information
Chapter 36: Cycle Sort
Section 36.1: Pseudocode Implementation
Chapter 37: Odd-Even Sort
Section 37.1: Odd-Even Sort Basic Information
Chapter 38: Selection Sort
Section 38.1: Elixir Implementation
Section 38.2: Selection Sort Basic Information
Section 38.3: Implementation of Selection sort in C#
Chapter 39: Searching
Section 39.1: Binary Search
Section 39.2: Rabin Karp
Section 39.3: Analysis of Linear search (Worst, Average and Best Cases)
Section 39.4: Binary Search: On Sorted Numbers
Section 39.5: Linear search
Chapter 40: Substring Search
Section 40.1: Introduction To Knuth-Morris-Pratt (KMP) Algorithm
Section 40.2: Introduction to Rabin-Karp Algorithm
Section 40.3: Python Implementation of KMP algorithm
Section 40.4: KMP Algorithm in C
Chapter 41: Breadth-First Search
Section 41.1: Finding the Shortest Path from Source to other Nodes
Section 41.2: Finding Shortest Path from Source in a 2D graph
Section 41.3: Connected Components Of Undirected Graph Using BFS
Chapter 42: Depth First Search
Section 42.1: Introduction To Depth-First Search
Chapter 43: Hash Functions
Section 43.1: Hash codes for common types in C#
Section 43.2: Introduction to hash functions
Chapter 44: Travelling Salesman
Section 44.1: Brute Force Algorithm
Section 44.2: Dynamic Programming Algorithm
Chapter 45: Knapsack Problem
Section 45.1: Knapsack Problem Basics
Section 45.2: Solution Implemented in C#
Chapter 46: Equation Solving
Section 46.1: Linear Equation
Section 46.2: Non-Linear Equation
Chapter 47: Longest Common Subsequence
Section 47.1: Longest Common Subsequence Explanation
Chapter 48: Longest Increasing Subsequence
Section 48.1: Longest Increasing Subsequence Basic Information
Chapter 49: Check two strings are anagrams
Section 49.1: Sample input and output
Section 49.2: Generic Code for Anagrams
Chapter 50: Pascal's Triangle
Section 50.1: Pascal triangle in C
Chapter 51: Algo: Print an m*n matrix square-wise
Section 51.1: Sample Example
Section 51.2: Write the generic code
Chapter 52: Matrix Exponentiation
Section 52.1: Matrix Exponentiation to Solve Example Problems
Chapter 53: Polynomial-time bounded algorithm for Minimum Vertex Cover
Section 53.1: Algorithm Pseudo Code
Chapter 54: Dynamic Time Warping
Section 54.1: Introduction To Dynamic Time Warping
Chapter 55: Fast Fourier Transform
Section 55.1: Radix 2 FFT
Section 55.2: Radix 2 Inverse FFT
Appendix A: Pseudocode
Section A.1: Variable affectations
Section A.2: Functions
Chapter 1: Getting started with algorithms
Section 1.1: A sample algorithmic problem
An algorithmic problem is specified by describing the complete set of
instances it must work on and of its output after running on one of these
instances. This distinction, between a problem and an instance of a problem,
is fundamental. The algorithmic problem known as sorting is defined as
follows: [Skiena:2008:ADM:1410219]
Problem: Sorting
Input: A sequence of n keys, a_1, a_2, ..., a_n.
Output: The reordering of the input sequence such that a'_1 <= a'_2 <= ... <= a'_{n-1} <= a'_n.

An instance of sorting might be an array of strings, such as {Haskell, Emacs}, or a sequence of numbers, such as {154, 245, 1337}.
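For instance, any correct sorting algorithm applied to those two instances must produce the non-decreasing reordering that the problem statement asks for. A quick illustration in Python:

# Two instances of the sorting problem from the text.
strings = ["Haskell", "Emacs"]
numbers = [154, 245, 1337]

# Every correct sorting algorithm produces the same reordered output.
print(sorted(strings))   # ['Emacs', 'Haskell']
print(sorted(numbers))   # [154, 245, 1337]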
Section 1.2: Getting Started with Simple Fizz Buzz Algorithm in Swift
For those of you that are new to programming in Swift and those of you
coming from different programming backgrounds, such as Python or Java, this
article should be quite helpful. In this post, we will discuss a simple solution
for implementing this algorithm in Swift.
Fizz Buzz
You may have seen Fizz Buzz written as Fizz Buzz, FizzBuzz, or Fizz-Buzz;
they're all referring to the same thing. That "thing" is the main topic of
discussion today. First, what is FizzBuzz?
This is a common question that comes up in job interviews.
Imagine a series of numbers from 1 to 10.

Fizz and Buzz refer to any number that's a multiple of 3 and 5 respectively.
In other words, if a number is divisible by 3, it is substituted with fizz; if a
number is divisible by 5, it is substituted with buzz. If a number is
simultaneously a multiple of 3 AND 5, the number is replaced with "fizz
buzz." In essence, it emulates the famous children game "fizz buzz".
To work on this problem, open up Xcode to create a new playground and
initialize an array like below:

To find all the fizz and buzz, we must iterate through the array and check
which numbers are fizz and which are buzz. To do this, create a for loop to
iterate through the array we have initialised:

After this, we can simply use the "if else" condition and the modulo operator in
Swift, i.e. %, to locate the fizz and buzz.
Great! You can go to the debug console in Xcode playground to see the
output. You will find that the "fizzes" have been sorted out in your array.
For the Buzz part, we will use the same technique. Let's give it a try before
scrolling through the article — you can check your results against this
article once you've finished doing this.

Check the output!


It's rather straightforward: if the number is divisible by 3, print fizz, and if it is
divisible by 5, print buzz. Now, increase the numbers in the array.

We increased the range of numbers from 1-10 to 1-15 in order to demonstrate


the concept of a "fizz buzz." Since 15 is a multiple of both 3 and 5, the
number should be replaced with "fizz buzz." Try for yourself and check the
answer!
Here is the solution:

Wait... it's not over though! The whole purpose of the algorithm is to
tune the runtime correctly. Imagine the range increases from 1-15 to
1-100. The code would check each number to determine whether it is
divisible by 3 or 5, and would then run through the numbers again to check
whether they are divisible by both 3 and 5. It would essentially have to run
through each number in the array twice: test divisibility by 3 first and then
by 5. To speed up the process, we can simply tell our code to test
divisibility by 15 directly, before the other checks.
Here is the final code:

As simple as that! You can use any language of your choice and get started.
Enjoy coding!
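As a recap of the approach described above (check divisibility by 15 first, then 3, then 5), here is a minimal sketch of the same logic. It is written in Python purely for brevity; the structure carries over directly to a Swift playground.

# Fizz Buzz over a range of numbers, checking the combined case first.
for number in range(1, 16):
    if number % 15 == 0:        # multiple of both 3 and 5
        print("fizz buzz")
    elif number % 3 == 0:       # multiple of 3 only
        print("fizz")
    elif number % 5 == 0:       # multiple of 5 only
        print("buzz")
    else:
        print(number)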
Chapter 2: Algorithm Complexity
Section 2.1: Big-Theta notation
Unlike Big-O notation, which represents only an upper bound on the running
time of some algorithm, Big-Theta is a tight bound: both an upper and a lower
bound. A tight bound is more precise, but also more difficult to compute.

The Big-Theta notation is symmetric: f(x) = Ө(g(x)) <=> g(x) = Ө(f(x))

An intuitive way to grasp it is that f(x) = Ө(g(x)) means that the graphs of f(x)
and g(x) grow at the same rate, or that the graphs 'behave' similarly for big
enough values of x.
The full mathematical expression of the Big-Theta notation is as follows:
Ө(f(x)) = { g: N0 -> R and c1, c2, n0 > 0, where c1 < abs(g(n) / f(n)) < c2, for
every n > n0 } (abs denotes the absolute value)
An example

If the algorithm for the input n takes 42n^2 + 25n + 4 operations to finish, we say that it is
O(n^2), but it is also O(n^3) and O(n^100). However, it is Ө(n^2) and it is not Ө(n^3),
Ө(n^4) etc. An algorithm that is Ө(f(n)) is also O(f(n)), but not vice versa!

Formal mathematical definition

Ө(g(x)) is a set of functions:

Ө(g(x)) = { f(x) such that there exist positive constants c1, c2, N such that
0 <= c1*g(x) <= f(x) <= c2*g(x) for all x > N }
Because Ө(g(x)) is a set, we could write f(x) ∈ Ө(g(x)) to indicate that f(x) is a
member of Ө(g(x)). Instead, we will usually write f(x) = Ө(g(x)) to express the
same notion - that's the common way.
Whenever Ө(g(x)) appears in a formula, we interpret it as standing for some
anonymous function that we do not care to name. For example the equation
T(n) = T(n/2) + Ө(n) means T(n) = T(n/2) + f(n), where f(n) is a function from the set Ө(n).
Let f and g be two functions defined on some subset of the real numbers. We
write f(x) = Ө(g(x)) as x -> infinity if and only if there are positive constants
K and L and a real number x0 such that the following holds:

K|g(x)| <= f(x) <= L|g(x)| for all x >= x0.

The definition is equal to:

f(x) = O(g(x)) and f(x) = Ω(g(x))

A method that uses limits

If limit(x -> infinity) f(x)/g(x) = c, where c ∈ (0, ∞), i.e. the limit exists and is
positive, then f(x) = Ө(g(x)).
Common Complexity Classes

Name          Notation      n = 10       n = 100
Constant      Ө(1)          1            1
Logarithmic   Ө(log(n))     3            7
Linear        Ө(n)          10           100
Linearithmic  Ө(n*log(n))   30           700
Quadratic     Ө(n^2)        100          10 000
Exponential   Ө(2^n)        1 024        1.267650e+30
Factorial     Ө(n!)         3 628 800    9.332622e+157
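To make the tight-bound idea concrete, the following small Python check (an illustration added here, not part of the original text) evaluates the example polynomial 42n^2 + 25n + 4 against g(n) = n^2 and shows that the ratio f(n)/g(n) stays between two positive constants, which is exactly what Ө(n^2) demands.

# Empirically check that f(n) = 42n^2 + 25n + 4 is Ө(n^2):
# the ratio f(n)/n^2 should stay bounded between positive constants.
def f(n):
    return 42 * n**2 + 25 * n + 4

for n in [10, 100, 1000, 10**6]:
    ratio = f(n) / n**2
    print(n, ratio)   # ratios decrease towards 42 and never drop below 42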
Section 2.2: Comparison of the asymptotic notations
Let f(n) and g(n) be two functions defined on the set of the positive real
numbers; c, c1, c2, n0 are positive real constants.

Notation          Definition
f(n) = O(g(n))    there exist c, n0 > 0 such that f(n) <= c*g(n) for all n >= n0
f(n) = Ω(g(n))    there exist c, n0 > 0 such that f(n) >= c*g(n) for all n >= n0
f(n) = Θ(g(n))    there exist c1, c2, n0 > 0 such that c1*g(n) <= f(n) <= c2*g(n) for all n >= n0
f(n) = o(g(n))    for every c > 0 there exists n0 > 0 such that f(n) < c*g(n) for all n >= n0
f(n) = ω(g(n))    for every c > 0 there exists n0 > 0 such that f(n) > c*g(n) for all n >= n0

The asymptotic notations can be represented on a Venn diagram as follows:

Links
Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, Clifford Stein.
Introduction to Algorithms.
Section 2.3: Big-Omega Notation
Ω-notation is used for an asymptotic lower bound.

Formal definition

Let f(n) and g(n) be two functions defined on the set of the positive real
numbers. We write f(n) = Ω(g(n)) if there are positive constants c and n0 such that:

0 ≤ c*g(n) ≤ f(n) for all n ≥ n0.

Notes

f(n) = Ω(g(n)) means that f(n) grows asymptotically no slower than g(n). We can
also state Ω(g(n)) when the analysis of an algorithm is not enough for a statement
about Θ(g(n)) and/or O(g(n)).
From the definitions of the notations follows the theorem:

For any two functions f(n) and g(n) we have f(n) = Ө(g(n)) if and only if
f(n) = O(g(n)) and f(n) = Ω(g(n)).

Graphically Ω -notation may be represented as follows:


For example, let f(n) be a quadratic such as f(n) = 3n^2 + 5n - 4.
Then f(n) = Ω(n^2). It is also correct that f(n) = Ω(n), or even f(n) = Ω(1).

Another example: consider an algorithm for perfect matching. If the number of
vertices is odd then output "No Perfect Matching", otherwise try all possible
matchings.

We would like to say the algorithm requires exponential time, but in fact you
cannot prove an Ω(n^2) lower bound using the usual definition of Ω, since the
algorithm runs in linear time for n odd. We should instead define f(n) = Ω(g(n))
by saying: for some constant c > 0, f(n) ≥ c*g(n) for infinitely many n. This gives a
nice correspondence between upper and lower bounds: f(n) = Ω(g(n)) iff f(n) != o(g(n)).
References
Formal definition and theorem are taken from the book "Thomas H. Cormen,
Charles E. Leiserson, Ronald L. Rivest, Clifford Stein. Introduction to
Algorithms".
Chapter 3: Big-O Notation
Definition
The Big-O notation is at its heart a mathematical notation, used to compare
the rate of convergence of functions.
Let n -> f(n) and n -> g(n) be functions defined over the natural numbers.
Then we say that f = O(g) if and only if f(n)/g(n) is bounded when n approaches
infinity. In other words, f = O(g) if and only if there exists a constant A, such
that for all n, f(n)/g(n) <= A.

Actually the scope of the Big-O notation is a bit wider in mathematics but for
simplicity I have narrowed it to what is used in algorithm complexity
analysis : functions defined on the naturals, that have non-zero values, and
the case of n growing to infinity.
What does it mean?

Let's take the case of f(n) = 100n^2 + 10n + 1 and g(n) = n^2. It is quite clear that both of these
functions tend to infinity as n tends to infinity. But sometimes knowing the
limit is not enough, and we also want to know the speed at which the
functions approach their limit. Notions like Big-O help compare and classify
functions by their speed of convergence.
Let's find out if f = O(g) by applying the definition. We have f(n)/g(n) = 100 + 10/n + 1/n^2.
Since 10/n is 10 when n is 1 and is decreasing, and since 1/n^2 is 1 when n is 1 and is
also decreasing, we have f(n)/g(n) <= 100 + 10 + 1 = 111. The definition is satisfied
because we have found a bound of f(n)/g(n) (111) and so f = O(g) (we say that f is a
Big-O of n^2).


This means that f tends to infinity at approximately the same speed as g. Now
this may seem like a strange thing to say, because what we have found is that
f is at most 111 times bigger than g, or in other words when g grows by 1, f
grows by at most 111. It may seem that growing 111 times faster is not
"approximately the same speed". And indeed the Big-O notation is not a very
precise way to classify function convergence speed, which is why in
mathematics we use the equivalence relationship when we want a precise
estimation of speed. But for the purposes of separating algorithms in large
speed classes, Big-O is enough. We don't need to separate functions that
grow a fixed number of times faster than each other, but only functions that
grow infinitely faster than each other. For instance if we take h(n) = n^2 * log(n),
we see that h(n)/g(n) = log(n), which tends to infinity with n, so h is not O(n^2),
because h grows infinitely faster than n^2.
Now I need to make a side note: you might have noticed that if f = O(g) and
g = O(h), then f = O(h). For instance in our case, we have f = O(n^3), and
f = O(n^4)... In algorithm complexity analysis, we frequently say f = O(g) to
mean that f = O(g) and g = O(f), which can be understood as "g is the
smallest Big-O for f". In mathematics we say that such functions are
Big-Thetas of each other.
How is it used ?
When comparing algorithm performance, we are interested in the number of
operations that an algorithm performs. This is called time complexity. In this
model, we consider that each basic operation (addition, multiplication,
comparison, assignment, etc.) takes a fixed amount of time, and we count the
number of such operations. We can usually express this number as a function
of the size of the input, which we call n. And sadly, this number usually
grows to infinity with n (if it doesn't, we say that the algorithm is O(1)). We
separate our algorithms in big speed classes defined by Big-O : when we
speak about a "O(n^2) algorithm", we mean that the number of operations it
performs, expressed as a function of n, is a O(n^2). This says that our
algorithm is approximately as fast as an algorithm that would do a number of
operations equal to the square of the size of its input, or faster. The "or
faster" part is there because I used Big-O instead of Big-Theta, but usually
people will say Big-O to mean Big-Theta.
When counting operations, we usually consider the worst case: for instance if
we have a loop that can run at most n times and that contains 5 operations,
the number of operations we count is 5n. It is also possible to consider the
average case complexity.
Quick note : a fast algorithm is one that performs few operations, so if the
number of operations grows to infinity faster, then the algorithm is slower:
O(n) is better than O(n^2).
We are also sometimes interested in the space complexity of our
algorithm. For this we consider the number of bytes in memory occupied
by the algorithm as a function of the size of the input, and use Big-O the
same way.
Section 3.1: A Simple Loop
The following function finds the maximal element in an array:
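A minimal sketch of such a function in Python (an illustration; the explicit length variable below plays the role of the len parameter the text refers to):

def find_max(array):
    # Input size: the number of elements in the array.
    length = len(array)
    # Two initial assignments, done only once.
    max_value = array[0]
    i = 1
    # The loop body runs at most (length - 1) times.
    while i < length:
        if max_value < array[i]:   # one comparison per iteration
            max_value = array[i]   # at most one assignment per iteration
        i = i + 1                  # one increment per iteration
    return max_value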

The input size is the size of the array, which I called len in the code.
Let's count the operations.

These two assignments are done only once, so that's 2 operations. The
operations that are looped are a comparison, possibly an assignment, and an
increment: a constant number of operations per iteration. The total is therefore
of the form 2 + c*n for some constant c,
a polynomial where the fastest growing term is a factor of n, so it is O(n).


You probably have noticed that "operation" is not very well
defined. For instance I said that if (max < array[i]) was one operation, but depending on the
architecture this statement can compile to for instance three instructions: one
memory read, one comparison and one branch. I have also considered all
operations as the same, even though for instance the memory operations will
be slower than the others, and their performance will vary wildly due for
instance to cache effects. I also have completely ignored the return statement,
the fact that a frame will be created for the function, etc. In the end it doesn't
matter to complexity analysis, because whatever way I choose to count
operations, it will only change the coefficient of the n factor and the constant,
so the result will still be O(n). Complexity shows how the algorithm scales
with the size of the input, but it isn't the only aspect of performance!
Section 3.2: A Nested Loop
The following function checks if an array has any duplicates by taking each
element, then iterating over the whole array to see if the element is there
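One possible version of this check, sketched in Python:

def has_duplicates(array):
    n = len(array)
    # Outer loop: pick each element in turn.
    for i in range(n):
        # Inner loop: scan the whole array for another copy of array[i].
        for j in range(n):
            if i != j and array[i] == array[j]:
                return True
    return False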

The inner loop performs at each iteration a number of operations that
is constant with n. The outer loop also does a few constant operations, and
runs the inner loop n times. The outer loop itself is run n times. So the
operations inside the inner loop are run n^2 times, the operations in the outer
loop are run n times, and the assignment to i is done one time. Thus, the
complexity will be something like an^2 + bn + c, and since the highest term is n^2, the
O notation is O(n^2).

Obviously, this second version, where the inner loop only scans the part of the
array that has not been compared yet, does fewer operations and so is
more efficient. How does that translate to Big-O notation?
Well, now the inner loop body is run 1 + 2 + ... + (n - 1) times, which sums to
n(n - 1)/2. This is still a polynomial of
the second degree, and so is still only O(n^2). We have clearly lowered the
complexity, since we roughly divided by 2 the number of operations that we
are doing, but we are still in the same complexity class as defined by Big-O.
In order to lower the complexity to a lower class we would need to divide the
number of operations by something that tends to infinity with n.
Section 3.3: O(log n) types of Algorithms
Let's say we have a problem of size n. Now for each step of our
algorithm (which we need to write), our original problem becomes half of its
previous size (n/2).
So at each step, our problem becomes half.

Step   Problem size
1      n/2
2      n/4
3      n/8
4      n/16

When the problem space is fully reduced (i.e. solved completely), it cannot be
reduced any further (n becomes equal to 1), and the algorithm exits its check condition.

1. Let's say at the kth step (or after k operations): problem-size = 1

2. But we know that at the kth step, our problem-size should be: problem-size = n/2^k

3. From 1 and 2:

   n/2^k = 1  or  n = 2^k

4. Take log on both sides:

   log_e n = k * log_e 2  or  k = log_e n / log_e 2

5. Using the formula log_x m / log_x n = log_n m:

   k = log_2 n  or simply  k = log n

Now we know that our algorithm can run at most up to log n steps, hence the
time complexity comes out as O(log n).

A very simple example in code to support the above text is:
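One such loop, sketched here in Python, halves the problem size until it reaches 1 and counts the steps:

def halving_steps(n):
    steps = 0
    # Each iteration cuts the problem size in half.
    while n > 1:
        n = n // 2
        steps += 1
    return steps

print(halving_steps(256))  # 8, because 2^8 = 256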

So now if someone asks you, if n is 256, how many steps that loop (or any
other algorithm that cuts its problem size in half) will run, you can
very easily calculate:

k = log_2 256
k = log_2 2^8   (since log_a a = 1)
k = 8

Another very good example of a similar case is the Binary Search Algorithm.
Section 3.4: An O(log n) example
Introduction
Consider the following problem:

L is a sorted list containing n signed integers (n being big enough), for


example [-5, -2, -1, 0, 1, 2, 4] (here, n has a value of 7). If L is known to contain
the integer 0, how can you find the index of 0 ?
Naïve approach
The first thing that comes to mind is to just read every index until 0 is found.
In the worst case, the number of operations is n, so the complexity is O(n).
This works fine for small values of n, but is there a more efficient way ?
Dichotomy
Consider the following algorithm (Python3):
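A sketch of such a dichotomic search in Python (using a list L and two indices a and b, the names used in the discussion below; this is an illustration of the idea rather than the original listing):

def find_zero_index(L):
    # a and b are the indexes between which 0 is to be found.
    a = 0
    b = len(L) - 1
    while a < b:
        # Pick an index between a and b and narrow the area to be searched.
        m = (a + b) // 2
        if L[m] < 0:
            a = m + 1
        elif L[m] > 0:
            b = m - 1
        else:
            return m
    return a  # here L[a] == 0, since L is known to contain 0

print(find_zero_index([-5, -2, -1, 0, 1, 2, 4]))  # 3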

a and b are the indexes between which 0 is to be found. Each time we enter
the loop, we use an index between a and b and use it to narrow the area to be
searched.
In the worst case, we have to wait until a and b are equal. But how many
operations does that take? Not n, because each time we enter the loop, we
divide the distance between a and b by about two. Rather, the complexity is
O(log n).
Explanation
Note: When we write "log", we mean the binary logarithm, or log base 2
(which we will write "log_2"). As O(log_2 n) = O(log n) (you can do the
math) we will use "log" instead of "log_2".
Let's call x the number of operations: we know that 1 = n / (2^x).

So 2^x = n,

then x = log n

Conclusion
When faced with successive divisions (be it by two or by any number),
remember that the complexity is logarithmic.
Chapter 4: Trees
Section 4.1: Typical n-ary tree representation
Typically we represent an n-ary tree (one with potentially unlimited children
per node) as a binary tree (one with exactly two children per node). The
"next" child is regarded as a sibling. Note that if a tree is binary, this
representation creates extra nodes.
We then iterate over the siblings and recurse down the children. As most
trees are relatively shallow - lots of children but only a few levels of
hierarchy, this gives rise to efficient code. Note human genealogies are an
exception (lots of levels of ancestors, only a few children per level).
If necessary back pointers can be kept to allow the tree to be ascended. These
are more difficult to maintain.
Note that it is typical to have one function to call on the root and a recursive
function with extra parameters, in this case tree depth.
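A small sketch of this representation in Python (a hypothetical Node with a first-child pointer and a next-sibling pointer, plus the root wrapper and the depth-carrying recursive function the text describes):

class Node:
    def __init__(self, value):
        self.value = value
        self.child = None    # first child
        self.next = None     # next sibling

def print_tree(root):
    # Convenience wrapper called on the root.
    _print_subtree(root, 0)

def _print_subtree(node, depth):
    # Iterate over the siblings and recurse down the children.
    while node is not None:
        print("  " * depth + str(node.value))
        _print_subtree(node.child, depth + 1)
        node = node.next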
Section 4.2: Introduction
Trees are a sub-type of the more general node-edge graph data structure.

To be a tree, a graph must satisfy two requirements:


It is acyclic. It contains no cycles (or "loops").
It is connected. From any given node in the graph, every other node is
reachable; all nodes are reachable through exactly one path in the graph.
The tree data structure is quite common within computer science. Trees are
used to model many different algorithmic data structures, such as ordinary
binary trees, red-black trees, B-trees, AB-trees, 23-trees, Heap, and tries.
It is common to refer to a Tree as a Rooted Tree by:

choosing 1 cell to be called the `Root`
painting the `Root` at the top
creating a lower layer for each cell in the graph depending on its distance from the root - the bigger the distance, the lower the cell

Common symbol for trees: T


Section 4.3: To check if two Binary
trees are same or not
1. For example if

the inputs are:

Example:1
a)

b)

Output should be true.


Example:2
If the inputs are:
a)
b)

Output should be false.


Pseudo code for the same:
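A compact Python sketch of the idea (two trees are the same if both are empty, or if their roots match and their left and right subtrees are recursively the same; node objects with data, left and right attributes are assumed):

def is_same_tree(a, b):
    # Two empty trees are the same.
    if a is None and b is None:
        return True
    # One empty and one non-empty tree differ.
    if a is None or b is None:
        return False
    # Both roots must match, and both subtrees must match recursively.
    return (a.data == b.data
            and is_same_tree(a.left, b.left)
            and is_same_tree(a.right, b.right))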
Chapter 5: Binary Search Trees
A binary tree is a tree in which each node has a maximum of two children. A
binary search tree (BST) is a binary tree whose elements are positioned in a
special order: for every node, all values (i.e. keys) in its left subtree are less
than the node's key, and all values in its right subtree are greater.
Section 5.1: Binary Search Tree -
Insertion (Python)
This is a simple implementation of Binary Search Tree Insertion using
Python.
An example is shown below:

Following the code snippet each image shows the execution visualization
which makes it easier to visualize how this code works.
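A minimal sketch of BST insertion in Python (assuming a simple Node class; smaller keys go into the left subtree, larger or equal keys into the right):

class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def insert(root, key):
    # An empty subtree is replaced by a new leaf node.
    if root is None:
        return Node(key)
    # Smaller keys go into the left subtree, others into the right.
    if key < root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    return root

# Example: build a small tree.
root = None
for key in [8, 3, 10, 1, 6]:
    root = insert(root, key)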
Section 5.2: Binary Search Tree -
Deletion(C++)
Before starting with deletion, let us first shed some light on what a Binary
search tree (BST) is. Each node in a BST can have a maximum of two
children (a left and a right child). The left sub-tree of a node has a key less
than or equal to its parent node's key. The right sub-tree of a node has a key
greater than its parent node's key.
Deleting a node in a tree while maintaining its Binary search tree property.

There are three cases to be considered while deleting a node.


Case 1: Node to be deleted is the leaf node.(Node with value 22).
Case 2: Node to be deleted has one child.(Node with value 26).
Case 3: Node to be deleted has both children.(Node with value 49).
Explanation of cases:
1. When the node to be deleted is a leaf node, then simply delete the node
and pass nullptr to its parent node.
2. When the node to be deleted has only one child, then copy the child's
value into the node's value and delete the child (converted to case 1).
3. When the node to be deleted has two children, then the minimum from
its right sub-tree can be copied into the node, and then that minimum value
can be deleted from the node's right subtree (converted to case 2).
Note: The minimum in the right sub-tree can have at most one child, and
that must be a right child: if it had a left child, it would not be the
minimum value, or the tree would not satisfy the BST property.
The structure of a node in a tree and the code for Deletion:

Time complexity of above code is O(h), where h is


the height of the tree.
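A compact sketch of the same three-case deletion, written here in Python for brevity (an illustration; nodes are assumed to carry key, left and right attributes):

def find_min(node):
    # The minimum of a BST subtree is its leftmost node.
    while node.left is not None:
        node = node.left
    return node

def delete_node(root, key):
    if root is None:
        return None
    if key < root.key:
        root.left = delete_node(root.left, key)
    elif key > root.key:
        root.right = delete_node(root.right, key)
    else:
        # Cases 1 and 2: zero or one child, splice the node out.
        if root.left is None:
            return root.right
        if root.right is None:
            return root.left
        # Case 3: two children, copy the minimum of the right
        # subtree into this node, then delete that minimum.
        successor = find_min(root.right)
        root.key = successor.key
        root.right = delete_node(root.right, successor.key)
    return root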
Section 5.3: Lowest
common ancestor in a BST
Consider the BST:

Lowest common ancestor of 22 and 26 is 24


Lowest common ancestor of 26 and 49 is 46
Lowest common ancestor of 22 and 24 is 24

The binary search tree property can be used to find the lowest common
ancestor of two nodes. Pseudo code:
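One possible version in Python (an illustrative sketch: starting from the root, descend left or right while both keys lie on the same side; the first node that lies between the two keys is their lowest common ancestor):

def lowest_common_ancestor_bst(root, k1, k2):
    node = root
    while node is not None:
        # Both keys are smaller: the LCA lies in the left subtree.
        if k1 < node.key and k2 < node.key:
            node = node.left
        # Both keys are larger: the LCA lies in the right subtree.
        elif k1 > node.key and k2 > node.key:
            node = node.right
        # Keys split around (or equal) this node: it is the LCA.
        else:
            return node
    return None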


Section 5.4: Binary Search Tree -
Python

""" Create different node and insert data into it"""


Chapter 6: Check if a tree is BST or
not
Section 6.1: Algorithm to check if a
given binary tree is BST
A binary tree is a BST if it satisfies any one of the following conditions:
1. It is empty
2. It has no subtrees
3. For every node x in the tree all the keys (if any) in the left sub tree must
be less than key(x) and all the keys (if any) in the right sub tree must be
greater than key(x).
So a straightforward recursive algorithm would be:

The above recursive algorithm is correct but inefficient, because it traverses
each node multiple times.
Another approach to minimize the multiple visits of each node is to remember
the min and max possible values of the keys in the subtree we are visiting. Let
the minimum possible value of any key be K_MIN and the maximum value be K_MAX.
When we start from the root of the tree, the range of values in the tree is
[K_MIN, K_MAX]. Let the key of the root node be x. Then the range of values in
the left subtree is [K_MIN, x) and the range of values in the right subtree is
(x, K_MAX]. We will use this idea to develop a more efficient algorithm.
It will be initially called as:
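A sketch of that range-based check in Python (an illustration using -infinity and +infinity for K_MIN and K_MAX; nodes carry key, left and right attributes):

import math

def is_bst(node, k_min, k_max):
    # An empty subtree is always a valid BST.
    if node is None:
        return True
    # The key must lie inside the allowed range.
    if not (k_min < node.key < k_max):
        return False
    # Left keys must stay below this key, right keys above it.
    return (is_bst(node.left, k_min, node.key)
            and is_bst(node.right, node.key, k_max))

# Initial call on the whole tree:
# is_bst(root, -math.inf, math.inf)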

Another approach will be to do inorder traversal of the Binary tree. If the


inorder traversal produces a sorted sequence of keys then the given tree is a
BST. To check if the inorder sequence is sorted remember the value of
previously visited node and compare it against the current node.
Section 6.2: If a given input tree follows
Binary search tree property or not
For example
if the input is:

Output should be false:

As 4 in the left sub-tree is

greater than the root value(3)

If the input is:

Output should be true


Chapter 7: Binary Tree traversals
Visiting the nodes of a binary tree in some particular order is
called a traversal.
Section 7.1: Level Order
traversal - Implementation
For example if the given tree is:

Level order traversal will be:

1 2 3 4 5 6 7

Printing node data level by level.
Code:
A queue data structure is used to achieve the above objective.
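A sketch of level order traversal in Python using a queue (collections.deque), assuming nodes with data, left and right attributes:

from collections import deque

def level_order(root):
    if root is None:
        return
    queue = deque([root])
    while queue:
        node = queue.popleft()      # take the oldest discovered node
        print(node.data, end=" ")
        if node.left is not None:   # enqueue children for the next level
            queue.append(node.left)
        if node.right is not None:
            queue.append(node.right)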
Section 7.2: Pre-order, Inorder and
Post Order traversal of a Binary Tree
Consider the Binary Tree:

Pre-order traversal(root) is traversing the node then left sub-tree of the


node and then the right sub-tree of the node.
So the pre-order traversal of above tree will be:
1 2 4 5 3 6 7
In-order traversal(root) is traversing the left sub-tree of the node then the
node and then right sub-tree of the node.

So the in-order

traversal of above tree

will be: 4 2 5 1 6 3 7
Post-order traversal(root) is traversing the left sub-tree of the node then the
right sub-tree and then the node.
So the post-order traversal of above tree will be:
4 5 2 6 7 3 1
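Recursive sketches of the three traversals in Python (nodes with data, left and right attributes assumed):

def pre_order(node):
    if node is None:
        return
    print(node.data, end=" ")   # node, then left subtree, then right subtree
    pre_order(node.left)
    pre_order(node.right)

def in_order(node):
    if node is None:
        return
    in_order(node.left)         # left subtree, then node, then right subtree
    print(node.data, end=" ")
    in_order(node.right)

def post_order(node):
    if node is None:
        return
    post_order(node.left)       # left subtree, then right subtree, then node
    post_order(node.right)
    print(node.data, end=" ")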
Chapter 8: Lowest common
ancestor of a Binary Tree
Lowest common ancestor between two nodes n1 and n2 is defined as the
lowest node in the tree that has both n1 and n2 as descendants.
Section 8.1: Finding lowest common
ancestor
Consider the tree:

Lowest common ancestor of nodes with value 1 and 4 is 2


Lowest common ancestor of nodes with value 1 and 5 is 3
Lowest common ancestor of nodes with value 2 and 4 is 4
Lowest common ancestor of nodes with value 1 and 2 is 2
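A common recursive sketch in Python (nodes with data, left and right attributes; it assumes both target values are present in the tree):

def lowest_common_ancestor(root, n1, n2):
    if root is None:
        return None
    # If the current node is one of the targets, it is part of the answer.
    if root.data == n1 or root.data == n2:
        return root
    left = lowest_common_ancestor(root.left, n1, n2)
    right = lowest_common_ancestor(root.right, n1, n2)
    # One target found in each subtree: this node is the LCA.
    if left is not None and right is not None:
        return root
    # Otherwise the LCA (if any) lies in whichever subtree is non-empty.
    return left if left is not None else right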
Chapter 9: Graph
A graph is a collection of points and lines connecting some (possibly empty)
subset of them. The points of a graph are called graph vertices, "nodes" or
simply "points." Similarly, the lines connecting the vertices of a graph are
called graph edges, "arcs" or "lines."
A graph G can be defined as a pair (V,E), where V is a set of vertices, and E
is a set of edges between the vertices E ⊆ {(u,v) | u, v ∈ V}.
Section 9.1: Storing Graphs
(Adjacency Matrix)
To store a graph, two methods are common:
Adjacency Matrix
Adjacency List
An adjacency matrix is a square matrix used to represent a finite graph. The
elements of the matrix indicate whether pairs of vertices are adjacent or not
in the graph.
Adjacent means 'next to or adjoining something else' or to be beside
something. For example, your neighbors are adjacent to you. In graph theory,
if we can go to node B from node A, we can say that node B is adjacent to
node A. Now we will learn about how to store which nodes are adjacent to
which one via Adjacency Matrix. This means, we will represent which nodes
share edge between them. Here matrix means 2D array.

Here you can see a table beside the graph, this is our adjacency matrix. Here
Matrix[i][j] = 1 represents there is an edge between i and j. If there's no
edge, we simply put Matrix[i][j] = 0.
These edges can be weighted, like it can represent the distance between two
cities. Then we'll put the value in Matrix[i][j] instead of putting 1.
The graph described above is Bidirectional or Undirected, that means, if we
can go to node 1 from node 2, we can also go to node 2 from node 1. If the
graph was Directed, then there would've been an arrow sign on one side of each
edge. Even then, we could represent it using an adjacency matrix.

We represent pairs of nodes that don't share an edge by infinity. One thing to be
noticed is that if the graph is undirected, the matrix becomes symmetric.
The pseudo-code to create the matrix:

We can also populate the Matrix using this common way:

For directed graphs, we can remove Matrix[n2][n1] = cost line.


The drawbacks of using Adjacency Matrix:
Memory is a huge problem. No matter how many edges are there, we will
always need N * N sized matrix where N is the number of nodes. If there are
10000 nodes, the matrix size will be 4 * 10000 * 10000 around 381
megabytes. This is a huge waste of memory if we consider graphs that have a
few edges.
Suppose we want to find out to which node we can go from a node u. We'll
need to check the whole row of u, which costs a lot of time.
The only benefit is that, we can easily find the connection between u-v
nodes, and their cost using Adjacency Matrix.
Java code implemented using above pseudo-code:

import java.util.Scanner;

public class Represent_Graph_Adjacency_Matrix {
    private int vertices;
    private int[][] adjacency_matrix;

    public Represent_Graph_Adjacency_Matrix(int v) {
        vertices = v;
        adjacency_matrix = new int[vertices + 1][vertices + 1];
    }

    // Store an edge between two vertices (1 for an unweighted graph).
    public void makeEdge(int to, int from, int edge) {
        try {
            adjacency_matrix[to][from] = edge;
        } catch (ArrayIndexOutOfBoundsException index) {
            System.out.println("The vertex does not exist");
        }
    }

    public int getEdge(int to, int from) {
        try {
            return adjacency_matrix[to][from];
        } catch (ArrayIndexOutOfBoundsException index) {
            System.out.println("The vertex does not exist");
        }
        return -1;
    }

    public static void main(String args[]) {
        int v, e, count = 1, to = 0, from = 0;
        Scanner sc = new Scanner(System.in);
        Represent_Graph_Adjacency_Matrix graph;
        try {
            System.out.println("Enter the number of vertices: ");
            v = sc.nextInt();
            System.out.println("Enter the number of edges: ");
            e = sc.nextInt();

            graph = new Represent_Graph_Adjacency_Matrix(v);

            System.out.println("Enter the edges: <to> <from>");
            while (count <= e) {
                to = sc.nextInt();
                from = sc.nextInt();
                graph.makeEdge(to, from, 1);
                count++;
            }

            // Print the resulting matrix row by row.
            System.out.println("The adjacency matrix for the given graph is: ");
            System.out.print("  ");
            for (int i = 1; i <= v; i++)
                System.out.print(i + " ");
            System.out.println();
            for (int i = 1; i <= v; i++) {
                System.out.print(i + " ");
                for (int j = 1; j <= v; j++)
                    System.out.print(graph.getEdge(i, j) + " ");
                System.out.println();
            }
        } catch (Exception E) {
            System.out.println("Something went wrong");
        }
        sc.close();
    }
}
Running the code: save the file and compile using
javac Represent_Graph_Adjacency_Matrix.java
Example:


Section 9.2: Introduction To Graph
Theory
Graph Theory is the study of graphs, which are mathematical structures used
to model pairwise relations between objects.
Did you know, almost all the problems of planet Earth can be converted into
problems of Roads and Cities, and solved? Graph Theory was invented many
years ago, even before the invention of computer. Leonhard Euler wrote a
paper on the Seven Bridges of Königsberg which is regarded as the first
paper of Graph Theory. Since then, people have come to realize that if we
can convert any problem to this City-Road problem, we can solve it easily by
Graph Theory.
Graph Theory has many applications. One of the most common applications is
finding the shortest distance between one city and another. We all know that to
reach your PC, this web-page had to travel through many routers from the server.
Graph Theory helps find the routers that need to be crossed.
During war, which street needs to be bombarded to disconnect the capital
city from others, that too can be found out using Graph Theory.
Let us first learn some basic definitions on Graph Theory.
Graph:
Let's say, we have 6 cities. We mark them as 1, 2, 3, 4, 5, 6. Now we connect
the cities that have roads between each other.
This is a simple graph where some cities are shown with the roads that are
connecting them. In Graph Theory, we call each of these cities Node or
Vertex and the roads are called Edge. Graph is simply a connection of these
nodes and edges.
A node can represent a lot of things. In some graphs, nodes represent cities,
some represent airports, some represent a square in a chessboard. Edge
represents the relation between each nodes. That relation can be the time to
go from one airport to another, the moves of a knight from one square to all
the other squares etc.
Path of Knight in a Chessboard
In simple words, a Node represents any object and Edge represents the
relation between two objects.
Adjacent Node:
If a node A shares an edge with node B, then B is considered to be adjacent
to A. In other words, if two nodes are directly connected, they are called
adjacent nodes. One node can have multiple adjacent nodes.
Directed and Undirected Graph:
In directed graphs, the edges have a direction sign on one side, which means the
edges are Unidirectional. The edges of undirected graphs, on the other hand,
effectively point both ways, which means they are Bidirectional. Usually
undirected graphs are drawn with no signs on either side of the
edges.
Let's assume there is a party going on. The people in the party are represented
by nodes and there is an edge between two people if they shake hands. Then
this graph is undirected, because any person A shakes hands with person B if
and only if B also shakes hands with A. In contrast, if the edges from a
person A to another person B corresponds to A's admiring B, then this graph
is directed, because admiration is not necessarily reciprocated. The former
type of graph is called an undirected graph and the edges are called
undirected edges while the latter type of graph is called a directed graph and
the edges are called directed edges.
Weighted and Unweighted Graph:
A weighted graph is a graph in which a number (the weight) is assigned to
each edge. Such weights might represent for example costs, lengths or
capacities, depending on the problem at hand.

An unweighted graph is simply the opposite. We assume that the weight of
all the edges is the same (presumably 1).
Path:
A path represents a way of going from one node to another. It consists of
sequence of edges. There can be multiple paths between two nodes.

In the example above, there are two paths from A to D. A->B, B->C, C->D
is one path. The cost of this path is 3 + 4 + 2 = 9. Again, there's another path
A->D. The cost of this path is 10. The path that costs the lowest is called
shortest path.
Degree:
The degree of a vertex is the number of edges that are connected to it. If
there's any edge that connects to the vertex at both ends (a loop) is counted
twice.
In directed graphs, the nodes have two types of degrees:
In-degree: The number of edges that point to the node.
Out-degree: The number of edges that point from the node to other
nodes.
For undirected graphs, they are simply called degree.

Some Algorithms Related to Graph Theory

Bellman-Ford algorithm
Dijkstra's algorithm
Ford-Fulkerson algorithm
Kruskal's algorithm
Nearest neighbour algorithm
Prim's algorithm
Depth-first search
Breadth-first search
Section 9.3: Storing
Graphs (Adjacency List)
Adjacency list is a collection of unordered lists used to represent a finite
graph. Each list describes the set of neighbors of a vertex in a graph. It takes
less memory to store graphs.
Let's see a graph, and its adjacency matrix:

Now we create a list using these values.


This is called adjacency list. It shows which nodes are connected to which
nodes. We can store this information using a 2D array. But it will cost us the
same memory as an Adjacency Matrix. Instead we are going to use dynamically
allocated memory to store this one.
Many languages support Vector or List which we can use to store adjacency
list. For these, we don't need to specify the size of the List. We only need to
specify the maximum number of nodes.
The pseudo-code will be:

Since this one is an undirected graph, if there is an edge from x to y, there is
also an edge from y to x. If it was a directed graph, we'd omit the second one.
For weighted graphs, we need to store the cost too. We'll create another
vector or list named cost[] to store these. The pseudo-code:

From this one, we can easily find out the total number of nodes connected to
any node, and what these nodes are.
It takes less time than Adjacency Matrix. But if we needed to find out if
there's an edge between u and v, it'd have been easier if we kept an
adjacency matrix.
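A sketch of both pseudo-code fragments above in Python (an illustration using lists of lists: edge[u] holds the neighbours of u and cost[u] holds the matching edge weights):

# Build an adjacency list for an undirected, weighted graph.
max_nodes = 6
edge = [[] for _ in range(max_nodes)]
cost = [[] for _ in range(max_nodes)]

def add_edge(u, v, w):
    edge[u].append(v)
    cost[u].append(w)
    # For a directed graph, omit the reverse entry below.
    edge[v].append(u)
    cost[v].append(w)

add_edge(0, 1, 3)
add_edge(1, 2, 4)
print(edge[1], cost[1])  # [0, 2] [3, 4] : neighbours of node 1 and their costs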
Section 9.4: Topological Sort
A topological ordering, or a topological sort, orders the vertices in a directed
acyclic graph on a line, i.e. in a list, such that all directed edges go from left
to right. Such an ordering cannot exist if the graph contains a directed cycle
because there is no way that you can keep going right on a line and still
return back to where you started from.
Formally, in a graph G = (V, E), a linear ordering of all its vertices is such
that if G contains an edge (u, v) ∈ E from vertex u to vertex v, then u precedes
v in the ordering.

It is important to note that each DAG has at least one topological sort.
There are known algorithms for constructing a topological ordering of any
DAG in linear time. One example (a sketch in code follows at the end of this section):

1. Call depth_first_search(G) to compute the finishing time v.f for each vertex v
2. As each vertex is finished, insert it into the front of a linked list
3. Return the linked list of vertices, as it is now sorted

A topological sort can be performed in Θ(V+E) time, since the depth-first
search algorithm takes Θ(V+E) time and it takes Ω(1) (constant time) to
insert each of |V| vertices into the front of a linked list.
Many applications use directed acyclic graphs to indicate precedences among
events. We use topological sorting so that we get an ordering to process each
vertex before any of its successors.
Vertices in a graph may represent tasks to be performed and the edges may
represent constraints that one task must be performed before another; a
topological ordering is a valid sequence in which to perform the set of tasks
described in V.
Problem instance and its solution

Let a vertex v describe a Task(hours_to_complete: int), i.e. Task(4) describes a
Task that takes 4 hours to complete, and let an edge e describe a
Cooldown(hours: int) such that Cooldown(3) describes a duration of time to cool
down after a completed task.
Let our graph be called dag (since it is a directed acyclic graph), and let it
contain 5 vertices:

where we connect the vertices with directed edges such that the graph is
acyclic,

then there are three possible topological orderings between A and E:

1. A -> B -> D -> E
2. A -> C -> D -> E
3. A -> C -> E
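A DFS-based sketch of steps 1-3 above in Python (the graph is a plain dict from vertex to its list of successors; the five-vertex dag below is a hypothetical stand-in for the one described in the text):

def topological_sort(graph):
    visited = set()
    order = []  # acts as the linked list; finished vertices go to the front

    def dfs(u):
        visited.add(u)
        for v in graph.get(u, []):
            if v not in visited:
                dfs(v)
        order.insert(0, u)  # insert at the front once u is finished

    for u in graph:
        if u not in visited:
            dfs(u)
    return order

dag = {"A": ["B", "C"], "B": ["D"], "C": ["D", "E"], "D": ["E"], "E": []}
print(topological_sort(dag))  # e.g. ['A', 'C', 'B', 'D', 'E']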
Section 9.5: Detecting a cycle in a
directed graph using Depth First
Traversal
A cycle in a directed graph exists if there's a back edge discovered during a
DFS. A back edge is an edge from a node to itself or one of the ancestors in a
DFS tree. For a disconnected graph, we get a DFS forest, so you have to
iterate through all vertices in the graph to find disjoint DFS trees.
C++ implementation:
Result: As shown below, there are three back edges in the graph: one between
vertices 0 and 2; one among vertices 0, 1, and 2; and one at vertex 3. The time
complexity of the search is O(V+E), where V is the number of vertices and E is
the number of edges.
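A sketch of the idea in Python rather than C++ (an illustration: colour each vertex as unvisited, on the current DFS stack, or finished; an edge to a vertex that is still on the stack is a back edge and therefore a cycle):

def has_cycle(graph):
    # graph: dict mapping every vertex to a list of its successors.
    WHITE, GRAY, BLACK = 0, 1, 2          # unvisited, on stack, finished
    colour = {u: WHITE for u in graph}

    def dfs(u):
        colour[u] = GRAY
        for v in graph.get(u, []):
            if colour[v] == GRAY:         # back edge found -> cycle
                return True
            if colour[v] == WHITE and dfs(v):
                return True
        colour[u] = BLACK
        return False

    # A disconnected graph gives a DFS forest, so try every vertex.
    return any(colour[u] == WHITE and dfs(u) for u in graph)

print(has_cycle({0: [1], 1: [2], 2: [0]}))  # True
print(has_cycle({0: [1], 1: [2], 2: []}))   # False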
Section 9.6: Thorup's algorithm
Thorup's algorithm for single source shortest path on an undirected graph has
time complexity O(m), lower than Dijkstra's.
Basic ideas are the following. (Sorry, I didn't try implementing it yet, so I
might miss some minor details. And the original paper is paywalled so I tried
to reconstruct it from other sources referencing it. Please remove this
comment if you could verify.)
There are ways to find the spanning tree in O(m) (not described here).
You need to "grow" the spanning tree from the shortest edge to the
longest, and it would be a forest with several connected components
before fully grown.
Select an integer b (b>=2) and only consider the spanning forests with
length limit b^k. Merge the components which are exactly the same but
with different k, and call the minimum k the level of the component.
Then logically make components into a tree. u is the parent of v iff u is
the smallest component distinct from v that fully contains v. The root is
the whole graph and the leaves are single vertices in the original graph
(with the level of negative infinity). The tree still has only O(n) nodes.
Maintain the distance of each component to the source (like in Dijkstra's
algorithm). The distance of a component with more than one vertex is
the minimum distance of its unexpanded children. Set the distance of the
source vertex to 0 and update the ancestors accordingly.
Consider the distances in base b. When visiting a node in level k the first
time, put its children into buckets shared by all nodes of level k (as in
bucket sort, replacing the heap in Dijkstra's algorithm) by the digit k and
higher of its distance. Each time visiting a node, consider only its first b
buckets, visit and remove each of them, update the distance of the
current node, and relink the current node to its own parent using the new
distance and wait for the next visit for the following buckets.
When a leaf is visited, the current distance is the final distance of the
vertex. Expand all edges from it in the original graph and update the
distances accordingly.
Visit the root node (whole graph) repeatedly until the destination is
reached.
It is based on the fact that, there isn't an edge with length less than l between
two connected components of the spanning forest with length limitation l, so,
starting at distance x, you could focus only on one connected component
until you reach the distance x + l. You'll visit some vertices before vertices
with shorter distance are all visited, but that doesn't matter because it is
known there won't be a shorter path to here from those vertices. Other parts
work like the bucket sort / MSD radix sort, and of course, it requires the
O(m) spanning tree.
Chapter 10: Graph Traversals
Section 10.1: Depth First Search
traversal function
The function takes as arguments the current node index, the adjacency list
(stored as a vector of vectors in this example), and a vector of booleans to keep
track of which nodes have been visited.
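The chapter's example stores the adjacency list in a vector of vectors (C++); a sketch of the equivalent traversal in Python, with a list of lists and a visited list:

def dfs(node, adjacency, visited):
    visited[node] = True
    print(node, end=" ")
    # Recurse into every unvisited neighbour of the current node.
    for neighbour in adjacency[node]:
        if not visited[neighbour]:
            dfs(neighbour, adjacency, visited)

adjacency = [[1, 2], [0, 3], [0], [1]]
visited = [False] * len(adjacency)
dfs(0, adjacency, visited)  # prints: 0 1 3 2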
Chapter 11: Dijkstra’s Algorithm
Section 11.1: Dijkstra's Shortest Path
Algorithm
Before proceeding, it is recommended to have a brief idea about Adjacency
Matrix and BFS
Dijkstra's algorithm is known as single-source shortest path algorithm. It is
used for finding the shortest paths between nodes in a graph, which may
represent, for example, road networks. It was conceived by Edsger W.
Dijkstra in 1956 and published three years later.
We can find shortest path using Breadth First Search (BFS) searching
algorithm. This algorithm works fine, but the problem is, it assumes the cost
of traversing each path is same, that means the cost of each edge is same.
Dijkstra's algorithm helps us to find the shortest path where the cost of each
path is not the same.
At first we will see, how to modify BFS to write Dijkstra's algorithm, then
we will add priority queue to make it a complete Dijkstra's algorithm.
Let's say the distance of each node from the source is kept in the d[] array. For
example, d[3] represents the time taken to reach node 3 from the source. If we
don't know the distance, we will store infinity in d[3]. Also, let cost[u][v]
represent the cost of edge u-v. That means it takes cost[u][v] to go from node u
to node v.

We need to understand Edge Relaxation. Let's say, from your house, that is the
source, it takes 10 minutes to go to place A, and it takes 25 minutes to go to
place B. We have:

d[A] = 10, d[B] = 25

Now let's say it takes 7 minutes to go from place A to place B, that means:

cost[A][B] = 7

Then we can go to place B from the source by going to place A from the source
and then from place A going to place B, which will take 10 + 7 = 17 minutes,
instead of 25 minutes. So,

d[A] + cost[A][B] < d[B]

Then we update,

d[B] = d[A] + cost[A][B] = 17

This is called relaxation. We will go from node u to node v and if d[u] +
cost[u][v] < d[v] then we will update d[v] = d[u] + cost[u][v].
In BFS, we didn't need to visit any node twice. We only checked if a node is
visited or not. If it was not visited, we pushed the node in queue, marked it as
visited and incremented the distance by 1. In Dijkstra, we can push a node in
queue and instead of updating it with visited, we relax or update the new
edge. Let's look at one example:

Let's assume, Node 1 is the Source. Then,


We set, d[2], d[3] and d[4] to infinity because we don't know the distance
yet. And the distance of source is of course 0. Now, we go to other nodes
from source and if we can update them, then we'll push them in the queue.
Say for example, we'll traverse edge 1-2. As d[1] + 2 < d[2] which will make
d[2] = 2. Similarly, we'll traverse edge 1-3 which makes d[3] = 5.
We can clearly see that 5 is not the shortest distance we can cross to go to
node 3. So traversing a node only once, like BFS, doesn't work here. If we go
from node 2 to node 3 using edge 2-3, we can update d[3] = d[2] + 1 = 3. So
we can see that one node can be updated many times. How many times, you
ask? The maximum number of times a node can be updated is the in-degree of
that node.
Let's see the pseudo-code for visiting any node multiple times. We will
simply modify BFS:

This can be used to find the shortest path of all node from the source. The
complexity of this code is not so good.
Here's why,
In BFS, when we go from node 1 to all other nodes, we follow first come,
first serve method. For example, we went to node 3 from source before
processing node 2. If we go to node 3 from source, we update node 4 as 5 +
3 = 8. When we again update node 3 from node 2, we need to update node 4
as 3 + 3 = 6 again! So node 4 is updated twice.
Dijkstra proposed, instead of going for First come, first serve method, if we
update the nearest nodes first, then it'll take less updates. If we processed
node 2 before, then node 3 would have been updated before, and after
updating node 4 accordingly, we'd easily get the shortest distance! The idea
is to choose from the queue, the node, that is closest to the source. So we
will use Priority Queue here so that when we pop the queue, it will bring us
the closest node u from source. How will it do that? It'll check the value of
d[u] with it.
Let's see the pseudo-code:

The pseudo-code returns distance of all other nodes from the source. If we
want to know distance of a single node v, we can simply return the value
when v is popped from the queue.
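A sketch of that priority-queue version in Python using heapq (an illustration of the idea rather than the chapter's own pseudo-code; the graph is an adjacency list of (neighbour, edge_cost) pairs):

import heapq

def dijkstra(adjacency, source):
    # adjacency[u] is a list of (v, cost) pairs for each edge u-v.
    d = [float("inf")] * len(adjacency)
    d[source] = 0
    pq = [(0, source)]                     # (distance from source, node)
    while pq:
        dist_u, u = heapq.heappop(pq)      # closest unsettled node first
        if dist_u > d[u]:
            continue                       # stale entry, already improved
        for v, cost in adjacency[u]:
            if d[u] + cost < d[v]:         # edge relaxation
                d[v] = d[u] + cost
                heapq.heappush(pq, (d[v], v))
    return d

graph = [[(1, 2), (2, 5)], [(0, 2), (2, 1)], [(0, 5), (1, 1)]]
print(dijkstra(graph, 0))  # [0, 2, 3]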
Now, does Dijkstra's Algorithm work when there's a negative edge? If there's
a negative cycle, then infinity loop will occur, as it will keep reducing the
cost every time. Even if there is a negative edge, Dijkstra won't work, unless
we return right after the target is popped. But then, it won't be a Dijkstra
algorithm. We'll need Bellman – Ford algorithm for processing negative
edge/cycle.
Complexity:
The complexity of BFS is O(V+E), where V is the number of nodes and
E is the number of edges. For Dijkstra the structure is similar, but each
priority queue operation takes O(logV), so the total complexity is O((V+E)logV).
Below is a Java example to solve Dijkstra's Shortest Path Algorithm using
Adjacency Matrix
class ShortestPath {
    static final int V = 9;   // number of vertices in the example graph

    // Find the unvisited vertex with the minimum distance value.
    int minDistance(int dist[], Boolean sptSet[]) {
        int min = Integer.MAX_VALUE, min_index = -1;
        for (int v = 0; v < V; v++)
            if (sptSet[v] == false && dist[v] <= min) {
                min = dist[v];
                min_index = v;
            }
        return min_index;
    }

    void printSolution(int dist[], int n) {
        System.out.println("Vertex Distance from Source");
        for (int i = 0; i < V; i++)
            System.out.println(i + " \t\t " + dist[i]);
    }

    void dijkstra(int graph[][], int src) {
        int dist[] = new int[V];            // shortest known distances from src
        Boolean sptSet[] = new Boolean[V];  // true once a vertex is settled
        for (int i = 0; i < V; i++) {
            dist[i] = Integer.MAX_VALUE;
            sptSet[i] = false;
        }
        dist[src] = 0;
        for (int count = 0; count < V - 1; count++) {
            int u = minDistance(dist, sptSet);
            sptSet[u] = true;
            // Relax all edges leaving u.
            for (int v = 0; v < V; v++)
                if (!sptSet[v] && graph[u][v] != 0 &&
                        dist[u] != Integer.MAX_VALUE &&
                        dist[u] + graph[u][v] < dist[v])
                    dist[v] = dist[u] + graph[u][v];
        }
        printSolution(dist, V);
    }

    public static void main(String[] args) {
        int graph[][] = new int[][]{
            {0, 4, 0, 0, 0, 0, 0, 8, 0},
            {4, 0, 8, 0, 0, 0, 0, 11, 0},
            {0, 8, 0, 7, 0, 4, 0, 0, 2},
            {0, 0, 7, 0, 9, 14, 0, 0, 0},
            {0, 0, 0, 9, 0, 10, 0, 0, 0},
            {0, 0, 4, 14, 10, 0, 2, 0, 0},
            {0, 0, 0, 0, 0, 2, 0, 1, 6},
            {8, 11, 0, 0, 0, 0, 1, 0, 7},
            {0, 0, 2, 0, 0, 0, 6, 7, 0}
        };
        ShortestPath t = new ShortestPath();
        t.dijkstra(graph, 0);
    }
}

Expected output of the program is


Chapter 12: A* Pathfinding
Section 12.1: Introduction to A*
A* (A star) is a search algorithm that is used for finding a path from one node
to another. So it can be compared with Breadth First Search, Dijkstra's
algorithm, Depth First Search, or Best First Search. The A* algorithm is
widely used in graph search where graph pre-processing is not an option,
because of its efficiency and accuracy. A* is a specialization of
Best First Search, in which the evaluation function f is defined in a
particular way: f(n) = g(n) + h(n) is the minimum cost from the initial node
to the objectives, conditioned to go through node n. g(n) is the minimum cost
from the initial node to n.
h(n) is the minimum cost from n to the closest objective to n.
A* is an informed search algorithm and it always guarantees to find the
smallest path (path with minimum cost) in the least possible time (if uses
admissible heuristic). So it is both complete and optimal. The following
animation demonstrates A* search-
Section 12.2: A* Pathfinding through a
maze with no obstacles
Let's say we have the following 4 by 4 grid:

Let's assume that this is a maze. There are no walls/obstacles, though. We


only have a starting point (the green square), and an ending point (the red
square). Let's also assume that in order to get from green to red, we cannot
move diagonally. So, starting from the green square, let's see which squares
we can move to, and highlight them in blue:

In order to choose which square to move to next, we need to take into
account 2 heuristics:
1. The "g" value - This is how far away this node is from the green square.
2. The "h" value - This is how far away this node is from the red square.
3. The "f" value - This is the sum of the "g" value and the "h" value. This is
the final number which tells us which node to move to.
In order to calculate these heuristics, this is the formula we will use:

distance = abs(from.x - to.x) + abs(from.y - to.y)

This is known as the "Manhattan Distance" formula.


Let's calculate the "g" value for the blue square immediately to the left of the
green square: abs(3 - 2) + abs(2 - 2) = 1

Great! We've got the value: 1. Now, let's try calculating the "h"
value: abs(2 - 0) + abs(2 - 0) = 4

Perfect. Now, let's get the "f" value: 1 + 4 = 5

So, the final value for this node is "5".
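To make this concrete, here is a minimal Java sketch (not part of the original text) that computes g, h and f for that blue square; the Node class and the grid coordinates (green at (3,2), red at (0,0), blue at (2,2)) are assumptions taken from the numbers above.

class Node {
    int x, y;
    Node(int x, int y) { this.x = x; this.y = y; }
}

class Heuristics {
    // "Manhattan Distance" between two grid cells
    static int manhattan(Node from, Node to) {
        return Math.abs(from.x - to.x) + Math.abs(from.y - to.y);
    }

    public static void main(String[] args) {
        Node start = new Node(3, 2);    // green square (assumed coordinates)
        Node goal  = new Node(0, 0);    // red square (assumed coordinates)
        Node blue  = new Node(2, 2);    // square immediately left of the start

        int g = manhattan(blue, start); // 1
        int h = manhattan(blue, goal);  // 4
        int f = g + h;                  // 5
        System.out.println("g=" + g + " h=" + h + " f=" + f);
    }
}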


Let's do the same for all the other blue squares. The big number in the center
of each square is the "f" value, while the number on the top left is the "g"
value, and the number on the top right is the "h" value:

We've calculated the g, h, and f values for all of the blue nodes. Now, which
do we pick?
Whichever one has the lowest f value.
However, in this case, we have 2 nodes with the same f value, 5. How do we
pick between them?
Simply, either choose one at random, or have a priority set. I usually prefer to
have a priority like so: "Right > Up > Down > Left"
One of the nodes with the f value of 5 takes us in the "Down" direction, and
the other takes us "Left". Since Down is at a higher priority than Left, we
choose the square which takes us "Down".
I now mark the nodes which we calculated the heuristics for, but did not
move to, as orange, and the node which we chose as cyan:

Alright, now let's calculate the same heuristics for the nodes around the cyan
node:
Again, we choose the node going down from the cyan node, as all the options
have the same f value:
Let's calculate the heuristics for the only neighbour that the cyan node has:
Alright, since we will follow the same pattern we have been following:
Once more, let's calculate the heuristics for the node's neighbour:
Let's move there:
Finally, we can see that we have a winning square beside us, so we move
there, and we are done.
Section 12.3: Solving 8-puzzle problem
using A* algorithm

Problem definition:
An 8 puzzle is a simple game consisting of a 3 x 3 grid (containing 9
squares). One of the squares is empty. The object is to move the squares
around into different positions so that the numbers are displayed in the "goal
state".

Given an initial state of the 8-puzzle game and a final state to be reached, find
the most cost-effective path to reach the final state from the initial state.
Let us consider the Manhattan distance between the current and final state as
the heuristic for this problem statement.

First we find the heuristic value required to reach the final state from the initial
state. The cost function is g(n) = 0, as we are in the initial state.

The above value is obtained because the 1 in the current state is 1 horizontal step
away from the 1 in the final state. The same goes for 2, 5 and 6. The blank _ is
2 horizontal steps and 2 vertical steps away. So the total value for h(n) is 1 + 1 + 1 + 1 + 2
+ 2 = 8. The total cost function f(n) is equal to 8 + 0 = 8.
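The boards themselves were shown as figures in the original and are not reproduced here; the following is a minimal Java sketch of the Manhattan-distance heuristic h(n) for the 8-puzzle, under the assumption that boards are 3x3 int arrays with 0 as the blank.

class EightPuzzleHeuristic {
    // Sum, over every tile, of how many rows plus columns it is away from
    // its position in the goal board. Drop the "continue" to also count the
    // blank, as the worked example above does.
    static int manhattan(int[][] current, int[][] goal) {
        int h = 0;
        for (int r = 0; r < 3; r++)
            for (int c = 0; c < 3; c++) {
                int tile = current[r][c];
                if (tile == 0) continue;            // usually the blank is skipped
                for (int gr = 0; gr < 3; gr++)      // locate the tile in the goal board
                    for (int gc = 0; gc < 3; gc++)
                        if (goal[gr][gc] == tile)
                            h += Math.abs(r - gr) + Math.abs(c - gc);
            }
        return h;
    }
}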
Now, the possible states that can be reached from the initial state are found, and
it happens that we can move _ either right or down.
So the states obtained after making those moves are:

Again the total cost function is computed for these states using the method
described above, and it turns out to be 6 and 7 respectively. We choose the
state with the minimum cost, which is state (1). The next possible moves can be
Left, Right or Down. We won't move Left as we were previously in that
state. So, we can move Right or Down.
Again we find the states obtained from (1).
(3) leads to cost function equal to 6 and (4) leads to 4. Also, we will consider
(2) obtained before which has cost function equal to 7. Choosing minimum
from them leads to (4). Next possible moves can be Left or Right or Down.
We get states:

We get costs equal to 5, 2 and 4 for (5), (6) and (7) respectively. Also, we
have the previous states (3) and (2) with 6 and 7 respectively. We choose the
minimum cost state, which is (6). The next possible moves are Up and Down, and
clearly Down will lead us to the final state, giving a heuristic function value
equal to 0.
Chapter 14: Dynamic Programming
Dynamic programming is a widely used concept and it's often used for
optimization. It refers to simplifying a complicated problem by breaking it
down into simpler sub-problems in a recursive manner, usually with a bottom-
up approach. There are two key attributes that a problem must have in
order for dynamic programming to be applicable: "Optimal substructure"
and "Overlapping sub-problems". To achieve its optimization, dynamic
programming uses a concept called memoization.
Section 14.1: Edit Distance
The problem statement is: given two strings str1 and str2, what is the minimum
number of operations that can be performed on str1 so that it gets converted
to str2? Implementation in Java
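The original listing is not reproduced in this copy; below is a minimal DP sketch in Java. The example strings in main are an assumption ("sunday" to "saturday" needs 3 operations, matching the output shown below).

public class EditDistance {
    // dp[i][j] = minimum operations (insert, remove, replace) to convert the
    // first i characters of str1 into the first j characters of str2.
    static int editDistance(String str1, String str2) {
        int m = str1.length(), n = str2.length();
        int[][] dp = new int[m + 1][n + 1];
        for (int i = 0; i <= m; i++)
            for (int j = 0; j <= n; j++) {
                if (i == 0)      dp[i][j] = j;               // insert all of str2
                else if (j == 0) dp[i][j] = i;               // remove all of str1
                else if (str1.charAt(i - 1) == str2.charAt(j - 1))
                    dp[i][j] = dp[i - 1][j - 1];             // characters match
                else
                    dp[i][j] = 1 + Math.min(dp[i - 1][j - 1],           // replace
                                   Math.min(dp[i - 1][j], dp[i][j - 1])); // remove / insert
            }
        return dp[m][n];
    }

    public static void main(String[] args) {
        // assumed example strings
        System.out.println(editDistance("sunday", "saturday"));
    }
}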

Output
3
Section 14.2: Weighted Job
Scheduling Algorithm
Weighted Job Scheduling Algorithm can also be denoted as Weighted
Activity Selection Algorithm.
The problem is, given certain jobs with their start time and end time, and a
profit you make when you finish the job, what is the maximum profit you can
make given no two jobs can be executed in parallel?
This one looks like Activity Selection using Greedy Algorithm, but there's an
added twist. That is, instead of maximizing the number of jobs finished, we
focus on making the maximum profit. The number of jobs performed doesn't
matter here.
Let's look at an example:

+-------------------------+---------+---------+---------+---------+---------+---------+
| Name | A | B | C | D | E | F |
+-------------------------+---------+---------+---------+---------+---------+---------+
|(Start Time, Finish Time)| (2,5) | (6,7) | (7,9) | (1,3) | (5,8) | (4,6) |
+-------------------------+---------+---------+---------+---------+---------+---------+
| Profit | 6 | 4 | 2 | 5 | 11 | 5 |
+-------------------------+---------+---------+---------+---------+---------+---------+

The jobs are denoted with a name, their start and finishing time and profit.
After a few iterations, we can find out if we perform Job-A and Job-E, we
can get the maximum profit of 17. Now how to find this out using an
algorithm?
The first thing we do is sort the jobs by their finishing time in non-decreasing
order. Why do we do this? It's because if we select a job that takes less time
to finish, then we leave the most amount of time for choosing other jobs. We
have:
+-------------------------+---------+---------+---------+---------+---------+---------+
| Name | D | A | F | B | E | C |
+-------------------------+---------+---------+---------+---------+---------+---------+
|(Start Time, Finish Time)| (1,3) | (2,5) | (4,6) | (6,7) | (5,8) | (7,9) |
+-------------------------+---------+---------+---------+---------+---------+---------+
| Profit | 5 | 6 | 5 | 4 | 11 | 2 |
+-------------------------+---------+---------+---------+---------+---------+---------+

We'll have an additional temporary array Acc_Prof of size n (Here, n


denotes the total number of jobs). This will contain the maximum
accumulated profit of performing the jobs. Don't get it? Wait and watch.
We'll initialize the values of the array with the profit of each jobs. That
means, Acc_Prof[i] will at first hold the profit of performing i-th job.

+-------------------------+---------+---------+---------+---------+---------+---------+
| Acc_Prof | 5 | 6 | 5 | 4 | 11 | 2 |
+-------------------------+---------+---------+---------+---------+---------+---------+

Now let's denote position 2 with i, and position 1 will be denoted with j. Our
strategy will be to iterate j from 1 to i-1 and after each iteration, we will
increment i by 1, until i becomes n+1.
j i

+-------------------------+---------+---------+---------+---------+---------+---------+
| Name | D | A | F | B | E | C |
+-------------------------+---------+---------+---------+---------+---------+---------+
|(Start Time, Finish Time)| (1,3) | (2,5) | (4,6) | (6,7) | (5,8) | (7,9) |
+-------------------------+---------+---------+---------+---------+---------+---------+
| Profit | 5 | 6 | 5 | 4 | 11 | 2 |
+-------------------------+---------+---------+---------+---------+---------+---------+
| Acc_Prof | 5 | 6 | 5 | 4 | 11 | 2 |
+-------------------------+---------+---------+---------+---------+---------+---------+
We check if Job[i] and Job[j] overlap, that is, if the finish time of Job[j] is
greater than Job[i]'s start time, then these two jobs can't be done together.
However, if they don't overlap, we'll check if Acc_Prof[j] + Profit[i] >
Acc_Prof[i]. If this is the case, we will update Acc_Prof[i] = Acc_Prof[j] +
Profit[i]. That is:

Here Acc_Prof[j] + Profit[i] represents the accumulated profit of doing


these two jobs together. Let's check it for our example:
Here Job[j] overlaps with Job[i]. So these two can't be done together. Since
our j is equal to i-1, we increment the value of i to i+1 that is 3. And we
make j = 1.
j i
+-------------------------+---------+---------+---------+---------+---------+---------+
| Name | D | A | F | B | E | C |
+-------------------------+---------+---------+---------+---------+---------+---------+

|(Start Time, Finish Time)| (1,3) | (2,5) | (4,6) | (6,7) | (5,8) | (7,9) |
+-------------------------+---------+---------+---------+---------+---------+---------+
| Profit | 5 | 6 | 5 | 4 | 11 | 2 |
+-------------------------+---------+---------+---------+---------+---------+---------+
| Acc_Prof | 5 | 6 | 5 | 4 | 11 | 2 |
+-------------------------+---------+---------+---------+---------+---------+---------+
Now Job[j] and Job[i] don't overlap. The total amount of profit we can make
by picking these two jobs is: Acc_Prof[j] + Profit[i] = 5 + 5 = 10 which is
greater than Acc_Prof[i]. So we update Acc_Prof[i] = 10. We also increment
j by 1.
We get,
j i

+-------------------------+---------+---------+---------+---------+---------+---------+
| Name | D | A | F | B | E | C |
+-------------------------+---------+---------+---------+---------+---------+---------+
|(Start Time, Finish Time)| (1,3) | (2,5) | (4,6) | (6,7) | (5,8) | (7,9) |
+-------------------------+---------+---------+---------+---------+---------+---------+
| Profit | 5 | 6 | 5 | 4 | 11 | 2 |
+-------------------------+---------+---------+---------+---------+---------+---------+
| Acc_Prof | 5 | 6 | 10 | 4 | 11 | 2 |
+-------------------------+---------+---------+---------+---------+---------+---------+

Here, Job[j] overlaps with Job[i] and j is also equal to i-1. So we increment i
by 1, and make j = 1. We get,
j i

+-------------------------+---------+---------+---------+---------+---------+---------+
| Name | D | A | F | B | E | C |
+-------------------------+---------+---------+---------+---------+---------+---------+
|(Start Time, Finish Time)| (1,3) | (2,5) | (4,6) | (6,7) | (5,8) | (7,9) |
+-------------------------+---------+---------+---------+---------+---------+---------+
| Profit | 5 | 6 | 5 | 4 | 11 | 2 |
+-------------------------+---------+---------+---------+---------+---------+---------+

| Acc_Prof | 5 | 6 | 10 | 4 | 11 | 2 |
+-------------------------+---------+---------+---------+---------+---------+---------+

Now, Job[j] and Job[i] don't overlap, we get the accumulated profit 5 + 4 =
9, which is greater than Acc_Prof[i]. We update Acc_Prof[i] = 9 and
increment j by 1.
j i

+-------------------------+---------+---------+---------+---------+---------+---------+
| Name | D | A | F | B | E | C |
+-------------------------+---------+---------+---------+---------+---------+---------+
|(Start Time, Finish Time)| (1,3) | (2,5) | (4,6) | (6,7) | (5,8) | (7,9) |
+-------------------------+---------+---------+---------+---------+---------+---------+
| Profit | 5 | 6 | 5 | 4 | 11 | 2 |
+-------------------------+---------+---------+---------+---------+---------+---------+
| Acc_Prof | 5 | 6 | 10 | 9 | 11 | 2 |
+-------------------------+---------+---------+---------+---------+---------+---------+

Again Job[j] and Job[i] don't overlap. The accumulated profit is: 6 + 4 = 10,
which is greater than Acc_Prof[i]. We again update Acc_Prof[i] = 10. We
increment j by 1. We get:
j i
+-------------------------+---------+---------+---------+---------+---------+---------+
| Name | D | A | F | B | E | C |
+-------------------------+---------+---------+---------+---------+---------+---------+
|(Start Time, Finish Time)| (1,3) | (2,5) | (4,6) | (6,7) | (5,8) | (7,9) |
+-------------------------+---------+---------+---------+---------+---------+---------+
| Profit | 5 | 6 | 5 | 4 | 11 | 2 |
+-------------------------+---------+---------+---------+---------+---------+---------+
| Acc_Prof | 5 | 6 | 10 | 10 | 11 | 2 |
+-------------------------+---------+---------+---------+---------+---------+---------+

If we continue this process, after iterating through the whole table using i, our
table will finally look like:

+-------------------------+---------+---------+---------+---------+---------+---------+
| Name | D | A | F | B | E | C |
+-------------------------+---------+---------+---------+---------+---------+---------+
|(Start Time, Finish Time)| (1,3) | (2,5) | (4,6) | (6,7) | (5,8) | (7,9) |
+-------------------------+---------+---------+---------+---------+---------+---------+
| Profit | 5 | 6 | 5 | 4 | 11 | 2 |
+-------------------------+---------+---------+---------+---------+---------+---------+
| Acc_Prof | 5 | 6 | 10 | 14 | 17 | 16 |
+-------------------------+---------+---------+---------+---------+---------+---------+

* A few steps have been skipped to make the document shorter.


If we iterate through the array Acc_Prof, we can find out the maximum profit
to be 17! The pseudo-code:
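The original pseudo-code is not reproduced here; a minimal Java sketch of the O(n^2) table-filling step described above could look like this (the Job class is an assumed helper):

class Job {
    int start, finish, profit;
    Job(int start, int finish, int profit) {
        this.start = start; this.finish = finish; this.profit = profit;
    }
}

class WeightedJobScheduling {
    // jobs must already be sorted by finish time (non-decreasing)
    static int maxProfit(Job[] jobs) {
        int n = jobs.length;
        int[] accProf = new int[n];
        for (int i = 0; i < n; i++) accProf[i] = jobs[i].profit;  // initialize with own profit
        for (int i = 1; i < n; i++)
            for (int j = 0; j < i; j++)
                // j and i are compatible when job j finishes no later than job i starts
                if (jobs[j].finish <= jobs[i].start
                        && accProf[j] + jobs[i].profit > accProf[i])
                    accProf[i] = accProf[j] + jobs[i].profit;
        int best = 0;
        for (int p : accProf) best = Math.max(best, p);           // maximum accumulated profit
        return best;
    }
}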
The complexity of populating the Acc_Prof array is O(n^2). The array
traversal takes O(n). So the total complexity of this algorithm is O(n^2).
Now, if we want to find out which jobs were performed to get the maximum
profit, we need to traverse the array in reverse order and, if the Acc_Prof
matches the maxProfit, we will push the name of the job onto a stack and
subtract the Profit of that job from maxProfit. We will do this while
maxProfit > 0 and we haven't reached the beginning of the Acc_Prof array. The
pseudo-code will look like:
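The original pseudo-code is again missing from this copy; a direct Java sketch of the reverse traversal described above, reusing the Job class and accProf array from the previous sketch (the names[] array holding the job names is an assumption):

class ScheduleReconstruction {
    // Reconstruct one job schedule that yields maxProfit, walking the table backwards.
    static java.util.Deque<String> reconstruct(Job[] jobs, int[] accProf, String[] names) {
        int maxProfit = 0;
        for (int p : accProf) maxProfit = Math.max(maxProfit, p);
        java.util.Deque<String> stack = new java.util.ArrayDeque<>();
        for (int i = jobs.length - 1; i >= 0 && maxProfit > 0; i--) {
            if (accProf[i] == maxProfit) {
                stack.push(names[i]);          // this job is part of the schedule
                maxProfit -= jobs[i].profit;   // keep looking for the remaining profit
            }
        }
        return stack;                          // e.g. [A, E] for the table above
    }
}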

The complexity of this procedure is: O(n).


One thing to remember, if there are multiple job schedules that can give us
maximum profit, we can only find one job schedule via this procedure.
Section 14.3: Longest Common
Subsequence
If we are given two strings, we have to find the longest common sub-
sequence present in both of them.
Example
LCS for input sequences "ABCDGH" and "AEDFHR" is "ADH" of
length 3.
LCS for input sequences "AGGTAB" and "GXTXAYB" is "GTAB"
of length 4.
Implementation in Java
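The original listing is missing from this copy; a minimal O(m*n) DP sketch in Java (the example calls in main use the sequences above):

public class LongestCommonSubsequence {
    // dp[i][j] = length of the LCS of the first i chars of a and the first j chars of b
    static int lcs(String a, String b) {
        int m = a.length(), n = b.length();
        int[][] dp = new int[m + 1][n + 1];
        for (int i = 1; i <= m; i++)
            for (int j = 1; j <= n; j++)
                if (a.charAt(i - 1) == b.charAt(j - 1))
                    dp[i][j] = dp[i - 1][j - 1] + 1;   // extend the common subsequence
                else
                    dp[i][j] = Math.max(dp[i - 1][j], dp[i][j - 1]);
        return dp[m][n];
    }

    public static void main(String[] args) {
        System.out.println(lcs("ABCDGH", "AEDFHR"));   // 3 ("ADH")
        System.out.println(lcs("AGGTAB", "GXTXAYB"));  // 4 ("GTAB")
    }
}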
Output
Section 14.4: Fibonacci Number
Bottom up approach for printing the nth Fibonacci number using Dynamic
Programming.
Recursive Tree

Overlapping Sub-problems
Here fib(0), fib(1) and fib(3) are the overlapping sub-problems. fib(0) is
getting repeated 3 times, fib(1) is getting repeated 5 times and fib(3) is
getting repeated 2 times.
Implementation
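The original implementation is not included in this copy; a minimal bottom-up Java sketch (assuming the usual base case fib(0) = 0, fib(1) = 1), matching the O(n) time complexity noted below:

class Fibonacci {
    // Bottom-up dynamic programming: build fib[0..n] iteratively,
    // so every sub-problem is computed exactly once.
    static long fib(int n) {
        if (n <= 1) return n;
        long[] fib = new long[n + 1];
        fib[0] = 0;
        fib[1] = 1;
        for (int i = 2; i <= n; i++)
            fib[i] = fib[i - 1] + fib[i - 2];
        return fib[n];
    }
}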

Time Complexity
O(n)
Section 14.5: Longest
Common Substring
Given 2 strings str1 and str2, we have to find the length of the longest common
substring between them.
Examples
Input : X = "abcdxyz", y = "xyzabcd" Output : 4
The longest common substring is "abcd" and is of length 4.
Input : X = "zxabcdezy", y = "yzabcdezx" Output : 6
The longest common substring is "abcdez" and is of length 6.
Implementation in Java
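The original listing is missing from this copy; a minimal O(m*n) DP sketch in Java:

class LongestCommonSubstring {
    // dp[i][j] = length of the longest common suffix of a[0..i) and b[0..j);
    // the answer is the maximum over all cells.
    static int longestCommonSubstring(String a, String b) {
        int m = a.length(), n = b.length(), best = 0;
        int[][] dp = new int[m + 1][n + 1];
        for (int i = 1; i <= m; i++)
            for (int j = 1; j <= n; j++)
                if (a.charAt(i - 1) == b.charAt(j - 1)) {
                    dp[i][j] = dp[i - 1][j - 1] + 1;
                    best = Math.max(best, dp[i][j]);
                }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(longestCommonSubstring("abcdxyz", "xyzabcd"));     // 4
        System.out.println(longestCommonSubstring("zxabcdezy", "yzabcdezx")); // 6
    }
}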

Time Complexity
O(m*n)
Chapter 15: Applications of
Dynamic Programming
The basic idea behind dynamic programming is breaking a complex problem
down into several small and simple problems that are repeated. If you can
identify a simple subproblem that is repeatedly calculated, odds are there is a
dynamic programming approach to the problem.
As this topic is titled Applications of Dynamic Programming, it will focus
more on applications rather than the process of creating dynamic
programming algorithms.
Section 15.1: Fibonacci Numbers
Fibonacci Numbers are a prime subject for dynamic programming as the
traditional recursive approach makes a lot of repeated calculations. In these
examples I will be using the base case of

f(0) = f(1) = 1.

Non-Dynamic Programming O(2^n) Runtime Complexity, O(n) Stack


complexity

The O(2^n) runtime complexity is exponential; a proof can be seen here:
Computational complexity of Fibonacci Sequence. The main point to note is
that the runtime doubles for every subsequent term: fibonacci(15) will take
roughly twice as long as fibonacci(14).
Memoized O(n) Runtime Complexity, O(n) Space complexity, O(n) Stack
complexity
With the memoized approach we introduce an array that can be thought of as
all the previous function calls. The location memo[n] is the result of the
function call fibonacci(n). This allows us to trade a space complexity of O(n)
for an O(n) runtime, as we no longer need to compute duplicate function calls.
Iterative Dynamic Programming O(n) Runtime complexity, O(n) Space
complexity, No recursive stack

If we break the problem down into its core elements you will notice that in
order to compute fibonacci(n) we need fibonacci(n-1) and fibonacci(n-2).
Also we can notice that our base case will appear at the end of that recursive
tree as seen above.

With this information, it now makes sense to compute the solution
backwards, starting at the base cases and working upwards. Now in order to
calculate fibonacci(n) we first calculate all the fibonacci numbers up to and
through n. The main benefit here is that we now have eliminated the recursive
stack while keeping the O(n) runtime.
Unfortunately, we still have an O(n) space complexity but that can be changed
as well.
Advanced Iterative Dynamic Programming O(n) Runtime complexity, O(1)
Space complexity, No recursive stack
As noted above, the iterative dynamic programming approach starts from the
base cases and works to the end result. The key observation to make in order
to get the space complexity to O(1) (constant) is the same observation we
made for the recursive stack - we only need fibonacci(n-1) and fibonacci(n-2)
to build fibonacci(n). This means that we only need to save the results for
fibonacci(n-1) and fibonacci(n-2) at any point in our iteration.

To store these last 2 results I use an array of size 2 and simply flip which
index I am assigning to by using i % 2, which will alternate like so:
0, 1, 0, 1, 0, 1, ..., i % 2.

I add both indexes of the array together because we know that addition is
commutative (5 + 6 = 11 and 6 + 5 = 11). The result is then assigned to the
older of the two spots (denoted by i % 2). The final result is then stored at
the position n % 2.
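A minimal Java sketch of this O(1)-space variant (not the original listing), using the base case f(0) = f(1) = 1 stated above:

class FibonacciConstantSpace {
    // O(n) time, O(1) space: only the last two results are kept,
    // alternating between the two slots of a size-2 array via i % 2.
    static long fibonacci(int n) {
        long[] last = {1, 1};                // f(0) = f(1) = 1
        for (int i = 2; i <= n; i++)
            last[i % 2] = last[0] + last[1]; // overwrite the older of the two results
        return last[n % 2];
    }
}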

Notes
It is important to note that sometimes it may be best to come up with an
iterative memoized solution for functions that perform large calculations
repeatedly, as you will build up a cache of the answers to the function calls
and subsequent calls may be O(1) if the result has already been computed.
Chapter 16: Kruskal's Algorithm
Section 16.1: Optimal, disjoint-set
based implementation
We can do two things to improve the simple and sub-optimal disjoint-set
subalgorithms:
1. Path compression heuristic: findSet does not need to ever handle a tree
with height bigger than 2. If it ends up iterating such a tree, it can link
the lower nodes directly to the root, optimizing future traversals;

2. Height-based merging heuristic: for each node, store the height of its
subtree. When merging, make the taller tree the parent of the smaller
one, thus not increasing anyone's height.

This leads to O(alpha(n)) time for each operation, where alpha is the inverse of
the fast-growing Ackermann function, thus it is very slow growing, and can
be considered O(1) for practical purposes.
This makes the entire Kruskal's algorithm O(m log m + m) = O(m log m),
because of the initial sorting.
Note
Path compression may reduce the height of the tree, hence comparing heights
of the trees during union operation might not be a trivial task. Hence to avoid
the complexity of storing and calculating the height of the trees the resulting
parent can be picked randomly:
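A minimal Java sketch of such a randomised union together with a path-compressing findSet (the parent[] array representation is an assumption; this is not the book's original listing):

import java.util.Random;

class DisjointSet {
    private final int[] parent;
    private final Random rnd = new Random();

    DisjointSet(int n) {
        parent = new int[n];
        for (int i = 0; i < n; i++) parent[i] = i;    // each node starts as its own root
    }

    int findSet(int x) {
        if (parent[x] != x)
            parent[x] = findSet(parent[x]);           // path compression
        return parent[x];
    }

    void union(int a, int b) {
        int ra = findSet(a), rb = findSet(b);
        if (ra == rb) return;
        if (rnd.nextBoolean()) parent[ra] = rb;       // pick the resulting parent randomly
        else parent[rb] = ra;
    }
}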

In practice this randomised algorithm together with path compression for


findSet operation will result in comparable performance, yet much simpler to

implement.
Section 16.2: Simple, more detailed
implementation
In order to efficiently handle cycle detection, we consider each node as part
of a tree. When adding an edge, we check if its two component nodes are part
of distinct trees. Initially, each node makes up a one-node tree.
algorithm kruskalMST(G: a graph)
    sort G's edges by their value
    MST = a forest of trees, initially each tree is a node in the graph
    for each edge e in G:
        if the root of the tree that e.first belongs to is not the same as
        the root of the tree that e.second belongs to:
            connect one of the roots to the other, thus merging two trees
    return MST, which is now a single-tree forest


Section 16.3: Simple, disjoint-set based
implementation
The above forest methodology is actually a disjoint-set data structure, which
involves three main operations:

This naive implementation leads to O(n log n) time for managing the
disjoint-set data structure, leading to O(m*n log n) time for the entire
Kruskal's algorithm.
Section 16.4: Simple, high
level implementation
Sort the edges by value and add each one to the MST in sorted order, if it
doesn't create a cycle.
Chapter 17: Greedy Algorithms
Section 17.1: Human Coding
Huffman code is a particular type of optimal prefix code that is commonly
used for lossless data compression. It compresses data very effectively saving
from 20% to 90% memory, depending on the characteristics of the data being
compressed. We consider the data to be a sequence of characters. Huffman's
greedy algorithm uses a table giving how often each character occurs (i.e., its
frequency) to build up an optimal way of representing each character as a
binary string. Huffman code was proposed by David A. Huffman in 1951.
Suppose we have a 100,000-character data file that we wish to store
compactly. We assume that there are only 6 different characters in that file.
The frequency of the characters are given by:

+------------------------+-----+-----+-----+-----+-----+-----+
| Character | a | b | c | d | e | f |
+------------------------+-----+-----+-----+-----+-----+-----+
|Frequency (in thousands)| 45 | 13 | 12 | 16 | 9 | 5 |
+------------------------+-----+-----+-----+-----+-----+-----+

We have many options for how to represent such a file of information. Here,
we consider the problem of designing a Binary Character Code in which
each character is represented by a unique binary string, which we call a
codeword.
The constructed tree will provide us with:

+------------------------+-----+-----+-----+-----+-----+-----+
| Character | a | b | c | d | e | f |
+------------------------+-----+-----+-----+-----+-----+-----+
| Fixed-length Codeword | 000 | 001 | 010 | 011 | 100 | 101 |
+------------------------+-----+-----+-----+-----+-----+-----+
|Variable-length Codeword| 0 | 101 | 100 | 111 | 1101| 1100|
+------------------------+-----+-----+-----+-----+-----+-----+

If we use a fixed-length code, we need three bits to represent 6 characters.


This method requires 300,000 bits to code the entire file. Now the question
is, can we do better?
A variable-length code can do considerably better than a fixed-length code,
by giving frequent characters short codewords and infrequent characters long
codewords. This code requires: (45 X 1 + 13 X 3 + 12 X 3 + 16 X 3 + 9 X 4
+ 5 X 4) X 1000 = 224000 bits to represent the file, which saves
approximately 25% of memory.
One thing to remember, we consider here only codes in which no codeword
is also a prefix of some other codeword. These are called prefix codes. For
variable-length coding, we code the 3-character file abc as 0.101.100 =
0101100, where "." denotes the concatenation.
Prefix codes are desirable because they simplify decoding. Since no
codeword is a prefix of any other, the codeword that begins an encoded file is
unambiguous. We can simply identify the initial codeword, translate it back
to the original character, and repeat the decoding process on the remainder of
the encoded file. For example, 001011101 parses uniquely as 0.0.101.1101,
which decodes to aabe. In short, all the combinations of binary
representations are unique. Say for example, if one letter is denoted by 110,
no other letter will be denoted by 1101 or 1100. This is because you might
face confusion on whether to select 110 or to continue on concatenating the
next bit and select that one.
Compression Technique:
The technique works by creating a binary tree of nodes. These can be stored in a
regular array, the size of which depends on the number of symbols, n. A
node can either be a leaf node or an internal node. Initially all nodes are leaf
nodes, which contain the symbol itself, its frequency and optionally, a link to
its child nodes. As a convention, bit '0' represents left child and bit '1'
represents right child. Priority queue is used to store the nodes, which
provides the node with lowest frequency when popped. The process is
described below:
1. Create a leaf node for each symbol and add it to the priority queue.
2. While there is more than one node in the queue:
1. Remove the two nodes of highest priority from the queue.
2. Create a new internal node with these two nodes as children and
with frequency equal to the sum of the two nodes' frequency.
3. Add the new node to the queue.
3. The remaining node is the root node and the Huffman tree is complete.
For our example:
The pseudo-code looks like:
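The original pseudo-code is not included in this copy; a minimal Java sketch of the tree-building loop described above (HuffmanNode is an assumed helper class):

import java.util.PriorityQueue;

class HuffmanNode {
    int frequency;
    char symbol;                 // only meaningful for leaves
    HuffmanNode left, right;
    HuffmanNode(char symbol, int frequency) { this.symbol = symbol; this.frequency = frequency; }
}

class Huffman {
    static HuffmanNode buildTree(char[] symbols, int[] freq) {
        // the queue always hands back the node with the lowest frequency
        PriorityQueue<HuffmanNode> pq =
            new PriorityQueue<>((a, b) -> a.frequency - b.frequency);
        for (int i = 0; i < symbols.length; i++)
            pq.add(new HuffmanNode(symbols[i], freq[i]));        // step 1: one leaf per symbol
        while (pq.size() > 1) {                                  // step 2
            HuffmanNode a = pq.poll(), b = pq.poll();            // two lowest-frequency nodes
            HuffmanNode parent = new HuffmanNode('\0', a.frequency + b.frequency);
            parent.left = a;                                     // bit '0'
            parent.right = b;                                    // bit '1'
            pq.add(parent);
        }
        return pq.poll();                                        // step 3: the root
    }
}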

Although linear-time given sorted input, in general cases of arbitrary input,


using this algorithm requires presorting. Thus, since sorting takes O(n log n)
time in general cases, both methods have the same complexity.
Since n here is the number of symbols in the alphabet, which is typically very
small number (compared to the length of the message to be encoded), time
complexity is not very important in the choice of this algorithm.
Decompression Technique:
The process of decompression is simply a matter of translating the stream of
prefix codes to individual byte value, usually by traversing the Huffman tree
node by node as each bit is read from the input stream. Reaching a leaf node
necessarily terminates the search for that particular byte value. The leaf value
represents the desired character. Usually the Huffman Tree is constructed
using statistically adjusted data on each compression cycle, thus the
reconstruction is fairly simple. Otherwise, the information to reconstruct the
tree must be sent separately. The pseudo-code:

Greedy Explanation:
Huffman coding looks at the occurrence of each character and stores it as a
binary string in an optimal way. The idea is to assign variable-length codes to
input input characters, length of the assigned codes are based on the
frequencies of corresponding characters. We create a binary tree and operate
on it in bottom-up manner so that the least two frequent characters are as far
as possible from the root. In this way, the most frequent character gets the
smallest code and the least frequent character gets the largest code.
References:
Introduction to Algorithms - Charles E. Leiserson, Clifford Stein,
Ronald Rivest, and Thomas H. Cormen Huffman Coding - Wikipedia
Discrete Mathematics and Its
Applications - Kenneth H. Rosen
Section 17.2: Activity
Selection Problem
The Problem
You have a set of things to do (activities). Each activity has a start time and a
end time. You aren't allowed to perform more than one activity at a time.
Your task is to find a way to perform the maximum number of activities.
For example, suppose you have a selection of classes to choose from.
Activity No.    start time    end time
1               10.20 A.M     11.00 A.M
2               10.30 A.M     11.30 A.M
3               11.00 A.M     12.00 A.M
4               10.00 A.M     11.30 A.M
5               9.00 A.M      11.00 A.M
Remember, you can't take two classes at the same time. That means you can't
take class 1 and 2 because they share a common time 10.30 A.M to 11.00
A.M. However, you can take class 1 and 3 because they don't share a
common time. So your task is to take maximum number of classes as
possible without any overlap. How can you do that?
Analysis
Let's think about the solution with a greedy approach. First of all we randomly
choose some approach and check whether it will work or not.

Sort the activities by start time, that means the activity that starts first we
will take first. Then take them from first to last from the sorted list and check
whether the current one intersects with the previously taken activity or not.
If the current activity does not intersect with the previously taken activity, we
will perform the activity, otherwise we will not. This approach will work for some
cases like
Activity No.    start time    end time
1               11.00 A.M     1.30 P.M
2               11.30 A.M     12.00 P.M
3               1.30 P.M      2.00 P.M
4               10.00 A.M     11.00 A.M
The sorting order will be 4-->1-->2-->3. The activities 4--> 1--> 3 will be
performed and activity 2 will be skipped, so a maximum of 3 activities will be
performed. It works for this type of case, but it will fail for some cases. Let's
apply this approach to the case
Activity No.    start time    end time
1               11.00 A.M     1.30 P.M
2               11.30 A.M     12.00 P.M
3               1.30 P.M      2.00 P.M
4               10.00 A.M     3.00 P.M
The sort order will be 4-->1-->2-->3 and only activity 4 will be performed,
but the answer could be that activities 1-->3 or 2-->3 will be performed.
So our approach will not work for the above case. Let's try another approach
Let's try another approach
Sort the activities by time duration, that means perform the shortest
activity first. That can solve the previous problem, although the
problem is not completely solved. There are still some cases that can fail the
solution. Apply this approach to the case below.
Activity No.    start time    end time
1               6.00 A.M      11.40 A.M
2               11.30 A.M     12.00 P.M
3               11.40 A.M     2.00 P.M
If we sort the activities by time duration the sort order will be 2--> 3 --->1,
and if we perform activity No. 2 first then no other activity can be performed.
But the answer is to perform activity 1 and then perform 3. So we can perform
a maximum of 2 activities, so this cannot be a solution to this problem. We should
try a different approach.
The solution
Sort the activities by ending time, that means the activity that finishes first
comes first. The algorithm is given below:

1. Sort the activities by its ending times.


2. If the activity to be performed does not share a common time with
the previously performed activities, perform the activity.

Lets analyse the first example


Activity No.    start time    end time
1               10.20 A.M     11.00 A.M
2               10.30 A.M     11.30 A.M
3               11.00 A.M     12.00 A.M
4               10.00 A.M     11.30 A.M
5               9.00 A.M      11.00 A.M
Sort the activities by their ending times, so the sort order will be 1-->5-->2-->4-->3.
The answer is 1-->3; these two activities will be performed, and that's the
answer. Here is the pseudo-code.

1. sort: activities
2. perform first activity from the sorted list of activities.
3. Set : Current_activity := first activity
4. set: end_time := end_time of Current_activity
5. go to next activity if it exists, if not terminate.
6. if start_time of current activity >= end_time : perform the
activity and go to 4
7. else: go to 5.

See here for coding help: http://www.geeksforgeeks.org/greedy-algorithms-set-1-activity-selection-problem/
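For reference, here is a minimal Java sketch of the earliest-finish-time strategy described above (the encoding of activities as {start, end} pairs in minutes from midnight is an assumption):

import java.util.Arrays;
import java.util.Comparator;

class ActivitySelection {
    // Greedy activity selection: sort by end time, then take every activity
    // whose start time is not earlier than the end time of the last one taken.
    static int maxActivities(int[][] activities) {       // each row: {start, end}
        Arrays.sort(activities, Comparator.comparingInt(a -> a[1]));
        int count = 0, endTime = Integer.MIN_VALUE;
        for (int[] act : activities) {
            if (act[0] >= endTime) {      // compatible with everything taken so far
                count++;
                endTime = act[1];
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // the five classes from the first example, times as minutes (10.20 A.M = 620)
        int[][] classes = {{620, 660}, {630, 690}, {660, 720}, {600, 690}, {540, 660}};
        System.out.println(maxActivities(classes));   // 2  (classes 1 and 3)
    }
}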
Section 17.3: Change-making problem
Given a money system, is it possible to give an amount of coins and how to
find a minimal set of coins corresponding to this amount.
Canonical money systems. For some money system, like the ones we use in
the real life, the "intuitive" solution works perfectly. For example, if the
different euro coins and bills (excluding cents) are 1 € , 2 € , 5 € , 10 € ,
giving the highest coin or bill until we reach the amount and repeating this
procedure will lead to the minimal set of coins.
We can do that recursively with OCaml :

These systems are made so that change-making is easy. The problem gets
harder when it comes to arbitrary money systems.
General case. How to give 99 € with coins of 10 € , 7 € and 5 € ? Here,
giving coins of 10 € until we are left with 9 € obviously leads to no solution.
Worse than that, a solution may not exist. This problem is in fact NP-hard, but
acceptable solutions mixing greediness and memoization exist. The idea is
to explore all the possibilies and pick the one with the minimal number of
coins.
To give an amount X > 0, we choose a piece P in the money system, and then
solve the sub-problem corresponding to X-P. We try this for all the pieces of
the system. The solution, if it exists, is then the smallest path that led to 0.
Here is an OCaml recursive function corresponding to this method. It returns
None if no solution exists.
Note: We can remark that this procedure may compute several times the
change set for the same value. In practice, using memoization to avoid these
repetitions leads to faster (way faster) results.
Chapter 18: Applications of Greedy
technique
Section 18.1: Oine Caching
The caching problem arises from the limitation of finite space. Let's assume
our cache C has k pages. Now we want to process a sequence of m item
requests which must have been placed in the cache before they are processed.
Of course if m <= k then we just put all elements in the cache and it will
work, but usually m >> k.
We say a request is a cache hit when the item is already in the cache, otherwise
it's called a cache miss. In that case we must bring the requested item into the
cache and evict another, assuming the cache is full. The goal is an eviction
schedule that minimizes the number of evictions.
There are numerous greedy strategies for this problem, let's look at some:
1. First in, first out (FIFO): The oldest page gets evicted
2. Last in, first out (LIFO): The newest page gets evicted
3. Least recently used (LRU): Evict the page whose most recent access was
earliest
4. Least frequently requested (LFU): Evict the page that was least frequently
requested
5. Longest forward distance (LFD): Evict the page in the cache that is not
requested until farthest in the future.
Attention: For the following examples we evict the page with the smallest
index, if more than one page could be evicted.
Example (FIFO)
Let the cache size be k=3 the initial cache a,b,c and the request
a,a,d,e,b,b,a,c,f,d,e,a,f,b,e,c:
Request     a  a  d  e  b  b  a  c  f  d  e  a  f  b  e  c
cache 1     a  a  d  d  d  d  a  a  a  d  d  d  f  f  f  c
cache 2     b  b  b  e  e  e  e  c  c  c  e  e  e  b  b  b
cache 3     c  c  c  c  b  b  b  b  f  f  f  a  a  a  e  e
cache miss        x  x  x     x  x  x  x  x  x  x  x  x  x
Thirteen cache misses by sixteen requests does not sound very optimal, lets
try the same example with another strategy:
Example (LFD)
Let the cache size be k=3 the initial cache a,b,c and the request
a,a,d,e,b,b,a,c,f,d,e,a,f,b,e,c:

Request     a  a  d  e  b  b  a  c  f  d  e  a  f  b  e  c
cache 1     a  a  d  e  e  e  e  e  e  e  e  e  e  e  e  c
cache 2     b  b  b  b  b  b  a  a  a  a  a  a  f  f  f  f
cache 3     c  c  c  c  c  c  c  c  f  d  d  d  d  b  b  b
cache miss        x  x        x     x  x        x  x     x
Eight cache misses is a lot better.
Self-test: Do the example for LIFO, LFU and LRU and see what happens.
The following example program (written in C++) consists of two parts:
The skeleton is an application which solves the problem depending on the
chosen greedy strategy:
The basic idea is simple: for every request I have two calls to my strategy:
1. apply: The strategy has to tell the caller which page to use
2. update: After the caller uses the page, it tells the strategy whether it was
a miss or not. Then the strategy may update its internal data. The
strategy LFU for example has to update the hit frequency for the cache
pages, while the LFD strategy has to recalculate the distances for the
cache pages.
Now let's look at example implementations for our five strategies:
FIFO
FIFO just needs the information of how long a page has been in the cache (and of
course only relative to the other pages). So the only thing to do is wait for a
miss and then make the pages which were not evicted older. For our
example above the program solution is:
That's exactly the solution from above.
LIFO
The implementation of LIFO is more or less the same as for FIFO, but we
evict the youngest, not the oldest, page. The program results are:
LRU

In the case of LRU the strategy is independent from what is at the cache page, its
only interest is the last usage. The program results are:
LFU
LFU evicts the page used least often. So the update strategy is just to count
every access. Of course after a miss the count resets. The program results are:
LFD
The LFD strategy is different from all the ones before. It's the only strategy that
uses the future requests for its decision on which page to evict. The implementation
uses the function calcNextUse to get the page whose next use is farthest away in
the future. The program solution is equal to the solution by hand from above:

The greedy strategy LFD is indeed the only optimal strategy of the five
presented. The proof is rather long and can be found here or in the book by
Jon Kleinberg and Eva Tardos (see sources in remarks down below).
Algorithm vs Reality
The LFD strategy is optimal, but there is a big problem. It's an optimal
offline solution. In practice, caching is usually an online problem, which means
the strategy is useless because we cannot know the next time we need a
particular item. The other four strategies are also online strategies. For
online problems we need a generally different approach.
Section 18.2: Ticket automat
First simple Example:
You have a ticket automat which gives exchange in coins with values 1, 2, 5,
10 and 20. The dispension of the exchange can be seen as a series of coin
drops until the right value is dispensed. We say a dispension is optimal when
its coin count is minimal for its value.
Let M in [1,50] be the price for the ticket T and P in [1,50] the money somebody
paid for T, with P >= M. Let D = P - M.
We define the benefit of a step as the difference between D and D - c, with c the
coin the automat dispenses in this step.
The Greedy Technique for the exchange is the following pseudo-algorithmic
approach:
Step 1: while D >= 20 dispense a 20 coin and set D = D - 20
Step 2: while D >= 10 dispense a 10 coin and set D = D - 10
Step 3: while D >= 5 dispense a 5 coin and set D = D - 5
Step 4: while D >= 2 dispense a 2 coin and set D = D - 2
Step 5: while D >= 1 dispense a 1 coin and set D = D - 1
Afterwards the sum of all coins clearly equals D. It's a greedy algorithm
because after each step and after each repetition of a step the benefit is
maximized. We cannot dispense another coin with a higher benefit.
Now the ticket automat as a program (in C++):
Be aware there is no input checking, to keep the example simple. One
example output:
As long as 1 is among the coin values we know that the algorithm will terminate,
because:
D strictly decreases with every step
D is never > 0 and smaller than the smallest coin 1 at the same
time
But the algorithm has two pitfalls:
1. Let C be the biggest coin value. The runtime is only polynomial as long
as D/C is polynomial, because the representation of D uses only log D bits
and the runtime is at least linear in D/C.
2. In every step our algorithm chooses the local optimum. But this is not
sufficient to say that the algorithm finds the global optimal solution (see
more information here or in the Book of Korte and Vygen).
A simple counter example: the coins are 1,3,4 and D=6. The optimal
solution is clearly two coins of value 3 but greedy chooses 4 in the first
step so it has to choose 1 in steps two and three. So it gives no optimal
solution. A possible optimal algorithm for this example is based on
dynamic programming.
Section 18.3: Interval Scheduling
We have a set of jobs J={a,b,c,d,e,f,g}. Let j in J be a job, then it starts at sj and
ends at fj. Two jobs are compatible if they don't overlap. A picture as an
example:
The goal is to find the maximum subset of mutually compatible jobs.


There are several greedy approaches for this problem:
1. Earliest start time: Consider jobs in ascending order of sj
2. Earliest finish time: Consider jobs in ascending order of fj
3. Shortest interval: Consider jobs in ascending order of fj - sj
4. Fewest conflicts: For each job j, count the number of conflicting jobs cj
The question now is, which approach is really successful? Earliest start time is
definitely not, here is a counter example:
Shortest interval is not optimal either


and fewest conflicts may indeed sound optimal, but here is a problem case
for this approach:

Which leaves us with earliest finish time. The pseudo code is quite simple:
1. Sort jobs by finish time so that f1<=f2<=...<=fn
2. Let A be an empty set
3. for j=1 to n if j is compatible to all jobs in A set A=A+{j}
4. A is a maximum subset of mutually compatible jobs
Or as C++ program:
The implementation of the algorithm is clearly in Θ (n^2). There is a Θ (n
log n) implementation and the interested reader may continue reading below
(Java Example).
Now we have a greedy algorithm for the interval scheduling problem, but is it
optimal?
Proposition: The greedy algorithm earliest finish time is optimal.
Proof: (by contradiction)
Assume greedy is not optimal and let i1,i2,...,ik denote the set of
jobs selected by greedy. Let j1,j2,...,jm denote the set of jobs in an optimal
solution with i1 = j1, i2 = j2, ..., ir = jr for the largest possible value of r.
The job i(r+1) exists and finishes before j(r+1) (earliest finish). But then
j1,j2,...,jr,i(r+1),j(r+2),...,jm is also an optimal solution and for all k in
[1,(r+1)] it is jk = ik. That's a contradiction to the maximality of r. This
concludes the proof.
This second example demonstrates that there are usually many possible
greedy strategies but only some or even none might find the optimal solution
in every instance.
Below is a Java program that runs in Θ (n log n)

import java.util.Arrays;
import java.util.Comparator;

class Job {
    int start, finish, profit;

    Job(int start, int finish, int profit) {
        this.start = start;
        this.finish = finish;
        this.profit = profit;
    }
}

class JobComparator implements Comparator<Job> {
    public int compare(Job a, Job b) {
        return a.finish < b.finish ? -1 : a.finish == b.finish ? 0 : 1;
    }
}

public class WeightedIntervalScheduling {
    // Latest job (by index) that finishes before jobs[index] starts, or -1.
    static public int binarySearch(Job jobs[], int index) {
        int lo = 0, hi = index - 1;
        while (lo <= hi) {
            int mid = (lo + hi) / 2;
            if (jobs[mid].finish <= jobs[index].start) {
                if (jobs[mid + 1].finish <= jobs[index].start)
                    lo = mid + 1;
                else
                    return mid;
            } else
                hi = mid - 1;
        }
        return -1;
    }

    static public int schedule(Job jobs[]) {
        Arrays.sort(jobs, new JobComparator());
        int n = jobs.length;
        int table[] = new int[n];
        table[0] = jobs[0].profit;
        for (int i = 1; i < n; i++) {
            int inclProf = jobs[i].profit;
            int l = binarySearch(jobs, i);
            if (l != -1)
                inclProf += table[l];
            table[i] = Math.max(inclProf, table[i - 1]);
        }
        return table[n - 1];
    }

    public static void main(String[] args) {
        // example input (assumed; the original data is not shown in this copy)
        Job jobs[] = {new Job(1, 2, 50), new Job(3, 5, 20),
                      new Job(6, 19, 100), new Job(2, 100, 200)};
        System.out.println("Optimal profit is " + schedule(jobs));
    }
}

And the expected output is:
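With the assumed example data in the main method above, the program prints:

Optimal profit is 250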


Section 18.4: Minimizing Lateness
There are numerous problems minimizing lateness; here we have a single resource
which can only process one job at a time. Job j requires tj units of processing
time and is due at time dj. If j starts at time sj it will finish at time
fj = sj + tj. We define the lateness L = max{0, fj - dj} for all j. The goal is to
minimize the maximum lateness L.
Job   1   2   3   4   5   6
tj    3   2   1   4   3   2
dj    6   8   9   9   10  11

Job   3  2  2  5  5  5  4  4  4  4  1  1  1  6  6
Time  1  2  3  4  5  6  7  8  9  10 11 12 13 14 15

Lj    -8  -5  -4  1   7   4
The solution L=7 is obviously not optimal. Lets look at some greedy
strategies:
1. Shortest processing time first: schedule jobs in ascending order of
processing time tj
2. Earliest deadline first: Schedule jobs in ascending order of deadline dj
3. Smallest slack: schedule jobs in ascending order of slack dj - tj
It's easy to see that shortest processing time first is not optimal; a good
counter example is

Job  1   2
tj   1   5
dj   10  5
The smallest slack solution has similar problems:

Job  1  2
tj   1  5
dj   3  5
The last strategy looks valid, so we start with some pseudo code:

1. Sort n jobs by due time so that d1<=d2<=...<=dn
2. Set t=0
3. for j=1 to n:
   Assign job j to interval [t, t + tj]
   set sj = t and fj = t + tj
   set t = t + tj
4. return intervals [s1,f1], [s2,f2], ..., [sn,fn]
And as implementation in C++:
And the output for this program is:

The runtime of the algorithm is obviously Θ (n log n) because sorting is the
dominating operation of this algorithm. Now we need to show that it is
optimal. Clearly an optimal schedule has no idle time. The earliest deadline
first schedule also has no idle time.
Let's assume the jobs are numbered so that d1<=d2<=...<=dn. We say an inversion
of a schedule is a pair of jobs i and j so that i<j but j is scheduled before i. Due
to its definition the earliest deadline first schedule has no inversions. Of
course if a schedule has an inversion it has one with a pair of inverted jobs
scheduled consecutively.
Proposition: Swapping two adjacent, inverted jobs reduces the number of
inversions by one and does not increase the maximal lateness.
Proof:
Let L be the lateness before the swap and M the lateness afterwards. Because
exchanging two adjacent jobs does not move the other jobs from their position,
it is Lk = Mk for k != i,j.

Clearly it is Mi <= Li, since job i got scheduled earlier. If job j is late, it
follows from the definition:

    Mj = fi - dj    (definition)
       <= fi - di   (since i and j are exchanged)
       <= Li

That means the lateness after the swap is less than or equal to the lateness
before. This concludes the proof.
Proposition: The earliest deadline first schedule S is optimal.
Proof:(by contradiction)
Lets assume S* is optimal schedule with the fewest possible number of
inversions. we can assume that S* has no idle time. If S* has no inversions,
then S=S* and we are done. If S* has an inversion, than it has an adjacent
inversion. The last Proposition states that we can swap the adjacent inversion
without increasing lateness but with decreasing the number of inversions.
This contradicts the definition of S*.
The minimizing lateness problem and its closely related minimum makespan
problem, where the question of a minimal schedule is asked, have lots of
applications in the real world. But usually you don't have only one machine
but many, and they handle the same task at different rates. These problems get
NP-complete really fast.
Another interesting question arises if we don't look at the offline problem,
where we have all tasks and data at hand, but at the online variant, where
tasks appear during execution.
Chapter 19: Prim's Algorithm
Section 19.1: Introduction To Prim's
Algorithm
Let's say we have 8 houses. We want to setup telephone lines between these
houses. The edges between the houses represent the cost of setting up a line
between two houses.

Our task is to set up lines in such a way that all the houses are connected and
the cost of setting up the whole connection is minimum. Now how do we
find that out? We can use Prim's Algorithm.
Prim's Algorithm is a greedy algorithm that finds a minimum spanning tree
for a weighted undirected graph. This means it finds a subset of the edges
that forms a tree that includes every node, where the total weight of all the
edges in the tree are minimized. The algorithm was developed in 1930 by
Czech mathematician Vojtěch Jarník and later rediscovered and republished
by computer scientist Robert Clay Prim in 1957 and Edsger Wybe Dijkstra in
1959. It is also known as DJP algorithm, Jarnik's algorithm, Prim-Jarnik
algorithm or Prim-Dijsktra algorithm.
Now let's look at the technical terms first. If we create a graph, S using some
nodes and edges of an undirected graph G, then S is called a subgraph of the
graph G. Now S will be called a Spanning Tree if and only if:
It contains all the nodes of G.
It is a tree, that means there is no cycle and all the nodes are connected.
There are (n-1) edges in the tree, where n is the number of nodes in G.
There can be many Spanning Tree's of a graph. The Minimum Spanning
Tree of a weighted undirected graph is a tree, such that sum of the weight of
the edges is minimum. Now we'll use Prim's algorithm to find out the
minimum spanning tree, that is how to set up the telephone lines in our
example graph in such way that the cost of set up is minimum.
At first we'll select a source node. Let's say, node-1 is our source. Now we'll
add the edge from node-1 that has the minimum cost to our subgraph. Here
we mark the edges that are in the subgraph using the color blue. Here 1-5 is
our desired edge.
Now we consider all the edges from node-1 and node-5 and take the
minimum. Since 1-5 is already marked, we
take 1-2.
This time, we consider node-1, node-2 and node-5 and take the minimum
edge which is 5-4.
The next step is important. From node-1, node-2, node-5 and node-4, the
minimum edge is 2-4. But if we select that one, it'll create a cycle in our
subgraph. This is because node-2 and node-4 are already in our subgraph. So
taking edge 2-4 doesn't benefit us. We'll select the edges in such way that it
adds a new node in our subgraph. So we select edge 4-8.
If we continue this way, we'll select edge 8-6, 6-7 and 4-3. Our subgraph will
look like:
This is our desired subgraph, that'll give us the minimum spanning tree. If we
remove the edges that we didn't
select, we'll get:
This is our minimum spanning tree (MST). So the cost of setting up the
telephone connections is: 4 + 2 + 5 + 11 + 9 + 2 + 1 = 34. And the set of
houses and their connections are shown in the graph. There can be multiple
MST of a graph. It depends on the source node we choose.
The pseudo-code of the algorithm is given below:

Procedure PrimsMST(Graph): // here Graph is a non-empty connected weighted graph


Vnew[] = {x} // New subgraph Vnew with source node x

Complexity:
Time complexity of the above naive approach is O(V²). It uses an adjacency
matrix. We can reduce the complexity using a priority queue. When we add a
new node to Vnew, we can add its adjacent edges to the priority queue. Then
pop the minimum weighted edge from it. Then the complexity will be:
O(ElogE), where E is the number of edges.
Again a Binary Heap can be constructed to reduce the complexity to
O(ElogV).
The pseudo-code using Priority Queue is given below:
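The original pseudo-code listing is missing from this copy; a minimal Java sketch of the priority-queue version, using the key[] and parent[] arrays described just below (the adjacency-list encoding is an assumption):

import java.util.Arrays;
import java.util.PriorityQueue;

class PrimMST {
    // adj[u] = array of {v, weight}; key[v] holds the cheapest known edge weight
    // connecting v to the tree, parent[v] the other endpoint of that edge.
    static int[] prim(int[][][] adj, int src) {
        int n = adj.length;
        int[] key = new int[n], parent = new int[n];
        boolean[] inMST = new boolean[n];
        Arrays.fill(key, Integer.MAX_VALUE);
        Arrays.fill(parent, -1);
        key[src] = 0;
        PriorityQueue<int[]> pq = new PriorityQueue<>((a, b) -> a[1] - b[1]); // {node, key}
        pq.add(new int[]{src, 0});
        while (!pq.isEmpty()) {
            int u = pq.poll()[0];
            if (inMST[u]) continue;                   // stale queue entry
            inMST[u] = true;
            for (int[] edge : adj[u]) {
                int v = edge[0], w = edge[1];
                if (!inMST[v] && w < key[v]) {        // found a cheaper edge into v
                    key[v] = w;
                    parent[v] = u;
                    pq.add(new int[]{v, w});
                }
            }
        }
        return parent;                                // parent[] describes the MST
    }
}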

Here key[] stores the minimum cost of traversing node-v. parent[] is used to
store the parent node. It is useful for traversing and printing the tree.
Below is a simple program in Java:

import java.util.*;

public class Graph {
    static final int infinite = 9999999;
    int[][] LinkCost;
    int NNodes;

    // Build the cost matrix from an adjacency matrix; 0 means "no edge".
    Graph(int[][] mat) {
        int i, j;
        NNodes = mat.length;
        LinkCost = new int[NNodes][NNodes];
        for (i = 0; i < NNodes; i++)
            for (j = 0; j < NNodes; j++) {
                LinkCost[i][j] = mat[i][j];
                if (LinkCost[i][j] == 0)
                    LinkCost[i][j] = infinite;
            }
        for (i = 0; i < NNodes; i++) {
            for (j = 0; j < NNodes; j++)
                if (LinkCost[i][j] < infinite)
                    System.out.print(" " + LinkCost[i][j] + " ");
                else
                    System.out.print(" * ");
            System.out.println();
        }
    }

    public int unReached(boolean[] r) {
        for (int i = 0; i < r.length; i++)
            if (r[i] == false)
                return i;
        return -1;
    }

    public void Prim() {
        int i, j, k, x, y;
        boolean[] Reached = new boolean[NNodes];
        int[] predNode = new int[NNodes];
        Reached[0] = true;                           // start the tree at node 0
        for (k = 1; k < NNodes; k++)
            Reached[k] = false;
        predNode[0] = 0;
        printReachSet(Reached);
        for (k = 1; k < NNodes; k++) {
            x = y = 0;
            for (i = 0; i < NNodes; i++)             // cheapest edge leaving the tree
                for (j = 0; j < NNodes; j++)
                    if (Reached[i] && !Reached[j] && LinkCost[i][j] < LinkCost[x][y]) {
                        x = i;
                        y = j;
                    }
            System.out.println("Min cost edge: (" + x + "," + y + ") cost = " + LinkCost[x][y]);
            predNode[y] = x;
            Reached[y] = true;
            printReachSet(Reached);
            System.out.println();
        }
        int[] a = predNode;
        for (i = 0; i < NNodes; i++)
            System.out.println(a[i] + " --> " + i);
    }

    void printReachSet(boolean[] Reached) {
        System.out.print("ReachSet = ");
        for (int i = 0; i < NNodes; i++)
            if (Reached[i])
                System.out.print(i + " ");
        System.out.println();
    }

    public static void main(String[] args) {
        // example adjacency matrix (assumed; the original input is not shown in this copy)
        int[][] conn = {
            {0, 2, 0, 6},
            {2, 0, 3, 8},
            {0, 3, 0, 0},
            {6, 8, 0, 0}
        };
        Graph G = new Graph(conn);
        G.Prim();
    }
}
Compile the above code using:

javac Graph.java

Output:
Chapter 20: Bellman–Ford
Algorithm
Section 20.1: Single Source Shortest
Path Algorithm (Given there is a
negative cycle in a graph)
Before reading this example, it is required to have a brief idea on edge-
relaxation. You can learn it from here
The Bellman-Ford Algorithm computes the shortest paths from a single source
vertex to all of the other vertices in a weighted digraph. Even though it is
slower than Dijkstra's Algorithm, it works in cases where the weight of
an edge is negative and it also finds negative weight cycles in the graph. The
problem with Dijkstra's Algorithm is, if there's a negative cycle, you keep
going through the cycle again and again and keep reducing the distance
between two vertices.
The idea of this algorithm is to go through all the edges of this graph one-by-
one in some random order. It can be any random order. But you must ensure,
if u-v (where u and v are two vertices in a graph) is one of your orders, then
there must be an edge from u to v. Usually it is taken directly from the order
of the input given. Again, any random order will work.
After selecting the order, we will relax the edges according to the relaxation
formula. For a given edge u-v going from u to v the relaxation formula is:
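In pseudo-code (restating the sentence that follows):

if d[u] + cost[u][v] < d[v]
    d[v] = d[u] + cost[u][v]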

That is, if the distance from source to any vertex u + the weight of the edge
u-v is less than the distance from source to another vertex v, we update the
distance from source to v. We need to relax the edges at most (V-1) times,
where V is the number of vertices in the graph. Why (V-1) you ask? We'll
explain it in another example. Also we are going to keep track of the parent
vertex of any vertex, that is, when we relax an edge, we will set:

parent[v] = u
It means we've found another shorter path to reach v via u. We will need this
later to print the shortest path from source to the destined vertex.
Let's look at an example. We have a graph:

We have selected 1 as the source vertex. We want to find out the shortest
path from the source to all other vertices.
At first, d[1] = 0 because it is the source. And rest are infinity, because we
don't know their distance yet.
We will relax the edges in this sequence:

+--------+--------+--------+--------+--------+--------+--------+
| Serial | 1 | 2 | 3 | 4 | 5 | 6 |
+--------+--------+--------+--------+--------+--------+--------+
| Edge | 4->5 | 3->4 | 1->3 | 1->4 | 4->6 | 2->3 |
+--------+--------+--------+--------+--------+--------+--------+

You can take any sequence you want. If we relax the edges once, what do we
get? We get the distance from source to all other vertices of the path that
uses at most 1 edge. Now let's relax the edges and update the values of d[].
We get:
1. d[4] + cost[4][5] = infinity + 7 = infinity. We can't update this one.
2. d[3] + cost[3][4] = infinity. We can't update this one.
3. d[1] + cost[1][3] = 0 + 2 = 2 < d[3]. So d[3] = 2. Also parent[3] = 1.
4. d[1] + cost[1][4] = 4 < d[4] (infinity). So d[4] = 4. parent[4] = 1.
5. d[4] + cost[4][6] = 9 < d[6] (infinity). So d[6] = 9. parent[6] = 4.
6. d[2] + cost[2][3] = infinity. We can't update this one.

Our second iteration will provide us with the path using 2 nodes. We get:
1. d[4] + cost[4][5] = 12 < d[5]. d[5] = 12. parent[5] = 4.
2. d[3] + cost[3][4] = 1 < d[4]. d[4] = 1. parent[4] = 3.
3. d[3] remains unchanged.
4. d[4] remains unchanged.
5. d[4] + cost[4][6] = 6 < d[6]. d[6] = 6. parent[6] = 4.
6. d[3] remains unchanged.
Our graph will look like:
Our 3rd iteration will only update vertex 5, where d[5] will be 8. Our graph
will look like:

After this no matter how many iterations we do, we'll have the same
distances. So we will keep a flag that checks if any update takes place or not.
If it doesn't, we'll simply break the loop. Our pseudo-code will be:
To keep track of negative cycle, we can modify our code using the procedure
described here. Our completed pseudo-code will be:

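The pseudo-code is not reproduced here; the following Java sketch (an illustrative assumption on our part, not the original listing) combines both ideas from the text: relax all edges at most (V-1) times with an "updated" flag for early termination, and run one extra pass to detect a negative cycle. The Edge class and the small graph in main() are hypothetical.

import java.util.*;

public class BellmanFord {
    static class Edge {
        final int u, v, w;                     // directed edge u -> v with weight w
        Edge(int u, int v, int w) { this.u = u; this.v = v; this.w = w; }
    }

    static final int INF = Integer.MAX_VALUE / 2;

    // Fills d[] and parent[]; returns false if a negative cycle is reachable from the source.
    static boolean bellmanFord(int n, List<Edge> edges, int source, int[] d, int[] parent) {
        Arrays.fill(d, INF);
        Arrays.fill(parent, -1);
        d[source] = 0;

        for (int i = 1; i <= n - 1; i++) {     // relax every edge at most (V-1) times
            boolean updated = false;           // flag: did this pass change anything?
            for (Edge e : edges) {
                if (d[e.u] < INF && d[e.u] + e.w < d[e.v]) {   // skip edges leaving unreached vertices
                    d[e.v] = d[e.u] + e.w;
                    parent[e.v] = e.u;
                    updated = true;
                }
            }
            if (!updated) break;               // nothing changed: the distances are final
        }

        for (Edge e : edges)                   // one extra pass: a further improvement
            if (d[e.u] < INF && d[e.u] + e.w < d[e.v])   // means there is a negative weight cycle
                return false;
        return true;
    }

    public static void main(String[] args) {
        int n = 3;                             // small hypothetical graph, vertices 1..3
        List<Edge> edges = Arrays.asList(new Edge(1, 2, 4), new Edge(1, 3, 5), new Edge(2, 3, -3));
        int[] d = new int[n + 1], parent = new int[n + 1];
        if (bellmanFord(n, edges, 1, d, parent))
            System.out.println(d[1] + " " + d[2] + " " + d[3]);   // 0 4 1
        else
            System.out.println("negative cycle detected");
    }
}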
Printing Path:
To print the shortest path to a vertex, we'll iterate back to its parent until we
find NULL and then print the vertices.
The pseudo-code will be:
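A minimal Java sketch of this path printing (the class name and the sample parent[] array are hypothetical; -1 plays the role of NULL here):

import java.util.ArrayDeque;
import java.util.Deque;

public class PrintPath {
    // Walk back from v via parent[], stacking the vertices, then print them source-first.
    static void printPath(int[] parent, int v) {
        Deque<Integer> stack = new ArrayDeque<>();
        for (int cur = v; cur != -1; cur = parent[cur])
            stack.push(cur);
        StringBuilder sb = new StringBuilder();
        while (!stack.isEmpty()) {
            sb.append(stack.pop());
            if (!stack.isEmpty()) sb.append(" -> ");
        }
        System.out.println(sb);
    }

    public static void main(String[] args) {
        int[] parent = {-1, -1, 1, 1, 3, 4, 4};   // hypothetical parent[] for vertices 1..6
        printPath(parent, 5);                     // 1 -> 3 -> 4 -> 5
    }
}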
Complexity:
Since we need to relax the edges maximum (V-1) times, the time complexity
of this algorithm will be equal to O(V * E) where E denotes the number of
edges, if we use adjacency list to represent the graph. However, if adjacency matrix
is used to represent the graph, time complexity will be O(V^3). Reason is
we can iterate through all edges in O(E) time when adjacency list is used, but it
takes O(V^2) time when adjacency matrix is used.
Section 20.2: Detecting Negative Cycle
in a Graph
To understand this example, it is recommended to have a brief idea about
Bellman-Ford algorithm which can be found here
Using Bellman-Ford algorithm, we can detect if there is a negative cycle in
our graph. We know that, to find out the shortest path, we need to relax all
the edges of the graph (V-1) times, where V is the number of vertices in a
graph. We have already seen that in this example, after (V-1) iterations, we
can't update d[], no matter how many iterations we do. Or can we?
If there is a negative cycle in a graph, even after (V-1) iterations, we can
update d[]. This happens because for every iteration, traversing through the
negative cycle always decreases the cost of the shortest path. This is why
the Bellman-Ford algorithm limits the number of iterations to (V-1). If we used
Dijkstra's Algorithm here, we'd be stuck in an endless loop. However, let's
concentrate on finding negative cycle.
Let's assume, we have a graph:

Let's pick vertex 1 as the source. After applying Bellman-Ford's single


source shortest path algorithm to the graph, we'll find out the distances from
the source to all the other vertices.
This is how the graph looks after (V-1) = 3 iterations. This should be the
final result, since with 4 vertices we need at most 3 iterations to find out the
shortest path. So either this is the answer, or there is a negative weight cycle
in the graph. To find that, after (V-1) iterations, we do one more final
iteration and if the distance continues to decrease, it means that there is
definitely a negative weight cycle in the graph.
For this example: if we check 2-3, d[2] + cost[2][3] will give us 1 which is
less than d[3]. So we can conclude that there is a negative cycle in our graph.
So how do we find out the negative cycle? We do a bit modification to
Bellman-Ford procedure:
This is how we find out if there is a negative cycle in a graph. We can also
modify Bellman-Ford Algorithm to keep track of negative cycles.
Section 20.3: Why do we need to relax
all the edges at most (V-1) times
To understand this example, it is recommended to have a brief idea on
Bellman-Ford single source shortest path algorithm which can be found here
In Bellman-Ford algorithm, to find out the shortest path, we need to relax all
the edges of the graph. This process is repeated at most (V-1) times, where V
is the number of vertices in the graph.
The number of iterations needed to find out the shortest path from source to
all other vertices depends on the order that we select to relax the edges.
Let's take a look at an example:

Here, the source vertex is 1. We will find out the shortest distance between
the source and all the other vertices. We can clearly see that, to reach vertex
4, in the worst case, it'll take (V-1) edges. Now depending on the order in
which the edges are relaxed, it might take (V-1) iterations to discover vertex
4. Didn't get it? Let's use Bellman-Ford algorithm to find out the shortest
path here:
We're going to use this sequence:

+--------+--------+--------+--------+
| Serial |   1    |   2    |   3    |
+--------+--------+--------+--------+
| Edge   |  3->4  |  2->3  |  1->2  |
+--------+--------+--------+--------+
For our first iteration:


1. d[3] + cost[3][4] = infinity. It won't change anything.
2. d[2] + cost[2][3] = infinity. It won't change anything.
3. d[1] + cost[1][2] = 2 < d[2]. d[2] = 2. parent[2] = 1.
We can see that our relaxation process only changed d[2]. Our graph will
look like:

Second iteration:
1. d[3] + cost[3][4] = infinity. It won't change anything.
2. d[2] + cost[2][3] = 5 < d[3]. d[3] = 5. parent[3] = 2.
3. It won't be changed.
This time the relaxation process changed d[3]. Our graph will look like:

Third iteration:
1. d[3] + cost[3][4] = 7 < d[4]. d[4] = 7. parent[4] = 3.
2. It won't be changed.
3. It won't be changed.
Our third iteration finally found out the shortest path to 4 from 1. Our graph
will look like:

So, it took 3 iterations to find out the shortest path. After this one, no matter
how many times we relax the edges, the values in d[] will remain the same.
Now, if we considered another sequence:
We'd get:
1. d[1] + cost[1][2] = 2 < d[2]. d[2] = 2.
2. d[2] + cost[2][3] = 5 < d[3]. d[3] = 5.
3. d[3] + cost[3][4] = 7 < d[4]. d[4] = 7.
Our very first iteration has found the shortest path from source to all the
other nodes. Another sequence 1->2, 3->4, 2->3 is possible, which will give
us shortest path after 2 iterations. We can come to the decision that, no matter
how we arrange the sequence, it won't take more than 3 iterations to find out
shortest path from the source in this example.
We can conclude that, for the best case, it'll take 1 iteration to find out the
shortest path from source. For the worst case, it'll take (V-1) iterations,
which is why we repeat the process of relaxation (V-1) times.
Chapter 21: Line Algorithm
Line drawing is accomplished by calculating intermediate positions
along the line path between two specified endpoint positions. An output
device is then directed to fill in these positions between the endpoints.
Section 21.1: Bresenham Line
Drawing Algorithm
Background Theory: Bresenham's Line Drawing Algorithm is an efficient
and accurate raster line generating algorithm developed by Bresenham. It
involves only integer calculations, so it is accurate and fast. It can also be
extended to display circles and other curves.
In Bresenham line drawing algorithm:
For slope |m| < 1:
Either the value of x is increased
OR both x and y are increased, using the decision parameter.
For slope |m| > 1:
Either the value of y is increased
OR both x and y are increased, using the decision parameter.
Algorithm for slope |m|<1:
1. Input two end points (x1,y1) and (x2,y2) of the line.
2. Plot the first point (x1,y1).
3. Calculate
delx = |x2 - x1|
dely = |y2 - y1|
4. Obtain the initial decision parameter as p = 2 * dely - delx
5. For i = 0 to delx in steps of 1
If p < 0 then
x1 = x1 + 1
Plot(x1, y1)
p = p + 2 * dely
Else
x1 = x1 + 1
y1 = y1 + 1
Plot(x1, y1)
p = p + 2 * dely - 2 * delx
End if
End for
6. END

Source Code:
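The source code is not included above; the following is a minimal Java sketch of the |m| < 1 case, assuming x2 > x1 and y2 > y1 (a positive slope below 1). The plot() helper is a hypothetical stand-in for whatever pixel-drawing routine is available.

public class BresenhamLowSlope {
    static void plot(int x, int y) { System.out.println("(" + x + ", " + y + ")"); }

    static void drawLine(int x1, int y1, int x2, int y2) {
        int delx = Math.abs(x2 - x1);
        int dely = Math.abs(y2 - y1);
        int p = 2 * dely - delx;                 // initial decision parameter
        plot(x1, y1);
        for (int i = 0; i < delx; i++) {
            x1 = x1 + 1;                         // x always advances
            if (p < 0) {
                p = p + 2 * dely;
            } else {
                y1 = y1 + 1;                     // y advances only when p >= 0
                p = p + 2 * dely - 2 * delx;
            }
            plot(x1, y1);
        }
    }

    public static void main(String[] args) { drawLine(2, 3, 10, 8); }
}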
Algorithm for slope |m|>1:
1. Input two end points (x1,y1) and (x2,y2) of the line.
2. Plot the first point (x1,y1).
3. Calculate
delx = |x2 - x1|
dely = |y2 - y1|
4. Obtain the initial decision parameter as p = 2 * delx - dely
5. For i = 0 to dely in steps of 1
If p < 0 then
y1 = y1 + 1
Plot(x1, y1)
p = p + 2 * delx
Else
x1 = x1 + 1
y1 = y1 + 1
Plot(x1, y1)
p = p + 2 * delx - 2 * dely
End if
End for
6. END
Source Code:
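Again, the original source code is not included; a minimal Java sketch of the |m| > 1 case under the same assumptions (positive slope, y2 > y1, x2 > x1), with the roles of x and y swapped:

public class BresenhamHighSlope {
    static void plot(int x, int y) { System.out.println("(" + x + ", " + y + ")"); }

    static void drawLine(int x1, int y1, int x2, int y2) {
        int delx = Math.abs(x2 - x1);
        int dely = Math.abs(y2 - y1);
        int p = 2 * delx - dely;                 // initial decision parameter
        plot(x1, y1);
        for (int i = 0; i < dely; i++) {
            y1 = y1 + 1;                         // y always advances when |m| > 1
            if (p < 0) {
                p = p + 2 * delx;
            } else {
                x1 = x1 + 1;                     // x advances only when p >= 0
                p = p + 2 * delx - 2 * dely;
            }
            plot(x1, y1);
        }
    }

    public static void main(String[] args) { drawLine(3, 2, 8, 10); }
}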
Chapter 22: Floyd-Warshall
Algorithm
Section 22.1: All Pair Shortest Path
Algorithm
Floyd-Warshall's algorithm is for finding shortest paths in a weighted graph
with positive or negative edge weights. A single execution of the algorithm
will find the lengths (summed weights) of the shortest paths between all pair
of vertices. With a little variation, it can print the shortest path and can detect
negative cycles in a graph. Floyd-Warshall is a Dynamic-Programming
algorithm.
Let's look at an example. We're going to apply Floyd-Warshall's algorithm on
this graph:


First thing we do is, we take two 2D matrices. These are adjacency matrices.
The size of the matrices is going to be the total number of vertices. For our
graph, we will take 4 * 4 matrices. The Distance Matrix is going to store the
minimum distance found so far between two vertices. At first, for the edges,
if there is an edge between u-v and the distance/weight is w, we'll store
distance[u][v] = w. For all the edges that don't exist, we're going to put infinity. The Path
Matrix is for regenerating the minimum distance path between two vertices. So
initially, if there is an edge between u and v, we're going to put path[u][v] = u. This
means the best way to come to vertex-v from vertex-u is to use the edge that
connects v with u. If there is no edge between two vertices, we're going to put
N there, indicating there is no path available for now. The two tables for our
graph will look like:
+-----+-----+-----+-----+-----+ +-----+-----+-----+-----+-----+
| | 1 | 2 | 3 | 4 | | | 1 | 2 | 3 | 4 |
+-----+-----+-----+-----+-----+ +-----+-----+-----+-----+-----+
| 1 | 0 | 3 | 6 | 15 | | 1 | N | 1 | 1 | 1 |
+-----+-----+-----+-----+-----+ +-----+-----+-----+-----+-----+
| 2 | inf | 0 | -2 | inf | | 2 | N | N | 2 | N |
+-----+-----+-----+-----+-----+ +-----+-----+-----+-----+-----+
| 3 | inf | inf | 0 | 2 | | 3 | N | N | N | 3 |
+-----+-----+-----+-----+-----+ +-----+-----+-----+-----+-----+

| 4 | 1 | inf | inf | 0 |   | 4 | 4 | N | N | N |
+-----+-----+-----+-----+-----+ +-----+-----+-----+-----+-----+
          distance                        path

Since there are no self-loops, the diagonal of the path matrix is set to N, and the
distance from a vertex to itself is 0.
To apply Floyd-Warshall algorithm, we're going to select a middle vertex k.
Then for each vertex i, we're going to check if we can go from i to k and then
k to j, where j is another vertex and minimize the cost of going from i to j. If
the current distance[i][j] is greater than distance[i][k] + distance[k][j],
we're going to put distance[i][j] equals to the summation of those two
distances. And the path[i][j] will be set to path[k][j], as it is better to go
from i to k, and then k to j. All the vertices will be selected as k. We'll have
3 nested loops: for k going from 1 to 4, i going from 1 to 4 and j going from
1 to 4. We're going to check:

if distance[i][k] + distance[k][j] < distance[i][j]
    distance[i][j] = distance[i][k] + distance[k][j]
    path[i][j] = path[k][j]
So what we're basically checking is, for every pair of vertices, do we get a
shorter distance by going through another vertex? The total number of
operations for our graph will be 4 * 4 * 4 = 64. That means we're going to do
this check 64 times. Let's look at a few of them:
When k = 1, i = 2 and j = 3, distance[i][j] is -2, which is not greater than
distance[i][k] + distance[k][j] = -2 + 0 = -2. So it will remain unchanged.
Again, when k = 1, i = 4 and j = 2, distance[i][j] = infinity, which is greater
than distance[i][k] + distance[k][j] = 1 + 3 = 4. So we put distance[i][j] =
4, and we put path[i][j] = path[k][j] = 1. What this means is, to go from
vertex-4 to vertex-2, the path 4->1->2 is shorter than the existing path. This
is how we populate both matrices. The calculation for each step is shown
here. After making necessary changes, our matrices will look like:
+-----+-----+-----+-----+-----+ +-----+-----+-----+-----+-----+
| | 1 | 2 | 3 | 4 | | | 1 | 2 | 3 | 4 |
+-----+-----+-----+-----+-----+ +-----+-----+-----+-----+-----+
| 1 | 0 | 3 | 1 | 3 | | 1 | N | 1 | 2 | 3 |
+-----+-----+-----+-----+-----+ +-----+-----+-----+-----+-----+
| 2 | 1 | 0 | -2 | 0 | | 2 | 4 | N | 2 | 3 |
+-----+-----+-----+-----+-----+ +-----+-----+-----+-----+-----+
| 3 | 3 | 6 | 0 | 2 | | 3 | 4 | 1 | N | 3 |
+-----+-----+-----+-----+-----+ +-----+-----+-----+-----+-----+

| 4 | 1 | 4 | 2 | 0 |   | 4 | 4 | 1 | 2 | N |
+-----+-----+-----+-----+-----+ +-----+-----+-----+-----+-----+
          distance                        path

This is our shortest distance matrix. For example, the shortest distance from 1
to 4 is 3 and the shortest distance between 4 to 3 is 2. Our pseudo-code will
be:
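The pseudo-code is not reproduced here; the following Java sketch (class and method names are our own, not the original listing) applies the triple loop described above to the distance and path matrices, using 0-based vertices. The main() method encodes the example graph from this section.

public class FloydWarshall {
    static final int INF = Integer.MAX_VALUE / 2;   // "infinity" that will not overflow when added

    // distance[][] starts with the edge weights (INF where there is no edge, 0 on the diagonal)
    // and path[][] with the initial predecessors described above; both are updated in place.
    static void floydWarshall(int[][] distance, int[][] path) {
        int n = distance.length;
        for (int k = 0; k < n; k++)
            for (int i = 0; i < n; i++)
                for (int j = 0; j < n; j++)
                    if (distance[i][k] + distance[k][j] < distance[i][j]) {
                        distance[i][j] = distance[i][k] + distance[k][j];
                        path[i][j] = path[k][j];    // best way to reach j now goes through k
                    }
    }

    public static void main(String[] args) {
        int N = -1;                                 // "no path yet" marker from the text
        int[][] distance = {
            {0, 3, 6, 15},
            {INF, 0, -2, INF},
            {INF, INF, 0, 2},
            {1, INF, INF, 0}};
        int[][] path = {
            {N, 0, 0, 0},
            {N, N, 1, N},
            {N, N, N, 2},
            {3, N, N, N}};
        floydWarshall(distance, path);
        // prints [[0, 3, 1, 3], [1, 0, -2, 0], [3, 6, 0, 2], [1, 4, 2, 0]], matching the table above
        System.out.println(java.util.Arrays.deepToString(distance));
    }
}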
Printing the path:
To print the path, we'll check the Path matrix. To print the path from u to v,
we'll start from path[u][v]. We'll set keep changing v = path[u][v] until we
find path[u][v] = u and push every values of path[u][v] in a stack. After
finding u, we'll print u and start popping items from the stack and print them.
This works because the path matrix stores the value of the vertex which
shares the shortest path to v from any other node. The pseudo-code will be:

Finding Negative Edge Cycle:


To find out if there is a negative edge cycle, we'll need to check the main
diagonal of distance matrix. If any value on the diagonal is negative, that
means there is a negative cycle in the graph.
Complexity:
The complexity of Floyd-Warshall algorithm is O(V ³ ) and the space
complexity is: O(V ² ).
Chapter 23: Catalan Number
Algorithm
Section 23.1: Catalan Number
Algorithm Basic Information
The Catalan number algorithm is a Dynamic Programming algorithm.
In combinatorial mathematics, the Catalan numbers form a sequence of
natural numbers that occur in various counting problems, often involving
recursively-defined objects. The Catalan numbers on nonnegative integers n
are a set of numbers that arise in tree enumeration problems of the type, 'In
how many ways can a regular n-gon be divided into n-2 triangles if different
orientations are counted separately?'
Application of Catalan Number Algorithm:
1. The number of ways to stack coins on a bottom row that consists of n
consecutive coins in a plane, such that no coins are allowed to be put on
the two sides of the bottom coins and every additional coin must be
above two other coins, is the nth Catalan number.
2. The number of ways to group a string of n pairs of parentheses, such that
each open parenthesis has a matching closed parenthesis, is the nth
Catalan number.
3. The number of ways to cut an n+2-sided convex polygon in a plane into
triangles by connecting vertices with straight, non-intersecting lines is
the nth Catalan number. This is the application in which Euler was
interested.
Using zero-based numbering, the nth Catalan number is given directly in
terms of binomial coefficients by the following equation:

Cn = (2n)! / ((n + 1)! n!) = C(2n, n) / (n + 1)
Example of Catalan Number:


Here the value of n = 4 (best example, from Wikipedia).
Auxiliary Space: O(n)
Time Complexity: O(n^2)
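No code accompanies the example above, so here is a small Java sketch (our own illustration, not an official listing) of the dynamic-programming computation that matches the stated O(n^2) time and O(n) space. It uses the recurrence C(0) = 1 and C(i) = sum over j of C(j) * C(i-1-j).

public class CatalanNumbers {
    static long catalan(int n) {
        long[] c = new long[n + 1];
        c[0] = 1;                                   // C0 = 1
        for (int i = 1; i <= n; i++)
            for (int j = 0; j < i; j++)
                c[i] += c[j] * c[i - 1 - j];        // standard Catalan recurrence
        return c[n];
    }

    public static void main(String[] args) {
        for (int i = 0; i <= 9; i++)
            System.out.print(catalan(i) + " ");     // 1 1 2 5 14 42 132 429 1430 4862
        System.out.println();
    }
}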
Chapter 24: Multithreaded
Algorithms
Examples for some multithreaded algorithms.
Section 24.1: Square matrix
multiplication multithread
Section 24.2: Multiplication matrix
vector multithread
Section 24.3: merge-sort multithread
A is an array, and p and r are indexes into it such that the sub-array A[p..r] is
the part you are going to sort. B is an auxiliary array which will be populated by the sort.
A call to p-merge-sort(A,p,r,B,s) sorts the elements from A[p..r] and puts them in
B[s..s+r-p].

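The pseudo-code of p-merge-sort is not reproduced here. As a rough Java sketch of the same divide-and-conquer idea (our own simplification, not the original algorithm), the two recursive sorts below run in parallel via the fork/join framework, while the merge step is a plain sequential merge rather than the parallel p-merge with dichotomic search described next.

import java.util.Arrays;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

public class ParallelMergeSort extends RecursiveAction {
    private final int[] a, buf;
    private final int lo, hi;   // sorts the half-open range a[lo..hi)

    ParallelMergeSort(int[] a, int[] buf, int lo, int hi) {
        this.a = a; this.buf = buf; this.lo = lo; this.hi = hi;
    }

    @Override
    protected void compute() {
        if (hi - lo <= 1) return;
        int mid = (lo + hi) >>> 1;
        invokeAll(new ParallelMergeSort(a, buf, lo, mid),     // both halves sorted in parallel
                  new ParallelMergeSort(a, buf, mid, hi));
        merge(mid);
    }

    private void merge(int mid) {                             // sequential merge of the two halves
        int i = lo, j = mid, k = lo;
        while (i < mid && j < hi)
            buf[k++] = (a[i] <= a[j]) ? a[i++] : a[j++];
        while (i < mid) buf[k++] = a[i++];
        while (j < hi)  buf[k++] = a[j++];
        System.arraycopy(buf, lo, a, lo, hi - lo);
    }

    public static void main(String[] args) {
        int[] data = {9, 4, 7, 1, 8, 2, 6, 3, 5};
        new ForkJoinPool().invoke(new ParallelMergeSort(data, new int[data.length], 0, data.length));
        System.out.println(Arrays.toString(data));            // [1, 2, 3, 4, 5, 6, 7, 8, 9]
    }
}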
Here is the auxiliary function that performs the merge in parallel.


p-merge assumes that the two sub-arrays to merge are in the same array but
doesn't assume they are adjacent in the array. That's why we need
p1,r1,p2,r2.

And here is the auxiliary function dichotomic-search.


x is the key to look for in the sub-array T[p..r].
Chapter 25: Knuth Morris Pratt
(KMP) Algorithm
The KMP algorithm is a pattern matching algorithm which searches for occurrences of
a "word" W within a main "text string" S by employing the observation that
when a mismatch occurs, we have sufficient information to determine
where the next match could begin. We take advantage of this information to
avoid re-matching the characters that we know will match anyway. The worst
case complexity for searching a pattern reduces to O(n).
Section 25.1: KMP-Example
Algorithm
This algorithm is a two-step process. First we create an auxiliary array lps[] and
then use this array for searching the pattern.
Preprocessing:
1. We pre-process the pattern and create an auxiliary array lps[] which is
used to skip characters while matching.
2. Here lps[] indicates the longest proper prefix which is also a suffix. A proper
prefix is a prefix in which the whole string is not included. For example,
prefixes of the string "ABC" are "", "A", "AB" and "ABC". Proper
prefixes are "", "A" and "AB". Suffixes of the string are "",
"C", "BC" and "ABC".
Searching
1. We keep matching characters txt[i] and pat[j] and keep incrementing i
and j while pat[j] and txt[i] keep matching.
2. When we see a mismatch, we know that the characters pat[0..j-1] match
with txt[i-j+1 ... i-1]. We also know that lps[j-1] is the count of characters
of pat[0 ... j-1] that are both a proper prefix and a suffix. From this we can
conclude that we do not need to match these lps[j-1] characters with
txt[i-j ... i-1], because we know that these characters will match
anyway.
Implementation in Java
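The Java listing is not included above; the following sketch (class and method names are our own) follows the two-step process: buildLps() computes the lps[] array and search() uses it to skip characters on a mismatch. It assumes a non-empty pattern.

public class KMPSearch {
    static int[] buildLps(String pat) {
        int[] lps = new int[pat.length()];
        int len = 0;                                 // length of the previous longest prefix-suffix
        for (int i = 1; i < pat.length(); ) {
            if (pat.charAt(i) == pat.charAt(len)) {
                lps[i++] = ++len;
            } else if (len != 0) {
                len = lps[len - 1];                  // fall back, do not advance i
            } else {
                lps[i++] = 0;
            }
        }
        return lps;
    }

    // prints every index in txt where pat begins
    static void search(String txt, String pat) {
        int[] lps = buildLps(pat);
        int i = 0, j = 0;
        while (i < txt.length()) {
            if (txt.charAt(i) == pat.charAt(j)) {
                i++; j++;
                if (j == pat.length()) {
                    System.out.println("Found pattern at index " + (i - j));
                    j = lps[j - 1];                  // keep looking for further occurrences
                }
            } else if (j != 0) {
                j = lps[j - 1];                      // skip the characters known to match
            } else {
                i++;
            }
        }
    }

    public static void main(String[] args) {
        search("abcxabcdabxabcdabcdabcy", "abcdabcy"); // Found pattern at index 15
    }
}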
Chapter 26: Edit Distance Dynamic
Algorithm
Section 26.1: Minimum Edits required
to convert string 1 to string 2
The problem statement is: given two strings str1 and str2, what is the minimum
number of operations that can be performed on str1 so that it gets converted to
str2? The operations can be:
1. Insert
2. Remove
3. Replace
For Example

To solve this problem we will use a 2D array dp[n+1][m+1] where n is the


length of the first string and m is the length of the second string. For our
example, if str1 is azcef and str2 is abcdef then our array will be dp[6][7] and
our final answer will be stored at dp[5][6].

For dp[1][1] we have to check what we can do to convert a into a. It will be
0. For dp[1][2] we have to check what we can do to convert a into ab. It will
be 1, because we have to insert b. So after the 1st iteration our array will look like:
For iteration 2:
For dp[2][1] we have to check that to convert az to a we need to remove z;
hence dp[2][1] will be 1. Similarly, for dp[2][2] we need to replace z with b;
hence dp[2][2] will be 1. So after the 2nd iteration our dp[] array will look like:

So our formula will look like:

if str1[i-1] == str2[j-1]: dp[i][j] = dp[i-1][j-1]
else: dp[i][j] = 1 + min(dp[i-1][j-1], dp[i-1][j], dp[i][j-1])
After last iteration our dp[] array will look like


Implementation in Java

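The Java listing is not included above; a minimal sketch of the DP described in this section (names are our own) follows. dp[i][j] is the minimum number of edits needed to convert the first i characters of str1 into the first j characters of str2.

public class EditDistance {
    static int minEdits(String str1, String str2) {
        int n = str1.length(), m = str2.length();
        int[][] dp = new int[n + 1][m + 1];
        for (int i = 0; i <= n; i++) dp[i][0] = i;          // delete all i characters
        for (int j = 0; j <= m; j++) dp[0][j] = j;          // insert all j characters
        for (int i = 1; i <= n; i++)
            for (int j = 1; j <= m; j++)
                if (str1.charAt(i - 1) == str2.charAt(j - 1))
                    dp[i][j] = dp[i - 1][j - 1];            // characters match: no edit needed
                else
                    dp[i][j] = 1 + Math.min(dp[i - 1][j - 1],             // replace
                                   Math.min(dp[i - 1][j], dp[i][j - 1])); // remove / insert
        return dp[n][m];
    }

    public static void main(String[] args) {
        System.out.println(minEdits("azcef", "abcdef"));    // 2 (replace z with b, insert d)
    }
}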
Time Complexity: O(n * m), where n and m are the lengths of the two strings.
Chapter 27: Online algorithms
Theory
Definition 1: An optimization problem Π consists of a set of instances
ΣΠ. For every instance σ∈ΣΠ there is a set Ζσ of solutions and an
objective function fσ : Ζσ → ℜ≥0 which assigns a positive real value to
every solution. We say OPT(σ) is the value of an optimal solution, A(σ)
is the solution of an algorithm A for the problem Π and
wA(σ) = fσ(A(σ)) its value.
Definition 2: An online algorithm A for a minimization problem Π has a
competitive ratio of r ≥ 1 if there is a constant τ∈ℜ with

wA(σ) = fσ(A(σ)) ≤ r ⋅ OPT(σ) + τ

for all instances σ∈ΣΠ. A is then called an r-competitive online algorithm. If even

wA(σ) ≤ r ⋅ OPT(σ)

holds for all instances σ∈ΣΠ, then A is called a strictly r-competitive online
algorithm.
Proposition 1.3: LRU and FWF are marking algorithms.
Proof: At the beginning of each phase (except for the first one) FWF has a
cache miss and clears the cache. That means we have k empty pages. In
every phase at most k different pages are requested, so there will be no
eviction during the phase. So FWF is a marking algorithm.
Let's assume LRU is not a marking algorithm. Then there is an instance σ
where LRU evicted a marked page x in phase i. Let σt be the request in phase i
at which x is evicted. Since x is marked there has to be an earlier request σt* for
x in the same phase, so t* < t. After t*, x is the newest page in the cache, so for it
to be evicted at t the sequence σt*+1,...,σt has to request at least k pages
different from x. That implies phase i has requested at least k+1 different
pages, which contradicts the phase definition. So LRU has to be a
marking algorithm.
Proposition 1.4: Every marking algorithm is strictly k-competitive.
Proof: Let σ be an instance of the paging problem and l the number of
phases of σ. If l = 1 then every marking algorithm is optimal and the
optimal offline algorithm cannot be better.
We assume l ≥ 2. The cost of every marking algorithm for instance σ is
bounded from above by l ⋅ k, because in every phase a marking algorithm
cannot evict more than k pages without evicting one marked page.
Now we try to show that the optimal offline algorithm evicts at least k+l-2
pages for σ: k in the first phase and at least one in every following phase
except for the last one. For the proof let's define l-2 disjoint subsequences of σ.
Subsequence i ∈ {1,...,l-2} starts at the second position of phase i+1 and
ends with the first position of phase i+2. Let x be the first page of phase i+1.
At the beginning of subsequence i there are page x and at most k-1 different
pages in the optimal offline algorithm's cache. In subsequence i there are k page
requests different from x, so the optimal offline algorithm has to evict at least
one page for every subsequence. Since the cache is still empty at the beginning
of phase 1, the optimal offline algorithm causes k evictions during the first
phase. That shows that

wA(σ) ≤ l ⋅ k ≤ (k+l-2) ⋅ k ≤ OPT(σ) ⋅ k


Corollary 1.5: LRU and FWF are strictly k-competitive.
If there is no constant r for which an online algorithm A is r-competitive, we
call A not competitive.
Proposition 1.6: LFU and LIFO are not competitive.
Proof: Let l ≥ 2 be a constant and k ≥ 2 the cache size. The different cache pages
are numbered 1,...,k+1. We look at the following sequence:
First, page 1 is requested l times, then page 2, and so on. At the end there are
(l-1) alternating requests for pages k and k+1.
LFU and LIFO fill their cache with pages 1-k. When page k+1 is requested,
page k is evicted, and vice versa. That means every request of the subsequence
(k,k+1)^(l-1) evicts one page. In addition, there are k-1 cache misses for the first-
time use of pages 1-(k-1). So LFU and LIFO evict exactly k-1+2(l-1) pages.
Now we must show that for every constant τ∈ℜ and every constant r ≥ 1
there exists an l so that

which is equal to

To satisfy this inequality you just have to choose l sufficiently big. So LFU and
LIFO are not competitive.
Proposition 1.7: There is no r-competitive deterministic online algorithm
for paging with r < k.
Sources
Basic Material
1. Script Online Algorithms (german), Heiko Roeglin, University Bonn
2. Page replacement algorithm
Further Reading
1. Online Computation and Competitive Analysis by Allan Borodin and
Ran El-Yaniv
Source Code
1. Source code for offline caching
2. Source code for adversary game
Section 27.1: Paging (Online Caching)
Preface
Instead of starting with a formal definition, the goal is to approach this topic
via a series of examples, introducing definitions along the way. The remark
section Theory consists of all definitions, theorems and propositions, to
give you all the information needed to look up specific aspects quickly.
The remark section Sources consists of the basic material used for this topic
and additional information for further reading. In addition, you will find the
full source code for the examples there. Please note that, to make the
source code for the examples more readable and shorter, it refrains from
things like error handling, and it also avoids some specific language
features which would obscure the clarity of the example, like extensive use of
advanced libraries.

Paging
The paging problem arises from the limitation of finite space. Let's assume our
cache C has k pages. Now we want to process a sequence of m page requests,
which must have been placed in the cache before they are processed. Of course,
if m <= k then we just put all elements in the cache and it will work, but
usually m > k.
We say a request is a cache hit, when the page is already in cache, otherwise,
it's called a cache miss. In that case, we must bring the requested page into
the cache and evict another, assuming the cache is full. The Goal is an
eviction schedule that minimizes the number of evictions.
There are numerous strategies for this problem, let's look at some:
1. First in, first out (FIFO): The oldest page gets evicted
2. Last in, first out (LIFO): The newest page gets evicted
3. Least recently used (LRU): Evict page whose most recent access was
earliest
4. Least frequently used (LFU): Evict page that was least frequently
requested
5. Longest forward distance (LFD): Evict the page in the cache that is not
requested until farthest in the future.
6. Flush when full (FWF): clear the cache completely as soon as a cache miss happens
There are two ways to approach this problem:
1. offline: the sequence of page requests is known ahead of time
2. online: the sequence of page requests is not known ahead of time
Offline Approach
For the first approach look at the topic Applications of Greedy technique. Its
third example, Offline Caching, considers the first five strategies from above
and gives you a good entry point for the following.
The example program was extended with the FWF strategy:
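The extended example program is not reproduced here; the following is a tiny, self-contained Java sketch (our own illustration, with hypothetical names) of what the FWF strategy boils down to: on a cache miss with a full cache, flush everything.

import java.util.HashSet;
import java.util.Set;

class FwfCache {
    private final int k;                       // cache size
    private final Set<Integer> cache = new HashSet<>();
    int misses = 0, evictions = 0;

    FwfCache(int k) { this.k = k; }

    void request(int page) {
        if (cache.contains(page)) return;      // cache hit: nothing to do
        misses++;
        if (cache.size() == k) {               // miss with a full cache: flush completely
            evictions += cache.size();
            cache.clear();
        }
        cache.add(page);
    }
}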
The full source code is available here. If we reuse the example from the topic,
we get the following output:

Even though LFD is optimal, FWF has fewer cache misses. But the main
goal was to minimize the number of evictions and for FWF five misses mean
15 evictions, which makes it the poorest choice for this example.

Online Approach
Now we want to approach the online problem of paging. But first we need an
understanding of how to evaluate it. Obviously an online algorithm cannot be better
than the optimal offline algorithm. But how much worse is it? We need
formal definitions to answer that question:
Definition 1.1: An optimization problem Π consists of a set of instances
ΣΠ. For every instance σ∈ΣΠ there is a set Ζσ of solutions and an
objective function fσ : Ζσ → ℜ≥0 which assigns a positive real value to
every solution. We say OPT(σ) is the value of an optimal solution, A(σ)
is the solution of an algorithm A for the problem Π and
wA(σ) = fσ(A(σ)) its value.
Definition 1.2: An online algorithm A for a minimization problem Π has a
competitive ratio of r ≥ 1 if there is a constant τ∈ℜ with

wA(σ) = fσ(A(σ)) ≤ r ⋅ OPT(σ) + τ

for all instances σ∈ΣΠ. A is then called an r-competitive online algorithm. If even

wA(σ) ≤ r ⋅ OPT(σ)

holds for all instances σ∈ΣΠ, then A is called a strictly r-competitive online
algorithm.
So the question is how competitive is our online algorithm compared to an
optimal offline algorithm. In their famous book Allan Borodin and Ran El-
Yaniv used another scenario to describe the online paging situation:
There is an evil adversary who knows your algorithm and the optimal offline
algorithm. In every step, he tries to request a page which is worst for you and
simultaneously best for the offline algorithm. The competitive factor of your
algorithm is the factor by which your algorithm falls short against the
adversary's optimal offline algorithm. If you want to try to be the adversary,
you can try the Adversary Game (try to beat the paging strategies).
Marking Algorithms
Instead of analysing every algorithm separately, let's look at a special online
algorithm family for the paging problem called marking algorithms.
Let σ = (σ1,...,σp) be an instance of our problem and k our cache size. Then
σ can be divided into phases:
Phase 1 is the maximal subsequence of σ from the start until
maximal k different pages are requested. Phase i ≥ 2 is the
maximal subsequence of σ from the end of phase i-1 until maximal k
different pages are requested.

A marking algorithm (implicitly or explicitly) maintains whether a page is


marked or not. At the beginning of each phase, all pages are unmarked. If a
page is requested during a phase, it gets marked. An algorithm is a marking
algorithm iff it never evicts a marked page from cache. That means pages
which are used during a phase will not be evicted.
Proposition 1.3: LRU and FWF are marking algorithms.
Proof: At the beginning of each phase (except for the first one) FWF has a
cache miss and clears the cache. That means we have k empty pages. In
every phase at most k different pages are requested, so there will be no
eviction during the phase. So FWF is a marking algorithm.
Let's assume LRU is not a marking algorithm. Then there is an instance σ
where LRU evicted a marked page x in phase i. Let σt be the request in phase i
at which x is evicted. Since x is marked there has to be an earlier request σt* for
x in the same phase, so t* < t. After t*, x is the newest page in the cache, so for it
to be evicted at t the sequence σt*+1,...,σt has to request at least k pages
different from x. That implies phase i has requested at least k+1 different
pages, which contradicts the phase definition. So LRU has to be a
marking algorithm.
Proposition 1.4: Every marking algorithm is strictly k-competitive.
Proof: Let σ be an instance of the paging problem and l the number of
phases of σ. If l = 1 then every marking algorithm is optimal and the
optimal offline algorithm cannot be better.
We assume l ≥ 2. The cost of every marking algorithm for instance σ is
bounded from above by l ⋅ k, because in every phase a marking algorithm
cannot evict more than k pages without evicting one marked page.
Now we try to show that the optimal offline algorithm evicts at least k+l-2
pages for σ: k in the first phase and at least one in every following phase
except for the last one. For the proof let's define l-2 disjoint subsequences of σ.
Subsequence i ∈ {1,...,l-2} starts at the second position of phase i+1 and
ends with the first position of phase i+2. Let x be the first page of phase i+1.
At the beginning of subsequence i there are page x and at most k-1 different
pages in the optimal offline algorithm's cache. In subsequence i there are k page
requests different from x, so the optimal offline algorithm has to evict at least
one page for every subsequence. Since the cache is still empty at the beginning
of phase 1, the optimal offline algorithm causes k evictions during the first
phase. That shows that

wA(σ) ≤ l ⋅ k ≤ (k+l-2) ⋅ k ≤ OPT(σ) ⋅ k


Corollary 1.5: LRU and FWF are strictly k-competitive.
Exercise: Show that FIFO is no marking algorithm, but is strictly k-
competitive.
If there is no constant r for which an online algorithm A is r-competitive, we
call A not competitive.
Proposition 1.6: LFU and LIFO are not competitive.
Proof: Let l ≥ 2 be a constant and k ≥ 2 the cache size. The different cache pages
are numbered 1,...,k+1. We look at the following sequence:

First, page 1 is requested l times, then page 2, and so on. At the end, there
are (l-1) alternating requests for pages k and k+1.
LFU and LIFO fill their cache with pages 1-k. When page k+1 is requested,
page k is evicted, and vice versa. That means every request of the subsequence
(k,k+1)^(l-1) evicts one page. In addition, there are k-1 cache misses for the first-
time use of pages 1-(k-1). So LFU and LIFO evict exactly k-1+2(l-1) pages.
Now we must show that for every constant τ∈ℜ and every constant r ≥ 1
there exists an l so that

which is equal to

To satisfy this inequality you just have to choose l sufficiently big. So LFU and
LIFO are not competitive.
Proposition 1.7: There is no r-competitive deterministic online algorithm
for paging with r < k.
The proof for this last proposition is rather long and based on the statement
that LFD is an optimal offline algorithm. The interested reader can look it up
in the book of Borodin and El-Yaniv (see sources below).
The question is whether we could do better. For that, we have to leave the
deterministic approach behind us and start to randomize our algorithm.
Clearly, it's much harder for the adversary to punish your algorithm if it's
randomized.
Randomized paging will be discussed in one of the next examples...
Chapter 28: Sorting
Parameter               | Description
Stability               | A sorting algorithm is stable if it preserves the relative order of equal elements after sorting.
In place                | A sorting algorithm is in-place if it sorts using only O(1) auxiliary memory (not counting the array that needs to be sorted).
Best case complexity    | A sorting algorithm has a best case time complexity of O(T(n)) if its running time is at least T(n) for all possible inputs.
Average case complexity | A sorting algorithm has an average case time complexity of O(T(n)) if its running time, averaged over all possible inputs, is T(n).
Worst case complexity   | A sorting algorithm has a worst case time complexity of O(T(n)) if its running time is at most T(n).
Section 28.1: Stability in Sorting
Stability in sorting means whether a sort algorithm maintains the relative
order of the equals keys of the original input in the result output.
So a sorting algorithm is said to be stable if two objects with equal keys
appear in the same order in sorted output as they appear in the input unsorted
array.
Consider a list of pairs:

Now we will sort the list using the first element of each pair.
A stable sorting of this list will output the below list:

Because (9, 3) appears after (9, 7) in the original list as well.

An unstable sorting will output the below list:

Unstable sort may generate the same output as the stable sort but not always.
Well-known stable sorts:
Merge sort
Insertion sort
Radix sort
Tim sort
Bubble Sort
Well-known unstable sorts:
Heap sort
Quick sort
Chapter 29: Bubble Sort
Parameter               | Description
Stable                  | Yes
In place                | Yes
Best case complexity    | O(n)
Average case complexity | O(n^2)
Worst case complexity   | O(n^2)
Space complexity        | O(1)
Section 29.1: Bubble Sort
The BubbleSort compares each successive pair of elements in an unordered list
and inverts the elements if they are not in order.
The following example illustrates the bubble sort on the list {6,5,3,1,8,7,2,4}
(pairs that were compared in each step are encapsulated in '**'):

After one iteration through the list, we have {5,3,1,6,7,2,4,8}. Note that the
greatest unsorted value in the array (8 in this case) will always reach its final
position. Thus, to be sure the list is sorted we must iterate n-1 times for lists
of length n.
Graphic:
Section 29.2: Implementation in C &
C++

C Implementation

Bubble Sort with pointer


Section 29.3: Implementation in C#
Bubble sort is also known as Sinking Sort. It is a simple sorting algorithm
that repeatedly steps through the list to be sorted, compares each pair of
adjacent items and swaps them if they are in the wrong order.
Bubble sort example

Implementation of Bubble Sort


I used C# language to implement bubble sort algorithm
Section 29.4: Python Implementation
Section 29.5: Implementation in Java
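The original listing is not included; a minimal Java sketch of bubble sort follows (our own, not the original code; the early-exit flag gives the O(n) best case from the parameter table above).

import java.util.Arrays;

public class BubbleSort {
    // Each pass bubbles the largest remaining element to the end of the unsorted prefix.
    static void bubbleSort(int[] a) {
        for (int i = 0; i < a.length - 1; i++) {
            boolean swapped = false;
            for (int j = 0; j < a.length - 1 - i; j++) {
                if (a[j] > a[j + 1]) {               // swap adjacent elements that are out of order
                    int tmp = a[j];
                    a[j] = a[j + 1];
                    a[j + 1] = tmp;
                    swapped = true;
                }
            }
            if (!swapped) break;                     // already sorted: O(n) best case
        }
    }

    public static void main(String[] args) {
        int[] list = {6, 5, 3, 1, 8, 7, 2, 4};
        bubbleSort(list);
        System.out.println(Arrays.toString(list));   // [1, 2, 3, 4, 5, 6, 7, 8]
    }
}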
Section 29.6: Implementation in
Javascript
Chapter 30: Merge Sort
Section 30.1: Merge Sort Basics
Merge Sort is a divide-and-conquer algorithm. It divides the input list of
length n in half successively until there are n lists of size 1. Then, pairs of
lists are merged together with the smaller first element among the pair of lists
being added in each step. Through successive merging and through
comparison of first elements, the sorted list is built.
An example:
The recurrence for merge sort is T(n) = 2T(n/2) + Θ(n). It can be solved either
using the Recurrence Tree method or the Master method. It falls in case II of
the Master Method, and the solution of the recurrence is Θ(nLogn). The time
complexity of Merge Sort is Θ(nLogn) in all 3 cases (worst, average and best),
as merge sort always divides the array into two halves and takes linear time to
merge the two halves.
Auxiliary Space: O(n)
Algorithmic Paradigm: Divide and Conquer
Sorting In Place: Not in a typical implementation
Stable: Yes
Section 30.2: Merge Sort
Implementation in Go
Section 30.3: Merge Sort
Implementation in C & C#
C Merge Sort

C# Merge Sort
Section 30.4: Merge Sort
Implementation in Java
Below is an implementation in Java using a generics approach. It is the
same algorithm as presented above.
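The original listing is not reproduced here; the following is a minimal generics-based sketch (our own, using Comparable) of the same idea.

import java.util.Arrays;

public class MergeSort {
    static <T extends Comparable<T>> void mergeSort(T[] a) {
        if (a.length < 2) return;
        int mid = a.length / 2;
        T[] left = Arrays.copyOfRange(a, 0, mid);       // split into two halves
        T[] right = Arrays.copyOfRange(a, mid, a.length);
        mergeSort(left);
        mergeSort(right);
        merge(a, left, right);
    }

    // Merge the two sorted halves back into a, always taking the smaller head element first.
    static <T extends Comparable<T>> void merge(T[] a, T[] left, T[] right) {
        int i = 0, j = 0, k = 0;
        while (i < left.length && j < right.length)
            a[k++] = (left[i].compareTo(right[j]) <= 0) ? left[i++] : right[j++];
        while (i < left.length) a[k++] = left[i++];
        while (j < right.length) a[k++] = right[j++];
    }

    public static void main(String[] args) {
        Integer[] data = {38, 27, 43, 3, 9, 82, 10};
        mergeSort(data);
        System.out.println(Arrays.toString(data));      // [3, 9, 10, 27, 38, 43, 82]
    }
}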
Section 30.5: Merge Sort
Implementation in Python
Section 30.6: Bottoms-up Java
Implementation
Chapter 31: Insertion Sort
Section 31.1: Haskell Implementation
Chapter 32: Bucket Sort
Section 32.1: C# Implementation
Chapter 33: Quicksort
Section 33.1: Quicksort Basics
Quicksort is a sorting algorithm that picks an element ("the pivot") and
reorders the array forming two partitions such that all elements less than the
pivot come before it and all elements greater come after. The algorithm is
then applied recursively to the partitions until the list is sorted.
1. Lomuto partition scheme mechanism :
This scheme chooses a pivot which is typically the last element in the array.
The algorithm maintains the index to put the pivot in variable i and each time
it finds an element less than or equal to pivot, this index is incremented and
that element would be placed before the pivot.

Quick Sort mechanism :

Example of quick sort:


2. Hoare partition scheme:
It uses two indices that start at the ends of the array being partitioned, then
move toward each other, until they detect an inversion: a pair of elements,
one greater or equal than the pivot, one lesser or equal, that are in the wrong
order relative to each other. The inverted elements are then swapped. When
the indices meet, the algorithm stops and returns the final index. Hoare's
scheme is more efficient than Lomuto's partition scheme because it does
three times fewer swaps on average, and it creates efficient partitions even
when all values are equal.
Partition :
Section 33.2: Quicksort in Python

Prints "[1, 1, 2, 3, 6, 8, 10]"


Section 33.3: Lomuto partition java
implementation
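The original listing is not included; a minimal Java sketch of quicksort with the Lomuto partition scheme described above follows (class and method names are our own).

import java.util.Arrays;

public class QuickSortLomuto {
    // Lomuto scheme: the pivot is the last element; i tracks where the pivot will finally go.
    static int partition(int[] a, int lo, int hi) {
        int pivot = a[hi];
        int i = lo - 1;
        for (int j = lo; j < hi; j++)
            if (a[j] <= pivot) swap(a, ++i, j);   // move smaller-or-equal elements before the pivot
        swap(a, i + 1, hi);                        // put the pivot between the two partitions
        return i + 1;
    }

    static void quickSort(int[] a, int lo, int hi) {
        if (lo >= hi) return;
        int p = partition(a, lo, hi);
        quickSort(a, lo, p - 1);
        quickSort(a, p + 1, hi);
    }

    static void swap(int[] a, int i, int j) { int t = a[i]; a[i] = a[j]; a[j] = t; }

    public static void main(String[] args) {
        int[] data = {10, 80, 30, 90, 40, 50, 70};
        quickSort(data, 0, data.length - 1);
        System.out.println(Arrays.toString(data));   // [10, 30, 40, 50, 70, 80, 90]
    }
}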
Chapter 34: Counting Sort
Section 34.1: Counting Sort Basic
Information
Counting sort is an integer sorting algorithm for a collection of objects
that sorts according to the keys of the objects. Steps
1. Construct a working array C that has size equal to the range of the input
array A.
2. Iterate through A, assigning C[x] based on the number of times x
appeared in A.
3. Transform C into an array where C[x] refers to the number of values ≤
x by iterating through the array, assigning to each C[x] the sum of its
prior value and all values in C that come before it.
4. Iterate backwards through A, placing each value in to a new sorted array
B at the index recorded in C. This is done for a given A[x] by assigning
B[C[A[x]]] to A[x], and decrementing C[A[x]] in case there were
duplicate values in the original unsorted array.
Example of Counting Sort

Auxiliary Space: O(n+k)


Time Complexity: Worst-case: O(n+k), Best-case: O(n),
Average-case O(n+k)
Section 34.2: Pseudocode
Implementation
Constraints:
1. Input (an array to be sorted)
2. Number of element in input (n)
3. Keys in the range of 0..k-1 (k)
4. Count (an array of number)
Pseudocode:
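The pseudocode itself is not reproduced here; the following Java sketch (our own) follows the four steps above: count occurrences, build prefix sums, then place elements into the output array from right to left, which keeps the sort stable. Keys are assumed to lie in 0..k-1.

import java.util.Arrays;

public class CountingSort {
    static int[] countingSort(int[] a, int k) {
        int[] count = new int[k];
        for (int x : a) count[x]++;                      // count occurrences of each key
        for (int i = 1; i < k; i++) count[i] += count[i - 1];   // prefix sums: values <= i
        int[] b = new int[a.length];
        for (int i = a.length - 1; i >= 0; i--) {        // right-to-left placement keeps it stable
            count[a[i]]--;
            b[count[a[i]]] = a[i];
        }
        return b;
    }

    public static void main(String[] args) {
        int[] data = {4, 2, 2, 8, 3, 3, 1};
        System.out.println(Arrays.toString(countingSort(data, 9)));   // [1, 2, 2, 3, 3, 4, 8]
    }
}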
Chapter 35: Heap Sort
Section 35.1: C# Implementation
Section 35.2: Heap Sort Basic
Information
Heap sort is a comparison based sorting technique on binary heap data
structure. It is similar to selection sort in which we first find the maximum
element and put it at the end of the data structure. Then repeat the same
process for the remaining items.
Pseudo code for Heap Sort:

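The pseudo code is not reproduced here; as a stand-in, a minimal Java sketch (our own) of the idea described above: build a max-heap, repeatedly move the maximum to the end, and re-heapify the remaining prefix.

import java.util.Arrays;

public class HeapSort {
    static void heapSort(int[] a) {
        int n = a.length;
        for (int i = n / 2 - 1; i >= 0; i--) heapify(a, n, i);   // build the max-heap
        for (int end = n - 1; end > 0; end--) {
            swap(a, 0, end);           // current maximum goes to its final position
            heapify(a, end, 0);        // restore the heap property on a[0..end)
        }
    }

    // Sift a[i] down within a heap of size n.
    static void heapify(int[] a, int n, int i) {
        while (true) {
            int largest = i, l = 2 * i + 1, r = 2 * i + 2;
            if (l < n && a[l] > a[largest]) largest = l;
            if (r < n && a[r] > a[largest]) largest = r;
            if (largest == i) return;
            swap(a, i, largest);
            i = largest;
        }
    }

    static void swap(int[] a, int i, int j) { int t = a[i]; a[i] = a[j]; a[j] = t; }

    public static void main(String[] args) {
        int[] data = {12, 11, 13, 5, 6, 7};
        heapSort(data);
        System.out.println(Arrays.toString(data));   // [5, 6, 7, 11, 12, 13]
    }
}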
Example of Heap Sort:

Auxiliary Space: O(1)


Time Complexity: O(nlogn)
Chapter 36: Cycle Sort
Section 36.1: Pseudocode
Implementation
Chapter 37: Odd-Even Sort
Section 37.1: Odd-Even Sort Basic
Information
An Odd-Even Sort or brick sort is a simple sorting algorithm, which is
developed for use on parallel processors with local interconnection. It works
by comparing all odd/even indexed pairs of adjacent elements in the list and,
if a pair is in the wrong order the elements are switched. The next step
repeats this for even/odd indexed pairs. Then it alternates between odd/even
and even/odd steps until the list is sorted.
Pseudo code for Odd-Even Sort:

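The pseudo code is not reproduced here; a minimal sequential Java sketch (our own) of the alternating odd/even passes described above:

import java.util.Arrays;

public class OddEvenSort {
    static void oddEvenSort(int[] a) {
        boolean sorted = false;
        while (!sorted) {
            sorted = true;
            for (int i = 1; i < a.length - 1; i += 2)       // odd/even indexed pairs
                if (a[i] > a[i + 1]) { swap(a, i, i + 1); sorted = false; }
            for (int i = 0; i < a.length - 1; i += 2)       // even/odd indexed pairs
                if (a[i] > a[i + 1]) { swap(a, i, i + 1); sorted = false; }
        }
    }

    static void swap(int[] a, int i, int j) { int t = a[i]; a[i] = a[j]; a[j] = t; }

    public static void main(String[] args) {
        int[] data = {5, 3, 8, 1, 9, 2};
        oddEvenSort(data);
        System.out.println(Arrays.toString(data));   // [1, 2, 3, 5, 8, 9]
    }
}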
Wikipedia has best illustration of Odd-Even sort:

Example of Odd-Even Sort:


Implementation:
I used C# language to implement Odd-Even Sort Algorithm.
Time Complexity: O(n^2) when run sequentially; with n parallel processors (the setting the algorithm was designed for) it runs in O(n).
Chapter 38: Selection Sort
Section 38.1: Elixir Implementation
Section 38.2: Selection Sort Basic
Information
Selection sort is a sorting algorithm, specifically an in-place comparison sort.
It has O(n^2) time complexity, making it inefficient on large lists, and
generally performs worse than the similar insertion sort. Selection sort is
noted for its simplicity, and it has performance advantages over more
complicated algorithms in certain situations, particularly where auxiliary
memory is limited.
The algorithm divides the input list into two parts: the sublist of items already
sorted, which is built up from left to right at the front (left) of the list, and the
sublist of items remaining to be sorted that occupy the rest of the list.
Initially, the sorted sublist is empty and the unsorted sublist is the entire input
list. The algorithm proceeds by finding the smallest (or largest, depending on
sorting order) element in the unsorted sublist, exchanging (swapping) it with
the leftmost unsorted element (putting it in sorted order), and moving the
sublist boundaries one element to the right.
Pseudo code for Selection sort:

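The pseudo code is not reproduced here; a minimal Java sketch (our own) of the selection sort described above:

import java.util.Arrays;

public class SelectionSort {
    // Grow the sorted prefix one element at a time: find the smallest element in the
    // unsorted suffix and swap it with the leftmost unsorted element.
    static void selectionSort(int[] a) {
        for (int i = 0; i < a.length - 1; i++) {
            int min = i;
            for (int j = i + 1; j < a.length; j++)
                if (a[j] < a[min]) min = j;
            int tmp = a[i]; a[i] = a[min]; a[min] = tmp;
        }
    }

    public static void main(String[] args) {
        int[] data = {64, 25, 12, 22, 11};
        selectionSort(data);
        System.out.println(Arrays.toString(data));   // [11, 12, 22, 25, 64]
    }
}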
Visualization of selection sort:


Example of Selection sort:
Auxiliary Space: O(n)
Time Complexity: O(n^2)
Section 38.3: Implementation
of Selection sort in C#
I used C# language to implement Selection sort algorithm.
Chapter 39: Searching
Section 39.1: Binary Search
Introduction
Binary Search is a Divide and Conquer search algorithm. It uses O(log n) time
to find the location of an element in a search space, where n is the size of the
search space.
Binary Search works by halving the search space at each iteration after
comparing the target value to the middle value of the search space.
To use Binary Search, the search space must be ordered (sorted) in some
way. Duplicate entries (ones that compare as equal according to the
comparison function) cannot be distinguished, though they don't violate the
Binary Search property.
Conventionally, we use less than (<) as the comparison function. If a < b, it
will return true. If a is not less than b and b is not less than a, a and b are
equal.

Example Question
You are an economist, a pretty bad one though. You are given the task of
finding the equilibrium price (that is, the price where supply = demand) for
rice.
Remember the higher a price is set, the larger the supply and the lesser the
demand
As your company is very efficient at calculating market forces, you can
instantly get the supply and demand in units of rice when the price of rice is
set at a certain price p.
Your boss wants the equilibrium price ASAP, but tells you that the
equilibrium price can be a positive integer that is at most 10^17, and there is
guaranteed to be exactly 1 positive integer solution in the range. So get going
with your job before you lose it!
You are allowed to call the functions getSupply(k) and getDemand(k), which will
do exactly what is stated in the problem.
Example Explanation
Here our search space is from 1 to 10^17, thus a linear search is infeasible.
However, notice that as k goes up, getSupply(k) increases and getDemand(k)
decreases. Thus, for any x > y,
getSupply(x) - getDemand(x) > getSupply(y) - getDemand(y). Therefore, this
search space is monotonic and we can use Binary Search.
The following psuedocode demonstrates the usage of Binary Search:

while low <= high
    mid = (low + high) / 2
    supply = getSupply(mid), demand = getDemand(mid)
    if demand < supply
        high = mid               <- Solution is in lower half of search space
    else if demand > supply
        low = mid                <- Solution is in upper half of search space
    else                         <- supply == demand condition
        return mid               <- Found solution

This algorithm runs in ~O(log 10^17) time. This can be generalized to
~O(log S) time, where S is the size of the search space, since at every iteration
of the while loop we halve the search space (from [low:high] to either
[low:mid] or [mid:high]).
C Implementation of Binary Search with Recursion
Section 39.2: Rabin Karp
The Rabin-Karp algorithm or Karp-Rabin algorithm is a string searching
algorithm that uses hashing to find any one of a set of pattern strings in a
text. Its average and best case running time is O(n+m) in space O(p), but its
worst-case time is O(nm), where n is the length of the text and m is the length
of the pattern.
Algorithm implementation in java for string matching

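The Java listing is not included above; the following sketch (our own, with d = 256 as the alphabet size and q a small prime chosen for illustration) shows the usual structure: compare the rolling hash of each text window with the pattern hash, and verify character by character on a hash match. The rolling-hash update is the line discussed just below.

public class RabinKarp {
    // d: alphabet size, q: a prime modulus, h: d^(m-1) mod q used to drop the leading character.
    static void search(String pattern, String text, int q) {
        int d = 256;
        int m = pattern.length(), n = text.length();
        if (m > n) return;
        int h = 1;
        for (int i = 0; i < m - 1; i++) h = (h * d) % q;

        int p = 0, t = 0;                       // hash of the pattern and of the current window
        for (int i = 0; i < m; i++) {
            p = (d * p + pattern.charAt(i)) % q;
            t = (d * t + text.charAt(i)) % q;
        }

        for (int i = 0; i <= n - m; i++) {
            if (p == t) {                        // hashes match: verify character by character
                boolean match = true;
                for (int j = 0; j < m; j++)
                    if (text.charAt(i + j) != pattern.charAt(j)) { match = false; break; }
                if (match) System.out.println("Pattern found at index " + i);
            }
            if (i < n - m) {                     // roll the hash: drop text[i], add text[i+m]
                t = (d * (t - text.charAt(i) * h) + text.charAt(i + m)) % q;
                if (t < 0) t += q;               // keep the hash non-negative
            }
        }
    }

    public static void main(String[] args) {
        search("abc", "abcxabcdabc", 101);       // prints indexes 0, 4 and 8
    }
}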
While calculating the hash value we are dividing it by a prime number in order to
avoid collisions. After dividing by the prime number the chance of a collision is
lower, but there is still a chance that the hash value can be the same for two
strings, so when we get a match we have to check it character by character to
make sure that we got a proper match.

t = (d*(t - text.charAt(i)*h) + text.charAt(i+m)) % q;

This recalculates the rolling hash for the next window of the text, first by removing
the leftmost character and then adding the new character from the text.
Section 39.3: Analysis of Linear search
(Worst, Average and Best Cases)
We can have three cases to analyze an algorithm:
1. Worst Case
2. Average Case
3. Best Case

/* Driver program to test above functions*/

Worst Case Analysis (Usually Done)


In the worst case analysis, we calculate upper bound on running time of an
algorithm. We must know the case that causes maximum number of
operations to be executed. For Linear Search, the worst case happens when
the element to be searched (x in the above code) is not present in the array.
When x is not present, the search() functions compares it with all the
elements of arr[] one by one. Therefore, the worst case time complexity of
linear search would be Θ (n)
Average Case Analysis (Sometimes done)
In average case analysis, we take all possible inputs and calculate computing
time for all of the inputs. Sum all the calculated values and divide the sum by
total number of inputs. We must know (or predict) distribution of cases. For
the linear search problem, let us assume that all cases are uniformly
distributed (including the case of x not being present in array). So we sum all
the cases and divide the sum by (n+1). Following is the value of average case
time complexity.

Best Case Analysis (Bogus)


In the best case analysis, we calculate lower bound on running time of an
algorithm. We must know the case that causes minimum number of
operations to be executed. In the linear search problem, the best case occurs
when x is present at the first location. The number of operations in the best
case is constant (not dependent on n). So time complexity in the best case
would be Θ(1).
Most of the time, we do worst case analysis to analyze algorithms. In the worst
case analysis, we guarantee an upper bound on the running time of an
algorithm, which is good information. The average case analysis is not easy to
do in most practical cases and it is rarely done. In the average case analysis,
we must know (or predict) the mathematical distribution of all possible inputs.
The best case analysis is bogus. Guaranteeing a lower bound on an algorithm
doesn't provide any information, as in the worst case an algorithm may take
years to run.
For some algorithms, all the cases are asymptotically same, i.e., there are no
worst and best cases. For example, Merge Sort. Merge Sort does Θ (nLogn)
operations in all cases. Most of the other sorting algorithms have worst and
best cases. For example, in the typical implementation of Quick Sort (where
pivot is chosen as a corner element), the worst occurs when the input array is
already sorted and the best occur when the pivot elements always divide
array in two halves. For insertion sort, the worst case occurs when the array
is reverse sorted and the best case occurs when the array is sorted in the same
order as output.
Section 39.4: Binary Search: On Sorted
Numbers
It's easiest to show a binary search on numbers using pseudo-code

Do not attempt to return early by comparing array[mid] to x for equality. The


extra comparison can only slow the code down. Note you need to add one to
low to avoid becoming trapped by integer division always rounding down.
Interestingly, the above version of binary search allows you to find the
smallest occurrence of x in the array. If the array contains duplicates of x, the
algorithm can be modified slightly in order for it to return the largest
occurrence of x by simply adding to the if conditional:
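The pseudo-code itself is not shown above; the following Java sketch (our own) is consistent with the description: no early equality test, "low = mid + 1" on the smaller-than branch, and the returned index is the smallest position whose value is not less than x, i.e. the smallest occurrence of x when x is present.

public class BinarySearch {
    // Assumes a[lo..hi] is sorted. Returns the smallest index i with a[i] >= x.
    static int lowerBound(int[] a, int lo, int hi, int x) {
        while (lo < hi) {
            int mid = lo + (hi - lo) / 2;   // integer division rounds down
            if (a[mid] < x)
                lo = mid + 1;               // the "+ 1" prevents getting trapped by the rounding
            else
                hi = mid;                   // a[mid] >= x: mid itself may be the first occurrence
        }
        return lo;
    }

    public static void main(String[] args) {
        int[] a = {1, 3, 3, 3, 5, 8, 13};
        System.out.println(lowerBound(a, 0, a.length - 1, 3));   // 1, the first occurrence of 3
    }
}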
Section 39.5: Linear search
Linear search is a simple algorithm. It loops through items until the query has
been found, which makes it a linear algorithm - the complexity is O(n),
where n is the number of items to go through.
Why O(n)? In worst-case scenario, you have to go through all of the n items.
It can be compared to looking for a book in a stack of books - you go through
them all until you find the one that you want.
Below is a Python implementation:
Chapter 40: Substring Search
Section 40.1: Introduction To Knuth-
Morris-Pratt (KMP) Algorithm
Suppose that we have a text and a pattern. We need to determine if the
pattern exists in the text or not. For example:

This pattern does exist in the text. So our substring search should return 3,
the index of the position from which this pattern starts. So how does our
brute force substring search procedure work?
What we usually do is: we start from the 0th index of the text and the 0th
index of our Pattern and we compare Text[0] with Pattern[0]. Since they
are not a match, we go to the next index of our text and we compare Text[1]
with Pattern[0]. Since this is a match, we increment the index of our pattern
and the index of the Text also. We compare Text[2] with Pattern[1]. They
are also a match. Following the same procedure stated before, we now
compare Text[3] with Pattern[2]. As they do not match, we start from the
next position where we started finding the match. That is index 2 of the Text.
We compare Text[2] with Pattern[0]. They don't match. Then incrementing
index of the Text, we compare Text[3] with Pattern[0]. They match. Again
Text[4] and Pattern[1] match, Text[5] and Pattern[2] match and Text[6]
and Pattern[3] match. Since we've reached the end of our Pattern, we now
return the index from which our match started, that is 3. If our pattern was:
bcgll, that means if the pattern didn't exist in our text, our search should return
exception or -1 or any other predefined value. We can clearly see that, in the
worst case, this algorithm would take O(mn) time where m is the length of the
Text and n is the length of the Pattern. How do we reduce this time
complexity? This is where KMP Substring Search Algorithm comes into the
picture.
The Knuth-Morris-Pratt String Searching Algorithm or KMP Algorithm
searches for occurrences of a "Pattern" within a main "Text" by employing
the observation that when a mismatch occurs, the word itself embodies
sufficient information to determine where the next match could begin, thus
bypassing re-examination of previously matched characters. The algorithm
was conceived in 1970 by Donald Knuth and Vaughan Pratt and
independently by James H. Morris. The trio published it jointly in 1977.
Let's extend our example Text and Pattern for better understanding:
+-------+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

| Index |0 |1 |2 |3 |4 |5 |6 |7 |8 |9 |10|11|12|13|14|15|16|17|18|19|20|21|22|
+-------+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| Text |a |b |c |x |a |b |c |d |a |b |x |a |b |c |d |a |b |c |d |a |b |c |y |
+-------+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
+---------+---+---+---+---+---+---+---+---+
| Index   | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
+---------+---+---+---+---+---+---+---+---+
| Pattern | a | b | c | d | a | b | c | y |
+---------+---+---+---+---+---+---+---+---+

At first, our Text and Pattern matches till index 2. Text[3] and Pattern[3]
doesn't match. So our aim is to not go backwards in this Text, that is, in case
of a mismatch, we don't want our matching to begin again from the position
that we started matching with. To achieve that, we'll look for a suffix in our
Pattern right before our mismatch occurred (substring abc), which is also a
prefix of the substring of our Pattern. For our example, since all the
characters are unique, there is no suffix, that is the prefix of our matched
substring. So what that means is, our next comparison will start from index 0.
Hold on for a bit, you'll understand why we did this. Next, we compare
Text[3] with Pattern[0] and it doesn't match. After that, for Text from index
4 to index 9 and for Pattern from index 0 to index 5, we find a match. We
find a mismatch in Text[10] and Pattern[6]. So we take the substring from
Pattern right before the point where mismatch occurs (substring abcdabc),
we check for a suffix, that is also a prefix of this substring. We can see here
ab is both the suffix and prefix of this substring. What that means is, since
we've matched until Text[10], the characters right before the mismatch is ab.
What we can infer from it is that since ab is also a prefix of the substring we
took, we don't have to check ab again and the next check can start from
Text[10] and Pattern[2]. We didn't have to look back to the whole Text, we
can start directly from where our mismatch occurred. Now we check
Text[10] and Pattern[2], since it's a mismatch, and the substring before
mismatch (abc) doesn't contain a suffix which is also a prefix, we check
Text[10] and Pattern[0], they don't match. After that for Text from index 11
to index 17 and for Pattern from index 0 to index 6. We find a mismatch in
Text[18] and Pattern[7]. So again we check the substring before mismatch
(substring abcdabc) and find abc is both the suffix and the prefix. So since
we matched till Pattern[7], abc must be before Text[18]. That means, we
don't need to compare until Text[17] and our comparison will start from
Text[18] and Pattern[3]. Thus we will find a match and we'll return 15
which is our starting index of the match. This is how our KMP Substring
Search works using suffix and prefix information.
Now, how do we efficiently compute if suffix is same as prefix and at what
point to start the check if there is a mismatch of character between Text and
Pattern. Let's take a look at an example:

We'll generate an array containing the required information. Let's call the
array S. The size of the array will be same as the length of the pattern. Since
the first letter of the Pattern can't be the suffix of any prefix, we'll put S[0] =
0. We take i = 1 and j = 0 at first. At each step we compare Pattern[i] and
Pattern[j] and increment i. If there is a match we put S[i] = j + 1 and
increment j; if there is a mismatch, we check the value at the previous position of j
(if available) and set j = S[j-1] (if j is not equal to 0). We keep doing this until
Pattern[j] matches Pattern[i] or j becomes 0. In the latter case, we put
S[i] = 0. For our example:
Pattern[j] and Pattern[i] don't match, so we increment i and since j is 0, we
don't check the previous value and put
S[i] = 0. If we keep incrementing i, for i = 4, we'll get a match, so we
put S[i] = S[4] = j + 1 = 0 + 1 = 1 and increment j and i. Our array will look
like:

Since Pattern[1] and Pattern[5] is a match, we put S[i] = S[5] = j + 1 = 1 +


1 = 2. If we continue, we'll find a mismatch for j = 3 and i = 7. Since j is not
equal to 0, we put j = S[j-1]. And we'll compare the characters at i and j are
same or not, since they are same, we'll put S[i] = j + 1. Our completed array
will look like:

This is our required array. Here a nonzero-value of S[i] means there is a S[i]
length suffix same as the prefix in that substring (substring from 0 to i) and
the next comparison will start from S[i] + 1 position of the Pattern. Our
algorithm to generate the array would look like:
The time complexity to build this array is O(n) and the space complexity is
also O(n). To make sure if you have completely understood the algorithm, try
to generate an array for pattern aabaabaa and check if the result matches with
this one.
Now let's do a substring search using the following example:

+---------+---+---+---+---+---+---+---+---+---+---+---+---+
| Index   | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |10 |11 |
+---------+---+---+---+---+---+---+---+---+---+---+---+---+
| Text    | a | b | x | a | b | c | a | b | c | a | b | y |
+---------+---+---+---+---+---+---+---+---+---+---+---+---+

+---------+---+---+---+---+---+---+
| Index   | 0 | 1 | 2 | 3 | 4 | 5 |
+---------+---+---+---+---+---+---+
| Pattern | a | b | c | a | b | y |
+---------+---+---+---+---+---+---+
| S       | 0 | 0 | 0 | 1 | 2 | 0 |
+---------+---+---+---+---+---+---+

We have a Text, a Pattern and a pre-calculated array S using our logic


defined before. We compare Text[0] and Pattern[0] and they are same.
Text[1] and Pattern[1] are same. Text[2] and Pattern[2] are not same. We
check the value at the position right before the mismatch. Since S[1] is 0,
there is no suffix that is same as the prefix in our substring and our
comparison starts at position S[1], which is 0. So Pattern[0] is not same as
Text[2], so we move on. Text[3] is same as Pattern[0] and there is a match
till Text[8] and Pattern[5]. We go one step back in the S array and find 2. So
this means there is a prefix of length 2 which is also the suffix of this
substring (abcab) which is ab. That also means that there is an ab before
Text[8]. So we can safely ignore Pattern[0] and Pattern[1] and start our
next comparison from Pattern[2] and Text[8]. If we continue, we'll find the
Pattern in the Text. Our procedure will look like:
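As a sketch of that procedure in Python (it reuses the generate_s helper from the earlier sketch; the name kmp_search is illustrative):

def kmp_search(text, pattern):
    S = generate_s(pattern)   # prefix information, as computed above
    i = j = 0                 # i indexes the Text, j indexes the Pattern
    while i < len(text):
        if text[i] == pattern[j]:
            i += 1
            j += 1
            if j == len(pattern):
                return i - j          # starting index of the match
        elif j != 0:
            j = S[j - 1]              # skip the part we already know matches
        else:
            i += 1
    return -1                          # Pattern does not occur in Text

print(kmp_search("abxabcabcaby", "abcaby"))   # 6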

The time complexity of this algorithm, apart from the suffix-array calculation,
is O(m). Since GenerateSuffixArray takes O(n), the total time
complexity of the KMP algorithm is O(m+n).
PS: If you want to find multiple occurrences of the Pattern in the Text,
instead of returning the value, print/store it and set j := S[j-1]. Also keep a flag to
track whether you have found any occurrence or not and handle it
accordingly.
Section 40.2: Introduction to Rabin-
Karp Algorithm
Rabin-Karp Algorithm is a string searching algorithm created by Richard M.
Karp and Michael O. Rabin that uses hashing to find any one of a set of
pattern strings in a text.
A substring of a string is another string that occurs in it. For example, ver is a
substring of stackoverflow. This is not to be confused with a subsequence:
cover is a subsequence of the same string, but not a substring. In other words,
any subset of consecutive letters in a string is a substring of the given string.
In Rabin-Karp algorithm, we'll generate a hash of our pattern that we are
looking for & check if the rolling hash of our text matches the pattern or not.
If it doesn't match, we can guarantee that the pattern doesn't exist in the text.
However, if it does match, the pattern can be present in the text. Let's look at
an example:
Let's say we have a text: yeminsajid and we want to find out if the pattern
nsa exists in the text. To calculate the hash and rolling hash, we'll need to use
a prime number. This can be any prime number. Let's take prime = 11 for
this example. We'll determine hash value using this formula:

(1st letter) X (prime)⁰ + (2nd letter) X (prime)¹ + (3rd letter) X (prime)² + ......

We'll denote each letter by its position in the alphabet: a = 1, b = 2, ..., z = 26.

The hash value of nsa will be:

14 X 11⁰ + 19 X 11¹ + 1 X 11² = 14 + 209 + 121 = 344
Now we find the rolling-hash of our text. If the rolling hash matches with the
hash value of our pattern, we'll check if the strings match or not. Since our
pattern has 3 letters, we'll take the first 3 letters yem from our text and calculate
the hash value. We get:

25 X 11⁰ + 5 X 11¹ + 13 X 11² = 25 + 55 + 1573 = 1653
This value doesn't match with our pattern's hash value, so the string doesn't
exist here. Now we need to consider the next step: calculating the hash
value of our next string emi. We could calculate this using our formula, but
that would be rather costly. Instead, we use another technique.

We subtract the value of the first letter of the previous string, in this case y,
from our current hash value. We get 1653 - 25 = 1628.
We divide the difference by our prime, which is 11 for this example. We get
1628 / 11 = 148.
We add (new letter) X (prime)^(m-1), where m is the length of the pattern, to the
quotient. The new letter is i = 9, so we get 148 + 9 X 11² = 148 + 1089 = 1237.
The new hash value is not equal to our pattern's hash value. Moving on, for n
we get:
(1237 - 5) / 11 = 112, then 112 + 14 X 11² = 1806

It doesn't match. After that, for s, we get:

(1806 - 13) / 11 = 163, then 163 + 19 X 11² = 2462

It doesn't match. Next, for a, we get:

(2462 - 9) / 11 = 223, then 223 + 1 X 11² = 344
It's a match! Now we compare our pattern with the current string. Since both
the strings match, the substring exists in this string. And we return the
starting position of our substring.
The pseudo-code will be:
Hash Calculation:

Hash Recalculation:

String Match:

Rabin-Karp:
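A compact Python sketch covering all four pieces at once, under the conventions of this example (letters valued a = 1 ... z = 26, prime = 11; the function name rabin_karp is an illustration, not the original pseudo-code):

def rabin_karp(text, pattern, prime=11):
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return -1
    val = lambda ch: ord(ch) - ord('a') + 1                 # a = 1, ..., z = 26
    # hash = (1st letter)*prime^0 + (2nd letter)*prime^1 + ...
    pat_hash = sum(val(pattern[k]) * prime ** k for k in range(m))
    txt_hash = sum(val(text[k]) * prime ** k for k in range(m))
    for i in range(n - m + 1):
        # hashes equal: verify character by character to rule out collisions
        if txt_hash == pat_hash and text[i:i + m] == pattern:
            return i
        if i + m < n:
            # roll the hash: drop text[i], divide by the prime, add the new last letter
            txt_hash = (txt_hash - val(text[i])) // prime + val(text[i + m]) * prime ** (m - 1)
    return -1

print(rabin_karp("yeminsajid", "nsa"))   # 4

Note that this plain version keeps the full hash values, as in the worked example; practical implementations keep the hashes modulo a large number to bound their size.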

If the algorithm doesn't find any match, it simply returns -1.


This algorithm is used in detecting plagiarism. Given source material, the
algorithm can rapidly search through a paper for instances of sentences from
the source material, ignoring details such as case and punctuation. Because of
the abundance of the sought strings, single-string searching algorithms are
impractical here. Again, KnuthMorris-Pratt algorithm or Boyer-Moore
String Search algorithm is faster single pattern string searching algorithm,
than Rabin-Karp. However, it is an algorithm of choice for multiple pattern
search. If we want to find any of the large number, say k, fixed length
patterns in a text, we can create a simple variant of the Rabin-Karp
algorithm.
For text of length n and p patterns of combined length m, its average and
best case running time is O(n+m) in space O(p), but its worst-case time
is O(nm).
Section 40.3: Python Implementation
of KMP algorithm
Haystack: The string in which given pattern
needs to be searched. Needle: The pattern to
be searched.
Time complexity: the search portion (the strstr method) has complexity O(n),
where n is the length of the haystack; the needle is also pre-parsed to build the
prefix table, which takes O(m), where m is the length of the needle.
Therefore, the overall time complexity of KMP is O(n+m).
Space complexity: O(m) because of the prefix table on the needle.
Note: the following implementation returns the start position of the match in the
haystack (if there is a match), and otherwise returns -1 for edge cases such as the
needle/haystack being an empty string or the needle not being found in the haystack.
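A sketch of such an implementation (the names strstr and compute_prefix are illustrative; it returns the start position of the match or -1 exactly as described):

def compute_prefix(needle):
    # prefix[i] = length of the longest proper prefix of needle[0..i] that is also a suffix
    prefix = [0] * len(needle)
    j = 0
    for i in range(1, len(needle)):
        while j > 0 and needle[i] != needle[j]:
            j = prefix[j - 1]
        if needle[i] == needle[j]:
            j += 1
        prefix[i] = j
    return prefix

def strstr(haystack, needle):
    if not needle or not haystack:
        return -1
    prefix = compute_prefix(needle)
    j = 0
    for i, ch in enumerate(haystack):
        while j > 0 and ch != needle[j]:
            j = prefix[j - 1]
        if ch == needle[j]:
            j += 1
        if j == len(needle):
            return i - j + 1        # start position of the match
    return -1

print(strstr("stackoverflow", "over"))   # 5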
Section 40.4: KMP Algorithm in C
Given a text txt and a pattern pat, the objective of this program will be to
print all the occurrences of pat in txt.
Examples:
Input:

output:

Input:

output:

C Language Implementation:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

void computeLPSArray(char *pat, int M, int *lps);

void KMPSearch(char *pat, char *txt)
{
    int M = strlen(pat);
    int N = strlen(txt);

    // create lps[] that will hold the longest prefix suffix
    // values for pattern
    int *lps = (int *)malloc(sizeof(int)*M);
    int j = 0;  // index for pat[]

    // Preprocess the pattern (calculate lps[] array)
    computeLPSArray(pat, M, lps);

    int i = 0;  // index for txt[]
    while (i < N)
    {
        if (pat[j] == txt[i])
        {
            j++;
            i++;
        }
        if (j == M)
        {
            printf("Found pattern at index %d \n", i-j);
            j = lps[j-1];
        }
        // mismatch after j matches
        else if (i < N && pat[j] != txt[i])
        {
            // Do not match lps[0..lps[j-1]] characters,
            // they will match anyway
            if (j != 0)
                j = lps[j-1];
            else
                i = i+1;
        }
    }
    free(lps); // to avoid memory leak
}

void computeLPSArray(char *pat, int M, int *lps)
{
    int len = 0; // length of the previous longest prefix suffix
    int i;

    lps[0] = 0;  // lps[0] is always 0
    i = 1;

    // the loop calculates lps[i] for i = 1 to M-1
    while (i < M)
    {
        if (pat[i] == pat[len])
        {
            len++;
            lps[i] = len;
            i++;
        }
        else // (pat[i] != pat[len])
        {
            if (len != 0)
            {
                // This is tricky. Consider the example
                // AAACAAAA and i = 7.
                len = lps[len-1];
                // Also, note that we do not increment i here
            }
            else // if (len == 0)
            {
                lps[i] = 0;
                i++;
            }
        }
    }
}
Output:

Reference:
http://www.geeksforgeeks.org/searching-for-patterns-set-2-kmp-algorithm/
Chapter 41: Breadth-First Search
Section 41.1: Finding the Shortest Path
from Source to other Nodes
Breadth-first-search (BFS) is an algorithm for traversing or searching tree or
graph data structures. It starts at the tree root (or some arbitrary node of a
graph, sometimes referred to as a 'search key') and explores the neighbor
nodes first, before moving to the next-level neighbors. BFS was invented in
the late 1950s by Edward Forrest Moore, who used it to find the shortest path
out of a maze, and it was discovered independently by C. Y. Lee as a wire routing
algorithm in 1961.
The processes of BFS algorithm works under these assumptions:
1. We won't traverse any node more than once.
2. Source node or the node that we're starting from is situated in level 0.
3. The nodes we can directly reach from source node are level 1 nodes, the
nodes we can directly reach from level 1 nodes are level 2 nodes and so
on.
4. The level denotes the distance of the shortest path from the source.
Let's see an example:
Let's assume this graph represents connection between multiple cities, where
each node denotes a city and an edge between two nodes denote there is a
road linking them. We want to go from node 1 to node 10. So node 1 is our
source, which is level 0. We mark node 1 as visited. We can go to node 2,
node 3 and node 4 from here. So they'll be level (0+1) = level 1 nodes. Now
we'll mark them as visited and work with them.
The colored nodes are visited. The nodes that we're currently working with
will be marked with pink. We won't visit the same node twice. From node 2,
node 3 and node 4, we can go to node 6, node 7 and node 8. Let's mark
them as visited. The level of these nodes will be level (1+1) = level 2.
If you haven't noticed, the level of nodes simply denote the shortest path
distance from the source. For example: we've found node 8 on level 2. So
the distance from source to node 8 is 2.
We didn't yet reach our target node, that is node 10. So let's visit the next
nodes, the ones we can directly reach from node 6, node 7 and node 8.
We can see that, we found node 10 at level 3. So the shortest path from
source to node 10 is 3. We searched the graph level by level and found the
shortest path. Now let's erase the edges that we didn't use:
After removing the edges that we didn't use, we get a tree called BFS tree.
This tree shows the shortest path from source to all other nodes.
So our task will be, to go from source to level 1 nodes. Then from level 1 to
level 2 nodes and so on until we reach our destination. We can use queue to
store the nodes that we are going to process. That is, for each node we're
going to work with, we'll push all other nodes that can be directly traversed
and not yet traversed in the queue.
The simulation of our example:
First we push the source in the queue. Our queue will look like:
The level of node 1 will be 0. level[1] = 0. Now we start our BFS. At first,
we pop a node from our queue. We get node 1. We can go to node 4, node 3
and node 2 from this one. We've reached these nodes from node 1. So
level[4] = level[3] = level[2] = level[1] + 1 = 1. Now we mark them as
visited and push them in the queue.

Now we pop node 4 and work with it. We can go to node 7 from node 4.
level[7] = level[4] + 1 = 2. We mark node 7 as visited and push it in the
queue.

From node 3, we can go to node 7 and node 8. Since we've already marked
node 7 as visited, we mark node 8 as visited, we change level[8] = level[3] +
1 = 2. We push node 8 in the queue.

This process will continue till we reach our destination or the queue becomes
empty. The level array will provide us with the distance of the shortest path
from source. We can initialize level array with infinity value, which will
mark that the nodes are not yet visited. Our pseudo-code will be:
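As a rough Python sketch of that pseudo-code (assuming the graph is an adjacency list, i.e. a dict mapping each node to its neighbours; names are illustrative):

from collections import deque

def bfs(graph, source):
    level = {source: 0}          # also serves as the visited marker
    parent = {source: None}
    q = deque([source])
    while q:
        u = q.popleft()
        for v in graph[u]:
            if v not in level:               # not visited yet
                level[v] = level[u] + 1      # shortest distance from the source
                parent[v] = u
                q.append(v)
    return level, parent

level[v] then holds the shortest distance of v from the source, and parent lets us print the path as described below.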
By iterating through the level array, we can find out the distance of each node
from source. For example: the distance of node 10 from source will be
stored in level[10].
Sometimes we might need to print not only the shortest distance,
but also the path via which we can go to our destined node from the source.
For this we need to keep a parent array. parent[source] will be NULL. For
each update in the level array, we'll simply add parent[v] := u in our pseudo-code
inside the for loop. After finishing BFS, to find the path, we'll traverse back the
parent array until we reach the source, which will be denoted by a NULL value. The
pseudo-code will be:

Procedure PrintPath(u):    // recursive
    if parent[u] is not equal to null
        PrintPath(parent[u])
    end if
    print -> u

Procedure PrintPath(u):    // iterative
    S = Stack()
    while parent[u] is not equal to null
        S.push(u)
        u := parent[u]
    end while
    while S is not empty
        print -> S.pop
    end while
Complexity:
We've visited every node once and every edges once. So the complexity will
be O(V + E) where V is the number of nodes and E is the number of edges.
Section 41.2: Finding Shortest Path
from Source in a 2D graph
Most of the time, we'll need to find out the shortest path from single source to
all other nodes or a specific node in a 2D graph. Say for example: we want to
find out how many moves are required for a knight to reach a certain square
in a chessboard, or we have an array where some cells are blocked, we have
to find out the shortest path from one cell to another. We can move only
horizontally and vertically. Even diagonal moves can be possible too. For
these cases, we can convert the squares or cells in nodes and solve these
problems easily using BFS. Now our visited, parent and level will be 2D
arrays. For each node, we'll consider all possible moves. To find the distance
to a specific node, we'll also check whether we have reached our destination.
There will be one additional thing called direction array. This will simply
store the all possible combinations of directions we can go to. Let's say, for
horizontal and vertical moves, our direction arrays will be:
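For example, one common choice (the exact ordering is an arbitrary convention) is:

dx = [1, -1, 0, 0]
dy = [0, 0, 1, -1]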

Here dx represents move in x-axis and dy represents move in y-axis. Again


this part is optional. You can also write all the possible combinations
separately. But it's easier to handle it using direction array. There can be
more and even different combinations for diagonal moves or knight moves.
The additional part we need to keep in mind is:
If any of the cells is blocked, then for every possible move we'll check whether
the cell is blocked or not.
We'll also check if we have gone out of bounds, that is, whether
we've crossed the array boundaries. The number of rows and
columns will be given.
Our pseudo-code will be:
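A minimal Python sketch of that idea, assuming a grid of characters where '#' marks a blocked cell (the name bfs_2d and the blocked-cell marker are assumptions):

from collections import deque

def bfs_2d(grid, source, destination):
    rows, cols = len(grid), len(grid[0])
    dx = [1, -1, 0, 0]                       # horizontal and vertical moves
    dy = [0, 0, 1, -1]
    level = {source: 0}
    q = deque([source])
    while q:
        x, y = q.popleft()
        if (x, y) == destination:
            return level[(x, y)]
        for k in range(4):
            nx, ny = x + dx[k], y + dy[k]
            inside = 0 <= nx < rows and 0 <= ny < cols   # bounds check
            if inside and grid[nx][ny] != '#' and (nx, ny) not in level:
                level[(nx, ny)] = level[(x, y)] + 1
                q.append((nx, ny))
    return -1                                 # destination not reachable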
As we have discussed earlier, BFS only works for unweighted graphs. For
weighted graphs, we'll need Dijkstra's algorithm. For graphs with negative edge
weights, we need the Bellman-Ford algorithm. Again, these are single-source
shortest path algorithms. If we need to find the distance from each node to all
other nodes, we'll need the Floyd-Warshall algorithm.
Section 41.3: Connected Components
Of Undirected Graph Using BFS
BFS can be used to find the connected components of an undirected graph.
We can also find out if the given graph is connected or not. Our subsequent
discussion assumes we are dealing with undirected graphs. The definition of a
connected graph is:

A graph is connected if there is a path between every pair of


vertices.
Following is a connected graph.

Following graph is not connected and has 2 connected components:


1. Connected Component 1: {a,b,c,d,e}
2. Connected Component 2: {f}
BFS is a graph traversal algorithm. So, starting from a random source node, if
on termination of the algorithm all nodes are visited, then the graph is
connected; otherwise it is not connected.
PseudoCode for the algorithm.

C implementation for finding whether an undirected graph is connected or not:
int main()
{
    int n, e;       // n is number of vertices, e is number of edges
    int i, j;
    char **graph;   // adjacency matrix

    printf("Enter number of vertices:");
    scanf("%d", &n);
    if (n < 0 || n > MAXVERTICES)
    {
        fprintf(stderr, "Please enter a valid positive integer from 1 to %d", MAXVERTICES);
        return -1;
    }

    graph = malloc(n * sizeof(char *));
    visited = malloc(n * sizeof(char));

    for (i = 0; i < n; ++i)
    {
        graph[i] = malloc(n * sizeof(int));
        visited[i] = 'N';   // initially all vertices are not visited
        for (j = 0; j < n; ++j)
            graph[i][j] = 0;
    }

    printf("enter number of edges and then enter them in pairs:");
    scanf("%d", &e);

    for (i = 0; i < e; ++i)
    {
        int u, v;
        scanf("%d%d", &u, &v);
        graph[u-1][v-1] = 1;
        graph[v-1][u-1] = 1;
    }

    if (isConnected(graph, n))
        printf("The graph is connected");
    else
        printf("The graph is NOT connected\n");
}

void enqueue(int vertex)
{
    if (Qfront == NULL)
    {
        Qfront = malloc(sizeof(Node));
        Qfront->v = vertex;
        Qfront->next = NULL;
        Qrear = Qfront;
    }
    else
    {
        Nodeptr newNode = malloc(sizeof(Node));
        newNode->v = vertex;
        newNode->next = NULL;
        Qrear->next = newNode;
        Qrear = newNode;
    }
}

int deque()
{
For finding all the connected components of an undirected graph, we only
need to add 2 lines of code to the BFS function. The idea is to call the BFS
function until all vertices are visited.
The lines to be added are:
AND

and we define the following function:
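Putting both pieces together, a Python sketch of the idea (graph as an adjacency list; the function names are illustrative) could be:

from collections import deque

def count_connected_components(graph):
    visited = set()

    def bfs(source):                      # ordinary BFS that only marks nodes as visited
        q = deque([source])
        visited.add(source)
        while q:
            u = q.popleft()
            for v in graph[u]:
                if v not in visited:
                    visited.add(v)
                    q.append(v)

    components = 0
    for u in graph:                       # keep calling BFS until every vertex is visited
        if u not in visited:
            components += 1
            bfs(u)
    return components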


Chapter 42: Depth First Search
Section 42.1: Introduction To Depth-
First Search
Depth-first search is an algorithm for traversing or searching tree or graph
data structures. One starts at the root and explores as far as possible along
each branch before backtracking. A version of depth-first search was
investigated in the 19th century by the French mathematician Charles Pierre
Trémaux as a strategy for solving mazes.
Depth-first search is a systematic way to find all the vertices reachable from a
source vertex. Like breadth-first search, DFS traverses a connected
component of a given graph and defines a spanning tree. The basic idea of
depth-first search is methodically exploring every edge. We start over from
different vertices as necessary. As soon as we discover a vertex, DFS starts
exploring from it (unlike BFS, which puts a vertex on a queue so that it
explores from it later).
Let's look at an example. We'll traverse this graph:

We'll traverse the graph following these rules:


We'll start from the source.
No node will be visited twice.
The nodes we didn't visit yet, will be colored white.
The node we visited, but didn't visit all of its child nodes, will be
colored grey.
Completely traversed nodes will be colored black.
Let's look at it step by step:
We can see one important keyword here: back edge. The edge 5-1 is called a
back edge. This is because we're not yet done with node 1, so going to node 1
from another node means there's a cycle in the graph. In DFS, if
we can go from one gray node to another, we can be certain that the graph
has a cycle. This is one of the ways of detecting a cycle in a graph. Depending
on the source node and the order of the nodes we visit, we can find any edge
in a cycle as a back edge. For example: if we had gone to 5 from 1 first, we'd have
found 2-1 as the back edge.
The edges that we take to go from a gray node to a white node are called tree
edges. If we only keep the tree edges and remove the others, we'll get the DFS tree.
In an undirected graph, if we can visit an already visited node, that must be a
back edge. But for directed graphs, we must check the colors: if and only if
we can go from one gray node to another gray node is that edge called a
back edge.
In DFS, we can also keep timestamps for each node, which can be used in
many ways (e.g.: Topological Sort).
1. When a node v is changed from white to gray the time is recorded in
d[v].
2. When a node v is changed from gray to black the time is recorded in
f[v].
Here d[] means discovery time and f[] means finishing time. Our pseudo-code
will look like:
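A rough Python sketch of that pseudo-code (adjacency-list graph; the WHITE/GRAY/BLACK constants and names are illustrative):

def dfs_all(graph):
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {u: WHITE for u in graph}
    d, f = {}, {}                    # discovery and finishing times
    time = [0]

    def dfs_visit(u):
        time[0] += 1
        d[u] = time[0]               # white -> gray
        color[u] = GRAY
        for v in graph[u]:
            if color[v] == WHITE:    # tree edge
                dfs_visit(v)
            # for a directed graph, color[v] == GRAY here would indicate a back edge
        color[u] = BLACK             # gray -> black
        time[0] += 1
        f[u] = time[0]

    for u in graph:                  # start over from different vertices as necessary
        if color[u] == WHITE:
            dfs_visit(u)
    return d, f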

Complexity:
Each node and each edge is visited once. So the complexity of DFS is O(V+E),
where V denotes the number of nodes and E denotes the number of edges.
Applications of Depth First Search:
Finding all-pair shortest paths in an undirected graph.
Detecting cycles in a graph.
Path finding.
Topological Sort.
Testing if a graph is bipartite.
Finding Strongly Connected Components.
Solving puzzles with one solution.
Chapter 43: Hash Functions
Section 43.1: Hash codes for common
types in C#
The hash codes produced by the GetHashCode() method for built-in and common C#
types from the System namespace are shown below.
Boolean
1 if the value is true, 0 otherwise.

Byte, UInt16, Int32, UInt32, Single
Value (if necessary cast to Int32).

SByte

Char

Int16

Int64, Double
Xor between the lower and upper 32 bits of the 64-bit number.

Decimal

Object
The default implementation uses the sync block index.


String
Hash code computation depends on the platform type (Win32 or Win64), on
whether randomized string hashing is used, and on Debug / Release mode. In the
case of the Win64 platform:

ValueType
The first non-static field is looked up and its hash code is taken. If the type has no
non-static fields, the hash code of the type is returned. The hash code of a static
member can't be taken, because if that member is of the same type as the
original type, the calculation would end up in an infinite loop.
Nullable<T>

Array

References
GitHub .Net Core CLR
Section 43.2:
Introduction to hash
functions
A hash function h() is an arbitrary function which maps data x ∈ X of arbitrary
size to a value y ∈ Y of fixed size: y = h(x). Good hash functions satisfy the
following restrictions:

hash values are distributed uniformly

the hash function is deterministic: h(x) should always return the same value for a given x

the hash is fast to calculate (runtime O(1))

In the general case the size of the hash is less than the size of the input data:
|y| < |x|. Hash functions are not reversible, or in other words there may be
collisions: ∃ x1, x2 ∈ X, x1 ≠ x2 : h(x1) = h(x2). X may be a finite or infinite
set, while Y is a finite set.
Hash functions are used in a lot of parts of computer science, for example in
software engineering, cryptography, databases, networks, machine learning
and so on. There are many different types of hash functions, with differing
domain specific properties.
Often the hash is an integer value. There are special methods in programming
languages for hash calculation. For example, in C# the GetHashCode() method for
all types returns an Int32 value (32-bit integer number). In Java every
class provides a hashCode() method which returns an int. Each data type has its
own or user-defined implementations.
Hash methods
There are several approaches for determining a hash function. Without loss of
generality, let x ∈ X = {z ∈ ℤ : z ≥ 0} be the positive integers. Often m is a prime
(not too close to an exact power of 2).

Method                  Hash function
Division method         h(x) = x mod m
Multiplication method   h(x) = ⌊m (xA mod 1)⌋, A ∈ ℝ : 0 < A < 1
Hash table
Hash functions are used in hash tables for computing an index into an array of slots.
A hash table is a data structure for implementing dictionaries (key-value
structures). Well-implemented hash tables have O(1) time for the following
operations: insert, search and delete data by key. More than one key may
hash to the same slot. There are two ways of resolving collisions:
1. Chaining: a linked list is used for storing elements with the same hash
value in a slot
2. Open addressing: zero or one element is stored in each slot

The next methods are used to compute the probe sequences required for open
addressing
Method              Formula
Linear probing      h(x, i) = (h'(x) + i) mod m
Quadratic probing   h(x, i) = (h'(x) + c1*i + c2*i^2) mod m
Double hashing      h(x, i) = (h1(x) + i * h2(x)) mod m

where i ∈ {0, 1, ..., m-1}, h'(x), h1(x), h2(x) are auxiliary hash functions and
c1, c2 are positive auxiliary constants.
Examples
Let x ∈ U{1, 1000} and h(x) = x mod m. The next table shows the hash values for a
non-prime and a prime m. Bolded text indicates equal hash values.

x     m = 100 (not prime)   m = 101 (prime)
723 23 16
103 3 2
738 38 31
292 92 90
61 61 61
87 87 87
995 95 86
549 49 44
991 91 82
757 57 50
920 20 11
626 26 20
557 57 52
831 31 23
619 19 13
Links
Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest,
Clifford Stein. Introduction to Algorithms.
Overview of Hash Tables
Wolfram MathWorld - Hash Function
Chapter 44: Travelling Salesman
Section 44.1: Brute Force Algorithm
A path through every vertex exactly once is the same as ordering the vertex
in some way. Thus, to calculate the minimum cost of travelling through every
vertex exactly once, we can brute force every single one of the N!
permutations of the numbers from 1 to N.
Pseudocode
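A Python sketch of the brute-force idea (assuming cost[i][j] holds the cost between vertices i and j, numbered from 0; the function name is illustrative):

from itertools import permutations

def tsp_brute_force(cost):
    n = len(cost)
    best = float('inf')
    for perm in permutations(range(1, n)):        # fix vertex 0 as the starting point
        order = (0,) + perm
        total = sum(cost[order[i]][order[i + 1]] for i in range(n - 1))
        total += cost[order[-1]][order[0]]        # close the tour
        best = min(best, total)
    return best

Fixing the starting vertex only trims a constant factor; conceptually we are still trying all N! orderings, as the complexity analysis below assumes.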

Time Complexity
There are N! permutations to go through and the cost of each path is
calculated in O(N), thus this algorithm takes O(N * N!) time to output the exact
answer.
Section 44.2: Dynamic Programming
Algorithm
Notice that if we consider the path (in order):

and the path

The cost of going from vertex 1 to vertex 2 to vertex 3 remains the same, so
why must it be recalculated? This result can be saved for later use.
Let dp[bitmask][vertex] represent the minimum cost of travelling through all the
vertices whose corresponding bit in bitmask is set to 1, ending at vertex. For example:

dp[12][2]

Since 12 represents 1100 in binary, dp[12][2] represents going through vertices 2
and 3 in the graph with the path ending at vertex 2.
Thus we can have the following algorithm (C++ implementation):

int cost[N][N];       //Adjust the value of N if needed
int memo[1 << N][N];  //Set everything here to -1

int TSP(int bitmask, int pos){
    int res = INF;
    if (bitmask == ((1 << N) - 1)){     //All vertices have been explored
        return cost[pos][0];            //Cost to go back
    }
    if (memo[bitmask][pos] != -1){      //If this has already been computed
        return memo[bitmask][pos];      //Just return the value, no need to recompute
    }
    for (int i = 0; i < N; ++i){        //For every vertex
        if ((bitmask & (1 << i)) == 0){ //If the vertex has not been visited
            res = min(res, TSP(bitmask | (1 << i), i) + cost[pos][i]); //Visit the vertex
        }
    }
    memo[bitmask][pos] = res;           //Save the result
    return res;
}
//Call TSP(1,0)

This line may be a little confusing, so let's go through it slowly:

res = min(res, TSP(bitmask | (1 << i), i) + cost[pos][i]);

Here, bitmask | (1 << i) sets the ith bit of bitmask to 1, which represents that the
ith vertex has been visited. The i after the comma represents the new pos in that
function call, which represents the new "last" vertex. cost[pos][i] adds the cost of
travelling from vertex pos to vertex i.

Thus, this line updates the value of res to the minimum possible cost of
travelling to every other vertex that has not been visited yet.
Time Complexity
The function TSP(bitmask, pos) has 2^N values for bitmask and N values for pos.
Each function call takes O(N) time to run (the for loop). Thus this implementation
takes O(N^2 * 2^N) time to output the exact answer.
Chapter 45: Knapsack Problem
Section 45.1: Knapsack Problem Basics
The Problem: Given a set of items where each item contains a weight and
value, determine the number of each to include in a collection so that the
total weight is less than or equal to a given limit and the total value is as large
as possible.
Pseudo code for Knapsack Problem
Given:
1. Values(array v)
2. Weights(array w)
3. Number of distinct items(n)
4. Capacity(W)

A simple implementation of the above pseudo code using Python:
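A sketch of what such an implementation can look like (0/1 knapsack; the function name knapSack and the argument order are assumptions on my part):

def knapSack(W, wt, val, n):
    # K[i][w] = best value achievable using the first i items with capacity w
    K = [[0] * (W + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for w in range(W + 1):
            if wt[i - 1] <= w:
                # either take item i-1 or leave it
                K[i][w] = max(val[i - 1] + K[i - 1][w - wt[i - 1]], K[i - 1][w])
            else:
                K[i][w] = K[i - 1][w]
    return K[n][W]

val = [60, 100, 120]
wt = [10, 20, 30]
print(knapSack(50, wt, val, len(val)))   # 220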

Running the code: Save this in a file named knapSack.py

Time complexity of the above code: O(nW), where n is the number of items
and W is the capacity of the knapsack.
Section 45.2: Solution Implemented in
C#
Chapter 46: Equation Solving
Section 46.1: Linear Equation
There are two classes of methods for solving Linear Equations:
1. Direct Methods: a common characteristic of direct methods is that they
transform the original equation into equivalent equations that can be
solved more easily, meaning we obtain the solution directly from an equation.
2. Iterative Methods: iterative or indirect methods start with a guess of the
solution and then repeatedly refine the solution until a certain
convergence criterion is reached. Iterative methods are generally less
efficient than direct methods because of the large number of operations
required. Examples: Jacobi's Iteration Method, Gauss-Seidel Iteration
Method.

Implementation in C:

void JacobisMethod(int n, double x[n], double b[n], double a[n][n]){
    double Nx[n]; //modified variables
    int i, j;
    int rootFound=0; //flag

    while(!rootFound){
        for(i=0; i<n; i++){ //calculation
            Nx[i]=b[i];
            for(j=0; j<n; j++){
                if(i!=j) Nx[i] = Nx[i] - a[i][j]*Nx[j];
            }
            Nx[i] = Nx[i] / a[i][i];
        }

        rootFound=1; //verification
        for(i=0; i<n; i++){
            if(!( (Nx[i]-x[i])/x[i] > -0.000001 && (Nx[i]-x[i])/x[i] < 0.000001 )){
                rootFound=0;
                break;
            }
        }

        for(i=0; i<n; i++){ //evaluation
            x[i]=Nx[i];
        }
    }
    return ;
}

//Print array with comma separation
void print(int n, double x[n]){
    int i;
    for(i=0; i<n; i++){
        printf("%lf, ", x[i]);
    }
    printf("\n\n");
    return ;
}

int main(){
    //equation initialization
    int n=3;        //number of variables
    double x[n];    //variables

    double b[n],    //constants
           a[n][n]; //arguments

    //assign values
    a[0][0]=8; a[0][1]=2;  a[0][2]=-2; b[0]=8;   //8x₁+2x₂-2x₃+8=0
    a[1][0]=1; a[1][1]=-8; a[1][2]=3;  b[1]=-4;  //x₁-8x₂+3x₃-4=0
    a[2][0]=2; a[2][1]=1;  a[2][2]=9;  b[2]=12;  //2x₁+x₂+9x₃+12=0

    int i;
    for(i=0; i<n; i++){ //initialization
        x[i]=0;
    }
    JacobisMethod(n, x, b, a);
    print(n, x);

    for(i=0; i<n; i++){ //initialization


Section 46.2: Non-Linear Equation
An equation of the type f(x)=0 is either algebraic or transcendental. These
types of equations can be solved by using two types of methods-
1. Direct Method: This method gives the exact value of all the roots
directly in a finite number of steps.
2. Indirect or Iterative Method: Iterative methods are best suited for
computer programs to solve an equation. It is based on the concept of
successive approximation. In Iterative Method there are two ways to
solve an equation-
Bracketing Method: we take two initial points between which the root lies.
Examples: Bisection Method, False Position Method.
Open End Method: we take one or two initial values where the
root may be anywhere. Examples: Newton-Raphson Method,
Successive Approximation Method, Secant Method.

Implementation in C:
} } printf("It took %d loops.\n",
loopCounter);

return
root;
}
/** * Takes two initial values and shortens the distance by single side.
**/ double FalsePosition(){ double root=0;
double a=1,
b=2; double
c=0;
int loopCounter=0; if(f(a)*f(b) < 0){
while(1){ loopCounter++; c=
(a*f(b) - b*f(a)) / (f(b) - f(a));
/*/printf("%lf\t %lf \n", c, f(c));/**////test
if(f(c)<0.00001 && f(c)>-0.00001){ root=c;
break; }
if((f(a))*(f(c)) < 0){
b=c;
}else{ a=c;
}
} } printf("It took %d loops.\n",
loopCounter);

return
root;
}
/** * Uses one initial value and gradually takes that value near to the real one. **/
double NewtonRaphson(){ double root=0;
double
x1=1;
double
x2=0;
int
loopCounter=0;
while(1){
loopCounter++;
x2 = x1 - (f(x1)/f2(x1));
/*/printf("%lf \t %lf \n", x2, f(x2));/**////test
if(f(x2)<0.00001 &&
f(x2)>-0.00001){ root=x2;
break;
}

x1=x2; } printf("It took %d


loops.\n", loopCounter);

return
root;
}
/** * Uses one initial value and gradually takes that value near to the real one. **/
double FixedPoint(){ double root=0; double x=1;
int
loopCounter=0;
while(1){
loopCounter++;
if( (x-g(x)) <0.00001 && (x-g(x)) >-0.00001){
root = x; break; }
/*/printf("%lf \t %lf \n", g(x), x-(g(x)));/**////test
x=g(x); } printf("It took %d
loops.\n", loopCounter);

return
root;
}
/** * uses two initial values & both value approaches to the
root. **/ double Secant(){ double root=0;
double
x0=1;
double
x1=2;
double
x2=0;
int
loopCounter=0;
while(1){
loopCounter++;
/*/printf("%lf \t %lf \t %lf \n", x0, x1, f(x1));/**////test
if(f(x1)<0.00001 && f(x1)>-0.00001){
root=x1; break; } x2 = ((x0*f(x1))-
(x1*f(x0))) / (f(x1)-f(x0));
x0=x1; x1=x2; } printf("It
took %d loops.\n", loopCounter);
return root;
}
Chapter 47: Longest Common
Subsequence
Section 47.1: Longest Common
Subsequence Explanation
One of the most important implementations of Dynamic Programming is
finding out the Longest Common Subsequence. Let's define some of the
basic terminologies first.
Subsequence:
A subsequence is a sequence that can be derived from another sequence by
deleting some elements without changing the order of the remaining
elements. Let's say we have a string ABC. If we erase zero or one or more
than one character from this string we get the subsequence of this string. So
the subsequences of string ABC will be {"A", "B", "C", "AB", "AC",
"BC", "ABC", " "}. Even if we remove all the characters, the empty string
will also be a subsequence. To find the subsequences, for each character
in a string we have two options: either we take the character, or we don't. So
if the length of the string is n, there are 2ⁿ subsequences of that string.
Longest Common Subsequence:
As the name suggests, of all the common subsequences between two strings,
the longest common subsequence (LCS) is the one with the maximum length.
For example: the common subsequences between "HELLOM" and
"HMLD" are "H", "HL", "HM" etc. Here "HL" (or "HM") is a longest common
subsequence, which has length 2.
Brute-Force Method:
We can generate all the subsequences of two strings using backtracking.
Then we can compare them to find out the common subsequences. After
we'll need to find out the one with the maximum length. We have already
seen that there are 2ⁿ subsequences of a string of length n. It would take
years to solve the problem if our n crosses 20-25.
Dynamic Programming Method:
Let's approach our method with an example. Assume that, we have two
strings abcdaf and acbcf. Let's denote these with s1 and s2. So the longest
common subsequence of these two strings will be "abcf", which has length
4. Again I remind you, subsequences need not be continuous in the string. To
construct "abcf", we ignored "da" in s1 and "c" in s2. How do we find this
out using Dynamic Programming?
We'll start with a table (a 2D array) having all the characters of s1 in a row
and all the characters of s2 in a column. Here the table is 0-indexed and we put
the characters from position 1 onwards. We'll traverse the table from left to right
for each row. Our table will look like:
0 1 2 3 4 5 6
+-----+-----+-----+-----+-----+-----+-----+-----+
| ch ʳ | | a | b | c | d | a | f |
+-----+-----+-----+-----+-----+-----+-----+-----+

0 | | | | | | | | |
+-----+-----+-----+-----+-----+-----+-----+-----+
1 | a | | | | | | | |
+-----+-----+-----+-----+-----+-----+-----+-----+
2 | c | | | | | | | |
+-----+-----+-----+-----+-----+-----+-----+-----+
3 | b | | | | | | | |
+-----+-----+-----+-----+-----+-----+-----+-----+
4 | c | | | | | | | |
+-----+-----+-----+-----+-----+-----+-----+-----+

Here each row and column represent the length of the longest common
subsequence between two strings if we take the characters of that row and
column and add to the prefix before it. For example: Table[2][3] represents
the length of the longest common subsequence between "ac" and "abc".
The 0-th column represents the empty subsequence of s1. Similarly the 0-th
row represents the empty subsequence of s2. If we take an empty
subsequence of a string and try to match it with another string, no matter how
long the length of the second substring is, the common subsequence will
have 0 length. So we can fill-up the 0th rows and 0-th columns with 0's. We
get:
0 1 2 3 4 5 6
+-----+-----+-----+-----+-----+-----+-----+-----+
| ch ʳ | | a | b | c | d | a | f |
+-----+-----+-----+-----+-----+-----+-----+-----+

0 | | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+-----+-----+-----+-----+-----+-----+-----+-----+
1 | a | 0 | | | | | | |
+-----+-----+-----+-----+-----+-----+-----+-----+
2 | c | 0 | | | | | | |
+-----+-----+-----+-----+-----+-----+-----+-----+
3 | b | 0 | | | | | | |
+-----+-----+-----+-----+-----+-----+-----+-----+
4 | c | 0 | | | | | | |
+-----+-----+-----+-----+-----+-----+-----+-----+
5 | f | 0 | | | | | | |
+-----+-----+-----+-----+-----+-----+-----+-----+

Let's begin. When we're filling Table[1][1], we're asking ourselves, if we had
a string a and another string a and nothing else, what will be the longest
common subsequence here? The length of the LCS here will be 1. Now let's
look at Table[1][2]. We have string ab and string a. The length of the LCS
will be 1. As you can see, the rest of the values will be also 1 for the first row
as it considers only string a with abcd, abcda, abcdaf. So our table will look
like:
0 1 2 3 4 5 6
+-----+-----+-----+-----+-----+-----+-----+-----+
| ch ʳ | | a | b | c | d | a | f |
+-----+-----+-----+-----+-----+-----+-----+-----+

0 | | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+-----+-----+-----+-----+-----+-----+-----+-----+
1 | a | 0 | 1 | 1 | 1 | 1 | 1 | 1 |
+-----+-----+-----+-----+-----+-----+-----+-----+
2 | c | 0 | | | | | | |
+-----+-----+-----+-----+-----+-----+-----+-----+
3 | b | 0 | | | | | | |
+-----+-----+-----+-----+-----+-----+-----+-----+
4 | c | 0 | | | | | | |
+-----+-----+-----+-----+-----+-----+-----+-----+
5 | f | 0 | | | | | | |
+-----+-----+-----+-----+-----+-----+-----+-----+

Now for row 2, which includes c: for Table[2][1] we have ac on one side
and a on the other side, so the length of the LCS is 1. Where did we get this
1 from? From the top, which denotes the LCS a between the two substrings. So
what we are saying is, if s1[2] and s2[1] are not same, then the length of the
LCS will be the maximum of the length of
LCS at the top, or at the left. Taking the length of the LCS at the top denotes
that, we don't take the current character from s2. Similarly, Taking the length
of the LCS at the left denotes that, we don't take the current character from s1
to create the LCS. We get:
0 1 2 3 4 5 6
+-----+-----+-----+-----+-----+-----+-----+-----+
| ch ʳ | | a | b | c | d | a | f |
+-----+-----+-----+-----+-----+-----+-----+-----+
0 | | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+-----+-----+-----+-----+-----+-----+-----+-----+
1 | a | 0 | 1 | 1 | 1 | 1 | 1 | 1 |
+-----+-----+-----+-----+-----+-----+-----+-----+
2 | c | 0 | 1 | | | | | |
+-----+-----+-----+-----+-----+-----+-----+-----+
3 | b | 0 | | | | | | |
+-----+-----+-----+-----+-----+-----+-----+-----+
4 | c | 0 | | | | | | |
+-----+-----+-----+-----+-----+-----+-----+-----+
5 | f | 0 | | | | | | |
+-----+-----+-----+-----+-----+-----+-----+-----+

So our first formula will be:

Moving on, for Table[2][2] we have string ab and ac. Since c and b are not
same, we put the maximum of the top or left here. In this case, it's again 1.
After that, for Table[2][3] we have string abc and ac. This time current
values of both row and column are same. Now the length of the LCS will be
equal to the maximum length of LCS so far + 1. How do we get the
maximum length of LCS so far? We check the diagonal value, which
represents the best match between ab and a. From this state, for the current
values, we added one more character to s1 and s2 which happened to be the
same. So the length of LCS will of course increase. We'll put 1 + 1 = 2 in
Table[2][3]. We get,
0 1 2 3 4 5 6
+-----+-----+-----+-----+-----+-----+-----+-----+
| ch ʳ | | a | b | c | d | a | f |
+-----+-----+-----+-----+-----+-----+-----+-----+

0 | | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+-----+-----+-----+-----+-----+-----+-----+-----+
1 | a | 0 | 1 | 1 | 1 | 1 | 1 | 1 |
+-----+-----+-----+-----+-----+-----+-----+-----+
2 | c | 0 | 1 | 1 | 2 | | | |
+-----+-----+-----+-----+-----+-----+-----+-----+
3 | b | 0 | | | | | | |
+-----+-----+-----+-----+-----+-----+-----+-----+
4 | c | 0 | | | | | | |
+-----+-----+-----+-----+-----+-----+-----+-----+
5 | f | 0 | | | | | | |
+-----+-----+-----+-----+-----+-----+-----+-----+

So our second formula will be:

We have defined both the cases. Using these two formulas, we can populate
the whole table. After filling up the table, it will look like this:
0 1 2 3 4 5 6
+-----+-----+-----+-----+-----+-----+-----+-----+
| ch ʳ | | a | b | c | d | a | f |
+-----+-----+-----+-----+-----+-----+-----+-----+
0 | | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+-----+-----+-----+-----+-----+-----+-----+-----+
1 | a | 0 | 1 | 1 | 1 | 1 | 1 | 1 |
+-----+-----+-----+-----+-----+-----+-----+-----+
2 | c | 0 | 1 | 1 | 2 | 2 | 2 | 2 |
+-----+-----+-----+-----+-----+-----+-----+-----+
3 | b | 0 | 1 | 2 | 2 | 2 | 2 | 2 |
+-----+-----+-----+-----+-----+-----+-----+-----+
4 | c | 0 | 1 | 2 | 3 | 3 | 3 | 3 |
+-----+-----+-----+-----+-----+-----+-----+-----+
5 | f | 0 | 1 | 2 | 3 | 3 | 3 | 4 |
+-----+-----+-----+-----+-----+-----+-----+-----+

The length of the LCS between s1 and s2 will be Table[5][6] = 4. Here, 5


and 6 are the length of s2 and s1 respectively. Our pseudo-code will be:
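A Python sketch of this table-filling step (rows follow s2 and columns follow s1, as in the tables above; the names are illustrative):

def lcs_table(s1, s2):
    m, n = len(s1), len(s2)
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):            # characters of s2
        for j in range(1, m + 1):        # characters of s1
            if s2[i - 1] == s1[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1                  # second formula
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])    # first formula
    return table

table = lcs_table("abcdaf", "acbcf")
print(table[5][6])   # 4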

The time complexity for this algorithm is O(mn), where m and n denote the
lengths of the two strings.
How do we find out the longest common subsequence? We'll start from the
bottom-right corner. We will check from where the value is coming. If the
value is coming from the diagonal, that is if Table[i-1][j-1] is equal to
Table[i][j] - 1, we push either s2[i] or s1[j] (both are the same) and move
diagonally. If the value is coming from top, that means, if Table[i-1][j] is
equal to Table[i][j], we move to the top. If the value is coming from left, that
means, if Table[i][j-1] is equal to Table[i][j], we move to the left. When we
reach the leftmost or topmost column, our search ends. Then we pop the
values from the stack and print them. The pseudo-code:
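A Python sketch of this backtracking step, using the table built by the sketch above (the extra character-equality check is mine, kept to make the diagonal move safe; names are illustrative):

def print_lcs(table, s1, s2):
    i, j = len(s2), len(s1)
    stack = []
    while i > 0 and j > 0:
        if s1[j - 1] == s2[i - 1] and table[i - 1][j - 1] == table[i][j] - 1:
            stack.append(s1[j - 1])   # value came from the diagonal
            i -= 1
            j -= 1
        elif table[i - 1][j] == table[i][j]:
            i -= 1                    # value came from the top
        else:
            j -= 1                    # value came from the left
    print(''.join(reversed(stack)))

print_lcs(lcs_table("abcdaf", "acbcf"), "abcdaf", "acbcf")   # abcf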

Point to be noted: if both Table[i-1][j] and Table[i][j-1] are equal to Table[i][j]
and Table[i-1][j-1] is not equal to Table[i][j] - 1, there can be two LCSs
at that point. This pseudo-code doesn't consider this situation. You'll
have to solve this recursively to find multiple LCSs.
The time complexity for this algorithm is: O(max(m, n)).
Chapter 48: Longest Increasing
Subsequence
Section 48.1: Longest Increasing
Subsequence Basic Information
The Longest Increasing Subsequence problem is to find a subsequence of the
given input sequence whose elements are sorted from lowest to highest. The
subsequence need not be contiguous or unique.
Application of Longest Increasing Subsequence:
Algorithms like Longest Increasing Subsequence and Longest Common
Subsequence are used in version control systems like Git.
Simple form of Algorithm:
1. Find unique lines which are common to both documents.
2. Take all such lines from the first document and order them according to
their appearance in the second document.
3. Compute the LIS of the resulting sequence (by doing a Patience Sort),
getting the longest matching sequence of lines, a correspondence
between the lines of two documents.
4. Recurse the algorithm on each range of lines between already matched
ones.
Now let us consider a simpler example of the LCS problem. Here, input is
only one sequence of distinct integers a1,a2,...,an., and we want to find the
longest increasing subsequence in it. For example, if input is 7,3,8,4,2,6 then
the longest increasing subsequence is 3,4,6.
The easiest approach is to sort input elements in increasing order, and apply
the LCS algorithm to the original and sorted sequences. However, if you look
at the resulting array you would notice that many values are the same, and
the array looks very repetitive. This suggest that the LIS (longest increasing
subsequence) problem can be done with dynamic programming algorithm
using only one-dimensional array.
Pseudo Code:
1. Describe an array of values we want to compute.
For 1 ≤ i ≤ n, let A(i) be the length of a longest increasing subsequence of the
input that ends at position i. Note that the length we are ultimately interested in
is max{A(i) | 1 ≤ i ≤ n}.
2. Give a recurrence.

For 1 ≤ i ≤ n, A(i) = 1 + max{A(j) | 1 ≤ j < i and input(j) < input(i)}.


3. Compute the values of A.
4. Find the optimal solution.
The following program uses A to compute an optimal solution. The first part
computes a value m such that A(m) is the length of an optimal increasing
subsequence of input. The second part computes an optimal increasing
subsequence, but for convenience we print it out in reverse order. This
program runs in time O(n), so the entire algorithm runs in time O(n^2).
Part 1:

Part 2:

Recursive Solution:
Approach 1:
Approach 2:

Approach 3:

Time Complexity in Approach 3: O(n^2)


Iterative Algorithm:
Computes the values iteratively in bottom up fashion.
Auxiliary Space: O(n)
Lets take {0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15} as input. So,
Longest Increasing Subsequence for the given input is {0, 2, 6, 9, 11, 15}.
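A Python sketch of the O(n²) bottom-up approach described above (names are illustrative):

def longest_increasing_subsequence(seq):
    n = len(seq)
    A = [1] * n                    # A[i] = length of an LIS ending at position i
    prev = [-1] * n                # predecessor index, used to rebuild the answer
    for i in range(n):
        for j in range(i):
            if seq[j] < seq[i] and A[j] + 1 > A[i]:
                A[i] = A[j] + 1
                prev[i] = j
    best = max(range(n), key=lambda i: A[i])
    lis = []
    while best != -1:              # walk the predecessor chain backwards
        lis.append(seq[best])
        best = prev[best]
    return lis[::-1]

print(longest_increasing_subsequence(
    [0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15]))
# prints one valid answer of length 6, e.g. [0, 4, 6, 9, 13, 15];
# {0, 2, 6, 9, 11, 15} from the text is another LIS of the same length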
Chapter 49: Check two strings are
anagrams
Two strings with the same set of characters are called anagrams. I have used
JavaScript here.
We will create a hash of str1 and increase the count by +1 for each character. We
will loop over the 2nd string, check that every character is present in the hash,
and decrease the value of the hash key. Finally, if all values in the hash are zero,
the strings are anagrams.
Section 49.1: Sample input and output
Ex1:

These strings are anagrams.


// Create Hash from str1 and increase one count.

You can see that hash key 'o' contains the value 2 because o appears twice in the string.
Now loop over str2 and check that each character is present in hashMap; if
yes, decrease the value of the hashMap key, else return false (which indicates
it's not an anagram).

Now, loop over the hashMap object and check that all the values are zero.
In our case all values are zero, so it's an anagram.
Section 49.2: Generic Code for Anagrams
Time complexity: 3n, i.e. O(n).
Chapter 50: Pascal's Triangle
Section 50.1: Pascal triangle in C

Output
Chapter 51: Algo:- Print a m*n
matrix in square wise
Check sample input and output below.
Section 51.1: Sample Example
Section 51.2: Write the generic code
Chapter 52: Matrix Exponentiation
Section 52.1: Matrix Exponentiation to
Solve Example Problems
Find f(n): the nth Fibonacci number. The problem is quite easy when n is
relatively small. We can use simple recursion, f(n) = f(n-1) + f(n-2), or we can use
a dynamic programming approach to avoid calculating the same function
over and over again. But what will you do if the problem says: given 0 < n <
10⁹, find f(n) mod 999983? Dynamic programming will fail, so how do we
tackle this problem?
First let's see how matrix exponentiation can help to represent recursive
relation.
Prerequisites:
Given two matrices, know how to find their product. Further, given
the product matrix of two matrices, and one of them, know how to
find the other matrix.
Given a matrix of size d X d, know how to find its nth power in
O(d³ log(n)).
Patterns:
At first we need a recursive relation and we want to find a matrix M which
can lead us to the desired state from a set of already known states. Let's
assume that, we know the k states of a given recurrence relation and we want
to find the (k+1)th state. Let M be a k X k matrix, and we build a matrix A:
[k X 1] from the known states of the recurrence relation, now we want to get
a matrix B:[k X 1] which will represent the set of next states, i. e. M X A =
B as shown below:

So, if we can design M accordingly, our job will be done! The matrix will
then be used to represent the recurrence relation.
Type 1:
Let's start with the simplest one, f(n) = f(n-1) + f(n-2).

We get, f(n+1) = f(n) + f(n-1).


Let's assume, we know f(n) and f(n-1); We want to find out f(n+1).
From the situation stated above, matrix A and matrix B can be formed as
shown below:

[Note: Matrix A will be always designed in such a way that, every state on
which f(n+1) depends, will be present] Now, we need to design a 2X2 matrix
M such that, it satisfies M X A = B as stated above.

The first element of B is f(n+1), which is actually f(n) + f(n-1). To get this from
matrix A, we need 1 X f(n) and 1 X f(n-1). So the first row of M will be [1 1].
[Note: ----- means we are not concerned about this value.]
Similarly, 2nd item of B is f(n) which can be got by simply taking 1 X f(n)
from A, so the 2nd row of M is [1 0].

Then we get our desired 2 X 2 matrix M.

These matrices are simply derived using matrix multiplication.
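A small Python sketch of Type 1 in action, computing f(n) mod 999983 by raising M to a power (function names are illustrative; f(1) = f(2) = 1 is assumed):

def mat_mult(X, Y, mod):
    rows, inner, cols = len(X), len(Y), len(Y[0])
    return [[sum(X[i][k] * Y[k][j] for k in range(inner)) % mod
             for j in range(cols)] for i in range(rows)]

def mat_pow(M, p, mod):
    R = [[int(i == j) for j in range(len(M))] for i in range(len(M))]  # identity matrix
    while p:
        if p & 1:
            R = mat_mult(R, M, mod)
        M = mat_mult(M, M, mod)
        p >>= 1
    return R

def fib(n, mod=999983):
    if n <= 2:
        return 1 % mod
    M = [[1, 1], [1, 0]]
    P = mat_pow(M, n - 2, mod)
    # [f(n), f(n-1)]^T = M^(n-2) X [f(2), f(1)]^T
    return (P[0][0] + P[0][1]) % mod

print(fib(10))   # 55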


Type 2:
Let's make it a little more complex: find f(n) = a X f(n-1) + b X f(n-2), where a and b are
constants.

This tells us, f(n+1) = a X f(n) + b X f(n-1).
By this far, this should be clear that the dimension of the matrices will be
equal to the number of dependencies, i.e. in this particular example, again 2.
So for A and B, we can build two matrices of size 2 X 1:

To get the first item in B, i.e. f(n+1) = a X f(n) + b X f(n-1),
we need [a, b] in the first row of the objective matrix M. And for the 2nd
item in B, i.e. f(n), we already have that in matrix A, so we just take that,
which leads the 2nd row of the matrix M to [1 0]. This time we get:

Pretty simple, eh?


Type 3:

If you've survived through to this stage, you've grown much older, now let's
face a slightly more complex relation: find f(n) = a X f(n-1) + c X f(n-3).
Oops! A few minutes ago, all we saw were contiguous states, but here, the
state f(n-2) is missing. Now?

Actually this is not a problem anymore; we can convert the relation as
follows: f(n) = a X f(n-1) + 0 X f(n-2) + c X f(n-3), deducing
f(n+1) = a X f(n) + 0 X f(n-1) + c X f(n-2). Now we see that this is actually
a form described in Type 2. So here the objective matrix M will be 3 X 3,
and the elements are:
| a 0 c |   | f(n)   |   | f(n+1) |
| 1 0 0 | X | f(n-1) | = | f(n)   |
| 0 1 0 |   | f(n-2) |   | f(n-1) |

These are calculated in the same way as type 2, if you find it difficult, try it
on pen and paper.
Type 4:
Life is getting complex as hell, and Mr. Problem now asks you to find
f(n) = f(n-1) + f(n-2) + c, where c is any constant.
Now this is a new one. In everything we have seen so far, after the multiplication,
each state in A transforms to its next state in B.

So, normally we can't get it in the previous fashion, but how about we add
c as a state:

Now, its not much hard to design M. Here's how its done, but don't forget to
verify:

Type 5:
Let's put it all together: find f(n) = a X f(n-1) + c X f(n-3) + d X f(n-4) + e.
We'll leave it as an exercise for you: first try to find out the states and the
matrix M, and check if it matches with your solution. Also find matrix A and B.

Type 6:
Sometimes the recurrence is given like this:

In short:

Here, we can split the functions in the basis of odd even and keep 2 different
matrix for both of them and calculate them separately.
Type 7:
Feeling a little too confident? Good for you. Sometimes we may need to
maintain more than one recurrence, where they are related to each other. For example,
let a recurrence relation be:

Here, recurrence g(n) is dependent upon f(n) and this can be calculated in the
same matrix but of increased dimensions. From these let's at first design the
matrices A and B.

So, these are the basic categories of recurrence relations which can be solved
by this simple technique.
Chapter 53: polynomial-time
bounded algorithm for Minimum
Vertex Cover
Variable Meaning
G Input connected un-directed graph
X Set of vertices
C Final set of vertices
This is a polynomial-time algorithm for getting the minimum vertex cover of a
connected undirected graph. The time complexity of this algorithm is
O(n²).
Section 53.1: Algorithm Pseudo
Code
Algorithm PMinVertexCover (graph G)
Chapter 54: Dynamic Time
Warping
Section 54.1: Introduction To Dynamic
Time Warping
Dynamic Time Warping (DTW) is an algorithm for measuring similarity
between two temporal sequences which may vary in speed. For instance,
similarities in walking could be detected using DTW, even if one person was
walking faster than the other, or if there were accelerations and decelerations
during the course of an observation. It can be used to match a sample voice
command with other commands, even if the person talks faster or slower than
the prerecorded sample voice. DTW can be applied to temporal sequences of
video, audio and graphics data; indeed, any data which can be turned into a
linear sequence can be analyzed with DTW.
In general, DTW is a method that calculates an optimal match between two
given sequences with certain restrictions. But let's stick to the simpler points
here. Let's say, we have two voice sequences Sample and Test, and we want
to check if these two sequences match or not. Here voice sequence refers to
the converted digital signal of your voice. It might be the amplitude or
frequency of your voice that denotes the words you say. Let's assume:

We want to find out the optimal match between these two sequences.
At first, we define the distance between two points, d(x, y) where x and y
represent the two points. Let,

Let's create a 2D matrix Table using these two sequences. We'll calculate the
distances between each point of Sample with every points of Test and find
the optimal match between them.
+------+------+------+------+------+------+------+------+
| | 0 | 1 | 1 | 2 | 2 | 3 | 5 |
+------+------+------+------+------+------+------+------+
| 0 | | | | | | | |
+------+------+------+------+------+------+------+------+
| 1 | | | | | | | |
+------+------+------+------+------+------+------+------+
| 2 | | | | | | | |
+------+------+------+------+------+------+------+------+
| 3 | | | | | | | |
+------+------+------+------+------+------+------+------+
| 5 | | | | | | | |
+------+------+------+------+------+------+------+------+
| 5 | | | | | | | |
+------+------+------+------+------+------+------+------+
| 5 | | | | | | | |
+------+------+------+------+------+------+------+------+
| 6 | | | | | | | |
+------+------+------+------+------+------+------+------+

Here, Table[i][j] represents the optimal distance between two sequences if


we consider the sequence up to Sample[i] and Test[j], considering all the
optimal distances we observed before.
For the first row, if we take no values from Sample, the distance between this
and Test will be infinity. So we put infinity on the first row. Same goes for
the first column. If we take no values from Test, the distance between this
one and Sample will also be infinity. And the distance between 0 and 0 will
simply be 0. We get,
+------+------+------+------+------+------+------+------+
| | 0 | 1 | 1 | 2 | 2 | 3 | 5 |
+------+------+------+------+------+------+------+------+
| 0 | 0 | inf | inf | inf | inf | inf | inf |
+------+------+------+------+------+------+------+------+
| 1 | inf | | | | | | |
+------+------+------+------+------+------+------+------+
| 2 | inf | | | | | | |
+------+------+------+------+------+------+------+------+
| 3 | inf | | | | | | |
+------+------+------+------+------+------+------+------+
| 5 | inf | | | | | | |
+------+------+------+------+------+------+------+------+
| 5 | inf | | | | | | |
+------+------+------+------+------+------+------+------+
| 5 | inf | | | | | | |
+------+------+------+------+------+------+------+------+
| 6 | inf | | | | | | |
+------+------+------+------+------+------+------+------+

Now for each step, we'll consider the distance between each points in concern
and add it with the minimum distance we found so far. This will give us the
optimal distance of two sequences up to that position. Our formula will be,

Table[i][j] := d(i, j) + min(Table[i-1][j], Table[i-1][j-1], Table[i][j-1])

For the first one, d(1, 1) = 0, Table[0][0] represents the minimum. So the
value of Table[1][1] will be 0 + 0 = 0. For the second one, d(1, 2) = 0.
Table[1][1] represents the minimum. The value will be: Table[1][2] = 0 + 0
= 0. If we continue this way, after finishing, the table will look like:
+------+------+------+------+------+------+------+------+
| | 0 | 1 | 1 | 2 | 2 | 3 | 5 |
+------+------+------+------+------+------+------+------+
| 0 | 0 | inf | inf | inf | inf | inf | inf |
+------+------+------+------+------+------+------+------+
| 1 | inf | 0 | 0 | 1 | 2 | 4 | 8 |
+------+------+------+------+------+------+------+------+
| 2 | inf | 1 | 1 | 0 | 0 | 1 | 4 |
+------+------+------+------+------+------+------+------+
| 3 | inf | 3 | 3 | 1 | 1 | 0 | 2 |
+------+------+------+------+------+------+------+------+
| 5 | inf | 7 | 7 | 4 | 4 | 2 | 0 |
+------+------+------+------+------+------+------+------+
| 5 | inf | 11 | 11 | 7 | 7 | 4 | 0 |
+------+------+------+------+------+------+------+------+
| 5 | inf | 15 | 15 | 10 | 10 | 6 | 0 |
+------+------+------+------+------+------+------+------+
| 6 | inf | 20 | 20 | 14 | 14 | 9 | 1 |
+------+------+------+------+------+------+------+------+

The value at Table[7][6] represents the optimal distance between these
two given sequences. Here 1 means the optimal (minimum total) distance between
Sample and Test is 1.

Now if we backtrack from the last point, all the way back towards the
starting (0, 0) point, we get a long line that moves horizontally, vertically
and diagonally. Our backtracking procedure will be: if Table[i-1][j-1] <=
Table[i-1][j] and Table[i-1][j-1] <= Table[i][j-1]

We'll continue this till we reach (0, 0). Each move has its own meaning:

Our pseudo-code will be:
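A Python sketch of the table-filling described above, assuming d(x, y) = |x - y|, which matches the numbers in the table (the name dtw_distance is illustrative):

def dtw_distance(sample, test):
    n, m = len(sample), len(test)
    INF = float('inf')
    table = [[INF] * (m + 1) for _ in range(n + 1)]
    table[0][0] = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(sample[i - 1] - test[j - 1])
            table[i][j] = d + min(table[i - 1][j],       # move vertically
                                  table[i - 1][j - 1],   # move diagonally
                                  table[i][j - 1])       # move horizontally
    return table[n][m]

print(dtw_distance([1, 2, 3, 5, 5, 5, 6], [1, 1, 2, 2, 3, 5]))   # 1, as in Table[7][6] above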

We can also add a locality constraint. That is, we require that if Sample[i] is
matched with Test[j], then |i - j| is no larger than w, a window parameter.
Complexity:
The complexity of computing DTW is O(m * n) where m and n represent the
length of each sequence. Faster techniques for computing DTW include
PrunedDTW, SparseDTW and FastDTW.
Applications:
Spoken word recognition
Correlation Power Analysis
Chapter 55: Fast Fourier
Transform
The Real and Complex form of DFT (Discrete Fourier Transforms) can be
used to perform frequency analysis or synthesis for any discrete and
periodic signals. The FFT (Fast Fourier Transform) is an implementation
of the DFT which may be performed quickly on modern CPUs.
Section 55.1: Radix 2 FFT
The simplest and perhaps best-known method for computing the FFT is the
Radix-2 Decimation in Time algorithm. The Radix-2 FFT works by
decomposing an N point time domain signal into N time domain signals each
composed of a single point.
Signal decomposition, or 'decimation in time', is achieved by bit reversing
the indices for the array of time domain data. Thus, for a sixteen-point signal,
sample 1 (Binary 0001) is swapped with sample 8 (1000), sample 2 (0010) is
swapped with 4 (0100) and so on. Sample swapping using the bit reverse
technique can be achieved simply in software, but limits the use of the Radix
2 FFT to signals of length N = 2^M.
The value of a 1-point signal in the time domain is equal to its value in the
frequency domain, thus this array of decomposed single time-domain points
requires no transformation to become an array of frequency domain points.
The N single points, however, need to be reconstructed into one N-point
frequency spectrum. Optimal reconstruction of the complete frequency
spectrum is performed using butterfly calculations. Each reconstruction stage
in the Radix-2 FFT performs a number of two point butterflies, using a
similar set of exponential weighting functions, Wn^R.
The FFT removes redundant calculations in the Discrete Fourier Transform
by exploiting the periodicity of Wn^R. Spectral reconstruction is completed
in log2(N) stages of butterfly calculations, giving X[K], the real and
imaginary frequency domain data in rectangular form. Converting to
magnitude and phase (polar coordinates) requires finding the absolute value,
√(Re² + Im²), and the argument, tan⁻¹(Im/Re).
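
As a small illustration (the helper name toPolar is an assumption), one frequency
bin can be converted from rectangular to polar form as follows; atan2 is used
instead of tan⁻¹(Im/Re) so that the phase lands in the correct quadrant:

#include <math.h>

// Convert one rectangular-form bin (Re, Im) to magnitude and phase (radians).
void toPolar(double re, double im, double *magnitude, double *phase)
{
    *magnitude = sqrt(re * re + im * im);   // |X[k]| = sqrt(Re^2 + Im^2)
    *phase     = atan2(im, re);             // arg(X[k]), quadrant-correct
}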

The complete butterfly flow diagram for an eight-point Radix-2 FFT is shown
below. Note that the input samples have previously been reordered according to
the decimation in time procedure outlined above.
The FFT typically operates on complex inputs and produces a complex output.
For real signals, the imaginary part may be set to zero and the real part set to
the input signal, x[n]; however, many optimisations are possible involving the
transformation of real-only data. Values of Wn^R used throughout the
reconstruction can be determined using the exponential weighting equation
Wn^R = e^(-j2πR/N) = cos(2πR/N) - j·sin(2πR/N).
The value of R (the exponential weighting power) is determined by the current
stage in the spectral reconstruction and by the current calculation within a
particular butterfly.
Code Example (C/C++)
A C/C++ code sample for computing the Radix 2 FFT can be found below. This is a
simple implementation which works for any size N where N is a power of 2. It is
approximately three times slower than the fastest FFTW implementation, but it is
still a very good basis for future optimisation or for learning how the
algorithm works.
#include <math.h>

#define TWOPI 6.283185307179586476925286766559  // 2*PI, used for the twiddle factor calculation

// Complex variable in rectangular form (referred to below as 'struct complex')
struct complex
{
    double Re;   // real part
    double Im;   // imaginary part
};

// Helper used by rad2FFT: returns true if N is a power of two and sets *M so that 2^M = N.
bool isPwrTwo(int N, int *M)
{
    *M = (int)ceil(log2((double)N));        // M is the number of FFT stages
    int NN = (int)pow(2.0, (double)*M);
    if ((NN != N) || (NN == 0))             // Check N is a power of 2.
        return false;

    return true;
}

void rad2FFT(int N, complex *x, complex *DFT)
{
    int M = 0;

    // Check if power of two. If not, exit
    if (!isPwrTwo(N, &M))
        throw "Rad2FFT(): N must be a power of 2 for Radix FFT";

    // Integer Variables
    int BSep;               // BSep is memory spacing between butterflies
    int BWidth;             // BWidth is memory spacing of opposite ends of the butterfly
    int P;                  // P is number of similar Wn's to be used in that stage
    int j;                  // j is used in a loop to perform all calculations in each stage
    int stage = 1;          // stage is the stage number of the FFT. There are M stages in total (1 to M).
    int HiIndex;            // HiIndex is the index of the DFT array for the top value of each butterfly calc
    int iaddr;              // bitmask for bit reversal
    int ii;                 // Integer bitfield for bit reversal (Decimation in Time)
    int MM1 = M - 1;

    unsigned int i;
    int l;
    unsigned int nMax = (unsigned int)N;

    // Double Precision Variables
    double TwoPi_N = TWOPI / (double)N;     // constant to save computational time. = 2*PI / N
    double TwoPi_NP;

    // complex Variables (See 'struct complex')
    complex WN;             // Wn is the exponential weighting function in the form a + jb
    complex TEMP;           // TEMP is used to save computation in the butterfly calc
    complex *pDFT = DFT;    // Pointer to first element in DFT array
    complex *pLo;           // Pointer for lo value of butterfly calcs
    complex *pHi;           // Pointer for hi value of butterfly calcs
    complex *pX;            // Pointer to x[n]

    // Decimation In Time - x[n] sample sorting
    for (i = 0; i < nMax; i++, DFT++)
    {
        pX = x + i;         // Calculate current x[n] from base address *x and index i.

        ii = 0;             // Reset new address for DFT[n]
        iaddr = i;          // Copy i for manipulations
        for (l = 0; l < M; l++)             // Bit reverse i and store in ii...
        {
            if (iaddr & 0x01)               // Determine least significant bit
                ii += (1 << (MM1 - l));     // Increment ii by 2^(M-1-l) if lsb was 1
            iaddr >>= 1;                    // right shift iaddr to test next bit. Use logical operations for speed increase
            if (!iaddr)
                break;
        }

        DFT = pDFT + ii;    // Calculate current DFT[n] from base address *pDFT and bit reversed index ii
        DFT->Re = pX->Re;   // Update the complex array with address sorted time domain signal x[n]
        DFT->Im = pX->Im;   // NB: Imaginary is always zero for real-only input
    }

    // FFT Computation by butterfly calculation
    for (stage = 1; stage <= M; stage++)    // Loop for M stages, where 2^M = N
    {
        BSep = (int)(pow(2, stage));        // Separation between butterflies = 2^stage
        P = N / BSep;                       // Similar Wn's in this stage = N/BSep
        BWidth = BSep / 2;                  // Butterfly width (spacing between opposite points) = Separation / 2.

        TwoPi_NP = TwoPi_N * P;

        for (j = 0; j < BWidth; j++)        // Loop for j calculations per butterfly
        {
            if (j != 0)                     // Save on calculation if R = 0, as WN^0 = (1 + j0)
            {
                WN.Re = cos(TwoPi_NP * j);  // Calculate Wn (Real and Imaginary)
                WN.Im = -sin(TwoPi_NP * j);
            }

            for (HiIndex = j; HiIndex < N; HiIndex += BSep) // Loop for HiIndex Step BSep butterflies per stage
            {
                pHi = pDFT + HiIndex;       // Point to higher value
                pLo = pHi + BWidth;         // Point to lower value (Note VC++ adjusts for spacing between elements)

                if (j != 0)                 // If exponential power is not zero...
                {
                    // Perform complex multiplication of Lo value with Wn
                    TEMP.Re = (pLo->Re * WN.Re) - (pLo->Im * WN.Im);
                    TEMP.Im = (pLo->Re * WN.Im) + (pLo->Im * WN.Re);
                }
                else
                {
                    TEMP.Re = pLo->Re;
                    TEMP.Im = pLo->Im;
                }

                // Find new Lo value (complex subtraction)
                pLo->Re = pHi->Re - TEMP.Re;
                pLo->Im = pHi->Im - TEMP.Im;

                // Find new Hi value (complex addition)
                pHi->Re = pHi->Re + TEMP.Re;
                pHi->Im = pHi->Im + TEMP.Im;
            }
        }
    }
}
Section 55.2: Radix 2 Inverse FFT
Due to the strong duality of the Fourier Transform, adjusting the output of a
forward transform can produce the inverse FFT. Data in the frequency
domain can be converted to the time domain by the following method:
1. Find the complex conjugate of the frequency domain data by inverting
the imaginary component for all instances of K.
2. Perform the forward FFT on the conjugated frequency domain data.
3. Divide each output of the result of this FFT by N to give the true time
domain value.
4. Find the complex conjugate of the output by inverting the imaginary
component of the time domain data for all instances of n.
Note: both frequency and time domain data are complex variables. Typically
the imaginary component of the time domain signal following an inverse FFT
is either zero, or ignored as rounding error. Increasing the precision of
variables from 32-bit float to 64-bit double, or 128-bit long double
significantly reduces rounding errors produced by several consecutive FFT
operations.
Code Example (C/C++)
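The following is a minimal sketch of the four-step method above, reusing rad2FFT
and struct complex from Section 55.1; the function name rad2InverseFFT is an
illustrative assumption, not part of the original listing.

// Illustrative sketch: inverse FFT computed with the forward FFT routine.
// DFT holds the frequency domain data (modified in place by the conjugation),
// x receives the reconstructed time domain signal.
void rad2InverseFFT(int N, complex *DFT, complex *x)
{
    // Step 1: conjugate the frequency domain data.
    for (int k = 0; k < N; k++)
        DFT[k].Im = -DFT[k].Im;

    // Step 2: perform the forward FFT on the conjugated data.
    rad2FFT(N, DFT, x);

    for (int n = 0; n < N; n++)
    {
        // Step 3: divide each output by N to give the true time domain value.
        x[n].Re /= (double)N;
        x[n].Im /= (double)N;

        // Step 4: conjugate the output to recover the time domain signal.
        x[n].Im = -x[n].Im;
    }
}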
Appendix A: Pseudocode
Section A.1: Variable affectations
You could describe variable affectation in different ways.
Typed
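For instance (the notation below is illustrative, not the book's original
listing), a typed affectation could be written as:

int a ← 1

An untyped affectation would simply omit the type, e.g. a ← 1.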
Section A.2: Functions
As long as the function name, return statement and parameters are clear, any of
the usual ways of writing a function is acceptable, so you may use whichever you
prefer. Just try not to be ambiguous with a variable affectation.
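For example (illustrative spellings only), the following declarations all
describe the same function clearly enough:

function add(a, b)
    return a + b

func add(a, b)
    return a + b

def add(a, b)
    return a + b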
THE END
