Randomized Algorithms


MODULE V

APPROXIMATION ALGORITHMS
A large number of optimization problems that must be solved in practice are
NP-hard. For such problems, assuming P ≠ NP, it is not possible to design algorithms
that find an exactly optimal solution to every instance of the problem in time
polynomial in the size of the input. If this stringent requirement is relaxed, the
problems may still be solved reasonably well.

One possible relaxation is to accept a near-optimal solution rather than an
optimal one, while keeping the running time of the algorithm polynomial. This gives the
notion of an approximate solution of an optimization problem. Some problems (e.g.,
Knapsack, Scheduling, Bin Packing) seem to be easy to approximate. On the other
hand, there are problems (e.g., Graph Coloring, Travelling Salesman, Clique) that are so
hard that even finding a very poor approximation can be shown to be NP-hard. Finally,
there is a class of problems (e.g., Vertex Cover, Euclidean Travelling Salesman, Steiner
Trees) which seem to be of intermediate complexity.

Vertex Cover
[Instance] Graph G = (V, E).
[Feasible Solution] A subset C ⊆ V such that at least one endpoint of every edge of G
belongs to C.
[Value] The value of the solution is the size of the cover |C|, and the goal is to minimize
it.
Greedy Algorithm I
1. Take any edge e ∈ E and choose one of its endpoints (say, v).
2. Add v to C.
3. Remove all edges incident on v from E.
4. Repeat the process till all edges are removed from E.

Greedy Algorithm II
1. Take a vertex v ∈ V of maximum degree in the current graph.
2. Add v to C.
3. Remove all edges incident on v from E.
4. Repeat the process till all edges are removed from E.

Greedy Algorithm III
1. Take any edge (u, v) ∈ E.
2. Add both vertices u and v to C.
3. Remove all edges incident on u or v from E.
4. Repeat the process till all edges are removed from E.
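
Greedy Algorithm III can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name and the edge-list representation are assumptions made here. The edges picked in step 1 share no endpoints, so any cover needs at least one vertex per picked edge; this standard argument (not stated in these notes) shows the returned cover is at most twice the optimum.

```python
def vertex_cover_greedy3(edges):
    """Greedy Algorithm III: repeatedly take both endpoints of a remaining edge."""
    remaining = list(edges)
    cover = set()
    while remaining:
        u, v = remaining[0]            # step 1: any edge still in E
        cover.update((u, v))           # step 2: add both endpoints to C
        # step 3: remove every edge incident on u or v
        remaining = [(a, b) for (a, b) in remaining
                     if a not in (u, v) and b not in (u, v)]
    return cover


# Path 1-2-3-4-5: the optimal cover is {2, 4}; the greedy cover has size 4.
path_edges = [(1, 2), (2, 3), (3, 4), (4, 5)]
print(vertex_cover_greedy3(path_edges))
```

On this path the greedy cover is exactly twice the optimal size, so the factor of two is the best guarantee one can claim for this strategy.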

Planar Graph Coloring
The problem of coloring the vertices of a graph is to assign colors to the vertices such
that no two adjacent vertices have the same color.
The goal is to minimize the number of colors used.
The problem of deciding whether a planar graph is 3-colorable is NP-complete.
The problem is to design an absolute approximation algorithm for coloring planar
graphs such that the difference between an optimal solution and the solution obtained
by the algorithm is at most 2.

STRING MATCHING ALGORITHMS
String processing problem
Input: Two strings T and P.
Problem: Find if P is a substring of T.

Example (1):
Input: T = gtgatcagatcact, P = tca
Output: Yes. P occurs in gtgatcagatcact at shifts 4 and 9.

Example (2):
Input: T = 189342670893, P = 1673
Output: No.

Naive Algorithm (T, P)
  suppose n = length(T), m = length(P);
  for shift s = 0 through n-m do
    if (P[1..m] == T[s+1 .. s+m]) then   // actually a for-loop runs here
      print shift s;
End algorithm.

Complexity: O((n-m+1)m)
Note: Too many repetitions of character matching.
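
The naive algorithm translates almost directly into Python. The sketch below uses 0-based indexing (so shift s compares P with T[s..s+m-1]); the function name is an assumption made for illustration.

```python
def naive_match(text, pattern):
    """Return every shift s at which pattern occurs in text."""
    n, m = len(text), len(pattern)
    shifts = []
    for s in range(n - m + 1):
        # The slice comparison hides the inner O(m) character loop.
        if text[s:s + m] == pattern:
            shifts.append(s)
    return shifts


print(naive_match("gtgatcagatcact", "tca"))   # shifts for Example (1)
print(naive_match("189342670893", "1673"))    # no occurrence, Example (2)
```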

Rabin-Karp scheme
Consider each character as a digit in a radix system, e.g., the English alphabet as
radix-26. Pick up each m-length "number" starting from shift = 0 through (n-m).
So, T = gtgatcagatcact, in radix-4 (a/0, t/1, g/2, c/3) becomes
gtg = '212' in base-4 = 32+4+2 = 38 in decimal,
tga = '120' in base-4 = 16+8+0 = 24 in decimal,
...
Then compare each of these numbers with the number for P.
Advantage: Computing the number for the next substring can reuse the previous result.
Consider decimals: 4359 and 3592
3592 = (4359 - 4*1000)*10 + 2
General formula: $t_{s+1} = d\,(t_s - d^{m-1}\,T[s+1]) + T[s+m+1]$, in radix-d, where $t_s$ is the
number corresponding to the substring T[s+1 .. s+m]. Note, m is the size of P.

The first-pass scheme: (1) preprocess to get the (n-m+1) numbers on T and 1 for P, (2) compare
the number for P with those computed on T.
Problem: each number may be too large to compare in constant time.
Solution: Hash, using modular arithmetic with respect to a prime q.
New recurrence formula:
$t_{s+1} = (d\,(t_s - h\,T[s+1]) + T[s+m+1]) \bmod q$, where $h = d^{m-1} \bmod q$.
q is typically chosen to be a large prime, which helps spread the residues evenly and keeps
spurious hits rare.
Now the comparison is no longer exact: equal residues may be a spurious hit (Figure 5.1 (a) & (b)).
So, we need a naive string comparison whenever the comparison succeeds in modular arithmetic.

Figure 5.1 Rabin-Karp Algorithm

Rabin-Karp Algorithm:
Input: Text string T, pattern string P to search for, radix d to be used (= |Σ|, for alphabet Σ),
a prime q
Output: Each shift in T at which P is found

Rabin-Karp-Matcher (T, P, d, q)
  n = length(T); m = length(P);
  h = d^(m-1) mod q;
  p = 0; t_0 = 0;
  for i = 1 through m do                       // Preprocessing
    p = (d*p + P[i]) mod q;
    t_0 = (d*t_0 + T[i]) mod q;
  end for;
  for s = 0 through (n-m) do                   // Matching
    if (p == t_s) then
      if (P[1..m] == T[s+1 .. s+m]) then
        print the shift value as s;
    if (s < n-m) then
      t_{s+1} = (d*(t_s - h*T[s+1]) + T[s+m+1]) mod q;
  end for;
End algorithm.
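
The matcher above can be sketched in Python as follows. The function name, the default radix d = 256, the default prime q = 1000003, and the use of `ord` to turn characters into digits are all assumptions made for this illustration; indexing is 0-based, so the rolling update drops text[s] and appends text[s + m].

```python
def rabin_karp(text, pattern, d=256, q=1_000_003):
    """Return every shift at which pattern occurs in text (rolling hash mod q)."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    h = pow(d, m - 1, q)                 # h = d^(m-1) mod q
    p = t = 0
    for i in range(m):                   # preprocessing
        p = (d * p + ord(pattern[i])) % q
        t = (d * t + ord(text[i])) % q
    shifts = []
    for s in range(n - m + 1):           # matching
        # Equal residues may be spurious, so confirm character by character.
        if p == t and text[s:s + m] == pattern:
            shifts.append(s)
        if s < n - m:
            # Python's % always yields a non-negative result here.
            t = (d * (t - h * ord(text[s])) + ord(text[s + m])) % q
    return shifts


print(rabin_karp("gtgatcagatcact", "tca"))
print(rabin_karp("gtgatcagatcact", "tca", q=13))   # tiny q: more spurious hits, same answer
```

A very small q makes spurious hits frequent, but the explicit comparison keeps the output correct; only the running time suffers.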

Complexity:

Preprocessing: O(m)

Matching:
O(n-m+1) + O(m) = O(n), considering each number comparison to be constant time.

However, if the translated numbers are large (i.e., m is large), then even a number comparison
could take O(m). Moreover, every hit triggers an O(m) character-by-character check, so the
worst case occurs when every shift is successful (a "valid shift"), e.g., $T = a^n$ and
$P = a^m$. In that case the complexity is O(nm), as before. With only c hits, however, the cost
is O((n-m+1) + cm) = O(n+m) for a small c, as is expected in real life.

TOPOLOGICAL SORT
Any partial order can be represented by a directed acyclic graph (DAG) G = (V, E).
A topological sort is an ordering of all of G's vertices v1, v2, ..., vn such that
for every edge (vi, vk) in E, i < k.
Equivalently: list all nodes of G on a line such that all arrows point to the right.
There are often many possible topological sorts of a given DAG.


The above DAG has many valid topological sorts, including:
7, 5, 3, 11, 8, 2, 9, 10 (visual left-to-right, top-to-bottom)
3, 5, 7, 8, 11, 2, 9, 10 (smallest-numbered available vertex first)
5, 7, 3, 8, 11, 10, 9, 2 (fewest edges first)
7, 5, 11, 3, 10, 8, 9, 2 (largest-numbered available vertex first)
7, 5, 11, 2, 3, 8, 9, 10 (attempting top-to-bottom, left-to-right)
3, 7, 8, 5, 11, 10, 2, 9 (arbitrary)
Each topological order is a feasible schedule.
Algorithm
Assume the indegree is stored with each node.
Repeat until no nodes remain:
Choose a node of zero indegree and output it.
Remove the node and all its outgoing edges, and update the indegrees of its successors.
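
The indegree-based procedure can be sketched in Python as below. The function name is an assumption, and so is the edge list in the example: the original figure is not available here, so the edges were chosen to be consistent with the orderings listed above. Picking the smallest available vertex with a heap is one arbitrary tie-breaking rule among many.

```python
import heapq
from collections import defaultdict


def topological_sort(vertices, edges):
    """Repeatedly output a zero-indegree node, smallest-numbered first."""
    indegree = {v: 0 for v in vertices}
    successors = defaultdict(list)
    for u, v in edges:
        successors[u].append(v)
        indegree[v] += 1
    ready = [v for v in vertices if indegree[v] == 0]
    heapq.heapify(ready)
    order = []
    while ready:
        u = heapq.heappop(ready)         # choose a node of zero indegree
        order.append(u)
        for v in successors[u]:          # "remove" u's outgoing edges
            indegree[v] -= 1
            if indegree[v] == 0:
                heapq.heappush(ready, v)
    if len(order) != len(indegree):
        raise ValueError("graph has a cycle, so no topological order exists")
    return order


# Assumed example DAG (not taken from the missing figure).
dag_vertices = [2, 3, 5, 7, 8, 9, 10, 11]
dag_edges = [(5, 11), (7, 11), (7, 8), (3, 8), (3, 10),
             (11, 2), (11, 9), (11, 10), (8, 9)]
print(topological_sort(dag_vertices, dag_edges))
```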

DETERMINISTIC AND NON-DETERMINISTIC ALGORITHMS
An algorithm is deterministic if, for a given input, it always passes through the same
sequence of states and produces the same output, just as a mathematical function does.
Hence the state is known at every step of the algorithm. An algorithm is non-deterministic
if there is more than one path the algorithm can take. Because of this, one cannot determine
in advance the next state of the machine running the algorithm.
Deterministic Algorithms: Characteristics
Π - a computational problem; e.g., Sorting.
A - a deterministic algorithm to solve Π; e.g., Selection Sort.
At every point during an execution of algorithm A over an input I, the next move of A is
uniquely well-defined.
The execution and running time, intermediate steps and the final output computed are
the same during each execution of A over I.
The course followed by the algorithm does not vary with execution as long as the
input I remains the same.
Non-Deterministic Algorithms
A non-deterministic algorithm is correct if (a) there is a sequence of choices that leads to
success; and (b) any sequence of choices that leads to success correctly solves the problem.
Implementation
The non-deterministic algorithm is considered as defining the state space, and any program
that exhaustively searches that space is a valid implementation of the non-deterministic
algorithm. The search technique is up to the implementor, who can use depth-first search,
breadth-first search, iterative deepening ...
EXAMPLE: Exact Set Cover
Given: A set O and a collection C of subsets of O.
Find: A subcollection D of C such that every element of O is in exactly one set in D. (In
other words, the union of the sets in D is O and the intersection of any two sets in D is
empty.)
Example:
O = { a,b,c,d,e,f,g,h,i,j,k,l }
C = { C1, C2, C3, C4, C5, C6, C7, C8 } where
C1 = {a, b, c}
C2 = {a, d}
C3 = {b, c, d, e}
C4 = {b, c, h, k}
C5 = {c, f, g}
C6 = {e, g, i}
C7 = {f, j, l}
C8 = {f, k}
Solution: D={C2, C4, C6, C7}
(Note: Exact Set Cover is a problem where you cannot predict a priori the depth of the
shallowest goal state, so iterative deepening may well work much better than DFS.)
Non-deterministic algorithm:
XCOVER1(in O,C)
{ D := empty;
DU := empty; /* union of elements in D */
while (there are elements in O not in DU) {
choose set X in C such that X does not overlap DU;
add X to D;
DU := DU union X;
}
return(D)
}
Corresponding search space: State: A subcollection of C.
Operator on state S: Add to S an element of C that does not overlap the union of the sets in
S.
Start state: The empty collection.
Goal state: An exact set cover.
A problem with the above algorithm is that it generates each collection D in every possible
order. For example, one branch will first choose X=C2, then X=C4, then X=C6, then X=C7.
A different branch will first choose X=C7, then X=C2, then X=C6, then X=C4, finding the
same solution in a different order.
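
One way to implement XCOVER1 deterministically is a depth-first search over its state space, where each non-deterministic "choose" becomes a loop with backtracking. The sketch below is only an illustration under assumptions: the function name, the dictionary representation of C, and the choice of DFS are made here, every set is assumed to be a subset of O, and no attempt is made to avoid the repeated orderings described above.

```python
def exact_cover(universe, collection, chosen=(), covered=frozenset()):
    """DFS over XCOVER1's state space; returns names of chosen sets or None."""
    if covered == universe:
        return list(chosen)                     # goal state: an exact cover
    for name, subset in collection.items():
        # Operator: add a non-empty set that does not overlap the current union.
        if subset and not (subset & covered):
            result = exact_cover(universe, collection,
                                 chosen + (name,), covered | subset)
            if result is not None:
                return result
    return None                                 # dead end: backtrack


O = frozenset("abcdefghijkl")
C = {
    "C1": frozenset("abc"),  "C2": frozenset("ad"),
    "C3": frozenset("bcde"), "C4": frozenset("bchk"),
    "C5": frozenset("cfg"),  "C6": frozenset("egi"),
    "C7": frozenset("fjl"),  "C8": frozenset("fk"),
}
print(exact_cover(O, C))
```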

LOWER BOUND THEORY

Searching ordered lists with Comparison-Based Algorithms
Comparison-Based Algorithms: Information can be gained only by comparing key-to-element,
or element-to-element (in some problems).

Given: An integer n, a key, and an ordered list of n values.
Question: Is the key in the list and, if so, at what index?

We have an algorithm. We don't know what it is, or how it works. It accepts n, a key and a
list of n values. It MUST, though, work pretty much as follows:

1) It must calculate an index for the first compare based solely upon n since it has not yet
compared the key against anything, i.e., it has not yet obtained any additional information.
Notice, this means for a fixed value of n, the position of the first compare is fixed for all data
sets (of size n).

2) The following is repeated until the key is found, or until it is determined that no location
contains the key:
The key is compared against the item at the specified index.
a) If they are equal, the algorithm halts.
b) If the key is less, then it incorporates this information and computes a new index.
c) If the key is greater, then it incorporates this information and computes a new index

There are no rules about how this must be done. In fact we want to leave it wide open so
that we are not eliminating any possible algorithm.

After the first compare, there are two possible second compare locations (indexes). Neither
depends upon the key or any item in the list: Just upon the result of the first compare. Every
second compare on every set of n items will be one of these two locations.

Every third compare will be at one of four locations. Every fourth compare will be at one of
eight locations. And so on. In fact, we may look at an algorithm (for a given n) as being
described by (or, possibly, describing) a binary tree in which the root corresponds to the
first comparison, its children to the possible second comparisons, their four children to the
possible third comparisons, etc. This binary tree, called in this context a "decision tree,"
then depicts for this algorithm every possible path of comparisons that could be forced by any
particular key and set of n values.

Observation 0: Every comparison-based search algorithm has its own set of decision trees
(one for each value of n). Even if we don't know what it does or how it does its task, we know
it has one for each n, and that they look pretty much like the one described above.
Observation 1: For any decision tree and any root-leaf path, there is a set of data which will
force the algorithm to take that path. The number of compares with a given data set (key and
n values) is the number of nodes in the "forced" root-leaf path.
Observation 2: The longest root-leaf path is the "worst case" running time of the algorithm.
Observation 3: For any position i of {1, 2, . . . , n}, some data set contains the key in that
position. So every algorithm must have a compare for every index, that is, the decision tree
must have at least one node for each position.

Therefore, all decision trees for the search problem must have at least n nodes in them.

All binary trees with n nodes have a root-leaf path with at least $\log_2(n+1)$ nodes (you can
verify this by induction).

Thus, all decision trees defined by search algorithms on n items have a path requiring
$\log_2(n+1)$ compares.

Therefore, the best any comparison-based search algorithm can hope to do is
$\log_2 n \approx \log_2(n+1)$ compares in the worst case.

This is the comparison-based lower bound for the problem of searching an ordered list of n
items for a given key.
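
Binary search is a comparison-based algorithm that meets this bound, which can be checked empirically. The sketch below counts one three-way compare per probe; the function names and the test data (even numbers, with odd keys standing in for "not in the list") are assumptions made for illustration. Since the number of compares is an integer, the bound is compared in its rounded-up form, which for n ≥ 1 equals `n.bit_length()` in Python.

```python
def binary_search(values, key):
    """Return (index or None, number of three-way compares used)."""
    lo, hi = 0, len(values) - 1
    compares = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        compares += 1                    # one compare of key against values[mid]
        if key == values[mid]:
            return mid, compares
        if key < values[mid]:
            hi = mid - 1
        else:
            lo = mid + 1
    return None, compares


def worst_case_compares(n):
    """Longest root-leaf path of binary search's decision tree for size n."""
    values = list(range(0, 2 * n, 2))    # sorted list 0, 2, ..., 2n-2
    keys = range(-1, 2 * n)              # every present key and every gap
    return max(binary_search(values, k)[1] for k in keys)


for n in (1, 7, 8, 100):
    print(n, worst_case_compares(n), n.bit_length())
```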

Comparison Based Sorting

Here, there is no "key." Typically, in comparison-based sorting, we will compare values in
two locations and, depending upon which is greater, we might
1) do nothing,
2) exchange the two values,
3) move one of the values to a third position,
4) leave a reference (pointer) at one of the positions,
5) etc.

As with searching, above, each sorting algorithm has its own decision tree. Some differences
occur: Leaf nodes are the locations which indicate "the list is now sorted." Internal nodes
simply represent comparisons on the path to a leaf node.

As with searching, we will determine the minimum number of nodes the decision tree of any
(comparison-based) sorting algorithm must have. Then, from that, the minimum height of all
decision trees (for sorting) can be determined, providing the proof that all comparison-based
sorting algorithms must use at least this many comparisons.

Actually, it is quite simple now. Every decision tree starts with an unordered list and ends up
at a leaf node with a sorted list. Suppose you have two lists, one a rearrangement of the
other. Then, in sorting them, something must be done differently to one of the lists (or done
at a different time). Otherwise, if the same actions are performed on both lists in exactly the
same sequence, then one of them cannot end up sorted. Therefore, they must go through different
paths from the root of the decision tree. By the same reasoning, all n! different permutations
of the integers 1, 2, . . ., n (these are valid things to sort, too) must go through
distinct paths. Notice that distinct paths end in distinct leaf nodes. Thus, there must be at
least n! leaf nodes in every decision tree for sorting. A binary tree with n! leaves has height
at least $\log_2(n!)$. By a common result (Stirling's formula), $\log_2(n!) \approx n \log_2 n$.

Therefore, all comparison-based sorting algorithms require $\Omega(n \log_2 n)$ time.
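
The bound can be made concrete for small n. The sketch below computes the rounded-up value of log2(n!) exactly with integer arithmetic and, as a sanity check, counts how many comparisons Python's built-in sort needs on its worst permutation of 1..n. The function names are assumptions, and the built-in sort is used only as one convenient comparison-based algorithm; any other would also have to respect the bound.

```python
import math
from functools import cmp_to_key
from itertools import permutations


def sorting_lower_bound(n):
    """Smallest integer >= log2(n!), computed without floating point."""
    return (math.factorial(n) - 1).bit_length()


def worst_case_compares(n):
    """Most comparisons the built-in sort makes on any permutation of 1..n."""
    worst = 0
    for perm in permutations(range(1, n + 1)):
        count = 0

        def compare(a, b):
            nonlocal count
            count += 1                   # every comparison goes through here
            return (a > b) - (a < b)

        sorted(perm, key=cmp_to_key(compare))
        worst = max(worst, count)
    return worst


for n in range(2, 7):
    print(n, sorting_lower_bound(n), worst_case_compares(n))
```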

Oracles and Adversary Arguments
Oracles: Given some computational model, the oracle tells the outcome of each
comparison. In order to derive a good lower bound, the oracle tries its best to make the
algorithm work as hard as possible.

Adversary arguments: A method for obtaining lower bounds
Example: merging two sorted lists
It is a game between the adversary and the (unknown) algorithm.
The adversary has the input, and the algorithm asks the adversary questions
about the input.
The adversary tries to make the algorithm work the hardest by adjusting the input
(consistently with its earlier answers).
The adversary wins the game, and the lower bound is proven, if whenever the algorithm
stops before asking the lower-bound number of questions, the adversary is able to
come up with two different inputs that are both consistent with all answers given.

RANDOMIZED ALGORITHMS
Characteristics
RA - a randomized algorithm to solve a computational problem Π.
At every point during an execution of algorithm RA over an input I, the next move of RA can
possibly be determined by employing randomly chosen bits and is not uniquely well-defined.
The execution and running time, intermediate steps and the final output computed
could possibly vary for different executions of RA over the same I.
The course followed by the algorithm may vary for different executions even if the input
I remains the same.
Why Randomization ?
Randomness often helps in significantly reducing the work involved in determining a
correct choice when there are several but finding one is very time consuming.
Reduction of work (and time) can be significant on the average or in the worst case.
Randomness often leads to very simple and elegant approaches to solve a problem or
it can improve the performance of the same algorithm.
Risk: loss of confidence in the correctness. This loss can be made very small by
repeated employment of randomness.
Assumes the availability of truly unbiased random bits which are very expensive to
generate in practice.
Randomized Version of Quick Sort: RandQSort(A, p, q):
1. If p ≥ q, EXIT.
2. Choose uniformly at random r ∈ {p, ..., q}.
3. s ← correct position of A[r] in the sorted order.
4. Move the randomly chosen pivot A[r] into position s.
5. Move the remaining elements into "appropriate" positions.
6. RandQSort(A, p, s - 1);
7. RandQSort(A, s + 1, q).
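
A minimal Python sketch of RandQSort follows. Steps 3 to 5 are realized here with a Lomuto-style partition, which is one common choice and an assumption of this sketch rather than something the steps above prescribe; the function name and the default arguments are likewise illustrative.

```python
import random


def rand_qsort(a, p=0, q=None):
    """Sort a[p..q] in place using a uniformly random pivot."""
    if q is None:
        q = len(a) - 1
    if p >= q:                           # step 1
        return
    r = random.randint(p, q)             # step 2: r uniform in {p, ..., q}
    a[r], a[q] = a[q], a[r]              # park the pivot at the end
    pivot = a[q]
    s = p
    for i in range(p, q):                # steps 3-5: partition around the pivot
        if a[i] < pivot:
            a[i], a[s] = a[s], a[i]
            s += 1
    a[s], a[q] = a[q], a[s]              # pivot lands in its sorted position s
    rand_qsort(a, p, s - 1)              # step 6
    rand_qsort(a, s + 1, q)              # step 7


data = [5, 3, 8, 1, 9, 2, 7, 3]
rand_qsort(data)
print(data)
```

The sequence of pivots, and hence the running time, differs from run to run, but the final array is always sorted.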

Las Vegas algorithm
A Las Vegas algorithm is a randomized algorithm that always gives correct results;
that is, it always produces the correct result or it informs about the failure. In other words, a
Las Vegas algorithm does not gamble with the correctness of the result; it gambles only with
the resources used for the computation.
A simple example is randomized quicksort, where the pivot is chosen randomly, but
the result is always sorted. The usual definition of a Las Vegas algorithm includes the
restriction that the expected run time always be finite, when the expectation is carried out
over the space of random information, or entropy, used in the algorithm.
