MADF Unit 4
• Invalid shift: Pattern P does not match the text string T at the given shift.
2) The worst case also occurs when only the last character is different:
txt[] = "VVVVVVVVVVVVK";
pat[] = "VVVK";
Step 1: Construct the Bad Match Table (last-occurrence table) for P = ABACAB:

c        A   B   C   *
Last(c)  4   5   3  -1

Step 2: Searching P in T = ABACAABADCABACABAABB (indices 0 to 19).

[Figure: successive alignments of P against T; the bad-character rule produces
shifts of 1, 1, 1, 6 and 1 unit, after which P matches T at index 10.]

Best case: a text of length 12 searched with P = XYZ, using the table

c        X   Y   Z   *
Last(c)  0   1   2  -1

When the mismatched text character does not occur in the pattern, the pattern
shifts by m = 3 positions each time, giving 12/3 = n/m alignments, i.e. the best
case time complexity is O(n/m).

Worst case: T = AAAAA (indices 0 to 4) searched with P = AAA, using the table

c        A   *
Last(c)  2  -1

Every alignment compares all m characters and the pattern shifts by only one
position, giving the worst case time complexity O(nm).
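The bad-character search traced above can be sketched in Python. This is a minimal version that uses only the last-occurrence table (no good-suffix rule); the function names are illustrative, not from the notes:

```python
def bad_match_table(pat):
    """Last-occurrence index of each character of the pattern;
    characters absent from the pattern implicitly map to -1."""
    return {c: i for i, c in enumerate(pat)}

def boyer_moore_search(txt, pat):
    """Return the index of the first occurrence of pat in txt, or -1.
    Uses only the bad-character heuristic."""
    last = bad_match_table(pat)
    n, m = len(txt), len(pat)
    s = 0                                # current shift of pat over txt
    while s <= n - m:
        j = m - 1
        while j >= 0 and pat[j] == txt[s + j]:
            j -= 1                       # compare right to left
        if j < 0:
            return s                     # all m characters matched
        # shift so the mismatched text character aligns with its last
        # occurrence in pat (or moves past it, if it never occurs)
        s += max(1, j - last.get(txt[s + j], -1))
    return -1
```

Running it on the lecture's text, `boyer_moore_search("ABACAABADCABACABAABB", "ABACAB")` produces shifts of 1, 1, 1, 6 and 1 and returns 10, matching the trace above.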
[Figure: a standard trie storing the strings at, ant, any, bet and beat; strings
with a common prefix share the corresponding path from the root.]
Performance of Standard Trie
• The worst case for the number of nodes of a trie occurs when no two
strings share a common non-empty prefix; that is, except for the
root, all internal nodes have one child.
• A trie T for a set S of strings can be used to implement a dictionary
whose keys are the strings of S.
• A search for a string X is performed in T by tracing down from the
root the path indicated by the characters in X.
• If this path can be traced and terminates at an external node, then X
is in the dictionary. If the path cannot be traced or the path can be
traced but terminates at an internal node, then X is not in the
dictionary.
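The dictionary search just described can be sketched with a simple Python trie. This is a minimal illustration; the node layout and method names are my own:

```python
class TrieNode:
    def __init__(self):
        self.children = {}      # character -> TrieNode
        self.is_end = False     # True if a stored string ends here

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_end = True

    def search(self, word):
        """Trace the path indicated by the characters of word; the word
        is in the dictionary only if the path can be traced and it ends
        at a node marking the end of a stored string."""
        node = self.root
        for ch in word:
            if ch not in node.children:
                return False    # path cannot be traced
            node = node.children[ch]
        return node.is_end      # path traced; check it ends a string
```

Inserting at, ant, any, bet and beat, a search for "ant" succeeds, while a search for "an" fails because the path ends at an internal position.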
Important structural properties of a standard trie
• In a compressed trie, nodes are labeled with strings (substrings of strings in
the collection) rather than with individual characters.
Example: the suffixes of the string minimize$ (positions 0 to 8) are inserted one
by one: $, e$, ze$, ize$, mize$, imize$, nimize$, inimize$, minimize$.

[Figure: the resulting structures for minimize$:
A) the standard trie of all suffixes, with one character per edge;
B) the compressed trie, with edges labeled by substrings such as e, mi, i,
   nimize and ze;
C) the suffix trie, where each edge label is stored as a pair of indices
   (start, end) into the string, e.g. (8,8) for $, (7,8) for e$, (1,1) for i,
   (0,1) for mi, (2,8) for nimize$ and (6,8) for ze$.]
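A naive suffix-trie construction can be sketched by inserting every suffix of minimize$ into an ordinary trie. This is only a sketch; a real suffix tree keeps the structure compact with the (start, end) index pairs shown above rather than nested dictionaries:

```python
def build_suffix_trie(s):
    """Return a nested-dict trie containing every suffix of s."""
    root = {}
    for i in range(len(s)):          # insert suffix s[i:]
        node = root
        for ch in s[i:]:
            node = node.setdefault(ch, {})
    return root

def contains_substring(root, pattern):
    """Every substring of s is a prefix of some suffix of s, so a
    substring query is a simple path trace from the root."""
    node = root
    for ch in pattern:
        if ch not in node:
            return False
        node = node[ch]
    return True
```

For `build_suffix_trie("minimize$")`, a query for "nimi" succeeds because "nimi" is a prefix of the suffix nimize$.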
Example: construct the Huffman tree for the following characters and
frequencies:

Character   A  C  E  H  I
Frequency   3  5  8  2  7

•Keep repeating Step-01 and Step-02 until all the nodes form a single tree.
•The tree finally obtained is the desired Huffman Tree.

[Figure: H:2 and A:3 are merged into a node of weight 5; that node and C:5 are
merged into a node of weight 10; I:7 and E:8 are merged into a node of weight
15; finally the nodes of weight 10 and 15 are merged into the root of
weight 25.]

Deriving the Huffman tree: starting at the root, trace down to every leaf,
marking '0' for a left branch and '1' for a right branch.

Generating the Huffman code: collect the 0s and 1s on each path from the root
to a leaf and assign the resulting 0-1 word as the code for that symbol:

E = 11    I = 10    C = 01    A = 001    H = 000

Input:  ACE
Output: (001)(01)(11)

• Decoding
• Read compressed file & binary tree
• Use binary tree to decode file
• Follow path from root to leaf

Input: 0010111
001 -> A, 01 -> C, 11 -> E
Output: ACE
Text Compression
• Text compression involves changing the representation of a file
so that the compressed output takes less space to store, or less
time to transmit, while the original file can still be
reconstructed exactly from its compressed representation.
Huffman Coding for Text Compression
• Text compression algorithms aim at statistical reductions in the
volume of data.
• One commonly used compression algorithm is Huffman coding, which
makes use of information on the frequency of characters to assign
variable-length codes to characters.
• Standard encoding schemes, such as the ASCII and Unicode systems,
use fixed-length binary strings to encode characters (with 7 bits in
the ASCII system and 16 in the Unicode system).
Huffman Coding
• Huffman Coding is a famous Greedy Algorithm.
• It is used for the lossless compression of data.
• It uses variable length encoding.
• It assigns variable length code to all the characters.
• The code length of a character depends on how frequently it occurs in
the given text.
• The character which occurs most frequently gets the smallest code.
• The character which occurs least frequently gets the largest code.
• It is also known as Huffman Encoding.
Prefix Rule-
• Huffman codes satisfy the prefix rule: no code word is a prefix of any other
code word. This is what allows an encoded bit string to be decoded
unambiguously.
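The tree construction can be sketched in Python with a heap-based merge. This is a minimal illustration with my own variable names; the exact 0/1 labels it produces may differ from the worked example above depending on tie-breaking, but the code lengths (and hence the compressed size) are the same:

```python
import heapq

def huffman_codes(freqs):
    """Build a Huffman tree from {symbol: frequency} and return the
    code word for each symbol as a string of 0s and 1s."""
    # each heap entry is (weight, tiebreak, tree); a tree is either a
    # symbol (leaf) or a (left, right) pair (internal node)
    heap = [(w, i, sym) for i, (sym, w) in enumerate(sorted(freqs.items()))]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        w1, _, t1 = heapq.heappop(heap)   # two smallest weights
        w2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, count, (t1, t2)))
        count += 1
    codes = {}
    def walk(tree, prefix):
        if isinstance(tree, tuple):
            walk(tree[0], prefix + "0")   # left branch -> 0
            walk(tree[1], prefix + "1")   # right branch -> 1
        else:
            codes[tree] = prefix or "0"   # single-symbol edge case
    walk(heap[0][2], "")
    return codes
```

For the frequencies H:2, A:3, C:5, I:7, E:8 this yields 3-bit codes for H and A and 2-bit codes for C, I and E, so encoding ACE takes 3 + 2 + 2 = 7 bits, exactly as in the example.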
COMP 6.2 DAA UNIT – 4, Part 1
Class P
P is the class of all decision problems that are polynomially bounded. The implication is
that a decision problem X in P can be solved in polynomial time on a deterministic
computation model (or, equivalently, can be solved by a deterministic algorithm).
A deterministic machine, at each point in time, executes an instruction. Depending on
the outcome of executing the instruction, it then executes some next instruction, which
is unique.
The class P consists of problems that are solvable in polynomial time. These problems are
also called tractable. The advantage of considering the class of polynomial-time
algorithms is that all reasonable deterministic single-processor models of
computation can be simulated on one another.
OPTIMIZATION PROBLEM
Optimization problems are those for which the objective is to maximize or minimize some
values. For example,
Finding the minimum number of colors needed to color a given graph.
Finding the shortest path between two vertices in a graph.
DECISION PROBLEM
There are many problems for which the answer is a Yes or a No. These types of problems are
known as decision problems. For example,
Whether a given graph can be colored by only 4-colors.
Finding a Hamiltonian cycle in a graph is not a decision problem, whereas checking
whether a graph is Hamiltonian or not is a decision problem.
NON-DETERMINISTIC ALGORITHMS
A deterministic algorithm has the property that the result of every operation is uniquely
defined, whereas a non-deterministic algorithm contains operations whose outcomes are not
uniquely defined but are limited to a specified set of possibilities.
REDUCTION (≤P)
How do we prove that some problems are computationally difficult?
Consider the statement: "Problem X is at least as hard as problem Y."
To prove such a statement: reduce problem Y to problem X.
(i) If problem Y can be reduced to problem X, we denote this by Y ≤P X.
(ii) This means "Y is polynomial-time reducible to X."
(iii) It also means that X is at least as hard as Y, because if we can solve X, we can solve Y.
COMP 6.2 DAA UNIT – 4, Part 2
Problem Description: Given a graph G = (V, E) and a positive integer K, the Clique Decision
Problem (CDP) is to determine whether the graph G contains a CLIQUE of size K or not.
To Show CDP is in Class – NP: [Verify the clique in polynomial time]
For the given graph G = (V, E), use the subset V' ⊆ V of vertices in the clique as a certificate
for G. Checking that V' is a clique, i.e. that for each pair u ∈ V', v ∈ V' the edge (u,v) ∈ E,
requires O(n²) time, so the verification is done in polynomial time and CDP is in Class NP.
To Show CDP is NP-hard: [Reduce an instance of a known NP-hard problem into a CDP Instance]
Proof: Let F = C1 ⋀ C2 ⋀ … ⋀ CK be a propositional calculus formula in CNF having K clauses, and
let xi for 1 ≤ i ≤ n be the n boolean variables (literals) used in F. Construct from F a graph G = (V,E)
such that G has a clique of size at least K if and only if F is satisfiable.
For any F, the graph G = (V, E) is defined as follows:
The vertices V = {<a, i> | a is a literal in the clause Ci (i.e. a ∈ Ci)}
The edges E = {(<a,i>, <b,j>) | a and b belong to different clauses (i.e. i ≠ j) and a is not the
negation of b}
Proof of Claim: If F is satisfiable, then there is a set of truth values for all xi such that each
clause is true with this assignment.
Let S = {<a,i> | a is true in Ci} be a set containing exactly one <a,i> for each i. Between any two
nodes <a,i> and <b,j> in S there is an edge in G, since i ≠ j and both a and b have the value true
(so neither is the negation of the other). Thus S forms a clique of size K.
Example: Please refer to the class-work notebook.
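The certificate check used above (is V' a clique?) amounts to testing every pair of certificate vertices, O(k²) pair tests for k vertices. A sketch in Python, with the graph represented as an edge list (names are my own):

```python
from itertools import combinations

def is_clique(edges, vertices):
    """Verify a clique certificate: every pair of certificate vertices
    must be joined by an edge of the graph."""
    edge_set = {frozenset(e) for e in edges}
    return all(frozenset((u, v)) in edge_set
               for u, v in combinations(vertices, 2))
```

For example, with edges [(1,2), (2,3), (1,3), (3,4)], the set {1,2,3} is a clique but {1,2,4} is not.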
COMP 6.2 DAA UNIT – 4, Part 2
Problem Description: A set S ⊆ V is a node cover for a graph G = (V,E) if and only if all edges in E
are incident to at least one vertex in the set S.
To Show NCDP is in Class – NP: [Verify the vertex cover in polynomial time]
For the given graph G = (V, E), use the subset S ⊆ V of vertices in the vertex cover as a
certificate for G. Checking that the vertices in S cover all the edges of E requires O(|E|)
time, which is linear.
Hence the verification of a node cover is done in polynomial time, so NCDP is in Class NP.
To Show NCDP is NP-hard: [Reduce an instance of a known NP-hard problem into an NCDP Instance]
Proof: Let G = (V, E) and K define an instance of CDP. Let |V| = n. Construct from G a new
graph G' = (V, E') such that G' has a node cover of at most n−K vertices if and only if G has a clique
of size K.
The graph G' = (V, E') is defined as follows:
The edges E' = {(u,v) | u ∈ V, v ∈ V and (u,v) ∉ E}. The graph G' is the complement of G.
Claim: The graph G has a clique of size K if and only if G' has a vertex cover of at most n − K
vertices.
Proof of Claim:
Let K be any clique in G. Since there are no edges in E' connecting vertices of K, the
remaining n − |K| vertices in G' must cover all edges in E'.
Since G' can be obtained from G in polynomial time, CDP can be solved in polynomial
time if we have a polynomial-time deterministic algorithm for NCDP.
So, NCDP is also NP-Hard.
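The reduction above is easy to exercise on a small graph. The sketch below builds the complement G' and checks the vertex-cover property; the graph used in the usage note is a made-up example, not one from the notes:

```python
from itertools import combinations

def complement(vertices, edges):
    """Build G' = (V, E') containing exactly the vertex pairs that are
    NOT edges of G (the reduction used in the proof)."""
    edge_set = {frozenset(e) for e in edges}
    return [e for e in combinations(vertices, 2)
            if frozenset(e) not in edge_set]

def is_vertex_cover(edges, cover):
    """Every edge must be incident to at least one cover vertex."""
    c = set(cover)
    return all(u in c or v in c for u, v in edges)
```

For V = {1,2,3,4} with E = [(1,2), (2,3), (1,3), (3,4)], the clique {1,2,3} leaves the single remaining vertex {4}, which indeed covers both edges (1,4) and (2,4) of the complement.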
Problem Description: A colouring of a graph G = (V,E) is a function f: V → {1, 2, …, k} defined for
every vertex i ∈ V such that for every edge (u,v) in E, f(u) ≠ f(v). The Chromatic Number Decision
Problem is to determine whether G has a colouring for a given k.
To Show CNDP is in Class – NP: [Verify the coloured graph in polynomial time]
Given a graph G = (V,E) and a positive integer m, is it possible to assign one of the
numbers (colours) 1, 2, ..., m to the vertices of G so that for no edge in E is it true that the
vertices on that edge have been assigned the same colour?
The verification of a coloured graph is done in polynomial time, and hence CNDP is in Class NP.
To Show CNDP is NP-hard: [Reduce an instance of a known NP-hard problem into a CNDP Instance]
Proof: Let F be a propositional calculus formula having at most three literals in each clause and
having r clauses C1, C2, …, Cr. Let xi, 1 ≤ i ≤ n, be the n boolean variables used in F.
Construct in polynomial time a graph G that is n+1 colourable if and only if F is satisfiable.
Proof of Claim:
First observe that all the yi form a complete sub-graph on n vertices. Since yi is connected to
all the xj and xj' except xi and xi', the colour i can be assigned only to xi and xi'. However,
(xi, xi') is in E, so a new colour n+1 is needed for one of these two vertices. The vertex
assigned the new colour n+1 is called a false vertex; the other is a true vertex.
Each clause has at most three literals, and each Ci is adjacent to a pair of vertices xj, xj' for
at least one j, so no Ci can be assigned the colour n+1. This implies that the only colours that
can be assigned to Ci correspond to vertices xj or xj' that are in clause Ci and are true
vertices. Hence G is n+1 colourable if and only if there is a true vertex corresponding to
each Ci.
So G is n+1 colourable iff F is satisfiable.
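The NP membership argument rests on the colouring certificate being checkable in one pass over the edges. A minimal sketch (colouring given as a vertex-to-colour dictionary; names are my own):

```python
def is_valid_colouring(edges, colour):
    """Certificate check for CNDP: a colouring is valid iff no edge
    joins two vertices of the same colour; a single O(|E|) pass."""
    return all(colour[u] != colour[v] for u, v in edges)
```

For a triangle, any assignment reusing a colour on an edge fails, while three distinct colours succeed.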
COMP 6.2 DAA UNIT – 4, Part 3
Problem Description:
There are m identical processors (or) machines, P1, P2, …, Pm
There are n different jobs J1, J2, …, Jn to be processed.
Each job Ji requires some ti processing time.
A schedule S is an assignment of jobs to processors, which specifies the time interval
and the processor on which the job Ji is to be processed.
A Schedule can be either a non-preemptive schedule (the processing of a job is not
terminated until the job is complete) or a preemptive schedule.
Constraint: A job can’t be processed by more than one processor at any given time.
The problem is obtaining a minimum finish time non-preemptive schedule.
MFT (S) = (1/n) ∑1≤i≤n fi, where fi is the time at which the processing of job Ji is completed.
WMFT (S) = ∑1≤i≤n wi fi, where wi is the weight associated with each job Ji.
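As a quick illustration of these two measures (a toy computation on made-up numbers, not part of the proof):

```python
def mft(finish_times):
    """Mean finish time: the average of the job completion times."""
    return sum(finish_times) / len(finish_times)

def wmft(finish_times, weights):
    """Weighted mean finish time: the sum of w_i * f_i over all jobs."""
    return sum(w * f for w, f in zip(weights, finish_times))
```

For three jobs with processing times 2, 3, 4 run in that order on one processor, the finish times are 2, 5 and 9, so MFT = 16/3 and, with wi = ti, WMFT = 2·2 + 3·5 + 4·9 = 55.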
Proof of Claim: Given an instance of the partition problem, construct a two-processor
scheduling problem with n jobs and wi = ti = ai, 1 ≤ i ≤ n.
For this set of jobs there is a schedule S with weighted MFT at most ½ ∑ ai² + ¼ (∑ ai)² if and
only if the ai's have a partition.
Let the weights and times of the jobs on processor P1 be (w1, t1), (w2, t2), …, (wk, tk) and on
processor P2 be (w1', t1'), (w2', t2'), …, (wj', tj'), listed in the order in which the jobs are processed.
Problem Description:
There are m processors P1, P2, P3, ….., Pm and there are n jobs J1, J2, J3, ….., Jn.
Each job Ji, 1 ≤ i ≤ n, requires processing on every processor Pj, 1 ≤ j ≤ m, in sequence.
Each job consists of m tasks T1,i, T2,i, …, Tm,i for 1 ≤ i ≤ n, and task Tj,i must be
assigned to processor Pj.
A Schedule for the n jobs is an assignment of tasks to time intervals on the processors.
For any job Jk, the processing of task Tj,k, j > 1, can not be started until task Tj-1,k has
been completed.
The problem of flow shop sequencing is to assign jobs to the processors in such a
manner that every processor is engaged all the time without being left idle.
Obtaining a minimum finish time preemptive schedule is NP – hard.
To show the Minimum finish time preemptive FS schedule is NP – Hard:
(i) Choose a known NP – hard problem: the Partition problem.
(ii) Reduce a Partition instance to a Minimum FT Preemptive FS Schedule instance.
Proof: Let ai, 1 ≤ i ≤ n, be an instance of the Partition problem.
Let m = 3, and construct the following preemptive FS instance with n+2 jobs, with at most two
non-zero tasks per job:
t1,i = ai      t2,i = 0      t3,i = ai,   1 ≤ i ≤ n
t1,n+1 = T/2   t2,n+1 = T    t3,n+1 = 0
t1,n+2 = 0     t2,n+2 = T    t3,n+2 = T/2,   where T = ∑1≤i≤n ai
Claim: The constructed FS instance has a preemptive schedule with finish time at most 2T if and
only if A has a partition.
Proof of claim: If the partition problem instance A has a partition u, then there is a
non-preemptive schedule with finish time 2T. If A has no partition, then every preemptive
schedule for FS must have a finish time greater than 2T. This can be shown by contradiction.
Assume that there is a preemptive schedule for FS with finish time at most 2T:
(a) Task t1,n+1 must finish by time T, as t2,n+1 = T and can not start until t1,n+1 finishes.
(b) Task t3,n+2 can not start before T units of time have elapsed, as t2,n+2 = T.
Let V be the set of indices of tasks completed on processor P1 by time T, excluding task t1,n+1.
Then ∑i∈V t1,i < T/2, as A has no partition; hence ∑i∉V t3,i > T/2.
Thus the processing of the jobs not included in V can not commence on processor P3 until after
time T, since their processor P1 processing is not completed until after T. So the total amount of
processing time left for processor P3 at time T is t3,n+2 + ∑i∉V t3,i > T.
The schedule length must therefore be more than 2T.
Problem Description:
There are m processors P1, P2, P3, ….., Pm and there are n jobs J1, J2, J3, ….., Jn.
The time of the jth task of job Ji is denoted tk,i,j, and that task is to be processed by Pk.
The tasks of any job are to be carried out in the order 1, 2, 3, and so on; task j can not
begin until task j−1 has been completed.
Obtaining a minimum finish time preemptive or non-preemptive schedule is NP-hard even
when m = 2.
To show the Minimum finish time preemptive JS schedule is NP – Hard:
(i) Choose a known NP – hard problem: the Partition problem.
(ii) Reduce a Partition instance to a Minimum FT Preemptive JS Schedule instance.
Proof: Let ai, 1 ≤ i ≤ n, be an instance of the Partition problem. Construct the following JS
instance with n+1 jobs and m = 2 processors:
Jobs 1…n : t1,i,1 = t2,i,2 = ai for 1 ≤ i ≤ n
Job n+1 : t1,n+1,1 = t2,n+1,2 = t2,n+1,3 = t1,n+1,4 = T/2, where T = ∑1≤i≤n ai
Claim: The job shop instance has a preemptive schedule with finish time at most 2T if and only
if A has a partition.
Proof of claim:
If A has a partition u, then there is a schedule with finish time 2T.
If A has no partition, then all schedules for JS must have a finish time greater than 2T.
Assume that there is a schedule S with finish time at most 2T.
Then there can be no idle time on either P1 or P2.
Let R be the set of jobs scheduled on P1 in the interval [0, T/2]. Let R' be the subset of R
consisting of the jobs whose first task is completed on P1 in this interval.
Since A has no partition, ∑j∈R' t1,j,1 < T/2, and consequently ∑j∈R' t2,j,2 < T/2.
Since only the second tasks of the jobs in R' can be scheduled on P2 in the interval [T/2, T], it
follows that there is some idle time on P2 in this interval.
Hence S must have a finish time greater than 2T.
COMP 6.2 DAA UNIT – 4, Part 4
4. APPROXIMATION ALGORITHMS
An algorithm that runs in polynomial time and yields a solution close to the optimal solution is
called an approximation algorithm.
We will explore polynomial-time approximation algorithms for several NP-Hard problems.
Formal Definition:
Let P be a minimization problem and I be an instance of P.
Let A be an algorithm that finds feasible solution to instances of P.
Let A(I) be the cost of the solution returned by A for instance I, and OPT(I) the cost of the
optimal solution for I. Then A is said to be an α-approximation algorithm for P if,
for every instance I, A(I)/OPT(I) ≤ α, where α ≥ 1.
For any minimization problem A(I) ≥ OPT(I); therefore, a 1-approximation algorithm
produces an optimal solution.
An approximation algorithm with a large α may return a solution that is much worse than
optimal. So the smaller α is, the better quality of the approximation the algorithm produces.
The greedy algorithm produces a set cover of size 3 by selecting the sets T1, T3 and T2, in that order.
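The greedy rule used here (repeatedly pick the set covering the most still-uncovered elements) can be sketched as follows. The universe and sets in the usage note are made-up stand-ins, since the original example's T1, T2, T3 are not reproduced in these notes:

```python
def greedy_set_cover(universe, sets):
    """Repeatedly choose the set that covers the largest number of
    still-uncovered elements, until everything is covered."""
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # the greedy choice: maximize the newly covered elements
        name, best = max(sets.items(),
                         key=lambda kv: len(kv[1] & uncovered))
        if not best & uncovered:
            raise ValueError("universe cannot be covered")
        chosen.append(name)
        uncovered -= best
    return chosen
```

For universe {1,…,6} with S1 = {1,2,3}, S2 = {4,5}, S3 = {5,6}, the greedy algorithm picks S1 first (it covers three new elements) and ends with a cover of size 3.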
Probabilistic Algorithms
• A probabilistic algorithm is an algorithm where the result and/or the
way the result is obtained depend on chance. These algorithms are
also sometimes called randomized algorithms.
• A random source is an idealized device that outputs a sequence of
bits that are uniformly and independently distributed. For example
the random source could be a device that tosses coins, observes the
outcome, and outputs it.
• Types of Probabilistic algorithms - There is a variety of behaviours
associated with probabilistic algorithms:
• Monte Carlo algorithms
• Las Vegas algorithms
• Numerical Approximation algorithms
Pseudo Random number generation
• Random Number - A random selection of a number from a set or range of
numbers is one in which each number in the range is equally likely to be
selected.
A) True random numbers can only be generated by observations of random
physical events, like dice throws or radioactive decay. Generation of random
numbers by observation of physical events can be slow and impractical.
B) Pseudo random numbers: sequences of numbers that approximate
randomness are generated using algorithms. These numbers are inherently
non-random because they are generated by deterministic mathematical
processes. Hence, these numbers are known as pseudorandom numbers.
The algorithms used to generate them are called pseudorandom number
generators.
Linear Congruential Method
• The method uses the following recurrence:
Xn+1 = (a * Xn + b) mod c
given a seed value X0 and integer values of a, b, and c
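The recurrence is a one-liner in code. The constants below are glibc's classic `rand()` parameters, used here only as a concrete example; any suitable a, b, c plug into the same formula:

```python
def lcg(seed, a=1103515245, b=12345, c=2**31):
    """Linear congruential generator: X_{n+1} = (a*X_n + b) mod c.
    Yields the pseudorandom sequence starting from the given seed."""
    x = seed
    while True:
        x = (a * x + b) % c
        yield x
```

Because the process is deterministic, the same seed always reproduces the same sequence, which is exactly why these numbers are "pseudo" random.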
Expected versus Average time
Las Vegas Algorithm
• Las Vegas algorithms, on the other hand, also use randomness in their approach,
but will always return the correct output.
• Example :
• Randomized Quick Sort: an element i of the list is chosen at random. The
elements of the list are then compared with i to create a batch L of numbers
less than i and a batch R of numbers greater than i. L and R are
recursively sorted, and the three groups are concatenated in order to obtain a
sorting of the original set.
• At the end of the running of this algorithm, a correct output has been obtained:
the numbers will be in sorted order. But the number of comparisons used by the
algorithm depends on the element i chosen at random.
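The description above maps directly onto a few lines of Python (a minimal sketch; this list-building version trades memory for clarity over the usual in-place partition):

```python
import random

def randomized_quicksort(nums):
    """Las Vegas algorithm: the pivot is chosen at random, so the
    running time varies from run to run, but the output is always a
    correctly sorted list."""
    if len(nums) <= 1:
        return list(nums)
    pivot = random.choice(nums)               # the random element i
    left = [x for x in nums if x < pivot]     # batch L: less than i
    middle = [x for x in nums if x == pivot]  # elements equal to i
    right = [x for x in nums if x > pivot]    # batch R: greater than i
    return randomized_quicksort(left) + middle + randomized_quicksort(right)
```

No matter which pivots chance selects, `randomized_quicksort([5, 3, 8, 1, 2])` returns `[1, 2, 3, 5, 8]`; only the number of comparisons differs between runs.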
Monte Carlo Algorithms
Example: the Miller–Rabin primality test.
Algorithm:
Step 1: Compute n - 1 = 2^k . m, where m is odd
Step 2: Choose a random integer a such that 1 ≤ a ≤ n-1
Step 3: Compute b = a^m mod n
        if ( b ≡ 1 (mod n) ) then return ("n is probably prime")
        for ( i = 0 to k - 1 )
        {
            if ( b ≡ -1 (mod n) ) then return ("n is probably prime")
            else
                b = b² mod n
        }
        return ("n is composite")
For example, n = 97 will be reported (probably) prime, while n = 561 (a Carmichael
number) will be reported composite.
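The steps above translate into Python as follows. This sketch repeats the test for several random bases, which is how the error probability is driven down; a "composite" answer is always correct, while a "probably prime" answer can err with probability at most (1/4) per round:

```python
import random

def miller_rabin(n, trials=10):
    """Monte Carlo primality test: returns False only for composites;
    True means n is probably prime."""
    if n < 2:
        return False
    if n == 2 or n == 3:
        return True
    if n % 2 == 0:
        return False
    # Step 1: write n - 1 = 2^k * m with m odd
    k, m = 0, n - 1
    while m % 2 == 0:
        k += 1
        m //= 2
    for _ in range(trials):
        a = random.randint(2, n - 2)     # Step 2: random base
        b = pow(a, m, n)                 # Step 3: b = a^m mod n
        if b == 1 or b == n - 1:
            continue                     # this round: probably prime
        for _ in range(k - 1):
            b = pow(b, 2, n)             # b = b^2 mod n
            if b == n - 1:
                break
        else:
            return False                 # witness found: composite
    return True                          # probably prime
```

As in the lecture's examples, 97 is reported (probably) prime while the Carmichael number 561 is reported composite.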
NUMERICAL PROBABILISTIC ALGORITHMS
• For certain real life problems, computation of an exact solution is not
possible, maybe
• because of uncertainties in the experimental data to be used
• Because a digital computer cannot represent an irrational number exactly
• Because a precise answer will take too long to compute
• A numerical probabilistic algorithm will give an approximate answer
• The precision of the answer increases as more time is given to the
algorithm
Buffon Needle problem
• Buffon's needle is an early problem in geometrical probability that
was investigated experimentally in 1777 by the French naturalist and
mathematician Georges-Louis Leclerc, Comte de Buffon.
• It involves dropping a needle repeatedly onto a lined sheet of paper
and finding the probability of the needle crossing one of the lines on
the page. The result, surprisingly, is directly related to the value of pi.
• Dropping a needle many times onto lined paper gives an interesting
(but slow) way to estimate π.
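A numerical probabilistic sketch of the experiment: with the needle length equal to the line spacing, the crossing probability is 2/π, so counting crossings gives an estimate of π. The simulation below (parameter names are my own) uses a fixed seed so runs are repeatable:

```python
import math
import random

def buffon_pi(drops, seed=0):
    """Estimate pi by simulating needle drops. Needle length L equals
    the line spacing d (both 1 here), so P(cross) = 2/pi and the
    estimate is pi ~ 2 * drops / hits."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(drops):
        # distance from the needle's centre to the nearest line
        y = rng.uniform(0, 0.5)
        # acute angle between the needle and the lines
        theta = rng.uniform(0, math.pi / 2)
        # the needle crosses a line if its half-projection
        # perpendicular to the lines reaches the nearest line
        if y <= 0.5 * math.sin(theta):
            hits += 1
    return 2 * drops / hits
```

With 100,000 drops the estimate typically lands within a few hundredths of π, illustrating the point above: precision improves slowly as more time (more drops) is given to the algorithm.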