
MADF Unit IV

Prof. Amrita Naik


Assistant Professor
DBCE, Goa
String Matching Algorithm
• 3 types of pattern matching algorithms:
• 1.Brute Force Algorithm
• 2.KMP Algorithm
• 3.Boyer-Moore Algorithm
Brute Force Algorithm
for i = 0 to n - m
{
    for j = 0 to m - 1
    {
        if (txt[i + j] != pat[j])
            break;
    }
    if (j == m)
        print("Pattern found at index i")
}

Time complexity = O(nm)
Pattern Matching-Brute Force
KMP Matching
KMP Algorithm
Time Complexity of KMP
• The Knuth-Morris-Pratt algorithm performs
pattern matching on a text string of length n
and a pattern string of length m in O(n+m)
time.
• where the O(n) term is for searching the text and the
O(m) term is for creating the failure table
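A minimal KMP sketch in Python (an illustrative implementation, not taken from the slides): building the failure table accounts for the O(m) term and the left-to-right scan of the text for the O(n) term.

def kmp_search(text, pat):
    n, m = len(text), len(pat)
    if m == 0:
        return []
    # fail[j] = length of the longest proper prefix of pat[:j+1]
    # that is also a suffix of it.
    fail = [0] * m
    k = 0
    for j in range(1, m):
        while k > 0 and pat[j] != pat[k]:
            k = fail[k - 1]
        if pat[j] == pat[k]:
            k += 1
        fail[j] = k
    # Scan the text; the text index i never moves backwards.
    matches, j = [], 0
    for i in range(n):
        while j > 0 and text[i] != pat[j]:
            j = fail[j - 1]
        if text[i] == pat[j]:
            j += 1
        if j == m:
            matches.append(i - m + 1)   # occurrence starting at i - m + 1
            j = fail[j - 1]
    return matches

print(kmp_search("abacaabaccabacabaabb", "abacab"))   # [10]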
Boyer Moore
Boyer Moore
Time Complexity of Boyer Moore
• The worst-case running time of the BM
algorithm is O(nm)
• Average case time complexity is O(n)
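A simplified Boyer-Moore sketch in Python (illustrative): it uses only the bad-character (last-occurrence) heuristic and omits the good-suffix rule of the full algorithm, but it shows the right-to-left comparison and the skipping that gives the good average-case behaviour.

def bm_search(text, pat):
    n, m = len(text), len(pat)
    if m == 0 or m > n:
        return []
    last = {c: j for j, c in enumerate(pat)}   # last occurrence of each pattern character
    matches = []
    i = m - 1                                  # text index aligned with the last pattern character
    while i < n:
        j, k = m - 1, i                        # compare right to left
        while j >= 0 and text[k] == pat[j]:
            j -= 1
            k -= 1
        if j < 0:
            matches.append(k + 1)              # full match found
            i += 1                             # conservative shift after a match
        else:
            # Shift so the mismatched text character lines up with its
            # last occurrence in the pattern (or shift by one).
            i += max(1, j - last.get(text[k], -1))
    return matches

print(bm_search("a pattern matching example", "match"))   # [10]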
Tries
• A trie (pronounced "try") is a tree-based data
structure for storing strings in order to
support fast pattern matching.
• The main application of tries is in information
retrieval. Indeed, the name "trie" comes from
the word "retrieval".
• A trie stores a set of strings; each node of the tree
is assigned a letter, so each root-to-node path spells
out a prefix of a stored string.
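A minimal standard-trie sketch in Python (illustrative, not from the slides); insertion and lookup both run in time proportional to the length of the string. The word set below is just a hypothetical example.

class TrieNode:
    def __init__(self):
        self.children = {}     # letter -> child TrieNode
        self.is_word = False   # marks the end of a stored string

def trie_insert(root, word):
    node = root
    for ch in word:
        node = node.children.setdefault(ch, TrieNode())
    node.is_word = True

def trie_search(root, word):
    node = root
    for ch in word:
        if ch not in node.children:
            return False
        node = node.children[ch]
    return node.is_word

root = TrieNode()
for w in ["bear", "bell", "bid", "bull", "buy", "sell", "stock", "stop"]:
    trie_insert(root, w)
print(trie_search(root, "bell"))   # True
print(trie_search(root, "bel"))    # False ("bel" is only a prefix)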
Standard Trie Example

NOTE: No word in S should be a prefix of another


Compressed Tries
• A compressed trie is a compact representation of a
standard trie in which each chain of single-child nodes
is merged into one edge.
• Every internal node of a compressed trie has at least
two child nodes.


Compressed Tries Example
Example of Compressed Tries
Compact Representation of Compressed Tries
• Let the collection S of strings be an array of
strings S[0], S[1], ..., S[s-1]. Instead of
storing the label X of a node explicitly, it can
be represented implicitly by a triplet of
integers (i, j, k), such that X = S[i][j..k]; that is,
X is the substring of S[i] consisting of the
characters from the jth to the kth, inclusive.
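A small Python illustration of the (i, j, k) triplet idea; the collection S and the indices below are hypothetical examples chosen for illustration.

# Hypothetical string collection.
S = ["see", "bear", "sell", "stock"]

def label(triplet):
    # Recover a node label X = S[i][j..k] from its compact triplet.
    i, j, k = triplet
    return S[i][j:k + 1]          # characters j..k of S[i], inclusive

# A node labelled "ell" could be stored as the triplet (2, 1, 3)
# instead of the string "ell" itself.
print(label((2, 1, 3)))           # -> "ell"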
Compact Representation of Compressed Tries
Suffix Tries
• A suffix tree is the compressed trie of all suffixes of
a given text/string.
• Applications: pattern matching, solving the longest
common substring problem.

• Example: consider the word "minimize".

• The suffixes of "minimize" are: e, ze, ize, mize, imize,
nimize, inimize, minimize.
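A short Python sketch (illustrative only) that lists the suffixes of a string and checks a pattern the way a suffix trie is used conceptually: a pattern occurs in the text exactly when it is a prefix of some suffix. A real suffix trie answers the query in time proportional to the pattern length after preprocessing, instead of scanning all suffixes.

def suffixes(s):
    # All suffixes of s, shortest first: "e", "ze", ..., "minimize".
    return [s[i:] for i in range(len(s) - 1, -1, -1)]

def occurs(text, pattern):
    # Pattern matching via suffixes: occurs iff the pattern is a prefix of some suffix.
    return any(suf.startswith(pattern) for suf in suffixes(text))

print(suffixes("minimize"))
print(occurs("minimize", "nim"))   # True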
Suffix Trie for the string "minimize"
Text Compression
• Text compression is also useful for storing
collections of large documents more
efficiently, so as to allow for a fixed-capacity
storage device to contain as many documents
as possible.
Building Huffman Tree
• 1. Find the frequency of each character.
• 2. Arrange the characters in increasing order of frequency.
• 3. Repeatedly join the two lowest-frequency nodes into a new node whose frequency is their sum.
• 4. Form the tree; labelling left edges 0 and right edges 1 gives the codes.

Huffman Algorithm
Example
• T = {A, B, C}, F = {1, 2, 2}
• Step 1: join the two lowest-frequency nodes, A (1) and B (2), into a node A+B with frequency 3.
• Step 2: join C (2) and A+B (3) into the root A+B+C with frequency 5.
• Label each left edge 0 and each right edge 1: C is the left child of the root, and A and B are the left and right children of A+B.

Therefore the Huffman codes are A = 10, B = 11, C = 0.
Huffman Coding
Time Complexity of Huffman
• The time complexity of the Huffman algorithm
is O(nlogn)
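A heap-based Huffman sketch in Python (illustrative; it assumes the standard heapq module and breaks frequency ties arbitrarily, so the exact codes may differ from the worked example even though the code lengths agree). The O(n) heap operations, each costing O(log n), give the O(n log n) bound.

import heapq
from itertools import count

def huffman_codes(freq):
    # freq: dict mapping symbol -> frequency.
    tiebreak = count()                          # avoids comparing tree nodes directly
    heap = [(f, next(tiebreak), sym) for sym, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)       # two lowest-frequency nodes
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (left, right)))
    codes = {}
    def walk(node, code):
        if isinstance(node, tuple):             # internal node: recurse left (0) and right (1)
            walk(node[0], code + "0")
            walk(node[1], code + "1")
        else:
            codes[node] = code or "0"           # single-symbol edge case
    walk(heap[0][2], "")
    return codes

print(huffman_codes({"A": 1, "B": 2, "C": 2}))  # e.g. {'C': '0', 'A': '10', 'B': '11'}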
Text Similarity Testing
• String Subsequence:
• Given a string X of size n, a subsequence of X
is any string of the form X[i1] X[i2] ... X[ik],
with ij < ij+1 for j = 1, ..., k-1; that is, it is a
sequence of characters that are not
necessarily contiguous but are nevertheless
taken in order from X.
• For example, if W = abcd, then ab, bd and ac
are subsequences of W.
LCS - Longest Common Subsequence
Problem
• Given two character strings, X of size n and Y of size m,
over some alphabet, find a longest string S that is
a subsequence of both X and Y.
• Ex: W1={abcd} W2={bcd}
• Subsequence of
W1={a,ab,ac,ad,abc,bcd,bc,bd,b,c,d,cd,acd,abd,abcd}
• Subsequence of
W2={b,c,d,bc,cd,bd,bcd}
Then longest common subsequence={bcd}
• The brute-force approach yields an exponential
algorithm that runs in O(2^n · 2^m) time, which is very
inefficient.
LCS Algorithm
Longest Common Subsequence
• X = abaaba, Y = babbab

                Y
        *   b   a   b   b   a   b
    *   0   0   0   0   0   0   0
    a   0   0   1   1   1   1   1
    b   0   1   1   2   2   2   2
X   a   0   1   2   2   2   3   3
    a   0   1   2   2   2   3   3
    b   0   1   2   3   3   3   4
    a   0   1   2   3   3   4   4

The longest common subsequence = baba (length 4)
Time Complexity of LCS
• T(n)=O(nm)
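A Python sketch of the O(nm) dynamic-programming solution (an illustrative implementation of the table above, with a traceback that recovers one longest common subsequence).

def lcs(x, y):
    n, m = len(x), len(y)
    # L[i][j] = length of the LCS of x[:i] and y[:j].
    L = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if x[i - 1] == y[j - 1]:
                L[i][j] = L[i - 1][j - 1] + 1
            else:
                L[i][j] = max(L[i - 1][j], L[i][j - 1])
    # Trace back through the table to recover one LCS.
    out, i, j = [], n, m
    while i > 0 and j > 0:
        if x[i - 1] == y[j - 1]:
            out.append(x[i - 1])
            i -= 1
            j -= 1
        elif L[i - 1][j] >= L[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))

print(lcs("abaaba", "babbab"))   # a length-4 answer such as "baba"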
Optimization & Decision
Problems
⚫ Decision problems
◦ Given an input and a question regarding a problem, determine if the
answer is yes or no

⚫ Optimization problems
◦ Find a solution with the “best” or “optimum” value

⚫ Optimization problems can be cast as decision problems that are easier to study
◦ e.g.: Shortest path in Graph G
🞄 Find a path between u and v that uses the fewest edges (optimization)
🞄 Does a path exist from u to v consisting of at most k edges? (decision)
Hamiltonian Cycle
⚫ Optimization problem:
Given a directed graph G = (V, E), determine a cycle that
contains each and every vertex in V only once

[Figure: two example graphs, one Hamiltonian and one not Hamiltonian]

⚫ Decision problem:
Given a directed graph G = (V, E), is there a cycle that contains
each and every vertex in V only once?
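A small Python sketch (with illustrative names) of the polynomial-time verifier for the decision version; being able to check a proposed cycle quickly is what places the problem in NP, even though finding one appears hard.

def is_hamiltonian_cycle(vertices, edges, order):
    # order: a proposed visiting sequence of the vertices (the certificate).
    if sorted(order) != sorted(vertices):            # every vertex exactly once
        return False
    edge_set = set(edges)
    pairs = zip(order, order[1:] + order[:1])        # consecutive pairs, wrapping around
    return all(pair in edge_set for pair in pairs)   # every step is a directed edge

V = ["a", "b", "c", "d"]
E = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a"), ("a", "c")]
print(is_hamiltonian_cycle(V, E, ["a", "b", "c", "d"]))   # True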
Polynomial Algorithm and Exponential
Algorithm
Polynomial:
• Linear Search – O(n)
• Binary Search – O(log n)
• Bubble Sort – O(n^2)
• Merge Sort – O(n log n)
• Greedy Knapsack – O(n)
• Huffman Coding – O(n log n)

Exponential:
• N-Queens – O(n^n)
• Sum of Subsets – O(2^n)
• M-Coloring – O(n·m^n)
• Hamiltonian Cycle – O(n^n)
• 0-1 Knapsack – O(2^n)
• SAT
Deterministic and Non Deterministic
Algorithm
• Deterministic algorithms have a uniquely determined,
traceable sequence of steps.
• Non-deterministic algorithms contain steps whose
outcome is not uniquely determined, so their execution
cannot be traced.
• Although exponential algorithms cannot be converted
into polynomial algorithms, they can be expressed as
non-deterministic polynomial algorithms.
Example of Non –Deterministic Algorithm
• A non-deterministic search algorithm with a
non-deterministic time complexity of O(1):

Algorithm NPSearch(arr, n, k)
{
    i = Choice()        // non-deterministically guess an index; assumed to take O(1)
    if (k == arr[i])
        print("Found")
    else
        print("Not Found")
}
Class P and NP
⚫ Class P consists of problems solvable by a deterministic
algorithm in polynomial time.
⚫ Problems in P are called tractable.
⚫ Examples: O(n^2), O(n^3), O(1), O(n lg n)
⚫ Class NP consists of problems solvable by a non-deterministic
algorithm in polynomial time; equivalently, problems whose
solutions can be verified in polynomial time.
⚫ Problems with no known polynomial-time algorithm are
considered intractable, e.g. O(2^n), O(n^n), O(n!).
• Hamiltonian Path: given a graph G = (V, E),
determine a path that contains each and every vertex
in V exactly once.
• Traveling Salesman: find a minimum-weight
Hamiltonian path.
Review: P and NP
• What do we mean when we say a problem is
in P?
– A: A deterministic solution can be found in
polynomial time
• What do we mean when we say a problem is
in NP?
– A: A proposed solution can be verified in polynomial time
(equivalently, the problem can be solved by a non-deterministic
algorithm in polynomial time)
• What is the relation between P and NP?
– A: P ⊆ NP, but no one knows whether P = NP
Is P = NP?

⚫ Any problem in P is also in NP: P ⊆ NP
⚫ The big (and open) question is whether P = NP
◦ i.e., if it is always easy to check a solution, should it also be
easy to find a solution?
⚫ Most computer scientists believe that this is false, but we
do not have a proof.
Commonly Believed Relationship between P and NP

[Figure: P drawn as a proper subset of NP]
Satisfiability(SAT)
• I/P: Boolean formula
• O/P: Is the formula satisfiable?
SAT ∈ NP
Proof: the certificate is an assignment to the variables.
Verifier: uses this assignment and checks that the
formula evaluates to true (T).
Example: (x1 ∧ y1) ∨ (x2 ∧ y2) ∨ ... ∨ (xn ∧ yn)
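A Python sketch of such a verifier (illustrative; it assumes the formula is given in conjunctive normal form as lists of signed integers, which is not the slide's notation). Given an assignment as the certificate, it checks in time linear in the formula size that every clause contains a true literal.

def verify_sat(clauses, assignment):
    # clauses: CNF formula; literal k means variable k, -k means its negation.
    # assignment: dict mapping variable -> True/False (the certificate).
    def literal_value(lit):
        value = assignment[abs(lit)]
        return value if lit > 0 else not value
    # Satisfied iff every clause has at least one true literal.
    return all(any(literal_value(lit) for lit in clause) for clause in clauses)

# (x1 or not x2) and (x2 or x3)
formula = [[1, -2], [2, 3]]
print(verify_sat(formula, {1: True, 2: False, 3: True}))   # True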
3-SAT Example
Satisfiability
• SAT is taken as the base problem; the best known
algorithms for it have the same exponential time
complexity as those for the other hard problems.
• If SAT can be solved in polynomial time, then all
other problems related to it by reduction can also be
solved in polynomial time.
SAT problem can be related to other exponential problems

The 0-1 Knapsack problem can also be solved using the same
decision tree.
Reductions
⚫ A reduction is a way of converting any instance of problem A
into an instance of another problem B.
⚫ We can then solve A using an algorithm that solves B.
⚫ Idea: transform each input of A into an input of B with a
mapping f, run the algorithm for B, and return its yes/no
answer as the answer for A.

[Figure: input of A → f → Problem B → yes/no, returned as the answer for Problem A]
Reduction of Satisfiability
• Assume Satisfiability (SAT) is NP-hard.
• If SAT can be reduced to some other problem L, then L
also becomes NP-hard:
SAT ∝ L
• The conversion has to be done in polynomial time.
• If L can be solved in polynomial time, then SAT can
also be solved in polynomial time.
• Transitive property: if SAT ∝ L1 and L1 ∝ L2, then SAT ∝ L2.
NP-Completeness
[Figure: NP-complete problems shown inside NP, separate from P]

▪ SAT has a non-deterministic polynomial-time algorithm, so SAT ∈ NP;
since every problem in NP reduces to SAT (Cook's theorem), SAT is NP-complete.
▪ Any problem that has a non-deterministic polynomial-time algorithm
and to which SAT reduces is likewise NP-complete.
NP-Completeness

⚫ A problem B is NP-complete if:
(1) B ∈ NP
(2) SAT ∝ B
⚫ If B satisfies only property (2), we say that B is NP-hard.
⚫ No polynomial-time algorithm has been found for an NP-complete problem.
⚫ No one has ever proven that "no polynomial time algorithm
can exist for any NP-complete problem".
Relationship among P,NP,NP-Complete
and NP-Hard
[Figure: Venn diagram relating P, NP, NP-Complete and NP-Hard]
COOK’s Theorem
• If SAT has an efficient algorithm, then so does every
other problem in NP:

There is an efficient algorithm for SAT
⇒
There is an efficient algorithm for all problems in NP

• If SAT can be proved to be solvable in polynomial time,
then all other problems in NP can also be solved in
polynomial time.
NP Hard and NP-Complete
• A problem A is NP-hard if and only if
satisfiability reduces to A (SAT ∝ A).
• A problem A is NP-complete if and only if A is
NP-hard and A ∈ NP.
Randomized Algorithm
• A randomized algorithm is an algorithm that
employs a degree of randomness as part of its
logic.
Las Vegas and Monte Carlo
Probabilistic Algorithm
• A probabilistic algorithm is an algorithm
where the result and/or the way the result is
obtained depend on chance.
• These algorithms are also sometimes called
randomized.
• The techniques of applying probabilistic
algorithms to numerical problems were
originally called Monte Carlo methods.
Example of Monte Carlo Algorithm
Algorithm Rand_Primality_Test(n)
{
    Randomly choose a ∈ [2, n-1]
    Calculate a^(n-1) mod n
    if (a^(n-1) mod n == 1) then
        print("n is PROBABLY prime")
    else
        print("n is definitely not prime")
}

NOTE: This is Fermat's Primality Test
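A runnable Python version of the test above (illustrative; it uses Python's built-in pow for modular exponentiation and repeats the random trial a few times). Like the pseudocode, it is a Monte Carlo test: a "probably prime" answer can occasionally be wrong, but a "not prime" answer never is.

import random

def fermat_test(n, trials=5):
    if n < 4:
        return n in (2, 3)
    for _ in range(trials):
        a = random.randint(2, n - 2)
        if pow(a, n - 1, n) != 1:      # Fermat's little theorem is violated
            return False               # n is definitely not prime
    return True                        # n is probably prime

print(fermat_test(97))    # True  (97 is prime)
print(fermat_test(100))   # False (100 is composite)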


Approximation Algorithm

• Guaranteed to run in polynomial time
• Produces a solution that is close to the optimal solution, rather than exactly optimal
Approximation Algorithm
Text Books
• 1. Fundamentals of Computer Algorithms, E. Horowitz et al.,
2nd Edition, UP – NP-Hard, NP-Complete, Randomized Algorithms
• 2. Introduction to Algorithms, 3rd Edition, Thomas H Cormen,
Charles E Leiserson, Ronald L Rivest and Clifford Stein,
MIT Press/McGraw-Hill.
• 3. Algorithm Design, 1st Edition, Jon Kleinberg and Éva Tardos, Pearson.
Questions asked so far
