IR Assignment
SEQUENTIAL SEARCHING:
Exact string matching is the following problem: given a short pattern P of length m and a long text T of length n, find all the text positions where the pattern occurs. With minimal changes, this problem subsumes many basic queries, such as word, prefix, suffix, and substring search.
Types:
1. Brute-force search
2. Knuth-Morris-Pratt algorithm
3. Boyer-Moore algorithm
**************** Brute Force searching ***********
Brute-force search is a trivial but very general problem-solving technique that consists of systematically enumerating all possible candidates for the solution and checking whether each candidate satisfies the problem's statement. For example, a brute-force algorithm to find the divisors of a natural number n is to enumerate all integers from 1 to n, and check whether each of them divides n without remainder.
Basic Algorithm:
In order to apply brute-force search to a specific class of problems, one must implement four procedures: first, next, valid, and output. These procedures should take as a parameter the data P for the particular instance of the problem that is to be solved, and should do the following:
1. first(P): generate a first candidate solution for P.
2. next(P,c): generate the next candidate for P after the current one c.
3. valid(P,c): check whether candidate c is a solution for P.
4. output(P,c): use the solution c of P as appropriate to the application.
The next procedure must also tell when there are no more candidates for the instance P after the current one c. A convenient way to do that is to return a "null candidate", some conventional data value Λ that is distinct from any real candidate. Likewise, the first procedure should return Λ if there are no candidates at all for the instance P. The brute-force method is then expressed by the algorithm:
    c ← first(P)
    while c ≠ Λ do
        if valid(P,c) then output(P, c)
        c ← next(P,c)
For example, when looking for the divisors of an integer n, the instance data P is the number n. The call first(n) should return the integer 1 if n ≥ 1, or Λ otherwise; the call next(n,c) should return c + 1 if c < n, and Λ otherwise; and valid(n,c) should return true if and only if c is a divisor of n. (In fact, if we choose Λ to be n + 1, the tests n ≥ 1 and c < n are unnecessary.)
The brute-force search algorithm above will call output for every candidate that is a solution to the given instance P. The algorithm is easily modified to stop after finding the first solution.
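The four procedures and the divisor example can be sketched in Python roughly as follows. This is a minimal illustration, not a canonical implementation: the names brute_force, next_ and divisors are chosen here for the example, and Python's None is assumed to play the role of the null candidate.

```python
def brute_force(problem, first, next_, valid, output):
    """Generic brute-force loop: enumerate candidates until the null candidate."""
    c = first(problem)
    while c is not None:          # None stands in for the null candidate
        if valid(problem, c):
            output(problem, c)
        c = next_(problem, c)


def divisors(n):
    """Collect the divisors of n by testing every integer from 1 to n."""
    found = []
    brute_force(
        n,
        first=lambda n: 1 if n >= 1 else None,       # no candidates when n < 1
        next_=lambda n, c: c + 1 if c < n else None,  # stop after candidate n
        valid=lambda n, c: n % c == 0,                # c divides n exactly
        output=lambda n, c: found.append(c),          # record the solution
    )
    return found


print(divisors(12))  # [1, 2, 3, 4, 6, 12]
```

Passing the four procedures as arguments keeps the loop itself independent of the problem being solved, which is the point of the scheme described above.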
[Figure: alignments of the pattern EXAMPLE against the text]
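Applied to exact string matching, the brute-force idea amounts to trying every text position as a possible start of the pattern. The sketch below assumes zero-based positions and the sample strings "HERE IS A SIMPLE EXAMPLE" and "EXAMPLE"; the function name is made up for illustration.

```python
def brute_force_match(pattern, text):
    """Return every zero-based position in text where pattern occurs."""
    m, n = len(pattern), len(text)
    positions = []
    for start in range(n - m + 1):       # each candidate alignment
        j = 0
        # compare character by character until a mismatch or a full match
        while j < m and text[start + j] == pattern[j]:
            j += 1
        if j == m:
            positions.append(start)
    return positions


print(brute_force_match("EXAMPLE", "HERE IS A SIMPLE EXAMPLE"))  # [17]
```

Each of the n - m + 1 alignments may need up to m comparisons, so this approach takes O(nm) time in the worst case.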
**************** Knuth-Morris-Pratt algorithm ***********
algorithm kmp_search:
    input:
        an array of characters, S (the text to be searched)
        an array of characters, W (the word sought)
    output:
        an integer (the zero-based position in S at which W is found)
    define variables:
        an integer, m ← 0 (the beginning of the current match in S)
        an integer, i ← 0 (the position of the current character in W)
        an array of integers, T (the table, computed elsewhere)
    while m + i is less than the length of S, do:
        if W[i] = S[m + i],
            if i equals (length of W) - 1,
                return m
            let i ← i + 1
        otherwise,
            let m ← m + i - T[i]
            if T[i] is greater than -1,
                let i ← T[i]
            else
                let i ← 0
    (if we reach here, we have searched all of S unsuccessfully)
    return the length of S
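A runnable Python sketch of this search is given below. The source says the table T is "computed elsewhere", so the kmp_table function here is an assumption: it uses one common construction in which T[0] = -1, matching the test "T[i] is greater than -1" in the pseudocode. The handling of an empty word is also an assumption.

```python
def kmp_table(W):
    """Build the partial-match table (assumed construction, T[0] = -1)."""
    T = [0] * len(W)
    if W:
        T[0] = -1
    pos, cnd = 2, 0               # pos: entry being filled, cnd: candidate prefix length
    while pos < len(W):
        if W[pos - 1] == W[cnd]:  # the matched prefix grows by one character
            cnd += 1
            T[pos] = cnd
            pos += 1
        elif cnd > 0:             # fall back to a shorter candidate prefix
            cnd = T[cnd]
        else:                     # no prefix matches here
            T[pos] = 0
            pos += 1
    return T


def kmp_search(S, W):
    """Return the zero-based position of W in S, or len(S) if W is absent."""
    if not W:                     # assumption: an empty word matches at position 0
        return 0
    T = kmp_table(W)
    m = i = 0                     # m: start of current match in S, i: position in W
    while m + i < len(S):
        if W[i] == S[m + i]:
            if i == len(W) - 1:
                return m
            i += 1
        else:
            m = m + i - T[i]      # uses the old value of i
            i = T[i] if T[i] > -1 else 0
    return len(S)


print(kmp_search("ABC ABCDAB ABCDABCDABDE", "ABCDABD"))  # 15
```

As in the pseudocode, an unsuccessful search returns the length of S rather than -1.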
[Figure: successive Knuth-Morris-Pratt alignments of the pattern EXAMPLE against the text]
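The Knuth-Morris-Pratt search algorithm has complexity O(k), where k is the length of S and O is big-O notation. Each iteration of the loop either increases m + i (on a match) or increases m (on a mismatch), neither quantity ever decreases, and both stay below k while the loop runs. Thus the loop executes at most 2k times, showing that the time complexity of the search algorithm is O(k).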
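The 2k bound can be checked empirically by counting loop iterations. The sketch below repeats the search loop with a counter added; the table construction (with T[0] = -1) and the sample inputs are assumptions made for this illustration, and W is assumed to be non-empty.

```python
def kmp_table(W):
    """Partial-match table with T[0] = -1 (assumed construction)."""
    T = [0] * len(W)
    T[0] = -1
    pos, cnd = 2, 0
    while pos < len(W):
        if W[pos - 1] == W[cnd]:
            cnd += 1
            T[pos] = cnd
            pos += 1
        elif cnd > 0:
            cnd = T[cnd]
        else:
            T[pos] = 0
            pos += 1
    return T


def kmp_search_counting(S, W):
    """Return (position, iterations); position is len(S) when W is absent."""
    T = kmp_table(W)
    m = i = iterations = 0
    while m + i < len(S):
        iterations += 1               # one pass through the loop body
        if W[i] == S[m + i]:
            if i == len(W) - 1:
                return m, iterations
            i += 1                    # m + i grows by one
        else:
            m = m + i - T[i]          # m grows by at least one
            i = T[i] if T[i] > -1 else 0
    return len(S), iterations


for S, W in [("a" * 16, "aaab"), ("ABC ABCDAB ABCDABCDABDE", "ABCDABD")]:
    position, iterations = kmp_search_counting(S, W)
    assert iterations <= 2 * len(S)   # the bound argued above
    print(len(S), position, iterations)
```

The first input, a run of identical characters searched for a word that almost matches, is the kind of case where brute-force matching repeats many comparisons, yet the iteration count here still stays within 2k.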
**************** Boyer-Moore algorithm ***********
Either rule alone (the bad character rule or the good suffix rule) works, but they are more effective together.
Algorithm (in this listing n is the length of P and m is the length of T):
    // Preprocessing:
    Compute R(x) for each x ∈ Σ;
    Compute L'(i) and l'(i) for each i = 2, ..., n + 1;
    // Search:
    k := n;
    while k ≤ m do
        i := n; h := k;
        while i > 0 and P[i] = T[h] do
            i := i - 1; h := h - 1;
        endwhile;
        if i = 0 then
            Report an occurrence at T[h + 1 ... k];
            k := k + n - l'(2);
        else // mismatch at P[i]
            Increase k by the maximum shift given by the bad character rule and the good suffix rule;
        endif;
    endwhile;
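Since either rule alone is enough for correctness, a simplified variant can be sketched with only the bad character rule. The code below is not the full algorithm listed above: it omits the good suffix rule, uses zero-based indices, shifts by one position after each occurrence, and treats an empty pattern as having no occurrences. All of these are assumptions made to keep the example short.

```python
def bad_character_table(pattern):
    """R(x): zero-based index of the rightmost occurrence of x in pattern."""
    return {ch: idx for idx, ch in enumerate(pattern)}


def boyer_moore_bad_char(pattern, text):
    """Return all zero-based occurrences of pattern in text (bad character rule only)."""
    n, m = len(pattern), len(text)    # n = |P|, m = |T|, as in the listing
    if n == 0:
        return []                     # assumption: empty pattern has no occurrences
    R = bad_character_table(pattern)
    occurrences = []
    k = n - 1                         # text index aligned with the last pattern character
    while k < m:
        i, h = n - 1, k
        while i >= 0 and pattern[i] == text[h]:   # compare right to left
            i -= 1
            h -= 1
        if i < 0:
            occurrences.append(h + 1)
            k += 1                    # without the good suffix rule, shift by one
        else:
            # align the rightmost occurrence of text[h] in the pattern with
            # text[h]; shift by at least one position
            k += max(1, i - R.get(text[h], -1))
    return occurrences


print(boyer_moore_bad_char("EXAMPLE", "HERE IS A SIMPLE EXAMPLE"))  # [17]
```

Comparing from the right end of the pattern is what allows large shifts: a text character that does not appear in the pattern at all lets the whole pattern jump past it.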
[Figure: Boyer-Moore alignments of the pattern EXAMPLE against the text HERE IS A SIMPLE EXAMPLE]
The bad character and good suffix rules can be shown to lead to linear-time behavior, but only if P does not occur in T. Otherwise the worst-case complexity is still O(nm).