Application of a Modified Convolution Method to Exact String Matching

K.W. Liu (1), R.C.T. Lee (2) and C.H. Huang (3)*
(1, 2) Department of Computer Science, National Chi Nan University, Puli, Nantou Hsien, Taiwan 545
(3) Department of Computer Science and Information Engineering, National Formosa University, 64, Wen-Hwa Road, Hu-wei, Yun-Lin, Taiwan 632
* Corresponding author: [email protected]

Abstract
The exact string matching problem is to find all locations at which a query of length m matches a substring of a text of length n. In this paper, we first find all suffixes of the query that occur at the relevant positions of the text, and then look backward to find the corresponding prefixes of the query in the text. To do this, we make use of a wide window [R.1] whose size is equal to 2m-1. All calculations are carried out with CPU logic operations, which we use directly to speed up the matching. Because the query can be represented as a bit vector, the space requirement is small. We obtain an optimal solution for exact string matching.

Keywords: convolution method; bit-vector; Wide Window approach; String matching

1. Introduction

Many approaches use automata [R.4] and suffix trees [R.5] to perform exact string matching. Automaton and suffix tree methods have the advantage of a clear theoretical explanation, but they require complicated programming to implement. In practice, we can use the convolution method [R.2] for exact string matching, but the original convolution method has the disadvantage of high space and time complexity. In this paper, we use bits (1 and 0) and bit-level operations (AND and SHIFT) to simulate the convolution method. Bit-level operations are among the fastest operations a CPU performs.

Good preprocessing of the pattern speeds up string matching, but most string matching algorithms require complicated preprocessing of the pattern. In this paper, the preprocessing is simple: we use bits (0, 1) to represent the pattern. The time complexity of our preprocessing is O(m), and its space complexity is O(m) bits.

The exact string matching problem is defined as follows. Given a text string T = t1t2t3...tn and a pattern string P = p1p2p3...pm, where the length of the pattern string is always smaller than the length of the text string, find all occurrences of the pattern string within the text string.

Example: Given a text string T and a pattern string P,
T = ababababaabbaabbabababa
P = aabbaabb

The sliding window method is a very simple way to solve the exact string matching problem. See Figure 1.1 for an example.

Figure 1.1
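As a point of reference, the sliding window method can be sketched in a few lines of Python. This is only an illustration of the baseline, not code from the paper: the function name sliding_window_search is hypothetical, and positions are 0-based here whereas the paper counts from 1.

```python
def sliding_window_search(t, p):
    """Return the 0-based start of every occurrence of p in t.

    The pattern is compared against every window of length len(p),
    so the work is O(n * m) character comparisons in the worst case.
    """
    n, m = len(t), len(p)
    if m == 0 or m > n:
        return []
    return [s for s in range(n - m + 1) if t[s:s + m] == p]


if __name__ == "__main__":
    # The example strings from the introduction.
    print(sliding_window_search("ababababaabbaabbabababa", "aabbaabb"))
```

For the example strings above, the only occurrence starts at 0-based position 8, that is, at the ninth character of T.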

2. Basic Idea of the Wide-Window Approach

We open a window of size 2|P|-1 on the text string and divide it into two parts. We denote the first part as T1 and the second part as T2. The length of T1 is |P|-1. The length of T2 is |P|.

Figure 2.1

Since |T1| < |P|, any occurrence of P inside the window must extend into T2, so some suffix of P must be a prefix of T2. First we find all prefixes of T2 which are also suffixes of P. We can then be sure that one part of T2 can be ignored, as shown in the following figure.

Figure 2.2

For every prefix of T2 which is a suffix of P, we check whether there exists a suffix of T1 which is also a prefix of P, as shown in Figure 2.3.

Figure 2.3

To simplify the description, we give the following definitions.

Definition 1. The n-suffix of a string S is the suffix of S with length n, where 0 ≤ n ≤ |S|.
Example: S = abcde
0-suffix of S = (the empty string)
1-suffix of S = e
2-suffix of S = de
3-suffix of S = cde
4-suffix of S = bcde

Definition 2. The n-prefix of a string S is the prefix of S with length n, where 0 ≤ n ≤ |S|.
Example: S = abcde
0-prefix of S = (the empty string)
1-prefix of S = a
2-prefix of S = ab
3-prefix of S = abc
4-prefix of S = abcd
5-prefix of S = abcde

See Figure 2.4 and Figure 2.5 for an example of the wide window approach. Given:
T = aababcbdcea
P = abcbd
Let us produce a wide window whose length is |P|-1+|P| = 2|P|-1. In this case, |P| = 5 and 2|P|-1 = 9.

Figure 2.4

We first find all prefixes of T2 which are equal to some suffixes of P. In this case, we obtain bcbd, whose length is 4: the 4-suffix of P is the 4-prefix of T2.
4-suffix of P = bcbd
4-prefix of T2 = bcbd
|P|-4 = 5-4 = 1
If the 1-suffix of T1 is the 1-prefix of P, we have found a match.
1-suffix of T1 = a
1-prefix of P = a
The 1-suffix of T1 equals the 1-prefix of P. Thus we conclude that a match is found.

Figure 2.5

The next question is how to find all prefixes of T2 which are equal to some suffixes of P. The convolution method can be used for this.
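Before turning to the convolution method, the wide window check itself can be sketched with plain string comparisons. The sketch below is an illustration under assumptions, not the paper's implementation: the function name matches_in_window and its return convention (how many characters of each match lie in T1) are hypothetical.

```python
def matches_in_window(t1, t2, p):
    """Check one wide window split into t1 (|P|-1 chars) and t2 (up to |P| chars).

    For every k such that the k-prefix of t2 equals the k-suffix of p,
    test whether the (|P|-k)-suffix of t1 equals the (|P|-k)-prefix of p.
    Returns, for each match found, the number of its characters lying in t1.
    """
    m = len(p)
    found = []
    for k in range(1, min(m, len(t2)) + 1):
        if t2[:k] != p[m - k:]:
            continue  # the k-prefix of T2 is not a suffix of P
        # endswith("") is True, so k == m needs no special case.
        if t1.endswith(p[:m - k]):
            found.append(m - k)
    return found


if __name__ == "__main__":
    # First window of T = aababcbdcea for P = abcbd: T1 = aaba, T2 = bcbdc.
    print(matches_in_window("aaba", "bcbdc", "abcbd"))
```

For the window of the example, the single match has one character in T1 and four in T2, in agreement with the 4-suffix and 1-prefix found above.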

3. The Modified Convolution Method

We may use the convolution method to find all prefixes of T2 which are equal to some suffixes of P. Consider Figure 3.1. Given:
T2 = bcbdc, P = abcbd, T2r = cdbcb (T2 reversed)

Figure 3.1

We check the numbers among the results. If a value is equal to its position, we conclude that a suffix of P is equal to a prefix of T2. The convolution method works, but its complexity is not good enough: it takes O(n²) time and O(mn) additional space. To reduce this overhead, we propose a modified convolution method. As in Figure 3.2, the multiplication and addition operations can be replaced with SHIFT and AND operations to speed up the computation. That is, we may use the logic operator AND (&) to find all prefixes of T2 which are equal to some suffixes of P.
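The "value equals its position" test can be illustrated with a direct count of agreeing characters for each overlap length. This is a sketch of the idea only, written under the assumption that position k in Figure 3.1 corresponds to overlapping the k-prefix of T2 with the k-suffix of P; the function names are hypothetical and no multiplication-based convolution is performed.

```python
def overlap_match_counts(t2, p):
    """counts[k-1] = number of positions at which the k-prefix of t2
    agrees with the k-suffix of p, for k = 1 .. min(|t2|, |p|)."""
    m = len(p)
    counts = []
    for k in range(1, min(m, len(t2)) + 1):
        agree = sum(1 for i in range(k) if t2[i] == p[m - k + i])
        counts.append(agree)
    return counts


def suffix_prefix_hits(t2, p):
    """Overlap lengths k whose count equals k, i.e. full agreement."""
    counts = overlap_match_counts(t2, p)
    return [k for k, c in enumerate(counts, start=1) if c == k]


if __name__ == "__main__":
    print(overlap_match_counts("bcbdc", "abcbd"))
    print(suffix_prefix_hits("bcbdc", "abcbd"))
```

For T2 = bcbdc and P = abcbd the counts are 0, 1, 0, 4, 0, so only position 4 carries a value equal to its position. The nested loops make the quadratic cost per window visible, which is the overhead the AND and SHIFT formulation is meant to avoid.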

Figure 3.3

4. The Algorithm

Let us consider the following case:
T = bcbdc
P = abcbd
Our job is to determine whether there is a prefix of T which is a suffix of P. Indeed, in this case, the 4-prefix of T (bcbd) is also the 4-suffix of P. As indicated before, we may use the modified convolution method.

Figure 4.1

Definition 3. Given a string S = s1s2...sn and a character α, the α-bit pattern of S is defined as b1b2...bn, where bi = 1 if si = α and bi = 0 otherwise.

Taking S = abcbd as an example:
a-bit pattern of S = 1 0 0 0 0
b-bit pattern of S = 0 1 0 1 0
c-bit pattern of S = 0 0 1 0 0
d-bit pattern of S = 0 0 0 0 1

We can now observe that
1. V1 = b-bit pattern of P, as we are comparing T[1] = b with P,
2. V2 = c-bit pattern of P, as we are comparing T[2] = c with P,
3. V3 = b-bit pattern of P, as we are comparing T[3] = b with P,
4. V4 = d-bit pattern of P, as we are comparing T[4] = d with P,
5. V5 = c-bit pattern of P, as we are comparing T[5] = c with P.

Figure 3.2

Similarly, we may use the modified convolution method to find all suffixes of T1 which are equal to some prefixes of P, as in Figure 3.3. Here T1 = aaba, P = abcbd, and the reversed pattern Pr = dbcba.
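Definition 3 and the AND and SHIFT steps can be sketched as follows. This is one possible reading of the bit pattern approach, not the authors' C code: the function names are hypothetical, each bit pattern is stored in a Python integer with bit 0 standing for the first character, and the state after k characters marks the positions of P at which the k-prefix of T ends.

```python
def char_bit_patterns(p):
    """Bit j of masks[c] is 1 iff p[j] == c (bit 0 is the first character)."""
    masks = {}
    for j, ch in enumerate(p):
        masks[ch] = masks.get(ch, 0) | (1 << j)
    return masks


def as_bits(mask, m):
    """Render a mask left to right, first character first, as in Definition 3."""
    return " ".join("1" if (mask >> j) & 1 else "0" for j in range(m))


def prefix_suffix_lengths(t, p):
    """All k such that the k-prefix of t equals the k-suffix of p."""
    m = len(p)
    if m == 0:
        return []
    masks = char_bit_patterns(p)
    full = (1 << m) - 1      # m ones
    top = 1 << (m - 1)       # bit of the last character of p
    state = full             # before any character, every alignment is alive
    found = []
    for k, ch in enumerate(t[:m], start=1):
        if k > 1:
            state = (state << 1) & full   # SHIFT: advance every alignment
        state &= masks.get(ch, 0)         # AND with V_k, the bit pattern of t[k]
        if state == 0:
            break
        if state & top:                   # an alignment reached the end of p
            found.append(k)
    return found


if __name__ == "__main__":
    print(as_bits(char_bit_patterns("abcbd")["b"], 5))
    print(prefix_suffix_lengths("bcbdc", "abcbd"))   # prefix of T2 vs suffix of P
    print(prefix_suffix_lengths("abaa", "dbcba"))    # reversed T1 vs reversed P
```

The last call handles the T1 side by reversing both strings: a k-suffix of T1 equals a k-prefix of P exactly when the k-prefix of the reversed T1 equals the k-suffix of Pr. Python integers have arbitrary precision, so one mask covers any pattern length; a C implementation with patterns longer than a machine word would need several words per mask.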

We are ready to present our algorithm.

The KRC Algorithm

Input: T = t1t2...tn, P = p1p2...pm
Output: All occurrences of P in T
Preprocessing:
    Find the character set of P
    Build the character_bit patterns of P and the character_rbit patterns of the reversed P
Searching:
    For each k = 1, ..., n/m do
        Open a wide window whose length is 2m-1 and whose center point is at position km of T
        Let the window be denoted as a1a2...a2m-1
        Let a1a2...am-1 be denoted as T1
        Let amam+1...a2m-1 be denoted as T2
        Find all prefixes of T2 which are suffixes of P by using the bit pattern approach.
        For every prefix of T2 which is a suffix of P, check whether there exists a suffix of T1 which is also a prefix of P by using the bit pattern approach. If there is one, a match is found.
    End For

End

In the following, we give an example to explain our algorithm step by step.

Example:
T = aababcbdc
P = abcbd
Let us produce a wide window whose length is |P|-1+|P| = 2|P|-1. In this case, |P| = 5 and 2|P|-1 = 9.

Figure 4.2

Preprocessing step

Build the character bit patterns of P = abcbd. P is composed of a, b, c, b, d, so the character set of P is {a, b, c, d}.
a_bit = 1 0 0 0 0
b_bit = 0 1 0 1 0
c_bit = 0 0 1 0 0
d_bit = 0 0 0 0 1
Having constructed the character bit patterns of P, we may use them to find whether a prefix of T2 is equal to a suffix of P.

Find all character bit patterns of the reversed P. The reversed P is composed of d, b, c, b, a.
a_rbit = 0 0 0 0 1
b_rbit = 0 1 0 1 0
c_rbit = 0 0 1 0 0
d_rbit = 1 0 0 0 0
Having constructed the character bit patterns of the reversed P, we may use them to find whether a suffix of T1 is equal to a prefix of P.

Searching Step

We use the character bit patterns of P to find whether a prefix of T2 is equal to a suffix of P, as shown in Figure 4.3 to Figure 4.7.

Figure 4.3

Figure 4.4

Figure 4.5

Figure 4.6

Figure 4.7

We have found one suffix, which is the 4-suffix. The corresponding prefix which we need to find is the (|P|-4)-prefix. If we find it, we have a match. We use the character bit patterns of the reversed P to find whether a suffix of T1 is equal to a prefix of P, as shown in Figure 4.8 to Figure 4.11.

Figure 4.9

Figure 4.10

Figure 4.11
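Putting the preprocessing and searching steps together, the whole procedure might look like the sketch below. It is an interpretation under stated assumptions rather than the authors' implementation: all function names are hypothetical, positions are 0-based, the number of windows is taken as floor(n/m) (every occurrence of P covers exactly one multiple of m), and bit patterns are Python integers with bit 0 standing for the first character.

```python
def char_bit_patterns(p):
    """Bit j of masks[c] is 1 iff p[j] == c (bit 0 is the first character)."""
    masks = {}
    for j, ch in enumerate(p):
        masks[ch] = masks.get(ch, 0) | (1 << j)
    return masks


def prefix_suffix_lengths(t, masks, m):
    """All k such that the k-prefix of t equals the k-suffix of the
    length-m string that masks was built from, using AND and SHIFT only."""
    full = (1 << m) - 1
    top = 1 << (m - 1)
    state = full
    found = set()
    for k, ch in enumerate(t[:m], start=1):
        if k > 1:
            state = (state << 1) & full
        state &= masks.get(ch, 0)
        if state == 0:
            break
        if state & top:
            found.add(k)
    return found


def krc_search(t, p):
    """Return the 0-based start of every occurrence of p in t."""
    n, m = len(t), len(p)
    if m == 0 or m > n:
        return []
    bit = char_bit_patterns(p)           # character_bit patterns of P
    rbit = char_bit_patterns(p[::-1])    # character_rbit patterns of reversed P
    matches = []
    for k in range(1, n // m + 1):       # assumption: floor(n/m) windows
        c = k * m - 1                    # 0-based centre (position km in the paper)
        t1 = t[c - (m - 1):c]            # the m-1 characters before the centre
        t2 = t[c:c + m]                  # up to m characters from the centre
        suffix_hits = prefix_suffix_lengths(t2, bit, m)
        if not suffix_hits:
            continue
        # Suffixes of T1 that are prefixes of P, found on the reversed strings.
        prefix_hits = prefix_suffix_lengths(t1[::-1], rbit, m)
        for j in sorted(suffix_hits):
            need = m - j                 # characters of the match that lie in T1
            if need == 0 or need in prefix_hits:
                matches.append(c - need)
    return matches


if __name__ == "__main__":
    print(krc_search("aababcbdcea", "abcbd"))
    print(krc_search("ababababaabbaabbabababa", "aabbaabb"))
```

For T = aababcbdcea and P = abcbd the first window yields the 4-suffix in T2 and the 1-prefix in T1, so the match starts one character before the centre, at 0-based position 3.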

Analysis of Complexities

In the preprocessing, we build the bit patterns of the pattern P and the bit patterns of the reversed P. Each bit pattern has length |P|.

Proposition 1. The space complexity for preprocessing is O(m) bits.

Preprocessing takes linear time, since the entire preprocessing only needs to read P once.

Proposition 2. The time complexity for preprocessing is O(m).

The length of the text string |T| is n and the length of the pattern string |P| is m. Therefore we have n/m wide windows. For each wide window, we need 2m comparisons in the worst case. So the total time needed is O(n) in the worst case (2m × n/m = 2n).

Proposition 3. The time complexity for searching is O(n).

Figure 4.8

To sum up, we have the following lemma.

Lemma 1. The KRC algorithm runs in O(n) time with O(m) additional space.

Experimental results

We implemented our algorithm in the C programming language. In our experiments, the total number of character comparisons made by the KMP string matching algorithm is at least the length of the text string, and can be almost twice that; it does not depend on the length of the pattern string. KMP, however, remembers the substring of the text which has recently been compared. We therefore compared our algorithm with the BM string matching algorithm. We used many DNA sequences in our experiments. The total number of character comparisons made by our algorithm is less than that of BM. This suggests that our algorithm performs better than the BM and KMP methods in terms of character comparisons. The following is the result of our experiment. The x-axis is the length of the pattern string. The y-axis is the total number of character comparisons.

Figure: experimental results. Total number of comparisons (y-axis, 0 to 12000) against the length of the pattern |P| (x-axis, 110 to 910). The solid line is the result of the BM method. The dotted line is the result of our algorithm. Our algorithm makes fewer character comparisons than the BM method.

6. Summary
We use bit-level operations, and our algorithm is very easy to implement. The time complexity of our algorithm is O(n), so it is optimal among exact string matching algorithms. In future work, we will try to apply the modified convolution method to multiple string matching and approximate string matching.

References

[R.1] Longtao He, Binxing Fang and Jie Sui. The wide window string matching algorithm. Theoretical Computer Science, 2005.
[R.2] B.H. Wu and R.C.T. Lee. Convolution and its applications to sequence analysis, 2004.
[R.3] Gene Myers. A fast bit-vector algorithm for approximate string matching based on dynamic programming. Journal of the ACM, 1999.
[R.4] Mark Nelson. Fast String Searching with Suffix Trees. Dr. Dobb's Journal, 1996.
[R.5] M. Crochemore, W. Rytter. Jewels of Stringology. World Scientific, Singapore, 2002.
