Data Structures: Chapter
Chapter 03
9/1/2024 Md. Golam Moazzam, Dept. of CSE, JU 1
Data Structures- Chapter 3
String
– A finite sequence S of zero or more characters is called a string.
– The number of characters in a string is called its length.
– The string with zero characters is called the empty string or null string.
Examples: ‘THE END’, ‘’ and ‘--’ are strings with lengths 7, 0 and 2 respectively.
– Let S1 and S2 be strings. The string consisting of the characters of S1
followed by the characters of S2 is called the concatenation of S1 and
S2. It will be denoted by S1//S2. Example: ‘THE’ // ‘END’ =
‘THEEND’.
– A string Y is called a substring of a string S, if there exist strings X and
Z such that S= X // Y // Z. If X is an empty string, Y is called an initial
substring of S. If Z is an empty string, Y is called a terminal substring
of S.
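The definitions above map onto built-in string operations in most languages. A minimal Python sketch follows; the helper names (`concat`, `is_substring`, and so on) are illustrative and are not part of the notation used in these notes.

```python
def concat(s1: str, s2: str) -> str:
    """S1 // S2: the characters of s1 followed by the characters of s2."""
    return s1 + s2


def is_substring(y: str, s: str) -> bool:
    """True if s = X // y // Z for some strings X and Z."""
    return y in s


def is_initial_substring(y: str, s: str) -> bool:
    """True if s = y // Z, i.e. X is the empty string."""
    return s.startswith(y)


def is_terminal_substring(y: str, s: str) -> bool:
    """True if s = X // y, i.e. Z is the empty string."""
    return s.endswith(y)


print(len("THE END"), len(""), len("--"))   # lengths 7, 0 and 2
print(concat("THE", "END"))                 # THEEND
```

Note that under these definitions the empty string is an initial and a terminal substring of every string.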
Strings are stored in three types of structures:
• Fixed-length structures,
• Variable-length structures with fixed maximums and
• Linked structures.
– Disadvantages:
• Time is wasted reading an entire record if most of the storage
consists of inessential blank space.
• Certain records may require more space than available.
• When the correction consists of more or fewer characters than the
original text, changing a misspelled word requires the entire record
to be changed.
Suppose we want to insert a new record. This would require that all
succeeding records be moved to new memory locations. This disadvantage
can be easily remedied by using a linear array POINT which gives the address
of each successive record, so that the records need not be stored in
consecutive locations in memory. Accordingly, inserting a new record will
require only an update of the array POINT.
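The role of POINT can be illustrated with a small sketch. It assumes a dictionary standing in for memory and made-up addresses (100, 340, 720); only the name POINT comes from the notes.

```python
# Records live at arbitrary "addresses"; a dict stands in for memory here.
memory = {100: "RECORD A", 340: "RECORD C"}
# point[i] holds the address of the i-th record in logical order.
point = [100, 340]


def insert_record(memory, point, position, address, record):
    """Store record at a free address and make it record number `position` (0-based)."""
    memory[address] = record          # no existing record is moved
    point.insert(position, address)   # only the list of addresses is updated


def records_in_order(memory, point):
    """Follow the addresses in point to list the records in logical order."""
    return [memory[address] for address in point]


insert_record(memory, point, 1, 720, "RECORD B")
print(records_in_order(memory, point))
```

The records themselves stay where they are; only the much smaller array of addresses is rearranged.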
• One can use a marker, such as two dollar signs ($$), to signal the
end of the string.
• One can list the length of the string as an additional item in the
pointer array.
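The two conventions can be compared in a short sketch. It assumes the strings are packed into a single character store and that positions are 0-based; the function names and sample data are hypothetical.

```python
def read_with_marker(storage: str, start: int, marker: str = "$$") -> str:
    """Read from `start` up to, but not including, the end marker."""
    end = storage.index(marker, start)   # raises ValueError if the marker is missing
    return storage[start:end]


def read_with_length(storage: str, start: int, length: int) -> str:
    """Read exactly `length` characters beginning at `start`."""
    return storage[start:start + length]


marked = "TO BE$$OR NOT$$"
print(read_with_marker(marked, 0))   # TO BE
print(read_with_marker(marked, 7))   # OR NOT

packed = "TO BEOR NOT"
pointer_array = [(0, 5), (5, 6)]     # (start, length) pairs
print([read_with_length(packed, s, n) for s, n in pointer_array])
```

The marker approach must scan for the end of each string, while the length approach needs extra space in the pointer array but can read a string directly.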
• Computers are being used very frequently today for word processing, i.e.,
for inputting, processing and outputting printed matter. Therefore, the
computer must be able to correct the printed matter, which usually means
deleting, changing and inserting words, phrases, sentences and even
paragraphs in the text. Accordingly, for most extensive word processing
applications, strings are stored by means of linked lists.
Figure 3.7 (a) shows how the string would appear in memory with one
character per node, and Fig. 3.7 (b) shows how it would appear with four
characters per node.
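A linked string with a configurable number of characters per node might be sketched as follows. The `Node` class, the function names and the sample string are assumptions made for illustration; the figure itself is not reproduced here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    chars: str                      # up to `per_node` characters
    next: Optional["Node"] = None   # link to the next node, or None at the end


def build_linked_string(text: str, per_node: int = 1) -> Optional[Node]:
    """Split text into chunks of `per_node` characters and link them in order."""
    chunks = [text[i:i + per_node] for i in range(0, len(text), per_node)]
    head = None
    # Build from the last chunk backwards so each node can point at its successor.
    for chunk in reversed(chunks):
        head = Node(chunk, head)
    return head


def node_contents(head: Optional[Node]) -> list:
    """Walk the list and collect the characters held in each node."""
    contents = []
    while head is not None:
        contents.append(head.chars)
        head = head.next
    return contents


print(node_contents(build_linked_string("THE END", 1)))
print(node_contents(build_linked_string("THE END", 4)))
```

One character per node makes insertion and deletion simplest, while several characters per node spend less space on links.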
• Substring
• Indexing
• Concatenation
• Length
• Substring
Accessing a substring from a given string requires three pieces of
information:
– The name of the string or the string itself
– The position of the first character of the substring in the given string and
– The length of the substring or the position of the last character of the substring.
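A sketch of the substring operation in Python, assuming the form SUBSTRING(string, initial, length) with positions counted from 1, as used later in these notes. Python slices count from 0, so the start position is shifted by one.

```python
def substring(string: str, initial: int, length: int) -> str:
    """Characters initial .. initial+length-1 of string, counting from 1."""
    if initial < 1 or length < 0:
        raise ValueError("initial must be >= 1 and length must be >= 0")
    return string[initial - 1:initial - 1 + length]


print(substring("TO BE OR NOT TO BE", 4, 7))   # BE OR N
```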
• Indexing
– Indexing, also called pattern matching, refers to finding the position
where a string pattern P first appears in a given string text T.
– Format of this operation:
INDEX (text, pattern)
If the pattern P does not appear in the text T, then INDEX = 0.
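INDEX can be sketched on top of Python's `str.find`, which returns a 0-based position or -1 when the pattern is absent. Adding 1 gives the 1-based position used in these notes, with 0 for "not found".

```python
def index(text: str, pattern: str) -> int:
    """1-based position of the first occurrence of pattern in text, or 0 if none."""
    return text.find(pattern) + 1


print(index("XABYABZ", "AB"))   # 2
print(index("XABYABZ", "CD"))   # 0
```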
• Concatenation
– Let S1 and S2 be strings. The concatenation of S1 and S2, which we
denote by S1 // S2, is the string consisting of the characters of S1
followed by the characters of S2.
• Length
– The number of characters in a string is called its length.
– Format:
LENGTH (string)
• Insertion
Suppose in a given text T we want to insert a string S so that S begins in
position K. This operation is denoted by:
INSERT (text, position, string)
Example: INSERT (‘ABCDEFG’, 3, ‘XYZ’) = ‘ABXYZCDEFG’
• Deletion
Suppose in a given text T we want to delete the substring which begins in
position K and has length L. This operation is denoted by:
DELETE (text, position, length)
Example: DELETE (‘ABCDEFG’, 4, 2) = ‘ABCFG’
• Replacement
Suppose in a given text T we want to replace the first occurrence of a
pattern P1 by a pattern P2. This operation is denoted by:
REPLACE (text, pattern1, pattern2)
Example: REPLACE (‘XABYABZ’, ‘AB’, ‘C’) = ‘XCYABZ’
Insertion
The INSERT function can be implemented using the string operations as
follows:
INSERT(T, K, S) = SUBSTRING(T, 1, K-1)//S//
SUBSTRING(T, K, LENGTH(T)-K+1)
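The formula above translates line for line into Python. This sketch assumes 1-based positions and defines its own small `substring` helper so that it is self-contained.

```python
def substring(string: str, initial: int, length: int) -> str:
    """1-based SUBSTRING(string, initial, length)."""
    return string[initial - 1:initial - 1 + length]


def insert(text: str, position: int, string: str) -> str:
    """INSERT(T, K, S) = SUBSTRING(T, 1, K-1) // S // SUBSTRING(T, K, LENGTH(T)-K+1)."""
    return (substring(text, 1, position - 1)
            + string
            + substring(text, position, len(text) - position + 1))


print(insert("ABCDEFG", 3, "XYZ"))   # ABXYZCDEFG
```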
Deletion
The DELETE function can be implemented using the string operations as
follows:
DELETE(T, K, L) = SUBSTRING(T, 1, K-1)//
SUBSTRING(T, K+L, LENGTH(T)-K-L+1)
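The same direct translation works for DELETE. As before, this is a sketch with 1-based positions and a local `substring` helper.

```python
def substring(string: str, initial: int, length: int) -> str:
    """1-based SUBSTRING(string, initial, length)."""
    return string[initial - 1:initial - 1 + length]


def delete(text: str, position: int, length: int) -> str:
    """DELETE(T, K, L) = SUBSTRING(T, 1, K-1) // SUBSTRING(T, K+L, LENGTH(T)-K-L+1)."""
    return (substring(text, 1, position - 1)
            + substring(text, position + length,
                        len(text) - position - length + 1))


print(delete("ABCDEFG", 4, 2))   # ABCFG
```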
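Replacement
The REPLACE function can be implemented using the string operations.
REPLACE(T, P1, P2) can be executed by using the following three steps:
K = INDEX(T, P1)
T = DELETE(T, K, LENGTH(P1))
T = INSERT(T, K, P2)
A text T and a pattern P are in memory. This algorithm deletes every
occurrence of P in T.
1. Set K = INDEX(T, P)
2. Repeat while K ≠ 0
a) Set T = DELETE(T, K, LENGTH (P))
b) Set K = INDEX(T, P)
[End of loop]
3. Write: T
4. Exit.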
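A sketch of the two ideas in this passage: REPLACE built from INDEX, DELETE and INSERT, and the loop that removes the pattern P from T until INDEX returns 0. Two details are assumptions added here and not taken from the notes: the text is returned unchanged when the pattern is absent, and an empty pattern is rejected, since INDEX would always return 1 and the loop would never end.

```python
def index(text, pattern):
    """1-based position of the first occurrence, or 0 if absent."""
    return text.find(pattern) + 1


def delete(text, position, length):
    """Remove `length` characters starting at 1-based `position`."""
    return text[:position - 1] + text[position - 1 + length:]


def insert(text, position, string):
    """Insert `string` so that it begins at 1-based `position`."""
    return text[:position - 1] + string + text[position - 1:]


def replace(text, pattern1, pattern2):
    """Replace the first occurrence of pattern1 by pattern2."""
    k = index(text, pattern1)
    if k == 0:                        # assumption: nothing to replace
        return text
    text = delete(text, k, len(pattern1))
    return insert(text, k, pattern2)


def delete_all(text, pattern):
    """Repeat DELETE while INDEX still finds the pattern."""
    if pattern == "":
        raise ValueError("pattern must not be empty")   # added guard
    k = index(text, pattern)
    while k != 0:
        text = delete(text, k, len(pattern))
        k = index(text, pattern)
    return text


print(replace("XABYABZ", "AB", "C"))    # XCYABZ
print(delete_all("XABYABZ", "AB"))      # XYZ
print(delete_all("XAAABBBY", "AB"))     # XY
```

The last call shows that a deletion can create a new occurrence of the pattern, which the loop then removes as well.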
A text T and patterns P and Q are in memory. This algorithm replaces every
occurrence of P in T by Q.
1. Set K = INDEX(T, P)
2. Repeat while K ≠ 0
a) Set T = REPLACE(T, P, Q)
b) Set K = INDEX(T, P)
[End of loop]
3. Write: T.
4. Exit.
Example 3.8:
(a) Suppose T = XABYABZ, P = AB, and Q = C.
After the 1st execution, T = XCYABZ
After the 2nd execution, T = XCYCZ, the final output.
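The replace-every-occurrence loop can be sketched in Python as below. One caution: if Q itself contains P, each replacement reintroduces P and the loop never terminates. The guard that rejects such input is an addition made for this sketch and is deliberately conservative.

```python
def replace_all(text: str, p: str, q: str) -> str:
    """Replace P by Q, one occurrence at a time, until P no longer appears."""
    if p in q:
        # Added guard: covers the empty pattern too, since "" is in every string.
        raise ValueError("Q must not contain P, or the loop would never end")
    k = text.find(p) + 1               # K = INDEX(T, P), 1-based, 0 if absent
    while k != 0:
        text = text.replace(p, q, 1)   # REPLACE(T, P, Q): first occurrence only
        k = text.find(p) + 1
    return text


print(replace_all("XABYABZ", "AB", "C"))   # XCYCZ
```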
For each character t, the entry f(Qi, t) in the table is the largest Q which
appears as a terminal substring in the string Qi t (Qi followed by t). We compute:
State   a    b
Q0      Q1   Q0
Q1      Q2   Q0
Q2      Q2   Q3
Q3      P    Q0
Fig. 3.8 (a): Pattern Matching Table
Fig. 3.8 (b): Pattern Matching Graph
Finally, P is found in T.
Thus, INDEX = 8 – LENGTH (P) = 8 – 4 = 4
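The table-driven search can be sketched as follows. State i stands for Qi (the first i characters of P) and state LENGTH(P) stands for P itself. The function names, the dictionary representation of the table, and the sample text `abbaaba` are assumptions made for illustration; the pattern `aaba` is one that produces the table in Fig. 3.8 (a).

```python
def build_table(pattern, alphabet=None):
    """table[i][t] = length of the largest initial substring of pattern
    that is a terminal substring of pattern[:i] + t."""
    if alphabet is None:
        alphabet = sorted(set(pattern))
    table = []
    for i in range(len(pattern)):
        row = {}
        for t in alphabet:
            candidate = pattern[:i] + t
            k = len(candidate)                    # try the longest prefix first
            while k > 0 and not candidate.endswith(pattern[:k]):
                k -= 1
            row[t] = k
        table.append(row)
    return table


def index_with_table(text, pattern):
    """1-based position of the first occurrence of pattern, or 0 if absent."""
    table = build_table(pattern)
    state = 0                                     # start in Q0
    k = 1                                         # 1-based position of the next character
    while state != len(pattern) and k <= len(text):
        # A character that does not occur in the pattern sends us back to Q0.
        state = table[state].get(text[k - 1], 0)
        k += 1
    return k - len(pattern) if state == len(pattern) else 0


for i, row in enumerate(build_table("aaba")):
    print(f"Q{i}", row)
print(index_with_table("abbaaba", "aaba"))        # 4
```

Each character of the text is read exactly once, so the search takes a number of steps proportional to the length of T once the table is built.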
Solution:
a) 4
b) Substrings: Ʌ, A, B, C, D, AB, BC, CD, ABC, BCD, ABCD
c) Initial substrings: Ʌ, A, AB, ABC, ABCD
Solution:
b) In general, when P is an r-character string and T is an s-character string,
the data size for the algorithm is
n=r+s
The worst case occurs when every character of P except the last matches
every substring Wk. In this case,
C(n) = r (s – r + 1)
For fixed n, we have s = n - r, so that,
C(n) = r (n - 2r + 1)
= nr - 2r^2 + r
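The worst-case count r(s - r + 1) can be checked with a sketch of the straightforward algorithm that compares P against each window Wk of T and counts character comparisons. The function name and the test strings are assumptions made for illustration.

```python
def slow_index(text: str, pattern: str):
    """Compare pattern with every window of text.
    Returns (1-based index or 0, number of character comparisons)."""
    r, s = len(pattern), len(text)
    comparisons = 0
    for k in range(1, s - r + 2):          # window Wk starts at 1-based position k
        matched = True
        for l in range(r):
            comparisons += 1
            if pattern[l] != text[k - 1 + l]:
                matched = False
                break
        if matched:
            return k, comparisons
    return 0, comparisons


# Worst case: every character of P except the last matches every window.
r, s = 4, 10
print(slow_index("a" * s, "a" * (r - 1) + "b"))   # (0, 28)
print(r * (s - r + 1))                            # 28
```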
Solution:
Here n is fixed, so C = C(n) may be viewed as a function of r. According to
calculus, the maximum value of C occurs when C' = dC/dr = 0 (here C' is
the derivative of C with respect to r). Using calculus, we obtain:
C' = n - 4r + 1 = 0
Therefore, r = (n+1)/4
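The result can be checked numerically. This sketch evaluates C = r(n - 2r + 1) for every integer r from 1 to n/2 (assuming the pattern is no longer than the text, so r ≤ s = n - r) and compares the maximising r with (n+1)/4. Substituting r = (n+1)/4 into C gives (n+1)^2/8, which the sketch also prints. The function names and the value n = 19 are illustrative.

```python
def comparisons(n: int, r: int) -> int:
    """Worst-case comparison count C = r(n - 2r + 1) for pattern length r."""
    return r * (n - 2 * r + 1)


def best_r(n: int) -> int:
    """Integer r in 1..n//2 that maximises the worst-case count."""
    return max(range(1, n // 2 + 1), key=lambda r: comparisons(n, r))


n = 19
print(best_r(n), (n + 1) / 4)                    # 5 5.0
print(comparisons(n, best_r(n)), (n + 1) ** 2 / 8)   # 50 50.0
```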
For each character t, the entry f(Qi, t) in the table is the largest Q which
appears as a terminal substring in the string Qi t (Qi followed by t). We compute:
State   a    b
Q0      Q1   Q0
Q1      Q2   Q0
Q2      Q3   Q0
Q3      Q3   Q4
Q4      Q1   P

State   a    b
Q0      Q1   Q0
Q1      Q1   Q2
Q2      Q3   Q0
Q3      Q1   Q4
Q4      Q5   Q0
Q5      Q1   P