Physical Representation of Strings and Logic Characteristics
Module I - Strings
Strings
Computers were first used for processing numeric data, but nowadays they are
frequently used to process non-numeric data, called character data. Computer
terminology usually uses the term string for a sequence of characters rather
than the term word, since word has another meaning in computer science.
Physical Representations of Strings / Storing Strings
Strings are generally stored in three types of structures:
Fixed length structures
Variable length structures with fixed maximums
Linked structures
Record Oriented, Fixed length structures.
In this storage each line of print is viewed as a record, where all records have
the same length, i.e. each record accommodates the same number of characters.
Suppose our records have length 80.
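Figure: three records stored in consecutive 80-character blocks
    Record 1 (addresses 200 to 279): DATA STRUCTURES     text occupies 200 to 214
    Record 2 (addresses 280 to 359): READ:MAX,MIN,MID    text occupies 280 to 295
    Record 3 (addresses 360 to 439): MID:=(MAX+MIN)/2    text occupies 360 to 375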
Advantages
The ease of accessing data from any given record.
The ease of updating data in any given record (as long as the length of the new
data does not exceed the record length).
Disadvantages
Time is wasted reading an entire record if most of the storage consists of
inessential blank space.
Certain records may require more space than is available.
When the correction consists of more or fewer characters than the original
text, changing a misspelled word requires the entire record to be changed.
If we want to insert a new record, then all succeeding records must be moved to
new memory locations. This can be remedied by the use of a linear array, which
gives the address of each successive record, so that the records need not be
stored in consecutive locations in memory. Thus inserting a new record will require
only an updating of the array.
Notes By: Sree Lakshmi V, Assistant Professor, MCA Dept., KVVS IT,
Adoor, Pathanamthitta, Kerala
Data And File Structures MCA11.201
[Figure: a linear array with entries 1 to 5, each entry giving the address of one record]
Variable length structures with fixed maximums
Although strings may be stored in fixed length memory cells, it is useful to
know the actual length of each string. For example, one does not have to read
the entire record when the string occupies only the beginning part of the memory
location. In that situation we use this method of storage.
The storage of variable length strings in memory cells with fixed lengths can be
done in two general ways:
One can use a marker, such as two dollar signs ($$), to signal the end of the string.
One can list the length of the string as an additional item in the pointer array.
Figure: pointer array in which each entry also lists the length of its string
    Entry   Length
    1       15
    2       16
    3       16
    4       6
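The two ways of recovering a variable length string from a fixed length cell can be sketched in Python as follows. This is only an illustration: the cell size of 20 and the function names are assumptions made for the example, not part of the notes.

```python
# Hypothetical sketch: CELL_SIZE and all function names are illustrative assumptions.
CELL_SIZE = 20      # fixed maximum length of every memory cell
MARKER = "$$"       # end-of-string marker used in the notes


def store_with_marker(s):
    """Append the marker, then pad with blanks up to CELL_SIZE."""
    if len(s) + len(MARKER) > CELL_SIZE:
        raise ValueError("string does not fit in the cell")
    return (s + MARKER).ljust(CELL_SIZE)


def load_with_marker(cell):
    """Read only the characters that come before the marker."""
    return cell[:cell.index(MARKER)]


def store_with_length(s):
    """Return the padded cell and the length kept in the pointer array."""
    if len(s) > CELL_SIZE:
        raise ValueError("string does not fit in the cell")
    return s.ljust(CELL_SIZE), len(s)


def load_with_length(cell, length):
    """Read exactly `length` characters; no scan for a marker is needed."""
    return cell[:length]
```

With the marker, reading must scan for `$$`; with the stored length, the reader knows in advance how many characters to take.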
One might be tempted to store strings one after another by using some separation
marker, such as two dollar signs ($$), or by using a pointer array giving the location
of each string. These ways of storing strings will obviously save space and are
sometimes used in secondary memory when records are relatively permanent and require
little change. However, such methods of storage are usually inefficient when the
strings and their lengths are frequently being changed.
Figure: records terminated by the $$ marker, one record per pointer array entry
    DATA STRUCTURES$$
    READ:MAX,MIN,MID$$
    MID:=(MAX+MIN)/2$$
    MID:=0$$
Figure: the same records stored without markers, with the lengths kept in the pointer array
    DATA STRUCTURES
    READ:MAX,MIN,MID
    MID:=(MAX+MIN)/2
    MID:=0
Figure: strings stored one after another, with and without the $$ separator
    DATA STRUCTURES$$READ:MAX,MIN,MID$$
    DATA STRUCTURESREAD:MAX,MIN,MID
Linked Storage
For most extensive word processing applications, strings are stored by
means of linked lists. A linked list is a linearly ordered sequence of memory cells
called nodes, where each node contains an item and a link, which
points to the next node in the list (i.e. it contains the address of the next node).
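Figure: General Form
    xxx -> xxx -> xxx -> xxx
Figure: One Character Per Node
    w -> e -> l -> c -> o -> m -> e
Figure: Two Characters Per Node
    we -> lc -> om -> e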
Strings may be stored in linked lists as shown above. Each memory cell is
assigned one character or a fixed number of characters, and the link contained in the
node gives the address of the node containing the next character or group of characters.
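A minimal Python sketch of linked storage is given below. The class name `Node` and the functions `build_linked_string` and `read_linked_string` are hypothetical names chosen for this illustration; a Python object reference plays the role of the link (address of the next node).

```python
class Node:
    """One node of the list: an item (one or more characters) and a link."""

    def __init__(self, item, link=None):
        self.item = item
        self.link = link   # reference to the next node, or None at the end


def build_linked_string(text, chars_per_node=1):
    """Store text in a linked list with a fixed number of characters per node."""
    head = None
    tail = None
    for start in range(0, len(text), chars_per_node):
        node = Node(text[start:start + chars_per_node])
        if head is None:
            head = node          # first node becomes the head of the list
        else:
            tail.link = node     # link the previous node to the new one
        tail = node
    return head


def read_linked_string(head):
    """Follow the links from the head and rebuild the string."""
    parts = []
    node = head
    while node is not None:
        parts.append(node.item)
        node = node.link
    return "".join(parts)
```

Storing more characters per node reduces the space spent on links, at the cost of a partly filled last node.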
String Processing
A finite sequence S of zero or more characters is called a string. The number of
characters in a string is called its length. The string with zero characters is called
the empty string (null string). Let S1 and S2 be strings. The string consisting of the
characters of S1 followed by the characters of S2 is called the concatenation of
S1 and S2. It will be denoted by S1//S2.
Example: 'THE'//'END' = 'THEEND'
But, 'THE'//' '//'END' = 'THE END'
A string Y is called a substring of a string S if there exist strings X and Z
such that S = X//Y//Z.
If X is an empty string, then Y is called an initial substring of S, and if Z is an
empty string, then Y is called a terminal substring of S.
Example: 'HOW ARE' is a substring of 'HAI HOW ARE YOU', and 'HAI' is an initial
substring of that string. Clearly, if Y is a substring of S, then the length of Y cannot
exceed the length of S.
Copying
Algorithm : COPY (S1, S2)
This procedure copies the content of string S2 to string S1.
1. Read: S1 and S2
2. Set L1 := LENGTH(S1) and Set L2 := LENGTH(S2)
3. If L1 < L2, then: Write: Copying Failed, and Exit
4. Repeat for I := 1 to L2:
       Set S1[I] := S2[I]
5. Set S1[L2 + 1] := '$' [marking the end of the string]
6. Write: S1
7. Exit
Example:
S1 = 'I AM' and S2 = 'WE'
Result: S1 = 'WE'
Explanation
S1 = 'I AM' and S2 = 'WE'
L1 = 4 and L2 = 2
I = 1   S1[1] ← 'W' ← S2[1]
I = 2   S1[2] ← 'E' ← S2[2]
I = 3   S1[2+1] ← '$'
S1 ← 'WE'
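The COPY steps can be sketched in Python as below. The function name `copy_string` and the use of a character list with one spare cell for the `$` marker are assumptions made for this illustration.

```python
END = "$"   # end-of-string marker, as assumed in the notes


def copy_string(s1, s2):
    """Copy s2 into the storage occupied by s1 and return the stored string."""
    l1, l2 = len(s1), len(s2)
    if l1 < l2:
        raise ValueError("Copying failed: S1 is shorter than S2")
    # Storage of S1 as a list of characters, plus one spare cell for the marker.
    storage = list(s1) + [END]
    for i in range(l2):          # I := 1 to L2 in the 1-based pseudocode
        storage[i] = s2[i]
    storage[l2] = END            # S1[L2 + 1] := '$'
    # Everything before the marker is the new content of S1.
    return "".join(storage[:storage.index(END)])
```

For S1 = 'I AM' and S2 = 'WE' the storage becomes W, E, $, M, $ and the string read back up to the first marker is 'WE'.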
Comparison
Algorithm : COMPARE (S1, S2)
This procedure compares two strings S1 and S2 and returns TRUE or FALSE in STR.
1. Set L1 := LENGTH(S1)
2. Set L2 := LENGTH(S2)
3. If L1 = L2, then: Go to Step 4
   Else: Write: Strings not Equal, and Go to Step 6
4. Repeat for L := 1 to L1:
       If S1[L] ≠ S2[L], then: Go to Step 6
5. [Success] Set STR := TRUE, and Exit
6. [Unsuccessful] Set STR := FALSE
7. Exit
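A minimal Python sketch of the COMPARE procedure follows. The function name `compare` is an assumption; the return value plays the role of STR.

```python
def compare(s1, s2):
    """Return True if s1 and s2 are equal, character by character."""
    if len(s1) != len(s2):
        return False             # strings of different length cannot be equal
    for k in range(len(s1)):
        if s1[k] != s2[k]:
            return False         # first mismatch decides the result
    return True                  # every position matched
```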
Example:
S1 = 'SREE' and S2 = 'SREE'
Result: Strings are equal
Explanation
S1 = 'SREE' and S2 = 'SREE'
Length of S1 = 4 and length of S2 = 4
Lengths are equal, so continue.
L = 1, S1[1] = 'S' and S2[1] = 'S', so they are equal
L = 2, S1[2] = 'R' and S2[2] = 'R', so they are equal
L = 3, S1[3] = 'E' and S2[3] = 'E', so they are equal
L = 4, S1[4] = 'E' and S2[4] = 'E', so they are equal
Concatenation
Let S1 and S2 be strings. The string consisting of the characters of S1 followed by the
characters of S2 is called the concatenation of S1 and S2. It is denoted by
S1//S2.
Example:
'THE'//'END' = 'THEEND'
But, 'THE'//' '//'END' = 'THE END'
Algorithm : CONCAT (S1, S2)
This procedure concatenates S1 and S2 into STR.
1. Set L1 := LENGTH(S1) and Set L2 := LENGTH(S2)
2. Repeat for I := 1 to L1:
       Set STR[I] := S1[I]
3. Repeat for I := 1 to L2:
       Set STR[L1 + I] := S2[I]
4. Write: STR
5. Exit
Explanation
S1 = 'THE' and S2 = 'END'
LENGTH(S1) = 3 and LENGTH(S2) = 3
I = 1   STR[1] ← 'T' ← S1[1]
I = 2   STR[2] ← 'H' ← S1[2]
I = 3   STR[3] ← 'E' ← S1[3]
STR = 'THE'
I = 1   STR[3+1] ← 'E' ← S2[1]
I = 2   STR[3+2] ← 'N' ← S2[2]
I = 3   STR[3+3] ← 'D' ← S2[3]
STR = 'THEEND'
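The two loops of CONCAT can be sketched in Python as follows. The function name `concat` is an assumption; Python's built-in `+` on strings does the same job, and the explicit loops are kept only to mirror the steps above.

```python
def concat(s1, s2):
    """Concatenate s1 and s2 by copying characters one at a time."""
    l1, l2 = len(s1), len(s2)
    result = [""] * (l1 + l2)        # storage for STR
    for i in range(l1):
        result[i] = s1[i]            # STR[I] := S1[I]
    for i in range(l2):
        result[l1 + i] = s2[i]       # STR[L1 + I] := S2[I]
    return "".join(result)
```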
Length
The number of characters in a string is called its length.
LENGTH (string)
LENGTH('COMPUTER') = 8
LENGTH('') = 0
LENGTH(' ') = 1 [a single blank counts as one character]
Algorithm : LENGTH (STR)
This procedure returns the length of the string STR in the variable LEN.
1. Set LEN := 0 [stores the length]
2. Repeat Steps 3 and 4 while S ≠ '$' [assuming that $ marks the end of the string]
3.     Read: S [one character at a time]
4.     Set LEN := LEN + 1
5. Write: LEN - 1 [the end marker was also counted, so subtract 1]
6. Exit
Explanation
LEN = 0
Read C , LEN = 1
Read O , LEN = 2
Read M , LEN = 3
Read P , LEN = 4
Read U , LEN = 5
Read T , LEN = 6
Read E , LEN = 7
Read R , LEN = 8
Read $ , LEN = 9
Actual length: 9 - 1 = 8; print 8 as the length
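A short Python sketch of counting characters up to the end marker is shown below. The function name `length` is an assumption. Unlike the pseudocode, this version stops before counting the marker, so no final subtraction is needed.

```python
def length(chars, end="$"):
    """Count the characters that come before the end marker."""
    count = 0
    for ch in chars:
        if ch == end:
            return count         # marker reached: count excludes the marker
        count += 1
    raise ValueError("end marker not found")
```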
String Reverse
Algorithm : STRREV (S)
1. Read: S [one character at a time; each call keeps its own character]
2. If S ≠ '$', then: [assuming $ marks the end of the string]
       Call STRREV(S) [reverse the remaining characters first]
       Write: S
   [End of If structure]
3. Exit
Example:
S1 = 'NATURAL'
Result: LARUTAN
Explanation
S1 = 'NATURAL'
Read N | Read A | Read T | Read U | Read R | Read A | Read L
Write L | Write A | Write R | Write U | Write T | Write A | Write N
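The read-then-write-in-reverse behaviour traced above can be sketched recursively in Python. The function name `strrev` and the `pos` parameter (the position of the character currently being read) are assumptions made for this illustration.

```python
def strrev(chars, pos=0, end="$"):
    """Return the characters before the end marker in reverse order."""
    if chars[pos] == end:
        return ""                # marker reached: nothing left to reverse
    # Reverse the rest first, then write the current character after it.
    return strrev(chars, pos + 1, end) + chars[pos]
```

Each call holds one character until the calls after it have finished, which is why the characters come out in reverse order. For very long strings an iterative version would avoid Python's recursion limit.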
Substring
For accessing a substring from a given string we require three pieces of information:
Name of the string or the string itself
The position of the first character of the substring in the given string
The length of the substring or the position of the last character of the
substring
SUBSTRING(string, initial, length)
Example: SUBSTRING('HELLO WORLD', 4, 5) = 'LO□WO' (□ means space)
Algorithm : SUBSTRING (S, P, L)
Given a string S, the initial position P of the substring and the length L of the
substring in memory, this procedure returns a portion of the string as output.
1. Set J := 1
2. Repeat for I := P to (P + L - 1):
       Set NEW[J] := S[I]
       Set J := J + 1
   [End of loop]
3. Write: NEW
4. Exit
Explanation
S = 'WE ARE ALONE', P = 4, L = 3
I = 4   NEW[1] ← 'A' ← S[4]
I = 5   NEW[2] ← 'R' ← S[5]
I = 6   NEW[3] ← 'E' ← S[6]
Finally NEW holds 'ARE'
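A minimal Python sketch of SUBSTRING with 1-based positions, as used in the notes, is given below. The function name `substring` and the range check are assumptions added for the illustration.

```python
def substring(s, initial, length):
    """Return `length` characters of s starting at 1-based position `initial`."""
    if initial < 1 or length < 0 or initial + length - 1 > len(s):
        raise ValueError("substring lies outside the string")
    new = []
    for i in range(initial, initial + length):   # I := P to P + L - 1
        new.append(s[i - 1])                     # Python indexes from 0
    return "".join(new)
```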
Indexing / Pattern Matching
Indexing refers to finding the position where a string pattern P first appears
in a given string or text T.
INDEX(Text, Pattern)
If the pattern P does not appear in the text T, then INDEX is assigned the value 0.
Let T = 'HIS FATHER IS THE PROFESSOR'
Then, INDEX(T, 'THE') = 7
INDEX(T, 'THEN') = 0
INDEX(T, '□THE□') = 14 (□ means space)
Pattern Matching
Pattern matching is the problem of deciding whether or not a given string pattern P
appears in a given text T. We assume that the length of P does not exceed the length of T.
NB: a^2 b^3 ab^2 indicates aabbbabb, and (cd)^3 means cdcdcd.
The empty string is denoted by a special symbol (Λ in many textbooks), and
concatenation is denoted as A · B or simply AB.
This algorithm compares a given pattern P with each of the substrings of T.
Algorithm
[Pattern Matching] P and T are strings with lengths R and S, respectively, and are
stored as arrays with one character per element. This algorithm finds the INDEX of
P in T.
1. [Initialize.] Set K := 1 and MAX := S - R + 1.
2. Repeat Steps 3 to 5 while K ≤ MAX:
3.     Repeat for L := 1 to R: [Tests each character of P]
           If P[L] ≠ T[K + L - 1], then: Go to Step 5.
       [End of inner loop]
4.     [Success] Set INDEX := K, and Exit.
5.     Set K := K + 1.
   [End of Step 2 outer loop.]
6. [Failure.] Set INDEX := 0.
7. Exit.
Example:
T = 'ITS TIME FOR FUN'
P = 'TIME'
S = 16, R = 4, and MAX = 13
K = 1
P[1] ≠ T[1] (T ≠ I) (P not in first position)
K = 2
P[1] = T[2] (T = T)
P[2] ≠ T[3] (I ≠ S) (P not in second position)
K = 3
P[1] ≠ T[3] (T ≠ S) (P not in third position)
K = 4
P[1] ≠ T[4] (T ≠ □) (P not in fourth position)
K = 5
P[1] = T[5] (T = T)
P[2] = T[6] (I = I)
P[3] = T[7] (M = M)
P[4] = T[8] (E = E) (P is found at fifth position in T)
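The pattern matching algorithm can be sketched in Python as below. The function name `index` is an assumption; K and L are kept 1-based to follow the pseudocode, and Python's `for ... else` plays the role of the "Go to Step 5" jump.

```python
def index(text, pattern):
    """Return the 1-based position of the first occurrence of pattern, or 0."""
    s, r = len(text), len(pattern)
    max_k = s - r + 1                       # last position where P can start
    for k in range(1, max_k + 1):
        for l in range(1, r + 1):
            # P[L] versus T[K + L - 1], shifted by one for 0-based indexing
            if pattern[l - 1] != text[k + l - 2]:
                break                       # mismatch: try the next K
        else:
            return k                        # inner loop finished: P found at K
    return 0                                # P does not appear in T
```

In the worst case the inner loop runs R times for each of the S - R + 1 starting positions.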
String Operations
Insertion: inserting a string in the middle of the text
Deletion: deleting a string from the text
Replacement: replacing one string in the text with another
Insertion
Suppose in a given text T, we want to insert a string S so that S begins in position
K.
We denote the operation by
INSERT(text, position, string)
Example: INSERT('ABCDEF', 4, 'XYZ') = ABCXYZDEF
Algorithm : INSERT (S, SUB, LOC)
A string S, a location LOC and a substring SUB are in memory.
This procedure inserts the substring SUB at position LOC in the string S.
1. Set NEW := SUBSTRING(S, 1, LOC - 1) [substring ahead of LOC]
2. Set NEW := CONCAT(NEW, SUB) [appending the new substring]
3. Set NEW := CONCAT(NEW, SUBSTRING(S, LOC, LENGTH(S) - (LOC - 1))) [substring
   following LOC]
4. Exit
Explanation
S = 'AM HAPPY', LOC = 4, SUB = 'NOT□' (□ means space)
LENGTH(S) = 8 and LENGTH(SUB) = 4
[After substring] NEW = 'AM□'
[After concatenation] NEW = 'AM NOT□'
[After concatenation] NEW = 'AM NOT HAPPY'
Finally NEW holds 'AM NOT HAPPY'
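The three steps of INSERT can be sketched in Python as follows. The function name `insert` and the range check on `loc` are assumptions; slicing stands in for SUBSTRING and `+` for CONCAT.

```python
def insert(s, loc, sub):
    """Insert sub into s so that sub begins at 1-based position loc."""
    if loc < 1 or loc > len(s) + 1:
        raise ValueError("position lies outside the string")
    new = s[:loc - 1]            # substring ahead of LOC
    new = new + sub              # append the new substring
    new = new + s[loc - 1:]      # append the substring from LOC onwards
    return new
```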
Deletion
Suppose in a given text T, we want to delete the substring which begins in position
K and has length L. We denote the operation by
DELETE (text, position, length)
Example:
DELETE('ABCDEFGH', 4, 3) = ABCGH
If position K = 0, the text is left unchanged:
DELETE('ABCD', 0, 2) = ABCD
Example:
Suppose T = 'ABCDEFG' and P = 'CD'; then INDEX(T, P) = 3 and LENGTH(P) = 2
Hence
DELETE('ABCDEFG', INDEX(T, P), LENGTH(P))
DELETE('ABCDEFG', 3, 2) = ABEFG
Example:
Suppose T = 'ABCDEFG' and P = 'DC'; then INDEX(T, P) = 0 and LENGTH(P) = 2
Hence
DELETE('ABCDEFG', INDEX(T, P), LENGTH(P))
DELETE('ABCDEFG', 0, 2) = ABCDEFG
Suppose after reading into the computer a text T and a pattern P, we want to
delete every occurrence of the pattern P in the text T. This can be accomplished by
repeatedly applying
DELETE(T, INDEX(T, P), LENGTH(P)) until INDEX(T, P) = 0 (i.e. until P does not
appear in T).
Algorithm : DELETE (T, P)
A text T and a pattern P are in memory.
This algorithm deletes every occurrence of P in T.
1. [Find the index of P] Set K := INDEX(T, P)
2. Repeat while K ≠ 0:
       (a) [Delete P from T] Set T := DELETE(T, INDEX(T, P), LENGTH(P))
       (b) [Update index] Set K := INDEX(T, P)
   [End of loop]
3. Write: T
4. Exit
Examples:
Suppose T = XABYABZ, P = AB
Then the loop in the algorithm will be executed twice. During the first
execution, the first occurrence of AB in T is deleted, with the result that T =
XYABZ. During the second execution the remaining occurrence of AB in T is
deleted, so that T = XYZ. Accordingly, XYZ is the output.
Suppose T = XAAABBBYA, P = AB
Observe that the pattern AB occurs only once in T, but the loop in the
algorithm will be executed 3 times. Specifically, after AB is deleted the first time
from T we have T = XAABBYA, and hence AB appears again in T. After
AB is deleted a second time from T, we see that T = XABYA and AB still occurs in T.
Finally, after AB is deleted for the third time, T = XYA and AB does not appear in T
again, giving INDEX(T, P) = 0. Hence XYA is the output.
Explanation:
Suppose T = XAAABBBYA, P = AB
First occurrence of P is at position K = 4
K ≠ 0, hence DELETE(T, 4, 2)
T = XAABBYA
Next occurrence of P is at position K = 3
K ≠ 0, hence DELETE(T, 3, 2)
T = XABYA
Next occurrence of P is at position K = 2
K ≠ 0, hence DELETE(T, 2, 2)
T = XYA
Next occurrence of P is not found
K = 0, hence the output is XYA
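The repeated deletion traced above can be sketched in Python as follows. The names `index`, `delete` and `delete_all` are assumptions; `index` here relies on Python's built-in `str.find` rather than the pattern matching algorithm of the notes, and the guard against an empty pattern is an addition so that the loop always terminates.

```python
def index(text, pattern):
    """1-based position of the first occurrence of pattern, or 0 if absent."""
    return text.find(pattern) + 1      # str.find returns -1 when absent


def delete(text, position, length):
    """Delete `length` characters starting at 1-based `position`."""
    if position == 0:
        return text                    # position 0 leaves the text unchanged
    return text[:position - 1] + text[position - 1 + length:]


def delete_all(text, pattern):
    """Repeatedly delete the first occurrence of pattern until none is left."""
    if not pattern:
        raise ValueError("pattern must not be empty")
    k = index(text, pattern)
    while k != 0:
        text = delete(text, k, len(pattern))
        k = index(text, pattern)       # deletion may create a new occurrence
    return text
```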
Replacement
Suppose in a given text T, we want to replace the first occurrence of a pattern P1
by a pattern P2. We use
REPLACE(text, pattern1, pattern2)
Example:
REPLACE('ABCDEFGH', 'AB', 'XY') = XYCDEFGH
REPLACE('ABCABCDEC', 'C', 'A') = ABAABCDEC [only the first occurrence of C is replaced]
REPLACE('XABYABZ', 'BA', 'C') = XABYABZ [BA does not occur, so the text is unchanged]
The REPLACE function can be expressed as a DELETE function followed by an INSERT
function, such as:
K := INDEX(T, P1)
T := DELETE(T, K, LENGTH(P1))
T := INSERT(T, K, P2)
The first two steps delete P1 from T, and the third step inserts P2 in the position K
from which P1 was deleted.
Suppose a text T and the patterns P and Q are in the memory of a computer.
Suppose we want to replace every occurrence of the pattern P in T by the pattern
Q. This might be accomplished by repeatedly applying
REPLACE(T, P, Q) until INDEX(T, P) = 0
Algorithm : REPLACE (T, P, Q)
A text T and the patterns P and Q are in memory.
This algorithm replaces every occurrence of P in T by Q.
1. [Find the index of P] Set K := INDEX(T, P)
2. Repeat while K ≠ 0:
       (a) [Replace P by Q] Set T := REPLACE(T, P, Q)
       (b) [Update index] Set K := INDEX(T, P)
   [End of loop]
3. Write: T
4. Exit
Examples:
Suppose T = XAY, P = A, Q = AB
Here the algorithm will never terminate, as P will always occur in the text T,
no matter how many times the loop is executed.
Suppose T = XABYABZ, P = AB, Q = C
The loop in the algorithm will be executed twice. During the first execution,
the first occurrence of AB in T is replaced by C to yield T = XCYABZ. During the
second execution, AB is again replaced by C, giving T = XCYCZ as the output.
Explanation
Suppose T = XABYABZ, P = AB, Q = C
First occurrence of P is at position K = 2
K ≠ 0, hence REPLACE(T, P, Q)
T = XCYABZ
Next occurrence of P is at position K = 4
K ≠ 0, hence REPLACE(T, P, Q)
T = XCYCZ
Next occurrence of P is not found
K = 0, hence the output is XCYCZ
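A Python sketch of replacement as a delete followed by an insert is given below. The names `replace_first` and `replace_all`, and the `max_rounds` limit, are assumptions for this illustration; the limit is added only so that a case such as T = XAY, P = A, Q = AB stops with an error instead of looping forever.

```python
def replace_first(text, p1, p2):
    """Replace the first occurrence of p1 in text by p2 (delete, then insert)."""
    k = text.find(p1) + 1                          # 1-based index, 0 if absent
    if k == 0:
        return text                                # nothing to replace
    deleted = text[:k - 1] + text[k - 1 + len(p1):]    # DELETE(T, K, LENGTH(P1))
    return deleted[:k - 1] + p2 + deleted[k - 1:]      # INSERT(T, K, P2)


def replace_all(text, p, q, max_rounds=1000):
    """Apply replace_first until p no longer occurs in text."""
    rounds = 0
    while text.find(p) != -1:
        if rounds == max_rounds:
            raise RuntimeError("replacement does not terminate")
        text = replace_first(text, p, q)
        rounds += 1
    return text
```

The loop cannot terminate whenever Q itself contains P, because every replacement then puts a fresh occurrence of P back into the text.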