Ds Unit2


UNIT II
Algorithms and Linear Data Structure: Array
Introduction: Data, Data Structure and their types.
Algorithm and their Complexity, String processing
operations, Pattern matching algorithms: fast and
slow. Array: Types of array, memory representation
of array, Algorithm and operations on Array:
traversing, searching, insertion, deletion.
Applications
DATA STRUCTURES
Data may be organized in many different ways; the
logical or mathematical model of a particular
organization of data is called a data structure.

The choice of a particular data model depends on two considerations.
 First, it must be rich enough in structure to mirror
the actual relationships of the data in the real
world.
 On the other hand, the structure should be
simple enough that one can effectively process the
data when necessary.
An array is a collection of items stored at contiguous
memory locations.
The idea is to store multiple items of the same type
together. This makes it easier to calculate the position
of each element by simply adding an offset to a base
value, i.e., the memory location of the first element of
the array (generally denoted by the name of the
array).
The difference between an array index and a memory address is that the array
index acts like a key value to label the elements in the array, whereas a
memory address is the location in memory where an element is actually stored.

The following are the important terms needed to understand the concept of an array.


•Element − Each item stored in an array is called an
element.
•Index − Each location of an element in an array has a
numerical index, which is used to identify the element.
Creating an array in C and C++ programming languages −
data_type array_name[array_size] = {elements separated
using commas}
OR
data_type array_name[array_size];
There are two types of arrays:
•One-Dimensional Arrays.
•Multi-Dimensional Arrays.

A one-dimensional array, sometimes known as a single-dimensional array, is one
in which each element is identified by a single subscript (a row or column
index). The elements are stored in memory in sequential order.
 For example, A [1], A [2],…, A [N].
Basic Operations in the Arrays
The basic operations in the Arrays are insertion, deletion,
searching, display, traverse, and update. These operations
are usually performed to either modify the data in the array
or to report the status of the array.
Following are the basic operations supported by an array.
•Traverse − print all the array elements one by one.
•Insertion − Adds an element at the given index.
•Deletion − Deletes an element at the given index.
•Search − Searches an element using the given index or by
the value.
•Update − Updates an element at the given index.
•Display − Displays the contents of the array.
Traversal Operation
This operation traverses through all the elements of an array. We
use loop statements to carry this out.
Algorithm
Following is the algorithm to traverse through all the elements
present in a Linear Array −
1. Start
2. Initialize an Array of certain size and datatype.
3. Initialize another variable ‘i’ with 0.
4. Print the ith value in the array and increment i.
5. Repeat Step 4 until the end of the array is reached.
6. End
#include <stdio.h>
int main()
{
/* LA is the linear array; n is the number of elements in it */
int LA[] = {1,3,5,7,8};
int n = 5;
int i;
printf("The original array elements are :\n");
/* visit every index from 0 to n-1 exactly once */
for(i = 0; i<n; i++)
{
printf("LA[%d] = %d \n", i, LA[i]);
}
return 0;
}
Output
The original array elements are :
LA[0] = 1
LA[1] = 3
LA[2] = 5
LA[3] = 7
LA[4] = 8
Insertion Operation
In the insertion operation, we are adding one or
more elements to the array.
Based on the requirement, a new element can be
added at the beginning, end, or any given index of
array.
This is done using input statements of the
programming languages.
Algorithm
Following is an algorithm to insert elements into a
Linear Array until we reach the end of the array −

1. Start
2. Create an Array of a desired datatype and size.
3. Initialize a variable ‘i’ as 0.
4. Enter the element at ith index of the array.
5. Increment i by 1.
6. Repeat Steps 4 & 5 until the end of the array.
7. Stop
#include <stdio.h>
int main()
{
int LA[3], i;
printf("Array Before Insertion:\n");
/* LA is not initialized yet, so these values are indeterminate (garbage) */
for(i = 0; i < 3; i++)
printf("LA[%d] = %d \n", i, LA[i]);
printf("Inserting Elements..\n");
printf("The array elements after insertion :\n");
// stores a value at each index, then prints it
for(i = 0; i < 3; i++)
{
LA[i] = i + 2;
printf("LA[%d] = %d \n", i, LA[i]);
}
return 0;
}
Output
Array Before Insertion:
LA[0] = 587297216
LA[1] = 32767
LA[2] = 0
Inserting Elements..
The array elements after insertion :
LA[0] = 2
LA[1] = 3
LA[2] = 4
Deletion Operation
In this array operation, we delete the element at a particular index of an
array.
The deletion takes place by assigning the value at each subsequent index to
the index before it, so the later elements shift one position toward the front.
Algorithm
Consider LA is a linear array with N elements and K is a
positive integer such that K<=N.
Following is the algorithm to delete an element available at the
Kth position of LA.
1. Start
2. Set J = K
3. Repeat steps 4 and 5 while J < N
4. Set LA[J] = LA[J + 1]
5. Set J = J+1
6. Set N = N-1
7. Stop
#include <stdio.h>
int main()
{
int LA[] = {1,3,5};
int n = 3;
int k = 1; /* index of the element to delete */
int i;
printf("The original array elements are :\n");
for(i = 0; i<n; i++)
printf("LA[%d] = %d \n", i, LA[i]);
/* shift every element after index k one position to the left */
for(i = k; i<n-1; i++)
LA[i] = LA[i+1];
n = n - 1; /* the array now holds one element fewer */
printf("The array elements after deletion :\n");
for(i = 0; i<n; i++)
printf("LA[%d] = %d \n", i, LA[i]);
return 0;
}
Output
The original array elements are :
LA[0] = 1
LA[1] = 3
LA[2] = 5
The array elements after deletion :
LA[0] = 1
LA[1] = 5
Search Operation
This operation searches for an element in the array using a key.
The key is compared sequentially with every value in the array to check
whether it is present.
Algorithm
Consider LA is a linear array with N elements and ITEM is the value to be
found.
Following is the algorithm to find an element with a value of
ITEM using sequential search.
1. Start
2. Set J = 0
3. Repeat steps 4 and 5 while J < N
4. IF LA[J] is equal to ITEM THEN GOTO STEP 6
5. Set J = J + 1
6. IF J < N THEN PRINT J, ITEM
7. Stop
#include <stdio.h>
int main()
{
int LA[] = {1,3,5,7,8};
int item = 5, n = 5;
int i;
printf("The original array elements are :\n");
for(i = 0; i<n; i++)
{
printf("LA[%d] = %d \n", i, LA[i]);
}
/* compare item with each element in turn */
for(i = 0; i<n; i++)
{
if( LA[i] == item )
{
/* positions are reported starting from 1, so add 1 to the index */
printf("Found element %d at position %d\n", item, i+1);
break;
}
}
return 0;
}
Output
The original array elements are :
LA[0] = 1
LA[1] = 3
LA[2] = 5
LA[3] = 7
LA[4] = 8
Found element 5 at position 3
Multidimensional Array
The data items in a multidimensional array are stored in
the form of rows and columns.
Also, the memory allocated for the multidimensional
array is contiguous. So the elements in multidimensional
arrays can be stored in linear storage using two methods
i.e., row-major order or column-major order

data_type array_name[size1][size2]....[sizeN];

Two dimensional array: int two_d[10][20];


Three dimensional array: int three_d[10][20][30];
Classification of Data Structures
DATA STRUCTURE OPERATIONS
The data appearing in data structures are processed by means of
certain operations as follows:
•Traversing: Accessing each record exactly once so that certain items
in the record may be processed.
•Searching: finding the location of the record with a given key value,
or finding the locations of all records which satisfy one or more
conditions.
•Inserting: Adding a new record to the structure.

•Deleting: Removing a record from the structure.


The following two operations, which are used in special situations,
will also be considered.
•Sorting: Arranging the records in some logical order.

•Merging: Combining the records in two different sorted files into a
single sorted file.
ALGORITHMS: COMPLEXITY, TIME-SPACE TRADEOFF:

Algorithm: An algorithm is a well-defined list of steps for solving a
particular problem. The time and space it uses are two major measures of the
efficiency of an algorithm.

Complexity: The complexity of an algorithm is the function which gives
the running time and/or space in terms of the input size.

 Time-Space Tradeoff:
• Each of our algorithms will involve a particular data structure.
Accordingly, we may not always be able to use the most efficient
algorithm, since the choice of data structure depends on many things,
including the type of data and the frequency with which
various data operations are applied.

• Sometimes the choice of data structure involves a time-space tradeoff:
by increasing the amount of space for storing the data, one may be able
to reduce the time needed for processing the data, or vice versa.
ALGORITHMIC NOTATIONS
Conventions that we will use in presenting algorithms are:
 Identifying Numbers: Each algorithm is assigned an identifying number

Steps, Control, Exit: The steps of the algorithm are executed one after
the other, beginning with step 1, unless indicated otherwise. Control may be
transferred to step n of the algorithm by the statement “Go to step n”.
Generally, Go to statements may be practically eliminated by using certain
control structures. The algorithm is completed when the statement Exit is
encountered.

Comments: Each step may contain a comment in brackets which
indicates the main purpose of the step.

Variable Names: Variable names will use capital letters, as in MAX and
DATA.

Assignment Statement: Assignment statements will use the dots-equal
notation := that is used in Pascal.

Ex. MAX := DATA[1] assigns the value in DATA[1] to MAX.
 Input and Output: Data may be input and assigned to variables by
means of a Read statement with the following form:
Read: Variable names.

• Similarly, messages, placed in quotation marks, and data in
variables may be output by means of a Write or Print statement
with the following form:
Write: Message and/or variable names.

• Procedures: The term procedure will be used for an independent
algorithmic module which solves a particular problem.

• Control Structures: Algorithms and their equivalent computer
programs are more easily understood if they mainly use
self-contained modules and three types of logic, or flow of control,
called
 Sequence logic, or sequential flow
 Selection logic, or conditional flow
 Iteration logic, or repetitive flow
COMPLEXITY OF ALGORITHMS
 In order to compare algorithms, we must have some criteria to
measure the efficiency of algorithms.
 Suppose M is an algorithm, and suppose n is the size of the
input data.
 The time and space used by the algorithm M are two
main measures for the efficiency of M.
• The time is measured by counting the number of key
operations; in sorting and searching algorithms, for example,
this is the number of comparisons. The space is measured by counting
the maximum amount of memory needed by the algorithm.
 The complexity of an algorithm M is the function f(n) which
gives the running time and/or storage space requirement of the
algorithm in terms of the size n of the input data.
 Frequently, the storage space required by an algorithm is
simply a multiple of the data size n.
 Accordingly, unless otherwise stated or implied, the term
complexity shall refer to the running time of the algorithm.
• The two cases one usually investigates in
complexity theory are as follows:
• Worst case: The maximum value of f(n) for any possible input.
 Average case: The expected value of f(n).
 Sometimes we also consider the minimum possible value of
f(n), called the best case
 The average case analysis assumes a certain probabilistic
distribution for the input data; one such assumption
might be that all possible permutations of an input data
set are equally likely
EX. (LINEAR SEARCH)
 A linear array DATA with N elements and specific ITEM of
information are given. This algorithm finds the location LOC of
ITEM in the array DATA or sets LOC=0.
1. [Initialize.] Set K := 1 and LOC := 0.
2. Repeat Steps 3 and 4 while LOC = 0 and K <= N.
3. If ITEM = DATA[K], then: Set LOC := K.
4. Set K := K+1. [Increments counter.]
[End of Step 2 loop]
5. [Successful?]
If LOC=0, then:
Write: ITEM is not in the array DATA.
Else:
Write: LOC is the location of ITEM.
[End of If Structure.]
6. Exit.
• The complexity of the search algorithm is given by the number C of
comparisons between ITEM and DATA[K]. We seek C(n) for the
worst case and the average case.
• Worst case: The worst case occurs when ITEM is the last element in the
array DATA or is not there at all. In either situation, we have C(n) = n.
Accordingly, C(n) = n is the worst-case complexity of the linear search
algorithm.
• Average case: Here we assume that ITEM does appear in DATA,
and that it is equally likely to occur at any position in the array.
Accordingly, the number of comparisons can be any of the numbers
1, 2, 3, …, n, and each number occurs with probability p = 1/n. Then
$$C(n) = 1\cdot\frac{1}{n} + 2\cdot\frac{1}{n} + \cdots + n\cdot\frac{1}{n} = \frac{n(n+1)}{2}\cdot\frac{1}{n} = \frac{n+1}{2}$$
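The average-case count can be checked empirically. The following is a minimal sketch in C, assuming a 0-based array of distinct integers; the function names comparisons and average_comparisons are hypothetical and not part of the original notes. Averaging the comparison count over every possible position of ITEM should give (n+1)/2.

```c
#include <stddef.h>

/* Counts the comparisons ITEM = DATA[K] that a left-to-right linear
   search makes before it stops: k if the item sits at 1-based
   position k, and n if the item is absent. */
size_t comparisons(const int *data, size_t n, int item)
{
    size_t count = 0;
    for (size_t k = 0; k < n; k++) {
        count++;                    /* one comparison per element visited */
        if (data[k] == item)
            break;
    }
    return count;
}

/* Averages comparisons() over every element of data used as the target.
   Assumes the elements are distinct, so each position is hit exactly once. */
double average_comparisons(const int *data, size_t n)
{
    size_t total = 0;
    for (size_t k = 0; k < n; k++)
        total += comparisons(data, n, data[k]);
    return n ? (double)total / (double)n : 0.0;
}
```

For an array of five distinct values, the counts are 1, 2, 3, 4 and 5, so the average is 3, which agrees with (5+1)/2.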
RATE OF GROWTH; BIG O NOTATION
 Suppose M is an algorithm, and suppose n is the size of the input data.
 Clearly the complexity f(n) of M increases as n increases.
 It is usually the rate of increase of f(n) that we want to examine.
 This is usually done by comparing f(n) with some standard function, such
as
$\log_2 n$, $n \log_2 n$, $n^2$, $n^3$, $2^n$
 The rates of growth for these standard functions are indicated in following
table, which gives approximate values for certain values of n.

 Suppose f(n) and g(n) are functions defined on the positive integers with
the property that f(n) is bounded by some multiple of g(n) for almost all n.
That is, suppose there exist a positive integer n0 and a positive number M
such that, for all n > n0, we have |f(n)| <= M|g(n)|. Then we may write
 f(n)=O(g(n))
 which is read “f(n) is of order g(n).”
LINEAR ARRAYS
 A Linear array is a list of a finite number n of homogeneous data elements
such that:
 The elements of the array are referenced respectively by an index set
consisting of n consecutive numbers.
 The elements of the array are stored respectively in successive memory
locations.
 The number n of elements is called the length or size of the array

 If not explicitly stated, we will assume the index set consists of the integers
1,2,…n.
 In general, the length or the number of elements of the array can be obtained
from the index set by the formula.
Length = UB-LB+1
Where,
 UB- Largest Index
 LB- Smallest Index

 The elements of an array A may be denoted by the subscript notation

A1, A2, A3, …, An
or by the bracket notation
A[1], A[2], A[3], …, A[N]
REPRESENTATION OF LINEAR ARRAYS IN MEMORY
 Let LA be a linear array in the memory of the computer.
 Memory of the computer is simply a sequence of addressed locations as follows:

Notation:
LOC (LA [K]) =address of the element LA [K] of the array LA
 Computer does not need to keep track of the address of every element of LA, but
needs to keep track only of the address of the first element of LA, denoted by
Base (LA)
and called the base address of LA.
• Using Base(LA), the computer calculates the address of any
element of LA by the following formula:
LOC (LA [K]) = Base (LA) + w (K - lower bound)
Where,
 w is the number of words per memory cell for the array
LA
 Given any subscript K, one can locate and access the
content of LA[K] without scanning any other element of
LA.
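The address formula can be written as a one-line function. This is a sketch only: the name loc, the use of uintptr_t for addresses, and the sample numbers in the note below are assumptions made for illustration.

```c
#include <stdint.h>

/* LOC(LA[K]) = Base(LA) + w * (K - LB)
   base: address of the first element
   w:    number of words (here, address units) per element
   k:    subscript of the wanted element
   lb:   lower bound of the index set */
uintptr_t loc(uintptr_t base, unsigned w, long k, long lb)
{
    return base + (uintptr_t)w * (uintptr_t)(k - lb);
}
```

With the hypothetical values Base(LA) = 200, w = 4 and lower bound 1932, the element with subscript 1965 would sit at 200 + 4(1965 - 1932) = 332. The cost of the computation does not depend on K, which is why any element can be reached without scanning the others.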
TRAVERSING LINEAR ARRAY

Algorithm: LA is a linear array with N elements. LB and UB are the lower
and upper bounds of the array, respectively. This algorithm traverses the
linear array and applies an operation PROCESS to each element.

1. [Initialize counter.] Set K := LB.
2. Repeat Steps 3 and 4 while K <= UB.
3. [Visit element.] Apply PROCESS to LA[K].
4. [Increase counter.] Set K: =K+1.
[End of Step 2 loop.]
5. Exit.
INSERTING AND DELETING
 Inserting: The operation of adding another element to the
collection A.
 Deleting: The operation of removing one of the elements
from A.

Algorithm: (Inserting into a Linear Array) INSERT (LA, N, K, ITEM)
Here LA is a linear array with N elements and K is a positive
integer such that K<=N. This algorithm inserts an element ITEM
into the Kth position in LA.

1. [Initialize counter.] Set J := N.
2. Repeat Steps 3 and 4 while J >= K.
3. [Move Jth element downward.] Set LA [J+1]:=LA [J].
4. [Decrease counter.] Set J: =J-1.
[End of Step 2 loop.]
5. [Insert element.] Set LA [K]:=ITEM.
6. [Reset N.] Set N: =N+1.
7. Exit.
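The INSERT steps above can be sketched in C as follows. The function name insert_at, the 0-based index k, and the explicit capacity parameter are assumptions added for this illustration; the pseudocode itself uses 1-based positions and does not check for a full array.

```c
#include <assert.h>
#include <stddef.h>

/* Inserts item at 0-based index k of la, which currently holds *n
   elements and has room for capacity elements in total.
   Returns 0 on success, or -1 if the array is already full. */
int insert_at(int *la, size_t *n, size_t capacity, size_t k, int item)
{
    assert(k <= *n);                /* k == *n appends at the end */
    if (*n >= capacity)
        return -1;
    /* Move elements k..n-1 one position toward the end, last one first,
       so that no value is overwritten before it has been copied. */
    for (size_t j = *n; j > k; j--)
        la[j] = la[j - 1];
    la[k] = item;                   /* [Insert element.] */
    (*n)++;                         /* [Reset N.] */
    return 0;
}
```

Inserting 10 at index 3 of {1, 3, 5, 7, 8} (with capacity 6) would give {1, 3, 5, 10, 7, 8}. On average about half the elements have to be moved, which is why insertion into the middle of an array is relatively expensive.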
DELETION
Algorithm: (Deleting from a linear array) DELETE (LA, N, K,
ITEM), here LA is a linear array with N elements and K is a
positive integer such that K<=N. This algorithm deletes
Kth element from LA.
1. Set ITEM: = LA [K].
2. Repeat for J=K to N-1:
[Move J+1st element upward.] Set LA [J]:=LA [J+1]
[End of loop.]
3. [Reset the number N of elements in LA.] Set N: =N-1.
4. Exit.
STRING PROCESSING
Basic terminology:
 Each programming language contains a character set that is used to
communicate with the computer. This set usually includes the following:
 Alphabet: A B C D E F G H I J K L M N O P Q R S T U V W X Y
Z
• Digits: 0 1 2 3 4 5 6 7 8 9
• Special characters: + - / * ( ) , . $ = ‘ □
• The set of special characters includes the blank space, frequently
denoted by □.
• A finite sequence S of zero or more characters is called a string.
• The number of characters in a string is called its length.
• The string with zero characters is called the empty string or null string.
 Specific strings will be denoted by enclosing their characters in single
quotation marks. The quotation marks will also serve as string
delimiters.

Ex. 'THE END', 'TO BE OR NOT TO BE' and '□□'
are strings with lengths 7, 18 and 2, respectively.
Storing Strings: Strings are stored in three types
of structures:
 Fixed length structure
 Variable length structure with fixed maximum
 Linked structure
RECORD ORIENTED, FIXED-LENGTH STORAGE
In fixed-length storage each line of print is viewed as a record,
where all records have the same length, i.e., where each
record accommodates the same number of characters.
Since earlier systems took input from terminals with 80-column
images or from 80-column cards, we will
assume our records have length 80 unless otherwise stated
or implied.
Disadvantages:
 Time is wasted reading an entire record if most of
the storage consists of inessential blank spaces.
• Certain records may require more space than is available.
• Changing a misspelled word requires the entire record to be
changed when the correction consists of more or fewer characters
than the original text.
FIXED LENGTH STRUCTURE
2. Variable-Length Storage with fixed maximum: The storage
of variable-length strings in memory cells with fixed lengths
can be done in two general ways:
 One can use a marker, such as two dollar signs ($$), to signal
the end of the string.
 One can list the length of the string-as an additional item in
the pointer array.
3. Linked Storage: By linked list, we mean a linearly ordered
sequence of memory cells, called nodes, where each node
contains an item, called a link, which points to the next node
in the list.
 Each memory cell is assigned one character or a fixed
number of characters, and a link contained in the cell gives
the address of the cell containing the next character or group
of characters in the string.
VARIABLE LENGTH STRUCTURE WITH FIXED MAXIMUM
VARIABLE LENGTH STRUCTURE
LINKED STRUCTURE WITH ONE WORD AND FOUR WORD
CHARACTER DATA TYPES
Various programming languages handle the character data type. Each data type has its
own formula for decoding a sequence of bits in memory.
Constants: Many programming languages denote string constants by placing the
string in either single or double quotation marks.
• Ex. 'THE END' and "TO BE OR NOT TO BE" are string constants of lengths 7 and 18
characters, respectively.
Variables: Each programming language has its own rules for forming character
variables. However, such variables fall into one of three categories:


 Static: By static character variable, we mean a variable whose length is defined
before the program is executed and cannot change throughout the program.
• Semistatic: By semistatic character variable, we mean a variable whose length
may vary during the execution of the program as long as the length does not
exceed a maximum value determined by the program before the program is
executed.
• Dynamic: By dynamic character variable, we mean a variable whose length can
change during the execution of the program.


These three categories correspond, respectively, to the ways the strings are stored in
the memory of the computer.
STRING OPERATIONS
Although a string may be viewed simply as a sequence or linear array
of characters, there is a fundamental difference in use between strings
and other types of arrays.
Specifically, groups of consecutive elements in a string, called
substrings, may be units unto themselves.
Furthermore, the basic units of access in a string are usually these
substrings, not individual characters.

Substring:
Accessing a substring from a given string requires three pieces of
information:
1. The name of the string or the string itself
2. The position of the first character of the substring in the given
string
3. The length of the substring or the position of the last character of
the substring.
We call this operation SUBSTRING. Specifically, we write
SUBSTRING (string, initial, length)
to denote the substring of a string S beginning in position K
and having length L.
Ex. SUBSTRING ('TO BE OR NOT TO BE', 4, 7) = 'BE OR N'
SUBSTRING ('THE END', 4, 4) = '□END'
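A possible C rendering of SUBSTRING is sketched below. The function name substring, the choice to return a freshly allocated copy, and the NULL result for an out-of-range request are assumptions made for this example; the notes define only the operation itself. Positions are 1-based, as in the text.

```c
#include <stdlib.h>
#include <string.h>

/* SUBSTRING(s, initial, length) with a 1-based initial position.
   Returns a newly allocated string that the caller must free, or NULL
   if the request falls outside s or the allocation fails. */
char *substring(const char *s, size_t initial, size_t length)
{
    size_t n = strlen(s);
    if (initial < 1 || initial - 1 + length > n)
        return NULL;
    char *out = malloc(length + 1);
    if (out == NULL)
        return NULL;
    memcpy(out, s + initial - 1, length);   /* copy the requested characters */
    out[length] = '\0';
    return out;
}
```

Calling substring("THE END", 4, 4) would yield the four characters starting at the blank, that is, a blank followed by END.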
INDEXING

Indexing, also called pattern matching, refers to finding the
position where a string pattern P first appears in a given text
T. We call this operation INDEX and write

INDEX (text, pattern)

If the pattern P does not appear in the text T, then INDEX is
assigned the value 0. The arguments “text” and “pattern” can
be either string constants or string variables.

Ex. Suppose T = 'HIS FATHER IS THE PROFESSOR'. Then
INDEX (T, 'THE'), INDEX (T, 'THEN') and INDEX (T, '□THE□')
have the values 7, 0 and 14, respectively.
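In C, the INDEX operation can be approximated with the standard library function strstr. The wrapper name index_of is hypothetical; it converts the pointer returned by strstr into the 1-based position used in these notes, with 0 meaning "not found". Note that for an empty pattern strstr reports a match at the start, so this sketch would return 1 in that case.

```c
#include <string.h>

/* INDEX(text, pattern): 1-based position of the first occurrence of
   pattern in text, or 0 if pattern does not occur. */
size_t index_of(const char *text, const char *pattern)
{
    const char *hit = strstr(text, pattern);
    return hit ? (size_t)(hit - text) + 1 : 0;
}
```

For T = "HIS FATHER IS THE PROFESSOR", index_of(T, "THE") finds the THE inside FATHER, which is why the result is 7 and not 15.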
CONCATENATION
 Let S1 and S2 be strings.
Then concatenation of S1 and S2, which we denote S1//S2, is
the string consisting of the characters of S1 followed by the
characters of S2.

Ex. Suppose S1 = 'MARK' and S2 = 'TWAIN'. Then:
S1//S2 = 'MARKTWAIN' but S1//'□'//S2 = 'MARK TWAIN'

Length: The number of characters in a string is called
its length. We will write
LENGTH (string)
for the length of a given string.

Ex. Suppose S = 'COMPUTER'. Then:
LENGTH (S) = 8
LENGTH ('MARK TWAIN') = 10
WORD/TEXT PROCESSING
In earlier times, character data processed by the
computer consisted mainly of data items, such as names and
addresses.

Today the computer also processes printed matter, such
as letters, articles and reports.

It is in this latter context that we use the term
“word processing.”

Given some printed text, the operations usually
associated with word processing are as follows:
 Replacement: Replacing one string in the text by another.
 Insertion: Inserting a string in the middle of the text.
 Deletion: Deleting a string from the text.

The above operations can be executed by using the
string operations.
INSERTION
Suppose in a given text T we want to insert a string S so that S
begins in position K. We denote this operation by

INSERT (text, position, string)

Ex: INSERT(‘ABCDEFG’,3,’XYZ’)=’ABXYZCDEFG’
INSERT (‘ABCDEFG’,6,’XYZ’)=’ABCDEXYZFG’

This INSERT function can be implemented by using the
string operations as follows:

INSERT (T, K, S) = SUBSTRING (T, 1, K-1) // S // SUBSTRING (T, K, LENGTH(T)-K+1)
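The three-part concatenation can be sketched in C with three memcpy calls, one for each piece of the formula. The function name insert_string, the heap-allocated result, and the NULL return for an invalid position are assumptions for this illustration.

```c
#include <stdlib.h>
#include <string.h>

/* INSERT(t, k, s): a new string equal to t with s inserted so that s
   begins at 1-based position k. The caller frees the result.
   Returns NULL if k is out of range or the allocation fails. */
char *insert_string(const char *t, size_t k, const char *s)
{
    size_t tlen = strlen(t), slen = strlen(s);
    if (k < 1 || k > tlen + 1)
        return NULL;
    char *out = malloc(tlen + slen + 1);
    if (out == NULL)
        return NULL;
    memcpy(out, t, k - 1);                                /* SUBSTRING(T, 1, K-1) */
    memcpy(out + k - 1, s, slen);                         /* S */
    memcpy(out + k - 1 + slen, t + k - 1, tlen - k + 1);  /* SUBSTRING(T, K, LENGTH(T)-K+1) */
    out[tlen + slen] = '\0';
    return out;
}
```

Allowing k = LENGTH(T) + 1 is a choice made here so that the same function can also append s to the end of t.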
DELETION

Suppose in a given text T we want to delete the substring
which begins at position K and has length L. We denote this
operation by
DELETE (text, position, length)

Ex: DELETE ('ABCDEFG', 4, 2) = 'ABCFG'
DELETE ('ABCDEFG', 2, 4) = 'AFG'

We assume that nothing is deleted if position K = 0. Thus
DELETE ('ABCDEFG', 0, 2) = 'ABCDEFG'

The DELETE function can be implemented using the string
operations as follows:

DELETE (T, K, L) = SUBSTRING (T, 1, K-1) // SUBSTRING (T, K+L, LENGTH(T)-K-L+1)

That is, the initial substring of T before position K is concatenated
with the terminal substring of T beginning at position K+L. The length
of the initial substring is K-1, and the length of the terminal substring
is:
LENGTH (T) - (K+L-1) = LENGTH (T)-K-L+1
We also assume that DELETE (T, K, L) = T when K = 0.
Now suppose, in the text T, we first compute INDEX(T, P), the position
where P first occurs in T, and then we compute LENGTH(P), the
number of characters in P. Then DELETE (T, INDEX(T, P), LENGTH(P))
deletes the first occurrence of P in T.
• When INDEX(T, P) = 0, the text T is not changed.

• Ex1: Suppose T = 'ABCDEFG' and P = 'CD'. Then INDEX(T, P) = 3
and LENGTH(P) = 2. Hence
DELETE ('ABCDEFG', 3, 2) = 'ABEFG'

• Ex2: Suppose T = 'ABCDEFG' and P = 'DC'. Then INDEX(T, P) = 0
and LENGTH(P) = 2. Hence, by the "zero case",
DELETE ('ABCDEFG', 0, 2) = 'ABCDEFG'
Algorithm: A text T and pattern P are in memory. This algorithm
deletes every occurrence of P in T.
1. [Find index of P.] Set K := INDEX(T, P).
2. Repeat while K != 0:
(a) [Delete P from T.]
Set T := DELETE (T, INDEX(T, P), LENGTH(P)).
(b) [Update index.] Set K := INDEX(T, P).
[End of loop.]
3. Write: T.
4. Exit.
Ex. T = XABYABZ, P = AB
The loop is executed twice.
During the first execution, the first occurrence of AB in T is deleted,
with the result T = XYABZ.
During the second execution, the remaining occurrence of AB
in T is deleted, with the result T = XYZ.
REPLACEMENT
Suppose in a given text T we want to replace the first occurrence of
a pattern P1 by a pattern P2. We will denote this operation by
REPLACE (text, pattern1, pattern2)

Ex. REPLACE ('XABYABZ', 'AB', 'C') = 'XCYABZ'
REPLACE ('XABYABZ', 'BA', 'C') = 'XABYABZ'
Specifically, the REPLACE function can be executed by using the
following three steps:
K := INDEX(T, P1)
T := DELETE(T, K, LENGTH(P1))
T := INSERT(T, K, P2)

Suppose we want to replace every occurrence of a pattern P in T
by a pattern Q. This might be accomplished by repeatedly applying
REPLACE (T, P, Q) until INDEX(T, P) = 0.
Algorithm: A text T and patterns P and Q are in memory, this
algorithm replaces every occurrence of P in T by Q.
1. [Find index of P.] Set K := INDEX(T, P).
2. Repeat while K != 0:
(a) [Replace P by Q.] Set T := REPLACE(T, P, Q).
(b) [Update index.] Set K := INDEX(T, P).
[End of loop.]
3. Write: T.
4. Exit.
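The following C sketch replaces every occurrence of P by Q in a single left-to-right pass. It is a variant of the loop above, not a line-by-line translation: the loop above restarts the search from the beginning of T after each replacement, so it would never terminate if Q itself contained P, whereas this version does not rescan text it has already written. The function name replace_all and the heap-allocated result are assumptions for this example.

```c
#include <stdlib.h>
#include <string.h>

/* Returns a newly allocated copy of t in which every non-overlapping
   occurrence of p (found left to right) is replaced by q.
   The caller frees the result. Returns NULL if p is empty or the
   allocation fails. */
char *replace_all(const char *t, const char *p, const char *q)
{
    size_t plen = strlen(p), qlen = strlen(q);
    if (plen == 0)
        return NULL;

    /* First pass: count the occurrences to size the result. */
    size_t count = 0;
    for (const char *s = strstr(t, p); s != NULL; s = strstr(s + plen, p))
        count++;

    size_t tlen = strlen(t);
    char *out = malloc(tlen - count * plen + count * qlen + 1);
    if (out == NULL)
        return NULL;

    /* Second pass: copy the text before each match, then q in its place. */
    char *w = out;
    const char *hit;
    while ((hit = strstr(t, p)) != NULL) {
        size_t gap = (size_t)(hit - t);
        memcpy(w, t, gap);
        w += gap;
        memcpy(w, q, qlen);
        w += qlen;
        t = hit + plen;             /* continue after the replaced match */
    }
    strcpy(w, t);                   /* remaining tail and terminator */
    return out;
}
```

For T = XABYABZ, P = AB and Q = C, both this sketch and the loop above arrive at XCYCZ.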
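Ex. T = XABYABZ, P = AB, Q = C
The loop is executed twice.
During the first execution, the first occurrence of AB in T is replaced
by C, giving T = XCYABZ.
During the second execution, the remaining occurrence of AB is replaced
by C, giving T = XCYCZ.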
PATTERN MATCHING ALGORITHM
Pattern matching is the problem of deciding whether or not
a given string pattern P appears in a string text T. We assume
that the length of P does not exceed the length of T.

The First Pattern Matching Algorithm:
We compare a given pattern P with each of the substrings of
T, moving from left to right, until we get a match. Let

WK = SUBSTRING (T, K, LENGTH(P))

that is, WK is the substring of T having the same length as P and
beginning with the Kth character of T.
Suppose text T = abaabbaabba
and pattern P = bba.
Total number of substrings: MAX = S-R+1,
where S = length of text and R = length of pattern.
So MAX = 11-3+1 = 9 (total substrings):
W1 = aba, W2 = baa, W3 = aab, W4 = abb, W5 = bba, W6 = baa, W7 = aab, W8 = abb, W9 = bba
The first match is W5 = P, so INDEX = 5.
FIRST PATTERN MATCHING ALGORITHM
Algorithm: (Pattern matching) P and T are
strings with lengths R and S, respectively, and
are stored as arrays with one character per
element. This algorithm finds the INDEX of P in
T.
Step 1: [Initialize.] Set K := 1 and MAX := S-R+1.
Step 2: Repeat Steps 3 to 5 while K <= MAX:
Step 3: Repeat for L = 1 to R: [Tests each
character of P.]
If P[L] != T[K+L-1], then: Go to Step 5.
[End of inner loop.]
Step 4: [Success.] Set INDEX := K, and
Exit.
Step 5: Set K := K+1.
[End of Step 2 outer loop.]
Step 6: [Failure.] Set INDEX := 0.
Step 7: Exit.
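The steps above translate fairly directly into C. In this sketch the function name slow_index is hypothetical, strings are ordinary NUL-terminated C strings, and the 1-based subscripts of the pseudocode are mapped onto 0-based C indices. Returning 0 for an empty pattern is a choice made here; the notes do not cover that case.

```c
#include <string.h>

/* First (slow) pattern matching algorithm.
   Returns the 1-based INDEX of p in t, or 0 if p does not occur. */
size_t slow_index(const char *t, const char *p)
{
    size_t s = strlen(t), r = strlen(p);
    if (r == 0 || r > s)
        return 0;
    size_t max = s - r + 1;                 /* number of substrings WK */
    for (size_t k = 1; k <= max; k++) {     /* Step 2: while K <= MAX */
        size_t l = 1;
        /* Step 3: compare P[L] with T[K+L-1] until a mismatch */
        while (l <= r && p[l - 1] == t[k + l - 2])
            l++;
        if (l > r)                          /* Step 4: every character matched */
            return k;
    }
    return 0;                               /* Step 6: failure */
}
```

For T = abaabbaabba and P = bba the function returns 5, matching the substring W5.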
COMPLEXITY OF FIRST PATTERN MATCHING
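Best-case complexity of the first pattern matching algorithm
is?
When P is an r-character string and T is an s-character string, the
data size for the algorithm is
n = r + s
The worst case occurs when every character of P except the last
matches every substring WK. Then
C(n) = r(s-r+1)
For fixed n we have s = n-r, so
$$C(n) = r(n - 2r + 1) = nr - 2r^2 + r$$
Setting the derivative with respect to r to zero:
$$\frac{dC}{dr} = n - 4r + 1 = 0 \quad\Rightarrow\quad r = \frac{n+1}{4}$$
The maximum value of C(n) occurs when r = (n+1)/4,
so
$$C(n) = \frac{n+1}{4}\left(n - \frac{n+1}{2} + 1\right) = \frac{(n+1)^2}{8} = O(n^2)$$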
FIND INDEX AND COMPARISONS WITH FIRST PATTERN
MATCHING

1. P = aaba and T = cdcd…cd = $(cd)^{10}$

2. P = aaab and T = aaaaa…a = $(a)^{20}$

3. P = abc and T = $(ab)^{5}$
SECOND PATTERN MATCHING ALGORITHM
The second pattern matching algorithm uses a table which is derived from a
particular pattern P but is independent of the text T.
Ex. Suppose P = aaba.
Suppose T = T1 T2 T3 …, where Ti denotes the ith character of T; and
suppose the first two characters of T match those of P; i.e., suppose T = aa…
Then T has one of the following three forms:
1. T = aab…
2. T = aaa…
3. T = aax…
where x is any character different from a or b.
Suppose we read T3 and find that T3 = b.
Then we next read T4 to see if T4 = a, which will give a match of P with
W1.
On the other hand, suppose T3 = a. Then we know that P != W1; but we
also know that W2 = aa…, i.e., that the first two characters of the
substring W2 match those of P.
Hence we next read T4 to see if T4 = b.
Last, suppose T3 = x. Then we know that P != W1, but we also know that
P != W2 and P != W3, since x does not appear in P.
Hence we next read T4 to see if T4 = a, i.e., to see if the first character of
W4 matches the first character of P.
Ex. The following table is used in our second pattern matching algorithm for
the pattern P = aaba.
The table is obtained as follows. First of all, we let Qi denote the initial
substring of P of length i; hence
Q0 = Λ, Q1 = a, Q2 = a^2, Q3 = a^2 b, Q4 = a^2 ba = P
(Here Q0 = Λ is the empty string.)
The rows of the table are labeled by these initial substrings of P, excluding P
itself. The columns of the table are labeled a, b and x, where x represents any
character that doesn't appear in the pattern P.

      a    b    x
Q0    Q1   Q0   Q0
Q1    Q2   Q0   Q0
Q2    Q2   Q3   Q0
Q3    P    Q0   Q0

[Figure: pattern matching graph for P = aaba, with nodes Q0, Q1, Q2, Q3 and P]
SECOND PATTERN MATCHING/FAST PATTERN MATCHING ALGORITHM
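As an illustration of how such a table drives the search, here is a sketch in C that hard-codes the table for the single pattern P = aaba. The names F, column and fast_index_aaba are hypothetical, and building the table for an arbitrary pattern is not shown. Each text character is read exactly once: the current state and the character select the next state, and reaching state P means a match has just ended.

```c
#include <stddef.h>

/* States: Qi means "the last i characters read match the first i of P". */
enum { Q0, Q1, Q2, Q3, P_MATCHED };

/* F[state][column]; column 0 = 'a', 1 = 'b', 2 = any other character (x). */
static const int F[4][3] = {
    /* Q0 */ { Q1,        Q0, Q0 },
    /* Q1 */ { Q2,        Q0, Q0 },
    /* Q2 */ { Q2,        Q3, Q0 },
    /* Q3 */ { P_MATCHED, Q0, Q0 },
};

static int column(char c)
{
    return c == 'a' ? 0 : c == 'b' ? 1 : 2;
}

/* Returns the 1-based index of the first occurrence of "aaba" in t,
   or 0 if there is none. */
size_t fast_index_aaba(const char *t)
{
    int state = Q0;
    for (size_t k = 0; t[k] != '\0'; k++) {
        state = F[state][column(t[k])];
        if (state == P_MATCHED)
            return k - 2;   /* match ends at 0-based k, so it starts at 1-based k-2 */
    }
    return 0;
}
```

Because the text pointer never moves backward, the number of table lookups is at most the length of T, in contrast to the r(s-r+1) comparisons the first algorithm may need.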
SEARCHING: LINEAR SEARCH:
Let DATA be a collection of data elements in
memory, and suppose a specific ITEM of information
is given.
• Searching refers to the operation of finding the
location LOC of ITEM in DATA, or printing some
message that ITEM does not appear there.
The search is said to be successful if ITEM does
appear in DATA and unsuccessful otherwise.

Linear Search: The method of searching, which
traverses DATA sequentially to locate ITEM, is called
linear search or sequential search.
LINEAR SEARCH

Algorithm:
(Linear search) LINEAR (DATA, N, ITEM, LOC),
here DATA is a linear array with N elements, and
ITEM is a given item of information. This algorithm
finds the location LOC of ITEM in DATA, or sets
LOC: =0 if the search is unsuccessful.
1. [Insert ITEM at the end of DATA.] Set
DATA [N+1]:= ITEM.
2. [Initialize counter.] Set LOC: =1.
3. [Search for ITEM.]
Repeat while DATA [LOC]! =ITEM:
Set LOC: =LOC+1
[End of loop]
4. [Successful?] If LOC= N+1, then: Set LOC: =0.
5. Exit.
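The sentinel version above can be sketched in C as follows. The function name linear_search is hypothetical, and the sketch assumes the caller has reserved one spare slot after the n real elements for the sentinel. Placing ITEM in that slot guarantees the loop stops, so the loop needs no separate bounds test.

```c
#include <stddef.h>

/* Sentinel linear search.
   data must have room for n + 1 elements; slot n is overwritten.
   Returns the 1-based location of item, or 0 if the search fails. */
size_t linear_search(int *data, size_t n, int item)
{
    data[n] = item;                 /* Step 1: DATA[N+1] := ITEM */
    size_t loc = 0;                 /* 0-based while scanning */
    while (data[loc] != item)       /* Step 3: one comparison per element */
        loc++;
    return loc == n ? 0 : loc + 1;  /* Step 4: hitting the sentinel means failure */
}
```

An unsuccessful search compares ITEM with all n elements and then with the sentinel, which is where the count of n+1 comparisons comes from.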
COMPLEXITY OF THE LINEAR SEARCH ALGORITHM

 Measured by the number f(n) of comparisons required to


find ITEM in DATA where DATA contains n elements.
 Worst Case: occurs when one must search through the
entire array DATA, i.e., when ITEM does not appear in
DATA.
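In this case f(n) = n+1 comparisons are required, so the worst-case running
time is proportional to n, i.e., f(n) = O(n).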
BINARY SEARCH
Algorithm:(Binary Search) BINARY(DATA, LB, UB, ITEM, LOC), here DATA is a sorted array
with lower bound LB and upper bound UB, and ITEM is a given item of information, the
variables BEG, END and MID denote, respectively, the beginning, end and middle locations of
a segment of elements of DATA. This algorithm finds the location LOC of ITEM in DATA or
sets LOC=NULL.

1. [Initialize segment variables.]
Set BEG := LB, END := UB and MID := INT((BEG+END)/2).
2. Repeat Steps 3 and 4 while BEG<=END and DATA[MID]!=ITEM.
3. If ITEM < DATA[MID], then:
Set END := MID-1.
Else:
Set BEG := MID+1.
[End of If structure.]
4. Set MID := INT((BEG+END)/2).
[End of Step 2 loop.]
5. If DATA[MID] = ITEM, then:
Set LOC := MID.
Else:
Set LOC := NULL.
[End of If structure.]
6. Exit.
BINARY SEARCH EXAMPLE
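A small C sketch of binary search is given below. The function name binary_search, the use of 0-based indices, and the value -1 standing in for NULL are assumptions made for this illustration. The midpoint is computed as beg + (end - beg) / 2, which gives the same value as INT((BEG+END)/2) for non-negative indices while avoiding overflow of the sum.

```c
/* Binary search in the sorted segment data[lb..ub] (inclusive bounds).
   Returns the index of item, or -1 if item is not present. */
long binary_search(const int *data, long lb, long ub, int item)
{
    long beg = lb, end = ub;
    while (beg <= end) {
        long mid = beg + (end - beg) / 2;
        if (data[mid] == item)
            return mid;
        if (item < data[mid])
            end = mid - 1;          /* continue in the left half */
        else
            beg = mid + 1;          /* continue in the right half */
    }
    return -1;                      /* segment is empty: item is absent */
}
```

With the hypothetical sorted array {11, 22, 30, 33, 40, 44, 55, 60, 66, 77, 80, 88, 99}, searching for 40 examines the elements 55, 30, 40 in turn and returns index 4. Each step halves the segment, so roughly log2 n comparisons are needed.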
MULTIDIMENSIONAL ARRAYS
The linear arrays discussed so far are also called one-dimensional arrays,
since each element in the array is referenced by a single subscript.
Most programming languages allow two-dimensional and three-dimensional
arrays, i.e., arrays where elements are referenced,
respectively, by two and three subscripts.
Two-Dimensional Arrays:
A two-dimensional m x n array A is a collection of m·n data
elements such that each element is specified by a pair of integers
(such as J, K), called subscripts.
Two-dimensional arrays are called matrices in mathematics and
tables in business applications; hence two-dimensional arrays are
sometimes called matrix arrays.
There is a standard way of drawing a two-dimensional m x n array
A where the elements of A form a rectangular array with m rows
and n columns and where the element A[J,K] appears in row J and
column K.
REPRESENTATION OF TWO-DIMENSIONAL ARRAYS IN
MEMORY:

Let A be a two-dimensional m x n array. The array will be represented
in memory by a block of m·n sequential memory locations. Specifically,
the programming language will store the array A in either
• Column-major order, or
• Row-major order

The particular representation used depends upon the programming
language, not the user.

Recall that, for a linear array LA, the computer does not keep track of the
address LOC(LA[K]) of every element LA[K], but does keep track of Base(LA),
the address of the first element of LA. The computer uses the formula

LOC (LA [K]) = Base (LA) + w (K-1)

to find the address of LA[K] in time independent of K. (Here w is the number of
words per memory cell for the array LA, and 1 is the lower bound of the
index set of LA.)

A similar situation holds for a two-dimensional m x n array A. The computer
keeps track of Base(A), the address of the first element A[1, 1] of A, and
computes the address LOC(A[J, K]) of A[J, K] using one of the following
formulas, where M is the number of rows and N is the number of columns.

For column-major order:
LOC (A [J, K]) = Base (A) + w [M (K-1) + (J-1)]
For row-major order:
LOC (A [J, K]) = Base (A) + w [N (J-1) + (K-1)]
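The bracketed parts of the two formulas are element offsets from Base(A), and can be written as small C functions. The names row_major_offset and col_major_offset, and the sample numbers in the note below, are assumptions made for this illustration; J and K are 1-based as in the formulas.

```c
#include <stddef.h>

/* Offset, in elements, of A[J, K] from Base(A) when A is stored row by
   row. n_cols is N, the number of columns. */
size_t row_major_offset(size_t n_cols, size_t j, size_t k)
{
    return n_cols * (j - 1) + (k - 1);
}

/* Offset, in elements, of A[J, K] from Base(A) when A is stored column
   by column. m_rows is M, the number of rows. */
size_t col_major_offset(size_t m_rows, size_t j, size_t k)
{
    return m_rows * (k - 1) + (j - 1);
}
```

For a hypothetical 25 x 4 array with Base(A) = 200 and w = 4, the element A[12, 3] would be at 200 + 4[25(3-1) + (12-1)] = 444 in column-major order and at 200 + 4[4(12-1) + (3-1)] = 384 in row-major order. C itself stores two-dimensional arrays in row-major order.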
General Multidimensional Arrays: An n-dimensional m1 x m2 x … x mn
array B is a collection of m1·m2·…·mn
data elements in which each element is specified by a list of n
integers, such as K1, K2, …, Kn, called subscripts.

The array will be stored in memory in a sequence of memory
locations. Specifically, the programming language will store the
array B either in row-major or in column-major order.
The definition of general multidimensional arrays also permits lower bounds
other than 1. Let C be such an n-dimensional array. The index set for each
dimension of C consists of the consecutive integers from the lower bound to
the upper bound of the dimension. The length Li of dimension i of C is the
number of elements in the index set, and Li can be calculated from
Li = upper bound - lower bound + 1
For a given subscript Ki, the effective index Ei of Li is the number of indices
preceding Ki in the index set, and Ei can be calculated from
Ei = Ki - lower bound
Then the address LOC(C[K1, K2, …, KN]) of an arbitrary element of
C can be obtained from the formula

 For Column major order


$$\mathrm{Base}(C) + w\,[((\cdots((E_N L_{N-1} + E_{N-1})L_{N-2}) + \cdots + E_3)L_2 + E_2)L_1 + E_1]$$

• For row-major order

$$\mathrm{Base}(C) + w\,[(\cdots((E_1 L_2 + E_2)L_3 + E_3)L_4 + \cdots + E_{N-1})L_N + E_N]$$
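Both nested expressions can be evaluated with one multiply and one add per dimension. The C sketch below is illustrative only: the function names and the representation of the bounds as parallel arrays lower and upper are assumptions, and the result is the element offset that gets multiplied by w and added to Base(C).

```c
#include <stddef.h>

/* Row-major offset: (...((E1*L2 + E2)*L3 + E3)...)*LN + EN,
   with Ei = k[i] - lower[i] and Li = upper[i] - lower[i] + 1. */
size_t row_major_offset_nd(size_t n, const long *k,
                           const long *lower, const long *upper)
{
    size_t offset = 0;
    for (size_t i = 0; i < n; i++) {
        size_t len = (size_t)(upper[i] - lower[i] + 1);
        size_t e = (size_t)(k[i] - lower[i]);
        offset = offset * len + e;
    }
    return offset;
}

/* Column-major offset: (...((EN*L(N-1) + E(N-1))*L(N-2) + ...)*L1 + E1,
   evaluated by walking the dimensions from last to first. */
size_t col_major_offset_nd(size_t n, const long *k,
                           const long *lower, const long *upper)
{
    size_t offset = 0;
    for (size_t i = n; i-- > 0; ) {
        size_t len = (size_t)(upper[i] - lower[i] + 1);
        size_t e = (size_t)(k[i] - lower[i]);
        offset = offset * len + e;
    }
    return offset;
}
```

As a hypothetical example, take a three-dimensional array with index ranges 2..8, -4..1 and 6..10, so that L1 = 7, L2 = 6 and L3 = 5. For the subscripts (5, -1, 8) the effective indices are E1 = 3, E2 = 3 and E3 = 2, giving a row-major offset of (3·6 + 3)·5 + 2 = 107 and a column-major offset of (2·6 + 3)·7 + 3 = 108.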
