
DS 1

This document outlines the syllabus for a course on data structures. It covers linear data structures like arrays, linked lists, stacks, and queues. It also covers non-linear data structures like trees and graphs. Key topics include algorithms for common data structure operations, analyzing algorithm complexity, and different sorting methods. The objectives are to understand various data structures and algorithms, and be able to implement common operations on both linear and non-linear data structures.

Uploaded by

Nayan Gaulkar

4IT04 DATA STRUCTURE

Faculty : Dr. Pranjali Deshmukh


SYLLABUS
 UNIT I :
Algorithms and Linear Data Structure: Array Introduction: Data, Data
Structure and their types. Algorithm and their Complexity, String
processing operations, Pattern matching algorithms: fast and slow.
Array: Types of array, memory representation of array, Algorithm and
operations on Array: traversing, searching, insertion, deletion.
Applications (7 Hrs)

 UNIT II:
Algorithms and Linear Data Structure: Linked List (LL) Linked List:
Features, Representation of Linked List in memory using array, Types
of LL, Algorithms and operations onto LL: traversing, insertion,
deletion, searching & their implementation, Applications (5 Hrs)

 UNIT III:
Linear Data Structure: Stack and Queue Stack: Definition, Memory
representation of Stacks using array and Linked List. Operations on to
Stack: Push and Pop. Stack Applications: Recursion, Solve arithmetic
expressions, tower of Hanoi etc. Queue: Definition, Memory
representation of Queue using array and Linked List, Types of queue,
Operations on queues: Traversing, Insertion, Deletion, Searching.
Applications (6 Hrs)
 UNIT IV:
Sorting, Sorting Methods and its Algorithms Simple Sorting
Algorithms, Bubble Sort, Quick Sort, Insertion Sort, Selection
Sort, Heap Sort, Merge Sort, Bucket Sort and their Applications.
(6 Hrs)

 UNIT V:
Non-Linear Data Structure: Tree Trees: Terminology, Types,
Binary trees and their representation in memory, traversing in
binary trees using stacks. Binary Search Trees, searching,
inserting and deleting nodes in binary trees, Heap tree, Path
length & Huffman’s algorithm, Spanning Trees, Basic concepts of
Kruskal’s and Prim’s Algorithm, B+ tree. (6 Hrs)

 UNIT VI:
Non-Linear Data Structure: Graph Graph: Definitions,
Sequential and Linked-list representation of Graphs, Warshall’s
algorithm, Bridges in graph, Johnson’s algorithm. Graph
Traversals: Breadth First Search, Depth First Search, Topological
Sort, Shortest Path Algorithms: Unweighted Shortest Paths,
Basic concepts of Dijkstra’s Algorithm.
 Text Books:
1. Mark Allen Weiss, ‘Data Structures and Algorithm
Analysis in C++’, 3/e, Florida International
University, ISBN 0-321-37531-9
2. Seymour Lipschutz, ‘Theory & Problems of Data
Structures’, Schaum’s Outline Series (Mc Graw-Hill)
International Editions
 Reference Books:
1. John Hubbard, ‘Schaum’s Outline of Data Structures
with C++’, ISBN-13: 978-0071353458
2. Jean-Paul Tremblay, Paul G. Sorenson,
‘An Introduction to Data Structures With
Applications’, (McGraw-Hill Computer Science
Series), ISBN-13: 978-0070651579
3. Ellis Horowitz, Sartaj Sahni, Rajasekaran,
‘Computer Algorithms/C++’, 2nd edition, 2019.
COURSE OBJECTIVES
 To understand the role of Data Structure in memory
management
 To acquire knowledge of different types of data structures like:
array, types of array, linked list, stacks, queues, trees, and their
memory representation
 To learn the fundamental concept of data structure and
emphasize the importance of it in developing and implementing
efficient algorithms
 To analyze complexity of algorithms in terms of time and
memory space
 To understand data structures, their types and their
common applications
 To study the use of algorithms to perform the operations on data
structure such as traversing, insertion, deletion, searching,
sorting and merging
 To understand importance and applications of linear and non-
linear data structure
 To obtain knowledge and skill in sorting methods such as:
Bubble Sort, Quick Sort, Merge Sort, Selection Sort and Bucket
Sort
 To learn and acquire knowledge about the use of trees and
graphs in applications
COURSE OUTCOMES
 Define fundamental features of array, linked-list, stack,
queue, tree and graph
 Write the algorithms to perform various operations such
as: Search, Insertion, Deletion, Sort etc
 Implement algorithms for various operations on linear
and non-linear data structure
 Classify the linear data structures such as Array, Linked-
List, Stack, Queue and non-linear data Structures such
as Tree and Graph with their applications
 Implement linear data structures: Array, Linked-list,
Stack, Queue using suitable language C,C++
 Implement non-linear data structure: Tree, Graph using
C or C++
 Know different types of sorting methods and their
algorithms
 Choose appropriate algorithm for Searching
 Perform operations of traverse, insertion, deletion
UNIT I

Algorithms and Linear Data Structure: Array


Introduction: Data, Data Structure and their types.
Algorithm and their Complexity, String processing
operations, Pattern matching algorithms: fast and
slow. Array: Types of array, memory representation
of array, Algorithm and operations on Array:
traversing, searching, insertion, deletion.
Applications
DATA STRUCTURES
➢Data may be organized in many different ways; the
logical or mathematical model of a particular
organization of data is called a data structure.

➢The choice of a particular data model depends on two
considerations.
➢ First, it must be rich enough in structure to mirror
the actual relationships of the data in the real
world.
➢ On the other hand, the structure should be simple
enough that one can effectively process the data
when necessary.
Classification of Data Structures
DATA STRUCTURE OPERATIONS
The data appearing in data structures are processed by means of
certain operations as follows:
•Traversing: Accessing each record exactly once so that certain items
in the record may be processed.
•Searching: finding the location of the record with a given key value,
or finding the locations of all records which satisfy one or more
conditions.
•Inserting: Adding a new record to the structure.
•Deleting: Removing a record from the structure.
The following two operations, which are used in special situations,
will also be considered.
•Sorting: Arranging the records in some logical order.
•Merging: Combining the records in two different sorted files into a
single sorted file.
ALGORITHMS: COMPLEXITY, TIME-SPACE TRADEOFF:

➢Algorithm: An algorithm is a well-defined list of steps for solving a
particular problem. The time and space it uses are two major measures of the
efficiency of an algorithm.

➢Complexity: The complexity of an algorithm is the function which gives
the running time and/or space in terms of the input size.

➢Time-Space Tradeoff:
• Each of our algorithms will involve a particular data structure.
Accordingly, we may not always be able to use the most efficient
algorithm, since the choice of data structure depends on many things,
including the type of data and the frequency with which
various data operations are applied.

• Sometimes the choice of data structure involves a time-space tradeoff:
by increasing the amount of space for storing the data, one may be able
to reduce the time needed for processing the data, or vice versa.
ALGORITHMIC NOTATIONS
Conventions that we will use in presenting algorithms are:

Identifying Numbers: Each algorithm is assigned an identifying number

Steps, Control, Exit: The steps of the algorithm are executed one after
other, beginning with step 1, unless indicated otherwise. Control may be
transferred to step n of the algorithm by the statement “Go to step n”.
Generally Go to statements may be practically eliminated by using certain
control structures. The algorithm is completed when statement Exit is
encountered.

Comments: Each step may contain a comment in brackets which
indicates the main purpose of the step.

Variable Names: Variable names will use capital letters, as in MAX and
DATA.

Assignment Statement: Assignment statements will use the dots-equal
notation := that is used in Pascal.

Ex. MAX := DATA[1] assigns the value in DATA[1] to MAX.
 Input and Output: Data may be input and assigned to variables by
means of a Read statement with following form:
Read: Variable names.

 Similarly, messages, placed in quotation marks, and data in


variables may be output by means of a Write or Print statement
with the following form:
 Write: Message and/or variable names.

 Procedures: The term procedure will be used for an independent


algorithmic module which solves a particular problem.

 Control Structures: Algorithms and their equivalent computer


programs are more easily understood if they mainly use self-
contained modules and three types of logic, or flow of control,
called
❑ Sequence logic, or sequential flow
❑ Selection logic, or conditional flow
❑ Iteration logic, or repetitive flow
COMPLEXITY OF ALGORITHMS
 In order to compare algorithms, we must have some criteria to
measure the efficiency of algorithms.
 Suppose M is an algorithm, and suppose n is the size of the
input data.
 The time and space used by the algorithm M are two main
measures for the efficiency of M.
 The time is measured by counting the number of key
operations- in sorting and searching algorithms, for example,
the number of comparisons. The space is measured by counting
the maximum of memory needed by the algorithm.
 The complexity of an algorithm M is the function f(n) which
gives the running time and/or storage space requirement of the
algorithm in terms of the size n of the input data.
 Frequently, the storage space required by an algorithm is
simply a multiple of the data size n.
 Accordingly, unless otherwise stated or implied, the term
complexity shall refer to the running time of the algorithm.

 The two cases one usually investigates in complexity


theory are as follows:
❑ Worst case: The maximum value of f(n) for any possible
input.
❑ Average case: The expected value of f(n).
❑ Sometimes we also consider the minimum possible value of
f(n), called the best case
 The average case analysis assumes a certain probabilistic
distribution for the input data; one such assumption
might be that all possible permutations of an input data
set are equally likely
EX. (LINEAR SEARCH)
 A linear array DATA with N elements and a specific ITEM of
information are given. This algorithm finds the location LOC of
ITEM in the array DATA or sets LOC=0.
1. [Initialize.] Set K:=1 and LOC:=0.
2. Repeat Steps 3 and 4 while LOC=0 and K<=N.
3. If ITEM=DATA[K], then: Set LOC:=K.
4. Set K:=K+1. [Increments counter.]
[End of Step 2 loop.]
5. [Successful?]
If LOC=0, then:
Write: ITEM is not in the array DATA.
Else:
Write: LOC is the location of ITEM.
[End of If structure.]
6. Exit.
 The complexity of the search algorithm is given by the number C of
comparisons between ITEM and DATA[K]. We seek C(n) for the
worst case and the average case.
 Worst case: The worst case occurs when ITEM is the last element in the
array DATA or is not there at all. In either situation, we have C(n)=n.
Accordingly, C(n)=n is the worst-case complexity of the linear search
algorithm.
 Average case: Here we assume that ITEM does appear in DATA,
and that it is equally likely to occur at any position in the array.
Accordingly, the number of comparisons can be any of the numbers
1,2,3,...,n, and each number occurs with probability p=1/n. Then
C(n) = 1(1/n) + 2(1/n) + ... + n(1/n) = (1+2+...+n)/n = (n+1)/2
That is, on average about half of the array is examined.
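The comparison count of this linear search can be checked with a small Python sketch. The function name linear_search and the sample values are illustrative assumptions; positions are 1-based as in the slides, and 0 means "not found".

```python
def linear_search(data, item):
    """Return (loc, comparisons); loc is a 1-based position, or 0 if absent."""
    comparisons = 0
    for k, value in enumerate(data, start=1):
        comparisons += 1              # one comparison of ITEM with DATA[K]
        if value == item:
            return k, comparisons
    return 0, comparisons


data = [11, 22, 33, 44, 55]           # sample values, chosen for illustration
n = len(data)

# Average number of comparisons when ITEM is equally likely
# to be at any of the n positions.
average = sum(linear_search(data, x)[1] for x in data) / n
print(average, (n + 1) / 2)
```

Under the stated assumption the measured average agrees with (n+1)/2, and an unsuccessful search costs n comparisons.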
RATE OF GROWTH; BIG O NOTATION
 Suppose M is an algorithm, and suppose n is the size of the input data.
 Clearly the complexity f(n) of M increases as n increases.
 It is usually the rate of increase of f(n) that we want to examine.
 This is usually done by comparing f(n) with some standard function, such
as
log2 n, n log2 n, n^2, n^3, 2^n
 The rates of growth for these standard functions are indicated in following
table, which gives approximate values for certain values of n.

 Suppose f(n) and g(n) are functions defined on the positive integers with
the property that f(n) is bounded by some multiple of g(n) for almost all n.
That is, suppose there exist a positive integer n0 and a positive number M
such that, for all n > n0, we have |f(n)| <= M|g(n)|. Then we may write
 f(n)=O(g(n))
 which is read “f(n) is of order g(n).”
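As a rough illustration, the standard functions can be tabulated and the bound |f(n)| <= M|g(n)| spot-checked numerically. This is only a sketch: the helper names, the sample function f(n) = 3n^2 + 5n and the constants M = 4, n0 = 5 are assumptions chosen for the example, and a finite check is evidence rather than a proof.

```python
import math


def growth_row(n):
    """Values of the standard comparison functions at n."""
    return {
        "log2 n": math.log2(n),
        "n log2 n": n * math.log2(n),
        "n^2": n ** 2,
        "n^3": n ** 3,
        "2^n": 2 ** n,
    }


for n in (5, 10, 100):
    print(n, growth_row(n))


def bounded_by(f, g, M, n0, upto=1000):
    """Check |f(n)| <= M*|g(n)| for n0 < n <= upto (a finite spot check)."""
    return all(abs(f(n)) <= M * abs(g(n)) for n in range(n0 + 1, upto + 1))


# 3n^2 + 5n <= 4n^2 holds whenever n >= 5, so f(n) = O(n^2).
print(bounded_by(lambda n: 3 * n * n + 5 * n, lambda n: n * n, M=4, n0=5))
```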
LINEAR ARRAYS
 A Linear array is a list of a finite number n of homogeneous data elements
such that:
❑ The elements of the array are referenced respectively by an index set
consisting of n consecutive numbers.
❑ The elements of the array are stored respectively in successive memory
locations.
 The number n of elements is called the length or size of the array
 If not explicitly stated, we will assume the index set consists of the integers
1,2,…n.
 In general, the length or the number of elements of the array can be obtained
from the index set by the formula.
Length = UB-LB+1
Where,
▪ UB- Largest Index
▪ LB- Smallest Index
 The elements of an array A may be denoted by the subscript notation
A1, A2, A3, ..., An or by the bracket notation
A[1], A[2], A[3], ..., A[N]
REPRESENTATION OF LINEAR ARRAYS IN MEMORY
 Let LA be a linear array in the memory of the computer.
 Memory of the computer is simply a sequence of addressed locations as follows:

Notation:
LOC (LA [K]) =address of the element LA [K] of the array LA
 Computer does not need to keep track of the address of every element of LA, but needs
to keep track only of the address of the first element of LA, denoted by
Base (LA) Called base address of LA
 Using Base(LA), the computer calculates the address of any element of LA by the
following formula:
LOC (LA [K]) =Base (LA) + w (K-lower bound)
Where,
 w is the number of words per memory cell for the array LA

 Given any subscript K, one can locate and access the content
of LA[K] without scanning any other element of LA.
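The address formula can be written as a one-line Python function. The name loc and the numbers in the example (base address 200, w = 4, index set starting at 1932) are illustrative assumptions, not values taken from the slides.

```python
def loc(base, w, k, lower_bound=1):
    """Address of LA[K]: Base(LA) + w * (K - lower bound)."""
    return base + w * (k - lower_bound)


# Array indexed 1932..1984, 4 words per element, stored from address 200.
print(loc(base=200, w=4, k=1965, lower_bound=1932))
```

The cost of the computation does not depend on K, which is why any element can be accessed without scanning the others.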
TRAVERSING LINEAR ARRAY

Algorithm: LA is a linear array with N
elements. LB and UB are the lower and upper bounds of the array,
respectively. This algorithm traverses the linear array and
applies an operation PROCESS to each element.

1. [Initialize counter.] Set K: =LB.


2. Repeat Steps 3 and 4 while K<=UB.
3. [Visit element.] Apply PROCESS to LA[K].
4. [Increase counter.] Set K: =K+1.
[End of Step 2 loop.]
5. Exit.
INSERTING AND DELETING
 Inserting: The operation of adding another element to the
collection A.
 Deleting: The operation of removing one of the elements from A.

Algorithm: (Inserting into a Linear Array) INSERT (LA, N, K,


ITEM) Here LA is a linear array with N elements and K is a positive
integer such that K<=N. This algorithm inserts an element ITEM
into the Kth position in LA.

1. [Initialize counter.] Set J: = N.


2. Repeat Steps 3 and 4 while J>=K
3. [Move Jth element downward.] Set LA [J+1]:=LA [J].
4. [Decrease counter.] Set J: =J-1.
[End of Step 2 loop.]
5. [Insert element.] Set LA [K]:=ITEM.
6. [Reset N.] Set N: =N+1.
7. Exit.
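A minimal Python sketch of the INSERT algorithm, assuming the list holds exactly the N elements of LA and that K is a 1-based position (the function name insert is an illustrative choice). The explicit shifting loop mirrors Steps 2 to 4; in everyday Python one would simply call list.insert.

```python
def insert(la, k, item):
    """Insert item at 1-based position k of list la and return the new length."""
    n = len(la)
    la.append(None)            # make room for LA[N+1]
    j = n
    while j >= k:
        la[j] = la[j - 1]      # LA[J+1] := LA[J] in 1-based terms
        j -= 1
    la[k - 1] = item           # LA[K] := ITEM
    return len(la)             # N := N + 1


names = ["A", "B", "C", "D"]
insert(names, 2, "X")
print(names)
```

Inserting near the front moves almost every element, so the cost grows with the number of elements after position K.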
DELETION
Algorithm: (Deleting from a linear array) DELETE (LA, N, K,
ITEM), here LA is a linear array with N elements and K is a
positive integer such that K<=N. This algorithm deletes Kth
element from LA.
1. Set ITEM: = LA [K].
2. Repeat for J=K to N-1:
[Move J+1st element upward.] Set LA [J]:=LA [J+1]
[End of loop.]
3. [Reset the number N of elements in LA.] Set N: =N-1.
4. Exit.
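The DELETE algorithm can be sketched in Python in the same way, again assuming the list holds exactly the N elements and K is 1-based (the name delete is illustrative).

```python
def delete(la, k):
    """Remove and return the element at 1-based position k of list la."""
    n = len(la)
    item = la[k - 1]               # ITEM := LA[K]
    for j in range(k, n):          # J = K, ..., N-1 (1-based)
        la[j - 1] = la[j]          # LA[J] := LA[J+1]
    la.pop()                       # N := N - 1
    return item


names = ["A", "B", "C", "D"]
removed = delete(names, 2)
print(removed, names)
```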
STRING PROCESSING
Basic terminology:
 Each programming language contains a character set that is used to
communicate with the computer. This set usually includes the following:
❑ Alphabet: A B C D E F G H I J K L M N O P Q R S T U V W X Y
Z
❑ Digits: 0 1 2 3 4 5 6 7 8 9
❑ Special characters: + - / * ( ) , . $ = ‘ □
 The set of special characters includes the blank space, frequently
denoted by □.
 A finite sequence S of zero or more characters is called a string.
 The number of characters in a string is called its length.
 The string with zero characters is called the empty string or null string.
 Specific strings will be denoted by enclosing their characters in single
quotation marks. The quotation marks will also serve as string
delimiters.

Ex. ‘THE END’, ‘TO BE OR NOT TO BE’ and ‘□□’
are strings with lengths 7, 18 and 2, respectively.
Storing Strings: Strings are stored in three types
of structures:
❑ Fixed-length structure
❑ Variable-length structure with fixed maximum
❑ Linked structure
RECORD ORIENTED, FIXED-LENGTH STORAGE
In fixed-length storage each line of print is viewed as a record,
where all records have the same length, i.e., where each
record accommodates the same number of characters.
Since earlier systems took input from terminals with 80-column
images or from 80-column cards, we will
assume our records have length 80 unless otherwise stated
or implied.
Disadvantages:
 Time is wasted reading an entire record if most of the
storage consists of inessential blank spaces.
 Certain records may require more space than available

 When correction consists of more or fewer characters than


the original text, changing a misspelled word requires the
entire record to be changed.
FIXED LENGTH STRUCTURE
2. Variable-Length Storage with fixed maximum: The storage
of variable-length strings in memory cells with fixed lengths
can be done in two general ways:
 One can use a marker, such as two dollar signs ($$), to signal
the end of the string.
 One can list the length of the string as an additional item in
the pointer array.
3. Linked Storage: By linked list, we mean a linearly ordered
sequence of memory cells, called nodes, where each node
contains an item, called a link, which points to the next node
in the list.
 Each memory cell is assigned one character or a fixed
number of characters, and a link contained in the cell gives
the address of the cell containing the next character or group
of characters in the string.
VARIABLE LENGTH STRUCTURE WITH FIXED MAXIMUM
VARIABLE LENGTH STRUCTURE
LINKED STRUCTURE WITH ONE WORD AND FOUR WORD
CHARACTER DATA TYPES
Programming languages handle the character data type in different ways. Each data type has its
own formula for decoding a sequence of bits in memory.
 Constants: Many programming languages denote string constants by placing the

string in either single or double quotation marks.


 Ex. ‘THE END’ “TO BE OR NOT TO BE” are string constants of lengths 7 and 18

characters respectively.
 Variables: Each programming language has its own rules for forming character

variables. However, such variables fall into one of three categories:


❑ Static: By static character variable, we mean a variable whose length is defined

before the program is executed and cannot change throughout the program.
❑ Semistatic: By semistatic character variable, we mean a variable whose length

may vary during the execution of the program as long as the length does not
exceed a maximum value determined by the program before the program is
executed.
❑ Dynamic: By dynamic character variable, we mean a variable whose length can

change during the execution of the program.


These three categories correspond, respectively, to the ways the strings are stored in
the memory of the computer.
STRING OPERATIONS
Although a string may be viewed simply as a sequence or linear array
of characters, there is a fundamental difference in use between strings
and other types of arrays.
Specifically, groups of consecutive elements in a string, called
substrings, may be units unto themselves.
Furthermore, the basic units of access in a string are usually these
substrings, not individual characters.

Substring:
Accessing a substring from a given string requires three pieces of
information:
1. The name of the string or the string itself
2. The position of the first character of the substring in the given
string
3. The length of the substring or the position of the last character of
the substring.
We call this operation SUBSTRING. Specifically, we write
SUBSTRING (string, initial, length)
to denote the substring of a string S beginning in position K
and having length L.
Ex. SUBSTRING (‘TO BE OR NOT BE’, 4, 7) = ‘BE OR N’
SUBSTRING (‘THE END’, 4, 4) = ‘□END’
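SUBSTRING maps directly onto Python slicing once the 1-based position is converted to a 0-based offset. This is a sketch only: the function name substring is an assumption, and an ordinary space stands in for the blank symbol □.

```python
def substring(s, initial, length):
    """Substring of s starting at 1-based position `initial` with `length` characters."""
    start = initial - 1                 # convert to Python's 0-based indexing
    return s[start:start + length]


print(substring("TO BE OR NOT BE", 4, 7))
print(repr(substring("THE END", 4, 4)))
```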
INDEXING

Indexing, also called pattern matching, refers to finding the
position where a string pattern P first appears in a given text
T. We call this operation INDEX and write

INDEX (text, pattern)

If the pattern P does not appear in the text T, then INDEX is
assigned the value 0. The arguments “text” and “pattern” can
be either string constants or string variables.

Ex. Suppose T = ‘HIS FATHER IS THE PROFESSOR’. Then
INDEX(T, ‘THE’), INDEX(T, ‘THEN’) and INDEX(T, ‘□THE□’)
have the values 7, 0 and 14 respectively.
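INDEX can be sketched with Python's built-in str.find, which returns a 0-based position or -1 when the pattern is absent; adding 1 gives the 1-based convention used here, with 0 for "not found". The function name index is an assumption, and a space stands in for □.

```python
def index(text, pattern):
    """1-based position of the first occurrence of pattern in text, or 0."""
    return text.find(pattern) + 1      # find() gives -1 when absent, so this gives 0


T = "HIS FATHER IS THE PROFESSOR"
print(index(T, "THE"), index(T, "THEN"), index(T, " THE "))
```

Note that 'THE' is first found inside 'FATHER', which is why the result is 7 rather than 15.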


CONCATENATION

Let S1 and S2 be strings.


Then concatenation of S1 and S2, which we denote S1//S2, is
the string consisting of the characters of S1 followed by the
characters of S2.

Ex. Suppose S1=‘MARK’ and S2=‘TWAIN’. Then:
S1//S2=‘MARKTWAIN’ but S1//‘□’//S2=‘MARK TWAIN’

Length: The number of characters in a string is called its


length. We will write

LENGTH (string) for the length of a given string.

Ex. Suppose S=’COMPUTER’. Then :


LENGTH (S) =8
LENGTH (‘MARK TWAIN’) = 10
WORD/TEXT PROCESSING
In earlier times, character data processed by the computer
consisted mainly of data items, such as names and addresses.

Today the computer also processes printed matter, such as


letters, articles and reports.

It is in this latter context that we use the term “word


processing.”

Given some printed text, the operations usually associated


with word processing are as follows:

❑Replacement: Replacing one string in the text by another.


❑Insertion: Inserting a string in the middle of the text.
❑Deletion: Deleting a string from the text.

The above operations can be executed by using the string


operations.
INSERTION
Suppose in a given text T we want to insert a string S so that S
begins in position K. We denote this operation by

INSERT (text, position, string)

Ex: INSERT(‘ABCDEFG’,3,’XYZ’)=’ABXYZCDEFG’
INSERT (‘ABCDEFG’,6,’XYZ’)=’ABCDEXYZFG’

This INSERT function can be implemented by using the string
operations as follows:

INSERT (T, K, S) = SUBSTRING (T, 1, K-1) // S //
SUBSTRING (T, K, LENGTH(T)-K+1)
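The INSERT formula can be written out literally in Python, with + playing the role of the concatenation operator //. The helper names substring and insert are illustrative assumptions; positions are 1-based.

```python
def substring(s, initial, length):
    """Substring of s starting at 1-based position `initial` with `length` characters."""
    return s[initial - 1:initial - 1 + length]


def insert(t, k, s):
    """INSERT(T, K, S) = SUBSTRING(T, 1, K-1) // S // SUBSTRING(T, K, LENGTH(T)-K+1)."""
    return substring(t, 1, k - 1) + s + substring(t, k, len(t) - k + 1)


print(insert("ABCDEFG", 3, "XYZ"))
print(insert("ABCDEFG", 6, "XYZ"))
```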
DELETION

Suppose in a given text T we want to delete the substring


which begins at position K and has length L. We denote this
operation by
DELETE(text, position, length)

Ex: DELETE (‘ABCDEFG’, 4, 2) = ‘ABCFG’
DELETE (‘ABCDEFG’, 2, 4) = ‘AFG’

We assume that nothing is deleted if position K=0. Thus


DELETE (‘ABCDEFG’, 0,2)=’ABCDEFG’

The delete function can be implemented using the string


operations as follows.

DELETE (T, K, L) = SUBSTRING (T, 1, K-1)//


SUBSTRING (T, K +L, LENGTH (T)-K-L+1)
That is, the initial substring of T before position K is concatenated
with the terminal substring of T beginning at position K+L. The length
of the initial substring is K-1, and the length of the terminal substring
is:
LENGTH (T) - (K+L-1) = LENGTH (T)-K-L+1
We also assume that DELETE (T,K,L)=T when K=0.
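A Python sketch of the DELETE formula, including the "zero case". The function names are assumptions; positions are 1-based and + stands for //.

```python
def substring(s, initial, length):
    """Substring of s starting at 1-based position `initial` with `length` characters."""
    return s[initial - 1:initial - 1 + length]


def delete(t, k, l):
    """DELETE(T, K, L): remove L characters of T starting at 1-based position K."""
    if k == 0:                     # zero case: nothing is deleted
        return t
    return substring(t, 1, k - 1) + substring(t, k + l, len(t) - k - l + 1)


print(delete("ABCDEFG", 4, 2), delete("ABCDEFG", 2, 4), delete("ABCDEFG", 0, 2))
```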
Now suppose a text T and a pattern P are given and we want to delete
from T the first occurrence of P. We first compute INDEX(T,P), the position
where P first occurs in T, and then LENGTH (P), the
number of characters in P. The deletion is then
DELETE (T, INDEX(T,P), LENGTH(P))

When INDEX(T,P)=0 the text T is not changed.

 Ex1: Suppose T=‘ABCDEFG’ and P=‘CD’. Then INDEX(T,P)=3
and LENGTH (P)=2. Hence DELETE
(‘ABCDEFG’,3,2)=‘ABEFG’

 Ex2: Suppose T=‘ABCDEFG’ and P=‘DC’. Then INDEX(T,P)=0
and LENGTH(P)=2. Hence, by the “zero case”
DELETE (‘ABCDEFG’,0,2)=‘ABCDEFG’
Algorithm: A text T and pattern P are in memory. This algorithm
deletes every occurrence of P in T.
1. [Find index of P.] Set K:=INDEX(T,P).
2. Repeat while K!=0:
[Delete P from T.]
Set T:=DELETE (T, INDEX (T, P), LENGTH (P)).
[Update index.] Set K:=INDEX (T, P).
[End of loop.]
3. Write: T.
4. Exit.
Ex. T=XABYABZ, P=AB
Algorithm executed twice.
During first execution, the first occurrence of AB in T is deleted.
Result T=XYABZ
During the second execution, the remaining occurrence of AB in T
is deleted, so that T=XYZ
Result is XYZ
Ex2. T=XAAABBBY P=AB
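The delete-every-occurrence algorithm can be traced with the following Python sketch. The function names are illustrative; the guard for an empty pattern is an added assumption, since the loop would otherwise never end.

```python
def index(t, p):
    """1-based position of the first occurrence of p in t, or 0."""
    return t.find(p) + 1


def delete(t, k, l):
    """Remove l characters of t starting at 1-based position k (no-op when k == 0)."""
    if k == 0:
        return t
    return t[:k - 1] + t[k - 1 + l:]


def delete_all(t, p):
    """Repeatedly delete the first occurrence of p until none remains."""
    if not p:                      # assumption: an empty pattern deletes nothing
        return t
    k = index(t, p)
    while k != 0:
        t = delete(t, k, len(p))
        k = index(t, p)
    return t


print(delete_all("XABYABZ", "AB"))
print(delete_all("XAAABBBY", "AB"))
```

In the second call the pattern occurs only once in the original text, yet the loop body runs three times, because each deletion brings a new A and B together.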
REPLACEMENT
Suppose in a given text T we want to replace the first occurrence of
a pattern P1 by a pattern P2. We will denote this operation by
REPLACE (text, pattern1, pattern2)

Ex. REPLACE(‘XABYABZ’, ‘AB’, ‘C’) = ‘XCYABZ’
REPLACE(‘XABYABZ’, ‘BA’, ‘C’) = ‘XABYABZ’
Specifically, the REPLACE function can be executed by using the
following three steps (T is left unchanged when K=0):
K:=INDEX(T,P1)
T:=DELETE(T,K,LENGTH(P1))
T:=INSERT(T,K,P2)

Suppose we want to replace every occurrence of the pattern P in the text T
by the pattern Q. This
might be accomplished by repeatedly applying

REPLACE(T,P,Q) until INDEX(T,P)=0

Algorithm: A text T and patterns P and Q are in memory. This
algorithm replaces every occurrence of P in T by Q.
1. [Find index of P.] Set K:=INDEX(T,P).
2. Repeat while K!=0:
[Replace P by Q.] Set T:=REPLACE(T,P,Q).
[Update index.] Set K:=INDEX(T,P).
[End of loop.]
3. Write: T.
4. Exit.
Ex: T=XABYABZ, P=AB, Q=C
Algorithm executed twice.
During first execution, the first occurrence of AB in T is replaced by C
to Yield T=XCYABZ.
During second execution, the remaining AB in T is replaced by C to
yield T= XCYCZ.
Hence XCYCZ is the output.

Ex2. T=XAY, P=A, Q=AB

The algorithm will never terminate: since Q contains P, every replacement
(XABY, XABBY, ...) leaves an occurrence of P in T.
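Both REPLACE and the replace-every-occurrence loop can be sketched in Python. The function names and the max_steps safety cap are assumptions added for illustration; the cap is not part of the algorithm and exists only so that the non-terminating case can be demonstrated safely.

```python
def index(t, p):
    """1-based position of the first occurrence of p in t, or 0."""
    return t.find(p) + 1


def replace(t, p1, p2):
    """Replace the first occurrence of p1 in t by p2 (t unchanged if p1 is absent)."""
    k = index(t, p1)
    if k == 0:
        return t
    t = t[:k - 1] + t[k - 1 + len(p1):]    # DELETE(T, K, LENGTH(P1))
    return t[:k - 1] + p2 + t[k - 1:]      # INSERT(T, K, P2)


def replace_all(t, p, q, max_steps=1000):
    """Apply REPLACE until p no longer occurs; give up after max_steps passes."""
    steps = 0
    k = index(t, p)
    while k != 0:
        if steps == max_steps:             # hypothetical guard, not in the slides
            raise RuntimeError("no termination: Q probably contains P")
        t = replace(t, p, q)
        steps += 1
        k = index(t, p)
    return t


print(replace("XABYABZ", "AB", "C"))
print(replace_all("XABYABZ", "AB", "C"))
```

Calling replace_all("XAY", "A", "AB", max_steps=10) raises RuntimeError, which corresponds to the second example.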


PATTERN MATCHING ALGORITHM
Pattern matching is the problem of deciding whether or not
a given string pattern P appears in a string text T. We assume
that the length of P does not exceed the length of T.

The First Pattern Matching Algorithm:
We compare a given pattern P with each of the substrings of
T, moving from left to right, until we get a match. Let

WK=SUBSTRING(T,K,LENGTH(P))

That is, WK is the substring of T having the same length as P and
beginning with the Kth character of T.
Suppose text T=abaabbaabba
and pattern P=bba
Total number of substrings: MAX=S-R+1
where S = length of text, R = length of pattern
So MAX=11-3+1=9 (total substrings)
W1=aba
W2=baa
W3=aab
W4=abb
W5=bba
W6=baa
W7=aab
W8=abb
W9=bba
The pattern matches the 5th substring, so the pattern is found at
INDEX=5
No. of comparisons C=1+2+1+1+3=8
FIRST PATTERN MATCHING ALGORITHM
Algorithm: (Pattern matching) P and T are
strings with lengths R and S, respectively, and
are stored as arrays with one character per
element. This algorithm finds the INDEX of P in
T. (Example: T=abaabbaabba and pattern P=bba.)
Step 1: [Initialize.] Set K:=1 and MAX:=S-R+1.
Step 2: Repeat Steps 3 to 5 while K<=MAX:
Step 3: Repeat for L=1 to R: [Tests each
character of P.]
If P[L]!=T[K+L-1], then: Go to Step 5.
[End of inner loop.]
Step 4: [Success.] Set INDEX:=K, and Exit.
Step 5: Set K:=K+1.
[End of Step 2 outer loop.]
Step 6: [Failure.] Set INDEX:=0.
Step 7: Exit.
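A Python sketch of the first (slow) pattern matching algorithm that also counts character comparisons. It assumes the outer loop runs while K <= MAX, so that the last substring is tested as well; the function name slow_match is illustrative.

```python
def slow_match(p, t):
    """Return (index, comparisons): 1-based INDEX of p in t (0 if absent)."""
    r, s = len(p), len(t)
    max_k = s - r + 1                       # number of substrings W_K
    comparisons = 0
    for k in range(1, max_k + 1):           # K = 1, ..., MAX
        for l in range(1, r + 1):           # L = 1, ..., R
            comparisons += 1
            if p[l - 1] != t[k + l - 2]:    # P[L] != T[K+L-1] in 1-based terms
                break                       # go to Step 5: try the next K
        else:
            return k, comparisons           # all R characters matched
    return 0, comparisons


print(slow_match("bba", "abaabbaabba"))
```

For the example text and pattern this reproduces INDEX = 5 with 1+2+1+1+3 = 8 comparisons.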
COMPLEXITY OF FIRST PATTERN MATCHING
Best case complexity of the first pattern matching algorithm
is???
When P is an r-character string and T is an s-character string, the
data size for the algorithm is
n=r+s
The worst case occurs when every character of P except the last
matches every substring Wk, so that each of the s-r+1 substrings
requires r comparisons:
C(n)=r(s-r+1)
For fixed n we have s=n-r, so
C(n)=r(n-2r+1)=nr-2r^2+r
Setting the derivative with respect to r to zero:
dC/dr=n-4r+1=0
r=(n+1)/4
The maximum value of C(n) therefore occurs when r=(n+1)/4, so
C(n)=(n+1)^2/8=O(n^2)
FIND INDEX AND COMPARISONS WITH FIRST PATTERN
MATCHING?????

1. Suppose P=aaba and T=cdcd...cd=(cd)^10

2. T=aaaaa...a=a^20 and P=aaab

3. P=abc, T=(ab)^5
SECOND PATTERN MATCHING ALGORITHM
The second pattern matching algorithm uses a table which is derived from a
particular pattern P but is independent of the text T.
Ex. Suppose P = aaba.
Suppose T = T1 T2 T3 . . . , where Ti denotes the ith character of T; and
suppose the first two characters of T match those of P; i.e., suppose T = aa . . . .
Then T has one of the following three forms:
1. T = aab . . . ,
2. T = aaa . . . ,
3. T = aax . . . ,
where x is any character different from a or b.
Suppose we read T3 and find that T3 = b.
Then we next read T4 to see if T4 = a, which will give a match of P with
W1.
On the other hand, suppose T3 = a. Then we know that P != W1; but we
also know that W2 = aa . . . , i.e., that the first two characters of the
substring W2 match those of P.
Hence we next read T4 to see if T4 = b.
Last, suppose T3 = x. Then we know that P != W1, but we also know that
P != W2 and P != W3, since x does not appear in P.
Hence we next read T4 to see if T4 = a, i.e., to see if the first character of
W4 matches the first character of P.
Ex. The following table is used in our second pattern matching algorithm for
the pattern P = aaba.
The table is obtained as follows: First of all, we let Qi denote the initial
substring of P of length i; hence
Q0 = Λ, Q1 = a, Q2 = a^2, Q3 = a^2b, Q4 = a^2ba = P
The rows of the table are labeled by these initial substrings of P, excluding P
itself. The columns of the table are labeled a, b and x, where x represents any
character that doesn’t appear in the pattern P.

        a     b     x
Q0      Q1    Q0    Q0
Q1      Q2    Q0    Q0
Q2      Q2    Q3    Q0
Q3      P     Q0    Q0

[Figure: pattern matching graph for P = aaba, with nodes Q0, Q1, Q2, Q3 and P]
SECOND PATTERN MATCHING/FAST PATTERN MATCHING ALGORITHM
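The following Python sketch shows one way the table for a pattern can be built and then used to scan the text once, reading each character exactly one time. It is an illustration under stated assumptions rather than the algorithm as printed on the slide: the function names are invented, states are stored as integers (i stands for Qi, and LENGTH(P) stands for P itself), the pattern is assumed to be non-empty, and any character not in P is treated as the column x.

```python
def build_table(p):
    """table[i][c] = length of the longest initial substring of p
    that is a terminal substring of Q_i followed by c."""
    alphabet = set(p)
    table = []
    for i in range(len(p)):                 # rows Q0 .. Q(R-1), excluding P
        row = {}
        for c in alphabet:
            s = p[:i] + c                   # Q_i followed by the character read
            j = len(s)
            while j > 0 and p[:j] != s[len(s) - j:]:
                j -= 1                      # try a shorter initial substring
            row[c] = j
        table.append(row)
    return table


def fast_match(p, t):
    """1-based INDEX of p in t using the table, or 0 if p does not occur."""
    table = build_table(p)
    state = 0                               # start in Q0
    for k, ch in enumerate(t, start=1):
        state = table[state].get(ch, 0)     # column x: back to Q0
        if state == len(p):                 # reached P: match ends at position k
            return k - len(p) + 1
    return 0


print(build_table("aaba"))
print(fast_match("aaba", "aabcaaaba"))
```

For P = aaba the rows produced agree with the table above (1 for Q1, 2 for Q2, 3 for Q3, 4 for P, 0 for Q0).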
SEARCHING: LINEAR SEARCH:
Let DATA be a collection of data elements in
memory, and suppose a specific ITEM of information
is given.
Searching refers to the operation of finding the
location LOC of ITEM in DATA, or printing some
message that ITEM does not appear there.
The search is said to be successful if ITEM does
appear in DATA and unsuccessful otherwise.
Linear Search: The method of searching, which
traverses DATA sequentially to locate ITEM, is called
linear search or sequential search.
LINEAR SEARCH

Algorithm:
(Linear search) LINEAR (DATA, N, ITEM, LOC),
here DATA is a linear array with N elements, and
ITEM is a given item of information. This algorithm
finds the location LOC of ITEM in DATA, or sets
LOC: =0 if the search is unsuccessful.
1. [Insert ITEM at the end of DATA.] Set DATA
[N+1]:= ITEM.
2. [Initialize counter.] Set LOC: =1.
3. [Search for ITEM.]
Repeat while DATA [LOC]! =ITEM:
Set LOC: =LOC+1
[End of loop]
4. [Successful?] If LOC= N+1, then: Set LOC: =0.
5. Exit.
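This version of linear search places ITEM at position N+1 as a sentinel, so the loop needs no separate bounds test. A minimal Python sketch, assuming the list holds exactly the N elements (the function name is illustrative, and the sentinel is removed again before returning):

```python
def linear_sentinel(data, item):
    """1-based location of item in data, or 0 if the search is unsuccessful."""
    n = len(data)
    data.append(item)              # DATA[N+1] := ITEM (sentinel)
    loc = 1
    while data[loc - 1] != item:   # must stop at the sentinel at the latest
        loc += 1
    data.pop()                     # restore the original array
    return 0 if loc == n + 1 else loc


values = [11, 22, 33, 44]
print(linear_sentinel(values, 33), linear_sentinel(values, 50))
```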
COMPLEXITY OF THE LINEAR SEARCH ALGORITHM

 Measured by the number f(n) of comparisons required to


find ITEM in DATA where DATA contains n elements.
 Worst Case: occurs when one must search through the
entire array DATA, i.e., when ITEM does not appear in
DATA.
f(n)=n+1, i.e., f(n)=O(n)
BINARY SEARCH
Algorithm:(Binary Search) BINARY(DATA, LB, UB, ITEM, LOC), here DATA is a sorted array
with lower bound LB and upper bound UB, and ITEM is a given item of information, the
variables BEG, END and MID denote, respectively, the beginning, end and middle locations of
a segment of elements of DATA. This algorithm finds the location LOC of ITEM in DATA or
sets LOC=NULL.

1. [Initialize segment variables.]


Set BEG:=LB, END:=UB and MID:=INT((BEG+END)/2).
2. Repeat Steps 3 and 4 while BEG<=END and DATA[MID]!=ITEM.
3. If ITEM < DATA[MID], then:
Set END:=MID-1.
Else:
Set BEG:=MID+1.
[End of If structure.]
4. Set MID:=INT((BEG+END)/2).
[End of Step 2 loop.]
5. If DATA[MID]=ITEM, then:
Set LOC:=MID.
Else:
Set LOC:=NULL.
[End of If structure.]
6. Exit.
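The binary search steps translate into Python as follows. This is a sketch with assumed names: positions are 1-based with LB = 1, None plays the role of NULL, and the final test uses BEG <= END instead of re-reading DATA[MID], so that an empty array is handled safely.

```python
def binary_search(data, item):
    """1-based location of item in the sorted list data, or None if absent."""
    beg, end = 1, len(data)
    mid = (beg + end) // 2
    while beg <= end and data[mid - 1] != item:
        if item < data[mid - 1]:
            end = mid - 1          # continue in the left half
        else:
            beg = mid + 1          # continue in the right half
        mid = (beg + end) // 2
    return mid if beg <= end else None


DATA = [11, 22, 30, 33, 40, 44, 55, 60, 66, 77, 80, 88, 99]   # sample sorted data
print(binary_search(DATA, 40), binary_search(DATA, 85))
```

Each pass halves the segment still under consideration, so the number of comparisons grows roughly like log2 n.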
BINARY SEARCH EXAMPLE
MULTIDIMENSIONAL ARRAYS
The linear arrays discussed so far are also called one-dimensional arrays,
since each element in the array is referenced by a single subscript.
Most programming languages allow two-dimensional and three-dimensional
arrays, i.e., arrays where elements are referenced,
respectively, by two and three subscripts.
Two-Dimensional Arrays:
A two-dimensional m x n array A is a collection of m·n data
elements such that each element is specified by a pair of integers
(such as J, K), called subscripts.
Two-dimensional arrays are called matrices in mathematics and
tables in business applications; hence two-dimensional arrays are
sometimes called matrix arrays.
There is a standard way of drawing a two-dimensional m x n array
A where the elements of A form a rectangular array with m rows
and n columns and where the element A[J,K] appears in row J and
column K.
REPRESENTATION OF TWO-DIMENSIONAL ARRAYS IN
MEMORY:

Let A be a two-dimensional m x n array. The array will be represented
in memory by a block of m·n sequential memory locations. Specifically,
the programming language will store the array A either in

❑Column-major order, or
❑Row-major order

The particular representation used depends upon the programming
language, not the user.

Recall that the computer does not keep track of the address LOC(LA[K]) of every
element LA[K] of a linear array LA, but does keep track of Base(LA), the address of the first
element of LA. The computer uses the formula

LOC (LA [K]) = Base (LA) + w (K-1)

to find the address of LA[K] in time independent of K. (Here w is the number of
words per memory cell for the array LA, and 1 is the lower bound of the
index set of LA.)

A similar situation holds for a two-dimensional M x N array A. The computer
keeps track of Base (A), the address of the first element A[1, 1] of
A, and computes the address LOC(A[J, K]) of A[J, K] using one of the following formulas.

For column-major order:
LOC (A [J, K]) = Base (A) + w [M (K-1) + (J-1)]
For row-major order:
LOC (A [J, K]) = Base (A) + w [N (J-1) + (K-1)]
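The two address formulas can be compared with a short Python sketch. The function names and the example (a 25 x 4 array with base address 200 and w = 4) are illustrative assumptions; M is the number of rows and N the number of columns.

```python
def loc_column_major(base, w, m, j, k):
    """Address of A[J, K] when the M x N array is stored column by column."""
    return base + w * (m * (k - 1) + (j - 1))


def loc_row_major(base, w, n, j, k):
    """Address of A[J, K] when the M x N array is stored row by row."""
    return base + w * (n * (j - 1) + (k - 1))


# 25 x 4 array, base address 200, 4 words per element, element A[12, 3]
print(loc_row_major(200, 4, n=4, j=12, k=3))
print(loc_column_major(200, 4, m=25, j=12, k=3))
```

The same element lands at different addresses under the two orderings, which is why the storage order must be known before the address can be computed.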
General Multidimensional Arrays: An n-dimensional m1 x m2
x m3 x . . . x mn array B is a collection of m1·m2·m3· . . . ·mn
data elements in which each element is specified by a list of n
integers, such as K1, K2, K3, . . . , Kn, called subscripts.

The array will be stored in memory in a sequence of memory
locations. Specifically, the programming language will store the
array B either in row-major or in column-major order.
The definition of general multidimensional arrays also permits lower bounds
other than 1. Let C be such an N-dimensional array. The index set for each
dimension of C consists of the consecutive integers from the lower bound to
the upper bound of the dimension. The length Li of dimension i of C is the
number of elements in the index set, and Li can be calculated from
Li = upper bound - lower bound + 1
For a given subscript Ki, the effective index Ei of Li is the number of
indices preceding Ki in the index set, and Ei can be calculated from
Ei = Ki - lower bound
Then the address LOC(C[K1,K2,K3, . . . ,KN]) of an arbitrary element of
C can be obtained from the formula

For column-major order:
Base(C) + w[(((...(E_N L_{N-1} + E_{N-1}) L_{N-2}) + ... + E_3) L_2 + E_2) L_1 + E_1]

For row-major order:
Base(C) + w[(...((E_1 L_2 + E_2) L_3 + E_3) L_4 + ... + E_{N-1}) L_N + E_N]

Base(C) denotes the address of the first element of C, and w denotes the
number of words per memory location.
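The general formulas are nested multiplications, which a loop evaluates naturally. In this Python sketch the function name loc_nd, its parameters and the three-dimensional example (bounds 2:8, -4:1, 6:10 with base 200 and w = 4) are assumptions made for illustration.

```python
def loc_nd(base, w, bounds, subscripts, order="row"):
    """Address of C[K1, ..., KN]; bounds is a list of (lower, upper) pairs."""
    lengths = [ub - lb + 1 for lb, ub in bounds]                  # L_i
    effective = [k - lb for k, (lb, _) in zip(subscripts, bounds)]  # E_i
    pairs = list(zip(effective, lengths))
    if order != "row":
        pairs.reverse()        # column-major nests from E_N down to E_1
    offset = 0
    for e, l in pairs:
        offset = offset * l + e
    return base + w * offset


bounds = [(2, 8), (-4, 1), (6, 10)]          # L = 7, 6, 5
print(loc_nd(200, 4, bounds, (5, -1, 8), order="row"))
print(loc_nd(200, 4, bounds, (5, -1, 8), order="column"))
```

Here E = (3, 3, 2), so the row-major offset is (3·6 + 3)·5 + 2 = 107 and the column-major offset is (2·6 + 3)·7 + 3 = 108.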
