DATA STRUCTURES
An Introduction
Basic Terminology
• Data: Data may be a single value or it may be a set of values.
• Information: Meaningful or Processed data is called
Information.
• Record is a collection of related data items.
• File is a collection of logically related records.
• Entity
– is a person, place, thing, event or concept about which information is
recorded.
– has certain attributes or properties which may be assigned values.
• Attributes give the characteristics of the entity.
• Entity set: Entities with similar attributes form an Entity Set.
• Range is a set of all possible values that could be assigned to a
particular attribute.
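The terms above can be sketched in C, assuming a hypothetical entity "student" (the type, member names and sample values below are illustrative only). Each struct member is an attribute, one struct value is a record, and an array of such records stands in for a file.

```c
#include <stdio.h>

/* Hypothetical entity "student"; each member is one attribute. */
struct student {
    int   roll_no;   /* attribute whose range is the integers */
    char  name[32];  /* attribute whose range is short strings */
    float marks;     /* attribute whose range is assumed to be 0.0 to 100.0 */
};

/* A "file" modelled as a collection of logically related records. */
static const struct student class_file[] = {
    { 1, "Asha", 81.5f },
    { 2, "Ravi", 74.0f },
};

/* Number of records in the file. */
static size_t class_file_count(void) {
    return sizeof class_file / sizeof class_file[0];
}
```

Here class_file_count() returns 2, and class_file[1].roll_no is the value assigned to the roll_no attribute of the second record.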
Data Structures
• Logical or mathematical model of a particular
organization of data is called a Data Structure.
• Data structures are the building blocks of the program.
• The selection of a particular data structure depends on the
following considerations:
– The data structure must be rich enough in structure to
reflect the relationship existing between the data.
– The structure should be so simple that data can be
processed effectively whenever required.
ALGORITHM + DATA STRUCTURE = PROGRAM
Classification of Data Structures
• Data structures are normally divided into two
broad categories:
– Primitive data structures
• Basic data structures that are directly operated upon by
machine instructions.
• Available in most programming languages as built-in
types.
• E.g. int, float, char, pointer
– Non-primitive data structures
• These data structures are sets of homogeneous or
heterogeneous data elements stored together.
Types of Data Structure
Non-primitive data structures
• These are further classified as:
• Linear data structure
– A data structure is said to be linear if its elements
form a sequence, e.g. arrays, linked lists
• Non-linear data structure
– Represents data containing hierarchical
relationship between elements e.g. trees, graphs
Data Structure Operations
• The choice of data structure depends on the
frequency with which specific operations are
performed.
• Operations that can be performed are:
– Traversing
– Searching
– Insertion
– Deletion
– Sorting
– Merging
• Traversing
– Accessing each record exactly once so that certain items in
the record may be processed.
• Searching
– Finding the location of the record with a given key value, or
finding the location of all records satisfying one or more
conditions.
• Insertion
– Adding a new record to the structure.
• Deletion
– Removing a record from a structure.
• Sorting
– Arranging the records in some logical order
• Merging
– Combining the records in two different sorted files
into a single sorted file.
Data types
• Each variable in C has its associated data type.
• Different data types may require different amounts of memory.
• Some commonly known basic data types are:
– int
• Used to store an integer
• Typically requires 2 or 4 bytes of memory, depending on the compiler and platform
– char
• Stores a single character
• Requires one byte of memory
– float
• Used to store decimal numbers with single precision
– double
• Used to store decimal numbers with double precision
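Since the memory needed by each type depends on the compiler and platform, it can be checked with sizeof. The sketch below is only an illustration; the function name print_type_sizes is an assumption, not part of the original notes.

```c
#include <stdio.h>

/* Prints the storage size of each basic type on the current platform. */
void print_type_sizes(void) {
    printf("int:    %zu bytes\n", sizeof(int));
    printf("char:   %zu bytes\n", sizeof(char));   /* always 1 by definition */
    printf("float:  %zu bytes\n", sizeof(float));
    printf("double: %zu bytes\n", sizeof(double));
}
```

Only sizeof(char) is fixed by the language (it is 1); the other sizes vary between implementations.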
Algorithm
• Algorithm is a step-by-step procedure, which defines a set of
instructions to be executed in a certain order to get the desired
output.
• An algorithm is a sequence of steps to solve a problem.
• An algorithm can be expressed in an English-like language, called
pseudocode.
• There may be more than one algorithm to solve a problem.
• The choice of a particular algorithm depends on the following
considerations:
– Memory requirements (Space complexity)
– Performance requirements (Time Complexity)
Complexity of Algorithms
• Space Complexity
– It is the amount of memory an algorithm needs to run to
completion.
• Time Complexity
– It is the amount of time an algorithm needs to run to
completion.
Characteristics of an algorithm
An algorithm should have the following characteristics:
• Definiteness/ Unambiguity
– Each step of the algorithm must be clearly and precisely defined and there should not
be any ambiguity.
• Input
– An algorithm must have zero or more but finite number of inputs
• Output
– An algorithm must produce at least one well-defined output.
• Finiteness
– An algorithm must always terminate after a finite number of steps in finite amount of
time.
• Effectiveness
– An algorithm should be effective.
– Each operation to be performed in an algorithm must be sufficiently basic that it
can be done exactly and in a finite length of time
• Independent
– An algorithm should have step-by-step directions, which should be independent of any
programming code.
Algorithmic Notations
• The format for the formal presentation of an
algorithm consists of two parts:
– The first part is a paragraph which:
• states the purpose of the algorithm
• identifies the variables which occur in the algorithm
• lists the input data
– The second part consists of the list of steps
to be executed.
An Example Algorithm
Problem − Design an algorithm to add two numbers and display the
result.
• Step 1 − START
• Step 2 − declare three integers a, b & c
• Step 3 − define values of a & b
• Step 4 − add values of a & b
• Step 5 − store output of step 4 to c
• Step 6 − print c
• Step 7 − STOP
Algorithm
Step 1 − START ADD
Step 2 − get values of a & b
Step 3 − c ← a + b
Step 4 − display c
Step 5 − STOP
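The shorter form of the algorithm maps almost line for line onto C. This is a minimal sketch; the function names add and display_sum are assumptions introduced for illustration.

```c
#include <stdio.h>

/* Steps 2-3: take the values of a and b, then c <- a + b. */
int add(int a, int b) {
    int c = a + b;
    return c;
}

/* Step 4: display c. */
void display_sum(int a, int b) {
    printf("%d\n", add(a, b));
}
```

For example, add(2, 3) yields 5.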
An Example Algorithm
A non-empty array DATA with N numerical values is given. Find the
location LOC and the value MAX of the largest element of DATA.
Algorithm: Given a nonempty array DATA with N numerical values, this
algorithm finds the location LOC and the value MAX of the largest
element of DATA. The variable K is used as a counter.
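One possible C rendering of this algorithm is sketched below. It is an assumption-laden translation: C arrays are indexed from 0, so LOC here is a 0-based position, and the function name find_max and its parameter types are illustrative.

```c
#include <stddef.h>

/* Finds the largest element of data[0..n-1], assuming n >= 1.
   Stores its 0-based location in *loc and its value in *max. */
void find_max(const double data[], size_t n, size_t *loc, double *max) {
    size_t k;                 /* K is used as a counter */
    *loc = 0;                 /* start with the first element */
    *max = data[0];
    for (k = 1; k < n; k++) {
        if (data[k] > *max) { /* a larger element has been found */
            *loc = k;
            *max = data[k];
        }
    }
}
```

When several elements share the largest value, this version reports the first of them.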
Steps, Control, Exit
• The steps of the algorithm are executed one
after the other, beginning with step 1.
• Control may be transferred to step n by the
statement “Go to step n”.
• If several statements appear in the same step,
e.g. Set K := 1, LOC := 1 and MAX := DATA[1],
they are executed from left to right.
• The algorithm is completed when the
statement “Exit” is encountered.
• Comments
– Each step may contain a comment in brackets which indicates the main
purpose of the step.
• Variable Names
– Variable names use capital letters in the algorithms (e.g. MAX, DATA, K), even though
lowercase may be used for these same variables in the accompanying discussion.
• Assignment statements
– These statements will use dots-equal notation :=
• E.g. MAX:=DATA[1]
• Assigns the value of DATA[1] to MAX
• Input and Output
– Data may be read or may be output by means of read and write statements.
• Read: Variable names
• Write: Messages and/or variable names
• Procedures
– Used for independent algorithmic module (or subalgorithm) which solves a
particular problem
Why do we need Algorithms?
We need algorithms because of the following
reasons:
• Scalability: Algorithms help us to understand scalability.
When we have a big real-world problem, we need
to break it down into small steps so that the
problem can be analyzed easily.
• Performance: Real-world problems are not always easy
to break down into smaller steps. If a problem can be
broken into smaller steps easily, the
problem is feasible.
Control Structures
• Algorithms mainly use three types of logic or flow of
control such as:
– Sequence Logic, or sequential flow
– Selection Logic, or conditional flow
– Iteration Logic, or repetitive flow
• Sequential Logic
Selection Logic
• Selecting one out of several alternative modules.
• These are called conditional structures
• The end of such a structure is indicated by the
statement:
[End of If Structure.]
• These structures are divided into three categories:
• Single alternative
• Double alternative
• Multiple alternative
Iteration Logic
• Begins with a Repeat statement
• Followed by a module called body of loop
• The end of such a structure is indicated by the
statement:
[End of loop.]
Algorithm: Quadratic Equation
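The selection step of such an algorithm can be sketched in C as a multiple-alternative structure on the discriminant D = b² - 4ac. This sketch only classifies the roots; the function name count_real_roots is an assumption, and computing the roots themselves would additionally need sqrt from <math.h>.

```c
/* Classifies the roots of a*x^2 + b*x + c = 0, assuming a != 0.
   Returns 2 for two distinct real roots, 1 for one repeated real
   root, and 0 when there are no real roots. */
int count_real_roots(double a, double b, double c) {
    double d = b * b - 4.0 * a * c;   /* discriminant */
    if (d > 0.0) {
        return 2;
    } else if (d == 0.0) {
        return 1;
    } else {
        return 0;
    }
}
```

For instance, x² - 3x + 2 = 0 has D = 1 and two real roots, while x² + 1 = 0 has D = -4 and none.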
Complexity of Algorithms
• To measure the efficiency of algorithms, we
must have some criteria.
• Time and Space are the two main measures
for the efficiency of an algorithm i.e.
– Time Complexity
– Space Complexity
• The complexity of an algorithm M is the
function f(n) which gives the running time and
storage space requirement of the algorithm in
terms of size n of the input data.
• In simple words, the complexity of the
algorithm depends on the number of
statements executed.
• The total number of statements executed
depends on the conditional and loop statements.
Example
• E.g.
i = 0; // (1 time)
while (i < n) // (n+1 times)
{
printf("%d", i); // (n times)
i = i + 1; // (n times)
}
• Total number of executions
= 1+(n+1)+(n)+(n)
= 3n+2
• If we ignore the constants, the complexity is of order n.
• Hence the complexity is
O(n) //Big-Oh Notation
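The count 3n+2 can be checked by instrumenting the loop. In this sketch the printf call is replaced by a counter increment, and the name count_executions is an assumption made for illustration.

```c
/* Counts the statement executions of the example loop for n >= 0. */
long count_executions(int n) {
    long count = 0;
    int i = 0;
    count++;                      /* i = 0 runs once */
    while (count++, i < n) {      /* the test runs n+1 times */
        count++;                  /* stands in for the printf call, n times */
        i = i + 1;
        count++;                  /* the increment runs n times */
    }
    return count;
}
```

For n = 3 the function returns 11, which agrees with 3n+2.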
Finding the complexity
• There are three cases to find the complexity:
– Worst case: maximum value of f(n) for any possible input
– Average case: expected value of f(n).
– Sometimes Best case can also be considered as
minimum possible value of f(n).
• E.g.
– Suppose the numbers n1, n2, …, nk occur with respective probabilities
p1, p2, …, pk.
– The expected or average value E is given by:
E = n1·p1 + n2·p2 + … + nk·pk
Linear Search
• The complexity of the searching algorithm is given by the number
C of comparisons between ITEM and DATA[K].
• Worst case
– When ITEM is the last element in the array DATA.
– When ITEM does not exist in the list.
– Then, C(n)=n
• Average case
– ITEM is assumed to be present and equally likely to occur at any position in the array.
– The number of comparisons can be any number 1,2,3,….,n
– Each number occurs with probability p=1/n.
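Under these assumptions the expected number of comparisons is 1·(1/n) + 2·(1/n) + … + n·(1/n) = (n+1)/2. A linear search that also reports C can be sketched as follows; the function name, the 0-based indexing and the -1 "not found" value are assumptions of this sketch.

```c
#include <stddef.h>

/* Searches data[0..n-1] for item.
   Returns the 0-based location of item, or -1 if it is absent.
   *comparisons receives C, the number of comparisons made. */
long linear_search(const int data[], size_t n, int item, size_t *comparisons) {
    size_t k;
    *comparisons = 0;
    for (k = 0; k < n; k++) {
        (*comparisons)++;             /* one comparison of ITEM with DATA[K] */
        if (data[k] == item) {
            return (long)k;
        }
    }
    return -1;                        /* worst case: C(n) = n */
}
```

Searching for an absent item, or for the last element, makes n comparisons, which matches the worst case above.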
Rate of Growth: Big O Notation
• Suppose,
– M is an algorithm
– n is the size of input data
• Then, complexity f(n) of M increases as n
increases.
f(n) = O(g(n)) if f(n) ≤ c·g(n) for all n > n0, where c is a positive constant
• Suppose f(n) and g(n) are the functions defined on positive
integers.
• f(n) is bounded by some multiple of g(n) for all sufficiently large n.
• That is, there exists a positive integer n0 and a positive number M such
that for all n > n0, we have,
|f(n)| <= M|g(n)|
• Then, f(n) = O(g(n))
– It can be read as “f(n) is of order g(n)”.
– E.g. f(n) = 3n + 2 is O(n), since 3n + 2 ≤ 4n for all n ≥ 2.
Omega Notation (Ω)
• The Big-O notation defines an upper bound
function g(n) for f(n) which represents the
time/space complexity of the algorithm.
• In Omega notation, the function g(n) defines
the lower bound for function f(n).
• There exists a positive integer n0 and a positive
number M such that for all n > n0, we have,
|f(n)| >= M|g(n)|
f(n) = Ω(g(n)) if f(n) ≥ c·g(n) for all n > n0, where c is a positive constant
Theta Notation (θ)
• It is used when function f(n) is bounded both
from above and below by the function g(n).
f(n) = θ(g(n)) if c1·g(n) ≤ f(n) ≤ c2·g(n) for all n > n0, where c1 and c2 are positive constants
Arrays
• An array is a finite ordered collection of homogeneous data elements.
• Stored in consecutive memory locations.
• The elements of array are referenced respectively by an index set
consisting of n consecutive numbers.
• The number n of elements is called the length or size of the array.
Length=UB-LB+1
Where,
• UB – largest index, called Upper Bound
• LB – smallest index, called Lower bound
Length=UB when LB=1
• The elements of array A may be denoted by:
– Subscript notation
A1, A2, A3, ……., An
– Parenthesis notation
A(1), A(2), …… , A(N)
– Bracket notation
A[1], A[2], A[3], …… ,A[N]
• The number K in A[K] is called the subscript or index.
• A[K] is called a subscripted variable.
Representation of Array
Example
Representation of Array in memory
• Let LA be a linear array in memory.
– LOC(LA[K])=address of the element LA[K] of array
LA
• Computer keeps track of address of first
element of LA only, called Base address
• Base(LA)
LOC(LA[K]) = Base(LA) + w(K-lower bound)
• w is the number of words per memory cell for LA
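The address formula can be written as a small C function. This is a sketch with illustrative names; addresses are modelled as plain unsigned numbers rather than real pointers.

```c
#include <stddef.h>

/* LOC(LA[K]) = Base(LA) + w * (K - LB)
   base: address of the first element, w: words per memory cell,
   k: index of the wanted element, lb: lower bound (k >= lb assumed). */
size_t loc(size_t base, size_t w, long k, long lb) {
    return base + w * (size_t)(k - lb);
}
```

With Base(LA) = 200, w = 4 and lower bound 1932, the element with index 1965 lies at 200 + 4·(1965 - 1932) = 332.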
Example
Operations on Arrays
• Traversing
– Accessing or processing (visiting) each element of array exactly
once
• Insertion
– To insert an element into array
• Deletion
– To delete element from array
• Searching
– To search any element from the given list
• Sorting
– To sort the given list of elements
Algorithm: Traversing
• LA is a linear array with lower bound LB and upper bound UB. This algorithm
traverses LA applying an operation PROCESS to each element of LA.
• Alternate algorithm
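A traversal of this kind can be sketched in C as a single counted loop. The sketch assumes 0-based indexing (so LB = 0 and UB = n-1) and passes PROCESS as a function pointer; double_it is a hypothetical example operation.

```c
#include <stddef.h>

/* Applies process to each element of la[0..n-1] exactly once. */
void traverse(int la[], size_t n, void (*process)(int *)) {
    size_t k;
    for (k = 0; k < n; k++) {
        process(&la[k]);          /* visit element LA[K] */
    }
}

/* Hypothetical PROCESS operation: doubles the element in place. */
void double_it(int *x) {
    *x *= 2;
}
```

Calling traverse(a, 3, double_it) on the array {1, 2, 3} leaves it as {2, 4, 6}.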
Insertion into Linear Array
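Insertion into a linear array can be sketched in C as follows: the elements from position K onwards are moved one place to the right, starting from the last one, and the new item is stored in the gap. The function name and 0-based indexing are assumptions of this sketch.

```c
#include <stddef.h>

/* Inserts item at 0-based position k (0 <= k <= *n) of la.
   la must have room for at least *n + 1 elements. */
void insert_at(int la[], size_t *n, size_t k, int item) {
    size_t j;
    for (j = *n; j > k; j--) {
        la[j] = la[j - 1];        /* move element one place to the right */
    }
    la[k] = item;                 /* store the new element */
    (*n)++;                       /* the array now has one more element */
}
```

Inserting at the end moves no elements, while inserting at the front moves all n of them.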
Deletion from a Linear Array
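Deletion can be sketched in C in the opposite way to insertion: the item at position K is saved, and every later element is moved one place to the left. The function name and 0-based indexing are assumptions of this sketch.

```c
#include <stddef.h>

/* Deletes and returns the element at 0-based position k (k < *n) of la. */
int delete_at(int la[], size_t *n, size_t k) {
    int item = la[k];             /* save the element being removed */
    size_t j;
    for (j = k; j + 1 < *n; j++) {
        la[j] = la[j + 1];        /* move element one place to the left */
    }
    (*n)--;                       /* the array now has one element fewer */
    return item;
}
```

Deleting the last element moves nothing, while deleting the first moves the remaining n-1 elements.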
Binary search
• Using this technique, an element can be found in far fewer
comparisons than with linear search.
• The given list of elements must be in sorted order.
• It can be done as follows:
– Find the middle element of the array
– Compare the mid element with an item to search.
– There are three cases:
• If it is the desired element, search is successful.
• If mid is greater than desired item, search only the left half of array.
• Else If mid is less than desired item, search only the right half of array.
• Complexity of Binary Search: O(log2 n)
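The three cases above can be sketched in C as an iterative search over a shrinking window. The function name, 0-based indexing and the -1 "not found" value are assumptions of this sketch; the array must already be sorted in ascending order.

```c
#include <stddef.h>

/* Searches the sorted array data[0..n-1] for item.
   Returns its 0-based location, or -1 if it is absent. */
long binary_search(const int data[], size_t n, int item) {
    size_t beg = 0, end = n;      /* current window is data[beg..end-1] */
    while (beg < end) {
        size_t mid = beg + (end - beg) / 2;   /* middle of the window */
        if (data[mid] == item) {
            return (long)mid;     /* search is successful */
        } else if (data[mid] > item) {
            end = mid;            /* continue in the left half */
        } else {
            beg = mid + 1;        /* continue in the right half */
        }
    }
    return -1;
}
```

Each pass halves the window, which is where the O(log2 n) bound comes from.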
Two-dimensional Arrays
• A two dimensional m×n array A is a collection
of m·n data elements.
• Each element is specified by a pair of integers
(such as J, K), called subscripts such that
1 ≤ J ≤ m and 1≤K≤n
• It is denoted by
– AJ,K or A[J,K]
• Two dimensional arrays are also called matrices or matrix
arrays.
Two-dimensional array
Representation of 2-D array in memory
• For an M×N array A (M rows, N columns) with w words per memory cell,
the following formulas locate the address of A[J,K]:
• Column major order
– LOC(A[J,K]) = Base(A) + w(M(K-1)+(J-1))
• Row major order
– LOC(A[J,K]) = Base(A) + w(N(J-1)+(K-1))
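Both formulas can be sketched as C functions. Addresses are modelled as plain unsigned numbers, J and K are 1-based as in the formulas, and the function names are assumptions of this sketch.

```c
#include <stddef.h>

/* Column major: LOC(A[J,K]) = Base(A) + w(M(K-1) + (J-1)), M = number of rows. */
size_t loc_col_major(size_t base, size_t w, size_t m, size_t j, size_t k) {
    return base + w * (m * (k - 1) + (j - 1));
}

/* Row major: LOC(A[J,K]) = Base(A) + w(N(J-1) + (K-1)), N = number of columns. */
size_t loc_row_major(size_t base, size_t w, size_t n, size_t j, size_t k) {
    return base + w * (n * (j - 1) + (k - 1));
}
```

For a 25×4 array with Base(A) = 200 and w = 4, the element A[12,3] lies at 200 + 4(4·11 + 2) = 384 in row major order and at 200 + 4(25·2 + 11) = 444 in column major order.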
Example
Bubble Sort
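A common formulation of bubble sort is sketched below in C; it is not necessarily the exact algorithm of the original slide. Each pass compares adjacent elements and swaps those that are out of order, so the largest remaining element moves to the end of the unsorted part.

```c
#include <stddef.h>

/* Sorts a[0..n-1] into ascending order using bubble sort. */
void bubble_sort(int a[], size_t n) {
    size_t pass, j;
    for (pass = 1; pass < n; pass++) {
        /* after this pass the last `pass` elements are in final position */
        for (j = 0; j + pass < n; j++) {
            if (a[j] > a[j + 1]) {
                int temp = a[j];          /* swap the adjacent pair */
                a[j] = a[j + 1];
                a[j + 1] = temp;
            }
        }
    }
}
```

This version always makes (n-1) + (n-2) + … + 1 = n(n-1)/2 comparisons, so its complexity is O(n²).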
Selection Sort
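A common formulation of selection sort is sketched below in C; it is not necessarily the exact algorithm of the original slide. Each pass finds the smallest element of the unsorted part and exchanges it with the first element of that part.

```c
#include <stddef.h>

/* Sorts a[0..n-1] into ascending order using selection sort. */
void selection_sort(int a[], size_t n) {
    size_t i, j, min;
    for (i = 0; i + 1 < n; i++) {
        min = i;                          /* location of smallest so far */
        for (j = i + 1; j < n; j++) {
            if (a[j] < a[min]) {
                min = j;
            }
        }
        if (min != i) {                   /* exchange a[i] and a[min] */
            int temp = a[i];
            a[i] = a[min];
            a[min] = temp;
        }
    }
}
```

The number of comparisons is n(n-1)/2 regardless of the input order, but at most n-1 exchanges are made.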
Insertion Sort
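A common formulation of insertion sort is sketched below in C; it is not necessarily the exact algorithm of the original slide. The K-th element is inserted into its proper place among the elements before it, which are already sorted.

```c
#include <stddef.h>

/* Sorts a[0..n-1] into ascending order using insertion sort. */
void insertion_sort(int a[], size_t n) {
    size_t k, j;
    for (k = 1; k < n; k++) {
        int temp = a[k];                  /* element to be inserted */
        j = k;
        while (j > 0 && a[j - 1] > temp) {
            a[j] = a[j - 1];              /* move larger element right */
            j--;
        }
        a[j] = temp;                      /* insert in its proper place */
    }
}
```

When the array is already sorted, the inner loop stops after one comparison per element, which is the best case for this algorithm.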
Complexity of Insertion Sort
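• Worst Case
– When array A is in reverse order
– Inserting the k-th element needs (k-1) comparisons, so
f(n) = 1 + 2 + … + (n-1) = n(n-1)/2 = O(n²)
• Average Case
– Approximately (k-1)/2 comparisons for the k-th element, so
f(n) ≈ n(n-1)/4 = O(n²)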
Multi-dimensional Arrays
• A multi-dimensional (n-dimensional)
m1 × m2 × … × mn array B is a collection of
m1 · m2 · … · mn data elements.
• Each element is specified by a list of n integers
(such as K1, K2, ….., Kn), called subscripts such
that
1 ≤ K1 ≤ m1, 1 ≤ K2 ≤ m2, ………, 1 ≤ Kn ≤ mn
• It is denoted by
– B K1, K2, ….., Kn or B[K1, K2, ….., Kn]
• Length Li can be calculated as
Li = upper bound – lower bound + 1
• For a given subscript Ki, effective index Ei of Li
is the number of indices preceding Ki in the
index set.
Ei = Ki - lower bound
• Column major order
• Row major order
An Example:
Recursion
• Recursion is a process in which a function calls itself with an
argument.
• A recursive procedure must have the following two properties:
– There must be certain criteria, called base criteria, for which the
procedure does not call itself.
– Each time the procedure calls itself, it must be closer to the base
criteria.
• A recursive procedure with these two properties is said to be well
defined.
• It is of two types:
– Direct recursion
• When a function calls itself
– Indirect recursion
• When two functions call one another mutually.
Factorial Function
• The product of positive integers from 1 to n is
called “n factorial” denoted by n!
n!=1·2·3·……(n-2) ·(n-1) ·n
or, n!=n· (n-1)!
• Formal Definition (Factorial function)
– If n=0, then n!=1
– If n>0, then n!=n· (n-1)!
Algorithm: Factorial Function
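The formal definition translates directly into a recursive C function. This is a minimal sketch; the function name and the unsigned 64-bit result type are assumptions, and the result overflows that type for n greater than 20.

```c
/* Returns n! using the recursive definition. */
unsigned long long factorial(unsigned int n) {
    if (n == 0) {
        return 1;                     /* base criteria: 0! = 1 */
    }
    return n * factorial(n - 1);      /* n! = n * (n-1)! */
}
```

Each call moves n closer to 0, so the procedure is well defined; for example, factorial(5) evaluates to 120.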
Fibonacci Sequence
• Fibonacci sequence is as follows:
0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, ……………..
– Here,
F0=0 and F1=1
• Each succeeding term is the sum of two
preceding terms
• Formal Definition:
– If n = 0 or n = 1, then F(n) = n
– If n > 1, then F(n) = F(n-2) + F(n-1)
Algorithm: Fibonacci Sequence
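The formal definition of the Fibonacci sequence can be sketched as a recursive C function. The function name and result type are assumptions; this direct form recomputes the same terms many times, so it is only practical for small n.

```c
/* Returns the n-th Fibonacci number using the recursive definition. */
unsigned long fib(unsigned int n) {
    if (n == 0 || n == 1) {
        return n;                     /* base criteria: F0 = 0, F1 = 1 */
    }
    return fib(n - 2) + fib(n - 1);   /* sum of the two preceding terms */
}
```

For example, fib(10) evaluates to 55, matching the sequence listed above.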