Data Structure Unit I Part 1
UNIT I
Introduction to Data Structure
Algorithmic Problem- An algorithmic problem is specified by a pair: an input specification and an output specification.
What is Data Structure – A way of organizing data in main memory, together with the operations permitted on it, is called a data structure. In other words, a data structure is a way to store and organize data in order to facilitate access and modification.
Importance – Data structures allow us to solve complex problems effectively by providing algorithms for performing basic operations such as insertion, deletion, update, and retrieval (searching).
Data Structure Operations
Some of the basic operations performed on a data structure are as follows:
Traversing- Accessing or visiting each data element in a particular order in a data structure is called
traversing.
Searching- Finding the location(s) of data that satisfy one or more conditions is called searching. Various search techniques, such as linear search and binary search, can be used to search for elements in a data structure.
Inserting- Adding new data to the data structure is called insertion. Data can be inserted at the initial (first) location, the final (last) location, or anywhere between the first and final locations. This operation increases the size of the data structure.
Deleting- Removing existing data elements from the data structure is called deletion. Data can be deleted from the initial (first) location, the final (last) location, or anywhere between the first and final locations. This operation decreases the size of the data structure.
Sorting- Arranging the data in a specified order (ascending or descending) is called sorting. Different techniques can be used to sort data, e.g., bubble sort, shell sort, etc.
Merging- Combining the data from two data structures (or files) into a single one is called merging. This operation can make searching more efficient (one merged collection is searched instead of two separate ones) and ensures that users access consolidated, correct data.
Types of Data Structures – Broadly, data structures can be divided into two categories: primitive and non-primitive. Non-primitive data structures can be further divided into two categories based on the relationship among the data: linear and non-linear.
Linear Data Structure- A data structure in which data are organized one after another is called a linear data structure. The relationship among the data is one to one. Examples of linear data structures are shown in the above diagram.
Non-linear Data Structure- A data structure in which data are not organized one after another is called a non-linear data structure. The relationship among the data is either one to many or many to many. Examples of non-linear data structures are shown in the above diagram.
What to be learned – There are five aspects that need to be clearly understood about each data structure: (i) different ways of representing the data structure, (ii) different operations to manage the structure, (iii) applications of the data structure, (iv) the time complexity of the different operations performed on the data structure, and (v) selecting a suitable data structure to solve problems.
Complexity of an algorithm- A given problem may have more than one solution (algorithm). The natural question, then, is which solution should be chosen for implementation. The selection of an algorithm depends on its goodness (efficiency), which is determined through the complexity of the algorithm.
The complexity of an algorithm can be defined as a function describing its efficiency in processing a given amount of data. There are two main complexity measures of the efficiency of an algorithm: time and space.
Time Complexity- Time complexity is a function describing the number of times elementary operations need to be performed by an algorithm for a given amount of data. An elementary operation is any one of the arithmetic operations (addition, subtraction, multiplication, division), a comparison between two numbers, the execution of a branching instruction, or an assignment operation.
In other words, "The complexity of an algorithm is most often measured through the number of elementary operations that the algorithm requires to arrive at an answer under worst-case conditions."
Space Complexity- Space complexity is a function describing the amount of memory required by an algorithm in addition to the input data.
We are mainly interested in how the time and space requirements change as the number of input data items grows large.
Input Size- An algorithm may have different running times for different inputs. How, then, do we compare algorithms? We define the rough size of the input, usually in terms of its important parameters.
Example: In the problem of search, we say that the number of elements in the array is the input
size.
Note: the size of individual elements is not considered.
Asymptotic Analysis - Asymptotic analysis is a technique for analyzing how an algorithm behaves or performs as the input size changes. The asymptotic notation of an algorithm is a mathematical representation (function) of its complexity.
In asymptotic notation, when we want to represent the complexity of an algorithm, we use only the most significant term (without its coefficient) in the complexity of that algorithm and ignore the less significant terms.
For example, consider the following time complexities of two algorithms:
• Algorithm 1 : 5n^2 + 2n + 1
• Algorithm 2 : 10n^2 + 8n + 3
Generally, when we analyze an algorithm, we consider the time complexity for larger values of input data (i.e., the value of 'n'). In the above two time complexities, for larger values of 'n', the term '2n + 1' in algorithm 1 is less significant than the term '5n^2', and the term '8n + 3' in algorithm 2 is less significant than the term '10n^2'. For larger values of 'n', the values of the most significant terms (5n^2 and 10n^2) are much larger than the values of the least significant terms (2n + 1 and 8n + 3). So for larger values of 'n' we ignore the least significant terms when representing the overall time required by an algorithm. The following plot shows that as the input size increases, the contribution of the lower-order terms to the total number of elementary operations becomes negligible.
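Big - Oh Notation (O)
Big - Oh notation is used to define the upper bound of an algorithm in terms of Time Complexity.
Definition: Let f(n) and g(n) be functions, where n is a positive integer. We write f(n) ∈ O(g(n)) if and only if there exist a real constant c > 0 and a positive integer n0 such that f(n) <= c g(n) for all n >= n0.
In the graph of f(n) and C g(n), after a particular input value n0, C g(n) is always greater than or equal to f(n), which indicates the algorithm's upper bound.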
Note- Caution! Beware of very large constant factors. An algorithm running in time 1,000,000n is still O(n), but it is less efficient than one running in time 2n^2, which is O(n^2), for every input size n < 500,000 (the two are equal at n = 500,000).
Example- Consider the following f(n) and g(n)
f(n)=3n + 2
g(n) = n
If we want to represent f(n) as O(g(n)), then there must exist constants C > 0 and n0 >= 1 such that f(n) <= C g(n) for all n >= n0.
f(n) <= C g(n) ⇒ 3n + 2 <= C n
This condition is TRUE for C = 4 and all n >= 2 (3n + 2 <= 4n holds exactly when n >= 2).
By using Big - Oh notation we can represent the time complexity as follows:
3n + 2 = O(n)
Example: Demonstrate that 5n^2 + 3n + 6 is dominated by n^2, i.e., that 5n^2 + 3n + 6 is O(n^2), by finding a c and n0.
5n^2 + 3n + 6 ≤ 5n^2 + 3n^2 + 6n^2 when n ≥ 1
5n^2 + 3n^2 + 6n^2 = 14n^2
5n^2 + 3n + 6 ≤ 14n^2 for n ≥ 1
14n^2 ≤ c*n^2 for c = ? and n ≥ ?
c = 14 & n0 = 1
Big - Omega Notation (Ω)
Big - Omega notation is used to define the lower bound of an algorithm in terms of Time
Complexity.
That means Big-Omega notation indicates the minimum time required by an algorithm for all sufficiently large inputs. It is commonly used to describe the best case of an algorithm's time complexity.
Big - Omega Notation can be defined as follows:
Definition: Let f(n) and g(n) be functions, where n is a positive integer.
We write f(n) ∈ Ω(g(n)) (read as "f of n is big omega of g of n") if and only if there exist a real constant c > 0 and a positive integer n0 such that
f(n) >= c g(n) > 0 for all n >= n0
In this case we also write f(n) = Ω(g(n)).
Consider the following graph drawn for the values of f(n) and C g(n), with the input size (n) on the X-axis and the time required on the Y-axis.
In the above graph, after a particular input value n0, C g(n) is always less than or equal to f(n), which indicates the algorithm's lower bound.
Example
Consider the following f(n) and g(n):
f(n) = 3n + 2
g(n) = n
If we want to represent f(n) as Ω(g(n)), then there must exist constants C > 0 and n0 >= 1 such that f(n) >= C g(n) for all n >= n0.
f(n) >= C g(n)
⇒3n + 2 >= C n
This condition is TRUE for C = 1 and all n >= 1.
By using Big - Omega notation we can represent the time complexity as follows:
3n + 2 = Ω(n)
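Big - Theta Notation (Θ)
Big - Theta notation is used to define a tight bound of an algorithm in terms of Time Complexity: f(n) = Θ(g(n)) if and only if there exist constants C1 > 0, C2 > 0 and a positive integer n0 such that C1 g(n) <= f(n) <= C2 g(n) for all n >= n0.
In the graph of f(n), C1 g(n) and C2 g(n), after a particular input value n0, C1 g(n) is always less than or equal to f(n) and C2 g(n) is always greater than or equal to f(n), which indicates the algorithm's tight bound.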
Example
Consider the following f(n) and g(n):
f(n) = 3n + 2
g(n) = n
If we want to represent f(n) as Θ(g(n)), then there must exist constants C1 > 0, C2 > 0 and n0 >= 1 such that C1 g(n) <= f(n) <= C2 g(n) for all n >= n0.
C1 g(n) <= f(n) <= C2 g(n) ⇒ C1 n <= 3n + 2 <= C2 n
This condition is TRUE for C1 = 1, C2 = 4 and all n >= 2.
By using Big - Theta notation we can represent the time complexity as follows :
3n + 2 = Θ(n)
Abstract Data Type (ADT)- It is a programming concept that defines a high-level view of a
data structure, without specifying the implementation details. In other words, it is a blueprint for
creating a data structure that defines the behavior and interface of the structure, without specifying
how it is implemented.
An ADT can be thought of as a set of values together with a set of operations that can be performed on them. These operations define the behavior of the data structure, and they are used to manipulate the data in a way that suits the needs of the program.
ADTs are often used to abstract away the complexity of a data structure and to provide a simple
and intuitive interface for accessing and manipulating the data. This makes it easier for
programmers to reason about the data structure, and to use it correctly in their programs.
Examples of abstract data types are List, Stack, Queue, etc.
ADTs allow us to make code modular. Code is called modular when it consists of modules that are highly cohesive and loosely coupled. Modular code has various advantages, such as reusability and lower maintenance effort, time, and cost.
Arrays – An array is a fixed-size collection of data items of the same type that are stored at contiguous memory locations. All the elements of an array share a common name and are distinguished by indexes (subscripts).
Types of array- Arrays can be broadly divided into two categories based on the number of dimensions (the number of directions in which the array can grow):
(i) One dimensional array (ii) Multidimensional array
Note: When we declare an array, its name must be specified along with the sizes of all its dimensions. During execution, a block of memory is allocated to store the array elements.
The address of the first element of an array is called its base address (starting address).
One dimensional (1 D) array- An array which can grow in only one direction is called a one dimensional array. Elements of such an array are accessed using only one subscript (index). A 1D array is the obvious choice for storing a data structure in which data are arranged one after another, like a list. In spite of being one dimensional, a 1D array can also be used to store data structures with one-to-many relationships, such as a tree.
The following is a declaration and initialization of a 1D array of integer type in ‘C’ language, and its memory representation:
int marks [5] = {51,62,43,74,55};
Here, each integer is assumed to occupy 2 bytes (as in the 16-bit DOS environment); most modern C compilers use 4 bytes for an int.
Accessing elements of 1D array - An element of an array can be accessed (read or written) by using the array name and its index. The index of an element of a 1D array depends on its position in the array. The index of the first element is zero (0) and the last index is size-1.
Two dimensional array- An array which can grow in two directions is called a two dimensional array. Elements of such an array are accessed using two subscripts (indexes). A 2D array can be visualized as an array of 1D arrays, due to the one dimensional nature of main memory. A 2D array is the obvious choice for storing a data structure in which data are arranged in row and column form. Various real-world data are stored in computer memory with a row and column arrangement, like a matrix, a grayscale image, the amount of rainfall in each month on a weekly/daily basis, records of friendship on social media platforms, etc.
Following is a declaration and initialization of 2D array of integer type in ‘C’ language to store
the marks scored by three students in five subjects:
int marks [3][5] = {{50,60,78,87,95},{98,87,75,76,99},{98,99,97,95,100}};
Accessing elements of 2D array - An element of an array can be accessed (read or written) by using the array name and two indexes (row index and column index). Both row and column indexes start from zero (0); therefore, the first element of the 2D array marks is accessed using marks[0][0], while the last element is accessed using marks[2][4].
Storage of 2D array- Since main memory in a computer is one dimensional and a 2D array has two dimensions, the array cannot be stored in memory directly as it is. Therefore, a 2D array needs to be transformed into 1D form. There are two methods for doing this 2D to 1D transformation so that the row-column arrangement of the 2D array remains intact: (i) Row-major order (ii) Column-major order.
Row-major order- In this method, a 2D array is stored in a row-by-row (row-wise) manner. The elements of the first row are stored from the starting address of the array, followed by the elements of the second row, and so on.
Column-major order- In this method, a 2D array is stored in a column-by-column (column-wise) manner. The elements of the first column are stored from the starting address of the array, followed by the elements of the second column, and so on.
Following example presents the storage of 2D array in memory using row-major order and
column-major order:
Note: C is row-major (as are C++, Pascal, and, by default, NumPy arrays in Python), while Fortran is column-major (as are Julia, R, and MATLAB).
Three dimensional array- An array which can grow in three directions is called a three dimensional array. Elements of such an array are accessed using three subscripts (indexes). A 3D array can be visualized as an array of 2D arrays, just as a 2D array is an array of 1D arrays, due to the one dimensional nature of main memory. A 3D array is the obvious choice for storing a data structure in which data are arranged in row and column form along with depth (also called page/block/sheet). Various real-world data are stored in computer memory with a row and column arrangement along with depth, like a color image (having depth 3 (RGB)), the amount of rainfall in each month on a weekly/daily basis for multiple cities, etc.
Note: When we declare and access array elements, the order of dimensions of a 3D array in ‘C’ language is depth, row, and column, while in MATLAB it is row, column, and depth.
Following is a declaration and initialization of 3D array of integer type in ‘C’ language to store
the marks scored by two students in three subjects during two tests:
int marks [2][2][3] = {{{50,60,56},{78,87,60}},{{97,95,56},{75,76,68}}};
The following diagram visualizes the marks 3D array as two 2D arrays of 2 rows and 3 columns:
Accessing elements of 3D array - An element of an array can be accessed (read or written) by using the array name and three indexes (depth index, row index, and column index). All three indexes start from zero (0); therefore, the first element of the 3D array marks is accessed using marks[0][0][0], while the last element is accessed using marks[1][1][2].
Storage of 3D array – Like 2D array, we can store elements of 3D array using row-major order
and column-major order.
Following diagram shows the row-major order storage of 3D array:
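The above diagram shows all the aspects of the formula for calculating the address of an element in a one dimensional array. The address of A[Index] is the base address B plus the size S of each element multiplied by the number of elements before it, (Index – LB), where LB is the lower bound of the index.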
Example: Given the base address of an array A[1200 ... 1900] as 1000 and the size of
each element is 4 bytes in the memory, find the address of A[1700].
Solution-
Given:
• Base address (B) = 1000
• Lower bound (LB) = 1200
• Size of each element (S) = 4 bytes
• Index of element (not value) = 1700
Formula used:
Address of A[Index] = B + S * (Index – LB)
Address of A[1700] = 1000 + 4 * (1700 – 1200)
= 1000 + 4 * (500)
= 1000 + 2000
Address of A[1700] = 3000
Two dimensional array- As mentioned earlier, 2D array can be stored in memory using two
different approaches namely row-major order and column-major order.
Address calculation in row-major order- Following is the generalized formula for address
calculation:
Address of A [ i ][j] = B + S *[ N*( i – LBr) + ( j – LBc)]
Here,
B = Base address.
S = Storage size of one element stored in the array (in bytes).
i = Row subscript (index) of the element whose address is to be calculated.
j = Column subscript (index) of the element whose address is to be calculated.
LBr = Lower bound of the row index (the row index of the first element).
LBc = Lower bound of the column index (the column index of the first element).
N = Number of columns in the 2D array.
Note: N*(i – LBr) gives the number of elements in the rows before the ith row, while (j – LBc) gives the number of elements before the required element within the ith row.
Example: Given an array, arr[1 ... 10][1 ... 15] with base value 100 and the size of
each element is 1 Byte in memory. Find the address of arr[8][6] with the help of row-major
order.
Solution:
Given:
Base address B = 100
Storage size of one element stored in the array S = 1 byte
Row index of an element whose address to be calculated i = 8
Column index of an element whose address to be calculated j = 6
Lower Limit of row/start row index of matrix LBr = 1
Lower Limit of column/start column index of matrix LBc = 1
Number of columns in the matrix N = UBc – LBc + 1 (UBc = upper bound of the column index)
= 15 – 1 + 1
= 15
Formula used:
Address of A [ i ][j] = B + S *[ N*( i – LBr) + ( j – LBc)]
Address of A[8][6] = 100 + 1 * ((8 – 1) * 15 + (6 – 1))
= 100 + 1 * ((7) * 15 + (5))
= 100 + 1 * (110)
Address of A[8][6] = 210
Address calculation in column-major order- Following is the generalized formula for address
calculation:
Address of A [ i ][j] = B + S *[ ( i – LBr) + M*( j – LBc)]
Here,
B = Base address.
S = Storage size of one element stored in the array (in bytes).
i = Row subscript (index) of the element whose address is to be calculated.
j = Column subscript (index) of the element whose address is to be calculated.
LBr = Lower bound of the row index (the row index of the first element).
LBc = Lower bound of the column index (the column index of the first element).
M = Number of rows in the 2D array.
Note: M*(j – LBc) gives the number of elements in the columns before the jth column, while (i – LBr) gives the number of elements before the required element within the jth column.
Example: Given an array, arr[1 ... 10][1 ... 15] with base value 100 and the size of
each element is 1 Byte in memory. Find the address of arr[8][6] with the help of column-major
order.
Solution:
Given:
Base address B = 100
Storage size of one element stored in the array S = 1 byte
Row index of an element whose address to be calculated i = 8
Column index of an element whose address to be calculated j = 6
Lower Limit of row/start row index of matrix LBr = 1
Lower Limit of column/start column index of matrix LBc = 1
Number of rows in the matrix M = UBr – LBr + 1 (UBr = upper bound of the row index)
= 10 – 1 + 1
= 10
Formula used:
Address of A [ i ][j] = B + S *[ ( i – LBr) + M*( j – LBc)]
Address of A[8][6] = 100 + 1 * ((6 – 1) * 10 + (8 – 1))
= 100 + 1 * ((5) * 10 + (7))
= 100 + 1 * (57)
Address of A[8][6] = 157