Unit 1
Unit 1
Data Structures
Data structure means organizing the data in the memory.
Data Structure can be defined as the group of data elements which provides an efficient way of storing
and organizing data in the computer so that it can be used efficiently.
Data items that are divided into sub-items are called Group items. Ex: An Employee Name may be
divided into three subitems- first name, middle name, and last name.
Data items that are not able to divide into sub-items are called Elementary items. Ex: SSN
Entity: An entity is something that has certain attributes or properties which may be assigned values.
The values may be either numeric or non-numeric.
Entities with similar attributes form an entity set. Each attribute of an entity set has a range of values, the
set of all possible values that could be assigned to the particular attribute. The term “information” is
sometimes used for data with given attributes, of, in other words meaningful or processed data.
1
Processor speed: To handle very large amount of data, high speed processing is required, but as the
data is growing day by day to the billions of files per entity, processor may fail to deal with that much
amount of data.
Data Search: Consider an inventory size of 106 items in a store; If our application needs to search for a
particular item, it needs to traverse 106 items every time, results in slowing down the search process.
Multiple requests: If thousands of users are searching the data simultaneously on a web server, then
there are the chances that a very large server can be failed during that process.
Reusability: Data structures are reusable, i.e. once we have implemented a particular data structure, we
can use it at any other place. Implementation of data structures can be compiled into libraries which can
be used by different clients.
Abstraction: Data structure is specified by the ADT which provides a level of abstraction. The client
program uses the data structure through interface only, without getting into the implementation details.
2
Difference between Sequential Organization and Linked Organization:
3. Allows for fast insertions and deletions Allows for fast traversal and access
Can be used for implementing data Can be used for implementing data
5.
structures like linked lists and trees structures like arrays and stacks
7. Can be used for dynamic data structures Suitable for static data structures
Pointers may be pointing to null or non- All elements are stored in contiguous
10.
existent memory in case of broken links memory
3
Types of Data Structure
Basically, data structures are divided into two categories:
Primitive data structure is a fundamental type of data structure that stores the data of only one type
(can hold a single type of value).
o Integer: The integer data type contains the numeric values. It contains the whole numbers that can
be either negative or positive. When the range of integer data type is not large enough then in that
case, we can use long.
o Float: The float is a data type that can hold decimal values. When the precision of decimal value
increases then the Double data type is used.
o Boolean: It is a data type that can hold either a True or a False value. It is mainly used for checking
the condition.
o Character: It is a data type that can hold a single character value both uppercase and lowercase
such as 'A' or 'a'.
4
The non-primitive data structure is derived from the primitive data structure and that is more
complicated to create. It is responsible for structuring of a group of homogeneous and
heterogeneous data type. This is used to process large amount of data.
Non-primitive data structure is a type of data structure that can store the data of more than one
type (Can hold multiple values).
In case of linear data structure, the data is stored in a sequence, i.e., one data after another data.
When we access the data from the linear data structure, we just need to start from one place and
will find other data in a sequence.
Linear data structures are easy to implement because computer memory is arranged in a linear
way. Its examples are array, stack, queue, linked list, etc.
Linear Non-linear
In a linear data structure, data elements are
arranged in a linear order where each and In a non-linear data structure, data elements
every element is attached to its previous and are attached in hierarchically manner.
next adjacent.
Whereas in non-linear data structure, multiple
In linear data structure, single level is involved.
levels are involved.
Its implementation is easy in comparison to While its implementation is complex in
non-linear data structure. comparison to linear data structure.
While in non-linear data structure, data
In linear data structure, data elements can be
elements can’t be traversed in a single run
traversed in a single run only.
only.
In a linear data structure, memory is not While in a non-linear data structure, memory is
utilized in an efficient way. utilized in an efficient way.
Its examples are: array, stack, queue, linked
While its examples are: trees and graphs.
list, etc.
Applications of linear data structures are Applications of non-linear data structures are
mainly in application software development. in Artificial Intelligence and image processing.
5
1. Array Data Structure
An Array is a collection of items of same data type stored at contiguous memory locations.
Representation of Array
Types of arrays:
There are majorly two types of arrays:
One-dimensional array (1-D arrays): You can imagine a 1d array as a row, where elements
are stored one after another.
6
Three-dimensional array: A 3-D Multidimensional array contains three dimensions, so it can
be considered an array of two-dimensional arrays.
Arrays allow random access to elements. This makes accessing elements by position faster.
Arrays have better cache locality which makes a pretty big difference in performance.
Arrays represent multiple data items of the same type using a single name.
Arrays store multiple data of similar types with the same name.
Array data structures are used to implement the other data structures like linked lists, stacks,
queues, trees, graphs, etc.
Disadvantages of Array:
7
As arrays have a fixed size, once the memory is allocated to them, it cannot be increased or
decreased, making it impossible to store extra data if required. An array of fixed size is
referred to as a static array.
Allocating less memory than required to an array leads to loss of data.
An array is homogeneous in nature so, a single array cannot store values of different data
types.
Arrays store data in contiguous memory locations, which makes deletion and insertion very
difficult to implement. This problem is overcome by implementing linked lists, which allow
elements to be accessed sequentially.
Application of Array:
They are used in the implementation of other data structures such as array lists, heaps, hash
tables, vectors, and matrices.
Database records are usually implemented as arrays.
It is used in lookup tables by computer.
It is used for different sorting algorithms such as bubble sort insertion sort, merge sort, and
quick sort.
Stack: Stack is a linear list in which insertion and deletions are allowed only at one end,
called top.
A stack is an abstract data type (ADT), can be implemented in most of the programming
languages. It is named as stack because it behaves like a real-world stack, for example: -
piles of plates or deck of cards etc.
8
Various Applications of Stack in Data Structure:
o Evaluation of Arithmetic Expressions: An arithmetic expression consists of operands and
operators.
o Backtracking: Backtracking is another application of Stack. It is a recursive algorithm that is used
for solving the optimization problem.
o Delimiter Checking: delimiter checking, i.e., parsing that involves analyzing a source program
syntactically. It is also called parenthesis checking. When the compiler translates a source program
written in some programming language such as C, C++ to a machine language, it parses the
program into multiple individual parts such as variable names, keywords, etc. By scanning from left
to right.
o Reverse a Data: To reverse a given set of data, we need to reorder the data so that the first and
last elements are exchanged, the second and second last element are exchanged, and so on for
all other elements.
o Processing Function Calls
Queue: Queue is a linear list in which elements can be inserted only at one end
called rear and deleted only at the other end called front.
It is an abstract data structure, similar to stack. Queue is opened at both end therefore it
follows First-In-First-Out (FIFO) methodology for storing the data items.
Managing requests on a single shared resource such as CPU scheduling and disk
scheduling
Queues are used in asynchronous transfer of data (where data is not being transferred at
the same rate between two processes) for eg. pipes, file IO, sockets.
Handling hardware or real-time systems interrupts
9
Handling website traffic
Routers and switches in networking
Queues are used to maintain the play list in media players in order to add and remove the
songs from the play-list.
Queues are used as buffers in most of the applications like MP3 media player, CD player,
etc.
4. Linked List
Linked list is a linear data structure which is used to maintain a list in the memory. It can be
seen as the collection of nodes stored at non-contiguous memory locations. Each node of
the list contains a pointer to its adjacent node.
Implementing advanced data structures: We can implement data structures like stacks
and queues with the help of a linked list.
Graph Adjacency list Representation: Linked List helps in storing the adjacent vertices of
a graph in adjacency list representation.
10
Non-linear data structure
Data structures where data elements are not arranged sequentially or linearly are
called non-linear data structures. In a non-linear data structure, single level is not
involved. Therefore, we can’t traverse all the elements in single run only. Non-linear data
structures are not easy to implement in comparison to linear data structure.
1. Trees:
Trees are multilevel data structures with a hierarchical relationship among its elements
known as nodes. The bottommost nodes in the hierarchy are called leaf node while the
topmost node is called root node. Each node contains pointers to point adjacent nodes.
Tree data structure is based on the parent-child relationship among the nodes. Each node
in the tree can have more than one children except the leaf nodes whereas each node can
have atmost one parent except the root node.
Store hierarchical data, like folder structure, organization structure, XML/HTML data.
Binary Search Tree is a tree that allows fast search, insert, delete on a sorted data. It also allows
finding closest item
Heap is a tree data structure which is implemented using arrays and used to implement priority
queues.
11
B-Tree and B+ Tree : They are used to implement indexing in databases.
Syntax Tree: Scanning, parsing , generation of code and evaluation of arithmetic expressions in
Compiler design.
2. Graph:
Graphs can be defined as the pictorial representation of the set of elements (represented
by vertices) connected by the links known as edges. A graph is different from tree in the
sense that a graph can have cycle while the tree can not have the one.
Used in Google maps for building transportation systems. In google maps, the intersection of two
or more roads represents the node while the road connecting two nodes represents an edge.
Google maps algorithm uses graphs to calculate the shortest distance between two vertices.
Operating Systems use Resource Allocation Graph where every process and resource acts as a
node while edges are drawn from resources to the allocated process.
Used in the World Wide Web where the web pages represent the nodes.
Blockchains also use graphs. The nodes are blocks that store many transactions while the edges
connect subsequent blocks.
12
Data Structure operations
1. Traversing:
Every data structure contains the set of data elements. Traversing the data structure
means visiting each element of the data structure in order to perform some specific
operation like searching or sorting.
To visit or process each data exactly once in the data structure
2. Insertion:
Insertion can be defined as the process of adding the elements to the data structure at any
location.
To add a new value to the data structure
If the size of data structure is n then we can only insert n-1 data elements into it.
3. Deletion:
The process of removing an element from the data structure is called Deletion. We can
delete an element from the data structure at any random location.
To remove a value from the data structure
If we try to delete an element from an empty data structure then underflow occurs.
4. Searching:
The process of finding the location of an element within the data structure is called
Searching. There are two algorithms to perform searching, Linear Search and Binary
Search.
To search for a particular value in the data structure for the given key value.
5. Sorting:
The process of arranging the data structure in a specific order is known as Sorting. There
are many algorithms that can be used to perform sorting, for example, insertion sort,
selection sort, bubble sort, etc.
To arrange the values in the data structure in a particular order.
6. Merging:
When two lists List A and List B of size M and N respectively, of similar type of elements,
clubbed or joined to produce the third list, List C of size (M+N), then this process is called
merging.
To join two same type of data structure values
13
Algorithm
An algorithm is defined as a set of rules or a step-by-step procedure that are to be
executed in a specific order to get the desired output.
The formal definition of an algorithm is a finite set of steps carried out in a specific time for
specific problem-solving operations, especially by a Computer. So basically, it is not the
complete code; rather it is the logic that can be implemented to solve a particular problem.
Note: Pseudocode is nothing but the description of the logic or the code in plain, simple
language.
Characteristics of an Algorithm
o Input: An algorithm has some input values. We can pass 0 or some input value to an algorithm.
o Output: We will get 1 or more output at the end of an algorithm.
o Unambiguity: An algorithm should be unambiguous which means that the instructions in an
algorithm should be clear and simple.
o Finiteness: An algorithm should have finiteness. Here, finiteness means that the algorithm should
contain a limited number of instructions, i.e., the instructions should be countable.
o Effectiveness: An algorithm should be effective as each instruction in an algorithm affects the
overall process.
o Language independent: An algorithm must be language-independent so that the instructions in
an algorithm can be implemented in any of the languages with the same output.
Need of Algorithms
We need algorithms because of the following reasons:
14
o Scalability: It helps us to understand the scalability. When we have a big real-world
problem, we need to scale it down into small-small steps to easily analyze the problem.
o Performance: The real-world is not easily broken down into smaller steps. If the problem
can be easily broken into smaller steps means that the problem is feasible.
Example of an algorithm
The following are the steps required to add two numbers entered by the user:
o Step 1: Start
o Step 2: Declare three variables a, b, and sum.
o Step 3: Enter the values of a and b.
o Step 4: Add the values of a and b and store the result in the sum variable, i.e., sum=a+b.
o Step 5: Print sum
o Step 6: Stop
Algorithm Analysis
Algorithm analysis is an important part of computational complexity theory, which provides
theoretical estimation for the required resources of an algorithm to solve a specific computational
problem. Analysis of algorithms is the determination of the amount of time and space resources
required to execute it.
1. Best case
2. Worst case
3. Average case
Best case: Define the input for which algorithm takes less time or minimum time. In the best case
calculate the lower bound of an algorithm. Example: In the linear search when search data is
present at the first location of large data then the best case occurs.
Worst Case: Define the input for which algorithm takes a long time or maximum time. In the
worst calculate the upper bound of an algorithm. Example: In the linear search when search data
is not present at all then the worst case occurs.
Average case: In the average case take all random inputs and calculate the computation time for
all inputs. And then we divide it by the total number of inputs.
Let's take the example of searching for an item sequentially within a list of unsorted items.
If the searched item is always the first one, then complexity is O(1);
15
if it's always the last one, then complexity is O(n).
We can also calculate the average complexity, which will turn out to be O(n).The
term "complexity" normally refers to worst-case complexity.
The analysis of an algorithm is done base on its efficiency. The two important terms used for the
analysis of an algorithm is “Priori Analysis” and “Posterior Analysis”.
Priori Analysis: It is done before the actual implementation of the algorithm when the algorithm
is written in the general theoretical language. In this, the efficiency of the algorithm is calculated
base on its complexity. It is just an approximate analysis.
Posterior Analysis: It is done after the actual implementation and execution of the algorithm
using any programming language like C, C++, Java, or Python. It is the actual analysis in which
space and time complexity of the algorithm is calculated more correctly.
Algorithm Complexity
Complexity in algorithms refers to the amount of resources (such as time or memory)
required to solve a problem or perform a task
It is difficult to calculate the exact space and running time of all algorithms because it depends on
several factors like which language you are using, which compiler, or which machine you are
using for the implementation of the algorithm.
Time Complexity:
Time complexity is the amount of time required by an algorithm for the execution and providing
the desired output.
The time complexity of an algorithm is the amount of time required to complete the execution. The
time complexity of an algorithm is denoted by the big O notation. Here, big O notation is the
asymptotic notation to represent the time complexity. The time complexity is mainly calculated by
counting the number of steps to finish the execution.
16
An algorithm is said to have a constant time complexity when the time taken by the algorithm
remains constant and does not depend upon the number of inputs.
main()
{
int a=5;
a=a+5;
printf(“%d”,a);
}
The time complexity of an algorithm is said to be linear when the number of operations performed
is directly proportional to the size of the input.
int a[100],n;
printf(“Enter n value”);
scanf(“%d”,&n);
for(i=0;i<n;i++)
scanf(“%d,&a[i]);
An algorithm has quadratic time complexity if the time to execute it is proportional to the square of
the input size.
for(i=0;i<r;i++)
{
for(j=0;j<c;j++)
{
scanf(“%d”,&a[i][j]);
}
}
17
An algorithm is said to have a logarithmic time complexity when it reduces the size of the input
data in each step by 50 percent or the number of operations gets reduced by 50 percent as the
input size increases. This is because the algorithm divides the working area in half with each
iteration. this type of time complexity is to search for strategies like divide and conquer directly
affecting the number of operations.
Ex.
Mid=(low high)/2
total time=n/2
It should be quite clear from the notation itself, it is a combination of Linear and Logarithmic Time
Complexities. The time taken by this is slightly less than the linear time complexity but not as slow
as the quadratic time complexity.
Heap Sort and Quick Sort are some examples of Linearithmic Time Complexity
When we see exponential growth in the number of operations performed by the algorithm with the
increase in the size of the input, we can say that that algorithm has exponential time complexity.
The best example to explain this kind of time complexity is the Fibonacci Series.
Space Complexity:
Space complexity refers to the amount of memory used by an algorithm to solve a problem.
It measures the amount of memory space required to store the data and structures used by
an algorithm.
An algorithm's space complexity is the amount of space required to solve a problem and produce
an output. Similar to the time complexity, space complexity is also expressed in big O notation.
18
Auxiliary space: The extra space required by the algorithm, excluding the input size, is known as
an auxiliary space. The space complexity considers both the spaces, i.e., auxiliary space, and
space used by the input.
So,
Example
int main()
{
int a = 10;
float b = 20.5;
char c = 'A';
int d[10];
return 0;
}
To calculate the complexity of this algorithm, we need to determine the amount of memory used by each
of the variables. In this case:
19
Types of Space Complexity
Stack
A stack can be defined as a container in which insertion and deletion can be done from the
one end known as the top of the stack.
A Stack is a linear data structure that follows the LIFO (Last-In-First-Out) principle.
o push(): When we insert an element in a stack then the operation is known as a push. If the stack is
full then the overflow condition occurs.
o pop(): When we delete an element from the stack, the operation is known as a pop. If the stack is
empty means that no element exists in the stack, this state is known as an underflow state.
o isEmpty(): It determines whether the stack is empty or not.
o isFull(): It determines whether the stack is full or not.'
o peek(): It returns the element at the given position.
o count(): It returns the total number of elements available in a stack.
o change(): It changes the element at the given position.
o display(): It prints all the elements available in the stack.
20
Array implementation of Stack (Static Stack Representation)
In array implementation, the stack is formed by using the array. All the operations regarding the
stack are performed using arrays.
Algorithm:
Algorithm:
Step -1: begin
Step -2: if top = 0 then stack empty; // top == -1
Step -3: item := stack (top);
Step -4: top = top - 1;
Step -5: end;
int pop ()
{
if ( top == 0) // top == -1
22
{
printf("Underflow");
return 0;
}
else
{
Item=stack [top];
top=top-1;
return item;
}
}
In linked list implementation of stack, the nodes are maintained non-contiguously in the memory.
Each node contains a pointer to its immediate successor node in the stack. Stack is said to be
overflown if the space left in the memory heap is not enough to create a node.
23
struct stack *temp;
temp=allocate memory
temp->data=item;
temp->next=top;
top=temp;
}
24