Data Structure Aktu
A data structure is a means of storing and organizing data: a way of arranging data on a
computer so that it can be accessed and updated efficiently.
A data structure is not only used for organizing the data. It is also used for processing,
retrieving, and storing data. There are different basic and advanced types of data structures
that are used in almost every program or software system that has been developed. So we
must have good knowledge about data structures.
Classification of Data Structure:
Linear data structure: Data structure in which data elements are arranged sequentially
or linearly, where each element is attached to its previous and next adjacent elements,
is called a linear data structure.
Examples of linear data structures are array, stack, queue, linked list, etc.
Static data structure: Static data structure has a fixed memory size. It is easier to
access the elements in a static data structure.
An example of this data structure is an array.
Dynamic data structure: In a dynamic data structure, the size is not fixed; it can grow or
shrink at runtime, which can make it more efficient with respect to the memory (space)
complexity of the code.
Examples of this data structure are queue, stack, etc.
Non-linear data structure: Data structures where data elements are not placed
sequentially or linearly are called non-linear data structures. In a non-linear data
structure, we can’t traverse all the elements in a single run.
Examples of non-linear data structures are trees and graphs.
For example, we can store a list of items having the same data-type using the array data
structure.
Efficient data access and manipulation: Data structures enable quick access and
manipulation of data. For example, an array allows constant-time access to elements
using their index, while a hash table allows fast access to elements based on their key.
Without data structures, programs would have to search through data sequentially,
leading to slow performance.
Memory management: Data structures allow efficient use of memory by allocating and
deallocating memory dynamically. For example, a linked list can dynamically allocate
memory for each element as needed, rather than allocating a fixed amount of memory
upfront. This helps avoid memory wastage and enables efficient memory management.
Code reusability: Data structures can be reused across different programs and projects.
For example, a generic stack data structure can be used in multiple programs that
require LIFO (Last-In-First-Out) functionality, without having to rewrite the same code
each time.
Optimization of algorithms: Data structures help optimize algorithms by enabling
efficient data access and manipulation. For example, a binary search tree allows fast
searching and insertion of elements, making it ideal for implementing searching and
sorting algorithms.
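The memory-management point above is easy to see in C: a linked-list node is allocated only
when it is needed. The following is a minimal sketch added for illustration, not code from the
original notes:
#include <stdio.h>
#include <stdlib.h>
// One node of a singly linked list
struct Node {
    int data;
    struct Node* next;
};
int main(void)
{
    // memory for the node is allocated on demand, not reserved upfront
    struct Node* head = malloc(sizeof(struct Node));
    head->data = 10;
    head->next = NULL;
    printf("%d\n", head->data);
    free(head); // released as soon as it is no longer needed
    return 0;
}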
3. Stack: Stack is a linear data structure which follows a particular order in which the
operations are performed. The order may be LIFO (Last In First Out) or FILO (First In Last
Out). In a stack, insertion and deletion are permitted at only one end of the list.
Mainly the following three basic operations are performed on a stack (illustrated in the sketch below):
Push: Adds an item to the stack. If the stack is full, it is said to be an Overflow condition.
Pop: Removes an item from the stack. The items are popped in the reverse order in which
they are pushed. If the stack is empty, it is said to be an Underflow condition.
Peek or Top: Returns the top element of the stack without removing it.
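A minimal array-based stack sketch in C (the fixed capacity and names are assumptions for
illustration, not code from the original notes):
#include <stdio.h>
#define MAX 100
int stack[MAX], top = -1;
void push(int x) // insert at the top
{
    if (top == MAX - 1) { printf("Overflow\n"); return; }
    stack[++top] = x;
}
int pop(void) // remove from the top (LIFO)
{
    if (top == -1) { printf("Underflow\n"); return -1; }
    return stack[top--];
}
int peek(void) // read the top element without removing it
{
    return (top == -1) ? -1 : stack[top];
}
int main(void)
{
    push(10); push(20); push(30);
    printf("%d\n", pop());  // 30: the most recently added item leaves first
    printf("%d\n", peek()); // 20
    return 0;
}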
4. Queue: Like Stack, Queue is a linear structure which follows a particular order in
which the operations are performed. The order is First In First Out (FIFO). In the
queue, items are inserted at one end and deleted from the other end. A good example
of the queue is any queue of consumers for a resource where the consumer that came
first is served first. The difference between stacks and queues is in how items are removed:
in a stack we remove the item most recently added, whereas in a queue we remove the item
least recently added.
Enqueue: Adds an item to the queue. If the queue is full, then it is said to be an
Overflow condition.
Dequeue: Removes an item from the queue. Items are removed in the same order in
which they were added. If the queue is empty, then it is said to be an Underflow
condition.
Front: Get the front item from the queue.
Rear: Get the last item from the queue.
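These four operations can be sketched with a circular array in C (the capacity and names are
assumptions for illustration, not code from the original notes):
#include <stdio.h>
#define MAX 100
int queue[MAX], front = 0, rear = -1, count = 0;
void enqueue(int x) // insert at the rear
{
    if (count == MAX) { printf("Overflow\n"); return; }
    rear = (rear + 1) % MAX; // wrap around the array
    queue[rear] = x;
    count++;
}
int dequeue(void) // remove from the front (FIFO)
{
    if (count == 0) { printf("Underflow\n"); return -1; }
    int x = queue[front];
    front = (front + 1) % MAX;
    count--;
    return x;
}
int main(void)
{
    enqueue(1); enqueue(2); enqueue(3);
    printf("%d\n", dequeue()); // 1: the least recently added item leaves first
    printf("%d\n", dequeue()); // 2
    return 0;
}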
5. Binary Tree: Unlike Arrays, Linked Lists, Stack and queues, which are linear data
structures, trees are hierarchical data structures. A binary tree is a tree data structure
in which each node has at most two children, which are referred to as the left child and
the right child. It is implemented mainly using Links. A Binary Tree is represented by a
pointer to the topmost node in the tree. If the tree is empty, then the value of root is
NULL. A Binary Tree node contains the following parts.
1. Data
2. Pointer to left child
3. Pointer to the right child
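These three parts map directly onto a C struct; a minimal sketch (names are illustrative):
#include <stdlib.h>
struct Node {
    int data;            // 1. Data
    struct Node* left;   // 2. Pointer to left child
    struct Node* right;  // 3. Pointer to the right child
};
// Create a node whose children are initially empty
struct Node* newNode(int data)
{
    struct Node* node = malloc(sizeof(struct Node));
    node->data = data;
    node->left = NULL;
    node->right = NULL;
    return node;
}
int main(void)
{
    struct Node* root = newNode(1); // topmost node of the tree
    root->left = newNode(2);
    root->right = newNode(3);
    return 0;
}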
6. Binary Search Tree: A Binary Search Tree is a Binary Tree following the additional
properties:
The left subtree of the root node contains only keys less than the root node’s key.
The right subtree of the root node contains only keys greater than the root node’s key.
No duplicate keys are present in the tree.
A binary tree having the above properties is known as a binary search tree (BST).
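A sketch of searching a BST in C, assuming the Node struct from the binary tree sketch above;
the ordering property lets each step discard half of the remaining tree:
struct Node* search(struct Node* root, int key)
{
    if (root == NULL || root->data == key)
        return root;                    // found, or key not present
    if (key < root->data)
        return search(root->left, key); // smaller keys live in the left subtree
    return search(root->right, key);    // larger keys live in the right subtree
}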
Applications of Data Structures:
Arrays: Arrays are used to store a collection of homogeneous elements in contiguous
memory locations. They are commonly used to implement other data structures, such
as stacks and queues, and to represent matrices and tables.
Linked lists: Linked lists are used to store a collection of elements with dynamic memory
allocation, without requiring contiguous storage. They are commonly used to implement
stacks, queues, and hash tables.
Trees: Trees are used to represent hierarchical data structures, such as file systems,
organization charts, and network topologies. Binary search trees are commonly used to
implement dictionaries and symbol tables.
Graphs: Graphs are used to represent complex relationships between data elements,
such as social networks, transportation networks, and computer networks. They are
commonly used to implement shortest path algorithms and graph traversal algorithms.
Hash tables: Hash tables are used to implement associative arrays, which store key-
value pairs. They provide fast access to data elements based on their keys.
Stacks: Stacks are used to store a collection of elements in a last-in-first-out (LIFO)
order. They are commonly used to implement undo-redo functionality, recursive
function calls, and expression evaluation.
Queues: Queues are used to store a collection of elements in a first-in-first-out (FIFO)
order. They are commonly used to implement waiting lines, message queues, and job
scheduling.
DIFFERENCE BETWEEN DATA TYPE AND DATA STRUCTURE
Data Type: It can hold a value but not data; therefore, it is dataless. Examples are int,
float, double, etc.
Data Structure: It can hold multiple types of data within a single object. Examples are
stack, queue, tree, etc.
The user of a data type does not need to know how that data type is implemented. For
example, we use primitive data types like int, float, and char knowing only the operations
that can be performed on them, without any idea of how they are implemented.
Features of ADT:
Abstract data types (ADTs) are a way of encapsulating data and operations on that data into
a single unit. Some of the key features of ADTs include:
Abstraction: The user does not need to know the implementation of the data
structure; only the essentials are provided.
Better Conceptualization: ADT gives us a better conceptualization of the real world.
Robust: The program is robust and has the ability to catch errors.
Encapsulation: ADTs hide the internal details of the data and provide a public
interface for users to interact with the data. This allows for easier maintenance and
modification of the data structure.
Data Abstraction: ADTs provide a level of abstraction from the implementation
details of the data. Users only need to know the operations that can be performed on
the data, not how those operations are implemented.
Data Structure Independence: ADTs can be implemented using different data
structures, such as arrays or linked lists, without affecting the functionality of the
ADT.
Information Hiding: ADTs can protect the integrity of the data by allowing access only
to authorized users and operations. This helps prevent errors and misuse of the data.
Modularity: ADTs can be combined with other ADTs to form larger, more complex
data structures. This allows for greater flexibility and modularity in programming.
Overall, ADTs provide a powerful tool for organizing and manipulating data in a structured
and efficient manner.
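As a small illustration of these features, an ADT in C is often written as an opaque type plus a
set of operations; users see only declarations like the following (all names are illustrative, a
sketch rather than a fixed interface):
struct Stack;                            // opaque type: internals are hidden
struct Stack* stack_create(void);        // the operations below form the
void stack_push(struct Stack* s, int x); // public interface of the ADT
int stack_pop(struct Stack* s);
void stack_destroy(struct Stack* s);
// The definition of struct Stack (array? linked list?) lives elsewhere,
// so it can change without affecting any code that uses the ADT.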
Abstract data types (ADTs) have several advantages and disadvantages that should be
considered when deciding to use them in software development. Here are some of the main
advantages and disadvantages of using ADTs:
Advantages:
Encapsulation: ADTs provide a way to encapsulate data and operations into a single
unit, making it easier to manage and modify the data structure.
Abstraction: ADTs allow users to work with data structures without having to know
the implementation details, which can simplify programming and reduce errors.
Data Structure Independence: ADTs can be implemented using different data
structures, which can make it easier to adapt to changing needs and requirements.
Information Hiding: ADTs can protect the integrity of data by controlling access and
preventing unauthorized modifications.
Modularity: ADTs can be combined with other ADTs to form more complex data
structures, which can increase flexibility and modularity in programming.
Disadvantages:
Overhead: Implementing ADTs can add overhead in terms of memory and processing,
which can affect performance.
Complexity: ADTs can be complex to implement, especially for large and complex
data structures.
Learning Curve: Using ADTs requires knowledge of their implementation and usage,
which can take time and effort to learn.
Limited Flexibility: Some ADTs may be limited in their functionality or may not be
suitable for all types of data structures.
Cost: Implementing ADTs may require additional resources and investment, which can
increase the cost of development.
Overall, the advantages of ADTs often outweigh the disadvantages, and they are widely used
in software development to manage and manipulate data in a structured and efficient way.
However, it is important to consider the specific needs and requirements of a project when
deciding whether to use ADTs.
From these definitions, we can clearly see that the definitions do not specify how these ADTs
will be represented and how the operations will be carried out. There can be different ways to
implement an ADT, for example, the List ADT can be implemented using arrays, or singly
linked list or doubly linked list. Similarly, stack ADT and Queue ADT can be implemented
using arrays or linked lists.
What is an Algorithm?
The word Algorithm means “A set of rules to be followed in calculations or other problem-
solving operations” or “A procedure for solving a mathematical problem in a finite number of
steps that frequently involves recursive operations”. Therefore, an algorithm refers to a
sequence of finite steps to solve a particular problem. Algorithms can be simple or complex
depending on what you want to achieve.
Asymptotic Analysis evaluates the performance of an algorithm in terms of input size rather
than measuring the actual running time: we calculate how the time (or space) taken by an
algorithm increases with the input size. Three notations are commonly used:
1. Big O notation (O): This notation provides an upper bound on the growth rate of an
algorithm’s running time or space usage. It represents the worst-case scenario, i.e.,
the maximum amount of time or space an algorithm may need to solve a problem. For
example, if an algorithm’s running time is O(n), it means that the running time of the
algorithm grows at most linearly with the input size n.
2. Omega notation (Ω): This notation provides a lower bound on the growth rate of an
algorithm’s running time or space usage. It represents the best-case scenario, i.e., the
minimum amount of time or space an algorithm may need to solve a problem. For
example, if an algorithm’s running time is Ω(n), it means that the running time of the
algorithm grows at least linearly with the input size n.
3. Theta notation (Θ): This notation provides both an upper and lower bound on the
growth rate of an algorithm’s running time or space usage. It gives a tight bound, i.e.,
the exact rate at which the time or space an algorithm needs grows. For example, if an
algorithm’s running time is Θ(n), then the running time of the algorithm grows exactly
linearly with the input size n.
In general, the choice of asymptotic notation depends on the problem and the specific
algorithm used to solve it. It is important to note that asymptotic notation does not provide
an exact running time or space usage for an algorithm, but rather a description of how the
algorithm scales with respect to input size. It is a useful tool for comparing the efficiency of
different algorithms and for predicting how they will perform on large input sizes.
There are many important things that should be taken care of, like user-friendliness,
modularity, security, maintainability, etc. Why worry about performance? The answer to this
is simple, we can have all the above things only if we have performance. So performance is
like currency through which we can buy all the above things. Another reason for studying
performance is – speed is fun! To summarize, performance == scale. Imagine a text editor
that can load 1000 pages, but can spell check 1 page per minute OR an image editor that
takes 1 hour to rotate your image 90 degrees left OR … you get it. If a software feature
cannot cope with the scale of tasks users need to perform, it is as good as dead.
Given two algorithms for a task, how do we find out which one is better?
One naive way of doing this is – to implement both the algorithms and run the two programs
on your computer for different inputs and see which one takes less time. There are many
problems with this approach for the analysis of algorithms.
It might be possible that for some inputs, the first algorithm performs better than the
second, and for some inputs the second performs better.
It might also be possible that for some inputs, the first algorithm performs better on
one machine, and the second works better on another machine for some other inputs.
Asymptotic Analysis is the big idea that handles the above issues in analyzing algorithms. In
Asymptotic Analysis, we evaluate the performance of an algorithm in terms of input size (we
don’t measure the actual running time). We calculate, how the time (or space) taken by an
algorithm increases with the input size.
For example, let us consider the search problem (searching a given item) in a sorted array.
To understand how Asymptotic Analysis solves the problems mentioned above in analyzing
algorithms,
let us say:
we run the Linear Search on a fast computer A and
Binary Search on a slow computer B and
pick the constant values for the two computers so that it tells us exactly how
long it takes for the given machine to perform the search in seconds.
Let’s say the constant for A is 0.2 and the constant for B is 1000 which means that A is
5000 times more powerful than B.
For small values of input array size n, the fast computer may take less time.
But, after a certain value of input array size, the Binary Search will definitely start
taking less time compared to the Linear Search even though the Binary Search is
being run on a slow machine.
The reason is the order of growth of Binary Search with respect to input size is
logarithmic while the order of growth of Linear Search is linear.
So the machine-dependent constants can always be ignored after a certain value of
input size.
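To make the crossover concrete with the constants above (0.2 * n seconds for Linear Search
on A, 1000 * log2(n) seconds for Binary Search on B):
For n = 1000: A takes 0.2 * 1000 = 200 seconds, while B takes about 1000 * 10 = 10,000
seconds, so the fast machine wins.
For n = 10^6: A takes 0.2 * 10^6 = 200,000 seconds, while B takes about 1000 * 20 = 20,000
seconds, so Binary Search wins even on the slow machine.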
2. Omega Notation
Omega notation describes the best case of an algorithm’s time complexity. It specifies that the
function grows at least as fast as the given expression, i.e., it gives the minimum amount of
time an algorithm requires over all input values.
3. Theta Notation
Theta notation is used when a function lies in both O(expression) and Omega(expression), so
the expression bounds the running time from above and below. This is how the average-case
time complexity of an algorithm is commonly expressed.
Based on the above three notations of Time Complexity there are three cases to analyze an
algorithm:
In the worst-case analysis, we calculate the upper bound on the running time of an algorithm.
We must know the case that causes a maximum number of operations to be executed. For
Linear Search, the worst case happens when the element to be searched (x) is not present in the
array. When x is not present, the search() function compares it with all the elements of arr[] one
by one. Therefore, the worst-case time complexity of the linear search would be O(n).
In the best-case analysis, we calculate the lower bound on the running time of an algorithm. We
must know the case that causes a minimum number of operations to be executed. In the linear
search problem, the best case occurs when x is present at the first location. The number of
operations in the best case is constant (not dependent on n). So time complexity in the best case
would be Ω(1)
In average case analysis, we take all possible inputs and calculate the computing time for all of
the inputs. Sum all the calculated values and divide the sum by the total number of inputs. We
must know (or predict) the distribution of cases. For the linear search problem, let us assume
that all cases are uniformly distributed (including the case of x not being present in the array).
So we sum all the cases and divide the sum by (n+1). If x is equally likely to be at any of the n
positions (finding it at position i costs i comparisons) or absent (costing n comparisons), the
average-case time complexity is:
((1 + 2 + ... + n) + n) / (n + 1) = (n(n+1)/2 + n) / (n + 1) = Θ(n)
Θ (g(n)) = {f(n): there exist positive constants c1, c2 and n0 such that 0 ≤ c1 * g(n) ≤ f(n) ≤ c2
* g(n) for all n ≥ n0}
The above expression can be described as if f(n) is theta of g(n), then the value f(n) is always
between c1 * g(n) and c2 * g(n) for large values of n (n ≥ n0). The definition of theta also
requires that f(n) must be non-negative for values of n greater than n0.
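As a quick worked check of this definition: take f(n) = 3n + 2 and g(n) = n. Choosing c1 = 3,
c2 = 4, and n0 = 2 gives 0 ≤ 3n ≤ 3n + 2 ≤ 4n for all n ≥ 2, so 3n + 2 = Θ(n).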
The execution time serves as both a lower and an upper bound on the algorithm’s time
complexity: it acts as the greatest and the least boundary for a given input value.
A simple way to get the Theta notation of an expression is to drop low-order terms and ignore
leading constants. For example, consider the expression 3n^3 + 6n^2 + 6000 = Θ(n^3);
dropping the lower-order terms is always fine because there will always be a number n after
which n^3 has higher values than n^2, irrespective of the constants involved. For a given
function g(n), Θ(g(n)) denotes the following set of functions. Examples:
{ 100, log(2000), 10^4 } belongs to Θ(1)
{ (n/4), (2n+3), (n/100 + log(n)) } belongs to Θ(n)
{ (n^2+n), (2n^2), (n^2+log(n)) } belongs to Θ(n^2)
Big-O notation represents the upper bound of the running time of an algorithm; therefore, it
gives the worst-case complexity of an algorithm and is the most widely used notation for
asymptotic analysis. It specifies the upper bound of a function: the maximum time required by
an algorithm, i.e., the worst-case time complexity. It returns the highest possible output
value (big-O) for a given input. Big-O (worst case) is defined as the condition that allows an
algorithm to complete statement execution in the longest amount of time possible.
If f(n) describes the running time of an algorithm, f(n) is O(g(n)) if there exist positive
constants c and n0 such that 0 ≤ f(n) ≤ c*g(n) for all n ≥ n0. The execution time serves as an
upper bound on the algorithm’s time complexity.
O(g(n)) = { f(n): there exist positive constants c and n0 such that 0 ≤ f(n) ≤ c*g(n) for all n ≥ n0 }
For example, consider the case of Insertion Sort: it takes linear time in the best case and
quadratic time in the worst case. We can safely say that the time complexity of Insertion Sort
is O(n^2).
Note: O(n^2) also covers linear time.
If we use Θ notation to represent the time complexity of Insertion Sort, we have to use two
statements for the best and worst cases:
The worst-case time complexity of Insertion Sort is Θ(n^2).
The best-case time complexity of Insertion Sort is Θ(n).
The Big-O notation is useful when we only have an upper bound on the time complexity of an
algorithm. Many times we easily find an upper bound by simply looking at the algorithm.
Examples:
{ 100, log(2000), 10^4 } belongs to O(1)
U { (n/4), (2n+3), (n/100 + log(n)) } belongs to O(n)
U { (n^2+n), (2n^2), (n^2+log(n)) } belongs to O(n^2)
Note: Here, U represents union; we can write it in this manner because O provides exact or
upper bounds.
Omega notation represents the lower bound of the running time of an algorithm. Thus, it
provides the best case complexity of an algorithm.
The execution time serves as a lower bound on the algorithm’s time complexity.
Let g and f be functions from the set of natural numbers to itself. The function f is said to be
Ω(g) if there is a constant c > 0 and a natural number n0 such that c*g(n) ≤ f(n) for all n ≥ n0.
Let us consider the same Insertion sort example here. The time complexity of Insertion Sort can
be written as Ω(n), but it is not very useful information about insertion sort, as we are generally
interested in worst-case and sometimes in the average case.
Examples:
{ (n^2+n), (2n^2), (n^2+log(n)) } belongs to Ω(n^2)
U { (n/4), (2n+3), (n/100 + log(n)) } belongs to Ω(n)
U { 100, log(2000), 10^4 } belongs to Ω(1)
Note: Here, U represents union; we can write it in this manner because Ω provides exact or
lower bounds.
The analysis of loops for the complexity analysis of algorithms involves finding the number of
operations performed by a loop as a function of the input size. This is usually done by
determining the number of iterations of the loop and the number of operations performed in
each iteration. Here are the general steps to analyze loops for complexity analysis:
Determine the number of iterations of the loop. This is usually done by analyzing the loop
control variables and the loop termination condition.
Determine the number of operations performed in each iteration of the loop. This can include
both arithmetic operations and data access operations, such as array accesses or memory
accesses.
Express the total number of operations performed by the loop as a function of the input size.
This may involve using mathematical expressions or finding a closed-form expression for the
number of operations performed by the loop.
Determine the order of growth of the expression for the number of operations performed by the
loop. This can be done by using techniques such as big O notation or by finding the dominant
term and ignoring lower-order terms.
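Applying these steps to a doubly nested loop, for example (a small sketch added for illustration):
#include <stdio.h>
int main(void)
{
    int n = 4, count = 0;
    // the outer loop runs n times; for each outer iteration the inner
    // loop runs n times, and each iteration does O(1) work,
    // so the total is n * n * O(1) = O(n^2)
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            count++; // one O(1) operation
        }
    }
    printf("%d\n", count); // prints 16 = n * n
    return 0;
}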
The time complexity of a function (or set of statements) is considered O(1) if it doesn’t
contain a loop, recursion, or a call to any other non-constant-time function,
i.e., it is a set of non-recursive, non-loop statements.
In computer science, O(1) refers to constant time complexity, which means that the running
time of an algorithm remains constant and does not depend on the size of the input. This means
that the execution time of an O(1) algorithm will always take the same amount of time
regardless of the input size. An example of an O(1) algorithm is accessing an element in an
array using an index.
Example:
// Here c is a constant
for (int i = 1; i <= c; i++) {
    // some O(1) expressions
}
Linear Time Complexity O(n):
The Time Complexity of a loop is considered as O(n) if the loop variables are
incremented/decremented by a constant amount. For example following functions have O(n)
time complexity. Linear time complexity, denoted as O(n), is a measure of the growth of the
running time of an algorithm proportional to the size of the input. In an O(n) algorithm, the
running time increases linearly with the size of the input. For example, searching for an
element in an unsorted array or iterating through an array and performing a constant amount
of work for each element would be O(n) operations. In simple words, for an input of size n, the
algorithm takes n steps to complete the operation.
// Here c is a positive integer constant
for (int i = 1; i <= n; i += c) {
    // some O(1) expressions
}
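Turning to the space taken by recursion: the call chain below traces a recursive function such
as the following (a minimal sketch, since the definition itself is not shown in these notes):
// Recursively computes n + (n-1) + ... + 0; each call waits for the
// next one, so up to n + 1 frames sit on the call stack at the same time.
int add(int n)
{
    if (n <= 0)
        return 0;
    return n + add(n - 1);
}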
add(4)
 -> add(3)
   -> add(2)
     -> add(1)
       -> add(0)
Each of these calls is added to the call stack and takes up actual memory, so it takes O(n) space.
However, just because you have n calls in total doesn’t mean the algorithm takes O(n) space.
Look at the below function:
// pairSum returns the sum of two numbers; each call finishes
// before the next begins, so only one frame is live at a time
int pairSum(int a, int b)
{
    return a + b;
}
int addSequence(int n)
{
    int sum = 0;
    for (int i = 0; i < n; i++) {
        sum += pairSum(i, i + 1);
    }
    return sum;
}
There will be roughly O(n) calls to pairSum. However, those calls do not exist simultaneously
on the call stack, so you only need O(1) space.
Note: It’s necessary to mention that space complexity depends on a variety of things such as
the programming language, the compiler, or even the machine running the algorithm.
Time-Space Trade-Off in Algorithms
Space-Time tradeoff in computer science is basically a problem solving technique in which we
solve the problem:
Either in less time and using more space, or
In very little space by spending more time.
The best algorithm is one that solves the problem using less space in memory and less time
to generate the output. But in general, it is not always possible to achieve both of these
conditions at the same time.
If our problem is taking a long time but not much memory, a space-time trade-off would let us
use more memory and solve the problem more quickly. Or, if it could be solved very quickly
but requires more memory than we have, we can try to spend more time solving the problem
in the limited memory.
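A classic illustration of this trade-off, sketched in C with a Fibonacci example (the example
and names are assumptions, not from the original notes): storing already-computed answers
spends O(n) extra space to avoid exponential recomputation time.
#include <stdio.h>
// Plain recursion: no extra table, but exponential time.
long long fib(int n)
{
    if (n < 2) return n;
    return fib(n - 1) + fib(n - 2);
}
// Memoized version: O(n) extra space buys O(n) time.
long long memo[91]; // zero-initialized; fib(90) still fits in long long
long long fibMemo(int n)
{
    if (n < 2) return n;
    if (memo[n] != 0) return memo[n]; // reuse a stored answer
    return memo[n] = fibMemo(n - 1) + fibMemo(n - 2);
}
int main(void)
{
    printf("%lld\n", fibMemo(50)); // fast; the plain fib(50) would take minutes
    return 0;
}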
Arrays:
An array is a linear data structure and it is a collection of items stored at contiguous memory
locations. The idea is to store multiple items of the same type together in one place. It allows
the processing of a large amount of data in a relatively short period. The first element of the
array is indexed by a subscript of 0. There are different operations possible in an array, like
Searching, Sorting, Inserting, Traversing, Reversing, and Deleting.
(1) Array traversal
Given an integer array of size N, the task is to traverse and print the elements in the array.
Examples:
Input: arr[] = {2, -1, 5, 6, 0, -3}
Output: 2 -1 5 6 0 -3
Input: arr[] = {4, 0, -2, -9, -7, 1}
Output: 4 0 -2 -9 -7 1
Approach:-
1. Start a loop from 0 to N-1, where N is the size of the array.
for(i = 0; i < N; i++)
2. Access every element of array with help of
arr[index]
3. Print the elements.
printf("%d ", arr[i])
Below is the implementation of the above approach:
// C program to traverse the array
#include <stdio.h>
void printArray(int arr[], int n)
{
    int i;
    printf("Array: ");
    for (i = 0; i < n; i++) {
        printf("%d ", arr[i]);
    }
    printf("\n");
}
// Driver program
int main()
{
    int arr[] = { 2, -1, 5, 6, 0, -3 };
    int n = sizeof(arr) / sizeof(arr[0]);
    printArray(arr, n);
    return 0;
}
Output:
Array: 2 -1 5 6 0 -3
Time Complexity: O(n) // one traversal of the array is required to complete all operations, so
the overall time required by the algorithm is linear
Auxiliary Space: O(1) // no extra array is used, so the space taken by the algorithm is constant
Insertion at the beginning of an array (pseudocode):
begin
  IF N = MAX, return
  ELSE
    N = N + 1
    For all existing elements in A
      Move each one position forward
    A[FIRST] = New_Element
end
Implementation in C
#include <stdio.h>
#define MAX 5
int main() {
    int array[MAX] = {2, 3, 4, 5};
    int N = 4;     // number of elements in array
    int i = 0;     // loop variable
    int value = 0; // new data element to insert (0, to match the output below)
    for (i = 0; i < N; i++) printf("array[%d] = %d\n", i, array[i]); // original array
    for (i = N; i >= 1; i--) array[i] = array[i - 1]; // shift right to make room
    array[0] = value; // insert the new element at the beginning
    N = N + 1;
    for (i = 0; i < N; i++) printf("array[%d] = %d\n", i, array[i]); // print to confirm
    return 0;
}
Output
array[0] = 2
array[1] = 3
array[2] = 4
array[3] = 5
array[0] = 0
array[1] = 2
array[2] = 3
array[3] = 4
array[4] = 5
C program (inserting a new element at the end of the array):
#include <stdio.h>
int main()
{
    int arr[10], i, element;
    printf("Enter 5 Array Elements: ");
    for (i = 0; i < 5; i++)
        scanf("%d", &arr[i]);
    printf("\nEnter Element to Insert: ");
    scanf("%d", &element);
    arr[i] = element; // i is 5 after the loop, so the element goes at the end
    printf("\nThe New Array is:\n");
    for (i = 0; i < 6; i++)
        printf("%d ", arr[i]);
    return 0;
}
C program (inserting x at a given position pos; pos = 5 is assumed here for illustration):
#include <stdio.h>
int main()
{
    int arr[100] = { 0 };
    int i, x, pos, n = 10;
    for (i = 0; i < n; i++) arr[i] = i + 1; // initial array: 1 2 ... 10
    x = 50;  // element to be inserted
    pos = 5; // position (1-based) at which to insert
    for (i = n; i >= pos; i--) arr[i] = arr[i - 1]; // shift elements right from pos
    arr[pos - 1] = x; // insert x at pos
    n++;
    for (i = 0; i < n; i++) printf("%d ", arr[i]); // 1 2 3 4 50 5 6 7 8 9 10
    return 0;
}
DELETION
Deletion in an array means removing an element and filling its place by shifting the elements
at the following indexes. It involves three cases:
(a) Deletion of an element at the beginning of the array:
In this case we have to move all the elements one position forward to fill the position of the
element at the beginning of the array. The deletion itself is not difficult, but it involves the
movement of all the existing elements except the one being deleted. This is the worst-case
scenario for deletion in a linear array.
In the example, array elements from index 1 to index 8 have to be moved one position forward
so that the first element is replaced by the second, the second by the third, and so on.
#include <stdio.h>
#define MAX_SIZE 100
int main()
{
    // sample data assumed for illustration
    int arr[MAX_SIZE] = { 10, 20, 30, 40, 50 };
    int i, size = 5, pos = 1; // delete the element at the beginning (position 1)
    for (i = pos - 1; i < size - 1; i++)
        arr[i] = arr[i + 1]; // shift each following element one position forward
    size--;
    for (i = 0; i < size; i++)
        printf("%d ", arr[i]); // prints: 20 30 40 50
    return 0;
}