CS33201 Data Structure
Data Structures
Data is a collection of facts and figures: a set of values, or values in a specific format, that refers to a single set of item values. A data item that can be divided into sub-items is called a group item, while a data item that cannot be divided into a simpler form is called an elementary item.
Let us consider an example where an employee name can be broken down into three sub-
items: First, Middle, and Last. However, an ID assigned to an employee will generally be
considered a single item.
Data Structure is a branch of Computer Science. The study of data structure allows us to
understand the organization of data and the management of the data flow in order to increase
the efficiency of any process or program. Data Structure is a particular way of storing and
organizing data in the memory of the computer so that these data can easily be retrieved and
efficiently utilized in the future when required. The data can be managed in various ways; a logical or mathematical model chosen for a specific organization of data is known as a data
structure.
Some examples of Data Structures are Arrays, Linked Lists, Stack, Queue, Trees, etc. Data
Structures are widely used in almost every aspect of Computer Science, i.e., Compiler
Design, Operating Systems, Graphics, Artificial Intelligence, and many more.
Data Structures are the main part of many Computer Science Algorithms as they allow the
programmers to manage the data in an effective way. It plays a crucial role in improving the
performance of a program or software, as the main objective of the software is to store and
retrieve the user's data as fast as possible.
The term "information" is sometimes used for data that has been given meaningful attributes, i.e., processed data.
As applications become more complex and the amount of data grows every day, problems can arise with data searching, processing speed, handling of multiple requests, and more. Data Structures support different methods to organize, manage,
and store data efficiently. With the help of Data Structures, we can easily traverse the data
items. Data Structures provide Efficiency, Reusability, and Abstraction.
1. Data Structures and Algorithms are two of the key aspects of Computer Science.
2. Data Structures allow us to organize and store data, whereas Algorithms allow us to
process that data meaningfully.
3. Learning Data Structures and Algorithms will help us become better Programmers.
4. We will be able to write code that is more effective and reliable.
5. We will also be able to solve problems more quickly and efficiently.
1. Correctness: Data Structures are designed to operate correctly for all kinds of inputs
based on the domain of interest. In other words, correctness forms the primary
objective of Data Structure, which always depends upon the problems that the Data
Structure is meant to solve.
2. Efficiency: Data Structures also need to be efficient. They should process the data quickly without consuming much of the computer's resources, such as memory space. In a real-time setting, the efficiency of a data structure is a key factor in determining the success or failure of a process.
A Data Structure delivers a structured set of variables related to each other in various ways. It
forms the basis of a programming tool that signifies the relationship between the data
elements and allows programmers to process the data efficiently.
1. Primitive Data Structures are the data structures consisting of the numbers and the
characters that come in-built into programs.
2. These data structures can be manipulated or operated directly by machine-level
instructions.
3. Basic data types like Integer, Float, Character, and Boolean come under the
Primitive Data Structures.
4. These data types are also called Simple data types, as they contain values that cannot be divided further.
1. Non-Primitive Data Structures are those data structures derived from Primitive Data
Structures.
2. These data structures can't be manipulated or operated directly by machine-level
instructions.
3. The focus of these data structures is on forming a set of data elements that is
either homogeneous (same data type) or heterogeneous (different data types).
4. Based on the structure and arrangement of data, we can divide these data structures
into two sub-categories -
A data structure that preserves a linear connection among its data elements is known as a Linear Data Structure. The arrangement of the data is done linearly, where each element has a successor and a predecessor, except for the first and the last data elements.
However, it is not necessarily true in the case of memory, as the arrangement may not be
sequential.
Based on memory allocation, the Linear Data Structures are further classified into two types:
1. Static Data Structures: The data structures having a fixed size are known as Static Data Structures. The memory for these data structures is allocated at compile time, and their size cannot be changed by the user after compilation; however, the data stored in them can be altered.
The Array is the best example of a Static Data Structure: it has a fixed size, and its data can be modified later.
2. Dynamic Data Structures: The data structures having a dynamic size are known as
Dynamic Data Structures. The memory of these data structures is allocated at the run
time, and their size varies during the run time of the code. Moreover, the user can
change the size as well as the data elements stored in these data structures at the run
time of the code.
Linked Lists, Stacks, and Queues are common examples of dynamic data structures.
1. Arrays
An Array is a data structure used to collect multiple data elements of the same data type into
one variable. Instead of storing multiple values of the same data types in separate variable
names, we could store all of them together into one variable. This statement doesn't imply
that we will have to unite all the values of the same data type in any program into one array
of that data type. But there will often be times when some specific variables of the same data
types are all related to one another in a way appropriate for an array.
An Array is a list of elements where each element has a unique place in the list. The data
elements of the array share the same variable name; however, each carries a different index
number called a subscript. We can access any data element from the list with the help of its
location in the list. Thus, the key feature of the arrays to understand is that the data is stored
in contiguous memory locations, making it possible for the users to traverse through the data
elements of the array using their respective indexes.
Figure 3. An Array
a. One-Dimensional Array: An Array with only one row of data elements is known as a One-Dimensional Array. Its elements are stored in consecutive, ascending storage locations.
b. Two-Dimensional Array: An Array consisting of multiple rows and columns of data
elements is called a Two-Dimensional Array. It is also known as a Matrix.
c. Multidimensional Array: We can define a Multidimensional Array as an Array of Arrays. Multidimensional Arrays are not bound to two indices or two dimensions; they can include as many indices as needed.
Applications of Arrays:
a. We can store a list of data elements belonging to the same data type.
b. Array acts as an auxiliary storage for other data structures.
c. The array also helps store the data elements of a binary tree with a fixed node count.
d. Array also acts as a storage of matrices.
In data structures, an array plays a vital role: it arranges elements of the same data type in a contiguous manner. Based on the memory allocation concept, an array can be defined in two ways.
Types of Arrays:
There are basically two types of arrays:
Static Array: In this type of array, memory is allocated at compile time and has a fixed size. We cannot alter or update the size of this array.
Dynamic Array: In this type of array, memory is allocated at run time, and the size is not fixed. If a user wants to declare an array of an arbitrary size, a static array will not do; a dynamic array is used instead, because its size can be specified during the run time of the program.
Example:
Let us take an example, int a[5] creates an array of size 5 which means that we can insert
only 5 elements; we will not be able to add 6th element because the size of the array is
fixed above.
int a[5] = {1, 2, 3, 4, 5}; //Static Integer Array
int *a = new int[5]; //Dynamic Integer Array
The code mentioned above demonstrates the declaration and initialization of both a static
integer array and a dynamic integer array. Let’s break it down line by line:
Example of Static Array: int a[5] = {1, 2, 3, 4, 5};
This declares a static integer array named a and initializes it with the values 1, 2, 3, 4, and 5.
The size of the array is fixed at 5, matching the number of values provided in the initialization list.
The array a is allocated on the stack, and its size cannot be changed once defined.
Example of Dynamic Array: int *a = new int[5];
In this code, we declare a pointer to an integer named a and dynamically allocate memory for an integer array of size 5.
The new keyword here performs dynamic memory allocation, and int[5] specifies the size of the dynamic array.
The new operator returns the address of the dynamically allocated memory, which is then stored in the pointer a.
This array a is allocated on the Heap; its memory can be released and a block of a different size allocated later if needed.
The differences between static and dynamic arrays based on this code snippet are as follows:
Static Integer Array:
The size is determined automatically based on the number of values provided during
initialization (in this case, 5).
The memory is allocated on the stack.
The size of the array is fixed once it is defined.
Dynamic Integer Array:
The memory is allocated during the run time by using the new keyword.
The size is specified explicitly (in this case, 5).
The memory is allocated on the heap (not the stack).
We can change the size of the array later by using delete[] to deallocate the memory and then allocating a new block with a different size if desired.
Pictorial Representation of Static and Dynamic Arrays:
Static Array:
1. The memory allocation occurs during compile time.
2. The array size is fixed and cannot be changed.
3. The location is in Stack Memory Space.
Dynamic Array:
1. The memory allocation occurs during run time.
2. The array size is not fixed and can be changed.
3. The location is in Heap Memory Space.
Stack:
o Stack is one of the important linear data structures, based on the LIFO (Last In First Out) principle. Many computer applications, and various strategies used in operating systems and elsewhere, are based on the LIFO principle itself. Under this principle, the data element entered last must be popped out first, and the element pushed onto the stack at the very beginning is popped out last. In this approach, we push data elements onto the stack until it reaches its limit; after that, we pop out the corresponding values.
Queue:
o Queue is a linear data structure based on the FIFO (First In First Out) principle: the element inserted first is the first one to be removed. Insertions take place at one end of the queue (the rear), while deletions take place at the other end (the front).
Linked list:
o The linked list is another major data structure used in various programs, and even
many non-linear data structures are implemented using this linked list data structure.
As the name suggests, it consists of nodes linked together, each node holding the address of the next node. It belongs to the linear data structures because it forms a link-like structure: one data node is connected sequentially to the next by carrying that node's address.
o Unlike an array, this data is not arranged in sequential, contiguous locations. In an array, homogeneous data elements are placed at contiguous memory locations, which makes retrieving the data elements simpler.
o A non-linear data structure is another important type in which data elements are not
arranged sequentially; mainly, data elements are arranged in random order without
forming a linear structure.
o Data elements are present at the multilevel, for example, tree.
o In trees, the data elements are arranged in the hierarchical form, whereas in graphs,
the data elements are arranged in random order, using the edges and vertex.
o Multiple runs may be required to traverse through all the elements completely; traversing the whole data structure in a single run is not possible.
o Each element can have multiple paths to reach another element.
o The data structure where data items are not organized sequentially is called a non-
linear data structure. In other words, data elements of the non-linear data structure
could be connected to more than one element to reflect a special relationship among
them.
Tree:
o The tree is a non-linear data structure that is comprised of various nodes. The nodes in
the tree data structure are arranged in hierarchical order.
o It consists of a root node connected to its various child nodes, present at the next level. The tree grows level by level, and each node has a limited number of child nodes depending on the order of the tree.
o For example, in a binary tree the order is 2, which means each node can have at most 2 children, and no more.
o The non-linear data structure cannot be implemented directly, and it is implemented
using the linear data structure like an array and linked list.
o The tree itself is a very broad data structure and is divided into various categories
like Binary tree, Binary search tree, AVL trees, Heap, max Heap, min-heap, etc.
o All the types of trees mentioned above differ based on their properties.
Graph
o A graph is a non-linear data structure with a finite number of vertices and edges, and
these edges are used to connect the vertices.
o The graph itself is categorized based on certain properties; a complete graph, for example, consists of a vertex set in which each vertex is connected to every other vertex by an edge.
o The vertices store the data elements, while the edges represent the relationship
between the vertices.
o Graphs are very important in various fields; in computer networks, for example, the network is represented using graph theory and its principles.
o Even in maps, we consider every location a vertex, and the path derived between two locations is considered an edge.
o A main motive of the graph representation is to find the minimum distance between two vertices, i.e., a path of minimum total edge weight.
Properties of Non-linear data structures
o It is used to store the data elements combined whenever they are not present in the
contiguous memory locations.
o It is an efficient way of organizing and properly holding the data.
o It reduces the wastage of memory space by providing sufficient memory to every data
element.
o In an array, we have to define the size of the array, and a corresponding block of memory is allocated to it; if we do not store elements up to the full range of the array, the remaining memory is wasted.
o To overcome this, we use non-linear data structures, which also give us multiple options to traverse from one node to another.
o Data is stored randomly in memory.
o It is comparatively difficult to implement.
o Multiple levels are involved.
o Memory utilization is effective.
In the following section, we will discuss the different types of operations that we can perform
to manipulate data in every data structure:
1. Traversal: Traversing a data structure means accessing each data element exactly once so that it can be processed. For example, traversing is required while printing the
names of all the employees in a department.
2. Search: Search is another data structure operation which means to find the location of
one or more data elements that meet certain constraints. Such a data element may or
may not be present in the given set of data elements. For example, we can use the
search operation to find the names of all the employees who have the experience of
more than 5 years.
3. Insertion: Insertion means inserting or adding new data elements to the collection.
For example, we can use the insertion operation to add the details of a new employee
the company has recently hired.
4. Deletion: Deletion means to remove or delete a specific data element from the given
list of data elements. For example, we can use the deleting operation to delete the
name of an employee who has left the job.
5. Sorting: Sorting means to arrange the data elements in either Ascending or
Descending order depending on the type of application. For example, we can use the
sorting operation to arrange the names of employees in a department in alphabetical
order or estimate the top three performers of the month by arranging the performance
of the employees in descending order and extracting the details of the top three.
6. Merge: Merge means to combine data elements of two sorted lists in order to form a
single list of sorted data elements.
7. Create: Create is an operation used to reserve memory for the data elements of the
program. We can perform this operation using a declaration statement. The creation of
data structure can take place either during the following:
a. Compile-time
b. Run-time
For example, the malloc() function is used in the C language to create a data structure at run time.
8. Selection: Selection means selecting a particular data from the available data. We can
select any particular data by specifying conditions inside the loop.
9. Update: The Update operation allows us to update or modify the data in the data
structure. We can also update any particular data by specifying some conditions inside
the loop, like the Selection operation.
10. Splitting: The Splitting operation allows us to divide data into various subparts, decreasing the overall process completion time.
Definition of Algorithm
An algorithm is a finite sequence of well-defined steps designed to solve a particular problem or perform a specific computation.
Advantages of Algorithms
Easy to understand: Since it is a stepwise representation of a solution to a given
problem, it is easy to understand.
Language Independent: It is not dependent on any programming language, so it can
easily be understood by anyone.
Debug / Error Finding: Every step is independent and follows a clear flow, so it is easy to spot and fix errors.
Sub-Problems: Since it is written as a flow of steps, the programmer can divide the problem into sub-tasks, which makes it easier to code.
Disadvantages of Algorithms
Creating efficient algorithms is time-consuming and requires good logical skills.
Branching and looping are awkward to express in algorithms.
We have discussed Asymptotic Analysis, and Worst, Average, and Best Cases of
Algorithms. The main idea of asymptotic analysis is to have a measure of the efficiency of
algorithms that don’t depend on machine-specific constants and don’t require algorithms to
be implemented and time taken by programs to be compared. Asymptotic notations are
mathematical tools to represent the time complexity of algorithms for asymptotic analysis.
Asymptotic Notations:
Asymptotic Notations are mathematical tools used to analyze the performance of
algorithms by understanding how their efficiency changes as the input size grows.
These notations provide a concise way to express the behavior of an algorithm’s time or
space complexity as the input size approaches infinity.
Rather than comparing algorithms directly, asymptotic analysis focuses on
understanding the relative growth rates of algorithms’ complexities.
It enables comparisons of algorithms’ efficiency by abstracting away machine-specific
constants and implementation details, focusing instead on fundamental trends.
Asymptotic analysis allows for the comparison of algorithms’ space and time
complexities by examining their performance characteristics as the input size varies.
By using asymptotic notations, such as Big O, Big Omega, and Big Theta, we can
categorize algorithms based on their worst-case, best-case, or average-case time or
space complexities, providing valuable insights into their efficiency.
1. Big-O Notation (O-Notation):
If f(n) describes the running time of an algorithm, f(n) is O(g(n)) if there exist positive constants c and n0 such that 0 ≤ f(n) ≤ cg(n) for all n ≥ n0.
It returns the highest possible output value (big-O)for a given input.
The execution time serves as an upper bound on the algorithm’s time complexity.
Mathematical Representation of Big-O Notation:
O(g(n)) = { f(n): there exist positive constants c and n0 such that 0 ≤ f(n) ≤ cg(n) for all n ≥
n0 }
For example, consider the case of Insertion Sort. It takes linear time in the best case and quadratic time in the worst case. We can safely say that the time complexity of Insertion Sort is O(n^2).
Note: O(n^2) also covers linear time.
Examples :
{ 100 , log (2000) , 10^4 } belongs to O(1)
U { (n/4) , (2n+3) , (n/100 + log(n)) } belongs to O(n)
U { (n^2+n) , (2n^2) , (n^2+log(n))} belongs to O( n^2)
Note: Here, U represents union. We can write it in this manner because O provides exact or upper bounds.
2. Omega Notation (Ω-Notation) :
Omega notation represents the lower bound of the running time of an algorithm. Thus, it
provides the best case complexity of an algorithm.
The execution time serves as a lower bound on the algorithm’s time complexity.
It is defined as the condition that allows an algorithm to complete statement execution
in the shortest amount of time.
Let g and f be the function from the set of natural numbers to itself. The function f is said to
be Ω(g), if there is a constant c > 0 and a natural number n0 such that c*g(n) ≤ f(n) for all n
≥ n0
Mathematical Representation of Omega notation :
Ω(g(n)) = { f(n): there exist positive constants c and n0 such that 0 ≤ cg(n) ≤ f(n) for all n ≥
n0 }
Examples :
{ (n^2+n) , (2n^2) , (n^2+log(n))} belongs to Ω( n^2)
U { (n/4) , (2n+3) , (n/100 + log(n)) } belongs to Ω(n)
U { 100 , log (2000) , 10^4 } belongs to Ω(1)
Note: Here, U represents union. We can write it in this manner because Ω provides exact or lower bounds.
3. Theta Notation (Θ-Notation):
Theta notation encloses the function from above and below. Since it represents both the upper and the lower bound of the running time of an algorithm, it is used for analyzing the average-case complexity of an algorithm.
Mathematical Representation of Theta notation:
Θ(g(n)) = { f(n): there exist positive constants c1, c2 and n0 such that 0 ≤ c1g(n) ≤ f(n) ≤ c2g(n) for all n ≥ n0 }
Since C is a structured language, it has some fixed rules for programming. One of them is that the size of an array cannot be changed once it is declared. An array is a collection of items stored at contiguous memory locations.
Consider an array whose length (size) is 9. But what if there is a requirement to change this length (size)? For example,
If there is a situation where only 5 elements are needed to be entered in this array. In
this case, the remaining 4 indices are just wasting memory in this array. So there is a
requirement to lessen the length (size) of the array from 9 to 5.
Take another situation. In this, there is an array of 9 elements with all 9 indices filled.
But there is a need to enter 3 more elements in this array. In this case, 3 indices more
are required. So the length (size) of the array needs to be changed from 9 to 12.
This procedure is referred to as Dynamic Memory Allocation in C.
Therefore, C Dynamic Memory Allocation can be defined as a procedure in which the size
of a data structure (like Array) is changed during the runtime.
C provides some functions to achieve these tasks. There are 4 library functions provided by
C defined under <stdlib.h> header file to facilitate dynamic memory allocation in C
programming. They are:
1. malloc()
2. calloc()
3. free()
4. realloc()
int main()
{
return 0;
}
Output
Enter number of elements:7
Entered number of elements: 7
Memory successfully allocated using malloc.
The elements of the array are: 1, 2, 3, 4, 5, 6, 7,
C calloc() method
The “calloc” or “contiguous allocation” method in C is used to dynamically allocate the specified number of blocks of memory of the specified type. It is very similar to malloc() but differs on two points:
1. It initializes each block with a default value ‘0’.
2. It takes two parameters or arguments, as compared to malloc().
Syntax of calloc() in C
ptr = (cast-type*)calloc(n, element-size);
Here, n is the number of elements and element-size is the size of each element.
For Example:
ptr = (float*) calloc(25, sizeof(float));
This statement allocates contiguous space in memory for 25 elements, each of the size of a float.
If the space is insufficient, the allocation fails and a NULL pointer is returned.
Example of calloc() in C
C
#include <stdio.h>
#include <stdlib.h>
int main()
{
return 0;
}
Output
Enter number of elements: 5
Memory successfully allocated using calloc.
The elements of the array are: 1, 2, 3, 4, 5,
C free() method
The “free” method in C is used to dynamically de-allocate memory. The memory allocated using the malloc() and calloc() functions is not de-allocated on its own; hence, the free() method should be used whenever dynamic memory allocation takes place. It helps reduce the wastage of memory by freeing it.
Syntax of free() in C
free(ptr);
Example of free() in C
C
#include <stdio.h>
#include <stdlib.h>
int main()
{
return 0;
}
Output
Enter number of elements: 5
Memory successfully allocated using malloc.
Malloc Memory successfully freed.
C realloc() method
The “realloc” or “re-allocation” method in C is used to dynamically change the size of a previously allocated block of memory, growing or shrinking it while preserving the existing contents.
Syntax of realloc() in C
ptr = realloc(ptr, newSize);
Example of realloc() in C (fragment; the preceding allocation code is not shown in the source)
int main()
{
    ...
    if (ptr == NULL) {
        printf("Reallocation Failed\n");
        exit(0);
    }
    ...
    free(ptr);
    return 0;
}
Output
Enter number of elements: 5
Memory successfully allocated using calloc.
The elements of the array are: 1, 2, 3, 4, 5,
    if (ans == 1) {
        index++;
        // Dynamically reallocate memory by using realloc
        marks = (int*)realloc(marks, (index + 1) * sizeof(int));

        // Check whether the memory was successfully reallocated
        if (marks == NULL) {
            printf("memory cannot be allocated");
        }
        else {
            printf("Memory has been successfully "
                   "reallocated using realloc:\n");
            // Print the base (beginning) address of the allocated memory
            printf("\nbase address of marks is: %p",
                   (void*)marks);
        }
    }
} while (ans == 1);

// Print the marks of the students
for (i = 0; i <= index; i++) {
    printf("marks of student %d are: %d\n", i,
           marks[i]);
}
free(marks);
return 0;
}