Introduction to Data Structures and Algorithms
A data structure is a way of collecting and organising data so that we can
perform operations on the data in an effective way. Data structures are about
arranging data elements in terms of some relationship, for better organization and
storage. For example, suppose we have some data: a player's name "Virat"
and age 26. Here "Virat" is of the String data type and 26 is of the integer data type.
We can organize this data as a record, say a Player record, which will have both the
player's name and age in it. Now we can collect and store Player records in a file or
database as a data structure. For example: "Dhoni" 30, "Gambhir" 31, "Sehwag" 33.
If you are aware of Object Oriented Programming concepts, then a class does a
similar thing: it collects different types of data under one single entity. The
difference is that data structures also provide techniques for accessing and
manipulating data efficiently.
Data structures
A data structure is a specialized format for organizing, processing, retrieving and
storing data. There are several basic and advanced types of data structures, all
designed to arrange data to suit a specific purpose. Data structures make it easy for
users to access and work with the data they need in appropriate ways. Most
importantly, data structures frame the organization of information so that machines
and humans can better understand it.
For instance, in an object-oriented programming language, the data structure and its
associated methods are bound together as part of a class definition. In non-object-
oriented languages, there may be functions defined to work with the data structure,
but they are not technically part of the data structure.
Data structures are the building blocks for more sophisticated applications. They are
designed by composing data elements into a logical unit representing an abstract
data type that has relevance to the algorithm or application. An example of an
abstract data type is a "customer name" that is composed of the character strings
for "first name," "middle name" and "last name."
It is not only important to use data structures, but it is also important to choose the
proper data structure for each task. Choosing an ill-suited data structure could result
in slow runtimes or unresponsive code. Several factors should be considered when
picking a data structure for each task.
Software engineers use algorithms that are tightly coupled with data structures
-- such as lists, queues and mappings from one set of values to another. This
approach can be used in a variety of applications, including managing collections of
records in a relational database and creating an index of those records using a data
structure called a binary tree.
Some examples of how data structures are used include the following:
Storing data. Data structures are used for efficient data persistence, such as
specifying the collection of attributes and corresponding structures used to store
records in a database management system.
Managing resources and services. Core operating system (OS) resources and
services are enabled through the use of data structures such as linked lists for
memory allocation, file directory management and file structure trees, as well as
process scheduling queues.
Ordering and sorting. Data structures such as binary search trees -- also known
as an ordered or sorted binary tree -- provide efficient methods of sorting objects,
such as character strings used as tags. With data structures such as priority
queues, programmers can manage items organized according to a specific
priority.
Indexing. Even more sophisticated data structures such as B-trees are used to
index objects, such as those stored in a database.
Searching. Indexes created using binary search trees, B-trees or hash tables
speed the ability to find a specific sought-after item.
Scalability. Big data applications use data structures for allocating and
managing data storage across distributed storage locations, ensuring scalability
and performance. Certain big data programming environments -- such as Apache
Spark -- provide data structures that mirror the underlying structure of database
records to simplify querying.
Characteristics of data structures
Data structures are often classified by their characteristics. The following
characteristics are examples:
1. Linear or non-linear. This characteristic describes whether the data items are
arranged in sequential order, such as with an array, or in an unordered sequence,
such as with a graph.
2. Static or dynamic. This characteristic describes how the data structures are
compiled. Static data structures have fixed sizes, structures and memory
locations at compile time. Dynamic data structures have sizes, structures and
memory locations that can shrink or expand, depending on the use.
Data types
If data structures are the building blocks of algorithms and computer programs, the
primitive -- or base -- data types are the building blocks of data structures. The
typical base data types include Booleans (which store logical values that are either
true or false), integers, floating-point numbers, characters and strings, and
pointers or references.
Commonly used data structures built from these base types include the following:
Stack. A stack stores a collection of items in the linear order that operations are
applied, following a last in, first out (LIFO) order: the item added most recently
is removed first.
Queue. A queue stores a collection of items like a stack; however, the operation
order can only be first in, first out.
Linked list. A linked list stores a collection of items in a linear order. Each
element, or node, in a linked list contains a data item, as well as a reference, or
link, to the next item in the list.
Trie. A trie, also known as a keyword tree, is a data structure that stores strings
as data items that can be organized in a visual graph.
Hash table. A hash table -- also known as a hash map -- stores a collection of
items in an associative array that maps keys to values. A hash table uses a hash
function to convert a key into an index into an array of buckets that contain the
desired data item.
Hashing is a data structure technique where key values are converted into indexes
of an array where the data is stored.
Data structures such as trees and graphs are considered complex data structures,
as they can store large amounts of interconnected data.
When choosing a data structure, factors such as the following should be considered:
1. Supported operations. What functions and operations does the program need?
2. Programming elegance. Are the organization of the data structure and its
functional interface easy to use?
Linked lists are best if a program is managing a collection of items that don't
need to be ordered, constant time is required for adding or removing an item
from the collection and increased search time is OK.
Stacks are best if the program is managing a collection that needs to support a
LIFO order.
Queues should be used if the program is managing a collection that needs
to support a FIFO order.
Binary trees are good for managing a collection of items with a parent-child
relationship, such as a family tree.
Binary search trees are appropriate for managing a sorted collection where the
goal is to optimize the time it takes to find specific items in the collection.
Graphs work best if the application will analyze connectivity and relationships
among a collection of individuals in a social media network.
What is an Algorithm?
An algorithm is a finite set of instructions or logic, written in order, to accomplish a
certain predefined task. An algorithm is not the complete code or program; it is just
the core logic (solution) of a problem, which can be expressed either as an informal
high-level description called pseudocode or using a flowchart.
Definiteness: Every step of the algorithm should be clear and well defined.
An algorithm is said to be efficient and fast if it takes less time to execute and
consumes less memory space. The performance of an algorithm is measured on the
basis of the following properties:
1. Time Complexity
2. Space Complexity
Space Complexity
It is the amount of memory space required by the algorithm during the
course of its execution. Space complexity must be taken seriously for
multi-user systems and in situations where limited memory is available.
An algorithm generally requires space for the following components:
instruction space (to store the compiled program), data space (to store
constants and variables) and environment space (to store runtime stack
information for function calls).
Time Complexity
It is a way to represent the amount of time required by the algorithm to
run to completion, usually expressed as a function of the input size.
NOTE: Before going deep into data structures, you should have a good
knowledge of programming in C, C++, Java, Python or a similar language.
Implementing Binary Search Algorithm
/*
    Binary search: repeatedly halve the search range
    until the target is found or the range becomes empty.
*/
#include <stdio.h>

int binarySearch(int values[], int len, int target)
{
    int max = (len - 1);
    int min = 0;
    int guess;      // mid point of the current search range
    int step = 0;   // to find out in how many steps we completed the search
    while (min <= max)
    {
        guess = (min + max) / 2;
        step++;
        if (values[guess] == target)
        {
            printf("Number of steps: %d\n", step);
            return guess;
        }
        else if (values[guess] > target)
            max = guess - 1;    // target lies in the lower half
        else
            min = guess + 1;    // target lies in the upper half
    }
    // not present in array
    return -1;
}

int main(void)
{
    int values[] = {13, 21, 54, 81, 90};
    int result = binarySearch(values, 5, 81);
    if (result == -1)
        printf("Element is not present in the given array.");
    else
        printf("Element found at index %d.", result);
    return 0;
}
For an array of n elements and jump size m, the worst case makes about
n/m block jumps plus m - 1 linear steps, i.e. f(m) = n/m + m - 1.
Minimizing this with respect to m gives f'(m) = -n/m² + 1 = 0, that is:
n/m² = 1
m = √n
Hence, the optimal jump size is √n, where n is the size of the array to
be searched or the total number of elements to be searched.
Step 1: Set i = 0 and m = √n.
Step 2: Compare A[i] with item. If A[i] != item and A[i] < item, then jump to
the next block. Also, do the following:
1. Set i = m
2. Increment m by √n
Step 3: Repeat Step 2 while m < n-1
Step 4: If A[i] > item, then move to the beginning of the current block and
perform a linear search.
1. Set x = i
2. Compare A[x] with item. If A[x]== item, then print x as the valid
location else set x++
3. Repeat Step 4.1 and 4.2 till x < m
Step 5: Exit
A[] = {0, 1, 1, 2, 3, 5, 8, 13, 21, 55, 77, 89, 101, 201, 256, 780}
item = 77
Step 2: Compare A[0] with item. Since A[0] != item and A[0]<item,
skip to the next block
Step 3: Compare A[3] with item. Since A[3] != item and A[3] < item, skip
to the next block
Step 4: Compare A[6] with item. Since A[6] != item and A[6] < item, skip
to the next block
Step 5: Compare A[9] with item. Since A[9] != item and A[9] < item, skip
to the next block
Step 6: Compare A[12] with item. Since A[12] != item and A[12] > item,
jump back to A[9] (beginning of the current block) and perform a linear search.
Compare A[9] with item. Since A[9] != item, scan the next element
Compare A[10] with item. Since A[10] == item, index 10 is printed
as the valid location and the algorithm will terminate
#include<iostream>
#include<cmath>
using namespace std;

// returns the index of item in a[0..n-1], or -1 if not found
int jumpSearch(int a[], int n, int item) {
    int m = sqrt(n);                 // optimal jump size
    int i = 0;
    while (i < n && a[i] < item)     // jump ahead block by block
        i += m;
    // linear search within the current block
    int x = (i - m > 0) ? i - m : 0;
    for (; x <= i && x < n; x++)
        if (a[x] == item)
            return x;
    return -1;
}

int main() {
    int a[] = {0, 1, 1, 2, 3, 5, 8, 13, 21, 55, 77, 89, 101, 201, 256, 780};
    int n = 16, item;
    cout << "\n Enter search key to be found in the array: ";
    cin >> item;
    int loc = jumpSearch(a, n, item);
    if (loc >= 0)
        cout << "\n Element found at index " << loc;
    else
        cout << "\n Element is not present in the given array.";
    return 0;
}
The input array is the same as that used in the example:
Note: The algorithm can be implemented in any programming language
as per the requirement.
Let's see what will be the time and space complexity for the Jump search
algorithm:
Time Complexity:
The while loop in the above C++ code executes n/m times because the
loop counter increments by m in every iteration. Since the optimal
value of m = √n, we get n/m = √n, resulting in a time complexity of O(√n).
Space Complexity:
The algorithm uses only a constant amount of extra space, so the space
complexity is O(1).
Introduction to Sorting
Sorting is nothing but arranging the data in ascending or descending
order. The term sorting came into the picture as humans realised the
importance of searching quickly.
There are so many things in our real life that we need to search for, like a
particular record in database, roll numbers in merit list, a particular
telephone number in telephone directory, a particular page in a book etc.
All this would have been a mess if the data was kept unordered and
unsorted, but fortunately the concept of sorting came into existence,
making it easier for everyone to arrange data in an order, hence making
it easier to search.
Sorting Efficiency
If you ask me, how will I arrange a deck of shuffled cards in order, I would
say, I will start by checking every card, and making the deck as I move
on.
It can take me hours to arrange the deck in order, but that's how I will do
it.
Well, thank god, computers don't work like this.
The two main criteria used to judge which algorithm is better than
another are:
1. The time taken to sort the given data.
2. The memory space required to do so.
Following are some of the commonly used sorting algorithms:
1. Bubble Sort
2. Insertion Sort
3. Selection Sort
4. Quick Sort
5. Merge Sort
6. Heap Sort
Although these sorting techniques are easy to understand, we still
suggest that you first learn about space complexity, time complexity and
the searching algorithms, to warm up your brain for the sorting algorithms.
If I give you another card and ask you to insert it at just the right position, so that
the cards in your hand are still sorted, what will you do?
Well, you will have to go through each card from the start or the back and find the right
position for the new card, comparing its value with each card. Once you find the right
position, you will insert the card there.
Similarly, if more new cards are provided to you, you can easily repeat the same process
and insert the new cards and keep the cards sorted too.
This is exactly how insertion sort works. It starts from index 1 (not 0), and each index
starting from index 1 is like a new card that you have to place at the right position in the
sorted subarray on the left.
1. It is efficient for smaller data sets, but very inefficient for larger lists.
2. Insertion sort is adaptive, which means it reduces its total number of steps if a partially
sorted array is provided as input, making it more efficient.
3. Its space complexity is low. Like bubble sort, insertion sort requires only a single
additional memory space.
4. It is a stable sorting technique, as it does not change the relative order of elements
which are equal.
1. We take the second element of the given array as the key.
2. We compare the key element with the element(s) before it, in this case, the element at
index 0:
o If the key element is less than the first element, we insert the key element
before the first element.
o If the key element is greater than the first element, then we insert it after the
first element.
3. Then, we make the third element of the array the key and compare it with the
elements to its left, inserting it at the right position.
Below, we have a pictorial representation of how insertion sort will sort the given array.
As you can see in the diagram above, after picking a key, we start iterating over the
elements to the left of the key.
We continue to move towards left if the elements are greater than the key element and stop
when we find the element which is less than the key element.
And, insert the key element after the element which is less than the key element.
#include <stdlib.h>
#include <iostream>

using namespace std;

void printArray(int arr[], int length)
{
    int j;
    cout << "Sorted Array: ";
    for (j = 0; j < length; j++)
        cout << arr[j] << " ";
}

void insertionSort(int arr[], int length)
{
    int i, j, key;
    for (i = 1; i < length; i++)
    {
        j = i;
        // swap backwards while the previous element is greater than the current one
        while (j > 0 && arr[j - 1] > arr[j])
        {
            key = arr[j];
            arr[j] = arr[j - 1];
            arr[j - 1] = key;
            j--;
        }
    }
    printArray(arr, length);
}

// main function
int main()
{
    int array[6] = {5, 1, 6, 2, 4, 3};
    insertionSort(array, 6);
    return 0;
}
Sorted Array: 1 2 3 4 5 6
Now let's try to understand the above simple insertion sort algorithm.
We took an array with 6 integers. We take a variable key, in which we put each element of
the array during each pass, starting from the second element, that is a[1].
Then, using the while loop, we keep swapping the key backwards until j becomes equal to
zero or we encounter an element which is smaller than or equal to the key, and then we
stop. The current key is now at the right position.
We then make the next element the key and repeat the same process.
In the above array, first we pick 1 as key, we compare it with 5(element before 1), 1 is
smaller than 5, we insert 1 before 5. Then we pick 6 as key, and compare it with 5 and 1, no
shifting in position this time. Then 2 becomes the key and is compared with 6 and 5, and
then 2 is inserted after 1. And this goes on until the complete array gets sorted.
Following are the steps involved in selection sort (for sorting a given array
in ascending order):
In the first pass, the smallest element will be 1, so it will be placed at the
first position.
Then leaving the first element, next smallest element will be searched,
from the remaining elements. We will get 3 as the smallest, so it will be
then placed at the second position.
Then leaving 1 and 3(because they are at the correct position), we will
search for the next smallest element from the rest of the elements and
put it at third position and keep doing this until array is sorted.
In selection sort, in the first step, we look for the smallest element in the
array and swap it with the element at the first position. This seems
doable, doesn't it?
Consider that you have an array with following values {3, 6, 1, 8, 4, 5}.
Now as per selection sort, we will start from the first element and look for
the smallest number in the array, which is 1 and we will find it at
the index 2. Once the smallest number is found, it is swapped with the
element at the first position.
Well, in the next iteration, we will have to look for the second smallest
number in the array. How can we find the second smallest number? This
one is tricky.
So, we will now look for the smallest element in the subarray, starting
from index 1, to the last index.
After we have found the second smallest element and swapped it with the
element at index 1 (which is the second position in the array), we will
have the first two positions of the array sorted.
Then we will work on the subarray, starting from index 2 now, and again
looking for the smallest element in this subarray.
Implementing Selection Sort Algorithm
In the C program below, we have tried to divide the program into small
functions, so that it's easier for you to understand which part is doing
what.
#include <stdio.h>

void swap(int arr[], int firstIndex, int secondIndex)
{
    int temp;
    temp = arr[firstIndex];
    arr[firstIndex] = arr[secondIndex];
    arr[secondIndex] = temp;
}

// find the index of the minimum value in arr[startIndex..n-1]
int indexOfMinimum(int arr[], int startIndex, int n)
{
    int minValue = arr[startIndex];
    int minIndex = startIndex;
    for (int i = startIndex + 1; i < n; i++)
    {
        if (arr[i] < minValue)
        {
            minIndex = i;
            minValue = arr[i];
        }
    }
    return minIndex;
}

void selectionSort(int arr[], int n)
{
    for (int i = 0; i < n; i++)
    {
        int index = indexOfMinimum(arr, i, n);
        swap(arr, i, index);
    }
}

void printArray(int arr[], int n)
{
    int i;
    for (i = 0; i < n; i++)
        printf("%d ", arr[i]);
    printf("\n");
}

int main()
{
    int arr[] = {3, 6, 1, 8, 4, 5};
    int n = sizeof(arr)/sizeof(arr[0]);
    selectionSort(arr, n);
    printArray(arr, n);
    return 0;
}
Note: Selection sort is an unstable sort, i.e. it might change the relative
order of two equal elements in the list while sorting. But it can also
work as a stable sort when it is implemented using a linked list.
For example: In the array {52, 37, 63, 14, 17, 8, 6, 25}, we
take 25 as pivot. So after the first pass, the list will be changed like this.
{6 8 17 14 25 63 37 52}
Hence after the first pass, pivot will be set at its position, with all the
elements smaller to it on its left and all the elements larger than to its
right. Now {6 8 17 14} and {63 37 52} are considered as two separate
subarrays, and the same recursive logic will be applied to them, and we will
keep doing this until the complete array is sorted.
Below, we have a pictorial representation of how quick sort will sort the
given array.
In step 1, we select the last element as the pivot, which is 6 in this case,
and call for partitioning, hence re-arranging the array in such a way
that 6 will be placed in its final position and to its left will be all the
elements less than it and to its right, we will have all the elements
greater than it.
Then we pick the subarray on the left and the subarray on the right and
select a pivot for them, in the above diagram, we chose 3 as pivot for
the left subarray and 11 as pivot for the right subarray.
#include <stdio.h>

int partition(int a[], int beg, int end)
{
    int left, right, temp, loc, flag;
    loc = left = beg;
    right = end;
    flag = 0;
    while(flag != 1)
    {
        // scan from the right for an element smaller than the pivot a[loc]
        while((a[loc] <= a[right]) && (loc != right))
            right--;
        if(loc==right)
            flag =1;
        else if(a[loc]>a[right])
        {
            temp = a[loc];
            a[loc] = a[right];
            a[right] = temp;
            loc = right;
        }
        if(flag!=1)
        {
            // scan from the left for an element greater than the pivot a[loc]
            while((a[loc] >= a[left]) && (loc != left))
                left++;
            if(loc==left)
                flag =1;
            else if(a[loc] < a[left])
            {
                temp = a[loc];
                a[loc] = a[left];
                a[left] = temp;
                loc = left;
            }
        }
    }
    return loc;     // final position of the pivot
}

void quickSort(int a[], int beg, int end)
{
    int loc;
    if(beg<end)
    {
        loc = partition(a, beg, end);
        quickSort(a, beg, loc - 1);
        quickSort(a, loc + 1, end);
    }
}

int main()
{
    int i;
    int arr[10]={90,23,101,45,65,28,67,89,34,29};
    quickSort(arr, 0, 9);
    for(i=0;i<10;i++)
        printf("%d ", arr[i]);
    return 0;
}
Before moving forward with Merge Sort, check these topics out first:
Selection Sort
Insertion Sort
Space Complexity of Algorithms
Time Complexity of Algorithms
In the last two tutorials, we learned about Selection Sort and Insertion
Sort, both of which have a worst-case running time of O(n²). As the size of
input grows, insertion and selection sort can take a long time to run.
Merge sort, on the other hand, runs in O(n*log n) time in all the cases.
Before jumping on to how merge sort works and its implementation, let's
first understand the rule of Divide and Conquer.
If we can break a single big problem into smaller sub-problems, solve the
smaller sub-problems and combine their solutions to find the solution for
the original big problem, it becomes easier to solve the whole problem.
That was originally a socio-political policy (Divide and Rule), but the
idea here is similar: if we can somehow divide a problem into smaller sub-
problems, it becomes easier to eventually solve the whole problem.
In merge sort, we break the given array midway; for example, if the
original array had 6 elements, then merge sort will break it down into two
subarrays with 3 elements each.
But breaking the original array into 2 smaller subarrays alone does not
sort the array. So we keep breaking the subarrays into even smaller
subarrays, until we are left with single-element subarrays, which are
sorted by definition.
And then we have to merge all these sorted subarrays, step by step, to
form one single sorted array.
Below, we have a pictorial representation of how merge sort will sort the
given array.
In merge sort we follow the following steps:
1. We take a variable p and store the starting index of our array in this.
And we take another variable r and store the last index of array in it.
2. Then we find the middle of the array using the formula (p + r)/2 and
mark the middle index as q, and break the array into two subarrays,
from p to q and from q + 1 to r index.
3. Then we divide these 2 subarrays again, just like we divided our
main array and this continues.
4. Once we have divided the main array into subarrays with single
elements, then we start merging the subarrays.
/*
    a[] is the array, p is the starting index and r is the ending index
*/
#include <stdio.h>

void merge(int a[], int p, int q, int r)
{
    int b[r - p + 1];   // temporary array for the merged elements
    int i, j, k;
    k = 0;
    i = p;
    j = q + 1;
    while(i <= q && j <= r)
    {
        if(a[i] < a[j])
            b[k++] = a[i++];    // pick from the left subarray
        else
            b[k++] = a[j++];    // pick from the right subarray
    }
    while(i <= q)
        b[k++] = a[i++];
    while(j <= r)
        b[k++] = a[j++];
    for(i=r; i >= p; i--)
        a[i] = b[--k];          // copy the merged elements back into a[]
}

void mergeSort(int a[], int p, int r)
{
    int q;
    if(p < r)
    {
        q = (p + r) / 2;
        mergeSort(a, p, q);
        mergeSort(a, q + 1, r);
        merge(a, p, q, r);
    }
}

void printArray(int arr[], int len)
{
    int i;
    for(i = 0; i < len; i++)
        printf("%d ", arr[i]);
    printf("\n");
}

int main()
{
    int arr[] = {32, 45, 67, 2, 7};
    int len = sizeof(arr)/sizeof(arr[0]);
    printf("Given array: \n");
    printArray(arr, len);
    mergeSort(arr, 0, len - 1);
    printf("Sorted array: \n");
    printArray(arr, len);
    return 0;
}
Given array:
32 45 67 2 7
Sorted array:
2 7 32 45 67
NOTE: If you are not familiar with sorting in data structures, you should
first learn what sorting is to know about the basics of sorting.
What is a Heap?
A heap is a complete binary tree that satisfies the heap property: in a max
heap, every parent node is greater than or equal to its children, so the
largest element sits at the root. Heap sort builds a max heap from the
input and then repeatedly moves the root to the end of the array.
#include <iostream>
#include <utility>
using namespace std;

// heapify the subtree rooted at index i; n is the size of the heap
void heapify(int arr[], int n, int i)
{
    int largest = i;
    int l = 2*i + 1;    // left child
    int r = 2*i + 2;    // right child
    if (l < n && arr[l] > arr[largest])
        largest = l;
    if (r < n && arr[r] > arr[largest])
        largest = r;
    if (largest != i)
    {
        swap(arr[i], arr[largest]);
        heapify(arr, n, largest);
    }
}

void heapSort(int arr[], int n)
{
    // build a max heap
    for (int i = n/2 - 1; i >= 0; i--)
        heapify(arr, n, i);
    // extract elements from the heap one by one
    for (int i = n-1; i > 0; i--)
    {
        // move current root to end
        swap(arr[0], arr[i]);
        heapify(arr, i, 0);
    }
}

void printArray(int arr[], int n)
{
    for (int i = 0; i < n; i++)
        cout << arr[i] << " ";
    cout << "\n";
}

int main()
{
    int arr[] = {121, 10, 130, 57, 36, 17};
    int n = sizeof(arr)/sizeof(arr[0]);
    heapSort(arr, n);
    printArray(arr, n);
    return 0;
}
Step 3: Update the count array so that the element at each index, say i, is
equal to the sum of its own value and the value at the previous index, i.e.
count[i] = count[i] + count[i-1]. Each index now holds the position of the
last occurrence of element i in the sorted sequence.
Step 4: The updated count array gives the index of each element of
array A in the sorted sequence. Assume that the sorted sequence is
stored in an output array, say B, of size n.
Consider the following input array A to be sorted. All the elements are in
range 0 to 9
A[]= {1, 3, 2, 8, 5, 1, 5, 1, 2, 7}
Step 1: Initialize an auxiliary array, say count and store the frequency of
every distinct element. Size of count is 10 (k+1, such that range of
elements in A is 0 to k)
For i=3, t=8 and count[8] = 10, so 8 is placed at position 10 (the last
position) of the sorted sequence B. After placing it, count[8] becomes 9
and we move on to i=4.
#include<iostream>
using namespace std;

// sorts A[0..n-1] into B[1..n]; k is the largest element in A
void sort_func(int A[], int B[], int n, int k)
{
    int count[k+1],t;
    for(int i=0;i<=k;i++)
        count[i] = 0;
    for(int i=0;i<n;i++)
    {
        // storing the frequency of each element
        t = A[i];
        count[t]++;
    }
    for(int i=1;i<=k;i++)
    {
        // Updating elements of count array
        count[i] = count[i]+count[i-1];
    }
    for(int i=0;i<n;i++)
    {
        // placing each element at its position in the output array
        t = A[i];
        B[count[t]] = t;
        count[t]=count[t]-1;
    }
}

int main()
{
    int n, k = 0;
    cout<<"Enter the number of elements: ";
    cin>>n;
    // A is the input array and will store elements entered by the user
    // B has size n+1 because positions in it are 1-based
    int A[n],B[n+1];
    cout<<"Enter the array elements: ";
    for(int i=0;i<n;i++)
    {
        cin>>A[i];
        if(A[i]>k)
            k = A[i];   // k tracks the largest element, so the range is 0 to k
    }
    sort_func(A,B,n,k);
    for(int i=1;i<=n;i++)
        cout<<B[i]<<" ";
    cout<<"\n";
    return 0;
}
The input array is the same as that used in the example:
For scanning the input array elements, the loop iterates n times, thus
taking O(n) running time. The sorted array B[] also gets computed in n
iterations, thus requiring O(n) running time. The count array also uses k
iterations, thus has a running time of O(k). Thus the total running time for
counting sort algorithm is O(n+k).
Key Points:
It is quite fast
It is a stable algorithm
Note: For a sorting algorithm to be stable, the order of elements with
equal keys (values) in the sorted array should be the same as that of the
input array.