Introduction to Data Structures and Algorithms
Data Structure is a way of collecting and organising data so that we can perform operations on it effectively. Data Structures are about arranging data elements in terms of some relationship, for better organisation and storage. For example, suppose we have some data about a player: the name "Virat" and the age 26. Here "Virat" is of String data type and 26 is of integer data type.
We can organise this data as a record, say a Player record, which will hold both the player's name and age. We can then collect and store Player records in a file or database as a data structure. For example: "Dhoni" 30, "Gambhir" 31, "Sehwag" 33.
If you are aware of Object Oriented Programming concepts, then a class also does the same thing: it collects different types of data under one single entity. The only difference is that data structures also provide techniques to access and manipulate data efficiently.
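To make this concrete, here is a minimal C sketch of such a Player record (the struct layout and the fixed-size name field are illustrative assumptions, not from the original text):

#include <stdio.h>

// a record grouping a player's name and age under one entity
struct Player {
    char name[20];  // illustrative fixed-size buffer for the name
    int age;
};

int main(void)
{
    // the example data from above
    struct Player p = {"Virat", 26};
    printf("%s is %d years old\n", p.name, p.age);
    return 0;
}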
In simple language, Data Structures are structures programmed to store ordered data, so that various operations can be performed on it easily. A data structure represents how data is organised in memory. It should be designed and implemented in such a way that it reduces complexity and increases efficiency.
Some commonly used data structures are:
Array
Linked List
Tree
Graph
Stack, Queue etc.
All these data structures allow us to perform different operations on data. We select a data structure based on the type of operations required. We will look into these data structures in more detail in later lessons.
The data structures can also be classified on the basis of the following characteristics:
Static: Static data structures are those whose size, structure, and associated memory locations are fixed at compile time. Example: Array.
Dynamic: Dynamic data structures are those whose size, structure, and associated memory locations can change during execution. Example: Linked List.
What is an Algorithm?
An algorithm is a finite set of instructions or logic, written in order, to accomplish a certain predefined task. An algorithm is not the complete code or program; it is just the core logic (the solution) of a problem, which can be expressed either as an informal high-level description, as pseudocode, or using a flowchart.
Every algorithm must satisfy the following properties:
1. Input: It should take zero or more well-defined inputs.
2. Output: It should produce at least one well-defined output.
3. Definiteness: Every step must be clear and unambiguous.
4. Finiteness: It must terminate after a finite number of steps.
5. Effectiveness: Every step must be basic enough to be carried out exactly.
An algorithm is said to be efficient and fast if it takes less time to execute and consumes less memory space. The performance of an algorithm is measured on the basis of the following properties:
1. Time Complexity
2. Space Complexity
Space Complexity
It's the amount of memory space required by the algorithm during the course of its execution. Space complexity must be taken seriously for multi-user systems and in situations where limited memory is available.
An algorithm generally requires space for the following components:
Instruction Space: It's the space required to store the executable version of the program. This space is fixed for a given program and depends on the number of lines of code in it.
Data Space: It's the space required to store the values of all constants and variables (including temporary variables).
Environment Space: It's the space required to store the environment information needed to resume a suspended function.
To learn about Space Complexity in detail, jump to the Space Complexity tutorial.
Time Complexity
Time Complexity is a way to represent the amount of time required by the program to run till its completion. It's generally good practice to keep the time required to a minimum, so that our algorithm completes its execution in the minimum time possible. We will study Time Complexity in detail in later sections.
NOTE: Before going deep into data structures, you should have a good knowledge of programming in C, C++, Java, Python or a similar language.
Asymptotic Notations
When it comes to analysing the complexity of any algorithm in terms of time and space, we can never provide an exact number to define the time and space required by the algorithm; instead, we express it using some standard notations, known as Asymptotic Notations.
When we analyse any algorithm, we generally get a formula representing the amount of time required for execution: the time required by the computer to run the lines of code of the algorithm, the number of memory accesses, the number of comparisons, the temporary variables occupying memory space, and so on. This formula often contains unimportant details that don't really tell us anything about the running time.
Let us take an example: if some algorithm has a time complexity of T(n) = n² + 3n + 4, which is a quadratic equation, then for large values of n, the 3n + 4 part will become insignificant compared to the n² part.
For n = 1000, n² will be 1000000 while 3n + 4 will be just 3004.
Also, when we compare the execution times of two algorithms, the constant coefficients of higher-order terms are also neglected.
An algorithm that takes a time of 200n² will be faster than some other algorithm that takes n³ time, for any value of n larger than 200. Since we're only interested in the asymptotic behaviour of the growth of the function, the constant factor can be ignored too.
For example, given Expression 1 = 20n² + 3n - 4 and Expression 2 = n³ + 100n - 2, the running time grows with n² for Expression 1, and with n³ for Expression 2. Hence, we can clearly say that the algorithm whose running time is represented by Expression 2 will grow faster than the other one, simply by analysing the highest-power term and ignoring the constants (20 in 20n²) and the insignificant parts of the expressions (3n - 4 and 100n - 2).
The main idea behind casting aside the less important part is to make
things manageable.
All we need to do is first analyse the algorithm to find an expression defining its time requirements, and then analyse how that expression grows as the input (n) grows. Asymptotic notations describe this growth while ignoring lower-order terms and constants, thereby tightly binding the expression representing the growth of the algorithm.
Upper Bounds: Big-O
This notation is known as the upper bound of the algorithm, or the worst case of an algorithm.
It tells us that a certain function will never exceed a specified time for any value of
input n.
The question is why we need this representation when we already have the big-Θ
notation, which represents the tightly bound running time for any algorithm. Let's take
a small example to understand this.
Consider the Linear Search algorithm, in which we traverse the elements of an array one by one to search for a given number.
In the worst case, starting from the front of the array, we find the element we are searching for at the very end, which leads to a time complexity of n, where n represents the total number of elements.
But it can also happen that the element we are searching for is the first element of the array, in which case the time complexity will be 1.
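A minimal C sketch of Linear Search (the function name and the return convention of -1 for "not found" are illustrative assumptions):

#include <stdio.h>

// return the index of key in arr[0..n-1], or -1 if key is not present
int linearSearch(int arr[], int n, int key)
{
    for(int i = 0; i < n; i++)
    {
        if(arr[i] == key)   // best case: found at i = 0, one comparison
            return i;
    }
    return -1;              // worst case: all n elements compared
}

int main(void)
{
    int arr[] = {4, 8, 15, 16, 23, 42};
    int n = sizeof(arr)/sizeof(arr[0]);
    printf("42 found at index %d\n", linearSearch(arr, n, 42));
    return 0;
}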
Now in this case, saying that the big-Θ (tight bound) time complexity for Linear Search is Θ(n) means that the time required will always be related to n, and this is the right way to represent the average time complexity. But when we use the big-O notation, we say that the time complexity is O(n), meaning that the time complexity will never exceed n; this defines the upper bound, saying the time can be less than or equal to n, which is the correct representation.
This is the reason you will most often see Big-O notation used to represent the time complexity of an algorithm: it simply makes more sense.
Space complexity is the amount of memory used by the algorithm (including the input values to the algorithm) to execute and produce the result.
Sometimes Auxiliary Space is confused with Space Complexity. Auxiliary Space is the extra space or the temporary space used by the algorithm during its execution.
Space Complexity = Auxiliary Space + Input space
An algorithm needs space for the following components:
1. Instruction Space: It's the amount of memory used to store the compiled version of the instructions.
2. Environmental Stack: It's the memory used to store the data of suspended (partially executed) functions, such as their variables and return addresses, while a called function runs.
3. Data Space: It's the amount of memory used to store all the variables and constants.
The exact space taken by each variable depends on its data type; in the examples below we assume that an int takes 4 bytes, as on most common systems.
Now let's learn how to compute space complexity by taking a few examples:

// add three numbers passed as parameters
int sum(int a, int b, int c)
{
    int z = a + b + c;
    return z;
}
In the above function, variables a, b, c and z are all of integer type, hence each will take up 4 bytes, so the total memory requirement will be (4(4) + 4) = 20 bytes, where the additional 4 bytes are for the return value. Because this space requirement is fixed regardless of the input, it is called Constant Space Complexity.
Let's take another example, this time a slightly more complex one:

// n is the length of array a[]
int sum(int a[], int n)
{
    int x = 0;                   // 4 bytes for x
    for(int i = 0; i < n; i++)   // 4 bytes for i
    {
        x = x + a[i];
    }
    return x;
}
In the above code, 4*n bytes of space are required for the elements of array a[], plus 4 bytes each for x, n, i and the return value, i.e. 16 bytes.
Hence the total memory requirement is (4n + 16), which increases linearly with the input value n, so this is called Linear Space Complexity.
Similarly, we can have quadratic and other more complex space complexities as well, as the complexity of an algorithm increases.
But we should always focus on writing algorithm code in such a way that the space complexity is kept to a minimum.
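The next paragraph compares two solutions to the same problem whose code is not shown in this excerpt. A minimal reconstruction, assuming the problem is computing the sum of the first n natural numbers (suggested by the mention of the * operator below):

// Solution 1: loop over 1..n, executes the loop body n times, O(n)
int sumLoop(int n)
{
    int sum = 0;
    for(int i = 1; i <= n; i++)
        sum = sum + i;
    return sum;
}

// Solution 2: closed-form formula n(n+1)/2, a single statement, O(1)
int sumFormula(int n)
{
    return n * (n + 1) / 2;
}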
In the above two simple algorithms, you saw how a single problem can have many solutions. While the first solution requires a loop which executes n times, the second solution uses a mathematical operator * to return the result in one line. So which one is the better approach? Of course, the second one.
A single statement like the one in the second solution has constant Time Complexity: its running time does not change in relation to n.
Introduction to Sorting
Sorting is nothing but arranging data in ascending or descending order. The term sorting came into the picture as humans realised the importance of searching quickly.
There are so many things in our real life that we need to search for: a particular record in a database, a roll number in a merit list, a particular telephone number in a telephone directory, a particular page in a book, etc. All this would have been a mess if the data had been kept unordered and unsorted, but fortunately the concept of sorting came into existence, making it easier for everyone to arrange data in an order, and hence making it easier to search.
Sorting arranges data in a sequence which makes searching easier.
Sorting Efficiency
If you ask me how I would arrange a shuffled deck of cards in order, I would say that I would check every card, building up the ordered deck as I go.
It could take me hours to arrange the deck in order, but that's how I would do it.
Well, thank god, computers don't work like this.
Since the beginning of the programming age, computer scientists have been working
on solving the problem of sorting by coming up with various different algorithms to
sort data.
The two main criteria to judge which algorithm is better than another have been:
1. The time taken to sort the given data.
2. The memory space required to do so.
Some of the most common sorting algorithms are:
1. Bubble Sort
2. Insertion Sort
3. Selection Sort
4. Quick Sort
5. Merge Sort
6. Heap Sort
Although these sorting techniques are easy to understand, we still suggest that you first learn about space complexity, time complexity and the searching algorithms, to warm up your brain for the sorting algorithms.
Bubble Sort
Bubble Sort works as follows:
1. Starting with the first element (index = 0), compare the current element with the next element of the array.
2. If the current element is greater than the next element of the array, swap them.
3. If the current element is less than the next element, move on to the next element. Repeat from Step 1 until the array is sorted.
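The main function below calls a bubbleSort function that is not defined in this excerpt; here is a minimal sketch implementing the steps above:

#include <stdio.h>

// repeatedly compare and swap adjacent elements until arr[0..n-1] is sorted
void bubbleSort(int arr[], int n)
{
    int i, j, temp;
    for(i = 0; i < n - 1; i++)          // one pass per element
    {
        for(j = 0; j < n - 1 - i; j++)  // the last i elements are already in place
        {
            if(arr[j] > arr[j + 1])     // swap adjacent elements if out of order
            {
                temp = arr[j];
                arr[j] = arr[j + 1];
                arr[j + 1] = temp;
            }
        }
    }
}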
int main()
{
    int arr[100], i, n;

    // ask the user for the number of elements to be sorted
    printf("Enter the number of elements to be sorted: ");
    scanf("%d", &n);

    // input the elements of the array
    for(i = 0; i < n; i++)
    {
        printf("Enter element no. %d: ", i+1);
        scanf("%d", &arr[i]);
    }

    // call the function bubbleSort
    bubbleSort(arr, n);
    return 0;
}
Although the above logic will sort an unsorted array, the algorithm is still not efficient, because as per the above logic the outer for loop will keep on executing for all n-1 iterations even if the array becomes sorted after, say, the second iteration.
So, we can clearly optimize our algorithm.
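A sketch of the optimized bubbleSort (main stays the same as above); a flag variable records whether any swap happened during a pass:

// optimized bubble sort: stop as soon as a complete pass makes no swaps
void bubbleSort(int arr[], int n)
{
    int i, j, temp, flag;
    for(i = 0; i < n - 1; i++)
    {
        flag = 0;                       // no swap has happened in this pass yet
        for(j = 0; j < n - 1 - i; j++)
        {
            if(arr[j] > arr[j + 1])
            {
                temp = arr[j];
                arr[j] = arr[j + 1];
                arr[j + 1] = temp;
                flag = 1;               // a swap happened
            }
        }
        if(flag == 0)                   // the array is already sorted
            break;
    }
}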
In the above code, in the function bubbleSort, if no swapping takes place during a single complete cycle of the inner for loop (over j), then flag remains 0 and we break out of the outer for loop, because the array has already been sorted.
Complexity Analysis of Bubble Sort
In Bubble Sort, n-1 comparisons are done in the 1st pass, n-2 in the 2nd pass, n-3 in the 3rd pass, and so on. So the total number of comparisons will be:
(n-1) + (n-2) + ... + 1 = n(n-1)/2
which is O(n²).
Selection Sort
Consider the array {46, 52, 21, 22, 11} used in the code below. In the first pass, the smallest element, 11, will be found and placed at the first position.
Then, leaving the first element, the next smallest element, 21, will be searched for among the remaining elements and placed at the second position.
Then, leaving the elements already at their correct positions, we search for the next smallest element among the rest of the elements, put it at the third position, and keep doing this until the array is sorted.
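The main function below calls selectionSort and printArray, neither of which is shown in this excerpt. A minimal sketch of both:

#include <stdio.h>

// in each pass, find the smallest remaining element and swap it into position i
void selectionSort(int arr[], int n)
{
    int i, j, min, temp;
    for(i = 0; i < n - 1; i++)
    {
        min = i;                        // index of the smallest element seen so far
        for(j = i + 1; j < n; j++)
        {
            if(arr[j] < arr[min])
                min = j;
        }
        temp = arr[i];                  // place the smallest element at position i
        arr[i] = arr[min];
        arr[min] = temp;
    }
}

// print the elements of the array, separated by spaces
void printArray(int arr[], int n)
{
    for(int i = 0; i < n; i++)
        printf("%d ", arr[i]);
    printf("\n");
}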
int main()
{
    int arr[] = {46, 52, 21, 22, 11};
    int n = sizeof(arr)/sizeof(arr[0]);

    selectionSort(arr, n);
    printf("Sorted array: \n");
    printArray(arr, n);
    return 0;
}
Note: Selection sort is an unstable sort, i.e. it might change the relative order of two equal elements in the list while sorting. It can, however, be made stable when implemented using a linked list.