Data Structure - Unit 4 - B.Tech 3rd
UNIT – 4
Sorting
What is sorting?
Sorting is the process of arranging data into a meaningful order so that it can be
analyzed more effectively.
Insertion Sort
It finds that both 14 and 33 are already in ascending order. For now, 14 is in sorted
sub-list.
Data Structures & Algorithm B.Tech 3rd Sem Dr. S. Senthil Kumar, Associate Professor
It swaps 33 with 27. It also checks 27 against all the elements of the sorted sub-list.
Here we see that the sorted sub-list has only one element, 14, and 27 is greater than 14.
Hence, the sorted sub-list remains sorted after swapping.
By now we have 14 and 27 in the sorted sub-list. Next, it compares 33 with 10.
So we swap them.
We swap them again. By the end of third iteration, we have a sorted sub-list of 4
items.
This process goes on until all the unsorted values are covered in a sorted sub-list.
Now we shall see some programming aspects of insertion sort.
Algorithm
Now we have a bigger picture of how this sorting technique works, so we can
derive simple steps by which we can achieve insertion sort.
Step 1 − If it is the first element, it is already sorted. return 1;
Step 2 − Pick the next element
Step 3 − Compare it with all elements in the sorted sub-list
Step 4 − Shift all the elements in the sorted sub-list that are greater than the
value to be sorted
Step 5 − Insert the value
Step 6 − Repeat until the list is sorted
Pseudocode
procedure insertionSort( A : array of items )
   for i = 1 to length(A)-1 do:
      valueToInsert = A[i]
      holePosition = i
      while holePosition > 0 and A[holePosition-1] > valueToInsert do:
         A[holePosition] = A[holePosition-1]
         holePosition = holePosition - 1
      end while
      A[holePosition] = valueToInsert
   end for
end procedure
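The insertion sort steps above can be sketched in Python. This is a minimal illustration (the function and variable names are our own, not from the notes):

```python
def insertion_sort(arr):
    """Sort arr in place by growing a sorted sub-list on the left."""
    for i in range(1, len(arr)):
        value = arr[i]          # element to insert into the sorted sub-list
        hole = i
        # shift every sorted element greater than `value` one step right
        while hole > 0 and arr[hole - 1] > value:
            arr[hole] = arr[hole - 1]
            hole -= 1
        arr[hole] = value       # drop the value into the hole
    return arr
```

For example, insertion_sort([14, 33, 27, 10, 35]) follows exactly the walkthrough above: 14 and 33 stay, 27 is inserted between them, 10 moves to the front, and so on.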
Bubble Sort
Bubble sort starts with the very first two elements, comparing them to check which one
is greater.
In this case, value 33 is greater than 14, so it is already in the sorted location. Next, we
compare 33 with 27.
We find that 27 is smaller than 33, so these two values must be swapped.
Next we compare 33 and 35. We find that both are already in sorted positions.
Then we find that 10 is smaller than 35. Hence they are not sorted.
We swap these values. We find that we have reached the end of the array. After
one iteration, the array should look like this −
To be precise, we are now showing how the array should look after each
iteration. After the second iteration, it should look like this −
Notice that after each iteration, at least one value moves to the end.
And when no swap is required, bubble sort learns that the array is completely
sorted.
Algorithm
begin BubbleSort(list)
   for all elements of list
      if list[i] > list[i+1]
         swap(list[i], list[i+1])
      end if
   end for
   return list
end BubbleSort
Pseudocode
We observe in the algorithm that Bubble Sort compares each pair of array elements
until the whole array is completely sorted in ascending order. This may cause
a few complexity issues, for example when the array needs no more swapping because
all the elements are already ascending.
To address this, we use a flag variable swapped, which helps us see whether
any swap has happened or not. If no swap has occurred, i.e. the array requires no
more processing to be sorted, it will come out of the loop.
procedure bubbleSort( list : array of items )
   loop = list.count;
   for i = 0 to loop-1 do:
      swapped = false
      for j = 0 to loop-1 do:
         if list[j] > list[j+1] then
            swap( list[j], list[j+1] )
            swapped = true
         end if
      end for
      if (not swapped) then
         break
      end if
   end for
end procedure
Implementation
One more issue we did not address in our original algorithm and its improved
pseudocode is that, after every iteration, the highest value settles down at the end
of the array. Hence, the next iteration need not include already sorted elements.
For this purpose, in our implementation, we restrict the inner loop to avoid already
sorted values.
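Both refinements can be combined in a short Python sketch (names are our own; this is one possible implementation, not the only one):

```python
def bubble_sort(lst):
    """Bubble sort with the two refinements described above:
    a `swapped` flag for early exit, and an inner loop that
    shrinks because the largest value settles at the end each pass."""
    n = len(lst)
    for i in range(n - 1):
        swapped = False
        # the last i elements are already in place, so stop before them
        for j in range(n - 1 - i):
            if lst[j] > lst[j + 1]:
                lst[j], lst[j + 1] = lst[j + 1], lst[j]
                swapped = True
        if not swapped:      # no swap in a full pass: already sorted
            break
    return lst
```

On an already sorted input the flag causes an exit after one pass, giving the best-case linear behaviour discussed above.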
Quick Sort
Quick sort is a highly efficient sorting algorithm based on partitioning an
array of data into smaller arrays. A large array is partitioned into two arrays, one of
which holds values smaller than a specified value, called the pivot, based on which the
partition is made, and another which holds values greater than the pivot value.
Quicksort partitions an array and then calls itself recursively twice to sort the two
resulting sub-arrays. This algorithm is quite efficient for large-sized data sets, as its
average-case complexity is O(n log n), while its worst-case complexity is O(n2).
The pivot value divides the list into two parts. Recursively, we find a pivot for
each sub-list until every sub-list contains only one element.
Based on our understanding of partitioning in quick sort, we will now try to write
an algorithm for it, which is as follows.
Step 1 − Choose the highest index value as pivot
Step 2 − Take two variables to point left and right of the list excluding pivot
Step 3 − left points to the low index
Step 4 − right points to the high index
Step 5 − while value at left is less than pivot move right
Step 6 − while value at right is greater than pivot move left
Step 7 − if both step 5 and step 6 do not match, swap left and right
Step 8 − if left ≥ right, the point where they met is the new pivot
function partitionFunc(left, right, pivot)
   leftPointer = left
   rightPointer = right - 1
   while True do
      while A[++leftPointer] < pivot do
         //do-nothing
      end while
      while rightPointer > 0 && A[--rightPointer] > pivot do
         //do-nothing
      end while
      if leftPointer >= rightPointer
         break
      else
         swap leftPointer, rightPointer
      end if
   end while
   swap leftPointer, right
   return leftPointer
end function
Using the pivot algorithm recursively, we end up with smaller and smaller partitions.
Each partition is then processed for quick sort. We define the recursive algorithm for
quicksort as follows −
Step 1 − Make the right-most index value pivot
Step 2 − partition the array using pivot value
Step 3 − quicksort left partition recursively
Step 4 − quicksort right partition recursively
To get more into it, let us see the pseudocode for the quick sort algorithm −
procedure quickSort(left, right)
   if right-left <= 0
      return
   else
      pivot = A[right]
      partition = partitionFunc(left, right, pivot)
      quickSort(left, partition-1)
      quickSort(partition+1, right)
   end if
end procedure
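A runnable Python sketch of quick sort follows. For simplicity it uses a Lomuto-style partition (single scan) rather than the two-pointer scheme of the pseudocode; the right-most element is still the pivot, and the names are our own:

```python
def quick_sort(a, left=0, right=None):
    """Recursive quick sort, using the right-most element as pivot."""
    if right is None:
        right = len(a) - 1
    if right - left <= 0:
        return a
    pivot = a[right]
    i = left - 1
    # move everything smaller than the pivot to the left side
    for j in range(left, right):
        if a[j] < pivot:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[i + 1], a[right] = a[right], a[i + 1]   # place the pivot
    quick_sort(a, left, i)          # sort values smaller than the pivot
    quick_sort(a, i + 2, right)     # sort values greater than the pivot
    return a
```

Each recursive call sorts one partition, mirroring steps 3 and 4 of the recursive algorithm above.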
Merge Sort
Merge sort sorts the list using a divide-and-conquer strategy. Unlike selection
sort, bubble sort or insertion sort, it sorts data by dividing the list into sub-lists and
recursively solving and combining them. Two observations make this work:
Sorting a smaller list takes less time than sorting a larger list.
Combining two sorted sub-lists takes less time than combining two unsorted
lists.
Divide: Recursively divide the single list into two sub-lists until each sub-list
contains 1 element.
Conquer: Sort each sub-list (a one-element list is trivially sorted) and merge the
sorted sub-lists back into a single sorted list.
Merge sort divides the list of size n into two sub-lists, each of size n/2. This
subdivision continues until the problem size becomes one. Once the problem size
reaches one, the conquer phase starts. The following figure shows the divide phase.
Merge sort then compares two arrays, each of size one, using a two-way merge, and
the sorted sequence is saved in a new array of size two. In the following step,
two sorted arrays of size two are combined to form a single sorted array of size
four. This technique is repeated until the entire problem has been solved.
To inform the algorithm that all elements in a referenced array have been
examined, a sentinel symbol is added at the end of both arrays to be merged.
When one array reaches its sentinel symbol, the remaining elements of the other
array are simply copied into the final array.
MERGE (A, low, mid, high)
l1 ← mid − low + 1
l2 ← high − mid
for i ← 1 to l1 do
LEFT [i] ← A [low + i–1]
end
for j ← 1 to l2 do
RIGHT [j] ← A [mid + j]
end
LEFT [l1 + 1] ← ∞
RIGHT [l2 + 1] ← ∞
i ← 1, j ← 1
for k ← low to high do
if LEFT [i] ≤ RIGHT [j] then
B[k] ← LEFT [i]
i ← i + 1
else
B[k] ← RIGHT[j]
j ← j + 1
end
end
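The merge with sentinels can be written compactly in Python, using math.inf as the sentinel symbol ∞ (function names are our own; this is a sketch, not the only formulation):

```python
import math

def merge_sort(a):
    """Divide-and-conquer merge sort with sentinel values."""
    if len(a) <= 1:
        return a
    mid = len(a) // 2
    left = merge_sort(a[:mid])
    right = merge_sort(a[mid:])
    # append sentinels so neither list is ever exhausted early
    left.append(math.inf)
    right.append(math.inf)
    merged, i, j = [], 0, 0
    for _ in range(len(a)):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    return merged
```

Because the sentinel is larger than every real element, the merge loop never needs to check whether a sub-list has run out, exactly as the text describes.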
The techniques of sorting can be divided into two categories:
Internal Sorting
External Sorting
Internal Sorting: If all the data to be sorted fits in main memory at one time, an
internal sorting method is used.
External Sorting: When the data to be sorted cannot be accommodated in
memory at the same time and some of it has to be kept in auxiliary storage such as a hard
disk, floppy disk or magnetic tape, external sorting methods are used.
The complexity of a sorting algorithm measures the running time of a function in which
n items are to be sorted. Which sorting method is suitable for a problem depends on
several considerations.
To estimate the amount of time required to sort an array of n elements by a particular
method, the usual approach is to analyze the method to find the number of
comparisons (or exchanges) it requires. Most sorting techniques are data
sensitive, so these metrics depend on the order in which the elements appear in the
input array.
Various sorting techniques are analyzed in various cases, named as follows:
Best case
Worst case
Average case
The result of these cases is often a formula giving the average time required for a
particular sort of size n. Most sort methods have time requirements that range
from O(n log n) to O(n2).
Bubble Sort
Selection Sort
Merge Sort
Insertion Sort
Quick Sort
Heap Sort
An internal sort is any data sorting process that takes place entirely within
the main memory of a computer. This is possible whenever the data to be sorted
is small enough to be held entirely in main memory. For sorting larger datasets, it
may be necessary to hold only a chunk of data in memory at a time, since it won’t
all fit. The rest of the data is normally held on some larger but slower medium, like
a hard disk. Any reading or writing of data to and from this slower medium can slow
the sorting process considerably. This issue has implications for different sort
algorithms.
1. Bubble Sort
2. Insertion Sort
3. Quick Sort
4. Heap Sort
5. Radix Sort
6. Selection sort
1. Bubble Sort
2. Insertion Sort
Insertion sort works similar to the sorting of playing cards in hands. It is assumed
that the first card is already sorted in the card game, and then we select an unsorted
card. If the selected unsorted card is greater than the first card, it will be placed at
the right side; otherwise, it will be placed at the left side. Similarly, all unsorted
cards are taken and put in their exact place.
The same approach is applied in insertion sort. The idea behind insertion sort
is to take one element at a time and place it in its correct position within the
already sorted part of the array. Although it is simple to use, it is not appropriate
for large data sets, as the time complexity of insertion sort in the average and worst
case is O(n2), where n is the number of items. Insertion sort is less efficient than
sorting algorithms like heap sort, quick sort and merge sort.
3. Quick Sort
Quicksort picks an element as the pivot and then partitions the given array
around it. In quick sort, a large array is divided into two arrays, one of which
holds values that are smaller than the pivot, and another which holds values
that are greater than the pivot.
After that, the left and right sub-arrays are also partitioned using the same
approach. This continues until only a single element remains in each sub-array.
o The pivot can be random, i.e. select a random pivot from the given array.
o The pivot can be either the rightmost or the leftmost element of
the given array.
o The median can be selected as the pivot element.
4. Heap Sort
Heap sort processes the elements by creating the min-heap or max-heap using the
elements of the given array. Min-heap or max-heap represents the ordering of
array in which the root element represents the minimum or maximum element of
the array.
Before knowing more about the heap sort, let's first see a brief description of Heap.
What is a heap?
A heap is a complete binary tree. A binary tree is a tree in which each node
can have at most two children. A complete binary tree is a binary tree in which
all the levels except the last level, i.e., the leaf level, are completely filled, and
all the nodes are left-justified.
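Heap sort as described above can be sketched in Python with a max-heap stored in the array itself (function names are our own; one possible formulation):

```python
def heapify(a, n, i):
    """Sift a[i] down so the subtree rooted at index i is a max-heap."""
    largest = i
    left, right = 2 * i + 1, 2 * i + 2
    if left < n and a[left] > a[largest]:
        largest = left
    if right < n and a[right] > a[largest]:
        largest = right
    if largest != i:
        a[i], a[largest] = a[largest], a[i]
        heapify(a, n, largest)

def heap_sort(a):
    """Build a max-heap, then repeatedly move the root (the maximum)
    to the end of the array and restore the heap on the remainder."""
    n = len(a)
    for i in range(n // 2 - 1, -1, -1):   # build the initial max-heap
        heapify(a, n, i)
    for end in range(n - 1, 0, -1):
        a[0], a[end] = a[end], a[0]       # root is the current maximum
        heapify(a, end, 0)
    return a
```

The root of the max-heap is always the largest remaining element, which is why swapping it to the end of the array yields a sorted suffix.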
5. Radix Sort
Radix sort is a linear sorting algorithm used for integers. In radix sort,
sorting is performed digit by digit, starting from the least significant
digit and moving to the most significant digit.
The process of radix sort works similar to sorting students' names in alphabetical
order. In this case, there are 26 buckets (radixes) corresponding to the 26 letters of
the English alphabet. In the first pass, the names of students are grouped according
to the ascending order of the first letter of their names. In the second pass, their
names are grouped according to the ascending order of the second letter of their
names. The process continues until the list is sorted.
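For non-negative integers in base 10, the digit-by-digit process can be sketched as follows (a least-significant-digit variant with 10 buckets; function name is our own):

```python
def radix_sort(a):
    """LSD radix sort for non-negative integers: bucket by each
    decimal digit, least significant digit first."""
    if not a:
        return a
    place = 1
    while max(a) // place > 0:
        buckets = [[] for _ in range(10)]    # one bucket per digit 0-9
        for x in a:
            buckets[(x // place) % 10].append(x)
        # regrouping in bucket order is stable, which is what makes
        # later (more significant) passes preserve earlier ordering
        a = [x for bucket in buckets for x in bucket]
        place *= 10
    return a
```

Stability of each pass is essential: ties on the current digit keep the order established by the previous, less significant, digits.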
6. Selection Sort
In selection sort, the smallest value among the unsorted elements of the array is
selected in every pass and moved to its appropriate position in the array. It is
one of the simplest algorithms and is an in-place comparison sorting algorithm. In this
algorithm, the array is divided into two parts: the sorted part and the unsorted part.
Initially, the sorted part of the array is empty and the unsorted part is the given
array. The sorted part is placed at the left, while the unsorted part is placed at
the right.
In selection sort, the first smallest element is selected from the unsorted array and
placed at the first position. After that second smallest element is selected and
placed in the second position. The process continues until the array is entirely
sorted.
The average and worst-case complexity of selection sort is O(n2), where n is the
number of items. Due to this, it is not suitable for large data sets.
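The pass-by-pass process above can be sketched in Python (a minimal illustration; the names are our own):

```python
def selection_sort(a):
    """In every pass, select the smallest unsorted value and
    swap it into the front of the unsorted part."""
    n = len(a)
    for i in range(n - 1):
        smallest = i
        for j in range(i + 1, n):       # scan the unsorted part
            if a[j] < a[smallest]:
                smallest = j
        a[i], a[smallest] = a[smallest], a[i]
    return a
```

After pass i, positions 0..i hold the i+1 smallest values in order, which is the sorted/unsorted split the text describes.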
A binary search tree follows a specific order to arrange its elements. In a binary search
tree, the value of a left node must be smaller than the parent node, and the value of
a right node must be greater than the parent node. This rule is applied recursively to
the left and right subtrees of the root.
In the above figure, we can observe that the root node is 40, and all the nodes of
the left subtree are smaller than the root node, and all the nodes of the right
subtree are greater than the root node.
Similarly, we can see that the left child of the root node (30) is greater than its own
left child and smaller than its own right child, so it also satisfies the property of a
binary search tree. Therefore, we can say that the tree in the above image is a binary
search tree.
Now suppose we change the value of node 35 to 55 in the above tree, and check
whether the tree is still a binary search tree.
In the modified tree, the value of the root node is 40. Its left child is 30, but 30's
right child is 55, which is greater than the root 40 even though it lies in the root's
left subtree. So the above tree does not satisfy the property of a binary search tree;
therefore, it is not a binary search tree.
Now, let's see the creation of binary search tree using an example.
Suppose the data elements are - 45, 15, 79, 90, 10, 55, 12, 20, 50
o First, we have to insert 45 into the tree as the root of the tree.
o Then, read the next element; if it is smaller than the root node, insert it as
the root of the left subtree, and move to the next element.
o Otherwise, if the element is larger than the root node, then insert it as the
root of the right subtree.
Now, let's see the process of creating the Binary search tree using the given data
element. The process of creating the BST is shown below -
As 15 is smaller than 45, so insert it as the root node of the left subtree.
As 79 is greater than 45, so insert it as the root node of the right subtree.
90 is greater than 45 and 79, so it will be inserted as the right subtree of 79.
10 is smaller than 45 and 15, so it will be inserted as the left subtree of 15.
55 is larger than 45 and smaller than 79, so it will be inserted as the left subtree of
79.
12 is smaller than 45 and 15 but greater than 10, so it will be inserted as the right
subtree of 10.
20 is smaller than 45 but greater than 15, so it will be inserted as the right subtree
of 15.
50 is greater than 45 but smaller than 79 and 55. So, it will be inserted as a left
subtree of 55.
Now, the creation of binary search tree is completed. After that, let's move towards
the operations that can be performed on Binary search tree.
We can perform insert, delete and search operations on the binary search tree.
Insertion in BST
The insert function is used to add a new element to a binary search tree at the
appropriate location. It must be designed so that it does not violate the binary
search tree property at any node.
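Insertion (and the resulting sorted order) can be sketched in Python; the class and function names here are our own, not part of the notes:

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None

def insert(root, value):
    """Insert `value` without violating the BST property:
    smaller keys go to the left subtree, larger keys to the right."""
    if root is None:
        return Node(value)
    if value < root.value:
        root.left = insert(root.left, value)
    elif value > root.value:
        root.right = insert(root.right, value)
    return root                     # duplicates are ignored

def inorder(root):
    """In-order traversal of a BST yields the keys in sorted order."""
    if root is None:
        return []
    return inorder(root.left) + [root.value] + inorder(root.right)
```

Inserting the document's data 45, 15, 79, 90, 10, 55, 12, 20, 50 in order reproduces the tree built step by step above, with 45 at the root.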
Deletion in BST
The simplest case is deleting a leaf node: replace the leaf node with NULL and
simply free the allocated space.
In the following image, we are deleting node 85. Since the node is a leaf node,
it will be replaced with NULL and the allocated space will be freed.
Algorithm
Compare the value of the key with the value of the root. If key > root->value,
recursively traverse the right subtree.
If key < root->value, recursively traverse the left subtree.
While traversing, if key == root->value, we need to delete this node:
o If the node is a leaf, make it NULL.
o If the node is not a leaf and has a right child, recursively replace its
value with the successor node's value and delete the successor from its
original position.
o If the node is not a leaf and has a left child but no right child,
recursively replace its value with the predecessor node's value and delete
the predecessor from its original position.
Return root
1. Time complexity:
i. Best case: When the root node is the node being searched for, we have to make
only one comparison, so the time taken is constant. The time complexity in the
best case is O(1).
ii. Average case: When there is a balanced binary search tree (a binary search
tree is called balanced if the height difference of the left and right subtrees of
every node is not more than one), the height becomes logN, where N is the number
of nodes in the tree.
In the search operation we keep traversing through nodes one by one. If we find
the element at the second level, we have made 2 comparisons; if we find it at the
third level, we make 3 comparisons. In this way, the time taken to search for a key
in a binary search tree is the same as the height of the tree, which is logN, so the
time complexity of searching is O(logN) in the average case.
Note: Average Height of a Binary Search Tree is 4.31107 ln(N) - 1.9531 lnln(N) +
O(1) that is O(logN).
iii. Worst case: When the tree is skewed, we have to traverse from the root to the
deepest leaf node, and the height of the tree becomes n. As we have seen above,
the time taken is the same as the height of the tree, so the time complexity in the
worst case becomes O(n).
Path Length
Given a binary tree, find the path length having maximum number of bends.
Note: Here, bend indicates switching from left to right or vice versa while
traversing in the tree.
For example, consider below paths (L means moving leftwards, R means moving
rightwards):
LLRRRR – 1 Bend
RLLLRR – 2 Bends
LRLRLR – 5 Bends
Examples:
Input:
        4
       / \
      2   6
     / \ / \
    1  3 5  7
           /
          9
         / \
       12   10
              \
               11
              /  \
            45    13
                    \
                     14
Output: 6
In the above example, the path 4 -> 6 -> 7 -> 9 -> 10 -> 11 -> 45
has the maximum number of bends, i.e., 3.
The length of this path is 6.
Approach :
The idea is to traverse the left and right subtrees of the root. While
traversing, keep track of the direction of motion (left or right). Whenever the
direction of motion changes from left to right or vice versa, increment the number
of bends in the current path by 1.
On reaching the leaf node, compare the number of bends in the current path with
the maximum number of bends(i.e., maxBends) seen so far in a root-to-leaf path.
If the number of bends in the current path is greater than the maxBends, then
update the maxBends equal to the number of bends in the current path and
update the maximum path length (i.e., len) also to the length of the current path.
AVL Tree
The AVL tree was invented by G.M. Adelson-Velsky and E.M. Landis in 1962. The tree
is named AVL in honour of its inventors.
AVL Tree can be defined as height balanced binary search tree in which each node
is associated with a balance factor which is calculated by subtracting the height of
its right sub-tree from that of its left sub-tree.
If the balance factor of a node is 1, it means that the left sub-tree is one level
higher than the right sub-tree.
If the balance factor of a node is 0, it means that the left sub-tree and right
sub-tree are of equal height.
If the balance factor of a node is -1, it means that the left sub-tree is one level
lower than the right sub-tree.
An AVL tree is given in the following figure. We can see that the balance factor
associated with each node is between -1 and +1. Therefore, it is an example of an
AVL tree.
Complexity
Since an AVL tree is also a binary search tree, all the operations are performed
in the same way as in a binary search tree.
Searching and traversing do not violate the AVL property. However, insertion and
deletion are operations that can violate it, and therefore need to be revisited.
An AVL tree controls the height of the binary search tree by not letting it become
skewed. The time taken for all operations in a binary search tree of height h is O(h).
However, this can degrade to O(n) if the BST becomes skewed (the worst case).
By limiting the height to log n, the AVL tree imposes an upper bound of O(log n) on
each operation, where n is the number of nodes.
AVL Rotations
We perform a rotation in an AVL tree only when the balance factor of a node is
other than -1, 0, or 1. There are basically four types of rotations:
Here, node A is the node whose balance factor is other than -1, 0, or 1.
The first two rotations, LL and RR, are single rotations, and the next two, LR
and RL, are double rotations.
B-Trees
A B tree is a specialized m-way tree widely used for disk access. A B-Tree
of order m can have at most m-1 keys and m children per node.
One of the main reasons for using a B tree is its capability to store a large number
of keys in a single node, as well as large key values, while keeping the height of
the tree relatively small.
It is not necessary that all nodes contain the same number of children, but
each non-root node must have at least ⌈m/2⌉ children.
While performing some operations on B Tree, any property of B Tree may violate
such as number of minimum children a node can have. To maintain the properties
of B Tree, the tree may split or join.
Operations
Searching:
Searching in B Trees is similar to that in a binary search tree. For example, suppose
we search for item 49 in the following B Tree. The process will be something like the
following:
1. Compare item 49 with root node 78. Since 49 < 78, move to its left
sub-tree.
2. Since 40 < 49 < 56, traverse the right sub-tree of 40.
3. Since 49 > 45, move to the right and compare 49 with the keys there.
4. A match is found; return.
Searching in a B tree depends upon the height of the tree. The search algorithm
takes O(log n) time to search any element in a B tree.
Inserting
Insertions are done at the leaf node level. The following steps are used to insert
an item into a B Tree.
1. Traverse the B Tree to find the appropriate leaf node at which the new
element can be inserted.
2. If the leaf node contains fewer than m-1 keys, insert the element in
increasing order.
3. Else, if the leaf node contains m-1 keys, follow these steps:
o Insert the new element in increasing order of elements.
o Split the node into two nodes at the median.
o Push the median element up to its parent node.
o If the parent node also contains m-1 keys, split it too by
following the same steps.
Example:
Insert the node 8 into the B Tree of order 5 shown in the following image.
The node now contains 5 keys, which is more than the maximum of (5 - 1 = 4) keys.
Therefore, split the node at the median, i.e. 8, and push the median up to its parent
node, as shown in the following figure.
Deletion
Deletion is also performed at the leaf nodes. The node to be deleted can be either
a leaf node or an internal node. The following procedure is used to delete a node
from a B tree.
If the node to be deleted is an internal node, replace it with its in-order successor
or predecessor. Since the successor or predecessor is always a leaf node, the
process reduces to deleting a node from a leaf.
Example 1
Delete the node 53 from the B Tree of order 5 shown in the following figure.
Now 57 is the only element left in the node, but the minimum number of elements
that must be present in a node of a B tree of order 5 is 2. Since the node has fewer
than that, and the elements in its left and right siblings are also not sufficient to
borrow from, merge it with the left sibling along with the intervening element of the
parent, i.e. 49.
Application of B tree
A B tree is used to index data and provides fast access to the actual data stored
on disk, since accessing a value stored in a large database on disk is a very
time-consuming process.