Unit III: Sorting and Searching
Sorting and Searching-Bubble sort - selection sort - insertion sort - merge sort
- quick sort - linear search - binary search - hashing - hash functions - collision
handling - load factors, rehashing, and efficiency.
Sorting
Sorting is defined as an arrangement of data in a certain order. Sorting techniques
are used to arrange data (mostly numerical) in ascending or descending order. It is a
method used to represent data in a more comprehensible format.
Sorting a large amount of data can take a substantial amount of computing
resources if the method we use to sort the data is inefficient. The efficiency of a sorting
algorithm grows with the number of items it must traverse and compare.
Increasing Order: A set of values is said to be in increasing order when every
successive element is greater than its previous element. For example: 1, 2, 3, 4, 5.
Here, the given sequence is in increasing order.
Decreasing Order: A set of values is said to be in decreasing order when every
successive element is less than the previous one. For example: 5, 4, 3, 2, 1.
Here, the given sequence is in decreasing order.
Non-Decreasing Order: A set of values is said to be in non-decreasing order if every
ith element present in the sequence is greater than or equal to its (i-1)th element. This
order occurs whenever some values are repeated. For example: 1, 2, 2, 3, 4, 5.
Here, 2 is repeated twice.
CATEGORIES OF SORTING
The techniques of sorting can be divided into two categories. These are:
Internal Sorting
External Sorting
Internal Sorting: If all the data to be sorted can be accommodated at one time in main
memory, an internal sorting method is performed.
External Sorting: When the data to be sorted cannot be accommodated in memory at the
same time and some of it has to be kept in auxiliary storage such as a hard disk, floppy disk,
or magnetic tape, external sorting methods are performed.
The complexity of a sorting algorithm measures the running time as a function of the
number 'n' of items to be sorted. Which sorting method is suitable for a problem depends
on several considerations, the most noteworthy of which is the time required.
To estimate the time required to sort an array of 'n' elements by a particular
method, the normal approach is to analyze the method and find the number of comparisons (or
exchanges) it requires. Most sorting techniques are data sensitive, so these
metrics depend on the order in which the elements appear in the input array.
Sorting techniques are therefore analyzed in the following cases:
Best case
Worst case
Average case
The result of such an analysis is often a formula giving the average time required for a
particular sort of size 'n'. Most sorting methods have time requirements that range from
O(n log n) to O(n²).
Sorting Techniques/Types
Bubble Sort
Bubble sort repeatedly compares adjacent elements and swaps them if they are in the
wrong order; after each pass, the largest remaining element "bubbles" to its correct
position at the end of the list.
Example
Here we sort the following sequence using bubble sort
Sequence: 2, 23, 10, 1
First Iteration
(2, 23, 10, 1) –> (2, 23, 10, 1), Here the first 2 elements are compared and remain the same
because they are already in ascending order.
(2, 23, 10, 1) –> (2, 10, 23, 1), Here the 2nd and 3rd elements are compared and swapped
(10 is less than 23) according to ascending order.
(2, 10, 23, 1) –> (2, 10, 1, 23), Here the 3rd and 4th elements are compared and swapped
(1 is less than 23) according to ascending order.
At the end of the first iteration, the largest element is at the rightmost position which is
sorted correctly.
Second Iteration
(2, 10, 1, 23) –> (2, 10, 1, 23), Here again, the first 2 elements are compared and remain
the same because they are already in ascending order.
(2, 10, 1, 23) –> (2, 1, 10, 23), Here 2nd and 3rd elements are compared and swapped(1 is
less than 10) in ascending order.
At the end of the second iteration, the second largest element is at the adjacent position to
the largest element.
Third Iteration
(2, 1, 10, 23) –> (1, 2, 10, 23), Here the first 2 elements are compared and swapped
according to ascending order.
The remaining elements are already sorted in the first and second Iterations. After the
three iterations, the given array is sorted in ascending order. So the final result is 1, 2, 10,
23.
Example Program:1
def bubbleSort(arr):
    n = len(arr)
    # Traverse the list n times
    for i in range(n):
        # The last i elements are already in place
        for j in range(0, n - i - 1):
            # Swap adjacent elements that are out of order
            if arr[j] > arr[j + 1]:
                arr[j], arr[j + 1] = arr[j + 1], arr[j]

arr = [2, 1, 10, 23]
bubbleSort(arr)
print("Sorted array is:")
for x in arr:
    print(x)
Output:
Sorted array is:
1
2
10
23
Example Program:2
List1 = [10, 15, 4, 23, 0]
print("Unsorted List:", List1)
for j in range(len(List1) - 1):
    for i in range(len(List1) - 1):
        # Swap neighbours that are out of order
        if List1[i] > List1[i + 1]:
            List1[i], List1[i + 1] = List1[i + 1], List1[i]
print("Sorted list", List1)

OUTPUT:
Unsorted List: [10, 15, 4, 23, 0]
Sorted list [0, 4, 10, 15, 23]
Selection Sort
This sorting technique repeatedly finds the minimum element in the unsorted part and
places it at the end of the sorted part. Like bubble sort, selection sort does not occupy
any extra memory space. During the execution of this algorithm, two subarrays are
maintained: the subarray that is already sorted and the remaining subarray that is
unsorted. In every iteration of selection sort, the minimum element of the unsorted
subarray is moved into the sorted subarray. Selection sort usually performs fewer swaps
than bubble sort, but it has a time complexity of O(n²) in the average, worst, and best cases.
Example
Here we sort the following sequence using the selection sort
Sequence: 7, 2, 1, 6
(7, 2, 1, 6) –> (1, 7, 2, 6), In the first traverse it finds the minimum element(i.e., 1) and it is
placed at 1st position.
(1, 7, 2, 6) –> (1, 2, 7, 6), In the second traverse it finds the 2nd minimum element(i.e., 2)
and it is placed at 2nd position.
(1, 2, 7, 6) –> (1, 2, 6, 7), In the third traverse it finds the next minimum element(i.e., 6)
and it is placed at 3rd position.
After the above iterations, the final array is in sorted order, i.e., 1, 2, 6, 7.
Example Program:1
def selectionSort(array, size):
    for s in range(size):
        min_idx = s
        # Find the index of the smallest element in the unsorted part
        for i in range(s + 1, size):
            if array[i] < array[min_idx]:
                min_idx = i
        array[s], array[min_idx] = array[min_idx], array[s]

data = [7, 2, 1, 6]
size = len(data)
selectionSort(data, size)
print("Sorted Array in Ascending Order is :")
print(data)
Output:
Sorted Array in Ascending Order is :
[1, 2, 6, 7]
Example Program:2
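The original second listing is missing here; the following is a minimal sketch in the
inline style of the bubble-sort Program 2 above (the list values are illustrative):

List1 = [10, 15, 4, 23, 0]
print("Unsorted List:", List1)
for s in range(len(List1)):
    min_idx = s
    # Locate the minimum of the unsorted part
    for i in range(s + 1, len(List1)):
        if List1[i] < List1[min_idx]:
            min_idx = i
    List1[s], List1[min_idx] = List1[min_idx], List1[s]
print("Sorted list", List1)

OUTPUT:
Unsorted List: [10, 15, 4, 23, 0]
Sorted list [0, 4, 10, 15, 23]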
Insertion Sort
Insertion sort is a straightforward algorithm that is more efficient than the bubble sort
algorithm. Its concept is based on a deck of cards, where we sort the playing cards
relative to a particular card. It has many advantages, although more efficient algorithms
are available.
While playing cards, players compare the cards in their hands with each other. Most
players like to sort their cards in ascending order so they can quickly see which
combinations they have at their disposal.
Insertion sort is easy and simple to implement, which is why it is generally taught in
introductory programming lessons. It is an in-place and stable algorithm that is most
beneficial for nearly sorted data or for a small number of elements.
Insertion sort is not especially fast because it uses nested loops to sort the elements.
More importantly, insertion sort does not need to know the array size in advance; it can
receive elements one at a time.
A useful property of insertion sort is that when more elements are inserted into an
already sorted list, the algorithm places each one in its proper position without
performing a complete sort again.
It is most efficient for small arrays (fewer than about 10 elements). Now, let's
understand the concepts of insertion sort.
In insertion sort, the array is virtually split into two parts: an unsorted part and a
sorted part.
The sorted part initially contains the first element of the array, and the unsorted part
contains the rest. The first element of the unsorted part is compared with the sorted
part so that we can place it at its proper position.
The algorithm inserts each element by moving all sorted elements that are greater than
it one position to the right.
This repeats until every element is inserted at its correct place.
To sort an array using insertion sort, the algorithm is:
o Split the list into two parts: sorted and unsorted.
o Iterate from arr[1] to arr[n-1] over the given array.
o Compare the current element (the key) with the elements before it.
o If the current element is smaller than an element before it, keep comparing with the
earlier elements, moving each greater element one position up to make space for the key.
Example:
Consider the following array; the first element is taken as the initial sorted subarray:
[10 | 4, 25, 1, 5]
Now we take the first element of the unsorted part, 4, and store it in a variable temp.
Since 10 > 4, we move 10 one position to the right (overwriting the slot where 4 was
stored) and insert temp = 4 at the first index, because 4 is less than every element of
the sorted subarray:
[4, 10 | 25, 1, 5]
Next we check the number 25 and save it into temp. 25 > 10 and also 25 > 4, so it stays
in the third position and joins the sorted subarray:
[4, 10, 25 | 1, 5]
Again we check the number 1 and save it in temp. 1 is less than 25, 10, and 4, so all
three shift right and 1 is inserted at the front:
[1, 4, 10, 25 | 5]
Finally we check 5. Now we have four elements in the sorted subarray. 5 < 25 and
5 < 10 but 5 > 4, so 10 and 25 shift to the right and temp = 5 is inserted after 4:
[1, 4, 5, 10, 25]
Example Program:1
def insertion_sort(list1):
    # Start from the second element; the first is trivially sorted
    for i in range(1, len(list1)):
        value = list1[i]
        j = i - 1
        # Shift sorted elements greater than value one step to the right
        while j >= 0 and value < list1[j]:
            list1[j + 1] = list1[j]
            j -= 1
        list1[j + 1] = value
    return list1

# Driver code to test above
list1 = [10, 4, 25, 1, 5]
print("The sorted list is:", insertion_sort(list1))

Output:
The sorted list is: [1, 4, 5, 10, 25]

Explanation:
In the above code, we have created a function called insertion_sort(list1). Inside the
function, we iterate from the second element to the end of the list; at each step we
store the current element in value, shift every sorted element greater than value one
position to the right, and then place value into the gap.
Merge Sort
Merge sort, like the quick sort algorithm, works on the concept of divide and
conquer. It is one of the most popular and efficient sorting algorithms and the best
example of the divide-and-conquer category of algorithms.
It divides the given list into two halves, calls itself on each half, and then merges the
two sorted halves. We define a merge() function used to merge the two halves.
The sublists are divided again and again into halves until each contains only one element.
Then we combine pairs of one-element lists into two-element lists, sorting them in the
process. The sorted two-element lists are merged into four-element lists, and so on, until
we get the fully sorted list.
Merge Sort Concept
The given list is divided into two halves; it does not matter if the list cannot be
divided into exactly equal parts.
Merge sort can be implemented in two ways: the top-down approach and the bottom-up
approach. We use the top-down approach in the above example, which is how merge sort is
most often used.
The bottom-up approach provides more opportunity for optimization, which we will define later.
The main part of the algorithm is how we combine the two sorted sublists. Let's merge
two sorted lists.
o A : [2, 4, 7, 8]
o B : [1, 3, 11]
o sorted : empty
First, we observe the first element of both lists. We find the B's first element is smaller, so we
add this in our sorted list and move forward in the B list.
o A : [2, 4, 7, 8]
o B : [1, 3, 11]
o Sorted : 1
Now we look at the next pair of elements, 2 and 3. 2 is smaller, so we add it to our sorted
list and move forward in list A.
o A : [2, 4, 7, 8]
o B : [1, 3, 11]
o Sorted : 1, 2
Continue this process and we end up with the sorted list {1, 2, 3, 4, 7, 8, 11}. There can
be two special cases.
o What if both sublists have equal front elements? In such a case, we can move forward in
either one sublist and add the element to the sorted list. Technically, we could also move
forward in both sublists and add both elements to the sorted list.
o We run out of elements in one sublist. In that case, simply append the remaining
elements of the other sublist, one after the other.
Implementation
The merge() helper combines the two sorted halves of list1 in place, and merge_sort()
recursively splits the list and merges the results:

def merge(list1, left_index, right_index, middle):
    # Make copies of both sorted halves
    left_sublist = list1[left_index:middle + 1]
    right_sublist = list1[middle + 1:right_index + 1]
    left_sublist_index = 0
    right_sublist_index = 0
    sorted_index = left_index
    # Repeatedly take the smaller front element of the two sublists
    while left_sublist_index < len(left_sublist) and right_sublist_index < len(right_sublist):
        if left_sublist[left_sublist_index] <= right_sublist[right_sublist_index]:
            list1[sorted_index] = left_sublist[left_sublist_index]
            left_sublist_index += 1
        else:
            list1[sorted_index] = right_sublist[right_sublist_index]
            right_sublist_index += 1
        sorted_index += 1
    # Copy whatever remains in either sublist
    while left_sublist_index < len(left_sublist):
        list1[sorted_index] = left_sublist[left_sublist_index]
        left_sublist_index += 1
        sorted_index += 1
    while right_sublist_index < len(right_sublist):
        list1[sorted_index] = right_sublist[right_sublist_index]
        right_sublist_index += 1
        sorted_index += 1

def merge_sort(list1, left_index, right_index):
    if left_index >= right_index:
        return
    middle = (left_index + right_index) // 2
    merge_sort(list1, left_index, middle)       # sort the left half
    merge_sort(list1, middle + 1, right_index)  # sort the right half
    merge(list1, left_index, right_index, middle)

list1 = [44, 65, 2, 3, 58, 14, 57, 23, 10, 1, 7, 74, 48]
merge_sort(list1, 0, len(list1) - 1)
print(list1)

Output:
[1, 2, 3, 7, 10, 14, 23, 44, 48, 57, 58, 65, 74]
Quick Sort
Also called the partition-exchange algorithm, quick sort can in practice be two or three
times faster than merge sort and heap sort.
QuickSort is a divide-and-conquer algorithm. It picks an element as a pivot and
partitions the given array around the picked pivot; a sketch follows the list below. There
are many different versions of quickSort that pick the pivot in different ways:
1. Always pick the first element as a pivot
2. Always pick the last element as a pivot
3. Pick a random element as a pivot
4. Pick median as a pivot
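No quicksort listing appears in the original text at this point; the following is a
minimal sketch, assuming option 2 above (last element as pivot) and Lomuto partitioning,
with an illustrative input array:

def partition(arr, low, high):
    pivot = arr[high]   # last element chosen as the pivot
    i = low - 1         # boundary of the region of elements <= pivot
    for j in range(low, high):
        if arr[j] <= pivot:
            i += 1
            arr[i], arr[j] = arr[j], arr[i]
    # Place the pivot between the smaller and larger elements
    arr[i + 1], arr[high] = arr[high], arr[i + 1]
    return i + 1

def quick_sort(arr, low, high):
    if low < high:
        p = partition(arr, low, high)   # pivot ends up at index p
        quick_sort(arr, low, p - 1)     # sort elements left of the pivot
        quick_sort(arr, p + 1, high)    # sort elements right of the pivot

arr = [10, 7, 8, 9, 1, 5]   # sample array for illustration
quick_sort(arr, 0, len(arr) - 1)
print("Sorted array:", arr)   # [1, 5, 7, 8, 9, 10]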
Searching Algorithms
Searching is a very basic necessity when you store data in different data structures.
The simplest approach is to go across every element in the data structure and match it
with the value you are searching for. This is known as linear search.
It is inefficient for large data and rarely used there, but creating a program for it gives
an idea of how we can implement more advanced search algorithms.
o Linear Search
o Binary Search
Linear Search
Linear search is a method of finding elements within a list. It is also called a
sequential search.
It is the simplest searching algorithm because it searches the desired element in a
sequential manner.
Let's understand the following steps to find the element key = 7 in the given list.
Step 1: Start the search from the first element and check key = 7 against each element of
the list x.
LinearSearch(list, key)
    for each item in the list
        if item == key
            return its index position
    return -1
Python Program
def linear_Search(list1, n, key):
    # Traverse the list sequentially, comparing each element with the key
    for i in range(0, n):
        if list1[i] == key:
            return i
    return -1

list1 = [1, 3, 5, 4, 7, 9]
key = 7
n = len(list1)
res = linear_Search(list1, n, key)
if res == -1:
    print("Element not found")
else:
    print("Element found at index:", res)

Output:
Element found at index: 4
Explanation:
In the above code, we have created a function linear_Search(), which takes three
arguments: list1, the length of the list, and the number to search for. We defined a for
loop that iterates over each element and compares it with the key value. If the element
is found, we return its index; otherwise we return -1, which means the element is not
present in the list.
The linear search algorithm is suitable for smaller lists (fewer than about 100 elements)
because it checks every element to find the desired number. If a list has 10,000 elements
and the desired element is at the last position, comparing against each element of the
list will consume considerable time.
Binary Search
A binary search is an algorithm to find a particular element in the list. Suppose we have a list
of thousand elements, and we need to get an index position of a particular element. We can
find the element's index position very fast using the binary search algorithm.
There are many searching algorithms, but binary search is the most popular among them.
The elements in the list must be sorted to apply the binary search algorithm. If the
elements are not sorted, sort them first.
Binary search follows the divide-and-conquer approach. In the recursive method, a
function calls itself again and again until it finds the element in the list.
In the iterative method, a set of statements is repeated multiple times to find the
element's index position; a while loop is used to accomplish this task.
Binary search is more efficient than linear search because we do not need to examine
every list index. The list must be sorted to apply the binary search algorithm.
Suppose we have a sorted list of elements, and we are looking for the index position of 45.
We set two pointers into our list: one pointer denotes the smaller bound, called low,
and the second pointer denotes the higher bound, called high.
Now we compute the middle index, mid, and compare the searched element with the value at
mid. In this case, the mid value 32 is not equal to 45, so we need further comparison to
find the element.
If the number we are searching for is equal to the value at mid, we return mid; otherwise
we move on to further comparison.
If the number to be searched is greater than the middle number, we continue with the
elements on the right side of mid by setting low = mid + 1. Otherwise, we continue with
the elements on the left side of mid by setting high = mid - 1.
Python Implementation
def binary_search(list1, n):
    low = 0
    high = len(list1) - 1
    while low <= high:
        mid = (low + high) // 2    # middle index of the current range
        if list1[mid] < n:
            low = mid + 1          # search the right half
        elif list1[mid] > n:
            high = mid - 1         # search the left half
        else:
            return mid
    return -1                      # element not present

# Sample sorted list (chosen so the first mid value is 32, as in the walkthrough above)
list1 = [5, 11, 20, 32, 41, 45, 60]
n = 45
result = binary_search(list1, n)
if result != -1:
    print("Element is present at index", str(result))
else:
    print("Element is not present in list1")

Output:
Element is present at index 5

Explanation:
In the above code, binary_search() repeatedly halves the search range: it compares n with
the middle element and moves low or high accordingly until the element is found or the
range becomes empty.
Example Program: 2
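The original second listing is missing here; below is a minimal sketch of the recursive
approach described earlier, reusing the same illustrative list:

def binary_search_recursive(list1, low, high, n):
    if low > high:
        return -1                  # element not present
    mid = (low + high) // 2
    if list1[mid] == n:
        return mid
    elif list1[mid] < n:           # search the right half
        return binary_search_recursive(list1, mid + 1, high, n)
    else:                          # search the left half
        return binary_search_recursive(list1, low, mid - 1, n)

list1 = [5, 11, 20, 32, 41, 45, 60]
n = 45
result = binary_search_recursive(list1, 0, len(list1) - 1, n)
if result != -1:
    print("Element is present at index", result)
else:
    print("Element is not present in list1")

OUTPUT
Element is present at index 5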
HASHING
Hashing is a data structure that is used to store a large amount of data, which can be
accessed in O(1) time by operations such as search, insert and delete. Various Applications
of Hashing are:
Indexing in database
Cryptography
Symbol Tables in Compiler/Interpreter
Dictionaries, caches, etc.
Hashing is an important data structure designed around a special function called
the hash function, which is used to map a given value to a particular key for faster
access to elements.
The efficiency of the mapping depends on the efficiency of the hash function used.
Example:
h(large_value) = large_value % m
Here, h() is the required hash function and 'm' is the size of the hash table. For large
values, the hash function produces a value within the given range.
How Does a Hash Function Work?
It should always map large keys to small keys.
It should always generate values between 0 to m-1 where m is the size of the hash table.
It should uniformly distribute large keys into hash table slots.
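As a small illustration of these properties, here is a sketch of the division-method hash
above; the table size m = 10 is an assumed value for the example:

m = 10  # assumed hash table size for illustration

def h(large_value):
    # Always maps a (possibly very large) key into the range 0..m-1
    return large_value % m

for key in (123456789, 42, 987654):
    print(key, "->", h(key))   # prints 9, 2, and 4 respectively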
Collision Handling
If we know the keys beforehand, then we can have perfect hashing. In perfect hashing,
we do not have any collisions. However, if we do not know the keys in advance, we can use
the following methods to handle collisions:
Chaining
Open Addressing (Linear Probing, Quadratic Probing, Double Hashing)
Chaining
While hashing, the hash function may lead to a collision, that is, two or more keys
mapped to the same value. Chaining handles such collisions. The idea is to make each cell
of the hash table point to a linked list of records that have the same hash function value.
Performance of Hashing
The performance of hashing is evaluated under the assumption that each key is equally
likely to be hashed to any slot of the hash table, where:
m = length of the hash table
n = total number of keys to be inserted into the hash table
A simple chained hash table can be implemented as follows (the table size of 7 matches
the "key mod 7" example later in this unit):

# Hash table with 7 slots, each holding a chain (list) of values
HashTable = [[] for _ in range(7)]

# Hash function: map a key to a slot index
def Hashing(keyvalue):
    return keyvalue % len(HashTable)

# Insert a value into the chain of its slot
def insert(HashTable, keyvalue, value):
    hash_key = Hashing(keyvalue)
    HashTable[hash_key].append(value)

# Display every slot and its chain
def display(hashTable):
    for i in range(len(hashTable)):
        print(i, end=" ")
        for j in hashTable[i]:
            print("-->", end=" ")
            print(j, end=" ")
        print()

for key in [50, 700, 76, 85, 92, 73, 101]:
    insert(HashTable, key, key)
display(HashTable)

Output:
0 --> 700
1 --> 50 --> 85 --> 92
2
3 --> 73 --> 101
4
5
6 --> 76
What is Collision?
Since a hash function gets us a small number for a key which is a big integer or string, there
is a possibility that two keys result in the same value. The situation where a newly inserted
key maps to an already occupied slot in the hash table is called collision and must be
handled using some collision handling technique.
What are the chances of collisions with a large table?
Collisions are very likely even if we have a big table to store keys. An important
observation here is the Birthday Paradox: with only 23 people, the probability that two
of them share a birthday is about 50%.
Separate Chaining:
Separate chaining is one of the most popular and commonly used techniques in order to
handle collisions.
The linked list data structure is used to implement this technique. So what happens is, when
multiple elements are hashed into the same slot index, then these elements are inserted into
a singly-linked list which is known as a chain.
Here, all those elements that hash into the same slot index are inserted into a linked list.
Now, we can use a key K to search in the linked list by just traversing it linearly. If
the intrinsic key for any entry is equal to K, it means that we have found our entry. If
we have reached the end of the linked list and have not found our entry, it means that
the entry does not exist. Hence, the conclusion is that in separate chaining, if two
different elements have the same hash value then we store both elements in the same
linked list, one after the other.
Example: Let us consider a simple hash function "key mod 7" and the sequence of keys
50, 700, 76, 85, 92, 73, 101. The idea of separate chaining is to implement each slot
of the array as a linked list, called a chain: here 50, 85, and 92 all map to slot 1 and
form one chain, while 73 and 101 form a chain at slot 3.
Advantages:
Simple to implement.
Hash table never fills up, we can always add more elements to the chain.
Less sensitive to the hash function or load factors.
It is mostly used when it is unknown how many and how frequently keys may be
inserted or deleted.
Disadvantages:
The cache performance of chaining is not good as keys are stored using a linked list.
Open addressing provides better cache performance as everything is stored in the same
table.
Wastage of Space (Some Parts of the hash table are never used)
If the chain becomes long, then search time can become O(n) in the worst case
Uses extra space for links
Performance of Chaining:
Performance of hashing can be evaluated under the assumption that each key is equally
likely to be hashed to any slot of the table (simple uniform hashing).
m = Number of slots in hash table
n = Number of keys to be inserted in hash table
Load factor α = n/m
Expected time to search = O(1 + α)
Expected time to delete = O(1 + α)
Open Addressing:
Like separate chaining, open addressing is a method for handling collisions.
In Open Addressing, all elements are stored in the hash table itself.
So at any point, the size of the table must be greater than or equal to the total
number of keys (Note that we can increase table size by copying old data if
needed).
This approach is also known as closed hashing.
Insert(k): Keep probing until an empty slot is found. Once an empty slot is found,
insert k.
Search(k): Keep probing until either the slot's key becomes equal to k or an empty
slot is reached.
Delete(k): The delete operation is interesting. If we simply empty a key's slot, a later
search may fail. So slots of deleted keys are marked specially as "deleted".
Insert can reuse a deleted slot, but search does not stop at a
deleted slot.
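A minimal sketch of these three operations, assuming integer keys hashed with key % table
size, linear probing, and an illustrative DELETED sentinel:

DELETED = object()  # sentinel marking a slot whose key was deleted

def insert(table, k):
    # Probe until an empty or deleted slot is found, then place k there
    for i in range(len(table)):
        idx = (k + i) % len(table)
        if table[idx] is None or table[idx] is DELETED:
            table[idx] = k
            return idx
    raise RuntimeError("hash table is full")

def search(table, k):
    # Stop at a truly empty slot, but keep probing past DELETED slots
    for i in range(len(table)):
        idx = (k + i) % len(table)
        if table[idx] is None:
            return -1
        if table[idx] == k:
            return idx
    return -1

def delete(table, k):
    idx = search(table, k)
    if idx != -1:
        table[idx] = DELETED  # mark rather than empty the slot

Marking rather than emptying the slot keeps probe sequences intact for keys that were
inserted after k, which is exactly why search must not stop at a deleted slot.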
Different ways of Open Addressing:
1. Linear Probing:
In linear probing, the hash table is searched sequentially, starting from the original
hash location. If the location that we get is already occupied, then we check
the next location.
The probing function used is: rehash(key) = (n + 1) % table_size, where n is the index
just tried. The typical gap between two probes is therefore 1, as seen in the example below:
Let hash(x) be the slot index computed using a hash function and S be the table size
If slot hash(x) % S is full, then we try (hash(x) + 1) % S
If (hash(x) + 1) % S is also full, then we try (hash(x) + 2) % S
If (hash(x) + 2) % S is also full, then we try (hash(x) + 3) % S
…………………………………………..
…………………………………………..
Let us consider a simple hash function "key mod 7" and a sequence of keys 50, 700,
76, 85, 92, 73, 101.
Challenges in Linear Probing :
Primary Clustering:
One of the problems with linear probing is primary clustering: many consecutive
elements form groups, and it then starts taking longer to find a free slot or to search
for an element.
Secondary Clustering:
Secondary clustering is less severe: two records have the same collision chain
(probe sequence) only if their initial position is the same.
Example:
Let us consider a simple hash function "key mod 5" and a sequence of keys to be
inserted: 50, 70, 76, 93.
Step 1: First draw the empty hash table, which has a possible range of hash values
from 0 to 4 according to the hash function provided.
Step 2: Now insert all the keys into the hash table one by one. The first key is 50. It
maps to slot number 0 because 50 % 5 = 0, so insert it into slot number 0.
Step 3: The next key is 70. It also maps to slot number 0 because 70 % 5 = 0, but 50 is
already at slot number 0, so search for the next empty slot (slot 1) and insert it there.
Step 4: The next key is 76. It maps to slot number 1 because 76 % 5 = 1, but 70 is
already at slot number 1, so search for the next empty slot (slot 2) and insert it there.
Step 5: The next key is 93. It maps to slot number 3 because 93 % 5 = 3, so insert it
into slot number 3.
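The insertion steps above can be checked with a short sketch; the table size of 5 and the
key sequence come from the example:

def insert_linear_probing(table, key):
    size = len(table)
    index = key % size              # initial slot: key mod table size
    for i in range(size):
        probe = (index + i) % size  # step to the next slot on collision
        if table[probe] is None:
            table[probe] = key
            return probe
    raise RuntimeError("hash table is full")

table = [None] * 5
for key in [50, 70, 76, 93]:
    insert_linear_probing(table, key)
print(table)   # [50, 70, 76, 93, None]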
2. Quadratic Probing
Quadratic probing is a method that helps to reduce the clustering problem.
In this method, we look for the (i²)-th slot in the i-th iteration.
We always start from the original hash location; if that location is occupied,
we check the other slots.
Example: Consider inserting 50 with the hash function "key mod 7", where slots 1 and 2
are already occupied.
Hash(50) = 50 % 7 = 1
In our hash table slot 1 is already occupied, so we search slot (1 + 1²) = 2.
Slot 2 is also found occupied, so we search slot (1 + 2²) = 5.
Now, slot 5 is not occupied, so we place 50 in slot 5.
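A sketch of quadratic-probing insertion matching the example above; the keys 15 and 22
occupying slots 1 and 2 are assumed for illustration:

def insert_quadratic(table, key):
    size = len(table)
    base = key % size
    # Try slots base + i*i (mod size) for i = 0, 1, 2, ...
    for i in range(size):
        probe = (base + i * i) % size
        if table[probe] is None:
            table[probe] = key
            return probe
    raise RuntimeError("no free slot found")

table = [None] * 7
table[1], table[2] = 15, 22          # assumed keys already occupying slots 1 and 2
print(insert_quadratic(table, 50))   # probes slots 1, 2, then 5 -> returns 5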
S.No. | Separate Chaining | Open Addressing
2. | In chaining, the hash table never fills up; we can always add more elements to the chain. | In open addressing, the table may become full.
3. | Chaining is less sensitive to the hash function or load factor. | Open addressing requires extra care to avoid clustering and a high load factor.
LOAD FACTOR
The load factor is defined as (m/n), where m is the number of entries that can be
inserted before an increase in the size of the underlying data structure is required and
n is the total size of the hash table.
The hash table provides constant-time insertion and searching, provided the hash
function is able to distribute the input load evenly.
That is because if each element is at a different index, we can directly calculate the
hash and locate the element at that index; but in the case of collisions, the time
complexity can go up to O(N) in the worst case, as we might need to traverse other
elements while comparing against the element we need to search.
The load factor in hashing is a measure that decides when exactly to increase the size
of the hash table in order to maintain the same time complexity of O(1).
1. If there are 16 elements in the HashTable, the hash function will distribute
one element in each index. Searching for any item, in this case, will take
only one lookup.
2. If there are 32 elements in the HashTable, the hash function will distribute
two elements in each index. Searching for any item, in this case, will take a
maximum of two lookups.
3. Similarly, if there are 128 elements in the HashTable, the hash function will
distribute eight elements in each index. Searching for any item, in this case,
will take a maximum of eight lookups.
Initial Capacity
The initial capacity is the number of Indexes allocated in the HashTable. It is created when
the HashTable is initialized.
The capacity of the HashTable is doubled each time it reaches the threshold.
If you recall, chaining is one of the collision resolution techniques, and hash tables
usually use chaining. So, in case the same index is generated for multiple keys, all
these elements are stored against the same index in the form of a chain. One index can
therefore store multiple elements, and for this reason the chain of elements at an index
is also referred to as a "bucket".
Buckets are the groups (or chains) of elements whose hash indexes, generated from the
hash function, are the same.
E.g. if we have initialized the HashTable with initial capacity of 16, then the hash function
will make sure the key-value pairs will be distributed among 16 indexes equally, thus each
bucket will carry as few elements as possible.
The Load factor is a measure that decides when to increase the HashTable capacity to
maintain the search and insert operation complexity of O(1).
The default load factor of HashMap in Java, for instance, is 0.75f (75% of the map size).
That means if we have a HashTable with an array size of 100, then whenever 75 elements
are stored, we will increase the size of the array to double its previous size, i.e. to
200 in this case.
The Load Factor decides “when to increase the size of the hash Table.”
The load factor can be decided using the following formula:
Load Factor = m / n
Where:
m is the number of entries in the HashTable
n is the total size of the HashTable
We insert the first element, then check whether we need to increase the HashTable
capacity. In this case, the size of the hashmap is 1 and the bucket size is 16, so
1/16 = 0.0625. Comparing this value with the default load factor:
0.0625 < 0.75
so there is no need to increase the capacity. After the 12th insertion,
12/16 = 0.75
which still does not exceed the threshold. As soon as we insert the 13th element into the
hashmap, its size is increased because:
13/16 = 0.8125
0.8125 > 0.75
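The arithmetic above can be reproduced with a short sketch, using the capacity of 16 and
the 0.75 threshold from the example:

capacity = 16
threshold = 0.75
for num_entries in (1, 12, 13):
    load = num_entries / capacity
    action = "resize" if load > threshold else "no resize"
    print(num_entries, "entries -> load factor", load, "->", action)
# 1 entries -> load factor 0.0625 -> no resize
# 12 entries -> load factor 0.75 -> no resize
# 13 entries -> load factor 0.8125 -> resize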
REHASHING
Rehashing means hashing again. Basically, when the load factor increases beyond its
pre-defined value (e.g. 0.75, as taken in the examples above), the time complexity of
search and insert increases.
To overcome this, the size of the array is increased (usually doubled) and all the values
are hashed again and stored in the new, double-sized array, to maintain a low load factor
and low complexity.
This means that if we had an array of size 100 earlier, and we have stored 75 elements
into it (given a load factor of 0.75), then when we need to store the 76th element, we
double the size to 200.
With the new size, the result of the hash function can change, which means the 75
elements we stored earlier might now hash to different indexes. So we rehash all those
stored elements with the new hash function and place them at their new indexes in the
newly resized, bigger HashTable.
Why Rehashing?
Rehashing is done because whenever key-value pairs are inserted into the map, the load
factor increases, which implies that the time complexity also increases, as explained
above. This might not give the required time complexity of O(1). Hence, rehashing must
be done, increasing the size of the bucket array so as to reduce the load factor and the
time complexity.
For example, suppose a hash table of size 3 holds the keys 100, 101, and 102, and the
hash function used is the division method: Key % ArraySize. To add a 4th element, we
need to increase the size to 6 now.
But after the size is increased, the hashes of the existing elements may no longer be
the same: the earlier hash function was Key % 3, and now it is Key % 6.
If the index used at insertion time differs from the index we would calculate now, then
we cannot find the element.
E.g. 100 was inserted at index 1 (100 % 3 = 1), but when we search for it in this new
hash table of size 6, we calculate its hash as 100 % 6 = 4. But 100 is not at index 4;
it is still at index 1.
So we need the rehashing technique, which rehashes all the elements already stored using
the new hash function.
Element1: Hash(100) = 100 % 6 = 4, so Element1 is rehashed and stored at index 4 in the
newly resized HashTable, instead of index 1 (100 % 3) in the previous HashTable.
Element2: Hash(101) = 101 % 6 = 5, so Element2 is rehashed and stored at index 5 in the
newly resized HashTable, instead of index 2 in the previous HashTable.
Element3: Hash(102) = 102 % 6 = 0, so Element3 happens to stay at index 0, the same
index it had in the previous HashTable (102 % 3 = 0).
Since the load factor is now 3/6 = 0.5, we can insert the 4th element.
Element4: Hash(103) = 103 % 6 = 1, so Element4 is stored at index 1 of the newly
resized HashTable.
Rehashing Steps (a minimal sketch follows this list):
1. For each addition of a new entry to the map, check the current load factor.
2. If it is greater than its pre-defined value, then rehash.
3. For rehashing, make a new array of double the previous size and make it the new
bucket array.
4. Then traverse each element in the old bucket array and insert it into the new,
larger bucket array.
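A minimal sketch of these steps, assuming a chained table of integer keys; the size-3
table with keys 100, 101, 102 comes from the example above:

def rehash(old_table):
    # Step 3: new bucket array of double the previous size
    new_table = [[] for _ in range(2 * len(old_table))]
    # Step 4: re-insert every key using the new table size
    for bucket in old_table:
        for key in bucket:
            new_table[key % len(new_table)].append(key)
    return new_table

table = [[] for _ in range(3)]
for key in [100, 101, 102]:
    table[key % 3].append(key)   # 100 -> slot 1, 101 -> slot 2, 102 -> slot 0
table = rehash(table)
print(table)   # [[102], [], [], [], [100], [101]]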
However, it must be noted that if you are going to store a really large number of elements in
the HashTable then it is always good to create a HashTable with sufficient capacity upfront
as this is more efficient than letting it perform automatic rehashing.
------------------------------UNIT 3 COMPLETED----------------------