DS Unit 2 ECE

The document outlines various searching and sorting algorithms including Linear Search, Binary Search, Insertion Sort, Selection Sort, Quick Sort, Merge Sort, and Heap Sort. It explains the principles, steps, and algorithms for each method, highlighting their efficiency and application in data processing. Additionally, it categorizes sorting into internal and external types based on data accommodation in memory.

Unit – II Syllabus

Searching: List Searches, Linear and Binary Search Methods.


Sorting: Selection Sort, Insertion Sort, Quick Sort, Merge Sort, Heap sort.
Hashing: Hash Function, Separate Chaining, Collision Resolution-Separate Chaining.
SEARCHING
• Searching means to find whether a particular value is present in an array or not.
• If the value is present in the array, then searching is said to be successful and the searching process
gives the location of that value in the array.
• However, if the value is not present in the array, the searching process displays an appropriate
message and in this case searching is said to be unsuccessful.
The two common searching techniques are linear search and binary search.

LINEAR SEARCH:

• Linear search is a technique which traverses the array sequentially to locate the given item or search
element.
• In linear search, we access each element of the array one by one sequentially and see whether it
is the desired element or not. We traverse the entire list and match each element of the list with the
item whose location is to be found. If a match is found, then the location of the item is returned;
otherwise the algorithm returns NULL.
• If a search is successful, it will return the location of the desired element.
• A search is unsuccessful if all the elements are accessed and the desired element is not found.
• Linear search is mostly used to search an unordered list in which the items are not sorted.
Linear search is implemented using following steps...
Step 1 - Read the search element from the user.
Step 2 - Compare the search element with the first element in the list.
Step 3 - If both are matched, then display "Given element is found!!!" and terminate the function.
Step 4 - If both are not matched, then compare search element with the next element in the list.
Step 5 - Repeat steps 3 and 4 until search element is compared with last element in the list.
Step 6 - If last element in the list also doesn't match, then display "Element is not found!!!" and
terminate the function.
Example:
Consider the following list of elements and the element to be searched..
BINARY SEARCH:

• Binary search is a search technique which works efficiently on sorted lists. Hence, in order
to search for an element in a list using the binary search technique, we must ensure that the list
is sorted.
• Binary search follows the divide and conquer approach, in which the list is divided into two halves
and the item is compared with the middle element of the list. If a match is found, then the
location of the middle element is returned; otherwise, we search in one of the two halves depending
upon the result produced by the comparison.
Algorithm:
Step 1 - Read the search element from the user.
Step 2 - Find the middle element in the sorted list.
Step 3 - Compare the search element with the middle element in the sorted list.
Step 4 - If both are matched, then display "Given element is found!!!" and terminate the function.
Step 5 - If both are not matched, then check whether the search element is smaller or larger than
the middle element.
Step 6 - If the search element is smaller than middle element, repeat steps 2, 3, 4 and 5 for the left
sublist of the middle element.
Step 7 - If the search element is larger than middle element, repeat steps 2, 3, 4 and 5 for the right
sublist of the middle element.
Step 8 - Repeat the same process until we find the search element in the list or until sublist
contains only one element.
Step 9 - If that element also doesn't match with the search element, then display "Element is not
found in the list!!!" and terminate the function.
Example:
Example 2:
SORTINGS:

• Definition: Sorting is a technique to rearrange a list of records (elements) in either ascending
or descending order. Sorting is performed according to some key value of each record.
Categories of Sorting:
The sorting can be divided into two categories. These are:
• Internal Sorting
• External Sorting

• Internal Sorting: used when all the data to be sorted can be accommodated at one time in the
main memory (usually RAM). Internal sorting has five different classifications: insertion,
selection, exchanging, merging, and distribution sorts.

• External Sorting: used when all the data to be sorted cannot be accommodated in the memory
(usually RAM) at the same time and some of it has to be kept in auxiliary storage such as a hard disk
or magnetic tape.
Ex: Natural, Balanced, and Polyphase merge sorts.
INSERTION SORT:

• In insertion sort the list is divided into two parts: one is the sorted list and the other is the
unsorted list. In each pass, the first element of the unsorted list is transferred to the sorted list
by inserting it in the appropriate position or proper place.
• The similarity can be understood from the way we arrange a deck of cards. This sort works on
the principle of inserting an element at a particular position, hence the name Insertion Sort.
Following are the steps involved in insertion sort:
1. We start by taking the second element of the given array, i.e. the element at index 1, as the key.
The key element here is the new card that we need to add to our existing sorted set of cards.
2. We compare the key element with the element(s) before it, in this case the element at index 0:
o If the key element is less than the first element, we insert the key element before the first element.
o If the key element is greater than the first element, then we insert it after the first element.
3. Then, we make the third element of the array the key, compare it with the elements to its left,
and insert it at the proper position.
4. And we go on repeating this until the array is sorted.
Example 1:
Example 2:

SELECTION SORT:

• Given a list of data to be sorted, we simply select the smallest item and place it in a sorted list.
These steps are then repeated until we have sorted all of the data.
• In the first step, the smallest element is searched for in the list; once the smallest element is
found, it is exchanged with the element in the first position.
• Now the list is divided into two parts: one is the sorted list, the other is the unsorted list.
Find the smallest element in the unsorted list and exchange it with the element at the starting
position of the unsorted list; after that it is added to the sorted list.
• This process is repeated until all the elements are sorted.
Ex: the way one sorts a list on paper when asked to.
Algorithm:
SELECTION SORT(ARR, N)
Step 1: Repeat Steps 2 and 3 for K = 1 to N-1
Step 2: CALL SMALLEST(ARR, K, N, Loc)
Step 3: SWAP ARR[K] with ARR[Loc]
Step 4: EXIT
Algorithm for finding the minimum element in the list:
SMALLEST(ARR, K, N, Loc)
Step 1: [INITIALIZE] SET Min = ARR[K]
Step 2: [INITIALIZE] SET Loc = K
Step 3: Repeat for J = K+1 to N
            IF Min > ARR[J]
                SET Min = ARR[J]
                SET Loc = J
            [END OF IF]
        [END OF LOOP]
Step 4: RETURN Loc
Example 1:
Example 2: Consider the elements 23,78,45,88,32,56

Time Complexity:
Number of elements in the array: N.
Number of passes required to sort: N-1.
Number of comparisons in each pass: N-1 in the 1st pass, N-2 in the 2nd pass, and so on.
Time required for complete sorting:
T(N) = (N-1) + (N-2) + ... + 1 = N(N-1)/2 <= (N-1)^2
Finally, the time complexity is O(N^2).
QUICK SORT:

Quick sort follows the Divide and Conquer approach. It divides the array into smaller parts based on
partitioning and performs the sort operations on those divided smaller parts. Hence, it works well for
large datasets.
So, here are the steps of how Quick sort works in simple words:
1. First select an element which is to be called the pivot element.
2. Next, compare all array elements with the selected pivot element and arrange them in such a way
that elements less than the pivot element are to its left and elements greater than the pivot are to its right.
3. Finally, perform the same operations on the elements to the left and right of the pivot element.

How does Quick Sort Partitioning Work


1. First find the "pivot" element in the array.
2. Start the left pointer at first element of the array.
3. Start the right pointer at last element of the array.
4. Compare the element at the left pointer with the pivot; if it is less than the pivot element, move
the left pointer to the right (add 1 to the left index). Continue this until the left-side element is
greater than or equal to the pivot element.
5. Compare the element at the right pointer with the pivot; if it is greater than the pivot element,
move the right pointer to the left (subtract 1 from the right index). Continue this until the right-side
element is less than or equal to the pivot element.
6. If the left pointer is less than or equal to the right pointer, swap the elements at the locations of
these pointers.
7. If the index of the left pointer is greater than the index of the right pointer, swap the pivot element
with the element at the right pointer.
Example:

Algorithm:
quickSort(array, lb, ub)
{
    if (lb < ub)
    {
        pivotIndex = partition(array, lb, ub);
        quickSort(array, lb, pivotIndex - 1);
        quickSort(array, pivotIndex + 1, ub);
    }
}
MERGE SORT
Merge sort is one of the most efficient sorting algorithms. It works on the principle of Divide and
Conquer. Merge sort repeatedly breaks down a list into several sublists until each sublist consists of a
single element, then merges those sublists in a manner that results in a sorted list.

Implementation of Recursive Merge Sort:

The merge sort starts at the top and proceeds downwards: "split the array into two, make a recursive
call, and merge the results", until one gets to the bottom of the array-tree.
Example: Let us consider an example to understand the approach better.
1. Divide the unsorted list into n sub-lists based on the mid value, each sub-list consisting of 1 element.
2. Repeatedly merge sub-lists to produce newly sorted sub-lists until there is only 1 sub-list
remaining. This will be the sorted list.
Recursive Merge Sort Example:

Example 2:
MergeSort Algorithm:
MergeSort(A, lb, ub)
{
    if (lb < ub)
    {
        mid = floor((lb + ub) / 2);
        MergeSort(A, lb, mid);
        MergeSort(A, mid + 1, ub);
        Merge(A, lb, mid, ub);
    }
}

Two-Way Merge Sort:

Merge Algorithm:
Step 1: Set i, j, k = 0.
Step 2: While elements remain in both A and B:
        if A[i] < B[j], copy A[i] to C[k] and increment i and k;
        else copy B[j] to C[k] and increment j and k.
Step 3: Copy the remaining elements of either A or B into array C.
HEAP SORT
Heap sort processes the elements by creating the min-heap or max-heap using the elements of the given array.
Min-heap or max-heap represents the ordering of array in which the root element represents the minimum or
maximum element of the array.

Heap sort basically performs two main operations -

o Build a heap H using the elements of the array.

o Repeatedly delete the root element of the heap formed in the 1st phase.

What is a heap?

A heap is a complete binary tree. A binary tree is a tree in which each node can have at most two
children. A complete binary tree is a binary tree in which all the levels except the last level (the leaf
level) are completely filled, and all the nodes in the last level are left-justified.

What is heap sort?

Heapsort is a popular and efficient sorting algorithm. The concept of heap sort is to eliminate the elements one
by one from the heap part of the list, and then insert them into the sorted part of the list.

Algorithm
HeapSort(arr)
    BuildMaxHeap(arr)
    for i = length(arr) downto 2
        swap arr[1] with arr[i]
        heap_size[arr] = heap_size[arr] - 1
        MaxHeapify(arr, 1)
End

BuildMaxHeap(arr)
    heap_size[arr] = length(arr)
    for i = length(arr)/2 downto 1
        MaxHeapify(arr, i)
End

MaxHeapify(arr, i)
    L = left(i)
    R = right(i)
    if L <= heap_size[arr] and arr[L] > arr[i]
        largest = L
    else
        largest = i
    if R <= heap_size[arr] and arr[R] > arr[largest]
        largest = R
    if largest != i
        swap arr[i] with arr[largest]
        MaxHeapify(arr, largest)
End
Working of Heap sort Algorithm

Now, let's see the working of the Heapsort Algorithm.

In heap sort, there are basically two phases involved in the sorting of elements. They are as follows -

o The first step includes the creation of a heap by adjusting the elements of the array.
o After the creation of the heap, repeatedly remove the root element of the heap by shifting it to the end
of the array, and then restore the heap structure with the remaining elements.

Now let's see the working of heap sort in detail by using an example. To understand it more clearly, let's take an
unsorted array and try to sort it using heap sort. It will make the explanation clearer and easier.

First, we have to construct a heap from the given array and convert it into max heap.

After converting the given heap into max heap, the array elements are -
Next, we have to delete the root element (89) from the max heap. To delete this node, we have to swap it with
the last node, i.e. (11). After deleting the root element, we again have to heapify it to convert it into max heap.

After swapping the array element 89 with 11, and converting the heap into max-heap, the elements of array are -

In the next step, again, we have to delete the root element (81) from the max heap. To delete this node, we have
to swap it with the last node, i.e. (54). After deleting the root element, we again have to heapify it to convert it
into max heap.

After swapping the array element 81 with 54 and converting the heap into max-heap, the elements of array are -
In the next step, we have to delete the root element (76) from the max heap again. To delete this node, we have
to swap it with the last node, i.e. (9). After deleting the root element, we again have to heapify it to convert it
into max heap.

After swapping the array element 76 with 9 and converting the heap into max-heap, the elements of array are -

In the next step, again we have to delete the root element (54) from the max heap. To delete this node, we have
to swap it with the last node, i.e. (14). After deleting the root element, we again have to heapify it to convert it
into max heap.

After swapping the array element 54 with 14 and converting the heap into max-heap, the elements of array are -

In the next step, again we have to delete the root element (22) from the max heap. To delete this node, we have
to swap it with the last node, i.e. (11). After deleting the root element, we again have to heapify it to convert it
into max heap.
After swapping the array element 22 with 11 and converting the heap into max-heap, the elements of array are -

In the next step, again we have to delete the root element (14) from the max heap. To delete this node, we have
to swap it with the last node, i.e. (9). After deleting the root element, we again have to heapify it to convert it
into max heap.

After swapping the array element 14 with 9 and converting the heap into max-heap, the elements of array are -

In the next step, again we have to delete the root element (11) from the max heap. To delete this node, we have
to swap it with the last node, i.e. (9). After deleting the root element, we again have to heapify it to convert it
into max heap.

After swapping the array element 11 with 9, the elements of array are -
Now, heap has only one element left. After deleting it, heap will be empty.

After completion of sorting, the array elements are -

Now, the array is completely sorted.

Time Complexities of All the Searching & Sorting Techniques:

Technique        Best         Average      Worst
Linear Search    O(1)         O(n)         O(n)
Binary Search    O(1)         O(log n)     O(log n)
Insertion Sort   O(n)         O(n^2)       O(n^2)
Selection Sort   O(n^2)       O(n^2)       O(n^2)
Quick Sort       O(n log n)   O(n log n)   O(n^2)
Merge Sort       O(n log n)   O(n log n)   O(n log n)
Heap Sort        O(n log n)   O(n log n)   O(n log n)

III/II YR CSE - CS8351/CS8391 DATA STRUCTURES

HASHING :
Hashing is a technique that is used to store, retrieve and find data in the data structure
called Hash Table. It is used to overcome the drawback of Linear Search (Comparison) &
Binary Search (Sorted order list). It involves two important concepts-
 Hash Table
 Hash Function
Hash table
A hash table is a data structure that is used to store and retrieve data (keys) very quickly.
It is an array of some fixed size, containing the keys.
Hash table slots run from 0 to Tablesize - 1.
Each key is mapped into some number in the range 0 to Tablesize - 1.
This mapping is called a hash function.
Insertion of the data in the hash table is based on the key value obtained from the
hash function.
Using the same hash key value, the data can be retrieved from the hash table by one
or more hash key comparisons.
The load factor of a hash table is calculated using the formula:

(Number of data elements in the hash table) / (Size of the hash table)

Factors affecting Hash Table Design

Hash function
Table size
Collision handling scheme

[Figure: a simple hash table with table size = 10, slots indexed 0 to 9]


Hash function:
It is a function which distributes the keys evenly among the cells in the Hash
Table.
Using the same hash function we can retrieve data from the hash table.
Hash function is used to implement the hash table.
The integer value returned by the hash function is called the hash key.
If the input keys are integers, the commonly used hash function is

H(key) = key % Tablesize

typedef unsigned int Index;

/* A simple hash function: sums the characters of the key string
   and reduces the sum modulo the table size. */
Index Hash(const char *key, int Tablesize)
{
    unsigned int Hashval = 0;
    while (*key != '\0')
        Hashval += *key++;
    return Hashval % Tablesize;
}



Types of Hash Functions



1. Division Method
2. Mid Square Method
3. Multiplicative Hash Function
4. Digit Folding
1. Division Method:
It depends on remainder of division.
Divisor is Table Size.
Formula is ( H ( key ) = key % table size )


E.g. consider the following data (records or keys) (36, 18, 72, 43, 6) with table size = 8:
H(36) = 36 % 8 = 4,  H(18) = 18 % 8 = 2,  H(72) = 72 % 8 = 0,  H(43) = 43 % 8 = 3,  H(6) = 6 % 8 = 6.
2. Mid Square Method:
We first square the item, and then extract some portion of the resulting digits. For
example, if the item were 44, we would first compute 44^2 = 1,936. Extracting the middle two digits
gives 93, so the key 44 is stored at index 93.

[Figure: key 44 stored in slot 93 of the hash table]
3. Multiplicative Hash Function:

The key is multiplied by some constant value.
The hash function is given by:
H(key) = Floor(P * (key * A))
P = Integer constant [e.g. P = 50]
A = Constant real number [A = 0.61803398987]; Donald Knuth suggested using this constant.


E.g. Key 107:

H(107) = Floor(50 * (107 * 0.61803398987))
       = Floor(3306.481845)
H(107) = 3306
If the table size is 5000 (indices 0 to 4999), key 107 is stored at index 3306.

4. Digit Folding Method:

The folding method for constructing hash functions begins by dividing the item into
equal-size pieces (the last piece may not be of equal size). These pieces are then added together
to give the resulting hash key value. For example, if our item was the phone number 436-555-4601,
we would take the digits and divide them into groups of 2 (43, 65, 55, 46, 01). After the
addition, 43 + 65 + 55 + 46 + 01, we get 210. If we assume our hash table has 11 slots, then we need
to perform the extra step of dividing by 11 and keeping the remainder. In this case 210 % 11 is 1,
so the phone number 436-555-4601 hashes to slot 1.

Collision:
If two or more keys hash to the same index, the corresponding records cannot be stored in the
same location. This condition is known as collision.
Characteristics of a Good Hashing Function:

 It should be simple to compute.
 The number of collisions should be low while placing records in the hash table.
 A hash function with no collisions is a perfect hash function.
 The hash function should produce keys which are distributed uniformly in the hash table.
 The hash function should depend upon every bit of the key. Thus a hash function that simply
extracts a portion of the key is not suitable.
Collision Resolution Strategies / Techniques (CRT):
If a collision occurs, it should be handled or overcome by applying some technique. Such a
technique is called a CRT.
There are a number of collision resolution techniques, but the most popular are:
 Separate chaining (Open Hashing)
 Open addressing (Closed Hashing)
   Linear Probing
   Quadratic Probing
   Double Hashing

Separate chaining (Open Hashing)

Open hashing technique.
Implemented using the singly linked list concept.
A pointer (ptr) field is added to each record.
When a collision occurs, a separate chain is maintained for the colliding data.
Elements are inserted at the front of the list.
H(key) = key % table size
There are two operations:
 Insert
 Find


Structure Definition for Node

typedef struct node *Position;
struct node                /* defines the nodes */
{
    int data;
    Position next;
};

Structure Definition for Hash Table

typedef Position List;
struct HashTbl             /* defines the hash table, which contains an array of linked lists */
{
    int Tablesize;
    List *TheLists;
};

Initialization of Hash Table for Separate Chaining

HashTable initialize(int Tablesize)
{
    HashTable H;
    int i;
    H = malloc(sizeof(struct HashTbl));                 /* allocates table */
    H->Tablesize = NextPrime(Tablesize);
    H->TheLists = malloc(sizeof(List) * H->Tablesize);  /* allocates array of lists */
    for (i = 0; i < H->Tablesize; i++)
    {
        H->TheLists[i] = malloc(sizeof(struct node));   /* allocates list headers */
        H->TheLists[i]->next = NULL;
    }
    return H;
}
Insert Routine for Separate Chaining

void insert(int key, HashTable H)
{
    /* inserts the element at the front of the list, always */
    Position P, newnode;
    List L;
    P = find(key, H);
    if (P == NULL)
    {
        newnode = malloc(sizeof(struct node));
        L = H->TheLists[Hash(key, H->Tablesize)];
        newnode->next = L->next;
        newnode->data = key;
        L->next = newnode;
    }
}

Find Routine for Separate Chaining

Position find(int key, HashTable H)
{
    Position P;
    List L;
    L = H->TheLists[Hash(key, H->Tablesize)];
    P = L->next;
    while (P != NULL && P->data != key)
        P = P->next;
    return P;
}
If two keys map to the same value, the elements are chained together.
Initial configuration of the hash table with separate chaining: here we use the SLL (Singly Linked
List) concept to chain the elements. Each of the 10 slots (0 to 9) initially holds a NULL pointer.

[Figure: empty hash table with slots 0-9, each pointing to NULL]

Insert the following four keys 22, 84, 35, 62 into a hash table of size 10 using separate chaining.
The hash function is
H(key) = key % 10
1. H(22) = 22 % 10 = 2        2. H(84) = 84 % 10 = 4
3. H(35) = 35 % 10 = 5        4. H(62) = 62 % 10 = 2 (collides with 22, so 62 is chained in
front of 22 at slot 2)


Advantages
1. More elements can be inserted, since the array of linked lists can grow.
Disadvantages
1. It requires more pointers, which occupy more memory space.
2. Search takes time, since it takes time to evaluate the hash function and also to traverse the
list.
Open Addressing (Closed Hashing)
A collision resolution technique.
Uses Hi(X) = (Hash(X) + F(i)) mod Tablesize.
When a collision occurs, alternative cells are tried until an empty cell is found.
Types:
 Linear Probing
 Quadratic Probing
 Double Hashing
Hash function
 H(key) = key % table size.
Insert Operation
 To insert a key, use the hash function to identify the cell in which the
element should be inserted.
 Then check whether the element is already present.
 If it exists, increment the count.
 Else place the new element in the identified cell, probing for an alternative cell on collision.

Linear Probing:
The easiest method to handle collisions.
Apply the hash function H(key) = key % table size.
Hi(X) = (Hash(X) + F(i)) mod Tablesize, where F(i) = i.
How to probe:
First probe: given a key k, hash to H(key).
Second probe: if H(key) + f(1) is occupied, try H(key) + f(2), and so forth.
Probing properties:
We force f(0) = 0.
The ith probe is to (H(key) + f(i)) % table size.
If i reaches size - 1, the probe has failed.
Depending on f(i), the probe may fail sooner.
Long sequences of probes are costly.
The probe sequence is:
H(key) % table size
(H(key) + 1) % table size
(H(key) + 2) % table size
...


1. H(key) = key mod Tablesize

This is the common formula that you should apply for any hashing.
If a collision occurs, use Formula 2.
2. H(key) = (H(key) + i) mod Tablesize
where i = 1, 2, 3, ... etc.
Example: 89, 18, 49, 58, 69; Tablesize = 10
1. H(89) = 89 % 10 = 9
2. H(18) = 18 % 10 = 8
3. H(49) = 49 % 10 = 9 (collides with 89, so try the next free cell using Formula 2)
   i=1: h1(49) = (H(49) + 1) % 10 = (9 + 1) % 10 = 10 % 10 = 0, so 49 is placed at index 0
4. H(58) = 58 % 10 = 8 (collides with 18)
   i=1: h1(58) = (H(58) + 1) % 10 = (8 + 1) % 10 = 9  => again collision
   i=2: h2(58) = (H(58) + 2) % 10 = (8 + 2) % 10 = 0  => again collision
   i=3: h3(58) = (H(58) + 3) % 10 = (8 + 3) % 10 = 1, so 58 is placed at index 1
5. H(69) = 69 % 10 = 9 (collides with 89)
   i=1: (9 + 1) % 10 = 0 => collision; i=2: (9 + 2) % 10 = 1 => collision;
   i=3: (9 + 3) % 10 = 2, so 69 is placed at index 2

Index | EMPTY | after 89 | after 18 | after 49 | after 58 | after 69
  0   |       |          |          |    49    |    49    |    49
  1   |       |          |          |          |    58    |    58
  2   |       |          |          |          |          |    69
 3-7  |       |          |          |          |          |
  8   |       |          |    18    |    18    |    18    |    18
  9   |       |    89    |    89    |    89    |    89    |    89

Linear probing

Quadratic Probing
To resolve the primary clustering problem, quadratic probing can be used. With quadratic
probing, rather than always moving one spot, move i^2 spots from the point of collision, where
i is the number of attempts made to resolve the collision.
It is another collision resolution method, one which distributes items more evenly.


From the original index H, if the slot is filled, try cells H+1^2, H+2^2, H+3^2, ..., H+i^2, with
wrap-around.
Hi(X) = (Hash(X) + F(i)) mod Tablesize, where F(i) = i^2
Hi(X) = (Hash(X) + i^2) mod Tablesize


Limitation: at most half of the table can be used as alternative locations to resolve collisions.
This means that once the table is more than half full, it is difficult to find an empty spot. This
new problem is known as secondary clustering, because elements that hash to the same hash
key will always probe the same alternative cells.

Double Hashing
Double hashing uses the idea of applying a second hash function to the key when a
collision occurs. The result of the second hash function will be the number of positions from
the point of collision at which to insert.
There are a couple of requirements for the second function:
It must never evaluate to 0.
It must make sure that all cells can be probed.
Hi(X) = (Hash(X) + i * Hash2(X)) mod Tablesize
A popular second hash function is:
Hash2(key) = R - (key % R), where R is a prime number that is smaller than the size of the
table.



Rehashing
Once the hash table gets too full, the running time for operations will start to take too
long and insertions may fail. To solve this problem, a table at least twice the size of the original
will be built and the elements will be transferred to the new table.

Advantages:
A programmer doesn't have to worry about the table size.
Simple to implement.
Can be used in other data structures as well.
The new size of the hash table:
should also be prime;
will be used to calculate the new insertion spot (hence the name rehashing).
This is a very expensive operation! O(N), since there are N elements to rehash and the
table size is roughly 2N. This is OK though, since it doesn't happen that often.
The question becomes: when should rehashing be applied?
Some possible answers:
once the table becomes half full
once an insertion fails


once a specific load factor has been reached, where load factor is the ratio of the
number of elements in the hash table to the table size
Extendible Hashing
Extendible Hashing is a mechanism for altering the size of the hash table to accommodate
new entries when buckets overflow.
A common strategy in internal hashing is to double the hash table and rehash each entry.
However, this technique is slow, because writing all pages to disk is too expensive.
Therefore, instead of doubling the whole hash table, we use a directory of pointers to
buckets, and double the number of buckets by doubling the directory, splitting just the
bucket that overflows.
Since the directory is much smaller than the file, doubling it is much cheaper. Only one
page of keys and pointers is split.

[Figure: extendible hashing example in which an overflowing bucket of 6-bit keys causes the
directory to double from entries {0, 1} to {00, 01, 10, 11}, splitting only the full bucket]
