
MODULE 5 - BCS304 - HASHING - Leftist Trees - OBST - Notes

Hashing is a data structure technique that maps data to a fixed-size table using a hash function for efficient retrieval. Key concepts include hash tables, hash functions, collisions, and load factors, with techniques like direct hashing, open hashing, and closed hashing for collision resolution. The document also discusses static and dynamic hashing, their characteristics, advantages, and disadvantages, along with applications in databases, caching, and cryptography.
Copyright © All Rights Reserved

MODULE 5

HASHING
Hashing in Data Structures

Hashing is a technique used to map data to a fixed-size table using a hash function. The primary
purpose of hashing is to enable efficient data retrieval in constant or near-constant time.

Key Concepts in Hashing

1. Hash Table:
o A data structure that stores key-value pairs.
o Keys are mapped to indices in the table using a hash function.
2. Hash Function:
o A function that converts a key into an index in the hash table.
o Ideal properties of a hash function:
▪ Uniform distribution of keys.
▪ Minimizes collisions.
3. Collisions:
o Occur when two keys map to the same index in the hash table.
o Collision resolution techniques are used to handle these situations.
4. Load Factor:
o The ratio of the number of stored keys to the size of the hash table.
o Helps determine when to resize the table to maintain efficiency.

Types of Hashing Techniques

1. Direct Hashing

• The hash function maps each key directly to its own table index.
• Example: index = key (when every key already lies in 0 to table_size-1,
this is the same as index = key % table_size).
• No collisions occur if keys are unique and within the table's range.
2. Open Hashing (Separate Chaining)

• Each index in the hash table stores a linked list (or another structure) of keys that map to
that index.
• Advantages:
o Handles collisions efficiently.
o Easy to implement.
• Disadvantages:
o Can lead to long chains if many keys collide.
o Requires additional memory for linked lists.

3. Closed Hashing (Open Addressing)

• Collisions are resolved by finding another open slot within the hash table.
• Common methods:
o Linear Probing: Increment the index sequentially until an empty slot is found.
▪ Pros: Simple to implement.
▪ Cons: Clustering can occur.
o Quadratic Probing: Increment the index using a quadratic function.
▪ Pros: Reduces clustering compared to linear probing.
▪ Cons: Still prone to secondary clustering.
o Double Hashing: Use a secondary hash function to compute the increment.
▪ Pros: Minimizes clustering.
▪ Cons: More complex to implement.

Applications of Hashing

1. Databases:
o Indexing for faster data retrieval.
2. Caching:
o Fast access to frequently used data.
3. Cryptography:
o Storing passwords securely.
4. Compiler Design:
o Symbol table implementation.
5. Networking:
o Hash-based routing in distributed systems.

Static and Dynamic Hashing in Data Structures


Hashing is used to store and retrieve data efficiently. Depending on the growth and
changes in the dataset, hashing is categorized into static and dynamic hashing.

1. Static Hashing
In static hashing, the size of the hash table is fixed and does not change after the
hash table is created.

Characteristics:

• The hash function remains constant.
• The table size is pre-determined, which may lead to either:
o Overflow: When the table becomes full.
o Underutilization: When the table is sparsely filled.
• Suitable for datasets with a known, fixed size.

Advantages:

1. Simple to implement.
2. Predictable memory requirements since the table size is fixed.

Disadvantages:

1. Collisions: Managing overflow due to fixed table size can be inefficient.
2. Wasted space when the table is underutilized.
3. Poor performance if the load factor becomes too high.

Collision Handling in Static Hashing:

• Open Addressing: Probes for the next available slot.
• Separate Chaining: Uses linked lists to handle collisions.

2. Dynamic Hashing
In dynamic hashing, the hash table grows or shrinks dynamically based on the
number of elements stored. This technique is useful when the dataset size is
unknown or can change significantly over time.

Characteristics:

• The hash function and table size adapt to the current load.
• Commonly used in database systems to handle large, varying datasets.
• Load Factor: When it exceeds a threshold, the table resizes.

Types of Dynamic Hashing:

1. Extendible Hashing:
o Uses a directory with pointers to buckets.
o The hash function produces a binary hash value, and the number of bits
used (the global depth) grows as the directory grows.
o If a bucket overflows, only that bucket is split; the directory size
doubles only when the bucket's local depth equals the global depth.
2. Linear Hashing:
o Expands the hash table incrementally without using a directory.
o Splits buckets as needed, based on a split pointer.
o Helps avoid sudden jumps in memory usage.

Advantages:

1. Handles dynamic datasets effectively.
2. Reduces wasted memory due to over-allocation.
3. Avoids overflow issues by resizing the hash table when needed.

Disadvantages:

1. Slightly more complex to implement compared to static hashing.
2. Resizing or splitting buckets can temporarily affect performance.

Comparison: Static vs Dynamic Hashing

Aspect             | Static Hashing                        | Dynamic Hashing
Hash Table Size    | Fixed                                 | Changes dynamically
Collision Handling | Requires explicit strategies          | Buckets split or table resizes
Efficiency         | Decreases as load factor increases    | Maintains efficiency as size adjusts
Memory Usage       | Wastes memory if underutilized        | Optimized for current data size
Complexity         | Easier to implement                   | More complex to manage
Use Cases          | Small datasets with predictable size  | Large, dynamic datasets (e.g., databases)
Example

A small phone book as a hash table.


(Figure is from Wikipedia)

Just An Idea

◼ Hash table:
◼ A collection of (key, value) pairs,
◼ plus a lookup function (the hash function).
◼ Hash tables are often used to implement
associative arrays.
◼ Worst-case time for Get, Insert, and Delete
is O(size).
◼ Expected time is O(1).
Search vs. Hashing
◼ Search tree methods: key comparisons
◼ Time complexity: O(size) or O(log n)
◼ Hashing methods: hash functions
◼ Expected time: O(1)
◼ Types
◼ Static hashing (section 8.2)
◼ Dynamic hashing (section 8.3)

Static Hashing
◼ Key-value pairs are stored in a fixed size table
called a hash table.
◼ A hash table is partitioned into many buckets.
◼ Each bucket has many slots.
◼ Each slot holds one record.
◼ A hash function f(x) transforms the identifier (key)
into an address in the hash table

STATIC HASHING WITH LINEAR PROBING

#include <stdio.h>
#include <stdlib.h>

#define MAX 100   /* number of slots; keys hash to num % MAX */

/* FUNCTION PROTOTYPES */
int create(int);
void linear_prob(int[], int, int);
void display(int[]);

int main(void)
{
    int a[MAX], num, key, i;
    int ans = 1;

    printf("Collision handling by linear probing:\n");
    for (i = 0; i < MAX; i++)
    {
        a[i] = -1;                     /* -1 marks an empty slot */
    }
    do
    {
        printf("\nEnter the data: ");
        if (scanf("%4d", &num) != 1)   /* keys of up to 4 digits */
            break;
        key = create(num);
        linear_prob(a, key, num);
        printf("\nDo you wish to continue? (1/0) ");
        if (scanf("%d", &ans) != 1)
            break;
    } while (ans);
    display(a);
    return 0;
}

/* Hash function (division method): home slot = num mod MAX.
   Assumes non-negative keys. */
int create(int num)
{
    int key;
    key = num % MAX;
    return key;
}

/* Store num at a[key]; on a collision, probe key+1, key+2, ...,
   wrapping around to slot 0, until an empty slot is found. */
void linear_prob(int a[MAX], int key, int num)
{
    int flag = 0, i, count = 0;

    if (a[key] == -1)
    {
        a[key] = num;
        return;
    }

    printf("\nCollision Detected...!!!\n");

    /* count occupied slots to detect a full table */
    for (i = 0; i < MAX; i++)
        if (a[i] != -1)
            count++;
    if (count == MAX)
    {
        printf("\nHash table is full\n");
        display(a);
        exit(1);
    }

    /* probe from key+1 to the end of the table */
    for (i = key + 1; i < MAX; i++)
        if (a[i] == -1)
        {
            a[i] = num;
            flag = 1;
            break;
        }

    /* wrap around: probe from slot 0 up to key-1 */
    for (i = 0; i < key && flag == 0; i++)
        if (a[i] == -1)
        {
            a[i] = num;
            flag = 1;
        }

    printf("Collision avoided successfully using LINEAR PROBING\n");
}

/* Show every slot (choice 1) or only the occupied slots (choice 2). */
void display(int a[MAX])
{
    int i, choice;
    printf("\n1. Display ALL\n2. Filtered Display\n");
    if (scanf("%d", &choice) != 1)
        return;
    printf("\nThe hash table is\n");
    for (i = 0; i < MAX; i++)
        if (choice == 1 || a[i] != -1)
            printf("\n%d %d", i, a[i]);
    printf("\n");
}

Ideal Hashing

◼ Uses an array table[0:b-1].
◼ Each position of this array is a bucket.
◼ A bucket can normally hold only one
dictionary pair.
◼ Uses a hash function f that converts
each key k into an index in the range [0,
b-1].
◼ Every dictionary pair (key, element) is
stored in its home bucket table[f(key)].

Some Issues

◼ Choice of hash function.
◼ Really tricky!
◼ To avoid collisions (two different pairs in
the same bucket).
◼ Size (number of buckets) of the hash table.
◼ Overflow handling method.
◼ Overflow: there is no space in the bucket for
the new pair.
Choice of Hash Function

◼ Requirements
◼ easy to compute
◼ minimal number of collisions
◼ If a hashing function groups key values
together, this is called clustering of the
keys.
◼ A good hashing function distributes the
key values uniformly throughout the
range.

Some hash functions

◼ Mid-square
◼ H(x):= return middle digits of x^2
◼ Division
◼ H(x):= return x % k
◼ Multiplicative:
◼ H(x):= return the first few digits of the
fractional part of x*k, where k is a fraction.
◼ advocated by D. Knuth in TAOCP vol. III.
Hashing By Division

◼ Domain is all integers.
◼ For a hash table of size b, the number of
integers that get hashed into bucket i is
approximately 2^32/b (assuming 32-bit integers).
◼ The division method results in a uniform
hash function that maps approximately
the same number of keys into each
bucket.

Criterion of Hash Table

◼ The key density (or identifier density) of
a hash table is the ratio n/T
◼ n is the number of keys in the table
◼ T is the number of distinct possible keys
◼ The loading density or loading factor of a
hash table is α = n/(sb)
◼ s is the number of slots per bucket
◼ b is the number of buckets
Overflow Handling
◼ An overflow occurs when the home bucket for
a new pair (key, element) is full.
◼ We may handle overflows by:
◼ Search the hash table in some systematic fashion
for a bucket that is not full.
◼ Linear probing (linear open addressing).
◼ Quadratic probing.
◼ Random probing.
◼ Eliminate overflows by permitting each bucket to
keep a list of all pairs for which it is the home
bucket.
◼ Array linear list.
◼ Chain.

Linear probing (linear open addressing)
◼ Open addressing stores all elements directly in the
hash table and resolves collisions by searching for
another free slot within the table.
◼ Linear probing resolves collisions by placing the
data into the next open slot in the table.
Dynamic Hashing (extendible hashing)
• In this hashing scheme the set of keys can vary, and the address space
is allocated dynamically.
– File F: a collection of records
– Record R: a key + data, stored in pages (buckets)
– Space utilization = NumberOfRecords / (NumberOfPages × PageCapacity)

Dynamic Hashing Using Directories II
◼ We need to consider some issues!
◼ Skewed tree,
◼ Increased access time.
◼ Fagin et al. proposed extendible hashing
to solve the above problems.
◼ Ronald Fagin, Jürg Nievergelt, Nicholas
Pippenger, and H. Raymond Strong, Extendible
Hashing - A Fast Access Method for Dynamic Files,
ACM Transactions on Database Systems, 4(3):315-344, 1979.
Dynamic Hashing Using Directories III
◼ A directory is a table of pointers to pages.
◼ The directory has k bits to index 2^k
entries.
◼ A hash function gives the address of a directory
entry, and that entry points to the page holding
the record.
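/* Write a C program to demonstrate insertion, deletion and display in a Max Priority Queue. */

#include <stdio.h>
#include <stdlib.h>

#define SIZE 5
int PQ[SIZE], front = 0, rear = -1;

/* PQ[front..rear] is kept in descending order, so the maximum
   is always at PQ[front]. */
void insert()
{
    int ele, j;
    if (rear == SIZE - 1)
    {
        printf("Priority Queue is Full\n");
    }
    else
    {
        printf("Enter the element to Insert\n");
        if (scanf("%d", &ele) != 1)
            return;
        j = rear;
        /* Shift smaller elements one place right. Test j >= front first
           so PQ[j] is never read outside the live part of the queue. */
        while (j >= front && ele > PQ[j])
        {
            PQ[j + 1] = PQ[j];
            j--;
        }
        PQ[j + 1] = ele;
        rear = rear + 1;
    }
}

/* Remove the highest-priority (largest) element from the front. */
void delet()
{
    int item;
    if (front > rear)
    {
        printf("Priority Queue is Empty\n");
    }
    else
    {
        item = PQ[front];
        front = front + 1;
        printf("The deleted Element is %d\n", item);
        if (front > rear)          /* now empty: reuse the whole array */
        {
            front = 0;
            rear = -1;
        }
    }
}

void display()
{
    int i;
    if (front > rear)
    {
        printf("Priority Queue Underflow\n");
    }
    else
    {
        printf("The elements in Priority Queue are\n");
        for (i = front; i <= rear; i++)
            printf("%d\t", PQ[i]);
        printf("\n");
    }
}

int main(void)
{
    int ch = 1;
    while (ch)
    {
        printf(" 1.Insert 2.Delete 3.Display 4.Exit \n");
        printf("Enter your Choice\n");
        if (scanf("%d", &ch) != 1)
            break;
        switch (ch)
        {
        case 1: insert();
            break;
        case 2: delet();
            break;
        case 3: display();
            break;
        case 4: exit(0);
        default: printf("Enter valid choice\n");
            break;
        }
    }
    return 0;
}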

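/* Write a C program to demonstrate insertion, deletion and display in a Min Priority Queue. */

#include <stdio.h>
#include <stdlib.h>

#define SIZE 5
int PQ[SIZE], front = 0, rear = -1;

/* PQ[front..rear] is kept in ascending order, so the minimum
   is always at PQ[front]. */
void insert()
{
    int ele, j;
    if (rear == SIZE - 1)
    {
        printf("Priority Queue is Full\n");
    }
    else
    {
        printf("Enter the element to Insert\n");
        if (scanf("%d", &ele) != 1)
            return;
        j = rear;
        /* Shift larger elements one place right. Test j >= front first
           so PQ[j] is never read outside the live part of the queue. */
        while (j >= front && ele < PQ[j])
        {
            PQ[j + 1] = PQ[j];
            j--;
        }
        PQ[j + 1] = ele;
        rear = rear + 1;
    }
}

/* Remove the highest-priority (smallest) element from the front. */
void delet()
{
    int item;
    if (front > rear)
    {
        printf("Priority Queue is Empty\n");
    }
    else
    {
        item = PQ[front];
        front = front + 1;
        printf("The deleted Element is %d\n", item);
        if (front > rear)          /* now empty: reuse the whole array */
        {
            front = 0;
            rear = -1;
        }
    }
}

void display()
{
    int i;
    if (front > rear)
    {
        printf("Priority Queue Underflow\n");
    }
    else
    {
        printf("The elements in Priority Queue are\n");
        for (i = front; i <= rear; i++)
            printf("%d\t", PQ[i]);
        printf("\n");
    }
}

int main(void)
{
    int ch = 1;
    while (ch)
    {
        printf(" 1.Insert 2.Delete 3.Display 4.Exit \n");
        printf("Enter your Choice\n");
        if (scanf("%d", &ch) != 1)
            break;
        switch (ch)
        {
        case 1: insert();
            break;
        case 2: delet();
            break;
        case 3: display();
            break;
        case 4: exit(0);
        default: printf("Enter valid choice\n");
            break;
        }
    }
    return 0;
}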
[Handwritten notes (illegible in this copy): Optimal Binary Search Tree (OBST) worked example, filling in the cost table c[i, j] from the key probabilities.]
