CO3 Notes Hashing
Hashing
Another type of primary file organization is based on hashing, which provides very fast access
to records under certain search conditions. This organization is usually called a hash file. The
search condition must be an equality condition on a single field, called the hash field. In most
cases, the hash field is also a key field of the file, in which case it is called the hash key. The
idea behind hashing is to provide a function h, called a hash function or randomizing function,
which is applied to the hash field value of a record and yields the address of the disk block in
which the record is stored. A search for the record within the block can be carried out in a main
memory buffer. For most records, we need only a single-block access to retrieve that record.
• In a huge database, it is very inefficient to search through all the index values to reach
the desired data.
• The hashing technique is used to compute the direct location of a data record on the disk
without using an index structure.
• Data is stored in the data blocks whose addresses are generated by the hash
function.
• The memory locations where these records are stored are known as data buckets or data
blocks.
Formally, let K denote the set of all search-key values, and let B denote the set of all bucket
addresses. A hash function h is a function from K to B.
Static Hashing:
◼ A bucket is a unit of storage containing one or more records (a bucket is typically a
disk block).
◼ In a hash file organization we obtain the bucket of a record directly from its search-
key value using a hash function.
◼ Hash function h is a function from the set of all search-key values K to the set of all
bucket addresses B.
◼ In most cases, the hash field is also a key field of the file, in which case it is called the
hash key.
◼ Hash function is used to locate records for access, insertion as well as deletion.
◼ One common hash function is h(K) = K mod M, which returns the
remainder of an integer hash field value K after division by M; this value is then used
as the record address.
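As a quick illustration, here is a minimal sketch of the mod-M function (the value M = 7 is an arbitrary choice for illustration):

M = 7                 # number of buckets (arbitrary illustrative choice)

def h(K):
    return K % M      # remainder after division by M is the bucket address

print(h(49))          # 49 mod 7 = 0, so the record goes to bucket 0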
Example: hash file organization of an account file, using branch-name as the key.
◼ There are 10 buckets.
◼ The ith letter of the alphabet (ignoring case) is represented by the integer i.
◼ The hash function returns the sum of the numeric representations of the characters
modulo 10.
◼ E.g., h(Perryridge) = 125 % 10 = 5
◼ h(Redwood) = 84 % 10 = 4
◼ h(Brighton) = 93 % 10 = 3
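A small sketch of this example (assuming letters are valued by their position in the alphabet, a = 1 through z = 26, ignoring case):

def h(branch_name):
    # sum the alphabet positions of the letters, then take mod 10 (10 buckets)
    total = sum(ord(c) - ord('a') + 1 for c in branch_name.lower())
    return total % 10

print(h("Perryridge"))   # 125 % 10 = 5
print(h("Redwood"))      # 84 % 10 = 4
print(h("Brighton"))     # 93 % 10 = 3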
Other hashing functions can be used. One technique, called folding, involves applying an
arithmetic function such as addition or a logical function such as exclusive or to different
portions of the hash field value to calculate the hash address. For example, with an address
space from 0 to 999 to store 1,000 keys, a 6-digit key 235469 may be folded by splitting it into
235 and 469, reversing the second part to 964, and storing the record at the
address (235 + 964) mod 1000 = 199.
Another technique involves picking some digits of the hash field value (for instance, the third,
fifth, and eighth digits) to form the hash address. For example, storing 1,000 employees with
Social Security numbers of 9 digits into a hash file with 1,000 positions would give the Social
Security number 301-67-8923 a hash value of 172 by this hash function.
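Both techniques can be sketched as follows (a rough illustration, not from the text; the function names are invented):

def fold_hash(key):
    # folding: split the 6-digit key into two 3-digit parts,
    # reverse the second part, add, and take mod 1000
    first, second = divmod(key, 1000)                   # 235469 -> (235, 469)
    reversed_second = int(str(second).zfill(3)[::-1])   # 469 -> 964
    return (first + reversed_second) % 1000             # (235 + 964) mod 1000 = 199

def digit_pick_hash(ssn):
    # pick the third, fifth and eighth digits of a 9-digit SSN
    digits = ssn.replace("-", "")                       # "301-67-8923" -> "301678923"
    return int(digits[2] + digits[4] + digits[7])       # -> 172

print(fold_hash(235469))               # 199
print(digit_pick_hash("301-67-8923"))  # 172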
So far, we have assumed that, when a record is inserted, the bucket to which it is mapped has
space to store the record. If the bucket does not have enough space, a bucket overflow is said
to occur. Bucket overflow can occur for several reasons:
• Insufficient buckets. The number of buckets, which we denote nB, must be chosen such
that nB > nr / fr, where nr denotes the total number of records that will be stored and fr
denotes the number of records that will fit in a bucket. For example, to store nr = 10,000
records in buckets that each hold fr = 25 records, more than 10,000 / 25 = 400 buckets are
needed. This relation, of course, assumes that the total number of records is known when
the hash function is chosen.
• Skew. Some buckets are assigned more records than are others, so a bucket may overflow
even when other buckets still have space. This situation is called bucket skew.
◼ There are numerous methods for collision resolution, including the following:
➢ Closed Hashing (Overflow Chaining)
➢ Open Hashing (Open Addressing)
Despite allocation of a few more buckets than required, bucket overflow can still occur. We
handle bucket overflow by using overflow buckets. If a record must be inserted into a bucket
b, and b is already full, the system provides an overflow bucket for b, and inserts the record
into the overflow bucket. If the overflow bucket is also full, the system provides another
overflow bucket, and so on. All the overflow buckets of a given bucket are chained together in
a linked list, as in Figure 11.24. Overflow handling using such a linked list is called overflow
chaining or closed hashing.
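A minimal sketch of overflow chaining, assuming integer keys hashed by key mod NUM_BUCKETS and a small illustrative bucket capacity (all names here are invented for illustration):

BUCKET_CAPACITY = 2
NUM_BUCKETS = 4

class Bucket:
    def __init__(self):
        self.records = []
        self.overflow = None           # next overflow bucket in the chain

    def insert(self, key):
        if len(self.records) < BUCKET_CAPACITY:
            self.records.append(key)
        elif self.overflow is not None:
            self.overflow.insert(key)  # walk down the existing chain
        else:
            self.overflow = Bucket()   # provide a new overflow bucket for b
            self.overflow.insert(key)

buckets = [Bucket() for _ in range(NUM_BUCKETS)]

def insert(key):
    buckets[key % NUM_BUCKETS].insert(key)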
Under an alternative approach, called open hashing, the set of buckets is fixed, and there are
no overflow chains. Instead, if a bucket is full, the system inserts records in some other bucket
in the initial set of buckets B. One policy is to use the next bucket (in cyclic order) that has
space; this policy is called linear probing. Other policies, such as computing further hash
functions, are also used. Open hashing has been used to construct symbol tables for compilers
and assemblers, but closed hashing is preferable for database systems. Thus, open hashing is
of only minor importance in database implementation.
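A comparable sketch of open hashing with linear probing, assuming single-record slots, integer keys, and an invented table size:

NUM_SLOTS = 8
table = [None] * NUM_SLOTS

def insert(key):
    home = key % NUM_SLOTS
    for i in range(NUM_SLOTS):          # probe each slot at most once
        slot = (home + i) % NUM_SLOTS   # next slot in cyclic order
        if table[slot] is None:
            table[slot] = key
            return
    raise RuntimeError("all slots are full")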
Deficiencies or disadvantages of Static Hashing
In static hashing, the function h maps search-key values to a fixed set B of bucket addresses.
➢ Databases grow with time. If the initial number of buckets is too small, performance
will degrade due to too many overflows.
➢ If the file size at some point in the future is anticipated and the number of buckets
is allocated accordingly, a significant amount of space will be wasted initially.
➢ If the database shrinks, again space will be wasted.
➢ One option is periodic reorganization of the file with a new hash function, but
this is very expensive.
These problems can be avoided by using techniques that allow the number of buckets to be
modified dynamically.
Dynamic Hashing:
The main problem with Static Hashing is that the number of buckets is fixed. If a file shrinks
greatly, a lot of space is wasted; more important, if a file grows a lot, long overflow chains
develop, resulting in poor performance.
The hashing scheme described so far is called static hashing because a fixed number of buckets
M is allocated. The hash function performs a fixed key-to-address mapping, so the address
space is fixed in advance. This can be a serious drawback for dynamic files.
Newer dynamic file organizations based on hashing allow the number of buckets to vary
dynamically with only localized reorganization.
◼ Good for database that grows and shrinks in size.
◼ Allows the hash function to be modified dynamically.
◼ Extendible hashing – one form of dynamic hashing that copes with changes in
database size by splitting and coalescing buckets as the database grows and shrinks.
➢ The hash function in extendible hashing method generates values over a
relatively large range—namely, b-bit binary integers. A typical value for b is
32.
➢ At any time, only a prefix of the hash value (i bits) is used to index into a table of
bucket addresses. (The example below uses the i least significant bits instead;
either convention works.)
➢ Bucket address table size = 2^i.
➢ Initially i = 0
➢ Value of i grows and shrinks as the size of the database grows and shrinks.
➢ Multiple entries in the bucket address table may point to a bucket.
➢ Thus, the actual number of buckets is at most 2^i.
Step 1 – Analyse Data Elements: Data elements may exist in various forms, e.g., integer, string,
float, etc. Here, let us consider data elements of type integer, e.g., 49.
Step 2 – Convert into Binary Format: Convert the data element into binary form. For string
elements, consider the ASCII value of the starting character and then convert that
integer into binary form. Since we have 49 as our data element, its binary form is 110001.
Step 3 – Check the Global Depth of the Directory: Suppose the global depth of the hash directory
is 3.
Step 4 – Identify the Directory: Consider the 'global depth' number of LSBs in the binary
number and match it to the directory id. E.g., the binary obtained is 110001 and the global
depth is 3, so the hash function returns the 3 LSBs of 110001, viz. 001.
Step 5 – Navigation: Now, navigate to the bucket pointed by the directory with directory-id
001.
Step 6 – Insertion and Overflow Check: Insert the element and check if the bucket overflows.
If an overflow is encountered, go to step 7 followed by Step 8, otherwise, go to step 9.
Step 7 – Tackling the Overflow Condition during Data Insertion: While inserting
data into the buckets, it may happen that a bucket overflows. In such cases, we need to follow
an appropriate procedure to avoid mishandling the data. First, check whether the local depth is
less than or equal to the global depth; then choose one of the cases below.
➢ Case 1: If the local depth of the overflowing bucket is equal to the global depth,
then both directory expansion and a bucket split need to be performed.
Increment the global depth and the local depth by 1, and assign the
appropriate pointers. Directory expansion doubles the number of directory
entries present in the hash structure.
➢ Case 2: If the local depth is less than the global depth, then only a bucket
split takes place. Increment only the local depth by 1, and assign the
appropriate pointers.
Step 8 – Rehashing of Split Bucket Elements: The elements present in the overflowing bucket
that is split are rehashed w.r.t. the new depth (the new global depth in Case 1, or the new
local depth in Case 2).
Step 9 – The element is successfully hashed.
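The insertion procedure above can be condensed into a small sketch. This is an illustrative toy, not code from the notes (the class name ExtendibleHashTable and its helpers are invented), assuming integer keys, a bucket size of 3 as in the example below, and a directory that starts at global depth 1:

BUCKET_SIZE = 3

class Bucket:
    def __init__(self, local_depth):
        self.local_depth = local_depth
        self.keys = []

class ExtendibleHashTable:
    def __init__(self):
        self.global_depth = 1
        self.directory = [Bucket(1), Bucket(1)]  # indexed by the LSBs of the key

    def _index(self, key):
        # Step 4: take the 'global depth' number of LSBs as the directory id
        return key & ((1 << self.global_depth) - 1)

    def insert(self, key):
        bucket = self.directory[self._index(key)]
        if len(bucket.keys) < BUCKET_SIZE:       # Step 6: room available
            bucket.keys.append(key)
            return
        # Step 7: overflow
        if bucket.local_depth == self.global_depth:
            # Case 1: directory expansion (doubling) plus bucket split
            self.directory = self.directory * 2
            self.global_depth += 1
        # Case 1 and Case 2 both split the overflowing bucket
        bucket.local_depth += 1
        new_bucket = Bucket(bucket.local_depth)
        high_bit = 1 << (bucket.local_depth - 1)
        for i in range(len(self.directory)):     # repoint directory entries
            if self.directory[i] is bucket and (i & high_bit):
                self.directory[i] = new_bucket
        # Step 8: rehash the split bucket's keys, then retry the new key
        old_keys, bucket.keys = bucket.keys, []
        for k in old_keys + [key]:
            self.insert(k)

Inserting the keys of the example below (16, 4, 6, 22, 24, 10, 31, 7, 9, 20, 26) into this table reproduces the directory expansions and bucket splits described next.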
Example: Now, let us work through hashing the following elements:
16, 4, 6, 22, 24, 10, 31, 7, 9, 20, 26.
Bucket size: 3 (assumed)
1) First, calculate the binary forms of each of the given numbers.
16- 10000
4- 00100
6- 00110
22- 10110
24- 11000
10- 01010
31- 11111
7- 00111
9- 01001
20- 10100
26- 11010
2) Initially, the global depth and all local depths are 1, so the hashing frame has a two-entry
directory (ids 0 and 1), each pointing to an empty bucket of size 3.
Inserting 16:
The binary format of 16 is 10000 and global-depth is 1. The hash function returns 1 LSB of
10000 which is 0. Hence, 16 is mapped to the directory with id=0.
Inserting 4 and 6:
Both 4 (100) and 6 (110) have 0 as their LSB. Hence, they are also hashed to the bucket pointed
to by directory 0, which now holds 16, 4 and 6.
Inserting 22: The binary form of 22 is 10110. Its LSB is 0. The bucket pointed to by directory 0
is already full. Hence, an overflow occurs.
As directed by Step 7, Case 1, since local depth = global depth, the bucket splits and
directory expansion takes place. Rehashing of the numbers present in the overflowing bucket
takes place after the split, and since the global depth is incremented by 1, the global
depth is now 2. Hence, 16, 4, 6 and 22 are now rehashed w.r.t. their 2 LSBs
[16 (10000), 4 (00100), 6 (00110), 22 (10110)]: 16 and 4 map to directory 00, while 6 and 22
map to directory 10.
Inserting 24 and 10: 24 (11000) and 10 (01010) are hashed to the buckets with directory ids 00
and 10 respectively. Here, we encounter no overflow condition.
Inserting 31, 7, 9: All of these elements [31 (11111), 7 (00111), 9 (01001)] have either 01 or 11
as their 2 LSBs. Hence, they are mapped to the bucket pointed to by directories 01 and 11
(which still share a single bucket of local depth 1). We do not encounter any overflow
condition here.
Inserting 20: Insertion of data element 20 (10100) again causes an overflow:
20 hashes to the bucket pointed to by directory 00, which already holds 16, 4 and 24. As
directed by Step 7, Case 1, since the local depth of the bucket = the global depth, directory
expansion (doubling) takes place along with a bucket split, and the elements present in the
overflowing bucket are rehashed with the new global depth of 3: 16 and 24 now map to
directory 000, while 4 and 20 map to directory 100.
Inserting 26: The global depth is 3. Hence, the 3 LSBs of 26 (11010) are considered, so 26
maps to the bucket pointed to by directory 010.
That bucket (holding 6, 22 and 10) overflows, and, as directed by Step 7, Case 2, since the
local depth of the bucket < the global depth (2 < 3), the directory is not doubled; only the
bucket is split and its elements are rehashed. This completes the hashing of the given list of
numbers.
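Tracing the rehash through (a summary reconstructed by following the steps above, since the
original figures are not reproduced here), the final structure has global depth 3: directory 000
points to {16, 24}, 100 to {4, 20}, 010 to {10, 26} and 110 to {6, 22}, each of local depth 3,
while directories 001, 011, 101 and 111 all share the bucket {31, 7, 9} of local depth 1.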