
The document discusses various techniques for handling collisions in hashing, including progressive overflow, linear probing, linear quotient hashing, and Brent's method. Progressive overflow uses linear probing to search for the next empty slot when a collision occurs. Linear quotient hashing reduces clustering by using a variable increment based on the key. Brent's method may move previously inserted records to new locations to reduce the average number of probes needed for retrieval.


Direct File Organization
2006 Hakan Uraz - File Organization
Progressive Overflow
• In coalesced hashing, storage is needed for link
fields. When this storage is not available, we need
a convention for where to search next.
• Progressive overflow (linear probing) is one
convention. If a location is occupied, we look at
the next location to see if it is empty. The table is
circular. We continue until we find an empty slot
or we encounter the home address of the record a
second time (the table is full).
• For retrieval, we follow the same process.
• Performance is poor for an unsuccessful search.
Progressive Overflow - Example
Insert 27, 18, 29, 28, 39, 13, 16
Hash(key) = key mod 11
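The example can be traced with a minimal Python sketch. It assumes an in-memory list stands in for the file and that None marks an empty location; the name insert_progressive is illustrative and does not come from the slides.

```python
def insert_progressive(table, key):
    """Insert key using progressive overflow; return the number of probes used."""
    size = len(table)
    home = key % size
    for probes in range(1, size + 1):
        addr = (home + probes - 1) % size  # circular table
        if table[addr] is None:
            table[addr] = key
            return probes
    raise RuntimeError("table is full")  # home address reached a second time

table = [None] * 11
probes = {k: insert_progressive(table, k) for k in [27, 18, 29, 28, 39, 13, 16]}
print(table)   # [None, None, 13, None, None, 27, 28, 18, 29, 39, 16]
print(probes)  # {27: 1, 18: 1, 29: 2, 28: 1, 39: 4, 13: 1, 16: 6}
print(round(sum(probes.values()) / len(probes), 1))  # 2.3
```

A record is retrieved with the same number of probes that were needed to insert it, so the 16 probes in total give an average of about 2.3 per record.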

Progressive Overflow - Discussion
• Secondary clustering is the consequence of two
or more records following the same sequence of
probe addresses. It results in a bunching of records
within the table.
• Contrasts with primary clustering which occurs
when a large number of records have the same
home address.
• In the example, the secondary clustering was
caused by records with different home addresses.
Progressive Overflow - Discussion
• Not practical for handling collisions because
performance is poor. In the example, the average
number of probes to retrieve each record in the file
once is 2.3 (vs. 1.4 for coalesced hashing).
• The increment of one is a contributing factor in the
secondary clustering. We might be able to reduce
the secondary clustering if we varied the increment.
• Progressive overflow reduces to a sequential search.
The differences are progressive overflow uses a
variable starting point and does not need to search
the entire file (terminates on a null record).
Progressive Overflow - Deletion
• To delete a record, we put an
indicator called a tombstone in the
location of the deleted record.
• It tells us that additional records
may follow and to keep on
searching on a retrieval, or that the
location may be filled on a
subsequent insertion.
• On the left, the record (18) is
deleted.
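A small Python sketch of why the tombstone is needed, assuming the table that results from inserting the example keys with progressive overflow. The TOMBSTONE marker and the function names are illustrative choices, not part of the slides.

```python
TOMBSTONE = object()  # marks a deleted record; distinct from None (never used)

def find(table, key):
    """Return the address of key, or None if the search hits an empty slot."""
    size = len(table)
    home = key % size
    for step in range(size):
        addr = (home + step) % size
        slot = table[addr]
        if slot is None:
            return None          # a truly empty slot ends the search
        if slot is not TOMBSTONE and slot == key:
            return addr          # tombstones are skipped, not matched
    return None

def delete(table, key):
    """Replace key with a tombstone; return its former address (or None)."""
    addr = find(table, key)
    if addr is not None:
        table[addr] = TOMBSTONE
    return addr

table = [None, None, 13, None, None, 27, 28, 18, 29, 39, 16]
print(delete(table, 18))  # 7
print(find(table, 29))    # 8, still reachable past the tombstone at 7
print(find(table, 18))    # None
```

Had location 7 simply been reset to empty, the search for 29 (home address 7) would have stopped there and reported the record as missing.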

Progressive Overflow - Discussion
• A technique for speeding the insertion process
would be to keep a bit string in main memory.
A “one” in the bit string would indicate that
the corresponding location was occupied.
• On insertion we would only need to check the
bit string to find the first unoccupied location.
• This technique would not be applicable if
duplicate keys were possible.
• Also not appropriate if we had to check if an
item had been inserted previously.
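The bit-string idea might be sketched as follows, assuming a bytearray holds one flag per location (a packed bit array would work the same way). The name first_free and the occupied addresses, taken from the progressive overflow example after its first six keys, are illustrative.

```python
def first_free(bits, home):
    """Return the first unoccupied address at or after home (circular), or None."""
    size = len(bits)
    for step in range(size):
        addr = (home + step) % size
        if not bits[addr]:
            return addr
    return None

bits = bytearray(11)              # all zero: every location unoccupied
for addr in (2, 5, 6, 7, 8, 9):   # locations filled by 13, 27, 28, 18, 29, 39
    bits[addr] = 1

slot = first_free(bits, 16 % 11)  # key 16 has home address 5
bits[slot] = 1
print(slot)  # 10
```

Only the final write touches the file; the probing happens in main memory. Because the stored keys are never read, this cannot detect a duplicate or a previously inserted item, which is the limitation noted above.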
Use of Buckets
• Bucket (block or page) is a storage unit of multiple
records at one file address.
• Also the unit in which information is accessed and
transferred between storage devices.
• The number of records that may be stored in a bucket is
called the blocking factor. As the blocking factor
increases, the number of auxiliary storage accesses
decreases.
• Within a bucket, we need a means of separating the
individual records. We can achieve this by knowing the
record length for fixed-length records, or by placing a
special delimiter.
• Buckets can be used to improve any collision resolution method.
Use of Buckets - Example
Blocking factor 2. Hash(key) = key mod 11
The keys are 27, 18, 29, 28, 39, 13, 16
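A Python sketch of this example, under the assumption that a full bucket overflows to the next bucket (progressive overflow at the bucket level). Each bucket is modelled as a list, and insert_bucket is an illustrative name.

```python
def insert_bucket(buckets, key, blocking_factor):
    """Insert key into the first bucket with room; return bucket accesses used."""
    n = len(buckets)
    home = key % n
    for accesses in range(1, n + 1):
        bucket = buckets[(home + accesses - 1) % n]
        if len(bucket) < blocking_factor:
            bucket.append(key)
            return accesses
    raise RuntimeError("file is full")

buckets = [[] for _ in range(11)]
accesses = [insert_bucket(buckets, k, 2) for k in [27, 18, 29, 28, 39, 13, 16]]
print({addr: b for addr, b in enumerate(buckets) if b})
# {2: [13], 5: [27, 16], 6: [28, 39], 7: [18, 29]}
print(accesses)  # [1, 1, 1, 1, 1, 1, 1]
```

With two records per bucket, every synonym in this key set fits in its home bucket, so each record is reachable with a single storage access.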

Linear Quotient
• Here we use a variable increment instead of
a constant increment of one to reduce
secondary clustering.
• Here, the increment is a function of the key
being inserted, which may be viewed as
another hashing function. The method is
therefore referred to as double hashing: H1
gives the home address, H2 gives the increment.
Linear Quotient
• Possibilities for H2:
– H2 = Quotient(Key / P) mod P
– H2’ = (Key mod (P – 2)) + 1
• H2 requires two divide operations. H2’ requires
only a single divide operation. H2’ is more
difficult for people to compute. So we use H2 in
the example.
• The home address for a record will then be
determined by the remainder of the key divided by
the table size and the increment for collision
resolution by the quotient of the same operation.
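In Python, the remainder and quotient of the same division are available from divmod, which gives a compact sketch of H1 and H2. The function names are illustrative, and replacing a zero increment by 1 is an assumption added here so that a probe always advances; the slides do not state how a zero quotient is handled.

```python
def linear_quotient_hashes(key, p):
    """Return (home address, increment) for table size p."""
    quotient, home = divmod(key, p)   # one division yields both values
    increment = quotient % p or 1     # assumption: never allow an increment of 0
    return home, increment

def h2_alt(key, p):
    """The alternative increment H2' = (Key mod (P - 2)) + 1."""
    return key % (p - 2) + 1

print(linear_quotient_hashes(39, 11))  # (6, 3)
print(linear_quotient_hashes(27, 11))  # (5, 2)
print(h2_alt(39, 11))                  # 4
```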
Linear Quotient
If A, B, and C are synonyms at
r as illustrated, B and C will
usually have different
increments yielding different
probe chains.

Linear Quotient
The method requires a prime table
size; otherwise, searching could cycle
through a subset of the table several
times.
In the example, locations 0, 2 and 4 were
occupied. If we attempted to insert a
record with home address of 0 and an
increment of 2, we would cycle through
the three occupied locations twice.
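The cycling can be reproduced with a short Python sketch. The table size of 6 is an assumption chosen to match the description (an increment of 2 then visits only locations 0, 2 and 4); the original figure is not available. The name probe_sequence is illustrative.

```python
def probe_sequence(home, increment, size):
    """List the first `size` addresses visited from home with a fixed increment."""
    seq = []
    addr = home
    for _ in range(size):
        seq.append(addr)
        addr = (addr + increment) % size
    return seq

print(probe_sequence(0, 2, 6))  # [0, 2, 4, 0, 2, 4]  non-prime size: cycles
print(probe_sequence(0, 2, 7))  # [0, 2, 4, 6, 1, 3, 5]  prime size: covers the table
```

With a prime table size, every increment from 1 to size - 1 visits all locations before repeating, so an empty location is always found if one exists.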

Linear Quotient - Example
H1(key) = key mod 11
Keys = 27, 18, 29, 28, 39, 13, 16

Linear Quotient – Example, Cont.
Alternatively: New address = (current address + increment) mod table_size
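The example can be traced with a minimal Python sketch that uses the address formula above. It assumes an in-memory list with None for empty locations; the function names are illustrative, and replacing a zero increment by 1 is an added assumption (no key in this example needs it).

```python
def increment(key, size):
    # Assumption: a zero quotient is replaced by 1 so the probe always advances.
    return (key // size) % size or 1

def insert_linear_quotient(table, key):
    """Insert key with linear quotient; return the number of probes used."""
    size = len(table)
    addr = key % size
    inc = increment(key, size)
    for probes in range(1, size + 1):
        if table[addr] is None:
            table[addr] = key
            return probes
        addr = (addr + inc) % size  # new address = (current + increment) mod size
    raise RuntimeError("table is full")

table = [None] * 11
probes = {k: insert_linear_quotient(table, k) for k in [27, 18, 29, 28, 39, 13, 16]}
print(table)   # [None, 39, 13, None, None, 27, 28, 18, 16, 29, None]
print(probes)  # {27: 1, 18: 1, 29: 2, 28: 1, 39: 3, 13: 1, 16: 4}
print(round(sum(probes.values()) / len(probes), 1))  # 1.9
```

The total is 13 probes for seven records, about 1.9 on average.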

Linear Quotient - Discussion
• The mechanism for determining an
unsuccessful search is the same as that for
progressive overflow. When we encounter
an empty probe location, we terminate the
search unsuccessfully.
• But an unsuccessful search requires fewer
probes with linear quotient since we are
more likely to encounter an empty location
to terminate the search as a result of
eliminating secondary clusters.
• Deleting a record from a table requires the
use of a tombstone.
Linear Quotient - Discussion
• We can improve on linear quotient by observing that
the number of retrieval probes is dependent on the
placement of the records.
• E.g., if we insert a record with a key of 67, it has a
home address of 1, which is already filled with 39.
We then try locations 7, 2, 8, and finally 3, so it
requires five retrieval probes to find 67. What if 39
had not been stored at location 1?
• The next method moves an item already inserted in
the table if the move reduces the average number of
probes required to retrieve all the records.
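The probe chain for 67 can be checked with a short Python sketch. The table literal is the one obtained by inserting 27, 18, 29, 28, 39, 13, 16 with linear quotient into 11 locations; probe_chain is an illustrative name.

```python
def probe_chain(table, key):
    """Addresses visited for key until it is found or an empty slot is reached.

    Assumes the table has at least one empty location and a prime size.
    """
    size = len(table)
    quotient, addr = divmod(key, size)
    inc = quotient % size or 1  # assumption: a zero increment is replaced by 1
    chain = [addr]
    while table[addr] is not None and table[addr] != key:
        addr = (addr + inc) % size
        chain.append(addr)
    return chain

table = [None, 39, 13, None, None, 27, 28, 18, 16, 29, None]
print(probe_chain(table, 67))  # [1, 7, 2, 8, 3]
print(probe_chain(table, 39))  # [6, 9, 1]
```

The same walk serves an unsuccessful search: it ends at the first empty location on the chain.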
Brent’s Method
• We examined static methods. We now
examine several dynamic methods, in which an
item once stored may be moved.
• They require additional processing when
inserting a record but reduce the number of
probes needed for retrieval (we insert once
but retrieve many times).

Brent’s Method
• The primary probe chain of a record is the
sequence of locations visited during the insertion
or retrieval of the record.
The primary probe chain for 39 is shown
on the left. p1 is the home address. Three
positions had to be visited.
• What if its home address were empty?
39 could have been inserted at its home
address, which requires only 1 probe for
retrieval. We could make the home address
available by moving what is stored there.

Brent’s Method
• Move 28 to a location
such that it can still
be retrieved (we still want
to use the linear
quotient method).

• The sequence of positions visited when attempting to move a
record from the primary probe chain is called the secondary
probe chain.
• 28 will require one more probe for its retrieval but 39 will
require two fewer probes for a net reduction of one probe
achieved by moving 28.

Brent’s Method
• The solid vertical line
represents the primary probe
chain (the addresses that would
be considered in storing an item
using linear quotient).
• The horizontal lines represent
the secondary probe chains (the
addresses that would be searched
in attempting to move an item
from a position along the primary
probe chain).

Brent’s Method
• The q value along the primary probe chain is the
increment for the item being inserted.
• The qi’s along the secondary probe chains
represent the increments associated with the item
being moved.
• The subscript i gives the number of probes needed
to retrieve the item being inserted along its
primary probe chain.
• The subscript j gives the number of additional
probes needed to retrieve the item being moved
along its secondary probe chain.

Brent’s Method
• To minimize the number of retrieval probes, we
want to minimize (i + j).
• When two combinations give the same sum (i + j),
we arbitrarily choose the one with the smaller i.
• Let s be the number of probes required to
retrieve an item if nothing is moved.
• We try all combinations of (i + j) < s such that
we minimize (i + j).
• On equality, since there would be no reduction
in the number of probes, no movement would
occur.
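The order in which combinations are tried can be written as a short Python generator. This is a sketch of one reading of the rules above (smallest i + j first, ties broken in favour of the smaller i); candidate_moves is an illustrative name.

```python
def candidate_moves(s):
    """Yield (i, j) pairs with i + j < s, smallest sum first, smaller i first on ties."""
    for total in range(2, s):       # i >= 1 and j >= 1, so the smallest sum is 2
        for i in range(1, total):
            yield i, total - i

print(list(candidate_moves(2)))  # []  no move can beat s = 2
print(list(candidate_moves(3)))  # [(1, 1)]
print(list(candidate_moves(4)))  # [(1, 1), (1, 2), (2, 1)]
```

The first pair whose target location is empty is taken. If no pair qualifies, nothing is moved and the item is stored at the end of its primary probe chain.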
Brent’s Method - Example
• Let’s insert 27, 18, 29, 28, 39, 13, 16.
Hash(key) = key mod 11
And the incrementing function is
i(key) = Quotient(Key / 11) mod 11

Brent’s Method - Example
28 is moved from its original location
to location 8.
Note that we use the increment
associated with the item being moved
and not with the record being
inserted.

Brent’s Method - Example

27 is moved to location 0.

• Some probes are saved overall by making the moves.
• The average number of probes needed
to retrieve each record in the file once is
1.7 probes vs. the 1.9 of linear quotient
or the 2.3 of progressive overflow or the
1.4 for LISCH.
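The whole example can be reproduced with a Python sketch of Brent's insertion. It assumes a prime table size, None for empty locations, and a zero increment replaced by 1 (an assumption not stated in the slides); the function names are illustrative.

```python
def increment(key, size):
    return (key // size) % size or 1  # assumption: avoid an increment of 0

def brent_insert(table, key):
    size = len(table)
    inc = increment(key, size)
    # Primary probe chain, up to and including the first empty location.
    chain = [key % size]
    while table[chain[-1]] is not None:
        if len(chain) == size:
            raise RuntimeError("table is full")
        chain.append((chain[-1] + inc) % size)
    s = len(chain)  # probes needed if nothing is moved
    for total in range(2, s):            # minimise i + j, subject to i + j < s
        for i in range(1, total):        # on ties, the smaller i is tried first
            j = total - i
            addr = chain[i - 1]
            resident = table[addr]
            # Use the increment of the item being moved, not of the new key.
            target = (addr + j * increment(resident, size)) % size
            if table[target] is None:
                table[target] = resident
                table[addr] = key
                return
    table[chain[-1]] = key               # no advantageous move exists

def retrieval_probes(table, key):
    """Probes needed to find a stored key with linear quotient."""
    size = len(table)
    addr, inc, n = key % size, increment(key, size), 1
    while table[addr] != key:
        addr, n = (addr + inc) % size, n + 1
    return n

keys = [27, 18, 29, 28, 39, 13, 16]
table = [None] * 11
for k in keys:
    brent_insert(table, k)
print(table)  # [27, None, 13, None, None, 16, 39, 18, 28, 29, None]
total = sum(retrieval_probes(table, k) for k in keys)
print(total, round(total / len(keys), 1))  # 12 1.7
```

Inserting 39 moves 28 to location 8, and inserting 16 moves 27 to location 0, as described above. The 12 retrieval probes in total give the average of about 1.7.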

Brent’s Method - Discussion
• Brent’s method pertains only to the
insertion process; linear quotient is used
for retrieval.
• Because linear quotient is used for retrieval,
deleting a record requires the
placement of a tombstone.
• The insertion process can be modified so that
we no longer need to traverse the
primary probe chain to compute s.
This improves the insertion performance.
Brent’s Method - Discussion
• We follow the processing order
represented by dashes.
• We move a record when a
move is advantageous. We are
interleaving the process of
computing s and the process of
testing for improvement and
moving previously stored
records where appropriate.
• This modification only changes
the mechanism for insertions. It
does not affect the locations of
the records or the number of
retrieval probes.
Brent’s Method - Modified Process Example

Insertion of 16
using the
modified
process.

Binary Tree
• Carries the concept in Brent’s method one step further and
moves items from secondary and subsequent probe chains.
• Needs fewer retrieval probes than Brent’s method.
• Two choices at each probable storage address—continue to
the next address along the probe chain of the item being
inserted or move the item stored at that address to the next
position on its probe chain.

A left branch signifies the continue option, a right branch
the move option.

Binary Tree
• The decision tree is
generated in a breadth-first
fashion, from the top down
and left to right.
• It is used only as a
control mechanism in
deciding where to store
an item and is not used
for storing records.

Binary Tree
• A different binary tree is constructed for each
insertion. Encountering either an empty leaf
node in the binary tree or a full table
terminates the process.
• By moving items from secondary and
subsequent probe chains, we are achieving a
placement of records that will further reduce
the average number of retrieval probes (we
insert once but retrieve often).
• As with Brent’s method, the extra processing is only for
insertion; retrieval is done with linear quotient.
Binary Tree - Example
Keys to insert: 27, 18, 29, 28, 39, 13, 16, 41, 17, 19
Hash(key) = key mod 11
i(key) = Quotient(key / 11) mod 11
• 27 and 18 are inserted without difficulty.
• 29 causes a collision.
• So we generate a binary tree to determine
which, if any, records to move.
• Location 9 is empty.
• The tree appears on the right.

Binary Tree - Example
• The bold numbers represent locations in
the table and the values in parentheses are
the keys stored at those locations. The
underlined node is the root node.
• The tree tells us to attempt to move 29
into location 7.
• That location is occupied so we continue
along the primary probe chain until we
reach location 9. The table is shown on
the right.

Binary Tree - Example

The path formed by the leftmost branch at each
level of the tree is equivalent to the primary
probe chain of the linear quotient method and of
Brent’s method.
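One way to sketch the binary tree insertion in Python is shown below. It is a minimal interpretation of the description, not a reproduction of the original figures: the tree is kept as a 1-based list in which node i has children 2i (continue) and 2i + 1 (move), each node stores an address and the key "in hand" at that point, the table size is assumed prime, and a zero increment is replaced by 1. All names are illustrative.

```python
def increment(key, size):
    return (key // size) % size or 1  # assumption: avoid an increment of 0

def binary_tree_insert(table, key):
    size = len(table)
    if all(slot is not None for slot in table):
        raise RuntimeError("table is full")  # checked before generating the tree
    # nodes[i] = (address, key in hand); index 0 is unused.
    nodes = [None, (key % size, key)]
    i = 1
    while table[nodes[i][0]] is not None:    # visit nodes breadth first
        addr, in_hand = nodes[i]
        resident = table[addr]
        # Left child (2i): continue along the probe chain of the key in hand.
        nodes.append(((addr + increment(in_hand, size)) % size, in_hand))
        # Right child (2i + 1): move the resident along its own probe chain.
        nodes.append(((addr + increment(resident, size)) % size, resident))
        i += 1
    # Node i is the first empty location; store the key in hand there.
    addr, in_hand = nodes[i]
    table[addr] = in_hand
    # Walk back to the root; each right branch vacated the parent's address.
    while i > 1:
        if i % 2 == 1:
            parent_addr, parent_key = nodes[i // 2]
            table[parent_addr] = parent_key
        i //= 2

table = [None] * 11
for k in [27, 18, 29, 28, 39, 13, 16, 41, 17, 19]:
    binary_tree_insert(table, k)
print(table)  # [29, 39, 13, 41, None, 27, 16, 18, 17, 19, 28]
```

For 29, the root is location 7 (holding 18) and the left child, location 9, is empty, so 29 is stored at 9 as in the text. Every moved record advances by its own increment, so all ten keys remain retrievable with linear quotient.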

Binary Tree - Implementation
• The binary tree used in inserting items is a
complete binary tree.
• The n nodes of a complete binary tree
correspond to the first n nodes of a full
binary tree numbered top down, left to
right.
• An advantage: it can be implemented as
a sequential structure:
– lchild(i) = 2 * i
– rchild(i) = 2 * i + 1
– parent(i) = floor(i / 2)
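These index formulas translate directly into Python for a tree stored in a list with the root at index 1 (index 0 unused). The helper path_to_root is an illustrative addition that shows how a leaf is traced back to the root.

```python
def lchild(i):
    return 2 * i

def rchild(i):
    return 2 * i + 1

def parent(i):
    return i // 2  # floor(i / 2)

def path_to_root(i):
    """Node numbers from node i up to the root (node 1)."""
    path = [i]
    while i > 1:
        i = parent(i)
        path.append(i)
    return path

print(lchild(5), rchild(5), parent(5))  # 10 11 2
print(path_to_root(11))                 # [11, 5, 2, 1]
```

A node with an odd number is a right child, so the parity of each node on the path tells whether the branch taken was "move" or "continue".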
Binary Tree - Implementation
• The depth of the binary tree is O(log n).
• After the tree gets to be a certain size,
secondary storage can be used (for example
for the bottom two levels of the tree)
• How to check for a full table?
– Keep a counter of the depth of the tree. When it
exceeds log n, the table is full. But this requires the
generation of a possibly enormous tree.
– Better solution: Keep a counter of how many
records have already been inserted into table,
check that number before generating the tree.
Binary Tree - Discussion
• As with Brent’s method, linear quotient is
used for retrieval. That is why we use the
increment associated with the item being
moved in the table. Otherwise it won’t be
possible to subsequently retrieve the record.
• Since linear quotient is used for retrieval, a
tombstone is used for the deleted record to
ensure retrieving later records in the chain.