Linear Hashing: Historical Background
Linear Hashing
Donghui Zhang¹, Yannis Manolopoulos², Yannis Theodoridis³, and Vassilis J. Tsotras⁴
¹ Paradigm4, Inc., Waltham, MA, USA
² Aristotle University, Thessaloniki, Greece
³ University of Piraeus, Piraeus, Greece
⁴ University of California-Riverside, Riverside, CA, USA
Definition
Linear Hashing is a dynamically updatable, disk-based index structure that implements a hashing scheme and grows or shrinks one bucket at a time. The index supports exact-match queries, i.e., finding the record with a given key. The B+-tree index also supports exact-match queries, but it needs a logarithmic number of I/Os; Linear Hashing has a better expected query cost of O(1) I/Os. Unlike Extendible Hashing, Linear Hashing does not use a bucket directory. When an overflow occurs, the bucket that is split is not necessarily the one that overflowed. The name Linear Hashing reflects the fact that the number of buckets grows or shrinks linearly. Overflows are handled by chaining overflow pages to the overflowing bucket. The hashing function changes dynamically, and at any given instant the scheme uses at most two hashing functions.
Historical Background
A hash table is an in-memory data structure that associates keys with values. The primary operation it supports efficiently is lookup: given a key, find the corresponding value. It works by transforming the key with a hash function into a hash, a number used as an index into an array of buckets to locate where the corresponding values are stored. Multiple keys may hash to the same bucket, so all keys in that bucket must be searched when answering a query. Hash tables are often used to implement associative arrays, sets, and caches. Like arrays, hash tables have O(1) lookup cost on average.
Foundations
The Linear Hashing scheme was introduced by [2].
Initial Layout
The Linear Hashing scheme starts with three components:
- $m$ initial buckets labeled $0$ through $m-1$.
- An initial hashing function $h_0(k) = f(k) \bmod m$ that maps any key $k$ to one of the $m$ buckets. For simplicity, assume $h_0(k) = k \bmod m$.
- A pointer $p$ to the bucket to be split next whenever an overflow page is generated. Initially $p = 0$.
An example is shown in Fig. 1.
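The initial layout and the overflow and splitting behavior described above can be illustrated with a small in-memory sketch. This is not the reference implementation of the scheme. It rests on three assumptions:
- It uses the classic uncontrolled-split variant, in which every newly generated overflow page triggers a split of bucket $p$.
- It uses the hash function family $h_i(k) = k \bmod (2^i m)$, whose first member is $h_0(k) = k \bmod m$. The "at most two hashing functions" are $h_i$ and $h_{i+1}$ for the current round $i$.
- It models pages as Python lists rather than disk blocks.

The class name `LinearHashFile`, the parameter `page_capacity` and the method names are hypothetical. Shrinking is not shown.

```python
from typing import List


class LinearHashFile:
    """Hypothetical in-memory sketch of Linear Hashing (uncontrolled splits).

    Each bucket is a chain of pages: a primary page followed by any overflow
    pages. Every page holds at most `page_capacity` keys (an assumption).
    """

    def __init__(self, m: int, page_capacity: int) -> None:
        self.m = m                      # number of initial buckets, labeled 0..m-1
        self.page_capacity = page_capacity
        self.level = 0                  # round i; the active functions are h_i and h_(i+1)
        self.p = 0                      # next bucket to be split (initially 0)
        self.buckets: List[List[List[int]]] = [[[]] for _ in range(m)]

    def _h(self, i: int, key: int) -> int:
        # Assumed family h_i(k) = k mod (2^i * m); i = 0 gives h_0(k) = k mod m.
        return key % (2 ** i * self.m)

    def address(self, key: int) -> int:
        b = self._h(self.level, key)
        if b < self.p:                  # bucket b was already split in this round
            b = self._h(self.level + 1, key)
        return b

    def search(self, key: int) -> bool:
        # Exact-match query: scan the primary page and its overflow chain.
        return any(key in page for page in self.buckets[self.address(key)])

    def _place(self, key: int) -> bool:
        """Store key in its bucket; return True if an overflow page was created."""
        chain = self.buckets[self.address(key)]
        if len(chain[-1]) < self.page_capacity:
            chain[-1].append(key)
            return False
        chain.append([key])             # chain a new overflow page
        return True

    def insert(self, key: int) -> None:
        if self._place(key):
            # An overflow page was generated: split bucket p, which need
            # not be the bucket that overflowed.
            self._split()

    def _split(self) -> None:
        old_keys = [k for page in self.buckets[self.p] for k in page]
        self.buckets[self.p] = [[]]
        self.buckets.append([[]])       # new bucket index is p + 2^level * m
        self.p += 1
        if self.p == 2 ** self.level * self.m:
            self.level += 1             # every bucket of this round was split
            self.p = 0
        # Rehash with h_(level+1): each key stays in the old bucket or moves
        # to the new one. Placement here never triggers further splits.
        for k in old_keys:
            self._place(k)


if __name__ == "__main__":
    f = LinearHashFile(m=2, page_capacity=2)
    for k in [1, 3, 5, 7, 2, 4]:
        f.insert(k)
    # Inserting 5 overflowed bucket 1, yet bucket 0 (the one at p) was split.
    print(len(f.buckets), f.p, f.search(4))  # 3 1 True
```

In this sketch the file grows by exactly one bucket per split, which is where the "linear" growth comes from. Lookups never consult a directory: they compute at most two hash values and compare the result with $p$.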
Cross-References
Extendible Hashing
Hashing
Hash-based Indexing