Lec 20-24

Indexing Structures

Professor Navneet Goyal

Department of Computer Science & Information Systems
BITS, Pilani

Topics
- Basic Concepts
- Classification of Indices
- Tree-based Indexing
- Hash-based Indexing
- Comparison

© Prof. Navneet Goyal, BITS, Pilani

Basic Concepts
- Indexing mechanisms are used to speed up access to desired data
  - E.g., the index at the end of a book
  - E.g., the author catalog in a library
- Search key: the attribute(s) used to look up records in a file
- A single file can have multiple indexes
- An index file consists of records (called index entries) of the form
  <search-key, pointer>

Basic Concepts
- Index files are typically much smaller than the original file
- Kinds of indices:
  - Ordered indices: search keys are stored in sorted order (single-level)
  - Tree indices: search keys are arranged in a tree (multi-level)
  - Hash indices: search keys are distributed uniformly across "buckets" using a "hash function"

Classification
- Single-level vs. Multi-level
- Dense vs. Sparse
- Static vs. Dynamic

Choosing an Index
- No single indexing structure is suitable for all database applications
- An index can be chosen based on the following factors:
  - Access types supported efficiently, e.g.,
    - records with a specified value in the attribute, or
    - records with an attribute value falling in a specified range of values
  - Access time
  - Insertion time
  - Deletion time
  - Space overhead

Primary Index
- An example of an ordered index
- In an ordered index, index entries are stored sorted on the search-key value, e.g., topics in a book index
- Requires the relation to be sorted on the search key
- The search key should be a KEY of the relation
  - If it is not, the index is called a clustering index
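
Primary Index
[Figure: primary index with entries 10, 30, 50, 70, 90, each pointing to the anchor record of one of five data blocks: (10, 20), (30, 40), (50, 60), (70, 80), (90, 100)]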

Primary Index
- A primary index requires that the ordering field of the data file have a distinct value for each record
- A primary index is sparse
  - It contains as many entries as there are blocks in the data file (there are 5 blocks in this example, and each block can hold only 2 records)
- The first record in each block of the data file is called the anchor record of the block, or simply the block anchor
- There can be only one primary index on a table

Clustering Index
[Figure: clustering index (Option 1) on a data file ordered by a non-key field with values 1 to 5; one index entry per distinct value]

Clustering Index
Figure taken from Elmasri, 4e

Clustering Index
Figure taken from Elmasri, 4e

Clustering Index
- The data file is sorted on a non-key field
- Retrieves the cluster of records for a given search-key value
- A clustering index is always sparse

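Secondary Index (key)
[Figure: dense secondary index on a key field of a data file that is not ordered on that field; the index holds one entry per record, with values 1 to 8 in sorted order]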

Secondary Index (key)
Figure taken from Elmasri, 4e

Secondary Index (Non-key)
- Option 1 is to include several index entries with the same index field value, one for each record
  - This would be a dense index
- Option 2 is to have variable-length records for the index entries, with a repeating field for the pointer: one pointer to each block that contains a record with the matching index field value
  - This would be a non-dense (sparse) index

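Secondary Index (Non-key)
[Figure: secondary index on the non-key field Dept # of a file with fields Emp#, SSN, Name, Dept #, DOB, SALARY. Option 1 has one index entry per record. Option 2 has one variable-length entry per distinct Dept # value:]
1: B1 (1)
2: B2 (1)
3: B3 (1), B3 (2), B3 (3), B3 (4)
4: B4 (1)
5: B5 (1)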

Secondary Index (Non-key)
- Option 3 is the most commonly used
  - Uses record pointers
  - Implemented using one level of indirection, so that index entries are of fixed length and have unique field values
Figure taken from Elmasri, 4e

Types of Single-level Indexes

                Ordering Field       Nonordering Field
Key Field       Primary Index        Secondary Index (key)
Nonkey Field    Clustering Index     Secondary Index (nonkey)

Example 1: Primary Index
EMPLOYEE(NAME, SSN, ADDRESS, JOB, SAL, ... )
Suppose that:
record size R = 100 bytes
block size B = 1024 bytes
r = 30000 records
Then, we get:
blocking factor Bfr = B div R = 1024 div 100 = 10 records/block
number of file blocks b = ⌈r/Bfr⌉ = ⌈30000/10⌉ = 3000 blocks
For a primary index on the SSN field (the file is ordered on SSN), assume the field size VSSN = 9 bytes and the block pointer size PR = 6 bytes. Then:
index entry size Ri = VSSN + PR = 9 + 6 = 15 bytes
index blocking factor Bfri = B div Ri = 1024 div 15 = 68 entries/block
number of index entries ri = b = 3000 (one per data block)
number of index blocks bi = ⌈ri/Bfri⌉ = ⌈3000/68⌉ = 45 blocks
binary search on the index needs ⌈log₂ bi⌉ = ⌈log₂ 45⌉ = 6 block accesses (+ 1 for the data block)
This is compared to the cost of binary search on the data file:
⌈log₂ b⌉ = ⌈log₂ 3000⌉ = 12 block accesses

Example 2: Secondary Index (Non-key Field)
EMPLOYEE(NAME, SSN, ADDRESS, JOB, SAL, ... )
Suppose that:
record size R = 100 bytes
block size B = 1024 bytes
r = 30000 records
Then, we get:
blocking factor Bfr = B div R = 1024 div 100 = 10 records/block
number of file blocks b = ⌈r/Bfr⌉ = ⌈30000/10⌉ = 3000 blocks
For a dense secondary index (Option 1) on the JOB field, assume the field size VJOB = 9 bytes and the block pointer size PR = 6 bytes. Then:
index entry size Ri = VJOB + PR = 9 + 6 = 15 bytes
index blocking factor Bfri = B div Ri = 1024 div 15 = 68 entries/block
number of index entries ri = r = 30000 (one per record)
number of index blocks bi = ⌈ri/Bfri⌉ = ⌈30000/68⌉ = 442 blocks
binary search on the index needs ⌈log₂ bi⌉ = ⌈log₂ 442⌉ = 9 block accesses (+ 1 for the data block)
This is compared to the linear search cost of:
b/2 = 3000/2 = 1500 block accesses on average (this average assumes the scan stops at the first match, as for a key field; to find every record with a given non-key JOB value, all b = 3000 blocks must be scanned)
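The arithmetic of Examples 1 and 2 can be checked with a short script. This is a minimal sketch that assumes unspanned records (so blocking factors are floor divisions) and rounds block counts and binary-search costs up; the function name `index_search_cost` and its parameters are hypothetical.

```python
import math


def index_search_cost(r, R, B, V, P, dense=False):
    """Block accesses for a search through a single-level index.

    r: number of records, R: record size, B: block size, V: index field size,
    P: pointer size (all sizes in bytes). A sparse (primary) index has one
    entry per data block; a dense index has one entry per record.
    """
    bfr = B // R                    # records per data block (unspanned)
    b = math.ceil(r / bfr)          # number of data blocks
    ri = r if dense else b          # number of index entries
    bfri = B // (V + P)             # index entries per index block
    bi = math.ceil(ri / bfri)       # number of index blocks
    return {
        "bfr": bfr, "b": b, "bfri": bfri, "bi": bi,
        # binary search on the index, plus 1 access for the data block
        "index_search": math.ceil(math.log2(bi)) + 1,
        # binary search directly on the data file, for comparison
        "data_binary_search": math.ceil(math.log2(b)),
    }


ex1 = index_search_cost(r=30000, R=100, B=1024, V=9, P=6)              # primary index on SSN
ex2 = index_search_cost(r=30000, R=100, B=1024, V=9, P=6, dense=True)  # dense index on JOB
print(ex1["bi"], ex1["index_search"], ex1["data_binary_search"])  # 45 7 12
print(ex2["bi"], ex2["index_search"])                              # 442 10
```

For Example 2, `data_binary_search` is not meaningful, because the file is not ordered on JOB; the slide compares against a linear scan instead.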

Properties of Single-level Indexes

Type of Index        Number of Index Entries                  Dense or Sparse   Block Anchoring
Primary              No. of blocks in data file               Sparse            Yes
Clustering           No. of distinct index field values       Sparse            Yes/no*
Secondary (key)      No. of records in data file              Dense             No
Secondary (nonkey)   No. of records**                         Dense             No
                     No. of distinct index field values***    Sparse            No

* Yes if every distinct value of the ordering field starts from a new block; no otherwise
** For Option 1
*** For Options 2 & 3

Multilevel Indexes
- In all single-level indexes, the index file is always sorted on the search key
- For an index with bi blocks, a binary search requires approximately log₂ bi block accesses
- The idea behind multilevel indexes is to reduce the part of the index file that we continue to search by a factor of bfri (the index blocking factor) at each step, instead of by a factor of 2

Multilevel Indexes
- Blocking factor = ⌊block size in bytes / record size in bytes⌋
- bfri, the blocking factor for the index, is always greater than 2
  - The search space is therefore reduced much faster
- bfri is called the fan-out (fo) of the multilevel index
- Searching a multilevel index requires approximately log_fo bi block accesses, which is a smaller number than for binary search if fo > 2

Multilevel Indexes
- An MLI treats the index file, called the first (or base) level of the MLI, as an ordered file with a distinct value for each entry
- We can create a primary index (PI) for the first level
- This index to the first level is called the second level of the MLI
- The second level is a PI, so block anchors can be used
  - The second level has one entry for each block of the first level

Multilevel Indexes
- The blocking factor for the second level and all subsequent levels is the same as that of the first-level index
- If the first level has r1 entries and the blocking factor is bfri = fo, then the first level needs ⌈r1/fo⌉ blocks
  - This is the number of entries in the second level: r2 = ⌈r1/fo⌉
- The same process can be repeated for the second level, and we get r3 = ⌈r2/fo⌉

Multilevel Indexes
- Note that we require the second level only if the first level needs more than 1 block of disk space
- Similarly, we require the third level only if the second level needs more than 1 block of disk space
- Repeat the preceding process until all the entries of some index level t fit in a single block

Multilevel Indexes
- The top level t is reached when all of its entries fit in a single block, i.e., ⌈r1/(fo)^t⌉ = 1
- Hence an MLI with r1 first-level entries will have approximately t levels, where
  t = ⌈log_fo r1⌉
- An MLI can be used for any type of index (primary, clustering, or secondary), as long as the first-level index has distinct search-key values and fixed-length entries

Multilevel Indexes
Figure taken from Elmasri, 4e

Example: MLI
EMPLOYEE(NAME, SSN, ADDRESS, JOB, SAL, ... )
Suppose that:
record size R = 100 bytes
block size B = 1024 bytes
r = 30000 records
The dense secondary index of Example 2 is converted into an MLI:
index blocking factor Bfri = B div Ri = 1024 div 15 = 68 entries/block (= fo)
number of 1st-level index blocks b1 = ⌈30000/68⌉ = 442
number of 2nd-level index blocks b2 = ⌈442/68⌉ = 7
number of 3rd-level index blocks b3 = ⌈7/68⌉ = 1, so t = 3
number of block accesses = t + 1 = 3 + 1 = 4 block accesses
This is compared to 10 block accesses using the dense secondary index of Example 2
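The level-by-level construction can be sketched as a loop that keeps dividing by the fan-out until a level fits in one block. This is a minimal sketch; the function name `multilevel_index_blocks` is hypothetical, and it assumes every index block is filled to the full blocking factor.

```python
import math


def multilevel_index_blocks(r1, fo):
    """Number of blocks at each level of a multilevel index.

    r1: entries in the first (base) level; fo: fan-out (index blocking factor).
    Each new level has one entry per block of the level below it; we stop at
    the first level that fits in a single block (the top level t).
    """
    if r1 < 1 or fo < 2:
        raise ValueError("need r1 >= 1 and fo >= 2")
    levels = []
    entries = r1
    while True:
        blocks = math.ceil(entries / fo)  # blocks needed at this level
        levels.append(blocks)
        if blocks == 1:                   # top level reached
            return levels
        entries = blocks                  # next level indexes these blocks


levels = multilevel_index_blocks(r1=30000, fo=68)
t = len(levels)
print(levels, t, t + 1)  # [442, 7, 1] 3 4
```

The number of levels agrees with t = ⌈log_fo r1⌉ = ⌈log_68 30000⌉ = 3.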

Multi-Level Indexes
- Such a multilevel index is a form of search tree; however, insertion and deletion of index entries is a severe problem because every level of the index is an ordered file

Multiple-key Access
- So far there has been an implicit assumption that the index is created on only one attribute
- In many retrieval and update requests, multiple attributes are involved
- Option 1: multiple such single-attribute indexes on a relation can be used to answer queries
- Option 2: have a composite search key

Multiple-key Access
- Example: list all employees with DNO = 4 and AGE = 59
- If DNO has an index but AGE does not: use the DNO index to retrieve the records with DNO = 4, then select those with AGE = 59
- If AGE has an index but DNO does not: use the AGE index to retrieve the records with AGE = 59, then select those with DNO = 4
- If both DNO and AGE have indexes: each gives a set of records or a set of pointers (to blocks or records) as its result. The intersection of these sets yields the records that satisfy both conditions, or the blocks in which records satisfying both conditions are located

Multiple-key Access
- All the above alternatives give the correct result
- If the sets of records that satisfy each condition (DNO = 4 or AGE = 59) individually are large, yet only a few records satisfy the combined condition, then none of the above techniques is efficient
- Try a composite search key <DNO, AGE> or <AGE, DNO> instead
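The two options can be contrasted with in-memory dictionaries standing in for index files. This is a sketch on hypothetical EMPLOYEE data, where each "index" maps a search-key value to a set of record ids that play the role of record pointers.

```python
# Hypothetical EMPLOYEE file: record id -> (DNO, AGE)
employees = {1: (4, 59), 2: (4, 30), 3: (5, 59), 4: (4, 59), 5: (2, 41), 6: (4, 45)}


def build_index(records, key):
    """Map each search-key value to the set of record ids (record pointers)."""
    index = {}
    for rid, rec in records.items():
        index.setdefault(key(rec), set()).add(rid)
    return index


# Option 1: two single-attribute indexes, then intersect the pointer sets
dno_index = build_index(employees, key=lambda rec: rec[0])
age_index = build_index(employees, key=lambda rec: rec[1])
both = dno_index.get(4, set()) & age_index.get(59, set())

# Option 2: one index on the composite search key <DNO, AGE>
dno_age_index = build_index(employees, key=lambda rec: (rec[0], rec[1]))
direct = dno_age_index.get((4, 59), set())

print(sorted(both), sorted(direct))  # [1, 4] [1, 4]
```

Both give the same answer, but Option 1 first materializes {1, 2, 4, 6} and {1, 3, 4} before intersecting, while the composite key goes straight to {1, 4}; with large individual result sets that difference is what makes the composite key attractive.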

Index Update
- Insert
- Delete
- Update (first delete, then insert)
- Compare single-level and multilevel indexes
- DO IT YOURSELF!!!

Indexed Sequential File
- A common file organization used in data processing
- An ordered file with a multilevel primary index on its ordering key field
  - Called an indexed sequential file
- Used in a large number of early IBM systems
- Insertions are handled by some form of overflow file that is merged periodically with the data file
- The index is recreated during file reorganization

IBM's ISAM
- Indexed Sequential Access Method
- 2-level index
- Closely related to the organization of the disk

Tree-based Indexing
- ISAM, B-trees, and B+-trees
- Based on tree data structures
- Provide:
  - Efficient support for range queries
  - Efficient support for insertion and deletion
  - Support for equality queries (not as efficient as hash-based indexes)
- ISAM is static, whereas B-trees and B+-trees are dynamic: they adjust gracefully under inserts and deletes

Search Tree
- A search tree is a special type of tree that is used to guide the search for a record, given the value of its search key
- An MLI is a variation of the search tree
[Figure: a node in a search tree with pointers to subtrees below it]

Search Tree
[Figure: a search tree of order p = 3]

Search Tree
- Each key value in the tree is associated with a pointer to the record in the data file having that value
  - Alternatively, the pointer could be to the disk block containing the record
- The search tree itself can be stored on disk by assigning each tree node to a disk block

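Search Tree
Constraints, for a node <P1, K1, P2, K2, ..., Pq-1, Kq-1, Pq> with q ≤ p:
- Search keys within a node are ordered (increasing from left to right): K1 < K2 < ... < Kq-1
- For all values X in the subtree pointed to by Pi, we have:
  - X < K1 for i = 1
  - K(i-1) < X < Ki for 1 < i < q
  - X > K(q-1) for i = q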
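The three cases of the constraint are exactly what a search uses to pick which pointer to follow inside one node. Below is a minimal sketch of that single routing step, assuming the node's keys are distinct and held in a Python list; the function name `route` is hypothetical.

```python
from bisect import bisect_left


def route(keys, x):
    """Decide where a search for x continues within one search-tree node.

    keys: the node's ordered keys [K1, K2, ..., Kq-1] (distinct values).
    Returns ("found", i) if x == Ki, otherwise ("follow", i), meaning follow
    subtree pointer Pi (1-based numbering, as in the constraints above).
    """
    j = bisect_left(keys, x)  # number of keys strictly less than x
    if j < len(keys) and keys[j] == x:
        return ("found", j + 1)
    return ("follow", j + 1)


node = [5, 8]            # a full node of a search tree of order p = 3
print(route(node, 3))    # ('follow', 1): X < K1
print(route(node, 6))    # ('follow', 2): K1 < X < K2
print(route(node, 9))    # ('follow', 3): X > K2
print(route(node, 8))    # ('found', 2)
```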

Search Tree
- Algorithms for inserts and deletes do not guarantee that a search tree is balanced
- Keeping a search tree balanced HELPS!!
  - It yields a uniform search speed regardless of the value of the search key
- Deletions may lead to nearly empty nodes, thus wasting space and increasing the number of levels

B-Tree
- A B-tree has additional constraints that ensure that the tree is always balanced and that the space wasted by deletion is never excessive
- Algorithms for inserts and deletes are more complex in order to maintain these additional constraints
  - They are mostly simple
  - They become complicated only when inserts and deletes lead to splitting and merging of nodes, respectively

B-Tree
- One or two levels of index are often very helpful in speeding up queries
- A more general structure is used in commercial systems
- This family of data structures is called B-trees, and the particular variant that is most often used is known as the B+-tree

B-Tree: Characteristics
- Automatically maintains as many levels of index as is appropriate for the size of the file being indexed
- Manages space on the blocks it uses so that every block is between half full and completely full
- Each node corresponds to a disk block

Structure of B-Trees
- Balanced tree
  - All paths from the root to a leaf have the same length
- Three layers in a B-tree:
  - Root
  - Intermediate layer
  - Leaves
- A parameter n is associated with each B-tree
  - Each node has room for n search keys and n+1 pointers
  - Pick n to be as large as will allow n+1 pointers and n keys to fit in one block

Example
- Block size = 4096 bytes
- Search key: 4-byte integer
- Pointer: 8 bytes
- Assume no header information is kept in the block
- We choose the largest n such that
  4n + 8(n+1) ≤ 4096, i.e., 12n ≤ 4088
- n = 340
  - A block can hold 340 keys and 341 pointers
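The same inequality can be solved for any block, key, and pointer size. This is a small sketch; the function name `max_keys` and the optional `header_size` parameter are hypothetical additions (the slide assumes no header).

```python
def max_keys(block_size, key_size, ptr_size, header_size=0):
    """Largest n such that n keys and n+1 pointers fit in one block:
    key_size*n + ptr_size*(n+1) <= block_size - header_size."""
    return (block_size - header_size - ptr_size) // (key_size + ptr_size)


n = max_keys(4096, 4, 8)
print(n, 4 * n + 8 * (n + 1))  # 340 4088
```

With n = 341 the node would need 4·341 + 8·342 = 4100 bytes, which exceeds the block.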

B-Trees & B+-Trees
- An insertion into a node that is not full is quite efficient; if a node is full, the insertion causes a split into two nodes
  - Splitting may propagate to other tree levels
- A deletion is quite efficient if a node does not become less than half full
- If a deletion causes a node to become less than half full, it must be merged with neighboring nodes

Difference between B-tree and B+-tree
- In a B-tree, pointers to data records exist at all levels of the tree
- In a B+-tree, all pointers to data records exist at the leaf-level nodes
- A B+-tree can have fewer levels (or a higher capacity of search values) than the corresponding B-tree, since its interior nodes hold no data pointers and therefore have room for more keys

Rules for B-Trees
- At the root, there are at least two used pointers. All pointers point to B-tree blocks at the lower level
- At a leaf, the last pointer points to the next leaf block to the right, i.e., to the block with the next higher keys
- Among the other n pointers in a leaf, at least ⌊(n+1)/2⌋ are used to point to data records; unused pointers can be thought of as null and do not point anywhere
  - The ith pointer, if it is used, points to a record with the ith key

Rules for B-Trees
- At any interior node, all the n+1 pointers can be used to point to B-tree blocks at the next lower level
  - At least ⌈(n+1)/2⌉ of them are actually used
- If j pointers are used, then there will be j-1 keys, k1, k2, ..., k(j-1)
  - The 1st pointer points to a part of the B-tree where some of the records with keys less than k1 will be found
  - The 2nd pointer goes to that part of the tree where all the records with keys that are at least k1, but less than k2, will be found, and so on
  - Finally, the jth pointer gets us to that part of the B-tree where some of the records with keys greater than or equal to k(j-1) are found

Rules for B-Trees
- Note that some of the records with keys far below k1 or far above k(j-1) may not be reachable from this block at all, but will be reached via another block at the same level
- The nodes at any level, left to right, contain keys in non-decreasing order
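A lookup over nodes laid out by these rules might look like the sketch below: interior pointer i covers keys at least k(i-1) and less than ki, leaves pair the ith key with the ith record pointer, and the last leaf pointer links to the next leaf, which is what makes range scans cheap. The class names `Leaf` and `Interior`, the record strings, and the example keys are illustrative assumptions; the tiny tree is built by hand with n = 3, and insertion with splitting is not shown.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class Leaf:
    keys: List[int]                  # sorted keys stored in this leaf
    records: List[Any]               # ith record pointer goes with the ith key
    next: Optional["Leaf"] = None    # last pointer: the next leaf to the right


@dataclass
class Interior:
    keys: List[int]                  # k1, ..., k(j-1)
    children: List[Any]              # the j used pointers


def find_leaf(node, x):
    """Descend from the root to the leaf whose key range covers x."""
    while isinstance(node, Interior):
        # child i (0-based) covers keys >= keys[i-1] and < keys[i]
        node = node.children[bisect_right(node.keys, x)]
    return node


def lookup(root, x):
    leaf = find_leaf(root, x)
    i = bisect_left(leaf.keys, x)
    if i < len(leaf.keys) and leaf.keys[i] == x:
        return leaf.records[i]
    return None


def range_scan(root, lo, hi):
    """Yield records with lo <= key <= hi by following the leaf chain."""
    leaf = find_leaf(root, lo)
    while leaf is not None:
        for k, rec in zip(leaf.keys, leaf.records):
            if k > hi:
                return
            if k >= lo:
                yield rec
        leaf = leaf.next


# A two-level tree with n = 3 keys per node
l4 = Leaf([23, 29, 31], ["r23", "r29", "r31"])
l3 = Leaf([13, 17, 19], ["r13", "r17", "r19"], l4)
l2 = Leaf([7, 11], ["r7", "r11"], l3)
l1 = Leaf([2, 3, 5], ["r2", "r3", "r5"], l2)
root = Interior([7, 13, 23], [l1, l2, l3, l4])

print(lookup(root, 17))               # r17
print(lookup(root, 6))                # None
print(list(range_scan(root, 5, 13)))  # ['r5', 'r7', 'r11', 'r13']
```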

Hash-based Indexing
- Intuition behind hash-based indexes
  - Good for equality searches
  - Useless for range searches
- Static hashing
- Dynamic hashing
  - Extendible hashing
  - Linear hashing

Static Hashing
- A bucket is a unit of storage containing one or more records (a bucket is typically a disk block)
- In a hash file organization, we obtain the bucket of a record directly from its search-key value using a hash function
- A hash function h is a function from the set of all search-key values K to the set of all bucket addresses B
- The hash function is used to locate records for access, insertion, and deletion
- Records with different search-key values may be mapped to the same bucket; thus the entire bucket has to be searched sequentially to locate a record

Static Hashing
Hash file organization of the account file, using branch_name as the key
- There are 10 buckets
- The binary representation of the ith character of the alphabet is assumed to be the integer i
- The hash function returns the sum of the binary representations of the characters modulo 10
- E.g., h(Perryridge) = 5, h(Round Hill) = 3, h(Brighton) = 3
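The example hash function can be written in a few lines and reproduces the three values above. This is a sketch under two assumptions not stated on the slide: upper and lower case letters map to the same integer, and non-letters (such as the space in "Round Hill") contribute nothing.

```python
def h(branch_name, n_buckets=10):
    """Sum of the letters' alphabet positions (a = 1, ..., z = 26), modulo n_buckets.

    Assumptions: upper and lower case are treated alike, and non-letters
    (such as the space in "Round Hill") contribute nothing.
    """
    total = sum(ord(c) - ord("a") + 1 for c in branch_name.lower() if "a" <= c <= "z")
    return total % n_buckets


for name in ["Perryridge", "Round Hill", "Brighton"]:
    print(name, h(name))
# Perryridge 5
# Round Hill 3
# Brighton 3
```

Round Hill and Brighton collide in bucket 3, which is why the whole bucket must be searched when looking up either key.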

Static Hashing

Hash Functions
- The worst hash function maps all search-key values to the same bucket; this makes access time proportional to the number of search-key values in the file
- An ideal hash function is uniform, i.e., each bucket is assigned the same number of search-key values from the set of all possible values
- An ideal hash function is also random, so each bucket will have the same number of records assigned to it irrespective of the actual distribution of search-key values in the file
- Typical hash functions perform computation on the internal binary representation of the search key

Bucket Overflow
- Bucket overflow can occur because of
  - Insufficient buckets
  - Skew in the distribution of records. This can occur for two reasons:
    - multiple records have the same search-key value
    - the chosen hash function produces a non-uniform distribution of key values
- Although the probability of bucket overflow can be reduced, it cannot be eliminated; it is handled by using overflow buckets

Bucket Overflows
- Overflow chaining: the overflow buckets of a given bucket are chained together in a linked list
- The above scheme is called closed hashing (terminology varies: many data-structures texts call chaining "open hashing")
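Overflow chaining can be sketched as buckets of fixed capacity, each carrying a pointer to its next overflow bucket. This is a minimal in-memory sketch; the class names, the identity hash `lambda k: k` on integer keys, and the capacity of 2 records per bucket are all hypothetical choices for illustration.

```python
class Bucket:
    def __init__(self, capacity):
        self.capacity = capacity
        self.records = []        # (search key, record) pairs
        self.overflow = None     # next overflow bucket in the chain


class StaticHashFile:
    """Static hashing: a fixed number of buckets, each a chain of
    fixed-capacity buckets linked through their overflow pointers."""

    def __init__(self, n_buckets, capacity, hash_fn):
        self.n_buckets = n_buckets
        self.hash_fn = hash_fn
        self.buckets = [Bucket(capacity) for _ in range(n_buckets)]

    def _home(self, key):
        return self.buckets[self.hash_fn(key) % self.n_buckets]

    def insert(self, key, record):
        b = self._home(key)
        while len(b.records) >= b.capacity:       # find the first non-full bucket
            if b.overflow is None:
                b.overflow = Bucket(b.capacity)   # chain a new overflow bucket
            b = b.overflow
        b.records.append((key, record))

    def lookup(self, key):
        # Different keys can share a bucket, so the whole chain is scanned.
        found, b = [], self._home(key)
        while b is not None:
            found.extend(rec for k, rec in b.records if k == key)
            b = b.overflow
        return found

    def chain_length(self, i):
        n, b = 0, self.buckets[i]
        while b is not None:
            n, b = n + 1, b.overflow
        return n


f = StaticHashFile(n_buckets=10, capacity=2, hash_fn=lambda k: k)
for key, rec in [(3, "a"), (13, "b"), (23, "c"), (7, "d")]:
    f.insert(key, rec)
print(f.chain_length(3), f.lookup(23), f.lookup(13))  # 2 ['c'] ['b']
```

Keys 3, 13 and 23 all hash to bucket 3, so the third one lands in an overflow bucket; as the file grows, such chains get longer, which is the deficiency that dynamic hashing addresses.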

Bucket Overflows

Hash Indexes
- Hashing can be used not only for file organization, but also for index-structure creation
- A hash index organizes the search keys, with their associated record pointers, into a hash file structure
- Strictly speaking, hash indices are always secondary indices
  - If the file itself is organized using hashing, a separate primary hash index on it using the same search key is unnecessary
  - However, we use the term hash index to refer to both secondary index structures and hash-organized files

Example of Hash Index

Deficiencies of Static Hashing
- Databases grow with time. If the initial number of buckets is too small, performance will degrade due to too many overflows
- If the file size at some point in the future is anticipated and the number of buckets is allocated accordingly, a significant amount of space will be wasted initially
- If the database shrinks, again space will be wasted
- One option is periodic reorganization of the file with a new hash function, but it is very expensive
These problems can be avoided by using techniques that allow the number of buckets to be modified dynamically.

Dynamic Hashing
- Long overflow chains can develop and degrade performance
- Extendible and linear hashing: dynamic techniques to fix this problem

Extendible Hashing
- When a new data entry is inserted into a full bucket, we can either:
  - add an overflow page, or
  - reorganize the file using double the number of buckets and redistribute the entries
- Drawback of reorganizing: the entire file has to be read, and twice as many pages have to be written

Extendible Hashing
- Idea: use a directory of pointers to buckets; double the number of buckets by doubling the directory and splitting just the bucket that overflowed!
- The directory is much smaller than the file, so doubling it is much cheaper. Only one page of data entries is split. No overflow page!
- The trick lies in how the hash function is adjusted!

Extendible Hashing
- The directory is an array of size 4 (global depth 2)
- To find the bucket for r, take the last "global depth" number of bits of h(r); we denote r by h(r)
  - If h(r) = 5 = binary 101, it is in the bucket pointed to by 01
[Figure: directory entries 00, 01, 10, 11 (global depth 2) pointing to four buckets, each of local depth 2:
 Bucket A (00): 4* 12* 32* 16*
 Bucket B (01): 1* 5* 21* 13*
 Bucket C (10): 10*
 Bucket D (11): 15* 7* 19*]

Extendible Hashing
- Insert: if the bucket is full, split it (allocate a new page, redistribute)
- If necessary, double the directory. (As we will see, splitting a bucket does not always require doubling; we can tell by comparing the global depth with the local depth of the split bucket.)

Insert h(r) = 20 (Causes Doubling)
[Figure: before the insert, global depth 2 with buckets A, B, C, D as above. After inserting 20*, the directory is doubled to global depth 3 (entries 000 to 111):
 Bucket A (000), local depth 3: 32* 16*
 Bucket B (001, 101), local depth 2: 1* 5* 21* 13*
 Bucket C (010, 110), local depth 2: 10*
 Bucket D (011, 111), local depth 2: 15* 7* 19*
 Bucket A2 (100), local depth 3: 4* 12* 20* (the "split image" of Bucket A)]

Points to Note
- 20 = binary 10100. The last 2 bits (00) tell us r belongs in A or A2; the last 3 bits are needed to tell which
- Global depth of the directory: the maximum number of bits needed to tell which bucket an entry belongs to
- Local depth of a bucket: the number of bits used to determine if an entry belongs to this bucket
- When does a bucket split cause directory doubling?
  - Before the insert, the local depth of the bucket = global depth. The insert causes the local depth to become > global depth; the directory is doubled by copying it over and "fixing" the pointer to the split image page. (Use of least significant bits enables efficient doubling via copying of the directory.)
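
Points to Note
- Does splitting a bucket always necessitate a directory doubling?
  - Try inserting 9* (9 = binary 1001)
  - It belongs to bucket B (directory entry 001), which is already full
  - Split bucket B and use directory elements 001 and 101 to point to the bucket and its split image; since B's local depth (2) is less than the global depth (3), no doubling is needed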
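The insert, split, and doubling steps of the last few slides can be traced with a short sketch. It assumes integer keys with h(k) = k (as on the slides, where r is denoted by h(r)) and a capacity of 4 entries per bucket, as in the figures; the class name `ExtendibleHashFile` and the `max_depth` guard are hypothetical, and deletion/merging is not shown.

```python
class ExtendibleHashFile:
    """Extendible hashing on integer keys, using h(k) = k as on the slides.

    The directory is indexed by the last `global_depth` bits of h(k); each
    bucket records its own local depth. Buckets are dicts so that several
    directory entries can share (point to) the same bucket object.
    """

    def __init__(self, capacity=4, global_depth=2, max_depth=32):
        self.capacity = capacity
        self.global_depth = global_depth
        self.max_depth = max_depth
        self.directory = [{"depth": global_depth, "keys": []}
                          for _ in range(2 ** global_depth)]

    def find_bucket(self, key):
        mask = (1 << self.global_depth) - 1   # keep the last global_depth bits
        return self.directory[key & mask]

    def insert(self, key):
        bucket = self.find_bucket(key)
        if len(bucket["keys"]) < self.capacity:
            bucket["keys"].append(key)
            return
        if bucket["depth"] >= self.max_depth:
            raise OverflowError("too many entries share the same hash bits")
        if bucket["depth"] == self.global_depth:
            # Local depth is about to exceed global depth: double the directory.
            # With least significant bits, doubling is just copying.
            self.directory = self.directory + self.directory
            self.global_depth += 1
        # Split: the new local depth uses one more bit to tell the two buckets apart
        bucket["depth"] += 1
        image = {"depth": bucket["depth"], "keys": []}   # the split image
        new_bit = 1 << (bucket["depth"] - 1)
        for i, b in enumerate(self.directory):
            if b is bucket and i & new_bit:
                self.directory[i] = image
        old_keys, bucket["keys"] = bucket["keys"], []
        for k in old_keys:                    # redistribute the old entries
            self.find_bucket(k)["keys"].append(k)
        self.insert(key)                      # retry; may split again


eh = ExtendibleHashFile(capacity=4, global_depth=2)
for k in [4, 12, 32, 16, 1, 5, 21, 13, 10, 15, 7, 19]:   # buckets A to D
    eh.insert(k)
eh.insert(20)   # splits A and doubles the directory
print(eh.global_depth, sorted(eh.find_bucket(32)["keys"]), sorted(eh.find_bucket(20)["keys"]))
# 3 [16, 32] [4, 12, 20]
eh.insert(9)    # splits B without doubling
print(eh.global_depth, sorted(eh.find_bucket(9)["keys"]), sorted(eh.find_bucket(5)["keys"]))
# 3 [1, 9] [5, 13, 21]
```

The `max_depth` guard reflects the point made later about multiple entries with the same hash value: splitting can never separate them, so without some limit the splitting would not terminate.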

Points to Ponder
- Why use LSB, why not MSB?
- What if a bucket becomes empty?

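Directory Doubling
Why use least significant bits in the directory?
➳ Allows for doubling via copying!
[Figure: a directory containing 6* (6 = binary 110) doubled from global depth 1 to 2 to 3, using least significant bits (left) and most significant bits (right). With least significant bits, the doubled directory is the old directory followed by a copy of itself; with most significant bits, the old entries must be interleaved.]
Least Significant vs. Most Significant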

Extendible Hashing: Deletion
Two options:
- No merging of blocks
- Merge blocks and cut the directory if possible
  - Example: run through the insert example in reverse!

Comments on Extendible Hashing
- If the directory fits in memory, an equality search is answered with one disk access; otherwise two
  - A 100MB file with 100 bytes/record and 4K pages contains 1,000,000 records (as data entries) and 25,000 directory elements; chances are high that the directory will fit in memory
- The directory grows in spurts and, if the distribution of hash values is skewed, the directory can grow large
- Multiple entries with the same hash value cause problems! (Splitting cannot separate them, so once they exceed a bucket's capacity, overflow pages are needed)

Summary of Extendible Hashing
- Good: can handle growing files
  - with less wasted space
  - with no full reorganizations
- Bad:
  - Indirection: not bad if the directory is in memory
  - Directory size could be large

Linear Hashing
- Linear hashing is an alternative mechanism that avoids these disadvantages at the possible cost of more bucket overflows
- Motivation: extendible hashing uses a directory that grows by doubling... Can we do better? (smoother growth)
- LH: split buckets from left to right, regardless of which one overflowed (simple, but it works!!)

Linear Hashing
- Does not require a directory
- LH provides a way to keep chains from growing too large on average
- It accomplishes this by expanding the address space gracefully, one chain at a time
- This is achieved using chain splitting

Linear Hashing: Example
Suppose M = 3, i.e., three chains [0], [1], and [2], with h(x) = x mod M
[1] = {106, 217, 151, 418, 379}
How can we enlarge the space for this file?
Three issues with chain splitting:
- How can a chain be split?
- Which chain should be split?
- When should a chain be split?

Linear Hashing: Example
- How can a chain be split?
  - Split a chain [m] evenly into two chains using a mod function
  - Since we want to expand the address space, the modulus of the hash function need not stay M
  - Use 2M to rehash the records in [m]
  - On average, mod 2M will hash half of the records to chain [m], and the other half to chain [M+m]
[1] = {106, 217, 151, 418, 379}
Rehash using mod 2M (= 6):
[1] = {217, 151, 379}
[4] = {106, 418}
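The split rule reduces to one comparison per record. This is a small sketch; the function name `split_chain` is hypothetical. It relies on the fact that every record x in chain [m] satisfies x mod M = m, so x mod 2M can only be m or M + m.

```python
def split_chain(chain, m, M):
    """Split chain [m] of a linear-hash file that has M chains per round.

    Every record x in [m] satisfies x mod M == m, so x mod 2M is either m
    (the record stays) or M + m (the record moves to the new chain).
    """
    stay = [x for x in chain if x % (2 * M) == m]
    move = [x for x in chain if x % (2 * M) == M + m]
    return stay, move


stay, move = split_chain([106, 217, 151, 418, 379], m=1, M=3)
print(stay, move)  # [217, 151, 379] [106, 418]
```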

Linear Hashing: Example
- Which chain should be split?
- The possibilities are:
  - Split chain [0]: this will create chain [3]
  - Split chain [1]: this will create chain [4]
  - Split chain [2]: this will create chain [5]
- Linear hashing gets its name from the fact that chains are designated linearly for splitting
  - In the example, we will first split chain [0], then [1], and then [2]
  - Note that this is independent of where the insertions are taking place

Linear Hashing: Example
- Which chain should be split? Starting from M = 3 chains [0], [1], [2], all addressed with mod 3:
  - After splitting [0]: chains [0] to [3]; [0] and [3] use mod 6, while [1] and [2] still use mod 3
  - After splitting [1]: chains [0] to [4]; [0], [1], [3], and [4] use mod 6, while [2] still uses mod 3
  - After splitting [2]: chains [0] to [5] all use mod 6, so M becomes 6

Linear Hashing: Example
Initially: h(x) = x mod N (N = 4 here)
Assume 3 records/bucket
Insert 17: 17 mod 4 = 1

Bucket id:  0     | 1         | 2 | 3
Records:    4, 8  | 5, 9, 13  | 6 | 7, 11

- Overflow for bucket 1 (it already holds 3 records)
- Split bucket 0, anyway!!

Linear Hashing: Example
To split bucket 0, use another function h1(x):
h0(x) = x mod N, h1(x) = x mod (2*N)
[Figure: the split pointer at bucket 0; buckets 0 to 3 hold 4, 8 | 5, 9, 13 | 6 | 7, 11, and 17 sits in an overflow page of bucket 1]
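The example can be continued with a small sketch that keeps a split pointer and uses h0 or h1 depending on whether the target bucket has already been split in the current round. The class name `LinearHashFile` is hypothetical, and two policies are assumptions: a bucket's list stands for its primary page plus any overflow pages, and a split is triggered whenever an insert makes any bucket overflow.

```python
class LinearHashFile:
    """Linear hashing on integer keys: buckets are split in order 0, 1, 2, ...
    (tracked by a split pointer), regardless of which bucket overflowed.

    Assumptions: each bucket is a list whose first `capacity` entries form the
    primary page and any extra entries stand for overflow pages, and a split
    is triggered whenever an insert makes some bucket overflow.
    """

    def __init__(self, N=4, capacity=3):
        self.N = N            # initial number of buckets
        self.level = 0        # round number; h0(x) = x mod (N * 2**level)
        self.next = 0         # split pointer
        self.capacity = capacity
        self.buckets = [[] for _ in range(N)]

    def _address(self, x):
        n = self.N * 2 ** self.level
        b = x % n                  # h0 for this round
        if b < self.next:          # bucket b was already split in this round
            b = x % (2 * n)        # so use h1 instead
        return b

    def insert(self, x):
        b = self._address(x)
        self.buckets[b].append(x)
        if len(self.buckets[b]) > self.capacity:   # overflow
            self._split()

    def _split(self):
        n = self.N * 2 ** self.level
        old = self.buckets[self.next]
        # Rehash the bucket at the split pointer with h1: keys stay or move to next + n
        self.buckets[self.next] = [x for x in old if x % (2 * n) == self.next]
        self.buckets.append([x for x in old if x % (2 * n) != self.next])
        self.next += 1
        if self.next == n:         # every bucket of this round has been split
            self.level, self.next = self.level + 1, 0


lh = LinearHashFile(N=4, capacity=3)
for x in [4, 8, 5, 9, 13, 6, 7, 11]:   # the buckets shown on the slide
    lh.insert(x)
lh.insert(17)   # 17 mod 4 = 1 overflows bucket 1, so bucket 0 is split
print(lh.buckets)  # [[8], [5, 9, 13, 17], [6], [7, 11], [4]]
```

Bucket 1 keeps its overflow entry for now; it will be split only when the split pointer reaches it, which is the trade-off of "more bucket overflows" mentioned earlier in exchange for not needing a directory.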

Q&A
Thank You
