Unit 5
Ch. Sandeep
II Year - II Sem
UNIT – VI
www.company.com
Sandeep Ch Dept of CSE
Overview of storage & indexing:
• A row in a relation is called a tuple or a record.
• A record is simply a collection of fields.
• A file is a sequence (collection) of records.
• A file stores a given relation (table).
• Each record in a file has a unique identifier called the ‘record id’ or ‘rid’.
• On disk, data is stored in files.
• Indexing is a technique that helps when a collection of records must be accessed in multiple ways.
Data on external storage
• Since a DBMS stores large quantities of data, the data is kept on external storage devices such as disks and tapes.
• Data stored on disk is fetched into main memory when it is needed for processing.
• Information is generally read from disk or written to disk in units called ‘pages’.
• Data is read into main memory for processing and written back to disk for stable storage. This is done by the “buffer manager”.
• Disks are the most important storage devices.
• Space on the disk is managed by the “disk space manager”.
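The page-in, page-out role of the buffer manager can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the class name `BufferPool`, the 4096-byte page size, the dictionary standing in for the disk, and the least-recently-used replacement policy (the slides do not name a policy).

```python
from collections import OrderedDict

PAGE_SIZE = 4096  # assumed page size in bytes


class BufferPool:
    """Toy buffer manager: caches disk pages in memory frames (LRU eviction)."""

    def __init__(self, disk, capacity):
        self.disk = disk             # dict: page number -> bytes (stands in for the disk)
        self.capacity = capacity     # number of page frames available in main memory
        self.frames = OrderedDict()  # page number -> (bytearray, dirty flag)

    def get_page(self, page_no):
        """Return the in-memory copy of a page, reading it from disk if needed."""
        if page_no in self.frames:
            self.frames.move_to_end(page_no)  # mark as most recently used
            return self.frames[page_no][0]
        if len(self.frames) >= self.capacity:
            self._evict()
        data = bytearray(self.disk.get(page_no, bytes(PAGE_SIZE)))
        self.frames[page_no] = (data, False)
        return data

    def mark_dirty(self, page_no):
        """Record that the in-memory page was modified."""
        data, _ = self.frames[page_no]
        self.frames[page_no] = (data, True)

    def _evict(self):
        # Remove the least recently used page; write it back only if modified.
        page_no, (data, dirty) = self.frames.popitem(last=False)
        if dirty:
            self.disk[page_no] = bytes(data)


disk = {}
pool = BufferPool(disk, capacity=2)
page = pool.get_page(0)
page[0:2] = b"hi"      # modify the page in main memory
pool.mark_dirty(0)
pool.get_page(1)
pool.get_page(2)       # pool is full, so page 0 is evicted and written back
```

A real buffer manager also tracks pin counts and coordinates with recovery; those parts are left out of this sketch.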
File Organization and Indexing
Five of the common file organizations are:
• The pile
• The sequential file
• The indexed sequential file
• The indexed file
• The direct, or hashed, file
The Pile
• Least complicated form of file organization
• Data are collected in the order they arrive
• Each record consists of one burst of data
• Purpose is simply to accumulate the mass of data and save it
• Record access is by exhaustive search
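A minimal Python sketch of a pile, assuming records are dictionaries and that the append position serves as a hypothetical record id (both are illustrative choices, not part of the slides):

```python
class PileFile:
    """Toy pile: records are kept in arrival order and found by scanning."""

    def __init__(self):
        self.records = []

    def append(self, record):
        # New data is simply added at the end, in the order it arrives.
        self.records.append(record)
        return len(self.records) - 1  # position used as a hypothetical rid

    def search(self, field, value):
        # Exhaustive search: every record is examined.
        return [r for r in self.records if r.get(field) == value]


pile = PileFile()
pile.append({"id": 7, "name": "Ravi"})
pile.append({"id": 3, "name": "Asha"})
pile.append({"id": 9, "name": "Ravi"})
matches = pile.search("name", "Ravi")
```

The search cost grows with the number of records, which is why the pile suits accumulation more than retrieval.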
The Sequential File
Indexing
Indexes
Primary Index
Clustering Indexes
Secondary Indexes
• If an index is built on a non-ordering field of the file, it is called a secondary index.
• A secondary index file is an index file used to index fields other than the ordering field of the data file.
• The secondary index may be on a candidate key with unique values or on a non-key field with duplicate values.
• The index is an ordered file with two fields.
• The first field is of the same data type as some non-ordering field of the data file; this is the indexing field.
• The second field is either a block pointer or a record pointer.
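The two-field structure described above can be sketched in Python. The sample records, the field names and the use of a list position as the record pointer are assumptions made for illustration only.

```python
from bisect import bisect_left

# Data file ordered on "id"; "dept" is a non-ordering field with duplicates.
data_file = [
    {"id": 1, "name": "Asha", "dept": "CSE"},
    {"id": 2, "name": "Bala", "dept": "ECE"},
    {"id": 3, "name": "Chitra", "dept": "CSE"},
    {"id": 4, "name": "Dev", "dept": "EEE"},
]


def build_secondary_index(records, field):
    # Each entry is (indexing field value, record pointer), kept in sorted order.
    return sorted((rec[field], pos) for pos, rec in enumerate(records))


def lookup(index, records, value):
    # Binary search for the first entry with this value, then follow pointers.
    i = bisect_left(index, (value,))
    found = []
    while i < len(index) and index[i][0] == value:
        found.append(records[index[i][1]])
        i += 1
    return found


dept_index = build_secondary_index(data_file, "dept")
cse_students = lookup(dept_index, data_file, "CSE")
```

Because the index itself is ordered, a binary search locates the entries even though the data file is not ordered on `dept`.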
MULTILEVEL INDEXES
Hash Based Indexing
• Static Hashing:
• In static hashing, when a search-key value is provided, the hash function always computes the same address.
• For example, if we generate the address for STUDENT_ID = 76 using a mod 5 hash function, it always results in the same bucket address, 1 (76 mod 5 = 1).
• The bucket address never changes. Hence the number of data buckets in memory remains constant throughout.
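A minimal Python sketch of static hashing with a mod 5 hash function. The function and variable names are illustrative assumptions; the bucket count is fixed at five to match the example.

```python
NUM_BUCKETS = 5  # fixed for the lifetime of the file in static hashing


def bucket_address(student_id, num_buckets=NUM_BUCKETS):
    # The same key always maps to the same bucket.
    return student_id % num_buckets


buckets = [[] for _ in range(NUM_BUCKETS)]


def insert(student_id):
    buckets[bucket_address(student_id)].append(student_id)


insert(76)
insert(12)
address = bucket_address(76)  # 76 % 5 == 1
```

The list `buckets` never changes length, which is the defining property of the static scheme.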
• Suppose we want to insert a new record into the file, but the data bucket address generated by the hash function is not empty, i.e. data already exists at that address.
• This is a critical situation to handle. In static hashing it is called bucket overflow.
• Several methods are available to overcome this situation.
• Some commonly used methods are:
• Open Hashing:
• In the open hashing method, the next available data bucket is used to store the new record instead of overwriting the older one. This method is also called linear probing.
• For example, D3 is a new record that needs to be inserted, and the hash function generates the address 105. But that bucket is already full, so the system searches for the next available data bucket, 123, and assigns D3 to it.
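Linear probing can be sketched in Python as follows. The table size of 10, the keys and the record labels are assumptions chosen to keep the example small; they are not the addresses from the example above.

```python
def insert_linear(table, key, record):
    """Insert into a fixed-size table, stepping to the next bucket on overflow."""
    n = len(table)
    home = key % n                    # address produced by the hash function
    for step in range(n):
        slot = (home + step) % n      # try home, home+1, home+2, ... (wrapping)
        if table[slot] is None:
            table[slot] = (key, record)
            return slot
    raise RuntimeError("hash table is full")


table = [None] * 10
slot_d1 = insert_linear(table, 105, "D1")  # home bucket 5 is free
slot_d3 = insert_linear(table, 115, "D3")  # home bucket 5 is taken, so bucket 6 is used
```

A lookup has to follow the same probe sequence, starting at the home bucket and stopping at the first empty one.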
• Quadratic probing:
• Quadratic probing is very similar to open hashing (linear probing). In linear probing the difference between the old and the new bucket address is linear (a fixed step); in quadratic probing a quadratic function is used to determine the new bucket address.
• Double Hashing:
• Double hashing is another method similar to linear probing. Here the difference between the old and the new bucket is fixed, as in linear probing, but this fixed difference is calculated using a second hash function. That is why it is called double hashing.
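The two probe sequences can be compared in a short Python sketch. The table size of 11, the probe formula `home + i*i`, and the second hash function `1 + key % (n - 1)` are common textbook choices assumed here for illustration; the slides do not specify them.

```python
def quadratic_probe(key, i, n):
    # i-th attempt lands i*i buckets past the home bucket.
    return (key % n + i * i) % n


def second_hash(key, n):
    # Assumed second hash function; it never returns 0, so the probe always moves.
    return 1 + key % (n - 1)


def double_hash_probe(key, i, n):
    # Fixed step per key, but the step size comes from the second hash function.
    return (key % n + i * second_hash(key, n)) % n


def insert(table, key, probe):
    n = len(table)
    for i in range(n):
        slot = probe(key, i, n)
        if table[slot] is None:
            table[slot] = key
            return slot
    raise RuntimeError("no free bucket found within n probes")


# Keys 5, 16 and 27 all have home bucket 5 in a table of 11 buckets.
quad_table = [None] * 11
quad_slots = [insert(quad_table, k, quadratic_probe) for k in (5, 16, 27)]

dbl_table = [None] * 11
dbl_slots = [insert(dbl_table, k, double_hash_probe) for k in (5, 16, 27)]
```

With quadratic probing the colliding keys follow the same sequence of offsets (0, 1, 4, ...), while with double hashing each key gets its own step size.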
• Dynamic Hashing:
• The drawback of static hashing is that it does not expand or shrink dynamically as the size of the database grows or shrinks.
• In dynamic hashing, data buckets are added or removed dynamically as the number of records increases or decreases.
• Dynamic hashing is also known as extended (extendible) hashing.
• In dynamic hashing, the hash function is made to produce a large number of values.
• For example, there are three data records D1, D2 and D3. The hash function generates the addresses 1001, 0101 and 1010 respectively. This method considers only part of each address, initially only the first bit, to place the data. So it tries to load the three records at addresses 0 and 1: D2 (0101) goes to address 0, while D1 (1001) and D3 (1010) go to address 1.
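The use of address prefixes can be sketched in Python with the three hash values from the example. The function name `distribute` and the idea of passing the number of prefix bits as a `depth` parameter are illustrative assumptions; how and when the depth grows depends on the specific dynamic hashing scheme.

```python
hashes = {"D1": "1001", "D2": "0101", "D3": "1010"}


def distribute(hash_values, depth):
    """Group records by the first `depth` bits of their hash value."""
    buckets = {}
    for record, bits in hash_values.items():
        buckets.setdefault(bits[:depth], []).append(record)
    return buckets


one_bit = distribute(hashes, 1)     # only the first bit is considered
three_bits = distribute(hashes, 3)  # more bits give more bucket addresses
```

With one bit, D1 and D3 share bucket 1; considering more bits of the same hash values separates them without changing the hash function itself.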
Tree based indexing
B-Trees
Node structure in B-Trees
B-Tree Insertion
B+-Trees
The structure of the internal nodes of a B+-Tree
The structure of the leaf nodes of a B+-Tree
THANK YOU