
Unit 5


Database Systems

Ch. Sandeep
II Year - II Sem
UNIT – VI

Syllabus:
1. Overview of Storage and Indexing
2. Data on External Storage
3. File Organization and Indexing
4. Hash Based Indexing
5. Tree Based Indexing

www.company.com
Sandeep Ch Dept of CSE
UNIT-VI

Objectives:
• To understand the process of storing data on disk
• To understand the process of applying indexing to the stored data
• To understand the need for B-trees and B+-trees for fast retrieval of the data


Expected Outcomes:
✓ Students will know how data is stored on disk
✓ Students can determine which indexing technique to use to ensure fast retrieval of the data
✓ Students will understand the importance of hash based and tree based indexing in a DBMS

Topic 1: Overview of storage & indexing
• A row in a relation is called a tuple or a record.
• A record is simply a collection of fields.
• A file is a sequence (collection) of records.
• A file stores a given relation (table).
• Each record in a file has a unique identifier called the ‘record id’ or ‘rid’.
• On disk, data is stored in files.
• Indexing is a technique that helps when a collection of records has to be accessed in multiple ways.

Topic 2: Data on external storage
• Since a DBMS stores large quantities of data, the data is kept on external storage devices such as disks and tapes.
• Data stored on disk is fetched into main memory when it is needed for processing.
• Information is generally read from disk or written to disk in units called ‘pages’.
• Data is read into main memory for processing and written back to disk for stable storage. This is done by the “buffer manager”.
• Disks are the most important storage devices.
• The space on the disk is managed by the “disk space manager”.

Topic 3: File Organization and Indexing

• File organization is the logical structuring of the records as determined by the way in which they are accessed.
• In choosing a file organization, several criteria are important:
  • short access time
  • ease of update
  • economy of storage
  • simple maintenance
  • reliability
• The priority of these criteria depends on the application that will use the file.

File Organization Types

Five of the common file organizations are:
• The pile
• The sequential file
• The indexed sequential file
• The indexed file
• The direct, or hashed, file

The Pile
• Least complicated form of file organization
• Data are collected in the order they arrive
• Each record consists of one burst of data
• Purpose is simply to accumulate the mass of data and save it
• Record access is by exhaustive search

The Sequential File

• Most common form of file structure
• A fixed format is used for records
• The key field uniquely identifies the record and determines the storage order
• Typically used in batch applications
• Only organization that is easily stored on tape as well as disk

Indexing

• An index file is a secondary (auxiliary) file that contains metadata, that is, data used to reach the required data elements in the database quickly.
• An index is the data structure used for fast access.
• There are two kinds of indexes:
  1. single-level indexes
  2. multi-level indexes

Indexes

• A single-level index has just one level of index structure: the index file maps directly to the block or the address of the record.
• A multi-level index, on the other hand, has multiple levels of indexing, where one level of index structure may point to another level of index structure, and so on.
• The last level points to block addresses or record addresses in the primary file.

Indexing

• The indexing field (or indexing attribute) is the field on which the index structure is built. Searching on this field is efficient and fast.
• The ordering field is the field on which the records of the file are ordered.
• An index can be one of the following types:
  1. Primary Index
  2. Clustering Index
  3. Secondary Index

Indexes

Primary Index: An index built on the ordering key field of a file. Generally this field is the primary key of the relation.
Clustering Index: An index built on an ordering non-key field of a file.
Secondary Index: An index built on a non-ordering field of a file.

Primary Index

• If an index is built on the ordering key field of a file, it is called a primary index.
• Usually this field is the primary key, on which the records are ordered on disk.
• A primary index is an ordered file with two fields.
• The first field is of the same data type as the ordering key field (called the primary key) of the data file, and the second field is a pointer to a disk block (a block address).

Primary Index

• There is one index entry (or index record) in the index file for each block in the data file.
• Each index entry has, as its two field values, the value of the primary key field for the first record in a block and a pointer to that block.
• We will refer to the two field values of index entry i as <K(i), P(i)>.

Primary Index

• In this example the NAME field is used as the primary key, because it is the ordering key field of the file.
• Each entry in the index has a NAME value and a pointer.
• The first 3 index entries are as follows:
  • <K(1) = (Aaron, Ed), P(1) = address of block 1>
  • <K(2) = (Adams, John), P(2) = address of block 2>
  • <K(3) = (Alexander, Ed), P(3) = address of block 3>

Primary Index

• A dense index has an index entry for every search key value (and hence every record) in the data file.
• A sparse (nondense) index, on the other hand, has index entries for only some of the search values.
• A primary index is therefore sparse: it has one entry per disk block of the data file, holding the key of that block's anchor record (the first record in the block), rather than one entry for every search value (or every record).

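A sparse primary index lookup can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the block contents, the record ids (r1 to r6), the names other than the three anchor records, and the function name `lookup` are all hypothetical, and a real DBMS works on disk blocks rather than in-memory lists.

```python
from bisect import bisect_right

# Hypothetical data file: each block is a list of (NAME, record) pairs,
# and the whole file is ordered on NAME (the ordering key field).
blocks = [
    [("Aaron, Ed", "r1"), ("Abbott, Diane", "r2")],
    [("Adams, John", "r3"), ("Akers, Jan", "r4")],
    [("Alexander, Ed", "r5"), ("Allen, Sam", "r6")],
]

# Sparse primary index: one <K(i), P(i)> entry per block, where K(i) is the
# key of the block's first (anchor) record and P(i) is the block number.
index = [(block[0][0], block_no) for block_no, block in enumerate(blocks)]

def lookup(key):
    """Return the record stored under `key`, or None if it is absent."""
    anchors = [k for k, _ in index]
    pos = bisect_right(anchors, key) - 1   # last anchor key <= search key
    if pos < 0:
        return None                        # key sorts before the first block
    _, block_no = index[pos]
    for k, record in blocks[block_no]:     # scan only that one block
        if k == key:
            return record
    return None
```

The index holds three entries for six records, which is what makes it sparse: the binary search picks a block, and only that block is scanned.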
Clustering Indexes

• If the records of a file are physically ordered on a non-key field, which does not have a distinct value for each record, that field is called the clustering field.
• An index built on the clustering field is called a clustering index.
• This differs from a primary index, which requires that the ordering field of the data file have a distinct value for each record.

Clustering Indexes

• A clustering index is also an ordered file with two fields; the first field is of the same type as the clustering field of the data file, and the second field is a block pointer.
• There is one entry in the clustering index for each distinct value of the clustering field, containing the value and a pointer to the first block in the data file that has a record with that value for its clustering field.

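The "one entry per distinct value, pointing at the first block" rule can be illustrated with a small Python sketch. The data file, the field name `dept_no` and the function names are assumptions made for the example, not part of the slides.

```python
# Hypothetical data file ordered on the non-key field dept_no.
# Each block is a list of (dept_no, employee_name) records.
blocks = [
    [(1, "A"), (1, "B")],
    [(1, "C"), (2, "D")],
    [(2, "E"), (3, "F")],
]

def build_clustering_index(blocks):
    """Map each distinct clustering value to the first block containing it."""
    index = {}
    for block_no, block in enumerate(blocks):
        for value, _ in block:
            index.setdefault(value, block_no)  # keep only the first block
    return index

def find_all(index, blocks, value):
    """Collect every record with the given clustering value."""
    start = index.get(value)
    if start is None:
        return []
    result = []
    for block in blocks[start:]:
        for v, name in block:
            if v == value:
                result.append(name)
            elif v > value:        # file is ordered, so nothing more can match
                return result
    return result
```

Because the file is ordered on the clustering field, all records with the same value are adjacent, so the scan can stop at the first larger value.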
Secondary Indexes
• If an index is built on a non-ordering field of a file, it is called a secondary index.
• A secondary index file indexes a field that is not the ordering field of the data file.
• The indexing field may be a key with unique values or a non-key field with duplicate values.
• The index is an ordered file with two fields.
• The first field is of the same data type as some non-ordering field of the data file that is an indexing field.
• The second field is either a block pointer or a record pointer.

Secondary Indexes

• When the indexing field is a key with a distinct value for every record, there is one index entry for each record in the data file, which contains the value of the secondary key for the record and a pointer either to the block in which the record is stored or to the record itself.
• Hence, such an index is dense.
• We again refer to the two field values of index entry i as <K(i), P(i)>.

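A dense secondary index on a unique field might look like the following Python sketch. The records, the employee-number field and the function name are hypothetical; the point is only that the data file stays unordered while the index itself is sorted on K(i).

```python
from bisect import bisect_left

# Hypothetical unordered data file: rid -> (name, employee_number).
records = {0: ("Smith", 5021), 1: ("Jones", 1042), 2: ("Patel", 3310)}

# Dense secondary index: one <K(i), P(i)> entry per record, sorted on K(i).
# K(i) is the employee number, P(i) is the record id (a record pointer).
secondary_index = sorted((rec[1], rid) for rid, rec in records.items())

def find_by_secondary_key(key):
    """Binary-search the index, then follow the record pointer."""
    keys = [k for k, _ in secondary_index]
    pos = bisect_left(keys, key)
    if pos < len(keys) and keys[pos] == key:
        return records[secondary_index[pos][1]]
    return None
```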
MULTILEVEL INDEXES

• A multi-level index has multiple levels of indexing, where one level of index structure may point to another level of index structure, and so on.
• The last level points to block addresses or record addresses in the primary file.

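As a rough illustration of how one index level points to another, the Python sketch below builds a two-level index. The keys, the `block-…` labels, the fan-out of 4 and all function names are assumptions chosen for the example.

```python
from bisect import bisect_right

def build_level(entries, fan_out):
    """Pack sorted (key, pointer) entries into index blocks and return
    (blocks, entries for the level above: first key of each block)."""
    blocks = [entries[i:i + fan_out] for i in range(0, len(entries), fan_out)]
    upper = [(block[0][0], block_no) for block_no, block in enumerate(blocks)]
    return blocks, upper

def floor_entry(entries, key):
    """Return the entry with the largest key <= `key`, or None."""
    keys = [k for k, _ in entries]
    pos = bisect_right(keys, key) - 1
    return entries[pos] if pos >= 0 else None

# First level: one entry per (hypothetical) data block, keys 10, 20, ..., 120.
first_level_entries = [(k, f"block-{k}") for k in range(10, 130, 10)]
level1_blocks, level2_entries = build_level(first_level_entries, fan_out=4)

def lookup(key):
    """Descend from the top level to the data block that may hold `key`."""
    top = floor_entry(level2_entries, key)                 # second (top) level
    if top is None:
        return None
    return floor_entry(level1_blocks[top[1]], key)[1]      # first level
```

Each lookup reads one block per level, so adding a level trades one extra block read for a much smaller search space at the level below.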
Topic 4: Hash Based Indexing

• In a database management system, searching through all the index values to reach the desired data can become very inefficient. In this situation, hashing comes into the picture.
• Hashing is an efficient technique for computing the location of the desired data on disk directly, without using an index structure.
• Data is stored in data blocks whose addresses are generated by a hash function.
• The memory location where these records are stored is called a data block or data bucket.

Hash Based Indexing

• Hash File Organization:
• Data bucket: Data buckets are the memory locations where the records are stored. These buckets are also considered the unit of storage.
• Hash function: A hash function is a mapping function that maps the set of search keys to actual record addresses. Generally, the hash function uses the primary key to generate the hash index, that is, the address of the data block. A hash function can be anything from a simple mathematical function to a complex one.
• Hash index: A prefix of the entire hash value is taken as the hash index. Every hash index has a depth value that signifies how many bits of the hash value are used. A depth of n bits can address 2^n buckets. When all of these bucket addresses are in use, the depth value is increased by one and twice as many buckets are allocated.
[Figure: how a hash function maps search keys to data buckets]

Hash Based Indexing

• Hashing is further divided into two categories: static hashing and dynamic hashing.

• Static Hashing:
• In static hashing, when a search-key value is provided, the hash function always computes the same address.
• For example, if we generate the address for STUDENT_ID = 76 using a mod 5 hash function, it always results in the same bucket address, 1 (76 mod 5 = 1).
• The bucket address never changes. Hence the number of data buckets in memory remains constant throughout for static hashing.

Hash Based Indexing

• Operations of static hashing:
• Insertion: When a new record is inserted into the table, the hash function h generates a bucket address for the new record based on its hash key K, i.e. bucket address = h(K).
• Searching: When a record needs to be searched, the same hash function is used to retrieve the bucket address for the record. For example, if we want to retrieve the whole record for ID 76, and the hash function is mod 5 on that ID, the bucket address generated is 1. We then go directly to address 1 and retrieve the whole record for ID 76. Here ID acts as the hash key.
• Deletion: To delete a record, we first use the hash function to fetch the record that is to be deleted. Then we remove the record stored at that address in memory.
• Updation: The data record that needs to be updated is first located using the hash function, and then the data record is updated.

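The four operations can be sketched as a small Python class. This is a minimal illustration, assuming the mod 5 hash function of the example and buckets that never overflow (each bucket is an unbounded list); the class and method names are hypothetical.

```python
class StaticHashFile:
    """Static hashing sketch: the number of buckets never changes."""

    def __init__(self, num_buckets=5):
        self.buckets = [[] for _ in range(num_buckets)]

    def h(self, key):
        # Hash function: bucket address = key mod number of buckets.
        return key % len(self.buckets)

    def insert(self, key, record):
        self.buckets[self.h(key)].append((key, record))

    def search(self, key):
        for k, record in self.buckets[self.h(key)]:
            if k == key:
                return record
        return None

    def delete(self, key):
        bucket = self.buckets[self.h(key)]
        bucket[:] = [(k, r) for k, r in bucket if k != key]

    def update(self, key, record):
        bucket = self.buckets[self.h(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, record)
                return True
        return False   # nothing stored under this key
```

With five buckets, `h(76)` is 1 and `h(104)` is 4, and every operation touches only the single bucket that the hash function selects.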
Hash Based Indexing

• Suppose we want to insert a new record into the file, but the data bucket address generated by the hash function is not empty, that is, data already exists at that address.
• This situation in static hashing is called bucket overflow, and it must be handled carefully.
• There are several methods to overcome this situation.
• Some commonly used methods are:
  1. Open hashing or linear probing
  2. Closed hashing
  3. Quadratic probing
  4. Double hashing

Hash Based Indexing

• Open Hashing:
• In the open hashing method, the next available data block is used to store the new record, instead of overwriting the older one. This method is also called linear probing.
• For example, D3 is a new record that needs to be inserted, and the hash function generates the address 105, but that bucket is already full. So the system searches for the next available data bucket, 123, and assigns D3 to it.

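Linear probing can be sketched as follows. The sketch assumes one record per bucket, a `key mod n` hash function and wrap-around at the end of the bucket array; the function name is hypothetical.

```python
def insert_linear_probing(buckets, key, record):
    """Place the record in the first free bucket at or after its home bucket.

    `buckets` is a fixed-size list in which None marks a free bucket.
    Returns the index of the bucket that was used.
    """
    n = len(buckets)
    home = key % n
    for step in range(n):
        slot = (home + step) % n        # try home, home+1, home+2, ...
        if buckets[slot] is None:
            buckets[slot] = (key, record)
            return slot
    raise OverflowError("all buckets are full")
```

For instance, with five buckets where 0, 1 and 4 are already taken, a key that hashes to 4 is probed at 4, 0, 1 and finally stored in bucket 2.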
Hash Based Indexing

• Closed hashing: In the closed hashing method, a new data bucket is allocated with the same address and is linked after the full data bucket. This method is also known as overflow chaining.
• For example, we have to insert a new record D3 into the table. The static hash function generates the data bucket address 105, but this bucket is too full to store the new data. In this case a new data bucket is added at the end of data bucket 105 and is linked to it. The new record D3 is then inserted into the new bucket.

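Overflow chaining can be sketched with a bucket that carries a link to its overflow bucket. The capacity of two records per bucket and the names `Bucket` and `insert_with_chaining` are assumptions made for illustration.

```python
class Bucket:
    """A data bucket with a fixed capacity and an optional overflow bucket."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.records = []
        self.overflow = None            # link to the next bucket in the chain

def insert_with_chaining(bucket, record):
    """Insert into the first bucket of the chain that has room.

    Returns how many overflow links were followed (0 = the home bucket).
    """
    links_followed = 0
    while len(bucket.records) >= bucket.capacity:
        if bucket.overflow is None:
            bucket.overflow = Bucket(bucket.capacity)   # allocate and link
        bucket = bucket.overflow
        links_followed += 1
    bucket.records.append(record)
    return links_followed
```

The home address of a record never changes under this scheme; a search simply has to follow the chain when the home bucket does not contain the record.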
Hash Based Indexing

• Quadratic probing:
• Quadratic probing is very similar to open hashing (linear probing). In linear probing the gap between the old bucket and the new bucket is linear; in quadratic probing, a quadratic function is used to determine the new bucket address.

• Double hashing:
• Double hashing is another method similar to linear probing. The gap between probes is fixed, as in linear probing, but this fixed gap is calculated using a second hash function. That is why it is called double hashing.

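The three probing schemes differ only in how the i-th probe address is computed, which the Python sketch below makes explicit. The concrete formulas (`i*i` for the quadratic step, `1 + key mod (n - 1)` as the second hash function) are common textbook choices assumed here for illustration; the slides do not fix them.

```python
def linear_probe(key, i, n):
    """i-th probe: a constant gap of 1 from the home bucket."""
    return (key % n + i) % n

def quadratic_probe(key, i, n):
    """i-th probe: the gap grows as a quadratic function of i."""
    return (key % n + i * i) % n

def double_hash_probe(key, i, n):
    """i-th probe: a constant gap chosen by a second hash function."""
    step = 1 + key % (n - 1)            # second hash, never zero
    return (key % n + i * step) % n

def probe_sequence(probe, key, n, count):
    """First `count` bucket addresses tried for `key`."""
    return [probe(key, i, n) for i in range(count)]
```

For key 10 and 7 buckets the first four probes are 3, 4, 5, 6 (linear), 3, 4, 0, 5 (quadratic) and 3, 1, 6, 4 (double hashing).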
Hash Based Indexing

• Dynamic Hashing:
• The drawback of static hashing is that it does not expand or shrink dynamically as the size of the database grows or shrinks.
• In dynamic hashing, data buckets are added or removed dynamically as the number of records increases or decreases.
• Dynamic hashing is also known as extendible hashing.
• In dynamic hashing, the hash function is made to produce a large number of values.
• For example, there are three data records D1, D2 and D3. The hash function generates the three addresses 1001, 0101 and 1010 respectively. This method of storing considers only part of the address, here only the first bit. So it tries to place all three records at addresses 0 and 1.

Hash Based Indexing

[Figure: D1, D2 and D3 placed using a 1-bit bucket address]

• The problem is that no bucket address remains for D3. The bucket structure has to grow dynamically to accommodate D3.

Hash Based Indexing

• So the address is changed to use 2 bits rather than 1 bit, and the existing data is updated to use 2-bit addresses. The system then tries to accommodate D3.

[Figure: bucket addresses extended to 2 bits]

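The idea of addressing buckets by a growing prefix of the hash value can be sketched in Python. This is not a full extendible hashing implementation; it only shows how the number of address bits relates to collisions, assuming hash values given as bit strings and a hypothetical capacity of one record per bucket unless stated otherwise.

```python
def bucket_address(hash_bits, depth):
    """Bucket address = the first `depth` bits of the hash value."""
    return hash_bits[:depth]

def smallest_depth(hashes, capacity=1):
    """Smallest number of prefix bits at which no bucket holds more than
    `capacity` records, or None if even the full hash value is not enough."""
    width = min(len(bits) for bits in hashes.values())
    for depth in range(1, width + 1):
        counts = {}
        for bits in hashes.values():
            address = bucket_address(bits, depth)
            counts[address] = counts.get(address, 0) + 1
        if max(counts.values()) <= capacity:
            return depth
    return None

# Hash values from the D1, D2, D3 example.
hashes = {"D1": "1001", "D2": "0101", "D3": "1010"}
```

With one bit, D1 and D3 both map to address 1. Note that with these particular values they also share the 2-bit prefix 10, so under the one-record-per-bucket assumption a third bit is needed before all three records land in separate buckets; with room for two records per bucket, one bit already suffices. A depth of n bits gives a directory of 2^n bucket addresses.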
Topic 5: Tree based indexing

• A tree is formed of nodes.
• Each node in the tree, except for a special node called the root, has one parent node and zero or more child nodes.
• A node that does not have any child nodes is called a leaf node.
• A non-leaf node is called an internal node.
• A subtree of a node consists of that node and all its descendant nodes: its child nodes, the child nodes of its child nodes, and so on.

B-Trees

• The B-tree index is a widely used data structure for tree based indexing in a DBMS. It is a multilevel index organized as a balanced multiway search tree (not a binary tree: a node can have many children).
• In the B+-tree variant, all data pointers are held in the leaf nodes, and the leaf nodes are linked together in a list, which allows the tree to support both random and sequential access.
• A B-tree is a specialized multiway tree designed especially for use on disk. It consists of a root node, branch (internal) nodes and leaf nodes.
• In a B-tree each node may contain a large number of keys.
• In B-tree and B+-tree data structures, each node corresponds to a disk block.

Dynamic Multilevel Indexes Using B-Trees and B+-Trees
• They allow rapid searching by traversing an upside-down tree structure (root at the top, leaves at the bottom).
• Reading a single record from a very large table through a B-tree index often takes only a few block reads, even when the index and table are millions of blocks in size.
• B-tree characteristics (node structure):

[Figure: node structure in B-Trees]

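Why only a few block reads are needed follows from the fan-out of the nodes, which the short Python sketch below works out. It assumes a simplified model in which every index block holds exactly `fan_out` entries and each level indexes the blocks of the level below; the function name and the numbers are illustrative only.

```python
def index_levels(n_entries, fan_out):
    """Number of index levels needed until a single root block remains.

    Assumes n_entries >= 1 and fan_out >= 2.
    """
    levels = 0
    blocks = n_entries
    while True:
        blocks = -(-blocks // fan_out)   # ceiling division: blocks at this level
        levels += 1
        if blocks <= 1:
            return levels
```

For example, one million index entries with 100 entries per block need 10,000 blocks at the lowest level, 100 at the next and 1 at the root, so three index block reads reach the pointer to the record.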
B-Tree Insertion

1. A B-tree starts with a single root node at level 0.
2. Once the root node is full with p - 1 search key values (p is the order of the tree) and we attempt to insert another entry in the tree, the root node splits into two nodes at level 1.
3. Only the middle value is kept in the root node, and the rest of the values are split evenly between the other two nodes.
4. When a non-root node is full and a new entry is inserted into it, that node is split into two nodes at the same level, and the middle entry is moved to the parent node along with two pointers to the new split nodes.
5. If the parent node is full, it is also split.

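The split step described above can be sketched on its own in Python. This covers only the splitting of one full node, not a complete B-tree insertion; the function name is hypothetical, and taking the element at index `len // 2` as the middle is an assumption for the case where the number of keys is even.

```python
def split_full_node(keys, new_key):
    """Split a full node that receives one more key.

    Returns (left_keys, middle_key, right_keys): the middle key moves up to
    the parent (or becomes the new root), and the remaining keys are divided
    between the two nodes produced by the split.
    """
    all_keys = sorted(keys + [new_key])
    mid = len(all_keys) // 2
    return all_keys[:mid], all_keys[mid], all_keys[mid + 1:]
```

For a tree of order p = 3, a full root holding 5 and 8 that receives key 1 splits into a left node with 1 and a right node with 8, with 5 kept in the new root.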
[Figure: insertion in B-Trees]

B+-Trees

• Most implementations of a dynamic multilevel index use a variation of the B-tree data structure called a B+-tree.
• In a B-tree, every value of the search field appears once at some level in the tree, along with a data pointer.
• In a B+-tree, data pointers are stored only at the leaf nodes of the tree; hence, the structure of leaf nodes differs from the structure of internal nodes.
• The leaf nodes have an entry for every value of the search field, along with a data pointer to the record if the search field is a key field.

The structure of the internal nodes of a B+-Tree

[Figure: structure of the internal nodes of a B+-tree]

The structure of the leaf nodes of a B+-Tree

[Figure: structure of the leaf nodes of a B+-tree]

• The pointers in internal nodes are tree pointers to blocks that are tree nodes, whereas the pointers in leaf nodes are data pointers to the data file records or blocks.

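The difference between tree pointers and data pointers can be made concrete with a small Python sketch of a B+-tree search. The class names, the `rid-…` strings standing in for data pointers, and the convention that a key equal to a separator is routed to the right child are all assumptions of this sketch; textbooks differ on that last convention.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import Optional

@dataclass
class Leaf:
    keys: list
    data_pointers: list                  # one data pointer per key
    next_leaf: Optional["Leaf"] = None   # link to the next leaf

@dataclass
class Internal:
    keys: list                           # separator keys
    children: list                       # tree pointers: len(keys) + 1 nodes

def bplus_search(node, key):
    """Follow tree pointers down to a leaf, then return the data pointer."""
    while isinstance(node, Internal):
        node = node.children[bisect_right(node.keys, key)]
    for k, pointer in zip(node.keys, node.data_pointers):
        if k == key:
            return pointer
    return None

# A two-level example tree: one internal root above two linked leaves.
leaf2 = Leaf([12, 15], ["rid-12", "rid-15"])
leaf1 = Leaf([5, 8], ["rid-5", "rid-8"], next_leaf=leaf2)
root = Internal([12], [leaf1, leaf2])
```

Only the leaves carry data pointers, so every search ends at the leaf level, and the `next_leaf` links let a range scan continue from one leaf to the next without going back through the internal nodes.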
THANK YOU
