Module-6

The document outlines the physical database design in database management systems, focusing on storage structures, types of storage (primary, secondary, and tertiary), and file organization methods. It discusses various storage devices, their characteristics, and the hierarchy of memory, including RAID configurations for data redundancy. Additionally, it covers different file organization techniques such as sequential, heap, and hash file organization, highlighting their advantages and disadvantages.
DATABASE MANAGEMENT SYSTEMS

COURSE CODE: CSE-2007


MODULE – 6
(Physical Database Design)

By:
Dr. Nagendra Panini Challa
Assistant Professor, Senior Grade 2
SCOPE, VIT-AP University, India
AGENDA
 Storage and file structure: Memory Hierarchies and Storage Devices, Placing File Records on Disk
 Hashing Techniques
 Indexing Techniques (Primary Indexes, Secondary Indexes, Clustering Indexes, Multilevel Indexes, Dynamic Multilevel Indexes Using B-Trees and B+-Trees)

Database Management Systems (DBMS), SCOPE, VIT-AP University, India 02/03/2025 2


STORAGE STRUCTURE
 A database system provides an abstract view of the stored data. Physically, however, the data is stored as bits and bytes on different storage devices.
Types of Data Storage
 Several types of storage are available for holding data. They differ from one another in speed and accessibility. The following types of storage are used:
Primary Storage
Secondary Storage
Tertiary Storage
Primary Storage

 Primary storage offers quick access to the stored data. It is volatile storage: it does not hold data permanently, so the data is lost as soon as the system loses power or crashes.

Main memory and cache are the types of primary storage.


 Main Memory: It holds the data that is currently being operated on; machine instructions operate on data in main memory. It can store gigabytes of data but is usually too small to hold an entire database. Main memory loses its whole content if the system shuts down because of a power failure or for any other reason.
 Cache: It is one of the costliest storage media, but it is also the fastest. A cache is a small storage area that is usually managed by the computer hardware. Designers of data structures, algorithms, and query processors take cache effects into account.



Secondary Storage
 Secondary storage is also called online storage. It allows the user to store data permanently. It is non-volatile: data is not lost on a power failure or system crash.
There are some commonly described secondary storage media which are available in almost
every type of computer system:
 Flash Memory: Flash memory is used, for example, in USB (Universal Serial Bus) keys that plug into the USB ports of a computer system. USB keys help transfer data between systems and come in a range of capacities. Unlike main memory, flash memory retains its data through a power cut. It is also commonly used in server systems to cache frequently used data, which improves performance, and it can hold more data than main memory.
 Magnetic Disk Storage: This type of storage is also known as online storage. A magnetic disk stores data for the long term and is capable of holding an entire database. The system must bring data from the disk into main memory before it can be accessed, and any data modified by an operation must be written back to the disk. Data on a magnetic disk survives a system crash or power failure, but a failure of the disk itself can destroy the stored data.



Tertiary Storage
 Tertiary storage is external to the computer system. It is the slowest type of storage but can hold a large amount of data. It is also known as offline storage and is generally used for data backup.

There are following tertiary storage devices available:

 Optical Storage: Optical storage can hold megabytes or gigabytes of data. A Compact Disk (CD) can store 700 megabytes of data, with a playtime of around 80 minutes. A Digital Video Disk (DVD) can store 4.7 or 8.5 gigabytes of data on each side of the disk.

 Tape Storage: Tape is a cheaper storage medium than disk. Tapes are generally used for archiving or backing up data. Access is slow because data must be read sequentially from the start, so tape storage is also known as sequential-access storage. Disk storage is known as direct-access storage because data can be accessed directly at any location on the disk.



Storage Hierarchy

 Besides the above, various other storage devices reside in the computer system. Storage media are organized by data access speed, cost per unit of data, and reliability. We can therefore arrange them in a hierarchy based on cost and speed.
 In the hierarchy, the higher levels are expensive but fast. Moving down, the cost per bit decreases and the access time increases. The storage media from main memory upward are volatile, and all media below main memory are non-volatile.



Memory Hierarchy
 A computer system has a well-defined hierarchy of memory. A CPU has direct access to its main memory as well as its built-in registers. Main memory is much slower than the CPU. To reduce this speed mismatch, cache memory is introduced. Cache memory is faster to access than main memory and holds the data most frequently accessed by the CPU.
 The memory with the fastest access is the costliest. Larger storage devices are slower and less expensive, but they can store huge volumes of data compared with CPU registers or cache memory.
Magnetic Disks
 Hard disk drives are the most common secondary storage devices in present computer systems. They are called magnetic disks because they use magnetization to store information. A hard disk consists of metal disks (platters) coated with magnetizable material and stacked on a spindle. A read/write head moves over each disk surface and magnetizes or demagnetizes the spot under it. The state of a spot is read as 0 (zero) or 1 (one).
 Hard disks are formatted in a well-defined order to store data efficiently. Each platter has many concentric circles on it, called tracks. Every track is further divided into sectors. A sector on a hard disk typically stores 512 bytes of data.
Redundant Array of Independent Disks
 RAID or Redundant Array of Independent Disks, is a technology to connect
multiple secondary storage devices and use them as a single storage media.
 RAID consists of an array of disks in which multiple disks are connected together
to achieve different goals. RAID levels define the use of disk arrays.

RAID 0
 In this level, a striped array of disks is implemented. The data is broken down into
blocks and the blocks are distributed among disks. Each disk receives a block of
data to write/read in parallel. It enhances the speed and performance of the
storage device. There is no parity and backup in Level 0.
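
The block distribution described for RAID 0 can be sketched in a few lines of Python. This is only an illustration: the function name `stripe_blocks` and the round-robin placement are assumptions made here, and a real RAID controller works at the device level rather than on Python lists.

```python
def stripe_blocks(blocks, num_disks):
    """Distribute blocks round-robin across num_disks disks (RAID 0 style)."""
    disks = [[] for _ in range(num_disks)]
    for i, block in enumerate(blocks):
        # Block i goes to disk i mod num_disks (assumed placement rule).
        disks[i % num_disks].append(block)
    return disks

layout = stripe_blocks(["B0", "B1", "B2", "B3", "B4"], num_disks=2)
print(layout)  # [['B0', 'B2', 'B4'], ['B1', 'B3']]
```

Because consecutive blocks sit on different disks, they can be read or written in parallel, but losing either disk loses part of every large file.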



RAID 1
 RAID 1 uses mirroring techniques. When data is sent to a RAID controller, it sends
a copy of data to all the disks in the array. RAID level 1 is also called mirroring and
provides 100% redundancy in case of a failure.

RAID 2
 RAID 2 records an Error Correction Code (ECC), computed with a Hamming code, for its data, which is striped across different disks. Each data bit in a word is recorded on a separate disk, and the ECC codes of the data words are stored on a different set of disks. Due to its complex structure and high cost, RAID 2 is not commercially available.



RAID 3
 RAID 3 stripes the data onto multiple disks. The parity bit generated for each data word is stored on a different disk. This technique allows the array to survive a single disk failure.

RAID 4
 In this level, an entire block of data is written onto data disks and then the parity is
generated and stored on a different disk. Note that level 3 uses byte-level striping,
whereas level 4 uses block-level striping. Both level 3 and level 4 require at least three
disks to implement RAID.



RAID 5
 RAID 5 writes whole data blocks onto different disks, but the parity bits generated for each data block stripe are distributed among all the disks rather than stored on a separate dedicated disk.
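
How parity lets an array survive one failed disk can be shown with a small sketch. It assumes the parity block is the bytewise XOR of the data blocks in a stripe, which is the usual scheme for RAID 3, 4 and 5; the function name `parity` and the sample bytes are made up for illustration.

```python
from functools import reduce

def parity(blocks):
    """XOR equal-length byte blocks together to form the parity block."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

data = [b"\x0f\xa0", b"\x33\x11", b"\xf0\x0c"]   # one stripe on three data disks
p = parity(data)

# Pretend the disk holding data[1] failed: XOR the survivors with the parity.
rebuilt = parity([data[0], data[2], p])
print(rebuilt == data[1])  # True
```

The same XOR that produces the parity also rebuilds the missing block, which is why a single parity block protects against exactly one lost disk per stripe.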

RAID 6
 RAID 6 is an extension of level 5. In this level, two independent parities are generated and stored in a distributed fashion among multiple disks. Two parities provide additional fault tolerance. This level requires at least four disk drives.
DBMS - FILE STRUCTURE

 Related data and information is stored collectively in files.
 A file is a sequence of records stored in binary format.
 A disk drive is formatted into several blocks that can store records.
 File records are mapped onto those disk blocks.
FILE ORGANIZATION
File Organization defines how file records are mapped onto disk blocks.

We have different types of File Organization to organize file records −


1. SEQUENTIAL FILE ORGANIZATION

 It is one of the simplest methods of file organization.

 Here the records of a file are stored one after the other in a sequential manner.

This can be achieved in two ways:

 In the first method, records are stored one after the other in the order in which they are inserted into the table. This is called the pile file method.

 When a new record is inserted, it is placed at the end of the file. To modify or delete a record, the record is first searched for in the blocks. Once it is found, it is marked for deletion and the new version of the record is entered.
Inserting a new record:

R1, R2, R3, etc. are the records.
Each contains all the attributes of a row; for example, a student record holds the student's ID, name, address, course, DOB, etc.
In the same way, each of R1, R2, R3, etc. can be considered one full set of attributes.
In the second method, records are kept sorted (either ascending or descending) each time they are inserted into the system.
This is called the sorted file method. Records may be sorted on the primary key or on any other column.
Whenever a new record is inserted, it is first added at the end of the file; the file is then sorted on the key value, in ascending or descending order, so that the record ends up in its correct position.
On an update, the record is updated and the file is sorted again to put the updated record in the right place. The same applies to delete.
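
The difference between the pile file and sorted file methods can be sketched as follows. The function names are hypothetical, records are modelled as `(key, data)` tuples in a Python list, and `bisect.insort` stands in for the "append, then sort" step: it reaches the same final order by placing the record directly at its sorted position.

```python
import bisect

def pile_insert(records, record):
    """Pile file: append the new record at the end of the file."""
    records.append(record)

def sorted_insert(records, record):
    """Sorted file: place the record at its position in key order."""
    bisect.insort(records, record)

pile, ordered = [], []
for key in [30, 10, 20]:
    pile_insert(pile, (key, f"R{key}"))
    sorted_insert(ordered, (key, f"R{key}"))

print(pile)     # [(30, 'R30'), (10, 'R10'), (20, 'R20')]
print(ordered)  # [(10, 'R10'), (20, 'R20'), (30, 'R30')]
```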
ADVANTAGES OF SEQUENTIAL FILE ORGANIZATION
 The design is very simple compared with other file organizations. Little effort is involved in storing the data.
 When large volumes of data have to be processed in full, this method is fast and efficient.
 This method is helpful when most of the records have to be accessed, as in calculating student grades or generating salary slips, where all the records are used in the calculation.
 This method is good for report generation and statistical calculations.
 These files can be stored on magnetic tapes, which are comparatively cheap.
DISADVANTAGES OF SEQUENTIAL FILE ORGANIZATION

 The sorted file method always involves the effort of sorting the records.
 Each time an insert, update, or delete transaction is performed, the file is sorted.
 Hence locating the record, inserting, updating, or deleting it, and then sorting the file always takes some time and may make the system slow.
2. HEAP FILE ORGANIZATION
 This is the simplest form of file organization.
 Records are inserted at the end of the file in the order in which they arrive.
 There is no sorting or ordering of the records. Once a data block is full, the next record is stored in a new block.
 This new block need not be the very next block. The method can select any block in storage to hold the new records.
 It is similar to the pile file in the sequential method, but here data blocks are not selected sequentially.
 They can be any data blocks in storage. It is the responsibility of the DBMS to store the records and manage them.
DIAGRAMMATIC REPRESENTATION OF HEAP FILE ORGANIZATION
If a new record is inserted, then in the above case it will be inserted into data block 1.
 When a record has to be retrieved from the database, we need to traverse the file from the beginning until we reach the requested record.
 Hence fetching records from very large tables is time consuming.
 This is because there is no sorting or ordering of the records.
 We may need to check all the data.
 Similarly, to delete or update a record, we first need to search for it. Searching is the same as retrieving: start from the beginning of the file and continue until the record is found.
 If the file is small, the record can be fetched quickly.
 But the larger the file, the more time is spent fetching.
 In addition, when a record is deleted, it is removed from its data block.
 But the space is not freed and cannot be re-used.
 Hence as the number of records increases, the file size also increases and efficiency drops.
 For the database to perform better, the DBA has to free this unused space periodically.
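
The behaviour described above (append-only inserts, linear search, deleted space that stays occupied until a cleanup) can be modelled with a toy class. `HeapFile`, its method names, and the use of `None` as a deletion marker are assumptions for illustration, not how any particular DBMS implements heap files.

```python
class HeapFile:
    """Toy heap file: unordered records, linear search, lazy deletion."""

    def __init__(self):
        self.slots = []                    # each slot is a (key, value) tuple or None

    def insert(self, key, value):
        self.slots.append((key, value))    # always append, no ordering

    def find(self, key):
        # Linear scan from the beginning of the file.
        for i, slot in enumerate(self.slots):
            if slot is not None and slot[0] == key:
                return i
        return -1

    def delete(self, key):
        i = self.find(key)
        if i >= 0:
            self.slots[i] = None           # slot is marked dead, not reclaimed

    def compact(self):
        # Periodic reorganization: drop the dead slots.
        self.slots = [s for s in self.slots if s is not None]

f = HeapFile()
for k in (7, 3, 9):
    f.insert(k, f"row{k}")
f.delete(3)
print(len(f.slots))  # 3
f.compact()
print(len(f.slots))  # 2
```

The file keeps its full length after the delete and only shrinks once `compact` runs, which mirrors the periodic cleanup the DBA has to perform.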
ADVANTAGES OF HEAP FILE ORGANIZATION
 It is a very good method of file organization for bulk insertion, i.e., when a huge amount of data needs to be loaded into the database at one time.
 The records are simply inserted one after the other in the blocks.
 It is suited to very small files, as fetching records from them is fast.
 The benefit fades as the file grows, because the linear search for a record becomes time consuming.
DISADVANTAGES OF HEAP FILE ORGANIZATION

 This method is inefficient for larger databases as it takes time to search for or modify a record.
 Proper space management is required to boost the performance.
 Otherwise there will be many unused blocks lying around and the file size will simply keep growing.
HASH/DIRECT FILE ORGANIZATION

 In this method of file organization, a hash function is used to calculate the address of the block in which a record is stored.
 The hash function can be any simple or complex mathematical function.
 The hash function is applied to some columns/attributes, either key or non-key columns, to get the block address.
 Hence each record is placed according to its hash value, irrespective of the order in which the records arrive. For this reason the method is also known as Direct or Random file organization.
 If the hash function is applied to a key column, that column is called the hash key; if it is applied to a non-key column, the column is called the hash column.
 When a record has to be retrieved, the address is generated from the hash key column and the whole record is retrieved directly from that address.
 There is no need to traverse the whole file. Similarly, when a new record has to be inserted, the address is generated from the hash key and the record is inserted directly.
 The same applies to update and delete. There is no need to search the entire file or to sort it. Records end up scattered across the blocks.
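
A minimal sketch of this idea follows. The modulo hash function, the block count of 8, the function names, and the sample rows are all assumptions chosen for illustration; the slides only say that the hash function "can be any simple or complex mathematical function".

```python
NUM_BLOCKS = 8  # assumed number of data blocks

def block_address(hash_key):
    """Map a key to a block number with a simple modulo hash."""
    return hash_key % NUM_BLOCKS

blocks = [[] for _ in range(NUM_BLOCKS)]

def hash_insert(key, row):
    blocks[block_address(key)].append((key, row))

def hash_lookup(key):
    # Only the one block chosen by the hash function is examined.
    for k, row in blocks[block_address(key)]:
        if k == key:
            return row
    return None

hash_insert(104, "Alice")
hash_insert(112, "Bob")   # 112 % 8 == 0, so it lands in the same block as 104
print(block_address(104), hash_lookup(112))  # 0 Bob
```

Here each block holds a small list, so two keys that hash to the same address can coexist; without some such collision handling the second insert would overwrite the first.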
ADVANTAGES OF HASH FILE ORGANIZATION
 Records need not be sorted after a transaction, so the effort of sorting is avoided.
 Since the block address is given by the hash function, accessing any record is very fast. Updating or deleting a record is also very quick.
 This method can handle multiple transactions, as each record is independent of the others; i.e., since the storage location of one record does not depend on any other, multiple records can be accessed at the same time.
 It is suitable for online transaction systems such as online banking and ticket booking systems.
DISADVANTAGES OF HASH FILE ORGANIZATION
 This method may accidentally lose data: if two records hash to the same address and the collision is not handled, the newer record can overwrite the older one.

 Since the records are scattered across storage, space is not used efficiently.
 This method is not suitable for searching a range of data. Because each record is stored at an unrelated address, a range of values does not map to a range of addresses, and the search is inefficient. For example, searching for the employees with salary from 20K to 30K will be inefficient.
 Searching for records with an exact name or value is efficient. Searching for student names starting with ‘B’ is not, because the search does not supply the exact name to hash.
 A search on a column that is not a hash column is not efficient. This method is efficient only when the search is done on the hash column; otherwise the hash function cannot give the address of the data.
 If the address is generated from multiple hash columns, say the name and phone number of a person, then searching by phone or name alone cannot use the hash function to locate the record.
 If the hash columns are frequently updated, the data block address changes with each update, so the record has to be moved every time. This is also not acceptable.
 The hardware and software required for space management are costlier in this case. Complex programs need to be written to make this method efficient.
CLUSTER FILE ORGANIZATION

There are two types of cluster file organization

 Indexed Clusters: Here records are grouped based on the cluster key and stored together. The STUDENT-COURSE cluster is an example of an indexed cluster: the records are grouped on the cluster key COURSE_ID, and all the related records are stored together. This method is used when data is retrieved for a range of cluster key values or when there is large data growth in the clusters, for example when we select the students attending the courses with COURSE_ID 230-240, or when a large number of students attend the same course, say 250.

 Hash Clusters: These are similar to indexed clusters. Instead of storing the records based on the cluster key directly, we generate a hash value for the cluster key and store the records with the same hash value together on disk.
Advantages of Clustered File Organization

 This method is best suited when there are frequent requests to join the tables with the same joining condition.
 It works efficiently when there is a 1:M mapping between the tables.

Disadvantages of Clustered File Organization

 This method is not suitable for very large databases, since its performance on them is low.
 We cannot use the clusters if the joining condition changes; in that case traversing the file takes a lot of time.
 This method is not suitable for less frequently joined tables or tables with 1:1 conditions.
PLACING FILE RECORDS ON DISK
The techniques for placing file records on disk:

1. Records and Record Types


2. Files, Fixed-Length Records, and Variable-Length Records
3. Record Blocking and Spanned versus Unspanned Records
4. Allocating File Blocks on Disk
5. File Headers



1) Records and Record Types:

 Data is usually stored in the form of records.


 Each record consists of a collection of related data values or items, where each value is formed
of one or more bytes.
 Records usually describe entities and their attributes.
 For example, an EMPLOYEE record represents an employee entity, and each field value in the record specifies some attribute of that employee, such as Name, Birth_date, Salary, or Supervisor.
 A collection of field names and their corresponding data types constitutes a record
type or record format definition.
 A data type, associated with each field, specifies the types of values a field can take. For example, in C:
struct employee {
    char name[30];
    char ssn[9];
    int salary;
    int job_code;
    char department[20];
};
 In some database applications, the need may arise for storing data items that consist of large
unstructured objects, which represent images, digitized video or audio streams, or free text.
These are referred to as BLOBs (binary large objects). A BLOB data item is typically stored
separately from its record in a pool of disk blocks, and a pointer to the BLOB is included in the
record.
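
To make the idea of a fixed record format concrete, the employee record can be laid out as raw bytes with Python's standard `struct` module. The format string is an assumption made here: `<` selects little-endian with no padding and `i` a 4-byte integer, whereas a C compiler may pad the same struct differently. The sample field values are invented.

```python
import struct

# Mirrors the employee record: name[30], ssn[9], salary, job_code, department[20].
EMPLOYEE = struct.Struct("<30s9sii20s")

raw = EMPLOYEE.pack(b"Smith", b"123456789", 30000, 7, b"Research")
print(EMPLOYEE.size)  # 67

name, ssn, salary, job_code, dept = EMPLOYEE.unpack(raw)
# Short strings are padded with zero bytes up to the field width.
print(name.rstrip(b"\x00"), salary)  # b'Smith' 30000
```

Every packed record has the same size (67 bytes under these assumptions), which is what makes this a fixed-length record.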
2) Files, Fixed-Length Records, and Variable-Length Records:
 A file is a sequence of records.

 In many cases, all records in a file are of the same record type.

 If every record in the file has exactly the same size (in bytes), the file is said to be made up
of fixed-length records.
 If different records in the file have different sizes, the file is said to be made up of variable-
length records.
A file may have variable-length records for several reasons:
 The file records are of the same record type, but one or more of the fields are of varying
size (variable-length fields). For example, the Name field of EMPLOYEE can be a variable-length
field.
 The file records are of the same record type, but one or more of the fields may have
multiple values for individual records; such a field is called a repeating field and a group of
values for the field is often called a repeating group.
 The file records are of the same record type, but one or more of the fields are optional;
that is, they may have values for some but not all of the file records (optional fields).
 The file contains records of different record types and hence of varying size (mixed file). This would occur if related records of different types were clustered (placed together) on disk blocks.
3) Record Blocking and Spanned versus Unspanned Records:

 The records of a file must be allocated to disk blocks because a block is the unit of data
transfer between disk and memory.
 When the block size is larger than the record size, each block will contain numerous
records, although some files may have unusually large records that cannot fit in one
block.
 Suppose that the block size is B bytes. For a file of fixed-length records of size R bytes, with B ≥ R, we can fit bfr = ⌊B/R⌋ records per block, where ⌊x⌋ (the floor function) rounds the number x down to an integer.
 The value bfr is called the blocking factor for the file. In general, R may not divide B exactly, so we have some unused space in each block equal to
B − (bfr * R) bytes
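
As a worked example of the blocking factor and the unused space per block, here is a short sketch. The function names and the sample values B = 512 and R = 100 are chosen only for illustration.

```python
def blocking_factor(B, R):
    """Records per block for fixed-length unspanned records: floor(B / R)."""
    return B // R

def unused_bytes(B, R):
    """Space left over in each block: B - bfr * R."""
    return B - blocking_factor(B, R) * R

print(blocking_factor(512, 100), unused_bytes(512, 100))  # 5 12
```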



 To utilize this unused space, we can store part of a record on one block and the rest on
another.
 A pointer at the end of the first block points to the block containing the remainder of
the record in case it is not the next consecutive block on disk.
 This organization is called spanned because records can span more than one block.
 Whenever a record is larger than a block, we must use a spanned organization.
 If records are not allowed to cross block boundaries, the organization is
called unspanned.
 This is used with fixed-length records having B > R because it makes each record start
at a known location in the block, simplifying record processing.
 For variable-length records, either a spanned or an unspanned organization can be
used. If the average record is large, it is advantageous to use spanning to reduce the
lost space in each block.



For variable-length records using spanned organization, each block may
store a different number of records.
In this case, the blocking factor bfr represents the average number of
records per block for the file.
We can use bfr to calculate the number of blocks b needed for a file of r records:

b = ⌈r/bfr⌉ blocks

where ⌈x⌉ (the ceiling function) rounds the value x up to the next integer.
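
The block count can be checked numerically with a one-line function. The name `blocks_needed` and the record counts are illustrative assumptions.

```python
import math

def blocks_needed(r, bfr):
    """Number of blocks b = ceil(r / bfr) for a file of r records."""
    return math.ceil(r / bfr)

print(blocks_needed(30000, 5))  # 6000
print(blocks_needed(30001, 5))  # 6001
```

One extra record beyond a full block forces a whole additional block, which is why the ceiling is needed.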
4. Allocating File Blocks on Disk

 There are several standard techniques for allocating the blocks of a file on disk.
 In contiguous allocation, the file blocks are allocated to consecutive disk blocks.
This makes reading the whole file very fast using double buffering, but it makes
expanding the file difficult.
 In linked allocation, each file block contains a pointer to the next file block. This
makes it easy to expand the file but makes it slow to read the whole file.
 A combination of the two allocates clusters of consecutive disk blocks, and the clusters
are linked.
 Clusters are sometimes called file segments or extents.
 Another possibility is to use indexed allocation, where one or more index
blocks contain pointers to the actual file blocks.
 It is also common to use combinations of these techniques.
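
The trade-off between linked and indexed allocation can be seen in a toy model. The block numbers, the dictionaries standing in for the disk, and the function names are all invented for illustration.

```python
# Linked allocation: each block stores its data and the number of the next block.
linked = {4: ("rec A", 9), 9: ("rec B", 2), 2: ("rec C", None)}

def read_linked(first_block):
    """Follow the chain of pointers from the first block to the last."""
    out, block = [], first_block
    while block is not None:
        data, block = linked[block]
        out.append(data)
    return out

# Indexed allocation: an index block lists the file's blocks in order.
data_blocks = {4: "rec A", 9: "rec B", 2: "rec C"}
index_block = [4, 9, 2]

def read_indexed(i):
    """Jump straight to the i-th file block through the index block."""
    return data_blocks[index_block[i]]

print(read_linked(4))   # ['rec A', 'rec B', 'rec C']
print(read_indexed(2))  # rec C
```

With linked allocation the third block is reachable only by reading the first two, while the index block gives its address directly.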



5. File Headers

 A file header or file descriptor contains information about a file that is needed by the system
programs that access the file records.
 The header includes information to determine the disk addresses of the file blocks, as well as record format descriptions. These may include field lengths and the order of fields within a record for fixed-length unspanned records, and field type codes, separator characters, and record type codes for variable-length records.
 To search for a record on disk, one or more blocks are copied into main memory buffers.
 Programs then search for the desired record or records within the buffers, using the
information in the file header.
 If the address of the block that contains the desired record is not known, the search
programs must do a linear search through the file blocks.
 Each file block is copied into a buffer and searched until the record is located or all the file
blocks have been searched unsuccessfully.
 This can be very time-consuming for a large file.
 The goal of a good file organization is to locate the block that contains a desired record with
a minimal number of block transfers.
HASHING TECHNIQUES
 separate chaining,
 linear and quadratic probing,
 double hashing,
 extendible hashing,
 rehashing
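
The slide only names these techniques. As one hedged illustration, linear probing can be sketched as below; the table size of 7, the modulo hash, and the function name `probe_insert` are assumptions, and the other listed techniques differ mainly in how the next slot or bucket is chosen after a collision.

```python
TABLE_SIZE = 7  # assumed small table for illustration
table = [None] * TABLE_SIZE

def probe_insert(key):
    """Open addressing with linear probing: try h, h+1, h+2, ... modulo the table size."""
    for step in range(TABLE_SIZE):
        slot = (key + step) % TABLE_SIZE
        if table[slot] is None:
            table[slot] = key
            return slot
    raise RuntimeError("hash table is full")

# 10, 17 and 24 all hash to slot 3, so the later keys slide to the next free slots.
print([probe_insert(k) for k in (10, 17, 24)])  # [3, 4, 5]
```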



INDEXING
 We know that data is stored in the form of records. Every record
has a key field, which helps it to be recognized uniquely.

 Indexing is a data structure technique to efficiently retrieve records from the database files based on some attributes on which the indexing has been done. Indexing in database systems is similar to what we see in books.

 Indexing is defined based on its indexing attributes. Indexing can be of the following types −
 Primary Index

 Secondary Index

 Clustering Index
Primary Index − Primary index is defined on an ordered data file.
The data file is ordered on a key field. The key field is generally the
primary key of the relation.
Secondary Index − Secondary index may be generated from a field
which is a candidate key and has a unique value in every record, or a
non-key with duplicate values.
Multilevel Index

Index records comprise search-key values and data pointers. The index is stored on the disk along with the actual database files. As the size of the database grows, so does the size of the indices. To speed up search operations, the index records should be kept in main memory. If a single-level index is used, a large index cannot be kept in memory, which leads to multiple disk accesses. A multilevel index breaks the index into several smaller levels, so that the outermost level is small enough to be held in main memory.
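
A two-level index lookup can be sketched as follows. The layout (four sorted data blocks, an inner index split into two index blocks, and a tiny outer index) and all names are assumptions made for illustration.

```python
import bisect

# Data file: sorted blocks of keys.
data_blocks = [[5, 8], [12, 15], [21, 27], [30, 34]]
# Inner index: (first key, block number) per data block, split into two index blocks.
inner_index = [[(5, 0), (12, 1)], [(21, 2), (30, 3)]]
# Outer index: first key of each inner index block; small enough to stay in memory.
outer_index = [5, 21]

def index_lookup(key):
    """Return the number of the data block that holds key, or None."""
    i = bisect.bisect_right(outer_index, key) - 1
    if i < 0:
        return None                       # key is smaller than every indexed key
    entries = inner_index[i]
    j = bisect.bisect_right([k for k, _ in entries], key) - 1
    block_no = entries[j][1]
    return block_no if key in data_blocks[block_no] else None

print(index_lookup(27), index_lookup(13))  # 2 None
```

Each lookup touches one inner index block and one data block, regardless of how many data blocks the file has at this depth.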
INDEXED SEQUENTIAL ACCESS METHOD (ISAM)

This is an advanced sequential file organization method.
Records are stored in the file sorted in order of primary key.
For each primary key, an index entry is generated and mapped to the record.
This index entry is simply the address of the record in the file.
ADVANTAGES OF ISAM
 Since each record has the address of its data block in the index, searching for a record in a large database is easy and quick. There is no extra effort to search for records. But a proper primary key has to be selected to make ISAM efficient.
 This method allows any column to be used as the key field, with the index generated on it. In addition to the primary key and its index, indexes can be generated for other fields too, so searches on columns other than the primary key also become more efficient.
 It supports range retrieval and partial retrieval of records. Since the index is based on the key value, we can retrieve the data for a given range of values. In the same way, a partial key value, say student names starting with ‘JA’, can also be searched easily.
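
Range and partial-key retrieval over a sorted index can be sketched with binary search. The index contents, the record addresses, and the function names below are made up for illustration.

```python
import bisect

# Sorted index of (name, record_address) pairs.
index = [("ADAM", 40), ("JACK", 10), ("JANE", 30), ("JOHN", 20), ("MARY", 50)]
keys = [k for k, _ in index]

def range_search(low, high):
    """Addresses of all records with low <= key <= high."""
    lo = bisect.bisect_left(keys, low)
    hi = bisect.bisect_right(keys, high)
    return [addr for _, addr in index[lo:hi]]

def prefix_search(prefix):
    """Addresses of all records whose key starts with prefix."""
    pos = bisect.bisect_left(keys, prefix)
    out = []
    while pos < len(keys) and keys[pos].startswith(prefix):
        out.append(index[pos][1])
        pos += 1
    return out

print(prefix_search("JA"))            # [10, 30]
print(range_search("JACK", "JOHN"))   # [10, 30, 20]
```

Both searches work because matching keys sit next to each other in the sorted index, which is exactly what a hash-based organization cannot offer.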
DISADVANTAGES OF ISAM

 Maintaining the index carries an extra cost: extra disk space is needed to store the index. With multiple key-index combinations, the disk space used increases further.
 As new records are inserted, the files have to be restructured to maintain the sequence. Similarly, when a record is deleted, the space used by it needs to be released. Otherwise the performance of the database will degrade.
B+ TREE FILE ORGANIZATION

 B+ tree file organization is an advanced form of the ISAM method.
 It uses the same concept of key-index, but in a tree-like structure.
 A B+ tree is similar to a binary search tree, but a node can have more than two children.
 It stores all the records only at the leaf nodes.
 Intermediary nodes hold only keys and pointers that lead towards the leaf nodes. They do not contain any data/records.
Consider a student table below. The key value here is STUDENT_ID.
And each record contains the details of each student along with its
key value and the index/pointer to the next value.
In a B+ tree it can be represented as below.
 Please note that the leaf node 100 means it holds the name and address of the student with ID 100, as we saw with R1, R2, R3, etc.

 From the B+ tree structure, it is evident that:

 There is one main node called the root of the tree; 105 is the root here.

 There is an intermediary layer of nodes. They do not store actual records; they only hold pointers towards the leaf nodes. Only the leaf nodes contain the data, in sorted order.

 The nodes to the left of the root hold values lower than the root and the nodes to the right hold values higher than the root, i.e., 102 and 108 respectively.

 There is a final layer of leaf nodes, which hold the actual values, i.e., 100, 101, 103, 104, 106 and 107.

 The tree is balanced: all the leaf nodes are at the same distance from the root node. Hence searching for any record is easier.

 Searching for a record follows a single path from the root to a leaf, so any record can be reached easily.
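
A search that follows one root-to-leaf path can be sketched as below. Since the original figure is not reproduced here, the tree layout is an assumption: it reuses the root 105 and the intermediate keys 102 and 108, and places keys equal to a separator in the right-hand subtree. The `Node` class and the record strings are hypothetical.

```python
import bisect

class Node:
    def __init__(self, keys, children=None, records=None):
        self.keys = keys            # separator keys (internal) or record keys (leaf)
        self.children = children    # list of child nodes, None for a leaf
        self.records = records      # list of records, None for an internal node

def search(node, key):
    """Follow one root-to-leaf path; return the record or None."""
    while node.children is not None:
        # Keys greater than or equal to a separator go to the child on its right.
        node = node.children[bisect.bisect_right(node.keys, key)]
    if key in node.keys:
        return node.records[node.keys.index(key)]
    return None

leaves = [Node([100, 101], records=["s100", "s101"]),
          Node([102, 103, 104], records=["s102", "s103", "s104"]),
          Node([105, 106, 107], records=["s105", "s106", "s107"]),
          Node([108, 109], records=["s108", "s109"])]
root = Node([105], children=[Node([102], children=leaves[:2]),
                             Node([108], children=leaves[2:])])

print(search(root, 106), search(root, 99))  # s106 None
```

Every search visits exactly three nodes here (root, one intermediate node, one leaf), however many records the leaves hold.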
ADVANTAGES OF B+ TREES
 Since all records are stored only in the leaf nodes, which form a sorted sequential linked list, searching becomes very easy.
 A B+ tree supports range retrieval and partial retrieval. Traversing the tree structure makes these easier and quicker.
 As the number of records increases or decreases, the B+ tree structure grows or shrinks. There is no restriction on B+ tree size, as there is in ISAM.
 Since it is a balanced tree structure, inserts, deletes, and updates do not degrade search performance.
 Since all the data is stored in the leaf nodes, and the high branching factor of the internal nodes keeps the height of the tree small, disk I/O is reduced. Hence it works well on secondary storage devices.
DISADVANTAGES OF B+ TREES

This method is less efficient for static tables


CONSTRUCTION OF B+ TREE
Rules:

For order m (values shown for m = 4 and m = 3):
Maximum children = m = 4, 3
Minimum children = ceil(m/2) = 2, 2
Maximum keys = m − 1 = 3, 2
Minimum keys = ceil(m/2) − 1 = 1, 1
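
The rules can be checked for any order m with a small helper. The function name `bplus_limits` and the dictionary keys are illustrative assumptions; the formulas are the ones listed in the rules.

```python
import math

def bplus_limits(m):
    """Child and key limits given by the rules for a tree of order m."""
    return {
        "max_children": m,
        "min_children": math.ceil(m / 2),
        "max_keys": m - 1,
        "min_keys": math.ceil(m / 2) - 1,
    }

print(bplus_limits(4))  # {'max_children': 4, 'min_children': 2, 'max_keys': 3, 'min_keys': 1}
print(bplus_limits(3))  # {'max_children': 3, 'min_children': 2, 'max_keys': 2, 'min_keys': 1}
```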
B+ TREE MATERIALS
Notes:
 https://www.studytonight.com/advanced-data-structures/b-plus-trees-data-structure

Animation:
https://www.cs.usfca.edu/~galles/visualization/BPlusTree.html
https://goneill.co.nz/btree-demo.php

Example:
DB2_4_BplusTreeExample.pdf

Video Demo of B+ Tree:
https://www.youtube.com/watch?v=DqcZLulVJ0M
