
Physical Data Organization: Department of Computer Science

The document discusses physical data organization in databases including storage media hierarchy, disk access and buffer management, RAID, file organization, variable-length records, mapping data to files, efficient searching using indexing and hashing, index structures like dense/sparse indexes and clustering indexes, and size and blocking factor of index files. It focuses on techniques for efficient physical representation and access of data on secondary storage.

Uploaded by

gsivaji88

UVA DEPARTMENT OF COMPUTER SCIENCE

Physical Data Organization

Database design using logical model of the database

- appropriate level for users to focus on
- user independence from implementation details
Performance
- other major factor in user satisfaction
- depends on
- efficient data structures for data representation
- efficiency of system operation on those structures
Disk access
- one of the most critical factors in performance
- main memory is in general not big enough for entire DB
- recovery problem with main memory DB
- disk contains data files and system files including
data dictionary and index files


Storage Media Hierarchy

Storage medium: primary storage and secondary storage

- database is stored physically on some storage medium
- primary storage: can be operated directly by CPU
--- main memory & cache
- secondary storage: larger capacity, lower cost, slower access;
cannot be operated directly by CPU; must be copied to primary
Hierarchy
- access speed, cost per unit of data, reliability
- cache: fastest and most costly
- main memory
- flash memory: limited number of writes (also slow)
non-volatile: disk-substitute in embedded systems
- magnetic disk and optical disk (CD-ROM)
- tape storage: sequential access; for backup and archival


Disk Access and Buffer Management

Disk
- direct access storage device (not sequential)
- access time includes seek time (arm movement) and rotational latency
- goal is to reduce the # of disk accesses and the seek time
- a block need not be transferred every time
- buffer blocks: closely related with concurrency control
and recovery strategy of the database system
Buffer management
- goal is to increase hit ratio
- similar to virtual memory management in OS
- differences: forced writing for recovery and
MRU (most recently used first) replacement algorithm
- priority-based replacement: data dictionary and
index blocks have high priority
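
The contrast between LRU-style paging and the MRU replacement mentioned above can be sketched with a small simulation. This is only an illustration: the `BufferPool` class, its method names, and the repeated-scan workload are assumptions made for the example, not part of the slides.

```python
from collections import OrderedDict

class BufferPool:
    """Toy buffer pool (hypothetical) that counts hits under LRU or MRU replacement."""

    def __init__(self, capacity, policy="LRU"):
        self.capacity = capacity
        self.policy = policy
        self.frames = OrderedDict()  # block ids, ordered from least to most recently used
        self.hits = 0
        self.requests = 0

    def access(self, block_id):
        """Request a block; return True on a buffer hit, False on a miss."""
        self.requests += 1
        if block_id in self.frames:
            self.hits += 1
            self.frames.move_to_end(block_id)  # mark as most recently used
            return True
        if len(self.frames) >= self.capacity:
            # LRU evicts the front (oldest use); MRU evicts the back (newest use).
            self.frames.popitem(last=(self.policy == "MRU"))
        self.frames[block_id] = None
        return False

    def hit_ratio(self):
        return self.hits / self.requests if self.requests else 0.0

# Assumed workload: scan blocks 1..4 twice with only 3 buffer frames.
scan = [1, 2, 3, 4, 1, 2, 3, 4]
lru, mru = BufferPool(3, "LRU"), BufferPool(3, "MRU")
for b in scan:
    lru.access(b)
    mru.access(b)
print(lru.hit_ratio(), mru.hit_ratio())
```

On this particular scan pattern LRU evicts each block just before it is needed again, while MRU keeps part of the file resident, which is one reason a database buffer manager may prefer it for repeated scans.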


RAID

Redundant arrays of independent disks

- motivation: large # of small disks might be cost
effective; higher reliability and higher performance
Higher reliability by redundancy
- mirroring/shadowing: a logical disk consists of two
physical disks --- write on both
Higher performance by parallelism
- data striping: splitting data across multiple disks
- bit-level or block-level striping
- with n disks, block i will go to disk (i mod n) + 1
RAID levels
- to provide redundancy at lower cost using disk striping
combined with error-correcting bits, instead of mirroring
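
The block-level striping rule above, block i goes to disk (i mod n) + 1, can be written down directly. The function name `disk_for_block` and the choice of 4 disks are assumptions for illustration only.

```python
def disk_for_block(i: int, n: int) -> int:
    """Return the 1-based disk number holding logical block i when striping over n disks."""
    if n <= 0:
        raise ValueError("n must be positive")
    return (i % n) + 1

# Logical blocks 0..7 striped over 4 disks.
layout = {i: disk_for_block(i, 4) for i in range(8)}
print(layout)
```

Consecutive blocks land on different disks, so a large sequential read can be served by all n disks in parallel.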


File Organization

File
- a sequence of records mapped onto disk blocks
- block: unit of data transfer between disk and memory
- block size ranges from 512 bytes to a few Kbytes
- fixed-length records vs variable-length records
Fixed-length records
- size of each field is declared
- when a record is deleted, mark it to be ignored: searching for
deleted free space may not be efficient
- use pointer for free space: danger of dangling pointer
which no longer points to the desired record
- problem of interblock records: needs 2 accesses
... block i) (block i+1 ...
----------
record j


Variable-length Records

When do such situations occur?

- multiple record types in one file
- record type allows variable length fields
- repeating groups (multiple values)
Methods to deal with them
- byte string representation: special end-of-record
symbol at the end of each record
- each record is a string of consecutive bytes
- difficulty in reusing the space of deleted record
- fixed-length representation:
1) reserved space for expected maximum length
- useful only if most are close to max. length
2) a list of fixed-length records chained by pointers
3) anchor block (first record of the chain) and overflow
block (all the others) chained by pointers


Mapping Data to Files

Relational database
- straight-forward
- in most cases, each relation in a separate file
File organization
- how to organize a given set of records in files
- heap file: any record can be placed anywhere (no ordering)
- sequential file: records are stored in a sequential order
- hashing file: hash function computes the specific block
for the record based on some attribute value
- clustering file: records of different relations stored on
the same file/block for efficient processing
- related records can be read by one block read
- may be inefficient for other operations
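
As a rough illustration of the hashing file organization listed above, the sketch below maps an attribute value to a block number. The use of `zlib.crc32` as the hash function, the helper names, and the 4-block file are assumptions chosen for the example, not something the slides prescribe.

```python
import zlib

def block_for(key: str, num_blocks: int) -> int:
    """Hash the attribute value to a block number in [0, num_blocks)."""
    return zlib.crc32(key.encode("utf-8")) % num_blocks

def insert(blocks, key, record):
    """Place the record in the block chosen by the hash function."""
    blocks[block_for(key, len(blocks))].append((key, record))

def lookup(blocks, key):
    """Read only the one block the key hashes to and filter it."""
    bucket = blocks[block_for(key, len(blocks))]
    return [rec for k, rec in bucket if k == key]

blocks = [[] for _ in range(4)]  # a hypothetical file of 4 blocks
for name in ["Brighton", "Downhill", "Marinion"]:
    insert(blocks, name, {"branch": name})

print(lookup(blocks, "Downhill"))
```

A lookup touches a single block regardless of file size, at the price of losing any useful physical ordering of the records.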


Efficient Searching

Additional structures help searching

- associated with files to make the search for
records based on certain field more efficient
- for direct data locating w/o sequential search
- two approaches: indexing and hashing
Sequential file
- records are chained together by pointers
for fast retrieval in search key order
- records are stored physically in search key order
to minimize the number of block accesses
- difficult to maintain the physical sequential
order as records are inserted and deleted
- binary search for files can be done on the blocks
rather than on the records, if block addresses are
available in the file header
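
Binary search over blocks rather than records might look like the following sketch. It assumes each block is a non-empty sorted list of keys and that reading `blocks[mid]` stands for one disk access; the function name and the sample file are hypothetical.

```python
def find_record(blocks, key):
    """Binary search an ordered file block by block.

    Returns (block_index, accesses); block_index is None if the key is absent.
    """
    lo, hi = 0, len(blocks) - 1
    accesses = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        block = blocks[mid]  # stands for one block read from disk
        accesses += 1
        if key < block[0]:
            hi = mid - 1      # key can only be in an earlier block
        elif key > block[-1]:
            lo = mid + 1      # key can only be in a later block
        else:
            # key falls in this block's range; search inside the block in memory
            return (mid if key in block else None), accesses
    return None, accesses

ordered_file = [[1, 3, 5], [7, 9, 11], [13, 15, 17], [19, 21, 23]]
print(find_record(ordered_file, 15))
```

The number of block reads grows with the logarithm of the number of blocks, not the number of records.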


Index Structures

Index file
- index is usually defined on a single field
of a record (index field)
- index file is for fast random access
Dense index
- one index record for every search-key value
- faster access but higher overhead
Sparse index
- index records for only some of the records
- slower access but less overhead

(Brighton)      (record: Brighton, ..)      (Brighton)
(Downhill)      (record: Downhill, ..)
(Marinion)      (record: Marinion, ..)      (Marinion)
dense index                                 sparse index
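
A lookup through a sparse index can be sketched as follows, reusing the branch names from the figure. The block layout (two records in the first block, one in the second) and the helper name `sparse_lookup` are assumptions for illustration.

```python
from bisect import bisect_right

# Data file ordered by search key; each inner list is one block (assumed layout).
data_blocks = [["Brighton", "Downhill"], ["Marinion"]]

# Dense index: one entry per search-key value.
dense_index = {"Brighton": 0, "Downhill": 0, "Marinion": 1}

# Sparse index: one entry per block, holding the first key of that block.
sparse_index = [("Brighton", 0), ("Marinion", 1)]

def sparse_lookup(index, blocks, key):
    """Find the last index entry with key <= search key, then scan that block."""
    keys = [k for k, _ in index]
    pos = bisect_right(keys, key) - 1
    if pos < 0:
        return None  # key sorts before the first indexed key
    block_no = index[pos][1]
    return block_no if key in blocks[block_no] else None

print(sparse_lookup(sparse_index, data_blocks, "Downhill"))
```

The sparse index has fewer entries but must scan inside the located block, whereas `dense_index` answers directly at the cost of one entry per key.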


Index Structures

Hierarchy of index
- multi-level index for a large index file
- index tree (search tree)
Primary and secondary index
- primary index is the one whose search key specifies
the sequential order of the file
- secondary index: index other than primary one
- secondary index improves the performance of queries
that use keys other than the primary search key
- modifying DB imposes a serious overhead on secondary
index (compared to the primary index)
- a dense index is more desirable than a sparse index for
a secondary index, since the file is not ordered
physically according to the secondary index


Clustering Index

Clustering field
- a non-key field that does not have a distinct value
for each record, on which records of a file are
physically ordered

Clustering index
- clustering index is to speed up retrieval of records
that have the same value for the clustering field
- differs from primary index which requires that ordering
field should have a distinct value for each record


Index File

Index file size

- index file for a primary index needs substantially
fewer blocks than the data file
- why?
- fewer index entries: an entry exists for each
block of data file rather than for each record
- index entry is smaller in size than a data record:
only two fields (key value and block pointer)
Blocking factor (bfr)
- savings in disk block accesses
- bfr = floor(block size (B) / record length (R)), i.e. the number of whole records per block


Index File: Example

An ordered file with 30,000 records, B = 1 KB (1024 bytes), R = 100 bytes

- bfr = floor(1024 / 100) = 10; data file needs 30,000 / 10 = 3000 blocks
- binary search on the data file would require ceil(log2 3000) = 12 accesses
- with ordering key field of 9 bytes and block pointer
of 6 bytes, size of primary index entry = 15 bytes
- index blocking factor bfr_i = floor(B / entry size) = floor(1024 / 15) = 68
- total # of index entries: 3000 (one per data block)
- # of blocks needed for the index: Bi = ceil(3000 / 68) = 45
- binary search on index file would require
ceil(log2 Bi) = ceil(log2 45) = 6 accesses
- search for a record using the primary index
6 (for index) + 1 (for data) = 7 accesses
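
The arithmetic of this example can be checked with a few lines of Python. The variable names are chosen for this sketch, and it assumes that 1 Kb means 1024 bytes, which is the reading that reproduces the slide's figure of 68 index entries per block.

```python
import math

B = 1024          # block size in bytes (assumed: 1 Kb = 1024 bytes)
R = 100           # data record length in bytes
records = 30_000

bfr = B // R                                             # records per data block
data_blocks = math.ceil(records / bfr)                   # blocks in the data file
search_data = math.ceil(math.log2(data_blocks))          # binary search on data file

entry_size = 9 + 6                                       # key field + block pointer
bfr_i = B // entry_size                                  # index entries per block
index_blocks = math.ceil(data_blocks / bfr_i)            # one entry per data block
search_index = math.ceil(math.log2(index_blocks))        # binary search on index
with_index = search_index + 1                            # plus one data block read

print(bfr, data_blocks, search_data, bfr_i, index_blocks, search_index, with_index)
```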


Search Tree

Disadvantage of indexed sequential file organization

- performance degradation as file grows
- file reorganization can avoid this performance
degradation with its own overhead
Search tree
- a special type of tree used to guide the search
for a record given the value of one of its fields
- in a search tree of order p, each node contains
at most p - 1 search values and p pointers in the order
<P_1, K_1, ..., P_{q-1}, K_{q-1}, P_q>, where q ≤ p
P_i: pointer to a child node or null pointer
K_i: search key value from some ordered set of
values (all search key values are assumed to be unique)
for all values X in the subtree pointed to by P_i, we have
K_{i-1} < X < K_i for 1 < i < q, X < K_i for i = 1, and K_{i-1} < X for i = q


B-tree Index Files

B-tree (balanced tree)

- a search tree with some additional constraints
for efficient insertion and deletion
- number of accesses is fixed (every leaf is at the same depth)
Formal definition
A B-tree of order n is a search tree that satisfies
1) the root has at least two children
2) all nodes other than root have at least n/2 children
3) all leaf nodes are at the same level (balanced)
Insertion and deletion
- insertion may need a split when a node becomes too large
(more than n children)
- deletion may need combining if a node becomes too small
(fewer than n/2 children)
- balance property must be maintained


B-tree and B+-tree

Node structure of B-tree

<P_1, (K_1, Pr_1), P_2, ..., (K_{q-1}, Pr_{q-1}), P_q>
P_i: tree pointer to another node
K_i: search key value
Pr_i: data pointer to the record whose search key
field value is K_i (or to the data block containing it)
- within each node, K_1 < K_2 < ... < K_{q-1}
- for all values X in the subtree pointed to by P_i, we have
K_{i-1} < X < K_i for 1 < i < q, X < K_i for i = 1, and K_{i-1} < X for i = q
- a node with q tree pointers, q ≤ p, has q - 1 search
key field values, and hence q - 1 data pointers
B+-tree: a variation of B-tree data structure
- most widely used multi-level index implementation
<P_1, K_1, ..., P_{q-1}, K_{q-1}, P_q>, where q ≤ p
- at leaf node, it is <K_1, Pr_1, ..., K_{q-1}, Pr_{q-1}, P_next>
where P_next points to the next leaf node of the tree


B+-tree

Requirements for maintaining a B+-tree

- every node must contain at least n/2 pointers
except for the root (which should have at least 2)
- balanced: for ensuring good performance
Searching for key field value K
1) visit the root node, looking for the smallest key
value greater than K. Suppose the value is K_i.
2) follow pointer P_i to another node
- if K < K_1, then follow P_1
- if K > K_max, then follow P_max
3) repeat steps 1 and 2 at each node visited until reaching a leaf node
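
The search steps above can be sketched in Python as follows. The `Node` class, its field names, and the tiny hand-built tree are assumptions made for this example; `bisect_right` returns the position of the smallest key greater than K, which is also the index of the pointer to follow (and the last pointer when K is at least the largest key).

```python
from bisect import bisect_right
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Node:
    """Hypothetical B+-tree node: internal nodes use children, leaves use records."""
    keys: list
    children: list = field(default_factory=list)  # tree pointers P_1..P_q
    records: list = field(default_factory=list)   # data pointers Pr_1..Pr_{q-1}
    next: Optional["Node"] = None                 # P_next, next leaf in key order

    @property
    def is_leaf(self) -> bool:
        return not self.children

def search(root: Node, key):
    """Descend from the root to a leaf, then look for the key in that leaf."""
    node = root
    while not node.is_leaf:
        i = bisect_right(node.keys, key)  # position of smallest key greater than key
        node = node.children[i]
    for k, rec in zip(node.keys, node.records):
        if k == key:
            return rec
    return None

# A small hand-built tree: one root over three chained leaves.
leaf1 = Node([5, 10], records=["r5", "r10"])
leaf2 = Node([20, 30], records=["r20", "r30"])
leaf3 = Node([40, 50], records=["r40", "r50"])
leaf1.next, leaf2.next = leaf2, leaf3
root = Node([20, 40], children=[leaf1, leaf2, leaf3])

print(search(root, 30))
```

Every search visits exactly one node per level, and the `next` pointers let a range query continue along the leaves without going back to the root.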


Differences of B+-tree from B-tree

1. In B+-tree, data pointers are stored only at the leaf nodes

- more entries can be packed into internal (non-leaf) nodes
of a B+-tree than for a similar B-tree
- for the same block (node) size, the order p will be larger
for the B+-tree than for the B-tree --- improved search time
- B-tree eliminates redundant storage of search key values
- faster search in some cases to find desired search key
values before reading a leaf node in B-tree
2. Leaf and non-leaf nodes are of the same size in B+-tree,
while in B-tree, non-leaf nodes are larger
- complication in storage management for index structures
3. Deletion in B-tree is more complicated
- in B+-tree, deleted entry always appears in a leaf
- in B-tree, it can be a non-leaf node, requiring
replacement by the proper value from the subtree
of the node containing the deleted entry
