Physical Data Organization: Department of Computer Science
UVA DEPARTMENT OF COMPUTER SCIENCE
Disk
- direct access storage device (not sequential)
- access time includes seek time (arm movement) and rotational latency
- goal is to reduce the number of disk accesses and the seek time
- a block need not be transferred from disk on every access
- buffer blocks: closely related to the concurrency control
and recovery strategies of the database system
Buffer management
- goal is to increase hit ratio
- similar to virtual memory management in OS
- differences: forced writing of blocks for recovery, and
MRU (most recently used block is replaced first) replacement algorithm
- priority-based replacement: data dictionary and
index blocks have high priority
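The effect of the replacement policy on the hit ratio can be illustrated with a small sketch. The class `BufferPool`, the helper `run`, and the reference string below are hypothetical names and data chosen for illustration; the sketch models only LRU and MRU replacement, not forced writing or priority-based replacement.

```python
from collections import OrderedDict


class BufferPool:
    """Toy buffer manager: a fixed number of frames with LRU or MRU replacement."""

    def __init__(self, capacity, policy="LRU"):
        self.capacity = capacity
        self.policy = policy
        self.frames = OrderedDict()  # block_id -> contents, least recently used first
        self.hits = 0
        self.misses = 0

    def read_block(self, block_id):
        if block_id in self.frames:
            self.hits += 1
            self.frames.move_to_end(block_id)  # now the most recently used block
            return self.frames[block_id]
        self.misses += 1
        if len(self.frames) >= self.capacity:
            # LRU evicts the oldest entry; MRU evicts the newest one
            self.frames.popitem(last=(self.policy == "MRU"))
        self.frames[block_id] = f"contents of block {block_id}"  # stands in for a disk read
        return self.frames[block_id]

    def hit_ratio(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


def run(policy, references, capacity=3):
    pool = BufferPool(capacity, policy)
    for block_id in references:
        pool.read_block(block_id)
    return pool.hit_ratio()


# A file of 4 blocks scanned three times with only 3 buffer frames
scan = [1, 2, 3, 4] * 3
lru_ratio = run("LRU", scan)
mru_ratio = run("MRU", scan)
print(lru_ratio, mru_ratio)
```

On this repeated scan LRU always evicts the block that is needed next, while MRU keeps most of the file resident, which is one reason a database buffer manager may prefer MRU for such access patterns.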
RAID
File Organization
File
- a sequence of records mapped onto disk blocks
- block: unit of data transfer between disk and memory
- block size ranges from 512 bytes to a few Kbytes
- fixed-length records vs variable-length records
Fixed-length records
- size of each field is declared
- on deletion, mark the record to be ignored: searching for
the freed space later may not be efficient
- use pointers to chain free space: danger of dangling pointers
that no longer point to the desired record
- problem of records that span two blocks: reading one needs 2 block accesses
... block i ) ( block i+1 ...
        ----------
         record j
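The free-space chain and the block-spanning problem can be sketched as follows. The record layout (`RECORD`), the class `FixedLengthFile`, and the helper `spans_blocks` are assumptions made for this example, and the file is held in a Python list rather than on disk.

```python
import struct

RECORD = struct.Struct("<i20s")  # hypothetical layout: 4-byte id, 20-byte name (24 bytes)
FREE_END = -1                    # marks the end of the free list


class FixedLengthFile:
    def __init__(self):
        self.slots = []            # each slot holds record bytes, or a free-list link (int)
        self.free_head = FREE_END  # slot number of the first free slot

    def insert(self, rec_id, name):
        data = RECORD.pack(rec_id, name.encode())
        if self.free_head != FREE_END:
            slot = self.free_head
            self.free_head = self.slots[slot]  # the freed slot stores the next free slot
            self.slots[slot] = data
        else:
            slot = len(self.slots)
            self.slots.append(data)
        return slot

    def delete(self, slot):
        # Chain the freed slot at the front of the free list
        self.slots[slot] = self.free_head
        self.free_head = slot

    def read(self, slot):
        entry = self.slots[slot]
        if not isinstance(entry, bytes):
            # A saved slot number that outlives its record is a dangling pointer
            raise KeyError(f"slot {slot} is free")
        rec_id, raw = RECORD.unpack(entry)
        return rec_id, raw.rstrip(b"\0").decode()


def spans_blocks(slot, block_size):
    """True if the record in this slot crosses a block boundary (2 accesses needed)."""
    start = slot * RECORD.size
    end = start + RECORD.size - 1
    return start // block_size != end // block_size


f = FixedLengthFile()
a = f.insert(1, "adams")
b = f.insert(2, "baker")
f.delete(a)
c = f.insert(3, "clark")  # reuses the slot freed by the deletion
print(c == a, f.read(c), spans_blocks(21, 512))
```

Note that a reader still holding slot number `a` would now see the record for "clark", which is the dangling-pointer danger mentioned above. With 24-byte records and 512-byte blocks, slot 21 starts at byte 504 and therefore crosses into the next block.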
Variable-length Records
Relational database
- mapping relations to files is straightforward
- in most cases, each relation is stored in a separate file
File organization
- how to organize a given set of records in files
- heap file: any record can be placed anywhere (no ordering)
- sequential file: records are stored in a sequential order
- hashing file: a hash function computes the specific block
for the record based on some attribute value
- clustering file: records of different relations are stored in
the same file/block for efficient processing
- related records can be read by one block read
- may be inefficient for other operations
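As a minimal sketch of the hashing file organization, the example below maps an attribute value to a block number. The number of blocks, the use of CRC-32 as the hash function, and the names `block_for` and `HashFile` are assumptions for illustration; overflow handling is not modeled.

```python
import zlib

NUM_BLOCKS = 8  # assumed number of blocks (buckets) in the file


def block_for(value):
    """Hash function: attribute value -> block number."""
    return zlib.crc32(str(value).encode()) % NUM_BLOCKS


class HashFile:
    def __init__(self):
        self.blocks = [[] for _ in range(NUM_BLOCKS)]

    def insert(self, record, key_field):
        self.blocks[block_for(record[key_field])].append(record)

    def lookup(self, key_field, value):
        # Only the one block chosen by the hash function is read
        block = self.blocks[block_for(value)]
        return [r for r in block if r[key_field] == value]


hf = HashFile()
for name, dept in [("adams", "cs"), ("baker", "ee"), ("clark", "cs")]:
    hf.insert({"name": name, "dept": dept}, "name")

print(hf.lookup("name", "baker"))
```

A lookup on the hashed attribute touches a single block, whereas a heap file would have to scan every block; a lookup on any other attribute gets no help from the hash function.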
Efficient Searching
Index Structures
Index file
- index is usually defined on a single field
of a record (index field)
- index file is for fast random access
Dense index
- one index record for every search-key value
- faster access but higher overhead
Sparse index
- index records for only some of the records
- slower access but less overhead
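The trade-off between the two kinds of index can be sketched on a small sequential file. The sample data and the names `dense_lookup` and `sparse_lookup` are made up for this example, and the sparse index is assumed to hold the first search key of each block.

```python
from bisect import bisect_right

# Sequential file: blocks of (key, value) records sorted on the search key
blocks = [
    [(10, "a"), (20, "b"), (30, "c")],
    [(40, "d"), (50, "e"), (60, "f")],
    [(70, "g"), (80, "h"), (90, "i")],
]

# Dense index: one entry per search-key value -> (block number, slot)
dense_index = {
    key: (b, slot)
    for b, block in enumerate(blocks)
    for slot, (key, _) in enumerate(block)
}

# Sparse index: one entry per block (the first key stored in that block)
sparse_keys = [block[0][0] for block in blocks]


def dense_lookup(key):
    pos = dense_index.get(key)
    if pos is None:
        return None  # absence is decided without reading the data file
    b, slot = pos
    return blocks[b][slot]


def sparse_lookup(key):
    # Find the last index entry whose key is <= the search key ...
    b = bisect_right(sparse_keys, key) - 1
    if b < 0:
        return None
    # ... then scan that block sequentially
    for record in blocks[b]:
        if record[0] == key:
            return record
    return None


print(dense_lookup(50), sparse_lookup(50), len(dense_index), len(sparse_keys))
```

Here the dense index needs 9 entries and goes straight to the record, while the sparse index needs only 3 entries but must scan within the block it selects.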
Index Structures
Hierarchy of index
- multi-level index for a large index file
- index tree (search tree)
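A rough calculation shows why a hierarchy of indexes stays shallow. The function `index_levels` and the figures used (one million index entries, 100 entries per index block) are assumptions for illustration.

```python
import math


def index_levels(num_entries, entries_per_block):
    """Number of index levels needed until the top level fits in a single block."""
    levels = 1
    blocks = math.ceil(num_entries / entries_per_block)
    while blocks > 1:
        # The next level up holds one entry per block of the level below
        blocks = math.ceil(blocks / entries_per_block)
        levels += 1
    return levels


levels = index_levels(1_000_000, 100)
print(levels)
```

Under these assumptions the first level occupies 10,000 blocks, the second 100 blocks, and the third a single block, so a lookup reads one index block per level.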
Primary and secondary index
- primary index is the one whose search key specifies
the sequential order of the file
- secondary index: index other than primary one
- secondary index improves the performance of queries
that use keys other than the primary search key
- modifying the database imposes a serious overhead on secondary
indexes (compared to the primary index)
- a secondary index should be dense rather than sparse,
since the file is not ordered
physically according to the secondary search key
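The need for a dense secondary index can be seen in a small sketch. The file below is ordered on `id` (the primary search key) and indexed on `dept`; the data and the names `secondary`, `find_by_dept`, and `blocks_touched` are hypothetical.

```python
from collections import defaultdict

# File ordered on id; dept values are scattered across blocks
blocks = [
    [(1, "cs"), (2, "ee")],
    [(3, "cs"), (4, "me")],
    [(5, "ee"), (6, "cs")],
]

# Dense secondary index: every record gets a pointer (block number, slot)
secondary = defaultdict(list)
for b, block in enumerate(blocks):
    for slot, (_, dept) in enumerate(block):
        secondary[dept].append((b, slot))


def find_by_dept(dept):
    return [blocks[b][slot] for b, slot in secondary.get(dept, [])]


def blocks_touched(dept):
    """Number of distinct data blocks a query on dept has to read."""
    return len({b for b, _ in secondary.get(dept, [])})


print(find_by_dept("cs"), blocks_touched("cs"))
```

Because the three "cs" records sit in three different blocks, an index entry for only some of them would give no way to locate the rest short of scanning the file. The sketch also hints at the update overhead: every insertion or deletion in the file must adjust the pointer lists.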
Clustering Index
Clustering field
- a non-key field that does not have a distinct value
for each record, on which records of a file are
physically ordered
Clustering index
- a clustering index speeds up retrieval of records
that have the same value for the clustering field
- differs from a primary index, which requires the ordering
field to have a distinct value for each record
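A clustering index can be sketched as one entry per distinct value of the clustering field, pointing to the first block that contains that value. The sample records, the block size of two records, and the names `clustering_index` and `find_by_dept` are assumptions for this example.

```python
records = [(1, "cs"), (2, "ee"), (3, "cs"), (4, "me"), (5, "ee"), (6, "cs")]

# The file is physically ordered on the non-key clustering field dept
ordered = sorted(records, key=lambda r: r[1])
RECORDS_PER_BLOCK = 2
blocks = [ordered[i:i + RECORDS_PER_BLOCK]
          for i in range(0, len(ordered), RECORDS_PER_BLOCK)]

# One index entry per distinct dept value -> first block containing it
clustering_index = {}
for b, block in enumerate(blocks):
    for _, dept in block:
        clustering_index.setdefault(dept, b)


def find_by_dept(dept):
    start = clustering_index.get(dept)
    if start is None:
        return []
    result = []
    for block in blocks[start:]:
        result.extend(r for r in block if r[1] == dept)
        if block[-1][1] != dept:
            break  # the run of equal values ends inside this block
    return result


print(clustering_index, find_by_dept("ee"))
```

All records with the same clustering value lie in consecutive blocks, so the index needs only one entry per value and retrieval reads a contiguous run of blocks.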
Index File
Search Tree
B+-tree