CSE301 Lec5
CHAPTER 5: PHYSICAL DATABASE DESIGN AND PERFORMANCE

Define terms
Describe the physical database design process
Choose storage formats for attributes
Select appropriate file organizations
Describe three types of file organization
Describe indexes and their appropriate use
Translate a database model into efficient structures, and know when/how to denormalize
DESIGNING FIELDS
CHOOSING DATA TYPES
HANDLING MISSING DATA
Substitute an estimate of the missing value (e.g., using a formula)
Construct a report listing missing values
In programs, ignore missing data unless the value is significant (sensitivity testing)
Triggers can be used to perform these operations

DENORMALIZATION
Transforming normalized relations into non-normalized physical record specifications
Benefits:
  Can improve performance (speed) by reducing the number of table lookups (i.e., reducing the number of necessary join queries)
Costs (due to data duplication):
  Wasted storage space
  Data integrity/consistency threats
Common denormalization opportunities:
  One-to-one relationship (Fig. 5-2)
  Many-to-many relationship with non-key attributes (associative entity) (Fig. 5-3)
  Reference data (1:N relationship where 1-side has data not used in any other relationship) (Fig. 5-4)
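
The reference-data case can be sketched with Python's built-in sqlite3 module. This is only an illustration under assumed names: the tables Storage, Item, and ItemDenorm and all of their columns are hypothetical and are not taken from Fig. 5-4. It shows the benefit (no join needed) alongside the cost (duplicated data).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Normalized design: reference data lives in its own table (the 1 side of 1:N).
# All table and column names here are hypothetical.
cur.executescript("""
CREATE TABLE Storage (instr_id INTEGER PRIMARY KEY, where_store TEXT);
CREATE TABLE Item (item_id INTEGER PRIMARY KEY, description TEXT,
                   instr_id INTEGER REFERENCES Storage(instr_id));
INSERT INTO Storage VALUES (1, 'Cool, dry shelf'), (2, 'Freezer');
INSERT INTO Item VALUES (10, 'Flour', 1), (11, 'Sugar', 1), (12, 'Ice cream', 2);

-- Denormalized design: the reference attribute is copied into every item row.
CREATE TABLE ItemDenorm (item_id INTEGER PRIMARY KEY, description TEXT,
                         where_store TEXT);
INSERT INTO ItemDenorm
    SELECT i.item_id, i.description, s.where_store
    FROM Item i JOIN Storage s ON s.instr_id = i.instr_id;
""")

# Normalized lookup needs a join (an extra table access).
joined = cur.execute("""
    SELECT i.description, s.where_store
    FROM Item i JOIN Storage s ON s.instr_id = i.instr_id
    ORDER BY i.item_id
""").fetchall()

# Denormalized lookup reads a single table.
flat = cur.execute(
    "SELECT description, where_store FROM ItemDenorm ORDER BY item_id"
).fetchall()

# Cost: the same storage text is now stored once per item instead of once.
copies = cur.execute(
    "SELECT COUNT(*) FROM ItemDenorm WHERE where_store = 'Cool, dry shelf'"
).fetchone()[0]
conn.close()
```

Both queries return the same rows; the denormalized table simply trades one join for repeated copies of the reference value.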
[Figure annotation: Extra table access required]
Figure 5-4
A possible denormalization situation: reference data
[Figure annotations: Extra table access required; Data duplication]

DENORMALIZE WITH CAUTION
Denormalization can
  Increase chance of errors and inconsistencies
  Reintroduce anomalies
  Force reprogramming when business rules change
Perhaps other methods could be used to improve performance of joins
  Organization of tables in the database (file organization and clustering)
  Proper query design and optimization
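
The first risk listed above, errors and inconsistencies, can be illustrated with a small sqlite3 sketch. The table OrderDenorm and its columns are hypothetical, chosen only to show how a duplicated attribute can drift out of sync after a partial update.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical denormalized table: supplier_city is repeated on every order row.
cur.execute("CREATE TABLE OrderDenorm (order_id INTEGER PRIMARY KEY, "
            "supplier TEXT, supplier_city TEXT)")
cur.executemany(
    "INSERT INTO OrderDenorm VALUES (?, ?, ?)",
    [(1, "Acme", "Dallas"), (2, "Acme", "Dallas"), (3, "Bolt", "Austin")],
)

# A careless update touches only one of the duplicated copies...
cur.execute("UPDATE OrderDenorm SET supplier_city = 'Houston' WHERE order_id = 1")

# ...so the same supplier now appears with two different cities.
inconsistent = cur.execute("""
    SELECT supplier, COUNT(DISTINCT supplier_city) AS cities
    FROM OrderDenorm
    GROUP BY supplier
    HAVING COUNT(DISTINCT supplier_city) > 1
""").fetchall()
conn.close()
```

In a normalized design the city would be stored once, so a single-row update could not produce this kind of update anomaly.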
Tablespace components
Segment – a table, index, or partition
Extent – contiguous section of disk space
Data block – smallest unit of storage
FILE ORGANIZATIONS
Technique for physically arranging records of a file on secondary storage
Types of file organizations
  Sequential
  Indexed
  Hashed
Factors for selecting file organization:
  Fast data retrieval and throughput
  Efficient storage space utilization
  Protection from failure and data loss
  Minimizing need for reorganization
  Accommodating growth
  Security from unauthorized use
Figure 5-6a Sequential file organization
Records of the file are stored in sequence by the primary key field values
If sorted – every insert or delete requires re-sort
If not sorted – average time to find desired record = n/2

INDEXED FILE ORGANIZATIONS
Storage of records sequentially or nonsequentially with an index that allows software to locate individual records
Index: a table or other data structure used to determine in a file the location of records that satisfy some condition
Primary keys are automatically indexed
Other fields or combinations of fields can also be indexed; these are called secondary keys (or nonunique keys)
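
The difference between scanning a sequential file and consulting an index can be sketched in a few lines of Python. The record list and function names are hypothetical, and a Python dict stands in for the index purely for brevity; a real DBMS index is typically a tree structure rather than a dict.

```python
# Hypothetical file: a list of (key, payload) records in arbitrary order.
records = [(k, f"row-{k}") for k in (42, 7, 19, 88, 3, 56, 71, 25)]

def sequential_find(key):
    """Scan records front to back; return (position, comparisons made)."""
    for pos, (k, _) in enumerate(records):
        if k == key:
            return pos, pos + 1
    return None, len(records)

# Average comparisons over all successful searches: (n + 1) / 2, roughly n/2.
n = len(records)
avg = sum(sequential_find(k)[1] for k, _ in records) / n

# An index: a separate structure mapping key value -> record location.
index = {k: pos for pos, (k, _) in enumerate(records)}

def indexed_find(key):
    """Look up the location in the index, then fetch the record directly."""
    pos = index.get(key)
    return None if pos is None else records[pos]
```

With n = 8 records the scan averages 4.5 comparisons per successful search, which is the n/2 figure quoted for unsorted sequential files, while the indexed lookup goes directly to the stored position.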
Figure 5-6b Indexed file organization
Uses a tree search
Average time to find desired record = depth of the tree

Figure 5-6c Hashed file organization
Hash algorithm: usually uses division-remainder to determine record position. Records with the same position are grouped in lists.
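
A minimal sketch of division-remainder hashing with records grouped in lists, as described for the hashed file organization. The bucket count and sample keys are assumptions made for the example, not values from the figure.

```python
NUM_BUCKETS = 7  # hypothetical size; a prime number is a common choice

# Each bucket holds a list of records whose keys hash to the same position.
buckets = [[] for _ in range(NUM_BUCKETS)]

def position(key):
    """Division-remainder hash: record position = key mod number of buckets."""
    return key % NUM_BUCKETS

def insert(key, payload):
    """Append the record to the list stored at its computed position."""
    buckets[position(key)].append((key, payload))

def find(key):
    """Go straight to the computed bucket, then scan its (short) list."""
    for k, payload in buckets[position(key)]:
        if k == key:
            return payload
    return None

for key in (10, 17, 24, 5, 12):
    insert(key, f"row-{key}")
```

Keys 10, 17, and 24 all leave remainder 3 when divided by 7, so they share one list; a lookup computes the position once and only scans that list.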
USING AND SELECTING KEYS
Creating a unique key index
  Example: CustomerID (primary key) of Customer
  Example: Composite primary key for OrderLine

RULES FOR USING INDEXES
1. Use on larger tables
2. Index the primary key of each table
3. Index search fields (fields frequently in WHERE clause)
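
The two unique-index examples and rule 3 might look like the following in SQL, run here through Python's sqlite3 module. Only Customer, CustomerID, and OrderLine come from the slides; the index names and the remaining columns (Name, City, OrderID, ProductID, Quantity) are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE Customer (CustomerID INTEGER, Name TEXT, City TEXT);
CREATE TABLE OrderLine (OrderID INTEGER, ProductID INTEGER, Quantity INTEGER);

-- Unique key index on a single-column primary key.
CREATE UNIQUE INDEX Customer_PK ON Customer (CustomerID);
-- Composite unique key index (column names are assumed).
CREATE UNIQUE INDEX OrderLine_PK ON OrderLine (OrderID, ProductID);
-- Non-unique index on a field often used in WHERE clauses (rule 3).
CREATE INDEX Customer_City ON Customer (City);
""")

cur.execute("INSERT INTO Customer VALUES (1, 'Ann', 'Dallas')")
cur.execute("INSERT INTO OrderLine VALUES (100, 1, 5)")
cur.execute("INSERT INTO OrderLine VALUES (100, 2, 1)")  # same order, new product: OK

try:
    cur.execute("INSERT INTO Customer VALUES (1, 'Bob', 'Austin')")  # duplicate key
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

index_names = [r[0] for r in cur.execute(
    "SELECT name FROM sqlite_master WHERE type = 'index' ORDER BY name")]
conn.close()
```

The unique index enforces key uniqueness as a side effect: the second insert with CustomerID 1 raises an integrity error, while two OrderLine rows that share an OrderID but differ in ProductID are both accepted.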
9. Be careful of indexing attributes with null values; many DBMSs will not recognize null values in an index search

Data warehouses are already configured for optimized query performance
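
The caution about null values in rule 9 can be explored with a short sqlite3 sketch. It shows SQLite's behavior only; other DBMSs handle nulls in indexes differently, which is the point of the rule. The Employee table and its columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE Employee (emp_id INTEGER PRIMARY KEY, badge_no INTEGER)")
cur.execute("CREATE UNIQUE INDEX Employee_Badge ON Employee (badge_no)")

# Two rows with a NULL badge_no: SQLite's unique index accepts both,
# because it treats NULLs as distinct from one another.
cur.executemany("INSERT INTO Employee VALUES (?, ?)",
                [(1, 501), (2, None), (3, None)])

# An equality search never matches NULL...
eq_rows = cur.execute(
    "SELECT emp_id FROM Employee WHERE badge_no = NULL").fetchall()
# ...the search has to be written with IS NULL instead.
is_rows = cur.execute(
    "SELECT emp_id FROM Employee WHERE badge_no IS NULL ORDER BY emp_id").fetchall()
conn.close()
```

Here the equality search returns no rows even though two rows have a null badge_no, so a query or index strategy that depends on a nullable attribute should be checked against the specific DBMS in use.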