Chapter 5 Physical Design

The document discusses the key aspects of physical database design including: 1. Mapping the logical database design to a physical design for a target database management system (DBMS). This includes designing base relations, constraints, and representations of derived data. 2. Analyzing common transactions to understand data usage and access patterns. Performance criteria like frequently used relations and peak loads are identified. 3. Selecting appropriate file organizations and secondary indexes based on the transaction analysis to optimize data access and improve performance of common operations. Disk space requirements are also estimated.

Uploaded by Elias Wekgari

Chapter Five

Physical Database Design
Chapter Five - Objectives
Purpose of physical database design.
How to map the logical database design to a physical database design.
How to design base relations for target DBMS.
How to design general constraints for target DBMS.

Fundamentals Of Database Systems (INSY 2031), chapter 5, 01/23/2024
Chapter Five - Objectives Cont’d…
How to select appropriate file organizations based on analysis of transactions.
When to use secondary indexes to improve performance.
How to estimate the size of the database.
How to design user views.
How to design security mechanisms to satisfy user requirements.
How to monitor and tune an operational system.

Logical vs. Physical Database Design
Sources of information for the physical design process include the logical data model and the documentation that describes the model.
Logical database design is concerned with the what; physical database design is concerned with the how.

Physical Database Design
Process of producing a description of the
implementation of the database on secondary
storage.
It describes the base relations, file organizations,
and indexes used to achieve efficient access to the
data, and any associated integrity constraints and
security measures.

Physical Database Design Methodology
1. Translate logical data model for target DBMS
Design base relations
Design representation of derived data
Design general constraints

Physical Database Design Methodology
2. Design file organizations and indexes
Analyze transactions
Choose file organizations
Choose indexes
Estimate disk space requirements

Physical Database Design Methodology
3. Design user views
4. Design security mechanisms
5. Consider the introduction of controlled
redundancy
6. Monitor and tune operational system

1. Translate Logical Data Model for Target DBMS
To produce a relational database schema from the logical data model that can be implemented in the target DBMS.
Need to know the functionality of the target DBMS, such as how to create base relations and whether the system supports the definition of:
primary keys (PKs), foreign keys (FKs), and alternate keys (AKs);
required data, i.e. whether the system supports NOT NULL;
domains;
relational integrity constraints;
general constraints.

Design base relations
To decide how to represent base relations
identified in logical model in target DBMS.
For each relation, need to define:
the name of the relation;
a list of simple attributes in brackets;
the PK and, where appropriate, AKs and FKs;
referential integrity constraints for any FKs identified.

Design base relations
From data dictionary, we have for each attribute:
its domain, consisting of a data type, length, and any
constraints on the domain;
an optional default value for the attribute;
whether it can hold nulls;
whether it is derived, and if so, how it should be
computed.

DDL for the PropertyForRent Relation
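
The DDL figure for this slide did not survive extraction. As a rough stand-in, the sketch below shows what such a definition might look like, run through Python's built-in sqlite3 module so that it can be executed as-is. The column list, data types, default value, and CHECK ranges are assumptions made for illustration and are not the slide's original DDL. SQLite has no CREATE DOMAIN, so domain constraints are written here as column-level CHECKs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

conn.executescript("""
CREATE TABLE Staff (
    staffNo TEXT PRIMARY KEY
);

-- Assumed columns; the slide's actual DDL is not available.
CREATE TABLE PropertyForRent (
    propertyNo TEXT    PRIMARY KEY,                  -- PK
    street     TEXT    NOT NULL,                     -- required data
    city       TEXT    NOT NULL,
    rooms      INTEGER NOT NULL DEFAULT 4
               CHECK (rooms BETWEEN 1 AND 15),       -- domain constraint
    rent       REAL    NOT NULL CHECK (rent >= 0),
    staffNo    TEXT    REFERENCES Staff(staffNo)
               ON DELETE SET NULL ON UPDATE CASCADE  -- FK with referential actions
);
""")

conn.execute("INSERT INTO Staff VALUES ('SG37')")
conn.execute(
    "INSERT INTO PropertyForRent (propertyNo, street, city, rent, staffNo) "
    "VALUES ('PG4', '6 Lawrence St', 'Glasgow', 350, 'SG37')"
)

def try_insert(sql):
    """Return True if the statement is accepted, False if a constraint rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False

# Violates the FK: staff member 'XX99' does not exist.
bad_fk = try_insert(
    "INSERT INTO PropertyForRent (propertyNo, street, city, rent, staffNo) "
    "VALUES ('PG5', '1 Main St', 'Glasgow', 400, 'XX99')"
)
# Violates the domain CHECK on rooms.
bad_rooms = try_insert(
    "INSERT INTO PropertyForRent (propertyNo, street, city, rooms, rent) "
    "VALUES ('PG6', '2 Main St', 'Glasgow', 99, 400)"
)
print("FK violation accepted?", bad_fk, "| domain violation accepted?", bad_rooms)
```

Both invalid inserts should be rejected by the DBMS itself, which is the point of declaring keys, required data, and domains in the base relation rather than in application code.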

Design representation of derived data
To decide how to represent any derived data
present in logical data model in target DBMS.
Examine logical data model and data dictionary,
and produce list of all derived attributes.
Derived attribute can be stored in database or
calculated every time it is needed.

Design representation of derived data
Option selected is based on:
- additional cost to store the derived data and keep it consistent with operational data from which it is derived;
- cost to calculate it every time it is required.
Less expensive option is chosen subject to performance constraints.
The usually recommended approach is:
- if the value is needed frequently and the query facility can hardly cope with the calculation (e.g. recursion), store it;
- if the value is needed rarely, calculate it.
Whichever option is selected must be documented with justification.
PropertyForRent Relation and Staff Relation with Derived Attribute noOfProperties
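
The figure for this slide is missing, so here is a minimal sketch of the two options for noOfProperties, using Python's sqlite3 module. The tables are cut down to the columns needed, and the trigger name is hypothetical. Only the insert case is maintained; a complete solution would also need triggers for deletes and for updates of staffNo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Staff (
    staffNo        TEXT PRIMARY KEY,
    noOfProperties INTEGER NOT NULL DEFAULT 0   -- stored derived attribute
);
CREATE TABLE PropertyForRent (
    propertyNo TEXT PRIMARY KEY,
    staffNo    TEXT REFERENCES Staff(staffNo)
);

-- Keep the stored value consistent with the operational data (insert case only).
CREATE TRIGGER property_added AFTER INSERT ON PropertyForRent
BEGIN
    UPDATE Staff SET noOfProperties = noOfProperties + 1
    WHERE staffNo = NEW.staffNo;
END;

INSERT INTO Staff (staffNo) VALUES ('SG14'), ('SG37');
INSERT INTO PropertyForRent VALUES ('PG4', 'SG14'), ('PG16', 'SG14'), ('PG21', 'SG37');
""")

# Option 1: calculate the value every time it is needed.
calculated = dict(conn.execute(
    "SELECT s.staffNo, COUNT(p.propertyNo) "
    "FROM Staff s LEFT JOIN PropertyForRent p ON p.staffNo = s.staffNo "
    "GROUP BY s.staffNo"
))

# Option 2: read the stored value that the trigger maintains.
stored = dict(conn.execute("SELECT staffNo, noOfProperties FROM Staff"))

print(calculated, stored)
```

Storing the value makes reads cheap but adds work to every insert; calculating it keeps updates cheap but repeats the join and aggregation on every read.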

Design general constraints
To design the general constraints for target DBMS.
Some DBMSs provide more facilities than others for defining enterprise constraints. Example: where the target DBMS supports it, the following can be embedded in the SQL CREATE TABLE statement of the Staff relation.

CONSTRAINT StaffNotHandlingTooMuch
CHECK (NOT EXISTS (SELECT staffNo
FROM PropertyForRent
GROUP BY staffNo
HAVING COUNT(*) > 100))
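
Not every DBMS accepts a subquery inside a CHECK constraint; SQLite, for example, rejects it. A trigger is one common alternative. The sketch below, using Python's sqlite3 module, is an assumption about how the same rule could be enforced that way and is not part of the original slides. The limit is lowered from 100 to 3 so the demonstration stays short, and only inserts are guarded (an update of staffNo would need a second trigger).

```python
import sqlite3

LIMIT = 3  # the slide uses 100; lowered here so the demo stays short

conn = sqlite3.connect(":memory:")
conn.executescript(f"""
CREATE TABLE PropertyForRent (
    propertyNo TEXT PRIMARY KEY,
    staffNo    TEXT
);

-- Hypothetical trigger standing in for the StaffNotHandlingTooMuch constraint.
CREATE TRIGGER StaffNotHandlingTooMuch
BEFORE INSERT ON PropertyForRent
WHEN (SELECT COUNT(*) FROM PropertyForRent
      WHERE staffNo = NEW.staffNo) >= {LIMIT}
BEGIN
    SELECT RAISE(ABORT, 'staff member already handles the maximum number of properties');
END;
""")

accepted = 0
rejected = 0
for i in range(5):
    try:
        conn.execute("INSERT INTO PropertyForRent VALUES (?, 'SG14')", (f"PG{i}",))
        accepted += 1
    except sqlite3.DatabaseError:
        # Raised when the trigger aborts the insert.
        rejected += 1

print(f"accepted={accepted}, rejected={rejected}")
```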

2. Design File Organizations and Indexes
File organization: the physical arrangement of data in a file into records and pages on secondary storage.
To determine the optimal file organizations to store the base relations and the indexes that are required to achieve acceptable performance; that is, the way in which relations and tuples will be held on secondary storage.
Must understand the typical workload that the database must support.

Analyze transactions
To understand the functionality of the transactions
that will run on the database and to analyze the
important transactions.
Attempt to identify performance criteria, such as:
transactions that run frequently and will have a
significant impact on performance;
transactions that are critical to the business;
times during the day/week when there will be a high
demand made on the database (called the peak load).

Analyze transactions Cont’d…
Use this information to identify the parts of the
database that may cause performance problems.
Also need to know high-level functionality of the
transactions, such as:
attributes that are updated;
search criteria used in a query.

Analyze transactions cont’d…
Often not possible to analyze all transactions, so
investigate most ‘important’ ones.
To help identify these, you can use:
transaction/relation cross-reference matrix, showing
relations that each transaction accesses, and/or
transaction usage map, indicating which relations are
potentially heavily used.

Analyze transactions Cont’d…
To focus on areas that may be problematic:
(1) Map all transaction paths to relations.
(2) Determine which relations are most frequently accessed by transactions.
Where transactions require frequent access to particular relations, their pattern of operation is very important.
(3) Analyze the data usage of selected transactions that involve these relations.
For an update transaction, note the attributes that are updated, as these attributes may be candidates for avoiding an access structure (such as a secondary index).
Attributes used in a predicate of a query may be candidates for an access structure.
The attributes used in any predicates for very frequent or critical transactions should have a higher priority for access structures.
Cross-referencing transactions and relations (the matrix)
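
The matrix figure is missing from this slide. As an illustration only, the following Python sketch builds a small transaction/relation cross-reference matrix and counts how many transactions touch each relation. The transactions, relation names, and access codes are hypothetical.

```python
from collections import Counter

# Hypothetical transactions and the relations each one accesses:
# I = insert, R = read, U = update, D = delete.
transactions = {
    "A: enter details of a new property": {"PropertyForRent": "I", "Staff": "R"},
    "B: list properties handled by a staff member": {"PropertyForRent": "R", "Staff": "R"},
    "C: update the rent of a property": {"PropertyForRent": "U"},
    "D: list branches in a city": {"Branch": "R"},
}

relations = sorted({rel for accesses in transactions.values() for rel in accesses})

# One row per transaction, one column per relation; "-" means no access.
matrix = {
    name: [accesses.get(rel, "-") for rel in relations]
    for name, accesses in transactions.items()
}

# Count how many transactions touch each relation to spot the busiest ones.
usage = Counter(rel for accesses in transactions.values() for rel in accesses)

print(" " * 46 + "  ".join(relations))
for name, row in matrix.items():
    cells = "  ".join(f"{cell:^{len(rel)}}" for cell, rel in zip(row, relations))
    print(f"{name:<46}{cells}")
print("Most accessed:", usage.most_common(1)[0][0])
```

Relations with the highest counts are the ones whose file organization and indexes deserve the closest analysis.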

Example: Transaction Usage Map

Example: Transaction Analysis Form

Choose file organizations
To determine an efficient file organization for each base relation.
File organizations include Heap, Hash, Indexed Sequential Access Method (ISAM), B+-Tree, and Clusters.
The main types of file organization are:
- Heap (unordered) files: records are placed on disk in no particular order, typically simply in the order of insertion.
- Indexed sequential (ordered) files: records are stored ordered by the value of a specified field.
- Hash files (random/direct file organization): records are placed on disk according to a hash function applied to the value of a field in the record.
- B+-tree: well suited to storing data for efficient retrieval in a block-oriented storage context, in particular file systems such as NTFS.
- Clustered: two or more related tables are stored together, so that related rows from each table are physically stored together.
Some DBMSs may not allow selection of file organizations; in that case the DBMS itself chooses the file organization strategy.
Relational database management systems such as IBM DB2, Informix, Microsoft SQL Server, Oracle 8, Sybase ASE, and SQLite support B+-trees for tables and indexes.
Choose indexes
To determine whether adding indexes will improve the
performance of the system.
One approach is to keep tuples unordered and create
as many secondary indexes as necessary.

Choose indexes Cont’d…
Another approach is to order tuples in the relation
by specifying a primary or clustering index.
In this case, choose the attribute for ordering or
clustering the tuples as:
attribute that is used most often for join operations -
this makes join operation more efficient, or
attribute that is used most often to access the tuples
in a relation in order of that attribute (order by
clause).

Choose indexes Cont’d…
If ordering attribute chosen is key of relation,
index will be a primary index; otherwise, index will
be a clustering index.
Each relation can have either a primary index or a clustering index, but not both.
Secondary indexes provide a mechanism for
specifying an additional key for a base relation that
can be used to retrieve data more efficiently.
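
As a small illustration of a secondary index at work, the sketch below uses Python's sqlite3 module to compare the query plan before and after creating an index on staffNo. The table contents and the index name are made up for the example, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE PropertyForRent (propertyNo TEXT PRIMARY KEY, city TEXT, staffNo TEXT)"
)
conn.executemany(
    "INSERT INTO PropertyForRent VALUES (?, ?, ?)",
    [(f"PG{i}", "Glasgow", f"SG{i % 10}") for i in range(1000)],
)

def plan(sql):
    """Return the detail column of SQLite's EXPLAIN QUERY PLAN output as one string."""
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT * FROM PropertyForRent WHERE staffNo = 'SG3'"

before = plan(query)  # no index on staffNo yet, so the whole table is scanned
conn.execute("CREATE INDEX idx_property_staffNo ON PropertyForRent (staffNo)")
after = plan(query)   # the optimizer can now search through the secondary index

print("before:", before)
print("after: ", after)
```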

Choose indexes Cont’d…
Have to balance overhead involved in maintenance
and use of secondary indexes against performance
improvement gained when retrieving data.
Index management overhead includes:
adding an index record to every secondary index
whenever tuple is inserted;
updating secondary index when corresponding tuple
updated;
increase in disk space needed to store secondary index;
possible performance degradation during query
optimization to consider all secondary indexes.

Choose indexes – Guidelines
1. Do not index small relations.
2. Index PK of a relation if it is not a key of the file
organization.
3. Add secondary index to a FK if it is frequently accessed.
4. Add secondary index to any attribute heavily used as a
secondary key.
5. Add secondary index on attributes involved in: selection or
join criteria; ORDER BY; GROUP BY; and other
operations involving sorting (such as UNION or
DISTINCT).

Choose indexes – Guidelines Cont’d…
6. Add secondary index on attributes involved in built-in
functions.
7. Add secondary index on attributes that could result in an
index-only plan.
8. Avoid indexing an attribute or relation that is frequently
updated.
9. Avoid indexing an attribute if the query will retrieve a
significant proportion of the relation.
10. Avoid indexing attributes that consist of long character
strings.
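
Guideline 7 mentions an index-only plan, where the DBMS answers a query from the index without reading the base table. A minimal sketch with Python's sqlite3 module follows; SQLite calls this a covering index. The table, columns, and index name are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE PropertyForRent "
    "(propertyNo TEXT PRIMARY KEY, city TEXT, rent REAL, staffNo TEXT)"
)
conn.execute("CREATE INDEX idx_city_rent ON PropertyForRent (city, rent)")

def plan(sql):
    """Return the detail column of SQLite's EXPLAIN QUERY PLAN output as one string."""
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

# Every attribute the query needs (city, rent) is in the index,
# so the base table does not have to be read.
index_only = plan("SELECT rent FROM PropertyForRent WHERE city = 'Glasgow'")

# staffNo is not in the index, so rows must also be fetched from the table.
needs_table = plan("SELECT staffNo FROM PropertyForRent WHERE city = 'Glasgow'")

print("index-only: ", index_only)
print("needs table:", needs_table)
```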

Estimate disk space requirements
To estimate the amount of disk space that will be required by the database.
Consider a number of factors, such as:
Number of tables
Number of attributes in each table
Number of bytes reserved for each attribute
Number of records in each table
The percentage of growth in the number of records in each table.
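
The factors above can be combined into a back-of-the-envelope estimate. The Python sketch below is one simple way to do the arithmetic, assuming compound yearly growth; all table names, attribute sizes, row counts, and growth rates are invented for illustration. It counts raw data only and ignores page, index, and other DBMS overhead, which a real estimate would add.

```python
# Hypothetical sizing inputs per table:
# attribute sizes in bytes, current row count, yearly growth rate.
tables = {
    "Staff": {"attr_bytes": [5, 30, 10, 8], "rows": 2_000, "growth": 0.05},
    "PropertyForRent": {"attr_bytes": [5, 25, 15, 8, 2, 8, 5], "rows": 100_000, "growth": 0.20},
}

def estimate_bytes(attr_bytes, rows, growth, years):
    """Raw data size after `years` of compound growth (no DBMS overhead included)."""
    row_size = sum(attr_bytes)
    future_rows = rows * (1 + growth) ** years
    return row_size * future_rows

years = 3
per_table = {name: estimate_bytes(years=years, **spec) for name, spec in tables.items()}
total_mb = sum(per_table.values()) / 1_000_000

for name, size in per_table.items():
    print(f"{name}: {size / 1_000_000:.1f} MB after {years} years")
print(f"Total: {total_mb:.1f} MB (raw data only)")
```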

3. Design User Views
To design the user views that were identified during
the Requirements Collection and Analysis stage of the
database system development lifecycle.
Views are created using an SQL statement that retrieves data from one or more base tables.
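
For instance, a user view might expose only some rows and columns of a base table. The sketch below uses Python's sqlite3 module; the view name, columns, and sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Staff (staffNo TEXT PRIMARY KEY, name TEXT, salary REAL, branchNo TEXT);
INSERT INTO Staff VALUES ('SG14', 'David Ford', 18000, 'B003'),
                         ('SL21', 'John White', 30000, 'B005');

-- Hypothetical user view: staff at branch B003, with the salary column hidden.
CREATE VIEW StaffAtB003 AS
    SELECT staffNo, name
    FROM Staff
    WHERE branchNo = 'B003';
""")

cursor = conn.execute("SELECT * FROM StaffAtB003")
columns = [d[0] for d in cursor.description]
rows = cursor.fetchall()

print(columns, rows)
```

Users of the view see only the two exposed columns and the rows for their branch, while the data itself still lives in the base table.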

4. Design Security Measures
To design the security measures for the database as specified by the users.
Access and privilege definition
Designing users / user groups
Granting users the appropriate privilege for a database object with the appropriate mode of operation,
i.e. any database access control should specify three things:
Who?
What object?
What operation?
Two types of security provided by databases:
System security: authentication
Data security: authorization
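
The who / what object / what operation triple can be modelled directly. The Python sketch below is a toy illustration of the authorization idea, not how any particular DBMS implements it; the users, objects, and grants are hypothetical. In a DBMS that supports SQL privileges, the same information is normally expressed with GRANT and REVOKE statements.

```python
# Each granted privilege names who, what object, and what operation.
# Users, objects, and grants here are hypothetical.
privileges = {
    ("manager", "Staff", "SELECT"),
    ("manager", "Staff", "UPDATE"),
    ("assistant", "PropertyForRent", "SELECT"),
}

def is_authorized(user, obj, operation):
    """Data security (authorization): allow only explicitly granted privileges."""
    return (user, obj, operation) in privileges

def grant(user, obj, operation):
    """Record that `user` may perform `operation` on `obj`."""
    privileges.add((user, obj, operation))

def revoke(user, obj, operation):
    """Withdraw a privilege; does nothing if it was never granted."""
    privileges.discard((user, obj, operation))

print(is_authorized("assistant", "Staff", "SELECT"))   # not granted above
grant("assistant", "Staff", "SELECT")
print(is_authorized("assistant", "Staff", "SELECT"))   # granted just now
```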
5. Consider the Introduction of
Controlled Redundancy
To determine whether introducing redundancy in a
controlled manner by relaxing normalization rules
will improve the performance of the system.

Consider the Introduction of
Controlled Redundancy
Result of normalization is a design that is structurally
consistent with minimal redundancy.
However, sometimes a normalized database does not
provide maximum processing efficiency – i.e.
performance.
May be necessary to accept loss of some benefits of a
fully normalized design in favor of performance.

Consider the Introduction of
Controlled Redundancy
Also consider that denormalization:
makes implementation more complex;
often sacrifices flexibility;
may speed up retrievals but slows down updates.

Consider the Introduction of
Controlled Redundancy
Denormalization refers to a refinement to the relational schema such that the degree of normalization for a modified relation is less than the degree of at least one of the original relations.
The term is also used more loosely to refer to situations where two relations are combined into one new relation, which is still normalized but contains more nulls than the original relations.

Consider the Introduction of
Controlled Redundancy
Consider denormalization in following situations,
specifically to speed up frequent or critical
transactions:
Combining 1:1 relationships
Duplicating non-key attributes in 1:* relationships to
reduce joins
Duplicating foreign key attributes in 1:*
relationships to reduce joins

Consider the Introduction of
Controlled Redundancy
Duplicating attributes in *:* relationships to reduce
joins
Introducing repeating groups
Creating extract tables
Partitioning relations.
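
To make one of these options concrete, the sketch below duplicates a non-key attribute across a 1:* relationship so that a frequent query no longer needs a join. It uses Python's sqlite3 module; the relations, the duplicated column ownerLName, and the trigger name are assumptions for illustration. The trigger shows the cost side: every change to the source attribute must also be applied to the copy.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PrivateOwner (ownerNo TEXT PRIMARY KEY, lName TEXT);
CREATE TABLE PropertyForRent (
    propertyNo TEXT PRIMARY KEY,
    ownerNo    TEXT REFERENCES PrivateOwner(ownerNo),
    ownerLName TEXT            -- duplicated non-key attribute (controlled redundancy)
);
INSERT INTO PrivateOwner VALUES ('CO46', 'Keogh');
INSERT INTO PropertyForRent VALUES ('PA14', 'CO46', 'Keogh');

-- The price of faster reads: each update of the source must update the copy.
CREATE TRIGGER owner_renamed AFTER UPDATE OF lName ON PrivateOwner
BEGIN
    UPDATE PropertyForRent SET ownerLName = NEW.lName WHERE ownerNo = NEW.ownerNo;
END;
""")

# Normalized access path: a join is required to get the owner's name.
normalized = conn.execute(
    "SELECT p.propertyNo, o.lName FROM PropertyForRent p "
    "JOIN PrivateOwner o ON o.ownerNo = p.ownerNo"
).fetchall()

# Denormalized access path: the same answer from a single relation.
denormalized = conn.execute(
    "SELECT propertyNo, ownerLName FROM PropertyForRent"
).fetchall()

conn.execute("UPDATE PrivateOwner SET lName = 'Keogh-Smith' WHERE ownerNo = 'CO46'")
after_update = conn.execute("SELECT ownerLName FROM PropertyForRent").fetchone()[0]

print(normalized, denormalized, after_update)
```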

6. Monitor & Tune Operational System
To monitor operational system and improve
performance of system to correct inappropriate
design decisions or reflect changing requirements.

Monitor & Tune Operational System Cont’d…
A number of factors may be used to measure efficiency:
- Transaction throughput: number of transactions processed in a given time interval.
- Response time: elapsed time for completion of a single transaction.
- Disk storage: amount of disk space required to store database files.
No one factor is always correct. Have to trade each off against another to achieve a reasonable balance.
Need to understand how the various hardware components interact and affect database performance.
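
As a rough illustration of the first two measures, the Python sketch below times a batch of simple transactions against an in-memory SQLite database and derives throughput and average response time. The workload is hypothetical and far simpler than a production system, where the DBMS's own monitoring tools would normally be used; the numbers it prints depend entirely on the machine it runs on.

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PropertyForRent (propertyNo INTEGER PRIMARY KEY, rent REAL)")

def run_transaction(i):
    """One hypothetical transaction: insert a row and commit."""
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT INTO PropertyForRent VALUES (?, ?)", (i, 300 + i))

response_times = []
start = time.perf_counter()
for i in range(500):
    t0 = time.perf_counter()
    run_transaction(i)
    response_times.append(time.perf_counter() - t0)
elapsed = time.perf_counter() - start

throughput = len(response_times) / elapsed                 # transactions per second
avg_response = sum(response_times) / len(response_times)   # seconds per transaction

print(f"throughput: {throughput:.0f} tx/s, "
      f"average response time: {avg_response * 1000:.3f} ms")
```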
Quiz #1
Answer the following questions briefly.
1. List at least two guidelines for avoiding secondary indexes. (1)
2. What are the similarities and differences between primary, clustering, and secondary indexes? (1)
3. The level of normalization of a denormalized relation is higher than that of the original relations.
A. True B. False
4. What factors are considered while estimating the disk space required to store a DB? (1)
5. Which one of the following DBMSs doesn’t support the relational model? (1)
A. MS-SQL Server C. MySQL
B. Oracle D. MS-Access E. None
