Week08 - Physical Design
Methodology
Physical Design
Physical Database Design
Throughout conceptual design, logical design and
normalization, the primary objectives have been
storage efficiency and the consistency of
the database
In physical database design, however,
the focus shifts from storage efficiency to
efficiency of execution
Physical Database Design (Cont.)
Physical DB design:
Transforms the logical DB design into technical
specifications for storing and retrieving data
Does not include actually implementing the
design; however, tool-specific decisions are
involved
Physical design requires the following
inputs:
Normalized relations
Definitions of each attribute (the purpose
or objective of the attribute)
Physical Database Design (Cont.)
Descriptions of data usage (how and by whom
data will be used)
Requirements for response time, data security,
backup etc.
Tool to be used
Decisions that are made during this process
are:
Choosing data types
Deciding file organizations
Selecting structures
Preparing strategies for efficient access
De-normalization
De-normalization is a technique of moving from higher
to lower normal forms of a database model in order
to speed up database access
The de-normalization process is applied when deriving a
physical data model from a logical design
In logical design, attributes are grouped because they
are logically related through the same primary key
In physical database design, fields are grouped according
to how they are stored physically and accessed by the DBMS
De-normalization (Cont.)
We should be aware that each new RDBMS
release usually brings enhanced performance
and improved access options that may
reduce the need for de-normalization
A fully normalized database schema can fail
to provide adequate system response time
due to excessive table join operations
De-normalization (Cont.)
De-normalization Situation 1:
Merge two entity types that have a one-to-one
relationship into one
If one of the entity types is optional, merging
wastes storage (null values for the missing side);
however, if the two are accessed together very
frequently, merging them can be a wise decision
Two relations in a one-to-one relationship can
therefore be merged for better performance
De-normalization (Cont.)
De-normalization Situation 2:
Many-to-many binary relationships are mapped to three
relations
Queries needing data from the two participating relations
must join all three relations
A join is an expensive operation from the execution point
of view
Consider the many-to-many relationship between EMP and
PROJ, mapped through WORK
EMP (empId, eName, pjId, Sal)
PROJ (pjId, pjName)
WORK (empId, pjId, dtHired, Sal)
De-normalization (Cont.)
Suppose we de-normalize these relations
and merge the WORK relation into the PROJ relation
The merged relation violates 2NF (pjName depends only
on pjId, which is part of the key), so the anomalies
of 2NF will be present
But a query now needs only one join, between two
tables, which increases efficiency
EMP (empId, eName, pjId, Sal)
PROJ (pjId, pjName, empId, dtHired, Sal)
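The trade-off in Situation 2 can be tried out with a small sketch using Python's built-in sqlite3 module. It is an illustration under assumptions, not part of the lecture: the column lists are trimmed to what the query needs, the merged table is named PROJ_DN so that both designs can coexist, and all sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EMP  (empId INTEGER PRIMARY KEY, eName TEXT);
    CREATE TABLE PROJ (pjId INTEGER PRIMARY KEY, pjName TEXT);
    CREATE TABLE WORK (empId INTEGER, pjId INTEGER, dtHired TEXT, Sal REAL,
                       PRIMARY KEY (empId, pjId));
    -- De-normalized alternative (hypothetical name): WORK merged into PROJ,
    -- so pjName is repeated once per employee working on the project.
    CREATE TABLE PROJ_DN (pjId INTEGER, pjName TEXT, empId INTEGER,
                          dtHired TEXT, Sal REAL, PRIMARY KEY (pjId, empId));
""")

# Invented sample data.
conn.executemany("INSERT INTO EMP VALUES (?, ?)", [(1, "Ali"), (2, "Sara")])
conn.executemany("INSERT INTO PROJ VALUES (?, ?)", [(10, "Payroll"), (20, "Portal")])
conn.executemany(
    "INSERT INTO WORK VALUES (?, ?, ?, ?)",
    [(1, 10, "2020-01-01", 500.0), (1, 20, "2021-06-01", 300.0), (2, 10, "2019-03-15", 700.0)],
)
# Fill the merged table from the normalized ones.
conn.execute("""
    INSERT INTO PROJ_DN
    SELECT p.pjId, p.pjName, w.empId, w.dtHired, w.Sal
    FROM PROJ p JOIN WORK w ON w.pjId = p.pjId
""")

# Normalized design: two join operations across three tables.
normalized = conn.execute("""
    SELECT e.eName, p.pjName
    FROM EMP e
    JOIN WORK w ON w.empId = e.empId
    JOIN PROJ p ON p.pjId = w.pjId
    ORDER BY e.eName, p.pjName
""").fetchall()

# De-normalized design: one join operation across two tables.
denormalized = conn.execute("""
    SELECT e.eName, d.pjName
    FROM EMP e
    JOIN PROJ_DN d ON d.empId = e.empId
    ORDER BY e.eName, d.pjName
""").fetchall()

# Redundancy introduced by the merge: "Payroll" is stored once per worker.
payroll_copies = conn.execute(
    "SELECT COUNT(*) FROM PROJ_DN WHERE pjName = 'Payroll'"
).fetchone()[0]
```

Both queries return the same rows, but the merged table stores the project name once per assignment, which is the 2NF redundancy the slide warns about.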
De-normalization (Cont.)
De-normalization Situation 3:
In a 1:M relationship, when the entity type on the one side
does not participate in any other relationship, the many-side
entity type is extended with the reference data itself rather
than a foreign key
In this case the reference table is merged into
the main table
Consider the STUDENT and HOBBY relations
One student can have one hobby and one hobby can
be adopted by many students
Here HOBBY can be merged into the STUDENT relation
The data becomes redundant, but no join of the two
relations is needed, which gives better performance
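A minimal sketch of Situation 3 with Python's sqlite3 module follows. The slide does not give the attributes of STUDENT and HOBBY, so the column names (stId, sName, hId, hName), the merged table name STUDENT_DN and the sample rows are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalized: STUDENT refers to HOBBY through a foreign key.
    CREATE TABLE HOBBY   (hId INTEGER PRIMARY KEY, hName TEXT);
    CREATE TABLE STUDENT (stId INTEGER PRIMARY KEY, sName TEXT, hId INTEGER);
    -- De-normalized (hypothetical name): the hobby name replaces the key.
    CREATE TABLE STUDENT_DN (stId INTEGER PRIMARY KEY, sName TEXT, hName TEXT);
""")

# Invented sample data.
conn.executemany("INSERT INTO HOBBY VALUES (?, ?)", [(1, "Chess"), (2, "Cricket")])
conn.executemany(
    "INSERT INTO STUDENT VALUES (?, ?, ?)",
    [(100, "Ali", 1), (101, "Sara", 1), (102, "Omar", 2)],
)
conn.execute("""
    INSERT INTO STUDENT_DN
    SELECT s.stId, s.sName, h.hName
    FROM STUDENT s JOIN HOBBY h ON h.hId = s.hId
""")

# Normalized design needs a join to show each student's hobby.
with_join = conn.execute("""
    SELECT s.sName, h.hName
    FROM STUDENT s JOIN HOBBY h ON h.hId = s.hId
    ORDER BY s.stId
""").fetchall()

# De-normalized design answers the same question from one table.
without_join = conn.execute(
    "SELECT sName, hName FROM STUDENT_DN ORDER BY stId"
).fetchall()
```

The hobby name "Chess" is now stored in two student rows, which is the redundancy accepted in exchange for avoiding the join.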
Partitioning
Partitioning splits the same relation into two or more parts
The aims of data partitioning in a database are to
Reduce workload (e.g. data access,
communication costs, search space)
Balance workload
Speed up the rate of useful work (e.g. frequently
accessed objects in main memory)
There are two types of partitioning:
Horizontal Partitioning
Vertical Partitioning
Partitioning (Cont.)
Horizontal Partitioning
The table is split on the basis of rows, which means a
larger table is split into smaller tables
The advantage is that accessing the records of a
smaller table takes much less time than accessing
those of a larger table
Range Partitioning
In this type of partitioning a range is imposed on a
particular attribute
For example, students whose ID is from 1 to 1000
are in partition 1, and so on
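The range rule from the example can be sketched in a few lines of Python. The function name and the assumption that every range has the same width of 1000 IDs are illustrative; a real DBMS lets the designer declare arbitrary range boundaries.

```python
def range_partition(student_id, width=1000):
    """Return the 1-based partition number for a positive student ID.

    Assumes equal-width ranges: 1..width is partition 1, and so on.
    """
    if student_id < 1:
        raise ValueError("student IDs are assumed to start at 1")
    return (student_id - 1) // width + 1


# Route a few sample IDs to their partitions.
partitions = {}
for sid in (1, 1000, 1001, 2500):
    partitions.setdefault(range_partition(sid), []).append(sid)
```

A query for one ID then only has to search the single partition whose range covers that ID.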
Partitioning (Cont.)
Hash Partitioning
A hashing algorithm, known to the DBMS, is applied to an
attribute value to decide the partition of each row
Hash partitioning therefore reduces the chances of
unbalanced partitions to a large extent
List Partitioning
In this type of partitioning an explicit list of values is
specified for every partition
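Hash and list partitioning can be sketched as follows. The choice of CRC-32 as the hash, the function names and the city lists are assumptions for illustration only; each DBMS uses its own internal hash function.

```python
import zlib


def hash_partition(key, n_partitions):
    """Map a key to a partition with a deterministic hash (CRC-32 here)."""
    return zlib.crc32(key.encode("utf-8")) % n_partitions


# List partitioning: every partition owns an explicit list of values.
# The partition names and city values are hypothetical.
CITY_PARTITIONS = {
    "north": ["Peshawar", "Islamabad"],
    "south": ["Karachi", "Hyderabad"],
}


def list_partition(city):
    """Return the name of the partition whose list contains the value."""
    for name, values in CITY_PARTITIONS.items():
        if city in values:
            return name
    raise KeyError(f"no partition lists the value {city!r}")


# Count how 1000 student IDs spread over four hash partitions.
counts = [0] * 4
for sid in range(1, 1001):
    counts[hash_partition(str(sid), 4)] += 1
```

Printing `counts` shows how evenly the IDs are spread; unlike range partitioning, the spread does not depend on how the ID values happen to be clustered.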
Partitioning (Cont.)
Vertical Partitioning
Vertical partitioning is done on the basis of
attributes
The same table is split into different physical records
depending on the nature of accesses
The primary key is repeated in all vertical partitions of
a table so that the original table can be rebuilt
Consider the Student relation
STD (stId, sName, sAdr, sPhone, cgpa, prName,
school, mtMrks, mtSubs, clgName,
intMarks, intSubs, dClg, bMarks, bSubs)
Partitioning (Cont.)
We can partition this relation vertically as
under
STD (stId, sName, sAdr, sPhone, cgpa,
prName)
STDACD (stId, school, mtMrks, mtSubs,
clgName, intMarks, intSubs,
dClg, bMarks, bSubs)
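The split and its reversal can be sketched in Python with one row held as a dictionary. The helper names and the placeholder values are assumptions; the attribute lists are those of STD and STDACD above, with the primary key stId present in both.

```python
# Attribute lists of the two vertical partitions; stId appears in both.
STD_COLS = ("stId", "sName", "sAdr", "sPhone", "cgpa", "prName")
STDACD_COLS = ("stId", "school", "mtMrks", "mtSubs", "clgName",
               "intMarks", "intSubs", "dClg", "bMarks", "bSubs")


def split_row(row):
    """Split one full student row into its STD and STDACD parts."""
    return ({c: row[c] for c in STD_COLS},
            {c: row[c] for c in STDACD_COLS})


def rejoin(std, stdacd):
    """Rebuild the full row by matching the repeated primary key."""
    if std["stId"] != stdacd["stId"]:
        raise ValueError("partitions belong to different students")
    return {**std, **stdacd}


# Placeholder row: every attribute gets a dummy value, stId is set to 7.
original = {c: f"{c}-value" for c in STD_COLS + STDACD_COLS}
original["stId"] = 7

std_part, acd_part = split_row(original)
restored = rejoin(std_part, acd_part)
```

A query that needs only personal data reads the smaller STD records, while the repeated stId keeps the join back to the academic data possible.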
Data Storage Concepts
Physical Storage Media
Storage media are classified according to the following
characteristics:
Speed of access
Cost per unit of data
Reliability
RAID (Redundant Array of Inexpensive Disks)
Many disks that appear to the OS as a single disk but offer
better performance and better reliability
RAID disk drives are used frequently on servers
In RAID the data are distributed over
the drives to allow parallel operations
Data Storage Concepts (Cont.)
Fundamental to RAID is "striping", a method
of concatenating multiple drives into one
logical storage unit
Striping involves partitioning each drive's
storage space into stripes which may be as
small as one sector (512 bytes) or as large as
several megabytes
The type of application environment, I/O or
data intensive, determines whether large or
small stripes should be used
Data Storage Concepts (Cont.)
RAID-0
Simple Striping
Virtual single disk is divided up into strips of k
sectors each
Since no redundant information is stored,
performance is very good, but the failure of
any disk in the array results in data loss
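The address arithmetic behind simple striping can be sketched as below. The round-robin placement, the 0-based numbering and the function names are assumptions chosen for illustration; controllers differ in detail.

```python
def locate_strip(strip_no, n_disks):
    """Map a 0-based logical strip number to (disk index, strip index on that disk)."""
    return strip_no % n_disks, strip_no // n_disks


def locate_sector(sector_no, k, n_disks):
    """Map a 0-based logical sector to (disk index, sector index on that disk).

    k is the number of sectors per strip.
    """
    strip_no, offset = divmod(sector_no, k)
    disk, row = locate_strip(strip_no, n_disks)
    return disk, row * k + offset


# Lay 12 strips out over 4 disks, round-robin.
layout = [[] for _ in range(4)]
for strip in range(12):
    disk, _ = locate_strip(strip, 4)
    layout[disk].append(strip)
```

Consecutive strips land on different disks, so a large sequential read keeps all four drives busy at once; with no redundant copy, losing any one drive loses a quarter of every large file.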
Data Storage Concepts (Cont.)
[Figure: strips 1 to 12 laid out across four disks, row by row: 1-4, 5-8, 9-12]
Data Storage Concepts (Cont.)
[Figure: blocks 1, 2 and 3, each stored together with two mirrored copies (1′, 1″ and so on) on separate disks]
Data Storage Concepts (Cont.)
RAID-2,3
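For reliability a check code is stored alongside the data:
a Hamming error-correcting code in RAID-2, a simple
parity check in RAID-3
The check bits are stored on separate disks (a single
parity disk in RAID-3)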
RAID-4
RAID Level 4 stripes data at a block level across
several drives, with parity stored on one drive
The performance of a level 4 array is very good
for reads (the same as level 0)
Writes, however, require that parity data be
updated each time
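The parity idea can be sketched with byte-wise XOR. This is a minimal illustration with invented four-byte blocks and hypothetical names, not a description of any particular controller.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data blocks of one stripe and the parity block stored on the parity drive.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# If the drive holding data[1] fails, XOR of the survivors and parity rebuilds it.
recovered = xor_blocks([data[0], data[2], parity])

# A small write changes one block, but parity must be updated as well:
# new parity = old parity XOR old data XOR new data.
new_block = b"ZZZZ"
updated_parity = xor_blocks([parity, data[1], new_block])
```

The last step shows why every write also touches the parity drive: the parity block has to be read and rewritten even when only one data block changes.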
Data Storage Concepts (Cont.)
RAID-5
RAID Level 5 is similar to level 4, but distributes parity
among the drives
This can speed small writes in multiprocessing systems,
since the parity disk does not become a bottleneck
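The difference between a fixed and a distributed parity drive can be sketched as follows. The rotation rule used here (parity starts on the last disk and moves back one disk per stripe) is only one possible layout and is an assumption; real arrays use several variants.

```python
def parity_disk(stripe_no, n_disks):
    """Disk holding the parity of a stripe under the assumed rotating layout."""
    return (n_disks - 1 - stripe_no) % n_disks


N_DISKS = 4
STRIPES = 8

# RAID-4 style: parity for every stripe sits on the same (last) disk.
raid4_parity = [N_DISKS - 1 for _ in range(STRIPES)]

# RAID-5 style: the parity location rotates over all disks.
raid5_parity = [parity_disk(s, N_DISKS) for s in range(STRIPES)]

# Number of parity blocks each disk has to maintain.
raid4_load = [raid4_parity.count(d) for d in range(N_DISKS)]
raid5_load = [raid5_parity.count(d) for d in range(N_DISKS)]
```

With the fixed layout one disk carries all eight parity updates, while the rotating layout gives each disk two, so writes to different stripes can update parity on different drives.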
RAID-0 is the fastest and most efficient array type
but offers no fault-tolerance
RAID-1 is the array of choice for performance-
critical, fault-tolerant environments
RAID-2 is seldom used today since ECC is
embedded in almost all modern disk drives
Data Storage Concepts (Cont.)
RAID-3 can be used in data intensive or single-user
environments which access long sequential records
to speed up data transfer. However, RAID-3 does
not allow multiple I/O operations to be overlapped
RAID-4 offers no advantages over RAID-5 and does
not support multiple simultaneous write operations
RAID-5 is the best choice in multi-user
environments which are not write performance
sensitive. However, at least three and more typically
five drives are required for RAID-5 arrays