Systems Analysis and Design
Process of Database Design
Logical Design
› Four key steps:
1. Develop a logical data model for each known user interface
for the application using normalization principles
2. Combine normalized data requirements from all user
interfaces into one consolidated logical database model
3. Translate the conceptual E-R data model for the application
into normalized data requirements
4. Compare the consolidated logical database design with the
translated E-R model and produce one final logical database
model for the application
Physical Design
› Based upon results of logical database design
› Key decisions:
1. Choosing storage format for each attribute from the logical
database model
2. Grouping attributes from the logical database model into
physical records
3. Arranging related records in secondary memory (hard disks
and magnetic tapes) so that records can be stored,
retrieved, and updated rapidly
4. Selecting media and structures for storing data to make
access more efficient
Relational Database Model
Data represented as a set of related tables or relations
Relation
› A named, two-dimensional table of data. Each
relation consists of a set of named columns and an
arbitrary number of unnamed rows
› Properties
Entries in cells are simple
Entries in columns are from the same set of values
Each row is unique
The sequence of columns can be interchanged without
changing the meaning or use of the relation
The rows may be interchanged or stored in any sequence
• Well-Structured Relation
– A relation that contains a minimum amount of
redundancy and allows users to insert, modify,
and delete the rows without errors or
inconsistencies
Normalization
• Second Normal Form (2NF)
– Each non-primary key attribute is identified by the whole key (called
full functional dependency)
• Third Normal Form (3NF)
– Non-primary key attributes do not depend on each other (that is, the
relation has no transitive dependencies)
• The result of normalization is that every non-
primary key attribute depends upon the whole
primary key and nothing but the primary key.
Functional Dependencies and
Primary Keys
• Functional Dependency
– A particular relationship between two attributes. For a
given relation, attribute B is functionally dependent on
attribute A if, for every valid value of A, that value of A
uniquely determines the value of B.
– Instances (or sample data) in a relation do not prove the
existence of a functional dependency
– Knowledge of problem domain is the most reliable method
for identifying functional dependency
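The point that sample data can refute a functional dependency but never prove one can be sketched in Python. This is a minimal illustration, not a design tool: the helper name `violates_fd` and the sample rows are hypothetical.

```python
def violates_fd(rows, determinant, dependent):
    """Return True if the sample rows contradict determinant -> dependent."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in determinant)
        value = tuple(row[a] for a in dependent)
        # The same determinant value paired with two different dependent
        # values is a counterexample to the dependency.
        if seen.setdefault(key, value) != value:
            return True
    return False


# Hypothetical sample data
sample = [
    {"Emp_ID": 100, "Name": "Ann", "Dept": "Sales"},
    {"Emp_ID": 110, "Name": "Bob", "Dept": "Sales"},
    {"Emp_ID": 100, "Name": "Ann", "Dept": "Sales"},
]

emp_id_refuted = violates_fd(sample, ["Emp_ID"], ["Name"])  # no counterexample found
dept_refuted = violates_fd(sample, ["Dept"], ["Name"])      # counterexample found
```

A result of False only means the sample is consistent with the dependency; whether the dependency really holds still has to come from knowledge of the problem domain.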
• EMPLOYEE2 below is an example of a relation
that is not in second normal form.
– EMPLOYEE2(Emp_ID, Name, Dept, Salary, Course,
Date_Completed)
• The functional dependencies in this relation
are the following:
– Emp_ID → Name, Dept, Salary
– Emp_ID, Course → Date_Completed
• EMPLOYEE2 is decomposed into the following
two relations:
– EMPLOYEE1(Emp_ID, Name, Dept, Salary)
– EMP_COURSE(Emp_ID, Course, Date_Completed)
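The decomposition can be sketched as two relational projections. The sample rows and the `project` helper below are hypothetical; the sketch only shows that the employee facts are stored once after the split and that joining on Emp_ID recovers the original rows.

```python
# Hypothetical sample rows for EMPLOYEE2:
# (Emp_ID, Name, Dept, Salary, Course, Date_Completed)
employee2 = [
    (100, "Ann", "Marketing", 48000, "SPSS", "2024-06-19"),
    (100, "Ann", "Marketing", 48000, "Surveys", "2024-10-07"),
    (140, "Bob", "Accounting", 52000, "Tax Acc", "2024-12-08"),
]


def project(rows, positions):
    """Relational projection: keep the chosen columns, drop duplicate rows."""
    return sorted({tuple(row[i] for i in positions) for row in rows})


employee1 = project(employee2, [0, 1, 2, 3])  # Emp_ID, Name, Dept, Salary
emp_course = project(employee2, [0, 4, 5])    # Emp_ID, Course, Date_Completed

# Natural join on Emp_ID rebuilds the original rows, so nothing was lost.
rejoined = sorted(
    e + c[1:] for e in employee1 for c in emp_course if e[0] == c[0]
)
```

In this sample, Ann's name, department, and salary appear twice in EMPLOYEE2 but only once in EMPLOYEE1.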
• Consider the relation below:
– SALES(Customer_ID, Customer_Name, Salesperson,
Region)
• The following functional dependencies exist in
the SALES relation, where Customer_ID is the
primary key:
– Customer_ID → Customer_Name, Salesperson, Region
– Salesperson → Region (Each salesperson is assigned to
a unique region.)
• SALES relation is decomposed into two
relations to convert it into 3NF as follows:
– SALES1(Customer_ID, Customer_Name,
Salesperson)
– SPERSON(Salesperson, Region)
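One practical effect of removing the transitive dependency can be sketched as follows. The rows are hypothetical sample data, and SPERSON is modeled as a plain dictionary for brevity: once Region is stored only in SPERSON, reassigning a salesperson is a single change.

```python
# Hypothetical sample rows for SALES:
# (Customer_ID, Customer_Name, Salesperson, Region)
sales = [
    (8023, "Anderson", "Smith", "South"),
    (9167, "Bancroft", "Hicks", "West"),
    (7924, "Hobbs", "Smith", "South"),
]

# 3NF decomposition
sales1 = [(cid, name, sp) for cid, name, sp, _ in sales]
sperson = {sp: region for _, _, sp, region in sales}

# Reassigning Smith to a new region touches exactly one SPERSON entry ...
sperson["Smith"] = "East"

# ... and every customer of Smith sees the new region through the join.
sales_view = [(cid, name, sp, sperson[sp]) for cid, name, sp in sales1]
```

In the original SALES relation the same change would have to be repeated on every row that mentions Smith, and missing one row would leave the data inconsistent.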
• Foreign Key
– An attribute that appears as a nonprimary key attribute in
one relation and as a primary key attribute (or part of a
primary key) in another relation
• Referential Integrity
– An integrity constraint specifying that the value (or
existence) of an attribute in one relation depends on the
value (or existence) of the same attribute in another
relation
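A foreign key with referential integrity might look like this in practice. The sketch uses Python's standard `sqlite3` module with an in-memory database; the choice of SQLite, the column types, and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default

conn.execute("CREATE TABLE SPERSON (Salesperson TEXT PRIMARY KEY, Region TEXT)")
conn.execute(
    """CREATE TABLE SALES1 (
           Customer_ID   INTEGER PRIMARY KEY,
           Customer_Name TEXT,
           Salesperson   TEXT REFERENCES SPERSON (Salesperson)
       )"""
)

conn.execute("INSERT INTO SPERSON VALUES ('Smith', 'South')")
conn.execute("INSERT INTO SALES1 VALUES (8023, 'Anderson', 'Smith')")  # Smith exists

try:
    # 'Hicks' has no row in SPERSON, so this insert breaks referential integrity.
    conn.execute("INSERT INTO SALES1 VALUES (9167, 'Bancroft', 'Hicks')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```

Here Salesperson is a nonprimary key attribute in SALES1 and the primary key of SPERSON, which matches the definition of a foreign key given above.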
Transforming E-R Diagrams into Relations
1. Represent Entities
– Each regular entity is transformed into a relation
– The identifier of the entity type becomes the primary
key of the corresponding relation
– The primary key must satisfy the following two
conditions
a. The value of the key must uniquely identify every row in the
relation
b. The key should be non-redundant: no attribute in the key can be
removed without destroying its unique identification
2. Represent Relationships
– Binary 1:N and 1:1 Relationships
• A binary one-to-many (1:N) relationship in an E-R diagram
is represented by adding the primary key attribute (or
attributes) of the entity on the one side of the relationship
as a foreign key in the relation that is on the many side of
the relationship.
• For a binary or unary one-to-one (1:1) relationship
between the two entities A and B (for a unary relationship,
A and B would be the same entity type), the relationship
can be represented by any of the following choices:
1. Adding the primary key of A as a foreign key of B
2. Adding the primary key of B as a foreign key of A
3. Both of the above
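The 1:N rule can be sketched as a small transformation on relation schemas. The `Relation` class, the `add_one_to_many` helper, and the CUSTOMER and ORDER entities are all hypothetical names introduced for this illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Relation:
    name: str
    primary_key: list
    attributes: list
    foreign_keys: dict = field(default_factory=dict)  # attribute -> referenced relation


def add_one_to_many(one: Relation, many: Relation) -> Relation:
    """Return the many-side relation with the one-side primary key added as a foreign key."""
    extra = [a for a in one.primary_key if a not in many.attributes]
    fks = dict(many.foreign_keys)
    fks.update({a: one.name for a in one.primary_key})
    return Relation(many.name, many.primary_key, many.attributes + extra, fks)


# Hypothetical entities: one CUSTOMER places many ORDERs.
customer = Relation("CUSTOMER", ["Customer_ID"], ["Customer_ID", "Name"])
order = Relation("ORDER", ["Order_Number"], ["Order_Number", "Order_Date"])

order = add_one_to_many(customer, order)
```

For a 1:1 relationship the same helper could be applied in either direction, or in both, which corresponds to the three choices listed above.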
Physical Database Design
• Physical database design is concerned with how
data are stored and related within a
particular DBMS.
• It involves two major activities:
– Designing Fields
– Designing Physical Tables
Designing Fields
• A field is the smallest unit of data recognized
by database management systems
• In general, each attribute from each
normalized relation is represented as one or
more fields
• The basic decisions concerning a field are:
– the type of data (or storage type) and
– data integrity controls for the field.
Choosing Data Types
• A data type is a coding scheme for representing
organizational data.
• The specific database management software to be
used dictates which data type choices are available
• Selecting a data type balances four objectives that will
vary in degree of importance for different applications:
– Minimize storage space
– Represent all possible values of the field
– Improve data integrity for the field
– Support all data manipulations desired on the field
Controlling Data Integrity
• Five popular data integrity control methods are:
– default value: the value a field assumes unless an explicit
value is entered for the field
– input mask: a pattern of codes that restricts the width and
possible values for each position within a field
– range control: a limit on the set of permissible values that a
field, whether numeric or alphabetic, may hold
– referential integrity: a constraint specifying that the value (or
existence) of an attribute in one relation depends on the value
(or existence) of the same attribute in another relation
– null value control: a constraint that ensures that mandatory
fields are given a value
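Several of these controls can be declared directly in a table definition. The sketch below assumes SQLite (through Python's standard `sqlite3` module) and a hypothetical EMPLOYEE table; the `GLOB` pattern only approximates an input mask, and referential integrity is left out for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE EMPLOYEE (
           Emp_ID INTEGER PRIMARY KEY,
           Name   TEXT NOT NULL,                                -- null value control
           Dept   TEXT DEFAULT 'Unassigned',                    -- default value
           Salary INTEGER CHECK (Salary BETWEEN 0 AND 500000),  -- range control
           Phone  TEXT CHECK (Phone GLOB
                      '[0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]')   -- mask-like pattern
       )"""
)

conn.execute(
    "INSERT INTO EMPLOYEE (Emp_ID, Name, Salary, Phone) "
    "VALUES (100, 'Ann', 48000, '555-0100')"
)
dept = conn.execute("SELECT Dept FROM EMPLOYEE WHERE Emp_ID = 100").fetchone()[0]


def rejected(sql):
    """Return True if the DBMS refuses the statement on integrity grounds."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


null_name = rejected("INSERT INTO EMPLOYEE (Emp_ID, Salary) VALUES (101, 100)")
bad_salary = rejected("INSERT INTO EMPLOYEE (Emp_ID, Name, Salary) VALUES (102, 'Bob', -5)")
bad_phone = rejected("INSERT INTO EMPLOYEE (Emp_ID, Name, Phone) VALUES (103, 'Cy', '5550100')")
```

Which controls are available, and how they are spelled, depends on the DBMS chosen; other products offer input masks or domains directly rather than through a CHECK constraint.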
Designing Physical Tables
• A relational database is a set of related tables (tables are
related by foreign keys referencing primary keys)
• A physical table is a named set of rows and columns that
specifies the fields in each row of the table.
• A physical table may or may not correspond to one relation
• Whereas normalized relations possess properties of well-
structured relations, the design of a physical table has two
goals different from those of normalization:
– efficient use of secondary storage and
– data-processing speed.
• Denormalization is the process of splitting or
combining normalized relations into physical
tables based on affinity of use of rows and fields.
• By placing data used together close to one
another on disk, the number of disk I/O
operations needed to retrieve all the data needed
in a program is minimized.
File Organization
• File Organization is a technique for physically
arranging the records of a file.
• The three basic families of file organizations
used in most file management environments
are:
– Sequential
– Indexed
– Hashed
Sequential File Organization
• The rows in the file are stored in sequence according to a
primary key value
• To locate a particular row, a program must normally scan
the file from the beginning until the desired row is located
• Sequential files are fast if you want to process rows
sequentially, but they are essentially impractical for
random row retrievals
• Deleting rows can cause wasted space or the need to
compress the file
• Adding or updating rows requires rewriting the file
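The cost of a random retrieval from a sequential file can be sketched with an in-memory list standing in for the file. The function name `sequential_find` and the rows are hypothetical; the sketch simply counts how many rows must be read before the target is found.

```python
def sequential_find(rows, key):
    """Scan rows stored in primary-key order; return (row, rows_read)."""
    for reads, row in enumerate(rows, start=1):
        if row[0] == key:
            return row, reads
        if row[0] > key:  # passed the place where the key would be stored
            return None, reads
    return None, len(rows)


# Hypothetical file: ten rows with keys 100, 200, ..., 1000, stored in key order.
file_rows = [(k, f"row-{k}") for k in range(100, 1100, 100)]

row, reads = sequential_find(file_rows, 900)
missing, reads_for_missing = sequential_find(file_rows, 150)
```

Finding key 900 requires reading nine of the ten rows, which is why this organization suits batch processing in key order far better than random lookups.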
Indexed File Organizations
• The rows are stored either sequentially or
nonsequentially, and an index is created that
allows software to locate individual rows.
• The main disadvantages of indexed file
organizations are the extra space required to
store the indexes and the extra time necessary
to access and maintain indexes.
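A minimal sketch of the idea, assuming rows stored in arrival order and a separate sorted index of (key, position) pairs. The names and data are hypothetical, and a real DBMS would typically use a tree-structured index on disk rather than a Python list.

```python
import bisect

# Rows stored nonsequentially (in arrival order, not key order).
rows = [(730, "Hobbs"), (120, "Anderson"), (455, "Bancroft")]

# The index is kept separately from the rows: sorted (key, position) pairs.
index = sorted((row[0], pos) for pos, row in enumerate(rows))
index_keys = [key for key, _ in index]


def indexed_find(key):
    """Binary-search the index, then fetch the row at the recorded position."""
    i = bisect.bisect_left(index_keys, key)
    if i < len(index_keys) and index_keys[i] == key:
        return rows[index[i][1]]
    return None


found = indexed_find(455)
not_found = indexed_find(999)
```

The extra `index` structure is the space overhead mentioned above, and it must be updated on every insert or delete, which is the maintenance overhead.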
Hashed File Organizations
• In hashed file organization, the address of each
row is determined using an algorithm that
converts a primary key value into a row address
• Rows are located nonsequentially as dictated by
the hashing algorithm. Thus, sequential data
processing is impractical.
• On the other hand, retrieval of random rows is
fast
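A minimal sketch of hashed addressing, assuming division-remainder hashing and a small fixed number of slots. The slot count, function names, and rows are hypothetical; each slot holds a short list so that keys hashing to the same address can coexist.

```python
NUM_SLOTS = 7
slots = [[] for _ in range(NUM_SLOTS)]  # each slot is a bucket that absorbs collisions


def address(key):
    """Convert a primary key value into a row address (division-remainder hashing)."""
    return key % NUM_SLOTS


def store(row):
    slots[address(row[0])].append(row)


def fetch(key):
    """Go straight to the computed address; only that bucket is examined."""
    for row in slots[address(key)]:
        if row[0] == key:
            return row
    return None


for r in [(120, "Anderson"), (455, "Bancroft"), (730, "Hobbs")]:
    store(r)

found = fetch(730)
```

Because consecutive key values land in unrelated slots, reading the rows in key order would mean jumping around the file, which is why sequential processing is impractical with this organization.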