5. Relational DB Design
Redundant Information and Anomaly Problems
Insertion Anomalies.
• Cannot add data to the database due to the absence of other related data.
Deletion Anomalies.
• Removing a record also unintentionally removes other valuable data.
Modification Anomalies.
• A piece of information that appears in multiple places is updated in one location but
not the others, leading to inconsistent data across the database.
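The modification and deletion anomalies can be illustrated with a small sketch. The table EMP_DEPT and its rows below are hypothetical, made up only for this illustration.

```python
# Hypothetical denormalized table: every employee row repeats the name
# of the employee's department.
emp_dept = [
    {"ssn": "111", "ename": "Smith", "dnumber": 5, "dname": "Research"},
    {"ssn": "222", "ename": "Wong", "dnumber": 5, "dname": "Research"},
    {"ssn": "333", "ename": "Zelaya", "dnumber": 4, "dname": "Administration"},
]

# Modification anomaly: department 5 is renamed in only one of its rows,
# so the table now holds two different names for the same department.
emp_dept[0]["dname"] = "R&D"
names_for_dept_5 = {row["dname"] for row in emp_dept if row["dnumber"] == 5}
print(names_for_dept_5)  # two conflicting names

# Deletion anomaly: removing the last employee of department 4 also
# removes the only record that department 4 exists at all.
after_delete = [row for row in emp_dept if row["ssn"] != "333"]
known_departments = {row["dnumber"] for row in after_delete}
print(known_departments)  # {5}
```

An insertion anomaly shows up in the same table: a new department with no employees yet cannot be stored without inventing an employee or leaving the employee attributes NULL.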
Problems with NULL Values
NULL Values in Tuples: If many of the attributes do not apply to all tuples in the
relation, we end up with many NULLs in those tuples.
• This can waste space at the storage level and may also lead to problems with
understanding the meaning of the attributes and with specifying JOIN operations at
the logical level.
• Another problem with NULLs is how to account for them when aggregate operations
such as COUNT or SUM are applied. SELECT and JOIN operations involve comparisons;
in SQL, a comparison with NULL evaluates to unknown rather than true or false, so
rows can silently drop out of the result.
• Moreover, NULLs can have multiple interpretations, such as the following:
  • The attribute does not apply to this tuple.
  • The attribute value for this tuple is unknown.
  • The value is known but absent; that is, it has not been recorded yet.
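A minimal sketch of how NULLs affect aggregates and comparisons, using Python's built-in sqlite3 module. The employee table and its values are assumptions for illustration; other SQL systems follow the same rules for these particular queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (ssn TEXT PRIMARY KEY, salary INTEGER)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?)",
    [("111", 30000), ("222", None), ("333", 40000)],  # None is stored as NULL
)

# COUNT(*) counts rows; COUNT(salary), SUM and AVG skip NULL salaries.
count_rows, count_salary, sum_salary, avg_salary = conn.execute(
    "SELECT COUNT(*), COUNT(salary), SUM(salary), AVG(salary) FROM employee"
).fetchone()
print(count_rows, count_salary, sum_salary, avg_salary)  # 3 2 70000 35000.0

# "salary = NULL" is never true, so it selects nothing; IS NULL must be used.
eq_null = conn.execute(
    "SELECT COUNT(*) FROM employee WHERE salary = NULL"
).fetchone()[0]
is_null = conn.execute(
    "SELECT COUNT(*) FROM employee WHERE salary IS NULL"
).fetchone()[0]
print(eq_null, is_null)  # 0 1
```

Note that the average is taken over the two known salaries, not over all three employees, which is one way NULLs make aggregate results easy to misread.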
Quality of Relation Schema Design
Four informal guidelines that may be used as measures to determine the quality of
relation schema design:
■ Making sure that the semantics of the attributes is clear in the schema
■ Reducing the redundant information in tuples
■ Reducing the NULL values in tuples
■ Disallowing the possibility of generating spurious tuples
(These measures are not always independent of one another.)
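The last guideline can be made concrete with a sketch of how spurious tuples arise. The relation and its two-way decomposition below are hypothetical; the point is that joining on an attribute that is not a key of either part (here the location) produces tuples that were never in the original relation.

```python
# Hypothetical relation state: (ssn, pname, plocation)
original = {
    ("111", "ProductX", "Houston"),
    ("222", "ProductY", "Houston"),
}

# A poor decomposition that keeps only plocation in common.
r1 = {(ssn, loc) for ssn, _, loc in original}        # (ssn, plocation)
r2 = {(pname, loc) for _, pname, loc in original}    # (pname, plocation)

# Natural join of r1 and r2 on plocation.
joined = {
    (ssn, pname, loc1)
    for ssn, loc1 in r1
    for pname, loc2 in r2
    if loc1 == loc2
}

# Tuples produced by the join that were not in the original relation.
spurious = joined - original
print(sorted(spurious))
# [('111', 'ProductY', 'Houston'), ('222', 'ProductX', 'Houston')]
```

The join returns four tuples instead of two, so the decomposition has lost the information about which employee works on which project.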
Functional Dependencies
Suppose that:
• The relational database schema has n attributes A1, A2, … , An.
• The whole database is described by a single universal relation schema
R = {A1, A2, … , An}.
A functional dependency, denoted by X → Y, between two sets of
attributes X and Y that are subsets of R specifies a constraint on the possible
tuples that can form a relation state r of R. The constraint is that, for any two
tuples t1 and t2 in r that have t1[X] = t2[X], they must also have t1[Y] = t2[Y].
(We also say that there is a functional dependency from X to Y, or that Y is functionally dependent on
X. The set of attributes X is called the left-hand side of the FD, and Y is called the right-hand side. A
column appearing on the left-hand side of an FD is called a determinant or, alternatively, an LHS,
short for left-hand side.)
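The definition translates directly into a check over one relation state. The function name fd_holds and the sample rows are assumptions for illustration. Keep in mind that an FD is a constraint on all legal states: a single state can show that an FD does not hold, but it cannot prove that it does.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}  # maps each X-value to the Y-value first seen with it
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if seen.setdefault(x, y) != y:
            return False  # two tuples with t1[X] = t2[X] but t1[Y] != t2[Y]
    return True


# Hypothetical relation state r over (Ssn, Pnumber, Hours, Ename).
r = [
    {"Ssn": "111", "Pnumber": 1, "Hours": 32.5, "Ename": "Smith"},
    {"Ssn": "111", "Pnumber": 2, "Hours": 7.5, "Ename": "Smith"},
    {"Ssn": "222", "Pnumber": 1, "Hours": 20.0, "Ename": "Wong"},
]

print(fd_holds(r, ["Ssn"], ["Ename"]))             # True
print(fd_holds(r, ["Ssn"], ["Hours"]))             # False
print(fd_holds(r, ["Ssn", "Pnumber"], ["Hours"]))  # True
```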
Candidate key:
If a constraint on R states that there cannot be more than one tuple with a
given X-value in any relation instance r(R)—that is, X is a candidate key of
R—this implies that X → Y for any subset of attributes Y of R (because the
key constraint implies that no two tuples in any legal state r(R) will have the
same value of X). If X is a candidate key of R, then X → R.
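Whether X → R follows from a given set of FDs can be tested with the standard attribute-closure computation. The sketch below is not from the slides: the helper names closure and candidate_keys are hypothetical, and the brute-force key search is only practical for small schemas.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attributes functionally determined by attrs under the FDs in fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(schema, fds):
    """Minimal attribute sets X whose closure is the whole schema (X -> R)."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            if any(set(k) <= set(combo) for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(combo, fds) == set(schema):
                keys.append(combo)
    return keys


# Assumed example schema and FDs.
schema = {"Ssn", "Pnumber", "Hours", "Ename"}
fds = [
    ({"Ssn", "Pnumber"}, {"Hours"}),
    ({"Ssn"}, {"Ename"}),
]

print(sorted(closure({"Ssn"}, fds)))  # ['Ename', 'Ssn']
print(candidate_keys(schema, fds))    # [('Pnumber', 'Ssn')]
```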
Most practical relational design projects take one of the following two approaches:
• Perform a conceptual schema design using a conceptual model such as ER
or EER and map the conceptual design into a set of relations.
• Design the relations based on external knowledge derived from an existing
implementation of files or forms or reports.
First Normal Form
First normal form (1NF) is now considered to be part of the formal definition of a
relation in the basic (flat) relational model; historically, it was defined to disallow
multivalued attributes, composite attributes, and their combinations. It states that
the domain of an attribute must include only atomic (simple, indivisible) values
and that the value of any attribute in a tuple must be a single value from the
domain of that attribute. Hence, 1NF disallows having a set of values, a tuple of
values, or a combination of both as an attribute value for a single tuple. In other
words, 1NF disallows relations within relations or relations as attribute values
within tuples. The only attribute values permitted by 1NF are single atomic (or
indivisible) values.
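One common way to reach 1NF is to move a multivalued attribute into its own relation together with the key of the original relation. The sketch below assumes a hypothetical DEPARTMENT relation whose Dlocations attribute holds a set of values.

```python
# Hypothetical rows that violate 1NF: dlocations is a set of values.
department = [
    {"dnumber": 5, "dname": "Research",
     "dlocations": ["Bellaire", "Sugarland", "Houston"]},
    {"dnumber": 4, "dname": "Administration", "dlocations": ["Stafford"]},
]

# DEPARTMENT keeps only the atomic attributes.
dept = [{"dnumber": d["dnumber"], "dname": d["dname"]} for d in department]

# DEPT_LOCATIONS holds one (dnumber, dlocation) pair per location, so every
# attribute value is a single atomic value.
dept_locations = [
    {"dnumber": d["dnumber"], "dlocation": loc}
    for d in department
    for loc in d["dlocations"]
]

for row in dept_locations:
    print(row)
```

The primary key of the new relation is the combination {dnumber, dlocation}, and the original set of locations for a department can be recovered by selecting on dnumber.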
Second Normal Form
Second normal form (2NF) is based on the concept of full functional dependency.
A functional dependency X → Y is a full functional dependency if removal of any
attribute A from X means that the dependency does not hold anymore; that is, for
any attribute A ∈ X, (X - {A}) does not functionally determine Y. A functional
dependency X → Y is a partial dependency if some attribute A ∈ X can be removed
from X and the dependency still holds; that is, for some A ∈ X, (X - {A}) → Y.
{Ssn, Pnumber} → Hours is a full dependency (neither Ssn → Hours
nor Pnumber → Hours holds).
{Ssn, Pnumber} → Ename is partial because Ssn → Ename holds.
A relation schema R is in 2NF if every nonprime attribute A in R is
fully functionally dependent on the primary key of R.
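The 2NF test can be sketched as a search for nonprime attributes that are determined by a proper subset of the primary key. The helper names below are hypothetical, and the sketch assumes the primary key is the only candidate key, so every attribute outside it is nonprime.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attributes functionally determined by attrs under the FDs in fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def partial_dependencies(primary_key, schema, fds):
    """List (key_subset, attribute) pairs that violate 2NF."""
    key = set(primary_key)
    nonprime = set(schema) - key  # assumes no other candidate keys exist
    violations = []
    for size in range(1, len(key)):  # proper, non-empty subsets of the key
        for subset in combinations(sorted(key), size):
            for attr in sorted(closure(subset, fds) & nonprime):
                violations.append((subset, attr))
    return violations


schema = {"Ssn", "Pnumber", "Hours", "Ename"}
fds = [
    ({"Ssn", "Pnumber"}, {"Hours"}),
    ({"Ssn"}, {"Ename"}),
]

print(partial_dependencies({"Ssn", "Pnumber"}, schema, fds))
# [(('Ssn',), 'Ename')]
```

The result matches the example above: Ename depends on Ssn alone, so the relation is not in 2NF, while Hours depends on the full key.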
Third Normal Form
A relation schema R is in third normal form (3NF) if, whenever a nontrivial
functional dependency X → A holds in R, either (a) X is a superkey of R, or
(b) A is a prime attribute of R.
Boyce-Codd normal form (BCNF) was proposed as a simpler form of 3NF, but it
was found to be stricter than 3NF. That is, every relation in BCNF is also in 3NF;
however, a relation in 3NF is not necessarily in BCNF. Although 3NF allows
functional dependencies that satisfy only clause (b) of the 3NF definition,
BCNF disallows them and hence is a stricter definition of a normal form.
A relation schema R is in BCNF if whenever a nontrivial functional
dependency X → A holds in R, then X is a superkey of R.
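The BCNF definition can be checked mechanically: for each given nontrivial FD, test whether its left-hand side is a superkey. The sketch below uses hypothetical helper names and a commonly used textbook-style example, TEACH(Student, Course, Instructor), which is assumed here rather than taken from the slides. It checks only the FDs that are listed, which is enough for testing the original schema against that set.

```python
def closure(attrs, fds):
    """Attributes functionally determined by attrs under the FDs in fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(schema, fds):
    """Return the nontrivial FDs X -> A whose X is not a superkey."""
    violations = []
    for lhs, rhs in fds:
        nontrivial = set(rhs) - set(lhs)
        if nontrivial and closure(lhs, fds) != set(schema):
            violations.append((tuple(sorted(lhs)), tuple(sorted(nontrivial))))
    return violations


# Assumed example: each instructor teaches one course, and a student has
# one instructor per course.
schema = {"Student", "Course", "Instructor"}
fds = [
    ({"Student", "Course"}, {"Instructor"}),
    ({"Instructor"}, {"Course"}),
]

print(bcnf_violations(schema, fds))
# [(('Instructor',), ('Course',))]
```

Instructor → Course is allowed by 3NF because Course is a prime attribute, but it violates BCNF because Instructor is not a superkey.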
M-Way Relationships
M-way relationships are represented by associative entity types in the Crow’s Foot
ERD notation.
In the conversion process, an associative entity type converts into a table with a
combined primary key consisting of three or more components.
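A sketch of the converted table for a three-way relationship, using Python's built-in sqlite3 module. The table and column names (supplier, part, project, uses) are hypothetical. SQLite does not enforce foreign keys unless PRAGMA foreign_keys is switched on, so only the combined primary key is exercised here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE supplier (suppno INTEGER PRIMARY KEY);
CREATE TABLE part     (partno INTEGER PRIMARY KEY);
CREATE TABLE project  (projno INTEGER PRIMARY KEY);

-- Table for the associative entity type: one foreign key per
-- participating entity type, all three forming the primary key.
CREATE TABLE uses (
    suppno INTEGER REFERENCES supplier (suppno),
    partno INTEGER REFERENCES part (partno),
    projno INTEGER REFERENCES project (projno),
    PRIMARY KEY (suppno, partno, projno)
);

INSERT INTO supplier VALUES (1);
INSERT INTO part VALUES (10);
INSERT INTO project VALUES (100);
INSERT INTO uses VALUES (1, 10, 100);
""")

# Columns that take part in the primary key (pk field > 0 in table_info).
pk_columns = [row[1] for row in conn.execute("PRAGMA table_info(uses)") if row[5] > 0]
print(pk_columns)  # ['suppno', 'partno', 'projno']

# The same (supplier, part, project) combination cannot be stored twice.
try:
    conn.execute("INSERT INTO uses VALUES (1, 10, 100)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(duplicate_rejected)  # True
```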
Practical Concerns about Normalization
• In the initial design approach, you use normalization techniques in conceptual data
modeling. Instead of drawing an ERD, you identify functional dependencies and apply
a normalization procedure like the simple synthesis procedure.
After defining the tables, you identify the referential integrity constraints and
construct a relational model diagram.
• Advantages of normalization as a refinement tool: use normalization to remove
redundancies after conversion from an ERD to a table design rather than as an
initial design tool.
  • Easier to translate requirements into an ERD than into lists of FDs.
  • Fewer FDs to specify because most FDs are derived from primary keys.
  • Fewer tables to split because normalization is performed intuitively during ERD
  development.
  • Easier to identify relationships, especially M-N relationships without attributes.
Relational Database Design Methods
A bottom-up design methodology (also called design by synthesis) considers the basic
relationships among individual attributes as the starting point and uses those to construct relation
schemas. This approach is not very popular in practice because it suffers from the problem of
having to collect a large number of binary relationships among attributes as the starting point. For
practical situations, it is next to impossible to capture binary relationships among all such
pairs of attributes.
A top-down design methodology (also called design by analysis) starts with a number of groupings
of attributes into relations that exist together naturally, for example, on an invoice, a form, or a
report. The relations are then analyzed individually and collectively, leading to further
decomposition until all desirable properties are met.
Informal Design Guidelines for Relation Schemas
The implicit goals of the design activity are information preservation and minimum
redundancy.
Information is very hard to quantify—hence we consider information preservation
in terms of maintaining all concepts, including attribute types, entity types, and
relationship types as well as generalization/specialization relationships, which are
described using a model such as the EER model. Thus, the relational design must
preserve all of these concepts, which are originally captured in the conceptual
design after the conceptual to logical design mapping.
Minimizing redundancy implies minimizing redundant storage of the same
information and reducing the need for multiple updates to maintain consistency
across multiple copies of the same information in response to real-world events
that require making an update.