5. Relational DB Design

The document discusses relational database design theory, focusing on identifying modification anomalies, defining functional dependencies, and normalizing tables to reduce redundancies. It outlines various normal forms (1NF, 2NF, 3NF, BCNF) and their significance in ensuring data integrity while addressing issues like NULL values and insertion, deletion, and modification anomalies. Additionally, it presents guidelines for designing effective relation schemas and highlights the importance of normalization in the database development process.

Uploaded by

Quỳnh Phươnn
Database Systems

RELATIONAL DATABASE DESIGN THEORY


 Identify modification anomalies in tables with excessive redundancies
 Define functional dependencies among columns of a table
 Normalize tables by detecting violations of normal forms and applying
normalization rules

Relation schema

 A relation schema R consists of a number of attributes:
R = {A1, A2, … , An}
 The relational database schema consists of a number of relation schemas.
 A Relation state or Relation instance r(R) is a set of tuples (also called records). A
relation instance can be thought of as a table in which each tuple is a row, and all
rows have the same number of fields.
Redundant Information and Anomalies Problems

 Redundant Information in Tuples and Update Anomalies


(The tables given below are not normalized.)
Redundant Information and Anomalies Problems

 Insertion anomalies.
• Data cannot be added to the database because other related data is absent.
 Deletion anomalies.
• Removing a record also unintentionally removes other valuable data.
 Modification anomalies.
• A piece of information that appears in multiple places is updated in one location but
not the others, leading to inconsistent data across the database.
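The anomalies above can be reproduced on a small in-memory table. The following is a minimal sketch, assuming a hypothetical denormalized EMP_DEPT table in which department data is repeated on every employee row; the row values and the helper name `department_names` are illustrative and do not come from the slides.

```python
# Hypothetical denormalized EMP_DEPT rows: the department name is
# repeated on every employee row of that department.
emp_dept = [
    {"Ssn": "111", "Ename": "Smith", "Dnumber": 5, "Dname": "Research"},
    {"Ssn": "222", "Ename": "Wong", "Dnumber": 5, "Dname": "Research"},
    {"Ssn": "333", "Ename": "Zelaya", "Dnumber": 4, "Dname": "Administration"},
]


def department_names(rows):
    """Map each Dnumber to the set of department names stored for it."""
    names = {}
    for row in rows:
        names.setdefault(row["Dnumber"], set()).add(row["Dname"])
    return names


# Modification anomaly: department 5 is renamed in one row only,
# so the table now stores two different names for the same department.
emp_dept[0]["Dname"] = "R&D"
inconsistent = {
    dnumber: names
    for dnumber, names in department_names(emp_dept).items()
    if len(names) > 1
}

# Deletion anomaly: deleting the only employee of department 4
# also deletes everything the table knew about that department.
remaining = [row for row in emp_dept if row["Ssn"] != "333"]
lost_departments = set(department_names(emp_dept)) - set(department_names(remaining))

print(inconsistent)
print(lost_departments)
```

An insertion anomaly shows up in the same table as the inability to record a new department until it has at least one employee, unless NULLs are placed in the employee columns.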
Problems with NULL Values

 NULL Values in Tuples: If many of the attributes do not apply to all tuples in the
relation, we end up with many NULLs in those tuples.
• This can waste space at the storage level and may also lead to problems with
understanding the meaning of the attributes and with specifying JOIN operations at
the logical level.
• Another problem with NULLs is how to account for them when aggregate operations
such as COUNT or SUM are applied. SELECT and JOIN operations involve comparisons;
if NULL values are present, the results may become unpredictable.
• Moreover, NULLs can have multiple interpretations, such as the following:
■ The attribute does not apply to this tuple.
■ The attribute value for this tuple is unknown.
■ The value is known but absent; that is, it has not been recorded yet.
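The effect of NULLs on aggregates and comparisons can be seen with a small experiment. This is a sketch only, using Python's standard `sqlite3` module with a hypothetical `employee` table; the table, column names, and values are assumptions made for illustration.

```python
import sqlite3

# In-memory database with a hypothetical EMPLOYEE table containing NULLs.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (ssn TEXT, salary INTEGER, dnumber INTEGER)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [("111", 30000, 5), ("222", None, 5), ("333", 40000, None)],
)

# COUNT(*) counts rows, while COUNT(salary) and AVG(salary) skip NULL salaries.
count_rows, count_salary, avg_salary = conn.execute(
    "SELECT COUNT(*), COUNT(salary), AVG(salary) FROM employee"
).fetchone()

# A comparison with NULL is never true, so the row with a NULL dnumber
# satisfies neither "dnumber = 5" nor "dnumber <> 5".
in_dept_5 = conn.execute(
    "SELECT COUNT(*) FROM employee WHERE dnumber = 5"
).fetchone()[0]
not_in_dept_5 = conn.execute(
    "SELECT COUNT(*) FROM employee WHERE dnumber <> 5"
).fetchone()[0]
conn.close()

print(count_rows, count_salary, avg_salary)
print(in_dept_5, not_in_dept_5)
```

The two complementary predicates together select fewer rows than the table contains, which is one way results can look surprising when NULLs are present.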
Quality of relation schema design

 Four informal guidelines that may be used as measures to determine the quality of
relation schema design:
■ Making sure that the semantics of the attributes is clear in the schema
■ Reducing the redundant information in tuples
■ Reducing the NULL values in tuples
■ Disallowing the possibility of generating spurious tuples
(These measures are not always independent of one another).
Functional Dependencies

 Suppose that:
• The relational database schema has n attributes A1, A2, … , An.
• The whole database is described by a single universal relation schema
R = {A1, A2, … , An}.
 A functional dependency, denoted by X → Y, between two sets of
attributes X and Y that are subsets of R specifies a constraint on the possible
tuples that can form a relation state r of R. The constraint is that, for any two
tuples t1 and t2 in r that have t1[X] = t2[X], they must also have t1[Y] = t2[Y].
(We also say that there is a functional dependency from X to Y, or that Y is functionally dependent on
X. The set of attributes X is called the left-hand side of the FD, and Y is called the right-hand side. A
column appearing on the left-hand side of an FD is called a determinant or, alternatively, an LHS
for left-hand side.)
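The definition translates directly into a check on a relation state: group the tuples by their X-value and verify that each group has a single Y-value. The sketch below assumes rows stored as dictionaries; the function name `fd_holds` and the sample rows are illustrative. Note that such a check can only show that an FD is violated by a particular state. It cannot prove that the FD holds on the schema, because that depends on the meaning of the attributes.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        x_value = tuple(row[a] for a in lhs)
        y_value = tuple(row[a] for a in rhs)
        # setdefault stores the first Y-value seen for this X-value;
        # any later row with the same X-value must match it.
        if seen.setdefault(x_value, y_value) != y_value:
            return False
    return True


# Hypothetical relation state for illustration.
works_on = [
    {"Ssn": "111", "Pnumber": 1, "Hours": 32.5, "Ename": "Smith"},
    {"Ssn": "111", "Pnumber": 2, "Hours": 7.5, "Ename": "Smith"},
    {"Ssn": "222", "Pnumber": 1, "Hours": 20.0, "Ename": "Wong"},
]

print(fd_holds(works_on, ["Ssn"], ["Ename"]))             # Ssn -> Ename
print(fd_holds(works_on, ["Ssn"], ["Hours"]))             # Ssn -> Hours
print(fd_holds(works_on, ["Ssn", "Pnumber"], ["Hours"]))  # {Ssn, Pnumber} -> Hours
```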
Functional Dependencies

 X functionally determines Y in a relation schema R if, and only if, whenever
two tuples of r(R) agree on their X-value, they must necessarily agree on their
Y-value.
 If X → Y holds in R, this says nothing about whether Y → X holds in R.
 A functional dependency is a property of the semantics or meaning of the
attributes. The database designers will use their understanding of the semantics of
the attributes of R—that is, how they relate to one another—to specify the
functional dependencies that should hold on all relation states (extensions,
instance) r of R.
Functional Dependencies

 Example of functional dependencies:


Ssn → Ename

Pnumber → {Pname, Plocation}

{Ssn, Pnumber} → Hours


Functional Dependencies

 Exercise: specify the functional dependency constraints on R:
R = {Ssn, Ename, Bdate, Address, Dnumber, Dname, Dlocation, Pnumber, Pname,
Plocation, Hours, Dmgr_Ssn}
 Ssn → {Ename, Bdate, Address, Dnumber}
 Dnumber → {Dname, Dmgr_Ssn}
 Dname → Dnumber
Functional Dependencies

 Exercise: specify the functional dependency constraints on R:
 {SdtNo, OfferNo} → Enrgrade
 {SdtNo, CourseNo} → Enrgrade
Definitions of Keys

 Candidate key:
If a constraint on R states that there cannot be more than one tuple with a
given X-value in any relation instance r(R)—that is, X is a candidate key of
R—this implies that X → Y for any subset of attributes Y of R (because the
key constraint implies that no two tuples in any legal state r(R) will have the
same value of X). If X is a candidate key of R, then X → R.

 Example: in the relation schema R = {Pnumber, Pname, Plocation}, the dependency
Pnumber → {Pname, Plocation} holds, so Pnumber determines every attribute of R.
 Pnumber is therefore a candidate key of R.
Definitions of Keys

 A superkey of a relation schema R = {A1, A2, … , An} is a set of attributes S ⊆ R
with the property that no two tuples t1 and t2 in any legal relation
state r of R will have t1[S] = t2[S] (or, equivalently, S → R).
 A key K is a superkey with the additional property that removal of any attribute
from K will cause K not to be a superkey anymore.
• E.g.: {Ssn} is a key for EMPLOYEE, whereas {Ssn}, {Ssn, Ename},
{Ssn, Ename, Bdate}, and any set of attributes that includes Ssn are all superkeys.
 If a relation schema has more than one key, each is called a candidate key. One of
the candidate keys is arbitrarily designated to be the primary key, and the others
are called secondary keys. In a practical relational database, each relation schema
must have a primary key. If no candidate key is known for a relation, the entire
relation can be treated as a default superkey.
 An attribute of relation schema R is called a prime attribute of R if
it is a member of some candidate key of R. An attribute is called nonprime if it
is not a prime attribute, that is, if it is not a member of any candidate key.
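The key definitions above can be tested mechanically once the FDs are known. The sketch below uses the standard attribute-closure computation (the set of all attributes determined by a given set) to test for superkeys, and a brute-force enumeration to find candidate keys. The function names and the example schema are assumptions made for illustration; the FDs are the three example dependencies listed earlier. The enumeration is exponential in the number of attributes, so it is only suitable for small teaching examples.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_superkey(attrs, relation, fds):
    """S is a superkey of R exactly when S -> R, i.e. closure(S) covers R."""
    return closure(attrs, fds) >= relation


def candidate_keys(relation, fds):
    """Enumerate minimal superkeys, smallest attribute sets first."""
    keys = []
    for size in range(1, len(relation) + 1):
        for combo in combinations(sorted(relation), size):
            attrs = set(combo)
            if any(key <= attrs for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if is_superkey(attrs, relation, fds):
                keys.append(attrs)
    return keys


# Hypothetical schema built from the example FDs in these slides.
R = {"Ssn", "Ename", "Pnumber", "Pname", "Plocation", "Hours"}
FDS = [
    ({"Ssn"}, {"Ename"}),
    ({"Pnumber"}, {"Pname", "Plocation"}),
    ({"Ssn", "Pnumber"}, {"Hours"}),
]

keys = candidate_keys(R, FDS)
prime = set().union(*keys)   # attributes that belong to some candidate key
nonprime = R - prime

print(keys)
print(sorted(nonprime))
```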
Normal Forms Based on Primary Keys

 Most practical relational design projects take one of the following two approaches:
• Perform a conceptual schema design using a conceptual model such as ER
or EER and map the conceptual design into a set of relations.
• Design the relations based on external knowledge derived from an existing
implementation of files or forms or reports.

 Normal Forms are used to evaluate the relations for goodness.


 Normalization is the process of minimizing redundancy from a relation or set of
relations by decomposing them further as needed to achieve higher normal forms.
 Normal forms:
• First, second, and third normal form (1NF, 2NF, 3NF) (proposed by Codd [1972])
• Boyce-Codd normal form (BCNF) (proposed later by Boyce and Codd)
 All of the normal forms above are based on a single analytical tool: the
functional dependencies among the attributes of a relation. Two further normal
forms are based on other kinds of dependencies:
• Fourth normal form (4NF) (based on the concept of multivalued dependencies)
• Fifth normal form (5NF) (based on the concept of join dependencies)
 Normalization of Relations:
• The normalization process takes a relation schema through a series of tests to certify
whether it satisfies a certain normal form. The process proceeds in a top-down
fashion by evaluating each relation against the criteria for normal forms and
decomposing relations as necessary.
 The normal form of a relation refers to the highest normal form
condition that it meets, and hence indicates the degree to which it has been
normalized.
(Database design as practiced in industry today pays particular attention to normalization only up to
3NF, BCNF, or at most 4NF.)
 Denormalization is the process of storing the join of higher normal form relations
as a base relation, which is in a lower normal form.
 Database designers need not normalize to the highest possible normal form.
Relations may be left in a lower normalization status, such as 2NF, for performance
reasons. Doing so incurs the corresponding penalties of dealing with the anomalies.
First Normal Form

 First normal form (1NF) is now considered to be part of the formal definition of a
relation in the basic (flat) relational model; historically, it was defined to disallow
multivalued attributes, composite attributes, and their combinations. It states that
the domain of an attribute must include only atomic (simple, indivisible) values
and that the value of any attribute in a tuple must be a single value from the
domain of that attribute. Hence, 1NF disallows having a set of values, a tuple of
values, or a combination of both as an attribute value for a single tuple. In other
words, 1NF disallows relations within relations or relations as attribute values
within tuples. The only attribute values permitted by 1NF are single atomic (or
indivisible) values.
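One common way to bring a relation into 1NF is to move a multivalued attribute into its own relation. The sketch below assumes a hypothetical DEPARTMENT table whose Dlocations attribute holds a list of values; the data and the helper `is_atomic` are illustrative, and the atomicity test is a simplification that only looks for Python container types.

```python
# Hypothetical DEPARTMENT rows: Dlocations holds a set of values,
# so the relation is not in 1NF.
department = [
    {"Dnumber": 5, "Dname": "Research",
     "Dlocations": ["Bellaire", "Sugarland", "Houston"]},
    {"Dnumber": 4, "Dname": "Administration", "Dlocations": ["Stafford"]},
]


def is_atomic(rows):
    """Simplified 1NF test: no attribute value is a container."""
    return all(
        not isinstance(value, (list, set, tuple, dict))
        for row in rows
        for value in row.values()
    )


# Keep the single-valued attributes in one relation...
dept = [{"Dnumber": d["Dnumber"], "Dname": d["Dname"]} for d in department]

# ...and store one (Dnumber, Dlocation) tuple per location in another.
dept_locations = [
    {"Dnumber": d["Dnumber"], "Dlocation": location}
    for d in department
    for location in d["Dlocations"]
]

print(is_atomic(department), is_atomic(dept), is_atomic(dept_locations))
print(len(dept_locations))
```

In the new DEPT_LOCATIONS relation the combination {Dnumber, Dlocation} serves as the key, and Dnumber refers back to the department.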
Second Normal Form

 Second normal form (2NF) is based on the concept of full functional dependency.
 A functional dependency X → Y is a full functional dependency if removal of any
attribute A from X means that the dependency does not hold anymore; that is, for
any attribute A ∈ X, (X - {A}) does not functionally determine Y. A functional
dependency X → Y is a partial dependency if some attribute A ∈ X can be removed
from X and the dependency still holds; that is, for some A ∈ X, (X - {A}) → Y.
 {Ssn, Pnumber} → Hours is a full dependency (neither Ssn → Hours
nor Pnumber → Hours holds).
 {Ssn, Pnumber} → Ename is partial because Ssn → Ename holds.
 A relation schema R is in 2NF if every nonprime attribute A in R is
fully functionally dependent on the primary key of R.
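A 2NF test with respect to the primary key can be sketched as a search for partial dependencies: for every proper, non-empty subset of the key, compute its closure and report any nonprime attribute it determines. The code below is a sketch under the assumption that the given primary key is the only candidate key; the function names and the EMP_PROJ-style schema are illustrative.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partial_dependencies(key, relation, fds):
    """Map each proper subset of the key to the nonprime attributes it determines."""
    nonprime = relation - key  # assumes `key` is the only candidate key
    found = {}
    for size in range(1, len(key)):
        for subset in combinations(sorted(key), size):
            determined = closure(subset, fds) & nonprime
            if determined:
                found[subset] = determined
    return found


# Hypothetical EMP_PROJ schema using the example FDs from these slides.
R = {"Ssn", "Pnumber", "Hours", "Ename", "Pname", "Plocation"}
KEY = {"Ssn", "Pnumber"}
FDS = [
    ({"Ssn"}, {"Ename"}),
    ({"Pnumber"}, {"Pname", "Plocation"}),
    ({"Ssn", "Pnumber"}, {"Hours"}),
]

violations = partial_dependencies(KEY, R, FDS)
in_2nf = not violations
print(violations, in_2nf)
```

Each reported subset suggests one relation of a 2NF decomposition: the subset becomes the key of a new relation that holds the attributes it determines, while Hours stays with the full key.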
Third Normal Form

 Third normal form (3NF) is based on the concept of transitive dependency.


 A functional dependency X → Y in a relation schema R is a transitive dependency
if there exists a set of attributes Z in R that is neither a candidate key nor a subset
of any key of R and both X → Z and Z → Y hold.
• E.g. The dependency Ssn → Dmgr_ssn is transitive through Dnumber in EMP_DEPT, because
both the dependencies Ssn → Dnumber and Dnumber → Dmgr_ssn hold and Dnumber is
neither a key itself nor a subset of the key of EMP_DEPT.

 Definition. According to Codd’s original definition, a relation schema R is in
3NF if it satisfies 2NF and no nonprime attribute of R is transitively dependent
on the primary key.
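The EMP_DEPT example can be checked with a small search for transitive dependencies. The sketch below is a simplification: it only considers the left-hand sides of the listed FDs as candidates for the intermediate set Z, and it assumes the given primary key is the only candidate key. The function names and the exact attribute list of EMP_DEPT are assumptions made for illustration.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def transitive_violations(key, relation, fds):
    """Find determinants Z (not a superkey, not part of the key) and the
    nonprime attributes that depend on the key only through Z."""
    violations = {}
    for lhs, _ in fds:
        z_closure = closure(lhs, fds)
        if z_closure >= relation or lhs <= key:
            continue  # Z is a superkey or a subset of the key
        dependents = (z_closure - lhs) - key
        if dependents:
            violations[frozenset(lhs)] = dependents
    return violations


# Hypothetical EMP_DEPT schema following the example in the slides.
EMP_DEPT = {"Ssn", "Ename", "Bdate", "Address", "Dnumber", "Dname", "Dmgr_ssn"}
KEY = {"Ssn"}
FDS = [
    ({"Ssn"}, {"Ename", "Bdate", "Address", "Dnumber"}),
    ({"Dnumber"}, {"Dname", "Dmgr_ssn"}),
]

violations = transitive_violations(KEY, EMP_DEPT, FDS)
print(violations)
```

A typical 3NF decomposition then moves Dnumber, Dname, and Dmgr_ssn into a separate department relation keyed by Dnumber and leaves Dnumber in the employee relation as a foreign key.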
Boyce-Codd Normal Form

 Boyce-Codd normal form (BCNF) was proposed as a simpler form of 3NF, but it
was found to be stricter than 3NF. That is, every relation in BCNF is also in 3NF;
however, a relation in 3NF is not necessarily in BCNF. The general definition of
3NF allows a nontrivial functional dependency X → A if either (a) X is a superkey
of R or (b) A is a prime attribute of R. BCNF disallows dependencies that satisfy
only clause (b) and hence is a stricter definition of a normal form.
 A relation schema R is in BCNF if whenever a nontrivial functional
dependency X → A holds in R, then X is a superkey of R.
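The BCNF condition is straightforward to test on a list of FDs: every nontrivial dependency must have a superkey on its left-hand side. The sketch below checks the listed FDs only. The TEACH relation used as input is a commonly cited textbook example of a relation that is in 3NF but not in BCNF; it is not taken from these slides, and the function names are illustrative.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(relation, fds):
    """Return the listed nontrivial FDs whose left-hand side is not a superkey."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not rhs <= lhs and not closure(lhs, fds) >= relation
    ]


# Assumed example: each instructor teaches one course, and a student
# has one instructor per course.
TEACH = {"Student", "Course", "Instructor"}
FDS = [
    ({"Student", "Course"}, {"Instructor"}),
    ({"Instructor"}, {"Course"}),
]

violations = bcnf_violations(TEACH, FDS)
print(violations)
```

Instructor → Course is reported because Instructor is not a superkey. The dependency is still acceptable under 3NF, since Course is a prime attribute (it belongs to the candidate key {Student, Course}).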
M-Way Relationships

 M-way relationships are represented by associative entity types in the Crow’s Foot
ERD notation.
 In the conversion process, an associative entity type converts into a table with a
combined primary key consisting of three or more components.
Practical Concerns about Normalization

 Role of Normalization in the Database Development Process


• Normalization can be used as either a refinement tool or initial design tool in the
database development process.
• In the refinement approach:
 Perform conceptual data modeling using the Entity Relationship Model.
 Transform the ERD into tables using the conversion rules.
 Then apply normalization techniques to analyze each table:
identify FDs, use the simple synthesis procedure to remove redundancies, and analyze a
table for independence if the table represents an M-way relationship. Since the primary
key determines the other columns in a table, you only need to identify FDs in which the
primary key is not the LHS.


• In the initial design approach, you use normalization techniques in conceptual data
modeling. Instead of drawing an ERD, you identify functional dependencies and apply
a normalization procedure like the simple synthesis procedure.
After defining the tables, you identify the referential integrity constraints and
construct a relational model diagram.
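The slides do not spell out the steps of the simple synthesis procedure, so the following is only a sketch of its core idea as it is usually described: group the FDs by determinant and create one table per group, with the determinant as primary key. It assumes the FD list has already been cleaned, that is, it contains no extraneous left-hand-side columns and no derived FDs. The function name `synthesize` and the output layout are hypothetical.

```python
def synthesize(fds):
    """Group FDs by determinant; each group becomes one table whose
    primary key is the determinant (assumes an already minimal FD list)."""
    groups = {}
    for lhs, rhs in fds:
        groups.setdefault(frozenset(lhs), set()).update(rhs)
    return [
        {"primary_key": set(lhs), "columns": set(lhs) | rhs}
        for lhs, rhs in groups.items()
    ]


# FDs taken from the examples and the exercise earlier in these slides.
FDS = [
    ({"Ssn"}, {"Ename", "Bdate", "Address"}),
    ({"Ssn"}, {"Dnumber"}),
    ({"Dnumber"}, {"Dname", "Dmgr_Ssn"}),
    ({"Pnumber"}, {"Pname", "Plocation"}),
    ({"Ssn", "Pnumber"}, {"Hours"}),
]

tables = synthesize(FDS)
for table in tables:
    print(sorted(table["primary_key"]), sorted(table["columns"]))
```

After the tables are defined, columns that appear as the primary key of another table (for example Dnumber in the employee table) are the natural candidates for referential integrity constraints.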

 Advantages of Normalization as a Refinement Tool: use normalization to remove
redundancies after conversion from an ERD to a table design rather than as an
initial design tool.
• It is easier to translate requirements into an ERD than into lists of FDs.
• There are fewer FDs to specify because most FDs are derived from primary keys.
• There are fewer tables to split because normalization is performed intuitively during ERD
development.
• It is easier to identify relationships, especially M-N relationships without attributes.
Relational Database Design Methods

 A bottom-up design methodology (also called design by synthesis) considers the basic
relationships among individual attributes as the starting point and uses those to construct relation
schemas. This approach is not very popular in practice because it suffers from the problem of
having to collect a large number of binary relationships among attributes as the starting point. For
practical situations, it is next to impossible to capture binary relationships among all such
pairs of attributes.
 A top-down design methodology (also called design by analysis) starts with a number of groupings
of attributes into relations that exist together naturally, for example, on an invoice, a form, or a
report. The relations are then analyzed individually and collectively, leading to further
decomposition until all desirable properties are met.
Informal Design Guidelines for Relation Schemas

 The implicit goals of the design activity are information preservation and minimum
redundancy.
 Information is very hard to quantify—hence we consider information preservation
in terms of maintaining all concepts, including attribute types, entity types, and
relationship types as well as generalization/specialization relationships, which are
described using a model such as the EER model. Thus, the relational design must
preserve all of these concepts, which are originally captured in the conceptual
design after the conceptual to logical design mapping.
 Minimizing redundancy implies minimizing redundant storage of the same
information and reducing the need for multiple updates to maintain consistency
across multiple copies of the same information in response to real-world events
that require making an update.
Informal Design Guidelines for Relation Schemas

 Guideline 1. Design a relation schema so that it is easy to explain its meaning. Do
not combine attributes from multiple entity types and relationship types into a
single relation. Intuitively, if a relation schema corresponds to one entity type or
one relationship type, it is straightforward to explain its meaning. Otherwise, if the
relation corresponds to a mixture of multiple entities and relationships, semantic
ambiguities will result and the relation cannot be easily explained.
 Guideline 2. Design the base relation schemas so that no insertion, deletion, or
modification anomalies are present in the relations. If any anomalies are present,
note them clearly and make sure that the programs that update the database will
operate correctly.
Informal Design Guidelines for Relation Schemas

 Guideline 3. As far as possible, avoid placing attributes in a base relation whose
values may frequently be NULL. If NULLs are unavoidable, make sure that they
apply in exceptional cases only and do not apply to a majority of tuples in the
relation.
 Guideline 4. Design relation schemas so that they can be joined with equality
conditions on attributes that are appropriately related (primary key, foreign key)
pairs in a way that guarantees that no spurious tuples are generated. Avoid
relations that contain matching attributes that are not (foreign key, primary key)
combinations because joining on such attributes may produce spurious tuples.
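Guideline 4 can be illustrated by decomposing a small relation in two ways and joining the pieces back together. The sketch below assumes rows stored as dictionaries; the helpers `project` and `natural_join` and the sample data are illustrative. Joining on Plocation, which is neither a primary key nor a foreign key in the pieces, yields tuples that were never in the original relation, while joining on Pnumber, which is a key of one piece, reproduces the original exactly.

```python
def project(rows, attrs):
    """Project rows onto attrs, removing duplicate tuples."""
    unique = {tuple(row[a] for a in attrs) for row in rows}
    return [dict(zip(attrs, values)) for values in sorted(unique)]


def natural_join(left, right):
    """Join two lists of rows on equality of all shared attribute names."""
    joined = []
    for l_row in left:
        for r_row in right:
            shared = l_row.keys() & r_row.keys()
            if all(l_row[a] == r_row[a] for a in shared):
                joined.append({**l_row, **r_row})
    return joined


# Hypothetical relation: which employee works on which project, and where.
works_on = [
    {"Ssn": "111", "Pnumber": 1, "Plocation": "Houston"},
    {"Ssn": "222", "Pnumber": 2, "Plocation": "Houston"},
]

# Poor decomposition: the pieces share only Plocation, which is not a key.
bad_join = natural_join(
    project(works_on, ["Ssn", "Plocation"]),
    project(works_on, ["Pnumber", "Plocation"]),
)
spurious = [row for row in bad_join if row not in works_on]

# Better decomposition: the pieces share Pnumber, the key of the second piece.
good_join = natural_join(
    project(works_on, ["Ssn", "Pnumber"]),
    project(works_on, ["Pnumber", "Plocation"]),
)

print(len(bad_join), len(spurious), len(good_join))
```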