DBMS - Unit - 3 - Chapter - 2 - Relational Database Design
• Atomic Domains and First Normal Form
• Functional-Dependency Theory
• Decomposition Using Multivalued Dependencies
• Database-Design Process
• Functional dependencies help remove data redundancy, where the same values are repeated at multiple
locations in the same database table.
• The process of Normalization starts with identifying the candidate keys in the relation. Without functional
dependency, it's impossible to find candidate keys and normalize the database.
Inference rules
• An inference rule is a type of assertion. It can be applied to a set of FDs (functional dependencies) to derive
other FDs.
• Using the inference rules, we can derive additional functional dependencies from the initial set.
• Reflexive Rule (IR1)
• In the reflexive rule, if Y is a subset of X, then X determines Y.
• If X ⊇ Y then X → Y
• Example:
• X = {a, b, c, d, e}
• Y = {a, b, c}
• Since Y ⊆ X, X → Y.
• Augmentation Rule (IR2)
• In augmentation, if X determines Y, then XZ determines YZ for any Z.
• If X → Y then XZ → YZ
• Example:
• For R(ABCD), if A → B then AC → BC
• Transitive Rule (IR3)
• In the transitive rule, if X determines Y and Y determines Z, then X must also determine Z.
• If X → Y and Y → Z then X → Z
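The three rules can be sketched in Python as small functions on FDs. This is a minimal illustration, assuming an FD X → Y is stored as a pair of frozensets of attribute names; the helper names fd, reflexive, augment and transitive are hypothetical.

```python
# Sketch: an FD X -> Y is stored as a pair (X, Y) of frozensets of attribute names.
FD = tuple[frozenset, frozenset]

def fd(lhs: str, rhs: str) -> FD:
    """Build an FD from single-letter attribute names, e.g. fd("A", "BC") means A -> BC."""
    return frozenset(lhs), frozenset(rhs)

def reflexive(x: frozenset, y: frozenset) -> FD:
    """IR1: if Y is a subset of X, then X -> Y."""
    if not y <= x:
        raise ValueError("reflexivity needs Y to be a subset of X")
    return x, y

def augment(f: FD, z: frozenset) -> FD:
    """IR2: from X -> Y derive XZ -> YZ."""
    x, y = f
    return x | z, y | z

def transitive(f: FD, g: FD) -> FD:
    """IR3: from X -> Y and Y -> Z derive X -> Z."""
    (x, y), (y2, z) = f, g
    if y != y2:
        raise ValueError("the right side of f must equal the left side of g")
    return x, z

print(reflexive(frozenset("abcde"), frozenset("abc")) == (frozenset("abcde"), frozenset("abc")))  # True
print(augment(fd("A", "B"), frozenset("C")) == fd("AC", "BC"))  # True
print(transitive(fd("A", "B"), fd("B", "H")) == fd("A", "H"))  # True
```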
• When a relation holds redundant data, it is not easy to maintain and update the data, as an update may involve
searching many records in the relation.
• To handle these problems, we should analyze the relations with redundant data and decompose them into
smaller, simpler, and well-structured relations that satisfy desirable properties. Normalization is the process of
decomposing relations into relations with fewer attributes.
• What is Normalization?
• Normalization is the process of organizing the data in the database.
• Normalization is used to minimize redundancy in a relation or set of relations. It is also used to eliminate
undesirable characteristics like insertion, update, and deletion anomalies.
• Normalization divides larger tables into smaller ones and links them using relationships.
• Normal forms are used to reduce redundancy in database tables.
• Insertion Anomaly: an insertion anomaly occurs when a new tuple cannot be inserted into a relation due to
lack of data.
• Deletion Anomaly: a deletion anomaly occurs when deleting some data results in the unintended loss of
other important data.
• Update Anomaly: an update anomaly occurs when updating a single data value requires multiple rows of
data to be updated.
Types of Normal Forms:
• Normalization works through a series of stages called normal forms. The normal forms apply to individual
relations. A relation is said to be in a particular normal form if it satisfies that normal form's constraints.
First Normal Form (1NF)
• A relation is in 1NF if every attribute contains only atomic values.
• An attribute of a table cannot hold multiple values; it must hold only a single value.
• First normal form disallows multi-valued attributes, composite attributes, and their combinations.
• Example: Relation STUDENT is not in 1NF because of multi-valued attribute STUD_PHONE.
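Since the STUDENT table itself is not reproduced here, the sketch below uses hypothetical rows in which STUD_PHONE holds a list. It shows one common way to reach 1NF: emit one row per phone number so that every field holds a single atomic value. The names to_1nf and is_atomic are illustrative, and treating lists, tuples and sets as the only non-atomic values is a simplifying assumption.

```python
# Hypothetical STUDENT rows: STUD_PHONE is multi-valued, so the relation is not in 1NF.
students = [
    {"STUD_NO": 1, "STUD_NAME": "Ram", "STUD_PHONE": ["555-0101", "555-0102"]},
    {"STUD_NO": 2, "STUD_NAME": "Suresh", "STUD_PHONE": ["555-0199"]},
]

def to_1nf(rows, multi_attr):
    """Emit one row per value of the multi-valued attribute, so every field is atomic."""
    flat = []
    for row in rows:
        for value in row[multi_attr]:
            new_row = dict(row)
            new_row[multi_attr] = value
            flat.append(new_row)
    return flat

def is_atomic(rows):
    """Simplifying assumption: lists, tuples and sets are the only non-atomic values."""
    return all(not isinstance(v, (list, tuple, set)) for row in rows for v in row.values())

flat = to_1nf(students, "STUD_PHONE")
print(is_atomic(students), is_atomic(flat))  # False True
for row in flat:
    print(row)
```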
Second Normal Form (2NF)
• For a relation to be in 2NF, it must be in 1NF.
• In the second normal form, all non-key attributes are fully functionally dependent on the primary key.
• In a table, if attribute B is functionally dependent on A, but is not functionally dependent on any proper subset
of A, then B is fully functionally dependent on A. Hence, in a 2NF table, no non-key attribute can be dependent
on a proper subset of the primary key. Note that if the primary key is not a composite key, all non-key
attributes are always fully functionally dependent on it: a table that is in 1NF and has a single-attribute
primary key is automatically in 2NF.
• This table has a composite primary key [Customer ID, Store ID]. The non-key attribute is [Purchase
Location]. In this case, [Purchase Location] only depends on [Store ID], which is only part of the primary key.
Therefore, this table does not satisfy second normal form.
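The 2NF check can be sketched with attribute closure: a proper subset of the primary key that determines a non-key attribute signals a partial dependency. The sketch below assumes the only FD given is [Store ID] → [Purchase Location], as stated above; closure and partial_dependencies are hypothetical helper names.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure: every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def partial_dependencies(key, non_key, fds):
    """Return (proper subset of the key, non-key attribute) pairs that violate 2NF."""
    found = []
    for size in range(1, len(key)):
        for subset in combinations(sorted(key), size):
            determined = closure(subset, fds)
            for attr in sorted(non_key):
                if attr in determined:
                    found.append((frozenset(subset), attr))
    return found

key = {"Customer ID", "Store ID"}
non_key = {"Purchase Location"}
fds = [(frozenset({"Store ID"}), frozenset({"Purchase Location"}))]
print(partial_dependencies(key, non_key, fds))
# [(frozenset({'Store ID'}), 'Purchase Location')]
```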
Third Normal Form (3NF)
• A relation is in third normal form if it is in second normal form and there is no transitive dependency for
non-prime attributes.
• Transitive dependency: if A → B and B → C are two FDs, then A → C is called a transitive dependency.
• A relation is in 3NF if at least one of the following conditions holds for every non-trivial functional
dependency X → Y:
• X is a super key.
• Y is a prime attribute (each element of Y is part of some candidate key).
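The two 3NF conditions can be checked mechanically, as in the sketch below. It assumes a small hypothetical relation R(A, B, C) with A → B and B → C (a transitive dependency), finds candidate keys by brute force, and tests only the FDs in the given set; it is an illustration, not an optimized algorithm. The helper names are hypothetical.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure: every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(schema, fds):
    """Brute force: minimal attribute sets whose closure covers the whole schema."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            combo = frozenset(combo)
            if closure(combo, fds) >= schema and not any(k <= combo for k in keys):
                keys.append(combo)
    return keys

def violates_3nf(schema, fds):
    """Return the given FDs X -> Y where X is not a superkey and Y has a non-prime attribute."""
    prime = set().union(*candidate_keys(schema, fds))
    bad = []
    for lhs, rhs in fds:
        extra = rhs - lhs  # drop the trivial part of the FD
        if not extra:
            continue
        is_superkey = closure(lhs, fds) >= schema
        if not is_superkey and not extra <= prime:
            bad.append((lhs, rhs))
    return bad

A, B, C = frozenset("A"), frozenset("B"), frozenset("C")
fds = [(A, B), (B, C)]  # A -> B and B -> C, so A -> C is transitive
print(violates_3nf(frozenset("ABC"), fds))  # [(frozenset({'B'}), frozenset({'C'}))]
print(violates_3nf(frozenset("AB"), [(A, B)]) == [])  # True
```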
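Boyce-Codd Normal Form (BCNF)
• A relation is in BCNF if, for every non-trivial functional dependency X → Y, X is a superkey. BCNF is
stricter than 3NF: the "Y is a prime attribute" exception is not allowed.
• Example: Let's assume there is a company where employees work in more than one department.
EMP_ID  EMP_COUNTRY  EMP_DEPT    DEPT_TYPE  EMP_DEPT_NO
264     India        Designing   D394       283
264     India        Testing     D394       300
364     UK           Stores      D283       232
364     UK           Developing  D283       549
• In this table, the functional dependencies are as follows:
• EMP_ID → EMP_COUNTRY
• EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}
• Candidate key: {EMP_ID, EMP_DEPT}
• The table is not in BCNF because neither EMP_DEPT nor EMP_ID alone is a key.
• To convert the given table into BCNF, we decompose it into three tables:
(EMP_ID, EMP_COUNTRY):
EMP_ID  EMP_COUNTRY
264     India
364     UK
(EMP_DEPT, DEPT_TYPE, EMP_DEPT_NO):
EMP_DEPT    DEPT_TYPE  EMP_DEPT_NO
Designing   D394       283
Testing     D394       300
Stores      D283       232
Developing  D283       549
(EMP_ID, EMP_DEPT):
EMP_ID  EMP_DEPT
264     Designing
264     Testing
364     Stores
364     Developing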
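The BCNF decomposition above can be reproduced with the textbook subset test: for every attribute set α inside a schema Ri, α+ must contain either none of the other attributes of Ri or all of them. The sketch below is a minimal, exponential-time version assuming only the two FDs listed for the EMP table; bcnf_violation and bcnf_decompose are hypothetical names.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure: every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violation(ri, fds):
    """Return (alpha, beta) if some alpha in ri determines part, but not all, of ri."""
    for size in range(1, len(ri)):
        for combo in combinations(sorted(ri), size):
            alpha = frozenset(combo)
            alpha_plus = closure(alpha, fds)
            beta = (alpha_plus & ri) - alpha
            if beta and not alpha_plus >= ri:
                return alpha, beta
    return None

def bcnf_decompose(schema, fds):
    """Split schemas on BCNF violations until none remain."""
    result = [frozenset(schema)]
    while True:
        for i, ri in enumerate(result):
            violation = bcnf_violation(ri, fds)
            if violation:
                alpha, beta = violation
                result[i:i + 1] = [alpha | beta, ri - beta]
                break
        else:
            return result

schema = {"EMP_ID", "EMP_COUNTRY", "EMP_DEPT", "DEPT_TYPE", "EMP_DEPT_NO"}
fds = [
    (frozenset({"EMP_ID"}), frozenset({"EMP_COUNTRY"})),
    (frozenset({"EMP_DEPT"}), frozenset({"DEPT_TYPE", "EMP_DEPT_NO"})),
]
for ri in bcnf_decompose(schema, fds):
    print(sorted(ri))
```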
Multivalued Dependency
• A multivalued dependency occurs when two attributes in a table are independent of each other but both
depend on a third attribute.
• A multivalued dependency therefore involves at least two attributes that depend on a third attribute, which is
why it always requires at least three attributes.
• Example: Suppose there is a bike manufacturing company that produces two colors (white and black) of each
model every year.
• Here, the columns COLOR and MANUF_YEAR are dependent on BIKE_MODEL and independent of each
other.
• In this case, these two columns are said to be multivalued dependent on BIKE_MODEL.
The representation of these dependencies is shown below:
• BIKE_MODEL →→ MANUF_YEAR
• BIKE_MODEL →→ COLOR
• This can be read as "BIKE_MODEL multidetermines MANUF_YEAR" and "BIKE_MODEL multidetermines
COLOR".
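The definition can be tested on a table instance: whenever two tuples agree on BIKE_MODEL, swapping their MANUF_YEAR values must produce tuples that are already present. The bike rows below are hypothetical (the original table is not reproduced), and mvd_holds is an illustrative helper.

```python
def mvd_holds(rows, x, y):
    """Check X ->> Y on an instance of dict rows that all share the same keys."""
    attrs = list(rows[0])
    present = {tuple(row[a] for a in attrs) for row in rows}
    for t1 in rows:
        for t2 in rows:
            if all(t1[a] == t2[a] for a in x):
                # t3 agrees with t1 on X and Y, and with t2 on every other attribute.
                t3 = dict(t2)
                t3.update({a: t1[a] for a in y})
                if tuple(t3[a] for a in attrs) not in present:
                    return False
    return True

# Hypothetical data: every year of model M2011 comes in both colors.
bikes = [
    {"BIKE_MODEL": "M2011", "MANUF_YEAR": 2008, "COLOR": "White"},
    {"BIKE_MODEL": "M2011", "MANUF_YEAR": 2008, "COLOR": "Black"},
    {"BIKE_MODEL": "M2011", "MANUF_YEAR": 2009, "COLOR": "White"},
    {"BIKE_MODEL": "M2011", "MANUF_YEAR": 2009, "COLOR": "Black"},
]
print(mvd_holds(bikes, ["BIKE_MODEL"], ["MANUF_YEAR"]))       # True
print(mvd_holds(bikes[:-1], ["BIKE_MODEL"], ["MANUF_YEAR"]))  # False: (2009, Black) missing
```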
Fourth Normal Form (4NF)
• A relation is in the fourth normal form when it satisfies the following conditions:
• It must be in BCNF.
• It must have no non-trivial multivalued dependency X →→ Y unless X is a superkey.
• Example: a relation with attributes Student-ID, Course and Hobby and the MVDs {Student-ID →→ Course,
Student-ID →→ Hobby} is not in 4NF.
• After decomposing it into (Student-ID, Course) and (Student-ID, Hobby), the relations are in 4NF. A relation can
contain a functional dependency along with a multivalued dependency. In such a case, the columns that are
functionally dependent are moved to one table and the columns that are multivalued dependent are moved to a
separate table. This converts the relation into 4NF.
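The decomposition can be sanity-checked on data: joining the two projections back on Student-ID should reproduce the original table exactly. The sketch assumes a hypothetical STUDENT instance with attributes (Student-ID, Course, Hobby); the data and the join_on_student helper are illustrative only.

```python
# Hypothetical STUDENT instance: (Student-ID, Course, Hobby).
student = {
    (21, "Computer", "Dancing"), (21, "Computer", "Singing"),
    (21, "Math", "Dancing"), (21, "Math", "Singing"),
    (34, "Chemistry", "Dancing"),
}
# 4NF decomposition: one table per independent multivalued fact.
course = {(sid, c) for sid, c, _ in student}
hobby = {(sid, h) for sid, _, h in student}

def join_on_student(left, right):
    """Natural join of (Student-ID, Course) and (Student-ID, Hobby) on Student-ID."""
    return {(s1, c, h) for s1, c in left for s2, h in right if s1 == s2}

print(join_on_student(course, hobby) == student)  # True: the decomposition is lossless here
print(len(student), len(course), len(hobby))      # 5 3 3
```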
Join Dependency
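• A relation R has a join dependency if it can be recreated by joining multiple tables, each of which has a subset
of the attributes of R. Join dependency is a generalization of multivalued dependency.
• Example: for a relation with attributes EmpName, EmpSkills and EmpJob, the join dependency
{(EmpName, EmpSkills), (EmpName, EmpJob), (EmpSkills, EmpJob)} holds if the relation equals the natural join of
its projections onto these three attribute sets.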
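A join dependency can be checked on an instance by projecting onto each attribute set and natural-joining the projections. The sketch below uses two hypothetical employee instances, one that satisfies the join dependency and one whose join produces a spurious tuple; the helper names are illustrative.

```python
from itertools import product

def freeze(row):
    """Hashable, order-independent form of a row (a dict)."""
    return tuple(sorted(row.items()))

def project(rows, attrs):
    return {freeze({a: row[a] for a in attrs}) for row in rows}

def natural_join(r1, r2):
    """Combine tuples that agree on every attribute the two relations share."""
    out = set()
    for t1, t2 in product(r1, r2):
        d1, d2 = dict(t1), dict(t2)
        if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
            out.add(freeze({**d1, **d2}))
    return out

def satisfies_jd(rows, components):
    """True if rows equal the natural join of their projections onto components."""
    joined = project(rows, components[0])
    for attrs in components[1:]:
        joined = natural_join(joined, project(rows, attrs))
    return joined == {freeze(row) for row in rows}

components = [("EmpName", "EmpSkills"), ("EmpName", "EmpJob"), ("EmpSkills", "EmpJob")]
ok_rows = [
    {"EmpName": "Tom", "EmpSkills": "Networking", "EmpJob": "EJ001"},
    {"EmpName": "Harry", "EmpSkills": "Web Development", "EmpJob": "EJ002"},
    {"EmpName": "Katie", "EmpSkills": "Programming", "EmpJob": "EJ002"},
]
bad_rows = [
    {"EmpName": "Smith", "EmpSkills": "Java", "EmpJob": "Developer"},
    {"EmpName": "Smith", "EmpSkills": "SQL", "EmpJob": "DBA"},
    {"EmpName": "Jones", "EmpSkills": "Java", "EmpJob": "DBA"},
]
print(satisfies_jd(ok_rows, components))   # True
print(satisfies_jd(bad_rows, components))  # False: (Smith, Java, DBA) is spurious
```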
• Disadvantages of Normalization
• You cannot start building the database before knowing what the user needs.
• Performance can degrade when relations are normalized to higher normal forms such as 4NF and 5NF, because
queries need more joins.
• It is very time-consuming and difficult to normalize relations of a higher degree.
• Careless decomposition may lead to a bad database design, leading to serious problems.
Decomposition Using Functional Dependencies
Keys and functional dependencies
• A database models a set of entities and relationships in the real world. There are usually a variety of
constraints (rules) on the data in the real world.
• For example, some of the constraints that are expected to hold in a university database are:
• 1. Students and instructors are uniquely identified by their ID.
• 2. Each student and instructor has only one name.
• 3. Each instructor and student is (primarily) associated with only one department.
• 4. Each department has only one value for its budget, and only one associated building.
An instance of a relation that satisfies all such real-world constraints is called a legal instance of the relation; a
legal instance of a database is one where all the relation instances are legal instances.
Some of the most commonly used types of real-world constraints can be represented formally as keys
(superkeys, candidate keys and primary keys), or as functional dependencies. The FDs that hold on each relation
therefore have to be identified.
• We shall use functional dependencies in two ways:
• 1. To test instances of relations to see whether they satisfy a given set F of functional dependencies.
• 2. To specify constraints on the set of legal relations.
• Third NF (refer to the previous slides)
• Dependency preservation: if we decompose a relation R into relations R1 and R2, every dependency of R must
either be part of R1 or R2, or be derivable from the combination of the FDs of R1 and R2.
For example, a relation R(A, B, C, D) with FD set {A → BC} decomposed into R1(ABC) and R2(AD) is
dependency preserving, because the FD A → BC is part of R1(ABC).
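Dependency preservation can be tested without computing F+, using the textbook procedure that repeatedly grows an attribute set using only the attributes of each Ri. The sketch assumes FDs are pairs of frozensets and reuses the R(A, B, C, D) example; the second call uses a hypothetical R(A, B, C) with {A → B, B → C} decomposed into AB and AC, which loses B → C. The function names are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure: every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def preserves_dependencies(fds, decomposition):
    """Check each FD alpha -> beta using only closures restricted to each Ri."""
    for lhs, rhs in fds:
        result = set(lhs)
        changed = True
        while changed:
            changed = False
            for ri in decomposition:
                t = closure(result & ri, fds) & ri
                if not t <= result:
                    result |= t
                    changed = True
        if not rhs <= result:
            return False
    return True

fds = [(frozenset("A"), frozenset("BC"))]
print(preserves_dependencies(fds, [frozenset("ABC"), frozenset("AD")]))  # True
chain = [(frozenset("A"), frozenset("B")), (frozenset("B"), frozenset("C"))]
print(preserves_dependencies(chain, [frozenset("AB"), frozenset("AC")]))  # False
```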
• Canonical Cover
• Lossless Decomposition
• Dependency preservation
Functional Dependency Theory
• Closure of a Set of Functional Dependencies
• Suppose we are given a relation schema r(A, B, C, G, H, I) and the set of functional dependencies:
• A→B
• A→C
• CG→H
• CG→I
• B→H
• The functional dependency A → H is logically implied: since A → B and B → H, transitivity gives A → H.
• Let F be a set of functional dependencies. The closure of F, denoted by F+, is the set of all functional
dependencies logically implied by F. Given F, we can compute F+ directly from the formal definition of
functional dependency. If F were large, this process would be lengthy and difficult. Such a computation of F+
requires arguments of the type just used to show that A→H is in the closure of our example set of
dependencies.
• Axioms, or rules of inference, provide a simpler technique for reasoning about functional dependencies.
• We use Greek letters (α, β, γ, ...) for sets of attributes, and uppercase Roman letters from the beginning of the
alphabet for individual attributes. We use αβ to denote α ∪ β.
• By applying these rules (reflexivity, augmentation and transitivity, IR1 to IR3 above) repeatedly, we can find
all of F+, given F. This collection of rules is called Armstrong’s axioms in honor of the person who first
proposed it.
• Armstrong’s axioms are sound, because they do not generate any incorrect functional dependencies.
• They are complete, because, for a given set F of functional dependencies, they allow us to generate all F+.
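Let us apply our rules to the example of schema R = (A, B, C, G, H, I) and the set F of functional dependencies
{A → B, A → C, CG → H, CG → I, B → H}. We list several members of F+ here:
• A → H: since A → B and B → H, transitivity gives A → H.
• CG → HI: since CG → H and CG → I, the union rule (derivable from Armstrong's axioms) gives CG → HI.
• AG → I: augmenting A → C with G gives AG → CG; with CG → I, transitivity gives AG → I.
• The closure α+ of a set of attributes α is the set of attributes functionally determined by α under F. To compute
it, start with result = α; for every FD β → γ in F with β ⊆ result, set result := result ∪ γ; repeat until result
stops changing.
• We shall use this algorithm to compute (AG)+ with the functional dependencies above, starting with
result = AG.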
• A → B causes us to include B in result. To see this fact, we observe that
• A→ B is in F, A⊆ result (which is AG), so result := result ∪B.
• A→C causes result to become ABCG.
• CG→H causes result to become ABCGH.
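• CG→I causes result to become ABCGHI.
• No FD in F adds anything further, so (AG)+ = ABCGHI. Since this contains every attribute of R, AG is a
superkey of R.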
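The attribute-closure steps above translate directly into code. This sketch reproduces the (AG)+ computation for the example set F, using a hypothetical closure helper; checking whether H ∈ A+ also confirms that A → H is in F+.

```python
def closure(attrs, fds):
    """Attribute closure: repeatedly add rhs whenever lhs is already in the result."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

F = [
    (frozenset("A"), frozenset("B")),
    (frozenset("A"), frozenset("C")),
    (frozenset("CG"), frozenset("H")),
    (frozenset("CG"), frozenset("I")),
    (frozenset("B"), frozenset("H")),
]
print("".join(sorted(closure("AG", F))))  # ABCGHI
print("H" in closure("A", F))             # True, so A -> H is in F+
```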
• Canonical Cover
• Suppose that we have a set of functional dependencies F on a relation schema. Whenever a user performs an
update on the relation, the database system must ensure that the update does not violate any functional
dependencies, that is, all the functional dependencies in F are satisfied in the new database state.
• The system must roll back the update if it violates any functional dependencies in the set F.
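• A canonical cover (irreducible set) Fc of a set of functional dependencies F is a simplified set of FDs that has
the same closure as F: no FD in Fc contains an extraneous attribute, and each left side in Fc is unique. Because
Fc and F imply exactly the same dependencies, the system can test the smaller Fc on each update instead of F.
• Extraneous attributes: an attribute of an FD is said to be extraneous if we can remove it without changing the
closure of the set of FDs.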
Q. Suppose a relational schema R(w, x, y, z) and the following set of functional dependencies
F: {x → w, wz → xy, y → wxz}. Find the canonical cover Fc (minimal set of functional dependencies).
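One way to compute a canonical cover is sketched below: split right sides into single attributes, drop extraneous left-side attributes, drop FDs implied by the rest, then merge FDs that share a left side. This ordering is a common textbook variant, not necessarily the one the slides intend; since canonical covers are not unique, another processing order may give a different but equivalent answer. Attributes are single letters, and all helper names are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure: every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def canonical_cover(fds):
    """Return a canonical cover as a dict {left side: right side}."""
    # 1. Split right sides into single attributes (dropping duplicates, keeping order).
    work = list(dict.fromkeys(
        (frozenset(lhs), frozenset(a)) for lhs, rhs in fds for a in sorted(rhs)))
    # 2. Remove extraneous attributes from left sides.
    for i, (lhs, rhs) in enumerate(work):
        for attr in sorted(lhs):
            reduced = lhs - {attr}
            if reduced and rhs <= closure(reduced, work):
                lhs = reduced
        work[i] = (lhs, rhs)
    work = list(dict.fromkeys(work))
    # 3. Remove FDs that are implied by the remaining ones.
    for f in list(work):
        others = [g for g in work if g != f]
        if f[1] <= closure(f[0], others):
            work = others
    # 4. Union rule: merge FDs that share a left side.
    merged = {}
    for lhs, rhs in work:
        merged[lhs] = merged.get(lhs, frozenset()) | rhs
    return merged

def show(cover):
    return sorted("".join(sorted(lhs)) + " -> " + "".join(sorted(rhs)) for lhs, rhs in cover.items())

F = [("x", "w"), ("wz", "xy"), ("y", "wxz")]
print(show(canonical_cover(F)))  # ['wz -> y', 'x -> w', 'y -> xz']
```

Here wz → x is dropped because wz → y and y → x already imply it, and y → w is dropped because y → x and x → w imply it.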
• Lossless decomposition: refer to the 5NF slide.
• Dependency preservation:
Algorithms for Decomposition
3NF and BCNF: what 3NF and BCNF are, their decomposition algorithms, and the differences between 3NF and
BCNF.
For explanations of 3NF and BCNF, refer to the previous slides.
Dependency-preserving, lossless decomposition into 3NF:
• In 3NF, the relation must already be in 1NF and 2NF; in BCNF, the relation must already be in 1NF, 2NF and
3NF.
• A lossless, dependency-preserving decomposition into 3NF can always be achieved. A decomposition into
BCNF can always be made lossless, but it may not preserve all functional dependencies.
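The standard 3NF synthesis algorithm creates one schema per FD in a canonical cover and adds a candidate key if no schema already contains one. The sketch below assumes its input is already a canonical cover and reuses the EMP example from the BCNF slide, whose two FDs are already in that form; here it happens to yield the same three tables as the BCNF decomposition. Function names are hypothetical.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure: every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_key(schema, fds):
    """Smallest attribute set (brute force) whose closure is the whole schema."""
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            if closure(combo, fds) >= schema:
                return frozenset(combo)
    return frozenset(schema)

def synthesize_3nf(schema, cover):
    """One schema per FD of the canonical cover, plus a candidate key if none is covered."""
    schemas = []
    for lhs, rhs in cover:
        ri = lhs | rhs
        if not any(ri <= s for s in schemas):
            schemas.append(ri)
    # Losslessness needs some schema to be a superkey of the original relation.
    if not any(closure(s, cover) >= schema for s in schemas):
        schemas.append(candidate_key(schema, cover))
    # Drop any schema that is strictly contained in another one.
    return [s for s in schemas if not any(s < t for t in schemas)]

schema = frozenset({"EMP_ID", "EMP_COUNTRY", "EMP_DEPT", "DEPT_TYPE", "EMP_DEPT_NO"})
cover = [
    (frozenset({"EMP_ID"}), frozenset({"EMP_COUNTRY"})),
    (frozenset({"EMP_DEPT"}), frozenset({"DEPT_TYPE", "EMP_DEPT_NO"})),
]
for ri in synthesize_3nf(schema, cover):
    print(sorted(ri))
```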
Decomposition Using Multivalued Dependencies
• Multivalued dependency, 4NF and the 4NF decomposition algorithm
• Multivalued dependency and 4NF (for the concepts, refer to the previous slides)
• 4NF decomposition algorithm
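A simplified 4NF decomposition can be sketched as follows: while some schema Ri has a non-trivial MVD X →→ Y whose left side is not a superkey of Ri, replace Ri with X ∪ Y and Ri − (Y − X). This is only an illustration under strong assumptions: it considers only the MVDs given explicitly (restricted to Ri), not all MVDs they imply, and it tests superkeys using FDs alone. The Student-ID example from the 4NF slide is reused; helper names are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure: every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def fourth_nf_decompose(schema, fds, mvds):
    """Simplified 4NF decomposition: split on given MVDs X ->> Y whose X is not a superkey."""
    result = [frozenset(schema)]
    changed = True
    while changed:
        changed = False
        for i, ri in enumerate(result):
            for x, y in mvds:
                if not x <= ri:
                    continue
                y_local = (y & ri) - x  # restrict the MVD to the attributes of ri
                trivial = not y_local or (x | y_local) == ri
                if trivial or closure(x, fds) >= ri:
                    continue
                result[i:i + 1] = [x | y_local, ri - y_local]
                changed = True
                break
            if changed:
                break
    return result

sid = frozenset({"Student-ID"})
mvds = [(sid, frozenset({"Course"})), (sid, frozenset({"Hobby"}))]
for ri in fourth_nf_decompose({"Student-ID", "Course", "Hobby"}, [], mvds):
    print(sorted(ri))
```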
More Normal Forms
• Join Dependency(refer previous slides)
• Multivalued dependencies help us understand and eliminate some forms of repetition of information that
cannot be understood in terms of functional dependencies. There are types of constraints called join
dependencies that generalize multivalued dependencies, and lead to another normal form called project-join
normal form (PJNF), which some books call fifth normal form (5NF). There is a class of even more
general constraints that leads to a normal form called domain-key normal form (DKNF).
• A practical problem with the use of these generalized constraints is that they are not only hard to reason with,
but there is also no set of sound and complete inference rules for reasoning about the constraints. Hence PJNF
and DKNF are used quite rarely.
Database-Design Process
• So far we have looked at detailed issues about normal forms and normalization. In this section, we study how
normalization fits into the overall database-design process.
• Until now, we assumed that a relation schema r(R) is given, and proceeded to normalize it. There are several
ways in which we could have come up with the schema r(R):
• 1. r(R) could have been generated in converting an E-R diagram to a set of relation schemas.
• 2. r(R) could have been a single relation schema containing all attributes that are of interest. The
normalization process then breaks up r(R) into smaller schemas.
• 3. r(R) could have been the result of an ad-hoc design of relations that we then test to verify that it satisfies a
desired normal form.
• E-R Model and Normalization
• When we define an E-R diagram carefully, identifying all entities correctly, the relation schemas generated
from the E-R diagram should not need much further normalization.
• However, there can be functional dependencies between attributes of an entity. For instance, suppose an
instructor entity set had attributes dept_name and dept_address, and there is a functional dependency
dept_name → dept_address. We would then need to normalize the relation generated from instructor. Most
examples of such dependencies arise out of poor E-R diagram design. In the above example, if we had
designed the E-R diagram correctly, we would have created a department entity set with attribute
dept_address and a relationship set between instructor and department.
• Functional dependencies can help us detect poor E-R design. If the generated relation schemas are not in
desired normal form, the problem can be fixed in the E-R diagram. That is, normalization can be done
formally as part of data modeling. Alternatively, normalization can be left to the designer’s intuition during E-
R modeling, and can be done formally on the relation schemas generated from the E-R model.
• A careful reader will have noted that in order for us to illustrate a need for multivalued dependencies and
fourth normal form, we had to begin with schemas that were not derived from our E-R design. Indeed, the
process of creating an E-R design tends to generate 4NF designs. If a multivalued dependency holds and is
not implied by the corresponding functional dependency, it usually arises from one of the following sources:
• A many-to-many relationship set.
• A multivalued attribute of an entity set.
• For a many-to-many relationship set, each related entity set has its own schema and there is an additional
schema for the relationship set. For a multivalued attribute, a separate schema is created consisting of that
attribute and the primary key of the entity set (as in the case of the phone number attribute of the entity set
instructor).
• Naming of Attributes and Relationships
• A desirable feature of a database design is the unique-role assumption, which means that each attribute name
has a unique meaning in the database. This prevents us from using the same attribute to mean different things
in different schemas.
• For example, we might otherwise consider using the attribute number for phone_number in the instructor
schema and for room_number in the classroom schema. The join of a relation on schema instructor with
one on classroom is meaningless.
• While users and application developers can work carefully to ensure use of the right number in each
circumstance, having a different attribute name for phone number and for room number serves to reduce user
errors.
• While it is a good idea to keep names for incompatible attributes distinct, if attributes of different relations
have the same meaning, it may be a good idea to use the same attribute name. For this reason we used the
same attribute name “name” for both the instructor and the student entity sets.
• In large database schemas, relationship sets (and schemas derived therefrom) are often named via a
concatenation of the names of related entity sets, perhaps with an intervening hyphen or underscore. We have
used a few such names, for example inst_sec and student_sec. We used the names teaches and takes instead of
using the longer concatenated names. This was acceptable since it is not hard for you to remember the
associated entity sets for a few relationship sets. We cannot always create relationship-set names by simple
concatenation; for example, a manager or works-for relationship between employees would not make much
sense if it were called employee-employee! Similarly, if there are multiple relationship sets possible between
a pair of entity sets, the relationship-set names must include extra parts to identify the relationship set.
• Different organizations have different conventions for naming entity sets. For example, we may call an entity
set of students student or students. We have chosen to use the singular form in our database designs. Using
either singular or plural is acceptable, as long as the convention is used consistently across all entity sets.
• As schemas grow larger, with increasing numbers of relationship sets, using consistent naming of attributes,
relationships, and entities makes life much easier for the database designer and application programmers.
De-normalization for Performance
• Occasionally database designers choose a schema that has redundant information; that is, it is not normalized.
• They use the redundancy to improve performance for specific applications.
• The penalty paid for not using a normalized schema is the extra work (in terms of coding time and execution
time) to keep redundant data consistent.
• For instance, suppose all course prerequisites have to be displayed along with the course information every
time a course is accessed.
• In our normalized schema, this requires a join of course with prereq.
• One alternative to computing the join on the fly is to store a relation containing all the attributes of course and
prereq. This makes displaying the “full” course information faster. However, the information for a course is repeated
for every course prerequisite, and all copies must be updated by the application, whenever a course prerequisite is
added or dropped. The process of taking a normalized schema and making it non-normalized is called
denormalization, and designers use it to tune performance of systems to support time-critical operations.
• A better alternative, supported by many database systems today, is to use the normalized schema, and additionally
store the join of course and prereq as a materialized view.
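The trade-off can be illustrated with SQLite through Python's standard sqlite3 module. SQLite does not support materialized views, so this sketch emulates one with a table rebuilt from the join by a hypothetical refresh_course_full function; the course rows and column names are illustrative assumptions loosely modeled on the course/prereq example. The stale result after an insert shows the extra work that stored redundancy requires.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE course (course_id TEXT PRIMARY KEY, title TEXT);
CREATE TABLE prereq (course_id TEXT, prereq_id TEXT);
INSERT INTO course VALUES ('CS-101', 'Intro. to Computer Science'), ('CS-190', 'Game Design');
INSERT INTO prereq VALUES ('CS-190', 'CS-101');
""")

# In the normalized schema, the full course information is a join computed on every access.
JOIN_SQL = """
SELECT c.course_id, c.title, p.prereq_id
FROM course AS c LEFT JOIN prereq AS p ON c.course_id = p.course_id
"""

def refresh_course_full(conn):
    """Rebuild the denormalized copy; it must be rerun whenever course or prereq changes."""
    conn.execute("DROP TABLE IF EXISTS course_full")
    conn.execute("CREATE TABLE course_full AS " + JOIN_SQL)

def course_full_rows(conn):
    return conn.execute("SELECT * FROM course_full ORDER BY course_id, prereq_id").fetchall()

refresh_course_full(conn)
before = course_full_rows(conn)

# Adding a prerequisite leaves the stored copy stale until it is refreshed.
conn.execute("INSERT INTO prereq VALUES ('CS-190', 'CS-102')")
stale = course_full_rows(conn)
refresh_course_full(conn)
after = course_full_rows(conn)
print(len(before), len(stale), len(after))  # 2 2 3
```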
Other Design Issues: refer to the text.