Unit-05 Relational Database Design
Keys
A key in a DBMS is an attribute or a set of attributes that helps to uniquely identify a tuple (or row) in a
relation (or table). Keys are also used to establish relationships between the different tables and columns of
a relational database. Individual values in a key are called key values.
1. Super Key
A super key is an attribute or set of attributes that uniquely identifies each row in a table. Any set of
columns whose combined values are never repeated across rows is a super key, even if it contains more
columns than are needed for uniqueness. Every candidate key (explained below) is a super key, and every
super key contains at least one candidate key. The primary key of a table is picked from the minimal super
keys, that is, from the candidate keys, to serve as the table’s identity attribute.
Features of Super Key:
− Uniqueness
− Redundancy Allowed
− Not Necessarily Minimal
2. Candidate Keys
A candidate key is a minimal super key: a set of attributes that uniquely identifies each row of a table and
from which no attribute can be removed without losing that uniqueness. The primary key of a table is
selected from among the candidate keys, so every candidate key is unique in the same way as the primary
key explained below. There can be more than one candidate key in a table.
Features of Candidate Keys:
− Uniqueness
− Irreducibility
− Minimality
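The difference between super keys and candidate keys can be sketched in a few lines of Python. The STUDENT rows and the helper names below are hypothetical, chosen only for illustration. Note that a key is really a rule about every valid state of a table, so a check against one set of sample rows can only show that a column set is consistent with being a key.

```python
from itertools import combinations

# Hypothetical sample data: each dict is one row of a STUDENT table.
ROWS = [
    {"student_id": 1, "roll_no": "A1", "name": "Asha"},
    {"student_id": 2, "roll_no": "A2", "name": "Ravi"},
    {"student_id": 3, "roll_no": "A3", "name": "Asha"},
]

def is_super_key(rows, attrs):
    """True if no two rows share the same combination of values for attrs."""
    seen = {tuple(row[a] for a in attrs) for row in rows}
    return len(seen) == len(rows)

def super_keys(rows):
    """Every column set that is unique in this sample (smallest sets first)."""
    columns = sorted(rows[0])
    return [
        set(combo)
        for size in range(1, len(columns) + 1)
        for combo in combinations(columns, size)
        if is_super_key(rows, combo)
    ]

def candidate_keys(rows):
    """Super keys that have no proper subset which is also a super key."""
    keys = super_keys(rows)
    return [k for k in keys if not any(other < k for other in keys)]
```

With this sample, `candidate_keys(ROWS)` keeps only `{"roll_no"}` and `{"student_id"}`, while `super_keys(ROWS)` also includes every larger set that contains one of them. The column `name` alone is not a super key because two rows share the value "Asha".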
3. Primary Key
A primary key is a column, or a set of columns, whose value identifies a database record uniquely. It has the
following properties. A primary key cannot be NULL. A primary key value must be unique. Primary key values
should rarely be changed. The primary key must be given a value when a new record is inserted.
Features of Primary Keys
− Uniqueness
− Not Null
− Fixed Values
− Single Attribute or Composite
− Indexed
− Used in Relationships
4. Alternate Key
As stated above, a table can have multiple candidates for the primary key; however, only one can be chosen.
So, all the candidate keys that did not become the primary key are called alternate keys.
5. Foreign Key
A foreign key references the primary key of another table and so connects the two tables. A foreign key
column can have a different name from the primary key it references. It ensures that rows in one table have
corresponding rows in another. Unlike a primary key, a foreign key does not have to be unique, and it can be
NULL even though a primary key cannot.
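These primary key and foreign key rules can be tried out with Python's built-in `sqlite3` module. This is a minimal sketch: the `department` and `employee` tables and their columns are hypothetical, and SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON` has been issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.execute(
    "CREATE TABLE department (dept_id INTEGER PRIMARY KEY, dept_name TEXT NOT NULL)"
)
conn.execute(
    "CREATE TABLE employee ("
    " emp_id INTEGER PRIMARY KEY,"
    " emp_name TEXT NOT NULL,"
    " works_in INTEGER REFERENCES department(dept_id))"  # name differs from dept_id
)
conn.execute("INSERT INTO department VALUES (10, 'Sales')")
conn.execute("INSERT INTO employee VALUES (1, 'Asha', 10)")    # matches a department row
conn.execute("INSERT INTO employee VALUES (2, 'Ravi', 10)")    # foreign key values may repeat
conn.execute("INSERT INTO employee VALUES (3, 'Mina', NULL)")  # a foreign key may be NULL

def try_insert(sql):
    """Run one INSERT and report whether a constraint rejected it."""
    try:
        conn.execute(sql)
        return "ok"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"

# Department 99 does not exist, so the foreign key rejects this row.
dangling = try_insert("INSERT INTO employee VALUES (4, 'Omar', 99)")
# emp_id 1 is already used, so the primary key rejects this row.
duplicate_pk = try_insert("INSERT INTO employee VALUES (1, 'Zoe', 10)")
```

The foreign key column is named `works_in` rather than `dept_id` to show that the two names need not match.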
6. Composite Key
A composite key is a key composed of multiple columns that together identify a record uniquely. In our
database, we have two people with the same name, Robert Phil, but they live in different places. Hence,
we require both Full Name and Address to identify a record uniquely. That is a composite key.
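The Robert Phil case can be sketched with `sqlite3` by declaring the key over both columns. The table name `person` and the two addresses are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE person ("
    " full_name TEXT NOT NULL,"
    " address TEXT NOT NULL,"
    " PRIMARY KEY (full_name, address))"  # composite key: the pair must be unique
)
conn.execute("INSERT INTO person VALUES ('Robert Phil', '12 Hill Road')")
conn.execute("INSERT INTO person VALUES ('Robert Phil', '7 Lake View')")  # same name, other address

try:
    # Repeating an existing (full_name, address) pair violates the composite key.
    conn.execute("INSERT INTO person VALUES ('Robert Phil', '12 Hill Road')")
    repeated_pair = "ok"
except sqlite3.IntegrityError:
    repeated_pair = "rejected"
```

Neither column is unique on its own here; only the combination is.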
Functional Dependencies
→ A functional dependency is a constraint that specifies the relationship between two sets of attributes,
where the value of one set determines the value of the other.
→ It is denoted as X → Y, where X is a set of attributes that can determine the value of Y.
→ Y is functionally dependent on X.
→ The attribute set on the left side of the arrow, X, is called the determinant, while the set on the right
side, Y, is called the dependent.
→ Functional Dependency helps to maintain the quality of data in the database. It plays a vital role in
finding the difference between good and bad database design.
In this example, if we know the value of Student_ID, we can obtain Student_Name, Semester, Hotel, etc.
We can therefore say that Student_Name, Semester, and Hotel are functionally dependent on Student_ID.
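Whether a functional dependency X → Y is violated by a given set of rows can be checked mechanically: any two rows that agree on X must also agree on Y. The sketch below assumes a small hypothetical student table; the function name `fd_holds` and the sample values are illustrative only.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if rows that agree on lhs always agree on rhs."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # Remember the first rhs value seen for each lhs value and compare later ones.
        if seen.setdefault(x, y) != y:
            return False
    return True

# Hypothetical sample rows.
STUDENTS = [
    {"Student_ID": 1, "Student_Name": "Asha", "Semester": 3},
    {"Student_ID": 2, "Student_Name": "Ravi", "Semester": 3},
    {"Student_ID": 3, "Student_Name": "Asha", "Semester": 5},
]

id_determines_rest = fd_holds(STUDENTS, ["Student_ID"], ["Student_Name", "Semester"])
name_determines_semester = fd_holds(STUDENTS, ["Student_Name"], ["Semester"])
```

Here `Student_ID → {Student_Name, Semester}` holds, while `Student_Name → Semester` fails because the two students named Asha are in different semesters. A passing check on sample data does not prove the dependency in general; it only shows that the data does not contradict it.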
Here, we can see that either of the attributes Student_id and Roll_no alone can uniquely identify a course.
Hence, we can say that the dependency on the two attributes together is a partial dependency.
3. Transitive Dependency
→ Given a relation R(A, B, C), a pair of dependencies such as A -> B and B -> C is a transitive
dependency, since A -> C is implied.
→ A transitive dependency is a type of functional dependency that is formed indirectly by two other
functional dependencies. Let’s understand with the following transitive dependency example.
{Company} -> {CEO} (if we know the company, we know its CEO’s name)
{CEO} -> {Age} (if we know the CEO, we know the CEO’s age)
Therefore, according to the rule of transitive dependency, {Company} -> {Age} should hold. That makes
sense because if we know the company name, we can know its CEO’s age.
Note: You need to remember that transitive dependency can only occur in a relation of three or more
attributes.
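Implied dependencies such as {Company} -> {Age} can be found by computing an attribute closure: start from a set of attributes and keep adding everything the known dependencies let us derive. The following is a minimal sketch; the function name `closure` and the representation of each dependency as a pair of Python sets are assumptions made for this example.

```python
def closure(attrs, fds):
    """All attributes determined by attrs under fds, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already know every attribute in lhs, we also know rhs.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

FDS = [({"Company"}, {"CEO"}), ({"CEO"}, {"Age"})]

company_closure = closure({"Company"}, FDS)
age_follows_from_company = "Age" in company_closure
```

Since `Age` appears in the closure of `{Company}` only by way of `CEO`, the dependency {Company} -> {Age} is transitive.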
4. Multivalued Dependency
Multivalued dependency occurs in the situation where there are multiple independent multivalued attributes
in a single table.
A multivalued dependency is a complete constraint between two sets of attributes in a relation. It requires
that certain tuples be present in a relation. Consider the following Multivalued Dependency Example to
understand.
In this example, maf_year and color are independent of each other but dependent on car_model. These
two columns are said to be multivalued dependent on car_model. This dependence can be represented
with a double arrow, which distinguishes it from an ordinary functional dependency:
car_model ->> maf_year
car_model ->> color
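The requirement "that certain tuples be present" can be made concrete. A multivalued dependency X ->> Y holds in a table if, for any two rows that agree on X, the table also contains the row that takes its X and Y values from the first row and all remaining values from the second. The sketch below checks this on hypothetical car data; the function name `mvd_holds` and the sample values are assumptions for illustration.

```python
def mvd_holds(rows, x, y):
    """Check X ->> Y on a relation instance given as a list of dicts."""
    columns = list(rows[0])
    present = {tuple(row[c] for c in columns) for row in rows}
    for t1 in rows:
        for t2 in rows:
            if all(t1[a] == t2[a] for a in x):
                # Required row: X and Y values from t1, all other values from t2.
                need = {**t2, **{a: t1[a] for a in list(x) + list(y)}}
                if tuple(need[c] for c in columns) not in present:
                    return False
    return True

# Hypothetical data: model M1 was made in two years and comes in two colors.
CARS = [
    {"car_model": "M1", "maf_year": 2017, "color": "Red"},
    {"car_model": "M1", "maf_year": 2017, "color": "Blue"},
    {"car_model": "M1", "maf_year": 2018, "color": "Red"},
    {"car_model": "M1", "maf_year": 2018, "color": "Blue"},
    {"car_model": "M2", "maf_year": 2019, "color": "White"},
]

all_combinations_present = mvd_holds(CARS, ["car_model"], ["maf_year"])
one_combination_missing = mvd_holds(CARS[1:], ["car_model"], ["maf_year"])
```

With all four year and color combinations stored for M1 the dependency holds. Dropping the first row leaves the combination (2017, Red) missing, so the check fails.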
Anomalies
→ Data anomalies are inconsistencies in the data stored in a database that arise as a result of an
operation such as an update, insertion, or deletion.
→ A database anomaly is normally a flaw that occurs because of poor planning and storing everything
in a single flat table.
→ Generally, anomalies are removed by the process of normalization, which splits a table into smaller
tables that can be joined again when needed.
→ There are three types of anomalies: update, deletion, and insertion anomalies.
→ For example, each employee in a company has a department associated with them as well as the
student group they participate in.
1. Insertion Anomaly
→ Insertion Anomalies happen when inserting vital data into the database is not possible because other
data is not already there.
→ For example, if a system is designed to require that a customer be on file before a sale can be made
to that customer, but you cannot add a customer until they have bought something, then you have
an insert anomaly.
If a course is new, we must first add the course to the database, and only then can we assign its course id
and name to a student. This creates an insertion anomaly.
2. Update Anomaly
→ An update anomaly is a data inconsistency that results from data redundancy and a partial update.
→ For example, consider changing an employee’s title due to a promotion.
→ If the title is stored redundantly in several rows of the same table, and the person making the change
misses any of them, then there will be multiple titles associated with the employee. The end user has
no way of knowing which is the correct title.
3. Deletion Anomaly
→ A deletion anomaly is the unintended loss of data due to deletion of other data.
→ For example, if a single database record contains information about a particular product along with
information about a salesperson for the company and the salesperson quits, then information about
the product is deleted along with the salesperson information.
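The update and deletion anomalies above can be reproduced with plain Python lists standing in for tables. All names and values here are hypothetical; the point is only to show how a single flat table goes wrong and how separate tables avoid the problem.

```python
# Hypothetical flat table: product facts and salesperson facts share one row.
flat = [
    {"product": "Lamp", "price": 20, "salesperson": "Kim", "title": "Associate"},
    {"product": "Desk", "price": 150, "salesperson": "Kim", "title": "Associate"},
    {"product": "Chair", "price": 60, "salesperson": "Lee", "title": "Manager"},
]

# Update anomaly: the promotion is applied to only one of Kim's two rows.
partial = [dict(row) for row in flat]
partial[0]["title"] = "Senior Associate"
titles_for_kim = {r["title"] for r in partial if r["salesperson"] == "Kim"}

# Deletion anomaly: Lee quits, and the only record of the Chair goes too.
after_delete = [r for r in flat if r["salesperson"] != "Lee"]
products_left = {r["product"] for r in after_delete}

# Normalized sketch: each fact is stored once, in its own table.
products = {r["product"]: r["price"] for r in flat}
salespeople = {r["salesperson"]: r["title"] for r in flat}
assignments = [(r["product"], r["salesperson"]) for r in flat]

salespeople["Kim"] = "Senior Associate"  # one update, no conflicting copies
del salespeople["Lee"]                   # Lee leaves ...
assignments = [(p, s) for p, s in assignments if s != "Lee"]
chair_still_known = "Chair" in products  # ... but the product record survives
```

In the flat table, `titles_for_kim` ends up holding two different titles and the Chair disappears from `products_left`. In the split version, the title is stored once and the Chair's price is still available after Lee is removed.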
Decomposition
→ The term decomposition refers to the process in which we break down a table in a database into
smaller parts. Decomposition thus replaces a given relation with a collection of smaller relations,
each holding a subset of the original attributes. We can break any table down into multiple tables
when we want to store a particular set of data separately.
→ Decomposition must always be lossless. This way, we can rest assured that the data/information
that was there in the original relation can be reconstructed accurately on the basis of the decomposed
relations. In case the relation is not decomposed properly, then it may eventually lead to problems
such as information loss.
Types of Decomposition
1. Lossless Decomposition
A decomposition is said to be lossless when it is feasible to reconstruct the original relation R using joins
from the decomposed tables. It is the most preferred choice. This way, the information will not be lost from
the relation when we decompose it. A lossless join reproduces the original relation exactly, with no missing
and no extra tuples.
2. Lossy Decomposition
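As the name suggests, a decomposition is lossy when the original relation cannot be recovered by joining
the decomposed relations. Typically the join produces extra (spurious) tuples that were not in the original
relation, so the original information can no longer be told apart from them. Exercise: construct an example
yourself.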
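One possible worked example, as a sketch: project a small relation onto two attribute sets, join the projections back together, and compare the result with the original. The employee rows and the helper names `project`, `natural_join`, and `is_lossless` are hypothetical.

```python
def project(rows, attrs):
    """Projection with duplicate elimination; each row becomes (name, value) pairs."""
    return {tuple((a, row[a]) for a in attrs) for row in rows}

def natural_join(left, right):
    """Natural join of two projections produced by project()."""
    joined = set()
    for lt in left:
        for rt in right:
            ld, rd = dict(lt), dict(rt)
            # Combine two rows only if they agree on every shared attribute.
            if all(ld[a] == rd[a] for a in ld.keys() & rd.keys()):
                joined.add(tuple(sorted({**ld, **rd}.items())))
    return joined

def is_lossless(rows, attrs1, attrs2):
    """True if joining the two projections gives back exactly the original rows."""
    original = {tuple(sorted(row.items())) for row in rows}
    return natural_join(project(rows, attrs1), project(rows, attrs2)) == original

# Hypothetical relation: two different employees happen to share a name.
EMP = [
    {"emp_id": 1, "name": "Asha", "dept": "Sales"},
    {"emp_id": 2, "name": "Ravi", "dept": "Sales"},
    {"emp_id": 3, "name": "Asha", "dept": "HR"},
]

split_on_key = is_lossless(EMP, ["emp_id", "name"], ["emp_id", "dept"])
split_on_name = is_lossless(EMP, ["emp_id", "name"], ["name", "dept"])
spurious_join = natural_join(
    project(EMP, ["emp_id", "name"]), project(EMP, ["name", "dept"])
)
```

Splitting on `emp_id`, which is a key, gives back the three original rows. Splitting on `name` is lossy: the join pairs each Asha with both departments, so `spurious_join` contains five rows instead of three.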
Normal forms: 1NF, 2NF, 3NF and BCNF
Normalization is a process in relational database design that organizes tables and their attributes to reduce
data redundancy and dependency. The goal is to eliminate data anomalies, improve data integrity, and make
the database structure more efficient. There are several normal forms, each addressing specific issues
related to data organization. The most common normal forms are:
1. 1NF
Rules:
→ First normal form (1NF) was defined to disallow multivalued attributes, composite attributes, and
their combinations. It states that the domain of an attribute must include only atomic values and
that the value of any attribute in a tuple must be a single value from the domain of that attribute.
Remedy: form new relations for each multivalued attribute or nested relation.
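The remedy can be sketched in Python by moving a multivalued attribute into its own relation. The student rows, the `phones` attribute, and the helper `is_atomic` are hypothetical and serve only to illustrate the idea.

```python
# Hypothetical unnormalized rows: "phones" holds a list, which violates 1NF.
unnormalized = [
    {"student_id": 1, "name": "Asha", "phones": ["555-0101", "555-0102"]},
    {"student_id": 2, "name": "Ravi", "phones": ["555-0199"]},
]

# Keep the single-valued attributes in the original relation ...
student = [{"student_id": r["student_id"], "name": r["name"]} for r in unnormalized]

# ... and form a new relation with one row per value of the multivalued attribute.
student_phone = [
    {"student_id": r["student_id"], "phone": p}
    for r in unnormalized
    for p in r["phones"]
]

def is_atomic(rows):
    """True if no attribute value is itself a collection."""
    return all(
        not isinstance(value, (list, set, dict, tuple))
        for row in rows
        for value in row.values()
    )
```

After the split, every value in `student` and `student_phone` is a single atomic value, and `student_id` in `student_phone` links each phone number back to its student.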
2. 2NF
Rules:
Rule 1: Be in 1NF
→ The second normal form (2NF) is based on the concept of full functional dependency. A functional
dependency X → Y is a full functional dependency if removal of any attribute A from X means that
the dependency does not hold anymore. A functional dependency X → Y is a partial dependency if
some attribute A ∈ X can be removed from X and the dependency still holds. Definition: A relation
schema R is in 2NF if every nonprime attribute A in R is fully functionally dependent on the primary
key of R. The test for 2NF involves testing for functional dependencies whose left-hand side
attributes are part of the primary key.
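The 2NF test can be sketched as a search for nonprime attributes that are determined by only part of the primary key. The enrollment schema, its dependencies, and the function names below are hypothetical, and the sketch assumes the given primary key is the only candidate key.

```python
from itertools import combinations

def closure(attrs, fds):
    """All attributes determined by attrs under fds, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def partial_dependencies(key, all_attrs, fds):
    """Map each proper subset of key to the nonprime attributes it determines."""
    nonprime = set(all_attrs) - set(key)
    found = {}
    for size in range(1, len(key)):  # proper, non-empty subsets only
        for subset in combinations(sorted(key), size):
            dependent = (closure(subset, fds) - set(subset)) & nonprime
            if dependent:
                found[subset] = dependent
    return found

# Hypothetical schema: ENROLLMENT(student_id, course_id, student_name, grade)
ENROLLMENT = {"student_id", "course_id", "student_name", "grade"}
KEY = {"student_id", "course_id"}
FDS = [
    ({"student_id"}, {"student_name"}),         # depends on part of the key
    ({"student_id", "course_id"}, {"grade"}),   # depends on the whole key
]

violations = partial_dependencies(KEY, ENROLLMENT, FDS)

# After splitting off STUDENT(student_id, student_name), both tables pass the test.
student_ok = partial_dependencies(
    {"student_id"}, {"student_id", "student_name"}, [FDS[0]]
)
grade_ok = partial_dependencies(KEY, {"student_id", "course_id", "grade"}, [FDS[1]])
```

Here `student_name` depends on `student_id` alone, so the original table is not in 2NF. Moving `student_name` into its own table keyed by `student_id` removes the partial dependency.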
3. 3NF
Rules:
Rule 1: Be in 2NF
To move our 2NF table into 3NF, we again need to divide our table, this time so that no nonprime attribute
is transitively dependent on the primary key.
4. BCNF
→ Boyce-Codd Normal Form (BCNF) is a higher normal form in relational database design,
addressing certain types of dependencies that may still exist even after applying the Third Normal
Form (3NF). BCNF is a stricter version of the Third Normal Form: a relation is in BCNF if, for
every non-trivial functional dependency X → Y that holds in it, X is a super key. This ensures that
the relation is free from the remaining anomalies related to functional dependencies.
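A common way to test a relation for BCNF is to check that the left-hand side of every non-trivial functional dependency is a super key, using attribute closure. The sketch below applies that test to a hypothetical relation of students, courses, and teachers in which each teacher teaches exactly one course; the schema and function names are assumptions for illustration.

```python
def closure(attrs, fds):
    """All attributes determined by attrs under fds, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violations(all_attrs, fds):
    """Non-trivial dependencies whose left-hand side is not a super key."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not rhs <= lhs and closure(lhs, fds) != set(all_attrs)
    ]

# Hypothetical relation CLASS(student, course, teacher).
CLASS = {"student", "course", "teacher"}
FDS = [
    ({"student", "course"}, {"teacher"}),  # the key determines the teacher
    ({"teacher"}, {"course"}),             # each teacher teaches one course
]

violations = bcnf_violations(CLASS, FDS)
```

The dependency `teacher -> course` is reported because `teacher` alone does not determine `student` and so is not a super key. This relation satisfies 3NF, since `course` is part of a candidate key, but it is not in BCNF.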
o 4NF (Fourth Normal Form) Rules
− If no table instance contains two or more independent, multivalued facts describing the relevant
entity, then the table is in 4th Normal Form.
o 5NF (Fifth Normal Form) Rules
− A table is in 5th Normal Form only if it is in 4NF and it cannot be decomposed into any number of
smaller tables without loss of data.