Database Management Systems
What is normalization?
Database normalization is the process of removing redundant data from tables in order to
improve storage efficiency, data integrity, and scalability.
Normalization generally involves splitting existing tables into multiple ones, which must
be re-joined or linked each time a query is issued.
It is primarily a tool to validate and improve a logical design so that it satisfies certain
constraints that avoid unnecessary duplication of data.
It is a bottom-up technique.
• The characteristics of a suitable set of relations include:
– The minimal number of attributes necessary to support the data requirements of the enterprise;
– Attributes with a close logical relationship are found in the same relation;
– Minimal redundancy with each attribute represented only once with the important exception of
attributes that form all or part of foreign keys.
• The benefits of using a database that has a suitable set of relations are that the database will be:
– Easier for the user to access and maintain;
– Economical, taking up minimal storage space on the computer.
3.2 How Normalization Supports Database Design
Approach 1 shows how normalization can be used as a bottom-up, standalone database design
technique.
Approach 2 shows how normalization can be used as a validation technique to check the structure
of relations that may have been created using a top-down approach such as ER modeling.
Although the users’ requirements specification is the preferred data source, it is possible to design a
database based on information taken directly from other data sources, such as forms and
reports.
• Problems associated with data redundancy are illustrated by comparing the Staff and Branch
relations with the Staff-Branch relation.
The Staff-Branch relation has redundant data: the details of a branch are repeated for every member of
staff located at that branch.
• In contrast, the branch information appears only once for each branch in the Branch relation and
only the branch number (branchNo) is repeated in the Staff relation, to represent where each
member of staff is located.
Relations that contain redundant information may suffer from update anomalies, which are
classified as insertion, deletion, and modification anomalies.
A well-structured relation is one that contains minimal data redundancy and allows users to insert,
delete, and update rows without causing data inconsistencies.
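The modification anomaly can be reproduced with Python's built-in sqlite3 module. This is a minimal sketch. The column names echo the Staff/Branch example, but the rows, addresses, and table names staff_branch, staff, and branch are hypothetical. In the redundant design, a branch address changed in only one row leaves the branch with two conflicting addresses. In the normalized design, the same change is a single-row update.

```python
import sqlite3

# Hypothetical rows: column names echo the Staff/Branch example,
# but the staff, branches, and addresses are invented for illustration.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Unnormalized relation: branch details are repeated on every staff row.
cur.execute("CREATE TABLE staff_branch (staffNo TEXT PRIMARY KEY, name TEXT,"
            " branchNo TEXT, bAddress TEXT)")
cur.executemany("INSERT INTO staff_branch VALUES (?, ?, ?, ?)", [
    ("S1", "Ann", "B1", "10 Old Rd"),
    ("S2", "Bob", "B1", "10 Old Rd"),
    ("S3", "Cat", "B2", "7 Hill St"),
])

# Modification anomaly: branch B1 moves, but only one of its rows is updated.
cur.execute("UPDATE staff_branch SET bAddress = '5 New Rd' WHERE staffNo = 'S1'")
addresses = cur.execute("SELECT DISTINCT bAddress FROM staff_branch"
                        " WHERE branchNo = 'B1'").fetchall()
print(len(addresses))  # 2: the same branch now has two conflicting addresses

# Normalized design: branch details are stored once, in the branch table.
cur.execute("CREATE TABLE branch (branchNo TEXT PRIMARY KEY, bAddress TEXT)")
cur.execute("CREATE TABLE staff (staffNo TEXT PRIMARY KEY, name TEXT,"
            " branchNo TEXT REFERENCES branch(branchNo))")
cur.executemany("INSERT INTO branch VALUES (?, ?)",
                [("B1", "10 Old Rd"), ("B2", "7 Hill St")])
cur.executemany("INSERT INTO staff VALUES (?, ?, ?)",
                [("S1", "Ann", "B1"), ("S2", "Bob", "B1"), ("S3", "Cat", "B2")])

# The same change is now a single-row update, so every staff row sees it.
cur.execute("UPDATE branch SET bAddress = '5 New Rd' WHERE branchNo = 'B1'")
normalized = cur.execute(
    "SELECT DISTINCT b.bAddress FROM staff AS s"
    " JOIN branch AS b ON s.branchNo = b.branchNo WHERE s.branchNo = 'B1'"
).fetchall()
print(normalized)  # [('5 New Rd',)]
```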
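A functional dependency (FD) is written as
X → Y
and means that each value of X is associated with exactly one value of Y. The left side of an FD is
known as the determinant, and the right side is known as the dependent.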
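One standard way to state this formally for a relation instance r is shown below. The notation is assumed, not taken from these notes: t_1 and t_2 are tuples of r, and t[X] is the projection of tuple t onto the attributes X. Any two tuples that agree on the determinant must also agree on the dependent.

```latex
X \to Y \text{ holds in } r \iff \forall\, t_1, t_2 \in r:\; t_1[X] = t_2[X] \implies t_1[Y] = t_2[Y]
```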
For example, consider the following table.
From the above table, we can conclude some valid functional dependencies:
• roll_no → {name, dept_name, dept_building}: roll_no can determine the values of the fields
name, dept_name, and dept_building, hence this is a valid functional dependency.
• roll_no → dept_name: since roll_no can determine the whole set {name, dept_name,
dept_building}, it can also determine its subset dept_name.
• dept_name → dept_building: dept_name can identify dept_building accurately, since every row
with the same dept_name has the same dept_building.
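Derived dependencies such as roll_no → dept_name can be checked mechanically with the standard attribute-closure algorithm. The algorithm repeatedly adds the dependent of any FD whose determinant is already in the set. This is a minimal sketch. The helper name closure is hypothetical, and the two FDs are the ones listed above. X → A holds exactly when A appears in the closure of X.

```python
def closure(attrs, fds):
    """Compute the closure of a set of attributes under a list of FDs.

    Each FD is a pair (lhs, rhs) of sets of attribute names.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the whole determinant is already in the closure, add the dependent.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

fds = [
    ({"roll_no"}, {"name", "dept_name", "dept_building"}),
    ({"dept_name"}, {"dept_building"}),
]

roll_closure = closure({"roll_no"}, fds)
print(sorted(roll_closure))             # ['dept_building', 'dept_name', 'name', 'roll_no']
print("dept_name" in roll_closure)      # True, so roll_no -> dept_name holds
print(sorted(closure({"dept_name"}, fds)))  # ['dept_building', 'dept_name']
```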
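Full Functional Dependency: Y is fully functionally dependent on X if X → Y holds and Y does not
depend on any proper subset of X. For example, if a relation R has attributes X, Y, Z with the
dependencies X → Y and X → Z, where X is a single attribute, then those dependencies are fully functional.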
Partial Functional Dependency: In a partial functional dependency, a non-key attribute depends
on part of the composite key rather than on the whole key. Suppose a relation R has attributes X, Y, Z,
where {X, Y} is the composite key and Z is a non-key attribute. If X → Z holds, then Z depends on only
part of the key, so Z is partially dependent on {X, Y}; this is a partial functional dependency.
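Given the composite key and a list of FDs, partial dependencies can be flagged by looking for a determinant that is a proper subset of the key and has a non-key attribute as its dependent. This is a simplified sketch. The function name partial_dependencies is hypothetical, and it checks only the FDs listed explicitly, without inferring new ones.

```python
def partial_dependencies(key, fds):
    """List FDs in which a non-key attribute depends on a proper subset of the key.

    key: set of attributes forming the composite primary key.
    fds: list of (lhs, rhs) pairs, each a set of attribute names.
    """
    found = []
    for lhs, rhs in fds:
        non_key = rhs - key        # ignore dependents that are themselves key attributes
        if lhs < key and non_key:  # lhs < key means lhs is a proper subset of the key
            found.append((lhs, non_key))
    return found

key = {"X", "Y"}
print(partial_dependencies(key, [({"X"}, {"Z"})]))       # [({'X'}, {'Z'})]: partial
print(partial_dependencies(key, [({"X", "Y"}, {"Z"})]))  # []: Z needs the whole key
```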
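Transitive Functional Dependency
In a transitive functional dependency, the dependent is only indirectly dependent on the determinant,
i.e. if a → b and b → c hold, then a → c holds by transitivity.
For example,
enrol_no  name  dept  building_no
42        abc   CO    4
43        pqr   EC    2
44        xyz   IT    1
45        abc   EC    2
Here enrol_no → dept and dept → building_no, so enrol_no → building_no is a transitive functional dependency.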
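The dependencies in this table can be tested directly against its rows. A dependency fails as soon as two rows agree on the determinant but differ on the dependent. This is a minimal sketch, and the helper name fd_holds is hypothetical. A sample of rows can only refute an FD. It cannot prove that the FD will hold for all future data.

```python
rows = [
    {"enrol_no": 42, "name": "abc", "dept": "CO", "building_no": 4},
    {"enrol_no": 43, "name": "pqr", "dept": "EC", "building_no": 2},
    {"enrol_no": 44, "name": "xyz", "dept": "IT", "building_no": 1},
    {"enrol_no": 45, "name": "abc", "dept": "EC", "building_no": 2},
]

def fd_holds(rows, lhs, rhs):
    """Return True if every pair of rows that agrees on lhs also agrees on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this determinant value
        if seen.setdefault(key, value) != value:
            return False
    return True

print(fd_holds(rows, ["enrol_no"], ["dept"]))         # True
print(fd_holds(rows, ["dept"], ["building_no"]))      # True
print(fd_holds(rows, ["enrol_no"], ["building_no"]))  # True, via dept (transitive)
print(fd_holds(rows, ["name"], ["dept"]))             # False: "abc" appears in CO and EC
```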
Example 1:
In the table shown below, the value in the [color] column of the first row can be divided into
“yellow” and “blue”; hence [table-product] is not in 1NF.
A repeating group means that a table contains two or more columns that are closely related.
The above table is not in first normal form because the [color] column can contain multiple
values.
To bring this table to first normal form, we split the table into two tables, which gives the
following resulting tables.
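The split can be sketched in Python. The multi-valued color list moves into its own table with one color per row, keyed by product. Only the yellow/blue values come from the example. The product ids, prices, the second product, and the function name to_first_normal_form are hypothetical.

```python
def to_first_normal_form(products):
    """Split a multi-valued colors column into its own table (one value per row)."""
    product_table = []
    product_color_table = []
    for p in products:
        # Single-valued attributes stay in the product table.
        product_table.append({"product_id": p["product_id"], "price": p["price"]})
        # Each color becomes its own row, linked back by product_id.
        for color in p["colors"]:
            product_color_table.append({"product_id": p["product_id"], "color": color})
    return product_table, product_color_table

# Hypothetical rows: only "yellow" and "blue" come from the example in the text.
products = [
    {"product_id": 1, "colors": ["yellow", "blue"], "price": 2.50},
    {"product_id": 2, "colors": ["red"], "price": 1.75},
]
product_table, product_color_table = to_first_normal_form(products)
print(product_table)
print(product_color_table)
```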
Example 2:
3.6 Second Normal Form
Definition
A table is in second normal form if:
* It is in first normal form; and
* All non-key attributes are fully functionally dependent on the primary key.
Note that if the primary key is not a composite key, all non-key attributes are always fully functionally
dependent on the primary key. A table that is in first normal form and has a single attribute as its
primary key is automatically in second normal form.
The above table has a composite primary key [customer id, store id]. The non-key attribute is
[purchase location]. In this case, [purchase location] depends only on [store id], which is only part
of the primary key. Therefore, this table does not satisfy second normal form.
To bring this table to second normal form, we break the table into two tables and now we have the
following.
What we have done is remove the partial functional dependency that we initially had. Now, in
the table [table-store], the column [purchase location] is fully dependent on the primary key of
that table, which is [store id].
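The 2NF decomposition can be tried out with sqlite3. The partially dependent column moves into a store table keyed by store id. Joining the two tables back together then reproduces the original rows, so no information is lost. This is a sketch only. The table names purchase, customer_store, and store and all data values are hypothetical, and only the column roles follow the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical data: purchase_location depends only on store_id (part of the key).
cur.execute("CREATE TABLE purchase (customer_id INTEGER, store_id INTEGER,"
            " purchase_location TEXT, PRIMARY KEY (customer_id, store_id))")
cur.executemany("INSERT INTO purchase VALUES (?, ?, ?)", [
    (101, 1, "North"), (101, 3, "South"), (102, 1, "North"), (103, 2, "East"),
])

# 2NF decomposition: move the partially dependent column into its own table.
cur.execute("CREATE TABLE customer_store (customer_id INTEGER, store_id INTEGER,"
            " PRIMARY KEY (customer_id, store_id))")
cur.execute("CREATE TABLE store (store_id INTEGER PRIMARY KEY, purchase_location TEXT)")
cur.execute("INSERT INTO customer_store SELECT customer_id, store_id FROM purchase")
# DISTINCT collapses the repeats; the PRIMARY KEY would reject a store with two locations.
cur.execute("INSERT INTO store SELECT DISTINCT store_id, purchase_location FROM purchase")

rejoined = cur.execute(
    "SELECT cs.customer_id, cs.store_id, s.purchase_location"
    " FROM customer_store AS cs JOIN store AS s ON cs.store_id = s.store_id"
    " ORDER BY cs.customer_id, cs.store_id"
).fetchall()
original = cur.execute(
    "SELECT * FROM purchase ORDER BY customer_id, store_id").fetchall()
print(rejoined == original)  # True: the join recovers every original row
```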
A table is in third normal form if it satisfies the following conditions: i) It is in second normal
form. ii) There is no transitive functional dependency of a non-key attribute on the primary key.
Example:
In the above table, [book-id] determines [genre-id], and [genre-id] determines [genre-type]. Therefore,
[book-id] determines [genre-type] via [genre-id]. This is a transitive functional dependency, so this
structure does not satisfy third normal form.
To bring this table to the third normal form, we split the table into two as follows.
Now all non-key attributes are fully functionally dependent only on the primary key. In [table-book],
both [genre-id] and [price] depend only on [book-id].
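A transitive dependency can be spotted by finding a chain key → A → B in which A and B are both non-key attributes. The sketch below uses the book example's attributes. It is deliberately simplified: the function name transitive_dependencies is hypothetical, every determinant is a single attribute, and only the FDs listed explicitly are considered. After the split, neither table contains such a chain.

```python
def transitive_dependencies(key, fds):
    """Find chains key -> a -> b where a and b are non-key attributes.

    Simplified sketch: the key and every determinant are single attribute
    names, and only the FDs listed explicitly are considered.
    """
    chains = []
    for lhs1, a in fds:
        if lhs1 != key or a == key:
            continue
        for lhs2, b in fds:
            # a is a non-key attribute that itself determines another attribute b
            if lhs2 == a and b not in (key, a):
                chains.append((key, a, b))
    return chains

book_detail = [("book_id", "genre_id"), ("book_id", "price"), ("genre_id", "genre_type")]
print(transitive_dependencies("book_id", book_detail))
# [('book_id', 'genre_id', 'genre_type')]: violates 3NF

# After the split, each table has no transitive chain.
print(transitive_dependencies("book_id", [("book_id", "genre_id"), ("book_id", "price")]))  # []
print(transitive_dependencies("genre_id", [("genre_id", "genre_type")]))                    # []
```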
The relational data model includes several types of constraints whose purpose is to maintain
the accuracy and integrity of the data in the database. The major types of integrity constraints
are:
1. Domain Constraints
2. Entity Integrity
3. Referential Integrity
4. Operational constraints
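Three of these constraint types map directly onto SQL declarations: a CHECK for a domain constraint, a NOT NULL PRIMARY KEY for entity integrity, and a REFERENCES clause for referential integrity. The sketch below uses Python's sqlite3 to show the database rejecting rows that break them. The schema, the data, and the helper name violates are hypothetical. SQLite only enforces foreign keys after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when this is on

# Hypothetical schema, invented to illustrate three of the constraint types.
conn.execute("CREATE TABLE dept (dept_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE employee ("
    " emp_id TEXT NOT NULL PRIMARY KEY,"            # entity integrity: unique and never null
    " salary REAL CHECK (salary > 0),"              # domain constraint on salary values
    " dept_id INTEGER REFERENCES dept(dept_id))"    # referential integrity
)
conn.execute("INSERT INTO dept VALUES (1, 'Sales')")
conn.execute("INSERT INTO employee VALUES ('E1', 5000, 1)")

def violates(sql):
    """Return True if running sql is rejected with an integrity error."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

print(violates("INSERT INTO employee VALUES ('E2', -5, 1)"))     # True: domain
print(violates("INSERT INTO employee VALUES (NULL, 4000, 1)"))   # True: entity integrity
print(violates("INSERT INTO employee VALUES ('E1', 4000, 1)"))   # True: duplicate key
print(violates("INSERT INTO employee VALUES ('E3', 4000, 99)"))  # True: no dept 99
print(violates("INSERT INTO employee VALUES ('E4', 4000, 1)"))   # False: all constraints met
```

Operational constraints (business rules) are not shown here. They usually depend on the enterprise and are often enforced with triggers or in application code.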
1. What is a tuple?
In relational data structure terminology, a tuple is simply a record (a row of a relation).
8. What is normalization?
Normalization is a process in which we analyze and decompose complex relations and
transform them into smaller, simpler, and well-structured relations. It validates and improves
the logical design so that it satisfies certain constraints and avoids unnecessary
duplication of data.
9. What are the two problematic issues in the design of relational database?
The two most problematic issues in the design of relational databases are:
1. Repetition of information (redundancy)
2. Inability to represent certain information.