Unit 4 Dbms
Schema refinement means improving a schema by applying a systematic technique.
The principal technique of schema refinement is decomposition.
Normalization
It means “splitting tables into smaller tables with fewer attributes, in such a
way that the table design is free of insertion, deletion and update anomalies and
avoids redundancy”.
Normalization, or schema refinement, is a technique for organizing the data in a database.
It is a systematic approach to decomposing tables in order to eliminate data redundancy and undesirable
characteristics such as insertion, update and deletion anomalies.
Redundancy: the repetition of the same data, or duplicate copies of the same data stored in different
locations.
Anomalies: the problems that arise in poorly planned, un-normalized databases
where all the data is stored in one table, sometimes called a flat file database.
Redundancy of data can cause the following problems:
1. Insertion anomalies: It may not be possible to store some information unless some other information
is stored as well.
2. Redundant storage: Some information is stored repeatedly.
3. Update anomalies: If one copy of redundant data is updated, an inconsistency is created unless all
redundant copies of the data are updated.
4. Deletion anomalies: It may not be possible to delete some information without losing some other
information as well.
Update anomaly: If the fee changes from 5000 to 7000, then we
have to update the FEE column in every row that stores it; otherwise the data becomes inconsistent.
Insertion and deletion anomalies: These anomalies exist only because of redundancy; without it
they do not occur.
Insertion anomaly: A new course C4 is introduced, but no student has taken C4 yet.
To insert the course, we are forced to insert dummy student data as well.
Deletion anomaly: Deleting student S3 also deletes the course information stored in that row. Deleting some data
forces us to delete other useful data.
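The three anomalies can be reproduced on a toy table. This is a minimal sketch: the original table is not reproduced in these notes, so the rows, the fee of C2 and C4, and the names `enrolment` and `course_fee` are hypothetical.

```python
# Hypothetical rows (student_id, course_id, fee); the fee is repeated for every student.
rows = [
    ("S1", "C1", 5000),
    ("S2", "C1", 5000),
    ("S3", "C2", 6000),
]

# Update anomaly: changing the C1 fee in only one row leaves two different fees for C1.
partial = [("S1", "C1", 7000)] + rows[1:]
fees_for_c1 = {fee for (_, course, fee) in partial if course == "C1"}
print(fees_for_c1)

# Deletion anomaly: removing student S3 also removes the only record of course C2.
after_delete = [r for r in rows if r[0] != "S3"]
courses_left = {course for (_, course, _) in after_delete}
print(courses_left)

# After decomposition the fee is stored once per course.
enrolment = {(s, c) for (s, c, _) in rows}
course_fee = {c: fee for (_, c, fee) in rows}
course_fee["C4"] = 8000  # insertion: a new course needs no dummy student
course_fee["C1"] = 7000  # update: one change, no inconsistency
```

In the decomposed form, each fact is stored in exactly one place, which is why the anomalies disappear.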
➢ Normalization removes insert, update and delete anomalies during database activities.
➢ Normalization reduces a complex user view to a set of small subgroups of fields or relations. This
process helps to design a logical data model, also known as a conceptual data model.
Advantages of Normalization:
1. Greater overall database organization is gained.
2. The amount of unnecessary redundant data is reduced.
3. Data integrity is easily maintained within the database.
4. The database and application design processes are much more flexible.
5. Security is easier to maintain or manage.
Disadvantages of Normalization:
1. Normalization produces a lot of tables, each with a relatively small number of
columns. These tables then have to be joined using their primary/foreign key relationships.
2. This has two consequences:
Performance: all the joins required to merge data slow processing and place additional stress on the
hardware.
Complex queries: developers have to code complex queries in order to merge data from different tables.
Concept of Functional Dependency:
To practice: Check whether D→A can be derived from the following FDs: AB→C, BC→AD,
D→E, CF→B.
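One way to check the practice question is to compute the attribute closure of D and see whether it contains A. The sketch below assumes single-letter attribute names and represents each FD as a hypothetical (lhs, rhs) pair of strings; the function name `closure` is chosen here for illustration.

```python
def closure(attrs, fds):
    """Return the closure of `attrs` under `fds`, a list of (lhs, rhs) strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Fire an FD when its whole LHS is already in the closure.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

fds = [("AB", "C"), ("BC", "AD"), ("D", "E"), ("CF", "B")]
d_closure = closure("D", fds)
print(sorted(d_closure))   # ['D', 'E']
print("A" in d_closure)    # False, so D -> A cannot be derived
```

Only D→E fires when starting from D, so the closure stops at {D, E} and D→A is not implied by the given set.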
(2) Identification of keys by using the closure of a set of attributes:
A key attribute: An attribute that is capable of identifying all other attributes in a given table.
(i) Primary key: An attribute with unique values in a table, used to enforce entity integrity and to identify rows in
the table uniquely.
(ii) Composite primary key: Sometimes a single attribute is not sufficient to identify the rows in
the table uniquely, so we combine two or more attributes to identify the rows uniquely.
(iii) Candidate keys: Sometimes two or more independent attributes, or sets of attributes, can each be used to identify the
rows uniquely. E.g. (vehicle no, vehicle engine no, purchase date):
either vehicle no or vehicle engine no can be used as a key attribute, so both are called candidate keys.
One of the candidate keys can be chosen as the primary key.
Example 1: Find the candidate keys for the relation R(ABCD) having the following FDs: AB→CD, C→A,
D→A.
Sol: From the given FDs, the attribute B must be part of every key because it does not appear on the RHS of any functional
dependency.
B alone is not a key (B+ = B), so combine B with each other attribute: AB+ = ABCD, BC+ = ABCD (C→A, then AB→CD) and BD+ = ABCD (D→A, then AB→CD). AB, BC and BD are the candidate keys.
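Example 2: Find the candidate keys for the relation R(ABCDE) having the following FDs: A→BC, CD→E,
B→D, E→A.
Sol: From the given FDs, no attribute is forced into every key, because every attribute appears on the RHS of some functional dependency.
So check the closure of each LHS attribute set: A+ = ABCDE, E+ = ABCDE and CD+ = ABCDE. Checking the remaining pairs also gives BC+ = ABCDE, so the candidate keys are A, E, BC and CD.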
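Both examples can be cross-checked by brute force: try attribute sets in order of increasing size and keep those whose closure is the whole relation, skipping supersets of keys already found. This is a sketch under the assumption of single-letter attribute names; `candidate_keys` is an illustrative helper, not a standard routine.

```python
from itertools import combinations

def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs, rhs) strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(attributes, fds):
    """Return all minimal attribute sets whose closure covers `attributes`."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            if any(set(k) <= set(combo) for k in keys):
                continue  # a superset of a known key is not minimal
            if closure(combo, fds) == set(attributes):
                keys.append("".join(combo))
    return keys

ex1 = candidate_keys("ABCD", [("AB", "CD"), ("C", "A"), ("D", "A")])
ex2 = candidate_keys("ABCDE", [("A", "BC"), ("CD", "E"), ("B", "D"), ("E", "A")])
print(ex1)  # ['AB', 'BC', 'BD']
print(ex2)  # ['A', 'E', 'BC', 'CD']
```

For Example 1 the search reports BD alongside AB and BC, since D→A followed by AB→CD gives BD+ = ABCD. The search is exponential in the number of attributes, so it only suits small textbook relations.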
(3) To identify equivalence of sets of FDs:
Different database designers may define different FD sets F and G from the same requirements. The two sets
are equivalent if all FDs in G can be derived from F and vice versa.
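The equivalence test can be sketched with attribute closures: F covers G when, for every X→Y in G, Y lies inside the closure of X under F. The FD sets `F` and `G` below are illustrative and not taken from these notes; attribute names are assumed to be single letters.

```python
def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs, rhs) strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def covers(f, g):
    """True if every FD in g can be derived from f."""
    return all(set(rhs) <= closure(lhs, f) for lhs, rhs in g)

def equivalent(f, g):
    """Two FD sets are equivalent when each covers the other."""
    return covers(f, g) and covers(g, f)

# Illustrative FD sets (an assumption, not from the notes).
F = [("A", "C"), ("AC", "D"), ("E", "AD"), ("E", "H")]
G = [("A", "CD"), ("E", "AH")]
print(equivalent(F, G))  # True
```

Here A+ under F is ACD and E+ under F is ACDEH, so F covers G; the reverse check succeeds in the same way.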
(4) To identify the irreducible form of FDs / canonical form (minimal cover):
We try to minimize the set of functional dependencies. The minimized set must be equivalent to the original set.
Procedure to find the minimal set:
Step 1: Rewrite every FD so that it has a single attribute on the RHS.
Step 2: Evaluate each FD from step 1 for its necessity. If it can be derived from the others, remove it from
the list.
Step 3: Evaluate the necessity of the LHS attributes in the FDs obtained from step 2. If an attribute is not
necessary, remove it from the FD.
Step 4: Apply the union rule to the FDs from step 3 that share the same LHS. The result
is the irreducible set.
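The four steps can be sketched as follows, assuming single-letter attribute names and FDs given as (lhs, rhs) strings. One addition to the listed steps: the redundancy check of step 2 is repeated after step 3, because shrinking a LHS can make another FD derivable. The input FD set is illustrative, not from these notes.

```python
def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs, rhs) strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def remove_redundant(fds):
    """Drop every FD that can be derived from the remaining ones."""
    fds = list(fds)
    for fd in list(fds):
        rest = [g for g in fds if g != fd]
        if set(fd[1]) <= closure(fd[0], rest):
            fds = rest
    return fds

def minimal_cover(fds):
    # Step 1: a single attribute on every RHS.
    step1 = []
    for lhs, rhs in fds:
        for a in rhs:
            fd = ("".join(sorted(set(lhs))), a)
            if fd not in step1:
                step1.append(fd)
    # Step 2: remove FDs that are not necessary.
    step2 = remove_redundant(step1)
    # Step 3: remove LHS attributes that are not necessary.
    step3 = []
    for lhs, rhs in step2:
        for a in lhs:
            reduced = lhs.replace(a, "")
            if reduced and rhs in closure(reduced, step2):
                lhs = reduced
        if (lhs, rhs) not in step3:
            step3.append((lhs, rhs))
    # Added pass (not in the listed steps): re-check redundancy after step 3.
    step3 = remove_redundant(step3)
    # Step 4: union rule for FDs with a common LHS.
    merged = {}
    for lhs, rhs in step3:
        merged[lhs] = merged.get(lhs, "") + rhs
    return sorted((lhs, "".join(sorted(rhs))) for lhs, rhs in merged.items())

cover = minimal_cover([("A", "BC"), ("B", "C"), ("A", "B"), ("AB", "C")])
print(cover)  # [('A', 'B'), ('B', 'C')]
```

In this input A→C and AB→C are both derivable from A→B and B→C, so only those two FDs survive. A minimal cover is generally not unique; a different processing order can give a different but equivalent result.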
To Practice:
Normal forms based on functional dependency (1NF, 2NF and 3 NF, Boyce Codd normal form
(BCNF), 4NF)
The evolution of normalization theory (the steps of normalization, or the different normal forms) is
listed below:
1. First Normal Form (1NF)
2. Second Normal Form (2NF)
3. Third Normal Form (3NF)
4. Boyce-Codd Normal Form (BCNF)
5. Fourth Normal Form (4NF)
6. Fifth Normal Form (5NF).
Points to Remember
➢ If the tables are derived from well-constructed E-R diagrams, 4NF and 5NF usually need not be applied to the
tables.
➢ In practice, normalization is applied up to 3NF; we rarely go beyond that.
➢ 2NF deals with partial dependencies and 3NF deals with transitive dependencies.
First Normal Form (1NF):
A relation in un-normalized form is said to be in 1NF once it satisfies the following
conditions:
1. Each attribute name must be unique.
2. Each attribute value must be single or atomic, i.e., single-valued attributes only.
3. Each row / record must be unique.
4. There are no repeating groups.
Example: How do we bring an un-normalized table into first normal form? Consider the following
relation:
Solution: This table is not in first normal form because the [Color] column can contain multiple
values. For example, the first row includes values "red" and "green." To bring this table to first
normal form, we split the table into two tables and now we have the resulting tables:
Second Normal Form (2NF):
➔ This table has a composite primary key [Customer ID, Store ID]. The non-key attribute is
[Purchase Location]. In this case, [Purchase Location] depends only on [Store ID], which is only part
of the primary key. Therefore, this table does not satisfy second normal form.
➔ To bring this table to second normal form, we break it into two tables: one with the columns
[Customer ID, Store ID] and one with the columns [Store ID, Purchase Location].
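A partial dependency can be detected mechanically: take every proper subset of the composite key and check whether its closure already contains a non-key attribute. The sketch below uses the column names from the example; the helper name `partial_dependencies` and the tuple-based FD representation are assumptions made for illustration.

```python
from itertools import combinations

def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs_tuple, rhs_tuple) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def partial_dependencies(key, non_key, fds):
    """List (part_of_key, attribute) pairs that violate 2NF."""
    found = []
    for size in range(1, len(key)):          # proper subsets of the key only
        for part in combinations(key, size):
            for attr in non_key:
                if attr in closure(part, fds):
                    found.append((part, attr))
    return found

key = ("Customer ID", "Store ID")
non_key = ["Purchase Location"]
fds = [(("Store ID",), ("Purchase Location",))]
violations = partial_dependencies(key, non_key, fds)
print(violations)  # [(('Store ID',), 'Purchase Location')]
```

Each reported pair suggests a table to split off: the part of the key together with the attribute it determines, here [Store ID, Purchase Location].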
Third Normal Form (3NF):
➢ By transitive functional dependency, we mean the following relationships in the table: B
is functionally dependent on A (A → B), and C is functionally dependent on B (B → C). In this case, C is transitively
dependent on A via B, and a non-key attribute depends on another non-key attribute.
Example: Consider the following relation
➔ In the table above, [Book ID] determines [Genre ID], and [Genre ID] determines [Genre Type].
Therefore, [Book ID] determines [Genre Type] via [Genre ID]. This is a transitive functional
dependency, so this structure does not satisfy third normal form.
➔ To bring this table to third normal form, we split the table into two: one keyed by [Book ID] that keeps [Genre ID], and one with the columns [Genre ID, Genre Type].
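The 3NF condition can be checked per FD: a dependency X → A is a violation when X is not a super key and A is not part of any candidate key. The sketch below uses only the three columns named in the example (any other columns of the original table are not assumed), and `violates_3nf` is an illustrative helper.

```python
def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs_tuple, rhs_tuple) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def violates_3nf(attributes, fds, candidate_keys):
    """Return (lhs, attribute) pairs where a non-key attribute depends on a non-super-key."""
    prime = set().union(*candidate_keys)     # attributes that belong to some key
    violations = []
    for lhs, rhs in fds:
        is_superkey = closure(lhs, fds) == set(attributes)
        for a in rhs:
            if a in lhs:
                continue                      # trivial dependency
            if not is_superkey and a not in prime:
                violations.append((lhs, a))
    return violations

attributes = ["Book ID", "Genre ID", "Genre Type"]
fds = [(("Book ID",), ("Genre ID",)), (("Genre ID",), ("Genre Type",))]
result = violates_3nf(attributes, fds, [("Book ID",)])
print(result)  # [(('Genre ID',), 'Genre Type')]
```

The reported dependency [Genre ID] → [Genre Type] is the one to move into its own table.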
Boyce-Codd Normal Form (BCNF):
✓ A table is in BCNF if, for every non-trivial functional dependency X → Y, X is a super key of the table.
✓ In other words, the table should be in 3NF and the LHS of every FD must be a super key.
Example: Let's assume there is a company where employees work in more than one department.
EMPLOYEE table:
➔ In the above table the functional dependencies are as follows: EMP_ID → EMP_COUNTRY and
EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}. Candidate key: {EMP_ID, EMP_DEPT}
➔ The table is not in BCNF because neither EMP_DEPT nor EMP_ID alone is a key. To convert the
given table into BCNF, we decompose it into three tables: (EMP_ID, EMP_COUNTRY), (EMP_DEPT, DEPT_TYPE, EMP_DEPT_NO) and (EMP_ID, EMP_DEPT).
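The BCNF test for this example can be sketched by checking whether the LHS of each FD is a super key. The three-way split used below (one table per FD plus one for the key) is an assumption, since the resulting tables are not reproduced in these notes. Restricting the FDs to those that lie entirely inside a schema is a simplification of a full FD projection; it is adequate for this small example.

```python
def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs_tuple, rhs_tuple) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def bcnf_violations(attributes, fds):
    """Return the non-trivial FDs inside `attributes` whose LHS is not a super key."""
    attributes = set(attributes)
    # Simplification: keep only FDs that mention attributes of this schema.
    local = [(l, r) for l, r in fds if set(l) | set(r) <= attributes]
    return [(l, r) for l, r in local
            if not set(r) <= set(l) and not attributes <= closure(l, local)]

fds = [
    (("EMP_ID",), ("EMP_COUNTRY",)),
    (("EMP_DEPT",), ("DEPT_TYPE", "EMP_DEPT_NO")),
]
employee = ["EMP_ID", "EMP_COUNTRY", "EMP_DEPT", "DEPT_TYPE", "EMP_DEPT_NO"]
print(len(bcnf_violations(employee, fds)))  # 2, both FDs violate BCNF

# Assumed decomposition: one table per FD plus one for the candidate key.
decomposed = [
    ["EMP_ID", "EMP_COUNTRY"],
    ["EMP_DEPT", "DEPT_TYPE", "EMP_DEPT_NO"],
    ["EMP_ID", "EMP_DEPT"],
]
print([bcnf_violations(t, fds) for t in decomposed])  # [[], [], []]
```

In each of the smaller tables the LHS of every remaining FD determines the whole table, so none of them violates BCNF.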
Fourth Normal Form (4NF):
✓ For a dependency A → B, if multiple values of B exist for a single value of A, then the dependency is a
multi-valued dependency.
✓ Note: Multi-valued dependency: A table is said to have a multi-valued dependency if the following
conditions are true:
1. For a dependency A → B, multiple values of B exist for a single value of A.
2. The table has at least 3 columns.
3. For a relation R (A, B, C), if there is a multi-valued dependency between A and B, then B and C
are independent of each other.
■ If all these conditions are true for a relation (table), it is said to have a multi-valued dependency.
Example
➢ The given STUDENT table is in 3NF, but COURSE and HOBBY are two independent
entities, so there is no relationship between COURSE and HOBBY. In the STUDENT
relation, the student with STU_ID 21 has two courses, Computer and Math, and two hobbies,
Dancing and Singing. So there is a multi-valued dependency on STU_ID, which leads to
unnecessary repetition of data.
➢ To bring the above table into 4NF, we can decompose it into two tables: one with (STU_ID, COURSE) and one with (STU_ID, HOBBY).
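The repetition and its removal can be illustrated with the values given for student 21. This is a minimal sketch: the variable names `student_course` and `student_hobby` are hypothetical, and only the one student mentioned in the text is modelled.

```python
from itertools import product

courses = ["Computer", "Math"]
hobbies = ["Dancing", "Singing"]

# Because COURSE and HOBBY are independent, every combination must be stored.
student = {(21, c, h) for c, h in product(courses, hobbies)}
print(len(student))  # 4 rows for 2 courses and 2 hobbies

# 4NF decomposition: one table per independent multi-valued fact.
student_course = {(s, c) for s, c, _ in student}
student_hobby = {(s, h) for s, _, h in student}
print(len(student_course), len(student_hobby))  # 2 2

# Joining the two tables on STU_ID gives back the original rows (lossless).
rejoined = {(s, c, h)
            for s, c in student_course
            for s2, h in student_hobby if s == s2}
print(rejoined == student)  # True
```

The single table grows with the product of the number of courses and hobbies per student, while the decomposed tables grow with their sum.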
Fifth Normal Form (5NF):
In the above table, John takes both the Computer and Math classes for Semester 1, but he doesn't take the Math
class for Semester 2. In this case, the combination of all these fields is required to identify valid data.
Suppose we add a new semester, Semester 3, but do not yet know the subject or who will be taking
that subject, so we would leave Lecturer and Subject as NULL. But all three columns together act as the primary
key, so we can't leave the other two columns blank.
So to bring the above table into 5NF, we can decompose it into three relations.