
NIGERIAN DEFENCE ACADEMY

DEPARTMENT OF COMPUTER SCIENCE

CS 911 – ADVANCED DATABASE TECHNOLOGY & APPLICATION

PRESENTATION ON DATABASE NORMALIZATION


BY

ANAS ABDUSSALAM
NDAPGS/FMSIS/COM012024/3918
YAKUBU ERNEST NWUKU
NDAPG/FMSIS/COM012024/5184

PhD COMPUTER SCIENCE

Database Normalization

Database normalization is the process of organizing data in a database to minimize redundancy
and ensure data integrity. It involves dividing a database into related tables and defining
relationships between them to avoid anomalies (such as insertion, deletion, or update anomalies).
The goal is to ensure that each piece of data is stored only once, in the most appropriate place,
and that relationships between the data are clearly defined.

Normalization typically involves a series of steps or "normal forms" where each normal form
addresses specific types of redundancy and dependency issues. These normal forms include:

1. First Normal Form (1NF)
2. Second Normal Form (2NF)
3. Third Normal Form (3NF)
4. Boyce-Codd Normal Form (BCNF)
5. Fourth Normal Form (4NF)
6. Fifth Normal Form (5NF)

To aid our understanding of each of these normal forms, it is crucial that we first discuss
Functional Dependencies and Inference Rules.

Functional Dependency

A functional dependency (FD) is a constraint between two sets of attributes of a relation. X → Y
means that any two rows that agree on X must also agree on Y. In a well-designed table, FDs
typically hold between the primary key and the non-key attributes.

X → Y

The left side of an FD is known as the determinant, and the right side is known as the
dependent.

For example:

Assume we have an employee table with attributes: Emp_Id, Emp_Name, Emp_Address.

Here the Emp_Id attribute uniquely identifies the Emp_Name attribute of the employee table,
because if we know the Emp_Id, we can tell the employee name associated with it.

This functional dependency can be written as:

Emp_Id → Emp_Name

We say that Emp_Name is functionally dependent on Emp_Id.
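
The definition can be tested against actual rows. The following is a minimal Python sketch, not part of the original presentation: the function name fd_holds and the sample rows are hypothetical, chosen only to illustrate the employee table above.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the FD lhs -> rhs holds in this relation instance."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns it.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical sample rows for the employee table described above.
employees = [
    {"Emp_Id": 1, "Emp_Name": "Amina", "Emp_Address": "Kaduna"},
    {"Emp_Id": 2, "Emp_Name": "Bello", "Emp_Address": "Abuja"},
    {"Emp_Id": 3, "Emp_Name": "Amina", "Emp_Address": "Lagos"},
]

print(fd_holds(employees, ["Emp_Id"], ["Emp_Name"]))       # True
print(fd_holds(employees, ["Emp_Name"], ["Emp_Address"]))  # False
```

Note that a set of rows can only refute an FD. Whether Emp_Id → Emp_Name holds for every possible state of the table is a design decision about the data, not something a sample can prove.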

Types of Functional dependency

1. Trivial functional dependency

o A → B is a trivial functional dependency if B is a subset of A.
o Dependencies such as A → A and B → B are therefore also trivial.

Example:

Consider a table with two columns, Employee_Id and Employee_Name.
{Employee_Id, Employee_Name} → Employee_Id is a trivial functional dependency, because
Employee_Id is a subset of {Employee_Id, Employee_Name}.
Employee_Id → Employee_Id and Employee_Name → Employee_Name are trivial dependencies too.

2. Non-trivial functional dependency

o A → B is a non-trivial functional dependency if B is not a subset of A.
o When the intersection of A and B is empty, A → B is called completely non-trivial.

Example:

ID → Name
Name → DOB
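
The three cases can be told apart with plain set operations. This is a small illustrative sketch: the function name classify_fd is an assumption, and the third call uses a made-up dependency to show the middle case.

```python
def classify_fd(lhs, rhs):
    """Classify the FD lhs -> rhs using the subset and intersection tests."""
    lhs, rhs = set(lhs), set(rhs)
    if rhs <= lhs:
        return "trivial"                 # B is a subset of A
    if lhs & rhs:
        return "non-trivial"             # B is not a subset of A, but they overlap
    return "completely non-trivial"      # A and B share no attribute


print(classify_fd({"Employee_Id", "Employee_Name"}, {"Employee_Id"}))  # trivial
print(classify_fd({"ID"}, {"Name"}))                # completely non-trivial
print(classify_fd({"ID", "Name"}, {"Name", "DOB"}))  # non-trivial (hypothetical FD)
```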

Inference Rules

In the context of Database Management Systems (DBMS), inference rules are used primarily
in relational database theory to derive new dependencies from a given set of functional
dependencies (FDs). These rules are particularly useful when reasoning about normalization,
database design, and query optimization.

Armstrong's Axioms

The most commonly known inference rules in DBMS are Armstrong's Axioms, which are a set
of sound and complete rules used to infer all possible functional dependencies from a given set.
These are fundamental in normalizing databases and ensuring the correctness of database
designs.

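Armstrong's axioms proper are the first three rules below (IR1 to IR3); the remaining three
(IR4 to IR6) are additional inference rules that can be derived from them: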

1. Reflexive Rule (IR1)

In the reflexive rule, if Y is a subset of X, then X determines Y.

If X ⊇ Y then X → Y

Example:

X = {a, b, c, d, e}
Y = {a, b, c}

Since Y is a subset of X, X → Y holds.

2. Augmentation Rule (IR2)

In augmentation, if X determines Y, then XZ determines YZ for any set of attributes Z.

If X → Y then XZ → YZ

Example:

For R(ABCD), if A → B then AC → BC

3. Transitive Rule (IR3)

In the transitive rule, if X determines Y and Y determines Z, then X must also determine Z.

If X → Y and Y → Z then X → Z

4. Union Rule (IR4)

The union rule says that if X determines Y and X determines Z, then X must also determine Y and
Z together.

If X → Y and X → Z then X → YZ

Proof:

1. X → Y (given)
2. X → Z (given)
3. X → XY (using IR2 on 1 by augmentation with X. Where XX = X)
4. XY → YZ (using IR2 on 2 by augmentation with Y)
5. X → YZ (using IR3 on 3 and 4)

5. Decomposition Rule (IR5)

The decomposition rule is also known as the projection rule. It is the reverse of the union rule.

This rule says that if X determines Y and Z together, then X determines Y and X determines Z
separately.

If X → YZ then X → Y and X → Z

Proof:

1. X → YZ (given)
2. YZ → Y (using IR1)
3. X → Y (using IR3 on 1 and 2)
4. YZ → Z (using IR1)
5. X → Z (using IR3 on 1 and 4)

6. Pseudo transitive Rule (IR6)

In the pseudo-transitive rule, if X determines Y and YZ determines W, then XZ determines W.

If X → Y and YZ → W then XZ → W

Proof:

1. X → Y (given)
2. YZ → W (given)
3. XZ → YZ (using IR2 on 1 by augmenting with Z)
4. XZ → W (using IR3 on 3 and 2)
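
The derived rules can also be checked mechanically. A standard technique, used here as a sketch rather than as the presentation's own method, is to compute the attribute closure of the left-hand side and test whether it contains the right-hand side. The helper names closure and implies are assumptions, and each attribute is a single letter so that a string such as "YZ" stands for the set {Y, Z}.

```python
def closure(attrs, fds):
    """Return the set of attributes determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already have the whole determinant, we gain the dependent.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def implies(fds, lhs, rhs):
    """True if lhs -> rhs can be derived from fds."""
    return set(rhs) <= closure(lhs, fds)


print(implies([("X", "Y"), ("X", "Z")], "X", "YZ"))    # union rule: True
print(implies([("X", "YZ")], "X", "Z"))                # decomposition rule: True
print(implies([("X", "Y"), ("YZ", "W")], "XZ", "W"))   # pseudo-transitive rule: True
print(implies([("X", "Y")], "Y", "X"))                 # not derivable: False
```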

Application of Inference Rules in DBMS

These inference rules are used in various database operations:

1. Database Normalization: They help in decomposing tables to remove redundancy and
ensure that the database is in the desired normal form (1NF, 2NF, 3NF, etc.).
2. Functional Dependency Closure: You can use these rules to find the closure of a set of
functional dependencies, which helps in identifying all implied FDs.
3. Candidate Key Identification: Inference rules are also used to deduce keys in a
relational schema by deriving FDs and determining minimal super keys.
4. Query Optimization: In query optimization, these rules are applied to rewrite queries in
more efficient forms or ensure that dependencies are satisfied during query execution.
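
As an illustration of candidate key identification, the sketch below searches attribute subsets by increasing size and keeps those whose closure covers the whole schema. It is a brute-force approach that is only practical for small schemas, and the function names are assumptions. The schema and FDs are those of the EMPLOYEE table used in the BCNF example of this presentation.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Return all minimal attribute sets whose closure is the whole schema."""
    attributes = set(attributes)
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            candidate = set(combo)
            # Skip supersets of a key already found: they are not minimal.
            if any(key <= candidate for key in keys):
                continue
            if closure(candidate, fds) == attributes:
                keys.append(candidate)
    return keys


schema = {"EMP_ID", "EMP_COUNTRY", "EMP_DEPT", "DEPT_TYPE", "EMP_DEPT_NO"}
fds = [
    ({"EMP_ID"}, {"EMP_COUNTRY"}),
    ({"EMP_DEPT"}, {"DEPT_TYPE", "EMP_DEPT_NO"}),
]
print(candidate_keys(schema, fds))  # one key: EMP_ID together with EMP_DEPT
```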

Forms of Normalization
Below is an explanation of each of the six normal forms.

First Normal Form (1NF)

Ensures that each column contains atomic (indivisible) values, and each entry in a column is of
the same data type. There should be no repeating groups or arrays in a table.

Example: The relation EMPLOYEE is not in 1NF because of the multi-valued attribute EMP_PHONE.

EMPLOYEE table:

EMP_ID  EMP_NAME  EMP_PHONE               EMP_STATE
14      John      7272826385, 9064738238  UP
20      Harry     8574783832              Bihar
12      Sam       7390372389, 8589830302  Punjab

The EMPLOYEE table converted into 1NF is shown below, with one row per phone number:

EMP_ID  EMP_NAME  EMP_PHONE   EMP_STATE
14      John      7272826385  UP
14      John      9064738238  UP
20      Harry     8574783832  Bihar
12      Sam       7390372389  Punjab
12      Sam       8589830302  Punjab
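
The conversion above can be sketched in a few lines of Python. The function name to_1nf and the assumption that the phone numbers are stored as one comma-separated string are illustrative only.

```python
def to_1nf(rows, multi_attr, sep=","):
    """Emit one row per value of a multi-valued (separator-delimited) attribute."""
    flat = []
    for row in rows:
        for value in row[multi_attr].split(sep):
            # Copy the row and overwrite the multi-valued cell with one value.
            flat.append({**row, multi_attr: value.strip()})
    return flat


employee = [
    {"EMP_ID": 14, "EMP_NAME": "John", "EMP_PHONE": "7272826385, 9064738238", "EMP_STATE": "UP"},
    {"EMP_ID": 20, "EMP_NAME": "Harry", "EMP_PHONE": "8574783832", "EMP_STATE": "Bihar"},
    {"EMP_ID": 12, "EMP_NAME": "Sam", "EMP_PHONE": "7390372389, 8589830302", "EMP_STATE": "Punjab"},
]

for row in to_1nf(employee, "EMP_PHONE"):
    print(row["EMP_ID"], row["EMP_NAME"], row["EMP_PHONE"], row["EMP_STATE"])
```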

Second Normal Form (2NF)

Builds on 1NF by ensuring that all non-key attributes are fully dependent on the primary key. It
eliminates partial dependencies where a non-key attribute depends only on part of a composite
key.

Example: Assume a school stores data about its teachers and the subjects they teach. In the
school, a teacher can teach more than one subject.

TEACHER table:

TEACHER_ID  SUBJECT    TEACHER_AGE
25          Chemistry  30
25          Biology    30
47          English    35
83          Math       38
83          Computer   38

In the given table, the candidate key is {TEACHER_ID, SUBJECT}. The non-prime attribute
TEACHER_AGE depends on TEACHER_ID alone, which is a proper subset of that candidate key. That is
why the table violates the rule for 2NF.

To convert the given table into 2NF, we decompose it into two tables:

TEACHER_DETAIL table:

TEACHER_ID  TEACHER_AGE
25          30
47          35
83          38

TEACHER_SUBJECT table:

TEACHER_ID  SUBJECT
25          Chemistry
25          Biology
47          English
83          Math
83          Computer
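
The 2NF decomposition above amounts to two projections of the TEACHER table. The sketch below, with the assumed helper name project, builds both tables and then joins them back on TEACHER_ID to confirm that no information was lost.

```python
def project(rows, attrs):
    """Project rows onto attrs, dropping duplicate tuples but keeping order."""
    seen, out = set(), []
    for row in rows:
        values = tuple(row[a] for a in attrs)
        if values not in seen:
            seen.add(values)
            out.append(dict(zip(attrs, values)))
    return out


teacher = [
    {"TEACHER_ID": 25, "SUBJECT": "Chemistry", "TEACHER_AGE": 30},
    {"TEACHER_ID": 25, "SUBJECT": "Biology", "TEACHER_AGE": 30},
    {"TEACHER_ID": 47, "SUBJECT": "English", "TEACHER_AGE": 35},
    {"TEACHER_ID": 83, "SUBJECT": "Math", "TEACHER_AGE": 38},
    {"TEACHER_ID": 83, "SUBJECT": "Computer", "TEACHER_AGE": 38},
]

teacher_detail = project(teacher, ["TEACHER_ID", "TEACHER_AGE"])
teacher_subject = project(teacher, ["TEACHER_ID", "SUBJECT"])

# Join back on TEACHER_ID: look up each teacher's age in TEACHER_DETAIL.
age_of = {r["TEACHER_ID"]: r["TEACHER_AGE"] for r in teacher_detail}
rejoined = [{**r, "TEACHER_AGE": age_of[r["TEACHER_ID"]]} for r in teacher_subject]

print(len(teacher_detail))   # 3: each age is now stored once per teacher
print(rejoined == teacher)   # True: the decomposition is lossless
```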

Third Normal Form (3NF)

Builds on 2NF by ensuring that no non-key attribute is dependent on another non-key attribute, a
condition known as transitive dependency. This means all non-key attributes must depend
directly on the primary key.

A relation is in third normal form if at least one of the following conditions holds for every
non-trivial functional dependency X → Y:

1. X is a super key.
2. Y is a prime attribute, i.e., each element of Y is part of some candidate key.

Example:

EMPLOYEE_DETAIL table:

EMP_ID  EMP_NAME   EMP_ZIP  EMP_STATE  EMP_CITY
222     Harry      201010   UP         Noida
333     Stephan    02228    US         Boston
444     Lan        60007    US         Chicago
555     Katharine  06389    UK         Norwich
666     John       462007   MP         Bhopal

Super keys in the table above:

{EMP_ID}, {EMP_ID, EMP_NAME}, {EMP_ID, EMP_NAME, EMP_ZIP} and so on

Candidate key: {EMP_ID}

Non-prime attributes: In the given table, all attributes except EMP_ID are non-prime.

Here, EMP_STATE and EMP_CITY depend on EMP_ZIP, and EMP_ZIP depends on EMP_ID. The non-prime
attributes (EMP_STATE, EMP_CITY) are therefore transitively dependent on the super key (EMP_ID).
This violates the rule of third normal form.

That is why we need to move EMP_CITY and EMP_STATE to a new EMPLOYEE_ZIP table, with EMP_ZIP as
its primary key.

EMPLOYEE table:

EMP_ID  EMP_NAME   EMP_ZIP
222     Harry      201010
333     Stephan    02228
444     Lan        60007
555     Katharine  06389
666     John       462007

EMPLOYEE_ZIP table:

EMP_ZIP  EMP_STATE  EMP_CITY
201010   UP         Noida
02228    US         Boston
60007    US         Chicago
06389    UK         Norwich
462007   MP         Bhopal
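
The two 3NF conditions can be applied directly to a list of FDs once the candidate keys are known. The following sketch is illustrative: the function name violates_3nf is an assumption, and the FD EMP_ID → {EMP_NAME, EMP_ZIP} is written out explicitly from the statement that EMP_ID is the candidate key.

```python
def violates_3nf(fds, candidate_keys):
    """List (determinant, attribute) pairs that break both 3NF conditions."""
    prime = set().union(*candidate_keys)  # attributes in some candidate key
    violations = []
    for lhs, rhs in fds:
        is_super_key = any(key <= lhs for key in candidate_keys)
        for attr in sorted(rhs - lhs):  # ignore the trivial part of the FD
            if not is_super_key and attr not in prime:
                violations.append((sorted(lhs), attr))
    return violations


# EMPLOYEE_DETAIL before decomposition.
detail_fds = [
    ({"EMP_ID"}, {"EMP_NAME", "EMP_ZIP"}),
    ({"EMP_ZIP"}, {"EMP_STATE", "EMP_CITY"}),
]
print(violates_3nf(detail_fds, [{"EMP_ID"}]))
# [(['EMP_ZIP'], 'EMP_CITY'), (['EMP_ZIP'], 'EMP_STATE')]

# After decomposition, each table is checked against its own key.
print(violates_3nf([detail_fds[0]], [{"EMP_ID"}]))   # []
print(violates_3nf([detail_fds[1]], [{"EMP_ZIP"}]))  # []
```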

Boyce-Codd Normal Form (BCNF)

BCNF is an advanced version of 3NF and is stricter than 3NF. A table is in BCNF if, for every
non-trivial functional dependency X → Y, X is a super key of the table. In other words, the table
should be in 3NF, and the left-hand side of every non-trivial FD must be a super key.

Example: Let's assume there is a company where employees work in more than one department.

EMPLOYEE table:

EMP_ID  EMP_COUNTRY  EMP_DEPT    DEPT_TYPE  EMP_DEPT_NO
264     India        Designing   D394       283
264     India        Testing     D394       300
364     UK           Stores      D283       232
364     UK           Developing  D283       549

In the above table, the functional dependencies are as follows:

EMP_ID → EMP_COUNTRY
EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}

Candidate key: {EMP_ID, EMP_DEPT}

The table is not in BCNF because neither EMP_DEPT nor EMP_ID alone is a super key.

To convert the given table into BCNF, we decompose it into three tables:

EMP_COUNTRY table:

EMP_ID  EMP_COUNTRY
264     India
364     UK

EMP_DEPT table:

EMP_DEPT    DEPT_TYPE  EMP_DEPT_NO
Designing   D394       283
Testing     D394       300
Stores      D283       232
Developing  D283       549

EMP_DEPT_MAPPING table:

EMP_ID  EMP_DEPT
264     Designing
264     Testing
364     Stores
364     Developing

Functional dependencies:

EMP_ID → EMP_COUNTRY
EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}

Candidate keys:

For the first table: EMP_ID
For the second table: EMP_DEPT
For the third table: {EMP_ID, EMP_DEPT}

Now the design is in BCNF, because the left-hand side of each functional dependency is a key of
the table in which it appears.
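
The decomposition can be reproduced by projecting the original EMPLOYEE rows onto the three attribute sets. The sketch below also checks, at the level of the sample rows only, that each determinant is unique in its own table. The helper names project and is_key are assumptions, and row uniqueness in a sample is a sanity check rather than a proof that an attribute set is a key.

```python
HEADER = ("EMP_ID", "EMP_COUNTRY", "EMP_DEPT", "DEPT_TYPE", "EMP_DEPT_NO")
EMPLOYEE = [
    (264, "India", "Designing", "D394", 283),
    (264, "India", "Testing", "D394", 300),
    (364, "UK", "Stores", "D283", 232),
    (364, "UK", "Developing", "D283", 549),
]


def project(rows, header, attrs):
    """Project tuple rows onto attrs, removing duplicates (result is sorted)."""
    idx = [header.index(a) for a in attrs]
    return sorted({tuple(row[i] for i in idx) for row in rows})


def is_key(rows, header, attrs):
    """True if no two rows share the same values for attrs."""
    idx = [header.index(a) for a in attrs]
    values = [tuple(row[i] for i in idx) for row in rows]
    return len(values) == len(set(values))


country_cols = ("EMP_ID", "EMP_COUNTRY")
dept_cols = ("EMP_DEPT", "DEPT_TYPE", "EMP_DEPT_NO")
mapping_cols = ("EMP_ID", "EMP_DEPT")

emp_country = project(EMPLOYEE, HEADER, country_cols)
emp_dept = project(EMPLOYEE, HEADER, dept_cols)
emp_dept_mapping = project(EMPLOYEE, HEADER, mapping_cols)

print(emp_country)                                    # [(264, 'India'), (364, 'UK')]
print(is_key(EMPLOYEE, HEADER, ("EMP_ID",)))          # False: 264 and 364 repeat
print(is_key(emp_country, country_cols, ("EMP_ID",)))  # True
print(is_key(emp_dept, dept_cols, ("EMP_DEPT",)))      # True
print(is_key(emp_dept_mapping, mapping_cols, mapping_cols))  # True
```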

Fourth Normal Form (4NF)

A relation is in 4NF if it is in Boyce-Codd normal form and has no non-trivial multi-valued
dependency other than those whose determinant is a super key. A multi-valued dependency A →→ B
exists when a single value of A is associated with a set of values of B, and that set is
independent of the other attributes of the relation.

Example

STUDENT

STU_ID  COURSE     HOBBY
21      Computer   Dancing
21      Math       Singing
34      Chemistry  Dancing
74      Biology    Cricket
59      Physics    Hockey

The given STUDENT table is in 3NF, but COURSE and HOBBY are two independent entities: there is
no relationship between a student's courses and that student's hobbies.

In the STUDENT relation, the student with STU_ID 21 has two courses, Computer and Math, and two
hobbies, Dancing and Singing. So there are multi-valued dependencies on STU_ID, which lead to
unnecessary repetition of data.

So, to bring the above table into 4NF, we can decompose it into two tables:

STUDENT_COURSE

STU_ID  COURSE
21      Computer
21      Math
34      Chemistry
74      Biology
59      Physics

STUDENT_HOBBY

STU_ID  HOBBY
21      Dancing
21      Singing
34      Dancing
74      Cricket
59      Hockey
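
A multi-valued dependency can be tested on a set of rows by checking that, for each value of the determinant, every combination of the dependent values with the remaining values is present. The sketch below is illustrative only: satisfies_mvd is an assumed name, and the sample rows are hypothetical. They list all four course and hobby combinations for student 21, which is the repetition a single combined table would need in order to keep the two facts independent.

```python
from itertools import product


def satisfies_mvd(rows, x, y):
    """Check the multi-valued dependency x ->> y on a list of dict rows."""
    attrs = list(rows[0].keys())
    z = [a for a in attrs if a not in x and a not in y]  # remaining attributes
    groups = {}
    for row in rows:
        key = tuple(row[a] for a in x)
        pair = (tuple(row[a] for a in y), tuple(row[a] for a in z))
        groups.setdefault(key, set()).add(pair)
    for pairs in groups.values():
        y_values = {p[0] for p in pairs}
        z_values = {p[1] for p in pairs}
        # Every y value must appear with every z value for the same x value.
        if pairs != set(product(y_values, z_values)):
            return False
    return True


# Hypothetical rows: all course/hobby combinations for student 21.
student = [
    {"STU_ID": 21, "COURSE": "Computer", "HOBBY": "Dancing"},
    {"STU_ID": 21, "COURSE": "Computer", "HOBBY": "Singing"},
    {"STU_ID": 21, "COURSE": "Math", "HOBBY": "Dancing"},
    {"STU_ID": 21, "COURSE": "Math", "HOBBY": "Singing"},
    {"STU_ID": 34, "COURSE": "Chemistry", "HOBBY": "Dancing"},
]

print(satisfies_mvd(student, ["STU_ID"], ["COURSE"]))      # True
print(satisfies_mvd(student[:3], ["STU_ID"], ["COURSE"]))  # False: one combination missing
```

Storing courses and hobbies in the two separate tables above avoids having to maintain these combinations at all.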

Fifth Normal Form (5NF)

A relation is in 5NF if it is in 4NF and every non-trivial join dependency in it is implied by
its candidate keys, so that it cannot be decomposed any further into smaller tables without
losing information. Any decomposition that is performed must be lossless, meaning that joining
the smaller tables gives back exactly the original relation. 5NF is also known as Project-join
normal form (PJ/NF).

Example

SUBJECT    LECTURER  SEMESTER
Computer   Anshika   Semester 1
Computer   John      Semester 1
Math       John      Semester 1
Math       Akash     Semester 2
Chemistry  Praveen   Semester 1

In the above table, John takes both the Computer and Math classes for Semester 1, but he does not
take the Math class for Semester 2. In this case, the combination of all three fields is required
to identify a valid row.

Suppose we add a new semester, Semester 3, but do not yet know the subject or who will be
teaching it, so we would have to leave LECTURER and SUBJECT as NULL. But all three columns
together act as the primary key, so we cannot leave the other two columns blank.

So, to bring the above table into 5NF, we can decompose it into three relations P1, P2 and P3:

P1

SEMESTER    SUBJECT
Semester 1  Computer
Semester 1  Math
Semester 1  Chemistry
Semester 2  Math

P2

SUBJECT    LECTURER
Computer   Anshika
Computer   John
Math       John
Math       Akash
Chemistry  Praveen

P3

SEMESTER    LECTURER
Semester 1  Anshika
Semester 1  John
Semester 2  Akash
Semester 1  Praveen
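
Whether this decomposition is lossless can be checked by joining the three projections and comparing the result with the original rows. The following is a minimal sketch using Python sets of tuples; the variable names are assumptions.

```python
ORIGINAL = {
    ("Computer", "Anshika", "Semester 1"),
    ("Computer", "John", "Semester 1"),
    ("Math", "John", "Semester 1"),
    ("Math", "Akash", "Semester 2"),
    ("Chemistry", "Praveen", "Semester 1"),
}

# The three projections; using sets removes duplicate rows automatically.
p1 = {(sem, sub) for sub, lec, sem in ORIGINAL}  # SEMESTER, SUBJECT
p2 = {(sub, lec) for sub, lec, sem in ORIGINAL}  # SUBJECT, LECTURER
p3 = {(sem, lec) for sub, lec, sem in ORIGINAL}  # SEMESTER, LECTURER

# Joining only P1 and P2 on SUBJECT produces spurious rows.
joined_two = {
    (sub, lec, sem)
    for sem, sub in p1
    for sub2, lec in p2
    if sub == sub2
}

# Joining all three keeps only rows whose (SEMESTER, LECTURER) pair is in P3.
joined_all = {row for row in joined_two if (row[2], row[1]) in p3}

print(len(joined_two))         # 7: two rows are not in the original table
print(joined_all == ORIGINAL)  # True: the three-way join is lossless
```

The two spurious rows from the two-way join are (Math, John, Semester 2) and (Math, Akash, Semester 1), which is why all three projections are needed.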

To summarize each form of normalization:

Normal Form  Description
1NF          A relation is in 1NF if every attribute contains only atomic values.
2NF          A relation is in 2NF if it is in 1NF and all non-key attributes are fully
             functionally dependent on the primary key.
3NF          A relation is in 3NF if it is in 2NF and no transitive dependency exists.
BCNF         A stronger definition of 3NF, known as Boyce-Codd normal form.
4NF          A relation is in 4NF if it is in Boyce-Codd normal form and has no
             multi-valued dependency.
5NF          A relation is in 5NF if it is in 4NF and cannot be decomposed any further
             without loss, so that every join of its parts is lossless.

Advantages of Normalization

• Normalization helps to minimize data redundancy.
• Greater overall database organization.
• Data consistency within the database.
• Much more flexible database design.
• Enforces the concept of relational integrity.

Disadvantages of Normalization

• You cannot start building the database before knowing what the user needs.
• Performance can degrade when relations are normalized to higher normal forms, i.e.,
  4NF and 5NF, because queries need more joins.
• It is very time-consuming and difficult to normalize relations of a higher degree.
• Careless decomposition may lead to a bad database design, leading to serious problems.

By normalizing a database, you improve its efficiency, consistency, and scalability, making data
updates and retrieval easier while reducing data anomalies.

Conclusion

Inference rules in DBMS (especially Armstrong's Axioms) are foundational tools used to derive
new functional dependencies, assisting in the design, optimization, and normalization of
databases. These rules ensure that databases remain consistent, efficient, and free from
unnecessary redundancy.

