0% found this document useful (0 votes)
26 views

Normalization

Data base management

Uploaded by

luckson chikowa
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
26 views

Normalization

Data base management

Uploaded by

luckson chikowa
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
You are on page 1/ 13

Keys

 Keys play an important role in the relational database.


 It is used to uniquely identify any record or row of data from the table. It is also used to establish
and identify relationships between tables.

For example, ID is used as a key in the Student table because it is unique for each student. In the
PERSON table, passport_number, license_number, SSN are keys since they are unique for each
person.

Types of keys:
1. Primary key

 It is the first key used to identify one and only one instance of an entity uniquely. An entity can
contain multiple keys, as we saw in the PERSON table. The key which is most suitable from those
lists becomes a primary key.
 In the EMPLOYEE table, ID can be the primary key since it is unique for each employee. In the
EMPLOYEE table, we can even select License_Number and Passport_Number as primary keys
since they are also unique.
 For each entity, the primary key selection is based on requirements and developers.

2. Candidate key

 A candidate key is an attribute or set of attributes that can uniquely identify a tuple.
 Except for the primary key, the remaining attributes are considered a candidate key. The
candidate keys are as strong as the primary key.

For example: In the EMPLOYEE table, id is best suited for the primary key. The rest of the
attributes, like SSN, Passport_Number, License_Number, etc., are considered a candidate key.
3. Super Key

Super key is an attribute set that can uniquely identify a tuple. A super key is a superset of a
candidate key.

For example: In the above EMPLOYEE table, for(EMPLOEE_ID, EMPLOYEE_NAME), the


name of two employees can be the same, but their EMPLYEE_ID can't be the same. Hence, this
combination can also be a key.

The super key would be EMPLOYEE-ID (EMPLOYEE_ID, EMPLOYEE-NAME), etc.


4. Foreign key

 Foreign keys are the column of the table used to point to the primary key of another table.
 Every employee works in a specific department in a company, and employee and department
are two different entities. So we can't store the department's information in the employee
table. That's why we link these two tables through the primary key of one table.
 We add the primary key of the DEPARTMENT table, Department_Id, as a new attribute in the
EMPLOYEE table.
 In the EMPLOYEE table, Department_Id is the foreign key, and both the tables are related.

5. Alternate key

There may be one or more attributes or a combination of attributes that uniquely identify each
tuple in a relation. These attributes or combinations of the attributes are called the candidate
keys. One key is chosen as the primary key from these candidate keys, and the remaining
candidate key, if it exists, is termed the alternate key. In other words, the total number of the
alternate keys is the total number of candidate keys minus the primary key. The alternate key
may or may not exist. If there is only one candidate key in a relation, it does not have an alternate
key.

For example, employee relation has two attributes, Employee_Id and PAN_No, that act as
candidate keys. In this relation, Employee_Id is chosen as the primary key, so the other candidate
key, PAN_No, acts as the Alternate key.
6. Composite key

Whenever a primary key consists of more than one attribute, it is known as a composite key.
This key is also known as Concatenated Key.

For example, in employee relations, we assume that an employee may be assigned multiple
roles, and an employee may work on multiple projects simultaneously. So the primary key will
be composed of all three attributes, namely Emp_ID, Emp_role, and Proj_ID in combination. So
these attributes act as a composite key since the primary key comprises more than one attribute.
7. Artificial key

The key created using arbitrarily assigned data are known as artificial keys. These keys are
created when a primary key is large and complex and has no relationship with many other
relations. The data values of the artificial keys are usually numbered in a serial order.

For example, the primary key, which is composed of Emp_ID, Emp_role, and Proj_ID, is large
in employee relations. So it would be better to add a new virtual attribute to identify each tuple in
the relation uniquely.
Decomposition:
Two types:
1.Lossless Decomposition
2.Lossy Decomposition
Normalization
 Normalization is the process of organizing the data in the database.
 Normalization is used to minimize the redundancy from a relation or set of relations. It is
also used to eliminate the undesirable characteristics like Insertion, Update and Deletion
Anomalies.
 Normalization divides the larger table into the smaller table and links them using
relationship.
 The normal form is used to reduce redundancy from the database table.

Types of Normal Forms


There are the four types of normal forms:

Normal
Description
Form
1NF A relation is in 1NF if it contains an atomic value.
A relation will be in 2NF if it is in 1NF and all non-key attributes are fully
2NF
functional dependent on the primary key.
3NF A relation will be in 3NF if it is in 2NF and no transition dependency exists.
A relation will be in 4NF if it is in Boyce Codd normal form and has no multi-
4NF
valued dependency.
A relation is in 5NF if it is in 4NF and not contains any join dependency and joining
5NF
should be lossless.

First Normal Form (1NF)


 A relation will be 1NF if it contains an atomic value.
 It states that an attribute of a table cannot hold multiple values. It must hold only single-
valued attribute.
 First normal form disallows the multi-valued attribute, composite attribute, and their
combinations.

Example: Relation EMPLOYEE is not in 1NF because of multi-valued attribute EMP_PHONE.

EMPLOYEE table:

EMP_ID EMP_NAME EMP_PHONE EMP_STATE


7272826385,
14 John UP
9064738238
20 Harry 8574783832 Bihar
7390372389,
12 Sam Punjab
8589830302

The decomposition of the EMPLOYEE table into 1NF has been shown below:

EMP_ID EMP_NAME EMP_PHONE EMP_STATE


14 John 7272826385 UP
14 John 9064738238 UP
20 Harry 8574783832 Bihar
12 Sam 7390372389 Punjab
12 Sam 8589830302 Punjab
Second Normal Form (2NF)
 In the 2NF, relational must be in 1NF.
 In the second normal form, all non-key attributes are fully functional dependent on the
primary key

Example: Let's assume, a school can store the data of teachers and the subjects they teach. In a
school, a teacher can teach more than one subject.

TEACHER table

TEACHER_ID Name SUBJECT TEACHER_AGE


25 Seeta Chemistry 30
25 Seeta Biology 30
47 Neeta English 35
83 Sagar Math 38
83 Sagar Computer 38

In the given table, non-prime attribute TEACHER_AGE is dependent on TEACHER_ID which


is a proper subset of a candidate key. That's why it violates the rule for 2NF.

To convert the given table into 2NF, we decompose it into two tables:

TEACHER_DETAIL table:

TEACHER_ID Name TEACHER_AGE


25 Seeta 30
47 Neeta 35
83 Sagar 38

TEACHER_SUBJECT table:

TEACHER_ID SUBJECT
25 Chemistry
25 Biology
47 English
83 Math
83 Computer

Third Normal Form (3NF)


 A relation will be in 3NF if it is in 2NF and not contain any transitive partial dependency.
 3NF is used to reduce the data duplication. It is also used to achieve the data integrity.
 If there is no transitive dependency for non-prime attributes, then the relation must be in
third normal form.

A relation is in third normal form if it holds atleast one of the following conditions for every
non-trivial function dependency X → Y.

1. X is a super key.
2. Y is a prime attribute, i.e., each element of Y is part of some candidate key.

Example:

EMPLOYEE_DETAIL table:

EMP_ID EMP_NAME EMP_ZIP EMP_STATE EMP_CITY


222 Harry 201010 UP Noida
333 Stephan 02228 US Boston
444 Lan 60007 US Chicago
555 Katharine 06389 UK Norwich
666 John 462007 MP Bhopal

Super key in the table above:

1. {EMP_ID}, {EMP_ID, EMP_NAME}, {EMP_ID, EMP_NAME, EMP_ZIP}....s


o on
Candidate key: {EMP_ID}

Non-prime attributes: In the given table, all attributes except EMP_ID are non-prime.

Here, EMP_STATE & EMP_CITY dependent on EMP_ZIP and EMP_ZIP dependent on


EMP_ID. The non-prime attributes (EMP_STATE, EMP_CITY) transitively dependent
on super key(EMP_ID). It violates the rule of third normal form.

That's why we need to move the EMP_CITY and EMP_STATE to the new
<EMPLOYEE_ZIP> table, with EMP_ZIP as a Primary key.

EMPLOYEE table:

EMP_ID EMP_NAME EMP_ZIP (FK)


222 Harry 201010
333 Stephan 02228
444 Lan 60007
555 Katharine 06389
666 John 462007

EMPLOYEE_ZIP table:

EMP_ZIP EMP_STATE EMP_CITY


201010 UP Noida
02228 US Boston
60007 US Chicago
06389 UK Norwich
462007 MP Bhopal

Boyce Codd normal form (BCNF)


 BCNF is the advance version of 3NF. It is stricter than 3NF.
 A table is in BCNF if every functional dependency X → Y, X is the super key of the
table.
 For BCNF, the table should be in 3NF, and for every FD, LHS is super key.

Example: Let's assume there is a company where employees work in more than one department.

EMPLOYEE table:

EMP_ID EMP_COUNTRY EMP_DEPT DEPT_TYPE EMP_DEPT_NO


264 India Designing D394 283
264 India Testing D394 300
364 UK Stores D283 232
364 UK Developing D283 549

In the above table Functional dependencies are as follows:

1. EMP_ID → EMP_COUNTRY
2. EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}

Candidate key: {EMP-ID, EMP-DEPT}

The table is not in BCNF because neither EMP_DEPT nor EMP_ID alone are keys.

To convert the given table into BCNF, we decompose it into three tables:

EMP_COUNTRY table:

EMP_ID EMP_COUNTRY
264 India
364 UK

EMP_DEPT table:

EMP_DEPT DEPT_TYPE EMP_DEPT_NO


Designing D394 283
Testing D394 300
Stores D283 232
Developing D283 549

EMP_DEPT_MAPPING table:

EMP_ID EMP_DEPT
D394 283
D394 300
D283 232
D283 549

Functional dependencies:

1. EMP_ID → EMP_COUNTRY
2. EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}

Candidate keys:
For the first table: EMP_ID
For the second table: EMP_DEPT
For the third table: {EMP_ID, EMP_DEPT}

Now, this is in BCNF because left side part of both the functional dependencies is a key.

You might also like