Normalization Explained in Detail
DB2 Universal Database Version 9.1 is going out of support as of April 30, 2012. More details are available here: http://www-01.ibm.com/support/docview.wss?uid=swg21575777
Normalization
Normalization helps eliminate redundancies and inconsistencies in table data. It is the process of reducing tables to a set of columns where all the non-key columns depend on the primary key column. If this is not the case, the data can become inconsistent during updates.
This section briefly reviews the rules for first, second, third, and fourth normal form. The fifth normal form of a table, which is covered in many books on database design, is not described here.
Form    Description
First   At each row and column position in the table, there exists one value, never a set of values.
Second  Each column that is not part of the key is dependent upon the key.
Third   Each non-key column is independent of other non-key columns, and is dependent only upon the key.
Fourth  No row contains two or more independent multi-valued facts about an entity.
The following example shows the same table in first normal form.
Table 5. Table Conforming to First Normal Form
PART (Primary Key)  WAREHOUSE (Primary Key)  QUANTITY  WAREHOUSE_ADDRESS
P0010               Warehouse A              400       1608 New Field Road
P0010               Warehouse B              543       4141 Greenway Drive
P0010               Warehouse C              329       171 Pine Lane
P0020               Warehouse B              200       4141 Greenway Drive
P0020               Warehouse D              278       800 Massey Street
The primary key is a composite key, consisting of the PART and the WAREHOUSE columns together. Because the WAREHOUSE_ADDRESS column depends only on the value of WAREHOUSE, the table violates the rule for second normal form. The problems with this design are:
- The warehouse address is repeated in every record for a part stored in that warehouse.
- If the address of a warehouse changes, every row referring to a part stored in that warehouse must be updated.
- Because of this redundancy, the data might become inconsistent, with different records showing different addresses for the same warehouse.
- If at some time there are no parts stored in a warehouse, there might not be a row in which to record the warehouse address.
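The inconsistency risk can be reproduced with a short sketch. It uses Python's standard sqlite3 module as a stand-in for DB2; the table name part_stock_1nf, the helper names, and the new address "99 Example Avenue" are hypothetical and chosen only for illustration. The rows are those of Table 5.

```python
import sqlite3

# Rows copied from Table 5. SQLite is only a stand-in for a DB2 database.
ROWS = [
    ("P0010", "Warehouse A", 400, "1608 New Field Road"),
    ("P0010", "Warehouse B", 543, "4141 Greenway Drive"),
    ("P0010", "Warehouse C", 329, "171 Pine Lane"),
    ("P0020", "Warehouse B", 200, "4141 Greenway Drive"),
    ("P0020", "Warehouse D", 278, "800 Massey Street"),
]


def build_first_normal_form_table():
    """Create the single table with the composite key (part, warehouse)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE part_stock_1nf (
               part TEXT NOT NULL,
               warehouse TEXT NOT NULL,
               quantity INTEGER NOT NULL,
               warehouse_address TEXT NOT NULL,
               PRIMARY KEY (part, warehouse))"""
    )
    conn.executemany("INSERT INTO part_stock_1nf VALUES (?, ?, ?, ?)", ROWS)
    return conn


def inconsistent_warehouses(conn):
    """Return the warehouses that are recorded with more than one address."""
    rows = conn.execute(
        """SELECT warehouse
           FROM part_stock_1nf
           GROUP BY warehouse
           HAVING COUNT(DISTINCT warehouse_address) > 1
           ORDER BY warehouse"""
    ).fetchall()
    return [row[0] for row in rows]


conn = build_first_normal_form_table()
before = inconsistent_warehouses(conn)

# Hypothetical address change applied to only one of Warehouse B's two rows.
conn.execute(
    "UPDATE part_stock_1nf SET warehouse_address = ? "
    "WHERE part = ? AND warehouse = ?",
    ("99 Example Avenue", "P0010", "Warehouse B"),
)
after = inconsistent_warehouses(conn)
print(before, after)
```

Nothing in the table definition prevents the partial update, so the check has to be run separately; that is the weakness that second normal form removes.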
The solution is to split the table into the following two tables:
Table 7. PART_STOCK Table Conforming to Second Normal Form
PART (Primary Key)  WAREHOUSE (Primary Key)  QUANTITY
P0010               Warehouse A              400
P0010               Warehouse B              543
P0010               Warehouse C              329
P0020               Warehouse B              200
P0020               Warehouse D              278
Table 8. WAREHOUSE Table Conforms to Second Normal Form
WAREHOUSE (Primary Key)  WAREHOUSE_ADDRESS
Warehouse A              1608 New Field Road
Warehouse B              4141 Greenway Drive
Warehouse C              171 Pine Lane
Warehouse D              800 Massey Street
There is a performance consideration in having the two tables in second normal form. Applications that produce reports on the location of parts must join both tables to retrieve the relevant information.
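As a rough illustration of both the benefit and the join cost, the following sketch builds the two second-normal-form tables and runs a location report. It again uses Python's sqlite3 module rather than DB2; the function name part_locations and the replacement address are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE warehouse (
        warehouse TEXT PRIMARY KEY,
        warehouse_address TEXT NOT NULL
    );
    CREATE TABLE part_stock (
        part TEXT NOT NULL,
        warehouse TEXT NOT NULL REFERENCES warehouse (warehouse),
        quantity INTEGER NOT NULL,
        PRIMARY KEY (part, warehouse)
    );
    """
)
conn.executemany(
    "INSERT INTO warehouse VALUES (?, ?)",
    [
        ("Warehouse A", "1608 New Field Road"),
        ("Warehouse B", "4141 Greenway Drive"),
        ("Warehouse C", "171 Pine Lane"),
        ("Warehouse D", "800 Massey Street"),
    ],
)
conn.executemany(
    "INSERT INTO part_stock VALUES (?, ?, ?)",
    [
        ("P0010", "Warehouse A", 400),
        ("P0010", "Warehouse B", 543),
        ("P0010", "Warehouse C", 329),
        ("P0020", "Warehouse B", 200),
        ("P0020", "Warehouse D", 278),
    ],
)


def part_locations(conn, part):
    """Report where a part is stored; this requires joining both tables."""
    return conn.execute(
        """SELECT s.warehouse, s.quantity, w.warehouse_address
           FROM part_stock AS s
           JOIN warehouse AS w ON w.warehouse = s.warehouse
           WHERE s.part = ?
           ORDER BY s.warehouse""",
        (part,),
    ).fetchall()


# Hypothetical address change: exactly one row is touched.
cursor = conn.execute(
    "UPDATE warehouse SET warehouse_address = ? WHERE warehouse = ?",
    ("99 Example Avenue", "Warehouse B"),
)
rows_updated = cursor.rowcount
locations = part_locations(conn, "P0020")
print(rows_updated, locations)
```

The address now lives in one row, so it cannot disagree with itself, and the report pays for that with the join.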
EMPNO (Primary Key)  FIRSTNAME  LASTNAME  WORKDEPT  DEPTNAME
000290               John       Parker    E11
000320               Ramlal     Mehta     E21
000310               Maude      Setright  E11
Table 10. Unnormalized EMPLOYEE_DEPARTMENT Table After Update
Information in the table has become inconsistent.
EMPNO (Primary Key)  FIRSTNAME  LASTNAME  WORKDEPT  DEPTNAME
000290               John       Parker    E11
000320               Ramlal     Mehta     E21
000310               Maude      Setright  E11
The table can be normalized by creating a new table, with columns for WORKDEPT and DEPTNAME. An update like changing a department name is now much easier; only the new table needs to be updated. An SQL query that returns the department name along with the employee name is more complex to write, because it requires joining the two tables. It will probably also take longer to run than a query on a single table. Additional storage space is required, because the WORKDEPT column must appear in both tables. The following tables are defined as a result of normalization:
Table 11. EMPLOYEE Table After Normalizing the EMPLOYEE_DEPARTMENT Table
EMPNO (Primary Key)  FIRSTNAME  LASTNAME  WORKDEPT
000290               John       Parker    E11
000320               Ramlal     Mehta     E21
000310               Maude      Setright  E11
Table 12. DEPARTMENT Table After Normalizing the EMPLOYEE_DEPARTMENT Table
DEPTNO (Primary Key)  DEPTNAME
E11                   Operations
E21                   Software Support
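The trade-off can be sketched with the EMPLOYEE and DEPARTMENT tables. The example below uses Python's sqlite3 module as a stand-in for DB2; the function name employees_with_department and the new department name "Field Operations" are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE department (
        deptno TEXT PRIMARY KEY,
        deptname TEXT NOT NULL
    );
    CREATE TABLE employee (
        empno TEXT PRIMARY KEY,
        firstname TEXT NOT NULL,
        lastname TEXT NOT NULL,
        workdept TEXT NOT NULL REFERENCES department (deptno)
    );
    """
)
conn.executemany(
    "INSERT INTO department VALUES (?, ?)",
    [("E11", "Operations"), ("E21", "Software Support")],
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [
        ("000290", "John", "Parker", "E11"),
        ("000320", "Ramlal", "Mehta", "E21"),
        ("000310", "Maude", "Setright", "E11"),
    ],
)


def employees_with_department(conn):
    """Return employee and department names; the join is now unavoidable."""
    return conn.execute(
        """SELECT e.empno, e.lastname, d.deptname
           FROM employee AS e
           JOIN department AS d ON d.deptno = e.workdept
           ORDER BY e.empno"""
    ).fetchall()


# Hypothetical rename: one row in DEPARTMENT changes, and every employee
# in E11 sees the new name through the join.
conn.execute(
    "UPDATE department SET deptname = ? WHERE deptno = ?",
    ("Field Operations", "E11"),
)
report = employees_with_department(conn)
print(report)
```

Because DEPTNAME is stored once per department, two employees in E11 can no longer show different department names.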
Instead, the relationships should be represented in two tables:
Table 14. EMPLOYEE_SKILL Table Conforming to Fourth Normal Form
EMPNO (Primary Key)  SKILL (Primary Key)
000130               Data Modelling
000130               Database Design
000130               Application Design
Table 15. EMPLOYEE_LANGUAGE Table Conforming to Fourth Normal Form
EMPNO (Primary Key)  LANGUAGE (Primary Key)
000130               English
000130               Spanish
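To see why independent multi-valued facts are kept apart, the sketch below stores the rows of Tables 14 and 15 and then derives what a single combined table would have to hold. It assumes, for illustration only, that a combined table records every skill paired with every language. Python's sqlite3 module stands in for DB2, and the variable names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE employee_skill (
        empno TEXT NOT NULL,
        skill TEXT NOT NULL,
        PRIMARY KEY (empno, skill)
    );
    CREATE TABLE employee_language (
        empno TEXT NOT NULL,
        language TEXT NOT NULL,
        PRIMARY KEY (empno, language)
    );
    """
)
conn.executemany(
    "INSERT INTO employee_skill VALUES (?, ?)",
    [
        ("000130", "Data Modelling"),
        ("000130", "Database Design"),
        ("000130", "Application Design"),
    ],
)
conn.executemany(
    "INSERT INTO employee_language VALUES (?, ?)",
    [("000130", "English"), ("000130", "Spanish")],
)

# Rows actually stored across the two fourth-normal-form tables.
stored_rows = (
    conn.execute("SELECT COUNT(*) FROM employee_skill").fetchone()[0]
    + conn.execute("SELECT COUNT(*) FROM employee_language").fetchone()[0]
)

# Assumed combined layout: every skill paired with every language.
combined = conn.execute(
    """SELECT s.empno, s.skill, l.language
       FROM employee_skill AS s
       JOIN employee_language AS l ON l.empno = s.empno
       ORDER BY s.skill, l.language"""
).fetchall()
print(stored_rows, len(combined))
```

Under that assumption, adding one language to the combined layout would mean adding one row per skill, whereas the split tables need a single new row.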
If, however, the attributes are interdependent (that is, the employee applies certain languages only to certain skills), the table should not be split.
A good strategy when designing a database is to arrange all data in tables that are in fourth normal form, and then to decide whether the results give you an acceptable level of performance. If they do not, you can rearrange the data in tables that are in third normal form, and then reassess performance.
This topic is part of: Administration Guide: Planning