Unit 9 Functional Dependencies and Normalization For Relational Databases
This document describes the concepts of database normalization. It begins with an introduction to normalization, explaining that normalization is the process of structuring data to avoid data redundancy and inconsistencies. It then discusses several normalization forms including second normal form (2NF), third normal form (3NF), and Boyce-Codd normal form (BCNF). The document provides examples to illustrate these normalization concepts and contains self-assessment questions throughout.
Database Management Systems Unit 9
Sikkim Manipal University Page No.: 166
Unit 9 Functional Dependencies and Normalization for Relational Databases

Structure
9.1 Introduction to Normalization
    Objectives
    Self Assessment Question(s) (SAQs)
9.2 Information Design Guide Lines for Relational DB
    Self Assessment Question(s) (SAQs)
9.3 Normal forms Based on Primary Keys
    9.3.1 Second Normal Form (2NF)
    9.3.2 Third Normal Form (3NF)
    Self Assessment Question(s) (SAQs)
9.4 Boyce Codd Normal Form (BCNF)
    Self Assessment Question(s) (SAQs)
9.5 Fourth Normal Form (4NF)
    Self Assessment Question(s) (SAQs)
9.6 Normalization using Join Dependencies
    Self Assessment Question(s) (SAQs)
9.7 Summary
9.8 Terminal Questions (TQs)
9.9 Multiple Choice Questions (MCQs)
9.10 Answers to SAQs, TQs, and MCQs
    9.10.1 Answers to Self Assessment Questions (SAQs)
    9.10.2 Answers to Terminal Questions (TQs)
    9.10.3 Answers to Multiple Choice Questions (MCQs)

9.1 Introduction to Normalization
Normalization is the process of building database structures to store data, because any application ultimately depends on its data structures. If the data structures are poorly designed, the application will start from a poor foundation. This will require a lot more work to create a useful and efficient application.

Normalization is the formal process for deciding which attributes should be grouped together in a relation. Normalization serves as a tool for validating and improving the logical design, so that the logical design avoids unnecessary duplication of data, i.e. it eliminates redundancy and promotes integrity. In the normalization process we analyze and decompose the complex relations into smaller, simpler and well-structured relations.
Objectives
To know about
o Information Design Guide Lines for Relational DB
o Normal Forms Based on Primary Keys
o Second Normal Form (2NF)
o Third Normal Form (3NF)
o Boyce Codd Normal Form (BCNF)
o Fourth Normal Form (4NF)
o Normalization using Join Dependencies

Self Assessment Question(s) (SAQs) (For Section 9.1)
1. Define Normalization. Why do you need it?

9.2 Information Design Guide Lines for Relational DB
Some criteria for good and bad relation schemas are:
o Semantics of attributes
o Reducing the redundant values in tuples
o Reducing the null values in tuples
o Disallowing spurious tuples

Semantics of the Attributes: Whenever we group attributes to form a relation, we assume that a certain meaning is associated with the attributes. This meaning is called semantics, and it specifies how the attribute values in a tuple relate to one another.

E.g.: consider the COMPANY database schema. The relations considered for this database are:

EMPLOYEE (ENAME, SSN, BDATE, ADDRESS, DNUMBER), where DNUMBER is a foreign key (f.k.)
DEPARTMENT (DNAME, DNUMBER, DMGRSSN), where DNUMBER is the primary key (p.k.) and DMGRSSN is a foreign key (f.k.)

Fig. 9.1: Simplified version of the COMPANY relational database schema

The meaning of the EMPLOYEE relation is quite simple: each tuple represents an employee. The DNUMBER attribute is a foreign key that represents an implicit relationship between the EMPLOYEE and DEPARTMENT relations.

Guideline 1: Design a relation schema so that it is easy to explain its meaning. Do not combine attributes from multiple entity types and relationship types into a single relation.

Reducing redundant values in tuples: Minimizing storage space is one of the most important considerations in designing a relational schema. Improper grouping of attributes has a significant effect on the storage space used by the relations.
Ex:

Figure A
Emp.no   Emp.Name   Salary   Address

Figure B
Dept_no   Dname   D_location

In Figure B the information about each department appears only once in the department relation. Suppose we integrate Figure A and Figure B into a single table Emp_dept.

Figure C: Emp_dept
Emp.no   Emp.Name   Salary   Addr   Dept.no   D.Name   D.loc

There are serious problems in using Figure C, namely insertion anomalies, deletion anomalies and modification anomalies. Whenever we insert tuples, if there are n employees in department 10, the Dept.no, D.Name and D.loc values are repeated n times, which leads to data redundancy.

Insertion Anomalies: It is difficult to insert a new department that has no employees as yet in the Emp_dept relation. This causes a problem because Emp.no is the primary key of Emp_dept, and a primary key value cannot be NULL. This problem does not occur in the design of Figure B, because a department is entered in the DEPARTMENT relation whether or not any employee works for it.

Deletion Anomalies: If we delete the last employee of a department from the Emp_dept relation, then all the information about that department is lost. This problem does not occur in the design of Figure B, because DEPARTMENT tuples are stored separately.

Modification Anomalies: In Emp_dept, if we change the value of one of the attributes of a particular department, say the location of department 5, we must update the tuples of all employees who work in that department; otherwise the database becomes inconsistent.

Guideline 2: Design the database so that no insertion, deletion or modification anomalies are present in its relations. If there are any anomalies, note them clearly, so that proper actions can be taken.

NULL values in tuples: A relation may contain attributes that do not apply to every tuple. If many of the attributes do not take any values, we insert NULL values.
This can waste space at the storage level, and it also leads to problems in understanding the meaning of the attributes and in specifying join operations. NULLs may also lead to counting problems when aggregate functions are used.

Guideline 3: As far as possible, avoid using NULL values for attributes in a relation.

Disallowing spurious tuples: Design relation schemas so that they can be joined with equality conditions.

Figure A: Emp_loc
Emp_Name   P_loc

Figure B: Emp_project
SSN   PNO   P_Name   P_Loc

If we attempt a natural join operation on Figure A and Figure B, the result contains many more tuples than the valid combinations of tuples, because the common attribute P_loc is neither a primary key nor a foreign key. The additional tuples are called spurious tuples, because they represent wrong information.

Guideline 4: Design relation schemas so that they can be joined with equality conditions on attributes that are either primary keys or foreign keys. This guarantees that no spurious tuples are generated.

Self Assessment Question(s) (SAQs) (For section 9.2)
1. List some criteria for good and bad relation schemas.

9.3 Normal forms Based on Primary Keys
A relation schema R is in first normal form (1NF) if every attribute of R takes only single atomic values. We can also define it as the intersection of each row and column containing one and only one value. To transform an un-normalized table (a table that contains one or more repeating groups) to first normal form, we identify and remove the repeating groups within the table.

E.g.

Figure A: Dept
D.Name   D.No   D.location
R&D      5      {England, London, Delhi}
HRD      4      Bangalore

In this figure each department can have a number of locations. The relation is not in first normal form because D.location is not an atomic attribute: the domain of D.location contains multiple values. There is a technique to achieve the first normal form.
Remove the attribute D.location, which violates the first normal form, and place it in a separate relation Dept_location.

Ex:

Dept
Dept.no   D.Name
5         R&D
4         HRD

Dept_location
Dept_location   Dept_No
England         5
London          5
Delhi           5
Bangalore       4

9.3.1 Second Normal Form (2NF)
The second normal form is based on the concept of full functional dependency. A relation schema R is in second normal form if every non-prime attribute A in R is fully functionally dependent on the primary key of R.

Figure 9.2: 2NF and 3NF. (a) Normalizing EMP_PROJ into 2NF relations (b) Normalizing EMP_DEPT into 3NF relations

A partial functional dependency is a functional dependency in which one or more non-key attributes are functionally dependent on part of the primary key. It creates redundancy in the relation, which results in anomalies when the table is updated.

9.3.2 Third Normal Form (3NF)
The third normal form is based on the concept of transitive dependency. We should design relation schemas in such a way that they contain no transitive dependencies, because these lead to update anomalies.

A functional dependency (FD) X -> Y in a relation schema R is a transitive dependency if there is a set of attributes Z, which is neither a key nor a subset of any key of R, such that both X -> Z and Z -> Y hold. The dependency SSN -> Dmgr is transitive through Dnum in the Emp_dept relation, because SSN -> Dnum and Dnum -> Dmgr both hold, and Dnum is neither a key nor a subset (part) of the key.

According to Codd's definition, a relation schema R is in 3NF if it satisfies 2NF and no non-prime attribute is transitively dependent on the primary key. The Emp_dept relation is not in 3NF; we can normalize it by decomposing it into E1 and E2.

Note: Transitivity is a mathematical property which states that if a relation is true between the first value and the second value, and between the second value and the third value, then it is true between the first and the third value.
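The transitive dependency in Emp_dept can be made concrete by computing attribute closures. The following is a minimal sketch, not part of the original unit: the helper name `closure` and the set-based encoding of the FDs are assumptions made for illustration, while the two dependencies SSN -> Dnum and Dnum -> Dmgr are the ones named in the text.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by `attrs`.

    `fds` is a list of (lhs, rhs) pairs of attribute sets.
    (Hypothetical helper, introduced only for this illustration.)
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply an FD when its left side is already covered.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


# The two FDs of Emp_dept that the text mentions.
fds = [({"SSN"}, {"Dnum"}), ({"Dnum"}, {"Dmgr"})]

ssn_closure = closure({"SSN"}, fds)
dnum_closure = closure({"Dnum"}, fds)

# SSN reaches Dmgr only through Dnum, and Dnum does not determine SSN,
# so Dnum is not a key: SSN -> Dmgr is a transitive dependency.
is_transitive = (
    "Dmgr" in ssn_closure
    and "Dmgr" in dnum_closure
    and "SSN" not in dnum_closure
)
print(sorted(ssn_closure), sorted(dnum_closure), is_transitive)
```

Because Dnum determines Dmgr without determining SSN, the manager is repeated for every employee of a department, which is the redundancy that the decomposition into E1 and E2 removes.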
Example 2: Consider a relation schema LOTS which describes the parts of land for sale in various countries of a state. Suppose there are two candidate keys: Property_ID and {Country_name, Lot#}; that is, lot numbers are unique only within each country, but Property_ID numbers are unique across countries for the entire state.

Based on the two candidate keys Property_ID and {Country_name, Lot#}, we know that the functional dependencies FD1 and FD2 hold: each candidate key determines all the other attributes. Suppose the following two additional functional dependencies hold in LOTS:

FD3: Country_name -> Tax_rate
FD4: Area -> Price

Here, FD3 says that the tax rate is fixed for a given country (Country_name -> Tax_rate), and FD4 says that the price of a lot is determined by its area (Area -> Price).

The LOTS relation schema violates 2NF, because Tax_rate is partially dependent on the candidate key {Country_name, Lot#}. To remove this violation, we decompose the LOTS relation into two relations, LOTS1 and LOTS2. LOTS1 violates 3NF, because Price is transitively dependent on the candidate key of LOTS1 via the attribute Area. Hence we decompose LOTS1 into LOTS1A and LOTS1B.

A relation schema R is in 3NF when every non-prime attribute of R satisfies the conditions below:
1. It is fully functionally dependent on every key of R.
2. It is non-transitively dependent on every key of R.

Self Assessment Question(s) (SAQs) (For section 9.3)
1. Define and explain 1NF.
2. Explain 2NF.
3. Discuss 3NF.

9.4 Boyce Codd Normal Form (BCNF)
Database relations are designed so that they have neither partial dependencies nor transitive dependencies, because these types of dependencies result in update anomalies.

A functional dependency describes the relationship between attributes in a relation. For example, suppose A and B are attributes in relation R. B is functionally dependent on A (A -> B) if each value of A is associated with exactly one value of B.
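The definition above can be tested directly against the rows of a relation. The sketch below is illustrative only: the function name `fd_holds` and the sample rows are invented for this example and are not taken from the unit; only the dependencies Country_name -> Tax_rate and Area -> Price come from the LOTS discussion.

```python
def fd_holds(rows, lhs, rhs):
    """Check whether the FD lhs -> rhs holds in the given rows.

    Each value combination of `lhs` must be associated with exactly
    one value combination of `rhs`. (Hypothetical helper.)
    """
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key.
        if seen.setdefault(key, val) != val:
            return False
    return True


# Made-up LOTS rows, for illustration only.
rows = [
    {"country_name": "A", "lot": 1, "area": 0.5, "price": 100, "tax_rate": 2.0},
    {"country_name": "A", "lot": 2, "area": 0.6, "price": 120, "tax_rate": 2.0},
    {"country_name": "B", "lot": 1, "area": 1.1, "price": 220, "tax_rate": 3.0},
    {"country_name": "B", "lot": 2, "area": 1.1, "price": 220, "tax_rate": 3.0},
]

print(fd_holds(rows, ["country_name"], ["tax_rate"]))  # True
print(fd_holds(rows, ["area"], ["price"]))             # True
print(fd_holds(rows, ["tax_rate"], ["lot"]))           # False
```

A check of this kind can only show that a dependency is violated by the current rows. Whether a dependency holds in general is a statement about the meaning of the attributes, not about one particular relation state.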
The left-hand side and the right-hand side of a functional dependency are sometimes called the determinant and the dependent respectively. A relation is in BCNF if and only if every determinant is a candidate key.

The difference between the third normal form and BCNF is that, for a functional dependency A -> B, the third normal form allows this dependency in a relation if B is a primary-key attribute and A is not a candidate key, whereas BCNF requires A to be a candidate key. Therefore BCNF is a stronger form of the third normal form.

PRODUCT (prd#, prdname, price)
prd# -> prdname, price

CUSTOMER (cust#, custname, custaddr)
cust# -> custname, custaddr

ORDER (ord#,cust#mord#,qty,amt)
ord# -> qty, amt

The PRODUCT schema is in BCNF, since prd# is a candidate key. Similarly, the CUSTOMER schema is also in BCNF. The schema ORDER, however, is not in BCNF, because ord# is not a super key for ORDER, i.e. we could have a pair of tuples with the same ord#, e.g. (1234, 145, 13, 789) and (1234, 123, 53, 455); here ord# is not a candidate key. Since the FD ord# -> amt is also non-trivial, ORDER does not satisfy the definition of BCNF. It suffers from the problem of repetition of information. This redundancy can be eliminated by decomposing ORDER into ORDER1 and ORDER2:

ORDER1 (ord#, cust#)
ORDER2 (prd#, qty, amt)

Example 2: Consider again the LOTS relation. It has four functional dependencies, FD1 to FD4. Suppose we have thousands of lots in the relation, but the lots are from only two countries, A and B. Suppose lot sizes in country A are restricted to 0.5, 0.6, ..., 1.0 acres, whereas lot sizes in country B are restricted to 1.1, 1.2, ..., 2.0 acres. In such a situation we would have the additional functional dependency FD5: Area -> Country_name. FD5 can be represented by 16 tuples in a separate relation R (Area, Country_name), since there are only 16 possible area values.
This representation reduces the redundancy of repeating the same information in thousands of LOTS1A tuples.

Figure 9.3: Boyce-Codd normal form. (a) BCNF normalization of LOTS1A, with the functional dependency FD2 being lost in the decomposition (b) A schematic relation with FDs; it is in 3NF but not in BCNF

Self Assessment Question(s) (SAQs) (For Section 9.4)
1. Explain the concept of BCNF.

9.5 Fourth Normal Form (4NF)
Multivalued dependencies are a consequence of the first normal form, which prohibits attributes from having a set of values. If we have two or more independent multivalued attributes in the same relation, we have to repeat every value of one of the attributes with every value of the other attributes to keep the relation state consistent and to maintain the independence among the attributes involved. This constraint is specified by a multivalued dependency.

Consider a table EMPLOYEE that has the attributes name, project and hobby. An employee can work on more than one project and can have more than one hobby. The employee's projects and hobbies are independent of one another. A given project or hobby can be associated with any number of employees. To keep the relation state consistent we must have separate tuples to represent every combination of an employee's projects and an employee's hobbies.

The drawback of the EMPLOYEE relation is redundant data, and this redundant data leads to update anomalies. For example, if we wish to add one more project, Sybase, handled by employee B, then we must add two more tuples, one for each hobby. The values Reading and Movies of hobby are repeated with each value of project. This redundancy is undesirable. One way to remove the redundancy is to decompose the EMPLOYEE relation into two relations, PROJECT and HOBBY. Now, if we wish to insert Sybase in the PROJECT relation, only one entry is required.
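The effect of the decomposition can be sketched with Python sets. This is only an illustration under stated assumptions: relations are modelled as sets of tuples, the helper `join` is a hypothetical name, and the starting rows are the EMPLOYEE rows of this section as they stand before Sybase is added for employee B.

```python
# EMPLOYEE(name, project, hobby) before Sybase is added for employee B.
employee = {
    ("A", "Microsoft", "Cricket"),
    ("A", "Oracle", "Music"),
    ("A", "Microsoft", "Music"),
    ("A", "Oracle", "Cricket"),
    ("B", "Intel", "Movies"),
    ("B", "Intel", "Reading"),
}

# Decompose into PROJECT(name, project) and HOBBY(name, hobby).
project = {(name, proj) for name, proj, _ in employee}
hobby = {(name, hob) for name, _, hob in employee}


def join(project, hobby):
    """Natural join of PROJECT and HOBBY on name (hypothetical helper)."""
    return {(n, p, h) for n, p in project for m, h in hobby if n == m}


# The decomposition is lossless: joining gives back EMPLOYEE exactly.
lossless = join(project, hobby) == employee

# Adding the Sybase project for B needs a single entry in PROJECT ...
project.add(("B", "Sybase"))

# ... whereas the single EMPLOYEE table would need one new tuple per hobby.
rejoined = join(project, hobby)
new_rows = rejoined - employee
print(lossless, len(new_rows))  # True 2
```

The two tuples in `new_rows` are the ones that would have to be inserted by hand into the undecomposed EMPLOYEE relation.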
Definition (MVD): A relation R(X, Y, Z) is said to have a multivalued dependency X ->> Y if the set of Y values for a given (X, Z) pair does not depend on Z, but depends only on X. We then say "X multi-determines Y" or "Y is multi-dependent on X". Such a dependency is called a multivalued dependency (MVD) and is represented by a double arrow (->>).

We can also define an MVD as follows: for each value of X there is a set of values for Y and a set of values for Z, and the sets of values for Y and Z are independent of each other. So wherever two independent one-to-many relationships (A:B and A:C) are mixed in the same relation, a multivalued dependency arises. The redundancy caused by multivalued dependencies can be avoided using the fourth normal form.

EMPLOYEE
NAME   PROJECT     HOBBY
A      Microsoft   Cricket
A      Oracle      Music
A      Microsoft   Music
A      Oracle      Cricket
B      Intel       Movies
B      Sybase      Reading
B      Intel       Reading
B      Sybase      Movies

Decomposed relations to reduce redundancy:

PROJECT
NAME   PROJECT
A      Microsoft
A      Oracle
B      Intel
B      Sybase

HOBBY
NAME   HOBBY
A      Cricket
A      Music
B      Movies
B      Reading

Fourth Normal Form (4NF): The definition of 4NF is violated when a relation has undesirable multivalued dependencies; such relations are identified and decomposed into 4NF relations.

Alternate definition: A relation R is said to be in 4NF if, for every MVD A ->> B that holds over R, one of the following is true:
o B is a subset of A (trivial), or
o A union B = R, or
o A is a super key

The EMPLOYEE relation is not in 4NF because of the non-trivial MVDs NAME ->> PROJECT and NAME ->> HOBBY (the project and hobby attributes of the EMPLOYEE relation are independent of each other), and NAME is not a super key of EMPLOYEE. To bring this relation into 4NF, EMPLOYEE has to be decomposed into PROJECT and HOBBY.

Self Assessment Question(s) (SAQs) (For section 9.5)
1. Explain the concept of multivalued dependencies.
9.6 Normalization using Join Dependencies
Join dependency: The fifth normal form (5NF) is also called "Project Join Normal Form". It is important to note that normalization into 5NF is considered very rarely in practice.

Definition: A relation R is in 5NF if, for all join dependencies, at least one of the following holds:
o (R1, R2, ..., Rn) is a trivial join dependency
o Every Ri is a candidate key for R

As an example of a join dependency (JD), the relation shown below states that the CSE department offers subjects like Data structures and RDBMS, which are taken by Leela. Similarly, the other departments offer different subjects. However, no student takes all the subjects and no subject has all students enrolled in it, and therefore all three fields are needed to represent the information.

DST
Dept    Subject               Student
CSE     Data structures       Leela
Mech    Thermodynamics        Arjun
CSE     RDBMS                 Leela
Maths   Discrete Structure    Parvathy

The above relation does not contain any MVD, because Subject and Student are not independent. To bring this relation into 5NF we decompose it as:

DJ (Dept, Subject)
DS (Dept, Student)
SS (Subject, Student)

The three relations shown above satisfy the rules of 5NF, and the decomposition is lossless. One of the major differences between 4NF and 5NF is that in a given relation R(X, Y, Z), if the attributes Y and Z are independent, then the relation is a case for 4NF, and if they depend on each other, then it is a case for 5NF. 4NF generally gives two relations after decomposition, whereas 5NF gives three relations in order to keep all the information of the original relation.

Self Assessment Question(s) (SAQs) (For section 9.6)
1. What do you mean by join dependencies?
Explain 5NF.

9.7 Summary
We have learnt in this unit concepts like
o Information Design Guide Lines for Relational DB
o Normal forms Based on Primary Keys
o Second Normal Form (2NF)
o Third Normal Form (3NF)
o Boyce Codd Normal Form (BCNF)
o Fourth Normal Form (4NF)
o Normalization using Join Dependencies

9.8 Terminal Questions (TQs)
1. Discuss the criteria for bad relational schemas.
2. Discuss the attribute semantics as an information measure of goodness of a relation schema.
3. Discuss the first, second and third normal forms.
4. Discuss the concept of multi-valued dependency.

9.9 Multiple Choice Questions (MCQs)
1. --------- eliminates redundancy and promotes integrity.
A) Normalization
B) Integration
C) Consistency
D) None of the above

2. A relation schema R is in --------- if every attribute of R takes only single atomic values.
A) First Normal form
B) Second Normal form
C) Third Normal form
D) None of the above

3. --------- is a functional dependency in which one or more non-key attributes are functionally dependent on part of the primary key.
A) A full functional dependency
B) A partial functional dependency
C) Functional dependency
D) None of the above

4. A relation R is in --------- if for all join dependencies at least one of the following holds: (R1, R2, ..., Rn) is a trivial join dependency; every Ri is a candidate key for R.
A) First Normal form
B) Second Normal form
C) Fifth Normal form
D) None of the above

9.10 Answers to SAQs, TQs, and MCQs
9.10.1 Answers to Self Assessment Questions (SAQs)
For Section 9.1
1. Normalization is the process of building database structures to store data, because any application ultimately depends on its data structures. (Refer section 9.1)

For Section 9.2
1.
Semantics of attributes; reducing the redundant values in tuples; reducing the null values in tuples; disallowing spurious tuples. (Refer section 9.2)

For Section 9.3
1. A relation schema R is in first normal form if every attribute of R takes only single atomic values. (Refer section 9.3)
2. The second normal form is based on the concept of full functional dependency. A relation is in second normal form if every non-prime attribute A in R is fully functionally dependent on the primary key of R. (Refer section 9.3.1)
3. The third normal form is based on the concept of transitive dependency. We should design relation schemas in such a way that they contain no transitive dependencies, because these lead to update anomalies. (Refer section 9.3.2)

For Section 9.4
1. Database relations are designed so that they have neither partial dependencies nor transitive dependencies, because these types of dependencies result in update anomalies. A functional dependency describes the relationship between attributes in a relation. For example, suppose A and B are attributes in relation R. B is functionally dependent on A (A -> B) if each value of A is associated with exactly one value of B. The left-hand side and the right-hand side of a functional dependency are sometimes called the determinant and the dependent respectively. A relation is in BCNF if and only if every determinant is a candidate key. (Refer section 9.4)

For Section 9.5
1. Multivalued dependencies are a consequence of the first normal form, which prohibits attributes from having a set of values. (Refer section 9.5)

For Section 9.6
1. Join dependency: The fifth normal form (5NF) is also called "Project Join Normal Form". It is important to note that normalization into 5NF is considered very rarely in practice.
Definition: A relation R is in 5NF if, for all join dependencies, at least one of the following holds: (R1, R2, ..., Rn) is a trivial join dependency; every Ri is a candidate key for R. (Refer section 9.6)

9.10.2 Answers to Terminal Questions (TQs)
1. Criteria for good and bad relation schemas: semantics of attributes; reducing the redundant values in tuples; reducing the null values in tuples; disallowing spurious tuples. (Refer section 9.2)
2. Whenever we group attributes to form a relation, we assume that a certain meaning is associated with the attributes. This meaning is called semantics, and it specifies how the attribute values in a tuple relate to one another. (Refer section 9.2)
3. A relation schema R is in first normal form if every attribute of R takes only single atomic values. (Refer section 9.3)
4. Multivalued dependencies are a consequence of the first normal form, which prohibits attributes from having a set of values. (Refer section 9.5)

9.10.3 Answers to Multiple Choice Questions (MCQs)
1. A
2. A
3. B
4. C