Unit 4
The CREATE TABLE command is used to specify a new relation by giving it a name and specifying its attributes and
initial constraints.
DDL also allows us to specify constraints on a relation, such as PRIMARY KEY and FOREIGN KEY constraints.
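A minimal sketch of CREATE TABLE with PRIMARY KEY and FOREIGN KEY constraints, run through Python's sqlite3 module so that it can be executed as-is. The reduced DEPARTMENT and EMPLOYEE schemas, the column names, and the sample rows are assumptions chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces FOREIGN KEY constraints only when this pragma is on.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE DEPARTMENT (
    Dnumber INTEGER PRIMARY KEY,
    Dname   TEXT NOT NULL UNIQUE
);
CREATE TABLE EMPLOYEE (
    Ssn   TEXT PRIMARY KEY,
    Lname TEXT NOT NULL,
    Dno   INTEGER,
    FOREIGN KEY (Dno) REFERENCES DEPARTMENT (Dnumber)
);
INSERT INTO DEPARTMENT VALUES (5, 'Research');
INSERT INTO EMPLOYEE VALUES ('123456789', 'Smith', 5);
""")


def rejected(sql, params):
    """Return True if the statement violates a declared constraint."""
    try:
        conn.execute(sql, params)
        return False
    except sqlite3.IntegrityError:
        return True


# Same Ssn as an existing employee: violates the PRIMARY KEY constraint.
dup_key = rejected("INSERT INTO EMPLOYEE VALUES (?, ?, ?)", ("123456789", "Wong", 5))
# Department 7 does not exist: violates the FOREIGN KEY constraint.
bad_ref = rejected("INSERT INTO EMPLOYEE VALUES (?, ?, ?)", ("999999999", "Wong", 7))
print(dup_key, bad_ref)
```

Both inserts are refused by the DBMS itself, which is the point of declaring the constraints in the DDL rather than checking them in application code.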
If NULLs exist in the grouping attribute, then a separate group is created for all tuples with a NULL value in the
grouping attribute. For example, if the EMPLOYEE table had some tuples that had NULL for the grouping attribute
Dno, there would be a separate group for those tuples in the result above.
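The NULL group can be observed with a small sketch. The reduced EMPLOYEE table, the Salary column, and the sample rows are assumptions; the statement is run through Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EMPLOYEE (Ssn TEXT PRIMARY KEY, Salary REAL, Dno INTEGER);
INSERT INTO EMPLOYEE VALUES
    ('1', 30000, 5), ('2', 40000, 5), ('3', 25000, 4),
    ('4', 50000, NULL), ('5', 60000, NULL);
""")

# Tuples whose Dno is NULL are collected into one group of their own.
rows = conn.execute("""
    SELECT Dno, COUNT(*), AVG(Salary)
    FROM EMPLOYEE
    GROUP BY Dno
    ORDER BY Dno
""").fetchall()

for row in rows:
    print(row)
```

In SQLite the NULL group sorts first under ORDER BY; other systems may place it last.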
Example 2: For each project, retrieve the project number, the project name, and the number of employees who
work on that project.
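A minimal sketch of a statement matching Example 2, run through Python's sqlite3 module. The WORKS_ON column names Essn, Pno, and Hours and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PROJECT (Pnumber INTEGER PRIMARY KEY, Pname TEXT);
CREATE TABLE WORKS_ON (Essn TEXT, Pno INTEGER, Hours REAL,
                       PRIMARY KEY (Essn, Pno));
INSERT INTO PROJECT VALUES (1, 'ProductX'), (2, 'ProductY'), (3, 'ProductZ');
INSERT INTO WORKS_ON VALUES
    ('111', 1, 10), ('222', 1, 20), ('333', 1, 5),
    ('111', 2, 8),  ('222', 2, 12), ('333', 3, 40);
""")

# The join condition pairs each project with its WORKS_ON tuples;
# grouping and COUNT are applied to the joined result.
rows = conn.execute("""
    SELECT Pnumber, Pname, COUNT(*)
    FROM PROJECT, WORKS_ON
    WHERE Pnumber = Pno
    GROUP BY Pnumber, Pname
    ORDER BY Pnumber
""").fetchall()
print(rows)
```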
The above SQL statement shows how we can use a join condition in conjunction with GROUP BY. In this case, the
grouping and functions are applied after the joining of the two relations. Sometimes we want to retrieve the values
of these functions only for groups that satisfy certain conditions. For example, suppose that we want to modify the
above SQL statement so that only projects with more than two employees appear in the result. SQL provides a
HAVING clause, which can appear in conjunction with a GROUP BY clause, for this purpose. HAVING provides a
condition on the summary information regarding the group of tuples associated with each value of the grouping
attributes. Only the groups that satisfy the condition are retrieved in the result of the query. This is illustrated below:
Example 3: For each project on which more than two employees work, retrieve the project number, the project
name, and the number of employees who work on the project.
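A minimal sketch of Example 3, assuming the same hypothetical column names (Essn, Pno, Hours) and sample rows as a small test database. The HAVING clause filters whole groups after COUNT has been computed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PROJECT (Pnumber INTEGER PRIMARY KEY, Pname TEXT);
CREATE TABLE WORKS_ON (Essn TEXT, Pno INTEGER, Hours REAL,
                       PRIMARY KEY (Essn, Pno));
INSERT INTO PROJECT VALUES (1, 'ProductX'), (2, 'ProductY'), (3, 'ProductZ');
INSERT INTO WORKS_ON VALUES
    ('111', 1, 10), ('222', 1, 20), ('333', 1, 5),
    ('111', 2, 8),  ('222', 2, 12), ('333', 3, 40);
""")

# WHERE selects tuples before grouping; HAVING selects groups afterwards.
rows = conn.execute("""
    SELECT Pnumber, Pname, COUNT(*)
    FROM PROJECT, WORKS_ON
    WHERE Pnumber = Pno
    GROUP BY Pnumber, Pname
    HAVING COUNT(*) > 2
    ORDER BY Pnumber
""").fetchall()
print(rows)
```

Only ProductX has more than two WORKS_ON tuples in this sample data, so it is the only group retained.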
Example 4: For each project, retrieve the project number, the project name, and the number of employees from
department 5 who work on the project.
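A minimal sketch of Example 4, assuming hypothetical column names (Essn, Pno, Ssn) and sample rows. Here the condition Dno = 5 sits in the WHERE clause, so it restricts the tuples that are counted rather than the groups.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EMPLOYEE (Ssn TEXT PRIMARY KEY, Dno INTEGER);
CREATE TABLE PROJECT (Pnumber INTEGER PRIMARY KEY, Pname TEXT);
CREATE TABLE WORKS_ON (Essn TEXT, Pno INTEGER, Hours REAL,
                       PRIMARY KEY (Essn, Pno));
INSERT INTO EMPLOYEE VALUES ('111', 5), ('222', 5), ('333', 4);
INSERT INTO PROJECT VALUES (1, 'ProductX'), (2, 'ProductY'), (3, 'ProductZ');
INSERT INTO WORKS_ON VALUES
    ('111', 1, 10), ('222', 1, 20), ('333', 1, 5),
    ('111', 2, 8),  ('222', 2, 12), ('333', 3, 40);
""")

# Only employees of department 5 survive the WHERE clause,
# so COUNT(*) counts department-5 workers per project.
rows = conn.execute("""
    SELECT Pnumber, Pname, COUNT(*)
    FROM PROJECT, WORKS_ON, EMPLOYEE
    WHERE Pnumber = Pno AND Ssn = Essn AND Dno = 5
    GROUP BY Pnumber, Pname
    ORDER BY Pnumber
""").fetchall()
print(rows)
```

Note that ProductZ does not appear at all in this sketch: a project with no department-5 workers contributes no tuples after the WHERE clause and therefore forms no group.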
Example 5: Make a list of all project numbers for projects that involve an employee whose last name is ‘Smith’, either
as a worker or as a manager of the department that controls the project.
The above SQL statement can be better expressed using a nested subquery, as follows:
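A minimal sketch of Example 5 in both forms, a UNION formulation and a nested-subquery formulation, run through Python's sqlite3 module. The column names Mgr_ssn, Dnum, Essn, and Pno and all sample rows are assumptions. SQLite does not accept parentheses around the operands of UNION, so they are omitted here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EMPLOYEE (Ssn TEXT PRIMARY KEY, Lname TEXT);
CREATE TABLE DEPARTMENT (Dnumber INTEGER PRIMARY KEY, Dname TEXT, Mgr_ssn TEXT);
CREATE TABLE PROJECT (Pnumber INTEGER PRIMARY KEY, Pname TEXT, Dnum INTEGER);
CREATE TABLE WORKS_ON (Essn TEXT, Pno INTEGER, PRIMARY KEY (Essn, Pno));
INSERT INTO EMPLOYEE VALUES ('111', 'Smith'), ('222', 'Wong');
INSERT INTO DEPARTMENT VALUES (5, 'Research', '222'), (4, 'Administration', '111');
INSERT INTO PROJECT VALUES (1, 'ProductX', 5), (2, 'ProductY', 5), (3, 'Reorganization', 4);
INSERT INTO WORKS_ON VALUES ('111', 1), ('222', 2), ('222', 3);
""")

# Form 1: union of "Smith manages the controlling department"
# and "Smith works on the project".
union_rows = conn.execute("""
    SELECT DISTINCT Pnumber
    FROM PROJECT, DEPARTMENT, EMPLOYEE
    WHERE Dnum = Dnumber AND Mgr_ssn = Ssn AND Lname = 'Smith'
    UNION
    SELECT DISTINCT Pnumber
    FROM PROJECT, WORKS_ON, EMPLOYEE
    WHERE Pnumber = Pno AND Essn = Ssn AND Lname = 'Smith'
    ORDER BY 1
""").fetchall()

# Form 2: the same two conditions as nested subqueries combined with OR.
nested_rows = conn.execute("""
    SELECT DISTINCT Pnumber
    FROM PROJECT
    WHERE Pnumber IN (SELECT Pnumber
                      FROM PROJECT, DEPARTMENT, EMPLOYEE
                      WHERE Dnum = Dnumber AND Mgr_ssn = Ssn AND Lname = 'Smith')
       OR Pnumber IN (SELECT Pno
                      FROM WORKS_ON, EMPLOYEE
                      WHERE Essn = Ssn AND Lname = 'Smith')
    ORDER BY Pnumber
""").fetchall()
print(union_rows, nested_rows)
```

In the sample data Smith works on project 1 and manages the department that controls project 3, so both forms return those two project numbers.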
In V1, we did not specify any new attribute names for the view WORKS_ON1 (although we could have); in this
case, WORKS_ON1 inherits the names of the view attributes from the defining tables EMPLOYEE, PROJECT, and
WORKS_ON. View V2 explicitly specifies new attribute names for the view DEPT_INFO, using a one-to-one
correspondence between the attributes specified in the CREATE VIEW clause and those specified in the SELECT clause
of the query that defines the view.
We can now specify SQL queries on a view—or virtual table—in the same way we specify queries involving base
tables. For example, to retrieve the last name and first name of all employees who work on the ‘ProductX’ project,
we can utilize the WORKS_ON1 view and specify the query as:
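A minimal sketch of such a query on the view, run through Python's sqlite3 module. The view definition shown for WORKS_ON1 (selecting Fname, Lname, Pname, Hours), the column names Essn and Pno, and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EMPLOYEE (Ssn TEXT PRIMARY KEY, Fname TEXT, Lname TEXT);
CREATE TABLE PROJECT (Pnumber INTEGER PRIMARY KEY, Pname TEXT);
CREATE TABLE WORKS_ON (Essn TEXT, Pno INTEGER, Hours REAL,
                       PRIMARY KEY (Essn, Pno));
INSERT INTO EMPLOYEE VALUES
    ('111', 'John', 'Smith'), ('222', 'Joyce', 'English'), ('333', 'Franklin', 'Wong');
INSERT INTO PROJECT VALUES (1, 'ProductX'), (2, 'ProductY');
INSERT INTO WORKS_ON VALUES ('111', 1, 32.5), ('222', 1, 20.0), ('333', 2, 10.0);

-- Assumed definition of the view; attribute names are inherited
-- from the defining tables because none are listed explicitly.
CREATE VIEW WORKS_ON1 AS
    SELECT Fname, Lname, Pname, Hours
    FROM EMPLOYEE, PROJECT, WORKS_ON
    WHERE Ssn = Essn AND Pno = Pnumber;
""")

# The view is queried exactly like a base table; no join is spelled out.
rows = conn.execute("""
    SELECT Fname, Lname
    FROM WORKS_ON1
    WHERE Pname = 'ProductX'
    ORDER BY Lname
""").fetchall()
print(rows)
```

The same result could be obtained by writing the two joins directly on the base tables; the view simply hides them.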
If we do not need a view any more, we can use the DROP VIEW command to dispose of it. For example, to get rid
of the view V1, we can use the SQL statement:
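A minimal sketch of dropping the view, run through Python's sqlite3 module. The simplified stand-in definition of WORKS_ON1 is an assumption; only the DROP VIEW statement itself matters here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE WORKS_ON (Essn TEXT, Pno INTEGER, Hours REAL);
-- Simplified stand-in for the view defined in V1.
CREATE VIEW WORKS_ON1 AS SELECT Essn, Pno, Hours FROM WORKS_ON;
""")


def names(kind):
    """List schema objects of the given kind ('view' or 'table')."""
    cur = conn.execute("SELECT name FROM sqlite_master WHERE type = ?", (kind,))
    return [row[0] for row in cur]


before = names("view")
conn.execute("DROP VIEW WORKS_ON1")
after = names("view")
tables = names("table")
print(before, after, tables)
```

Dropping the view removes only its definition; the base table WORKS_ON and its tuples are untouched.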
4.6 Normalization
4.6.1 Anomalies in relational database design
Storing natural joins of base relations obviously leads to redundancy and an additional problem referred to as
update anomalies. These can be classified into insertion anomalies, deletion anomalies, and modification anomalies.
Insertion Anomalies. Insertion anomalies can be differentiated into two types, illustrated by the following
examples based on the EMP_DEPT relation shown in Figure 4.1(a).
Figure 4.1: Two relation schemas suffering from update anomalies. (a) EMP_DEPT and (b) EMP_PROJ.
■ To insert a new employee tuple into EMP_DEPT, we must include either the attribute
values for the department that the employee works for, or NULLs (if the employee does
not work for a department as yet). For example, to insert a new tuple for an employee
who works in department number 5, we must enter all the attribute values of department
5 correctly so that they are consistent with the corresponding values for department 5
in other tuples in EMP_DEPT.
■ It is difficult to insert a new department that has no employees as yet in the
EMP_DEPT relation. The only way to do this is to place NULL values in the attributes
for employee. This violates the entity integrity for EMP_DEPT because Ssn is its
primary key. Moreover, when the first employee is assigned to that department, we do
not need this tuple with NULL values any more.
Deletion Anomalies. The problem of deletion anomalies is related to the second insertion anomaly situation just
discussed. If we delete from EMP_DEPT an employee tuple that happens to represent the last employee working for
a particular department, the information concerning that department is lost from the database.
Modification Anomalies. In EMP_DEPT, if we change the value of one of the attributes of a particular department—
say, the manager of department 5—we must update the tuples of all employees who work in that department;
otherwise, the database will become inconsistent. If we fail to update some tuples, the same department will be
shown to have two different values for manager in different employee tuples, which would be wrong.
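The three anomalies can be reproduced on a small instance. This is a sketch only: the reduced EMP_DEPT attribute list and the sample tuples are assumptions, and the statements are run through Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EMP_DEPT (
    Ename    TEXT,
    Ssn      TEXT NOT NULL PRIMARY KEY,
    Dnumber  INTEGER,
    Dname    TEXT,
    Dmgr_ssn TEXT
);
INSERT INTO EMP_DEPT VALUES
    ('Smith',  '111', 5, 'Research',       '333'),
    ('Wong',   '333', 5, 'Research',       '333'),
    ('Zelaya', '999', 4, 'Administration', '987');
""")

# Insertion anomaly: a department without employees needs a NULL Ssn,
# which the primary key (entity integrity) forbids.
try:
    conn.execute("INSERT INTO EMP_DEPT VALUES (NULL, NULL, 1, 'Headquarters', '888')")
    insert_rejected = False
except sqlite3.IntegrityError:
    insert_rejected = True

# Deletion anomaly: removing the last employee of department 4
# also removes everything known about department 4.
conn.execute("DELETE FROM EMP_DEPT WHERE Ssn = '999'")
dept4_rows = conn.execute(
    "SELECT COUNT(*) FROM EMP_DEPT WHERE Dnumber = 4"
).fetchone()[0]

# Modification anomaly: updating the manager in only one tuple
# leaves department 5 with two different managers.
conn.execute("UPDATE EMP_DEPT SET Dmgr_ssn = '111' WHERE Ssn = '111'")
inconsistent = conn.execute("""
    SELECT Dnumber, COUNT(DISTINCT Dmgr_ssn)
    FROM EMP_DEPT
    GROUP BY Dnumber
    HAVING COUNT(DISTINCT Dmgr_ssn) > 1
""").fetchall()
print(insert_rejected, dept4_rows, inconsistent)
```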
4.6.2 Decomposition
These three anomalies are undesirable: they make it difficult to maintain the consistency of data and they require
unnecessary updates. The solution to all of these problems is to decompose the tables into base
tables.
(a)
EMPLOYEE(Ename, Ssn, Bdate, Address, Dno)
DEPARTMENT(Dnumber, Dname, Dmgr_ssn)
(b)
WORKS_ON(Ssn, Pnumber, Hours)
PROJECT(Pnumber, Ename, Pname, Plocation)
state r of R. The constraint is that, for any two tuples t1 and t2 in r that have t1[X] = t2[X],
they must also have t1[Y] = t2[Y].
Figure 4.3: A relation state of TEACH with a possible functional dependency TEXT → COURSE. However, TEACHER → COURSE is ruled out.
See the illustrative example relation in Figure 4.4. Here, the following FDs may hold because the four tuples in the
current extension have no violation of these constraints: B → C; C → B; {A, B} → C; {A, B} → D; and {C, D} → B.
However, the following do not hold because we already have violations of them in the given extension: A → B
(tuples 1 and 2 violate this constraint); B → A (tuples 2 and 3 violate this constraint); D → C (tuples 3 and 4 violate
it).
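The check behind these statements can be sketched in a few lines of Python. The helper name fd_holds and the four tuples below are assumptions: the tuples are one relation state that is consistent with every claim made above. A passing check only shows that an FD is not violated by this particular state; it does not prove that the FD holds for the schema.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # The first row with a given X value fixes the expected Y value.
        if seen.setdefault(x, y) != y:
            return False
    return True


# Assumed relation state over attributes A, B, C, D (tuples 1 to 4).
state = [
    {"A": "a1", "B": "b1", "C": "c1", "D": "d1"},
    {"A": "a1", "B": "b2", "C": "c2", "D": "d2"},
    {"A": "a2", "B": "b2", "C": "c2", "D": "d3"},
    {"A": "a3", "B": "b3", "C": "c4", "D": "d3"},
]

may_hold = [
    fd_holds(state, ["B"], ["C"]),
    fd_holds(state, ["C"], ["B"]),
    fd_holds(state, ["A", "B"], ["C"]),
    fd_holds(state, ["A", "B"], ["D"]),
    fd_holds(state, ["C", "D"], ["B"]),
]
violated = [
    fd_holds(state, ["A"], ["B"]),  # tuples 1 and 2
    fd_holds(state, ["B"], ["A"]),  # tuples 2 and 3
    fd_holds(state, ["D"], ["C"]),  # tuples 3 and 4
]
print(may_hold, violated)
```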
attributes of a relation. Later, a fourth normal form (4NF) and a fifth normal form (5NF) were proposed, based on
the concepts of multivalued dependencies and join dependencies, respectively.
Normalization of data can be considered a process of analyzing the given relation schemas based on their FDs and
primary keys to achieve the desirable properties of (1) minimizing redundancy and (2) minimizing the insertion,
deletion, and update anomalies discussed in Section 4.6.1. It can be considered as a “filtering” or “purification”
process to make the design have successively better quality. Unsatisfactory relation schemas that do not meet certain
conditions—the normal form tests—are decomposed into smaller relation schemas that meet the tests and hence
possess the desirable properties.
Definition. The normal form of a relation refers to the highest normal form condition that it
meets, and hence indicates the degree to which it has been normalized.
Normal forms, when considered in isolation from other factors, do not guarantee a good database design. It is
generally not sufficient to check separately that each relation schema in the database is, say, in BCNF or 3NF. Rather,
the process of normalization through decomposition must also confirm the existence of additional properties that
the relational schemas, taken together, should possess. These would include two properties:
■ The nonadditive join or lossless join property, which guarantees that the spurious tuple
generation problem does not occur with respect to the relation schemas created after
decomposition.
■ The dependency preservation property, which ensures that each functional dependency is
represented in some individual relation resulting after decomposition.
The nonadditive join property is extremely critical and must be achieved at any cost, whereas the dependency
preservation property, although desirable, is sometimes sacrificed.
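The spurious-tuple problem can be illustrated with a short Python sketch. The helper names project and natural_join, the reduced EMP_PROJ attribute list, and the two sample tuples are assumptions. The first decomposition joins on Plocation, which determines nothing, and so generates spurious tuples; the second joins on Ssn, which determines Ename, and so recovers the original state exactly.

```python
def project(rows, attrs):
    """Projection: keep only attrs, with duplicates removed."""
    return {tuple((a, r[a]) for a in attrs) for r in rows}


def natural_join(r1, r2):
    """Natural join of two projected relations on their common attributes."""
    out = set()
    for t1 in r1:
        d1 = dict(t1)
        for t2 in r2:
            d2 = dict(t2)
            if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
                out.add(tuple(sorted({**d1, **d2}.items())))
    return out


# Assumed state of a reduced EMP_PROJ relation.
emp_proj = [
    {"Ssn": "111", "Pnumber": 1, "Ename": "Smith", "Plocation": "Bellaire"},
    {"Ssn": "222", "Pnumber": 2, "Ename": "Wong", "Plocation": "Bellaire"},
]
original = {tuple(sorted(r.items())) for r in emp_proj}

# Additive (lossy) decomposition: the join attribute is Plocation.
lossy = natural_join(
    project(emp_proj, ["Ename", "Plocation"]),
    project(emp_proj, ["Ssn", "Pnumber", "Plocation"]),
)

# Nonadditive (lossless) decomposition: the join attribute is Ssn.
lossless = natural_join(
    project(emp_proj, ["Ssn", "Ename"]),
    project(emp_proj, ["Ssn", "Pnumber", "Plocation"]),
)
print(len(original), len(lossy), len(lossless))
```

The lossy join contains every original tuple plus extra ones, which is why the property is called nonadditive: a correct decomposition adds nothing when joined back.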
First normal form (1NF) states that the domain of an attribute must include only atomic (simple, indivisible) values and
that the value of any attribute in a tuple must be a single value from the domain of that attribute.
Hence, 1NF disallows having a set of values, a tuple of values, or a combination of both as an attribute value for a
single tuple. In other words, 1NF disallows relations within relations or relations as attribute values within tuples.
The only attribute values permitted by 1NF are single atomic (or indivisible) values.
Consider the DEPARTMENT schema shown in Figure 4.5(a), whose primary key is Dnumber; Figure 4.5(b) shows a sample state.
We assume that each department can have a number of locations. There are two ways we
can look at the Dlocations attribute:
■ The domain of Dlocations contains atomic values, but some tuples can have a set of these values. In this case,
Dlocations is not functionally dependent on the primary key Dnumber.
■ The domain of Dlocations contains sets of values and hence is nonatomic. In this case, Dnumber → Dlocations
because each set is considered a single member of the attribute domain.
Figure 4.5: Normalization into 1NF. (a) A relation schema that is not in 1NF. (b) Sample state of relation DEPARTMENT.
In either case, the DEPARTMENT relation in Figure 4.5 is not in 1NF. There are three main techniques to achieve
first normal form for such a relation:
1. Remove the attribute Dlocations that violates 1NF and place it in a separate relation DEPT_LOCATIONS along
with the primary key Dnumber of DEPARTMENT. The primary key of this relation is the combination
{Dnumber, Dlocation}, as shown in Figure 4.6. A distinct tuple in DEPT_LOCATIONS exists for each location of
a department. This decomposes the non-1NF relation into two 1NF relations.
2. Expand the key so that there will be a separate tuple in the original DEPARTMENT relation for each location
of a DEPARTMENT, as shown in Figure 4.5(c). In this case, the primary key becomes the combination
{Dnumber, Dlocation}. This solution has the disadvantage of introducing redundancy in the relation.
3. If a maximum number of values is known for the attribute—for example, if it is known that at most three
locations can exist for a department—replace the Dlocations attribute by three atomic attributes:
Dlocation1, Dlocation2, and Dlocation3. This solution has the disadvantage of introducing NULL values if most
departments have fewer than three locations. It further introduces spurious semantics about the ordering
among the location values that is not originally intended. Querying on this attribute becomes more difficult;
for example, consider how you would write the query: List the departments that have ‘Bellaire’ as one of
their locations in this design.
Of the three solutions above, the first is generally considered best because it does not suffer from redundancy
and it is completely general, having no limit placed on a maximum number of values.
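A minimal sketch contrasting the first and third techniques on the 'Bellaire' query, run through Python's sqlite3 module. The table name DEPARTMENT_WIDE and the sample rows are assumptions introduced only for this comparison.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Technique 1: locations moved to their own relation.
CREATE TABLE DEPARTMENT (Dnumber INTEGER PRIMARY KEY, Dname TEXT);
CREATE TABLE DEPT_LOCATIONS (
    Dnumber   INTEGER REFERENCES DEPARTMENT (Dnumber),
    Dlocation TEXT,
    PRIMARY KEY (Dnumber, Dlocation)
);
INSERT INTO DEPARTMENT VALUES (5, 'Research'), (4, 'Administration'), (1, 'Headquarters');
INSERT INTO DEPT_LOCATIONS VALUES
    (5, 'Bellaire'), (5, 'Sugarland'), (5, 'Houston'),
    (4, 'Stafford'), (1, 'Houston');

-- Technique 3: a fixed number of location columns, padded with NULLs.
CREATE TABLE DEPARTMENT_WIDE (
    Dnumber INTEGER PRIMARY KEY, Dname TEXT,
    Dlocation1 TEXT, Dlocation2 TEXT, Dlocation3 TEXT
);
INSERT INTO DEPARTMENT_WIDE VALUES
    (5, 'Research', 'Bellaire', 'Sugarland', 'Houston'),
    (4, 'Administration', 'Stafford', NULL, NULL),
    (1, 'Headquarters', 'Houston', NULL, NULL);
""")

# Technique 1: one condition, regardless of how many locations exist.
first = conn.execute("""
    SELECT D.Dname
    FROM DEPARTMENT D JOIN DEPT_LOCATIONS L ON D.Dnumber = L.Dnumber
    WHERE L.Dlocation = 'Bellaire'
""").fetchall()

# Technique 3: every location column must be tested separately.
third = conn.execute("""
    SELECT Dname
    FROM DEPARTMENT_WIDE
    WHERE Dlocation1 = 'Bellaire' OR Dlocation2 = 'Bellaire' OR Dlocation3 = 'Bellaire'
""").fetchall()
print(first, third)
```

Both queries return the same department, but the second must be rewritten whenever the maximum number of locations changes.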
The EMP_PROJ relation in Figure 4.7 is in 1NF but is not in 2NF. The nonprime attribute Ename violates
2NF because of FD2, as do the nonprime attributes Pname and Plocation because of FD3. The functional
dependencies FD2 and FD3 make Ename, Pname, and Plocation partially dependent on the primary key
{Ssn, Pnumber} of EMP_PROJ, thus violating the 2NF test.
The functional dependencies FD1, FD2, and FD3 lead to the decomposition of EMP_PROJ into the three
relation schemas EP1, EP2, and EP3 shown in Figure 4.8, each of which is in 2NF.
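The 2NF test on EMP_PROJ can be sketched as a small check for partial dependencies. The helper name partial_dependencies is hypothetical, and FD1 is assumed here to be {Ssn, Pnumber} → Hours. The sketch also assumes that the primary key is the only candidate key, so every attribute outside it is nonprime.

```python
def partial_dependencies(key, fds):
    """List FDs whose left side is a proper subset of the key
    and whose right side contains nonprime attributes."""
    key = frozenset(key)
    violations = []
    for lhs, rhs in fds:
        nonprime = sorted(set(rhs) - key)
        if frozenset(lhs) < key and nonprime:
            violations.append((sorted(lhs), nonprime))
    return violations


primary_key = {"Ssn", "Pnumber"}
fds = [
    ({"Ssn", "Pnumber"}, {"Hours"}),       # FD1 (assumed): full dependency
    ({"Ssn"}, {"Ename"}),                  # FD2: partial dependency
    ({"Pnumber"}, {"Pname", "Plocation"}), # FD3: partial dependency
]

violations = partial_dependencies(primary_key, fds)
print(violations)
```

Each reported dependency corresponds to one relation that is split off during 2NF normalization, with the left side of the dependency as its key.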
Definition. According to Codd’s original definition, a relation schema R is in 3NF if it satisfies 2NF and
no nonprime attribute of R is transitively dependent on the primary key.
A functional dependency X → Y in a relation schema R is a transitive dependency if there exists a set of attributes Z
in R that is neither a candidate key nor a subset of any key of R, and both X → Z and Z → Y hold.
The dependency Ssn → Dmgr_ssn is transitive through Dnumber in EMP_DEPT in Figure 4.9, because both the
dependencies Ssn → Dnumber and Dnumber → Dmgr_ssn hold and Dnumber is neither a key itself nor a subset of
the key of EMP_DEPT. Intuitively, we can see that the dependency of Dmgr_ssn on Dnumber is undesirable in
EMP_DEPT since Dnumber is not a key of EMP_DEPT.
The relation schema EMP_DEPT is in 2NF, since no partial dependencies on a key exist. However, EMP_DEPT is not
in 3NF because of the transitive dependency of Dmgr_ssn (and also Dname) on Ssn via Dnumber. We can normalize
EMP_DEPT by decomposing it into the two 3NF relation schemas ED1 and ED2 shown in Figure 4.10.
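A minimal sketch of the decomposed design, run through Python's sqlite3 module. The reduced attribute lists chosen for ED1 and ED2 and the sample tuples are assumptions. Because each department's manager is now stored once, a single-tuple update suffices, and deleting an employee no longer deletes the department.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ED2 (Dnumber INTEGER PRIMARY KEY, Dname TEXT, Dmgr_ssn TEXT);
CREATE TABLE ED1 (
    Ename   TEXT,
    Ssn     TEXT PRIMARY KEY,
    Dnumber INTEGER REFERENCES ED2 (Dnumber)
);
INSERT INTO ED2 VALUES (5, 'Research', '333'), (4, 'Administration', '987');
INSERT INTO ED1 VALUES ('Smith', '111', 5), ('Wong', '333', 5), ('Zelaya', '999', 4);
""")

# Changing the manager of department 5 touches exactly one tuple.
updated = conn.execute("UPDATE ED2 SET Dmgr_ssn = '111' WHERE Dnumber = 5").rowcount

# Joining ED1 and ED2 shows the new manager for every employee of department 5.
rows = conn.execute("""
    SELECT Ename, Dmgr_ssn
    FROM ED1 JOIN ED2 ON ED1.Dnumber = ED2.Dnumber
    WHERE ED1.Dnumber = 5
    ORDER BY Ename
""").fetchall()

# Deleting the last employee of department 4 keeps the department itself.
conn.execute("DELETE FROM ED1 WHERE Ssn = '999'")
dept4_kept = conn.execute("SELECT COUNT(*) FROM ED2 WHERE Dnumber = 4").fetchone()[0]
print(updated, rows, dept4_kept)
```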
Figure 4.11 Summary of Normal Forms Based on Primary Keys and Corresponding Normalization