Unit-1
Structure
1.0 Introduction
1.1 Objectives
1.2 Concepts of a Relational Model
1.3 Formal Definition of a Relation
1.4 The Codd Commandments
1.5 Relational Algebra
1.6 Relational Completeness
1.7 Summary
1.8 Model Answers
1.9 Further Reading
1.0 INTRODUCTION
One of the main advantages of the relational model is that it is conceptually simple and, more
importantly, based on the mathematical theory of relations. It also frees the users from details of
storage structures and access methods.
The relational model like all other models consists of three basic components:
a set of domains and a set of relations
operations on relations
integrity rules
In this unit, we first provide the formal definition of a relational data model. Then we define
basic operations of relational algebra and finally discuss the integrity rules.
1.1 OBJECTIVES
After completing this unit, you will be able to:
define the concepts of relational model
discuss the basic operations of the relational algebra
state the integrity rules
Property 5: the columns of a table are assigned distinct names and the ordering of these
columns is immaterial.
[Figure: Example of a valid relation]
In general we say that a relation defined over n domains has a degree n or is n-ary. The
elements of this set are n-tuples.
We shall distinguish between the definition of a relation and the relation itself. We shall say
that the definition of a relation gives a name to the relation and specifies the components
over which it is defined. These components are referred to as relation attributes or attributes
for short. An attribute has a domain associated with it from which it takes on values. The
relation itself, on the other hand, is the set of tuples which constitute it at a given instant of
time. For example, a statement which says that a relation Supplier is built over attributes S#,
P#, SCITY having domains integer, integer and character string respectively is the definition of
the relation Supplier. The relation itself is shown below. It must be noted that at the time the
definition of a relation is just given, the relation has no tuples in it, i.e. it is a null relation.
Supplier
S#    P#    SCITY
10    1     BANGALORE
10    2     BANGALORE
10    3     BANGALORE
11    1     BOMBAY
11    2     BOMBAY
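The distinction between the definition of a relation and the relation itself can be sketched in Python. This is only an illustration: using Python types to stand in for domains, and the names supplier_schema and conforms, are assumptions of the sketch and not part of the relational model.

```python
# Definition of the relation: attribute names with their domains.
# Python types standing in for domains is an assumption of this sketch.
supplier_schema = {"S#": int, "P#": int, "SCITY": str}

# The relation itself: the set of tuples present at one instant of time.
supplier = {
    (10, 1, "BANGALORE"),
    (10, 2, "BANGALORE"),
    (10, 3, "BANGALORE"),
    (11, 1, "BOMBAY"),
    (11, 2, "BOMBAY"),
}


def conforms(relation, schema):
    """Check that every tuple has one value per attribute, drawn from its domain."""
    domains = list(schema.values())
    return all(
        len(t) == len(domains)
        and all(isinstance(value, dom) for value, dom in zip(t, domains))
        for t in relation
    )


degree = len(supplier_schema)   # number of attributes (the relation is 3-ary)
cardinality = len(supplier)     # number of tuples at this instant
empty_supplier = set()          # the relation just after it has been defined
```

The schema stays fixed while the set of tuples changes over time, which is why the relation, and hence the database, is time varying.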
We can now define the notion of a relational database or database for short. A database is a
collection of relations of assorted degrees such that these relations are in accordance with
their definitions in the relational schema. Since a relation is time varying, by this definition
we can infer that a database is also time varying.
This rule allows many types of database design change to be made dynamically, without
users being aware of them. To illustrate the meaning of the rule the examples on the next
page show two types of activity, described in more detail later, that should be possible if this
rule is enforced.
Integrity Rule 1
Integrity rule 1 is concerned with primary key values. Before we formally state the rule, let
us look at the effect of null values in prime attributes. A null value for an attribute is a value
that is either not known at the time or does not apply to a given instance of the object. It
may also be possible that a particular tuple does not have a value for an attribute; this fact
could be represented by a null value.
If any attribute of a primary key (prime attribute) were permitted to have null values, then,
because the attributes in the key must be nonredundant, the key cannot be used for unique
identification of tuples. This contradicts the requirements for a primary key. Consider the
relation P in figure 3. The attribute Id is the primary key for P. If null values (represented as
@) were permitted, as in figure 3(b), then the two tuples <@, Smith> are indistinguishable, even
though they may represent two different instances of the entity type employee. Similarly, the
tuples <@, Lalonde> and <104, Lalonde>, for all intents and purposes, are also
indistinguishable and may be referring to the same person. As instances of entities are
distinguishable, so must be their surrogates in the model.
(a)                   (b)
Id     Name           Id     Name
101    Jones          101    Jones
103    Smith          @      Smith
104    Lalonde        104    Lalonde
107    Evan           107    Evan
110    Drew           110    Drew
112    Smith          @      Lalonde
                      @      Smith
Figure 3: (a) Relation without null values and (b) relation with null values
Integrity rule 1 specifies that instances of the entities are distinguishable and thus no prime
attribute (component of a primary key) value may be null. This rule is also referred to as the
entity rule. We could state this rule formally as:
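As an informal illustration (not the formal statement of the rule), the entity rule can be sketched as a check in Python. The tuples follow figure 3 as far as it can be read; representing the null marker @ by None and the helper name violates_entity_integrity are assumptions of this sketch.

```python
NULL = None  # stands in for the null marker written as @ in figure 3


def violates_entity_integrity(relation, key_positions):
    """Return the tuples in which some prime attribute is null."""
    return [t for t in relation if any(t[i] is NULL for i in key_positions)]


# Relation P with primary key Id at position 0.
p_without_nulls = [(101, "Jones"), (103, "Smith"), (104, "Lalonde"),
                   (107, "Evan"), (110, "Drew"), (112, "Smith")]
p_with_nulls = [(101, "Jones"), (NULL, "Smith"), (104, "Lalonde"),
                (107, "Evan"), (110, "Drew"), (NULL, "Lalonde"), (NULL, "Smith")]

bad = violates_entity_integrity(p_with_nulls, key_positions=[0])

# With a null key, two tuples for different employees become identical.
collapsed = len(set(p_with_nulls)) < len(p_with_nulls)
```

Here bad lists the three tuples that the entity rule would reject, and collapsed shows that the two <@, Smith> tuples can no longer be told apart.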
Example
Consider the example of employees and their managers. Each employee has a manager and, as
managers are also employees, we may represent managers by their employee numbers, if the
employee number is a key of the relation employee. Figure 4 illustrates an example of such an
employee relation. The Manager attribute represents the employee number of the manager. Manager
is a foreign key; note that it is referring to the primary key of the same relation. An employee
can only have a manager who is also an employee. The chief executive officer (CEO) of the
company can have himself or herself as the manager, or the attribute may take a null value. Some
employees may also be temporarily without a manager, and this can be represented by the
Manager attribute taking null values.
Figure 4: Foreign Keys
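The foreign key constraint described in the example above can be sketched as a check in Python. The tuples are hypothetical and are not the contents of figure 4; the layout EMPLOYEE(Emp#, Name, Manager), the use of None for null and the helper name dangling_references are all assumptions of this sketch.

```python
NULL = None  # assumed representation of a null value

# Hypothetical EMPLOYEE(Emp#, Name, Manager) tuples; not the data of figure 4.
employee = [
    (101, "Jones", NULL),    # e.g. the CEO, with no manager recorded
    (103, "Smith", 101),
    (104, "Lalonde", 103),
    (107, "Evan", NULL),     # temporarily without a manager
]


def dangling_references(relation, fk_pos, referenced, pk_pos):
    """Return tuples whose non-null foreign key matches no primary key value."""
    keys = {t[pk_pos] for t in referenced}
    return [t for t in relation
            if t[fk_pos] is not NULL and t[fk_pos] not in keys]


# Manager (position 2) refers to Emp# (position 0) of the same relation.
ok = dangling_references(employee, 2, employee, 0)
broken = dangling_references(employee + [(110, "Drew", 999)], 2, employee, 0)
```

A null Manager value is accepted, while a manager number that matches no employee is reported.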
their appropriate attribute values have been set to null. The insertion of a tuple with a
foreign key reference or the update of the foreign key attributes of a relation require a check
The practical importance of these rules is difficult to estimate, and depends largely on the
RDBMS in question, its proposed use and individual viewpoints, but the theoretical
importance is undeniable. It is interesting to see how some of the rules relate to others, and
to some of the more important advantages of the relational model. It is unlikely at the
present time that any RDBMS can claim full logical data independence because of their
generally poor ability to handle updating through views. Even token adherence to this rule,
however, when combined with facilities enabling physical data independence, potentially
yields advantages to application developers unheard of with any other type of database
system. Coupling these two rules with the data independence and distribution independence
rules can take the protection of customer investment to new heights.
The beauty of the relational database is that the concepts that define it are few, easy to
understand and explicit. The 12 rules explained can be used as the basic relational design
criteria, and as such are clear indications of the purity of the relational concept. Whilst you
do not find these rules being quoted so often these days as in the recent past, it does not mean
that they are any less important. Rather it can be interpreted as reflecting a reduced
importance as propaganda. Other factors, of which performance is the most obvious, have
now taken precedence.
Basic Operations
Basic operations are the traditional set operations: union, difference, intersection and
cartesian product. Three of these four basic operations - union, intersection, and difference -
require that operand relations be union compatible. Two relations are union compatible if
they have the same arity and a one-to-one correspondence of the attributes, with the
corresponding attributes defined over the same domain. The cartesian product can be
defined on any two relations. Two relations P and Q are said to be union compatible if both
P and Q are of the same degree n and the domains of the corresponding n attributes are
identical, i.e. if P = {P1, ..., Pn} and Q = {Q1, ..., Qn} then
Dom(Pi) = Dom(Qi) for i = 1, 2, ..., n
where Dom(Pi) represents the domain of the attribute Pi.
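The union compatibility test can be sketched in Python as follows. Representing a relation scheme as an ordered list of (attribute name, domain) pairs, with Python types as domains, is an assumption of this sketch, as are the example schemes.

```python
def union_compatible(p_schema, q_schema):
    """Schemes are ordered lists of (attribute name, domain) pairs."""
    if len(p_schema) != len(q_schema):   # both must have the same degree n
        return False
    # Dom(Pi) = Dom(Qi) for i = 1..n; the attribute names themselves may differ.
    return all(dp == dq for (_, dp), (_, dq) in zip(p_schema, q_schema))


P = [("Id", int), ("Name", str)]
Q = [("Emp#", int), ("EmpName", str)]   # same domains, different names
S = [("S", str)]                        # different degree
T = [("Name", str), ("Id", int)]        # same degree, domains do not correspond
```

Under these assumptions P and Q are union compatible, while P is compatible with neither S nor T.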
Example 1
In the examples to follow, we utilise two relations P and Q given in Figure 5. R is a
computed result relation. We assume that the relations P and Q in Figure 5 represent employees
working on the development of software application packages J1 and J2 respectively.
[Figure 5: Relations P and Q, each with attributes Id and Name]
If we assume that P and Q are two union-compatible relations, then the union of P and Q is
the set-theoretic union of P and Q. The resultant relation, R = P ∪ Q, has tuples drawn from
P and Q such that
R = {t | t ∈ P ∨ t ∈ Q}
The result relation R contains tuples that are in either P or Q or in both of them. The
duplicate tuples are eliminated.
Remember that from our definition of union compatibility the degree of the relations P, Q and R
is the same. The cardinality of the resultant relation depends on the duplication of tuples in P
and Q. From the above expression, we can see that if all the tuples in Q were contained in P,
then |R| = |P| and R = P, while if the tuples in P and Q were disjoint, then |R| = |P| + |Q|.
Example 2
R, the union of P and Q given in Figure 5 in example 1 above, is shown in Figure 6(a). R
represents employees working on the packages J1 or J2, or both of these packages. Since a
relation does not have duplicate tuples, an employee working on both J1 and J2 will appear in
the relation R only once.
[Figure 6: (a) P ∪ Q, (b) P − Q, (c) P ∩ Q]
Difference (−)
The difference operation removes common tuples from the first relation.
R = P − Q such that
R = {t | t ∈ P ∧ t ∉ Q}
Example 3
R, the result of P − Q, gives employees working only on package J1 (figure 6(b) in example
2). Employees working on both packages J1 and J2 have been removed.
Intersection (∩)
The intersection operation selects the common tuples from the two relations.
Example 4
The resultant relation of P ∩ Q is the set of all employees working on both the packages
(figure 6(c) of example 2).
The intersection operation is really unnecessary. It can be very simply expressed as:
P ∩ Q = P − (P − Q)
It is, however, more convenient to write an expression with a single intersection operation
than one involving a pair of difference operations.
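The three union-compatible operations map directly onto Python sets, which also discard duplicate tuples. The tuples below are illustrative only: they are assumed contents for P and Q, chosen from the employee Ids used elsewhere in this unit, and are not quoted from figure 5.

```python
# Illustrative (Id, Name) tuples; assumed contents, not quoted from figure 5.
P = {(101, "Jones"), (103, "Smith"), (104, "Lalonde"), (107, "Evan")}   # package J1
Q = {(103, "Smith"), (104, "Lalonde"), (106, "Byron"), (110, "Drew")}   # package J2

union = P | Q          # employees on J1 or J2 or both, each listed once
difference = P - Q     # employees working only on J1
intersection = P & Q   # employees working on both packages

# Intersection expressed with a pair of difference operations.
identity_holds = intersection == P - (P - Q)

# The cardinality of the union lies between the two extreme cases in the text.
bounds_hold = max(len(P), len(Q)) <= len(union) <= len(P) + len(Q)

# Difference, unlike union and intersection, is not commutative.
not_commutative = (P - Q) != (Q - P)
```

With these tuples the union has six members, because the two employees common to P and Q appear only once.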
Note that in these examples the operand and the result relation schemes, including the
attribute names, are identical, i.e. P = Q = R. If the attribute names of compatible relations
are not identical, the naming of the attributes of the result relation will have to be resolved.
Cartesian Product (×)
The cartesian product of two relations P and Q is the relation R = P × Q,
where a tuple r ∈ R is given by {t1 || t2 | t1 ∈ P ∧ t2 ∈ Q}, i.e. the result relation is obtained
by concatenating each tuple in relation P with each tuple in relation Q. Here, || represents the
concatenation operation.
The scheme of the result relation is given by:
Example 5
The cartesian product of the PERSONNEL relation and SOFTWARE-PACKAGE relations
of figure 7(a) is shown in figure 7(b). Note that the relations P and Q from figure 5 of
example 1 are subsets of the PERSONNEL relation.
Id     P.Name     S
101    Jones      J1
101    Jones      J2
103    Smith      J1
103    Smith      J2
104    Lalonde    J1
104    Lalonde    J2
106    Byron      J1
106    Byron      J2
107    Evan       J1
107    Evan       J2
110    Drew       J1
110    Drew       J2
112    Smith      J1
112    Smith      J2
Figure 7: (a) PERSONNEL (Emp#, Name) and SOFTWARE-PACKAGES (S) represent employees and
software packages respectively; (b) the cartesian product of PERSONNEL and SOFTWARE-PACKAGES
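The cartesian product of figure 7(b) can be reproduced with a short Python sketch. The operand tuples are read back from the product shown in the figure; the variable names are assumptions of the sketch.

```python
from itertools import product

# Operand relations, read back from the product shown in figure 7(b).
personnel = [(101, "Jones"), (103, "Smith"), (104, "Lalonde"), (106, "Byron"),
             (107, "Evan"), (110, "Drew"), (112, "Smith")]
software_packages = [("J1",), ("J2",)]

# Concatenate each tuple of the first relation with each tuple of the second.
result = {p + s for p, s in product(personnel, software_packages)}

result_degree = len(next(iter(result)))   # sum of the operand degrees
result_cardinality = len(result)          # product of the operand cardinalities
```

The degree of the result is the sum of the operand degrees (2 + 1) and its cardinality is the product of their cardinalities (7 × 2).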
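The union and intersection operations are associative and commutative; therefore, given
relations R, S, T:
R ∪ (S ∪ T) = (R ∪ S) ∪ T and R ∪ S = S ∪ R
R ∩ (S ∩ T) = (R ∩ S) ∩ T and R ∩ S = S ∩ R
The difference operation, in general, is nonassociative:
R − (S − T) ≠ (R − S) − T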
The basic set operations are supplemented by the definition of the following operations:
projection, selection, join, and division. These operations are represented by the symbols
π, σ, ⋈, and ÷ respectively. Projection and selection are unary operations; join and division
are binary.
Projection (π)
The projection of a relation is defined as a projection of all its tuples over some set of
attributes, i.e., it yields a vertical subset of the relation. The projection operation is used to
either reduce the number of attributes in the resultant relation or to reorder attributes. In the
first case, the arity (or degree) of the relation is reduced. The projection operation is shown
graphically in figure 8. Figure 8 shows the projection of the relation PERSONNEL on the
attribute Name. The cardinality of the result relation is also reduced due to the deletion of
duplicate tuples.
We define the projection of a tuple ti over the attribute A, denoted ti[A] or π_A(ti), as (a),
where a is the value of the tuple ti over the attribute A.
PERSONNEL:
Id     Name                Name
101    Jones               Jones
103    Smith               Smith
104    Lalonde      ->     Lalonde
106    Byron               Byron
107    Evan                Evan
110    Drew                Drew
112    Smith
Figure 8: Projection of the relation PERSONNEL on the attribute Name
where T[A] is a single attribute relation and |T[A]| ≤ |T|. The cardinality |T[A]| may be less
than the cardinality |T| because of the deletion of any duplicates in the result. A case in point
is illustrated in figure 8.
Similarly, we can define the projection of a relation on a set of attribute names, X, as a
concatenation of the projections for each attribute A in X for every tuple in the relation,
where || ti[A], taken over A ∈ X, represents the concatenation of all ti[A] for all A ∈ X.
Simply stated, the projection of a relation P on the set of attribute names Y ⊆ P is the
projection of each tuple of the relation P on the set of attribute names Y.
Note that the projection operation reduces the arity if the number of attributes in X is less
than the arity of the relation. The projection operation may also reduce the cardinality of the
result relation since duplicate tuples are removed. (Note that the projection operation
produces a relation as the result. By definition, a relation cannot have duplicate tuples. In
most commercial implementations of the relational model, however, the duplicates would
still be present in the result.)
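A minimal Python sketch of projection follows, using the PERSONNEL tuples of figure 8. The function name project and the representation of a scheme as a list of attribute names are assumptions of the sketch; building the result as a set is what removes duplicate tuples.

```python
def project(relation, schema, attrs):
    """Project each tuple onto attrs; the result set drops duplicate tuples."""
    positions = [schema.index(a) for a in attrs]
    return {tuple(t[i] for i in positions) for t in relation}


schema = ["Id", "Name"]
personnel = {(101, "Jones"), (103, "Smith"), (104, "Lalonde"), (106, "Byron"),
             (107, "Evan"), (110, "Drew"), (112, "Smith")}

names = project(personnel, schema, ["Name"])            # arity reduced to 1
reordered = project(personnel, schema, ["Name", "Id"])  # attributes reordered
```

The projection on Name has six tuples rather than seven, because the two employees named Smith yield the same tuple. The reordering projection keeps all seven tuples.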
Selection (σ)
Suppose we want to find those employees in the relation PERSONNEL of figure 7(a) of
example 5 with an Id less than 105. This is an operation that selects only some of the tuples
of the relation. Such an operation is known as a selection operation. The projection operation
yields a vertical subset of a relation. The action is defined over a subset of the attribute
names but over all the tuples in the relation. The selection operation, however, yields a
horizontal subset of a given relation, i.e., the action is defined over the complete set of
attribute names but only a subset of the tuples are included in the result. To have a tuple
included in the result relation, the specified selection conditions or predicates must be
satisfied by it. The selection operation is sometimes known as the restriction operation.
Any finite number of predicates connected by Boolean operators may be specified in the
selection operation. The predicates may define a comparison between two
domain-compatible attributes or between an attribute and a constant value; if the comparison
is between attribute Ai and constant ci, then ci belongs to Dom(Ai).
Given a relation P and a predicate expression B, the selection of those tuples of relation P
that satisfy the predicate B is a relation R written as:
R = σ_B(P)
The above expression could be read as "select those tuples t from P in which the predicate
B(t) is true." The set of tuples in relation R is in this case defined as follows:
R = {t | t ∈ P ∧ B(t)}
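Selection can be sketched in Python as a filter over the tuples of a relation. The function name select and the use of a Python callable for the predicate B are assumptions of this sketch; the PERSONNEL tuples are those used in example 5.

```python
def select(relation, predicate):
    """Keep the tuples t of the relation for which predicate(t) is true."""
    return {t for t in relation if predicate(t)}


# PERSONNEL(Id, Name)
personnel = {(101, "Jones"), (103, "Smith"), (104, "Lalonde"), (106, "Byron"),
             (107, "Evan"), (110, "Drew"), (112, "Smith")}

# Employees with an Id less than 105 (attribute compared with a constant).
low_ids = select(personnel, lambda t: t[0] < 105)

# Two predicates connected by a Boolean operator.
combined = select(personnel, lambda t: t[0] < 105 and t[1] != "Smith")
```

Every selected tuple keeps all of its attributes, so the result is a horizontal subset of PERSONNEL.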
Join (⋈)
The join operator, as the name suggests, allows the combining of two relations to form a
single new relation. The tuples from the operand relations that participate in the operation
and contribute to the result are related. The join operation allows the processing of
relationships existing between the operand relations.
Example 6
In figure 10 we encounter the following relations: ASSIGNMENT (Emp#, Prod#, Job#) and
JOB-FUNCTION (Job#, Title).
EMPLOYEE :
Figure 10: (a) Relation schemes for employee role in development teams; (b) sample relations
Suppose we want to respond to the query "Get product number of assignments whose
development teams have a chief programmer." This requires first computing the cartesian
product of the ASSIGNMENT and JOB-FUNCTION relations. Let us name this product
relation TEMP. This is followed by selecting those tuples of TEMP where the attribute Title
has the value chief programmer and the value of the attribute Job# in ASSIGNMENT and
JOB-FUNCTION are the same. The required result is obtained by projecting
these tuples on the attribute Prod#. The operations are specified below.
TEMP = (ASSIGNMENT × JOB-FUNCTION)
π_{Prod#}(σ_{Title = 'chief programmer' ∧ ASSIGNMENT.Job# = JOB-FUNCTION.Job#}(TEMP))
In another method of responding to this query, we can first select those tuples from the
JOB-FUNCTION relation so that the value of the attribute Title is chief programmer. Let us
call this set of tuples the relation TEMP1. We then compute the cartesian product of TEMP1
and ASSIGNMENT, calling the product TEMP2. This is followed by a projection on Prod#
over TEMP2 to give us the required response. These operations are specified below:
Notice that in the selection operation that follows the cartesian product we take only those
tuples where the value of the attributes ASSIGNMENT.Job# and JOB-FUNCTION.Job# are
the same. These combined operations of cartesian product followed by selection are the join
operation. Note that we have qualified the identically named attributes by the name of the
corresponding relation to distinguish them.
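The two ways of answering the query can be compared in a short Python sketch. The sample tuples are hypothetical and are not the contents of figure 10; the product numbers, job numbers and variable names are invented for illustration only.

```python
from itertools import product

# Hypothetical sample tuples; not the contents of figure 10.
# ASSIGNMENT(Emp#, Prod#, Job#) and JOB-FUNCTION(Job#, Title)
assignment = {(101, "P1", 700), (103, "P1", 600), (104, "P2", 600),
              (107, "P3", 700), (110, "P3", 800)}
job_function = {(600, "analyst"), (700, "chief programmer"), (800, "manager")}

# Method 1: cartesian product, then selection, then projection on Prod#.
# Each TEMP tuple is (Emp#, Prod#, ASSIGNMENT.Job#, JOB-FUNCTION.Job#, Title).
temp = {a + j for a, j in product(assignment, job_function)}
answer1 = {t[1] for t in temp
           if t[4] == "chief programmer" and t[2] == t[3]}

# Method 2: select from JOB-FUNCTION first, then product with ASSIGNMENT
# restricted to matching Job# values, then projection on Prod#.
temp1 = {j for j in job_function if j[1] == "chief programmer"}
temp2 = {a + j for a, j in product(assignment, temp1) if a[2] == j[0]}
answer2 = {t[1] for t in temp2}
```

Both methods give the same set of product numbers, but the second builds a much smaller intermediate relation, which illustrates why the order of operations matters in practice.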
In case of the join of a relation with itself, we would need to rename either the attributes of
one of the copies of the relation or the relation name itself. We illustrate this in example 7.
In general the join condition may have more than one term, necessitating the use of the
subscript in the comparison operator. Now we shall define the different types of join
operations.
In these discussions we use P, Q, R and so on to represent both the relation scheme and the
collection or bag of underlying domains of the attributes. We call it a bag of domains
because more than one attribute may be defined on the same domain.
Typically, P ∩ Q may be null and this guarantees the uniqueness of attribute names in the
result relation. When the same attribute name occurs in the two schemes we use qualified
names.
Two common and very useful variants of the join are the equi-join and the natural join. In
the equi-join the comparison operator θi (i = 1, 2, ..., n) is always the equality operator
(=). Similarly, in the natural join the comparison operator is always the equality operator.
However, only one of the two sets of domain-compatible attributes is retained in the result. If
the attributes involved in the natural join are Ai from P and Bi from Q, for i = 1, ..., n, the
natural join predicate is a conjunction of terms of the following form:
P.Ai = Q.Bi for i = 1, 2, ..., n
Domain compatibility requires that the domains of Ai and Bi be compatible, and for this
reason relation schemes P and Q have attributes defined on common domains, i.e., P ∩ Q ≠ ∅.
Therefore, join attributes have common domains in the relation schemes P and Q.
Consequently, only one set of the join attributes on these common domains needs to be
preserved in the result relation. This is achieved by taking a projection after the join
operation, thereby eliminating the duplicate attributes. If the relations P and Q have attributes
with the same domains but different attribute names, then renaming or projection may be
specified.
Example 7
Given the EMPLOYEE and SALARY relations of figure 11(i), if we have to find the salary of
employees by name, we join the tuples in the relation EMPLOYEE with those in SALARY
such that the value of the attribute Id in EMPLOYEE is the same as that in SALARY. The
natural join takes the predicate expression to be EMPLOYEE.Id = SALARY.Id. The result
of the natural join is shown in figure 11(ii). When using the natural join, we do not need to
specify this predicate. The expression to specify the operation of finding the salary of
employees by name is given as follows. Here we project the result of the natural join
operation on the attributes Name and Salary:
π_{Name, Salary}(EMPLOYEE ⋈ SALARY)
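The natural join of example 7 can be sketched in Python. The tuples, and in particular the salary figures, are hypothetical and are not the contents of figure 11; the function name natural_join and the list-of-names representation of a scheme are assumptions of the sketch.

```python
def natural_join(p, p_schema, q, q_schema):
    """Join on all identically named attributes, keeping one copy of each."""
    common = [a for a in p_schema if a in q_schema]
    q_rest = [a for a in q_schema if a not in common]
    schema = p_schema + q_rest
    result = set()
    for tp in p:
        for tq in q:
            # Equality on every common attribute is the implied join predicate.
            if all(tp[p_schema.index(a)] == tq[q_schema.index(a)] for a in common):
                result.add(tp + tuple(tq[q_schema.index(a)] for a in q_rest))
    return result, schema


# Hypothetical tuples; not the contents of figure 11.
employee = {(101, "Jones"), (103, "Smith"), (104, "Lalonde"), (107, "Evan")}
salary = {(101, 67), (103, 55), (104, 75), (107, 80)}

joined, schema = natural_join(employee, ["Id", "Name"], salary, ["Id", "Salary"])

# Project the result of the natural join on Name and Salary.
name_salary = {(t[schema.index("Name")], t[schema.index("Salary")]) for t in joined}
```

The predicate EMPLOYEE.Id = SALARY.Id never has to be written out: it follows from the shared attribute name, and Id appears only once in the result scheme.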
Division (÷)
Before we define the division operation, let us consider an example.
Example 8
Given the relations P and Q as shown in figure 12(a), the result of dividing P by Q is the
relation R and it has two tuples. For each tuple in R, its product with the tuples of Q must be
in P. In our example (a1, b1) and (a1, b2) must both be tuples in P; the same is true for (a5, b1)
and (a5, b2).
Simply stated, the cartesian product of Q and R is a subset of P. In figure 12(b), the result
relation R has four tuples; the cartesian product of R and Q gives a resulting relation which is
again a subset of P.
[Figure 12: panels (a) to (d), each showing a relation Q and the result R of dividing P by Q]
In figure 12(d), the relation Q is empty. The result relation can be defined as the projection
of P on the attributes in P − Q. However, it is usual to disallow division by an empty relation.
Let us treat Q as representing one set of properties (the properties are defined on Q,
each tuple in Q representing an instance of these properties) and the relation P as representing
entities with these properties (entities are defined on P − Q, and the properties are, as before,
defined on Q); note that P ∪ Q must be equal to P. Each tuple in P represents an object with
some given property. The resultant relation R, then, is the set of entities that possess all the
properties specified in Q. The two entities a1 and a5 possess all the properties, i.e., b1 and b2.
The other entities in P, a2, a3, and a4, only possess one, not both, of the properties. The
division operation is useful when a query involves the phrase "for all objects having all the
specified properties." Note that both P − Q and Q in general represent a set of attributes. It
should be clear that Q is not a subset of P.
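Division can be sketched in Python for the simple case where P has one entity attribute and one property attribute. The tuples below are assumed: they are chosen to be consistent with example 8 (a1 and a5 possess both b1 and b2, while a2, a3 and a4 possess only one), but they are not quoted from figure 12. The function name divide is also an assumption.

```python
def divide(p, q):
    """p: set of (entity, property) pairs; q: set of property values.

    Returns the entities that are paired in p with every property in q.
    """
    entities = {a for a, _ in p}
    return {a for a in entities if all((a, b) in p for b in q)}


# Assumed tuples, consistent with example 8 but not quoted from figure 12.
P = {("a1", "b1"), ("a1", "b2"), ("a2", "b1"), ("a3", "b1"),
     ("a4", "b2"), ("a5", "b1"), ("a5", "b2")}
Q = {"b1", "b2"}

R = divide(P, Q)

# The cartesian product of R and Q is a subset of P.
product_in_p = {(a, b) for a in R for b in Q} <= P

single_property = divide(P, {"b1"})   # entities possessing b1
by_empty = divide(P, set())           # reduces to the projection of P on P - Q
```

In this sketch, dividing by an empty Q returns every entity in P, which matches the projection described for figure 12(d). A practical implementation would more likely reject an empty divisor, as the text notes.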
The relational model has evoked a wide amount of interest in the database community. This
model has a very sound mathematical basis and it exhibits a high degree of data
independence.
However, it has its share of difficulties. These are:
The relational model does not deal with issues like semantic integrity,
concurrency and database security. These issues are left to be solved by the
implementors of database management systems based on the relational model. The
most serious consequence of the foregoing was the absence of the concept of
semantic integrity in relational systems.
Traditionally, implementations of the relational model have suffered from the
drawback that they are relatively poor on response time. The biggest problem is in
the realization of the join operator. Whereas a DBMS based on the relational
model can handle small databases, as the sizes of databases reach the region of
billions of bytes the performance of these systems falls rather drastically.
Consequently, these systems are able to support only databases of relatively small sizes.
2. Two relations are union compatible if they have the same arity and a one-to-one
correspondence of the attributes, with the corresponding attributes defined over the same
domain.