
Unit-1

Uploaded by

s.shanmugapriya
Copyright
© © All Rights Reserved

UNIT 1 RELATIONAL MODEL

Structure
1.0 Introduction
1.1 Objectives
1.2 Concepts of a Relational Model
1.3 Formal Definition of a Relation
1.4 The Codd Commandments
1.5 Relational Algebra
1.6 Relational Completeness
1.7 Summary
1.8 Model Answers
1.9 Further Reading

1.0 INTRODUCTION
One of the main advantages of the relational model is that it is conceptually simple and, more
importantly, based on the mathematical theory of relations. It also frees users from the details of
storage structures and access methods.
The relational model, like all other data models, consists of three basic components:
a set of domains and a set of relations
operations on relations
integrity rules
In this unit, we first provide the formal definition of a relational data model. We then state
Codd's rules, including the integrity rules, and finally define the basic operations of relational algebra.

1.1 OBJECTIVES
After completing this unit, you will be able to:
define the concepts of relational model
discuss the basic operations of the relational algebra
state the integrity rules

1.2 CONCEPTS OF A RELATIONAL MODEL


The relational model was proposed by E. F. Codd of IBM in 1970. The basic concept in
the relational model is that of a relation.

A relation can be viewed as a table which has the following properties :

Property 1: it is column homogeneous. In other words, in any given column of a table,
all items are of the same kind.
Property 2: each item is a simple number or a character string. That is, a table must be in
1NF (First Normal Form), which will be introduced in the second unit.
Property 3: all rows of a table are distinct.
Property 4: the ordering of rows within a table is immaterial.
Property 5: the columns of a table are assigned distinct names and the ordering of these
columns is immaterial.

RDBMS and DDBMS

[Table: Example of a valid relation. Only the cell value BANGALORE survives in the source.]

1.3 FORMAL DEFINITION OF A RELATION


Formally, a relation is defined as a subset of the expanded cartesian product of domains. To
state this precisely, we first define the cartesian product of two sets and then the expanded
cartesian product.

The cartesian product of two sets A and B, denoted by $A \times B$, is

$A \times B = \{(a, b) : a \in A \text{ and } b \in B\}$

The expanded cartesian product of n sets $A_1, A_2, \ldots, A_n$ is defined by

$\times(A_1, A_2, \ldots, A_n) = \{(a_1, a_2, \ldots, a_n) : a_j \in A_j,\ 1 \le j \le n\}$

The element $(a_1, a_2, \ldots, a_n)$ is called an n-tuple.

Given domains $D_1, D_2, \ldots, D_n$, we define a relation, R, as a subset of the expanded cartesian
product of these domains as follows:

$R \subseteq \times(D_1, D_2, \ldots, D_n)$

In general we say that a relation defined over n domains has a degree n or is n-ary. The
elements of this set are n-tuples.
We shall distinguish between the definition of a relation and the relation itself. The
definition of a relation gives a name to the relation and specifies the components
over which it is defined. These components are referred to as relation attributes, or attributes
for short. An attribute has a domain associated with it from which it takes on values. The
relation itself, on the other hand, is the set of tuples which constitute it at a given instant of
time. For example, a statement which says that a relation Supplier is built over attributes S#,
P# and SCITY, having domains integer, integer and character string respectively, is the definition of the
relation Supplier. The relation itself is shown below. Note that at the time the
definition of a relation is given, a relation with no tuples in it, i.e. a null relation, is
created.
Supplier
S#    P#    SCITY
10    1     BANGALORE
10    2     BANGALORE
10    3     BANGALORE
11    1     BOMBAY
11    2     BOMBAY
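
The definition of a relation as a subset of an expanded cartesian product can be checked directly on the Supplier table. The following is a minimal sketch: the three finite domains are hypothetical stand-ins (the real integer and string domains cannot be enumerated), chosen only so the product is small enough to build.

```python
from itertools import product

# Hypothetical finite domains, assumed only so the product can be enumerated.
S_NUM = {10, 11}
P_NUM = {1, 2, 3}
S_CITY = {"BANGALORE", "BOMBAY"}

# Expanded cartesian product of the three domains: every possible 3-tuple.
all_tuples = set(product(S_NUM, P_NUM, S_CITY))

# The Supplier relation from the table above, as a set of 3-tuples.
supplier = {
    (10, 1, "BANGALORE"),
    (10, 2, "BANGALORE"),
    (10, 3, "BANGALORE"),
    (11, 1, "BOMBAY"),
    (11, 2, "BOMBAY"),
}

degree = len(next(iter(supplier)))    # number of attributes per tuple
is_relation = supplier <= all_tuples  # a relation is a subset of the product
null_relation = set()                 # the relation just after it is defined
```

Using a Python set mirrors two of the table properties listed earlier: rows are distinct and their order is immaterial.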

A relational schema is defined to be a collection of relation definitions.

We can now define the notion of a relational database, or database for short. A database is a
collection of relations of assorted degrees such that these relations are in accordance with
their definitions in the relational schema. Since a relation is time varying, by this definition
we can infer that a database is also time varying.

1.4 THE CODD COMMANDMENTS


In the most basic of definitions a DBMS can be regarded as relational only if it obeys the
following three rules:
All information must be held in tables
Retrieval of the data must be possible using the following types of operations:
SELECT, JOIN and PROJECT
All relationships between data must be represented explicitly in that data itself.
This really is the minimum requirement, but it is surprising to see just how some well-known
database products fail according to these simple rules to be in fact, relational, no matter what
their vendors claim.
To define the requirements more rigorously, compliance with the 12 rules stated below must
be demonstrable, within a single product, for it to be termed relational. In reality it's true to
say that they don't all carry the same degree of importance, and indeed some very good
products exist today supporting major large-scale production systems that cannot, hand on
heart, claim to obey any more than eight or so of these rules. It's likely however, that it is
only when all 12 rules can be satisfied, by facilities that coexist together, that the full benefits
of the relational database can be realised.

The Twelve Rules


Just as with the 12 rules that define the distributed product, there is a single overall rule which
in some ways covers all the others and is commonly called Rule 0. It states that:
Any truly relational database must be manageable entirely through its own relational
capabilities.
Having stated this rule, we will not delve deeper except to say that its meaning can be
summarised by stating that a relational database must be relational, wholly relational and
nothing but relational. If a DBMS depends on record-by-record data manipulation tools, it
is not truly relational.

Rule 1: The information rule


All information is explicitly and logically represented in exactly one way: by data
values in tables.
In simple terms this means that if an item of data doesn't reside somewhere in a table in the
database then it doesn't exist. This extends to the point where even such
information as table, view and column names, to mention just a few, should be contained
somewhere in table form. This necessitates the provision of an active data dictionary that is
itself relational, and it is the provision of such facilities that allows the relatively easy
addition to RDBMSs of programming and CASE tools, for example. This rule serves on its
own to invalidate the claims of several databases to be relational, simply because of their lack
of ability to store dictionary items (or indeed metadata) in an integrated, relational form.
Commonly such products implement their dictionary information systems in some native file
structure, and thus set themselves up for failing at the first hurdle.

Rule 2 :The rule of guaranteed access


Every item of data must be logically addressable by resorting to a combination of table
name, primary key value and column name.
Whilst it is possible to retrieve individual items of data in many different ways, especially in
a relational/SQL environment, it must be true that any item can be retrieved by supplying the
table name, the primary key value of the row holding the item and the name of the column in which
it is to be found. If you think back to the table-like storage structure, this rule is saying that
at the intersection of a column and a row you will necessarily find one value of a data item
(or null).
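
As an illustration of guaranteed access, the sketch below uses Python's built-in sqlite3 module. The employee table, its rows and the helper fetch_item are hypothetical names introduced here; the helper also takes the name of the key column, which the rule itself treats as known from the table definition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(101, "Jones", "BANGALORE"), (103, "Smith", None)],
)

def fetch_item(conn, table, pk_column, pk_value, column):
    """Return the single item at (table, primary key value, column)."""
    # Identifiers cannot be bound as SQL parameters, so they are quoted and
    # interpolated; only the key value is passed as a bound parameter.
    sql = f'SELECT "{column}" FROM "{table}" WHERE "{pk_column}" = ?'
    row = conn.execute(sql, (pk_value,)).fetchone()
    if row is None:
        raise KeyError(pk_value)  # no row holds this primary key value
    return row[0]                 # exactly one value, possibly null (None)

name = fetch_item(conn, "employee", "id", 101, "name")
city = fetch_item(conn, "employee", "id", 103, "city")
```

A missing row raises KeyError, while a row whose item is null returns None, so the two cases stay distinguishable.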

Rule 3 :The systematic treatment of null values


It may surprise you to see this subject on the list of properties, but it is fundamental to the
DBMS that null values are supported in the representation of missing and inapplicable
information. This support for null values must be consistent throughout the DBMS, and
independent of data type (a null value in a CHAR field must mean the same as null in an
INTEGER field for example).
It has often been the case in other product types that a character to represent missing or
inapplicable data has been allocated from the domain of characters pertinent to a particular
item. We may, for example, define four permissible values for a column SEX as:
M Male
F Female
X No data available
Y Not applicable
Such a solution requires careful design, and must decrease productivity at the very least.
This situation is particularly undesirable when very high-level languages such as SQL
are used to manipulate such data, and if such a solution is used for numeric columns all
sorts of problems can arise with aggregate functions such as SUM and AVERAGE.
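
The aggregate problem can be seen in a small sketch using Python's sqlite3 module. The two tables and the sentinel value -1 for "no data available" are assumptions made for this illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pay_null (emp INTEGER, salary INTEGER)")
conn.execute("CREATE TABLE pay_sentinel (emp INTEGER, salary INTEGER)")

# Employee 3 has no known salary: stored as null in one table and as the
# hypothetical sentinel value -1 in the other.
conn.executemany("INSERT INTO pay_null VALUES (?, ?)",
                 [(1, 100), (2, 200), (3, None)])
conn.executemany("INSERT INTO pay_sentinel VALUES (?, ?)",
                 [(1, 100), (2, 200), (3, -1)])

# Aggregates skip nulls, but they treat the sentinel as real data.
sum_null, avg_null = conn.execute(
    "SELECT SUM(salary), AVG(salary) FROM pay_null").fetchone()
sum_sentinel, avg_sentinel = conn.execute(
    "SELECT SUM(salary), AVG(salary) FROM pay_sentinel").fetchone()
```

With nulls the average is taken over the two known salaries; with the sentinel every query must remember to filter out -1 by hand.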

Rule 4 :The database description rule


A description of the database is held and maintained using the same logical structures
used to define the data, thus allowing users with appropriate authority to query such
information in the same ways and using the same languages as they would any other data
in the database.
Put into easy terms, Rule 4 means that there must be a data dictionary within the
RDBMS that is constructed of tables and/or views that can be examined using SQL.
This rule states therefore that a dictionary is mandatory, and if taken in conjunction with
Rule 1, there can be no doubt that the dictionary must also consist of combinations of
tables and views.
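
SQLite is one system whose catalogue can be queried with ordinary SQL, through its sqlite_master table. The sketch below, using Python's sqlite3 module, is only an illustration of the idea; the supplier table and the view name are hypothetical, and other products expose their dictionaries under different names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE supplier (s_no INTEGER, p_no INTEGER, scity TEXT)")
conn.execute(
    "CREATE VIEW bangalore_suppliers AS "
    "SELECT * FROM supplier WHERE scity = 'BANGALORE'"
)

# The dictionary is itself a table, so it is queried like any other data.
catalog = conn.execute(
    "SELECT type, name FROM sqlite_master ORDER BY name"
).fetchall()
```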

Rule 5 :The comprehensive sub-language rule


There must be at least one language whose statements can be expressed as character strings
conforming to some well-defined syntax, and that is comprehensive in supporting the following:
Data definition
View definition
Data manipulation
Integrity constraints
Authorisation
Transaction boundaries
Again, in real terms this means that the RDBMS must be completely manageable through its
own dialect of SQL, although some products still support SQL-like languages (Ingres
support of QUEL, for example). This rule also sets out to scope the functionality of SQL: you
will detect an implicit requirement to support access control, integrity constraints and
transaction management facilities, for example.

Rule 6 :The view updating rule


All views that can be updated in theory can also be updated by the system. This is quite
a difficult rule to interpret, and so a word of explanation is required. Whilst it is possible to
create views in all sorts of illogical ways, and with all sorts of aggregates and virtual
columns, it is obviously not possible to update through some of them. As a very simple
example, if you define a virtual column in a view as A*B, where A and B are columns in a
base table, then how can you perform an update on that virtual column directly? The
database cannot possibly break down any number supplied into its two component parts
without more information being supplied. To delve a little deeper, we should consider that
the possible complexity of a view is almost infinite in logical terms, simply because a view
can be defined in terms of both tables and other views. Particular vendors restrict the
complexity of their own implementations, in some cases quite drastically.
Even in logical terms it is often incredibly difficult to tell whether a view is theoretically
updatable, let alone delve into the practicalities of actually doing so. In fact there exists
another set of rules that, when applied to a view, can be used to determine its level of logical
complexity, and it is only realistic to apply Rule 6 to those views that are defined as simple
by such criteria.
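
The A*B example can be made concrete with a few lines of Python. The sketch assumes, purely for illustration, that A and B hold positive integers; the function name is hypothetical.

```python
def factor_pairs(product_value):
    """All positive integer pairs (a, b) with a * b == product_value."""
    return [
        (a, product_value // a)
        for a in range(1, product_value + 1)
        if product_value % a == 0
    ]

# An update that sets the virtual column A*B to 12 is consistent with every
# one of these base-table states, so the system cannot pick one unaided.
candidates = factor_pairs(12)
```

Even under this narrow integer assumption there are six candidate pairs for the value 12, which is why the view is not updatable through that column.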

Rule 7 :The insert and update rule


An RDBMS must do more than just be able to retrieve relational data sets. It has to be
capable of inserting, updating and deleting data as a relational set. Many RDBMSes that fail
the grade fall back to a single-record-at-a-time procedural technique when it comes time to
manipulate data.
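
A short sketch of set-at-a-time manipulation, using Python's sqlite3 module and the Supplier data from earlier in the unit. The column names and the replacement city value are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE supplier (s_no INTEGER, p_no INTEGER, scity TEXT)")

# Insert a whole set of rows with one call.
conn.executemany(
    "INSERT INTO supplier VALUES (?, ?, ?)",
    [(10, 1, "BANGALORE"), (10, 2, "BANGALORE"), (10, 3, "BANGALORE"),
     (11, 1, "BOMBAY"), (11, 2, "BOMBAY")],
)

# One statement updates every qualifying row; no record-by-record loop.
updated = conn.execute(
    "UPDATE supplier SET scity = 'MUMBAI' WHERE scity = 'BOMBAY'"
).rowcount

# One statement deletes the whole set of rows for supplier 10.
deleted = conn.execute("DELETE FROM supplier WHERE s_no = 10").rowcount

remaining = conn.execute(
    "SELECT s_no, p_no, scity FROM supplier ORDER BY p_no"
).fetchall()
```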

Rule 8 :The physical independence rule


User access to the database, via monitors or application programs, must remain logically
consistent whenever the storage representation, or the access methods to the data, are
changed.
Therefore, and by way of an example, if an index is built or destroyed by the DBA on a table,
any user should still retrieve the same data from that table, albeit perhaps a little more slowly. It is
largely this rule that demands the clear distinction between the logical and physical layers of
the database. Applications must be limited to interfacing with the logical layer to enable the
enforcement of this rule, and it is this rule that sorts out the men from the boys in the
relational market place. Looking at the other architectures already discussed, one can imagine
the consequences of changing the physical structure of a network or hierarchical system.
However, there are plenty of traps awaiting even in the relational world. Consider the
application designer who depends on the presence of a B-tree type index to ensure retrieval
of data is in a predefined order, only to find that the DBA dynamically drops the index! What
about the programmer who doesn't check for primary key uniqueness in his application,
because he knows it is enforced by a unique index? The removal of such an index might be
catastrophic. I point out these two issues because although they are serious factors, I am not
convinced that they constitute the breaking of this rule; it is for the individual to make up his
own mind.
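
The index example can be tried out with Python's sqlite3 module. This is a sketch under assumed names (the personnel table and the index idx_name are hypothetical); note that the query states its own ORDER BY instead of relying on an index for ordering, which avoids the first trap described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE personnel (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO personnel VALUES (?, ?)",
    [(112, "Smith"), (101, "Jones"), (103, "Smith"), (104, "Lalonde")],
)

# The ordering is requested logically, not inherited from a storage structure.
query = "SELECT id, name FROM personnel WHERE name = 'Smith' ORDER BY id"

before = conn.execute(query).fetchall()

conn.execute("CREATE INDEX idx_name ON personnel (name)")  # DBA builds an index
with_index = conn.execute(query).fetchall()

conn.execute("DROP INDEX idx_name")                        # DBA drops it again
after = conn.execute(query).fetchall()
```

The three result lists are identical; only the access path chosen by the system may differ.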

Rule 9 :The logical data independence rule


Application programs must be independent of changes made to the base tables.

TAB 1                  FRAG 1        FRAG 2

1  A  C  E             1  A          1  C  E
4  A  C  F             4  A          4  C  F
6  B  D  G             6  B          6  D  G
2  B  D  H             2  B          2  D  H

Figure 1 : TAB 1 split into two fragments

This rule allows many types of database design change to be made dynamically, without
users being aware of them. To illustrate the meaning of the rule, Figures 1 and 2
show two types of activity that should be possible if this
rule is enforced.

FRAG 1        FRAG 2            TAB 1

1  A          1  C  E           1  A  C  E
4  A          4  C  F           4  A  C  F
6  B          6  D  G           6  B  D  G
2  B          2  D  H           2  B  D  H

Figure 2 : Two fragments combined into one table


Firstly, it should be possible to split a table vertically into more than one fragment, as long
as such splitting preserves all the original data (is non-loss) and maintains the primary key in
each and every fragment. This means in simple terms that a single table should be divisible
into two or more other tables.
Secondly, it should be possible to combine base tables into one by way of a non-loss join.
Note that if such changes are made, then views will be required so that users and applications
are unaffected by them.
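
Both activities can be sketched with Python sets of tuples. The rows of TAB 1 used here are an assumption: they are read from Figures 1 and 2, whose layout is damaged in the source, with the first column taken as the primary key.

```python
# TAB 1 as a set of tuples (key, b, c, d); the values are a reconstruction.
TAB1 = {
    (1, "A", "C", "E"),
    (4, "A", "C", "F"),
    (6, "B", "D", "G"),
    (2, "B", "D", "H"),
}

# Vertical split: each fragment keeps the primary key.
frag1 = {(k, b) for (k, b, c, d) in TAB1}
frag2 = {(k, c, d) for (k, b, c, d) in TAB1}

# Non-loss join of the fragments on the shared primary key.
rejoined = {
    (k1, b, c, d)
    for (k1, b) in frag1
    for (k2, c, d) in frag2
    if k1 == k2
}

is_non_loss = rejoined == TAB1
```

Because every fragment carries the key, the join neither loses rows nor invents new ones; a view defined as this join would let existing applications keep seeing TAB 1.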

Rule 10 : Integrity rules


The relational model includes two general integrity rules. These integrity rules implicitly or
explicitly define the set of consistent database states, or changes of state, or both. Other
integrity constraints can be specified, for example, in terms of dependencies during database
design. In this section we define the integrity rules formulated by Codd.

Integrity Rule 1
Integrity rule 1 is concerned with primary key values. Before we formally state the rule, let
us look at the effect of null values in prime attributes. A null value for an attribute is a value
that is either not known at the time or does not apply to a given instance of the object. It
may also be possible that a particular tuple does not have a value for an attribute; this fact
could be represented by a null value.
If any attribute of a primary key (prime attribute) were permitted to have null values, then,
because the attributes in the key must be nonredundant, the key cannot be used for unique
identification of tuples. This contradicts the requirements for a primary key. Consider the
relation P in Figure 3. The attribute Id is the primary key for P. If null values (represented as
@) were permitted, as in Figure 3(b), then the two tuples <@, Smith> are indistinguishable, even
though they may represent two different instances of the entity type employee. Similarly, the
tuples <@, Lalonde> and <104, Lalonde>, for all intents and purposes, are also
indistinguishable and may be referring to the same person. As instances of entities are
distinguishable, so must be their surrogates in the model.

(a)                     (b)
Id    Name              Id    Name
101   Jones             101   Jones
103   Smith             @     Smith
104   Lalonde           104   Lalonde
107   Evan              107   Evan
110   Drew              110   Drew
112   Smith             @     Lalonde
                        @     Smith

Figure 3 : (a) Relation without null values and (b) relation with null values

Integrity rule 1 specifies that instances of the entities are distinguishable and thus no prime
attribute (component of a primary key) value may be null. This rule is also referred to as the
entity rule. We could state this rule formally as:

Definition: Integrity Rule 1(Entity Integrity):


If the attribute A of relation R is a prime attribute of R, then A cannot accept null
values.
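
A minimal Python sketch of an entity integrity check, applied to the two versions of relation P in Figure 3. The null marker @ is modelled as Python's None, and the function name is hypothetical.

```python
def entity_integrity_violations(relation, key_positions):
    """Return the tuples that have a null in any prime attribute."""
    return [t for t in relation
            if any(t[i] is None for i in key_positions)]

# Relation P of Figure 3; position 0 (Id) is the primary key.
p_without_nulls = [(101, "Jones"), (103, "Smith"), (104, "Lalonde"),
                   (107, "Evan"), (110, "Drew"), (112, "Smith")]
p_with_nulls = [(101, "Jones"), (None, "Smith"), (104, "Lalonde"),
                (107, "Evan"), (110, "Drew"), (None, "Lalonde"),
                (None, "Smith")]

ok = entity_integrity_violations(p_without_nulls, [0])
bad = entity_integrity_violations(p_with_nulls, [0])

# The two <@, Smith> tuples cannot be told apart: as a set they collapse.
distinct_rows = len(set(p_with_nulls))
```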

Integrity Rule 2 (Referential Integrity) :


Integrity rule 2 is concerned with foreign keys, i.e., with attributes of a relation having
domains that are those of the primary key of another relation.
A relation R may contain references to another relation S. Relations R and S need not be
distinct. Suppose the reference in R is via a set of attributes that forms a primary key of the
relation S. This set of attributes in R is a foreign key. A valid relationship between a tuple in
R and one in S requires that the values of the attributes in the foreign key of R correspond to
the primary key of a tuple in S. This ensures that the reference from a tuple of the relation R
is made unambiguously to an existing tuple in the S relation. The referencing attribute(s) in
the R relation can have null value(s); in this case, it is not referencing any tuple in the S
relation. However, if the value is not null, it must exist as the primary key value of a tuple of
the S relation. If the referencing attribute in R has a value that is nonexistent in S, R is
attempting to refer to a nonexistent tuple and hence a nonexistent instance of the corresponding
entity. This cannot be allowed. We illustrate this point in the following example:

Example
Consider the example of employees and their managers. As each employee has a manager, and as managers are also
employees, we may represent managers by their employee numbers, if the employee number
is a key of the relation employee. Figure 4 illustrates an example of such an employee
relation. The Manager attribute represents the employee number of the manager. Manager is
a foreign key; note that it is referring to the primary key of the same relation. An employee
can only have a manager who is also an employee. The chief executive officer (CEO) of the
company can have himself or herself as the manager, or the attribute may take a null value. Some
employees may also be temporarily without a manager, and this can be represented by the
Manager attribute taking null values.

Figure 4 : Foreign Keys

Definition :Integrity Rule 2 (Referential Integrity)


Given two relations R and S, suppose R refers to the relation S via a set of attributes that
forms the primary key of S and this set of attributes forms a foreign key in R. Then the value
of the foreign key in a tuple in R must either be equal to the primary key of a tuple of S or be
entirely null.
If we have the attribute A of relation R defined on domain D and the primary key of relation
S also defined on domain D, then the values of A in tuples of R must be either null or equal
to the value, let us say v, where v is the primary key value for a tuple in S. Note that R and S
may be the same relation. The tuple in S is called the target of the foreign key. The primary
key of the referenced relation and the attributes in the foreign key of the referencing relation
could be composite.
Referential integrity is very important. Because the foreign key is used as a surrogate for
another entity, the rule enforces the existence of a tuple for the relation corresponding to the
instance of the referred entity. In the example, we do not want a nonexistent employee to be
a manager. The integrity rule also implicitly defines the possible actions that could be taken
whenever updates, insertions, and deletions are made.
If we delete a tuple that is a target of a foreign key reference, then three explicit possibilities
exist to maintain database integrity:
All tuples that contain references to the deleted tuple should also be deleted. This
may cause, in turn, the deletion of other tuples. This option is referred to as a
domino or cascading deletion, since one deletion leads to another.
Only tuples that are not referenced by any other tuple can be deleted. A tuple
referred to by other tuples in the database cannot be deleted.
The tuple is deleted. However, to avoid the domino effect, the pertinent foreign key
attributes of all referencing tuples are set to null.
Similar actions are required when the primary key of a referenced relation is updated. An
update of a primary key can be considered as a deletion followed by an insertion.
The choice of the option to use during a tuple deletion depends on the application. For
example, in most cases it would be inappropriate to delete all employees under a given
manager on the manager's departure; it would be more appropriate to replace the reference by null.
Another example is when a department is closed. If employees were assigned to
departments, then the employee tuples would contain the department key too. Deletion of a
department tuple should be disallowed until the employees have either been reassigned or
their appropriate attribute values have been set to null. The insertion of a tuple with a
foreign key reference, or the update of the foreign key attributes of a relation, requires a check
that the referenced tuple exists.
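
The three deletion options correspond to the SQL referential actions ON DELETE CASCADE, RESTRICT and SET NULL. The sketch below tries each with Python's sqlite3 module on a self-referencing employee table; the rows and the helper delete_manager are hypothetical, since the contents of Figure 4 are not available. SQLite enforces foreign keys only after PRAGMA foreign_keys = ON.

```python
import sqlite3

def delete_manager(action):
    """Delete employee 101 under the given ON DELETE action; report the result."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # enforcement is off by default
    conn.execute(f"""
        CREATE TABLE employee (
            emp_no  INTEGER PRIMARY KEY,
            name    TEXT,
            manager INTEGER REFERENCES employee(emp_no) ON DELETE {action}
        )""")
    # 101 has no manager (null); 103 reports to 101; 104 reports to 103.
    conn.executemany("INSERT INTO employee VALUES (?, ?, ?)",
                     [(101, "Jones", None), (103, "Smith", 101),
                      (104, "Lalonde", 103)])
    try:
        conn.execute("DELETE FROM employee WHERE emp_no = 101")
        outcome = "deleted"
    except sqlite3.IntegrityError:
        outcome = "rejected"
    rows = conn.execute(
        "SELECT emp_no, manager FROM employee ORDER BY emp_no").fetchall()
    conn.close()
    return outcome, rows

cascade = delete_manager("CASCADE")    # domino deletion
restrict = delete_manager("RESTRICT")  # referenced tuple cannot be deleted
set_null = delete_manager("SET NULL")  # referencing foreign keys set to null
```

Under CASCADE the deletion of 101 removes 103 and, in turn, 104, which shows why the domino option is rarely appropriate for a manager's departure.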


Although the definition of the relational model specifies the two integrity rules, it is
unfortunate that these concepts are not fully implemented in all commercial relational
DBMSs. The concept of referential integrity enforcement would require an explicit
statement as to what should be done when the primary key of a target tuple is updated or the
target tuple is deleted.

Rule 11: Distribution rule :


An RDBMS must have distribution independence.
This is one of the more attractive aspects of RDBMSes. Database systems built on the
relational framework are well suited to today's client/server database design.

Rule 12 :No subversion rule :


If an RDBMS supports a lower level language that permits, for example, row-at-a-time
processing, then this language must not be able to bypass any integrity rules or
constraints of the relational language.
Thus, not only must an RDBMS be governed by relational rules, but those rules must be its
primary laws.

The practical importance of these rules is difficult to estimate, and depends largely on the
RDBMS in question, its proposed use and individual viewpoints, but the theoretical
importance is undeniable. It is interesting to see how some of the rules relate to others, and
to some of the more important advantages of the relational model. It is unlikely at the
present time that any RDBMS can claim full logical data independence, because of their
generally poor ability to handle updating through views. Even token adherence to this rule,
however, when combined with facilities enabling physical data independence, potentially
yields advantages to application developers unheard of with any other type of database
system. Coupling these two rules with the data independence and distribution independence
rules can take the protection of customer investment to new heights.

The beauty of the relational database is that the concepts that define it are few, easy to
understand and explicit. The 12 rules explained can be used as the basic relational design
criteria, and as such are clear indications of the purity of the relational concept. Whilst you
do not find these rules being quoted so often these days as in the recent past, it does not mean
that they are any less important. Rather it can be interpreted as reflecting a reduced
importance as propaganda. Other factors, of which performance is the most obvious, have
now taken precedence.

1.5 RELATIONAL ALGEBRA


Relational algebra is a procedural language. It specifies the operations to be performed on
existing relations to derive result relations. Furthermore, it defines the complete scheme for
each of the result relations. The relational algebraic operations can be divided into basic
set-oriented operations and relational-oriented operations. The former are the traditional set
operations; the latter are those for performing joins, selection, projection, and division.

Basic Operations
Basic operations are the traditional set operations: union, difference, intersection and
cartesian product. Three of these four basic operations (union, intersection, and difference)
require that operand relations be union compatible. Two relations are union compatible if
they have the same arity and a one-to-one correspondence of the attributes, with the
corresponding attributes defined over the same domain. The cartesian product can be
defined on any two relations. Formally, two relations P and Q are said to be union compatible if both
P and Q are of the same degree n and the domains of the corresponding n attributes are
identical, i.e. if $P = \{P_1, \ldots, P_n\}$ and $Q = \{Q_1, \ldots, Q_n\}$ then
$Dom(P_i) = Dom(Q_i)$ for $i = 1, 2, \ldots, n$
where $Dom(P_i)$ represents the domain of the attribute $P_i$.
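
The definition of union compatibility translates into a short check. In this sketch a relation scheme is a list of (attribute name, domain) pairs, and a domain is modelled as a Python type; both representations, and the example schemes, are assumptions made for illustration.

```python
def union_compatible(scheme_p, scheme_q):
    """True if both schemes have the same degree and matching domains."""
    if len(scheme_p) != len(scheme_q):
        return False
    # Attribute names may differ; only the corresponding domains must agree.
    return all(dom_p == dom_q
               for (_, dom_p), (_, dom_q) in zip(scheme_p, scheme_q))

scheme_p = [("Id", int), ("Name", str)]
scheme_q = [("EmpNo", int), ("EmpName", str)]
scheme_s = [("S", str)]

pq_ok = union_compatible(scheme_p, scheme_q)
ps_ok = union_compatible(scheme_p, scheme_s)
```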

Example 1
In the examples to follow, we utilise two relations P and Q given in Figure 5. R is a
computed result relation. We assume that the relations P and Q in Figure 5 represent employees
working on the development of software application packages J1 and J2 respectively.

[Figure 5 : Union compatible relations P and Q. The tables are not recoverable from the source; the surviving Name values are Smith, Lalonde, Byron and Drew.]

If we assume that P and Q are two union compatible relations, then the union of P and Q is
the set-theoretic union of P and Q. The resultant relation, $R = P \cup Q$, has tuples drawn from
P and Q such that

$R = \{t \mid t \in P \lor t \in Q\}$ and $\max(|P|, |Q|) \le |R| \le |P| + |Q|$

The result relation R contains tuples that are in either P or Q or in both of them. The
duplicate tuples are eliminated.
Remember that from our definition of union compatibility the degree of the relations P, Q and R
is the same. The cardinality of the resultant relation depends on the duplication of tuples in P
and Q. From the above expression, we can see that if all the tuples in Q were contained in P,
then $|R| = |P|$ and R = P, while if the tuples in P and Q were disjoint, then $|R| = |P| + |Q|$.

Example 2
R, the union of P and Q given in Figure 5 of Example 1, is shown in Figure 6(a). R
represents employees working on the packages J1 or J2, or both of these packages. Since a
relation does not have duplicate tuples, an employee working on both J1 and J2 will appear in
the relation R only once.
[Figure 6 : Results of (a) union $P \cup Q$, (b) difference $P - Q$ and (c) intersection $P \cap Q$ operations. The result tables are not recoverable from the source.]

Difference (-)
The difference operation removes common tuples from the first relation.

R = P - Q such that

$R = \{t \mid t \in P \land t \notin Q\}$

Example 3
R, the result of P - Q, gives employees working only on package J1 (Figure 6(b) in Example
2). Employees working on both packages J1 and J2 have been removed.
Intersection ($\cap$)
The intersection operation selects the common tuples from the two relations.

$R = P \cap Q = \{t \mid t \in P \land t \in Q\}$

Example 4
The resultant relation of $P \cap Q$ is the set of all employees working on both the packages
(Figure 6(c) of Example 2).
The intersection operation is really unnecessary. It can be very simply expressed as:
$P \cap Q = P - (P - Q)$
It is, however, more convenient to write an expression with a single intersection operation
than one involving a pair of difference operations.
Note that in these examples the operand and the result relation schemes, including the
attribute names, are identical, i.e. P = Q = R. If the attribute names of compatible relations
are not identical, the naming of the attributes of the result relation will have to be resolved.
Cartesian Product (x)


The extended cartesian product, or simply the cartesian product, of two relations is the concatenation
of tuples belonging to the two relations. A new resultant relation scheme is created
consisting of all possible combinations of the tuples.

$R = P \times Q$

where a tuple $r \in R$ is given by $\{t_1 \Vert t_2 \mid t_1 \in P \land t_2 \in Q\}$, i.e. the result relation is obtained
by concatenating each tuple in relation P with each tuple in relation Q. Here, $\Vert$ represents the
concatenation operation.
The scheme of the result relation is the concatenation of the schemes of P and Q.
The degree of the result relation is the sum of the degrees of P and Q:

$\text{degree}(R) = \text{degree}(P) + \text{degree}(Q)$

The cardinality of the result relation is the product of the cardinalities of P and Q:

$|R| = |P| \times |Q|$

Example 5
The cartesian product of the PERSONNEL and SOFTWARE-PACKAGES relations
of Figure 7(a) is shown in Figure 7(b). Note that the relations P and Q from Figure 5 of
Example 1 are subsets of the PERSONNEL relation.

(a)
PERSONNEL :             SOFTWARE-PACKAGES :
Id    Name              S
101   Jones             J1
103   Smith             J2
104   Lalonde
106   Byron
107   Evan
110   Drew
112   Smith

(b)
Id    P.Name    S
101   Jones     J1
101   Jones     J2
103   Smith     J1
103   Smith     J2
104   Lalonde   J1
104   Lalonde   J2
106   Byron     J1
106   Byron     J2
107   Evan      J1
107   Evan      J2
110   Drew      J1
110   Drew      J2
112   Smith     J1
112   Smith     J2

Figure 7 : (a) PERSONNEL (Emp#, Name) and SOFTWARE-PACKAGES (S) represent employees and software
packages respectively; (b) the Cartesian product of PERSONNEL and SOFTWARE-PACKAGES
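The union and intersection operations are associative and commutative; therefore, given
relations R, S, T:

$R \cup (S \cup T) = (R \cup S) \cup T$ and $R \cup S = S \cup R$
$R \cap (S \cap T) = (R \cap S) \cap T$ and $R \cap S = S \cap R$

The difference operation, in general, is noncommutative and nonassociative:

$R - S \ne S - R$ (noncommutative)
$R - (S - T) \ne (R - S) - T$ (nonassociative)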
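
A small counterexample makes these properties concrete. The three sets below are arbitrary sample values, with single integers standing in for tuples.

```python
# Arbitrary sample relations, assumed only for this counterexample.
R = {1, 2, 3}
S = {2, 3}
T = {3}

# Union and intersection: grouping and order do not matter.
union_assoc = (R | (S | T)) == ((R | S) | T)
inter_assoc = (R & (S & T)) == ((R & S) & T)
union_comm = (R | S) == (S | R)

# Difference: both order and grouping change the result.
left = R - (S - T)    # S - T = {2}, so this leaves {1, 3}
right = (R - S) - T   # R - S = {1}, so this leaves {1}
```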

Additional Relational Algebraic Operations


The basic set operations, which provide a very limited data manipulation facility, have been
supplemented by the definition of the following operations: projection, selection, join, and
division. These operations are represented by the symbols $\pi$, $\sigma$, $\bowtie$ and $\div$ respectively.
Projection and selection are unary operations; join and division are binary.

Projection ($\pi$)
The projection of a relation is defined as a projection of all its tuples over some set of
attributes, i.e., it yields a vertical subset of the relation. The projection operation is used to
either reduce the number of attributes in the resultant relation or to reorder attributes. In the
first case, the arity (or degree) of the relation is reduced. The projection operation is shown
graphically in Figure 8, which shows the projection of the relation PERSONNEL on the
attribute Name. The cardinality of the result relation is also reduced due to the deletion of
duplicate tuples.
We define the projection of a tuple $t_i$ over the attribute A, denoted $t_i[A]$ or $\pi_A(t_i)$, as (a),
where a is the value of tuple $t_i$ over the attribute A. Similarly, we define the projection of a relation T, denoted by
$T[A]$ or $\pi_A(T)$, on the attribute A. This is defined in terms of the projection of each tuple
$t_i$ belonging to T on the attribute A as:

$T[A] = \{a_i \mid t_i[A] = a_i \land t_i \in T\}$

where $T[A]$ is a single attribute relation and $|T[A]| \le |T|$. The cardinality $|T[A]|$ may be less
than the cardinality $|T|$ because of the deletion of any duplicates in the result. A case in point
is illustrated in Figure 8.

PERSONNEL :             Result :
Id    Name              Name
101   Jones             Jones
103   Smith             Smith
104   Lalonde           Lalonde
106   Byron             Byron
107   Evan              Evan
110   Drew              Drew
112   Smith

Figure 8 : Projection of relation PERSONNEL over attribute Name
Similarly, we can define the projection of a relation on a set of attribute names, X, as a
concatenation of the projections for each attribute A in X for every tuple in the relation:

T[X] = { concat_{A ∈ X} t_i[A] | t_i ∈ T }

where concat_{A ∈ X} t_i[A] represents the concatenation of all t_i[A] for all A ∈ X.
Simply stated, the projection of a relation P on the set of attribute names Y ⊆ P is the
projection of each tuple of the relation P on the set of attribute names Y.
Note that the projection operation reduces the arity if the number of attributes in X is less
than the arity of the relation. The projection operation may also reduce the cardinality of the
result relation since duplicate tuples are removed. (Note that the projection operation
produces a relation as the result. By definition, a relation cannot have duplicate tuples. In
most commercial implementations of the relational model, however, the duplicates would
still be present in the result.)
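The projection of figure 8 can be sketched in Python. This is only an illustration under stated assumptions: a relation is represented as a set of tuples together with a tuple of attribute names, and the helper name project is our own choice, not part of the relational model.

```python
# Assumed representation: a relation is a set of tuples, described by a
# tuple of attribute names. The data is the PERSONNEL relation of figure 8.
PERSONNEL_ATTRS = ("Id", "Name")
PERSONNEL = {
    (101, "Jones"), (103, "Smith"), (104, "Lalonde"), (106, "Byron"),
    (107, "Evan"), (110, "Drew"), (112, "Smith"),
}


def project(relation, attrs, onto):
    """Return the projection of `relation` on the attribute names in `onto`."""
    positions = [attrs.index(name) for name in onto]
    # Collecting the projected tuples in a set removes duplicates.
    return {tuple(row[i] for i in positions) for row in relation}


names = project(PERSONNEL, PERSONNEL_ATTRS, ("Name",))
print(sorted(names))
print(len(PERSONNEL), len(names))  # cardinality before and after projection
```

Because the result is built as a set, the two Smith tuples collapse into one, so the cardinality drops from 7 to 6. Passing ("Name", "Id") as the last argument would reorder the attributes without reducing the arity.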

Selection (σ)
Suppose we want to find those employees in the relation PERSONNEL of figure 7(a) of
example 5 with an Id less than 105. This is an operation that selects only some of the tuples
of the relation. Such an operation is known as a selection operation. The projection operation
yields a vertical subset of a relation. The action is defined over a subset of the attribute
names but over all the tuples in the relation. The selection operation, however, yields a
horizontal subset of a given relation, i.e., the action is defined over the complete set of
attribute names but only a subset of the tuples is included in the result. To have a tuple
included in the result relation, the specified selection conditions or predicates must be
satisfied by it. The selection operation is sometimes known as the restriction operation.

PERSONNEL :             Result of Selection :
Id     Name             Id     Name
101    Jones            101    Jones
103    Smith            103    Smith
104    Lalonde          104    Lalonde
106    Byron
107    Evan
110    Drew
112    Smith

Figure 9 : Result of Selection over PERSONNEL for Id < 105.

Any finite number of predicates connected by Boolean operators may be specified in the
selection operation. The predicates may define a comparison between two
domain-compatible attributes or between an attribute and a constant value; if the comparison
is between attribute A_i and constant c_i, then c_i belongs to Dom(A_i).
Given a relation P and a predicate expression B, the selection of those tuples of relation P
that satisfy the predicate B is a relation R written as:

R = σ_B(P)

The above expression could be read as "select those tuples t from P in which the predicate
B(t) is true." The set of tuples in relation R is in this case defined as follows:

R = { t | t ∈ P ∧ B(t) }
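The selection of figure 9 can be sketched in Python in the same spirit. The set-of-tuples representation and the helper name select are assumptions made for illustration; the predicate is passed as an ordinary function over a tuple viewed as a dictionary.

```python
# Assumed representation: a set of tuples plus a tuple of attribute names.
PERSONNEL_ATTRS = ("Id", "Name")
PERSONNEL = {
    (101, "Jones"), (103, "Smith"), (104, "Lalonde"), (106, "Byron"),
    (107, "Evan"), (110, "Drew"), (112, "Smith"),
}


def select(relation, attrs, predicate):
    """Return the tuples t of `relation` for which predicate(t) is true."""
    return {row for row in relation if predicate(dict(zip(attrs, row)))}


# Single predicate: Id < 105 (the selection of figure 9).
low_ids = select(PERSONNEL, PERSONNEL_ATTRS, lambda t: t["Id"] < 105)
print(sorted(low_ids))

# Two predicates connected by a Boolean operator.
not_smith = select(
    PERSONNEL, PERSONNEL_ATTRS,
    lambda t: t["Id"] < 105 and t["Name"] != "Smith",
)
print(sorted(not_smith))
```

Every tuple in the result keeps all of its attributes, so the arity is unchanged; only the cardinality can shrink.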

Join (⋈)
The join operator, as the name suggests, allows the combining of two relations to form a
single new relation. The tuples from the operand relations that participate in the operation
and contribute to the result are related. The join operation allows the processing of
relationships existing between the operand relations.

Example 6
In figure 10 we encounter the following relations:
ASSIGNMENT (Emp#, Prod#, Job#)
JOB-FUNCTION (Job#, Title)

EMPLOYEE :
Emp#   Name      Profession
101    Jones     Analyst
103    Smith     Programmer
104    Lalonde   Receptionist
106    Byron     Receptionist
107    Evan      VP R & D
110    Drew      VP Operations
112    Smith     Manager

PRODUCT :
Prod#   Prod-Name       Prod-Details
HEAP1   HEAP-SORT       ISS Module
BINS9   BINARY-SEARCH   ISS/R Module
FM6     FILE-MANAGER    ISS/R-PC Subsys
B++1    B++-TREE        ISS/R Turbo Sys
B++2    B++-TREE        ISS/R-PC Turbo
(a)

JOB-FUNCTION :                ASSIGNMENT :
Job#   Title                  Emp#   Prod#   Job#
1000   CEO                    107    HEAP1   800
900    President              101    HEAP1   600
800    Manager                110    BINS9   800
700    Chief Programmer       103    HEAP1   700
600    Analyst                101    BINS9   700
                              110    FM6     800
                              107    B++1    800

Figure 10 : (a) Relation schemes for employee role in development teams; (b) sample relations

Suppose we want to respond to the query Get product number of assignments whose
development teams have a chief programmer. This requires first computing the cartesian
product of the ASSIGNMENT and JOB-FUNCTION relations. Let us name this product
relation TEMP. This is followed by selecting those tuples of TEMP where the attribute Title
has the value chief programmer and the values of the attribute Job# in ASSIGNMENT and
JOB-FUNCTION are the same. The required result, shown below, is obtained by projecting
these tuples on the attribute Prod#. The operations are specified below.

TEMP = (ASSIGNMENT × JOB-FUNCTION)
π_Prod# (σ_{Title = 'chief programmer' ∧ ASSIGNMENT.Job# = JOB-FUNCTION.Job#} (TEMP))

Prod#
HEAP1
BINS9
In another method of responding to this query, we can first select those tuples from the
JOB-FUNCTION relation so that the value of the attribute Title is chief programmer. Let us
call this set of tuples the relation TEMP1. We then compute the cartesian product of TEMP1
and ASSIGNMENT and select the tuples with matching Job# values, calling the result TEMP2.
This is followed by a projection on Prod# over TEMP2 to give us the required response.
These operations are specified below:

TEMP1 = (σ_{Title = 'chief programmer'} (JOB-FUNCTION))
TEMP2 = (σ_{ASSIGNMENT.Job# = JOB-FUNCTION.Job#} (ASSIGNMENT × TEMP1))
π_Prod# (TEMP2) gives the required result.

Notice that in the selection operation that follows the cartesian product we take only those
tuples where the values of the attributes ASSIGNMENT.Job# and JOB-FUNCTION.Job# are
the same. These combined operations of cartesian product followed by selection are the join
operation. Note that we have qualified the identically named attributes by the name of the
corresponding relation to distinguish them.
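Both ways of answering the query of example 6 can be sketched in Python. This is an illustration only: the tuples are those of the ASSIGNMENT and JOB-FUNCTION relations of figure 10, the variable names are our own, and tuple positions stand in for qualified attribute names. The final projection is held as a plain set of Prod# values rather than a set of one-element tuples.

```python
from itertools import product

ASSIGNMENT = {  # (Emp#, Prod#, Job#)
    (107, "HEAP1", 800), (101, "HEAP1", 600), (110, "BINS9", 800),
    (103, "HEAP1", 700), (101, "BINS9", 700), (110, "FM6", 800),
    (107, "B++1", 800),
}
JOB_FUNCTION = {  # (Job#, Title)
    (1000, "CEO"), (900, "President"), (800, "Manager"),
    (700, "Chief Programmer"), (600, "Analyst"),
}

# Method 1: cartesian product first, then selection, then projection.
# Columns of temp: Emp#, Prod#, ASSIGNMENT.Job#, JOB-FUNCTION.Job#, Title
temp = {a + j for a, j in product(ASSIGNMENT, JOB_FUNCTION)}
method1 = {
    row[1] for row in temp
    if row[4] == "Chief Programmer" and row[2] == row[3]
}

# Method 2: select on Title first, then product with selection on Job#.
temp1 = {j for j in JOB_FUNCTION if j[1] == "Chief Programmer"}
temp2 = {a + j for a, j in product(ASSIGNMENT, temp1) if a[2] == j[0]}
method2 = {row[1] for row in temp2}

print(sorted(method1), sorted(method2))
# Size of the intermediate product in each method.
print(len(temp), len(ASSIGNMENT) * len(temp1))
```

Both methods give the same answer, but the second builds a much smaller intermediate product (7 tuples instead of 35) because the selection on Title is applied before the product is formed.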

In case of the join of a relation with itself, we would need to rename either the attributes of
one of the copies of the relation or the relation name itself. We illustrate this in example 7.

In general the join condition may have more than one term, necessitating the use of the
subscript in the comparison operator. Now we shall define the different types of join
operations.

In these discussions we use P, Q, R and so on to represent both the relation scheme and the
collection or bag of underlying domains of the attributes. We call it a bag of domains
because more than one attribute may be defined on the same domain.

Typically, P ∩ Q may be null and this guarantees the uniqueness of attribute names in the
result relation. When the same attribute name occurs in the two schemes we use qualified
names.

Two common and very useful variants of the join are the equi-join and the natural join. In
the equi-join the comparison operator θ_i (i = 1, 2, ..., n) is always the equality operator
(=). Similarly, in the natural join the comparison operator is always the equality operator.
However, only one of the two sets of domain-compatible attributes is retained in the result.
If the attributes involved in the natural join are A_i from P and B_i from Q, for
i = 1, ..., n, the natural join predicate is a conjunction of terms of the following form:

(t1[A_i] = t2[B_i]) for i = 1, 2, ..., n

Domain compatibility requires that the domains of A_i and B_i be compatible, and for this
reason relation schemes P and Q have attributes defined on common domains, i.e., P ∩ Q ≠ ∅.
Therefore, join attributes have common domains in the relation schemes P and Q.

Consequently, only one set of the join attributes on these common domains needs to be
preserved in the result relation. This is achieved by taking a projection after the join
operation, thereby eliminating the duplicate attributes. If the relations P and Q have attributes
with the same domains but different attribute names, then renaming or projection may be
specified.

Example 7
Given the EMPLOYEE and SALARY relations of figure 11(i), if we have to find the salary of
employees by name, we join the tuples in the relation EMPLOYEE with those in SALARY
such that the value of the attribute Id in EMPLOYEE is the same as that in SALARY. The
natural join takes the predicate expression to be EMPLOYEE.Id = SALARY.Id. The result
of the natural join is shown in figure 11(i). When using the natural join, we do not need to
specify this predicate. The expression to specify the operation of finding the salary of
employees by name is given as follows. Here we project the result of the natural join
operation on the attributes Name and Salary:

π_(Name, Salary) (EMPLOYEE ⋈ SALARY)
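A natural join followed by a projection can be sketched in Python as below. The helper natural_join and the set-of-tuples representation are assumptions made for illustration. The EMPLOYEE tuples and the salaries 75 and 80 appear in figure 11; the salaries given here for Jones and Smith are placeholder values invented purely so that the example runs.

```python
def natural_join(r, r_attrs, s, s_attrs):
    """Natural join of r and s on every attribute name the schemes share."""
    common = [a for a in r_attrs if a in s_attrs]
    s_rest = [a for a in s_attrs if a not in common]
    out_attrs = tuple(r_attrs) + tuple(s_rest)
    out = set()
    for t1 in r:
        d1 = dict(zip(r_attrs, t1))
        for t2 in s:
            d2 = dict(zip(s_attrs, t2))
            # Equality on all common attributes; keep one copy of each.
            if all(d1[a] == d2[a] for a in common):
                out.add(t1 + tuple(d2[a] for a in s_rest))
    return out, out_attrs


EMPLOYEE = {(101, "Jones"), (103, "Smith"), (104, "Lalonde"), (107, "Evan")}
# Salaries for Ids 101 and 103 are hypothetical placeholders.
SALARY = {(101, 60), (103, 55), (104, 75), (107, 80)}

joined, attrs = natural_join(EMPLOYEE, ("Id", "Name"), SALARY, ("Id", "Salary"))

# Projection of the join on (Name, Salary).
name_idx, salary_idx = attrs.index("Name"), attrs.index("Salary")
name_salary = {(row[name_idx], row[salary_idx]) for row in joined}
print(attrs)
print(sorted(name_salary))
```

The join predicate EMPLOYEE.Id = SALARY.Id never has to be written out: it is derived from the attribute name the two schemes have in common, and only one Id column survives in the result.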


EMPLOYEE :              EMPLOYEE ⋈ SALARY :
Id     Name             Id     Name      Salary
101    Jones            ...    ...       ...
103    Smith            ...    Smith     ...
104    Lalonde          104    Lalonde   75
107    Evan             107    Evan      80

Figure 11 : (i) The natural join of EMPLOYEE and SALARY relations;
(ii) The join of ASSIGNMENT with the renamed copy

Division (÷)
Before we define the division operation, let us consider an example.

Example 8
Given the relations P and Q as shown in figure 12(a), the result of dividing P by Q is the
relation R and it has two tuples. For each tuple in R, its product with the tuples of Q must be
in P. In our example (a1, b1) and (a1, b2) must both be tuples in P; the same is true for (a5, b1)
and (a5, b2).
Simply stated, the cartesian product of Q and R is a subset of P. In figure 12(b), the result
relation R has four tuples; the cartesian product of R and Q gives a resulting relation which is
again a subset of P. In figure 12(c), P contains no tuples with one of the values that Q lists for
the attribute B (i.e., the selection of P on that value of B is empty), so we have an empty
relation R, which has a cardinality of zero.

Figure 12 : Examples of the division operation. (a) R = P ÷ Q;
(b) R = P ÷ Q (P is the same as in part (a)); (c) R = P ÷ Q (P is the same as in part (a));
(d) R = P ÷ Q (P is the same as in part (a))

In figure 12(d), the relation Q is empty. The result relation can be defined as the projection
of P on the attributes in P - Q. However, it is usual to disallow division by an empty relation.

Finally, if relation P is an empty relation, then relation R is also an empty relation.

Let us treat Q as representing one set of properties (the properties are defined on Q,
each tuple in Q representing an instance of these properties) and the relation P as representing
entities with these properties (entities are defined on P - Q, and the properties are, as before,
defined on Q); note that P ∪ Q must be equal to P. Each tuple in P represents an object with
some given property. The resultant relation R, then, is the set of entities that possess all the
properties specified in Q. The two entities a1 and a5 possess all the properties, i.e., b1 and b2.
The other entities in P, a2, a3, and a4, only possess one, not both, of the properties. The
division operation is useful when a query involves the phrase "for all objects having all the
specified properties." Note that both P - Q and Q in general represent a set of attributes. It
should be clear that, as sets of attributes, Q must be a subset of P; this is what P ∪ Q = P
expresses.
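The division operation can be sketched in Python for the simple case P(A, B) ÷ Q(B). The tuples of P below are hypothetical: they are chosen only to agree with the description above (a1 and a5 possess both b1 and b2; a2, a3 and a4 possess one property each) and are not copied from figure 12. The function names are likewise our own. The second function expresses the same result using only projection, cartesian product and difference.

```python
# Hypothetical P(A, B), consistent with the description in the text.
P = {
    ("a1", "b1"), ("a1", "b2"), ("a2", "b1"), ("a3", "b1"),
    ("a4", "b2"), ("a5", "b1"), ("a5", "b2"),
}
Q = {("b1",), ("b2",)}  # Q(B) as a set of one-element tuples


def divide(p, q):
    """P(A, B) ÷ Q(B): the A values paired in P with every B value in Q."""
    if not q:
        raise ValueError("division by an empty relation is disallowed")
    candidates = {a for a, _ in p}
    return {a for a in candidates if all((a, b) in p for (b,) in q)}


def divide_basic(p, q):
    """The same result from projection, cartesian product and difference."""
    pa = {a for a, _ in p}                             # projection of P on A
    missing = {(a, b) for a in pa for (b,) in q} - p   # (pa x Q) - P
    return pa - {a for a, _ in missing}


R = divide(P, Q)
print(sorted(R))
# The cartesian product of R and Q is a subset of P.
print({(a, b) for a in R for (b,) in Q} <= P)
```

With an empty Q, divide_basic returns the projection of P on A, which matches the definition mentioned for the empty-divisor case, while divide follows the usual convention of refusing it.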

1.6 RELATIONAL COMPLETENESS


The notion of relational completeness was propounded by Codd in 1972 as a basis for
evaluating the power of different query languages.
A language is relationally complete if the basic relational algebra operations can be
performed in it. The basic relational algebra operations are
Union
Difference
Cross product
Projection
Selection
Query languages that are actually used in practice provide features in addition to the ones
mentioned above. For example, they provide facilities for
1. modification, storage and deletion of information
2. printing relations
3. assigning relations to some relation names
4. computing aggregate functions like SUM and MAX
5. performing arithmetic, for example, retrieving Salary + Commission.
Check Your Progress
1. Define the following terms:
(a) Intension of a set
(b) Extension of a set
2. What is union compatible?
1.7 SUMMARY

The relational model has evoked a wide amount of interest in the database community. This
model has a very sound mathematical basis and it exhibits a high degree of data
independence.
However, it has its share of difficulties. These are:
The relational model does not deal with issues like semantic integrity,
concurrency and database security. These issues are left to be solved by the
implementors of database management systems based on the relational model. The
most serious consequence of the foregoing was the absence of the concept of
semantic integrity in relational systems.
Traditionally, implementations of the relational model have suffered from the
drawback that they are relatively poor on response time. The biggest problem is in
the realization of the join operator. Whereas a DBMS based on the relational
model can handle small databases, the performance of these systems falls rather
drastically as the sizes of databases reach the region of billions of bytes.
Consequently, these systems are able to support only databases of relatively small sizes.

1.8 MODEL ANSWERS


1. (a) The intension of a set defines the permissible occurrences by specifying a
membership condition.
(b) The extension of the set specifies one of numerous possible occurrences by explicitly
listing the set members. These two methods of defining a set are illustrated by the
following example:
Intension of set G = {g | g is an odd positive integer less than 10}
Extension of set G = {1, 3, 5, 7, 9}

2. Two relations are union compatible if they have the same arity and a one-to-one
correspondence of the attributes, with the corresponding attributes defined over the same
domain.
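Both answers can be checked with a short Python sketch. The encoding of a relation scheme as an ordered list of domain names, and the function name union_compatible, are assumptions made only for this illustration.

```python
# Intension: a membership condition. Extension: an explicit listing.
G_intension = {g for g in range(1, 10) if g % 2 == 1}
G_extension = {1, 3, 5, 7, 9}
print(G_intension == G_extension)


def union_compatible(domains_r, domains_s):
    """Schemes are given as ordered lists of domain names (assumed encoding).

    Two schemes are union compatible when they have the same arity and
    corresponding attributes are defined over the same domain.
    """
    return len(domains_r) == len(domains_s) and all(
        d1 == d2 for d1, d2 in zip(domains_r, domains_s)
    )


print(union_compatible(["integer", "string"], ["integer", "string"]))
print(union_compatible(["integer", "string"], ["integer"]))
```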

1.9 FURTHER READING


Bipin C. Desai, An Introduction to Database Systems, Galgotia Publications, New Delhi.
