PRINCIPLES OF
DATABASE AND
KNOWLEDGE-BASE
SYSTEMS
VOLUME I
Series Editors
Alfred V. Aho, Bell Telephone Laboratories, Murray Hill, New Jersey
Jeffrey D. Ullman, Stanford University, Stanford, California
1. Algorithms for Graphics and Image Processing*
Theo Pavlidis
2. Algorithmic Studies in Mass Storage Systems*
C. K. Wong
3. Theory of Relational Databases*
Jeffrey D. Ullman
4. Computational Aspects of VLSI*
Jeffrey D. Ullman
5. Advanced C: Food for the Educated Palate*
Narain Gehani
6. C: An Advanced Introduction*
Narain Gehani
7. C for Personal Computers: IBM PC, AT&T PC 6300, and Compatibles*
Narain Gehani
8. Principles of Computer Design*
Leonard R. Marino
9. The Theory of Database Concurrency Control*
Christos Papadimitriou
10. Computer Organization*
Michael Andrews
11. Elements of Artificial Intelligence Using LISP
Steven Tanimoto
12. Trends in Theoretical Computer Science
Egon Borger, Editor
13. An Introduction to Solid Modeling
Martti Mantyla
14. Principles of Database and Knowledge Base Systems, Volume I
Jeffrey D. Ullman
These previously-published books are in the Principles of Computer Science Series but they are not
numbered within the volume itself. All future volumes in the Principles of Computer Science Series
will be numbered.
PRINCIPLES OF
DATABASE AND
KNOWLEDGE-BASE
SYSTEMS
VOLUME I
Jeffrey D. Ullman
STANFORD UNIVERSITY
All rights reserved. No part of this book may be reproduced in any form
including photostat, microfilm, and xerography, and not in information storage
and retrieval systems, without permission in writing from the publisher, except
by a reviewer who may quote brief passages in a review or as provided in the
Copyright Act of 1976.
Computer Science Press
1803 Research Boulevard
Rockville, Maryland 20850
Printing: 1 2 3 4 5 6
Year: 93 92 91 90 89 88
PREFACE
in two. The first half, on query optimization for distributed systems, is moved
to Volume II, while the second half forms the core of the new Chapter 10; the
latter includes not only distributed locking, but also covers other issues such as
distributed agreement ("distributed commit").
Exercises
Each chapter, except the first, includes an extensive set of exercises, both to
test the basic concepts of the chapter and in many cases to extend these ideas.
The most difficult exercises are marked with a double star, while exercises of
intermediate difficulty have a single star.
Acknowledgements
The following people made comments that were useful in the preparation of
this volume: David Beech, Bernhard Convent, Jim Cutler, Wiebren de Jonge,
Michael Fine, William Harvey, Anil Hirani, Arthur Keller, Michael Kifer, Hans
Kraamer, Vladimir Lifschitz, Alberto Mendelzon, Jaime Montemayor, Inderpal Mumick, Mike Nasdos, Jeff Naughton, Meral Ozsoyoglu, Domenico Sacca,
Shuky Sagiv, Yatin Saraiya, Bruce Schuchardt, Mary Shaw, Avi Silberschatz,
Leon Sterling, Rodney Topor, Allen Van Gelder, Moshe Vardi, and Elizabeth
Wolf.
Alberto Mendelzon, Jeff Naughton, and Shuky Sagiv also served as the
publisher's referees.
My son Peter Ullman developed some of the TeX macros used in the preparation
of this manuscript.
The writing of this book was facilitated by computing equipment contributed
to Stanford University by the AT&T Foundation and by IBM Corp.
Old Debts
The two editions of Ullman [1982] acknowledged many people who contributed
to that book, and many of these suggestions influenced the present book. I
thank in this regard: Al Aho, Brenda Baker, Dan Blosser, Martin Brooks,
Peter deJong, Ron Fagin, Mary Feay, Shel Finkelstein, Vassos Hadzilacos, Kevin
Karplus, Zvi Kedem, Arthur Keller, Hank Korth, Keith Lantz, Dave Maier, Dan
Newman, Mohammed Olumi, Shuky Sagiv, Charles Shub, Joe Skudlarek, and
Joseph Spinden.
Gerree Pecht, at Princeton, typed the first edition of the old book; vestiges
of her original troff can be found in the TeX source of this volume. Luis Trabb Pardo assisted me in the translation of Ullman [1982] from troff to TeX.
J. D. U.
Stanford CA
TABLE OF CONTENTS

Chapter 1: Databases, Object Bases, and Knowledge Bases (Sections 1.1-1.7)

Chapter 2: Data Models for Database Systems 32
2.1: Data Models 32
2.2: The Entity-Relationship Model 34
2.3: The Relational Data Model 43
2.4: Operations in the Relational Data Model 53
2.5: The Network Data Model 65
2.6: The Hierarchical Data Model 72
2.7: An Object-Oriented Model 82
Exercises 87
Bibliographic Notes 94

Chapter 3: Logic as a Data Model 96
3.1: The Meaning of Logical Rules 96
3.2: The Datalog Data Model 100
3.3: Evaluating Nonrecursive Rules 106
3.4: Computing the Meaning of Recursive Rules 115
3.5: Incremental Evaluation of Least Fixed Points 124
3.6: Negations in Rule Bodies 128
3.7: Relational Algebra and Logic 139
3.8: Relational Calculus 145
3.9: Tuple Relational Calculus 156
3.10: The Closed World Assumption 161
Exercises 164
Bibliographic Notes 171

(Entries for Chapters 4 through 10, the Bibliography, and the Index could not be recovered from this copy.)
CHAPTER 1
Databases,
Object Bases,
and
Knowledge Bases
record
    name: char[30];
    manager: char[30];
end
The file itself is a sequence of records, one for each employee of the company.
D
In many of the data models we shall discuss, a file of records is abstracted
to what is often called a relation, which might be described by
EMPLOYEES(NAME, MANAGER)
Here, EMPLOYEES is the name of the relation, corresponding to the file mentioned
in Example 1.1. NAME and MANAGER are field names; fields are often
called attributes when relations are being talked about.
While we shall, in this informal introductory chapter, sometimes use "file"
and "relation" as synonyms, the reader should be alert to the fact that they
are different concepts and are used quite differently when we get to the details
of database systems. A relation is an abstraction of a file, where the data type
of fields is generally of little concern, and where order among records is not
specified. Records in a relation are called tuples. Thus, a file is a list of records,
but a relation is a set of tuples.
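To make the distinction concrete, here is a minimal sketch, in the SQL-like notation taken up in later chapters, of how the EMPLOYEES relation might be declared to a relational DBMS. The CREATE TABLE form and the CHAR(30) types are illustrative assumptions, not part of Example 1.1.

CREATE TABLE EMPLOYEES (
    NAME     CHAR(30),
    MANAGER  CHAR(30)
);
-- One tuple per employee; unlike a file, the relation imposes
-- no order on its tuples.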
Efficient File Access
The ability to store a file is not remarkable; the file system associated with
any operating system does that. The capability of a DBMS is seen when we
access the data of a file. For example, suppose we wish to find the manager
of employee "Clark Kent." If the company has thousands of employees, it is
very expensive to search the entire file to find the one with NAME = "Clark
Kent". A DBMS helps us to set up "index files," or "indices," that allow us
to access the record for "Clark Kent" in essentially one stroke, no matter how
large the file is. Likewise, insertion of new records or deletion of old ones can
be accomplished in time that is small and essentially constant, independent of
the file's length. An example of an appropriate index structure that may be
familiar to the reader is a hash table with NAME as the key. This and other
index structures are discussed in Chapter 6.
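As an illustration, many relational systems accept a declaration asking for such an index; the statements below are a sketch in SQL-like syntax, and the index name EMP_NAME_INDEX is invented for the example.

CREATE INDEX EMP_NAME_INDEX ON EMPLOYEES(NAME);

-- With the index in place, a selection such as the following can be
-- answered with a few accesses, rather than a scan of the whole file.
SELECT MANAGER
FROM EMPLOYEES
WHERE NAME = 'Clark Kent';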
Another thing a DBMS helps us do is navigate among files, that is, to
combine values in two or more files to obtain the information we want. The
next example illustrates navigation.
Example 1.2: Suppose we stored in an employee's record the department for
which he works, but not his manager. In another file, called DEPARTMENTS,
we have records that associate a department's name with its manager. In the
style of relations, we have:
EMPLOYEES(NAME, DEPT)
DEPARTMENTS(DEPT, MANAGER)
Now, if we want to find Clark Kent's manager, we need to navigate from
EMPLOYEES to DEPARTMENTS, using the equality of the DEPT field in
both files. That is, we first find the record in the EMPLOYEES file that has
NAME = "Clark Kent", and from that record we get the DEPT value, which we
all know is "News". Then, we look into the DEPARTMENTS file for the record
having DEPT = "News", and there we find MANAGER = "Perry White". If
we set up the right indices, we can perform each of these accesses in some small,
constant amount of time, independent of the lengths of the files. D
Query Languages
To make access to files easier, a DBMS provides a query language, or data
manipulation language, to express operations on files. Query languages differ in
the level of detail they require of the user, with systems based on the relational
data model generally requiring less detail than languages based on other models.
Example 1.3: The query discussed in Example 1.2, "find the manager of Clark
Kent," could be written in the language SQL, which is based on the relational
model of data, as shown in Figure 1.1. The language SQL will be taught begin
ning in Section 4.6. For the moment, let us note that line (1) tells the DBMS
to print the manager as an answer, line (2) says to look at the EMPLOYEES
and DEPARTMENTS relations, (3) says the employee's name is "Clark Kent,"
and the last line says that the manager is connected to the employee by being
associated (in the DEPARTMENTS relation) with the same department that
the employee is associated with (in the EMPLOYEES relation).
(1) SELECT MANAGER
(2) FROM EMPLOYEES, DEPARTMENTS
(3) WHERE EMPLOYEES.NAME = 'Clark Kent'
(4)     AND EMPLOYEES.DEPT = DEPARTMENTS.DEPT;
Figure 1.1 Example SQL query.
In Figure 1.2 we see the same query written in the simplified version of the
network-model query language DML that we discuss in Chapter 5. For a rough
description of what these DML statements mean, lines (1) and (2) together tell
the DBMS to find the record for Clark Kent in the EMPLOYEES file. Line
(3) uses an implied "set" structure EMP-DEPT that connects employees to
their departments, to find the department that "owns" the employee ("set"
and "owns" are technical terms of DML's data model), i.e., the department
to which the employee belongs. Line (4) exploits the assumption that there is
another set structure DEPT-MGR, relating departments to their managers. On
line (5) we find and print the first manager listed for Clark Kent's department,
and technically, we would have to search for additional managers for the same
department, steps which we omit in Figure 1.2. Note that the print operation
on line (5) is not part of the query language, but part of the surrounding "host
language," which is an ordinary programming language.
The reader should notice that navigation among files is made far more ex
plicit in DML than in SQL, so extra effort is required of the DML programmer.
The difference is not just the extra line of code in Figure 1.2 compared with
Figure 1.1; rather it is that Figure 1.2 states how we are to get from one record
to the next, while Figure 1.1 says only how the answer relates to the data. This
"declarativeness" of SQL and other languages based on the relational model
is an important reason why systems based on that model are becoming pro
gressively more popular. We shall have more to say about declarativeness in
Section 1.4. D
Figure 1.2 The same query expressed in the network-model DML (statements (1) through (5) were not recovered in this copy).
Transaction Management
Another important capability of a DBMS is the ability to manage simultane
ously large numbers of transactions, which are procedures operating on the
database. Some databases are so large that they can only be useful if they
are operated upon simultaneously by many computers; often these computers
are dispersed around the country or the world. The database systems used by
banks, accessed almost instantaneously by hundreds or thousands of automated
teller machines, as well as by an equal or greater number of employees in the
bank branches, are typical of this sort of database. An airline reservation system
is another good example.
Sometimes, two accesses do not interfere with each other. For example,
any number of transactions can be reading your bank balance at the same
time, without any inconsistency. But if you are in the bank depositing your
salary check at the exact instant your spouse is extracting money from an
automatic teller, the result of the two transactions occurring simultaneously
That is, view SAFE-EMPS consists of the NAME, DEPT, and ADDRESS fields
of EMPLOYEES, but not the SALARY field. SAFE-EMPS may be thought of
as a relation described by
SAFE-EMPS(NAME, DEPT, ADDRESS)
The view SAFE-EMPS does not exist physically as a file, but it can be queried
as if it did. For example, we could ask for Clark Kent's department by saying
in SQL:
SELECT DEPT
FROM SAFE-EMPS
WHERE NAME = 'Clark Kent';
Normal users are allowed to access the view SAFE-EMPS, but not the relation
EMPLOYEES. Users with the privilege of knowing salaries are given access to
read the EMPLOYEES relation, while a subset of these are given the privilege
of modifying the EMPLOYEES relation, i.e., they can change people's salaries.
D
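As a preview of notation treated in later chapters, SAFE-EMPS might be defined and made available to ordinary users roughly as follows. This is a sketch only: the GRANT statement and the user-group name CLERKS are assumptions made for the illustration.

CREATE VIEW SAFE-EMPS AS
    SELECT NAME, DEPT, ADDRESS
    FROM EMPLOYEES;

GRANT SELECT ON SAFE-EMPS TO CLERKS;
-- Ordinary users (the group CLERKS here) may query the view but are
-- given no privileges on the underlying EMPLOYEES relation.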
Security aspects of a DBMS are discussed in Chapter 8, along with the re
lated question of integrity, the techniques whereby invalid data may be detected
and avoided.
Figure 1.3 The three levels of abstraction. Each of several user groups sees its own view, whose definition and mapping are written in the subscheme data definition language; the views are defined on top of a single conceptual database, whose definition and mapping are written in the data definition language; the conceptual database is in turn implemented as a physical database on physical devices.
This bringing together of files was not a trivial task. Information of the same
type would typically be kept in different places, and the formats used for the
same kind of information would frequently be different.
Example 1.5: Different divisions of a company might each keep information
about employees and the departments to which they were assigned. But one
division might store employee names as a whole, while another had three fields,
for first, middle, and last names. The translation of one format into the other
might not be difficult, but it had to be done before a unified conceptual database
could be built.
Perhaps more difficult to reconcile are differences in the structure of data.
One division might have a record for each employee and store the employee's
department in a field of that record. A second division might list departments
in a file and follow each department record by a list of records, one for each
employee of that department. The difference is that a department suddenly
devoid of employees disappears in the first division's database, but remains
in the second. If there were such an empty department in each division, the
query "list all the departments" would give different answers according to the
structures of the two divisions. To build a conceptual scheme, some agreement
about a unified structure must be reached; the process of doing so is called
database integration. D
Data Independence
The chain of abstractions of Figure 1.3, from view to conceptual to physical
database, provides two levels of "data independence." Most obviously, in a
well-designed database system the physical scheme can be changed without
altering the conceptual scheme or requiring a redefinition of subschemes. This
independence is referred to as physical data independence. It implies that
4. Enter (add to the database) flight 456, with 100 seats, from ORD to JFK on August 21.
Items (1) and (3) illustrate the querying of the database, and they would
be implemented by programs like those of Figures 1.1 and 1.2. Item (2) is an
example of an update statement, and it would be implemented by a program
such as the following lines of SQL.
UPDATE FLIGHTS
SET SEATS = SEATS - 4
WHERE NUMBER = 123 AND DATE = 'AUG 31';
Item (4) illustrates insertion of a record into the database, and it would be
expressed by a program (in SQL) like:
INSERT INTO FLIGHTS
VALUES(456, 'AUG 21', 100, 'ORD', 'JFK');
Host Languages
Often, manipulation of the database is done by an application program, written
in advance to perform a certain task. It is usually necessary for an application
program to do more than manipulate the database; it must perform a variety
of ordinary computational tasks. For example, a program used by an airline to
book reservations does not only need to retrieve from the database the current
number of available seats on the flight and to update that number. It needs
to make a decision: are there enough seats available? It might well print the
ticket, and it might engage in a dialog with the user, such as asking for the
passenger's "frequent flier" number.
Thus, programs to manipulate the database are commonly written in a
host language, which is a conventional programming language such as C or
even COBOL. The host language is used for decisions, for displaying questions,
and for reading answers; in fact, it is used for everything but the actual querying
and modification of the database.
The commands of the data manipulation language are invoked by the host-language program in one of two ways, depending on the characteristics of the
DBMS.
1. The commands of the data manipulation language are invoked by host-language calls on procedures provided by the DBMS.
Program in ordinary               Program in extended
programming language              programming language

CALL GET(B)                       ##GET(B)
A := B+1                          A := B+1
CALL STORE(A)                     ##STORE(A)

(In either style, the GET and STORE requests cause values to be passed between the program's local variables and the database.)
modifying data, values are copied from the local data variables to the database,
again in response to calls to the proper procedures. For example, the request
to decrement by 4 the number of seats on a certain flight could be performed
by:
1. Copying the number of seats remaining on that flight into the local data
area,
2. Testing if that number was at least 4, and if so,
3. Storing the decremented value into the database, as the new number of
seats for that flight.
Database System Architecture
In Figure 1.6 we see a diagram of how the various components and languages
of a database management system interact. On the right, we show the de
sign, or database scheme, fed to the DDL compiler, which produces an internal
description of the database. The modification of the database scheme is very
infrequent, compared to the rate at which queries and other data manipulations
are performed. In a large, multiuser database, this modification is normally the
responsibility of a database administrator, a person or persons with respon
sibility for the entire system, including its scheme, subschemes (views), and
authorization to access parts of the database.
We also see in Figure 1.6 the query-language processor, which is given data
manipulation programs from two sources. One source is user queries or other
data manipulations, entered directly at a terminal. Figure 1.1 is an example
of what such a query would look like if SQL were the data manipulation lan
guage. The second source is application programs, where database queries and
manipulations are embedded in a host language and preprocessed to be run
later, perhaps many times. The portions of an application program written
in a host language are handled by the host language compiler, not shown in
Figure 1.6. The portions of the application program that are data manipula
tion language statements are handled by the query language processor, which is
responsible for optimization of these statements. We shall discuss optimization
in Chapter 11 (Volume II), but let us emphasize here that DML statements,
especially queries, which extract data from the database, are often transformed
significantly by the query processor, so that they can be executed much more
efficiently than if they had been executed as written. We show the query pro
cessor accessing the database description tables that were created by the DDL
program to ascertain some facts that are useful for optimization of queries, such
as the existence or nonexistence of certain indices.
Below the query processor we see a database manager, whose role is to
take commands at the conceptual level and translate them into commands at
the physical level, i.e., the level of files. The database manager maintains and
Figure 1.6 Components of a database management system. (The diagram shows user queries and application programs, the database scheme, and the authorization tables, concurrent access tables, and database description tables.)
time," since we can read the entire unit after moving the disk head once.
As another example of a possible specialization of the file manager, we
indicated in Figure 1.6 that the file manager may use the concurrent access
tables. One reason it might be desirable to do so is that we can allow more
processes to access the database concurrently if we lock objects that are smaller
than whole files or relations. For example, if we locked individual blocks of
which a large file was composed, different processes could access and modify
records of that file simultaneously, as long as they were on different blocks.
1.4 MODERN DATABASE SYSTEM APPLICATIONS
The classical form of database system, which we surveyed in the first three
sections, was designed to handle an important but limited class of applications.
These applications are suggested by the examples we have so far pursued: files
of employees or corporate data in general, airline reservations, and financial
records. The common characteristic of such applications is that they have large
amounts of data, but the operations to be performed on the data are simple.
In such database systems, insertion, deletion, and retrieval of specified records
predominates, and the navigation among a small number of relations or files,
as illustrated in Example 1.3, is one of the more complex things the system is
expected to do.
This view of intended applications leads to the distinction between the
DML and the host language, as was outlined in the previous section. Only
the DML has the built-in capability to access the database efficiently, but the
expressive power of the DML is very limited. For example, we saw in Section
1.1 how to ask for Clark Kent's manager, and with a bit more effort we could
ask for Clark Kent's manager's manager's manager, for example. However, in
essentially no DBMS commercially available in the late 1980's, could one ask in
one query for the transitive closure of the "manages" relationship, i.e., the set of
all individuals who are managers of Clark Kent at some level of the managerial
hierarchy.3
The host language, being a general-purpose language, lets us compute man
agement chains or anything else we wish. However, it does not provide any as
sistance with the task that must be performed repeatedly to find the managers
of an individual at all levels; that task is to answer quickly a question of the
form "who is X's manager?"
The DML/host language dichotomy is generally considered an advantage,
rather than a deficiency in database systems. For example, it is the limited
power of the DML that lets us optimize queries well, transforming the algo
rithms that they express in sometimes surprising, but correct, ways. The same
3 Some commercial DBMS's have a built-in facility for computing simple recursions, like
managerial hierarchies, but they cannot handle any more complex recursion.
Figure 1.7 Cells in a drawing database.
ory. However, VLSI images can be 100 times as large, and multicolored as well.
A page of text, such as the one you are reading, can have more data still, when
printed on a high-quality printer. When an image has that much data, the host
language program can no longer store a bitmap in main memory. For efficiency,
it must store the image on secondary storage and use a carefully designed al
gorithm for exploring the contents of cells, to avoid moving large numbers of
pages between main and secondary storage. This is exactly the sort of task that
a data manipulation language does well, but when written in the host language,
the programmer has to implement everything from scratch. Thus, for graphics
databases and many similar applications, the DML/host language separation
causes us great difficulty.
Integration of the DML and Host Language
There are two common approaches to the problem of combining the fast access
capability of the DML with the general-purpose capability of the host language.
1. The "object-oriented" approach is to use a language with the capability
of denning abstract data types, or classes. The system allows the user to
embed data structures for fast access in those of his classes that need it.
Thus, the class "cell" for Example 1.9, might be given indices that let us
find quickly the constituents of a given cell, and the cells of which a given
cell is a constituent.
2. The "logical" approach uses a language that looks and behaves something
like logical (if then) rules. Some predicates are considered part of the
conceptual scheme and are given appropriate index structures, while others
are used to define views, as if they were part of a subscheme. Others may
Declarative Languages
There is a fundamental difference between the object-oriented and logical ap
proaches to design of an integrated DML/host language; the latter is inherently
declarative and the former is not. Recall, a declarative language is a language in
which one can express what one wants, without explaining exactly how the de
sired result is to be computed. A language that is not declarative is procedural.
"Declarative" and "procedural" are relative terms, but it is generally accepted
that "ordinary" languages, like Pascal, C, Lisp, and the like, are procedural,
with the term "declarative" used for languages that require less specificity re
garding the required sequence of steps than do languages in this class. For
instance, we noticed in Example 1.3 that the SQL program of Figure 1.1 is
more declarative than the Codasyl DML program of Figure 1.2. Intuitively,
the reason is that the DML program tells us in detail how to navigate from
employees to departments to managers, while the SQL query merely states the
relationship of the desired output to the data.
The declarativeness of the query language has a significant influence on
the architecture of the entire database system. The following points summarize
the observed differences between systems with declarative and procedural lan
guages, although we should emphasize that these assertions are generalizations
that could be contradicted by further advances in database technology.
1. Users prefer declarative languages, all other factors being equal.
2. Declarative languages are harder to implement than procedural languages,
because declarative languages require extensive optimization by the system
if an efficient implementation of declaratively-expressed wishes is to be
found.
3. It appears that true object-orientedness and declarativeness are incompat
ible.
The interactions among these factors is explored further in Section 1.7.
1.5 OBJECT-BASE SYSTEMS
The terms "object base" and "object-oriented database management system"
(OO-DBMS) are used to describe a class of programming systems with the ca
pability of a DBMS, as outlined in Section 1.1, and with a combined DML/host
language having the following features.
1. Complex objects, that is, the ability to define data types with a nested
structure. We shall discuss in Section 2.7 a data model in which data
types are built by record formation and set formation, which are the most
common ways nested structures are created. For example, a tuple is built
from primitive types (integers, etc.) by record formation, and a relation is
built from tuples by set formation; i.e., a relation is a set of tuples with
a particular record format. We could also create a record one of whose
components was of type "set of tuples," or even more complex structures.
2. Encapsulation, that is, the ability to define procedures applying only to
objects of a particular type and the ability to require that all access to
those objects is via application of one of these procedures. For example,
we might define "stack" as a type and define the operations PUSH and
POP to apply only to stacks (PUSH takes a parameter, the element to be
pushed).
3. Object identity, by which we mean the ability of the system to distinguish
two objects that "look" the same, in the sense that all their components of
primitive type are the same. The primitive types are generally character
strings, numbers, and perhaps a few other types that have naturally asso
ciated, printable values. We shall have more to say about object identity
below.
A system that supports encapsulation and complex objects is said to sup
port abstract data types (ADT's) or classes. That is, a class or ADT is a def
inition of a structure together with the definitions of the operations by which
objects of that class can be manipulated.
Object Identity
To see the distinction between systems that support object identity and those
that do not, consider Example 1.2, where we discussed a file or relation
EMPLOYEES(NAME, DEPT)
consisting of employee-department pairs. We may ask what happens if the
News department has two employees named Clark Kent. If we think of the
database as a file, we simply place two records in the file; each has first field
"Clark Kent" and second field "News" . The notion of a file is compatible with
object identity, because the position of a record distinguishes it from any other
record, regardless of its printable values.
However, when we view the data as a relation, we cannot store two tuples,
each of which has the value
("Clark Kent", "News")
The reason is that formally, a relation is a set. A tuple cannot be a member of
a set more than once. That is, there is no notion of "position" of a tuple within
constants begin with a lower-case letter, with the exception that constants
are also permitted to be integers. Variables must begin with a capital letter.
Logical statements, often called rules, will usually be written in the form of
Horn clauses, which are statements of the form: "if A1 and A2 and ... and An are
true, then B is true." The Prolog syntax for such a statement is:5
B :- A1 & A2 & ... & An.
The symbol :- can generally be read "if." Note the terminal period, which serves
as an endmarker for the rule. If n = 0, then the rule asserts B unconditionally,
and we write it
B.
Example 1.10: The following two rules can be interpreted as an inductive
definition of addition, if we attribute the proper meanings to the predicate
symbol sum and the function symbol s. That is, sum(X, Y, Z) is true exactly
when Z is the sum of X and Y, while s(X) is the successor of X, that is, the
integer which is one more than X. Then the rules:
sum(X,0,X).
sum(X,s(Y),s(Z)) :- sum(X,Y,Z).
say that X + 0 = X, and that if X + Y = Z, then X + (Y + 1) = (Z + 1). D
Example 1.11: We are more often interested in logical rules expressing in
formation about the data in a database. We should appreciate the similarity
between the logical notion of a predicate with its arguments and a relation
name with its attributes. That is, we can think of a predicate as true for its
arguments if and only if those arguments form a tuple of the corresponding
relation. For instance, we can define a view SAFE-EMPS as we did in Example
1.4, by the logical rule:
safe-emps(N,D,A) :- employees(N,D,S,A).
In order to interpret the above rule, we must remember that EMPLOYEES
has four fields, the NAME, DEPT, SALARY, and ADDRESS. The rule says
that for all employee names N, departments D, and addresses A, (N, D, A) is a
fact of the safe-emps predicate if there exists a salary S, such that (N, D, S, A)
is a fact of the employees predicate. Note that in general, a variable like S,
appearing on the right side of the :- symbol but not on the left, is treated as
existentially quantified; informally, when reading the rule we say "there exists
some S" after saying the "if" that corresponds to the :- symbol.
For another example of how information about data can be expressed in
logical terms, let us suppose that we have our earlier EMPLOYEES relation,
whose only attributes are NAME and DEPT, and let us also use the DEPARTMENTS
relation of Example 1.2, whose attributes are DEPT and MANAGER. We can then
define a manages predicate by the rule

manages(E,M) :- employees(E,D) & departments(D,M).     (1.3)

5 Most Prolog versions use a comma where we use the ampersand.
That is, (E, M) is a manages fact if there exists a department D such that
(E, D) is an employees fact and (D, M) is a departments fact. In essence, we
have used the above logical rule to create view, manages, which looks like a
relation with attributes name and manager. The queries shown in Figures 1.1
and 1.2, to find Clark Kent's manager, could be expressed in terms of this view
quite simply:
manages('Clark Kent', X)     (1.4)
The value or values of X that make (1.4) true are found by an algorithm es
sentially the same as the one that would answer Figure 1.1 or 1.2, but the
logical rule (1.3) plays an important part in allowing the system to interpret
what the query means. In a loose sense, we might suppose that (1.3) represents
"knowledge" about the manages relationship. D
Expressiveness of Logic
We mentioned in Section 1.4 that SQL and similar data manipulation languages
do not have enough power to compute transitive closures, such as managerial
hierarchies.6 Logical rules using function symbols have all the power of a Tur
ing machine; i.e., they can express any computation that can be written in
conventional programming languages. Even logical rules without function sym
bols (a language we shall call "datalog" in Chapter 3) have power to express
computations beyond that of conventional DML's, as the next example will
suggest.
Example 1.12: Suppose we have a relation (or predicate) manages(E,M),
intended to be true exactly when employee E reports directly to manager M.
We may wish to define another predicate boss(E, B), intending it to be true
whenever B is E's manager, or his manager's manager, or his manager's man
ager's manager, and so on; i.e., boss is the transitive closure of manages. The
predicate boss can be expressed in Horn clauses as
(1) boss(E,M) :- manages(E,M).
(2) boss(E,M) :- boss(E,N) & manages(N,M).
The above is a typical example of how logical rules can be used recursively,
that is, used to define a predicate like boss in terms of itself. To argue that a
6 Formally, the transitive closure of a binary relation r is the smallest relation s that
includes r and is transitive, i.e., s(X, Y) and s(Y, Z) imply s(X, Z).
then by the inductive hypothesis, we can infer boss(e, c_{n-1}), and we also are
given manages(c_{n-1}, b). Thus, we may use rule (2) to infer boss(e,b). D
Example 1.13: The following rules solve the problem posed in Section
1.4, of expanding the recursive definition of cells. We shall suppose the
database stores a predicate set(I, X, Y), meaning that the pixel with coordinates
(X, Y) is set (has value 1) in the cell named I. Also stored is the
predicate contains(I, J, X, Y), meaning that cell I contains a copy of cell J,
with the origin of the copy at position (X, Y) of cell I. Then we can define the
predicate on(I, X, Y) to mean that point (X, Y) is set in cell I, either directly
or through one of I's constituent cells.

on(I,X,Y) :- set(I,X,Y).
on(I,X,Y) :- contains(I,J,U,V) & on(J,W,Z) &
             X = U+W & Y = V+Z.

The first rule says a point is on if it is directly set. The second says that
(X, Y) is set in I if there exists a cell J such that a copy of J appears in I
with origin translated to point (U, V), the point (W, Z) is on in J, and the
translation from the origin of I to point (U, V), and from there to the relative
point (W, Z) takes us to (X, Y) of the coordinate system of cell I. D
28
No
Separate
Value
Yes
Separate
Object
No
Integrated
Value
Yes
Integrated
The earliest true DBMS's appeared in the 1960's, and they were based on
either the network or the hierarchical data models, which we discuss in Sections
2.5 and 2.6. Languages based on these models appear in Chapter 5. These
systems provided efficient access to massive amounts of data, but they neither
integrated the DML and host languages, nor did they provide query languages
that were significantly declarative. They were object-oriented in the sense that
they supported object identity, although they did not support abstract data
types.
The 1970's saw the advent of relational systems, following Codd's seminal
paper (Codd [1970]). A decade of development was needed, with much of the
research devoted to the techniques of query optimization needed to execute the
declarative languages that are an essential part of the relational idea. As we
mentioned, relational systems are declarative and value-oriented, but they do
not easily allow us to integrate the DML and host languages.
We see the 1980's as the decade of object-oriented DBMS's in the true
sense of the term; i.e., they support both object identity and abstract data
types. These are the first systems to provide well-integrated data manipulation
and host languages. However, in one sense, they represent a retrograde step:
they are not declarative, the way relational systems are.
Our prediction is that in the 1990's, true KBMS's will supplant the OO-DBMS's just as the relational systems have to a large extent supplanted the earlier
DBMS's. These systems will provide both declarativeness and integration of
the DML/host language. We predict that they will be inherently value-oriented
and logic-based. It also appears that there is much to be learned about query
optimization before it is possible to implement commercial KBMS's. Much of
Volume II is devoted to this technology of optimization.
BIBLIOGRAPHIC NOTES
Most of the topics introduced in this chapter will be taken up again in later
chapters, and we defer the references in those areas until they are studied more
deeply. Here, we mention a few odd topics and give some references regard
ing knowledge-base systems that would otherwise not appear until the second
volume.
Three-Level Architecture
The three levels of abstraction (physical, conceptual, and view) appear in
the "DBTG report" (CODASYL [1971]). They are also a feature of the
"ANSI/SPARC report" (ANSI [1975]), where they are called internal, con
ceptual, and external, respectively. Tsichritzis and Klug [1978] is an informal
introduction to a revised version of that report.
Database Integration
We mentioned in Section 1.1 that the process of "database integration" is needed
when files from several sources are combined to form a single database. El Masri
and Wiederhold [1979] and Litwin [1984] give notations to assist in the process.
Stonebraker and Rowe [1977] discuss the classical architecture of a DBMS that
interfaces a data manipulation language to a host language, as was discussed
in Section 1.3.
Object-Oriented Systems
General Sources
The bibliography by Kambayashi [1981] is getting out-of-date but is an exten
sive compendium of work in database systems prior to its publication. Date
[1986], Korth and Silberschatz [1986], and Wiederhold [1983] contain large,
general bibliographies; the latter also catalogs commercial database systems.
Bernstein, Hadzilacos, and Goodman [1987] has extensive references on con
currency control and distributed systems. Wiederhold [1987] provides a large
bibliography on file structures and physical design.
CHAPTER 2
Data Models
for
Database Systems
We now consider the principal models used in database systems. Section 2.2
introduces the entity-relationship model, which is used primarily as a database
design tool. The relational model is discussed in Sections 2.3 and 2.4; the
network model in Section 2.5, and the hierarchical model, which is based on
collections of tree structures, in Section 2.6. In Section 2.7 we introduce the
"object model," which is a synthesis of several models based on complex objects.
Chapter 3 is devoted to one particular data model, based on logic. This
"datalog" data model plays an important role in most knowledge systems and
KBMS's, and aspects of that model are also central to the relational model
for database systems. We shall also meet, in Chapters 4 and 5, certain query
languages based on the models we discuss here.
2.1 DATA MODELS
A data model is a mathematical formalism with two parts:
1. A notation for describing data, and
2. A set of operations used to manipulate that data.
Chapter 1 made brief mention of two important models. One is the rela
tional model, where the notation for describing data is a set of names, called
attributes, for the columns of tables. Another is the network model, which uses
directed graphs to describe data. For neither of these models did we discuss the
operations used to manipulate data, although examples were given in Figures
1.1 and 1.2.
Distinctions Among Data Models
One might naturally ask whether there is one "best" data model for database
systems. The multiplicity of models in use suggests not. Below, we list some
of the differences among models that influence where and when they are best
used.
1.
Purpose. Most data models are intended to serve as the notation for data
in a database and as a notation underlying the data manipulation language.
The entity-relationship model, on the other hand, is intended as a notation
for conceptual scheme design, prior to an implementation in the model
of whatever DBMS is used. This model is therefore missing a notion of
operations on data (although some have been proposed), and one could
even argue that it should not be classified as a data model at all.
Isa Hierarchies
We say A isa B, read "A is a B" if entity set B is a generalization of entity set
A, or equivalently, A is a special kind of B. The primary purpose of declaring
isa relationships between entity sets A and B is so A can inherit the attributes
of B, but also have some additional attributes that don't make sense for those
members of B that are not also members of A. Technically, each entity a in
set A is related to exactly one entity b in set B, such that a and b are really
the same entity. No b in B can be so related to two different members of A,
but some members of B can be related to no member of A. The key attributes
for entity set A are actually attributes of entity set B, and the values of those
attributes for an entity a in A are taken from the corresponding b in B.
Example 2.3: A corporation might well have an entity set EMPLOYEES,
with attributes such as ID_NO, NAME, and SALARY. If the corporation were
a baseball team, certain of the employees, the players, would have other im
portant attributes, like BATTING-AVG or HOME-RUNS, that the other em
ployees would not have. The most sensible way to design this scheme is to
have another entity set PLAYERS, with the relationship PLAYERS isa EM
PLOYEES. Attributes like NAME, belonging to EMPLOYEES, are inherited
by PLAYERS, but only PLAYERS have attributes like BATTING_AVG. D
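When such a design is eventually realized in a relational system (a translation discussed later in this chapter), one common rendering gives PLAYERS its own relation whose key is borrowed from EMPLOYEES. The following SQL-like sketch is illustrative only; the column types and the REFERENCES clause are assumptions, not part of the example above.

CREATE TABLE EMPLOYEES (
    ID_NO   INTEGER PRIMARY KEY,
    NAME    CHAR(30),
    SALARY  INTEGER
);

CREATE TABLE PLAYERS (
    ID_NO        INTEGER PRIMARY KEY REFERENCES EMPLOYEES(ID_NO),
    BATTING_AVG  DECIMAL(4,3),
    HOME_RUNS    INTEGER
);
-- Each player "isa" employee: a PLAYERS tuple is identified by the ID_NO
-- of the corresponding EMPLOYEES tuple and carries only the attributes
-- peculiar to players.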
Relationships
A relationship among entity sets is an ordered list of entity sets. A particular
entity set may appear more than once on the list. This list of entity sets is the
scheme-level notion of a relationship. If there is a relationship R among entity
sets E1, E2, ..., Ek, then the current instance of R is a set of k-tuples. We call
such a set a relationship set. Each k-tuple (e1, e2, ..., ek) in relationship set R
implies that entities e1, e2, ..., ek, where e1 is in set E1, e2 is in set E2, and
so on, stand in relationship R to each other as a group. The most common
case, by far, is where k = 2, but lists of three or more entity sets are sometimes
related.
Example 2.4: Suppose we have an entity set PERSONS and we have a rela
tionship MOTHER-OF, whose list of entity sets is PERSONS, PERSONS. The
relationship set for relationship MOTHER-OF consists of all and only those
pairs (p1, p2) such that person p2 is the mother of person p1.
An alternative way of representing this information is to postulate the
existence of entity set MOTHERS and relationship MOTHERS isa PERSONS.
This arrangement would be more appropriate if the database stored values for
attributes of mothers that it did not store for persons in general. Then the
relationship MOTHER-OF would be the list of entity sets
PERSONS, MOTHERS
To get information about a person's mother as a person, we would compose (in
the sense of ordinary set-theoretic relations) the relationships MOTHER-OF
and isa. D
Borrowed Key Attributes
We mentioned in connection with isa relationships that if A isa B, then the key
for A would naturally be the key attributes of B, and those attributes would not
appear as attributes of entity set A. Thus, in Example 2.3, the key for entity
set PLAYERS would most naturally be the attribute ID_NO of EMPLOYEES.
Then a player would be uniquely identified by the ID_NO of the employee that
is him.
There are times when we want the key for one entity set A to be attributes
of another entity set B to which A is connected by a relationship R other than
Entity-Relationship Diagrams
It is useful to summarize the information in a design using entity-relationship
diagrams, where:
1. Rectangles represent entity sets.
2. Circles represent attributes. They are linked to their entity sets by (undirected)
edges. Sometimes, attributes that are part of the key for their entity
set will be underlined. As a special case regarding attributes, we sometimes
identify an entity set having only one attribute with the attribute itself,
calling the entity set by the name of the attribute. In that case, the entity
set appears as a circle attached to whatever relationships the entity set is
involved in, rather than as a rectangle.
3. Diamonds represent relationships. They are linked to their constituent en
tity sets by edges, which can be undirected edges or directed edges (arcs);
the use of arcs is discussed later when we consider functionality of rela
tionships. The order of entity sets in the list for the relationship can be
indicated by numbering edges, although the order is irrelevant unless the
same entity set appears more than once on a list.
Example 2.5: Figure 2.1(a) shows a simple entity-relationship diagram, with
three entity sets, EMPS, DEPTS, and MANAGERS. The first two are related
by relationship ASSIGNED _TO and the second and third are related by MAN
AGES. For the present, we should ignore the arrows on some of the edges
connecting the relationship diamonds to the entity-set rectangles. We show
three attributes, NAME, PHONE, and SALARY, for EMPS; NAME is taken
to be the key.1 Departments have attributes NAME (of the department) and
1 While we shall often imagine that "names" of entities can serve as keys for their entity
38
Figure 2.1 Examples of entity-relationship diagrams.
LOCATION, while MANAGERS has only the attribute NAME.2
In Figure 2.1(b) we see an entity set PERSONS and we see a relationship
PARENT-OF between PERSONS and PERSONS. We also notice two edges
from PARENT-OF to PERSONS; the first represents the child and the second
the parent. That is, the current value of the PARENT-OF relationship set is
the set of pairs (p1, p2) such that p2 is known to be a parent of p1. D
Functionality of Relationships
Many-Many Relationships
We also encounter many-many relationships, where there are no restrictions
on the sets of k-tuples of entities that may appear in a relationship set. For
example, the relationship PARENT_OF in Figure 2.1 is many-many, because
we expect to find two parents for each child, and a given individual may have
any number of children. The relationship of enrollment between courses and
students, mentioned in Section 2.1, is another example of a many-many rela
tionship, because typically, many students take each course and many courses
are taken by each student.
While many-many relationships appear frequently in practice, we have to
be careful how these relationships are expressed in the conceptual scheme of the
actual database.3 Many data models do not allow direct expression of many-many relationships, instead requiring that they be decomposed into several
3 Recall that the entity-relationship design is not the conceptual scheme, but rather a
sketch of one, and we need to translate from entities and relationships into the data
model of the DBMS that is used.
CITY         STATE      POP
San Diego    Texas      4490
Miami        Oklahoma   13880
Pittsburg    Iowa       509
tuples (Buffalo, W. Va., 831) and (W. Va., 831, Buffalo) would not be the same,
and the two relations of Figure 2.4 would not be considered the same. D
CITY         STATE     POP
Buffalo      W. Va.    831
Providence   Utah      1608
Las Vegas    N. M.     13865
WORKS_IN(ENAME, DNAME)
MANAGES(ENAME, DNAME)
CARRIES(INAME, DNAME)
SUPPLIES(SNAME, INAME, PRICE)
INCLUDES(O#, INAME, QUANTITY)
PLACED-BY(O#, CNAME)
In each case, the set of attributes is the set of keys for the entity sets
connected by the relationship of the same name as the relation. For exam
ple, SUPPLIES connects SUPPLIERS, ITEMS, and PRICE, which have keys
SNAME, INAME, and PRICE, respectively, and it is these three attributes we
see in the scheme for SUPPLIES. Fortunately, there were no coincidences among
the names of the key attributes, so none had to have their names changed.
The two relations MANAGES and WORKS_IN have the same set of attributes,
but of course their meanings are different. We expect that tuple (e,d)
in MANAGES means that e manages department d, while the same tuple in
WORKS_IN means that e is an employee in department d.
These thirteen relations are not an ideal design for the YVCB relational
database scheme. We shall consider how to improve the scheme in the remainder
of this section. Chapter 7 covers the design of relational database schemes from
a more formal point of view. D
Keys of Relations
Like entity sets, relations have sets of one or more attributes that serve as
keys. For relations we can give a definition of "key" that is more formal than the
informal notion of a set of attributes that "distinguish" members of an entity
set. We say that a set S of attributes of a relation R is a key if
1. No instance of R that represents a possible state of the world can have two
tuples that agree in all the attributes of S, yet are not the same tuple, and
2. No proper subset of S has property (1).
Example 2.10: In the relation SUPPLIES, from Example 2.9, SNAME and
INAME together form a key. If there were two tuples (s,i,p1) and (s,i,p2) in
SUPPLIES, then supplier s would apparently sell item i both at price p1 and
at price p2, a situation that means our data is faulty. This observation justifies
condition (1). To check (2) we have to consider the proper subsets, that is,
SNAME alone and INAME alone. Neither should satisfy condition (1). For
example, it is quite possible that we find the two tuples
(Acme, Brie, 3.50)
(Acme, Perrier, 1.25)
in SUPPLIES at the same time, and although they agree on SNAME, they are
48
If a relation comes from an entity set, a set of attributes is a key for that
relation if it is a key for the entity set.
If a relation comes from a many-many relationship, then the key for the
relation is normally the set of all the attributes.
If a relation comes from a one-to-one relationship between entity sets E and
F, then the key for E and the key for F are both keys for the relation. Note
that relations, like entity sets, can have more than one set of attributes that
is a candidate key.
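Relational systems ordinarily let the designer declare the chosen primary key when a relation is created. The following SQL-like sketch shows such a declaration for the SUPPLIES relation of Example 2.10; the column types are invented for the illustration.

CREATE TABLE SUPPLIES (
    SNAME  CHAR(30),
    INAME  CHAR(30),
    PRICE  DECIMAL(8,2),
    PRIMARY KEY (SNAME, INAME)
);
-- The DBMS will then reject two tuples that agree on both SNAME and
-- INAME, in the spirit of condition (1) of the definition of a key.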
EMPS(ENAME, SALARY)
MANAGERS(ENAME)
DEPTS(DNAME, DEPT#)
SUPPLIERS(SNAME, SADDR)
ITEMS(INAME, ITEM#)
ORDERS(O#, DATE)
CUSTOMERS(CNAME, CADDR, BALANCE)
WORKS_IN(ENAME, DNAME)
MANAGES(ENAME, DNAME)
CARRIES(INAME, DNAME)
SUPPLIES(SNAME, INAME, PRICE)
INCLUDES(O#, INAME, QUANTITY)
PLACED-BY(O#, CNAME)
Figure 2.5 Table of relations and keys.
When two relations have a candidate key in common, we can combine the
attributes of the two relation schemes and replace the two relations by one
whose set of attributes is the union of the two sets. One advantage to doing
so is that we save the storage space needed to repeat the key values in the two
relations. A second is that queries talking about attributes of the two relations
can sometimes be answered more quickly if the two relations are combined.
Example 2.12: Relations DEPTS and MANAGES from Figure 2.5 each have
DNAME as a candidate key; in one case it is the primary key and in the other
not. We may thus replace DEPTS and MANAGES by one relation
DEPTS(DNAME, DEPT#, MGR)
Notice that we have decided to call the new relation DEPTS. The attributes
DNAME and DEPT# are intended to be the same as the attributes of the same
name in the old DEPTS relation, while MGR is intended to be the attribute
ENAME from MANAGES. There is nothing wrong with changing the names
of attributes, as long as we carry along their intuitive meaning.
In Figure 2.6(a) we see two possible instances for the old relation DEPTS
and MANAGES. Figure 2.6(b) shows them combined into one relation, the new
DEPTS. Notice that the twelve entries in the two relations have been combined
into nine in the single relation, saving a small amount of space. Also, a query
like "what is the number of the department that Harry Hamhock manages?"
can be answered by consulting the one relation in Figure 2.6(b), while in the
database of Figure 2.6(a) we would have to combine the two relations by a
possibly expensive operation called the join, discussed in the next section. D
DEPTS                               MANAGES

DNAME      DEPT#                ENAME              DNAME
Produce      12                 Esther Eggplant    Produce
Cheese       31                 Larry Limburger    Cheese
Meat          5                 Harry Hamhock      Meat

(a) Old relations.

DEPTS

DNAME      DEPT#    MGR
Produce      12     Esther Eggplant
Cheese       31     Larry Limburger
Meat          5     Harry Hamhock

(b) The combined relation.

Figure 2.6 Combining the DEPTS and MANAGES relations.
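The difference in query cost can be made concrete with an SQL-like sketch of the Harry Hamhock query in both designs. The sketch is illustrative only; DEPT_NO stands in for DEPT#, since "#" is awkward in many systems' identifiers.

-- Against the old relations of Figure 2.6(a): a join is required.
SELECT DEPTS.DEPT_NO
FROM DEPTS, MANAGES
WHERE MANAGES.ENAME = 'Harry Hamhock'
  AND MANAGES.DNAME = DEPTS.DNAME;

-- Against the combined relation of Figure 2.6(b): a single selection.
SELECT DEPT_NO
FROM DEPTS
WHERE MGR = 'Harry Hamhock';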
Dangling Tuples
When we combine two or more relations like those in Example 2.12, there is a
problem that must be overcome, a problem that, if not solved or defined away,
prevents us from combining the relations despite the advantages to doing so. In
Example 2.12 we made the hidden assumption that the set of departments was
the same in both relations DEPTS and MANAGES. In practice that might not
be the case. For example, suppose the YVCB has a Wine department, whose
number is 16, but that temporarily has no manager. Then we could add the
tuple (Wine, 16) to the old DEPTS relation, in Figure 2.6(a), but there seems
to be no way to add a tuple to the new DEPTS in Figure 2.6(b), because such
tuples need some value for the MGR attribute. Similarly, if Tanya Truffle were
appointed to head the new Gourmet department, but we had not yet assigned
that department a number, we could insert our new fact into MANAGES, but
not into the DEPTS relation of Figure 2.6(b).
Tuples that need to share a value with a tuple in another relation, but find
no such value, are called dangling tuples. One possible way to avoid the problem
of dangling tuples is to add to the database scheme information about existence
constraints, that is, conditions of the form "if a value appears in attribute A of
some tuple in relation R, then v must also appear in attribute B of some tuple
in relation 5." For example, if we guaranteed that every department appearing
in the DNAME attribute of the old DEPTS appeared in the DNAME field
of MANAGES, and vice-versa, then this problem of dangling tuples would be
defined away. We would thus be free to combine the two relations, knowing
that no information could be lost thereby.
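Many relational systems can state one direction of such an existence constraint as a referential-integrity (foreign-key) declaration; the sketch below is illustrative, the column types are assumptions, and the "vice-versa" direction would need a separate, symmetric mechanism.

CREATE TABLE MANAGES (
    ENAME  CHAR(30),
    DNAME  CHAR(30) REFERENCES DEPTS(DNAME)
);
-- Every DNAME value inserted into MANAGES must already appear as the
-- DNAME of some tuple in DEPTS.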
Of course these existence constraints put some severe limitations on the
way we insert new tuples into the two relations of Figure 2.6(a) or the one
relation of Figure 2.6(b). In either case, we cannot create a new department
name, number, or manager without having all three. If that is not satisfactory,
we also have the option of storing null values in certain fields. We shall represent
a null value by ⊥. This symbol may appear as the value of any attribute that
is not in the primary key,5 and we generally take its meaning to be "missing
value." When looking for common values between two or more tuples, we do
not consider two occurrences of ⊥ to be the same value; i.e., each occurrence
of ⊥ is treated as a symbol distinct from any other symbol, including other
occurrences of ⊥.
Example 2.13: If we added the Wine department and added manager Truffle
of the Gourmet department, we could represent this data with null values in
the relation DEPTS of Figure 2.6(b) by the relation of Figure 2.7. D
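In SQL-like syntax, the two additions of Example 2.13 might be made as follows; NULL plays the role of ⊥ here, although SQL's rules for comparing nulls differ in detail from the convention just described.

INSERT INTO DEPTS
VALUES('Wine', 16, NULL);

INSERT INTO DEPTS(DNAME, MGR)
VALUES('Gourmet', 'Tanya Truffle');
-- The department number of Gourmet, being omitted, is left null.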
If we assume that problems of dangling tuples are defined away by existence
constraints or handled by allowing nulls in nonkey attributes, then we can
combine relations whenever two or more share a common candidate key.
5 In Chapter 6 we discuss storage structures for relations, and we shall then see why null
values in the primary key often cause significant trouble.
DNAME      DEPT#    MGR
Produce      12     Esther Eggplant
Cheese       31     Larry Limburger
Meat          5     Harry Hamhock
Wine         16     ⊥
Gourmet       ⊥     Tanya Truffle

Figure 2.7 The combined DEPTS relation with null values.
with EMPS and WORKS-IN. However, what is the justification for combining
MANAGERS with DEPTS, with which it does not even share an attribute, let
alone a key? In explanation, recall that MANAGES is a one-to-one relationship
between ENAME (representing managers) and DNAME. Hence, these two
attributes are in a sense equivalent, and we may regard MANAGERS as if its
attribute were DNAME rather than ENAME.
There is, however, one special problem with our choice to combine relations
in this way. Even with nulls, we cannot handle all situations with dangling
tuples. For example, if there were a manager m mentioned in MANAGERS,
but not in MANAGES, we would want to insert into the new DEPTS relation a
tuple (⊥, ⊥, m). But this tuple would have a null value in the key, DNAME, and
as we mentioned, there are reasons concerning the physical storage of relations
why this arrangement is frequently not acceptable.
Incidentally, one might wonder why one cannot further combine relations
like SUPPLIES and SUPPLIERS, since the key of the latter is a subset of the
key of the former. The reason is that in a combined relation with attributes
SNAME, SADDR, INAME, and PRICE, we would find that each supplier's ad
dress was repeated once for each item that the supplier sold. That is not a fatal
problem, but it does lead to wasted space and possible inconsistencies (Acme's
address might be different according to the tuples for two different items Acme
sells). The matter of relational database scheme design, called "normalization,"
provides considerable intellectual machinery that can be brought to bear on
issues like whether SUPPLIES and SUPPLIERS should be combined; we develop
this theory in Chapter 7. □
2.4 OPERATIONS IN THE RELATIONAL DATA MODEL
In the previous section we introduced the mathematical notion of a relation,
which is the formalism underlying the relational data model. This section
introduces the family of operations usually associated with that model. There
are two rather different kinds of notations used for expressing operations on
relations:
1. Algebraic notation, called relational algebra, where queries are expressed
by applying specialized operators to relations, and
2. Logical notation, called relational calculus, where queries are expressed by
writing logical formulas that the tuples in the answer must satisfy.
In this section, we shall consider relational algebra only. This algebra
includes some familiar operations, like union and set difference of relations, but
it also includes some that probably are not familiar. Logical notations will
be introduced in Chapter 3, after we discuss logical languages for knowledge-base
systems. One of the interesting facts about these notations for relational
databases is that they are equivalent in expressive power; that is, each can
express any query that the other can express, but no more.
Limitations of Relational Algebra
The first thought we might have on the subject of operations for the relational
model is that perhaps we should simply allow any program to be an operation
on relations. There might be some situations where that makes sense, but there
are many more where the advantages of using a well-chosen family of operations
outweigh the restrictions on expressive power that result. Recall from Section
1.2 that one important purpose of a conceptual scheme, and hence of its data
model, is to provide physical data independence, the ability to write programs
that work independently of the physical scheme used. If we used arbitrary
programs as queries, the programmer would
a) Have to know everything about the physical data structures used, and
b) Have to write code that depended on the particular structure selected,
   leaving no opportunity for the physical structure to be tuned as we
   learned more about the usage of the database.
2. Provides a rich enough language that we can express enough things to make
   database systems useful.
We mentioned, in Section 1.4, the approximate point at which the power of
relational languages fails; they cannot, in general, express the operation that
takes a binary relation and produces the transitive closure of that relation.
Another limitation we face in relational languages, less cosmic perhaps,
but of practical importance, is the finiteness of relations. Recall that we have
assumed relations are finite unless we explicitly state otherwise; this convention
is sound because infinite relations cannot be stored explicitly. The constraint
of finiteness introduces some difficulties into the definition of relational algebra
and other relational languages. For example, we cannot allow the algebraic
operation of complementation, since the complement of R is an infinite relation,
the set of all tuples not in R.
The basic operations of relational algebra are the following:
1. Union. The union of R and S, written R ∪ S, is the set of tuples that are
   in R or in S or in both. We may only take the union of relations with the
   same arity.
2. Set difference. The difference of R and S, written R − S, is the set of
   tuples in R but not in S. Again, R and S must have the
   same arity.
3. Cartesian product. Let R and S be relations of arity k_1 and k_2, respectively.
   Then R × S, the Cartesian product of R and S, is the set of all possible
   (k_1 + k_2)-tuples whose first k_1 components form a tuple in R and whose
   last k_2 components form a tuple in S.
4. Projection. The idea behind this operation is that we take a relation R,
   remove some of the components (attributes) and/or rearrange some of the
   remaining components. If R is a relation of arity k, we let π_{i_1,i_2,...,i_m}(R),
   where the i_j's are distinct integers in the range 1 to k, denote the projection
   of R onto components i_1, i_2, ..., i_m, that is, the set of m-tuples a_1 a_2 ··· a_m
   such that there is some k-tuple b_1 b_2 ··· b_k in R for which a_j = b_{i_j} for
   j = 1, 2, ..., m. For example, π_{3,1}(R) is computed by taking each tuple μ
   in R and forming a 2-tuple from the third and first components of μ, in that
   order. If R has attributes labeling its columns, then we may substitute
   attribute names for component numbers, and we may use the same attribute
   names in the projected relation. Thus, if relation R is R(A, B, C, D), then
   π_{C,A}(R) is the same as π_{3,1}(R), and the resulting relation has attribute C
   naming its first column and attribute A naming its second column.
5. Selection. Let F be a formula involving
   i)   Operands that are constants or component numbers; component i is
        represented by $i,
   ii)  The arithmetic comparison operators <, =, >, ≤, ≠, and ≥, and
   iii) The logical operators ∧ (and), ∨ (or), and ¬ (not).
   Then σ_F(R) is the set of tuples μ in R such that when, for all i, we
   substitute the ith component of μ for any occurrences of $i in formula
   F, the formula F becomes true. For example, σ_{$2>$3}(R) denotes the set
   of tuples μ in R such that the second component of μ exceeds its third
   component, while σ_{$1='Smith' ∨ $1='Jones'}(R) is the set of tuples in R whose
   first components have the value 'Smith' or 'Jones'. As with projection, if
   a relation has named columns, then the formula in a selection can refer to
   columns by name instead of by number.
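These five operators are easy to prototype if a relation is represented as a set of tuples. The sketch below is only our illustration of the definitions just given, not anything from the text; the function names, and the use of an arbitrary Python predicate in place of the formula F, are our own choices, and the sample data follows Example 2.15.

    # Minimal sketch of the five basic operators; relations are sets of tuples.
    def union(r, s):
        return r | s                       # R and S must have the same arity

    def difference(r, s):
        return r - s                       # tuples in R but not in S

    def product(r, s):
        return {t + u for t in r for u in s}   # concatenate each pair of tuples

    def project(r, components):
        # components are 1-based, as in pi_{3,1}(R)
        return {tuple(t[i - 1] for i in components) for t in r}

    def select(r, predicate):
        # predicate plays the role of the formula F; $i corresponds to t[i-1]
        return {t for t in r if predicate(t)}

    # The relations R and S of Example 2.15
    R = {('a', 'b', 'c'), ('d', 'a', 'f'), ('c', 'b', 'd')}
    S = {('b', 'g', 'a'), ('d', 'a', 'f')}

    print(union(R, S))                         # R union S
    print(project(R, [1, 3]))                  # pi_{A,C}(R)
    print(select(R, lambda t: t[1] == 'b'))    # sigma_{B=b}(R)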
A  B  C               D  E  F
a  b  c               b  g  a
d  a  f               d  a  f
c  b  d

(a) Relation R        (b) Relation S

Figure 2.9 Two relations.
Example 2.15: Let R and S be the two relations of Figure 2.9. In Figure
2.10(a) and (b), respectively, we see the relations R ∪ S and R − S. Note
that we can take unions and differences even though the columns of the two
relations have different names, as long as the relations have the same number
of components. However, the resulting relation has no obvious names for its
columns. Figure 2.10(c) shows R × S. Since R and S have disjoint sets of
attributes, we can carry the column names over to R × S. If R and S had
a column name in common, say G, we could distinguish the two columns by
calling them R.G and S.G. Figure 2.10(d) shows π_{A,C}(R), and Figure 2.10(e)
shows σ_{B=b}(R). □
a  b  c               a  b  c
d  a  f               c  b  d
c  b  d
b  g  a

(a) R ∪ S             (b) R − S

A  B  C  D  E  F
a  b  c  b  g  a
a  b  c  d  a  f
d  a  f  b  g  a
d  a  f  d  a  f
c  b  d  b  g  a
c  b  d  d  a  f

(c) R × S

A  C                  A  B  C
a  c                  a  b  c
d  f                  c  b  d
c  d

(d) π_{A,C}(R)        (e) σ_{B=b}(R)

Figure 2.10 Results of some relational algebra operations.
    V = π_{1,2,...,r−s}((T × S) − R)

V is the set of (r−s)-tuples a_1, ..., a_{r−s} that are the first r − s components
of some tuple in R, and for some s-tuple a_{r−s+1}, ..., a_r in S, a_1, ..., a_r is not
in R. Hence, T − V is R ÷ S, where T denotes π_{1,2,...,r−s}(R). We can write
R ÷ S as a single expression in relational algebra by replacing T and V by the
expressions they stand for. That is,

    R ÷ S = π_{1,2,...,r−s}(R) − π_{1,2,...,r−s}((π_{1,2,...,r−s}(R) × S) − R)
Example 2.16: Let R and S be the relations shown in Figure 2.11(a) and (b).
Then R ÷ S is the relation shown in Figure 2.11(c). Tuple ab is in R ÷ S because
abcd and abef are in R, and tuple ed is in R ÷ S for a similar reason. Tuple bc,
which is the only other pair appearing in the first two columns of R, is not in
R ÷ S because bccd is not in R. □
a  b  c  d
a  b  e  f
b  c  e  f
e  d  c  d
e  d  e  f

(a) Relation R

c  d
e  f

(b) Relation S

a  b
e  d

(c) R ÷ S

Figure 2.11 An example of division.
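As a check on the identity above, the following sketch (ours, not from the text) computes R ÷ S both directly from its definition and from the projection-and-difference expression; the helper names are our own, and the data follows Example 2.16.

    from itertools import product as cross

    def divide_directly(r, s, r_arity, s_arity):
        # a prefix is in R / S iff it appears with *every* tuple of S in R
        prefix_len = r_arity - s_arity
        prefixes = {t[:prefix_len] for t in r}
        return {p for p in prefixes if all(p + u in r for u in s)}

    def divide_by_formula(r, s, r_arity, s_arity):
        # R / S = T - pi((T x S) - R), where T projects R onto components 1..r-s
        prefix_len = r_arity - s_arity
        t_rel = {t[:prefix_len] for t in r}                          # T
        v_rel = {p for p, u in cross(t_rel, s) if p + u not in r}    # V
        return t_rel - v_rel

    # The relations of Example 2.16; R / S should be {ab, ed}
    R = {('a','b','c','d'), ('a','b','e','f'), ('b','c','e','f'),
         ('e','d','c','d'), ('e','d','e','f')}
    S = {('c','d'), ('e','f')}
    assert divide_directly(R, S, 4, 2) == divide_by_formula(R, S, 4, 2) \
           == {('a','b'), ('e','d')}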
Join
The θ-join of R and S on columns i and j, written R ⋈_{iθj} S, where θ is an
arithmetic comparison operator (=, <, and so on), is shorthand for σ_{$i θ $(r+j)}(R × S),
if R is of arity r. That is, the θ-join of R and S is those tuples in the Cartesian
product of R and S such that the ith component of R stands in relation θ to
the jth component of S. If θ is =, the operation is often called an equijoin.
Example 2.17: Let R and S be the relations given in Figure 2.12(a) and (b).
Then R ⋈_{B<D} S is given in Figure 2.12(c). As with all algebraic operations,
when columns have names we are free to use them. Thus ⋈_{B<D} is the same as
⋈_{2<1} in this case.
Incidentally, note that tuple (7,8,9) of R does not join with any tuple of
S, and thus no trace of that tuple appears in the join. Tuples that in this
way fail to participate in a join are called dangling tuples. Recall that we first
met the notion of "dangling tuples" in the previous section, when we discussed
combining relations with a common key. What we tacitly assumed there was
that the correct way to form the combined relation was to take the equijoin
in which the key attributes were equated. There is a good reason for this
assumption, which we shall cover in Chapter 7. D
A  B  C               D  E
1  2  3               3  1
4  5  6               6  2
7  8  9

(a) Relation R        (b) Relation S

A  B  C  D  E
1  2  3  3  1
1  2  3  6  2
4  5  6  6  2

(c) R ⋈_{B<D} S

Figure 2.12 An example of a θ-join.
Natural Join
The natural join, written R ⋈ S, is applicable only when both R and S have
columns that are named by attributes. To compute R ⋈ S we
1. Compute R × S.
2. For each attribute A that names both a column in R and a column in S,
   select from R × S those tuples whose values agree in the columns for R.A
   and S.A. Recall that R.A is the name of the column of R × S corresponding
   to the column A of R, and S.A is defined analogously.
3. For each attribute A above, project out the column S.A, and call the
   remaining column, R.A, simply A.

Formally then, if A_1, A_2, ..., A_k are all the attribute names used for both R
and S, we have

    R ⋈ S = π_{i_1,i_2,...,i_m}(σ_{R.A_1=S.A_1 ∧ ··· ∧ R.A_k=S.A_k}(R × S))

where i_1, i_2, ..., i_m is the list of all components of R × S, in order, except the
components S.A_1, ..., S.A_k.
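For named columns, a convenient prototype represents each tuple as a dictionary from attribute names to values. The sketch below is ours, not the book's; the θ-join works on positional columns and the natural join follows the definition just given, with sample data taken from Examples 2.17 and 2.18 as we read them.

    import operator

    def theta_join(r, s, i, j, theta):
        # the theta-join of R and S on 1-based positional columns i and j
        return {t + u for t in r for u in s if theta(t[i - 1], u[j - 1])}

    def natural_join(r, s):
        # r, s are lists of dicts (one dict per tuple, keyed by attribute name);
        # join on every attribute name the two schemes have in common
        common = set(r[0]) & set(s[0]) if r and s else set()
        joined = []
        for t in r:
            for u in s:
                if all(t[a] == u[a] for a in common):
                    row = dict(t)
                    row.update(u)     # append the columns of S not already present
                    joined.append(row)
        return joined

    # R(A,B,C) and S(B,C,D), as in Example 2.18
    R = [{'A': 'a', 'B': 'b', 'C': 'c'}, {'A': 'd', 'B': 'b', 'C': 'c'},
         {'A': 'b', 'B': 'c', 'C': 'f'}, {'A': 'c', 'B': 'a', 'C': 'd'}]
    S = [{'B': 'b', 'C': 'c', 'D': 'd'}, {'B': 'b', 'C': 'c', 'D': 'e'},
         {'B': 'a', 'C': 'd', 'D': 'b'}]
    print(natural_join(R, S))         # abcd, abce, dbcd, dbce, cadb

    # The theta-join of Figure 2.12 on positional columns, R |x|_{2<1} S
    print(theta_join({(1, 2, 3), (4, 5, 6), (7, 8, 9)},
                     {(3, 1), (6, 2)}, 2, 1, operator.lt))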
Example 2.18: Let R and S be the relations given in Figure 2.13(a) and (b).
Then
    R ⋈ S = π_{A,R.B,R.C,D}(σ_{R.B=S.B ∧ R.C=S.C}(R × S))
A  B  C               B  C  D
a  b  c               b  c  d
d  b  c               b  c  e
b  c  f               a  d  b
c  a  d

(a) Relation R        (b) Relation S

A  B  C  D
a  b  c  d
a  b  c  e
d  b  c  d
d  b  c  e
c  a  d  b

(c) R ⋈ S

Figure 2.13 An example of natural join.
Semijoin

The semijoin of R and S, written R ⋉ S, is the projection onto the attributes
of R of the natural join of R and S; that is, R ⋉ S = π_R(R ⋈ S).
Note the useful convention that a relation name, such as R, stands for the
set of attributes of that relation in appropriate contexts, such as in the list
of attributes of a projection. In other contexts, R stands for the value of the
relation R. An equivalent way of computing R ⋉ S is to project S onto the
set of attributes that are common to R and S, and then take the natural join
of R with the resulting relation. Thus, an equivalent formula for the semijoin
is R ⋉ S = R ⋈ π_{R∩S}(S), as the reader may prove. Note that the semijoin is
not symmetric; i.e., R ⋉ S ≠ S ⋉ R in general.
Example 2.19: Let R and S be the relations of Figure 2.13(a) and (b), respectively.
Then R ⋉ S is the projection of Figure 2.13(c) onto attributes A, B,
and C, that is, the relation of Figure 2.14(a). Another way to obtain the same
result is first to project S onto {B, C}, the attributes shared by the relation
schemes for R and S, yielding the relation of Figure 2.14(b), and then joining
this relation with R. □
A  B  C               B  C
a  b  c               b  c
d  b  c               a  d
c  a  d

(a) R ⋉ S             (b) π_{B,C}(S)

E  F  G  H  I
a  d  b  c  d
a  d  b  c  e

(c) S(E,F,G) ⋈ S(G,H,I)
Figure 2.14 Joins and semijoins.
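The semijoin can be prototyped in the same dictionary-based style. The sketch below is ours, not the book's: it keeps the tuples of R that agree with some tuple of π_{R∩S}(S), which is exactly the equivalent formula noted above; the data follows Example 2.19 as we read it.

    def semijoin(r, s):
        # R semijoin S: the tuples of R that join with at least one tuple of S
        # on the attributes the two schemes share (dict-based rows, as before)
        common = sorted(set(r[0]) & set(s[0])) if r and s else []
        s_proj = {tuple(u[a] for a in common) for u in s}      # pi_{R cap S}(S)
        return [t for t in r if tuple(t[a] for a in common) in s_proj]

    R = [{'A': 'a', 'B': 'b', 'C': 'c'}, {'A': 'd', 'B': 'b', 'C': 'c'},
         {'A': 'b', 'B': 'c', 'C': 'f'}, {'A': 'c', 'B': 'a', 'C': 'd'}]
    S = [{'B': 'b', 'C': 'c', 'D': 'd'}, {'B': 'b', 'C': 'c', 'D': 'e'},
         {'B': 'a', 'C': 'd', 'D': 'b'}]
    print(semijoin(R, S))     # drops only the tuple (b, c, f)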
When we use the natural join and semijoin operations, the attributes of
the relations become crucial; that is, we need to see relations from the
set-of-mappings viewpoint rather than the set-of-lists viewpoint. Thus, to make
explicit what attributes we are assuming for a relation R, we shall write
R(A_1, ..., A_n) explicitly. We can even use the same relation in the mathematical
sense as several arguments of a join with different attributes assigned
to the columns in different arguments.
For example, suppose we had relation S of Figure 2.13(b), but ignored
the attributes B, C, and D for the moment. That is, see S only as a ternary
relation with the three tuples bcd, bce, and adb. The natural join

    S(E,F,G) ⋈ S(G,H,I)

takes this relation and joins it with itself, as an equijoin between the third
column of the first copy and the first column of the second copy. The only
value these columns have in common is b, so the result would be the relation of
Figure 2.14(c).
    π_{SNAME}(σ_{INAME='Brie'}(SUPPLIES))                              (2.1)
That is, the selection focuses us on those tuples that talk about item "Brie,"
and the projection lets us see only the supplier name from those tuples. The
algebraic expression (2.1) thus evaluates to a relation of arity 1, and its value
will be a list of all the suppliers of Brie.
6 By Theorem 2.2, the order in which the R_i's are joined does not affect the result and
need not be specified.
If we wanted to know what items supplier "Acme" sells for less than $5,
and the prices of each, we could write:
    π_{INAME,PRICE}(σ_{SNAME='Acme' ∧ PRICE<5.00}(SUPPLIES))
D
Navigation Among Relations
Many more queries are expressed by navigating among relations, that is, by
expressing connections among two or more relations. It is fundamental to the
relational data model that these connections are expressed by equalities (or
sometimes inequalities) between the values in two attributes of different rela
tions. It is both the strength and weakness of this model that connections are
expressed that way. It allows more varied paths among relations to be followed
than in other data models, where, as we shall see in the next sections, particular
pathways are favored by being "built in" to the scheme design, but other paths
are hard or impossible to express in the languages of these models.
Example 2.21: We again refer to the relations of Figure 2.8. Suppose we wish
to determine which customers have ordered Brie. No one relation tells us, but
INCLUDES gives us the order numbers for those orders that include Brie, and
ORDERS tells us for each of those order numbers what customer placed that
order. If we take the natural join of INCLUDES and ORDERS, we shall have
a relation whose attributes include the order number, the item, and the customer.7
Selecting from that join the tuples for Brie and projecting onto the customer
then gives us the answer:

    π_{CUST}(σ_{INAME='Brie'}(INCLUDES ⋈ ORDERS))                      (2.2)

□
Efficiency of Joins
Join is generally the most expensive of the operations in relational algebra.
The "compare all pairs of tuples" method for computing the join, suggested
by Example 2.18, is not the way to compute the join in reality; it takes O(n²)
7 We would not generally wish to have a relation with this set of attributes in the database
scheme because of redundancy of the type discussed in Example 2.14. That is, we would
have to repeat the date and customer for every item on the given order.
time on relations of size (number of tuples) n.8 We shall discuss other methods
in Chapter 11, but for the moment, observe that one way to compute equijoins or
natural joins is to sort both relations on the attribute or attributes involved
in the equality, and then merge the lists, creating tuples of the join as we go.
This approach takes time O(m + n log n) on relations of size n, where m is
the number of tuples in the result of the join. Typically m is no more than
O(n log n), although it could be as high as n².
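The sort-and-merge idea can be sketched as follows. This is only our illustration of the approach for an equijoin on one column of each relation, with made-up data; the algorithms actually used are the subject of Chapter 11.

    def sort_merge_equijoin(r, s, i, j):
        # Equijoin of R and S on R's column i and S's column j (1-based).
        # Sort both on the join column, then scan blocks of equal values in step.
        r = sorted(r, key=lambda t: t[i - 1])
        s = sorted(s, key=lambda t: t[j - 1])
        out, a, b = [], 0, 0
        while a < len(r) and b < len(s):
            va, vb = r[a][i - 1], s[b][j - 1]
            if va < vb:
                a += 1
            elif va > vb:
                b += 1
            else:
                a2 = a
                while a2 < len(r) and r[a2][i - 1] == va:
                    a2 += 1
                b2 = b
                while b2 < len(s) and s[b2][j - 1] == va:
                    b2 += 1
                for t in r[a:a2]:            # pair up the two blocks of equal value
                    for u in s[b:b2]:
                        out.append(t + u)
                a, b = a2, b2
        return out

    # joining R(A,B,C) and S(C,D) on the shared C value (sample data is ours)
    R = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]
    S = [(3, 'x'), (6, 'y'), (6, 'z')]
    print(sort_merge_equijoin(R, S, 3, 1))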
However, a better idea in many cases is to avoid doing joins of large rela
tions at all. By transforming an algebraic expression into an equivalent one that
can be evaluated faster, we frequently save orders of magnitude of time when
answering queries. It is the development of such query optimization techniques
that made DBMS's using the relational model feasible.
Example 2.22: Consider expression (2.2) from Example 2.21. Rather than
compute the large relation INCLUDES ⋈ ORDERS, then reduce it in size by
selection and projection, we prefer to do the selection, and as much of the
projection as we can as soon as we can. Thus, before joining, we select on the
INCLUDES relation for those tuples with INAME = 'Brie' and then project
this relation onto O#, to get only the set of order numbers that include Brie.
Presumably, this will be a much smaller set than the entire INCLUDES relation.
Now we would like to select the tuples in ORDERS with this small set of
order numbers, to get the desired set of customers. The whole process can be
expressed by a semijoin:
    π_{CUST}(ORDERS ⋉ σ_{INAME='Brie'}(INCLUDES))                      (2.3)
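The gain from pushing the selection can be seen even in a toy setting. The sketch below is ours, with made-up relation contents; both plans produce the same set of customers, but the optimized plan feeds only a handful of order numbers into the join-like step instead of the whole INCLUDES relation.

    from itertools import product

    # hypothetical toy instances: ORDERS(o#, cust) and INCLUDES(o#, item, qty)
    ORDERS = [(n, 'cust%d' % (n % 50)) for n in range(300)]
    INCLUDES = [(n, 'Brie', 1) for n in range(0, 300, 10)] + \
               [(n, 'Gouda', 2) for n in range(300)]

    # the naive plan: join everything first, then select and project
    naive = {o[1] for o, i in product(ORDERS, INCLUDES)
             if o[0] == i[0] and i[1] == 'Brie'}

    # the optimized plan: select and project on INCLUDES first,
    # then restrict ORDERS to those order numbers (a semijoin, in effect)
    brie_orders = {i[0] for i in INCLUDES if i[1] == 'Brie'}
    optimized = {o[1] for o in ORDERS if o[0] in brie_orders}

    assert naive == optimized
    print(len(INCLUDES), 'INCLUDES tuples versus',
          len(brie_orders), 'order numbers fed to the join step')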
2.5 THE NETWORK DATA MODEL

allows us to use a simple directed graph model for data. In place of entity sets,
the network model talks of logical record types.10 A logical record type is a
name for a set of records, which are called logical records. Logical records are
composed of fields, which are places in which elementary values such as integers
and character strings can be placed. The set of names for the fields and their
types constitute the logical record format.
Record Identity
One might suppose there is a close analogy between these terms for networks
and the terms we used for relations, under the correspondence
    Logical record format  :  Relation scheme
    Logical record         :  Tuple
    Logical record type    :  Relation name
However, there is an important distinction between tuples of relations and
records of a record type. In the value-oriented relational model, tuples are
nothing more than the values of their components. Two tuples with the same
values for the same attributes are the same tuple. On the other hand, the
network model is object-oriented, at least to the extent that it supports object
identity. Records of the network model may be viewed as having an invisible
key, which is in essence the address of the record, i.e., its "object identity."
This unique identifier serves to make records distinct, even if they have the
same values in their corresponding fields. In fact, it is feasible to have record
types with no fields at all.
The reason it makes sense to treat records as having unique identifiers,
independent of their field values, is that physically, records contain more data
than just the values in their fields. In a database built on the network model
they are given physical pointers to other records that represent the relationships
in which their record type is involved. These pointers can make two records
with the same field values different, and we could not make this distinction if
we thought only of the values in their fields.
Links
Instead of "binary many-one relationships" we talk about links in the network
model. We draw a directed graph, called a network, which is really a simplified
entity-relationship diagram, to represent record types and their links. Nodes
correspond to record types. If there is a link between two record types T_1 and
T_2, and the link is many-one from T_1 to T_2, then we draw an arc from the node
for T_1 to that for T_2,11 and we say the link is from T_1 to T_2. Nodes and arcs
are labeled by the names of their record types and links.

10 We drop the word "logical" from "logical record," or "logical record type/format" whenever
no confusion results.
Representing Entity Sets in the Network Model
Entity sets are represented directly by logical record types; the attributes of an
entity set become fields of the logical record format. The only special case is
when an entity set E forms its key with fields of some entity set F, to which E is
related through relationship R. We do not need to place those fields of F in the
record format for E, because the records of E do not need to be distinguished
by their field values. Rather, they will be distinguished by the physical pointers
placed in the records of E to represent the relationship R, and these pointers
will lead from a record e of type E to the corresponding record of type F that
holds the key value for e.
Alternatively, when the relationship concerned is isa, and the subset has
no field that the superset does not have (as between MANAGERS and EMPS
in Figure 2.2), we could eliminate the record type for the subset, e.g.,
MANAGERS, altogether, and let the relationships between MANAGERS and other
entity sets (besides EMPS) be represented in the network model by links in
volving EMPS. The isa relationship itself could be represented by a one-bit
field telling whether an employee is a manager. Another choice is to represent
the isa implicitly; only EMPS records that represent managers will participate
in relationships, such as MANAGES, that involve the set of managers.
Representing Relationships
Among relationships, only those that are binary and many-one (or one-one as
a special case) are representable directly by links. However, we can use the
following trick to represent arbitrary relationships. Say we have a relationship
R among entity sets E_1, E_2, ..., E_k. We create a new logical record type T
representing k-tuples (e_1, e_2, ..., e_k) of entities that stand in the relationship
R. The format for this record type might be empty. However, there are many
times when it is convenient to add information-carrying fields in the format for
the new record type T. In any event, we create links L_1, L_2, ..., L_k. Link L_i
is from record type T to the record type for entity set E_i, which we shall also
call E_i. The intention is that the record of type T for (e_1, e_2, ..., e_k) is linked
to the record of type E_i for e_i, so each link is many-one.
As a special case, if the relationship is many-one from E_1, ..., E_{k−1} to E_k,
and furthermore, the entity set E_k does not appear in any other relationships,
11 Some works on the subject draw the arc in the opposite direction. However, we chose
this direction to be consistent with the notion of functional dependency discussed in
Chapter 7. Our point of view is that arrows mean "determines uniquely." Thus, as each
record of type T_1 is linked to at most one record of type T_2, we draw the arrow into T_2.
then we can identify the record type T with E_k, storing the attributes of E_k
in T. For example, the relationship SUPPLIES of Figure 2.2 is many-one from
SUPPLIERS and ITEMS to PRICE, and PRICE participates in no relationship
but this one. We may therefore create a type T with links to ITEMS and
SUPPLIERS, and containing PRICE as a field. We shall discuss this matter
further when we convert the full entity-relationship diagram of Figure 2.2 to a
network, in Example 2.24. For the moment, we consider a simpler example.
Example 2.23: We mentioned in Section 2.1 a common example of a purely
many-many relationship, that between courses and students with the intended
meaning that the student is taking the course. To represent this relationship in
the network model, we would use two entity sets, COURSES and STUDENTS,
each with appropriate fields, such as
COURSES(DEPT, NUMBER, INSTRUCTOR)
STUDENTS(ID#, NAME, ADDRESS, STATUS)
To represent the relationship between these entity sets, we need to in
troduce a new record type, say ENROLL, that represents single pairs in the
relationship set, i.e., one course and one student enrolled in that course. There
might not be any fields in ENROLL, or we might decide to use ENROLL records
to store information that really does refer to the pair consisting of a course and
a student, e.g., the grade the student receives in the course, or the section in
which the student is enrolled. Thus, we might use record format
ENROLL(SECTION, GRADE)
Notice that two or more enrollment records may look the same, in the sense
that they have the same values in their SECTION and GRADE fields. They
are distinguished by their addresses, i.e., by their "object identity."
We also need two links, one from ENROLL to COURSES, which we shall
call E-COURSE, and one from ENROLL to STUDENTS, which we shall call
E-STUDENT. The network for these record types and links is shown in Figure
2.15(a).
The link E-COURSE associates with each ENROLL record a unique
COURSES record, which we take to be the course in which the enrollment is
made. Likewise, E-STUDENT associates with each ENROLL record a unique
STUDENTS record, that of the student who is thereby enrolled. As we shall
discuss in Chapter 5, when we consider the DBTG network language in detail,
the notion of ownership is used to help describe the relationship enforced by
the links. If a link, such as E-STUDENT is from ENROLL to STUDENTS,
then each student record is said to own the enrollment records which the link
associates to that student.
In Figure 2.15(b) we see a simple example of three COURSES records, five
ENROLL records, and four STUDENT records. The ENROLL records each
COURSES  <--E-COURSE--  ENROLL  --E-STUDENT-->  STUDENTS

(a) The network.

(b) Physical connections representing links.

Figure 2.15 A network for courses, students, and enrollments, and an instance of it.
show fields for the section and grade; the fields of STUDENTS and COURSES
are not shown. The unique identifiers for ENROLL records, which are in essence
addresses, are shown as integers outside the records. The fact that records 1 and
4 have identical field values is of no concern. Evidently, they are distinguished
by the differences in their links. For example, ENROLL record 1 represents
only the fact that student Grind is enrolled in CS101.
We can say that the record for Grind owns ENROLL records 1 and 2.
Weenie owns 4 and 5, while Jock owns no enrollment records. It is also true
that CS101 owns ENROLL records 1 and 3. There is no conflict with the fact
that Grind also owns record 1, because their ownership is through different
links. That is, Grind is the owner of 1 according to the E-STUDENT link, and
CS101 the owner of that record according to the E-COURSE link. D
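Record identity and ownership are easy to mimic with object references. The sketch below is ours; apart from Grind and Nerd taking CS101, the particular enrollments are made up. Each ENROLL record carries two link fields pointing at the owning COURSES and STUDENTS records, so ownership is just the inverse of those pointers, and two records with equal field values remain distinct objects.

    class Record:
        # a network-model record: identity comes from the object itself,
        # not from the values of its fields
        def __init__(self, **fields):
            self.__dict__.update(fields)

    cs101, ee200 = Record(name='CS101'), Record(name='EE200')
    grind, nerd, jock = Record(name='Grind'), Record(name='Nerd'), Record(name='Jock')

    # ENROLL records; e_course and e_student are the two links of Figure 2.15(a).
    # The first two records have identical field values but different links.
    enrolls = [
        Record(section=5, grade='A', e_course=cs101, e_student=grind),
        Record(section=5, grade='A', e_course=ee200, e_student=grind),
        Record(section=1, grade='B', e_course=cs101, e_student=nerd),
    ]

    def owned_enrollments(owner, link):
        # the ENROLL records a COURSES or STUDENTS record owns through a link
        return [e for e in enrolls if getattr(e, link) is owner]

    print(len(owned_enrollments(grind, 'e_student')))   # Grind owns two
    print(len(owned_enrollments(jock, 'e_student')))    # Jock owns none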
Example 2.24: Let us design a network for the YVCB database scheme whose
entity-relationship diagram was given in Figure 2.2. We start with logical record
types for the six entity sets that remain after excluding MANAGERS, which
as we mentioned above, can be represented by the logical record type for its
superset, EMPS. Thus, we have logical record formats:
EMPS(ENAME, SALARY)
DEPTS(DNAME, DEPT#)
SUPPLIERS(SNAME, SADDR)
ITEMS(INAME, ITEM#)
ORDERS(O#, DATE)
CUSTOMERS(CNAME, CADDR, BALANCE)
These are, except for MANAGERS, the same as the relations we started with
initially in Example 2.9.
We need two more record types, because two of the relationships, SUPPLIES
and INCLUDES, are not binary, many-one relationships. Let us use
record type ENTRIES to represent order-item-quantity facts. It makes sense
to store the quantity in the entry record itself, because the relationship
INCLUDES is many-one from ORDERS and ITEMS to QUANTITY. Thus, we
need only links from ENTRIES to ITEMS and ORDERS, which we call E-ITEM
and E-ORDER, respectively.
Figure 2.16 The network for the YVCB database scheme. (Among its links are
WORKS-IN and MANAGES from EMPS to DEPTS, CARRIES from ITEMS to
DEPTS, E-ITEM and E-ORDER from ENTRIES, and O-ITEM and O-SUPPLIER
from OFFERS.)
for the same reason as was discussed above concerning QUANTITY. We shall
use O-ITEM and O-SUPPLIER as the links from OFFERS to ITEMS and
SUPPLIERS, respectively. The last two record types for our network are thus:
ENTRIES(QUANTITY)
OFFERS(PRICE)
The relationships in Figure 2.2, other than SUPPLIES and INCLUDES, are
many-one and binary. Thus, they are directly representable by links. The only
special remark needed is that the relationship MANAGES, originally between
DEPTS and MANAGERS, will now be between DEPTS and EMPS, since we
agreed to use EMPS to represent managers. Since this relationship is one-one,
we could have it run in either direction, and we have chosen to have it run from
EMPS to DEPTS. The complete network is shown in Figure 2.16. D
only be used where, fortuitously, the attributes have the same name in the rela
tion schemes; real relational DBMS's do not generally support the natural join
directly, requiring it to be expressed as an equijoin, with the explicit equality
of values spelled out.
It is probably a matter of taste which style one prefers: cascade of functions
or equalities among values. However, there is one important advantage to the
relational model that doesn't depend upon such matters of diction. The result
of an operation on relations is a relation, so we can build complex expressions
of relational algebra easily. However, the result of operations on networks is
not generally a network, or even a part of one. It has to be that way, because
the invisible pointers and unique identifiers for records cannot be referred to
in network query languages. Thus, new networks cannot be constructed by
queries; they must be constructed by the data definition language. While we
can obtain some compounding of operations by following sequences of many
links in one query, we are limited, in the network model, to following those
links. Again, it is a matter of judgment whether the links we select for a
database scheme design are adequate to the task of supporting all the queries
we could reasonably wish to ask.
There is an additional distinction between the network and relational mod
els in the way they treat many-many relationships. In the network model, these
are forbidden, and we learned in Examples 2.23 and 2.24 how to replace many-many
relationships by several many-one relationships. The reason framers of
the network model forbade many-many relationships is that there is really no
good way to store directly a many-many relationship between entity sets E and
F so that given an entity of E we can find the associated F's efficiently and
vice-versa. On the physical level, we are forced to build a structure similar to
that implied by the breakup of a many-many relationship into some many-one
relationships, although there are a substantial number of choices of structure
available.
Presumably, the authors of the network model took the position that
databases were so large that direct implementation of many-many relationships
would always lead to unacceptable performance. In relational systems, the
philosophy is to provide the database designer with a DDL in which he can create
the index structures needed to use a relation that is a many-many relationship
with adequate efficiency. However, it is also permissible, and indeed may be
preferable in small databases, to use a relation that is a many-many relationship
(i.e., it has no key but its full set of attributes) without the data structure that
supports efficient access.
2.6 THE HIERARCHICAL DATA MODEL
A hierarchy is simply a network that is a forest (collection of trees) in which
all links point in the direction from child to parent. We shall continue to use
the network terminology "logical record type," and so on, when we speak of
hierarchies.
Just as any entity-relationship diagram can be represented in the relational
and network models, such a diagram can always be represented in the hierar
chical model. However, there is a subtlety embodied in our use of the vague
term "represented." In the previous two models, the constructions used to con
vert entity-relationship diagrams had the property that relationships could be
followed easily by operations of the model, the join in the relational case and
link-following in the network case. The same is true in the hierarchical model
only if we introduce "virtual record types."
A Simple Network Conversion Algorithm
Let us first see what happens if we attempt to design hierarchies by simply
splitting networks apart into one or more trees. Recall that in a hierarchy,
all links run from child to parent, so we must start at a node with as many
incoming links as possible and make it the root of a tree. We attach to that
tree all the nodes that can be attached, remembering that links must point
to the parent. When we can pick up no more nodes this way, we start with
another, unattached node as root, and attach as many nodes to that as we can.
Eventually, each node will appear in the forest one or more times, and at this
point we have a hierarchy. The formal construction is shown in Figure 2.17.
procedure BUILD(n);
    begin
        make n selected;
        for each link from some node m to n do begin
            make m a child of n;
            if m is not selected then BUILD(m)
        end
    end;

/* main program */
make all nodes unselected;
while not all nodes are selected do begin
    pick an unselected node n;
        /* prefer a node n with no links to unselected nodes,
           and prefer a node with many incoming links */
    BUILD(n)
end
Figure 2.17 Simple hierarchy-building procedure.
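A direct transcription of Figure 2.17 into Python might look as follows. The representation of the network as (link, from, to) triples, the scoring rule used to pick roots, and the name chosen for the ORDERS-to-CUSTOMERS link are our own assumptions, not the book's.

    def build_forest(links):
        # links: list of (link_name, from_node, to_node) triples, each a
        # many-one link; from_node becomes a child of to_node in the hierarchy
        nodes = {m for _, m, _ in links} | {n for _, _, n in links}
        selected, forest = set(), {}

        def build(n):
            selected.add(n)
            kids = []
            for name, m, n2 in links:
                if n2 == n:                    # a link into n: m becomes a child
                    kids.append((name, m, build(m) if m not in selected else []))
            return kids

        while nodes - selected:
            def score(n):
                # prefer nodes whose outgoing links all lead to selected nodes,
                # then nodes with many incoming links
                ok = all(t in selected for _, m, t in links if m == n)
                return (ok, sum(1 for _, _, t in links if t == n))
            root = max(nodes - selected, key=score)
            forest[root] = build(root)
        return forest

    # The network of Figure 2.16, per Examples 2.24 and 2.25 ('PLACED-BY' is a
    # made-up name for the ORDERS-to-CUSTOMERS link)
    yvcb = [('WORKS-IN', 'EMPS', 'DEPTS'), ('MANAGES', 'EMPS', 'DEPTS'),
            ('CARRIES', 'ITEMS', 'DEPTS'), ('E-ITEM', 'ENTRIES', 'ITEMS'),
            ('E-ORDER', 'ENTRIES', 'ORDERS'), ('O-ITEM', 'OFFERS', 'ITEMS'),
            ('O-SUPPLIER', 'OFFERS', 'SUPPLIERS'),
            ('PLACED-BY', 'ORDERS', 'CUSTOMERS')]
    print(build_forest(yvcb))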
Example 2.25: Consider the network of Figure 2.16. DEPTS is a good can
didate to pick as the first root, because it has three incoming links, two from
EMPS and one from ITEMS. We then consider EMPS, but find it has no
incoming links. However, ITEMS has incoming links from ENTRIES and OF
FERS. These have no incoming links, so we are done building the tree with root
DEPTS. All the above mentioned nodes are now selected.
The remaining nodes with no outgoing links are CUSTOMERS and SUP
PLIERS. If we start with CUSTOMERS, we add ORDERS as a child and
ENTRIES as a child of ORDERS, but can go no further. From SUPPLIERS
we add OFFERS as a child and are done. Now, all nodes are selected, and we
are finished building the forest. The resulting forest is shown in Figure 2.18.
The only additional point is that of the two children of DEPTS that come
from node EMPS, we have changed one, that representing the manager of the
department, to MGR. D
DEPTS
    EMPS
    MGR
    ITEMS
        ENTRIES
        OFFERS

CUSTOMERS
    ORDERS
        ENTRIES

SUPPLIERS
    OFFERS
Figure 2.18 First attempt at a hierarchy for the YVCB database scheme.
Database Records
Hierarchies of logical record types, such as that in Figure 2.18, are scheme level
concepts. The instances of the database corresponding to a scheme consist of a
collection of trees whose nodes are records; each tree is called a database record.
A database record corresponds to some one tree of the database scheme, and
the root record of a database record corresponds to one entity of the root record
type. If T is a node of the scheme, and S is one of its children, then each record
of type T in the database record has zero or more child records of type S.
Example 2.26: Figure 2.19(a) shows one database record for the DEPTS
tree of Figure 2.18. This database record's root corresponds to the Produce
Department, and it should be understood that the entire database instance
has database records similar to this one for each department. The instance
also includes a database record for each customer, with a structure that is an
expansion of the middle tree in Figure 2.18, and it includes a database record
for every supplier, with the structure implied by the rightmost tree of Figure
Figure 2.19(a) Part of the database record for the Produce department: the root
DEPTS record, its MGR child (Esther Eggplant), and several EMPS children.
For records representing entries and offers, we have indicated the unique iden
tifier that distinguishes each such record from all others of the same type; e.g.,
ENTRIES record 121 has QUANTITY 1. Recall that entries have only a quan
tity, and offers only a price as real data, and thus we cannot differentiate among
records of these types by field values alone. Other types of records must also
have unique identifiers, but we have not shown these because our assumptions
let us expect that records for departments, employees, and so on, are uniquely
identified by the values in their fields. As in networks, these unique identifiers
may be thought of as the addresses of the records. D
Record Duplication
As we may notice from Figure 2.18, certain record types, namely ENTRIES and
OFFERS, appear twice in the hierarchical scheme. This duplication carries over
to the instance, where an offer record by supplier s to sell item i appears both
as a child of the ITEMS record for i and as a child of the SUPPLIERS record for
s. For example, OFFERS record 293 appears twice in Figure 2.19, and we can
deduce thereby that this offer is an offer by Ajax to sell lettuce at $.69. This
duplication causes several problems:
1. We waste space because we repeat the data in the record several times.
2. There is potential inconsistency, should we change the price in one copy of
the offer, but forget to change it in the other.
As we mentioned in Section 2.1, the avoidance of such problems is a recur
ring theme in database design. We shall see how the network model deals with
it in Chapter 5, via the mechanism known as virtual fields, and in Chapter 7 we
shall investigate the response of the relational model, which is the extensive the
ory of database design known as "normalization." In the hierarchical model,
the solution is found in "virtual record types" and pointers, which we shall
discuss immediately after discussing another reason such pointers are needed.
Operations in the Hierarchical Model
While the links in the network model were regarded as two-way, allowing us to
follow the link forward to the owner record or backward to the owned records,
in the hierarchical model, links are presumed to go only one way, from parent to
child, i.e., from owner to owned records. The reason for this difference will be
understood when we discuss the natural physical implementations of networks
and hierarchies in Chapter 6. For the moment, let us take for granted that one
can only follow links from parent to child unless there is an explicit pointer to
help us travel in the other direction.
For example, in Figure 2.19 we can, given a record for an item like lettuce,
find all its OFFERS children, but we cannot, given OFFERS record 293 deter
mine that it is a child of the lettuce ITEMS record, and therefore it is an offer
to sell lettuce at the price, $.69, given in record 293. If that is so, how could we
determine what items Ajax offers to sell? We can find the SUPPLIERS record
for Ajax, because another operation generally found in hierarchical systems is
the ability to find the root of a database record with a specified key, such as
"Ajax." We can then go from Ajax to all its offers. But how do we find what
items are offered for sale? In principle we can do so. Take a unique identifier
for an OFFERS record, say 293, and examine the entire collection of DEPTS
database records, until we find an item that has offer 293 as a child. How
ever, that solution is evidently too time consuming, and we need to augment
the hierarchical scheme by pointers that lead directly where we decide they are
needed.
(Figure: a hierarchy for the YVCB database scheme using virtual record types,
including virtual EMPS, virtual OFFERS, and virtual ENTRIES, in place of the
duplicated subtrees of Figure 2.18.)
and then create a child of each that is a virtual version of the other.
Example 2.28: Reconsider Example 2.23, which discussed a many-many re
lationship between courses and students. Instead of creating a new record type
to interpose between COURSES and STUDENTS, as we did in that example,
in the hierarchical model we may create a scheme with the two trees of Figure
2.22.
COURSES                       STUDENTS
    Virtual STUDENTS              Virtual COURSES

Figure 2.22 A hierarchical scheme for courses and students.
In Figure 2.23 we see an instance of the scheme of Figure 2.22; this instance
is the same one that was represented as a network in Figure 2.15. Given a course,
such as CS101, we can find the students enrolled as follows.
1. Find the COURSES record for CS101. Recall that finding a root record, given
   its key, is one of the typical operations of a hierarchical system.
2. Find all the virtual STUDENTS children of CS101. In Figure 2.23, we
   would find pointers to the STUDENTS records for Grind and Nerd, but at
   this point, we would not know the names or anything about the students
   to whose records we have pointers.
3. Follow the pointers to find the actual student records and the names of
   these students.
Similarly, given a student, we could perform the analogous three steps and
find the courses the student was taking. D
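The three steps above are easy to mimic with explicit pointer fields. The sketch below is ours; apart from Grind and Nerd taking CS101, the enrollments are invented. A root record is reachable from its key, and a virtual child is simply a reference to a record in the other tree.

    class Rec:
        # a record in a database record tree; identity is the Python object itself
        def __init__(self, key):
            self.key = key
            self.virtual = []      # virtual children: pointers into the other tree

    courses = {k: Rec(k) for k in ('CS101', 'EE200', 'MATH40')}
    students = {k: Rec(k) for k in ('Grind', 'Nerd', 'Jock', 'Weenie')}

    def enroll(course, student):
        # add a virtual STUDENTS child to the course and a virtual COURSES
        # child to the student; both are just pointers, so nothing is duplicated
        courses[course].virtual.append(students[student])
        students[student].virtual.append(courses[course])

    enroll('CS101', 'Grind'); enroll('CS101', 'Nerd'); enroll('EE200', 'Weenie')

    # steps 1-3 of Example 2.28: root by key, virtual children, follow pointers
    print([rec.key for rec in courses['CS101'].virtual])    # students taking CS101
    print([rec.key for rec in students['Grind'].virtual])   # courses Grind takes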
Figure 2.23 An instance of the scheme of Figure 2.22: database records for the
courses CS101, EE200, and MATH40 and for the students Grind, Nerd, Weenie,
and Jock, connected by virtual record pointers; this is the same instance that was
represented as a network in Figure 2.15.
Figure 2.24 A scheme using an ENROLL(*COURSE, GRADE) record type,
COURSES(DEPT, NUMBER), and virtual STUDENTS pointers.
If that is too inefficient, there are several other schemes we could use.
One thing we don't want to do is duplicate enrollments as children of both
STUDENTS and COURSES. However, we could use the scheme of Figure 2.25.
There, we can go directly from the CS101 record to its enrollments, and find
the grades directly. On the other hand, to find all the students taking CS101
we need to go first to ENROLL, then to STUDENTS, via two virtual record
pointers. In comparison, Figure 2.24 lets us go from courses to students in one
hop. Which is better depends on what paths are more likely to be followed.
If we were willing to pay the space penalty, we could even use both sets of
pointers. D
Figure 2.25 An alternative scheme using STUDENTS(NAME, ADDR),
COURSES(DEPT, NUMBER), and virtual ENROLL (*ENROLL) pointers.
Example 2.30: In Figure 2.26 we see a better design for the YVCB database.
Entries, with their quantities, and offers, with their prices, are handled by
the trick of the previous example, using combined records. We have also added
virtual ORDERS as a child of ITEMS, to facilitate finding the orders for a given
item, and we have similarly added virtual SUPPLIERS as a child of ITEMS to
help find out who supplies a given item.
DEPTS
    EMPS
    *EMPS (the manager)
    ITEMS
        *ORDERS
        *SUPPLIERS

CUSTOMERS
    ORDERS
        *ITEMS / QUANTITY

SUPPLIERS
    *ITEMS / PRICE

Figure 2.26 A better hierarchy for the YVCB database.
Object Structure
The set of object structures definable in our model is very close to the set
of possible schemes for database records in the hierarchical model. We can
define the set of allowable object types, together with their intended physical
implementation, recursively by:
1. A data item of an elementary type, e.g., integer, real, or character string
of fixed or varying length, is an object type. Such a type corresponds to
the data type for a "field" in networks or hierarchies.
2. If T is an object type, then SETOF(T) is an object type. An object of
type SETOF(T) is a collection of objects of type T. However, since objects
must preserve their identity regardless of their connections to other objects,
we do not normally find the member objects physically present in the
collection; rather the collection consists of pointers to the actual objects
in the set. It is as though every child record type in a hierarchy were a
virtual record type, and every logical record type in the hierarchy were a
root of its own tree.
3. If T_1, ..., T_k are object types and f_1, ..., f_k are distinct field names, then
   RECORDOF(f_1:T_1, ..., f_k:T_k) is an object type. An object of this type is a
   record with k fields; its ith field has name f_i and holds an object of type T_i
   (physically, a field whose type is not elementary holds a pointer to that object).
Example 2.31: Let us translate our running example into the above terms.
We shall, for simplicity, assume that the only elementary types are string and
int. Then the type of an item can be represented by the record
    ItemType = RECORDOF(name:string, I#:int)
Notice the convention that a field of a record is represented by the pair
(<fieldname>: <type>).
To handle orders, we need to represent item/quantity pairs, as we did in
Figure 2.26. Thus, we need another object type
    IQType = RECORDOF(item:ItemType, quantity:int)
Here, the first field is an object of a nonelementary type, so that field should
be thought of as a pointer to an item.
Now we can define the type of an order to be:
    OrderType = RECORDOF(O#:int, includes:SETOF(IQType))
Here, we have embedded the definition of another object type, SETOF(IQType),
within the definition of OrderType. That is equivalent to writing the two
declarations:
    SIQType = SETOF(IQType)
    OrderType = RECORDOF(O#:int, includes:SIQType)
Either way, the field includes of OrderType is a representation of a set of
pointers to objects of type IQType, perhaps a pointer to a linked list of pointers
to those objects.
Customers can be represented by objects of the following type:
    CustType = RECORDOF(name:string, addr:string,
                        balance:int, orders:SETOF(OrderType))
while departments may be given the following declaration:
    DeptType = RECORDOF(name:string, dept#:int,
                        emps:SETOF(EmpType), mgr:EmpType,
                        items:SETOF(ItemType))
Notice that this declaration twice makes use of a type EmpType, for employees,
once as a set and once directly. In both cases, it is not the employees or manager
of the department that appear there, but pointers to the actual employee
objects. Those objects have the following type:
    EmpType = RECORDOF(name:string, salary:int, dept:DeptType)
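In a modern language the same type structure can be written down almost verbatim. The dataclass sketch below is our rendering, not part of the text: SETOF(T) becomes a list of references, a record field of nonelementary type becomes a reference to the other object, and names such as i_num stand in for I#, which cannot appear in a Python identifier; the sample data is made up.

    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class Item:                   # ItemType = RECORDOF(name:string, I#:int)
        name: str
        i_num: int

    @dataclass
    class IQ:                     # IQType: a reference to an Item plus a quantity
        item: Item
        quantity: int

    @dataclass
    class Order:                  # OrderType
        o_num: int
        includes: List[IQ] = field(default_factory=list)

    @dataclass
    class Customer:               # CustType
        name: str
        addr: str
        balance: int
        orders: List[Order] = field(default_factory=list)

    brie = Item('Brie', 7)
    order = Order(1024, includes=[IQ(brie, 3)])
    cust = Customer('Zack Zebra', '34 Cucumber Blvd.', 0, orders=[order])
    print(cust.orders[0].includes[0].item.name)   # Customer -> Order -> IQ -> Item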
The database scheme in Figure 2.27 is similar to, but not identical to
the scheme of Figure 2.26. For example, Figure 2.27 includes a pathway from
employees to their departments, since the field dept of EmpType is a pointer
to the department. However, in Figure 2.27 we do not have a way to get from
items to their orders or suppliers. There is nothing inherent in either model that
forces these structures. We could just as well have chosen to add a virtual pointer
child of EMPS in Figure 2.26 that gave the department of the employee, and
in Figure 2.27 we could have added the additional pointers to item records by
declaring
Class Hierarchies
Another essential ingredient in the object model is the notion of subclasses
and hierarchies of classes, a formalization of "isa" relationships. There are two
common approaches to the definition of class hierarchies.
1. In addition to record and set constructors for types, allow a third con
structor, type union. Objects of type U(T_1, T_2) are either type T_1 objects
or type T_2 objects.
2. Define a notion of subtype for given types.
The first approach is used in programming languages like C and Pascal. In
object-oriented database systems, it is preferable to use the second approach,
because
11 This book also uses the term "method" to refer to the body of an algorithm. The
meanings are not the same, but not altogether unrelated either. We trust no confusion
will result.
EXERCISES
2.5: The beer drinkers database consists of information about drinkers, beers,
and bars, telling
i) Which drinkers like which beers,
ii) Which drinkers frequent which bars,
iii) Which bars serve which beers.
Represent the scheme for the beer drinkers database in the (a) entity-relationship
(b) relational (c) network (d) hierarchical (e) object models.
2.6: In Figure 2.29 we see the entity-relationship diagram of an insurance company.
The keys for EMPLOYEES and POLICIES are EMP# and P#,
respectively; SALESMEN are identified by their isa relationship to EMPLOYEES.
Represent this diagram in the (a) relational (b) network (c)
hierarchical (d) object models.
(Entity sets: EMPLOYEES, with attributes EMP#, NAME, and SALARY;
POLICIES, with attributes P#, BENEFICIARY, and NAME; and SALESMEN.)
Figure 2.29 An insurance company database.
2.7: Figure 2.30 shows a genealogy database, with key attributes NAME and
LIC#. The intuition behind the diagram is that a marriage consists of
two people, and each person is the child of a marriage, i.e., the marriage
of his mother and father. Represent this diagram in the (a) relational (b)
network (c) hierarchical (d) object models.
Figure 2.30 A genealogy database.
Figure 2.31 Part hierarchy database.
We wish to design database schemes in various data models that represent
the information in Figure 2.31. It is desired that the scheme avoids redun
dancy and that it is possible to answer efficiently the following two types
of queries.
i) Given a part, find its subparts and the quantity of each (no recursion
   is implied; just find the immediate subparts).
ii) Given a part, find all the parts of which it is a subpart.
Design suitable schemes in the (a) relational (b) network (c) hierarchical
(d) object models.
* 2.11: Suppose we wish to maintain a database of students, the courses they have
taken, and the grades they got in these courses. Also, for each student, we
want to record his name and address; for each course we record the course
name and the department that offers it. We could represent the scheme in
various models, and we have several options in each model. Some of those
schemes will have certain undesirable properties, among which are
A) The inability to determine, given a student, what courses he has taken,
without examining a large fraction of the database.
B) The inability to determine, given a course, what students have taken
it, without examining a large fraction of the database.
a) ... with indices on STUDENT and COURSE that let us find the tuples
   for a given student or a given course without looking at other tuples.
b) The relation schemes (COURSE, DEPT, GRADE) and
   (COURSE, STUDENT, ADDR).
c) A network with a record type SAG giving the name of a student, his address,
   and a grade. The network has link CSG from SAG to COURSE, with the
   intent that a COURSE record owns a set of SAG records (s, a, g), one for
   each student s that took the course; a is the student's address and g is the
   grade he got in the course.
d) The hierarchy of Figure 2.32(a).
e) The hierarchy of Figure 2.32(b).
f) The object model scheme that has an object of type SETOF(Ctype)
   to represent courses and an object of type SETOF(Stype) to represent
   students. These types are defined by:
   Ctype = RECORDOF(name:string, students:SETOF(Stype))
   Stype = RECORDOF(name:string, transcript:Ttype)
   Ttype = SETOF(RECORDOF(course:Ctype, grade:string))
* 2.12: We mentioned in Section 2.4 that two tables represent the same relation
if one can be converted to the other by permuting rows and/or columns,
provided the attribute heading a column moves along with the column. If
a relation has a scheme with m attributes and the relation has n tuples,
how many tables represent this relation?
2.13: Let R and S be the relations shown in Figure 2.33. Compute
a) R ∪ S.
b) R − S (ignore attribute names in the result of union and difference).
COURSE(NAME, DEPT)
    STUDENT(NAME, ADDR, GRADE)

(a) Hierarchy for Exercise 2.11(d).

COURSE(NAME, DEPT)            STUDENT(NAME, ADDR)
                                  GRADE / *COURSE

(b) Hierarchy for Exercise 2.11(e).
c) R ⋈ S.
d) π_A(R).
e) σ_{A=c}(R × S).
f) S ⋉ R.
g) S ÷ {(b), (c)} (note that {(b), (c)} is a unary relation, that is, a relation of
   arity 1).
h) R ⋈_{B<C} S (take < to be alphabetic order on letters).
(a) R          (b) S

Figure 2.33 The relations R and S for Exercise 2.13.
* 2.20: Show how every (a) network scheme and (b) hierarchical scheme can be
   translated into a collection of type definitions in the object model of Section
   2.7, in such a way that traversing any link (in the network), or parent-to-child
   or virtual pointer (in the hierarchy), can be mimicked by following
   pointers in the fields of objects.
* 2.21: Show how every object model scheme can be expressed as an entity-relationship
   diagram.
BIBLIOGRAPHIC NOTES
At an early time in the development of database systems, there was an estab
lished view that there were three important data models: relational, network,
and hierarchical. This perspective is found in Rustin [1974], Sibley [1976], and
the earliest edition of Date [1986], published in 1973; it is hard to support this
view currently, although these models still have great influence. Kerschberg,
Klug, and Tsichritzis [1977], Tsichritzis and Lochovsky [1982], and Brodie, Mylopoulos, and Schmidt [1984] survey the variety of data models that exist.
Bachman [1969] is an influential, early article proposing a data model, now
called "Bachman diagrams." CODASYL [1971] is the accepted origin of the
network model, and Chen [1976] is the original paper on the entity-relationship
model.
The Relational Model
The fundamental paper on the relational model, including the key issues of re
lational algebra and relational database design (to be discussed in Chapter 7),
is Codd [1970].
There are a number of earlier or contemporary papers that contain some
ideas of the relational model and/or relational algebra. The paper by Bosak et
al. [1962] contains an algebra of files with some similarity to relational algebra.
Kuhns [1967], Levien and Maron [1967], and Levien [1969] describe systems with
relational underpinnings. The paper by Childs [1968] also contains a discussion
of relations as a data model, while Filliat and Kranning [1970] describe an
algebra similar to relational algebra.
Extensions to the Relational Model
There is a spectrum of attempts to "improve" the relational model, ranging
from introduction of null values, through structures that are closer to object-oriented
models than they are to the value-oriented relational model.
Attempts to formalize operations on relations with null values have been
made by Codd [1975], Lacroix and Pirotte [1976], Vassiliou [1979, 1980], Lipski
[1981], Zaniolo [1984], Imielinski and Lipski [1984], Imielinski [1986], Vardi
[1986], and Reiter [1986].
Some languages more powerful than relational algebra, for use with the
relational model, have been considered by Aho and Ullman [1979], Cooper
[1980], and Chandra and Harel [1980, 1982, 1985]. The complexity of such
languages, i.e., the speed with which arbitrary queries can be answered, is
discussed by Vardi [1982, 1985].
Some early attempts to enhance the relational model involve providing
"semantics" by specializing the roles of different relations. Such papers include
Schmid and Swenson [1976], Furtado [1979], Codd [1979], Sciore [1979], and
Wiederhold and El Masri [1980].
Object-Oriented Models
There is a large family of "semantic" data models that support object-identity;
some of them also involve query languages with value-oriented features. Hull
and King [1987] is a survey of such models. The semantic model of Hammer
and McLeod [1981] and the functional model of Shipman [1981] are early efforts
in this direction. More recent efforts are found in Abiteboul and Hull [1983],
Heiler and Rosenthal [1985] and Beech [1987].
The paper by Bancilhon [1986] is an attempt to integrate an object-oriented
data model with logic programming, but although it supports abstract data
types, it finesses object-identity.
Complex Objects
The fundamental paper on complex objects, built from aggregation (record
formation) and generalization (type hierarchies) is Smith and Smith [1977].
Notations for complex objects have been developed in Hull and Yap [1984],
Kuper and Vardi [1984, 1985], Zaniolo [1985], Bancilhon and Koshafian [1986],
and Abiteboul and Grumbach [1987].
Minsky and Rozenshtein [1987] present a scheme for defining class hierar
chies, including collections of classes that do not form a tree, but rather a class
can have several incomparable superclasses.
There is also a family of papers that build complex objects in a value-oriented
context, chiefly by allowing attributes of relations to have types with
structure. These are called "non-first-normal-form" relations, following Codd
[1970], who called a relation "in first-normal-form" if the types of attributes
were elementary types, e.g., integers. Papers in this class include Jaeschke and
Scheck [1982], Fischer and Van Gucht [1984], Roth, Korth, and Silberschatz
[1984], Ozsoyoglu and Yuan [1985], and Van Gucht and Fischer [1986].
Notes on Exercises
A solution to Exercise 2.14 can be found in Aho and Ullman [1979]. A result
on operator independence similar to Exercise 2.15 was proved by Beck [1978].
CHAPTER 3

Logic as a Data Model
general, rules define the true instances of certain predicates, boss in this case,
in terms of certain other predicates that are defined by database relations, e.g.,
manages.
There are three alternative ways to define the "meaning" of rules. In sim
ple cases, such as the one above, all these methods yield the same answer. As
we permit more complicated kinds of logical rules, we are faced with different
approaches that result in different answers, because logical rules, being declara
tive in nature, only state properties of the intended answer. In hard cases, there
is no guarantee that a unique answer is defined, or that there is a reasonable
way to turn the declarative program into a sequence of steps that compute the
answer.
Proof-Theoretic Interpretation of Rules
The first of the three interpretations we can give to logical rules is that of axioms
to be used in a proof. That is, from the facts in the database, we see what other
facts can be proved using the rules in all possible ways. This interpretation is
the one we gave to the rules in Example 1.12, where we showed that the boss
facts that could be proved from the rules (1) and (2) above, plus a given set
of manages facts were exactly what one would expect if "boss" were given the
interpretation "somewhere above on the management hierarchy."
In simple cases like Example 1.12, where all the axioms are if-then
rules, and there are no negations in the rules or the facts, then it is known
that all facts derivable using the rules are derivable by applying the rules as we
did in that example. That is, we use the rules only by substituting proved or
given facts in the right side and thereby proving the resulting fact on the left.1
It turns out that when there are negations, the set of provable facts often is
not what we intuitively want as a meaning for the logical rules anyway. Thus,
we shall here define the "proof-theoretic meaning" of a collection of rules to
be the set of facts derivable from given, or database facts, using the rules in
the "forward" direction only, that is, by inferring left sides (consequents, or
conclusions) from right sides (antecedents or hypotheses).
Model-Theoretic Interpretation of Rules
In this viewpoint, we see rules as defining possible worlds or "models." An
interpretation of a collection of predicates assigns truth or falsehood to ev
ery possible instance of those predicates, where the predicates' arguments are
chosen from some infinite domain of constants. Usually, an interpretation is
1 Note that if there are negations in our axioms or facts, this statement is false. For
example, if we have rule q :- p and the negative fact ¬q, we can derive ¬p by applying
the rule "backwards," i.e., given that the left side is false we can conclude that the right
side is false.
that M1 does not have this property. For example, we could change p(3) from
true to false in M1 and still have a model.
Moreover, M2 is the unique minimal model consistent with the database
{r(l)}. This model also happens to be what we get if we use the proof-theoretic
definition of meaning for rules. That is, starting with the rules (1) and (2) of
Example 3.1 and the one fact r(l), we can prove q(l), p(l), and no other
predicate instances. These happy coincidences will be seen to hold for "datalog"
rules in general, as long as they do not involve negation. When negation is
allowed as an operator, as we shall see in Section 3.6, there need not be a
unique minimal model, and none of the minimal models necessarily corresponds
to the set of facts that we can prove using the rules. For some rules we can get
around this problem by defining a preferred minimal model, but in general, the
issue of what sufficiently complicated rules mean gets murky very quickly.
Computational Definitions of Meaning
The third way to define the meaning of logical rules is to provide an algorithm
for "executing" them to tell whether a potential fact (predicate with constants
for its arguments) is true or false. For example, Prolog defines the meaning of
rules this way, using a particular algorithm that involves searching for proofs
of the potential fact. Unfortunately, the set of facts for which Prolog finds a
proof this way is not necessarily the same as the set of all facts for which a
proof exists. Neither is the set of facts Prolog finds true necessarily a model.
However, in many common cases, Prolog will succeed in producing the unique
minimal model for a set of rules when those rules are run as a Prolog program.
In this book, we shall take another approach to treating rules as computa
tion. We shall translate rules into sequences of operations in relational algebra,
and for datalog rules without negation, we can show that the program so pro
duced always computes the unique minimal model and (therefore) the set of
facts that can be proved from the database. When negation is allowed, we
shall consider only a limited case called "stratified" negation, and then we shall
show that what our program produces is a minimal model, although it is not
necessarily the only minimal model. There is, however, some justification for
selecting our minimal model from among all possible minimal models.
Comparison of "Meanings"
We might naturally ask which is the "best" meaning for a logic program. A
logician would not even take seriously the computational meaning of rules, but
for those wishing to implement knowledge-base systems, efficient computation
is essential. We cannot use logical rules as programs unless we have a way of
computing their consequences, and an efficient way of doing so, at that.
On the other hand, a purely operational definition of meaning for rules,
"the program means whatever it is that this interpreter I've written does," is
not acceptable either. We don't have a preference between the proof-theoretic
and model-theoretic meanings, as long as these meanings are reasonably clear
to the user of the logic-based language. In practice, it seems that the model-theoretic approach lets us handle more powerful classes of rules than the proof-theoretic approach, although we shall start out with the proof-theoretic meaning
in Section 3.3. Whichever meaning we choose, it is essential that we show its
equivalence to an appropriate computational meaning.
In the relational model, all relations are EDB relations. The capability to
create views (see Section 1.2) in models like the relational model is somewhat
analogous to the ability in datalog to define IDB relations. However, we shall
see in Chapter 4 that the view-definition facility in relational DBMS's does not
compare in power with logical rules as a definition mechanism.
Atomic Formulas
Datalog programs are built from atomic formulas, which are predicate symbols
with a list of arguments, e.g., p(A1, . . . , An), where p is the predicate symbol.
An argument in datalog can be either a variable or a constant. As mentioned
in Section 1.6, we use names beginning with lower case letters for constants and
predicate names, while using names beginning with upper case letters for vari
ables. We also use numbers as constants. We shall assume that each predicate
symbol is associated with a particular number of arguments that it takes, and
we may use p^ to denote a predicate of arity k.
An atomic formula denotes a relation; it is the relation of its predicate
restricted by
1. Selecting for equality between a constant and the component or compo
nents in which that constant appears, and
2. Selecting for equality between components that have the same variable.
For example, consider the YVCB database relations of Figure 2.8. The atomic
formula
customers(joe, Address, Balance)
represents the relation σ_{$1=joe}(CUSTOMERS). Atomic formula
includes(X, Item, X)
denotes σ_{$1=$3}(INCLUDES), that is, the tuples where the order number happens to be equal to the quantity ordered.
Notice that although there are no names for attributes in the datalog model,
selecting suggestive variable names like Address helps remind us what is going
on. However, as in relational algebra, we must remember the intuitive meaning
of each position in a list of arguments.
Built-in Predicates
We also construct atomic formulas with the arithmetic comparison predicates,
=, <, and so on; these predicates will be referred to as built-in predicates.
Atomic formulas with built-in predicates will be written in the usual infix no
tation, e.g., X < Y instead of <(X, Y). Other atomic formulas and their
predicates will be referred to as ordinary when a distinction needs to be made.
Built-in predicates do not necessarily represent finite relations. We could
think of X < Y as representing the relation of all tuples (x, y) such that x < y,
but this approach is unworkable because this set is infinite, and it is not even
clear over what domain x and y should be allowed to range. We shall therefore
require that whenever a rule uses an atomic formula with a built-in predicate,
any variables in that formula are limited in range by some other atomic formula
on the right side of the rule. For example, a variable might be limited by
appearing in an atomic formula with an EDB predicate. We shall then find that
built-in atomic formulas can be interpreted as selections on a single relation or
on the join of relations. The details will be given when we discuss "safe" rules
at the end of this section.
Clauses and Horn Clauses
A literal is either an atomic formula or a negated atomic formula; we denote
negated atomic formulas by ¬p(A1, . . . , An) or p̄(A1, . . . , An). A negated atomic
formula is a negative literal; one that is not negated is a positive literal. A clause
is a sum (logical OR) of literals. A Horn clause is a clause with at most one
positive literal. A Horn clause is thus either
1. A single positive literal, e.g., p(X, Y), which we regard as a fact,
2. One or more negative literals, with no positive literal, which is an integrity
constraint, and which will not be considered in our discussion of datalog,
or
3. A positive literal and one or more negative literals, which is a rule.
The reason Horn clauses of group (3) are considered rules is that they have
a natural expression as an inference. That is, the Horn clause
¬p1 ∨ ··· ∨ ¬pn ∨ q
(3.1)
is logically equivalent to p1 ∧ ··· ∧ pn → q. To see why, note that if none of the
¬pi's is true, then to make (3.1) true, q is forced to be true. Thus, if p1, . . . , pn
are all true (and therefore none of the ¬pi's is true), q must be true. If at least
one of the pi's is false, then no constraint is placed on q; it could be true or false.
We shall follow Prolog style for expressing Horn clauses, using
q :- p1 & ··· & pn.
for the Horn clause p1 ∧ ··· ∧ pn → q. We call q the head of the rule and
p1 & ··· & pn the body. Each of the pi's is said to be a subgoal. A collection of
Horn clauses is termed a logic program.
When writing Horn clauses as implications, either in the style
p1 ∧ ··· ∧ pn → q
or in the Prolog style, variables appearing only in the body may be regarded
as quantified existentially within the body, while other variables are universally
quantified over the entire rule. For example, rule (1) in Figure 3.1 says "for all
Our first step is to examine the set of values that we may substitute for the
variables of a rule to make the body true. In proofs using the rule, it is exactly
these substitutions that let us conclude that the head, with the same substitu
tion, is true. Therefore, we define the relation for a rule r to have the scheme
X1, . . . , Xm, where the Xi's are the variables of the body of r, in some selected
order. We want this relation to have a tuple (a1, . . . , am) if and only if, when
we substitute ai for Xi, 1 ≤ i ≤ m, all of the subgoals become true.
More precisely, suppose that p1, . . . , pn is the list of all predicates appearing
in the body of rule r, and suppose P1, . . . , Pn are relations, where Pi consists of
all those tuples (a1, . . . , ak) such that pi(a1, . . . , ak) is known to be true. Then
a subgoal S of rule r is made true by this substitution if the following hold:
i) If S is an ordinary subgoal, then S becomes pi(b1, . . . , bk) under this substitution, and (b1, . . . , bk) is a tuple in the relation Pi corresponding to pi.
ii) If S is a built-in subgoal, then under this substitution S becomes bθc, and
the arithmetic relation bθc is true.
Example 3.5: The following is an informal example of how relations for rule
bodies are constructed; it will be formalized in Algorithm 3.1, to follow. Con
sider rule (2) from Figure 3.1. Suppose we have relations P and S computed for
predicates parent and sibling, respectively. We may imagine there is one copy
of P with attributes X and Xp and another with attributes Y and Yp. We
suppose the attributes of S are Xp and Yp. Then the relation corresponding
to the body of rule (2) is
R(X, Xp, Y, Yp) = P(X, Xp) ⋈ P(Y, Yp) ⋈ S(Xp, Yp)
(3.2)
For another example, consider rule (1) of Figure 3.1. Here, we need to
join two copies of P and then select for the arithmetic inequality X ≠ Y. The
algebraic expression for rule (1) is thus
Q(X, Y, Z) = σ_{X≠Y}(P(X, Z) ⋈ P(Y, Z))
(3.3)
The relation Q(X, Y, Z) computed by (3.3) consists of all tuples (x, y, z) such
that:
1. (x, z) is in P,
2. (y, z) is in P, and
3. x ≠ y.
Again, it is easy to see that these tuples (x, y, z) are exactly the ones that make
the body of rule (1) true. Thus, (3.3) expresses the relation for the body of rule
(1).
Finally, let us examine an abstract example that points out some of the
problems we have when computing the relation for the body of a rule. Consider
p(X,Y) :- q(a,X) & r(X,Z,X) & s(Y,Z)
(3.4)
Suppose we already have computed relations Q, R, and S for subgoals q, r, and
s, respectively. Since the first subgoal asks for only those tuples of Q that have
first component a, we need to construct a relation, with attribute X, containing
only the second components of these tuples. Thus, define relation
T(X) = π_2(σ_{$1=a}(Q))
We also must restrict the relation R so that its first and third components, each
of which carries variable X in the second subgoal, are equal. Thus define
U(X, Z) = π_{1,2}(σ_{$1=$3}(R))
Then the relation for the body of rule (3.4) is defined by the expression
T(X) ⋈ U(X, Z) ⋈ S(Y, Z)
This expression defines the set of tuples (x, y, z) that make the body of (3.4)
true, i.e., the set of tuples (x, y, z) such that:
1. (a, x) is in Q,
2. (x, z, x) is in R, and
3. (y, z) is in S.
D
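The construction just sketched can be mimicked directly in an ordinary programming language. The following Python fragment is an illustrative sketch only (the relations and sample data are invented, not taken from the text); it computes the relation for the body of rule (3.4) by the same selections, projections, and joins described above.

    # Illustrative Python sketch: computing the relation for the body of rule (3.4),
    #    p(X,Y) :- q(a,X) & r(X,Z,X) & s(Y,Z)
    # Relations are sets of tuples; the sample data below are invented.

    Q = {("a", 1), ("a", 2), ("b", 3)}          # relation for predicate q
    R = {(1, 10, 1), (2, 20, 3), (3, 30, 3)}    # relation for predicate r
    S = {(5, 10), (6, 20), (7, 30)}             # relation for predicate s

    # T(X) = pi_2(sigma_{$1=a}(Q)): second components of Q-tuples whose first component is a
    T = {x for (c, x) in Q if c == "a"}

    # U(X,Z) = pi_{1,2}(sigma_{$1=$3}(R)): R-tuples whose first and third components agree
    U = {(x, z) for (x, z, x2) in R if x == x2}

    # The body relation over (X, Y, Z) is the join of T(X), U(X,Z), and S(Y,Z)
    body = {(x, y, z) for x in T
                      for (x2, z) in U if x2 == x
                      for (y, z2) in S if z2 == z}

    print(body)    # {(1, 5, 10)} with the sample data above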
We shall now describe how to construct an expression of relational algebra
that computes the relation for a rule body.
Algorithm 3.1: Computing the Relation for a Rule Body, Using Relational
Algebra Operations.
INPUT: The body of a datalog rule r, which we shall assume consists of subgoals
S1, . . . , Sn involving variables X1, . . . , Xm. For each Si = pi(Ai1, . . . , Aiki) with
an ordinary predicate, there is a relation Ri already computed, where the A's
are arguments, either variables or constants.
OUTPUT: An expression of relational algebra, which we call
EVAL-RULE(r, R1, . . . , Rn)
that computes from the relations R1, . . . , Rn³ a relation R(X1, . . . , Xm) with
all and only the tuples (a1, . . . , am) such that, when we substitute aj for Xj,
1 ≤ j ≤ m, all the subgoals S1, . . . , Sn are made true.
METHOD: The expression is constructed by the following steps.
1.
2.
3 Technically, not all n relations may be present as arguments, because some of the subgoals may have built-in predicates and thus not have corresponding relations.
4 It is not necessary to add this term for all possible pairs k and l, just for enough pairs
that all occurrences of the same variable are forced to be equal. For example, if X
appears in positions 2, 5, 9, and 14, it suffices to add terms $2 = $5, $5 = $9, and
$9 = $14.
3.
Let E be the natural join of all the Qi's defined in (1) and the DX's defined
in (2). In this join, we regard Qi as a relation whose attributes are the
variables appearing in Si, and we regard DX as a relation with attribute
X.5
4. Let EVAL-RULE(r, R1, . . . , Rn) be σ_F(E), where F is the conjunction of
XθY for each built-in subgoal XθY among S1, . . . , Sn, and E is
the expression constructed in (3). If there are no built-in subgoals, then
the desired expression is just E. □
Example 3.5 illustrates the construction of this algorithm. For instance, the
expression T(X) = π_2(σ_{$1=a}(Q)) is what we construct by step (1) of Algorithm
3.1 from the first subgoal, q(a,X), of the rule given in (3.4); that is, T(X) in
Example 3.5 is Q1 here. Similarly, U(X, Z) = π_{1,2}(σ_{$1=$3}(R)) in Example 3.5 is
Q2, constructed from the second subgoal, r(X, Z, X). Q3, constructed from the
third subgoal, s(Y, Z), is S(Y, Z) itself. There are no built-in subgoals, so no
extra domains need be constructed in step (2), and no selection is needed in step
(4). Thus, the expression T(X) ⋈ U(X, Z) ⋈ S(Y, Z) is the final expression
for the body of the rule (3.4). In Example 3.7 we shall give a more extensive
example of how EVAL-RULE is computed when there are built-in subgoals.
Theorem 3.1: Algorithm 3.1 is correct, in the sense that the relation R pro
duced has all and only those tuples (a1, . . . , am) such that, when we substitute
each aj for Xj, every subgoal Si is made true.
Proof: Suppose (a1, . . . , am) makes every Si true. By (i) in the definition
of "made true"6 and step (1) of Algorithm 3.1, there is a tuple μi in Qi that
has aj in its component for Xj, for every variable Xj appearing in subgoal Si.
Step (2) tells us there is a (unary) tuple ν_Xi in D_Xi, for every variable
Xi that appears in no ordinary subgoal. Then step (3) of the algorithm takes
the natural join of the Qi's and DX's. At each step of the join, the tuples
μi agree on any variables in common, so they join together into progressively
larger tuples, each of which agrees with (a1, . . . , am) on the attributes they have
in common.
Finally, the join of all the μi's and ν's is (a1, . . . , am) itself. Furthermore,
by (ii) in the definition of "made true," the tuple (a1, . . . , am) satisfies condition
F in step (4), so Algorithm 3.1 puts (a1, . . . , am) in relation R.
Conversely, suppose (a1, . . . , am) is put in R by the algorithm. Then this
tuple must satisfy F of step (4), and therefore condition (ii) of "made true"
is met. Also, (a1, . . . , am) must be in the relation defined by E of step (3), so
each Qi has a tuple μi whose component for variable Xj has value aj, for each
Xj that appears in subgoal Si. An examination of step (1) tells us that the
5 Since any X for which DX is constructed cannot be an attribute of any Qi, the natural
join really involves the Cartesian product of all the DX's, if any.
6 The formal definition of "made true" appears just before Example 3.5.
OUTPUT: For each IDB predicate p, an expression of relational algebra that gives
the relation for p in terms of the relations R1, . . . , Rm for the EDB predicates.
METHOD: Begin by rectifying all the rules. Next, construct the dependency
graph for the input program, and order the predicates p1, . . . , pn, so that if the
dependency graph for the program has an arc from pi to pj, then i < j. We can
find such an order because the input program is nonrecursive, and therefore the
dependency graph has no cycles. Then for i = 1, 2, ... n, form the expression
for relation Pi (for pi ) as follows.
If pi is an EDB predicate, let Pi be the given relation for pi. In the opposite
case, suppose pi is an IDB predicate. Then:
1. For each rule r having pi as its head, use Algorithm 3.1 to find an expression
Er that computes the relation Rr for the body of rule r, in terms of relations
for the predicates appearing in r's body.
2. Because the program is nonrecursive, all the predicates appearing in the
body of r already have expressions for their relations in terms of the EDB
relations. Substitute the appropriate expression for each occurrence of an
IDB relation in the expression Er to get a new expression Fr.
3. Renaming variables, if necessary, we may assume that the head of each rule
for pi is pi(X1, . . . , Xk). Then take the expression for Pi to be the union,
over all rules r for pi, of π_{X1,...,Xk}(Fr). □
Example 3.7: Let us take an abstract example that illustrates the mechanics
of Algorithm 3.2. Suppose we have the four rules:
(1) p(a,Y) :- r(X,Y).
(2) p(X,Y) :- s(X,Z) & r(Z,Y).
(3) q(X,X) :- p(X,b).
(4) q(X,Y) :- p(X,Z) & s(Z,Y).
Here, r and s are EDB predicates, which we may suppose have given relations
R and S. Predicates p and q are IDB predicates, for which we want to compute
relations P and Q.
We begin by rectifying the rules, which requires modification to (1) and
(3). Our new set of rules is:
(1) p(X,Y) :- r(X,Y) & X=a.
(2) p(X,Y) :- s(X,Z) & r(Z,Y).
(3) q(X,Y) :- p(X,b) & X=Y.
(4) q(X,Y) :- p(X,Z) & s(Z,Y).
is taken. As a special case, no projection is needed for the first of these. Thus,
the expression for P is
P(X, Y) = σ_{X=a}(R(X, Y)) ∪ π_{X,Y}(S(X, Z) ⋈ R(Z, Y))
Next, we consider q. The relation for rule (3) is computed as follows. By
Algorithm 3.1, the expression for the subgoal p(X, b) is π_X(σ_{Z=b}(P(X, Z))), and
the relation constructed for Y, which appears only in the built-in subgoal X=Y,
is π_Y(P(Y, W)).
Finally, the expression for rule (4) is P(X, Z) ⋈ S(Z, Y), so the expression
for Q is
Q(X, Y) = σ_{X=Y}(π_X(σ_{Z=b}(P(X, Z))) × π_Y(P(Y, W))) ∪ π_{X,Y}(P(X, Z) ⋈ S(Z, Y))
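As an illustration only, the two algebraic expressions just derived can be evaluated directly; the following Python sketch (with invented EDB relations, not data from the text) computes P from R and S, and then Q from P and S.

    # Illustrative sketch only; the EDB relations R (for r) and S (for s) are invented.

    R = {("a", "b"), (1, 2)}
    S = {(3, 1), (4, "a"), (2, 9)}

    # P(X,Y) = sigma_{X=a}(R(X,Y)) U pi_{X,Y}(S(X,Z) |><| R(Z,Y))
    P = ({(x, y) for (x, y) in R if x == "a"}
         | {(x, y) for (x, z) in S for (z2, y) in R if z == z2})

    # Q(X,Y) = sigma_{X=Y}(pi_X(sigma_{Z=b}(P)) x pi_Y(P)) U pi_{X,Y}(P(X,Z) |><| S(Z,Y))
    dom_Y = {y for (y, w) in P}                  # the relation constructed for Y
    Q = ({(x, y) for (x, z) in P if z == "b" for y in dom_Y if x == y}
         | {(x, y) for (x, z) in P for (z2, y) in S if z == z2})

    print(P)    # the three tuples ('a','b'), (3,2), (4,'b')
    print(Q)    # the three tuples ('a','a'), (4,4), (3,9)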
and only the facts that are provable from the EDB facts and rules.
To see that the set of EDB and IDB facts thus constructed is the unique
minimal model, we again perform an induction on the order in which the predi
cates are handled. The claim this time is that any model for the facts and rules
must contain all the facts constructed by the expressions. Thus, the model
consisting of the union of the relations for each of the predicates produced by
Algorithm 3.2 is a subset of any model whatsoever. It is itself a model, since
any substitution into one of the rules that makes the body true surely makes
the head true. Thus, what we construct is the only possible minimal model.
D
3.4 COMPUTING THE MEANING OF RECURSIVE RULES
Algorithm 3.2 does not apply to recursive datalog programs, because there is
no order for the predicates that allows the algorithm to be applied. That is,
whenever there is a cycle in the dependency graph, the first predicate on that
cycle which we try to evaluate will have a rule with a subgoal whose expression
is not yet available.
However, the proof-theoretic approach still makes sense if we remember
that it is permissible to derive some facts using a rule, and later use newly
derived facts in the body to derive yet more facts. If we start with a finite
database, and we use only datalog rules, then there are only a finite number
of different facts that could possibly be derived; they must be of the form
p(a1, . . . , ak), where p is an IDB predicate mentioned in the rules, and a1, . . . , ak
are constants appearing in the database.
Consider a datalog program with given EDB relations R1, . . . , Rk and with
IDB relations P1, . . . , Pm to be computed. For each i, 1 ≤ i ≤ m, we can express
the set of provable facts for the predicate pi (corresponding to IDB relation Pi)
by the assignment
Pi := EVAL(pi, R1, . . . , Rk, P1, . . . , Pm)
where EVAL is the union of EVAL-RULE (as defined in Algorithm 3.1) for each of
the rules for pi. If we start with all Pi's empty, and we execute an assignment
such as this for each i, repeatedly, we shall eventually reach a point where no
more facts can be added to any of the Pi's.8 Now, the assignment symbol
becomes equality; that is, the set of IDB facts that can be proved satisfies the
equations
Pi = EVAL(pi, R1, . . . , Rk, P1, . . . , Pm)
8 We shall show later in this section that when the rules have no negative subgoals, EVAL
is "monotone"; that is, the Pi's can only grow, and once in Pi, a fact will continue to
be there every time Pi is recomputed.
for all i. We shall call equations derived from a datalog program in this manner
datalog equations.
Example 3.8: The rules of Figure 3.1 can be viewed as the following equations.
We use P, S, C, and R for the relations corresponding to parent, sibling,
cousin, and related, respectively.
S(X, Y) = π_{X,Y}(σ_{X≠Y}(P(X, Z) ⋈ P(Y, Z)))
C(X, Y) = π_{X,Y}(P(X, Xp) ⋈ P(Y, Yp) ⋈ S(Xp, Yp)) ∪ π_{X,Y}(P(X, Xp) ⋈ P(Y, Yp) ⋈ C(Xp, Yp))
R(X, Y) = S(X, Y) ∪ π_{X,Y}(R(X, Z) ⋈ P(Y, Z)) ∪ π_{X,Y}(R(Z, Y) ⋈ P(X, Z))
D
Fixed Points of Datalog Equations
It turns out that datalog programs each have a unique minimal model
containing any given EDB relations, and this model is also the unique minimal
fixed point, with respect to those EDB relations, of the corresponding equations.
Moreover, as we shall see, just as in the nonrecursive case, this "least fixed
point" is exactly the set of facts one can derive, using the rules, from a given
database.
More formally, let the variables of the equations be P1, . . . , Pm, corresponding
to IDB predicates p1, . . . , pm, and let us focus our attention on particular
relations R1, . . . , Rk assigned to the EDB predicates r1, . . . , rk. A solution,
or fixed point, for the EDB relations R1, . . . , Rk assigns to P1, . . . , Pm
particular relations, such that the equations are satisfied. If
S1 = P1^(1), . . . , Pm^(1) and S2 = P1^(2), . . . , Pm^(2) are two solutions to a given set
of equations, we say that S1 ≤ S2 if relation Pi^(1) is a subset of relation Pi^(2),
for all i, 1 ≤ i ≤ m. Then S0 is the least fixed point of a set of equations, with
respect to the EDB relations R1, . . . , Rk, if for any solution S, we have S0 ≤ S.
More generally, S0 is a minimal fixed point if there is no other fixed point S
such that S < S0. Notice that if there is a least fixed point, then that is the
only minimal fixed point. However, there may be several minimal fixed points
that are not comparable by <, and in that case there is no least fixed point.
Example 3.9: Let us consider the common problem of computing the transitive
closure of a directed graph. If the graph is represented by an EDB predicate
arc such that arc(X, Y) is true if and only if there is an arc from node X to
node Y, then we can express the paths in the graph by rules:
(1) path(X,Y) :- arc(X,Y).
(2) path(X,Y) :- path(X,Z) & path(Z,Y).
That is, the first rule says that a path can be a single arc, and the second says
that the concatenation of any two paths, say one from X to Z and another from
Z to Y, yields a path from X to Y. This pair of rules is not necessarily the best
way we can define paths, but they are probably the most natural way. Note the
analogy between path and arc here and the predicates boss and manages in
Example 1.12. There, we used another, simpler way of computing the transitive
closure of a relation.
We can turn these rules into a single equation for the relation P that cor
responds to the path predicate. The equation assumes there is a given relation
A corresponding to predicate arc.
P(X, Y) = A(X, Y) ∪ π_{X,Y}(P(X, Z) ⋈ P(Z, Y))
(3.5)
Suppose that the nodes are {1, 2, 3} and A represents the arcs 1 → 2 and
2 → 3; that is, A = {(1,2), (2,3)}. The first rule for path tells us that (1,2)
and (2,3) are in P, and the second rule implies that (1,3) is in P. However,
we are not required to deduce the existence of any more paths, because P =
{(1,2), (2,3), (1,3)} is a solution to Equation (3.5). That is,
{(1,2), (2,3), (1,3)} = {(1,2), (2,3)} ∪ π_{X,Y}({(1,2), (2,3), (1,3)} ⋈ {(1,2), (2,3), (1,3)})
is an equality. In interpreting the above, we have to remember that the left
operand of the join is a relation over attribute list X, Z, and its right operand is
a relation over attributes Z, Y. Thus, the expression π_{X,Y}(P(X, Z) ⋈ P(Z, Y))
can be thought of as the composition of the relation P with itself, and its value
here is {(1,3)}.
This solution is the proof-theoretic meaning of the rules, because we derived
from the EDB relation A exactly what the rules allowed us to prove. It is also
easy to see it is the minimal model of the rules or least fixed point of the
equation (3.5) [with respect to the given relation A], because every derived fact
can be shown to be in every model or fixed point containing the EDB relation
A.
However, there are other solutions to (3.5). Suppose we arbitrarily decided
that (1,1) was also in P. The rules do not imply any more paths, given that
A = {(1,2), (2,3)} and P = {(1,1), (1,2), (2,3), (1,3)}. Notice how (1,1)
"proves" itself if we let X = Y = Z = 1 in rule (2). Thus, another solution to
(3.5) is:
{(1,1), (1,2), (2,3), (1,3)} = {(1,2), (2,3)} ∪ π_{X,Y}({(1,1), (1,2), (2,3), (1,3)} ⋈ {(1,1), (1,2), (2,3), (1,3)})
Similarly, we could let P consist of all nine pairs (i, j), where 1 ≤ i, j ≤ 3, and
that value would also satisfy (3.5). On the other hand, not every value of P
satisfies (3.5). For example, still assuming A = {(1,2), (2,3)}, we cannot let
P = {(1,2), (2,3), (1,3), (3,1)}, because the resulting substitution into (3.5),
which is
{(1,2), (2,3), (1,3), (3,1)} = {(1,2), (2,3)} ∪ π_{X,Y}({(1,2), (2,3), (1,3), (3,1)} ⋈ {(1,2), (2,3), (1,3), (3,1)})
is not an equality. The join on the right yields, for example, tuple (3, 1,2) over
attribute list X, Z, Y, which after projection is (3, 2), a tuple that is not on the
left.
As a final example, let us see a model that is not a fixed point. Let A = ∅
and P = {(1,2)}. Then the rules are made true. In rule (1), there is no way
to make the body, arc(X, Y) true, so the rule is true no matter what constants
are substituted for the variables. In rule (2), there is no value we can substitute
for Z that will make both (X, Z) and (Z, Y) be tuples of P, so again the body
cannot be made true and the rule must always be true. We conclude that the
set of facts consisting of path(1, 2) alone is a model of the given datalog rules.
However, (3.5) is not made true; its left side is {(1, 2)} and its right side is
∅ for the given A and P. Thus, P = {(1, 2)} is not a fixed point of the equations
with respect to EDB A = ∅. □
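To make the fixed-point condition concrete, the following Python sketch (illustrative only, not part of the text) checks whether a candidate relation P satisfies Equation (3.5) for a given A, and reproduces the observations of Example 3.9.

    # Illustrative sketch: does a candidate relation P satisfy Equation (3.5)?

    def is_fixed_point(A, P):
        composed = {(x, y) for (x, z) in P for (z2, y) in P if z == z2}
        return P == A | composed

    A = {(1, 2), (2, 3)}
    print(is_fixed_point(A, {(1, 2), (2, 3), (1, 3)}))            # True: the least fixed point
    print(is_fixed_point(A, {(1, 1), (1, 2), (2, 3), (1, 3)}))    # True: a larger fixed point
    print(is_fixed_point(A, {(1, 2), (2, 3), (1, 3), (3, 1)}))    # False: (3,2) would be required
    print(is_fixed_point(set(), {(1, 2)}))                        # False: a model but not a fixed point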
Solving Recursive Datalog Equations
We can solve a set of datalog equations by assuming initially that all the Pi's are
empty, and the Rj's are whatever is given. We then apply EVAL to the current
values of the IDB relations and the permanent values of the EDB relations,
to get new values for the IDB relations. This process repeats, until at some
point, none of the Pj's change. We know the IDB relations must converge in
this sense, because the EVAL operation is "monotone," a property that we shall
define more formally later, but which essentially means that when you add more
tuples to some of the arguments of the operation, the result cannot lose tuples.
Algorithm 3.3: Evaluation of Datalog Equations.
INPUT: A collection of datalog rules with EDB predicates r1, . . . , rk and IDB
predicates p1, . . . , pm. Also, a list of relations R1, . . . , Rk to serve as values of
the EDB predicates.
OUTPUT: The least fixed point solution to the datalog equations obtained from
these rules.
METHOD: Begin by setting up the equations for the rules. These equations have
variables P1, . . . , Pm corresponding to the IDB predicates, and the equation for
Pi is Pi = EVAL(pi, R1, . . . , Rk, P1, . . . , Pm). We then initialize each Pi to the
empty set and repeatedly apply EVAL to obtain new values for the Pi's. When
no more tuples can be added to any IDB relation, we have our desired output.
The details are given in the program of Figure 3.3. D
for i := 1 to m do
    Pi := ∅;
repeat
    for i := 1 to m do
        Qi := Pi; /* save old values of Pi's */
    for i := 1 to m do
        Pi := EVAL(pi, R1, . . . , Rk, Q1, . . . , Qm);
until Pi = Qi for all i, 1 ≤ i ≤ m;
output Pi's
Figure 3.3 Simple evaluation algorithm.
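For concreteness, here is a small Python sketch of the naive iteration of Figure 3.3, specialized (purely as an assumption, for brevity) to the path rules of Example 3.9; EVAL recomputes the IDB relation from the full current relations on every round.

    # Illustrative sketch of the naive iteration, specialized to the path rules of Example 3.9.

    def eval_path(A, P):
        # EVAL(path, A, P) = A U pi_{X,Y}(P(X,Z) |><| P(Z,Y))
        return A | {(x, y) for (x, z) in P for (z2, y) in P if z == z2}

    def naive(A):
        P = set()
        while True:
            Q = P                     # save the old value of P
            P = eval_path(A, Q)       # recompute from the full old relation
            if P == Q:
                return P

    print(sorted(naive({(1, 2), (2, 3), (3, 4)})))
    # [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]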
Example 3.10: Consider the rules of Figure 3.1 and the particular relation
P for the EDB predicate parent shown in Figure 3.4. In that figure, an edge
downward from x to y means that x is a parent of y; i.e., parent(y,x) is true.
The EVAL formulas for predicates sibling, cousin, and related, or equivalently
their relation variables 5, C, and R, are the formulas given on the right sides
of the equations in Example 3.8. When we apply Algorithm 3.3, the relation P
remains fixed; it contains the tuples ca, da, and so on, indicated by Figure 3.4.
[Note we use the compact notation for tuples here, ca instead of (c, a), and so
on.]
Figure 3.5 Application of the algorithm of Figure 3.3 (a table of the tuples added to S, C, and R on each round).
sibling pair dc causes the reverses of each of these pairs to be placed in C, but
we don't list these pairs because of our convention (just as we did not explicitly
show that dc is in S).
In the third round, rule (3) for cousin could cause more tuples to be added
to C. For example, the fact that h and i were discovered to be cousins in round
2 (they are children of siblings d and e) tells us in round 3 that j and k are
cousins. However, we already discovered that fact in round 2.
Rules (4)-(6) for related are similarly applied starting at round 2. It takes
until round 5 for all the tuples in R to be deduced. For example, the fact that
f and j are related is not deduced until that round.9 □
Monotonicity
To prove that Algorithm 3.3 converges at all, let alone that it converges to
the least fixed point, requires establishing that repeated application of EVAL
produces for each IDB predicate a sequence of relations that are progressively
larger, until at some point they stop growing and remain fixed. We need the
9 Note that parenthood does not imply a relationship between f and j by rules (4)-(6).
Rather, f and j are related because c and d are siblings, f is a descendant of c and j is
a descendant of d.
relations. However, since there can be no incremental tuples for EDB relations,
we may take the union over the subgoals with IDB predicates only, except
on the first round. On the first round, we must use the full relations for all
predicates. However, since the IDB predicates have empty relations on round
1, we in effect use only the EDB relations on round 1.
Let us define more formally the operation of incremental evaluation of
the relations associated with rules and predicates. Let r be a rule with ordinary
subgoals S1, . . . , Sn; we exclude from this list any subgoals with built-in
predicates. Let R1, . . . , Rn be the current relations associated with subgoals
S1, . . . , Sn, respectively, and let ΔR1, . . . , ΔRn be the list of corresponding
incremental relations, the sets of tuples added to R1, . . . , Rn on the most recent
round. Recall that EVAL-RULE(r, T1, . . . , Tn) is the algebraic expression used
by Algorithm 3.1 to compute the relation for the body of rule r, when that
algorithm uses relation Ti as the relation for subgoal Si (Ti is Ri in Algorithm
3.1). Then the incremental relation for rule r is the union of the n relations
EVAL-RULE(r, R1, . . . , Ri-1, ΔRi, Ri+1, . . . , Rn)
for 1 ≤ i ≤ n. That is, in each term, exactly one incremental relation is
substituted for the full relation. Formally, we define:
EVAL-RULE-INCR(r, R1, . . . , Rn, ΔR1, . . . , ΔRn) = ⋃_{i=1}^{n} EVAL-RULE(r, R1, . . . , Ri-1, ΔRi, Ri+1, . . . , Rn)
Remember that all rules are assumed rectified, so the union is appropriate here,
just as it was in Algorithm 3.3.
Now, suppose we are given relations R1, . . . , Rk for the EDB predicates
r1, . . . , rk. For the IDB predicates p1, . . . , pm we are given associated relations
P1, . . . , Pm and associated incremental relations ΔP1, . . . , ΔPm. Let p be an
IDB predicate. Define:
EVAL-INCR(p, R1, . . . , Rk, P1, . . . , Pm, ΔP1, . . . , ΔPm)
to be the union of what EVAL-RULE-INCR produces for each rule for p. In
each application of EVAL-RULE-INCR, the incremental relations for the EDB
predicates are ∅, so the terms for those subgoals that are EDB predicates do
not have to appear in the union for EVAL-RULE-INCR.
Example 3.12: Consider the rules of Figure 3.1 again. Let P, S, C, and R
be the relations for parent, sibling, cousin, and related, as before, and let ΔS,
ΔC, and ΔR be the incremental relations for the last three of these predicates,
which are the IDB relations. Since sibling is defined only in terms of the EDB
relation parent, we find
EVAL-INCR(sibling, P) = ∅
That is, EVAL-RULE-INCR for rule (1) is a union over an empty set of subgoals
that have IDB predicates. This situation is not alarming, since we saw in
Example 3.10 that S will get all the tuples it is ever going to get on the first
round [and incremental evaluation starts by applying EVAL(sibling, P) once].
Predicate cousin is defined by rules (2) and (3), and these rules each have
only one IDB predicate: sibling in (2) and cousin in (3). Thus, for each of
these rules EVAL-RULE-INCR has only one term, and the formula for cousin has
the union of the terms for each of the two rules:
EVAL-INCR(cousin, P, S, C, ΔS, ΔC) = π_{X,Y}(P(X, Xp) ⋈ P(Y, Yp) ⋈ ΔS(Xp, Yp)) ∪ π_{X,Y}(P(X, Xp) ⋈ P(Y, Yp) ⋈ ΔC(Xp, Yp))
Semi-Naive Evaluation
These definitions are used in the following improvement to Algorithm 3.3. The
algorithm below, taking advantage of incremental relations, is sometimes called
"semi-naive," compared with the simpler but less efficient Algorithm 3.3, which
is called "naive." In Chapter 13 (Volume II) we shall examine some algorithms
that are more efficient still, and do not warrant the appellation "naive."
Algorithm 3.4: Semi-Naive Evaluation of Datalog Equations.
INPUT: A collection of rectified datalog rules with EDB predicates r1, . . . , rk
and IDB predicates p1, . . . , pm. Also, a list of relations R1, . . . , Rk to serve as
values of the EDB predicates.
OUTPUT: The least fixed point solution to the relational equations obtained
from these rules.
METHOD: We use EVAL once to get the computation of relations started, and
then use EVAL-INCR repeatedly on incremental IDB relations. The computation
is shown in Figure 3.6, where for each IDB predicate pi, there is an associated
relation Pi that holds all the tuples, and there is an incremental relation ΔPi
that holds only the tuples added on the previous round. □
Example 3.13: Let us continue with Example 3.12. On the first round, which
is the initial for-loop of Figure 3.6, we use the ordinary EVAL operation. As
we saw in Example 3.10, only relation S for sibling gets any tuples on this
round, because only that predicate has a rule without IDB predicates in the
body. Thus, on the second round, S and ΔS are both the complete relation for
sibling, while all other IDB relations and incremental relations are empty.
for i := 1 to m do begin
    Pi := EVAL(pi, R1, . . . , Rk, ∅, . . . , ∅);
    ΔPi := Pi
end;
repeat
    for i := 1 to m do
        ΔQi := ΔPi; /* save old values of ΔPi's */
    for i := 1 to m do begin
        ΔPi := EVAL-INCR(pi, R1, . . . , Rk, P1, . . . , Pm, ΔQ1, . . . , ΔQm);
        ΔPi := ΔPi − Pi /* remove "new" tuples
                           that actually appeared before */
    end;
    for i := 1 to m do
        Pi := Pi ∪ ΔPi
until ΔPi = ∅ for all i;
output Pi's
Figure 3.6 Semi-naive evaluation of datalog programs.
On the second round, i.e., the first time through the repeat-loop of Figure
3.6, ΔS becomes equal to ∅, since this is what EVAL-INCR returns, as discussed
in Example 3.12. The terms from rules (2) and (4) now contribute some tuples
to ΔC and ΔR, respectively, and these tuples then find their way into C and
R at the end of the repeat-loop. That is, on round 2 we compute:
C = ΔC = π_{X,Y}(P(X, Xp) ⋈ P(Y, Yp) ⋈ ΔS(Xp, Yp))
R = ΔR = ΔS
On the third round, since ΔS is empty, rules (2) and (4) can no longer
yield new tuples, but as ΔC and ΔR now have some tuples, rules (3), (5), and
(6) may. We thus compute:
ΔC = π_{X,Y}(P(X, Xp) ⋈ P(Y, Yp) ⋈ ΔC(Xp, Yp))
ΔR = π_{X,Y}(ΔR(X, Z) ⋈ P(Y, Z)) ∪ π_{X,Y}(ΔR(Z, Y) ⋈ P(X, Z))
The values of ΔC and ΔR are accumulated into C and R, respectively, and,
provided they are not both empty, we repeat another round in the same way. □
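The same computation can be phrased as a Python sketch (again specialized to the path rules of Example 3.9, as an illustration only): on each round only the tuples discovered on the previous round are joined against the full relation, in the spirit of Figure 3.6.

    # Illustrative sketch of semi-naive evaluation, specialized to the path rules.

    def seminaive_path(A):
        P = set(A)           # round 1: EVAL with the IDB relation empty yields just A
        delta = set(A)
        while delta:
            # EVAL-INCR: substitute the incremental relation for one subgoal at a time
            new = ({(x, y) for (x, z) in delta for (z2, y) in P if z == z2}
                   | {(x, y) for (x, z) in P for (z2, y) in delta if z == z2})
            delta = new - P          # remove "new" tuples that actually appeared before
            P |= delta
        return P

    print(sorted(seminaive_path({(1, 2), (2, 3), (3, 4)})))
    # [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]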
Theorem 3.5: Algorithm 3.4 correctly computes the least fixed point of its
given rules and given EDB relations.
Proof: We shall show that Algorithms 3.3 and 3.4 compute the same sets of
tuples for each of the IDB relations on each round. Since Algorithm 3.3 was
shown to compute the least fixed point, we shall thus conclude that Algorithm
3.4 does so too. The actual inductive hypothesis we need is that a tuple added
to some IDB relation P in round j by Algorithm 3.3, not having been placed
in that relation on any prior round, will be placed in both P and AP on round
j by Algorithm 3.4. The basis, round 1, is immediate, since the same formulas,
given by EVAL, are used by both algorithms.
For the induction, one has only to notice that if a tuple μ is added to some
IDB relation P on round i, and μ was not previously in P, then there must
be some rule r for predicate p (the predicate corresponding to relation P) and
tuples in the relations for all the subgoals of r such that
1. The tuples for the subgoals together yield μ, and
2. At least one of these tuples, say ν, was added to its relation, say T, on
round i − 1.
By the inductive hypothesis with j = i − 1 and observation (2) above, ν is in
ΔT when we start round i of Algorithm 3.4. Therefore the term of EVAL-INCR
that uses ΔT (or rather its copy into some ΔQj) will produce μ, since that
term uses full relations for subgoals other than the one that supplies ν, and ν
will be supplied by ΔT. □
3.6 NEGATIONS IN RULE BODIES
There are frequent situations where we would like to use negation of a predicate
to help express a relationship by logical rules. Technically, rules with negated
subgoals are not Horn clauses, but we shall see that many of the ideas developed
so far apply to this broader class of rules. In general the intuitive meaning of
a rule with one or more negated subgoals is that we should complement the
relations for the negated subgoals, and then compute the relation of the rule
exactly as we did in Algorithm 3.1.
Unfortunately, the "complement" of a relation is not a well-defined term.
We have to specify the relation or domain of possible values with respect to
which the complement is taken. That is why relational algebra uses a set-difference operator, but not a complementation operator. But even if we specify
the universe of possible tuples with respect to which we compute the comple
ment of a relation, we are still faced with the fact that this complement will
normally be an infinite relation. We cannot, therefore, apply operations like
selection or join to the complement, and we cannot perform Algorithm 3.1 on
a rule with negation in a straightforward manner.
It turns out that one critical issue we face when trying to define the meaning
of rules with negated subgoals is whether the variables appearing in the negated
subgoals also appear in nonnegated, ordinary (non-built-in) subgoals. In the
next example, we see what happens when things work right, and then we see
where problems arise when variables appear only in negated subgoals. Later,
we examine another problem that comes up when some subgoals are negated:
there is not necessarily a least fixed point for a logic program. Furthermore,
since we have no mechanism for proving negated facts, the proof-theoretic point
of view does not help us, and we are forced to select one of the minimal models
as the "meaning" of the logic program.
Example 3.14: Suppose we want to define "true cousins" to be individuals
who are related by the cousin predicate of Figure 3.1 but who are not also
related by the sibling relationship. We might write
trueCousin(X,Y) :- cousin(X,Y) & ¬sibling(X,Y).
This rule is very much like an application of the difference operator of relational
algebra, and indeed we can compute T = C − S, where T is the relation for
trueCousin, and C and S are the relations for cousin and sibling, computed
as in the previous section.
The formula T = C − S is easily seen to give the same relation as
males who are not married to absolutely everybody in the universe; that is,
there exists some Y such that Y is not married to X.
To avoid this apparent divergence between what we intuitively expect a
rule should mean and what answer we would get if we interpreted negation in
the obvious way (complement the relation), we shall forbid the use of a variable
in a negated subgoal if that variable does not also appear in another subgoal,
and that subgoal is neither negated nor a built-in predicate. This restriction is
not a severe one, since we can always rewrite the rule so that such variables do
not appear.12 For example, to make the attributes of the two relations involved
in (3.6) be the same, we need to project out Y from married; that is, we rewrite
the rules as:
husband(X) :- married(X,Y).
bachelor(X) :- male(X) & ¬husband(X).
These rules can then have their meaning expressed by:
husband(X) = π_X(married(X, Y))
bachelor(X) = male(X) − husband(X)
or just:
bachelor(X) = male(X) − π_X(married(X, Y))
□
While we shall forbid variables that appear only in negated subgoals, the
condition found in Example 3.14 and in the rewritten rules of Example 3.15,
which is that the set of variables in a negated subgoal exactly match the vari
ables of a nonnegated subgoal, is not essential. The next example gives the idea
of what can be done in cases when there are "too few" variables in a negated
subgoal.
Example 3.16: Consider:
canBuy(X,Y) :- likes(X,Y) & ¬broke(X).
Here, likes and broke are presumed EDB relations. The intention of this rule
evidently is that X can buy Y if X likes Y and X is not broke. Recall the
relation for this rule is a join involving the "complement" of broke, which we
might call notBroke. The above rule can then be expressed by the equivalent
relational algebra equation:
canBuy(X, Y) = likes(X, Y) ⋈ notBroke(X)
(3.7)
The fact that notBroke may be infinite does not prevent us from computing
12 Provided, of course, that we take the interpretation of ¬q(X1, . . . , Xn) to be that used
implicitly in (3.6): "there do not exist values of those variables among X1, . . . , Xn that
appear only in negated subgoals such that these values make q(X1, . . . , Xn) true."
the right side of (3.7), because we can start with all the likes(X, Y) tuples and
then check that each one has an X-component that is a member of notBroke,
or equivalently, is not a member of broke.
As we did in the previous two examples, we can express (3.7) as a set
difference of finite relations if we "pad" the broke tuples with all possible objects
that could be liked. But there is no way to say "all objects" in relational algebra,
nor should there be, since that is an infinite set.
We have to realize that we do not need all pairs (X, Z) such that X is broke
and Z is anything whatsoever, since all but a finite number of the possible Z's
will not appear as a second component of a likes tuple, and therefore could not
possibly be in the relation canBuy anyway. The set of possible Z's is expressed
in relational algebra as π_2(likes), or equivalently, π_Y(likes(X, Y)). We may
then express canBuy in relational algebra as:
canBuy(X, Y) = likes(X, Y) − (broke(X) × π_Y(likes(X, Y)))
□
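The following Python sketch (with invented sample data, not from the text) carries out this computation for the canBuy rule: rather than complementing the infinite relation broke, it keeps a likes-tuple only if its first component is not a broke individual.

    # Illustrative sketch with invented data for
    #    canBuy(X,Y) :- likes(X,Y) & not broke(X)

    likes = {("ann", "car"), ("bob", "car"), ("bob", "boat")}
    broke = {("bob",)}

    can_buy = {(x, y) for (x, y) in likes if (x,) not in broke}
    print(can_buy)    # {('ann', 'car')}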
Nonuniqueness of Minimal Fixed Points
Adjusting the attribute sets in differences of relations is important, but it does
not solve all the potential problems of negated subgoals. If S1 and S2 are two
solutions to a logic program, with respect to a given set of EDB relations, we
say S1 < S2 if S1 ≤ S2 and S1 ≠ S2. Recall that fixed point S1 is said to be
minimal if there is no fixed point S such that S < S1. Also, S1 is said to be
a least fixed point if S1 ≤ S for all fixed points S. When rules with negation
are allowed, there might not be a least fixed point, but several minimal fixed
points. If there is no unique least fixed point, what does a logic program mean?
Example 3.17: Consider the rules:
(1) p(X) :- r(X) & ¬q(X).
(2) q(X) :- r(X) & ¬p(X).
Let P, Q, and R be the relations for IDB predicates p and q, and EDB predicate
r, respectively. Suppose R consists of the single tuple 1; i.e., R = {1}. Let S1
be the solution P = ∅ and Q = {1}; let S2 have P = {1} and Q = ∅. Both S1
and S2 are solutions to the equations P = R − Q and Q = R − P.13
Observe that S1 < S2 is false, because of the respective values of Q, and
S2 < S1 is false because of P. Moreover, there is no solution S such that S < S1
or S < S2. The reason is that such an S would have to assign ∅ to both P and
Q. But then P = R − Q would not hold.
We conclude that both S1 and S2 are fixed points, and that they are both
minimal. Thus, the set of rules above has no least fixed point, because if there
were a least fixed point S, we would have S ≤ S1 and S ≤ S2. □
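A quick, illustrative check in Python confirms that both S1 and S2 satisfy the two equations when R = {1}, and that the empty assignment does not.

    # Illustrative check of Example 3.17 with R = {1}.

    def is_solution(R, P, Q):
        return P == R - Q and Q == R - P

    R = {1}
    print(is_solution(R, set(), {1}))      # True  (S1)
    print(is_solution(R, {1}, set()))      # True  (S2)
    print(is_solution(R, set(), set()))    # False (so neither S1 nor S2 can be shrunk)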
Stratified Negation
To help deal with the problem of many minimal fixed points, we shall permit
only "stratified negation." Formally, rules are stratified if whenever there is a
rule with head predicate p and a negated subgoal with predicate q, there is no
path in the dependency graph from p to q.14 Restriction of rules to allow only
stratified negation does not guarantee a least fixed point, as the next example
shows. However, it does allow a rational selection from among minimal fixed
points, giving us one that has become generally accepted as "the meaning" of
a logic program with stratified negation.
Example 3.18: Consider the stratified rules:15
(1) p(X) :- r(X).
(2) p(X) :- p(X).
(3) q(X) :- s(X) & ¬p(X).
The above set of rules is stratified, since the only occurrence of a negated
subgoal, ¬p(X) in rule (3), has a head predicate, q, from which there is no path
to p in the dependency graph. That is, although q depends on p, p does not
depend on q.
Let EDB predicates r and s have corresponding relations R and S, and let
IDB predicates p and q have relations P and Q. Suppose R = {1} and S = {1, 2}.
13 Note that rules (1) and (2) are logically equivalent, but these two set-valued equations are
not equivalent; certain sets P, Q, and R satisfy one but not the other. This distinction
between logically equivalent forms as we convert logic into computation should be seen
as a "feature, not a bug." It allows us, ultimately, to develop a sensible semantics for a
large class of logical rules with negation.
14 The construction of the dependency graph does not change when we introduce negated
subgoals. If ¬q(X1, . . . , Xn) is such a subgoal, and the rule has head predicate p, we
draw an arc from q to p, just as we would if the ¬ were not present.
15 If one does not like the triviality of the rule (2), one can develop a more complicated
example along the lines of Example 3.9 (paths in a graph) that exhibits the same problem
as is illustrated here.
Since not every logic program with negations is stratified, it is useful to have an
algorithm to test for stratification. While this test is quite easy, we explain it
in detail because it also gives us the stratification of the rules; that is, it groups
the predicates into strata, which are the largest sets of predicates such that
1. If a predicate p has a rule with a subgoal that is a negated q, then q is in
a lower stratum than p.
2. If predicate p has a rule with a subgoal that is a nonnegated q, then the
stratum of p is at least as high as the stratum of q.
The strata give us an order in which the relations for the IDB predicates may
be computed. The useful property of this order is that following it, we may
treat any negated subgoals as if they were EDB relations.
Algorithm 3.5: Testing For and Finding a Stratification.
INPUT: A set of datalog rules, possibly with some negated subgoals.
OUTPUT: A decision whether the rules are stratified. If so, we also produce a
stratification.
METHOD: Start with every predicate assigned to stratum 1. Repeatedly examine
the rules. If a rule with head predicate p has a negated subgoal with predicate q,
let p and q currently be assigned to strata i and j respectively. If i ≤ j, reassign
p to stratum j + 1. Furthermore, if a rule with head p has a nonnegated subgoal
with predicate q of stratum j, and i < j, reassign p to stratum j. These laws
are formalized in Figure 3.7.
If we reach a condition where no strata can be changed by the algorithm of
Figure 3.7, then the rules are stratified, and the current strata form the output
of the algorithm. If we ever reach a condition where some predicate is assigned
a stratum that is larger than the total number of predicates, then the rules are
not stratified, so we halt and return "no." D
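A Python sketch of this procedure follows; the encoding of rules as (head, body) pairs is an assumption made only for the illustration and is not a notation used in the text.

    # Illustrative sketch of Algorithm 3.5; the rule encoding is an assumption for this example.
    # A rule is (head, [(subgoal_predicate, negated?), ...]); built-in subgoals are omitted.

    def stratify(rules):
        preds = {h for h, _ in rules} | {q for _, body in rules for q, _ in body}
        stratum = {p: 1 for p in preds}
        n = len(preds)
        changed = True
        while changed:
            changed = False
            for head, body in rules:
                for q, negated in body:
                    if negated and stratum[head] <= stratum[q]:
                        stratum[head] = stratum[q] + 1
                        changed = True
                    elif not negated and stratum[head] < stratum[q]:
                        stratum[head] = stratum[q]
                        changed = True
                    if stratum[head] > n:
                        return None           # some stratum exceeds the number of predicates
        return stratum

    # Example 3.18:  p :- r.   p :- p.   q :- s & not p.
    print(stratify([("p", [("r", False)]),
                    ("p", [("p", False)]),
                    ("q", [("s", False), ("p", True)])]))
    # q is assigned stratum 2; p, r, and s remain in stratum 1

    # Example 3.17:  p :- r & not q.   q :- r & not p.
    print(stratify([("p", [("r", False), ("q", True)]),
                    ("q", [("r", False), ("p", True)])]))
    # None: the rules are not stratified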
16 If we did not have rule (2), then the first equation would be P = R, and there would be
a unique solution to the equations.
the stratum of predicates along any path in the dependency graph can never
decrease, because those paths go from subgoal to head, and the stratum of a
head predicate is never less than the stratum of one of its subgoals.
Suppose a program had a stratification, but was not stratified. Then there
would be a path in the dependency graph to some q from some p, such that
negated q was a subgoal of a rule r for p. The existence of the path says that
the stratum of q is at least as high as the stratum of p, yet the rule r requires
that the stratum of q be less than that of p. D
Lemma 3.3: If a logic program is stratified, then Algorithm 3.5 halts on that
program without producing a stratum higher than n, the number of predicates
in the program.
Proof: Each time we increase the stratum of some predicate p because of
some predicate q in the algorithm of Figure 3.7, it must be that q is a subgoal
(negated or not) of a rule for p. If we increase stratum[p] to i, and q is not
negated, then write q → p; if q is negated, write q ⇒ p. For example, the
sequence of stratum changes discussed in Example 3.19 for the nonstratified
rules of Example 3.17 is q ⇒ p ⇒ q ⇒ p.
For technical reasons, it is convenient to add a new symbol start, which is
assumed not to be a predicate. We then let start ⇒ p for all predicates p.
It is an easy induction on the number of times Algorithm 3.5 changes a
stratum that if we set the stratum of a predicate p to i, then there is a chain of
→ and ⇒ steps from start to p that includes at least i ⇒ steps. The key point
in the proof is that if the last step by which Algorithm 3.5 makes the stratum
of p reach i is q ⇒ p, then there is a chain with at least i − 1 ⇒ steps to q, and
one more makes at least i ⇒'s to p. If the step by which Algorithm 3.5 makes
the stratum of p reach i is q → p, then there is already a chain including at
least i ⇒'s to q, and this chain can be extended to p.
Now, notice that if the stratum of some predicate reaches n + 1, there is
a chain with at least n + 1 ⇒'s. Thus some predicate, say p, appears twice as
the head of a ⇒. Thus, a part of the chain is
qi ⇒ p ··· qj ⇒ p
where i < j. Also, observe that every portion of the chain is a path in the
dependency graph; in particular, there is a path from p to qj in the dependency
graph.
The fact that qj ⇒ p is a step implies that there is a rule with head p
and negated subgoal qj. Thus, there is a path in the dependency graph from
the head, p, of some rule to a negated subgoal, qj, of that rule, contradicting
the assumption that the logic program is stratified. We conclude that if the
program is stratified, no stratum produced by Algorithm 3.5 ever exceeds n,
and therefore, Algorithm 3.5 must eventually halt and answer "yes." D
Theorem 3.6: Algorithm 3.5 correctly determines whether a datalog program
with negation is stratified.
Proof: Evidently, if Algorithm 3.5 halts and says the program is stratified,
then it has produced a valid stratification. Lemma 3.2 says that if there is a
stratification, then the program is stratified, and Lemma 3.3 says that if the
logic program is stratified, then Algorithm 3.5 halts and says "yes" (the program
is stratified). We conclude that the algorithm says "yes" if and only if the given
logic program is stratified. D
Corollary 3.3: A logic program is stratified if and only if it has a stratification.
Proof: The three-step implication in the proof of Theorem 3.6 incidentally
proves that the three conditions "stratified," "has a stratification," and "Algo
rithm 3.5 says 'yes''" all are equivalent. D
Safe, Stratified Rules
In order that a sensible meaning for rules can be defined we need more than
stratification; we need safety. Recall that we defined rules to be "safe" in
Section 3.2 if all their variables were limited, either by being an argument of
a nonnegated, ordinary subgoal, or by being equated to a constant or to a
limited variable, perhaps through a chain of equalities. When we have negated
subgoals, the definition of "safe" does not change. We are not allowed to use
negated subgoals to help prove variables to be limited.
Example 3.20: The rules of Examples 3.16, 3.17, and 3.18 are all safe. The
rule of Example 3.15 is not safe, since Y appeared in a negated subgoal but in
no nonnegated subgoal, and therefore could not be limited. However, as we saw
in that example, we can convert that rule to a pair of safe rules that intuitively
mean the same thing. D
When rules are both safe and stratified, there is a natural choice from
among possible fixed points that we shall regard as the "meaning" of the rules.
We process each stratum in order, starting with the lowest first. Suppose we
are working on a predicate p of stratum i. If a rule for p has a subgoal with
a predicate q of stratum less than i, we can obtain q's relation, because that
relation is either an EDB relation or has been computed when we worked on
previous strata. Of course, no subgoal can be of stratum above t, if we have a
valid stratification. Moreover, if the subgoal is negated, then stratum of q must
be strictly less than t.
As a consequence of these properties of a stratification, we can view the set
of rules for the predicates of stratum i as a recursive definition of the relations
for exactly the stratum-t predicates, in terms of relations for the EDB relations
and all IDB relations of lower strata. As the equations for the IDB predicates
3.3 for computing least fixed points when there is no negation) and a simple
induction on the strata. That is, we show by induction on i that the equations
derived from the rules with heads of stratum i are satisfied.
As for showing we have a minimal fixed point, we can actually show more.
The perfect fixed point S has the following properties:
1. If S1 is any other fixed point, then for every predicate p of stratum 1, p's
relation in S is a subset (not necessarily proper) of p's relation in S1.
2. For all i > 1, if Si is any fixed point that agrees with S on the relations
for all predicates of strata less than i, then the relations for the predicates
of stratum i are subsets in S of their relations in Si.
It follows from (1) and (2) that S is a minimal fixed point. In fact, S
is "least" of all minimal fixed points if one puts the most weight on having
small relations at the lowest strata. All the results mentioned above are easy
inductions on the strata, and we shall leave them as exercises for the reader.
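As an illustration of the stratum-by-stratum evaluation just described, the following Python sketch processes the rules of Example 3.18 with the relations given there (R = {1} and S = {1, 2}): the stratum-1 predicate p is computed by ordinary fixed-point iteration, and only then is q computed, with the negated subgoal treated like a finished relation.

    # Illustrative sketch of stratum-by-stratum evaluation for Example 3.18 (R = {1}, S = {1, 2}).

    R = {1}
    S = {1, 2}

    # Stratum 1:  p(X) :- r(X).   p(X) :- p(X).   (least fixed point of P = R U P)
    P = set()
    while True:
        new_P = R | P
        if new_P == P:
            break
        P = new_P

    # Stratum 2:  q(X) :- s(X) & not p(X).   (p is now treated like an EDB relation)
    Q = S - P
    print(P, Q)    # {1} {2}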
3.7 RELATIONAL ALGEBRA AND LOGIC
We can view relational algebra expressions as defining functions that take given
relations as arguments and that produce a value, which is a computed relation.
Likewise, we know that datalog programs take EDB relations as arguments and
produce IDB relations as values. We might ask whether the functions defined by
relational algebra and by logic programs are the same, or whether one notation
is more expressive than the other.
The answer, as we shall prove in this section, is that without negation in
rules, relational algebra and datalog are incommensurate in their expressive
power; there are things each can express that the other cannot. With negation,
datalog is strictly more expressive than relational algebra. In fact, the set of
functions expressible in relational algebra is equivalent to the set of functions
we can express in datalog (with negation) if rules are restricted to be safe,
nonrecursive, and have only stratified negation. In this section, "nonrecursive
datalog" will be assumed to refer to rules of this form unless stated otherwise.
Note that since the rules are nonrecursive, it is easy to see that they must be
stratified.
From Relational Algebra to Logical Rules
Mimicking the operations of relational algebra with datalog rules is easy except
for selections that involve complex conditions. Thus, we begin with two lemmas
that let us break up selections by arbitrary formulas into a cascade of unions
and selections by simpler formulas. Then we give a construction of rules from
arbitrary relational algebra formulas.
Lemma 3.4: Every selection is equivalent to a selection that does not use the
NOT operator.
The first argument of the union is simple, but the second requires an application
of Lemma 3.5 to the ∧, leaving
Example 3.23: Let us consider the algebraic expression
canBuy(X, Y) = likes(X, Y) − (broke(X) × πY(likes(X, Y)))
developed in Example 3.16. The outermost operator is −, with left operand
likes(X, Y) and right operand equal to an expression that we shall name for
convenience:
brokePair(X, Y) = broke(X) × πY(likes(X, Y))
The left operand, being an EDB relation, requires no rules. The right
operand has outermost operator ×, with a left operand that is an EDB relation
and right operand πY(likes(X, Y)). The latter expression can be transformed
into a rule by Case 3 of Theorem 3.7; it is:
liked(Y) :- likes(X,Y).
Here we have invented the predicate name liked for the predicate whose relation
is the same as that of the expression πY(likes(X, Y)).
Now, we can write the rule for the expression brokePair, using Case 4 of
Theorem 3.7:
brokePair(X,Y) :- broke(X) & liked(Y).
Finally, we use Case 2 of Theorem 3.7 to produce the rule for canBuy:
canBuy(X,Y) :- likes(X,Y) & ¬brokePair(X,Y).
Notice that the three rules developed here are the same as the rules pro
duced by an ad-hoc argument in Example 3.16. □
From Logic to Algebra
Now, we shall prove the converse of Theorem 3.7; for every nonrecursive datalog
program, every IDB relation can be computed by an equivalent expression of
relational algebra. Essentially all the ideas for constructing the desired algebraic
expression from a collection of nonrecursive rules have been given; we only have
to put them together properly.
Theorem 3.8: Let R be a collection of safe, nonrecursive datalog rules, possibly with negated subgoals. Then for each predicate p of R there is an expression
of relational algebra that computes the relation for p.
Proof: Since R is nonrecursive, we can order the predicates according to a
topological sort of the dependency graph; that is, if q appears as a subgoal in a
rule for p, then q precedes p in the order. Essentially, we apply Algorithm 3.2 to
evaluate the relation for each predicate in its turn. However, as we now have the
possibility of negated subgoals, we first use the trick of Algorithm 3.6 to replace
relations R for negated subgoals by complementary relations R̄ = DOM^k − R,
where k is the arity of R, and DOM is the set of all symbols appearing in R
and in the EDB relations.
The set DOM can always be expressed in relational algebra; it is the union
of a constant set and projections of the EDB relations. Also, the construction
of Algorithm 3.2 uses only the operators of relational algebra. As we may
compose these algebraic operations into expressions with as many operators as
we need, we can easily show by induction on the order in which the predicates
are considered that each has a relation defined by some expression of relational
algebra. D
Example 3.24: Consider the rules
p(X) :- r(X,Y) & ¬s(Y).
q(Z) :- s(Z) & ¬p(Z).
Assume r and s are EDB predicates with relations R and S; we shall derive
expressions for relations P and Q, which correspond to IDB predicates p and
q, respectively. The algebraic expression for DOM is the projection of R onto
its first and second components, plus the unary relation S itself; that is:
DOM = π1(R) ∪ π2(R) ∪ S
We must use the topological order p, q. Predicate p is defined by the first
rule. For the first subgoal we can use the EDB relation R(X, Y), and for the
second subgoal we use the complementary relation [DOM − S](Y), i.e., the
unary relation DOM − S regarded as a relation over attribute Y. As required
by Algorithm 3.2, we take the join of these relations and project onto attribute
X, the sole attribute of the head. The resulting expression is:
P(X) = πX(R(X, Y) ⋈ [DOM − S](Y))
(3.8)
Next we construct the expression for Q according to rule (2). For the
first subgoal of rule (2) we use relation S(Z). For the second subgoal we need
[DOM − P](Z). Thus, the relation Q is S(Z) ⋈ [DOM − P](Z), or, since the
join is an intersection in this case,
Q(Z) = [S ∩ (DOM − P)](Z)
Since S is a subset of DOM, the above simplifies to Q(Z) = S(Z) − P(Z), or,
substituting (3.8), with Z in place of X, for P(Z),
Q(Z) = S(Z) − πZ(R(Z, Y) ⋈ [DOM − S](Y))
One can further argue that DOM can be replaced by π2(R) in the above expression. The reason is that [DOM − S](Y) is joined with R(Z, Y), so only
those elements of DOM that are derived from the second component of R could
contribute to the join. □
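For readers who like to check such constructions by hand, the following Python sketch mirrors Example 3.24 directly (the small relations chosen for R and S are made up): it computes DOM, the complement DOM − S used for the negated subgoal, and then P and Q as in (3.8) and the simplified expression for Q.

    R = {("a", "b"), ("b", "c")}     # relation for EDB predicate r
    S = {("b",)}                     # relation for EDB predicate s

    # DOM = pi1(R) U pi2(R) U S
    DOM = {x for (x, _) in R} | {y for (_, y) in R} | {z for (z,) in S}

    # Complement of S, as in Algorithm 3.6:  [DOM - S]
    not_S = {(d,) for d in DOM} - S

    # (3.8):  P(X) = piX( R(X,Y) join [DOM - S](Y) )
    P = {(x,) for (x, y) in R if (y,) in not_S}

    # Simplified expression:  Q(Z) = S(Z) - P(Z)
    Q = S - P

    print(DOM, P, Q)    # {'a', 'b', 'c'}  {('b',)}  set()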
Monotone Relational Algebra
Recall from Theorem 3.3 that of the five basic relational algebra operations, all
but set difference are monotone. The operations union, product, selection, and
projection form the monotone subset of relational algebra. We also include in
the monotone subset any operation derivable from these four operators, such
as natural join. Finally, intersection, even though it was defined using set
difference, is in fact a monotone operator.
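The contrast between the monotone operators and set difference is easy to see on tiny relations; the following Python lines (with arbitrary illustrative data) check that enlarging an argument of intersection can only enlarge the result, while enlarging the relation being subtracted can shrink a difference.

    R_small = {(1,), (2,)}
    R_big   = R_small | {(3,)}                 # R_small is a subset of R_big
    S       = {(2,), (3,)}

    # Intersection is monotone: growing R can only grow R & S.
    assert (R_small & S) <= (R_big & S)

    # Difference is not monotone in the subtracted relation:
    # the result shrinks from {(1,)} to the empty set.
    assert (R_small - S) == {(1,)}
    assert (R_small - (S | {(1,)})) == set()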
Examination of the constructions in Theorems 3.7 and 3.8 tells us that
when there are no set difference operators in the algebraic expression, the rules
constructed by Theorem 3.7 use no negated subgoals, and when there are no
negated subgoals, Theorem 3.8 provides an expression using only the monotone
subset of relational algebra. Thus, we have the following equivalence.
Theorem 3.9: The set of functions from relations to relations expressible in the
monotone subset of relational algebra is the same as the functions expressible
by nonrecursive datalog programs with no negated subgoals. □
Notice that the occurrence of X in r(X) is free, rather than bound by the
quantifier (VX).
Example 3.25: In Example 2.21 we discussed the relational algebra formula
πCUST(σINAME='Brie'(INCLUDES ⋈ ORDERS))
(3.9)
Since one of the conjuncts in (3.9) requires that I be equal to 'Brie', the
only I that could possibly exist to make the formula true is 'Brie'. Thus, we
can remove the quantifier (∃I) if we replace occurrences of I by the constant
'Brie'. With a small effort, we can thus prove that (3.9) is equivalent to:
(∃N)(∃Q)(∃D)(includes(N, 'Brie', Q) ∧ orders(N, D, C))
(3.10)
in the sense that (3.9) and (3.10) produce the same relation over C, i.e., the same
sets of customers, when given the same INCLUDES and ORDERS relations. □
Domain Relational Calculus
Formulas can be used to express queries in a simple way. Each formula with
one or more free variables defines a relation whose attributes correspond to
those free variables. Conventionally, we shall write F(X1, . . . , Xn) to mean
that formula F has free variables X1, . . . , Xn and no others. Then the query,
or expression, denoted by F is
{X1 X2 ⋯ Xn | F(X1, . . . , Xn)}
(3.11)
that is, the set of tuples a1 ⋯ an such that when we substitute ai for Xi,
1 ≤ i ≤ n, the formula F(a1, . . . , an) becomes true.
The query language consisting of expressions in the form of (3.11) is called
domain relational calculus (DRC). The adjective "domain" seems odd in this
context, but it refers to the fact that variables are components of tuples, i.e.,
variables stand for arbitrary members of the domain for their components. This
form of logic should be distinguished from "tuple" relational calculus, which we
take up later in this section, where variables stand for whole tuples.
It should be observed that the relations defined by DRC expressions need
not be finite. For example,
{XY | ¬p(X, Y)}
is a legal DRC expression defining the set of pairs (X, Y) that are not in the
relation of predicate p. To avoid such expressions, we shall later introduce
a subset of DRC formulas called "safe," in analogy with safe datalog rules
introduced in Section 3.2.
evaluating F, any quantified variables are assumed to range over the set D, and
the negation of a subformula G is satisfied only by values in D that do not make
G true. We say F is domain independent if the relation for F with respect to
D ⊇ DOM(F) does not actually depend on D. If F is domain independent,
then its relation with respect to any domain D ⊇ DOM(F) is the same as its
relation with respect to DOM(F).
Example 3.28: The formula F1 = ¬r(X, Y) is not domain independent. Let
R be the relation given for predicate r; therefore DOM(F1) = π1(R) ∪ π2(R).
However, if D is any set that contains DOM(F1), then the relation for F1 with
respect to D is (D × D) − R. In particular, if a is any symbol in D that is not
in DOM(F1), then the tuple (a, a) is in the relation of F1 with respect to D,
but is not in the relation of F1 with respect to DOM(F1).
For another, more complex example, consider
F2 = (∃Y)(p(X, Y) ∨ q(Y, Z))
Let P = {ab, cd} and Q = {ef} be the relations given for p and q, respectively. Then DOM(F2) = {a, b, c, d, e, f}. Let D = {a, b, c, d, e, f, g}. Then
the relation for F2 with respect to D, which is a relation over X and Z, the
free variables of F2, includes the tuple (a, g). The reason is that there exists a
value of Y in the domain D, namely Y = b, that makes p(a, Y) ∨ q(Y, g) true.
Naturally, q(b, g) isn't true, because bg is not in Q, but p(a, b) is true because
ab is in P. Since eliminating g from D surely yields a different relation for F2,
we conclude that F2 is not domain independent.
On the other hand, consider F3 = (∃Y)(p(X, Y) ∧ q(Y, Z)). As with F2, the
relation for F3 is a relation over X and Z. However, suppose (a, b) is a tuple in
the relation for F3 with respect to some D. Then there must be some value c in
D such that when c is substituted for Y, the formula p(a, Y) ∧ q(Y, b) becomes
true. That is, if P and Q are the relations for p and q, then ac is in P and cb
is in Q. Therefore, a is in π1(P), and thus a is in DOM(F3). Similarly, b is in
π2(Q) and therefore in DOM(F3). We conclude that whatever D ⊇ DOM(F3)
we choose, the set of (a, b) pairs in the relation for F3 with respect to D will
be the same as the relation for F3 with respect to DOM(F3); therefore, F3 is
domain independent. □
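Domain independence can also be tested empirically for these small formulas. The brute-force Python evaluator below (a sketch written only for this example) ranges the quantified variable Y over a given domain and compares the relations obtained from DOM(F) and from a strictly larger domain.

    P = {("a", "b"), ("c", "d")}                 # relation for p
    Q = {("e", "f")}                             # relation for q
    DOM = {"a", "b", "c", "d", "e", "f"}         # DOM(F2) = DOM(F3)

    def f2(dom):
        # F2 = (EY)( p(X,Y) or q(Y,Z) ), evaluated with respect to dom
        return {(x, z) for x in dom for z in dom
                if any((x, y) in P or (y, z) in Q for y in dom)}

    def f3(dom):
        # F3 = (EY)( p(X,Y) and q(Y,Z) ), evaluated with respect to dom
        return {(x, z) for x in dom for z in dom
                if any((x, y) in P and (y, z) in Q for y in dom)}

    print(f2(DOM) == f2(DOM | {"g"}))    # False: F2 depends on the domain
    print(f3(DOM) == f3(DOM | {"g"}))    # True:  F3 does not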
Safe DRC Formulas
As we mentioned, there is no algorithm to tell whether a given DRC formula is
domain independent. Thus, real query languages based on relational calculus
use only a subset of the DRC formulas, ones that are guaranteed to be domain
independent. We shall define "safe" formulas to be a subset of the domain
independent formulas. The important properties of safety are:
a)
b)
c)
With these criteria in mind, we introduce the following definition of safe DRC
formulas. Intuitively, these conditions force DRC formulas to look like the result
of applying a sequence of (safe) nonrecursive datalog rules. Rule (2) below says
that logical OR is used in the same way that two rectified rules for the same
predicate may be used, and rule (3) is analogous to the requirement that all
variables in the body of a rule be limited.
1. There are no uses of the ∀ quantifier. This constraint does not affect the
expressiveness of the language, because (∀X)F is logically equivalent to
¬(∃X)¬F. That is, F is true for all X if and only if there does not exist
an X for which F is false. By applying this transformation wherever we
find a ∀, we can eliminate all universal quantifiers.
2. Whenever an OR operator is used, the two formulas connected, say F1 ∨ F2,
have the same set of free variables; i.e., they are of the form
F1(X1, . . . , Xn) ∨ F2(X1, . . . , Xn)
3.
4.
r(X, Y, Z) ∧ ¬(p(X, Y) ∨ q(Y, Z))
(3.13)
Then, all three variables are limited by the positive conjunct r(X, Y, Z), so this
formula satisfies condition (3). □
Safe DRC to Relational Algebra
We can prove that every safe formula has a relational algebra expression defining
the same relation. Of course, there are many nonsafe formulas that also have
equivalent relational algebra expressions, such as the formula of Example 3.29,
but we cannot in general tell which ones do and which do not.
Theorem 3.11: The sets of functions computed by expressions of relational
algebra, by safe, nonrecursive datalog programs, and by safe formulas of domain
relational calculus are the same.
Proof: Theorems 3.7 and 3.8 proved the equivalence of relational algebra and
safe, nonrecursive datalog rules. Theorem 3.10, when we check the form of
each constructed formula, tells us that the functions expressible in relational
algebra are all expressible in safe DRC. We complete the proof by showing that
safe DRC formulas define functions that are expressible in safe, nonrecursive
datalog.
The proof proceeds by induction on the number of operators in the safe
DRC formula. However, because rule (3) applies to maximal sets of formulas
connected by logical AND, we have to state the inductive hypothesis carefully,
to avoid considering subformulas that appear unsafe only because they are a
proper subpart of a conjunct that does satisfy rule (3). For example, we do not
want to consider X = Y separately in the formula X = Y ∧ p(X, Y). Thus,
the inductive hypothesis we shall prove is the following. Let F be a safe DRC
formula. Then for every subformula G such that F does not apply the AND
where X1, . . . , Xm are all the free variables in G. Rule (3) in the definition of
safety for DRC implies that the above rule is a safe datalog rule.
If k = 1, then G must be an ordinary atomic formula of the form
p(X1, . . . , Xk). In this case, use the EDB relation name p for pG, and do
not generate any rules.
Induction: Since universal quantifiers are forbidden in safe formulas, and nega
tions can only appear within conjunctions, we can break the induction into
three cases, for ∃, ∨, and ∧.
1. G = (∃X1)H, where the free variables of H are X1, . . . , Xk. Then the rule
defines the desired predicate pG.
2. G = H ∨ I. Then the free variables of H and I must be the same, by rule
(2) in the definition of "safety"; let these variables be X1, . . . , Xk. Then
we use the two rules
pG(X1, . . .
3.
Example 3.30: Let us treat the DRC formula constructed in Example 3.26:
likes(X, Y) ∧ ¬(broke(X) ∧ (∃Z)likes(Z, Y))
Of the three atomic formulas, only likes(Z, Y) is not part of a larger conjunct; it
is part of an existentially quantified formula. We thus use likes as the predicate
for this subformula and create no rules.
For the expression (∃Z)likes(Z, Y) we create a new predicate, say p, and
give it the rule
p(Y) :- likes(Z,Y).
Next we work on the maximal conjunct broke(X) ∧ (∃Z)likes(Z, Y), which is
broke(X) ∧ p(Y). The rule for this conjunct is thus
q(X,Y) :- broke(X) & p(Y).
where q is used as the predicate name. Finally, the outer conjunct is translated
into a rule
r(X,Y) :- likes(X,Y) & ¬q(X,Y).
Notice that the rules we obtain are essentially those of Example 3.23. What we
called p, q, and r here are liked, brokePair, and canBuy there. □
3.9 TUPLE RELATIONAL CALCULUS
Tuple relational calculus, or TRC, is a variant of relational calculus where vari
ables stand for tuples rather than components of tuples. To refer to the ith
component of a tuple μ we use μ[i]. If an attribute A for that component is
known, we may also write μ[A]. The formulas of TRC are defined recursively,
and the structure of TRC formulas is quite close to the structure of DRC for
mulas. The basis, atomic formulas, is:
1. If p is a predicate name and μ a tuple variable, then p(μ) is an atomic
formula. Its meaning is that "tuple μ is in the relation for p."
2. XθY is an atomic formula if θ is an arithmetic comparison operator and X
and Y are each either constants or component references; the latter are of
the form μ[i] for some tuple variable μ and component number or attribute
i.
The inductive definition proceeds as in domain relational calculus. If F1
and F2 are formulas of TRC and μ is a tuple variable appearing free in F1, then
the following are TRC formulas, with the obvious meanings and sets of free and
bound variables.
a) F1 ∧ F2
b) F1 ∨ F2
c) ¬F1
d) (∃μ)F1
e) (∀μ)F1
The relation associated with a formula of TRC is defined in the same way as
for DRC. The relation for F has one component for each component of each free
tuple variable of F that actually is mentioned in F. The value of the relation
for F is the set of tuples whose values, when substituted for their corresponding
components of tuple variables, make F true. A query of TRC is an expression
of the form {μ | F(μ)}, where μ is the only free variable of F. Naturally, this
expression defines the relation of all tuples μ such that F(μ) is true.
On occasion, the arity of a tuple variable will not be clear from context.
We shall use the notation μ^(i) to denote a tuple variable μ that is of arity i.
Frequently, we use the superscript when the tuple variable is quantified and
leave it off in other places.
Example 3.31: The DRC query derived in Example 3.26 can be turned into
TRC if we use the tuple μ for (X, Y), ν for X in broke, and ρ for (Z, Y) in
likes. The query is:
{μ | likes(μ) ∧ ¬(∃ν)(broke(ν) ∧ ν[1] = μ[1] ∧ (∃ρ)(likes(ρ) ∧ ρ[2] = μ[2]))}
(3.14)
Notice how the atomic formula ν[1] = μ[1] replaces the connection that in
DRC was expressed by using the same variable X in likes and in broke. We
cannot use μ as the argument of broke, because broke takes a unary tuple, while
likes takes a binary tuple. Similarly, ρ[2] = μ[2] substitutes for the double use
of Y. Also notice that ν must be existentially quantified, or else it would be a
free variable of the query that did not appear to the left of the bar in the set
former (3.14). Although it looks like ν could therefore be anything, the formula
ν[1] = μ[1] completely defines the unary tuple ν. The relation for the formula
broke(ν) ∧ ν[1] = μ[1] is
the most useful approach is to define a restricted form of TRC called "safe
TRC." Again in analogy with DRC, we shall be rather more restrictive than we
have to be, when defining "safety," because all we really need to do is define a
class that reflects what is found in commercial TRC-based languages and that
is equivalent in expressive power to relational algebra.
Since the arity of a tuple variable is not always clear from context in a
TRC formula, we shall assume that the arity of each variable is given and that
the arity of one variable does not change from occurrence to occurrence, even
if the two occurrences are bound by different quantifiers. The safety of a TRC
formula is defined as follows, in close analogy with the definition of safe DRC.
1. There are no uses of the V quantifier.
2. Whenever an ∨ operator is used, the two formulas connected, say F1 ∨ F2,
have only one free tuple variable, and it is the same variable.
3. Consider any subformula consisting of a maximal conjunction of one or
more formulas F1 ∧ ⋯ ∧ Fm. Then all components of tuple variables that
are free in any of these Fi's are limited in the following sense.
a) If Fi is a nonnegated, ordinary atomic formula p(μ), then all components of tuple variable μ are said to be limited.
b) If Fi is μ[j] = a or a = μ[j], where a is a constant, then μ[j] is limited.
c) If Fi is μ[j] = ν[k] or ν[k] = μ[j], and ν[k] is limited, then
μ[j] is limited.
4. A ¬ operator may only apply to a term in a conjunction of the type discussed in rule (3).
From Relational Algebra to Safe Tuple Relational Calculus
We shall show that safe TRC is equivalent in expressive power to relational
algebra, and therefore to the other languages of Theorem 3.11. The proof is in
two lemmas, one converting relational algebra to TRC and the other converting
TRC to DRC. These two results, together with the equivalences of Theorem
3.11 will show the equivalence between safe TRC and the three other abstract
query languages shown equivalent in that theorem.
Our first step is to show how relational algebra expressions can be converted
into TRC formulas. The proof is quite similar to that of Theorem 3.10, where
we turned the algebra into domain calculus.
Lemma 3.6: Every query expressible in relational algebra is expressible in safe
tuple relational calculus.
Proof: We show by induction on the number of operators in the relational
algebra expression E that there is a TRC formula, with a single free tuple
variable, that defines the same relation as E. The basis, zero operators, requires
that we consider two cases, where E is a relation variable R or a constant
relation. If E is a relation name R, then formula R(μ) suffices. If E is a
constant, say {μ1, . . . , μn}, consisting of tuples of arity k, then we use one free
variable, say ν, and we write the TRC formula
ν[1] = μ1[1] ∧ ν[2] = μ1[2] ∧ ⋯ ∧ ν[k] = μ1[k] ∨
ν[1] = μ2[1] ∧ ν[2] = μ2[2] ∧ ⋯ ∧ ν[k] = μ2[k] ∨
· · ·
ν[1] = μn[1] ∧ ν[2] = μn[2] ∧ ⋯ ∧ ν[k] = μn[k]
Note that for all i and j, μi[j] is a constant here. Thus, the above formula is
safe, by rules (2) and (3b) of the safety definition.
For the induction, we consider the five cases.
1. E = E1 ∪ E2. Then there are TRC formulas F1 and F2 for E1 and E2.
By renaming if necessary, we may assume that the lone free tuple variable
in F1 and F2 is μ. Because E1 and E2 have the same arity, the free tuple
variables of F1 and F2 must also have the same arity, so the renaming is
permitted. Then F1 ∨ F2 is a TRC formula for E.
2. E = E1 − E2. As in (1), assume there are formulas F1(μ) and F2(μ) for
E1 and E2, respectively. Then F1 ∧ ¬F2 is a TRC formula for E.
3. E = πi1,...,ik(E1). Let F1(ν) be a TRC formula for E1. Then a formula for
E, with free variable μ, is
(∃ν)(F1(ν) ∧ μ[1] = ν[i1] ∧ μ[2] = ν[i2] ∧ ⋯ ∧ μ[k] = ν[ik])
4. E = E1 × E2. Let F1(ν^(m)) and F2(ρ^(n)) be TRC formulas for E1 and E2.
Then the TRC formula for E, with lone free variable μ^(m+n), is
(∃ν)(∃ρ)(F1(ν) ∧ F2(ρ) ∧
μ[1] = ν[1] ∧ ⋯ ∧ μ[m] = ν[m] ∧
μ[m+1] = ρ[1] ∧ ⋯ ∧ μ[m+n] = ρ[n])
(3.15)
EXERCISES
3.1: In Example 1.13 we gave datalog rules for a simple black/white cell defi
nition system. Generalize the system to allow n colors, 1,2, ...,n. That
is, EDB relation contains(I, J, X, Y) has the same meaning as in Example
1.13, and EDB relation set(I, X, Y, C) says that cell I has color C at point
(X, Y). Define color(I, X, Y, C) to mean that cell I has color C at point
(X, Y), either directly or through the presence of some subcell. If a point
is defined to have more than one color, then color will be true for each.
* 3.2: Modify your answer to Exercise 3.1 so at most one color is defined for any
point, using the rules:
i) If I has two subcells that each define a color for a given point, the
higher-numbered color predominates.
ii) If I has a subcell J, then whatever color a point has in J (including
both directly defined colors and colors required by subcells of J) predominates over a color defined for that point directly in I [i.e., via
set(I, X, Y, C)].
Hint: Simplify life by ignoring the translation of coordinates. Start by
assuming there is only one point per cell, and the EDB predicates are
set(I,C) and contains(I,J).
3.3: Are your rules from Exercise 3.2 (a) safe? (b) stratified?
** 3.4: Show that
a) Your rules from Exercise 3.2 can have more than one minimal fixed
point for some values of set and contains.
b) Relation contains is acyclic if the graph whose nodes correspond to
cells, and that has an arc from I to J if contains(I, J, X, Y) is true for
some X and Y, is acyclic. Show that if contains is acyclic, then your
rules from Exercise 3.2 have a unique least fixed point.
procedure p(x);
    local a;
    call q(a);
    call q(x);
end
procedure q(y);
    call p(y);
    call q(3);
end
Figure 3.8 Example program for Exercise 3.5.
3.6: Suppose we have EDB relations
frequents(Drinker, Bar)
serves(Bar, Beer)
likes(Drinker, Beer)
The first indicates the bars a drinker visits; the second tells what beers
each bar serves, and the last indicates which beers each drinker likes to
drink. Define the following predicates using safe datalog rules.
a) happy(D) that is true if drinker D frequents at least one bar that
serves a beer he likes.
b) shouldVisit(D, B) if bar B serves a beer drinker D likes.
c) veryHappy(D) if every bar that drinker D frequents serves at least
one beer he likes. You may assume that every drinker frequents at
least one bar.
d) sad(D) if drinker D frequents no bar that serves a beer he likes.
* 3.7: Write each of the queries of Exercise 3.6 in (i) relational algebra, (ii) safe
DRC, (iii) safe TRC.
3.8: Assuming R and S are of arity 3 and 2, respectively, convert the expression
π1,5(σ$2=$4 ∨ $3=$4(R × S)) to
a)
b)
3.13: Consider the rules in Figure 3.9. Here, s is the only EDB predicate.
a) Rectify the rules.
b) Write relational algebra expressions for the relations defined by the
IDB predicates p, q, and r. To simplify, you may use the result for
one predicate as an argument of the expression for other predicates.
c) Produce algebraic expressions directly from the rules (without rectifi
cation) by using the extended projection operator.
d) Write a safe DRC expression for the relation of q.
3.14: Verify that the expression T(X) ⋈ U(X, Z) ⋈ S(Y, Z) in Example 3.5
defines the relation for the body of rule (3.4) from that example.
* 3.15: Complete the proof of Lemma 3.1 by showing that substituting X for Y in
a rule that has X = Y as a subgoal does not make the rule unsafe if it was
safe and does not change the set of facts that the head of the rule yields.
3.16: Complete the proof of Theorem 3.2 by showing that the set of facts pro
duced by Algorithm 3.2 is a subset of any model of the rules, and therefore
is the unique minimal model.
3.17: Complete the proof of Theorem 3.3 by showing that the operations union,
projection, and product are monotone.
3.18: Show that intersection is monotone.
* 3.19: Is 4- a monotone operator?
3.20: Show Corollary 3.1: the composition of monotone operators is monotone.
3.21: Show that Algorithm 3.3 computes the proof-theoretic meaning of datalog
rules, i.e., the set of facts that can be inferred by applying the rules in the
forward direction (from body to head).
3.22: Rules are said to be linear if they each have at most one subgoal with an
IDB predicate. Give a simplification of Algorithm 3.4 (semi-naive evalua
tion) for the case that rules are linear.
* 3.23: A logic program is said to be metalinear if we can partition the predicates
into "strata" such that a rule whose head is in stratum i can have no
subgoals of stratum above i and at most one subgoal at stratum i. Note
these "strata" have nothing to do with negated subgoals; we assume there
are none.
a) Give an example of a datalog program that is metalinear but not
linear.
b) Simplify Algorithm 3.4 for the case that the rules are metalinear.
* 3.24: Extend Algorithm 3.4 to the case that the rules are stratified rules (in the
sense of Section 3.6, not Exercise 3.23) with negations.
3.25: Consider the rules:
p(X,Y) :- q(X,Y) & ¬r(X).
r(X) :- s(X,Y) & ¬t(Y).
r(X) :- s(X,Y) & r(Y).
a)
b)
c)
¬s(X).
& p(Y).
¬p(X).
& q(Y).
a) Under the closed world assumption, what negative facts for IDB predicates p and q can be inferred?
b) Under the generalized closed world assumption, what negative facts
for p and q can be derived?
c) Are there contradictory facts deduced in your answers to (a) and/or
(b)?
3.38: Some definitions of logical rules allow predicates that are mixed EDB/IDB
predicates. That is, a predicate p may have a stored relation with some
tuples for which p is true, and there may be rules that define additional
tuples for which p is true. Show that any such collection of rules can be
replaced by another collection, defining the same relations, in which each
predicate is either EDB or IDB, but not both.
BIBLIOGRAPHIC NOTES
The basic concepts of logic are found in Manna and Waldinger [1984], and the
elements of logic programming appear in Lloyd [1984] and Apt [1987].
There have been two directions from which applications of logic to database
systems have been approached. One, often called "deductive databases," em
phasizes issues of expressibility of languages, and semantic issues, such as the
closed world assumption. Gallaire and Minker [1978] is a compendium of basic
results in this area, and later surveys were written by Gallaire, Minker, and
Nicolas [1984] and Minker [1987]. Minker [1988] is a collection of recent papers
on the subject. A critique of this area is found in Harel [1986].
The second direction emphasizes the optimization of queries expressed as
logic programs. We shall cover this area in detail in Chapter 13 (Volume II).
Bancilhon and Ramakrishnan [1986] is a survey of results in this class.
Relational Calculus
Codd [1972b] is the basic paper on relational calculus, including the equivalence
with relational algebra. Pirotte [1978] classifies query languages into domain-calculus and tuple-calculus languages.
Levien and Maron [1967] and Kuhns [1967] were early papers on the use
of similar forms of logic as a query language.
Klug [1981] extends the logic and the algebra-logic correspondence to aggre
gate operators (sum, average, etc.). Kuper and Vardi [1984] develop a calculus
for a model more general than the relational model; it is similar to the "object
model" discussed in Section 2.7.
Fixed-Point Semantics of Logic Programs
The fixed point semantics for datalog that we developed in Section 3.5 was
explored in the context of logic programming by Van Emden and Kowalski
[1976] and Apt and Van Emden [1982], and in the database context by Chandra
and Harel [1982]. The basic mathematics, relating monotonicity to the existence
of least fixed points goes back to Tarski [1955].
Reiter [1984] compares the proof-theoretic and model-theoretic approaches
to defining semantics.
Semi-Naive Evaluation
call stratified datalog and defined their meaning to be the "perfect" fixed point.
Immerman [1982] proved the surprising result that any query expressible in this
language can be expressed with a single level of negation, i.e., with two strata.
However, the number of arguments in predicates in the two-strata program may
be very much larger than in the original, so this technique is not generally useful
as an optimization.
Apt, Blair, and Walker [1985] considered the multiplicity of minimal fixed
points for logic programs and argued that for stratified programs the "perfect"
fixed point is the preferred one. Van Gelder [1986] independently argued the
same and gave a relatively efficient algorithm for testing whether a given fact
is in the perfect model of a stratified datalog program. Additional application
of the "stratified" concept appears in Apt and Pugin [1987] and Przymusinski
[1988].
The Closed World Assumption
The fundamental paper on the CWA is Reiter [1978]; also see Reiter [1980]
for a discussion of the domain closure and unique-value axioms. Minker [1982]
introduces the generalized CWA.
There is a close connection between the CWA and the "negation as failure"
idea in Clark [1978]; see Shepherdson [1984].
McCarthy [1980] defines a more general metarule for inferring negative in
formation, called "circumscription." Lifschitz [1985] and Gelfond, Przymusinska, and Przymusinski [1986] relate circumscription and the CWA.
A fundamental problem with all attempts to define metarules for negative
information is the complexity of answering queries according to these rules.
Przymusinski [1986] attempts to provide an algorithm for answering queries in
the presence of circumscriptions, but the question whether the circumscription
approach can be made computationally tractable remains open.
CHAPTER 4
Relational
Query
Languages
For these reasons, the languages we shall discuss are really "more than
complete"; that is, they can do things with no counterpart in relational alge
bra or calculus. Many, but not all, become equivalent to relational calculus
when we throw away arithmetic and aggregate operators. Some languages, like
Query-by-Example (Section 4.4), may be called "more than complete" even
after eliminating arithmetic and aggregation. The original design for Query-by-Example allows computation of the transitive closure of a relation, although
not all implementations support this feature, and we do not discuss it here.
Recall that transitive closure is not something that can be expressed by nonrecursive logic, and therefore, by Theorem 3.12, cannot be expressed in relational
algebra or the two forms of relational calculus.
πC(σA=a(R(A, B) ⋈ S(B, C)))
(4.1)
This query says: "print the C-values associated with A-value a in the joined
relation [R ⋈ S](A, B, C)." An equivalent domain calculus expression is
{C | (∃B)(r(a, B) ∧ s(B, C))}
(4.2)
If we compare (4.1) and (4.2) we see that the calculus expression does in
fact tell only what we want, not how to get it; that is, (4.2) only specifies the
properties of the desired values C. In comparison, (4.1) specifies a particular
order of operations. It is not immediately obvious that (4.1) is equivalent to:
πC(πB(σA=a R(A, B)) ⋈ S(B, C))
(4.3)
To evaluate (4.3) we need only look for the tuples in R that have A-value a and
find the associated B-values. This step computes R1(B) = πB(σA=a R(A, B)).
Then we look for the tuples of S whose B-values are in R1, i.e., we compute
R1(B) ⋈ S(B, C). Finally, we project this relation onto C to get the desired
answer.
As we suggested in Example 2.22, this operation can be quite efficient if
we have the proper indices. An index on attribute A for relation R allows us
to find those tuples with .A-value a in time that is proportional to the number
of tuples retrieved. The set of B-values in these tuples is the set R1. If we
also have an index on B for relation S, then the tuples with B-values in R1
can likewise be retrieved in time proportional to the number of tuples retrieved.
From these, the C-values in the answer may be obtained. The time to do these
steps could be proportional to the sizes of R and S, since in the worst case,
all tuples in these relations have the desired A-values or B-values. However, in
typical cases, the size of R1 and the size of the answer will be much smaller
than the sizes of the relations, so the time to perform the query by following
(4.3) is much less than the time to look at R and S.
In comparison, (4.1) requires that we evaluate the natural join of R and
S, which could involve sorting both relations on their B-values and running
through the sorted relations. The resulting relation could be very large com
pared to R and S. Under no circumstances could (4.1) be evaluated in less
time than it takes to scan at least one of R and S, no matter how we choose
to do the join. Thus, the time to evaluate (4.1) exceeds the time to evaluate
(4.3), often by a wide margin, even though the relations computed by the two
expressions are always the same.
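The point is easy to see in code. The Python sketch below (made-up relations R and S, with an ordinary dict standing in for an index on S.B) evaluates the query in the order suggested by (4.3): select on A, project onto B, probe the index, and project onto C; at no time is the full join of R and S materialized.

    from collections import defaultdict

    R = {("a", 1), ("a", 2), ("b", 3)}           # R(A, B)
    S = {(1, "x"), (2, "y"), (3, "z")}           # S(B, C)

    index_on_B = defaultdict(list)               # plays the role of an index on S.B
    for b, c in S:
        index_on_B[b].append(c)

    # (4.3): piC( piB( sigma_{A=a}(R) ) join S )
    R1 = {b for (a, b) in R if a == "a"}                 # B-values that go with a
    answer = {c for b in R1 for c in index_on_B[b]}      # probe the index, project
    print(answer)                                        # {'x', 'y'}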
In principle, we can always evaluate (4.2) like (4.3) rather than (4.1), which
appears to be an advantage of calculus over algebra, especially as (4.1) is sim
pler, and therefore more likely to be written than is (4.3). However, an opti
mization pass in an algebra-based query language compiler can convert (4.1)
Relational algebra    ISBL
R ∪ S                 R + S
R − S                 R - S
R ∩ S                 R . S
σF(R)                 R : F
R ⋈ S                 R * S
If we write
T = (R * S): B=C % A,D
the composition of the current relations R and S would be computed and assigned to relation name T. Note that as R and S have attributes with different
names, the *, or natural join operator, is here a Cartesian product.
However, suppose we wanted T to stand not for the composition of the
current values of R(A, B) and S(C, D) but for the formula for composing R
and S. Then we could write
T = (N!R * N!S): B=C % A,D
The above ISBL statement causes no evaluation of relations. Rather, it defines
T to stand for the formula
(R * S): B=C % A,D
If we ever use T in a statement that requires its evaluation, such as
LIST T
or
U = T+V
the current values of R and S are at that time substituted into the formula for
T to get a value for T. D
The delayed evaluation operator N! serves two important purposes. First,
large relational expressions are hard to write down correctly the first time.
Delayed evaluation allows the programmer to construct an expression in easy
stages, by giving temporary names to important subexpressions. More impor
tantly, delayed evaluation serves as a rudimentary facility for defining views. By
defining a relation name to stand for an expression with delayed evaluation, the
programmer can use this name as if the defined relation really existed. Thus, a
set of one or more defined relations forms a view of the database.
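The behavior of N! can be imitated in any language that lets an expression be stored unevaluated. The Python lines below are only an analogy (relations as sets, the composition written out by hand): the "view" T is a function that recomputes its result from the current R and S each time it is used.

    R = {("a", 1), ("b", 2)}         # R(A, B)
    S = {(1, "x"), (2, "y")}         # S(C, D)

    # T = (N!R * N!S): B=C % A,D  -- remember the formula, not its current value.
    def T():
        return {(a, d) for (a, b) in R for (c, d) in S if b == c}

    print(T())              # {('a', 'x'), ('b', 'y')}
    S.add((2, "z"))         # a base relation changes...
    print(T())              # ...and the next use of the view sees the change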
Renaming of Attributes
In ISBL, the purely set theoretic operators, union, intersection, and difference,
have definitions that are modified from their standard definitions in relational
algebra, to take advantage of the fact that components have attribute names
in ISBL. The union and intersection operators are only applicable when the
two relations involved have the same set of attribute names. The difference
operator, R - S, is the ordinary set-theoretic difference when R and S have the
same set of attribute names. More generally, if some of the attributes of R and
S differ, then R - S denotes the set of tuples μ in R such that μ agrees with no
tuple in S on those attributes that R and S have in common. Thus, in ISBL
the expression R - S, if R is R(A, B) and S is S(A, C), denotes the relational
algebra expression
this information was shown in Figure 2.8. Of the eight relations of that scheme,
we shall deal with four that will serve for most of our examples. These relations
tell about customers, the orders for delivery that they place, the items on each
order, and the suppliers of those items. The schemes for these relations, with
some attributes renamed from Figure 2.8 to allow the use of the same attribute,
e.g., NAME, with different meanings in different relations, are:
CUSTOMERS (NAME, ADDR, BALANCE)
ORDERS (O#, DATE, CUST)
INCLUDES (O#, ITEM, QUANTITY)
SUPPLIES (NAME, ITEM, PRICE)
In Figure 4.2 we see sample data that will serve as the "current instance" of
this database.
We shall now consider some typical queries on the YVCB database and
their expression in ISBL. For comparison, we shall use these same queries as
examples for several other languages as well.
Example 4.2: The simplest queries often involve a selection and projection on
a single relation. That is, we specify some condition that tuples must have, and
we print some or all of the components of these tuples. The specific example
query we shall use is
Print the names of customers with negative balances.
In ISBL we can write
LIST CUSTOMERS: BALANCE<0 % NAME
The clause BALANCE<0 selects the first and second tuples, because their values
in column 3 (BALANCE) are negative. The projection operator leaves only the
first column, NAME, so LIST causes the table
Zack Zebra
Judy Giraffe
to be printed. □
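Viewed as set operations over the tuples of Figure 4.2, the selection and projection of this query are a single comprehension; the Python sketch below is merely an illustration.

    CUSTOMERS = {
        ("Zack Zebra",   "74 Family Way", -200),
        ("Judy Giraffe", "153 Lois Lane",  -50),
        ("Ruth Rhino",   "21 Rocky Road",   43),
    }

    # CUSTOMERS: BALANCE<0 % NAME
    negative = {name for (name, addr, balance) in CUSTOMERS if balance < 0}
    print(negative)     # {'Zack Zebra', 'Judy Giraffe'}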
Example 4.3: A more complicated type of query involves taking the natural
join, or perhaps a more general join or Cartesian product of several relations,
then selecting tuples from this relation and printing some of the components.
Our example query is:
Print the suppliers who supply at least one
item ordered by Zack Zebra.
This query asks us to go to the ORDERS relation to find the numbers of all the
orders placed by Zack Zebra. Then, armed with those numbers, we go to the
INCLUDES relation to find the items ordered by Zebra, which are the items
associated with these order numbers. Lastly, we go to the SUPPLIES relation
NAME           ADDR            BALANCE
Zack Zebra     74 Family Way   -200
Judy Giraffe   153 Lois Lane   -50
Ruth Rhino     21 Rocky Road   +43

(a) CUSTOMERS

O#      DATE    CUST
1024    Jan 3   Zack Zebra
1025    Jan 3   Ruth Rhino
1026    Jan 4   Zack Zebra

(b) ORDERS

O#      ITEM          QUANTITY
1024    Brie          3
1024    Perrier       6
1025    Brie          5
1025    Escargot      12
1025    Endive        1
1026    Macadamias    2048

(c) INCLUDES

NAME    ITEM          PRICE
Acme    Brie          3.49
Acme    Perrier       1.19
Acme    Macadamias    .06
Acme    Escargot      .25
Ajax    Brie          3.98
Ajax    Perrier       1.09
Ajax    Endive        .69

(d) SUPPLIES

Figure 4.2 Sample data for the YVCB database.
to find the suppliers of those items. While we could write the query directly, it is
conceptually simpler to begin by defining the join that follows these connections
from ORDERS to INCLUDES to SUPPLIES. This connection happens to be
a natural join, since the connecting attributes, O# and ITEM, have the same
names in each of the connected relations; if that were not the case we would
have to use renaming to adjust the attributes. We define the natural join by:
OIS = N! ORDERS * N! INCLUDES * N! SUPPLIES
In this way, OIS is defined to be a relation with scheme
OIS(O#, DATE, CUST, ITEM, QUANTITY, NAME, PRICE)
Note that evaluation of OIS is deferred. When evaluated, it would consist of
all those (o, d, c, i,q, n,p) tuples such that customer c placed order o on date
d, order o includes an order for quantity q of item i, and supplier n supplies
i at price p. To complete the query, we have only to select from this relation
the tuples for customer Zack Zebra and project onto the name attribute, to
produce the set of all suppliers for the items ordered by Zebra. This step is:
OIS: CUST="Zack Zebra" % NAME
Since Zack Zebra placed orders 1024 and 1026, the first includes Brie and Perrier,
the latter includes Macadamias, and both Ajax and Acme supply at least
one of these, the answer to the query is {"Ajax", "Acme"}. □
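The same chain of joins can be traced over the Figure 4.2 data with nested comprehensions; the following Python sketch joins ORDERS to INCLUDES on O# and then to SUPPLIES on ITEM, applying the selection on CUST and the projection onto NAME as it goes.

    ORDERS   = {(1024, "Jan 3", "Zack Zebra"), (1025, "Jan 3", "Ruth Rhino"),
                (1026, "Jan 4", "Zack Zebra")}
    INCLUDES = {(1024, "Brie", 3), (1024, "Perrier", 6), (1025, "Brie", 5),
                (1025, "Escargot", 12), (1025, "Endive", 1),
                (1026, "Macadamias", 2048)}
    SUPPLIES = {("Acme", "Brie", 3.49), ("Acme", "Perrier", 1.19),
                ("Acme", "Macadamias", .06), ("Acme", "Escargot", .25),
                ("Ajax", "Brie", 3.98), ("Ajax", "Perrier", 1.09),
                ("Ajax", "Endive", .69)}

    # OIS = ORDERS * INCLUDES * SUPPLIES, then: OIS: CUST="Zack Zebra" % NAME
    answer = {name
              for (o, date, cust) in ORDERS if cust == "Zack Zebra"
              for (o2, item, qty) in INCLUDES if o2 == o
              for (name, item2, price) in SUPPLIES if item2 == item}
    print(answer)       # {'Acme', 'Ajax'}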
Example 4.4: A still more complicated sort of query involves what amounts
to a "for all" quantifier. The particular query we shall consider is:
Print the suppliers that supply every
item ordered by Zack Zebra.
Such queries are easier in calculus languages than algebraic languages.
That is, in domain calculus we can write the query as
{N | (∀I)(((∃P)supplies(N, I, P)) ∨
¬((∃O)(∃D)(∃Q)(orders(O, D, "Zack Zebra") ∧
includes(O, I, Q))))}
(4.4)
That is, print the set of supplier names N such that for all items I, either N
supplies I [there exists a price P such that (N, I, P) is a tuple of SUPPLIES] or
there exists no order by Zack Zebra for item I. The latter condition is expressed
by the negation of the condition that there is an order number O, a date D, and
a quantity Q such that orders(O, D, "Zack Zebra"), i.e., Zebra placed order O,
and includes(O, I, Q), i.e., item I is included in that order. Notice also that
p ∨ ¬q is logically equivalent to p → q, i.e., p implies q, so we are saying that if
Zebra placed an order for the item I, then supplier N supplies it.
To convert (4.4) to algebra, it helps to eliminate the universal quantifier.
Recall that we can always do so by replacing (∀I)F by ¬(∃I)¬F.
Then, we can use DeMorgan's law to move the generated negation inside the
OR: ¬(P ∨ Q) ≡ (¬P) ∧ (¬Q). The resulting expression is:
{N | ¬(∃I)(¬((∃P)supplies(N, I, P)) ∧
(∃O)(∃D)(∃Q)(orders(O, D, "Zack Zebra") ∧ includes(O, I, Q)))}
(4.5)
Equation (4.5) is not safe; it is not even domain independent [if Zebra hasn't
ordered any items, then both (4.4) and (4.5) define the set of all suppliers in
the domain]. However, if we make the closed world assumption, that the only
suppliers that exist are those that appear in the SUPPLIES relation, we can
work with (4.5) to produce an algebraic expression.1 We first compute the set
of all suppliers, and then subtract those that satisfy the body of (4.5), that is,
there is an item that Zebra orders but which the supplier doesn't sell. The set
of all suppliers is
ALLSUPS = SUPPLIES % NAME
For the database of Figure 4.2, ALLSUPS is {"Ajax", "Acme"}.
In a manner similar to the previous example, we can find all of the items
ordered by Zebra by:
ZEBRAITEMS = (ORDERS * INCLUDES):
CUST="Zack Zebra" % ITEM
For our example database, ZEBRAITEMS is
{"Brie", "Perrier", "Macadamias"}
Next, we use a trick that was introduced in Example 3.16. To find the
suppliers that fail to supply some item in the set ZEBRAITEMS, we take from
SUPPLIES the set of pairs (n, i) such that supplier n does supply item i, and
subtract it from the set of pairs consisting of any supplier and any item in
ZEBRAITEMS. The difference is the set of pairs (n, i) such that n is some
1 Perhaps we should also consult the SUPPLIERS relation, mentioned in Figure 2.8 but
not used in this section, since that relation, holding supplier names and addresses, might
mention a supplier that does not appear in the SUPPLIES relation, presumably because
it sells nothing now. If Zebra ordered nothing, then such a supplier would satisfy the
query.
supplier, i is an item Zack Zebra ordered, but n doesn't supply i. This set of
pairs can be obtained by the sequence of steps:
NIPAIRS = SUPPLIES % NAME, ITEM
NOSUPPLY = (ALLSUPS * ZEBRAITEMS) - NIPAIRS
The result is that only "Acme" is printed. The entire ISBL program is shown
in Figure 4.3, where we have treated all the assignments as view definitions, to
be executed only when the answer is called for by the last statement. □
ALLSUPS = N! SUPPLIES % NAME
ZEBRAITEMS = (N! ORDERS * N! INCLUDES):
    CUST="Zack Zebra" % ITEM
NIPAIRS = N! SUPPLIES % NAME, ITEM
NOSUPPLY = (N! ALLSUPS * N! ZEBRAITEMS) - N! NIPAIRS
LIST (ALLSUPS - (NOSUPPLY % NAME))
Figure 4.3 Solution to query of Example 4.4.
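The plan of Figure 4.3 translates line for line into set operations. The Python sketch below is a simplified illustration: the QUANTITY, DATE, and PRICE columns are dropped, and the orders placed by Zebra are assumed to be the set {1024, 1026} rather than recomputed from ORDERS.

    SUPPLIES = {("Acme", "Brie"), ("Acme", "Perrier"), ("Acme", "Macadamias"),
                ("Acme", "Escargot"), ("Ajax", "Brie"), ("Ajax", "Perrier"),
                ("Ajax", "Endive")}                      # NAME, ITEM pairs only
    INCLUDES = {(1024, "Brie"), (1024, "Perrier"), (1025, "Brie"),
                (1025, "Escargot"), (1025, "Endive"), (1026, "Macadamias")}
    ZEBRA_ORDERS = {1024, 1026}                          # assumed, see Figure 4.2

    ALLSUPS    = {n for (n, i) in SUPPLIES}
    ZEBRAITEMS = {i for (o, i) in INCLUDES if o in ZEBRA_ORDERS}
    NIPAIRS    = SUPPLIES                                # already NAME,ITEM pairs
    NOSUPPLY   = {(n, i) for n in ALLSUPS for i in ZEBRAITEMS} - NIPAIRS
    print(ALLSUPS - {n for (n, i) in NOSUPPLY})          # {'Acme'}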
ISBL Extensions
The ISBL language is fairly limited, when compared with query languages to be
discussed in the next sections. For example, it has no aggregate operators (e.g.,
average, min), and there are no facilities for insertion, deletion, or modification
of tuples. However, there exists in the surrounding PRTV system the facility
to write arbitrary PL/I programs and integrate them into the processing of
relations. The simplest use of PL/I programs in ISBL is as tuple-at-a-time
processors, which serve as generalized selection operators.
Example 4.5: We could write a PL/I program LOWADDR(S) that examines
the character string S and determines whether 5, as a street address, has a
number lower than 100, returning "true" if so. We can then apply LOWADDR
to an attribute in an ISBL expression, with the result that the component for
that attribute in each tuple is passed to LOWADDR, and the tuple is "selected"
if LOWADDR returns "true." The syntax of ISBL calls for the join operator
to be used for these generalized selections. Thus
LIST (CUSTOMERS * LOWADDR(ADDR)) % NAME
prints the names of customers whose street number does not exceed 99,
{"Zack Zebra", "Ruth Rhino"}
for the example database of Figure 4.2. □
PL/I programs that operate on whole relations, rather than tuples, can also
be defined. To facilitate such processing, the PRTV system allows relations to
be passed to PL/I programs, either as relational read files, or relational write
files. These are ordinary files in the PL/I sense, opened for reading or writing,
respectively. A PL/I program can read or write the next record, which is a tuple
of the underlying relation, into or from a PL/I record structure. The reader
should be able to envision how to write PL/I programs to compute aggregate
operators like sums or averages, to delete or modify tuples in arbitrarily specified
ways, or to read tuples from an input file (not necessarily a relational read file;
it could be a terminal, for example) and append them to a relation.
4.3 QUEL: A TUPLE RELATIONAL CALCULUS LANGUAGE
QUEL is the query language of INGRES, a relational DBMS developed at
Berkeley, and marketed by Relational Technology, Inc. In viewpoint and style,
QUEL most closely resembles tuple relational calculus, although the correspon
dence is less close than ISBL's resemblance to relational algebra.
The Retrieve Statement
The most common form of query in QUEL is:
range of μ1 is R1
· · ·
range of μk is Rk
retrieve (μi1.A1, . . . , μir.Ar)
where Ψ(μ1, . . . , μk)
(4.6)
The intuitive meaning of a range-statement such as
range of μ is R
is that any subsequent operations, such as retrieval, are to be carried out once
for each tuple in relation R, with μ equal to each of these tuples in turn. Thus,
the μi's in (4.6) are tuple variables, and each range-statement corresponds to
an atomic formula Ri(μi) of TRC. It is possible to redeclare a tuple variable to
range over another relation, but until one does, the relation corresponding to a
tuple variable does not change. It is unnecessary to include the range statement
for μ in every query, if the relation for μ is the one already declared for μ, but
we shall do so for clarity in the examples to follow.
The condition Ψ is a formula involving components of the μi's. QUEL uses
μi.B to designate the component for attribute B of the relation Ri, over which
μi ranges. Component designators and constants can be related by comparison
operators, as in the language C (<= for ≤, != for ≠, and so on). Comparisons
can be connected by the logical connectives, and, or, and not for ∧, ∨, and ¬.
As each of the μi's ranges over the tuples of its relation, the QUEL interpreter determines whether the current μi's make Ψ true. If so, certain components of the μi's are printed. The components of the tuple to be printed
are computed from component designators in the retrieve-clause. That is, the
first component printed is the component of tuple variable μi1 corresponding
to attribute A1 of relation Ri1, and so on.
The retrieve statement thus prints a table whose columns are headed
A1, . . . , Ar. If we wish a different name, say TITLE, for column m, use
TITLE = μim.Am
in place of the simple component designator μim.Am.
The QUEL statement form above is thus equivalent to the TRC query:
{ν^(r) | (∃μ1) ⋯ (∃μk)(R1(μ1) ∧ ⋯ ∧ Rk(μk) ∧
ν[1] = μi1[A1] ∧ ⋯ ∧ ν[r] = μir[Ar] ∧
Ψ')}
(4.7)
In (4.7), the formula Ψ' is Ψ translated from the QUEL notation into the TRC
notation. That is:
1. Some comparison and logical operators are changed; e.g., and becomes ∧,
and == becomes =.
2. A component designator μi.B becomes μi[j], where B is the jth attribute
of relation Ri, assuming some fixed order for the attributes of each relation.
Thus, in the first line of (4.7) we have the existential quantification of the
μi's, which in effect says "let the μi's range over all possible values." We also
have the atomic formulas Ri(μi), which restrict each tuple variable μi to range
only over the tuples of its corresponding relation Ri. Note, incidentally, that
there is no prohibition against two or more tuple variables ranging over the
same relation, and it is sometimes essential that they do.
In the second line of (4.7) we see the equalities that say the tuple ν to be
printed consists of certain components of the μi's, namely those components
that are indicated in the retrieve-clause. Finally, the condition Ψ' on the third
line of (4.7) enforces the where-clause, only allowing the printing of a tuple ν
if the μi's satisfy Ψ.
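Ignoring optimization, the defining semantics of a retrieve statement is just one nested loop per range-statement, with the where-clause as a filter and the retrieve-clause as a projection. The relations and the particular query in the Python sketch below are invented for illustration.

    R1 = {("a", 1), ("b", 2)}                  # range of m1 is R1; scheme R1(A, B)
    R2 = {(1, "x"), (2, "y"), (3, "z")}        # range of m2 is R2; scheme R2(B, C)

    def retrieve(where, project):
        result = []
        for m1 in R1:                          # each tuple variable ranges over
            for m2 in R2:                      # every tuple of its relation
                if where(m1, m2):              # the where-clause
                    result.append(project(m1, m2))
        return result

    # retrieve (m2.C) where m1.B = m2.B and m1.A = "a"
    print(retrieve(lambda m1, m2: m1[1] == m2[0] and m1[0] == "a",
                   lambda m1, m2: (m2[1],)))          # [('x',)]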
While the form of a QUEL query is clearly patterned after tuple relational
calculus, it is also convenient to see the same query as an expression of relational
algebra:
πA1,...,Ar(σF(R1 × ⋯ × Rk))
range of o is ORDERS
range of i is INCLUDES
range of s is SUPPLIES
retrieve (s.NAME)
where o.CUST = "Zack Zebra"
and o.O# = i.O#
and i.ITEM = s.ITEM
Figure 4.4 Print the suppliers of an item ordered by Zebra.
To execute the query of Figure 4.4, the QUEL interpreter considers each
choice of a tuple o from ORDERS, i from INCLUDES, and s from SUPPLIES.2
2 Technically, the optimization performed by the QUEL processor will cause it to take a
rather different approach to answering this query, but the result will be the same as the
algorithm described here, which follows the definition of the "meaning" of the query.
See Chapter 11 (Volume II) for details of the actual QUEL processing algorithm.
Whenever all the conditions of lines (5)-(7) are satisfied, the NAME component
of the tuple s is printed. The conditions of lines (6) and (7) say that o, i, and
s fit together to form a tuple of the natural join
(4.8)
However, consider what happens if S is empty. Then the atomic formula S(ρ) is
never satisfied, and therefore no values of ν can ever be found to satisfy the body
of (4.8). Similarly, the result is the empty set whenever R is empty. It is easy to
check that if neither R nor S is empty, then (4.8) produces R ∪ S, as one would
3 The same is true in SQL, to be discussed in Section 4.6.
Delete Statements
In order to perform unions and differences properly, QUEL provides two other
statement forms. To delete from a relation, one can write
range of μ1 is R1
· · ·
range of μk is Rk
delete μi
where Ψ(μ1, . . . , μk)
Here, Ψ(μ1, . . . , μk) is a QUEL expression like those that can follow "where"
in the retrieve statement. The effect of this statement is to delete from Ri all
tuples μi for which there exist, for all j = 1, 2, . . . , k other than j = i, tuples
μj in Rj such that Ψ(μ1, . . . , μk) holds. Note that Ψ and the μj's are found
before any deletions occur, so the order in which tuples are deleted does not
matter.
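The two-phase behavior described here, find every qualifying tuple first and only then delete, can be sketched in a few lines of Python (with toy stand-ins for ORDERS and INCLUDES); it is what makes the result independent of the order in which tuples are examined.

    ORDERS   = {(1024, "Jan 3", "Zack Zebra"), (1025, "Jan 3", "Ruth Rhino")}
    INCLUDES = {(1024, "Brie", 3), (1025, "Endive", 1)}

    # delete o where o.O# = i.O# and i.ITEM = "Brie"
    doomed = {o for o in ORDERS
                for i in INCLUDES
                if o[0] == i[0] and i[1] == "Brie"}    # phase 1: find the victims
    ORDERS -= doomed                                   # phase 2: delete them at once
    print(ORDERS)        # {(1025, 'Jan 3', 'Ruth Rhino')}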
Example 4.9: The QUEL command
range of o is ORDERS
range of i is INCLUDES
delete o
where o.O# = i.O# and i.ITEM = "Brie"
deletes from the ORDERS relation all orders that include Brie among their
items. The deletion occurs only from ORDERS; the information is left in the
INCLUDES relation, where it constitutes a collection of "dangling" tuples,
no longer connected to an existing order. Probably, we should also issue a
command to delete from INCLUDES all tuples whose order number is the same
as the order number of some (perhaps other) tuple whose ITEM component is
"Brie." D
Append Statements
Similarly, QUEL has an append statement to perform unions, among other
tasks. We can write
range of μ1 is R1
· · ·
range of μk is Rk
append to S(A1 = E1, . . . , An = En)
where Ψ(μ1, . . . , μk)
Here Ψ is a QUEL expression as above, and the Ei's are expressions involving
components of the μj's and/or constants, connected by arithmetic operators, if
needed. For each assignment of values to the μj's such that Ψ(μ1, . . . , μk) is
true, we add to relation S the tuple whose component for attribute Ap is the
value of Ep, for p = 1, 2, . . . , n.
Example 4.10: We could insist that every order in the YVCB database include
ten pounds of Brie by writing:
range of o is ORDERS
append to INCLUDES(O#=o.O#, ITEM="Brie", QUANTITY=10)
Note that the where-clause is not required in the append statement, and
it is possible, indeed more usual, for the append statement to be used without
tuple variables like o above, for the purpose of appending a single tuple to a
relation. Thus, we can add Sammy Snake to the list of YVCB customers, with
append to CUSTOMERS(NAME="Sammy Snake",
    ADDR="56 Allina Row", BALANCE=0)
□
Retrieval into a Relation
We are still not ready to simulate any relational algebra expression in QUEL;
we need the capability to assign values to new relations. If S is the name of a
new relation we can write
range of μ1 is R1
· · ·
range of μk is Rk
retrieve into S(A1 = E1, . . . , An = En)
where Ψ(μ1, . . . , μk)
This statement will find all lists of tuples μ1, . . . , μk such that μi is in Ri for all
i = 1, 2, . . . , k, and Ψ(μ1, . . . , μk) is true. It then creates for relation S a tuple
whose ith component is Ei. Here, Ei is a formula as in the append statement.4
The attribute names A1, . . . , An become the names of the components of S. We
may omit "Ai =" if Ei is of the form μj.NAME, whereupon NAME becomes
the ith attribute of S.
Example 4.11: QUEL, like most relational query languages, does not auto
matically remove duplicates when it computes a relation, because doing so is a
very expensive operation. However, there are times when allowing duplicates
explodes the size of a relation, and we need to cleanse it of its duplicates. Also,
we frequently do not want duplicate information printed.
Suppose, for example, that we wanted to print the names of all the suppliers
appearing in the SUPPLIES relation. We could write
range of s is SUPPLIES
retrieve (s.NAME)
but then each supplier would be printed once for each item it supplies. QUEL
provides a sort command to eliminate duplicates while it sorts a relation, ini
tially on the first component, then on the second component among tuples
with the same first component, and so on. To print each supplier only once,
and incidentally print them in alphabetical order, we could write
range of s is SUPPLIES
retrieve into JUNK(NAME=s.NAME)
sort JUNK
print JUNK
JUNK has one column headed NAME. We could have eliminated the "NAME="
from the retrieve-clause, since the attribute of s used to form the one column
of JUNK is called NAME. D
Completeness of QUEL
Since we now know how to create temporary relations, all we must do to evaluate
any relational algebra expression is to apply the five basic operators to given
and temporary relations in an appropriate order. That is, we work bottom-up,
computing a temporary relation for each subexpression in turn.
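For instance, the union of two relations can be formed by a retrieve into
followed by an append. The following sketch (the relation and attribute names
R, S, T, A, and B are assumptions chosen only for illustration) places the union
of R(A, B) and S(A, B) in a temporary relation T:

    range of r is R
    retrieve into T(A = r.A, B = r.B)
    range of s is S
    append to T(A = s.A, B = s.B)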
4 Note that the use of formulas, the εi's, to compute the components of tuples is permitted
in all retrieve statements, not just those that have an "into" keyword.
range of s is SUPPLIES
retrieve into ALLSUPS(s.NAME)
range of o is ORDERS
range of i is INCLUDES
retrieve into ZEBRAITEMS(i.ITEM)
where o.O# = i.O#
and o.CUST = "Zack Zebra"
range of a is ALLSUPS
range of z is ZEBRAITEMS
retrieve into NOSUPPLY (a. NAME, z.ITEM)
/* temporarily, we have set NOSUPPLY to the product
of ALLSUPS and ZEBRAITEMS; we now delete all tuples
that are in NIPAIRS, i.e., they are the NAME and
ITEM components of a SUPPLIES tuple */
range of n is NOSUPPLY
range of s is SUPPLIES
delete n
where n.NAME = s.NAME
and n.ITEM = s.ITEM
range of a is ALLSUPS
range of n is NOSUPPLY
delete a
where a. NAME = n.NAME
/* above computes the answer into ALLSUPS */
print ALLSUPS
Figure 4.6 Print the suppliers who supply everything Zebra ordered.
range of c is CUSTOMERS
retrieve ( sum (c. BALANCE))
D
We can also partition the tuples of a relation according to the value of
one or more expressions computed from each tuple. We then take an aggregate
separately for each set of tuples having values in common for each of the
expressions. This partitioning is achieved by writing

    agg_op(E by F1, F2, ..., Fk)        (4.9)

where E and the F's are expressions whose operands are chosen from among
constants and terms μ.A for one tuple variable μ only. The operands in an
expression may be connected by arithmetic operators. If μ ranges over R, the
value of (4.9) for a given value of μ is computed by finding the set Sμ of all those
tuples ν of R such that ν and μ give the same value for each of the formulas
F1, ..., Fk. Then, apply the aggregate operator agg_op to the value of E(ν), as
ν ranges over all the tuples in Sμ.
Example 4.14: To print the items supplied with their average prices, we could
write
range of s is SUPPLIES
retrieve into DUMMY(ITEM=s.ITEM,
                    AP=avg(s.PRICE by s.ITEM))
sort DUMMY
print DUMMY
For example, suppose SUPPLIES is the relation of Figure 4.2(d). When μ is
the first tuple, (Acme, Brie, 3.49), we look for all tuples with the same ITEM
value, "Brie," finding the first and fifth tuples. For each of these tuples, we
evaluate the expression PRICE, i.e., we obtain the third field. These values are
3.49 and 3.98, respectively. We then take the average of these values, which is
3.74, rounding up. Thus, from the first tuple of SUPPLIES, we get the tuple
of relation DUMMY that has first component equal to the ITEM, i.e., "Brie,"
and the second component, AP, equal to 3.74. Note that when μ is the fifth
tuple of SUPPLIES, we get an identical tuple of DUMMY.
We sort DUMMY to remove duplicates, as DUMMY will have, for each
item, as many tuples as the SUPPLIES relation has for that item. The result
of running the above program on relation SUPPLIES of Figure 4.2 is shown in
Figure 4.7. D
    ITEM          AP
    Brie          3.74
    Perrier       1.14
    Macadamias     .06
    Escargot       .25
    Endive         .69

Figure 4.7 Items and their average prices.

4.4 QUERY-BY-EXAMPLE: A DRC LANGUAGE

Query-by-Example (QBE) is a language developed in Yorktown Heights by
IBM. It contains a number of features not present in relational algebra or
calculus.
Figure 4.8 A QBE table skeleton. The leftmost column is for commands
on tuples; the columns under the attribute names are for tuples mentioned
in queries.
Queries are posed by using domain variables and constants, as in domain
relational calculus, to form tuples that we assert are in one of the relations whose
skeletons appear on the screen. Certain of the variables, those prefixed by the
operator P . , are printed.5 When a tuple or combination of tuples matching the
conditions specified by the query are found, the components for those attributes
preceded by P . are printed.
All operators in QBE end in dot, and the dot is not itself an operator.
Before going into detail regarding the form and meaning of queries in QBE,
let us take an example of what a typical query looks like. Suppose we want
to answer the query of Example 4.3, to print the suppliers of items ordered by
Zack Zebra, and we have the ORDERS, INCLUDES, and SUPPLIES relations
available in the database. We call for three table skeletons to be displayed. In
the box reserved for the relation name, in one skeleton, we type ORDERS P. . In
response to the P . , the attributes of ORDERS will appear along the first row
of that skeleton, as shown in Figure 4.9. Similarly, we type INCLUDES P. in the
upper left corner of the second skeleton to get the attributes of the INCLUDES
relation, and we type SUPPLIES P . to get the attributes of the SUPPLIES
relation, in the third skeleton.
ORDERS   | O#   | DATE | CUST
         | _123 |      | Zack Zebra

INCLUDES | O#   | ITEM    | QUANTITY
         | _123 | _banana |

SUPPLIES | NAME | ITEM    | PRICE
         | P.   | _banana |

Figure 4.9 Print the suppliers of items ordered by Zack Zebra.
In Figure 4.9 we see this query expressed in QBE. In each of the skeletons
is a tuple of the relation for that variable, with the important features shown.
For example, the customer name in the ORDERS skeleton is specified to be
Zack Zebra. The order number in the ORDERS and INCLUDES relations are
required to be the same, indicated by the fact that the domain variable _123
appears in both places. Likewise, the ITEM in the INCLUDES and SUPPLIES
tuples must be the same, because the one domain variable _banana appears
in both places. The entry Zack Zebra appears with no quotation marks or
underscore, to indicate it is a literal, while all variables in QBE must have
names that begin with an underscore.6
6 Note that this convention, preceding names of domain variables by an underscore and
leaving literals unadorned, is diametrically opposed to the usual style of query languages
and programming languages, where character string literals are adorned with quotes,
and variables are unadorned. Also observe that Query-by-Example takes its name from
the suggestion that variable names be chosen to be examples of the object desired.
However, as with variables of other languages, the name "banana" has no semantic
meaning, and it could be replaced in all its occurrences by "junk," "a," or "xyz."
Example 4.15: Suppose we wish to print the order number and quantity
ordered, for all orders for brie. We can express this query in domain calculus
as
{A1A2 | includes(A1, "Brie", A2)}
and in QBE as
INCLUDES | O#     | ITEM | QUANTITY
         | P._123 | Brie | P.
Here variable _123 replaces A1. We could have omitted _123 altogether, since
it appears only once. We have taken our option not to create a variable for A2,
since it also appears only once.
Let us consider another query: print the name, address, order number, and
date for all current orders. In domain calculus this query is:

    {A1A2A3A4 | (∃A5)(customers(A1, A2, A5) ∧ orders(A3, A4, A1))}    (4.10)

As no term has all the A's, we call for a new table skeleton, as well as the
skeletons of CUSTOMERS and ORDERS. The query is shown in Figure 4.10.
CUSTOMERS | NAME   | ADDR  | BALANCE
          | _Snake | _Rock |

ORDERS | O#   | DATE   | CUST
       | _123 | _today | _Snake

       | P._Snake | P._Rock | P._123 | P._today

Figure 4.10 Print the name, address, order number, and date of all current orders.

It would also have been permissible to write the unnamed relation of Figure
4.10 as

    P. | _Snake | _Rock | _123 | _today
that value changes as we allow tuple variables to range over all tuples, as in the
implementation procedure just described.
Example 4.16: The query of Figure 4.11(a) asks for all supplier-item-price
triples for which the price is at least a dollar. Figure 4.11(b) asks for all items
such that at least one supplier sells the item at a price greater than the lowest
price for Perrier.
SUPPLIES | NAME | ITEM | PRICE
P.       |      |      | >= 1.00

(a)

SUPPLIES | NAME | ITEM    | PRICE
         |      | P.      | _x
         |      | Perrier | < _x

(b)

Figure 4.11 Queries using inequalities.
The query of Figure 4.11(b) is implemented by creating tuple variables μ
and ν for the two rows of the skeleton. As we allow μ and ν to range over
the various tuples in SUPPLIES, we check for matches. Tuple μ must have
some PRICE component, which defines a value for _x. For example, _x = 3.49
when μ is the first tuple of SUPPLIES in Figure 4.2(d). We then look at the
PRICE and ITEM components of ν. If ν[PRICE] is less than the value of _x,
and ν[ITEM] is "Perrier," then we have a match. We therefore perform the
action indicated by tuple μ, that is, we print μ[ITEM]. For example, if μ is
the first tuple and ν the second tuple in the relation of Figure 4.2(d), then the
conditions are met, and we print "Brie," which is μ[ITEM].
Note that QBE, unlike QUEL, eliminates duplicates automatically. Thus
"Brie" would be printed only once, even though there are, in Figure 4.2(d), two
tuples for Brie and two for Perrier, and the price of Brie exceeds the price of
Perrier in all four combinations. D
Another way to designate a set is to use an entry that is part constant and
part variable. Juxtaposition represents concatenation, so if the domain for this
entry is character strings, we can try to match any constant character strings in
the entry to substrings of the string that forms the corresponding component
of some tuple. If we find such a match, we can assign pieces of the remainder
of the string to the variables in the entry.
Example 4.17: To print all the orders placed in January we could write
ORDERS | O# | DATE   | CUST
P.     |    | Jan_32 |
If the date component of a tuple begins with "Jan" then the remainder of that
date matches variable _32, and the entire tuple is printed. For the relation of
Figure 4.2(b), all tuples would be printed, since all are dated January. D
Negation of Rows
We may place the symbol ¬ in the first column (the column with the relation
name R) of any row. Intuitively, the query then requires that any tuple match
ing the row not be a tuple of R. We shall try to be more precise later, but first
let us consider an example.
Example 4.18: Suppose we wish to print the order or orders with the largest
quantity. We could use the aggregate function MAX., to be described later, but
we can also do it with a negation. Rephrase the query as: "print an order if
there is no order with a larger quantity." This condition is expressed in QBE
in Figure 4.12. D
INCLUDES | O# | ITEM | QUANTITY
         | P. |      | _x
¬        |    |      | > _x

Figure 4.12 Print orders such that no order has a larger quantity.
For the query of Figure 4.12, the outer loop is on tuple variable μ, and the
inner loop is on ν. For a fixed μ, we print μ[O#] only if, while considering all
values of ν, we never find a quantity larger than μ[QUANTITY]. If we followed
this procedure on the data of Figure 4.2(c), then when μ was any tuple but
the last, the quantity in ν, when ν was the last tuple, would be greater than
the quantity in μ. When μ is the last tuple, no value of ν, including the last
tuple, has a greater quantity, so μ[O#], which is 1026, would be the only order
number printed.
Aggregate Operators
QBE has the usual five aggregate operators, denoted SUM., AVG., MAX., MIN.,
and CNT. (count). There are two other operators, ALL. and UN. (unique) that
often are used in conjunction with aggregate operators. ALL. applied to a
domain variable produces the list of values that the variable takes on as we
run through all the tuples in the relevant relation. The list may have duplicate
elements; it is not the same as a set. Thus, the ALL. operator effectively leaves
duplicates in, while most other QBE operations eliminate duplicates.
Example 4.19: To compute the average balance of YVCB customers we write
CUSTOMERS | NAME | ADDR | BALANCE
          |      |      | P.AVG.ALL._x
The tuple variable μ for this row ranges over all customers, and for each one,
domain variable _x takes on the value μ[BALANCE].
The expression ALL . _x produces the list of values assumed by _x. Should
two customers have the same balance, that balance will appear twice. To com
pute the average balance, we want duplicates left in, or else balances appearing
in the tuples for two or more customers would receive less weight than they
deserve when we take the average.
The expression AVG.ALL._x then produces the average of all the elements
on the list that was produced by ALL._x. Duplicates are not eliminated prior
to taking the average, which we just argued is what we want in this example.
Finally, the P. causes the average to be printed. D
The operator UN . converts a list into a set, by eliminating duplicates.
Example 4.20: Suppose we wanted to know how many suppliers there are in
the YVCB database. If we (incorrectly) wrote
SUPPLIES | NAME         | ITEM | PRICE
         | P.CNT.ALL._x |      |
and applied it to the relation of Figure 4.2(d) we would get the answer 7, since
variable _x takes on a list of seven values, one for each tuple in the relation.
The correct way to pose the query is
SUPPLIES | NAME            | ITEM | PRICE
         | P.CNT.UN.ALL._x |      |
In this way, before counting the set of suppliers produced by the expression
ALL._x, the operator UN. removes duplicates. The value of the expression
UN.ALL._x is the set {"Acme", "Ajax"}. Then the operator CNT. computes
the size of this set, and P. prints the correct answer, 2. D
SUPPLIES | NAME | ITEM     | PRICE
I.       | Ajax | Escargot | .24
Notice that this query is implemented by a special case of the QBE implemen
tation rule. Since there are no tuple variables on which to loop, we simply
execute the insert operation once. The row to be inserted has no variables, so
the components of the inserted tuple are well defined.
If instead, Ajax wants to sell Escargot for the same price that Acme sells
them, we could retrieve Acme's price as we perform the insertion:
SUPPLIES | NAME | ITEM     | PRICE
I.       | Ajax | Escargot | _ripoff
         | Acme | Escargot | _ripoff
A tuple variable μ for the second row ranges over all tuples in SUPPLIES.
Assuming the data of Figure 4.2(d), the only value of μ that contains the
constants "Acme" and "Escargot" for NAME and ITEM, respectively, also has
.25 in its PRICE component. Thus, the value .25 is given to the variable _ripoff
when μ reaches this tuple. At that time, the insert action of the first row is
taken, with variable _ripoff bound to .25, so the tuple ("Ajax", "Escargot", .25)
is inserted into SUPPLIES. D
Updates
The update operation can only be understood if we are aware that the QBE sys
tem allows us to define key and nonkey attributes of relations, by a mechanism
to be discussed shortly. The set of key attributes must uniquely determine a tu
ple; that is, two different tuples in a relation cannot agree on all key attributes.
If we place the update (U . ) operator in the first column of a row, then entries
in key fields must match the tuple updated, and any tuple of the relation that
does match the row of the skeleton in the key attributes will have its nonkey
attributes updated to match the values in the row with the U. operator.
Example 4.22: In the SUPPLIES relation, NAME and ITEM are key at
tributes and PRICE is nonkey. That is, NAME and ITEM together form a key
for the relation. If Acme decides to lower its price for Perrier to one dollar, we
may update the YVCB database by:
SUPPLIES | NAME | ITEM    | PRICE
U.       | Acme | Perrier | 1.00
If Acme instead decides to lower all its prices by 10%, we can write:
SUPPLIES | NAME | ITEM  | PRICE
U.       | Acme | _spam | .9*_ripoff
         | Acme | _spam | _ripoff
Note the use of an arithmetic expression in the row to be updated. The use of
arithmetic is permitted where it makes sense, such as in rows to be updated or
inserted, and in "condition boxes," a concept to be described next. The execu
tion of the above command follows the general rules we have been following. A
tuple variable for the second row is allowed to range over all SUPPLIES tuples.
Whenever the tuple has supplier name Acme, the variable _spam gets bound to
the item, and _ripoff gets bound to the price. We then update the unique tuple
with NAME equal to "Acme" and ITEM equal to the value of variable _spam,
by changing the PRICE component to .9*_ripoff, that is, to 90% of its former
value. D
Condition Boxes
There are times when we wish to include a condition on a query, insertion,
deletion, or update that is not expressed by simple terms such as <3 in the
rows of the query. We can then call for a condition box to be displayed and
enter into the box any relationships we wish satisfied. Entries of a condition
box are essentially conditions as in a language like Pascal, but without the use
of the "not" operator, -'. Either AND or & can be used for logical "and," while
OR or | is used for "or." When the query is implemented, a match is deemed
to occur only when the current values of the tuple variables allow a consistent
assignment of values to the domain variables in the query, and these values also
satisfy the conditions.
Example 4.23: Suppose we want to find all the suppliers whose price for Brie
and Perrier together is no greater than $5.00. We can express this query with
a condition box, as shown in Figure 4.13. The two tuple variables μ and ν
range over all SUPPLIES tuples. When we find a pair of tuples with the same
supplier name, with μ[ITEM] equal to "Brie" and ν[ITEM] equal to "Perrier,"
the variables _x and _y get bound to the prices of these items charged by the
supplier in question. If the condition in the condition box is satisfied, i.e., the
sum of _x and _y is no more than five dollars, then consistent values of μ and
ν have been found, and we perform the print action indicated in μ. If the
condition box is not satisfied, then we do not have a match, even though μ and
ν agree on the value of the variable _bmw.
SUPPLIES | NAME   | ITEM    | PRICE
         | P._bmw | Brie    | _x
         | _bmw   | Perrier | _y

CONDITIONS
_x + _y <= 5.00

Figure 4.13 Suppliers whose Brie and Perrier together cost no more than $5.
For example, using the data of Figure 4.2(d), when _bmw has value
"Acme," the sum of _x and _y is 4.68, which satisfies the condition, so "Acme"
is printed. However, when the variable _bmw has value "Ajax," the sum is 5.07,
which does not satisfy the condition, and we do not print "Ajax." D
Completeness of QBE
As with the other languages we have studied, it appears simplest to prove
completeness by showing how to apply each of the five basic relational algebra
operations and store the result in a new relation. For instance, to compute
207
R
_al
_a2
_an
_bl
_b2
_bn
_al
_bl
_a2
_b2
T
I.
I.
...
_an
_bn
208
names and their attribute names. The second P . refers to the attribute names.
To insert a new relation REL into the table directory, type I . REL I . in the
upper left box and then type the attributes of REL along the top of the skele
ton. Again, the second I . refers to the attributes, while the first I . refers to
the relation name.
The attributes may be declared to have certain properties. These proper
ties are:
1. KEY, telling whether or not the attribute is part of the key (recall that
updates require the system to distinguish between key and nonkey fields).
The values of this property are Y (key) and N (nonkey).
2. TYPE, the data type of the attribute, such as CHAR (variable length
character string), CHAR(n) (character string of length n), FLOAT (real
number), or FIXED (integer).
3. DOMAIN, a name for the domain of values for this attribute. If a domain
variable in a query appears in two different columns, those columns must
come from the same domain. The system rejects queries that violate this
rule, a useful check on the meaningfulness of queries.
4. INVERSION, indicating whether an index on the attribute is (Y) or is not
(N) to be created and maintained.
Example 4.24: To create the SUPPLIES relation we might fill a table skeleton
with some of its properties, as shown in Figure 4.15. The first row indicates the
key for the relation; recall that NAME and ITEM together determine a unique
price, so the key for SUPPLIES is {NAME, ITEM}. The second row indicates
the data type for each ATTRIBUTE. We suppose that the NAME and ITEM
components are character strings, while PRICE is a real number, presumably
one that is significant to two decimal places.
In the row for domains we have indicated a distinct domain for each at
tribute. That would prevent us, for example, from asking a query about sup
pliers who provide an item with the same name as the supplier, because the
same variable would not be allowed to appear in the NAME and ITEM fields.
In the last row we have declared that there are no indices to be created. Recall
that an index on an attribute, such as NAME, allows us to find tuples with a
given name very fast; we do not have to search the entire relation. Particular
structures that could be used to create indices will be discussed in Chapter 6.
n
Views
QBE contains a delayed-evaluation feature similar to ISBL. When we wish to
create a view V, we insert V into the table directory as a relation, prefixing the
name V by the keyword VIEW. We then formulate in QBE the method whereby
V is to be calculated. V is not actually computed at the time. Rather, it is
I. SUPPLIES I. | NAME  | ITEM  | PRICE
I. KEY         | Y     | Y     | N
I. TYPE        | CHAR  | CHAR  | FLOAT
I. DOMAIN      | NAMES | ITEMS | AMOUNTS
I. INVERSION   | N     | N     | N

Figure 4.15 Declaring the SUPPLIES relation.
I. VIEW OI I. | NAME   | DATE   | ITEM     | QUANTITY
I.            | _Snake | _today | _hotdogs | _somuch

ORDERS | O#   | DATE   | CUST
       | _123 | _today | _Snake

INCLUDES | O#   | ITEM     | QUANTITY
         | _123 | _hotdogs | _somuch

The view OI can then be queried like any stored relation; for example:

OI | NAME       | DATE | ITEM | QUANTITY
   | Ruth Rhino | P.   | P.   | P.
which prints the date, item and quantity for everything ordered by Ruth Rhino.
The value of relation OI, or rather its relevant part (the tuples with NAME
equal to "Ruth Rhino") is computed from ORDERS and INCLUDES when
the above query is executed. D
4.6 THE QUERY LANGUAGE SQL
SQL, formerly known as SEQUEL, is a language developed by IBM in San
Jose, originally for use in the experimental database system known as System
R. The language is now used in a number of commercial database systems, and
in some cases, the entire database system is marketed under the name SQL.
The particular version we shall discuss here is SQL/RT, implemented for the
IBM PC/RT by Oracle Corp.
Because SQL is the most commonly implemented relational query lan
guage, we shall discuss it in more detail than the other languages of this chap
ter. This section discusses the query language. Section 4.7 covers the data
definition facilities of the SQL system, and Section 4.8 introduces the reader to
the way SQL's query language interfaces with a host language.
The Select Statement
The most common form of query in SQL is a select statement of the form:
    SELECT Ri1.A1, ..., Rir.Ar
    FROM R1, ..., Rk
    WHERE Ψ;
(4.11)
Here, R1, ..., Rk is a list of distinct relation names, and Ri1.A1, ..., Rir.Ar is
a list of component references to be printed; R.A refers to the attribute A of
relation R. If only one relation in the list following the keyword FROM has an
attribute A, then we may use A in place of R.A in the select-list.
Ψ is a formula involving logical connectives AND, OR, and NOT, and compari-
son operators =, <=, and so on, essentially as in QUEL. Later, we shall discuss
more general conditions that can appear in place of Ψ.
The meaning of query (4.11) is most easily expressed in relational algebra,
as:

    π_{Ri1.A1, ..., Rir.Ar}(σ_{Ψ'}(R1 × R2 × ··· × Rk))

That is, we take the product of all the relations in the from-clause, select
according to the where-clause (Ψ is replaced by an equivalent expression Ψ',
using the operators of relational algebra, i.e., ∧ in place of AND, and so on), and
finally project onto the attributes of the select-clause. Note the unfortunate
notational conflict: the keyword SELECT in SQL corresponds to what is called
"projection" in relational algebra, not to "selection."
Example 4.26: The query of Example 4.2, to list the customers with negative
balances, is expressed in SQL by:
SELECT NAME
FROM CUSTOMERS
WHERE BALANCE < 0;
Here, since there is only one relation in the from-clause, there can be no am
biguity regarding what the attributes refer to. Thus, we did not have to prefix
attributes by their relation names. However, we could have written
SELECT CUSTOMERS.NAME
if we wished, or similarly adorned BALANCE in the third line.
In either style, the result would be a one column relation whose attribute
is the one in the select-clause, that is, NAME. Had we wanted another header
for the column, we could have provided an alias for NAME by writing that alias
immediately after NAME in the select-clause, with no punctuation.9 Thus,
SELECT NAME CUSTOMER
FROM CUSTOMERS
WHERE BALANCE < 0;
prints the table
CUSTOMER
Zack Zebra
Judy Giraffe
Had we wished to print the entire tuple for customers with a negative balance,
we could have written
SELECT NAME, ADDR, BALANCE
FROM CUSTOMERS
WHERE BALANCE < 0;
or just
SELECT *
FROM CUSTOMERS
WHERE BALANCE < 0;
since R.* is SQL's way of saying "all the attributes of relation R." In this exam
ple, since CUSTOMERS is the only relation in the from-clause, we do not even
need to mention that relation, and hence used * instead of CUSTOMERS.*. D
Example 4.27: The query of Example 4.3, to print the suppliers of the items
Zack Zebra ordered, is expressed in SQL by the program of Figure 4.17. Here, we
take the natural join of ORDERS, INCLUDES and SUPPLIES, using equalities
in the where-clause to define the join, just as we did in QUEL in Example 4.6
or in QBE in Figure 4.9. The where-clause also contains the condition that the
customer be Zack Zebra, and the select-clause causes only the supplier name to
be printed.
SELECT NAME
FROM ORDERS, INCLUDES, SUPPLIES
WHERE CUST = 'Zack Zebra'
    AND ORDERS.O# = INCLUDES.O#
    AND INCLUDES.ITEM = SUPPLIES.ITEM;
Figure 4.17 Print the suppliers of an item ordered by Zebra.
We should notice the way attributes are referenced in Figure 4.17. CUST
and NAME unambiguously refer to attributes of ORDERS and SUPPLIES,
respectively, so they do not have to be prefixed by a relation name. However,
O# is an attribute of both ORDERS and INCLUDES, so its two occurrences
on the fourth line of Figure 4.17 have to be prefixed by the relations intended.
A similar handling of the two occurrences of ITEM appears on the last line.
One other nuance is that SQL, like most real query languages, does not
remove duplicates automatically. Thus, in the query of Figure 4.17, "Acme"
would be printed three times, because it supplies each of the three items ordered
by Zebra, and "Ajax" would be printed twice. To remove duplicates, we use
the keyword DISTINCT following SELECT; i.e., the first line of Figure 4.17 would
become
SELECT DISTINCT NAME
D
Tuple Variables
Sometimes we need to refer to two or more tuples in the same relation. To do
so, we define several tuple variables for that relation in the from clause and use
the tuple variables as aliases of the relation. The effect is exactly the same as
was achieved by the range-statement in QUEL, so SQL, which appeared at first
to be a "syntactically sugared" form of relational algebra, is now revealed to
resemble tuple relational calculus.
Example 4.28: The query of Example 4.7, to print the names and addresses
of customers whose balance is less than that of Judy Giraffe may be expressed:
SELECT cl.NAME, cl.ADDR
FROM CUSTOMERS cl, CUSTOMERS c2
WHERE cl.BALANCE < c2.BALANCE
    AND c2.NAME = 'Judy Giraffe';
Recall that in the style of SQL, a name followed with no punctuation by another
name makes the second be an alias for the first. Thus, the above from-clause
declares both cl and c2 to be aliases of CUSTOMERS, in effect making them
tuple variables that range over CUSTOMERS. With that understanding, the
above SQL program is only a syntactic variation on the QUEL program that
we gave in Example 4.7. D
Pattern Matching
In addition to the usual arithmetic comparisons in where-clauses, we can use the
operator LIKE to express the condition that a certain value matches a pattern.
The symbol % in character strings stands for "any character string," while the
underscore _ stands for "any one character."
Example 4.29: The following code prints those items that begin with "E."
SELECT ITEM
FROM SUPPLIES
WHERE ITEM LIKE 'E%';

The next program prints those orders whose number is in the range 1000-
1999, i.e., those whose order numbers are a "1" followed by any three characters.
For this code to make sense, we have to assume that order numbers are stored
as character strings, rather than integers.10

SELECT *
FROM ORDERS
WHERE O# LIKE '1___';
1. Find the set S1 of orders placed by Zebra, using the ORDERS relation.
2. Find the set S2 of items in the set of orders S1, using the INCLUDES relation.
3. Find the set S3 of suppliers of the items in set S2, by using the SUPPLIES
   relation.
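A query built along these lines nests one subquery inside another. The following
sketch shows the idea; the line numbering is an assumption, chosen so that the
scope discussion below can be followed:

    (1)  SELECT NAME
    (2)  FROM SUPPLIES
    (3)  WHERE ITEM IN
    (4)      (SELECT ITEM
    (5)       FROM INCLUDES
    (6)       WHERE O# IN
    (7)           (SELECT O#
    (8)            FROM ORDERS
    (9)            WHERE CUST = 'Zack Zebra'));

Each parenthesized select-from-where expression produces a set of values, and
the operator IN tests membership in that set.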
Line (3) is only in the scope of SUPPLIES, and it is to that relation that ITEM
on line (3) refers.
The occurrence of ITEM on line (4) might refer to SUPPLIES or IN
CLUDES, since it is in the scope of both. However, SQL follows a "most
closely nested" rule to resolve ambiguities, so the scope of INCLUDES, being
nested within the scope of SUPPLIES, yet including line (4), is deemed to be
the relation to which ITEM at line (4) refers. Had we wanted to refer to the
ITEM component of SUPPLIES anywhere within lines (4)-(9), we could have
said SUPPLIES. ITEM. Similar remarks apply to the occurrences of O# on lines
(6) and (7), which refer to the O# components of INCLUDES and ORDERS,
respectively, for the same reasons that ITEM refers to SUPPLIES on line (3)
and to INCLUDES on line (4).
The keyword ANY is used like an existential quantifier. If S is some expres-
sion denoting a set, then the condition

    A θ ANY S

is equivalent to the logical expression

    (∃X)(X is in S ∧ A θ X)

Presumably, A is an attribute, whose value is taken from some tuple of some
relation, S is a set defined by a subquery, and θ is an arithmetic comparison
operator. Similarly,

    A θ ALL S

means

    (∀X)(if X is in S then A θ X)
Example 4.31: We can print each item whose price is as large as any appearing
in the SUPPLIES relation by using a subquery to form the set S of all prices,
and then saying that the price of a given item is as large as any in the set S.
This query is shown in Figure 4.19. Notice that the scope rules described above
disambiguate which of the two uses of relation SUPPLIES [lines (2) and (5)]
the attribute PRICE refers to at lines (3) and (4). Line (3) is only in the scope
of the SUPPLIES of line (2), while at line (4), PRICE refers to the relation
with a PRICE attribute whose scope most closely surrounds line (4); that is
the relation SUPPLIES declared at line (5) and used in the subquery of lines
(4)-(5). Notice also that a where-clause is not essential in a query or subquery,
and this subquery creates a list of all prices by having a missing, or always-true,
where-clause. D
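The query of Example 4.31 might be written as follows (a sketch; the layout
and line numbering are assumptions chosen to match the references above):

    (1)  SELECT ITEM
    (2)  FROM SUPPLIES
    (3)  WHERE PRICE >= ALL
    (4)      (SELECT PRICE
    (5)       FROM SUPPLIES);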
If we are sure that the set of values produced by a subquery will be a
singleton, then we can treat it as an ordinary value, and it may appear in arith-
metic comparisons. However, if the data is such that the set S in a condition
like A = S is not a singleton, then the condition makes no sense, and an error
Aggregate Operators
SQL provides the usual five aggregate operators, AVG, COUNT, SUM, MIN, and MAX.
It also provides the operators STDDEV and VARIANCE to provide the standard
deviation and variance of a list of numbers. A select-from-where statement
can print the result of applying one or more of these aggregate operators to
the attributes of a single relation, by placing the relation in the from-clause
and placing in the select-clause the list of aggregate terms, agg_op(A), where
agg_op is an aggregate operator and A is an attribute. The where-clause may
have a condition Ψ, and if so, only those tuples that satisfy Ψ are included
in the computation of the aggregate. The keyword DISTINCT may precede the
attribute A in agg_op(A), in which case duplicates are eliminated before agg_op
is applied.
Example 4.33: Let us consider the queries of Examples 4.19 and 4.20, which
were to compute the average balance and the total number of suppliers in the
YVCB database. For the first of these we write
SELECT AVG (BALANCE)
FROM CUSTOMERS;
This query would print the average balance, which is 69 for the data of Figure
4.2(a). The column header would be AVG (BALANCE). If we wanted another
column header, say AV_BAL, we could specify an alias, as in:
SELECT AVG (BALANCE) AV_BAL
FROM CUSTOMERS;
For the query of Example 4.20, to count the number of suppliers, we can ex
amine the SUPPLIES relation but, recall from that example, we must eliminate
duplicates before we count. That is, we write
SELECT COUNT (DISTINCT NAME) #SUPPS
FROM SUPPLIES;
to print the number of different suppliers, in a column headed by #SUPPS.
If we wished to know only how many suppliers sell Brie, we could ask:
SELECT COUNT (NAME) #BRIE_SUPPS
FROM SUPPLIES
WHERE ITEM = 'Brie';
Note it is unnecessary to remove duplicates here, because the fact that a supplier
sells Brie appears only once, assuming {NAME, ITEM} is a key for SUPPLIES
in the YVCB database. D
Aggregation by Groups
As in QUEL, we can partition the tuples of a relation into groups and apply
aggregate operators to the groups individually. To do so, we follow the select-
from-where statement with a "group-by" clause, consisting of the keywords
GROUP BY and a list of attributes of the relation mentioned in the from-clause
that together define the groups. That is, if we have clause
    GROUP BY A1, ..., Ak

then we partition the relation into groups, such that two tuples are in the same
group if and only if they agree on all the attributes A1, ..., Ak. For the result
of such a query to make sense, the attributes A1, ..., Ak must also appear in
the select-clause, although they could be given aliases for printing, if desired.11
11 This situation is the only one where it is permitted to have both attributes of a relation
and aggregations of other attributes of the same relation appearing in the same select-
clause; otherwise, the combination of, say, NAME and AVG(BALANCE) from relation
CUSTOMERS does not make sense.
Example 4.34: Let us reconsider the query of Example 4.14, to print a table,
which was shown in Figure 4.7, of all the items and their average prices. In
SQL we write
SELECT ITEM, AVG (PRICE) AP
FROM SUPPLIES
GROUP BY ITEM;
The alias AP for AVG (PRICE) is used to conform with the table of Figure 4.7.
D
A where-clause can follow the from-clause if we wish only a subset of the
tuples to be considered as we form the groups. We can also arrange to have
only a subset of the groups printed, independently of any filtering that goes
on in the where-clause before we construct the groups. The keyword HAVING
introduces a clause that may follow the group-by clause. If we write
    GROUP BY A1, ..., Ak
    HAVING Ψ

then the condition Ψ is applied to each relation R_{a1,...,ak} that consists of the
group of tuples with values a1, ..., ak for attributes A1, ..., Ak, respectively.
Those groups for which R_{a1,...,ak} satisfies Ψ are part of the output, and the
others do not appear.
Example 4.35: Suppose we wanted to restrict the groups in the query of
Example 4.34 to those items that were sold by more than one supplier. We
could then write
SELECT ITEM, AVG (PRICE) AP
FROM SUPPLIES
GROUP BY ITEM
HAVING COUNT(*) > 1;
Recall that * stands for all the attributes of the relation referred to, which in
this case can only be SUPPLIES. Thus, COUNT(*) counts the distinct tuples,
but since it appears in a having-clause, it does so independently for each group.
It finds that only the groups corresponding to Brie and Perrier have more than
one tuple, and only these two groups have their averages printed. The resulting
output is a subset of the tuples of Figure 4.7, that is,
    ITEM     AP
    Brie     3.74
    Perrier  1.14

If we had wanted to consider only those groups with two or more distinct
prices, we could have used the following having-clause:

    HAVING COUNT(DISTINCT PRICE) > 1;

D
Instead of inserting one tuple at a time, we can replace the value-clause of
an insert-statement by a select-from-where statement that produces a relation
of values, say R. The arity of R must match the arity of the relation into which
insertion occurs.
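For instance, the effect of Example 4.10, adding ten pounds of Brie to every
order, could be obtained in SQL along the following lines (a sketch):

    INSERT INTO INCLUDES
    SELECT O#, 'Brie', 10
    FROM ORDERS;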
A deletion may use a subquery in the same way. For example, to delete all
orders that include Brie, we can write:

    DELETE FROM ORDERS
    WHERE O# IN
        (SELECT O#
         FROM INCLUDES
         WHERE ITEM = 'Brie');
Of course, most deletions will not need a subquery. If we wish to delete
a particular tuple, we simply specify all its values, or at least the values of its
key. For example, if Acme no longer sells Perrier, we can write:
DELETE FROM SUPPLIES
WHERE NAME = 'Acme'
AND ITEM = 'Perrier';
Update

The general form of an update command is

    UPDATE R
    SET A1 = E1, ..., Ak = Ek
    WHERE Ψ;

Every tuple of R that satisfies the condition Ψ has its value in each attribute
Ai replaced by the value of the expression Ei.
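For instance, the across-the-board price cut of Example 4.22, in which Acme
lowers all its prices by 10%, could be written in SQL along the following lines
(a sketch):

    UPDATE SUPPLIES
    SET PRICE = 0.9 * PRICE
    WHERE NAME = 'Acme';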
Completeness of SQL
As with QUEL and QBE, in order to simulate an arbitrary expression of rela
tional algebra in SQL, we must assume that a relation for each subexpression
has been defined.12 We then compute the relation for each subexpression, from
smallest to largest expression, culminating in the evaluation of the entire ex
pression. Thus, as with the other languages, we have only to show how to apply
the five basic operators of relational algebra.
Assume we have relations R(A1, ..., An) and S(B1, ..., Bm). In the case
that we need to take the union or difference of R and S, we also assume m = n
and Ai = Bi for all i. Of course, if the arities of R and S disagree, we cannot
take their union or difference. However, if we need to rename the attributes of
12 Creation of new relations is explained in the next section.
S, we can create a new relation Snew with the same attributes, A1, ..., An, as
R. We then copy S into Snew by:
INSERT INTO Snew
SELECT *
FROM S;
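In the same style, union and difference can be simulated once relations to hold
the results have been created. A sketch, assuming R(A, B) and S(A, B) and
result relations RUNION(A, B) and RDIFF(A, B) (the names and two-attribute
schemes are assumptions for illustration; duplicate elimination is ignored):

    INSERT INTO RUNION SELECT * FROM R;
    INSERT INTO RUNION SELECT * FROM S;

    INSERT INTO RDIFF
    SELECT * FROM R
    WHERE NOT EXISTS
        (SELECT *
         FROM S
         WHERE S.A = R.A AND S.B = R.B);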
Creation of Indices
Indices are used to speed up access to a relation. Recall that if relation R has
an index on attribute A, then we can retrieve all the tuples with a given value
a for attribute A, in time roughly proportional to the number of such tuples,
rather than in time proportional to the size of R. That is, in the absence of
an index on A, the only way to find the tuples μ in R such that μ[A] = a is
to look at all the tuples in R. We devote Chapter 6 to a discussion of data
structures that give indices the capability to focus on only the desired tuples.
For the moment, let us consider only how indices are created and used in SQL.
The basic index creation command is:
    CREATE INDEX I
    ON R(A);

The effect is to create an index named I on attribute A of relation R.
Example 4.41: We can say
    CREATE INDEX O#_INDEX
    ON ORDERS(O#);
to create an index on attribute O# of relation ORDERS. The name of the
index is O#_INDEX, and it allows the retrieval of the tuple for a given order
number in time that does not depend (significantly) on the size of the ORDERS
relation. D
An index can also enforce the condition that a certain attribute is a key.
If in Example 4.41 we had said
CREATE UNIQUE INDEX O#_INDEX
ON ORDERS (O#);
then the index O#_INDEX would not only speed up access given an order
number, but it would make sure, as tuples were inserted into ORDERS, that
we never had two tuples with the same order number.
It makes sense to use the UNIQUE keyword in the declaration of the index
on O# for ORDERS, but if we declared an index on O# for INCLUDES, we
would not want to declare it UNIQUE, because it is normal for orders to include
more than one item, and therefore several tuples in INCLUDES may have the
same order number.
To remove an index / from a relation R, without affecting the data in R
itself, we issue command
DROP INDEX I;
Views
A third group of commands of the SQL language functions as a subschema DDL,
or view definition mechanism. In general, we create a view by the command
    CREATE VIEW V(A1, ..., Ak) AS
        Q;

where V is the name of the view, A1, ..., Ak are its attributes, and Q is the
query that defines the view. The view V does not exist, but it can be queried,
and when we do so, V, or its relevant part, is constructed. To construct V, we
evaluate the query Q, and whatever tuples Q produces are the tuples in V.
Example 4.42: We can construct a view consisting of those items Acme sells
and their prices, by:
    CREATE VIEW ACME_SELLS(ITEM, PRICE) AS
        SELECT ITEM, PRICE
        FROM SUPPLIES
        WHERE NAME = 'Acme';
Since the attributes of the view ACME_SELLS are the same as the attributes
of the query that returns its tuples, we do not even have to list attributes for
the view, and the first line above could have been written simply:
CREATE VIEW ACME_SELLS AS
A second example is the view OI constructed in Example 4.25. This view is
the join of ORDERS and INCLUDES, with the common O# attribute projected
out. We can create this view as
CREATE VIEW OI (NAME, DATE, ITEM, QUANTITY) AS
SELECT CUST, DATE, ITEM, QUANTITY
FROM ORDERS, INCLUDES
        WHERE ORDERS.O# = INCLUDES.O#;

Note how the attribute CUST of ORDERS becomes NAME in view OI, because
we have chosen to specify attributes for that view explicitly. D
Finally, should we want to destroy a view V we say
DROP VIEW V;
This statement has no effect on the database, but queries on view V will no
longer be accepted.
Database Catalogs
There are four database catalogs, called TABLES, VIEWS, INDEXES, and
COLUMNS, and we may obtain information about the current database scheme
by issuing queries that refer to these catalogs as if they were relations. There
is only one major syntactic difference: the name of the table, view, etc., which
we might assume is an attribute of TABLES and VIEWS, respectively, is not
specified in a where-clause, but rather by appending the object name, in brack
ets, to the catalog name. We shall not enumerate all the attributes of the four
catalogs, but rather give some examples of the information available, and how
it is requested.
Suppose we wanted to find the definition of the view ACME_SELLS introduced
in Example 4.42. We could ask:
    SELECT VIEW$TEXT
    FROM VIEWS[ACME_SELLS];
(4.12)
For example, if we wanted to examine some of these attributes for the view
ACME_SELLS of Example 4.42 we would write:

    SELECT COL$NAME, COL$ID, COL$DATATYPE
    FROM COLUMNS[ACME_SELLS];
Much of the information for view ACME_SELLS is inherited from the declara-
tions we made when we created relation SUPPLIES in Example 4.40. In par
ticular, the data types of attributes ITEM and PRICE are inherited, because
they correspond to the attributes of the same names in the view definition.
Their order, as far as COL$ID is concerned, comes from the order in which
they appeared in the create-view statement. Thus, the information printed by
the above query is:
    COL$NAME    COL$ID    COL$DATATYPE
    ITEM        1         CHAR
    PRICE       2         NUMBER
We can use the TABLES catalog to find out the date on which a relation
such as SUPPLIES was created, by a query like:
    SELECT TAB$TIME
    FROM TABLES[SUPPLIES];
If we want to know the same thing about a view, we refer to the attribute
VEW$CTIME of catalog VIEWS.
Finally, we can query the catalog INDEXES to find out information about
the indices declared for a given relation. Some of the attributes of indices are:
1. IDX$NAME, the name of the index.
2. IDX$COLUMN, the attribute that is indexed.
3. IDX$UNIQUE tells whether the index is "unique," i.e., whether the at-
   tribute IDX$COLUMN serves as a key for the relation.
Thus, we could ask about the index O#_INDEX created in Example 4.41,
by:

    SELECT IDX$NAME, IDX$COLUMN, IDX$UNIQUE
    FROM INDEXES[ORDERS];
Since there is only the one index for ORDERS, the following single tuple:
    IDX$NAME    IDX$COLUMN    IDX$UNIQUE
    O#_INDEX    O#            NONUNIQUE
would be the only one printed.
4.8 EMBEDDING SQL IN A HOST LANGUAGE
We shall now sketch the way SQL/RT interfaces the SQL language with the
host language C. We try to avoid details of the C language itself, using a
"Pidgin" version that should make clear what functions the code written in
the host language is performing, without getting bogged down in the details
of C or of the UNIX operating system that surrounds it. While the interfaces
between other hosts and/or other database languages differ in many details,
the treatment given here is representative of the capabilities found in such
interfaces.
The process of creating an executable program prog that accesses an SQL
database is shown in Figure 4.21. We begin with a source program prog.pc, that
is mainly C code, but also includes special statements, each on a line beginning
EXEC SQL, that are translated by the SQL precompiler into C code, mostly
calls to library routines that perform the various SQL commands and pieces of
commands.
    prog.pc
       |
    Precompiler
       |
    prog.c
       |
    C Compiler
       |
    prog.o
       |
    Loader  <--  SQL Library
       |
    prog

Figure 4.21 Creating a program with embedded SQL.
Execute-Immediate Statements
The simplest way to have a C program influence an SQL database is to embed
within the C program an execute-immediate statement, of the form
    EXEC SQL EXECUTE IMMEDIATE S;

Here, S is an SQL statement that is not a query; i.e., S may not be a select-from-
where statement. For example, S might be a command to insert a particular
tuple into the ORDERS relation, as in:
EXEC SQL EXECUTE IMMEDIATE
INSERT INTO ORDERS
VALUES(1027, 'Jan 4', 'Sally Squirrel');
(4.13)
There is, however, little use in placing such a statement in a C program,
since every time the program is executed, the same tuple will be inserted into
ORDERS. What we really want is an application program that can be run
every time the YVCB accepts a new order. The program must therefore ask
the user for the order number,13 name, and date, place the user-supplied values
into C variables, which we call ordno, name, and date, and then execute the
statement:
EXEC SQL EXECUTE IMMEDIATE
INSERT INTO ORDERS
VALUES(:ordno, :date, :name);
Notice how the C variables preceded by colons are used exactly as constants
were used in (4.13).
Prepare-and-Execute
An alternative to immediate execution of statements is to prepare statements
prior to their execution, giving each a name known to the SQL precompiler only,
and then executing the statement, by referring to it by its name. The advantage
to this arrangement is that the time spent by the SQL system processing a
command occurs only once, when we prepare the statement, and executions of
the statement can then proceed more rapidly. In contrast, if we use execute-
immediate statements, the cost of processing the command is paid every time
the statement is executed.
The form of a prepare-statement is
    EXEC SQL PREPARE S FROM T;

Here, T is an SQL statement, which is still not permitted to be a query, and
S is the name chosen for this statement. T may be an SQL command written
out, perhaps with C variables, preceded by colons, in place of some constants.
T may also be the name of a C variable (again preceded by a colon) that is a
character string in which the command appears.14 Thus, we could write
EXEC SQL PREPARE stat FROM :com;
and store the text of the desired command in variable com prior to executing
the above statement. Subsequently, stat will refer to the statement that was in
com when the prepare-statement above was executed.
We may then execute a statement S by issuing a command of the following
form:
    EXEC SQL EXECUTE S USING :A1, ..., :Ak;

where A1, ..., Ak are the C variables that appear in the text from which S was
prepared.
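For instance, the ORDERS insertion above might be prepared once and exe-
cuted repeatedly, along the following lines (the statement name neworder is an
assumption):

    EXEC SQL PREPARE neworder FROM
        INSERT INTO ORDERS
        VALUES(:ordno, :date, :name);

    /* later, each time a new order is accepted: */
    EXEC SQL EXECUTE neworder USING :ordno, :date, :name;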
13 Perhaps the program would generate a new order number from a C variable representing
the "next order number," which in turn might come from a UNIX file accessible from
C programs, or from a one-tuple SQL relation accessible through SQL commands.
14 We would need a variable like com if we read commands from a terminal and executed
them; the matter is discussed further at the end of the section.
(1)    EXEC SQL PREPARE S FROM :com;
(2)    EXEC SQL DECLARE C CURSOR FOR S;
(3)    EXEC SQL OPEN C;
(4)    EXEC SQL WHENEVER NOT FOUND GOTO nomore;
(5)    while(1) {
(6)        EXEC SQL FETCH C INTO :A1, ..., :Ak;
(7)        /* do something with A1, ..., Ak */
       }
(8) nomore :
(9)
EXEC SQL CLOSE C;
(10)
/* Continue with program following query S */
Figure 4.24 Prepare-open-fetch-close pattern.
value of Ai. Line (7) suggests that something must happen with each tuple, and
in practice, line (7) will be replaced by code that accesses some of the variables
A1, ..., Ak, thereby using the tuple retrieved in some calculation.
We break out of the loop of lines (5)-(7) when line (6) fails to find a new
tuple, after each tuple of the answer has been retrieved. At that time, the
"whenever" clause of line (4) applies, taking us to line (8). Line (9) closes the
cursor C, so it can be reopened if we repeat this query, and we then continue
with the program after the query. A small technical note is that lines (8) and
(9) may not be combined, because the EXEC SQL must be the first characters,
other than white space, on any line in which it appears.
Example 4.45: Let us write a program to determine the total number of pounds
of Brie on order. Of course we could do this job with an ordinary SQL command:
SELECT SUM(QUANTITY)
FROM INCLUDES
WHERE ITEM = 'Brie';
but the problem will still serve for an illustration. The program is shown in
Figure 4.25.
Notice that the declaration of variable sum does not have to appear in the
SQL declare section, because it is not used as an interface variable. Also, ==
is C's equality operator, and += is an accumulation operator; sum += quant is
what would be written
sum := sum + quant
in most other languages. Procedure equalstrings, not written here, tests
whether two strings are identical. D
printsum:
EXEC SQL CLOSE cur;
write("Amount of Brie ordered = "); write(sum);
Figure 4.25 Printing the amount of Brie ordered.
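The earlier part of the program of Figure 4.25 declares the interface variables,
opens a cursor over INCLUDES, and accumulates the Brie quantities. A sketch
of how it might look follows; the cursor name cur matches the close-statement
above, but the variable names item and quant and the exact declarations are
assumptions consistent with the discussion:

    EXEC SQL BEGIN DECLARE SECTION;
        char item[20];
        int quant;
    EXEC SQL END DECLARE SECTION;
    int sum = 0;

    EXEC SQL DECLARE cur CURSOR FOR
        SELECT ITEM, QUANTITY FROM INCLUDES;
    EXEC SQL OPEN cur;
    EXEC SQL WHENEVER NOT FOUND GOTO printsum;
    while(1) {
        EXEC SQL FETCH cur INTO :item, :quant;
        if (equalstrings(item, "Brie"))
            sum += quant;
    }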
EXERCISES
4.1: Suppose we have the beer drinkers' database from Example 3.6 with rela
tions
FREQUENTS(DRINKER, BEER)
SERVES(BAR, BEER)
LIKES(DRINKER, BEER)
Write the following queries in (i) ISBL (ii) QUEL (iii) Query-by-Example
(iv) SQL.
a) Print the bars that serve a beer drinker Charles Chugamug likes.
b) Print the drinkers that frequent at least one bar that serves a beer
that they like.
* c) Print the drinkers that frequent only bars that serve some beer that
     they like (assume each drinker frequents at least one bar).
* d) Print the drinkers that frequent no bar that serves a beer that they
     like.
* 4.2: Write in (i) QUEL (ii) Query-by-Example (iii) SQL:
       a) The DRC expression of Exercise 3.10.
       b) The TRC expression of Exercise 3.9.
4.3: Using (i) QUEL (ii) Query-by-Example (iii) SQL, write programs to per-
     form the following operations on the beer drinkers' database of Exercise
     4.1.
a) Delete from SERVES all tuples for Potgold Beer.
b) Insert the fact that drinker Charles Chugamug likes Potgold.
c) Insert the facts that Chugamug likes all beers served at the Bent Elbow
Bar and Grill.
4.4: Suppose that the beer drinkers' database has relation

         SELLS(BAR, BEER, AMOUNT)
Write in (i) QUEL (ii) Query-by-Example (iii) SQL queries to print the
a) Total amount of each beer sold.
b) Average amount of each beer sold per bar, excluding bars that do
not sell the beer.
* c) Maximum amount of each beer sold, provided at least two bars sell
the beer.
4.5: Suppose that we want a view of the beer drinkers' database
WHERE(DRINKER, BEER, BAR)
containing those tuples (d, b, r) such that drinker d likes beer b, bar r
serves beer b, and drinker d frequents bar r. Write in (i) ISBL (ii) Query-
by-Example (iii) SQL a view definition for this view.
4.6: Write or sketch a simple command interpreter that interfaces with the beer
drinkers' database through calls to SQL. The commands are of the forms
i) i <bar name> <beer name>, meaning "insert into SERVES the fact
   that the bar serves the beer."
ii) d <bar name> <beer name>, meaning "delete from SERVES the fact
    that the bar serves the beer."
iii) q bar <bar name>, meaning "print the beers served by the bar."
iv) q beer <beer name>, meaning "print the bars that serve the beer."
* 4.7: Suppose we have a relation
MANAGES(EMPLOYEE, MANAGER)
4.8:
4.9:
4.10:
4.11:
    SELECT OWNER
    FROM FSO
    WHERE FILE IN
        (SELECT FILE
         FROM FTD
         WHERE TYPE = 'tex');
FILE
TYPE
_foo
-bar
P.
DIRECTORY
_root
-root
_foo
-bar
cannot.
a) Show the Query-by-Example table directory entries for FSO and FTD.
   Invent suitable types and domains for the attributes.
b) Show the entries in the SQL database catalog TABLES for FSO
   and FTD. Indicate the values of the fields COL$NAME, COL$ID,
   COL$DATATYPE, COL$LENGTH, COL$SCALE, and COL$NULL.
   Where no value can be deduced, give suitable values, making them
   consistent with your choices for (a), when possible.
BIBLIOGRAPHIC NOTES
The notion of completeness for query languages is from Codd [1972b]. Kim
[1979] is a survey of relational systems, while Kent [1979] argues the inadequacy
of such systems. Greenblatt and Waxman [1978] compare several relational
languages for ease-of-use by naive users.
ISBL
Todd [1976] is the principal source of information.
QUEL
The description of QUEL given here is based on Stonebraker, Wong, Kreps, and
Held [1976] and Zook et al. [1977]. An overview of the surrounding INGRES
system can be found in Stonebraker [1980].
Query-by-Example
Development of the system is described in Zloof [1975, 1977]. A description of
the commercial version is in IBM [1978a].
SQL
A definition of the SQL language (formerly called SEQUEL) can be found in
Chamberlin et al. [1976]; earlier versions are described in Boyce, Chamberlin,
King, and Hammer [1975] (called SQUARE) and Astrahan and Chamberlin
[1975].
System/R, which included the original implementation of SQL, is surveyed
in Astrahan et al. [1976, 1979], Blasgen et al. [1981], and Chamberlin et al.
[1981]. The VM commercial implementation of SQL is covered in IBM [1984],
while the PC/RT version of Sections 4.6-4.8 is from IBM [1985a, b].
AWK
There is a UNIX tool called AWK that we have not covered here, but which,
along with the join command of UNIX can serve as a rudimentary relational
database system for small files. See Aho, Kernighan, and Weinberger [1979,
1988].
View Update
An unresolved technical problem for relational database systems is how one
properly translates update operations on views into operations on the actual
database relations. Dayal and Bernstein [1982] and Keller [1985] present tech
niques for managing part of this problem.
CHAPTER 5

Object-Oriented Database Languages
Records
What we called logical record types in Section 2.5 are referred to as record
types in the DBTG proposal. The fields in a logical record format are called
data items, and what we called logical records are known simply as records.
We shall use the terms "record" and "record type," since we are inclined to
drop the term "logical" anyway, when no confusion results. However, let us
continue to use "field," rather than "data item," since the latter term is rarely
used outside the DBTG proposal itself. The database can, naturally, contain
many occurrences of records of the same type. There is no requirement that
records of the same type be distinct, and indeed, record types with no fields
are possible; they would be used to connect records of other types, and in the
implementation, the seemingly empty records would have one or more pointers.
DBTG Sets
By an unfortunate turn of fate, the concept of a link, that is, a many-one
relationship from one record type to another, is known in the DBTG world as
a set. To avoid the obvious confusions that would occur should the term "set"
be allowed this meaning, many substitute names have been proposed; the term
DBTG set is a common choice, and we shall adopt it here.
When we have a many-one relationship m from records of type R2 to
records of type R1, we can associate with each record r of type R1 the set
Sr, consisting of those records s of type R2 such that m(s) = r. Since m is
many-one, the sets Sr1 and Sr2 are disjoint if r1 ≠ r2. If S is the name of the
DBTG set representing the link m, then each set Sr, together with r itself, is
said to be a set occurrence of S. Record r is the owner of the set occurrence,
and each s such that m(s) = r is a member of the set occurrence. Record type
R1 is called the owner type of S, and R2 is the member type of S.
The DBTG model requires that the owner and member types of a DBTG
set be distinct. This requirement produces some awkwardness, but it is consid
ered necessary because many DBTG operations assume that we can distinguish
the owner from members in a set occurrence. We can get around the require
ment by introducing dummy record types, as in the following example.
Example 5.1: Suppose we have a record type PEOPLE, which we would like
to be both the owner and member types of DBTG set MOTHER-OF, where the
owner record in a set occurrence is intended to be the mother of all its member
records. Since we cannot have PEOPLE be both the owner and member types
for MOTHER-OF, we instead create a record type DUMMY, with the following
DBTG sets.
1. IS, with owner DUMMY and member PEOPLE. The intention is that each
DUMMY record owns an IS set occurrence with exactly one PEOPLE
record. Thus, each DUMMY record d is effectively identified with the unique
PEOPLE record in the IS set occurrence that d owns.
RECORD EMPS
1 ENAME CHAR (20)
1 SALARY REAL;
RECORD DEPTS
1 DNAME CHAR (10)
1 DEPT# INTEGER;
RECORD SUPPLIERS
1 SNAME CHAR(10)
1 SADDR CHAR(50) ;
RECORD ITEMS
1 INAME CHAR (10)
1 ITEM# INTEGER;
RECORD ORDERS
1 O# INTEGER
1 DATE CHAR(10);
RECORD CUSTOMERS
1 CNAME CHAR(20)
1 CADDR CHAR (50)
1 BALANCE REAL;
RECORD ENTRIES
1 QUANTITY INTEGER;
RECORD OFFERS
1 PRICE REAL;
Figure 5.2 Record type declarations for YVCB database.
Example 5.3: The YVCB network database scheme was described in Example
2.24. There are eight record types. In Figure 5.2 we see the declarations for
each of these. D
We declare a DBTG set S, with owner type O and member type M, by the
following statement form:
DBTG SET S
OWNER IS O
MEMBER IS M;
Example 5.4: The YVCB database also has eight links, as indicated in Figure
2.16. In Figure 5.3 we see them listed with their owner and member types. D
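For concreteness, two of these links would be declared, in the statement form
just given, roughly as follows; the owner and member types are the ones implied
by the discussion of OFFERS records below, but the reader should take this
only as a sketch of what Figure 5.3 contains.

DBTG SET O_SUPPLIER
    OWNER IS SUPPLIERS
    MEMBER IS OFFERS;

DBTG SET O_ITEM
    OWNER IS ITEMS
    MEMBER IS OFFERS;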
1. The NAME and ITEM fields would waste space, because they duplicated
data that could be obtained without them. That is, given an OFFERS
record, we could find the value of NAME by finding the owner of that
record according to the O_SUPPLIER link and taking the SNAME field
of the owner record. Similarly, we could find the owner of the OFFERS
record according to the O_ITEM link, and take the INAME field of that
record in place of the ITEM field of the OFFERS record.
2. There is a potential for inconsistency. Perhaps, because of careless up
dating of the database, when we follow the links described in (1) to get
SNAME or INAME values, they do not agree with the NAME and ITEM
fields of the OFFERS record.
The way the DBTG proposal copes with the problems of redundancy and
potential inconsistency is to allow us to declare virtual fields, which are fields
defined to be logically part of a record, but not physically present in the record.
Rather, when we declare the virtual field, we define a source for the field, which
is a field of some owner record. When we refer to the virtual field in a query, the
database system obtains its value by following a link to the proper owner record
and obtaining the source field from that record. By having only one physical
copy of each field, we not only save space, but we also render impossible the
inconsistency mentioned in (2) above. Of course, we trade increased access time
for the privileges of consistency and space conservation, since instead of finding
the virtual field in the record where we imagine it resides, we have to go to the
database to obtain the source field from another record.
Example 5.5: If we wished to have virtual NAME and ITEM fields in OFFERS
records we could have defined that record type by the DDL code in Figure 5.4.
Note that we use the notation A.B for field B of record type A.¹ D
RECORD OFFERS
1 PRICE REAL
1 NAME VIRTUAL
SOURCE IS SUPPLIERS.SNAME OF OWNER OF O_SUPPLIER
1 ITEM VIRTUAL
SOURCE IS ITEMS.INAME OF OWNER OF O_ITEM;
Figure 5.4 Virtual fields for OFFERS records.
Incidentally, the reader should note that each of the models discussed has a
method, roughly equivalent to "virtual fields," for solving the redundancy and
consistency problems. The virtual record types used in the hierarchical model
are quite similar in spirit to the virtual fields of the DBTG proposal, and they
serve the same purpose. The object model provides the same facility, since an
object O1 that is part of another object O2 never appears physically within
O2. Rather, O1 is pointed to, or referenced by, O2. In Chapter 7 we shall see
how these problems are dealt with in the relational model through the schema
design process known as "normalization."

¹ The DBTG proposal uses the notation B IN A for the more common A.B.
View Definition
The DBTG proposal calls for a subschema data definition language, in which
one can define views. In a view, one is permitted to use a different name for
any record type, field, or DBTG set. We can omit from the view fields that are
present in a record type, we can eliminate record types altogether, and we can
eliminate DBTG sets from the view.
As the view facility of the DBTG proposal contains no concepts not present
in the data definition language for the conceptual scheme, we shall, in the
following sections, write programs that act on the conceptual scheme directly,
as if it were a complete view of itself. Thus, views play no role in what follows.
5.2 THE DBTG QUERY LANGUAGE
In this section we shall consider the query aspects of the DML that is defined by
the CODASYL proposal. The next section covers the commands that update
the database.
In the DBTG approach, all programs are written in a host language
(COBOL in the DBTG proposal) augmented by the commands of the data
manipulation language, such as FIND (locate a described record), GET (read a
record from the database), and STORE (put a record into the database). This
arrangement is essentially the one illustrated in the second column of Figure
1.4, although statements of the extended language are not marked explicitly for
a preprocessor as they were in Figure 1.4.
The Program Environment
Figure 5.5 The program environment (currency pointers; program variables; workspace).
Currency Pointers
As a program runs, it is necessary for it to locate various records by a FIND
command, and to operate upon them by other commands. To keep track of
recently accessed records, a collection of currency pointers is maintained auto
matically by the database system, and the values of these pointers are made
available to the program. The currency pointers with which we deal are:
1. The current of run-unit. The term "run-unit" means "program" in the
DBTG proposal. The most recently accessed record, of any type whatso
ever, is referenced by a currency pointer called the "current of run-unit."
2. The current of record type. For each record type T, the most recently
accessed record of this type is referred to as the "current of T."
3. The current of set type. For each DBTG set S, consisting of owner record
type T1 and member record type T2, the most recently accessed record of
type T1 or T2 is called the "current of S." Note that sometimes the current
of S will be an owner, and sometimes it will be a member. Also understand
that the current of S is a record, rather than a set occurrence. Sometimes
it is convenient to talk of the set occurrence containing the record "current
of S" as if this set occurrence itself were the "current S occurrence," but
there is no such thing as a pointer to a set occurrence.
Example 5.6: Suppose that the data about suppliers from the relation of
Figure 4.2 is now represented according to the network of Figures 5.2 and 5.3.
In particular, let us focus on the set occurrence of the O_SUPPLIER set owned
by Ajax, in which the Ajax SUPPLIERS record owns three OFFERS records,
corresponding to items Brie, Perrier, and Endive. Each of these is owned by
an ITEMS record, according to the O_ITEM DBTG set. If we assume that the
virtual fields described in Figure 5.4 are not present in OFFERS records, then
to find the items supplied by Ajax, we must visit each OFFERS record Ajax
owns. Only the prices are found in OFFERS records, but they are linked in
rings to their owner according to the O_ITEM set, and by following that link we
can find the owning item, which is one of the items sold by Ajax. The structure
is suggested by Figure 5.6.
step, to the record 3.98, that record becomes the current of run-unit, the current
OFFERS record, and the current of both the O_SUPPLIER and O_ITEM sets;
other currency pointers are not changed. The history of the currency pointers
is summarized in Figure 5.7. D
Figure 5.7 The history of the currency pointers.
Reading a record from the database to the workspace is a two-stage process.
First, using a sequence of FIND statements, we locate the desired record; that
is, the desired record must become the current of run-unit. At this point,
nothing has been copied into the template for the record type. To copy the
record into the template in the workspace, we simply execute the command
GET. This command always copies the current of run-unit into the template for
whatever record type is the current of run-unit. If we wish to copy only a subset
of the fields of the current of run-unit, we can list the desired fields after GET,
as in
GET <record type>; <list of fields>
Example 5.7: Suppose that the OFFERS record type is defined as in Figure
5.4, with the virtual fields NAME and ITEM, as well as the field PRICE. If the
current of run-unit is an OFFERS record, we can read the ITEM and PRICE
fields by:
GET OFFERS; ITEM, PRICE
The NAME field in the template for OFFERS is not affected.
Notice that even though ITEM is a virtual field of OFFERS, we can pro
gram as though it actually existed. We rely on the system to get the correct
value from the ITEMS.INAME field of the owner of the OFFERS record in its
O_ITEM set occurrence. D
For debugging purposes, we can append the record type to the command
GET, even if we want all fields of the record. For example
GET OFFERS
will copy the current of run-unit into the OFFERS template, if the current of
run-unit is an OFFERS record. Otherwise, the system will warn the user of an
error when the GET OFFERS command is executed. Let us emphasize that one
cannot use GET to read a record other than the current of run-unit, even if we
follow GET by the type of that record.
strategy. The variety of FIND statements is extensive, and we shall here con
sider only the following useful subset of the possibilities.
1. Find a record given its database key, i.e., a pointer to the record.
2. Find a record given a value for its CALC-key.
3. From the file of records of a given type, find (one-at-a-time) all the records
   with a given value in the CALC-key field or fields.
4. Visit all the members of a set occurrence in turn.
5. Scan a set occurrence for those member records having specified values in
   certain of the fields.
6. Find the owner of a given record according to a given DBTG set.
7. Find the current of any record or DBTG set.
Let us now introduce the commands for executing the FIND statement. We
shall use a "Pidgin" version of the DBTG data manipulation language throughout,
which differs from the proposal in two ways.
1. The proposal calls for many optional "noise words" in its syntax. We
   have arbitrarily chosen to include or exclude them, with an eye toward
   maximizing clarity.
2. We have inserted the words RECORD, SET, and other explanatory words,
   in certain places where they help to remind the reader of what the variables
   represent.
The first two kinds of FIND statement access records by a "key," either
the database key or the CALC-key. To access by database key we write:
FIND <record type> RECORD BY DATABASE KEY <variable>
where the <variable> is a variable in the workspace that has previously been
given a database key as value.
Example 5.8: We can store a database key into a variable of the workspace
by an instruction such as
XYZ := CURRENT OF ITEMS
Later, we could retrieve this particular ITEMS record by saying:
FIND ITEMS RECORD BY DATABASE KEY XYZ
D
To find a record given values for its CALC-key fields, we "pass" those values
to FIND by placing the values in the corresponding fields of the template; then
we issue the command
FIND <record type> RECORD BY CALC-KEY
Example 5.9: Suppose CUSTOMERS records have field CNAME as CALC-key.
Then we could find the balance for Zack Zebra by:
CUSTOMERS.CNAME := "Zack Zebra"
FIND CUSTOMERS RECORD BY CALC-KEY
GET CUSTOMERS; BALANCE
Note that CUSTOMERS.CNAME and CUSTOMERS.BALANCE could have
been written CNAME and BALANCE, respectively, as no ambiguity would
arise in our example database. D
gone around the ring already. If not, we "process" an order and move to the
next order.
To "process" an order, we must treat the ORDERS record as the owner of
an E-ORDER set occurrence, and scan each of the ENTRIES records in that set
occurrence by a similar loop, the inner while- loop. For each ENTRIES record,
we use a FIND OWNER command to reach the owner of that record according
to the EJTEM set. That owner is one of the items ordered by Zack Zebra, and
we print it. Note that duplicates, that is, items found on more than one order,
will be printed several times. D
Singular Sets
There are times when we would like to scan all the records of a certain type, for
example, to find all customers with negative balances. We cannot directly access
all the CUSTOMERS records by CALC-key or database key, unless we know
the name of every customer of the YVCB, or know all the database keys for
these records; both are unlikely situations. Scanning set occurrences for
CUSTOMERS records won't work either, unless we have some way of locating
every set occurrence of some DBTG set.
We may define, for a given record type, what is known as a singular DBTG
set. A singular set has two special properties.
1. The owner type is a special record type called SYSTEM. Having SYSTEM
   as the owner distinguishes singular DBTG sets.
2. There is exactly one set occurrence, and its members are all the records of
   the member type. The records are made members automatically, with no
   specific direction required from the user.
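For instance, to make every CUSTOMERS record reachable, we could declare a
singular set in the style of Section 5.1; the set name ALL_CUSTOMERS is ours
and is not part of the YVCB scheme.

DBTG SET ALL_CUSTOMERS
    OWNER IS SYSTEM
    MEMBER IS CUSTOMERS;

Scanning the lone set occurrence of ALL_CUSTOMERS then visits every
CUSTOMERS record, whether or not we know its CALC-key or database key.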
The last type of FIND we shall cover is a FIND statement whose purpose is to
make a current of record or set become the current of run-unit. The syntax is:
FIND CURRENT OF <set name> SET
or

FIND CURRENT OF <record type> RECORD
To store a new record of type T into the database, we create the record r in
the template for record type T and then issue the command
STORE T
This command adds r to the collection of records of type T and makes r be
the current of run-unit, the current of T, and the current of any DBTG set of
which T is the owner or member type.
As mentioned above, if T is the member type of any DBTG sets in which
it is declared to have automatic insertion, then r becomes a member of one set
occurrence for each of these sets; exactly which occurrences depends on the set
selection clauses that are part of the DDL database description.
The opposite of AUTOMATIC is MANUAL. If DBTG set S is declared
this way, then member records are not inserted into any set occurrence of S
when the records are stored, and we must "manually" insert records into set
occurrences of S by an INSERT command, to be discussed later in this section.
Set Selection
Granted that we have declared insertion of records of type T into set occur
rences of S to be AUTOMATIC, we need a mechanism for deciding which
set occurrence of S gets the new record. The STORE command itself cannot
specify the correct set occurrence. Rather, when we declare DBTG set S, we
include a SET SELECTION clause that tells how to select the set occurrence
of S into which a newly stored member record is to be placed. There are many
different ways in which the set occurrence could be chosen. We shall describe
only the two simplest kinds of set selection clauses. Remember that each of the
following statements belongs in the declaration for a set 5; i.e., they would be
added to declarations such as those of Figure 5.3. They are not part of the data
manipulation language. Also note that we use a "Pidgin" syntax to make the
meaning of the clauses more apparent.
1. SET SELECTION IS THRU OWNER USING <field list>. The owner of the set
   occurrence that receives the new record is the one whose CALC-key fields
   hold the values currently found in the listed fields of the owner's template.
2. SET SELECTION IS THRU CURRENT OF <set name> SET. The new record
   goes into the set occurrence of S containing the current of S at the time
   the STORE is executed.
Example 5.15: Suppose we wish to store ENTRIES records and insert them
automatically into E_ORDER and E_ITEM set occurrences when we do. If O#
is the CALC-key for ORDERS, we can use an order number to select the set
occurrence for E_ORDER, by including in the declaration for E_ORDER the
clause
SET SELECTION IS THRU OWNER USING O#
We might choose to select the E_ITEM occurrence through the owner identified
by INAME, but for variety, let us select the E_ITEM occurrence by placing
SET SELECTION IS THRU CURRENT OF E_ITEM SET
in the declaration of E_ITEM. The clause
INSERTION IS AUTOMATIC
must be placed in the declarations of both E_ORDER and E_ITEM. The program
in Figure 5.14 reads an order number, item, and quantity, creates an
ENTRIES record with the quantity, and stores that record into the database.
Because of the set-selection declarations we have made, this ENTRIES record
is automatically inserted into the correct set occurrences of E_ORDER and
E_ITEM. D
read O, I, Q
ORDERS.O# := O
FIND ORDERS RECORD BY CALC-KEY
/* establishes the correct current of E_ORDER */
ITEMS.INAME := I
FIND ITEMS RECORD BY CALC-KEY
/* establishes the correct current of E_ITEM */
ENTRIES.QUANTITY := Q
STORE ENTRIES /* new order is now the current of run-unit,
    but not a member of any set occurrences */
INSERT ENTRIES INTO E_ORDER, E_ITEM

Figure 5.15 Manual insertion of a new ENTRIES record.
the record removed must be the current of run-unit, not just the current of T.
Also, we are not permitted to execute the REMOVE statement if mandatory
retention has been specified for S.
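By analogy with the INSERT statement of Figure 5.15, we may picture the
REMOVE statement as having roughly the form

REMOVE ENTRIES FROM E_ORDER

although the exact wording shown here is only a sketch, not a quotation of the
proposal's syntax.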
Record Modification
The command
MODIFY <record type>
has the effect of copying the template for <record type> into the current of
run-unit. If the current of run-unit is not of the designated record type, it is
an error. We can also modify a selected subset of the fields in the current of
run-unit by writing
MODIFY <record type>; <field list>
If T is the record type for the current of run-unit, the values of the fields in the
list are copied from the template for T into the fields of the current of run-unit.
Other fields in the current of run-unit are unchanged.
Example 5.17: Suppose Ruth Rhino moves to 62 Cherry Lane. Assuming
that CNAME is the CALC-key for CUSTOMERS, we could change her CUS
TOMERS record by:
CUSTOMERS.CNAME := "Ruth Rhino"
FIND CUSTOMERS RECORD BY CALC-KEY
CUSTOMERS.CADDR := "62 Cherry Lane"
MODIFY CUSTOMERS; CADDR
the structure of hierarchies; the syntax is again a "Pidgin" language chosen for
clarity. In Chapter 6 we shall discuss some of the options for declaring physical
layout of hierarchies that IMS provides.
There are three essential features that define structure in a hierarchy: trees,
nodes (logical record types), and fields within nodes. We shall declare trees by:
TREE <name> <list of logical record types>
Each logical record type is then declared by:
RECORD <name> <information>
The information associated with records includes the following.
1. Fields. We shall use the same notation as in Section 5.1 for declaring fields
within a record type.
2. The position of the record type within the hierarchy. We use the word
ROOT to indicate the record type is at the root of the tree, and otherwise,
we shall include a clause
PARENT = <parent name>
to indicate the parent record type for the record type being declared.
3. Virtual record types present as fields within the record. We use the clause
VIRTUAL <record name> IN <tree name>
to indicate which node in which tree the pointer field points to.
There are a number of other aspects to the IMS data definition language.
They allow us to declare additional pointers, for example, to parent records, to
the leftmost child, or to the next record in a preorder traversal of the database.
Unlike the pointers resulting from virtual record type declarations, these point
ers are not accessible to the data manipulation commands, and are only used
by the query-processing algorithms to speed up access to data.
Figure 2.26 (repeated): the YVCB hierarchies, with roots DEPTS, CUSTOMERS,
and SUPPLIERS; starred children such as *ORDERS, *SUPPLIERS, and *ITEMS
are virtual record types.
Example 5.19: Let us express the hierarchy of Figure 2.26 in the above no
tation. (Figure 2.26 is repeated here, for convenience.) We use the same set
of fields for the logical record types as was found in Figure 5.2, when we de
clared the network structure for the YVCB database. To these we must add
the pointer fields representing virtual record types. The structure is shown in
Figure 5.16. For consistency with Figure 5.2, we use the record name OFFERS
for the node in Figure 2.26 that has a PRICE and a virtual ITEMS field, and we
use ENTRIES for the node consisting of a QUANTITY and a virtual ITEMS
field. D
TREE DEPTS-TREE
RECORD DEPTS ROOT
1 DNAME CHAR(10)
1 DEPT# INTEGER
RECORD EMPS PARENT=DEPTS
1 ENAME CHAR(20)
1 SALARY REAL
RECORD MGR PARENT=DEPTS
VIRTUAL EMPS IN DEPTS-TREE
RECORD ITEMS PARENT=DEPTS
1 INAME CHAR(10)
1 ITEM# INTEGER
RECORD VIRT-ORDERS PARENT=ITEMS
VIRTUAL ORDERS IN CUST-TREE
RECORD VIRT-SUPPS PARENT=ITEMS
VIRTUAL SUPPLIERS IN SUPPS-TREE

TREE CUST-TREE
RECORD CUSTOMERS ROOT
1 CNAME CHAR(20)
1 CADDR CHAR(50)
1 BALANCE REAL
RECORD ORDERS PARENT=CUSTOMERS
1 O# INTEGER
1 DATE CHAR(10)
RECORD ENTRIES PARENT=ORDERS
1 QUANTITY INTEGER
VIRTUAL ITEMS IN DEPTS-TREE

TREE SUPPS-TREE
RECORD SUPPLIERS ROOT
1 SNAME CHAR(10)
1 SADDR CHAR(50)
RECORD OFFERS PARENT=SUPPLIERS
1 PRICE REAL
VIRTUAL ITEMS IN DEPTS-TREE
GET LEFTMOST CUSTOMERS
WHERE CUSTOMERS.BALANCE < 0          (5.1)
finds the leftmost customer record whose BALANCE field has a negative value.
D
Order of Records
To understand what "leftmost" means in this context, recall that for each tree
of a database scheme, such as Figure 2.26, there is a collection of trees, one
for each database record in the current instance of the scheme. A database
record consists of one record of the root type and all its descendant records in
the database, as we discussed in Section 2.6. The order of database records of
a given type might be a sort based on some key field, or it might be random,
perhaps based on a hash function; the actual order depends on the physical
structure chosen when the database scheme is declared. Whatever order of the
database records there is, getting the "leftmost" means getting the first eligi
ble database record in this order. For example, the "leftmost" CUSTOMERS
database record might be the one whose customer name comes first in alpha
betical order. If there is a condition to be satisfied, such as BALANCE<0 in (5.1),
then we would scan CUSTOMERS records in their order, until we find one
meeting the condition.
If we are looking for a record type R that is not the root, then we examine
the database records in order from the left. Within a tree, the nodes have a
natural "from the left" order, with order among records of the same type that
are children of the same node determined in some specified manner, e.g., sorted
according to some key value, or random. We consider all records of type R, in
this order, until we find one meeting the conditions of the where-clause.
Example 5.21: Referring again to the tree CUST-TREE, we could ask:
GET LEFTMOST ORDERS
WHERE CUSTOMERS.BALANCE < 0
AND ORDERS.DATE = "Jan 3"
The effect of this query is that customer database records are examined, in
order "from the left," until we find one that has a root CUSTOMERS record
with a BALANCE less than zero and an ORDERS child with a DATE field of
"Jan 3." To find this ORDERS record we may skip over many database records
with BALANCE less than zero. If the desired database record has two or more
ORDERS children with the date "Jan 3," we stop at the leftmost such child.
We can access virtual record types as if they were physically present in
the database at the point where the virtual record (pointer) appears. Thus,
we could imagine that the ITEMS children of ORDERS in the CUST-TREE
database records are physically part of that tree, and treat them as grandchil
dren of CUSTOMERS records. Thus, we could write
GET LEFTMOST ITEMS
WHERE ORDERS.O# = 1024
to find the first item on order 1024. To execute this command, we scan all of the
CUST-TREE database records in order, until we find one with an ORDERS
child having order number 1024. Then we find the first (virtual) ITEMS child
of this ORDERS record. Note that the "ITEMS child" of an ORDERS record is
technically part of an ENTRIES record, which includes a physical field, QUAN
TITY, as well as the fields of a virtual ITEMS record.
As a final variant, we could use, instead of a constant, 1024, an order
number that was read by the host language and passed to the GET command,
in a host-language variable we shall call order.
read order
GET LEFTMOST ITEMS
WHERE ORDERS.O# = order
D
Scanning the Database
Another version of the GET command allows us to scan the entire database
for all records satisfying certain conditions. We use the word NEXT in place of
LEFTMOST to cause a scan rightward from the last record accessed (i.e., from the
"current record" ) until we next meet a record of the same type satisfying the
conditions in the GET NEXT statement. These conditions could differ from
the conditions that established the "current record," but in practice they are
usually the same.
Example 5.22: Suppose we want to find all the items ordered by Zack Zebra.
We again go to CUST-TREE, and we execute the program of Figure 5.17.
In principle, we examine all the customer database records, but only the one
with CNAME equal to "Zack Zebra" will satisfy the condition in the where-clause.
We find the first item in the first order placed by Zebra, with the GET
LEFTMOST statement. Then we scan to the right, from order to order, and
within each order, from item to item, printing the name of each item.
Eventually, we find no more items that Zebra ordered. At that time, the
variable FAIL will become true, indicating that the GET NEXT statement has
failed to find a record. It is also possible that there are no items ordered by
Zebra, in which case the initial GET LEFTMOST statement will cause FAIL to
become true. In general, any GET statement that does not find a record sets
FAIL to true. D
GET LEFTMOST ITEMS
WHERE CUSTOMERS.CNAME = "Zack Zebra"
while ¬FAIL do begin
print ITEMS.INAME
GET NEXT ITEMS
WHERE CUSTOMERS.CNAME = "Zack Zebra"
end

Figure 5.17 Finding the items ordered by Zack Zebra.
parent; the former searches rightward for any record occurrence such that it
and its ancestors satisfy the associated conditions.
Example 5.23: Another way to print all the items ordered by Zebra is to find
the root of his database record by
GET LEFTMOST CUSTOMERS
WHERE CUSTOMERS.CNAME = "Zack Zebra"
This statement makes the customer record for Zebra be the "current parent"
as well as the "current record" for the CUST-TREE tree. Then, we scan all
the ITEMS descendants of this one record by repeatedly executing GET NEXT
WITHIN PARENT, as shown in Figure 5.18. We never look for any item that is
not a descendant of the customer record for Zebra, even though there is no
WHERE CUSTOMERS.CNAME = "Zack Zebra"
clause constraining us from jumping to other database records. Notice that the
"parent" record in this case is really a grandparent of the ITEMS records being
found. The "current parent" remains the Zack Zebra record, since all retrievals
but the first use GET NEXT WITHIN PARENT. D
GET LEFTMOST CUSTOMERS
WHERE CUSTOMERS.CNAME = "Zack Zebra"
GET NEXT WITHIN PARENT ITEMS
while ¬FAIL do begin
print ITEMS.INAME
GET NEXT WITHIN PARENT ITEMS
end
Figure 5.18 Using get-next-within-parent.
Insertions
An INSERT command, for which we use the same "Pidgin" syntax as for the
varieties of GET, allows us to insert a record of type S, first created in the
workspace, as a child of a designated record occurrence of the parent type for
S. If the "current record" is either of the parent type for 5, or any descendant
of the parent type, simply writing
INSERT S
will make the record of type S sitting in the workspace a child of that occurrence
of the parent type that is the current record or an ancestor of the current record.
The position of the new child among its siblings is a matter to be declared
when the database scheme is specified. We shall not discuss the syntax for
specifying order, but the options include making each record the rightmost or
leftmost child of its parent at the time it is inserted, or keeping children in
sorted order according to a key field or fields.
If the desired parent record is not the current record or an ancestor of
that record, we can make it be so by including a where-clause in the INSERT
statement, with syntax and meaning exactly as for the GET statement.
Example 5.24: If the Produce Department starts selling Cilantro, which we
give product number 99, we can insert this fact into the database by the steps of
Figure 5.19. If the Produce Department's DEPTS record, or some descendant
of that record such as the EMPS record for an employee of the Produce Depart
ment, were already the current record, then we could omit the where-clause of
Figure 5.19. D
ITEMS.INAME := "Cilantro"
ITEMS.ITEM# := 99
/* the above assignments take place in the ITEMS
    template of the workspace */
INSERT ITEMS
WHERE DEPTS.DNAME = "Produce"
Figure 5.19 The Produce Department now sells Cilantro.
REPLACE
the version of the current record in the workspace replaces the corresponding
record in the database.
Example 5.25: Suppose we wish to double the amount of Brie on order number
1024. We first get the ENTRIES child of the ORDER record for 1024, and hold
it. Then we double the QUANTITY field of the record, in the workspace, and
finally store the new ENTRIES record by a REPLACE command. The steps are:
GET HOLD LEFTMOST ENTRIES
WHERE ITEMS.INAME = "Brie" AND ORDERS.O# = 1024
ENTRIES.QUANTITY := 2 * ENTRIES.QUANTITY
REPLACE
As another example, we can delete order 1024 by the following code.
GET HOLD LEFTMOST ORDERS
WHERE ORDERS.O# = 1024
DELETE
The effect of this sequence of steps is to delete not only the ORDERS record
for 1024, but all the ENTRIES children of that record. Those records include
the QUANTITY field and the pointer to an ITEMS record that represents the
virtual ITEMS child of ENTRIES. We do not, of course, delete any ITEMS
records or any of their children. D
5.6 DATA DEFINITION IN OPAL
The object-oriented language OPAL presents a contrast to all of the languages
we have studied in this chapter and the previous one. OPAL is the language
of the Gemstone database system marketed by Servio Logic Corp. Its data
definition and data manipulation facilities are present in one language, whose
style borrows heavily from the language Smalltalk. We shall sketch the most
important features of this language, and then discuss the way the language
lets us define database schemes. The next section talks about other aspects of
OPAL that are more important for data manipulation.
Classes
A class is an abstract data type, consisting of
1. A data structure, and
2. A collection of operations, called methods, that may be applied to the
   instances of the class.
In Section 2.7 we saw a simple example of a language for defining data structures
for classes; OPAL has a considerably more general mechanism. The way one
defines methods in OPAL will be discussed below.
Objects that are members of a given class C are instances of C; they each
have the data structure of that class, and the methods for class C can be applied
to any instance of C.
Classes are arranged in a hierarchy. If class C is a descendant of class D,
then the methods of class D can (usually) be applied to instances of C, but not
vice versa. The details of subclass creation and inheritance of methods will be
described near the end of this section.
Methods
A procedure in OPAL is called a method. Each method is defined for a particu
lar class, and it applies to instances of that class and instances of its subclasses,
if any. The form of a method definition is:
method: <class name>
<message format>
<body of method>
%
The <class name> is the class to which the method applies. The <message
format> is the name of the method and/or the names of its parameters, and
the body is the code that is executed whenever the method is called.
The format of messages requires some explanation. In the simplest form,
a message format consists of only a name for the method. Such a method has
only one argument, the receiver object to which the method is applied. The
receiver of a method always appears to the left of the method's name, and the
receiver must be an instance of the class for which the method is defined.
Example 5.26: Any class understands the built-in (predefined) method new.
The message format for this method is simply the word new. For our first OPAL
examples, let us draw upon Example 2.31, where we defined certain types of
structures, roughly equivalent to OPAL classes, that were used to build the
YVCB database. For example, ItemType is a class, and if we wished to create
a new instance of that class, i.e., a record for an item, we could say:
ItemType new
This OPAL statement produces a new instance of the ItemType class. As it
is, nothing is done with that instance, but we could assign it to a variable, say
NewItem, by the statement
NewItem := ItemType new
Here, the receipt of the message new by the class name ItemType causes a new
instance of that class to be generated, and the assignment symbol : = causes the
variable on the left of the assignment to become a name for that instance. D
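For reference, the method and the statement (5.2) examined in the next
paragraphs have roughly the following shape; this is a sketch reconstructed from
that discussion, and the line numbering and exact wording are ours rather than
those of Figure 5.20.

(1) method: ItemType
(2) checkItem: i
(3)     (self getName = i)
(4)         ifTrue: [^self getNumber]
(5)         ifFalse: [^'error: wrong item name']
%

CurrentItem checkItem: 'Brie'                                    (5.2)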
Then the formal parameter i would have the value "Brie" when we executed
the code of Figure 5.20 on the object that CurrentItem represents.
Line (3) introduces the special object designator self, which always denotes
the receiver of the method. That is, when executing (5.2), self denotes the
same object that CurrentItem denotes. Notice that without the word self, we
would have no way to refer to the receiver of a method, because methods have
no formal parameter name to represent their receiver.
We also see in line (3) a method getName, which we suppose has already
been defined. The presumed effect of getName, when applied to an object O
of class ItemType, is to return the name field of O. Thus, when getName is
applied to self during the execution of (5.2), it has the effect of returning the
name of the item CurrentItem. That value becomes the receiver of the second
method on line (3), the built-in method =.7 This method tests whether the
value of its receiver equals the value of its parameter. For example, in (5.2), we
test whether the item CurrentItem has name field equal to "Brie."
Lines (4) and (5) represent a test on the Boolean value (true or false)
created by the expression on line (3). Think of ifTrue: and ifFalse: as
parameter names for a built-in method whose receiver is a Boolean value. The
effect of applying the method is that the block of code following ifTrue: is
executed if the message is sent to the value true, and the block of code following
ifFalse: is executed if the message is sent to false.
Blocks of code are surrounded by square brackets, which function as begin-end
pairs. Thus, line (4) says that if CurrentItem is indeed the item object for
Brie, then we return the item number for Brie. In explanation, ^ is the symbol
for "return." We suppose that the method getNumber was previously defined
and, when applied to an ItemType object, produces the value of the I# field
of that object, which is then returned by the method checkItem. Finally, line
(5) says that if the item name in CurrentItem doesn't match the parameter
i, "Brie" in the case of (5.2), then we return the string 'error: wrong item
name' as a value. D
Creating Record Types
we define a class C1 of type (1) whose instances are the tuples, and we create a
class C2 of type (2) to be the type of the relation itself, that is, a set of objects
of class C1. That relation may be the only instance of class C2.
To create a record type, we send to the built-in "variable" Object the
message subclass. More exactly, there is a built-in method with a number of
parameters, most of which we shall not discuss, whose function is to define a
new class. The three important parameters of the class-creation method, as far
as we are concerned here, are:
1. subclass: <class name>. This parameter's value is the name of the new
class, given as a quoted string.
2. instVarNames: <field names>. The objects of a given class can have
instance variable names, which function as names of fields. These variables
are called "instance variables" because they occur in every instance of the
class.
3. constraints: <data types>. We may, optionally, place a constraint on
the class to which the value of one or more of the instance variables must
belong. It is worth noting that OPAL, as a default, is typeless, and objects
belonging to any class can be stored in instance variables or anywhere else
objects are permitted.
Example 5.28: Let us create a class ItemType corresponding to the declara
tion in Example 2.31, where this class was defined to consist of an item name
field and an item number field. The message we send to Object is shown in
Figure 5.21.
Object
subclass: 'ItemType'
instVarNames: #['name', 'number']
constraints: #[
#[#name, String],
#[#number, Integer]
]
culus, plus insertion, deletion, and modification operations; DBTG and IMS
databases provide FIND and GET, respectively, to do search and retrieval from
the database, as well as providing insert, delete, and modify operations. In
OPAL, however, even the most primitive operations must be declared for each
class we define.
For example, it would be normal that for each instance variable, we should
have a way of obtaining the value of that variable given an instance. It would
also be typical that we should have a way of changing the value of that variable,
given an instance. We shall, in what follows, assume that for each instance
variable X in any class, there is a method getX that returns the value of X,
when the message with that name is sent to an object of the appropriate class.
Also, there is a method
storeX: v
that sets X to value v when sent to an object with an instance variable X.9
Example 5.31: For class ItemType we could declare the methods of Figure
5.23. D
method: ItemType
getName
^name
%

method: ItemType
getNumber
^number
%

method: ItemType
storeName: n
name := n
%

method: ItemType
storeNumber: n
number := n
%
Figure 5.23 Methods for ItemType.
9 It is possible to create methods like these automatically by sending the class name the
message compileAccessMethodsFor:, followed by a list of instance variables.
Insertion
As we saw in Section 5.6, sets can be used as if they were relations. We might
therefore want to define a method for some class that was a set of "tuples,"
to allow us to create new tuple objects and insert them.10 The scenario is as
follows. Suppose we have a class T that serves as "tuples," e.g., ItemType in
Figure 5.22(a). Suppose also that S is the class defined to be a set of T's, as
ItemSet in Figure 5.22(a). Then we would ordinarily create one instance of
class S, say r, to serve as the "relation" of type T. We create r by sending
the message new to S, which understands this message because all classes do.
That is, we execute:
r := S new.
Now, we can create a method, which we shall refer to as insert, for objects
of type S. This method takes a value for each of the instance variables (fields
or components) of class T. It creates a new object of type T, and sends that
object an appropriate storeX: message for each instance variable X. Finally,
insert sends the add message to r, to add the new object; add is another built-in
method that all sets understand.
Example 5.32: Suppose we have executed
Items := ItemSet new.
to create a "relation" Items that is a set of objects of class ItemType. We
can then define the method insert as in Figure 5.24. Notice that "insert" does
not appear as the name of the method; we only used that name informally.
Rather, the method is identified by its two parameter names, insertName:
and insertNumber:. Also note that surrounding NewItem with bars, as in
line (4), is OPAL's way of declaring NewItem to be a local variable for the
method.
Line (5) makes NewItem an object of class ItemType, and line (6) sets its
instance variables to the desired values, na and num, which are the values of the
parameters for this method. Note that two different methods, storeName and
storeNumber are applied to the object NewItem in line (6), and the semicolon
is punctuation to separate the two methods. Finally, line (7) adds the new tuple
to the set to which the method being defined is applied; hence the receiver self
for the method add.
Having declared Items to be the set of items for the YVCB database, we
can add Cilantro, which is item 99, by sending it the message:
Items insertName: 'Cilantro' insertNumber: 99.
10 We talk as if all actions had to be defined as methods. While it would be usual for
something as fundamental as insertion into a set to be defined for that set's type, it is
also possible to write all of the operations we describe here as parts of ordinary OPAL
programs.
(1) method: ItemSet
(2) insertName: na
(3) insertNumber: num
(4)     | NewItem |
(5)     NewItem := ItemType new.
(6)     NewItem storeName: na; storeNumber: num.
(7)     self add: NewItem.
(8) %

Figure 5.24 The insertion method for ItemSet.
Retrieval
Access to the database is obtained by following the paths that are implicit in
the structure of the various classes. For example, one of the instance variables
of each customer object is orders, which is constrained to be of class OrderSet,
that is, a set of orders. By sending message getOrders to an object of type
CustType, we are returned this set of orders; actually we get a pointer to a
representative of the set, so this retrieval operation can be carried out cheaply,
without copying large sets unless forced to do so.
Given this pointer, which OPAL sees as a set-valued object, we can visit
each order in the set. From each order object we can reach, through its includes
instance variable, an object that is an IQset, i.e., a set of item-quantity pairs.
Similarly, from this object, we can reach each of the items in that order, and
the quantity of each that was ordered.
We should notice the similarity between this way of exploring the objects
in the database and the way we explored a tree in the hierarchical model.
What we just described is very much like exploring a database record of the
tree CUST-TREE in Figure 5.16, which has a customer record at the root, orders
records for children of the root, and children of each order record consisting of
a (virtual) item and its corresponding quantity. The principal difference is that
in OPAL, all objects other than constants are "virtual." For example, orders
records appear physically as children of customer records in the hierarchical
database, but in the OPAL database, only pointers to orders appear in the
object that is a set of orders. Furthermore, that set of orders does not appear
physically in the customer record; only a pointer to that set-valued object
does.11
11 Of course, physical layout of either kind of database may allow the orders placed by a
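The selection statement that forms this set can be pictured, as a sketch only,
using the built-in method select: applied to the set Customers with an ordinary
one-argument block; the original statement may differ in detail.

Deadbeats := Customers select:
    [:c | c getBalance < 0].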
Here, variable c takes on each customer object in turn. Sending the getBalance
message to c returns the balance of the customer, and if that value is less than
0, the block has value true. In that case, a pointer to the customer object
represented by c is placed in the set being formed by the selection, and when
that set is complete, it is assigned to the new variable Deadbeats, which, like
Customers, is of class CustSet. D
Deletion
Let us reflect briefly on the different ways different types of systems perform
deletion. Each of the relational systems, being value-oriented, deletes one or
more tuples from a relation by specifying values in certain fields that the victim
285
tuples must possess. On the other hand, network and hierarchical systems,
being object-oriented, first locate the victim by making it the "current of run-unit" or the equivalent, and then delete the record independently of any values
it may have. OPAL, being object-oriented, also needs to locate victim objects
before it can delete them. However, in OPAL we have no currency pointers to
keep track of objects automatically; rather we need to store the victim objects
in variables used for that purpose.
If variable O's value is an object in set S, then sending S the message
S remove: O
will delete that object from S.
There are several ways we could arrange that O denotes an object in 5.
One approach is to use the do: method described below.
Index Creation
To this point, we have described OPAL as if it were a general-purpose language,
| OrdersForCust |
Customers do:
[:c |
OrdersForCust := c getOrders.
OrdersForCust do:
[:o |
(o testFor: 'Brie')
ifTrue: [OrdersForCust remove: o]
]
].
Identity Indices
We can also create indices on subparts of the elements of a set, even if those
subparts are not of elementary type. We base these indices on the object
identity of the objects found in those fields. The paths referring to such fields
are of the same form /i./2. -In as for equality indices, but condition (3) does
not necessarily hold; that is, /n does not have to be constrained to be of an
elementary type.
Example 5.38: We could have based an index for IQPs of Example 5.37 on
the object identity of the item objects themselves. That is, we could say
IQPs createIdentityIndexOn: 'item'
Note that the parameter name mentions "identity" rather than "equality." D
Using Indices
When we create one or more indices on an object that is a set, that object
functions like a database object, for example, as a relation or as a logical record
type. To take advantage of efficient retrieval from such sets, we must use a
selection block, which is a one-argument block whose body is a conjunction
(logical AND) of comparisons. Each comparison must relate two variables or
constants by one of the arithmetic comparison operators.
If we wish to take advantage of an equality index, we can use any of the
usual six comparisons on values, which are written =, ~=, <, >, <=, and >= in
OPAL. If we wish to use an identity index, which we should recall is on the
objects' identities (i.e., pointers to objects) themselves, then the last four of
these make no sense, and we are restricted to the two comparisons on object
identities, == (the same object) and ~~ (different objects).
Selection blocks are distinguished from ordinary one-argument blocks by
the use of curly brackets {} rather than square brackets [] . If there is no
appropriate index to use, the effect of a selection block does not differ from
that of the corresponding block written with square brackets.
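As an illustration only, and assuming that CustType has an instance variable
balance, the "deadbeats" query of the previous section might be written with a
selection block as

Deadbeats := Customers select:
    {:c | c.balance < 0}.

Both the path notation c.balance inside the block and the instance-variable
name balance are assumptions on our part.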
EXERCISES
COUNTRIES, FLEETS, NAVAL BASES, SQUADRONS,
DESTROYERS, CRUISERS, SUBMARINES, CARRIERS
REGIONS
OFFICES
AGENTS
LISTINGS
CLIENTS
(a) Hierarchy.
REGIONS(RNAME)
OFFICES(CITY, OADDR)
AGENTS(ANAME, SALES)
LISTINGS(LADDR, PRICE)
CLIENTS(CNAME, CADDR)
(b) Record formats.
Figure 5.28 Real estate database.
5.10: In Exercise 2.11(f) we defined a scheme for courses, students, and grades in
the object model of Section 2.7. Translate that scheme into data definitions
of OPAL.
5.11: Write the following queries in (i) the DBTG DML (ii) DL/I (iii) OPAL.
Refer to the databases defined in (i) Figures 5.2 and 5.3 (ii) Figure 5.16
(iii) Figure 5.22, respectively.
a) Find all the employees of the Produce department.
b) Find the items supplied by Acme.
c) Find the suppliers of the items sold by the Produce department.
* d) Find the manager of the department Arnold Avarice works for.
5.12: Give OPAL data definitions for the beer drinkers' database of Exercise
4.1. Create suitable classes for drinkers, bars, and beers, that make all
the important connections accessible. For example, the object for a bar
consists of a field (instance variable) for the name of the bar, a field for the
set of drinkers who frequent the bar, and a field for the set of beers sold
at the bar. Also, declare OPAL variables Drinkers, Bars, and Beers to
represent the sets of objects of the three classes.
* 5.13: Write OPAL programs to answer the queries of Exercise 4.1, using the
database you defined in answer to Exercise 5.12. Assume that getX has
been defined for every instance variable X of every class. As we have not
discussed printing in OPAL, assign the results of the queries to suitable
variables. Hint: The following methods that were not covered in the text
may prove useful here and in following exercises:
i) B not produces the complement of the Boolean value B.
ii) S includes: O produces the value true if object O is a member of set
S and false if not.
iii) S isEmpty produces true if S is an empty set and false otherwise.
5.14: Write the queries of Exercise 4.3 in OPAL, referring to the database of
Exercise 5.12.
* 5.15: Write an OPAL program to find the sum of all the customers' balances in
the YVCB database.
* 5.16: Suppose we have defined employee-manager pair objects with the following
OPAL declaration:
Object subclass: 'EMpair'
instVarNames: #['emp', 'mgr']
constraints: #[ #[#emp, String], #[#mgr, String]]
Also suppose that Manages is a set of employee-manager pairs. Write
an OPAL method that finds, in Manages, all the subordinates of a given
individual i, that is, all the individuals of whom i is the "boss" in the sense
of Example 1.12.
BIBLIOGRAPHIC NOTES
A number of references concerning object-oriented systems and models were
given in the bibliographic notes for Chapters 1 and 2. Here, we shall add
references to particular database management systems.
Network-Model Systems
As was mentioned, the DBTG proposal comes from CODASYL [1971, 1978].
Olle [1978] is a tutorial on the proposal.
Among the important systems based on this proposal are TOTAL (Cincom
[1978]), IDMS (Cullinane [1978]), and ADABAS (Software AG [1978]). Each
of these systems is described in Tsichritzis and Lochovsky [1977], and TOTAL
is also described in Cardenas [1979].
IMS
The material in Sections 5.4 and 5.5 is based on IBM [1978b]. More exten
sive descriptions of the system can be found in Date [1986], Tsichritzis and
Lochovsky [1977], and Cardenas [1979].
System 2000
Another important system based on the hierarchical model is System 2000 (MRI
[1978]). For descriptions, see Tsichritzis and Lochovsky [1977], Cardenas [1979],
or Wiederhold [1983].
OPAL
The description of the language in Sections 5.6 and 5.7 is taken from Servio
Logic [1986]. The underlying Smalltalk language is defined in Goldberg and
Robson [1980]. The Gemstone database management system, of which OPAL
is the user interface, is described in Maier, Stein, Otis, and Purdy [1986].
CHAPTER 6

Physical Data Organization
We have alluded many times to the need to make operations like selection from
a relation or join of relations run efficiently; for example, selections should take
time proportional to the number of tuples retrieved, rather than the (typically
much larger) size of the relation from which the retrieval is made. In this
chapter we cover the basic techniques of storage organization that make these
goals realistic.
We begin by discussing key-based organizations, or "primary index" struc
tures, in which we can locate a record quickly, given values for a set of fields
that constitute a key. These organizations include hashing, indexed-sequential
access, and B-trees. Then we consider how these structures are modified so
we can locate records, given values for fields that do not constitute a key and
whose values do not, in principle, influence the location of the record. These
structures are called "secondary indices."
Then, we explore what happens when the objects stored, which we think
of as records, have variable length. This situation includes both true variable-length
records, e.g., those containing fields that are strings of arbitrary length,
and structures that are more complex than records, such as a record for a
department followed by records for all of the employees of that department.
We next show how these techniques are used to support efficient access in the
database systems that we discussed in Chapters 4 and 5.
In the last sections, we discuss partial-match queries and range queries, two
classes of database operations that are increasing in importance as database sys
tems tackle the new kinds of applications discussed in Section 1.4. We offer two
data structures to support systems where these types of queries are common:
partitioned hashing and k-d-trees.
Files
As with the higher-level data models, it is normal to see records as being in
stances of a scheme. That is, normally we deal with collections of records that
have the same number of fields, and whose corresponding fields have the same
data type, field name, and intuitive meaning. For example, records representing
the tuples of a single relation have a field for each attribute of that relation, and
the field for each attribute has the data type associated with that attribute. Let
us term the list of field names and their corresponding data types the format
for a record.
We shall use the term file for a collection of records with the same format.
Thus, for example, a file is an appropriate physical representation for a relation.
This notion of a file differs from the common notion of a "file" as a stream of
characters, or perhaps other types of elements, accessible only by scanning from
beginning to end. In practice, we shall find that files are normally accessible in
many different ways, and their records often are not stored as a single stream
or sequence.
Two-Level Storage
The physical storage medium in which records and files reside can normally
Blocks
Another factor that influences the way we account for costs in database op
erations is that it is normal for storage to be partitioned into blocks of some
substantial number of bytes, say 2⁹ through 2¹², and for transfers of data be
tween secondary and main memory to occur only in units of a full block. This
constraint applies whether we have a system that supports virtual memory,
in which case that memory is partitioned into blocks of consecutive bytes, or
whether our secondary storage is thought of as the bytes of a disk, in which
case a block might be a sector of a single track.
It is common that records are significantly smaller than blocks, so we fre
quently find several records on one block. Since our costs are so closely tied to
the number of blocks we move between main and secondary memory, it becomes
very important to arrange that, when we have to access several records of a file,
they tend to lie on a small number of different blocks. Ideally, when we access
a block, we need to access all, or almost all, the records on that block.
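To get a feeling for the numbers (the figures here are purely illustrative): with
blocks of 2¹² = 4096 bytes and records of 100 bytes, about 40 records fit on a
block. A query that must read 1000 such records costs on the order of 1000
block accesses if the records are scattered one to a block, but as few as 25
accesses if they are clustered onto consecutive blocks.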
The Cost of Database Access
We shall define the unit of cost for operations on physical blocks to be the
block access, which is either the reading from or writing into a single block.
We assume that computation on the data in a block does not take as much
time as transferring a block between main and secondary memory, so the cost
of computation will generally be neglected.
In reality, not every time we need to read or write the contents of a block
will the block be transferred to or from secondary memory. The operating
system or the DBMS itself will buffer blocks, keeping copies around in main
memory as long as there is room and remembering they are there. However, we
often cannot predict whether a block will be available in main memory when we
need it, since it may depend on factors beyond our control, such as what other
jobs are running on the system at the time, or what other database operations
are being executed as a result of requests from other users.
Conversely, the time to access a particular block on a disk depends on the
place where the last access on that disk was made, because of the time to move
the heads from cylinder to cylinder and the time it takes a disk to rotate from
one angular position to another. Systems that deal with the largest amounts of
data often need to take into account the exact sequence in which block accesses
are made and design the layout of blocks on the disk units accordingly. These
systems are often quite limited in the class of operations on data that they
perform, compared with the data manipulation languages discussed in Chapters
4 and 5. We shall, therefore, not consider access costs at this level of detail.
In summary, we assume there is some fixed probability that the need to use
a block will actually result in a transfer of data between main and secondary
memory. We also suppose that the cost of an access does not depend on what
accesses were made previously. With that agreed, we can assume each block
access costs the same as any other, and thus we justify the use of block accesses
as our measure of running time.
Pointers
In essence, a pointer to a record r is data sufficient to locate r "quickly." Because
of the variety of data structures used to store records, the exact nature of a
pointer can vary. The most obvious kind of pointer is the absolute address, in
virtual memory or in the address system of a disk, of the beginning of record r.
However, absolute addresses are often undesirable; for several reasons, we
might permit records to move around within a block, or perhaps within a group
of blocks. If we moved record r, we would have to find all pointers to r and
change them. Thus, we often prefer to use as a pointer a pair (b, k), where b
is the block on which a record r is found, and k is the key value for r, that is,
the value of the field or fields serving as a key for records in the file to which
r belongs. If we use such a scheme, then in order to find r within block b we
need to rely on the organization of blocks so that we can find r within b. The
matter of block formats is discussed below, but as an example, in order to find
r in block b, given its key k, it is sufficient to know that:
1. All records in block b have the same record format as r (and therefore,
none can agree with r in its key fields),
2. The beginnings of all the records in block b can be found (so we can examine
each in turn, looking for key k), and
3. Each record in block b can be decoded into its field values, given the be
ginning of the record (so we can tell if a record has key k).
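For instance, in the file of numbers records introduced in Example 6.1 below,
a pointer to the record whose key is NUMBER = 45, stored on some block b,
could be the pair (b, 45); to follow the pointer, we bring block b into main
memory and examine its records until we find the one whose NUMBER field is
45. The particular key value 45 is, of course, only an illustration.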
Pinned Records
When records may have pointers to them from unknown locations, we say the
records are pinned; otherwise they are unpinned. If records are unpinned, they
can be moved around within blocks, or even from block to block, with no adverse
consequences, as long as the movement of blocks makes sense from the point
of view of the data storage structure. However, when records are pinned, we
cannot move them at all, if pointers are absolute addresses, and we can move
them only within their block if a block-key-pair scheme is used for pointers.
Another constraint we face when records are pinned is that we cannot
delete them completely. If there were a pointer p to record r, and at some time
we deleted r, we might, at a later time place some other record r' in the space
formerly occupied by r. Then, if we followed pointer p, we would find record
r' in place of r, yet have no clue that what we found was not the record r
to which p was intended to refer. Even if we use block-key pairs for pointers,
we are not completely safe from this problem, known as dangling pointers or
dangling references. The reason is that r' might have the same key value as
r, since it was inserted into the file after r had left, and therefore, caused no
violation of the principle of unique key values.
To avoid dangling pointers, each record must have a bit called the deleted
bit, that is set to 1 if the record is deleted. The space for the record can never
again be used, but if we go searching for a record, say by following a pointer,
and we come upon the deleted record, we know the record isn't really there and
ignore it.
Record Organizations
When we arrange the fields in a record, we must place them in such a way that
their values can be accessed. If all fields have fixed length, then we have only
to choose an order for those fields. Each field will thus begin at a fixed number
of bytes, called its offset, from the beginning of the record. Then, whenever we
come upon a record known to have the format in question, we can find a field,
given the beginning of the record, by moving forward a number of bytes equal
to the offset for that field.
There may be several bytes, not devoted to data fields, that are required
in each record. For example, under some circumstances we need:
1. Some bytes that tell us what the format of the record is. For example,
if we are storing records belonging to several record types or several relations,
we may wish to store a code indicating the type or relation of each.
Alternatively, we can store only one type of record in any block, and let
the block indicate the type of all of its records.
2. One or several bytes telling how long the record is. If the record is of a
type that has only fixed-length fields, then the length is implicit in the type
information.
3. A byte in which a "deleted" bit, as described above, is kept.
4. A "used/unused" bit, kept in a byte by itself, or sharing a byte with other
information such as the "deleted" bit. This bit is needed when blocks are
divided into areas, each of which can hold a record of some fixed length.
We need to know, when we examine an area, whether it really holds a
record, or whether it is currently empty space, with some random data
found therein.
5. Waste space. We might put useless bytes in a record's area so that
all fields can begin on a byte whose address is a convenient number. For
example, many machines operate on integers more efficiently if they begin
at an address divisible by 4, and we shall assume this requirement here.
Example 6.1: Let us introduce a simple, running example for this chapter.
We suppose that records of the type numbers consist of the following fields:
1. Field NUMBER, of type integer, which serves as a key. It is intended that
this field always holds a positive integer.
2. Field NAME, which is a single byte indicating the first letter of the English
name for the number in the first field. All positive integers have names that
begin with one of the letters in the word soften, but there is no known
significance to this fact.
3. Field SQUARE, which holds the square of the number in the first field.
In this example, SQUARE is of type integer. In other examples, we shall
let SQUARE be a character string holding the digits of the number in
question; the purpose of the latter arrangement is so this field can vary in
length.
On the assumption that integers take four bytes, the three fields above
take a total of nine bytes. To this quantity, we shall add another byte, at the
beginning of the record, which holds a used/unused bit and a "deleted" bit. We
shall call this the INFO byte.
[Figure layout: INFO (byte 0), NAME (byte 1), waste (bytes 2-3), NUMBER (bytes 4-7), SQUARE (bytes 8-11).]
Figure 6.1 A fixed-length record format.
However, recall we suppose integers must begin at an address that is a
multiple of 4. Thus, it makes the most efficient use of space if we choose as our
record organization the order: INFO, NAME, NUMBER, SQUARE, placing
two waste bytes after NAME so the last two fields can be properly aligned.
The arrangement is suggested in Figure 6.1, and it uses 12 bytes per record. D
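A hedged sketch of this 12-byte layout, using Python's struct module; the format string and the helper names are assumptions made for illustration.

    import struct

    # Sketch of the 12-byte layout of Figure 6.1: INFO, NAME, two waste
    # bytes, then the two 4-byte integers aligned on offsets divisible by 4.
    RECORD = struct.Struct("<BB2xii")      # INFO, NAME, padding, NUMBER, SQUARE

    def pack_numbers_record(number, name, info=0b01):   # used = 1, deleted = 0
        return RECORD.pack(info, ord(name), number, number * number)

    rec = pack_numbers_record(13, "t")
    assert len(rec) == 12
    info, name, number, square = RECORD.unpack(rec)
    print(chr(name), number, square)       # t 13 169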
Variable-Length Records
When fields can vary in length, we have additional record-formatting problems,
because we cannot rely on fields being at the same offset in each record with a
given format. There are two general strategies:
1. Let each field of variable length start with a count, that tells how many
bytes the field value uses. If there is more than one variable-length field,
it is sometimes useful to have in the beginning of the record a count of
the total length of the record, although that information is, in principle,
redundant.
2. Place, in the beginning of each record, pointers to the beginning of each
variable-length field. We also need a pointer to the end of the last such
field. Furthermore, it is necessary to have all the fixed-length fields precede
the variable-length fields, so the end of one is the beginning of the next.
Scheme (1) uses less space, but it is time-consuming to locate fields beyond
the first variable-length field, since we can only calculate the offset of a field if
we examine all previous variable-length fields, in turn, to determine how long
they are. We shall give a simple example of (1) below. Scheme (2) can be
used not only for storing fields within records but for storing records within
blocks, and we shall give an example of such an arrangement when we cover
block formats.
Example 6.2: Let us consider numbers records, as introduced in Example 6.1,
but with the field SQUARE stored as a character string composed of its decimal
digits. The bytes of this field will be preceded by a single byte whose value is
the number of (additional) bytes used by the SQUARE field. Thus, character
strings for this field are limited to the range 0 to 255, which is a common
treatment for variable-length character strings. The fields and information bytes
of the record are:
1. Byte 0 holds the length of the entire record, including the variable-length
field. Thus, the limit on field SQUARE is somewhat more stringent than
255 bytes, since the whole record must use no more than 255 bytes.
2. Byte 1 holds the INFO bits, discussed in Example 6.1.
3. Byte 2 holds the field NAME.
4. Byte 3 is waste.
5. Bytes 4-7 hold the field NUMBER.
6. Byte 8 holds the length of the field SQUARE.
7. Bytes 9 and following hold the value of SQUARE, as a character string.
The contents of two records, for numbers 2 and 13, are shown in Figure 6.2.
[Figure layout: (a) Record for NUMBER = 2, total length 10, with SQUARE = "4"; (b) Record for NUMBER = 13, total length 12, with SQUARE = "169".]
Figure 6.2 Variable-length records.
Notice that because there is only one variable-length field, the length of
the record and the length of that field are easily related, and we can dispense
with either byte 0 or byte 8, but not both. That is, the value of byte 0 is always
nine more than the value of byte 8.
Also note that if there were fields following SQUARE in this format, then
we would have to consult byte 8 to find them. For example, the offset of a
hypothetical field following SQUARE would have offset equal to nine plus the
contents of byte 8. D
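The following Python sketch packs and unpacks a record in this variable-length format; the helper names are illustrative, and only the byte positions described above are taken from the text.

    import struct

    # Sketch of the variable-length layout of Example 6.2: byte 0 = total
    # length, byte 8 = length of the SQUARE string, bytes 9 on = its digits.
    def pack_var_record(number, name, info=0b01):
        square = str(number * number).encode("ascii")
        body = struct.pack("<BBBxiB", 0, info, ord(name), number, len(square)) + square
        return bytes([len(body)]) + body[1:]     # patch byte 0 with the record length

    def unpack_var_record(buf):
        length, info, name, number, sq_len = struct.unpack_from("<BBBxiB", buf)
        assert length == 9 + sq_len              # byte 0 is always nine more than byte 8
        return chr(name), number, buf[9:9 + sq_len].decode("ascii")

    print(unpack_var_record(pack_var_record(13, "t")))   # ('t', 13, '169')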
Block Formats
Just as we need to locate fields within a record, we must be able to locate
records within a block. As records require some space for format information,
such as a length or a "deleted" bit, so do blocks often require some extra space
for special purposes. For example, blocks often have pointers in fixed positions
to link blocks into lists of blocks.
If integers and pointers within the records are required to start at "convenient"
bytes, which we have taken (as a plausible example) to mean "divisible
by 4," then we must be careful how we place records within a block. While
many variations are possible, the simplest scheme is to assume that the offsets
of integers and pointers within a record are always divisible by 4, and then
require that records start with an offset within their block that is also divisible
by 4. Since blocks themselves will begin at bytes that are multiples of some
large power of 2, it follows that the address (first byte) of a block will also be
divisible by 4, and thus, all fields that need to be aligned will start at bytes
divisible by 4.
302
11 12
23 24
35 36
47 48
59 60 63
303
of the second record; the length field in that record tells us where to find the
third record, and so on.
Evidently, that is a cumbersome way to search the block, so a more desirable
approach is to place at the beginning of the block a directory, consisting
of an array of pointers to the various records in the block. These "pointers"
are really offsets in the block, that is, the number of bytes from the beginning
of the block to the place where the record in question starts.
The directory can be represented in several ways, depending on how we
determine the number of pointers in the directory. Some choices are:
1. Precede the directory by a byte telling how many pointers there are.
2. Use a fixed number of fields at the beginning of the block for pointers to
records. Fields that are not needed, because there are fewer records in the
block than there are pointers, are filled with 0, which could not be the
offset of a record under this scheme.
3. Use a variable number of fields for pointers to records, with the last field
so used holding 0, to act as an endmarker for the list of pointers.
[Figure 6.4 (not reproduced): a 64-byte block whose first four 4-byte fields form a directory holding record offsets such as 16, 28, and 40, followed by the records themselves.]
We could have been more economical about the storage of offsets in the
first four fields. Rather than using four bytes for each, we could have used a
single byte for each offset, since in these tiny blocks, offsets are numbers in the
range 0-63. In fact, even if blocks were of length 1024, which is a common
choice, we still could have stored offsets in a single byte, assuming that offsets
had to be divisible by 4 and storing the offset divided by 4, in the byte. D
Semi-Pinned Records
Another reason for adopting the scheme of Figure 6.4 is that it has the effect of
"unpinning" pinned records. If there are pointers to a record r from outside the
block, we make that pointer point to the field of the directory that holds the
offset of r. We may then move r around within the block, changing its offset in
the directory as we do, and we never create a dangling reference.
Incidentally, when we have variable-length records, there is frequently a
good reason why we would move these records around in the block. For example,
the data in a record may grow or shrink, and we wish to make space by moving
all following records to the right, or we wish to consolidate space and move
subsequent records left. The number of records on the block may change,
requiring us to create additional directory fields, and move records around to
make room.
Another advantage of the scheme of Figure 6.4 is that we can, in effect,
delete pinned records. We move the "deleted" bit from the record itself to the
directory, assuming there is room. Then, if we wish to delete a record r, we can
reuse its space. We set the "deleted" bit in r's directory entry, so if we ever
follow a pointer to that entry, we shall know the record is no longer there. Of
course, the directory entry is now dedicated to the deleted record permanently,
but that is preferable to retaining the space for a large, deleted record.
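A small Python sketch of a block directory in this spirit; representing the block as a Python object rather than raw bytes is an illustrative simplification.

    # Sketch of a block directory in the spirit of Figure 6.4.  Outside
    # pointers name a directory slot, not a byte offset, so records may be
    # moved within the block or marked deleted without dangling references.
    class Block:
        def __init__(self):
            self.directory = []          # list of [offset, deleted] entries
            self.data = bytearray()      # record bodies, packed end to end

        def insert(self, record_bytes):
            self.directory.append([len(self.data), False])
            self.data += record_bytes
            return len(self.directory) - 1    # slot number, used as the pointer

        def fetch(self, slot, length):
            offset, deleted = self.directory[slot]
            return None if deleted else bytes(self.data[offset:offset + length])

        def delete(self, slot):
            self.directory[slot][1] = True    # the slot itself is never reused

    b = Block()
    p = b.insert(b"record-A")
    b.delete(p)
    print(b.fetch(p, 8))                      # None: the record is logically gone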
The operations we shall consider on stored files are the following:
1. Lookup. Given a "key" value, find the record(s) with that value in its "key"
fields. We put quotation marks around "key" because we need not assume
that values for the field or fields forming a "key" uniquely determine a
record.
2. Insertion. Add a record to the file. We assume that it is known that the
record does not already exist in the file, or that we do not care whether or
not an identical record exists. If we do wish to avoid duplication of records,
then the insertion must be preceded by a lookup operation.
3. Deletion. Delete a record from the file. We assume it is not known whether
the record exists in the file, so deletion includes the process of lookup.
4. Modification. Change the values in one or more fields of a record. In
order to modify a record, we must find it, so we assume that modification
includes lookup.
Efficiency of Heaps
Suppose there are n records, and that R is the number of records that can fit
on one block. If records are pinned, and deleted records cannot have their space
reused, then we should understand n to be the number of records that have ever
existed; otherwise, n is the number of records currently in the file. If records
are of variable length, then take R to be the average number of records that
can fit in a block, rather than the exact number. Then the minimum number
of blocks needed to store a file is ⌈n/R⌉, or, since n is normally much greater
than R, about n/R.
Recall that the time to perform operations such as insertion, deletion, and
lookup is measured by the number of blocks that must be retrieved or stored,
between secondary and main memory. We shall assume, for uniformity among
all our data structures, that initially, the entire file is in secondary memory. To
look up a record in a heap, given its key, we must retrieve n/2R blocks on the
average, until we find the record, and if there is no record with that key, then
we must retrieve all n/R blocks.
To insert a new record, we have only to retrieve the last record of the heap,
which is the one that has empty space within it. If the last block has no more
room, then we must start a new block. In either case, we must write the block
to secondary storage after we insert the record. Thus, insertion takes two block
accesses, one to read and one to write.
Deletion requires us to find the record, i.e., perform a lookup, and then
rewrite the block containing the record, for a total of n/2R + 1 accesses, on
the average when the record is found, and n/R accesses when the record is not
found. If records are pinned, then the process of deletion is only the setting
of the "deleted" bit. If records are not pinned, then we have the option of
reclaiming the space of the record.
For deletions from files of fixed-length records, we can reclaim space by
finding a record on the last block of the file, and moving it into the area of
the deleted record. With luck, we can dispense with the last block altogether,
if we have removed its last record. In general, compacting the file in this way
minimizes the number of blocks over which it is spread, thereby reducing the
number of blocks that must be retrieved in lookups and further deletions.
If records are of variable length, we can still do some compaction. If we
use a format like that of Figure 6.4, we can slide records around in the block,
making sure the pointers to those records, which are at the beginning of the
block, continue to point to their records. If we create enough space in one block
to move in a record from the last block, we might do so. However, when records
vary in length, it is wise to keep some of the space in each block free, so when
records on that block grow, we are unlikely to have to move a record to another
block.
Finally, modification takes time similar to deletion. We need n/2R block
accesses for a successful lookup, followed by the writing of the one block
containing the modified record. If records are of variable length, then we may want
to, or be required to, read and write a few more blocks to consolidate records
(if the modified record is shorter than the old record), or to find another block
on which the modified record can fit, if that record has grown.
Example 6.6: Suppose we have a file of 1,000,000 records, of 200 bytes each.
Suppose also that blocks are 2^12 = 4096 bytes long. Then R = 20; i.e., we can
fit 20 records on a block. Thus, a successful lookup takes n/2R = 25,000 block
accesses, and an unsuccessful one takes 50,000 block accesses. On the optimistic
assumption that the retrieval of any block from disk takes .01 second, even
successful lookups take over four minutes. The time to do modification and
deletion are essentially the same as for lookup. Only insertion, which assumes
no search of the file is necessary, can be done at "computer speed," that is, in
a fraction of a second.
The directory of blocks takes a significant amount of space, perhaps so
much that we would not want to keep it in main memory. Suppose that block
addresses are four bytes long. Then we need 200,000 bytes, or 50 blocks, to
hold the addresses of all the blocks. D
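A few lines of Python re-derive the numbers of Example 6.6 from the stated parameters.

    # Re-deriving the estimates of Example 6.6 from its assumed parameters.
    n, block_size, record_size, seconds_per_access = 1_000_000, 4096, 200, 0.01
    R = block_size // record_size                    # 20 records per block
    print(-(-n // R), "blocks in the heap")          # ceil(n/R) = 50,000
    print(n / (2 * R), "accesses per successful lookup")          # 25,000
    print(n / (2 * R) * seconds_per_access / 60, "minutes")       # about 4.2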
of buckets used for this file. The integer h(v) is the bucket number for the key
value v.
Each bucket consists of some (presumably small) number of blocks, and
the blocks of each bucket are organized as a heap. There is an array of pointers,
indexed from 0 to B - 1, which we call the bucket directory. The entry for i
in the bucket directory is a pointer to the first block for bucket i; we call this
pointer the bucket header. All of the blocks in bucket i are linked in a list by
pointers, with a null pointer, some value that cannot be the address of a block,
appearing in the last block of the list (or in the bucket header if the bucket is
currently empty). It is common for B to be sufficiently small that the entire
bucket directory can reside in main memory, but if that is not the case, then
the directory is spread over as many blocks as necessary, and each is called into
main memory as needed.
Hash Functions
There are many different kinds of functions one can use for the hash function
h. It is essential that the range be 0, . . . , B - 1, and it is highly desirable that h
"hashes" keys; that is, h(v) takes on all its possible values with roughly equal
probability, as v ranges over all possible key values. A great deal has been said
about hash functions, and we do not intend to go into the subject deeply here;
see Knuth [1973], for example.
A simple kind of hash function converts each key value to an integer, and
then takes the remainder of that integer modulo B. If key values are integers to
begin with, we simply compute h(v) = v mod B. If keys are character strings,
we can convert strings to integers as follows. Divide a string into groups of
characters, perhaps one or two characters per group, treat the bits representing
the character group as an integer, and sum these integers.
If key values consist of values from several fields, convert each field's value
to an integer, by a method such as the ones just mentioned, and take their sum,
and take the remainder when that sum is divided by B, the number of buckets. A variety of other ways to
produce "random" integers from data of other types exist, and good methods
are not hard to discover.
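A minimal sketch of such a hash function in Python; grouping two characters at a time is an arbitrary illustrative choice.

    # A sketch of the string-to-bucket hashing just described.
    def bucket(key, B):
        if isinstance(key, int):
            return key % B
        total = 0
        for i in range(0, len(key), 2):              # groups of two characters
            total += int.from_bytes(key[i:i + 2].encode(), "big")
        return total % B

    print(bucket(35, 4), bucket("Smith", 1000))      # 3 and some bucket in 0..999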
Example 6.7: In Figure 6.5 we see a file of numbers records with the format
introduced in Example 6.1, organized as a hashed file with four buckets; i.e.,
B = 4. We assume the parameters of Example 6.3, that is, records twelve bytes
long, with up to five of these and a link packed into one block, as in Figure 6.3.
We have stored a set of prime numbers, using the hash function h(v) = v mod 4.
Storing keys that are primes is one of the few foolish things one can do
with a hash function that chooses buckets by taking remainders divided by B,
the number of buckets. There can never be a prime that goes into bucket 0,
except for B itself, if B is a prime. In Figure 6.5, there will never be a prime
[Figure layout: a four-entry bucket directory whose entries point to chains of blocks holding the stored primes; most of the records fall in buckets 1 and 3.]
Figure 6.5 Hashed file organization.
in bucket 0, and only the prime 2 belongs in bucket 2. Thus, all the records for
primes, except for 2, will distribute themselves in buckets 1 and 3, so we only
get the benefit of two buckets, while paying for four buckets. D
Having found the last block, we place the inserted record therein, if there
is room. If there is no room, we must obtain another block and link it to the
end of the list for bucket h(v).
Deletions are executed similarly. We find the record to be deleted as in
a lookup. If records are pinned, then we simply set the "deleted" bit in the
record. If records are unpinned, we have the option of compacting the blocks of
the bucket, as we did for a heap in the previous section. So doing may reduce the
number of blocks needed for this bucket, thereby reducing the average number
of block accesses needed for subsequent operations.
Example 6.8: It is discovered that 35 is not a prime, so we wish to delete its
record from the structure of Figure 6.5. We compute h(35) = 35 mod 4, which
is 3, and so look in bucket 3, where we find the record for 35 in the first block on
the list. Assuming records are unpinned, we can go to the last (second) block
in bucket 3 and move a record from that block into the third area in the first
block for bucket 3. In this case, the record for 31 is the only candidate, and it
empties its block. We can thus remove the second block for bucket 3, leaving
the situation in Figure 6.6. D
[Figure layout: bucket 1 is unchanged; bucket 3 now has a single block holding 23, 7, 31, 19, and 11.]
Figure 6.6 Effect of deleting 35.
Finally, modifications are performed by doing a lookup. We then change
the field or fields in those records that are to be modified. If records are
variable-length, there is the possibility that records will have to be moved among the
blocks of the bucket, as discussed in connection with heaps.
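The sketch below models a hashed file as B buckets, each a chain of fixed-capacity blocks, in the spirit of Figure 6.5; the in-memory lists and the capacity of five records per block (the figure of Example 6.3) are illustrative simplifications.

    # Sketch of a hashed file: B buckets, each a chain of fixed-capacity blocks.
    class HashedFile:
        def __init__(self, B, records_per_block=5):
            self.B, self.cap = B, records_per_block
            self.buckets = [[] for _ in range(B)]    # each bucket: a list of blocks

        def lookup(self, key):
            for block in self.buckets[key % self.B]:
                for rec in block:
                    if rec["NUMBER"] == key:
                        return rec
            return None

        def insert(self, rec):
            bucket = self.buckets[rec["NUMBER"] % self.B]
            if not bucket or len(bucket[-1]) == self.cap:
                bucket.append([])                    # link a new block onto the chain
            bucket[-1].append(rec)

    f = HashedFile(B=4)
    for p in (3, 7, 11, 13, 17, 19, 23, 29, 31):
        f.insert({"NUMBER": p, "SQUARE": p * p})
    print(f.lookup(19))                              # {'NUMBER': 19, 'SQUARE': 361}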
Efficiency of Hashing
The central point is that a hashed file with B buckets behaves as if it were a
heap approximately 1/Bth as long. Thus, we can speed up our operations by
almost any desired factor B. The limiting considerations are:
1. Buckets must have at least one block, so we cannot lower the average
lookup cost below one access per lookup, no matter how large we make B.
2. We have to store the bucket directory, either in main memory or on blocks
of secondary storage. Making B too large forces us to use secondary storage
and increase the number of block accesses by one per operation (to retrieve
the block with the needed bucket header).
Thus, if we have a file of n records, of which R fit on a block, and we
use a hashed file organization with B buckets, whose headers are kept in main
memory, we require on the average:
a) ⌈n/2BR⌉ accesses for a successful lookup, deletion of an existing record,
or modification of an existing record.
b) ⌈n/BR⌉ accesses for an unsuccessful lookup, or for checking that a record
is not in the file (during the attempted deletion of a nonexistent record or
during a check for existence prior to an insertion).
The reason these relationships hold is that the average bucket has n/B
records, and we can apply the analysis for heaps from the previous section. It
should be emphasized that these estimates assume random records in our file.
If the hash function does not distribute records evenly among the buckets, or
if by bad luck, our file has an atypical collection of records, then the average
number of accesses per operation could rise considerably.
To these estimates, we must add one if the bucket directory is not in
main memory, and we must add an additional one for operations that require
modification and writing of one of the blocks of the bucket (i.e., for anything
but a lookup). If records are of variable length, and it may be necessary to move
records among blocks of the bucket, then a fraction of an access per operation
should be added.
Example 6.9: Let us consider the file with n = 1,000,000 and R = 20 discussed
in Example 6.6. If we choose B = 1,000, then the average bucket has n/B =
1,000 records, which would be distributed over n/BR = 50 blocks. On the
assumption that block addresses require four bytes, the bucket directory requires
4,000 bytes, and could easily be kept in main memory. The operations requiring
examination of an entire bucket, such as lookup of a record that is not in the file,
take 50 block accesses, plus another one if writing of a block is needed, e.g., if
the operation is insertion preceded by a check that the record does not already
exist. Operations requiring search of only half the bucket on the average are
expected to require 25 or 26 accesses. Using our previous estimate of .01 second
per access, any of these operations can be performed in under a second. D
access method. The description of isam files that we shall give here assumes
that keys are true keys, each belonging to a unique record of the file, rather
than "keys" that determine a small number of records, as in the previous two
sections. We leave as an exercise the generalization of the lookup technique
to the case where keys are really "keys." For the isam representation, we are
required to sort the records of a file by their key values, so let us first consider
how data of arbitrary format is to be sorted.
Sorting Keys
No matter what the domain of values for a field, we can, in principle, compare
values from the domain, and therefore we can sort these values. The justification
is that to be stored in a file, the values must be representable as bit strings,
which can be ordered if we treat them as integers and use numerical order.
The usual domains of values, such as character strings, integers, and reals
have conventional orders placed on them. For integers and reals we have
numerical order. For character strings we have lexicographic, or dictionary, order,
defined by X1X2···Xk < Y1Y2···Ym, where the X's and Y's represent characters,
if and only if either
1. k < m and X1···Xk = Y1···Yk, or
2. For some i ≤ min(k, m), we have X1 = Y1, X2 = Y2, . . . , Xi-1 = Yi-1, and
the binary code for Xi is numerically less than the binary code for Yi.
In ASCII or any other character code in common use, the order of the codes
for letters of the same case is alphabetical order and the order of the codes for
digits is the numerical order of the digits. Thus, for example, 'an' < 'and' by
rule (1), and 'banana' < 'bandana' by rule (2) with i = 4.
If we have a key of more than one field, we can sort key values by first
arbitrarily picking an order for the key fields. Records are sorted by the first
field, which will result in an ordered sequence of clusters; each cluster consists
of records with the same values in the first field. Each cluster is sorted by the
value of the second field, which will result in clusters of records with the same
values in the first two fields. These clusters are sorted on the third field, and
so on. Note that this ordering is a generalization of lexicographic ordering for
character strings where, instead of ordering lists of characters, we order lists of
values from arbitrary domains.
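In Python, tuples compare field by field, so sorting tuple keys gives exactly this generalized lexicographic order; the key values below are those of Example 6.10.

    # Tuples compare field by field: the generalized lexicographic order.
    keys = [(2, 3), (1, 2), (2, 2), (3, 1), (1, 3)]
    print(sorted(keys))        # [(1, 2), (1, 3), (2, 2), (2, 3), (3, 1)]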
Example 6.10: Suppose we have a key with two fields, both with integer
values, and we are given the list of key values (2,3), (1,2), (2,2), (3,1), (1,3).
We sort these on the value of the first field to get (1,2), (1,3), (2,3), (2,2), (3,1).
The first cluster, with 1 in the first field, is by coincidence already sorted in
the second field. The second cluster, consisting of (2,3) and (2,2), needs to be
interchanged to sort on the second field. The third cluster, consisting of one
record, naturally is sorted already. The sorted order is
(1, 2), (1, 3), (2, 2), (2, 3), (3, 1). D
For example, it would not be convenient to use the hashed file organization of
Section 6.3 for index files, since there is no way to find the value v2 that covers
v1 in a hashed file without searching the entire file.
Searching an Index
Let us assume the index file is stored over a known collection of blocks, and
we must find that record (v2, b) such that v2 covers a given key value v1. The
simplest strategy is to use linear search. Scan the index from the beginning,
looking at each record until the one that covers v1 is found. This method is
undesirable for all but the smallest indices, as the entire index may have to be called
into main memory, and on the average, half the index blocks will be accessed
in a successful lookup. Yet even linear search of an index is superior to linear
search of the main file; if the main file has R records per block, then the index
has only 1/Rth as many records as the main file. In addition, index records
are usually shorter than records of the main file, allowing more to be packed on
one block.
Binary Search
A better strategy is to use binary search on the keys found in the index file.
Suppose that B1, . . . , Bn are the blocks of the index file (not main file), and
v1, . . . , vn are the first keys found on B1, . . . , Bn, respectively. Let us look for
the block of the main file where a record with key v1 could be found. We first
retrieve index block B⌈n/2⌉ and compare v1 with its first key, say w. If v1 < w,
repeat the process as if the index were on blocks B1, . . . , B⌈n/2⌉-1. If v1 ≥ w,
repeat the process as if the index were on B⌈n/2⌉, . . . , Bn. Eventually, only one
index block will remain. Use linear search on that block to find the key value
in the index that covers v1. There is a pointer to a block B of the main file
associated with this key, and if there is a record with key v1, it will be on block
B.
As we divide the number of blocks by two at each step, in ⌈log2(n + 1)⌉
steps at most we narrow our search to one index block. Thus the binary search
of an index file requires that about log2 n blocks be brought into main memory.
Once we have searched the index, we know exactly which block of the main file
must be examined and perhaps must be rewritten to perform an operation on
that file. The total number of block accesses, about 2 + log2 n, is not prohibitive,
as an example will show.
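A sketch of this covering-key search using Python's bisect module over the list of first keys of the index blocks; the sample keys are invented for illustration.

    from bisect import bisect_right

    # first_keys[i] is the first key on index block i (0-based); we want the
    # last block whose first key does not exceed v (the block that covers v).
    def covering_block(first_keys, v):
        return max(bisect_right(first_keys, v) - 1, 0)

    first_keys = [5, 40, 90, 130, 200]
    print(covering_block(first_keys, 95))    # 2: the block whose first key is 90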
Example 6.11: Let us again consider the hypothetical file of 1,000,000 records
described in Example 6.6. We assumed blocks were 4,096 bytes long, and records
200 bytes long. Since the length of the key matters here, let us assume the key
field or fields use 20 bytes. As R = 20, i.e., 20 records fit on a block, the main
file uses 50,000 blocks. We thus need the same number of records in the index
file.
An index record uses 20 bytes for the key and 4 bytes for the pointer
to a block of the main file. As many as 170 records of 24 bytes each could
fit on one block, but that would leave no room for used/unused bits. Let us
suppose 150 records are placed on one block of the index file. We would then
require 50,000/150 = 334 blocks for the index file; that is, n = 334 in the above
calculation.
Linear search would require about 168 index block accesses on the average
for a successful lookup, in addition to two accesses to read and write a block
of the main file. However, if we use a binary search, accessing and rewriting
a record of the main file requires 2 + log2 334, or about 11 block accesses.
In comparison, the hashed organization requires only three accesses, on the
average, provided we use about as many buckets as there are blocks of the main
file (one access to read the bucket directory, and two to read and write the lone
block of the bucket).
However, there are some advantages of the sorted organization over the
hashed file. In response to a query asking for records with keys in a given
range, we would have to examine almost all the buckets of the hash table, if the
range were substantial, so the hash table offers little help. On the other hand,
the sorted organization allows us to look almost exclusively at those blocks of
the index and the main file that contain relevant records. The only extra work
we have to do is use binary search to find the first relevant index block, and
look at some records outside the desired range in the first and last index blocks
and the first and last blocks of the main file that we access. D
Interpolation Search
A method of searching an index that can be superior to binary search is known
as interpolation or address calculation search. This method is predicated on
our knowing the statistics of the expected distribution of key values, and on
that distribution being fairly reliable. For example, if we are asked to look up
John Smith in the phone book, we do not open it to the middle, but to about
75% of the way through, "knowing" that is roughly where we find the S's. If we
find ourselves among the T's, we go back perhaps 5% of the way, not halfway
to the beginning, as we would for the second step of a binary search.
In general, suppose we have an algorithm that, given a key value v1, tells
us what fraction of the way between two other key values, v2 and v3, we can
expect v1 to lie. Call this fraction f(v1, v2, v3). If an index or part of an index
lies on blocks B1, . . . , Bn, let v2 be the first key value in B1 and v3 the last
key value in Bn. Look at block Bi, where i = ⌈n f(v1, v2, v3)⌉, to see how its
first key value compares with v1. Then, as in binary search, repeat the process
on either B1, . . . , Bi-1 or Bi, . . . , Bn, whichever could contain the value that
covers v1, until only one block remains.
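A sketch of one probe of interpolation search, assuming integer keys spread roughly uniformly so that f(v1, v2, v3) = (v1 - v2)/(v3 - v2); the numbers are illustrative.

    import math

    # One probe of interpolation search over index blocks B_1..B_n.
    def interpolation_probe(n, v1, v2, v3):
        f = (v1 - v2) / (v3 - v2)
        return min(max(math.ceil(n * f), 1), n)      # block number i, 1-based

    print(interpolation_probe(n=334, v1=750_000, v2=0, v3=999_999))   # about 251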
First, we sort the initial file of records and distribute them among blocks. Since
files tend to grow, we often find it convenient to distribute the initial records
in such a way that there is a small fraction, say 20%, of the total space unused
on each block.
The second step of initialization is to create the index file, by examining
the first record on each block of the main file. The keys of each of these records
are paired with the addresses of their blocks to form the records of the index
file. One useful exception is to replace the key value in the first index record
by -∞, a value that is less than any real key value. Then, should we insert
a record with a key that precedes any key in the current file, we do not have
to treat it specially. When we apportion the index records among blocks, we
might again want to leave a small fraction of the space available, because when
we are forced to increase the number of blocks of the main file we also have to
increase the number of records of the index file.
The final step of initialization is to create a directory containing the addresses
of the index blocks. Often, this directory is small enough to put in main
memory. If that is not the case, the directory may itself be put on blocks and
moved in and out of main memory as needed. If we must do so, we are getting
very close to the multilevel index structure known as "B-trees," discussed in
Section 6.5.
Lookup
Suppose we want to find the record in the main file with key value v1. Examine
the index file to find the key value v2 that covers v1. The index record containing
v2 also contains a pointer to a block of the main file, and it is on that block
that a record with key value v1 will be found, if it exists.
The search for the index record with key v2 covering v1 can be performed
by any of the techniques discussed above (linear search, binary search, or
interpolation search), whichever is most appropriate.
Modification
To modify a record with key value v1, use the lookup procedure to find the
record. If the modification changes the key, treat the operation as an insertion
and deletion. If not, make the modification and rewrite the record.
Insertion
To insert a record with key value v1, use the lookup procedure to find the block
Bi of the main file on which a record with key value v1 would be found. Place
the new record in its correct place in block Bi, keeping the records sorted and
moving records with key values greater than v1 to the right, to make room for
the new record.5
If block Bi had at least one empty record area, all records will fit, and we
are done. However, if Bi was originally full, the last record has no place to go,
and we must follow one of several strategies for creating new blocks. In the
next section we shall discuss a strategy ("B-trees"), in which Bi is split into
two half-empty blocks. An alternative is to examine Bi+1. We can find Bi+1,
if it exists, through the index file, since a pointer to Bi+1 is in the record of the
index file that follows the record just accessed to find Bi. If Bi+1 has an empty
record area, move the excess record from Bi to the first record area of Bi+1,
shifting other records right until the first empty record area is filled. Change
the used/unused information in the header of Bi+1 appropriately, and modify
the index record for Bi+1 to reflect the new key value in its first record. If Bi+1
has many empty record areas, we can shift enough records from Bi to Bi+1 to
equalize the amount of empty space on each; the number of block accesses is
not increased, and in fact, very little extra computation is needed.
5 Remember, we assume that the most significant cost of operations is the block access
time, and simple computations on the block, like moving data left or right, do not
dominate the total cost.
If Bi+1 does not exist, because i = k, or Bi+1 exists but is full, we could
consider obtaining some space from Bi-1 similarly. If that block is also full,
or doesn't exist, we must get a new block, which will follow Bi in the order.
Divide the records of Bi between Bi and the new block. Then insert a record
for the new block in the index file, using the same strategy as for inserting a
record into the main file.
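The following Python sketch follows this insertion strategy on blocks represented as sorted lists of keys (a capacity of five records per block and all names are illustrative); Example 6.12 below traces the same steps by hand.

    # index[i] holds the first key of blocks[i] (-inf for block 0).
    CAP = 5

    def isam_insert(blocks, index, key):
        i = max(j for j in range(len(index)) if index[j] <= key or j == 0)
        blocks[i].append(key)
        blocks[i].sort()
        if len(blocks[i]) <= CAP:
            return
        if i + 1 < len(blocks) and len(blocks[i + 1]) < CAP:
            blocks[i + 1].insert(0, blocks[i].pop())   # shift the excess record right
            index[i + 1] = blocks[i + 1][0]            # fix the index record for B_{i+1}
        else:
            mid = len(blocks[i]) // 2                  # divide B_i with a brand-new block
            blocks.insert(i + 1, blocks[i][mid:])
            blocks[i] = blocks[i][:mid]
            index.insert(i + 1, blocks[i + 1][0])

    blocks = [[2, 4, 5, 16, 19], [25, 31, 37, 54, 56], [58, 68, 79, 80, 88]]
    index = [float("-inf"), 25, 58]
    isam_insert(blocks, index, 52)
    print(blocks[1], blocks[2])                        # [25, 31, 37] [52, 54, 56]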
Deletion
As for insertion, a variety of strategies exist, and in the next section we shall
discuss one in which blocks are not allowed to get less than half full. Here,
let us mention only the simplest strategy, which is appropriate if relatively few
deletions are made. To delete the record with key value v1, use lookup to
find it. Move any records to its right one record area left to close the gap,6
and adjust the used/unused bits in the header. If the block is now completely
empty, return it to the file system and delete the record for that block in the
index, using the same deletion strategy.
Example 6.12: Suppose we have a file of numbers records, and initially our
file consists of records for the following list of "random" numbers, which were
generated by starting with 2, and repeatedly squaring and taking the remainder
modulo 101. Our initial sorted list is:
2, 4, 5, 16, 25, 37, 54, 56, 68, 79, 80, 88
We can fit five records on a block, but let us initially place only four on each to
leave some room for expansion. The initial layout is shown in Figure 6.7. Each
of the three blocks of the main file has one empty record area and four bytes of
waste space at the end, one byte of which could be occupied by used/unused bits
for the five record areas of the block. The one block of the index file has three
records, (-∞, b1), (25, b2), and (68, b3), where b1, b2, and b3 are the addresses
of the three blocks of the main file. The directory of index blocks is not shown,
but would contain the address of the one index block.
Now, let us consider what happens when the next four numbers in this
random sequence, 19, 58, 31, and 52, are inserted. We place 19 in the first
block, and it happens to follow all the numbers already in that block. We thus
place it in the fifth record area, the one that is empty, and there is no need to
slide records to the right in this block. Number 58 similarly goes in the second
block, and its proper place is in the fifth record area, so no rearrangement is
necessary.
The third insertion, 31, also belongs in block 2, and its proper place is in
6 This step is not essential. If we do choose to close up gaps, we can use a count of the
full record areas in the header in place of a used/unused bit for each record area. The
reader should again be reminded that if records are pinned we do not even have the
option of moving records into record areas whose records have been deleted.
[Figure layout: an index block with keys -∞, 25, and 68 pointing to three main-file blocks holding (2, 4, 5, 16), (25, 37, 54, 56), and (68, 79, 80, 88), each with one empty record area.]
Figure 6.7 Initial index file.
the second record area, after 25. We must thus slide 37, 54, 56, and 58 to the
right to make room. However, there is not room for six records, and we must
find a place for one of them. In this case, the third block has space, so we can
shift 58 into the first record area of the third block, and shift the four records
already in that block to the right. Since 58 is now the lowest key in the third
block, we must change the index record for that block. These modifications are
shown in Figure 6.8.
[Figure 6.8 layout: after inserting 19, 58, and 31, the main-file blocks hold (2, 4, 5, 16, 19), (25, 31, 37, 54, 56), and (58, 68, 79, 80, 88), and the index block's keys are -∞, 25, and 58.]
If records are pinned down to the place in which they are first stored, we cannot,
in general, keep records sorted within a block. One solution is to start the file
Sort the file and distribute its records among blocks. Consider filling each block
to less than its capacity to make room for expected growth and to avoid long
chains of blocks in one bucket. Create the index with a record for each block.
As in the previous organization, it is important to use key -∞ for the first
block, so keys smaller than any seen before will have a bucket in which they
belong.
Operations
The operations on files with this organization are performed with a combination
of the ideas found in Section 6.3, concerning hashed files, and the organization
just discussed, concerning sorted files with unpinned records. The salient
features are mentioned below:
1. Lookup. Find the index record whose key value v2 covers the desired key
value v1. Follow the pointer in the selected index record to the first block
of the desired bucket. Scan this block and any blocks of the bucket chained
to it to find the record with key v1.
2. Insertion. Use the lookup procedure to find the desired bucket. Scan the
blocks of the bucket to find the first empty place. If no empty record area
exists, get a new block and place a pointer to it in the header of the last
block of the bucket. Insert the new record in the new block.
3. Deletion. Use the lookup procedure to find the desired record. We might
consider setting the used/unused bit for its record area to 0. However, as
discussed in Section 6.1, if there may exist pointers to the record being
deleted, another deletion strategy must be used. The used/unused bit is
kept at 1, and to indicate removal of the record, a deletion bit in the record
itself is set to 1.
4. Modification. Perform a lookup with the given key. If only nonkey fields
are to be changed, do so. If one or more fields of the key change, treat
the modification as a deletion followed by an insertion. However, if records
are pinned and modification of key fields is permitted, we must not simply
set the deleted bit of the old record to 1. If we did nothing else, then old
pointers to that record would "dangle," and we would not be able to find
the modified record by following those pointers. Thus, we must not only
set the deleted bit for the deleted record, but we must leave in that record
a "forwarding address," pointing to the new incarnation of the record.
Example 6.13: Suppose we begin with the file shown in Figure 6.7 and add
the numbers 19, 58, 31, 52, 78, and 24. As in Example 6.12, the first two of
these go in blocks 1 and 2, and they accidentally maintain the sorted order of
the blocks, as they are placed in the fifth record area of each block. When we
insert 31, we must create a new block for the second bucket and place it in the
first record area. Similarly, 52 goes in the second record area of the new block,
78 fills the block of the third bucket, and 24 requires us to create a second block
for bucket 1. The final organization is shown in Figure 6.10. D
Additional Links
As records are not placed in a bucket in sorted order after the initialization,
we may have difficulty if we wish to examine records in sorted order. To help,
we can add a pointer in each record to the next record in sorted order. These
pointers are somewhat different from the pointers we have been using, since
they not only indicate a block, but they also indicate an offset within the block;
the offset is the number of the byte that begins the stored record, relative to
the beginning of the block. The algorithms needed to maintain such pointers
should be familiar from an elementary study of list processing.
Example 6.14: The second bucket of Figure 6.10 with pointers indicating the
sorted order is shown in Figure 6.11. D
6.5 B-TREES
An index being nothing more than a file with unpinned records, there is no
reason why we cannot have an index of an index, an index of that, and so on,
until an index fits on one block, as suggested in Figure 6.12. In fact, such an
arrangement can be considerably more efficient than a file with a single level of
indexing. In the structure of Figure 6.12, the main file is sorted by key value.
The first level index consists of pairs (v, b), where b is a pointer to a block B
of the main file and v is the first key on block B. Naturally, this index is also
sorted by key value. The second level of index has pairs (v,b), where b points
to a first-level index block and v is its first key, and so on.
[Figure 6.12 (not reproduced): a third-level index block on top of second-level index blocks, on top of first-level index blocks that point to the blocks of the main file.]
Lookup
Let us search for a record with key value v. We find a path from the root of the
B-tree to some leaf, where the desired record will be found if it exists. The path
begins at the root. Suppose at some time during the search we have reached
node (block) B. If B is a leaf (we can tell when we reach a leaf if we keep the
current number of levels of the tree available) then simply examine block B for
a record with key value v.
If B is not a leaf, it is an index block. Determine which key value in block
B covers v. Recall that the first record in B holds no key value, and the missing
value is deemed to cover any value less than the key value in the second record;
i.e., we may assume the missing key value is -∞. In the record of B that covers
v is a pointer to another block B'. In the path being constructed, B' follows
B, and we repeat the above steps with B' in place of B.
Since the key value in record i of B is the lowest key of any leaf descending
from the ith child of B, and the main file's records are sorted by key value, it
is easy to check that B' is the only child of B at which a record with key v
could exist. This statement also holds for i = 1, even though there is no key
in the first record of B. That is, if v is less than the key in the second record,
then a main-file record with key v could not be a descendant of the second or
subsequent children of B.
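A sketch of this descent in Python; representing an interior node as a list of (key, child) pairs whose first key is conceptually missing is an illustrative assumption, not the block format of the text.

    # An interior node is a list of (key, child) pairs; the first key is
    # conceptually missing and covers everything smaller.  A leaf is a sorted
    # list of keys.  We keep the number of levels, as suggested above.
    def btree_lookup(node, v, levels):
        for _ in range(levels - 1):                  # descend through the index levels
            child = node[0][1]
            for key, ptr in node[1:]:
                if key <= v:                         # this key still covers v
                    child = ptr
                else:
                    break
            node = child
        return v if v in node else None              # linear search of the leaf

    leaf1, leaf2, leaf3 = [2, 5], [9, 16], [25, 36, 49]
    root = [(None, leaf1), (9, leaf2), (25, leaf3)]
    print(btree_lookup(root, 36, levels=2), btree_lookup(root, 7, levels=2))  # 36 None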
Modification
As with the other organizations discussed, a modification involving a key field
is really a deletion and insertion, while a modification that leaves the key value
fixed is a lookup followed by the rewriting of the record involved.
Insertion
To insert a record with key value v, apply the lookup procedure to find the
block B in which this record belongs. If there are fewer than 2e - 1 records
in B, simply insert the new record in sorted order in the block. One can show
that the new record can never be the first in block B, unless B is the leftmost
leaf. Thus, it is never necessary to modify a key value in an ancestor of B.
If there are already 2e - 1 records in block B, create a new block B1 and
divide the records of B and the inserted record into two groups of e records
each. The first e records go in block B and the remaining e go in block B1.
Now let P be the parent block of B. Recall that the lookup procedure
finds the path from the root to B, so P is already known. Apply the insert
procedure recursively, with constant d in place of e, to insert a record for B1 to
the right of the record for B in index block P. Notice that if many ancestors of
block B have the maximum 2d - 1 records, the effects of inserting a record into
B can ripple up the tree. However, it is only ancestors of B that are affected.
If the insertion ripples up to the root, we split the root, and create a new root
with two children. This is the only situation in which an index block may have
fewer than d records.
Example 6.15: Nontrivial examples of B-trees are hard to show on the page.
Let us therefore take the minimum possible values of d and e, namely two.
That is, each block, whether interior or a leaf, holds three records. Also to
save space, we shall use small integers as key values and shall omit any other
fields, including used/unused bits in the header. In Figure 6.13 we see an initial
B-tree.
[Figure 6.13 (not reproduced): an initial B-tree; in each interior block the first record's key value is omitted.]
B7, but now B7 has four records. We therefore get a new block, B12, and place
25 and 32 in B7, while 36 and 49 go in B12.
We now must insert a record with key value 36 and a pointer to B12 into
B3. Value 36 is selected because that is the lowest key value on the block B12.
This insertion causes B3 to have four records, so we get a new block B13. The
records with pointers to B7 and B12 go in B3, while the records with pointers
to B8 and B9 go in B13. Next, we insert a record with key value 64 and a
pointer to B13 into B1. Now B1 has four records, so we get a new block B14,
and place the records with pointers to B2 and B3 in B1, while the records with
pointers to B13 and B4 go in B14. As B1 was the root, we create a new block
B15, which becomes the root and has pointers to B1 and B14. The resulting
B-tree is shown in Figure 6.14.
[Figure 6.14 (not reproduced): the B-tree after the insertions.]
Deletion
If we wish to delete the record with key value v, we use the lookup procedure
to find the path from the root to a block B containing this record. If after
deletion, block B still has e or more records, we are usually done. However, if
the deleted record was the first in block B, then we must go to the parent of
B to change the key value in the record for B, to agree with the new first key
value of B. If B is the first child of its parent, the parent has no key value for
B, so we must go to the parent's parent, the parent of that, and so on, until
we find an ancestor A1 of B such that A1 is not the first child of its parent
A2. Then the new lowest key value of B goes in the record of A2 that points to
A1. In this manner, every record (v1, p1) in every index block has key value v1
equal to the lowest of all those key values of the original file found among the
leaves that are descendants of the block pointed to by p1.7
If, after deletion, block B has e - 1 records, we look at the block B1 having
the same parent as B and residing either immediately to the left or right of B.
If B1 has more than e records, we distribute the records of B and B1 as evenly
as possible, keeping the order sorted, of course. We then modify the key values
for B and/or B1 in the parent of B, and if necessary, ripple the change to as
many ancestors of B as have their key values affected. If B1 has only e records,
then combine B with B1, which will then have exactly 2e - 1 records, and in
the parent of B, modify the record for B1 (which may require modification of
some ancestors of B) and delete the record for B. The deletion of this record
requires a recursive use of the deletion procedure, with constant d in place of e.
If the deletion ripples all the way up to the children of the root, we may
finish by combining the only two children of the root. In this case, the node
formed from the combined children becomes the root, and the old root is deleted.
This is the one situation in which the number of levels decreases.
Example 6.16: Let us delete the record with key value 64 from the B-tree of
Figure 6.14. The lookup procedure tells us the path to the block that holds this
record is B15, B14, B13, B8. We delete the record from B8 and find that it was
the first record of that block. We therefore must propagate upwards the fact
that the new lowest key value in B8 is 81. As B8 is the leftmost child of B13,
we do not change B13, nor do we change B14, since B13 is its leftmost child.
However, B14 is not the leftmost child of B15, so there is a key value in B15
that must be changed, and we change 64 to 81 there. Notice that a deletion
never causes more than one key value to be changed.
We have another problem when we delete 64. Block B8 now has only one
record. We go to its parent, B13, and find that B8 has no sibling to its left. We
therefore examine B8's sibling to the right, B9. As B9 has only two records,
we can combine B9 with B8. Now we discover that B13 has only one child, and
we must combine B13 with its sibling, B4. Block B13 will now have pointers to
B8, B10, and B11. The key value 196 to go with the pointer to B11 is found in
B4, while the key value 144 to go with B10 is found in B14. In general, when
we merge blocks in a deletion, the necessary key values are found either in the
merged blocks or in their common parent. We leave it as an exercise for the
reader to develop an algorithm to tell where the desired key values are found.
7 This property is not essential, and we could dispense with the modification of keys in
index blocks. Then v1 would be a lower bound on the keys of descendants of the block
pointed to by p1. The descendants to the left of that block will still have keys less than
v1, as they must for the B-tree to be useful for finding records.
On combining B13 and B4, we find B14 has only one child, and so we
combine B14 with B1. At this time, B15 has only one child, and since it is the
root, we delete it, leaving the B-tree of Figure 6.15. D
[Figure 6.15 (not reproduced): the B-tree after deleting 64.]
Example 6.17: Let us reconsider our running example of a file, which we
discussed in relation to single-level indices in Example 6.11. Records are assumed
200 bytes long, and if we want an odd number on 4,096-byte blocks, we must
choose 2e - 1 = 19; i.e., e = 10. We assumed in Example 6.11 that keys were
20 bytes long, and pointers took 4 bytes. Since we omit the first key, we can
fit 171 index records on a 4,096-byte block, since 170 x 20 + 171 x 4 = 4,084.
Thus, d = 86. The expected number of block accesses per operation is thus
2 + logd(n/e) = 2 + log86(1,000,000/10) < 5
This figure is greater than the best for hashed access (about 3 read/writes),
but is superior to methods using a single level of indexing, except perhaps in
those situations where an interpolation search can be performed. The B-tree
shares with the methods of Section 6.4 the advantage over hashed access of
permitting the file to be listed or searched conveniently in sorted order. D
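The estimate can be checked in a couple of lines of Python, using the parameters assumed in the example.

    import math

    # Checking the access estimate of Example 6.17 with its assumed parameters.
    n, d, e = 1_000_000, 86, 10
    print(round(2 + math.log(n / e, d), 2))   # about 4.58, i.e., fewer than 5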
6.6 FILES WITH A DENSE INDEX
In the schemes discussed so far, hashing, sparse indices, and B-trees, most
blocks of the main file are only partially filled. For example, a hash structure
with an adequate number of buckets will have only one or two blocks in most
buckets, and at least one block per bucket will be only partially filled. In the
B-tree scheme discussed in the previous section, all the leaf blocks are between
half-full and completely full, with the average block around three-quarters full.
In contrast, the heap, mentioned in Section 6.2, keeps all main file blocks
but the last full, which saves a significant amount of space.8 The problem with
using a heap, of course, is that we must have an efficient way of finding a record,
given its key value. To do so, we need another file, called a dense index, that
consists of records (v,p) for each key value v in the main file, where p is a
pointer to the main file record having key value v. The structure of the dense
index may be any of the ones discussed in Sections 6.3-6.5; i.e., we only require
that we can find the record (v,p) quickly, given the key v. Note, incidentally,
that a "dense" index stores a pointer to every record of the main file, while the
"sparse" indices discussed previously stored the keys for only a small subset of
the records of the main file, normally only those that were the first on their
blocks.
To look up, modify, or delete a record of the main file, given its key, we
perform a lookup on the dense index file, with that key, which tells us the block
of the main file we must search for the desired record. We must then read this
block of the main file. If the record is to be modified, we change the record and
8 There is the detail that if records are pinned, we cannot physically delete records but
must set a "deleted" bit. However, this extra space cost occurs with all storage structures
when records are pinned.
rewrite its block onto secondary storage. We thus make two more block accesses
(one to read and one to write) than are necessary to perform the corresponding
operation on the dense index file.
If we are to delete the record, we again rewrite its block and also delete
the record with that key value from the dense index file. This operation takes
two more accesses than a lookup and deletion from the dense index.
To insert a record r, we place r at the end of the main file and then insert
a pointer to r, along with r's key value, in the dense index file. Again this
operation takes two more accesses than does an insertion on the dense index
file.
It would thus seem that a file with a dense index always requires two more
accesses than if we used, for the main file, whatever organization (e.g., hashed,
indexed, or B-tree) we use on the dense index file. However, there are two
factors that work in the opposite direction, to justify the use of dense indices
in some situations.
1. The records of the main file may be pinned, but the records of the dense
index file need not be pinned, so we may use a simpler or more efficient
organization on the dense index file than we could on the main file.
2. If records of the main file are large, the total number of blocks used in the
dense index may be much smaller than would be used for a sparse index
or B-tree on the main file. Similarly, the number of buckets or the average
number of blocks per bucket can be made smaller if hashed access is used
on the dense index than if hashed access were used on the main file.
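As a concrete illustration of the scheme, here is a minimal sketch, in Python, of a main file kept as a heap together with a dense index. The class and field names are ours; the dense index is shown as an in-memory dictionary only to keep the sketch short, whereas in practice it would itself be a hashed, isam, or B-tree file, and each operation would then cost two more block accesses than the corresponding operation on that file.

    class HeapFileWithDenseIndex:
        """Main file packed solidly; a dense index maps every key to the
        block and offset of the record with that key."""

        def __init__(self, block_capacity):
            self.block_capacity = block_capacity
            self.blocks = [[]]                 # the main file: a list of blocks
            self.dense_index = {}              # key -> (block number, offset)

        def insert(self, key, record):
            # Place the record at the end of the main file, so all blocks
            # but the last stay full, then record its position in the index.
            if len(self.blocks[-1]) == self.block_capacity:
                self.blocks.append([])
            self.blocks[-1].append(record)
            self.dense_index[key] = (len(self.blocks) - 1, len(self.blocks[-1]) - 1)

        def lookup(self, key):
            # The index lookup tells us which block of the main file to read.
            position = self.dense_index.get(key)
            if position is None:
                return None
            block_no, offset = position
            return self.blocks[block_no][offset]

        def delete(self, key):
            position = self.dense_index.pop(key, None)
            if position is not None:
                block_no, offset = position
                # With pinned records we would set a "deleted" bit instead
                # (see footnote 8); here we simply leave a hole.
                self.blocks[block_no][offset] = None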
Example 6.18: Let us consider the file discussed in Example 6.17, where we
used a B-tree with d = 86 and e = 10 on a file of n = 1,000,000 records.
Since dense index records are the same size as the records in the interior nodes
of a B-tree, if we use a B-tree organization for the dense index, we may take
d = 86 and e = 85.⁹ Thus, the typical number of accesses to search the dense
index is 2 + log_86(1,000,000/85), or slightly more than 4. To this we must add
two accesses of the main file, so the dense index plus B-tree organization takes
between one and two more block accesses [the actual figure is 2 − log_86(85/10)]
than the simple B-tree organization.
There are, however, compensating factors in favor of the dense index. We
can pack the blocks of the main file fully if a dense index is used, while in the
B-tree organization, the leaf blocks, which contain the main file, are between
half full and completely full; thus, we can save about 25% in storage space
for the main file. The space used for the leaves of the B-tree in the dense
index is only 12% of the space of the main file, since index records are 24 bytes
9 Technically, the leaf blocks of the B-tree used as a dense index must have key values
in all the records, including the first. This difference means that we can fit only 170
records in leaf blocks, and so must take e = 85.
long and main file records are 200 bytes. Thus, we still have a net savings of
approximately 13% of the space. Perhaps more importantly, if the main file has
pinned records, we could not use the B-tree organization described in Section
6.5 at all. D
Methods for Unpinning Records
Another use for a dense index is as a place to receive pointers to records. That
is, a pointer to record r of the main file may go instead to the record in the
dense index that points to r. The disadvantage is that to follow a pointer to
r we must follow an extra pointer from the dense index to the main file. The
compensation is that now records of the main file are not pinned (although the
records of the index file are). When we wish to move a record of the main
file, we have only to change the one pointer in the dense index that points
to the moved record. We may thus be able to use a more compact storage
organization for the main file, and the storage savings could more than cover
the cost of the dense index. For example, if the main file is unpinned, we can
reuse the subblocks of deleted records.
Another technique for making files unpinned is to use the key values of
records in place of pointers. That is, instead of storing the address of a record
r, we store the value of the key for r, and to find r we do a standard lookup
given the key value. The IMS database system (IBM [1978b]), for example,
makes use of this technique, as discussed in Section 6.10. In this way, both the
dense index file and the main file can be unpinned. The disadvantage of this
implementation of pointers is that to follow a "pointer" to the main file, we
must search for the key value of that record in the dense index, or in whatever
structure is used for accessing the main file, which will probably take several
block accesses. In comparison, we would need only one block access if we could
go directly to the record of the main file, or two accesses if we went directly to
the record in the dense index and then to the record in the main file.
Summary
In Figure 6.16 we list the four types of organizations for files allowing lookup,
modification, insertion, and deletion of records given the key value. In the
timing analyses, we take n to be the number of records in the main file and,
for uniformity with B-trees, we assume the records of the main file are packed
about e to a block on the average, and records of any index files can be packed
about d to a block on the average.
Figure 6.16 Comparison of the four organizations.

Organization: Hashed
    Time per operation: 3, if buckets average one block.
    Advantages and disadvantages: Fastest of all methods. If file grows, access slows, as buckets get large. Cannot access records easily in order of sorted key values.
    Problems with pinned records: Must search buckets for empty space during insertion, or allow more blocks per bucket than optimal.

Organization: Isam index
    Time per operation: about 2 + log(n/de) for binary search; about 3 + log log(n/de) if address calculation is feasible and is used.
    Advantages and disadvantages: Fast access if address calculation can be used. Records can be accessed in sorted order.
    Problems with pinned records: Same as above.

Organization: B-tree
    Time per operation: about 2 + log_d(n/e).
    Advantages and disadvantages: Fast access. Records can be accessed in sorted order. Blocks tend not to be solidly packed.
    Problems with pinned records: Use a B-tree as a dense index.

Organization: Dense index
    Time per operation: often slower by one or two block accesses than if the same access method used for the index file were used for the main file.
    Advantages and disadvantages: May save space.
    Problems with pinned records: None.

6.7 NESTED RECORD STRUCTURES
Frequently, we are interested in doing more than retrieving records, given their
key value. Instead, we want to find a subset of the records in a file based
record takes 168 bytes, and it is very likely that it fits on one block. Thus, if
we search for and find the Zack Zebra CUSTOMERS record, we often require
no additional retrievals to find the orders and entries within those orders for
this customer, although additional block retrievals are necessary if we follow
the virtual pointers and actually retrieve the items ordered.
With bad luck, the records of Figure 6.17 will be distributed over two
consecutive blocks, and both will have to be retrieved to get all the orders for
Zack Zebra. However, even two block accesses is much better than what we
might have to face if we stored customers, orders, and item-quantity pairs in
separate files, with no clustering of orders or entries for the same customer.
Then we might find each of the nine records of Figure 6.17 on a different block.
D
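The following sketch, with function names of our own choosing but the data of the running example, shows the preorder layout just described: one customer record, followed by each of its orders, each order followed immediately by its entries, the whole sequence packed into consecutive blocks.

    def preorder_layout(customer, orders):
        """Flatten one nested CUSTOMERS(ORDERS(ENTRY)*)* database record
        into its preorder sequence of records."""
        sequence = [("CUSTOMERS", customer)]
        for order_number, entries in orders:
            sequence.append(("ORDERS", order_number))
            for item, quantity in entries:
                sequence.append(("ENTRY", (item, quantity)))
        return sequence

    def pack_into_blocks(sequence, records_per_block):
        # Consecutive records of the preorder sequence share blocks, so a
        # customer and its orders are normally found with one block access.
        return [sequence[i:i + records_per_block]
                for i in range(0, len(sequence), records_per_block)]

    zebra = preorder_layout("Zack Zebra",
                            [(1024, [("Brie", 3), ("Perrier", 6)]),
                             (1026, [("Macadamias", 2048)])])
    blocks = pack_into_blocks(zebra, 6)   # six records here, so one block suffices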
Separate Storage of Repeating Groups
Another way to store a nested structure is to replace each of the outermost
repeating groups (those that are not inside any other repeating group) by a
pointer that indicates a place on another block or group of blocks where the
repeating group itself may be found in consecutive space. Also needed is in
formation about how to tell when the list of records in the repeating group
ends. For example, we might attach to the pointer a count of the number of
occurrences of the repeating group, or the records of the repeating group itself
might be linked in a list, with a null pointer to indicate the end of the repeating
group.
Example 6.21: In the structure for a single CUSTOMERS database record,
given by (6.2) above, there is one outermost repeating group, which is the
structure for a single order that was given by (6.1). We could replace this
repeating group by a pointer, which we could think of as a "virtual list of
orders." The structure of a CUSTOMERS database record thus becomes
CUSTOMERS VORDERS
This structure is equivalent to a simple, fixed-length record, like CUSTOMERS
records, with an additional field for a pointer and perhaps a count of orders.
The structures for the orders can then be stored in preorder sequence, or
they can be further broken down by replacing the outermost repeating group
of (6.1), which is
(VITEM QUANTITY)*
by a pointer to a list of "entries," which are item-quantity pairs. If we take this
option, then the CUSTOMERS, ORDERS, and ENTRIES records are spread
over three files, and therefore, over three groups of blocks, as is suggested by
the structure of Figure 6.18. There, we have linked together records belonging
to a single repeating group. D
[Figure 6.18: the CUSTOMERS, ORDERS, and ENTRIES records for Zack Zebra spread over three files, with the records of each repeating group linked together and virtual pointers to the items Brie, Perrier, and Macadamias.]
adequate number of buckets, we need about 2 block accesses, one for the block
of the bucket and one for the block on which the customer record is stored. To
that, we add another sixth of an access in the case the orders are spread over
two blocks.
It is hard to make an exact comparison of this performance with a structure
in which orders are not nested close to their customers, because, given the
structures seen so far, we do not even have a way of finding orders records
given a customer name. In fact, in the hierarchy of Figure 2.26, ORDERS
records do not contain the name of the customer, and it was assumed implicitly
that the name is found in the parent record whenever we reach an ORDERS
record.
However, if we adopt the approach of the relational model, where ORDERS
records are tuples containing the name of the customer, as in Figure 2.8, then
we can find the orders, given a customer, provided we use a "secondary index"
on customer name for the ORDERS relation. Secondary indices are discussed
further in the next section, but, for example, we might build a hash table whose
records have a "key" field, which is a customer name, and a second field, which
is a pointer to one of the orders for that customer. Of course these "keys"
are not true keys, but since we assume only four orders per customer on the
average, the records will distribute fairly well among buckets, and we might
expect to discover the locations of all orders for a given customer in about two
block accesses. There is no reason to expect two of these orders to be on the
same block, if they are in a file sorted by order number, for example. Thus, six
accesses (two for hashing and four for finding the orders themselves) are needed
for an "ordinary" structure, while the nested structure needs only 2.17 on the
average. D
Another natural use of nested structures is for storing two record types
connected by a DBTG set. For example, the structure of Figure 6.17 could be
thought of as one where CUSTOMERS records "own" the following ORDERS
records, which in turn "own" their following ENTRIES records, each consisting
of a pointer to an item and a quantity. Then the nested structure facilitates
queries that ask for the orders owned by a given customer, or the entries owned
by a given order. It also facilitates moving from member records to their owners,
e.g., from an order to the customer that placed the order, since it is likely that
the owner is on the same block as the member record in question.
Then, inserted records are placed in available space at the end of the block
in which they belong, and records can be linked in the proper order, as was
suggested by Figure 6.11. In the second case, we have a block directory for
our variable-length records, and pointers go to the directory itself. Then, we
can slide records around within the block, making sure the directory pointers
continue to point to the correct records. Therefore, we can insert records into
their proper order in the block, and need not link the records.
However, we must consider what happens when blocks overflow. If records
are pinned to fixed locations within blocks, the best we can do is what was done
in Figure 6.10. We consider each original block the first block of a "bucket,"
and we link to it additional blocks containing the newly inserted records. To
preserve the correct order of records we need links as in Figure 6.11.
If pointers to records are really pointers to the block directory, then we
have some options. The block directory itself cannot move, so we cannot split
blocks into two blocks and distribute records simply. We still must think of the
overflowing block as the beginning of a chain for a bucket. We must keep the
directory on that block or, as the bucket grows, on the first and subsequent
blocks of its bucket. However, we can keep the records of the bucket in their
proper order, distributed among the blocks of the bucket.
6.8 SECONDARY INDICES
Prior to Section 6.7, we were concerned with structures for primary indices,
those that allow us to find a record given the value of its key. Often, the
organization of the file was determined by the needs of the primary index. Then
in Section 6.7 we saw that there are reasons why we might want to organize the
records of a file in a way that is not compatible with the desired primary index
structure. That problem can be handled by using a dense index, with one of
the structures discussed in Sections 6.3-6.5, to hold pointers to the records of
the file, which are then distributed to meet some other need, such as fitting into
a nested structure. We also saw in Section 6.7 that structures can be designed
to make efficient certain operations other than lookup, insertion, and deletion
of a record with a given key value; in particular, we considered how to support
queries that follow relationships between two or more record types.
In this section, we shall consider another type of operation: given the value
v of a field other than the key, find all records of a certain type that have value
v in that field. This problem reduces to the one we considered in the previous
section, because we can use a secondary index, which is a nested structure with
pattern
VALUE (REFERENCE)*
A REFERENCE, in this sense, is a way of getting to one of the records having
the given VALUE. Two reasonable interpretations of references are:
1. A pointer to the record having the given value.
2. The key of the record having the given value; the record itself is then found through whatever primary structure the main file has.
[Figure 6.19: a secondary index on the NAME field of a file of numbers, stored in preorder, with the references for each letter grouped together.]
Fortuitously, the references for each letter but one fall on one block. Thus,
assuming the dense index for letters is kept in main memory, we need only a
little more than one block access per letter to find references to all the numbers
with a given NAME value. Of course, we must still retrieve the records for
these numbers if we want to access them, and this step would take one access
per number if pointers were stored in the secondary index, and several accesses
per number if keys (i.e., the numbers themselves) were stored in the secondary
index and some appropriate structure were used for a primary index. If the
size of the numbers file were very large, and the numbers for each NAME value
covered several blocks, we could still retrieve references to all of the numbers
with a given first letter with about as many block accesses as the references
themselves could fit in.
Instead of storing the secondary index in preorder, with a dense index on
letters, we could use separate storage of the repeating group of references, as
in Figure 6.18. Then the secondary index itself would consist of only the (up
to) six letters that are possible values of NAME, each paired with a pointer to
a chain of blocks that hold all of the references for that letter. The index itself
becomes a short file with at most six letter-pointer records, which we might
store as a linked list of blocks. An example, using the data of Figure 6.7, with
our tiny blocks holding six elements each, is shown in Figure 6.20. There, each
of the lists of references fits on one block, but in general, these lists would cover
many blocks. D
[Figure 6.20: the secondary index with the repeating groups of references stored separately, using the data of Figure 6.7.]
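To make the two interpretations of references concrete, here is a small Python sketch (the names are ours, and the sample records are only illustrative). The secondary index maps each value of the indexed field to a list of references; a reference is either a (block number, offset) pointer into the main file or the key of the matching record.

    from collections import defaultdict

    def build_secondary_index(blocks, field, use_keys=False):
        """blocks: the main file, a list of blocks, each a list of records
        (dictionaries).  Returns a mapping value -> list of references."""
        index = defaultdict(list)
        for block_no, block in enumerate(blocks):
            for offset, record in enumerate(block):
                reference = record["key"] if use_keys else (block_no, offset)
                index[record[field]].append(reference)
        return index

    # Illustrative main file: records are numbers carrying a one-letter NAME.
    main_file = [[{"key": 16, "NAME": "a"}, {"key": 25, "NAME": "b"}],
                 [{"key": 37, "NAME": "a"}, {"key": 54, "NAME": "c"}]]
    by_name = build_secondary_index(main_file, "NAME")
    # by_name["a"] == [(0, 0), (1, 0)]: pointers to the two records with NAME = "a"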
The importance of storing the references for a given value close to that value
and close to each other goes up as the number of references associated with
each value increases. Only then can we minimize the number of blocks that
need to be retrieved when using the secondary index. On the other hand, if
the expected number of records with a given value in the field of the secondary
index is small, then we have the option of treating the secondary index as if it
were a dense index for a key value. Some of the structures for primary indices
need some modification. A hashed file, as we have mentioned, does not depend
on the keyness of the values it hashes. However, sorted files and index structures
can present some pitfalls if used as a dense index on values that do not serve
as keys.
Suppose we have a secondary index on field F, which we store as a dense
index. That is, our secondary (dense) index is a file of pairs (v,p), where p is
a pointer to a record with value v in field F. Let us sort the file on the first
component and use the (sparse) isam index structure of Section 6.4 to find,
given a value v, those pairs with first component v. There may, in fact, be two
or more records, say (v, p1) and (v, p2), in the secondary index file. With bad
luck, the first of these comes at the end of one block and the second at the
beginning of the next, as
[Diagram: the pair (v, p1) at the end of one block and (v, p2) at the beginning of the next.]
Representing Links
There are several ways we can represent links so that we can travel efficiently
from owner to members or vice versa. Suppose we have a link from member
record type T2 to owner record type T1. The most efficient implementation
of the link is generally to store the files corresponding to both of these record
types as a nested structure T1(T2)*. Then, if we implement this structure as
suggested in Section 6.7, we can easily go from an owner of type T1 to all of
its members, and we can go easily from a member to its owner, if we use the
preorder sequence.13
If there is another link from record type T3 to T1, we can list the occurrences
of T3 records with the corresponding T1 records, using a nested structure such as
T1(T2)*(T3)*. Again, the methodology of Section 6.7 can be used to implement
such structures.
However, suppose there is another link from T2 to some record type T4.
We cannot list T2 records after T1 records and also list them after T4 records, or
at least, it would hardly be efficient or convenient to do so. If we duplicated T2
records and placed them after both T1 and T4 records owning them, we would
introduce the redundancy and potential for inconsistency that we always wish
to avoid.
Multilist Structures
We therefore need another way of representing links, one that does not force
records of one type to be adjacent to records of another type. In this organi
zation, called a multilist, each record has one pointer for each link in which it
is involved, although we do have the option of eliminating the pointer for one
link and representing that link by a nested structure, as discussed above.
Suppose we have a link L from T2 to T1. For each record R of type T1 we
create a ring beginning at R, then to all of the records R1, R2, ..., Rk of type
T2 linked to R by L, and finally back to R. The pointers for link L in records of
types T1 and T2 are used for this purpose. Such rings were suggested in Figure
5.1.
It is important to remember that in a multilist organization, each record
has as many pointers as its record type has links. As the pointers are fields in
the records, and therefore appear in fixed positions, we can follow the ring for a
particular link without fear of accidentally following some other link. Another
essential, if we are to navigate through multilists, is that each record must have,
in a fixed location such as the first byte, a code that indicates its record type.
If we didn't have that code, we couldn't tell when we had reached the owner
record in a ring. If the owner and member types of a link kept their pointer
for the link in different positions,14 then we could not find that pointer without
knowing whether we were at a member or owner.
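A minimal sketch of one such ring, in Python; the class and field names are ours, and the course and enrollment data are only illustrative. Every record carries a type code in a fixed position and one ring pointer per link, and following the ring from an owner visits each member and returns to the owner.

    class Record:
        def __init__(self, record_type, data):
            self.record_type = record_type   # type code, always in the same position
            self.data = data
            self.ring = {}                   # one pointer per link this type is in

    def build_ring(link, owner, members):
        # Connect owner -> member 1 -> ... -> member k -> owner for one link.
        chain = [owner] + members
        for current, nxt in zip(chain, chain[1:] + [owner]):
            current.ring[link] = nxt

    def members_of(link, owner):
        # Follow the ring until the type code says we are back at the owner.
        found, node = [], owner.ring[link]
        while node.record_type != owner.record_type:
            found.append(node)
            node = node.ring[link]
        return found

    course = Record("COURSES", "CS101")
    enrollments = [Record("ENROLL", name) for name in ("Adams", "Baker", "Chen")]
    build_ring("E-COURSES", course, enrollments)
    print([m.data for m in members_of("E-COURSES", course)])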
Example 6.24: Multilist structures involving two or more links can look
quite complex, although logically, they are only implementing physically sev
eral many-one mappings, as we illustrated in Figure 2.15. There, we showed
13 If we store the repeating group of T2 records separately, as in Figure 6.18, then we also
need pointers back from the T2 records to their T1 "owner."
14 In general, it is not possible to avoid having the pointer position for at least one link
differ between its owner and member types, without wasting substantial space.
[Figure 6.21: A multilist structure for courses and students, with record types COURSES, E-COURSES, ENROLL, E-STUDENTS, and STUDENTS.]
in the tree in very few block accesses on the average, since they collectively
follow the node in the preorder sequence. It is this property of the preorder
listing, together with the assumption that the most frequent type of query will
ask for the descendants of a given node, that justifies the preorder sequence as
an important organization for hierarchical data.
Pointer Networks
In some queries, we do not want to see the entire database record, and if so, we
can often speed our access to the relevant parts if we use a network of pointers
to connect the records that belong to one database record. For example, if
all the children of the root were linked by a chain of pointers, we could visit
each child in turn, even though many blocks holding the descendants of those
children appear between them.
IMS uses two types of pointer networks. The first is the obvious one: each
record points to the next record in the preorder listing. This arrangement is
called preorder threads. The second arrangement is for each record to have a
pointer to its leftmost child and a pointer to its right sibling. The right sibling
of a node n is that child of the parent of n that is immediately to the right of
n. For example, in Figure 6.25(a), g is the right sibling of b, and g has no right
sibling.
Example 6.27: Figure 6.25(a) shows a tree; Figure 6.25(b) shows that tree
with preorder threads, and Figure 6.25(c) shows the same tree with leftmost
child (solid) and right sibling (dashed) pointers. D
[Figure 6.25: (a) a tree; (b) the same tree with preorder threads; (c) the same tree with leftmost-child (solid) and right-sibling (dashed) pointers.]
Each method has its advantages. Preorder threads need only one pointer
per record, while leftmost child/right sibling pointers require space for two
pointers per record, even though many of these pointers are null (for example,
no leaf node has a leftmost child). On the other hand, leftmost child/right
sibling pointers enable us to travel from left to right through the children of a
node quickly, even though many descendants intervene in the preorder sequence.
Observe, for example, how we can go from b to g directly in Figure 6.25(c), while
we must travel through c, d, e, and / in Figure 6.25(b).
Indices in System R
The B-tree indices mentioned in (3) above make no distinction between a pri
mary and secondary index. That is, it doesn't matter to the system whether
the set of attributes for an index forms a key for the relation. Suppose we have
an index on attributes A_1, ..., A_k. Then the interior nodes of the B-tree are
blocks filled, as much as the B-tree scheme allows, with records consisting of a
pointer to another block and a list of values, one for each of the attributes of
the index. These records are essentially the same as the pairs consisting of a
pointer and a key value that we discussed in connection with B-trees in Section
6.5; the difference is that there is no presumption of keyness.
Leaf nodes of the B-tree consist of values for attributes A_1, ..., A_k and
associated lists of tuple identifiers; there is one tuple identifier for each tuple
having the given values for A_1, ..., A_k. Actually, tuple identifiers point not to
the tuple, but to a place near the end of the block, where a pointer to the tuple
itself can be found. This double indirection, through a block directory, does
not cost us extra block accesses, and it has the advantage that tuples may be
moved around within blocks, as was mentioned in Section 6.1.
The reader should note that this arrangement differs somewhat from the
B-tree schemes we discussed in Sections 6.5 and 6.6. In the terms of Section
6.7, there is a nested structure with the pattern
VALUE (RECORD)*
serving as a secondary index into the main file. This structure is implemented
by storing its instance in preorder, among a sequence of blocks. These blocks,
which are the "leaves of the B-tree" mentioned above, are managed by splitting
overfull blocks into two, and merging blocks less than half full, according to the
B-tree style of handling insertions and deletions.
The resulting sequence of tuples beginning with the CUSTOMERS record for
Zack Zebra and including all its "owned" ORDERS records and INCLUDES
records "owned" by them, is shown in Figure 6.26.
CUSTOMERS record for Zack Zebra
ORDERS record for order 1024
INCLUDES record:
O# = 1024; ITEM = "Brie"; QUANTITY = 3
INCLUDES record:
O# = 1024; ITEM = "Perrier"; QUANTITY = 6
ORDERS record for order 1026
INCLUDES record:
O# = 1026; ITEM = "Macadamias"; QUANTITY = 2048
Figure 6.26 Tuples stored "via set."
The similarity of Figure 6.26 to Figure 6.17 should be observed. The
INCLUDES tuples correspond to what we called "entries," which consist of a
virtual item and a quantity. However, one should appreciate the fact that in the
hierarchical and network models, we do not have to place the customer name in
both CUSTOMERS and ORDERS, or the order number in both ORDERS and
INCLUDES. The structure of the network or hierarchy allows us to determine
the customer for an order by its owner (in networks) or parent (in hierarchies).
In a relational system, it is the common values between CUSTOMERS and
ORDERS, and between ORDERS and INCLUDES, that determine the positions
of ORDERS and INCLUDES records; e.g., an ORDERS record follows
the CUSTOMERS record with the same customer name.
The formal requirements for storing the tuples of a relation R nested within
a relation S, according to the pattern SR*, are as follows.
1. We can establish a correspondence between a set of attributes X of R and
Y of S. For example, R could be ORDERS, S could be CUSTOMERS,
X could consist of the single attribute CUST, and Y could be the single
attribute NAME. Note that CUST in ORDERS and NAME in CUSTOMERS
"mean" the same thing, but of course there is no requirement
for a similarity of "meaning."
2. Y is a key for S.
3. Whenever we have a tuple μ in R, there is a tuple ν in S such that μ[X] =
ν[Y]; that is, the X-value of every tuple in R occurs as a Y-value in some
tuple of S.
Under these conditions, we can store, after each tuple ν of S, all the tuples μ
of R such that μ[X] = ν[Y]. Every tuple of R will have a unique tuple of S to
follow.
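A sketch of the clustering itself under conditions (1)-(3); the function and variable names are ours. Given the correspondence between attributes X of R and Y of S, each S-tuple is written out followed by the R-tuples whose X-value matches its Y-value.

    def cluster_via_set(s_tuples, r_tuples, y_attr, x_attr):
        """Store R nested within S according to the pattern S R*: every
        tuple of S is followed by the tuples of R that it 'owns'."""
        owned = {}
        for mu in r_tuples:                     # by condition (3), mu[X] matches some nu[Y]
            owned.setdefault(mu[x_attr], []).append(mu)
        sequence = []
        for nu in s_tuples:                     # by condition (2), nu[Y] is unique in S
            sequence.append(("S", nu))
            for mu in owned.get(nu[y_attr], []):
                sequence.append(("R", mu))
        return sequence

    customers = [{"NAME": "Zack Zebra"}]
    orders = [{"O#": 1024, "CUST": "Zack Zebra"}, {"O#": 1026, "CUST": "Zack Zebra"}]
    print(cluster_via_set(customers, orders, "NAME", "CUST"))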
Multilist Structures in System R
We can also have, in System R, a multilist structure linking tuples of two
relations according to common values in a field of each. Suppose that two
relations R and S satisfy (1)-(3) above. Then we may create new attributes
PTR for both relation schemes; these new attributes are not accessible to the
user. The values for these two attributes are used to form rings connecting each
tuple ν of S to the tuples of R that it "owns," that is, the tuples of R whose
X-values agree with ν[Y].
Example 6.28: In Figure 6.27 we see the ORDERS and INCLUDES relations
of Figure 4.2 stored as a multilist structure based on the commonality of values
between the attributes CUST and NAME, respectively. D
6.12 RANGE QUERIES AND PARTIAL-MATCH QUERIES
Classical database systems are designed to handle the type of query that appears
repeatedly in Chapters 4 and 5, one in which a value for one attribute or field is
given and values of related attributes or fields are desired. The index structures
covered so far in this chapter are well suited to such queries. However, in some
modern applications, such as those discussed in the second half of Chapter 1
(graphics databases, computer-aided design databases, and VLSI databases), we
are often faced with queries for which the index structures described so far are
inadequate. These queries may involve inequalities, rather than equalities, and
they may have many simultaneous conditions.
Range Queries
A query in which fields are restricted to a range of values rather than a single
value is called a range query. Figures 6.29(a) and (c) are range queries; in (a),
x1 is restricted to the range −∞ < x1 < 3, and in (c) x1 is restricted to the
range 0 < x1 < 10, for example.
Partial-Match Queries
A query in which several fields (but not all fields) are restricted to single values
is often called a partial-match query. Figure 6.29(b) is a partial-match query,
since values for two of the fields, x1 and y1, are specified, and the other two
fields are left unspecified.
Usually, but not necessarily, range queries and partial-match queries are
applied to files with the property that no proper subset of the fields of a record
is a key. For example, a rectangle is uniquely determined only by the four
coordinates used in Example 6.29; any subset of three or fewer cannot determine
a unique rectangle. It is also normal that the query asks for the entire record,
as did the queries in Figure 6.29, rather than for a subset of the fields.
However, even that is not as good as the ideal, because on the average, few of
these rectangles will have y1 = 6. Furthermore, the primary index would only
help if the query specifies a value for x1 (or whichever field we chose for the
primary index).
Another alternative is to get pointers to all of the possible solution records
from dense indices on all of the fields for which the query provides a value.
Then we intersect these sets of pointers. If these sets are sufficiently small, the
intersection can take place in main memory. The pointers in the intersection
tell us where to find the records in the solution. In our running example, the
cost is proportional to the number of index blocks that point to records with
either x1 = 5 or y1 = 6, plus the number of blocks on which solution records
are found, which can still be much larger than the theoretical ideal.
Performance of Index Structures on Range Queries
We have similar, or even worse, problems with range queries. If we use hashing
for our indices, then we get no help for queries with large ranges. For example,
if field X is restricted to range a < X < b, then we must look in the buckets
for every value between a and b inclusive for possible values of X. There may
easily be more values in this range than there are buckets, meaning that we
must look in all, or almost all, the buckets.
Structures like isam indices and B-trees do support range queries to an
extent. We can find all X values in the range a < X < b with a number of
block accesses that is close to the number of records whose X field is in that
range. However, isam or B-tree secondary indices on each of the fields still leave
us with the problem we encountered with partial-match queries: the number of
blocks retrieved is proportional to the number of records that satisfy one of the
conditions, not to the number of answers. For example, in the query of Figure
6.29(c), we would retrieve either all the rectangles with 0 < x1 < 10 or all the
rectangles with 0 < y1 < 10.
The data structures we propose in the next two sections are more efficient
on some but not all of the partial-match and range queries. However, they allow
us to avoid maintaining indices on all the fields,15 which is expensive because
all insertions and deletions must deal with each of the indices we create. The
structures we propose next are often superior in retrieval time, are simpler and
faster to update, and require less space than keeping secondary indices on all
the fields.
15 If we omitted an index on one field F, then a query that specified a range or a value
only for F would receive no help from the structure.
[Figure 6.30: nine points, (1,1), (3,6), (0,4), (4,3), (5,0), (6,1), (7,2), (6,7), and (5,6), distributed into four buckets by a partitioned hash function.]
That condition is not compatible with the requirement that h divide "random"
sets of values evenly among buckets. In effect, we must partition values
so that the lowest values, up to some fixed value a0, go in bucket 0, values bigger
than a0, up to some larger value a1, go in bucket 1, and so on. If we can find a
sequence of numbers a0, ..., a_{B-2} such that the number of values in each of the
ranges −∞ to a0, a0 to a1, ..., a_{B-2} to +∞ is expected to be about the same,
then the hash function that sends these ranges to buckets 0, 1, ..., B − 1, respectively,
serves both purposes: it preserves order and it "randomizes." However,
to use this approach to hashing requires that we know quite accurately the
distribution of values in each of the fields on which we hash.
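For one field, such an order-preserving "hash" function amounts to keeping the breakpoints a0, ..., a_{B-2} and locating the bucket by binary search. A minimal sketch, with names and breakpoint values of our own choosing:

    import bisect

    def make_order_preserving_hash(breakpoints):
        """breakpoints = [a0, a1, ..., a_{B-2}], chosen so that about equal
        numbers of key values fall below a0, between a0 and a1, and so on.
        Values up to a0 go to bucket 0; values above a_{B-2} go to bucket B-1."""
        def h(value):
            return bisect.bisect_left(breakpoints, value)
        return h

    h = make_order_preserving_hash([100, 250, 700])    # B = 4 buckets
    assert (h(40), h(120), h(800)) == (0, 1, 3)
    assert h(40) <= h(120) <= h(800)                   # order is preserved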
Example 6.32: The partitioned hash function of Example 6.30 respects order,
since it uses the most significant bits of its values. To justify its use, we must
assume that points are chosen at random from the square of side 8 with
lower-left corner at the origin, i.e., the set of points (x, y) defined by 0 ≤ x ≤ 7 and
0 ≤ y ≤ 7. Actually, in this simple case it is sufficient to assume that the
expected numbers of points in each of the four quadrants of this square are
about the same; the exact distribution within quadrants does not matter. Even
this assumption requires accurate knowledge of the nature of the data. It would
be terrible, for example, if 90% of the points turned out to be in one quadrant.
With bad luck, even a small range in x and y will force us to look at all
four buckets. That happens, for example, if the query asks for 3 ≤ x ≤ 4 and
3 ≤ y ≤ 4. However, some ranges for x or y that are smaller than half of the
entire range (0 to 7) allow us to restrict bit b1 (if the range is for x) or bit b2
(if for y) of the bucket number. In that case, we need only look at a subset of
the buckets. For example, the query 1 ≤ x ≤ 3 and 2 ≤ y ≤ 5 requires us to
look only at the two buckets with b1 = 0, i.e., buckets 0 and 1. For the data of
Figure 6.30, no matching points are found. D
In justification, note that c bits are left unspecified, and these bits can be
replaced by any of 2^c bit strings.
For example, if b_i = b/k for each i, and m out of the k fields are specified,
then the number of buckets searched is 2^{b(k−m)/k}. As a more specific example,
if m = k/2, then we search 2^{b/2} buckets, or the square root of the total number
of buckets.
When we have range queries to evaluate, we need to make the simplifying
assumption that the number of bits devoted to each field is large. Then, we
can neglect "edge effects" due to small ranges that hash to two different values,
as was illustrated in Example 6.32. That is, we assume that if a field F_i is
restricted to a range that is fraction r of its total domain, then the number of
different hash values h_i(v) resulting from given values v in this range will be
fraction r of the total number of values that h_i can produce, that is, r2^{b_i}.
Thus, let us suppose that for i = 1, 2, ..., k, field F_i is restricted by our
query to a range whose length is fraction r_i of its total range; if the query does
not restrict F_i, then take r_i to be 1. Then the number of buckets that must be
retrieved is

    (r_1 2^{b_1})(r_2 2^{b_2}) ··· (r_k 2^{b_k}) = B(r_1 r_2 ··· r_k)

The above equality follows because 2^{b_1} 2^{b_2} ··· 2^{b_k} = 2^b, and 2^b is B, the number of
buckets. Thus, we have shown that, neglecting edge effects, the fraction of the
buckets that must be examined is the same as the product of the r_i's, which
is the fraction of all possible records that match the query. That is the least
possible cost, since almost all the retrieved blocks consist of answer records
only, and any retrieval algorithm must access at least that many blocks.
In summary, partitioned hashing offers essentially best-possible perfor
mance on range queries. It offers good performance on partial-match queries,
but the comparison with the multiple-indices structure considered in Section
6.12 could go either way depending on the data. In favor of partitioned hashing
is the fact that update of records is almost as simple as possible, while a
possible problem is that the good performance on retrieval depends on our having
a priori knowledge of the statistics of the data.
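A sketch of partitioned hashing for the two-field, four-bucket case of Examples 6.30 and 6.32; the helper names are ours, and we assume, as those examples indicate, that each coordinate lies between 0 and 7 and contributes its most significant bit to the bucket number. For a partial-match query we enumerate only the buckets whose bits are consistent with the fields that are specified.

    def partitioned_hash(point, bits_per_field):
        # Concatenate the contribution of each field into one bucket number.
        # Each field value is assumed to be a 3-bit integer (0..7), and the
        # field contributes its bits_per_field[i] high-order bits.
        bucket = 0
        for value, bits in zip(point, bits_per_field):
            bucket = (bucket << bits) | (value >> (3 - bits))
        return bucket

    def buckets_for_partial_match(spec, bits_per_field):
        # spec[i] is the specified value of field i, or None if unspecified.
        # Unspecified fields leave their bits free, giving 2^c completions.
        buckets = [0]
        for value, bits in zip(spec, bits_per_field):
            if value is None:
                buckets = [(b << bits) | free
                           for b in buckets for free in range(1 << bits)]
            else:
                buckets = [(b << bits) | (value >> (3 - bits)) for b in buckets]
        return buckets

    points = [(1, 1), (3, 6), (0, 4), (4, 3), (5, 0), (6, 1), (7, 2), (6, 7), (5, 6)]
    table = {}
    for p in points:
        table.setdefault(partitioned_hash(p, (1, 1)), []).append(p)
    print(buckets_for_partial_match((None, 6), (1, 1)))   # y = 6: buckets 1 and 3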
The problem is that B-trees often have very few levels, so frequently it
would not be possible to devote even one level to each field. We shall instead
consider a similar structure, called a k-d-tree, which is really designed for main-memory operation. We shall then mention how it can be adapted to our model
of costs, where only block accesses are counted.
A k-d-tree is a variant of a binary search tree, which is a tree whose nodes
each hold a record and have (optional) left and right children. In an ordinary
binary search tree, there is one key field for records, and if node N has key x,
then the left child of N and all its descendants have keys less than x, while the
right child of N and all its descendants have keys greater than x.
To find the record with key value v, we start at the root. In general, during
the search we shall be at some node M. If the key at M is v, we are done. If v
is less than the key of M we go to M's left child, and if v is greater, we go to
M's right child. Thus, we follow only one path from the root to a place where
the record is found, or we try to move to a missing child, in which case we
know there is no record with key v in the tree. If we wish to insert a record,
we search for its key and insert the record at the place where we find a missing
child. Deletion is a bit trickier; if we find the record with key v, say at node N:
1. If N has no children, delete N.
2. If N has only one child, M, replace N by M.
3. If N has two children, find the leftmost descendant of the right child of N,
and move that node to replace N.
These techniques, and the reason they work, are fairly common knowledge, and
we shall not elaborate on them further. The reader interested in the details can
consult Aho, Hopcroft, and Ullman [1983].
A k-d-tree differs from a binary search tree only in that the levels of the
tree are assigned to fields in round-robin fashion; that is, if there are k fields,
F_1, ..., F_k, then level i is assigned F_i, for i = 1, 2, ..., k, level k + 1 is assigned
F_1, and so on. If N is a node at a level to which F_i is assigned, and the value of
field F_i in the record at N is x, then the left child of N and all its descendants
must have F_i < x, while the right child of N and its descendants must have
F_i ≥ x.
Example 6.33: Let us store the nine points of Example 6.30 in a k-d-tree.
In general, many different k-d-trees can be used for the same set of records;
which one we get depends on the order in which we insert. The one we get by
inserting the nine points in order, row by row and left-to-right within rows, is
shown in Figure 6.31; we shall see how this tree is obtained shortly. D
Lookup in k-d-Trees
As above, we assume our k-d-tree stores records with fields F_1, ..., F_k, and
the levels are assigned fields in round-robin fashion. Suppose we are asked to
[Figure 6.31: A k-d-tree holding the nine points of Example 6.30.]
find the record (v_1, ..., v_k). The search procedure is applicable to any node N;
initially, we start with N equal to the root.
1. If N holds record (v_1, ..., v_k), we are done.
2. Otherwise, let N be at level j, and let F_i be the field assigned to N's level.
Let x be the value of field F_i in the record at N; we call x the dividing
value for N. If v_i < x, and there is no left child of N, then we have failed
to find the desired record. If v_i < x and N has a left child M, then repeat
the search process at M.
3. If v_i ≥ x and N has no right child, then the search has failed. If v_i ≥ x
and N has a right child M, repeat the search at M.
Thus, the lookup procedure is little different from lookup in an ordinary
binary search tree. The only modification is that at each node along the search
path, we must compare the proper fields of the desired record and the record at
the node; in the binary search tree, it is always the key fields that are compared.
To insert a record r, we perform the lookup procedure, and when we come
to a missing child, we insert the record in the place for that child, knowing that
should we come looking for r we shall be directed from the root to that child
and thus find r. Deletion is performed as described for binary search trees in
general. In fact, we have some additional options regarding the choice of the
record that replaces the deleted record; this matter is left as an exercise.
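The following Python sketch implements the structure as just described; the class and function names are ours. Records are tuples of k field values, levels cycle through the fields, and ties on the dividing value go to the right subtree. The insertion order below is chosen by us so that the resulting tree agrees with Figure 6.31, with (3,6) at the root.

    class KDNode:
        def __init__(self, record):
            self.record = record
            self.left = None
            self.right = None

    def kd_insert(root, record, level=0):
        if root is None:
            return KDNode(record)
        i = level % len(record)                  # field assigned to this level
        if record[i] < root.record[i]:
            root.left = kd_insert(root.left, record, level + 1)
        else:                                    # values >= the dividing value go right
            root.right = kd_insert(root.right, record, level + 1)
        return root

    def kd_lookup(root, record, level=0):
        if root is None:
            return False
        if root.record == record:
            return True
        i = level % len(record)
        child = root.left if record[i] < root.record[i] else root.right
        return kd_lookup(child, record, level + 1)

    points = [(3, 6), (1, 1), (0, 4), (6, 7), (5, 6), (4, 3), (5, 0), (6, 1), (7, 2)]
    root = None
    for p in points:
        root = kd_insert(root, p)
    assert kd_lookup(root, (7, 2)) and not kd_lookup(root, (2, 2))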
Example 6.34: We mentioned that the k-d-tree of Figure 6.31 was constructed
by inserting each of the nine data points in turn. The last to be inserted is (7, 2),
so let us imagine the tree missing that node and see how insertion takes place.
We start at the root, which is level one and, therefore, is assigned the first field
(x) for its branching. We compare the first field of (7,2) with the first field of
the record at the root, which is (3,6). Thus, the dividing value at the root is
3. As 7 > 3, we go to the right child of the root.
At the second level, we deal with field two, i.e., the y components. The
second field of our record, 2, is less than the second field of the record (6, 7)
which we found at the right child of the root. Thus, we move to the left child
of that node, where record (5, 6) is found. At the third level, we again compare
first fields, we find that 7 > 5, and so we move to the right child, where record
(5, 0) lives. We compare second fields, find 2 > 0, and again move to the right
child, which holds record (6, 1). As this node is at level five, we compare first
fields again, and we find 7 > 6, so we move to the right child. However, there
is no right child, so we place the record (7, 2) in that position, making it the
right child of (6,1). D
Partial-Match Retrieval from k-d-Trees
If we are given values for a subset of the fields, we search as outlined above, as
long as we are at a node whose assigned field is one for which we have a value.
If we have no value for the field of our current node, then we must go both
left and right, and the search algorithm is applied to both children of the node.
Thus, the number of paths we follow from the root can double at each level for
which no value is given by the partial-match query.
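A sketch of partial-match retrieval on the tree built in the sketch above (names ours); spec[i] holds the required value of field i, or None if the query does not specify that field.

    def kd_partial_match(root, spec, level=0, found=None):
        if found is None:
            found = []
        if root is None:
            return found
        if all(v is None or v == f for v, f in zip(spec, root.record)):
            found.append(root.record)            # the record satisfies the query
        i = level % len(spec)
        if spec[i] is None:                      # no value for this field: go both ways
            kd_partial_match(root.left, spec, level + 1, found)
            kd_partial_match(root.right, spec, level + 1, found)
        elif spec[i] < root.record[i]:
            kd_partial_match(root.left, spec, level + 1, found)
        else:
            kd_partial_match(root.right, spec, level + 1, found)
        return found

    print(kd_partial_match(root, (None, 1)))     # [(1, 1), (6, 1)], as in Example 6.35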
Example 6.35: Suppose we want to find, in the tree of Figure 6.31, the set of
points with y = 1. Then at even-numbered levels, we can use the y-value 1 to
guide our search to the left or right, while at odd-numbered levels we have no
x-value to use, and so must search both left and right.
We begin at the root, and since x is the field assigned to this level, we must
examine both the left and right subtrees of the root. Of course, we must also
check the point at the root itself, but that point, (3,6) does not have y = 1, so
we do not select it. At the left child of the root we find point (1, 1), which we
select because of its y-value. As this node is assigned field y, we can restrict
the search. We need only move to the right child, because no point with y = 1
can be found in the left subtree, where all y-values must be less than 1 (the
left subtree is empty, here, so by coincidence we haven't saved any work). The
right child has point (0,4), which we do not select. Since x is the assigned field,
we must look at both subtrees, but they are empty trees, and we are done with
the left subtree of the root.
Now, we search the right subtree of the root, starting at (6, 7). As y is the
assigned field at this level, and 7 > 1, we need only search the left subtree of
(6,7). That takes us to (5,6), where because the branch is on x, we are forced
to look at both subtrees. The left subtree has only point (4,3), so we are done
with the left side. For the right side, we examine (5, 0). Since y is the assigned
field, and 1 > 0, we have only to search right.
The next move takes us to (6, 1), which we select because it has y = 1. As
x is the assigned field, we must examine both subtrees. As the left is empty,
and the right contains only (7,2), we are done. The two points with y = 1,
namely (1,1) and (6, 1), have been found.
In the above example, each time we were able to restrict the search to one
subtree, it turned out that the other subtree was empty anyway; thus we did
not save any time. However, had we asked a partial-match query like x = 4, we
would have quickly followed one path:
(3,6), (6,7), (5,6), (4,3)
thereby finding the one point with x = 4. D
Range Queries on k-d-Trees
A similar idea allows us to restrict the search in a k-d-tree when given a range
query. Suppose we are at a node N that is assigned the field F, and the query
specifies a range from a to b for that field (possibly a = −∞ or b = +∞, or both).
Let the value of field F at N be x. If b < x, then any descendant of N with an
F-value in the range would have to be in the left subtree, so we do not have to
search the right subtree of N. Likewise, if a > x, then we need not search the
left subtree. Otherwise, we must search both left and right.
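The corresponding range-search sketch, again on the tree built in the earlier sketch (names ours); ranges[i] is a pair (low, high) for field i, with None standing for minus or plus infinity.

    def kd_range_search(root, ranges, level=0, found=None):
        if found is None:
            found = []
        if root is None:
            return found
        if all((lo is None or lo <= f) and (hi is None or f <= hi)
               for f, (lo, hi) in zip(root.record, ranges)):
            found.append(root.record)
        i = level % len(ranges)
        lo, hi = ranges[i]
        x = root.record[i]                       # the dividing value at this node
        if lo is None or lo <= x:                # part of the range may lie below x
            kd_range_search(root.left, ranges, level + 1, found)
        if hi is None or hi >= x:                # part of the range lies at or above x
            kd_range_search(root.right, ranges, level + 1, found)
        return found

    print(kd_range_search(root, ((2, 4), (2, 4))))   # [(4, 3)], as in Example 6.36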
Example 6.36: Suppose we again have the k-d-tree of Figure 6.31 and we ask
the range query with 2 ≤ x ≤ 4 and 2 ≤ y ≤ 4.16 Begin at the root and note
that the point there is not selected because its y-value is outside the range for
y. As the dividing value, x = 3, is inside the range for x, we must search both
left and right from the root.
Following the left path, we come to (1,1), which is outside the range in
both x and y. The dividing value, y = 1, is below the lower limit for y's range,
so we need only search right. That takes us to (0,4), which is not selected
because x is outside the range.
Now, let us follow the right path from the root. We come to (6, 7), whose
dividing value is y = 7. As this number exceeds the top end of the range for
y, we need search only left. That takes us to (5,6), and the dividing value,
x = 5, again sends us only left, because the top of x's range is less than 5. We
thus come to (4,3), which is the only point selected, and we are done, because
the node of (4, 3) has no children. The entire subtree rooted at (5, 0) is not
searched because we know that any point found there would have to have an x-value of at least 5, which is above the top of the range for x. D
16 We use the same range for x and y only for convenience in remembering the query.
Performance of k-d-Trees for Range Queries
The performance of k-d-trees on range queries is harder to estimate precisely.
The following argument offers a reasonable approximation. Suppose that our
query restricts field F_i to fraction r_i of its total domain. In particular, consider
a node N with assigned field F_i. If the range for F_i is a to b, and the dividing
value at N is x, where a < x < b, then we must search both subtrees of N.
However, on the left, we shall only encounter records with F_i < x. Thus, in the
search of the left subtree, the range for F_i is from a to x, and the set of possible
values for F_i we might encounter in the left subtree of N is that portion of the
set of possible values for F_i that is less than x. A similar statement holds for
the right subtree of N, but the range and set of possible values are restricted to
the portion ≥ x.
As a result, on the average, both the range and the set of possible values
are divided in half when we must search both subtrees, keeping r_i effectively
the same as it was at N. On the other hand, if we are fortunate to need to
search only one subtree, then the range does not change size, but the set of
possible values for F_i that we might encounter is divided by 2 on the average.
Thus, on the average, r_i doubles when we need to search only one subtree.
The consequence is that we cannot expect to search only one of the subtrees
too often. If we start with a range for F_i that is fraction r_i of the total domain
for F_i, and at j nodes that have F_i as the assigned field we need to search
only one subtree, then the effective range for F_i has become fraction 2^j r_i of the
set of possible values. As this fraction cannot exceed 1, we find that, on the
average, we cannot expect to search only one subtree more than j ≤ log(1/r_i)
times due to F_i.
When we consider the possible savings due to each of the k fields, we find
that the number of levels at which we might expect that any path fails to
bifurcate is

    log(1/r_1) + log(1/r_2) + ··· + log(1/r_k)                  (6.3)

The fraction of leaves reached is 1/2 raised to the power given by quantity (6.3),
which simplifies to the fraction r_1 r_2 ··· r_k. For example, if each range in the query is
half the total domain for its field, and there are n records in the file, we shall
have to look at about n/2^k of the records. In general, the fraction of nodes we
look at will be close to the fraction of the entire file that we expect to meet
the conditions of the range query. Thus, like partitioned hashing, k-d-trees are
approximately as efficient as possible for range queries.
Minimizing Block Accesses for k-d-Trees
The k-d-tree was conceived of as a main-memory data structure, and it is not
well tuned to the cost measure that is appropriate for large databases: the
number of block accesses. To minimize block accesses, we must apportion nodes
to blocks in such a way that when we access a block, we are likely to need many
of the records (nodes) found on that block.
Let us suppose that blocks and records are related in size so that a node
and all its descendants for m levels can fit on one block; that is, blocks can hold
2^m − 1 records. Then we can allow every node at levels 1, m + 1, 2m + 1, and
so on, to be the "root" of a block, and use that block for all its descendants for
m levels down the tree.
Example 6.37: The tree of Figure 6.31 is shown partitioned into blocks on
the assumption that m = 2; i.e., blocks can hold three records. That number
of records is too low to be typical, but will illustrate the idea. Notice that the
node (0,4) is in a block by itself, and the block with root (6, 1) is missing one
of its descendants. It is inevitable for all but the most regular trees that gaps
like these will occur. D
Partitioning the nodes into blocks as described above will tend to minimize
the number of block accesses, because whenever a search reaches the root node
N of a block, thereby causing the block to be read into main memory, we shall
be following at least one path of descendants of N, and we shall follow many
368
"(5,6)
(4,3)
,
(5,0)
EXERCISES
* 6.5: Give an algorithm that takes definitions for complex objects, as in Section
2.7, and produces appropriate record formats.
6.6: Give algorithms to allocate and deallocate records using the block formats
of
a) Figure 6.3.
* b) Figure 6.4.
6.7: What advantage is there to using key values as pointers (rather than block
addresses), if a hash table is used as the primary index?
* 6.8: In Section 6.5 we claimed that when deleting from a B-tree, the keys for
the new interior nodes are found either in the blocks being merged or in
their common parent. Show this claim is true by giving an algorithm to
find the needed keys.
* 6.9: In Section 6.4 we defined "covering" of a key value by an entry in an isam
index; that definition is appropriate if the key is a true key, guaranteed
to determine a unique record in the main file. Modify the definition of
"covers" so we can obtain the first (of perhaps many) records of the main
file with a given "key" value, in the case that "keys" do not determine
unique records.
* 6.10: Modify the B-tree lookup procedure for the case (as in System R), where
"keys" can determine more than one record, and we need to find all records
with a given "key" value.
* 6.11: Modify the B-tree lookup, insertion, and deletion algorithms if we do not
insist that a key value at an interior node be the exact minimum of the
keys of the descendants for one of the children of that node (just that it be
a lower bound on the keys of the descendants of that child).
* 6.12: What happens to the distribution of nodes in a B-tree if keys only increase?
For example, consider a file of employees where ID numbers are never
reused as employees leave the company, and a new number, higher than
any used before, is assigned to each new employee.
6.13: Suppose we keep a file of information about states. Each state has a
variable-length record with a field for the state name and a repeating group
for the counties of the state. Each county group has fields for the name
and population, a repeating group for township names, and a repeating
group for city names. Give the nested structure for state records.
* 6.14: Suppose we have a nested structure of format A(B)*. An A record takes
20 bytes and a B record 30 bytes. A pointer requires 4 bytes. Each A
has associated with it from 2 to 8 B's with probabilities .05, .1, .2, .3,
.2, .1, and .05, respectively. If blocks are 100 bytes long, compare the
average number of blocks per instance of the structure used if we adopt
the following organizations.
a) Store records in preorder, as in Figure 6.17.
b) Allocate space for eight B records regardless of how many there are.
c) Represent the repeating group of B's by a pointer, as in Figure 6.18.
d) Allocate room for p B records along with each A record, and include
a pointer to additional B records; the pointer is null if there are p or
fewer B records in the repeating group.
In (d), what is the optimal value of p?
6.15: Express the following hierarchies as nested structures.
a) The tree of Figure 5.27.
b) The tree of Figure 5.28.
6.16: Show how to express the structures of complex objects, as in Section 2.7,
as nested structures.
6.17: Explain the differences between the terms (i) primary index, (ii) secondary
index, (iii) dense index, and (iv) sparse index.
6.18: Give an algorithm to maintain sorted order within a bucket by linking
records, as in Figure 6.11.
* 6.19: In Section 6.9 we discussed the formatting of records in a DBTG database,
and we claimed that it was not always possible to keep the pointer fields
associated with a link at the same offset in both member and owner types of
a given link, without wasting space. Show that this claim is true by giving
an example of a network in which at least one record type has unnecessary,
unused space, or at least one link has its pointers in different positions in
owner and member records. Hint: Assume that all nonpointer fields are
too large to fit in space unoccupied by a pointer.
* 6.20: Continuing with the problem of Exercise 6.19, suppose that we reserve k
fields of all record formats to hold link pointers, that no record type is
involved in more than m links, and that in the entire network of n record
types, we are willing to tolerate up to p links that do not have the same
position in owner and member types. For given m, n, and p, what is the
smallest k such that we can find record formats for any network meeting
the above conditions?
6.21: Suppose blocks hold 1000 bytes of data, in addition to a few pointers, and
there are records of three types, A, B, and C, of length 300, 200, and
400 bytes, respectively. Let C be a child of B, and B a child of A in the
hierarchy. Suppose that the key for record a_i of type A is taken to be i,
and that database records are to be distributed into three buckets, based
on the key value of their root records. The three buckets take keys (i)
below 10, (ii) 10-20, and (iii) above 20, respectively. Show the structure
of blocks if the database records of Figure 6.33 are inserted, in the order
shown, assuming the "two-dimensional" organization of Figure 6.23.
[Figure 6.33: three database records, with roots a10, a20, and a30, and their b- and c-type descendants.]
6.22: Show (a) preorder threads, and (b) leftmost-child/right-sibling pointers on
the database records of Exercise 6.21.
6.23: Show the effect of deleting c7 and inserting c13 as the rightmost child of b2
a) Assuming records are unpinned and can slide within a bucket.
b) Assuming records are pinned and preorder threads are maintained.
* 6.24: Give algorithms to update (a) preorder threads and (b) leftmost-child/right-sibling pointers when a record is inserted or deleted.
* 6.25: Show that in any n-node tree, the number of nonnull leftmost-child pointers
plus the number of nonnull right-sibling pointers is exactly n - 1. What
does this relationship say about the space efficiency of leftmost-child/right-sibling pointers?
6.26: Suppose we store a file of rectangles, as discussed in Example 6.29. Let the
particular rectangles in the file be:
(1,2,10,5)
(3,4,7,6)
(2,4,9,6)
(5,1,7,8)
(1,6,3,8)
(4,2,7,10)
(5,6,9,10)
(8,2,10,7)
(9,3,10,9)
6.27: Write an SQL query saying that two rectangles, represented as in Example
6.29, intersect. Is this a partial-match query? A range query? Would the
structures of Sections 6.13 and 6.14 be of use answering the query?
6.28: If all partial-match queries specify values for at least m out of the k fields,
how many indices are needed so that there will be at least one index to
answer each query, using the method outlined in Section 6.12?
6.29: Suppose we ask a range query that specifies a range for the key equal to
l/10th of the total domain for the key in the isam-indexed file of Example
6.11. On the average, how many block accesses does it take to answer the
query?
* 6.30: More generally than was discussed in Section 6.13, we can use a partitioned hash function whose individual h_i's have ranges that are not powers of 2. If the range of h_i is 0 to n_i - 1, then the number of buckets is n_1 n_2 · · · n_k. Prove that for any distribution of partial-match queries with the property that when a value for field F is specified, any possible value for F is equally likely, we can minimize the average number of buckets examined if we use a partitioned hash function that stores record (v_1, . . . , v_k) in the bucket numbered

    h_k(v_k) + n_k(h_{k-1}(v_{k-1}) + n_{k-1}(h_{k-2}(v_{k-2}) + · · · + n_2 h_1(v_1) · · ·))        (6.4)

for some values n_1, . . . , n_k. Note that n_1 is not explicitly involved in the formula.
6.31: Suppose we use the scheme described in Exercise 6.30 to store the rectangle data of Exercise 6.26 in nine buckets, with n_1 = n_2 = 3 and n_3 = n_4 = 1; i.e., only the coordinates of the lower-left corner determine the bucket. Show the distribution into buckets of the data in Exercise 6.26.
* 6.32: Consider the population of partial-match queries that each specify a value for exactly one of the k fields, and the probability that field F_i is specified is p_i, where p_1 + · · · + p_k = 1. Show that the optimum value of n_i to choose in the bucket address formula (6.4) is c p_i, where c is a constant that is the same for all i and depends only on the desired number of buckets. As a function of B, the number of buckets, what is c?
* 6.33: Consider the population of partial-match queries in which the probability of any field having a specified value is independent of what other fields are specified. Let the probability that F_i has a specified value be q_i, for i = 1, 2, . . . , k. Show that the optimum value of n_i in (6.4) for this class of queries is d q_i/(1 - q_i), for some d that is independent of i. Give a method of calculating d as a function of the q_i's and the desired number of buckets.
BIBLIOGRAPHIC NOTES
General information about data structures can be found in Knuth [1968, 1973]
and Aho, Hopcroft, and Ullman [1974, 1983]. Wiederhold [1987] covers file
structures for database systems. The selection of physical database schemes is
discussed by Gotlieb and Tompa [1973].
Hashing
Two surveys of techniques for hashing are Morris [1968] and Maurer and Lewis
[1975]; Knuth [1973] also treats the subject extensively.
Some recent developments involve variations of hashing that adapt to
changing conditions, especially growth in file size. Larson [1978], Fagin, Nievergelt, Pippenger, and Strong [1979], Litwin [1980], and Larson [1982] describe
these structures.
B-Trees
The B-tree is from Bayer and McCreight [1972], where it was presented as a
dense index, as in Section 6.6. Comer [1978] surveys the area.
The performance of B-trees as a data structure for database systems is dis
cussed by Held and Stonebraker [1978], Snyder [1978], Gudes and Tsur [1980],
and Rosenberg and Synder [1981]. Also see the references to Chapter 9 for
articles on concurrent access to B-trees.
Secondary Indices
Optimal selection of secondary indices is discussed by Lum and Ling [1970] and
Schkolnick [1975]. Comer [1978] shows the problem to be NP-complete.
Partial-Match and Range Queries
The use of partitioned hash functions was considered in its generality by Rivest
[1976], and the design of such functions was also investigated by Burkhard
[1976] and Bolour [1979].
The k-d-tree is from Bentley [1975]. Finkel and Bentley [1974], Bentley
and Stanat [1975], Lueker [1978], Willard [1978a, b], Culik, Ottmann, and
Wood [1981], Robinson [1981], Scheuermann and Ouksel [1982], Willard and
Lueker [1985], and Robinson [1986] consider related structures for range queries.
Bentley and Friedman [1979] and Samet [1984] survey the area.
There is a well-developed theory of how fast range queries can be answered.
See Burkhard, Fredman, and Kleitman [1981] and Fredman [1981].
Notes on Exercises
Exercise 6.30 is from Bolour [1979]. Exercise 6.32 is by Rothnie and Lozano
[1974] and 6.33 from Aho and Ullman [1979].
CHAPTER 7
Design Theory
for
Relational Databases
Our study of database scheme design in Chapter 2 drew heavily on our intuition
regarding what was going on in the "real world," and how that world could
best be reflected by the database scheme. In most models, there is little more
to design than that; we must understand the options and their implications
regarding efficiency of implementation, as was discussed in Chapter 6, then rely
on skill and experience to create a good design.
In the relational model, it is possible to be somewhat more mechanical
in producing our design. We can manipulate our relation schemes (sets of at
tributes heading the columns of the relation) according to a well-developed
theory, to produce a database scheme (collection of relation schemes) with cer
tain desirable properties. In this chapter, we shall study some of the desirable
properties of relation schemes and consider several algorithms for obtaining a
database scheme with these properties.
Central to the design of database schemes is the idea of a data dependency,
that is, a constraint on the possible relations that can be the current instance of
a relation scheme. For example, if one attribute uniquely determines another, as
SNAME apparently determines SADDR in relation SUPPLIERS of Figure 2.8,
we say there is a "functional dependency" of SADDR on SNAME, or "SNAME
functionally determines SADDR."
In Section 7.2 we introduce functional dependencies formally, and in the
following section we learn how to "reason" about functional dependencies, that
is, to infer new dependencies from given ones. This ability to tell whether a
functional dependency does or does not hold in a scheme with a given collection
of dependencies is central to the scheme-design process. In Section 7.4 we con
sider lossless-join decompositions, which are scheme designs that preserve all
the information of a given scheme. The following section considers the preser
vation of given dependencies in a scheme design, which is another desirable
property that, intuitively, says that integrity constraints found in the original
design are also found in the new design.
Sections 7.6-7.8 study "normal forms," the properties of relation schemes that say there is no, or almost no, redundancy in the relation. We relate two of these forms, Boyce-Codd normal form and third normal form, to the desirable properties of database schemes as a whole (lossless join and dependency preservation) that were introduced in the previous sections.
Section 7.9 introduces multivalued dependencies, a more complex form of
dependency that, like functional dependencies, occurs frequently in practice.
The process of reasoning about multivalued and functional dependencies to
gether is discussed in Section 7.9, and Section 7.10 shows how fourth normal
form eliminates the redundancy due to multivalued dependencies that is left
by the earlier normal forms. We close the chapter with a discussion of more
complex forms of dependencies that, while not bearing directly on the database
design problem as described here, serve to unify the theory and to relate the
subject of dependencies to logical rules and datalog.
1. Redundancy. The address of the supplier is repeated once for each item supplied.
2. Potential inconsistency (update anomalies). As a consequence of the redundancy, we could update the address for a supplier in one tuple, while leaving it fixed in another. Thus, we would not have a unique address for each supplier as we feel intuitively we should.
3. Insertion anomalies. We cannot record an address for a supplier if that supplier does not currently supply at least one item. We might put null values in the ITEM and PRICE components of a tuple for that supplier, but then, when we enter an item for that supplier, will we remember to delete the tuple with the nulls? Worse, ITEM and SNAME together form a key for the relation, and it might be impossible to look up tuples through a primary index, if there were null values in the key field ITEM.
4. Deletion anomalies. The converse of problem (3): if we delete all of the items supplied by a particular supplier, we unintentionally lose track of that supplier's address.
deduce that the ??? stands for "16 Raver St." Thus, the functional depen
dency makes all but the first SADDR field for a given supplier redundant; we
know what it is without seeing it. Conversely, suppose we did not believe the
functional dependency of SADDR on SNAME holds. Then there would be no
reason to believe that the ??? had any particular value, and that field would
not be redundant.
When we have more general kinds of dependencies than functional depen
dencies, the form redundancy takes is less clear. However, in all cases, it appears
that the cause and cure of the redundancy go hand-in-hand. That is, the depen
dency, such as that of SADDR on SNAME, not only causes the redundancy, but
it permits the decomposition of the SUP_INFO relation into the SUPPLIERS
and SUPPLIES relations in such a way that the original SUP_INFO relation
can be recovered from the SUPPLIERS and SUPPLIES relations. We shall
discuss these concepts more fully in Section 7.4.
7.2 FUNCTIONAL DEPENDENCIES
In Section 2.3 we saw that relations could be used to model the "real world"
in several ways; for example, each tuple of a relation could represent an entity
and its attributes or it could represent a relationship between entities. In many
cases, the known facts about the real world imply that not every finite set of
tuples could be the current value of some relation, even if the tuples were of
the right arity and had components chosen from the right domains. We can
distinguish two kinds of restrictions on relations:
1. Restrictions that depend on the semantics of domain elements. These
restrictions depend on understanding what components of tuples mean.
For example, no one is 60 feet tall, and no one with an employment history
going back 37 years has age 27. It is useful to have a DBMS check for such
implausible values, which probably arose because of an error when entering
or computing data. The next chapter covers the expression and use of this
sort of "integrity constraint." Unfortunately, they tell us little or nothing
about the design of database schemes.
2. Restrictions on relations that depend only on the equality or inequality of
values. There are other constraints that do not depend on what value a
tuple has in any given component, but only on whether two tuples agree
in certain components. We shall discuss the most important of these con
straints, called functional dependencies, in this section, but there are other
types of value-oblivious constraints that will be touched on in later sec
tions. It is value-oblivious constraints that turn out to have the greatest
impact on the design of database schemes.
Let R(A_1, . . . , A_n) be a relation scheme, and let X and Y be subsets of {A_1, . . . , A_n}. We say X → Y, read "X functionally determines Y" or "Y functionally depends on X," if, whatever relation r is the current value for R, it is not possible that r has two tuples that agree in the components for all attributes in the set X yet disagree in one or more components for attributes in the set Y. Thus, the functional dependency of supplier address on supplier name, discussed in Section 7.1, would be expressed

    {SNAME} → {SADDR}
Notational Conventions
To remind the reader of the significance of the symbols we use, we adopt the
following conventions:
1. Capital letters near the beginning of the alphabet stand for single at
tributes.
2. Capital letters near the end of the alphabet, U, V, . . . , Z, generally stand
for sets of attributes, possibly singleton sets.
3. R is used to denote a relation scheme. We also name relations by their
schemes; e.g., a relation with attributes A, B, and C may be called ABC.1
4. We use r for a relation, the current instance of scheme R. Note this con
vention disagrees with the Prolog convention used in Chapter 3, where R
was used for the instance of a relation and r for a predicate, i.e., the name
of the relation.
5. Concatenation is used for union. Thus, A_1 · · · A_n is used to represent the set of attributes {A_1, . . . , A_n}, and XY is shorthand for X ∪ Y. Also, XA or AX, where X is a set of attributes and A a single attribute, stands for X ∪ {A}.
Significance of Functional Dependencies
Functional dependencies arise naturally in many ways. For example, if R represents an entity set whose attributes are A_1, . . . , A_n, and X is a set of attributes that forms a key for the entity set, then we may assert X → Y for any subset
Y of the attributes, even a set Y that has attributes in common with X. The
reason is that the tuples of each possible relation r represent entities, and en
tities are identified by the value of attributes in the key. Therefore, two tuples
that agree on the attributes in X must represent the same entity and thus be
the same tuple.
Similarly, if relation R represents a many-one relationship from entity set E_1 to entity set E_2, and among the A_i's are attributes that form a key X for E_1 and a key Y for E_2, then X → Y would hold, and in fact, X functionally
1 Unfortunately, there are cases where the natural symbol for a single attribute, e.g., Z
for "zip code" or R for "room" conflicts with these conventions, and the reader will be
reminded when we use a symbol in a nonstandard way.
Satisfaction of Dependencies
We say a relation r satisfies the functional dependency X → Y if for every two tuples μ and ν in r such that μ[X] = ν[X], it is also true that μ[Y] = ν[Y]. Note that like every "if then" statement, it can be satisfied either by μ[X] differing from ν[X] or by μ[Y] agreeing with ν[Y]. If r does not satisfy X → Y, then r violates that dependency.
If r is an instance of scheme R, and we have declared that X → Y holds for R, then we expect that r will satisfy X → Y. However, if X → Y does not hold for R in general, then r may coincidentally satisfy X → Y, or it might violate X → Y.
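This definition translates directly into a mechanical test. The sketch below is not from the text (the function and the tiny sample relation are invented for illustration); it represents a relation as a list of Python dictionaries and checks satisfaction of X → Y.

    # A sketch, not the book's code: a relation is a list of dicts
    # mapping attribute names to values.
    def satisfies_fd(relation, X, Y):
        """Return True if the relation satisfies the functional dependency X -> Y."""
        seen = {}                                  # X-value -> first Y-value observed
        for t in relation:
            x_val = tuple(t[a] for a in sorted(X))
            y_val = tuple(t[a] for a in sorted(Y))
            if x_val in seen and seen[x_val] != y_val:
                return False                       # two tuples agree on X, disagree on Y
            seen.setdefault(x_val, y_val)
        return True

    # A supplier listed with two addresses violates SNAME -> SADDR.
    r = [{"SNAME": "Acme", "SADDR": "10 Oak St.", "ITEM": "nut"},
         {"SNAME": "Acme", "SADDR": "20 Elm St.", "ITEM": "bolt"}]
    print(satisfies_fd(r, {"SNAME"}, {"SADDR"}))   # False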
F |= X
Keys
When talking about entity sets we assumed that there was a key, a set of
attributes that uniquely determined an entity. There is an analogous concept for
relations with functional dependencies. If R is a relation scheme with attributes A_1 A_2 · · · A_n and functional dependencies F, and X is a subset of A_1 A_2 · · · A_n, we say X is a key of R if:
1. X → A_1 A_2 · · · A_n is in F+. That is, the dependency of all attributes on the set of attributes X is given or follows logically from what is given, and
2. For no proper subset Y ⊂ X is Y → A_1 A_2 · · · A_n in F+.
We should observe that minimality, condition (2) above, was not present
when we talked of keys for entity sets in Section 2.2 or keys for files in Chapter
6. The reason is that without a formalism like functional dependencies, we can
not verify that a given set of attributes is minimal. The reader should be aware
that in this chapter the term "key" does imply minimality. Thus, the given key
for an entity set will only be a key for the relation representing that entity set
if the given key was minimal. Otherwise, one or more subsets of the key for the
entity set will serve as a key for the relation.
As there may be more than one key for a relation, we sometimes designate
one as the "primary key." The primary key might serve as the file key when the
relation is implemented, for example. However, any key could be the primary
key if we desired. The term candidate key is sometimes used to denote any
minimal set of attributes that functionally determine all attributes, with the
term "key" reserved for one designated ( "primary" ) candidate key. We also use
the term superkey for any superset of a key. Remember that a key is a special
case of a superkey.
Example 7.3: For relation R and set of dependencies F of Example 7.2 there is only one key, A, since A → ABC is in F+, but for no set of attributes X that does not contain A is X → ABC true.
A more interesting example is the relation scheme R(CITY, ST, ZIP),
where ST stands for street address and ZIP for zip code. We expect tuple
(c, s, z) in a relation for R only if city c has a building with street address s, and z is the zip code for that address in that city. It is assumed that the nontrivial functional dependencies are:

    CITY ST → ZIP
    ZIP → CITY
That is, the address (city and street) determines the zip code, and the zip code
determines the city, although not the street address. One can easily check that
{CITY, ST} and {ST, ZIP} are both keys. D
Armstrong's Axioms

From a given set of dependencies F, new dependencies can be inferred using the following three inference rules, known as Armstrong's axioms:
A1: (reflexivity) If Y ⊆ X, then X → Y.
A2: (augmentation) If X → Y holds, then XZ → YZ holds for any Z.
A3: (transitivity) If X → Y and Y → Z hold, then X → Z holds.
Example 7.4: Consider the relation scheme ABCD with functional dependencies A → C and B → D. We claim AB is a key for ABCD (in fact, it is the only key). We can show AB is a superkey by the following steps:
1. A → C (given)
2. AB → ABC [augmentation of (1) by AB]
3. B → D (given)
4. ABC → ABCD [augmentation of (3) by ABC]
5. AB → ABCD [transitivity applied to (2) and (4)]
To show AB is a key, we must also show that neither A nor B by them
selves functionally determine all the attributes. We could show that A is not a
superkey by exhibiting a relation that satisfies the given dependencies (1) and
(3) above, yet does not satisfy A ABCD, and we could proceed similarly
for B. However, we shall shortly develop an algorithm that makes this test
mechanical, so we omit this step here. D
Soundness of Armstrong's Axioms
It is relatively easy to prove that Armstrong's axioms are sound; that is, they
lead only to true conclusions. It is rather more difficult to prove completeness,
that they can be used to make every valid inference about dependencies. We
shall tackle the soundness issue first.
Lemma 7.1: Armstrong's axioms are sound. That is, if X Y is deduced
from F using the axioms, then X Y is true in any relation in which the
dependencies of F are true.
Proof: A1, the reflexivity axiom, is clearly sound. We cannot have a relation r with two tuples that agree on X yet disagree on some subset of X. To prove A2, augmentation, suppose we have a relation r that satisfies X → Y, yet there are two tuples μ and ν that agree on the attributes of XZ but disagree on YZ. Since they cannot disagree on any attribute of Z, μ and ν must disagree on some attribute in Y. But then μ and ν agree on X but disagree on Y, violating our assumption that X → Y holds for r. The soundness of A3, the transitivity axiom, is a simple extension of the argument given previously that A → B and B → C imply A → C. We leave this part of the proof as an exercise. □
Additional Inference Rules
There are several other inference rules that follow from Armstrong's axioms.
We state three of them in the next lemma. Since we have proved the soundness
of Al, A2, and A3, we are entitled to use them in the proof that follows.
Lemma 7.2:
a)
b)
    Attributes of X+        Other attributes
    1  1  · · ·  1  1       1  1  · · ·  1  1
    1  1  · · ·  1  1       0  0  · · ·  0  0
Computing Closures
It turns out that computing F+ for a set of dependencies F is a time-consuming
task in general, simply because the set of dependencies in F+ can be large even
if F itself is small. Consider the set

    F = {A → B_1, A → B_2, . . . , A → B_n}

Then F+ includes all of the dependencies A → Y, where Y is a subset of {B_1, B_2, . . . , B_n}. As there are 2^n such sets Y, we could not expect to list F+ conveniently, even for reasonably sized n.
At the other extreme, computing X+, for a set of attributes X, is not
hard; it takes time proportional to the length of all the dependencies in F,
written out. By Lemma 7.3 and the fact that Armstrong's axioms are sound
and complete, we can tell whether X Y is in F+ by computing X+ with
respect to F. A simple way to compute X+ is the following.
Algorithm 7.1: Computation of the Closure of a Set of Attributes with Re
spect to a Set of Functional Dependencies.
INPUT: A finite set of attributes U, a set of functional dependencies F on U,
and a set X C U.
OUTPUT: X+, the closure of X with respect to F.
METHOD: We compute a sequence of sets of attributes X^(0), X^(1), . . . by the rules:
1. X^(0) is X.
2. X^(i+1) is X^(i) together with the set of attributes A such that there is some dependency Y → Z in F, A is in Z, and Y ⊆ X^(i).
Since X = X^(0) ⊆ · · · ⊆ X^(i) ⊆ · · · ⊆ U, and U is finite, we must eventually reach i such that X^(i) = X^(i+1). Since each X^(j+1) is computed only in terms of X^(j), it follows that X^(i) = X^(i+1) = X^(i+2) = · · ·. There is no need to compute beyond X^(i) once we discover X^(i) = X^(i+1). We can (and shall) prove that X+ is X^(i) for this value of i. □
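Algorithm 7.1 is only a few lines when coded. The following Python sketch is ours, not the book's; F is represented as a list of (left side, right side) pairs of attribute sets.

    def closure(X, F):
        """Algorithm 7.1: the closure X+ of attribute set X under the dependencies F.
        F is a list of pairs (Y, Z) of sets, each pair standing for Y -> Z."""
        result = set(X)
        changed = True
        while changed:                       # stop when X(i) = X(i+1)
            changed = False
            for left, right in F:
                if left <= result and not right <= result:
                    result |= right          # adjoin the attributes of the right side
                    changed = True
        return result

    # The dependencies of Example 7.5:
    F = [({"A","B"}, {"C"}), ({"D"}, {"E","G"}), ({"C"}, {"A"}), ({"B","E"}, {"C"}),
         ({"B","C"}, {"D"}), ({"C","G"}, {"B","D"}), ({"A","C","D"}, {"B"}),
         ({"C","E"}, {"A","G"})]
    print(sorted(closure({"B","D"}, F)))     # ['A', 'B', 'C', 'D', 'E', 'G']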
Example 7.5: Let F consist of the following eight dependencies:
    AB → C       D → EG       C → A        BE → C
    BC → D       CG → BD      ACD → B      CE → AG

and let X = BD. To apply Algorithm 7.1, we let X^(0) = BD. To compute X^(1) we look for dependencies that have a left side B, D, or BD. There is only one, D → EG, so we adjoin E and G to X^(0) and make X^(1) = BDEG. For X^(2), we look for left sides contained in X^(1) and find D → EG and BE → C. Thus, X^(2) = BCDEG. Then, for X^(3) we look for left sides contained in BCDEG
Minimal Covers
We can find, for a given set of dependencies, an equivalent set with a number
of useful properties. A simple and important property is that the right sides of
dependencies be split into single attributes.
Lemma 7.4: Every set of functional dependencies F is equivalent to a set of
dependencies G in which no right side has more than one attribute.
Proof: Let G be the set of dependencies X → A such that for some X → Y in F, A is in Y. Then X → A follows from X → Y by the decomposition rule. Thus G ⊆ F+. But F ⊆ G+, since if Y = A_1 · · · A_n, then X → Y follows from X → A_1, . . . , X → A_n using the union rule. Thus, F and G are equivalent. □
It turns out to be useful, when we develop a design theory for database
schemes, to consider a stronger restriction on covers than that the right sides
have but one attribute. We say a set of dependencies F is minimal if:
1. Every right side of a dependency in F is a single attribute.
2. For no X → A in F is the set F − {X → A} equivalent to F.
3. For no X → A in F and proper subset Z of X is (F − {X → A}) ∪ {Z → A} equivalent to F.
Proof: By Lemma 7.4, assume no right side in F has more than one attribute.
We repeatedly search for violations of conditions (2) [redundant dependencies]
and (3) [redundant attributes in left sides], and modify the set of dependencies
accordingly. As each modification either deletes a dependency or deletes an at
tribute in a dependency, the process cannot continue forever, and we eventually
reach a set of dependencies with no violations of (1), (2), or (3).
For condition (2), we consider each dependency X → Y in the current set of dependencies F, and if F − {X → Y} is equivalent to F, then delete X → Y from F. Note that considering dependencies in different orders may result in
the elimination of different sets of dependencies. For example, given the set F:
    A → B        A → C        B → A        C → A        B → C

we can eliminate both B → A and A → C, or we can eliminate B → C, but we cannot eliminate all three.
For condition (3), we consider each dependency A_1 · · · A_k → B in the current set F, and each attribute A_i in its left side, in some order. If the set of dependencies obtained by replacing A_1 · · · A_k → B with A_1 · · · A_{i-1} A_{i+1} · · · A_k → B is equivalent to F, then delete A_i from the left side of A_1 · · · A_k → B. Again, the order in which attributes are eliminated may affect the result. For example, given

    AB → C
    A → B
    B → A

we can eliminate either A or B from AB → C, but we cannot eliminate them both.
We leave as an exercise the proof that it is sufficient first to eliminate all
violations of (3), then all violations of (2), but not vice versa. D
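The construction in this proof is mechanical; here is one way it might be coded (a sketch, reusing the closure function given after Algorithm 7.1, and doing step (3) before step (2), as the last remark requires).

    def minimal_cover(F):
        """Sketch: split right sides, drop extraneous left-side attributes,
        then drop redundant dependencies.  F is a list of (left, right) set pairs."""
        # (1) Split right sides into single attributes.
        G = [(set(l), {a}) for l, r in F for a in r]

        # (3) Remove redundant attributes from left sides first.
        for i in range(len(G)):
            left, right = G[i]
            for A in sorted(left):
                if len(left) > 1 and right <= closure(left - {A}, G):
                    left = left - {A}
                    G[i] = (left, right)

        # (2) Then remove dependencies implied by the remaining ones.
        result = []
        for i, (left, right) in enumerate(G):
            rest = result + G[i + 1:]
            if not right <= closure(left, rest):
                result.append((left, right))
        return result

Applied to the F of Example 7.5, different attribute and dependency orders can yield different minimal covers, exactly as the proof warns.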
Example 7.6: Let us consider the dependency set F of Example 7.5. If we
use the algorithm of Lemma 7.4 to split right sides we are left with:
    AB → C       D → E        CG → B
    C → A        D → G        CG → D
    BC → D       BE → C       ACD → B
    CE → A       CE → G

Clearly CE → A is redundant, since it is implied by C → A. CG → B is redundant, since CG → D, C → A, and ACD → B imply CG → B. Then no more dependencies are redundant. However, ACD → B can be replaced by CD → B, since C → A is given, and therefore CD → B can be deduced from ACD → B and C → A. Now, no further reduction by (2) or (3) is possible. Thus, one minimal cover for F is that shown in Figure 7.2(a).
Another minimal cover, constructed from F by eliminating CE → A,
    AB → C       C → A        BC → D       CD → B
    D → E        D → G        BE → C       CG → D
    CE → G
                        (a)

    AB → C       C → A        BC → D       D → E
    D → G        BE → C       CG → B       CE → G
                        (b)

Figure 7.2 Two minimal covers.
We claim that the only way to recover r is by taking the natural join of r_SA and r_SIP. The reason is that, as we shall prove in the next lemma, if we let s = r_SA ⋈ r_SIP, then π_SA(s) = r_SA, and π_SIP(s) = r_SIP. If s ≠ r, then given r_SA and r_SIP there is no way to tell whether r or s was the original relation for scheme SAIP. That is, if the natural join doesn't recover the original relation, then there is no way whatsoever to recover it uniquely.
Lossless Joins
If R is a relation scheme decomposed into schemes R_1, R_2, . . . , R_k, and D is a set of dependencies, we say the decomposition has a lossless join (with respect to D), or is a lossless-join decomposition (with respect to D), if for every relation r for R satisfying D:

    r = π_{R_1}(r) ⋈ π_{R_2}(r) ⋈ · · · ⋈ π_{R_k}(r)

that is, every relation r is the natural join of its projections onto the R_i's. As we saw, the lossless-join property is necessary if the decomposed relation is to be recoverable from its decomposition.
Some basic facts about project-join mappings follow in Lemma 7.5. First we introduce some notation. If ρ = (R_1, R_2, . . . , R_k) is a decomposition, then m_ρ is the mapping defined by m_ρ(r) = ⋈_{i=1}^{k} π_{R_i}(r). That is, m_ρ(r) is the join of the projections of r onto the relation schemes in ρ. Thus, the lossless join condition with respect to a set of dependencies D can be expressed as: for all r satisfying D, we have r = m_ρ(r).
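For small examples, m_ρ can be computed directly. The sketch below (ours, not the book's) represents tuples as Python dicts, projects r onto each scheme, and joins the projections by brute force, so the condition r = m_ρ(r) can be checked.

    from itertools import product

    def project(r, attrs):
        """pi_attrs(r): project a list of dict-tuples onto the attributes attrs."""
        return {tuple(sorted((a, t[a]) for a in attrs)) for t in r}

    def m_rho(r, schemes):
        """The project-join mapping: natural join of the projections of r."""
        all_attrs = set().union(*schemes)
        projections = [project(r, Ri) for Ri in schemes]
        result = set()
        for combo in product(*projections):          # one projected tuple per scheme
            merged, ok = {}, True
            for proj in combo:
                for a, v in proj:
                    if a in merged and merged[a] != v:
                        ok = False                    # disagreement on a shared attribute
                    merged[a] = v
            if ok and set(merged) == all_attrs:
                result.add(tuple(sorted(merged.items())))
        return result

    # A relation for SAIP that does not satisfy S -> A; its projections onto
    # SA and SIP join to a strict superset of r, so information is lost.
    r = [{"S": 1, "A": 1, "I": 1, "P": 1}, {"S": 1, "A": 2, "I": 2, "P": 2}]
    original = {tuple(sorted(t.items())) for t in r}
    print(original == m_rho(r, [{"S", "A"}, {"S", "I", "P"}]))   # False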
Lemma 7.5: Let R be a relation scheme, ρ = (R_1, . . . , R_k) be any decomposition of R, and r be any relation for R. Define r_i = π_{R_i}(r). Then
a) r ⊆ m_ρ(r).
b) If s = m_ρ(r), then π_{R_i}(s) = r_i.
c) m_ρ(m_ρ(r)) = m_ρ(r).
Proof:
a) Let μ be in r, and for each i, let μ_i = μ[R_i].³ Then μ_i is in r_i for all i. By definition of the natural join, μ is in m_ρ(r), since μ agrees with μ_i on the attributes of R_i for all i.
b) As r ⊆ s by (a), it follows that π_{R_i}(r) ⊆ π_{R_i}(s). That is, r_i ⊆ π_{R_i}(s). To show π_{R_i}(s) ⊆ r_i, suppose for some particular i that μ_i is in π_{R_i}(s). Then there is some tuple μ in s such that μ[R_i] = μ_i. As μ is in s, there is some ν_j in r_j for each j such that μ[R_j] = ν_j. Thus, in particular, μ[R_i] is in r_i. But μ[R_i] = μ_i, so μ_i is in r_i, and therefore π_{R_i}(s) ⊆ r_i. We conclude that
³ Recall that ν[X] refers to the tuple ν projected onto the set of attributes X.
c)
then π_{R_i}(s) is not necessarily equal to r_i. The reason is that r_j may contain "dangling" tuples that do not match with anything when we take the join. For example, if R_1 = AB, R_2 = BC, r_1 = {a_1 b_1}, and r_2 = {b_1 c_1, b_2 c_2}, then s = {a_1 b_1 c_1} and π_BC(s) = {b_1 c_1} ≠ r_2. However, in general, π_{R_i}(s) ⊆ r_i, and if the r_i's are each the projection of some one relation r, then π_{R_i}(s) = r_i.
The ability to store "dangling" tuples is an advantage of decomposition.
As we mentioned previously, this advantage must be balanced against the need
to compute more joins when we answer queries, if relation schemes are decom
posed, than if they are not. When all things are considered, it is generally
believed that decomposition is desirable when necessary to cure the problems,
such as redundancy, described in Section 7.1, but not otherwise.
Testing Lossless Joins
It turns out to be fairly easy to tell whether a decomposition has a lossless join
with respect to a set of functional dependencies.
Algorithm 7.2: Testing for a Lossless Join.
INPUT: A relation scheme R = A_1 · · · A_n, a set of functional dependencies F, and a decomposition ρ = (R_1, . . . , R_k).
OUTPUT: A decision whether ρ is a decomposition with a lossless join.
METHOD: We construct a table with n columns and k rows; column j corresponds to attribute A_j, and row i corresponds to relation scheme R_i. In row i and column j put the symbol a_j if A_j is in R_i. If not, put the symbol b_ij there.
Repeatedly "consider" each of the dependencies X → Y in F, until no more changes can be made to the table. Each time we "consider" X → Y, we look for rows that agree in all of the columns for the attributes of X. If we find two such rows, equate the symbols of those rows for the attributes of Y. When we equate two symbols, if one of them is a_j, make the other be a_j. If they are b_ij and b_lj, make them both b_ij or both b_lj, as you wish. It is important to understand that when two symbols are equated, all occurrences of those symbols in the table become the same; it is not sufficient to equate only the occurrences involved in the violation of the dependency X → Y.
If after modifying the rows of the table as above, we discover that some row has become a_1 · · · a_n, then the join is lossless. If not, the join is lossy (not lossless). □
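One possible rendering of Algorithm 7.2 in Python follows; it is a sketch (the data layout and function name are ours), with the a_j and b_ij symbols represented as small tuples.

    def lossless_join(attrs, schemes, F):
        """Algorithm 7.2: does the decomposition `schemes` of the scheme with
        attributes `attrs` have a lossless join with respect to the FDs F?"""
        # Row i, column A holds ('a', A) if A is in R_i, else ('b', i, A).
        table = [{A: ('a', A) if A in Ri else ('b', i, A) for A in attrs}
                 for i, Ri in enumerate(schemes)]
        changed = True
        while changed:
            changed = False
            for X, Y in F:                              # "consider" each X -> Y
                for i in range(len(table)):
                    for j in range(i + 1, len(table)):
                        if all(table[i][A] == table[j][A] for A in X):
                            for B in Y:
                                s, t = table[i][B], table[j][B]
                                if s != t:
                                    keep, drop = (s, t) if s[0] == 'a' else (t, s)
                                    for row in table:   # identify ALL occurrences
                                        for A in attrs:
                                            if row[A] == drop:
                                                row[A] = keep
                                    changed = True
        return any(all(row[A] == ('a', A) for A in attrs) for row in table)

    # The five-scheme decomposition analyzed in Example 7.8 below:
    F = [({"A"}, {"C"}), ({"B"}, {"C"}), ({"C"}, {"D"}),
         ({"D", "E"}, {"C"}), ({"C", "E"}, {"A"})]
    print(lossless_join(list("ABCDE"),
                        [{"A","D"}, {"A","B"}, {"B","E"}, {"C","D","E"}, {"A","E"}],
                        F))                             # True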
Example 7.8: Let us consider the decomposition of SAIP into SA and SIP
as in Example 7.7. The dependencies are S → A and SI → P, and the initial table is

           S    A    I    P
    SA     a1   a2   b13  b14
    SIP    a1   b22  a3   a4

Since S → A, and the two rows agree on S, we may equate their symbols for A, making b22 become a2. The resulting table is

           S    A    I    P
    SA     a1   a2   b13  b14
    SIP    a1   a2   a3   a4

Since some row, the second, has all a's, the join is lossless.
For a more complicated example, let R = ABCDE, R1 = AD, R2 = AB, R3 = BE, R4 = CDE, and R5 = AE. Let the functional dependencies be:

    A → C        DE → C
    B → C        CE → A
    C → D
The initial table is shown in Figure 7.3(a). We can apply A → C to equate b13, b23, and b53. Then we use B → C to equate these symbols with b33; the result is shown in Figure 7.3(b), where b13 has been chosen as the representative symbol. Now use C → D to equate a4, b24, b34, and b54; the resulting symbol must be a4. Then DE → C enables us to equate b13 with a3, and CE → A lets us equate b31, b41, and a1. The result is shown in Figure 7.3(c). Since the middle row is all a's, the decomposition has a lossless join. □
It is interesting to note that one might assume Algorithm 7.2 could be simplified by only equating symbols if one was an a_j. The above example shows this is not the case; if we do not begin by equating b13, b23, b33, and b53, we can never get a row of all a's.
Theorem 7.4: Algorithm 7.2 correctly determines if a decomposition has a
lossless join.
Proof: Suppose the final table produced by Algorithm 7.2 does not have a row of all a's. We may view this table as a relation r for scheme R; the rows are tuples, and the a_j's and b_ij's are distinct symbols, each chosen from the domain of A_j. Relation r satisfies the dependencies F, since Algorithm 7.2 modifies the table whenever a violation of the dependencies is found. We claim that r ≠ m_ρ(r). Clearly r does not contain the tuple a_1 a_2 · · · a_n. But for each R_i, there is a tuple μ_i in r, namely the tuple that is row i, such that μ_i[R_i] consists of all a's. Thus, the join of the π_{R_i}(r)'s contains the tuple with all a's, since that tuple agrees with μ_i for all i. We conclude that if the final table from
            A     B     C     D     E
    AD      a1    b12   b13   a4    b15
    AB      a1    a2    b23   b24   b25
    BE      b31   a2    b33   b34   a5
    CDE     b41   b42   a3    a4    a5
    AE      a1    b52   b53   b54   a5
                       (a)

            A     B     C     D     E
    AD      a1    b12   b13   a4    b15
    AB      a1    a2    b13   b24   b25
    BE      b31   a2    b13   b34   a5
    CDE     b41   b42   a3    a4    a5
    AE      a1    b52   b13   b54   a5
                       (b)

            A     B     C     D     E
    AD      a1    b12   a3    a4    b15
    AB      a1    a2    a3    a4    b25
    BE      a1    a2    a3    a4    a5
    CDE     a1    b42   a3    a4    a5
    AE      a1    b52   a3    a4    a5
                       (c)

Figure 7.3 Applying Algorithm 7.2.
Algorithm 7.2 does not have a row with all a's, then the decomposition ρ does not have a lossless join; we have found a relation r for R such that m_ρ(r) ≠ r.
Conversely, suppose the final table has a row with all a's. We can in general view any table T as shorthand for the domain relational calculus expression

    { a_1 · · · a_n | (∃b_11) · · · (∃b_kn)(R(w_1) ∧ · · · ∧ R(w_k)) }        (7.1)

where w_i is the ith row of T. When T is the initial table, formula (7.1) defines the function m_ρ. In proof, note m_ρ(r) contains tuple a_1 · · · a_n if and only if for each i, r contains a tuple with a_j in the jth component if A_j is an attribute of R_i, and some arbitrary value, represented by b_ij, in each of the other attributes.
Algorithm 7.2 changes the table (by identifying symbols) in a way that does
not affect the set of tuples produced by (7.1), as long as that expression changes
to mirror the changes to the table. The detailed proof of this claim is complex,
but the intuition should be clear: we are only identifying symbols if in (7.1)
applied to a relation R which satisfies F, those symbols could only be assigned
the same value anyway.
Since the final table contains a row with all a's, the domain calculus ex
pression for the final table is of the form:
    { a_1 · · · a_n | (∃b_11) · · · (∃b_kn)(R(a_1 · · · a_n) ∧ · · ·) }        (7.2)

[Table showing a row for R_1 and a row for R_2 composed of a's and b's; details not legible in the scan.]
    S               Z        C
    545 Tech Sq.    02138    Cambridge, Mass.
    545 Tech Sq.    02139    Cambridge, Mass.
                    (a)

    C                   Z
    Cambridge, Mass.    02138
    Cambridge, Mass.    02139
                    (b)

    S               Z
    545 Tech Sq.    02138
    545 Tech Sq.    02139
                    (c)

Figure 7.5 A join violating a functional dependency.
We should note that a decomposition may have a lossless join with respect
to set of dependencies F, yet not preserve F. Example 7.10 gave one such
instance. Also, the decomposition could preserve F yet not have a lossless join.
For example, let F = {A → B, C → D}, R = ABCD, and ρ = (AB, CD).
Testing Preservation of Dependencies
In principle, it is easy to test whether a decomposition ρ = (R_1, . . . , R_k) preserves a set of dependencies F. Just compute F+ and project it onto all of the R_i's. Take the union of the resulting sets of dependencies, and test whether this set is equivalent to F.
However, in practice, just computing F+ is a formidable task, since the
number of dependencies it contains is often exponential in the size of F. There
fore, it is fortunate that there is a way to test preservation without actually
computing F+; this method takes time that is polynomial in the size of F.
Algorithm 7.3: Testing Preservation of Dependencies.
INPUT: A decomposition ρ = (R_1, . . . , R_k) and a set of functional dependencies F.
OUTPUT: A decision whether p preserves F.
    Z := Z ∪ ({D}+ ∩ {C, D})
       = {D} ∪ ({A, B, C, D} ∩ {C, D})
       = {C, D}

Similarly, on the next pass, the BC-operation applied to the current Z = {C, D} produces Z = {B, C, D}, and on the third pass, the AB-operation sets Z to {A, B, C, D}, whereupon no more changes to Z are possible.
Thus, with respect to G, {D}+ = {A, B, C, D}, which contains A, so we conclude that G |= D → A. Since it is easy to check that the other members of F are in G+ (in fact they are in G), we conclude that this decomposition preserves the set of dependencies F. □
Theorem 7.6: Algorithm 7.3 correctly determines if X → Y is in G+.
Proof: Each time we add an attribute to Z, we are using a dependency in G, so when the algorithm says "yes," it must be correct. Conversely, suppose X → Y is in G+. Then there is a sequence of steps whereby, using Algorithm 7.1 to take the closure of X with respect to G, we eventually include all the attributes of Y. Each of these steps involves the application of a dependency in G, and that dependency must be in π_{R_i}(F) for some i, since G is the union of these projections. Let one such dependency be U → V. An easy induction on the number of dependencies applied in Algorithm 7.1 shows that eventually U becomes a subset of Z, and then on the next pass the R_i-operation will surely cause all attributes of V to be added to Z if they are not already there. □
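Example 7.11 and the proof of Theorem 7.6 indicate how Algorithm 7.3 proceeds: starting with Z = X, repeatedly apply, for each R_i, the operation Z := Z ∪ ((Z ∩ R_i)+ ∩ R_i), with the closure taken with respect to F, until Z stops growing; then X → Y is in G+ exactly when Y ⊆ Z. The sketch below codes that reading (function names are ours; closure is the one given for Algorithm 7.1).

    def in_projected_closure(X, Y, schemes, F):
        """Algorithm 7.3's test, as reconstructed above: is X -> Y implied by G,
        the union of the projections of F onto the schemes?  G is never built."""
        Z = set(X)
        changed = True
        while changed:
            changed = False
            for Ri in schemes:                              # the "R_i-operation"
                gained = (closure(Z & Ri, F) & Ri) - Z
                if gained:
                    Z |= gained
                    changed = True
        return set(Y) <= Z

    def preserves(F, schemes):
        """True if every dependency of F is implied by the projected dependencies."""
        return all(in_projected_closure(X, Y, schemes, F) for X, Y in F)

    # The data that appears to underlie Example 7.11:
    F = [({"A"}, {"B"}), ({"B"}, {"C"}), ({"C"}, {"D"}), ({"D"}, {"A"})]
    print(preserves(F, [{"A","B"}, {"B","C"}, {"C","D"}]))  # True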
Example 7.12: Consider the relation scheme CSZ of Example 7.10, with dependencies CS → Z and Z → C. The keys for this relation scheme are CS and SZ, as one can easily check by computing the closures of these sets of attributes and of the other nontrivial sets (CZ, C, S, and Z). The scheme CSZ with these dependencies is not in BCNF, because Z → C holds in CSZ, yet Z is not a key of CSZ, nor does it contain a key. □
Third Normal Form
In some circumstances BCNF is too strong a condition, in the sense that it is not
possible to bring a relation scheme into that form by decomposition, without
losing the ability to preserve dependencies. Third normal form provides most
of the benefits of BCNF, as far as elimination of anomalies is concerned, yet it
is a condition we can achieve for an arbitrary database scheme without giving
up either dependency preservation or the lossless-join property.
Before defining third normal form, we need a preliminary definition. Call
an attribute A in relation scheme R a prime attribute if A is a member of any
key for R (recall there may be many keys). If A is not a member of any key,
then A is nonprime.
Example 7.13: In the relation scheme CSZ of Example 7.12, all attributes are prime, since given the dependencies CS → Z and Z → C, both CS and SZ are keys.
In the relation scheme ABCD with dependencies AB → C, B → D, and BC → A, we can check that AB and BC are the only keys. Thus, A, B, and C are prime, and D is nonprime. □
A relation scheme R is in third normal form⁶ (3NF) if whenever X → A holds in R and A is not in X, then either X is a superkey for R, or A is prime.
Notice that the definitions of Boyce-Codd and third normal forms are identical
except for the clause "or A is prime" that makes third normal form a weaker
condition than Boyce-Codd normal form. As with BCNF, we in principle must
consider not only the given set of dependencies F, but all dependencies in F+
to check for a 3NF violation. However, we can show that if F consists only of
dependencies that have been decomposed so they have single attributes on the
right, then it is sufficient to check the dependencies of F only.
Example 7.14: The relation scheme SAIP from Example 7.7, with dependencies SI → P and S → A, violates 3NF. A is a nonprime attribute, since the only key is SI. Then S → A violates the 3NF condition, since S is not a superkey.
However, the relation scheme CSZ from Example 7.12 is in 3NF. Since all
6 Yes Virginia, there is a first normal form and there is a second normal form. First
normal form merely states that the domain of each attribute is an elementary type,
rather than a set or a record structure, as fields in the object model (Section 2.7) can
be. Second normal form is only of historical interest and is mentioned in the exercises.
of its attributes are prime, no dependency could violate the conditions of third
normal form. D
    X    Y     A
    x    y1    a
    x    y2    ?

Here, x, y1, and y2 represent lists of values for the sets of attributes X and Y.
If we can use a functional dependency to infer the value indicated by a
question mark, then that value must be a, and the dependency used must be Z → A, for some Z ⊆ X. However, Z cannot be a superkey, because if it were,
then the two tuples above, which agree in Z, would be the same tuple. Thus, R
is not in BCNF, as supposed. We conclude that in a BCNF relation, no value
can be predicted from any other, using functional dependencies only. In Section
7.9 we shall see that there are other ways redundancy can arise, but these are
"invisible" as long as we consider functional dependencies to be the only way
the set of legal relations for a scheme can be defined.
Naturally, 3NF, being weaker than BCNF, cannot eliminate all redundancy.
The canonical example is the CSZ scheme of Example 7.12, which is in 3NF,
yet allows pairs of tuples like
    C    S     Z
    c    s1    z
    ?    s2    z

where we can deduce from the dependency Z → C that the unknown value is c. Note that these tuples cannot violate the other dependency, CS → Z.
database schemes, and with each individual relation scheme having the proper
ties we desire for relation schemes.
It turns out that any relation scheme has a lossless join decomposition
into Boyce-Codd Normal Form, and it has a decomposition into 3NF that
has a lossless join and is also dependency-preserving. However, there may be
no decomposition of a relation scheme into Boyce-Codd normal form that is
dependency-preserving. The CSZ relation scheme is the canonical example. It is not in BCNF because the dependency Z → C holds, yet if we decompose CSZ in any way such that CSZ is not one of the schemes in the decomposition, then the dependency CS → Z is not implied by the projected dependencies.
Before giving the decomposition algorithm, we need the following property
of lossless-join decompositions.
Lemma 7.6: Suppose R is a relation scheme with functional dependencies F. Let ρ = (R_1, . . . , R_n) be a decomposition of R with a lossless join with respect to F, and let σ = (S_1, S_2) be a lossless-join decomposition of R_1 with respect to π_{R_1}(F). Then the decomposition of R into (S_1, S_2, R_2, . . . , R_n) also has a lossless join with respect to F.
Proof: Suppose we take a relation r for R, and project it onto R_1, . . . , R_n to get relations r_1, . . . , r_n, respectively. Then we project r_1 onto S_1 and S_2 to get s_1 and s_2. The lossless-join property tells us we can join s_1 and s_2 to recover exactly r_1, and we can then join r_1, . . . , r_n to recover r. Since the natural join is an associative operation, by Theorem 2.1(a), the order in which we perform the join doesn't matter, so we recover r no matter in what order we take the join of s_1, s_2, r_2, . . . , r_n. □
We can apply Lemma 7.6 to get a simple but time-consuming algorithm to decompose a relation scheme R into BCNF. If we find a violation of BCNF in R, say X → A, we decompose R into schemes R − A and XA. These are both smaller than R, since XA could not be all attributes of R (then X would surely be a superkey, and X → A would not violate BCNF). The join of R − A and XA is lossless, by Theorem 7.5, because the intersection of the schemes is X, and X → XA. We compute the projections of the dependencies for R onto R − A and XA, then apply this decomposition step recursively to these schemes. Lemma 7.6 assures that the set of schemes we obtain by decomposing until all the schemes are in BCNF will be a lossless-join decomposition.
The problem is that the projection of dependencies can take exponential
time in the size of the scheme R and the original set of dependencies. However,
it turns out that there is a way to find some lossless-join decomposition into
BCNF relation schemes in time that is polynomial in the size of the given set of
dependencies and scheme. The technique will sometimes decompose a relation
that is already in BCNF, however. The next lemma gives some useful properties
of BCNF schemes.
Lemma 7.7:
a) Every two-attribute scheme is in BCNF.
b) If R is not in BCNF, then we can find attributes A and B in R, such that (R − AB) → A. It may or may not be the case that (R − AB) → B as well.
Proof: For part (a), let AB be the scheme. There are only two nontrivial dependencies that can hold: A → B and B → A. If neither holds, then surely there is no BCNF violation. If only A → B holds, then A is a key, so we do not have a violation. If only B → A holds, then B is a key, and if both hold, both A and B are keys, so there can never be a BCNF violation.
For (b), suppose there is a BCNF violation X → A in R. Then R must have some other attribute B, not in XA, or else X is a superkey, and X → A is not a violation. Thus, (R − AB) → A as desired. □
Lemma 7.7 lets us look for BCNF violations in a scheme R with n attributes by considering only the n(n − 1)/2 pairs of attributes {A, B} and computing the closure of R − AB with respect to the given dependencies F, by using Algorithm 7.1. As stated, that algorithm takes O(n²) time, but a carefully designed data structure can make it run in time O(n); in any event, the time is polynomial in the size of R. If for no A and B does (R − AB)+ contain either A or B, then by Lemma 7.7(b) we know R is in BCNF.
It is important to realize that the converse of Lemma 7.7(b) is not true. Possibly, R is in BCNF, and yet there is such a pair {A, B}. For example, if R = ABC, and F = {C → A, C → B}, then R is in BCNF, yet R − AB = C, and C does functionally determine A (and B as well).
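The test of Lemma 7.7(b) is easy to program. The sketch below (ours; it reuses the closure function given for Algorithm 7.1) returns a witnessing pair or None; None guarantees the scheme is in BCNF, while a returned pair, as just noted, does not by itself prove a violation.

    from itertools import combinations

    def bcnf_witness_pair(R, F):
        """Lemma 7.7(b): look for attributes A, B of R with A in (R - AB)+."""
        for A, B in combinations(sorted(R), 2):
            rest = set(R) - {A, B}
            clo = closure(rest, F)
            if A in clo:
                return A, B
            if B in clo:
                return B, A
        return None

    # The cautionary example above: ABC with C -> A and C -> B is in BCNF,
    # yet a pair is still found, so Algorithm 7.4 may decompose it anyway.
    print(bcnf_witness_pair({"A", "B", "C"},
                            [({"C"}, {"A"}), ({"C"}, {"B"})]))   # e.g. ('A', 'B')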
Before proceeding to the algorithm for BCNF decomposition, we need one
more observation, about projections of dependencies. Specifically:
Lemma 7.8: If we have a set of dependencies F on R, and we project them onto R_1 ⊆ R to get F_1, and then project F_1 onto R_2 ⊆ R_1 to get F_2, then F_2 and π_{R_2}(F) are equivalent. That is, we could have assumed that F was the set of dependencies for R_1, even though F presumably mentions attributes not found in R_1.
Proof: If XY ⊆ R_2, then X → Y is in F+ if and only if it is in F_1+. □
Lemma 7.8 has an important consequence. It says that if we decompose
relation schemes as in Lemma 7.6, then we never actually have to compute
the projected dependencies as we decompose. It is sufficient to work with the
given dependencies, taking closures of attribute sets by Algorithm 7.1 when we
need to, rather than computing whole projections of dependencies, which are
exponential in the number of attributes in the scheme. It is this observation,
together with Lemma 7.7(b), that allows us to take time that is polynomial in
the size of the given scheme and the given dependencies, and yet discover some
The details of the algorithm are given in Figure 7.6. Figure 7.6(a) is the main routine, which repeatedly decomposes the one scheme Z that we do not know to be in BCNF; initially, Z is R. Figure 7.6(b) is the decomposition procedure that either determines Z cannot be decomposed, or decomposes Z into Z − A and XA, where X → A. The set of attributes XA is selected by starting with Y = Z, and repeatedly throwing out the attribute B, the one of the pair AB such that we found X → A, where X = Y − AB. Recall that it does not matter whether X → B is true or not. □
Example 7.15: Let us consider the relation scheme CTHRSG, where C =
course, T = teacher, H = hour, R = room, S = student, and G = grade. The
functional dependencies F we assume are
    C → T      Each course has one teacher.
    HR → C     Only one course can meet in a room at one time.
    HT → R     A teacher can be in only one room at one time.
    CS → G     Each student has one grade in each course.
    HS → R     A student can be in only one room at one time.
Since Algorithm 7.4 does not specify the order in which pairs AB are
to be considered, we shall adopt the uniform strategy of preserving the order
CTHRSG for the attributes and trying the first attribute against the others,
in turn, then the second against the third through last, and so on.
We begin with the entire scheme, CTHRSG, and the first pair to consider
is CT. We find that (HRSG)+ contains C; it also contains T, but that is
irrelevant. Thus, we begin the while-loop of Figure 7.6(b) with A = C, B = T,
and Y = CHRSG.
Now, we try the CH pair as {A,B}, but (RSG)+ contains neither C nor
H. We have better luck with the next pair, CR, because (HSG)+ contains R.
Thus, we have A = R, B = C, and we set Y to HRSG, by throwing out B, as
usual. With Y = HRSG, we have no luck until we try pair RG, when we find
(HS)+ contains R. Thus, we have A = R and B = G, whereupon Y is set to
HRS.
At this point, no further attributes can be thrown out of Y, because the test
of Lemma 7.7(b) fails for each pair. We may therefore decompose CTHRSG
into
1. HRS, which plays the role of XA, with X = HS and A = R, and
2. Z = CTHRSG - R, which is CTHSG.
We now work on Z = CTHSG in the main program. The list of pairs AB
that work and the remaining sets of attributes after throwing out B, is:
1. In CTHSG: A = T,B = H, leaves Y = CTSG.
2. In CTSG: A = T,B = S, leaves Y = CTG.
3. In CTG: A = T,B = G, leaves Y = CT.
    C → T
    HR → C
    HT → R
    CS → G
    HS → R

Algorithm 7.5 yields the set of relation schemes CT, CHR, THR, CSG, and HRS. □
Theorem 7.7: Algorithm 7.5 yields a dependency-preserving decomposition
into third normal form.
Proof: Since the projected dependencies include a cover for F, the decompo
sition clearly preserves dependencies. We must show that the relation scheme
YB, for each functional dependency Y → B in the minimal cover, is in 3NF.
Suppose X → A violates 3NF for YB; that is, A is not in X, X is not a superkey for YB, and A is nonprime. Of course, we also know that XA ⊆ YB, and X → A follows logically from F. We shall consider two cases, depending on whether or not A = B.
Case 1: A = B. Then since A is not in X, we know X ⊆ Y, and since X is not a superkey for YB, X must be a proper subset of Y. But then X → B, which is also X → A, could replace Y → B in the supposed minimal cover, contradicting the assumption that Y → B was part of the given minimal cover.
Case 2: A ≠ B. Since Y is a superkey for YB, there must be some Z ⊆ Y that is a key for YB. But A is in Y, since we are assuming A ≠ B, and A cannot be in Z, because A is nonprime. Thus Z is a proper subset of Y, yet Z → B can replace Y → B in the supposedly minimal cover, again providing a contradiction. □
There is a modification to Algorithm 7.5 that avoids unnecessary decomposition.⁷ If X → A_1, . . . , X → A_n are dependencies in a minimal cover, then we may use the one relation scheme XA_1 · · · A_n in place of the n relation schemes XA_1, . . . , XA_n. It is left as an exercise that the scheme XA_1 · · · A_n is in 3NF.

⁷ Sometimes it is desirable to have two or more attributes, say A and B, appear together in a relation scheme, even though there is no functional dependency involving them. There may simply be a many-many relationship between A and B. An idea of Bernstein [1976] is to introduce a dummy attribute θ and the functional dependency AB → θ, to force this association. After completing the design, attribute θ is eliminated.
    Y ⊆ X ∪ {A_1, . . . , A_{i-1}}

Then Y → A_i is in the minimal cover, and the rows for YA_i and X agree on Y (they are all a's) after the columns of the X-row for A_1, . . . , A_{i-1} are made a's. Thus, these rows are made to agree on A_i during the execution of Algorithm 7.2. Since the YA_i-row has a's there, so must the X-row. □
Obviously, in some cases τ is not the smallest set of relation schemes with the properties of Theorem 7.8. We can throw out relation schemes in τ one at a time as long as the desired properties are preserved. Many different database schemes may result, depending on the order in which we throw out schemes, since eliminating one may preclude the elimination of others.
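Putting the pieces together, Algorithm 7.5 with the grouping modification, plus the extra key of Theorem 7.8, can be sketched as follows (ours, not the book's; it reuses minimal_cover and closure from the earlier sketches, and the key-finding helper is deliberately naive).

    def synthesize_3nf(R, F):
        """3NF synthesis: one scheme per left side of a minimal cover,
        plus a key of R if no scheme already contains one."""
        cover = minimal_cover(F)

        schemes = []
        for left in {frozenset(l) for l, _ in cover}:
            rights = set().union(*(r for l, r in cover if frozenset(l) == left))
            schemes.append(set(left) | rights)

        def is_superkey(X):
            return closure(X, F) >= set(R)

        if not any(is_superkey(S) for S in schemes):
            key = set(R)
            for A in sorted(R):                 # shrink to a minimal key, naively
                if is_superkey(key - {A}):
                    key -= {A}
            schemes.append(key)
        return schemes

    # CTHRSG with C->T, HR->C, HT->R, CS->G, HS->R (Examples 7.16 and 7.17):
    F = [({"C"}, {"T"}), ({"H","R"}, {"C"}), ({"H","T"}, {"R"}),
         ({"C","S"}, {"G"}), ({"H","S"}, {"R"})]
    for S in synthesize_3nf("CTHRSG", F):
        print("".join(sorted(S)))
    # CT, CHR, HRT, CGS, HRS in some order; no key is appended, since HRS
    # already contains the key HS, as in Example 7.17.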
Example 7.17: We could take the union of the database scheme produced for
CTHRSG in Example 7.16 with the key SH, to get a decomposition that has
a lossless join and preserves dependencies. It happens that SH is a subset of
HRS, which is one of the relation schemes already selected. Thus, SH may be
eliminated, and the database scheme of Example 7.16, that is
    C        T           H    R    S        G
    CS101    Deadwood    M9   222  Weenie   B+
    CS101    Deadwood    W9   333  Weenie   B+
    CS101    Deadwood    F9   222  Weenie   B+
    CS101    Deadwood    M9   222  Grind    C
    CS101    Deadwood    W9   333  Grind    C
    CS101    Deadwood    F9   222  Grind    C
* Note we could have eliminated clause (3). The existence of tuple ψ follows from the existence of φ when we apply the definition with μ and ν interchanged.
A5: If X →→ Y holds, and V ⊆ W, then WX →→ VY holds.
A6: {X →→ Y, Y →→ Z} |= X →→ (Z − Y)
It is worthwhile comparing A4-A6 with A1-A3. Axiom A4, the complementation rule, has no counterpart for functional dependencies. Axiom A1, reflexivity, appears to have no counterpart for multivalued dependencies, but the fact that X →→ Y whenever Y ⊆ X follows from A1 and the rule (Axiom A7, to be given) that if X → Y then X →→ Y. A6 is more restrictive than its counterpart transitivity axiom, A3. The more general statement, that X →→ Y and Y →→ Z imply X →→ Z, is false. For instance, we saw in Example 7.18 that C →→ HR holds, and surely HR →→ R is true, yet C →→ R is false. To compensate partially for the fact that A6 is weaker than A3, we use a stronger version of A5 than the analogous augmentation axiom for functional dependencies, A2. We could have replaced A2 by: X → Y and V ⊆ W imply WX → VY, but for functional dependencies, this rule is easily proved from A1, A2, and A3.
Our last two axioms relate functional and multivalued dependencies.
A7: {X → Y} |= X →→ Y.
A8: If X →→ Y holds, Z ⊆ Y, and for some W disjoint from Y, we have W → Z, then X → Z also holds.
Soundness and Completeness of the Axioms
We shall not give a proof that axioms A1-A8 are sound and complete. Rather,
we shall prove that some of the axioms are sound, that is, they follow from the
definitions of functional and multivalued dependencies, leaving the soundness
of the rest of the axioms, as well as a proof that any valid inference can be
made using the axioms (completeness of the axioms), for an exercise.
Let us begin by proving A6, the transitivity axiom for multivalued dependencies. Suppose some relation r over set of attributes U satisfies X →→ Y and Y →→ Z, but violates X →→ (Z − Y). Then there are tuples μ and ν in r, where μ[X] = ν[X], but the tuple φ, where φ[X] = μ[X], φ[Z − Y] = μ[Z − Y], and φ[U − X − (Z − Y)] = ν[U − X − (Z − Y)], is not in r. Since X →→ Y holds, the tuple ψ, where ψ[X] = μ[X], ψ[Y] = ν[Y], and ψ[U − X − Y] = μ[U − X − Y], is in r. Now ψ and ν agree on Y, so since Y →→ Z, it follows that r has a tuple ω, where ω[Y] = ν[Y], ω[Z] = ψ[Z], and ω[U − Y − Z] = ν[U − Y − Z].
We claim that ω[X] = μ[X], since on attributes in Z ∩ X, ω agrees with ψ, which agrees with μ. On attributes of X − Z, ω agrees with ν, and ν agrees with μ on X. We also claim that ω[Z − Y] = μ[Z − Y], since ω agrees with ψ on Z − Y, and ψ agrees with μ on Z − Y. Finally, we claim that ω[V] = ν[V], where V = U − X − (Z − Y). In proof, surely ω agrees with ν on V − Z, and by manipulating sets we can show V ∩ Z = (Y ∩ Z) − X. But ω agrees with ψ on Z, and ψ agrees with ν on Y, so ω agrees with ν on V ∩ Z as well as on V − Z. Therefore ω agrees with ν on V. If we look at the definition of φ, we now see that ω = φ. But we claimed that ω is in r, so φ is in r, contrary to our assumption. Thus X →→ Z − Y holds after all, and we have proved A6.
Now let us prove A8. Suppose in contradiction that we have a relation r in which X →→ Y and W → Z hold, where Z ⊆ Y, and W ∩ Y is empty, but X → Z does not hold. Then there are tuples ν and μ in r such that ν[X] = μ[X], but ν[Z] ≠ μ[Z]. By X →→ Y applied to ν and μ, there is a tuple φ in r, such that φ[X] = μ[X] = ν[X], φ[Y] = μ[Y], and φ[U − X − Y] = ν[U − X − Y]. Since W ∩ Y is empty, φ and ν agree on W. As Z ⊆ Y, φ and μ agree on Z. Since ν and μ disagree on Z, it follows that φ and ν disagree on Z. But this contradicts W → Z, since φ and ν agree on W but disagree on Z. We conclude that X → Z did not fail to hold, and we have verified rule A8.
The remainder of the proof of the following theorem is left as an exercise.
Theorem 7.9: (Beeri, Fagin, and Howard [1977]). Axioms A1-A8 are sound
and complete for functional and multivalued dependencies. That is, if D is a set
of functional and multivalued dependencies over a set of attributes U, and D+
is the set of functional and multivalued dependencies that follow logically from
D (i.e., every relation over U that satisfies D also satisfies the dependencies in
D+), then D+ is exactly the set of dependencies that follow from D by A1-A8.
D
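The definition of satisfaction used in these proofs is also easy to check by brute force. The sketch below is ours (the cut-down sample relation omits the T attribute of the earlier CTHRSG example): for every pair of tuples agreeing on X, the tuple built from one tuple's Y-values and the other's remaining values must also be present.

    def satisfies_mvd(relation, X, Y, U):
        """True if the relation (a list of dicts over attributes U) satisfies X ->> Y."""
        present = {tuple(sorted(t.items())) for t in relation}
        rest = set(U) - set(X) - set(Y)
        for mu in relation:
            for nu in relation:
                if all(mu[A] == nu[A] for A in X):
                    phi = {A: mu[A] for A in X}
                    phi.update({A: mu[A] for A in Y})
                    phi.update({A: nu[A] for A in rest})
                    if tuple(sorted(phi.items())) not in present:
                        return False
        return True

    r = [{"C": "CS101", "H": "M9", "R": 222, "S": "Weenie", "G": "B+"},
         {"C": "CS101", "H": "W9", "R": 333, "S": "Weenie", "G": "B+"},
         {"C": "CS101", "H": "M9", "R": 222, "S": "Grind",  "G": "C"},
         {"C": "CS101", "H": "W9", "R": 333, "S": "Grind",  "G": "C"}]
    U = ["C", "H", "R", "S", "G"]
    print(satisfies_mvd(r, ["C"], ["H", "R"], U))   # True
    print(satisfies_mvd(r, ["C"], ["S"], U))        # False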
Additional Inference Rules for Multivalued Dependencies
There are a number of other rules that are useful for making inferences about
functional and multivalued dependencies. Of course, the union, decomposition,
only the existence of φ, not the additional existence of ψ as in the third clause of the definition. Thus, the violation of a multivalued dependency can be stated as the absence of φ (not of φ or ψ) from the relation r.
1. {X →→ Y, X →→ Z} |= X →→ YZ
2.
3.
4.
We leave the proof that these rules are valid as an exercise; techniques similar
to those used for A6 and A8 above will suffice, or we can prove them from
axioms A1-A8.
We should note that the decomposition rule for multivalued dependencies is
weaker than the corresponding rule for functional dependencies. The latter rule
allows us to deduce immediately from X → Y that X → A for each attribute A in Y. The rule for multivalued dependencies only allows us to conclude X →→ A from X →→ Y if we can find some Z such that X →→ Z, and either Z ∩ Y = A or Y − Z = A.
The Dependency Basis
However, the decomposition rule for multivalued dependencies, along with the union rule, allows us to make the following statement about the sets Y such that X →→ Y for a given X.
Theorem 7.10: If U is the set of all attributes, then we can partition U − X into sets of attributes Y_1, . . . , Y_k, such that if Z ⊆ U − X, then X →→ Z if and only if Z is the union of some of the Y_i's.
Proof: Start the partition of U − X with all of U − X in one block. Suppose at some point we have partition W_1, . . . , W_n, and X →→ W_i for i = 1, 2, . . . , n. If X →→ Z, and Z is not the union of some W_i's, replace each W_i such that W_i ∩ Z and W_i − Z are both nonempty by W_i ∩ Z and W_i − Z. By the decomposition rule, X →→ (W_i ∩ Z) and X →→ (W_i − Z). As we cannot partition a finite set of attributes indefinitely, we shall eventually find that every Z such that X →→ Z is the union of some blocks of the partition. By the union rule, X multidetermines the union of any set of blocks. □
We call the above sets Y_1, . . . , Y_k constructed for X from a set of functional and multivalued dependencies D the dependency basis for X (with respect to D).
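The refinement step in this proof is itself executable once the right sides Y with X →→ Y are known. The sketch below (ours; it does not attempt the harder job of inferring which Y qualify) splits U − X exactly as the proof describes.

    def refine_blocks(U, X, known_mvd_right_sides):
        """Partition refinement from the proof of Theorem 7.10."""
        blocks = [set(U) - set(X)]
        for Y in known_mvd_right_sides:
            Z = set(Y) - set(X)
            new_blocks = []
            for W in blocks:
                if W & Z and W - Z:
                    new_blocks.extend([W & Z, W - Z])   # split W by Z, as in the proof
                else:
                    new_blocks.append(W)
            blocks = new_blocks
        return blocks

    # For CTHRSG, refining U - C with the MVDs C ->> HR and C ->> T (the latter
    # because C -> T) gives the blocks HR, T, and SG.
    print(refine_blocks("CTHRSG", "C", [{"H", "R"}, {"T"}]))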
Lossless Joins
Algorithm 7.2 helps us determine when a decomposition of a relation scheme R into (R_1, . . . , R_k) has a lossless join, on the assumption that the only depen
dencies to be satisfied by the relations for R are functional. That algorithm can
be generalized to handle multivalued dependencies, as we shall see in the next
section. In the case of a decomposition of R into two schemes, there is a simple
test for a lossless join.
Theorem 7.11: Let R be a relation scheme and ρ = (R1, R2) a decomposi
tion of R. Let D be a set of functional and multivalued dependencies on the
attributes of R. Then ρ has a lossless join with respect to D if and only if
(R1 ∩ R2) →→ (R1 − R2).
for CSP. For then, given any relation r for CSPY that satisfies SP → Y
and the dependency C →→ S in CSP, we can prove that mρ(r) = r. Yet we
could not prove this assuming only the functional dependency SP → Y; the
reader is invited to find a relation r satisfying SP → Y (but not the embedded
dependency) such that mρ(r) ≠ r. □
We shall consider embedded multivalued dependencies further in the next
section. Here let us introduce the standard notation for such dependencies. A
relation r over relation scheme R satisfies the embedded multivalued depen
dency X →→ Y | Z if the multivalued dependency X →→ Y is satisfied by the
relation πX∪Y∪Z(r), which is the projection of r onto the set of attributes men
tioned in the embedded dependency. Note that there is no requirement that X,
Y, and Z be disjoint, and by the union, decomposition, and complementation
rules, X →→ Y holds in πX∪Y∪Z(r) if and only if X →→ Z does, so X →→ Y | Z
means the same as X →→ Z | Y. As an example, the embedded multivalued
dependency from Example 7.21 is written C →→ S | P or C →→ P | S.
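These definitions can be restated operationally. The Python sketch below tests whether a relation, represented as a list of dictionaries keyed by attribute name, satisfies an ordinary or an embedded multivalued dependency; the representation and the function names are assumptions of the sketch, not part of the formal development.

```python
def satisfies_mvd(r, X, Y, attrs):
    """X ->-> Y holds in r (over attribute list attrs) if, for every pair of
    tuples mu, nu agreeing on X, the tuple taking its X- and Y-values from mu
    and its remaining values from nu is also present in r."""
    def row(t):
        return tuple(t[a] for a in attrs)
    rows = {row(t) for t in r}
    for mu in r:
        for nu in r:
            if all(mu[a] == nu[a] for a in X):
                phi = dict(nu)                       # remaining values from nu
                phi.update({a: mu[a] for a in list(X) + list(Y)})
                if row(phi) not in rows:
                    return False
    return True

def satisfies_embedded_mvd(r, X, Y, Z):
    """X ->-> Y | Z holds in r if X ->-> Y holds in the projection of r
    onto the attributes mentioned in X, Y, and Z."""
    attrs = list(dict.fromkeys(list(X) + list(Y) + list(Z)))
    projection = {tuple(t[a] for a in attrs) for t in r}
    proj_r = [dict(zip(attrs, row)) for row in projection]
    return satisfies_mvd(proj_r, X, Y, attrs)
```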
7.11 GENERALIZED DEPENDENCIES
In this section we introduce a notation for dependencies that generalizes both
functional and multivalued dependencies. Modeling the "real world" does not
demand such generality; probably, functional and multivalued dependencies are
sufficient in practice.12 However, there are some key ideas, such as the "chase"
algorithm for inferring dependencies, that are better described in the general
context to which the ideas apply than in the special case of functional or multi
valued dependencies. The ideas associated with generalized dependencies also
get used in query optimization, and they help relate dependencies to logical
rules (Horn clauses), thereby allowing some of this theory to apply to optimiza
tion of logic programs as well.
We view both functional and multivalued dependencies as saying of rela
tions that "if you see a certain pattern, then you must also see this." In the
case of functional dependencies, "this" refers to the equality of certain of the
symbols seen, while for multivalued dependencies, "this" is another tuple that
must also be in the relation. For example, let U = ABCD be our set of at
tributes. Then the functional dependency A → B says that whenever we see, in
12 Often, one observes inclusion dependencies, as well. These are constraints that say
a value appearing in one attribute of one relation must also appear in a particular
attribute of some other relation. For example, we would demand of the YVCB database
that a customer name appearing in the CUST field of an ORDERS tuple also appear
in the CNAME field of the CUSTOMERS relation; i.e., each order must have a real
customer behind it. The desire to enforce inclusion dependencies explains the mechanics
of insertion and deletion in the DBTG proposal (Section 5.3), and the constraints System
R places on a pair of relations that are stored "via set" (Section 6.11). As inclusion
dependencies do not influence the normalization process, their theory is mentioned only
in the exercises.
some relation r, two tuples ab1c1d1 and ab2c2d2, then b1 = b2 in those tuples.
The multivalued dependency A →→ B says of the same two tuples that we must
also see the tuple ab1c2d2 in r, which is a weaker assertion than saying b1 = b2.
A convenient tabular form of such dependencies is shown in Figure 7.8.
    a   b1   c1   d1
    a   b2   c2   d2
    ----------------
    b1 = b2              (a) A → B

    a   b1   c1   d1
    a   b2   c2   d2
    ----------------
    a   b1   c2   d2     (b) A →→ B

Figure 7.8 Dependencies in tabular form.

In general, we write such a dependency as (t1, . . . , tm)/t,
where the ti's are n-tuples of symbols, and t is either another n-tuple (in which
case we have a tuple-generating dependency) or an expression x = y, where
x and y are symbols appearing among the ti's (then we have an equality-
generating dependency). We call the ti's the hypotheses and t the conclusion.
Intuitively, the dependency means that for every relation in which we find the
hypotheses, the conclusion holds. To see the hypothesis tuples, we may have
to rename some or all of the symbols used in the hypotheses to make them
match the symbols used in the relation. Any renaming of symbols that is done
applies to the conclusion as well as the hypotheses, and of course it applies to
all occurrences of a symbol. We shall give a more formal definition after some
examples and discussion.
Frequently we shall display these dependencies as in Figure 7.8, with the
hypotheses listed in rows above a line and the conclusion below. It is sometimes
useful as well to show the attributes to which the columns correspond, above a
line at the top. In all cases, we assume that the order of the attributes in the
relation scheme is fixed and understood.
Typed and Typeless Dependencies
An embedded multivalued dependency X →→ Y | Z says that if we have two
tuples μ and ν in relation r that project onto X ∪ Y ∪ Z to give tuples μ′ and
ν′, and μ′[X] = ν′[X], then there is some tuple ω in r that projects to ω′ and
satisfies ω′[X] = μ′[X] = ν′[X], ω′[Y] = μ′[Y], and ω′[Z] = ν′[Z]. Notice that
nothing at all is said about the value of ω for attributes in U − X − Y − Z.
Clearly, we can express all the above in our generalized dependency notation,
where μ and ν are the first and second hypotheses, and ω is the conclusion.
Since we can only conclude that the tuple ω has some values in the attributes
U − X − Y − Z, but we cannot relate those values to the values in μ or ν, we
must use unique symbols in our conclusion.
One reason for introducing the generalized dependency notation is that
it leads to a conceptually simple way to infer dependencies. The test works
for full dependencies of all sorts, although it may take exponential time, and
therefore, is not preferable to Algorithm 7.1 for inferring functional depen
dencies from other functional dependencies, or to the method outlined before
Algorithm 7.6 (computation of the dependency basis) when only functional and
multivalued dependencies are concerned. When there are embedded dependen
cies, the method may succeed in making the inference, but it may also give an
inconclusive result. There is, in fact, no known algorithm for testing whether an
embedded dependency follows logically from others, even when the dependen
cies are restricted to an apparently simple class, such as embedded multivalued
dependencies.
Generalized Dependencies and Horn Clauses
body. The correct interpretation of such a rule is that, given values of C, S1,
and P2 that, together with values for the other variables of the body, satisfy
the subgoals of the body, the conclusion of the head is true for all values of Y3.
However, the meaning of the embedded dependency is that there exists some
value of Y3 that makes the head true for these values of C, S1, and P2.
Symbol Mappings
Before giving the inference test for generalized dependencies, we need to intro
duce an important concept, the symbol mapping, which is a function h from
one set of symbols S to another set T; that is, for each symbol a in S, h(a) is
a symbol in T. We allow h(a) and h(b) to be the same member of T, even if
a ≠ b. If μ = a1a2 · · · an is a tuple whose symbols are in S, we may apply the
symbol mapping h to μ and obtain the tuple h(μ) = h(a1)h(a2) · · · h(an). If
{μ1, . . . , μk} is a set of tuples whose symbols are in S, and {ν1, . . . , νm} are
tuples whose symbols are in T, we say there is a symbol mapping from the first
set of tuples to the second if there is some h such that for all i = 1, 2, . . . , k,
h(μi) is νj for some j. It is possible that two or more μi's are mapped to the
same νj, and some νj's may be the target of no μi.
Example 7.24: Let A = {abc, ade, fbe} and B = {xyz, wyz}. There are
several symbol mappings from A to B. One has h(a) = h(f) = x, h(b) =
h(d) = y, and h(c) = h(e) = z. Thus, h maps all three tuples in A to xyz.
Another symbol mapping has g(a) = x, g(b) = g(d) = y, g(c) = g(e) = z, and
g(f) = w. Symbol mapping g sends abc and ade to xyz, but sends fbe to wyz.
a
Our most important use for symbol mappings is as maps between sets of
rows as in Example 7.24. The reader should observe a duality that holds in that
situation. We defined symbol mappings as functions on symbols, and when
applied to sets of rows, we added the requirement that the mapping applied
to each row of the first set is a row of the second set. Dually, we could have
defined mappings from rows to rows, and added the requirement that no symbol
be mapped by two different rows to different symbols. Thus, in Example 7.24,
we could not map abc to xyz and also map ade to wyz, because a would be
mapped to both x and w.
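The following brute-force enumeration, a Python sketch in which rows are tuples of symbol names, makes the definition concrete: it tries every assignment of an image row to each row of the first set and keeps the consistent assignments. The function name and the representation are assumptions of the sketch.

```python
from itertools import product

def symbol_mappings(A, B):
    """Enumerate the symbol mappings from the set of rows A to the set of
    rows B: functions h on symbols such that h applied to each row of A
    yields some row of B (the duality noted above)."""
    A, B = list(A), list(B)
    for images in product(B, repeat=len(A)):     # choose an image row per row of A
        h, consistent = {}, True
        for row, img in zip(A, images):
            for a, b in zip(row, img):
                if h.setdefault(a, b) != b:      # a symbol may not map two ways
                    consistent = False
                    break
            if not consistent:
                break
        if consistent:
            yield dict(h)

# Example 7.24: A = {abc, ade, fbe}, B = {xyz, wyz}
A = [('a', 'b', 'c'), ('a', 'd', 'e'), ('f', 'b', 'e')]
B = [('x', 'y', 'z'), ('w', 'y', 'z')]
for h in symbol_mappings(A, B):
    print(h)
```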
Formal Definition of Generalized Dependency
With the notion of a symbol mapping, we can formally define the meaning
of generalized dependencies. We say a relation r satisfies the tuple-generating
dependency (t1, . . . , tn)/t if whenever h is a symbol mapping from all the hypo
theses {t1, . . . , tn} to r, we can extend h to any unique symbols in t in such
a way that h(t) is in r. We also say that r satisfies the equality-generating
dependency (t1, . . . , tn)/x = y if, whenever h is a symbol mapping from the
hypotheses {t1, . . . , tn} to r, we have h(x) = h(y).
Applying Dependencies to Relations
our present claim. That is, Algorithm 7.2, the lossless join test, can now be
seen as a use of the chase process to test whether F ⊨ j, where j is the join
dependency made from the decomposition to which Algorithm 7.2 is applied,
that is, ⋈ (R1, . . . , Rk). As in Theorem 7.4, we can see the relation r used in the
chase as saying that certain tuples are in a hypothetical relation that satisfies
D.
Initially, these tuples are the hypotheses {t1, . . . , tm} of the dependency
being tested. Each time we apply a dependency, we are making an inference
about other tuples that must be in this hypothetical relation (if we use a tuple-generating dependency), or about two symbols that must be equal (if we use
an equality-generating dependency). Thus, each application is a valid inference
from D, and if we infer the presence of t, that too is valid, i.e., we have shown
that any relation containing t1, . . . , tm also contains t (or a tuple that agrees
with t on its nonunique symbols).
However, the dependency d says more than that a relation that contains the
exact tuples {t1, . . . , tm} also contains t. It says that if any relation whatsoever
contains the tuples formed by some symbol mapping h of the ti's, then h can be
extended to the unique symbols of t, and h(t) will also be in the relation. We
can show this more general statement by following the sequence of applications
of dependencies in D during the chase. That is, start with {h(t1), . . . , h(tm)}
and apply the same sequence of dependencies from D by composing the symbol
mapping used to apply each dependency, with the symbol mapping h, to get
another symbol mapping. The result will be the image, under h, of the sequence
of changes made to the original relation r = {t1, . . . , tm}.
We must also explain how to test, using the chase process, whether an
equality-generating dependency (t1, . . . , tm)/a = b follows from a set of depen
dencies D. Follow the same process, but end and say yes if we ever equate
the symbols a and b; say no, as for tuple-generating dependencies, if we can
make no more changes to r, yet we have not equated a and b. The validity
of the inferences follows in essentially the same way as for tuple-generating
dependencies.
We can sum up our claim in the following theorem.
Theorem 7.12: The chase process applied to a set of full generalized de
pendencies D and a (possibly embedded) generalized dependency d determines
correctly whether D ⊨ d.
Proof: Above, we argued informally why the procedure, if it gives an answer
at all, answers correctly. We shall not go into further detail; Maier, Mendelzon,
and Sagiv [1979] contains a complete proof of the result.
We must, however, show that if D has only full dependencies, then the
process is an algorithm; that is, it always halts. The observation is a simple
one. When we apply a full dependency, we introduce no new symbols. Thus,
the relation r only has tuples composed of the original symbols of the hypothe
ses of d. But there are only a finite number of such symbols, and therefore r is
always a subset of some finite set. We have only to rule out the possibility that r
exhibits an oscillatory behavior; that is, it assumes after successive applications
of dependencies, a sequence of values
r1, r2, . . . , rn = r1, r2, . . .
Tuple-generating dependencies always make the size of r increase, while
equality-generating dependencies either leave the size the same or decrease it.
Thus, the cycle must contain at least one equality-generating dependency. But
here, an equality of symbols permanently reduces the number of different sym
bols, since only the application of an embedded dependency could increase the
number of different symbols in r, and D was assumed to contain full depen
dencies only. Thus no cycle could involve an equality-generating dependency
and full tuple-generating dependencies only, proving that no cycle exists. We
conclude that either we reach a condition where no change to r is possible, or
we discover that the conclusion of d is in r. D
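The chase test of Theorem 7.12 can be sketched as follows. The encoding of a generalized dependency as a pair (hypotheses, conclusion), with the conclusion tagged either 'row' or 'eq', is an assumption of this Python sketch, which is not the book's literal procedure; it handles full dependencies in D and, as argued above, terminates for them.

```python
from itertools import product

def chase_implies(D, d):
    """Chase test: do the full generalized dependencies in D imply d?
    A dependency is (hypotheses, conclusion); hypotheses is a list of
    equal-length tuples of symbols, and the conclusion is ('row', t) for a
    tuple-generating dependency or ('eq', (x, y)) for an equality-generating
    one.  Dependencies in D must be full; d itself may be embedded."""
    hyps, concl = d
    r = set(hyps)
    rep = {}                                     # records equated symbols

    def find(s):                                 # current representative of s
        while s in rep:
            s = rep[s]
        return s

    def canon(t):
        return tuple(find(s) for s in t)

    def mappings(hs, rel):
        """All symbol mappings sending every hypothesis row of hs into rel."""
        for images in product(rel, repeat=len(hs)):
            h, ok = {}, True
            for hyp, img in zip(hs, images):
                for a, b in zip(hyp, img):
                    if h.setdefault(a, b) != b:  # one symbol, one image
                        ok = False
                        break
                if not ok:
                    break
            if ok:
                yield h

    changed = True
    while changed:
        changed = False
        r = {canon(t) for t in r}
        for hs, c in D:
            for h in mappings(list(hs), r):
                if c[0] == 'row':                # add the implied tuple
                    new = canon(tuple(h[s] for s in c[1]))
                    if new not in r:
                        r.add(new)
                        changed = True
                else:                            # equate two symbols
                    x, y = find(h[c[1][0]]), find(h[c[1][1]])
                    if x != y:
                        rep[y] = x
                        changed = True
        r = {canon(t) for t in r}

    if concl[0] == 'eq':
        return find(concl[1][0]) == find(concl[1][1])
    known = {s for t in hyps for s in t}         # nonunique symbols of d
    return any(all(s not in known or find(s) == find(v)
                   for s, v in zip(concl[1], t)) for t in r)

# {A -> B} implies A ->-> B over ABCD: the FD is an equality-generating
# dependency, the MVD a tuple-generating one, on the same two hypothesis rows.
hyp = [('a', 'b1', 'c1', 'd1'), ('a', 'b2', 'c2', 'd2')]
fd = (hyp, ('eq', ('b1', 'b2')))
mvd = (hyp, ('row', ('a', 'b1', 'c2', 'd2')))
print(chase_implies([fd], mvd))                  # prints True
```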
    a1   b1   c1   d1
    a1   b2   c2   d2
    -----------------
    a1   b1   c2   d3

    (a) A →→ B | C

    a2   b3   c3   d4
    a3   b3   c4   d5
    -----------------
    d4 = d5

    (b) B → D

    a4   b4   c5   d6
    a4   b5   c6   d7
    -----------------
    a4   b6   c5   d7

    (c) A →→ C | D

Figure 7.11 Example dependencies.
Example 7.27: Example 7.8 was really an application of the chase algorithm
to make the inferences {S → A, SI → P} ⊨ ⋈ (SA, SIP) and
{A → C, B → C, C → D, DE → C, CE → A} ⊨ ⋈ (AD, AB, BE, CDE, AE)
As another example, we can show that over the set of attributes ABCD
    a4   b4   c5   d6
    a4   b5   c6   d7
EXERCISES
7.1: Suppose we have a database for an investment firm, consisting of the follow
ing attributes: B (broker), O (office of a broker), I (investor), S (stock), Q
(quantity of stock owned by an investor), and D (dividend paid by a stock),
with the following functional dependencies: S → D, I → B, IS → Q, and
B → O.
a) Find a key for the relation scheme R = BOSQID.
b) How many keys does relation scheme R have? Prove your answer.
c) Find a lossless join decomposition of R into Boyce-Codd normal form.
d) Find a decomposition of R into third normal form, having a lossless
join and preserving dependencies.
7.2: Suppose we choose to represent the relation scheme R of Exercise 7.1 by
the two schemes ISQD and IBO. What redundancies and anomalies do
you foresee?
7.3: Suppose we instead represent R by SD, IB, ISQ, and BO. Does this
decomposition have a lossless join?
7.4: Suppose we represent R of Exercise 7.1 by ISQ, IB, SD, and ISO. Find
minimal covers for the dependencies (from Exercise 7.1) projected onto
each of these relation schemes. Find a minimal cover for the union of the
projected dependencies. Does this decomposition preserve dependencies?
7.5: In the database of Exercise 7.1, replace the functional dependency S → D
by the multivalued dependency S →→ D. That is, D now represents the
dividend "history" of the stock.
a) Find the dependency basis of /.
b) Find the dependency basis of BS
c) Find a fourth normal form decomposition of R.
7.6: Consider a database of ship voyages with the following attributes: S (ship
name), T (type of ship), V (voyage identifier), C (cargo carried by one
ship on one voyage), P (port), and D (day). We assume that a voyage
consists of a sequence of events where one ship picks up a single cargo,
and delivers it to a sequence of ports. A ship can visit only one port in a
single day. Thus, the following functional dependencies may be assumed:
S → T, V → SC, and SD → PV.
a) Find a lossless-join decomposition into BCNF.
b) Find a lossless-join, dependency-preserving decomposition into 3NF.
* c) Explain why there is no lossless-join, dependency-preserving BCNF
decomposition for this database.
436
7.7: Let U be a set of attributes and D a set of dependencies (of any type) on
the attributes of U. Define SAT(D) to be the set of relations r over U such
that r satisfies each dependency in D. Show the following.
a) SAT(D1 ∪ D2) = SAT(D1) ∩ SAT(D2).
b) If D1 logically implies all the dependencies in D2, then
SAT(D1) ⊆ SAT(D2).
7.8: Complete the proof of Lemma 7.1; i.e., show that the transitivity axiom
for functional dependencies is sound.
7.9: Complete the proof of Theorem 7.2 by showing statement (*):
If X1 ⊆ X2, then X1(j) ⊆ X2(j) for all j.
7.10: Let F be a set of functional dependencies.
a) Show that X → A in F is redundant if and only if X+ contains A,
when the closure is computed with respect to F − {X → A}.
b) Show that attribute B in the left side X of a functional dependency
X → A is redundant if and only if A is in (X − {B})+, when the
closure is taken with respect to F.
* 7.11: Show that singleton left sides are insufficient for functional dependencies.
That is, show there is some functional dependency that is not equivalent
to any set of functional dependencies {A1 → B1, . . . , Ak → Bk}, where the
A's and B's are single attributes.
* 7.12: Develop the theory of functional dependencies with single attributes on the
left and right sides (call them SAFD's). That is:
a) Give a set of axioms for SAFD's; show that your axioms are sound
and complete.
b) Give an algorithm for deciding whether a set of SAFD's implies an
other SAFD.
c) Give an algorithm to test whether two sets of SAFD's are equivalent.
d) SAFD's look like a familiar mathematical model. Which?
* 7.13: In Theorem 7.3 we used two transformations on sets of functional depen
dencies to obtain a minimal cover:
i) Eliminate a redundant dependency.
ii) Eliminate a redundant attribute from a left side.
Show the following:
a) If we first apply (ii) until no more applications are possible and then
apply (i) until no more applications are possible, we always obtain a
minimal cover.
b)
If we apply first (i) until no longer possible, then apply (ii) until no
longer possible, we do not necessarily reach a minimal cover.
7.14: A relation scheme R is said to be in second normal form if whenever X → A
is a dependency that holds in R, and A is not in X, then either A is prime
or X is not a proper subset of any key (the possibility that X is neither a
subset nor a superset of any key is not ruled out by second normal form).
Show that the relation scheme SAIP from Example 7.14 violates second
normal form.
7.15: Show that if a relation scheme is in third normal form, then it is in second
normal form.
7.16: Consider the relation scheme with attributes S (store), D (department),
I (item), and M (manager), with functional dependencies SI → D and
SD → M.
a)
b)
* 7.17: Give an O(n) algorithm for computing X+, where X is a set of at most n
attributes, with respect to a set of functional dependencies that require no
more than n characters, when written down.
* 7.18: Complete the proof of Theorem 7.5 by providing a formal proof that in the
row for R1, an a is entered if and only if R1 ∩ R2 → A.
7.19: Complete the proof of Lemma 7.5 by showing that if r ⊆ s then
* 7.37: Near the beginning of Section 7.10 we claimed that we could project a set
of multivalued and functional dependencies D onto a set of attributes S by
the following rules (somewhat restated).
i) X → Y is in πS(D) if and only if XY ⊆ S and X → Y is in D+.
ii) X →→ Y is in πS(D) if and only if X ⊆ S, and there is some multi
valued dependency X →→ Z in D+, such that Y = Z ∩ S.
Prove this contention.
7.38: Show that the decomposition (CHR, CT, CSG) obtained in Example 7.20
is not lossless with respect to the given functional dependencies only; i.e.,
the multivalued dependency C →→ HR is essential to prove the lossless
join.
7.39: Use the chase algorithm to tell whether the following inferences are valid
over the set of attributes ABCD.
a) {A → B, A →→ C} ⊨ A →→ D
b) {A →→ B | C, B →→ C | D} ⊨ A →→ C | D
c) {A →→ B | C, A → D} ⊨ A →→ C | D
**d) {A →→ B | C, A →→ C | D} ⊨ A →→ B | D
* 7.40: Show that no collection of tuple-generating dependencies can imply an
equality-generating dependency.
7.41: State an algorithm to determine, given a collection of functional, (full)
multivalued, and (full) join dependencies, whether a given decomposition
has a lossless join.
7.42: Show that the multivalued dependency X →→ Y over the set of attributes
U is equivalent to the join dependency ⋈ (XY, XZ), where Z = U − X − Y.
Hint: Write both as generalized dependencies.
7.43: What symbol mapping explains the application of Figure 7.11(b) to Figure
7.12(b) to deduce Figure 7.12(c)?
* 7.44: Show that Theorem 7.11, stated for functional and multivalued dependen
cies, really holds for arbitrary generalized dependencies. That is, (Ri,R2)
has a lossless join with respect to a set of generalized dependencies D if
and only if (R1 ∩ R2) →→ (R1 − R2).
* 7.45: Show that if decomposition p = (Ri,...,Rk) has a lossless join with
respect to a set of generalized dependencies D, then the decomposition
(R1, . . . , Rk, S) also has a lossless join with respect to D, where S is an
arbitrary relation scheme over the same set of attributes as ρ.
* 7.46: Show that it is NP-hard (NP-complete or harder; see Garey and Johnson
[1979]) to determine:
a) Given a relation scheme R and a set of functional dependencies F on
the attributes of R, whether R has a key of size k or less with respect
to F.
b) Given R and F as in (a), and given a subset S ⊆ R, is S in BCNF
with respect to F?
c) Whether a given set of multivalued dependencies implies a given join
dependency.
* 7.47: A unary inclusion dependency A ⊆ B, where A and B are attributes (per
haps from different relations), says that in any legal values of the relation(s),
every value that appears in the column for A also appears in the column
for B. Show that the following axioms
i) A ⊆ A for all A.
ii) If A ⊆ B and B ⊆ C, then A ⊆ C.
a) Show that if relations are assumed to be finite, then all the above
dependencies can be reversed; that is,
A2 ⊆ A1, A2 → A3, A4 ⊆ A3, A4 → A5, . . . , An ⊆ An−1, An → A1
b) Show that there are infinite relations for which (a) does not hold; that
is, they satisfy all the given dependencies but not their reverses.
* 7.49: Show that if D is a set of functional dependencies only, then a relation R
is in BCNF with respect to D if and only if R is in 4NF with respect to D.
* 7.50: Show that if X → A1, . . . , X → An are functional dependencies in a mini
mal cover, then the scheme XA1 · · · An is in 3NF.
BIBLIOGRAPHIC NOTES
Maier [1983] is a text devoted to relational database theory, and provides a
more detailed treatment of many of the subjects covered in this chapter. Fagin
and Vardi [1986] and Vardi [1988] are surveys giving additional details in the
area of dependency theory. Beeri, Bernstein, and Goodman [1978] is an early
survey of the theory that provided the motivation for the area.
Functional Dependencies
Functional dependencies were introduced by Codd [1970]. Axioms for func
tional dependencies were first given by Armstrong [1974]; the particular set of
axioms used here (called "Armstrong's axioms") is actually from Beeri, Fagin,
and Howard [1977]. Algorithm 7.1, the computation of the closure of a set of
attributes, is from Bernstein [1976].
Lossless-Join Decomposition
Algorithm 7.2, the lossless join test for schemes with functional dependencies,
is from Aho, Beeri, and Ullman [1979]. The special case of the join of two
relations, Theorem 7.5, was shown in the "if" direction by Heath [1971] and
Delobel and Casey [1972] and in the opposite direction by Rissanen [1977].
Liu and Demers [1980] provide a more efficient lossless join test for schemes
with functional dependencies. Testing lossless joins is equivalent to inferring a
join dependency, so the remarks below about inference of generalized depen
dencies are relevant to lossless-join testing.
Dependency-Preserving Decomposition
Algorithm 7.3, the test for preservation of dependencies, is by Beeri and Honeyman [1981].
The paper by Ginsburg and Zaiddan [1982] points out that when projected,
functional dependencies imply certain other dependencies, which happen to
be equality-generating, generalized dependencies, but are not themselves func
tional. As a result, when we discuss projected dependencies, we must be very
careful to establish the class of dependencies about which we speak.
Graham and Yannakakis [1984] discuss "independence," a condition on a
decomposition that allows satisfaction of dependencies to be checked in the
individual relations of a decomposition.
Gottlob [1987] gives an algorithm to compute a cover for πR(F) directly
from F; that is, it is not necessary to compute F+ first. However, the algorithm
is not guaranteed to run in polynomial time.
Normal Forms and Decomposition
Third normal form is defined in Codd [1970] and Boyce-Codd normal form in
Codd [1972a]. The definitions of first and second normal forms are also found
in these papers.
The dependency-preserving decomposition into third normal form, Algo
rithm 7.5, is from Bernstein [1976], although he uses a "synthetic" approach,
designing a scheme without starting with a universal relation. Theorem 7.3,
the minimal cover theorem used in Algorithm 7.5, is also from Bernstein [1976];
more restrictive forms of cover are found in Maier [1980, 1983].
The lossless-join decomposition into BCNF given in Algorithm 7.4 is from
Tsou and Fischer [1982]. Theorem 7.8, giving a 3NF decomposition with a
lossless join and dependency preservation, is from Biskup, Dayal, and Bernstein
[1979]. A related result appears in Osborn [1977].
The equivalence problem for decompositions of a given relation was solved
by Beeri, Mendelzon, Sagiv, and Ullman [1981]. Ling, Tompa, and Kameda
[1981] generalize the notion of third normal form to account for redundancies
across several different relation schemes.
Schkolnick and Sorenson [1981] consider the positive and negative conse
quences of normalizing relation schemes.
Additional Properties of Decompositions
There is a distinction between databases whose relations satisfy the projected
dependencies, and databases whose relations are projections of a single relation
that satisfies the given dependencies. Maier, Mendelzon, Sadri, and Ullman [1980]
show that these notions are equivalent for functional dependencies, but not for
multivalued dependencies.
Honeyman [1983] offers an appropriate definition for what it means for a
decomposition (database scheme) to satisfy a functional dependency. Graham,
Mendelzon, and Vardi [1986] discuss the extension of this question to generalized
dependencies.
Recognizing Normalized Relations
Osborn [1979] gives a polynomial-time algorithm to tell whether a given relation
scheme R is in BCNF, with respect to a given set of dependencies F over R.14
In contrast, Jou and Fischer [1983] show that telling whether R is in third
normal form with respect to F is NP-complete.
Multivalued Dependencies
Multivalued dependencies were discovered independently by Fagin [1977], Delobel [1978], and Zaniolo [1976] (see also Zaniolo and Melkanoff [1981]), although
the earliest manifestation of the concept is in Delobel's thesis in 1973.
The axioms for multivalued dependencies are from Beeri, Fagin, and
Howard [1977]. The independence of subsets of these axioms was considered by
Mendelzon [1979], while Biskup [1980] shows that if one does not assume a fixed
set of attributes, then this set minus the complementation axiom forms a sound
and complete set. Lien [1979] develops axioms for multivalued dependencies on
the assumption that null values are permitted.
Sagiv et al. [1981] show the equivalence of multivalued dependency theory
to a fragment of propositional calculus, thus providing a convenient notation in
which to reason about such dependencies.
The dependency basis and Algorithm 7.6 are from Beeri [1980]. Hagihara
et al. [1979] give a more efficient test whether a given multivalued dependency
is implied by others, and Galil [1982] gives an even faster way to compute the
dependency basis.
Embedded multivalued dependencies were considered by Fagin [1977], Delobel [1978] and Tanaka, Kambayashi, and Yajima [1979].
More Normal Forms
Fourth normal form was introduced in Fagin [1977]. In Fagin [1981] we find an
"ultimate" normal form theorem; it is possible to decompose relation schemes so
14 The reader should not confuse this result with Exercise 7.46(b). The latter
indicates that telling whether a relation scheme R is in BCNF, given a set of functional
dependencies defined on a superset of R, is NP-complete.
The "chase" as an algorithm for inferring dependencies has its roots in the
lossless join test of Aho, Beeri, and Ullman [1979]. The term "chase," and its
first application to the inference of dependencies, is found in Maier, Mendelzon,
and Sagiv [1979]. Its application to generalized dependencies is from Beeri and
Vardi [1984b].
The undecidability of implication for generalized tuple-generating depen
dencies was shown independently by Vardi [1984] and Gurevich and Lewis
[1982]. Key results leading to the undecidability proof were contained in earlier
papers by Beeri and Vardi [1981] and Chandra, Lewis, and Makowsky [1981].
Axiomatization of Generalized Dependencies
Several sound and complete axiom systems for generalized dependencies are
found in Beeri and Vardi [1984a] and Sadri and Ullman [1981]. Yannakakis and
Papadimitriou [1980] gives an axiom system for algebraic dependencies.
Inclusion Dependencies
Notes on Exercises
Exercise 7.13 (on the order of reductions to produce a minimal cover) is from
Maier [1980]. Exercise 7.17 (efficient computation of the closure of a set of
attributes) is from Bernstein [1976], although the problem is actually equivalent
to the problem of telling whether a context-free grammar generates the empty
string.
Exercise 7.32, the soundness and completeness of axioms A1-A8 for func
tional and multivalued dependencies, is proved in Beeri, Fagin, and Howard
[1977]. The algorithm for projecting functional and multivalued dependencies,
Exercise 7.37, was proved correct in Aho, Beeri, and Ullman [1979].
Exercise 7.46(a), the NP-completeness of telling whether a relation scheme
has a key of given size, is by Lucchesi and Osborn [1978]; part (b), telling
whether a relation scheme is in BCNF, is from Beeri and Bernstein [1979], and
part (c), inferring a join dependency from multivalued dependencies, is from
Fischer and Tsou [1983].
Exercise 7.48 is from Kanellakis, Cosmadakis, and Vardi [1983]; it is the key
portion of a polynomial-time algorithm for making inferences of dependencies
when given a set of functional dependencies and unary inclusion dependencies.
CHAPTER 8
Protecting the Database Against Misuse
There are several dangers from which a DBMS must protect its data:
1. Accidents, such as mistyping of input or programming errors.
2. Malicious use of the database.
3. Hardware or software failures that corrupt data.
Chapters 9 and 10 deal with item (3), as well as with a class of potential
programming errors that are caused by concurrent access to the data by several
processes. In this chapter we cover the DBMS components that handle the first
two problems.
1. Integrity preservation. This component of a DBMS deals with nonmalicious
data errors and their prevention. For example, it is reasonable to expect a
DBMS to provide facilities for declaring that the value of a field AGE should
be less than 150. The DBMS can also help detect some programming bugs,
such as a procedure that inserts a record with the same key value as a record
that already exists in the database.
2. Security (or access control). Here we are concerned primarily with restrict
ing certain users so they are allowed to access and/or modify only a subset
of the database. It might appear that any attempt on the part of a user to
access a restricted portion of the database would be malicious, but in fact
a programming error could as well cause the attempted access to restricted
data.
In this chapter, we give some general principles and some simple examples
of how integrity constraints and access control are handled in existing database
systems. Sections 8.1 and 8.3 cover integrity and security, respectively, from a
general perspective. Section 8.2 discusses integrity in Query-by-Example. In
the last three sections we cover three examples of security mechanisms: Query-by-Example, SQL, and OPAL.
8.1 INTEGRITY
There are two essentially different kinds of constraints we would like a DBMS to
enforce. As discussed at the beginning of Chapter 7, one type is structural, con
cerning only equalities among values in the database. By far the most prevalent
instances of such constraints are what we there called functional dependencies.
Many, but not all, functional dependencies can be expressed if the DBMS allows
the user to declare that a set of fields or attributes forms a key for a record
type or relation.
The need to express functional dependencies is not restricted to relational
systems, nor do all relational systems have such a facility, explicitly. For exam
ple, the hierarchical system IMS allows the user to declare one field of a logical
record type to be "unique," meaning that it serves as a key for that type. A
unique field in the root record type serves as a key for database records, as
well as for records of the root type. Also, the unique field for any record type,
together with the unique fields for all of its ancestor record types, will serve as
a key for that record type.
The second kind of integrity constraint concerns the actual values stored
in the database. Typically, these constraints restrict the value of a field to
some range or express some arithmetic relationship among various fields. For
example, a credit union might expect that the sum of the BALANCE field,
taken over all members of the credit union, equals the net assets of the union.
As another example, if the record for a course contained fields E%, H%, and
L%, indicating the percentage of the grade devoted to exams, homework, and
labs, we would expect that in each such record the sum of the values in these
fields is 100. This is the kind of integrity constraint we shall discuss here.
There are two important issues regarding integrity checking. First we dis
cuss the way constraints can be expressed, and we show how taking "derivatives"
of integrity constraints can often lead to an efficient way to perform the checks.
Second, we discuss how the system can determine when integrity checks need
to be made, and we illustrate with the DBTG approach one way the user can
control such checks.
language. In this section, we shall consider the matter abstractly, using rela
tional algebra as our constraint language.
Example 8.1: Referring to the YVCB database again, we could write as the
containment of two queries the constraint that orders can only be entered if
placed by people who are customers, i.e., those listed in the CUSTOMERS
relation. These queries can be expressed in any notation; we shall use relational
algebra as an example, and write
πCUST(ORDERS) ⊆ πCNAME(CUSTOMERS)
(8.1)
D
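A direct test of constraint (8.1) on stored relations can be sketched as follows; the attribute names are those of the YVCB schema, while the representation of relations as lists of dictionaries and the function name are assumptions of the sketch.

```python
def orders_have_registered_customers(orders, customers):
    """Constraint (8.1): every CUST value appearing in ORDERS must also
    appear as a CNAME value in CUSTOMERS."""
    known_names = {c['CNAME'] for c in customers}
    return all(o['CUST'] in known_names for o in orders)
```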
The rules for taking the "derivative" of monotone expressions are given below.
It should be understood that ΔE, the "change in the value of expression E,"
is really an upper bound on that change, because of the effects of projection,
which we shall discuss in the proof of Theorem 8.1, and of union. Suppose
we have a database with relations R1, . . . , Rn, and we insert into each relation
Ri the set of tuples ΔRi (which may be empty for some i's). Let E be an
expression of relational algebra involving the operations ×, ∪, σ, and π. Then
ΔE, a set of tuples that includes all of those tuples that were not in the value
of E before the ΔRi's were inserted into the Ri's, but are in the value of E
afterward, is defined by:
1. If E is a constant relation, then ΔE = ∅.
2. If E is a relation variable, say Ri, then ΔE = ΔRi.
3. If E = σF(E1), then ΔE = σF(ΔE1).
4. If E = πL(E1), then ΔE = πL(ΔE1).
5. If E = E1 ∪ E2, then ΔE = ΔE1 ∪ ΔE2.
6. If E = E1 × E2, then ΔE = (E1 × ΔE2) ∪ (ΔE1 × E2) ∪ (ΔE1 × ΔE2).
The same rules apply if the ΔRi's are deletions from each of the Ri's.
Then, ΔE is an upper bound on the set of tuples deleted from E. However, if
we want to develop rules that handle combinations of insertions and deletions
at the same time, then we have much of the complexity that we face when we
consider set difference with insertions only.
Fortunately, if we are only concerned with checking integrity constraints
expressible in monotone relational algebra, then deletions cannot contribute
to violations of the constraints. If E is an integrity constraint, a function of
database relations R1, . . . , Rn, we have only to compute ΔE, by the above
rules, and check that this relation is empty. As we mentioned, ΔE is only an
upper bound on the set of tuples that newly appear in the value of expression
E. However, as we shall show in Theorem 8.1, any tuples in ΔE that are not
newly inserted are tuples that were in the relation denoted by E even before
insertion, and thus these are violations of the integrity constraint anyway.
Example 8.3: The rule body (8.2) is easily seen equivalent to the relational
algebra expression
E = σ_{B<0}(CUSTOMERS(C, A, B))
Then by rule (3),
ΔE = σ_{B<0}(ΔCUSTOMERS(C, A, B))
That is, if we insert a new customer or customers (the members of the set of
tuples ΔCUSTOMERS), the change in the expression E, which represents the
violations of the integrity constraint (8.2), is computed by applying the selection
for B < 0 to the inserted tuple or tuples.
As another, abstract example, consider
E = R(A, B) ⋈ S(B, C) = π_{1,2,4}(σ_{$2=$3}(R × S))
Then
ΔE = π_{1,2,4}(σ_{$2=$3}((R × ΔS) ∪ (ΔR × S) ∪ (ΔR × ΔS))) =
(R ⋈ ΔS) ∪ (ΔR ⋈ S) ∪ (ΔR ⋈ ΔS)
Note that the above steps did not depend on the particular attributes of R and
S, and therefore illustrate the fact that natural join also behaves like multipli
cation as far as the taking of "derivatives" is concerned. □
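Rules (1) through (6) can be coded directly. In the Python sketch below, the nested-tuple encoding of monotone relational-algebra expressions is an assumption of the sketch; delta computes the upper bound ΔE described above, and evaluate is ordinary evaluation, which rule (6) needs because it refers to the old values of the operands.

```python
def delta(expr, rels, deltas):
    """Upper bound on the change in a monotone expression when the tuples in
    `deltas` are inserted into the relations in `rels`, following rules (1)-(6).
    Expressions are nested tuples: ('rel', name), ('const', tuples),
    ('select', pred, e), ('project', cols, e), ('union', e1, e2),
    ('product', e1, e2)."""
    op = expr[0]
    if op == 'const':
        return set()                                              # rule 1
    if op == 'rel':
        return set(deltas.get(expr[1], set()))                    # rule 2
    if op == 'select':
        return {t for t in delta(expr[2], rels, deltas) if expr[1](t)}   # rule 3
    if op == 'project':
        return {tuple(t[i] for i in expr[1])
                for t in delta(expr[2], rels, deltas)}            # rule 4
    if op == 'union':
        return delta(expr[1], rels, deltas) | delta(expr[2], rels, deltas)  # rule 5
    if op == 'product':                                           # rule 6
        v1, v2 = evaluate(expr[1], rels), evaluate(expr[2], rels)
        d1, d2 = delta(expr[1], rels, deltas), delta(expr[2], rels, deltas)
        return ({a + b for a in v1 for b in d2} |
                {a + b for a in d1 for b in v2} |
                {a + b for a in d1 for b in d2})
    raise ValueError(op)

def evaluate(expr, rels):
    """Ordinary evaluation of the same expression forms, on the old relations."""
    op = expr[0]
    if op == 'const':   return set(expr[1])
    if op == 'rel':     return set(rels[expr[1]])
    if op == 'select':  return {t for t in evaluate(expr[2], rels) if expr[1](t)}
    if op == 'project': return {tuple(t[i] for i in expr[1])
                                for t in evaluate(expr[2], rels)}
    if op == 'union':   return evaluate(expr[1], rels) | evaluate(expr[2], rels)
    if op == 'product': return {a + b for a in evaluate(expr[1], rels)
                                      for b in evaluate(expr[2], rels)}
    raise ValueError(op)
```

For the constraint of Example 8.3, the expression would be written ('select', lambda t: t[2] < 0, ('rel', 'CUSTOMERS')), and delta applied to a set of inserted CUSTOMERS tuples selects exactly the offending new tuples.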
Theorem 8.1: If the ΔRi's above are sets of insertions, then ΔE contains all
of the new tuples in the relation produced by expression E; if the ΔRi's are
sets of deletions, then ΔE contains all of the tuples that are no longer in E.
In each case, there can be some tuples in ΔE that are not inserted into (resp.
deleted from) E, but these tuples are in E both before and after the insertions
(resp. deletions).
Proof: The proof is an induction on the number of operators in the expression
E. We shall do only the case of insertions and rule (4), the projection rule. The
basis and the remaining cases of the induction are left as an exercise.
Suppose that E = πL(F) [F is E1 in rule (4)], and the values of E and
F before the insertion are Eold and Fold; after insertion their values are Enew
and Fnew. Then by the inductive hypothesis, Fnew − Fold ⊆ ΔF, and the
tuples in ΔF that are not in Fnew − Fold are in both Fnew and Fold. Put in
an algebraically equivalent way, ΔF ⊆ Fnew. Now the set of tuples in Enew
that are not in Eold is πL(Fnew) − πL(Fold). Call this set S; we must show that
S ⊆ ΔE = πL(ΔF), and that tuples in ΔE − S are in Eold (and therefore in
both Enew and Eold, since we assume only insertions are made).
check that the customer name in each inserted order is already a customer in the
CUSTOMERS relation. Note that some of the customers placing new orders
may already have orders on record, so they are not really "new" customers;
they are in ΔE, but not in Enew − Eold, using the notation found in the proof
of Theorem 8.1. □
The cases where E or F are not monotone and where these expressions
share relations as operands are harder. We leave it as an exercise to develop
useful bounds on the set of tuples that must be checked.
Controlling the Time of Integrity Checks
Instead of trying to check automatically only those integrity constraints that could
be violated when an insertion or deletion is made, many DBMS's simply allow
the user to execute a checking program that is triggered by certain events that
the user declares to be triggering events, such as insertion into, or deletion from,
a given relation. The general idea is that the integrity constraints are allowed
to function as high-level "interrupts," like ON conditions in PL/I.
For example, the DBTG proposal allows ON clauses of the form
ON <command list> CALL <procedure>
in the declaration of DBTG sets and record types. For a DBTG set, the
<command list> may include any of INSERT, REMOVE, and FIND. The
<procedure> is an arbitrary routine written in the DBTG data manipulation
language, which is an extension of COBOL, and thus has full computing capa
bility as well as the ability to access any part of the database. For example, if
we declare for DBTG set S:
ON INSERT CALL P1
the procedure P1 could check that certain fields of the current of run-unit, which
is the member record being inserted, are not already present in the selected set
occurrence. Thus, these fields, plus a key for the owner record type, functionally
determine the rest of the fields of the member type.
The <command list> for an ON clause in a record type declaration can
include any of the above three commands that are permitted in DBTG set
declarations and also the remaining four: STORE, DELETE, MODIFY, and
GET. Such an ON clause is triggered whenever a command in the list is executed
and the current of run-unit is of the relevant record type.
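The flavor of such ON clauses can be imitated in a few lines. The Python class below is a toy illustration with invented names, not a rendering of the DBTG data manipulation language; it simply runs registered checking procedures whenever an insertion is attempted.

```python
class Relation:
    """Toy relation whose insertions trigger user-registered checking
    procedures, in the spirit of ON <command list> CALL <procedure>."""
    def __init__(self, name):
        self.name, self.tuples, self.handlers = name, [], {}

    def on(self, event, procedure):
        self.handlers.setdefault(event, []).append(procedure)

    def insert(self, t):
        for proc in self.handlers.get('INSERT', []):
            proc(self, t)             # a procedure may raise to reject the change
        self.tuples.append(t)

customers = Relation('CUSTOMERS')

def check_balance(rel, t):
    # Reject an insertion whose BALANCE field is below -100.
    if t['BALANCE'] < -100:
        raise ValueError('integrity violation: BALANCE below -100')

customers.on('INSERT', check_balance)
customers.insert({'NAME': 'Zack Zebra', 'ADDR': '', 'BALANCE': -50})
```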
8.2 INTEGRITY CONSTRAINTS IN QUERY-BY-EXAMPLE
To demonstrate how the ideas of the previous section can be put into practice,
we shall discuss integrity in the Query-by-Example system in detail. First, if
we review Section 4.5, we note that when a relation is declared in QBE, we are
allowed to specify whether each field is key or nonkey. The system then enforces
the functional dependency of each nonkey field on the set of key fields taken
together. This integrity check is triggered on each insertion or modification
of a tuple in the relation, and operations that would cause a violation of the
dependency are not done; rather, a warning is printed.
The QBE system maintains a constraint table for each relation. To create
a constraint on relation R, we call for a table skeleton for R. We enter one or
more rows representing the constraints into the skeleton. Below the relation
name we enter
I. CONSTR(<condition list>). I.
The first I. refers to the constraint itself and the second I. to the entries
defining the constraint, which are in the portion of the row that follows to
the right. The <condition list> can consist of any or all of I. (insert), D.
(delete), U. (update), and identifiers that represent user defined conditions, to
be described subsequently. The terms in the <condition list> indicate when
the integrity constraint is to be tested; for example, CONSTR(I.,U.). tells
us to test the constraint whenever an insertion or modification occurs in the
relevant relation. CONSTR. is short for CONSTR(I.,D.,U.). In principle, the
constraint applies to all of the tuples in the relation. However, for many simple
constraints, the system can deduce that only the tuple inserted or modified
needs to be checked, as we discussed in Section 8.1.
In the rows of the skeleton, we place entries for some or all of the attributes.
An entry may be a constant, which says the tuple being inserted, deleted, or
modified must have that constant value for that attribute, or the constraint
does not apply. An entry may be of the form Oc, where c is a constant and 0 an
arithmetic comparison, which says that the corresponding component of a tuple
must stand in relation 0 to c, whenever the constraint applies to the tuple. An
entry can be blank or have a variable name beginning with underscore, which
means the tuple can be arbitrary in that attribute. Moreover, there can be
additional rows entered in the skeleton for R or in another skeleton; these rows
place additional constraints on the values that may appear in the tuple being
inserted, deleted, or modified, according to the semantics of the QBE language.
Example 8.5: Let us once more consider the YVCB database. To place the
constraint on balances that no one owe more than 100 dollars, we could call for
a CUSTOMERS skeleton and enter
CUSTOMERS            | NAME | ADDR | BALANCE
---------------------+------+------+---------
I. CONSTR(I.,U.). I. |      |      | >= -100
a value for _hotdog equal to the value of the ITEM attribute in the inserted
tuple, must be such that some tuple in the SUPPLIES relation has that value
for its ITEM attribute. D
INCLUDES          | O# | ITEM    | QUANTITY
------------------+----+---------+---------
I. CONSTR(I.). I. |    | _hotdog |

SUPPLIES | NAME | ITEM    | PRICE
---------+------+---------+------
         |      | _hotdog |

Figure 8.1 Constraint that orders may only include supplied items.
CUSTOMERS | NAME       | ADDR | BALANCE
----------+------------+------+---------
   ...    | Zack Zebra |      |
   ...    |            |      | > -50
The first row indicates that there is a defined trigger called ZZlim that is "trig
gered" whenever we modify or insert a tuple for Zebra. The second row says
that if the CUSTOMERS tuple for Zebra is inserted or modified, check that his
new balance is not lower than -49.99. The tuples for other members are not
affected by this constraint. D
Old-New Constraints
Sometimes one wishes to constrain updates in such a way that there is a re
lationship between the old and new values for certain attributes. We include
in the constraint specification a line representing the old tuple as well as the
constraint tuple itself. Often the QBE language allows the relationship between
the old and new tuples to be expressed in the tuples themselves, but if not, a
condition box can be used.
Example 8.7: To create a constraint that a supplier cannot raise the price of
Brie we enter:
SUPPLIES          | NAME | ITEM | PRICE
------------------+------+------+-------
I. CONSTR(U.). I. | _bmw | Brie | <= _p
I.                | _bmw | Brie | _p
The row with the keyword CONSTR . represents the new value, and the other row
represents the old value. The presence of I . in the latter row distinguishes the
old-new type of constraints from a general constraint requiring more than one
row to express, as in the second part of Example 8.5. The presence of variable
_bmw in both rows is necessary, or else we would only check that the new price
for the supplier involved in the change is less than the price charged for Brie
by at least one other supplier. D
Timing of Constraint Enforcement
The QBE system allows one to enter an entire screenful of commands at once,
and this collection of commands may include several insertions, deletions, or
updates. It is important to note that integrity constraints are not checked as
each command in the collection is executed, but only after all of the commands
in the collection are executed. This feature allows us certain freedoms in the
order in which we specify commands, as long as the commands are entered
together.
Thus, in Example 8.5 we constrained our YVCB database in such a way
that we could not place an order for an item not supplied. If we enter as
one "screenload" an order for Goat Cheese and fact that Acme now sells Goat
Cheese, we would not violate the constraint. However, if the system entered
the orders and checked the integrity constraints before entering the new supply
information, we would have had an integrity violation.
The Constraint Table
All integrity constraints declared are available to the user. We can print the
constraints pertaining to a relation R if we enter
P. CONSTR. P.
under the relation name in a skeleton for R. Alternatively, we could print only
the constraints of specified type; for example
P. CONSTR(I.). P.
8.3 SECURITY
Many of the problems associated with security are not unique to database sys
tems, but must be faced by the designer of an operating system, for exam
ple. Therefore, let us touch on some of the techniques common to security
for database systems and more general systems, and then turn to some of the
specialized problems and techniques germane to existing database systems.
1. User identification. Generally, different users are accorded different rights
to different databases or different portions of the database, such as particu
lar relations or attributes. These rights may include the reading of portions
of the database, and the insertion, deletion, or modification of data. The
most common scheme to identify users is a password known only to the
system and the individual. Presumably, the passwords are protected by
the system at least as well as the data, although to be realistic, guarantees
or proofs of security are nonexistent.
2. Physical Protection. A completely reliable protection scheme must take
into account the possibility of physical attacks on the database, ranging
from forced disclosure of a password to theft of the physical storage de
vices. We can protect against theft fairly well by encrypting the data. A
high security system needs better identification than a password, such as
personal recognition of the user by a guard.
3. Maintenance and Transmittal of Rights. The system needs to maintain a
list of rights enjoyed by each user on each protected portion of the database.
One of these rights may be the right to confer rights on others. For exam
ple, the DBTG proposal calls for DBTG sets, record types, and "areas"
(essentially regions of the physical memory) to be protectable; the mech
anism could be a password for each protected object. The proposal does
not call for a table of user rights to protected objects, and transmission of
rights can be handled outside the system, by informing users of passwords,
for example. Both System R and the Query-by-Example System (to be
discussed further in Section 8.4) maintain a table of rights and permit the
granting of rights to others.
Quel and QBE follow this general approach; we shall discuss Query-by-Example's security mechanism in detail in the next section. The DBTG pro
posal allows the "privacy lock" for a protectable object to be an arbitrary pro
cedure, so we are able to implement arbitrary checks, expressed in the DBTG
data manipulation language, for granting or denying a request to access a pro
tected object. For example, we could check that NAME = "Acme" in every
tuple retrieved.
8.4 SECURITY IN QUERY-BY-EXAMPLE
The QBE system recognizes the four rights: insert (I.), delete (D.), update
(U.), and read (P., for "print"). To confer one or more rights to a relation R
upon a person or group of people, the owner of relation R enters a tuple in an
R skeleton. Under the relation name R appears the entry
I. AUTR(<list>). <name> I.
where <list> is a list of one or more of the four rights, I., D., U., and P.;
<name> is either the name of the person being given the rights or a variable,
representing an arbitrary person. We may omit (<list>) if we intend to grant
all four rights, and we may omit <name> if we wish to grant a set of rights to
all users.
To complete the row with the AUTR. keyword, we enter variables or con
stants in some or all of the columns for the attributes. A variable indicates
that the right applies to the column. A constant indicates the right applies
only to tuples with that constant value in that column. A blank indicates that
the column cannot be accessed. Note that this rule differs from the general
QBE policy that blanks are synonymous with variables mentioned only once.
The full power of the QBE language can be brought to bear to refine the set of
tuples in the relation R to which the right is granted. For example, we can use
condition boxes to constrain the values of variables, and we can add additional
rows that also restrict values of variables.
Example 8.9: Let us again use the YVCB database as an example. To give
user Zebra the right to read the ORDERS relation we say
ORDERS                | O# | DATE | CUST
----------------------+----+------+-----
I. AUTR(P.). Zebra I. | _n | _d   | _c
To grant Zebra all four access rights to the ORDERS relation we can write
ORDERS            | O# | DATE | CUST
------------------+----+------+-----
I. AUTR. Zebra I. | _n | _d   | _c
To give anyone the right to read names and balances (but not addresses) from
the CUSTOMERS relation, provided the balance is nonnegative, we say
CUSTOMERS              | NAME | ADDR | BALANCE
-----------------------+------+------+---------
I. AUTR(P.). _Snake I. | _n   |      | >= 0
INCLUDES               | O# | ITEM    | QUANTITY
-----------------------+----+---------+---------
I. AUTR(P.). _Snake I. | _n | _hotdog |

SUPPLIES | NAME | ITEM    | PRICE
---------+------+---------+------
         | Acme | _hotdog |

Figure 8.2 Anyone may read order numbers for items supplied by Acme.
When a variable, rather than an explicit name, appears in the <name> position, the right is granted to every user, but the variable is bound, for each user, to the representation of his name as it appears in the database.
Example 8.10: We can give everyone authorization to read only his own
balance by:
CUSTOMERS              | NAME   | ADDR | BALANCE
-----------------------+--------+------+---------
I. AUTR(P.). _Snake I. | _Snake |      | _b

□
The Authorization Table
As for integrity constraints, all AUTR. statements are placed in a table. From
this table we can print the rights granted to an individual concerning a rela
tion, or all grants concerning a relation, in much the same manner as we print
integrity constraints. Similarly, the owner of a relation can delete rights from
the table concerning that relation.
8.5 SECURITY IN SQL/RT
The version of SQL for the IBM PC/RT, which was described in Sections 4.6-4.8, uses a very simple security mechanism. This simplicity is appropriate for a
system running on a computer that is in essence a large personal computer, to
be shared by a few people at most. The SQL database system runs under the
AIX operating system, which is essentially UNIX. Thus, SQL is able to make
use of the protection facilities that UNIX provides for files.
UNIX divides the world, as far as access to a file is concerned, into three
parts: the owner of the file, the "group" to which the owner belongs, and the
rest of the world. Of these, only the notion of a group requires explanation.
There is the underlying assumption that users are divided into groups, and the
privileges the owner assigns to "group" are available only to those users who
are in the same group as the owner. The privileges that the owner may grant
or withhold from himself, his group, or the world are read, write, and execute;
the latter is not relevant when access to a database is concerned.
To grant an access privilege to a relation R, the owner says one of the six
combinations of:
GRANT READ/WRITE/ALL ON R TO GROUP/WORLD
The possible privileges are READ and WRITE; ALL stands for both of these privi
leges. The privilege of writing includes inserting, deleting, and modifying tuples,
as well as other operations such as index creation for the relation, or dropping
the relation itself. The read privilege includes only the right to use the relation
in queries. The owner is assumed to have all privileges, so there is no need to
grant them explicitly.
To cancel a privilege, say
REVOKE READ/WRITE/ALL ON R FROM GROUP/WORLD
Privileges may be granted and revoked for views as well as for relations,
and we need to use views if we are to allow anything more refined than all-or-
nothing access to relations. The ability to exercise the write privilege on a view
is limited, because there are many views we can express in SQL for which no
natural translation from the change in the view to the appropriate change in
the database exists. SQL/RT permits modification of views only when the view
is obtained from a single relation by selection and projection. When projection
is involved in the view, we can modify the underlying relation in response to
an insertion into the view by padding the inserted tuple with nulls in those
attributes not found in the view.
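That translation can be pictured with the following Python fragment; the relation and view involved match Example 8.11 below, but the representation and the helper function are assumptions of the sketch rather than SQL/RT internals.

```python
def insert_through_projection_view(base_rows, base_attrs, view_attrs, view_tuple):
    """Translate an insertion into a projection view into an insertion into
    the underlying base relation, padding the attributes the view omits
    with nulls (None)."""
    new_row = {a: None for a in base_attrs}
    new_row.update(dict(zip(view_attrs, view_tuple)))
    base_rows.append(new_row)
    return new_row

# PUBLIC-CUST projects CUSTOMERS onto NAME and ADDR; BALANCE becomes null.
customers = []
insert_through_projection_view(
    customers, ['NAME', 'ADDR', 'BALANCE'], ['NAME', 'ADDR'],
    ('Zack Zebra', 'some address'))
```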
Example 8.11: Louise Ledger, manager of the accounting department at the
YVCB, is the owner of the CUSTOMERS relation. Other employees in the
accounting department are members of the same group as Ledger, and other
employees of the YVCB are not. It is desired that:
1. Ledger alone may modify customers' balances.
2. Other members of the accounting department may read all customer information, and may insert, delete, and modify customer names and addresses.
3. All other employees may read customer names and addresses, but not balances.
To arrange this, Ledger can define a view PUBLIC-CUST that is the projection of CUSTOMERS onto NAME and ADDR.
When we insert into PUBLIC-CUST, the new tuple is given a null BAL
ANCE. Deletion from this view is performed by deleting all tuples in CUS
TOMERS with the same name and address; there should be only one. Modi
fications are similarly reflected by modification to all matching tuples of CUS
TOMERS.
To grant the proper accesses to the accounting group and to all the users
of the database, Ledger should issue the following SQL commands:
GRANT READ ON CUSTOMERS TO GROUP;
GRANT READ ON PUBLIC-CUST TO WORLD;
GRANT WRITE ON PUBLIC-CUST TO GROUP;
D
8.6 SECURITY IN OPAL/GEMSTONE
The Opal language discussed in Sections 5.6 and 5.7 is part of the Gemstone
object-oriented database system. Security issues for Gemstone are addressed
through built-in objects and methods of Opal.
The basic unit to which access can be granted or denied is called a segment.
All objects created are assigned to a segment. In the simplest situation, each
user has one segment, containing all of his objects, and there are several owned
by the system itself. However, it is possible for users to have more than one
segment. For example, to control access on a relation-by-relation or view-by-view basis, as we did in the previous section, we would have to divide objects
into segments according to the "relation" to which they belong.
There are three authorizations understood by Gemstone, and they are
represented by the Opal symbols #read, #write, and #none.1 Their meanings
should be obvious. The privilege to write includes the privilege to read, and
the #none privilege denies all access to the protected segment.
User Profiles
There is an object called System that can be sent messages of various types,
some involving security. One of the messages System understands is
System myUserProfile
which returns an object called the user profile, the profile belonging to the user
who sends System the message.
We may send to the user profile object certain messages to read or change
some of the facts about the status of the user. One of these facts concerns the
default segment, which is the "current" segment, the one into which objects
1 Recall from Section 5.6 that "symbols," indicated by the leading #, are essentially
internal representations for character strings.
newly created by this user would be placed. If we send the user profile the
message
defaultSegment
we are returned the default segment as an object. We can then send this object
messages that read or modify authorizations to that segment. For each segment,
we may specify an authorization for the owner, for up to four groups, and for
the "world." The forms of the messages are illustrated in the next example.
Example 8.12: A user can give himself write authorization, the most general
authorization that Gemstone uses, for his own segment by sending the following
messages.2
(1)  ((System myUserProfile)
(2)      defaultSegment)
(3)      ownerAuthorization: #write.
That is, line (1) produces the user profile object as a result. The message sent
this object on line (2) produces the default segment object as a result. On
line (3), this segment is sent a message that gives the owner of that segment
authorization to write into the segment.
Generally, it is not possible to send the
ownerAuthorization
message, or similar messages, to any segment but one's own.
To authorize the accounting group (represented by the symbol #accounting) to read his default segment, a user may say:
((System myUserProfile)
    defaultSegment)
    group: #accounting
    authorization: #read.
Finally, to deny either read or write access to this user's default segment
by users not in the accounting group, he can say:
((System myUserProfile)
    defaultSegment)
    worldAuthorization: #none.
D
Privileges
There are certain activities that require a higher degree of protection than is
afforded, through the authorization mechanism, to the reading and writing of
2 In all the following messages, the parentheses are redundant, because messages are
applied left-to-right.
objects. These activities include shutting down the system, reading or changing
(other people's) passwords, creating new segments, and changing authorization
for segments belonging to others.
To provide the necessary security, certain messages can only be sent by
users whose profiles explicitly contain the corresponding privilege. For example,
the privilege
SegmentProtection
allows a user whose profile contains this privilege to change the authorization
on other users' segments. Thus, if Profile is a variable whose current value is
the user profile of user A, then user B might send the message
Profile worldAuthorization: #write.
If the SegmentProtection privilege appears in B's user profile, then this action
will be taken, and all objects in A's default segment will become publicly readable
and writable. If B does not have this privilege, the message will not be
accepted. Of course, A may send the same message to his own profile without
any special privilege.
Curiously, the "privilege" of adding privileges is not itself protected by the
privilege mechanism. In principle, any user could send to Profile the message
Profile addPrivilege: 'SegmentProtection'.
and gain the SegmentProtection privilege for the user whose profile was the
current value of Profile. The normal way to prevent this situation is to store
the user profiles themselves in a segment owned by the "data curator," who
is thus the only one who can send such messages legally, as long as write
authorization for this segment is withheld from any other users.
EXERCISES
8.4: Complete the proof of Theorem 8.1 by considering the operators selection,
union, and product, and by considering the situation in which the changes
to the argument relations are deletions rather than insertions.
* 8.5: Explain how to check whether E ⊆ F holds after insertions to argument
relations of expressions E and F, when
a) E and F share arguments.
b) E and F involve the set difference operator.
8.6: Suppose we have a database consisting of the following relations.
EMPS(EMP_NO, NAME, ADDR, SALARY, DEPT_NO)
DEPTS(DEPT_NO, DNAME, MANAGER)
*
** 8.7:
8.8:
8.9:
8.10:
* c)
BIBLIOGRAPHIC NOTES
Fernandez, Summers, and Wood [1980] is a survey of database security and
integrity.
Integrity
The general idea of integrity constraints through query modification is from
Stonebraker [1975].
The Query-by-Example integrity subsystem discussed in Section 8.2 is
based on Zloof [1978]. This mechanism did not appear in the commercial QBE
discussed in IBM [1978a].
Authorization
CHAPTER 9
Transaction
Management
Until now, our concept of a database has been one in which programs accessing
the database are run one at a time (serially). Often this is indeed the case.
However, there are also numerous applications in which more than one program,
or different executions of the same program, run simultaneously (concurrently).
An example is an airline reservation system, where at one time, several agents
may be selling tickets, and therefore, changing lists of passengers and counts
of available seats. The canonical problem is that if we are not careful when
we allow two or more processes to access the database, we could sell the same
seat twice. In the reservations system, two processes that read and change the
value of the same object must not be allowed to run concurrently, because they
might interact in undesirable ways.
A second example is a statistical database, such as census data, where
many people may be querying the database at once. Here, as long as no one
is changing the data, we do not really care in what order the processes read
data; we can let the operating system schedule simultaneous read requests as
it wishes. In this sort of situation, where only reading is being done, we want
to allow maximum concurrent operation, so time can be saved. For contrast, in
the case of a reservation system, where both reading and writing are in progress,
we need restrictions on when two programs may execute concurrently, and we
should be willing to trade speed for safety.
In this chapter we shall consider models of concurrent processes as they
pertain to database operation. The models are distinguished primarily by the
detail in which they portray access to elements of the database. For each model
we shall describe a reasonable way to allow those concurrent operations that
preserve the integrity of the database while preventing concurrent operations
that might, as far as a model of limited detail can tell, destroy its integrity. As
a rule, the more detailed the model, the more concurrency we can allow safely.
Section 9.1 introduces most of the necessary concepts, including "locking,"
the primary technique for controlling concurrency. In Section 9.2 we discuss the
simplest model of transactions. That model leads to a discussion of the "two-phase
locking protocol" in Section 9.3; that protocol is the most important
technique for managing concurrency. Sections 9.4 and 9.6 discuss more realistic
models, where reading and writing are treated as distinct operations. Section
9.5 talks about "lock modes" in general; reading and writing are the most
common "modes." Access to tree-structured data is covered in Section 9.7.
In Section 9.8 we begin to discuss how the theory must be modified to
account for the possibility that software or hardware failures may occur, and
in the following section we consider what options exist for containing the effect
of an error. Section 9.10 discusses logging and other mechanisms for avoiding
the loss of data after a system error. Finally, Section 9.11 discusses all of these
issues in the context of "timestamps," which, after locking, is the most common
approach to concurrency control.
Atomicity
To a large extent, transaction management can be seen as an attempt to make
complex operations appear atomic. That is, they either occur in their entirety or
do not occur at all, and if they occur, nothing else apparently went on during
the time of their occurrence. The normal approach to ensuring atomicity of
transactions is "serialization," to be discussed shortly, which forces transactions
to run concurrently in a way that makes it appear that they ran one-at-a-time
(serially). There are two principal reasons why a transaction might not be
atomic.
1. In a time-shared system, activities associated with two or more transactions
might be done simultaneously or be interleaved. For example, several disk
units might be reading or writing data to and from the database at the
same time. The time slice for one transaction T might end in the middle of
a computation, and activities of some other transaction performed before
T completes.
2. A transaction might not complete at all. For example, it could have to
abort (terminate) because it tried to perform an illegal calculation (e.g.,
division by 0), or because it requested some data to which it did not have
the needed access privilege. The database system itself could force the
transaction to abort for several reasons, which we shall discuss. For example,
it could be involved in a deadlock, contending for resources.
In case (1), it is the job of the database system to ensure that, even though
things happen in the middle of a transaction, the effect of the transaction on
the database is not influenced by those interstitial activities. In case (2), the
system must ensure that the aborted transaction has no effect at all on the
database or on other transactions.
In reality, transactions are sequences of more elementary steps, such as
reading or writing of single items from the database, and performing simple
arithmetic steps in the workspace. We shall see that when concurrency control
is provided, other primitive steps are also needed, steps which set and release
locks, commit (complete) transactions, and perhaps others. We shall always
assume that these more primitive steps are themselves atomic. Even though,
for example, the end of a time slice could occur in the middle of an arithmetic
step, we may, in practice, view that step as atomic, because it occurs in a
local workspace, and nothing can affect that workspace until the transaction
performing the arithmetic step resumes.
Items
To manage concurrency, the database must be partitioned into items, which are
the units of data to which access is controlled. The nature and size of items are
for the system designer to choose. In the relational model of data, for example,
we could choose large items, like relations, or small items like individual tuples
or even components of tuples. We could also choose an intermediate size item,
such as a block of the underlying file system, on which some small number
of tuples are stored. The size of items used by a system is often called its
granularity. A "fine-grained" system uses small items and a "coarse-grained"
one uses large items.
The most common way in which access to items is controlled is by "locks,"
which we discuss shortly. Briefly, a lock manager is the part of a DBMS that
records, for each item I, whether one or more transactions are reading or writing
any part of I. If so, the manager will forbid another transaction from gaining
access to I, provided the type of access (read or write) could cause a conflict,
such as the duplicate selling of an airline seat.1
1 Reading and writing are the most common types of access, but we shall see in Section
9.5 that other kinds of access can be controlled by other "lock modes" as well.
As long as we can, for a given item, look up, insert, and delete lock records
efficiently, this or a similar data structure will allow efficient management of locks.
How Locks Control Concurrency
To see the need for using locks (or a similar mechanism) when transactions
execute in parallel, consider the following example.
Example 9.1: Let us consider two transactions T1 and T2. Each accesses an
item A, which we assume has an integer value, and adds one to A. The two
transactions are executions of the program P defined by:
P: READ A; A:=A+1; WRITE A;
The value of A exists in the database. P reads A into its workspace, adds one to
the value in the workspace, and writes the result into the database. In Figure
9.1 we see the two transactions executing in an interleaved fashion,2 and we
record the value of A as it appears in the database at each step.
Figure 9.1  The two transactions executing in an interleaved fashion: both read A
while its value in the database is 5, so both write the value 6, and the net effect of
the two transactions is to add only 1 to A.
Let us now consider programs that interact with the database not only
by reading and writing items but by locking and unlocking them. We assume
that a lock must be placed on an item before reading or writing it, and that the
operation of locking acts as a synchronization primitive. That is, if a transaction
tries to lock an already locked item, the transaction may not proceed until the
lock is released by an unlock command, which is executed by the transaction
holding the lock. We assume that each transaction will unlock any item it locks,
eventually.3 A schedule of the elementary steps of two or more transactions,
such that the above rules regarding locks are obeyed, is termed legal.
Example 9.2: The program P of Example 9.1 could be written with locks as
P: LOCK A; READ A; A:=A+1; WRITE A; UNLOCK A;
Suppose again that T1 and T2 are two executions of P. If T1 begins first, it
requests a lock on A. Assuming no other transaction has locked A, the lock
manager grants this lock. Now T1, and only T1, can access A. If T2 begins
before T1 finishes, then when T2 tries to execute LOCK A, the system causes T2
to wait. Only when T1 executes UNLOCK A will the system allow T2 to proceed.
As a result, the anomaly indicated in Example 9.1 cannot occur; either T1 or
T2 executes completely before the other starts, and their combined effect is to
add 2 to A. D
Livelock
The system that grants and enforces locks on items cannot behave capriciously,
or certain undesirable phenomena occur. As an instance, we assumed in Example
9.2 that when T1 released its lock on A, the lock was granted to T2. What
if, while T2 was waiting, a transaction T3 also requested a lock on A, and T3
was granted the lock before T2? Then while T3 had the lock on A, T4 requested
a lock on A, which was granted after T3 unlocked A, and so on. Evidently, it
is possible that T2 could wait forever, while some other transaction always had
a lock on A, even though there are an unlimited number of times at which T2
might have been given a chance to lock A.
Such a condition is called livelock. It is a problem that occurs potentially
in any environment where processes execute concurrently. A variety of solutions
have been proposed by designers of operating systems, and we shall not discuss
the subject here, as it does not pertain solely to database systems. A simple
way to avoid livelock is for the system granting locks to record all requests that
are not granted immediately, and when an item A is unlocked, grant a lock on
A to the transaction that requested it first, among all those waiting to lock A.
3 Strictly speaking, since some transactions will abort before completing, the system itself
must take responsibility for releasing locks held by aborted transactions.
Deadlock
There is a more serious problem of concurrent processing that can occur if we
are not careful. This problem, called "deadlock," can best be illustrated by an
example.
Example 9.3: Suppose we have two transactions T1 and T2 whose significant
actions, as far as concurrent processing is concerned, are:
T1: LOCK A; LOCK B; UNLOCK A; UNLOCK B;
T2: LOCK B; LOCK A; UNLOCK B; UNLOCK A;
Presumably T1 and T2 do something with A and B, but what they do is not
important here. Suppose T1 and T2 begin execution at about the same time.
T1 requests and is granted a lock on A, and T2 requests and is granted a lock
on B. Then T1 requests a lock on B, and is forced to wait because T2 has a
lock on that item. Similarly, T2 requests a lock on A and must wait for T1 to
unlock A. Thus neither transaction can proceed; each is waiting for the other
to unlock a needed item, so both T1 and T2 wait forever. D
A situation in which each member of a set S of two or more transactions
is waiting to lock an item currently locked by some other transaction in the set
S is called a deadlock. Since each transaction in S is waiting, it cannot unlock
the item some other transaction in S needs to proceed, so all wait forever. Like
livelock, the prevention of deadlock is a subject much studied in the literature of
operating systems and concurrent processing in general. Among the solutions
to deadlock are:
1. Require each transaction to request all its locks at once, and let the lock
manager grant them all, if possible, or grant none and make the process
wait, if one or more are held by another transaction. Notice how this rule
would have prevented the deadlock in Example 9.3. The system would
grant locks on both A and B to T1 if that transaction requested first; T1
would complete, and then T2 could have both locks.
2. Assign an arbitrary linear ordering to the items, and require all transactions
to request locks in this order.
Clearly, the first approach prevents deadlock. The second approach does
also, although the reason why may not be obvious. In Example 9.3, suppose
A precedes B in the ordering (there could be other items between A and B in
the ordering). Then T2 would request a lock on A before B and would find A
already locked by T1. T2 would not yet get to lock B, so a lock on B would be
available when T1 requested it.
4 Although it may cause "deadlock," to be discussed next.
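As an illustration of the second approach, the following sketch (Python-style; the
lock_manager object and its blocking lock and unlock calls are assumptions made for
the example, not part of any particular system) acquires all of a transaction's locks
in one fixed linear order on items, so the circular wait of Example 9.3 cannot arise:

    # A sketch of deadlock avoidance by requesting locks in a fixed linear order.
    # lock_manager is a hypothetical object with blocking lock() and unlock() calls.
    def run_with_ordered_locks(lock_manager, items, body):
        ordered = sorted(items)        # the arbitrary, agreed-upon order on items
        for item in ordered:           # every transaction requests locks in this order
            lock_manager.lock(item)
        try:
            body()                     # read and write the locked items
        finally:
            for item in reversed(ordered):
                lock_manager.unlock(item)

With this discipline, T1 and T2 of Example 9.3 would both ask for A before B, so
one of them simply waits until the other finishes.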
Serializability of Schedules
Now we come to a concurrency issue of concern primarily to database system
designers, rather than designers of general concurrent systems. By way of intro
duction, let us review Example 9.1, where two transactions executing a program
P each added 1 to A, yet A only increased by 1. Intuitively, we feel this situ
ation is wrong, yet is it not possible that these transactions did exactly what
the writer of P wanted? We argue not, because if we run first TI and then
TZ, we get a different result; 2 is added to A. Since it is always possible that
transactions will execute one at a time (serially), it is reasonable to assume that
the normal, or intended, result of a transaction is the result we obtain when we
execute it with no other transactions executing concurrently. Thus, we shall
assume from here on that the concurrent execution of several transactions is
correct if and only if its effect is the same as that obtained by running the same
transactions serially in some order.
Let us define a schedule for a set of transactions to be an order in which the
elementary steps of the transactions (lock, read, and so on) are done. The steps
of any given transaction must, naturally, appear in the schedule in the same
order that they occur in the program of which the transaction is an execution.
A schedule is serial if all the steps of each transaction occur consecutively. A
schedule is serializable if its effect is equivalent to that of some serial schedule;
we shall make the notion of "equivalent" more precise in the next section.
Example 9.4: Let us consider the following two transactions, which might
be part of a bookkeeping operation that transfers funds from one account to
another.
T1: READ A; A:=A-10; WRITE A; READ B; B:=B+10; WRITE B;
T2: READ B; B:=B-20; WRITE B; READ C; C:=C+20; WRITE C;
Clearly, any serial schedule has the property that the sum A+B+C is preserved.
In Figure 9.2(a) we see a serial schedule, and in Figure 9.2(b) is a serializable,
but not serial, schedule. Figure 9.2(c) shows a nonserializable schedule. Note
that Figure 9.2(c) causes 10 to be added, rather than subtracted from B as a
net effect, since T1 reads B before T2 writes the new value of B. It is possible to
prevent the schedule of Figure 9.2(c) from occurring by having all transactions
lock B before reading it. D
Figure 9.2  (a) A serial schedule of T1 and T2; (b) a serializable, but not serial,
schedule; (c) a nonserializable schedule.
tal" errors, in the sense that we may call a schedule nonserializable, when in
fact it produces the same result as a serial schedule, but we shall never say a
schedule is serializable when in fact it is not (a "fatal" error). Nonfatal errors
may rule out some concurrent operations, and thereby cause the system to run
more slowly than it theoretically could. However, these errors never cause an
incorrect result to be computed, as a fatal error might. Succeeding sections will
use progressively more detailed models that enable us to infer that wider classes
of schedules are serializable, and therefore, to achieve more concurrency while
guaranteeing correctness. We can thus approach, though never reach, the con
dition where every schedule of every collection of transactions is allowed if its
effect happens to be equivalent to some serial schedule and forbidden otherwise.
Schedulers
We have seen that arbitrary transactions can, when executed concurrently, give
rise to livelock, deadlock, and nonserializable behavior. To eliminate these
problems we have two tools, schedulers and protocols. The scheduler is a portion
of the database system that arbitrates between conflicting requests. We saw,
for example, how a first-come, first-serve scheduler can eliminate livelock. A
scheduler can also handle deadlocks and nonserializability by
1. Forcing a given transaction to wait, for example, until a lock it wants is
available, or
2. Telling the transaction to abort and restart.
It might appear that (2) is never desirable, since we lose the cycles that were
spent running the transaction so far. However, forcing many transactions to
wait for long periods may cause too many locks to become unavailable, as wait
ing transactions might already have some locks. That in turn makes deadlock
more likely, and may cause many transactions to delay so long that the effect
becomes noticeable, say to the user standing at an automatic teller machine.
Also, in situations where we already have a deadlock, we often have no choice
but to abort at least one of the transactions involved in the deadlock.
Protocols
Another tool for handling deadlock and nonserializability is to use one or more
protocols, which all transactions must follow. A protocol, in its most general
sense, is simply a restriction on the sequences of atomic steps that a transaction
may perform. For example, the deadlock-avoiding strategy of requesting locks
on items in some fixed order is a protocol. We shall see in Section 9.3 the
importance of the "two-phase locking" protocol, which requires that all needed
locks be obtained by a transaction before it releases any of its locks.
The importance of using a nontrivial protocol (i.e., a protocol more restric
tive than "any sequence is OK" ) will be seen throughout this chapter. We shall
see how schedulers that can assume all transactions obey a particular protocol
can be made much simpler than those that cannot make such an assumption.
For example, there are variants of the two-phase locking protocol that allow
a scheduler to guarantee no deadlocks in a simple manner. The overall relationship
of the lock manager, scheduler, and protocol is suggested in Figure 9.3.
Figure 9.3  The relationship of the lock manager, scheduler, and transactions:
transactions following the protocol request locks from the scheduler, which may
grant access, make them wait, or abort them; the scheduler in turn requests locks
from the lock manager, which consults its lock table and grants or denies each lock.
Figure 9.4  Three transactions:
    T1:  LOCK A;  LOCK B;  UNLOCK A  f1(A,B);  UNLOCK B  f2(A,B)
    T2:  LOCK B;  LOCK C;  UNLOCK B  f3(B,C);  LOCK A;  UNLOCK C  f4(A,B,C);  UNLOCK A  f5(A,B,C)
    T3:  LOCK A;  LOCK C;  UNLOCK C  f6(A,C);  UNLOCK A  f7(A,C)
Example 9.5: In Figure 9.4 we see three transactions and the functions associated
with each LOCK-UNLOCK pair; the function appears on the same line
as the UNLOCK. For example, f1, associated with A in T1, takes A and B as
arguments, because these are the items that T1 reads. Function f3 takes only
B and C as arguments, because T2 unlocks B, and therefore writes its value,
before it locks and reads A.
Figure 9.5 shows a possible schedule of these transactions and the resulting
effect on items A, B, and C. We can observe that this schedule is not serializable.
In proof, suppose it were. If T1 precedes T2 in the serial schedule, then
the final value of B would be
    f3(f2(A0,B0),C0)
rather than
    f2(A0,f3(B0,C0))
If T2 precedes T1, then the final value of A would apply f1 to a subexpression
involving f5. Since the actual final value of A in Figure 9.5 does not apply f1 to
an expression involving f5, we see that T2 cannot precede T1 in an equivalent
serial schedule. Since T2 can neither precede nor follow T1 in an equivalent
serial schedule, such a serial schedule does not exist. D
Fatal and Nonfatal Errors
Note how our assumption that functions produce unique values is essential in
the argument used in Example 9.5. For example, if it were possible that
    f3(f2(A0,B0),C0) = f2(A0,f3(B0,C0))        (9.1)
then we could not rule out the possibility that T1 precedes T2. Let us reiterate
that our assumption of unique values is not just for mathematical convenience.
The work required to enable the database system to examine transactions and
detect possibilities such as (9.1), thereby permitting a wider class of schedules
to be regarded as serializable, is not worth the effort in general.
An assumption such as the unavailability of algebraic laws is a discrepancy
in the nonfatal direction, since it can rule out opportunities for concurrency but
cannot lead to a fatal error, where transactions are allowed to execute in parallel
even though their effect is not equivalent to any serial schedule. Similarly, our
assumption that locks imply both reading and writing of an item is a nonfatal
departure from reality. The reader should observe that schedules which are
equivalent under our assumption about locks will still be equivalent if, say, a
transaction locks an item but does not write a new value. We shall consider in
the next sections how relaxing our assumption regarding what happens when
locks are taken allows more schedules to be considered serializable, but still
only calls schedules serializable if in fact they are.
Figure 9.5  A schedule of the transactions of Figure 9.4, showing, after each of its
fourteen steps, the values of A, B, and C in the database; initially these values are
A0, B0, and C0.
a1; a2; ... ; an
where each ai is an action of the form
Tj: LOCK Am    or    Tj: UNLOCK Am
and Tj indicates the transaction to which the step belongs. If ai is
Tj: UNLOCK Am
look for the next action ap following ai that is of the form Ts: LOCK Am. If
there is one, and s ≠ j, then draw an arc from Tj to Ts. The intuitive meaning
of this arc is that in any serial schedule equivalent to S, Tj must precede Ts.
If G has a cycle, then S is not serializable. If G has no cycles, then find
a linear order for the transactions such that Ti precedes Tj whenever there is
an arc Ti → Tj. This ordering can always be done by the process known as
topological sorting, defined as follows. There must be some node Ti with no
entering arcs, else we can prove that G has a cycle. List Ti and remove Ti from
G. Then repeat the process on the remaining graph until no nodes remain. The
order in which the nodes are listed is a serial order for the transactions. D
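As a concrete illustration (the representation of a schedule as a list of
(transaction, action, item) triples is our own assumption, not part of the text),
a small Python sketch of the algorithm:

    # A sketch of Algorithm 9.1: build the serialization graph, then topologically sort it.
    # A schedule is a list of triples (transaction, action, item), action "LOCK" or "UNLOCK".
    def serial_order(schedule):
        nodes = {t for (t, _, _) in schedule}
        arcs = set()
        for i, (tj, act, item) in enumerate(schedule):
            if act != "UNLOCK":
                continue
            # the next transaction to lock the same item must follow Tj serially
            for ts, act2, item2 in schedule[i + 1:]:
                if act2 == "LOCK" and item2 == item:
                    if ts != tj:
                        arcs.add((tj, ts))
                    break
        # topological sort: repeatedly list and remove a node with no entering arcs
        order, remaining = [], set(nodes)
        while remaining:
            free = [t for t in remaining if not any((s, t) in arcs for s in remaining)]
            if not free:
                return None            # a cycle: the schedule is not serializable
            t = free[0]
            order.append(t)
            remaining.remove(t)
        return order

On the schedule of Figure 9.5, the cycle between T1 and T2 would be found and the
function would return None.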
Example 9.6: Consider the schedule of Figure 9.5. The graph G, shown in
Figure 9.6, has nodes for T1, T2, and T3. To find the arcs, we look at each
UNLOCK step in Figure 9.5. For example step (4),
T2: UNLOCK B
is followed by T1: LOCK B. In this case, the lock occurs at the next step. We
therefore draw an arc T2 → T1. As another example, the action at step (8),
T2: UNLOCK C
is followed at step (11) by T3: LOCK C, and no intervening step locks C. Therefore
we draw an arc from T2 to T3. Steps (6) and (7) cause us to place an arc
T1 → T2. As there is a cycle, the schedule of Figure 9.5 is not serializable. D
Figure 9.7  A schedule of three transactions: T1 locks and unlocks A; T2 locks and
unlocks A and then B; T3 locks and unlocks B.
Example 9.7: In Figure 9.7 we see a schedule for three transactions, and
Figure 9.8 shows its serialization graph. As there are no cycles, the schedule of
Figure 9.7 is serializable, and Algorithm 9.1 tells us that the serial order is T2,
T1, T3. It is interesting to note that in the serial order, T1 precedes T3, even
though in Figure 9.7, T1 did not commence until T3 had finished. D
Theorem 9.1: Algorithm 9.1 correctly determines if a schedule is serializable.
Proof: Suppose G has a cycle
precedes the effect of Tjp on A, since there is an arc Tjp-1 → Tjp, representing
the fact that Tjp uses the value of A produced by Tjp-1. Therefore, in schedule
S, g is applied to an expression involving f, in the formula for A. Thus the
final value of A differs in R and S, since the two formulas for A are not the
same. We conclude that R and S are not equivalent. As R is an arbitrary serial
schedule, it follows that S is equivalent to no serial schedule.
Conversely, suppose the serialization graph G has no cycles. Define the
depth of a transaction in an acyclic serialization graph to be the length of the
longest path to the node corresponding to that transaction. For example, in
Figure 9.8, T1 has depth 0 and T3 has depth 2. Note that a transaction of depth
d can only read values written by transactions of depth less than d.
We can show by induction on d that a transaction T of depth d reads the
same value for each item it locks, both in the given schedule S (from which the
serialization graph was constructed) and in the serial schedule R constructed
by Algorithm 9.1. The reason is that if transaction T reads a value of item A,
then in both schedules, the same transaction T' was the last to write A (or in
both schedules T is the first to read A).
Suppose in contradiction that in S, transaction T reads the value of A
written by T', but in R, it is the value written by T'' that T reads. Let
Mi , -Ma, I -Mr
Example 9.8: Let us consider an example of the reasoning behind the second
part of the proof of Theorem 9.1. Suppose a transaction T locks items A and
B, and in a particular schedule S, item A is locked, in turn, by T1, T2, and then
T, while item B is locked by T3, T1, T4, and T in that order (other transactions
may lock A or B after T does). Figure 9.9 suggests how the values of A and B
are changed, in both S and its equivalent serial schedule R.
Figure 9.9  How the values of A and B change in both S and the equivalent serial
schedule R: starting from A0 and B0, A is rewritten in turn by T1, T2, and T, and
B by T3, T1, T4, and T.
Proof: Suppose not. Then by Theorem 9.1, the serialization graph G for S
has a cycle
Ti1 → Ti2 → ... → Tip → Ti1
Then some lock by Ti2 follows an unlock by Ti1; some lock by Ti3 follows an
unlock by Ti2, and so on. Finally, some lock by Ti1 follows an unlock by Tip.
Therefore, a lock by Ti1 follows an unlock by Ti1, contradicting the assumption
that Ti1 is two-phase. D
Another way to see why two-phase transactions must be serializable is to
imagine that a two-phase transaction occurs instantaneously at the moment it
obtains the last of its locks (called the lock point). Then if we order transactions
according to the time at which they reach this stage in their lives, the order
must be a serial schedule equivalent to the given schedule. For if in the given
schedule, transaction T1 locks some item A before T2 locks A, then T1 must
unlock A before T2 locks A. If T1 is two-phase, then surely T1 obtains the last
of its locks before T2 obtains the last of its locks, so T1 precedes T2 in the serial
order according to lock points. Thus, the order of transactions we constructed
will conform to all the arcs of the serialization graph, and thus, by Theorem
9.1, be an equivalent serial schedule.
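To make the definition concrete, a transaction's own step sequence can be checked
for the two-phase property in a few lines (Python; the step format is an assumption
made for the example):

    # A transaction is two-phase if no LOCK follows an UNLOCK in its sequence of steps.
    def is_two_phase(steps):
        # steps: the transaction's actions in order, e.g. [("LOCK", "A"), ("UNLOCK", "A")]
        seen_unlock = False
        for action, _item in steps:
            if action == "UNLOCK":
                seen_unlock = True
            elif action == "LOCK" and seen_unlock:
                return False       # a lock taken after some lock was already released
        return True

The transactions of Example 9.3 pass this test; a transaction that unlocks A and
only afterward locks B does not.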
(1) WLOCK A
(2) RLOCK B
(3) UNLOCK A
(4) RLOCK A
(5) UNLOCK B
(6) WLOCK B
(7) RLOCK A
(8) UNLOCK B
(9) WLOCK B
(10) UNLOCK A
(11) UNLOCK A
(12) WLOCK A
(13) UNLOCK B
(14) RLOCK B
(15) UNLOCK A
(16) UNLOCK B

Figure 9.11  A schedule.
serializability. Moreover, we have the same partial converse, that any transaction
in which some UNLOCK precedes a read- or write-lock can be run in a nonserializable
way with some other transaction. We leave these results as exercises.
9.5 LOCK MODES
We saw in the previous section that locks can be issued in different "flavors,"
called lock modes, and different modes have different properties when it comes
to deciding whether a lock of one mode can be granted while another transaction
already has a lock of the same or another mode on the same item. In Section 9.4,
the two modes were "read" and "write," and the rules regarding the granting
of locks were:
1. A read-lock can be granted as long as no write-lock is held on the same
item by a different transaction.
2. A write-lock can be granted only if there are no read- or write-locks held
on the same item by a different transaction.
Note that these rules do not apply to the situation where a transaction is
requesting a lock and also holds another lock on the same item.
Lock Compatibility Matrices
We can summarize the rules for read- and write-locks by a lock compatibility
matrix. The rows correspond to the mode of lock being requested, and the
columns correspond to the mode of lock already held by another transaction.
The entries are Y (yes, the requested lock may be granted), and N (no, it may
not be granted). The lock compatibility matrix for read and write is shown in
Figure 9.13.
                        Lock Held by Another Transaction
                             Read        Write
    Lock         Read         Y            N
    Requested    Write        N            N

Figure 9.13  A lock compatibility matrix for read- and write-locks.
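A compatibility matrix of this kind translates directly into the check a lock manager
makes before granting a request; a minimal sketch (Python; the table and the
function name are ours, not taken from any particular system):

    # COMPAT[requested][held]: may a lock of mode `requested` be granted while
    # another transaction holds a lock of mode `held` on the same item?
    COMPAT = {
        "read":  {"read": True,  "write": False},
        "write": {"read": False, "write": False},
    }

    def may_grant(requested, modes_held_by_others):
        # grant only if the requested mode is compatible with every lock held
        # on the item by a different transaction
        return all(COMPAT[requested][held] for held in modes_held_by_others)

For instance, may_grant("read", ["read"]) is True, while may_grant("write", ["read"])
is False, exactly as rules (1) and (2) above require.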
8 It should be noted that T1 may not be the last transaction prior to T2 to lock A in a
mode incompatible with M. Since some T3 between them may lock A in a mode that
forbids the granting of M, but is not itself forbidden by a lock in mode L, we must draw
arcs from both T1 and T3 to T2. In the read/write case, we could take advantage of the
simplicity of the compatibility matrix for read and write to omit the arc T1 → T2 if T3
write-locked A; for then we knew arcs T1 → T3 and T3 → T2 would be present.
We now assume that a transaction reads a set of items (the read-set) and writes a set of items
(the write-set), with the option that an item A could appear in either one of
these sets, or both.
Example 9.12: Any transaction that queries a database but does not alter it
has an empty write-set. In the transaction
READ A; READ B; C:=A+B; A:=A-1; WRITE C; WRITE A
the read-set is {A, B} and the write-set is {A,C}. D
Semantics of Transactions and Schedules
Our semantics of transactions differs from the model of Section 9.4 only in one
point. We do not assume that write-locking an item implies that the item is
read. Thus, associated with each write-lock on an item A is a function that
computes a new value for A only in terms of the read-set of the transaction. In
particular, this new value does not depend on the old value of A if A is not in
the read-set.
When attributing semantics to schedules, we shall abandon the requirement
of Section 9.4 that the value of item A read by a transaction is significant,
whether or not that value affects the final value of any item in the database.
Should we care about the values read by a read-only transaction, then we can
modify the transaction to write an imaginary item. Thus, two schedules are
equivalent if and only if they produce the same values for each database item
written, as functions of the initial values of the items read.
Two Notions of Serializability
Following the pattern of Sections 9.2 and 9.4, we should define a schedule to
be "serializable" if it is equivalent to some serial schedule. Unfortunately, this
definition leads to difficulties, such as the fact that a simple graph-theoretic test
does not exist in this model as it did in the previous models. Thus, equivalence
to a serial schedule is considered only one possible definition of "serializability,"
and it is usually referred to as view-serializability. This notion of serializability
will be discussed later in the section.
A more useful definition of serializability is called conflict-serializability.
This notion is based on local checks regarding how pairs and triples of transac
tions treat a single item. We shall now develop the mechanics needed to define
conflict-serializability and to present a graph-theoretic test for this property,
albeit a more complicated test than was needed for previous models.
Serialization Graphs for Read-Only, Write-Only Transactions
When we allow write-only access, we must revise our notion of when one transac
tion is forced to precede another in an equivalent serial schedule. One important
difference is the following. Suppose (in the model of Section 9.4), that in a given
schedule S, the transaction T1 wrote a value for item A, and later T2 wrote a
value for A. Then we assumed in Section 9.4 that T2 write-locked A after T1
unlocked A, and by implication, T2 used the value of A written by T1 in computing
a new value. Therefore, when dealing with serializability, it was taken
for granted that in a serial schedule R equivalent to S, T1 appears before T2,
and, incidentally, that no other transaction T write-locking A appears between
T1 and T2. One gets the latter condition "for free" in Algorithm 9.2, since that
algorithm forced T to appear either before T1 or after T2 in R, whichever was
the case in S.
However, if we assume that T2 has written its value for A without reading
A, then the new value of A is independent of the old; it depends only on the
values of items actually read by T2. Thus, if between the times that T1 and T2
write their values of A, no transaction reads A, we see that the value written
by T1 "gets lost" and has no effect on the database. As a consequence, in a
serial schedule, we need not have T1 appearing before T2 (at least as far as the
effect on A is concerned). In fact, the only requirement on T1 is that it be done
at a time when some other transaction T3 will later write A, and between the
times that T1 and T3 write A, no transaction reads A.
We can now formulate a new definition of a serialization graph, based
on the semantics that the values written by a transaction are functions only
of the values read, and distinct values read produce distinct values written.
The conditions under which one transaction is required to precede another are
stated informally (and not completely accurately) as follows. If in schedule S,
transaction T2 reads the value of item A written by T1, then
1.
2.
of the above rules is to rule out the possibility that T2, in (1) and (2) above, is
a useless transaction.7
Testing for Useless Transactions
It is easy, given a schedule S, to tell which transactions are useless. We create
a graph whose nodes are the transactions, including the dummy transaction Tf
assumed to exist at the end of S. If T1 writes a value read by T2, draw an arc
from T1 to T2. Then the useless transactions are exactly those with no path to
Tf. An example of this algorithm follows the discussion of a serializability test.
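The test itself is a simple reachability computation; a sketch (Python, with the
reads-from arcs and the dummy final transaction Tf represented in a format assumed
for this example):

    # writes_read_by: set of arcs (T1, T2) meaning T2 reads a value written by T1.
    # The dummy final transaction `final` "reads" the final value of every item.
    def useless_transactions(transactions, writes_read_by, final="Tf"):
        useful = {final}
        changed = True
        while changed:                    # add any transaction whose value is read by a useful one
            changed = False
            for t1, t2 in writes_read_by:
                if t2 in useful and t1 not in useful:
                    useful.add(t1)
                    changed = True
        return set(transactions) - useful # those with no path to Tf

In the example that follows (the schedule of Figure 9.17), this computation would
report T3 as the only useless transaction.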
Conflict-Serializability
The simple serialization graph test of previous sections does not work here.
Recall that there are two types of constraints on a potential serial schedule
equivalent to a given schedule S.
1. Type 1 constraints say that if T2 reads a value of A written by T1 in S,
then T1 must precede T2 in any serial schedule. This type of constraint
can be expressed graphically by an arc from T1 to T2.
2. Type 2 constraints say that if T2 reads a value of A written by T1 in S, then
any T3 writing A must appear either before T1 or after T2. These cannot
be expressed by a simple arc. Rather, we have a pair of arcs T3 → T1 and
T2 → T3, one of which must be chosen.
The above constraints apply to the dummy initial and final transactions, but
do not apply to useless transactions.
Then schedule S is said to be conflict-serializable if there is some serial
schedule that respects all the type 1 and type 2 constraints generated by S. As
we saw in Theorems 9.1 and 9.3, the notions of view- and conflict-serializability
are equivalent in the simpler models of Sections 9.2 and 9.4. We shall, however,
see that conflict-serializability implies view-serializability, but not vice
versa, in the present model. There is a relatively easy-to-state test for
conflict-serializability, which is one reason we prefer this notion, even though it misses
detecting some serializable schedules.
The Polygraph Test for Conflict-Serializability
A collection of nodes, arcs, and pairs of alternative arcs has been termed a
polygraph. A polygraph is acyclic if there is some series of choices of one
arc from each pair that results in an acyclic graph in the ordinary sense. The
obvious conflict-serializability test is to construct the appropriate polygraph and
test it for acyclicity.
7 We cannot simply remove useless transactions from 5, since the portion of the system
that schedules transactions cannot know that it is scheduling a transaction that will
later prove to be useless.
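For the small polygraphs that arise in examples, acyclicity can be checked by brute
force over the alternative pairs; a sketch (Python; the representation of a polygraph
as nodes, fixed arcs, and pairs of alternative arcs is our own, and every arc endpoint
is assumed to appear among the nodes):

    from itertools import product

    def has_cycle(nodes, arcs):
        # ordinary depth-first cycle detection on the directed graph (nodes, arcs)
        adj = {n: [] for n in nodes}
        for a, b in arcs:
            adj[a].append(b)
        WHITE, GRAY, BLACK = 0, 1, 2
        color = {n: WHITE for n in nodes}

        def dfs(u):
            color[u] = GRAY
            for v in adj[u]:
                if color[v] == GRAY or (color[v] == WHITE and dfs(v)):
                    return True
            color[u] = BLACK
            return False

        return any(color[n] == WHITE and dfs(n) for n in nodes)

    def polygraph_acyclic(nodes, arcs, arc_pairs):
        # acyclic if some choice of one arc from each alternative pair,
        # together with the fixed arcs, gives an acyclic ordinary graph
        for choice in product(*arc_pairs):
            if not has_cycle(nodes, set(arcs) | set(choice)):
                return True
        return False

The brute force is exponential in the number of alternative pairs, which is acceptable
only because the examples here are tiny.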
(1) RLOCK A
(2) RLOCK A
(3) WLOCK C
(4) UNLOCK C
(5) RLOCK C
(6) WLOCK B
(7) UNLOCK B
(8) RLOCK B
(9) UNLOCK A
(10) UNLOCK A
(11) WLOCK A
(12) RLOCK C
(13) WLOCK D
(14) UNLOCK B
(15) UNLOCK C
(16) RLOCK B
(17) UNLOCK A
(18) WLOCK A
(19) UNLOCK B
(20) WLOCK B
(21) UNLOCK B
(22) UNLOCK D
(23) UNLOCK C
(24) UNLOCK A

Figure 9.17  A schedule of transactions T1 through T4.
write-lock simultaneously. Thus, we may assume all reading and writing occurs
at the time the lock is obtained, and we may ignore the UNLOCK steps.
Let us consider each read-lock step in turn. The read-locks on A at steps
(1) and (2) read the value "written" by the dummy transaction T0. Thus, we
draw arcs from T0 to T1 and T2. At step (5) T3 reads the value of C written
by T1 at step (3), so we have arc T1 → T3. At step (8), T4 reads what T1
wrote at step (6), so we have arc T1 → T4, and so on. Finally, at the end, Tf
"reads" A, B, C, and D, whose values were last written by T4, T1, T1, and T2,
respectively, explaining the three arcs into Tf.
Now we search for useless transactions, those with no path to Tf in Figure
9.18; T3 is the only such transaction. We therefore remove the arc T1 → T3
from Figure 9.18.
In step (5) of Algorithm 9.3 we consider the arcs or arc pairs needed to
prevent interference of one write operation with another. An item like C or D
that is written by only one nondummy transaction does not figure into step (5).
However, A is written by both T3 and T4, as well as dummy transaction T0.
The value written by T3 is not read by any transaction, so T4 need not appear
in any particular position relative to T3. The value written by T4 is "read" by
Tf. Therefore, as T3 cannot appear after Tf, it must appear before T4. In this
case, no arc pair is needed; we simply add to P the arc T3 → T4. The value of
A written by T0 is read by T1 and T2. As T3 and T4 cannot appear before T0,
we place arcs from T1 and T2 to T3 and T4; again no arc pair is necessary.
Figure 9.22  A hierarchy of items.
Example 9.15: Figure 9.22 shows a tree of items, and Figure 9.23 is the
schedule of three transactions T1, T2, and T3, obeying the tree protocol. Note
that T1 is not two-phase, since it locks C after unlocking B. D
The Tree Protocol and Serializability
While we shall not give a proof here (see Silberschatz and Kedem [1980]), all
legal schedules of transactions that obey the tree protocol are serializable. The
algorithm to construct a serial ordering of the transactions begins by creating
a node for each transaction. Suppose Ti and Tj are two transactions that lock
the same item (at different times, of course). Let FIRST(T) be the item first
locked by transaction T. If FIRST(Ti) and FIRST(Tj) are independent (neither
is a descendant of the other), then the tree protocol guarantees that Ti and Tj
do not lock a node in common, and we need not draw an arc between them.
Suppose therefore, without loss of generality, that FIRST(Ti) is an ancestor
of FIRST(Tj). If Ti locks FIRST(Tj) before Tj does, then draw arc Ti → Tj.
Otherwise draw an arc Tj → Ti.
It can be shown that the resulting graph has no cycles, and any topological
sort of this graph is a serial order for the transactions. The intuition behind
the proof is that, at all times, each transaction has a frontier of lowest nodes
in the tree on which it holds locks. The tree protocol guarantees that these
frontiers do not pass over one another. Thus, if the frontier of Ti begins above
the frontier of Tj, it must remain so, and every item locked by both Ti and Tj
will be locked by Ti first.
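A sketch of this construction (Python; the schedule format and the parent map
describing the tree of items are assumptions made for the example):

    def tree_serial_arcs(schedule, parent):
        # schedule: (transaction, "LOCK"/"UNLOCK", item) triples obeying the tree protocol
        # parent: maps each item to its parent in the tree (the root maps to None)
        first, lock_pos, items_of = {}, {}, {}
        for pos, (t, act, item) in enumerate(schedule):
            if act == "LOCK":
                first.setdefault(t, item)            # FIRST(T): the first item T locks
                lock_pos[(t, item)] = pos
                items_of.setdefault(t, set()).add(item)

        def is_ancestor(a, b):                       # is item a an ancestor of item b?
            while b is not None:
                if b == a:
                    return True
                b = parent.get(b)
            return False

        arcs, ts = set(), list(first)
        for i, ti in enumerate(ts):
            for tj in ts[i + 1:]:
                if not (items_of[ti] & items_of[tj]):
                    continue                          # no common item: no arc needed
                # let hi be the transaction whose FIRST item is the ancestor
                hi, lo = (ti, tj) if is_ancestor(first[ti], first[tj]) else (tj, ti)
                if lock_pos[(hi, first[lo])] < lock_pos[(lo, first[lo])]:
                    arcs.add((hi, lo))                # hi locked FIRST(lo) first, so hi precedes lo
                else:
                    arcs.add((lo, hi))
        return arcs

Any topological sort of the resulting graph (for instance, using the sort in the
sketch after Algorithm 9.1) is then a serial order for the transactions.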
(1) LOCK A
(2) LOCK B
(3) LOCK D
(4) UNLOCK B
(5) LOCK B
(6) LOCK C
(7) LOCK E
(8) UNLOCK D
(9) LOCK F
(10) UNLOCK A
(11) LOCK G
(12) UNLOCK C
(13) UNLOCK E
(14) LOCK E
(15) UNLOCK F
(16) UNLOCK B
(17) UNLOCK G
(18) UNLOCK E

Figure 9.23  A schedule of three transactions obeying the tree protocol.
4.
It obeys the two-phase protocol, in the sense that all unlocks follow all
warnings and locks.
We assume that this protocol acts in conjunction with the simple scheduler that
allows any lock to be placed on an item A only if no other transaction has a
lock or warning on A, and allows a warning to be placed on A as long as no
transaction has a lock on A.
Figure 9.25  A hierarchy of items, including D, E, F, and G.
Example 9.17: Figure 9.25 shows a hierarchy, and Figure 9.26 is a schedule
of three transactions obeying the warning protocol. Notice, for example, that at
step (4) T1 places a warning on B. Therefore, T3 was not able to lock B until
T1 unlocked its warning on B at step (10). However, at steps (1)-(3), all three
transactions place warnings on A, which is legal.
The lock of C by T2 at step (5) implicitly locks C, F, and G. We assume
that any or all of these items are changed by T2 before the lock is removed at
step (7). D
Theorem 9.6: Legal schedules of transactions obeying the warning protocol
are serializable.
Proof: Parts (1)-(3) of the warning protocol guarantee that no transaction
can place a lock on an item unless it holds warnings on all of its ancestors. It
follows that at no time can two transactions hold locks on two ancestors of the
same item. We can now show that a schedule obeying the warning protocol is
equivalent to a schedule under the model of Section 9.2, in which all items are
locked explicitly (not implicitly, by locking an ancestor). Given a schedule S
satisfying the warning protocol, construct a schedule R in the model of Section
9.2 as follows.
1. Remove all warning steps, and their matching unlock steps.
2. Replace all locks by locks on the item and all its descendants. Do the same
for the corresponding unlocks.
Let R be the resulting schedule. Its transactions are two-phase because those of
S are two-phase, by part (4) of the warning protocol. We have only to show that
(1) WARN A
(2) WARN A
(3) WARN A
(4) WARN B
(5) LOCK C
(6) LOCK D
(7) UNLOCK C
(8) UNLOCK D
(9) UNLOCK A
(10) UNLOCK B
(11) LOCK B
(12) WARN C
(13) LOCK F
(14) UNLOCK A
(15) UNLOCK B
(16) UNLOCK F
(17) UNLOCK C
(18) UNLOCK A

Figure 9.26  A schedule of three transactions obeying the warning protocol.
              Warn     Lock
    Warn        Y        N
    Lock        N        N

Figure 9.27  Compatibility of warnings and locks.
The methods for recovery from system-wide software and hardware failures
will be discussed in Section 9.10. Here, we consider only the problems caused by
single-transaction failures or by the scheduler's decision to abort a transaction.
Commitment of Transactions
When dealing with transactions that may abort, it helps to think of active
transactions, which have not yet reached the point at which we are sure they
will complete, and completed transactions, which we are sure cannot abort for
any of the reasons suggested by (1)-(3) above, such as an attempted illegal step
or involvement in a deadlock. The point in the transaction's execution where
it has completed all of its calculation and done everything, such as ask for
locks, that could possibly cause the transaction to abort, we call the commit
point. In what follows, we shall assume that COMMIT is an action taken by
transactions, just like locking, writing, and computation in the workspace are
steps. In Section 9.10 we shall see that particular actions must be taken when
reaching the commit point, but for the moment, let us simply regard the COMMIT
action as marking the commit point of the transaction.
Transactions That Read "Dirty" Data
In several of the examples we have seen so far, transactions read items that
had been written by other transactions, and the reading occurred prior to the
commit point of the writing transaction. For example, in Figure 9.17, T3 reads
C at step (5), and the value it reads was written by T1 at step (4), yet T1
could not possibly have committed until step (7), when it wrote the value of
B.11 Data written into the database by a transaction before that transaction
commits is called dirty data.
We are severely punished for reading dirty data in any situation where the
writing transaction could abort. The following example illustrates what can
happen.
Example 9.18: Consider the two transactions of Figure 9.28. Fundamentally
these transactions follow the model of Section 9.2, although to make clear cer
tain details of timing, we have explicitly shown commitment, reads, writes, and
the arithmetic done in the workspace of each transaction. We assume that the
WRITE action stores a value in the database, while arithmetic steps, such as (3),
are done in the workspace and have no effect on the database.
Suppose that after step (14) transaction T1 fails, perhaps because division
by 0 occurred at step (14), or because a deadlock involving other transactions
caused the scheduler to decide to abort T1. We have to take the following
actions.
11 Recall we have assumed that writing of an item occurs at the time that item is unlocked.
(1)  T1: LOCK A
(2)  T1: READ A
(3)  T1: (an arithmetic step in T1's workspace)
(4)  T1: WRITE A
(5)  T1: LOCK B
(6)  T1: UNLOCK A
(7)  T2: LOCK A
(8)  T2: READ A
(9)  T2: A:=A*2
(10) T1: READ B
(11) T2: WRITE A
(12) T2: COMMIT
(13) T2: UNLOCK A
(14) T1: B:=B/A

Figure 9.28  Two transactions.
1. T1 still holds a lock on item B. That lock must be removed by the system.
2. The value of A written by T1 at step (4) must be restored to the value A
had prior to step (1). There appears not to be a record of the old value
of A; when we discuss recovery from crashes in Section 9.10 we shall see it
is essential in this situation that we have written the old value of A into a
"journal" or "log."
3. The value of A read by T2 at step (8) is now wrong. T2 has performed
an incorrect calculation and written its result in A. Thus, not only do we
have to restore the value of A from before step (1), we have to undo all
effects of T2 and rerun that transaction.12
4. Suppose that some other transaction T3 read the value of A between steps
(13) and (14). Then T3 is also using invalid data and will have to be
redone, even if, like T2, it has already reached its commit point. Further,
any transaction that read a value written by T3 will have to be redone, and
so on. D
The phenomenon illustrated in point (4) of Example 9.18 is called cascading
rollback. It is the consequence of our decision to allow T2 to read dirty data. That
is, once we allow even one transaction to read dirty data, then completion of
any transaction T is no guarantee that sometime in the far future it will be
discovered that T read a value which should not have been there, and therefore
12 In this simple example, nothing but A was changed by T2.
T must be redone.
A transaction cannot write into the database until it has reached its commit
point.
A transaction cannot release any locks until it has finished writing into the
database; therefore locks are not released until after the commit point.
We want the best possible throughput, i.e., the largest possible rate of
transaction completion for a machine of given processing speed and for a
given mix of transactions.
If one or more locks are unavailable, T is put on a queue to wait. This scheme
clearly avoids deadlock resulting from competition for locks (which is the only
resource that, in our model, can cause deadlock).
We must be more careful if livelock is to be avoided. To prevent livelock,
the scheduler cannot allow a transaction T to proceed, even if all of its desired
locks are available, as long as any transaction in the queue is waiting for one of
the locks T needs. Furthermore, once a transaction enters the queue, it cannot
proceed, even when all of its locks are available, if there is another transaction
ahead of it in the queue that wants any of the same locks.14
Example 9.20: Suppose there is a sequence of transactions
U0, V1, U1, V2, U2, ...
with the property that each Ui locks item A, while each Vi locks B and C.
Suppose also that a transaction T1 is initiated immediately after U0, and T1
needs locks on A and B. Then T1 must wait for its lock on A and is placed
on the queue. However, before U0 terminates, V1 initiates and is granted locks
on B and C. Thus, when U0 terminates, it releases the lock on A, but now,
B is unavailable, and T1 cannot proceed. Before V1 terminates, U1 initiates,
which again prevents T1 from proceeding when V1 releases its lock on B. In
this manner, T1 can wait in the queue forever.
However, following the livelock-prevention policy described above, the
scheduler should not grant a lock on B to V1, because T1 is waiting on the
queue, and T1 wants a lock on B. Thus, the correct action by the scheduler
when V1 requests its locks is to place V1 on the queue behind T1. Then, when
U0 finishes, the locks on both A and B become available and are granted to T1.
When T1 finishes, its lock on B is released and given to V1, along with the lock
on C, which we assume has remained available.
A similar example can be developed where, if we do not follow the livelock-prevention
policy outlined above, T1 waits at the front of the queue forever,
never in a situation where all of its locks are available at once, while transactions
behind it on the queue are repeatedly given all of their locks. This construction
is left as an exercise. D
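The queueing policy of this example and of Theorem 9.7 can be sketched as follows
(Python; the class and method names are ours, the sketch ignores lock modes, and it
is only an illustration of the policy, not a real scheduler):

    class ConservativeScheduler:
        # Grants a transaction all of its locks at once, or queues it. A queued
        # transaction also blocks any later request that wants one of the same
        # locks, so no transaction can wait forever (no livelock).
        def __init__(self):
            self.locked = set()                  # items currently locked
            self.queue = []                      # waiting (transaction, wanted items), oldest first

        def request(self, txn, items):
            items = set(items)
            wanted_by_waiters = set().union(*(w for _, w in self.queue)) if self.queue else set()
            if not (items & self.locked) and not (items & wanted_by_waiters):
                self.locked |= items             # grant everything at once
                return True
            self.queue.append((txn, items))      # otherwise wait in line
            return False

        def release(self, items):
            # called when a transaction finishes; then grant waiters in queue order
            self.locked -= set(items)
            still_waiting, granted, reserved = [], [], set()
            for txn, wanted in self.queue:
                if not (wanted & self.locked) and not (wanted & reserved):
                    self.locked |= wanted
                    granted.append(txn)
                else:
                    still_waiting.append((txn, wanted))
                    reserved |= wanted           # earlier waiters keep later conflicting ones waiting
            self.queue = still_waiting
            return granted

In the scenario of Example 9.20, V1's request would find T1 already waiting for B
and would therefore be queued behind it, exactly as the policy requires.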
Theorem 9.7: Suppose that we use a protocol in which all locks are obtained
at the beginning of the transaction, and we use a scheduler that allows a trans
action T (which may be on the queue or not) to receive its requested locks if
and only if:
1. All of the locks are available, and
14 We can, though, grant locks to a transaction not at the head of the queue if all the locks
are available, and no transaction ahead of it on the queue wants any of those locks.
Thus, strictly speaking, our "queue" is not a true queue.
orders, again locking index blocks and blocks of the ORDERS relation as we go.
If we had to lock initially every block we might need during the execution
of the query of Figure 9.29, we would have to ask for a lock on every block of
the two relations and the two indices. Or, if we were using a hierarchy of locks,
we would take locks on the entire relations and indices. However, if we can let
the query run while we decide on the locks we want, we could begin with a lock
on the root of the ITEM index, examine it to find the next step on the path to
the Brie tuples, and so on. Typically, we would wind up locking only a small
fraction of the blocks.
The advantage to limiting the number of blocks that get locked is that we
can allow updating, insertion, and deletion to go on in parallel with our query,
as long as those operations don't require the rewriting of any of the blocks our
query accesses. Additionally, by taking locks as we need them, our query is
allowed to proceed even if, say, an ORDERS tuple we needed was being written
during the time we accessed the INCLUDES relation. D
Aggressive Protocols
The most aggressive version of two-phase locking requests a lock on an item
immediately before reading or writing the item. If an item is to be written after
reading, the read-lock is taken first and upgraded to a write-lock when needed.
Of course locks can only be released after all of the locks are taken, or we are
outside the realm of two-phase locking, and nonserializable behavior becomes
possible. Also, locks still must be released at the end if we wish to follow the
strict protocol.
As was mentioned, this aggressive behavior can lead to deadlocks, where
two or more transactions are each waiting to acquire a lock that another has,
and none can proceed. The possibility that locks will be upgraded from read to
write introduces another possibility for deadlock. For example, T1 and T2 each
hold a read-lock on item A and cannot proceed without upgrading their locks
to write-locks, as each wants to write a new value of A. There is a deadlock,
and either T1 or T2 must abort and run again.
Incidentally, one might suppose that we could avoid deadlocks by the trick
of ordering the items and having each transaction lock items in order. The
problem is that when running transactions like Figure 9.29 aggressively, we
cannot choose the order in which many of the blocks are locked. We have to
traverse the index in the way it was designed to be traversed, for example. If
the index is, say, a B-tree, we could order the blocks top-to-bottom, so locking
would occur in the right order, but how do we decide on the order for the index
on ITEMS, relative to the index on O# for ORDERS? If we place the latter
first, Figure 9.29 cannot get its locks in the right order. If we place the former
first, then we have problems with a query that runs in the opposite direction
from Figure 9.29, e.g., "find all the items ordered by Zack Zebra."
Choosing an Aggressive or Conservative Protocol
Suppose that the nature of items and transactions is such that the chances of two
transactions trying to lock the same item are very small. Then the probability of
deadlock is likewise small, and an aggressive protocol is best. Aborting
transactions will not reduce throughput by much, and by being aggressive we
are avoiding the excess locking and unnecessary transaction delay that was
illustrated in Example 9.21.
On the other hand, suppose that the typical transaction locks a large
enough fraction of the items that unavailable locks are the norm rather than
a rare occurrence. In this case, there is a high probability that a transaction
will be involved in a deadlock, and if we are too aggressive, the probability that
any given transaction will complete is small. Thus, the cost in wasted cycles
may be too great, and a conservative protocol can easily turn out to be more
efficient.
9.10 RECOVERY FROM CRASHES
In Section 9.8 we considered what must be done to handle single transactions
that fail. Now, we must consider the more difficult cases of software and hard
ware failure. Such failures come in two degrees of seriousness, depending on
what is lost. Memory can be divided into volatile storage, whose contents will
not survive most failures such as loss of power, and stable storage, which can
survive all but the most serious physical problems such as a head crash on a
disk or a fire. Memory and cache are examples of volatile storage, while disks
and tapes are stable. In what follows, we shall often use "secondary storage"
as a synonym for "stable storage," and "main memory" may be regarded as
meaning "volatile storage."
We shall refer to loss of volatile storage only as a system failure, while loss
of stable storage is termed a media failure. A database system that does not
lose data when a failure of one of these types occurs is said to be resilient in
the face of that kind of failure.
The Log
The most common tool for protecting against loss of data in the face of system
failures is the log or journal, which is a history of all the changes made to the
database, and the status of each transaction. That is, the following events are
recorded by appending records to the end of the log.
1. When a transaction T initiates, we append record (T, begin).
2. When transaction T asks to write a new value v for item A, we first append
the record (T, A, v). If there is the possibility that we shall have to undo
transactions, as we discussed in Example 9.18, then this record must also
include the old value of A. Also, if item A is a large object, such as a
relation or memory block, we would be better off letting v be an encoding
of the changes to A (e.g., an "insert tuple" operation) rather than the entire new value of A.
3. If transaction T commits, we append (T, commit).
4. If transaction T aborts, we append (T, abort).
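To make these record formats concrete, the following sketch models such a log as an append-only list of tuples. It is only an illustration; the function names and the exact tuple layout are our choices, not part of any particular system.

log = []   # in practice, a file whose tail is periodically forced to stable storage

def log_begin(t):
    log.append((t, "begin"))

def log_write(t, item, old_value, new_value):
    # The old value is recorded only because we may have to undo
    # transactions, as in Example 9.18; redo alone needs the new value.
    log.append((t, item, old_value, new_value))

def log_commit(t):
    log.append((t, "commit"))

def log_abort(t):
    log.append((t, "abort"))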
Example 9.22: The following is really an example of how a log could be
used to handle transaction abort, but it will illustrate several points about logs
and system failures. Suppose we execute the fourteen steps of Figure 9.28,
after which T1 aborts. Since a system that allows the schedule of Figure 9.28
evidently is not using strict two-phase locking, we must allow for the fact that
rollback of transactions is possible, and therefore, when we write new value v
for an item A that had old value w, we write the record (T, A, w, v). To allow
actual values to be computed, we shall assume that item A starts with the value
10.
Figure 9.30 shows the records written into the log and indicates the step at
which each log entry is written. As we shall see, it is essential that the log entry
be written before the action it describes actually takes place in the database.
□
Step            Entry
Before (1)      (T1, begin)
(4)             (T1, A, 10, 9)
Before (7)      (T2, begin)
(11)            (T2, A, 9, 18)
(12)            (T2, commit)
After (14)      (T1, abort)

Figure 9.30 Log entries for Example 9.22.
Example 9.22 also suggests how the system could use the log to recover
from the failure which we suppose happened after step (14) of Figure 9.28. It
will also suggest some of the problems faced when we do not use the strict
protocol. First, we examine the log and discover that T1 has aborted, and
therefore we must roll back the database to its state before T1 began. It is not
hard to find the record (T1, begin) by scanning backwards from the end of the
log. We can also find the record (T1, A, 10, 9) and discover that 10 is the value
that must be restored to A.
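The rollback just described amounts to a backward scan over such a log. The sketch below assumes the record layout used in this example, with both old and new values present; note that restoring A to 10 also discards the value written by T2, illustrating why rollback can cascade when the strict protocol is not used.

def undo(log, db, t):
    """Roll back transaction t by scanning the log backwards, restoring
    the old value recorded with each of t's write records."""
    for record in reversed(log):
        if record[0] != t:
            continue
        if len(record) == 2:            # a status record: begin, commit, abort
            if record[1] == "begin":
                break                   # nothing of t lies before this point
            continue
        _, item, old_value, _new_value = record
        db[item] = old_value            # restore the value t overwrote

db = {"A": 18}
log = [("T1", "begin"), ("T1", "A", 10, 9),
       ("T2", "begin"), ("T2", "A", 9, 18),
       ("T2", "commit"), ("T1", "abort")]
undo(log, db, "T1")
print(db["A"])                          # prints 10, as in the text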
records per log block, then we need to do only 1/nth as much block writing as
if we wrote the block of the affected item each time a write occurred. Of course,
the paging strategy will probably cause some fraction of the database blocks to
be written out anyway, during the time it takes to fill up one log block. Yet
we are still likely to save time if we write log blocks into stable storage as they
are created (or after each transaction, which we shall see is required) and write
database blocks into stable storage only when required by the paging manager.
A Resilient Protocol
We are now ready to discuss a protocol that is resilient in the face of sys
tem failures. There are several other methods in use, but this one is probably
the simplest to understand and implement; others are mentioned in the bibli
ographic notes. This protocol is called the redo protocol, because to recover
from system failure we have only to redo certain transactions, never undo them
as was the case in Example 9.22.
The redo protocol is a refinement of strict two-phase locking. On reaching
the commit point of a transaction T, the following things must happen in the
order indicated.
1. For each item A for which a new value, v, is written by the transaction T,
append (T, A, v) to the log.
2. Append the record (T, commit) to the log.
3. Write to stable storage the block or blocks at the end of the log that have
not yet been written there. At this point, T is said to be committed.
4. For each item A, write its new value v into the place where A belongs
in the database itself. This writing may be accomplished by bringing the
block for A to main memory and doing the update there. It is optional to
write the block of A back into stable storage immediately.
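As a sketch of the four steps above, assuming the simple list-based log from earlier and an in-memory database dictionary (the names are ours, not the book's):

def redo_commit(t, new_values, log, stable_log, db):
    """The four commit-time steps of the redo protocol for transaction t.
    new_values maps each item t wrote to its new value."""
    for item, v in new_values.items():          # 1. log every new value
        log.append((t, item, v))
    log.append((t, "commit"))                   # 2. append the commit record
    stable_log.extend(log[len(stable_log):])    # 3. force the log tail to stable
                                                #    storage; T is now committed
    for item, v in new_values.items():          # 4. install the new values; writing
        db[item] = v                            #    the blocks back to stable
                                                #    storage may be deferred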
Example 9.23: In Figure 9.31 we see a transaction T following the redo pro
tocol, and next to T we see the log entries made in response to T. The commit
point is reached between steps (3) and (4); we assume all calculations in the
workspace are performed between steps (3) and (4). The log is written into
stable storage between steps (6) and (7). □
The "Redo" Recovery Algorithm
When a system failure occurs, we execute a recovery algorithm that examines
the log and restores the database to a consistent state, i.e., one that results from
the application of some sequence of transactions. It is also necessary that any
locks held at the time of the crash be released by the system, since either the
transaction that held them will be reexecuted and will ask for them again, or
the transaction has already committed but not released its locks. In the latter
case, the transaction will not be resumed, and so will not have an opportunity
Step        T               Log
(1)                         (T, begin)
(2)         LOCK A
(3)         LOCK B
(4)                         (T, A, v)
(5)                         (T, B, w)
(6)                         (T, commit)
(7)         WRITE A
(8)         WRITE B
(9)         UNLOCK A
(10)        UNLOCK B

Figure 9.31 A transaction following the redo protocol, and the log entries made in response.
2. The copy of A in main memory was updated, but the crash occurred before
the block holding A was written into stable storage.
Checkpointing
One might suppose from our example that recovery only involved scanning the
log for entries made by the most recent transaction or a few recent transactions.
In truth, there may be no limit to how far back in the log we must look. We
need to find a point far enough back that we can be sure any item written
before then has had its main-memory copy written into stable storage.
Unfortunately, depending on the paging strategy used, it may be hard or
easy to find such a point, or to be sure one exists. In the extreme case, the
entire database fits into main memory, and there is thus no reason why the page
manager ever needs to write a page onto secondary memory. Thus, a database
system needs occasionally to perform a checkpoint operation, which guarantees
that any prior writes have been copied to stable storage. The easiest way to
perform checkpointing is to do the following.
1. Temporarily forbid the initiation of transactions and wait until all active
transactions have either committed or aborted.
2. Find each block whose main-memory copy has been updated but not recopied into secondary memory. A bit associated with each page in the page
table can warn us that the page has been modified.
3. Copy the blocks found in (2) into secondary storage.
4. Append to the end of the log a record indicating that a checkpoint has
occurred, and copy the end of the log onto stable storage.
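One way to realize these four steps, continuing the same sketch (the page table and the other structures here are hypothetical stand-ins), is:

def checkpoint(active_transactions, page_table, stable_db, log, stable_log):
    """Carry out the four checkpoint steps; page_table maps a block
    number to (contents, dirty)."""
    # 1. New transactions are not admitted; wait until none is active.
    assert not active_transactions
    # 2-3. Copy every modified ("dirty") block back to the stable database.
    for block, (contents, dirty) in page_table.items():
        if dirty:
            stable_db[block] = contents
            page_table[block] = (contents, False)
    # 4. Record the checkpoint and force the log tail to stable storage.
    log.append(("checkpoint",))
    stable_log.extend(log[len(stable_log):])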
If we need to recover from a crash, we run the redo algorithm, but we only
consult the log as far back as the most recent checkpoint. In fact, the log prior
to the most recent checkpoint record will never be consulted for recovery from
a system failure, and as long as that part of the log is not needed for any other
purpose, such as to help recover from a media failure or to act as a record of
activity in case of a security violation, it can be discarded.
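A matching sketch of recovery is then a forward scan of the stable log starting just after the most recent checkpoint record, redoing the writes of every transaction whose commit record appears there; again, the structures are the hypothetical ones used above.

def recover(stable_log, db):
    """Redo recovery, consulting the log no further back than the most
    recent checkpoint record."""
    start = 0
    for i, record in enumerate(stable_log):
        if record[0] == "checkpoint":
            start = i + 1                       # remember the latest checkpoint
    tail = stable_log[start:]
    committed = {r[0] for r in tail if len(r) == 2 and r[1] == "commit"}
    for record in tail:                         # redo, in order, every write
        if len(record) == 3 and record[0] in committed:
            t, item, new_value = record
            db[item] = new_value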
Example 9.25: If we decide to do a checkpoint during the time that transac
tion T of Figure 9.31 runs, we must wait until after step (10). 17 By the end of
step (10), the values of A and B have at least been written into main memory.
To perform the checkpoint, the values of A and B (and any other items that
were updated in main memory only) are written into secondary memory. A
checkpoint record is appended to the log somewhere after step (10). If we need
to recover from a later crash, the existence of the checkpoint record will prevent
the redo algorithm from consulting the log as far back as transaction T. That
is the right thing to do, because the effects of T and previous transactions have
already appeared in stable storage, and so were not lost during the crash. □
Evidently, checkpointing incurs some cost; not only might we have to do a
17 If a crash occurs before then, the checkpoint will not have occurred and will not be
written into the log.
lot of writing from main to secondary storage, but we need to delay transactions
that want to initiate during the checkpoint process. Fortunately, checkpointing
does not have to occur too frequently, since as long as crashes are rare, we would
rather spend a lot of time recovering (examining a long log) than spend a lot
of time during normal operation protecting against a time-consuming recovery
process.
2. issue a message that the transaction was lost, since we shall find a begin
record in one of the logs.
When checkpointing, we must make sure that any blocks that were modified
since the last checkpoint are also copied into the archive. These blocks
include those that were in main memory at the time the current checkpoint
operation started, and which therefore will be copied into the stable storage
of the database itself during the checkpoint operation. Also included are
blocks that were previously written into the stable storage of the database,
during the normal operation of the page manager. These blocks can be
found by examining the log, since the last checkpoint, for items whose
values have been written at least once.
by T2, we conclude that in this case, the order based on lock points agrees with
the actual schedule. Put another way, if we use two-phase locking, the schedule
seen in Figure 9.32 is a possible schedule, but if we use timestamps to control
concurrency, we could not allow such a sequence of events.
          T1           T2
(1)       READ B
(2)                    READ A
(3)                    WRITE C
(4)       WRITE C

Figure 9.32 A schedule possible under two-phase locking but not under timestamp-based concurrency control.
On the other hand, consider the schedule of Figure 9.33. We can tell that
T2 did not reach its lock point until after step (7), because T1 had a lock on
B until that time, and therefore, T2 could not have locked B until that time.
However, T3 finished by step (6), and therefore reached its lock point before T2
did. Thus, in a serial schedule based on lock points, T3 precedes T2. However,
evidently, in a serial order based on the time of initiation, T2 precedes T3.
Which of these orders can be correct? Only the order T2, T3 could appear in an
equivalent serial schedule, because in Figure 9.33, T3 writes a value of A after
T2 reads A, and if the serial order had T3 before T2, then T2 would erroneously
read the value written by T3. Thus, Figure 9.33 is an example of a schedule
we could see if timestamps were used to control concurrency, but not if locking
were used. □
[Figure 9.33: an eight-step schedule of T1, T2, and T3 over items A, B, C, and D, with operations READ A, READ A, READ D, WRITE D, WRITE A, READ C, WRITE B, and WRITE B at steps (1) through (8); T3 writes A after T2 reads it and finishes by step (6), while T1 writes B at step (7) and T2 writes B at step (8).]
The point of Example 9.26 is that neither locking nor timestamps can
be said to dominate the other. Each permits some schedules that the other
forbids. Since we generally want a concurrency control method that permits as
many serializable schedules as possible, we cannot rule out either timestamps
or locking on the basis of the set of schedules they permit.
Establishing Timestamps
If all transactions must go through the scheduler to get their timestamps, then
timestamps may be created by having the scheduler keep a count of the number
of transactions it has ever scheduled, and assigning the next number to each
transaction in turn. Then, we can be sure that no two transactions get the same
timestamp, and that the relative order of timestamps corresponds to the order
in which the transactions initiated. An alternative approach is to use the value
of the machine's internal clock at the time a process initiates, as that process'
timestamp.
If there are several processes that can assign timestamps, e.g., because:
1. The database system is running on a machine with more than one proces
sor, and several incarnations of the scheduler are possible, or
2. The database is distributed over several machines, as discussed in Chapter
10,
then we must choose a unique suffix of some fixed length for each processor, and
we must append this suffix to each timestamp generated by that processor. For
example, if there were no more than 256 processors, we could append an 8-bit
sequence to each timestamp, to identify the processor. We must also arrange
that the counts or clocks used by each processor remain roughly in synchronism;
how to do so is explained in Section 10.6.
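For illustration, a timestamp of this kind can be built from a local counter and an 8-bit processor suffix, as suggested above. The class below is only a sketch, and its names are ours.

class TimestampGenerator:
    """Unique, roughly ordered timestamps: a local count in the high bits
    and an 8-bit processor identifier in the low bits."""
    SUFFIX_BITS = 8                       # enough for 256 processors

    def __init__(self, processor_id):
        assert 0 <= processor_id < (1 << self.SUFFIX_BITS)
        self.processor_id = processor_id
        self.count = 0

    def next_timestamp(self):
        self.count += 1
        return (self.count << self.SUFFIX_BITS) | self.processor_id

gen = TimestampGenerator(processor_id=3)
print(gen.next_timestamp())   # 259: count 1, processor 3
print(gen.next_timestamp())   # 515: count 2, processor 3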
Enforcing Serializability by Timestamps
Now, we must consider how timestamps are used to force those transactions
that do not abort to behave as if they were run serially. The particular scheme
we describe is analogous to a locking scheme using read- and write-locks; we
could have a timestamp-based system that did not distinguish between reading
and writing (analogous to the simple locking scheme of Section 9.2). We could
even have a timestamp-based scheme that distinguished more kinds of access,
such as incrementation, as we discussed in Example 9.10.
In the read/write scheme, we associate with each item in the database
two times, the read-time, which is the highest timestamp possessed by any
transaction to have read the item, and the write-time, which is the highest
timestamp possessed by any transaction to have written the item. By so doing,
we can maintain the fiction that each transaction executes instantaneously, at
the time indicated by its timestamp.
We use the timestamps associated with the transactions, and the read- and
write-times of the items, to check that nothing physically impossible happens.
What, we may ask, is not possible?
1. It is not possible for a transaction to read the value of an item if that
value was not written until after the transaction executed. That is, a
transaction with a timestamp t1 cannot read an item with a write-time of
t2, if t2 > t1. If such an attempt is made, the transaction with timestamp
t1 must abort and be restarted with a new timestamp.
2. It is not possible for a transaction to write an item if that item has its
old value read at a later time. That is, a transaction with timestamp t1
cannot write an item with a read-time t2, if t2 > t1. The transaction with
timestamp t1 must abort and be restarted with a new timestamp.
Notice that the other two possible conflicts do not present any problems.
Not surprisingly, two transactions can read the same item at different times,
without any conflict. That is, a transaction with a timestamp of t1 can read an
item with a read-time of t2, even if t2 > t1. Less obviously, a transaction with
timestamp t1 need not abort if it tries to write an item A with write-time t2,
with t2 > t1. We simply do not write anything into A. The justification is
that in the serial order based on timestamps, the transaction with timestamp
t1 wrote A, then the transaction with timestamp t2 wrote A. However, between
t1 and t2, apparently no transaction read A, or else the read-time of A would
exceed t1 when the transaction with timestamp t1 came to write, and that
transaction would abort by rule (2).
To summarize, the rule for preserving serial order using timestamps is the
following. Suppose we have a transaction with timestamp t that attempts to
perform an operation X on an item with read-time tr and write-time tw.
a) Perform the operation if X = READ and t ≥ tw, or if X = WRITE, t ≥ tr,
and t ≥ tw. In the former case, set the read-time to t if t > tr, and in the
latter case, set the write-time to t if t > tw.
b) Do nothing if X = WRITE and tr ≤ t < tw.
c) Abort the transaction if X = READ and t < tw, or X = WRITE and t < tr.
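The rule can be packaged as a single decision function. The sketch below assumes timestamps and read/write-times are plain numbers and that the scheduler itself aborts and restarts the transaction when "abort" is returned; the names are illustrative.

def request(op, t, item_times):
    """Decide the fate of operation op ('READ' or 'WRITE') by a transaction
    with timestamp t on an item whose times are item_times = {'rt':.., 'wt':..}.
    Returns 'perform', 'skip', or 'abort' according to rules (a)-(c)."""
    rt, wt = item_times["rt"], item_times["wt"]
    if op == "READ":
        if t >= wt:                              # rule (a)
            item_times["rt"] = max(rt, t)
            return "perform"
        return "abort"                           # rule (c): written "later"
    if t >= rt and t >= wt:                      # rule (a) for a write
        item_times["wt"] = max(wt, t)
        return "perform"
    if rt <= t < wt:                             # rule (b): an obsolete write
        return "skip"
    return "abort"                               # rule (c): item read "later"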
Example 9.27: Let us review the transactions of Figure 9.1, which are shown in
Figure 9.34, with the read-time (RT) and write-time (WT) of item A indicated
as it changes. Suppose that T1 is given timestamp 150 and T2 has timestamp
160. Also, assume the initial read- and write-times of A are both 0. Then A
would be given read-time 150 when T1 reads it and 160 at the next step, when
it is read by T2. At the fifth step, when T2 writes A, T2's timestamp, which is
160, is not less than the read-time of A, which is also 160, nor is it less than the
write-time, which is 0. Thus the write is permitted, and the write-time of A is
set to 160. When T1 attempts to write at the last step, its timestamp, which
is 150, is less than the read-time of A (160), so T1 is aborted, preventing the
        T1 (150)      T2 (160)       A
                                     RT=0, WT=0
(1)     READ A                       RT=150
(2)                   READ A         RT=160
(3)     A:=A+1
(4)                   A:=A+1
(5)                   WRITE A        WT=160
(6)     WRITE A
        T1 aborts

Figure 9.34 The transactions of Figure 9.1 controlled by timestamps.
[Figure 9.35 Transactions controlled by timestamps: three transactions with timestamps 200, 150, and 175 read and write items A, B, and C over seven steps, and T2 is eventually aborted.]
will not suffer seriously because of it. Since the timestamp concurrency control
algorithm that we described here is an aggressive strategy, we probably only
want to use it when access conflicts are rare anyway.
When we allow writing into the database before the commit point of
a transaction is reached, we also face problems if we must recover from a
system failure; it matters not whether concurrency control is maintained by
timestamps, locking or another mechanism. We still need to place records
(T, begin), and (T, commit) or (T, abort) on the log for each transaction T.
However, it is no longer adequate to simply redo the transactions T for which
(T, commit) appears. If that record does not appear, then we also must undo
any writes of T, using the old value that was placed in the log record for this
purpose. Further, undoing T may result in cascading rollback, just as if T had
aborted.
Strict Timestamp-Based Concurrency Control
To avoid cascading rollback, and to allow the redo algorithm of Section 9.10 to
suffice for recovery from system failures, we can adopt many of the ideas used for
locking-based algorithms. First, we can perform all updates in the workspace,
and write into the database only after the transaction reaches its commit point.
This approach is analogous to strict two-phase locking, as discussed in Section
9.8, and we shall refer to this protocol as "strict" as well. As in Section 9.10, we
perform our writes in two stages. First a record is written into the log, which
is copied to stable storage; second the value is written into the database itself.
Also as before, a commit record is written on the log between the two stages.
When we use timestamps, there is a subtlety that strictness introduces.
We abort transaction T if we try to write an item A and find the read-time
of A exceeds T's timestamp. Thus, the checking of timestamps must be done
prior to the commit point, because by definition, a transaction may not abort
after reaching its commit point.
Suppose, for example, T has timestamp 100, and T decides to write A. It
must check that the read-time of A is less than 100, and it must also change
the write-time of A to 100; if it does not change the write-time now, another
transaction, say with a timestamp of 110, might read A between now and the
time T reaches its commit point. In that case, T would have to check again on
the read-time of A (which is now 110) and abort after T thought it reached its
commit point.
However, now we are faced with the situation where T has changed the
write-time of A, but has not actually provided the database with the value
supposedly written at that time; T cannot actually write the value, because T
still might abort, and we wish to avoid cascading rollback. The only thing we
can do is to give transaction T what amounts to a lock on A that will hold
between the time T changes the write-time of A and the time T provides the
corresponding value. If T aborts during that time, the lock must be released
and the write-time of A restored.18
There are two different approaches to making the checks that are needed
when a transaction T reads or writes an item A:
1. Check the write-time of A at the time T reads A, and check the read-time
of A at the time T writes the value of A in its workspace, or
2. Check the read-time of A (if T wrote A) and the write-time of A (if T read
A) at the time T commits.
In either case, when writing A, we must maintain a lock on A from the time
of the check to the time the value is written. However, in approach (1), these
locks are held for a long time, while in (2) the lock is held for a brief time,
just long enough for the other items written by T to have similar checks made
on their read-times. On the other hand, strategy (2), often called optimistic
concurrency control, checks timestamps later than (1), and therefore will abort
more transactions than (1).
To summarize, the steps to be taken to commit a transaction running under
the optimistic strategy, item (2) above, are the following.
i) When the transaction T finishes its computation, we check the read-times
of all items T wants to write into the database; "locks" are taken on all
these items. If any have a read-time later than the timestamp of T, we
must abort T. Also check the write-times of items read by T, and if any
are too late, abort T. Otherwise, T has reached its commit point.
ii) Write T's values into the log.
iii) Append a commit record for T to the log and copy the tail of the log into
stable storage.
iv) Write T's values into the database.
v) Release the "locks" taken in step (i).
If we use strategy (1), then the only difference is that step (i) is not done.
Rather, the "locks" will have been taken, and the checks made, during the
running of the transaction.
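The five steps might be packaged as follows. This is only a sketch; the item table, log, and lock representation are chosen for illustration and are not from the text.

def optimistic_commit(t, read_set, write_values, items, log, stable_log, db):
    """Steps (i)-(v) for a transaction with timestamp t under the optimistic
    strategy.  items maps each name to {'rt': .., 'wt': .., 'lock': ..}."""
    locked = sorted(write_values)                     # (i) take brief "locks"
    for name in locked:
        items[name]["lock"] = t
    late_read = any(items[name]["rt"] > t for name in write_values)
    late_write = any(items[name]["wt"] > t for name in read_set)
    if late_read or late_write:                       # someone got there first
        for name in locked:
            items[name]["lock"] = None
        return "abort"
    for name, v in write_values.items():              # (ii) log the new values
        log.append((t, name, v))
    log.append((t, "commit"))                         # (iii) commit record, forced
    stable_log.extend(log[len(stable_log):])          #       to stable storage
    for name, v in write_values.items():              # (iv) write the database
        db[name] = v
        items[name]["wt"] = max(items[name]["wt"], t)
    for name in locked:                               # (v) release the "locks"
        items[name]["lock"] = None
    return "commit"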
A Multiversion Approach
To this point we have assumed that when we write a new value of an item,
the old value is discarded. However, there are some applications where it is
desirable to keep the history of an item available in the database. For example,
a hospital may wish to store not only a patient's temperature today, but his
temperature throughout his stay. The hospital may in fact wish to retain records
18 If we do not restore the write-time, then a transaction with timestamp 90, say, might
assume it did not have to write its value of A because its value would be overwritten
before being read in the equivalent serial schedule.
        T1 (100)      T2 (200)       A               B
                                     RT=0, WT=0      RT=0, WT=0
(1)     READ A                       RT=100
(2)                   READ A         RT=200
(3)                   WRITE B                        WT=200
(4)     READ B                                       RT=100
(5)     WRITE A                      WT=100

Figure 9.36 A schedule using multiple versions.
and T1 creates a new version of A with write-time 100; we call these B1 and
A1, respectively. The advantage of multiple versions is seen at step (4), where
T1 reads B. Since T1 has timestamp 100, it needs to see the value of B that
existed at that time. Even though T2 wrote B at step (3), the value B0, which
existed from time 0 to time 199, is still available to T1, and this value is the
one returned to T1 by the scheduler. □
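The version-selection rule this example illustrates, namely read the version with the largest write-time not exceeding the reader's timestamp, can be sketched as follows (the representation of versions is ours):

def read_version(versions, t):
    """Return the value a transaction with timestamp t should read:
    the version with the largest write-time <= t.
    versions is a list of (write_time, value) pairs."""
    candidates = [(wt, v) for wt, v in versions if wt <= t]
    wt, value = max(candidates)      # largest qualifying write-time
    return value

B = [(0, "B0"), (200, "B1")]         # the two versions of B in Figure 9.36
print(read_version(B, 100))          # T1 (timestamp 100) still sees B0
print(read_version(B, 200))          # a timestamp-200 reader sees B1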
Multiversion scheduling is the most conservative variety of timestamp-based concurrency control that we have covered. It clearly causes fewer aborts
than the other approaches studied in this section, although it causes some aborts
that conservative two-phase locking would not cause (the latter causes none at
all). The disadvantages of multiversion scheduling are that:
1. We use extra space,
2. The retrieval mechanism is more complicated than for single-version meth
ods, and
3. The DBMS must discover when an old version is no longer accessible to
any active transaction, so the old version can be deleted.
We leave the discovery of an algorithm to achieve (3) as an exercise.
Summary
There are four variants of timestamp-based concurrency control that we have
considered in this section.
1. Unconstrained, with cascading rollback possible.
2. Check read- and write-times when an item is read from the database or
written in the workspace.
3. Check read- and write-times just before the commit point (optimistic con
currency control).
4. Multiversion method. Here we could handle the read- and write-time
checks as in any of (1)-(3); we shall assume (2).
The relative advantages of each are summarized in Figure 9.37.
Restart of Transactions
The timestamp-based methods we have covered do not prevent livelock, a situ
ation where a transaction is aborted repeatedly. While we expect transactions
to be aborted rarely, or the whole approach should be abandoned in favor of
the locking methods described earlier, we should be aware that the potential
for cyclic behavior involving only two transactions exists.
Example 9.31: Suppose we have transaction T1 that writes B and then reads
A, while T2 writes A and then reads B.19 If T1 executes, say with timestamp

19 These transactions may read and write other items, so writing before reading need not
mean that the transactions are unrealistic.
[Figure 9.37 summarizes the four methods under the headings Locks, Abort, Rollback, and Weak Point: the unconstrained method takes no locks but may require cascading aborts; the method that checks times when items are read or written holds its "locks" a long time; the optimistic method holds them only a short time but aborts more often; and the multiversion method aborts least often. For all but the unconstrained method, the redo algorithm suffices for recovery.]
        T1            T2             A               B
        100           110            RT=0, WT=0      RT=0, WT=0
(1)     WRITE B                                      WT=100
(2)                   WRITE A        WT=110
(3)     READ A; T1 aborts and restarts with timestamp 120
(4)     WRITE B                                      WT=120
(5)                   READ B; T2 aborts and restarts with timestamp 130
(6)                   WRITE A        WT=130
(7)     READ A; T1 aborts again

Figure 9.38 T1 and T2 repeatedly abort each other.
The solution to the problem indicated by Example 9.31 is not easy to find.
Probably the simplest approach is to use a random number generator to select a
random amount of time that an aborted transaction must wait before restarting.
EXERCISES
9.1: In Figure 9.39 we see a schedule of four transactions. Assume that write-locks
imply reading, as in Section 9.4. Draw the serialization graph and
determine whether the schedule is serializable.
[Figure 9.39: a seventeen-step schedule in which four transactions issue RLOCK, WLOCK, and UNLOCK operations on items A, B, and C.]
9.2: Repeat Exercise 9.1 under the assumptions of Section 9.6, where a write-lock does not imply that the value is read.
* 9.3: In Figure 9.40 are two transactions. In how many ways can they be sched
uled legally? How many of these schedules are serializable?
T1:  LOCK A;  LOCK B;  UNLOCK A;  UNLOCK B
T2:  LOCK B;  UNLOCK B;  LOCK A;  UNLOCK A

Figure 9.40 Two transactions.
but one cannot place a read-lock when the other has a write-warning.
9.9: Two lock modes are equivalent in a given compatibility matrix if they have
identical rows and columns. Show that there are only five inequivalent lock
modes in your table from Exercise 9.8.
* 9.10: Suppose a set of items forms a directed, acyclic graph (DAG). Show that
the following protocol assures serializability.
i) The first lock can be on any node.
ii) Subsequently, a node n can be locked only if the transaction holds a
lock on at least one predecessor of n, and the transaction has locked
each predecessor of n at some time in the past.
* 9.11: Show that the following protocol is also safe for DAG's.
* 9.12:
9.13:
* 9.14:
* 9.15:
* 9.16:
iii) Long delays; i.e., the short transactions must wait for the sum-of-balances
to complete.
iv) Cascading rollback in case of system failure.
v) Inability to recover from system (not media) failures.
Indicate which of these problems may occur if we use each of the following
concurrency-control strategies.
a) Strict two-phase locking, with locks taken at the time a transaction
needs them.
b) Strict two-phase locking, with all locks taken at the beginning of the
transaction.
c) Nonstrict two-phase locking, with locks taken at the time they are
needed and released as soon after the lock point as they are no longer
needed.
d) Non-two-phase locking, with locks taken immediately before reading
or writing and released immediately after reading or writing.
e) Timestamp-based, optimistic concurrency control, with timestamps
checked at the end of the transaction.
f) As in (e), but with timestamps checked when the items are read or
written.
g) A multiversion, timestamp-based scheme, with appropriate versions
read, and timestamps checked at the time of writing.
* 9.17: How do the fractions of cycles lost to aborted transactions compare, in
the situation of Exercise 9.16, for the seven concurrency-control methods
listed?
* 9.18: Suppose that in the situation of Exercise 9.16 we instead used a hierarchy
of items and the "warning protocol." That is, it is possible to place a read-
or write-warning on the entire relation R. How would this approach fare
with respect to problems (i)-(v) mentioned in Exercise 9.16?
* 9.19: Extend the idea of warnings on a hierarchy of items to timestamp-based
concurrency control.
* 9.20: In Example 9.20 we mentioned that it was possible for transaction T1
to wait at the beginning of the queue forever, if locks could be given to
following transactions on the queue, should all locks that such a transaction
needs be available. Give an example of such a situation.
9.21: In Figure 9.41 is a list of transactions and the items they lock. We suppose
that these five transactions become available to initiate in the order shown.
T1, the first to initiate, finishes after T3 becomes available but before T4
does. No other transaction finishes until after all five become available.
Suppose we use the conservative, deadlock- and livelock-avoiding protocol
of Theorem 9.7. Indicate the order in which the transactions actually
BIBLIOGRAPHIC NOTES
Much of the theory and practice of transaction management was first organized
in the survey by Gray [1978]. Papadimitriou [1986] is an excellent summary of
the theory of concurrency control in database systems, and Bernstein, Hadzilacos, and Goodman [1987] is likewise an important study of the theory and
pragmatics of the subject. The organization of concurrency-control policies
into families such as "strict," "aggressive," and "conservative," which we have
followed here, comes from the latter text.
Serializability
Eswaran, Gray, Lorie, and Traiger [1976] is the origin of the notion of serializability as the appropriate notion of correctness for concurrent database sys
tems. Similar ideas appeared in Stearns, Lewis, and Rosenkrantz [1976], which
includes the notion of a serialization graph.
The model of Section 9.6 and the polygraph-based serializability test are
from Papadimitriou, Bernstein, and Rothnie [1977] and Papadimitriou [1979].
Locking
The two-phase locking protocol is from Eswaran, Gray, Lorie, and Traiger
[1976].
Some recent studies of the performance of different locking policies are
found in Tay, Goodman, and Suri [1985], Tay, Suri, and Goodman [1985], and
Franaszek and Robinson [1985].
Lock Modes
The theory of lock modes is discussed in Korth [1983]. Stonebraker [1986]
discusses the use of lock modes as a technique for implementing more expressive
query languages, such as logic-based languages.
Lock Granularity
The choice of lock granularity is discussed by Gray, Lorie, and Putzolu [1975]
and Ries and Stonebraker [1977, 1979].
Non-Two-Phase Locking
Two-phase locking is necessary and sufficient to assure serializability when an
abstract model of transactions, such as appeared in Sections 9.2, 9.4, and 9.6,
is used. If we model transactions in more detail, e.g., by using normal seman
tics for arithmetic operations, then we can use less restrictive protocols and
still have serializability. This theory has been developed by Kung and Pa
padimitriou [1979], Yannakakis, Papadimitriou, and Kung [1979], Yannakakis
nodes could be dedicated telephone lines, for example. While it is possible that
there is a link from every node to every other, it is more likely that only a
subset of the possible links exist.
Whether we are talking about a local-area network or a more widely dis
tributed network, it should be apparent that communication between nodes is
likely to be costly. In the local-area network, the capacity is large, but small
messages such as "please grant me a lock on item A" bear considerable overhead.
In a network composed of phone lines, the rate at which data can be transmitted
is low compared with the instruction-execution speed of a computer. In either
case, we are motivated to keep communication to a minimum as we execute
transactions, manage locks or timestamps, and commit transactions.
Resiliency of Networks
Naturally, a distributed database is vulnerable to a failure at any of its nodes.
The links between nodes may also fail, either because the link itself fails, or
because the computer at either end fails. We would like the distributed database
system to continue to function when a link or node fails; i.e., the system should
be resilient in the face of network failure.
One way to promote resiliency is to keep more than one copy of each item,
with different copies at different sites. If we do so, then part of the transaction
management problem for the distributed database is to guarantee that all of
the copies have the same value; more specifically, we must ensure that a change
to an item with several copies appears to be an atomic operation. That is
especially difficult if one of the copies of item A is at a node that has failed.
When a node N with a copy of A fails, we may access other copies of A; that
ability is what the redundancy provided by the multiple copies buys. When
node N eventually recovers, it is necessary that the changes made to A at the
other nodes are made at N as well.
A more complex failure mode occurs when a link or links fail and thereby
partition the network into two or more pieces that cannot communicate. For
example, any tree becomes disconnected if a nonleaf node fails, or if any link
fails.
Example 10.1: The failure of node D in the tree of Figure 10.1 disconnects
the tree into three pieces {A, B, C}, {E}, and {F, G, H}. The failure of the link
(B,D) separates the network into two pieces, {A,B, C} and {D, E, F, G, H}.
□
Disconnection of the network makes it more difficult to keep the database
system operational. For one problem, all the copies of an item may be in one
block of the network partition, and the other blocks cannot access that item.
For another problem, in different blocks, different changes may be made to
the same item, and these changes will have to be integrated when the network
[Figure 10.1: a tree-shaped network on nodes A through H, with D an interior node and (B, D) one of its links.]
of taking a logical lock is translated into taking physical locks, in such a way
that the logical lock appears to be granted as an atomic action.
Global Transactions, Local Subtransactions, and Serializability
Our first task, as we extend concurrency control concepts from the single-site
case to the distributed case, is to consider how locks on logical, or global, items
can be built from locks on physical, or local, items. The only thing we can do
with physical items is to take a lock on a single physical copy Ai of a logical item
A, by requesting the lock from the lock manager that is local to the site of Ai.
Whatever we do with physical copies must support the properties we expect
from locks on the logical items. For example, if we use read- and write-locks,
then we need to know that at no time can two transactions hold write-locks,
or a read- and a write-lock, on the same logical item. However, any number of
transactions should be able to get read-locks on the same logical item at the
same time.
If there is but one copy of an item, then the logical item is identical with
its one physical copy. Thus, we can maintain locks on the logical item if and
only if we maintain locks on the copy correctly. Transactions wishing to lock
547
an item A with one copy send lock-request messages to the site at which the
copy resides. The lock manager at that site can grant or deny the lock, sending
a back a message with its decision in either case.
However, if there are several copies of an item, then the translation from
physical locks to logical locks can be accomplished in several ways, each with
its advantages. We shall consider some of these approaches and compare the
numbers of messages required by each.
Write-Locks-All, Read-Locks-One
A simple way to maintain logical locks is to maintain ordinary locks on copies of
items, and require transactions to follow a protocol consisting of the following
rules defining locks on logical items.
1. To obtain a read-lock on logical item A, a transaction may obtain a read-lock on any copy of A.
2. To obtain a write-lock on logical item A, a transaction must obtain write-locks on all the copies of A.
This strategy will be referred to as write-locks-all.
At each site, the rules for granting and denying locks on copies are exactly
the same as in Chapter 9; we can grant a read-lock on the copy as long as no
other transaction has a write-lock on the copy, and we can only grant a write-lock on the copy if no other transaction has either a read- or write-lock on the
copy.
The effect of these rules is that no two transactions can hold a read- and a
write-lock on the same logical item A at the same time. For to hold a write-lock
on logical item A, one transaction would have to hold write-locks on all the
physical copies of A. However, to hold a read-lock on A, the other transaction
would have to hold a read-lock on at least one copy, say A1. But the rules
for locks on the physical copy A1 forbid a transaction from holding a read-lock
at the same time another transaction holds a write-lock. Similarly, it is
not possible for two transactions to hold write-locks on A at the same time,
because then there would have to be conflicting write-locks on all the physical
copies of A.
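A sketch of the two rules, with each site's lock manager abstracted as an object offering rlock, wlock, and unlock calls; all names here are illustrative, and message sending is hidden inside those calls.

def logical_read_lock(t, copies):
    """Under write-locks-all, a read-lock on one copy suffices.
    copies is a list of per-site lock managers whose rlock/wlock methods
    return True if the lock is granted."""
    return copies[0].rlock(t)          # ask any one site, e.g., the nearest

def logical_write_lock(t, copies):
    """A write-lock must be granted on every copy of the item."""
    granted = []
    for site in copies:                # one request message per site
        if site.wlock(t):
            granted.append(site)
        else:                          # any refusal denies the logical lock
            for s in granted:
                s.unlock(t)
            return False
    return True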
Analysis of Write-Locks-All
Let us see how much message traffic is generated by this locking method. Sup
pose that n sites have copies of item A. If the site at which the transaction is
running does not know how many copies of A exist, or where they are, then we
may take n to be the total number of sites.2

2 It is worth noting that considerable space and effort may be required if each site is to
maintain an accurate picture of the entire distributed database, at least to the extent
of knowing what items exist throughout the database, and where the copies are. For
this reason, among others, there is an advantage to using large items in a distributed
environment.

To execute WLOCK A, the transaction must send messages requesting a lock to all n sites. Then, the n sites
will reply, telling the requesting site whether or not it can have the lock. If it
can have the lock, then the n sites are sent copies of the new value of the item.
Eventually, a message UNLOCK A will have to be sent, but we may be able to
attach this message to messages involved in the commitment of the transaction,
as discussed in Sections 10.4 and 10.5.
The messages containing values of items may be considerably longer than
the lock messages, since, say, a whole relation may be transmitted. Thus, we
might consider sending only the changes to large items, rather than the complete
new value. In what follows, we shall distinguish between
1. Control messages, which concern locks, transaction commit or abort, and
other matters of concurrency control, and
2. Data messages, which carry values of items.
Under some assumptions, control and data messages cost about the same, while
under other conditions, data messages could be larger and/or more expensive.
It is unlikely that control messages will be more costly than data messages.
Sometimes, we shall have the opportunity to attach control messages to data
messages, in which case we shall count only the data message.
When a transaction write-locks a logical item A, we saw by the analysis
above that it needed to send 2n control messages and n data messages. If one
of A's copies is at the site running the transaction, we can save two control
messages and one data message, although we must still request and reserve a
lock at the local site. If one or more sites deny the lock request, then the lock
on A is not granted.
To obtain a read-lock, we have only to lock one copy, so if we know a site
at which a copy of A exists, we can send RLOCK A to that site and wait for a
reply granting the lock or denying the lock request. If the lock is granted, the
value of A will be sent with the message. Thus, in the simplest case, where
we know a site at which A can be found and the lock request is granted, only
two messages are exchanged, one control (the request), and one data (the reply,
including the value read). If the request is denied, it probably does not pay to
try to get the read-lock from another site immediately, since most likely, some
transaction has write-locked A, and therefore has locks on all the copies.
The Majority Locking Strategy
Now let us look at another, seemingly rather different protocol for defining locks
on logical items.
1. To obtain a read-lock on logical item A, a transaction must obtain read-locks on a majority of the copies of A.
2. To obtain a write-lock on logical item A, a transaction must obtain write-locks on a majority of the copies of A.
We call this strategy the majority approach.
To see why majority locking works, note that two transactions each holding
locks on A (whether they are read- or write-locks doesn't matter) would each
hold locks on a majority of the copies. It follows that there must be at least one
copy locked by both transactions. But if either lock is a write-lock, then there
is a lock conflict for that copy, which is not permitted by the lock manager at
its site. Thus, we conclude that two transactions cannot hold write-locks on
logical item A simultaneously, nor can one hold a read-lock while the other holds
a write-lock. They can, of course, hold read-locks on an item simultaneously.
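The counting behind this argument can be made explicit. If S1 and S2 are the sets of copies locked by the two transactions, each contains at least ⌈(n+1)/2⌉ of the n copies, so

    |S1 ∩ S2|  ≥  |S1| + |S2| - n  ≥  2⌈(n+1)/2⌉ - n  ≥  1,

and a copy in the intersection is one on which the local lock manager would have refused the second, conflicting lock.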
Comparison of Methods
Before proceeding to some other methods for distributed locking, let us compare
the write-locks-all and majority methods. Each uses n data messages for a write
and one data message for a read. Write-locks-all uses 2n control messages for a
write and one for a read, while majority uses n+1 for write and n for read. Thus,
3 In what follows, we assume n is odd, and use (n + 1)/2 for the more precise ⌈(n + 1)/2⌉.
sponsibility for locking a particular logical item A lie with one particular site,
no matter how many copies of the item there are. At the extreme, one node
of the network is given the task of managing locks for all items; this approach
is the "central node method," which we describe shortly. However, in its most
general form, the assignment of lock responsibility for item A can be given to
any node, and different nodes can be used for different items.
A sensible strategy, for example, is to identify a primary site for each
item. For example, if the database belongs to a bank, and the nodes are bank
branches, it is natural to consider the primary site for an item that represents
an account to be the branch at which the account is held. In that case, since
most transactions involving the account would be initiated at its primary site,
frequently locks would be obtained with no messages being sent.
If a transaction, not at the primary site for A, wishes to lock A, it sends
one message to the primary site for A and that site replies, either granting or
withholding the lock. Thus, locking the logical item A is the same as locking
the copy of A at the primary site. In fact, there need not even be a copy of A
at the primary site, just a lock manager that handles locks on A.
In case (a), M must remember that N has asked it for the token, but does
not know whether it can have it yet [another site could answer (b)]. M
"reserves" the token for N; doing so prevents another site P from also
being told by M that it has no objection to P's obtaining the token.4
3. If all sites reply (a) to N, then N knows it can have the write-token. It
sends a message to each site that replied (a), telling it that N has accepted
the write-token, and they should destroy whatever tokens they have for A.
If some site replies (b), then N cannot have the write-token, and it must
send messages to the nodes that replied (a) telling them they can cease
reserving the write-token for A, and may allow another site to get that
token.
To read A, essentially the same process takes place, except that if the local
site has any of the read-tokens for A, no messages need to be sent. In (2) above,
the responding site M does not object [send message (b)] if it has a read-token
for A, only if it has a write-token. In (3), if N is allowed to obtain a read-token
for A, then only write-tokens, not read-tokens, are destroyed at other sites.
More Comparisons Among Methods
Evidently, the primary copy token method uses considerably more messages
than the other methods so far; both reading and writing can use 3m control
messages, where m is the number of nodes in the network, while other methods
use a number of messages that is proportional to the number of copies of an
item, at worst. On the other hand, the primary copy token approach averages
much less than 3m control messages per lock operation when one site runs most
of the transactions that reference a particular item. Then the write-token for
that item will tend to reside at that site, making control messages unneeded for
most transactions. Thus, a direct comparison with the k-of-n methods is not
possible; which is preferable depends on the site distribution of the transactions
that lock a particular item.
Similarly, we cannot compare the primary site method directly with the
write-locks-all method; while the former uses smaller numbers of messages on
the average, the latter has the advantage when most locks are read-locks on
copies that are not at the primary site for that item. It appears that the
primary site approach is more efficient than the k-of-n methods for k > 1.
However, there are other considerations that might enter into the picture. For
example, the primary site method is vulnerable to a failure at the primary site
4 The reason we must be careful is that there might be no tokens for A at all. For example,
none might have been created, or the last one could have been lost, because the node
holding it failed. If we did not use "reservations," two sites could ask for the write-token
for A at the same time, and each be told by all of the sites (including each other) that
they did not have any token on A. Then, each would create a write-token for A and
there would be two tokens when at most one should exist.
for an item, as the sites must then detect the failure and send messages to
agree on a new primary site. In comparison, k-of-n type strategies can continue
locking that item with no interruption.
We can also compare primary copy token methods with the primary site
approach. In the latter method, a write requires two control messages to request
and receive a lock from the primary site, then n data messages, as usual, to
write the new value. Reading requires a control message asking for a lock
and a data message in response, granting the request and sending the value.
If all transactions referencing A run at the primary site for A, then the two
approaches are exactly the same; no messages are sent, except for the obligatory
writes to update other copies of A, if any. When other sites do reference A, the
primary site method appears to save a considerable number of messages.
However, the token method is somewhat more adaptable to temporary
changes in behavior. For example, in a hypothetical bank database, suppose a
customer goes on vacation and starts using a branch different from his usual
one. Under the primary site method, each transaction at the new branch would
require an exchange of locking messages. In comparison, under the token ap
proach, after the first transaction ran at the new branch, the write-token for the
account would reside at that branch as long as the customer was on vacation.
Method                Control Msgs.     Control Msgs.     Comments
                      to Write          to Read
Write-Locks-All       2n                1                 Good if read dominates
Majority              n + 1             n                 Avoids some deadlock
Primary Site                                              Efficient; some vulnerability to crash
Primary Copy Token    0-4m              0-4m              Adapts to changes in use pattern
Central Node                                              Vulnerable to crash; efficiencies may
                                                          result from centralized traffic pattern

A comparison of methods for locking logical items.
A.
Also, transaction T2 has two subtransactions, T2.1 running at S1 and writing a
new value of A1, and T2.2, running at S2 and writing the same value into A2. We
shall assume that write-locks-all is the protocol followed by these transactions
for defining locks on logical items, but as we shall see, other methods cause
similar problems.
At S1:
  T1.1: WLOCK A1
  T1.1: UNLOCK A1
  T2.1: WLOCK A1
  T2.1: UNLOCK A1

At S2:
  T2.2: WLOCK A2
  T2.2: UNLOCK A2
  T1.2: WLOCK A2
  T1.2: UNLOCK A2

Figure 10.3 A schedule of subtransactions with no equivalent serial order.
at S2 we find that T2.2 must precede T1.2. Unfortunately, a serial order must be
formed not just from the subtransactions, but from (logical) transactions. Thus,
if we choose to have T1 precede T2, then T1.2 precedes T2.2, violating the local
ordering at S2. Similarly, if the serial order is T2, T1, then the local ordering at
S1 is violated. In fact, in the order of events indicated in Figure 10.3, the two
copies of A receive different final values, which should immediately convince us
that no equivalent serial order exists.
The problem indicated above is not restricted to write-locks-all. For example,
suppose we use the primary site method of locking. We can modify Figure
10.3 by letting A1 be the sole copy of A and letting A2 be the sole copy of
another logical item B. Therefore, S1 and S2 are the primary sites for A and
B, respectively. The schedule of Figure 10.3 is still not serializable, since the
final value of B is that written by T1 and the final value of A is what T2 writes.
In fact, notice that all the locking methods of Section 10.2 become the same
when there is only one copy of each item; thus this problem of nonserializability
comes up no matter what method we use. □
in the next section, and only after committing are locks released.
In a situation like Figure 10.3, T1.1 and T2.2 would not release their locks at
the second line, if the strict protocol were followed. In this case, there would be
a deadlock between T1 and T2, since each has a subtransaction that is waiting
for a lock held by a subtransaction of the other. We shall discuss distributed
deadlock detection in Section 10.8. In this case, one of T1 and T2 has to abort,
along with all of its subtransactions.
10.4 DISTRIBUTED COMMITMENT
For the reason just discussed (supporting distributed two-phase locking), as
well as for the reasons discussed in Sections 9.8 and 9.10 (resiliency), it is
necessary for a distributed transaction to perform a commit action just before
termination. The existence of subtransactions at various sites complicates the
process considerably.
Suppose we have a transaction T which initiated at one site and spawned
subtransactions at several other sites. We shall call the part of T that executes
at its home site a subtransaction of the logical transaction T; thus logical T
consists solely of subtransactions, each executing at a different site. We distin
guish the subtransaction at the home site by calling it the coordinator, while
the other subtransactions are the participants. This distinction is important
when we describe the distributed commitment process.
In the absence of failures, distributed commitment is conceptually simple.
Each subtransaction Ti of logical transaction T decides whether to commit
or abort. Recall that Ti could abort for any of the reasons discussed in Chapter
9, such as involvement in a deadlock or an illegal database access. When Ti
decides what it wants to do, it sends a vote-commit or vote-abort message
to the coordinator. If the vote-abort message is sent, Ti knows the logical
transaction T must abort, and therefore Ti may terminate. However, if Ti sends
the vote-commit message, it does not know whether T will eventually commit,
or whether some other subtransaction will decide to abort, thus causing T to
abort.
Thus, after voting to commit, Ti must wait for a message from the coordina
tor. If the coordinator receives a vote-abort message from any subtransaction,
it sends abort messages to all of the subtransactions, and they all abort, thus
aborting the logical transaction T. If the coordinator receives vote-commit
messages from all subtransactions (including itself), then it knows that T may
commit. The coordinator sends commit messages to all of the subtransactions.
Now, the subtransactions all know that T can commit, and they take what
steps are necessary at their local site to perform the commitment, e.g., writing
in the log and releasing locks.
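In the absence of failures, the exchange can be sketched as follows. The send and receive callables stand in for the network, and all names here are illustrative.

def coordinator_decide(votes):
    """The coordinator's failure-free decision rule: commit only if every
    subtransaction (itself included) voted to commit."""
    return "commit" if all(v == "vote-commit" for v in votes) else "abort"

def participant(subtransaction_ok, send, receive):
    """A participant's side of the exchange."""
    if not subtransaction_ok:
        send("vote-abort")
        return "aborted"               # it may terminate at once
    send("vote-commit")
    decision = receive()               # wait for the coordinator's decision
    return "committed" if decision == "commit" else "aborted"

# Failure-free runs with three subtransactions:
print(coordinator_decide(["vote-commit", "vote-commit", "vote-commit"]))  # commit
print(coordinator_decide(["vote-commit", "vote-abort", "vote-commit"]))   # abort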
It is useful to visualize the subtransactions changing state in response to
their changes in knowledge about the logical transaction. In Figure 10.4, the
Figure 10.4 State transitions for distributed commitment.
(a) Participant: from the Initial state, sending vote-commit leads to the Willing-to-commit state, and receiving commit from the coordinator then leads to Committed; sending vote-abort, or receiving abort, leads to Aborted.
(b) Coordinator: from the Initial state, receiving vote-commit from every participant leads to the Must-commit state, from which sending commit leads to Committed; receiving any vote-abort leads to sending abort and entering Aborted.
transitions among states are indicated. The following comments are useful in
understanding the diagram.
1. Do not forget to distinguish between voting messages, which are sent by
participant transactions to the coordinator, and decision messages sent by
the coordinator to the participants.
2. The coordinator is a participant, and in principle sends messages to itself,
although we do not "pay" for these messages with network traffic. For
example, the coordinator might decide to abort because it divides by zero,
which we regard, in Figure 10.4(b), as if the coordinator had "received" a
vote-abort message from itself.
3. The Committed and Aborted states really are not entered until the subtransactions
perform whatever steps are required, such as releasing locks and writing in the log.
4. When a participant is in the Initial state, it will eventually decide to send
vote-abort or vote-commit, entering the Aborted or Willing-to-commit
states, respectively. This decision is based on the circumstances of the
participant; for example, it "decides" to abort if the system tells it that it
is involved in a deadlock and must abort.
5. It is also possible that a participant will enter the Aborted state because
the coordinator tells it to. That may happen if some other participant has
decided to abort and informed the coordinator, which relays the message
to all participants.
6. The use of a coordinator is not essential. All participants could broadcast
their votes to all others. However, the number of messages would then
be proportional to the square of the number of participants, rather than
linearly proportional to this number. Commitment algorithms of this type
are discussed in the exercises.
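The participant's side of Figure 10.4 can also be written as a transition table. The following Python sketch is illustrative only; it encodes just the transitions described in the comments above.

PARTICIPANT_TRANSITIONS = {
    ("Initial", "send vote-commit"): "Willing-to-commit",
    ("Initial", "send vote-abort"): "Aborted",
    ("Initial", "receive abort"): "Aborted",
    ("Willing-to-commit", "receive commit"): "Committed",
    ("Willing-to-commit", "receive abort"): "Aborted",
}

def step(state, event):
    # Unknown events leave the state unchanged; the participant simply waits.
    return PARTICIPANT_TRANSITIONS.get((state, event), state)

if __name__ == "__main__":
    s = "Initial"
    for e in ("send vote-commit", "receive commit"):
        s = step(s, e)
    print(s)    # Committed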
Blocking of Transactions
When there are network failures, the simple distributed commitment protocol
of Figure 10.4 can lead to blocking, a situation where a subtransaction at a
site that has not failed can neither commit nor abort until failures at other
sites are repaired. Since a site may be down indefinitely, and since the blocked
subtransaction may be holding locks on items, which it cannot release, we are
in a difficult situation indeed. There are many circumstances that can cause
blocking; perhaps the simplest is the following.
Example 10.4: Suppose a subtransaction Ti holds a lock on one copy of item
A, and Ti reaches its commit point. That is, Ti sends vote-commit to its coor
dinator and enters the state Willing-to-commit in Figure 10.4(a). After a long
time, Ti receives neither a commit nor an abort message from the coordinator.
We claim that Ti must remain in this state and hold its lock on the local copy
of A; i.e., Ti is blocked. Any other action can lead to an error.
1. If Ti decides to commit without instructions from the coordinator, it may
be that some other subtransaction with a local copy of A decided to abort,
but the coordinator has failed and cannot tell Ti to abort. If Ti commits,
another transaction may read the local copy of A, which should not have
been changed; i.e., the local copy of A is dirty data.
2. If Ti decides to abort without instructions from the coordinator, it could be
that the coordinator received vote-commit messages from all participants,
but afterward, the network failed, cutting Ti off from the coordinator.
However, some other participants were not cut off from the coordinator;
560
they received the commit message and wrote new values for their copies of
A. Thus, the copies of A no longer hold the same value.
Other options could be considered, such as releasing the lock on A without
committing or aborting. However, all options can lead to an inconsistent value
for the copies of A, because Ti is in a state where it does not know whether the
logical transaction of which Ti is a part will eventually commit or abort, and
there are scenarios where either could happen. D
Two-Phase Commit
The most common approach to distributed commitment is a variant of the sim
ple algorithm of Figure 10.4. The protocol is called two-phase commit, because
of the two phases, voting followed by decision, that we see in Figure 10.4. Two-phase commit does not avoid all blocking, but it does reduce the likelihood of
blocking. We shall later mention an improvement, called "three-phase commit,"
which does avoid blocking when nodes fail (although not necessarily when the
network disconnects).
Two-phase commit offers two improvements over the simplest protocol.
First, subtransactions measure the time since a response message was first ex
pected, and if the message is delayed so long that it is probable a network failure
has occurred, the subtransaction "times out," entering a state from which it
will attempt to recover. The most serious problem, as we saw in Example 10.4,
is when a participant is in the Willing-to-commit state, and a timeout occurs,
i.e., the elapsed time since it sent the vote-commit message exceeds a preset
time limit. To help avoid blocking, such a transaction sends a message help-me
to all other participants.
On receiving a help-me message:
1. A participant in the Committed state replies commit. It can do so safely,
because it must have received the commit message from the coordinator,
and thus knows that all participants have voted to commit.
2. A participant that is in the Aborted state can send the abort message,
because it knows that the transaction must abort.
3. A participant that has not voted yet (i.e., one in the Initial state) can help
resolve the problem by deciding arbitrarily to abort, so it too makes an
abort reply and sends vote-abort to the coordinator.5
4. A participant in the Willing-to-commit state cannot help resolve the prob
lem, so it makes no reply.
A blocked transaction that receives an abort or commit message follows that
instruction, going to the appropriate state. That this choice is always correct
5 We leave as an exercise the observation that should a participant in the Initial state
decide to commit in this situation there is the possibility of inconsistent data.
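Rules (1)-(4) for answering a help-me message amount to a simple case analysis on the responder's state. The Python sketch below is illustrative only (the function names are not from the text); it also shows how a blocked participant uses any definite reply it receives.

def reply_to_help_me(state):
    if state == "Committed":
        return "commit"     # (1) safe: the coordinator must have sent commit
    if state == "Aborted":
        return "abort"      # (2) safe: the transaction must abort
    if state == "Initial":
        return "abort"      # (3) decide arbitrarily to abort; also send vote-abort to the coordinator
    return None             # (4) Willing-to-commit: cannot help, so no reply

def blocked_participant_decides(replies):
    # A blocked participant follows the first definite instruction it receives.
    for r in replies:
        if r in ("commit", "abort"):
            return r
    return "still blocked"  # everyone else is failed, cut off, or also Willing-to-commit

if __name__ == "__main__":
    replies = [reply_to_help_me(s) for s in ("Willing-to-commit", "Initial")]
    print(blocked_participant_decides(replies))   # abort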
[Figure 10.5 (state-transition diagrams for two-phase commit) not reproduced: (a) Participant, with states Initial, Willing-to-commit, Committed, Aborted, and Blocked; (b) Coordinator.]
ceives one, because all other participants are either cut off from the sender,
failed, or also in the Willing-to-commit state, then this participant remains
blocked.
There are two other conditions under which a timeout occurs and some
action to avoid blocking is taken. In Figure 10.5(b), the coordinator times out
if, after it sends begin-vote, one or more participants do not vote, after a
predetermined and long time limit. If so, the coordinator decides to abort
and sends abort messages to all the participants that can hear it.6 Of course,
participants that are cut off from the coordinator at this time will not get the
message; they remain blocked, if they heard the earlier begin-vote, voted to
commit, and are unable to recover successfully when they time out.
The last place a timeout can occur is in Figure 10.5(a), where a participant
has finished its task and a long time elapses, during which it is never asked to
vote. Possibly the coordinator has failed or been cut off from this participant.
The participant decides to abort, so it can release its locks. Not shown in Figure
10.5(a) is the fact that if subsequently, this participant does get a begin-vote
message from its coordinator, it simply votes to abort. Some additional points
about the transitions of Figure 10.5 follow.
1. A transaction may have entered the Aborted or Committed state and still
be asked to send messages in response to a help-me. There is nothing
wrong with the supposition that a nonactive transaction will respond to
messages. In reality, the system consults its log and responds for the trans
action. In fact, normally all messages and state changes are managed by
the system, rather than being built into transactions.
2. In the blocked state, it makes sense to repeat the help-me message after
a while, in the hope that a node that was failed or disconnected will now
be available to help. In many systems, a node that recovers from a failure
will make its presence known anyway, since it must find out about what
happened to the transactions it was involved in and the items they changed.
Thus, a blocked subtransaction can resend help-me whenever a node with
a participant subtransaction reestablishes communication.
Recovery
In addition to logging all of the information discussed in Section 9.10, a dis
tributed system that is resilient against network failures must enter into the
log at each site the messages it sends and receives. When a node recovers,
or becomes reconnected to parts of the network that it could not reach for a
while, it is the responsibility of that node to find out what has happened to the
6 Notice that deciding to abort in ambiguous situations is always safe as long as no
participant can then decide to commit; that possibility is what makes the Willing-to-commit state the source of most of the complexity.
[State-transition diagram not reproduced. Surviving labels: Receive begin-vote / Send vote-commit; Receive prepare-commit (knows all are willing to commit); Receive commit (knows all know that all are willing to commit); Timeout.]
Figure 10.6(a) Participant in three-phase commit.
Our first (but erroneous) thought is that the two messages, prepare-commit and commit, which the coordinator sends in sequence to the partic
ipants, cannot both be necessary. That is, the receipt of prepare-commit
assures the participant that commit will eventually be sent, unless the coordi
nator fails; in the latter case, surely the coordinator would have sent commit
if it could. However, if we eliminate one of the messages, then we are back
to two-phase commit, and Example 10.5 should convince the reader that par
ticipants can block, even under our restrictive failure model. Furthermore, if
we interleave the two messages, say by sending both to one participant, then
both to a second participant, and so on, we again behave like two-phase com
mit, and blocking is possible. In fact, the reader can show as an exercise that
[State-transition diagram not reproduced. Surviving labels: Send begin-vote; Receive any vote-abort or Timeout / Send abort (Must abort); Receive all vote-commit (Should commit) / Send prepare-commit; Send commit (Committed).]
Figure 10.6(b) Coordinator in three-phase commit.
should the coordinator send any commit message prior to sending the last of
the prepare-commit messages, then blocking is possible.
What is essential about three-phase commit is that the coordinator sends
all of the prepare-commit messages out before it sends any commit message.
The intuitive reason is that the prepare-commit message informs each partic
ipant that all are willing to commit. If any participant Ti receives commit, it
knows that the coordinator has sent all its prepare-commit messages, and thus
every participant that is still live has received prepare-commit or is about to
do so, since the message could be delayed but not lost by the network. That is,
the receipt of a commit message by Ti tells Ti that all know all are willing to
commit.
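The ordering requirement can be stated in a few lines of code. The Python sketch below (illustrative only, failure-free case) has the coordinator finish all of its prepare-commit messages before it sends the first commit, which is exactly the property the argument above depends on.

def decision_phase(participants, send):
    for p in participants:
        send(p, "prepare-commit")   # first inform everyone that all are willing to commit
    for p in participants:
        send(p, "commit")           # only then may anyone be told to commit

if __name__ == "__main__":
    log = []
    decision_phase(["T1", "T2", "T3"], lambda p, m: log.append((p, m)))
    first_commit = next(i for i, (_, m) in enumerate(log) if m == "commit")
    # No prepare-commit is sent after the first commit.
    assert all(m == "commit" for _, m in log[first_commit:])
    print(log)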
Technically, Ti only knows that every participant Tj either knows that all
are willing to commit, or Tj will know it shortly, or Tj will fail before it re
ceives the prepare-commit. However, since the protocol of Figure 10.6 only
involves messages between the coordinator and participants, and because as
569
sumption (5) assures us messages are not lost, it can be assumed that messages
are received instantaneously. That is, when Ti commits, every participant has
either received prepare-commit or has already failed. The reason is that if
some Tj actually fails after the time Ti receives commit, but before Tj receives
prepare-commit, then there would be no observable change in the activity of
the network if we assumed that Tj had failed before Ti received commit. What
we have shown is that it is impossible for two participants to be simultaneously
in the Willing-to-commit and Committed states, respectively. This fact and
other useful observations about the protocol of Figure 10.6 are summarized in
the following lemma.
Lemma 10.1: Prior to transactions entering the recovery state, and under
the (justifiable) assumption that messages are delivered instantaneously, the
following states are incompatible.
a) One (live or failed) participant cannot have entered the Committed state
while any live participant is still in the Willing-to-commit state.
b) One (live or failed) participant cannot have entered the Aborted state while
another (live or failed) participant has entered the Committed state, or any
live participant has entered the Ready-to-commit state.9
Proof: For (a), we note that in order for a participant to enter the Committed
state before any recovery takes place, it must receive a commit message. By
the argument given above, we know that every live participant has (on the
assumption of instantaneous messages) received prepare-commit, and therefore
has left the Willing-to-commit state.
We leave (b) as an exercise. The reader has only to examine Figure 10.6 and
argue that a prepare-commit message cannot be sent if one or more participants
have aborted. D
Recovery in Three-Phase Commit
The consequence of Lemma 10.1 is that we cannot have a failed participant that
has aborted if any live transaction has reached as far as the Ready-to-commit
state, and we cannot have a failed participant that has committed if any live
transaction is still in the Willing-to-commit state. Thus, when one or more
participants detect the need for recovery, because of a timeout, we have only
to arrange that each live participant discloses to the others its state, or more
precisely, its state just before it entered the Recovery state. If all are in Willing-to-commit or Aborted, then we know no failed participant has committed, and
it is safe for all to abort. If any has reached the Ready-to-commit state or the
9 In fact, it is not even possible for a failed participant to have entered Ready-to-commit,
but we state the conditions this way because we want them to be weak enough that
they are preserved during the recovery process.
Committed state, then no failed transaction can have aborted, so it is safe for
all to commit.
In the latter case, the distributed commitment process must be taken by
steps. That is, any participants still in the Willing-to-commit state must first
be brought to the Ready-to-commit state, and then all those in that state must
be made to commit. The reason we must continue in stages is that at any time,
more participants may fail, and we must avoid creating a situation where one
participant is in Willing-to-commit while another has already committed.
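The recovery rule just described can be summarized as a function of the live participants' pre-recovery states. This is a Python sketch only; the names are illustrative.

def recovery_decision(live_states):
    if any(s in ("Ready-to-commit", "Committed") for s in live_states):
        # By Lemma 10.1(b), no failed participant can have aborted, so commit is safe;
        # it must still be carried out in stages: Willing-to-commit participants are
        # first brought to Ready-to-commit, and only then is commit performed.
        return "commit"
    # All live participants are in Willing-to-commit or Aborted, so by Lemma 10.1(a)
    # no failed participant can have committed, and it is safe to abort.
    return "abort"

if __name__ == "__main__":
    print(recovery_decision(["Willing-to-commit", "Aborted"]))           # abort
    print(recovery_decision(["Willing-to-commit", "Ready-to-commit"]))   # commit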
Electing a New Coordinator
As with two- or three-phase commit in general, the recovery process can be con
ducted in several different ways. As we have considered only the centralized, or
coordinator-based approach, because it tends to save messages, let us continue
with that approach now. Then as soon as one participant realizes recovery is
needed, it sends a message to all the other participants. Several participants
may reach this conclusion at about the same time, so many redundant messages
will be sent in the worst case, but not in the typical case.
Then, the live participants must attempt to elect a new coordinator, be
cause the only time we enter the Recovery state is if a participant has timed out
waiting for the coordinator to send a message. Each participant knows the orig
inal set of participants, although some now are failed. We may assume that the
participants are numbered T1, ..., Tk, and the lowest-indexed live participant
will be the new coordinator. Since T1 may have failed, we cannot just assume
T1 is the new coordinator. Rather, each participant must make known to the
others that it is live. If done properly, at most one live participant will conclude
that it is the new coordinator (because it never heard from any lower-numbered
participant).
One relatively efficient way to make the decision is for each Ti to send a
message with its index, i, to Ti+1, Ti+2, ..., Tk in that order. However, if Ti
receives a message from a lower-numbered participant, then Ti knows it is not
the coordinator, and so stops sending messages. Most participants will stop
sending messages very quickly, but if some messages are delayed inordinately,10
then on the order of k² messages could be sent.
After this step, each live participant will have a notion of who the new
coordinator is. If no failures occurred during the election, then all will have
the same notion. However, if the lowest-numbered participant failed during the
election, then there may be disagreement regarding who is the coordinator.
10 Note we are no longer assuming messages are sent instantaneously; that assumption was
justified only by the pattern of messages (to and from the coordinator) that is present
in the basic three-phase commit algorithm.
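The election can be simulated simply if we ignore message delays and failures during the election itself (the complications illustrated in Example 10.6 below). In this Python sketch, which is illustrative only, each live participant ends up believing that the lowest-numbered participant it has heard from (or itself) is the coordinator.

def elect(k, live):
    """Return each live participant's belief about who the new coordinator is."""
    belief = {i: i for i in live}                 # initially, everyone is a candidate
    for i in sorted(live):                        # Ti sends its index to Ti+1, ..., Tk
        for j in range(i + 1, k + 1):
            if j in live and i < belief[j]:
                belief[j] = i                     # Tj heard from a lower-numbered participant
    return belief

if __name__ == "__main__":
    print(elect(4, {2, 3, 4}))    # {2: 2, 3: 2, 4: 2}: T2 is the unique new coordinator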
Example 10.6: Suppose there are participants T1, ..., T4. Also suppose that
during the election, the following sequence of events occurs.
1. T1 sends a message to T2 before T2 can send its own message to T3. Thus,
T2 never sends any messages.
2. T1 fails.
3. T3 sends a message to T4. T4 is thereby inhibited from sending any messages.
The net effect of these events is that T2 thinks T1 is the coordinator, while
T3 and T4 both think T3 is the coordinator. After a suitable timeout period, so
it can be determined that no more messages are being sent, T3 starts its role as
coordinator by requesting the state of all participants.11 D
It is easy to show that no more than one live participant can think it is the
new coordinator. For suppose Ti and Tj both are live and think they are the
coordinator, where i < j. Since Ti thinks it is the coordinator, it never received
a message from any participant lower than i. Thus, it continued to send out
messages to the participants numbered above i, and in particular to Tj. Thus,
Tj would not think it is the coordinator.
It is possible that no live participant thinks it is the coordinator, in which
case the live participants will time out waiting for the recovery to begin. They
will then elect a new coordinator.
12 As an exercise, the reader should find a scenario in which several rounds of recovery are
necessary, during which a participant gets into the Ready-to-commit state then fails,
and the final decision is to abort.
Distributed Timestamps
While it may not be obvious, the most elementary approach to distributed
timestamping actually works. That is, we may let the computers at each node of
the network keep their own clocks, even though the clocks cannot possibly run in
synchronism. To avoid the same timestamp being given to two transactions, we
13 In unfortunate circumstances, these participants will find none of the participants that
made the final decision live at the moment, and then the recovering participant must
block.
require that the last k bits of the "time" be a sequence that uniquely identifies
the node. For example, if there were no more than 256 nodes, we could let
k = 8 and give each node a distinct eight-bit sequence that it appended to its
local clock, as the low-order bits, to form the timestamp.
Even setting aside the theory of relativity, it is not realistic to suppose
that all of the clocks at all of the nodes are in exact synchronism. While minor
differences in the clocks at two nodes are of no great consequence, a major
difference can be fatal. For example, suppose that at node N, the clock is five
hours behind the other clocks in the system. Then, on the assumption that most
items are read and written within a five hour period, a transaction initiating at
N will receive a timestamp that is less than the read- and write-times of most
items it seeks to access. It is therefore almost sure to abort, and transactions,
in effect, cannot run at N.
There is, fortunately, a simple mechanism to prevent gross misalignment
of clocks. Let each message sent bear its own timestamp, the time at which the
message left the sending node according to the clock of the sender. If a node
ever receives a message "from the future," that is, a message with a timestamp
greater than its current clock, it simply increments its clock to be greater than
the timestamp of the received message. If, say, a node was so inactive that it
did not discover that its clock had become five hours slow, then the first time it
ran a transaction it would receive a message telling it to abort the transaction it
was running. That message would include the "correct time." The node would
then update its clock and rerun the transaction with a realistic timestamp. We
shall thus assume from here on that the creation of timestamps that have global
validity is within the capability of a distributed DBMS.
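A sketch of this timestamping discipline in Python follows; the class and method names are illustrative. It combines the two ideas above: the low-order k bits of every timestamp identify the issuing node, and a node that sees a message "from the future" advances its own clock.

K = 8    # enough low-order bits for 256 nodes, as in the example above

class NodeClock:
    def __init__(self, node_id, local_time=0):
        self.node_id = node_id          # a distinct k-bit pattern for this node
        self.local_time = local_time

    def timestamp(self):
        # Append the node id as the low-order bits, so no two nodes issue the same timestamp.
        self.local_time += 1
        return (self.local_time << K) | self.node_id

    def receive(self, message_timestamp):
        # A message "from the future" pushes the local clock forward.
        sender_time = message_timestamp >> K
        if sender_time > self.local_time:
            self.local_time = sender_time

if __name__ == "__main__":
    slow = NodeClock(node_id=1, local_time=100)     # e.g., five hours behind
    fast = NodeClock(node_id=2, local_time=10000)
    slow.receive(fast.timestamp())                  # e.g., an abort message carrying the "correct time"
    print(slow.local_time)                          # now 10001: the clocks roughly agree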
A Timestamp-Based Algorithm
Next, let us consider the steps necessary to read and write items in such a way
that the effect on the database is as if each transaction ran instantaneously, at
the time given by its timestamp, just as was the case in Section 9.11. As in
Section 10.1, we shall consider the elementary step to be an action on a copy
of an item, not on the item itself. However, when dealing with timestamps,
the elementary steps are not locking and unlocking, but examining and setting
read- and write-times on copies.
Many of the locking methods discussed in Section 10.2 have timestamp-based analogs. We shall discuss only one, the analog of write-locks-all. When
reading an item A, we go to any copy of A and check that its write-time does
not exceed the timestamp of the transaction doing the reading. If the write-time is greater than the timestamp, we must abort the transaction.14 Looking
14 In terms of the distributed commitment algorithms discussed in Sections 10.4-5, the
subtransaction attempting to write must vote to abort.
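Only the read-side check quoted above survives on these pages; the write-side rules are covered in text not reproduced here. A Python sketch of that check, illustrative only:

class CopyOfItem:
    def __init__(self, value, read_time=0, write_time=0):
        self.value = value
        self.read_time = read_time
        self.write_time = write_time

def read_any_copy(copy, transaction_ts):
    # A transaction may read any one copy, provided that copy was not written
    # by a "later" transaction; otherwise the subtransaction must vote to abort.
    if copy.write_time > transaction_ts:
        return None, "abort"
    copy.read_time = max(copy.read_time, transaction_ts)   # record the read
    return copy.value, "ok"

if __name__ == "__main__":
    a = CopyOfItem(value=42, write_time=17)
    print(read_any_copy(a, transaction_ts=20))   # (42, 'ok')
    print(read_any_copy(a, transaction_ts=10))   # (None, 'abort')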
and there is nothing else we can do. However, if there are other copies of A,
then we can proceed as if the copy at N did not exist. When N recovers, it not
only has the responsibility to find out about the transactions being committed
or aborted when it failed, but now it must find out which of its items are out
of date, in the sense that transactions have run at the other sites and modified
copies of items that, like A, are found at N and also at other nodes.
T2, which is waiting for A2 held by T3, and so on, while Tk is waiting for Ak
held by T1. That follows because the fact that T1 holds a lock on A1 while it
is waiting for A2 tells us A1 < A2 in lexicographic order. Similarly, we may
conclude A2 < A3, ..., Ak < A1, which implies a cycle in the lexicographic order,
an impossibility.
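In code, the single-site technique is nothing more than sorting the lock requests. The Python sketch below is illustrative only; acquire stands for whatever primitive actually takes the lock.

def acquire_in_lexicographic_order(items, acquire):
    # Every transaction requests its locks in lexicographic order of item names,
    # so no waits-for cycle A1 < A2 < ... < Ak < A1 can arise.
    for item in sorted(items):
        acquire(item)

if __name__ == "__main__":
    requested = []
    acquire_in_lexicographic_order({"B", "A", "C"}, requested.append)
    print(requested)    # ['A', 'B', 'C']: all transactions ask for A before B before C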
With care, we can generalize this technique to work for distributed data
bases. If the locking method used is a centralized one, where individual items,
rather than copies, are locked, then no modification is needed. If we use a
locking method like the k-of-n schemes, which lock individual copies, we can
still avoid deadlocks if we require all transactions to lock copies in a particular
order:
1.
2.
Waits-for-Graphs
We mentioned in Section 9.1 that a necessary and sufficient test for a deadlock
in a single-processor system is to construct a waits-for graph, whose nodes are
the transactions. The graph has an arc from T1 to T2 if T1 is waiting for a lock
on an item held by T2. Then there is a deadlock if and only if there is a cycle in
this graph. In principle, the same technique works in a distributed environment.
The trouble is that at each site we can maintain easily only a local waits-for
graph, while cycles may appear only in the global waits-for graph, composed of
the union of the local waits-for graphs.
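The following Python sketch (illustrative only) shows the point of Example 10.7 below: cycle detection on the union of the local graphs finds a deadlock that neither local graph reveals.

def has_cycle(edges):
    """edges is a set of (waiting transaction, holding transaction) pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
    visited, on_stack = set(), set()

    def dfs(node):
        visited.add(node)
        on_stack.add(node)
        for nxt in graph.get(node, ()):
            if nxt in on_stack or (nxt not in visited and dfs(nxt)):
                return True
        on_stack.discard(node)
        return False

    return any(n not in visited and dfs(n) for n in list(graph))

if __name__ == "__main__":
    local_at_NA = {("T1", "T2")}     # at NA, T1 waits for the lock that T2 holds on A
    local_at_NB = {("T2", "T1")}     # at NB, T2 waits for the lock that T1 holds on B
    print(has_cycle(local_at_NA))                  # False: no deadlock is visible locally
    print(has_cycle(local_at_NA | local_at_NB))    # True: the global graph has a cycle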
Example 10.7: Suppose we have transactions T1 and T2 that wish to lock
items A and B, located at nodes NA and NB, respectively. A and B may be
copies of the same item or may be different items. Also suppose that at NA,
(a subtransaction of) T2 has obtained a write-lock on A, and (a subtransaction
of) T1 is waiting for that lock. Symmetrically, at NB, T1 has a lock on B, which
T2 is waiting for.
[Diagram not reproduced.]
(a) Local waits-for graph at NA.
1. Use a central node to receive updates to the local waits-for graphs from all
of the sites periodically. This technique has the advantages and disadvantages
of centralized methods of locking: it is vulnerable to failure of the
central node and to concentration of message traffic at that site,16 but the
total amount of traffic generated is relatively low.
2. Pass the current local waits-for graphs among all of the sites, preferring
to append the local graph to another message headed for another site if
possible, but sending the local graph to each other site periodically anyway.
The amount of traffic this method generates can be much larger than
for the central-node method. However, if the cost of messages is relatively
invariant to their length, and frequently waits-for information can be "piggybacked"
on other messages, then the real cost of passing information is
small.
items are locked by any given transaction, e.g., locking in lexicographic order or
taking all locks at once. There also are schemes that do not place constraints on
the order in which items are locked or accessed, but still can assure no deadlocks
occur. These schemes use timestamps on transactions, and each guarantees that
no cycles can occur in the global waits-for graph. It is important to note that
the timestamps are used for deadlock avoidance only; access control of items is
still by locking.
In one scheme, should (a subtransaction of) T1 be waiting for (a subtransaction of) T2, then it must be that the timestamp of T1 is less than the timestamp
of T2; in the second scheme, the opposite is true. In either scheme, a cycle in
the waits-for graph would consist of transactions with monotonically increasing
or monotonically decreasing timestamps, as we went around the cycle. Nei
ther is possible, since when we go around the cycle we come back to the same
timestamp that we started with.
We now define the two deadlock avoidance schemes. Suppose we have
transactions T1 and T2 with timestamps t1 and t2, respectively, and a subtransaction of T1 attempts to access an item A locked by a subtransaction of
T2.
1. In the wait-die scheme, T1 waits for a lock on A if t1 < t2, i.e., if T1 is the
older transaction. If t1 > t2, then T1 is aborted.
2. In the wound-wait scheme, T1 waits for a lock on A if t1 > t2. If t1 < t2,
then T2 is forced to abort and release its lock on A to T1.17
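The two schemes reduce to a one-line comparison of timestamps. A Python sketch, illustrative only; T1, with timestamp t1, is requesting an item held by T2, with timestamp t2:

def wait_die(t1, t2):
    # The requester waits only if it is older; a younger requester dies (aborts).
    return "T1 waits" if t1 < t2 else "T1 dies: abort, restart later with the same timestamp t1"

def wound_wait(t1, t2):
    # The requester waits only if it is younger; an older requester wounds the holder.
    return "T1 waits" if t1 > t2 else "T2 is wounded: abort and release the lock to T1"

if __name__ == "__main__":
    print(wait_die(5, 9))      # T1 is older, so it waits
    print(wait_die(9, 5))      # T1 is younger, so it dies
    print(wound_wait(9, 5))    # T1 is younger, so it waits
    print(wound_wait(5, 9))    # T1 is older, so T2 is wounded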
In either scheme, the aborted transaction must initiate again with the same
timestamp, not with a new timestamp. Reusing the original timestamp guar
antees that the oldest transaction, in either scheme, cannot die or be wounded.
Thus, each transaction will eventually be allowed to complete, as the following
theorem shows.
Theorem 10.3: There can be neither deadlocks nor livelocks in the wait-die
or the wound-wait schemes.
Proof: Consider the wait-die scheme. Suppose there is a cycle in the global
waits-for graph, i.e., a sequence of transactions T1, ..., Tk such that each Ti is
waiting for release of a lock by Ti+1, for 1 ≤ i < k, and Tk is waiting for T1. Let
ti be the timestamp of Ti. Then t1 < t2 < ... < tk < t1, which implies t1 < t1,
an impossibility. Similarly, in the wound-wait scheme, such a cycle would imply
t1 > t2 > ... > tk > t1, which is also impossible.
To see why no livelocks occur, let us again consider the wait-die scheme. If
17 Incidentally, the term "wound-wait" rather than "kill-wait" is used because of the image
that the "wounded" subtransaction must, before it dies, run around informing all the
other subtransactions of its transaction that they too must abort. That is not really
necessary if a distributed commit algorithm is used, but the subject is gruesome, and
the less said the better.
[Figure 10.8 (comparison table, with columns including Method, Messages, and Phantom aborts) not reproduced.]
T is the transaction with the lowest timestamp, that is, T is the oldest trans
action that has not completed, then T never dies. It may wait for younger
transactions to release their locks, but since there are no deadlocks, those locks
will eventually be released, and T will eventually complete. When T first initi
ates, there are some finite number of live, older transactions. By the argument
above, each will eventually complete, making T the oldest. At that point, T is
sure to complete the next time it is restarted. Of course, in ordinary operation,
transactions will not necessarily complete in the order of their age, and in fact
most will proceed without having to abort.
The no-livelock argument for the wound-wait scheme is similar. Here, the
oldest transaction does not even have to wait for others to release locks; it takes
the locks it needs and wounds the transactions holding them. D
Comparison of Methods
Figure 10.8 summarizes the advantages and disadvantages of the methods we
have covered in this section. The column labeled "Messages" refers to the
message traffic needed to detect deadlocks. The column "Phantom aborts"
refers to the possibility that transactions not involved in a deadlock will be
required to abort.
EXERCISES
10.1: Suppose we have three nodes, 1, 2, and 3, in our network. Item A has
copies at all three nodes, while item B has copies only at 1 and 3. Two
transactions, T1 and T2, run, starting at the same time, at nodes 1 and 2,
respectively. Each transaction consists of the following steps:
RLOCK B; WLOCK A; UNLOCK A; UNLOCK B;
Suppose that at each time unit, each transaction can send one message to
one site, and each site can read one message. When there is a choice of
sites to send or receive a message to or from, the system always chooses
the lowest numbered site. Additional messages are placed in a queue to be
sent or received at the next time units. Simulate the action of the network
under the following concurrency rules.
a) Write-locks-all.
b) Majority locking.
c) Primary site, assumed to be node 1 for A and 3 for B.
d) Primary copy token, with initially sites 2 and 3 holding read tokens
for A, and 1 holding the write token for B.
e) Timestamp-based concurrency control, assuming the timestamp of T1
exceeds that of T2, and both are greater than the initial read- and
write-times for all the copies.
* 10.2: Show that in order for no two link failures to disconnect a network of n
nodes, that network must have at least 3n/2 edges. Also show that there
are networks with ⌈3n/2⌉ edges that cannot be disconnected by the failure
of two links.
* 10.3: How many edges must an n-node network have to be resilient against the
failure of any k links?
* 10.4: Suppose that we have an incrementation lock mode, as in Example 9.10,
in addition to the usual read and write. Generalize the k-of-n methods to
deal with all three kinds of locks.
* 10.5: Some distributed environments allow a broadcast operation, in which the
same message is sent by one site to any desired subset of the other sites.
Redo the table of Figure 10.2 on the assumption that broadcasts are permitted
and cost one message each.
10.6: Suppose that a logical read-lock requires that j physical copies be read-locked,
and a logical write-lock requires write-locks on k copies. Show that
if either j + k ≤ n or k ≤ n/2, then logical locks do not work as they should
(thus, the k-of-n strategies are the best possible).
* 10.7: Determine the average number of messages used by the primary-copy token
method of defining locks, on the assumption that, when it is desired to lock
some item A,
i) 50% of the time a write token for A is available at the local site (and
therefore there is no read-token).
ii) 40% of the time a read-token for A is available at the local site.
iii) 10% of the time neither a read- nor write-token for A is available at
the local site.
iv) Whenever a desired token is not available locally, all sites are willing
to give up whatever tokens they have to the requesting site, after the
necessary exchange of messages.
10.8: What happens when the transactions of Figure 10.3 are run under the lock
methods other than write-locks-all?18
* 10.9: We can perform a distributed two-phase commit without a coordinator if
we have each of the n participants send their votes to all other participants.
a)
10.10:
10.11:
10.12:
10.13:
10.14:
18 Note: Example 10.3 talks about similar pairs of transactions and their behavior under
the other lock methods. We are interested in the exact transactions of Figure 10.3.
10.15: Suppose there are four participants, T1 (the coordinator), T2, T3, and T4, in
a two-phase commit. Describe what happens if the following failures occur.
In each case, indicate what happens during recovery (if the recovery phase
is entered), and tell whether any transaction blocks.
a) T1 fails after sending vote-commit to T2 and T3, but not T4.
b) T1 fails after sending vote-abort; the other participants vote to commit.
c) T1 fails before voting; the other participants vote to commit.
d) All vote to commit, but T1 fails before sending out any commit messages.
e)
f)
10.16:
10.17:
10.18:
10.19:
10.20:
*
**
10.21:
10.22:
10.23:
10.24: Suppose that there are three items A1, A2, and A3 at sites S1, S2, and
S3, respectively. Also, there are three transactions, T1, T2, and T3, with
Ti initiated at site Si, for i = 1, 2, 3. The following six events happen,
sequentially:
T1 locks A1; T2 locks A2; T3 locks A3;
T1 asks for a lock on A2; T2 asks for a lock on A3;
T3 asks for a lock on A1.
a)
b)
c)
d)
e)
f)
BIBLIOGRAPHIC NOTES
As was mentioned in Chapter 9, many of the key ideas in concurrency and
distributed systems were enunciated by Gray [1978], and an extensive, modern
treatment of the subject can be found in Bernstein, Hadzilacos, and Goodman
[1987].
Additional surveys of distributed database systems are Rothnie and Good
man [1977], Bernstein and Goodman [1981], and the text by Ceri and Pelagatti
[1984].
Distributed Concurrency Control
The k-of-n family of locking strategies is from Thomas [1975, 1979]. The pri
mary site method is evaluated by Stonebraker [1980], the central node technique
in Garcia-Molina [1979], and primary-copy token methods in Minoura [1980].
Timestamp-based, distributed concurrency control is discussed in Bern
stein and Goodman [1980b]. The method of maintaining global timestamps in
a distributed system is by Lamport [1978].
Additional methods are covered in Bayer, Elhardt, Heller, and Reiser
[1980], while Traiger, Gray, Galtieri, and Lindsay [1982] develop the concepts
underlying distributed concurrency control.
Some of the complexity theory of distributed concurrency control is found
in Kanellakis and Papadimitriou [1981, 1984].
Two-phase commit is from Lampson and Sturgis [1976] and Gray [1978]. Three-phase commit is from Skeen [1981].
The complexity of commit protocols is examined in Dwork and Skeen [1983]
and Ramarao [1985]. Segall and Wolfson [1987] discuss minimal-message algo
rithms for commit, assuming no failures.
The knowledge-theoretic definition of two- and three-phase commitment is
taken from Hadzilacos [1987].
Leader election in distributed database systems is covered by Garcia-Molina [1982]. Peleg [1987] gives references and optimal algorithms for leader
election in many cases, although the model does not take into account failure
during the election.
Recovery
The works by Menasce, Popek, and Muntz [1980], Minoura [1980], Skeen and
Stonebraker [1981], and Bernstein and Goodman [1984] contain analyses of the
methods for restoring crashed, distributed systems.
Many other algorithms have been proposed for maintaining replicated data,
allowing partition of the network, and then restoring or updating copies cor
rectly when the network becomes whole. See Eager and Sevcik [1983], Davidson
[1984], Skeen and Wright [1984], Skeen, Cristian, and El Abbadi [1985], and El
Abbadi and Toueg [1986].
Distributed Deadlocks
Menasce and Muntz [1979] and Obermarck [1982] give distributed deadlock de
tection algorithms. Timestamp-based deadlock detection (wait-die and wound-wait) are from Stearns, Lewis, and Rosenkrantz [1976] and Rosenkrantz,
Stearns, and Lewis [1978].
The complexity of distributed deadlock detection is treated by Wolfson and
Yannakakis [1985].
Systems
One of the earliest distributed database system experiments was the SDD-1
system. Its distributed aspects are described in Bernstein, Goodman, Rothnie,
and Papadimitriou [1978], Rothnie et al. [1980], Bernstein and Shipman [1980],
Bernstein, Shipman, and Rothnie [1980], Hammer and Shipman [1980], and
Bernstein, Goodman, Wong, Reeve, and Rothnie [1981]. See also the comment
on the system by McLean [1981].
BIBLIOGRAPHY
ANSI [1975]. "Study group on data base management systems: interim report,"
FDT 7:2, ACM, New York.
Apt, K. R. [1987]. "Introduction to logic programming," TR 87-35, Dept. of
CS, Univ. of Texas, Austin. To appear in Handbook of Theoretical Computer
Science (J. Van Leeuwen, ed.), North Holland, Amsterdam.
Apt, K. R., H. Blair, and A. Walker [1985]. "Towards a theory of declarative
knowledge," unpublished memorandum, IBM, Yorktown Hts., N. Y.
Apt, K. R. and J.-M. Pugin [1987]. "Maintenance of stratified databases viewed
as a belief revision system," Proc. Sixth ACM Symp. on Principles of Database
Systems, pp. 136-145.
Apt, K. R. and M. H. Van Emden [1982]. "Contributions to the theory of logic
programming," J. ACM 29:3, pp. 841-862.
Armstrong, W. W. [1974]. "Dependency structures of data base relationships,"
Proc. 1974 IFIP Congress, pp. 580-583, North Holland, Amsterdam.
Arora, A. K. and C. R. Carlson [1978]. "The information preserving properties
of certain relational database transformations," Proc. Intl. Conf. on Very Large
Data Bases, pp. 352-359.
Astrahan, M. M. and D. D. Chamberlin [1975]. "Implementation of a structured
English query language," Comm. ACM 18:10, pp. 580-587.
Astrahan, M. M., et al. [1976]. "System R: a relational approach to data man
agement," ACM Trans, on Database Systems 1:2, pp. 97-137.
Astrahan, M. M., et al. [1979]. "System R: a relational database management
system," Computer 12:5, pp. 43-48.
Bachman, C. W. [1969]. "Data structure diagrams," Data Base 1:2, pp. 4-10.
Badal, D. S. [1980]. "The analysis of the effects of concurrency control on
distributed database system performance," Proc. Intl. Conf. on Very Large
Data Bases, pp. 376-383.
Balbin, I. and K. Ramamohanarao [1986]. "A differential approach to query
optimization in recursive deductive databases," TR-86/7, Dept. of CS, Univ.
of Melbourne.
Bancilhon, F. [1986]. "A logic-programming/object-oriented cocktail," SIGMOD Record, 15:3, pp. 11-21.
Bancilhon, F. and S. Khoshafian [1986]. "A calculus for complex objects," Proc.
Fifth ACM Symp. on Principles of Database Systems, pp. 53-59.
Gray, J. N., et al. [1981]. "The recovery manager of the system R database
manager," Computing Surveys 13:2, pp. 223-242.
Gray, J. N., R. A. Lorie, and G. R. Putzolo [1975]. "Granularity of locks in a
shared database," Proc. Intl. Conf. on Very Large Data Bases, pp. 428-451.
Gray, J. N., G. R. Putzolo, and I. L. Traiger [1976]. "Granularity of locks
and degrees of consistency in a shared data base," in Modeling in Data Base
Management Systems (G. M. Nijssen, ed.), North Holland, Amsterdam.
Greenblatt, D. and J. Waxman [1978]. "A study of three database query lan
guages," in Shneiderman [1978], pp. 77-98.
Griffiths, P. P. and B. W. Wade [1976]. "An authorization mechanism for a
relational database system," ACM Trans. on Database Systems 1:3, pp. 242-255.
Gudes, E. and S. Tsur [1980]. "Experiments with B-tree reorganization," ACM
SIGMOD Intl. Conf. on Management of Data, pp. 200-206.
Gurevich, Y. and H. R. Lewis [1982]. "The inference problem for template
dependencies," Proc. First ACM Symp. on Principles of Database Systems, pp.
221-229.
Hadzilacos, T. and C. H. Papadimitriou [1985]. "Some algorithmic aspects of
multiversion concurrency control," Proc. Fourth ACM Symp. on Principles of
Database Systems, pp. 96-104.
Hadzilacos, T. and M. Yannakakis [1986]. "Deleting completed transactions,"
Proc. Fifth ACM Symp. on Principles of Database Systems, pp. 43-46.
Hadzilacos, V. [1982]. "An algorithm for minimizing roll back cost," Proc. First
ACM Symp. on Principles of Database Systems, pp. 93-97.
Hadzilacos, V. [1987]. "A knowledge-theoretic analysis of atomic commitment
protocols," Proc. 5ixth ACM Symp. on Principles of Database Systems, pp.
129-134.
Haerder, T. and A. Reuter [1983]. "Principles of transaction oriented database
recovery - a taxonomy," Computing Surveys 15:4, pp. 287-317.
Hagihara, K., M. Ito, K. Taniguchi, and T. Kasami [1979]. "Decision problems
for multivalued dependencies in relational databases," SIAM J. Computing 8:2,
pp. 247-264.
Hammer, M. and D. McLeod [1981]. "Database description with SDM: a se
mantic database model," ACM Trans. on Database Systems 6:3, pp. 351-386.
Hammer, M. and D. Shipman [1980]. "Reliability mechanisms for SDD-1: a
system for distributed databases," ACM Trans. on Database Systems 5:4, pp.
431-466.
Harel, D. [1986]. "Logic and databases: a critique," SIGACT News 18:1, pp.
68-74.
Heath, I. J. [1971]. "Unacceptable file operations in a relational data base,"
ACM SIGFIDET Workshop on Data Description, Access, and Control, pp. 19-33.
Heiler, S. and A. Rosenthal [1985]. "G-WHIZ: a visual interface for the func
tional model with recursion," Proc. Intl. Conf. on Very Large Data Bases, pp.
209-218.
Held, G. and M. Stonebraker [1978]. "B-trees reexamined," Comm. ACM 21:2,
pp. 139-143.
Holt, R. C. [1972]. "Some deadlock properties in computer systems," Comput
ing Surveys 4:3, pp. 179-196.
Honeyman, P. [1982]. "Testing satisfaction of functional dependencies," J. ACM
29:3, pp. 668-677.
Hull, R. and R. King [1987]. "Semantic database modeling: survey, applica
tions, and research issues," CRI-87-20, Computer Research Inst., USC.
Hull, R. and C. K. Yap [1984]. "The format model, a theory of database
organization," J. ACM 31:3, pp. 518-537.
Hunt, H. B. III and D. J. Rosenkrantz [1979]. "The complexity of testing
predicate locks," ACM SIGMOD Intl. Conf. on Management of Data, pp. 127133.
IBM [1978a]. Query-by Example Terminal Users Guide, SH20-2078-0, IBM,
White Plains, N. Y.
IBM [1978b]. IMS/VS publications, especially GH20-1260 (General Informa
tion), SH20-9025 (System/Application Design Guide), SH20-9026 (Application
Programming Reference Manual), and SH20-9027 (Systems Programming Ref
erence Manual), IBM, White Plains, N. Y.
IBM [1984]. "SQL/data system application programming for VM/system prod
uct," SH24-5068-0, IBM, White Plains, N. Y.
IBM [1985a]. "SQL/RT database programmer's guide," IBM, White Plains,
NY.
IBM [1985b]. "Easy SQL/RT user's guide," IBM, White Plains, NY.
Kellogg, C., A. O'Hare, and L. Travis [1986]. "Optimizing the rule-data inter
face in a KMS," Proc. Intl. Conf. on Very Large Data Bases, pp. 42-51.
Kent, W. [1979]. "Limitations of record-based information models," ACM
Trans. on Database Systems 4:1, pp. 107-131.
Kerschberg, L., A. Klug, and D. C. Tsichritzis [1977]. "A taxonomy of data
models," in 5ystems for Large Data Bases (Lockemann and Neuhold, eds.),
North Holland, Amsterdam, pp. 43-64.
Khoshafian, S. N. and G. P. Copeland [1986]. "Object identity," OOPSLA '86
Proceedings, ACM, New York, pp. 406-416.
Kim, W. [1979]. "Relational database systems," Computing Surveys 11:3, pp.
185-210.
Klug, A. [1981]. "Equivalence of relational algebra and relational calculus query
languages having aggregate functions," J. ACM 29:3, pp. 699-717.
Knuth, D. E. [1968]. The Art of Computer Programming, Vol. 1, Fundamental
Algorithms, Addison-Wesley, Reading Mass.
Knuth, D. E. [1973]. The Art of Computer Programming, Vol. 3, Sorting and
Searching, Addison-Wesley, Reading Mass.
Korth, H. F. [1983]. "Locking primitives in a database system," J. ACM 30:1,
pp. 55-79.
Korth, H. F. and A. Silberschatz [1986]. Database System Concepts, McGrawHill, New York.
Kowalski, R. A. [1974]. "Predicate logic as a programming language," Proc.
1974 IFIP Congress, pp. 569-574, North Holland, Amsterdam.
Kuhns, J. L. [1967]. "Answering questions by computer; a logical study," RM5428-PR, Rand Corp., Santa Monica, Calif.
Kung, H.-T. and C. H. Papadimitriou [1979]. "An optimality theory of con
currency control for databases," ACM SIGMOD Intl. Conf. on Management of
Data, pp. 116-126.
Kung, H.-T. and J. T. Robinson [1981]. "On optimistic concurrency control,"
ACM Trans. on Database Systems 6:2, pp. 213-226.
Kunifuji, S. and H. Yokuta [1982]. "PROLOG and relational databases for
fifth-generation computer systems," TR002, ICOT, Tokyo.
Kuper, G. M. [1987]. "Logic programming with sets," Proc. Sixth ACM Symp.
on Principles of Database Systems, pp. 11-20.
tributed database systems," Ph. D. Thesis, Dept. of EE, Stanford Univ., Stan
ford, Calif.
Minsky, N. H. and D. Rozenshtein [1987]. "Law-based approach to objectoriented programming," Proc. 1987 OOPSLA Conf.
Mitchell, J. C. [1983]. "Inference rules for functional and inclusion dependen
cies," Proc. Second ACM Symp. on Principles of Database Systems, pp. 58-69.
Moffat, D. S. and P. M. D. Gray [1986]. "Interfacing Prolog to a persistent
data store," Proc. Third Intl. Conf. on Logic Programming, pp. 577-584.
Mohan, C., B. G. Lindsay, and R. Obermarck [1986]. "Transaction management
in the R* distributed database management system," ACM Trans. on Database
Systems 11:4, pp. 378-396.
Morris, K., J. F. Naughton, Y. Saraiya, J. D. Ullman, and A. Van Gelder [1987].
"YAWN! (yet another window on NAIL!)," to appear in Database Engineering.
Morris, K., J. D. Ullman, and A. Van Gelder [1986]. "Design overview of the
NAIL! system," Proc. Third Intl. Conf. on Logic Programming, pp. 554-568.
Morris, R. [1968]. "Scatter storage techniques," Comm. ACM 11:1, pp. 38-43.
MRI [1978]. System 2000 Reference Manual, MRI Systems Corp., Austin, Tex.
Naish, L. [1986]. "Negation and control in Prolog," Lecture Notes in Computer
Science 238, Springer-Verlag, New York.
Naqvi, S. [1986]. "Negation in knowledge base management systems," in Brodie
and Mylopoulos [1986], pp. 125-146.
Nicolas, J. M. [1978]. "Mutual dependencies and some results on undecomposable relations," Proc. Intl. Conf. on Very Large Data Bases, pp. 360-367.
Obermarck, R. [1982]. "Distributed deadlock detection algorithm," ACM
Trans. on Database Systems 7:2, pp. 187-208.
Olle, T. W. [1978]. The Codasyl Approach to Data Base Management, John
Wiley and Sons, New York.
Orenstein, J. A. and T. H. Merrett [1984]. "A class of data structures for
associative searching," Proc. Fourth ACM Symp. on Principles of Database
Systems, pp. 181-190.
Osborn, S. L. [1977]. "Normal forms for relational databases," Ph. D. Thesis,
Univ. of Waterloo.
Osborn, S. L. [1979]. "Testing for existence of a covering Boyce-Codd normal
form," In/ormation Processing Letters 8:1, pp. 11-14.
Ozsoyoglu, G. and H. Wang [1987]. "On set comparison operators, safety, and
QBE," unpublished memorandum, Dept. of CSE, Case Western Reserve Univ.,
Cleveland, Ohio.
Ozsoyoglu, M. Z. and L.-Y. Yuan [1985]. "A normal form for nested relations,"
Proc. Fourth ACM Symp. on Principles of Database Systems, pp. 251-260.
Paige, R. and J. T. Schwartz [1977]. "Reduction in strength of high level op
erations," Proc. Fourth ACM Symp. on Principles of Programming Languages,
pp. 58-71.
Papadimitriou, C. H. [1979]. "The serializability of concurrent database up
dates," J. ACM 26:4, pp. 631-653.
Papadimitriou, C. H. [1983]. "Concurrency control by locking," J. ACM 12:2,
pp. 215-226.
Papadimitriou, C. H. [1986]. The Theory of Database Concurrency Control,
Computer Science Press, Rockville, Md.
Papadimitriou, C. H., P. A. Bernstein, and J. B. Rothnie Jr. [1977]. "Com
putational problems related to database concurrency control," Proc. Conf. on
Theoretical Computer Science, Univ. of Waterloo, Waterloo, Ont.
Papadimitriou, C. H. and P. C. Kanellakis [1984]. "On concurrency control by
multiple versions," ACM Trans, on Database Systems 9:1, pp. 89-99.
Paredaens, J. and D. Jannsens [1981]. "Decompositions of relations: a compre
hensive approach," in Gallaire, Minker, and Nicolas [1980].
Peleg, D. [1987]. "Time-optimal leader election in general networks," unpub
lished memorandum, Dept. of CS, Stanford Univ.
Perl, Y., A. Itai, and H. Avni [1978]. "Interpolation search - a log log n search,"
Comm. ACM 21:7, pp. 550-553.
Pirotte, A. [1978]. "High level data base query languages," in Gallaire and
Minker [1978], pp. 409-436.
Przymusinski, T. C. [1986]. "An algorithm to compute circumscription," un
published memorandum, Dept. of Math. Sci., Univ. of Texas, El Paso.
Przymusinski, T. C. [1988]. "On the declarative semantics of stratified deduc
tive databases and logic programs," in Minker [1988].
Ramakrishnan, R., F. Bancilhon, and A. Silberschatz [1987]. "Safety of recur
sive Horn clauses with infinite relations," Proc. Sixth ACM Symp. on Principles
of Database Systems, pp. 328-339.
INDEX
Argument 101
Arity 44, 101
Armstrong, W. W. 384, 441
Armstrong's axioms 384-387, 414, 441
Arora, A. K. 442
Assignment 175, 177, 191-192, 272
Associative law 62-63
Astrahan, M. M. 238
Atom 24
Atomic formula 24, 101, 146
Atomicity 468-469, 542, 545-546
Attribute 3, 25, 35, 37, 44, 226, 273
See also Prime attribute
Attribute renaming 179, 192, 217
Augmentation 384, 414-415
Authorization table 17, 460
Automatic insertion 258
Average
See Aggregation
Avni, H. 375
AWK 239
Axioms 384, 414-415, 443-445
See also Armstrong's axioms, In
ference, of dependencies
Abiteboul, S. 95
Abort, of transaction 469, 476, 508, 512,
517, 520, 530, 557, 579-581
Abstract data type 22, 43, 95
See also Class, Data abstraction,
Encapsulation
Access control 2
See also Security
Active transaction 509
Acyclic polygraph 495
ADABAS 292
Address 301
Address calculation search
See Interpolation search
Aggregation 95, 145, 171, 175, 194-195,
203-204, 216-219
Aggregation by groups
See Group-by
Aggressive protocol 511-512, 515-516,
540
Aghili, H. 542
Agrawal, R. 542, 586
Aho, A. V. 65, 95, 239, 362, 374-375,
421, 441, 445
Algebraic dependency 444
Allman, E. 238
Alpine 587
Anomaly
See Deletion anomaly, Insertion
anomaly, Update anomaly
ANSI/SPARC 29
Append statement 191
Application program 14-15
Apt, K. R. 171, 173
Archiving 523-524
B
Bachman, C. W. 94
Bachman diagram 94
Backup
See Archiving
Badal, D. S. 586
Balbin, I. 172
Bancilhon, F. 95, 171-172
Baroody, J. A. Jr. 30
Bayer, R. 172, 375, 541, 585
BCNF
See Boyce-Codd normal form
Beck, L. L. 95
Beech, D. 95
Beeri, C. 172, 416, 418, 421, 441-445,
542
Bentley, J. L. 375
Berman, R. 238
Bernstein, P. A. 31, 239, 441-442, 445,
540-542, 585-586
Bidiot, N. 172
Biliris, A. 541
Binary search 313-314
Binary search tree 362
Biskup, J. 442-443
Blair, H. 173
Blasgen, M. W. 238
Block 296, 518
Block access 296
Block directory
See Directory
Blocking, of transactions 559-560, 564-573
Bocca, J. 30
Body, of a rule 102, 107-111
Bolour, A. 375
Bosak, R. 94
Bound variable 145-147
Boyce, R. F. 238
Boyce-Codd normal form 401-409, 420,
438, 440, 442-443, 445
Bradier, A. 172
Broadcast 582
Brodie, M. L. 30, 94
Brown, M. R. 587
Browne, J. C. 542
B-tree 321-328, 331, 351-352, 357, 375,
502, 541
Bucket 306-307
Buckley, G. N. 541
Buffer 296
Built-in predicate 101-102, 107
Burkhard, W. A. 375
C 227-234
CAD database 19, 354
CALC location mode 344-347
CALC-key 250-252, 259
Candidate key 48, 383
Cardenas, A. F. 292-293
Carey, M. J. 541, 586
Carlson, C. R. 442
Cartesian product
See Product
Casanova, M. A. 444
Cascading rollback 510-511, 529-531
CASE database
See Software engineering database
Casey, R. C. 441
Central node locking 553-554, 579
Ceri, S. 585
Chain mode 346
Chamberlin, D. D. 238
Chandra, A. K. 95, 171-172, 444
Chandy, K. M. 542
Chase 430-434, 444
Checkpoint 522-524
Chen, P. P. 94
Childs, D. L. 94
Cincom 292
Circumscription 173
Clark, K. L. 172-173
Class 85, 271-272, 275
See also Abstract data type
Clause 102
See also Horn clause, Rule
Clifford, J. 30
Clippinger, R. F. 94
Clock 573-574
Clocksin, W. F. 30
Closed world assumption 161-164, 172-173
Closure, of a set of attributes 386, 388-389, 400, 445
Closure, of a set of dependencies 383,
388-390, 399, 418
Clustering of records 335-337, 367-368
COBOL 240, 246
Crash recovery
See Resiliency
Cristian, F. 586
Culik, K. II 375
Cullinane 292
Currency pointer 246-249
Current of record type 247-249, 251
Current of run-unit 247-250, 256-257,
260-261, 264
Current of set type 247-249, 259
Current parent 264, 268-269
Cursor 231
CWA
See Closed world assumption
Cylinder 17
D
DAG
See DAG protocol, Directed acyc
lic graph
DAG protocol 537, 541
Dangling reference 298, 320
Dangling tuple 50-53, 394
Data abstraction
See Encapsulation
Data curator 464
Data definition language 8, 12-13, 207210, 223-227, 240-246, 262-265,
271-278
Data dependency
See Dependency
Data independence 11-12
Data item 241
Data manipulation language
See Query language, Subschema
data manipulation language
Data model 2-3, 8, 32-34, 96
See also Datalog, Entity-relation
ship model, Hierarchical model,
Network model, Object model, Re
lational model
Database 2
Database administrator 16
Database catalog 225-227
Database integration 9, 29
Dependency 376-377
See also Algebraic dependency,
Equality-generating
dependen
cy, Functional dependency, Gen
eralized dependency, Implicational
dependency, Inclusion dependen
cy, Join dependency, Multivalued
dependency, Subset dependency,
Tuple-generating dependency
Dependency basis 417-419, 443
Dependency graph 103-104, 106
Dependency preservation 398-401, 403-404, 408-412, 442
Depth, of a transaction 483
Derivative 172, 447, 449-452, 465
Derochette, D. 542
DeWitt, D. J. 30, 542
Dictionary order
See Lexicographic order
Difference 55-57, 178, 189-190
DiPaola, R. A. 172
Direct location mode 345
Directed acyclic graph 537
Directory 303
Dirty data 509-510
Disk 17-18, 296-297, 468
Dissly, C. W. 542
Distributed system 543-587
DL/I 262, 264, 266-271
DML
See Data manipulation language
Dobbs, C. 94
Domain 43, 208
Domain closure assumption 162
Domain relational calculus 148-156,
195-196
Domain size 443-444
Domain variable 196
Domain-independent formula 151-152,
172
DRC
See Domain relational calculus
Duplicates 201, 203-204, 216
Dwork, C. 586
E
Eager, D. 586
EDB
See Extensional database
El Abbadi, A. 586
El Masri, R. 29, 95
Election, of a coordinator 570-571, 586
Elhardt, K. 585
Ellis, C. S. 541-542
Embedded dependency 426-428, 439
Embedded multivalued dependency
422-423, 443
Empty relation 93
Empty tuple 93
Encapsulation 22
See also Data abstraction
Encryption 456
Entity 34
Entity set 33-34, 37, 45-46, 48, 67, 380
Entity-relationship diagram 37-38, 40,
45-49, 67, 73, 87
Extension
See Instance, of a database, ISBL
extension
Extensional database 10-11, 100-101,
171
G
Galil, Z. 443
Gallaire, H. 171
Galtieri, A. 585
Garcia-Molina, H. 542, 554, 585-586
Garey, M. R. 440
Gelembe, E. 542
Gelfond, M. 172-173
Gemstone 30, 271, 293, 462
See also OPAL
Generalization 95
Generalized closed world assumption
164, 173
Generalized dependency 423-434, 440,
443-444
Generalized projection 167
Genesereth, M. R. 30
Get statement 249, 264, 266-269
Ginsberg, M. 172
Ginsburg, S. 442
Global clock 573-574, 585
Global item 545-546
Global transaction 546
Goldberg, A. 293
Goldfinger, R. 94
Gonzalez-Rubio, R. 172
Goodman, N. 31, 441, 540-542, 585-586
Gotlieb, C. C. 374
Gottlob, G. 442
Graham, M. H. 443
Granularity 469-470, 540-541
Graph
See Dependency graph, Directed
acyclic graph, Polygraph, Serial
ization graph, Waits-for graph
Graphics database 19-20, 354
Gray, J. N. 540-542, 585-586
Gray, P. M. D. 30
Greenblatt, D. 238
Griffiths, P. P. 466
Ground atom 162
Group 461
Group-by 195, 217-219
Grumbach, S. 95
Gudes, E. 375
Gurevich, Y. 444
H
Hadzilacos, T. 541-542
Hadzilacos, V. 31, 540, 542, 565, 585-586
Haerder, T. 542
Hagihara, K. 443
Hammer, M. M. 95, 238, 586
Harel, D. 95, 171-172
Hash function 306-307
Hash table 3, 347
Hashing 306-310, 328, 331, 351, 357,
375
See also Partitioned hashing, Par
titioned hashing
Head, of a rule 102
See also Rectified rule
Heap 304-306, 351
Heath, I. J. 441
Heiler, S. 95
Held, G. 238, 375
Heller, H. 585
Hierarchical model 28, 72-82, 94, 346-350, 457, 502
See also IMS
HISAM 347-349
Holt, R. C. 542
Honeyman, P. J. 442-443
Hopcroft, J. E. 65, 362, 374
Horn clause 25, 47-128, 163, 448
Host language 14-16, 18-21, 28-30,
227-234, 246
Howard, J. H. 416, 441, 443, 445
Hull, R. 95, 172
Hunt, H. B. III 541
Hypothesis row 424
Instance, of a database 10
Instance, of a pattern 332
Instance variable 275, 295
Integrity 7, 446-456, 466
Integrity constraint 102, 379, 398, 447-448
Intension
See Scheme
Intensional database 11, 100-101, 171
Interpolation search 314-315, 375
Interpretation 97-98
Intersection 57-58, 62, 168, 178
IRIS 30
Isa hierarchy 35-37, 40, 67
See also Type hierarchy
Isam 310-321, 331, 347, 351, 357
ISBL 177-185, 238, 457
ISBL extension 184-185
Itai, A. 375
Item 469, 502, 545-546
Ito, M. 443
Jaeschke, G. 95
Janssens, D. 444
Jarke, M. 30
Jasper, R. B. 94
Johnson, D. S. 440
Join 64-65, 176-178, 239, 450, 464-465,
470
See also Natural join, Semijoin, θ-join
Join dependency 425-426, 440, 444-445
Jou, J. H. 443
Journal
See Log
K
Kambayashi, Y. 31, 443
Kameda, T. 442
Kanellakis, P. C. 444-445, 541, 585
Kasami, T. 443
KBMS
See Knowledge-base management
system
k-d-tree 361-368, 375
Keating, W. 94
Kedem, Z. 503, 541
Keller, A. 239
Kellogg, C. 30
Kendrick, G. 94
Kent, J. 542
Kent, W. 238
Kernighan, B. W. 239
Kerschberg, L. 94
Key 35-36, 47-50, 205, 208, 294, 297-298, 304, 308, 311, 323, 381, 383,
402, 440, 443-445, 452-453, 470
See also Database key
Khoshafian, S. N. 30, 95
Kim, W. 238
King, R. 95
King, W. F. 238
Kleitman, D. J. 375
Klug, A. 29, 94, 171
Knowledge system 23-24, 30, 32
Knowledge-base management system 1,
24, 28-29
See also Knowledge system
Knuth, D. E. 307, 374-375
k-of-n locking 550, 554, 577, 582, 585
Kolling, K. 587
Korth, H. F. 31, 95, 540
Kowalski, R. A. 30, 171
Kranning, L. A. 94
Kreps, P. 238
Kuhns, J. L. 94, 171
Kung, H.-T. 540-541
Kunifuji, S. 30
Kuper, G. M. 95, 171-172
Lacroix, M. 94
Ladner, R. E. 542
Lai, M. Y. 542
Lamport, L. 585
Lampson, B. 586
Larson, P. 375
LDL 30
Le, V. T. 172
Least fixed point 117, 119, 122-123,
126-129, 131
Leftmost child 349-350
Leftmost record 266-267
Lehman, P. L. 541
Level number 242
Levels of abstraction 7, 10, 29
Levien, R. E. 94, 171
Lewis, H. R. 444
Lewis, P. M. II 540, 586
Lewis, T. G. 375
Lexicographic order 311
Lien, Y. E. 443
Lifschitz, V. 172-173
Limited variable 105, 153, 158
Lindsay, B. G. 585, 587
Ling, H. 375
Ling, T. W. 442
Link 66-67, 71, 73, 78, 240, 342-343,
543
See also Many-one relationship
Livny, M. 586
Lipski, W. Jr. 94, 542
Literal 102, 146
Litwin, W. 29, 375
Liu, L. 441
Livelock 472-473, 513-514
Lloyd, J. W. 171
Local item 545-546
Local transaction 546
Local-area network 543
Location mode 344-346
Lochovsky, F. H. 94, 292-293
Lock 17-18, 270, 467-472, 477-479, 502,
505, 512, 540, 546-554, 575
See also Read-lock, Warning lock,
Write-lock
Lock compatibility matrix 490, 507-508
Lock manager 469-471
Lock mode 490-492, 537, 540
Lock point 485, 524, 556
Lock table 470, 529
Lorie, R. A. 540-542
Lossless join 393-398, 403-408, 411-412, 419-420, 440-442, 444
Lozano, T. 375
Lucchesi, C. L. 445
Lueker, G. S. 375
Lum, V. 375
Lynch, N. 542
M
Maier, D. 30, 293, 432, 441-445
Main file 312
Main memory
See Volatile storage
Majority locking 548-550, 554
Makowsky, J. A. 444
Manber, U. 542
Mandatory retention 258
Manna, Z. 171
Manual deletion 260-261
Manual insertion 258, 260
Many-many relationship 33, 39-40, 48,
72, 78-79
Many-one relationship 39, 49, 65, 380
See also Link
Mapping
See Set-of-mappings (representation of relations)
Muntz, R. R. 586
Mylopoulos, J. 30, 94
N
NAIL! 30
Naish, L. 172
Naive evaluation 119, 126
Naqvi, S. 172
Natural join 59-60, 62, 72, 122
See also Join
Naughton, J. F. 30
Navigation 3-4, 21, 64, 71-72, 86-87,
249, 281-282, 346
Negation by failure 172-173
Negation, in rules 97, 99, 128-139, 145,
172
See also Stratified negation
Negation, logical 139-141
Negative literal 102
Nested record structure 330, 332-339,
342, 346, 352
Nested transaction 542
Network 66, 73-74, 77, 543-545
Network failure 559, 582
Network model 28, 65-72, 94, 292, 342-346, 457
See also CODASYL, CODASYL
DDL, CODASYL DML
Nicolas, J.-M. 171, 444
Nievergelt, J. 375
Nilsson, N. J. 30
Node 543
Node failure 544, 565
Nonfatal error 475-476, 478-479
Non-first-normal-form relation 95
Nonprime attribute 402
Nonrecursive predicate 103-104, 106-115, 139, 141, 144
Normal form 401
See also Boyce-Codd normal form,
First normal form, Fourth normal
form, Second normal form, Third
normal form
Normalization 76, 246, 442-444
NP-completeness 440, 443, 445, 501
O
Obermarck, R. 586-587
Object 272, 295
Object identity 22-23, 28-29, 33, 43, 66,
82, 95
Object model 82-87, 94-95, 171, 245-246
Object-base 1
Object-oriented database system
See OO-DBMS
Occurrence, of a variable 146
Offset 298, 320
O'Hare, A. 30
Olle, T. W. 292
One-one relationship 38-39, 48
OO-DBMS 20-23, 28-30, 85-86, 240-293
See also Complex object, Data abstraction, Object identity, Object-base
OPAL 87, 271-288, 293, 462-464, 466
Operational meaning of rules
See Computational meaning of
rules
Optimistic concurrency control 531,
533-534
Optional retention 258
Ordinary predicate 101, 107
Osborn, S. L. 442-443, 445
Otis, A. 30, 293
Ottmann, Th. 375
Ouksel, M. 375
Owner 68, 76, 241, 251, 259, 461
Ozsoyoglu, G. 172
Ozsoyoglu, M. Z. 95
Page
See Block
Page manager 518-519
Page table 518
Paging strategy 518
Paige, R. 172
Papadimitriou, C. H. 444, 540-542,
585-586
Parameter 273-274
Paredaens, J. 444
Parent 263
Parker, D. S. 443
Partial-match query 356-357, 359-361,
364-366, 373, 375
Participant 557
Partition, of networks 544-545
Partitioned hashing 358-361, 373
Password 456
Pattern 332
Pattern matching 201, 213
Pelagatti, G. 585
Peleg, D. 586
Perfect model 138-139, 170, 173
Perl, Y. 375
Persistent data 2
Phantom deadlock 579-581
Physical data independence 11-12, 54
Physical database 7, 11, 29, 294-375
Physical item
See Local item
Physical scheme 13
Pinned record 298, 318-319, 322, 329,
331, 338-339, 351
Pippenger, N. 375
Pirotte, A. 94
Pixel 19, 27
PL/I 184
Pointer 76, 246, 263, 281, 295, 297, 320,
348-350
See also Virtual record type
Pointer array mode 346
Polygraph 495-499, 540
Popek, G. J. 586
Positive literal 102
POSTGRES 30
Precedence, of operators 147
Precompiler 228
Predicate lock 541
Predicate symbol 24, 100
Preorder threads 349-350
QBE
See Query-by-Example
QUEL 185-195, 201, 238, 458
R
Ramakrishnan, R. 171-172
Ramamohanarao, K. 172
Ramarao, K. V. S. 586
Range query 356-357, 359, 361, 365-367, 375
Range statement 185
Read-lock 470, 486-487, 490-493
Read-set 493
Read-time 526, 574-575
Read-token 551
Receiver 272
Record 241, 263, 295
See also Logical record, Variable-length record
Record format
See Format, for a record, Logical
record format
Record structure 2-3
Record type 241-243, 252, 274-276
See also Logical record type
RECORDOF 83
Record-oriented system
See Value-oriented system
Recovery 516-524, 529, 563-564, 569-573, 575-576, 586
See also Cascading rollback
Rectified rule 111-112
Recursion 18-19, 26-27
Recursive predicate 103-104, 115-128
Robson, D. 293
Rohmer, J. 172
Root 263
Rosenberg, A. L. 375
Rosenkrantz, D. J. 540-541, 586
Rosenthal, A. 95
Ross, K. A. 172
Roth, M. 95
Rothnie, J. B. Jr. 375, 540-541, 585-586
Rowe, L. A. 30
Rozenshtein, D. 30, 95
Rubinstein, P. 238, 466
Rule 25, 96-100, 102
See also Horn clause, Rectified
rule, Safe rule
Run-unit 247
Rustin, R. 94
S
Sadri, F. 443-444
Safe formula 149, 151-161, 172, 188-189
Safe rule 104-106, 136-139, 143, 161
Sagiv, Y. 429, 432, 442-444, 541
Samet, H. 375
Sammet, J. E. 94
Saraiya, Y. 30
SAT 436
Satisfaction, of a dependency 382, 428-429, 443
Schek, H.-J. 95
Schedule 474
Scheduler 476, 512
Scheme 10-11
See also Conceptual database, Database scheme, Logical record format, Relation scheme
Scheuermann, P. 375
Schkolnick, M. 375, 442, 541
Schmid, H. A. 95
Schmidt, J. W. 94
Schwartz, J. T. 172
Sciore, E. 30, 95, 444
SDD-1 541, 586
Shared lock
See Read-lock
Shasha, D. E. 542
Shepherdson, J. C. 173
Shipman, D. W. 95, 586
Shmueli, O. 172
Sibley, E. 94
Silberschatz, A. 31, 95, 172, 503, 541
Simple selection 140, 207
Singleton set 215
Singular set 254-255
Site
See Node
Skeen, D. 586
Skeleton 196
Smalltalk 271, 293
Smith, D. C. P. 95
Smith, J. M. 30, 95
Snyder, L. 375
Software AG 292
Software engineering database 19
Soisalon-Soininen, E. 542
Sorenson, P. 442
Sorted file
See B-tree, Isam
Sorting 311
Soundness 385, 415-416, 445
Source, for a virtual field 245
Sparse index 312, 328
SQL 6-7, 13-14, 210-234, 238, 457,
460-462, 466
SQUARE 238
Stable storage 516, 523
Stanat, D. 375
Statistical database 467
Stearns, R. E. 540, 586
Stein, J. 30, 293
Stonebraker, M. 30, 238, 375, 466, 540,
585-587
Store statement 258
Stratification 133-135
Stratified negation 132-139, 172-173
Strict protocol 511-512, 530-531, 540,
556-557
Strong, H. R. 375
Sturgis, H. 586
Subclass 275
Subgoal 102
Subquery 214
Subscheme 11
See also View
Subscheme data definition language 8-9, 13
Subscheme data manipulation language
9
Subset dependency 429
Subtransaction 546
See also Coordinator, Participant
Subtype
See Type hierarchy
Sum
See Aggregation
Summers, R. C. 466
Superkey 383
Suri, R. 540
Swenson, J. R. 95
Symbol mapping 428
System failure 508, 516-523
System R 210, 238, 351-354, 542
System R* 587
System 2000 293
TOTAL 292
Toueg, S. 586
Traiger, I. L. 540-541, 585
Transaction 468, 546
See also Nested transaction
Transaction management 2, 5-6
See also Concurrency control
Transitive closure 26, 92-93, 117, 145,
175
Transitivity 384, 414-416
Travis, L. 30
TRC
See Tuple relational calculus
Tree 73, 77, 263, 502-507
See also B-tree, Database record,
Hierarchical model, k-d-tree
Tree protocol 502-504, 540
Trigger 452-453
Trivial dependency 384
Tsichritzis, D. C. 29, 94, 292-293
Tsou, D.-M. 442, 445
Tsur, S. 30, 172, 375
Tuple 3, 22, 43, 295
Tuple identifier 352
Tuple relational calculus 156-161, 174,
185, 212
Tuple variable 156, 185-186, 200, 212-213
Tuple-generating dependency 424, 427,
430-431, 433, 440, 444
Two-phase commit 560-564, 586
Two-phase locking 468, 478, 484-486,
489-490, 500, 511-512, 524-526,
540, 555-557
Type hierarchy 30, 82, 85-86, 95, 272,
276, 278
See also Isa hierarchy
Type union 85
Typed dependency 425
Typeless dependency 425, 427
U
Uhrig, W. R. 542
Ullman, J. D. 30, 65, 95, 172, 362, 374-375, 421, 441-445
Unary inclusion dependency 440-441,
445
Undecidability 444
Undo protocol 542
Union 55, 57, 122, 178, 189, 191, 449
Union rule 385-386, 416-417
Unique symbol 426, 428, 430
Universal quantifier 102-103, 147, 215
UNIX 239, 460
Unlock 477, 486, 502, 505
Unpinned record 298, 304, 315, 322,
329-330
Update 14, 205, 221, 261, 270-271, 279,
305-306, 309, 316, 320, 323, 328-329, 453, 458
Update anomaly 377
Used/unused bit 299
Useless transaction 495
User identification 456
User profile 462
User working area 246-247
W
Wade, B. W. 466
Wait-die 580-581, 586
Waits-for graph 474, 542, 577-581
Waldinger, R. 171
Walecka, S. 429
Walker, A. 30, 173
Wang, H. 172
Warning lock 505, 507-508
Warning protocol 504-507, 536-537,
541
Warren, D. H. D. 30
Warren, D. S. 30
Waxman, J. 238
Weihl, W. 542
Weikum, G. 542
Weinberger, P. J. 239
Whyte, N. 238
Wiederhold, G. 29-31, 95, 293, 374
Willard, D. E. 375
Wolfson, O. 586
Wong, E. 238, 466, 586-587
Wood, C. 466
Wood, D. 375, 542
Workspace 246, 264, 468
Wound-wait 580-581, 586
Wright, D. D. 586
Write-lock 470, 486-487, 490-493
Write-locks-all 547-552, 554, 556
Write-set 493
Write-time 526, 574-575
Write-token 551
Yajima, S. 443
Yannakakis, M. 442, 444, 501, 540-542,
586
Yao, A. C. 375
Yao, F. F. 375
Yao, S. B. 541
Yap, C. K. 95
Yokota, H. 30
Youssefi, K. 238
Yuan, L.-Y. 95
Yuppie Valley Culinary Boutique 40-42
YVCB
See Yuppie Valley Culinary Boutique
Zaiddan, S. M. 442
Zaniolo, C. 30, 94-95, 172, 443
Zloof, M. M. 238, 466
Zook, W. 238
TO APPEAR IN VOLUME II
The second volume covers query optimization in database systems, and explains extensions of these ideas to handle the more expressive query languages that are used in knowledge-base systems. Recently discovered techniques for efficient implementation of logic languages will be discussed, along with the design of some experimental knowledge-base systems. The "universal relation" model, for understanding queries posed in natural language or in very high-level languages, will also be treated.
ISBN 0-88175-188-X