
Guest Editor's Introduction

The Development of Data-Base Technology

E. H. SIBLEY
Department of Information Systems Management, University of Maryland, College Park, Maryland 20742,
and National Bureau of Standards, Washington, D.C. 20284

I should like to thank Elliott Organick for inviting me to serve as Guest Editor for this issue of COMPUTING SURVEYS, which has become our industry's most effective educational journal. I believe this issue deals with the most important topic in computing today: data-base technology. Here we attempt an integrated approach to a very disoriented field. Problems abound: differences in terminology, differences in modeling, and differences in implementation confuse the potential user, who is faced with almost unanswerable questions, such as:

• Should I wait for the dust to settle, or start to use data-base technology now?
• If I go data-base, which type of system do I choose?

Such problems are found in any evolving technology, especially when it is associated with a fast-developing industry, such as computing; while some problems appear more philosophical than real, others arise from a poor understanding of new concepts. Here we shall try to answer some of the questions and reduce the confusion.

But obviously, one issue of COMPUTING SURVEYS cannot be all-encompassing; data technology is a field which already boasts hundreds of articles, and textbooks by the dozen. Thus, this issue confines itself to an explanation of various models of data-base systems, showing their differences and similarities, while trying to relate the models to their implementation in current commercial and experimental systems.

We were faced with one major problem in trying to provide an integrated issue: every model uses its own terminology. We therefore decided to attempt to use both a common terminology and a single example wherever possible. This is by no means a simple task; we must not only define terminology and apply it to descriptions of various models, but also show how these terms differ from those used by others in discussing the same ideas. There is no standard, and we are forced to be arbitrary.

It was not possible to discuss every model or implementation in one issue; in fact, it is difficult to deal with anything but basic concepts. We admit to errors and omissions, and apologize for them; a real attempt was made to solicit aid from a wide variety of experts. We issue them a blanket vote of thanks and apologize for inadvertent omissions.

OUTLINE OF THE ISSUE

Figure 1 provides a graphic overview of this issue for readers. The first article, by Fry and Sibley, could be called an "entry" to the issue. Dependent on this are two articles, essentially at the same "level": one by Chamberlin, the other by Taylor and Frank. They discuss two different and independent approaches. The article by Tsichritzis and Lochovsky, using the common terms and example of the first paper and discussing

Copyright © 1976, Association for Computing Machinery, Inc. General permission to republish, but not for profit, all or part of this material is granted provided that ACM's copyright notice is given and that reference is made to the publication, to its date of issue, and to the fact that reprinting privileges were granted by permission of the Association for Computing Machinery.

Computing Surveys, Vol. 8, No. 1, March 1976


[Figure 1 (diagram). Fry and Sibley: Define Terms; Give History of Major Systems; Describe Common Example; Discuss Trends and Issues. Below: Describe Relational Approach, Discuss Implementation in Research Installations; Describe Network Approach, Discuss Implementation in Commercial Systems. Remaining boxes not legible.]

Figure 1: The structure and contents of this issue.

differences between the hierarchic and other approaches, depends on all three previous papers. Finally, the paper by Michaels, Mittman, and Carlson compares the approaches described in the second and third papers; it is consequently dependent on both of these.

PRESSURES TOWARD DATA-BASE TECHNOLOGY

Data-base technology is one of the most rapidly growing areas of computer and information science. In less than twenty years, with the greatest part of the development in the past eight years, data-base systems have come from nothing to be a major topic of current interest. Top management of major corporations have grown to appreciate the importance of their data bases; government regulatory agencies are already worrying about the implementation of privacy and freedom of information acts and their relation to data banks. This proliferation of interests started somewhere; thus, our first question is:

• Why did it all happen?

The pressures of the "Computer Age" arise from a fascinating new technology supplied with inexpensive equipment. The efficient, fast, accurate, and economical way a computer can perform numeric and logical operations (compared to the slow, inaccurate human counterpart) has forced automation of many previously manual operations in universities, government, and the private sector. In the early days of computers, automation merely entailed conversion from manual operations, with little attempt to integrate any resulting system. The typical data-processing operation of the fifties and sixties was created in this manner. Data* was used in the same way as before; the inefficient duplication of data and effort was continued with duplication of computerized data and procedures. The operation seemed successful: it reduced overall cost, and therefore little thought was given to further improvement of the system. In time, astute data processing managers recognized a problem: while stored within the data vaults of the organization (probably in the form of bits on a tape), data was essentially unavailable. The programs that input, stored, and used the data were essentially the owners of the data. Any other user found it difficult to obtain, integrate, or transform the "available" data for use in another program. Thus, every new need for data involved writing a new program to obtain the data before it could be processed by yet another program, and even this was difficult--the data formats were "locked" in the original programs, and sometimes the original object code had been lost! This essential unavailability of otherwise transferable data gave rise to the question:

• Why not integrate the data?

This led to the thought that integration, were it possible, could be achieved by defining the data format, storing it as a "data definition," and allowing general-purpose "data-base management" software to access it. And this gave rise to the prime concept of the generalized data-base management system.

In dealing with a system to store and access data for a set of different programs, and consequently for a set of different types of users, two further questions arose. First:

• Can we access this data through our current computer languages?

This involves either specification of additional commands in conventional programming languages, or the provision of calls to special subroutines which allow access to the data base. And second:

• Why not allow a higher-level language for ad hoc use of the data base?

This can be achieved by providing a special query language as an interface. While inefficient, the first prototype systems appeared very successful to users, and the first commercial systems started to appear. Then the first problems arose.

* In COMPUTING SURVEYS, the word data is used as a collective noun. Although "datum" may be correct for hard-line grammarians, it will not be found here.

DISADVANTAGES OF DATA INTEGRATION

The industry congratulated itself on reducing data redundancy and improving its availability, but it also introduced the potential for disaster. The first problem with integration arises because the data base is now more vulnerable to destruction through machine malfunction, personal error, or deliberate human tampering. The loss of "quality" in a data base (including total destruction) by any of these means may be considered a threat to the organization, because data is one of its most valuable assets. "Integrity" techniques are therefore a necessity.

The other threat to the integrated data base relates to its "security" and accuracy. Most enterprises have some secret processes or private material which should be protected against theft or access by unauthorized people. To achieve this protection, the system must ensure that it is secure, i.e., that any information which is private to the organization is safe from unauthorized dissemination or tampering. The problem with erroneous information is that it might result in an incorrect (adverse) decision about a person or enterprise. Government and other enterprises are vitally interested in both of these aspects, which have been termed "information privacy." Traditionally, privacy has been defined as the right of an individual or organization to be "left alone"--it usually is considered the right to retain certain non-public information without threat of disclosure. An integrated data base threatens privacy: it becomes easier to collect, and to unwittingly divulge, information of a confidential nature (which may have been legally obtained from the individual or enterprise) to some other unauthorized person or agency. Today, information privacy has been expanded to include the right of the individual or organization to know what "personal" information is retained on any data base. It also implies that the individual or enterprise has the right to challenge the data, causing either its correction, or the addition of a statement that the fact is under dispute. As an example: a person may wish to ensure that data given


confidentially to an agency for the purpose of obtaining a credit card is not divulged capriciously to a neighbor; the person may also wish to know what other information is on file at a credit bureau, and be able to correct or dispute any potentially threatening fact.

When data is integrated and readily available through either program or ad hoc query interfaces to a community of users, the possibility of loss of integrity, and the ability to penetrate the security (thereby threatening privacy), obviously increase. Early data-base management systems often needed to be retrofitted to ensure reasonable integrity and security.

The placing of controls within the system brought another new issue into view--the question of who was to make the policy decisions, who would issue passwords, and who was really allowed to make decisions on data formats. A new post was needed.

THE DATA ADMINISTRATION CONCEPT

Data administration is really a special form of managerial control which includes both authority over data integrity and security, and responsibility for overall efficiency. Because the data base for the enterprise was growing in total size and complexity, new measures for improving efficiency were possible. The question arose:

• Should programmers be allowed to define their own data?

The implementation of central authority for data definition made it impossible to allow the programmer this flexibility. By introducing a single data authority, and by providing information about the community of users, the "best" data structure could be defined: this data structure is efficient for the community of users rather than for any one particular user.

As a consequence of the advent of a new technology and new managerial control mechanisms, the computing operation started to take on a new importance. Government and business agencies found themselves with expensive data-processing operations which were being run by relatively low-level management. Some commercial organizations found that equipment requiring presidential (or even board) approval was being run by managers at the fourth or fifth level of pecking order. This had distorted the organization, and it has already led to organizational changes: in fact, we now see "Vice President of Information Systems"--a far cry from the lowly "Data Processing Operations Manager" of yesterday.

DATA BASE VERSUS DATA PROCESSING

We have seen that data which had previously been duplicated and spread throughout the organization is now being drawn into a unified and sometimes monolithic system. This represents the concentration of a valuable asset. In the past, the only major identifiable, tangible asset of a corporation was its money, or objects immediately convertible to money. Accountants had learned how to audit and control the flow of money. The advent of data management suggested the possibility of the audit and control of data.

In exactly the same way that an accountant determines the accuracy of money flow, a data-base management auditor could determine accuracy, quality, and privacy aspects of the data. This gives rise to the concept of data auditing. New and pending state and federal government regulations on privacy and freedom of information would make the enterprise legally accountable for its data. Consequently, the matter of ad hoc use and poor control over the application and dissemination of information is suddenly a real concern. Someday soon a high-level administrator will be sentenced (fined and maybe even given a prison term) for contravening these regulations--and then the entire industry will tighten its controls--almost overnight. Thus three factors have combined to imply a need for more effective and automated control mechanisms to be built into the automated systems, with reasonable safeguards against unauthorized access. The three motivating factors are:

• the regulations of government to insure privacy;
• the aspirations of management for more effective control over its operations; and
• the understanding of auditors of the need to retain quality in all financial and nonfinancial data.

The data-base management system today


allows data to be shared by a community of users, while insuring the integrity of the data over time, and providing security against unauthorized access. At the same time, the administrator may select different methods of storing the data (termed: the physical structure) for different parts of the data base, thereby aiding in providing storage and retrieval efficiency. However, to the ultimate user or application programmer this choice of physical structure should be invisible (termed: data independence), because the user should not have to worry about these internal details. These, then, are the principal objectives of a data-base management system.
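The separation described above, between the physical structure chosen by the administrator and the view seen by the application programmer, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a description of any system in this issue: PersonnelStore, SequentialStore, HashedStore, and application are hypothetical names, and a Python class stands in for a storage method that a data-base management system would choose.

```python
from typing import Dict, List, Optional, Protocol, Tuple


class PersonnelStore(Protocol):
    """Hypothetical logical interface: the only view the application sees."""

    def put(self, emp_id: int, name: str) -> None: ...

    def get(self, emp_id: int) -> Optional[str]: ...


class SequentialStore:
    """Hypothetical physical structure 1: an unordered list searched record by record."""

    def __init__(self) -> None:
        self._rows: List[Tuple[int, str]] = []

    def put(self, emp_id: int, name: str) -> None:
        # Drop any earlier record with this key, then append the new one.
        self._rows = [row for row in self._rows if row[0] != emp_id]
        self._rows.append((emp_id, name))

    def get(self, emp_id: int) -> Optional[str]:
        for key, name in self._rows:
            if key == emp_id:
                return name
        return None


class HashedStore:
    """Hypothetical physical structure 2: a hash table keyed on emp_id."""

    def __init__(self) -> None:
        self._index: Dict[int, str] = {}

    def put(self, emp_id: int, name: str) -> None:
        self._index[emp_id] = name

    def get(self, emp_id: int) -> Optional[str]:
        return self._index.get(emp_id)


def application(store: PersonnelStore) -> Optional[str]:
    """Application program written only against the logical interface."""
    store.put(17, "Smith")
    store.put(42, "Jones")
    return store.get(42)


if __name__ == "__main__":
    # The storage method changes; the application program does not.
    for store in (SequentialStore(), HashedStore()):
        print(type(store).__name__, application(store))
```

Both runs return the same answer; only the storage class differs, which is the sense in which application() is independent of the physical structure.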
Relational Data-Base Management Systems
DONALD D. CHAMBERLIN
IBM Research Laboratory, San Jose, California 95193

The essential concepts of the relational data model are defined, and normalization,
relational languages based on the model, as well as advantages and
implementations of relational systems are discussed.
Keywords and Phrases: Data base, data-base management, data independence,
data model, relational systems
CR Categories: 8.5I, ~.3~,~.~

INTRODUCTION

Before describing the relational model of data, we will briefly discuss some trends in data-base management which give motivation to the development of the relational model. The first large-scale, machine-readable collections of data were stored on external media such as cards or tape. Beginning in the late fifties and early sixties, data banks were being stored on-line using direct-access devices such as disks. Generalized software packages such as BDAM and ISAM [T2] were developed to aid programmers in accessing the data. During the late sixties and early seventies, the idea of an integrated data-base management system was developed. This concept allowed several applications to share a common bank of data, maintained and protected by a central system. In an integrated data-base environment, the data-base management system (DBMS) provides each application program with its own view of the common data, implements various operators for retrieval and update of data, and resolves interference between concurrent users.

The overall trend which is visible in data-base management today is the following: users are becoming increasingly oriented toward the information content of their data, and decreasingly concerned with its representation details. Increasingly, the user interface of a modern DBMS deals with abstract information rather than with the various bits, pointers, arrays, lists, etc., which may be used to represent information. Responsibility for choosing an appropriate representation for the information is being assumed by the system and is not exposed to the end user; indeed, the representation of a given fact may change over time without the user being aware of the change. The general term for this trend away from representation details is data independence.

If we attempt to extrapolate the trend toward data independence, we observe that most current DBMS present the user with a view of records connected in some sort of structure, such as a network or hierarchy. In such a view, information may be represented in at least three ways:

1) by the contents of records (e.g., Smith's employee record has DEPTNO = 50);
2) by the connections between records (e.g., Smith's employee record occurs in the hierarchy below the department record for Dept. 50); and


CONTENTS

INTRODUCTION
DEFINITIONS
NORMALIZATION
LANGUAGES
   Query Facilities
   Relational Calculus
   Relational Algebra
   Mapping-Oriented Languages
   Graphics-Oriented Languages
   Data Manipulation
   Data Definition and Control
   Language Evaluation
   Natural Languages
ADVANTAGES
IMPLEMENTATIONS
SUMMARY
ACKNOWLEDGMENTS
REFERENCES

3) by the ordering of records (e.g., all the sales records are stored in chronological order).

User requests made to the DBMS are then framed in terms which depend on user knowledge of the representation chosen (e.g., "FIND NEXT RECORD OF ABC SET").

The relational data model makes it possible to eliminate this last representation-dependence from the user interface. In the relational model, information is represented in only one way at the user interface: by data values. User requests become free of any dependence on internal representation, and hence may be framed in a high-level, nonprocedural language. At the same time, the system becomes free to choose any physical structure for storage of data, and to optimize the execution of a given request.

A very early proposal for a representation-independent approach to file processing was made by the Language Structure Group of the CODASYL Development Committee in 1962 [Y1]. The CODASYL proposal, called "An Information Algebra," was never implemented. An early worker in a related area was David Childs, of the University of Michigan, who proposed a "set-theoretic data structure," based on a "reconstituted definition of relation" [Y3, Y4]. In Childs' system, which was implemented on an IBM 7090, the user was given sets of data-tuples (not constrained to be of common type) and set-theoretic operators, such as union and intersection, to manipulate the data.

In the late sixties, several artificial intelligence-oriented systems were implemented based on binary relations, which are simply collections of ordered pairs of objects related in a certain way. An example of a binary relation is:

   FATHER-OF: {(Mary, George), (George, Bill), (John, Bill)}

Systems incorporating binary relations for data storage included the Relational Data File of Levien and Maron [Y2], the TRAMP system of Ash and Sibley [Y5], and the LEAP language of Feldman and Rovner [Y6].

The considerable attention paid to n-ary relations as a tool for general data-base management dates from a 1970 paper by E. F. Codd, of IBM [M2]. Codd was the first to give a rigorous definition for n-ary relations in the data-base context, and to emphasize their advantages for data independence and symmetry of access.

Codd's paper introduced concepts which set the direction for research in relational data-base management for several years to come. The paper defined a data sublanguage as a set of facilities, suitable for embedding in a host programming language, which permits the retrieval of various subsets of data from a data bank. The paper noted that a standard logical notation, the first-order predicate calculus, is appropriate as a data sublanguage for n-ary relations. The paper also introduced a set of operators ("join," "projection," etc.) which were later developed into the well-known relational algebra. Finally, the paper explored the properties of "redundancy" and "consistency" of relations, which laid the groundwork for Codd's later theory of normalization.

DEFINITIONS

We will now discuss the basic concepts and definitions which underlie the relational data model. Many of these concepts were first introduced by Codd's original paper [M2].
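The FATHER-OF binary relation from the introduction can be sketched as a Python set of ordered pairs. The sketch assumes each pair is read as (child, father), which the first pair (Mary, George) suggests; fathers_of, children_of, and grandfathers_of are illustrative names only. It shows the symmetry of access mentioned above: the same set of pairs answers questions from either side, with no separate access path.

```python
from typing import Set, Tuple

# FATHER-OF as a set of ordered pairs, read here as (child, father).
FATHER_OF: Set[Tuple[str, str]] = {
    ("Mary", "George"),
    ("George", "Bill"),
    ("John", "Bill"),
}


def fathers_of(child: str) -> Set[str]:
    """Look the relation up by its first component."""
    return {father for (c, father) in FATHER_OF if c == child}


def children_of(father: str) -> Set[str]:
    """Look the same relation up by its second component."""
    return {child for (child, f) in FATHER_OF if f == father}


def grandfathers_of(person: str) -> Set[str]:
    """Compose FATHER-OF with itself."""
    return {g for f in fathers_of(person) for g in fathers_of(f)}


print(fathers_of("Mary"))            # {'George'}
print(sorted(children_of("Bill")))   # ['George', 'John']
print(grandfathers_of("Mary"))       # {'Bill'}
```

Both lookups simply scan the set; how such a relation is actually stored and indexed is left to the implementation.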


ELECTIONS   YEAR   WINNER-NAME   LOSER-NAME
            1952   Eisenhower    Stevenson
            1956   Eisenhower    Stevenson
            1960   Kennedy       Nixon
            1964   Johnson       Goldwater
            1968   Nixon         {Humphrey, Wallace}
            1972   Nixon         McGovern

FIGURE 1. The ELECTIONS relation.
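To make the definitions in the following paragraphs concrete, the ELECTIONS relation of Figure 1 can be written as a Python set of 3-tuples. This is a sketch under stated assumptions: a frozenset stands in for the set-valued 1968 LOSER-NAME component, and identifies_rows is a hypothetical helper that checks only that the named columns are unique in the current contents, not that they are minimal or will stay unique as the data changes.

```python
from typing import FrozenSet, Set, Tuple, Union

Loser = Union[str, FrozenSet[str]]

ATTRIBUTES = ("YEAR", "WINNER-NAME", "LOSER-NAME")

# Figure 1; the 1968 LOSER-NAME component is itself a set, as printed.
ELECTIONS: Set[Tuple[int, str, Loser]] = {
    (1952, "Eisenhower", "Stevenson"),
    (1956, "Eisenhower", "Stevenson"),
    (1960, "Kennedy", "Nixon"),
    (1964, "Johnson", "Goldwater"),
    (1968, "Nixon", frozenset({"Humphrey", "Wallace"})),
    (1972, "Nixon", "McGovern"),
}


def identifies_rows(relation: set, attrs: Tuple[str, ...], cols: Tuple[str, ...]) -> bool:
    """True if no two tuples share the same values in the named columns."""
    positions = [attrs.index(c) for c in cols]
    projected = {tuple(t[p] for p in positions) for t in relation}
    return len(projected) == len(relation)


print(len(ATTRIBUTES), len(ELECTIONS))                          # degree 3, cardinality 6
print(identifies_rows(ELECTIONS, ATTRIBUTES, ("YEAR",)))         # True
print(identifies_rows(ELECTIONS, ATTRIBUTES, ("WINNER-NAME",)))  # False
# Column order matters: these are different tuples.
print((1972, "Nixon", "McGovern") in ELECTIONS)                  # True
print((1972, "McGovern", "Nixon") in ELECTIONS)                  # False
```

Because the relation is a set, duplicate rows cannot occur and row order carries no meaning, while the position of each component within a tuple does.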

An excellent introduction to relational concepts can also be found in Date's recent textbook [Z11].

In mathematics, the term relation may be defined as follows: Given sets $D_1, D_2, \ldots, D_n$ (not necessarily distinct), a relation $R$ is a set of $n$-tuples each of which has its first element from $D_1$, second element from $D_2$, etc. The sets $D_i$ are called domains. The number $n$ is called the degree of $R$, and the number of tuples in $R$ is called its cardinality.

It is customary (though not essential) when discussing relations to represent a relation as a table in which each row represents a tuple. An example of this representation is shown in Figure 1, which illustrates a relation describing Presidential elections. In the tabular representation of a relation, the following properties, which derive from the definition of a relation, should be observed:

1) no two rows are identical;
2) the ordering of rows is not significant; and
3) the ordering of columns is significant (i.e., the meanings of the tuples (1972, Nixon, McGovern) and (1972, McGovern, Nixon) are quite different).

When a relation is represented as a table, its degree is the number of columns and its cardinality is the number of rows.

In the tabular representation of a relation, it is customary to name the table and to name each column, as shown in Figure 1. The columns of the table are called attributes. (Sometimes the name of a column is referred to as a role name.) It is important to distinguish between attributes and domains. For example, in the ELECTIONS relation of Figure 1, the second and third columns are both based on the same domain: the set of names of Presidential candidates. However, each column has a different role name to describe its meaning in this particular relation: WINNER-NAME and LOSER-NAME.

The individual entries in each tuple are called its components. Thus, we may say that in the tuple whose YEAR-component is "1952," the LOSER-NAME-component is "Stevenson."

A column or set of columns whose values uniquely identify a row of a relation is called a candidate key (often shortened to simply key) of the relation. In Figure 1, YEAR is a key for ELECTIONS since no two rows have the same YEAR. It is possible for a relation to have more than one key. For example, if the ELECTIONS relation had an additional column ADMINISTRATION-NUMBER, it would also be a key. When a relation has more than one key, it is customary to designate one as the primary key.

Often a column or set of columns in one relation will correspond to a key of another relation. For example, consider the PRESIDENTS relation of Figure 2, whose key is NAME. The values of WINNER-NAME in the ELECTIONS relation correspond to values of the key-column NAME in PRESIDENTS. Consequently, WINNER-NAME in ELECTIONS is called a foreign key. Two facts should be noted: 1) a foreign key need not be (and often is not) a key of its own relation; and 2) the foreign key need not have the same role-name (e.g., WINNER-NAME) as the corresponding key in the other relation (e.g., NAME).

In an integrated data-base management system, different users may have a need to

PRESIDENTS   NAME         PARTY        HOME-STATE
             Eisenhower   Republican   Texas
             Kennedy      Democrat     Massachusetts
             Johnson      Democrat     Texas
             Nixon        Republican   California

FIGURE 2. The PRESIDENTS relation.

see different subsets of the universe of data. The term data model denotes the universe of data--the complete set of relations stored in the system. A schema is a set of declarations which describe the data model. The term data submodel denotes the set of relations which is available to a particular user, and a subschema is a set of declarations for the data submodel. A complete data-management system must provide a means for defining the schema and a subschema for each distinct class of users of the system.

NORMALIZATION

The issue of designing a schema and subschemas for a data base leads us to a discussion of normalization. The concept of normalization was introduced by Codd in [M2] and dealt with more rigorously in his later papers [N1] and [N2]. A number of other authors have also made contributions to the theory of normalization (see bibliography).

Normalization theory begins with the observation that certain collections of relations have better properties in an updating environment than do other collections of relations containing the same data. The theory then provides a rigorous discipline for the design of relations which have favorable update properties. The theory is based on a series of normal forms--first, second, and third normal form--which provide successive improvements in the update properties of a data base. We will discuss these normal forms on an intuitive basis; for a thorough treatment, see [N1], [N8], or [Z11].

Almost all references to relations implicitly deal with relations in first normal form. A relation in first normal form is a relation in which each component of each tuple is nondecomposable; i.e., the component is not a list or a relation. Relations in first normal form are sometimes called "flat tables." If we look carefully at the relation in Figure 1, we see that it is not in first normal form. This is because an election, while it has only one winner, may have several losing candidates. Thus, for example, the tuple for the election of 1968 contains the component {"Humphrey", "Wallace"}. In fact, the LOSER-NAME component of each election tuple is a list whose length depends on the number of votes a candidate must receive to merit inclusion in the data base.

We can convert the ELECTIONS relation into first normal form by breaking it up into two relations, one containing information on winning candidates and the other on losing candidates. This also gives us a good opportunity to record other attributes of interest about the candidates, such as their party and number of votes received. This leads us to the data base shown in Figure 3, which is in first normal form. The key of ELECTIONS-WON is YEAR; the key of ELECTIONS-LOST is (YEAR, LOSER-NAME).

To illustrate the advantages of the higher normal forms, we need to make updates to the data base by inserting new tuples, deleting existing tuples, and making changes to existing tuples. These updates are not particularly well motivated for our example data base, in which data is mostly static and unchanging. Of course, in an operational data base describing, for example, the inventory of a store, updates would be very frequent. For the sake of consistency, we will continue with our Presidential example. (You may imagine that some data was found to be in error and is being updated to correct the data base.)

Relations in first normal form may be used with any of the relational languages which are described in the next section.
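The decomposition into first normal form described above can be sketched as follows. Assumptions: the data are held in Python tuples and frozensets, every LOSER-NAME component is written as a set so that the nesting is explicit, and to_first_normal_form is an illustrative name. Only YEAR and the candidate names are carried over; the vote, party, and home-state attributes that Figure 3 adds are omitted.

```python
from typing import FrozenSet, List, Set, Tuple

# Figure 1, with every LOSER-NAME component written as a set of names.
NESTED_ELECTIONS: List[Tuple[int, str, FrozenSet[str]]] = [
    (1952, "Eisenhower", frozenset({"Stevenson"})),
    (1956, "Eisenhower", frozenset({"Stevenson"})),
    (1960, "Kennedy", frozenset({"Nixon"})),
    (1964, "Johnson", frozenset({"Goldwater"})),
    (1968, "Nixon", frozenset({"Humphrey", "Wallace"})),
    (1972, "Nixon", frozenset({"McGovern"})),
]


def to_first_normal_form(
    nested: List[Tuple[int, str, FrozenSet[str]]],
) -> Tuple[Set[Tuple[int, str]], Set[Tuple[int, str]]]:
    """Split into ELECTIONS-WON (YEAR, WINNER-NAME) and
    ELECTIONS-LOST (YEAR, LOSER-NAME), one tuple per losing candidate."""
    won = {(year, winner) for year, winner, _ in nested}
    lost = {(year, loser) for year, _, losers in nested for loser in losers}
    return won, lost


won, lost = to_first_normal_form(NESTED_ELECTIONS)
print(len(won), len(lost))                            # 6 7
print(sorted(t for t in lost if t[0] == 1968))        # [(1968, 'Humphrey'), (1968, 'Wallace')]
print(len({year for year, _ in won}) == len(won))     # True: YEAR identifies ELECTIONS-WON rows
print(len({year for year, _ in lost}) == len(lost))   # False: YEAR alone does not for ELECTIONS-LOST
```

Every component of the result is a single year or a single name, and, as the text states, ELECTIONS-LOST needs the pair (YEAR, LOSER-NAME) to identify its rows.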


However, a relation in first normal form may exhibit three kinds of misbehavior, which are called update anomalies, insertion anomalies, and deletion anomalies. All these anomalies arise because more than one "concept" may be mixed together in the same tuple. Consider the ELECTIONS-WON relation of Figure 3. Mixed together in one tuple of this relation are facts about candidates (e.g., "Eisenhower came from Texas") and facts about elections (e.g., "In 1952 Eisenhower received 442 electoral votes"). In some applications it may be important that each of these facts be independently updated, inserted, and deleted. This gives rise to the three anomalies, which we can now illustrate by the following examples.

Update anomalies: Suppose the fact that "Eisenhower's home state is Texas" is found to be in error, and his home state must be changed to Nebraska. Since Eisenhower appears in more than one tuple of ELECTIONS-WON, this erroneous fact may be represented many times (in general, a time-varying number of times). This makes it difficult to update this particular fact, since all tuples where it is represented must be searched out and updated. Even worse, it leads to the possibility that different tuples may contain inconsistent values of HOME-STATE for the same President.

Insertion anomalies: Suppose we wish to insert a fact about a candidate which is independent of any election, e.g., "Dewey was a Republican." This is difficult in our example data base because there is no relation for candidates. We are forced to invent a tuple in ELECTIONS-LOST (or ELECTIONS-WON?) having null values for YEAR and the other irrelevant attributes. In many systems we would be unable to store this fact because null values are not permitted in the primary key.

Deletion anomalies: Suppose we wish to delete the information about elections as they fall beyond a certain number of years in the past. When we delete the 1952-tuple from ELECTIONS-WON, we still retain the fact that Eisenhower was a Republican. But when we delete the 1956-tuple, all facts about Eisenhower are lost. In some applications, this might have very serious consequences. For example, consider a relation describing orders for various items, shown in Figure 4. As orders are filled we delete their tuples from the relation. When we have deleted the last order for toasters,

YEAR WINNER- WINNER- PARTY HOME-


ELECTIONS-WON NAME VOTES STATE

1952. Eisenhower 442 Republican Texas


1956 Eisenhower 447 Republican Texas
1960 Kennedy 303 Democrat Mass.
1964 Johnson 486 Democrat Texas
1968 Nixon 301 Republican Calif.
1972 Nixon 520 Republican Calif.

LOSER- LOSER-
ELECTIONS-LOST YEAR PARTY
NAME VOTES

1952 Stevenson 89 Democrat


1956 Stevenson 73 Democrat
1960 Nixon 219 Republican
1964 Goldwater 52 Republican
1968 Humphrey 191 Democrat
1968 Wallace 46 Am. Indep.
1972 MeGovern 17 Democrat

FIGURE 3. Elections data base in first normal form.

o
48 • Donald D. Chamberlin

QUANTITY-
ORDERS ITEM PRICE DATE
ORDERED

Toaster 20.00 1/lo/75


Toaster 20.00 2/15/75
Mixer 28.00 4/6/75

FIGURE4. The ORDERS relation.

we find we n o longer have any information variety of ways. The original definition was
about the price of toasters--possibly an given by Boyce and Codd in IN1]. Later
unintended result. This kind of relation writers, including Kent [N8], Codd [M14],
burdens the user with the responsibility of and Sharman [N15], proposed alternate
making sure that the tuple he deletes is not definitions which framed the same concept
the last tuple of some "category" (e.g., in simpler terminology. We present two of
toasters), and therefore the sole bearer of these equivalent definitions:
information about that category (e.g.,
price). Definition, Boyce and Codd [M14]:
An important objective of normalization A relation R is in third normal form if it is in
first normal form and, for every attribute
is the elimination of the update, insertion, collection C of R, if any attribute not in C is
and deletion anomalies. The most widely- functionally dependent on C, then all attri-
known result of normalization theory is butes in R are functionally dependent on C.
third normal form. Since second normal form
is of little significance except as a stopping- Definition, Sharman [N15]:
A relation is in third normal form if every
off place on the way to third, we will proceed determinant is a key.
directly to the definition of third normal
form. Both definitions are formal ways of ex-
In order to understand how third normal pressing a very simple idea:-that each re-
form avoids the three anomalies, we must lation should describe a single "concept,"
discuss the concept of functional dependence and if more than one "concept" is found in:
among the attributes of a relation. We say a relation, the relation should be split into
that an attribute B of relation R is func- smaller relations. The result of applying
tionally dependent on attribute A if, at every this "splitting" process to the sample data
instant of time, each A-value in R is as- base of Figure 3 is shown in Figure 5. A
sociated with only one B-value. We ex- moment's examination will show that the
press this relationship by the notation A --~ update, insertion, and deletion anomalies
B, and say "A determines B" or "B de- we discussed are not present in the data
pends on A." Similarly, a set of attributes in base of Figure 5.
R may be functionally dependent on an- The design of a data base in third normal
other attribute or set of attributes. The form depends on knowledge of the func-
attribute (or set of attributes) on the left tional dependencies among the attributes
side of the arrow (A in our example) is of the data. This knowledge cannot be
called the determinant. discovered automatically by a system (un-
Clearly, from our definition of key in the less the data base is completely static), but
previous section, every relation contains at must be furnished by a data-base designer
least one functional dependence: all attri- who understands the semantics of the in-
butes of the relation are dependent on the formation. In fact, there is not a mlique
key. (The dependence may be trivial if the third normal form representation for a
relation contains only a key.) If a relation given data base. In IN1] Codd briefly ad-
has more than one key, then all its attributes dressed the problem of choosing an "Optimal
are dependent on each key. Third Normal Form" from among the
Third normal form has been defined in a various alternatives.
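The following Python sketch is not part of the original survey. It makes functional dependence and Sharman's definition concrete on the Figure 3 data. It assumes three things:

- Relations are lists of dicts.
- The designer supplies the dependencies. YEAR → WINNER-NAME, WINNER-VOTES and WINNER-NAME → PARTY, HOME-STATE are our assumed reading of the example.
- The helper names fd_holds, closure and every_determinant_is_key are hypothetical.

As the text says, dependencies cannot be discovered automatically. Accordingly, fd_holds can only confirm that the current contents do not contradict a dependency. The key test uses the usual "determines every attribute" form of "every determinant is a key."

```python
# Sketch: checking functional dependencies and "every determinant is a key".
# Helper names are hypothetical; relations are lists of dicts.

# ELECTIONS-WON as in Figure 3 (first normal form only).
FIG3_ELECTIONS_WON = [
    {"YEAR": 1952, "WINNER-NAME": "Eisenhower", "WINNER-VOTES": 442, "PARTY": "Republican", "HOME-STATE": "Texas"},
    {"YEAR": 1956, "WINNER-NAME": "Eisenhower", "WINNER-VOTES": 447, "PARTY": "Republican", "HOME-STATE": "Texas"},
    {"YEAR": 1960, "WINNER-NAME": "Kennedy", "WINNER-VOTES": 303, "PARTY": "Democrat", "HOME-STATE": "Mass."},
    {"YEAR": 1964, "WINNER-NAME": "Johnson", "WINNER-VOTES": 486, "PARTY": "Democrat", "HOME-STATE": "Texas"},
    {"YEAR": 1968, "WINNER-NAME": "Nixon", "WINNER-VOTES": 301, "PARTY": "Republican", "HOME-STATE": "Calif."},
    {"YEAR": 1972, "WINNER-NAME": "Nixon", "WINNER-VOTES": 520, "PARTY": "Republican", "HOME-STATE": "Calif."},
]


def fd_holds(rows, lhs, rhs):
    """True if, in this instance, each lhs-value is associated with only one rhs-value.

    One instance can refute a dependency but never prove it.
    """
    seen = {}
    for row in rows:
        a = tuple(row[attr] for attr in lhs)
        b = tuple(row[attr] for attr in rhs)
        if seen.setdefault(a, b) != b:
            return False
    return True


def closure(attrs, fds):
    """All attributes functionally determined by attrs under the declared FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def every_determinant_is_key(attributes, fds):
    """Each determinant of a nontrivial declared FD must determine every attribute."""
    return all(
        closure(lhs, fds) >= set(attributes)
        for lhs, rhs in fds
        if not set(rhs) <= set(lhs)
    )


# Dependencies a designer would declare (assumed from the example).
FIG3_ATTRS = ["YEAR", "WINNER-NAME", "WINNER-VOTES", "PARTY", "HOME-STATE"]
FIG3_FDS = [
    (["YEAR"], ["WINNER-NAME", "WINNER-VOTES"]),
    (["WINNER-NAME"], ["PARTY", "HOME-STATE"]),
]
# After splitting (Figure 5): ELECTIONS-WON and PRESIDENTS.
FIG5_WON_FDS = [(["YEAR"], ["WINNER-NAME", "WINNER-VOTES"])]
FIG5_PRES_FDS = [(["NAME"], ["PARTY", "HOME-STATE"])]

if __name__ == "__main__":
    # The data are consistent with WINNER-NAME -> HOME-STATE ...
    print(fd_holds(FIG3_ELECTIONS_WON, ["WINNER-NAME"], ["HOME-STATE"]))  # True
    # ... but WINNER-NAME is a determinant that is not a key.
    print(every_determinant_is_key(FIG3_ATTRS, FIG3_FDS))  # False
    print(every_determinant_is_key(["YEAR", "WINNER-NAME", "WINNER-VOTES"], FIG5_WON_FDS))  # True
    print(every_determinant_is_key(["NAME", "PARTY", "HOME-STATE"], FIG5_PRES_FDS))  # True
```

In Figure 3 the determinant WINNER-NAME is not a key. That is exactly why Eisenhower's home state is stored twice. Splitting PRESIDENTS off, as in Figure 5, leaves every determinant a key.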

Computing Surveys, Vol. 8, No. 1, March 1976


Relational Data-Base Management Systems • 49

ELECTIONS-WON
YEAR  WINNER-NAME  WINNER-VOTES
1952  Eisenhower   442
1956  Eisenhower   447
1960  Kennedy      303
1964  Johnson      486
1968  Nixon        301
1972  Nixon        520

PRESIDENTS
NAME        PARTY       HOME-STATE
Eisenhower  Republican  Texas
Kennedy     Democrat    Mass.
Johnson     Democrat    Texas
Nixon       Republican  Calif.

ELECTIONS-LOST
YEAR  LOSER-NAME  LOSER-VOTES
1952  Stevenson   89
1956  Stevenson   73
1960  Nixon       219
1964  Goldwater   52
1968  Humphrey    191
1968  Wallace     46
1972  McGovern    17

LOSERS
NAME       PARTY
Stevenson  Democrat
Nixon      Republican
Goldwater  Republican
Humphrey   Democrat
Wallace    Am. Indep.
McGovern   Democrat

FIGURE 5. Data base in third normal form.

LANGUAGES

Such a great variety of relational languages is available that it would be impossible to treat them all here. We will describe a representative example of several important categories of relational languages; references to other languages can be found in the bibliography.

The term data sublanguage, introduced earlier, denotes a set of data-base operators intended to be embedded in a host programming language. The term query language usually refers to a stand-alone language in which an end user interacts directly with the data-base management system. Most query languages provide a variety of facilities (e.g., update, creation, and deletion of relations) in addition to a query capability. As compared with a typical data sublanguage, a query language is usually at a higher level, less procedural, and intended for a more casual user. Sometimes, however, the same basic set of operators can serve both as a data sublanguage and as a query language.

This section will explore the approach taken by various relational languages to providing facilities for query, data manipulation (e.g., insertion, deletion, and update of tuples), data definition (e.g., creation of new relations and other structures), and data control (e.g., authorization and control of data integrity). We will then briefly consider some ways in which languages can be evaluated and compared, and discuss the role of natural language as a data-base interface.

Query Facilities

Query, or retrieval of information from the data base, is perhaps the aspect of relational languages which has received the most attention. We will illustrate the variety of approaches to query by presenting examples of four classes of languages: relational calculus, relational algebra, mapping-oriented languages, and graphics-oriented languages. Although we deal only with query facilities in this section, all the languages discussed have facilities for update and other operations in addition to query.

Relational Calculus

Codd's 1970 paper [M2] laid the groundwork for two families of relational languages which came to be called the relational calculus and the relational algebra. The relational calculus family grew from the observation that a first-order applied predicate calculus can be used as a data sublanguage for normalized relations. In [L1] Codd presented the details of such a calculus-based sublanguage, called ALPHA.

A typical query in ALPHA has two parts:

- a target, which specifies the particular attributes of the particular relation which are to be returned; and
- a qualification, which selects particular tuples from the target relation by giving a condition which they must satisfy.

We will illustrate ALPHA (and other languages) by some sample queries based on the data base of Figure 5.

In Q1) below, the RANGE statement declares P to be a variable ranging over the rows of the PRESIDENTS relation. The next statement retrieves into workspace W the HOME-STATE of row P whenever the NAME of row P is "KENNEDY."

Q1) What was the home state of President Kennedy?

    RANGE PRESIDENTS P
    GET W P.HOME-STATE: (P.NAME = 'KENNEDY').

The qualification part of an ALPHA query may be quite complex and may use the universal and existential quantifiers: "for all" (∀), and "there exists" (∃). For example, see display Q2) below.

Q2) List the election years in which a Republican from Illinois was elected.

    RANGE PRESIDENTS P
    RANGE ELECTIONS-WON E
    GET W E.YEAR: ∃P (P.NAME = E.WINNER-NAME &
        P.PARTY = 'REPUBLICAN' & P.HOME-STATE = 'ILLINOIS').

Various other languages based, like ALPHA, on the relational calculus, have been proposed. This class of languages includes QUEL [S15], COLARD [L3], and RIL [L7].

Relational Algebra

A second major class of languages is based on the relational algebra, which was introduced by Codd in [M2] and refined in [M3]. The relational algebra is a collection of operators that deal with whole relations, yielding new relations as a result. The major operators of relational algebra include the following:

• Projection: The projection operator returns only the specified columns of the given relation, and eliminates duplicates from the result. For example, to find all the unique (party, home-state) pairs in the PRESIDENTS relation,


  we might write the following projection:

      PRESIDENTS [PARTY, HOME-STATE].

  (Note that some algebra-based languages use column numbers rather than column names. In such languages we would write PRESIDENTS [2, 3] in place of the given expression. This notation, although less mnemonic, has the advantage of avoiding ambiguity if some intermediate result has two columns with the same name.)

• Restriction: The restriction operator selects only those tuples of a relation which satisfy a given condition. As originally proposed, the condition only allowed comparison of one component of a tuple with another component. Some implementations of the algebra permit other condition-types as well, e.g., comparison of a tuple-component with a constant. For example, to select those tuples from the ELECTIONS-WON relation where YEAR is greater than 1945, we might write:

      ELECTIONS-WON [YEAR > 1945].

• Join: The join operator takes two relations as arguments, which we will refer to as relations A and B. A new relation is formed by concatenating a tuple of A with a tuple of B wherever a given condition holds between them. For example:

      ELECTIONS-WON [WINNER-NAME = NAME] PRESIDENTS.

  This expression concatenates a tuple of ELECTIONS-WON with a tuple of PRESIDENTS whenever WINNER-NAME in ELECTIONS-WON matches NAME in the PRESIDENTS tuple. If a given PRESIDENTS tuple matches more than one ELECTIONS-WON tuple, it is concatenated with each of them, forming multiple output rows. If a given tuple matches no tuple in the other relation, it does not participate in the output at all.

• Set-theoretic Operators: In relational algebra, the set-theoretic operators--union, intersection, and set-difference--take two relations as operands, treating each as a set of tuples, and produce a single relation as a result. The operand relations must have compatible sets of attributes.

• Division: Some algebraic languages include an operator, called "division," which operates on two input relations to produce a third relation. This operator is sometimes useful in expressing queries which contain the word "all." However, since it can be expressed in terms of the other algebraic operators, the division operator does not extend the logical power of the language. The reader is referred to [M3] for a complete treatment of division.

• Nesting: The algebra has the convenient property that its operators can be nested to form expressions of arbitrary complexity, with parentheses used as needed to remove ambiguities. To illustrate nesting of operators, we will repeat examples Q1) and Q2) in the relational algebra:

Q1) What was the home state of President Kennedy?

    PRESIDENTS [NAME = 'KENNEDY'] [HOME-STATE]

Q2) List the election years in which a Republican from Illinois was elected.

    (ELECTIONS-WON [WINNER-NAME = NAME] PRESIDENTS)
        [PARTY = 'REPUBLICAN'] [HOME-STATE = 'ILLINOIS'] [YEAR]

Languages based on the relational algebra have been implemented at MIT [S1] and [S12], the IBM Scientific Centre in England [S4] and [S16], and General Motors Research Laboratory [S5]. In addition, studies of optimization algorithms for the relational algebra have been published by Smith and Chang [T15], Pecherer [T16], Gotlieb [T17], and others.
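As an illustration only, the operators above can be sketched in Python over the Figure 5 data. The sketch is not taken from the survey or from any of the cited implementations. It rests on four assumptions:

- A relation is treated as a list of dicts.
- The function names are our own.
- Restriction and join conditions are written as Python predicates.
- Names are matched in the mixed case stored in Figure 5 ('Kennedy' rather than 'KENNEDY').

Nesting then becomes ordinary function composition, as q1 and q2 show.

```python
# Sketch of relational algebra operators over the Figure 5 data.
# Relations are lists of dicts; conditions are Python predicates (an assumption).

PRESIDENTS = [
    {"NAME": "Eisenhower", "PARTY": "Republican", "HOME-STATE": "Texas"},
    {"NAME": "Kennedy", "PARTY": "Democrat", "HOME-STATE": "Mass."},
    {"NAME": "Johnson", "PARTY": "Democrat", "HOME-STATE": "Texas"},
    {"NAME": "Nixon", "PARTY": "Republican", "HOME-STATE": "Calif."},
]
ELECTIONS_WON = [
    {"YEAR": 1952, "WINNER-NAME": "Eisenhower", "WINNER-VOTES": 442},
    {"YEAR": 1956, "WINNER-NAME": "Eisenhower", "WINNER-VOTES": 447},
    {"YEAR": 1960, "WINNER-NAME": "Kennedy", "WINNER-VOTES": 303},
    {"YEAR": 1964, "WINNER-NAME": "Johnson", "WINNER-VOTES": 486},
    {"YEAR": 1968, "WINNER-NAME": "Nixon", "WINNER-VOTES": 301},
    {"YEAR": 1972, "WINNER-NAME": "Nixon", "WINNER-VOTES": 520},
]


def project(rel, attrs):
    """R [attrs]: keep only the named columns and eliminate duplicate tuples."""
    seen, out = set(), []
    for t in rel:
        values = tuple(t[a] for a in attrs)
        if values not in seen:
            seen.add(values)
            out.append(dict(zip(attrs, values)))
    return out


def restrict(rel, cond):
    """R [cond]: keep only the tuples that satisfy cond."""
    return [t for t in rel if cond(t)]


def join(a, b, cond):
    """Concatenate each tuple of a with every tuple of b for which cond holds.

    Attribute names are assumed distinct; on a clash b's value would win.
    """
    return [{**ta, **tb} for ta in a for tb in b if cond(ta, tb)]


def union(a, b, attrs):
    """Set union of two union-compatible relations over attrs."""
    return project(a + b, attrs)


def difference(a, b, attrs):
    """Tuples of a (projected on attrs) that do not appear in b."""
    drop = {tuple(t[x] for x in attrs) for t in b}
    return [t for t in project(a, attrs) if tuple(t[x] for x in attrs) not in drop]


# Q1) PRESIDENTS [NAME = 'KENNEDY'] [HOME-STATE]
q1 = project(restrict(PRESIDENTS, lambda t: t["NAME"] == "Kennedy"), ["HOME-STATE"])

# Q2) (ELECTIONS-WON [WINNER-NAME = NAME] PRESIDENTS)
#       [PARTY = 'REPUBLICAN'] [HOME-STATE = 'ILLINOIS'] [YEAR]
q2 = project(
    restrict(
        restrict(
            join(ELECTIONS_WON, PRESIDENTS, lambda e, p: e["WINNER-NAME"] == p["NAME"]),
            lambda t: t["PARTY"] == "Republican",
        ),
        lambda t: t["HOME-STATE"] == "Illinois",
    ),
    ["YEAR"],
)

if __name__ == "__main__":
    print(q1)  # [{'HOME-STATE': 'Mass.'}]
    print(q2)  # []
```

q2 is empty because no president in Figure 5 is a Republican from Illinois. With 'Texas' in place of 'Illinois', the same expression yields the years 1952 and 1956.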


Mapping-Oriented Languages

A third class of relational languages, called "mapping-oriented" languages, has been proposed by R. F. Boyce and others [L9]. These languages, directed at the nonprogramming professional, offer power equivalent to that of the relational calculus or algebra while avoiding mathematical concepts such as quantifiers. Mapping-oriented languages include:

- SQUARE, a terse, APL-like notation [L25];
- SEQUEL, a structured language based on English keywords [L10, L8]; and
- SLICK, a language intended for implementation on associative hardware [L12].

We will illustrate this class of languages by presenting examples of SEQUEL.

The basic building block of mapping-oriented languages is the "mapping," which maps a known attribute or set of attributes into a desired attribute or set of attributes by means of some relation. Q1) is an example of a simple mapping:

Q1) What was the home state of President Kennedy?

    SELECT HOME-STATE
    FROM PRESIDENTS
    WHERE NAME = 'KENNEDY'.

In general, the result of a mapping may be used in the specification of another mapping, as shown in Q2) below. This process of "nesting" mappings inside each other makes it possible to express queries of great complexity.

Q2) List the election years in which a Republican from Illinois was elected.

    SELECT YEAR
    FROM ELECTIONS-WON
    WHERE WINNER-NAME =
        SELECT NAME
        FROM PRESIDENTS
        WHERE PARTY = 'REPUBLICAN'
        AND HOME-STATE = 'ILLINOIS'

Graphics-Oriented Languages

Recently, another important class of relational languages has been proposed: the class of "graphics-oriented" languages. In these languages, the user states his query not by a conventional linear syntax, but by making choices or filling in blanks on a graphic display. Examples of this class of languages are Query By Example [L21, L24] and CUPID [L17]. We will illustrate this type of language by presenting examples using Query By Example.

In Query By Example, the user is presented with a blank relation on his display. He fills in one or more rows of the relation with an example of the desired result. Known values are filled in directly. Unknown values are represented by arbitrarily chosen example values, which are underscored to show that they are examples. The attributes to be printed are identified by a "P." A query may be confined to a single relation, or span more than one relation, as illustrated by Q1) and Q2) below.

Q1) What was the home state of President Kennedy?

    PRESIDENTS | NAME    | PARTY | HOME-STATE
               | KENNEDY |       | P._NEVADA

Q2) List the election years in which a Republican from Illinois was elected.

    ELECTIONS-WON | YEAR    | WINNER-NAME | WINNER-VOTES
                  | P._1948 | _WILSON     |

    PRESIDENTS | NAME    | PARTY      | HOME-STATE
               | _WILSON | REPUBLICAN | ILLINOIS

Data Manipulation

Most relational languages provide facilities for data manipulation, which includes insertion, deletion, and update of tuples. Since update is not well motivated for the Presidential data base, we introduce the following relation to illustrate data manipulation:

    EMP (EMPNO, NAME, JOB, SALARY)

This relation describes a set of employees, giving, in each instance, his or her employee number, name, job, and salary.

Many languages with set-oriented query features also allow set-oriented data manipulation. For example, the following statement in SEQUEL [L10] has the effect of giving a 10% raise to all programmers:

    UPDATE EMP
    SET SALARY = SALARY * 1.1
    WHERE JOB = 'PROGRAMMER'

All the languages we have discussed so far have been high level and nonprocedural in nature. Indeed, one of the advantages of the relational model is that it is readily compatible with high-level languages. But it should not be concluded that the relational model is incompatible with a lower-level, more procedural programming interface. In fact, several low-level, host-language relational interfaces have been proposed, including GAMMA-0 [L4], XRM [S6], and MINIZ [S8]. These interfaces are well suited for writing programs that are to be called repeatedly and which update the data base according to parameters furnished with the call.

We will illustrate how one low-level relational language, GAMMA-0, might be used to write a transaction which finds the employee-tuple having a given employee number and updates its salary component according to some computation. GAMMA-0 consists of a set of operators which may be called from a host language such as PL/I. GAMMA-0 is based on the concept of a "scan," which is like a cursor that moves through a relation testing tuples for some condition. The transaction proceeds as follows:

1) CREATE-SCAN creates a scan on the EMP relation to search for tuples according to their EMPNO attribute. The system returns an identifier, called a SCANID, by which we may refer to the newly created scan in future calls.
2) SET-SCAN is called with the value which is to be searched for (in this case the EMPNO, which is the parameter of our transaction).
3) NEXT-SUBTUPLE returns an actual tuple satisfying the criterion we established by the previous calls. (NEXT-SUBTUPLE could be called repeatedly if we expected many tuples to satisfy the criterion.)
4) Having obtained the desired employee-tuple, we can compute a new salary-value in our host program and then call UPDATE SUBTUPLE, which puts the new salary-value into the data base.

GAMMA-0 allows a program to have as many active scans as it wishes, and to control the position of each by explicit calls. When a program has no further use for a scan, it may drop it by calling the operator DROP-SCAN.

Although it is a low-level, procedural language, GAMMA-0 is considered a relational language because the means of access to tuples is not predetermined. A relation may be accessed associatively through any of its attributes--the attribute to be matched is declared when a scan is opened.
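The scan-based call sequence just described can be mimicked with a toy in-memory store. The Python sketch below is for illustration only, and several things in it are assumptions:

- ToyScanStore and adjust_salary are hypothetical names.
- The method names echo the operators named in the text (CREATE-SCAN, SET-SCAN, NEXT-SUBTUPLE, UPDATE SUBTUPLE, DROP-SCAN). Their signatures are our own.
- Tuples are dicts.
- next_subtuple searches sequentially.

None of this describes how GAMMA-0 is actually specified or implemented. What carries over is the shape of the transaction: open a scan on EMPNO, set its search value, fetch a tuple, compute in the host program, write back, and drop the scan.

```python
import itertools

# Toy imitation of a scan-based, host-language relational interface.
# Method names mirror the operators in the text; signatures are assumptions,
# not GAMMA-0's actual calling conventions.


class ToyScanStore:
    def __init__(self, relations):
        self.relations = relations          # relation name -> list of dict tuples
        self.scans = {}                     # SCANID -> scan state
        self._next_id = itertools.count(1)

    def create_scan(self, relation, attr):
        """Open a scan on `relation` that matches tuples on attribute `attr`."""
        scanid = next(self._next_id)
        self.scans[scanid] = {"rel": relation, "attr": attr,
                              "value": None, "pos": 0, "current": None}
        return scanid

    def set_scan(self, scanid, value):
        """Supply the value to search for and reposition the scan at the start."""
        self.scans[scanid].update(value=value, pos=0, current=None)

    def next_subtuple(self, scanid):
        """Return a copy of the next matching tuple, or None when there are no more."""
        scan = self.scans[scanid]
        rows = self.relations[scan["rel"]]
        while scan["pos"] < len(rows):
            row = rows[scan["pos"]]
            scan["pos"] += 1
            if row[scan["attr"]] == scan["value"]:
                scan["current"] = row
                return dict(row)            # the host program gets a copy
        scan["current"] = None
        return None

    def update_subtuple(self, scanid, changes):
        """Write new attribute values into the stored tuple the scan is positioned on."""
        current = self.scans[scanid]["current"]
        if current is None:
            raise LookupError("scan is not positioned on a tuple")
        current.update(changes)

    def drop_scan(self, scanid):
        del self.scans[scanid]


def adjust_salary(store, empno, compute):
    """Find the EMP tuple with this EMPNO and replace SALARY with compute(old salary)."""
    scanid = store.create_scan("EMP", "EMPNO")
    try:
        store.set_scan(scanid, empno)
        emp = store.next_subtuple(scanid)
        if emp is None:
            return False
        store.update_subtuple(scanid, {"SALARY": compute(emp["SALARY"])})
        return True
    finally:
        store.drop_scan(scanid)


if __name__ == "__main__":
    # Hypothetical sample data.
    store = ToyScanStore({"EMP": [
        {"EMPNO": 101, "NAME": "SMITH", "JOB": "PROGRAMMER", "SALARY": 12000},
        {"EMPNO": 102, "NAME": "JONES", "JOB": "CLERK", "SALARY": 8000},
    ]})
    adjust_salary(store, 102, lambda s: s + 500)
    print(store.relations["EMP"][1]["SALARY"])  # 8500
```

By contrast, the set-oriented SEQUEL UPDATE shown earlier states the whole change in one request. It leaves the choice of search strategy to the system.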


Data Definition and Control

In addition to query and data manipulation facilities, a complete data sublanguage needs facilities for data definition and data control. Data definition has two main aspects:

• specification of the characteristics of data to be stored, e.g., the column-names and data-types for each relation; and
• definition of alternative "views" which are derived from the stored data. In relational terminology, a view is a dynamic "window" on the data base. Updates made to stored relations are visible through the various views which are defined on these relations.

Data control also has two main aspects:

• control over authorization of various users to perform various operations on the data base; and
• ability to make integrity assertions that protect the validity of data and define the set of permitted transitions in the data base.

The relational model permits a language to take a consistent, unified approach to query, data manipulation, data definition, and data control. Several relational languages have gone to great lengths to provide such a unified approach; these languages include SEQUEL [L10, L8, C6, I5], QUEL [S15, C3, I4], and Query By Example [L21, L24].

An important observation to be made in data definition is that the definition of a view is simply a process of deriving a relation from the set of stored relations, and that this is similar to the process of stating a query. Therefore, the full power of a query language may be applied to the definition of views. This is possible because all the relational query languages we have discussed have the property of closure, i.e., they operate on relations to construct or define new relations. A view may be a selected subset of a stored relation, or it may span over more than one stored relation, as in the case of a join. Once the definition of a view has been made, queries may be directed to the view as though it were a stored relation. The supportability of updates to the data base made by means of derived views is a complicated question, one which requires more research [M14].

The issue of authorization is closely related to the issue of derived views. In fact, one approach to authorization is to grant to each user a particular restricted view [C6]. Another approach is to automatically add certain predicates to the queries and updates issued by a user in order to restrict their scope to the set of authorized tuples [C3].

This unified approach to language design can be extended into the area of assertions concerning data integrity. An assertion is a statement about the data base which the system automatically enforces by refusing any update which fails to satisfy the assertion. In language terms, an assertion is simply a predicate, which is syntactically a fragment of a query, and which may contain other queries nested inside it. For example, suppose we wish to assert that for any given election the number of votes received by the winner is greater than the number of votes received by any loser. This assertion may be made as follows in SEQUEL (the variable X represents a tuple of the ELECTIONS-WON relation):

    ASSERT ON ELECTIONS-WON X:
        WINNER-VOTES >
            (SELECT MAX (LOSER-VOTES)
             FROM ELECTIONS-LOST
             WHERE YEAR = X.YEAR)

Language Evaluation

The great variety of proposed relational languages leads us to the question: How can languages be evaluated and compared? There are at least three criteria involved in any objective attempt to evaluate a language: completeness, level, and learnability. Space constraints permit us to touch only briefly on each of these.

Codd [M3] was the first to establish a careful definition of completeness for data-base sublanguages. He defined a language to be relationally complete if it permits expression of any query expressible in the


relational calculus. He then proceeded to prove that the relational algebra was relationally complete and hence could serve as the standard of comparison for completeness of algebra-oriented languages. Since the appearance of this early work, proofs of relational completeness have been published for the SQUARE and Query By Example languages [L25], [L13].

The first attempt at a quantitative definition of language "level" was made by Halstead in his investigation of "software physics" [L23]. According to Halstead's definition, "level" is a property of a particular expression of an algorithm. The "simplest conceivable" expression of a given algorithm is assigned a level of 1, and more complicated expressions of the same algorithm are given level-values ranging from 0 to 1, computed on the basis of parameters such as the number of operators and operands used in expressing the algorithm. In [L14], Halstead applies the formulas of software physics to a comparison of Codd's ALPHA language [L1] and DBTG-COBOL [Z1].

The last method of language evaluation we will discuss is that of psychological tests in which the language is taught to a group of subjects under controlled conditions and their learning progress is measured. The emphasis of the experiment may be placed on measuring speed of learning or degree of comprehension, or on identifying particular language features which seem to cause learning difficulties. Studies of this type have been published on the languages SQUARE and SEQUEL [L20], and Query By Example [L22].

Natural Languages

Recently there has been considerable interest in the use of a natural language such as English as a query language. The relational data model is well adapted to such an attempt because it contains no implementation-oriented concepts. Most natural-language-oriented query systems, such as REL [E4] and CONVERSE [E5], attempt to translate, without feedback to the user, from natural language into a computer-oriented language. E. F. Codd and J. M. Cadiou are presently developing a system called RENDEZVOUS [E0] which engages in an English dialog with the user to help him develop an unambiguous formulation of his query.

ADVANTAGES

We can now review and summarize the advantages of the relational model for data-base management. Relations have four primary advantages:

1) Simplicity: This term should need no further explanation. The relational user is presented with a single, consistent data structure. He formulates his requests strictly in terms of information content, without reference to system-oriented complexities.

2) Data independence: C. J. Date [Z11] has defined data independence as "immunity of applications to change in storage structure and access strategy." As we have seen, the relational model makes it possible to eliminate the details of storage structure and access strategy from the user interface.

3) Symmetry: Data-base systems which are based on connections between records make some questions easier to ask than others--namely, questions whose structure matches that of the data base. For example, in a hierarchic data base, the easiest question to ask is a question that begins at the root of the tree and moves toward the leaves, applying successive qualifications at each step. Questions not reflecting this preferred structure can be asked awkwardly if at all.

   Since all information is represented by data values in relations, there is no preferred format for a question at the user interface. It should be noted here that symmetry of the data model does not necessarily imply symmetry of the underlying physical data structures maintained by the system. A data-base designer may choose to optimize the performance of some frequently posed query-type (for example, by providing an index for a certain attribute). The
   important thing to note is that such optimizations do not appear in the user interface.

4) Strong theoretical foundation: The relational data model rests on the well-developed mathematical theory of relations and on the first-order predicate calculus. This theoretical background makes possible the definition of relational completeness and the rigorous study of good data-base design (normalization).

The relational model also has a variety of secondary advantages which derive from the fundamental advantages just outlined. Perhaps the most important of these is the ease with which high-level, nonprocedural relational languages may be defined. Because they are easy to learn and use, high-level languages make data bases available to a new class of casual users who lack the training required by conventional programming languages. High-level languages also give the system maximum flexibility to optimize the execution of a given request, and to adapt the stored data structures to the changing needs of the user population. The nonprocedural approach to language design permits a unified treatment of data definition, manipulation, and control, as discussed in the section on "Languages" (pages 49-55). Finally, high-level languages make it easy to define and manipulate views of data which are not directly supported by physical structures. (Of course, many of these advantages may also be obtained by the use of high-level languages not based on the relational data model.)

One additional advantage of relations will be mentioned here. The relational model makes it possible to draw a clear distinction between data semantics and data structure. For example, the semantics of a data base may be such that when a department record is deleted, all employee records for that department should also be deleted. In the relational model such semantic rules can be stated independently of data-base structure. In some other data models (e.g., if employee records occur hierarchically under department records) this type of semantic rule is closely related to (and often constrained by) the data structure.

IMPLEMENTATIONS

The greatest open research question of the relational data model is whether it can be implemented to form an efficient and operationally complete data-base management system. Many individuals and groups have made contributions to this area of research; unfortunately, space limitations only permit mention of some of the major landmarks and large, ongoing projects in the implementation of relations. For references to many systems not discussed here, see the Implementations sections of the bibliography; special attention should be given to [S19], which is a transcript of a panel discussion among implementors of relational systems.

Since the earliest n-ary relational languages proposed were Codd's relational algebra and relational calculus, it is natural that much of the earliest implementation work was directed toward these languages. In [M3], Codd observed that the relational calculus has many advantages over the relational algebra from the end-user's point of view, but that the relational algebra provides a sequence of operations which can be more directly implemented on a machine. In [M3], Codd also provides an algorithm, called a "reduction algorithm," for translating a relational calculus expression into a sequence of operations in relational algebra. This approach was extended by Palermo [T6], who made certain improvements in the efficiency of the reduction algorithm and implemented the operators of the relational algebra using APL/360.

A number of early projects in relational data-base management adopted the approach of implementing the relational algebra directly. Perhaps the earliest of these was the MACAIMS system, developed by Goldstein and Strnad at MIT [S1]. The MACAIMS system, implemented on MULTICS, introduced the important concept of encoding each data item by a fixed-length identifier, and using these identifiers rather than the actual data items in stored relations. MACAIMS also made a contribution to the field of data independence by enabling different relations to be stored in different forms and converted to a canonical form, when necessary, for comparison. More


recent developments in the use of rela- queries in relational calculus which span
tional algebra at MIT are presented in more than one relation.
[S12]. In addition to the work on high-level
Another early algebra-oriented system is languages, such as the algebra and calculus,
the Relational Data Management System efforts have been made to develop a lower-
(RDMS) of General Motors [S5]. RDMS level, procedural, relational interface for
is a display-oriented query system which host-language systems, or to serve as an
implements not only the operators of the intermediate interface in implementing some
relational algebra, but also a number of other relational language. The first such
other set-oriented operators such as SORT, interface to be implemented was the Rela-
GRAPH, and HISTOGRAM. tional Memory (RM) developed by IBM in
An ongoing project in the implementation Cambridge, Massachusetts [S2, S3]. RM per-
of relational algebra is located at the IBM mits variable-length byte strings (entities)
Scientific Centre in Peterlee, England. The to be stored and referenced by numeric
Peterlee system was first called IS/1 and identifiers. Binary relations whose data-
later renamed the Peterlee Relational Test elements are integer s or entity-identifiers
Vehicle (PRTV) [$4, S16]. The system has may then be constructed. RM provides
been used in an environmental research efficient associative access to the binary re-
study with a data base of ten million char- lations: a hashing technique is used to lo-
acters [All, as well as by the Greater London cate a given "left-side" value, and all its
Council in an urban planning application associated "right-side" values are then ac-
having a data base of 50 million characters cessed by means of a linked list. R M also
[S19]. In addition to the usual algebraic provides a recovery capability for restoring
operators (join, projection, etc.), PRTV the data base to an earlier state in the event
provides an easy means to extend the system of a failure.
b y adding new relational operators. A user In 1973 the R M system was extended to
may construct temporary relations by ap- support n-ary relations; the resulting system
plying various operators either to stored was named X R M (Extended 'Relational
relations or to existing temporary relations. Memory) [$6]. X R M uses the "entities" of
The definition of a temporary relation is RM to store n-ary tuples; it also uses R M
kept irL ~he form of a tree of operators, and binary relations as "inversions" which pro-
the actual tuples are not materialized until vide efficient associative access to these
they are needed for output. An optimizer n-tuples. X R M maintains a "master rela-
may rearrange the operators in the defini- tion" whic~describes the various relations
tion of a temporary relation, e.g., choosing and inversions in the system. A user:may
to do restriction as early as possible and re- access a tuple associatively by its key-value
ordering as late as possible. PRTV allows (or the data-value in some inverted column),
different visible subsets of the data base, or may scan over a relation, retrieving all
but does not permit simultaneous use of the tuples which satisfy a given condition.
system by more than one user. X R M was used as the underlying access
The problem of optimizing the execution method in a prototype system developed at
of relational systems has recently attracted IBM Research in San Jose, which imple-
a great deal of interest. Smith and Chang, ' ments the SEQUEL data sublanguage IS10,
based at the University of Utah [T18], have Sll]. The SEQUEL system, which became
applied techniques of automatic program-" operational in 1974, provides set-oriented
ming t o transform relational algebra ex- facilities for: query, insertion, deletion, and
pressions into equivalent but more efficient update; dynamic creation and dropping of
expressions. Gotlieb, working at the Uni- relations; and automatic enforcement of
versity of Toronto [T17], has published a assertions about data integrity. These fea-
study of various algorithms for imple- tures are made available either as a stand-
menting the join operator. Rothnie, of M I T alone, display-oriented interface for casual
and the Defense Department Computer In- users, or as a host-language interface that
stitute [T8, T20], has developed an al- can be called from P L / I programs. The
gorithm for limiting the search• space for system contains an optimizer (described in



58 • Donald D. Chamberlin

[S11]) which uses XRM inversions to limit the search space for a given query. The SEQUEL prototype has been extended by IBM at Cambridge and by the MIT Sloan School of Management to accommodate a multiple-user environment. The resulting system, called GMIS, is being used at MIT as an information system for modeling New England energy resources [A12, S19].

Another prototype system based on XRM is being developed at IBM Research in Yorktown Heights to implement Query By Example. The system contains an optimizer which interprets Query By Example queries in terms of operations similar to those of the relational algebra (join, restriction, etc.). At present, the system supports only a single user and does not provide update facilities.

A large-scale prototype data-base management system, called System R, is presently under construction at IBM Research in San Jose [S20]. System R is the first attempt to apply the relational data model to an environment of many concurrent users and a high volume of requests. It will provide an operationally complete data-management capability, with facilities for authorization, logging and recovery, definition of alternative views, and enforcement of data consistency and integrity. System R will support the SEQUEL language as an external interface, as well as a set of procedural operators for host-language programming. Requests to the system will be executed by an optimizer which chooses among various physical access methods, including inversions maintained in the form of B-trees [T1], physical pointer-chains, and a sort-merge facility. A user is not constrained to protect himself against the updates of other concurrent users by explicit locking statements; the system automatically generates locks as needed at the level of individual tuples. Deadlocks are automatically detected and resolved. Some of the locking techniques developed as part of the System R project have been described in [C1, C4, C8]. System R is being implemented on an IBM 370, using a VM/370 operating system modified for the data-base environment [T13].

Another large-scale attempt at constructing a relational prototype is the INGRES (Interactive Graphics and Retrieval System), of the University of California at Berkeley [S7, S9, S15]. INGRES, which runs on a PDP-11/40 under the UNIX operating system, implements QUEL, a relationally complete query language based on the relational calculus. The INGRES system implements a variety of features by automatic modification of the QUEL statement submitted by the user. Alternative views are supported by substituting the view-definition into the user's statement [I4]. Authorization and integrity control are provided by adding extra predicates to the user's statement which limit its scope [C3]. Concurrent update requests are kept from interfering with each other by analyzing their respective scopes and allowing an update to proceed only when it is "safe" [I2]. Finally, the QUEL statement, which may contain many variables, is broken up by a "decomposition" algorithm into a series of one-variable statements which are executed one at a time. The physical data structures used by INGRES include hashed tables (including "order-preserving" hash functions which permit sequential scanning in key-value order) and "generalized directories," which employ a tree-structure to map a key into an address interval, and then use an order-preserving function to compute an address within the interval [S9].

Implementation of another relational system, called ZETA, is presently under way at the University of Toronto [S8, S14]. The ZETA system is constructed in three levels. The lowest level is a language called MINIZ, which provides such basic operations as scanning a relation and accumulating a list of identifiers of tuples which satisfy a given condition. The middle level implements views ("derived relations") and has an optimizer/interpreter which accepts queries spanning multiple relations. Three types of end-user interfaces are supported by ZETA:

• a host-language facility which provides features similar to SEQUEL;
• a query language generator system whereby a user may create his own self-contained query language using




a syntax-driven compiler/compiler; and
• a natural-language recognition system based on semantic networks. This system, called TORUS, is presently being tested on a data base of student records.

A second relational system, called OMEGA [T23], is also being implemented at Toronto on a PDP-11/45. Like ZETA, OMEGA has a multilevel architecture. One of the internal levels of OMEGA is the Link and Selector Language (LSL) [T12], an expression-oriented language which provides subsetting operations on a relation ("selectors") and connections from one relation to another ("links").

A recent, and very promising, development is the emergence of several designs for associative hardware to support relational data bases. One such proposal, called CASSM (Context Addressed Segment Sequential Memory), was made by Su et al. at the University of Florida [H1, H2, H5]. CASSM is an array of processors, each having access to a circular memory space (e.g., a disk track or circulating magnetic bubble register). As data circulates in the memory, the processors search in parallel for data which satisfies a given condition. In [L12], Copeland and Su discuss implementation of a high-level, mapping-oriented relational language called S~CK, superimposed on CASSM.

A similar design, called RAP, which is also based on a cellular array of processors with circulating memories, was recently reported by Ozkarahan et al. of the University of Toronto [H3].

A third associative hardware system for relational applications, called RARES (Rotating Associative Relational Store), has been proposed by Lin and Smith at the University of Utah [H4]. Like CASSM and RAP, RARES contains multiple rotating memory tracks with a read/write head per track. However, unlike CASSM and RAP, which store tuples in a linear fashion along a track, RARES lays each tuple across many tracks so that the entire tuple is read in one character-read-time. Each tuple read from memory is held in a pipeline while a decision is made as to whether the tuple satisfies the output criterion.

SUMMARY

This paper has discussed the terminology of the relational data model and traced its development in terms of normalization theory, language design, and implementation techniques. We have discussed the advantages of n-ary relations for data-base management, including simplicity, data independence, symmetry, and a strong theoretical foundation.

In order to be considered a true relational system, a data-base system must possess at least the following attributes:

1) All information is represented by data values. No essential information is contained in invisible connections among records.
2) At the user interface, no particular access path is "preferred" over any other.
3) The user interface is independent of the means by which data is physically stored.

In [M14], E. F. Codd summarized the areas in which further research is most needed in relational data-base management. The following areas were included:

1) Development of concurrency control techniques specifically geared to the relational model.
2) Measuring the performance attainable when the relational approach is applied to a large-scale data base.
3) Development of the theory whereby multiple alternative views of shared data may be supported for retrieval and update.
4) Demonstration of the viability of natural language query formulation subsystems.

ACKNOWLEDGMENTS

The author is indebted to E. F. Codd of IBM for his helpful comments during the preparation of this paper. The bibliography which follows is based on a bibliography compiled by E. F. Codd.


The author is also grateful to his colleagues at the IBM Research Laboratory in San Jose for their support and discussions.

CLASSIFICATION OF REFERENCES

Models and Theory
M 1) General
N 2) Normalization, Decomposition, and Synthesis
Z 3) Relationships between CODASYL DDL/DBTG and the Relational Model
L Languages and Human Factors
Implementations
S 1) Software
H 2) Hardware
T Implementation Technology
C Authorization, Views, and Concurrency
I Integrity Control
A Applications
D Deductive Inference and Approximate Reasoning
E Natural Language Support
Y Sets and Relations (prior to 1969)

Certain references include asterisks with the following meaning:
* Proceedings of ACM-SIGFIDET and ACM-SIGMOD Workshops are obtainable from ACM Headquarters, 1133 Avenue of the Americas, New York, N.Y. 10036.
** Proceedings of the 1975 ACM Pacific Conference, San Francisco, April 17-18, 1975 are obtainable from: Mail Room, Boole & Babbage, 850 Stewart Drive, Sunnyvale, California 94086.

Models and Theory
1) General
[M1] CODD, E. F. "Derivability, redundancy and consistency of relations stored in large data banks," IBM Research Report RJ599, August 1969.
[M2] CODD, E. F. "A relational model of data for large shared data banks," Comm. ACM 13, 6 (June 1970), pp. 377-387.
[M3] CODD, E. F. "Relational completeness of data-base sublanguages," Courant Computer Science Symposia 6, "Data Base Systems," New York, May 1971, Prentice-Hall, Englewood Cliffs, N.J., 1971, pp. 65-98.
[M4] STRNAD, A. L. "The relational approach to the management of data bases," Proc. IFIP Congress, August 1971, Vol. 2, North-Holland Publ. Co., Amsterdam, The Netherlands, 1971, pp. 901-904.
[M5] DURCHHOLZ, R. "Das Datenmodell bei Codd," Technical Report No. 69, Gesellschaft für Mathematik und Datenverarbeitung, Bonn, W. Germany, July 1972.
[M6] HAWRYSZKIEWYCZ, I. T.; AND DENNIS, J. B. "An approach to proving the correctness of data-base operations," Proc. ACM-SIGFIDET Workshop on Data Description, Access, and Control, Nov.-Dec. 1972,* ACM, New York, 1972, pp. 323-348.
[M7] DATE, C. J. "Relational data base systems: a tutorial," Proc. Fourth Internatl. Symposium on Computer and Information Sciences, Dec. 1972, Plenum Press, New York, 1972.
[M8] CODD, E. F. "Understanding relations," continuing series of articles published in FDT, the quarterly bulletin of ACM-SIGMOD, beginning with Vol. 5, 1 (June 1973),* ACM, New York, 1973.
[M9] HAWRYSZKIEWYCZ, I. T. "Semantics of data base systems," MIT Project MAC Report MAC TR-112, Cambridge, Mass., Dec. 1973.
[M10] BRACCHI, G.; FEDELI, A.; AND PAOLINI, P. "A multi-level relational model for data-base management systems," Data Base Management, Proc. IFIP TC-2 Working Conf. on Data-Base Management Systems, April 1974, North-Holland Publ. Co., Amsterdam, The Netherlands, 1974.
[M11] STONEBRAKER, M. "A functional view of data independence," Proc. ACM-SIGFIDET Workshop on Data Description, Access, and Control, May 1974,* ACM, New York, 1974, pp. 63-81.
[M12] MELTZER, H. S. "Relations and relational operations," IBM Report to GUIDE 38, Information Systems Division, Dallas, Texas, May 1974.
[M13] HITCHCOCK, P. "Fundamental operations on relations in a relational data base," IBM Scientific Centre Report UKSC 0051, Peterlee, England, May 1974.
[M14] CODD, E. F. "Recent investigations in relational data base systems," Information Processing 74, Proc. IFIP Congress, August 1974, Vol. 5, North-Holland Publ. Co., Amsterdam, The Netherlands, 1974, pp. 1017-1021.
[M15] WEDEKIND, H. "Datenbanksysteme I," Reihe Informatik/16 (1974), Bibliographisches Institut, Mannheim, W. Germany.
[M16] HALL, P. A. V.; TODD, S. J. P.; AND HITCHCOCK, P. "An algebra of relations for machine computation," IBM Scientific Centre Report UKSC 0066, Peterlee, England, Jan. 1975.
[M17] SCHMID, H. A.; AND SWENSON, J. R. "On the semantics of the relational data model," Proc. ACM-SIGMOD Conf., May 1975,* ACM, New York, 1975, pp. 211-223.

Models and Theory
2) Normalization, Decomposition, and Synthesis
[N1] CODD, E. F. "Further normalization of the data base relational model," Courant Computer Science Symposia 6, "Data Base Systems," New York, May 1971, Prentice-Hall, New York, 1971, pp. 33-64.
[N2] CODD, E. F. "Normalized data base structure: a brief tutorial," Proc. 1971 ACM-SIGFIDET Workshop on Data Description, Access, and Control, Nov. 1971,* ACM, New York, 1971, pp. 1-17.
[N3] HEATH, I. J. "Unacceptable file operations in a relational data base," Proc. 1971 ACM-SIGFIDET Workshop on Data Description, Access, and Control, Nov. 1971, ACM, New York, 1971, pp. 19-33.




[N4] DELOBEL, C. "Aspects théoriques sur la structure de l'information dans une base de données," Revue Française d'Informatique et de Recherche Opérationnelle, B-3 (Sept. 1971).
[N5] DELOBEL, C. "A theory about data in an information system," IBM Research Report RJ964, San Jose, Calif., Jan. 1972.
[N6] RISSANEN, J.; AND DELOBEL, C. "Decomposition of files, a basis for data storage and retrieval," IBM Research Report RJ1220, San Jose, Calif., May 1973.
[N7] DELOBEL, C.; AND CASEY, R. G. "Decomposition of a data base and the theory of Boolean switching functions," IBM J. R. & D. 17, 5 (Sept. 1973), pp. 374-387.
[N8] KENT, W. "A primer of normal forms," IBM Technical Report TR 02.600, San Jose, Calif., Dec. 1973.
[N9] ARMSTRONG, W. W. "Dependency structures of data base relationships," Information Processing 74, Proc. IFIP Congress, August 1974, Vol. 3, North-Holland Publ. Co., Amsterdam, The Netherlands, 1974, pp. 580-584.
[N10] DELOBEL, C.; AND LEONARD, M. "The decomposition process in a relational model," Technical Report, Laboratoire d'Informatique, Univ. of Grenoble, France, Sept. 1974.
[N11] WANG, C. P.; AND WEDEKIND, H. "Segment synthesis in logical data base design," IBM J. R. & D. 19, 1 (Jan. 1975), pp. 71-77.
[N12] BERNSTEIN, P. A.; SWENSON, J. R.; AND TSICHRITZIS, D. "A unified approach to functional dependencies and relations," Proc. ACM-SIGMOD Conf., May 1975,* ACM, New York, 1975, pp. 237-245.
[N13] FADOUS, R. Y.; AND FORSYTH, J. "Finding candidate keys for relational data bases," Proc. ACM-SIGMOD Conf., May 1975,* ACM, New York, 1975, pp. 203-210.
[N14] FADOUS, R. Y. "Mathematical foundations for relational data bases," PhD Thesis, Michigan State Univ., Lansing, 1975.
[N15] SHARMAN, G. C. H. "A new model of relational data base and high level languages," Technical Report TR. 12.136, IBM Hursley Park Laboratory, England, Feb. 1975.

Models and Theory
3) Relationships between CODASYL DDL/DBTG and Relational Model
[Z1] CODASYL Data Base Task Group Report, April 1971, ACM, New York.
[Z2] CANNING, R. G. "Problem areas in data management," EDP Analyzer 12, 3 (March 1974).
[Z3] EARNEST, C. P. "A comparison of the network and relational data structure models," Technical Report, Computer Sciences Corp., El Segundo, Calif., April 1974.
[Z4] NIJSSEN, G. M. "Data structuring in DDL and relational data model," Proc. IFIP TC-2 Working Conf. on Data Base Management Systems, April 1974, North-Holland Publ. Co., Amsterdam, The Netherlands, 1974.
[Z5] CODD, E. F.; AND DATE, C. J. "Interactive support for non-programmers: the relational and network approaches," Proc. 1974 ACM-SIGMOD Debate "Data Models: Data Structure Set versus Relational," May 1974,* ACM, New York, 1974.
[Z6] DATE, C. J.; AND CODD, E. F. "The relational and network approaches: comparison of the application programming interfaces," Proc. 1974 ACM-SIGMOD Debate "Data Models: Data Structure Set versus Relational," May 1974,* ACM, New York, 1974.
[Z7] BACHMAN, C. W. "The data structure set model," Proc. 1974 ACM-SIGMOD Debate "Data Models: Data Structure Set versus Relational," May 1974,* ACM, New York, 1974.
[Z8] SIBLEY, E. H. "On the equivalences of data based systems," Proc. ACM-SIGMOD Debate "Data Models: Data Structure Set versus Relational," May 1974,* ACM, New York, 1974.
[Z9] EVEREST, G. C. "The futures of data-base management," Proc. ACM-SIGMOD Workshop on Data Description, Access, and Control, May 1974, ACM, New York, 1974, pp. 445-462.
[Z10] OLLE, T. W. "Current and future trends in data base management systems," Information Processing 74, Proc. IFIP Congress, August 1974, Vol. 5, North-Holland Publ. Co., Amsterdam, The Netherlands, 1974, pp. 998-1006.
[Z11] DATE, C. J. "An introduction to data base systems," Addison-Wesley, Reading, Mass., 1975.
[Z12] KAY, M. H. "An assessment of the CODASYL DDL for use with a relational schema," Data Base Description, B. C. M. Douque and G. M. Nijssen (Eds.), North-Holland Publ. Co., Amsterdam, The Netherlands, 1975, pp. 199-214.
[Z13] ROBINSON, K. A. "An analysis of the uses of the CODASYL set concept," Data Base Description, B. C. M. Douque and G. M. Nijssen (Eds.), North-Holland Publ. Co., Amsterdam, The Netherlands, 1975, pp. 169-182.
[Z14] TAYLOR, R. W. "Observations on the attributes of database sets," Data Base Description, B. C. M. Douque and G. M. Nijssen (Eds.), North-Holland Publ. Co., Amsterdam, The Netherlands, 1975, pp. 73-84.
[Z15] OLLE, T. W. "An analysis of shortcomings in the schema DDL with an outline of proposed improvements," Data Base Description, B. C. M. Douque and G. M. Nijssen (Eds.), North-Holland Publ. Co., Amsterdam, The Netherlands, 1975, pp. 283-298.
[Z16] HUITS, M. "Requirements for languages in data-base systems," Data Base Description, B. C. M. Douque and G. M. Nijssen (Eds.), North-Holland Publ. Co., Amsterdam, The Netherlands, 1975, pp. 85-110.




[Z17] ROBINSON, K. A. "Data base--the ideas behind the ideas," Computer J. 18, 1 (Jan. 1975), pp. 7-12.
[Z18] HELD, G.; AND STONEBRAKER, M. "Networks, hierarchies, and relations in data base management systems," Proc. ACM Pacific 75 Regional Conf., April 1975, ACM, New York, 1975, pp. 1-9.
[Z19] MARTIN, J. T. "Computer data-base organization," Prentice-Hall, Englewood Cliffs, N.J., 1975.

Languages and Human Factors
[L1] CODD, E. F. "A data base sublanguage founded on the relational calculus," Proc. 1971 ACM-SIGFIDET Workshop on Data Description, Access, and Control, Nov. 1971,* ACM, New York, 1971, pp. 35-68.
[L2] CODD, E. F. "Relational algebra," Courant Computer Science Symposia 6, "Data Base Systems," New York, May 1971, Prentice-Hall, New York, 1971.
[L3] BRACCHI, G.; FEDELI, A.; AND PAOLINI, P. "A language for a relational data base," Sixth Annual Princeton Conf. on Information Sciences and Systems, March 1972, Princeton Univ., N.J., 1972.
[L4] BJØRNER, D.; CODD, E. F.; DECKERT, K. L.; AND TRAIGER, I. L. "The GAMMA ZERO n-ary relational data base interface: specifications of objects and operations," IBM Research Report RJ1200, San Jose, Calif., April 1973.
[L5] EARLEY, J. "Relational level data structures for programming languages," Acta Informatica 2, 4 (1973), pp. 293-309.
[L6] DEE, E.; HILDER, W.; KING, P. J. H.; AND TAYLOR, E. "COBOL extensions to handle a relational data base," British Computer Society, Working Party #5, Oct. 1973.
[L7] FEHDER, P. L. "The representation-independent language," IBM Research Reports RJ1121 & RJ1251, San Jose, Calif., Nov. 1972 & July 1973 respectively.
[L8] BOYCE, R. F.; AND CHAMBERLIN, D. D. "Using a structured English query language as a data definition facility," IBM Research Report RJ1318, San Jose, Calif., Dec. 1973.
[L9] BOYCE, R. F.; CHAMBERLIN, D. D.; KING, W. F., III; AND HAMMER, M. M. "Specifying queries as relational expressions: SQUARE," Data Base Management, Proc. IFIP Working Conf., April 1974, North-Holland Publ. Co., Amsterdam, The Netherlands, 1974, pp. 169-177.
[L10] CHAMBERLIN, D. D.; AND BOYCE, R. F. "SEQUEL: A structured English query language," Proc. ACM-SIGMOD Workshop on Data Description, Access, and Control, May 1974,* ACM, New York, 1974, pp. 249-264.
[L11] JERVIS, B. "Query languages for relational data-base management systems," Masters Thesis, Univ. of British Columbia, Vancouver, B.C., May 1974.
[L12] COPELAND, G. P.; AND SU, S. Y. W. "A high level data sublanguage for a context-addressed segment-sequential memory," Proc. ACM-SIGMOD Workshop on Data Description, Access, and Control, May 1974,* ACM, New York, 1974, pp. 265-276.
[L13] ZLOOF, M. M. "Query by example," Research Report RC4917, IBM T. J. Watson Research Center, Yorktown Heights, N.Y., July 1974.
[L14] HALSTEAD, M. H. "Software physics comparison of a sample program in DSL ALPHA and COBOL," IBM Research Report RJ1460, San Jose, Calif., Oct. 1974.
[L15] PIROTTE, A.; AND WODON, P. "A comprehensive formal query language for a relational data base: FQL," Technical Report R283, M.B.L.E. Laboratoire de Recherches, Brussels, Belgium, Dec. 1974.
[L16] SUMMERS, R. C.; COLEMAN, C. D.; AND FERNANDEZ, E. B. "A programming language extension for access to a shared data base," Proc. ACM Pacific 75 Regional Conf., April 1975,** ACM, New York, 1975, pp. 114-118.
[L17] McDONALD, N.; AND STONEBRAKER, M. "CUPID: the friendly query language," Proc. ACM Pacific 75 Regional Conf., April 1975,** ACM, New York, 1975, pp. 127-131.
[L18] •ESTGAARD, R. E. "A COBOL data base facility for the relational data model," Proc. ACM Pacific 75 Regional Conf., April 1975,** ACM, New York, 1975, pp. 132-139.
[L19] SHU, N. C.; HOUSEL, B. C.; AND LUM, V. Y. "CONVERT: a high level translation definition language for data conversion," Proc. ACM-SIGMOD Conf., May 1975,* ACM, New York, 1975, p. 3. (Comm. ACM 18, 10 (Oct. 1975), pp. 557-567.)
[L20] REISNER, P.; BOYCE, R. F.; AND CHAMBERLIN, D. D. "Human factors evaluation of two data base query languages: SQUARE and SEQUEL," Proc. AFIPS National Computer Conf., May 1975, Vol. 44, AFIPS Press, Montvale, N.J., 1975, pp. 447-452.
[L21] ZLOOF, M. M. "Query by Example," Proc. AFIPS National Computer Conf., May 1975, Vol. 44, AFIPS Press, Montvale, N.J., 1975, pp. 431-438.
[L22] THOMAS, J. C.; AND GOULD, J. D. "A psychological study of Query by Example," Proc. AFIPS National Computer Conf., May 1975, Vol. 44, AFIPS Press, Montvale, N.J., pp. 439-445.
[L23] HALSTEAD, M. H. "Software physics: basic principles," Research Report RJ1582, IBM Research Laboratory, San Jose, Calif., May 1975.
[L24] ZLOOF, M. M. "Query by Example: the invocation and definition of tables and forms," Proc. Internatl. Conf. on Very Large Data Bases, Sept. 1975, ACM, New York, 1975, pp. 1-24.
[L25] BOYCE, R. F.; CHAMBERLIN, D. D.; KING, W. F.; AND HAMMER, M. M. "Specifying queries as relational expressions: the SQUARE data sublanguage," Comm. ACM 18, 11 (Nov. 1975), pp. 621-628.




Implementations
1) Software
[S1] GOLDSTEIN, R. C.; AND STRNAD, A. L. "The MACAIMS data management system," Proc. 1970 ACM-SIGFIDET Workshop on Data Description and Access, Nov. 1970,* ACM, New York, 1970, pp. 201-229.
[S2] SYMONDS, A. J.; AND LORIE, R. A. "A schema for describing a relational data base," Proc. ACM-SIGFIDET Workshop on Data Description and Access, Nov. 1970,* ACM, New York, 1970, pp. 230-245.
[S3] LORIE, R. A.; AND SYMONDS, A. J. "A relational access method for interactive applications," Courant Computer Science Symposia 6, Data Base Systems, Prentice-Hall, New York, 1971, pp. 99-124.
[S4] NOTLEY, M. G. "The Peterlee IS/1 system," IBM UK Scientific Centre Report UKSC-0018, March 1972.
[S5] WHITNEY, V. K. M. "RDMS: a relational data management system," Proc. Fourth Internatl. Symposium on Computer and Information Sciences (COINS IV), Dec. 1972, Plenum Press, New York, 1972.
[S6] LORIE, R. A. "XRM--an extended (n-ary) relational memory," IBM Scientific Center Report G320-2096, Cambridge, Mass., Jan. 1974.
[S7] McDONALD, N.; STONEBRAKER, M.; AND WONG, E. "Preliminary design of INGRES: Part I," Electronics Research Lab. Report ERL-M435, Univ. of California, Berkeley, April 1974.
[S8] CZARNIK, B.; SCHUSTER, S.; AND TSICHRITZIS, D. "ZETA: a relational data base management system," Proc. ACM Pacific 75 Regional Conf., April 1975,** ACM, New York, 1975, pp. 21-25.
[S9] HELD, G.; AND STONEBRAKER, M. "Storage structures and access methods in the relational data base management system INGRES," Proc. ACM Pacific 75 Regional Conf., April 1975,** ACM, New York, 1975, pp. 26-33.
[S10] ASTRAHAN, M. M.; AND LORIE, R. A. "SEQUEL-XRM: a relational system," Proc. ACM Pacific 75 Regional Conf., April 1975,** ACM, New York, 1975, pp. 34-38.
[S11] ASTRAHAN, M. M.; AND CHAMBERLIN, D. D. "Implementation of a structured English query language," Comm. ACM 18, 10 (Oct. 1975), pp. 580-588.
[S12] STEWERT, J.; AND GOLDMAN, J. "The relational data management system: a perspective," Proc. ACM-SIGMOD Workshop on Data Description, Access, and Control, May 1974,* ACM, New York, 1974, pp. 295-320.
[S13] McLEOD, D. J.; AND MELDMAN, M. J. "RISS: a generalized minicomputer relational data base management system," Proc. AFIPS National Computer Conf., May 1975, Vol. 44, AFIPS Press, Montvale, N.J., 1975, pp. 397-402.
[S14] MYLOPOULOS, J.; SCHUSTER, S. A.; AND TSICHRITZIS, D. "A multi-level relational system," Proc. AFIPS National Computer Conf., May 1975, Vol. 44, AFIPS Press, Montvale, N.J., 1975, pp. 403-408.
[S15] HELD, G. D.; STONEBRAKER, M. R.; AND WONG, E. "INGRES: a relational data base system," Proc. AFIPS National Computer Conf., May 1975, Vol. 44, AFIPS Press, Montvale, N.J., 1975, pp. 409-416.
[S16] TODD, S. J. P. "Peterlee relational test vehicle PRTV, a technical overview," IBM Scientific Centre Report UKSC 0075, Peterlee, England, July 1975.
[S17] WINSLOW, L. E. "An efficient implementation of Codd's relational model data base," Proc. COMPCON 75, 11th Annual IEEE Computer Society Conf., Sept. 1975, IEEE, New York, 1975.
[S18] MANACHER, G. K. "On the feasibility of implementing a large relational data base with optimal performance on a mini-computer," Proc. Internatl. Conf. on Very Large Data Bases, Sept. 1975, ACM, New York, 1975, pp. 175-201.
[S19] CODD, E. F. (Ed.), "Implementation of relational data base management systems," FDT, Quarterly Bulletin of ACM-SIGMOD 7, 3-4 (1975).
[S20] ASTRAHAN, M. M., et al. "System R: a relational approach to data-base management," Research Report RJ1738, IBM Research Laboratory, San Jose, Calif., Feb. 1976.

Implementations
2) Hardware
[H1] SU, S. Y. W.; COPELAND, G. P.; AND LIPOVSKI, G. J. "Retrieval operations and data representations in a context-addressed disk system," Proc. ACM-SIGPLAN-SIGIR Interface Meeting on Programming Languages and Information Retrieval, Nov. 1973, ACM, New York, 1973.
[H2] COPELAND, G. P.; LIPOVSKI, G. J.; AND SU, S. Y. W. "The architecture of CASSM: a cellular system for non-numeric processing," Proc. First Annual Symposium on Computer Architecture, Dec. 1973, IEEE, New York, 1973.
[H3] OZKARAHAN, E. A.; SCHUSTER, S. A.; AND SMITH, K. C. "RAP: an associative processor for data base management," Proc. AFIPS National Computer Conf., May 1975, Vol. 44, AFIPS Press, Montvale, N.J., 1975, pp. 379-387.
[H4] LIN, C. S.; AND SMITH, D. C. P. "The design of a rotating associative array memory for a relational data-base management application," Proc. Internatl. Conf. on Very Large Data Bases, Sept. 1975, ACM, New York, 1975, pp. 453-454.
[H5] SU, S. Y. W.; AND LIPOVSKI, G. J. "CASSM: a cellular system for very large data bases," Proc. Internatl. Conf. on Very Large Data Bases, Sept. 1975, ACM, New York, 1975, pp. 456-472.

Implementation Technology
[T1] BAYER, R.; AND McCREIGHT, E. "Organization and maintenance of large ordered indices," Proc. ACM-SIGFIDET Workshop, Nov. 1970, ACM, New York, 1970, pp. 107-141.
[T2] "IBM System/360 operating system data management services," IBM Publication No. GC26-3746, 1971.
[T3] DATE, C. J.; AND HOPEWELL, P. "File definition and logical data independence," Proc. 1971 ACM-SIGFIDET Workshop on Data Description, Access, and Control, Nov. 1971,* ACM, New York, 1971, pp. 117-138.
[T4] DATE, C. J.; AND HOPEWELL, P. "Storage structure and physical data independence," Proc. 1971 ACM-SIGFIDET Workshop on Data Description, Access, and Control, Nov. 1971,* ACM, New York, 1971, pp. 139-168.
[T5] ROTHNIE, J. B. "The design of generalized data management systems," PhD Thesis, MIT, Cambridge, Mass., Sept. 1972.
[T6] PALERMO, F. P. "A data base search problem," Fourth Internatl. Symposium on Computer and Information Science (COINS IV), Dec. 1972, Plenum Press, New York, 1972.
[T7] HALL, P. A. V.; AND TODD, S. J. P. "Factorisations of algebraic expressions," IBM Scientific Centre Report UKSC 0055, Peterlee, England, April 1974.

conditions," The Soken Kiyo 5, 1 (1975), pp. 159-175.
[T15] FARLEY, J. H. GILLES; AND SCHUSTER, S. A. "Query execution and index selection for relational data bases," Technical Report CSRG-53, Computer Systems Research Group, Univ. of Toronto, Toronto, Ont., Canada, March 1975.
[T16] PECHERER, R. M. "Efficient evaluation of expressions in a relational algebra," Proc. ACM Pacific 75 Regional Conf., April 1975,** ACM, New York, 1975, pp. 44-49.
[T17] GOTLIEB, L. R. "Computing joins of relations," Proc. ACM-SIGMOD Conf., May 1975,* ACM, New York, 1975.
[T18] SMITH, J. M.; AND CHANG, P. "Optimizing the performance of a relational algebra data base interface," Comm. ACM 18, 10 (Oct. 1975), pp. 568-579.
[T19] SCHKOLNICK, M. "Secondary index optimization," Proc. ACM-SIGMOD Conf., May 1975,* ACM, New York, 1975, pp. 186-192.
[T20] ROTHNIE, J. B. "Evaluating inter-entry retrieval expressions in a relational data base management system," Proc. AFIPS National Computer Conf., May 1975, Vol. 44, AFIPS Press, Montvale, N.J., 1975, pp. 417-423.
[T21] PALERMO, F. P. "An APL environment for testing relational operators and data base search algorithms," Proc. APL 75 Conf., June 1975, ACM, New York, 1975, pp. 249-256.
IT8] ROVHNIE,J. B. " A n approach to imple- [T22] HALL, P. A.V. "Optimisation of a single
menting a relational data management relational expression in a relational data
system," Proc. ACM-SIGMOD Workshop base system," IBM Scientific Centre
on Data Description, Access, and Control, Report UKSC 0076, Peterlee, England,
May 1974,* ACM, New York, 1974, pp July 1975.
277-294. [T23] SCHMID~H. A.; AND BERNSTEIN, P. A.
[T9] WHITNEY, V. K. M. "Relational data " A multi-level architecture for relational
management implementation tech- data base systems," Proc. Internatl.
niques," Proc. ACM-SIGFIDET Work- Conf. on Very Large Data Bases, Sept.
shop on Data Description, Access, and 1975, ACM, New York, 1975, pp 202-226.
Control, May 1974,* ACM, New York, [T24] LIEN, Y. E.; TAYLOR, C. E.; REYNOLDS,
1974, pp; 321-350. ,, M. L.; AND DRISCOLL, J. R. "Binary
[T10] CAsEY, R. G.; AND OSMAN, I. Gen- search tree complex--a realization of a
eralized page replacement algorithms in relational database management system~"
a relational data base," Proc. ACM- Proc. Internatl. Conf. on Very Large
SIGFIDET Workshop on Data Descrip- Data Bases, Sept. 1975, ACM, New York,
tion, Access, and Control, May 1974,* 1975, pp 540-542.
ACM, New York, 1974, pp 101-124.
[TIll HALL, P. A. V. "Common sub-expres-
sion identification in general algebraic
systems," IBM Scientific Centre Report
Authorization,Views and Concurrency
UKSC 0060, Peterlee, England, Nov.
[C1] CHAMBERLIN,D. D.; BoYcE, R. F.;
1974. TRAIGER, I . L . " A deadlock-free scheme
IT12] TSICHnITZIS, D. " A network frame- for resource locking in a data base en-
vironment," Information Processing 74,
work for relation implementation," Proc. Proe. I F I P Congress, August 1974, North-
I F I P TC-$ Special Working Conf. on the Holland Publ. Co., Amsterdam, The
DDL, Jan. 1975, published as Data Base Netherlands, 1974, pp. 340-343.
Description, North-Holland Publ. Co., [C2] OWENS, R. C. "Evaluation of access
Amsterdam, The Netherlands, 1975, pp. authorization characteristics of derived
269-282. data sets," Proc. ACM-SIGFIDET Work-
IT13] GRAY,J. N.; AND WATSON,V. " A shared shop on Data Description, Access, and
segment and inter-process communica- Control, Nov. 1971,* ACM, New York,
tion facility for VM/370," Research Re- 1971, pp 263-278.
port RJ1579, IBM Research Laboratory, [C3I STONEBRAKER,M.; AND WONG, E. "Ac-
San Jose, Calif., Feb. 1975. cess control in a relational data base
[T14] CHINA, Y. " A data base search algo- management system by query modifi-
rithm based on complicated retrieval cation," Electronics Research Lab.,

Report ERL-M438, Univ. of Calif., Berkeley, May 1974.
[C4] ESWARAN, K. P.; GRAY, J. N.; LORIE, R. A.; AND TRAIGER, I. L. "On the notions of consistency and predicate locks in a data base system," IBM Research Report RJ1487, San Jose, Calif., Dec. 1974.
[C5] FERNANDEZ, E. B.; SUMMERS, R. C.; AND COLEMAN, C. D. "An authorization model for a shared data base," Proc. ACM-SIGMOD Conf., May 1975,* ACM, New York, 1975, pp 23-31.
[C6] CHAMBERLIN, D. D.; GRAY, J. N.; AND TRAIGER, I. L. "Views, authorization, and locking in a relational data base system," Proc. AFIPS National Computer Conf., May 1975, Vol. 44, AFIPS Press, Montvale, N.J., 1975, pp 425-430.
[C7] WEE, R. S. S. "Problems in the dynamic sharing of data in a relational data base environment," IBM Scientific Centre Report UKSC 0067, Peterlee, England, August 1975.
[C8] GRAY, J. N.; LORIE, R. A.; AND PUTZOLU, G. R. "Granularity of locks in a large shared data base," Proc. Internatl. Conf. on Very Large Data Bases, Sept. 1975, ACM, New York, 1975, pp 428-451.
Integrity Control
[I1] FLORENTIN, J. J. "Consistency auditing of data bases," Computer J. 17, 1 (Feb. 1974), pp 52-58.
[I2] STONEBRAKER, M. "High level integrity assurance in relational data base management systems," Electronics Research Lab. Report ERL-M473, Univ. of Calif. at Berkeley, August 1974.
[I3] GRAVES, R. W. "Integrity control in a relational data description language," Proc. ACM Pacific 75 Regional Conf., April 1975,** ACM, New York, 1975, pp 108-113.
[I4] STONEBRAKER, M. "Implementation of integrity constraints and views by query modification," Proc. ACM-SIGMOD Conf. May 1975,* ACM, New York, 1975, pp 65-78.
[I5] ESWARAN, K. P.; AND CHAMBERLIN, D. D. "Functional specifications of a subsystem for data base integrity," Proc. Internatl. Conf. on Very Large Data Bases, Sept. 1975, ACM, New York, 1975, pp 48-68.
[I6] HAMMER, M. M.; AND McLEOD, D. J. "Semantic integrity in a relational data base system," Proc. Internatl. Conf. on Very Large Data Bases, Sept. 1975, ACM, New York, 1975, pp 25-47.
Applications
[A1] SOOP, K.; SVENSSON, P.; AND WIKTORIN, L. "An experiment with a relational data base system in environmental research," Proc. Fourth Internatl. Symposium on Computer and Information Sciences (COINS IV), Dec. 1972, Plenum Press, New York, 1972.
[A2] KUNII, T. L.; AMANO, T.; ARISAWA, H.; AND OKADA, S. "An interactive fashion design system INFADS," abstract in Proc. Conf. on Computer Graphics & Interactive Techniques, July 1974; paper in Computers & Graphics 1, (1975), Pergamon Press, New York.
[A3] WILLIAMS, R. "On the application of relational data structures in computer graphics," Information Processing 74, Proc. IFIP Congress, August 1974, Vol. 4, North-Holland Publ. Co., Amsterdam, The Netherlands, pp 722-726.
[A4] KUNII, T. L.; WEYL, S.; AND TENENBAUM, J. M. "A relational data base schema for describing complex pictures with color and texture," Proc. Second Jt. Conf. on Pattern Recognition, August 1974, IEEE Cat. No. 74CH0885-4C, IEEE, New York, 1974.
[A5] YALLE, G. "Interactive handling of data base relations: experiments with the relational approach," Technical Report, Univ. of Bologna, Bologna, Italy, March 1975.
[A6] DEJONG, S. P.; AND ZLOOF, M. M. "The system for business automation (SBA): programming language," IBM Research Report RC5302, Yorktown Heights, N.Y., March 1975.
[A7] DEJONG, S. P.; AND ZLOOF, M. M. "Application design within the system for business automation," IBM Research Report RC5366, Yorktown Heights, N.Y., April 1975.
[A8] NAVATHE, S. B.; AND MERTEN, A. G. "Investigations into the application of the relational model to data translation," Proc. ACM-SIGMOD Conf. May 1975,* ACM, New York, 1975, pp 123-138.
[A9] BANDURSKI, A. E.; AND JEFFERSON, D. K. "Data description for computer-aided design," Proc. ACM-SIGMOD Conf. May 1975,* ACM, New York, 1975, pp 193-202.
[A10] WILLIAMS, R.; AND GIDDINGS, G. M. "A picture building system," Proc. Conf. on Computer Graphics, Pattern Recognition, & Data Structure, May 1975, IEEE Cat. No. 75CH0981-1C, IEEE, New York, 1975.
[A11] GO, A.; STONEBRAKER, M.; AND WILLIAMS, C. "An approach to implementing a geo-data system," Proc. ACM-SIGDA-SIGMOD-SIGGRAPH Workshop on Data Bases for Interactive Design, Sept. 1975, ACM, New York, 1975, pp 67-77.
[A12] DONOVAN, J.; FESSEL, R.; GREENBERG, S.; AND GU~S~AO, L. "An experimental VM/370 based information system," Proc. Internatl. Conf. on Very Large Data Bases, Sept. 1975, ACM, New York, 1975, pp 549-553.
Deductive Inference and Approximate Reasoning
The references in this Section represent a small sample of the publications in deductive inference. Many additional references will be found in [D1].


[D1] CHANG, C. L.; AND LEE, R. C. T. Symbolic logic and mechanical theorem proving, Academic Press, New York, 1973.
[D2] MINKER, J. "Performing inferences over relational data bases," Proc. ACM-SIGMOD Conf. May 1975,* ACM, New York, 1975, pp 79-91.
[D3] ZADEH, L. A. "Calculus of fuzzy restrictions," Report ERL-M562, Electronics Research Lab., Univ. of Calif., Berkeley, Calif., Feb. 1975.
Natural Language Support
[E1] SIMMONS, R. F. "Natural language question-answering systems: 1969," Comm. ACM 13, 1 (Jan. 1970), 15-30.
[E2] SCHANK, R. C.; AND COLBY, K. M. (Eds.), Computer models of thought and language, W. H. Freeman, San Francisco, 1973.
[E3] RUSTIN, RANDALL (Ed.), "Natural language processing," Courant Computer Science Symposia 8, New York, Dec. 1971, Prentice-Hall, Englewood Cliffs, N.J., 1971.
[E4] THOMPSON, F. P.; LOCKEMANN, P. C.; DOSTERT, B. H.; AND DEVERILL, R. "REL: a rapidly extensible language system," Proc. 24th ACM National Conf., New York, 1969, ACM, New York, 1969, pp 399-417.
[E5] KELLOGG, C. H.; BURGER, J.; DILLER, T.; AND FOGT, K. "The CONVERSE natural language data management system: current status and plans," Proc. ACM Symposium on Information Storage and Retrieval, 1971, ACM, New York, 1971, pp 33-46.
[E6] WINOGRAD, T. "Procedures as a representation for data in a computer program for understanding natural language," MIT Project MAC Report MAC TR-84, Cambridge, Mass., 1971.
[E7] MONTGOMERY, C. A. "Is natural language an unnatural query language?" Proc. ACM National Conf., New York, 1972, ACM, New York, 1972, pp 1075-1078.
[E8] PETRICK, S. R. "Semantic interpretation in the REQUEST system," IBM Research Report RC4457, IBM Research Center, Yorktown Heights, New York, July 1973.
[E9] CODD, E. F. "Seven steps to RENDEZVOUS with the casual user," Proc. IFIP TC-2 Working Conf. on Data Base Management Systems, April 1974, North-Holland Publ. Co., Amsterdam, The Netherlands, 1974.
Sets and Relations (prior to 1969)
These references are included to enable the reader to trace work published prior to 1969 on computer support for (mathematical) sets and relations.
[Y1] CODASYL Development Committee. "An information algebra," Comm. ACM 5, 4 (April 1962), 190-204.
[Y2] LEVIEN, R. E.; AND MARON, M. E. "A computer system for inference execution and data retrieval," Comm. ACM 10, 11 (Nov. 1967), 715-721.
[Y3] CHILDS, D. L. "Feasibility of a set-theoretical data structure--a general structure based on a reconstituted definition of relation," Proc. IFIP Congress 1968, North-Holland Publ. Co., Amsterdam, The Netherlands, 1968, pp 162-172.
[Y4] CHILDS, D. L. "Description of a set-theoretic structure," Proc. AFIPS 1968 Fall Jt. Computer Conf., Vol. 33, AFIPS Press, Montvale, N.J., 1968, pp 557-564.
[Y5] lSH~elWa tiLo~ N~nSe~rE;, EwitHh ~'e~uR:~i:~ capabilities," Proc. ACM 23rd National Conf., August 1968, Brandon/Systems Press, Princeton, N.J., 1968, pp 143-156.
[Y6] FELDMAN, J. A.; AND ROVNER, P. D. "An ALGOL-based associative language," Comm. ACM 12, 8 (August 1969), 439-449.
[Y7] KUHNS, J. L. "Logical aspects of ques-
Dec. 1969, Academic Press, New York, 1969.
[Y8] KOCHEN, M. "Adaptive mechanisms in digital concept processing," in Discrete Adaptive Processes--Symposium and Panel Discussion, AIEE, New York, 1962, pp 50-58.



Hierarchical Data-Base Management: A Survey

D. C. TSICHRITZIS
and
F. H. LOCHOVSKY
Department of Computer Science, University of Toronto, Toronto, Ontario, Canada M5S 1A7

This survey paper discusses the facilities provided by hierarchical data-base management systems. The systems are based on the hierarchical data model which is defined as a special case of the network data model. Different methods used to access hierarchically organized data are outlined. Constructs and examples of programming languages are presented to illustrate the features of hierarchical systems. This is followed by a discussion of techniques for implementing such systems. Finally, a brief comparison is made between the hierarchical, the network, and the relational systems.
Keywords and Phrases: data base, data-base management, data manipulation, data
model, data independence, generalized processing, hierarchical data model,
hierarchical languages, hierarchical systems, network systems, record retrieval,
relational systems
CR Categories: 3.51, 4.33, 4.34

Copyright © 1976, Association for Computing Machinery, Inc. General permission to republish, but not for profit, all or part of this material is granted provided that ACM's copyright notice is given and that reference is made to the publication, to its date of issue, and to the fact that reprinting privileges were granted by permission of the Association for Computing Machinery.
Computing Surveys, Vol. 8, No. 1, March 1976

CONTENTS
INTRODUCTION
HIERARCHICAL DATA MODEL
HIERARCHICAL LANGUAGES
Tree Traversal
General Selection
IMPLEMENTATION
CONCLUDING REMARKS
REFERENCES

INTRODUCTION
In the real world, there seems to be a great deal of order and structure. It may be that this structure is implicit as a basic property of the physical world. On the other hand, it may be that humans simply use this structure to understand the world better. Whether God-given or human created, structure is instrumental in helping men to understand the world. One of the most common and simplest structures is the hierarchical structure. In the natural world, among the animals, for instance, there is a hierarchical structure that is graded according to intelligence. Some human organizations are also arranged hierarchically, e.g., management structures. Because hierarchies are so familiar in nature and in human society, it seems natural to us that our ideas about the world are often arranged in a hierarchical fashion.
Data bases portray and represent information about the world. Since the world appears to us to be arranged hierarchically, it seems natural that data base management systems (DBMSs) should provide the tools needed to represent and manipulate hierarchical structures. There are actually a number of very successful DBMSs which provide the user with a hierarchical view of the world [9, 14, 17]. In this paper, we discuss the facilities offered by these DBMSs. Before considering these facilities, however, we first have to define, exactly, the nature of a hierarchical structure.

HIERARCHICAL DATA MODEL
Data represent ideas about the real world which people conceive in terms of entities and their attributes. A data model is a method of logically organizing such data according to the relationships found among the data. In general, two types of relationships can be distinguished [19]. The first type is a relationship among attributes of the same entity; the second type is a relationship between entities. The difference between the two types can be understood by examining the following examples.
• The attribute relationship becomes clear if we consider presidential elections. A presidential candidate has a name and is affiliated with a political party. The candidate's name and his political party both describe the candidate, and these two descriptions are of interest to us only as long as the election is of interest--perhaps only until the election is held.
• The entity relationship between two entities becomes clear if we consider the relationship between a president and an election. Both the president and the election exist independently of one another, whether or not the president still holds office as a result of an election.
In the sequel, attribute relationships of an entity will be represented by a record type. A record type is a named collection of data items. A data item is the smallest named logical unit of data; it represents an attribute of an entity. Since relationships between attributes are represented within record types, we will concentrate mainly on relationships among different entities.
The relationships between entities can be represented by a graph [1]. In the graph, each node indicates a record type that represents an entity and each undirected arc specifies a relationship between the entities. Many relationships are possible between any two nodes. Each relationship is named explicitly to distinguish it from other relationships. Figure 1 shows some of the relationships between presidents, congresses, etc.
FIGURE 1. A general relationship graph (record types include PRESIDENT, STATE, and ADMINISTRATION; named relationships include ADMINISTRATIONS HEADED and ADMITTED DURING).
Each relationship depicted in Figure 1 can potentially be a general N:M relationship. Taylor and Frank have shown, for instance, in their paper on CODASYL data base management systems in this special issue [see page 67] that the relationship between PRESIDENT and CONGRESS is N:M. However, most of the relationships are 1:N, e.g., ADMINISTRATIONS HEADED, ADMITTED DURING, etc. In addition, as was shown in the same companion paper, there is a way of transforming a general N:M relationship into two 1:N relationships by introducing an intermediate record type. By applying this transformation repeatedly, we can obtain a relationship graph, called a data structure diagram, where all the relationships are 1:N, as seen in Figure 2 [2].
FIGURE 2. A data structure diagram (record types include PRESIDENT, ELECTION, and STATE; relationships include ELECTIONS WON).
The relationships between the entities can be viewed as relationships between the record types. All of the relationships can be assumed to be 1:N without loss of generality. The 1:N relationships have a certain direction that is represented by introducing directed arcs in the graphs, as we have done in Figure 2. Each arc points from the "one" side to the "N" side of the 1:N relationship. For instance, between the record types STATE and PRESIDENT, the arc representing the relationship NATIVE SONS points from STATE to PRESIDENT, since each state may have many native son presidents, but each president was born in only one state.
Consider the special case where the data structure diagram represents a tree in which the direction of the arcs points away from the root, as in Figure 3. Because there can be at most one arc between any two record types, the arcs do not need to be labeled. Such a restricted data structure diagram is called a hierarchical definition tree. A hierarchical definition tree specifies both what record types are allowed to be included in the data base and the permissible relationships between record types. Figure 3 represents a hierarchical definition tree--one that is a subset of the data structure diagram shown in Figure 2. The hierarchical data model is defined as the data model which organizes data logically according to the structural relationships of hierarchical definition trees.
FIGURE 3. Hierarchical definition tree (level 1: PRESIDENT; level 2: ELECTION and ADMINISTRATION; level 3: STATE).
The level of a record type in the hierarchical definition tree is a measure of its distance from the root of the tree. The root record type is the highest level record type in the tree (level 1). The PRESIDENT record type in Figure 3 is an example of a root record type. The other record types, called dependent record types, are at lower levels in the tree (levels 2, 3, etc.). In this instance the ELECTION and ADMINISTRATION record types are both at level 2.
A data base that corresponds to the hierarchical definition tree of Figure 3 is shown in Figure 4. The hierarchical data base is a collection or forest of trees called data-base trees whose record occurrences appear as nodes. There are, for instance, three data-base trees in Figure 4. All of the trees are constructed according to the relationships permitted explicitly in the hierarchical definition tree. In a hierarchical data base, parents and children, or ancestors (parents, parents of parents, etc.) and descendants (children, children of children, etc.) among the record occurrences can be identified in a natural way. A hierarchical path is a sequence of records, starting at a root record, in which each record is the parent of the next. For example, the sequence PRESIDENT record, ADMINISTRATION record, STATE record defines a hierarchical path.
FIGURE 4. Data-base trees (occurrences of PRESIDENT, ELECTION, ADMINISTRATION, and STATE records).
In a hierarchical data base that is structured like Figure 3, each PRESIDENT record occurrence can have many ADMINISTRATION record occurrences connected to it. Each ADMINISTRATION record occurrence may (in turn) have several STATE record occurrences connected to it, as exemplified in Figure 4. Each STATE record occurrence, however, has exactly one parent record occurrence, ADMINISTRATION (during which the state was admitted), and one grandparent record occurrence, PRESIDENT (who served when the state was admitted).
Two things should be noted in Figure 4. First, there can be a varying number of occurrences of each record type at each level. Second, each record occurrence (except for a root record occurrence, PRESIDENT) must be connected to an occurrence of an ancestor record type as constrained by the hierarchical definition tree. There can be no "independent" record occurrences of record

types ELECTION, ADMINISTRATION, or STATE. If we regard the relationships between parent and child record types as a data structure set, then membership in the set is always mandatory in terms of the DBTG network systems, as Taylor and Frank have discussed in the paper already mentioned [page 67].
The reader can find in the literature a similar, but not identical, notation for representing a hierarchical data base. This notation allows the data to reside only at the terminal nodes of the tree [5]. All intermediate nodes of the data-base trees are present only to maintain the hierarchical relationships. Only the terminal nodes have record occurrences with data-item values. Figure 5(a) shows a hierarchical definition tree and a data-base tree of the type previously discussed; Figure 5(b) represents the same information in terms of a hierarchical definition tree where the record types are present only at the terminal nodes. The intermediate nodes (marked with a dot since they carry no data) serve only to maintain the structure and to qualify the data. Since, logically, the two notations are equivalent, for simplicity and uniformity only the former notation will be used.
FIGURE 5. Two data-base hierarchies, (a) and (b) (record types PRESIDENT, ADMINISTRATION, and STATE).
Not all features of the presidential data base shown in Figure 2 can be captured by one hierarchical definition tree: the data structure diagram is itself a network. However, we can represent the same information by using more than one hierarchical definition tree, as shown in Figure 6.
FIGURE 6. Presidential data-base hierarchical definition trees: (a) PRESIDENT, with ELECTION, ADMINISTRATION, CONGRESS SERVED, and STATE ADMITTED; (b) STATE, with NATIVE PRESIDENT; (c) CONGRESS, with PRESIDENT SERVED.
The record types contain the following data items:
PRESIDENT--PRES NUMBER, PRES NAME, BIRTHDATE, DEATH DATE, PARTY, SPOUSE.
ELECTION--YEAR, PRES VOTES, LOSER VOTES, LOSER, LOSER PARTY.
ADMINISTRATION--ADMIN NUMBER, INAUG DATE, VICEPRESIDENT.
CONGRESS SERVED--CONGRESS NUMBER.
STATE ADMITTED--STATE NAME.
STATE--STATE NAME, POPULATION, STATE VOTES.
NATIVE PRESIDENT--PRES NUMBER.
CONGRESS--CONGRESS NUMBER, SENATE REP PERCENT, SENATE DEM PERCENT, HOUSE REP PERCENT, HOUSE DEM PERCENT.
PRESIDENT SERVED--PRES NUMBER.

HIERARCHICAL LANGUAGES
A hierarchical system is a DBMS which provides facilities for inserting, modifying, deleting, and retrieving record occurrences in a hierarchical data base. Because of the nature of the hierarchical data model, each new record that is inserted (except for a root record occurrence) has to be connected to an occurrence of a parent record type. Usually this action is effected by selecting a parent record occurrence and inserting a child. For example, in Figure 5, before one can insert an occurrence of a STATE record type, an occurrence of an ADMINISTRATION record type would first have to be selected.
When a record occurrence is deleted, all of its descendant record occurrences are also deleted. The hierarchical data model does not allow non-root records to exist without ancestors. In Figure 5, if an ADMINISTRATION record occurrence is deleted, all occurrences of STATE record types connected to the occurrence of the ADMINISTRATION record type also have to be deleted. To retain the descendant records, but not a parent record, as is sometimes necessary, some systems provide commands which delete the data-item values, but not the record itself [6, 9, 10, 17]. Essentially, this facility permits null (empty) records to exist in the data base. In this way, the data associated with the descendant record is retained.
Records retrieved in a hierarchical data base may be both selected and qualified according to the tree structure [12, 13, 18]. Records are usually selected by means of a qualification which expresses the criterion of selection. The qualification consists of conditions of the form
(<data item name> <conditional operator> <value>)
connected by Boolean operators AND, OR, and NOT. The usual conditional operators are <, ≤, >, ≥, =, and ≠ or their mnemonic equivalents. The qualification may select records of several different record types. In our example, a qualification such as ((PRES NAME = EISENHOWER) AND (YEAR = 1956)) selects both PRESIDENT and ELECTION records. In general, the qualification can be associated with data items from any record type in the hierarchical definition tree. However, almost all systems permit the qualification to contain data items only from record types that lie in a hierarchical path. In this way, they avoid the ambiguity that arises when the Boolean negation operator (NOT) is specified in the qualification [12, 13].
After a record has been selected, other records may qualify for retrieval. Every record has, for example, a unique set of ancestors in the data base. All ancestors of the selected record may qualify for retrieval. Further, a record may have a set of descendants, all of which can also qualify for retrieval. For example, an ADMINISTRATION record may have several STATE records connected to it. Notice that each selected record has at most one ancestor record of each ancestor record type. That is, each record has, at most, one parent which (in turn) has, at most, one parent, and so on. However, in general, a selected record may have several descendant records of each descendant record type, since it may have several children which (in turn) may have several children, etc. When most systems qualify descendants, this qualification is performed along only one hierarchical path. This means that from a PRESIDENT record, one can qualify ELECTION records or ADMINISTRATION records, but not both. The qualifying of ancestors of a selected record is called upward hierarchical normalization, while the qualifying of descendants is called downward hierarchical normalization [16].
The retrieval operation in a hierarchical data base may be performed in one of two ways. According to the first method, a user explicitly uses the tree structure of the data base to traverse the data-base trees in a specified order. Traversal may be independent of record selection and qualification, in which case it resembles sequential processing. On the other hand, records may be selected and qualified, but in a very restricted manner determined only by the traversal order. According to the second retrieval method, the user selects and qualifies records based on the relationships between the data items of the record types. Although the user has to be aware of the tree structure of the data base, he does not explicitly use this structure to retrieve records. Instead, the system utilizes the hierarchical structure to determine which records qualify among those selected by the user. Each method will be discussed in detail in the following sections.

Tree Traversal
In systems that use a tree traversal language, record types and record occurrences are sometimes called segment types and segment occurrences or simply segments [14]. A single data-base tree consisting of one root record occurrence and all its associated descendants is sometimes called a data-base record. The root record type of the hierarchical definition tree is referred to as the root segment type. The other segment types are called dependent segment types. Each segment type is further divided into data items called fields. In this section, the original terminology of the Hierarchical Data Model section, page 105, will be used.
One of the DBMSs that uses a tree traversal language is the Information Management System (IMS) [14]. IMS uses a preorder tree traversal to traverse a data-base tree. A preorder tree traversal is defined as follows [15]:
1) Visit the record if it has not already been visited.

Compu~JUl Swrveym.Vol. 8. No. 1. March !076


112 • D. C. Tsichritzis and F. H. Lochovsky

2) Else, visit the leftmost child not previously visited.
3) Else, if no children, grandchildren, etc., remain to be visited, go back to the parent record.

These steps are applied to each record of the data-base tree when the record is reached. It is assumed that the children of each parent are ordered according to the appearance of the child record type in the hierarchical definition tree, i.e., all ELECTION records under a PRESIDENT record come before any ADMINISTRATION records. The traversal begins at the root record and visits all records in the tree in top-to-bottom, left-to-right order. Taking as an example the data-base tree given in Figure 7, a preorder tree traversal would visit the records in the order indicated. If one imagines individual occurrences of data-base trees to be connected to an imaginary head record, a single data-base tree is formed; this makes it possible to visit all the records in the data base. During the traversal, record occurrences can be selected. In addition, the records may be qualified.

FIGURE 7. Preorder tree traversal. (Legend: PRESIDENT, ELECTION, ADMINISTRATION, and STATE records.)

Requests to IMS are specified through procedure calls to Data Language/One (DL/I) from application programs written in PL/I, COBOL, or Assembler Language. A position pointer marks the progress of an application program through the data base according to a preorder traversal of the data-base tree. The calls to DL/I require several parameters, which are passed through a subroutine call in the host language. These parameters identify communication buffers, I/O buffers, the type of call, and qualifications. However, only the last two parameters are relevant to the following discussion.

During the operation of retrieving records, several records may be selected by means of one or more qualifications. After selection, other records may be qualified. No more than one record of each record type located in the hierarchical path may be qualified. A qualification on a record is specified by means of a segment search argument (SSA). An SSA specifies a qualification that applies to only one record type. The form of an SSA is:

(RECORD NAME)(COMMAND CODE)(QUALIFICATION)

The (RECORD NAME) is the name of a record type in the hierarchical definition tree. A (QUALIFICATION) is optional in the SSA and is expressed in the form specified previously. The only Boolean operators permitted are AND and OR.

The (COMMAND CODE) is optional and specifies the various options of the call. Some of the more important options permit:

• retrieval or insertion of some or all of the records from the root to a specified record type in a single DL/I call (path call);
• backing up to the first child under a record at any or all levels (except at the root record level);
• retrieval of the last occurrence of a record that meets all specified conditions under a parent;
• setting of the parentage to a specific record (see the GET NEXT WITHIN PARENT call below).

The set of SSAs in a DL/I call specifies a path that passes from a root record down the data-base tree to a specified record. There may be no more than one SSA for each level in the hierarchical path. If an SSA located at a particular level does not uniquely identify a record, or if no SSA is specified, then IMS selects the hierarchically next



Hierarchical Data-Base Management • 113

(preorder) record, except when modified by command codes.

Every DL/I call results in the setting of a status code parameter in the communication buffer used by IMS to communicate with an application program. The status code may be checked after every DL/I call to determine whether the result of the call is successful, unsuccessful, a warning, or some other condition. It is then up to the application program to determine the appropriate action to be taken. The various status codes will not be discussed here. The interested reader should refer to the appropriate manuals [14].

Data-base calls are categorized into three general types: GET calls; INSERT calls; and DELETE and REPLACE calls. All calls may optionally include some form of SSAs.

The GET UNIQUE call retrieves a record, as selected by the SSAs, independently of its current position in the data base. This call is used for nonsequential processing or to establish a start position for sequential processing of the data base.

The GET NEXT call processes in a forward direction only, starting from the current position in the data base. Records (optionally of a particular type) are retrieved in the order established by a preorder traversal of the data-base trees. GET NEXT can be used without SSAs to retrieve the records in the data base sequentially. It can also be used to search for a particular record if an SSA is included in the call.

A GET NEXT WITHIN PARENT call obtains records (optionally of a particular type) within the family of a parent record. The parent record is established by the last GET NEXT or GET UNIQUE call, or by the (COMMAND CODE) option in an SSA of a preceding GET NEXT or GET UNIQUE call. The only difference between a GET NEXT and a GET NEXT WITHIN PARENT call is the result obtained after the last child (optionally of a given type) under a given parent is reached. A GET NEXT call continues to retrieve records until IMS sets the status code to signal that the end of the data base has been reached. A GET NEXT WITHIN PARENT call sets the status code to signal that there are no more children under the parent.

A GET HOLD call is used to retrieve and hold a record for a DELETE or REPLACE call. A GET HOLD call allows only one application at a time to alter a record by serializing access to the record.

An INSERT call is used to load a data base or to add new records to a data base. SSAs are used to select the position (in the data-base tree) where the record is to be inserted. Notice that an INSERT call performs two operations: it stores the new record, and connects it to its parent. This dual operation is necessary, since every record, except a root record, must have a parent.

A DELETE call deletes a record and all of its descendant records from the data base. The DELETE call is a "triggered" delete; that is, the record selected and all of its descendants are deleted.

A REPLACE call updates records in the data base. Any nonkey data item may be changed in the record. Attempting to change a key data item results in an error.

To demonstrate the use of the DL/I calls, the following query will be implemented using PL/I: Print the names of all the states that were admitted during a Democratic administration. The query is first presented in its entirety, and then an explanation of its various parts is given. It is assumed that there is a hierarchical data-base structure that conforms to the hierarchical definition tree shown in Figure 6(a). Only the PRESIDENT and STATE ADMITTED record types are required for this query. To conform to the naming conventions of IMS, these two record types are renamed PRES and SADMIT, respectively.
DLITPLI: PROCEDURE (QUERY_PCB) OPTIONS (MAIN);
   DECLARE QUERY_PCB POINTER;
   /* Communication buffer (PCB mask), allocated by IMS */
   DECLARE 1 PCB BASED (QUERY_PCB),
             2 DATA_BASE_NAME CHAR (8),
             2 SEGMENT_LEVEL CHAR (2),
             2 STATUS_CODE CHAR (2),
             2 PROCESSING_OPTIONS CHAR (4),
             2 RESERVED_FOR_DLI FIXED BINARY (31,0),
             2 SEGMENT_NAME_FEEDBACK CHAR (8),
             2 LENGTH_OF_KEY_FEEDBACK_AREA FIXED BINARY (31,0),
             2 NUMBER_OF_SENSITIVE_SEGMENTS FIXED BINARY (31,0),
             2 KEY_FEEDBACK_AREA CHAR (28);
   /* I/O buffers: one per record type used by the query */
   DECLARE PRES_IO_AREA CHAR (65),
           1 PRESIDENT DEFINED PRES_IO_AREA,
             2 PRES_NUMBER CHAR (4),
             2 PRES_NAME CHAR (20),
             2 BIRTH_DATE CHAR (8),
             2 DEATH_DATE CHAR (8),
             2 PARTY CHAR (10),
             2 SPOUSE CHAR (15);
   DECLARE SADMIT_IO_AREA CHAR (20),
           1 STATE_ADMITTED DEFINED SADMIT_IO_AREA,
             2 STATE_NAME CHAR (20);
   /* Segment search arguments */
   /* Qualified SSA: PRES(PARTY = DEMOCRAT) */
   DECLARE 1 PRESIDENT_SSA STATIC UNALIGNED,
             2 SEGMENT_NAME CHAR (8) INIT ('PRES    '),
             2 LEFT_PARENTHESIS CHAR (1) INIT ('('),
             2 FIELD_NAME CHAR (8) INIT ('PARTY   '),
             2 CONDITIONAL_OPERATOR CHAR (2) INIT (' ='),
             2 SEARCH_VALUE CHAR (10) INIT ('DEMOCRAT  '),
             2 RIGHT_PARENTHESIS CHAR (1) INIT (')');
   /* Unqualified SSA: segment name only */
   DECLARE 1 STATE_ADMITTED_SSA STATIC UNALIGNED,
             2 SEGMENT_NAME CHAR (8) INIT ('SADMIT  ');
   /* Some necessary variables */
   DECLARE GU CHAR (4) INIT ('GU  '),
           GN CHAR (4) INIT ('GN  '),
           GNP CHAR (4) INIT ('GNP '),
           FOUR FIXED BINARY (31) INIT (4),
           SUCCESSFUL CHAR (2) INIT ('  '),
           RECORD_NOT_FOUND CHAR (2) INIT ('GE');
   /* This procedure handles IMS error conditions */
   ERROR: PROCEDURE (ERROR_CODE);
      DECLARE ERROR_CODE CHAR (2);
      /* Error handling is not shown here */
   END ERROR;
   /* Main procedure */
   /* Step 1: first Democratic president (GET UNIQUE) */
   CALL PLITDLI (FOUR, GU, QUERY_PCB, PRES_IO_AREA, PRESIDENT_SSA);
   DO WHILE (PCB.STATUS_CODE = SUCCESSFUL);
      /* Steps 2 and 3: states admitted under this president (GNP) */
      CALL PLITDLI (FOUR, GNP, QUERY_PCB, SADMIT_IO_AREA, STATE_ADMITTED_SSA);
      DO WHILE (PCB.STATUS_CODE = SUCCESSFUL);
         PUT SKIP EDIT (STATE_NAME) (A);   /* one state name per line */
         CALL PLITDLI (FOUR, GNP, QUERY_PCB, SADMIT_IO_AREA, STATE_ADMITTED_SSA);
      END;
      IF PCB.STATUS_CODE ¬= RECORD_NOT_FOUND
      THEN DO;
         CALL ERROR (PCB.STATUS_CODE);
         RETURN;
      END;
      /* Step 4: next Democratic president (GET NEXT) */
      CALL PLITDLI (FOUR, GN, QUERY_PCB, PRES_IO_AREA, PRESIDENT_SSA);
   END;
   IF PCB.STATUS_CODE ¬= RECORD_NOT_FOUND
   THEN DO;
      CALL ERROR (PCB.STATUS_CODE);
      RETURN;
   END;
END DLITPLI;
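The control flow of this program can be mimicked outside IMS to show how the position pointer, preorder order and parentage interact. The following Python sketch is a toy under stated assumptions, not IMS. The class ToyDLI, its methods gu, gn and gnp, the function democratic_states and the SAMPLE records are all hypothetical names. Every "not found" outcome is simplified to the status code 'GE', which is the code the PL/I program tests for. The records are a tiny illustrative sample, not the data base of Figure 6. All data-base trees are held as one preorder sequence, as if joined under an imaginary head record.

```python
from dataclasses import dataclass, field

OK, NOT_FOUND = "  ", "GE"  # toy simplification: every miss reports 'GE'


@dataclass
class Segment:
    name: str                                  # record (segment) type, e.g. "PRES"
    fields: dict                               # data items of this occurrence
    children: list = field(default_factory=list)


def preorder(root, level=1, out=None):
    """Visit a record, then each child subtree from left to right."""
    if out is None:
        out = []
    out.append((level, root))
    for child in root.children:
        preorder(child, level + 1, out)
    return out


class ToyDLI:
    """Hypothetical in-memory stand-in for GU, GN and GNP (not the IMS API)."""

    def __init__(self, roots):
        # Chaining the trees mimics an imaginary head record over all of them.
        self.seq = [item for root in roots for item in preorder(root)]
        self.pos = -1        # position pointer: index of the last record returned
        self.parent = None   # parentage, set by the last successful GU or GN

    def _scan(self, start, stop, seg, pred):
        for i in range(start, stop):
            rec = self.seq[i][1]
            if rec.name == seg and pred(rec.fields):
                self.pos = i
                return OK, rec
        return NOT_FOUND, None

    def gu(self, seg, pred=lambda f: True):
        # GET UNIQUE: search from the start, ignoring the current position.
        status, rec = self._scan(0, len(self.seq), seg, pred)
        if status == OK:
            self.parent = self.pos
        return status, rec

    def gn(self, seg, pred=lambda f: True):
        # GET NEXT: search forward from the current position.
        status, rec = self._scan(self.pos + 1, len(self.seq), seg, pred)
        if status == OK:
            self.parent = self.pos
        return status, rec

    def gnp(self, seg, pred=lambda f: True):
        # GET NEXT WITHIN PARENT: search forward, but only inside the parent's family.
        if self.parent is None:
            return NOT_FOUND, None
        top = self.seq[self.parent][0]
        stop = self.parent + 1
        # The family ends at the next record on the parent's level or above.
        while stop < len(self.seq) and self.seq[stop][0] > top:
            stop += 1
        start = max(self.pos, self.parent) + 1
        return self._scan(start, stop, seg, pred)


def democratic_states(dli):
    """Same loop structure as DLITPLI: GU, inner GNP loop, then GN."""
    is_democrat = lambda f: f["party"] == "DEMOCRAT"
    names = []
    status, _ = dli.gu("PRES", is_democrat)
    while status == OK:
        status, rec = dli.gnp("SADMIT")
        while status == OK:
            names.append(rec.fields["state_name"])
            status, rec = dli.gnp("SADMIT")
        status, _ = dli.gn("PRES", is_democrat)
    return names


def admin(*states):
    return Segment("ADMINISTRATION", {},
                   [Segment("SADMIT", {"state_name": s}) for s in states])


# Illustrative sample only; ELECTION children come before ADMINISTRATION children.
SAMPLE = [
    Segment("PRES", {"name": "POLK", "party": "DEMOCRAT"},
            [Segment("ELECTION", {"year": 1844}),
             admin("TEXAS", "IOWA", "WISCONSIN")]),
    Segment("PRES", {"name": "TRUMAN", "party": "DEMOCRAT"},
            [Segment("ELECTION", {"year": 1948}), admin(), admin()]),
    Segment("PRES", {"name": "EISENHOWER", "party": "REPUBLICAN"},
            [Segment("ELECTION", {"year": 1952}),
             Segment("ELECTION", {"year": 1956}),
             admin(), admin("ALASKA", "HAWAII")]),
]

print(democratic_states(ToyDLI(SAMPLE)))  # prints ['TEXAS', 'IOWA', 'WISCONSIN']
```

No SSA is needed for ADMINISTRATION records because the GNP scan covers the parent's whole family, grandchildren included. The scan stops at the first record whose level is not below the parent's level. This is how the toy reports that no more children exist, for example after Truman's record. Alaska and Hawaii are never printed because Eisenhower's PRES record fails the PARTY qualification.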
To run this query, the user would invoke IMS. After IMS has performed the necessary initializations, it passes control to the procedure called DLITPLI. Within his program, a user must declare a mask for a communication buffer, allocate I/O buffers, and set up the formats of the various SSAs that will be needed.

The declaration of the PL/I structure named PCB is an outline of the structure for the communication buffer, established by IMS, that is needed for communication with the user's program. The buffer is allocated by IMS in the initialization phase, and a pointer to it (QUERY_PCB) is passed as a parameter to the application program. In general, there may be several communication buffers, one for each data base accessed by the application. The declaration of the PL/I structure PCB only serves to facilitate access to the communication buffer via the structure entry names. The buffer is used by DL/I to advise an application program of the results of its DL/I calls. All entries in the buffer are read-only, and most are self-explanatory. The only entry of interest in this example is STATUS_CODE, which indicates whether the result of a DL/I call is successful, an error, a warning, or some other condition.

Next, the I/O buffers for the record types are declared. There is usually one I/O buffer for each record type in the hierarchical definition tree. The I/O buffers are used to hold the record, corresponding to the appropriate record type, that is retrieved by IMS, and to make the record available in the application program. The use of PL/I structures, applied, for example, to PRESIDENT, facilitates access to the data items of the record type. All records are assumed to be stored as character data. In this example, only buffers for the PRESIDENT and STATE ADMITTED record types are required.

Finally, the SSAs are declared. There may be one SSA for each record type in the hierarchical definition tree. In the example given, only the PRESIDENT and STATE ADMITTED record types require SSAs. Although the SSAs in this example do not change after they have been declared and initialized, it is possible to change the search value and/or the conditional operator by means of the host language's facilities.

The ERROR procedure will not be described in detail here. It is used to print error messages and to take appropriate action when an error condition arises.

As mentioned earlier, a program accesses an IMS data base by making a subroutine call to DL/I. In PL/I, such DL/I calls are characterized by the starting sequence CALL PLITDLI. These calls may have a varying number of parameters. However, in the present example each of the calls uses five parameters, which are, in order of appearance within the call:

• the number of parameters to follow (in this example, four);
• the type of call, e.g., GET NEXT, INSERT, etc.;
• the pointer to the communication buffer for the call;


• the location of the I/O buffer; and
• the SSA(s) for the call.

The processing of the query is performed in several steps as follows:

1) Retrieve the first occurrence of a PRESIDENT record where the PARTY data item is equal to DEMOCRAT. This action is performed by means of a GET UNIQUE (GU) call using the PRESIDENT_SSA. If no record is selected, then the processing is complete.
2) Get the first STATE ADMITTED record that is a descendant of the PRESIDENT record. This action is performed by means of a GET NEXT WITHIN PARENT (GNP) call using the STATE_ADMITTED_SSA. Notice that it is not necessary to specify an SSA for ADMINISTRATION records. We are not really interested in the administration under which the state was admitted, only in the president. IMS, therefore, selects each ADMINISTRATION record in turn as required and processes its associated STATE ADMITTED records. If no states were admitted during a president's tenure, then we go to step 4.
3) Get all the other STATE ADMITTED records, in turn, by means of a GET NEXT WITHIN PARENT (GNP) call until there are no more STATE ADMITTED records for this president. Print the name of each state.
4) Get the next PRESIDENT record where the PARTY data item is equal to DEMOCRAT. This action is effected by a GET NEXT (GN) call using the same SSA as for the GET UNIQUE call in step 1. If no PRESIDENT record is selected, then the processing is complete. Otherwise go to step 2.

Tree traversal languages usually operate on one record at a time. They essentially perform a linear tree traversal. As a result, although they operate on a tree, their processing looks sequential. The nature of the commands they use can also influence their implementation. It would be desirable if the record that is logically next in the hierarchy were also the next physical record. If records were stored in this way, sequential processing of the data-base tree would be very efficient. In the Implementation section [see page 118], we will examine some implementations of a hierarchical data base.

General Selection

The terminology employed in systems that use general selection languages differs from that of systems using tree traversal languages [5, 6, 7, 9, 10, 11, 17, 20]. Among the general selection languages, a record type is sometimes referred to as a repeating group. A record occurrence is called a repeating group occurrence or data set. A data-base tree consisting of one root record and all its descendants is called a logical entry. Finally, a data item is called a data element. However, to avoid terminological confusion, repeating groups will be called record types, and the other terms will also be named as before.

In the following discussion, a hierarchy structured according to the hierarchical definition trees given in Figure 6 will be used.

General selection languages treat the record occurrences of a record type as sets of data-item values. Records are selected and qualified according to the relationships found among the data items of the record types of the hierarchical definition tree. A qualification in the general selection languages is usually specified by a WHERE clause [12, 13, 17]. A WHERE clause consists of the keyword WHERE and a Boolean combination of the conditions discussed earlier. The WHERE clause specifies the records to be selected. After these specifications have been effected, upward and/or downward hierarchical normalization can be performed. Downward hierarchical normalization is usually restricted to one hierarchical path.

The common characteristic features of general selection languages will be illustrated by examples using the "Natural Language" feature of SYSTEM 2000 [9, 17]. This feature is typified by an English-like, interactive query language. The commands of this language consist of two parts: an action part and a WHERE clause. The action part specifies the operation to be performed, e.g., retrieval, update, etc., and the data items on which to operate. The basic retrieval command is the PRINT command. The following query illustrates upward hierarchical normalization.

PRINT VICE PRESIDENT WHERE STATE NAME EQ ALASKA:

This query specifies that we want VICE PRESIDENT data-item values in ADMINISTRATION records. In addition, the ADMINISTRATION records must have a STATE ADMITTED descendant that satisfies the WHERE clause. To provide this, we select all STATE ADMITTED records satisfying the WHERE clause and then perform an upward hierarchical normalization to qualify ADMINISTRATION records. In general, several ADMINISTRATION records may qualify. However, according to the semantics of the example, only one ADMINISTRATION record would qualify.

The next query illustrates downward hierarchical normalization.

PRINT STATE NAME WHERE VICE PRESIDENT EQ NIXON:

We examine the ADMINISTRATION records to select those records that satisfy the WHERE clause. Although, in general, several records may be selected, we are constrained in this example to select only one record. A downward hierarchical normalization is then performed to qualify STATE ADMITTED records. Many STATE ADMITTED records may qualify. Note that this and the preceding query are conceptually symmetric queries. However, they are answered in quite different ways.

PRINT ADMIN NUMBER WHERE VICE PRESIDENT EQ NIXON:

This query involves neither upward nor downward hierarchical normalization. The query is answered by selecting and qualifying only ADMINISTRATION records. In this case again, several records may be selected and qualified.

Now consider the query:

PRINT PRES NAME WHERE VICE PRESIDENT EQ AGNEW AND VICE PRESIDENT EQ FORD:

The casual user would perhaps expect the response to this query to be Nixon, as the president who had both Agnew and Ford as vice-presidents. However, the semantics of the WHERE clause are such that the answer to this query is null, i.e., no PRESIDENT record qualifies. This problem arises because all Boolean operations must be performed at the same level at which records are selected, that is, on ADMINISTRATION records. The result is that the intersection (AND) of two conditions which specify the same data item, but different values, must be null. The same record cannot have two different values for the same data item.

If the Boolean operations could be performed on qualified records at a higher level, then the problem would be solved. The purpose of the HAS clause is to raise the level at which the Boolean operations are performed by carrying out an upward hierarchical normalization on the selected records. For instance, the query:

PRINT PRES NAME WHERE PRES NAME HAS VICE PRESIDENT EQ AGNEW AND PRES NAME HAS VICE PRESIDENT EQ FORD:

will produce the answer "Nixon," as expected. In the previous query, which used only the WHERE clause, the intersection (AND) was performed on ADMINISTRATION records. In this query, the intersection is performed on PRESIDENT records, since the level of Boolean operations is raised to the PRESIDENT record by means of the HAS clause. (The level to which upward hierarchical normalization is performed is specified by the data-item name preceding the keyword HAS.) In the first query, no ADMINISTRATION record satisfies both conditions of the WHERE clause. In the second


query, there is at least one PRESIDENT record which has ADMINISTRATION records that satisfy the HAS clauses. Approaching the problem from a different angle, one can visualize the HAS clause as selecting records of a certain type and then determining whether there is at least one parent record connected to all the selected records. The WHERE clause, on the other hand, starts at a parent record and determines whether, for a given parent, one descendant exists that satisfies the WHERE clause.

The HAS clause is also useful in other situations. For example, it is sometimes necessary to retrieve values from a record of one type based on the selection of a record of a different type at the same level. For instance, suppose we want to know all the election years for the president with administration number 48. A WHERE clause does not have the capability to retrieve these values. However, a HAS clause does, as follows:

PRINT YEAR WHERE PRES NAME HAS ADMIN NUMBER EQ 48:

We select the appropriate ADMINISTRATION record(s) and then do an upward hierarchical normalization to qualify a PRESIDENT record. We then do a downward hierarchical normalization to qualify all ELECTION records connected to the PRESIDENT record. It is also possible to retrieve a year based on the selection of a STATE ADMITTED record.

PRINT YEAR WHERE PRES NAME HAS STATE NAME EQ ALASKA:

In general, the insertion, update, and deletion commands for general selection languages are similar in their operational effect to the commands outlined in the Tree Traversal section [see page 111]. However, systems that use general selection languages usually distinguish between a record and its position in a data-base tree. This means that a position in the data-base tree may exist, but either there may not be any record, or only a part of a record may be associated with it. For example, in SYSTEM 2000, the REMOVE command deletes data-item values from a record occurrence [9, 17]. The effect is to make the data-item value null. But the record is not deleted, and associated descendant records are therefore retained.

General selection languages are truly hierarchical. They use tree operations to select and qualify records in the data base. Records are qualified because they are ancestors or descendants of particular selected records. Queries that require movement up and down the data-base tree can usually be handled in one command. Because of these features, general selection languages are more versatile than tree traversal languages, which traverse the data-base trees in a specific linear manner.

IMPLEMENTATION

To represent and then implement a hierarchical data base, we must first be able to specify the connections between the records, i.e., how the records are related to one another. One way of implementing a hierarchical data base is by the use of pointers. Essentially, every relationship between two record types in the hierarchical definition tree has a set of pointers associated with it. The pointers implement the connections between the parent and child record occurrences. For instance, for the data-base tree outlined in Figure 4, both a forward and a backward pointer can be associated with every branch of the tree.

While this organization is simple to visualize, it has many drawbacks. A great deal of space is taken up by pointers. The amount of space required for the pointers associated with a parent record varies as children are inserted and deleted. It would facilitate storage allocation if this space requirement were fixed. To this end, some systems limit the number of children of a given type that a parent may have [8]. If upward or downward hierarchical normalization of more than one level is adopted, much pointer chasing becomes necessary.

To reduce the space occupied by pointers, the connections can be implemented in other, slightly different ways. As an example of a way of avoiding many forward


and backward pointers between parent and children, only one forward pointer plus one brother pointer can be used (Figure 8) [16]. In such a case, each record has a fixed amount of pointer overhead. This organization saves space, but now even more pointer chasing is involved. Whereas in the previous organization any child of a parent can be accessed by following one pointer, this organization may demand that several pointers be followed.

FIGURE 8. Hierarchical pointer implementation. (Legend: PRESIDENT, ELECTION, ADMINISTRATION, and STATE records.)

Instead of deploying explicit pointers, the physical contiguity of records can imply a connection. For example, the GET NEXT command in IMS processes the data-base tree in a sequential manner. The records can be allocated to reflect this sequential nature. Such allocation saves space and increases access speed for the GET NEXT type of processing, because, depending on the buffering characteristics of the operating system, the next record occurrence is often already in the buffer. Hence, with sequential allocation, no additional secondary storage access is necessary in this case. However, data-base reorganization problems arise when a record is inserted or deleted. Certain overhead costs are also incurred by skipping groups of records when only a certain type of record is wanted. In addition to these problems, changes to the schema, which are difficult in any organization, are particularly difficult in this approach.

Pointers can be combined with physical contiguity to implement a hierarchical data base. This approach results in many special-purpose access methods for storing hierarchical data bases. We will outline, as an example, the approach taken by IMS.

IMS has two basic storage organizations: hierarchical sequential and hierarchical direct [14]. Within each of these organizations, the access method may optionally use an indexed organization. The indexed organization is similar to B-trees [3]. In the simplest form of indexed organization, the records are ordered by a key data-item value and divided into blocks. Each block contains a fixed number of records. A directory indicates the highest key value stored in each block. In a more general organization, there are several levels of directories. In this manner, the block in which a record is stored can be quickly determined from its key value. The block is then scanned sequentially to find the desired record.

In the hierarchical sequential organization, records are related by physical adjacency. The Hierarchical Sequential Access Method (HSAM) organization stores all records in physically adjacent storage locations in hierarchical (preorder) sequence. All records must be of fixed length. To modify an HSAM data base, the entire data base must be reloaded.

The Hierarchical Indexed Sequential Access Method (HISAM) organization is used for indexed access to a data-base tree. Each data-base tree is stored in physically contiguous locations in hierarchical sequence. The root record type must contain a key data item. The key data item is used to index each data-base tree. The storage area is divided into a primary and an overflow area. The primary area stores, in hierarchical sequence, a fixed-size part of a data-base tree consisting of the root record and as many records of the data-base tree as can be accommodated. Additional records of the data-base tree are stored, in hierarchical sequence, in the overflow area. A direct address pointer relates a data-base tree in the primary area to its extension in the overflow area.

In the hierarchical direct organization, records are related by pointers. Two pointer organizations are possible: hierarchical and physical child/physical twin. Hierarchical pointers relate records in hierarchical sequence. Physical child/physical twin pointers relate all records of a given type under a parent record to each other, and the parent to its first child. Both pointer organizations may optionally have backward pointers. It is possible to specify any combination of pointer organizations within a data base and different organizations for different record types.

The pointers in the hierarchical direct organization are stored with each record. In this case, the record consists of two parts: a prefix and a data part. The data part contains the record data as supplied by the user. The prefix, which is system controlled and not available to an application, contains system data and the pointers. The system data consists of a record code, a delete flag, and a counter. The record code identifies the record type, and the delete flag indicates whether the record has been deleted. The counter is optional and is only present if the record type participates in a logical relationship [14]. A simple prefix consisting of the record code and the delete flag is stored with every record in a multiple record type HISAM data base.

Within the hierarchical direct organization, the Hierarchical Direct Access Method (HDAM) organization is used to access root records via a hash algorithm. Records are hashed into a primary storage area called the root segment addressable area. The root record type must contain a key data item, and the hash is performed on this key data item. A fixed portion of a data-base tree, including the root record, is stored in the root segment addressable area. Additional records of a data-base tree are stored in an overflow area. A direct address pointer relates the records of a data-base tree in the root segment addressable area to their extension in the overflow area.

The Hierarchical Indexed Direct Access Method (HIDAM) organization provides indexed direct access to records in a data base. The index in this organization is a sequential file called INDEX. Each record in the INDEX file contains the key data-item value of a root record and a pointer to the root record in the data base. Root records are accessed by searching for the key data-item value in the INDEX and then following the pointer to the data base. Records in the data base are related by pointers, as discussed previously.

To summarize, a successful hierarchical implementation can utilize contiguity and pointers with the following objectives:

• it is important to have efficient retrieval by eliminating costly pointer chasing and the resulting secondary storage accesses;
• the space required both for the data and the pointers should be minimized; and
• costly reorganizations should be avoided by providing a flexible environment that allows easy expansion of the data base.

There are other ways of implementing a hierarchical data base. For example, a method can be used to assign a logical address to a record in a data-base tree. This logical address can then be mapped into a physical one. A logical address to a record in a data-base tree is called a trace [16]. An example of assigning traces to records in a data-base tree will be outlined.

Each record type in a hierarchical definition tree can be identified by a type-number as indicated in Figure 9(a). Any record occurrence in a data-base tree can then be identified by its type-number and a generation tuple. The generation tuple defines a path, in the data-base tree, which leads to the record occurrence.

For example, suppose that Figure 9(b) corresponds to the third data-base tree in a hierarchical data base. The root record of this tree is identified by the trace 1(3). The number 1 is the type-number of the record type. The number 3 indicates that this is the third record of type 1. The first type-2 child of this root is assigned the trace 2(3, 1); that is, it is the first type-2 child under the third root record. The first type-4 descendant under the first type-3 child of the third root record has as its trace 4(3, 1, 1). The first number identifies the record type. The numbers in parentheses define a path to a particular record occurrence. Using a type-number and a generation tuple, any record occurrence in the data base can be uniquely addressed by the path to it. In this representation, all traces are valid provided the bounds of the hierarchical definition tree are adhered to, i.e., type-numbers and levels are respected. However, traces may correspond to records which have not yet been inserted.

FIGURE 9. Assigning traces to records. (a) Type-numbers of the record types; (b) traces of record occurrences, e.g., 4(3, 1, 1).

Given a trace, there is a straightforward algorithm to obtain the traces of ancestors, descendants, and brothers. The rules that govern our example representation of traces are:

1) Ancestor trace: drop the digits in positions greater than the ancestor's level (that is, current level minus ancestor level digits) from the end of the generation tuple and change the type-number.
2) Descendant traces: add (descendant level minus current level) digits in the next generation-tuple places and change the type-number.
3) Next brother trace: add one to the last digit in the generation tuple.
4) Previous brother trace: subtract one from the last digit in the generation

then we have a nice implementation of the hierarchical data structure [4].

CONCLUDING REMARKS

To summarize the essential points made by this survey, a hierarchical system is a DBMS which presents to the users of the system certain explicit views of the data base that are characteristic of the hierarchical data model. The hierarchical data model has the following characteristics:

1) There is a set of record types {R1, R2, ..., Rn}.
2) There is a set of relationships connecting all record types in one data structure diagram.
3) There is no more than one relationship between any two record types Ri and Rj. Hence, relationships need not be labeled,
tuple (if digit ~ 1). 4) The relationships expressed in the
data structure diagram form a tree
Sometimes, valid traces are restricted by with all arcs pointing toward the
specifying some bounds such as maximum leaves.
number of levels in the hierarchical defini- 5) Each relationship is I : N , and it is
tion tree, maximum number of children at total--that is, for every R j record
each level, etc. occurrence there is exactly one R i
Traces for record occurrences already in record occurrence connected to iS, if
the data base are kept in a trace table. A R i is the parent of R j in the defini-
trace table gives a mapping between a trace tion tree.
and the location where the record occurrence
Hierarchical systems deal with relation-
is stored. This mechanism captures all the
ships among attributes (~f the same entity in
structure of a hierarchical data base. Given a manner similar to relational and network
the trace of a record occurrence, one can systems discussed in the companion papers
find its ancestors, the record occurrence in this issue. All of the systems organize the
itself, its brothers, and its descendants. If attributes as items in groups: Their main
we can efficiently implement the trace table, difference is found in the way they treat re-

Compu~ngSurve~a, Vol. 81 No, |. Match ~976


: i
.... ..~-::. ! ~i~~
122 • D. C. Tsichrit~is and F. H. Lochovsky

lationships between entities. Relational sys- COMPANY and child record type REGIS-
tems provide operations on relations which TRATION.
construct new relations representing relation- Note that the N : M relationship could be
ships among entities. As a result, relational represented in a single hierarchical defini-
systems do not use any additional concepts tion tree with one record type as the root
to the relationships among entities. The sys- and the other as the child. However, data
tem is homogenous in that all relationships duplication is again required: for example,
are represented by relations. Both hier- if the root is STATE, then the COMPANY
archical and network systems use the idea record occurrences would have to be re-
of a link or connection between record types peated under each state in which the com-
to handle relationships between entities. In panies are registered. If the amount of data
network systems, the data structure . sets associated with each company is quite large,
serve as the logical links between record then a great deal of storage space is required
types in a network. In the case of hierarchi- for duplication. It should be noted, however,
cal systems, the parent-child relationships that some existing hierarchical systems have
represented by the hierarchical definition facilities which essentially eliminate this
tree are the links. problem: they implement N : M relation-
Hierarchical systems have been available ships by using logical pointers among hier-
and well accepted for a long time [5, 6, 7, 9, archies. They allow logical hierarchical views
10, 11, 14, 17, 20]. It is difficult to relate the that are different from the physically imple-
success of a particular system to its data mented hierarchical structures [ 9, 14, 17].
model. There are many other parameters Hierarchical systems also handle con-
which influence the quality of a commercial ceptually symmetric queries in a very differ-
system. However, for some applications a ent manner. For example, consider the rela-
hierarchical data model seems very natural, tionship SERVED in Figure 1. If PRESI-
e.g., a corporate management structure is D E N T is the root, then a query such as:
truly hierarchical. In addition, most applica- "Find all congresses in which President P
tions can be modeled by a hierarchical or- served," is simple to answer. For President
ganization of the data, although some appli- P, all CONGRESS descendants are found.
cations produce more difficulty and redund- However, the symmetric query: "Find all
ancy than others. presidents who served in Congress C," in-
A hierarchical data model provides no volves a data base search of CONGRESS
means for implementing direct N : M rela- records. For every President, it is necessary
tionships between record types. Such a rela- to determine if he served in Congress C.
tionship can only be effected within record Sometimes, content addressibility, e.g., in-
types. However, most hierarchical systems verted files, may be used to speed the search.
do provide the ability to handle many hier- In addition, if two definition trees, one with
archical definition trees. By using data the root CONGRESS and one with the root
duplication, one can represent an N : M re- P R E S I D E N T , are used, then the problem
lationship by two hierarchical definition disappears.
trees, each representing a I : N relationship. Some specific advantages are widely ac-
For instance, consider the record types cepted for the hierarchical approach:
STATE and COMPANY and the N : M re-
lationship "registration" between them. • It is a simple data model which pro-
vides the user with relatively few,
That is, a company may be registered in
many states and a state may have many easy to master, commands.
companies registered in it. The N : M rela- • Because of the constraints on the
tionship beween STATE and COMPANY types of relationships allowed, it can
can be handled by using two hierarchical allow an easier implementation than
definition trees, one with root record type other, more complex structures.
STATE and child record type REGISTRA- Some specific disadvantages are also
TION, the other with root record type associated with hierarchical systems:

Computing Surveya~ Vol. 8. No. 1, March 1976


• The restrictions imposed sometimes force an unnatural organization of the data. For instance, as shown previously, N:M relationships can sometimes be represented only in a clumsy way.
• Because of the strict hierarchical ordering, operations such as insertion and deletion become quite complex.
• A delete operation can lead to the loss of information present in the descendants if null records are not permitted. Consequently, users have to be careful when performing a delete operation.
• It is sometimes not possible to answer symmetrical queries easily in a hierarchical system. Therefore, the structure of the data base may tend to reflect the needs of the application.

Sometimes criticism is concentrated on the nature of the hierarchical commands, which are claimed to be too procedural. However, this criticism can be answered, as we have seen in the section General Selection (page 116), by showing that higher level interfaces can be implemented. In this way, even a casual user interface can be easily accommodated.

REFERENCES

[1] ABRIAL, J. R., "Data semantics," in Data base management, Klimbie, J. W., and Koffeman, K. L., [Eds.], North-Holland Publ. Co., Amsterdam, The Netherlands, 1974, pp. 1-59.
[2] BACHMAN, C. W., "Data structure diagrams," Data Base 1, 2 (1969), 4-10.
[3] BAYER, R.; AND McCREIGHT, E., "Organization and maintenance of large ordered indexes," Acta Informatica 1, 3 (1972), 173-189.
[4] BERNSTEIN, P. A.; AND TSICHRITZIS, D. C., "Allocating storage in hierarchical data bases," Technical Report CSRG-34, Computer Systems Research Group, Univ. of Toronto, May 1974 (to appear in Information Systems Journal).
[5] BLEIER, R. E., "Treating hierarchical data structures in the SDC time-shared data management system (TDMS)," in Proc. ACM National Conf. 1967, ACM, New York, 1967, pp. 41-49.
[6] BLEIER, R. E.; AND VORHAUS, A. H., "File organization in the SDC time-shared data management system (TDMS)," in Proc. IFIP Congress 1968, Vol. 2, North-Holland Publ. Co., Amsterdam, The Netherlands, 1968, pp. 1245-1252.
[7] CONTROL DATA CORP., "MARS VI multi-access retrieval system reference manual," 44625500, 1970.
[8] CII, "SOCRATE manuel de presentation," Reference document 4337 P/FR, Louveciennes, France.
[9] DATAPRO RESEARCH CORP., "SYSTEM 2000--MRI Systems Corporation," Datapro 70, (April 1972).
[10] EVERETT, G. D.; DISSLY, C. W.; AND HARDGRAVE, W. T., "Remote file management system (RFMS) users manual," TRM-16, Computation Center, Univ. of Texas at Austin, Texas, August 1971.
[11] FRANKS, E. W., "A data management system for time-shared file processing using a cross-index file and self-defining entries," in Proc. AFIPS, Spring Jt. Computer Conf., 1966, Vol. 28, Spartan Books, New York, 1966, pp. 79-86.
[12] HARDGRAVE, W. T., "Theoretical aspects of Boolean operations on tree structures and implications for generalized data management," TSN-26, Computation Center, Univ. of Texas at Austin, Texas, August 1972.
[13] HARDGRAVE, W. T., "BOLTS: a retrieval language for tree-structured data base systems," Information Systems, COINS-IV, Plenum Press, New York, 1974.
[14] IBM, INFORMATION MANAGEMENT SYSTEM/VIRTUAL STORAGE (IMS/VS) PUBLICATIONS 1975: General information manual, GH20-1260-3. System/application design guide, SH20-9025-2. Application programming reference manual, SH20-9026-2. System programming reference manual, SH20-9027-2. Operator's reference manual, SH20-9028-1. Utilities reference manual, SH20-9029-2. Messages and codes reference manual, SH20-9030-2.
[15] KNUTH, D. E., The art of computer programming, Vol. 1, Addison-Wesley Publ. Co., Reading, Mass., 1968, p. 316.
[16] LOWENTHAL, E. I., "A functional approach to the design of storage structures for generalized data management systems," PhD Thesis, Univ. of Texas at Austin, Texas, 1971.
[17] MRI SYSTEMS CORP., "SYSTEM 2000 general information manual," Austin, Texas, 1972.
[18] PARSONS, R. G.; DALE, A. G.; AND YURKANAN, C. V., "Data manipulation language requirements for database management systems," Computer J. 17, 2 (May 1974), 99-103.
[19] SCHMID, H. A.; AND SWENSON, J. R., "On the semantics of the relational data model," in Proc. ACM SIGMOD, Internatl. Conf. on Management of Data, 1975, ACM, New York, 1975, pp. 211-223.
[20] UNITED COMPUTING SYSTEMS, INC., UCS-VI UNIDATA data management system reference manual, Kansas City, Missouri, 1970.
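The trace rules given earlier in the hierarchical survey (type-number plus generation tuple, and the four rules for ancestor, descendant, and brother traces) can be made concrete with a short sketch. This is an illustration only, not the authors' implementation. It assumes a definition tree matching the Figure 9 example (type 1 is the root, types 2 and 3 are its children, type 4 is a child of type 3), represents a trace such as 4 (3, 1, 1) as the Python tuple `(4, (3, 1, 1))`, and fills the trace table with invented storage addresses. The names `PARENT`, `ancestor`, `descendant`, `next_brother`, `previous_brother`, and `trace_table` are hypothetical.

```python
from typing import Dict, Optional, Tuple

# Hypothetical definition tree chosen to match the Figure 9 example:
# type 1 is the root, types 2 and 3 are children of type 1, type 4 is a child of type 3.
PARENT: Dict[int, Optional[int]] = {1: None, 2: 1, 3: 1, 4: 3}

# A trace is (type-number, generation tuple); 4 (3, 1, 1) becomes (4, (3, 1, 1)).
Trace = Tuple[int, Tuple[int, ...]]


def level(type_number: int) -> int:
    """Level of a record type in the definition tree; the root type is at level 1."""
    depth = 1
    parent = PARENT[type_number]
    while parent is not None:
        depth += 1
        parent = PARENT[parent]
    return depth


def is_ancestor_type(anc: int, t: int) -> bool:
    """True if record type `anc` lies strictly above record type `t` in the tree."""
    parent = PARENT[t]
    while parent is not None:
        if parent == anc:
            return True
        parent = PARENT[parent]
    return False


def ancestor(trace: Trace, ancestor_type: int) -> Trace:
    """Rule 1: drop the trailing digits beyond the ancestor's level, change the type-number."""
    t, gen = trace
    if not is_ancestor_type(ancestor_type, t):
        raise ValueError(f"type {ancestor_type} is not an ancestor of type {t}")
    return (ancestor_type, gen[: level(ancestor_type)])


def descendant(trace: Trace, descendant_type: int, ordinals: Tuple[int, ...]) -> Trace:
    """Rule 2: append (descendant level - current level) digits, change the type-number."""
    t, gen = trace
    if not is_ancestor_type(t, descendant_type):
        raise ValueError(f"type {descendant_type} is not a descendant of type {t}")
    if len(ordinals) != level(descendant_type) - level(t):
        raise ValueError("exactly one ordinal is needed per level descended")
    return (descendant_type, gen + ordinals)


def next_brother(trace: Trace) -> Trace:
    """Rule 3: add one to the last digit of the generation tuple."""
    t, gen = trace
    return (t, gen[:-1] + (gen[-1] + 1,))


def previous_brother(trace: Trace) -> Optional[Trace]:
    """Rule 4: subtract one from the last digit; there is no previous brother if it is 1."""
    t, gen = trace
    if gen[-1] == 1:
        return None
    return (t, gen[:-1] + (gen[-1] - 1,))


# Trace table: traces of records already inserted -> storage location.
# The addresses are made-up placeholders, not real IMS or disk addresses.
trace_table: Dict[Trace, int] = {
    (1, (3,)): 1000,
    (2, (3, 1)): 1040,
    (3, (3, 1)): 1080,
    (4, (3, 1, 1)): 1120,
}

rec: Trace = (4, (3, 1, 1))
print(ancestor(rec, 1), trace_table[ancestor(rec, 1)])  # prints (1, (3,)) 1000
print(next_brother(rec), next_brother(rec) in trace_table)  # prints (4, (3, 1, 2)) False
```

The definition tree is kept separate from the traces because a generation tuple records only ordinals; the record types along the path, which the ancestor and descendant rules need in order to change the type-number, come from the tree. The last line also shows the point made in the text that a trace can be valid without yet corresponding to an inserted record.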


Evolution of Data-Base Management Systems*
JAMES P. FRY
Graduate School of Business Administration, University of Michigan, Ann Arbor, Michigan 48109

EDGAR H. SIBLEY
Department of Information Systems Management, University of Maryland, College Park, Maryland 20742,
and National Bureau of Standards, Washington, D.C. 20285

This paper deals with the history and definitions common to data-base technology. It delimits the objectives of data-base management systems, discusses important concepts, and defines terminology for use by other papers in this issue; it also traces the development of data-base systems methodology, gives a uniform example, and presents some trends and issues.
Keywords and Phrases: data base, data-base management, data definition, data manipulation, generalized processing, data model, data independence, distributed data base, data-base machines, data dictionary
CR Categories: 3.51, 4.33, 4.34

* This work is sponsored in part by the National Science Foundation Grant GJ 41831.
¹ Editor's Note: See page 35 for the key to the classification system used for references cited in this paper.
Copyright © 1976, Association for Computing Machinery, Inc. General permission to republish, but not for profit, all or part of this material is granted provided that ACM's copyright notice is given and that reference is made to the publication, to its date of issue, and to the fact that reprinting privileges were granted by permission of the Association for Computing Machinery.
Computing Surveys, Vol. 8, No. 1, March 1976

CONTENTS

1. GENERALIZED PROCESSING
2. OBJECTIVES OF DATA-BASE MANAGEMENT
   Data Availability
   Data Quality
   Privacy and Security
   Management Control
   Data Independence
3. FUNDAMENTAL CONCEPTS AND DEFINITIONS
   Elements of Logical Structure
   Introduction to Data Models
   Mapping from the Logical to the Physical Structure
4. HISTORICAL PERSPECTIVE
   Evolution of Data Definition Languages
   Centralized Data Definition: Fifties and Sixties
   Stored-Data Definition Languages: 1970's
   Development of Report Generator Systems
   Hanford/RPG Family
5. DEVELOPMENT OF DBMS
   Early Developments: Prior to 1964
   Establishment of Families: 1964-1968
   Postley/Mark IV Family
   Bachman/IDS Family
   Formatted File/GIS Family
   Vendor/CODASYL Developments: 1968 to the Present
   CODASYL/DBTG Family
   IMS Family
   Inverted File Family
   Additional Vendor Developments
6. THE PRESIDENTIAL DATA BASE EXAMPLE
7. TRENDS AND ISSUES
   Ad Hoc versus Programming Systems
   Geographically Distributed Systems
   Data-Base Machines
   To Standardize or Not?
ACKNOWLEDGMENT
CLASSIFICATION OF REFERENCES
REFERENCES
AVAILABILITY OF REFERENCES

1. GENERALIZED PROCESSING

A data-base management system (DBMS) is a generalized tool for manipulating large data bases; it is made available through special software for the interrogation, maintenance, and analysis of data. Its interfaces generally provide a broad range of languages to aid all users, from clerk to data administrator.

DBMS technology can be traced back to the late fifties, when authors such as McGee [G1 and G2]¹ discussed the success of "generalized" routines. These routines were capable of sorting any file regardless of its data content (the user merely supplying parameters to direct the major elements of the sorting process); those authors then proposed that these ideas be extended into other data-processing areas, such as file maintenance and report generation. This generalized processing entails the building of special data functions which perform frequently used, common, and repetitive data-processing tasks. But such generality cannot be accomplished without cost. The price of generalized processing is a reduction in operating efficiency, often through interpretive processing, or a necessary increase in resources such as hardware capacity. The success of generalized processing (and consequently of generalized data-base technology) thus becomes an issue of cost tradeoff.

Hardware improvements developed over the past two decades have effected significant decreases in the price/performance ratio, thereby tending to offset operational inefficiency and to emphasize the cost of application and software development. The benefits of a generalized approach can thus be summarized as the elimination of program duplication (frequently found in computing systems) and the amortization of the one-time development costs over many applications of the program.

In cases where a particular data-processing application cannot be parameterized, the usual recourse is to develop high-level languages, which are themselves a form of parameterized generalized processing, albeit with very special parameters. For example, the development of high-level interrogation languages for ad hoc requests has broadened user access to data by providing a simple and, it is hoped, easy-to-use interface. Such an approach allows the inquirer to use a language similar to everyday English, rather than requiring him to write a program in an artificial language. Generalized data-processing techniques have evolved into a class of sophisticated, generalized software systems, one of which is the data-base management system. The reader should carefully distinguish between the terms DBMS and "data management." The latter has been used by the government to designate an administrative function, by some hardware vendors to designate their access methods, and by some software vendors to designate and embellish comprehensive packaged systems.

2. OBJECTIVES OF DATA-BASE MANAGEMENT

The Guest Editor's Introduction to this issue of COMPUTING SURVEYS discussed the concepts of data-base technology and introduced some of its objectives:

• to make an integrated collection of data available to a wide variety of users;
• to provide for quality and integrity of the data;
• to ensure retention of privacy through security measures within the system; and
• to allow centralized control of the data base, which is necessary for efficient data administration.

To this we add the objective of "data independence," a term to be defined later [see page 12] in this paper. This section will deal with each of the stated objectives, relating them to the overall functional architecture of the DBMS.

While various "views of data" (the principal topic of this issue of COMPUTING SURVEYS) are important to the user interface, the requirements for quality, integrity, security, and control have far-reaching effects on the overall cost, accessibility, and performance of the system. Although it is possible to add functional capabilities to an existing system, the cost of retrofitting is often prohibitive, and the post-design addition may adversely affect system performance. Although quality, security, and control factors are given relatively scant treatment in other papers in this issue of SURVEYS, it should not be inferred that these are unimportant. In fact, how well or how poorly these needs are satisfied may make or break a working system.

Data Availability

Everest [G12] states that the major objective of a DBMS is to make data sharing possible. This implies that the data base as well as programs, processes, and simulation models are available to a wide range of users, from the chief executive to the foreman (Everest and Sibley [GS]). Such sharing of data reduces its average cost because the community pays for the data, while individual users pay only for their share. However, under these circumstances the data cannot "belong" to any individual, program, or department; rather, it belongs to the organization as a whole.

What, then, is the overall cost of data? One way to answer this question is by observing data entry. Keypunching and verifying, or other types of data entry involving human keystroking, tend to cost about 50¢ per thousand characters input. Thus, if the average-sized file is two million characters (a figure representative of much of today's industry and government), it costs $1000 to input each average-sized file. Under certain conditions the cost of collecting data could be substantially higher, e.g., when the data must be collected by telemetry, or in long and complicated experiments.

Another expense is associated with the lack of data, the so-called "lost opportunity cost." If data is not available when an important decision is to be made, or if duplicate but irreconcilable data exists, an ad hoc and possibly wrong decision results. Nolan [A4] gives a scenario of a typical business where a manager knew that data existed, but some of it had been produced on a different machine and some had incompatible formats (different structures on different tapes). Moreover, none of the data definitions were easily available. The manager who needed the data for important predictions was unable to obtain answers in a reasonable amount of time.

There are two important mechanisms for making data available: the "data definition" and the "data dictionary." A data definition is a more sophisticated version of a DATA DIVISION in COBOL, or a FORMAT statement in FORTRAN; however, a data definition is supplied outside the user program or query and must be attached to it in some way. The data definition (as specified by a data administrator) generally consists of a statement of the names of elements, their properties (such as character or numerical type), and their relationship to other elements (including complex groupings) which make up the data base. The data definition of a specific data base is often called a schema.

When the data definition function is centralized (which is necessary to achieve the objectives of DBMS), control of the data-base schema is shifted from the programmer to the data administrator [A1]. The programmer or the ad hoc user of a query language is no longer able to control many of the physical and logical relationships. While this restricts the programmer to some extent, it means that all programs use the same definition; thus any new program can retrieve or update data as easily as any other. Furthermore, greater data definition capabilities are provided, the storage and retrieval mechanisms are hidden from the program, the formats cannot be lost, and the programmer's task is simpler.

Centralized data definition facilitates the control of data duplication, which generally entails some storage inefficiency. However, not all duplication of data is bad; a controlled duplication may be necessary to allow special classes of users to obtain especially fast responses without penalizing quality for other users.

The data definition facility is inherent to all DBMS. Without it, the data base is owned by its programs, difficult to share,
10 * James P. Fry and Edgar H. Sibley

and generally impossible to control. This, elements, and the data validation state-
then, is the cornerstone of data-base man- ments can be used to!generate procedures for
agement systems. input editing or other quality checking.
Whereas the data definition facility is the The data dictionary is extremely impor-
data administrator's control point, the data tant as part of the DBMS security mecha-
dictionary [D1] provides the means of broad- nism. If an adversary knows you are gather-
casting definitions to the user community. ing data, that adversary has already violated
The data dictionary is the version of the your security. For this reason, the data dic-
data definition that is readable by humans. tionary should be as secure as the DBMS.
It provides a narrative explanation of the Furthermore, if security requirements are re-
meaning of the data name, its format, etc., tained in the dictionary they can be auto-
thus giving the user a precise definition of maticaliy checked (and special procedures
terms; e.g., the name TODAYS-DATE may can be invoked) every time a data defini-
! b e defined narratively and stated to be tion is produced for the DBMS. This would
stored in ANSI standard format as Year: improve security monitoring.
Month: Day.
Within the past five years a number of Data Quality
data dictionary packages have appeared on
the market [D2]. Some of these are an in- Perhaps the most neglected objective of
tegral part of the data definition function, DBMS is the maintenance of quality. Prob-
while others provide an interface to multiple lems relating to the quality of data and the
DBMS, and still others are stand-alone integrity of systems and data go hand-in-
packages. hand. Data may have poor quality because
The dictionary will normally perform it was:
some, if not all, of the following functions: • never any good (GIGO--garbage in,
storage of the definition, response to inter- garbage out);
rogation, generation of data definition for • altered by human error;
the DMBS, maintenance of statistics on use, • altered by a program with a bug;
generation of procedures for data validation, • altered by ~ machine error; or
and aid in security enforcement. Obviously, • destroyed by a major catastrophe
storage of the data definitions in the dic- (e.g., a mechanical failure of a disk).
tionary is obligatory. Maintenance of quality involves the de-
The dictionary wiil normally be able to tection of error, determination of how the
either provide formatted dictionaries (on error occurred (with preventive action to
request) or respond to a simple query for a avoid repetition of the error), and correction
data entry, or to do both. This facility al- of the erroneous data. These operations en-
lows ad hoe users to browse through the tail precautionary measures and additional
definitions (on- or off-line) to determine cor- software functions within the data-base
rect data names. management system. The prevention and
In some dictionary systems, especially correction of the five listed causes of error
those that augment a DBMS, the data ad- will now be briefly discussed.
ministrator can invoke a data definition gen- In dealing with normal data-processing
erator. This allows the administrator to pick applications, the programmer is faced with a
names of elements from the dictionary, great deal of input validation. A survey by
group them, and then produce a new data the authors showed that about 40 % of the
definition. P R O C E D U R E divisions of present-day in-
The dictionary may be both a collector dustrial COBOL programs consists of error-
for, and a repository of statistics on DBMS checking statements. If the validation re-
usage. These statistics can be utilized to im- quirements can be defined at data definition
prove the efficiency of the DBMS by re- time, then error checks may be applied auto-
grouping elements for better accessing. matically by the system at input, update,
The dictionary may contain information manipulation, or output of data, depending
on techniques for validation of particular on the needs specified by the data adminis-

Computing Surveys, Vol. 8, No. I, March 1976


i
Evolution of Data-Base Management Sy~tern~ " i 11
! i
trator. Many current DBMS allow valida- generally involves t h e u s e of the audit trail.
tion. Some have a check mechanism which In modern operating systems a restart facil-
ensures that the values conform to the ity is often prOvided. Normally, in order to re-
stated PICTURE (like COBOL); they also start the DBMS after a failure which does
check that the value is within the defined not involve physical damage to the storage
range, or that it is one of a predefined set. If devices, a "checkpoint" facility is used. A
a system is to support such techniques it checkpoint is a snapshot of the entire machine
must have special clauses in the data defini- condition (CPU, memory, etc.) recorded on
tion language (DDL), as well as a series of the log. This entry presents a known condi-
procedures to be invoked on error detection. tion of the entire system.
A second cause of poor data is human or A checkpoint may be either taken by the
program error. Little can be done to prevent computer operator or automatically initiated
such errors unless they contravene some by the DBMS. Usually the latter method is
validation rule, and their discovery nor- triggered by a procedure which keeps count
mally involves human knowledge. The cause, of the number of transactions processed and
however, may be detected by referring to then initiates the checkpoint when a prede-
the "audit trail." An audit trail is a log in fined value is exceeded, The problem with
some journal of all changes made to the data such facilities is that they often need a
base. When a change is to be made, there are quiescent system, i.e., one in which the trans-
two important objects: the original data and actions are being held in queues or have been
the changed data. When logged, these ob- completed. This "freeze" operation may
jects are termed "before" and "after" im- take some time. Unstarted procedures are
ages. Generally, these images contain the held until the checkpoint process has been
data, time, and name of the procedure caus- completed, causing a delay which can lead
ing the change. They may also record the to dissatisfied users.
name of the person who initiates the pro- After any major error it is possible to
cedure. A quality audit is an attempt to de- back-up to the latest checkpoint on the log
termine, through examination of the before and then move forward along the log, re-
and after images, who or what procedure placing updated (after) images of completed
changed the data value. A quality audit may transactions or reinitiating unfinished trans-
find that some user promulgates many errors, actions. Recovery can be a complicated
whereupon the data administrator may re- process, and many current data-base man-
quest that the user take more care (or a agement systems rely substantially on the
course in better techniques). If, however, the capabilities of the underlying operating
error appears to have been generated by system to perform the function.
some operational program, a programmer Sometimes a major storage failure (e.g., a
may be called in to debug it. disk crash) requires replacement of hard-
Sometimes an error will be detected after ware and total reloading of the (possibly very
a procedure is partially completed. In this large) data base. It is not unusual to find
case, as well as when a user makes a mistake, commercial and governmental data bases
it is often necessary to "back-out" the pro- with over one billion characters. A sequen-
cedure or "back-up" the data base. This is a tial load of such a large data base may take
process of reinstating the data that has been two to six hours on present-day computers.
incorrectly updated. Many data-base man- The reload is from a data-base dump, that is,
agement systems provide an automatic from a copy taken at some time in the past
facility for reinstatement, achieved by
(assuming possible failure of the original). A
reading the before images from the audit
trail and replacing any updated data with data-base dump only represents the status
its prechange value. of the data base at a certain time, and any
Poor quality data can also be generated updating performed subsequent to that
by an unpredicted disaster. The process of time must be replicated by using the log.
recovering from a permanent hardware Many current systems use such techniques,
failure, or of restarting after a minor error although some still rely on reinitiating and

Computing Surveys, Vol. 8, No. 1, March 1976


12 • James P. Fry and Edgar H. Sibley

reprocessing the logged update transactions. This procedure tends to be very slow.

The quality and integrity of data depend on input-validation techniques in the original data definition, logging of data-base changes, periodic snapshots of the entire machine status, and total or incremental data-base dumping. These operations require additional software in the data-base management system, both for initiation of the protective feature and for its utilization to reconstitute a good data base. However, they entail an overhead expense which adds to the normal running cost.

Privacy and Security

The third major objective of data-base management systems is privacy--the need to protect the data base from inadvertent access or unauthorized disclosure. Privacy is generally achieved through some security mechanism, such as passwords or privacy keys. However, problems worsen when control of the system is decentralized, e.g., in distributed data bases, where the flow of data may overstep local jurisdictions or cross state lines.

Who has the responsibility for the privacy of transmitted data? When data requested by someone with a "need to know" is put into a nonsecure data base and subsequently disseminated, privacy has been violated. One solution to this problem is to pass the privacy requirements along with the data, which is an expensive but necessary addition. The receiving system must then retain and enforce the original privacy requirements.

Security audits, another application of the audit trail, are achieved by logging access (by people and programs) to any secure information. These mechanisms allow a security officer to determine who has been accessing what data under what conditions, thereby monitoring possible leakage and preventing any threat to privacy. Much of this technology is, however, still in its infancy.

Management Control

The need for management control is central to the objectives of data-base management. It includes the establishment of the data administration function and the design of effective data bases. Data administration currently uses primitive tools; a discussion of them would be beyond the scope of this paper (see [A1, 2, and 3]). However, it is important to note that data-base design involves tradeoffs, because users may have quite incompatible requirements. As an example, one group may require very rapid response to ad hoc requests, while another requires long and complicated updating with good security and quality control of the data. The implementation of a system responsive to the first need may suggest a storage technique quite different from that needed by the second. The only way to resolve such a conflict is to determine which user has the major need. If the requirements are equally important, a duplicate data base may be necessary--one for each class of user.

Although the installation of a data-base management system is an important step toward effective management control, today's data administrator faces a challenge: the available tools are simplistic and seldom highly effective. They involve simulation, data gathering, and selection techniques. Some new analytical methods appear promising [G3]. These methods select the "best" among several options of storage techniques, but they are usually associated with one particular DBMS rather than with several.

Data Independence

Many definitions have been offered for the term data independence, and the reader should be aware that it is often used ambiguously to define two different concepts. But first, we must define other terms. A physical structure² describes the way data values are stored within the system. Thus pointers, character representation, floating-point and integer representation, ones- or

² The terms data structure and storage structure, which were promulgated by the CODASYL Systems Committee [U2], can be attributed to D'Imperio [DL2]. However, in computer science, the term data structure is more closely associated with physical implementation techniques such as linked lists, stacks, ring structures, etc. To prevent ambiguity we opt for the more basic terms, logical and physical structure.



Evolution of Data-Base Management Systems • 13

twos-complement, or sign-magnitude representation of negative integers, record blocking, and access method are all things associated with the physical structure. The term logical structure describes the user's view of data. Thus, a COBOL DATA DIVISION is (mainly) a statement of logical structure; it deals with named elements and their relationships rather than with the physical implementation. A record in a COBOL program is manipulated without knowledge of the computer hardware or its access method.

As an example, the data item named AUTHOR may have values FRY, SIBLEY, FRANK, TAYLOR, CHAMBERLIN, etc. Whereas the name AUTHOR is a logical phenomenon, the representation of authors is a physical phenomenon.

In the early days of DBMS, the term "physical data independence" was used. A system was said to be (physically) data independent if it could deal with different physical structures and/or different access methods, and if the user of the data base did not have to provide information on details of the structure. Thus a definition of physical data independence is:

A system is data independent if the program or ad hoc requests are relatively independent of the storage or access methods.

Systems with physical data independence provide a discrete number of choices for implementing the physical storage of data. Other systems also allow the user to make requests with little knowledge of the logical structure of the data. Such systems, which are said to have logical data independence, may operate correctly even though the logical structure is, within reason, altered. A definition of logical data independence is:

The ability to make logical change to the data base without significantly affecting the programs which access it.

Logical data independence has two important aspects: first, the capability of a data-base management system to support various (system or user) views of the data base, and second, the capability of the data-base management system to allow modification of these views without adversely impacting the integrity of existing applications. The latter capability is important in the restructuring function [G13], but this definition of data independence is perhaps too broad. It suggests that substantial logical change could be made without creating a need to change the programs--a difficult, if not impossible, task. However, a serious attempt is being made to understand how much logical change can be made without adverse effect on the program. Some of the different models discussed in this issue of SURVEYS claim to be more data independent than others. Full data independence appears, however, to involve an understanding of data semantics, the formalization of the meaning of data. Research on data semantics is currently in its infancy.

3. FUNDAMENTAL CONCEPTS AND DEFINITIONS

Some important ideas were introduced when we discussed the basic objectives of DBMS. This section presents further concepts and definitions.

Unfortunately, our language is rich in its words and semantics about data. Entity, item, name, element, value, instance, and occurrence (to name a few) come ready-equipped with meaning, yet they are used in different ways. We must be precise, and are thus forced to make exact definitions for these words, which we must use consistently.

Elements of Logical Structure

The starting point is to define the object of the discourse, the entity, and the process of its definition, which is a modeling process. A human being is constantly "modeling" information--a baby sees an animal and says "dog" (though it may be a horse). The process of modeling information as data often involves trial and error. First, information needs are determined, next data (and processes) are structured to satisfy the needs, and then data is restructured because of changes in the need or necessary improvements to the model.

The principal construct in the data structuring process is the entity:

An information system deals with objects and events in the real world that are of interest. These real objects and events, called entities, are represented in the system by data. Information about




a particular entity is in the form of "values" which describe quantitatively and/or qualitatively a set of attributes that have significance in the system [M1].

Thus the goal of the data structuring process involves the collection of data (a set of facts capable of being recorded) about some identifiable entities that convey information (i.e., meaning to humans). The repository for the data is the data base. A data base is described in terms of its logical structure; this provides a template for the data instances which constitute the data base.

Data about an entity is generally recorded in terms of some attribute(s) that describe the entity. In the description of the data base, separate attributes are called elementary items or, in brief, items, while the collection of elementary items is termed a repeating group or group. For example, the entity PRESIDENT may be described in terms of items PRES-NAME, SPOUSE, the group BIRTH-DATE, and the repeating group CHILDREN. The group BIRTH-DATE is made up of MONTH, DAY, and YEAR, while the repeating group CHILDREN is made up of C-NAME and DATE-OF-BIRTH. There may be zero or many repetitions of the CHILDREN group. The definition of the PRESIDENT entity is illustrated in Figure 1. This represents, of course, only one possible model of a definition of the PRESIDENT entity. Another user may have a different viewpoint, and need to add PARTY and STATE to the model. Thus the data base depends on the (assumed) usage, and the model may need to be changed during the life of the system.

[FIGURE 1. The PRESIDENT entity. Diagram labels: PRES-NAME, SPOUSE, MONTH, DAY, YEAR.]

It is possible to describe a "data model" of an entity in a formal fashion using a set-theoretic notation where:
• all repeating groups are enclosed in { } to represent the fact that they may repeat as many times as necessary, and
• all ordered sets or n-tuples are enclosed in ( ) to show that the order of the items is important.

In this way, the entity PRESIDENT may be defined as in Display 1 below.

Display 1:
PRESIDENT = (PRES-NAME, SPOUSE, BIRTH-DATE, {CHILDREN})
where BIRTH-DATE = (MONTH, DAY, YEAR)
and CHILDREN = (C-NAME, DATE-OF-BIRTH)

An instance or occurrence of an entity is a set of values, one for each of the items of the entity definition. Any repeating group in an entity has an occurrence which consists of one value for each of its items. However, there may be potentially zero or an unlimited number of occurrences of a repeating group in the occurrence of the entity. Naturally, each element value should be valid, i.e., it should conform to the rules and must be one of the possible members of the set of allowable values.

If the names of the presidents of the United States are the value set (or range) of the domain named PRES-NAME, then we have the value set in Display 2 below and can construct one instance of PRESIDENT as:

(FORD, BETTY, (7, 14, 1913),
  {(SUSAN, (12, 10, 57)), (JOHN, (09, 03, 52)),
   (STEPHEN, (12, 21, 55)), (MICHAEL, (09, 17, 50))})

Display 2: Value Set of PRES-NAME = {FORD, NIXON, JOHNSON, KENNEDY, ...}

For almost any real-world situation there are many entities of interest which are related in some fashion. In a Presidential Information System there will be entities such as PRESIDENT, CONGRESS, ELECTION, STATE, and ADMINISTRATION, all of which are interrelated; for example,



PRESIDENTs "WIN" ELECTIONs, STATEs are admitted during a President's ADMINISTRATION, and PRESIDENT(s) serve with CONGRESS(es). A relationship may therefore exist between the instances of two entities. Typically, there are at least three types of relationships:
• One-to-one: some of the Presidents are first native sons of some states; for example, Washington (one President) was the first native son of Virginia (one state).
• One-to-many: during an Administration several STATEs may be admitted, but a state is not admitted more than once in different Administrations.
• Many-to-many: a President serves with many Congresses, and a Congress may serve under many PRESIDENTs.

More on this topic is discussed in other articles in this issue of SURVEYS, but before going further, the reader should note that the following statements have different meanings:
• "The relationship named A exists between two entities, B and C"; and
• "Two instances P1 and Q1 of entities P and Q, respectively, are related by A."

The first is a logical statement. It states logical relationships that may occur between two entities; for example, Presidents (B) win (won) (A) Elections (C). The second statement refers to current values of data in the data base; for example, NIXON (Q1), a PRESIDENT (Q), wins (won) (A) the 1972 (P1) ELECTION (P).

Relationships may be explicit or implicit. The entities may be joined by some naming convention (such as WIN), or the relationship may be implied (as in the example of PRESIDENT with the repeating group CHILDREN).

Generally, the instances of certain items in a group are in one-to-one correspondence to an instance of the entity. For example, the year of an election may uniquely identify a presidential election, or the congress number may uniquely define a Congress. These items are called identifiers or candidate keys. A key may be considered either a logical or a physical phenomenon: the key may be used to identify an entity (logical), or it may cause the system to sort the set of instances of entities into an order based on the value of the key (physical). In this issue of SURVEYS, key will be considered a logical concept, but note that this definition allows "sort by key" as a physical attribute of the data base.

The discussion of entities, items, and groups has involved logical structure. The definition of this structure (a schema) requires some formal language, which is termed a data definition language (DDL). This language may be formatted, like a COBOL DATA DIVISION, or be relatively free-form. The following three articles in this Special Issue give specific examples of DDL usage.

Introduction to Data Models

The evolving field of data models is often hotly debated. Proponents of each model point out its advantages, but so far there is no consensus as to the best version. In reality, there is a spectrum of data models ranging from the COBOL-like "flat file" (single entity model) to the complex extended-set model.

Since COBOL, the most widely used language today, has a DATA DIVISION with data definition capabilities, it represents a good starting point for the discussion of data models. Though limited, this data definition capability allows the group (termed a RECORD in COBOL) to be defined as an 01 level, followed by the items, groups, and repeating groups at other levels. The PRESIDENT entity, discussed previously, is shown in Figure 2.

FIGURE 2. A COBOL-like definition for PRESIDENT.

    01 PRESIDENT.
       02 PRES-NAME PICTURE X(20)...
       02 SPOUSE PIC X(10)...
       02 BIRTH-DATE...
          03 MONTH...
          03 DAY PIC...
          03 YEAR...
       02 CHILDREN OCCURS 0 TO N TIMES...
          03 C-NAME...
          03 DATE-OF-BIRTH...

In COBOL, each item is formatted by defining a PICTURE. Thus PRES-NAME is shown as 20 characters long, SPOUSE is 10 characters long, while DAY is two numerics. The COBOL definition deals with one entity (defined at the 01 level), but a COBOL structure may also be termed "contained" because the groups BIRTH-DATE and CHILDREN are contained within the PRESIDENT entity (see Figure 3). There may be many levels of containment of groups within groups. There is no semantic reason why groups shown as contained in the PRESIDENT entity should not, by some other model or user, be considered separate entities; i.e., BIRTH-DATE and CHILDREN might each be entities. The relationship between PRESIDENT, BIRTH-DATE, and CHILDREN entities may, however, be constrained because the two latter are contained entities that are not really separate, but rather, are "owned" by the PRESIDENT entity. Such a model is said to be "hierarchical." Thus a hierarchy of entities involves a superior entity and one or more inferior entities, each of which may participate as superior entities at a third level, etc. A hierarchy represents a "tree," "bush," or "fan-out" of entities all related by a family-tree-like relationship (with no sons shared by different fathers). The topmost level of the hierarchy is termed the entry or root--terms arising because the "way in" to the entity is its entry, or (when stood on its head) it is the "root" of the tree.

[FIGURE 3. The PRESIDENT entity as contained and hierarchical structures. Diagram labels: PRESIDENT, PRES-NAME, BIRTH-DATE, CHILDREN, CONTAINMENT.]

Logically, containment and hierarchical representations are equivalent; however, the physical implementation of such systems causes differences in the way they are manipulated. (Hierarchic systems are discussed in this issue by Tsichritzis and Lochovsky [see page 105].) The hierarchic model has a 1-to-N relationship between an owner and member entity; e.g., for one PRESIDENT there may be many (or no) CHILDREN. It also has two constraints: no member entity can be shared by the owner entities, and no entity at a lower level may own a member at a higher level in the hierarchy (assuming the words "lower" and "higher" refer to the position down a page, with the root at the top). The second constraint really follows from the first, but it has important effects.

If we first relax the multiple ownership constraint, it is possible to have the same member entity participating in two different relationships with a single owner entity. This requires a means of distinguishing the relationships. As an example, the relationships between PRESIDENT and STATE may be both ADMITTED-DURING-ADMINISTRATION and NATIVE-SON: a President may be in office when one or more STATE(s) are admitted, and one state may have zero or several native sons as Presidents. This problem can be resolved by labeling the arcs (showing the relationships between the entities) with the name of the relationship, as shown in Figure 4.

The second constraint must be relaxed carefully. Most graphical or network models still retain one constraint: that no entity may participate as both owner and member in the same relationship. This may appear unfortunate or unnecessary; after all, PEOPLE do EMPLOY other PEOPLE, and some PEOPLE are PARENTS-OF other PEOPLE. However, by careful design this problem can be resolved. Discussion of this and concepts of the general network model is given in the paper by Taylor and Frank [see page 67].

At the more theoretical end of the spectrum is the class of data models based on




mathematics, especially on set theory:
• relational,
• binary association, and
• extended set models.

[FIGURE 4. Some network structure examples. Diagram labels include SERVES, NATIVE-SON, and ADMITTED-DURING-ADMINISTRATION.]

The relational model [M3] deals with entities which have no containment. Thus each entity is made up only of items. The notation introduced earlier can be used to define the same three entities (two of them unchanged):

PRESIDENT = (PRES-NAME, SPOUSE)
BIRTH-DATE = (MONTH, DAY, YEAR)
CHILDREN = (C-NAME, DATE-OF-BIRTH).

But there are now no links between the three entities. These may, however, be made explicit by using the candidate keys to establish the relationships:

P-DATE-OF-BIRTH = (PRES-NAME, MONTH, DAY, YEAR)
P-KIDS-OF = (PRES-NAME, C-NAME)

assuming that the candidate keys (unique by definition) are PRES-NAME, MONTH, DAY, YEAR, and C-NAME. Another method is to link the entities implicitly by passing the owner candidate key (PRES-NAME) into the dependent entities:

PR-BIRTH-DATE = (PRES-NAME, MONTH, DAY, YEAR)
PR-CHILDREN = (PRES-NAME, C-NAME, DATE-OF-BIRTH).

The instances of a group are often called n-tuples in the literature of relational systems, which are discussed in the paper by Chamberlin [see page 43]. Relational systems have been in use for some time at universities and research laboratories, e.g., the use of MACAIMS [Z1] and ADMINS [Z2] at MIT, and of RDMS [Z3] at General Motors Research. Some prototype systems are appearing on the market now.

The binary association model, as discussed by Senko [DL5], is part of an attempt at understanding and formalizing data semantics through the use of binary relations.

Although one of the earliest set processors was proposed in the Information Algebra [M1], Childs' set model [M2] was one of the first to be implemented, and it is also being investigated by Hardgrave [M4]. The extended set allows storage of a very wide range of ordered sets and ordinary sets, and is intended to provide maximum generality in storing relationships. However, application of these models is still in the realm of research, though one commercial system is now available [V24].

To recapitulate, information structuring (the selection of entities and specification of relationships between them) is a modeling process with little methodology other than common sense. In order to use a DBMS, the information structure must be mapped to the logical structure of the system. The mapping is expressed in a DDL. The instances of the data (the data base) are stored by the DBMS to conform to this logical structure. A DBMS generally supports only one of the data models: relational, hierarchic, or network. Since each model uses a different terminology, Table 1 attempts to equate the various terms used with the concepts that have been developed in this section.

The criteria for designing and selecting a "best" model have not yet been established--nor are they likely to be established in the near future. The user is therefore faced with two decisions: which data model to utilize (i.e., which type of DBMS), and how to structure the data using the chosen model.
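The second relational method described above, passing the owner candidate key PRES-NAME into the dependent entities, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the mechanism of any particular relational system: relations are modeled as Python sets of tuples, the nested FORD instance reuses the values given earlier in the set-theoretic notation, and the helper names `decompose` and `recompose` are hypothetical. The sketch flattens the nested instance into PRESIDENT, PR-BIRTH-DATE, and PR-CHILDREN, then rebuilds it by joining on PRES-NAME.

```python
# Sketch only: relations are Python sets of tuples; helper names are hypothetical.

# Nested PRESIDENT instance (PRES-NAME, SPOUSE, BIRTH-DATE, {CHILDREN}),
# values taken from the text (leading zeros dropped: Python int literals
# such as 09 are not allowed).
FORD = (
    "FORD",
    "BETTY",
    (7, 14, 1913),  # BIRTH-DATE = (MONTH, DAY, YEAR)
    frozenset({     # repeating group {(C-NAME, DATE-OF-BIRTH)}
        ("SUSAN", (12, 10, 57)),
        ("JOHN", (9, 3, 52)),
        ("STEPHEN", (12, 21, 55)),
        ("MICHAEL", (9, 17, 50)),
    }),
)


def decompose(instances):
    """Flatten nested PRESIDENT instances into three flat relations,
    passing the owner candidate key PRES-NAME into each dependent one."""
    president, pr_birth_date, pr_children = set(), set(), set()
    for pres_name, spouse, (month, day, year), children in instances:
        president.add((pres_name, spouse))                 # PRESIDENT
        pr_birth_date.add((pres_name, month, day, year))   # PR-BIRTH-DATE
        for c_name, date_of_birth in children:             # zero or more rows
            pr_children.add((pres_name, c_name, date_of_birth))  # PR-CHILDREN
    return president, pr_birth_date, pr_children


def recompose(president, pr_birth_date, pr_children):
    """Rebuild the nested instances by joining the relations on PRES-NAME."""
    birth = {p: (m, d, y) for p, m, d, y in pr_birth_date}
    nested = set()
    for pres_name, spouse in president:
        kids = frozenset((c, dob) for p, c, dob in pr_children if p == pres_name)
        nested.add((pres_name, spouse, birth[pres_name], kids))
    return nested


president, pr_birth_date, pr_children = decompose([FORD])
assert recompose(president, pr_birth_date, pr_children) == {FORD}
```

Because the round trip loses nothing, the nested and flat forms carry the same content; what differs is whether the owner-member link is implicit (containment) or explicit (the shared PRES-NAME value in every dependent tuple).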




| Concept | Relational | Network | Hierarchic |
|---|---|---|---|
| Item | role name/domain | data item type | item/field |
| Item value | component | data item occurrence | value |
| Group | not allowed | group | group |
| Entity (type) | relation | record type | entry/segment type |
| Entity instance | tuple | record occurrence | entry/segment occurrence |
| Relationship | foreign key, comparable underlying domains | set type | hierarchic (implied) |
| Relationship instance | | set occurrence | assembly |
| Data administrator view | data model | logical structure | logical structure |
| Definition of data administrator view | data model definition | schema | schema |
| User view | data submodel | | |
| Definition of user view | data submodel definition | subschema | subschema |
| Data-base subdivision | | area | |
| Entry points | primary key | singular sets, CALC records | root group, root segment |
| Single unique identifier | candidate key | key | sequencer (unique) |

TABLE 1. COMPARATIVE TECHNOLOGY.
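Table 1 equates a relationship with a set type in the network model and with foreign keys over comparable domains in the relational model. One way to make that correspondence concrete is the Python sketch below, offered as an illustration rather than as the mechanism of any real DBMS. It stores the one-to-many relationship ADMITTED-DURING-ADMINISTRATION both as owner-to-member set occurrences and as a STATE relation carrying the owner as a foreign key, and converts each form into the other. The helper names are hypothetical, and the data is a partial, illustrative subset (Alaska and Hawaii were admitted during the Eisenhower administration; Idaho and Wyoming during Benjamin Harrison's).

```python
from collections import defaultdict

# Sketch only: one-to-many relationship ADMITTED-DURING-ADMINISTRATION,
# stored two ways. Data is a partial, illustrative subset; names are hypothetical.

# Network view: one set occurrence per owner (ADMINISTRATION) listing its
# member records (STATEs). Member lists are kept in alphabetical order so
# that the round trip below compares equal.
SET_OCCURRENCES = {
    "EISENHOWER": ["ALASKA", "HAWAII"],
    "B. HARRISON": ["IDAHO", "WYOMING"],
}


def to_foreign_keys(set_occurrences):
    """Relational view: STATE tuples (STATE-NAME, ADMINISTRATION), with the
    owner carried as a foreign key. A member with two owners is rejected,
    since a state is not admitted in more than one Administration."""
    owner_of = {}
    for owner, members in set_occurrences.items():
        for member in members:
            if member in owner_of:
                raise ValueError(f"{member} already belongs to {owner_of[member]}")
            owner_of[member] = owner
    return set(owner_of.items())


def to_set_occurrences(state_relation):
    """Network view rebuilt by grouping STATE tuples on the foreign key."""
    occurrences = defaultdict(list)
    for state_name, owner in sorted(state_relation):
        occurrences[owner].append(state_name)
    return dict(occurrences)


STATE = to_foreign_keys(SET_OCCURRENCES)
assert to_set_occurrences(STATE) == SET_OCCURRENCES
```

The duplicate-owner check enforces the one-to-many rule; a many-to-many relationship such as PRESIDENT and CONGRESS could not be held in a single foreign-key column this way and would need a separate relation of key pairs.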

Mapping from the Logical to the Physical Structure

The need to create and load a data base, i.e., to make the data definition and then populate it with data, leads to the physical structure, which is the representation of data in storage. The accessing process for the data-base management system is shown in somewhat oversimplified form in Figure 5. The definition of the logical structure is stored within the DBMS and associated with the request so that any logical relations may be derived. As an example, the request:

PRINT SPOUSE WHERE PRES-NAME = "FORD"

does not mention that we are dealing with a PRESIDENT entity; it is left to the DBMS to discover this fact from the logical structure. The physical mapping must have some mechanism that will determine which data to retrieve (using the key PRES-NAME if possible), and then will call the relevant operating system access method and apply any deblocking that is necessary to return the required portion of the character stream.

[FIGURE 5. Logical and physical aspects of a DBMS. Diagram labels: USER OR PROGRAM REQUEST; STORED DEFINITION; LOGICAL ASSOCIATION OF NAMES IN REQUEST WITH DATA DEFINITION (LOGICAL STRUCTURE).]

The process of mapping from occurrences of data to their bit-string representation on disk or tape is generally system-dependent; therefore, these factors are discussed in the separate papers in this issue of SURVEYS. Most DBMS format (block and manage) the pages or records themselves, and most use the operating system access method to store and retrieve the data from secondary devices.

In fact, because most modern DBMS use the available operating system, they generally use many of its facilities. Therefore, communication management facilities, program library management, access methods, job scheduling, special program management (e.g., sorting and compiling), concurrent access prevention, checkpoint facility, etc. typically are all "adopted" by the DBMS, though some rewrite and additions may be necessary.

4. HISTORICAL PERSPECTIVES

The origin of DBMS can be traced to the data definition developments, the report generator packages, and the command-and-control systems of the fifties--a time when computers were first being used for business data processing. Many systems have been developed since the fifties (see the surveys by Minker [U1, 4]). MITRE [U3, 8] and CODASYL [U2, 7] show numerous system implementations that have generated wide interest among users.

In 1969 Fry and Gosden [U5] analyzed several DBMS and developed a three-category taxonomy: Own Language, Forms Controlled, and Procedural Language Embedded. Succinctly stated, these categories can be contrasted as follows: Own Language Systems (such as GIS [V16]) have a high-level, unconventional programming language; Forms Controlled Systems (such as MARK IV [V12]) use the "fill-in-the-blank" approach; and Procedural Language Systems (such as I-D-S [V9]) take advantage of existing higher-level programming languages.

In 1971 the CODASYL Systems Committee [I6] observed that the most significant difference among DBMSs was the method employed in providing capabilities to the user. The Committee developed a two-category classification scheme: Self Contained (which included the Forms Controlled category) and Host Language.

It is impossible to survey all systems, but it is possible to trace the evolution of the DBMS by tracing the evolution of two precursors of data-base management: data definition languages and the development of generalized RPG systems.

Evolution of Data Definition Languages

One important factor in the evolution of DBMS is the development of data definition languages. They provide a facility for describing data bases that are accessed by multiple users and by diverse application programs.

Centralized Data Definition: Fifties and Sixties

Probably the first data definition facility was the COMPOOL [DL1] developed at the MIT Lincoln Laboratory for the SAGE Air Defense System in the early fifties. COMPOOL provided a mechanism for defining attributes of the SAGE data base for its hundreds of real-time programs. The COMPOOL concept was later carried over to JOVIAL [PL4] (a programming language), but some of the capability was lost when the language was implemented under a generalized operating system; the data definition became local to the language rather than global to the system.

About the same time, hardware vendors were developing programming languages for business applications: FACT [PL1] was developed by Honeywell, GECOM [PL3] by the General Electric Company, and Commercial Translator [PL2] by IBM; all provided some form of data-definition facility. GECOM and Commercial Translator provided the capability of defining intrarecord structures, and FACT offered the more advanced capability of providing inter-record hierarchical structures.

Under the aegis of CODASYL, these vendor efforts were merged into COBOL [PL5] in the late fifties. This language has a centralized DATA DIVISION which achieves the separation of the description of data from the procedures operating on it. While the DATA DIVISION initially mirrored the data as stored on tape or cards, implementors soon found themselves using different ways of physically storing data. This inherent incompatibility between physical data stored by different manufacturers becomes an important factor when data must be exchanged between two systems.

Approaches which attempt to mitigate the




data-transfer problem are the subject of recent research on the description of physical structures and the development of stored-data definition languages.

Stored-Data Definition Languages: the Seventies

One of the first efforts in this area was mounted by the CODASYL Stored-Data Definition and Translation Task Group [SL2] in 1969 with the goal of developing a language to describe stored data. At the 1970 ACM SIGFIDET (now SIGMOD) meeting, a preliminary report was made [SL3], and later reports were published in 1972 [SL5]. Notable basic research efforts in the development of these languages were reported by Smith [SL1] and Sibley and Taylor [SL4, 7] in 1971.

The Data Independent Accessing Model (DIAM) [DL3], developed by Senko and his colleagues at the IBM San Jose Research Laboratory, provides a multilevel data description capability. The description starts at the information level, structures this into a logical definition, adds encoding information, and ends with a physical description of the storage device and its logical-to-physical mapping structure. Each level augments the description at the preceding level. Recent work by Senko [DL4, 5] extends the information level in a new language called FORAL.

Thus, the single-level data description facility of the fifties, made incompatible by storage developments in the sixties, led to the recent development of stored-data description facilities in the seventies.

Development of Report Generator Systems

The development of programming languages originally allowed the user (a programmer) to define reports by giving simple definitions of the format of the lines and then writing procedures to move data into buffers prior to printing each line. Therefore, the program written to produce a complete report could consist of large numbers of statements involving expensive programming. The development of report generators stems from a need to produce good reports without this large programming effort. In most cases, report generators can perform complex table transformations and produce sophisticated reports from a data base. These, then, allowed the user to examine and manipulate large volumes of data, and they may be said to be a precursor, or a particular type, of modern DBMS.

The Hanford/RPG Family (Figure 6)

The patriarch of today's RPG system was developed at the Hanford (Washington) operations of the Atomic Energy Commission, which was then managed by the General Electric Company. In 1956 Poland, Thompson, and Wright developed a generalized report generator [G1] (MARK I) and a generalized sort routine for the IBM 702. The capability was extended in 1957 by the development of a report and file maintenance generator (MARK II). These routines provided the basis for a joint development by several users under the SHARE organization of the 709 Package (9PAC) [W1] for the IBM 709/90.

9PAC is the principal ancestor of most commercial report generators developed since 1960. Foremost among these is the Report Program Generator (RPG) developed for the IBM 1401 in 1961; this has evolved into the RPG for the IBM System/360 and an enhanced RPG II for the IBM System/3, System/360, and several other computers [W2, 3]. Other members of the Hanford family include the COGENT systems, developed by Computer Sciences Corporation for the IBM 709 and System/360 between 1965 and 1969 [Y5], and the SERZ~S system [Y9].

Another system, also based on MARK II ideas, was being defined during the late fifties in a SHARE 704 Project under Fletcher Jones. This IBM 704 system, called SURGE [W4], was the predecessor of GIRLS, the patriarch of the Postley/MARK IV family.

5. DEVELOPMENT OF DBMS

The development of the data-base management systems may be divided into three somewhat overlapping periods: the early developments, prior to 1964; the establish-

Computing Surveys, Vol. 8, No. I, March 1976


Evolution of Data-Base Ma~agement Systems 21

1956 GENERALELECTRIC-HANFORD REPORTGENERATOR(MARK I )


(IBM 702)

1967 GENERAL ELECTRIC- HANFORO REPORT• FILE MAINTENANCE


GENERATOR(MARK]Dr)(I ~M 702 )

19,59 SHARE ORGANIZATION


J\
9PAC ( IBM 70~) SURGE ( I B M 704)

1961 IBM // R {see FigS)


/
/
I-D -S
(see Fig.9 )
1965 IBM RPG(IBM SYSTEM/360) COGENT

I
RPG ] I (|BM SYSTEM/3)
\
SERIES

FIovRZ 6. The Hanford/RPG Family.

meat of families during the period 1964- first to discuss the translation of a query
1968; and the vendor/CODASYL develop- language. They designed a language, QueRY
ments from 1968 to the present. Since the IX7], and developed techniques for analyzing
characteristics of the data-base management its syntax and compiling statements into
systems differ considerably during these machine code.
periods, we discuss them separately. One of the first identifiable data-base man-
agement systems to appear in the literature
Early Developments: Prior to 1964 was an elegant generalized tape system de-
veloped by Climenson for the ROA 501 in
The impetus for DBMS development came 1962. This system, called RetfievM Com-
originally from users in government, par- mand-Oriented Language [KS], provided
ticularly from the military and intelligence five basic commands, with Boolean state-
areas, rather than from industry and compu- ments permitted within some of them. The
ter manufacturers. Although these prototypes user had to specify the data description with
bear little resemblance to today's systems the query so that a program could be bound
and were somewhat isolated, they provided to its data.
some interesting "firsts" in the evolution Another early and ambitious develop-
of data-base technology. They also provided meat was ACSI-MA'rm IX1] sponsored by
the beginnings of several significant DBMS the US Army in the late fifties. This system
families. was designed by Minker to emphasize effec-
In 1961 Green [X2] and his colleagues de- tive memory utilization and inferential
veloped a natural-language system called processing. It could make inferences such
BAsE-BALL. Though not a data-base man- as: if John is the son of Adam, and Mary is
agement system by current definition, it the sister of John, then Mary is the daughter
made a contribution to the technology by of Adam. It contributed the first generalized
providing access to data through a subset of data-retrieval accessing package for a disk-
natural language (a limited vocabulary of oriented system with batched requests, a dy-
baseball-related terms). At approximately namic storage algorithm for managing core
the same time, the first implementation of a storage, and the first assembler to use a
B-Tree was described by Collilla and Sams dynamic storage allocation routine. Because
ix6]. disks were not reliable at that time, the
Cheatham and Warshall were probably the ASCI-MATiC system was never fully imple-

Computing Surveys, Vol. 8, No. I, March 1976


i
22 • James P. Fry and Edgar H. Sibley

mented. A prototype version was imple- Establishment of Fami!ies: 1964-1968


mented later at RCA (1964).
The US Air Force also pioneered develop- During this period the isolated developments
ment of DBMS by sponsoring several diminished and fulliscale families of DBMS
projects at the MITRE Corporation. The emerged, some borrowing heavily from the
prototype, called Experimental Transport past, others from sibling developments. A
Facility (ETF), led to the Advanced Data family is not limited to one company or
Management System (ADAM) [X24, 25, 27, government agency; because of the mobility
33], initiated in 1962. ADAM was imple- of its developers, a family may spread across
mented on an IBM 7030 (STRETCH) c o m - organizations, providing cross-fertilization
p u t e r , with the design goals of providing a
of ideas. Although the family lineages of
laboratory for modeling, prototype develop- DBMS are sometimes intertwined, each
ment, design verification, and evaluation of can be traced to its progenitor.
DBMS. Although ADAM did not meet all
its ambitious design goals [X37] (many have The Postley/Mark I V Family (Figure 8)
not yet been achieved anywhere), it still re- One early system, which evolved into the
mains a significant contribution to the tech- MARK IV family, was GIRLS (Generalized
nology. Information Retrieval and Listing System),
The COL~NGOsystem [X22], a contempo- developed for the 7090 by Postley [X4]. In-
rary of ADAM, was really a series of tape- fluenced by the SHARE SVRGEdevelopment
oriented data-base systems with CoBoL-like (as discussed on page 20), Gcsc led suc-
logical data structures implemented on the cessively to the development of the MARK I,
IBM 1401 computer system. The C-10 MARK II, and MARK III systems for the
IX30] system, originally named as a follow-on IBM 1401/60 at Informatics between 1961
to COLINGO, was implemented on the IBM and 1967. In 1968 the highly successful
1410 computer and embodies many of the MARK IV SYSTEM [V12] was released for use
ADAM concepts. The influence of ADAM can on the IBM System/360. Since then, nu-
be seen in System Development Corpora- merous releases of MARK IV have provided
tion's LVCID [X13, 21], and in parts of over twenty new features, and MARK IV has
Auerbach's DATAMANAGER-1(DM-1) [X31]. now been implemented on other hardware.

FIGURE 7. The MITRE/Auerbach Family. [Genealogy chart: 1962 MITRE: ETF, leading to ADAM (IBM 7030) and COLINGO (IBM 1401); 1965 MITRE: C-10 (IBM 1410); 1967 SDC: LUCID (see Fig. 13); MITRE branch and ACSI-MATIC, leading to 1969 Auerbach: DM-1 (U 1218, H6000); 1970 Western Electric.]


FIGURE 8. The Postley/MARK IV Family. [Genealogy chart: 1962 AIS: GIRLS (IBM 7090) (see Fig. 6); Informatics: MARK I (1962), MARK II (1964), and MARK III (1966), all for the IBM 1401/60; 1968 Informatics: MARK IV (IBM System/360); Sundeen branch: 1967 Scientific Data Systems: MANAGE (XDS 940), then 1968 Applications Software Inc.: ASI-ST (IBM System/360).]

A significant offshoot of the Postley/MARK IV family is the Sundeen branch. It spans two companies, starting with the MANAGE System [X23, Y4] developed at Scientific Data Systems and followed by the ASI-ST system [V1] developed at Applications Software in 1967 for the IBM System/360.

Bachman/IDS Family (Figure 9)

The Integrated Data Store (I-D-S) [X15, 18] was developed by Bachman and his colleagues at the General Electric Company in 1964. The I-D-S system, which stems from the same needs as 9PAC, combined random-access storage technology with high-level procedural languages (GECOM in 1963 and COBOL in 1966) to provide a powerful network model of data. Significant I-D-S developments included:
• new data manipulation verbs or procedure calls at the high-level language interface;
• separate storage- and program-level item descriptions;

• implicit insertion and removal of groups from relationships, based upon selection and ordering rules;
• retrieval of, and modification to, both primary and secondary keys;
• data paging concepts based on logical data-base keys;
• incremental recovery and restart using "before" and "after" images; and
• shared access to the data base, with automatic detection of interference and automatic restart capability.

Since 1964 the I-D-S system has evolved under several different hardware systems, operating systems, and host languages. Recently, Honeywell has made available a new version, I-D-S/II [V9], using COBOL 74 [PL7]. It is consistent with the CODASYL DDLC 73 specification [S3], which is discussed in the section on the CODASYL/DBTG Family on page 25, and with recent COBOL additions.

FIGURE 9. The Bachman/IDS Family. [Genealogy chart: 1964 General Electric: I-D-S/GECOM (GE 400) (see Fig. 6); 1966 General Electric: I-D-S/COBOL (H6000); General Motors: APL (IBM System/360) (see Fig. 11); 1970 General Electric: dataBASIC; 1973 CODASYL: DDLC 73 specification; 1975 Honeywell: I-D-S/II (H6000) and MDQS.]

In 1966 Dodd and his colleagues at General Motors Research developed APL (Associative PL/I) [X28], a development somewhat similar to I-D-S but intended to provide data-management functions for a computer-aided design environment [G11]. APL provides six data-manipulation verbs in a PL/I host-language environment: CREATE, INSERT, FIND, FOR EACH, REMOVE, and DELETE. Another contribution of APL was the introduction of a distinct technology which separated the logical relationships of the owner and member groups from their physical implementation.

Another branch in the I-D-S family is the dataBASIC system [V11], implemented by Dressen at General Electric (now Honeywell) in 1970. This system offered the nonprogramming user high-level access to homogeneous files (single record type) in a time-sharing environment using the BASIC programming language. Its only retrieval statement is FOR (a Boolean search statement), which qualifies a set of groups (records) to be retrieved. Each retrieval is processed by any number of processing statements until a concluding NEXT statement is encountered.

A recent offshoot in the I-D-S family is the Honeywell Management Data Query System, MDQS [V10]. This system is a self-contained query and report specification facility for accessing sequential, index sequential, and I-D-S files.

Formatted File/GIS Family (Figure 10)

At about the same time as the host-language progenitor (9PAC) was evolving, a series of government systems was being developed to

support the needs of the Command-and-Control and Intelligence communities. Perhaps the most prolific of these was the Formatted File family, which spans all three development periods. Its origins can be traced to a series of systems developed at the David Taylor Model Basin³ by Davis, Todd, and Vesper. One of the principal systems, Information Retrieval (IR) [X3, 9], was an experimental prototype developed in 1958 for the IBM 704. It was followed by two formatted file-processing packages, Tape Update for Formatted Files, TUFF [X16, 20], and Tape Updater and Generator, TUG [X5], both developed to run on the IBM 704. Later this family split into two branches, in the Air Force and the Navy. The Air Force branch, the SAC/AIDS Formatted File System [X14], was developed in 1961 for the Strategic Air Command 438L system. Its major contribution to data technology was the development of a file format table, i.e., a "self-describing" data base. Because a machine-readable data definition was stored with the data, each data base was directly accessible by FFS.

³ The David Taylor Model Basin is now called the David Taylor Naval Ship Research and Development Center.

The Navy branch, the Information Processing System (IPS) [X11, 12, and Y10], was also developed in 1963, for the CDC 1604, by NAVCOSSACT. IPS also contributed to data-base technology through the implementation of a multilevel, hierarchically structured data base on sequential media, and through its implementation on several different hardware systems, such as the IBM 709/90 [X19] and the AN/UYK-1 [X32].

During the implementation of IPS in 1963, another branch of the family was developed for the Naval Fleet Intelligence Center in Europe (FICEUR) [X10]. This FFS was patterned after the SAC FFS and implemented on the IBM 1410. SAC also added an FFS on the IBM 1401 for the Pacific Air Force Headquarters. This system was later reprogrammed for the IBM System/360 and is still in use on smaller models.

About 1965 the SAC and FICEUR branches of the formatted-file family merged, resulting in the NMCS Information Processing System (NIPS) [X17]. NIPS added the concepts of logical file maintenance, an improved query language, and on-line processing. In 1968 NIPS was converted from the IBM 1410 to the IBM System/360 and named NIPS-360 [Y12].

A cousin of NIPS was also developed for the intelligence community: the Intelligence Data-Handling Formatted File System [X26]. It emphasized efficient large-file processing and provoked interest in machine-independent implementation using COBOL. Prototype development of such a system was begun in 1968 by the Defense Intelligence Agency. The effort was first named the COBOL Data Management System (CDMS) [Y8]; later (1970) it was renamed the Machine Independent Data Management System (MIDMS) [Y11]. It was originally implemented on the IBM System/360 and was later coded (in 1973) for the H6000 series.

SAC FFS is considered to have inspired IBM's Generalized Information System (GIS) [V16, 17]. This was originally developed as a stand-alone program product for System/360 (1965), but has been extended and enhanced to act either as a stand-alone system or as an ad hoc interrogation interface for the IMS family.

FIGURE 10. The Formatted File/GIS Family. [Genealogy chart: SAGE; 1958 IR (DTMB) (IBM 704); 1959 TUFF/TUG (DTMB) (IBM 704/9); 1961 FFS (SAC) (IBM 7090) and IPS (Navy) (CDC 1604), with IPS versions for the IBM 7090 and AN/UYK-1; 1963 FFS (IDHS) (IBM 1401) and FFS (FICEUR) (IBM 1410); 1965 FFS (DCA/NMCSSC) (IBM 1410) and FFS (DIA-IDHS); 1969 GIS (IBM System/360), NIPS (IBM System/360), and CDMS; 1970 FFS (IBM System/360) and MIDMS (IBM System/360, H6000).]

Vendor/CODASYL Developments: 1968 to the Present

The trend in this period shifts from in-house, family-oriented activities to proprietary vendor development. As a result, some advances made by commercially available DBMSs disappeared behind a veil of secrecy. While few references have appeared recently on the internals of particular DBMSs, the technical literature abounds with articles on mathematical and theoretical aspects, especially of relational systems. Chamberlin's article (see page 43) provides an excellent bibliography of this development. Recent years also show the entry of CODASYL into the data-base field.

CODASYL/DBTG Family (Figure 11)

Based upon the pioneering ideas of I-D-S and APL, the CODASYL Programming

Language Committee started a new task group to work on a proposal for extending COBOL to handle data bases [PL6]. This group was originally called the List Processing Task Group, though its name was later changed to the Data Base Task Group (DBTG), the acronym used here. The first semipublic recommendations of the DBTG were made in 1969 [S1]. These recommendations detailed the syntax and semantics of a Data Description Language (DDL) for describing network-structured data bases, and defined Data Manipulation Language (DML) statements to augment COBOL. The task group intended that the DDL specifications should be available to all programming languages, while extensions like the DML would be needed for every language.

FIGURE 11. The CODASYL Family. [Genealogy chart: 1964 General Electric: I-D-S (see Fig. 9); 1966 General Motors: APL; 1968 CODASYL: List Processing Task Group; 1969 CODASYL: DBTG specification; 1970 B. F. Goodrich/Cullinane: IDMS (IBM System/360); Xerox Data Systems; 1971 CODASYL: DBTG 1971 specification; UNIVAC: DMS 1100 (UNIVAC 1100); 1973 CODASYL: DDLC 1973 specification; Xerox Data Systems: EDMS (Sigma 5, 7, 9); 1973 Philips: PHOLAS; 1975 Honeywell: I-D-S/II; CODASYL: DBLTG 1975 specification; 1976 B. F. Goodrich/Cullinane: IDMS-11 (PDP 11/45); CODASYL: COBOL 1976 specification.]

The initial DBTG specification was reviewed by many user and implementation groups. Their recommendations were further considered, and a new report was issued in 1971 [S2]. The major change was the separation of the data description into two parts: a Schema DDL for defining the total data base, and a Sub-schema facility for defining various views of the data base consistent with different programming languages.

Based on the reviews of the 1971 report, CODASYL took two significant actions:
• a new standing committee, the Data Description Language Committee (DDLC), was created to deal exclusively with data description; and
• the DBTG was replaced by a new task group, the Data Base Language Task Group (DBLTG), to deal only with COBOL extensions.

Since that time, a new subcommittee has also been formed to add DML statements to FORTRAN.

The DDLC was charged with taking the Schema DDL and developing a common data description language to serve the major programming languages. In January 1974 the first issue of the Data Description Language Committee's publication, the Journal of Development, was published [S3]. This report specifies only the syntax and semantics of the DDL.

The DBLTG was charged with making the 1971 report of the DBTG consistent with CODASYL COBOL specifications. In February 1973 the DBLTG submitted its report to the CODASYL Programming Language Committee. This report is very similar to the 1971 DBTG report, with changes in nomenclature and other relatively cosmetic changes. New items in the 1973 report included an extension to the facility for dealing with error returns.

Implementation of systems conforming to the 1969, 1971, and 1973 DBTG specifications started in 1970 with the UNIVAC DMS 1100 [V22] for the 1108, and has continued since then for the UNIVAC 1110 series computers. At about the same time, B. F. Goodrich implemented a system called Integrated Data Management System, IDMS [V7], for the IBM System/360. This has since been extended to IDMS-11 for the Digital Equipment Corporation PDP 11/45. The IDMS series is marketed by Cullinane Corporation. The Digital Equipment Corporation has implemented DBMS-10 [V8] for its PDP 10 computer system.

Self-contained facilities for ad hoc interrogation have been added by Control Data Corporation (Query/Update [V6]) and by Xerox Data Systems (EDMS [V23]). In the Netherlands, Philips implemented a family of systems termed PHOLAS [V19], and in Norway the SIBAS system [V20] has been developed by Shipping Research Services. Honeywell has updated I-D-S to conform to the 1973 specifications; this is I-D-S/II [V9].

IMS Family (Figure 12)

The IMS family of systems is an outgrowth of the Apollo moon-landing program. Its origins can be traced to two developments at the Space Division of North American Aviation (now Rockwell International) in 1965. One was the implementation of GUAM (Generalized Update Access Method), the forerunner of Data Language/One (DL/I). The other was the implementation of two teleprocessing applications, EDICT (Engineering Document Information Collection Task) and LIMS (Logistics Inventory Management System). The software package which supported EDICT and LIMS, the Remote-Access Terminal System (RATS), was jointly developed by Rockwell International and IBM during 1964-65. Both GUAM and RATS were originally implemented on the IBM 7010 with 1301 disk storage.

In 1966 IBM, Caterpillar Tractor Corporation, and Rockwell International agreed to a joint development effort to produce a DBMS, the Information Management System (IMS), for the IBM System/360. When the system had to be frozen in 1968 (to meet the Apollo commitment), Rockwell and IBM each continued with separate developments, while Caterpillar withdrew entirely from the effort. The development at Rockwell took the name Information Control System/Data Language/I (ICS/DL/I).

Originally, DL/I [X35] was a data description facility which provided a means for describing and organizing a hierarchically structured data base. It also provided interfaces which the programming user invoked to access and store data from the host language (originally COBOL). The on-line component, ICS/DL/I [X34], added in 1968, allowed multiple access by using the DL/I interface from COBOL or PL/I programs. In addition to running teleprocessing simultaneously with batch processing, the system handled several remote terminals.

FIGURE 12. The IMS Family. [Genealogy chart: 1965 North American Aviation Space Division: GUAM (IBM 7010) and RATS (IBM 7010); 1966 Rockwell International: DL/I (IBM System/360) and ICS (IBM System/360); 1968 IBM/Rockwell International: ICS/DL/I (IBM System/360); 1969 IBM: IMS-1 (IBM System/360), followed by IMS-2 (IBM System/360) and IMS-VS (IBM System/370).]

In 1969 IBM released its version, the Information Management System/360 (IMS/360) [V13]. Since 1969 a series of improvements has marked its evolution [V14, 15].

Inverted File Family (Figure 13)

Following the LUCID System development in 1968, the Advanced Research Projects Agency (ARPA) sponsored System Development Corporation's development of the Time-Shared Data Management System, TDMS [Y1, 2, 3, and X29]. This was designed to operate in the time-sharing environment of the ADEPT executive on the IBM System/360. It was the first DBMS to combine an inverted file implementation of a hierarchical data model with interactive processing.

In 1966 the Computation Center at the University of Texas began the development of a Remote File Management System (RFMS) [Y7] on its CDC 6000. RFMS differed from TDMS mainly in its internal design. A version of RFMS was marketed by CDC as MARS VI [V5].

MRI Systems Corporation (whose principals were originally associated with the University of Texas) continued development of RFMS under the name SYSTEM 2000 [V18], which was offered commercially in 1970. A number of significant enhancements have been made since 1970, so that SYSTEM 2000 now offers an integrated set of host-language and self-contained capabilities.

Additional Vendor Developments

A variety of other data-base management systems based on inverted files for efficient query processing were developed during this period by other vendors. Two of the more commonly known are ADABAS [V21], developed by Schoell at Software AG (West Germany), and Model 204, developed by Computer Corporation of America [V4]. ADABAS uses the inversion tables not only for efficient retrieval but also for linkages between records of different files. ADABAS provides access to the data through a host-language interface, a self-contained language for on-line inquiry, and a batch report generator. ADABAS is one of the few systems which offer a data compression facility.

The Model 204 query language provides most of the power of a general-purpose programming language from an on-line terminal, yet is easy to use for simple requests. This system uses the IFAM access method to allow multiple field indexing and variable-length records, both for file compression and for text processing.

Three other vendor developments date back to about 1969: TOTAL, DM-1, and DMS II. Although TOTAL [V3] was primarily a direct-access data-base management system in its initial release, facilities were soon added to process DBTG-like sets implemented with chain pointers. TOTAL is a host-language system which can model the major data structures of the DBTG specifications, and it was one of the first systems to offer a Schema/Sub-schema processor facility. It has become one of the most widely used data-base management packages today.

FIGURE 13. The Inverted File Family. [Genealogy chart: 1967 System Development Corporation: LUCID (AN/FSQ-32) (see Fig. 7); 1969 System Development Corporation: TDMS (IBM System/360); 1969 University of Texas: RFMS (CDC 6000); 1970 MRI: S2000 (CDC 6000) and Control Data Corporation: MARS VI (CDC 6000); 1971 MRI: S2000 Level 1 (UNIVAC 1100); 1972 MRI: S2000 Level 2 (IBM System/360).]

The Data Manager-1 System (DM-1) [X31], designed by Sable at the Auerbach Corporation, stems from the Army ACSI-MATIC development and MITRE's ADAM. DM-1 consists of a series of service routines for retrieving and storing data; using these routines, both high-level ad hoc user functions and host-language application programs can be developed. DM-1 was implemented at the Air Force Rome Air Development Center on the U1218 computer and the Honeywell H6000. Based on the design philosophy of DM-1, the Western Electric Company, initially assisted by Auerbach, developed System Control-1 [Y6] on the System/360.

Another development, by the Burroughs Corporation, is the Data Management System II [V2] for the B6700/B7700 computers. Basically a host-language system using COBOL, its data definition language is formulated in set-theoretic terms. It also offers a storage definition option.

6. THE PRESIDENTIAL DATA BASE EXAMPLE

The discussion of data-base models in other articles in this issue of COMPUTING SURVEYS will use a unified example dealing with some parts of the Executive branch of the US Government, with data about the President, his Administration, elections, Congress, etc. We use this example because it is almost self-explanatory; it was first enunciated in a paper by Willner, et al. [G9].

Because the example deals with the Executive branch, the most obvious entity is the PRESIDENT. The important items in the PRESIDENT entity will be assumed to be: the President's name (PRES-NAME), BIRTH-DATE and DEATH-DATE, the party affiliation (PRES-PARTY), and the name of his SPOUSE. It will also be considered necessary to know the STATE-NAME of the state of which the President is a native son. However, since STATE will later be defined as an entity, we could alternatively define a relationship NATIVE-SON between PRESIDENT and STATE.

Using the notation presented in Section 3 under the discussion of the "Elements of Logical Structure" (page 13), we have Display 1 below. If, however, an explicit relationship is used for the native son, and STATE-NAME is the key of STATE, then the statements appear as in Display 2 below.

The next entity of interest is the President's ADMINISTRATION, which contains items such as the administration number (ADMIN-NUMBER) (e.g., George Washington was No. 1), the inauguration date (INAUG-DATE), and the Vice-President (VP). In order to identify the President of each Administration, it is also necessary to include the item PRES-NAME in the ADMINISTRATION entity.

At this point, it is worth asking why the PRESIDENT entity does not contain the ADMINISTRATION entity. This is a design decision, and the reader must assume it is based on considerations of usage and modeling. It should be noted, however, that a President can have more than one Administration; consequently, if ADMINISTRATION is contained in PRESIDENT, it must be a repeating group. As another alternative, we could assume that the two separate entities have a relationship HEADED between ADMINISTRATION and PRESIDENT. Thus we have Display 3 below.
Display 1:
PRESIDENT = (PRES-NAME, BIRTH-DATE, DEATH-DATE, PRES-PARTY, SPOUSE,
             STATE-NAME)

Display 2:
PRESIDENT-1 = (PRES-NAME, BIRTH-DATE, DEATH-DATE, PRES-PARTY, SPOUSE)
and
NATIVE-SON = (PRES-NAME, STATE-NAME).
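The difference between Displays 1 and 2 can be sketched in Python. In this minimal illustration, each entity is assumed to be a frozen dataclass and NATIVE-SON a plain list of (PRES-NAME, STATE-NAME) pairs. The class names, the helper functions `native_state` and `to_display_1`, and the sample values are all hypothetical and are not taken from Figure 14. The sketch shows that following the explicit relationship recovers the same record that Display 1 stores directly.

```python
from dataclasses import dataclass
from typing import List, Optional


# Display 1: STATE-NAME is held as an item of the PRESIDENT entity.
@dataclass(frozen=True)
class President:
    pres_name: str
    birth_date: str
    death_date: Optional[str]  # None plays the role of "unavailable"
    pres_party: str
    spouse: str
    state_name: str


# Display 2: PRESIDENT-1 drops STATE-NAME ...
@dataclass(frozen=True)
class President1:
    pres_name: str
    birth_date: str
    death_date: Optional[str]
    pres_party: str
    spouse: str


# ... and the explicit NATIVE-SON relationship pairs PRES-NAME with STATE-NAME.
@dataclass(frozen=True)
class NativeSon:
    pres_name: str
    state_name: str


def native_state(pres_name: str, native_son: List[NativeSon]) -> Optional[str]:
    """Follow NATIVE-SON from a president to his state; None if no link exists."""
    for link in native_son:
        if link.pres_name == pres_name:
            return link.state_name
    return None


def to_display_1(p: President1, native_son: List[NativeSon]) -> President:
    """Rebuild the Display 1 record by following the NATIVE-SON link."""
    state = native_state(p.pres_name, native_son)
    if state is None:
        raise KeyError(f"no NATIVE-SON link for {p.pres_name}")
    return President(p.pres_name, p.birth_date, p.death_date,
                     p.pres_party, p.spouse, state)


# Hypothetical placeholder values, not taken from Figure 14.
p1 = President1("PRESIDENT-A", "1800-01-01", None, "PARTY-X", "SPOUSE-A")
links = [NativeSon("PRESIDENT-A", "STATE-X")]
print(to_display_1(p1, links).state_name)  # STATE-X
```

Both forms carry the same facts. In Display 2 the state is reached by a lookup through the relationship rather than read directly from the record, which is the price paid for treating STATE as an entity in its own right.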
Display 3:
either
ADMINISTRATION = (ADMIN-NUMBER, PRES-NAME, INAUG-DATE, VP);
or:
PRESIDENT-2 = (PRES-NAME, BIRTH-DATE, DEATH-DATE, PRES-PARTY,
               SPOUSE, STATE-NAME, {(ADMIN-NUMBER, INAUG-DATE, VP)});
or:
ADMINISTRATION-1 = (ADMIN-NUMBER, INAUG-DATE, VP)
and
HEADED = (PRES-NAME, ADMIN-NUMBER).
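The three alternatives in Display 3 can be compared by asking each one the same question: what are the inauguration dates of a given President's Administrations? In this sketch each layout is assumed to be a list of Python dataclass records, and the braces in PRESIDENT-2 are treated as a nested list (the repeating group). All class and function names and all sample values are hypothetical illustrations, not part of the original example.

```python
from dataclasses import dataclass, field
from typing import List


# Alternative 1: ADMINISTRATION carries PRES-NAME as an ordinary item.
@dataclass(frozen=True)
class Administration:
    admin_number: int
    pres_name: str
    inaug_date: str
    vp: str


# Alternative 2: PRESIDENT-2 holds a repeating group {(ADMIN-NUMBER, INAUG-DATE, VP)}.
# The other PRESIDENT items are omitted here for brevity.
@dataclass(frozen=True)
class AdminGroup:
    admin_number: int
    inaug_date: str
    vp: str


@dataclass
class President2:
    pres_name: str
    administrations: List[AdminGroup] = field(default_factory=list)


# Alternative 3: ADMINISTRATION-1 has no PRES-NAME; HEADED links the two entities.
@dataclass(frozen=True)
class Administration1:
    admin_number: int
    inaug_date: str
    vp: str


@dataclass(frozen=True)
class Headed:
    pres_name: str
    admin_number: int


def inaug_dates_alt1(name: str, admins: List[Administration]) -> List[str]:
    # Select directly on the embedded PRES-NAME item.
    return sorted(a.inaug_date for a in admins if a.pres_name == name)


def inaug_dates_alt2(name: str, presidents: List[President2]) -> List[str]:
    # Find the president, then walk his repeating group.
    return sorted(g.inaug_date
                  for p in presidents if p.pres_name == name
                  for g in p.administrations)


def inaug_dates_alt3(name: str, headed: List[Headed],
                     admins: List[Administration1]) -> List[str]:
    # Follow HEADED to administration numbers, then look those records up.
    numbers = {h.admin_number for h in headed if h.pres_name == name}
    return sorted(a.inaug_date for a in admins if a.admin_number in numbers)


# Hypothetical placeholder values: one president with two administrations.
alt1 = [Administration(1, "PRESIDENT-A", "DATE-1", "VP-A"),
        Administration(2, "PRESIDENT-A", "DATE-2", "VP-A")]
alt2 = [President2("PRESIDENT-A", [AdminGroup(1, "DATE-1", "VP-A"),
                                   AdminGroup(2, "DATE-2", "VP-A")])]
alt3_headed = [Headed("PRESIDENT-A", 1), Headed("PRESIDENT-A", 2)]
alt3_admins = [Administration1(1, "DATE-1", "VP-A"),
               Administration1(2, "DATE-2", "VP-A")]

print(inaug_dates_alt1("PRESIDENT-A", alt1))  # ['DATE-1', 'DATE-2']
print(inaug_dates_alt1("PRESIDENT-A", alt1)
      == inaug_dates_alt2("PRESIDENT-A", alt2)
      == inaug_dates_alt3("PRESIDENT-A", alt3_headed, alt3_admins))  # True
```

All three layouts give the same answer. What changes is where the link between President and Administration lives: in an item, in a nested group, or in a separate relationship. That is the design decision the text attributes to considerations of usage and modeling.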

The next entity is that of the ELECTION. But there are some drawbacks to this ex-
The interesting items in the election a r e : ample: one is t h e / a c t that it represents a
the year (ELECTION-YEAR), the presidential votes in the Electoral College (PRES-VOTES), the LOSER, the LOSER-PARTY, the year in which the party was first created as a political entity (PARTY-FIRST-YEAR), and the votes of the losing party (LOSER-VOTES). Once again, because elections are won by a President, the election entity may have to contain the PRES-NAME; otherwise there must be some relationship WON between the PRESIDENT and the ELECTION entities. Thus, the alternatives are:

ELECTION = (ELECTION-YEAR, PRES-NAME, PRES-VOTES, LOSER, LOSER-PARTY, PARTY-FIRST-YEAR, LOSER-VOTES), etc.

Another entity within the data base is the STATE. It has a name (STATE-NAME), a population (POP), and a number of votes in the Electoral College (STATE-VOTES). States are admitted to the Union during some Administration. This fact may be shown either implicitly, by having some relationship (ADMITTED-DURING) between the ADMINISTRATION and STATE entities, or explicitly, by including the ADMIN-NUMBER in the STATE entity. Note that there is already a link between the PRESIDENT and STATE entities, because the NATIVE-SON relation has been shown as an element (STATE-NAME) in the PRESIDENT entity.

We have now defined most of the data base, and need only incorporate the entity CONGRESS to complete it. This entity will contain items such as CONGRESS-NUMBER, SENATE-REPUBLICAN-PERCENT, SENATE-DEMOCRAT-PERCENT, HOUSE-REPUBLICAN-PERCENT, and HOUSE-DEMOCRAT-PERCENT. Again, there is a relation between the PRESIDENT and CONGRESS entities. It may be represented explicitly, by incorporating PRES-NAME in the CONGRESS entity, or implicitly, by arranging a relation CONGRESS-SERVED between the entities.

Figure 14 shows a sample of the presidential data base in tabular form. Unavailable information is shown by a φ, e.g., in the Death Date and Inauguration Date columns.

relatively constant data base, for although a President may be replaced, the data about the Administration is still retained. Consequently, there is little updating in our example, though there may be substantial addition to the data base in election years. Some business data bases, however, have a greater propensity to change. For example, a payroll data base regularly has changes to many items, such as YEAR-TO-DATE-PAY (presumably after every payday) and SALARY (presumably after every increase). Thus, the presidential data base, while forming the major example, will not suffice alone. Other authors contributing to this issue of COMPUTING SURVEYS will introduce other examples to illustrate particular fine points.

7. TRENDS AND ISSUES

Historically, we have traced the development of DBMS from the early systems, which primarily supported the nonprogramming user making ad hoc requests, to the recent predominance of host-language systems, which support the programming user. A current trend, then, is the establishment of a balance: a comprehensive set of DBMS functions for a full spectrum of users, while maintaining the current DBMS objectives [F1, 2, and 3]. Some current research is developing bridges between various models of data, so that a single DBMS can support a variety of data models.

Three major trends and one important issue will affect the future of DBMS:
- the emergence of conversational systems;
- the need for geographic distribution of the information system;
- the technological impacts on DBMS architecture; and
- the question of standardization of the DBMS interface.

Each of these is now briefly discussed.

Ad Hoc versus Programming Systems

Artificial intelligence research has already improved our understanding of the difficulties involved in providing a natural language interface for computers. And though there

Computing Surveys, Vol. 8, No. 1, March 1976

Evolution of Data-Base Management Systems • 31

has been little that is immediately applicable, the fall-out from this research includes a better understanding of the structure and use of higher-level and very-high-level (restricted natural) language interfaces. As a result, some DBMS already provide good languages for the nonprogrammer who is willing to learn a few rules. There is also growing interest in the development of the casual-user interface (e.g., see [F4]).

PRESIDENT

PRES-NAME   BIRTH-DATE  DEATH-DATE  PRES-PARTY  SPOUSE      STATE-NAME
Eisenhower  10/14/1890  03/28/1969  Republican  Mamie       Texas
Kennedy     05/29/1917  11/22/1963  Democrat    Jacqueline  Mass.
Johnson     08/27/1908  01/22/1973  Democrat    Claudia     Texas
Nixon       01/09/1913  φ           Republican  Patricia    Calif.
Ford        07/14/1913  φ           Republican  Elizabeth   Mich.

ELECTION

ELECTION-YEAR  PRES-NAME   PRES-VOTES  LOSER      LOSER-PARTY  PARTY-FIRST-YEAR  LOSER-VOTES
1952           Eisenhower  442         Stevenson  Democrat     1824              89
1956           Eisenhower  457         Stevenson  Democrat     1824              73
1960           Kennedy     303         Nixon      Republican   1856              219
1964           Johnson     486         Goldwater  Republican   1856              52
1968           Nixon       301         Humphrey   Democrat     1824              191
                                       Wallace    3rd Party    1968              46
1972           Nixon       520         McGovern   Democrat     1824              17

CONGRESS

CONGRESS-NUMBER  PRES-NAME         SENATE-REPUBLICAN-PERCENT  SENATE-DEMOCRAT-PERCENT  HOUSE-REPUBLICAN-PERCENT  HOUSE-DEMOCRAT-PERCENT
83               Eisenhower        50%                        49%                      49%                       50%
84               Eisenhower        49%                        51%                      47%                       53%
85               Eisenhower        49%                        51%                      46%                       54%
86               Eisenhower        34%                        66%                      35%                       65%
87               Kennedy           36%                        64%                      40%                       60%
88               Kennedy, Johnson  33%                        67%                      41%                       59%
89               Johnson           32%                        68%                      33%                       67%
90               Johnson           36%                        64%                      43%                       57%
91               Nixon             43%                        57%                      44%                       56%
92               Nixon             44%                        54%                      41%                       59%
93               Nixon, Ford       42%                        56%                      44%                       56%
94               Ford              37%                        60%                      33%                       66%

Figure 14. A sample of the presidential data base.



32 • James P. Fry and Edgar H. Sibley

STATE

STATE-NAME  ADMIN-NUMBER  POP       STATE-VOTES
Texas       16            11196730  26
Mass.       φ             5689170   14
Calif.      18            19953134  45
Mich.       12            8875083   19

ADMINISTRATION

ADMIN-NUMBER  PRES-NAME   INAUG-DATE  VP
1             Washington  04/30/1789  Adams
2             Washington  03/04/1793  Adams
16            Polk        03/04/1845  Dallas
18            Fillmore    07/10/1850  φ
49            Eisenhower  01/20/1953  Nixon
50            Eisenhower  01/20/1957  Nixon
51            Kennedy     01/20/1961  Johnson
52            Johnson     11/22/1963  φ
53            Johnson     01/20/1965  Humphrey
54            Nixon       01/20/1969  Agnew
55            Nixon       01/20/1973  Agnew, Ford
56            Ford        08/09/1974  Rockefeller

Figure 14 (contd.). A sample of the presidential data base.
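The text describes two ways of relating entities. An identifying item can be embedded explicitly, such as PRES-NAME in ELECTION. Alternatively, the link can be kept in a separate relationship, such as ADMITTED-DURING between STATE and ADMINISTRATION. The short Python fragment below is a hypothetical sketch of that contrast, not code from any system discussed in this survey. The class names, the `admitted_during` set and the helpers `elections_won_by` and `admin_of` are assumptions made for illustration, and only a few rows of Figure 14 are loaded.

```python
from dataclasses import dataclass
from typing import List, Optional


# Explicit link: ELECTION carries PRES-NAME as one of its own data items.
@dataclass(frozen=True)
class Election:
    election_year: int
    pres_name: str  # explicit reference to the PRESIDENT entity
    pres_votes: int
    loser: str
    loser_party: str
    party_first_year: int
    loser_votes: int


# STATE without an ADMIN-NUMBER item: its link to ADMINISTRATION is kept elsewhere.
@dataclass(frozen=True)
class State:
    state_name: str
    pop: int
    state_votes: int


elections = [
    Election(1952, "Eisenhower", 442, "Stevenson", "Democrat", 1824, 89),
    Election(1956, "Eisenhower", 457, "Stevenson", "Democrat", 1824, 73),
    Election(1960, "Kennedy", 303, "Nixon", "Republican", 1856, 219),
]

states = {
    "Texas": State("Texas", 11196730, 26),
    "Calif.": State("Calif.", 19953134, 45),
    "Mass.": State("Mass.", 5689170, 14),
}

# Implicit link (hypothetical encoding): ADMITTED-DURING as (STATE-NAME, ADMIN-NUMBER) pairs.
# Mass. has no pair, mirroring the φ (unavailable) entry in Figure 14.
admitted_during = {("Texas", 16), ("Calif.", 18)}


def elections_won_by(pres_name: str) -> List[int]:
    """Answer 'which elections did this President win?' via the embedded PRES-NAME."""
    return [e.election_year for e in elections if e.pres_name == pres_name]


def admin_of(state_name: str) -> Optional[int]:
    """Follow the separate ADMITTED-DURING relation; None plays the role of φ."""
    if state_name not in states:
        raise KeyError(state_name)
    matches = [admin for (name, admin) in admitted_during if name == state_name]
    return matches[0] if matches else None


if __name__ == "__main__":
    print(elections_won_by("Eisenhower"))  # [1952, 1956]
    print(admin_of("Calif."))              # 18
    print(admin_of("Mass."))               # None
```

Embedding PRES-NAME turns the "elections won" question into a simple filter over ELECTION, at the cost of repeating the name in every occurrence. The separate relation keeps STATE free of an ADMIN-NUMBER item, and an unavailable value is simply the absence of a pair.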

A casual user is one who uses the system so seldom that all rules and techniques are likely to be forgotten between sessions; hence the need for special treatment. At the other end of the user spectrum are the adept computer programmers, who have technical skills and a good knowledge of "system internals." In writing programs for nonprogrammers, they presumably use all their skills to produce procedures that will run efficiently. The assumption is that programmers cost more (they must be paid while they understand the problem, write code, etc.), but their resulting programs are cheaper to run.

Thus, the case for ad hoc and host-language systems can be considered one of tradeoffs. The following is a partial list of the advantages and disadvantages of using higher-level interfaces:

1) Their use facilitates more rapid running of the problem. The user asks the question directly, and has no need to call on a programmer as intermediary (a process that sometimes takes weeks for even a simple problem in a busy industrial environment). This advantage is offset by the relatively high cost of using what is essentially an interpretive system. The tradeoff is therefore between people costs and machine costs: the people costs are in programming and debugging, while the machine costs are in running. One presumes that the code produced from a high-level (query) interface costs more to run, so the question arises: how many times must the program be run before it pays for the cost of programming? This is the classical question of compiling, but now in the realm of even higher-level languages and with potentially large data bases. There are, however, very few jobs today that warrant the cost of special (assembler or machine language) programming. This trend continues today in DBMS usage, and the self-contained ad hoc user system is becoming more accepted by the user community.

2) The use of a higher-level language simplifies the structure (it removes DO-loops and GO-TO statements), and the program is generally
more understandable, and consequently less error prone. On the other hand, a simple question may invoke a long and costly procedure. For example, "Give me the average height of all Americans" may involve a sequential search of 200 million records! Also, the possibility of ambiguity immediately arises. The request "Give me the count of all people in New York" could be interpreted as "... all people who are, at this instant, in the state of New York." The questioner, however, may have intended to ask for "... all people who have, as their residence, the city of New York." The trouble with this question is obvious, but the user may never realize that the answer given was not correct for the intended question.

3) The very-high-level languages tend to have a mathematical equivalence: they can be transformed into precise mathematical formulas (e.g., in predicate calculus). They are therefore capable of exact checking. In this way, the potentially ambiguous statement can be transformed into an exact statement and "played back" to the questioner, thereby helping to eliminate error. The high-level program, however, does not have an exact statement of its operation in good mathematical terms; it does what the programmer told it to do, good or bad (and all too often the latter). Precision of statement is an advantage to the mathematically sophisticated user, and possibly to others as well.

Thus, the user trend may well be toward the higher-level-language interface. For years to come, however, it will be necessary to program the large and repetitive systems of industry and government efficiently, using the language interfaces currently in use (e.g., COBOL, FORTRAN, PL/I).

Geographically Distributed Systems

Inexpensive communication between computing systems, and the development of national and international networks, have forced further changes on the design of computing systems. In this, DBMS is no exception. The concept of distributed data bases, where a processor calls on data at several other locations, is already a reality on some homogeneous systems, and possibly (with difficulty) on some nonhomogeneous systems. This trend may be seen, in part, as an answer to the wish of industry and government to access their data in reasonable time.

As an example, one major corporation found that the use of a computer network allowed it to strike a corporate dollar balance each Friday. The company could therefore let the money out on short-term loan over the weekend. Surprisingly, the money realized as interest on the loan paid for all the network facilities. Similarly, in many large corporations the warehousing cost is great, because all material resting in inventory represents an unprofitable capital expenditure. Large retail merchandising companies can reduce inventory costs by knowing what is available where in their many warehouses, and so reduce surplus stock. Some large corporations have been able to give their sales forces remote access to their computer systems. This allows the salesman (and through him, the customer) direct on-line access to shipping and pricing information. The competitive advantage is very high in such cases.

Distributed systems, then, show a need for:

1) computers to be networked. It is not generally possible to have all the power at a central site, so each major node (e.g., the largest warehouses) has its own processor.

2) data to be distributed. If the data is entered at hundreds of locations throughout the country, it is probably efficient to store it near the entry port. In some large banking systems, the customer accounts are kept in the computer system at the local bank, but other branches can still service the customer (and debit the account!).

But distributed systems pose many new problems, and exacerbate many old ones. Some of the new problems are revealed in the following questions:

• How do we change the request language? Does the user have to know the location of the data? Is there a central data dictionary/directory? Can the user request data by broadcasting a message for it?
• Is it better to store multiple copies? How much extra will it cost to update a data base from a remote location? What parts of the data base should be stored where (i.e., how does one distribute the data efficiently)? What are the best places to run a program? (It may be cheaper for a user at A to transport data at B to the program at C, and then just receive the answers at A.)

The old problems have already been discussed, but they are now complicated by the extra complexity of the distributed system:

• What redundancy is necessary to ensure good reliability of both hardware and data? How much does this affect the user in terms of the response time for updates, and the excess processing cost?
• What problems are likely to occur in concurrent operation? Several users may all contend for the same resources, and will consequently need effective scheduling and control. This possibility is obviously more acute in a large, distributed, many-user system.
• How can privacy be retained? The potential for breaking the system rises as its complexity increases. The chance of message interception obviously increases also.

Thus, the trend to distributed data bases, with concepts of data machines as special resource nodes on the network, brings with it a new set of tradeoff decisions.

Data-Base Machines

Distributed data bases, in conjunction with emerging technology, will have a significant impact on DBMS architecture and on the DBMS functions. There already are computers dedicated to DBMS, e.g., the Datacomputer [F5, 6]. "Front-end" and "back-end" computers are in the prototype stage [F7, 8]. Also, new disk technologies and associative devices will have a great impact on DBMS architecture [F9, 10].

To Standardize or Not

The computing profession has ambivalent feelings about standardization. Everyone seems to admit it has merits, but finds excuses to stop it from happening too soon in his own field of interest. The arguments for and against standardization (in any area) are now given.

For standards, there is one major argument: the provision of a standard aids the user by making objects interchangeable; the nut, if of the same diameter, fits the bolt. Thus:

• the programming language is the same on all machines, so the programmer who knows COBOL, for example, can be transferred, or may get a new job and not need retraining;
• the company can change machines and run the same COBOL programs, after their recompilation, on the new machine;
• parts are interchangeable: magnetic tapes have standard densities, and plug-to-plug compatibility of storage and input/output units is possible;
• data can be interchanged over the network;
• the network protocol is the same, so all users have to learn only one protocol; and
• the commands to enter (log-on) and leave (log-off) the system, and some other controls, are the same throughout the network.

Against standards, there is one major argument: if we do not know the correct technology, standardization may mean costly refitting later, or may even stifle development. This argument is reasonable, since a large-scale data-processing shop may have many thousands of programs representing millions of dollars of investment. Rewriting all these (probably COBOL) programs in some new language is beyond the wishes of most current DP managers, who hope that their programs are "here to stay." Such built-in conservatism will undoubtedly slow down any change from one well-developed standard to another, no matter how good the new standard may be. This stifles acceptance of new ideas.

Many groups are concerned about standardization and are actively working in this area. The DBTG report has been accepted by the Programming Language Committee
of CODASYL as a part of JOD COBOL. The ANSI/X3/SPARC Study Group on Data Base Systems has been meeting since 1972. Part of their charge is to develop a basis for DBMS standardization. Their recent report [F11] formulates many functional interfaces of a DBMS. The languages used to communicate across these interfaces may be candidates for standardization.

There are therefore many potential areas for standardization of DBMS:

• the definition language for the logical structure;
• the language(s) to manipulate the data;
• the protocols for invoking procedures on the data-base machine;
• the protocols on the network of a distributed system; and
• the storage devices and physical mapping of data.

Each of these has its proponents and opponents for various kinds of system models. As a result, the issue of standardization is a mixture of common sense, politics, economics, philosophy, convenience, and taste. Many researchers consider standards an anathema, but many users see standards as a necessity. The arguments will still be going on fifty years from now (even though there will undoubtedly be DBMS standards by then).

ACKNOWLEDGMENT

The Guest Editor's Introduction to this issue of COMPUTING SURVEYS has already expressed gratitude to the wide variety of experts who made this article possible. These include developers of DBMS who implemented the systems and who helped us correct errors in the history, as well as our reviewers. V. Kevin Whitney and Richard G. Canning in particular made many valuable suggestions.

CLASSIFICATION OF REFERENCES

A   Data Administration
D   Data Dictionary
DL  Data Definition Language
F   Future, Trends
G   General
I   Introductory
M   Data Models--Theory
PL  Programming Languages
S   DBMS Specifications
SL  Stored-Data Definition
T   DBMS Texts
U   Surveys
V   Vendor Systems
W   Report Generator
X   DBMS Prior to 1968
Y   DBMS 1968 to Present
Z   Relational Systems

REFERENCES

(A) Data Administration

[A1] EVEREST, G. C., "Data base administrator organizational role and functions," MISRC-WP-73-05.
[A2] GUIDE INTERNATIONAL, "The data base administrator," Nov. 1972.
[A3] CANNING, R. G. (Ed.), "The data administrator function," EDP Analyzer 10, 11 (Nov. 1972).
[A4] NOLAN, R., "Computer data bases: the future is now," Harvard Business Review (Sept. 1973).

(D) Data Dictionary

[D1] UHROWCZIK, P. P., "Data dictionary/directories," IBM Systems J. 12, 4 (Dec. 1973).
[D2] CANNING, R. G. (Ed.), "The data dictionary/directory function," EDP Analyzer 12, 10 (Nov. 1974).

(DL) Data Definition Language

[DL1] WILWORTH, N. E., "System data control," Tech. Memo. TM222/013/00, System Development Corp., Santa Monica, Calif., August 1975.
[DL2] D'IMPERIO, M., "Data structures and their representation in storage," in Annual Review in Automatic Programming, M. Halpern and C. J. Shaw (Eds.), Pergamon Press, Elmsford, New York, 1969, pp. 1-75.
[DL3] SENKO, M.; ALTMAN, E.; ASTRAHAN, M.; AND FEHDER, P., "Data structures and accessing in data-base systems," IBM Systems J. 12, 1 (1973), 30-93.
[DL4] SENKO, M. E., "Data description language in the context of a multilevel structured description: DIAM II with FORAL," IBM Research Report g6769, 1974.
[DL5] SENKO, M. E., "The DDL in the context of a multilevel structured description: DIAM II with FORAL," in Data Base Description, B. C. M. Douque and G. M. Nijssen (Eds.), North-Holland Publ. Co., Amsterdam, The Netherlands, 1975.

(F) Future, Trends

[F1] WHITNEY, V. KEVIN, "Fourth generation data management systems," Proc. of AFIPS National Computer Conf., 1973, Vol. 42, AFIPS Press, Montvale, N.J., 1973, pp. 239-244.
[F2] BACHMAN, CHARLES, "Trends in data-base management," Proc. of AFIPS National Computer Conf., 1975, Vol. 44,
AFIPS Press, Montvale, N.J., 1975, pp. 569-576.
[F3] EVEREST, GORDON C., "The futures of data-base management," Proc. 1974 SIGMOD Conf., May 1974, pp. 445-462.
[F4] CODD, E. F., "Seven steps to rendezvous with the casual user," Proc. IFIP TC-2 Working Conf. on Data Base Management System Congress, April 1974, North-Holland Publ. Co., Amsterdam, The Netherlands, 1974.
[F5] MARILL, THOMAS; AND STERN, DALE, "The datacomputer--a network data utility," Proc. of AFIPS National Computer Conf., 1975, Vol. 44, AFIPS Press, Montvale, N.J., 1975, pp. 389-395.
[F6] MARILL, T.; AND STERN, D., DATACOMPUTER VERSION I USER MANUAL, Working paper no. 11, Computer Corp. of America, Cambridge, Mass., August 1975.
[F7] CANADAY, R. H.; HARRISON, R. D.; IVIE, E. L.; RYDER, J. L.; AND WEHR, L. A., "A back-end computer for data base management," Comm. ACM 17, 10 (Oct. 1974), 575-582.
[F8] HEACOX, H. C.; COSLOY, E. S.; AND COHEN, J. B., "An experiment in dedicated data management," in Proc. of Internatl. Conf. on Very Large Data Bases, Sept. 1975, ACM, New York, 1975, pp. 511-513.
[F9] SU, S. Y. W.; COPELAND, G. P.; AND LIPOVSKI, G. J., "Retrieval operations and data representations in a context-addressed disk system," Proc. ACM SIGPLAN-SIGIR Interface Meeting on Programming Languages and Information Retrieval, Nov. 1973, pp. 144-160.
[F10] LIN, C. S.; AND SMITH, D. C. P., "The design of a rotating associative array memory for a relational data-base management application," ACM TODS 1, 1 (March 1976), 53-65.
[F11] ANSI/X3/SPARC STUDY GROUP--DATA BASE SYSTEMS, "Interim report," ACM SIGMOD Newsletter: FDT 7, 2 (Dec. 1975).

(G) General

[G1] McGEE, W. C., "Generalization: key to successful electronic data processing," J. ACM 6, 1 (Jan. 1959), 1-23.
[G2] McGEE, R. C.; AND TELLIER, H., "A re-evaluation of generalization," Datamation (July-August 1960), 25-38.
[G3] YAO, S. B.; AND MERTEN, A. G., "Selection of file organization using an analytic model," Proc. of the Internatl. Conf. on Very Large Data Bases, Sept. 1975, ACM, New York, 1975, pp. 255-267.
[G4] STEEL, T., "Beginnings of a theory of information handling," Comm. ACM 7, 2 (Feb. 1964), 87-103.
[G5] ROSEN, SAUL, "Programming systems and languages--a historical survey," Proc. of the Spring Jt. Computer Conf., 1964, Vol. 25, AFIPS Press, Montvale, N.J., 1964, pp. 1-25.
[G6] ROSIN, ROBERT F., "Supervisory and monitor systems," Computing Surveys 1, 1 (March 1969), 37-54.
[G7] DENNING, PETER J., "Third generation computer systems," Computing Surveys 3, 4 (Dec. 1971), 175-216.
[G8] EVEREST, GORDON C.; AND SIBLEY, EDGAR H., "A critique of the GUIDE-SHARE data-base management system requirements," Proc. of the 1971 ACM SIGFIDET Annual Workshop on "Data Description, Access and Control," E. F. Codd and A. L. Dean (Eds.), pp. 93-112; also MISRC-WP-71-2.
[G9] WILLNER, S. E.; BANDURSKI, A. E.; GORHAN, W. C.; AND WALLACE, M. A., "COMRADE data management system," Proc. AFIPS National Computer Conf., 1973, Vol. 42, AFIPS Press, Montvale, N.J., 1973, pp. 339-345.
[G10] BACHMAN, C. W., "The programmer as navigator," Comm. ACM 16, 11 (Nov. 1973), 653-658.
[G11] GARTH, W., "Design console technology at General Motors," Proc. SHARE 1974 Conf., August 1974.
[G12] EVEREST, GORDON C., "The objectives of data-base management," Information Systems COINS IV (Tou), Plenum Press, New York, 1974, pp. 1-3h; also MISRC-WP-71-64.
[G13] NAVATHE, S. B.; AND FRY, J. P., "Restructuring for large data bases: three levels of abstraction," ACM TODS, to appear in June 1976.
[G14] TEOREY, T. J.; AND DAS, K. S., "Application of an analytical model to evaluate storage structures," Data Translation Technical Report No. 76DE 7.1, Univ. of Michigan Graduate School of Business Administration, Ann Arbor, 1976.

(I) Introductory

[I1] LYON, J. K., Introduction to data base design, Wiley-Interscience, Div. of John Wiley and Sons, New York, 1971.
[I2] BACHMAN, C. W., "Data structure diagrams," SIGBDP: Data Base 1, 2 (1969).
[I3] BYRNES, C.; AND STEIG, D., "File management systems: a current summary," Datamation 15, 11 (Nov. 1969).
[I4] OLLE, T. W., "MIS: data bases," Datamation 16, 15 (Nov. 1970).
[I5] DIXON, PAUL, "The role of data management in management information systems," IAG Journal 3, 2 (August 1970).
[I6] CODASYL SYSTEMS COMMITTEE, "Introduction to 'feature analysis of generalized data-base management'," Comm. ACM 14, 5 (May 1971), 308-318.

(M) Data Models--Theory

[M1] CODASYL DEVELOPMENT COMMITTEE, "An information algebra, phase I report of the Language Structure Group," Comm. ACM 5, 4 (April 1962), 190-204.
[M2] CHILDS, D. L., "Feasibility of a set-theoretic data structure: a general structure based on a reconstituted definition of relation," Proc. IFIP Congress 1968, North-Holland Publ. Co., Amsterdam, The Netherlands, 1968, pp. 420-430.
CHILDS, D. L., "Description of a set-theoretic data structure," Proc. AFIPS Fall Jt. Computer Conf., 1968, AFIPS Press, Montvale, N.J., 1968, pp. 557-564.
[M3] CODD, E. F., "A relational model of data for large shared data banks," Comm. ACM 13, 6 (June 1970), 377-387.
[M4] HARDGRAVE, W. T., "A technique for implementing a set processor," Proc. ACM SIGMOD/SIGPLAN Conf. on Data: Abstraction, Definition and Structure, 1976, Taylor and Ledgard (Eds.).

(PL) Programming Languages

[PL1] CLIPPINGER, R. F., "FACT: a business compiler; description and comparison with COBOL and Commercial Translator," in Annual Review in Automatic Programming 12, 1, Pergamon Press, pp. 231-292.
[PL2] INTERNATIONAL BUSINESS MACHINES, General Information Manual--IBM Commercial Translator, Form F28-8043, IBM Corp., 1960.
[PL3] GENERAL ELECTRIC, "GECOM--the general compiler," CP13 144 (IOM-4-61), General Electric Corp. Computer Dept., April 1961.
[PL4] PERSTEIN, M. H., The JOVIAL J3 Grammar and Lexicon, Tech. Memo No. TM555/002/04, System Development Corp., Santa Monica, Calif., 1965.
[PL5] SAMMET, J. E., "Basic elements of COBOL 61," Comm. ACM 5, 5 (May 1962), 237-253.
[PL6] CODASYL DATA BASE TASK GROUP, "COBOL extensions to handle data bases," (PB 177 682), Jan. 1968.
[PL7] COBOL, American National Standard Programming Language COBOL, X3.23-1974, American National Standards Institute, Inc., New York, 1974.

(S) DBMS Specifications

[S1] CODASYL DATA BASE TASK GROUP, October 1969 Report (superseded by 1971 DBTG Report, currently out of print).
[S2] CODASYL DATA BASE TASK GROUP, April 1971 Report (available from ACM).
[S3] CODASYL DATA DESCRIPTION LANGUAGE COMMITTEE, CODASYL Data Description Language Journal of Development (June 1973), NBS Handbook 113 (Jan. 1974).

(SL) Stored-Data Definition

[SL1] SMITH, D. C. P., "An approach to data description and conversion," PhD Thesis, Moore School Report 72-20, Univ. of Pennsylvania, Philadelphia.
[SL2] CODASYL STORAGE STRUCTURE DEFINITION LANGUAGE TASK GROUP, Design Objectives for a Storage Structure Definition Language, 1970. (See [SL5].)
[SL3] STORAGE STRUCTURE DEFINITION LANGUAGE TASK GROUP (SSDLTG) of CODASYL Systems Committee:
FRY, J. P., "Introduction to storage structure definition."
McGEE, W. C., "Informal definitions for the development of a storage structure definition language."
YOUNG, J. W., JR., "A procedural approach to file translation."
SIBLEY, E.; AND TAYLOR, R., "Preliminary discussion of a general data to storage structure mapping language," Proc. 1970 ACM SIGFIDET Workshop on Data Description and Access, pp. 368-380.
[SL4] TAYLOR, R. W., "Generalized data-base management system data structures and their mapping to physical storage," PhD Thesis, Univ. of Michigan, 1971.
[SL5] FRY, J. P.; SMITH, D. C. P.; AND TAYLOR, R. W., "An approach to stored-data definition and translation," Proc. of 1972 ACM SIGFIDET Workshop on Data Description, Access and Control, pp. 13-55.
[SL6] BACHMAN, C. W., "The evolution of storage structures," Comm. ACM 15, 7 (July 1972), 628-634.
[SL7] SIBLEY, EDGAR H.; AND TAYLOR, ROBERT W., "A data definition and mapping language," Comm. ACM 16, 12 (Dec. 1973), 750-759.

(T) DBMS Texts

[T1] MARTIN, J., Computer data-base organization, Prentice-Hall, Englewood Cliffs, N.J., 1975.
[T2] DATE, C. J., An introduction to data-base systems, Addison-Wesley, Reading, Mass., 1975.
[T3] CAGAN, C., Data management systems, Melville Publ. Co., Los Angeles, Calif., 1973.
[T4] LEFKOVITZ, D., Data management for on-line systems, Hayden Publ. Co., Rochelle Park, N.J., 1974.

(U) Surveys

[U1] MINKER, JACK; AND SABLE, JEROME, "File organization and data management," in Annual Review of Information Science and Technology, Vol. 2, John Wiley & Sons, New York, 1967, pp. 185-222.
[U2] CODASYL SYSTEMS COMMITTEE, A survey of generalized data base management systems (PB 203142), May 1969.
[U3] FRY, J. P., et al., "Data management systems survey," MITRE Corporation Report MTP 3~J (AD 684 907), Jan. 1969.
[U4] MINKER, J., "Generalized data management systems--some perspectives," Univ. of Maryland Computer Science Center Technical Report, Dec. 1969.
[U5] FRY, J. P.; AND GOSDEN, J. A., "Survey of management information systems and their languages," in Critical Factors in Data Management, F. Gruenberger (Ed.), McGraw-Hill Book Co., 1969, pp. 41-55.
[U6] GUIDE INT., "Comparison of data-base management systems," Oct. 1971.
[U7] CODASYL SYSTEMS COMMITTEE, Feature analysis of generalized data base
Management Systems, May 1971 (available from ACM).
[U8] KOEHR, G. J., et al., "Data management systems catalogue," The MITRE Corp., Technical Report MTP 139, Jan. 1973 (available from The MITRE Corp.).

(V) Vendor Systems

APPLICATIONS SOFTWARE, INC.
Corporate Offices
21515 Hawthorne Boulevard
Torrance, Calif. 90503
[V1] SUNDEEN, D. H., AS-IST--a general purpose management system, Application Software, Inc., San Pedro, Calif., 1968.

BURROUGHS CORPORATION
Burroughs Place
Detroit, Mich. 48232
[V2] BURROUGHS CORPORATION, "B6700/B7700 DMS II data and structure definition language (DASDL) reference manual," April 1974.
BURROUGHS CORPORATION, "B6700/B7700 DMS II host language interface reference manual," April 1974.

CINCOM SYSTEMS, INC.
2300 Montana Avenue
Cincinnati, Ohio 45211
[V3] CINCOM SYSTEMS, "TOTAL/7 reference manual--application programming," Pub. #P02-1321-2, June 1974.
CINCOM SYSTEMS, "TOTAL/7 reference manual--data base administration," Pub. #P02-1322-2, June 1974.

COMPUTER CORPORATION OF AMERICA
575 Technology Square
Cambridge, Mass. 02139
[V4] COMPUTER CORPORATION OF AMERICA, "CCA 204 data base management software system user language reference manual," March 1974.

CONTROL DATA CORPORATION
8100 34th Avenue South
Minneapolis, Minn. 55420
[V5] CONTROL DATA CORPORATION, MARS VI reference manual, Pub. 17305100, CDC, 1974.
CONTROL DATA CORPORATION, MARS VI reference manual (full inversion), Pub. 17313000.
CONTROL DATA CORPORATION, MARS VI reference manual (partial inversion), Pub. 60385900.
[V6] CONTROL DATA CORPORATION, "Query update version 2.0 reference manual," Pub. 60307500.
CONTROL DATA CORPORATION, "Data definition language for query/update subschema," Pub. 60359200.

CULLINANE CORPORATION
One Boston Place
Boston, Mass. 02108
[V7] CULLINANE CORPORATION, "Integrated database management system program and reference."

DIGITAL EQUIPMENT CORPORATION
146 Main Street
Maynard, Mass. 01754
[V8] DECsystem 10, "Data base management system programmer procedures manual," DEC-10-APPMA-B-D, 2d ed.
DECsystem 10, "Data base management system data base administration procedures manual," DEC-10-AAPMA-B-D, 2d ed.

HONEYWELL INFORMATION SYSTEMS
200 Smith Street
Waltham, Mass. 02154
[V9] I-D-S/II RELATED PUBLICATIONS:
I-D-S/II programmer reference manual, DE09.
I-D-S/II data base administrator guide, DE10.
Interactive I-D-S/II reference manual, DE11.
UFAS (United File Access System), DC89.
I/O supervisor, DD82.
File management supervisor, DD45.
[V10] MANAGEMENT DATA QUERY SYSTEM (MDQS):
MDQS User Guide, DC80.
MDQS Data Base Administrator Guide, DC81.
MDQS IV, DD92.
MDQS IV Administrator Guide, DD94.
[V11] DRESSEN, P. C., "The dataBASIC language--a data processing language for non-professional programmers," Proc. AFIPS Spring Jt. Computer Conf., 1970, AFIPS Press, Montvale, N.J., 1970, pp. 307-312.

INFORMATICS
MARK IV Systems Co.
21050 Vanowen Street
Canoga, Calif. 91303
[V12] POSTLEY, J. A., "The MARK IV system," Datamation 14, 1 (Jan. 1968), 28-30.

IBM (for IBM information, see local representative)
[V13] Information Management System/360 (IMS/360) application description manual, IBM Form No. H20-0524.
[V14] Information Management System/360 Version ~, general information manual, IBM Form No. GH20-0765.
[V15] Information Management System/Virtual Storage (IMS/VS) general information manual, IBM Form No. GH20-1260.
[V16] Generalized information system application description manual, IBM Form No. GH20-0179.
[V17] BRYANT, J. H.; AND SEMPLE, P., "GIS and file management," Proc. ACM 1966 National Conf., ACM, New York, N.Y., 1966, pp. 97-107.

MRI SYSTEMS CORPORATION
Box 9968
Austin, Texas 78766
[V18] SYSTEM 2000 PUBLICATIONS:
General information manual, G-1.
BASIC reference manual (includes binder), A-1.
Immediate access feature, I-1.
COBOL procedural language interface feature, C-1.
FORTRAN procedural language interface feature, F-1.
PL/I procedural language interface feature, P-1.
Report writer feature, R-1.

PHILIPS-ELECTROLOGICA, B.V.
PO Box 245
Apeldoorn, The Netherlands
[V19] PUBLICATIONS:
Introduction to PHOLAS, Pub. no. 5122 991 25221.
PHOLAS sub-schema DDL and SML, Pub. no. 5122 991 25861.
PHOLAS schema DDL and SSL, Pub. no. 5122 991 25841.

SHIPPING RESEARCH SERVICES, INC.
205 S. Whiting Street
Alexandria, Va. 22304
[V20] SHIPPING RESEARCH SERVICES, INC., "The data base system SIBAS: an introduction," 1974.
ASCHIM, F. F.; AND BOONE, P., "SIBAS--an implementation of the CODASYL data base concept," Management Informatics 2, 3 (1973).

SOFTWARE AG
Reston International Center
11800 Sunrise Valley Drive
Reston, Va. 22091
61 Darmstadt
Hilpertstrasse 20
West Germany
[V21] ADABAS introduction; ADABAS reference manual; AND ADABAS utilities manual, Software AG of North America, Reston, Va.

SPERRY UNIVAC
PO Box 500
Blue Bell, Pa. 19422
ATV House
17 Great Cumberland Place
London W1, England
[V22] "UNIVAC 1100 Series, Data Management System (DMS 1100) schema definition, data administrator reference," Sperry Rand Corp., 1972, 1973.
"UNIVAC 1100 Series, Data Management Systems (DMS 1100) American National Standard COBOL (Fieldata), data manipulation language, programmer reference," Sperry Rand Corp., 1972.

XEROX CORPORATION
701 South Aviation Boulevard
El Segundo, Calif. 90245
[V23] "XEROX EXTENDED DMS SIGMA 6/7/9," Reference Manual, 903012B, February 1974.

SET THEORETIC INFORMATION CORPORATION
117 N. 1st Street
Ann Arbor, Mich. 48104
[V24] SET THEORETIC INFORMATION CORPORATION, "STDS/I reference guide," 1975.

(W) Report Generator

[W1] "SHARE 7090 9PAC, Part I: Introduction and general principles," in 7090 Programming Systems, Systems Reference Library, IBM, File 7090-28, Form J28-6166-1, p. 32, 1961.
[W2] LESLIE, H., "The report generator," Datamation, (June 1967), 26-28.
[W3] FRIEDBERG, L. M., "RPG: the coming of AGE," Datamation, (June 1967), 29-31.
[W4] LONGO, F., "SURGE: a recording of the COBOL merchandise control algorithm," Comm. ACM 5, 2 (Feb. 1962), 98-100.

(X) DBMS prior to 1968

[X1] MILLER, L.; MINKER, J.; REED, W.; AND SHINDLE, W., "A multi-level file struc- ... Books, New York, 1960, pp. 53-59.
[X2] GREEN, B. V.; WOLF, A. K.; CHOMSKY, C.; AND LAUGHERY, J., "Baseball, an automated question-answer," Proc. Western Jt. Computer Conf., May 1961, Spartan Books, New York, 1961.
[X3] VESPER, N. R., Information Retrieval Program, Report C-1210 (David Taylor Model Basin), May 1961.
[X4] POSTLEY, J. A.; AND BUETTELL, T. D., "Generalized information retrieval and listing system," Datamation 4, 12 (Dec. 1962), 22-26.
[X5] User's Manual for TUG-Format Table Tape Updater and Generator, Naval Command Systems Support Activity, prepared by International Business Machines Corp., Rockville, Md., Oct. 1962.
[X6] COLILLA, R. A.; AND SAMS, B. H., "Information structure for processing and retrieving," Comm. ACM 5, 1 (Jan. 1962), 11-15.
[X7] CHEATHAM, T. E., JR.; AND WARSHALL, S., "Translation of retrieval requests couched in a 'semiformal' English-like language," Comm. ACM 5, 1 (Jan. 1962), 34-39.
[X8] CLIMENSON, W. D., "RECOL--a retrieval command language," Comm. ACM 6, 3 (March 1963), 117-122.
[X9] NAVAL COMMAND SYSTEMS SUPPORT ACTIVITY, "User's manual for the 704/7090 information retrieval," NAVCOSSACT Document No. 10S001, CM-76, Nov. 1963.
[X10] Intelligence Data Processing System Formatted File System, U.S. Navy Fleet Intelligence Center and Intelligence Systems Dept., IBM Federal Systems Div., May 1963.
Vol. 1. Program description
Vol. 2. Program flow diagrams and listings

40 • James P. Fry and Edgar H. Sibley

Vol. 4. Information system design and utilization
Vol. 5. Information retrieval
[X11] NAVAL COMMAND SYSTEMS SUPPORT ACTIVITY, "User's manual for NAVCOSSACT information processing system phase I library maintenance system," NAVCOSSACT Document No. 88M008, CM-52, August 1963.
[X12] NAVAL COMMAND SYSTEMS SUPPORT ACTIVITY, "User's manual for NAVCOSSACT information processing system phase I," NAVCOSSACT Document No. 90S003A, CM-51, July 1963. Supplement I published Jan. 1964.
[X13] SYSTEM DEVELOPMENT CORP., "System design specifications for LUCID phase I," Tech. Memo No. TM-1749/000/00, Santa Monica, Calif., Jan. 1964.
Vol. 1. LUCID control system design
Part 1. The Master Tape, Tech Memo No. TM-1749/101/00.
Part 2. Parameter Load, Tech Memo No. TM-1749/102/00.
Part 3. Operational Control, Tech Memo No. TM-1749/103/00.
Part 4. Test Set-Up, Tech Memo No. TM-1749/104/00.
Vol. 2. "GENDARME data processing facilities," Tech. Memo No. TM-1749/201/00.
Vol. 3. "LUCID program design: the grammar of OPAQUE," Tech. Memo No. TM-1749/301/00.
[X14] BRYANT, J. H., "AIDS experience in managing data-base operation," Proc. of the Symposium on Development and Management of a Computer-Centered Data Base, A. Walker, (Ed.), System Development Corp., Santa Monica, Calif., 1964, pp. 36-42.
[X15] BACHMAN, C. W.; AND WILLIAMS, S. B., "A general purpose programming system for random access memories," Proc. AFIPS Fall Jt. Computer Conf., 1964, Vol. 26, Spartan Books, New York, 1964, pp. 411-422.
[X16] NAVAL COMMAND SYSTEMS SUPPORT ACTIVITY, "User's manual 1401 TUFF tape updater for formatted files," NAVCOSSACT Document No. 90S012W, CM-108, ... 1964.
[X17] NMCS INFORMATION PROCESSING SYSTEM (NIPS), IBM 1410, NMCS Support Center, Washington, D.C., 1964.
[X18] INTEGRATED DATA STORE--A NEW CONCEPT IN DATA MANAGEMENT, Publication CPB-483 (5C10-16), General Electric Co.
[X19] NAVAL COMMAND SYSTEMS SUPPORT ACTIVITY, "7090 information processing system revised," NAVCOSSACT Document No. 90M002, CM-01, Oct. 1965.
[X20] NAVAL COMMAND SYSTEMS SUPPORT ACTIVITY, "User's manual for 704/7090 TUFF MOP III tape updater for formatted files," NAVCOSSACT Document No. 10S001, CM-74, Nov. 1963. Change 1 published Feb. 1964. Change 2 published August 1965.

[X21] GRANT, E., LUCID User's Manual, Tech Memo No. TM-2354/001, System Development Corp., Santa Monica, Calif., June 1965.
[X22] SPITZER, J. F., et al., "The COLINGO system design philosophy," in Information System Sciences, Proc. of the Second Congress, 1965, Spartan Books, New York, 1965, pp. 36-39.
[X23] SDS MANAGE REFERENCE MANUAL, Publication 90-10-46A, Scientific Data Systems, May 1966.
[X24] CONNORS, T. L., "ADAM--a generalized data management system," Proc. AFIPS Spring Jt. Computer Conf., 1966, Vol. 28, Spartan Books, New York, 1966, pp. 193-203.
[X25] A USER'S GUIDE TO THE ADAM SYSTEM, MTR-268, MITRE Corp., (AD 664 332), August 1966.
[X26] IDHS 1410 FORMATTED FILE SYSTEM: FILE MAINTENANCE AND FILE GENERATION MANUAL, Defense Intelligence Agency, DIAM-65-9-1, August 1966. Also, IDHS 1410 FORMATTED FILE SYSTEM: RETRIEVAL AND OUTPUT MANUAL, DIAM-69-9-2.
[X27] A DESCRIPTION OF THE INTERNAL OPERATIONS OF THE ADAM SYSTEM, MTR-216, MITRE Corp., (AD 660 581), August 1966.
[X28] DODD, G. G., "APL--a language for associative data handling in PL/I," Proc. AFIPS Fall Jt. Computer Conf., 1966, Vol. 29, Spartan Books, New York, 1966, pp. 667-684.
[X29] RHAUS, A.; AND MILLS, R., The Time-Shared Data Management System: A New Approach to Data Management, Tech Memo SP-2747, System Development Corp., Santa Monica, Calif., 1967.
WILLIAMS, W. D.; AND BARTRAM, ... C., COMPOSE/PRODUCE: A User-Oriented Report Generator Capability Within the SDC Time-Shared Data Management System, Tech Memo SP-2634, System Development Corp., Santa Monica, Calif., 1967.
[X30] STEIL, G. P., "File management on a small computer," Proc. 1967 AFIPS Spring Jt. Computer Conf., Spartan Books, New York, 1967, pp. 199-203.
[X31] DIXON, PAUL J.; AND SABLE, J., "DM-1--A generalized data management system," Proc. AFIPS Spring Jt. Computer Conf., Vol. 30, 1967, pp. 185-198.
[X32] NAVAL COMMAND SYSTEMS SUPPORT ACTIVITY, "User's manual for information processing system phase 3A for the AN/FYK-1 (V) data processing set," NAVCOSSACT Document No. 88M001A, CM-123, Revision 5, August 1967.
[X33] AFLC/ESD/MITRE, Advanced Data Management (ADAM) Experiments, Final Report, (AD 648 226), Feb. 1967.
[X34] BROWN, R.; AND NORDYKE, G. P., "ICS: an information control system," Proc. IFIPS Conf. Mechanized Information Storage, Retrieval and Dissemination, 1967, North-Holland Publ. Co., Amsterdam, The Netherlands, 1967.
[X35] Data Language No. 1 (DL-1) Encyclopedia, Pub. SM-F, North American Aviation,

Inc., and International Business Machines Corp., 1967.
[X37] GILDEA, R. A., Evaluation of ADAM, an advanced data management system, MITRE Corp., (AD 661 273), May 1967.

(Y) DBMS 1968 to Present

[Y1] BLEIER, R. E., Treating Hierarchical Data Structures in the SDC Time-Shared Data Management System (TDMS), Tech Memo SP-2750, System Development Corp., Santa Monica, Calif., 1968.
[Y2] BLEIER, R. E.; AND VORHAUS, A., File Organization in the SDC Time-Shared Data Management System (TDMS), Tech Memo SP-2750, System Development Corp., Santa Monica, Calif., 1968.
[Y3] RAUCHER, V.; AND SCHWIMMER, H. S., The Time-Shared Data Management System (TDMS), Language Specifications, Tech Memo TM-3370, System Development Corp., Santa Monica, Calif., 1968.
[Y4] SDS/9 SERIES Manage, Publication No. CB 10035, Scientific Data Systems, 1968.
[Y5] ATLEE, E. S., et al., "COGENT III functional specifications," Computer Sciences Corporation, 1968.
[Y6] WELSH, W. A., "Engineered design of EDP systems," Systems and Procedures Association Internal Meeting, October 1968.
[Y7] "Remote file management system (RFMS)," Computation Center Technical Staff Documentation, Publications 0 to 14, Univ. of Texas at Austin, 1968.
[Y8] MANGOLD, C. A., "COBOL data management system (CDMS) briefing," Proc. Guide, 30, (May 1970), 175-729.
[Y9] McELROY, D. C., "The SERIES data management system," Datamation 16, 4 (April 1971), 131-136.
[Y10] NAVAL COMMAND SYSTEMS SUPPORT ACTIVITY, "Information processing system (IPS) user's guide," NAVCOSSACT Document No. 85M904, TR-03, September 1971. Change 1 published Feb. 1972. Change 2 published Feb. 1973.
[Y11] MEINERS, E. E., "A machine-independent data management system," Datamation, 19, 6 (June 1973), 92-98.
[Y12] NMCS INFORMATION PROCESSING SYSTEM 360 FORMATTED FILE SYSTEM (NIPS FFS), NMCS Support Center, CSM UM 15-74 (October 1974).
Vol. I: Introduction to file concepts.
Vol. II: File structuring (FS).
Vol. III: File maintenance (FM).
Vol. IV: Retrieval and sort processor (RASP).
Vol. V: Output processor (OP).
Vol. VI: Terminal processing (TP).
Vol. VII: Utility support (U).
Vol. VIII: Job preparation.
Vol. IX: Error codes.
TR 54-74: Installation of NIPS 360 FFS.

(Z) Relational Systems

[Z1] GOLDSTEIN, R. C.; AND STRNAD, A. L., "The MacAIMS data management system," Proc. 1970 ACM-SIGFIDET Workshop on Data Description and Access, Nov. 1970, pp. 201-229.
[Z2] McINTOSH, S.; AND GRIFFEL, D., "Data management for a penny a byte," Computer Decisions, (May 1973).
[Z3] WHITNEY, V. K. M., "RDMS: a relational data management system," Proc. Fourth Internatl. Symposium on Computer and Information Sciences (COINS IV), Dec. 1972, Plenum Press, New York, 1972.

AVAILABILITY OF REFERENCES

EDP Analyzer
Canning Publications, Inc.
925 Anza Avenue
Vista, Calif. 92083
Publications: EDP Analyzer

ACM
Association for Computing Machinery
1133 Avenue of the Americas
New York, N.Y. 10036
(212) 265-6300
Publications: SIGBDP; DBTG Specifications; CODASYL Systems Committee Report; SIGMOD Proceedings; SIGFIDET Proceedings; Comm. ACM; J. ACM; TODS; Very Large Data Base Proceedings

Management Information Systems Research Center
Graduate School of Business Administration
University of Minnesota
Minneapolis, Minn. 55455
Publications: MISRC Publications

IFIP Administrative Data Processing Group
6 Stadhouderskade
Amsterdam 1013, The Netherlands
Publications: CODASYL Systems Committee; DBTG Specification; IAG Journal


Technical Services Branch
Department of Supply and Services
88 Metcalfe Street
Fifth Floor
Ottawa, Ont., Canada K1A 0S5
Publications: CODASYL COBOL Specification

British Computer Society
29 Portland Place
London W1, England
Publications: CODASYL Systems Committee; DBTG Specifications

National Technical Information Service
5285 Port Royal Road
Springfield, Va. 22151
Publications: Documents with AD or PB numbers

SHARE Inc.
One Illinois Center
111 E. Wacker Drive
Suite 600
Chicago, Ill. 60601
Publications: SHARE Proceedings

GUIDE Int.
Mr. Sandy Hill
Smith, Bucklin, and Associates
111 E. Wacker Drive
Chicago, Ill. 60601
Publications: GUIDE Proceedings

System Development Corp.
2500 Colorado Boulevard
Santa Monica, Calif.
Publications: SDC Technical Reports, Memorandums

The MITRE Corp.
Bedford Operations
Box 207
Bedford, Mass.
Washington Operations
Westgate Research Park
McLean, Va. 22101
Publications: MITRE Technical Reports
