Anecdotes

Early History of SQL


Donald D. Chamberlin
Editor: Craig Partridge

Ray Boyce and I first met E.F. (Ted) Codd at a symposium he organized at the IBM T.J. Watson Research Center in Yorktown Heights, New York, in 1972. Ray and I were both recent hires at the Watson Center. I had recently completed my PhD at Stanford University, and Ray had completed his at Purdue University. We were members of a recently reorganized IBM group that was looking for a mission. At that time, Ted Codd was a computer scientist at IBM’s San Jose Research Laboratory and was proposing a new way of organizing data that he called the “relational data model.”

One of the most important research areas in computer science in the early 1970s was the development of systems and languages for handling what computer scientists call persistent data. This term denotes data that remains in a computer system indefinitely, until it is explicitly deleted. Systems for managing persistent data were spreading quickly in the business world. A database management language proposed by the Codasyl Data Base Task Group (DBTG)[1] was receiving a lot of attention. Ray and I spent some time studying this language, learning concepts such as “currency indicators” and “set occurrence selection.” With a little practice, we learned how to represent a database query in the form of a program that navigated through a network of pointers to find the desired information.

Designing a Relational Language
For Ray and me, our exposure to the relational data model at Codd’s research symposium was a revelation. For the first time, we could see how a query that would require a complex program in the DBTG language could be reduced to a few simple lines using one of Codd’s relational languages. It became a game for the two of us to invent queries and challenge each other to express them in various query languages.

One of the queries that came out of this game was as follows: “Find names of employees who earn more than their managers.” The query was based on a three-column employee table. Each row of the table represented an employee and contained a name, a salary, and the name of the employee’s manager. (This is a simple example. In a real application, employees would be identified by some unique identifier such as an employee number.) Table 1 shows the structure of the table with four example rows.

Table 1. Employee.

Name    Salary  Manager
Smith   45,000  Harker
Jones   40,000  Smith
Baker   50,000  Smith
Nelson  55,000  Baker

The third row of the table indicates that Baker’s salary is $50,000 and Baker’s manager is Smith. The first row indicates that Smith’s salary is $45,000, so Baker earns more than his manager. Similarly, Nelson’s salary is $55,000, but Nelson’s manager is Baker, who earns $50,000, so Nelson also earns more than his manager. The result of the query, based on these four sample rows, is Baker and Nelson.

In his research papers, Codd introduced two relational query languages, called Relational Algebra[2] and Relational Calculus (also known as the Data Sublanguage Alpha[3]). Relational Algebra consists of several operators, usually represented by symbols such as those in Figure 1. Using these operators, the query about well-paid employees could be represented as in Figure 2a.

Codd’s Relational Calculus was based on a notation used in formal logic, using an existential quantifier ∃ (meaning “there exists”) and a universal quantifier ∀ (meaning “for all”). Similar to Relational Algebra, Relational Calculus could represent the well-paid employee query compactly (see Figure 2b).

Ray and I were impressed by how compactly Codd’s languages could represent complex queries. However, at the same time, we believed that it should be possible to design a relational language that would be more accessible to users without formal training in mathematics or computer programming. We believed that barriers to widespread acceptance of Codd’s languages existed on two levels. The first barrier came from the mathematical notation, which was hard to enter at a keyboard. This barrier was superficial and could be easily dealt with by replacing symbols with keywords: for example, replacing π with “project” and ∀ with “for all.” The more difficult barrier was at the semantic level. The basic concepts of Codd’s languages were adapted from set theory and symbolic logic. This was natural given Codd’s background as a mathematician, but Ray and I hoped to design a relational language based on concepts that would be familiar to a wider population of users. We also hoped to extend the language to encompass database updates and administrative tasks such as the creation of new tables and views, which had traditionally been outside the scope of a query language.

After attending Codd’s symposium, Ray and I spent the next year experimenting with language designs.
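As a minimal sketch, the well-paid-employee query can be checked mechanically against the four rows of Table 1 using Python’s built-in sqlite3 module. The SQL here is modern SQL, not the 1974 Sequel text of Figure 2c, and the table name `emp`, the column names, and both helper functions are illustrative assumptions. The second function spells out a lookup plan by hand, to contrast a declarative statement of the result with a procedural recipe for computing it.

```python
import sqlite3

# Rows of Table 1 as (name, salary, manager) tuples.
TABLE_1 = [
    ("Smith", 45000, "Harker"),
    ("Jones", 40000, "Smith"),
    ("Baker", 50000, "Smith"),
    ("Nelson", 55000, "Baker"),
]


def well_paid_declarative(rows):
    """Declarative version: describe the result and let SQLite pick the plan."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE emp (name TEXT, salary INTEGER, manager TEXT)")
        conn.executemany("INSERT INTO emp VALUES (?, ?, ?)", rows)
        # Self-join: e ranges over employees, m over the rows naming their managers.
        query = """
            SELECT e.name
            FROM emp AS e, emp AS m
            WHERE e.manager = m.name
              AND e.salary > m.salary
            ORDER BY e.name
        """
        return [name for (name,) in conn.execute(query)]
    finally:
        conn.close()


def well_paid_procedural(rows):
    """Procedural version: spell out the lookup plan by hand."""
    salary_of = {name: salary for name, salary, _ in rows}
    result = []
    for name, salary, manager in rows:
        # Skip employees whose manager (e.g. Harker) has no row in the table.
        if manager in salary_of and salary > salary_of[manager]:
            result.append(name)
    return sorted(result)


print(well_paid_declarative(TABLE_1))  # prints ['Baker', 'Nelson']
print(well_paid_procedural(TABLE_1))   # prints ['Baker', 'Nelson']
```

Both functions reproduce the answer given in the text, Baker and Nelson. The difference is that the procedural version commits to one strategy (a dictionary lookup per row), while the SQL query leaves the choice of join strategy to SQLite’s query planner.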

IEEE Annals of the History of Computing, October–December 2012. Published by the IEEE Computer Society. 1058-6180/12/$31.00 © 2012 IEEE

Our first attempt, called Square[4], was based on the notion of mapping and used a subscript notation that was difficult to type. When we moved to the San Jose Research Laboratory in 1973 to join the System R project, we began work on another new language that we called Sequel. Sequel allowed the well-paid-employee query to be represented in a readable form free from mathematical concepts and symbols, as in Figure 2c.

Figure 1. Examples of Relational Algebra operators.
Figure 2. Three versions of the query, “Find names of employees who earn more than their managers.”

Ray and I hoped that, with a little practice, users could learn to read queries like this almost as though they were English prose. This example query might be read as follows: “Find an employee (let’s call him ‘e’) and another employee (let’s call him ‘m’) where e’s manager matches m’s name (in other words, e’s manager is m) and e’s salary is greater than m’s salary (in other words, e earns more than his manager); then print e’s name (for every such employee).”

It is important to note that the Sequel version of this query describes the information it is looking for but does not provide a detailed plan for how to find this information. This is why Sequel is called a declarative (rather than a procedural) language. Translating the declarative query statement into a detailed plan for processing the query is the job of an optimizing compiler.

From the beginning, Sequel was intended to be used both for data manipulation (querying and updating data) and for data definition (creation of tables, views, and assertions). To emphasize this duality, Ray and I wrote two papers: “SEQUEL: A Structured English Query Language”[5] and “Using a Structured English Query Language as a Data Definition Facility.”[6] As luck would have it, the first of these papers became quite well known, while the second was never published outside IBM.

In 1974, about one month after presenting a paper on Sequel at a technical conference in Ann Arbor, Michigan, Ray Boyce died suddenly of a ruptured brain aneurysm at the age of 26, leaving behind a wife of five years and a 10-month-old daughter. I often remember how much I enjoyed working with Ray. We had a seamless partnership. In his brief career, Ray collaborated with Ted Codd on the Boyce-Codd Normal Form[7] and with me on Sequel. I think that Ray would have been pleased to see the impact that his ideas have had on the world.

After Ray’s untimely death, the Sequel language continued to evolve as a part of the System R project at San Jose Research Laboratory. System R was installed on an experimental basis in three IBM customer sites, and a more complete Sequel language design was published in 1976[8], based in part on the experience collected by early users. In 1977,
because of a trademark issue, the name Sequel was shortened to SQL.

The SQL Standard
Commercial implementations of SQL, such as Oracle and DB2, began to appear in the late 1970s and early 1980s. By 1986, a standard language definition called “Database Language SQL” had been formally adopted by the ANSI and ISO standards groups.[9] A conformance test suite for the SQL standard was developed by the National Institute of Standards and Technology, and many SQL-based products were validated by this test suite between 1988 and 1996. New versions of the SQL standard were published in 1996, 1999, 2003, 2006, and 2008.

The SQL standard has been helpful in providing a mechanism for the controlled evolution of the language, providing a forum in which both users and implementers have a voice. Over the years, the evolving standard has corrected many of the initial deficiencies of SQL and has added many new features, including outer joins, table expressions, recursion, triggered actions, user-defined types and functions, and online analytic processing (OLAP) functions. In the hands of the ANSI X3H2 Committee, the definition of SQL has evolved from a 12-page research paper to an international standard comprising hundreds of pages.

SQL has been a more successful query language than Ray Boyce and I had any reason to expect in 1974. I think that the most important ingredient in this success was Ted Codd’s breakthrough work in defining the relational data model and raising the level of abstraction with which users could interact with stored data. While SQL has been criticized as departing from some of Codd’s original principles, experience has shown the language to be simple enough to learn easily and expressive enough to do useful work. Other important contributions to the success of SQL include the following:

• The language benefited greatly from having robust early implementations on multiple platforms. System R provided a multiuser implementation with transactional semantics and a sophisticated optimizing compiler. At roughly the same time, Oracle implemented the language on the widely used Unix platform.
• By combining query, update, and administrative tasks into a single language, SQL makes it easy for authorized users to modify database schemas and install new applications while the system is running. These tasks had traditionally been performed by specialized database administrators during system maintenance intervals. Making every user his own database administrator removed an important bottleneck from application development.
• The SQL standard provided a common ecosystem in which vendors could develop competing implementations, tools, and training materials. The standard also reassured users that they would not become dependent on a single software vendor, although vendors had a regrettable tendency to implement different subsets of the standard and to include proprietary features.

Criticisms of SQL
Like most successful languages, SQL has attracted its share of criticism, which has tended to focus on the following issues.

Orthogonality and Completeness. The earliest versions of SQL lacked support for some aspects of the relational data model, including primary keys and referential integrity. The early language also lacked orthogonality because it did not allow subqueries to be used in place of named tables and it failed to provide a way to name the columns of a query result. All these serious deficiencies were corrected by the 1992 version of the SQL standard, illustrating the helpful influence of the standards process.

Nulls. SQL supports a “null value” that represents a data item that is missing or inapplicable. The null value is not comparable to any other value, and for this reason, SQL implements a three-valued logic in which a
search condition might be neither true nor false. Nulls and three-valued logic were both introduced by Ted Codd in his early papers, and Rule 3 of Codd’s famous “Twelve Rules”[10] requires a relational database system to support nulls. Various writers have complained that nulls and three-valued logic make queries more confusing and optimization more difficult. Researchers have proposed other approaches to the problem of missing data, but none of these approaches is without disadvantages. SQL lets users specify, on a column-by-column basis, where nulls are permitted and where they are prohibited. Over the years, nulls have proven useful in the design of various features, such as outer join, that have been added during the evolution of the language.

Duplicates. Unlike Codd’s original definition of the relational data model, SQL permits duplicate rows to exist, both in a database table and in a query result. SQL also allows users to selectively prohibit duplicate rows in a table or in a query result. The intent of this approach is to give users control over the potentially expensive process of duplicate elimination. In some applications, duplicate rows might be meaningful: for example, in a point-of-sale system, a customer might purchase several identical items in the same transaction. In other applications, such as printing an address list, duplicate values could be unexpected, but users might prefer not to pay the cost of detecting and eliminating them. As in the case of nulls, the SQL approach provides users with tools to control duplicate rows according to the needs of specific applications. The cost of sorting a million records to detect duplicates seemed more significant in 1974 than it does today, when an ordinary laptop has thousands of times more memory and processing power than a mainframe computer of the 1970s.

Impedance Mismatch. SQL was designed to be used both as a stand-alone language for interactive queries and as an application development language for online transaction processing (OLTP). This is a helpful unification of concepts, but in OLTP applications, SQL is usually embedded in (or called from) a host programming language such as C or Java. Often the data types of the host language are not the same as those of SQL, and the host language is usually more procedural, whereas SQL is more declarative. The resulting “impedance mismatch” tends to increase application complexity and interfere with global optimization. One approach to this problem has been the development of computationally complete SQL-based scripting languages such as Persistent Stored Modules (PSM).[11]

Legacy
When Ray and I were designing Sequel in 1974, we thought that the predominant use of the language would be for ad-hoc queries by planners and other professionals whose domain of expertise was not primarily database management. We wanted the language to be simple enough that ordinary people could “walk up and use it” with a minimum of training. Over the years, I have been surprised to see that SQL is more frequently used by trained database specialists to implement repetitive transactions such as bank deposits, credit card purchases, and online auctions. I am pleased to see the language used in a variety of environments, even though it has not proved to be as accessible to untrained users as Ray and I originally hoped.

Looking back at my experience at IBM Research in the 1970s, I feel fortunate to have been working at that place and time. Ted Codd, Ray Boyce, and the System R team were wonderful people to work with, and the impact of our work has been gratifying. I am grateful for having had the opportunity to participate in this work.

References
1. “CODASYL Data Base Task Group,” April 71 Report, ACM, 1971.
2. E.F. Codd introduced the operators of relational algebra in “Relational Completeness of Data
Base Sublanguages,” IBM Research Report RJ 987, Mar. 1972. Several versions of the relational algebra exist, all of which include some version of the operators used in this paper. There is no recognized standard notation for these operators. The notation used in this article is taken from H. Garcia-Molina, J. Ullman, and J. Widom, Database Systems: the Complete Book, Prentice-Hall, 2002, pp. 189–237.
3. E.F. Codd, “A Data Base Sublanguage Founded on the Relational Calculus,” Proc. ACM SIGFIDET Workshop on Data Description, Access, and Control, ACM Press, 1971, pp. 35–68.
4. R. Boyce et al., “Specifying Queries as Relational Expressions: the SQUARE Sublanguage,” Comm. ACM, vol. 18, no. 11, 1975, pp. 621–628.
5. D. Chamberlin and R. Boyce, “SEQUEL: A Structured English Query Language,” Proc. ACM SIGFIDET Workshop on Data Description, Access, and Control, ACM Press, 1974, pp. 249–264. See also http://www.almaden.ibm.com/cs/people/chamberlin/sequel-1974.pdf.
6. R. Boyce and D. Chamberlin, “Using a Structured English Query Language as a Data Definition Facility,” IBM Research Report RJ1318, Dec. 1973.
7. The Boyce-Codd Normal Form is a database design discipline taught in most advanced textbooks on database management. For example, see R. Ramakrishnan and J. Gehrke, Database Management Systems, 3rd ed., McGraw Hill, 2003, pp. 615–617.
8. D. Chamberlin et al., “SEQUEL 2: A Unified Approach to Data Definition, Data Manipulation, and Control,” IBM J. Research and Development, vol. 20, Nov. 1976, p. 560.
9. See ANSI/ISO/IEC 9075-1, Information technology - Database languages - SQL - Part 1: Framework (SQL/Framework); ANSI/ISO/IEC 9075-2, Information technology - Database languages - SQL - Part 2: Foundation (SQL/Foundation); and so on at http://www.ansi.org or http://www.iso.ch.
10. E.F. Codd, “Does Your DBMS Run by the Rules?” Computer World, vol. 21, Oct. 1985. See also http://en.wikipedia.org/wiki/Codd's_12_rules.
11. ANSI/ISO/IEC 9075-4, Database Language SQL, Part 4: Persistent Stored Modules (SQL/PSM); http://www.ansi.org.

Donald D. Chamberlin is an adjunct professor at the University of California, Santa Cruz. His work on SQL and System R at IBM has been recognized by the ACM SIGMOD Innovation Award and by the Computer History Museum. Contact him at [email protected].