Early History of SQL: Donald D. Chamberlin
Anecdotes
Ray Boyce and I first met E.F. (Ted) Codd at a symposium he organized at the IBM T.J. Watson Research Center in Yorktown Heights, New York, in 1972. Ray and I were both recent hires at the Watson Center. I had recently completed my PhD at Stanford University, and Ray had completed his at Purdue University. We were members of a recently reorganized IBM group that was looking for a mission. At that time, Ted Codd was a computer scientist at IBM’s San Jose Research Laboratory and was proposing a new way of organizing data that he called the “relational data model.”

One of the most important research areas in computer science in the early 1970s was the development of systems and languages for handling what computer scientists call persistent data. This term denotes data that remains in a computer system indefinitely, until it is explicitly deleted. Systems for managing persistent data were spreading quickly in the business world. A database management language proposed by the Codasyl Data Base Task Group (DBTG)¹ was receiving a lot of attention. Ray and I spent some time studying this language, learning concepts such as “currency indicators” and “set occurrence selection.” With a little practice, we learned how to represent a database query in the form of a program that navigated through a network of pointers to find the desired information.

Designing a Relational Language

For Ray and me, our exposure to the relational data model at Codd’s research symposium was a revelation. For the first time, we could see how a query that would require a complex program in the DBTG language could be reduced to a few simple lines using one of Codd’s relational languages. It became a game for the two of us to invent queries and challenge each other to express them in various query languages.

One of the queries that came out of this game was as follows: “Find names of employees who earn more than their managers.” The query was based on a three-column employee table. Each row of the table represented an employee and contained a name, a salary, and the name of the employee’s manager. (This is a simple example. In a real application, employees would be identified by some unique identifier such as an employee number.) Table 1 shows the structure of the table with four example rows.

Table 1. Employee.

Name     Salary   Manager
Smith    45,000   Harker
Jones    40,000   Smith
Baker    50,000   Smith
Nelson   55,000   Baker

The third row of the table indicates that Baker’s salary is $50,000 and Baker’s manager is Smith. The first row indicates that Smith’s salary is $45,000, so Baker earns more than his manager. Similarly, Nelson’s salary is $55,000, but Nelson’s manager is Baker, who earns $50,000, so Nelson also earns more than his manager. The result of the query, based on these four sample rows, is Baker and Nelson.

In his research papers, Codd introduced two relational query languages, called Relational Algebra² and Relational Calculus (also known as the Data Sublanguage Alpha³). Relational Algebra consists of several operators, usually represented by symbols such as those in Figure 1. Using these operators, the query about well-paid employees could be represented as in Figure 2a.

Codd’s Relational Calculus was based on a notation used in formal logic, using an existential quantifier ∃ (meaning “there exists”) and a universal quantifier ∀ (meaning “for all”). Like Relational Algebra, Relational Calculus could represent the well-paid employee query compactly (see Figure 2b).

Ray and I were impressed by how compactly Codd’s languages could represent complex queries. At the same time, however, we believed that it should be possible to design a relational language that would be more accessible to users without formal training in mathematics or computer programming. We believed that barriers to widespread acceptance of Codd’s languages existed on two levels. The first barrier came from the mathematical notation, which was hard to enter at a keyboard. This barrier was superficial and could be easily dealt with by replacing symbols with keywords: for example, replacing π with “project” and ∀ with “for all.” The more difficult barrier was at the semantic level. The basic concepts of Codd’s languages were adapted from set theory and symbolic logic. This was natural given Codd’s background as a mathematician, but Ray and I hoped to design a relational language based on concepts that would be familiar to a wider population of users. We also hoped to extend the language to encompass database updates and administrative tasks such as the creation of new tables and views, which had traditionally been outside the scope of a query language.

After attending Codd’s symposium, Ray and I spent the next year experimenting with language designs.
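The well-paid employee query discussed above can be tried out on the four rows of Table 1. The following is only a sketch in present-day SQL, run through Python's standard sqlite3 module. It is not the notation of Figure 2 or of any early language design, and the table name `employee`, the column names, and the function name `well_paid_employees` are assumptions made for illustration.

```python
import sqlite3

# Rows copied from Table 1. The table and column names used below are
# assumptions for this sketch, not names taken from the article.
ROWS = [
    ("Smith", 45000, "Harker"),
    ("Jones", 40000, "Smith"),
    ("Baker", 50000, "Smith"),
    ("Nelson", 55000, "Baker"),
]


def well_paid_employees():
    """Return names of employees who earn more than their managers."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE employee (name TEXT, salary INTEGER, manager TEXT)"
    )
    conn.executemany("INSERT INTO employee VALUES (?, ?, ?)", ROWS)
    # Self-join: pair each employee row (e) with the row of that
    # employee's manager (m), then keep pairs where e earns more.
    query = """
        SELECT e.name
        FROM employee AS e
        JOIN employee AS m ON e.manager = m.name
        WHERE e.salary > m.salary
        ORDER BY e.name
    """
    names = [row[0] for row in conn.execute(query)]
    conn.close()
    return names


print(well_paid_employees())  # prints ['Baker', 'Nelson']
```

Smith does not appear in the result because Smith's manager, Harker, has no row of his own in the sample table, so the join finds no salary to compare against.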
IEEE Annals of the History of Computing. Published by the IEEE Computer Society. 1058-6180/12/$31.00 © 2012 IEEE
Figure 2. Three versions of the query, ‘‘Find names of employees who earn more than their managers.’’
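As a rough illustration of what the algebra and calculus forms of this query look like, the expressions below use common textbook notation. They are a sketch only and should not be read as the exact content of Figure 2; the tuple names e and m and the attribute names (taken from the column headings of Table 1) are assumptions.

```latex
% Relational algebra: rename two copies of Employee as e and m, pair them up,
% keep the pairs where m is e's manager and e earns more, then project the name.
\[
\pi_{e.\mathit{Name}}\Bigl(
  \sigma_{e.\mathit{Manager} = m.\mathit{Name} \,\wedge\, e.\mathit{Salary} > m.\mathit{Salary}}
  \bigl(\rho_{e}(\mathit{Employee}) \times \rho_{m}(\mathit{Employee})\bigr)\Bigr)
\]

% Tuple relational calculus: e's name is returned if there exists a tuple m
% that is e's manager and has a lower salary than e.
\[
\{\, e.\mathit{Name} \mid e \in \mathit{Employee} \wedge
  \exists m \in \mathit{Employee}\,
  (m.\mathit{Name} = e.\mathit{Manager} \wedge e.\mathit{Salary} > m.\mathit{Salary}) \,\}
\]
```

Both expressions rely on symbols (π, σ, ρ, ∃) that are awkward to type at a keyboard, which is the first of the two barriers described in the text.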
October–December 2012 79
Base Sublanguages,” IBM Research Report RJ 987, Mar. 1972. Several versions of the relational algebra exist, all of which include some version of the operators used in this paper. There is no recognized standard notation for these operators. The notation used in this article is taken from H. Garcia-Molina, J. Ullman, and J. Widom, Database Systems: The Complete Book, Prentice-Hall, 2002, pp. 189–237.
3. E.F. Codd, “A Data Base Sublanguage Founded on the Relational Calculus,” Proc. ACM SIGFIDET Workshop on Data Description, Access, and Control, ACM Press, 1971, pp. 35–68.
4. R. Boyce et al., “Specifying Queries as Relational Expressions: The SQUARE Sublanguage,” Comm. ACM, vol. 18, no. 11, 1975, pp. 621–628.
5. D. Chamberlin and R. Boyce, “SEQUEL: A Structured English Query Language,” Proc. ACM SIGFIDET Workshop on Data Description, Access, and Control, ACM Press, 1974, pp. 249–264. See also http://www.almaden.ibm.com/cs/people/chamberlin/sequel-1974.pdf.
6. R. Boyce and D. Chamberlin, “Using a Structured English Query Language as a Data Definition Facility,” IBM Research Report RJ1318, Dec. 1973.
7. The Boyce-Codd Normal Form is a database design discipline taught in most advanced textbooks on database management. For example, see R. Ramakrishnan and J. Gehrke, Database Management Systems, 3rd ed., McGraw Hill, 2003, pp. 615–617.
8. D. Chamberlin et al., “SEQUEL 2: A Unified Approach to Data Definition, Data Manipulation, and Control,” IBM J. Research and Development, vol. 20, Nov. 1976, p. 560.
9. See the ANSI/ISO/IEC 9075-1, Information technology - Database languages - SQL - Part 1: Framework (SQL/Framework); ANSI/ISO/IEC 9075-2, Information technology - Database languages - SQL - Part 2: Foundation (SQL/Foundation); and so on at http://www.ansi.org or http://www.iso.ch.
10. E.F. Codd, “Does Your DBMS Run by the Rules?” Computer World, vol. 21, Oct. 1985. See also http://en.wikipedia.org/wiki/Codd's_12_rules.
11. ANSI/ISO/IEC 9075-4, Database Language SQL, Part 4: Persistent Stored Modules (SQL/PSM); http://www.ansi.org.

Donald D. Chamberlin is an adjunct professor at the University of California, Santa Cruz. His work on SQL and System R at IBM has been recognized by the ACM SIGMOD Innovation Award and by the Computer History Museum. Contact him at [email protected].