


Journal of Information Systems Education, Vol. 13(2)

Teaching Tip

A Simpler (and Better) SQL Approach to Relational Division

Victor M. Matos
Computer and Information Science Department
Cleveland State University
Cleveland, Ohio 44114
[email protected]

Rebecca Grasser
Information Systems Department
Lakeland Community College
Kirtland, Ohio 44094
[email protected]

ABSTRACT

A common type of database query requires one to find all tuples of some table that are related to each and every one of
the tuples of a second group. In general those queries can be solved using the relational algebra division operator.
Relational division is very common and appears frequently in many queries. However, we have found that the phrasing
of this operator in SQL seems to present an overwhelming challenge to novice and experienced database programmers.
Furthermore, students seem to have the most problems with the SQL version commonly recommended in the database
literature. We present an alternative solution that is not only more intuitive and easier to deliver in the classroom but
also exhibits a better computational performance.

Keywords: Database systems, SQL, Division operator, Relational algebra, Classroom presentation, Human reactions,
Code performance.

1. INTRODUCTION

Proficiency in SQL is an important skill for IS students. SQL is a relatively small and easy-to-use
database query language. One virtue of the language is its continuous simplicity. Complex queries can be
progressively decomposed into a collection of simpler, interrelated SQL fragments. This structured approach
works in most cases. Unfortunately, the traditional SQL implementation of the relational division operator
is an exception to this observation. We have consistently found this topic to be rather troublesome
for the students (and the instructor, too). Some critiques include code complexity, lack of intuitive
interpretation, and departure from the simple nature of most SQL constructs. In this note, we suggest an
alternative implementation of the (rather common) division operator that greatly simplifies the classroom
presentation of this important database operator. In addition to clarity, the recommended solution
outperforms, by many times, the traditional SQL code. We have collected empirical evidence suggesting that
students find the alternate version easier to interpret and maintain.


2. THE RELATIONAL DATA MODEL AND THE DIVISION OPERATOR

The relational data model deals with data held in simple two-dimensional tables. Relational algebra is a
compact symbolic language used to query relational databases. The basic operators of the relational
algebra are projection, selection, Cartesian product, union, and difference (Codd 1970; Codd 1972).
Those operators are the foundation for modern database query languages and have been extensively
discussed in the database literature. For convenience, other useful operators were added, such as different
forms of joins (general, natural, left/right outer), rename, intersection, and division. The division
operator is less common than simple join-select-project queries. However, it is naturally applied in
many common everyday queries. For instance, division could be used in solving the following problems:
(a) Find suppliers who supply all the red parts,
(b) Find students who have taken all the core courses,
(c) Find customers who have ordered all items from a given line of products, and so on.
The characteristic pattern of this family of inquiries is the attempt to verify whether or not a candidate
subject is related to each of the values held in a base set. That base set is called the divisor (or
denominator T2[B]), and the table holding the subject's data is called the dividend (or numerator T1[A,B]).
Without loss of generality, the expression T1[A,B] / T2[B] selects the A-values from the dividend table
T1[A,B] whose B-values are a superset of the B-values held in the divisor table T2[B].

2.1 An Example
Consider the tables T1[A,B] and T2[B] depicted in Figure 1. T1 represents a list of customers and the
options they bought for their new cars. Column A is the customer identification number and B represents
the option included in the car. For instance, customer a1 bought her vehicle with the b1, b2, and b3 options.
Table T2[B] represents a particular set of options (such as b2: leather seats, and b3: winter package).
The resulting table T3[A] identifies the customers who acquired at least those items listed in table T2[B].

    T1:  A    B          T2:  B          T3:  A
         a1   b1              b2              a1
         a1   b2              b3              a3
         a1   b3
         a2   b1
         a2   b3
         a3   b2
         a3   b3
         a3   b4
         a4   b1

    A: Customer Number    B: Car's Option ID    T3 = T1 / T2

Figure 1. Customers who bought vehicles including options b2 and b3

3. SQL IMPLEMENTATIONS OF THE DIVISION OPERATOR

A large number of highly regarded database books (Date 1995; Desai 1990; Elmasri 1999; Kroenke 2000;
O'Neil 1999; Ramakrishnan 2000; Watson 1999) describe the implementation of the division operator
using the SQL syntax of Q1 (below). Even though this solution is commonly accepted in the database
literature, we have found that this syntactical version is not only difficult for programmers to understand
and maintain, but also computationally complex. Instead, we propose the alternative syntactical variation
called Q0.

Q0: Alternate Version. Computing relational division using membership test, group-by, counting,
and having SQL constructs.

    Q0: SELECT A
        FROM T1
        WHERE B IN ( SELECT B FROM T2 )
        GROUP BY A
        HAVING COUNT(*) =
               ( SELECT COUNT(*) FROM T2 );

In Q0, the "GROUP BY A" clause is responsible for splitting the rows and creating non-overlapping
A-partitions. This is equivalent to separating T1[A,B] (Figure 1) according to customer. Tuples in each
A-group have already been restricted by the WHERE predicate to those whose B-value matches an entry in
T2[B]. To continue with the example, this selects from T1[A,B] the rows of customers who have purchased
option b2 or b3. The count of tuples in each A-partition is then compared with the size of table T2. In our
example, the two rows selected from T1 for a customer need to match the two rows in T2. Only those
A-groups HAVING the same count are kept, and their A-values form the result.

Q1: Classical Version.

    Q1: SELECT DISTINCT x.A
        FROM T1 AS x
        WHERE NOT EXISTS
              ( SELECT * FROM T2 AS y
                WHERE NOT EXISTS
                      ( SELECT * FROM T1 AS z
                        WHERE (z.A = x.A) AND
                              (z.B = y.B) ) );
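Both queries can be checked against the Figure 1 data with a short script. The following is a minimal sketch, not part of the original study: it assumes Python's standard sqlite3 module as the SQL engine and TEXT columns, and the helper names build_db and divide are hypothetical names chosen for illustration. Q0 and Q1 are taken from the text, with the trailing semicolons dropped.

```python
import sqlite3

# Figure 1 data: T1[A,B] is the dividend, T2[B] is the divisor.
T1_ROWS = [
    ("a1", "b1"), ("a1", "b2"), ("a1", "b3"),
    ("a2", "b1"), ("a2", "b3"),
    ("a3", "b2"), ("a3", "b3"), ("a3", "b4"),
    ("a4", "b1"),
]
T2_ROWS = [("b2",), ("b3",)]

# Q0: membership test, GROUP BY, counting, HAVING.
Q0 = """
SELECT A
FROM T1
WHERE B IN (SELECT B FROM T2)
GROUP BY A
HAVING COUNT(*) = (SELECT COUNT(*) FROM T2)
"""

# Q1: classical doubly negated EXISTS formulation.
Q1 = """
SELECT DISTINCT x.A
FROM T1 AS x
WHERE NOT EXISTS
      (SELECT * FROM T2 AS y
       WHERE NOT EXISTS
             (SELECT * FROM T1 AS z
              WHERE z.A = x.A AND z.B = y.B))
"""


def build_db(t1_rows, t2_rows):
    """Create an in-memory database holding T1 and T2 (illustrative helper)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE T1 (A TEXT, B TEXT)")
    conn.execute("CREATE TABLE T2 (B TEXT)")
    conn.executemany("INSERT INTO T1 VALUES (?, ?)", t1_rows)
    conn.executemany("INSERT INTO T2 VALUES (?)", t2_rows)
    return conn


def divide(conn, query):
    """Run a division query and return its A-values in sorted order."""
    return sorted(row[0] for row in conn.execute(query))


conn = build_db(T1_ROWS, T2_ROWS)
q0_result = divide(conn, Q0)
q1_result = divide(conn, Q1)
print("Q0:", q0_result)  # -> Q0: ['a1', 'a3']
print("Q1:", q1_result)  # -> Q1: ['a1', 'a3']
```

On this data both queries return a1 and a3, which is table T3 of Figure 1. Note that the COUNT(*) comparison in Q0 relies on T1 holding no duplicate (A, B) rows and T2 holding no duplicate B values, as is the case here.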


Version Q1 is based on deeply nested sub-queries which are interconnected using doubly negated
EXISTS predicates. The identifiers x, y, and z are aliases of the tables T1, T2, and T1 respectively. Here
the outermost SELECT statement picks a candidate x.A as a potential answer. This candidate becomes
part of the final solution if there is no tuple y in T2 (the divisor table) for which there does not exist a
tuple z in T1 that matches both the candidate's ID (z.A = x.A) and the current y value (z.B = y.B). If such
a y tuple exists, it creates a contradiction, because there is data in T2 to which the candidate is not
related, and therefore the candidate must be rejected.

4. CODE PERFORMANCE

In (Matos 2001) an operational comparison of Q0, Q1, and other SQL versions of the division is described.
That research shows that, for some samples, Q0 was between 300 and 700 times faster than Q1. The database
used in (Matos 2001) is similar to that of Figure 1, and the performance estimation is controlled by the
number of records in the table and the coherence between the two tables. Q0 tends to be constant or
predictably linear, while Q1 in general is slow and sensitive to changes in the size of the numerator table
as well as the selectivity factor.

5. ZERO DIVISION

When the divisor table T2[B] is empty, the code for Q0 and Q1 produces two different results. Q0 reports
an empty set, whereas Q1 enumerates each of the A-values in T1[A]. The lack of intuitive interpretation
for Q1's result creates a serious philosophical problem (Date 1991). An interesting class discussion involves
looking at the outputs produced by each query, where there is a zero divisor, and asking the students to
interpret the meaning of the data. This discussion will show why explaining the results of an application to
non-technical staff is an important skill for IS professionals.

6. HUMAN PERCEPTIONS

In a forthcoming paper, the authors provide an empirical estimation of difficulty for Q0 and Q1. In a
survey conducted among graduate and undergraduate database students we have found that, regardless of
their academic background, experience, and practitioner's level, the experimental subjects ranked query Q1
as more difficult than Q0. Subjects with an Engineering or Science major, or those students with some
previous database experience, were able to understand and manipulate both Q0 and Q1 with more ease than
other subjects. However, less than half of the cohort was able to correctly solve query Q0 and only 30% of
respondents were able to formulate the correct answer for Q1. This is a disappointing score for a group of
otherwise good students.

7. CONCLUSION

The code Q1 is a classical SQL solution for relational division. However, if you combine the poor
performance of Q1 with its high degree of relative difficulty, it is clear that other equivalent but
improved SQL code should be used. We strongly recommend Q0, not only for its enhanced pedagogical
value, but also for its better computational speed. Students need to be aware that performance could be
critical in real-life production environments, particularly if the computation involves large data
sets. We believe the syntactical construction of Q0 allows the student to grasp the concepts of
implementing SQL division in a more intuitive way.

8. BIBLIOGRAPHY

Codd, E.F., "A Relational Model of Data for Large Shared Data Banks". CACM 13, No. 6, June 1970.
Codd, E.F., "Relational Completeness of Data Base Sublanguages", in Database Systems, Courant
Computer Science Symposia Series 6. Englewood Cliffs, NJ, Prentice Hall, 1972.
Date, C.J., An Introduction to Database Systems. 6th Edition, 1995. ISBN 0-201-54329-4.
Date, C.J., Darwen, H., "Into the Great Divide", appeared in Relational Database Writings 1989-1991,
Ed. Addison-Wesley, 1992. ISBN 0-201-82459-0.
Desai, B., An Introduction to Database Systems. Ed. West Publishing Co., 1990. ISBN 0-314-66771-7.
Elmasri, R., Navathe, SR., Fundamentals of Database Systems, Third Edition. Addison-Wesley
Publishing Co., 1999. ISBN 0-8053-1755-4.
Kroenke, David, Database Processing Fundamentals, Design and Implementation. Ed. Prentice-Hall,
2000. ISBN 0-13-084816-6.
Matos, V., Grasser, R., "Assessing the Performance of Various SQL Versions of the Relational
Division Operator", Database Management, Auerbach Pub., Feb 2001.
O'Neil, Patrick, Database Principles, Programming, Performance. Ed. Morgan Kaufmann Pub., 1999.
Ramakrishnan, R., and Gehrke, J., Database Management Systems, 2nd Edition. Ed. McGraw-Hill, 2000.
ISBN 0-07-232206-3.
Watson, Richard, Database Management – Databases and Organization. 2nd Edition. Ed. Wiley, 1999.
ISBN 0-471-18074-2.
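As a supplementary note to Section 5, the zero-division behaviour can be reproduced with a small script. This is a minimal sketch under stated assumptions rather than the authors' test setup: it uses Python's standard sqlite3 module as the engine, loads T1 with the Figure 1 rows, and deliberately leaves the divisor T2 empty. The variable names q0_rows and q1_rows are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T1 (A TEXT, B TEXT)")
conn.execute("CREATE TABLE T2 (B TEXT)")  # divisor is left empty on purpose

# Dividend rows from Figure 1.
conn.executemany(
    "INSERT INTO T1 VALUES (?, ?)",
    [
        ("a1", "b1"), ("a1", "b2"), ("a1", "b3"),
        ("a2", "b1"), ("a2", "b3"),
        ("a3", "b2"), ("a3", "b3"), ("a3", "b4"),
        ("a4", "b1"),
    ],
)

# Q0: no T1 row passes the IN test against an empty T2, so no groups are formed.
q0_rows = sorted(
    row[0]
    for row in conn.execute(
        """
        SELECT A
        FROM T1
        WHERE B IN (SELECT B FROM T2)
        GROUP BY A
        HAVING COUNT(*) = (SELECT COUNT(*) FROM T2)
        """
    )
)

# Q1: the outer NOT EXISTS is true for every candidate because T2 has no rows.
q1_rows = sorted(
    row[0]
    for row in conn.execute(
        """
        SELECT DISTINCT x.A
        FROM T1 AS x
        WHERE NOT EXISTS
              (SELECT * FROM T2 AS y
               WHERE NOT EXISTS
                     (SELECT * FROM T1 AS z
                      WHERE z.A = x.A AND z.B = y.B))
        """
    )
)

print("Q0 with empty divisor:", q0_rows)  # -> []
print("Q1 with empty divisor:", q1_rows)  # -> ['a1', 'a2', 'a3', 'a4']
```

The two outputs give students something concrete to interpret: Q0 returns no customers at all, while Q1 returns every customer in T1, since no option exists that a customer could fail to have.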

AUTHOR BIOGRAPHIES
Victor Matos is an Associate
Professor of Computer and
Information Science at Cleveland
State University in Cleveland,
Ohio.

Rebecca Grasser is an
Assistant Professor of Information
Systems at Lakeland Community
College in Kirtland, Ohio.

Information Systems & Computing
Academic Professionals

STATEMENT OF PEER REVIEW INTEGRITY


All papers published in the Journal of Information Systems Education have undergone rigorous peer review. This includes an
initial editor screening and double-blind refereeing by three or more expert referees.

Copyright ©2002 by the Information Systems & Computing Academic Professionals, Inc. (ISCAP). Permission to make digital
or hard copies of all or part of this journal for personal or classroom use is granted without fee provided that copies are not made
or distributed for profit or commercial use. All copies must bear this notice and full citation. Permission from the Editor is
required to post to servers, redistribute to lists, or utilize in a for-profit or commercial use. Permission requests should be sent to
the Editor-in-Chief, Journal of Information Systems Education, [email protected].
ISSN 1055-3096
