The BUCKY Object-Relational Benchmark
7 authors, including:
Dhaval N. Shah
University of California, Berkeley
All content following this page was uploaded by Dhaval N. Shah on 13 June 2013.
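[Figure 1: graphical sketch of the BUCKY schema. Labels recoverable from the figure: Student, Employee, Instructor, Courses, CourseSection, major, majors, hasTaken, advisor, advisees, student, students, section, sections, chair, coursesOffered, dept, teacher, teaches, course.]
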
3 BUCKY Database Description

The database for the BUCKY benchmark is modeled after a university database application. Figure 1 gives a graphical sketch of the schema. The lines in the figure from Person to Student, Person to Employee, Student to TA, Employee to Staff, Employee to Instructor, Instructor to TA, and Instructor to Professor represent inheritance among types. The remaining lines represent relationships between instances of types, and are labeled on each end with the name by which the relationship is known at that end. Though BUCKY is designed to be run on an object-relational system, as we mentioned earlier, it can also be run on a relational system by appropriately mapping its object features onto relational features. In this section we discuss the key features of both versions of the BUCKY schema in order to make sure that their designs are clear; details of each version can be found in the Appendices.

3.1 Inheritance

Using object-relational DDL, the natural way to model the information in BUCKY about university people is by having an inheritance hierarchy rooted at a person type. Thus, the object-relational BUCKY has a root row type called Person_t that contains the attributes common to all university-affiliated people. Person_t has two subtypes, Student_t and Employee_t, that add student- and employee-specific information. Employee_t has two subtypes, Staff_t and Instructor_t, that add information specific to non-instructional staff and instructors, respectively. Finally, there are two subtypes of Instructor_t, namely TA_t and Professor_t, as well; TA_t is also a subtype of Student_t (providing a test case for multiple inheritance). In a BUCKY database, there are no instances of the non-leaf (super) types, so these are like "abstract classes" in C++ parlance; only the leaf types actually have instances. What this means, for example, is that every instructor will either be a teaching assistant or a professor. In addition to these types, each of them has a corresponding table to hold its instances (i.e., Person_t has the Person table, Employee_t has the Employee table, Student_t has the Student table, and so on), and these tables are contained in a table/sub-table hierarchy that mirrors that of the type hierarchy. A complete SQL3-style description of the object-relational BUCKY schema is given in the full paper (http://www.cs.wisc.edu/~naughton/bucky.html).

Since there is no direct way to model inheritance in relational DDL, we created a separate table for each non-abstract type in the hierarchy (Employee, Instructor, Student, and TA), repeating their common fields in each table definition. We felt this was the most natural mapping to use; a complete DDL description for this version of the BUCKY schema is given in the full paper. An alternative would have been to have a single table with the union of all the attributes in the hierarchy plus a type tag, with null values in attributes that do not apply to a particular row. However, we were worried that this approach would end up wasting too much space (depending on how null attributes are represented by the system). Another alternative would have been to use a "vertically decomposed" schema, where each subtype has a corresponding table that contains the key for its least specific supertype(s)'s table (e.g., Person, for Employee) plus only those attributes unique to the type (e.g., the Employee table under this scheme would have just four attributes: id, from Person, plus dateHired, status, and worksIn). However, we were concerned that this approach would require too many joins to reassemble objects. (It would be interesting to experiment with these other two mapping alternatives in the future.)

3.2 References

Another salient feature of object-relational DDL is its direct support for inter-object references. Among its attributes,
the Student_t type denotes a student's major using a reference to a row of type Department_t:

    CREATE ROW TYPE Student_t (
        ...
        major Ref(Department_t),
        ...
    )
    UNDER Person_t;

The department row type has a corresponding "inverse" reference to the set of students majoring in that department:

    CREATE ROW TYPE Department_t (
        ...
        majors Set(Ref(Student_t)),
        ...
    );

It should be noted that the presence of such inverse sets is not strictly required, as the student/major relationship is fully captured by the reference contained in the Student_t type. However, we included this set in BUCKY anyway, as it is in the "spirit" of object-relational data modeling, which encourages the representation of (binary) relationships in a bi-directional manner. Unfortunately, unlike some OODB systems, which allow users to tell the system about the inverse nature of relationships of this sort, we are not aware of any current O-R product that has DDL support for making such assertions. SQL3 provides no such support either; in fact, support for collection-valued attributes was very recently moved out of SQL3. We retain them in BUCKY because current O-R products do provide support for them, and we therefore expect them to reappear as an SQL object extension in the not-too-distant future.

In relational DDL, the relationship is modeled as a key/foreign key pair:

    CREATE TABLE Student (
        ...
        majorDept Integer REFERENCES Department,
        ...
    );
    CREATE TABLE Department (
        ...
        deptNo Integer NOT NULL PRIMARY KEY,
        ...
    );

Since the relational model doesn't support set-valued attributes, there is no analog in the relational case to the majors set in the object-relational schema's Department_t type. The relationship is less "directional" in the relational case, as reconstructing it via a query involves writing a join clause (s.majorDept = d.deptNo) that has no inherent directionality.

3.3 Sets

Another difference between the object-relational BUCKY schema and the relational version is the availability of set-valued attributes for storing sets of instances of base data types. For example, in object-relational BUCKY, the type definition for Person_t includes an attribute kidNames of type Set(Varchar(10)); it contains a set of strings, where each one is the name of one of the person's children. In the relational model, since there are no nested sets, this is modeled by adding an additional Kids table with an id attribute, which is a foreign key referencing the Person table, plus a kidname attribute, which is the string name of one of the referenced person's children.

3.4 Abstract Data Types

One of the key features of the O-R paradigm is an abstract data type (ADT) facility that enables users to define their own data types for columns of tables. These user-defined types can then be used in SQL commands, just like the built-in (system-defined) types, and users can also define their own functions to operate on ADT instances. To test this facility, the BUCKY schema includes a data type called LocationAdt which is equivalent to the following C++ class definition:

    #include <cmath>

    class LocationAdt {
      private:
        int lat;
        int lon;
      public:
        int extract_latitude() { return(lat); }
        int extract_longitude() { return(lon); }
        // Euclidean distance between this location and loc.
        float distance(LocationAdt& loc)
          { float dlat = this->lat - loc.lat;
            float dlon = this->lon - loc.lon;
            return(sqrt(dlat*dlat + dlon*dlon)); }
    };

Currently, different object-relational database systems take different approaches to supporting such ADTs. Some provide SQL3-style "value ADTs", where the structural content of an ADT is defined in SQL using a row-definition-like syntax, thus declaring its internal structure to the DBMS. Others provide "black box ADTs" instead, where the DBMS is given nothing more than total size information for each ADT. In the relational BUCKY schema, where no ADT support is assumed, we simply un-encapsulate the LocationAdt type; each of its two data elements becomes a field in each of the relational tables that has a LocationAdt field in its corresponding object-relational table (i.e., in Department, Staff, Professor, Student, and TA).

3.5 Methods

Most object-relational systems allow functions to be written either in SQL (for relatively simple functions) or in an external language like C or C++ (for more complicated functions). To test both flavors, BUCKY includes some functions written each way. The Person row object type and each of its subtypes have a salary function, and these functions are written in SQL. The three LocationAdt functions are written in C (but any external language is acceptable here). For the salary function, BUCKY demands late binding. E.g., for Employees that are Professors, the following function is called to compute their overall salary based on their 9-month academic year salary plus their degree of summer support:

    CREATE FUNCTION salary(p Professor_t)
    RETURNS numeric
    RETURN p.AYSalary * (9 + p.monthSummer) / 9.0;

The definitions for each of BUCKY's SQL functions are given in the full paper, as are the SQL function signatures for each of the methods of LocationAdt. Since most SQL-based relational DBMSs do not provide an equivalent of ADTs or
ADT functions, the relational version of the BUCKY benchmark stores the location data in two columns of the affected tables (as described above) and performs the salary computations directly in the relational versions of BUCKY's ADT test queries (which has the obvious disadvantage of de-encapsulating the details of the salary computations).

4 Experimental Setup

In this section, we explain how a target system should be set up in order to run BUCKY and obtain meaningful numbers. The queries in the benchmark should be run "cold", that is, with the buffer pool being empty. Moreover, in environments where database pages can be cached in the operating system's file buffers, the file system cache should be cold as well. To flush the database buffer pool between queries, a huge table that is not used in the benchmark queries can be scanned. To flush the Unix buffer pool, a huge file that is not a part of the database should be scanned. We found in our experimentation that we were indeed able to generate repeatable query running times this way, so this strategy is effective. (This can be verified by running queries 10 times with flushing; the 10th time should match the first if no significant data caching is occurring between queries.)

The (self-explanatory) parameter settings shown in Figure 2 are to be used for populating the BUCKY database; we show both the parameter values and the resulting table sizes (in terms of the number of rows). While this is a relatively small data set, we have found it to be sufficient for generating interesting ORDBMS performance results and tradeoffs given the current state of the technology.

    Parameter Description             Table Cardinalities
    Parameter             Value       Table           Cardin.
    NumStudents           50000       Student         50000
    NumDepts              250         Department      250
    TAsPerDept            100         TA              25000
    StaffPerDept          100         Staff           25000
    ProfsPerDept          100         Professor       25000
    KidsPerPerson         2.5         Kids            116759
    CoursesPerDept        50          Course          12500
    SectionsPerCourse     2           CourseSection   50000
    SemestersPerSection   2           Enrolled        150000
    StudentsPerSection    20
    CoursesPerStudent     2

Figure 2: Parameter setting for populating the BUCKY database.

A few of the attribute value distributions are important to the BUCKY queries, so we mention them here. The kids for each person are generated by, for each person, (1) generating a number $x$ between 0 and 99, then adding kids "girlname$x$" and "boyname$x$", then (2) with probability $1/4$ generating another kid with name "girlname$y$", where $y$ is randomly chosen between 100 and 1000, then with probability $(1/4)^2$ choosing another such girlname, etc. This means that everyone has at least one boy and girl, and that the boy and girl share the same numeric suffix on their names; 25% of the people have one additional girl, 12.5% have two additional girls, and so on.

The birth dates are uniformly distributed between 1940 and 1991. Salaries are more complex, since each subclass of Employee represents the salary in a different way (i.e., Staff have an annualSalary, TAs have a monthly salary and a percent time, and Professors have a 9-month salary plus some number of summer months). We generated these numbers so that Query 5, which asks for all employees making over $96000, returns about 5% of the Employees in the database.

Of course, indices should be created on the data in order to speed up the queries as much as possible. The strategy for creating indices should be to look at the benchmark, query by query, to determine (for each query) what indices will potentially improve its performance. It is legal to create the indices after the data has been bulk-loaded, so as not to slow down bulk-loading, and to report the bulk-loading time separately from the index creation time.

5 BUCKY Queries and Preliminary Results

This section describes the BUCKY benchmark's query set. As described earlier, we will present two sets of queries that should be run against the system: one set that exercises its object-relational (O-R) capabilities, and another set that uses just the relational subset of the system. As we describe each BUCKY query, we also explain its role in the benchmark.

5.1 SINGLE-EXACT: Exact-Match Over One Table

Find the address of the staff member with id 6966.

This is a simple exact-match lookup. The relational and O-R versions of this query look the same:

    SELECT e.name, e.street, e.city, e.state, e.zipcode
    FROM Staff e WHERE e.id = 6966;

This first test mainly serves to provide a performance baseline that can be helpful when interpreting results of later queries.

5.2 HIER-EXACT: Exact-Match Over Table Hierarchy

Find the address of the employee with id 6966.

In O-R SQL, this query, which must search the Employee table and its subtables, simply looks like:

    SELECT e.name, e.street, e.city, e.state, e.zipcode
    FROM Employee e WHERE e.id = 6966;

In relational SQL, searching all these types requires explicitly unioning the relational schema's separate tables, yielding:

    SELECT e.name, e.street, e.city, e.state, e.zipcode
    FROM Staff e WHERE e.id = 6966
    UNION ALL
    SELECT e.name, e.street, e.city, e.state, e.zipcode
    FROM Professor e WHERE e.id = 6966
    UNION ALL
    SELECT e.name, e.street, e.city, e.state, e.zipcode
    FROM TA e WHERE e.id = 6966;

This tests the efficiency of the O-R system's handling of queries over subtable hierarchies, measuring the impact of the system's approach to scanning and indexing over hierarchies.

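The kid-name generation rule described in Section 4 can be sketched in C++ as follows. This is only an illustration under stated assumptions, not the benchmark's actual generator: the function name `generate_kid_names`, the choice of `std::mt19937`, and the reading that the k-th additional girl is added with probability (1/4)^k are all assumptions. The name prefixes follow the prose of Section 4; the queries in Section 5 use the shorter forms 'girl16' and 'boy16', so the exact prefix is uncertain.

```cpp
#include <random>
#include <string>
#include <vector>

// Sketch of the kid-name generation rule from Section 4.
// Assumptions (not from the paper): the function name, the RNG, and the
// reading that the k-th extra girl is added with probability (1/4)^k.
std::vector<std::string> generate_kid_names(std::mt19937& rng) {
    std::uniform_int_distribution<int> shared_suffix(0, 99);
    std::uniform_int_distribution<int> extra_suffix(100, 1000);

    std::vector<std::string> kids;

    // Step 1: one girl and one boy sharing the same numeric suffix x.
    const int x = shared_suffix(rng);
    kids.push_back("girlname" + std::to_string(x));
    kids.push_back("boyname" + std::to_string(x));

    // Step 2: the k-th additional girl is added with probability (1/4)^k;
    // generation stops at the first failed draw.
    double p = 0.25;
    while (std::bernoulli_distribution(p)(rng)) {
        kids.push_back("girlname" + std::to_string(extra_suffix(rng)));
        p *= 0.25;
    }
    return kids;
}
```

Whatever the exact probabilities, the property the set queries rely on is visible here: the first two names always share a suffix in the range 0 to 99, so a lookup such as 'girl16' selects roughly one person in a hundred.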
5.3 SINGLE-METH: Method Query Over One Table

Find all Professors who make more than 150000 per year.

In O-R SQL, the query involves invoking the salary method (whose body is written in SQL):

    SELECT p.name, p.street, p.city, p.state, p.zipcode
    FROM Professor p WHERE salary(p) >= 150000;

In relational SQL, there is no salary method, so the query is instead:

    SELECT p.name, p.street, p.city, p.state, p.zipcode
    FROM Professor p
    WHERE (p.AYSalary * (9 + p.monthSummer) / 9.0) >= 150000;

This test establishes the efficiency of the O-R system's approach to indexing on function results (as compared to indexing on stored relational attributes).

5.4 HIER-METH: Method Query Over Table Hierarchy

Find all Employees who make more than 96000 per year.

The query returns about 18% of the Staff, TA, and Professor objects (13191 tuples). The salaries of professors are uniformly distributed between 30K and 129K and the salaries of TAs are uniformly distributed between 10K and 19K (sigh!). In O-R SQL, the query is again a clean-looking call to the salary function; recall that the implementation of this function is different for the various employee subtypes:

    SELECT e.name, e.street, e.city, e.state, e.zipcode
    FROM Employee e WHERE salary(e) >= 96000;

In relational SQL, the method computation must be embedded in the query, which again involves an explicit union:

    SELECT e.name, e.street, e.city, e.state, e.zipcode
    FROM Staff e
    WHERE e.annualSalary >= 96000
    UNION ALL
    SELECT e.name, e.street, e.city, e.state, e.zipcode
    FROM Professor e
    WHERE (e.AYSalary * (9 + e.monthSummer) / 9.0) >= 96000
    UNION ALL
    SELECT e.name, e.street, e.city, e.state, e.zipcode
    FROM TA e
    WHERE (e.apptFraction * (2 * e.semesterSalary)) >= 96000;

This tests the O-R system's handling of indexing on function results in the presence of a table hierarchy.

5.5 SINGLE-JOIN: Relational Join Query

Find all Staff with the same birthdate who live in an area with the same zipcode.

This is a fairly traditional relational join. In O-R SQL, this looks as follows (the oid predicate prevents each satisfying staff member pair from appearing twice):

    SELECT s1.id, s1.name, s1.city,
           s2.id, s2.name, s2.city
    FROM Staff s1, Staff s2
    WHERE s1.birthDate = s2.birthDate AND
          s1.zipcode = s2.zipcode AND s1.oid < s2.oid;

The relational SQL query is almost identical (with ids instead of oids):

    SELECT s1.id, s1.name, s1.city,
           s2.id, s2.name, s2.city
    FROM Staff s1, Staff s2
    WHERE s1.birthdate = s2.birthdate AND
          s1.zipcode = s2.zipcode AND s1.id < s2.id;

This is the baseline test for join processing, hopefully verifying that the O-R query is just as efficient as the relational query for regular joins.

5.6 HIER-JOIN: Relational Join Over Table Hierarchy

Find all persons with the same birthdate who live in the same zipcode area.

This is the same query, but over the hierarchy. In O-R SQL, it looks like:

    SELECT p1.id, p1.name, p1.city,
           p2.id, p2.name, p2.city
    FROM Person p1, Person p2
    WHERE p1.birthDate = p2.birthDate AND
          p1.oid < p2.oid AND p1.zipcode = p2.zipcode;

In relational SQL, Query HIER-JOIN is a ten-way union query; each of the ten arms of the union consists of a join between a pair of the tables that hold subtypes of Person (Professors, Students, TAs and Staff) in the relational database. (Due to the length of this query, its statement is not shown.)

This test investigates the efficiency of the O-R system's handling of joins between table hierarchies.

5.7 SET-ELEMENT: Set Membership

Find all Staff who have a child named "girl16".

The kidName values in the database are such that this query returns about 2% of the Staff objects (495 people). In O-R SQL, this query is simple; it just tests for membership of 'girl16' in the nested kidNames set:

    SELECT e.name, e.street, e.city, e.state, e.zipcode
    FROM Staff e WHERE 'girl16' IN e.kidNames;

In relational SQL, this query involves a join with the table needed to normalize this data in the relational case; the DISTINCT clause in the relational version is needed to force the same semantics as in the O-R query, where each Staff tuple will be output at most once:

    SELECT DISTINCT e.name, e.street, e.city,
           e.state, e.zipcode FROM Staff e, Kids k
    WHERE e.id = k.id AND k.kidName = 'girl16';

This query tests the O-R system's handling of nested sets. As we have mentioned, nested sets have recently been eliminated from SQL3; we are leaving this query in the benchmark because vendors support it and we think users want it. A select/join is required in the relational case, so if the O-R system supports indexing on set-valued attributes, it has an opportunity to win here.

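The difference between the two SET-ELEMENT formulations, and the reason the relational one needs DISTINCT, can be pictured with a small in-memory analogue. This is a sketch only: the structs `StaffOR`, `StaffRel`, and `KidRow` and both function names are hypothetical stand-ins, the nested-loop join is chosen for brevity rather than speed, and results are reduced to staff ids instead of full address rows.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical in-memory stand-ins for the two schemas; the struct and
// function names are illustrative and not part of BUCKY.
struct StaffOR  { int id; std::set<std::string> kidNames; };  // nested set
struct StaffRel { int id; };                                  // flat Staff row
struct KidRow   { int id; std::string kidName; };             // Kids table row

// O-R style: one membership test per Staff row, so each id appears at most once.
std::vector<int> staff_with_kid_or(const std::vector<StaffOR>& staff,
                                   const std::string& kid) {
    std::vector<int> ids;
    for (const StaffOR& s : staff)
        if (s.kidNames.count(kid) > 0) ids.push_back(s.id);
    return ids;
}

// Relational style: join Staff with Kids on id, then de-duplicate, which is
// the role DISTINCT plays in the SQL version.
std::vector<int> staff_with_kid_rel(const std::vector<StaffRel>& staff,
                                    const std::vector<KidRow>& kids,
                                    const std::string& kid) {
    std::set<int> distinct_ids;
    for (const StaffRel& s : staff)
        for (const KidRow& k : kids)
            if (s.id == k.id && k.kidName == kid) distinct_ids.insert(s.id);
    return std::vector<int>(distinct_ids.begin(), distinct_ids.end());
}
```

If the Kids table holds two rows with the same name for one person, the join produces that person twice; collecting ids into a set is what brings the relational result back in line with the nested-set version.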
5.8 SET-AND: And'ed Set Membership

Find all Staff who have children named "girl16" and "boy16".

This query also returns about 2% of the Staff objects. In O-R SQL, this query is straightforward:

    SELECT e.name, e.street, e.city, e.state, e.zipcode
    FROM Staff e
    WHERE 'girl16' IN e.kidNames
      AND 'boy16' IN e.kidNames;

In relational SQL, this query again involves joins:

    SELECT DISTINCT e.name, e.street, e.city,
           e.state, e.zipcode
    FROM Staff e, Kids k1, Kids k2
    WHERE e.id = k1.id AND e.id = k2.id AND
          k1.kidName = 'girl16' AND k2.kidName = 'boy16';

This is a slightly more complex test of the O-R system's handling of queries involving nested set attributes.

5.9 1HOP-NONE: Single-Hop Path, No Selection

Find all student/major pairs.

This is the first of BUCKY's path expression test queries. It returns all students and teaching assistants (75000 persons in all).

In O-R SQL, this query is easily written as:

    SELECT s.id, s.name, s.state, s.major->dno,
           s.major->name, s.major->building
    FROM Student s;

In relational SQL, it becomes a union of two joins:

    SELECT s.id, s.name, s.state,
           d.deptNo, d.name, d.building
    FROM Department d, Student s
    WHERE s.majorDept = d.deptNo
    UNION ALL
    SELECT s.id, s.name, s.state,
           d.deptNo, d.name, d.building
    FROM Department d, TA s
    WHERE s.majorDept = d.deptNo;

This tests the efficiency of the O-R system at processing queries that involve path expressions. A well-implemented O-R system should be able to handle the O-R and relational cases with more or less equal efficiency.

We need to point out here that, strictly speaking, O-R path expressions are equivalent to relational systems' left outer joins, not inner joins. Despite this, we explicitly chose to use regular joins in the relational case. The reason for this decision is that, as was mentioned in Section 3, we know that the BUCKY database contains no dangling relationships. Given this knowledge about the database, we have simply written the given query in its most convenient and natural form in each case (i.e., using the most natural O-R and relational formulations).

It is also important to notice that the relational version of this query explicitly encodes more "information" than the object-relational version, as the relational version names both the source and target tables of the relationships involved in this query. In the object-relational case, the information about which tables contain the target objects of references is instead encoded in the schema as reference scope information (as mentioned in Section 3). (In fact, the SQL3 committee recently voted to remove unscoped references from the standard; we will see one reason for this when we examine the performance results that we obtained by running BUCKY on an object-relational product that pre-dates this decision.)

5.10 1HOP-ONE: Single Hop Path, One-Side Selection

Find the majors of students named "studentName9000".

This query pairs students and departments with a selection on student name.

In relational SQL, the query looks like this:

    SELECT s.id, s.name, d.deptNo, d.name
    FROM Student s, Department d
    WHERE s.majorDept = d.deptNo AND
          s.name = 'studentName9000'
    UNION ALL
    SELECT s.id, s.name, d.deptNo, d.name
    FROM TA s, Department d
    WHERE s.majorDept = d.deptNo AND
          s.name = 'studentName9000';

Note that the union is necessary since a student may either be a TA or a "regular" student.

In O-R SQL, there are two ways to express this. The first, variant A, starts the query from the students and follows the path to their major department, which looks like:

    SELECT s.id, s.name, s.state,
           s.major->dno, s.major->name,
           s.major->building
    FROM Student s WHERE s.name = 'studentName9000';

The second, variant B, starts from the departments and follows their (sets of) pointers toward the department's majors. This is a selection on the target of a set-valued reference, and is a bit obtuse due to the SQL3 "everything in the FROM clause is a table" view of the world:

    SELECT m->id, m->name, m->state,
           d.dno, d.name, d.building
    FROM Department d, TABLE(d.majors) t(m)
    WHERE m->name = 'studentName9000';

Variant A tests the O-R system's handling of short path expressions with predicates on the originating table. Variant B tests the O-R system's efficiency at handling queries involving nested sets of references. (It is also a case where inverse relationships, if supported, could be exploited very effectively due to the nature of the selection predicate.)

5.11 1HOP-MANY: One-Hop Path, Many-Side Selection

Find all students majoring in Department 7.

In relational SQL there is again only one version:

    SELECT s.id, s.name, d.deptNo, d.name
    FROM Student s, Department d
    WHERE s.majorDept = d.deptNo
      AND d.name = 'deptname7'
    UNION ALL
    SELECT s.id, s.name, d.deptNo, d.name
    FROM TA s, Department d
    WHERE s.majorDept = d.deptNo
      AND d.name = 'deptname7';

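The contrast that the 1HOP queries exercise, following a stored reference versus joining on a key, can be pictured with a small in-memory analogue. This is a sketch under assumptions: the structs and function names are hypothetical, a raw pointer stands in for an O-R reference, and a hash join stands in for whatever join method a relational engine would actually pick. As in the paper's relational formulation, the join is an inner join, which matches the pointer version only because no reference is dangling.

```cpp
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical in-memory analogues. Attribute names (major, majorDept,
// deptNo) follow the paper; the structs themselves are assumptions.
struct Department { int deptNo; std::string name; };
struct StudentOR  { int id; const Department* major; };  // O-R reference
struct StudentRel { int id; int majorDept; };            // foreign key

using Pair = std::pair<int, std::string>;  // (student id, department name)

// O-R style path expression: s.major->name simply dereferences the reference.
// Assumes every major pointer is non-null (no dangling relationships).
std::vector<Pair> pairs_by_reference(const std::vector<StudentOR>& students) {
    std::vector<Pair> out;
    for (const StudentOR& s : students)
        out.emplace_back(s.id, s.major->name);
    return out;
}

// Relational style: join on s.majorDept = d.deptNo, done here as a hash join.
// Students whose key matches no department are dropped (inner-join semantics).
std::vector<Pair> pairs_by_join(const std::vector<StudentRel>& students,
                                const std::vector<Department>& depts) {
    std::unordered_map<int, const Department*> by_key;
    for (const Department& d : depts) by_key[d.deptNo] = &d;

    std::vector<Pair> out;
    for (const StudentRel& s : students) {
        auto it = by_key.find(s.majorDept);
        if (it != by_key.end()) out.emplace_back(s.id, it->second->name);
    }
    return out;
}
```

The join version has to be told which table holds the targets (the `depts` argument), whereas the pointer version carries that information in the data itself; this mirrors the remark in Section 5.9 about the relational query encoding more "information" than the O-R one.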
Again, there are two ways to express this in O-R SQL. The first, variant A, starts from departments and follows the path to students; this is a selection on the source of a set-valued reference:

    SELECT m->id, m->name, m->state,
           d.dno, d.name, d.building
    FROM Department d, TABLE(d.majors) t(m)
    WHERE d.name = 'deptname7';

The second, variant B, starts from students and follows the path toward their major departments. This is a selection on the target of a scalar reference, which in O-R SQL looks like:

    SELECT s.id, s.name, s.state, s.major->dno,
           s.major->name, s.major->building
    FROM Student s WHERE s.major->name = 'deptname7';

Variant B tests the O-R system's handling of queries with path expressions whose target table is restricted by a predicate. With the selection predicate on the path's target table rather than its originating table, an O-R system that handles path queries naively (e.g., failing to make use of scope information, or failing to reorder path expressions like joins) will likely do poorly on this test. As with the previous test, inverse relationship exploitation is possible (and can be advantageous) on this test.

5.12 2HOP-ONE: Two-Hop Path, One-Side Selection

Find the semester, enrollment limit, department number, and department name for all sections of courses taught in room 69.

In O-R SQL there are many ways to express this query. Like Query 2HOP-NONE, it involves a join of three tables. We can start to follow references either from course sections, courses, or departments. We chose not to start with courses since it seemed unlikely (i.e., awkward) for a user to express the query that way. In variant A, we start from course sections and follow the path through Course to Department. This is a selection on the source of a two-hop chain of scalar-valued references. Variant A is thus quite simple-looking:

    SELECT x.semester, x.noStudents,
           x.course->dept->dno, x.course->dept->name
    FROM CourseSection x WHERE x.roomNo = 69;

The second O-R variant starts from departments and follows the path through course to course sections. This is a selection on the target of a two-hop chain of set-valued references. Variant B looks like:

    SELECT x->semester, x->noStudents, d.dno, d.name
    FROM Department d, TABLE(d.coursesOffered) t1(c),
         TABLE(c.sections) t2(x)
    WHERE x->roomNo = 69;

In relational SQL, there is only one variant:

    SELECT x.semester, x.noStudents, d.deptNo, d.name
    FROM CourseSection x, Course c, Department d
    WHERE x.deptNo = c.deptNo
      AND x.courseNo = c.courseNo
      AND c.deptNo = d.deptNo AND x.roomNo = 69;

This tests the O-R system's handling of path queries with longer paths.

5.13 ADT-SIMPLE: Simple ADT Function

Find the latitudes of all staff members.

We now turn our attention to testing ADT support, starting with the very simple case of a query that has a function invocation in its SELECT list. In object-relational SQL, the query looks like:

    SELECT extract_latitude(s2.place)
    FROM Staff s2;

In relational SQL, where the ADT has been "unencapsulated," we have:

    SELECT e.latitude
    FROM Staff e;

This tests the efficiency of the O-R system's function dispatch mechanism (versus the efficiency of retrieving stored data).

5.14 ADT-COMPLEX: Complex ADT Function

For each Staff member, find the distance between him and the staff member with id 6966.

This query applies a more complex ADT function; in object-relational SQL, it looks like:

    SELECT distance(s1.place, s2.place)
    FROM Staff s1, Staff s2 WHERE s1.id = 6966;

In relational SQL, the computation must be spelled out completely in SQL:

    SELECT SQRT((s1.latitude - s2.latitude)*
                (s1.latitude - s2.latitude)
              + (s1.longitude - s2.longitude)*
                (s1.longitude - s2.longitude))
    FROM Staff s1, Staff s2 WHERE s1.id = 6966;

This again tests the O-R system's function dispatch mechanism, but this time it does so versus a case where the relational case's expression is quite complex.

5.15 ADT-SIMPLE-EXACT: Exact-Match on an ADT

Find the ids of the Staff who live at a latitude of 34 and a longitude of 35.

In this query, we are looking for a particular point, which is an exact match. In object-relational SQL, the query looks like:

    SELECT s.id
    FROM Staff s WHERE s.place = LocationAdt(34, 35);

In relational SQL, it looks like:

    SELECT s.id
    FROM Staff s
    WHERE s.latitude = 34 AND s.longitude = 35;

This tests the O-R system's efficiency at handling an exact match query involving an ADT (which requires ADT indexing support).

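To make concrete what the three ADT queries above ask of the system, the LocationAdt type from Section 3.4 can be written out as compilable C++. This is a sketch, not the benchmark's implementation (the paper states that the real functions are written in C): the constructor and `operator==` are assumptions added so that the exact-match predicate of ADT-SIMPLE-EXACT has a counterpart, and `distance` returns `double` here rather than the `float` of the paper's class.

```cpp
#include <cmath>

// A compilable rendering of the LocationAdt class from Section 3.4.
// The constructor and operator== are assumptions added for illustration.
class LocationAdt {
  public:
    LocationAdt(int latitude, int longitude) : lat(latitude), lon(longitude) {}

    // Accessors, as invoked by ADT-SIMPLE (extract_latitude(s.place)).
    int extract_latitude() const { return lat; }
    int extract_longitude() const { return lon; }

    // Euclidean distance, as invoked by ADT-COMPLEX (distance(s1.place, s2.place)).
    double distance(const LocationAdt& other) const {
        const double dlat = lat - other.lat;
        const double dlon = lon - other.lon;
        return std::sqrt(dlat * dlat + dlon * dlon);
    }

    // Exact match on both components, the analogue of
    // s.place = LocationAdt(34, 35) in ADT-SIMPLE-EXACT.
    bool operator==(const LocationAdt& other) const {
        return lat == other.lat && lon == other.lon;
    }

  private:
    int lat;
    int lon;
};
```

The relational formulations inline exactly these bodies into the query text, which is why the benchmark can compare the cost of dispatching to an ADT function against the cost of evaluating the same arithmetic over plain columns.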
5.16 ADT-COMPLEX-RANGE: Range on Complex ADT Function

Find the ids and names of Staff whose ids are less than 1500 and who are less than 500 units apart from each other.

We now try a more complex ADT query, which in O-R SQL is:

    SELECT s1.id, s1.name, s2.id, s2.name
    FROM Staff s1, Staff s2
    WHERE distance(s1.place, s2.place) < 500
      AND s1.id < 1500 AND s2.id < 1500
      AND s1.id < s2.id;

In relational SQL, it looks like:

    SELECT s1.id, s1.name, s2.id, s2.name
    FROM Staff s1, Staff s2
    WHERE SQRT((s1.latitude - s2.latitude)*
               (s1.latitude - s2.latitude) +
               (s1.longitude - s2.longitude)*
               (s1.longitude - s2.longitude)) < 500
      AND s2.id < 1500 AND s1.id < s2.id
      AND s1.id < 1500;

This tests the O-R system's efficiency at handling a range query involving an ADT.

5.17 Other Queries Considered

In addition to the queries described here, we also considered including a number of other test queries. However, these other queries were eliminated because, when running BUCKY against an actual O-R system, we found that their results simply reinforced those that we already presented. The other queries that we tested include: SINGLE-RANGE (Range Query Over Single Table) and HIER-RANGE (Range Query Over Table Hierarchy), whose results were similar to the corresponding exact-match queries; SET-OR (Or'ed Set Membership), the results of which were similar to the other set queries; 1HOP-BOTH (Single-Hop Path, Double-Ended Selection), whose results were essentially predictable based on the corresponding pair of single-ended selections; 2HOP-NONE (Two-Hop Path, No Selection), 2HOP-MANY (Two-Hop Path, Many-Side Selection), and 2HOP-BOTH (Two-Hop Path, Double-Ended Selection), which largely reinforced the corresponding single-hop query results; and, finally, ADT-SIMPLE-RANGE (Range on Simple ADT Function), which produced results similar to those of ADT-SIMPLE-EXACT.

6 Initial BUCKY Results and Lessons

In this section, we briefly describe our preliminary experience in applying the BUCKY benchmark to an actual system, one of the early object-relational products. The system that we tested is Illustra, now owned by Informix.

6.1 Loading the BUCKY Database

A big difference between implementing BUCKY in the re-
concise in most cases, we quickly discovered that loading becomes both more complex and more time-consuming as a result.

Our approach to loading was to first generate external data files that were then loaded into the database system using its bulk-loading facility. We did this for portability and uniformity reasons: the load files are generated by a stand-alone C++ program, and hence can be used by anyone, ensuring that others will be able to use the exact same input data set. These files can then be bulk-loaded into any DBMS (perhaps after some minor syntactic tweaking to match the field and tuple delimiters used by the particular DBMS's bulk-loading facility).

Our approach to loading was straightforward for the relational BUCKY database, but proved much more difficult in the object-relational case. To see why, consider generating the load file for the Students table. Each student has, among other things, an associated major. For relational systems, the data for a particular student simply includes the department id of the student's major department; as a result, when generating the Department load file, we don't need to know who the department's majors are, as this information is already captured in the Student table and can be recovered later using join queries. In contrast, consider generating the load files for the same two tables (Student and Department) in the object-relational case. First, rather than holding the key of the major department, the student data must now include the OID (object identifier) of the student's major department. Unfortunately, since the major department object has not yet been created, there is no way to know what this OID may eventually be. Current O-R systems address this problem by allowing the use of a surrogate for the department object at load time; this surrogate is a temporary, external OID that the system replaces with the actual OID later during the loading process. Although providing support for such surrogate OIDs makes bulk-loading possible, the problem of generating and managing surrogate OIDs when preparing the data for bulk-loading is far from trivial.

To illustrate the "joys" of loading an O-R database, suppose we are now generating the Department data file. When we come to the 4095th department object, it must include a reference to each student that has this department as a major. We can use the surrogate OIDs of the student objects; the set of surrogates for this particular department might be {56, 157, 3100}, the surrogates for the 56th, 157th, and 3100th students. Unfortunately, this means that we must remember the association between these students and their departments from the time when we generate the students until the time when we generate their corresponding departments. If we are working with a large database, the number of such associations can be huge, making the data structure needed to store these associations larger than memory. At best, paging of this data structure would make data generation impossibly slow; at worst, its size will exceed the available swap space and the program won't finish running at all. Note that interchanging the generation order of the Department and Student tables won't help, as there is a cyclic dependency between them.

To solve this problem, we used a C++ program that
lational and object-relational models arose when generating generates relational load les, followed by a series of smaller
and bulk-loading the input les for the database. Doing this C++ programs, awk programs, and calls to the Unix sort
was much harder for object-relational data due to the pres- and join utilities to munge this output into an object-re-
ence of references. Basically, O-R systems make querying lational load le. As mentioned above, the relational load
simpler by preconnecting objects according to relationships les include information about which rows are related to
declared in the schema; while queries indeed become more which other rows by using key-foreign key pairs; for exam-
faster hardware platform!) were to create proper function
Query R O-R indexes and to load all ADT functions statically (rather than
SINGLE-EXACT 0.23 0.28 dynamically) into the engine prior to running the benchmark
HIER-EXACT 0.25 0.40 queries. Table 3 lists both the relational (R) and object-
SINGLE-METH 3.58 0.67 relational (O-R) BUCKY results.
HIER-METH 11.49 18.73 For queries SINGLE-EXACT and SINGLE-JOIN, which
SINGLE-JOIN 11.25 11.33 are just relational queries over one table, the R and O-R
HIER-JOIN 140.1 187.2 times are essentially identical, as one would expect. The
SET-ELEMENT 5.8 23.7 O-R times for the corresponding queries HIER-EXACT and
SET-AND 2.5 24.0 HIER-JOIN are somewhat worse than the relational times.
1HOP-NONE 50.9 95.0/39.7 These di erences are due to an O-R query optimizer bug
1HOP-ONE 0.30 0.29/0.26 (the optimizer sometimes fails to correctly choose an index-
1HOP-MANY 2.32 23.96/6.26 based plan in the presence of a table hierarchy) in the version
2HOP-ONE 4.95 2.12/1.74 of Illustra on which we ran the tests.
ADT-SIMPLE 5.97 6.43 We now turn to the method queries SINGLE-METH and
ADT-COMPLEX 9.43 5.92 HIER-METH. Comparing the R and O-R times for query
ADT-SIMPLE-EXACT 0.20 0.24 SINGLE-METH shows the large gains that O-R support for
ADT-COMPLEX-RANGE 39.6 22.5 indices on functions can provide. The O-R system is able
to execute this query by doing an index lookup on the em-
ployee salary function, whereas the complexity of the query
Figure 3: Measured times in seconds for BUCKY queries. predicate in the relational case (where the predicate essen-
(Path expression results shown as UNSCOPED/SCOPED tially includes an in-query expansion of the O-R function
pairs of times). body) forces a query plan that involves a sequential scan.
The same performance advantage for O-R should be seen
for HIER-METH, but it isn't; instead, the O-R time is ac-
ple, the relationship between a given Student row and its tually worse in this case. This is also due to the optimizer
corresponding major Department is represented by storing bug mentioned above (which causes the plan based on the
the department's key in the student tuple. The load mung- functional index to be missed).
ing programs have to replace this representation by putting Next we look at the set queries, SET-ELEMENT and
a reference to the Student tuple in the \majors" set of the SET-AND. In both cases, the relational version { which in-
Department row and a reference to the Department tuple volves a join { is signi cantly faster than the O-R version,
in the Student row. This can be accomplished by joining showing that the O-R system's handling of nested sets could
the Department and Student class, and then lling in the be improved.
references with surrogate OIDs instead of writing out joined We now come to the path queries. Two O-R times are
tuples. This process was implemented as a sort-merge join shown for 1HOP-NONE. The rst O-R time (95.0 seconds)
using the Unix \sort" and \join" utilities. We used a simi- is worse than the relational time (50.9 seconds) and re-
lar approach for the other references in the BUCKY schema sulted from writing the O-R query using a path expression.
as well. It is worth noting that much of this e ort would The reason for the lower O-R performance is2that this sys-
have been unnecessary if object-relational database systems tem does not yet support scoped references , as the sys-
(and SQL3!) supported the notion of bi-directional rela- tem was built before the notion of scoped references was
tionships { we could then have declared the students' major added to SQL3. Thus, although the system knows from the
and departments' majors attributes to be inversely related, schema that the eld s.major points to an object of type
explicitly linking the data in only one direction (as in the DepartmentObj, it has no way of knowing that the target
relational case), leaving it to the system to ll in the reverse object is in the Department table. Consequently, it has to
direction. revert to what amounts to a nested-loops join (scanning the
Finally, the amount of loading work (I/O and CPU time) Student table and following the major pointer for each stu-
that must be done by the system is greater in the O-R case dent tuple). Since it is of questionable fairness to compare
as well, as the O-R system must assign each object a real the performance of an explicitly scoped relational join with
OID and then replace all uses of that object's surrogate OID an unscoped pointer join, we also include another O-R time
with the newly assigned real OID [WN94, WN95]. This (39.7 seconds) in the table. This time was obtained by sim-
extra work was dramatically visible in the loading times that ulating what an O-R system with support for scoped refer-
we saw when preparing the BUCKY database; loading took ences would do by explicitly rewriting the path query as an
many times longer in the object-relational case (even when explicit OID-join (i.e., as a join between the Student and
the \munging" time required to prepare the O-R input les Department tables, just like the relational query but with
was excluded). a join predicate of s.major = d.oid). Doing so led to an
O-R time that beat the corresponding relational time (due
6.2 Running the BUCKY Queries to a better query plan being selected by the optimizer in the
In describing the BUCKY queries in Section 5, we indicated second O-R case than in the relational case).
brie y what each one was intended to test. Here we present Alternatively, we could have implemented an unscoped
preliminary results that were obtained by running BUCKY join in the relational system. One way to do this would
on version 3.2 of Illustra, a rst-generation O-R database be to replace the \major" attribute with a pair of attributes
system product. The results reported here were measured (tableName, majorDept) then \decode" this pair of attributes
at Informix; they took the initial Illustra implementation in a client application. This would be impossibly slow; we
of BUCKY produced at the University of Wisconsin and 2 It supports only unscoped references, which are strictly more
improved it in several ways. Some of the improvements that powerful but also much more costly in terms of performance in some
they made over our initial version (in addition to using a cases, as we will see here.
didn't test it because we have no notion of a client application anywhere else in the benchmark.

The next path query is 1HOP-ONE. Recall that for the select-join queries, we looked at two ways of expressing each query in O-R SQL; this is because given our schema there are two directions in which to follow each relationship. These two variants are not the source of the pair of numbers in Figure 3; as mentioned previously, the two numbers in the figure are due to the scoped/unscoped option.

In the case of 1HOP-ONE, the O-R times shown are for variant A, where the query is written as a path expression going from students to departments. The two O-R times and the relational time are all more or less identical due to the fact that the three cases all allow the selective student name predicate to be applied first; in this case, the lack of scope information is not a problem. Variant B, which traverses the relationship in the opposite direction using the department's set of majors, is not shown. Illustra's times were slower in this case because it uses nested sets and unscoped references, so there was no point in including these times (though they should be included in tests of systems that support scoped references and nested sets); note that a join-based rewrite of variant B would be the same as that for variant A. Lastly, note that inverse relationship information would allow a system to choose to use forward-traversal plans rather than set-traversal plans when appropriate, which would help here, but neither current O-R systems nor SQL3 provide any such support.

For the next query, 1HOP-MANY, the results shown are for variant B, the reference traversal variant; as for query 1HOP-ONE, we omit the times for the variant that traverses through a set of unscoped references, but plan on including it in the benchmark when systems provide scoped references. The relational time for the path variant of 1HOP-MANY is better than both O-R times here. In the unscoped case, the lack of scope information forces the system to apply the selection last, so the unscoped path query cost is high here. The OID-join version performs much better, though still not as well as the relational version in this case.

The last path query is 2HOP-ONE; again, we show only the forward path traversal results, omitting the traversal in the reverse (set of reference) direction. This query once again involves unscoped references, but O-R performance is quite good despite this due to the highly selective predicate on roomNo. In this case, the relational version is slower than both of the O-R versions of the query.

The last group of queries involves use of ADTs and their functions. The ADT-SIMPLE results show a slight overhead for function invocation in the O-R case, as the function body is extremely simple, while the results for ADT-COMPLEX show that ADT function performance beats relational expression evaluation for more complex functions. The O-R and R times are essentially identical for query ADT-SIMPLE-EXACT. Finally, the O-R time is better for query ADT-COMPLEX-RANGE; it is clear that the O-R system can take advantage of the ADT in this case.

6.3 Reporting the Bottom Line

In our previous benchmarks, we have avoided the idea of boiling an entire benchmark down to a single number, but it is too much fun not to compress the results somehow. We still believe that a full set of results is by far the best performance profile of a system, but as a challenge to implementors of O-R systems everywhere, we are defining two bottom-line metrics for the BUCKY Benchmark:

1. The BUCKY O-R Efficiency Index.
   This number measures the relative performance of the system's O-R and relational functionality. It is defined to be $G(OR)/G(R)$, where $G(OR)$ is the geometric mean of all object-relational test times and $G(R)$ is the geometric mean of all relational test times.

2. The BUCKY O-R Power Rating.
   This measures the absolute performance of the system's O-R functionality, and is simply $100.0/G(OR)$.

The O-R Power Rating is useful only when comparing two object-relational systems: if system A has a higher power rating than system B, then system A is in some sense "faster" than B. The O-R efficiency index, in contrast, is interesting within a single system. For Illustra, if we omit the set-valued attribute and set-of-reference queries (the ones that have recently been dropped from SQL3), and use the OID-join encoding of the scoped reference queries, the O-R efficiency index is 0.9.

We anxiously await the first O-R system that can get an O-R efficiency rating $< 1.0$ based on reporting times for all of the queries (without rewrites), indicating that it successfully ran the full O-R version of the BUCKY queries faster than the relational version. For anyone who would like to try, the loading programs and queries are freely available from the database area at the UW CS department web site (http://www.cs.wisc.edu/~naughton/bucky.html).

7 Conclusions

In this paper, we have presented BUCKY, a benchmark for object-relational database systems. BUCKY is a query benchmark that tests the object features offered by object-relational systems, including row types and inheritance, references and path expressions, sets of atomic values and of references, methods and late binding, and user-defined abstract data types and their methods. To help evaluate the current state of the O-R art, we presented both object-relational BUCKY and a relationally mapped simulation thereof, and we strongly advocate running both versions against the same O-R engine. We discussed the lessons that we learned by running BUCKY on an early O-R product; the results highlighted a number of issues related both to current products and to object-relational technology (a la SQL3) in general.

While we expect the BUCKY benchmark to continue to evolve, our initial BUCKY experience, in a nutshell, indicates that object-relational technology is a double-edged sword today. For the most part, the queries are much more naturally and concisely expressible using the power of the object-relational model and SQL extensions. However, at least today, this greater expressive power does not come for free. For example, we found that loading an object-relational database is far more challenging than loading an information-equivalent relational database; inverse relationship support would have helped here. In addition, the new SQL language features that O-R systems offer (such as references, sets, inheritance, methods, and ADTs) provide new implementation challenges for implementors of DBMS engines. We saw that a number of the BUCKY queries currently run faster on the relational version of BUCKY, particularly those involving sets, and we clearly saw one of the reasons that SQL3 advocates the exclusive use of scoped references whenever possible.
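As a worked illustration of the two bottom-line metrics defined in Section 6.3, the following Python sketch recomputes them from the times in Figure 3. It is an illustration under stated assumptions, not the authors' tooling: the names TIMES, geometric_mean, efficiency_index, and power_rating are hypothetical, and the sketch assumes that the 0.9 figure quoted for Illustra covers exactly the 14 queries that remain after dropping SET-ELEMENT and SET-AND, with the SCOPED (OID-join) time used for each path query. The text does not spell out that query list.

```python
import math

# Times in seconds transcribed from Figure 3, as (relational, O-R) pairs.
# Assumption: SET-ELEMENT and SET-AND are omitted, and path queries use the
# SCOPED (OID-join) O-R time, mirroring the configuration described in 6.3.
TIMES = {
    "SINGLE-EXACT": (0.23, 0.28),
    "HIER-EXACT": (0.25, 0.40),
    "SINGLE-METH": (3.58, 0.67),
    "HIER-METH": (11.49, 18.73),
    "SINGLE-JOIN": (11.25, 11.33),
    "HIER-JOIN": (140.1, 187.2),
    "1HOP-NONE": (50.9, 39.7),
    "1HOP-ONE": (0.30, 0.26),
    "1HOP-MANY": (2.32, 6.26),
    "2HOP-ONE": (4.95, 1.74),
    "ADT-SIMPLE": (5.97, 6.43),
    "ADT-COMPLEX": (9.43, 5.92),
    "ADT-SIMPLE-EXACT": (0.20, 0.24),
    "ADT-COMPLEX-RANGE": (39.6, 22.5),
}


def geometric_mean(values):
    """Geometric mean of positive numbers, computed via logarithms."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


def efficiency_index(times):
    """O-R Efficiency Index: G(OR) / G(R). Values below 1.0 favor O-R."""
    g_r = geometric_mean(r for r, _ in times.values())
    g_or = geometric_mean(o for _, o in times.values())
    return g_or / g_r


def power_rating(times):
    """O-R Power Rating: 100.0 / G(OR). Higher means faster."""
    return 100.0 / geometric_mean(o for _, o in times.values())


if __name__ == "__main__":
    print(f"O-R Efficiency Index: {efficiency_index(TIMES):.2f}")
    print(f"O-R Power Rating:     {power_rating(TIMES):.1f}")
```

Under these assumptions the efficiency index comes out at roughly 0.9, in line with the value reported for Illustra. Because both metrics are built on geometric means, a query that runs twice as fast offsets one that runs twice as slow, regardless of their absolute running times.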
Stonebraker refers to object-relational technology as "the next great wave" [Sto96], and it is clear from the activity in the industry that this wave is starting to wash over us today. It is our hope that BUCKY will be useful over the next few years as this wave continues, both for customers of this technology, so they can tell when O-R systems are ready for deployment in their applications, and for its developers, to provide a forcing function for improving the current state of the art. To this end, we have offered two BUCKY performance metrics: the O-R Efficiency Index, for comparing O-R and relational implementations of BUCKY, and the O-R Power Rating, for comparing O-R systems.

References

[CDKN94] Michael J. Carey, David J. DeWitt, Chander Kant, and Jeffrey F. Naughton. A status report on the OO7 OODBMS benchmarking effort. In Proceedings of the ACM OOPSLA Conference, pages 414-426, Portland, OR, October 1994.

[CDN93] Michael J. Carey, David J. DeWitt, and Jeffrey F. Naughton. The OO7 benchmark. In Proceedings of the 1993 ACM-SIGMOD Conference on the Management of Data, Washington D.C., May 1993.

[CS92] R. Cattell and J. Skeen. Object operations benchmark. ACM Transactions on Database Systems, 17(1), March 1992.

[Gra93] Jim Gray. The Benchmark Handbook. Morgan Kaufmann, San Mateo, CA, 1993.

        advisor Ref(ProfessorObj),
        hasTaken Set(Ref(EnrolledObj))
    ) UNDER PersonObj;
    CREATE TABLE Student
        OF ROW TYPE StudentObj UNDER Person ...;

    CREATE ROW TYPE EmployeeObj (
        DateHired Date,
        status Integer, salary Real virtual,
        worksIn Ref(DepartmentObj)
    ) UNDER PersonObj;
    CREATE TABLE Employee
        OF ROW TYPE EmployeeObj UNDER Person ...;

    CREATE ROW TYPE StaffObj (
        annualSalary Integer
    ) UNDER EmployeeObj;
    CREATE TABLE Staff
        OF ROW TYPE StaffObj UNDER Employee ...;

    CREATE ROW TYPE InstructorObj (
        Teaches Set(Ref(CourseSectionObj))
    ) UNDER EmployeeObj;
    CREATE TABLE Instructor
        OF ROW TYPE InstructorObj UNDER Employee ...;

    CREATE ROW TYPE ProfessorObj (
        AYSalary Integer, monthSummer Integer,
        advisees Set(Ref(StudentObj))
    ) UNDER InstructorObj;
    CREATE TABLE Professor
        OF ROW TYPE ProfessorObj UNDER Instructor ...;