On Parallel Processing of Aggregate and Scalar Functions
in Object-Relational DBMS
Michael Jaedicke, Bernhard Mitschang
Technische Universität München
Computer Science Department
80290 München, Germany
+498948095184
[email protected]
This work was partially supported by the DFG (SFB 342, B2).
parallel execution. Our framework for parallel processing of user-defined functions is introduced in Section 4. After a discussion of related work in Section 5, the closing Section 6 contains a short summary and a brief outlook to future work.

3. USER-DEFINED FUNCTIONS
We will now provide the basic concepts and definitions we use in this paper. We will concentrate on the concepts for our specific query processing problem and refer the reader to the literature for the general concepts of parallel relational database query processing ([9], [14], [12], [43], [41]) and object-relational query processing ([42], [4]).

3.1 Built-in and User-Defined Functions
Every RDBMS comes with a fixed set of built-in functions. These functions can be either scalar functions or aggregate functions. A scalar function can be used in SQL queries wherever an expression can be used. Typical scalar functions are arithmetic functions like + and * or concat for string concatenation. Functions for type casting are special scalar functions, too. A scalar function is applied to the values of some columns of a single row of an input table.

By contrast, an aggregate function is applied to the values of a single column of either a group of rows or of all rows of an input table. A group of rows occurs if a GROUP-BY clause is used. Thus aggregate functions can be used in the projection part of SQL queries and in HAVING clauses. The aggregate functions of the SQL92 standard are MAX, MIN, AVG, SUM and COUNT. Other statistical aggregate functions like standard deviation and variance are provided by some RDBMS implementations [4].

In ORDBMS it is possible to use a user-defined function (UDF) in nearly all places where a system-provided built-in function can appear in SQL92. Thus there are two subsets of UDFs: user-defined scalar functions (UDSFs) and user-defined aggregate functions (UDAFs).

3.2 Definition of New UDFs
Let us now describe briefly how UDFs are created in ORDBMS. Users can write UDFs as so-called external functions in a 3GL (typically C and Java are supported as languages) and then register them with the DBMS.

In advanced object-relational systems it should be possible to implement the body of UDFs using SQL statements embedded in the code of a 3GL (similar to the usual embedded SQL offered for application development). This allows access to the database in the function’s body. One restriction is that a function should not modify the database, if it is used in a SELECT statement. Furthermore a UDF might perform an external action, e.g. read from or write to a file, send an email to the DB administrator, start a program, etc. The Informix Illustra ORDBMS already supports UDFs that consist of one or more SQL statements.

If DML statements or external actions are used in UDFs, the UDF might depend on arbitrary data in the database or elsewhere. We say that these functions have an external context. Since we see only very limited opportunities for database technology to enable parallelism for functions with external context, we will not consider that kind of UDFs in this paper.

3.2.1 User-Defined Scalar Functions
Figure 1 provides an example of the syntax used in DB2 UDB [4] to register a new UDSF with the DBMS. The scalar function add returns the sum of its two arguments of the user-defined data type dollar.

    CREATE FUNCTION add (dollar, dollar)
      RETURNS dollar
      EXTERNAL NAME 'dollar!add'
      LANGUAGE C
      PARAMETER STYLE DB2SQL
      NOT VARIANT
      NOT FENCED
      NOT NULL CALL
      NO SQL
      NO EXTERNAL ACTION
      NO SCRATCHPAD
      NO FINAL CALL;
Figure 1. Registration of a new UDSF add in DB2

As can be seen from this example there are already some parameters allowing the user to describe the characteristics of a newly registered function. We refer the reader to [4] for most of the details and provide only the information relevant to our problem. An interesting feature is the possibility to use a so-called scratchpad area for UDSFs. A scratchpad area is a small piece of memory that is passed to a UDSF with all calls and that is not deleted after the executed function returns control. Thus it is possible for a function to maintain a global context (or global state), which means that information can be preserved from one function invocation to the next. After the last call to the function within an SQL query, the scratchpad is deallocated by the system. Please note that the user can allocate more memory than the rather small scratchpad area by simply allocating some memory dynamically and hooking it up in the scratchpad. Often a scratchpad is used to store intermediate results that have been computed from the arguments of former function calls. We say that such UDSFs have an input context. The moving average function is an example of such a scalar function. It allows one to compute many aggregates (many moving averages) by means of a single scan over an input table.

After a function has been registered, the developer of a UDF should provide the query optimizer with some information about the expected execution costs of a UDF. ORDBMS have to provide a suitable interface for this purpose. For example DB2 allows one to specify the I/O and CPU costs that are expected for the first call to a function, for each further call, and per argument byte that is read by the function. In addition to this, the percentage of the argument’s storage size that is processed at the average call to the UDF should be specified. If a UDF is used as a predicate (i.e. the
function returns a boolean value) the user should be able to specify a user-defined selectivity function [42]. Since providing these details can be a time consuming task, easy to use development kits may be offered for this task ([7], WI, [33]).

3.2.2 User-Defined Aggregate Functions
Let us now see how, for example, the Informix Illustra ORDBMS supports UDAFs. The system computes aggregate functions in a tuple-at-a-time fashion, i.e. there is one function call for each element of the input set. The user has to write the three following external functions to implement a new UDAF:

• Init():
  The Init function is called only once and without arguments to initialize the aggregate computation before the actual computation of the aggregate begins. It returns a pointer to memory, which it has allocated to store intermediate results during the aggregation.

• Iter(pointer, value):
  The Iter function is called once for each element of the input set. One parameter is the value of this element and the other is the pointer to the allocated memory. It aggregates the next value into the current aggregate that is stored in memory using the pointer. It returns the pointer to the allocated memory.

• aggregate value = Final(pointer):
  The Final function is called once after the last element of the input set has been processed by the Iter function. It computes and returns the resulting aggregate using the pointer to the allocated memory. In addition, it deallocates the memory.

The pointer, similar to the scratchpad area mentioned before (cf. [4]), allows the input context of the computation to be stored. For example to compute the average of a set of values, the Iter function would store both the sum of all values seen so far and their number as intermediate results in the allocated memory. The Final function would divide the sum by the number and return the result. The reader should note that all practical aggregate functions have an input context.

Obviously this design matches the usual Open-Next-Close protocol [12] for relational operators. After the three functions have been registered with the ORDBMS (cf. Figure 1), the user can create the aggregate function (e.g. average) using a CREATE AGGREGATE statement. This statement determines which three functions are used to implement the Init, Iter and Final functions for the new aggregate function.

[...] presentation as simple as possible.

We first have to create three UDSFs INIT-MF, ITER-MF, FINAL-MF that provide the implementation routines of the MOST-FREQUENT aggregate function. These three routines are programmed as external functions, i.e. they are written e.g. in C and can use the system-provided API for UDFs to handle tasks like memory allocation, etc. Then they are registered using the CREATE FUNCTION statement:

    CREATE FUNCTION INIT-MF()
      RETURNS POINTER
      EXTERNAL NAME 'libfuncs!mf-init'
      LANGUAGE C ...

    CREATE FUNCTION ITER-MF(POINTER, INTEGER)
      RETURNS POINTER
      EXTERNAL NAME 'libfuncs!mf-iter'
      LANGUAGE C ...

    CREATE FUNCTION FINAL-MF()
      RETURNS INTEGER
      EXTERNAL NAME 'libfuncs!mf-final'
      LANGUAGE C ...

The function INIT-MF allocates and initializes memory to store the integer values together with a count and returns a pointer to that memory. The function ITER-MF stores its argument in the allocated memory, if it is an integer value not seen so far, and increments the count for this value. Finally, the FINAL-MF function searches for the value with the maximum count and returns this value. Now we create the UDAF with the CREATE AGGREGATE statement:

    CREATE AGGREGATE MOST-FREQUENT
    (
      init = INIT-MF()
      iter = ITER-MF(POINTER, INTEGER)
      final = FINAL-MF(POINTER)
    );

Now the MOST-FREQUENT function can be used as a new aggregate function in queries. We will now explain why this aggregate function cannot be processed parallelly.

UDSFs without context can be executed parallelly using data parallelism. Instead of executing a set of function invocations in a sequential order, one simply partitions the data set (horizontal fragmentation) and processes the UDSF for each data partition parallelly. This parallel execution scheme is shown in Figure 2 for a selection.
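To make the Init/Iter/Final protocol of Section 3.2.2 concrete, the following minimal Python sketch mimics the three MOST-FREQUENT routines and a tuple-at-a-time driver. It is an illustration under assumptions only: the names init_mf, iter_mf, final_mf and run_aggregate are hypothetical, a dictionary-like counter stands in for the memory behind the pointer, and the tie-breaking rule (the smallest value wins) is an assumption of the sketch, since the text does not specify one.

```python
from collections import Counter


def init_mf():
    """Init: allocate the intermediate state (here a value -> count map)."""
    return Counter()


def iter_mf(state, value):
    """Iter: fold one input value into the state and hand the state back."""
    state[value] += 1
    return state


def final_mf(state):
    """Final: return the value with the maximum count (None for empty input)."""
    if not state:
        return None
    # Assumption of this sketch: ties are broken in favour of the smallest value.
    return min(state, key=lambda v: (-state[v], v))


def run_aggregate(init, iterate, final, rows):
    """Drive the three routines as a tuple-at-a-time executor would:
    one Init call, one Iter call per input element, one Final call."""
    state = init()
    for value in rows:
        state = iterate(state, value)
    return final(state)


result = run_aggregate(init_mf, iter_mf, final_mf, [3, 1, 3, 2, 1, 3])
print(result)
```

The driver makes the input context visible: the state returned by one Iter call is the input of the next one, so the calls cannot simply be spread over independent partitions.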
Obviously aggregate functions cannot use this approach without modification as they have an input context and deliver only a single result for a set of input tuples. Parallel aggregation operations in RDBMS use an execution scheme consisting of two steps [12] as shown in Figure 3. After the data has been partitioned, it is first aggregated locally for each partition and then, in a second step, the locally computed sub-aggregates are combined in a global aggregation (merging step in Figure 3). For the aggregate function COUNT the local aggregation counts while the global aggregation computes the sum of the local counts. Generally speaking, the local and global aggregation functions needed for parallel execution are different from the aggregate function that is used for sequential execution. For built-in aggregate functions local and global aggregation functions are system-provided. Thus the DBMS can use these functions for parallel execution. For UDAFs there is currently no possibility to register additional local and global aggregation functions. This is the reason why a UDAF like the MOST-FREQUENT function cannot be executed with the usual 2 step parallel aggregation scheme.

[...] enhance the parallel processing of UDFs with an input context. In Subsection 4.1 we introduce local and global aggregation functions for UDAFs as a generalization of the relational processing scheme. In Subsection 4.2 we introduce partitioning classes and define the class of partitionable functions that can be processed with data parallelism. In Subsection 4.3 we propose sorting as a preprocessing step to enhance parallel execution for non-partitionable UDAFs.

4.1 Two Step Parallel Aggregation of UDAFs
In this Subsection we will show how aggregates can be processed in 2 steps using local and global aggregate functions.

To simplify the presentation below, we will omit constant input parameters to UDFs. Given a set S, we will use shorthand notations like f(S) for the resulting aggregate value of an aggregate function f applied to S. We will also use the notation f(S) to denote the result of repeatedly invoking a scalar function f for all elements of S. We want to emphasize that in this case f(S) denotes a multi-set of values (a new column).

Next, we define the class of aggregate functions that can be processed parallelly using local and global aggregation functions. An aggregate function f is partitionable iff two aggregate functions f_l and f_g exist, such that for any multi-set S and some partition S_i of S, 1 ≤ i ≤ k, the following equation holds:

$$f(S) = f_g\left(\bigcup_{1 \le i \le k} \{ f_l(S_i) \}\right)$$
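As a minimal sketch of this definition, the following Python fragment evaluates COUNT and AVG in two steps and compares the outcome with the sequential result. The helper names (round_robin, two_step, count_local and so on) are hypothetical and chosen for illustration; round-robin is used merely as one arbitrary way to form the partition S_1, ..., S_k.

```python
def round_robin(rows, k):
    """Split rows into k partitions without looking at the values."""
    return [rows[i::k] for i in range(k)]


def count_local(partition):
    """f_l for COUNT: count the elements of one partition."""
    return len(partition)


def count_global(sub_aggregates):
    """f_g for COUNT: sum up the local counts."""
    return sum(sub_aggregates)


def avg_local(partition):
    """f_l for AVG: return the pair (sum, count) of one partition."""
    return (sum(partition), len(partition))


def avg_global(sub_aggregates):
    """f_g for AVG: combine the (sum, count) pairs and divide once."""
    total = sum(s for s, _ in sub_aggregates)
    n = sum(c for _, c in sub_aggregates)
    return total / n if n else None


def two_step(f_local, f_global, rows, k):
    """Evaluate f(S) as f_g applied to the collection of all f_l(S_i)."""
    return f_global([f_local(p) for p in round_robin(rows, k)])


rows = [4, 8, 15, 16, 23, 42]
parallel_count = two_step(count_local, count_global, rows, 3)
parallel_avg = two_step(avg_local, avg_global, rows, 4)
print(parallel_count, parallel_avg)
```

Note that for AVG neither f_l nor f_g equals the sequential aggregate function itself, which illustrates why separate local and global functions have to be provided.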
[...] aggregate function to both sequential and parallel processing. Therefore the developer might have to implement and register six additional functions (Init, Iter, and Final functions for local aggregation and the same for global aggregation) to enable parallel as well as sequential processing of a UDAF. However, if one does not need maximum efficiency for sequential evaluation, one can simply use the local and global function for sequential execution, too. This, however, will incur at least the overhead for the invocation of an additional function. On the other hand, the additional work for the developer will pay off with all applications that are profiting from the increased potential for parallelism. Besides that, there seems to be no solution that results in less work for the developer.

4.2 Partitioning Classes and Partitionable Functions
One prerequisite for data parallelism is that one has to find a suitable partitioning of the data. This means that the partitioning must allow a semantically correct parallel processing of the function. In order to ease the specification of all partitionings that are allowed for the correct parallel processing of a UDF, we describe a taxonomy of the functions that can be used for partitioning.

All partitioning functions take a multi-set as input and return a partition of the input multi-set, i.e. a set of multi-sets such that any element of the input multi-set is contained in exactly one resulting multi-set. Actually in some cases we will allow functions returning subsets that are not disjoint, i.e. functions that replicate some of the elements of the input set. We define the following increasingly more special classes of partitioning functions:

• ANY: the class of all partitioning functions. Round-robin and random functions are examples that belong to no other class. All partitioning functions that are not based on attribute values belong only to this class.

• EQUAL (column name): the class of partitioning functions that map all rows of the input multi-set with equal values in the selected column into the same multi-set of the result. Examples of EQUAL functions are partitioning functions that use hashing.

• RANGE (column name [, N]): the class of partitioning functions that map rows, whose values of the specified column belong to a certain range, into the same multi-set of the result. Obviously there must exist a total order on the data type of the column. The range of all values of the data type is split into some sub-ranges that define which elements are mapped into the same multi-set of the resulting partition. Based on the total order of the data type the optional parameter N allows one to specify that the largest N elements of the input set which are smaller than the values of a certain range have to be replicated into the resulting multi-set of this range. Replicated elements must be processed in a special way and are needed only to establish a “window” on a sorted list as a kind of global context for the function. The number of elements that belong to a certain range should be much greater than the value N. This class of partitioning functions is especially useful for scalar functions that require a sorted input, for example scalar functions that compute moving averages (see Section 4.5).

Please note that the following inclusion property holds: RANGE ⊂ EQUAL ⊂ ANY. This taxonomy is useful to classify UDFs according to their processing requirements as we will see below. The database system can automatically provide at least a partitioning function of class ANY for all user-defined data types (e.g. round-robin). We define that a class C partition of a multi-set is a partition that is generated using a partitioning function of class C (C denotes either ANY, EQUAL or RANGE).

Based on these definitions we can now define the classes of partitionable aggregate and scalar functions. These classes describe the set of UDFs that can be processed parallelly with the usual execution schemes for data parallelism (cf. Figures 2 and 3) and a particular class of partitioning functions.

A scalar function f is partitionable for class C iff a function f_l exists, such that for any multi-set S and any class C partition S_i of S, 1 ≤ i ≤ k, the following equation holds:

$$f(S) = \bigcup_{1 \le i \le k} f_l(S_i)$$

An aggregate function f is partitionable for class C iff two functions f_l and f_g exist, such that for any multi-set S and any class C partition S_i of S, 1 ≤ i ≤ k, the following equation holds:

$$f(S) = f_g\left(\bigcup_{1 \le i \le k} \{ f_l(S_i) \}\right)$$

The schemes in Figure 4 and Figure 5 show how partitionable functions can be processed parallelly. All k partitions can be processed parallelly. The actual degree of parallelism (i.e. mainly the parameter k) has to be chosen by the optimizer as usual. Please note that for the scheme in Figure 4, there is not always a need to combine the local results. Hence, the optional combination step (computing f(S) = ∪_{1 ≤ i ≤ k} f_l(S_i)) is left out. In order to enable the DBMS to process a UDF parallelly the developer must specify the allowed partitioning class when the function is registered (cf. Section 4.4).

A scalar function f that is partitionable for class C using the associated function f_l can be evaluated parallelly using the following scheme, given a multi-set S and a partitioning function p of class C:
1. Partition S in k subsets S_i, 1 ≤ i ≤ k, using p. Distribute the partitions to some nodes in the system.
2. Compute f_l(S_i) for 1 ≤ i ≤ k for all S_i parallelly.
Figure 4. Parallel processing scheme for partitionable scalar functions

We have introduced some extensibility to the traditional parallel execution schemes by parameterizing the partitioning step by means of the partitioning function. In
addition, we have defined classes of partitions to allow the optimizer more flexibility w.r.t. the choice of the partitioning function. The query optimizer can try to avoid data repartitioning, when multiple UDFs are processed, if the developer specifies only the class of the partitioning functions. This can reduce processing costs dramatically, especially for shared-disk and shared-nothing architectures. If the developer specifies a single partitioning function for each UDF, in almost all cases a repartitioning step will be needed to process a UDF parallelly. Vice versa, if a single partitioning function satisfies all of the partitioning classes of a given set of UDFs, then repartitioning can be avoided.

An aggregate function f that is partitionable for class C using the two associated functions f_l and f_g can be evaluated parallelly using the following scheme, given an input multi-set S and a partitioning function p of class C:
1. Partition S in k subsets S_i, 1 ≤ i ≤ k, using p. Distribute the partitions to some nodes in the system.
2. Compute I_i := f_l(S_i) for 1 ≤ i ≤ k for all S_i parallelly. Send the intermediate results I_i to a single node for processing of step 3.
3. Compute f(S) := f_g(∪_{1 ≤ i ≤ k} {I_i}); f_g can be applied to the I_i in arbitrary order.
Figure 5. Parallel processing scheme for partitionable aggregate functions

Because UDFs can have arbitrary semantics, we believe that it is not possible to define a fixed set of partitioning functions that allows data parallelism to be applied to all UDFs. If a given UDF is partitionable using some partitioning function p, but none of the partitioning classes defined above, the developer should be enabled to specify that this function p must be used. Using a special partitioning function should be avoided in general, since all data has to be repartitioned before such a UDF can be processed.

We want to remark here that implementing RANGE partitioning is a bit complicated, since a user-defined sorting order and partial replication have to be supported. One difficulty is for example to find equally populated ranges for a given user-defined sort-order. We believe that range partitioning with partial replication can be best supported by an appropriate extension of the built-in sort operator of the ORDBMS. This operator has to support user-defined sorting orders anyway. The definition of ranges and partial replication can be supported, if information about the data is collected during the sorting process.

In addition to that extension, the operator that invokes UDSFs has to be extended. The UDSF that needs the range partitioning is evaluated immediately after the partitioning. Replicated data elements (that have to be marked) are processed by the UDSF in a special mode that has to be indicated by turning a special switch on. In this mode only the global context of the UDSF is initialized and no results are produced. For example, when the moving average over five values is computed, the first four values of a partition will be replicated ones and are stored in the global context of the function. Then, the fifth invocation produces the first result. Though this extension is conceptually simple, it may be difficult to add it to an existing execution system.

4.3 Parallel Sorting as a Preprocessing Step for UDAFs
Some user-defined aggregate functions can be easily implemented, if their input is sorted according to a specified order. In this case the sort operation can be executed parallelly. Of course, this is especially interesting for UDAFs that are not partitionable.

Sorting as a preprocessing step for UDAFs can be introduced by using an additional parameter in the CREATE FUNCTION statement (see Section 4.4 for details of the syntax we propose). Of course the user must have the possibility to specify a user-defined order by providing a specific sort function for the argument types of the UDF that are often user-defined data types. In most cases such functions will be needed anyway, to support sorted query results, to build indexes (like generalized B-Trees [42] or GiSTs [19]) or for sort merge joins to efficiently evaluate predicates on user-defined data types, to quote some examples.

One interesting point to observe is that many aggregate functions, which operate on a sorted input, do not need to read the complete input set to compute the aggregate. Thus it might be well worth providing the aggregate function with the option to terminate the evaluation as early as possible and return the result. We call this feature early termination. The parallel processing scheme for aggregate functions with sorted input is shown in Figure 6.

An aggregate function f that requires a sorted input can be evaluated using the following scheme, given an input multi-set S:
1. Sort the input S. This can be done parallelly.
2. Compute f(S) without parallelism (use early termination, if possible).
Figure 6. Parallel processing scheme for aggregate functions with sorted input

The optional sort requirements can be integrated into rule-based query optimization (see e.g. [13], [25], [16], [28]) simply by specifying the sorting order as a required physical property for the operator executing the UDF. Then a sort enforcer rule ([13], [17]) can guarantee this order requirement by putting a sort operation into the execution plan, if necessary. Informix’ Illustra [22] already supports optional sorting of inputs for UDFs that have two arguments and return a boolean value. The developer can specify a user-defined order for the left and right input of such a function. Obviously this allows one to implement a user-defined join predicate using a sort-merge join instead of a Cartesian
product followed by a selection. Thus our proposal can be seen as an extension of this approach w.r.t. a broader class of supported UDFs and their parallel execution.

4.4 Extended Syntax for Function Registration
In this Subsection, we present the syntax extensions for the statements that allow the registration of UDFs with support for the features introduced in the previous Subsections.

Figure 7 shows the extensions for the CREATE FUNCTION statement. We have marked our extensions by boldface. The ORDER BY clause can be used to specify a sorting order that is required for the input table, on which the function is executed. The input table can be sorted on multiple columns applying user-defined sort functions to define the sort order. Furthermore the developer must specify if early termination is used. To enable parallel evaluation, the partitioning class has to be specified. In addition to ANY, EQUAL and RANGE partitioning, the developer can register a special (user-defined) partitioning function for a UDF.

    CREATE FUNCTION <function-name> (<argument type list>)
      RETURNS <data type name>
      EXTERNAL NAME <external function name>
      [ORDER BY {<argument name> [USING <sort function name>] [ASC | DESC]} [EARLY TERMINATION]]
      [ALLOW PARALLEL WITH PARTITIONING CLASS (
          ANY
        | EQUAL (<argument name list>)
        | RANGE {<argument name> [, <number>] [USING <sort function name>] [ASC | DESC]}
        | <partitioning function name> )]
      LANGUAGE <language name>
Figure 7. Extensions to UDSF registration

Figure 8 shows the extensions for the CREATE AGGREGATE statement. It now includes the local and global function options that are needed to register the aggregate functions that have to be used for the parallel evaluation of the new aggregate function. Of course the various Init, Iter, and Final routines that are registered must be consistent w.r.t. their argument types. For example the sequential and the global Final function must have the same return types (but often will have different argument types).

As we mentioned already in Section 3.2 additional information about these functions should be supplied by the developer. In addition to the usual cost parameters information about the size of the results of the local aggregation function (perhaps depending on the cardinality of the input set, if the function returns a collection type) would be desirable.

    CREATE AGGREGATE <function-name>
    (
      <Init, Iter, and Final function definition>
      [LOCAL <Init, Iter, and Final function definition>]
      [GLOBAL <Init, Iter, and Final function definition>]
    )
Figure 8. Extensions to UDAF registration

4.5 Example Applications and Discussion
In this Subsection we present some example applications to illustrate the benefits of the introduced techniques.

4.5.1 Application to the UDAF most frequent
First, we will demonstrate how parallel execution can be enabled for the most frequent aggregate function.

How can we use the 2 step processing scheme to process the most frequent function parallelly? A straightforward approach could be to compute the most frequent value for each partition parallelly using the local aggregate function. This implies that the local aggregate function returns the most frequent value together with the number of its occurrences (i.e. the return type of the local function is a row type or a special user-defined type). Then the overall most frequent value is computed by the global function. Obviously this scheme is only correct if EQUAL is specified as the partitioning class for the local aggregation function. If ANY were used as partitioning class, the local aggregate function would have to return all distinct values together with the number of their occurrences for each partition. Thus the local aggregation step would not be useful.

One difficulty of this approach is to implement the local aggregation function, since it must temporarily store all distinct values together with a counter. It is difficult to implement this efficiently in a user-defined function, since the function must be able to store an arbitrarily large data set. By contrast, the local aggregation can be done much more easily if the developer uses sorting as a preprocessing step. The function must then only store two values and two counters: one for the most frequent value seen so far and one for the last value seen. This approach is much more practical. Based on the syntax from Section 4.4 we show the registration of the Iter function for the local aggregation (‘$i’ denotes the argument at position i in the parameter list of the function).

    CREATE FUNCTION ITER-MF-LOCAL(POINTER, INTEGER)
      RETURNS POINTER
      EXTERNAL NAME 'libfuncs!mf-iter-local'
      ORDER BY $2 ASC
      ALLOW PARALLEL WITH PARTITIONING CLASS EQUAL $2
      LANGUAGE C ...

Instead of using EQUAL and the ORDER BY clause, one could have also used RANGE as partitioning class. But this would have two disadvantages: first, all data must be sorted before the partitioning. With EQUAL as partitioning class the data is first partitioned and then only the partitions are sorted as specified in the ORDER BY clause. Second, repartitioning would occur more often due to the more restrictive partitioning class.
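The combination of EQUAL partitioning, per-partition sorting and a local aggregation that keeps only two values and two counters can be sketched in Python as follows. This is a minimal illustration under assumptions, not the registered C routines: all function names are hypothetical, Python's built-in hash stands in for a hash-based EQUAL partitioning function, integer inputs are assumed, and ties are resolved in favour of the smallest value.

```python
def equal_partition(rows, k):
    """Hash partitioning: equal values always land in the same partition,
    which is the defining property of class EQUAL."""
    parts = [[] for _ in range(k)]
    for v in rows:
        parts[hash(v) % k].append(v)
    return parts


def most_frequent_local(sorted_partition):
    """Local aggregation over a sorted partition. Only two values and two
    counters are kept: the best value so far and the last value seen."""
    best_value, best_count = None, 0
    last_value, last_count = None, 0
    for v in sorted_partition:
        if last_count > 0 and v == last_value:
            last_count += 1
        else:
            last_value, last_count = v, 1
        if last_count > best_count:
            best_value, best_count = last_value, last_count
    return (best_value, best_count)


def most_frequent_global(sub_aggregates):
    """Global aggregation: pick the local winner with the highest count.
    Assumption of this sketch: ties go to the smallest value."""
    candidates = [(v, c) for v, c in sub_aggregates if c > 0]
    if not candidates:
        return None
    return max(candidates, key=lambda vc: (vc[1], -vc[0]))[0]


def most_frequent_parallel(rows, k):
    """Partition first, then sort only within each partition (ORDER BY),
    aggregate locally, and merge the k local results."""
    partitions = equal_partition(rows, k)
    local_results = [most_frequent_local(sorted(p)) for p in partitions]
    return most_frequent_global(local_results)


rows = [5, 3, 5, 7, 3, 5, 7, 7, 7, 1]
answer = most_frequent_parallel(rows, 3)
print(answer)
```

Because all occurrences of a value are in one partition, each local count is already the global count of that value, so the global function only has to compare k candidates.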
4.5.2 Application to the UDSF running average
As an example of a UDSF with input context, we discuss the running average function. This function computes for each input value the average of the N values seen last. This means that the input context of the function is a ‘window’ of size N. Thus the running average function is partitionable for class RANGE with parameter N. Obviously the running average function computes many aggregates with a single scan over the input table. This is a typical example of a UDSF with an input context. Other functions of that kind are for example available in Red Brick Systems’ Intelligent SQL [37].

4.5.3 Application to the UDAF Median
As an example of a function that seems to be not partitionable consider the Median function that computes the ⌈(N+1)/2⌉ largest element of a set with N elements (that element could be informally called the ‘halfway’ element). A query that finds the median of a set is not very intuitively expressible in SQL92. For example, the simple query to select the median of the ages of certain persons could be formulated as shown in Figure 9. Of course one would prefer a query using a UDAF Median as shown in Figure 10.

    SELECT Median(P.Age, COUNT(*))
    FROM Persons AS P
Figure 10. Computing the median in object-relational SQL

[...] from Figure 9 has one of O(N²). In case of the Median function using the early termination option would save roughly half of the calls to the function.

4.6 Summary
Table 1 shows the different kinds of contexts that can occur for UDFs and the implications for parallel execution with respect to data parallelism for aggregate and scalar functions. As can be seen from Table 1, our techniques support data parallelism with respect to many, but not all UDFs with input context. Additional techniques might emerge in the future. Please note that UDFs with external context are beyond the scope of this paper (cf. Subsection 3.2).

context   UDSF                            UDAF
none      PARTITIONABLE WITH CLASS ANY    NOT REASONABLE
an input context.

In [12] and [39] parallel processing of aggregate functions in RDBMS has been studied. The proposed concepts are applicable to built-in aggregation functions and also consider aggregation in combination with GROUP-BY operations and duplicate elimination. The algorithms proposed in [39] may be combined with our framework if user-defined aggregate functions are used with GROUP-BY. It has been observed in [12] that different local and global functions are needed for parallel aggregation operations in RDBMS. In [36] the concept of processing user-defined aggregate functions in parallel using two steps is proposed as a general technique, but neither details nor more sophisticated processing techniques (like sorting as a preprocessing step, early termination or partitioning classes) are presented. In [29] RDBMS are extended by ordered domains, but neither is an object-relational approach taken nor are functions considered.

It is interesting to compare our classification of aggregate functions into partitionable and non-partitionable aggregate functions with other classifications. In [15] a classification of aggregate functions into three categories is developed, primarily with the goal of being able to determine whether super-aggregates in data cubes can be computed based on sub-aggregates for a given aggregate function. It is pointed out that this classification is also useful for the parallel computation of user-defined aggregate functions. In the classification that is proposed in [15], an aggregate function f with a given input multi-set S and an arbitrary partition $S_i$ ($1 \le i \le k$) of S is:

• distributive iff there is a function g such that $f(S) = g\left(\bigcup_{1 \le i \le k} f(S_i)\right)$.
• algebraic iff there is an M-tuple valued function g and a function h such that $f(S) = h\left(\bigcup_{1 \le i \le k} g(S_i)\right)$. It is pointed out that the main characteristic of algebraic functions is that a result of fixed size (an M-tuple) can summarize sub-aggregates.
• holistic iff there is no constant bound on the size of the storage needed to represent a sub-aggregate.

Clearly, distributive and algebraic functions are both partitionable aggregate functions for the partitioning class ANY. Note that our definition of partitionable aggregate functions is less restrictive with regard to the size of the sub-aggregates. Aggregate functions that are easy to implement using a sorted input are typically holistic. Aggregate functions that are partitionable with a less general partitioning class than ANY, e.g. the MOST-FREQUENT function, are holistic in this scheme, but can be evaluated in parallel by our framework. Other holistic functions, e.g. the Median function, can be efficiently evaluated in our approach by using parallel sorting as a preprocessing step and early termination. Note that the application scenario in [15] is different from ours with regard to partitioning and parallel evaluation, because the sub-aggregates in data cubes must be computed for fixed partitions that are determined by semantically defined sub-cubes and not by the application of some partitioning function. The classification of Gray et al. was designed with the goal of computing data cubes efficiently. However, the rationale behind our work was to find a classification of functions that is useful for parallel evaluation.

In [44] the class of decomposable aggregate functions is introduced to characterize the aggregate functions that allow early and late aggregation as a query optimization technique. This class of aggregate functions is identical to partitionable aggregate functions of partitioning class ANY, except that no size restriction for sub-aggregates is required in [44]. Thus, for these partitionable functions certain rewrite optimizations are also possible that provide orthogonal measures to improve the performance. In [5] the class of group queries is identified for relational queries. This class is directly related to data partitioning. Our framework provides support for the concept of group queries in object-relational processing as well. Finally, we want to remark that [24] contains some additional examples for the application of our techniques.

6. SUMMARY AND FUTURE WORK
In this paper we have proposed a framework that allows parallel processing of a broad class of user-defined functions with input context in ORDBMS. This is an important step in removing a performance bottleneck in parallel object-relational query processing.

Since it was clear that a straightforward application of data parallelism is not possible, we had to devise more sophisticated parallelization techniques. The three key techniques that we have proposed here are the following:

First, we have generalized the parallel execution scheme for aggregation in relational systems by means of local and global aggregations to allow its application to user-defined aggregations. Second, we have introduced some extensibility to the parallel execution schemes for scalar and aggregate functions by means of user-defined partitioning functions. We have defined classes of partitioning functions to make the specification of all allowed partitioning functions easier and to enable the optimizer to avoid data repartitioning as much as possible. Third, we have introduced parallel sorting as a preprocessing step for user-defined aggregate functions. This enables an easier implementation of UDFs and the use of parallelism in the preprocessing phase. Furthermore, we have defined new interfaces that allow the developer to use these techniques by providing the necessary information to the DBMS.

Some important remaining questions with respect to parallel object-relational query processing in general and especially UDFs are:

• Are there other classes of UDFs that do not comply with our methodology? As an example, consider user-defined table functions [21] that can be used to encapsulate access to external data sources or external indexes [8].
• Though our framework supports parallel execution of user-defined predicates, we believe that substantial additional work is necessary to avoid Cartesian product operations in ORDBMS that are used to deal with most user-defined join predicates. We are currently working on an extension of our approach for user-defined join algorithms to overcome Cartesian products and allow efficient parallel execution.
• How can parallelism be used to efficiently process UDFs on single, but very large ADTs [32] and collection types ([10], [3])?

Additional future work should be concerned with the extension of query optimization to our approach to parallel processing of UDFs.

7. ACKNOWLEDGEMENTS
We gratefully acknowledge the valuable comments of the anonymous referees, which have helped to enhance the presentation of this paper significantly and have provided new insights.

8. REFERENCES
[1] Antoshenkov, G., Ziauddin, G.: Query Processing and Optimization in Oracle Rdb. VLDB Journal 5(4): 229-237 (1996).
[2] Carey, M. J., DeWitt, D. J.: Of Objects and Databases: A Decade of Turmoil, VLDB 1996.
[3] Carey, M. J., Mattos, N., Nori, A.: Object-Relational Database Systems: Principles, Products, and Challenges (Tutorial). SIGMOD 1997: 502.
[4] Chamberlin, D.: Using the New DB2, Morgan Kaufmann Publishers, San Francisco, 1996.
[5] Chatziantoniou, D., Ross, K. A.: Groupwise Processing of Relational Queries. VLDB 1997: 476-485.
[6] Chaudhuri, S., Shim, K.: Optimization of Queries with User-defined Predicates. VLDB 1996: 87-98.
[7] Davis, J. R.: Creating an extensible, Object-Relational Data Management Environment: IBM's Universal Database, White Paper, Database Associates International, 1996.
[8] Deßloch, S., Mattos, N.: Integrating SQL Databases with Content-Specific Search Engines. VLDB 1997: 528-537.
[9] DeWitt, D., Gray, J.: Parallel Database Systems: The Future of High Performance Database Systems, In: CACM, Vol. 35, No. 6, 85-98, 1992.
[10] DeWitt, D.: Parallel Object-Relational Database Systems: Challenges & Opportunities, invited talk, PDIS 1996.
[11] DeWitt, D. J., Carey, M., Naughton, J., Asgarian, M., Gehrke, J., Shah, D.: The BUCKY Object-Relational Benchmark, SIGMOD 1997: 135-146.
[12] Graefe, G.: Query Evaluation Techniques for Large Databases. Computing Surveys 25(2): 73-170 (1993).
[13] Graefe, G.: The Cascades Framework for Query Optimization. Data Engineering Bulletin 18(3): 19-29 (1995).
[14] Gray, J.: A Survey of Parallel Database Techniques and Systems, in: Tutorial handout at VLDB 1995.
[15] Gray, J., Chaudhuri, S., Bosworth, A., Layman, A., Reichart, D., Venkatrao, M., Pellow, F., Pirahesh, H.: Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals, Data Mining and Knowledge Discovery 1, p. 29-53, Kluwer Academic Publishers, 1997.
[16] Haas, L. M., Chang, W., Lohman, G. M., McPherson, J., Wilms, P. F., Lapis, G., Lindsay, B. G., Pirahesh, H., Carey, M. J., Shekita, E. J.: Starburst Mid-Flight: As the Dust Clears. TKDE 2(1): 143-160 (1990).
[17] Haas, L. M., Kossmann, D., Wimmers, E. L., Yang, J.: Optimizing Queries Across Diverse Data Sources. VLDB 1997: 276-285.
[18] Hellerstein, J. M., Stonebraker, M.: Predicate Migration: Optimizing Queries with Expensive Predicates. SIGMOD 1993: 267-276.
[19] Hellerstein, J. M., Naughton, J. F., Pfeffer, A.: Generalized Search Trees for Database Systems. VLDB 1995: 562-573.
[20] Hellerstein, J. M., Naughton, J. F.: Query Execution Techniques for Caching Expensive Methods. SIGMOD 1996: 423-434.
[21] IBM DB2 Universal Database SQL Reference Version 5, Document Number S10J-8165-00, 1997: 441-453.
[22] Illustra User's Guide, Illustra Information Technologies, Inc., 1995.
[23] Informix Corporation, http://www.informix.com/informix/products/techbrfs/dblade/program/2122871.htm, August 1997.
[24] Jaedicke, M., Mitschang, B.: A Framework for Parallel Processing of Aggregate and Scalar Functions in Object-Relational DBMS, TUM-I 9741, SFB-Bericht Nr. 342/25/97 A, September 1997. (http://www3.informatik.tu-muenchen.de/public/projekte/sfb342/publications.html).
[25] Lohman, G. M.: Grammar-like Functional Rules for Representing Query Optimization Alternatives. SIGMOD 1988: 18-27.
[26] Mattos, N.: An Overview of the SQL3 Standard, Database Technology Institute, IBM Santa Teresa Lab, San Jose, California, July 1996.
[27] Mattos, N., Deßloch, S., DeMichiel, L., Carey, M.: Object-Relational DB2, IBM White Paper, July 1996.
[28] McKenna, W. J., Burger, L., Hoang, C., Truong, M.: EROC: A Toolkit for Building NEATO Query Optimizers. VLDB 1996: 111-121.
[29] Ng, W., Levene, M.: OSQL: An Extension to SQL to Manipulate Ordered Relational Databases. IDEAS 1997: 358-367.
[30] Niblack, W., Barber, R., Equitz, W., Flickner, M., Glasman, E. H., Petkovic, D., Yanker, P., Faloutsos, C., Taubin, G.: The QBIC Project: Querying Images by
Content, Using Color, Texture, and Shape. Storage and Retrieval for Image and Video Databases (SPIE) 1993: 173-187.
[31] O'Connell, W., Ieong, I.T., Schrader, D., Watson, C., Au, G., Biliris, A., Choo, S., Colin, P., Linderman, G., Panagos, E., Wang, J., Walters, T.: Prospector: A Content-Based Multimedia Server for Massively Parallel Architectures. SIGMOD 1996: 68-78.
[32] Olson, M. A., Hong, W. M., Ubell, M., Stonebraker, M.: Query Processing in a Parallel Object-Relational Database System, Data Engineering Bulletin, 12/1996.
[33] Oracle Corporation, http://www.oracle.com/st/, August 1997.
[34] Oracle Corporation, http://www.oracle.com/st/cartridges/context/, August 1997.
[35] Oracle Corporation, http://www.oracle.com/st/cartridges/time/, August 1997.
[36] Patel, J., Yu, J., Kabra, N., Tufte, K., Nag, B., Burger, J., Hall, N., Ramasamy, K., Lueder, R., Ellman, C., Kupsch, J., Guo, S., DeWitt, D. J., Naughton, J.: Building A Scalable GeoSpatial Database System: Technology, Implementation, and Evaluation, SIGMOD 1997: 336-347.
[37] Red Brick Systems, Inc., http://www.redbrick.com/rbs-g/html/whpap.html, August 1997.
[38] Seshadri, P., Livny, M., Ramakrishnan, R.: The Case for Enhanced Abstract Data Types. VLDB 1997: 66-75.
[39] Shatdal, A., Naughton, J. F.: Adaptive Parallel Aggregation Algorithms. SIGMOD 1995: 104-114.
[40] Stonebraker, M.: Inclusion of New Types in Relational Data Base Systems. ICDE 1986: 262-269.
[41] Stonebraker, M.: The Case for Shared Nothing. Database Engineering Bulletin 9(1): 4-9 (1986).
[42] Stonebraker, M., Moore, D.: Object-Relational DBMSs - The Next Great Wave, Morgan Kaufmann Publishers, 1996.
[43] Valduriez, P.: Parallel Database Systems: Open Problems and New Issues, in: Distributed and Parallel Databases, Vol. 1, No. 2, April 1993, 137-166.
[44] Yan, W. P., Larson, P.: Eager Aggregation and Lazy Aggregation. VLDB 1995: 345-357.