
On Parallel Processing of Aggregate and Scalar Functions in Object-Relational DBMS

Michael Jaedicke, Bernhard Mitschang
Technische Universität München
Computer Science Department
80290 München, Germany
+498948095184
[email protected]

1. ABSTRACT
Nowadays parallel object-relational DBMS are envisioned as the next great wave, but there is still a lack of efficient implementation concepts for some parts of the proposed functionality. Thus one of the current goals for parallel object-relational DBMS is to move towards higher performance. In this paper we develop a framework that allows user-defined functions to be processed with data parallelism. We will describe the class of partitionable functions that can be processed in parallel. We will also propose an extension which speeds up the processing of another large class of functions by means of parallel sorting. Functions that can be processed by means of our techniques are often used, for example, in decision support queries on large data volumes. Hence a parallel execution is indispensable.

1.1 Keywords
User-defined functions, aggregates, parallel query processing, object-relational database systems

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
SIGMOD '98 Seattle, WA, USA
© 1998 ACM 0-89791-995-5/98/006...$5.00

This work was partially supported by the DFG (SFB 342, B2).

2. INTRODUCTION
According to [42], object-relational database management systems (ORDBMS) are the next great wave. ORDBMS are proposed for all applications that need both complex queries and complex data types. Typical ORDBMS application areas are, e.g., multi-media and image applications [30], geographic information systems [36], and management of time series [35] and documents [34]. Many of these applications place high demands on ORDBMS with respect to functionality and performance. Thus ORDBMS need to exploit parallel database technology. These observations have recently led to significant development efforts for parallel ORDBMS by some database vendors ([7], [32], [31], [33]). Although first industrial implementations are entering the marketplace and the SQL3 standard [26] is maturing, there are still many topics left for research in this area ([10], [11], [2], [3]).

One of the current goals for ORDBMS is to move towards a framework for constructing parallel ADTs [3] and more sophisticated query optimization and execution ([3], [38]). ADT functions are completely opaque to the query optimizer and thus allow only very restricted query optimization and execution techniques, if no further optimization and execution information is provided. Additional information enables more sophisticated query optimization and execution, as the ORDBMS then knows and understands at least part of the semantics of the ADT. This will result in great performance improvements ([42], [36]). While there are different approaches to reach this goal ([22], [4], [38]), most ORDBMS vendors currently offer ADT developers some parameters to describe the semantics of user-defined functions.

Our main contribution in this paper is to show how a broad class of user-defined functions can be processed in parallel. This class includes user-defined aggregate functions, often called set or column functions. To this aim we propose a framework covering both the necessary interfaces that allow the appropriate registration of user-defined aggregate functions with the ORDBMS and their parallel processing. Parallel computing of user-defined aggregate functions is especially useful for application domains like decision support (e.g. based on a data warehouse that stores traditional as well as non-traditional data, like spatial, text or image data), as decision support queries often must compute complex aggregates. For example, it has been noted that 15 out of the 17 queries of the TPC-D benchmark contain aggregate operations [39]. In addition, if scalar functions with a global context are processed in parallel, caution is needed in order to get semantically correct results. Our framework can help in this case, too. Furthermore, we show that many aggregate functions can be easily implemented if their input is sorted, and they can thus profit from parallel sorting.

In Section 3 we give the necessary background on user-defined functions and show their limits with respect to
parallel execution. Our framework for parallel processing of user-defined functions is introduced in Section 4. After a discussion of related work in Section 5, the closing Section 6 contains a short summary and a brief outlook on future work.

3. USER-DEFINED FUNCTIONS
We will now provide the basic concepts and definitions we use in this paper. We will concentrate on the concepts for our specific query processing problem and refer the reader to the literature for the general concepts of parallel relational database query processing ([9], [14], [12], [43], [41]) and object-relational query processing ([42], [4]).

3.1 Built-in and User-Defined Functions
Every RDBMS comes with a fixed set of built-in functions. These functions can be either scalar functions or aggregate functions. A scalar function can be used in SQL queries wherever an expression can be used. Typical scalar functions are arithmetic functions like + and * or concat for string concatenation. Functions for type casting are special scalar functions, too. A scalar function is applied to the values of some columns of a single row of an input table.

By contrast, an aggregate function is applied to the values of a single column of either a group of rows or of all rows of an input table. A group of rows occurs if a GROUP BY clause is used. Thus aggregate functions can be used in the projection part of SQL queries and in HAVING clauses. The aggregate functions of the SQL92 standard are MAX, MIN, AVG, SUM and COUNT. Other statistical aggregate functions like standard deviation and variance are provided by some RDBMS implementations [4].

In ORDBMS it is possible to use a user-defined function (UDF) in nearly all places where a system-provided built-in function can appear in SQL92. Thus there are two subsets of UDFs: user-defined scalar functions (UDSFs) and user-defined aggregate functions (UDAFs).

3.2 Definition of New UDFs
Let us now briefly describe how UDFs are created in ORDBMS. Users can write UDFs as so-called external functions in a 3GL (typically C and Java are supported as languages) and then register them with the DBMS.

In advanced object-relational systems it should be possible to implement the body of UDFs using SQL statements embedded in the code of a 3GL (similar to the usual embedded SQL offered for application development). This allows access to the database in the function's body. One restriction is that a function should not modify the database if it is used in a SELECT statement. Furthermore, a UDF might perform an external action, e.g. read from or write to a file, send an email to the DB administrator, start a program, etc. The Informix Illustra ORDBMS already supports UDFs that consist of one or more SQL statements.

If DML statements or external actions are used in UDFs, the UDF might depend on arbitrary data in the database or elsewhere. We say that these functions have an external context. Since we see only very limited opportunities for database technology to enable parallelism for functions with external context, we will not consider that kind of UDF in this paper.

3.2.1 User-Defined Scalar Functions
Figure 1 provides an example of the syntax used in DB2 UDB [4] to register a new UDSF with the DBMS. The scalar function add returns the sum of its two arguments of the user-defined data type dollar.

CREATE FUNCTION add (dollar, dollar)
RETURNS dollar
EXTERNAL NAME 'dollar!add'
LANGUAGE C
PARAMETER STYLE DB2SQL
NOT VARIANT
NOT FENCED
NOT NULL CALL
NO SQL
NO EXTERNAL ACTION
NO SCRATCHPAD
NO FINAL CALL;
Figure 1. Registration of a new UDSF add in DB2

As can be seen from this example, there are already some parameters allowing the user to describe the characteristics of a newly registered function. We refer the reader to [4] for most of the details and provide only the information relevant to our problem. An interesting feature is the possibility to use a so-called scratchpad area for UDSFs. A scratchpad area is a small piece of memory that is passed to a UDSF with all calls and that is not deleted after the executed function returns control. Thus it is possible for a function to maintain a global context (or global state), which means that information can be preserved from one function invocation to the next. After the last call to the function within an SQL query, the scratchpad is deallocated by the system. Please note that the user can allocate more memory than the rather small scratchpad area by simply allocating some memory dynamically and hooking it up in the scratchpad. Often a scratchpad is used to store intermediate results that have been computed from the arguments of former function calls. We say that such UDSFs have an input context. The moving average function is an example of such a scalar function. It makes it possible to compute many aggregates (many moving averages) by means of a single scan over an input table.

After a function has been registered, the developer of a UDF should provide the query optimizer with some information about the expected execution costs of the UDF. ORDBMS have to provide a suitable interface for this purpose. For example, DB2 allows the developer to specify the I/O and CPU costs that are expected for the first call to a function, for each further call, and per argument byte that is read by the function. In addition to this, the percentage of the argument's storage size that is processed at the average call to the UDF should be specified. If a UDF is used as a predicate (i.e. the
function returns a boolean value), the user should be able to specify a user-defined selectivity function [42]. Since providing these details can be a time-consuming task, easy-to-use development kits may be offered for this task ([7], WI, [33]).

3.2.2 User-Defined Aggregate Functions
Let us now see how, for example, the Informix Illustra ORDBMS supports UDAFs. The system computes aggregate functions in a tuple-at-a-time fashion, i.e. there is one function call for each element of the input set. The user has to write the following three external functions to implement a new UDAF:

• Init():
The Init function is called only once and without arguments to initialize the aggregate computation before the actual computation of the aggregate begins. It returns a pointer to memory, which it has allocated to store intermediate results during the aggregation.

• Iter(pointer, value):
The Iter function is called once for each element of the input set. One parameter is the value of this element and the other is the pointer to the allocated memory. It aggregates the next value into the current aggregate that is stored in memory using the pointer. It returns the pointer to the allocated memory.

• aggregate value = Final(pointer):
The Final function is called once after the last element of the input set has been processed by the Iter function. It computes and returns the resulting aggregate using the pointer to the allocated memory. In addition, it deallocates the memory.

The pointer, similar to the scratchpad area mentioned before (cf. [4]), allows the input context of the computation to be stored. For example, to compute the average of a set of values, the Iter function would store both the sum of all values seen so far and their number as intermediate results in the allocated memory. The Final function would divide the sum by the number and return the result. The reader should note that all practical aggregate functions have an input context.

Obviously this design matches the usual Open-Next-Close protocol [12] for relational operators. After the three functions have been registered with the ORDBMS (cf. Figure 1), the user can create the aggregate function (e.g. average) using a CREATE AGGREGATE statement. This statement determines which three functions are used to implement the Init, Iter and Final functions for the new aggregate function.

3.3 Limits of Current ORDBMS
We will now describe the limits of current ORDBMS with respect to the parallel execution of UDFs. To provide a concrete example, we use the user-defined aggregate function MOST_FREQUENT, which computes the most frequently occurring integer value in a column of type integer. We have omitted some details to make the presentation as simple as possible.

We first have to create three UDSFs INIT_MF, ITER_MF and FINAL_MF that provide the implementation routines of the MOST_FREQUENT aggregate function. These three routines are programmed as external functions, i.e. they are written e.g. in C and can use the system-provided API for UDFs to handle tasks like memory allocation, etc. Then they are registered using the CREATE FUNCTION statement:

CREATE FUNCTION INIT_MF()
RETURNS POINTER
EXTERNAL NAME 'libfuncs!mf_init'
LANGUAGE C ...

CREATE FUNCTION ITER_MF(POINTER, INTEGER)
RETURNS POINTER
EXTERNAL NAME 'libfuncs!mf_iter'
LANGUAGE C ...

CREATE FUNCTION FINAL_MF(POINTER)
RETURNS INTEGER
EXTERNAL NAME 'libfuncs!mf_final'
LANGUAGE C ...

The function INIT_MF allocates and initializes memory to store the integer values together with a count and returns a pointer to that memory. The function ITER_MF stores its argument in the allocated memory if it is an integer value not seen so far, and increments the count for this value. Finally, the FINAL_MF function searches for the value with the maximum count and returns this value. Now we create the UDAF with the CREATE AGGREGATE statement:

CREATE AGGREGATE MOST_FREQUENT
(
init = INIT_MF()
iter = ITER_MF(POINTER, INTEGER)
final = FINAL_MF(POINTER)
);

Now the MOST_FREQUENT function can be used as a new aggregate function in queries. We will now explain why this aggregate function cannot be processed in parallel.

UDSFs without context can be executed in parallel using data parallelism. Instead of executing a set of function invocations in a sequential order, one simply partitions the data set (horizontal fragmentation) and processes the UDSF for each data partition in parallel. This parallel execution scheme is shown in Figure 2 for a selection.

Figure 2. Parallel selection in RDBMS
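To make the contrast concrete, here is a small Python sketch of our own. It is not code from the paper and not any ORDBMS API. The functions init_mf, iter_mf, final_mf, run_aggregate, round_robin and parallel_scalar are hypothetical names. A collections.Counter stands in for the memory block that the Init/Iter/Final pointer refers to, and a plain list of integers stands in for the input column. The sketch evaluates a context-free scalar function per partition as in Figure 2. It then runs the sequential MOST_FREQUENT routines the same way, which yields one value per partition rather than the single aggregate.

```python
from collections import Counter
from itertools import chain
from typing import Callable, List, Optional

# Hypothetical Python stand-ins for INIT_MF, ITER_MF and FINAL_MF.
# The Counter plays the role of the memory block addressed by the pointer.
def init_mf() -> Counter:
    """Init: allocate the state for the aggregation."""
    return Counter()

def iter_mf(state: Counter, value: int) -> Counter:
    """Iter: record the value (if new) and increment its count."""
    state[value] += 1
    return state

def final_mf(state: Counter) -> Optional[int]:
    """Final: return the value with the maximum count (None for empty input)."""
    if not state:
        return None
    value, _count = state.most_common(1)[0]
    return value  # Python frees the state; no explicit deallocation needed

def run_aggregate(values: List[int]) -> Optional[int]:
    """Sequential, tuple-at-a-time evaluation: Init once, Iter per row, Final once."""
    state = init_mf()
    for v in values:
        state = iter_mf(state, v)
    return final_mf(state)

def round_robin(rows: List[int], k: int) -> List[List[int]]:
    """A partitioning function that ignores attribute values (class ANY only)."""
    return [rows[i::k] for i in range(k)]

def parallel_scalar(f: Callable[[int], int], rows: List[int], k: int) -> List[int]:
    """Figure 2 style data parallelism for a scalar function without context:
    apply f to every partition independently, then union the results.
    Here the partitions are processed one after another; a DBMS would
    hand each partition to a different node."""
    partitions = round_robin(rows, k)
    return list(chain.from_iterable([f(x) for x in p] for p in partitions))

rows = [7, 1, 7, 2, 7, 3, 1, 1, 1]

# A context-free scalar function gives the same multi-set either way.
assert sorted(parallel_scalar(lambda x: 2 * x, rows, k=2)) == sorted(2 * x for x in rows)

print(run_aggregate(rows))                                # 1
print([run_aggregate(p) for p in round_robin(rows, 2)])  # [7, 1]
```

With round-robin partitioning into two partitions, the per-partition results are 7 and 1, while 1 is the overall most frequent value. The sequential routines alone offer no way to merge these partial results correctly.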
Obviously aggregate functions cannot use this approach without modification, as they have an input context and deliver only a single result for a set of input tuples. Parallel aggregation operations in RDBMS use an execution scheme consisting of two steps [12], as shown in Figure 3. After the data has been partitioned, it is first aggregated locally for each partition. In a second step, the locally computed sub-aggregates are combined in a global aggregation (merging step in Figure 3). For the aggregate function COUNT, the local aggregation counts, while the global aggregation computes the sum of the local counts. Generally speaking, the local and global aggregation functions needed for parallel execution are different from the aggregate function that is used for sequential execution. For built-in aggregate functions, local and global aggregation functions are system-provided. Thus the DBMS can use these functions for parallel execution. For UDAFs there is currently no possibility to register additional local and global aggregation functions. This is why a UDAF like the MOST_FREQUENT function cannot be executed with the usual 2 step parallel aggregation scheme.

Figure 3. Parallel aggregation in RDBMS

Another problem is that current ORDBMS do not allow the developer to define a special partitioning function for a UDAF. This matters because, as we will show later, not all UDAFs can be processed in parallel on all kinds of partitions. The same holds for scalar functions that have an input context. In many cases the result will be semantically incorrect if the data partitioning does not take the semantics of the function into consideration.

In summary, UDSFs with an input context and all practical UDAFs cannot be processed in parallel without special support by the DBMS. This situation will result in a performance bottleneck in parallel ORDBMS query processing. In shared-nothing and shared-disk parallel architectures, the input data is often distributed over various nodes. It must be shipped to a single node to correctly process a UDF with input context, i.e. sequentially, and afterwards the data possibly has to be redistributed for further parallel processing. This results in additional communication costs and hence even worse performance.

4. PARALLEL PROCESSING OF UDFS
In this Section we describe several orthogonal approaches to enhance the parallel processing of UDFs with an input context. In Subsection 4.1 we introduce local and global aggregation functions for UDAFs as a generalization of the relational processing scheme. In Subsection 4.2 we introduce partitioning classes and define the class of partitionable functions that can be processed with data parallelism. In Subsection 4.3 we propose sorting as a preprocessing step to enhance parallel execution for non-partitionable UDAFs.

4.1 Two Step Parallel Aggregation of UDAFs
In this Subsection we will show how aggregates can be processed in 2 steps using local and global aggregate functions.

To simplify the presentation below, we will omit constant input parameters to UDFs. Given a set S, we will use shorthand notations like f(S) for the resulting aggregate value of an aggregate function f applied to S. We will also use the notation f(S) to denote the result of repeatedly invoking a scalar function f for all elements of S. We want to emphasize that in this case f(S) denotes a multi-set of values (a new column).

Next, we define the class of aggregate functions that can be processed in parallel using local and global aggregation functions. An aggregate function f is partitionable iff two aggregate functions $f_l$ and $f_g$ exist, such that for any multi-set S and some partition $S_i$ of S, $1 \le i \le k$, the following equation holds:

$$ f(S) = f_g\Big( \bigcup_{1 \le i \le k} \{ f_l(S_i) \} \Big) $$

The notation $f_l$ indicates that the function is applied locally (for each partition), whereas $f_g$ is applied globally. In addition, the result size of the local function $f_l$ must be either bounded by a constant or a small fraction of the input size. This requirement is important, since otherwise one could use the identity as local function and the sequential aggregation function as the global function. Clearly, this is not desirable, since it would not improve processing. In general, the smaller the size of the local results, the better the speedup that can be expected, as there is less data to be exchanged and thus less input for the global aggregation. Obviously, if an aggregate function is partitionable, the local aggregate function can be executed for all partitions $S_i$ in parallel, while the global aggregation has to be processed sequentially.

If an aggregate function is used in combination with grouping, the optimizer can also decide to process several groups in parallel. In this case grouping can be done with the algorithms described in [39]. The algorithms discussed there can be applied orthogonally to our approach. Of course, if enough parallelism is possible by processing different groups in parallel, the optimizer might decide that no further parallel processing of the aggregate function is needed.

One disadvantage of this 2 step approach to parallel aggregation is that we are not always able to apply one
aggregate function to both sequential and parallel processing. Therefore the developer might have to implement and register six additional functions (Init, Iter, and Final functions for local aggregation and the same for global aggregation) to enable parallel as well as sequential processing of a UDAF. However, if one does not need maximum efficiency for sequential evaluation, one can simply use the local and global functions for sequential execution, too. This will incur at least the overhead for the invocation of an additional function. On the other hand, the additional work for the developer will pay off with all applications that profit from the increased potential for parallelism. Besides that, there seems to be no solution that results in less work for the developer.

4.2 Partitioning Classes and Partitionable Functions
One prerequisite for data parallelism is that one has to find a suitable partitioning of the data. This means that the partitioning must allow a semantically correct parallel processing of the function. In order to ease the specification of all partitionings that are allowed for the correct parallel processing of a UDF, we describe a taxonomy of the functions that can be used for partitioning.

All partitioning functions take a multi-set as input and return a partition of the input multi-set, i.e. a set of multi-sets such that any element of the input multi-set is contained in exactly one resulting multi-set. Actually, in some cases we will allow functions returning subsets that are not disjoint, i.e. functions that replicate some of the elements of the input set. We define the following increasingly more special classes of partitioning functions:

• ANY: the class of all partitioning functions. Round-robin and random functions are examples that belong to no other class. All partitioning functions that are not based on attribute values belong only to this class.
• EQUAL (column name): the class of partitioning functions that map all rows of the input multi-set with equal values in the selected column into the same multi-set of the result. Examples of EQUAL functions are partitioning functions that use hashing.
• RANGE (column name [, N]): the class of partitioning functions that map rows whose values of the specified column belong to a certain range into the same multi-set of the result. Obviously there must exist a total order on the data type of the column. The range of all values of the data type is split into some sub-ranges that define which elements are mapped into the same multi-set of the resulting partition. Based on the total order of the data type, the optional parameter N specifies that the largest N elements of the input set which are smaller than the values of a certain range have to be replicated into the resulting multi-set of this range. Replicated elements must be processed in a special way and are needed only to establish a "window" on a sorted list as a kind of global context for the function. The number of elements that belong to a certain range should be much greater than the value N. This class of partitioning functions is especially useful for scalar functions that require a sorted input, for example scalar functions that compute moving averages (see Section 4.5).

Please note that the following inclusion property holds: $\mathrm{RANGE} \subset \mathrm{EQUAL} \subset \mathrm{ANY}$. This taxonomy is useful to classify UDFs according to their processing requirements, as we will see below. The database system can automatically provide at least a partitioning function of class ANY for all user-defined data types (e.g. round-robin). We define that a class C partition of a multi-set is a partition that is generated using a partitioning function of class C (C denotes either ANY, EQUAL or RANGE).

Based on these definitions we can now define the classes of partitionable aggregate and scalar functions. These classes describe the set of UDFs that can be processed in parallel with the usual execution schemes for data parallelism (cf. Figures 2 and 3) and a particular class of partitioning functions.

A scalar function f is partitionable for class C iff a function $f_l$ exists, such that for any multi-set S and any class C partition $S_i$ of S, $1 \le i \le k$, the following equation holds:

$$ f(S) = \bigcup_{1 \le i \le k} f_l(S_i) $$

An aggregate function f is partitionable for class C iff two functions $f_l$ and $f_g$ exist, such that for any multi-set S and any class C partition $S_i$ of S, $1 \le i \le k$, the following equation holds:

$$ f(S) = f_g\Big( \bigcup_{1 \le i \le k} \{ f_l(S_i) \} \Big) $$

The schemes in Figure 4 and Figure 5 show how partitionable functions can be processed in parallel. All k partitions can be processed in parallel. The actual degree of parallelism (i.e. mainly the parameter k) has to be chosen by the optimizer as usual. Please note that for the scheme in Figure 4 there is not always a need to combine the local results. Hence, the optional combination step (computing $f(S) = \bigcup_{1 \le i \le k} f_l(S_i)$) is left out. In order to enable the DBMS to process a UDF in parallel, the developer must specify the allowed partitioning class when the function is registered (cf. Section 4.4).

A scalar function f that is partitionable for class C using the associated function $f_l$ can be evaluated in parallel using the following scheme, given a multi-set S and a partitioning function p of class C:
1. Partition S into k subsets $S_i$, $1 \le i \le k$, using p. Distribute the partitions to some nodes in the system.
2. Compute $f_l(S_i)$ for $1 \le i \le k$ for all $S_i$ in parallel.
Figure 4. Parallel processing scheme for partitionable scalar functions

We have introduced some extensibility to the traditional parallel execution schemes by parameterizing the partitioning step by means of the partitioning function. In
addition, we have defined classes of partitions to give the optimizer more flexibility w.r.t. the choice of the specific partitioning function. The query optimizer can try to avoid data repartitioning when multiple UDFs are processed, if the developer specifies only the class of the partitioning functions. This can reduce processing costs dramatically, especially for shared-disk and shared-nothing architectures. If the developer specifies a single partitioning function for each UDF, in almost all cases a repartitioning step will be needed to process a UDF in parallel. Conversely, if a single partitioning function satisfies all of the partitioning classes of a given set of UDFs, then repartitioning can be avoided.

An aggregate function f that is partitionable for class C using the two associated functions $f_l$ and $f_g$ can be evaluated in parallel using the following scheme, given an input multi-set S and a partitioning function p of class C:
1. Partition S into k subsets $S_i$, $1 \le i \le k$, using p. Distribute the partitions to some nodes in the system.
2. Compute $I_i := f_l(S_i)$ for $1 \le i \le k$ for all $S_i$ in parallel. Send the intermediate results $I_i$ to a single node for processing of step 3.
3. Compute $f(S) := f_g\big(\bigcup_{1 \le i \le k} \{I_i\}\big)$; $f_g$ can be applied to the $I_i$ in arbitrary order.
Figure 5. Parallel processing scheme for partitionable aggregate functions

Because UDFs can have arbitrary semantics, we believe that it is not possible to define a fixed set of partitioning functions that allows data parallelism to be applied to all UDFs. If a given UDF is partitionable using some partitioning function p, but none of the partitioning classes defined above, the developer should be able to specify that this function p must be used. Using a special partitioning function should be avoided in general, since all data has to be repartitioned before such a UDF can be processed.

We want to remark here that implementing RANGE partitioning is a bit complicated, since a user-defined sorting order and partial replication have to be supported. One difficulty is, for example, to find equally populated ranges for a given user-defined sort order. We believe that range partitioning with partial replication can best be supported by an appropriate extension of the built-in sort operator of the ORDBMS. This operator has to support user-defined sorting orders anyway. The definition of ranges and partial replication can be supported if information about the data is collected during the sorting process.

In addition to that extension, the operator that invokes UDSFs has to be extended. The UDSF that needs the range partitioning is evaluated immediately after the partitioning. Replicated data elements (that have to be marked) are processed by the UDSF in a special mode that has to be indicated by turning a special switch on. In this mode only the global context of the UDSF is initialized and no results are produced. For example, when the moving average over five values is computed, the first four values of a partition will be replicated ones and are stored in the global context of the function. Then, the fifth invocation produces the first result. Though this extension is conceptually simple, it may be difficult to add it to an existing execution system.

4.3 Parallel Sorting as a Preprocessing Step for UDAFs
Some user-defined aggregate functions can be easily implemented if their input is sorted according to a specified order. In this case the sort operation can be executed in parallel. Of course, this is especially interesting for UDAFs that are not partitionable.

Sorting as a preprocessing step for UDAFs can be introduced by using an additional parameter in the CREATE FUNCTION statement (see Section 4.4 for details of the syntax we propose). Of course the user must have the possibility to specify a user-defined order by providing a sort function for the argument types of the UDF, which are often user-defined data types. In most cases such functions will be needed anyway, to support sorted query results, to build indexes (like generalized B-Trees [42] or GiSTs [19]) or for sort-merge joins to efficiently evaluate predicates on user-defined data types, to quote some examples.

One interesting point to observe is that many aggregate functions which operate on a sorted input do not need to read the complete input set to compute the aggregate. Thus it might well be worth providing the aggregate function with the option to terminate the evaluation as early as possible and return the result. We call this feature early termination. The parallel processing scheme for aggregate functions with sorted input is shown in Figure 6.

An aggregate function f that requires a sorted input can be evaluated using the following scheme, given input multi-set S:
1. Sort the input S. This can be done in parallel.
2. Compute f(S) without parallelism (use early termination, if possible).
Figure 6. Parallel processing scheme for aggregate functions with sorted input

The optional sort requirements can be integrated into rule-based query optimization (see e.g. [13], [25], [16], [28]) simply by specifying the sorting order as a required physical property for the operator executing the UDF. Then a sort enforcer rule ([13], [17]) can guarantee this order requirement by putting a sort operation into the execution plan, if necessary. Informix' Illustra [22] already supports optional sorting of inputs for UDFs that have two arguments and return a boolean value. The developer can specify a user-defined order for the left and right input of such a function. Obviously this makes it possible to implement a user-defined join predicate using a sort-merge join instead of a Cartesian
product followed by a selection. Thus our proposal can be seen as an extension of this approach w.r.t. a broader class of supported UDFs and their parallel execution.

4.4 Extended Syntax for Function Registration
In this subsection, we present the syntax extensions for the statements that allow the registration of UDFs with support for the features introduced in the previous subsections.

Figure 7 shows the extensions for the CREATE FUNCTION statement. We have marked our extensions in boldface. The ORDER BY clause can be used to specify a sorting order that is required for the input table on which the function is executed. The input table can be sorted on multiple columns, applying user-defined sort functions to define the sort order. Furthermore, the developer must specify whether early termination is used. To enable parallel evaluation, the partitioning class has to be specified. In addition to ANY, EQUAL and RANGE partitioning, the developer can register a special (user-defined) partitioning function for a UDF.

CREATE FUNCTION <function-name> (<argument type list>)
RETURNS <data type name>
EXTERNAL NAME <external function name>
[ORDER BY {<argument name> [USING <sort function name>] [ASC | DESC]} [EARLY TERMINATION]]
[ALLOW PARALLEL WITH PARTITIONING CLASS (
      ANY
    | EQUAL (<argument name list>)
    | RANGE {<argument name> [, <number>] [USING <sort function name>] [ASC | DESC]}
    | <partitioning function name> )]
LANGUAGE <language name>
Figure 7. Extensions to UDSF registration

Figure 8 shows the extensions for the CREATE AGGREGATE statement. It now includes the local and global function options that are needed to register the aggregate functions to be used for the parallel evaluation of the new aggregate function. Of course, the various Init, Iter, and Final routines that are registered must be consistent w.r.t. their argument types. For example, the sequential and the global Final function must have the same return type (but will often have different argument types).

CREATE AGGREGATE <function-name>
(
    <Init, Iter, and Final function definition>
    [LOCAL <Init, Iter, and Final function definition>]
    [GLOBAL <Init, Iter, and Final function definition>]
)
Figure 8. Extensions to UDAF registration

As we already mentioned in Section 3.2, additional information about these functions should be supplied by the developer. In addition to the usual cost parameters, information about the size of the results of the local aggregation function (perhaps depending on the cardinality of the input set, if the function returns a collection type) would be desirable.

4.5 Example Applications and Discussion
In this subsection we present some example applications to illustrate the benefits of the introduced techniques.

4.5.1 Application to the UDAF mostfrequent
First, we will demonstrate how parallel execution can be enabled for the most frequent aggregate function.

How can we use the two-step processing scheme to process the most frequent function in parallel? A straightforward approach could be to compute the most frequent value for each partition in parallel using the local aggregate function. This implies that the local aggregate function returns the most frequent value together with the number of its occurrences (i.e. the return type of the local function is a row type or a special user-defined type). The overall most frequent value is then computed by the global function. This scheme is only correct if EQUAL is specified as the partitioning class for the local aggregation function, because only then are all occurrences of a value placed in the same partition, so that each local count is already final. If ANY were used as partitioning class, the local aggregate function would have to return all distinct values together with the number of their occurrences for each partition. Thus the local aggregation step would not be useful.

One difficulty of this approach is to implement the local aggregation function, since it must temporarily store all distinct values together with a counter. It is difficult to implement this efficiently in a user-defined function, since the function must be able to store an arbitrarily large data set. By contrast, the local aggregation is much easier if the developer uses sorting as a preprocessing step. The function then only has to store two values and two counters: one for the most frequent value seen so far and one for the last value seen. This approach is much more practical. Based on the syntax from Section 4.4, we show the registration of the Iter function for the local aggregation (‘$i’ denotes the argument at position i in the parameter list of the function).

CREATE FUNCTION ITER-MF-LOCAL(POINTER, INTEGER)
RETURNS POINTER
EXTERNAL NAME 'libfuncs!mf-iter-local'
ORDER BY $2 ASC
ALLOW PARALLEL WITH PARTITIONING CLASS EQUAL ($2)
LANGUAGE C ...

Instead of using EQUAL and the ORDER BY clause, one could also have used RANGE as partitioning class. But this would have two disadvantages. First, all data must be sorted before the partitioning, whereas with EQUAL as partitioning class the data is first partitioned and then only the partitions are sorted as specified in the ORDER BY clause. Second, repartitioning would occur more often due to the more restrictive partitioning class.
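The two-step scheme for mostfrequent (EQUAL partitioning, sorting of each partition, local then global aggregation) can be illustrated with a short Python sketch. It is not the authors' C implementation: the names mf_local, mf_global, equal_partition and most_frequent_parallel are hypothetical, hash partitioning is assumed as one way to realize EQUAL, a thread pool stands in for the parallel DBMS workers, and ties are broken (by assumption) in favour of the smallest value. mf_local keeps exactly the two values and two counters described above.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Optional, Sequence, Tuple

# Result of the local step: (most frequent value, number of occurrences),
# i.e. the "row type" mentioned in the text; None for an empty partition.
LocalResult = Optional[Tuple[int, int]]


def mf_local(sorted_part: Sequence[int]) -> LocalResult:
    """Hypothetical local Init/Iter/Final over ONE already sorted partition.

    Only two values and two counters are kept: the most frequent value seen
    so far and the value of the current run of equal elements.
    """
    best_val, best_cnt = None, 0  # Init: most frequent value so far
    last_val, last_cnt = None, 0  # Init: value of the current run
    for v in sorted_part:  # Iter: one call per input row
        if last_cnt > 0 and v == last_val:
            last_cnt += 1
        else:
            last_val, last_cnt = v, 1
        if last_cnt > best_cnt:  # strict '>' keeps the smallest value on ties
            best_val, best_cnt = last_val, last_cnt
    return None if best_cnt == 0 else (best_val, best_cnt)  # Final


def mf_global(local_results: List[LocalResult]) -> LocalResult:
    """Hypothetical global step: pick the local result with the highest count.

    Correct only because EQUAL partitioning placed all copies of a value in
    the same partition, so every local count is already final.
    """
    best: LocalResult = None
    for res in sorted(r for r in local_results if r is not None):
        if best is None or res[1] > best[1]:
            best = res
    return best


def equal_partition(values: Sequence[int], k: int) -> List[List[int]]:
    """Assumed realization of EQUAL: hashing sends equal values to one partition."""
    parts: List[List[int]] = [[] for _ in range(k)]
    for v in values:
        parts[hash(v) % k].append(v)
    return parts


def most_frequent_parallel(values: Sequence[int], k: int = 4) -> LocalResult:
    parts = equal_partition(values, k)
    # Partition first, then sort each partition (ORDER BY $2 ASC) and run the
    # local aggregation on it; threads stand in for parallel DBMS workers.
    with ThreadPoolExecutor(max_workers=k) as pool:
        local_results = list(pool.map(lambda p: mf_local(sorted(p)), parts))
    return mf_global(local_results)


if __name__ == "__main__":
    print(most_frequent_parallel([3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5], k=3))  # (5, 3)
```

If equal_partition were replaced by round-robin distribution (partitioning class ANY), copies of one value could end up in different partitions and mf_global would compare incomplete counts, which is why the text requires EQUAL.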
4.5.2 Application to the UDSF running average
As an example of a UDSF with input context, we discuss the running average function. This function computes for each input value the average of the N values seen last. This means that the input context of the function is a ‘window’ of size N. Thus the running average function is partitionable of class RANGE with parameter N. Obviously the running average function computes many aggregates with a single scan over the input table. This is a typical example of a UDSF with an input context. Other functions of that kind are available, for example, in Red Brick Systems’ Intelligent SQL [37].

4.5.3 Application to the UDAF Median
As an example of a function that seems not to be partitionable, consider the Median function that computes the $\lceil (N+1)/2 \rceil$-th largest element of a set with N elements (that element could informally be called the ‘halfway’ element). A query that finds the median of a set is not very intuitively expressible in SQL92. For example, the simple query to select the median of the ages of certain persons could be formulated as shown in Figure 9. Of course, one would prefer a query using a UDAF Median as shown in Figure 10.

SELECT MIN(Age)
FROM Persons AS P
WHERE (SELECT Ceiling((COUNT(*) + 1) / 2.0)
       FROM Persons)
      <= (SELECT COUNT(*) FROM Persons AS R
          WHERE R.Age <= P.Age)
Figure 9. Computing the median in relational SQL

SELECT Median(P.Age, COUNT(*))
FROM Persons AS P
Figure 10. Computing the median in object-relational SQL

The object-relational query is not only easier to write, but will also run more efficiently, because the Median function can be implemented with lower complexity than the complex query in Figure 9, as we show in the following. The Median function is called with two parameters (cf. Figure 10): the first parameter is an element of the already sorted input set, the second parameter is constant and gives the cardinality of the input set¹. When the function is called for the first time, it computes the median position and stores this position in the global context. In addition, the function maintains a counter for the number of invocations. During each call the function checks whether the median position has been reached. If so, the function stores the input value in the global context. Because the input is sorted, this value is actually the median of the input set. Finally, the function returns the median. Obviously this function is easy to implement, because essentially it only has to scan its input for the right position. This implementation has an asymptotic complexity of $O(N \log N)$ due to sorting, while the computation with the SQL query from Figure 9 has one of $O(N^2)$. In the case of the Median function, using the early termination option would save roughly half of the calls to the function.

¹ Some systems do not support nesting of aggregate functions. In this case one could, e.g., use a subquery in the FROM clause to compute the cardinality.

4.6 Summary
Table 1 shows the different kinds of contexts that can occur for UDFs and the implications for parallel execution with respect to data parallelism for aggregate and scalar functions. As can be seen from Table 1, our techniques support data parallelism for many, but not all, UDFs with input context. Additional techniques might emerge in the future. Please note that UDFs with external context are beyond the scope of this paper (cf. Subsection 3.2).

Context  | UDSF                           | UDAF
---------+--------------------------------+----------------------------------------------
none     | PARTITIONABLE WITH CLASS ANY   | NOT REASONABLE
input    | PARTITIONABLE WITH SOME CLASS  | PARTITIONABLE WITH LOCAL & GLOBAL AGGREGATION
         |   OR NOT PARTITIONABLE         |   AND SOME CLASS
         |                                |   OR PARALLEL SORTING (& EARLY TERMINATION)
         |                                |   OR NOT PARTITIONABLE
external | NOT TREATED HERE               | NOT TREATED HERE
Table 1: UDFs and support for their parallel execution

5. RELATED WORK
User-Defined Functions (UDFs) have attracted increasing interest from researchers as well as industry in recent years (see e.g. [40], P21, U61, [27], [36], P21, [31], []I, [3f%, [18], [20], [6]). Despite this, most of the work discusses only the non-parallel execution of UDFs. We see our contribution as a generalization and extension of the existing work on the execution of user-defined functions using data parallelism. In [32] pipeline parallelism for functions as well as intra-function parallelism are discussed. Intra-function parallelism allows a single function invocation to be processed in parallel. This can be useful for extremely expensive functions, e.g. a function processing a large data object like a satellite image. All these concepts seem to be orthogonal to our framework, but are only applicable to scalar functions. Recently, IBM added the optional clause ALLOW PARALLEL or DISALLOW PARALLEL to the CREATE FUNCTION statement for UDSFs in DB2 UDB [21]. We view this as a first step of support for parallel execution of UDFs that is consistent with our more comprehensive framework. To the best of our knowledge there is no work on parallel processing of scalar user-defined functions with
an input context.

In [12] and [39] parallel processing of aggregate functions in RDBMS has been studied. The proposed concepts are applicable to built-in aggregation functions and also consider aggregation in combination with GROUP-BY operations and duplicate elimination. The algorithms proposed in [39] may be combined with our framework if user-defined aggregate functions are used with GROUP-BY. It has been observed in [12] that different local and global functions are needed for parallel aggregation operations in RDBMS. In [36] the concept of processing user-defined aggregate functions in parallel using two steps is proposed as a general technique, but neither details nor more sophisticated processing techniques (like sorting as a preprocessing step, early termination or partitioning classes) are presented. In [29] RDBMS are extended by ordered domains, but neither is an object-relational approach taken nor are functions considered.

It is interesting to compare our classification of aggregate functions into partitionable and non-partitionable aggregate functions with other classifications. In [15] a classification of aggregate functions into three categories is developed, primarily with the goal of determining whether super-aggregates in data cubes can be computed based on sub-aggregates for a given aggregate function. It is pointed out that this classification is also useful for the parallel computation of user-defined aggregate functions. In the classification proposed in [15], an aggregate function f with a given input multi-set S and an arbitrary partition $S_1, \dots, S_k$ of S is:

• distributive iff there is a function g such that $f(S) = g\left(\biguplus_{1 \le i \le k} \{ f(S_i) \}\right)$.
• algebraic iff there is an M-tuple valued function g and a function h such that $f(S) = h\left(\biguplus_{1 \le i \le k} \{ g(S_i) \}\right)$. It is pointed out that the main characteristic of algebraic functions is that a result of fixed size (an M-tuple) can summarize sub-aggregates.
• holistic iff there is no constant bound on the size of the storage needed to represent a sub-aggregate.

Clearly, distributive and algebraic functions are both partitionable aggregate functions for the partitioning class ANY. Note that our definition of partitionable aggregate functions is less restrictive with regard to the size of the sub-aggregates. Aggregate functions that are easy to implement using a sorted input are typically holistic. Aggregate functions that are partitionable with a less general partitioning class than ANY, e.g. the MOST-FREQUENT function, are holistic in this scheme, but can be evaluated in parallel by our framework. Other holistic functions, e.g. the Median function, can be evaluated efficiently in our approach by using parallel sorting as a preprocessing step and early termination. Note that the application scenario in [15] differs from ours with regard to partitioning and parallel evaluation, because the sub-aggregates in data cubes must be computed for fixed partitions that are determined by semantically defined sub-cubes and not by the application of some partitioning function. The classification of Gray et al. was designed with the goal of computing data cubes efficiently, whereas the rationale behind our work was to find a classification of functions that is useful for parallel evaluation.

In [44] the class of decomposable aggregate functions is introduced to characterize the aggregate functions that allow early and late aggregation as a query optimization technique. This class of aggregate functions is identical to partitionable aggregate functions of partitioning class ANY, except that no size restriction for sub-aggregates is required as in [44]. Thus for these partitionable functions certain rewrite optimizations are also possible that provide orthogonal measures to improve performance. In [5] the class of group queries is identified for relational queries. This class is directly related to data partitioning. Our framework provides support for the concept of group queries in object-relational processing as well. Finally, we want to remark that [24] contains some additional examples for the application of our techniques.

6. SUMMARY AND FUTURE WORK
In this paper we have proposed a framework that allows parallel processing of a broad class of user-defined functions with input context in ORDBMS. This is an important step in removing a performance bottleneck in parallel object-relational query processing.

Since it was clear that a straightforward application of data parallelism is not possible, we had to devise more sophisticated parallelization techniques. The three key techniques that we have proposed here are the following. First, we have generalized the parallel execution scheme for aggregation in relational systems by means of local and global aggregations to allow its application to user-defined aggregations. Second, we have introduced some extensibility to the parallel execution schemes for scalar and aggregate functions by means of user-defined partitioning functions. We have defined classes of partitioning functions to make the specification of all allowed partitioning functions easier and to enable the optimizer to avoid data repartitioning as much as possible. Third, we have introduced parallel sorting as a preprocessing step for user-defined aggregate functions. This enables an easier implementation of UDFs and the use of parallelism in the preprocessing phase. Furthermore, we have defined new interfaces that allow the developer to use these techniques by providing the necessary information to the DBMS.

Some important remaining questions with respect to parallel object-relational query processing in general and UDFs in particular are:

• Are there other classes of UDFs that do not comply with our methodology? As an example, consider user-defined table functions [21] that can be used to encapsulate access to external data sources or external indexes [8].
• Though our framework supports parallel execution of user-defined predicates, we believe that substantial additional work is necessary to avoid the Cartesian product operations that ORDBMS use to deal with most user-defined join predicates. We are currently working on an extension of our approach for user-defined join algorithms to overcome Cartesian products and allow efficient parallel execution.
• How can parallelism be used to efficiently process UDFs on single, but very large ADTs [32] and on collection types ([10], [3])?

Additional future work should be concerned with the extension of query optimization to our approach to parallel processing of UDFs.

7. ACKNOWLEDGEMENTS
We gratefully acknowledge the valuable comments of the anonymous referees, which have helped to enhance the presentation of this paper significantly as well as provided new insights.

8. REFERENCES
[1] Antoshenkov, G., Ziauddin, G.: Query Processing and Optimization in Oracle Rdb. VLDB Journal 5(4): 229-237 (1996).
[2] Carey, M. J., DeWitt, D. J.: Of Objects and Databases: A Decade of Turmoil. VLDB 1996.
[3] Carey, M. J., Mattos, N., Nori, A.: Object-Relational Database Systems: Principles, Products, and Challenges (Tutorial). SIGMOD 1997: 502.
[4] Chamberlin, D.: Using the New DB2. Morgan Kaufmann Publishers, San Francisco, 1996.
[5] Chatziantoniou, D., Ross, K. A.: Groupwise Processing of Relational Queries. VLDB 1997: 476-485.
[6] Chaudhuri, S., Shim, K.: Optimization of Queries with User-defined Predicates. VLDB 1996: 87-98.
[7] Davis, J. R.: Creating an Extensible, Object-Relational Data Management Environment: IBM’s Universal Database. White Paper, Database Associates International, 1996.
[8] Deßloch, S., Mattos, N.: Integrating SQL Databases with Content-Specific Search Engines. VLDB 1997: 528-537.
[9] DeWitt, D., Gray, J.: Parallel Database Systems: The Future of High Performance Database Systems. CACM, Vol. 35, No. 6, 85-98, 1992.
[10] DeWitt, D.: Parallel Object-Relational Database Systems: Challenges & Opportunities. Invited talk, PDIS 1996.
[11] DeWitt, D. J., Carey, M., Naughton, J., Asgarian, M., Gehrke, J., Shah, D.: The BUCKY Object-Relational Benchmark. SIGMOD 1997: 135-146.
[12] Graefe, G.: Query Evaluation Techniques for Large Databases. Computing Surveys 25(2): 73-170 (1993).
[13] Graefe, G.: The Cascades Framework for Query Optimization. Data Engineering Bulletin 18(3): 19-29 (1995).
[14] Gray, J.: A Survey of Parallel Database Techniques and Systems. Tutorial handout at VLDB 1995.
[15] Gray, J., Chaudhuri, S., Bosworth, A., Layman, A., Reichart, D., Venkatrao, M., Pellow, F., Pirahesh, H.: Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals. Data Mining and Knowledge Discovery 1: 29-53, Kluwer Academic Publishers, 1997.
[16] Haas, L. M., Chang, W., Lohman, G. M., McPherson, J., Wilms, P. F., Lapis, G., Lindsay, B. G., Pirahesh, H., Carey, M. J., Shekita, E. J.: Starburst Mid-Flight: As the Dust Clears. TKDE 2(1): 143-160 (1990).
[17] Haas, L. M., Kossmann, D., Wimmers, E. L., Yang, J.: Optimizing Queries Across Diverse Data Sources. VLDB 1997: 276-285.
[18] Hellerstein, J. M., Stonebraker, M.: Predicate Migration: Optimizing Queries with Expensive Predicates. SIGMOD 1993: 267-276.
[19] Hellerstein, J. M., Naughton, J. F., Pfeffer, A.: Generalized Search Trees for Database Systems. VLDB 1995: 562-573.
[20] Hellerstein, J. M., Naughton, J. F.: Query Execution Techniques for Caching Expensive Methods. SIGMOD 1996: 423-434.
[21] IBM DB2 Universal Database SQL Reference Version 5, Document Number S10J-8165-00, 1997: 441-453.
[22] Illustra User’s Guide, Illustra Information Technologies, Inc., 1995.
[23] Informix Corporation, http://www.informix.com/informix/products/techbrfs/dblade/program/2122871.htm, August 1997.
[24] Jaedicke, M., Mitschang, B.: A Framework for Parallel Processing of Aggregate and Scalar Functions in Object-Relational DBMS. TUM-I9741, SFB-Bericht Nr. 342/25/97 A, September 1997. (http://www3.informatik.tu-muenchen.de/public/projekte/sfb342/publications.html)
[25] Lohman, G. M.: Grammar-like Functional Rules for Representing Query Optimization Alternatives. SIGMOD 1988: 18-27.
[26] Mattos, N.: An Overview of the SQL3 Standard. Database Technology Institute, IBM Santa Teresa Lab, San Jose, California, July 1996.
[27] Mattos, N., Deßloch, S., DeMichiel, L., Carey, M.: Object-Relational DB2. IBM White Paper, July 1996.
[28] McKenna, W. J., Burger, L., Hoang, C., Truong, M.: EROC: A Toolkit for Building NEATO Query Optimizers. VLDB 1996: 111-121.
[29] Ng, W., Levene, M.: OSQL: An Extension to SQL to Manipulate Ordered Relational Databases. IDEAS 1997: 358-367.
[30] Niblack, W., Barber, R., Equitz, W., Flickner, M., Glasman, E. H., Petkovic, D., Yanker, P., Faloutsos, C., Taubin, G.: The QBIC Project: Querying Images by Content, Using Color, Texture, and Shape. Storage and Retrieval for Image and Video Databases (SPIE) 1993: 173-187.
[31] O’Connell, W., Ieong, I. T., Schrader, D., Watson, C., Au, G., Biliris, A., Choo, S., Colin, P., Linderman, G., Panagos, E., Wang, J., Walters, T.: Prospector: A Content-Based Multimedia Server for Massively Parallel Architectures. SIGMOD 1996: 68-78.
[32] Olson, M. A., Hong, W. M., Ubell, M., Stonebraker, M.: Query Processing in a Parallel Object-Relational Database System. Data Engineering Bulletin, 12/1996.
[33] Oracle Corporation, http://www.oracle.com/st/, August 1997.
[34] Oracle Corporation, http://www.oracle.com/st/cartridges/context/, August 1997.
[35] Oracle Corporation, http://www.oracle.com/st/cartridges/time/, August 1997.
[36] Patel, J., Yu, J., Kabra, N., Tufte, K., Nag, B., Burger, J., Hall, N., Ramasamy, K., Lueder, R., Ellman, C., Kupsch, J., Guo, S., DeWitt, D. J., Naughton, J.: Building a Scalable GeoSpatial Database System: Technology, Implementation, and Evaluation. SIGMOD 1997: 336-347.
[37] Red Brick Systems, Inc., http://www.redbrick.com/rbs-g/html/whpap.html, August 1997.
[38] Seshadri, P., Livny, M., Ramakrishnan, R.: The Case for Enhanced Abstract Data Types. VLDB 1997: 66-75.
[39] Shatdal, A., Naughton, J. F.: Adaptive Parallel Aggregation Algorithms. SIGMOD 1995: 104-114.
[40] Stonebraker, M.: Inclusion of New Types in Relational Data Base Systems. ICDE 1986: 262-269.
[41] Stonebraker, M.: The Case for Shared Nothing. Database Engineering Bulletin 9(1): 4-9 (1986).
[42] Stonebraker, M., Moore, D.: Object-Relational DBMSs - The Next Great Wave. Morgan Kaufmann Publishers, 1996.
[43] Valduriez, P.: Parallel Database Systems: Open Problems and New Issues. Distributed and Parallel Databases, Vol. 1, No. 2, April 1993, 137-166.
[44] Yan, W. P., Larson, P.-Å.: Eager Aggregation and Lazy Aggregation. VLDB 1995: 345-357.
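The running average (Section 4.5.2) and Median (Section 4.5.3) examples can be made concrete with a small Python sketch. It illustrates the described techniques under stated assumptions and is not the paper's implementation: MedianContext, median, range_partition, running_avg_partition and running_avg_parallel are hypothetical names; the input to the running average is assumed to already be in the required sort order; the first N-1 results are assumed to average over the values available so far; and the window size N is assumed to be at least 1. median shows sorted input plus early termination (the returned call count is roughly half of the input size), while running_avg_parallel shows RANGE partitioning with N-1 replicated, marked rows that only initialize the input context, as in the special UDSF mode.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor
from typing import List, Optional, Sequence, Tuple


# ---- Median UDAF (Section 4.5.3): sorted input + early termination ----
class MedianContext:
    """Hypothetical stand-in for the global context of the Median UDAF."""

    def __init__(self) -> None:
        self.position: Optional[int] = None  # median position, set on first call
        self.calls = 0                       # number of invocations so far
        self.result: Optional[float] = None

    def iter(self, value: float, n: int) -> bool:
        """One call per sorted input element; returns False to terminate early."""
        if self.position is None:
            self.position = (n + 2) // 2  # integer form of ceil((N+1)/2)
        self.calls += 1
        if self.calls == self.position:
            self.result = value  # the input is sorted, so this is the median
            return False
        return True


def median(values: Sequence[float]) -> Tuple[Optional[float], int]:
    """Returns (median, number of Iter calls actually made)."""
    ordered = sorted(values)  # preprocessing sort (could itself run in parallel)
    ctx = MedianContext()
    for v in ordered:
        if not ctx.iter(v, len(ordered)):
            break  # early termination: the remaining rows are never passed in
    return ctx.result, ctx.calls


# ---- Running average UDSF (Section 4.5.2): RANGE partitioning, parameter N ----
Row = Tuple[float, bool]  # (value, replicated?)


def running_avg_partition(rows: Sequence[Row], window: int) -> List[float]:
    """Evaluates the UDSF on one partition. Replicated rows are handled in the
    special mode: they only fill the input context and produce no result."""
    ctx: deque = deque(maxlen=window)  # input context: the last N values
    out: List[float] = []
    for value, replicated in rows:
        ctx.append(value)
        if not replicated:
            out.append(sum(ctx) / len(ctx))
    return out


def range_partition(values: Sequence[float], k: int, window: int) -> List[List[Row]]:
    """Splits the ordered input into k contiguous ranges and prepends to each
    range its N-1 preceding elements, marked as replicated."""
    n = len(values)
    bounds = [n * i // k for i in range(k + 1)]
    parts: List[List[Row]] = []
    for lo, hi in zip(bounds, bounds[1:]):
        start = max(0, lo - (window - 1))
        parts.append([(v, True) for v in values[start:lo]]
                     + [(v, False) for v in values[lo:hi]])
    return parts


def running_avg_parallel(values: Sequence[float], window: int, k: int = 3) -> List[float]:
    parts = range_partition(values, k, window)
    with ThreadPoolExecutor(max_workers=k) as pool:
        results = list(pool.map(lambda p: running_avg_partition(p, window), parts))
    return [x for part in results for x in part]  # ranges concatenated in order


if __name__ == "__main__":
    print(median([7, 1, 3, 5, 9]))  # (5, 3)
    data = [4.0, 8.0, 15.0, 16.0, 23.0, 42.0]
    sequential = running_avg_partition([(v, False) for v in data], 3)
    print(running_avg_parallel(data, 3, k=2) == sequential)  # True
```

Because each range receives exactly the N-1 values that precede it, every partition sees the same window contents as a sequential scan would, so the concatenated partition results match the sequential result.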
