Survey of Architectures of Parallel Database Systems: Programming and Computer Software November 2004

DOI: 10.1023/B:PACS.0000049511.71586.e0 · Source: DBLP



Programming and Computer Software, Vol. 30, No. 6, 2004, pp. 337–346. Translated from Programmirovanie, Vol. 30, No. 6, 2004.
Original Russian Text Copyright © 2004 by Sokolinsky.

Survey of Architectures of Parallel Database Systems


L. B. Sokolinsky
South Ural State University,
pr. im. V.I. Lenina 46, Chelyabinsk, 454080 Russia
e-mail: [email protected]
Received February 26, 2004

Abstract. The paper is devoted to the classification, design, and analysis of architectures of parallel database systems. A formalization of the notion "parallel database system" is suggested, which relies on the concept of a virtual machine. Based on this formalization, a new approach to the classification of architectures of parallel database systems is suggested. Requirements for parallel database systems are formulated, which serve as criteria for comparing various architectures. Various classes of architectures of parallel database systems are considered and compared.

1. INTRODUCTION

A parallel database system is a database management system (DBMS) implemented on a multiprocessor system with high-degree connectivity. A multiprocessor system with high-degree connectivity is a system containing many processors and disks in which the processors are connected with each other by a communication network such that the time of a data exchange over the network is comparable with the time of a data exchange with a disk. This definition excludes from consideration distributed DBMSs implemented on several independent computers connected through local and/or global networks. The latter systems have their own specific features (such as those associated with the different geographic locations of the computers, local autonomy, and software and hardware heterogeneity [1]), and the problems associated with a large number of processor nodes are usually not considered for such systems. However, a wide spectrum of systems meets the above definition: from traditional one-processor DBMSs ported to symmetric multiprocessor systems that use only intertransaction parallelism, to complex parallel systems implemented on clusters or massively parallel multiprocessors that use partitioned parallelism [2].

Currently, there exist several approaches to classifying parallel computing systems. A good survey of the existing methods for describing and classifying architectures of computational systems can be found in the book [3]. However, the existing classifications of multiprocessor architectures are either too general from the DBMS standpoint (this refers primarily to the Flynn classification [4]), or too complicated (for example, the taxonomy suggested in [5]), or not quite adequate (this refers both to the Flynn classification and to the structural–functional classification from [6]). The Stonebraker classification [7] was developed specifically for parallel database systems, but, currently, it is not quite adequate either [8]. This is because the existing classification approaches map the architecture of a parallel database system directly onto the hardware architecture of the multiprocessor system, as shown in Fig. 1.

Fig. 1. Traditional approach to the classification of architectures of parallel database systems based on the classification of the hardware architecture. (Diagram: the architecture of the parallel database system is mapped directly onto the architecture of the multiprocessor system.)

The classification problem described above can be solved by introducing an additional abstraction level based on the notion of a virtual parallel database machine. The architecture of a parallel database system is mapped onto the architecture of the virtual parallel database machine, which, in turn, is mapped onto a particular hardware architecture of a multiprocessor system (Fig. 2). We used this approach to classify modern parallel database architectures.

0361-7688/04/3006-0337 © 2004 MAIK "Nauka/Interperiodica"


On the basis of the classification constructed, a qualitative comparative analysis of various architectures is carried out.

The remainder of the paper is organized as follows. In Section 2, the notion of a virtual parallel database machine is defined. Based on this notion, Section 3 presents a classification of architectures of modern parallel database systems. Section 4 is devoted to a comparative analysis of various parallel database architectures, which relies on a certain set of requirements. The last section summarizes the basic results and conclusions and discusses directions for future research.

Fig. 2. Classification based on the notion of a virtual parallel database machine. (Diagram: the architecture of the parallel database system is mapped onto the architecture of the virtual parallel database machine, which is mapped onto the architecture of the multiprocessor system.)

2. VIRTUAL PARALLEL DATABASE MACHINES

A virtual parallel database machine is constructed from the following virtual devices: virtual processors, virtual memory modules, virtual disks, and a virtual communication network.

A virtual processor is a virtual device used for performing a separate task defined as a database process. A typical example of a database process is a query or a query agent (if partitioned parallelism is used). In a real system, a virtual processor may be represented by a microprocessor or a processor module. If several database processes are executed on one physical processor in time-sharing mode, this physical processor is said to implement several virtual processors.

A virtual memory module is a virtual device used for buffering database objects. A typical example of an object of a relational database is a relation or its fragment (if partitioned parallelism is used). A virtual processor can access a database object only through its image loaded into some virtual memory module accessible to this processor. Accordingly, the number of virtual memory modules in a virtual parallel database machine cannot exceed the number of virtual processors. In a real system, virtual memory modules are usually implemented as physical modules of main memory. Note that one physical memory module may be represented by several virtual memory modules (see, for example, [9]), and, vice versa, several physical memory modules can be considered as one virtual memory module (see, for example, [10]).

A virtual disk is a virtual device used for storing database objects. In a real system, a virtual disk is usually implemented as a physical disk device or a disk array [11].

A virtual communication network is a virtual device providing data transfers from one virtual memory module to another. The transfers are implemented only by means of communicative actions of the corresponding virtual processors. Without loss of generality, we may assume that a virtual parallel database machine has at most one virtual communication network. Note that, if a virtual machine has only one memory module, then this machine has no virtual network.

A virtual parallel database machine is defined as a connected graph whose nodes correspond to the various virtual devices and whose edges correspond to dataflows. An example of a virtual parallel database machine configuration is shown in Fig. 3.

Fig. 3. An example of a configuration of the virtual parallel database machine: (a) with straight edges; (b) with broken edges. Here P is a virtual processor, M is a virtual memory module, D is a virtual disk, and N is a virtual communication network.

It should be noted that, in the given context, the virtual database machine is just one abstraction level in the system hierarchy of program modules of the DBMS and operating system implementing the database system on a particular hardware platform.
PROGRAMMING AND COMPUTER SOFTWARE Vol. 30 No. 6 2004
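The definition of a virtual parallel database machine as a connected graph of P, M, D, and N devices can be made concrete as a small data structure. The Python sketch below is only an illustration: the class `VirtualMachine`, its methods, and the choice to attach the network node to the processors are our own assumptions, not part of the paper. It checks the constraints stated in Section 2: the graph is connected, there is at most one virtual network, there are no more memory modules than processors, and a machine with a single memory module has no network.

```python
from collections import defaultdict, deque
from dataclasses import dataclass, field

# Device kinds as in Fig. 3: P (processor), M (memory module), D (disk), N (network).
KINDS = {"P", "M", "D", "N"}


@dataclass
class VirtualMachine:
    """Hypothetical graph encoding of a virtual parallel database machine."""
    devices: dict = field(default_factory=dict)  # device name -> kind
    edges: list = field(default_factory=list)    # undirected dataflow edges

    def add(self, name, kind):
        if kind not in KINDS:
            raise ValueError(f"unknown device kind: {kind}")
        self.devices[name] = kind

    def connect(self, a, b):
        self.edges.append((a, b))

    def count(self, kind):
        return sum(1 for k in self.devices.values() if k == kind)

    def is_connected(self):
        """Breadth-first search from an arbitrary device must reach every device."""
        if not self.devices:
            return False
        adjacency = defaultdict(set)
        for a, b in self.edges:
            adjacency[a].add(b)
            adjacency[b].add(a)
        start = next(iter(self.devices))
        seen, queue = {start}, deque([start])
        while queue:
            for neighbour in adjacency[queue.popleft()] - seen:
                seen.add(neighbour)
                queue.append(neighbour)
        return seen == set(self.devices)

    def validate(self):
        """List the constraints from Section 2 that this configuration violates."""
        problems = []
        if not self.is_connected():
            problems.append("graph is not connected")
        if self.count("N") > 1:
            problems.append("more than one virtual network")
        if self.count("M") > self.count("P"):
            problems.append("more memory modules than processors")
        if self.count("M") == 1 and self.count("N") > 0:
            problems.append("network present although there is one memory module")
        return problems


def shared_nothing(n):
    """Example: n processors, each with a private memory module and a private disk."""
    vm = VirtualMachine()
    vm.add("N", "N")
    for i in range(n):
        p, m, d = f"P{i}", f"M{i}", f"D{i}"
        vm.add(p, "P")
        vm.add(m, "M")
        vm.add(d, "D")
        vm.connect("N", p)  # assumption: processors drive the network transfers
        vm.connect(p, m)
        vm.connect(m, d)
    return vm


print(shared_nothing(3).validate())  # []
```

A configuration with one memory module and one disk shared by several processors (as at the top level of Fig. 4) passes the same checks only if no network node is added, which matches the remark that such a machine has no virtual network.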


Note that, in the framework of one system, we can consider several abstraction levels of this kind. The virtual machine of each level is implemented on the basis of functions and services provided by the virtual machine of the previous level. To explain this better, consider the illustrative example shown in Fig. 4. Here, the system hierarchy is based on the virtual machine V0, which has a certain hardware–software implementation. The hardware platform of V0 is actually a union of several separate computers. In this case, a UNIX/Linux system loaded on each node might serve as the operating system, and the MPI package as the system integration means. Using the virtual machine V0, we can create a program complex I1 implementing a virtual disk whose storage equals the free disk space of all physical system disks. In this case, I1 is said to implement a virtual machine V1 in which each processor has its own private memory, but all processors share common disk space. In other words, I1 defines a mapping of V0 onto V1. Similarly, in the environment V1, we can create a program complex I2 implementing common virtual memory whose size equals the total free address space of all physical memory modules. As a result, we construct a mapping from the configuration V1 onto the configuration V2, in which all processors share common memory and common disk space. Having done this, we can complete the DBMS implementation in the context of the configuration V2. The resulting database system has a hybrid architecture in the following sense: on the physical (zero) level, it is a system without shared resources; on the logical (first) level, it is a system with shared disks; and, on the virtual (second) level, it is a system with shared memory and disks. Of course, the configuration depicted in Fig. 4 is not viable, because accessing such a virtual memory and virtual disk would be inefficient. However, in what follows, we give examples of hybrid configurations that can be used in real parallel database systems.

Fig. 4. Example illustrating the hierarchy of virtual database machines. (Diagram: in V0, three processors P each have a private memory module M and a private disk D and are connected by the network N; I1 maps V0 onto V1, in which the processors keep private memory but share one disk; I2 maps V1 onto V2, in which the processors share one memory module and one disk and no network is needed.)
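The hierarchy V0, V1, V2 can be mimicked in a few lines. In the following Python sketch, which is an illustration with invented capacities rather than an implementation of I1 and I2, a configuration records which disks and memory modules each processor can reach. The hypothetical functions `share_disks` and `share_memory` stand in for I1 and I2: each merges all devices of one kind into a single virtual device whose capacity is the sum of the free capacities. The helper `stonebraker_class` then labels each level by what is shared, reproducing the progression from a system without shared resources (V0) to shared disks (V1) to shared memory and disks (V2).

```python
def make_v0(free_disk, free_mem):
    """V0: processor i has private memory module Mi and private disk Di."""
    n = len(free_disk)
    return {
        "disk": {f"D{i}": free_disk[i] for i in range(n)},
        "memory": {f"M{i}": free_mem[i] for i in range(n)},
        # devices each processor can access directly
        "disk_access": {f"P{i}": {f"D{i}"} for i in range(n)},
        "mem_access": {f"P{i}": {f"M{i}"} for i in range(n)},
    }


def _merge(config, kind, access_key, name):
    """Replace all devices of one kind by one virtual device visible to every processor."""
    merged = dict(config)  # shallow copy; the untouched parts are shared
    merged[kind] = {name: sum(config[kind].values())}
    merged[access_key] = {p: {name} for p in config[access_key]}
    return merged


def share_disks(config):
    """Stand-in for I1: one virtual disk with the total free disk space (V0 to V1)."""
    return _merge(config, "disk", "disk_access", "VD")


def share_memory(config):
    """Stand-in for I2: one virtual memory with the total free address space (V1 to V2)."""
    return _merge(config, "memory", "mem_access", "VM")


def is_shared(access):
    """True if there are several processors and all of them reach the same devices."""
    reachable = list(access.values())
    return len(reachable) > 1 and all(r == reachable[0] for r in reachable)


def stonebraker_class(config):
    mem = is_shared(config["mem_access"])
    disk = is_shared(config["disk_access"])
    if mem and disk:
        return "SE"
    if disk:
        return "SD"
    if not mem:
        return "SN"
    return "shared memory with private disks (outside the three classes)"


v0 = make_v0(free_disk=[40, 25, 35], free_mem=[2, 3, 3])  # invented capacities
v1 = share_disks(v0)
v2 = share_memory(v1)
for label, cfg in (("V0", v0), ("V1", v1), ("V2", v2)):
    print(label, stonebraker_class(cfg), len(cfg["disk"]), "disk(s),",
          len(cfg["memory"]), "memory module(s)")
# V0 SN 3 disk(s), 3 memory module(s)
# V1 SD 1 disk(s), 3 memory module(s)
# V2 SE 1 disk(s), 1 memory module(s)
```

The sketch deliberately ignores the cost of accessing the merged devices, which is exactly why the text calls this particular configuration not viable.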
In conclusion, it should be noted that we use the term "virtual machine" together with the term "database" only to restrict the variety of possible combinations and to exclude from consideration architectures that are not adequate from the database standpoint. These architectures, however, may be quite adequate for other problems.

3. CLASSIFICATION OF ARCHITECTURES OF PARALLEL DATABASE SYSTEMS

The classification of architectures of parallel database systems serves as a methodological basis for many studies related to databases. Until recently, the taxonomic approach suggested by M. Stonebraker was used for these purposes. In this section, we revise the Stonebraker taxonomy and describe some extensions of this classification based on the concept of virtual database machines.

3.1. Stonebraker Classification

The most popular approach to classifying parallel database systems is the one suggested by Stonebraker [7]. The Stonebraker classification is shown schematically in Fig. 5. Here, P denotes processors, M stands for a main memory module, D is a disk device, and N is the communication network.

According to the Stonebraker classification, parallel database systems can be divided into the following three basic classes depending on the way the hardware resources are shared:

(1) SE (shared-everything) architectures with shared memory and disks (Fig. 5a),
(2) SD (shared-disks) architectures with shared disks (Fig. 5b),
(3) SN (shared-nothing) architectures without shared resources (Fig. 5c).

Fig. 5. The Stonebraker classification: (a) SE, (b) SD, (c) SN.

The SE architecture (in [7], this architecture is referred to as the shared-memory architecture) includes database systems in which each processor has access to any disk (with the same access time) through the shared memory (Fig. 5a). Interprocessor communications in SE systems are implemented through the shared memory. Access to the disks in SE systems is usually implemented through a common buffer pool. It should be emphasized that each processor in an SE system has its own cache memory.

There exist many parallel database systems with the SE architecture. In fact, all leading commercial DBMSs have implementations based on the SE architecture. One of the first examples of porting a single-processor system to the SE architecture is the implementation of DB2 on the six-processor IBM 3090 [12]. Another example is parallel index building in Informix Online 6.0 [13]. However, it should be noted that the majority of commercial SE systems use only intertransaction parallelism (i.e., intratransaction parallelism is lacking). Nevertheless, several prototype SE systems that use intratransaction parallelism, such as XPRS [14], DBS3 [15], and Volcano [16], have already been created.

The SD architecture (Fig. 5b) includes database systems in which each processor has access to any disk; however, each processor has its own private memory [17]. The processors in such systems are connected with each other through a high-speed network that makes data transfers possible. Examples of parallel database systems with the SD architecture are IBM IMS [18], Oracle Parallel Server [19] on nCUBE [20] and VAXclusters [21], IBM Parallel Sysplex [22], and others.

In the SN architecture (Fig. 5c), each processor has its own memory and its own disk. As in SD systems, the processors are connected with each other through a high-speed network, which makes it possible to organize message exchange between the processors [2]. Currently, there exist many prototype systems and several commercial systems with the SN architecture, which use partitioned parallelism. Examples of prototype SN systems include ARBRE [23], BUBBA [24], EDS [25], GAMMA [26], KARDAMOM [27], and PRISMA [28]. Examples of commercial systems with the SN architecture are NonStop SQL [29], Informix PDQ [30], NCR/Teradata DBC [31], IBM DB2 PE [32], and others.

3.2. Extension of the Stonebraker Classification

The Stonebraker classification was used in many works devoted to the analysis of architectures of parallel database systems (see, for example, [2, 32–35]). However, this classification is currently considered obsolete and inadequate. The two arguments questioning the adequacy of the Stonebraker classification are as follows [8]:

(i) the Stonebraker classification does not cover the full variety of existing architectures;
(ii) a classification based on sharing hardware resources is not appropriate for classifying the architectures of modern parallel database systems.

The first argument is based on the fact that multiprocessor systems have appeared that combine features of both the SE and SN architectures [8, 34–37]. To describe such systems, Copeland and Keller [38] suggested extending the Stonebraker classification by introducing the following two additional classes of architectures of parallel database machines (Fig. 6):

• the CE (clustered-everything) architecture, with SE clusters joined on the basis of the SN principle (Fig. 6a);
• the CD (clustered-disk) architecture, with SD clusters joined on the basis of the SN principle (Fig. 6b).

Boundaries of the SD clusters in Fig. 6 are extended to the common (global) interconnection network, since the clusters can include their own (local) interconnection networks.

These architectures are also referred to as hierarchical architectures [39]. Figure 6 shows two-level hierarchies. However, the classification approach suggested by Copeland and Keller can easily be extended to architectures with three or more hierarchy levels.
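The Copeland–Keller extension can be read as a two-level rule: classify each cluster by what its processors share, then note that the clusters themselves are joined on the SN principle. The Python sketch below encodes only this two-level reading; the `Cluster` type and the function names are hypothetical, and only the labels SE, SD, SN, CE, and CD come from the text. Deeper hierarchies such as CD2 are not modeled.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Cluster:
    """Hypothetical description of one cluster of processors."""
    processors: int
    shared_memory: bool  # all processors of the cluster share main memory
    shared_disks: bool   # all processors of the cluster share the disks


def flat_class(c: Cluster) -> str:
    """Stonebraker class of a single cluster (Fig. 5)."""
    if c.shared_memory and c.shared_disks:
        return "SE"
    if c.shared_disks:
        return "SD"
    if not c.shared_memory:
        return "SN"
    raise ValueError("shared memory with private disks is not one of the three classes")


def hierarchical_class(clusters: Sequence[Cluster]) -> str:
    """Two-level class for identical clusters joined on the SN principle (Fig. 6)."""
    if not clusters:
        raise ValueError("at least one cluster is required")
    if len(clusters) == 1:
        return flat_class(clusters[0])
    kinds = {flat_class(c) for c in clusters}
    if len(kinds) != 1:
        raise ValueError("mixed cluster types are outside the CE and CD classes")
    # SE clusters give CE, SD clusters give CD; SN nodes joined by SN remain SN.
    return {"SE": "CE", "SD": "CD", "SN": "SN"}[kinds.pop()]


se = Cluster(processors=4, shared_memory=True, shared_disks=True)
sd = Cluster(processors=4, shared_memory=False, shared_disks=True)
print(hierarchical_class([se]))      # SE
print(hierarchical_class([se] * 3))  # CE
print(hierarchical_class([sd] * 3))  # CD
```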


An example of a three-level hierarchical architecture is the CD2 (Clustered-Disk with 2-processor modules) architecture, which was used in designing the parallel database system Omega [40, 41]. Two-level hierarchical architectures were studied in a number of works (see, for example, [34, 37, 38, 42, 43]). Three-level hierarchical architectures have hardly been studied so far.

Fig. 6. Extension of the Stonebraker classification: (a) CE, (b) CD.

The second argument is related to the fact that modern multiprocessor systems, as a rule, have hardware components of complicated structure and combine properties of architectures of different classes. Examples of such systems are the Russian multiprocessor systems of the MBC-100/1000 series [6], the SP2 multiprocessor systems [44] manufactured by IBM, computers based on the ServerNet technology by Tandem [45], and others. Indeed, the Stonebraker classification turns out to be inadequate for parallel database systems implemented on platforms of this kind if the classification approach shown in Fig. 1 is used. However, when the classification approach based on the intermediate notion of a virtual parallel database machine (Fig. 2) is used, the criterion of resource sharing may still serve as an adequate basis for classifying architectures of modern parallel database systems. Note that we can consider a hierarchy of virtual machines in which each machine serves as the platform for implementing the machine of the next level. As an example of such hybrid architectures, consider the CDN architecture of parallel database systems described in [40, 41]. This architecture is based on the approach suggested by Rahm in [46]. The CDN architecture is constructed as a set of identical SD clusters combined on the basis of the SN principle. A distinctive feature of this architecture is that the SD clusters on the upper levels of the system hierarchy are viewed as SN systems (Fig. 7): a separate disk is logically assigned to each processor node. This approach makes it possible to avoid the difficulties, typical of SD systems [35], associated with implementing the global lock table and supporting cache coherence and, simultaneously, to take advantage of the SD architecture for load balancing. A similar approach was used in the development of the parallel database system NonStop SQL/MP [29].

Fig. 7. Hybrid CDN architecture. (Diagram: at the 1st (hardware) level, processors P with memory modules M form SD clusters sharing the disks D and connected by the network N; at the 2nd (program) level, the same clusters are viewed as SN systems, with one disk logically assigned to each processor.)

4. COMPARATIVE ANALYSIS OF ARCHITECTURES OF PARALLEL DATABASE SYSTEMS

In this section, we compare various parallel architectures of database systems on the basis of the taxonomic approach discussed in Section 3. To compare different architectures, we first need to formulate requirements for parallel database systems.

4.1. Requirements for Parallel Database Systems

The criteria used in the comparison of architectures of parallel database systems rely on the following set of requirements [2, 7, 35]:
(1) good scalability,
(2) high data availability,
(3) efficient load balancing,
(4) low cost of interprocessor exchanges,
(5) low overheads for ensuring cache coherence,
(6) efficient organization of concurrency control.

Comparison of architectures

Criterion                         SN   CE   CD  CDN
Scalability                        2    3    3    3
Data availability                  2    1    3    3
Load balancing                     0    2    1    1
Interprocessor communications      0    2    1    1
Cache coherence                    3    2    0    3
Concurrency control                3    2    0    3
Sum of points                     10   12    8   14

Let us consider the specified criteria in more detail.

Scalability. The possibility of dynamic buildup to adapt to a growing database size or increasing performance requirements is an important property of parallel platforms [35]. This feature is achieved by gradually incorporating additional processors, memory modules, disks, and other hardware components into the system. This process is referred to as system scaling. If the hardware capacity of the system doubles, its performance is expected to double as well. However, in practice, the real increase in performance is usually much lower. For example, the scalability of SE systems is limited to 20–30 processors [35]. With further expansion of an SE system, its performance grows very slowly or even starts to fall [34]. This is because the processors spend much time waiting for access to the shared resources. Hence, the scalability of any multiprocessor system is determined by its parallelization efficiency.

The parallelization efficiency is described in terms of two basic characteristics: speedup and scaleup [2]. The architecture of a multiprocessor system is considered to be well scalable if it demonstrates almost linear scaleup and speedup. Linear scaleup means that the time a system spends solving a problem equals the time a system twice as large spends solving a problem twice as large. Linear speedup means that a system twice as large solves a problem twice as fast as the original system. The main factor that worsens the scalability of systems is contention among the processors for concurrently accessed shared resources.

Data availability. One of the critical characteristics of parallel database systems is the capability of the system to ensure a high degree of data availability when some hardware components fail. The probability of a hardware failure in a one-processor system is not great. However, in a system with thousands of processor nodes, this probability increases thousandfold. Therefore, ensuring high data availability in multiprocessor systems is of great importance.

The database availability coefficient can roughly be defined as the ratio of the time during which the database was available to users to the time during which users tried to access the database. For example, if users needed access to the database for eight hours a day, but the database was actually available for only six hours, the availability coefficient is 6/8 = 0.75 for that 8-hour period. A highly available database system can be defined as a system accepting users' queries 24 hours a day with an availability coefficient of at least 0.99 [47].

Hardware fault tolerance is the basic factor ensuring high data availability in parallel database systems with a large number of processor nodes. Hardware fault tolerance means that the system remains operational under single failures of hardware components, such as a processor, memory module, disk, or link [47]. In particular, a single failure of any device must not result in a loss of database integrity, to say nothing of a physical loss of any part of the database.

Load balancing. Balancing the processor load is one of the key problems in ensuring high efficiency of parallel query processing. The DBMS should divide a query into parallel agents and distribute them among the processors so that all processors are loaded uniformly. The problem of load balancing is especially important when partitioned parallelism is used [2]. An important factor affecting the efficiency of parallelizing relational operations (especially join and sort operations) is the skew in the data to be processed. It has been shown that, in real databases, some values of a given attribute occur more frequently than others [48–50]. In particular, Lynch [49] notes that the values of text attributes are usually distributed according to Zipf's law [51]. Such non-uniformity is called attribute value skew [52]. Lakshmi and Yu [53] showed that, in the presence of data skew, the speedup of the parallel execution of a join may be extremely low because some processors are overloaded while others are underloaded.

Interprocessor communications. If partitioned parallelism is used, interprocessor communications in parallel database systems can generate considerable traffic [35]. This is because, when the join of two relations is executed in parallel, we have either to dynamically repartition the original relations by the join attribute or to send the "alien" tuples from one processor node to another. Both actions involve sending considerable amounts of data through the communication network. Therefore, the cost of interprocessor exchanges may critically affect the total system performance.
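The load-balancing and communication requirements can be illustrated with a toy simulation of a partitioned parallel join. In the Python sketch below, which uses synthetic data and is not taken from the paper, join-attribute values follow a Zipf-like distribution, tuples initially sit on nodes in round-robin order, and hash repartitioning by the join attribute determines both how many "alien" tuples cross the network and how unevenly the work is distributed afterwards. The node count, the skew exponent `s`, and the helper names are assumptions chosen for illustration.

```python
import random
from collections import Counter


def zipf_keys(n_tuples, n_values, s, rng):
    """Draw join-attribute values; value v has weight 1 / (v + 1) ** s (s = 0 is uniform)."""
    values = range(n_values)
    weights = [1 / (v + 1) ** s for v in values]
    return rng.choices(values, weights=weights, k=n_tuples)


def repartition(keys, n_nodes):
    """Hash-repartition tuples by the join attribute.

    Tuple i is assumed to be stored on node i % n_nodes before the join.
    Returns (number of tuples sent to another node, per-node load afterwards).
    """
    sent = 0
    load = Counter()
    for i, key in enumerate(keys):
        home = i % n_nodes
        target = hash(key) % n_nodes  # node responsible for this join value
        if target != home:
            sent += 1                 # an "alien" tuple crosses the network
        load[target] += 1
    return sent, [load[node] for node in range(n_nodes)]


rng = random.Random(42)
imbalance = {}
for s in (0.0, 1.2):
    keys = zipf_keys(100_000, 1_000, s, rng)
    sent, load = repartition(keys, n_nodes=8)
    imbalance[s] = max(load) / (sum(load) / len(load))
    print(f"s={s}: {sent} tuples sent, max/avg node load = {imbalance[s]:.2f}")
```

Because the initial placement here is independent of the join value, roughly 7/8 of the tuples are sent in both runs; what the skew changes is mainly the maximum node load, which bounds the speedup of the join as described for data skew above.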

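The quantitative notions behind the scalability and availability requirements are simple ratios. The Python helpers below are a sketch under the usual reading of the definitions in Section 4.1 (speedup as the time on the original system divided by the time on the enlarged system for the same problem, scaleup as the ratio of times when problem and system grow together); the function names and the timing numbers are illustrative, while the 6/8 example and the 0.99 threshold come from the text.

```python
def speedup(t_original_system, t_enlarged_system):
    """Same problem: 2.0 on a doubled system means linear speedup."""
    return t_original_system / t_enlarged_system


def scaleup(t_small_problem_small_system, t_large_problem_large_system):
    """Problem and system grown by the same factor: 1.0 means linear scaleup."""
    return t_small_problem_small_system / t_large_problem_large_system


def availability(available_hours, requested_hours):
    """Share of the time users tried to reach the database when it was available."""
    return available_hours / requested_hours


def is_highly_available(coefficient, threshold=0.99):
    return coefficient >= threshold


print(speedup(100.0, 50.0))                      # 2.0
print(scaleup(100.0, 100.0))                     # 1.0
print(availability(6, 8))                        # 0.75
print(is_highly_available(availability(6, 8)))   # False
# Downtime allowed per 24-hour day at the 0.99 threshold, in minutes.
print(round(24 * 60 * (1 - 0.99), 1))            # 14.4
```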


SURVEY OF ARCHITECTURES 343

the cost of the interprocessor exchanges may critically Data availability. The SN architecture is character-
affect the total system performance. ized again as a good one (2 points). This is explained by
Cache coherence. When a common disk pool is the fact that the backup copies in an SN system should
shared by several processors, we face the so-called be partitioned to many nodes [57] in order that to make
cache coherence problem [17]. The essence of this the backup copy of a failed disk available in the parallel
problem is as follows. After a transaction addresses a mode (otherwise, there may arise a serious disbalance
disk page, the image of this page remains for some time in the loading). The support of the coherence of the par-
in the buffer associated with the given processor node. titioned backup copies requires certain overheads asso-
Hence, one processor node may revoke changes made ciated, first of all, with sending large amounts of data
by the other processor node. To avoid this, any time through the communication network. The data avail-
when the disk page is accessed, we need to check ability in the CE architecture is classified as satisfactory
whether the image of this page is contained in the buffer (1 point) because of the low hardware fault-tolerance of
pools (caches) of other processor nodes and, if this the SE cluster. Indeed, the failure of practically any
takes place, coordinate changes produced in the caches hardware component of an SE system leads to the fail-
of these processor nodes. ure of the whole system [54]. The CD and CDN archi-
tectures demonstrate better data availability (3 points)
When the common operative memory is shared by owing to the fact that all problems related to ensuring
several processors, we encounter a similar problem high data availability can efficiently be solved at the
with the content of the processor cache memory [54]. level of separate SD clusters [40].
True, in the latter case, the problem is usually solved on
the hardware–microprogram level. In any case, the Load balance for the SN architecture is a serious
ensuring of the cache coherence requires additional problem, since the SN systems are very sensitive to the
overheads, which can be considerable in database sys- data skew [53]. Therefore, the corresponding grade of
tems [17]. the SN architecture is 0. The hierarchical CE, CD, and
Concurrency control. Another series problem for CDN architectures make it possible to get better load
database systems with shared disks is the support of the balance since the load is balanced at two—intercluster
global lock table [55]. The locking is one of the basic and intracluster—levels. The SD clusters are character-
methods used for ensuring ACID properties of the ized by satisfactory load balancing owing to the fact
transactions [56]. If different processor nodes work that all disks are available for all processors. Accord-
concurrently with the same database objects, they must ingly, the CD and CDN architectures get 1 point. The
have an access to the common (global) lock table. The best load balance among the considered architectures is
support of such a global lock table in multiprocessor systems without shared memory can entail great overheads [55].

4.2. Comparative Analysis of Architectures of Parallel Database Systems

A comparative analysis of the SE, SD, and SN architectures was carried out by Stonebraker in the classical work [7]. This analysis showed that, for scalable high-performance database systems, the SN architecture is the most preferable of these three architectures.

In this section, we compare four different architectures of parallel database systems using the criteria summarized in the table. These criteria follow directly from the requirements for parallel database systems formulated in Section 4.1 and are graded on a four-point scale: 0 (“unsatisfactory”), 1 (“satisfactory”), 2 (“good”), and 3 (“excellent”).

Scalability. The SN architecture scales well (2 points) but does not get the highest grade, because, with many processor nodes, the interprocessor communication network becomes a bottleneck [8, 17]. The CD, CE, and CDN architectures demonstrate better scalability (3 points), because most of the communications occur inside the clusters, which unloads the intercluster network.

achieved in SE clusters, since, in addition to the disks, the entire main memory is available to all processors [35]. Accordingly, load balancing for the CE architecture is rated good (2 points). We do not give the CE architecture the highest grade because the problem of balancing the load between separate SE clusters remains relevant for CE systems.

Interprocessor communications. The high cost of interprocessor communications is a weak point of the SN architecture [7, 29] (0 points). The CE architecture considerably reduces the overheads of interprocessor communications [34], since, at the level of an SE cluster, processors can communicate efficiently through shared memory. Therefore, the CE architecture gets 2 points for this criterion. The CD and CDN architectures fall behind the CE architecture in this respect; however, they may outperform the SN architecture, since intracluster communications can potentially be implemented more efficiently than intercluster communications [58, 59]. Accordingly, we give 1 point each to the CD and CDN architectures.

Cache coherence is a serious problem for the CD architecture, since, in an SD cluster, the same pages of the shared disks are buffered in the private memory modules (we give the lowest grade, 0 points). The CE architecture is better than the CD architecture in terms
of this parameter, since the SE clusters use a common buffer pool in the shared memory. However, the CE architecture falls behind the SN architecture on this criterion, since, in the SE clusters, data coherence must be maintained in the private processor caches [54]. Hence, the CE architecture gets only 2 points. The CDN architecture is free of this disadvantage since, at the logical (program) level, it has no shared resources. SN systems do not have the coherence problem either, since they have no shared resources. Therefore, the SN and CDN architectures get the highest grade (3 points).

Concurrency control. Another serious problem inherent in the CD architecture is the difficulty of organizing the locking of database objects by the concurrent transactions that access them. In an SD cluster, a copy of the global lock table must be maintained in each processor node, which may entail considerable overheads [55]. Therefore, the CD architecture gets 0 points here as well. The CE architecture is essentially free of this shortcoming, since the only copy of the global lock table for an SE cluster is stored in the shared main memory (2 points). SN systems need no global lock table at all, because no resources are shared. Therefore, the SN architecture is the best on this criterion (3 points). The CDN architecture fully inherits this feature from the SN architecture (also 3 points).

Conclusion. Based on the above analysis and taking into account the sum of the grades for the different criteria shown in the table, we conclude that the CD architecture in its pure form is not appropriate. The CE architecture looks more attractive than the SN architecture. However, if we take into account the entire set of requirements for parallel database systems considered in Section 4.1, the CDN architecture is the best one. We used the CDN architecture in the design of Omega, a prototype parallel database system based on the Russian multiprocessor complex MBC-100/1000. Numerous experiments carried out with this prototype substantiate the conclusions of this paper (see [40, 41]).

5. CONCLUSION

In this paper, we have introduced the notion of a virtual parallel database machine, which is an effective tool for designing complex hybrid parallel database systems. This tool makes it possible to define different levels of abstraction when designing a system architecture. The design stages form a hierarchy of virtual machines in which each machine implements the previous one.

To classify and compare the architectures of modern parallel database systems, we have analyzed the Stonebraker classification suggested in the mid-1980s. We have shown that this classification remains the most appropriate one for database systems; however, it needs certain refinement and extension. In this paper, we have refined and extended the Stonebraker classification by using the abstraction of a virtual parallel database machine and by introducing additional classes of hierarchical cluster architectures.

Further, we have formulated and considered the basic requirements for parallel database systems. Based on these requirements and on the extended Stonebraker classification, we have carried out a comparative analysis of modern architectures of parallel database systems. This analysis has revealed that the hybrid CDN architecture, whose implementation is described in [40, 41], has the best performance. The CDN architecture was employed in the prototype parallel database system Omega, designed for the Russian multiprocessor computing system MBC-100/1000. Experiments carried out with the Omega system substantiate the conclusions of this paper.

For further studies, the following problems are of most interest:

(1) Experimental studies of different cluster configurations in systems with the CDN architecture. We plan to develop a program that emulates the operation of a database system with the CDN architecture and to use it in computational experiments that study the efficiency of query parallelization in the OLTP and OLAP modes for various topologies of intracluster interprocessor connections.

(2) Development of algorithms for relational operations that take the specific features of the CDN architecture into account in an optimal way.

(3) Development of methods for optimizing parallel queries in database systems with the CDN architecture.

ACKNOWLEDGMENTS

This work was supported by the Russian Foundation for Basic Research, project no. 03-07-90031.

REFERENCES

1. Ozsu, M.T. and Valduriez, P., Principles of Distributed Database Systems, Englewood Cliffs: Prentice-Hall, 1991.
2. DeWitt, D.J. and Gray, J., Parallel Database Systems: The Future of High-Performance Database Systems, Commun. ACM, 1992, vol. 35, no. 6, pp. 85–98.
3. Voevodin, Vl.V. and Kapitonova, A.P., Metody opisaniya i klassifikatsii arkhitektur vychislitel’nykh sistem (Methods of Description and Classification of Architectures of Computing Systems), Moscow: Mosk. Gos. Univ., 1994.
4. Flynn, M.J. and Rudd, K.W., Parallel Architectures, ACM Computing Surv., 1996, vol. 28, no. 1, pp. 67–70.
5. Dasgupta, S., A Hierarchical Taxonomic System for Computer Architectures, IEEE Comput., 1990, vol. 23, no. 3, pp. 64–74.
6. Korneev, V.V., Parallel’nye vychislitel’nye sistemy (Parallel Computing Systems), Moscow: Nolidzh, 1999.
7. Stonebraker, M., The Case for Shared Nothing, Database Eng. Bull., 1986, vol. 9, no. 1, pp. 4–9.
8. Norman, M.G., Zurek, T., and Thanisch, P., Much Ado about Shared-Nothing, ACM SIGMOD Record, 1996, vol. 25, no. 3, pp. 16–21.
9. Carr, R.W. and Hennessy, J.L., WSClock—A Simple and Effective Algorithm for Virtual Memory Management, Proc. of the Eighth Symp. on Operating System Principles (Asilomar Conf. Grounds, Pacific Grove, 1981), New York: ACM, 1981, pp. 87–95.
10. Amza, C. et al., TreadMarks: Shared Memory Computing on Networks of Workstations, IEEE Comput., 1996, vol. 29, no. 2, pp. 18–28.
11. Patterson, D.A., Gibson, G.A., and Katz, R.H., A Case for Redundant Arrays of Inexpensive Disks (RAID), Proc. of the 1988 ACM SIGMOD Int. Conf. on Management of Data (Chicago, 1988), ACM, 1988, pp. 109–116.
12. Cheng, J.M. et al., IBM Database 2 Performance: Design, Implementation, and Tuning, IBM Systems J., 1984, vol. 23, no. 2, pp. 189–210.
13. Davison, W., Parallel Index Building in Informix OnLine 6.0, Proc. of the 1992 ACM SIGMOD Int. Conf. on Management of Data (San Diego, 1992), ACM, 1992, p. 103.
14. Stonebraker, M., Katz, R.H., Patterson, D.A., and Ousterhout, J.K., The Design of XPRS, Proc. of the Fourteenth Int. Conf. on Very Large Data Bases (Los Angeles, 1988), Morgan Kaufmann, 1988, pp. 318–330.
15. Bergsten, B., Couprie, M., and Lopez, M., DBS3: A Parallel Data Base System for Shared Store (Synopsis), in Issues, Architectures, and Algorithms (Proc. of the 2nd Int. Conf. on Parallel and Distributed Information Systems (PDIS 1993), San Diego, 1993), IEEE Comput. Soc., 1993, pp. 260–262.
16. Graefe, G., Volcano—An Extensible and Parallel Query Evaluation System, IEEE Trans. Knowledge Data Eng., 1994, vol. 6, no. 1, pp. 120–135.
17. Rahm, E., Parallel Query Processing in Shared Disk Database Systems, ACM SIGMOD Record, 1993, vol. 22, no. 4, pp. 32–37.
18. Strickland, J.P., Uhrowczik, P.P., and Watts, V.L., IMS/VS: An Evolving System, IBM Systems J., 1982, vol. 21, no. 3, pp. 490–510.
19. Linder, B., Oracle Parallel RDBMS on Massively Parallel Systems, in Issues, Architectures, and Algorithms (Proc. of the 2nd Int. Conf. on Parallel and Distributed Information Systems (PDIS 1993), San Diego, 1993), IEEE Comput. Soc., 1993, pp. 67–68.
20. Dubova, N., Supercomputers nCube, Otkrytye Sistemy, 1995, no. 2, pp. 42–47.
21. Kronenberg, N.P., Levy, H.M., and Strecker, W.D., VAXclusters: A Closely-Coupled Distributed System, ACM Trans. Comput. Systems, 1986, vol. 4, no. 2, pp. 130–146.
22. Nick, J.M., Moore, B.B., Chung, J.-Y., and Bowen, N.S., S/390 Cluster Technology: Parallel Sysplex, IBM Systems J., 1997, vol. 36, no. 2, pp. 172–201.
23. Lorie, R. et al., Adding Intra-Transaction Parallelism to an Existing DBMS: Early Experience, Data Eng. Bull., 1989, vol. 12, no. 1, pp. 2–8.
24. Boral, H., Alexander, W., Clay, L., Copeland, G., Danforth, S., Franklin, M., Hart, B., Smith, M., and Valduriez, P., Prototyping Bubba: A Highly Parallel Database System, IEEE Trans. Knowledge Data Eng., 1990, vol. 2, no. 1, pp. 4–24.
25. Skelton, C.J. et al., EDS: A Parallel Computer System for Advanced Information Processing, Lecture Notes in Computer Science (Proc. of the 4th Int. PARLE Conf., Paris, 1992), Springer, 1992, vol. 605, pp. 1–18.
26. DeWitt, D.J. et al., The Gamma Database Machine Project, IEEE Trans. Knowledge Data Eng., 1990, vol. 2, no. 1, pp. 44–62.
27. Von Bultzingsloewen, G. et al., KARDAMON—A Dataflow Database Machine for Real-Time Applications, ACM SIGMOD Record, 1988, vol. 17, no. 1, pp. 44–50.
28. Apers, P.M.G., van den Berg, C.A., Flokstra, J., Grefen, P.W.P.J., Kersten, M.L., and Wilschut, A.N., Prisma/DB: A Parallel Main-Memory Relational DBMS, IEEE Trans. Knowledge Data Eng., 1992, vol. 4, no. 6, pp. 541–554.
29. Englert, S., Glasstone, R., and Hasan, W., Parallelism and Its Price: A Case Study of NonStop SQL/MP, ACM SIGMOD Record, 1995, vol. 24, no. 4, pp. 61–71.
30. Clay, D., Informix Parallel Data Query (PDQ), in Issues, Architectures, and Algorithms (Proc. of the 2nd Int. Conf. on Parallel and Distributed Information Systems (PDIS 1993), San Diego, 1993), IEEE Comput. Soc., 1993, pp. 71–72.
31. Page, J., A Study of a Parallel Database Machine and Its Performance: The NCR/Teradata DBC/1012, in Advanced Database Systems, Lecture Notes in Computer Science (Proc. of the 10th British Natl. Conf. on Databases, BNCOD 10, Aberdeen, 1992), Springer, 1992, vol. 618, pp. 115–137.
32. Baru, C.K. et al., DB2 Parallel Edition, IBM Systems J., 1995, vol. 34, no. 2, pp. 292–322.
33. Bergsten, B., Couprie, M., and Valduriez, P., Overview of Parallel Architectures for Databases, Comput. J., 1993, vol. 36, no. 8, pp. 734–740.
34. Hua, K.A., Lee, C., and Peir, J.-K., Interconnecting Shared-Everything Systems for Efficient Parallel Query Processing, Proc. of the First Int. Conf. on Parallel and Distributed Information Systems (PDIS 1991) (Miami Beach, 1991), IEEE-CS, 1991, pp. 262–270.
35. Valduriez, P., Parallel Database Systems: The Case for Shared-Something, Proc. of the 9th Int. Conf. on Data Eng. (Vienna, 1993), IEEE Comput. Soc., 1993, pp. 460–465.
36. Ballinger, C. and Fryer, R., Born to Be Parallel: Why Parallel Origins Give Teradata an Enduring Performance Edge, IEEE Data Eng. Bull., 1997, vol. 20, no. 2, pp. 3–12.
37. Pramanik, S. and Tout, W.R., The NUMA with Clusters of Processors for Parallel Join, IEEE Trans. Knowledge Data Eng., 1997, vol. 9, no. 4, pp. 653–666.
38. Copeland, G.P. and Keller, T., A Comparison of High-Availability Media Recovery Techniques, Proc. of the 1989 ACM SIGMOD Int. Conf. on Management of Data (Portland, 1989), ACM, 1989, pp. 98–109.
39. Graefe, G., Query Evaluation Techniques for Large Databases, ACM Computing Surv., 1993, vol. 25, no. 2, pp. 73–169.
40. Sokolinsky, L.B., Organization of Parallel Query Processing in Multiprocessor Database Machines with Hierarchical Architecture, Programmirovanie, 2001, no. 6, pp. 13–29.
41. Sokolinsky, L.B., Design and Evaluation of Database Multiprocessor Architecture with High Data Availability, Proc. of the 12th Int. DEXA Workshop (Munich, 2001), IEEE Comput. Soc., 2001, pp. 115–120.
42. Bouganim, L., Florescu, D., and Valduriez, P., Dynamic Load Balancing in Hierarchical Parallel Database Systems, Proc. of the 22nd Int. Conf. on Very Large Data Bases (VLDB’96) (Mumbai, India, 1996), Morgan Kaufmann, 1996, pp. 436–447.
43. Xu, Y. and Dandamudi, S.P., Performance Evaluation of a Two-Level Hierarchical Parallel Database System, Proc. Int. Conf. Computers and Their Applications (Tempe, Arizona, 1997), pp. 242–247.
44. Shmidt, V., IBM SP2 Systems, Otkrytye Sistemy, 1995, no. 6, pp. 53–60.
45. Shnitman, V., Fault-Tolerant Servers ServerNet, Otkrytye Sistemy, 1996, no. 3, pp. 5–11.
46. Rahm, E., Framework for Workload Allocation in Distributed Transaction Processing Systems, J. Systems Software, 1992, vol. 18, pp. 171–190.
47. Kim, W., Highly Available Systems for Database Applications, ACM Computing Surv., 1984, vol. 16, no. 1, pp. 71–98.
48. Christodoulakis, S., Estimating Record Selectivities, Information Systems, 1983, vol. 8, no. 2, pp. 105–115.
49. Lynch, C.A., Selectivity Estimation and Query Optimization in Large Databases with Highly Skewed Distribution of Column Values, Proc. of the Fourteenth Int. Conf. on Very Large Data Bases (Los Angeles, 1988), Morgan Kaufmann, 1988, pp. 240–251.
50. Montgomery, A.Y., D’Souza, D.J., and Lee, S.B., The Cost of Relational Algebraic Operations on Skewed Data: Estimates and Experiments, in Information Processing 83 (Proc. of the IFIP 9th World Comput. Congr., Paris, 1983), North-Holland, 1983, pp. 235–241.
51. Zipf, G.K., Human Behavior and the Principle of Least Effort: An Introduction to Human Ecology, Cambridge: Addison-Wesley, 1949.
52. Walton, C.B., Dale, A.G., and Jenevein, R.M., A Taxonomy and Performance Model of Data Skew Effects in Parallel Joins, Proc. of the 17th Int. Conf. on Very Large Data Bases (Barcelona, 1991), Morgan Kaufmann, 1991, pp. 537–548.
53. Lakshmi, M.S. and Yu, P.S., Effectiveness of Parallel Joins, IEEE Trans. Knowledge Data Eng., 1990, vol. 2, no. 4, pp. 410–424.
54. Pfister, G., Sizing Up Parallel Architectures, Database Programming Design OnLine (http://www.dbpd.com), 1998, vol. 11, no. 5.
55. Mohan, C. and Narang, I., Efficient Locking and Caching of Data in the Multisystem Shared Disks Transaction Environment, Lecture Notes in Computer Science (Proc. of the 3rd Int. Conf. on Extending Database Technol., Vienna, 1992), Springer, 1992, pp. 453–468.
56. Gray, J. and Reuter, A., Transaction Processing: Concepts and Techniques, Morgan Kaufmann, 1993.
57. Hsiao, H.I. and DeWitt, D.J., A Performance Study of Three High Availability Data Replication Strategies, Distributed Parallel Databases, 1993, vol. 1, no. 1, pp. 53–80.
58. Sokolinsky, L.B., Interprocessor Communication Support in the Omega Parallel Database System, Proc. of the 1st Int. Workshop on Comput. Sci. and Information Technol. (CSIT’99), Moscow, 1999.
59. Sokolinsky, L.B., Operating System Support for a Parallel DBMS with a Hierarchical Shared-Nothing Architecture, Proc. of the Third East Eur. Conf. “Advances in Databases and Information Systems” (ADBIS’99) (Maribor, Slovenia, 1999), Maribor: Institute of Informatics, 1999, pp. 38–45.