
Chapter 21

Parallel Databases

Solutions to Practice Exercises


21.1 If there are few tuples in the queried range, then each query can be processed
quickly on a single disk. Different queries can then execute in parallel on
different disks, without the overhead of initiating each query on multiple disks.
On the other hand, if there are many tuples in the queried range, each query
takes a long time to execute, as there is no parallelism within its execution. Also,
some of the disks can become hot-spots, further increasing response time.
Hybrid range partitioning, in which small ranges (a few blocks each) are
partitioned in a round-robin fashion, provides the benefits of range partitioning
without its drawbacks.
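The scheme can be sketched in a few lines; the function name, block size, and
modular round-robin rule below are illustrative assumptions, since the exercise
does not fix a concrete layout.

```python
def hybrid_range_partition(sorted_keys, keys_per_block, n_disks):
    """Split a sorted key sequence into small range blocks and assign
    the blocks to disks in round-robin fashion (hybrid range partitioning)."""
    blocks = [sorted_keys[i:i + keys_per_block]
              for i in range(0, len(sorted_keys), keys_per_block)]
    placement = {d: [] for d in range(n_disks)}
    for b, block in enumerate(blocks):
        # consecutive ranges land on different disks, so a small range query
        # touches one disk while a large one is spread over all disks
        placement[b % n_disks].append((block[0], block[-1]))
    return placement

layout = hybrid_range_partition(list(range(1, 25)), keys_per_block=4, n_disks=3)
# disk 0 holds ranges (1, 4) and (13, 16); disk 1 holds (5, 8) and (17, 20); ...
```

A query over the narrow range 1-4 hits only disk 0, while a query over 1-24
is spread evenly over all three disks.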
21.2 a. When there are many small queries, inter-query parallelism gives good
throughput. Parallelizing each of these small queries would increase the
initiation overhead, without any significant reduction in response time.
b. With a few large queries, intra-query parallelism is essential to get fast
response times. Given that there are a large number of processors and disks,
only intra-operation parallelism can take advantage of the parallel hardware,
for queries typically have few operations, but each one needs to
process a large number of tuples.
21.3 a. The speed-up obtained by parallelizing the operations would be offset by
the data transfer overhead, as each tuple produced by an operator would
have to be transferred to its consumer, which is running on a different
processor.
b. In a shared-memory architecture, transferring the tuples is very efficient.
So the above argument does not hold to any significant degree.


c. Even if two operations are independent, it may be that they both supply
their outputs to a common third operator. In that case, running all three on
the same processor may be better than transferring tuples across
processors.
21.4 Relation r is partitioned into n partitions, r0, r1, ..., rn−1, and s is also
partitioned into n partitions, s0, s1, ..., sn−1. The partitions are replicated and
assigned to processors as shown below, where processor Pi,j joins ri with sj.

          s0        s1        s2        s3       ...      sn−1

r0       P0,0      P0,1
r1       P1,0      P1,1      P1,2
r2                 P2,1      P2,2      P2,3
 .                            .         .         .
 .                            .         .         .
rn−1                                         Pn−1,n−2   Pn−1,n−1
Each fragment is replicated on at most 3 processors, unlike in the general case,
where it is replicated on n processors. The number of processors required is
now approximately 3n, instead of n^2 in the general case. Therefore, given the
same number of processors, we can partition the relations into more fragments
with this optimization, thus making each local join faster.
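The banded assignment above can be verified with a short sketch (the function
name and the band width of 1 are assumptions matching the diagram):

```python
def banded_assignment(n):
    """Processors P(i, j) used when each fragment is replicated only on
    the diagonal band |i - j| <= 1, as in the diagram above."""
    return [(i, j) for i in range(n) for j in range(n) if abs(i - j) <= 1]

procs = banded_assignment(10)
# 3n - 2 processors instead of n^2: 28 rather than 100 for n = 10
assert len(procs) == 3 * 10 - 2
# each fragment r_i appears on at most 3 processors
for i in range(10):
    assert sum(1 for (a, _) in procs if a == i) <= 3
```

The exact count is 3n − 2 (the corner rows have only two processors), which
is approximately 3n as stated above.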
21.5 a. A partitioning vector which gives 5 partitions with 20 tuples in each
partition is: [21, 31, 51, 76]. The 5 partitions obtained are 1-20, 21-30, 31-50,
51-75 and 76-100. The assumption made in arriving at this partitioning
vector is that within a histogram range, each value is equally likely.
b. Let the histogram ranges be called h1, h2, ..., hh, and the partitions
p1, p2, ..., pp. Let the frequencies of the histogram ranges be n1, n2, ..., nh.
Each partition should contain N/p tuples, where N = Σ(i=1 to h) ni.
To construct the load-balanced partitioning vector, we need to determine
the value of the k1-th tuple, the value of the k2-th tuple, and so on, where
k1 = N/p, k2 = 2N/p, etc., until kp−1. The partitioning vector will then be
[k1, k2, ..., kp−1]. The value of the ki-th tuple is determined as follows. First
determine the histogram range hj in which it falls. Assuming all values in
a range are equally likely, the ki-th value will be

      sj + (ej − sj) ∗ (kij / nj)

where
      sj : the first value in hj
      ej : the last value in hj
      kij : ki − Σ(l=1 to j−1) nl
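The construction in part (b) can be sketched as follows. The example histogram
at the end is hypothetical, since the exercise's actual histogram is not reproduced
here.

```python
def partitioning_vector(ranges, freqs, p):
    """Build a load-balanced partitioning vector from a histogram.

    ranges: list of (s_j, e_j) value bounds of each histogram range h_j
    freqs:  list of tuple counts n_j for each range
    p:      desired number of partitions
    Assumes values are uniformly distributed within each range.
    """
    N = sum(freqs)
    vector = []
    for i in range(1, p):
        k_i = i * N / p                      # position of the k_i-th tuple
        seen = 0                             # tuples in ranges before h_j
        for (s_j, e_j), n_j in zip(ranges, freqs):
            if seen + n_j >= k_i:            # the k_i-th tuple falls in h_j
                k_ij = k_i - seen
                vector.append(s_j + (e_j - s_j) * k_ij / n_j)
                break
            seen += n_j
    return vector

# hypothetical skewed histogram: 80 tuples in 0-40, 20 tuples in 40-100
vec = partitioning_vector([(0, 40), (40, 100)], [80, 20], p=5)
# → [10.0, 20.0, 30.0, 40.0]: the dense range 0-40 is split finely
```

Note how the four dividers all fall inside the dense range, so each of the five
partitions receives exactly 20 tuples.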

21.6 a. The copies of the data items at a processor should be partitioned across
multiple other processors, rather than stored on a single processor, for the
following reasons:
• To better distribute the work that should have been done by the failed
processor among the remaining processors.
• Even when there is no failure, this technique can to some extent deal
with hot-spots created by read-only transactions.
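One way to realize the placement in part (a) is to scatter a processor's backup
copies round-robin over all the remaining processors; this is a sketch, and the
modular assignment rule is an illustrative assumption.

```python
def backup_placement(n_items, owner, n_procs):
    """Spread backup copies of the owner's data items across all other
    processors round-robin, so that if the owner fails, its workload is
    shared by every survivor rather than absorbed by one processor."""
    others = [p for p in range(n_procs) if p != owner]
    return {item: others[item % len(others)] for item in range(n_items)}

placement = backup_placement(n_items=9, owner=0, n_procs=4)
# the 9 items are spread over processors 1, 2 and 3, three items each
```

If processor 0 fails, each surviving processor takes over only a third of its
work; read-only transactions can likewise be routed to the copies to relieve a
hot-spot.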
b. RAID level 1 itself stores an extra copy of each data item (mirroring). Thus
this is similar to mirroring performed by the database itself, except that the
database system does not have to bother about the details of performing
the mirroring. It just issues the write to the RAID system, which
automatically performs the mirroring.
RAID level 5 is less expensive than mirroring in terms of disk-space
requirements, but writes are more expensive, and rebuilding a crashed disk
is more expensive.
