Parallel and Distributed Query Processing
Sort
Given the frequency with which sorting is required in query
processing, parallel sorting algorithms have been much studied.
A commonly used parallel sorting algorithm is the parallel merge
sort:
This first sorts each fragment of the relation individually on its
local disk, e.g. using the external merge sort algorithm we have
already looked at.
Groups of fragments are then shipped to one node per group,
which merges the group of fragments into a larger sorted fragment.
This process repeats until a single sorted fragment is produced at
one of the nodes.
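As a rough illustration of these two phases, here is a minimal single-process Python sketch, in which each 'node' is modelled as an in-memory list and heapq.merge stands in for the merging done at a node (parallel_merge_sort, fragments and group_size are illustrative names, not from the slides):

    from heapq import merge

    def parallel_merge_sort(fragments, key, group_size=2):
        # Phase 1: each node sorts its own fragment locally
        # (standing in for an external merge sort on local disk).
        runs = [sorted(f, key=key) for f in fragments]
        # Phase 2: groups of sorted fragments are shipped to one node
        # per group, which merges them into a larger sorted fragment;
        # repeat until a single sorted fragment remains.
        while len(runs) > 1:
            groups = [runs[i:i + group_size]
                      for i in range(0, len(runs), group_size)]
            runs = [list(merge(*g, key=key)) for g in groups]
        return runs[0]

    # Three nodes, each holding one fragment of a relation keyed on id.
    frags = [[(3, 'c'), (1, 'a')], [(2, 'b')], [(5, 'e'), (4, 'd')]]
    print(parallel_merge_sort(frags, key=lambda t: t[0]))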
Join
R ⋈ S using Parallel Nested Loops or Index Nested Loops
First the ‘outer’ relation has to be chosen. In particular, if S has
an appropriate index on the join attribute(s) then R should be the
outer relation.
All the fragments of R are then shipped to all nodes, so each node i now has a whole copy of R as well as its own fragment of S, Si.
The local joins R ⋈ Si are performed in parallel on all the nodes, and the results are finally shipped to a chosen node for merging.
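A minimal Python sketch of this shipping pattern, assuming in-memory fragments and simple key functions (all names here are illustrative); a real implementation would run the loop body on the nodes concurrently:

    def parallel_nested_loops_join(r_fragments, s_fragments, r_key, s_key):
        # Ship all fragments of R to all nodes: every node gets a full copy.
        r_full = [t for frag in r_fragments for t in frag]
        results = []
        for s_i in s_fragments:            # this body runs on node i
            # Local join R ⋈ Si, here by plain nested loops; with an
            # index on Si's join attribute this would be index nested loops.
            results.append([(tr, ts) for tr in r_full for ts in s_i
                            if r_key(tr) == s_key(ts)])
        # Ship all local results to a chosen node for merging.
        return [pair for local in results for pair in local]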
R ⋈ S using Parallel Sort-Merge Join (for natural/equijoins)
The first phase of this involves sorting R and S on the join attribute(s). These sorts can be performed using the parallel merge sort operation described above.
The sorted relations are then partitioned across the nodes using
range partitioning with the same subranges on the join attribute(s)
for both relations.
The local joins of each pair of sorted fragments, Ri ⋈ Si, are performed in parallel, and the results are finally shipped to a chosen node for merging.
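A single-process Python sketch of the partition-and-merge-join phases, assuming R and S are already sorted (e.g. by the parallel merge sort above) and that cuts gives the subrange boundaries used for range partitioning; all names are illustrative:

    def merge_join(r_run, s_run, r_key, s_key):
        # Standard merge join of two runs sorted on the join key.
        out, i, j = [], 0, 0
        while i < len(r_run) and j < len(s_run):
            kr, ks = r_key(r_run[i]), s_key(s_run[j])
            if kr < ks:
                i += 1
            elif kr > ks:
                j += 1
            else:
                # Pair up the full groups of equal keys on both sides.
                i2, j2 = i, j
                while i2 < len(r_run) and r_key(r_run[i2]) == kr:
                    i2 += 1
                while j2 < len(s_run) and s_key(s_run[j2]) == kr:
                    j2 += 1
                out += [(tr, ts) for tr in r_run[i:i2] for ts in s_run[j:j2]]
                i, j = i2, j2
        return out

    def parallel_sort_merge_join(r_sorted, s_sorted, r_key, s_key, cuts):
        # Range-partition both relations with the same subranges on the
        # join key, so that matching tuples land on the same node.
        bounds = list(zip([float('-inf')] + cuts, cuts + [float('inf')]))
        def split(rel, key):
            return [[t for t in rel if lo <= key(t) < hi]
                    for lo, hi in bounds]
        # The local joins Ri ⋈ Si run in parallel on the nodes; their
        # results are shipped to a chosen node for merging.
        return [p for r_i, s_i in zip(split(r_sorted, r_key),
                                      split(s_sorted, s_key))
                for p in merge_join(r_i, s_i, r_key, s_key)]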
R ⋈ S using Parallel Hash Join (for natural/equijoins only)
Each bucket of R and S is logically assigned to one node.
The first hashing phase, using the first hash function h1, is undertaken in parallel on all the nodes. Each tuple t from R or S is shipped to node i if the bucket assigned to it by h1 is the ith bucket.
The next phase is also undertaken in parallel on all nodes. On each node i, a hash table is created from the local fragment of R, Ri, using another hash function h2. The local fragment of S, Si, is then scanned and h2 is used to probe the hash table for matching records of Ri for each record of Si.
The results produced at each node are shipped to a chosen node
for merging.
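A single-process Python sketch of the two hashing phases, with the node loop standing in for parallel execution; h1 assigns buckets to nodes and Python's built-in dict hashing stands in for h2 (all names are illustrative):

    def parallel_hash_join(r, s, r_key, s_key, num_nodes=4):
        def h1(k):
            # First hash function: assigns each join-key value's
            # bucket to one of the nodes.
            return hash(k) % num_nodes
        # Phase 1: ship each tuple of R and S to the node that owns
        # its bucket under h1.
        r_frags = [[t for t in r if h1(r_key(t)) == i] for i in range(num_nodes)]
        s_frags = [[t for t in s if h1(s_key(t)) == i] for i in range(num_nodes)]
        out = []
        for r_i, s_i in zip(r_frags, s_frags):   # this body runs on node i
            # Phase 2: build a hash table on Ri (dict hashing plays the
            # role of h2), then probe it with each tuple of Si.
            table = {}
            for tr in r_i:
                table.setdefault(r_key(tr), []).append(tr)
            out += [(tr, ts) for ts in s_i for tr in table.get(s_key(ts), [])]
        # Results from all nodes are shipped to a chosen node for merging.
        return out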
When several joins are composed, the shape of the join tree affects the parallelism available. For example, the left-deep expression
((R1 ⋈ R2) ⋈ R3) ⋈ R4
can be rewritten as the bushy expression
((R1 ⋈ R2) ⋈ (R3 ⋈ R4))
in which R1 ⋈ R2 and R3 ⋈ R4 can be evaluated in parallel on different nodes.
Full-join method
The simplest method for computing R ⋈ S at the site of S consists of shipping R to the site of S and doing the join there.
This has a cost of
c × pages(R)
where c is the cost of transmitting one page of data from the site of R to the site of S, and pages(R) is the number of pages that R consists of.
If the result of this join were needed at a different site, then there
would also be the additional cost of sending the result of the join
from site(S) to where it is needed.
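For instance, with the figures used in the comparison below: if pages(R) = 1000, the shipping cost is c × 1000, and if the local join at the site of S then costs 6500 page I/Os, the total is the (c × 1000) + 6500 I/Os quoted for the full-join method.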
Semi-join method
This is an alternative method for computing R ⋈ S at the site of S and consists of the following steps:
(i) Compute πR∩S(S) at the site of S, where πR∩S denotes projection on the common attributes of R and S.
(ii) Ship πR∩S(S) to the site of R.
(iii) Compute R ⋉ S at the site of R, where ⋉ is the semi-join operator, defined as follows:
R ⋉ S = R ⋈ πR∩S(S)
(iv) Ship R ⋉ S to the site of S.
(v) Compute (R ⋉ S) ⋈ S at the site of S; this yields the final result, since
R ⋈ S = (R ⋉ S) ⋈ S
So in this case the full-join method, costing (c × 1000) + 6500 I/Os, is cheaper: we have gained nothing by using the semi-join method, since all the tuples of R join with tuples of S.
Bit-vector (Bloomjoin) method
This is a variant of the semi-join method: instead of shipping πR∩S(S), the site of S ships a bit-vector built by hashing the join-attribute values of S.
The cost of shipping the bit-vector from the site of S to the site of R is less than the cost of shipping the projection of S in the semi-join method.
However, the size of the subset of R that is sent back to the site of
S is likely to be larger (since only approximate matching of tuples
is taking place now), and so the shipping costs and join costs are
likely to be higher than with the semi-join method.
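A Python sketch of the bit-vector idea, assuming a fixed-size vector and Python's built-in hash (all names are illustrative); false positives from hash collisions are exactly why the subset of R shipped back can be larger than R ⋉ S:

    def bit_vector_join(r, s, r_key, s_key, bits=1024):
        # At the site of S: set one bit for each join value in S.
        bv = [False] * bits
        for ts in s:
            bv[hash(s_key(ts)) % bits] = True
        # Ship the bit-vector (smaller than the projection of S) to the
        # site of R, and keep the R tuples whose join value hits a set
        # bit; collisions make this a superset of R ⋉ S.
        r_subset = [tr for tr in r if bv[hash(r_key(tr)) % bits]]
        # Ship the subset to the site of S and complete the join there.
        return [(tr, ts) for tr in r_subset for ts in s
                if r_key(tr) == s_key(ts)]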