
Advances in Data Management:

Parallel and Distributed Query Processing

1 Parallel implementation of the relational algebra


Note: the algorithms described below generalise to all three architectures; only the mode of communication between the processors differs. With shared-nothing (which is what is assumed in the description below) communication is via the interconnection network; with shared-memory and shared-disk it is via the shared memory or shared disk respectively.

The basic foundations of parallel query evaluation are the splitting and merging of parallel streams of data. Data streams arising from different disks or processors provide the input for each operator in the query. The results produced as an operator is evaluated in parallel over a set of nodes need to be merged at one node in order to produce the overall result of the operator. This result set may then be split again in order to parallelise subsequent processing in the query.

1.1 Sort
Given the frequency with which sorting is required in query processing, parallel sorting algorithms have
been much studied.
A commonly used parallel sorting algorithm is the parallel merge sort:
This first sorts each fragment of the relation individually on its local disk, e.g. using the external merge
sort algorithm we have already looked at.
Groups of fragments are then shipped to one node per group, which merges the group of fragments into
a larger sorted fragment. This process repeats until a single sorted fragment is produced at one of the
nodes.
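As an illustration, here is a minimal Python sketch of the parallel merge sort, with a process pool standing in for the nodes; the fragment data, the pairwise grouping of fragments, and the use of heapq.merge as the merging step are assumptions made for the example (a real system would run an external merge sort on each node's local disk).

import heapq
from multiprocessing import Pool

def merge_pair(group):
    # Merge a group (here: a pair) of sorted fragments at one node.
    return list(heapq.merge(*group))

def parallel_merge_sort(fragments, pool):
    # Phase 1: each node sorts its own fragment locally.
    runs = pool.map(sorted, fragments)
    # Phase 2: ship groups of fragments to one node per group and
    # merge, repeating until a single sorted fragment remains.
    while len(runs) > 1:
        groups = [runs[i:i + 2] for i in range(0, len(runs), 2)]
        runs = pool.map(merge_pair, groups)
    return runs[0]

if __name__ == '__main__':
    fragments = [[5, 1, 9], [4, 4, 2], [8, 3], [7, 6, 0]]
    with Pool(4) as pool:
        print(parallel_merge_sort(fragments, pool))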

1.2 Projection and selection


Selection and projection operators on a relation are usually performed together in one scan. This can be
considered as a reduction operator.
Reducing a fragment at a single node can be done as in a centralised DBMS, i.e. by a sequential scan or
by utilising appropriate indexes if available.
If duplicates are permitted in the overall result, or if they are not permitted but are known to be
impossible (e.g. if the result contains all the key attributes of the relation), then the fragments can just
be shipped to one node to be merged.
If duplicates may arise in the overall result and they need to be eliminated, then a variant of the parallel
merge sort algorithm described in Section 1.1 can be used to sort the result while at the same time
eliminating duplicates.
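A minimal Python sketch of this reduce-then-merge pattern follows; the Employee fragments, the selection predicate and the projection are invented for the example, and a set union at the merging node stands in for the duplicate-eliminating parallel merge sort.

from multiprocessing import Pool

# Hypothetical horizontal fragments of Employee(empID, site, salary).
FRAGMENTS = [
    [(1, 'A', 20000), (2, 'A', 35000)],
    [(3, 'B', 25000), (4, 'B', 40000)],
]

def reduce_fragment(fragment):
    # Selection and projection performed together in one local scan.
    return [(emp_id, salary)                    # projection
            for emp_id, site, salary in fragment
            if salary < 30000]                  # selection

if __name__ == '__main__':
    with Pool(2) as pool:
        partial_results = pool.map(reduce_fragment, FRAGMENTS)
    # Merge at one node; the set eliminates any duplicates.
    print(sorted(set().union(*partial_results)))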

1.3 Join
R ⋈ S using Parallel Nested Loops or Index Nested Loops
First the ‘outer’ relation has to be chosen. In particular, if S has an appropriate index on the join attribute(s) then R should be the outer relation.
All the fragments of R are then shipped to all nodes. So each node i now has a whole copy of R as well as its own fragment of S, Si.
The local joins R ⋈ Si are performed in parallel on all the nodes, and the results are finally shipped to a chosen node for merging.
R ⋈ S using Parallel Sort-Merge Join (for natural/equijoins)
The first phase of this involves sorting R and S on the join attribute(s). These sorts can be performed using the parallel merge sort operation described in Section 1.1.
The sorted relations are then partitioned across the nodes using range partitioning with the same sub-ranges on the join attribute(s) for both relations.
The local joins of each pair of sorted fragments Ri ⋈ Si are performed in parallel, and the results are finally shipped to a chosen node for merging.
R ⋈ S using Parallel Hash Join (for natural/equijoins only)
Each bucket of R and S is logically assigned to one node.
The first hashing phase, using the first hash function h1, is undertaken in parallel on all the nodes. Each tuple t from R or S is shipped to node i if the bucket assigned to it by h1 is the ith bucket.
The next phase is also undertaken in parallel on all nodes. On each node i, a hash table is created from the local fragment of R, Ri, using another hash function h2. The local fragment of S, Si, is then scanned and h2 is used to probe the hash table for matching records of Ri for each record of Si.
The results produced at each node are shipped to a chosen node for merging.
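The two hashing phases can be illustrated with the following single-machine Python sketch, in which per-node lists stand in for the nodes and the relations and hash functions are assumptions made for the example.

NUM_NODES = 3

def h1(key):
    # First hash function: assigns each tuple a bucket, and hence a node.
    return hash(key) % NUM_NODES

# Hypothetical relations R(key, a) and S(key, b), joined on key.
R = [(1, 'r1'), (2, 'r2'), (3, 'r3'), (4, 'r4')]
S = [(2, 's2'), (3, 's3'), (3, 's3b'), (5, 's5')]

# Phase 1: ship each tuple of R and S to the node owning its h1 bucket.
R_frag = {i: [] for i in range(NUM_NODES)}
S_frag = {i: [] for i in range(NUM_NODES)}
for t in R:
    R_frag[h1(t[0])].append(t)
for t in S:
    S_frag[h1(t[0])].append(t)

# Phase 2: on each node i, build a hash table on Ri using a second
# hash function h2 (Python's dict plays that role here) and probe it
# with each tuple of Si.
result = []
for i in range(NUM_NODES):
    table = {}
    for key, a in R_frag[i]:
        table.setdefault(key, []).append(a)
    for key, b in S_frag[i]:
        for a in table.get(key, []):
            result.append((key, a, b))

print(result)  # merging of the per-node results at a chosen node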

1.4 Parallel Query Optimisation


The above parallel versions of the relational algebra operators have different costs compared to their
sequential counterparts, and it is these costs that need to be considered during the generation and
selection of parallel query plans:

• The costs of partitioning and merging the data now need to be taken into account.
• If the data distribution is skewed, this will have an impact on the overall time taken to complete
the evaluation of an operator, so that too needs to be taken into account.

• The results being produced by one operator can be pipelined into the evaluation of another operator
that is executing at the same time on a different node.
For example, consider this left-deep join tree where all the joins are nested loops joins:
((R1 ⋈ R2) ⋈ R3) ⋈ R4
The tuples being produced by R1 ⋈ R2 can be used to ‘probe’ R3 and the resulting tuples can be used to probe R4, thus setting up a pipeline of three concurrently executing join operators on three different nodes (see the sketch after this list).

• There is now the possibility of executing different operators of the query concurrently on different
nodes.
For example, with this bushy join tree:
((R1 ⋈ R2) ⋈ (R3 ⋈ R4))

the join of R1 with R2 and the join of R3 with R4 can be executed concurrently. The partial results
of these joins can also be pipelined into their parent join operator.
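The pipelining point above can be made concrete with a minimal single-process Python sketch in which each nested loops join is a generator, so tuples produced by R1 ⋈ R2 flow straight into the probes of R3 and R4 without being materialised; the relations and the join attribute are invented for the example, and on a real system each generator would execute on a different node.

# Hypothetical relations, each joined on its first attribute.
R1 = [(1, 'a'), (2, 'b')]
R2 = [(1, 'x'), (2, 'y')]
R3 = [(1, 'p'), (2, 'q')]
R4 = [(1, 'u')]

def nl_join(left, right):
    # Nested loops join on attribute 0; each result tuple is yielded
    # as soon as it is produced, so a downstream join can consume it.
    for l in left:
        for r in right:
            if l[0] == r[0]:
                yield l + r[1:]

# ((R1 ⋈ R2) ⋈ R3) ⋈ R4 as a pipeline of three join operators.
pipeline = nl_join(nl_join(nl_join(R1, R2), R3), R4)
for tup in pipeline:
    print(tup)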

In uniprocessor systems only left-deep join orders are usually considered (and this still gives n! possible
join orders for a join of n relations).
In multi-processor systems, other join orders can result in more parallelisation, e.g. bushy trees, as
illustrated above. Even if such a plan is more costly in terms of the number of I/O operations performed,
it may execute more quickly than a left-deep plan due to the increased parallelism it affords.
So, in general, the number of candidate query plans is much greater in parallel database systems and
more heuristics need to be employed to limit the search space of query plans. For example, one possible
approach is for the query optimiser to first find the best plan for sequential evaluation of the query; and
then find the fastest parallel execution of that plan.

2 Distributed Query Processing


The purpose of distributed query processing is to process global queries, i.e. queries expressed with
respect to the global or external schemas of a DDB system.
The local query processor at each site is responsible for processing sub-queries of global queries that
can be evaluated at that site.
A global query processor is needed at sites of the DDB system to which global queries can be
submitted. This will optimise each global query, distribute sub-queries of the query to the appropriate
local query processors, and collect the results of these sub-queries to return to the user.
In more detail, processing global queries consists of the following steps:

1. translating the query into a query tree;


2. replacing fragmented relations in this tree by their definition as unions/joins of their horizontal/vertical fragments;
3. simplifying the resulting tree using several heuristics (see below);
4. global query optimisation, resulting in the selection of a query plan; this consists of sub-queries each
of which will be executed at one local site; the query plan is annotated with the data transmission
that will occur between sites;
5. local processing of the local sub-queries; this may include further optimisation of the local sub-queries, based on local information about access paths and database statistics.

In Step 3, the simplifications that can be carried out in the case of horizontal partitioning include the
following:

• eliminating fragments from the argument to a selection operation if they cannot contribute any
tuples to the result of that selection;
• distributing join operations over unions of fragments, and eliminating joins that can yield no tuples;

For example, suppose a table Employee(empID, site, salary, . . . ) is horizontally fragmented into four
fragments:
E1 = σ_{site='A' AND salary<30000} Employee
E2 = σ_{site='A' AND salary>=30000} Employee
E3 = σ_{site='B' AND salary<30000} Employee
E4 = σ_{site='B' AND salary>=30000} Employee

then the query σ_{salary<25000} Employee is replaced in Step 2 by

σ_{salary<25000} (E1 ∪ E2 ∪ E3 ∪ E4)

which simplifies to

σ_{salary<25000} (E1 ∪ E3)

For example, suppose a table WorksIn(empID, site, project, . . . ) is horizontally fragmented into two
fragments:
W1 = σ_{site='A'} WorksIn
W2 = σ_{site='B'} WorksIn
then the query Employee ⋈ WorksIn is replaced in Step 2 by:

(E1 ∪ E2 ∪ E3 ∪ E4) ⋈ (W1 ∪ W2)

distributing the join over the unions of fragments gives:

(E1 ⋈ W1) ∪ (E2 ⋈ W1) ∪ (E3 ⋈ W1) ∪ (E4 ⋈ W1) ∪
(E1 ⋈ W2) ∪ (E2 ⋈ W2) ∪ (E3 ⋈ W2) ∪ (E4 ⋈ W2)

and this simplifies to:

(E1 ⋈ W1) ∪ (E2 ⋈ W1) ∪ (E3 ⋈ W2) ∪ (E4 ⋈ W2)

since cross-site joins such as E1 ⋈ W2 would require site = 'A' and site = 'B' to hold simultaneously and so can yield no tuples.
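A minimal Python sketch of this Step 3 reasoning follows: each fragment carries its defining predicate, and a fragment (or a pairwise fragment join) is eliminated when its predicate contradicts the query's. The predicate representation and the contradiction tests are simplifications invented for the example.

# Each horizontal fragment is described by its defining predicate,
# here restricted to a site value plus a salary interval [lo, hi).
EMPLOYEE_FRAGMENTS = {
    'E1': {'site': 'A', 'salary': (0, 30000)},
    'E2': {'site': 'A', 'salary': (30000, float('inf'))},
    'E3': {'site': 'B', 'salary': (0, 30000)},
    'E4': {'site': 'B', 'salary': (30000, float('inf'))},
}
WORKSIN_FRAGMENTS = {'W1': {'site': 'A'}, 'W2': {'site': 'B'}}

# σ_{salary<25000}(E1 ∪ E2 ∪ E3 ∪ E4): a fragment can contribute
# tuples only if the lower end of its salary interval is below 25000.
kept = [name for name, frag in EMPLOYEE_FRAGMENTS.items()
        if frag['salary'][0] < 25000]
print(kept)  # ['E1', 'E3']

# Distributing Employee ⋈ WorksIn over the unions: a pairwise join is
# kept only if the two site predicates can hold simultaneously.
kept_joins = [(e, w)
              for e, ef in EMPLOYEE_FRAGMENTS.items()
              for w, wf in WORKSIN_FRAGMENTS.items()
              if ef['site'] == wf['site']]
print(kept_joins)  # [('E1','W1'), ('E2','W1'), ('E3','W2'), ('E4','W2')]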

One simplification that can be carried out in Step 3 in the case of vertical partitioning is that fragments in
the argument of a projection operation which have no non-key attributes in common with the projection
attributes can be eliminated.
For example, if a table Projects(projNum, budget, location, projName) is vertically partitioned into two
fragments:
P1 = π_{projNum,budget,location} Projects
P2 = π_{projNum,projName} Projects
then the query π_{projNum,location} Projects is replaced in Step 2 by:

π_{projNum,location} (P1 ⋈ P2)

which simplifies to:

π_{projNum,location} P1

on the assumption that each projNum value in P1 also appears in P2 (which is guaranteed by the lossless join decomposition condition).
Step 4 (Query Optimisation) consists of generating a set of alternative query plans, estimating the
cost of each plan, and selecting the cheapest plan.
It is carried out in much the same way as for centralised query optimisation, but now communication
costs must also be taken into account when estimating the overall cost of a query plan.
Also, the replication of relations or fragments of relations is now a factor — there may be a choice of
which replica to use within a given query plan, with different costs being associated with using different
replicas.
Another difference from centralised query optimisation is that there may be a speed-up in query execution
times due to the parallel processing of parts of the query at different sites.

2.1 Distributed Processing of Joins


Given the potential size of the results of join operations, the efficient processing of joins is a significant
aspect of global query processing in distributed databases and a number of distributed join algorithms
have been developed. These include:

• the full-join method, and
• the semi-join method.

Full-join method
The simplest method for computing R ⋈ S at the site of S consists of shipping R to the site of S and doing the join there. This has a cost of

cost of reading R + c × pages(R) + cost of computing R ⋈ S at site(S)

where c is the cost of transmitting one page of data from the site of R to the site of S, and pages(R) is the number of pages that R consists of.
If the result of this join were needed at a different site, then there would also be the additional cost of
sending the result of the join from site(S) to where it is needed.
Semi-join method
This is an alternative method for computing R ⋈ S at the site of S and consists of the following steps:

(i) Compute π_{R∩S}(S) at the site of S, where π_{R∩S} denotes projection on the common attributes of R and S.
(ii) Ship π_{R∩S}(S) to the site of R.
(iii) Compute R ⋉ S at the site of R, where ⋉ is the semi-join operator, defined as follows:

R ⋉ S = R ⋈ π_{R∩S}(S)

(iv) Ship R ⋉ S to the site of S.


(v) Compute R ⋈ S at the site of S, using the fact that

R ⋈ S = (R ⋉ S) ⋈ S
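A minimal Python sketch of the five steps on a single machine follows, with the 'shipping' reduced to passing values between program variables; the relations and their common attribute are assumptions made for the example.

# Hypothetical relations with common attribute cname.
R = [{'accno': 1, 'cname': 'ann'}, {'accno': 2, 'cname': 'bob'},
     {'accno': 3, 'cname': 'eve'}]
S = [{'cname': 'ann', 'city': 'London'}, {'cname': 'bob', 'city': 'Leeds'}]

COMMON = ['cname']  # the common attributes of R and S

# (i) at site(S): project S onto the common attributes.
proj_s = {tuple(t[a] for a in COMMON) for t in S}

# (ii) + (iii) at site(R): R ⋉ S = R ⋈ π_{R∩S}(S), i.e. keep only
# the tuples of R that match some tuple of the shipped projection.
r_semi = [t for t in R if tuple(t[a] for a in COMMON) in proj_s]

# (iv) + (v) at site(S): R ⋈ S = (R ⋉ S) ⋈ S.
result = [{**r, **s} for r in r_semi for s in S
          if all(r[a] == s[a] for a in COMMON)]
print(result)  # eve's account tuple never crosses the network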

Example 1. Consider the following relations, stored at different sites:


R = accounts(accno, cname, balance)
S = customer(cname, address, city, telno, creditRating).
Suppose we need to compute R ⋈ S at the site of S.
Suppose also that

• accounts contains 100,000 tuples on 1,000 pages


• customer contains 50,000 tuples on 500 pages
• the cname field occupies 0.2 (i.e. 20%) of each record of S

With the full join method we have a cost of

cost of reading R + c × pages(R) + cost of computing R ⋈ S at site(S)

which is 1000 I/Os to read R, plus (c × 1000) to transmit R to the site of S, plus 1000 I/Os to save it
there, plus (3 × (1000 + 500)) I/Os (assuming a hash join) to perform the join. This gives a total cost of:
(c × 1000) + 6500 I/Os
With the semi-join method we have the cost of:

(i) Computing π_{R∩S}(S) at site(S), i.e. 500 I/Os to scan S, generating 100 pages of just the cname values

(ii) Shipping π_{R∩S}(S) to site(R), i.e. c × 100, and saving it there, i.e. 100 I/Os.
(iii) Computing R ⋉ S at site(R), i.e. 3 × (100 + 1000) I/Os, assuming a hash join
(iv) Shipping the result of R ⋉ S to the site of S, i.e. c × 1000, and saving it there, i.e. 1000 I/Os (assuming cname in R is a foreign key).
(v) Computing R ⋈ S at site(S), i.e. 3 × (1000 + 500) I/Os

This gives a total cost of


(c × 1100) + 9400 I/Os

So in this case the full join method ((c × 1000) + 6500 I/Os) is cheaper: we have gained nothing by using
the semi-join method since all the tuples of R join with tuples of S.
Example 2. Let R be as above (i.e., accounts) and let
S = σ_{city='London'}(customer)
Suppose again that we need to compute R ⋈ S at the site of S.
Suppose also that there are 100 different cities in customer, that there is a uniform distribution of
customers across cities, and a uniform distribution of accounts over customers. So S contains 500 tuples
on 5 pages.
With the full join method we have a cost of

cost of reading R + c × pages(R) + cost of computing R ⋈ S at site(S)

which is 1000 + (c × 1000) + 1000 + (3 × (1000 + 5)) I/Os

= (c × 1000) + 5015 I/Os.

With the semi-join method we have the cost of:

(i) Computing π_{R∩S}(S) at site(S), i.e. 5 I/Os to scan S, generating 1 page of cname values
(ii) Shipping π_{R∩S}(S) to site(R), i.e. c × 1, plus 1 I/O to save it there.
(iii) Computing R ⋉ S at site(R), i.e. 3 × (1 + 1000) I/Os, assuming a hash join

(iv) Shipping R ⋉ S to the site of S, i.e. c × 10 since, due to a uniform distribution of accounts over customers, 1/100th of R will match the cname values sent to it from S. Plus the cost of saving the result of R ⋉ S at the site of S, 10 I/Os.
(v) Computing R ⋈ S at site(S), i.e. 3 × (10 + 5) I/Os

The overall cost is thus (c × 11) + 3064 I/Os versus (c × 1000) + 5015 I/Os for the full-join method.
So in this case the semi-join method is cheaper. This is because a significant number of tuples of R do
not join with S and so are not sent to the site of S.
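The arithmetic of the two examples can be checked with a small Python helper; the page counts and the ×3 hash join cost model are taken from the examples above, and the costs of saving shipped pages at the receiving site follow the same accounting.

def full_join_ios(pages_r, pages_s):
    # Full-join method: read R, save R at site(S), hash join there.
    # Returns (I/O cost, pages shipped, i.e. the multiplier of c).
    return pages_r + pages_r + 3 * (pages_r + pages_s), pages_r

def semi_join_ios(pages_r, pages_s, pages_proj, pages_match):
    # Semi-join method, steps (i)-(v) as above; pages_proj is the size
    # of π_{R∩S}(S) and pages_match the size of R ⋉ S.
    ios = (pages_s                          # (i) scan S for the projection
           + pages_proj                     # (ii) save projection at site(R)
           + 3 * (pages_proj + pages_r)     # (iii) hash semi-join at site(R)
           + pages_match                    # (iv) save R ⋉ S at site(S)
           + 3 * (pages_match + pages_s))   # (v) final hash join at site(S)
    return ios, pages_proj + pages_match

# Example 1: every tuple of R joins, so the semi-join buys nothing.
print(full_join_ios(1000, 500))             # (6500, 1000)
print(semi_join_ios(1000, 500, 100, 1000))  # (9400, 1100)

# Example 2: only the London customers, so the semi-join wins.
print(full_join_ios(1000, 5))               # (5015, 1000)
print(semi_join_ios(1000, 5, 1, 10))        # (3064, 11)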
Bloom join method
In general, a Bloom filter is an efficient memory structure (usually a bit vector) used to approximate the
contents of a set S in the following sense:

• given an item i,

– if the filter returns false or 0 for i, then i is definitely not in S;


– if it returns true or 1, then i may (or may not) be in S.

Apart from their use in distributed join processing, Bloom filters are also used in NoSQL systems such
as BigTable to avoid having to read an SSTable in full to find out that the key being searched for does
not appear in it.
The Bloom join method is similar to the semi-join method, as it too aims to reduce the amount of data
being sent from the site of R to the site of S.
However, rather than doing a projection on S and sending the resulting data to the site of R, a bit-vector
of a fixed size k is computed by hashing each tuple of S to the range [0..k − 1] (using the join attribute
values). The ith bit of the vector is set to 1 if some tuple of S hashes to i and is set to 0 otherwise.
Then, at the site of R, the tuples of R are also hashed to [0..k − 1] (using the same hash function and the
join attribute values), and only those tuples of R whose hash value corresponds to a 1 in the bit-vector
sent from S are retained for shipping to the site of S.
The cost of shipping the bit-vector from the site of S to the site of R is less than the cost of shipping
the projection of S in the semi-join method.
However, the size of the subset of R that is sent back to the site of S is likely to be larger (since only
approximate matching of tuples is taking place now), and so the shipping costs and join costs are likely
to be higher than with the semi-join method.
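A minimal Python sketch of the Bloom join follows, using a k-bit vector and a single hash function on the join attribute; the relations, the choice of k and the hash function are assumptions made for the example (a production Bloom filter normally combines several hash functions to lower the false positive rate).

K = 16  # size of the bit-vector; a real system would tune this

def h(value):
    # Hash a join attribute value into the range [0..K-1].
    return hash(value) % K

# Hypothetical relations, joined on their first attribute.
S = [(2, 's2'), (3, 's3')]
R = [(1, 'r1'), (2, 'r2'), (3, 'r3'), (4, 'r4')]

# At site(S): set bit i if some tuple of S hashes to i. Only these
# K bits are shipped to site(R), not a projection of S.
bits = [0] * K
for t in S:
    bits[h(t[0])] = 1

# At site(R): retain only the tuples whose hash value hits a 1 bit.
# False positives are possible, so this is a superset of R ⋉ S.
r_candidates = [t for t in R if bits[h(t[0])]]

# Back at site(S): the exact join over the (hopefully small) subset.
result = [(rk, ra, sb) for rk, ra in r_candidates
          for sk, sb in S if rk == sk]
print(len(r_candidates), result)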
