Setting The Degree of Parallelism: Figure C-4
After the optimizer determines the execution plan, the query coordinator determines the order in which the operations must be performed. With this information, the query coordinator determines the data flow of the statement. Figure C-4 illustrates the data flow of the following query:
SELECT dname, MAX(sal), AVG(sal)
FROM emp, dept
WHERE emp.deptno = dept.deptno
GROUP BY dname;
The following example shows a statement that sets the degree of parallelism to 4 on a
table:
ALTER TABLE emp PARALLEL 4;
The Oracle server can use parallel execution for any of these operations:
Table scan
Nested loop join
Sort merge join
Hash join
"Not in"
Group by
Select distinct
Union and union all
Aggregation
PL/SQL functions called from SQL
Order by
Create table as select
Create index
Rebuild index
Rebuild index partition
Move partition
Split partition
Update
Delete
Insert ... select
Enable constraint (the table scan is parallelized)
Star transformation
Cube
Rollup
Hash Join
There are several variants of the hash join, such as the Simple Hash Join, the Partitioned Hash Join, and the Hybrid Hash Join.
Steps:
1. Choose a hash function that maps each join-attribute value to one of an identified range of values (or buckets, say bucket 0, bucket 1, …, bucket 9 for example).
2. Identify the smaller of the relations that are to be joined, say r and s.
3. Partition the smaller relation, say r, into the identified buckets using the join attribute of r. That is, r is partitioned into the available buckets as r0, r1, r2, …, r9. This step is called the Building Phase (sometimes the Partitioning Phase).
4. Partition the relation s into the identified buckets using the join attribute of s. That is, s is partitioned into the available buckets as s0, s1, s2, …, s9. This step is called the Probing Phase, because the records of s probe for the matching records in the appropriate buckets.
5. Finally, the results are collected from the different buckets and combined into the join result (a sketch of these steps follows).
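The following Python sketch illustrates these steps on small in-memory relations. The table contents, column names, and bucket count are illustrative assumptions, not taken from the original example; this is a minimal sketch, not a full implementation.

# Simple hash join sketch (illustrative data and bucket count assumed).
def hash_join(small, large, key, n_buckets=3):
    h = lambda v: v % n_buckets                # h(key) = key mod n
    # Building phase: partition the smaller relation into buckets.
    buckets = [[] for _ in range(n_buckets)]
    for row in small:
        buckets[h(row[key])].append(row)
    # Probing phase: each record of the larger relation probes its bucket.
    result = []
    for row in large:
        for match in buckets[h(row[key])]:
            if match[key] == row[key]:         # guard against hash collisions
                result.append({**match, **row})
    return result

department = [{"DeptNo": 1, "DName": "HR"},
              {"DeptNo": 2, "DName": "IT"},
              {"DeptNo": 3, "DName": "Sales"}]
employee = [{"EmpNo": 10, "DeptNo": 1}, {"EmpNo": 11, "DeptNo": 2},
            {"EmpNo": 12, "DeptNo": 3}, {"EmpNo": 13, "DeptNo": 1},
            {"EmpNo": 14, "DeptNo": 2}, {"EmpNo": 15, "DeptNo": 3}]

print(hash_join(department, employee, "DeptNo"))

Note that the smaller relation (department) is used in the building phase, and each employee record is compared only against its own bucket.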
In general, the hash function has the form h(key) = key mod n, where n is the number of partitions (or buckets). Let us choose the DeptNo attribute as the hash key and partition the relation into 3 partitions, numbered 0, 1, and 2. Then our hash function looks as follows;
h(DeptNo) = DeptNo mod 3
Step 3: Partition Department using the hash function. The first record of Department goes into Bucket 1, the second record into Bucket 2, and the third record into Bucket 0.
Step 4: Now partition the larger relation Employee using the same hash function;
h(DeptNo) = DeptNo mod 3
The figure given below shows the partitioning of the Employee table into 3 buckets.
After successful partitioning, our hash buckets look like the figure given below; in this figure, the first table shows the 0th partition of the Department table, and the second one shows the 0th partition of the Employee table.
Figure 3 - Bucket status after hashing of all records of the tables to be joined
Carefully observe the data stored in every bucket. Bucket 0 stores records of the zeroth partitions of both tables, where the joining attribute DeptNo has the same value, i.e., 3. The same is true for the other partitions. Hence, at this stage the join is trivial. This is why this phase of the technique is named the Probing phase: the records of the probe relation search for matching join-attribute values in the other relation and get joined.
The major advantage is the reduced number of comparisons. If we join with a conventional join technique, the comparison requires
6 records × 3 records = 18 comparisons
whereas with the hash join we need only
(1 record × 2 records in Bucket 0) + (1 × 2 in Bucket 1) + (1 × 2 in Bucket 2) = 6 comparisons
Points to note:
1. Only an equi-join or natural join can be performed using a hash join.
2. The simple hash join assumes that one of the relations is small, i.e., that the whole relation can fit into memory.
3. The smaller relation is chosen for the building phase.
4. Only a single pass through the tables is required; that is, each relation is scanned once.
5. The hash function should be chosen so that it does not cause skew.
A hash join applies to a simple equi-join such as the following:
SELECT *
FROM t1
JOIN t2
ON t1.c1 = t2.c1;
The hash join is used for queries involving multiple joins as well, as long as at least one join condition for each pair of tables is an equi-join, like the query shown here:
SELECT *
FROM t1
JOIN t2
ON (t1.c1 = t2.c1 AND t1.c2 < t2.c2)
JOIN t3
ON (t2.c1 = t3.c1);
A hash join cannot be used if any pair of joined tables lacks at least one equi-join condition.
Consumer operations can begin consuming rows as soon as the producer operations have produced rows.
In Example 8-2, while the parallel execution servers are producing rows in the FULL SCAN of the sales table, another
set of parallel execution servers can begin to perform the HASH JOIN operation to consume the rows.
In a nutshell, the Nested Loop Join uses one joining table as the outer input table and the other as the inner input table. The Nested Loop Join gets a row from the outer table and searches for matching rows in the inner table; this process continues until every row of the outer table has been probed against the inner table. It is used where one of the relations is very small.
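As a rough illustration, here is a minimal Python sketch of a nested loop join; the row format and join-key name are assumptions made for the example.

# Nested loop join sketch: for each row of the outer (driving) table,
# scan the inner table for matching rows.
def nested_loop_join(outer, inner, key):
    result = []
    for o in outer:              # one pass over the driving table
        for i in inner:          # full scan of the inner table per outer row
            if o[key] == i[key]:
                result.append({**o, **i})
    return result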
The major difference between a hash join and a nested loops join is the use of a full-table scan with the hash join.
We may see the physical join implementations with names like nested loops, sort
merge and hash join.
Hash joins - In a hash join, the Oracle database does a full scan of the driving table, builds a hash table in RAM, and then probes for matching rows in the other table. For certain types of SQL, the hash join will execute faster than a nested loops join, but the hash join uses more RAM resources.
Nested loops join - The nested loops table join is one of the original table join plans and it remains the most common. In a nested loops join, we have two tables: a driving table and a secondary table. The rows are usually accessed from a driving table index range scan, and the driving table result set is then nested within a probe of the second table, normally using an index range scan method.
Some queries will perform faster with NESTED LOOPS joins, some with HASH joins, while others favor sort-merge joins. It is difficult to predict a priori which join technique will be fastest.
A hash join is an operation that performs a full-table scan on the smaller of the two tables (the driving table) and then builds a hash table in RAM. The hash table is then used to retrieve the rows of the larger table.
Fragment-and-Replicate Join
We have discussed the Partitioned Join in the previous post, where we partitioned the relational tables that are to be joined into equal partitions and performed the join on the individual partitions locally at every processor. Partitioning the relations on the joining attribute and joining them works only for joins that involve equality conditions.
Clearly, joining the tables by partitioning works only for equi-joins or natural joins. For inequality joins, partitioning will not work. Consider the join condition given below;
r ⋈_{r.a > s.b} s
In this non-equi-join condition, the tuples (records) of r must be joined with the records of s wherever the value of attribute r.a is greater than the value of s.b. In other words, every record of r may join with some records of s and vice versa; that is, all records of each relation must be compared with records of the other relation. For a clear example, see the Non-equi join post.
What does fragment and replicate mean?
Fragment means partitioning a table either horizontally or vertically (horizontal and vertical fragmentation). Replicate means duplicating a relation, i.e., creating copies of a table. This join is performed by fragmenting and replicating the tables to be joined.
Points to Note:
1. Non-equi joins can be performed in parallel.
2. If one of the relations to be joined is already partitioned across n processors, this technique is well suited, because we need to replicate only the other relation.
3. Unlike in the Partitioned Join, any partitioning technique can be used.
4. If one of the relations to be joined is very small, the technique performs better.
The general case of the Fragment-and-Replicate join works as follows;
1. Partition the relation r into m partitions r0, r1, …, rm-1, and partition the relation s into n partitions s0, s1, …, sn-1.
2. The values for m and n are chosen based on the availability of processors. That is, we need at least m*n processors to perform the join.
3. Now we have to distribute all the partitions of r and s to the available processors. Remember that we need to compare every tuple of one relation with every tuple of the other relation. That is, the records of partition r0 should be compared with all partitions of s, and the records of partition s0 should be compared with all partitions of r. The same must be done for all the partitions of r and s. Hence, the data distribution is done as follows;
a. As we need m*n processors, let us assume that we have processors P0,0, P0,1, …, P0,n-1, P1,0, P1,1, …, Pm-1,n-1. Thus, processor Pi,j performs the join of ri with sj.
b. To ensure the comparison of every partition of r with every partition of s, we replicate ri to the processors Pi,0, Pi,1, Pi,2, …, Pi,n-1, where 0, 1, 2, …, n-1 are the partitions of s. This replication ensures the comparison of every ri with the complete s.
c. To ensure the comparison of every partition of s with every partition of r, we replicate si to the processors P0,i, P1,i, P2,i, …, Pm-1,i, where 0, 1, 2, …, m-1 are the partitions of r. This replication ensures the comparison of every si with the complete r.
4. Pi,j computes the join locally to produce its part of the join result.
Figure 2 given below shows the process of the general case Fragment-and-Replicate join (it may not be the most appropriate example, but it clearly shows the process).
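A hedged Python sketch of the distribution step described above: processor P(i,j) receives the pair (ri, sj), so each ri is replicated across a row of processors and each sj down a column. The data, the non-equi condition, and the choice of a nested loop join for the local join are all illustrative assumptions.

# General fragment-and-replicate distribution: grid[i][j] holds the input
# fragments for processor P(i,j); r_i is replicated across row i and
# s_j is replicated down column j.
def assign_fragments(r_parts, s_parts):
    return [[(ri, sj) for sj in s_parts] for ri in r_parts]

# Each processor applies any local join technique; here, a nested loop
# join evaluates the non-equi condition r.a > s.b.
def local_join(r_frag, s_frag):
    return [(x, y) for x in r_frag for y in s_frag if x["a"] > y["b"]]

r_parts = [[{"a": 5}], [{"a": 1}]]              # m = 2 partitions of r
s_parts = [[{"b": 2}], [{"b": 4}], [{"b": 9}]]  # n = 3 partitions of s
grid = assign_fragments(r_parts, s_parts)       # needs m*n = 6 processors
results = [local_join(ri, sj) for row in grid for (ri, sj) in row]
print(results)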
Points to Note:
1. The Asymmetric Fragment-and-Replicate join is the special case of the general Fragment-and-Replicate join where n or m is 1, i.e., where one of the relations is not partitioned.
2. Compared to the asymmetric technique, the general Fragment-and-Replicate join reduces the size of the tables handled at every processor.
3. Any partitioning technique can be used, and any joining technique can be used as well.
The sort merge join is used to join two independent data sources. It performs better than a nested loop join when the volume of data in the tables is large, but in general not as well as a hash join. It performs better than a hash join when the join-condition columns are already sorted or no sorting is required.
The sort merge operation is well suited to parallel query because a merge join always performs full-table scans against the tables. Sort merge joins are generally best for queries that produce very large result sets, such as daily reports and table detail summary queries. Here we see a simple query that has been formed to perform a sort merge using parallel query against both tables.
select /*+ use_merge(e,b) parallel(e, 4) parallel(b, 4) */
e.ename,
hiredate,
b.comm
from
emp e,
bonus b
where
e.ename = b.ename
;
Suppose two salespeople attend a conference and each collects over 100 business
cards from potential new customers. They now each have a pile of cards in random
order, and they want to see how many cards are duplicated in both piles. The
salespeople alphabetize their piles, and then they call off names one at a time.
Because both piles of cards have been sorted, it becomes much easier to find the
names that appear in both piles. This example describes a SORT-MERGE join.
In a SORT-MERGE join, Oracle sorts the first row source by its join columns, sorts the
second row source by its join columns, and then merges the sorted row sources
together. As matches are found, they are put into the result set. SORT-MERGE joins
can be effective when lack of data selectivity or useful indexes render a NESTED
LOOPS join inefficient, or when both of the row sources are quite large (greater than
5 percent of the blocks accessed).
However, SORT-MERGE joins can be used only for equijoins (WHERE D.deptno =
E.deptno, as opposed to WHERE D.deptno >= E.deptno). SORT-MERGE joins require
temporary segments for sorting (if SORT_AREA_SIZE or the automatic memory
parameters like MEMORY_TARGET are set too small). This can lead to extra memory
utilization and/or extra disk I/O in the temporary tablespace.
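To make the merge step concrete, here is a minimal Python sketch of a sort-merge equi-join; the row format and key name are illustrative assumptions, not part of the original text.

# Sort-merge join sketch: sort both row sources on the join column,
# then merge them with two cursors in a single forward pass.
def sort_merge_join(r, s, key):
    r = sorted(r, key=lambda row: row[key])
    s = sorted(s, key=lambda row: row[key])
    result, i, j = [], 0, 0
    while i < len(r) and j < len(s):
        if r[i][key] < s[j][key]:
            i += 1
        elif r[i][key] > s[j][key]:
            j += 1
        else:
            k = j                    # emit all s-rows sharing this key value
            while k < len(s) and s[k][key] == r[i][key]:
                result.append({**r[i], **s[k]})
                k += 1
            i += 1
    return result

As in the business-card analogy, once both inputs are sorted, every match is found in one pass over each pile.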
Pipelined Parallelism and Independent Parallelism
Interoperation Parallelism
Interoperation parallelism is about executing different operations of a query in parallel. A single query may involve multiple operations at once, and we can exploit parallelism to achieve better performance for such queries. Consider the example query given below;
SELECT AVG(Salary) FROM Employee GROUP BY Dept_Id;
It involves two operations: grouping and aggregation. For executing this query,
We need to group all the employee records based on the attribute Dept_Id first.
Then, for every group, we can apply the AVG aggregate function to get the final result (a small sketch of these two operations follows).
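The sketch below separates the two operations in Python; the employee data is an illustrative assumption.

# The example query's two operations: GROUP BY Dept_Id, then AVG(Salary).
from collections import defaultdict

employees = [("D1", 100.0), ("D1", 200.0), ("D2", 300.0)]  # (Dept_Id, Salary)

groups = defaultdict(list)
for dept, salary in employees:       # operation 1: grouping
    groups[dept].append(salary)

averages = {d: sum(v) / len(v) for d, v in groups.items()}  # operation 2: aggregation
print(averages)                      # {'D1': 150.0, 'D2': 300.0}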
We can use the Interoperation parallelism concept to parallelize these two operations.
[Note: Intra-operation parallelism is about executing a single operation of a query using multiple processors in parallel]
The following are the variants through which we can achieve Interoperation Parallelism;
1. Pipelined Parallelism
2. Independent Parallelism
1. Pipelined Parallelism
In Pipelined Parallelism, the idea is that the result produced by one operation is consumed by the next operation in the pipeline. For example, consider the following operation;
r1 ⋈ r2 ⋈ r3 ⋈ r4
The above expression shows a natural join operation. This actually joins four tables.
This operation can be pipelined as follows;
Perform temp1 ← r1 ⋈ r2 at processor P1 and send the result temp1 to processor P2, which performs temp2 ← temp1 ⋈ r3 and sends the result temp2 to processor P3, which performs result ← temp2 ⋈ r4. The advantage is that we do not need to store the intermediate results; instead, the result produced at one processor is consumed directly by the next. Hence, we start receiving result tuples well before P1 completes the join assigned to it.
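The following is a minimal sketch of this pipeline using Python threads and queues; the relations, the join key k, and the stage structure are illustrative assumptions rather than a definitive implementation.

# Pipelined parallelism sketch: each stage consumes tuples from its input
# queue, joins them with its local relation, and forwards matches downstream
# without materializing intermediate results.
import queue
import threading

END = object()   # sentinel marking the end of the tuple stream

def join_stage(in_q, out_q, relation, key):
    while (row := in_q.get()) is not END:
        for other in relation:
            if other[key] == row[key]:
                out_q.put({**row, **other})
    out_q.put(END)

r1 = [{"k": 1}, {"k": 2}]
r2 = [{"k": 1}, {"k": 2}]
r3 = [{"k": 1}, {"k": 2}]
r4 = [{"k": 1}, {"k": 2}]

q1, q2, q3 = queue.Queue(), queue.Queue(), queue.Queue()
threading.Thread(target=join_stage, args=(q1, q2, r3, "k")).start()  # P2
threading.Thread(target=join_stage, args=(q2, q3, r4, "k")).start()  # P3

for a in r1:                 # P1 computes r1 join r2 and feeds the pipeline
    for b in r2:
        if a["k"] == b["k"]:
            q1.put({**a, **b})
q1.put(END)

while (row := q3.get()) is not END:   # tuples arrive before P1 finishes
    print(row)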
Disadvantages:
1. Pipelined parallelism is not a good choice if the degree of parallelism is high.
2. It is useful only with a small number of processors.
3. Not all operations can be pipelined. For example, consider the query given in the first section: the records of at least one department must be grouped before any output can be passed to the aggregate operation at the next processor.
4. We cannot expect full speedup.
2. Independent Parallelism:
Operations that do not depend on each other can be executed in parallel at different processors. This is called Independent Parallelism.
For example, in the expression r1 ⋈ r2 ⋈ r3 ⋈ r4, the portion r1 ⋈ r2 can be computed on one processor while r3 ⋈ r4 is computed on another. Both results can then be pipelined into a third processor to produce the final result (see the sketch below).
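A hedged sketch of the same expression using independent parallelism: the two joins run as independent tasks on a thread pool, and a third step combines their results. The data and join key are illustrative assumptions.

# Independent parallelism sketch: r1 join r2 and r3 join r4 do not depend
# on each other, so they run as independent tasks.
from concurrent.futures import ThreadPoolExecutor

def join(r, s, key):
    return [{**a, **b} for a in r for b in s if a[key] == b[key]]

r1 = [{"k": 1}]; r2 = [{"k": 1}]
r3 = [{"k": 1}]; r4 = [{"k": 1}]

with ThreadPoolExecutor(max_workers=2) as pool:
    f1 = pool.submit(join, r1, r2, "k")           # processor 1: r1 join r2
    f2 = pool.submit(join, r3, r4, "k")           # processor 2: r3 join r4
    result = join(f1.result(), f2.result(), "k")  # third processor combines
print(result)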
Disadvantages:
Independent parallelism does not work well when the degree of parallelism is high.