Assignment - 10 Parallel Sorting Techniques: Range-Partitioning Sort
Range-partitioning sort works in two steps: first range-partitioning the relation, then sorting each partition
separately. When we sort by range-partitioning the relation, it is not necessary to range-partition the relation on the
same set of processors or disks as those on which the relation is stored. Suppose that we choose processors P0, P1,
..., Pm, where m < n, to sort the relation. There are two steps involved in this operation:
1. Redistribute the tuples in the relation, using a range-partition strategy, so that all tuples that lie within the ith
range are sent to processor Pi, which stores the relation temporarily on disk Di. To implement range partitioning,
every processor in parallel reads the tuples from its disk and sends each tuple to its destination processor. Each
processor P0, P1, ..., Pm also receives the tuples belonging to its partition and stores them locally. This step requires disk
I/O and communication overhead.
2. Each of the processors sorts its partition of the relation locally, without interaction with the other
processors. Each processor executes the same operation, namely sorting, on a different dataset. (Execution of the
same operation in parallel on different sets of data is called data parallelism.) The final merge operation is trivial,
because the range partitioning in the first phase ensures that, for 1 ≤ i < j ≤ m, the key values in processor Pi are all
less than the key values in Pj.
We must do range partitioning with a good range-partition vector, so that each partition will have approximately the
same number of tuples. Virtual processor partitioning can also be used to reduce skew.
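To make the two phases concrete, here is a minimal Python sketch, assuming a hypothetical relation represented as a list of dicts with a "salary" key. In a real system the tuples are shipped across the interconnect and each partition is sorted on its own processor; here the partitions are simply in-memory lists.

from bisect import bisect_left

def range_partition(relation, range_vector, key="salary"):
    # Phase 1: route each tuple to the partition picked by the range vector.
    # bisect_left counts vector entries strictly below the key value, which
    # matches the "range 0 = v0 and less, range 1 = v0+1 to v1, ..." ranges.
    partitions = [[] for _ in range(len(range_vector) + 1)]
    for tup in relation:
        partitions[bisect_left(range_vector, tup[key])].append(tup)
    return partitions

def range_partition_sort(relation, range_vector, key="salary"):
    # Phase 2: each partition is sorted independently (data parallelism);
    # concatenating the sorted partitions gives the globally sorted relation.
    partitions = range_partition(relation, range_vector, key)
    return [sorted(p, key=lambda t: t[key]) for p in partitions]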
Ex:
Step 1:
First, we have to identify a range vector v on the Salary attribute. The range vector is of the form v[v0, v1, ..., vn-2].
For our example, let us assume the range vector v[14000, 24000].
This range vector represents 3 ranges: range 0 (14000 and less), range 1 (14001 to 24000), and range 2 (24001 and
more).
Redistribute the relations Employee0, Employee1, and Employee2 using this range vector into 3 disks temporarily.
After this distribution, disk 0 will have range 0 records (i.e., records with salary less than or equal to 14000), disk
1 will have range 1 records (i.e., records with salary greater than 14000 and less than or equal to 24000), and
disk 2 will have range 2 records (i.e., records with salary greater than 24000).
This redistribution according to range vector v is represented in Figure 2 as links from all the relations to all the disks.
Temp_Employee0, Temp_Employee1, and Temp_Employee2 are the relations after successful redistribution. These
tables are stored temporarily on disks D0, D1, and D2. (They can also be held in main memories M0, M1, and M2 if they
fit into RAM.)
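Continuing the sketch above with the example range vector v[14000, 24000] and a few made-up Employee tuples (the names and salaries below are purely illustrative, not taken from the figures), the redistribution into Temp_Employee0, Temp_Employee1, and Temp_Employee2 can be traced as follows:

v = [14000, 24000]
# Hypothetical sample tuples standing in for Employee0, Employee1, Employee2.
employee0 = [{"name": "A", "salary": 9000},  {"name": "B", "salary": 30000}]
employee1 = [{"name": "C", "salary": 14000}, {"name": "D", "salary": 20000}]
employee2 = [{"name": "E", "salary": 24001}]

# Every source relation is split by the same range vector, and the pieces
# belonging to range i are collected on disk i as Temp_Employee i.
pieces = [range_partition(rel, v) for rel in (employee0, employee1, employee2)]
temp_employee = [sum((p[i] for p in pieces), []) for i in range(3)]
print([t["salary"] for t in temp_employee[0]])  # [9000, 14000]   range 0
print([t["salary"] for t in temp_employee[1]])  # [20000]         range 1
print([t["salary"] for t in temp_employee[2]])  # [30000, 24001]  range 2 (not yet sorted)

Note that Temp_Employee2 is not yet in order; putting each temporary partition into sorted order is exactly the job of Step 2.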
Step 2:
Now we have temporary relations on all the disks after redistribution.
At this point, each processor individually sorts the data assigned to it in ascending order of Salary. The process
of performing the same operation in parallel on different sets of data is called data parallelism.
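As a hedged sketch of this data parallelism, the workers of a multiprocessing pool can stand in for processors P0, P1, and P2, each sorting its own temporary partition without talking to the others. The functions below are illustrative only, not part of any particular database system.

from multiprocessing import Pool

def sort_partition(partition):
    # Runs on one worker with no interaction with the other workers,
    # just as each processor sorts its own temporary partition.
    return sorted(partition, key=lambda t: t["salary"])

def parallel_local_sort(temp_partitions):
    # One worker per partition; call this from under an
    # "if __name__ == '__main__':" guard on platforms that spawn workers.
    with Pool(processes=len(temp_partitions)) as pool:
        return pool.map(sort_partition, temp_partitions)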
Final Result:
After the processors complete the sorting, we can simply collect the data from the different processors and merge
them. This merge is straightforward, as the data for every range is already sorted and the ranges themselves are ordered.
Hence, collecting the sorted records from partition 0, partition 1, and partition 2 and merging them gives us the final sorted output.
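Tying the example together (reusing the hypothetical temp_employee partitions from the earlier sketch, and a plain sorted call in place of the pool for brevity), the final step is just concatenation:

temp_sorted = [sorted(p, key=lambda t: t["salary"]) for p in temp_employee]  # Step 2
final = [t for part in temp_sorted for t in part]                            # final merge
print([t["salary"] for t in final])  # [9000, 14000, 20000, 24001, 30000]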
Parallel external sort-merge is an alternative to range partitioning. Suppose that
a relation has already been partitioned among disks D0, D1, ..., Dn-1 (it does not matter how the relation has been
partitioned). Parallel external sort-merge then works this way:
1. Each processor Pi locally sorts the data on its disk Di.
2. The system then merges the sorted runs on the different processors to get the final sorted output.
The merging of the sorted runs in step 2 can be parallelized by this sequence of actions:
1. The system range-partitions the sorted data at each processor Pi, all using the same partition vector, across the
processors P0, P1, ..., Pm. Each processor sends its tuples in sorted order, so every destination receives its tuples as
sorted streams.
2. Each receiving processor merges the streams as they arrive, producing a single sorted run for its range.
3. The system concatenates the sorted runs from processors P0, P1, ..., Pm to get the final result.
As described, this sequence of actions results in an interesting form of execution skew, since at first every processor
sends all blocks of partition 0 to P0, then every processor sends all blocks of partition 1 to P1, and so on. Thus, while
sending happens in parallel, receiving tuples becomes sequential: first only P0 receives tuples, then only P1 receives
tuples, and so on. To avoid this problem, each processor repeatedly sends a block of data to each partition. In other words,
each processor sends the first block of every partition, then the second block of every partition, and so on. As
a result, all processors receive data in parallel. Some machines, such as the Teradata Purpose-Built Platform Family
machines, use specialized hardware to perform merging. The BYNET interconnection network in the Teradata
machines can merge output from multiple processors to give a single sorted output.
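The difference between the skewed and the skew-avoiding send orders can be sketched as follows. Here blocks_by_partition is a hypothetical list in which blocks_by_partition[i] holds one sender's blocks destined for receiving processor Pi; the generators only yield (destination, block) pairs in the order they would be sent.

from itertools import zip_longest

def skewed_send_order(blocks_by_partition):
    # Naive order: every block for P0, then every block for P1, and so on,
    # so the receivers become active one after another instead of in parallel.
    for dest, blocks in enumerate(blocks_by_partition):
        for block in blocks:
            yield dest, block

def interleaved_send_order(blocks_by_partition):
    # Skew-avoiding order: the first block of every partition, then the
    # second block of every partition, and so on, so that all receiving
    # processors get data at the same time.
    for blocks_in_round in zip_longest(*blocks_by_partition):
        for dest, block in enumerate(blocks_in_round):
            if block is not None:
                yield dest, block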
Ex:
Assume that relation Employee is permanently partitioned using the round-robin technique across 3 disks D0, D1, and
D2, which are associated with processors P0, P1, and P2. At processors P0, P1, and P2, the relations are named
Employee0, Employee1, and Employee2 respectively.
Step 1:
Sort the data stored in every partition (on every disk) on the ordering attribute Salary. (The sorted data in every
partition is held temporarily.) At this stage, every Employeei contains salary values spanning the whole range, from minimum to maximum, but locally sorted.
Step 2:
Next, we have to identify a range vector v on the Salary attribute. The range vector is of the form v[v0, v1, ..., vn-2]. For
our example, let us assume the same range vector v[14000, 24000].
This range vector represents 3 ranges: range 0 (14000 and less), range 1 (14001 to 24000), and range 2 (24001 and
more).
Redistribute every sorted partition (Employee0, Employee1, and Employee2) into the 3 disks temporarily using this range
vector.
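A small sketch of this redistribution at one sending processor, assuming its partition is already sorted on Salary from Step 1: because the input is sorted, the slice for each range is itself a sorted stream for its destination processor. split_sorted_partition is an illustrative helper, not a standard API.

from bisect import bisect_right

def split_sorted_partition(sorted_partition, range_vector, key="salary"):
    # The slice up to the first salary above v0 is range 0, the next slice
    # up to the first salary above v1 is range 1, and the rest is range 2.
    salaries = [t[key] for t in sorted_partition]
    cuts = [0] + [bisect_right(salaries, v) for v in range_vector] + [len(salaries)]
    return [sorted_partition[cuts[i]:cuts[i + 1]] for i in range(len(cuts) - 1)]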
Step 3:
This redistribution is executed at all processors in parallel: processors P0, P1, and P2 each send the range 0 portion of
Employee0, Employee1, and Employee2 to disk 0. As it receives the records from the various partitions, the
receiving processor P0 merges the sorted streams. This is shown in Figure 4.
The same process takes place at all processors for the other ranges.
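At the receiving processor, the merge of the incoming sorted streams can be sketched with heapq.merge, which lazily interleaves already-sorted inputs into a single sorted run. For P0, the streams would be the range 0 slices produced by each sender in the previous sketch.

import heapq

def merge_sorted_streams(streams):
    # Each stream is one sender's slice for this range, already sorted on
    # Salary by Step 1; the result is a single sorted run for the partition.
    return list(heapq.merge(*streams, key=lambda t: t["salary"]))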
Step 4:
The final concatenation of sorted data from all the disks is trivial.
Conclusion: