External Sorting: A Technical Paper

The document summarizes different techniques for external sorting of data that is too large to fit into memory. It discusses the distribution and merge phases of external sorting. In the distribution phase, strings of data are sorted internally and written to tape. In the merge phase, the strings are merged together to produce the fully sorted output. The document describes various external sorting algorithms like two-way merge sort, k-way balanced merge sort, cascade merge sort, and polyphase merge sort. It highlights that external sorting aims to minimize disk I/O by accessing the external data files as few times as possible during the sorting process.

Uploaded by

jamshed90
Copyright
© Attribution Non-Commercial (BY-NC)

A Technical Paper

On

EXTERNAL SORTING
Prepared By :

INTRODUCTION

One method for sorting a file is to load the file into memory, sort the data in memory, then write out the results. But when the file cannot be loaded into memory due to resource limitations, an external sort is applicable. External sorting refers to the sorting of a file that resides on disk or tape, so it is also referred to as tape sorting.

PHASES
Tape sorting methods, in general, require two phases:
Distribution phase
Merge phase

Distribution Phase

In the distribution phase, strings of convenient length are created using a suitable internal sorting technique. The strings are generated one at a time. Each string is distributed, according to some rule, to one of several tapes.
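The distribution phase described above can be sketched as follows. This is a minimal illustration, not the paper's own procedure: the names `make_runs` and `distribute` are hypothetical, Python lists stand in for tapes, and round-robin dealing is only one possible distribution rule (cascade and polyphase merges use unequal distributions).

```python
from itertools import islice

def make_runs(records, run_length):
    """Yield sorted strings (runs) of at most run_length records each."""
    it = iter(records)
    while True:
        chunk = list(islice(it, run_length))  # as many records as fit in memory
        if not chunk:
            return
        chunk.sort()                          # internal sort of one string
        yield chunk

def distribute(runs, num_tapes):
    """Deal the strings out to the tapes in round-robin order (assumed rule)."""
    tapes = [[] for _ in range(num_tapes)]
    for i, run in enumerate(runs):
        tapes[i % num_tapes].append(run)
    return tapes

tapes = distribute(make_runs([78, 45, 5, 88, 36, 9, 11, 2], run_length=2), num_tapes=2)
print(tapes)
```

Because `make_runs` is a generator, only one string is held in memory at a time, which matches the statement that strings are generated one at a time.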

Merge Phase

In the merge phase, these strings are merged to create longer strings, which are again merged. The process continues until the final merge generates a single string that is the required sorted file.

Merging of strings-blocks

During a merging process, the strings to be merged are called input strings and the generated merged string is called the output string. Tapes containing input and output strings are called input and output tapes. Merging requires that each string be treated as divided into a number of blocks of records. At the beginning of the merge, the leading block of each input string is read into memory. When a particular block is exhausted during merging, the next block of that string is read into the same storage area. Merged records are collected in an output buffer; when the buffer is full, its block of records is written onto the output tape.
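A rough sketch of this block-wise merging is given below. It assumes in-memory lists in place of tapes; `read_blocks`, `merge_strings`, and the `write_block` callback are hypothetical names chosen for illustration. The standard-library `heapq.merge` performs the actual merge of the sorted inputs.

```python
import heapq

def read_blocks(string, block_size):
    """Yield the records of one input string, fetching one block at a time."""
    for start in range(0, len(string), block_size):
        block = string[start:start + block_size]  # stands in for one tape read
        yield from block

def merge_strings(input_strings, block_size, write_block):
    """Merge sorted input strings, writing the output one block at a time."""
    readers = [read_blocks(s, block_size) for s in input_strings]
    buffer = []
    for record in heapq.merge(*readers):
        buffer.append(record)
        if len(buffer) == block_size:   # output buffer full: write it out
            write_block(buffer)
            buffer = []
    if buffer:                          # write the final, partly filled block
        write_block(buffer)

output_tape = []
merge_strings([[5, 45, 78, 88], [2, 9, 11, 36]], block_size=3,
              write_block=output_tape.append)
print(output_tape)
```

Only one block per input string and one output block need to be in memory at any moment, which is the point of dividing strings into blocks.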

Initial Strings and Dummy strings


It is desirable to make the initial strings as long as possible so that fewer passes will be required in the merge phase. The tape sorting techniques require that the total number of initial strings be equal to one of certain allowable numbers. The number of strings generated from the given file may not match an allowable number. In such cases, dummy strings are added to bring the total up to the next higher allowable number.
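As a small illustration of padding with dummy strings, the sketch below assumes that the allowable numbers are the powers of k, as is natural for a k-way balanced merge. Other techniques have different allowable numbers, and the function name `dummy_strings_needed` is hypothetical.

```python
def dummy_strings_needed(num_strings, k):
    """Return how many dummy strings pad num_strings up to the next power of k."""
    if num_strings < 1 or k < 2:
        raise ValueError("need num_strings >= 1 and k >= 2")
    allowable = 1
    while allowable < num_strings:   # find the next allowable number
        allowable *= k
    return allowable - num_strings

print(dummy_strings_needed(29, 2))
```

With 29 initial strings and a 2-way merge, the next allowable number under this assumption is 32, so 3 dummy strings are added.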

Characteristics of External Sorting

During the sort, some of the data must be stored externally. Typically the data will be stored on tape or disk. The cost of accessing data is significantly greater than either bookkeeping or comparison costs. There may be severe restrictions on access. For example, if tape is used, items must be accessed sequentially.

Sorting by Merging

Sorting by merging is referred to as merge sort. Merging is the process of combining two sorted lists into one sorted list.
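The merging of two sorted lists can be sketched as below. This is a minimal in-memory version with a hypothetical function name `merge`; on tape the same logic would read records sequentially rather than index into lists.

```python
def merge(left, right):
    """Combine two sorted lists into one sorted list."""
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:      # <= takes equal keys from left first (stable)
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])          # at most one of these two is non-empty
    merged.extend(right[j:])
    return merged

print(merge([5, 45, 78, 88], [2, 9, 11, 36]))
```

Both inputs are consumed strictly front to back, which is why merging suits sequential media.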

Advantages of Sort Merge


Accesses records sequentially.
Minimizes block accesses.
Gives a stable sort.

Types of External Sorting


Two-way Merge Sort
K-way balanced Merge Sort
Cascade Merge Sort
Polyphase Merge Sort

Two-way Merge Sort

The first pass creates strings of length 2 by comparing and, if necessary, exchanging pairs of items. As a result, the pass generates n/2 strings, each of length 2, where n is the number of items in the given file. The second pass merges pairs of strings generated in the previous pass. Successive passes are similar to the second pass: in each pass the number of strings is halved and the length of the strings is doubled.

EXAMPLE: Two-way merge sort

Original file:   78 45 5 88 36 9 11 2
First pass:      45 78 | 5 88 | 9 36 | 2 11
Second pass:     5 45 78 88 | 2 9 11 36
Third pass:      2 5 9 11 36 45 78 88
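The passes of the example above can be reproduced with a short sketch. It works on in-memory lists rather than tapes, uses the standard-library `heapq.merge` to merge each pair of strings, and the function name `two_way_merge_sort_passes` is hypothetical.

```python
from heapq import merge

def two_way_merge_sort_passes(items):
    """Return the list of strings produced by each pass of a two-way merge sort."""
    # First pass: strings of length 2 (a trailing single item forms its own string).
    strings = [sorted(items[i:i + 2]) for i in range(0, len(items), 2)]
    passes = [strings]
    while len(strings) > 1:
        # Later passes: merge adjacent pairs of strings.
        strings = [list(merge(*strings[i:i + 2]))
                   for i in range(0, len(strings), 2)]
        passes.append(strings)
    return passes

passes = two_way_merge_sort_passes([78, 45, 5, 88, 36, 9, 11, 2])
for number, strings in enumerate(passes, start=1):
    print(number, strings)
```

For the eight items of the example, three passes are produced, and the last one holds the single sorted string.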

Advantages of 2-way merge sort

The technique is easy to program, particularly when n is a power of 2. It is efficient for large n: about log2 n passes are needed, and each pass makes fewer than n comparisons.

K-way balanced Merge Sort


The 2-way balanced merge extends easily to the general k-way balanced merge. The number of tape units required is 2k. In each merge pass, the lengths of the strings increase by a factor of k and the number of strings is divided by k. Let $n = k^m$ be the number of strings generated in the first pass. The number of merge passes required is then $m = \log_k n$.

EXAMPLE
Balanced merge (2-way) with 32 strings and 4 tape units. An entry a(b) denotes a strings, each as long as b initial strings.

A        B        C        D        Description
0        0        16(1)    16(1)    Distribution pass
8(2)     8(2)     0        0        2-way merge
0        0        4(4)     4(4)     2-way merge
2(8)     2(8)     0        0        2-way merge
0        0        1(16)    1(16)    2-way merge
1(32)    0        0        0        2-way merge
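The pass count for a balanced merge can be checked with a small sketch. It assumes that when the number of strings is not an exact power of k, a pass leaves the count rounded up; the function name `merge_passes` is hypothetical.

```python
def merge_passes(num_strings, k):
    """Count the k-way merge passes needed to reduce num_strings to one string."""
    passes = 0
    while num_strings > 1:
        num_strings = -(-num_strings // k)  # ceiling division: strings left after a pass
        passes += 1
    return passes

print(merge_passes(32, 2))
print(merge_passes(32, 4))
```

For 32 strings, a 2-way merge takes 5 merge passes, as in the example, while a 4-way merge takes 3 passes at the cost of 8 tape units instead of 4.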

Cascade Merge Sort

When t tape units are available, balanced merge requires that t be even and that the order of merge be t/2. Thus, for a balanced merge of higher order, a large number of tape units are tied up. Cascade merge allows a (t-1)-way merge with only t tape units. This advantage comes from distributing the initial strings in unequal numbers to (t-1) tapes. Each pass commences with a (t-1)-way merge.

EXAMPLE: Cascade merge with 31 strings and 4 tape units

Strings of level
Level    Distribution
0        1,1,1
1        3,2,1
2        6,5,3
3        14,11,6
4        31,25,14

A        B        C        D        Comment
14(1)    11(1)    6(1)     0        Distribution pass
8(1)     5(1)     0        6(3)     3-way merge
3(1)     0        5(2)     6(3)     2-way merge
0        3(1)     5(2)     6(3)     Copy; end of first merging pass
3(6)     0        2(2)     3(3)     3-way merge
3(6)     2(5)     0        1(3)     2-way merge
3(6)     2(5)     1(3)     0        Copy; end of second merging pass
2(6)     1(5)     0        1(14)    3-way merge
1(6)     0        1(11)    1(14)    2-way merge
0        1(6)     1(11)    1(14)    Copy; end of third merging pass
1(31)    0        0        0        3-way merge
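The level distributions listed in the example can be generated by a simple recurrence: working backwards through one cascade pass, each new tape count is a prefix sum of the previous level's counts taken in decreasing order. The sketch below is an illustration under that assumption, with a hypothetical function name `cascade_levels`; it has been checked here only against the 4-tape table above.

```python
def cascade_levels(num_tapes, count):
    """Return the first `count` level distributions for a cascade merge."""
    inputs = num_tapes - 1           # tapes that receive initial strings
    level = [1] * inputs             # lowest level: one string on each input tape
    levels = [level]
    for _ in range(count - 1):
        # Largest new tape: sum of all old tapes; next: all but the smallest;
        # and so on, down to the smallest new tape, which equals the largest old one.
        level = [sum(level[:inputs - k]) for k in range(inputs)]
        levels.append(level)
    return levels

for number, level in enumerate(cascade_levels(4, 5)):
    print(number, level)
```

The last line printed is the distribution 31,25,14; the example itself starts from the level above it, 14,11,6, whose total is the 31 initial strings.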

Polyphase Merge sort

Polyphase merge is similar to cascade merge in that the strings are again distributed unequally to (t-1) tapes, where t is the total number of tape units available. The distribution rule is different from that of the cascade merge. In the merge phase, polyphase merge always uses (t-1)-way merging.

EXAMPLE:

Strings of level
Level    Distribution
1        1,1,1
2        2,2,1
3        4,3,2
4        7,6,4
5        13,11,7

Polyphase sort with 31 strings and 4 tape units:

A        B        C        D        Comments
13(1)    11(1)    7(1)     0        Distribution pass
6(1)     4(1)     0        7(3)     Merge pass 1, 3-way merging
2(1)     0        4(5)     3(3)     Merge pass 2, 3-way merging
0        2(9)     2(5)     1(3)     Merge pass 3, 3-way merging
1(17)    1(9)     1(5)     0        Merge pass 4, 3-way merging
0        0        0        1(31)    Merge pass 5, 3-way merging
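The polyphase example can be traced with a short simulation that tracks only the number and length of the strings on each tape, not the records themselves. It is a sketch under the assumption that the starting distribution is one of the level distributions, so that exactly one tape is empty before every pass; the function name `polyphase_trace` is hypothetical.

```python
def polyphase_trace(tapes):
    """Trace a polyphase merge.

    tapes: list of (string_count, string_length) pairs, one per tape,
    with exactly one empty tape. Returns the tape states after each pass.
    """
    tapes = list(tapes)
    trace = [list(tapes)]
    while sum(count for count, _ in tapes) > 1:
        empty = [i for i, (count, _) in enumerate(tapes) if count == 0]
        if len(empty) != 1:
            raise ValueError("not a valid polyphase distribution")
        out = empty[0]                                   # the empty tape receives output
        inputs = [i for i in range(len(tapes)) if i != out]
        merges = min(tapes[i][0] for i in inputs)        # stop when one input runs out
        merged_length = sum(tapes[i][1] for i in inputs)
        for i in inputs:
            count, length = tapes[i]
            remaining = count - merges
            tapes[i] = (remaining, length if remaining else 0)
        tapes[out] = (merges, merged_length)
        trace.append(list(tapes))
    return trace

trace = polyphase_trace([(13, 1), (11, 1), (7, 1), (0, 0)])
for state in trace:
    print(state)
```

Every pass merges all (t-1) non-empty tapes, and the run ends with a single string of length 31 on tape D, matching the table.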

Comparison between Cascade Merge and Polyphase Merge

When the number of tape units available is t, cascade merge begins each pass with a (t-1)-way merge. As the pass progresses, the order of the merge decreases, and the pass finally ends with a copy operation. Polyphase merge performs only (t-1)-way merges in every pass. This is a distinct advantage of polyphase merge over cascade merge, as it is desirable to have higher order merges whenever possible.

CONCLUSION
Very often the data set to be sorted is too large to fit into main memory, so internal sorting alone cannot be used.
External merge sort minimizes disk I/O cost by reading the external files as few times as possible.
The k-way balanced merge sort, cascade merge sort, and polyphase merge sort are all well suited to external sorting; they differ mainly in how the initial strings are distributed across the tapes.

THANK YOU!!!
