Radix Sort: Problem Description

The document describes an implementation of a parallel radix sort algorithm to sort strings read from an input file and write the sorted strings to an output file. The algorithm uses multiple threads to independently sort buckets of strings at each level of recursion in a most significant digit radix sort, limiting the number of concurrent threads to the number of available processors. Linked lists are used to store strings at each level to avoid requiring large contiguous memory. Testing with up to 10 million randomly generated strings showed sorting times scaling from 0.139 seconds for 10,000 strings to 94.087 seconds for 10 million strings.

Uploaded by

x_jain

Radix Sort

Problem description

Given a set of unsorted items with keys that can be considered as a binary representation of an integer, the bits within the key can be used to sort the set of items. This method of sorting is known as Radix Sort.

Write a program that includes a threaded version of a Radix Sort algorithm that sorts the keys read from an input file, then output the sorted keys to another file. The input and output file names shall be the first and second arguments on the command line of the application execution.

The first line of the input text file is the total number of keys (N) to be sorted; this is followed by N keys, one per line, in the file. A key will be a seven-character string made up of printable characters not including the space character (ASCII 0x20). The number of keys within the file is less than 2^31 - 1. Sorted output must be stored in a text file, one key per line.

Timing: If you put timing code into your application to time the sorting process and report the elapsed time, this time will be used for scoring. If no timing code is added, the entire execution time (including time for input and output) will be used for scoring.

Example input file:

    8
    H@skell
    surVEYs
    sysTEMS
    HASKELL
    Surveys
    1234567
    SURveys
    systEMS

Example output file:

    1234567
    H@skell
    HASKELL
    SURveys
    Surveys
    surVEYs
    sysTEMS
    systEMS
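The input format described above could be read with something like the following sketch. It is only an illustration, not the code from the original program: the names readKeys and readKeysFromString are hypothetical, and the length check simply assumes the seven-character rule stated in the problem description.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: reads the key count N from the first line, then N keys.
// Keys never contain spaces, so whitespace-delimited extraction is sufficient.
std::vector<std::string> readKeys(std::istream& in) {
    std::size_t count = 0;
    if (!(in >> count)) {
        throw std::runtime_error("missing key count on the first line");
    }
    std::vector<std::string> keys;
    std::string key;
    for (std::size_t i = 0; i < count; ++i) {
        if (!(in >> key)) {
            throw std::runtime_error("fewer keys than announced");
        }
        if (key.size() != 7) {
            throw std::runtime_error("key is not seven characters long");
        }
        keys.push_back(key);
    }
    return keys;
}

// Convenience wrapper for trying the parser on an in-memory string.
std::vector<std::string> readKeysFromString(const std::string& text) {
    std::istringstream in(text);
    return readKeys(in);
}
```

In an actual program the stream would be a std::ifstream opened on the first command-line argument.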

Serial Algorithm

Radix sort is a sorting algorithm that sorts integers by processing individual digits. Because integers can represent strings of characters (e.g., names or dates) and specially formatted floating point numbers, radix sort is not limited to integers.

A most significant digit (MSD) radix sort can be used to sort keys in lexicographic order. Unlike a least significant digit (LSD) radix sort, an MSD radix sort does not necessarily preserve the original order of duplicate keys. An MSD radix sort processes the keys from the most significant (leftmost) digit to the least significant (rightmost) digit, which is the opposite of the order used by LSD radix sorts. An MSD radix sort stops rearranging the position of a key when the processing reaches a unique prefix of the key.

Some MSD radix sorts use one level of buckets in which to group the keys (compare counting sort and pigeonhole sort). Other MSD radix sorts use multiple levels of buckets, which form a trie or a path in a trie. A postman's sort (postal sort) is a kind of MSD radix sort.

Radix sort essentially uses a tree to sort keys, with each node branching out into R other nodes, where R is the radix used. A recursively subdividing MSD radix sort algorithm works as follows:

1. Take the most significant digit of each key.
2. Sort the list of elements based on that digit, grouping elements with the same digit into one bucket.
3. Recursively sort each bucket, starting with the next digit to the right.
4. Concatenate the buckets together in order.
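The four steps above can be sketched serially as follows. This is a minimal illustration under stated assumptions, not the author's implementation: it uses std::vector buckets rather than linked lists, assumes all keys have the same length and contain only printable ASCII characters 33 to 126, and the names msdRadixSort, kRadix and kFirstChar are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

constexpr std::size_t kRadix = 94;       // printable ASCII 33..126 (assumed alphabet)
constexpr unsigned char kFirstChar = 33;

// Sorts equal-length keys in place, using the character at `depth` as the digit.
void msdRadixSort(std::vector<std::string>& keys, std::size_t depth = 0) {
    if (keys.size() < 2) return;                   // nothing left to order
    const std::size_t length = keys.front().size();
    if (depth >= length) return;                   // digits exhausted: keys are duplicates

    // Steps 1 and 2: group keys by the digit at the current position.
    std::vector<std::vector<std::string>> buckets(kRadix);
    for (std::string& key : keys) {
        assert(key.size() == length);
        const unsigned char digit = static_cast<unsigned char>(key[depth]);
        assert(digit >= kFirstChar && digit < kFirstChar + kRadix);
        buckets[digit - kFirstChar].push_back(std::move(key));
    }

    // Steps 3 and 4: sort each bucket on the next digit, then concatenate in order.
    keys.clear();
    for (std::vector<std::string>& bucket : buckets) {
        msdRadixSort(bucket, depth + 1);
        for (std::string& key : bucket) {
            keys.push_back(std::move(key));
        }
    }
}
```

Calling msdRadixSort on the eight keys of the example input yields the order shown in the example output. Allocating 94 buckets at every level is wasteful for small buckets; a practical version would likely switch to a comparison sort below some size threshold.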

Considerations

1. There may be as many as 2^31 - 1 strings. At 8 bytes per line (seven characters plus a newline), such a file is roughly 16 GiB, which exceeds what 32-bit file offsets can address. 64-bit file offsets may therefore be required, which is accomplished by defining _FILE_OFFSET_BITS=64 at compilation time.
2. Theoretically, storing the entire tree for 7-byte strings would require 94^7 * 8 bytes, which is more than 480 TB of storage. This could cause problems unless memory is managed well.
3. To avoid excessive thread swapping, controlling the number of running threads might be a good idea. Hence, the number of running threads in this implementation is limited to the number of available processors, determined using /proc/cpuinfo.
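Consideration 3 can be illustrated with a small bounded worker pool. The sketch below is an assumption-laden stand-in rather than the original code: it uses std::thread instead of POSIX threads, queries std::thread::hardware_concurrency() instead of parsing /proc/cpuinfo, and the names runBounded and availableProcessors are hypothetical.

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical stand-in for the /proc/cpuinfo lookup described in the text.
unsigned availableProcessors() {
    const unsigned n = std::thread::hardware_concurrency();  // may return 0 if unknown
    return n == 0 ? 1u : n;
}

// Runs task(i) for every i in [0, taskCount) on at most maxThreads threads.
// Each worker repeatedly claims the next unprocessed index until none remain.
void runBounded(std::size_t taskCount, unsigned maxThreads,
                const std::function<void(std::size_t)>& task) {
    std::atomic<std::size_t> next{0};
    auto worker = [&]() {
        for (;;) {
            const std::size_t i = next.fetch_add(1);
            if (i >= taskCount) break;
            task(i);
        }
    };
    const std::size_t limit = std::max<std::size_t>(1, maxThreads);
    const std::size_t threadCount = std::min(limit, taskCount);
    std::vector<std::thread> pool;
    for (std::size_t t = 0; t < threadCount; ++t) {
        pool.emplace_back(worker);
    }
    for (std::thread& th : pool) {
        th.join();
    }
}
```

With 94 first-level buckets, a call such as runBounded(94, availableProcessors(), sortBucket) would keep every processor busy without creating more threads than processors; sortBucket here is a placeholder for whatever per-bucket routine is used.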

Parallelization

The recursive version of the most significant digit (MSD) radix sort algorithm has particular application to parallel computing, as each of the subdivisions can be sorted independently of the rest. The motivation here is to engage as many of the available processors as possible while managing memory usage effectively.

Implementation

This implementation uses POSIX threads for the sake of compatibility across systems.

1 Reading strings
1.1 The input data is stored in linked-lists. This way the requirement of a large chunk of contiguous memory is avoided and the nodes of the linked-list can be reused at the higher levels of the tree.
1.2 As many lists as the number of processors in the system (say P) are created and the input data is stored in these lists.

2 Sorting
2.1 The P linked-lists are processed in parallel. Strings in the linked-lists are assigned to 94 buckets corresponding to the printable characters (ASCII 33 to 126). Each bucket is a linked-list in itself. This has been done with the hope that processors will not be idle during the initial assignment to buckets.
2.2 A different strategy is used for the higher levels of the tree:
2.2.1 Buckets of the first level that contain more than one string are processed in parallel, P buckets at a time.
2.2.2 One thread fully processes an entire first level bucket using recursion at each level and then picks up the next available bucket for processing.
2.2.3 Once all the buckets at a certain level have been processed, the linked-lists are merged into a single list and assigned to the parent bucket.

3 Writing sorted strings
3.1 The sorted list is written to the specified output file.

Testing

The following code was used to generate test data:
#include <cstdlib>
#include <fstream>
#include <iostream>
using namespace std;

// Usage: <program> <number of strings> <filename>
int main(int argc, char **argv)
{
    if (argc == 3)
    {
        int nRec = atoi(argv[1]);
        cout << "Generating " << nRec << " strings" << endl;
        ofstream fout(argv[2]);
        if (fout.fail())
        {
            cerr << "ERROR: File could not be created!" << endl;
            return EXIT_FAILURE;
        }

        // Seed key: seven random printable characters (ASCII 33 to 126).
        char str[8];
        str[7] = '\0';
        srand(0);
        for (int j = 0; j < 7; j++)
        {
            str[j] = rand() % 94 + 33;
        }

        // First line of the file: the number of keys that follow.
        fout << nRec << endl;
        for (int i = 0; i < nRec; i++)
        {
            // Re-randomize the last 0 to 7 characters of the previous key.
            // Changing 0 characters writes the previous key again.
            int charsToChange = rand() % 8;
            for (int j = 0; j < charsToChange; j++)
            {
                str[6 - j] = rand() % 94 + 33;
            }
            fout << str << endl;
        }
        fout.close();
    }
    else
    {
        cout << "Syntax: " << argv[0] << " <number of strings> <filename>" << endl;
    }
    return EXIT_SUCCESS;
}

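The generated strings are fairly random. However, each string is derived from the previous one by re-randomizing between zero and seven of its trailing characters, which ensures that duplicate strings and strings with shared prefixes are also generated.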

Performance Expectations

1. This implementation is expected to perform well given a well-distributed set of input strings.
2. Performance is expected to be worst when all the input strings have the same first character, because all the keys then fall into a single first-level bucket that is processed by one thread.
3. Another advantage of this implementation is its low memory requirement. It managed to sort 10 million strings on a machine with 750 MB of RAM while using only 20.3% of the available memory (as reported by top). In comparison, the default qsort implementation uses 25.3% of the available memory to sort the same input file.

Observations

The application was tested with up to 10 million strings, generated using the test program described above, on a relatively low performance machine. The observations were as follows:

Number of strings    Radix sort timing
10,000               0.139 sec
100,000              1.142 sec
1,000,000            10.429 sec
10,000,000           94.087 sec
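The low memory requirement reported above is consistent with implementation steps 1.1, 2.1 and 2.2.3, where list nodes are moved between buckets rather than copied. A minimal sketch of that idea using std::list::splice is given below. It is an illustration only: the original uses its own linked lists, and the names KeyList, distribute and concatenate are hypothetical.

```cpp
#include <array>
#include <cstddef>
#include <list>
#include <string>

constexpr std::size_t kBuckets = 94;     // printable ASCII 33..126
constexpr unsigned char kFirst = 33;
using KeyList = std::list<std::string>;

// Moves every node of `keys` into the bucket selected by the character at `pos`.
// splice relinks existing nodes, so no key is copied and no node is allocated.
std::array<KeyList, kBuckets> distribute(KeyList& keys, std::size_t pos) {
    std::array<KeyList, kBuckets> buckets;
    while (!keys.empty()) {
        const unsigned char c = static_cast<unsigned char>(keys.front()[pos]);
        KeyList& target = buckets[c - kFirst];
        target.splice(target.end(), keys, keys.begin());
    }
    return buckets;
}

// Appends the buckets, in character order, to `out`, again by relinking nodes.
void concatenate(std::array<KeyList, kBuckets>& buckets, KeyList& out) {
    for (KeyList& bucket : buckets) {
        out.splice(out.end(), bucket);
    }
}
```

Because the same nodes travel down the tree and back up, peak memory stays close to the size of the input plus one array of empty bucket heads per active recursion level.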
