Vectorization of Insertion Sort using Altivec,

and an Extended Merge Sort algorithm

© 2005 Konstantinos Margaritis, [email protected]

Licensed under the LGPL
11/07/2005

Abstract
Parallelizing a sort algorithm is not always easy, or even possible, and
when it is possible it does not necessarily bring significant gains in
performance. After reading on Slashdot about a GPU version of Quicksort
([gpu]), I was curious whether common sorting algorithms, such as qsort,
insertion sort and merge sort, could be vectorized and adapted for use with
Altivec, the PowerPC SIMD unit. The results were more than interesting, in
some cases offering speed gains of 54%. In this paper we present
vectorization techniques for these algorithms. Note: This paper does
not yet include the quicksort vectorization.

1 Vectorizing Insertion sort

The original (scalar) Insertion sort algorithm is at worst an O(N^2) algorithm.
While simple in concept, it is not considered one of the fast sorting routines,
so it is not very popular in code where performance matters. A very thorough
description of the algorithm can be found in [wika]. We provide here a simple
implementation against which comparisons have been made.
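#include <stddef.h>
#include <stdint.h>

void inssort_c(uint8_t a[], size_t N) {
    int i, j;
    for (i = 1; i < (int)N; i++) {
        // The key to insert into the already sorted prefix a[0..i-1]
        uint8_t value = a[i];
        // Shift larger elements one slot up; j is signed so it can reach -1
        for (j = i-1; j >= 0 && a[j] > value; j--) {
            a[j+1] = a[j];
        }
        a[j+1] = value;
    }
}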
Here, the algorithm sorts a set of N unsigned chars. The sort is stable,
meaning that the relative order of equal elements is preserved. The
implementation remains the same when dealing with ints or floats.
The basic concept of vectorization dictates the steps we must take to
vectorize this algorithm. We shall divide the set of N elements into 16
sets, as an Altivec register can hold 16 chars, which can be processed in
parallel. The situation is analogous for ints or floats, which are divided
into 4 sets (or 8 sets when dealing with short ints).

Figure 1: Sorting 16 sets in parallel using Altivec

The idea is to sort 4/8/16 sets in parallel and then merge the results. This
is actually a variant of the Shell Sort algorithm ([wikb]): sorting 16
interleaved sets amounts to a single Shell Sort pass with a gap of 16. How the
merging stage works is explained later. First we have to deal with the sorting
of 16 parallel columns of data. In theory this is easy, but we must make sure
we handle a whole vector at a time, in as few instructions as possible,
preferably in one go. To do that, we first have to load the data into
Altivec registers. We use two registers, as the comparison is made between
two sets, one of which serves as the key. This means that the Altivec
version works only for sizes of at least 32 chars (the size of 2 vectors).

vector unsigned char va_key, va_cur;
vector unsigned char *cur;
int i, j, loops;
if (length >= 32) {
    // How many 16-element vectors (N/16) do we have to deal with?
    loops = length/16;
    for (i = 1; i < loops; i++) {
        // Get the key set of elements
        // and load it into an Altivec vector register
        cur = (vector unsigned char *)(a + i*16);
        va_key = vec_ld(0, cur);
        // Compare all the previous sets to the key set.
        for (j = i-1; j >= 0; j--) {
            cur = (vector unsigned char *)(a + j*16);
            va_cur = vec_ld(0, cur);
            // Compare the sets
        }
    }
}
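Seen from scalar code, each of the 16 sets is simply every 16th element of the array. A minimal sketch of that view, assuming a hypothetical helper named gap_inssort that is not part of the original code, is an insertion sort that steps by a fixed gap, which is the same thing as one Shell Sort pass:

```c
#include <stddef.h>
#include <stdint.h>

/* Insertion-sort every `gap`-th element of a[0..n-1] independently.
   With gap = 16 this sorts the 16 interleaved sets, one element at a time.
   Hypothetical helper for illustration only. */
void gap_inssort(uint8_t a[], size_t n, size_t gap) {
    for (size_t i = gap; i < n; i++) {
        uint8_t value = a[i];
        size_t j = i;
        /* Shift larger elements of the same set up by one gap. */
        while (j >= gap && a[j - gap] > value) {
            a[j] = a[j - gap];
            j -= gap;
        }
        a[j] = value;
    }
}
```

Calling gap_inssort(a, n, 16) leaves every set sorted, but the array as a whole still needs a merge step, just as in the vectorized version.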

How do we compare 2 sets of 16 characters at once? Even trickier, how do we
swap elements between the sets, as the description and implementation of the
Insertion sort algorithm require? Altivec comes to the rescue with its
powerful comparison and selection instructions. Using vec_cmpgt, Altivec
gives us a comparison mask of the 2 vectors, which we can then use with
vec_sel to select individual elements of each vector and combine them into
new vectors.
Let’s explain this a little more:
• The algorithm compares the keys at positions i and j (the initial value of j
  is i - 1).
• While a[j] > a[i], a[j] is moved one position down to a[j + 1] and j is
  decreased, as long as j has not dropped below zero.
• When the first element smaller than or equal to a[i] is found, a[i] is
  stored in the slot right after it, a[j + 1].
To do this in parallel for 16 sets, we would have to do 16 comparisons at
once. And how would we handle 16 swaps that might happen at totally different
positions in each set? Instead of swapping elements, we create new vectors
that hold the appropriate values according to the comparison mask. So
the swapping code is equivalent to:

• Generate the comparison mask for elements a[i] and a[j].
• Generate the first result using the comparison mask and the two elements as
  operands (e.g. select a[i] if mask is 1, or a[j] if mask is 0).
• Generate the second result using the same comparison mask for the next
  position (i.e. select a[j] if mask is 1, or keep the current a[j + 1] if
  mask is 0).
• Store the first result in a[j] and the second result in a[j + 1].
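For a single pair of bytes, the same mask-and-select idea can be written in portable C. The sketch below is an illustration only: lane_cmpgt, lane_sel and lane_step are hypothetical scalar stand-ins that mimic what vec_cmpgt and vec_sel do for one lane (an all-ones mask where the comparison holds, then a bitwise pick between two operands).

```c
#include <stdint.h>

/* 0xFF if x > y, else 0x00 (mimics one lane of a greater-than mask). */
static inline uint8_t lane_cmpgt(uint8_t x, uint8_t y) {
    return (uint8_t)-(x > y);
}

/* Bitwise select: bits of y where mask is 1, bits of x where mask is 0. */
static inline uint8_t lane_sel(uint8_t x, uint8_t y, uint8_t mask) {
    return (uint8_t)((x & (uint8_t)~mask) | (y & mask));
}

/* One compare-and-place step on positions j and j+1 of a single set.
   No branch depends on the data. */
void lane_step(uint8_t *cur, uint8_t *next, uint8_t key) {
    uint8_t mask   = lane_cmpgt(*cur, key);
    uint8_t first  = lane_sel(*cur, key, mask);    /* goes to a[j]   */
    uint8_t second = lane_sel(*next, *cur, mask);  /* goes to a[j+1] */
    *cur = first;
    *next = second;
}
```

When the current element is greater than the key, the two values trade places; otherwise both slots are rewritten with the values they already held.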

Repeating this 16 times over a vector is exactly what Altivec does in one go.
What is the benefit? The code uses no data-dependent branches, so the
processor can keep its pipeline full, giving very good performance results.
It is also constant, meaning it takes exactly the same number of instructions
regardless of the data given. So the initial scalar loop can
be replaced by the following Altivec instructions (the following code sorts an
array of chars):

#include <altivec.h>
#include <stdint.h>

/* Assumes a is 16-byte aligned (vec_ld and vec_st ignore the low 4 bits
   of the address) and that length is a multiple of 16. */
int vecinssort_c(uint8_t a[], int length) {
    vector unsigned char va_key, va_cur, va_next, va_first, va_second;
    vector bool char va_cmpmask;
    vector unsigned char *cur;
    int i, j, loops;
    if (length >= 32) {
        // How many 16-element vectors (N/16) do we have to deal with?
        loops = length/16;
        for (i = 1; i < loops; i++) {
            // Get the key set of elements
            // and load it into an Altivec vector register
            cur = (vector unsigned char *)(a + i*16);
            va_key = vec_ld(0, cur);
            // Compare all the previous sets to the key set.
            for (j = i-1; j >= 0; j--) {
                cur = (vector unsigned char *)(a + j*16);
                /* Load current and next 16-byte sets
                   into Altivec registers */
                va_cur = vec_ld(0, cur);
                va_next = vec_ld(16, cur);

                /* Generate the comparison mask between
                   the current vector and the key vector */
                va_cmpmask = vec_cmpgt(va_cur, va_key);

                /* And construct the first and second
                   result vectors to replace the 2 16-byte sets */
                va_first = vec_sel(va_cur, va_key, va_cmpmask);
                va_second = vec_sel(va_next, va_cur, va_cmpmask);

                // Store the vectors to their appropriate
                // positions.
                vec_st(va_first, 0, cur);
                vec_st(va_second, 16, cur);
            }
        }
    }
    return 0;
}
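For readers without PowerPC hardware, the loop structure can be tried out with a portable emulation. The sketch below is a stand-in under stated assumptions, not the Altivec code itself: the hypothetical function emu_vecinssort replaces each vector instruction with a loop over 16 lanes, and uses a conditional where the original uses vec_cmpgt and vec_sel.

```c
#include <stdint.h>

#define LANES 16

/* Portable emulation of the vectorized loop. length is assumed to be a
   multiple of 16. Returns 0 on success, -1 if the input is shorter than
   two vectors. */
int emu_vecinssort(uint8_t a[], int length) {
    if (length < 2 * LANES) return -1;
    int loops = length / LANES;
    for (int i = 1; i < loops; i++) {
        uint8_t key[LANES];
        for (int k = 0; k < LANES; k++)        /* stands in for vec_ld */
            key[k] = a[i * LANES + k];
        for (int j = i - 1; j >= 0; j--) {
            uint8_t *cur = a + j * LANES;      /* row j   */
            uint8_t *next = cur + LANES;       /* row j+1 */
            for (int k = 0; k < LANES; k++) {
                int gt = cur[k] > key[k];      /* one lane of the mask */
                uint8_t first = gt ? key[k] : cur[k];
                uint8_t second = gt ? cur[k] : next[k];
                cur[k] = first;                /* stands in for vec_st */
                next[k] = second;
            }
        }
    }
    return 0;
}
```

After the call, elements k, k+16, k+32, ... are in ascending order for every k from 0 to 15, which is the state the merge stage expects.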

The result will be 16 sorted sets of N/16 elements. As the algorithm is still
basically an Insertion Sort, the time required is still O(N'^2), but this time
N' = N/16. So for example for N = 64 elements, the original algorithm would
take N^2 = 4096 steps to execute, while the Altivec version, with N' = 4,
would take just N'^2 = 16 steps.
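These figures are order-of-magnitude estimates. Counting loop iterations exactly is a small exercise: the scalar inner loop runs at most N(N-1)/2 times, while the vectorized inner loop, which has no early exit, always runs N'(N'-1)/2 times with N' = N/16. The helper names below are hypothetical and only restate that arithmetic.

```c
#include <stddef.h>

/* Worst-case inner-loop iterations of scalar insertion sort on n elements. */
size_t scalar_worst_steps(size_t n) {
    return n < 2 ? 0 : n * (n - 1) / 2;
}

/* Inner-loop iterations of the 16-lane version: for each i in 1..loops-1
   the inner loop runs exactly i times, where loops = n / 16. */
size_t vector_steps(size_t n) {
    size_t loops = n / 16;
    return loops < 2 ? 0 : loops * (loops - 1) / 2;
}
```

For N = 64 this gives 2016 scalar iterations in the worst case against 6 vector iterations, in line with the quadratic estimates.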
So is this over? No, because the result is still not what we want: we have to
merge the sorted sets into a final sorted set. For that reason we have to use
an extended version of Merge Sort.

2 Extending/Vectorizing Merge sort

Why extended? Well, the original Merge Sort algorithm copes with 2 sorted sets,
while we need to merge 4/8/16 sets of already sorted elements.
The process is quite generic (i.e. not Altivec specific), with a similar
concept to the 2-set case, but obviously extended to cover more sets (at first
we will cover the general case of N sets, optimizing afterwards for 4/8/16
sets).

Figure 2: Merging the 16 sets using extended Merge sort

Let’s begin by describing the algorithm, assuming we have N sets of sorted
data:

1. Initialize an indices[] array of integers, with size N. Set all elements to 0.
2. Initialize a flag[] array of booleans, with size N. Set all elements to 0.
3. Initialize the result[] array to store the final sorted data (which has size
N × columnsize).
4. Get the next N-tuple of data which we want to sort. The N-tuple is
comprised of one element from each set, each at the position indicated by the
indices[] array.
5. Find the minimum value(s) in the N-tuple and note its position(s), pos.
6. Insert the value(s) into the next available slot in the result[] array.
7. Increase indices[pos] by 1 (for all positions if more than one is found).
8. Set flag[pos] to 1.
9. Go back to 5 until all elements in flag[] are set to 1.
10. Go back to 4 while there are remaining sets.

This looks even easier if you see it in code:

#include <limits.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define INDEX(n,x,y) ((x)*(n) +(y))

int mergesort_c(uint8_t a[], size_t sets, size_t columnsize) {
    // Set up variables. flag holds one "finished" bit per set.
    uint32_t flag = 0, remsets = sets, length = sets*columnsize;
    size_t j, set = 0, index = 0;
    uint8_t min, cur;

    // Initialize and zero the counters[] array.
    size_t counters[sets];
    memset(counters, 0, sets*sizeof(size_t));
    // Allocate the memory for the final sorted array.
    uint8_t *target = malloc(length*sizeof(uint8_t));
    if (target == NULL) {
        printf("Error: mergesort_c(): malloc() "
               "failed to allocate %u bytes\n", (unsigned)length);
        exit(10);
    }

    // While we have remaining sets to sort, loop
    while (remsets) {
        // set up the initial min value to the max value,
        // so that it gets reset immediately.
        min = UCHAR_MAX;

        // Loop over all the sets
        for (j=0; j < sets; j++) {
            // Check if this particular set has been finished
            // (its bit is set) otherwise consider it when
            // searching for the min value.
            if (!((1u << j) & flag)) {
                cur = a[INDEX(sets, counters[j], j)];
                if (min >= cur) {
                    min = cur;
                    set = j;
                }
            }
        }
        // Insert the min value to the next available position
        // in the final array
        target[index++] = min;
        // Increase the counter for this particular set
        counters[set]++;
        // If this counter reaches the columnsize, then
        // we have finished the set. Set its bit in
        // the flag variable to 1 and decrease remsets.
        if (counters[set] >= columnsize) {
            flag |= (1u << set);
            remsets--;
        }
    }
    // Copy the target array to our given array and free target
    memcpy(a, target, length);
    free(target);

    return 0;
}

 Size   It. (Altivec)   It. (Scalar)   Ratio
    8            41             14      0.33
   16            86             59      0.75
   32           188            286      1.10
   64           440            917      1.33
  128          1136           4335      2.25
  256          3296          16019      2.67
  512         10688          66644      3.24
 1024         37760         262649      3.50
 2048        141056        1064615      3.73
 4096        544256        4237939      3.81
 8192       2137088       16521839      3.76

Figure 3: Insertion Sort: Altivec versus Scalar (ints)

You may have noticed that the code is scalar, with nothing Altivec specific.
It is possible to do some Altivec optimizations in this one, especially with
regard to finding the minimum value, as Altivec provides a function
(vec_min) that computes the element-wise minimum of two vectors in a single
instruction. This will follow in a future revision of this paper.
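As a rough illustration of how an element-wise minimum can yield the smallest of 16 values, the sketch below emulates the idea in portable C: the 16 lanes are repeatedly combined with a rotated copy of themselves, so that after 4 steps every lane holds the overall minimum. The helpers lanes_min and lanes_rotate are hypothetical stand-ins for vec_min and for a permute or shift instruction; this is one possible approach, not the implementation the paper promises.

```c
#include <stdint.h>
#include <string.h>

#define LANES 16

/* Hypothetical stand-in for an element-wise vector minimum. */
static void lanes_min(uint8_t out[LANES], const uint8_t x[LANES],
                      const uint8_t y[LANES]) {
    for (int k = 0; k < LANES; k++)
        out[k] = x[k] < y[k] ? x[k] : y[k];
}

/* Hypothetical stand-in for rotating the lanes by `shift` positions. */
static void lanes_rotate(uint8_t out[LANES], const uint8_t x[LANES],
                         int shift) {
    for (int k = 0; k < LANES; k++)
        out[k] = x[(k + shift) % LANES];
}

/* Minimum across all 16 lanes in log2(16) = 4 element-wise min steps. */
uint8_t horizontal_min(const uint8_t v[LANES]) {
    uint8_t acc[LANES], rot[LANES];
    memcpy(acc, v, LANES);
    for (int shift = LANES / 2; shift >= 1; shift /= 2) {
        lanes_rotate(rot, acc, shift);
        lanes_min(acc, acc, rot);
    }
    return acc[0];
}
```

Note that this returns only the value; the merge loop also needs to know which set the minimum came from, which would take additional work.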
Also worth mentioning: we have used a 32-bit integer, setting individual bits
to denote that a set has been completed. This was done for performance
reasons, but it also presents a limitation: we can’t merge more than 32 sets
of sorted data. That is quite acceptable for our purposes, as we need 16 sets
at most.
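The bit-per-set bookkeeping can be isolated into a few small helpers. This is a minimal sketch with hypothetical names (mark_done, is_done, all_done); the assertions make the 32-set limit explicit rather than leaving it as an unchecked assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_SETS 32  /* one bit per set in a uint32_t */

/* Return flag with the bit for set s turned on. */
static inline uint32_t mark_done(uint32_t flag, size_t s) {
    assert(s < MAX_SETS);
    return flag | (UINT32_C(1) << s);
}

/* Nonzero if set s has been marked as finished. */
static inline int is_done(uint32_t flag, size_t s) {
    assert(s < MAX_SETS);
    return (flag & (UINT32_C(1) << s)) != 0;
}

/* Nonzero once the lowest `sets` bits are all on. */
static inline int all_done(uint32_t flag, size_t sets) {
    assert(sets >= 1 && sets <= MAX_SETS);
    uint32_t full = (sets == MAX_SETS) ? UINT32_MAX
                                       : ((UINT32_C(1) << sets) - 1);
    return (flag & full) == full;
}
```

Testing one bit costs a shift and an AND, which is why a single integer is attractive here compared with an array of booleans.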
This Extended Merge Sort algorithm, like the original algorithm, is not an
in-place sorting algorithm: the data get sorted into another array and then
copied back to the original array. We have used memcpy() for the copying
process, as we believe it to be faster than other methods (e.g. per-element
copying), but there is of course no other particular reason for this choice.
In particular, since this function has already been vectorized for Altivec,
we might benefit from that as well.
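To make the expected data layout concrete, the following sketch merges a small array in which the element of set j at row r lives at index r*sets + j (the INDEX macro), so each sorted set occupies one column. The function merge_columns is a condensed, hypothetical restatement of the merge loop for illustration; it tracks finished sets by comparing a per-set counter against the row count instead of using bit flags, and it is not a replacement for mergesort_c.

```c
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Merge `sets` sorted columns of `rows` elements each, stored interleaved
   (set j, row r at a[r*sets + j]). Returns 0 on success, -1 on failure.
   Assumes sets <= 32, mirroring the limit of the bit-flag version. */
int merge_columns(uint8_t a[], size_t sets, size_t rows) {
    size_t length = sets * rows;
    size_t next[32] = {0};               /* next unread row of each set */
    if (sets == 0 || sets > 32 || rows == 0) return -1;
    uint8_t *out = malloc(length);
    if (out == NULL) return -1;

    for (size_t n = 0; n < length; n++) {
        unsigned min = UINT_MAX;         /* larger than any uint8_t */
        size_t best = 0;
        /* Scan the head of every set that still has elements left. */
        for (size_t j = 0; j < sets; j++) {
            if (next[j] < rows && a[next[j] * sets + j] < min) {
                min = a[next[j] * sets + j];
                best = j;
            }
        }
        out[n] = (uint8_t)min;
        next[best]++;
    }
    /* Not in place: copy the merged data back over the input. */
    memcpy(a, out, length);
    free(out);
    return 0;
}
```

With 4 sets of 3 rows, for example, the input { 1, 2, 0, 9, 4, 3, 5, 9, 7, 8, 6, 10 } holds the sorted columns (1, 4, 7), (2, 3, 8), (0, 5, 6) and (9, 9, 10), and the merge produces the fully sorted sequence 0 through 10 with 9 appearing twice.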

2.1 Performance Results

Here we present the results of using our Insertion sort on unsigned integers
and characters (i.e. 32-bit and 8-bit integers) for 100 loops. Figure 3
presents the case with 32-bit integers, while Figure 4 presents the same
results for 8-bit unsigned characters. As you can see, the Altivec version
gives really impressive results: the ratio column reaches 3.81 for ints (at
4096 elements) and 42.12 for chars (at 32768 elements). Of course this has to
do with the advanced Merge Sort and the fact that for large sizes, the chance
of finding the same keys increases. This is obvious for chars, as there are
only 256 available values for a char.

  Size   It. (Altivec)   It. (Scalar)   Ratio
   128          2048           4199      1.00
   256          4096          16429      2.00
   512          8192          67361      4.25
  1024         16384         259680      7.33
  2048         32768        1036143     13.79
  4096         65536        4116630     20.27
  8192        131072       16996544     29.85
 16384        262144       67815998     37.50
 32768        524288      265531367     42.12

Figure 4: Insertion Sort: Altivec versus Scalar (chars)

3 Conclusions
Altivec is a very powerful tool, but so far its use has mainly centered on
multimedia and/or linear algebra scientific applications. We strongly
believe that Altivec can be of real benefit to generic system-wide OS usage
as well. We intend to show, with a series of papers like this one, that it
can be used for other generic applications, like data manipulation, other sort
algorithms, etc.

References
[gpu] Gpusort.
[wika] Shell sort.
[wikb] Shell sort.
