
Thrust

Thrust is a C++ template library that allows programming GPUs using an interface similar to the C++ Standard Template Library (STL) with minimal programming effort. It provides common containers, iterators, and algorithms for performing operations such as sorting, searching, and reducing data stored on the host or device. Thrust aims to simplify GPU programming by handling memory transfers between the host and device and providing a familiar STL-like programming model.

Uploaded by Hemant Gangwar
Copyright: Attribution Non-Commercial (BY-NC)

An introduction to Thrust

A CUDA library of parallel algorithms

Jose Nunez Gonzalez
Michel A. Rivero Corona
What is Thrust?

Thrust is a C++ template library for CUDA.
Thrust requires CUDA 3.0 or later and is included in CUDA 4.0 and later.
Thrust allows you to program GPUs with minimal programming effort using an interface similar to the C++ Standard Template Library (STL).
You just need to #include the appropriate header files in your .cu file and compile with nvcc.
What is Thrust?

Thrust is a cohesive collection of algorithms and data structures in a single package.
Thrust is self-contained and requires no additional libraries.
Thrust is open-source software.
Thrust has been tested extensively on Linux, Windows and Mac OS X systems.
Thrust components
Containers
Store your data
host_vector, device_vector (the STL also offers list, map, ...)

Algorithms
Frequently used algorithms
sort, find, binary_search, ...

Iterators
Access the data stored in containers
Vector containers
Thrust provides

thrust::host_vector (stored in host memory)
thrust::device_vector (stored in GPU device memory)

These vector data structures simplify memory management and the transfer of data between the host and device.
Basic_vector.cu
#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <iostream>

int main(void)
{
    // H has storage for 4 integers
    thrust::host_vector<int> H(4);

    // initialize individual elements
    H[0] = 14; H[1] = 20; H[2] = 38; H[3] = 46;

    // resize H (keeps the first two elements, 14 and 20)
    H.resize(2);

    // copy host_vector H to device_vector D
    thrust::device_vector<int> D = H;

    // elements of D can be modified
    D[0] = 99;
    D[1] = 88;

    // print contents of D
    for (size_t i = 0; i < D.size(); i++)
        std::cout << "D[" << i << "] = " << D[i] << std::endl;

    // H and D are automatically deleted when the function returns
    return 0;
}
std::vector member functions (also provided by thrust::host_vector and thrust::device_vector)
Iterators:
begin Return iterator to beginning
end Return iterator to end

Capacity:
size Return size
resize Change size

Modifiers:
assign Assign vector content
push_back Add element at the end
pop_back Delete last element
insert Insert elements
erase Erase elements
An important question
Can I create a thrust::device_vector from memory I've allocated myself?

Answer: No. Instead, wrap your externally allocated raw pointer with thrust::device_ptr and pass it to Thrust algorithms.
Wrap_pointer.cu
#include <thrust/device_ptr.h>
#include <thrust/fill.h>
#include <cuda_runtime.h>   // cudaMalloc and cudaFree (CUDA runtime API)

int main(void)
{
    size_t N = 10;

    // raw pointer to device memory
    int *raw_ptr;
    cudaMalloc((void **) &raw_ptr, N * sizeof(int));

    // wrap raw pointer with a device_ptr
    thrust::device_ptr<int> dev_ptr(raw_ptr);

    // use device_ptr in Thrust algorithms
    thrust::fill(dev_ptr, dev_ptr + N, (int) 0);

    // access device memory through device_ptr
    dev_ptr[0] = 1;

    // free memory
    cudaFree(raw_ptr);

    return 0;
}
Thrust algorithms
● Linear search: find, find_if, ...
● Subsequence matching: search, find_end, ... (STL algorithms; Thrust does not provide them)
● Counting elements: count, count_if
● for_each
● Comparing two ranges: equal, mismatch, ...
● Generalized numeric algorithms: inner_product, adjacent_difference, ...


Thrust algorithms
● Copying ranges: copy, copy_n, ...
● Swapping elements: swap, swap_ranges, ...
● Replacing elements: replace, replace_if, replace_copy, ...
● Permuting elements: reverse
● Others: sort, generate, random number generation (thrust/random.h)


Thrust (STL) algorithms
Approximately 60 standard algorithms
● Search
● Sort
● Transformations
● Numeric

Most functions take the form

Function(Iter_begin, Iter_end, ...)
sort.cu
#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <thrust/generate.h>
#include <thrust/sort.h>
#include <cstdlib>

int main(void)
{
    // n = 2^k = 1024 elements
    const int k = 10;
    const int n = 1 << k;

    // generate random data on the host
    thrust::host_vector<int> h_vec(n);
    thrust::generate(h_vec.begin(), h_vec.end(), rand);

    // transfer to device
    thrust::device_vector<int> d_vec = h_vec;

    // sort on device
    thrust::sort(d_vec.begin(), d_vec.end());

    return 0;
}
sum.cu
#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <thrust/generate.h>
#include <thrust/reduce.h>
#include <thrust/functional.h>
#include <cstdlib>
#include <iostream>

int main(void)
{
    // generate random data on the host
    thrust::host_vector<int> h_vec(100);
    thrust::generate(h_vec.begin(), h_vec.end(), rand);

    // transfer to device and compute the sum; rand() can return values up to
    // RAND_MAX, so accumulate in long long to avoid int overflow
    thrust::device_vector<int> d_vec = h_vec;
    long long x = thrust::reduce(d_vec.begin(), d_vec.end(), (long long) 0,
                                 thrust::plus<long long>());

    std::cout << "sum = " << x << std::endl;
    return 0;
}
Iterators
● Iterators provide a means of accessing data stored in container classes such as vectors.

● Iterators can be thought of as limited pointers.

● The Thrust algorithms discussed earlier take iterators as arguments.

● For instance, if you had a Thrust device vector storing integers, you could create an iterator for it as follows:

thrust::device_vector<int> d_vec;
thrust::device_vector<int>::iterator vecIterator;
Iterators
Thrust provides

constant_iterator

counting_iterator
constant_iterator.cu
#include <thrust/iterator/constant_iterator.h>
#include <thrust/transform.h>
#include <thrust/functional.h>
#include <thrust/device_vector.h>

// for printing
#include <thrust/copy.h>
#include <iterator>
#include <iostream>

int main(void)
{
    thrust::device_vector<int> data(4);
    data[0] = 3;
    data[1] = 7;
    data[2] = 2;
    data[3] = 5;

    // add 10 to all values in data
    thrust::transform(data.begin(), data.end(), thrust::constant_iterator<int>(10),
                      data.begin(), thrust::plus<int>());

    // data is now [13, 17, 12, 15]

    // print result
    thrust::copy(data.begin(), data.end(), std::ostream_iterator<int>(std::cout, "\n"));

    return 0;
}
Iterators
Thrust provides

transform_iterator

zip_iterator
Zip_iterator.cu
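#include <thrust/iterator/zip_iterator.h>
#include <thrust/tuple.h>
#include <thrust/reduce.h>
#include <thrust/functional.h>
#include <thrust/device_vector.h>
// for printing
#include <iostream>

int main(void)
{
    thrust::device_vector<int>  A(3);
    thrust::device_vector<char> B(3);
    A[0] = 10;  A[1] = 20;  A[2] = 30;
    B[0] = 'x'; B[1] = 'y'; B[2] = 'z';

    // zip iterator type over (A, B)
    typedef thrust::tuple<thrust::device_vector<int>::iterator,
                          thrust::device_vector<char>::iterator> IteratorTuple;
    typedef thrust::zip_iterator<IteratorTuple> ZipIterator;

    // create iterators over the zipped sequence
    ZipIterator first = thrust::make_zip_iterator(thrust::make_tuple(A.begin(), B.begin()));
    ZipIterator last  = thrust::make_zip_iterator(thrust::make_tuple(A.end(),   B.end()));

    // first[0] refers to tuple(10, 'x')
    // first[1] refers to tuple(20, 'y')
    // first[2] refers to tuple(30, 'z')

    // maximum of [first, last); tuples are compared lexicographically
    thrust::maximum< thrust::tuple<int, char> > binary_op;
    thrust::tuple<int, char> init = first[0];
    thrust::tuple<int, char> result = thrust::reduce(first, last, init, binary_op);

    // prints (30, z)
    std::cout << "(" << thrust::get<0>(result) << ", " << thrust::get<1>(result) << ")" << std::endl;
    return 0;
}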
Troubleshooting
Make sure that files that #include Thrust have a .cu extension.
Other extensions (e.g. .cpp) will cause nvcc to treat the file incorrectly and produce an error message.
Some C++ templates may not be supported.