Kyle Lutz: A C++ Library For GPU Computing

Boost.Compute is a C++ library that provides STL algorithms and containers for GPU and parallel computing. It uses OpenCL to execute algorithms on devices like GPUs and Intel Xeon Phi. Boost.Compute aims to simplify porting C++ applications to parallel devices by providing a familiar STL-like interface. It also handles low-level OpenCL details and provides features like program caching, auto-tuning, and OpenGL interoperability. Boost.Compute was recently accepted as an official Boost library.


Boost.Compute
A C++ library for GPU computing

Kyle Lutz

STL for Parallel Devices

GPUs (NVIDIA, AMD, Intel)
Multi-core CPUs (Intel, AMD)
Accelerators (Xeon Phi, Adapteva Epiphany)
FPGAs (Altera, Xilinx)

Why Parallel?

The free lunch is over

Single-threaded execution is a small percentage of total compute power in a modern machine

Why STL?

Familiar to most C++ programmers (or should be)

Simplifies porting existing applications to parallel architectures

Design

Library Architecture
Boost.Compute

RNGs

Lambda Expressions

STL-like API

Fancy Iterators
Core

OpenCL
GPU

CPU

FPGA

Why OpenCL?
(or why not CUDA/Thrust/Bolt/SYCL/OpenACC/OpenMP/C++AMP?)

Standard C++ (no special compiler or compiler extensions)

Library-based solution (no special build-system integration)

Vendor-neutral, open-standard

Low-level API

Low-level API
Provides classes to wrap OpenCL objects such as buffer, context,
program, and command_queue.
Takes care of reference counting and error checking
Also provides utility functions for handling error codes or setting up
the default device

Low-level API
#include <iostream>
#include <boost/compute/core.hpp>
// lookup default compute device
auto gpu = boost::compute::system::default_device();
// create OpenCL context for the device
auto ctx = boost::compute::context(gpu);
// create command queue for the device
auto queue = boost::compute::command_queue(ctx, gpu);
// print device name
std::cout << "device = " << gpu.name() << std::endl;

High-level API

Algorithms
accumulate()
adjacent_difference()
adjacent_find()
all_of()
any_of()
binary_search()
copy()
copy_if()
copy_n()
count()
count_if()
equal()
equal_range()
exclusive_scan()
fill()
fill_n()
find()
find_end()
find_if()
find_if_not()
for_each()

gather()
generate()
generate_n()
includes()
inclusive_scan()
inner_product()
inplace_merge()
iota()
is_partitioned()
is_permutation()
is_sorted()
lower_bound()
lexicographical_compare()
max_element()
merge()
min_element()
minmax_element()
mismatch()
next_permutation()
none_of()
nth_element()

partial_sum()
partition()
partition_copy()
partition_point()
prev_permutation()
random_shuffle()
reduce()
remove()
remove_if()
replace()
replace_copy()
reverse()
reverse_copy()
rotate()
rotate_copy()
scatter()
search()
search_n()
set_difference()
set_intersection()

set_symmetric_difference()
set_union()
sort()
sort_by_key()
stable_partition()
stable_sort()
swap_ranges()
transform()
transform_reduce()
unique()
unique_copy()
upper_bound()

Containers

array<T, N>
dynamic_bitset<T>
flat_map<Key, T>
flat_set<T>
stack<T>
string
valarray<T>
vector<T>

Iterators

buffer_iterator<T>
constant_buffer_iterator<T>
constant_iterator<T>
counting_iterator<T>
discard_iterator
function_input_iterator<Function>
permutation_iterator<Elem, Index>
transform_iterator<Iter, Function>
zip_iterator<IterTuple>

Random Number Generators


bernoulli_distribution
default_random_engine
discrete_distribution
linear_congruential_engine
mersenne_twister_engine
normal_distribution
uniform_int_distribution
uniform_real_distribution

Sort Host Data


#include <vector>
#include <algorithm>
std::vector<int> vec = { ... };
std::sort(vec.begin(), vec.end());

Sort Host Data


#include <vector>
#include <boost/compute/algorithm/sort.hpp>
std::vector<int> vec = { ... };
boost::compute::sort(vec.begin(), vec.end(), queue);

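[Figure: sort benchmark comparing STL and Boost.Compute at 1M, 10M, and 100M elements; vertical axis from 0 to 8000, axis label and units not recoverable]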

Parallel Reduction
#include <boost/compute/algorithm/reduce.hpp>
#include <boost/compute/container/vector.hpp>
boost::compute::vector<int> data = { ... };
int sum = 0;
boost::compute::reduce(
data.begin(), data.end(), &sum, queue
);
std::cout << "sum = " << sum << std::endl;
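
To give a feel for why a reduction parallelizes well, here is a host-only sketch of a pairwise (tree) reduction in standard C++. It is not Boost.Compute's implementation, and `tree_reduce` is a hypothetical name chosen for illustration. The point is that every addition within one pass is independent, so a parallel device could run a whole pass at once.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of Boost.Compute): sums the values with a
// pairwise (tree) reduction. All additions inside one pass are independent
// of each other, which is what a parallel device can exploit.
int tree_reduce(std::vector<int> values)
{
    if (values.empty()) {
        return 0;
    }
    // Each pass folds the upper half of the active range onto the lower half.
    for (std::size_t active = values.size(); active > 1; ) {
        const std::size_t half = (active + 1) / 2;  // round up for odd counts
        for (std::size_t i = 0; i + half < active; ++i) {
            values[i] += values[i + half];
        }
        active = half;
    }
    return values[0];
}
```

With n elements this takes about log2(n) passes, compared with n - 1 dependent steps for a serial loop.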

Algorithm Internals
Fundamentally, STL-like algorithms produce OpenCL kernel objects which
are executed on a compute device.

OpenCL
C++
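
As a rough illustration of the idea that a C++ call ends up as an OpenCL kernel, the sketch below builds OpenCL C source for an element-wise kernel from a type name and an expression. This is an assumption-laden toy: `make_transform_kernel_source` and the kernel name `transform_kernel` are hypothetical, and the source Boost.Compute actually generates will look different.

```cpp
#include <string>

// Hypothetical helper (not Boost.Compute's generator): builds OpenCL C source
// for an element-wise kernel. `type` is the element type (e.g. "int") and
// `expr` is an expression written in terms of `x` (e.g. "x + 2").
std::string make_transform_kernel_source(const std::string& type,
                                         const std::string& expr)
{
    return
        "__kernel void transform_kernel(__global const " + type + "* in,\n"
        "                               __global " + type + "* out)\n"
        "{\n"
        "    const size_t i = get_global_id(0);\n"
        "    const " + type + " x = in[i];\n"
        "    out[i] = " + expr + ";\n"
        "}\n";
}
```

A string like this would then be compiled at run time for the target device, which is why the compilation cost is paid at run time rather than at build time.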

Custom Functions
BOOST_COMPUTE_FUNCTION(int, plus_two, (int x),
{
return x + 2;
});
boost::compute::transform(
v.begin(), v.end(), v.begin(), plus_two, queue
);

Lambda Expressions
Offers a concise syntax for specifying custom operations
Fully type-checked by the C++ compiler

using boost::compute::lambda::_1;
boost::compute::transform(
v.begin(), v.end(), v.begin(), _1 + 2, queue
);

Closures
Similar to BOOST_COMPUTE_FUNCTION()
Additionally allow capturing of in-scope C++ variables

Iterator Adaptors
Augment abilities of existing algorithms
Leads to more performant code

boost::compute::vector<int> v = ...;
int abs_sum = boost::compute::accumulate(
make_transform_iterator(v.begin(), abs<int>()),
make_transform_iterator(v.end(), abs<int>()),
0,
queue
);
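
The performance benefit comes from fusing the transformation into the algorithm instead of writing an intermediate result first. The host-only sketch below shows the same pattern in standard C++. `transform_view_iterator`, `abs_value`, and `abs_sum` are hypothetical names for illustration, not Boost.Compute types.

```cpp
#include <iterator>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical minimal adaptor: applies `fn` to each element on dereference,
// so no temporary container of transformed values is ever created.
template<class Iter, class Fn>
class transform_view_iterator
{
public:
    using iterator_category = std::input_iterator_tag;
    using value_type = decltype(std::declval<Fn>()(*std::declval<Iter>()));
    using difference_type = typename std::iterator_traits<Iter>::difference_type;
    using pointer = const value_type*;
    using reference = value_type;

    transform_view_iterator(Iter it, Fn fn) : it_(it), fn_(fn) {}

    reference operator*() const { return fn_(*it_); }
    transform_view_iterator& operator++() { ++it_; return *this; }
    transform_view_iterator operator++(int) { auto tmp = *this; ++it_; return tmp; }
    bool operator==(const transform_view_iterator& o) const { return it_ == o.it_; }
    bool operator!=(const transform_view_iterator& o) const { return it_ != o.it_; }

private:
    Iter it_;
    Fn fn_;
};

// Function object returning the absolute value of an int.
struct abs_value
{
    int operator()(int x) const { return x < 0 ? -x : x; }
};

// Sums |x| over v in a single pass, without a temporary vector.
int abs_sum(const std::vector<int>& v)
{
    using It = transform_view_iterator<std::vector<int>::const_iterator, abs_value>;
    return std::accumulate(It(v.begin(), abs_value()), It(v.end(), abs_value()), 0);
}
```

Without the adaptor, the same result would need a transform into a second container followed by an accumulate, which means two passes over the data and an extra allocation.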

Additional Features

OpenGL Interop

OpenCL provides mechanisms for synchronizing with OpenGL to implement direct rendering on the GPU
Boost.Compute provides easy-to-use functions for interacting with OpenGL in a portable manner

OpenCL

OpenGL

Program Caching
Helps mitigate run-time kernel compilation costs
Frequently-used kernels are stored and retrieved from the global cache
Offline cache reduces this to one compilation per system
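
The in-memory part of this idea can be sketched as a map from (source, build options) to a compiled program, so that compilation happens only on the first request. Everything here is hypothetical: `compiled_program`, `program_cache`, and the injected compile callback are stand-ins for illustration, not Boost.Compute's classes, and the offline (on-disk) cache is not modelled.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for a compiled program object.
struct compiled_program
{
    std::string source;
    std::string options;
};

// Hypothetical in-memory cache keyed by (source, build options).
class program_cache
{
public:
    using compile_fn =
        std::function<compiled_program(const std::string&, const std::string&)>;

    explicit program_cache(compile_fn compile) : compile_(std::move(compile)) {}

    // Returns the cached program, invoking the compiler only on a cache miss.
    std::shared_ptr<compiled_program> get(const std::string& source,
                                          const std::string& options)
    {
        const auto key = std::make_pair(source, options);
        auto it = cache_.find(key);
        if (it == cache_.end()) {
            auto program =
                std::make_shared<compiled_program>(compile_(source, options));
            it = cache_.emplace(key, std::move(program)).first;
        }
        return it->second;
    }

private:
    compile_fn compile_;
    std::map<std::pair<std::string, std::string>,
             std::shared_ptr<compiled_program>> cache_;
};
```

Including the build options in the key matters because the same source compiled with different options yields a different program.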

Auto-tuning

OpenCL supports a wide variety of hardware with diverse execution characteristics
Algorithms support different execution parameters, such as the work-group size and the amount of work to execute serially
These parameters are tunable and their results are measurable
Boost.Compute includes benchmarks and tuning utilities to find the optimal parameters for a given device
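
The tuning loop described above can be sketched in a few lines of standard C++: try each candidate parameter value, measure it, and keep the cheapest. `pick_best` and its `measure` callback are hypothetical names for illustration and do not reflect how Boost.Compute's tuning utilities are written.

```cpp
#include <cstddef>
#include <functional>
#include <limits>
#include <vector>

// Hypothetical tuner: evaluates `measure` for every candidate value (for
// example a work-group size) and returns the candidate with the lowest cost.
// Returns 0 if there are no candidates.
std::size_t pick_best(const std::vector<std::size_t>& candidates,
                      const std::function<double(std::size_t)>& measure)
{
    std::size_t best = 0;
    double best_cost = std::numeric_limits<double>::infinity();
    for (std::size_t candidate : candidates) {
        const double cost = measure(candidate);
        if (cost < best_cost) {
            best_cost = cost;
            best = candidate;
        }
    }
    return best;
}
```

In practice `measure` would run the algorithm on the target device with the given parameter and return the elapsed time. Passing it in as a callback keeps the selection logic separate from the timing method.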

Recent News

Coming soon to Boost

Went through Boost peer-review in December 2014


Accepted as an official Boost library in January 2015
Should be packaged in a Boost release this year (1.59)

Thank You
Source
https://github.com/boostorg/compute
Documentation
http://boostorg.github.io/compute
