Kyle Lutz: A C++ Library For GPU Computing

Boost.Compute is a C++ library that provides STL algorithms and containers for GPU and parallel computing. It uses OpenCL to execute algorithms on devices like GPUs and Intel Xeon Phi. Boost.Compute aims to simplify porting C++ applications to parallel devices by providing a familiar STL-like interface. It also handles low-level OpenCL details and provides features like program caching, auto-tuning, and OpenGL interoperability. Boost.Compute was recently accepted as an official Boost library.

Boost.Compute
A C++ library for GPU computing

Kyle Lutz

STL for Parallel Devices

GPUs (NVIDIA, AMD, Intel)
Multi-core CPUs (Intel, AMD)
Accelerators (Xeon Phi, Adapteva Epiphany)
FPGAs (Altera, Xilinx)

Why Parallel?

The free lunch is over

Single-threaded execution is a small percentage of total compute power in a modern machine

Why STL?

Familiar to most C++ programmers (or should be)

Simplifies porting existing applications to parallel architectures

Design

Library Architecture
[Diagram, top to bottom: Boost.Compute (STL-like API, Lambda Expressions, Fancy Iterators, RNGs) built on the Boost.Compute Core layer; OpenCL; devices (GPU, CPU, FPGA)]

Why OpenCL?
(or why not CUDA/Thrust/Bolt/SYCL/OpenACC/OpenMP/C++ AMP?)

Standard C++ (no special compiler or compiler extensions)

Library-based solution (no special build-system integration)

Vendor-neutral, open-standard

Low-level API
Provides classes that wrap OpenCL objects such as buffer, context, program, and command_queue
Takes care of reference counting and error checking
Also provides utility functions for handling error codes and setting up the default device

Low-level API
#include <iostream>
#include <boost/compute/core.hpp>

int main()
{
    // look up the default compute device
    auto gpu = boost::compute::system::default_device();
    // create an OpenCL context for the device
    auto ctx = boost::compute::context(gpu);
    // create a command queue for the device
    auto queue = boost::compute::command_queue(ctx, gpu);
    // print the device name
    std::cout << "device = " << gpu.name() << std::endl;
    return 0;
}
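The wrapper classes above take care of OpenCL reference counting. The sketch below shows the general RAII pattern behind that in plain C++. It is an illustration under stated assumptions, not Boost.Compute's implementation: `fake_cl_object`, `retain()`, `release()` and `object_handle` are hypothetical names, and a counter stands in for the `clRetain*`/`clRelease*` calls a real wrapper would make on a handle such as `cl_mem`.

```cpp
#include <cassert>
#include <utility>

// Hypothetical stand-in for an OpenCL object. A real wrapper would hold a handle
// such as cl_mem and call clRetainMemObject()/clReleaseMemObject() instead.
struct fake_cl_object {
    int ref_count = 1;  // OpenCL objects are created with a reference count of 1
};

inline void retain(fake_cl_object* obj) { ++obj->ref_count; }

inline void release(fake_cl_object* obj) {
    assert(obj->ref_count > 0);  // releasing an already-dead object is a bug
    --obj->ref_count;
}

// RAII wrapper: copies share the object and retain it; destruction releases it.
class object_handle {
public:
    // Adopts the creation reference (no extra retain).
    explicit object_handle(fake_cl_object* obj) : obj_(obj) {}

    object_handle(const object_handle& other) : obj_(other.obj_) {
        if (obj_) retain(obj_);
    }

    object_handle(object_handle&& other) noexcept
        : obj_(std::exchange(other.obj_, nullptr)) {}

    // Copy-and-swap: `other` releases the previously held object when it is destroyed.
    object_handle& operator=(object_handle other) noexcept {
        std::swap(obj_, other.obj_);
        return *this;
    }

    ~object_handle() {
        if (obj_) release(obj_);
    }

    fake_cl_object* get() const { return obj_; }

private:
    fake_cl_object* obj_;
};
```

Because copying and destruction are tied to retain and release, user code never has to balance the counts by hand, which is the bookkeeping the library's classes hide.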

High-level API

Algorithms
accumulate()
adjacent_difference()
adjacent_find()
all_of()
any_of()
binary_search()
copy()
copy_if()
copy_n()
count()
count_if()
equal()
equal_range()
exclusive_scan()
fill()
fill_n()
find()
find_end()
find_if()
find_if_not()
for_each()
gather()
generate()
generate_n()
includes()
inclusive_scan()
inner_product()
inplace_merge()
iota()
is_partitioned()
is_permutation()
is_sorted()
lexicographical_compare()
lower_bound()
max_element()
merge()
min_element()
minmax_element()
mismatch()
next_permutation()
none_of()
nth_element()
partial_sum()
partition()
partition_copy()
partition_point()
prev_permutation()
random_shuffle()
reduce()
remove()
remove_if()
replace()
replace_copy()
reverse()
reverse_copy()
rotate()
rotate_copy()
scatter()
search()
search_n()
set_difference()
set_intersection()
set_symmetric_difference()
set_union()
sort()
sort_by_key()
stable_partition()
stable_sort()
swap_ranges()
transform()
transform_reduce()
unique()
unique_copy()
upper_bound()

Containers

array<T, N>
dynamic_bitset<T>
flat_map<Key, T>
flat_set<T>
stack<T>
string
valarray<T>
vector<T>

Iterators

buffer_iterator<T>
constant_buffer_iterator<T>
constant_iterator<T>
counting_iterator<T>
discard_iterator
function_input_iterator<Function>
permutation_iterator<Elem, Index>
transform_iterator<Iter, Function>
zip_iterator<IterTuple>

Random Number Generators

bernoulli_distribution
default_random_engine
discrete_distribution
linear_congruential_engine
mersenne_twister_engine
normal_distribution
uniform_int_distribution
uniform_real_distribution
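A linear congruential generator follows the recurrence `x_{n+1} = (a * x_n + c) mod m`. On a parallel device every work-item needs its own position in the sequence. One standard technique (not necessarily the one Boost.Compute uses) is jump-ahead: with `c = 0`, `x_{n+k} = a^k * x_n mod m`, so a work-item can start `k` steps ahead after a single modular exponentiation. The sketch assumes the MINSTD parameters used by `std::minstd_rand` (`a = 48271`, `m = 2^31 - 1`) so that it can be checked against the standard library; `lcg_next`, `pow_mod` and `lcg_skip` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// MINSTD parameters: multiplicative LCG (c = 0), same as std::minstd_rand.
constexpr std::uint64_t kA = 48271;
constexpr std::uint64_t kM = 2147483647;  // 2^31 - 1
static_assert(std::minstd_rand::multiplier == kA && std::minstd_rand::modulus == kM,
              "parameters should match std::minstd_rand");

// One step of the recurrence x_{n+1} = a * x_n mod m.
// Values stay below 2^31, so every product fits in 64 bits.
inline std::uint64_t lcg_next(std::uint64_t x) { return (kA * x) % kM; }

// Square-and-multiply: base^exp mod m in O(log exp) steps.
inline std::uint64_t pow_mod(std::uint64_t base, std::uint64_t exp, std::uint64_t m) {
    std::uint64_t result = 1;
    base %= m;
    while (exp > 0) {
        if (exp & 1) result = (result * base) % m;
        base = (base * base) % m;
        exp >>= 1;
    }
    return result;
}

// Jump ahead k steps at once: x_{n+k} = a^k * x_n mod m.
// Work-item i could call lcg_skip(seed, i * k) to get its own sub-stream.
inline std::uint64_t lcg_skip(std::uint64_t x, std::uint64_t k) {
    assert(x > 0 && x < kM);  // valid MINSTD states lie in [1, m - 1]
    return (pow_mod(kA, k, kM) * x) % kM;
}
```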

Sort Host Data

#include <vector>
#include <algorithm>
std::vector<int> vec = { ... };
std::sort(vec.begin(), vec.end());

Sort Host Data

#include <vector>
#include <boost/compute/algorithm/sort.hpp>
std::vector<int> vec = { ... };
boost::compute::sort(vec.begin(), vec.end(), queue);

[Chart: sorting host data, STL vs. Boost.Compute, at 1M, 10M and 100M elements; y-axis 0 to 8000, units not recoverable]

Parallel Reduction
#include <iostream>
#include <boost/compute/algorithm/reduce.hpp>
#include <boost/compute/container/vector.hpp>
boost::compute::vector<int> data = { ... };
int sum = 0;
boost::compute::reduce(
    data.begin(), data.end(), &sum, queue
);
std::cout << "sum = " << sum << std::endl;
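A parallel `reduce()` cannot add elements strictly one after another the way `std::accumulate` does. A common pattern is a tree reduction: in each pass, element `i` is combined with element `i + half`, and all combinations within a pass are independent, so on a GPU each could be done by its own work-item; `n` elements need about `log2(n)` passes. The sketch below is a sequential host-side simulation of that pattern, not Boost.Compute's kernel, and `tree_reduce` is a hypothetical name. It assumes an associative operation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sequential simulation of a tree (pairwise) reduction.
// Within one pass of the outer loop, every inner-loop iteration touches
// different elements, so a GPU could execute them all concurrently.
template <class T, class BinaryOp>
T tree_reduce(std::vector<T> data, T init, BinaryOp op) {
    if (data.empty()) return init;
    std::size_t n = data.size();
    while (n > 1) {
        std::size_t half = (n + 1) / 2;  // round up so an odd tail element survives
        for (std::size_t i = 0; i + half < n; ++i) {
            data[i] = op(data[i], data[i + half]);  // independent for each i
        }
        n = half;
    }
    return op(init, data[0]);
}
```

Because the additions are regrouped, the result matches `std::accumulate` for associative operations such as integer addition; for floating-point data the result can differ slightly in the last bits.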

Algorithm Internals
Fundamentally, the STL-like algorithms generate OpenCL kernel objects, which are then executed on a compute device.

[Figure: C++ algorithm call and the corresponding OpenCL kernel]
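To illustrate the idea, here is a hypothetical `make_transform_kernel` helper that turns a type name and an element-wise expression into OpenCL C source for a transform kernel. Boost.Compute's real code generator is more involved (it also deals with iterators, offsets, and compiling the source into a kernel object); this sketch only shows the string-generation step, and the kernel and parameter names are assumptions.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical code generator: builds OpenCL C source for an element-wise
// transform in which each work-item processes one element. `expr` is written
// in terms of a variable named `x`.
inline std::string make_transform_kernel(const std::string& type, const std::string& expr) {
    std::ostringstream src;
    src << "__kernel void transform_kernel(__global const " << type << "* input,\n"
        << "                               __global " << type << "* output,\n"
        << "                               const uint count)\n"
        << "{\n"
        << "    const uint i = get_global_id(0);\n"
        << "    if (i < count) {\n"
        << "        const " << type << " x = input[i];\n"
        << "        output[i] = " << expr << ";\n"
        << "    }\n"
        << "}\n";
    return src.str();
}
```

A string like this still has to be compiled by the OpenCL runtime (for example through `clCreateProgramWithSource` and `clBuildProgram`), which is the run-time cost that program caching addresses.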

Custom Functions
BOOST_COMPUTE_FUNCTION(int, plus_two, (int x),
{
return x + 2;
});
boost::compute::transform(
v.begin(), v.end(), v.begin(), plus_two, queue
);
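Because OpenCL kernels are compiled at run time from source text, a macro like `BOOST_COMPUTE_FUNCTION()` needs the function body as a string. The hypothetical `DEFINE_DEVICE_FUNCTION` below shows the underlying preprocessor trick, the `#` (stringize) operator: one invocation defines an ordinary host function and also records its source text. It is a simplified illustration, not the actual definition of `BOOST_COMPUTE_FUNCTION()`.

```cpp
#include <cassert>
#include <string>

// Hypothetical macro: defines a host-callable function AND a string holding its
// source, which a library could splice into generated OpenCL C code.
// Limitation: the body must not contain top-level commas (they would split the
// macro argument).
#define DEFINE_DEVICE_FUNCTION(ret, name, args, body) \
    inline ret name args body                         \
    const std::string name##_source = #ret " " #name #args " " #body;

DEFINE_DEVICE_FUNCTION(int, plus_two, (int x), { return x + 2; })
```

Here `plus_two(3)` runs on the host, while `plus_two_source` holds the text `int plus_two(int x) { return x + 2; }`, ready to be inserted into a kernel.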

Lambda Expressions
Offers a concise syntax for specifying custom operations
Fully type-checked by the C++ compiler

using boost::compute::lambda::_1;
boost::compute::transform(
v.begin(), v.end(), v.begin(), _1 + 2, queue
);
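Placeholders such as `_1` are typically implemented with expression templates: `_1 + 2` computes nothing, it builds a small typed expression object that can later be rendered as OpenCL C source. The sketch below is a minimal, hypothetical version; the `sketch` namespace and the `placeholder`, `constant`, `plus_expr`, `to_source` and `eval` names are invented for illustration, and Boost.Compute's lambda library is considerably more complete. Because `operator+` is an ordinary C++ function, an expression with no matching overload (for example `_1 + "abc"`) is rejected at compile time.

```cpp
#include <cassert>
#include <string>

namespace sketch {

// Placeholder for the first argument; rendered as the variable `x`.
struct placeholder {
    std::string to_source() const { return "x"; }
    int eval(int x) const { return x; }
};

// Integer literal appearing in an expression.
struct constant {
    int value;
    std::string to_source() const { return std::to_string(value); }
    int eval(int) const { return value; }
};

// Node for `lhs + rhs`: records the operands instead of computing a result.
template <class L, class R>
struct plus_expr {
    L lhs;
    R rhs;
    std::string to_source() const {
        return "(" + lhs.to_source() + " + " + rhs.to_source() + ")";
    }
    int eval(int x) const { return lhs.eval(x) + rhs.eval(x); }
};

// `_1 + 2` builds plus_expr<placeholder, constant>.
inline plus_expr<placeholder, constant> operator+(placeholder p, int c) {
    return {p, constant{c}};
}

// `(_1 + 2) + 3` nests expression nodes.
template <class L, class R>
plus_expr<plus_expr<L, R>, constant> operator+(plus_expr<L, R> e, int c) {
    return {e, constant{c}};
}

const placeholder _1{};

}  // namespace sketch
```

The same expression object serves two purposes: `to_source()` produces the text a kernel generator would embed, and `eval()` gives a host-side reference result.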

Closures
Similar to BOOST_COMPUTE_FUNCTION()
Additionally allow in-scope C++ variables to be captured
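A closure differs from a plain custom function because its generated code depends on a value that exists only on the host. One simple strategy, shown in this hypothetical sketch, is to copy the value when the closure is created and substitute it into the generated OpenCL C source as a literal. A library might instead pass captured values as kernel arguments so the program need not be rebuilt when they change; this sketch does not describe how `BOOST_COMPUTE_CLOSURE()` itself is implemented, and `add_y_closure` and `make_add_y` are invented names.

```cpp
#include <cassert>
#include <string>

// Hypothetical closure object: captures `y` by value from the enclosing scope.
struct add_y_closure {
    int y;  // captured host value

    // Host-side evaluation, useful for checking device results.
    int operator()(int x) const { return x + y; }

    // Generated device source with the captured value substituted as a literal.
    std::string source() const {
        return "int add_y(int x) { return x + " + std::to_string(y) + "; }";
    }
};

inline add_y_closure make_add_y(int y) { return add_y_closure{y}; }
```

Substituting a literal keeps the kernel signature simple, at the cost of a new program (and a new cache entry) for every distinct captured value.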

Iterator Adaptors
Augment the abilities of existing algorithms
Can improve performance by avoiding temporary buffers and extra passes over the data

boost::compute::vector<int> v = ...;
int abs_sum = boost::compute::accumulate(
make_transform_iterator(v.begin(), abs<int>()),
make_transform_iterator(v.end(), abs<int>()),
0,
queue
);
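Plain C++ shows why the adaptor helps: without it, summing absolute values means writing `abs(x)` into a temporary container and then reducing that container (two passes, and on a GPU two kernels plus a temporary buffer). A transform iterator applies the function as each element is read, so the reduction does everything in one pass. Below is a minimal, hypothetical `transform_iter` that works with `std::accumulate`; it is a host-side analogue of `boost::compute::transform_iterator`, not its implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <numeric>
#include <type_traits>
#include <utility>
#include <vector>

// Minimal input iterator that applies `f` to each element as it is read.
template <class Iter, class F>
class transform_iter {
public:
    using iterator_category = std::input_iterator_tag;
    using value_type = typename std::decay<decltype(
        std::declval<const F&>()(*std::declval<Iter&>()))>::type;
    using difference_type = std::ptrdiff_t;
    using pointer = void;
    using reference = value_type;

    transform_iter(Iter it, F f) : it_(it), f_(f) {}

    reference operator*() const { return f_(*it_); }  // transform on the fly
    transform_iter& operator++() { ++it_; return *this; }
    transform_iter operator++(int) { transform_iter tmp = *this; ++it_; return tmp; }
    bool operator==(const transform_iter& other) const { return it_ == other.it_; }
    bool operator!=(const transform_iter& other) const { return it_ != other.it_; }

private:
    Iter it_;
    F f_;
};

template <class Iter, class F>
transform_iter<Iter, F> make_transform_iter(Iter it, F f) {
    return transform_iter<Iter, F>(it, f);
}
```

Usage mirrors the slide: pass `make_transform_iter(v.begin(), abs_fn)` and `make_transform_iter(v.end(), abs_fn)` to `std::accumulate`, with the same function object for both ends so the iterator types match.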

Additional Features

OpenGL Interop

OpenCL provides mechanisms for synchronizing with OpenGL to implement direct rendering on the GPU
Boost.Compute provides easy-to-use functions for interacting with OpenGL in a portable manner

Program Caching
Helps mitigate run-time kernel compilation costs
Frequently used kernels are stored in, and retrieved from, a global cache
Offline cache reduces this to one compilation per system
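The core of an in-memory program cache is a map from (source text, build options) to an already-built program, so each distinct kernel is compiled at most once per process. In the hypothetical sketch below, `program_cache`, `compiled_program` and `get_or_build` are invented names, and a counter-incrementing `compile()` stands in for the real OpenCL build (for example `clBuildProgram`). An offline cache would additionally persist compiled binaries to disk under a similar key, which this sketch does not do.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Stand-in for a compiled OpenCL program object.
struct compiled_program {
    std::string source;
    std::string options;
};

class program_cache {
public:
    // Returns a cached program, compiling only on the first request for this key.
    std::shared_ptr<compiled_program> get_or_build(const std::string& source,
                                                   const std::string& options) {
        auto key = std::make_pair(source, options);
        auto it = cache_.find(key);
        if (it != cache_.end()) return it->second;  // hit: no compilation
        auto program = compile(source, options);    // miss: build once
        cache_.emplace(std::move(key), program);
        return program;
    }

    int compile_count() const { return compile_count_; }

private:
    // Hypothetical compile step; a real cache would invoke the OpenCL compiler here.
    std::shared_ptr<compiled_program> compile(const std::string& source,
                                              const std::string& options) {
        ++compile_count_;
        return std::make_shared<compiled_program>(compiled_program{source, options});
    }

    std::map<std::pair<std::string, std::string>, std::shared_ptr<compiled_program>> cache_;
    int compile_count_ = 0;
};
```

The build options are part of the key because the same source compiled with different options (for example `-cl-fast-relaxed-math`) can produce different code.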

Auto-tuning

OpenCL supports a wide variety of hardware with diverse execution characteristics
Algorithms support different execution parameters, such as the work-group size and the amount of work each work-item executes serially
These parameters are tunable and their effect on performance is measurable
Boost.Compute includes benchmarks and tuning utilities to find the optimal parameters for a given device
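At its simplest, tuning is a search: run the algorithm with each candidate parameter value, measure it, and keep the fastest. In this hypothetical sketch (`tune_parameter` is an invented name) the measurement is a caller-supplied function so the search can be tested deterministically; a real tuner would time kernel executions, for example with OpenCL profiling events, and average several runs. The candidate work-group sizes in the usage check are illustrative only.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <limits>
#include <vector>

// Returns the candidate with the lowest measured cost.
// `measure` should run the algorithm with the given parameter and return its
// elapsed time in any consistent unit.
inline std::size_t tune_parameter(const std::vector<std::size_t>& candidates,
                                  const std::function<double(std::size_t)>& measure) {
    assert(!candidates.empty());
    std::size_t best = candidates.front();
    double best_time = std::numeric_limits<double>::infinity();
    for (std::size_t candidate : candidates) {
        double t = measure(candidate);
        if (t < best_time) {
            best_time = t;
            best = candidate;
        }
    }
    return best;
}
```

An exhaustive search like this is practical because each algorithm exposes only a few parameters with a handful of sensible values; the result would then be stored per device.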

Coming soon to Boost

Went through Boost peer-review in December 2014

Accepted as an official Boost library in January 2015
Should be packaged in a Boost release this year (1.59)

Thank You
Source
https://github.com/boostorg/compute
Documentation
http://boostorg.github.io/compute
