SYCL 2020 API Reference Guide Page 1

SYCL™ (pronounced "sickle") uses generic programming to enable higher-level application software to be cleanly coded with optimized acceleration of kernel code across a range of devices. Developers program at a higher level than the native acceleration API, but always have access to lower-level code through seamless integration with the native acceleration API.

All definitions in this reference guide are in the sycl namespace.


[n.n] refers to sections in the SYCL 2020 (revision 2) specification at khronos.org/registry/sycl

Common interfaces

Common reference semantics [4.5.2]
T may be accessor, buffer, context, device, device_image, event, host_accessor, host_[un]sampled_image_accessor, kernel, kernel_id, kernel_bundle, local_accessor, platform, queue, [un]sampled_image, or [un]sampled_image_accessor.

T(const T &rhs);
T(T &&rhs);
T &operator=(const T &rhs);
T &operator=(T &&rhs);
~T();
friend bool operator==(const T &lhs, const T &rhs);
friend bool operator!=(const T &lhs, const T &rhs);

Common by-value semantics [4.5.3]
T may be id, range, item, nd_item, h_item, group, sub_group, or nd_range.

friend bool operator==(const T &lhs, const T &rhs);
friend bool operator!=(const T &lhs, const T &rhs);

Properties [4.5.4]
Each of the constructors in the following SYCL runtime classes has an optional parameter to provide a property_list containing zero or more properties: accessor, buffer, host_accessor, host_[un]sampled_image_accessor, context, local_accessor, queue, [un]sampled_image, [un]sampled_image_accessor, stream, and usm_allocator.

template <typename propertyT>
struct is_property;

template <typename propertyT>
inline constexpr bool is_property_v = is_property<propertyT>::value;

template <typename propertyT, typename syclObjectT>
struct is_property_of;

template <typename propertyT, typename syclObjectT>
inline constexpr bool is_property_of_v =
    is_property_of<propertyT, syclObjectT>::value;

class T {
  ...
  template <typename propertyT>
  bool has_property() const;
  template <typename propertyT>
  propertyT get_property() const;
  ...
};

class property_list {
 public:
  template <typename... propertyTN>
  property_list(propertyTN... props);
};
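A minimal sketch of the property interfaces above (assumes any conforming SYCL implementation and a default device): construct a queue with a property_list and query the property back with has_property.

#include <sycl/sycl.hpp>

int main() {
  // Attach a property at construction time via a property_list.
  sycl::queue q{sycl::property_list{sycl::property::queue::in_order{}}};
  if (q.has_property<sycl::property::queue::in_order>()) {
    // q executes its command groups in submission order
  }
}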
Anatomy of a SYCL application [3.2]
Below is an example of a typical SYCL application which schedules a job to run in parallel on any OpenCL accelerator. USM versions of this example are shown on page 15 of this reference guide.

#include <iostream>
#include <sycl/sycl.hpp>
using namespace sycl; // (optional) avoids need for "sycl::" before SYCL names

int main() {
  int data[1024]; // Allocates data to be worked on
  queue myQueue;  // Create default queue to enqueue work

  // By wrapping all the SYCL work in a {} block, we ensure all
  // SYCL tasks must complete before exiting the block,
  // because the destructor of resultBuf will wait.
  {
    // Wrap our data variable in a buffer.
    buffer<int, 1> resultBuf { data, range<1> { 1024 } };

    // Create a command group to issue commands to the queue.
    myQueue.submit([&](handler &cgh) {
      // Request access to the buffer without initialization
      accessor writeResult { resultBuf, cgh, write_only, no_init };

      // Enqueue a parallel_for task with 1024 work-items.
      cgh.parallel_for(1024, [=](auto idx) {
        // Initialize each buffer element with its own rank number starting at 0
        writeResult[idx] = idx;
      }); // End of the kernel function
    });   // End of the queue commands
  }       // End of scope, so wait for the queued work to complete

  // Print result
  for (int i = 0; i < 1024; i++) {
    std::cout << "data[" << i << "] = " << data[i] << std::endl;
  }
  return 0;
}

Notes on the example:
Header file: SYCL programs must include the <sycl/sycl.hpp> header file to provide all of the SYCL features used in this example.
Namespace: SYCL names are defined in the sycl namespace.
Queue: The queue declaration implicitly selects the best underlying device to execute on. See queue class functions [4.6.5] on page 2 of this reference guide.
Buffer: All data required in a kernel must be inside a buffer or image, or else USM is used. See buffer class functions [4.7.2] on page 3 of this reference guide.
Accessor: See accessor class functions in [4.7.6.x] on pages 4 and 5 of this reference guide.
Handler: See handler class functions [4.9.4] on page 9 of this reference guide.
Scopes: Kernel scope specifies a single kernel function compiled by a device compiler and executed on a device. Command group scope specifies a unit of work comprising a kernel function and accessors. Application scope specifies all other code outside of a command group scope.

Also see an example of how to write a reduction kernel on page 9 and examples of how to invoke kernels on page 16.
Device selection [4.6.1]
Device selection is done either by already having a specific instance of a device or by providing a device selector. The actual interface for a device selector is a callable taking a const device reference and returning a value implicitly convertible to an int. The system calls the function for each device, and the device with the highest value is selected. A sketch of a custom selector follows the Device class below.

Pre-defined SYCL device selectors
default_selector_v        Device selected by system heuristics
cpu_selector_v            Select a device according to device type info::device::device_type::cpu
gpu_selector_v            Select a device according to device type info::device::device_type::gpu
accelerator_selector_v    Select an accelerator device.

Platform class [4.6.2]
The platform class encapsulates a single platform on which kernel functions may be executed. A platform is associated with a single backend.

platform();
template <typename DeviceSelector>
explicit platform(const DeviceSelector &deviceSelector);
backend get_backend() const noexcept;
std::vector<device> get_devices(
    info::device_type = info::device_type::all) const;
template <typename param>
typename param::return_type get_info() const;
template <typename param>
typename param::return_type get_backend_info() const;
bool has(aspect asp) const;
static std::vector<platform> get_platforms();

Platform information descriptors
These queries all return std::string:
info::platform::version
info::platform::name
info::platform::vendor

Device class [4.6.4]
The device class encapsulates a single device on which kernels can be executed. All member functions of the device class are synchronous.

device();
template <typename DeviceSelector>
explicit device(const DeviceSelector &deviceSelector);
backend get_backend() const noexcept;
platform get_platform() const;
bool is_cpu() const;
bool is_gpu() const;
bool is_accelerator() const;
template <typename param>
typename param::return_type get_info() const;
template <typename param>
typename param::return_type get_backend_info() const;
bool has(aspect asp) const;
template <info::partition_property prop>
std::vector<device> create_sub_devices(size_t count) const;
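A minimal sketch of a custom device selector per Device selection [4.6.1] above: any callable taking const device& and returning int works; the highest-scoring device wins, and a negative score rejects a device. The scoring values here are arbitrary illustrations.

#include <sycl/sycl.hpp>

int prefer_gpu(const sycl::device &dev) {
  // Favor GPUs, accept CPUs as a fallback, reject everything else.
  return dev.is_gpu() ? 100 : (dev.is_cpu() ? 1 : -1);
}

int main() {
  sycl::queue q{prefer_gpu};  // also usable with platform and device
  // q now targets the highest-scoring device
}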

©2021 Khronos Group - Rev. 0721 www.khronos.org/sycl


SYCL 2020 API Reference Guide Page 2
Device class (cont.)
template <info::partition_property prop>
std::vector<device> create_sub_devices(
    const std::vector<size_t> &counts) const;
template <info::partition_property prop>
std::vector<device> create_sub_devices(info::affinity_domain domain) const;
static std::vector<device> get_devices(
    info::device_type deviceType = info::device_type::all);

Device queries using get_info()
The following descriptor names are in the info::device namespace.
Descriptor                                 Return type
device_type                                info::device_type
vendor_id                                  uint32_t
max_compute_units                          uint32_t
max_work_item_dimensions                   uint32_t
max_work_item_sizes<1>                     id<1>
max_work_item_sizes<2>                     id<2>
max_work_item_sizes<3>                     id<3>
max_work_group_size                        size_t
max_num_sub_groups                         uint32_t
sub_group_independent_forward_progress     bool
sub_group_sizes                            std::vector<size_t>
preferred_vector_width_[char|short|int|
  long|float|double|half]                  uint32_t
native_vector_width_[char|short|int|
  long|float|double|half]                  uint32_t
max_clock_frequency                        uint32_t
address_bits                               uint32_t
max_mem_alloc_size                         uint64_t
max_read_image_args                        uint32_t
max_write_image_args                       uint32_t
image2d_max_width                          size_t
image2d_max_height                         size_t
image3d_max_width                          size_t
image3d_max_height                         size_t
image3d_max_depth                          size_t
image_max_buffer_size                      size_t
max_samplers                               uint32_t
max_parameter_size                         size_t
mem_base_addr_align                        uint32_t
half_fp_config                             std::vector<info::fp_config>
single_fp_config                           std::vector<info::fp_config>
double_fp_config                           std::vector<info::fp_config>
global_mem_cache_type                      info::global_mem_cache_type
global_mem_cache_line_size                 uint32_t
global_mem_cache_size                      uint64_t
global_mem_size                            uint64_t
local_mem_type                             info::local_mem_type
local_mem_size                             uint64_t
error_correction_support                   bool
atomic_memory_order_capabilities           std::vector<memory_order>
atomic_fence_order_capabilities            std::vector<memory_order>
atomic_memory_scope_capabilities           std::vector<memory_scope>
atomic_fence_scope_capabilities            std::vector<memory_scope>
profiling_timer_resolution                 size_t
is_available                               bool
execution_capabilities                     std::vector<info::execution_capability>
built_in_kernel_ids                        std::vector<kernel_id>
built_in_kernels                           std::vector<std::string>
platform                                   platform
name                                       std::string
vendor                                     std::string
driver_version                             std::string
version                                    std::string
backend_version                            std::string
aspects                                    std::vector<aspect>
printf_buffer_size                         size_t
parent_device                              device
partition_max_sub_devices                  uint32_t
partition_properties                       std::vector<info::partition_property>
partition_affinity_domains                 std::vector<info::partition_affinity_domain>
partition_type_property                    info::partition_property
partition_type_affinity_domain             info::partition_affinity_domain

Device aspects [4.6.4.3]
Device aspects are defined in enum class aspect. The core enumerants are shown below. Specific backends may define additional aspects.
cpu, gpu, accelerator, custom, fp16, fp64, emulated, host_debuggable, atomic64, image, online_compiler, online_linker, queue_profiling, usm_device_allocations, usm_host_allocations, usm_atomic_host_allocations, usm_shared_allocations, usm_atomic_shared_allocations, usm_system_allocations

Context class [4.6.3]
The context class represents a context. A context represents the runtime data structures and state required by a backend API to interact with a group of devices associated with a platform.

explicit context(const property_list &propList = {});
explicit context(async_handler asyncHandler,
    const property_list &propList = {});
explicit context(const device &dev, const property_list &propList = {});
explicit context(const device &dev, async_handler asyncHandler,
    const property_list &propList = {});
explicit context(const std::vector<device> &deviceList,
    const property_list &propList = {});
explicit context(const std::vector<device> &deviceList,
    async_handler asyncHandler, const property_list &propList = {});
backend get_backend() const noexcept;
platform get_platform() const;
std::vector<device> get_devices() const;
template <typename param>
typename param::return_type get_info() const;
template <typename param>
typename param::return_type get_backend_info() const;

Context queries using get_info()
The following descriptor names are in the info::context namespace.
Descriptor                           Return type
platform                             platform
devices                              std::vector<device>
atomic_memory_order_capabilities     std::vector<memory_order>
atomic_fence_order_capabilities      std::vector<memory_order>
atomic_memory_scope_capabilities     std::vector<memory_scope>
atomic_fence_scope_capabilities      std::vector<memory_scope>
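A small sketch of device queries using the descriptors tabulated above (assumes a default device exists):

#include <iostream>
#include <sycl/sycl.hpp>

int main() {
  sycl::device dev;  // default device
  std::cout << dev.get_info<sycl::info::device::name>() << "\n";
  std::cout << dev.get_info<sycl::info::device::max_compute_units>() << "\n";
  if (dev.has(sycl::aspect::usm_device_allocations)) {
    // this device supports USM device allocations
  }
}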

Queue class [4.6.5]
The queue class encapsulates a single queue which schedules kernels on a device. A queue can be used to submit command groups to be executed by the runtime using the submit member function. Note that the destructor does not block.

explicit queue(const property_list &propList = {});
explicit queue(const async_handler &asyncHandler,
    const property_list &propList = {});
template <typename DeviceSelector>
explicit queue(const DeviceSelector &deviceSelector,
    const property_list &propList = {});
template <typename DeviceSelector>
explicit queue(const DeviceSelector &deviceSelector,
    const async_handler &asyncHandler, const property_list &propList = {});
explicit queue(const device &syclDevice, const property_list &propList = {});
explicit queue(const device &syclDevice, const async_handler &asyncHandler,
    const property_list &propList = {});
template <typename DeviceSelector>
explicit queue(const context &syclContext,
    const DeviceSelector &deviceSelector,
    const property_list &propList = {});
template <typename DeviceSelector>
explicit queue(const context &syclContext,
    const DeviceSelector &deviceSelector, const async_handler &asyncHandler,
    const property_list &propList = {});
explicit queue(const context &syclContext, const device &syclDevice,
    const property_list &propList = {});
explicit queue(const context &syclContext, const device &syclDevice,
    const async_handler &asyncHandler, const property_list &propList = {});
backend get_backend() const noexcept;
context get_context() const;
device get_device() const;
bool is_in_order() const;
template <typename param>
typename param::return_type get_info() const;
template <typename param>
typename param::return_type get_backend_info() const;
template <typename T> event submit(T cgf);
template <typename T>
event submit(T cgf, const queue &secondaryQueue);
void wait();
void wait_and_throw();
void throw_asynchronous();

Queue queries using get_info()
Descriptor              Return type
info::queue::context    context
info::queue::device     device

Convenience shortcuts
template <typename KernelName, typename KernelType>
event single_task(const KernelType &kernelFunc);
template <typename KernelName, typename KernelType>
event single_task(event depEvent, const KernelType &kernelFunc);
template <typename KernelName, typename KernelType>
event single_task(const std::vector<event> &depEvents,
    const KernelType &kernelFunc);
template <typename KernelName, int Dims, typename... Rest>
event parallel_for(range<Dims> numWorkItems, Rest&&... rest);
template <typename KernelName, int Dims, typename... Rest>
event parallel_for(range<Dims> numWorkItems, event depEvent,
    Rest&&... rest);
template <typename KernelName, int Dims, typename... Rest>
event parallel_for(range<Dims> numWorkItems,
    const std::vector<event> &depEvents, Rest&&... rest);
template <typename KernelName, int Dims, typename... Rest>
event parallel_for(nd_range<Dims> executionRange, Rest&&... rest);
template <typename KernelName, int Dims, typename... Rest>
event parallel_for(nd_range<Dims> executionRange, event depEvent,
    Rest&&... rest);
template <typename KernelName, int Dims, typename... Rest>
event parallel_for(nd_range<Dims> executionRange,
    const std::vector<event> &depEvents, Rest&&... rest);
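A sketch of the convenience shortcuts: single_task and parallel_for submitted directly on the queue and chained by events (assumes a device supporting USM shared allocations):

#include <sycl/sycl.hpp>

int main() {
  sycl::queue q;
  int *p = sycl::malloc_shared<int>(1024, q);
  // First task writes one element; the parallel_for depends on it.
  sycl::event e = q.single_task([=] { p[0] = 1; });
  q.parallel_for(sycl::range<1>{1024}, e,
                 [=](sycl::id<1> i) { p[i] += int(i[0]); })
      .wait();
  sycl::free(p, q);
}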



SYCL 2020 API Reference Guide Page 3
Queue class (cont.)

USM functions
event memcpy(void* dest, const void* src, size_t numBytes);
event memcpy(void* dest, const void* src, size_t numBytes, event depEvent);
event memcpy(void* dest, const void* src, size_t numBytes,
    const std::vector<event> &depEvents);
template <typename T>
event copy(const T *src, T *dest, size_t count);
template <typename T>
event copy(const T *src, T *dest, size_t count, event depEvent);
template <typename T>
event copy(const T *src, T *dest, size_t count,
    const std::vector<event> &depEvents);
event memset(void* ptr, int value, size_t numBytes);
event memset(void* ptr, int value, size_t numBytes, event depEvent);
event memset(void* ptr, int value, size_t numBytes,
    const std::vector<event> &depEvents);
template <typename T>
event fill(void* ptr, const T& pattern, size_t count);
template <typename T>
event fill(void* ptr, const T& pattern, size_t count, event depEvent);
template <typename T>
event fill(void* ptr, const T& pattern, size_t count,
    const std::vector<event> &depEvents);
event prefetch(void* ptr, size_t numBytes);
event prefetch(void* ptr, size_t numBytes, event depEvent);
event prefetch(void* ptr, size_t numBytes,
    const std::vector<event> &depEvents);
event mem_advise(void *ptr, size_t numBytes, int advice);
event mem_advise(void *ptr, size_t numBytes, int advice, event depEvent);
event mem_advise(void *ptr, size_t numBytes, int advice,
    const std::vector<event> &depEvents);

Explicit copy functions
template <typename T_src, int dim_src, access_mode mode_src,
    target tgt_src, access::placeholder isPlaceholder, typename T_dest>
event copy(accessor<T_src, dim_src, mode_src, tgt_src, isPlaceholder> src,
    std::shared_ptr<T_dest> dest);
template <typename T_src, typename T_dest, int dim_dest,
    access_mode mode_dest, target tgt_dest,
    access::placeholder isPlaceholder>
event copy(std::shared_ptr<T_src> src,
    accessor<T_dest, dim_dest, mode_dest, tgt_dest, isPlaceholder> dest);
template <typename T_src, int dim_src, access_mode mode_src,
    target tgt_src, access::placeholder isPlaceholder, typename T_dest>
event copy(accessor<T_src, dim_src, mode_src, tgt_src, isPlaceholder> src,
    T_dest *dest);
template <typename T_src, typename T_dest, int dim_dest,
    access_mode mode_dest, target tgt_dest,
    access::placeholder isPlaceholder>
event copy(const T_src *src,
    accessor<T_dest, dim_dest, mode_dest, tgt_dest, isPlaceholder> dest);
template <typename T_src, int dim_src, access_mode mode_src,
    target tgt_src, access::placeholder isPlaceholder_src,
    typename T_dest, int dim_dest, access_mode mode_dest,
    target tgt_dest, access::placeholder isPlaceholder_dest>
event copy(accessor<T_src, dim_src, mode_src, tgt_src,
    isPlaceholder_src> src, accessor<T_dest, dim_dest, mode_dest,
    tgt_dest, isPlaceholder_dest> dest);
template <typename T, int dim, access_mode mode, target tgt,
    access::placeholder isPlaceholder>
event update_host(accessor<T, dim, mode, tgt, isPlaceholder> acc);
template <typename T, int dim, access_mode mode, target tgt,
    access::placeholder isPlaceholder>
event fill(accessor<T, dim, mode, tgt, isPlaceholder> dest, const T &src);

Queue property class constructors:
property::queue::enable_profiling::enable_profiling();
property::queue::in_order::in_order();
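A sketch of the queue-level USM shortcuts above: allocate device memory, fill it, copy it back to the host with an event-ordered dependency, and free it (assumes a device with USM device allocations):

#include <sycl/sycl.hpp>
#include <vector>

int main() {
  sycl::queue q;
  constexpr size_t n = 1024;
  int *dev = sycl::malloc_device<int>(n, q);
  std::vector<int> host(n);
  sycl::event e = q.fill(dev, 42, n);                     // write pattern
  q.memcpy(host.data(), dev, n * sizeof(int), e).wait();  // ordered copy
  sycl::free(dev, q);
}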

Event class [4.6.6]
An event is an object that represents the status of an operation that is being executed by the runtime.

event();
backend get_backend() const noexcept;
std::vector<event> get_wait_list();
void wait();
static void wait(const std::vector<event> &eventList);
void wait_and_throw();
static void wait_and_throw(const std::vector<event> &eventList);
template <typename param>
typename param::return_type get_info() const;
template <typename param>
typename param::return_type get_backend_info() const;
template <typename param>
typename param::return_type get_profiling_info() const;

Event queries using get_info()
Descriptor                               Return type
info::event::command_execution_status    info::event::command_status

Queries using get_profiling_info()
Descriptor                               Return type
info::event_profiling::command_submit    uint64_t
info::event_profiling::command_start     uint64_t
info::event_profiling::command_end       uint64_t

Host allocation [4.7.1]
The default allocator for memory objects is implementation defined, but users can supply their own allocator class, e.g.:
buffer<int, 1, UserDefinedAllocator<int>> b(d);
The default allocators are buffer_allocator for buffers and image_allocator for images.

Buffer class [4.7.2]
The buffer class defines a shared array of one, two, or three dimensions that can be used by the kernel and has to be accessed using accessor classes. Note that the destructor does block.

Class declaration
template <typename T, int dimensions = 1,
    typename AllocatorT = buffer_allocator<std::remove_const_t<T>>>
class buffer;

Member functions
buffer(const range<dimensions> &bufferRange,
    const property_list &propList = {});
buffer(const range<dimensions> &bufferRange, AllocatorT allocator,
    const property_list &propList = {});
buffer(T *hostData, const range<dimensions> &bufferRange,
    const property_list &propList = {});
buffer(T *hostData, const range<dimensions> &bufferRange,
    AllocatorT allocator, const property_list &propList = {});
buffer(const T *hostData, const range<dimensions> &bufferRange,
    const property_list &propList = {});
buffer(const T *hostData, const range<dimensions> &bufferRange,
    AllocatorT allocator, const property_list &propList = {});

Available if dimensions == 1 and std::data(container) is convertible to T*
template <typename Container>
buffer(Container &container, AllocatorT allocator,
    const property_list &propList = {});
template <typename Container>
buffer(Container &container, const property_list &propList = {});

buffer(const std::shared_ptr<T> &hostData,
    const range<dimensions> &bufferRange, AllocatorT allocator,
    const property_list &propList = {});
buffer(const std::shared_ptr<T> &hostData,
    const range<dimensions> &bufferRange,
    const property_list &propList = {});
buffer(const std::shared_ptr<T[ ]> &hostData,
    const range<dimensions> &bufferRange, AllocatorT allocator,
    const property_list &propList = {});
buffer(const std::shared_ptr<T[ ]> &hostData,
    const range<dimensions> &bufferRange,
    const property_list &propList = {});
template <class InputIterator>
buffer<T, 1>(InputIterator first, InputIterator last,
    AllocatorT allocator, const property_list &propList = {});
template <class InputIterator>
buffer<T, 1>(InputIterator first, InputIterator last,
    const property_list &propList = {});
buffer(buffer &b, const id<dimensions> &baseIndex,
    const range<dimensions> &subRange);
range<dimensions> get_range() const;
size_t byte_size() const noexcept;
size_t size() const noexcept;
AllocatorT get_allocator() const;
template <access_mode mode = access_mode::read_write,
    target targ = target::device>
accessor<T, dimensions, mode, targ>
get_access(handler &commandGroupHandler);
template <access_mode mode = access_mode::read_write,
    target targ = target::device>
accessor<T, dimensions, mode, targ>
get_access(handler &commandGroupHandler,
    range<dimensions> accessRange, id<dimensions> accessOffset = {});
template <typename... Ts> auto get_access(Ts...);
template <typename... Ts> auto get_host_access(Ts...);
template <typename Destination = std::nullptr_t>
void set_final_data(Destination finalData = nullptr);
void set_write_back(bool flag = true);
bool is_sub_buffer() const;
template <typename ReinterpretT, int ReinterpretDim>
buffer<ReinterpretT, ReinterpretDim,
    typename std::allocator_traits<AllocatorT>::template
        rebind_alloc<ReinterpretT>>
reinterpret(range<ReinterpretDim> reinterpretRange) const;

Available when ReinterpretDim == 1 or when
(ReinterpretDim == dimensions) && (sizeof(ReinterpretT) == sizeof(T))
template <typename ReinterpretT, int ReinterpretDim = dimensions>
buffer<ReinterpretT, ReinterpretDim,
    typename std::allocator_traits<AllocatorT>::template
        rebind_alloc<ReinterpretT>>
reinterpret() const;

Buffer property class constructors:
property::buffer::use_host_ptr::use_host_ptr()
property::buffer::use_mutex::use_mutex(std::mutex &mutexRef)
property::buffer::context_bound::context_bound(context boundContext)
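A sketch of the buffer semantics in [4.7.2]: the buffer borrows host memory, a kernel writes it, and because the buffer destructor blocks, leaving the scope writes the results back to the vector:

#include <sycl/sycl.hpp>
#include <vector>

int main() {
  std::vector<int> v(16, 0);
  {
    sycl::buffer<int, 1> buf{v.data(), sycl::range<1>{16}};
    sycl::queue{}.submit([&](sycl::handler &cgh) {
      sycl::accessor a{buf, cgh, sycl::write_only, sycl::no_init};
      cgh.parallel_for(sycl::range<1>{16},
                       [=](sycl::id<1> i) { a[i] = int(i[0]); });
    });
  } // buffer destructor waits for the kernel and copies back into v
}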



Page 4 SYCL 2020 API Reference Guide
Images, unsampled and sampled [4.7.3]
Buffers and images define storage and ownership. Images are of type unsampled_image or sampled_image. Their constructors take an image_format parameter from enum class image_format.

enum class image_format values:
r8g8b8a8_unorm, r16g16b16a16_unorm, r8g8b8a8_sint, r16g16b16a16_sint, r32b32g32a32_sint, r8g8b8a8_uint, r16g16b16a16_uint, r32b32g32a32_uint, r16b16g16a16_sfloat, r32g32b32a32_sfloat, b8g8r8a8_unorm

Unsampled images [4.7.3.1]
Class declaration
template <int dimensions = 1,
    typename AllocatorT = sycl::image_allocator>
class unsampled_image;

Constructors and members
unsampled_image(image_format format, const range<dimensions> &rangeRef,
    const property_list &propList = {});
unsampled_image(image_format format, const range<dimensions> &rangeRef,
    AllocatorT allocator, const property_list &propList = {});
unsampled_image(void *hostPointer, image_format format,
    const range<dimensions> &rangeRef, const property_list &propList = {});
unsampled_image(void *hostPointer, image_format format,
    const range<dimensions> &rangeRef, AllocatorT allocator,
    const property_list &propList = {});
unsampled_image(std::shared_ptr<void> &hostPointer, image_format format,
    const range<dimensions> &rangeRef, const property_list &propList = {});
unsampled_image(std::shared_ptr<void> &hostPointer, image_format format,
    const range<dimensions> &rangeRef, AllocatorT allocator,
    const property_list &propList = {});

Available when dimensions > 1
unsampled_image(image_format format, const range<dimensions> &rangeRef,
    const range<dimensions - 1> &pitch, const property_list &propList = {});
unsampled_image(image_format format, const range<dimensions> &rangeRef,
    const range<dimensions - 1> &pitch, AllocatorT allocator,
    const property_list &propList = {});
unsampled_image(void *hostPointer, image_format format,
    const range<dimensions> &rangeRef, const range<dimensions - 1> &pitch,
    const property_list &propList = {});
unsampled_image(void *hostPointer, image_format format,
    const range<dimensions> &rangeRef, const range<dimensions - 1> &pitch,
    AllocatorT allocator, const property_list &propList = {});
unsampled_image(std::shared_ptr<void> &hostPointer, image_format format,
    const range<dimensions> &rangeRef, const range<dimensions - 1> &pitch,
    const property_list &propList = {});
unsampled_image(std::shared_ptr<void> &hostPointer, image_format format,
    const range<dimensions> &rangeRef, const range<dimensions - 1> &pitch,
    AllocatorT allocator, const property_list &propList = {});

range<dimensions> get_range() const;
range<dimensions-1> get_pitch() const; // available when dimensions > 1
size_t byte_size() const noexcept;
size_t size() const noexcept;
AllocatorT get_allocator() const;
template <typename... Ts> auto get_access(Ts... args);
template <typename... Ts> auto get_host_access(Ts... args);
template <typename Destination = std::nullptr_t>
void set_final_data(Destination finalData = nullptr);
void set_write_back(bool flag = true);

Sampled images [4.7.3.2]
Class declaration
template <int dimensions = 1,
    typename AllocatorT = sycl::image_allocator>
class sampled_image;

Constructors and members
sampled_image(const void *hostPointer, image_format format,
    image_sampler sampler, const range<dimensions> &rangeRef,
    const property_list &propList = {});
sampled_image(std::shared_ptr<const void> &hostPointer,
    image_format format, image_sampler sampler,
    const range<dimensions> &rangeRef, const property_list &propList = {});

Available when dimensions > 1
sampled_image(const void *hostPointer, image_format format,
    image_sampler sampler, const range<dimensions> &rangeRef,
    const range<dimensions - 1> &pitch, const property_list &propList = {});
sampled_image(std::shared_ptr<const void> &hostPointer,
    image_format format, image_sampler sampler,
    const range<dimensions> &rangeRef, const range<dimensions - 1> &pitch,
    const property_list &propList = {});

range<dimensions> get_range() const;
range<dimensions-1> get_pitch() const; // available when dimensions > 1
size_t byte_size() const noexcept;
size_t size() const noexcept;
template <typename... Ts> auto get_access(Ts... args);
template <typename... Ts> auto get_host_access(Ts... args);

Image property constructors and members [4.7.3.3]
property::image::use_host_ptr::use_host_ptr();
property::image::use_mutex::use_mutex(std::mutex &mutexRef);
property::image::context_bound::context_bound(context boundContext);
std::mutex *property::image::use_mutex::get_mutex_ptr() const;
context property::image::context_bound::get_context() const;
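A sketch of creating a 2D unsampled image over host memory; the four-floats-per-pixel layout is an assumption matching the r32g32b32a32_sfloat format:

#include <sycl/sycl.hpp>
#include <vector>

int main() {
  constexpr size_t w = 16, h = 16;
  std::vector<float> pixels(w * h * 4, 0.f);  // 4 channels per pixel
  sycl::unsampled_image<2> img{pixels.data(),
                               sycl::image_format::r32g32b32a32_sfloat,
                               sycl::range<2>{w, h}};
  // img.get_range(), img.byte_size(), get_access(), etc. are now usable
}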

Data access and storage [4.7]
Buffers and images define storage and ownership. Accessors provide access to the data.

Accessors [4.7.6]
Accessor classes and the objects they access:
• Buffer accessor for commands (4.7.6.9, class accessor) with two uses:
  - access a buffer from a kernel function via device global memory
  - access a buffer from a host task
• Buffer accessor for host code outside of a command (4.7.6.10, class host_accessor).
• Local accessor from within kernel functions (4.7.6.11, class local_accessor).
• Unsampled image accessors of two kinds:
  - From within a kernel function or from within a host task (4.7.6.13, class unsampled_image_accessor).
  - From host code outside of a host task (4.7.6.13, class host_unsampled_image_accessor).
• Sampled image accessors of two kinds:
  - From within a kernel function or from within a host task (4.7.6.14, class sampled_image_accessor).
  - From host code outside of a host task (4.7.6.14, class host_sampled_image_accessor).

enum class access_mode [4.7.6.2]
read    write    read_write

Accessor property class constructor [4.7.6.4]
This is used in all accessor classes.
property::no_init::no_init()

Access targets [4.7.6.9]
target::device       buffer access from kernel function via device global memory
target::host_task    buffer access from a host task

enum class access::address_space [4.7.7.1]
global_space      Accessible to all work-items in all work-groups
constant_space    Global space that is constant
local_space       Accessible to all work-items in a single work-group
private_space     Accessible to a single work-item
generic_space     Virtual address space overlapping global, local, and private

Buffer accessor for commands (class accessor) [4.7.6.9]
This one class provides two kinds of accessors depending on accessTarget:
• target::device to access a buffer from a kernel function via device global memory
• target::host_task to access a buffer from a host task

Class declaration
template <typename dataT, int dimensions,
    access_mode accessMode =
        (std::is_const_v<dataT> ? access_mode::read
                                : access_mode::read_write),
    target accessTarget = target::device>
class accessor;

Constructors and members
accessor();

Available when dimensions == 0
template <typename AllocatorT>
accessor(buffer<dataT, 1, AllocatorT> &bufferRef,
    const property_list &propList = {});
template <typename AllocatorT>
accessor(buffer<dataT, 1, AllocatorT> &bufferRef,
    handler &commandGroupHandlerRef, const property_list &propList = {});

Available when dimensions > 0
template <typename AllocatorT>
accessor(buffer<dataT, dimensions, AllocatorT> &bufferRef,
    const property_list &propList = {});
template <typename AllocatorT, typename TagT>
accessor(buffer<dataT, dimensions, AllocatorT> &bufferRef, TagT tag,
    const property_list &propList = {});
template <typename AllocatorT>
accessor(buffer<dataT, dimensions, AllocatorT> &bufferRef,
    handler &commandGroupHandlerRef, const property_list &propList = {});
template <typename AllocatorT, typename TagT>
accessor(buffer<dataT, dimensions, AllocatorT> &bufferRef,
    handler &commandGroupHandlerRef, TagT tag,
    const property_list &propList = {});
template <typename AllocatorT>
accessor(buffer<dataT, dimensions, AllocatorT> &bufferRef,
    range<dimensions> accessRange, const property_list &propList = {});
template <typename AllocatorT, typename TagT>
accessor(buffer<dataT, dimensions, AllocatorT> &bufferRef,
    range<dimensions> accessRange, TagT tag,
    const property_list &propList = {});
template <typename AllocatorT>
accessor(buffer<dataT, dimensions, AllocatorT> &bufferRef,
    range<dimensions> accessRange, id<dimensions> accessOffset,
    const property_list &propList = {});



SYCL 2020 API Reference Guide Page 5
Buffer accessor for commands (cont.)

Available when dimensions > 0
template <typename AllocatorT, typename TagT>
accessor(buffer<dataT, dimensions, AllocatorT> &bufferRef,
    range<dimensions> accessRange, id<dimensions> accessOffset, TagT tag,
    const property_list &propList = {});
template <typename AllocatorT>
accessor(buffer<dataT, dimensions, AllocatorT> &bufferRef,
    handler &commandGroupHandlerRef, range<dimensions> accessRange,
    const property_list &propList = {});
template <typename AllocatorT, typename TagT>
accessor(buffer<dataT, dimensions, AllocatorT> &bufferRef,
    handler &commandGroupHandlerRef, range<dimensions> accessRange,
    TagT tag, const property_list &propList = {});
template <typename AllocatorT>
accessor(buffer<dataT, dimensions, AllocatorT> &bufferRef,
    handler &commandGroupHandlerRef, range<dimensions> accessRange,
    id<dimensions> accessOffset, const property_list &propList = {});
template <typename AllocatorT, typename TagT>
accessor(buffer<dataT, dimensions, AllocatorT> &bufferRef,
    handler &commandGroupHandlerRef, range<dimensions> accessRange,
    id<dimensions> accessOffset, TagT tag,
    const property_list &propList = {});

id<dimensions> get_offset() const;
void swap(accessor &other);
bool is_placeholder() const;
template <access::decorated IsDecorated>
accessor_ptr<IsDecorated> get_multi_ptr() const noexcept;

Common interface functions [Table 79]
This class supports the following functions in addition to begin(), end(), cbegin(), cend(), rbegin(), rend(), crbegin(), and crend().
size_type byte_size() const noexcept;
size_type size() const noexcept;
size_type max_size() const noexcept;
bool empty() const noexcept;
range<dimensions> get_range() const;

Available when dimensions == 0
operator reference() const;
Available when dimensions > 0
reference operator[](id<dimensions> index) const;
Available when dimensions > 1
__unspecified__ &operator[](size_t index) const;
Available when dimensions == 1
reference operator[](size_t index) const;

std::add_pointer_t<value_type> get_pointer() const noexcept;

Property class constructor [4.7.3.3]
property::no_init::no_init()
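A sketch of tagged accessor construction: deduction tags such as sycl::read_only and sycl::write_only select the access mode via class template argument deduction (buffer contents here are unspecified; the kernel is purely illustrative):

#include <sycl/sycl.hpp>

int main() {
  sycl::queue q;
  sycl::buffer<int, 1> a{sycl::range<1>{64}}, b{sycl::range<1>{64}};
  q.submit([&](sycl::handler &cgh) {
    sycl::accessor in{a, cgh, sycl::read_only};
    sycl::accessor out{b, cgh, sycl::write_only, sycl::no_init};
    cgh.parallel_for(sycl::range<1>{64},
                     [=](sycl::id<1> i) { out[i] = in[i]; });
  });
  q.wait();
}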

Buffer accessor for host code outside of a command (class host_accessor) [4.7.6.10]

Class declaration
template <typename dataT, int dimensions,
    access_mode accessMode =
        (std::is_const_v<dataT> ? access_mode::read
                                : access_mode::read_write)>
class host_accessor;

Constructors and members
All constructors block until data is available from kernels that access the same underlying buffer.
host_accessor();

Available when dimensions == 0
template <typename AllocatorT>
host_accessor(buffer<dataT, 1, AllocatorT> &bufferRef,
    const property_list &propList = {});

Available when dimensions > 0
template <typename AllocatorT>
host_accessor(buffer<dataT, dimensions, AllocatorT> &bufferRef,
    const property_list &propList = {});
template <typename AllocatorT, typename TagT>
host_accessor(buffer<dataT, dimensions, AllocatorT> &bufferRef,
    TagT tag, const property_list &propList = {});
template <typename AllocatorT>
host_accessor(buffer<dataT, dimensions, AllocatorT> &bufferRef,
    range<dimensions> accessRange, const property_list &propList = {});
template <typename AllocatorT, typename TagT>
host_accessor(buffer<dataT, dimensions, AllocatorT> &bufferRef,
    range<dimensions> accessRange, TagT tag,
    const property_list &propList = {});
template <typename AllocatorT>
host_accessor(buffer<dataT, dimensions, AllocatorT> &bufferRef,
    range<dimensions> accessRange, id<dimensions> accessOffset,
    const property_list &propList = {});
template <typename AllocatorT, typename TagT>
host_accessor(buffer<dataT, dimensions, AllocatorT> &bufferRef,
    range<dimensions> accessRange, id<dimensions> accessOffset, TagT tag,
    const property_list &propList = {});

id<dimensions> get_offset() const;
void swap(host_accessor &other);

Common interface functions [Table 79]
This class supports the following functions in addition to begin(), end(), cbegin(), cend(), rbegin(), rend(), crbegin(), and crend().
size_t byte_size() const noexcept;
size_t size() const noexcept;
size_t max_size() const noexcept;
bool empty() const noexcept;
range<dimensions> get_range() const;

Available when dimensions == 0
operator reference() const;
Available when dimensions > 0
reference operator[](id<dimensions> index) const;
Available when dimensions > 1
__unspecified__ &operator[](size_t index) const;
Available when dimensions == 1
reference operator[](size_t index) const;

std::add_pointer_t<value_type> get_pointer() const noexcept;

Property class constructor [4.7.3.3]
property::no_init::no_init()

Local accessor from within kernel functions (class local_accessor) [4.7.6.11]
dataT can be any C++ type.

Class declaration
template <typename dataT, int dimensions>
class local_accessor;

Constructors and members
local_accessor();

Available when dimensions == 0
local_accessor(handler &commandGroupHandlerRef,
    const property_list &propList = {});

Available when dimensions > 0
local_accessor(range<dimensions> allocationSize,
    handler &commandGroupHandlerRef, const property_list &propList = {});

void swap(accessor &other);
template <access::decorated IsDecorated>
accessor_ptr<IsDecorated> get_multi_ptr() const noexcept;

Common interface functions [Table 79]
This class supports the following functions in addition to begin(), end(), cbegin(), cend(), rbegin(), rend(), crbegin(), and crend().
size_t byte_size() const noexcept;
size_t size() const noexcept;
size_t max_size() const noexcept;
bool empty() const noexcept;
range<dimensions> get_range() const;

Available when dimensions == 0
operator reference() const;
Available when dimensions > 0
reference operator[](id<dimensions> index) const;
Available when dimensions > 1
__unspecified__ &operator[](size_t index) const;
Available when dimensions == 1
reference operator[](size_t index) const;

std::add_pointer_t<value_type> get_pointer() const noexcept;

Property class constructor [4.7.3.3]
property::no_init::no_init()
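A sketch of work-group local memory via local_accessor: each work-group gets a 64-element scratch array shared by its work-items (the barrier uses the sycl::group_barrier free function):

#include <sycl/sycl.hpp>

int main() {
  sycl::queue q;
  q.submit([&](sycl::handler &cgh) {
    sycl::local_accessor<int, 1> scratch{sycl::range<1>{64}, cgh};
    cgh.parallel_for(
        sycl::nd_range<1>{sycl::range<1>{1024}, sycl::range<1>{64}},
        [=](sycl::nd_item<1> it) {
          scratch[it.get_local_id(0)] = int(it.get_global_id(0));
          sycl::group_barrier(it.get_group());
          // work-items in the group can now read each other's values
        });
  });
  q.wait();
}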



Page 6 SYCL 2020 API Reference Guide
Unsampled image accessors [4.7.6.13]
There are two kinds of unsampled image accessors:
• class unsampled_image_accessor: From within a kernel function or from within a host task
• class host_unsampled_image_accessor: From host code outside of a host task

unsampled_image_accessor
Class declaration
template <typename dataT, int dimensions, access_mode accessMode,
    image_target accessTarget = image_target::device>
class unsampled_image_accessor;

Constructors
template <typename AllocatorT>
unsampled_image_accessor(
    unsampled_image<dimensions, AllocatorT> &imageRef,
    handler &commandGroupHandlerRef, const property_list &propList = {});
template <typename AllocatorT, typename TagT>
unsampled_image_accessor(
    unsampled_image<dimensions, AllocatorT> &imageRef,
    handler &commandGroupHandlerRef, TagT tag,
    const property_list &propList = {});

host_unsampled_image_accessor
Class declaration
template <typename dataT, int dimensions, access_mode accessMode>
class host_unsampled_image_accessor;

Constructors
template <typename AllocatorT>
host_unsampled_image_accessor(
    unsampled_image<dimensions, AllocatorT> &imageRef,
    const property_list &propList = {});
template <typename AllocatorT, typename TagT>
host_unsampled_image_accessor(
    unsampled_image<dimensions, AllocatorT> &imageRef,
    TagT tag, const property_list &propList = {});

Available to both unsampled image accessor types
size_t size() const noexcept;

Available when (accessMode == access_mode::read)
template <typename coordT>
dataT read(const coordT &coords) const;

Available when (accessMode == access_mode::write)
template <typename coordT>
void write(const coordT &coords, const dataT &color) const;

Sampled image accessors [4.7.6.14]
There are two kinds of sampled image accessors:
• class sampled_image_accessor: From within a kernel function or from within a host task
• class host_sampled_image_accessor: From host code outside of a host task

sampled_image_accessor
Class declaration
template <typename dataT, int dimensions,
    image_target accessTarget = image_target::device>
class sampled_image_accessor;

Constructors
template <typename AllocatorT>
sampled_image_accessor(
    sampled_image<dimensions, AllocatorT> &imageRef,
    handler &commandGroupHandlerRef, const property_list &propList = {});
template <typename AllocatorT, typename TagT>
sampled_image_accessor(
    sampled_image<dimensions, AllocatorT> &imageRef,
    handler &commandGroupHandlerRef, TagT tag,
    const property_list &propList = {});

host_sampled_image_accessor
Class declaration
template <typename dataT, int dimensions>
class host_sampled_image_accessor;

Constructor
template <typename AllocatorT>
host_sampled_image_accessor(
    sampled_image<dimensions, AllocatorT> &imageRef,
    const property_list &propList = {});

Available to both sampled image accessor types
size_t size() const noexcept;
template <typename coordT>
dataT read(const coordT &coords) const;
if dimensions == 1, coordT = float
if dimensions == 2, coordT = float2
if dimensions == 3, coordT = float4

Class multi_ptr [4.7.7.1]
The address spaces are global_space, local_space, private_space, and generic_space.

Class declaration
template <typename ElementType, access::address_space Space,
    access::decorated DecorateAddress>
class multi_ptr;

Members: Constructors
multi_ptr();
multi_ptr(const multi_ptr&);
multi_ptr(multi_ptr&&);
explicit multi_ptr(multi_ptr<ElementType, Space,
    access::decorated::yes>::pointer);
multi_ptr(std::nullptr_t);

Available if Space == global_space or generic_space
template <int dimensions, access::mode Mode,
    access::placeholder isPlaceholder>
multi_ptr(accessor<ElementType, dimensions, Mode, target::device,
    isPlaceholder>);

Available if Space == local_space or generic_space
template <int dimensions>
multi_ptr(local_accessor<ElementType, dimensions>);

Members: Assignment and access operators
multi_ptr &operator=(const multi_ptr&);
multi_ptr &operator=(multi_ptr&&);
multi_ptr &operator=(std::nullptr_t);

Available if Space == address_space::generic_space &&
ASP != access::address_space::constant_space
template <access::address_space ASP, access::decorated IsDecorated>
multi_ptr &operator=(const multi_ptr<value_type, ASP, IsDecorated>&);
template <access::address_space ASP, access::decorated IsDecorated>
multi_ptr &operator=(multi_ptr<value_type, ASP, IsDecorated>&&);

reference operator*() const;
pointer operator->() const;
pointer get() const;
std::add_pointer_t<value_type> get_raw() const;
__unspecified__ * get_decorated() const;

Members: Conversions
Cast to private_ptr, available if Space == address_space::generic_space
explicit operator multi_ptr<value_type,
    access::address_space::private_space, DecorateAddress>();
explicit operator multi_ptr<const value_type,
    access::address_space::private_space, DecorateAddress>() const;

Cast to global_ptr, available if Space == address_space::generic_space
explicit operator multi_ptr<value_type,
    access::address_space::global_space, DecorateAddress>();
explicit operator multi_ptr<const value_type,
    access::address_space::global_space, DecorateAddress>() const;

Cast to local_ptr, available if Space == address_space::generic_space
explicit operator multi_ptr<value_type,
    access::address_space::local_space, DecorateAddress>();
explicit operator multi_ptr<const value_type,
    access::address_space::local_space, DecorateAddress>() const;

Implicit conversions to a multi_ptr
Implicit conversion to a multi_ptr<void>. Only available when value_type is not const-qualified.
template <access::decorated DecorateAddress>
operator multi_ptr<void, Space, DecorateAddress>() const;

Implicit conversion to a multi_ptr<const void>. Only available when value_type is const-qualified.
template <access::decorated DecorateAddress>
operator multi_ptr<const void, Space, DecorateAddress>() const;

Implicit conversion to multi_ptr<const value_type, Space>.
template <access::decorated DecorateAddress>
operator multi_ptr<const value_type, Space, DecorateAddress>() const;

Implicit conversion to the non-decorated version of multi_ptr. Only available when is_decorated is true.
operator multi_ptr<value_type, Space, access::decorated::no>() const;

Implicit conversion to the decorated version of multi_ptr. Only available when is_decorated is false.
operator multi_ptr<value_type, Space, access::decorated::yes>() const;

void prefetch(size_t numElements) const;

Members: Arithmetic operators
The multi_ptr class supports the standard arithmetic and relational operators.

Class multi_ptr specialized for void and const void [4.7.7.1]
VoidType: void or const void. DecorateAddress: yes, no.

Class declaration
template <access::address_space Space, access::decorated DecorateAddress>
class multi_ptr<VoidType, Space, DecorateAddress>;

Members: Constructors
multi_ptr();
multi_ptr(const multi_ptr&);
multi_ptr(multi_ptr&&);
explicit multi_ptr(multi_ptr<VoidType, Space,
    access::decorated::yes>::pointer);
multi_ptr(std::nullptr_t);

Available if Space == global_space
template <typename ElementType, int dimensions, access_mode Mode,
    access::placeholder isPlaceholder>
multi_ptr(accessor<ElementType, dimensions, Mode, target::device,
    isPlaceholder>);

Available if Space == local_space
template <typename ElementType, int dimensions>
multi_ptr(local_accessor<ElementType, dimensions>);

Members: Assignment operators
multi_ptr &operator=(const multi_ptr&);
multi_ptr &operator=(multi_ptr&&);
multi_ptr &operator=(std::nullptr_t);

Members
pointer get() const;
explicit operator pointer() const;
template <typename ElementType>
explicit operator multi_ptr<ElementType, Space, DecorateAddress>() const;

Only available when is_decorated is true.
operator multi_ptr<value_type, Space, access::decorated::no>() const;
SYCL 2020 API Reference Guide Page 7
multi_ptr specialized (cont.)
Only available when is_decorated is false.
operator multi_ptr<value_type, Space, access::decorated::yes>() const;

operator multi_ptr<const void, Space, DecorateAddress>() const;

Operators
The multi_ptr class supports the standard arithmetic and relational operators.

Explicit pointer aliases [4.7.7.2]
Aliases to class multi_ptr for each specialization of access::address_space:
global_ptr
local_ptr
private_ptr

Aliases for non-decorated pointers:
raw_global_ptr
raw_local_ptr
raw_private_ptr

Aliases for decorated pointers:
decorated_global_ptr
decorated_local_ptr
decorated_private_ptr

template <access::address_space Space,
    access::decorated DecorateAddress, typename ElementType>
multi_ptr<ElementType, Space, DecorateAddress>
address_space_cast(ElementType *);

Sampler class enums [4.7.8]
The SYCL image_sampler struct contains a configuration for sampling a sampled_image.

struct image_sampler {
  addressing_mode addressing;
  coordinate_normalization_mode coordinate;
  filtering_mode filtering;
};

enum class addressing_mode values: mirrored_repeat, repeat, clamp_to_edge, clamp, none
enum class coordinate_normalization_mode values: normalized, unnormalized
enum class filtering_mode values: nearest, linear
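A sketch of configuring an image_sampler and building a sampled_image with it; the 16x16, four-floats-per-pixel host layout is an assumption matching r32g32b32a32_sfloat:

#include <sycl/sycl.hpp>
#include <vector>

int main() {
  sycl::image_sampler smp{sycl::addressing_mode::clamp_to_edge,
                          sycl::coordinate_normalization_mode::normalized,
                          sycl::filtering_mode::linear};
  std::vector<float> pixels(16 * 16 * 4, 0.f);
  sycl::sampled_image<2> img{pixels.data(),
                             sycl::image_format::r32g32b32a32_sfloat,
                             smp, sycl::range<2>{16, 16}};
  // img can now be read through a sampled_image_accessor in a kernel
}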

Unified Shared Memory [4.8]
Unified Shared Memory is an optional addressing model providing an alternative to the buffer model. See examples on page 15 of this reference guide.

There are three kinds of USM allocations (enum class alloc):
host      in host memory accessible by a device
device    in device memory not accessible by the host
shared    in shared memory accessible by host and device

Class usm_allocator [4.8.3]
Class declaration
template <typename T, usm::alloc AllocKind, size_t Alignment = 0>
class usm_allocator;

Constructors and members
usm_allocator(const context &ctxt, const device &dev,
    const property_list &propList = {}) noexcept;
usm_allocator(const queue &q,
    const property_list &propList = {}) noexcept;
template <class U>
usm_allocator(usm_allocator<U, AllocKind, Alignment> const &) noexcept;
T *allocate(size_t count);
void deallocate(T *Ptr, size_t count);

Allocators only compare equal if they are of the same USM kind, alignment, context, and device.
template <class U, usm::alloc AllocKindU, size_t AlignmentU>
friend bool operator==(const usm_allocator<T, AllocKind, Alignment> &,
    const usm_allocator<U, AllocKindU, AlignmentU> &);
template <class U, usm::alloc AllocKindU, size_t AlignmentU>
friend bool operator!=(const usm_allocator<T, AllocKind, Alignment> &,
    const usm_allocator<U, AllocKindU, AlignmentU> &);

malloc-style API [4.8.3]

Device allocation functions [4.8.3.2]
void* sycl::malloc_device(size_t numBytes, const device& syclDevice,
    const context& syclContext, const property_list &propList = {});
template <typename T>
T* sycl::malloc_device(size_t count, const device& syclDevice,
    const context& syclContext, const property_list &propList = {});
void* sycl::malloc_device(size_t numBytes, const queue& syclQueue,
    const property_list &propList = {});
template <typename T>
T* sycl::malloc_device(size_t count, const queue& syclQueue,
    const property_list &propList = {});
void* sycl::aligned_alloc_device(size_t alignment, size_t numBytes,
    const device& syclDevice, const context& syclContext,
    const property_list &propList = {});
template <typename T>
T* sycl::aligned_alloc_device(size_t alignment, size_t count,
    const device& syclDevice, const context& syclContext,
    const property_list &propList = {});
void* sycl::aligned_alloc_device(size_t alignment, size_t numBytes,
    const queue& syclQueue, const property_list &propList = {});
template <typename T>
T* sycl::aligned_alloc_device(size_t alignment, size_t count,
    const queue& syclQueue, const property_list &propList = {});

Host allocation functions [4.8.3.3]
void* sycl::malloc_host(size_t numBytes, const context& syclContext,
    const property_list &propList = {});
template <typename T>
T* sycl::malloc_host(size_t count, const context& syclContext,
    const property_list &propList = {});
void* sycl::malloc_host(size_t numBytes, const queue& syclQueue,
    const property_list &propList = {});
template <typename T>
T* sycl::malloc_host(size_t count, const queue& syclQueue,
    const property_list &propList = {});
void* sycl::aligned_alloc_host(size_t alignment, size_t numBytes,
    const context& syclContext, const property_list &propList = {});
template <typename T>
T* sycl::aligned_alloc_host(size_t alignment, size_t count,
    const context& syclContext, const property_list &propList = {});
void* sycl::aligned_alloc_host(size_t alignment, size_t numBytes,
    const queue& syclQueue, const property_list &propList = {});
template <typename T>
T* sycl::aligned_alloc_host(size_t alignment, size_t count,
    const queue& syclQueue, const property_list &propList = {});

Shared allocation functions [4.8.3.4]
void* sycl::malloc_shared(size_t numBytes, const device& syclDevice,
    const context& syclContext, const property_list &propList = {});
template <typename T>
T* sycl::malloc_shared(size_t count, const device& syclDevice,
    const context& syclContext, const property_list &propList = {});
void* sycl::malloc_shared(size_t numBytes, const queue& syclQueue,
    const property_list &propList = {});
template <typename T>
T* sycl::malloc_shared(size_t count, const queue& syclQueue,
    const property_list &propList = {});
void* sycl::aligned_alloc_shared(size_t alignment, size_t numBytes,
    const device& syclDevice, const context& syclContext,
    const property_list &propList = {});
template <typename T>
T* sycl::aligned_alloc_shared(size_t alignment, size_t count,
    const device& syclDevice, const context& syclContext,
    const property_list &propList = {});
void* sycl::aligned_alloc_shared(size_t alignment, size_t numBytes,
    const queue& syclQueue, const property_list &propList = {});
template <typename T>
T* sycl::aligned_alloc_shared(size_t alignment, size_t count,
    const queue& syclQueue, const property_list &propList = {});

Parameterized allocation functions [4.8.3.5]
void* sycl::malloc(size_t numBytes, const device& syclDevice,
    const context& syclContext, usm::alloc kind,
    const property_list &propList = {});
template <typename T>
T* sycl::malloc(size_t count, const device& syclDevice,
    const context& syclContext, usm::alloc kind,
    const property_list &propList = {});
void* sycl::malloc(size_t numBytes, const queue& syclQueue,
    usm::alloc kind, const property_list &propList = {});
template <typename T>
T* sycl::malloc(size_t count, const queue& syclQueue, usm::alloc kind,
    const property_list &propList = {});
void* sycl::aligned_alloc(size_t alignment, size_t numBytes,
    const device& syclDevice, const context& syclContext,
    usm::alloc kind, const property_list &propList = {});
template <typename T>
T* sycl::aligned_alloc(size_t alignment, size_t count,
    const device& syclDevice, const context& syclContext,
    usm::alloc kind, const property_list &propList = {});
void* sycl::aligned_alloc(size_t alignment, size_t numBytes,
    const queue& syclQueue, usm::alloc kind,
    const property_list &propList = {});
template <typename T>
T* sycl::aligned_alloc(size_t alignment, size_t count,
    const queue& syclQueue, usm::alloc kind,
    const property_list &propList = {});

Memory deallocation functions [4.8.3.6]
void sycl::free(void* ptr, sycl::context& syclContext);
void sycl::free(void* ptr, sycl::queue& syclQueue);

USM pointer queries [4.8.4]
These queries are available only on the host.
usm::alloc get_pointer_type(const void *ptr, const context &ctxt);
sycl::device get_pointer_device(const void *ptr, const context &ctxt);
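A sketch of usm_allocator with a standard container: the vector's storage is shared USM, so its pointer is directly usable inside a kernel (assumes a device supporting USM shared allocations):

#include <sycl/sycl.hpp>
#include <vector>

int main() {
  sycl::queue q;
  sycl::usm_allocator<int, sycl::usm::alloc::shared> alloc{q};
  std::vector<int, decltype(alloc)> v(1024, 0, alloc);
  int *p = v.data();  // shared USM pointer
  q.parallel_for(sycl::range<1>{1024},
                 [=](sycl::id<1> i) { p[i] = int(i[0]); })
      .wait();
  // v now holds 0..1023, readable directly on the host
}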



Page 8 SYCL 2020 API Reference Guide
Ranges and index space identifiers [4.9.1]

Class range [4.9.1.1]
A 1D, 2D or 3D vector that defines the iteration domain of either a single work-group in a parallel dispatch, or the overall dimensions of the dispatch. It can be constructed from integers. This class supports the standard arithmetic, logical, and relational operators.

Class declaration
template <int dimensions = 1> class range;

Constructors and members
range(size_t dim0);
range(size_t dim0, size_t dim1);
range(size_t dim0, size_t dim1, size_t dim2);
size_t get(int dimension) const;
size_t &operator[](int dimension);
size_t operator[](int dimension) const;
size_t size() const;

Class nd_range [4.9.1.2]
Defines the iteration domain of both the work-groups and the overall dispatch. To define this, the nd_range comprises two ranges: the whole range over which the kernel is to be executed, and the range of each work-group.

Class declaration
template <int dimensions = 1> class nd_range;

Constructors and members
nd_range(range<dimensions> globalSize, range<dimensions> localSize);
range<dimensions> get_global_range() const;
range<dimensions> get_local_range() const;
range<dimensions> get_group_range() const;

Class id [4.9.1.3]
A vector of dimensions that is used to represent an id into a global or local range. It can be used as an index in an accessor of the same rank. This class supports the standard arithmetic, logical, and relational operators.

Class declaration
template <int dimensions = 1> class id;

Constructors and members
id();
id(size_t dim0);
id(size_t dim0, size_t dim1);
id(size_t dim0, size_t dim1, size_t dim2);
id(const range<dimensions> &range);
id(const item<dimensions> &item);
size_t get(int dimension) const;
size_t &operator[](int dimension);
size_t operator[](int dimension) const;

Class item [4.9.1.4]
Identifies an instance of the function object executing at each point in a range. It is passed to a parallel_for call or returned by member functions of h_item.

Class declaration
template <int dimensions = 1, bool with_offset = true> class item;

Members
id<dimensions> get_id() const;
size_t get_id(int dimension) const;
size_t operator[](int dimension) const;
range<dimensions> get_range() const;
size_t get_range(int dimension) const;

Class nd_item [4.9.1.5]
Identifies an instance of the function object executing at each point in an nd_range<int dimensions> passed to a parallel_for call.

Class declaration
template <int dimensions = 1> class nd_item;

Members
id<dimensions> get_global_id() const;
size_t get_global_id(int dimension) const;
size_t get_global_linear_id() const;
id<dimensions> get_local_id() const;
size_t get_local_id(int dimension) const;
size_t get_local_linear_id() const;
group<dimensions> get_group() const;
size_t get_group(int dimension) const;
size_t get_group_linear_id() const;
range<dimensions> get_group_range() const;
size_t get_group_range(int dimension) const;
range<dimensions> get_global_range() const;
size_t get_global_range(int dimension) const;
range<dimensions> get_local_range() const;
size_t get_local_range(int dimension) const;
nd_range<dimensions> get_nd_range() const;
template <typename dataT>
device_event async_work_group_copy(decorated_local_ptr<dataT> dest,
    decorated_global_ptr<dataT> src, size_t numElements) const;
template <typename dataT>
device_event async_work_group_copy(decorated_global_ptr<dataT> dest,
    decorated_local_ptr<dataT> src, size_t numElements) const;
template <typename dataT>
device_event async_work_group_copy(decorated_local_ptr<dataT> dest,
    decorated_global_ptr<dataT> src, size_t numElements,
    size_t srcStride) const;
template <typename dataT>
device_event async_work_group_copy(decorated_global_ptr<dataT> dest,
    decorated_local_ptr<dataT> src, size_t numElements,
    size_t destStride) const;
template <typename... eventTN>
void wait_for(eventTN... events) const;
Constructors and members device_event async_work_group_copy( template <typename dataT>
id(); decorated_global_ptr<dataT> dest, device_event async_work_group_copy(
id(size_t dim0); decorated_local_ptr<dataT> src, decorated_local_ptr<dataT> dest,
id(size_t dim0, size_t dim1); size_t numElements, size_t destStride) const; decorated_global_ptr<dataT> src,
template <typename... eventTN> size_t numElements, size_t srcStride) const;
id(size_t dim0, size_t dim1, size_t dim2); void wait_for(eventTN... events) const; template <typename dataT>
id(const range<dimensions> &range); device_event async_work_group_copy(
id(const item<dimensions> &item); Class h_item [4.9.1.6] decorated_global_ptr<dataT> dest,
Identifies an instance of a group::parallel_for_work_item function decorated_local_ptr<dataT> src, size_t numElements,
size_t get(int dimension) const; object executing at each point in a local range<int dimensions> size_t destStride) const;
size_t &operator[](int dimension); passed to a parallel_for_work_item call or to the corresponding
template <typename... eventTN>
size_t operator[](int dimension) const; parallel_for_work_group call if no range is passed to the
parallel_for_work_item call. void wait_for(eventTN... events) const;
Class item [4.9.1.4] Class declaration Class sub_group [4.9.1.8]
Identifies an instance of the function object executing at each template <int dimensions> class h_item; Encapsulates all functionality required to represent a particular
point in a range. It is passed to a parallel_for call or returned by Members sub-group within a parallel execution. It is not user-constructible.
member functions of h_item. item<dimensions, false> get_global() const; Members
Class declaration item<dimensions, false> get_local() const; id<1> get_group_id() const;
template <int dimensions = 1, bool with_offset = true> item<dimensions, false> get_logical_local() const; id<1> get_local_id() const;
class item;
item<dimensions, false> get_physical_local() const; range<1> get_local_range() const;
Members
id<dimensions> get_id() const; range<dimensions> get_global_range() const; range<1> get_group_range() const;
size_t get_global_range(int dimension) const; range<1> get_max_local_range() const;
size_t get_id(int dimension) const;
id<dimensions> get_global_id() const; uint32_t get_group_linear_id() const;
size_t operator[](int dimension) const; uint32_t get_local_linear_id() const;
size_t get_global_id(int dimension) const;
range<dimensions> get_range() const; range<dimensions> get_local_range() const; uint32_t get_group_linear_range() const;
size_t get_range(int dimension) const; size_t get_local_range(int dimension) const; uint32_t get_local_linear_range() const;
Available if with_offset is false id<dimensions> get_local_id() const; bool leader() const;
operator item<dimensions, true>() const; size_t get_local_id(int dimension) const;
range<dimensions> get_logical_local_range() const;
Available if dimensions == 1 size_t get_logical_local_range(int dimension) const;
operator size_t() const; id<dimensions> get_logical_local_id() const;
size_t get_logical_local_id(int dimension) const;
size_t get_linear_id() const; range<dimensions> get_physical_local_range() const;
size_t get_physical_local_range(int dimension) const;
id<dimensions> get_physical_local_id() const;
size_t get_physical_local_id(int dimension) const;
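A minimal sketch (hypothetical queue and buffer names) showing how an nd_range dispatch pairs with the nd_item index queries above:

// Hypothetical names: myQueue, myBuffer. Launch 1024 work-items in
// work-groups of 64 and index the buffer by the global linear id.
myQueue.submit([&](handler& cgh) {
  accessor acc { myBuffer, cgh, write_only, no_init };
  cgh.parallel_for(nd_range<1>{ range<1>{1024}, range<1>{64} },
                   [=](nd_item<1> it) {
    size_t g = it.get_global_linear_id(); // index in the global range
    size_t l = it.get_local_linear_id();  // index within the work-group
    acc[g] = static_cast<int>(l);
  });
});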



SYCL 2020 API Reference Guide Page 9
Reduction variables [4.9.2]
Reductions are supported for all SYCL copyable types.

template <typename BufferT, typename BinaryOperation>
__unspecified__ reduction(BufferT vars, handler& cgh,
    BinaryOperation combiner, const property_list &propList = {});
template <typename T, typename BinaryOperation>
__unspecified__ reduction(T* var, BinaryOperation combiner,
    const property_list &propList = {});
template <typename T, typename Extent, typename BinaryOperation>
__unspecified__ reduction(span<T, Extent> vars,
    BinaryOperation combiner, const property_list &propList = {});

Available if has_known_identity<BinaryOperation, BufferT::value_type>::value is false
template <typename BufferT, typename BinaryOperation>
__unspecified__ reduction(BufferT vars, handler& cgh,
    const BufferT::value_type& identity, BinaryOperation combiner,
    const property_list &propList = {});

Available if has_known_identity<BinaryOperation, T>::value is false
template <typename T, typename BinaryOperation>
__unspecified__ reduction(T* var, const T& identity,
    BinaryOperation combiner, const property_list &propList = {});

Available if has_known_identity<BinaryOperation, T>::value is false
template <typename T, typename Extent, typename BinaryOperation>
__unspecified__ reduction(span<T, Extent> vars, const T& identity,
    BinaryOperation combiner);

Reduction property constructor [4.9.2.2]
property::reduction::initialize_to_identity::initialize_to_identity()

Reducer class functions [4.9.2.3]
Defines the interface between a work-item and a reduction variable during the execution of a SYCL kernel, restricting access to the underlying reduction variable.

Member functions
void combine(const T& partial) const;
__unspecified__ &operator[](size_t index) const;
T identity() const;

Operators
template <typename T>
void operator+=(reducer<T, plus<T>, 0>& accum, const T& partial);
template <typename T>
void operator*=(reducer<T, multiplies<T>, 0>& accum, const T& partial);

Available only for integral types
template <typename T>
void operator&=(reducer<T, bit_and<T>, 0>& accum, const T& partial);
template <typename T>
void operator|=(reducer<T, bit_or<T>, 0>& accum, const T& partial);
template <typename T>
void operator^=(reducer<T, bit_xor<T>, 0>& accum, const T& partial);
template <typename T>
void operator++(reducer<T, plus<T>, 0>& accum);

Reduction kernel example [4.9.2]
The following example shows how to write a reduction kernel that performs two reductions simultaneously on the same input values, computing both the sum of all values in a buffer and the maximum value in the buffer.

buffer<int> valuesBuf { 1024 };
{
  // Initialize buffer on the host with 0, 1, 2, 3, ..., 1023
  host_accessor a { valuesBuf };
  std::iota(a.begin(), a.end(), 0);
}

// Buffers with just 1 element to hold the reduction results
int sumResult = 0;
buffer<int> sumBuf { &sumResult, 1 };
int maxResult = 0;
buffer<int> maxBuf { &maxResult, 1 };

myQueue.submit([&](handler& cgh) {
  // Input values to reductions are standard accessors
  auto inputValues = valuesBuf.get_access<access_mode::read>(cgh);

  // Create temporary objects describing variables with
  // reduction semantics
  auto sumReduction = reduction(sumBuf, cgh, plus<>());
  auto maxReduction = reduction(maxBuf, cgh, maximum<>());

  // parallel_for performs two reduction operations
  // For each reduction variable, the implementation:
  // - Creates a corresponding reducer
  // - Passes a reference to the reducer to the lambda as a parameter
  cgh.parallel_for(range<1>{1024}, sumReduction, maxReduction,
      [=](id<1> idx, auto& sum, auto& max) {
    // plus<>() corresponds to += operator, so sum can be
    // updated via += or combine()
    sum += inputValues[idx];

    // maximum<>() has no shorthand operator, so max
    // can only be updated via combine()
    max.combine(inputValues[idx]);
  });
});

// sumBuf and maxBuf contain the reduction results once
// the kernel completes
assert(maxBuf.get_host_access()[0] == 1023
    && sumBuf.get_host_access()[0] == 523776);

Command group handler class [4.9.4]
Class handler
A command group handler object can only be constructed by the SYCL runtime. All of the accessors defined in command group scope take as a parameter an instance of the command group handler, and all the kernel invocation functions are member functions of this class.

template <typename dataT, int dimensions, access_mode accessMode,
    target accessTarget, access::placeholder isPlaceholder>
void require(accessor<dataT, dimensions, accessMode,
    accessTarget, isPlaceholder> acc);
void depends_on(event depEvent);
void depends_on(const std::vector<event> &depEvents);
template<auto& S>
typename std::remove_reference_t<decltype(S)>::type
get_specialization_constant();

Backend interoperability interface
template <typename T>
void set_arg(int argIndex, T && arg);
template <typename... Ts>
void set_args(Ts &&... args);

Member function for using a kernel bundle [4.9.4.4]
void use_kernel_bundle(const kernel_bundle<
    bundle_state::executable> &execBundle);

Kernel dispatch API
template <typename KernelName, typename KernelType>
void single_task(const KernelType &kernelFunc);
template <typename KernelName, int dimensions, typename... Rest>
void parallel_for(range<dimensions> numWorkItems, Rest&&... rest);
template <typename KernelName, int dimensions, typename... Rest>
void parallel_for(nd_range<dimensions> executionRange, Rest&&... rest);
template <typename KernelName, typename WorkgroupFunctionType, int dimensions>
void parallel_for_work_group(range<dimensions> numWorkGroups,
    const WorkgroupFunctionType &kernelFunc);
template <typename KernelName, typename WorkgroupFunctionType, int dimensions>
void parallel_for_work_group(range<dimensions> numWorkGroups,
    range<dimensions> workGroupSize,
    const WorkgroupFunctionType &kernelFunc);
void single_task(const kernel &kernelObject);
template <int dimensions>
void parallel_for(range<dimensions> numWorkItems,
    const kernel &kernelObject);
template <int dimensions>
void parallel_for(nd_range<dimensions> ndRange,
    const kernel &kernelObject);

USM functions
void memcpy(void *dest, const void *src, size_t numBytes);
template <typename T>
void copy(const T *src, T *dest, size_t count);
void memset(void *ptr, int value, size_t numBytes);
template <typename T>
void fill(void *ptr, const T &pattern, size_t count);
void prefetch(void *ptr, size_t numBytes);
void mem_advise(void *ptr, size_t numBytes, int advice);

Explicit memory operation APIs
In addition to kernels, command group objects can also be used to perform manual operations on host and device memory by using the copy API of the command group handler. The following are members of class handler.

template <typename T_src, int dim_src, access_mode mode_src,
    target tgt_src, access::placeholder isPlaceholder, typename T_dest>
void copy(accessor<T_src, dim_src, mode_src, tgt_src, isPlaceholder> src,
    std::shared_ptr<T_dest> dest);
template <typename T_src, typename T_dest, int dim_dest,
    access_mode mode_dest, target tgt_dest, access::placeholder isPlaceholder>
void copy(std::shared_ptr<T_src> src,
    accessor<T_dest, dim_dest, mode_dest, tgt_dest, isPlaceholder> dest);
template <typename T_src, int dim_src, access_mode mode_src,
    target tgt_src, access::placeholder isPlaceholder, typename T_dest>
void copy(accessor<T_src, dim_src, mode_src, tgt_src, isPlaceholder> src,
    T_dest *dest);
template <typename T_src, typename T_dest, int dim_dest,
    access_mode mode_dest, target tgt_dest, access::placeholder isPlaceholder>
void copy(const T_src *src,
    accessor<T_dest, dim_dest, mode_dest, tgt_dest, isPlaceholder> dest);
template <typename T_src, int dim_src, access_mode mode_src,
    target tgt_src, access::placeholder isPlaceholder_src,
    typename T_dest, int dim_dest, access_mode mode_dest,
    target tgt_dest, access::placeholder isPlaceholder_dest>
void copy(accessor<T_src, dim_src, mode_src, tgt_src, isPlaceholder_src> src,
    accessor<T_dest, dim_dest, mode_dest, tgt_dest, isPlaceholder_dest> dest);
template <typename T, int dim, access_mode mode, target tgt,
    access::placeholder isPlaceholder>
void update_host(accessor<T, dim, mode, tgt, isPlaceholder> acc);
template <typename T, int dim, access_mode mode, target tgt,
    access::placeholder isPlaceholder>
void fill(accessor<T, dim, mode, tgt, isPlaceholder> dest, const T& src);
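A minimal sketch (hypothetical queue, buffer and vector names) of an explicit memory operation: copying a buffer's contents to host memory with handler::copy instead of a kernel.

// Hypothetical names: myQueue, myBuffer (a buffer<int, 1> of size 1024).
std::vector<int> hostDst(1024);
myQueue.submit([&](handler& cgh) {
  accessor src { myBuffer, cgh, read_only };
  cgh.copy(src, hostDst.data()); // accessor -> host pointer
});
myQueue.wait(); // hostDst is valid once the command completes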



Page 10 SYCL 2020 API Reference Guide
Specialization constants [4.9.5]

Class specialization_id declaration
template <typename T>
class specialization_id;

Class specialization_id constructor
template <class... Args>
explicit constexpr specialization_id(Args&&... args);

Members of class handler
template<auto& SpecName>
void set_specialization_constant(
    typename std::remove_reference_t<decltype(SpecName)>::type value);
template<auto& SpecName>
typename std::remove_reference_t<decltype(SpecName)>::type
get_specialization_constant();

Member of class kernel_handler
template<auto& SpecName>
typename std::remove_reference_t<decltype(SpecName)>::type
get_specialization_constant();
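A minimal sketch (hypothetical names) of declaring and using a specialization constant; the kernel reads the value through a kernel_handler parameter:

// Hypothetical specialization constant, declared at namespace scope,
// with default value 4.
constexpr specialization_id<int> factor{4};

myQueue.submit([&](handler& cgh) {
  cgh.set_specialization_constant<factor>(8); // override the default
  cgh.parallel_for(range<1>{1024}, [=](id<1> idx, kernel_handler kh) {
    int f = kh.get_specialization_constant<factor>();
    // ... use f as a value specialized at kernel compilation time ...
  });
});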
Host tasks [4.10]

Class interop_handle [4.10.1-2]
An abstraction over the queue which is being used to invoke the host task and its associated device and context.

Member functions
backend get_backend() const noexcept;

Available only if the optional interoperability function get_native taking a buffer is available and if accTarget is target::device.
template <backend Backend, typename dataT, int dims,
    access_mode accMode, target accTarget, access::placeholder isPlaceholder>
backend_return_t<Backend, buffer<dataT, dims>>
get_native_mem(const accessor<dataT, dims, accMode,
    accTarget, isPlaceholder> &bufferAcc) const;

Available only if the optional interoperability function get_native taking an unsampled_image is available.
template <backend Backend, typename dataT, int dims, access_mode accMode>
backend_return_t<Backend, unsampled_image<dims>>
get_native_mem(const unsampled_image_accessor<dataT, dims,
    accMode, image_target::device> &imageAcc) const;

Available only if the optional interoperability function get_native taking a sampled_image is available.
template <backend Backend, typename dataT, int dims>
backend_return_t<Backend, sampled_image<dims>>
get_native_mem(const sampled_image_accessor<dataT, dims,
    image_target::device> &imageAcc) const;

Available only if the optional interoperability function get_native taking a queue is available.
template <backend Backend>
backend_return_t<Backend, queue> get_native_queue() const;

Available only if the optional interoperability function get_native taking a device is available.
template <backend Backend>
backend_return_t<Backend, device> get_native_device() const;

Available only if the optional interoperability function get_native taking a context is available.
template <backend Backend>
backend_return_t<Backend, context> get_native_context() const;

Addition to class handler [4.10.3]
template <typename T>
void host_task(T &&hostTaskCallable);

Class private_memory [4.10.4.2.3]
To guarantee use of private per-work-item memory, the private_memory class can be used to wrap the data.
template <typename T, int Dimensions = 1>
class private_memory {
 public:
  private_memory(const group<Dimensions> &);
  T &operator()(const h_item<Dimensions> &id);
};
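A minimal sketch (hypothetical queue name) of enqueuing a host task with the host_task member described above; the callable runs on the host once its dependencies in the task graph are satisfied:

myQueue.submit([&](handler& cgh) {
  cgh.host_task([=]() {
    // Runs on the host as part of the SYCL task graph, e.g. for I/O,
    // or for backend interop via an optional interop_handle parameter.
    std::cout << "host task executed\n";
  });
});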
Defining kernels [4.12]
Functions that are executed on a SYCL device are SYCL kernel functions. A kernel containing a SYCL kernel function is enqueued on a device queue in order to be executed on that device. The return type of the SYCL kernel function is void.
There are two ways of defining kernels: as named function objects or as lambda functions.

Defining kernels as named function objects [4.12.1]
A kernel can be defined as a named function object type and provide the same functionality as any C++ function object. For example:

class RandomFiller {
 public:
  RandomFiller(accessor<int> ptr) : ptr_ { ptr } {
    std::random_device hwRand;
    std::uniform_int_distribution<> r { 1, 100 };
    randomNum_ = r(hwRand);
  }
  void operator()(item<1> item) const { ptr_[item.get_id()] = get_random(); }
  int get_random() const { return randomNum_; }

 private:
  accessor<int> ptr_;
  int randomNum_;
};

void workFunction(buffer<int, 1>& b, queue& q, const range<1> r) {
  q.submit([&](handler& cgh) {
    accessor ptr { b, cgh };
    RandomFiller filler { ptr };

    cgh.parallel_for(r, filler);
  });
}

Defining kernels as lambda functions [4.12.2]
Kernels may be defined as lambda functions. The name of a lambda function in SYCL may optionally be specified by passing it as a template parameter to the invoking member function. For example:

// Explicit kernel names can be optionally forward declared
// at namespace scope
class MyKernel;

myQueue.submit([&](handler& h) {
  // Explicitly name kernel with previously forward-declared type
  h.single_task<MyKernel>([=]{
    // [kernel code]
  });

  // Explicitly name kernel without forward declaring type at
  // namespace scope. Must still be forward declarable at
  // namespace scope, even if not declared at that scope
  h.single_task<class MyOtherKernel>([=]{
    // [kernel code]
  });
});

Classes exception & exception_list [4.13.2]
Class exception is derived from std::exception.

Members of class exception
exception(std::error_code ec, const std::string& what_arg);
exception(std::error_code ec, const char * what_arg);
exception(std::error_code ec);
exception(int ev, const std::error_category& ecat, const std::string& what_arg);
exception(int ev, const std::error_category& ecat, const char* what_arg);
exception(int ev, const std::error_category& ecat);
exception(context ctx, std::error_code ec, const std::string& what_arg);
exception(context ctx, std::error_code ec, const char* what_arg);
exception(context ctx, std::error_code ec);
exception(context ctx, int ev, const std::error_category& ecat,
    const std::string& what_arg);
exception(context ctx, int ev, const std::error_category& ecat,
    const char* what_arg);
exception(context ctx, int ev, const std::error_category& ecat);
const std::error_code& code() const noexcept;
const std::error_category& category() const noexcept;
bool has_context() const noexcept;
context get_context() const;

Members of class exception_list
size_type size() const;
iterator begin() const;
iterator end() const;

Helper functions
Free functions:
const std::error_category& sycl_category() noexcept;
template <backend b>
const std::error_category& error_category_for() noexcept;
std::error_condition make_error_condition(errc e) noexcept;
std::error_code make_error_code(errc e) noexcept;

Standard error codes (enum errc)
runtime, kernel, accessor, nd_range, event, kernel_argument, build, invalid, memory_allocation, platform, profiling, feature_not_supported, kernel_not_supported, backend_mismatch
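A minimal sketch of catching a synchronous sycl::exception and inspecting its error code against the errc values above:

try {
  queue q;
  q.submit([&](handler& cgh) { /* ... */ }).wait();
} catch (const sycl::exception& e) {
  if (e.code() == errc::kernel_not_supported)
    std::cerr << "kernel not supported on this device\n";
  else
    std::cerr << "SYCL exception: " << e.what() << "\n";
}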
Synchronization and atomics [4.15]
Enums
class memory_order
relaxed, acquire, release, acq_rel, seq_cst

class memory_scope
work_item, sub_group, work_group, device, system

atomic_fence [4.15.1]
Free function:
void atomic_fence(memory_order order, memory_scope scope);

Class device_event [4.15.2]
Class device_event encapsulates a single SYCL device event which is available only within SYCL kernel functions and can be used to wait for asynchronous operations within a SYCL kernel function to complete. The class has an unspecified constructor and one other member:
void wait() noexcept;

Class atomic_ref [4.15.3]
Class declaration
template <typename T, memory_order DefaultOrder,
    memory_scope DefaultScope,
    access::address_space Space = access::address_space::generic_space>
class atomic_ref;

Constructors and members
explicit atomic_ref(T& ref);
atomic_ref(const atomic_ref&) noexcept;
bool is_lock_free() const noexcept;
void store(T operand,
    memory_order order = default_write_order,
    memory_scope scope = default_scope) const noexcept;
T operator=(T desired) const noexcept;
T load(memory_order order = default_read_order,
    memory_scope scope = default_scope) const noexcept;
(Continued on next page)



SYCL 2020 API Reference Guide Page 11
Synchronization and atomics (cont.)

Members (continued)
operator T() const noexcept;
T exchange(T operand,
    memory_order order = default_read_modify_write_order,
    memory_scope scope = default_scope) const noexcept;
bool compare_exchange_weak(T &expected, T desired,
    memory_order success, memory_order failure,
    memory_scope scope = default_scope) const noexcept;
bool compare_exchange_weak(T &expected, T desired,
    memory_order order = default_read_modify_write_order,
    memory_scope scope = default_scope) const noexcept;
bool compare_exchange_strong(T &expected, T desired,
    memory_order success, memory_order failure,
    memory_scope scope = default_scope) const noexcept;
bool compare_exchange_strong(T &expected, T desired,
    memory_order order = default_read_modify_write_order,
    memory_scope scope = default_scope) const noexcept;

atomic_ref specialized for integral types
Class declaration
template <memory_order DefaultOrder, memory_scope DefaultScope,
    access::address_space Space = access::address_space::generic_space>
class atomic_ref<Integral, DefaultOrder, DefaultScope, Space>;

Members
Integral fetch_add(Integral operand,
    memory_order order = default_read_modify_write_order,
    memory_scope scope = default_scope) const noexcept;
Integral fetch_sub(Integral operand,
    memory_order order = default_read_modify_write_order,
    memory_scope scope = default_scope) const noexcept;
Integral fetch_and(Integral operand,
    memory_order order = default_read_modify_write_order,
    memory_scope scope = default_scope) const noexcept;
Integral fetch_or(Integral operand,
    memory_order order = default_read_modify_write_order,
    memory_scope scope = default_scope) const noexcept;
Integral fetch_min(Integral operand,
    memory_order order = default_read_modify_write_order,
    memory_scope scope = default_scope) const noexcept;
Integral fetch_max(Integral operand,
    memory_order order = default_read_modify_write_order,
    memory_scope scope = default_scope) const noexcept;
OP is ++, --
Integral operatorOP(int) const noexcept;
Integral operatorOP() const noexcept;
OP is +=, -=, &=, |=, ^=
Integral operatorOP(Integral) const noexcept;

atomic_ref specialized for floating point
Class declaration
template <memory_order DefaultOrder, memory_scope DefaultScope,
    access::address_space Space = access::address_space::generic_space>
class atomic_ref<Floating, DefaultOrder, DefaultScope, Space>;

Members
Floating fetch_add(Floating operand,
    memory_order order = default_read_modify_write_order,
    memory_scope scope = default_scope) const noexcept;
Floating fetch_sub(Floating operand,
    memory_order order = default_read_modify_write_order,
    memory_scope scope = default_scope) const noexcept;
Floating fetch_min(Floating operand,
    memory_order order = default_read_modify_write_order,
    memory_scope scope = default_scope) const noexcept;
Floating fetch_max(Floating operand,
    memory_order order = default_read_modify_write_order,
    memory_scope scope = default_scope) const noexcept;
OP is +=, -=
Floating operatorOP(Floating) const noexcept;

atomic_ref specialized for pointer types
Class declaration
template <typename T, memory_order DefaultOrder,
    memory_scope DefaultScope,
    access::address_space Space = access::address_space::generic_space>
class atomic_ref<T*, DefaultOrder, DefaultScope, Space>;

Constructors and members
explicit atomic_ref(T*&);
atomic_ref(const atomic_ref&) noexcept;
void store(T* operand,
    memory_order order = default_write_order,
    memory_scope scope = default_scope) const noexcept;
T* operator=(T* desired) const noexcept;
T* load(memory_order order = default_read_order,
    memory_scope scope = default_scope) const noexcept;
operator T*() const noexcept;
T* exchange(T* operand,
    memory_order order = default_read_modify_write_order,
    memory_scope scope = default_scope) const noexcept;
bool compare_exchange_weak(T* &expected, T* desired,
    memory_order success, memory_order failure,
    memory_scope scope = default_scope) const noexcept;
bool compare_exchange_weak(T* &expected, T* desired,
    memory_order order = default_read_modify_write_order,
    memory_scope scope = default_scope) const noexcept;
bool compare_exchange_strong(T* &expected, T* desired,
    memory_order success, memory_order failure,
    memory_scope scope = default_scope) const noexcept;
bool compare_exchange_strong(T* &expected, T* desired,
    memory_order order = default_read_modify_write_order,
    memory_scope scope = default_scope) const noexcept;
T* fetch_add(difference_type,
    memory_order order = default_read_modify_write_order,
    memory_scope scope = default_scope) const noexcept;
T* fetch_sub(difference_type,
    memory_order order = default_read_modify_write_order,
    memory_scope scope = default_scope) const noexcept;
OP is ++, --
T* operatorOP(int) const noexcept;
T* operatorOP() const noexcept;
OP is +=, -=
T* operatorOP(difference_type) const noexcept;
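A minimal sketch (hypothetical names) using atomic_ref to accumulate into a single USM counter from many work-items:

// Hypothetical: counter is a USM shared allocation visible to the device.
int* counter = malloc_shared<int>(1, myQueue);
*counter = 0;
myQueue.parallel_for(range<1>{1024}, [=](id<1>) {
  atomic_ref<int, memory_order::relaxed, memory_scope::device,
             access::address_space::global_space> ref(*counter);
  ref.fetch_add(1); // one atomic increment per work-item
}).wait();
// *counter == 1024 here
free(counter, myQueue);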
Scalar data types [4.14.1]
SYCL supports the C++ fundamental data types (not within the sycl namespace) and the data types byte and half (in the sycl namespace).

Class stream [4.16]
Enums
stream_manipulator
dec, hex, oct, noshowbase, showbase, noshowpos, showpos, endl, fixed, scientific, hexfloat, defaultfloat, flush

Constructor and members
stream(size_t totalBufferSize, size_t workItemBufferSize,
    handler& cgh, const property_list &propList = {});
size_t size() const noexcept;
size_t get_work_item_buffer_size() const;

Non-member function
template <typename T>
const stream& operator<<(const stream& os, const T &rhs);

Function objects [4.17.2]
SYCL provides a number of function objects in the sycl namespace on host and device that obey C++ conversion and promotion rules.

template <typename T=void>
struct plus {
  T operator()(const T & x, const T & y) const;
};
template <typename T=void>
struct multiplies {
  T operator()(const T & x, const T & y) const;
};
template <typename T=void>
struct bit_and {
  T operator()(const T & x, const T & y) const;
};
template <typename T=void>
struct bit_or {
  T operator()(const T & x, const T & y) const;
};
template <typename T=void>
struct bit_xor {
  T operator()(const T & x, const T & y) const;
};
template <typename T=void>
struct logical_and {
  T operator()(const T & x, const T & y) const;
};
template <typename T=void>
struct logical_or {
  T operator()(const T & x, const T & y) const;
};
template <typename T=void>
struct minimum {
  T operator()(const T & x, const T & y) const;
};
template <typename T=void>
struct maximum {
  T operator()(const T & x, const T & y) const;
};
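A minimal sketch (hypothetical queue name) of kernel-side printing with the stream class above:

myQueue.submit([&](handler& cgh) {
  stream out(1024, 256, cgh); // total and per-work-item buffer sizes
  cgh.parallel_for(range<1>{4}, [=](id<1> idx) {
    out << "work-item " << idx << endl;
  });
});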
Group functions and algorithms

Group functions [4.17.3]
template <typename Group, typename T>
T group_broadcast(Group g, T x);
template <typename Group, typename T>
T group_broadcast(Group g, T x, Group::linear_id_type local_linear_id);
template <typename Group, typename T>
T group_broadcast(Group g, T x, Group::id_type local_id);
template <typename Group>
void group_barrier(Group g, memory_scope fence_scope = Group::fence_scope);

Group algorithms [4.17.4]
template <typename Group, typename Ptr, typename Predicate>
bool joint_any_of(Group g, Ptr first, Ptr last, Predicate pred);
template <typename Group, typename T, typename Predicate>
bool any_of_group(Group g, T x, Predicate pred);
template <typename Group>
bool any_of_group(Group g, bool pred);
template <typename Group, typename Ptr, typename Predicate>
bool joint_all_of(Group g, Ptr first, Ptr last, Predicate pred);
template <typename Group, typename T, typename Predicate>
bool all_of_group(Group g, T x, Predicate pred);
template <typename Group>
bool all_of_group(Group g, bool pred);
template <typename Group, typename Ptr, typename Predicate>
bool joint_none_of(Group g, Ptr first, Ptr last, Predicate pred);
template <typename Group, typename T, typename Predicate>
bool none_of_group(Group g, T x, Predicate pred);
template <typename Group>
bool none_of_group(Group g, bool pred);
template <typename Group, typename T>
T shift_group_left(Group g, T x, Group::linear_id_type delta = 1);
template <typename Group, typename T>
T shift_group_right(Group g, T x, Group::linear_id_type delta = 1);
template <typename Group, typename T>
T permute_group_by_xor(Group g, T x, Group::linear_id_type mask);
template <typename Group, typename T>
T select_from_group(Group g, T x, Group::id_type remote_local_id);
template <typename Group, typename Ptr, typename BinaryOperation>
std::iterator_traits<Ptr>::value_type joint_reduce(Group g,
    Ptr first, Ptr last, BinaryOperation binary_op);
template <typename Group, typename Ptr, typename T,
    typename BinaryOperation>
T joint_reduce(Group g, Ptr first, Ptr last, T init,
    BinaryOperation binary_op);
(Continued on next page)



Page 12 SYCL 2020 API Reference Guide
Group functions and algorithms (cont.)

template <typename Group, typename T, typename BinaryOperation>
T reduce_over_group(Group g, T x, BinaryOperation binary_op);
template <typename Group, typename V, typename T,
    typename BinaryOperation>
T reduce_over_group(Group g, V x, T init, BinaryOperation binary_op);
template <typename Group, typename InPtr, typename OutPtr,
    typename BinaryOperation>
OutPtr joint_exclusive_scan(Group g, InPtr first, InPtr last,
    OutPtr result, BinaryOperation binary_op);
template <typename Group, typename InPtr, typename OutPtr,
    typename T, typename BinaryOperation>
T joint_exclusive_scan(Group g, InPtr first, InPtr last,
    OutPtr result, T init, BinaryOperation binary_op);
template <typename Group, typename T, typename BinaryOperation>
T exclusive_scan_over_group(Group g, T x, BinaryOperation binary_op);
template <typename Group, typename V, typename T,
    typename BinaryOperation>
T exclusive_scan_over_group(Group g, V x, T init,
    BinaryOperation binary_op);
template <typename Group, typename InPtr, typename OutPtr,
    typename BinaryOperation>
OutPtr joint_inclusive_scan(Group g, InPtr first, InPtr last,
    OutPtr result, BinaryOperation binary_op);
template <typename Group, typename InPtr, typename OutPtr,
    typename T, typename BinaryOperation>
T joint_inclusive_scan(Group g, InPtr first, InPtr last,
    OutPtr result, BinaryOperation binary_op, T init);
template <typename Group, typename T, typename BinaryOperation>
T inclusive_scan_over_group(Group g, T x, BinaryOperation binary_op);
template <typename Group, typename V, typename T,
    typename BinaryOperation>
T inclusive_scan_over_group(Group g, V x, BinaryOperation binary_op, T init);
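A minimal sketch (hypothetical buffer names) combining an nd_range kernel with reduce_over_group; each work-group writes the sum of its work-items' values to one output slot:

myQueue.submit([&](handler& cgh) {
  accessor in  { inBuf,  cgh, read_only };
  accessor out { outBuf, cgh, write_only, no_init }; // one slot per group
  cgh.parallel_for(nd_range<1>{ range<1>{1024}, range<1>{64} },
                   [=](nd_item<1> it) {
    int partial = in[it.get_global_id()];
    int groupSum = reduce_over_group(it.get_group(), partial, plus<>());
    if (it.get_group().leader())
      out[it.get_group_linear_id()] = groupSum;
  });
});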

Math functions [4.17.5]
Math functions are available in the namespace sycl for host and device. In all cases below, n may be 2, 3, 4, 8, or 16.
Tf (genfloat in the spec) is type float[n], double[n], or half[n].
Tff (genfloatf) is type float[n].
Tfd (genfloatd) is type double[n].
Th (genfloath) is type half[n].
sTf (sgenfloat) is type float, double, or half.
Ti (genint) is type int[n].
uTi (ugenint) is type unsigned int or uintn.
uTli (ugenlonginteger) is unsigned long int, ulonglongn, ulongn, unsigned long long int.
N indicates native variants, available in sycl::native.
H indicates half variants, available in sycl::half_precision, implemented with a minimum of 10 bits of accuracy.

Tf acos (Tf x) - Arc cosine
Tf acosh (Tf x) - Inverse hyperbolic cosine
Tf acospi (Tf x) - acos (x) / π
Tf asin (Tf x) - Arc sine
Tf asinh (Tf x) - Inverse hyperbolic sine
Tf asinpi (Tf x) - asin (x) / π
Tf atan (Tf y_over_x) - Arc tangent
Tf atan2 (Tf y, Tf x) - Arc tangent of y / x
Tf atanh (Tf x) - Hyperbolic arc tangent
Tf atanpi (Tf x) - atan (x) / π
Tf atan2pi (Tf y, Tf x) - atan2 (y, x) / π
Tf cbrt (Tf x) - Cube root
Tf ceil (Tf x) - Round to integer toward +infinity
Tf copysign (Tf x, Tf y) - x with sign changed to sign of y
Tf cos (Tf x); Tff cos (Tff x) N H - Cosine
Tf cosh (Tf x) - Hyperbolic cosine
Tf cospi (Tf x) - cos (π x)
Tff divide (Tff x, Tff y) N H - x/y (Not available in cl::sycl.)
Tf erfc (Tf x) - Complementary error function
Tf erf (Tf x) - Error function
Tf exp (Tf x); Tff exp (Tff x) N H - Exponential base e
Tf exp2 (Tf x); Tff exp2 (Tff x) N H - Exponential base 2
Tf exp10 (Tf x); Tff exp10 (Tff x) N H - Exponential base 10
Tf expm1 (Tf x) - e^x - 1.0
Tf fabs (Tf x) - Absolute value
Tf fdim (Tf x, Tf y) - Positive difference between x and y
Tf floor (Tf x) - Round to integer toward -infinity
Tf fma (Tf a, Tf b, Tf c) - Multiply and add, then round
Tf fmax (Tf x, Tf y); Tf fmax (Tf x, sTf y) - Return y if x < y, otherwise it returns x
Tf fmin (Tf x, Tf y); Tf fmin (Tf x, sTf y) - Return y if y < x, otherwise it returns x
Tf fmod (Tf x, Tf y) - Modulus. Returns x - y * trunc (x/y)
Tf fract (Tf x, Tf *iptr) - Fractional value in x
Tf frexp (Tf x, Ti *exp) - Extract mantissa and exponent
Tf hypot (Tf x, Tf y) - Square root of x^2 + y^2
Ti ilogb (Tf x) - Return exponent as an integer value
Tf ldexp (Tf x, Ti k); doublen ldexp (doublen x, int k) - x * 2^k
Tf lgamma (Tf x) - Log gamma function
Tf lgamma_r (Tf x, Ti *signp) - Log gamma function
Tf log (Tf x); Tff log (Tff x) N H - Natural logarithm
Tf log2 (Tf x); Tff log2 (Tff x) N H - Base 2 logarithm
Tf log10 (Tf x); Tff log10 (Tff x) N H - Base 10 logarithm
Tf log1p (Tf x) - ln (1.0 + x)
Tf logb (Tf x) - Return exponent as an integer value
Tf mad (Tf a, Tf b, Tf c) - Approximates a * b + c
Tf maxmag (Tf x, Tf y) - Maximum magnitude of x and y
Tf minmag (Tf x, Tf y) - Minimum magnitude of x and y
Tf modf (Tf x, Tf *iptr) - Decompose floating-point number
Tff nan (uTi nancode); Tfd nan (uTli nancode) - Quiet NaN (return is scalar when nancode is scalar)
Tf nextafter (Tf x, Tf y) - Next representable floating-point value after x in the direction of y
Tf pow (Tf x, Tf y) - Compute x to the power of y
Tf pown (Tf x, Ti y) - Compute x^y, where y is an integer
Tf powr (Tf x, Tf y); Tff powr (Tff x, Tff y) N; Tff powr (Tff x, Th y) H - Compute x^y, where x is >= 0
Tff recip (Tff x) N H - 1/x (Not available in cl::sycl.)
Tf remainder (Tf x, Tf y) - Floating point remainder
Tf remquo (Tf x, Tf y, Ti *quo) - Remainder and quotient
Tf rint (Tf x) - Round to nearest even integer
Tf rootn (Tf x, Ti y) - Compute x to the power of 1/y
Tf round (Tf x) - Integral value nearest to x, rounding halfway cases away from zero
Tf rsqrt (Tf x); Tff rsqrt (Tff x) N H - Inverse square root
Tf sin (Tf x); Tff sin (Tff x) N H - Sine
Tf sincos (Tf x, Tf *cosval) - Sine and cosine of x
Tf sinh (Tf x) - Hyperbolic sine
Tf sinpi (Tf x) - sin (π x)
Tf sqrt (Tf x); Tff sqrt (Tff x) N H - Square root
Tf tan (Tf x); Tff tan (Tff x) N H - Tangent
Tf tanh (Tf x) - Hyperbolic tangent
Tf tanpi (Tf x) - tan (π x)
Tf tgamma (Tf x) - Gamma function
Tf trunc (Tf x) - Round to integer toward zero
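A short sketch contrasting the default-precision and native variants named above (hypothetical queue, USM pointer data and size n):

myQueue.parallel_for(range<1>{n}, [=](id<1> i) {
  float x = data[i];
  float slow = sycl::sqrt(x);         // full-precision variant
  float fast = sycl::native::sqrt(x); // faster, reduced accuracy (N variant)
  data[i] = slow + fast;
});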

Integer functions [4.17.6]
Integer functions are available in the namespace sycl. In all cases below, n may be 2, 3, 4, 8, or 16. If a type in the functions below is shown with [xbit] in its name, this indicates that the type is x bits in size. Parameter types may also be their vec and marray counterparts.
Tint (geninteger in the spec) is type int[n], uint[n], unsigned int, char, char[n], signed char, scharn, ucharn, unsigned short[n], unsigned short, ushort[n], longn, ulongn, long int, unsigned long int, long long int, longlongn, ulonglongn, unsigned long long int.
uTint (ugeninteger) is type unsigned char, ucharn, unsigned short, ushortn, unsigned int, uintn, unsigned long int, ulongn, ulonglongn, unsigned long long int.
iTint (igeninteger) is type signed char, scharn, short[n], int[n], long int, longn, long long int, longlongn.
sTint (sgeninteger) is type char, signed char, unsigned char, short, unsigned short, int, unsigned int, long int, unsigned long int, long long int, unsigned long long int.

uTint abs (Tint x) - |x|
uTint abs_diff (Tint x, Tint y) - |x - y| without modulo overflow
Tint add_sat (Tint x, Tint y) - x + y and saturates the result
Tint clz (Tint x) - Count of leading 0-bits in x, starting at the most significant bit position. If x is 0, returns the size in bits of the type of x, or of the component type of x if x is a vector type.
Tint ctz (Tint x) - Count of trailing 0-bits in x. If x is 0, returns the size in bits of the type of x, or of the component type of x if x is a vector type.
Tint hadd (Tint x, Tint y) - (x + y) >> 1 without modulo overflow
Tint rhadd (Tint x, Tint y) - (x + y + 1) >> 1
Tint clamp (Tint x, Tint min, Tint max); Tint clamp (Tint x, sTint min, sTint max) - min(max(x, min), max)
Tint mad_hi (Tint a, Tint b, Tint c) - mul_hi(a, b) + c
Tint mad_sat (Tint a, Tint b, Tint c) - a * b + c and saturates the result
(Continued on next page)



SYCL 2020 API Reference Guide Page 13
Integer functions (cont.)

Tint max (Tint x, Tint y); Tint max (Tint x, sTint y) - y if x < y, otherwise it returns x
Tint min (Tint x, Tint y); Tint min (Tint x, sTint y) - y if y < x, otherwise it returns x
Tint mul_hi (Tint x, Tint y) - High half of the product of x and y
Tint popcount (Tint x) - Number of non-zero bits in x
Tint rotate (Tint v, Tint i) - result[indx] = v[indx] << i[indx]
Tint sub_sat (Tint x, Tint y) - x - y and saturates the result
uTint16bit upsample (uTint8bit hi, uTint8bit lo) - result[i] = ((ushort)hi[i] << 8) | lo[i]
iTint16bit upsample (iTint8bit hi, uTint8bit lo) - result[i] = ((short)hi[i] << 8) | lo[i]
uTint32bit upsample (uTint16bit hi, uTint16bit lo) - result[i] = ((uint)hi[i] << 16) | lo[i]
iTint32bit upsample (iTint16bit hi, uTint16bit lo) - result[i] = ((int)hi[i] << 16) | lo[i]
uTint64bit upsample (uTint32bit hi, uTint32bit lo) - result[i] = ((ulonglong)hi[i] << 32) | lo[i]
iTint64bit upsample (iTint32bit hi, uTint32bit lo) - result[i] = ((longlong)hi[i] << 32) | lo[i]
Tint32bit mad24 (Tint32bit x, Tint32bit y, Tint32bit z) - Multiply 24-bit integer values x and y, and add the 32-bit integer result to the 32-bit integer z
Tint32bit mul24 (Tint32bit x, Tint32bit y) - Multiply 24-bit integer values x and y
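A short sketch of the saturating arithmetic functions above; results clamp to the type's range instead of wrapping (the values are illustrative):

std::int8_t a = 100, b = 100;
std::int8_t s = sycl::add_sat(a, b);     // 127, not -56
std::int8_t m = sycl::mad_sat(a, b, b);  // a * b + b also saturates to 127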

Common functions [4.17.7]
Common functions are available in the namespace sycl on host and device. On the host the vector types use the vec class and on an OpenCL device use the corresponding OpenCL vector types. In all cases below, n may be 2, 3, 4, 8, or 16. The built-in functions can take as input float or optionally double and their vec and marray counterparts.
Tf (genfloat in the spec) is type float[n], double[n], or half[n].
Tff (genfloatf) is type float[n].
Tfd (genfloatd) is type double[n].

Tf clamp (Tf x, Tf minval, Tf maxval); Tff clamp (Tff x, float minval, float maxval); Tfd clamp (Tfd x, double minval, double maxval) - Clamp x to range given by minval, maxval
Tf degrees (Tf radians) - radians to degrees
Tf max (Tf x, Tf y); Tff max (Tff x, float y); Tfd max (Tfd x, double y) - Max of x and y
Tf min (Tf x, Tf y); Tff min (Tff x, float y); Tfd min (Tfd x, double y) - Min of x and y
Tf mix (Tf x, Tf y, Tf a); Tff mix (Tff x, Tff y, float a); Tfd mix (Tfd x, Tfd y, double a) - Linear blend of x and y
Tf radians (Tf degrees) - degrees to radians
Tf step (Tf edge, Tf x); Tff step (float edge, Tff x); Tfd step (double edge, Tfd x) - 0.0 if x < edge, else 1.0
Tf smoothstep (Tf edge0, Tf edge1, Tf x); Tff smoothstep (float edge0, float edge1, Tff x); Tfd smoothstep (double edge0, double edge1, Tfd x) - Step and interpolate
Tf sign (Tf x) - Sign of x

Geometric Functions [4.17.8]
Geometric functions are available in the namespace sycl on host and device. The built-in functions can take as input float or optionally double and their vec and marray counterparts, for dimensions 2, 3 and 4. On the host the vector types use the vec class and on a SYCL device use the corresponding native SYCL backend vector types.
Tgf (gengeofloat in the spec) is type float, float2, float3, float4.
Tgd (gengeodouble) is type double, double2, double3, double4.

float4 cross (float4 p0, float4 p1); float3 cross (float3 p0, float3 p1); double4 cross (double4 p0, double4 p1); double3 cross (double3 p0, double3 p1) - Cross product
float distance (Tgf p0, Tgf p1); double distance (Tgd p0, Tgd p1) - Vector distance
float dot (Tgf p0, Tgf p1); double dot (Tgd p0, Tgd p1) - Dot product
float length (Tgf p); double length (Tgd p) - Vector length
Tgf normalize (Tgf p); Tgd normalize (Tgd p) - Normal vector of length 1
float fast_distance (Tgf p0, Tgf p1) - Vector distance
float fast_length (Tgf p) - Vector length
Tgf fast_normalize (Tgf p) - Normal vector of length 1

Relational built-in functions [4.17.9]
Relational functions are available in the namespace sycl on host and device. In all cases below, n may be 2, 3, 4, 8, or 16. If a type in the functions below is shown with [xbit] in its name, this indicates that the type is x bits in size.
Tint (geninteger in the spec) is type int[n], uint[n], unsigned int, char, char[n], signed char, scharn, ucharn, unsigned short[n], unsigned short, ushort[n], longn, ulongn, long int, unsigned long int, long long int, longlongn, ulonglongn, unsigned long long int.
uTint (ugeninteger) is type unsigned char, ucharn, unsigned short, ushortn, unsigned int, uintn, unsigned long int, ulongn, ulonglongn, unsigned long long int.
iTint (igeninteger) is type signed char, scharn, short[n], int[n], long int, longn, long long int, longlongn.
Ti (genint) is type int[n].
uTi (ugenint) is type unsigned int or uintn.
Tff (genfloatf) is type float[n].
Tfd (genfloatd) is type double[n].
T (gentype) is type float[n], double[n], or half[n], or any type listed above for Tint.

int any (iTint x) - 1 if MSB in any component of x is set; else 0
int all (iTint x) - 1 if MSB in all components of x are set; else 0
T bitselect (T a, T b, T c) - Each bit of result is the corresponding bit of a if the corresponding bit of c is 0
Tint select (Tint a, Tint b, iTint c); Tint select (Tint a, Tint b, uTint c); Tff select (Tff a, Tff b, Ti c); Tff select (Tff a, Tff b, uTi c); Tfd select (Tfd a, Tfd b, iTint64bit c); Tfd select (Tfd a, Tfd b, uTint64bit c) - For each component of a vector type, result[i] = MSB of c[i] is set ? b[i] : a[i]. For scalar type, result = c ? b : a
iTint32bit function (Tff x, Tff y); iTint64bit function (Tfd x, Tfd y) - This format is used for many relational functions; replace function with the function name: isequal, isnotequal, isgreater, isgreaterequal, isless, islessequal, islessgreater, isordered, isunordered
iTint32bit function (Tff x); iTint64bit function (Tfd x) - function: isfinite, isinf, isnan, isnormal, signbit

Preprocessor directives and macros [5.6]
SYCL_LANGUAGE_VERSION - Integer version, e.g.: 202002
__SYCL_DEVICE_ONLY__ - == 1 when in device compilation
__SYCL_SINGLE_SOURCE__ - == 1 when producing both host and device binary
SYCL_EXTERNAL - If defined, indicates support for external kernel linkage

Kernel attributes [5.8.1]
Attributes are applied as shown in the following examples.

[=] (item<1> it) [[sycl::reqd_work_group_size(16)]] {
  //[kernel code]
}

void operator()(item<1> it) const [[sycl::reqd_work_group_size(16)]] {
  //[kernel code]
};

Attributes
reqd_work_group_size(dim0)
reqd_work_group_size(dim0, dim1)
reqd_work_group_size(dim0, dim1, dim2)
work_group_size_hint(dim0)
work_group_size_hint(dim0, dim1)
work_group_size_hint(dim0, dim1, dim2)
vec_type_hint(<type>)
reqd_sub_group_size(dim)

Device function attributes [5.8.2]
The attribute below is applied to the declaration of a non-kernel device function.
sycl::requires(has(aspect,...))
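A short sketch of element-wise common functions on a 4-component vector (illustrative values):

sycl::float4 v{ -0.5f, 0.25f, 0.75f, 1.5f };
sycl::float4 c = sycl::clamp(v, 0.0f, 1.0f);             // {0, 0.25, 0.75, 1}
sycl::float4 m = sycl::mix(c, sycl::float4{1.0f}, 0.5f); // halfway blend toward 1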



Page 14 SYCL 2020 API Reference Guide
Backends [4.1]
Each Khronos-defined backend is associated with a macro of the form SYCL_BACKEND_BACKEND_NAME. The SYCL backends that are available can be identified using the enum class backend:

enum class backend {
  implementation-defined
};

Backend interoperability [4.5.1]
SYCL applications that rely on SYCL backend-specific behavior must include the SYCL backend-specific header in addition to the sycl/sycl.hpp header.
Support for SYCL backend interoperability is optional. A SYCL application using SYCL backend interoperability is considered to be non-generic SYCL.

Backend type traits, template function
template <backend Backend>
class backend_traits {
 public:
  template <class T>
  using input_type = backend-specific;
  template <class T>
  using return_type = backend-specific;
  using errc = backend-specific;
};

template <backend Backend, typename SyclType>
using backend_input_t =
    typename backend_traits<Backend>::template input_type<SyclType>;
template <backend Backend, typename SyclType>
using backend_return_t =
    typename backend_traits<Backend>::template return_type<SyclType>;

get_native [4.5.1.2]
Returns a SYCL application interoperability native backend object associated with syclObject, which can be used for SYCL application interoperability.

template <backend Backend, class T>
backend_return_t<Backend, T> get_native(const T &syclObject);

Backend functions [4.5.1.3]
template <backend Backend>
platform make_platform(const backend_input_t<Backend, platform> &backendObject);
template <backend Backend>
device make_device(const backend_input_t<Backend, device> &backendObject);
template <backend Backend>
context make_context(const backend_input_t<Backend, context> &backendObject,
    const async_handler asyncHandler = {});
template <backend Backend>
queue make_queue(const backend_input_t<Backend, queue> &backendObject,
    const context &targetContext, const async_handler asyncHandler = {});
template <backend Backend>
event make_event(const backend_input_t<Backend, event> &backendObject,
    const context &targetContext);
template <backend Backend, typename T, int dimensions = 1,
    typename AllocatorT = buffer_allocator<std::remove_const_t<T>>>
buffer<T, dimensions, AllocatorT>
make_buffer(const backend_input_t<Backend,
    buffer<T, dimensions, AllocatorT>> &backendObject,
    const context &targetContext, event availableEvent = {});
template <backend Backend, typename T, int dimensions = 1,
    typename AllocatorT = buffer_allocator<std::remove_const_t<T>>>
buffer<T, dimensions, AllocatorT>
make_buffer(const backend_input_t<Backend,
    buffer<T, dimensions, AllocatorT>> &backendObject,
    const context &targetContext);
template <backend Backend, int dimensions = 1,
    typename AllocatorT = sycl::image_allocator>
sampled_image<dimensions, AllocatorT>
make_sampled_image(const backend_input_t<Backend,
    sampled_image<dimensions, AllocatorT>> &backendObject,
    const context &targetContext, image_sampler imageSampler,
    event availableEvent = {});
template <backend Backend, int dimensions = 1,
    typename AllocatorT = sycl::image_allocator>
sampled_image<dimensions, AllocatorT>
make_sampled_image(const backend_input_t<Backend,
    sampled_image<dimensions, AllocatorT>> &backendObject,
    const context &targetContext, image_sampler imageSampler);
template <backend Backend, int dimensions = 1,
    typename AllocatorT = sycl::image_allocator>
unsampled_image<dimensions, AllocatorT>
make_unsampled_image(const backend_input_t<Backend,
    unsampled_image<dimensions, AllocatorT>> &backendObject,
    const context &targetContext, event availableEvent);
template <backend Backend, int dimensions = 1,
    typename AllocatorT = sycl::image_allocator>
unsampled_image<dimensions, AllocatorT>
make_unsampled_image(const backend_input_t<Backend,
    unsampled_image<dimensions, AllocatorT>> &backendObject,
    const context &targetContext);
template <backend Backend, bundle_state State>
kernel_bundle<State> make_kernel_bundle(const backend_input_t<Backend,
    kernel_bundle<State>> &backendObject,
    const context &targetContext);
template <backend Backend>
kernel make_kernel(const backend_input_t<Backend, kernel> &backendObject,
    const context &targetContext);
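A minimal sketch of backend interoperability, assuming the implementation supports the Khronos-defined OpenCL backend (backend::opencl) and that the backend-specific header has been included:

// get_native maps SYCL objects to native handles; for a queue on the
// OpenCL backend the returned handle is a cl_command_queue.
queue q;
if (q.get_backend() == backend::opencl) {
  auto clQueue = get_native<backend::opencl>(q);
  // ... use clQueue with the OpenCL API ...
}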

Kernel bundles [4.11]
A kernel bundle is a high-level abstraction which represents a set of kernels that are associated with a context and can be executed on a number of devices, where each device is associated with that same context.

Bundle states
The device images in the kernel bundle have a format that...
bundle_state::input - Must be compiled and linked before their kernels can be invoked.
bundle_state::object - Must be linked before their kernels can be invoked.
bundle_state::executable - Allows them to be invoked on a device.

Kernel identifiers [4.11.6]
Some of the functions related to kernel bundles take an input parameter of type kernel_id. It is a class with member:
const char *get_name() const noexcept;

Obtaining a kernel identifier [4.11.6]
Free functions:
std::vector<kernel_id> get_kernel_ids();
template <typename KernelName>
kernel_id get_kernel_id();

Obtaining a kernel bundle [4.11.7]
Free functions:
template <bundle_state State>
kernel_bundle<State> get_kernel_bundle(const context &ctxt,
    const std::vector<device> &devs);
template <bundle_state State>
kernel_bundle<State> get_kernel_bundle(const context &ctxt,
    const std::vector<device> &devs,
    const std::vector<kernel_id> &kernelIds);
template <bundle_state State, typename Selector>
kernel_bundle<State> get_kernel_bundle(const context &ctxt,
    const std::vector<device> &devs, Selector selector);
template <bundle_state State>
kernel_bundle<State> get_kernel_bundle(const context &ctxt);
template <bundle_state State>
kernel_bundle<State> get_kernel_bundle(const context &ctxt,
    const std::vector<kernel_id> &kernelIds);
template <bundle_state State, typename Selector>
kernel_bundle<State> get_kernel_bundle(const context &ctxt, Selector selector);
template <typename KernelName, bundle_state State>
kernel_bundle<State> get_kernel_bundle(const context &ctxt);
template <typename KernelName, bundle_state State>
kernel_bundle<State> get_kernel_bundle(const context &ctxt,
    const std::vector<device> &devs);

Querying if a bundle exists [4.11.8]
Free functions:
template <bundle_state State>
bool has_kernel_bundle(const context &ctxt, const std::vector<device> &devs);
template <bundle_state State>
bool has_kernel_bundle(const context &ctxt, const std::vector<device> &devs,
    const std::vector<kernel_id> &kernelIds);
template <bundle_state State>
bool has_kernel_bundle(const context &ctxt);
template <bundle_state State>
bool has_kernel_bundle(const context &ctxt,
    const std::vector<kernel_id> &kernelIds);
template <typename KernelName, bundle_state State>
bool has_kernel_bundle(const context &ctxt);
template <typename KernelName, bundle_state State>
bool has_kernel_bundle(const context &ctxt, const std::vector<device> &devs);

Querying if a kernel is compatible with a device [4.11.9]
Free functions:
bool is_compatible(const std::vector<kernel_id> &kernelIds, const device &dev);
template <typename KernelName>
bool is_compatible(const device &dev);

Joining kernel bundles [4.11.10]
template <bundle_state State>
kernel_bundle<State>
join(const std::vector<kernel_bundle<State>> &bundles);

Online compiling and linking [4.11.11]
Free functions:
kernel_bundle<bundle_state::object>
compile(const kernel_bundle<bundle_state::input> &inputBundle,
    const std::vector<device> &devs, const property_list &propList = {});
kernel_bundle<bundle_state::object>
compile(const kernel_bundle<bundle_state::input> &inputBundle,
    const property_list &propList = {});
kernel_bundle<bundle_state::executable>
link(const std::vector<kernel_bundle<bundle_state::object>> &objectBundles,
    const std::vector<device> &devs, const property_list &propList = {});
kernel_bundle<bundle_state::executable>
link(const kernel_bundle<bundle_state::object> &objectBundle,
    const std::vector<device> &devs, const property_list &propList = {});
kernel_bundle<bundle_state::executable>
link(const std::vector<kernel_bundle<bundle_state::object>> &objectBundles,
    const property_list &propList = {});
kernel_bundle<bundle_state::executable>
link(const kernel_bundle<bundle_state::object> &objectBundle,
    const property_list &propList = {});
kernel_bundle<bundle_state::executable>
build(const kernel_bundle<bundle_state::input> &inputBundle,
    const std::vector<device> &devs, const property_list &propList = {});
kernel_bundle<bundle_state::executable>
build(const kernel_bundle<bundle_state::input> &inputBundle,
    const property_list &propList = {});

The kernel bundle class [4.11.12]
Class declaration
template <bundle_state State> class kernel_bundle;

Members
bool empty() const noexcept;
backend get_backend() const noexcept;
context get_context() const noexcept;
(Continued on next page)
SYCL 2020 API Reference Guide Page 15
Kernel bundles (cont.)
std::vector<device> get_devices() const noexcept;
bool has_kernel(const kernel_id &kernelId) const noexcept;
bool has_kernel(const kernel_id &kernelId, const device &dev) const noexcept;
std::vector<kernel_id> get_kernel_ids() const;
template <auto& S>
bool has_specialization_constant() const noexcept;
device_image_iterator begin() const;
device_image_iterator end() const;

Available when State == bundle_state::executable
kernel get_kernel(const kernel_id &kernelId) const;
bool contains_specialization_constants() const noexcept;
bool native_specialization_constant() const noexcept;

Available when State == bundle_state::input
template <auto& S>
void set_specialization_constant(
    typename std::remove_reference_t<decltype(S)>::type value);
template <auto& S>
typename std::remove_reference_t<decltype(S)>::type
get_specialization_constant() const;

The kernel class [4.11.13]
Members
backend get_backend() const noexcept;
context get_context() const;
kernel_bundle<bundle_state::executable> get_kernel_bundle() const;
template <typename param>
typename param::return_type get_info() const;
template <typename param>
typename param::return_type get_info(const device &dev) const;
template <typename param>
typename param::return_type get_backend_info() const;

Queries using get_info():
Descriptor - Return type
info::kernel_device_specific::global_work_size - range<3>
info::kernel_device_specific::work_group_size - size_t
info::kernel_device_specific::compile_work_group_size - range<3>
info::kernel_device_specific::preferred_work_group_size_multiple - size_t
info::kernel_device_specific::private_mem_size - size_t
info::kernel_device_specific::max_num_sub_groups - uint32_t
info::kernel_device_specific::compile_num_sub_groups - uint32_t
info::kernel_device_specific::max_sub_group_size - uint32_t
info::kernel_device_specific::compile_sub_group_size - uint32_t

The device image class [4.11.14]
Class declaration
template <bundle_state State> class device_image;

Members
bool has_kernel(const kernel_id &kernelId) const noexcept;
bool has_kernel(const kernel_id &kernelId, const device &dev) const noexcept;
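A minimal sketch (hypothetical kernel name, and assuming the implementation provides device images in bundle_state::input) of pre-building a bundle and asking the handler to use it:

context ctx = myQueue.get_context();
auto inputBundle = get_kernel_bundle<bundle_state::input>(ctx);
auto execBundle  = build(inputBundle, { myQueue.get_device() });

myQueue.submit([&](handler& cgh) {
  cgh.use_kernel_bundle(execBundle); // kernels come from this bundle
  cgh.parallel_for<class BundledKernel>(range<1>{64}, [=](id<1>) {
    // [kernel code]
  });
});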

USM examples

Example with USM Shared Allocations
#include <iostream>
#include <sycl/sycl.hpp>
using namespace sycl; // (optional) avoids need for "sycl::" before SYCL names

int main() {
  // Create default queue to enqueue work
  queue myQueue;

  // Allocate shared memory bound to the device and context associated to the queue.
  // Replacing malloc_shared with malloc_host would yield a correct program that
  // allocated device-visible memory on the host.
  int *data = sycl::malloc_shared<int>(1024, myQueue);

  myQueue.parallel_for(1024, [=](id<1> idx) {
    // Initialize each element with its own rank number starting at 0
    data[idx] = idx;
  }); // End of the kernel function

  myQueue.wait();

  // Print result
  for (int i = 0; i < 1024; i++)
    std::cout << "data[" << i << "] = " << data[i] << std::endl;

  sycl::free(data, myQueue);
  return 0;
}

Example with USM Device Allocations
#include <iostream>
#include <sycl/sycl.hpp>
using namespace sycl; // (optional) avoids need for "sycl::" before SYCL names

int main() {
  // Create default queue to enqueue work
  queue myQueue;

  // Allocate device memory bound to the device and context associated to the queue
  int *data = sycl::malloc_device<int>(1024, myQueue);

  myQueue.parallel_for(1024, [=](id<1> idx) {
    // Initialize each element with its own rank number starting at 0
    data[idx] = idx;
  }); // End of the kernel function

  myQueue.wait();

  int hostData[1024];
  myQueue.memcpy(hostData, data, 1024 * sizeof(int));
  myQueue.wait();

  // Print result
  for (int i = 0; i < 1024; i++)
    std::cout << "data[" << i << "] = " << hostData[i] << std::endl;

  sycl::free(data, myQueue);
  return 0;
}



Page 16 SYCL 2020 API Reference Guide
Examples of how to invoke kernels

Example: single_task invoke [4.9.4.2.1]
SYCL provides a simple interface to enqueue a kernel that will be sequentially
executed on an OpenCL device.

myQueue.submit([&](handler &cgh) {
  cgh.single_task([=]() {
    // [kernel code]
  });
});

Examples: parallel_for invoke [4.9.4.2.2]

Example #1
Using a lambda function for a kernel invocation. This variant of parallel_for
is designed for when it is not necessary to query the global range of the
index space being executed across.

myQueue.submit([&](handler &cgh) {
  accessor acc { myBuffer, cgh, write_only };

  cgh.parallel_for(range<1>(numWorkItems),
    [=](id<1> index) {
      acc[index] = 42.0f;
    });
});

Example #2
Invoking a SYCL kernel function with parallel_for using a lambda function and
passing an item parameter. This variant of parallel_for is designed for when
it is necessary to query the global range of the index space being executed
across.

myQueue.submit([&](handler &cgh) {
  accessor acc { myBuffer, cgh, write_only };

  cgh.parallel_for(range<1>(numWorkItems),
    [=](item<1> item) {
      // kernel argument type is item
      size_t index = item.get_linear_id();
      acc[index] = index;
    });
});

Example #3
The following two examples show how a kernel function object can be launched
over a 3D grid, with 3 elements in each dimension. In the first case work-item
ids range from 0 to 2 inclusive, and in the second case work-item ids run
from 1 to 3.

myQueue.submit([&](handler &cgh) {
  cgh.parallel_for(
    range<3>(3, 3, 3), // global range
    [=](item<3> it) {
      // [kernel code]
    });
});
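
The second case (ids running from 1 to 3) did not survive extraction; a
reconstruction, assuming it uses the offset overload of parallel_for
(deprecated in SYCL 2020), might look like:

myQueue.submit([&](handler &cgh) {
  cgh.parallel_for(
    range<3>(3, 3, 3), // global range
    id<3>(1, 1, 1),    // offset: work-item ids start at 1
    [=](item<3> it) {
      // [kernel code]
    });
});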

Example #4
Launching sixty-four work-items in a three-dimensional grid with four in each
dimension and divided into eight work-groups.
myQueue.submit([&](handler &cgh) {
  cgh.parallel_for(
    nd_range<3>(range<3>(4, 4, 4), range<3>(2, 2, 2)),
    [=](nd_item<3> item) {
      // [kernel code]
      // Internal synchronization
      group_barrier(item.get_group());
      // [kernel code]
    });
});

Parallel for hierarchical invoke [4.9.4.2.3]
In the following example we issue 8 work-groups of 8 work-items each, passing
both the number of work-groups and the work-group size to the
parallel_for_work_group call. The parallel_for_work_item loops may also vary
in size, with their execution ranges unrelated to the dimensions of the
work-group, and the compiler generating an appropriate iteration space to fill
the gap. In this case, the h_item provides access to local ids and ranges that
reflect both kernel and parallel_for_work_item invocation ranges.

myQueue.submit([&](handler &cgh) {
  // Issue 8 work-groups of 8 work-items each
  cgh.parallel_for_work_group(range<3>(2, 2, 2), range<3>(2, 2, 2),
    [=](group<3> myGroup) {
      // [workgroup code]
      int myLocal; // this variable is shared between work-items

      // This variable will be instantiated for each work-item separately
      private_memory<int> myPrivate(myGroup);

      // Issue parallel work-items. The number issued per work-group is
      // determined by the work-group size range of parallel_for_work_group.
      // In this case, 8 work-items will execute the parallel_for_work_item
      // body for each of the 8 work-groups, resulting in 64 executions total.
      myGroup.parallel_for_work_item([&](h_item<3> myItem) {
        // [work-item code]
        myPrivate(myItem) = 0;
      });

      // Implicit work-group barrier

      // Carry private value across loops
      myGroup.parallel_for_work_item([&](h_item<3> myItem) {
        // [work-item code]
        output[myItem.get_global_id()] = myPrivate(myItem);
      });
      // [workgroup code]
    });
});
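
For a runtime-chosen work-group size, the single-range overload of
parallel_for_work_group can be used instead; a minimal sketch (kernel bodies
elided):

myQueue.submit([&](handler &cgh) {
  // Issue 8 work-groups; the runtime chooses the work-group size
  // because no local range is passed.
  cgh.parallel_for_work_group(range<3>(2, 2, 2), [=](group<3> myGroup) {
    myGroup.parallel_for_work_item([&](h_item<3> myItem) {
      // [work-item code]
    });
  });
});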

The annual gathering of the international community of SYCL developers,
researchers, suppliers, and Khronos SYCL Working Group members to share
best practices and to advance the use and evolution of the SYCL standard
for C++ programming of heterogeneous platforms. syclcon.org

© 2021 Khronos Group. All rights reserved. SYCL is a trademark of the
Khronos Group. The Khronos Group is an industry consortium creating open
standards for the authoring and acceleration of parallel computing,
graphics, dynamic media, and more on a wide variety of platforms and
devices. See www.khronos.org to learn more about the Khronos Group. See
www.khronos.org/sycl to learn more about SYCL.
