SYCL 2020 Reference Guide
SYCL™ (pronounced “sickle”) uses generic programming to enable higher-level application software to be cleanly coded with optimized acceleration of kernel code across a range of devices.

Developers program at a higher level than the native acceleration API, but always have access to lower-level code through seamless integration with the native acceleration API.
Data access and storage [4.7]
Buffers and images define storage and ownership. Accessors provide access to the data.

Accessors [4.7.6]
Accessor classes and the objects they access:
• Buffer accessor for commands (4.7.6.9, class accessor) with two uses:
  - access a buffer from a kernel function via device global memory
  - access a buffer from a host task
• Buffer accessor for host code outside of a command (4.7.6.10, class host_accessor).
• Local accessor from within kernel functions (4.7.6.11, class local_accessor).
• Unsampled image accessors of two kinds:
  - From within a kernel function or from within a host task (4.7.6.13, class unsampled_image_accessor).
  - From host code outside of a host task (4.7.6.13, class host_unsampled_image_accessor).
• Sampled image accessors of two kinds:
  - From within a kernel function or from within a host task (4.7.6.14, class sampled_image_accessor).
  - From host code outside of a host task (4.7.6.14, class host_sampled_image_accessor).

enum class access_mode [4.7.6.2]
read    write    read_write

Accessor property class constructor [4.7.6.4]
This is used in all accessor classes.
property::no_init::no_init()

Access targets [4.7.6.9]
target::device       buffer access from kernel function via device global memory
target::host_task    buffer access from a host task

enum class access::address_space [4.7.7.1]
global_space      Accessible to all work-items in all work-groups
constant_space    Global space that is constant
local_space       Accessible to all work-items in a single work-group
private_space     Accessible to a single work-item
generic_space     Virtual address space overlapping global, local, and private
Buffer accessor for commands (class accessor) [4.7.6.9]
This one class provides two kinds of accessors depending on accessTarget:
• target::device to access a buffer from a kernel function via device global memory
• target::host_task to access a buffer from a host task

Class declaration
template <typename dataT, int dimensions,
    access_mode accessMode =
        (std::is_const_v<dataT> ? access_mode::read
                                : access_mode::read_write),
    target accessTarget = target::device,
class accessor;

Constructors and members
accessor();

Available when dimensions == 0
template <typename AllocatorT>
accessor(buffer<dataT, 1, AllocatorT> &bufferRef,
    const property_list &propList = {});
template <typename AllocatorT>
accessor(buffer<dataT, 1, AllocatorT> &bufferRef,
    handler &commandGroupHandlerRef,
    const property_list &propList = {});

Available when dimensions > 0
template <typename AllocatorT>
accessor(buffer<dataT, dimensions, AllocatorT> &bufferRef,
    const property_list &propList = {});
template <typename AllocatorT, typename TagT>
accessor(buffer<dataT, dimensions, AllocatorT> &bufferRef, TagT tag,
    const property_list &propList = {});
template <typename AllocatorT>
accessor(buffer<dataT, dimensions, AllocatorT> &bufferRef,
    handler &commandGroupHandlerRef,
    const property_list &propList = {});
template <typename AllocatorT, typename TagT>
accessor(buffer<dataT, dimensions, AllocatorT> &bufferRef,
    handler &commandGroupHandlerRef,
    TagT tag, const property_list &propList = {});
template <typename AllocatorT>
accessor(buffer<dataT, dimensions, AllocatorT> &bufferRef,
    range<dimensions> accessRange,
    const property_list &propList = {});
template <typename AllocatorT, typename TagT>
accessor(buffer<dataT, dimensions, AllocatorT> &bufferRef,
    range<dimensions> accessRange,
    TagT tag, const property_list &propList = {});
template <typename AllocatorT>
accessor(buffer<dataT, dimensions, AllocatorT> &bufferRef,
    range<dimensions> accessRange,
    id<dimensions> accessOffset,
    const property_list &propList = {});
Buffer accessor for host code outside of a command (class host_accessor) [4.7.6.10]

Class declaration
template <typename dataT, int dimensions,
    access_mode accessMode =
        (std::is_const_v<dataT> ? access_mode::read
                                : access_mode::read_write)>
class host_accessor;

Constructors and members
All constructors block until data is available from kernels that access the same underlying buffer.
host_accessor();

Available when dimensions == 0
template <typename AllocatorT>
host_accessor(buffer<dataT, 1, AllocatorT> &bufferRef,
    const property_list &propList = {});

Available when dimensions > 0
template <typename AllocatorT>
host_accessor(
    buffer<dataT, dimensions, AllocatorT> &bufferRef,
    const property_list &propList = {});
template <typename AllocatorT, typename TagT>
host_accessor(
    buffer<dataT, dimensions, AllocatorT> &bufferRef,
    TagT tag, const property_list &propList = {});
template <typename AllocatorT>
host_accessor(
    buffer<dataT, dimensions, AllocatorT> &bufferRef,
    range<dimensions> accessRange,
    const property_list &propList = {});
template <typename AllocatorT, typename TagT>
host_accessor(
    buffer<dataT, dimensions, AllocatorT> &bufferRef,
    range<dimensions> accessRange, TagT tag,
    const property_list &propList = {});
template <typename AllocatorT>
host_accessor(
    buffer<dataT, dimensions, AllocatorT> &bufferRef,
    range<dimensions> accessRange,
    id<dimensions> accessOffset,
    const property_list &propList = {});
template <typename AllocatorT, typename TagT>
host_accessor(
    buffer<dataT, dimensions, AllocatorT> &bufferRef,
    range<dimensions> accessRange,
    id<dimensions> accessOffset, TagT tag,
    const property_list &propList = {});

void swap(host_accessor &other);

Common interface functions [Table 79]
This class supports the following functions in addition to begin(), end(), cbegin(), cend(), rbegin(), rend(), crbegin(), and crend().
size_t byte_size() const noexcept;
size_t size() const noexcept;
size_t max_size() const noexcept;
bool empty() const noexcept;
range<dimensions> get_range() const;

Available when dimensions == 0
operator reference() const;

Available when dimensions > 0
reference operator[](id<dimensions> index) const;

Available when dimensions > 1
__unspecified__ &operator[](size_t index) const;

Available when dimensions == 1
reference operator[](size_t index) const;

std::add_pointer_t<value_type> get_pointer() const noexcept;
id<dimensions> get_offset() const;

Property class constructor [4.7.3.3]
property::no_init::no_init()
Class multi_ptr [4.7.7.1]
The address spaces are global_space, local_space, private_space, and generic_space.

Class declaration
template <typename ElementType,
    access::address_space Space,
    access::decorated DecorateAddress>
class multi_ptr;

Members: Constructors
multi_ptr();
multi_ptr(const multi_ptr&);
multi_ptr(multi_ptr&&);
explicit multi_ptr(multi_ptr<ElementType, Space, yes>::pointer);
multi_ptr(std::nullptr_t);

Available if Space == global_space or generic_space
template <int dimensions, access::mode Mode,
    access::placeholder isPlaceholder>
multi_ptr(accessor<ElementType, dimensions, Mode,
    target::device, isPlaceholder>);

Available if Space == local_space or generic_space
template <int dimensions>
multi_ptr(local_accessor<ElementType, dimensions>);

Members: Assignment and access operators
multi_ptr &operator=(const multi_ptr&);
multi_ptr &operator=(multi_ptr&&);
multi_ptr &operator=(std::nullptr_t);

Available if Space == address_space::generic_space && ASP != access::address_space::constant_space
template<access::address_space ASP,
    access::decorated IsDecorated>
multi_ptr &operator=(
    const multi_ptr<value_type, ASP, IsDecorated>&);
template<access::address_space ASP,
    access::decorated IsDecorated>
multi_ptr &operator=(
    multi_ptr<value_type, ASP, IsDecorated>&&);

reference operator*() const;
pointer operator->() const;
pointer get() const;
std::add_pointer_t<value_type> get_raw() const;
__unspecified__ * get_decorated() const;

Members: Conversions
Cast to private_ptr, available if Space == address_space::generic_space
explicit operator multi_ptr<value_type,
    access::address_space::private_space,
    DecorateAddress>();
explicit operator multi_ptr<const value_type,
    access::address_space::private_space,
    DecorateAddress>() const;

Cast to global_ptr, available if Space == address_space::generic_space
explicit operator multi_ptr<value_type,
    access::address_space::global_space, DecorateAddress>();
explicit operator multi_ptr<const value_type,
    access::address_space::global_space,
    DecorateAddress>() const;

Cast to local_ptr, available if Space == address_space::generic_space
explicit operator multi_ptr<value_type,
    access::address_space::local_space,
    DecorateAddress>();
explicit operator multi_ptr<const value_type,
    access::address_space::local_space,
    DecorateAddress>() const;

Implicit conversions to a multi_ptr
Implicit conversion to a multi_ptr<void>. Only available when value_type is not const-qualified.
template<access::decorated DecorateAddress>
operator multi_ptr<void, Space, DecorateAddress>() const;

Implicit conversion to a multi_ptr<const void>. Only available when value_type is const-qualified.
template<access::decorated DecorateAddress>
operator multi_ptr<const void, Space,
    DecorateAddress>() const;

Implicit conversion to multi_ptr<const value_type, Space>.
template<access::decorated DecorateAddress>
operator multi_ptr<const value_type, Space,
    DecorateAddress>() const;

Implicit conversion to the non-decorated version of multi_ptr. Only available when is_decorated is true.
operator multi_ptr<value_type, Space,
    access::decorated::no>() const;

Implicit conversion to the decorated version of multi_ptr. Only available when is_decorated is false.
operator multi_ptr<value_type, Space,
    access::decorated::yes>() const;

void prefetch(size_t numElements) const;

Members: Arithmetic operators
The multi_ptr class supports the standard arithmetic and relational operators.

Class multi_ptr specialized for void and const void [4.7.7.1]

Class declaration
template <access::address_space Space,
    access::decorated DecorateAddress>
class multi_ptr<VoidType, Space, DecorateAddress>
DecorateAddress: yes, no
VoidType: void or const void

Members: Constructors
multi_ptr();
multi_ptr(const multi_ptr&);
multi_ptr(multi_ptr&&);
explicit multi_ptr(multi_ptr<VoidType, Space, yes>::pointer);
multi_ptr(std::nullptr_t);

Available if Space == global_space
template <typename ElementType, int dimensions,
    access_mode Mode, access::placeholder isPlaceholder>
multi_ptr(accessor<ElementType, dimensions, Mode,
    target::device, isPlaceholder>);

Available if Space == local_space
template <typename ElementType, int dimensions>
multi_ptr(local_accessor<ElementType, dimensions>);

Assignment operators
multi_ptr &operator=(const multi_ptr&);
multi_ptr &operator=(multi_ptr&&);
multi_ptr &operator=(std::nullptr_t);

Members
pointer get() const;
explicit operator pointer() const;
template <typename ElementType>
explicit operator multi_ptr<ElementType, Space,
    DecorateAddress>() const;

Only available when is_decorated is true.
operator multi_ptr<value_type, Space,
    access::decorated::no>() const;

(Continued on next page)
©2021 Khronos Group - Rev. 0721 www.khronos.org/sycl
SYCL 2020 API Reference Guide Page 7
multi_ptr specialized (cont.)
Only available when is_decorated is false.
operator multi_ptr<value_type, Space,
    access::decorated::yes>() const;

operator multi_ptr<const void, Space, DecorateAddress>() const;

template <access::address_space Space, access::decorated
    DecorateAddress, typename ElementType>
multi_ptr<ElementType, Space, DecorateAddress>
    address_space_cast(ElementType *);

Operators
The multi_ptr class supports the standard arithmetic and relational operators.

Explicit pointer aliases [4.7.7.2]
Aliases to class multi_ptr for each specialization of access::address_space:
    global_ptr
    local_ptr
    private_ptr
Aliases for non-decorated pointers:
    raw_global_ptr
    raw_local_ptr
    raw_private_ptr
Aliases for decorated pointers:
    decorated_global_ptr
    decorated_local_ptr
    decorated_private_ptr

Sampler class enums [4.7.8]
The SYCL image_sampler struct contains a configuration for sampling a sampled_image.

struct image_sampler {
    addressing_mode addressing;
    coordinate_mode coordinate;
    filtering_mode filtering;
};

addressing    mirrored_repeat, repeat, clamp_to_edge, clamp, none
filtering     nearest, linear
coordinate    normalized, unnormalized
Tf acospi (Tf x)                    acos (x) / π
Tf asin (Tf x)                      Arc sine
Tf asinh (Tf x)                     Inverse hyperbolic sine
Tf asinpi (Tf x)                    asin (x) / π
Tf atan (Tf y_over_x)               Arc tangent
Tf atan2 (Tf y, Tf x)               Arc tangent of y / x
Tf atanh (Tf x)                     Hyperbolic arc tangent
Tf atanpi (Tf x)                    atan (x) / π
Tf atan2pi (Tf y, Tf x)             atan2 (y, x) / π
Tf cbrt (Tf x)                      Cube root
Tf ceil (Tf x)                      Round to integer toward + infinity
Tf copysign (Tf x, Tf y)            x with sign changed to sign of y
Tf cos (Tf x)                       Cosine
Tff cos (Tff x) NH                  Cosine
Tf cosh (Tf x)                      Hyperbolic cosine
Tf cospi (Tf x)                     cos (π x)
Tff divide (Tff x, Tff y) NH        x/y (Not available in cl::sycl.)
Tf erfc (Tf x)                      Complementary error function
Tf erf (Tf x)                       Calculates error function
Tf fmin (Tf x, Tf y)                Return y if y < x, otherwise it returns x
Tf fmin (Tf x, sTf y)               Return y if y < x, otherwise it returns x
Tf fmod (Tf x, Tf y)                Modulus. Returns x – y * trunc (x/y)
Tf fract (Tf x, Tf *iptr)           Fractional value in x
Tf frexp (Tf x, Ti *exp)            Extract mantissa and exponent
Tf hypot (Tf x, Tf y)               Square root of x² + y²
Ti ilogb (Tf x)                     Return exponent as an integer value
Tf ldexp (Tf x, Ti k)               x * 2^k
doublen ldexp (doublen x, int k)    x * 2^k
Tf lgamma (Tf x)                    Log gamma function
Tf lgamma_r (Tf x, Ti *signp)       Log gamma function
Tf log (Tf x)                       Natural logarithm
Tff log (Tff x) NH                  Natural logarithm
Tf log2 (Tf x)                      Base 2 logarithm
Tff log2 (Tff x) NH                 Base 2 logarithm
Tf log10 (Tf x)                     Base 10 logarithm
Tff log10 (Tff x) NH                Base 10 logarithm
Tf log1p (Tf x)                     ln (1.0 + x)
Tf logb (Tf x)                      Return exponent as an integer value
Tf mad (Tf a, Tf b, Tf c)           Approximates a * b + c
Tf maxmag (Tf x, Tf y)              Maximum magnitude of x and y
Tf remainder (Tf x, Tf y)           Floating point remainder
Tf remquo (Tf x, Tf y, Ti *quo)     Remainder and quotient
Tf rint (Tf x)                      Round to nearest even integer
Tf rootn (Tf x, Ti y)               Compute x to the power of 1/y
Tf round (Tf x)                     Integral value nearest to x rounding
Tf rsqrt (Tf x)                     Inverse square root
Tff rsqrt (Tff x) NH                Inverse square root
Tf sin (Tf x)                       Sine
Tff sin (Tff x) NH                  Sine
Tf sincos (Tf x, Tf *cosval)        Sine and cosine of x
Tf sinh (Tf x)                      Hyperbolic sine
Tf sinpi (Tf x)                     sin (π x)
Tf sqrt (Tf x)                      Square root
Tff sqrt (Tff x) NH                 Square root
Tf tan (Tf x)                       Tangent
Tff tan (Tff x) NH                  Tangent
Tf tanh (Tf x)                      Hyperbolic tangent
Tf tanpi (Tf x)                     tan (π x)
Tf tgamma (Tf x)                    Gamma function
Tf trunc (Tf x)                     Round to integer toward zero
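Several entries in the math table are defined by a short formula. The sketch below spells a few of them out in standard C++ for scalar double only. The namespace ref_sketch and every helper name in it are hypothetical, chosen for illustration; they are not the SYCL built-ins and say nothing about the precision or vector overloads an implementation provides.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical host-side reference helpers (NOT the SYCL built-ins).
// Each one writes out the formula given in the table using <cmath>.
namespace ref_sketch {

constexpr double kPi = 3.14159265358979323846;

// acospi(x) = acos(x) / pi
inline double acospi(double x) { return std::acos(x) / kPi; }

// sinpi(x) = sin(pi * x)
inline double sinpi(double x) { return std::sin(kPi * x); }

// fmod(x, y) = x - y * trunc(x / y)
inline double fmod_formula(double x, double y) {
  assert(y != 0.0);
  return x - y * std::trunc(x / y);
}

// rootn(x, y) = x to the power of 1/y.
// Assumption: restricted to x >= 0 so std::pow stays real-valued.
inline double rootn(double x, int y) {
  assert(x >= 0.0 && y != 0);
  return std::pow(x, 1.0 / y);
}

// ldexp(x, k) = x * 2^k
inline double ldexp_formula(double x, int k) {
  return x * std::pow(2.0, static_cast<double>(k));
}

}  // namespace ref_sketch
```

In real SYCL code the equivalent calls would be the sycl:: functions from the table; this sketch only makes the listed formulas concrete.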
Integer functions [4.17.6]
Integer functions are available in the namespace sycl. In all cases below, n may be 2, 3, 4, 8, or 16. If a type in the functions below is shown with [xbit] in its name, this indicates that the type is x bits in size. Parameter types may also be their vec and marray counterparts.

Tint (geninteger in the spec) is type int[n], uint[n], unsigned int, char, char[n], signed char, scharn, ucharn, unsigned short[n], unsigned short, ushort[n], longn, ulongn, long int, unsigned long int, long long int, longlongn, ulonglongn, unsigned long long int.
uTint (ugeninteger) is type unsigned char, ucharn, unsigned short, ushortn, unsigned int, uintn, unsigned long int, ulongn, ulonglongn, unsigned long long int.
iTint (igeninteger) is type signed char, scharn, short[n], int[n], long int, longn, long long int, longlongn.
sTint (sgeninteger) is type char, signed char, unsigned char, short, unsigned short, int, unsigned int, long int, unsigned long int, long long int, unsigned long long int.

uTint abs (Tint x)                         |x|
uTint abs_diff (Tint x, Tint y)            | x – y | without modulo overflow
Tint add_sat (Tint x, Tint y)              x + y and saturates the result
Tint hadd (Tint x, Tint y)                 (x + y) >> 1 without mod. overflow
Tint rhadd (Tint x, Tint y)                (x + y + 1) >> 1
Tint clamp (Tint x, Tint min, Tint max)    min(max(x, min), max)
Tint clamp (Tint x, sTint min, sTint max)  min(max(x, min), max)
Tint clz (Tint x)                          Count of leading 0-bits in x, starting at the most significant bit position. If x is 0, returns the size in bits of the type of x or component type of x, if x is a vector type.
Tint ctz (Tint x)                          Count of trailing 0-bits in x. If x is 0, returns the size in bits of the type of x or component type of x, if x is a vector type.
Tint mad_hi (Tint a, Tint b, Tint c)       mul_hi(a, b) + c
Tint mad_sat (Tint a, Tint b, Tint c)      a * b + c and saturates the result

(Continued on next page)
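The integer functions above are mostly specified by what they do at the overflow boundary. The following is a minimal sketch in standard C++ of those semantics for 32-bit scalars only. The namespace ref_sketch and its helpers are hypothetical illustrations, not the SYCL built-ins; they widen to 64 bits to avoid overflow, which is one possible approach and not necessarily how an implementation does it.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical scalar reference helpers (NOT the SYCL built-ins).
namespace ref_sketch {

// hadd: (x + y) >> 1, computed in 64 bits so the sum cannot overflow.
inline std::uint32_t hadd(std::uint32_t x, std::uint32_t y) {
  return static_cast<std::uint32_t>((static_cast<std::uint64_t>(x) + y) >> 1);
}

// rhadd: (x + y + 1) >> 1, the rounding variant of hadd.
inline std::uint32_t rhadd(std::uint32_t x, std::uint32_t y) {
  return static_cast<std::uint32_t>((static_cast<std::uint64_t>(x) + y + 1) >> 1);
}

// add_sat: x + y, saturated to the range of the 32-bit signed type.
inline std::int32_t add_sat(std::int32_t x, std::int32_t y) {
  const std::int64_t wide = static_cast<std::int64_t>(x) + y;
  const std::int64_t lo = std::numeric_limits<std::int32_t>::min();
  const std::int64_t hi = std::numeric_limits<std::int32_t>::max();
  return static_cast<std::int32_t>(wide < lo ? lo : (wide > hi ? hi : wide));
}

// abs_diff: |x - y| returned as unsigned, without modulo overflow.
inline std::uint32_t abs_diff(std::int32_t x, std::int32_t y) {
  const std::int64_t d = static_cast<std::int64_t>(x) - y;
  return static_cast<std::uint32_t>(d < 0 ? -d : d);
}

// clz: count of leading 0-bits; returns 32 (the bit width) when x is 0.
inline int clz(std::uint32_t x) {
  int count = 0;
  for (std::uint32_t mask = 0x80000000u; mask != 0 && (x & mask) == 0; mask >>= 1)
    ++count;
  return count;
}

// ctz: count of trailing 0-bits; returns 32 (the bit width) when x is 0.
inline int ctz(std::uint32_t x) {
  int count = 0;
  for (std::uint32_t mask = 1u; mask != 0 && (x & mask) == 0; mask <<= 1)
    ++count;
  return count;
}

}  // namespace ref_sketch
```

For example, ref_sketch::hadd on two maximal 32-bit values returns the maximal value, where a naive (x + y) >> 1 in 32 bits would wrap.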
Common functions [4.17.7]
Common functions are available in the namespace sycl on host and device. On the host the vector types use the vec class and on an OpenCL device use the corresponding OpenCL vector types. In all cases below, n may be 2, 3, 4, 8, or 16. The built-in functions can take as input float or optionally double and their vec and marray counterparts.

Tf (genfloat in the spec) is type float[n], double[n], or half[n].
Tff (genfloatf) is type float[n].
Tfd (genfloatd) is type double[n].

Tf clamp (Tf x, Tf minval, Tf maxval);
Tff clamp (Tff x, float minval, float maxval);
Tfd clamp (Tfd x, double minval, double maxval);       Clamp x to range given by minval, maxval
Tf degrees (Tf radians);                               radians to degrees
Tf abs (Tf x, Tf y);
Tff abs (Tff x, float y);
Tfd abs (Tfd x, double y);                             Max of x and y
Tf max (Tf x, Tf y);
Tff max (Tff x, float y);
Tfd max (Tfd x, double y);                             Max of x and y
Tf min (Tf x, Tf y);
Tff min (Tff x, float y);
Tfd min (Tfd x, double y);                             Min of x and y
Tf mix (Tf x, Tf y, Tf a);
Tff mix (Tff x, Tff y, float a);
Tfd mix (Tfd x, Tfd y, double a);                      Linear blend of x and y
Tf radians (Tf degrees);                               degrees to radians
Tf step (Tf edge, Tf x);
Tff step (float edge, Tff x);
Tfd step (double edge, Tfd x);                         0.0 if x < edge, else 1.0
Tf smoothstep (Tf edge0, Tf edge1, Tf x);
Tff smoothstep (float edge0, float edge1, Tff x);
Tfd smoothstep (double edge0, double edge1, Tfd x);    Step and interpolate
Tf sign (Tf x);                                        Sign of x

Geometric Functions [4.17.8]
Geometric functions are available in the namespace sycl on host and device. The built-in functions can take as input float or optionally double and their vec and marray counterparts, for dimensions 2, 3 and 4. On the host the vector types use the vec class and on a SYCL device use the corresponding native SYCL backend vector types.

Tgf (gengeofloat in the spec) is type float, float2, float3, float4.
Tgd (gengeodouble) is type double, double2, double3, double4.

float4 cross (float4 p0, float4 p1)
float3 cross (float3 p0, float3 p1)
double4 cross (double4 p0, double4 p1)
double3 cross (double3 p0, double3 p1)    Cross product
float distance (Tgf p0, Tgf p1)
double distance (Tgd p0, Tgd p1)          Vector distance
float dot (Tgf p0, Tgf p1)
double dot (Tgd p0, Tgd p1)               Dot product
float length (Tgf p)
double length (Tgd p)                     Vector length
Tgf normalize (Tgf p)
Tgd normalize (Tgd p)                     Normal vector length 1
float fast_distance (Tgf p0, Tgf p1)      Vector distance

Relational built-in functions [4.17.9]
Relational functions are available in the namespace sycl on host and device. In all cases below, n may be 2, 3, 4, 8, or 16. If a type in the functions below is shown with [xbit] in its name, this indicates that the type is x bits in size.

Tint (geninteger in the spec) is type int[n], uint[n], unsigned int, char, char[n], signed char, scharn, ucharn, unsigned short[n], unsigned short, ushort[n], longn, ulongn, long int, unsigned long int, long long int, longlongn, ulonglongn, unsigned long long int.
iTint (igeninteger) is type signed char, scharn, short[n], int[n], long int, longn, long long int, longlongn.
uTint (ugeninteger) is type unsigned char, ucharn, unsigned short, ushortn, unsigned int, uintn, unsigned long int, ulongn, ulonglongn, unsigned long long int.
Ti (genint) is type int[n].
uTi (ugenint) is type unsigned int or uintn.
Tff (genfloatf) is type float[n].
Tfd (genfloatd) is type double[n].
T (gentype) is type float[n], double[n], or half[n], or any type listed above for Tint.

int any (iTint x)                          1 if MSB in component of x is set; else 0
int all (iTint x)                          1 if MSB in all components of x are set; else 0
T bitselect (T a, T b, T c)                Each bit of result is corresponding bit of a if corresponding bit of c is 0
Tint select (Tint a, Tint b, iTint c)
Tint select (Tint a, Tint b, uTint c)
Tff select (Tff a, Tff b, Ti c)
Tff select (Tff a, Tff b, uTi c)
Tfd select (Tfd a, Tfd b, iTint64bit c)
Tfd select (Tfd a, Tfd b, uTint64bit c)    For each component of a vector type, result[i] = if MSB of c[i] is set ? b[i] : a[i]. For scalar type, result = c ? b : a
iTint32bit function (Tff x, Tff y)
iTint64bit function (Tfd x, Tfd y)         function: isequal, isnotequal, isgreater, isgreaterequal, isless, islessequal, islessgreater, isordered, isunordered. This format is used for many relational functions. Replace function with the function name.
iTint32bit function (Tff x)
iTint64bit function (Tfd x)                function: isfinite, isinf, isnan, isnormal, signbit.

Kernel attributes [5.8.1]
Attributes are applied as shown in the following examples.

[=] (item<1> it) [[sycl::reqd_work_group_size(16)]] {
    //[kernel code]
}

void operator()(item<1> it) const [[sycl::reqd_work_group_size(16)]] {
    //[kernel code]
};

Attributes
reqd_work_group_size(dim0)
reqd_work_group_size(dim0, dim1)
reqd_work_group_size(dim0, dim1, dim2)
work_group_size_hint(dim0)
work_group_size_hint(dim0, dim1)
work_group_size_hint(dim0, dim1, dim2)
vec_type_hint(<type>)
reqd_sub_group_size(dim)

Device function attributes [5.8.2]
The attribute below is applied to the declaration of a non-kernel device function.
sycl::requires(has(aspect,...))

Preprocessor directives and macros [5.6]
SYCL_LANGUAGE_VERSION     Integer version, e.g.: 202002
__SYCL_DEVICE_ONLY__      == 1 when in device compilation
__SYCL_SINGLE_SOURCE__    == 1 when producing both host and device binary
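The common functions clamp, mix, step and smoothstep listed above are described only in a few words each. The sketch below writes them out for scalar float in standard C++. The namespace ref_sketch and its helpers are hypothetical, not the SYCL built-ins. The mix and smoothstep bodies assume the usual linear-blend and Hermite-interpolation formulas, which the tables above summarize but do not spell out.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical scalar reference helpers (NOT the SYCL built-ins).
namespace ref_sketch {

// clamp: min(max(x, minval), maxval); assumes minval <= maxval.
inline float clamp(float x, float minval, float maxval) {
  assert(minval <= maxval);
  return std::min(std::max(x, minval), maxval);
}

// mix: linear blend of x and y, assumed to be x + (y - x) * a.
inline float mix(float x, float y, float a) { return x + (y - x) * a; }

// step: 0.0 if x < edge, else 1.0.
inline float step(float edge, float x) { return x < edge ? 0.0f : 1.0f; }

// smoothstep: clamp the normalized position to [0, 1], then apply the
// Hermite polynomial t*t*(3 - 2t). Assumes edge0 < edge1.
inline float smoothstep(float edge0, float edge1, float x) {
  assert(edge0 < edge1);
  const float t = clamp((x - edge0) / (edge1 - edge0), 0.0f, 1.0f);
  return t * t * (3.0f - 2.0f * t);
}

}  // namespace ref_sketch
```

Note the argument order: step and smoothstep take the edge values first and x last, matching the signatures in the table.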
USM examples

Example with USM Shared Allocations
#include <iostream>
#include <sycl/sycl.hpp>
using namespace sycl; // (optional) avoids need for "sycl::" before SYCL names

int main() {
  // Create default queue to enqueue work
  queue myQueue;

  // Allocate shared memory bound to the device and context associated to the queue
  // Replacing malloc_shared with malloc_host would yield a correct program that
  // allocated device-visible memory on the host.
  int *data = sycl::malloc_shared<int>(1024, myQueue);

  myQueue.parallel_for(1024, [=](id<1> idx) {
    // Initialize each buffer element with its own rank number starting at 0
    data[idx] = idx;
  }); // End of the kernel function

  // Wait for the kernel to finish before using or releasing the data
  myQueue.wait();

  // Release the USM allocation
  sycl::free(data, myQueue);
  return 0;
}

Example with USM Device Allocations
#include <iostream>
#include <sycl/sycl.hpp>
using namespace sycl; // (optional) avoids need for "sycl::" before SYCL names

int main() {
  // Create default queue to enqueue work
  queue myQueue;

  // Allocate device memory bound to the device and context associated to the queue
  int *data = sycl::malloc_device<int>(1024, myQueue);

  myQueue.parallel_for(1024, [=](id<1> idx) {
    // Initialize each buffer element with its own rank number starting at 0
    data[idx] = idx;
  }); // End of the kernel function

  // Wait for the kernel to finish before releasing the data
  myQueue.wait();

  // Release the USM allocation
  sycl::free(data, myQueue);
  return 0;
}
Example #4
Launching sixty-four work-items in a three-dimensional grid with four in each
dimension and divided into eight work-groups.
myQueue.submit([&](handler & cgh) {
  cgh.parallel_for(
    nd_range<3>(range<3>(4, 4, 4), range<3>(2, 2, 2)), [=] (nd_item<3> item) {
      // [kernel code]
      // Internal synchronization
      group_barrier(item.get_group());
      // [kernel code]
    });
});
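The counts quoted for this example (sixty-four work-items, eight work-groups) follow from dividing the global range by the local range in each dimension. The sketch below does that arithmetic in standard C++. The namespace ref_sketch and the helpers groups_per_dim and product are hypothetical names for illustration, not part of the SYCL API.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical helpers that mirror the nd_range arithmetic (NOT SYCL API).
namespace ref_sketch {

using Range3 = std::array<std::size_t, 3>;

// Work-groups per dimension = global size / local size.
// The global size is assumed to be an exact multiple of the local size.
inline Range3 groups_per_dim(const Range3 &global, const Range3 &local) {
  Range3 groups{};
  for (std::size_t d = 0; d < 3; ++d) {
    assert(local[d] != 0 && global[d] % local[d] == 0);
    groups[d] = global[d] / local[d];
  }
  return groups;
}

// Total number of elements described by a 3-dimensional range.
inline std::size_t product(const Range3 &r) { return r[0] * r[1] * r[2]; }

}  // namespace ref_sketch
```

With a global range of (4, 4, 4) and a local range of (2, 2, 2), groups_per_dim gives (2, 2, 2), so there are 8 work-groups of 8 work-items each, 64 work-items in total.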