Metal Shading Language Specification
Version 2.4
Contents
1 Introduction
1.1 Purpose of This Document
1.2 Organization of This Specification
1.3 References
1.4 Metal and C++14
1.4.1 Overloading
1.4.2 Templates
1.4.3 Preprocessing Directives
1.4.4 Restrictions
1.5 Compiler and Preprocessor
1.5.1 Preprocessor Compiler Options
1.5.2 Preprocessor Definitions
1.5.3 Math Intrinsics Compiler Options
1.5.4 Invariance Compiler Options
1.5.5 Texture Write Rounding Mode
1.5.6 Compiler Options to Enable Modules
1.5.7 Compiler Options Controlling the Language Version
1.5.8 Compiler Options to Request or Suppress Warnings
1.5.9 Target Conditionals
1.5.10 Dynamic Library Linker Options
1.6 Metal Coordinate Systems
2 Data Types
2.1 Scalar Data Types
2.2 Vector Data Types
2.2.1 Accessing Vector Components
2.2.2 Vector Constructors
2.2.3 Packed Vector Types
2.3 Matrix Data Types
2.3.1 Accessing Matrix Components
2.3.2 Matrix Constructors
2.4 SIMD-group Matrix Data Types
2.5 Alignment of Data Types
2.6 Atomic Data Types
2.7 Pixel Data Types
2.8 Buffers
2.9 Textures
2.9.1 Texture Buffers
2.10 Samplers
2.11 Imageblocks
2.12 Aggregate Types
2.12.1 Arrays of Textures, Texture Buffers, and Samplers
2.12.1.1 Array Element Access with the [] Operator
2.12.1.2 Array Capacity
2.12.1.3 Constructors for Templated Arrays
2.12.2 Structures of Buffers, Textures, and Samplers
2.13 Argument Buffers
2.13.1 Tier 2 Hardware Support for Argument Buffers
2.14 Uniform Type
2.14.1 The Need for a Uniform Type
2.14.2 Behavior of the Uniform Type
2.14.3 Uniform Control Flow
2.15 Visible Function Table
2.16 Function Group Attribute
2.17 Ray Tracing Types
2.17.1 Ray Tracing Intersection Tags
2.17.2 Ray Type
2.17.3 Intersection Function Table
2.17.4 Intersection Result Type
2.17.5 Intersector Type
2.17.6 Acceleration Structure Type
2.17.7 Intersection Query Type
2.18 Interpolant Type
2.19 Type Conversions and Re-interpreting Data
2.20 Implicit Type Conversions
3 Operators
3.1 Scalar and Vector Operators
3.2 Matrix Operators
4 Address Spaces
4.1 device Address Space
4.2 constant Address Space
4.3 thread Address Space
4.4 threadgroup Address Space
4.4.1 SIMD-groups and Quad-groups
4.5 threadgroup_imageblock Address Space
4.6 ray_data Address Space
5 Function and Variable Declarations
5.1 Functions
5.1.1 Vertex Functions
5.1.1.1 Post-Tessellation Vertex Functions
5.1.1.2 Patch Type and Number of Control Points Per-Patch
1.3 References
C++14
Stroustrup, Bjarne. The C++ Programming Language (Fourth Edition). Harlow: Addison-Wesley,
2013.
1.4.1 Overloading
Metal supports overloading, as defined by section 13 of the C++14 Specification. Metal extends
the function overloading rules to include the address space attribute of an argument. You
cannot overload Metal graphics and kernel functions. (For a definition of graphics and kernel
functions, see section 5.1 of this document.)
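For example, the following non-qualified helper functions (the names are illustrative) are distinct overloads because their pointer arguments differ only in address space:
void accumulate(device float *sum, float value)
{
    *sum += value;
}

void accumulate(threadgroup float *sum, float value)
{
    *sum += value;
}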
1.4.2 Templates
Metal supports templates, as defined by section 14 of the C++14 Specification.
1.4.4 Restrictions
The following C++14 features are not available in Metal (section numbers in this list refer to the
C++14 Specification):
• lambda expressions (section 5.1.2)
• recursive function calls (section 5.2.2, item 9)
• dynamic_cast operator (section 5.2.7)
• type identification (section 5.2.8)
• new and delete operators (sections 5.3.4 and 5.3.5)
• noexcept operator (section 5.3.7)
• goto statement (section 6.6)
• register, thread_local storage attributes (section 7.1.1)
• virtual function attribute (section 7.1.2)
For more about math functions, see section 6.5. For more about the relative errors of ordinary
and fast math functions, see section 7.4.
These options (-ffast-math, the default, and -fno-fast-math) enable or disable optimizations for floating-point arithmetic that may violate the IEEE 754 standard. They also enable or disable the high-precision variant of math functions for single precision floating-point scalar and vector types.
The optimizations for floating-point arithmetic include:
• No NaNs : Allow optimizations to assume the arguments and result are not NaN ("Not a
Number").
• No INFs: Allow optimizations to assume the arguments and result are not positive or
negative infinity.
• No Signed Zeroes: Allow optimizations to treat the sign of a zero argument or result as
insignificant.
• Allow Reciprocal: Allow optimizations to use the reciprocal of an argument rather than
perform a division.
• Fast: Allow algebraically equivalent transformations, such as reassociating floating-point
operations that may dramatically change the floating-point results.
• native (default): Texture writes use the hardware's native rounding strategy.
• rte: Texture writes round to the nearest even. All OS: Since Metal 2.3.
• rtz: Texture writes round to zero. All OS: Since Metal 2.3.
-install_name: Sets the install name recorded in a Metal dynamic library, which clients use to refer to the library when loading it.
The rasterizer stage transforms normalized-device coordinates (NDC) into viewport coordinates
(see Figure 2). The (x,y) coordinates in this space are measured in pixels, with the origin in the
top-left corner of the viewport and positive values going to the right and down. You specify
viewports in this coordinate space, and Metal maps NDC coordinates to the extents of the
viewport.
If you are using variable rasterization rate (see Section 6.15), then the viewport coordinate
system is a logical coordinate system independent of the render target’s physical layout. A rate
map determines the relationship between coordinates in this logical coordinate system
(sometimes called screen space) and pixels in the render targets (physical coordinates).
Metal supports:
• the f or F suffix to specify a single precision floating-point literal value (such as 0.5f or
0.5F).
• the h or H suffix to specify a half precision floating-point literal value (such as 0.5h or 0.5H).
• the u or U suffix for unsigned integer literals.
• the l or L suffix for signed long integer literals.
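For example (illustrative declarations):
float a = 0.5F;   // single precision
half  b = 0.5h;   // half precision
uint  c = 1u;     // unsigned integer
long  d = 1L;     // signed long integer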
Table 2.2 lists the size and alignment of most of the scalar data types.
short" 2 2
int16_t"
unsigned short"
ushort"
uint16_t
int" 4 4
int32_t"
unsigned int"
uint"
uint32_t
Metal also supports vec<T,n> where T is a valid scalar type and n is 2, 3, or 4, representing a 2-, 3-, or 4-component vector type.
Table 2.3 lists the size and alignment of the vector data types.
char3" 4 4
uchar3
char4" 4 4
uchar4
short2" 4 4
ushort2
short3" 8 8
ushort3
short4" 8 8
ushort4
int2" 8 8
uint2
int3" 16 16
uint3
int4" 16 16
uint4
long2 16 16
ulong2
long3 32 32
ulong3
long4 32 32
ulong4
half2 4 4
half3 8 8
half4 8 8
float2 8 8
float3 16 16
float4 16 16
Metal supports using a period (.) as a selection operator to access vector components, using
letters that may indicate coordinate or color data:
<vector_data_type>.xyzw
<vector_data_type>.rgba
The following code initializes a vector test and then uses the .xyzw or .rgba selection syntax
to access individual components:
int4 test = int4(0, 1, 2, 3);
int a = test.x; // a = 0
int b = test.y; // b = 1
int c = test.z; // c = 2
int d = test.w; // d = 3
int e = test.r; // e = 0
int f = test.g; // f = 1
int g = test.b; // g = 2
int h = test.a; // h = 3
The component selection syntax also allows the permutation or replication of components:
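For example, the following sketch permutes and replicates components (the values in the comments are illustrative):
float4 pos  = float4(1.0f, 2.0f, 3.0f, 4.0f);
float4 swiz = pos.wzyx;   // swiz = (4.0f, 3.0f, 2.0f, 1.0f)
float4 dup  = pos.xxyy;   // dup = (1.0f, 1.0f, 2.0f, 2.0f)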
The component group notation can occur on the left-hand side (lvalue) of an expression. To
form the lvalue, you may apply swizzling. The resulting lvalue may be either the scalar or vector
type, depending on the number of components specified. Each component must be a supported
scalar or vector type. The resulting lvalue of vector type must not contain duplicate
components.
float4 pos = float4(1.0f, 2.0f, 3.0f, 4.0f);
pos.xw = float2(5.0f, 6.0f); // pos = (5.0, 2.0, 3.0, 6.0)
When assigning a swizzled value to a variable, the GPU may need to read the existing value, modify it, and write the result back. The assignment to pos.xw in the example above causes the GPU to load the float4 value, shuffle the values 5.0f and 6.0f into it, and then write the result back into pos. If two threads write to different components of the vector at the same time, the result is undefined.
The following methods of vector component access are not permitted and result in a compile-
time error:
• Accessing components beyond those declared for the vector type is an error.
2-component vector data types can only access .xy or .rg elements. 3-component vector
data types can only access .xyz or .rgb elements.
float2 pos; // This is a 2-component vector.
pos.x = 1.0f; // x is legal and so is y.
pos.z = 1.0f; // z is illegal and so is w. z is the 3rd component.
float3 pos; // This is a 3-component vector.
pos.z = 1.0f; // z is legal for a 3-component vector.
pos.w = 1.0f; // This is illegal. w is the 4th component.
• Accessing the same component twice on the left-hand side is ambiguous and is an error:
// This is illegal because 'x' is used twice.
pos.xx = float2(3.0f, 4.0f);
• Accessing a different number of components is an error:
// This is illegal due to a mismatch between float2 and float4.
pos.xy = float4(1.0f, 2.0f, 3.0f, 4.0f);
• Intermixing the .rgba and .xyzw syntax in a single access is an error:
float4 pos = float4(1.0f, 2.0f, 3.0f, 4.0f);
pos.x = 1.0f; // OK
pos.g = 2.0f; // OK
pos.xg = float2(3.0f, 4.0f); // Error. Mixes the .xyzw and .rgba syntax.
The sizeof operator on a vector type returns the size of the vector, which is given as the
number of components * size of each component. For example, sizeof(float4) returns 16
and sizeof(half4) returns 8.
Table 2.4 lists the size and alignment of the packed vector data types.
packed_char3, packed_uchar3: size 3, alignment 1
packed_char4, packed_uchar4: size 4, alignment 1
packed_short2, packed_ushort2: size 4, alignment 2
packed_short3, packed_ushort3: size 6, alignment 2
packed_short4, packed_ushort4: size 8, alignment 2
packed_int2, packed_uint2: size 8, alignment 4
packed_int3, packed_uint3: size 12, alignment 4
packed_int4, packed_uint4: size 16, alignment 4
packed_half2: size 4, alignment 2
packed_half3: size 6, alignment 2
packed_half4: size 8, alignment 2
packed_float2: size 8, alignment 4
packed_float3: size 12, alignment 4
Packed vector data types are typically used as a data storage format. Metal supports the
assignment, arithmetic, logical, relational, and copy constructor operators for packed vector
data types. Metal also supports loads and stores from a packed vector data type to an aligned
vector data type and vice-versa.
Examples:
device float4 *buffer;
device packed_float4 *packed_buffer;
int i;
packed_float4 f ( buffer[i] );
packed_buffer[i] = buffer[i];
You can use an array index to access components of a packed vector data type. Since Metal
2.1, you can use .xyzw or .rgba selection syntax to access components of a packed vector
data type. The semantics and restrictions when swizzling for packed vector data type are the
same as for vector types.
Example:
packed_float4 f;
f[0] = 1.0f; // OK
f.x = 1.0f; // OK since Metal 2.1
Metal also supports matrix<T, c, r> where T is a valid floating-point type, c is the number of columns (2, 3, or 4), and r is the number of rows (2, 3, or 4).
Table 2.5 lists the size and alignment of the matrix data types.
You can access floatnxm and halfnxm matrices as an array of n floatm or n halfm entries.
Accessing a component outside the bounds of a matrix with a non-constant expression results
in undefined behavior. Accessing a matrix component that is outside the bounds of the matrix
with a constant expression generates a compile-time error.
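A short sketch of this access pattern (assuming a float2x3, that is, two float3 entries):
float2x3 m;
m[0]       = float3(1.0f, 2.0f, 3.0f);  // write the first float3 entry
float3 col = m[1];                      // read the second float3 entry
float  e   = m[1][2];                   // one component of that entry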
You can also construct a matrix from another matrix that has the same number of rows and
columns. For example:
float3x4(float3x4);
float3x4(half3x4);
Since Metal 2.0, a matrix of type T with n columns and m rows can also be constructed from n *
m scalars of type T. The following examples are legal constructors:
float2x2(float, float, float, float);
float3x2(float, float, float, float, float, float);
The following are examples of matrix constructors that are not supported. A matrix cannot be
constructed from combinations of vectors and scalars.
// Not supported.
float2x3(float2 a, float b, float2 c, float d);
Metal supports the following SIMD-group matrix type names, where T is half or float and
Cols and Rows are 8:
• simdgroup_half8x8
• simdgroup_float8x8
The mapping of matrix elements to threads in the SIMD-group is unspecified. For a description of the functions supported on SIMD-group matrices, see section 6.7.
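The following sketch multiplies two 8x8 tiles using SIMD-group matrices. It assumes the simdgroup_load, simdgroup_multiply_accumulate, simdgroup_store, and make_filled_simdgroup_matrix functions described in section 6.7; the kernel name and the buffer layout (one densely packed 8x8 tile per buffer) are illustrative:
#include <metal_stdlib>
using namespace metal;

kernel void matmul_8x8(device const float *A [[buffer(0)]],
                       device const float *B [[buffer(1)]],
                       device float       *C [[buffer(2)]])
{
    simdgroup_float8x8 a, b;
    simdgroup_float8x8 acc = make_filled_simdgroup_matrix<float, 8, 8>(0.0f);
    simdgroup_load(a, A, 8);                        // load an 8x8 tile, 8 elements per row
    simdgroup_load(b, B, 8);
    simdgroup_multiply_accumulate(acc, a, b, acc);  // acc = a * b + acc
    simdgroup_store(acc, C, 8);                     // all threads in the SIMD-group cooperate
}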
Only assignments and equality/inequality comparisons between the pixel data types and their
corresponding ALU types are allowed. (The following examples show the buffer(n) attribute,
which is explained in section 5.2.1.)
Example:
kernel void
my_kernel(device rgba8unorm<half4> *p [[buffer(0)]],
          uint gid [[thread_position_in_grid]], …)
{
    rgba8unorm<half4> x = p[gid]; half4 val = p[gid];
    …
    p[gid] = val;
    p[gid] = x;
}
Example:
struct Foo {
rgba8unorm<half4> a;
};
2.8 Buffers
MSL implements a buffer as a pointer to a built-in or user defined data type described in the
device, constant, or threadgroup address space. (For more about these address space
attributes, see sections 4.1, 4.2, and 4.4, respectively.)
Ordinary Metal buffers may contain scalar, vector, and matrix types; pixel data types; and arrays and structures of these types.
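The following sketch (the structure, kernel, and parameter names are illustrative) passes one buffer in the device address space and one in the constant address space to a kernel:
struct MyParams {
    float4x4 transform;
    float    scale;
};

kernel void
process(device float4     *data   [[buffer(0)]],
        constant MyParams &params [[buffer(1)]],
        uint gid [[thread_position_in_grid]])
{
    data[gid] = params.transform * (data[gid] * params.scale);
}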
For more about the buffer(n) attribute used in the example, see section 5.2.1.
For details about argument buffers, see section 2.13.
2.9 Textures
The texture data type is a handle to one-, two-, or three-dimensional texture data that
corresponds to all or a portion of a single mipmap level of a texture. The following templates
define specific texture data types:
enum class access { sample, read, write, read_write };
texture1d<T, access a = access::sample>
texture1d_array<T, access a = access::sample>
texture2d<T, access a = access::sample>
texture2d_array<T, access a = access::sample>
texture3d<T, access a = access::sample>
texturecube<T, access a = access::sample>
texturecube_array<T, access a = access::sample>
texture2d_ms<T, access a = access::read>
texture2d_ms_array<T, access a = access::read>
You must declare textures with depth formats as one of the following texture data types:
depth2d<T, access a = access::sample>
depth2d_array<T, access a = access::sample>
depthcube<T, access a = access::sample>
depthcube_array<T, access a = access::sample>
depth2d_ms<T, access a = access::read>
depth2d_ms_array<T, access a = access::read>
macOS supports texture2d_ms_array and depth2d_ms_array since Metal 2.0. All other
types supported since Metal 1.0.
iOS supports all types except texture2d_ms_array and depth2d_ms_array since Metal 1.0.
T specifies the color type of one of the components returned when reading from a texture or the
color type of one of the components specified when writing to the texture. For texture types
(except depth texture types), T can be half, float, short, ushort, int, or uint. For depth
texture types, T must be float.
If T is int or short, the data associated with the texture must use a signed integer format. If T
is uint or ushort, the data associated with the texture must use an unsigned integer format. If
T is half, the data associated with the texture must either be a normalized (signed or unsigned
integer) or half-precision format. If T is float, the data associated with the texture must either
be a normalized (signed or unsigned integer), half or single-precision format.
The following example uses access qualifiers with texture object arguments:
void foo (texture2d<float> imgA [[texture(0)]],
texture2d<float, access::read> imgB [[texture(1)]],
texture2d<float, access::write> imgC [[texture(2)]])
{…}
(For a description of the texture attribute, see section 5.2.1.)
You can use a texture type as the variable type for any variables declared inside a function. The access attribute for variables of texture type declared inside a function must be access::read or access::sample. Declaring variables inside a function to be a texture type without using the access::read or access::sample qualifier causes a compilation error.
Examples:
void foo (texture2d<float> imgA [[texture(0)]],
texture2d<float, access::read> imgB [[texture(1)]],
texture2d<float, access::write> imgC [[texture(2)]])
{
texture2d<float> x = imgA; // OK
texture2d<float, access::read> y = imgB; // OK
texture2d<float, access::write> z; // This is illegal.
…
}
" Reading the pixel data (just like any other array) from a texture object and performing the
pixel transformation to the desired format.
" Wrapping a texture object around the data of a buffer object, and then accessing the
shared buffer data via the texture. This wrapping technique provides the pixel
conversion, but requires an extra processing step, and the size of the texture is limited.
The following template defines the opaque type texture_buffer, which you can use like any
texture:
texture_buffer<T, access a = access::read>
access can be one of read, write, or read_write.
T specifies the type of a component returned when reading from a texture buffer or the type of
component specified when writing to a texture buffer. For a texture buffer, T can be one of
half, float, short, ushort, int, or uint.
For a format without an alpha channel (such as R, RG, or RGB), an out-of-bounds read returns
(0, 0, 0, 1). For a format with alpha (such as RGBA), an out-of-bounds read returns (0, 0, 0, 0).
For some devices, an out-of-bounds read might have a performance penalty.
An out-of-bounds write is ignored.
A texture buffer can support more texture data than a generic 1D texture, which has a maximum width of 16384. However, you cannot sample a texture buffer.
A texture buffer also converts data, delivering it in the requested texture format, regardless of the source's format. When creating a texture buffer, you can specify the format of the data in the buffer (for example, RGBA8Unorm), and later the shader function can read it as a converted type (such as float4). As a result, a single pipeline state object can access data stored in different pixel formats without recompilation.
A texture buffer, like a texture type, can be declared as the type of a local variable in a shader function. For information about arrays of texture buffers, see section 2.12.1. For more about texture buffers, see section 6.12.16.
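A minimal sketch of a kernel that reads from one texture buffer and writes to another (the read, write, and get_width member functions are described in section 6.12.16; the kernel name and bindings are illustrative):
kernel void
scale_elements(texture_buffer<float, access::read>  src [[texture(0)]],
               texture_buffer<float, access::write> dst [[texture(1)]],
               uint gid [[thread_position_in_grid]])
{
    if (gid < src.get_width()) {
        float4 value = src.read(gid);   // converted to float4 regardless of the stored format
        dst.write(2.0f * value, gid);
    }
}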
2.10 Samplers
The sampler type identifies how to sample a texture. The Metal API allows you to create a
sampler object and pass it as an argument to a graphics or kernel function. You can describe a
sampler object in the program source instead of in the API. For these cases, you can only
specify a subset of the sampler state: the addressing mode, filter mode, normalized
coordinates, and comparison function.
Table 2.7 lists the supported sampler state enumerations and their associated values (and
defaults). You can specify these states when a sampler is initialized in Metal program source.
macOS: Supports clamp_to_border address mode and border_color since Metal 1.2.
iOS: Supports clamp_to_border address mode and border_color since Metal 2.3.
With clamp_to_border, sampling outside a texture only uses the border color for the texture
coordinate (and does not use any colors at the edge of the texture). If the address mode is
clamp_to_border, then border_color is valid.
clamp_to_zero is equivalent to clamp_to_border with a border color of transparent_black
(0.0, 0.0, 0.0) with the alpha component value from the texture. If clamp_to_zero is the
address mode for one or more texture coordinates, the other texture coordinates can use an
address mode of clamp_to_border if the border color is transparent_black. Otherwise, the
behavior is undefined.
If coord is set to pixel, the min_filter and mag_filter values must be the same, the
mip_filter value must be none, and the address modes must be either clamp_to_zero,
clamp_to_border, or clamp_to_edge.
In addition to the enumeration types, you can also specify the maximum anisotropic filtering and
an LOD (level-of-detail) range for a sampler:
max_anisotropy(int value)
lod_clamp(float min, float max)
The following Metal program source illustrates several ways to declare samplers. (The
sampler(n) attribute that appears in the code below is explained in section 5.2.1.) Note that
samplers or constant buffers declared in program source do not need these attribute qualifiers.
You must use constexpr to declare samplers that are initialized in MSL source.
constexpr sampler s(coord::pixel,
address::clamp_to_zero,
filter::linear);
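A further sketch that also sets the anisotropy and LOD clamp states (the values are illustrative):
constexpr sampler s2(coord::normalized,
                     address::repeat,
                     filter::linear,
                     mip_filter::linear,
                     max_anisotropy(8),
                     lod_clamp(0.0f, 4.0f));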
2.11 Imageblocks
iOS: Supports imageblocks since Metal 2.0.
macOS: Supports imageblocks since Metal 2.3.
An imageblock is a 2D data structure (represented by width, height, and number of samples)
allocated in threadgroup memory that is an efficient mechanism for processing 2D image data.
Each element of the structure can be a scalar or vector integer or floating-point data type, pixel
data types (specified in Table 2.6 in section 2.7), an array of these types, or structures built
using these types. The data layout of the imageblock is opaque. You can use an (x, y)
coordinate and optionally the sample index to access the elements in the imageblock. The
elements in the imageblock associated with a specific (x, y) are the per-thread imageblock data
or just the imageblock data.
Section 5.6 details imageblock attributes, including the [[imageblock_data(type)]]
attribute. Section 6.13 lists the built-in functions for imageblocks.
Imageblocks are only used with fragment and kernel functions. Sections 5.6.3 and 5.6.4
describe how to access an imageblock in a fragment or kernel function, respectively.
For fragment functions, you can access only the fragment’s imageblock data (identified by the
fragment’s pixel position in the tile). Use the tile size to derive the imageblock dimensions.
For kernel functions, all threads in the threadgroup can access the imageblock. You typically derive the imageblock dimensions from the threadgroup size before specifying them.
An imageblock slice refers to a region in the imageblock that describes the values of a given
element in the imageblock data structure for all pixel locations or threads in the imageblock.
The storage type of the imageblock slice must be compatible with the texture format of the
target texture, as listed in Table 2.8.
r8snorm<T> R8Snorm
r16unorm<T> R16Unorm
r16snorm<T> R16Snorm
rg8unorm<T> RG8Unorm
rg8snorm<T> RG8Snorm
rg16unorm<T> RG16Unorm
rg16snorm<T> RG16Snorm
rgba16unorm<T> RGBA16Unorm
rgba16snorm<T> RGBA16Snorm
rgb10a2<T> RGB10A2Unorm
rg11b10f<T> RG11B10Float
template<size_t N>
constexpr array_ref(const T(&a)[N]);
template<typename T>
constexpr array_ref<T> make_array_ref(const T * array, size_t length)
kernel void
my_kernel_A(const array<texture2d<float>, 10> src [[texture(0)]],
texture2d<float, access::write> dst [[texture(10)]],
…)
{
float4 clr = foo(src);
…
}
kernel void
my_kernel_B(const array<texture2d<float>, 20> src [[texture(0)]],
texture2d<float, access::write> dst [[texture(10)]],
…)
{…}
kernel void
my_kernel(Foo f)
{…}
struct Bar {
Foo f;
sampler s [[sampler(0)]];
};
kernel void
my_kernel(Bar b)
{…}
Below are examples of invalid use-cases that shall result in a compilation error:
struct MyResources {
texture2d<float> a [[texture(0)]];
depth2d<float> b [[texture(1)]];
int c;
};
kernel void
my_kernel(MyResources r) // This is an illegal use.
{…}
Arrays of textures and samplers can be declared using the existing array<T, N> templated
type. Arrays of all other legal buffer types can also be declared using C-style array syntax.
Members of argument buffers can be assigned a generic [[id(n)]] attribute, where n is a 32-
bit unsigned integer that can be used to identify the buffer element from the Metal API.
Argument buffers can be distinguished from regular buffers if they contain buffers, textures,
samplers, or any element with the [[id]] attribute.
If you omit the [[id]] attribute, an ID is automatically assigned according to the following
rules:
1. IDs are assigned to structure members in order, by adding 1 to the maximum ID used by
the previous structure member. In the example below, the indices are not provided, so
indices 0 and 1 are automatically assigned.
struct MaterialTexture {
texture2d<float> tex; // Assigned index 0
float4 uvScaleOffset; // Assigned index 1
};
2. IDs are assigned to array elements in order, by adding 1 to the maximum ID used by the
previous array element. In the example below, indices 1-3 are automatically assigned to
the three array elements of texs1. Indices 4-5 are automatically assigned to the fields in
materials[0], indices 6-7 to materials[1], and indices 8-9 to materials[2]. The
[[id(20)]] attribute starts by assigning index 20 to constants.
struct Material {
float4 diffuse; // Assigned index 0
array<texture2d<float>, 3> texs1; // Assigned indices 1-3
MaterialTexture materials[3]; // Assigned indices 4-9
int constants [[id(20)]] [4]; // Assigned indices 20-23
};
3. If a structure member or array element E is itself a structure or array, its structure
members or array elements are assigned indices according to rules 1 and 2 recursively,
starting from the ID assigned to E. In the following example, index 4 is explicitly provided
for the nested structure called normal, so its elements (previously defined as tex and
uvScaleOffset) are assigned IDs 4 and 5, respectively. The elements of any further nested structures or arrays are assigned IDs by the same rules.
struct TStruct {
texture2d<float> tex;
};
kernel void
kern(constant TStruct *textures [[buffer(0)]]);
To support GPU-driven pipelines and indirect draw calls and dispatches, you can copy
resources between structures and arrays within a function, as shown below:
kernel void
copy(constant Foo & src [[buffer(0)]],
device Foo & dst [[buffer(1)]])
{
dst.a = src.d;
…
}
If the variable i has the same value for all threads (is uniform) executing the graphics or kernel
function of a draw or dispatch call and if this information was communicated to the hardware,
then the texture sampling hardware can apply appropriate optimizations. A similar argument
can be made for texture writes, where a variable computed at runtime is used as an index into
an array of textures or to index into one or more buffers.
To indicate that this variable is uniform for all threads executing the graphics or kernel function
of a draw or dispatch call, MSL adds a new template class called uniform (available in the
header metal_uniform) that can be used to declare variables inside a graphics or kernel
function. This template class can only be instantiated with arithmetic types (such as Boolean,
integer, and floating point) and vector types.
The code below is a modified version of the previous example, where the variable i is declared
as a uniform type:
kernel void
my_kernel(array<texture2d<float>, 10> texInput,
array<texture2d<float>, 10> texOutput,
sampler s,
…,
uint2 gid [[thread_position_in_grid]])
{
uniform<int> i = …;
float4 color = texInput[i].sample(s, float2(gid));
…;
texOutput[i].write(color, gid);
}
The uniform type is legal for both parameters and the return type of a function. For example:
uniform<int> foo(…); // foo returns a uniform integer value
int bar(uniform<int> a, …);
It is legal to declare a pointer to a uniform type, but not legal to declare a uniform pointer. For
example:
device uniform<int> *ptr; // values pointed to by ptr are uniform
uniform<device int *> ptr; // illegal; compile-time error
The results of expressions that combine uniform with non-uniform variables are non-uniform. If the non-uniform result is assigned to a uniform variable, as in the example below, the behavior is undefined. (The front end might generate a compile-time error, but it is not guaranteed to do so.)
uniform<int> i = …;
int j = …;
if (i < j) { // non-uniform result for expression (i < j)
…
i++; // compile-time error, undefined behavior
}
The following example shows how to declare a table that is compatible with a function whose
definition is “[[visible]] int func(float f)”.
visible_function_table<int(float)> functions;
To get a visible function pointer from the table, use the [] operator.
using fnptr = T (*)(…) [[visible]];
fnptr operator[](uint index) const;
The following example shows how the table can be passed in a buffer.
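A minimal sketch (the kernel name and buffer indices are illustrative), using the int(float) signature declared above:
[[kernel]] void
run_table(visible_function_table<int(float)> functions [[buffer(0)]],
          device float *input  [[buffer(1)]],
          device int   *output [[buffer(2)]],
          uint gid [[thread_position_in_grid]])
{
    // Fetch a visible function pointer from the table and call it indirectly.
    output[gid] = functions[0](input[gid]);
}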
• expression statements: it affects all the indirect call expressions of the given expression.
• return statements: it affects all the indirect call expressions of the return value expression.
instancing: The intersection functions declared with this tag can read the built-in instance_id as described in section 5.2.3.7. The intersector<intersection_tags...>::intersect() function and intersection_query<intersection_tags...> assume that the acceleration structure is an instance_acceleration_structure, and they return the instance_id value.
triangle_data: The triangle intersection functions declared with this tag can read input parameters with the barycentric_coord or front_facing attribute as described in section 5.2.3.7. The intersector<intersection_tags...>::intersect() function and intersection_query<intersection_tags...> return the triangle_barycentrics and triangle_front_facing values.
world_space_data: The intersection functions declared with this tag can query world_space_origin and world_space_direction as described in section 5.2.3.7.
primitive_motion (All OS: Since Metal 2.4): The intersection functions declared with this tag support primitive-level interpolated motion blur.
instance_motion (All OS: Since Metal 2.4): The intersection functions declared with this tag support instance-level interpolated motion blur.
The ray’s origin and direction field are in world space. When a ray object is passed into a
custom intersection or triangle intersection function, the min_distance and max_distance
fields will be based on the current search interval: As candidate hits are discovered,
max_distance will decrease to match the newly narrowed search interval. Within intersection
functions, the origin and direction will be in object space.
A ray can be invalid. Examples of invalid rays include:
• INFs or NaNs in origin or direction
• min_distance == NaN or max_distance == NaN
• min_distance == INF (Note that max_distance may be positive INF).
• length(ray.direction) == 0.0
• min_distance > max_distance
• min_distance < 0.0 or max_distance < 0.0
The ray direction does not need to be normalized, although it does need to be non-zero.
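A sketch of constructing a ray (this assumes the ray constructor taking origin, direction, min_distance, and max_distance from the metal::raytracing namespace; the helper function and its offset value are illustrative):
using namespace metal::raytracing;

ray make_shadow_ray(float3 surface_point, float3 light_dir)
{
    // Offset min_distance slightly to avoid self-intersection. The direction
    // does not need to be normalized, but it must be non-zero.
    return ray(surface_point, light_dir, 0.001f, INFINITY);
}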
template <typename...intersection_tags>
struct intersection_result
{
intersection_type type;
float distance;
uint primitive_id;
uint geometry_id;
};
If a ray is invalid, intersection_type::none is returned.
The distance returned is in world space.
[[kernel]]
void
intersectInstancesKernel(
primitive_acceleration_structure prim_accel [[buffer(0)]],
instance_acceleration_structure inst_accel [[buffer(1)]],
device AccelerationStructs *accels [[buffer(3)]]) {…}
It is possible to create default-initialized variables of such types; the default value is the null acceleration structure.
The following combinations of tags can be used to declare a primitive acceleration structure:
• no tags
• primitive_motion
The following combinations of tags can be used to declare an instance acceleration structure:
• instancing
• instancing, primitive_motion
• instancing, instance_motion
• instancing, primitive_motion, instance_motion
As before, these are opaque objects that can be bound directly using buffer binding points or
via argument buffers:
struct AccelerationMotionStructs {
acceleration_structure<primitive_motion> prim_motion_accel;
acceleration_structure<instancing, instance_motion> inst_motion_accel;
array<acceleration_structure<>, 2> prim_accel_array;
array<acceleration_structure<instancing>, 2> inst_accel_array;
};
[[kernel]]
void
intersectMotionKernel(
acceleration_structure<primitive_motion> prim [[buffer(15)]],
acceleration_structure<instancing, primitive_motion, instance_motion>
inst [[buffer(16)]],
device AccelerationMotionStructs *accels [[buffer(17)]]) {…}
See section 6.18.1 for the functions to call if the acceleration structure is null.
// Legal. Contains:
// (int4)(0x3f800000, 0x40000000, 0x40400000, 0x40800000)
float4 f = float4(1.0f, 2.0f, 3.0f, 4.0f);
int4 i = as_type<int4>(f);
half4 f;
// Error. Result and operand have different sizes
float4 g = as_type<float4>(f);
float4 f;
// Legal. g.xyz has same values as f.xyz.
float3 g = as_type<float3>(f);
Implicit conversions from scalar-to-vector types are supported. The scalar value is replicated in
each element of the vector. The scalar may also be subject to the usual arithmetic conversion to
the element type used by the vector.
For example:
float4 f = 2.0f; // f = (2.0f, 2.0f, 2.0f, 2.0f)
Implicit conversions from scalar-to-matrix types and vector-to-matrix types are not supported
and a compilation error results. Implicit conversions from a matrix type to another matrix, vector
or scalar type are not permitted and a compilation error results.
Implicit conversions for pointer types follow the rules described in the C++14 Specification.
struct Foo {
float a[3];
int b[2];
};
Pointers or references to the constant address space are allowed as arguments to functions.
Writing to variables declared in the constant address space is a compile-time error. Declaring
such a variable without initialization is also a compile-time error.
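For example (a sketch of program-scope declarations in the constant address space; the names are illustrative):
constant float samples[] = { 1.0f, 2.0f, 3.0f, 4.0f };  // OK: initialized
constant float4 weights;   // Compile-time error: declared without initialization.
// Writing to samples from a function is also a compile-time error.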
To decide which address space (device or constant) a read-only buffer passed to a graphics
or kernel function uses, look at how the buffer is accessed inside the graphics or kernel
function. The constant address space is optimized for multiple instances executing a graphics
or kernel function accessing the same location in the buffer. Some examples of this access
pattern are accessing light or material properties for lighting / shading, matrix of a matrix array
used for skinning, filter weight accessed from a filter weight array for convolution. If multiple
executing instances of a graphics or kernel function are accessing the buffer using an index
such as the vertex ID, fragment coordinate, or the thread position in grid, the buffer must be
allocated in the device address space.
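A sketch illustrating this choice (the structure, kernel, and parameter names are illustrative): every thread reads the same light data, so it lives in the constant address space, while the per-thread positions are indexed by thread position and therefore live in the device address space.
struct LightDesc2 {
    float4 position;
    float4 color;
};

kernel void
light_points(constant LightDesc2 &light     [[buffer(0)]],
             device const float4 *positions [[buffer(1)]],
             device float4       *out_color [[buffer(2)]],
             uint gid [[thread_position_in_grid]])
{
    out_color[gid] = light.color * positions[gid];
}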
5.1 Functions
Metal 1.0 or later supports the kernel, vertex, and fragment attributes for every OS. Metal
2.3 or later supports the C++ attributes:
• [[vertex]] or vertex (See section 5.1.1)
• [[fragment]] or fragment (See section 5.1.2)
• [[kernel]] or kernel (See section 5.1.3)
• [[visible]] (See section 5.1.4)
• [[intersection(…)]] (See section 5.1.6)
Make a function accessible to the Metal API by adding one of these function attributes at the
start of a function, which makes it a qualified function. Kernel, vertex, and fragment functions
can’t call one another without triggering a compilation error, but they may call other functions
that use the [[visible]] attribute. They can also call functions with the
[[intersection(…)]] attribute by calling intersect() (see section 6.18.2).
Prior to Metal 2.2, the Metal compiler ignores namespace identifiers for kernel, vertex, and
fragment functions. In Metal 2.2 and later, if you declare a qualified function within a
namespace, you must include the namespace identifier with the function’s name each time you
refer to it in a Metal Framework API. This example declares two kernel functions in different
namespaces.
namespace outer {
[[kernel]] void functionA() {…}
namespace inner {
[[kernel]] void functionB() {…}
}
}
Refer to a function in a namespace by prepending the function’s name with the namespace’s
identifier followed by two colons.
outer::functionA
Similarly, refer to a function in a nested namespace by prepending the function’s name with all
namespaces in order and separating each with two colons.
outer::inner::functionB
5.1.1 Vertex Functions
You can declare a vertex function with the vertex attribute or, since Metal 2.3, the [[vertex]] attribute; these attributes apply only to graphics functions. Metal executes a vertex function for each vertex in the vertex stream and generates per-vertex output. The following example shows the syntax for declaring vertex functions.
vertex void
vertex_func1(…)
{…}

[[vertex]] void
vertex_func2(…)
{…}
For a vertex function, the return type identifies the output generated by the function. If the
vertex function does not generate output, it shall return void and can only be used in a render
pipeline with rasterization disabled.
Example:
[[patch(quad)]]
[[patch(quad, 16)]]
[[vertex]] vertex_output
my_bezier_vertex(…)
{…}
For graphics functions, the return type identifies whether the output generated by the function is per-vertex or per-fragment. If the fragment function does not generate output, it shall return void.
To request performing fragment tests before the fragment function executes, use the
[[early_fragment_tests]] function attribute with a fragment function, as shown in the
example below.
[[early_fragment_tests]]
fragment float4
my_fragment( … )
{…}
It is an error if the return type of the fragment function declared with the
[[early_fragment_tests]] attribute includes a depth or stencil value; that is, if the return
type of this fragment function includes an element declared with the
[[depth(depth_attribute)]] or [[stencil]] attribute.
It is an error to use the [[early_fragment_tests]] attribute with any function that is not a
fragment function; that is, not declared with the fragment attribute.
[[kernel]]
void my_kernel2(…) {…}
Functions declared with the kernel or [[kernel]] attribute must return void.
You can use the [[max_total_threads_per_threadgroup]] function attribute with a kernel
function to specify the maximum threads per threadgroup.
Below is an example of a kernel function that uses this attribute:
[[max_total_threads_per_threadgroup(x)]]
kernel void
my_kernel(…)
{…}
bounding_box: Supports procedural intersection within a user-specified bounding box.
You may pass zero or more intersection tags as described in Table 2.9 from section 2.17. Some
examples are:
template<typename T>
kernel void bar(device T *x) { … }
// Explicit specialization of `bar<T>` with [T = int]
template kernel void bar(device int *);
The compiler gives all specializations the same name unless one uses the
[[host_name(name)]] attribute to provide a different name for each specialization.
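A sketch of such specializations (the host_name strings are illustrative):
// Explicit specializations of `bar<T>` with distinct host-visible names.
template [[host_name("bar_int")]]   kernel void bar(device int *);
template [[host_name("bar_float")]] kernel void bar(device float *);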
The example below shows attributes used for function arguments of several different types (a
buffer, a texture, and a sampler):
kernel void
my_kernel(device float4 *p [[buffer(0)]],
          texture2d<float> img [[texture(0)]],
          sampler sam [[sampler(1)]])
{…}
If the location indices are not specified, the Metal compiler assigns them using the first available
location index. In the following example, src is assigned texture index 0, dst texture index 1, s
sampler index 0, and u buffer index 0:
kernel void
my_kernel(texture2d<half> src,
texture2d<half, access::write> dst,
sampler s,
device myUserInfo *u)
{…}
In the following example, some kernel arguments have explicitly assigned location indices and
some do not. src is explicitly assigned texture index 0, and f is explicitly assigned buffer index
10. If you assign location indices using function constants (section 5.8), the compiler does not
consider those entries when assigning indices. The other arguments are assigned the first
available location index: dst texture index 1, s sampler index 0, and u buffer index 0.
kernel void
my_kernel(texture2d<half> src [[texture(0)]],
texture2d<half, access::write> dst,
sampler s,
device myUserInfo *u,
device float *f [[buffer(10)]])
{…}
5.2.1.1 Vertex Function Example with Resources and Outputs to Device Memory
The following example is a vertex function, render_vertex, which outputs to device memory in
the array xform_output, which is a function argument specified with the device attribute
(introduced in section 4.1). All the render_vertex function arguments are specified with the
buffer(0), buffer(1), buffer(2), and buffer(3) attributes (introduced in section 5.2.1).
For more about the position attribute shown in this example, see section 5.2.3.3.
struct VertexInput {
float4 position;
float3 normal;
float2 texcoord;
};
struct LightDesc {
uint num_lights;
float4 light_position[MAX_LIGHTS];
float4 light_color[MAX_LIGHTS];
float4 light_attenuation_factors[MAX_LIGHTS];
};
vertex void
render_vertex(const device VertexInput* v_in [[buffer(0)]],
constant float4x4& mvp_matrix [[buffer(1)]],
constant LightDesc& light_desc [[buffer(2)]],
device VertexOutput* xform_output [[buffer(3)]],
uint v_id [[vertex_id]] )
{
VertexOutput v_out;
v_out.position = v_in[v_id].position * mvp_matrix;
v_out.color = do_lighting(v_in[v_id].position, v_in[v_id].normal,
light_desc);
v_out.texcoord = v_in[v_id].texcoord;
xform_output[v_id] = v_out;
}
struct LightDesc {
uint num_lights;
float4 light_position[MAX_LIGHTS];
float4 light_color[MAX_LIGHTS];
float4 light_attenuation_factors[MAX_LIGHTS];
};
vertex VertexOutput
render_vertex(VertexInput v_in [[stage_in]],
constant float4x4& mvp_matrix [[buffer(1)]],
constant LightDesc& lights [[buffer(2)]],
uint v_id [[vertex_id]])
{…}
The example below shows how both buffers and the stage_in attribute can be used to fetch
per-vertex inputs in a vertex function.
struct VertexInput {
float4 position [[attribute(0)]];
float3 normal [[attribute(1)]];
};
struct VertexInput2 {
half4 color;
half2 texcoord[4];
};
struct LightDesc {
uint num_lights;
float4 light_position[MAX_LIGHTS];
float4 light_color[MAX_LIGHTS];
float4 light_attenuation_factors[MAX_LIGHTS];
};
vertex VertexOutput
render_vertex(VertexInput v_in [[stage_in]],
VertexInput2 v_in2 [[buffer(0)]],
constant float4x4& mvp_matrix [[buffer(1)]],
constant LightDesc& lights [[buffer(2)]],
uint v_id [[vertex_id]])
{…}
A post-tessellation vertex function can read the per-patch and patch control-point data. The
post-tessellation vertex function specifies the patch control-point data as the following
templated type:
patch_control_point<T>
where T is a user-defined structure. Each element of T must specify an attribute location using
[[attribute(index)]].
All OS: patch control-point templated type supported since Metal 1.2.
The patch_control_point<T> type supports these member functions:
• constexpr size_t size() const;
which returns the number of control-points in the patch.
• constexpr const_reference operator[] (size_t pos) const;
which returns the data for a specific patch control point that pos identifies.
Example:
struct ControlPoint {
int3 patchParam [[attribute(0)]];
float3 P [[attribute(1)]];
float3 P1 [[attribute(2)]];
float3 P2 [[attribute(3)]];
float2 vSegments [[attribute(4)]];
};
struct PerPatchData {
float4 patchConstant [[attribute(5)]];
float4 someOtherPatchConstant [[attribute(6)]];
};
struct PatchData {
patch_control_point<ControlPoint> cp; // Control-point data
PerPatchData patchData; // Per-patch data
};
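A sketch of a post-tessellation vertex function that uses these member functions (the output structure and the computation are illustrative):
struct PostTessVertexOut {
    float4 position [[position]];
};

[[patch(quad, 16)]]
vertex PostTessVertexOut
post_tess_vertex(PatchData pd [[stage_in]],
                 float2 uv [[position_in_patch]])
{
    float3 sum = float3(0.0f);
    for (size_t i = 0, n = pd.cp.size(); i != n; ++i)  // size() returns the control-point count
        sum += pd.cp[i].P;                             // operator[] reads control point i
    PostTessVertexOut out;
    out.position = float4(sum / float(pd.cp.size()) + float3(uv, 0.0f), 1.0f);
    return out;
}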
All OS: All attributes in Table 5.2 are supported since Metal 1.2.
Notes on vertex function input attributes:
• If the type used to declare [[instance_id]] is uint, the type used to declare
[[base_instance]] must be uint or ushort.
• If the type used to declare [[instance_id]] is ushort, the type used to declare
[[base_instance]] must be ushort.
All OS: Attributes in Table 5.3 are supported since Metal 1.0, unless otherwise indicated.
vertex VertexOutput
process_vertex(…)
{
VertexOutput v_out;
// compute per-vertex output
…
return v_out;
}
Post-tessellation vertex function outputs are the same as a regular vertex function.
If vertex amplification is enabled, and if a vertex output variable has the same value for every
[[amplification_id]] attribute, the vertex output is considered shared. A vertex output that
is shared may use a single varying output slot, which is a limited resource. Vertex outputs that
are not shared consume more than one varying output slot. (The Metal framework call
[MTLRenderPipelineDescriptor maxVertexAmplificationCount] returns the number of
varying slots that may be used to pass the amplified data to fragment function invocations,
which impacts the number of total available varying slots.)
By default, all built-in vertex outputs are shared, except for those with the [[position]]
attribute. By default, all other vertex outputs are not shared. To explicitly specify that the output
is shared, use the [[shared]] attribute with a vertex output variable.
If the shader compiler can deduce that a vertex output variable has the same value for every
amplification_id, the compiler may mark that vertex output as shared. The compiler may not
mark vertex outputs as shared in any of these cases:
• The output value depends on the [[amplification_id]].
• An atomic read-modify-write operation returns the output value.
• The shader loads the output value from volatile memory.
A variable declared with the [[position]] attribute as input to a fragment function can only
be declared with the center_no_perspective sampling and interpolation attribute. (See
section 5.4.)
For [[color(m)]], m is used to specify the color attachment index when accessing (reading or
writing) multiple color attachments in a fragment function.
The [[sample_mask]] attribute can only be declared once for a fragment function input.
The value of [[render_target_array_index]] in the fragment function is the same value
written from the vertex function, even if the specified value is out of range.
For more about [[viewport_array_index]], see section 5.9.
The default value for [[amplification_count]] is 1, which indicates that vertex amplification
is disabled.
The value for [[amplification_id]] shall be in the range [0, amplification_count).
For a specified [[amplification_id]] attribute value, the [[viewport_array_index]] and
[[render_target_array_index]] built-in fragment input values are added to (offset by) the
values that the corresponding MTLVertexAmplificationViewMapping structure provides.
The following example describes the structure MyVertexOut that is both a vertex function return type and a fragment function input type. MyVertexOut uses the [[shared]] attribute to explicitly mark the texcoord output as shared:
struct MyVertexOut {
float4 position [[position]];
float3 normal;
float3 tangent;
float3 bitangent;
float2 texcoord [[shared]]; // explicitly shared.
float ampData;
ushort viewport [[viewport_array_index]]; // implicitly shared
};
A fragment function input declared with the [[barycentric_coord]] attribute can only be
declared with either the center_perspective (default) or center_no_perspective sampling
and interpolation attributes. The barycentric coordinates and per-pixel primitive ID can be
passed as fragment function input in structures organized as shown in these examples:
struct FragmentInput0 {
uint primitive_id [[primitive_id]];
// [[center_perspective]] is the default, so it can be omitted.
float3 barycentric_coord [[barycentric_coord, center_perspective]];
};
struct FragmentInput1 {
uint primitive_id [[primitive_id]];
float2 linear_barycentric_coord [[barycentric_coord,
center_no_perspective]];
};
By storing the barycentric coordinates and per-pixel primitive ID, your shader can manually read
and interpolate the vertices of a drawn primitive within the fragment phase or defer this
interpolation to a separate pass. In the deferred interpolation scenario, you can use a thin buffer
during the geometry pass to store a minimal set of surface data, including pre-clipped
barycentric coordinates. At a later stage, you must have enough data to reconstruct the original
vertex indices from the primitive ID data and to correlate the barycentric coordinates to those
vertex indices.
When applying the barycentric_coord attribute to an input argument (or to a field of an
argument) with more components than the dimension of the primitive, the remaining elements
are initialized with 0.0f. For example, for
fragment float4
frag (float3 coord [[barycentric_coord]]) { ... }
• When drawing a point, coord.yz is float2(0.0f).
• When drawing a line, coord.z is 0.0f.
When applying the barycentric_coord attribute to an input argument (or to a field of an
argument) with fewer components than the dimension of the primitive, the remaining elements
are ignored.
Table 5.5 lists attributes that can be specified for tile arguments that are input to a fragment
function. The data types used to declare [[pixel_position_in_tile]] and
[[pixels_per_tile]] must match.
The color attachment index m for fragment output is specified in the same way as it is for
[[color(m)]] for fragment input (see discussion for Table 5.4). Multiple elements in the
fragment function return type that use the same color attachment index for blending must be
declared with the same data type.
If there is only a single color attachment in a fragment function, then [[color(m)]] is optional.
If [[color(m)]] is not specified, the attachment index is 0. If multiple color attachments are
specified, [[color(m)]] must be specified for all color values. See examples of specifying the
color attachment in sections 5.5 and 5.8.1.5.
If index(i) is not specified in the attribute, the default is an index of 0. If index(i) is
specified, the value of i must be known at compile time.
If a fragment function writes a depth value, the depth_argument must be specified with one of
the following values:
any
greater
less
You cannot use the [[stencil]] attribute in fragment-based tile shading functions. The
[[stencil]] attribute is not compatible with the [[early_fragment_tests]] function
attribute.
If the fragment function does not output the stencil value, the setStencilReferenceValue: or
setStencilFrontReferenceValue:backReferenceValue: method of
MTLRenderCommandEncoder can set the stencil reference value.
struct MyFragmentOutput {
// color attachment 0
float4 clr_f [[color(0)]];
// color attachment 1
int4 clr_i [[color(1)]];
// color attachment 2
uint4 clr_ui [[color(2)]];
};
fragment MyFragmentOutput
my_fragment(…)
{
MyFragmentOutput f;
…
f.clr_f = …;
…
return f;
}
If a color attachment index is used as both an input to and an output of a fragment function, the
data types associated with the input argument and output declared with this color attachment
index must match.
quadgroup_index_in_threadgroup (ushort or uint)
macOS: Since Metal 2.1. iOS: Since Metal 2.0.
The scalar index of a quad-group within a threadgroup.

quadgroups_per_threadgroup (ushort or uint)
macOS: Since Metal 2.1. iOS: Since Metal 2.0.
The quad-group execution width of a threadgroup.

simdgroup_index_in_threadgroup (ushort or uint)
macOS: Since Metal 2.0. iOS: Since Metal 2.2.
The scalar index of a SIMD-group within a threadgroup.

simdgroups_per_threadgroup (ushort or uint)
macOS: Since Metal 2.0. iOS: Since Metal 2.2.
The SIMD-group execution width of a threadgroup.

thread_index_in_quadgroup (ushort or uint)
macOS: Since Metal 2.1. iOS: Since Metal 2.0.
The scalar index of a thread within a quad-group.

thread_index_in_simdgroup (ushort or uint)
macOS: Since Metal 2.0. iOS: Since Metal 2.2.
The scalar index of a thread within a SIMD-group.

thread_position_in_threadgroup (ushort, ushort2, ushort3, uint, uint2, or uint3)
All OS: Since Metal 1.0.
The thread's unique position within a threadgroup.

threadgroup_position_in_grid (ushort, ushort2, ushort3, uint, uint2, or uint3)
All OS: Since Metal 1.0.
The threadgroup's unique position within a grid.

threadgroups_per_grid (ushort, ushort2, ushort3, uint, uint2, or uint3)
All OS: Since Metal 1.0.
The number of threadgroups in a grid.

threads_per_grid (ushort, ushort2, ushort3, uint, uint2, or uint3)
All OS: Since Metal 1.0.
The grid size.

threads_per_simdgroup (ushort or uint)
macOS: Since Metal 2.0. iOS: Since Metal 2.2.
The thread execution width of a SIMD-group.

threads_per_threadgroup (ushort, ushort2, ushort3, uint, uint2, or uint3)
All OS: Since Metal 1.0.
The thread execution width of a threadgroup.
key_frame_count (ushort or uint)
All OS: Since Metal 2.4.
Number of key frames. Available if intersection_tags include primitive_motion.
The type for a parameter with the [[payload]] attribute is of the form ray_data T &. It is
passed by reference to the intersection functions, and it is allocated in the ray_data address
space. The type T of the payload can be or contain the following types:
• device or constant pointers or references
• integer types
• enum types
• floating point types
• vector types
• arrays of such types
• struct and union (except for atomic<T’> and imageblock<T’>).
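For example, a bounding-box intersection function might declare and update its payload as follows. This is a minimal sketch, not from this specification: BoundingBoxResult, RayPayload, and sphere_intersection are hypothetical names, and the actual intersection test is omitted.
struct BoundingBoxResult {
    bool accept [[accept_intersection]];
    float distance [[distance]];
};

struct RayPayload {
    float3 color;
    uint hitCount;
};

[[intersection(bounding_box)]]
BoundingBoxResult sphere_intersection(float3 origin [[origin]],
                                      float3 direction [[direction]],
                                      float min_distance [[min_distance]],
                                      float max_distance [[max_distance]],
                                      ray_data RayPayload &payload [[payload]])
{
    BoundingBoxResult result;

    // The payload is allocated in the ray_data address space and passed by
    // reference, so updates here are visible after the intersection query.
    payload.hitCount += 1;
    payload.color = float3(1.0f, 0.0f, 0.0f);

    // A real implementation would intersect the ray with the primitive here.
    result.accept = false;
    result.distance = max_distance;
    return result;
}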
distance (float)
Returns the distance along the ray of a hit found within the bounding box. If the hit is rejected ([[accept_intersection]] == false), this return value is ignored. Available if the primitive_type is a bounding_box.
struct VertexInput {
float4 position [[attribute(0)]];
float3 normal [[attribute(1)]];
half4 color [[attribute(2)]];
half2 texcoord [[attribute(3)]];
};
struct LightDesc {
uint num_lights;
float4 light_position[MAX_LIGHTS];
float4 light_color[MAX_LIGHTS];
float4 light_attenuation_factors[MAX_LIGHTS];
};
vertex VertexOutput
render_vertex(VertexInput v_in [[stage_in]],
constant float4x4& mvp_matrix [[buffer(1)]],
constant LightDesc& lights [[buffer(2)]],
uint v_id [[vertex_id]])
{
VertexOutput v_out;
v_out.position = v_in.position * mvp_matrix;
v_out.color = do_lighting(v_in.position, v_in.normal, lights);
…
return v_out;
}
struct VertexInputData {
float4 position;
float3 normal;
float2 texcoord;
};
struct LightDesc {
uint num_lights;
float4 light_position[MAX_LIGHTS];
float4 light_color[MAX_LIGHTS];
float4 light_attenuation_factors[MAX_LIGHTS];
};
vertex VertexOutput2
render_vertex(const device VertexInputData *v_in [[buffer(0)]],
constant float4x4& mvp_matrix [[buffer(1)]],
constant LightDesc& lights [[buffer(2)]],
uint v_id [[vertex_id]])
{
VertexOutput2 v_out;
v_out.position = v_in[v_id].position * mvp_matrix;
v_out.color = do_lighting(v_in[v_id].position, v_in[v_id].normal,
lights);
v_out.texcoord = v_in[v_id].texcoord;
return v_out;
}
// Compute color.
float4 clr = compute_color(tex_clr0, tex_clr1, …);
return clr;
}
kernel void
my_kernel(PerThreadInput thread_input [[stage_in]],
…
uint t_id [[thread_position_in_grid]])
{…}
kernel void
my_kernel(texture2d<float> img [[texture(0)]],
device float *ptr [[buffer(0)]])
{
extern constant float4 a;
static constant float4 b; // Here, static is an error.
static float c; // Here, static is an error.
…
my_foo(img);
…
my_bar(ptr);
…
}
" Every attribute name given by [[user(name)]] must be unique for each data member
in the imageblock.
" The attribute name given by [[user(name)]] for a data member must match with a
data member declared in the master explicit imageblock structure, and their associated
data types must also match.
" If the [[user(name)]] attribute is not specified, the data member name and type
declared in the imageblock data type for a fragment function and the master imageblock
structure must match. Additionally, the data member cannot be within a nested structure
that is either within the view imageblock structure or within the master imageblock
structure.
struct IVIn {
rgb9e5<float4> b; // Maps to IM::b
float d; // Maps to IM::d
};
struct IVOut {
int c; // Maps to IM::c
};
struct FragOut {
IVOut i [[imageblock_data(IM)]];
};
fragment FragOut
my_fragment(IVIn i [[imageblock_data(IM)]], …) {
FragOut fragOut;
… = i.b;
… = i.d;
fragOut.i.c = …;
return fragOut;
}
You can declare nested structures in the master imageblock and view imageblock structures.
The following example shows how to use nested structures in an imageblock with data
members declared with the [[user(name)]] attribute:
struct A {
rgba8unorm<half4> a [[user(A_a)]];
rgb9e5<float4> b [[user(A_b)]];
};
struct B {
int a [[user(B_a), raster_order_group(1)]];
float b [[user(B_b), raster_order_group(2)]];
};
struct IM {
A a [[user(A), raster_order_group(0)]];
B b [[user(B)]];
};
struct IVIn {
A x [[user(A)]]; // Maps to IM::a
};
struct IVOut {
B y [[user(B)]]; // Maps to IM::b
rgb9e5<float4> z [[user(A_b)]]; // Maps to IM::A::b
};
struct FragOut {
IVOut i [[imageblock_data(IM)]];
};
fragment FragOut
my_fragment(IVIn i [[imageblock_data(IM)]], …) {
FragOut fragOut;
… = i.x;
fragOut.i.y.a = …;
fragOut.i.y.b = …;
fragOut.i.z = …;
return fragOut;
}
struct V {
int a [[user(x)]];
M::A b [[user(y)]]; // Illegal: b aliases with a
};
fragment void
f(V i [[imageblock_data(M)]])
{…}
Explicit imageblock types cannot have data members declared with the [[color(n)]]
attribute.
struct gBufferData {
half3 color [[raster_order_group(0)]];
S s [[raster_order_group(1)]];
rgb11b10f<half3> lighting [[raster_order_group(2)]];
};
Data members declared as an array have a single raster order group associated with all
members of the array. The following example shows how you can specify the
[[raster_order_group(index)]] attribute for a data member of a master imageblock that is
an array of a structure type.
struct S {
rgb9e5<half3> normal;
float factor;
};
struct IM {
half3 color [[raster_order_group(0)]];
S s [[raster_order_group(1)]][2];
rgb11b10f<half3> lighting [[raster_order_group(2)]];
};
" As a color attachment, where the storage layout of the imageblock is not known in the
fragment function. An implicit imageblock layout uses the existing color attachment
attribute. (For more about the implicit imageblock layout, see section 5.6.3.1.)
" As a structure used to declare the imageblock data where the fragment function
explicitly specifies the storage layout of the imageblock. (For more about the explicit
imageblock layout, see section 5.6.3.2.)
" Have a color attachment (see the [[color(m)]] attribute in Table 5.4 of section
5.2.3.4). The color index m must be unique for each member (and sub-member) of T.
" Be a structure type with members that satisfy the constraint on the list.
struct FragOut {
float c [[color(0)]];
I i [[imageblock_data]];
};
fragment FragOut
my_fragment(I i [[imageblock_data]])
{
FragOut fragOut;
…
return fragOut;
}
Fragment functions can access both an implicit imageblock and an explicit imageblock as
separate input arguments, or as fields in a return structure.
Example:
struct I {
float a [[raster_order_group(0)]];
};
fragment FragOut
my_fragment(I i [[imageblock_data]],
float c [[color(0)]])
{
FragOut fragOut;
…
return fragOut;
}
By default, the explicit imageblock storage is separate from the storage of the implicit
imageblock. To share storage between the explicit imageblock and implicit imageblock, see
section 5.6.5.
" scalars
" vectors and packed vectors
" pixel data types
" an array with elements that are one of the types on this list
struct FragOut {
float4 finalColor [[color(0)]];
I i [[imageblock_data, alias_implicit_imageblock_color(1)]];
};
fragment FragOut
my_fragment(I i [[imageblock_data]], …)
{
FragOut fragOut;
…
return fragOut;
}
vertex VertexOutput
my_vertex(…)
{
VertexOutput v;
…
return v;
}
fragment float4
my_fragment(VertexOutput f [[stage_in]], …)
{
float4 clr;
…
return clr;
}
struct FragInput {
float3 frag_normal [[user(normal)]];
float4 position [[position]];
float4 framebuffer_color [[color(0)]];
bool is_front_face [[front_facing]];
};
vertex VertexOutput
my_vertex(…)
{
VertexOutput v;
…
return v;
}
fragment float4
my_fragment(FragInput f [[stage_in]], …)
{
float4 clr;
…
vertex VertexOutput
my_vertex(…)
{
VertexOutput v;
…
return v;
}
fragment float4
my_fragment(float4 p [[position]], …)
{
float4 clr;
…
return clr;
}
struct FragInput {
float4 position [[position]];
};
vertex VertexOutput
my_vertex(…)
{
VertexOutput v;
…
return v;
}
fragment float4
my_fragment(FragInput f [[stage_in]], …)
{
float4 clr;
…
return clr;
}
struct FragInput {
float3 normal [[user(foo)]];
float4 position [[position]];
};
vertex VertexOutput
my_vertex_shader(…)
{
VertexOutput v;
…
return v;
}
fragment float4
my_fragment_shader(FragInput f [[stage_in]], …)
{
float4 clr;
…
return clr;
}
You can use the following built-in function to determine if a function constant has been defined
and is available. name refers to the function constant variable.
bool is_function_constant_defined(name)
Returns true if the function constant variable is defined and false otherwise.
If a function constant variable value is not defined during the creation of the pipeline state and if
the graphics or kernel function specified with the render or compute pipeline state uses these
function constants, the behavior is the same as when the value of
is_function_constant_defined(name) is false.
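For example, the following sketch (not from this specification; tint_color is a hypothetical function constant and VertexOutput is assumed to declare a color member) guards the use of a function constant with is_function_constant_defined:
constant float4 tint_color [[function_constant(0)]];
constant bool tint_defined = is_function_constant_defined(tint_color);

fragment float4
tinted_fragment(VertexOutput in [[stage_in]])
{
    float4 clr = in.color;
    if (tint_defined)
        clr *= tint_color; // only read the constant when it has a value
    return clr;
}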
struct VertexInput {
float4 position [[attribute(0)]];
float4 offset [[attribute(1)]];
float4 color [[attribute(2)]];
};
vertex VertexOutput
myVertex(VertexInput vIn [[stage_in]])
{
VertexOutput vOut;
vOut.position = vIn.position;
#ifdef OFFSET_DEFINED
vOut.position += vIn.offset;
#endif
#ifdef COLOR_DEFINED
vOut.color = vIn.color;
#else
vOut.color = float4(0.0f);
#endif
return vOut;
}
vertex VertexOutput
myVertex(VertexInput vIn [[stage_in]])
{
VertexOutput vOut;
vOut.position = vIn.position;
if (offset_defined)
vOut.position += vIn.offset;
if (color_defined)
vOut.color = vIn.color;
else
vOut.color = float4(0.0f);
return vOut;
}
fragment half4
myFragment(constant GlobalUniformData *globalUniform [[buffer(0)]],
constant RenderUniformData_ModelWithLightmap *renderUniform
[[buffer(1)]],
constant MaterialUniformData *materialUniform [[buffer(2)]],
texture2d<float> DiffuseTexture [[texture(0)]],
texture2d<float> LightmapTexture [[texture(1)]],
texture2d<float> FogTexture [[texture(3)]],
texture2d<float> LookupTexture [[texture(4),
function_constant(med_quality_defined)]],
texture2d<float> RealtimeShadowMapTexture [[texture(10),
function_constant(realtime_shadow_defined)]],
sampler DiffuseTextureSampler [[sampler(0)]],
sampler LightmapTextureSampler [[sampler(1)]],
sampler FogTextureSampler [[sampler(3)]],
sampler LookupTextureSampler [[sampler(4),
function_constant(med_quality_defined)]],
sampler RealtimeShadowMapTextureSampler [[sampler(10),
function_constant(realtime_shadow_defined)]],
VertexOutput fragIn [[stage_in]])
{…}
Below is another example that shows how to use function constants with arguments to a
function:
constant bool hasInputBuffer [[function_constant(0)]];
struct VertexOutput {
float4 position [[position]];
float4 color;
};
struct VertexInput {
float4 position [[attribute(0)]];
float4 offset [[attribute(1), function_constant(offset_defined)]];
float4 color [[attribute(2), function_constant(color_defined)]];
};
vertex VertexOutput
myVertex(VertexInput vIn [[stage_in]])
{
VertexOutput vOut;
vOut.position = vIn.position;
if (offset_defined)
vOut.position += vIn.offset;
return vOut;
}
5.8.1.4 Function Constants for Resource Bindings
All OS: Using a function constant to specify resource bindings supported since Metal 2.0.
An argument to a graphics or kernel function that is a resource (buffer, texture, or sampler) can use a function constant to specify its binding number. The function constant must be a scalar integer type.
Example:
constant int indexA [[function_constant(0)]];
constant int indexB = indexA + 2;
constant int indexC [[function_constant(1)]];
constant int indexD [[function_constant(2)]];
kernel void
my_kernel(constant UserParams& params [[buffer(indexA)]],
device T * p [[buffer(indexB)]],
texture2d<float> texA [[texture(indexC)]],
sampler s [[sampler(indexD)]], …)
{…}
5.8.1.5 Function Constants for Color Attachments and Raster Order Groups
All OS: using a function constant to specify a color attachment or raster order group attribute
index supported since Metal 2.0.
The [[color(n)]] or [[raster_order_group(index)]] index can also be a function
constant. The function constant used must be a scalar integer type.
Example:
constant int colorAttachment0 [[function_constant(0)]];
constant int colorAttachment1 [[function_constant(1)]];
constant int group0 [[function_constant(2)]];
struct FragmentOutput {
float4 color0 [[color(colorAttachment0)]];
float4 color1 [[color(colorAttachment1)]];
};
fragment FragmentOutput
my_fragment(texture2d<float> texA [[texture(0),
raster_order_group(group0)]], …)
{…}
For single precision floating-point, Metal also supports precise and fast variants of the following common functions: clamp and saturate. The fast and precise variants differ in how they handle NaNs: in the fast variants, the behavior of NaNs is undefined, whereas the precise variants follow the IEEE 754 rules for NaN handling. The -ffast-math compiler option (refer to section 1.5.3) selects the appropriate variant when compiling the Metal source. In addition, the metal::precise and metal::fast nested namespaces provide an explicit way to select the fast or precise variant of these common functions.
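For example, the variants can be selected as follows (a short sketch in the style of the sin/cos example later in this chapter):
float x;
float a = clamp(x, 0.0f, 1.0f);          // Fast or precise, based on -ffast-math.
float b = fast::clamp(x, 0.0f, 1.0f);    // Fast variant; NaN behavior is undefined.
float c = precise::clamp(x, 0.0f, 1.0f); // Precise variant; follows IEEE 754 NaN rules.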
T32 mul24(T32 x, T32 y)
All OS: Since Metal 2.1.
Multiplies two 24-bit integer values x and y and returns the 32-bit integer result. x and y are 32-bit integers, but only the low 24 bits perform the multiplication. (See details following this table.)

T mulhi(T x, T y)
Computes x * y and returns the high half of the product of x and y.

T popcount(T x)
Returns the number of non-zero bits in x.

T reverse_bits(T x)
All OS: Since Metal 2.1.
Returns the reversal of the bits of x. The bit numbered n of the result is taken from bit (bits - 1) - n of x, where bits is the total number of bits used to represent x.

T rhadd(T x, T y)
Returns (x + y + 1) >> 1. The intermediate sum does not modulo overflow.

T rotate(T v, T i)
For each element in v, the bits are shifted left by the number of bits given by the corresponding element in i. Bits shifted off the left side of the element are shifted back in from the right.

T subsat(T x, T y)
Returns x - y and saturates the result.
The mul24 function only operates as described if x and y are signed integers in the range [-2^23, 2^23 - 1], or if x and y are unsigned integers in the range [0, 2^24 - 1]. If x and y are not in this range, the multiplication result is implementation-defined.
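For example (a short sketch, following the statement style of the other examples in this chapter):
int  a = 1 << 20;                 // fits in 24 bits
int  b = 3;
int  p = mul24(a, b);             // 32-bit result; only the low 24 bits of a and b participate
uint h = mulhi(0xFFFFFFFFu, 2u);  // high half of the 64-bit product, which is 1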
Tb signbit(T x)
Test for sign bit. Returns true if the sign bit is set for the floating-point value in x; otherwise returns false.
For fmedian3, if all values are NaN, return NaN. Otherwise, treat NaN as missing data and
remove it from the set. If two values are NaN, return the non-NaN value. If one of the values is
NaN, the function can return either non-NaN value.
For single precision floating-point, Metal supports two variants of the math functions listed in
Table 6.4: the precise and the fast variants. The ffast-math compiler option (refer to section
1.5.3) selects the appropriate variant when compiling the Metal source. In addition, the
metal::precise and metal::fast nested namespaces provide an explicit way to select the
fast or precise variant of these math functions for single precision floating-point.
Examples:
float x;
float a = sin(x); // Use fast or precise version of sin based on
// whether you specify -ffast-math as a
// compile option.
float b = fast::sin(x); // Use fast version of sin().
float c = precise::cos(x); // Use precise version of cos().
M_PI_F Value of π
M_PI_2_F Value of π / 2
M_PI_4_F Value of π / 4
M_1_PI_F Value of 1 / π
M_2_PI_F Value of 2 / π
M_2_SQRTPI_F Value of 2 / √π
M_SQRT2_F Value of √2
M_SQRT1_2_F Value of 1 / √2
Table 6.6 lists available symbolic constants with values of type half that are accurate within the precision of a half-precision floating-point number.
M_E_H Value of e
M_PI_H Value of π
M_PI_2_H Value of π / 2
M_PI_4_H Value of π / 4
M_1_PI_H Value of 1 / π
M_2_PI_H Value of 2 / π
M_2_SQRTPI_H Value of 2 / √π
M_SQRT2_H Value of √2
M_SQRT1_2_H Value of 1 / √2
operator* Returns a * b
simdgroup_load(sgMatA, pMatA);
simdgroup_load(sgMatB, pMatB);
simdgroup_load(sgMatC, pMatC);
simdgroup_store(sgMatR, pMatR);
}
T refract(T I, T N, Ts eta) For the incident vector I and surface normal N, and
the ratio of indices of refraction eta, return the
refraction vector.
The input parameters for the incident vector I and
the surface normal N must already be normalized to
get the desired results.
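For example (a minimal sketch; incident and surface_normal are hypothetical values already in scope):
float3 I = normalize(incident);        // incident direction must be normalized
float3 N = normalize(surface_normal);  // surface normal must be normalized
float  eta = 1.0f / 1.33f;             // e.g., refraction from air into water
float3 R = refract(I, N, eta);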
For single precision floating-point, Metal also supports precise and fast variants of the following geometric functions: distance, length, and normalize. To select the appropriate variant when compiling the Metal source, use the -ffast-math compiler option (refer to section 1.5.3). In addition, the metal::precise and metal::fast nested namespaces provide an explicit way to select the fast or precise variant of these geometric functions.
T simd_and(T data)
macOS: Since Metal 2.1. iOS: No support.
Returns the bitwise AND (&) of data across all active threads in the SIMD-group and broadcasts the result to all active threads in the SIMD-group.

T simd_broadcast(T data, ushort broadcast_lane_id)
macOS: Since Metal 2.0. iOS: Since Metal 2.2.
Broadcasts data from the thread whose SIMD lane ID is equal to broadcast_lane_id. The specification doesn't define the behavior when broadcast_lane_id isn't a valid SIMD lane ID or isn't the same for all threads in a SIMD-group.

T simd_broadcast_first(T data)
macOS: Since Metal 2.1. iOS: Since Metal 2.2.
Broadcasts data from the first active thread (the active thread with the smallest index) in the SIMD-group to all active threads.

T simd_shuffle(T data, ushort simd_lane_id)
macOS: Since Metal 2.0. iOS: Since Metal 2.2.
Returns data from the thread whose SIMD lane ID is simd_lane_id. The simd_lane_id must be a valid SIMD lane ID but doesn't have to be the same for all threads in the SIMD-group.

T simd_shuffle_and_fill_up(T data, T filling_data, ushort delta, ushort modulo)
All OS: Since Metal 2.4.
Returns data or filling_data for each vector from the thread whose SIMD lane ID is the difference of the caller's SIMD lane ID minus delta. If the difference is negative, the operation copies values from the upper delta lanes of filling_data to the lower delta lanes of data. The value of delta must be the same for all threads in a SIMD-group. The modulo parameter defines the vector width that splits the SIMD-group into separate vectors.

T simd_shuffle_rotate_up(T data, ushort delta)
macOS: Since Metal 2.1. iOS: Since Metal 2.2.
Returns data from the thread whose SIMD lane ID is the difference of the caller's SIMD lane ID minus delta. The value of delta must be the same for all threads in a SIMD-group. This function wraps values around the SIMD-group.

T simd_max(T data)
macOS: Since Metal 2.1. iOS: Since Metal 2.3.
Returns data with the highest value from across all active threads in the SIMD-group and broadcasts that value to all active threads in the SIMD-group.

T simd_min(T data)
macOS: Since Metal 2.1. iOS: Since Metal 2.3.
Returns data with the lowest value from across all active threads in the SIMD-group and broadcasts that value to all active threads in the SIMD-group.

T simd_or(T data)
macOS: Since Metal 2.1. iOS: Since Metal 2.3.
Returns the bitwise OR (|) of data across all active threads in the SIMD-group and broadcasts the result to all active threads in the SIMD-group.

T simd_prefix_exclusive_sum(T data)
macOS: Since Metal 2.1. iOS: Since Metal 2.3.
For a given thread, returns the sum of the input values in data for all active threads with a lower index in the SIMD-group. For the first thread in the group, this function returns T(0).

T simd_prefix_inclusive_product(T data)
macOS: Since Metal 2.1. iOS: Since Metal 2.3.
For a given thread, returns the product of the input values in data for all active threads with a lower or the same index in the SIMD-group.

T simd_product(T data)
macOS: Since Metal 2.1. iOS: Since Metal 2.3.
Returns the product of the input values in data across all active threads in the SIMD-group and broadcasts the result to all active threads in the SIMD-group.

T simd_sum(T data)
macOS: Since Metal 2.1. iOS: Since Metal 2.3.
Returns the sum of the input values in data across all active threads in the SIMD-group and broadcasts the result to all active threads in the SIMD-group.

T simd_xor(T data)
macOS: Since Metal 2.1. iOS: Since Metal 2.3.
Returns the bitwise XOR (^) of data across all active threads in the SIMD-group and broadcasts the result to all active threads in the SIMD-group.
6.9.2.1 Examples
To demonstrate the shuffle functions, start with this SIMD-group’s initial state:
SIMD Lane ID 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
data a b c d e f g h i j k l m n o p
The simd_shuffle_up() function shifts each SIMD-group upward by delta threads. For
example, with a delta value of 2, the function:
• Shifts the SIMD lane IDs down by two
• Marks the lower two lanes as invalid
Computed
SIMD Lane ID -2 -1 0 1 2 3 4 5 6 7 8 9 10 11 12 13
valid 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1
data a b a b c d e f g h i j k l m n
The simd_shuffle_up() function is a non-wrapping operation that doesn’t affect the lower
delta lanes.
Similarly, the simd_shuffle_down() function shifts each SIMD-group downward by delta threads. Starting with the same initial SIMD-group state, with a delta value of 2, the function:
• Shifts the SIMD lane IDs up by two
• Marks the upper two lanes as invalid
Computed
SIMD Lane ID 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
valid 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0
data c d e f g h i j k l m n o p o p
The simd_shuffle_down() function is a non-wrapping operation that doesn’t affect the upper
delta lanes.
To demonstrate the shuffle-and-fill functions, start with this SIMD-group's initial state:
SIMD Lane ID 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
data a b c d e f g h s t u v w x y z
filling fa fb fc fd fe ff fg fh fs ft fu fv fw fx fy fz
The simd_shuffle_and_fill_up() function with a delta value of 2 shifts the SIMD lane IDs down by two and fills the lower two lanes of data from the upper two lanes of filling:
Computed
SIMD Lane ID -2 -1 0 1 2 3 4 5 6 7 8 9 10 11 12 13
data fy fz a b c d e f g h s t u v w x
The simd_shuffle_and_fill_up() function with the modulo parameter splits the SIMD-
group into vectors, each with size modulo, and shifts each vector by the delta threads. For
example, with a modulo value of 8 and a delta value of 2, the function:
• Shifts the SIMD lane IDs down by two
• Assigns the upper two lanes of each vector in filling to the lower two lanes of each
vector in data
Computed
SIMD Lane ID -2 -1 0 1 2 3 4 5 -2 -1 0 1 2 3 4 5
data fg fh a b c d e f fy fz s t u v w x
The simd_shuffle_and_fill_down() function with a delta value of 2 shifts the SIMD lane IDs up by two and fills the upper two lanes of data from the lower two lanes of filling:
Computed
SIMD Lane ID 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
data c d e f g h s t u v w x y z fa fb
The simd_shuffle_and_fill_down() function with the modulo parameter splits the SIMD-
group into vectors, each with size modulo and shifts each vector by the delta threads. For
example, with a modulo value of 8 and a delta value of 2, the function:
• Shifts the SIMD lane IDs up by two
• Assigns the lower two lanes of each vector in filling to the upper two lanes of each
vector in data
Below is an example of how to use these SIMD functions to perform a reduction operation:
kernel void
reduce(const device int *input [[buffer(0)]],
device int *output [[buffer(1)]],
threadgroup int *ldata [[threadgroup(0)]],
uint gid [[thread_position_in_grid]],
uint lid [[thread_position_in_threadgroup]],
uint lsize [[threads_per_threadgroup]],
uint simd_size [[threads_per_simdgroup]],
uint simd_lane_id [[thread_index_in_simdgroup]],
uint simd_group_id [[simdgroup_index_in_threadgroup]])
{
// Perform the first level of reduction.
// Read from device memory, write to threadgroup memory.
int val = input[gid] + input[gid + lsize];
for (uint s=lsize/simd_size; s>simd_size; s/=simd_size)
{
// Perform per-SIMD partial reduction.
for (uint offset=simd_size/2; offset>0; offset/=2)
val += simd_shuffle_down(val, offset);
// Write per-SIMD partial reduction value to threadgroup memory.
if (simd_lane_id == 0)
ldata[simd_group_id] = val;
// Wait for all partial reductions to complete.
threadgroup_barrier(mem_flags::mem_threadgroup);
class simd_vote {
public:
explicit constexpr simd_vote(vote_t v = 0);
explicit constexpr operator vote_t() const;
private:
// bit i in v represents the 'vote' for the thread in the SIMD-group
// at index i
uint64_t v;
};
Note that simd_all(expr) is different from simd_ballot(expr).all():
• simd_all(expr) returns true if all active threads evaluate expr to true.
• simd_ballot(expr).all() returns true if all threads were active and evaluated the expr to true. (simd_vote::all() does not look at which threads are active.)
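For example (a short sketch; pred stands for a hypothetical per-thread condition):
bool pred = /* per-thread condition */ true;
bool all_active_pass = simd_all(pred);   // considers only the active threads
simd_vote ballot = simd_ballot(pred);
bool every_lane_pass = ballot.all();     // also requires every lane to have been active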
T quad_broadcast_first(T data)
macOS: Since Metal 2.1. iOS: Since Metal 2.2.
Broadcasts data from the first active thread (the active thread with the smallest index) in the quad-group to all active threads.

T quad_shuffle(T data, ushort quad_lane_id)
macOS: Since Metal 2.0. iOS: Since Metal 2.0.
Returns data from the thread whose quad lane ID is quad_lane_id. The value for quad_lane_id must be a valid lane ID and may differ from other threads in the quad-group.

T quad_shuffle_and_fill_down(T data, T filling_data, ushort delta)
All OS: Since Metal 2.4.
Returns data or filling_data from the thread whose quad lane ID is the sum of the caller's quad lane ID and delta. If the sum is greater than the quad-group size, the function copies values from the lower delta lanes of filling_data into the upper delta lanes of data. The value for delta must be the same for all threads in a quad-group.

T quad_shuffle_and_fill_down(T data, T filling_data, ushort delta, ushort modulo)
All OS: Since Metal 2.4.
Returns data or filling_data for each vector, from the thread whose quad lane ID is the sum of the caller's quad lane ID and delta. If the sum is greater than the quad-group size, the function copies values from the lower delta lanes of filling_data into the upper delta lanes of data. The value of delta must be the same for all threads in a quad-group. The modulo parameter defines the vector width that splits the quad-group into separate vectors.

T quad_shuffle_and_fill_up(T data, T filling_data, ushort delta, ushort modulo)
All OS: Since Metal 2.4.
Returns data or filling_data for each vector from the thread whose quad lane ID is the difference of the caller's quad lane ID minus delta. If the difference is negative, the operation copies values from the upper delta lanes of filling_data to the lower delta lanes of data. The value of delta must be the same for all threads in a quad-group. The modulo parameter defines the width that splits the quad-group into separate vectors.

T quad_shuffle_down(T data, ushort delta)
macOS: Since Metal 2.0. iOS: Since Metal 2.0.
Returns data from the thread whose quad lane ID is the sum of the caller's quad lane ID and delta. The value for delta must be the same for all threads in a quad-group. The function doesn't modify the upper delta lanes of data because it doesn't wrap values around the quad-group.

T quad_shuffle_rotate_up(T data, ushort delta)
macOS: Since Metal 2.1. iOS: Since Metal 2.2.
Returns data from the thread whose quad lane ID is the difference of the caller's quad lane ID minus delta. The value for delta must be the same for all threads in a quad-group. This function wraps values around the quad-group.

T quad_shuffle_xor(T value, ushort mask)
macOS: Since Metal 2.0. iOS: Since Metal 2.0.
Returns data from the thread whose quad lane ID is a bitwise XOR (^) of the caller's quad lane ID and mask. The value of mask must be the same for all threads in a quad-group.

quad_vote quad_active_threads_mask()
macOS: Since Metal 2.1. iOS: Since Metal 2.3.
Returns a quad_vote mask that represents the active threads. The function is equivalent to quad_ballot(true) and sets the bits that represent active threads to 1 and inactive threads to 0.

T quad_and(T data)
macOS: Since Metal 2.1. iOS: Since Metal 2.3.
Returns the bitwise AND (&) of data across all active threads in the quad-group and broadcasts the result to all active threads in the quad-group.

bool quad_is_helper_thread()
macOS: Since Metal 2.1. iOS: Since Metal 2.3.
Returns true if the current thread is a helper thread; otherwise, false. You must call this function from a fragment function or another function that your fragment function calls; otherwise, it may trigger a compile-time error.

T quad_max(T data)
macOS: Since Metal 2.1. iOS: Since Metal 2.3.
Returns data with the highest value from across all active threads in the quad-group and broadcasts that value to all active threads in the quad-group.

T quad_min(T data)
macOS: Since Metal 2.1. iOS: Since Metal 2.3.
Returns data with the lowest value from across all active threads in the quad-group and broadcasts that value to all active threads in the quad-group.

T quad_prefix_exclusive_product(T data)
macOS: Since Metal 2.1. iOS: Since Metal 2.3.
For a given thread, returns the product of the input values in data for all active threads with a lower index in the quad-group. For the first thread in the group, this function returns T(1).

T quad_prefix_exclusive_sum(T data)
macOS: Since Metal 2.1. iOS: Since Metal 2.3.
For a given thread, returns the sum of the input values in data for all active threads with a lower index in the quad-group. For the first thread in the group, this function returns T(0).

T quad_prefix_inclusive_product(T data)
macOS: Since Metal 2.1. iOS: Since Metal 2.3.
For a given thread, returns the product of the input values in data for all active threads with a lower or the same index in the quad-group.

T quad_product(T data)
macOS: Since Metal 2.1. iOS: Since Metal 2.3.
Returns the product of the input values in data across all active threads in the quad-group and broadcasts the result to all active threads in the quad-group.

T quad_xor(T data)
macOS: Since Metal 2.1. iOS: Since Metal 2.3.
Returns the bitwise XOR (^) of data across all active threads in the quad-group and broadcasts the result to all active threads in the quad-group.
In a kernel function, quads divide across the SIMD-group. In a fragment function, the lane ID
represents the fragment location in a 2 x 2 quad:
• Lane ID 0 is the upper-left pixel
• Lane ID 1 is the upper-right pixel
• Lane ID 2 is the lower-left pixel
• Lane ID 3 is the lower-right pixel
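Because of this fixed 2 x 2 layout, a fragment function can exchange values with its horizontal and vertical neighbors by using quad_shuffle_xor. The following is a minimal sketch (not from this specification) that forms simple neighbor differences:
fragment float4
quad_difference(float4 pos [[position]])
{
    float v = pos.x * pos.y;               // any per-fragment value
    float dx = quad_shuffle_xor(v, 1) - v; // horizontal neighbor (lane ^ 1)
    float dy = quad_shuffle_xor(v, 2) - v; // vertical neighbor (lane ^ 2)
    return float4(dx, dy, 0.0f, 1.0f);
}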
To demonstrate the shuffle functions, start with this quad-group’s initial state:
Quad Lane ID 0 1 2 3
data a b c d
The quad_shuffle_up() function shifts each quad-group upward by delta threads. For
example, with a delta value of 2, the function:
• Shifts the quad lane IDs down by two
• Marks the lower two lanes as invalid
Computed
Quad Lane ID -2 -1 0 1
valid 0 0 1 1
data a b a b
The quad_shuffle_up() function is a non-wrapping operation that doesn’t affect the lower
delta lanes.
Similarly, the quad_shuffle_down() function shifts each quad-group downward by delta threads. Starting with the same initial quad-group state, with a delta value of 2, the function:
• Shifts the quad lane IDs up by two
• Marks the upper two lanes as invalid
Computed
Quad Lane ID 2 3 4 5
valid 1 1 0 0
data c d c d
The quad_shuffle_down() function is a non-wrapping operation that doesn’t affect the upper
delta lanes.
To demonstrate the shuffle-and-fill functions, start with this quad-group's initial state:
Quad Lane ID 0 1 2 3
data a b c d
filling fa fb fc fd
The quad_shuffle_and_fill_up() function with a delta value of 2 shifts the quad lane IDs down by two and fills the lower two lanes of data from the upper two lanes of filling:
Computed
Quad Lane ID -2 -1 0 1
data fc fd a b
The quad_shuffle_and_fill_up() function with the modulo parameter splits the quad-
group into vectors, each with size modulo and shifts each vector by the delta threads. For
example, with a modulo value of 2 and a delta value of 1, the function:
• Shifts the quad lane IDs down by one
• Assigns the upper lane of each vector in filling to the lower lane of each vector in data
Computed
Quad Lane ID -1 0 -1 0
data fb a fd c
The quad_shuffle_and_fill_down() function with a delta value of 2 shifts the quad lane IDs up by two and fills the upper two lanes of data from the lower two lanes of filling:
Computed
Quad Lane ID 2 3 4 5
data c d fa fb
The quad_shuffle_and_fill_down() function with the modulo parameter splits the quad-group into vectors, each with size modulo, and shifts each vector by the delta threads. For example, with a modulo value of 2 and a delta value of 1, the function:
• Shifts the quad lane IDs up by one
• Assigns the lower lane of each vector in filling to the upper lane of each vector in data
Computed
Quad Lane ID 1 2 1 2
data b fa d fc
The quad_ballot function uses the quad_vote wrapper type, which can be explicitly cast to
its underlying type. (In the following example, note use of vote_t to represent an underlying
type, XXX.)
class quad_vote {
public:
typedef XXX vote_t;
explicit constexpr quad_vote(vote_t v = 0);
explicit constexpr operator vote_t() const;
…
};
The quad_vote constructor masks out the top bits (that is, other than the four bottom bits).
Hence, the non-bottom-four bits are guaranteed to be unset when cast to vote_t.
If you have customized sample positions (set with the setSamplePositions:count: method
of MTLRenderPassDescriptor), get_sample_position(index) returns the position
programmed for the specified index.
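For example, a fragment function can query the position of its current sample (a minimal sketch using the [[sample_id]] fragment input):
fragment float4
visualize_sample_position(uint sid [[sample_id]])
{
    // Position of the current sample within the pixel, in the range [0, 1).
    float2 pos = get_sample_position(sid);
    return float4(pos, 0.0f, 1.0f);
}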
Table 6.19. Fragment flow control function in the Metal standard library
void discard_fragment(void)
Marks the current fragment as terminated and discards this fragment's output of the fragment function.
Writes to a buffer or texture from a fragment thread made before calling discard_fragment
are not discarded.
Multiple fragment threads or helper threads associated with a fragment thread execute together to compute derivatives. Prior to Metal 2.3, if any (but not all) of these threads executes the discard_fragment function, the thread is terminated and the behavior of any derivative computations (explicit or implicit) is undefined. Since Metal 2.3, discard_fragment marks the fragment as terminated while it continues to execute in parallel, which makes derivative computations well defined. Even though execution continues, the write behavior remains the same as before: the fragment discards its output and discards all writes to buffers or textures that occur after the call to discard_fragment.
The texture member functions, defined in the header <metal_texture>, listed in this section
and its subsections for different texture types include:
• sample - sample from a texture,
• sample_compare - sample compare from a texture,
• gather - gather from a texture,
• gather_compare - gather compare from a texture,
• read - sampler-less read from a texture,
• write - write to a texture,
• texture query (such as get_width, get_height, get_num_mip_levels,
get_array_size), and
• texture fence.
The following texture member functions are available to support sparse textures:
macOS: sparse texture functions supported since Metal 2.3.
iOS: sparse texture functions supported since Metal 2.2.
The following functions can be used to return the LOD (mip level) computation result for a simulated texture fetch:
macOS: LOD computation functions supported since Metal 2.2.
iOS: LOD computation functions supported since Metal 2.3.
calculate_unclamped_lod - calculates the level of detail that would be sampled for the
given coordinates, ignoring any sampler parameter. The fractional part of this value contains
the mip level blending weights, even if the sampler indicates a nearest mip selection.
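For example, the result of a sparse sample can be checked for residency before its value is used. The following is a minimal sketch (assuming the sparse_color resident() and value() accessors; the fallback color is arbitrary):
float4 sample_or_fallback(texture2d<float> tex, sampler s, float2 uv)
{
    sparse_color<float4> c = tex.sparse_sample(s, uv);
    // Use the sampled value only if the accessed tile is mapped (resident).
    return c.resident() ? c.value() : float4(0.0f);
}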
The following member functions can write to a 1D texture. Since mipmaps are not supported for
1D textures, lod must be 0.
void write(Tv color, uint coord, uint lod = 0)
void write(Tv color, ushort coord,
ushort lod = 0) // All OS: Since Metal 1.2.
The following member functions query a 1D texture. Since mipmaps are not supported for 1D textures, get_num_mip_levels() always returns 0, and lod must be 0 for get_width():
uint get_width(uint lod = 0) const
uint get_num_mip_levels() const
On iOS since Metal 2.2, and macOS since Metal 2.3, the following member function can sample
from a sparse 1D texture.
sparse_color<Tv> sparse_sample(sampler s, float coord) const
On iOS since Metal 2.2, and macOS since Metal 2.3, the following member functions can
perform sampler-less reads from a sparse 1D texture. Since mipmaps are not supported for 1D
textures, lod must be 0.
sparse_color<Tv> sparse_read(ushort coord, ushort lod = 0) const
sparse_color<Tv> sparse_read(uint coord, uint lod = 0) const
The following member functions can perform sampler-less reads from a 1D texture array. Since
mipmaps are not supported for 1D textures, lod must be 0.
Tv read(uint coord, uint array, uint lod = 0) const
The following member functions can write to a 1D texture array. Since mipmaps are not
supported for 1D textures, lod must be 0.
void write(Tv color, uint coord, uint array, uint lod = 0)
void write(Tv color, ushort coord, ushort array,
ushort lod = 0) // All OS: Since Metal 1.2.
The following member functions query a 1D texture array. Since mipmaps are not supported for 1D textures, get_num_mip_levels() always returns 0, and lod must be 0 for get_width().
uint get_width(uint lod = 0) const
uint get_array_size() const
uint get_num_mip_levels() const
On iOS since Metal 2.2, and macOS since Metal 2.3, the following member functions can
sample from a sparse 1D texture array.
sparse_color<Tv> sparse_sample(sampler s, float coord, uint array) const
On iOS since Metal 2.2, and macOS since Metal 2.3, the following member functions can
perform sampler-less reads from a sparse 1D texture array. Since mipmaps are not supported
for 1D textures, lod must be 0.
sparse_color<Tv> sparse_read(ushort coord, ushort array,
                             ushort lod = 0) const
sparse_color<Tv> sparse_read(uint coord, uint array, uint lod = 0) const
6.12.3 2D Texture
For the functions in this section, the following data types and corresponding constructor
functions can specify sampling options (lod_options):
bias(float value)
level(float lod)
gradient2d(float2 dPdx, float2 dPdy)
min_lod_clamp(float lod) // All OS: support since Metal 2.2
The following member functions can perform sampler-less reads from a 2D texture:
Tv read(uint2 coord, uint lod = 0) const
Tv read(ushort2 coord, ushort lod = 0) const // All OS: Since Metal 1.2.
The following member functions can write to a 2D texture. On macOS up to Metal 2.2, lod must
be 0. In Metal 2.3, this restriction is lifted for some hardware.
void write(Tv color, uint2 coord, uint lod = 0)
void write(Tv color, ushort2 coord,
ushort lod = 0) // All OS: Since Metal 1.2.
The following member functions can gather four samples for bilinear interpolation when
sampling a 2D texture:
enum class component {x, y, z, w};
Tv gather(sampler s, float2 coord, int2 offset = int2(0),
component c = component::x) const
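For example (a short sketch; tex, s, and uv are hypothetical values in scope):
// Gather the green (y) component of the four texels in the bilinear footprint.
float4 greens = tex.gather(s, uv, int2(0), component::y);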
On iOS since Metal 2.2, and macOS since Metal 2.3, the following member functions sample from a sparse 2D texture.
sparse_color<Tv> sparse_sample(sampler s, float2 coord,
                               int2 offset = int2(0)) const
sparse_color<Tv> sparse_sample(sampler s, float2 coord, bias options,
                               int2 offset = int2(0)) const
sparse_color<Tv> sparse_sample(sampler s, float2 coord, level options,
                               int2 offset = int2(0)) const
On iOS since Metal 2.2, and macOS since Metal 2.3, the following member functions perform sampler-less reads from a sparse 2D texture.
sparse_color<Tv> sparse_read(ushort2 coord, ushort lod = 0) const
sparse_color<Tv> sparse_read(uint2 coord, uint lod = 0) const
On iOS since Metal 2.2, and macOS since Metal 2.3, the following member function gathers four samples for bilinear interpolation from a sparse 2D texture.
sparse_color<Tv> sparse_gather(sampler s, float2 coord, int2 offset = int2(0),
                               component c = component::x) const
On iOS since Metal 2.3, and macOS since Metal 2.2, the following member functions simulate a
texture fetch and return the LOD (mip level) computation result.
float calculate_clamped_lod(sampler s, float2 coord);
float calculate_unclamped_lod(sampler s, float2 coord);
// No optional arguments.
The following member functions perform sampler-less reads from a 2D texture array:
Tv read(uint2 coord, uint array, uint lod = 0) const
Tv read(ushort2 coord, ushort array,
ushort lod = 0) const // All OS: Since Metal 1.2.
The following member functions write to a 2D texture array. On macOS up to Metal 2.2, lod must be 0. In Metal 2.3, this restriction is lifted for some hardware.
void write(Tv color, uint2 coord, uint array, uint lod = 0)
void write(Tv color, ushort2 coord, ushort array,
           ushort lod = 0) // All OS: Since Metal 1.2.
The following member functions gather four samples for bilinear interpolation when sampling a
2D texture array:
Tv gather(sampler s, float2 coord, uint array, int2 offset = int2(0),
component c = component::x) const
On iOS since Metal 2.2, and macOS since Metal 2.3, the following member functions sample from a sparse 2D texture array.
sparse_color<Tv> sparse_sample(sampler s, float2 coord, uint array,
                               int2 offset = int2(0)) const
sparse_color<Tv> sparse_sample(sampler s, float2 coord, uint array,
                               bias options, int2 offset = int2(0)) const
sparse_color<Tv> sparse_sample(sampler s, float2 coord, uint array,
                               level options, int2 offset = int2(0)) const
sparse_color<Tv> sparse_sample(sampler s, float2 coord, uint array,
                               min_lod_clamp min_lod_clamp_options,
                               int2 offset = int2(0)) const
sparse_color<Tv> sparse_sample(sampler s, float2 coord, uint array,
                               bias bias_options,
                               min_lod_clamp min_lod_clamp_options,
                               int2 offset = int2(0)) const
sparse_color<Tv> sparse_sample(sampler s, float2 coord, uint array,
                               gradient2d options,
                               int2 offset = int2(0)) const
sparse_color<Tv> sparse_sample(sampler s, float2 coord, uint array,
                               gradient2d grad_options,
                               min_lod_clamp min_lod_clamp_options,
                               int2 offset = int2(0)) const
On iOS since Metal 2.2, and macOS since Metal 2.3, the following member function gathers four samples for bilinear interpolation from a sparse 2D texture array.
sparse_color<Tv> sparse_gather(sampler s, float2 coord, uint array,
                               int2 offset = int2(0),
                               component c = component::x) const
On iOS since Metal 2.3, and macOS since Metal 2.2, the following member functions simulate a
texture fetch and return the LOD (mip level) computation result.
float calculate_clamped_lod(sampler s, float2 coord);
float calculate_unclamped_lod(sampler s, float2 coord);
6.12.5 3D Texture
For the functions in this section, the following data types and corresponding constructor
functions can specify sampling options (lod_options):
bias(float value)
level(float lod)
gradient3d(float3 dPdx, float3 dPdy)
min_lod_clamp(float lod) // All OS: support since Metal 2.2
The following member functions can perform sampler-less reads from a 3D texture:
Tv read(uint3 coord, uint lod = 0) const
Tv read(ushort3 coord, ushort lod = 0) const // All OS: Since Metal 1.2.
On iOS since Metal 2.2, and macOS since Metal 2.3, the following member functions sample from a sparse 3D texture.
sparse_color<Tv> sparse_sample(sampler s, float3 coord,
                               int3 offset = int3(0)) const
sparse_color<Tv> sparse_sample(sampler s, float3 coord, bias options,
                               int3 offset = int3(0)) const
sparse_color<Tv> sparse_sample(sampler s, float3 coord, level options,
                               int3 offset = int3(0)) const
sparse_color<Tv> sparse_sample(sampler s, float3 coord,
                               min_lod_clamp min_lod_clamp_options,
                               int3 offset = int3(0)) const
sparse_color<Tv> sparse_sample(sampler s, float3 coord, bias bias_options,
                               min_lod_clamp min_lod_clamp_options,
                               int3 offset = int3(0)) const
sparse_color<Tv> sparse_sample(sampler s, float3 coord,
                               gradient3d grad_options,
                               int3 offset = int3(0)) const
sparse_color<Tv> sparse_sample(sampler s, float3 coord,
                               gradient3d grad_options,
                               min_lod_clamp min_lod_clamp_options,
                               int3 offset = int3(0)) const
On iOS since Metal 2.2, and macOS since Metal 2.3, the following member functions perform sampler-less reads from a sparse 3D texture.
sparse_color<Tv> sparse_read(ushort3 coord, ushort lod = 0) const
sparse_color<Tv> sparse_read(uint3 coord, uint lod = 0) const
Table 6.20 describes the cube face and the number used to identify the face.
0 Positive X
1 Negative X
2 Positive Y
3 Negative Y
4 Positive Z
5 Negative Z
The following member function can gather four samples for bilinear interpolation when
sampling a cube texture:
Tv gather(sampler s, float3 coord, component c = component::x) const
The following member functions write to a cube texture. On macOS up to Metal 2.2, lod must be 0. In Metal 2.3, this restriction is lifted for some hardware.
void write(Tv color, uint2 coord, uint face, uint lod = 0)
void write(Tv color, ushort2 coord, ushort face,
           ushort lod = 0) // All OS: Since Metal 1.2.
On iOS since Metal 2.2, and macOS since Metal 2.3, the following member functions sample from a sparse cube texture.
sparse_color<Tv> sparse_sample(sampler s, float3 coord) const
sparse_color<Tv> sparse_sample(sampler s, float3 coord, bias options) const
sparse_color<Tv> sparse_sample(sampler s, float3 coord, level options) const
sparse_color<Tv> sparse_sample(sampler s, float3 coord,
                               min_lod_clamp min_lod_clamp_options) const
sparse_color<Tv> sparse_sample(sampler s, float3 coord, bias bias_options,
                               min_lod_clamp min_lod_clamp_options) const
sparse_color<Tv> sparse_sample(sampler s, float3 coord,
                               gradientcube grad_options) const
sparse_color<Tv> sparse_sample(sampler s, float3 coord,
                               gradientcube grad_options,
                               min_lod_clamp min_lod_clamp_options) const
On iOS since Metal 2.2, and macOS since Metal 2.3, the following member functions perform sampler-less reads from a sparse cube texture.
sparse_color<Tv> sparse_read(ushort3 coord, ushort lod = 0) const
On iOS since Metal 2.2, and macOS since Metal 2.3, the following member function gathers four samples for bilinear interpolation from a sparse cube texture.
sparse_color<Tv> sparse_gather(sampler s, float3 coord,
                               component c = component::x) const
On iOS since Metal 2.3, and macOS since Metal 2.2, the following member functions simulate a
texture fetch and return the LOD (mip level) computation result.
float calculate_clamped_lod(sampler s, float3 coord);
float calculate_unclamped_lod(sampler s, float3 coord);
The following member functions can sample from a cube texture array:
Tv sample(sampler s, float3 coord, uint array) const
Tv sample(sampler s, float3 coord, uint array, lod_options options) const
Tv sample(sampler s, float3 coord, uint array, bias bias_options,
min_lod_clamp min_lod_clamp_options) const
Tv sample(sampler s, float3 coord, uint array, gradientcube grad_options,
min_lod_clamp min_lod_clamp_options) const
The following member function can gather four samples for bilinear interpolation when
sampling a cube texture array:
Tv gather(sampler s, float3 coord, uint array,
component c = component::x) const
The following member functions perform sampler-less reads from a cube texture array:
Tv read(uint2 coord, uint face, uint array, uint lod = 0) const
Tv read(ushort2 coord, ushort face, ushort array,
ushort lod = 0) const // All OS: Since Metal 1.2.
On iOS since Metal 2.2, and macOS since Metal 2.3, the following member functions sample from a sparse cube texture array.
sparse_color<Tv> sparse_sample(sampler s, float3 coord, uint array) const
sparse_color<Tv> sparse_sample(sampler s, float3 coord, uint array,
                               bias options) const
sparse_color<Tv> sparse_sample(sampler s, float3 coord, uint array,
                               level options) const
sparse_color<Tv> sparse_sample(sampler s, float3 coord, uint array,
                               min_lod_clamp min_lod_clamp_options) const
sparse_color<Tv> sparse_sample(sampler s, float3 coord, uint array,
                               bias bias_options,
                               min_lod_clamp min_lod_clamp_options) const
sparse_color<Tv> sparse_sample(sampler s, float3 coord, uint array,
                               gradientcube options) const
sparse_color<Tv> sparse_sample(sampler s, float3 coord, uint array,
                               gradientcube grad_options,
                               min_lod_clamp min_lod_clamp_options) const
On iOS since Metal 2.2, and macOS since Metal 2.3, the following member functions perform sampler-less reads from a sparse cube texture array.
sparse_color<Tv> sparse_read(ushort2 coord, ushort face, ushort array,
                             ushort lod = 0) const
sparse_color<Tv> sparse_read(uint2 coord, uint face, uint array,
                             uint lod = 0) const
On iOS since Metal 2.3, and macOS since Metal 2.2, the following member functions simulate a
texture fetch and return the LOD (mip level) computation result.
float calculate_clamped_lod(sampler s, float3 coord);
float calculate_unclamped_lod(sampler s, float3 coord);
On iOS since Metal 2.2, and macOS since Metal 2.3, the following member functions perform
reads from a sparse 2D multisampled texture.
sparse_color<Tv> sparse_read(ushort2 coord, ushort sample) const
sparse_color<Tv> sparse_read(uint2 coord, uint sample) const
On iOS since Metal 2.2, and macOS since Metal 2.3, the following member functions perform reads from a sparse 2D multisampled texture array.
sparse_color<Tv> sparse_read(ushort2 coord, ushort array,
                             ushort sample) const
sparse_color<Tv> sparse_read(uint2 coord, uint array, uint sample) const
The following member functions sample from a 2D depth texture and compare a single
component against the specified comparison value:
T sample_compare(sampler s, float2 coord, float compare_value,
int2 offset = int2(0)) const
The following member functions perform sampler-less reads from a 2D depth texture:
T read(uint2 coord, uint lod = 0) const
T read(ushort2 coord, ushort lod = 0) const // All OS: Since Metal 1.2.
The following built-in functions gather four samples for bilinear interpolation when sampling a
2D depth texture:
Tv gather(sampler s, float2 coord, int2 offset = int2(0)) const
The following member functions gather four samples for bilinear interpolation when sampling a
2D depth texture and comparing these samples with a specified comparison value (1.0 if the
comparison passes and 0.0 if it fails).
Tv gather_compare(sampler s, float2 coord, float compare_value,
int2 offset = int2(0)) const
T must be a float type.
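For example, a typical shadow-map comparison looks like the following minimal sketch (shadow_map, shadow_sampler, and light_space_pos are hypothetical, and the sampler is assumed to have a compare function set):
float shadow_factor(depth2d<float> shadow_map,
                    sampler shadow_sampler,
                    float3 light_space_pos)
{
    // Returns 1.0 where the comparison passes and 0.0 where it fails
    // (filtered across the footprint when the sampler uses linear filtering).
    return shadow_map.sample_compare(shadow_sampler,
                                     light_space_pos.xy,
                                     light_space_pos.z);
}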
On iOS since Metal 2.2, and macOS since Metal 2.3, the following member functions sample from a sparse 2D depth texture.
sparse_color<T> sparse_sample(sampler s, float2 coord,
                              int2 offset = int2(0)) const
sparse_color<T> sparse_sample(sampler s, float2 coord, bias options,
                              int2 offset = int2(0)) const
On iOS since Metal 2.2, and macOS since Metal 2.3, the following member functions sample from a sparse 2D depth texture and compare a single component against the specified comparison value.
sparse_color<T> sparse_sample_compare(sampler s, float2 coord,
                                      float compare_value,
                                      int2 offset = int2(0)) const
On iOS since Metal 2.2, and macOS since Metal 2.3, the following member functions perform sampler-less reads from a sparse 2D depth texture.
sparse_color<T> sparse_read(ushort2 coord, ushort lod = 0) const
sparse_color<T> sparse_read(uint2 coord, uint lod = 0) const
On iOS since Metal 2.2, and macOS since Metal 2.3, the following member functions gather four samples for bilinear interpolation from a sparse 2D depth texture, or can gather those samples and compare them against a specified comparison value.
sparse_color<Tv> sparse_gather(sampler s, float2 coord,
                               int2 offset = int2(0),
                               component c = component::x) const
sparse_color<Tv> sparse_gather_compare(sampler s, float2 coord,
                                       float compare_value,
                                       int2 offset = int2(0)) const
On iOS since Metal 2.3, and macOS since Metal 2.2, the following member functions simulate a
texture fetch and return the LOD (mip level) computation result.
float calculate_clamped_lod(sampler s, float2 coord);
float calculate_unclamped_lod(sampler s, float2 coord);
The following member functions sample from a 2D depth texture array and compare a single
component against the specified comparison value:
T sample_compare(sampler s, float2 coord, uint array, float compare_value,
int2 offset = int2(0)) const
T sample_compare(sampler s, float2 coord, uint array, float compare_value,
lod_options options, int2 offset =
int2(0)) const
T must be a float type. On macOS up to Metal 2.2, the supported lod_options values are
level and min_lod_clamp (the latter, since Metal 2.2); lod must be a zero constant. In Metal
2.3, this restriction for lod_options is lifted for some hardware.
The following member functions perform sampler-less reads from a 2D depth texture array:
T read(uint2 coord, uint array, uint lod = 0) const
T read(ushort2 coord, ushort array,
ushort lod = 0) const // All OS: Since Metal 1.2.
The following member function gathers four samples for bilinear interpolation when sampling a 2D depth texture array:
Tv gather(sampler s, float2 coord, uint array, int2 offset = int2(0)) const
The following member function gathers four samples for bilinear interpolation when sampling a 2D depth texture array and comparing these samples with a specified comparison value:
Tv gather_compare(sampler s, float2 coord, uint array, float compare_value,
                  int2 offset = int2(0)) const
T must be a float type.
On iOS since Metal 2.2, and macOS since Metal 2.3, the following member functions perform
reads from a sparse 2D multisampled depth texture.
sparse_color<T> sparse_read(ushort2 coord, ushort sample) const
sparse_color<T> sparse_read(uint2 coord, uint sample) const
The following member functions sample from a cube depth texture and compare a single
component against the specified comparison value:
T sample_compare(sampler s, float3 coord, float compare_value) const
T sample_compare(sampler s, float3 coord, float compare_value,
lod_options options) const
T must be a float type. On macOS up to Metal 2.2, the supported lod_options values are
level and min_lod_clamp (the latter, since Metal 2.2), and lod must be a zero constant. In
Metal 2.3, this restriction for lod_options is lifted for some hardware.
The following member functions perform sampler-less reads from a cube depth texture:
T read(uint2 coord, uint face, uint lod = 0) const
T read(ushort2 coord, ushort face,
ushort lod = 0) const // All OS: Since Metal 1.2.
The following member functions gather four samples for bilinear interpolation when sampling a
cube texture and comparing these samples with a specified comparison value:
Tv gather_compare(sampler s, float3 coord, float compare_value) const
T must be a float type.
On iOS since Metal 2.2, and macOS since Metal 2.3, the following member functions sample from a sparse cube depth texture.
sparse_color<T> sparse_sample(sampler s, float3 coord) const
sparse_color<T> sparse_sample(sampler s, float3 coord, bias options) const
sparse_color<T> sparse_sample(sampler s, float3 coord, level options) const
sparse_color<T> sparse_sample(sampler s, float3 coord,
                              min_lod_clamp min_lod_clamp_options) const
sparse_color<T> sparse_sample(sampler s, float3 coord, bias bias_options,
                              min_lod_clamp min_lod_clamp_options) const
sparse_color<T> sparse_sample(sampler s, float3 coord,
                              gradientcube grad_options) const
sparse_color<T> sparse_sample(sampler s, float3 coord,
                              gradientcube grad_options,
                              min_lod_clamp min_lod_clamp_options) const
On iOS since Metal 2.2, and macOS since Metal 2.3, the following member functions sample from a sparse cube depth texture and compare a single component against the specified comparison value.
sparse_color<T> sparse_sample_compare(sampler s, float3 coord,
                                      float compare_value) const
On iOS since Metal 2.2, and macOS since Metal 2.3, the following member functions perform reads from a sparse cube depth texture.
sparse_color<T> sparse_read(ushort2 coord, ushort face,
                            ushort lod = 0) const
sparse_color<T> sparse_read(uint2 coord, uint face, uint lod = 0) const
On iOS since Metal 2.2, and macOS since Metal 2.3, the following member functions gather four samples for bilinear interpolation from a sparse cube depth texture, or can gather those samples and compare them against a specified comparison value.
sparse_color<Tv> sparse_gather(sampler s, float3 coord) const
sparse_color<Tv> sparse_gather_compare(sampler s, float3 coord,
                                       float compare_value) const
On iOS since Metal 2.3, and macOS since Metal 2.2, the following member functions simulate a
texture fetch and return the LOD (mip level) computation result.
float calculate_clamped_lod(sampler s, float3 coord);
float calculate_unclamped_lod(sampler s, float3 coord);
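For example, a fragment function might query the LOD the hardware would select for a given
direction before deciding how to sample. The input structure and bindings below are illustrative
placeholders.
struct LodIn {
   float3 dir;
};

fragment float4
cube_lod_debug(LodIn in [[stage_in]],
               depthcube<float> shadow_map [[texture(0)]],
               sampler s [[sampler(0)]])
{
   // calculate_clamped_lod returns the mip level after applying the
   // sampler's LOD clamps; calculate_unclamped_lod ignores them.
   float clamped = shadow_map.calculate_clamped_lod(s, in.dir);
   float unclamped = shadow_map.calculate_unclamped_lod(s, in.dir);
   return float4(clamped, unclamped, 0.0f, 1.0f);
}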
The following member functions sample from a cube depth texture array:
T sample(sampler s, float3 coord, uint array) const
T sample(sampler s, float3 coord, uint array, lod_options options) const
T sample(sampler s, float3 coord, uint array, bias bias_options,
min_lod_clamp min_lod_clamp_options) const
T sample(sampler s, float3 coord, uint array, gradientcube grad_options,
min_lod_clamp min_lod_clamp_options) const
The following member functions sample from a cube depth texture array and compare a single
component against the specified comparison value:
T sample_compare(sampler s, float3 coord, uint array,
float compare_value) const
T sample_compare(sampler s, float3 coord, uint array,
float compare_value, lod_options
options) const
T must be a float type. On macOS up to Metal 2.2, the supported lod_options values are
level and min_lod_clamp (the latter, since Metal 2.2), and lod must be a zero constant. In
Metal 2.3, this restriction for lod_options is lifted for some hardware.
The following member functions perform sampler-less reads from a cube depth texture array:
T read(uint2 coord, uint face, uint array, uint lod = 0) const
T read(ushort2 coord, ushort face, ushort array,
ushort lod = 0) const // All OS: Since Metal 1.2.
The following member function gathers four samples for bilinear interpolation when sampling a
cube depth texture array:
Tv gather(sampler s, float3 coord, uint array) const
The following member function gathers four samples for bilinear interpolation when sampling a
cube depth texture array and compares these samples with a specified comparison value:
Tv gather_compare(sampler s, float3 coord, uint array,
float compare_value) const
On iOS since Metal 2.2, and macOS since Metal 2.3, the following member functions sample
from a sparse cube depth texture array.
sparse_color<T> sparse_sample(sampler s, float3 coord, uint array) const
sparse_color<T> sparse_sample(sampler s, float3 coord, uint array,
                              bias options) const
sparse_color<T> sparse_sample(sampler s, float3 coord, uint array,
                              level options) const
sparse_color<T> sparse_sample(sampler s, float3 coord, uint array,
                              min_lod_clamp min_lod_clamp_options) const
sparse_color<T> sparse_sample(sampler s, float3 coord, uint array,
                              bias bias_options,
                              min_lod_clamp min_lod_clamp_options) const
sparse_color<T> sparse_sample(sampler s, float3 coord, uint array,
                              gradientcube grad_options) const
sparse_color<T> sparse_sample(sampler s, float3 coord, uint array,
                              gradientcube grad_options,
                              min_lod_clamp min_lod_clamp_options) const
On iOS since Metal 2.2, and macOS since Metal 2.3, the following member functions sample
from a sparse cube depth texture array and compare a single component against the specified
comparison value.
sparse_color<T> sparse_sample_compare(sampler s, float3 coord, uint array,
On iOS since Metal 2.2, and macOS since Metal 2.3, the following member functions perform
reads from a sparse cube depth texture array.
sparse_color<T> sparse_read(ushort2 coord, ushort face, ushort array,
                            ushort lod = 0) const
sparse_color<T> sparse_read(uint2 coord, uint face, uint array,
                            uint lod = 0) const
On iOS since Metal 2.2, and macOS since Metal 2.3, the following member functions gather
four samples for bilinear interpolation from a sparse cube depth texture array, or can gather
those samples and compare them against a specified comparison value.
sparse_color<Tv> sparse_gather(sampler s, float3 coord, uint array) const
sparse_color<Tv> sparse_gather_compare(sampler s, float3 coord, uint array,
On iOS since Metal 2.3, and macOS since Metal 2.2, the following member functions simulate a
texture fetch and return the LOD (mip level) computation result.
float calculate_clamped_lod(sampler s, float3 coord);
float calculate_unclamped_lod(sampler s, float3 coord);
The following example uses the read method to access a texture buffer:
kernel void
myKernel(texture_buffer<float, access::read> myBuffer)
{
   uint index = …;
   float4 value = myBuffer.read(index);
}
Use the following method to query the number of elements in a texture buffer:
uint get_width() const;
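For example, a kernel that both reads and writes a texture buffer can use get_width() to guard
against out-of-range thread IDs. The binding indices below are placeholders.
kernel void
scale_elements(texture_buffer<float, access::read_write> buf [[texture(0)]],
               uint tid [[thread_position_in_grid]])
{
   // get_width() returns the number of elements in the texture buffer.
   if (tid < buf.get_width()) {
      float4 v = buf.read(tid);
      buf.write(v * 2.0f, tid);
   }
}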
The following example shows how to use a texture fence function to make sure that writes to a
texture by a thread are visible to later reads to the same location by the same thread:
kernel void
my_kernel(texture2d<float, access::read_write> texA,
          …,
          ushort2 gid [[thread_position_in_grid]])
{
   float4 clr = …;
   texA.write(clr, gid);
   …
   // Use a fence to ensure that writes by the thread become
   // visible to later reads by the same thread.
   texA.fence();
   float4 clr_new = texA.read(gid);
   …
}
Use the following member function to query the number of unique color entries for a specific
location given by an (x, y) coordinate inside the imageblock:
ushort get_num_colors(ushort2 coord) const;
The following member function returns the color coverage mask (that is, whether a given color
covers one or more samples in the imageblock). Each sample is identified by its bit position in
the return value. If a bit is set, the corresponding sample uses that color index.
ushort get_color_coverage_mask(ushort2 coord, ushort color_index) const;
color_index is a value from 0 to get_num_colors() - 1.
Use the following member function to read or write an imageblock at sample or color rate.
coord specifies the (x, y) coordinate inside the imageblock, and index is the sample or color
index.
enum class imageblock_data_rate { color, sample };
T read(ushort2 coord, ushort index, imageblock_data_rate data_rate) const;
void write(T data, ushort2 coord, ushort index, imageblock_data_rate
data_rate);
Example:
struct Foo {
float4 a [[color(0)]];
int4 b [[color(1)]];
};
kernel void
my_kernel(imageblock<Foo, imageblock_layout_implicit> img_blk,
          ushort2 lid [[thread_position_in_threadgroup]] …)
{
   …
   Foo f = img_blk.read(lid);
   float4 r = f.a;
Use the following member function to write an imageblock with a color coverage mask. You
must use this member function when writing to an imageblock at color rate:
void write(T data, ushort2 coord, ushort color_coverage_mask);
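For illustration, the following sketch reads each color entry of an imageblock at color rate,
modifies it, and writes it back with that color's coverage mask. It reuses the Foo structure from
the example above and should be read as a sketch rather than a canonical usage pattern.
kernel void
tint_colors(imageblock<Foo, imageblock_layout_implicit> img_blk,
            ushort2 lid [[thread_position_in_threadgroup]])
{
   ushort num_colors = img_blk.get_num_colors(lid);
   for (ushort c = 0; c < num_colors; ++c) {
      // Read the entry for color index c at color rate.
      Foo f = img_blk.read(lid, c, imageblock_data_rate::color);
      f.a *= 0.5f;
      // Write it back for exactly the samples that use this color index.
      ushort mask = img_blk.get_color_coverage_mask(lid, c);
      img_blk.write(f, lid, mask);
   }
}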
Use the following member functions to get a region of a slice for a given data member in the
imageblock. You use these functions to write data associated with a specific data member
described in the imageblock for all threads in the threadgroup to a specified region in a texture.
color_index refers to the data member declared in the structure type specified in
imageblock<T> with the [[color(n)]] attribute where n is color_index. size is the actual
size of the copied slice.
const imageblock_slice<E, imageblock_layout_implicit>
slice(ushort color_index) const;
const imageblock_slice<E, imageblock_layout_implicit>
slice(ushort color_index, ushort2 size) const;
The region to copy has an origin of (0,0). The slice(…) member function that does not have
the argument size copies the entire width and height of the imageblock.
kernel void
my_kernel(imageblock<Foo> img_blk,
          ushort2 lid [[thread_position_in_threadgroup]] …)
{
   …
   threadgroup_imageblock Foo* f = img_blk.data(lid);
   half4 r = f->a;
   f->a = r;
   …
}
Use the following write member function to write an imageblock with a color coverage mask.
You must use this member function when writing to an imageblock at color rate.
void write(T data, ushort2 coord, ushort color_coverage_mask);
Use the following slice member functions to get a region of a slice for a given data member in
the imageblock structure. You use this function to write data associated with a specific data
member described in the imageblock structure for all threads in the threadgroup to a specified
region in a texture.
data_member is a data member declared in the structure type specified in imageblock<T>.
size is the actual size of the copied slice.
const imageblock_slice<E, imageblock_layout_explicit>
slice(const threadgroup_imageblock E& data_member) const;
const imageblock_slice<E, imageblock_layout_explicit>
slice(const threadgroup_imageblock E& data_member, ushort2 size) const;
The region to copy has an origin of (0,0). The slice(…) member function that does not have
the argument size copies the entire width and height of the imageblock.
kernel void
my_kernel(texture2d<half> src [[ texture(0) ]],
          texture2d<half, access::write> dst [[ texture(1) ]],
          imageblock<Foo> img_blk,
          ushort2 lid [[ thread_position_in_threadgroup ]],
          ushort2 gid [[ thread_position_in_grid ]])
{
   // Read the pixel from the input image using the thread ID.
   half4 clr = src.read(gid);
   // A barrier to make sure all threads finish writing to the elements in the
   // imageblock.
   threadgroup_barrier(mem_flags::mem_threadgroup_imageblock);
M refers to the type of the other argument for arithmetic operations. For atomic integer
types, M is C.
All OS: Functions listed with names that end with _explicit (such as
atomic_store_explicit or atomic_load_explicit) are supported since Metal 1.0, unless
otherwise indicated.
iOS: Support for the atomic_store, atomic_load, atomic_exchange,
atomic_compare_exchange_weak, and atomic_fetch_key functions since Metal 2.0.
These operations are atomic read-modify-write operations. For signed integer types, the
arithmetic operation uses two’s complement representation with silent wrap-around on
overflow.
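For example, the following kernel performs an atomic read-modify-write on a device counter. The
bindings are placeholders; memory_order_relaxed is the memory ordering Metal supports for these
operations.
#include <metal_stdlib>
using namespace metal;

kernel void
count_positive(device atomic_uint *counter [[buffer(0)]],
               device const float *values [[buffer(1)]],
               uint tid [[thread_position_in_grid]])
{
   if (values[tid] > 0.0f) {
      // Atomically add 1; the returned previous value is discarded.
      atomic_fetch_add_explicit(counter, 1u, memory_order_relaxed);
   }
}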
render_command can encode any draw command type. The following public interface for
render_command is defined in the header <metal_command_buffer>. To pass
render_pipeline_state objects to your shader, use argument buffers. Within an argument
buffer, the pipeline state can be passed as scalars or in an array.
set_render_pipeline_state(…) and render pipeline states are available on macOS since
Metal 2.1 and on iOS since Metal 2.2.
enum class primitive_type { point, line, line_strip, triangle,
triangle_strip };
struct render_command {
public:
explicit render_command(command_buffer icb, unsigned cmd_index);
When accessing command_buffer, Metal does not check whether the access is within bounds.
If an access is beyond the capacity of the buffer, the behavior is undefined.
The exposed methods in render_command mirror the interface of
MTLIndirectRenderCommand and are similar to MTLRenderCommandEncoder. Notable
differences with MTLRenderCommandEncoder are:
" Calls to draw* methods in render_command encode the actions taken by the command.
If multiple calls are made, only the last one takes effect.
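As a sketch of how these pieces fit together, the following kernel encodes one draw per thread
into an indirect command buffer. The argument-buffer structure, buffer indices, and draw
parameters are illustrative assumptions, not part of the interface above.
struct PipelineContainer {
   render_pipeline_state pipeline;   // passed through an argument buffer
};

kernel void
encode_draws(command_buffer icb [[buffer(0)]],
             constant PipelineContainer &pipelines [[buffer(1)]],
             device float4 *vertices [[buffer(2)]],
             uint cmd_index [[thread_position_in_grid]])
{
   render_command cmd(icb, cmd_index);
   cmd.set_render_pipeline_state(pipelines.pipeline);
   cmd.set_vertex_buffer(vertices, 0);
   // Only the last draw_* call on a given command takes effect.
   cmd.draw_primitives(primitive_type::triangle, 0, 3, 1, 0);
}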
compute_command can encode any dispatch command type. The following public interface for
compute_command is defined in the header <metal_command_buffer>. The
compute_pipeline_state type represents compute pipeline states, which can only be passed
to shaders through argument buffers. Within an argument buffer, the pipeline state can be
passed as scalars or in an array.
struct compute_command {
public:
explicit compute_command(command_buffer icb, unsigned cmd_index);
When accessing command_buffer, Metal does not check whether the access is within bounds.
If an access is beyond the capacity of the buffer, the behavior is undefined.
The exposed methods in compute_command mirror the interface of
MTLIndirectComputeCommand and are similar to MTLComputeCommandEncoder.
In an ICB, dispatches are always concurrent. Calls to the concurrent_dispatch* methods in
compute_command encode the actions taken by the command. If multiple calls are made, only
the last one takes effect.
The application is responsible for putting barriers where they are needed. Barriers encoded in
an ICB do not affect the parent encoder.
The CPU may have initialized individual commands within a command_buffer before the
command_buffer is passed as an argument to a shader. If the CPU has not already initialized a
command, you must reset that command before using it.
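The compute path is analogous. The following sketch encodes one concurrent dispatch per thread;
the argument-buffer structure, the binding indices, and the dispatch sizes are illustrative
assumptions based on the concurrent_dispatch* family described above.
struct ComputePipelineContainer {
   compute_pipeline_state pipeline;   // passed through an argument buffer
};

kernel void
encode_dispatches(command_buffer icb [[buffer(0)]],
                  constant ComputePipelineContainer &pipelines [[buffer(1)]],
                  uint cmd_index [[thread_position_in_grid]])
{
   compute_command cmd(icb, cmd_index);
   cmd.set_compute_pipeline_state(pipelines.pipeline);
   // Dispatches encoded in an ICB are always concurrent.
   cmd.concurrent_dispatch_threadgroups(uint3(64, 1, 1), uint3(32, 1, 1));
}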
struct rasterization_rate_map_decoder {
   explicit rasterization_rate_map_decoder(
      constant rasterization_rate_map_data &data) thread;
The VRR map describes the mapping between screen space and physical fragment space and
enables conversion of the rendering results back to the desired screen resolution. To convert
between the two spaces, use the decoder's mapping functions.
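For illustration, a fragment function that resolves an intermediate, variable-rate render target
back to screen resolution might use the decoder as in the following sketch. The
map_screen_to_physical_coordinates call and the bindings are assumptions based on the decoder
interface described here.
fragment float4
vrr_resolve(float4 pos [[position]],
            texture2d<float> physical_tex [[texture(0)]],
            sampler s [[sampler(0)]],
            constant rasterization_rate_map_data &vrr_data [[buffer(0)]])
{
   rasterization_rate_map_decoder decoder(vrr_data);
   // Map the screen-space pixel center into the physical fragment space the
   // intermediate texture was rendered in, then normalize to texture coords.
   float2 physical = decoder.map_screen_to_physical_coordinates(pos.xy);
   float2 uv = physical / float2(physical_tex.get_width(),
                                 physical_tex.get_height());
   return physical_tex.sample(s, uv);
}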
bool
is_null_instance_acceleration_structure(instance_acceleration_structure)
In Metal 2.4, call the following function to check if an acceleration structure is null.
bool
is_null_acceleration_structure(acceleration_structure<intersection_tags…>)
result_type intersect(…parameters…).
Table 6.25 shows the possible parameters for the intersect function. All intersect functions
must have ray and accel_struct parameters. The other parameters are optional.
time (All OS: Since Metal 2.4.): The time associated with the ray. This parameter exists if the
intersection_tags have primitive_motion or instance_motion.
The result_type is
The following set of intersect functions are available only if intersection_tags does not have
instancing:
result_type
intersect(
ray ray,
primitive_acceleration_structure accel_struct) const;
result_type
intersect(
ray ray,
primitive_acceleration_structure accel_struct,
intersection_function_table<intersection_tags...> func_table) const;
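For illustration, a kernel might trace one ray per thread against a primitive acceleration
structure as in the following sketch. It assumes the ray tracing types come from
<metal_raytracing> in the metal::raytracing namespace; the bindings and ray setup are
placeholders.
#include <metal_stdlib>
#include <metal_raytracing>
using namespace metal;
using namespace metal::raytracing;

kernel void
trace_primary(primitive_acceleration_structure accel [[buffer(0)]],
              device const float3 *origins [[buffer(1)]],
              device float *out_distance [[buffer(2)]],
              uint tid [[thread_position_in_grid]])
{
   // Build a ray from the per-thread origin, pointing down +Z.
   ray r(origins[tid], float3(0.0f, 0.0f, 1.0f), 0.0f, INFINITY);

   intersector<triangle_data> isect;
   intersector<triangle_data>::result_type result = isect.intersect(r, accel);

   // The distance is only meaningful when a triangle was hit.
   out_distance[tid] = (result.type == intersection_type::triangle)
                          ? result.distance
                          : -1.0f;
}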
The following set of intersect functions are available only if intersection_tags have
instancing.
result_type
intersect(
ray ray,
instance_acceleration_structure accel_struct,
uint mask = ~0U) const;
result_type
intersect(
ray ray,
instance_acceleration_structure accel_struct,
intersection_function_table<intersection_tags...> func_table) const;
result_type
intersect(
ray ray,
instance_acceleration_structure accel_struct,
uint mask,
intersection_function_table<intersection_tags...> func_table) const;
As of Metal 2.4, the following set of intersect functions are available if intersection_tags
have primitive_motion or instance_motion.
template <typename T, intersection_tags...>
result_type
intersect(
ray ray,
acceleration_structure<intersection_tags...> accel_struct,
float time) const;
As of Metal 2.4, the following set of intersect functions are available only if
intersection_tags have instancing and either primitive_motion or instance_motion.
void set_triangle_front_facing_winding(winding)
void set_geometry_cull_mode(geometry_cull_mode)
void set_opacity_cull_mode(opacity_cull_mode)
void force_opacity(forced_opacity)
void assume_geometry_type(geometry_type)
void assume_identity_transforms(bool)
void accept_any_intersection(bool)
Triangles have two sides or "faces". The front facing winding determines which triangle face is
considered the "front" face when viewed from the ray origin. If the vertices appear in clockwise
order when viewed from the ray origin and the front facing winding is clockwise, then the visible
face is the front face. The other face is the back face. If the front facing winding is
counterclockwise, then the opposite is true. Use the following function to change the default
winding (clockwise):
enum class winding {
clockwise,
counterclockwise
};
void set_triangle_front_facing_winding(winding w);
To change the default triangle cull mode (none), use the following function.
enum class triangle_cull_mode {
none,
front,
back
};
void set_triangle_cull_mode(triangle_cull_mode tcm);
If the cull mode is set to front, then triangles whose front face is visible from the ray origin
are not considered for intersection. Otherwise, if the cull mode is set to back, then triangles
whose back face is visible from the ray origin are not considered for intersection.
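A typical configuration, shown as a short sketch, combines the winding and cull-mode setters
before calling intersect:
// Inside a shader function, before calling intersect():
intersector<triangle_data> isect;
// Treat counterclockwise vertex order as front facing and skip back faces.
isect.set_triangle_front_facing_winding(winding::counterclockwise);
isect.set_triangle_cull_mode(triangle_cull_mode::back);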
The following function may be used to set the intersector to cull all bounding box or triangle
primitives from the set of candidate geometries. The default geometry cull mode is none.
enum class geometry_cull_mode {
   none,
   triangle,
   bounding_box
};
void set_geometry_cull_mode(geometry_cull_mode gcm);
The default opacity cull mode is none. Use the following function to change the opacity cull
mode. See below for how opacity affects triangle and bounding box primitives.
enum class opacity_cull_mode {
none,
opaque,
non_opaque
};
void set_opacity_cull_mode(opacity_cull_mode ocm);
Call the following function to override the per-instance and per-geometry settings of forced
opacity. The default is none.
enum class forced_opacity {
none,
opaque,
non_opaque
};
void force_opacity(forced_opacity fo);
Triangle primitives may also be culled based on their opacity: An opaque triangle will not run
any intersection function. A non_opaque triangle will run its intersection function to accept or
reject the hit.
The PrimitiveAccelerationStructure encodes whether a triangle is opaque or non_opaque by
declaring MTLAccelerationStructureGeometryFlagOpaque. The opaqueness can be
overridden by calling intersector.force_opacity(). If used, this takes precedence over the
per-instance opaqueness flags (MTLAccelerationStructureInstanceFlagOpaque and
MTLAccelerationStructureInstanceFlagNonOpaque), which in turn take precedence over
the per-geometry opaqueness.
For custom bounding box primitives, the opaqueness is evaluated in the same way as described
for triangles (first intersector.set_opacity_cull_mode(), then InstanceFlags, then
GeometryFlags). The opaque parameter informs the bounding box intersection program of the
resolved opaqueness state. The intersection function may then use this to influence its
evaluation of whether a hit is encountered.
intersector.set_opacity_cull_mode() skips over primitive types based on their
opaqueness.
If intersector.force_opacity() is set to opaque or non_opaque then
intersector.set_opacity_cull_mode() must be none. The reverse is also true: Opacity
Override and Opacity culling cannot be mixed. The results of illegal combinations are
undefined.
If you call the following function with true, the intersector object searches for any intersection.
Otherwise, it searches for the closest intersection. The default is false.
void accept_any_intersection(bool value);
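For example, a shadow or occlusion ray typically only needs to know whether anything is hit at
all, so a sketch of that configuration might look like:
// Inside a shader function, before calling intersect():
intersector<triangle_data> shadow_isect;
// Any hit is sufficient, and forcing opacity skips intersection functions.
shadow_isect.accept_any_intersection(true);
shadow_isect.force_opacity(forced_opacity::opaque);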
void reset(…) * *
bool next() * *
void abort() * *
intersection_type get_candidate_intersection_type() * *
intersection_type get_committed_intersection_type() * *
void commit_triangle_intersection() *
void commit_bounding_box_intersection(float distance) *
float3 get_world_space_ray_origin() * *
float3 get_world_space_ray_direction() * *
float get_ray_min_distance() * *
intersection_params get_intersection_params() * *
float get_candidate_triangle_distance() *
uint get_candidate_instance_id() * *
uint get_candidate_user_instance_id() * *
uint get_candidate_geometry_id() * *
uint get_candidate_primitive_id() * *
float2 get_candidate_triangle_barycentric_coord() *
bool is_candidate_non_opaque_bounding_box() *
bool is_candidate_triangle_front_facing() *
float4x3 get_candidate_object_to_world_transform() * *
float4x3 get_candidate_world_to_object_transform() * *
float3 get_candidate_ray_origin() * *
float3 get_candidate_ray_direction() * *
float get_committed_distance() * *
uint get_committed_instance_id() * *
uint get_committed_user_instance_id() * *
uint get_committed_geometry_id() * *
uint get_committed_primitive_id() * *
float2 get_committed_triangle_barycentric_coord() * *
bool is_committed_triangle_front_facing() *
float4x3 get_committed_object_to_world_transform() *
float4x3 get_committed_world_to_object_transform() * *
float3 get_committed_ray_origin() * *
float3 get_committed_ray_direction() * *
Call the following function to query the distance of a candidate triangle hit that needs
consideration.
float get_candidate_triangle_distance()
Call the following function to query the distance of the currently committed hit.
float get_committed_distance()
Call the following function to query the top level structure instance ID for the current candidate
hit.
uint get_candidate_instance_id()
Call the following function to query the user instance ID provided by the user on the bottom level
acceleration structure for the current candidate hit.
uint get_candidate_user_instance_id()
Call the following function to query the bottom level structure primitive ID within the geometry
for the current candidate hit.
uint get_candidate_primitive_id()
Call the following function to query the top level structure instance ID for the current committed
hit.
uint get_committed_instance_id()
Call the following function to query the user instance ID provided by the user on the bottom level
acceleration structure for the current committed hit.
uint get_committed_user_instance_id()
Call the following function to query the bottom level structure geometry ID for the current
committed hit.
uint get_committed_geometry_id()
Call the following function to query the bottom level structure primitive ID within the geometry
for the current committed hit.
uint get_committed_primitive_id()
Call the following function to query the ray origin in object space for the current hit candidate.
float3 get_candidate_ray_origin()
Call the following function to query the ray direction in object space for the current hit
candidate.
float3 get_candidate_ray_direction()
Call the following function to query the ray origin in object space for the current committed hit.
float3 get_committed_ray_origin()
Call the following function to query the ray direction in object space for the current committed
hit.
float3 get_committed_ray_direction()
Call the following function to query the matrix for transforming the ray origin/direction of the
current candidate hit from world-space to object-space.
float4x3 get_candidate_world_to_object_transform()
Call the following function to query the matrix for transforming the ray origin/direction of the
current committed hit from object-space to world-space.
float4x3 get_committed_object_to_world_transform()
Call the following function to query the matrix for transforming the ray origin/direction of the
current committed hit from world-space to object-space.
float4x3 get_committed_world_to_object_transform()
Call the following function to query the candidate hit location barycentric coordinates. Valid
when get_candidate_intersection_type() returns triangle.
float2 get_candidate_triangle_barycentric_coord()
Call the following function to query the committed hit location barycentric coordinates. Valid
when get_committed_intersection_type() returns triangle.
float2 get_committed_triangle_barycentric_coord()
Call the following function to query if the hit triangle candidate is front or back facing. Returns
true if it is front face and false if it is back face. Valid when
get_candidate_intersection_type() returns triangle.
bool is_candidate_triangle_front_facing()
Call the following function to query if the committed hit is front or back facing. Returns true if it
is front face and false if it is back face. Valid when get_committed_intersection_type()
returns triangle.
bool is_committed_triangle_front_facing()
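The following kernel sketches how these query functions combine into a traversal loop. As in the
earlier ray tracing sketch, it assumes <metal_raytracing> and the metal::raytracing namespace;
the intersection_query template arguments and constructor form are assumptions based on the
interface described earlier in this chapter, and the bindings are placeholders.
kernel void
query_closest(primitive_acceleration_structure accel [[buffer(0)]],
              device const float3 *origins [[buffer(1)]],
              device float *out_distance [[buffer(2)]],
              uint tid [[thread_position_in_grid]])
{
   ray r(origins[tid], float3(0.0f, 0.0f, 1.0f));

   // Assumed constructor: (ray, acceleration structure, parameters).
   intersection_query<triangle_data> iq(r, accel, intersection_params());

   while (iq.next()) {
      if (iq.get_candidate_intersection_type() ==
          intersection_type::triangle) {
         // Accept every candidate triangle; a real shader might first
         // inspect get_candidate_triangle_distance().
         iq.commit_triangle_intersection();
      }
   }

   out_distance[tid] =
      (iq.get_committed_intersection_type() == intersection_type::triangle)
         ? iq.get_committed_distance()
         : -1.0f;
}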
Table 7.2. Accuracy of single-precision operations and functions with fast math enabled
Math Function    Minimum Accuracy (ULP Values)
x + y            Correctly rounded
x - y            Correctly rounded
x * y            Correctly rounded
1.0 / x          <= 1 ulp for x in the domain of 2^-126 to 2^126
x / y            <= 2.5 ulp for y in the domain of 2^-126 to 2^126
Table 7.3 describes the minimum accuracy of half-precision floating-point basic arithmetic
operations and math functions given as ULP values. Table 7.3 applies to iOS and macOS,
starting with MTLGPUFamilyApple4 hardware.
Even though the precision of individual math operations and functions is specified in Table 7.1,
Table 7.2, and Table 7.3, the Metal compiler, in fast math mode, may reassociate floating-point
operations, which can dramatically change floating-point results. Reassociation may change or
ignore the sign of zero, allow optimizations to assume the arguments and result are not NaN or
+/-INF, and inhibit or create underflow or overflow; it therefore cannot be used in code that relies
on rounding behavior, such as (x + 2^52) - 2^52, or on ordered floating-point comparisons.
The ULP is defined as follows:
If x is a real number that lies between two finite consecutive floating-point numbers a and b,
without being equal to one of them, then ulp(x) = |b − a|, otherwise ulp(x) is the distance
between the two non-equal finite floating-point numbers nearest x. Moreover, ulp(NaN) is NaN.
In Metal 2.0, all conversions to and from unorm data types shall be correctly rounded.
7.7.6 Conversion Rules for Signed and Unsigned Integer Pixel Data Types
For textures that have 8- or 16-bit signed or unsigned integer pixel values, the texture
sample and read functions return a signed or unsigned 32-bit integer pixel value. The
conversions described in this section must be correctly saturated.
Writes to these integer textures perform one of the conversions listed in Table 7.6.
Table 7.6. Conversion between integer pixel data types
Convert From To Conversion Rule
The precision of the above conversion must ensure that the delta between the resulting
infinitely precise floating-point value when converting the result back to an unnormalized sRGB
value, but without rounding to an 8-bit unsigned integer value (call it r), and the original sRGB 8-
bit unsigned integer color value (call it r_orig) is <= 0.5; for example:
fabs(r - r_orig) <= 0.5
Use the following rules for converting a linear RGB floating-point color value (call it c) to a
normalized 8-bit unsigned integer sRGB value:
if (isnan(c)) c = 0.0;
if (c > 1.0)
   c = 1.0;
else if (c < 0.0)
   c = 0.0;
else if (c < 0.0031308)
   c = 12.92 * c;
else
   c = 1.055 * powr(c, 1.0/2.4) - 0.055;
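Wrapped as a helper function, those rules read as follows; this is purely illustrative and simply
restates the conversion above.
#include <metal_stdlib>
using namespace metal;

// Convert one linear RGB channel to a normalized sRGB value in [0, 1],
// applying the rules above verbatim.
static float linear_to_srgb(float c)
{
   if (isnan(c)) c = 0.0f;
   if (c > 1.0f)
      c = 1.0f;
   else if (c < 0.0f)
      c = 0.0f;
   else if (c < 0.0031308f)
      c = 12.92f * c;
   else
      c = 1.055f * powr(c, 1.0f / 2.4f) - 0.055f;
   return c;
}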
Apple Inc.
One Apple Park Way
Cupertino, CA 95014
408-996-1010