Vulkan 101

Tom Olson
Director, Graphics Research, ARM
Chair, Vulkan Working Group
© Copyright Khronos Group 2016 - Page 8
What is Vulkan?
• A 3D graphics API for the next 20 years
- Logical successor to OpenGL / OpenGL ES
- Modern, efficient design
- An open, industry-controlled standard

• Here, now
- Released in February 2016
- Available today for Windows / Linux
- Shipping in Samsung Galaxy S7
- Support announced in Android ‘N’

• Different!
- Fundamental change in philosophy
- Requires corresponding changes in applications
Why did we do this?
• Traditional APIs had issues…
• Developers weren’t happy

http://www.joshbarczak.com/blog/?p=154

http://richg42.blogspot.com/2014/05/things-that-drive-me-nuts-about-opengl.html


Problems with OpenGL / OpenGL ES
• Programming model doesn’t match GPU HW
- Especially in mobile
- Driver magic hides the mismatch

• CPU intensive
- Lots of state validation, dependency tracking

• Complex, buggy, unpredictable drivers

- Different bugs and fast-paths on every GPU

• Fundamentally single-threaded
- Can’t use multi-core CPUs effectively

• …not to mention twenty years of legacy cruft

• Moves into high gear in July/August 2014

- Commitment from key ISVs
- AMD donation of Mantle

• A lot of very hard work follows…

• Release to public in February 2016

- Conformant drivers from four IHVs
- GLSL to SPIR-V compiler
- Debug and validation tools


Vulkan in one slide

[Diagram: the core Vulkan objects: Instance, Device, Queues, Command Buffers, Memory, and Resources (textures, buffers)]


Vulkan in two slides
[Diagram: Command Buffer, Render Pass, Pipeline, Shaders, Descriptor Sets, Draw Call, Copy, and Sync, annotated with the speakers covering each topic: Andrew, Neil / Hans-Kristian, Tobias, Michael, and Jesse]


The principle of Explicit Control
• You promise to tell the driver
- What you are going to do
- In sufficient detail that it doesn’t have to guess
- When the driver needs to know it

OpenGL lets you specify important information very late, and change it at any time. It’s convenient, but has huge performance costs.

• In return, driver promises to do
- What you asked for
- When you asked for it
- Very quickly

OpenGL drivers often defer work until later, move it to another thread, or even ignore your commands, based on guesses about your intent. Vulkan drivers won’t.

• No driver magic!


Loader, layers, and extensions
• Vulkan has no dependencies on external APIs
- ICD loader is built-in
- Window system binding is (semi) built-in

• A side benefit: Layers

- Loader can install intercept libraries (“layers”)
- E.g. trace, debug

• Extensions
- Must be enabled at initialization time


Multithreading
• All objects visible / accessible to all threads

• Most operations are externally synchronized

- Application must prevent unsafe concurrent access
- E.g., recording to the same command buffer
- E.g., submitting to the same queue
- Application must manage object lifetimes
- Note, many objects are immutable
- Concurrent read access is OK

• Allocation / creation are internally synchronized and may block

- Per-thread pool allocators keep this reasonably cheap


Error handling
• Vulkan is optimized for correct applications
- Does not (generally) check for invalid usage
- Does not track dependencies
- Does not (generally) provide thread safety
- Breaking the rules results in undefined behavior

• Vulkan does check for errors you can’t predict

- Out of memory
- Device lost
- Other system errors…

• Layers to the rescue!

- Can enable validation layers during development


Community
• A new attitude
- ISV member input drove key decisions
- Consulted with hundreds of developers

• Strong commitment to open source

- Loader
- Validation and other layers
- SPIR-V tools: compiler, validator, …
- Conformance tests
- Specification

• All at https://github.com/KhronosGroup


Should you be using Vulkan?
• Challenges
- Verbose and complex
- Lots of exposed sharp edges
- Lots to learn

• Opportunities
- Much lower driver overhead
- …which you can spread across multiple threads
- More predictable performance
- Mobile friendly

• Realities
- Ecosystem is still immature
- Will need to ship GL/DX versions for years to come


Command Buffers and Pipelines
Michael Worcester – Driver Engineer
([email protected])
26 May 2016 www.imgtec.com
Command Buffers – Deferring the work

• OpenGL is immediate (ignoring display lists)
- Driver does not know how much work is incoming
- Has to guess
- Bad!
• Vulkan splits recording of work from submission of work
- Removes guesswork from driver
- Reduces hitching
- Helps eliminate unexplained resource usage

© Imagination Technologies
Command Buffers – Pooling Resource
• Command Buffers always belong to a Command Pool
- Buffers are allocated from pools
- Pools provide lightweight synchronisation
- Pools can be reset, reclaiming all resources
• Two flavours of pool:
- Individual reset of command buffers
- Group reset only
[Diagram: a single thread driving an OpenGL context, contrasted with threads 1 to N each recording a VkCommandBuffer]

• Deferred recording of commands
- Transfer
- Graphics
- Compute
- Synchronisation

• Transfer commands are raw copies
- However, they can change the tiling of an image (this is the only way!)
• CPU -> GPU
- Texture upload
- Static buffer data
• GPU -> CPU
- Read back of data
• GPU -> GPU
- Pipelined updates of data
- Mipgen

[Diagram: a command buffer holding Transfer, Compute, RenderPass, and Compute work; the commands shown are Dispatch, BindPipeline, BindDescriptors, BeginRenderPass, PushConstants, and Draw]

[Diagram: a primary command buffer (Transfer, Compute, RenderPass, Compute) using ExecuteCommands to run secondary command buffers, each recording BindPipeline, BindDescriptors, and Draw]

[Diagram: labels Camera, CPU, GPU]

[Diagram: command buffer lifecycle, with labels Allocated, Begin, Record, End, Submit, Pending, Active, and Wait]

[Diagram: pipeline state blocks VI, IA, VS, CS, TS, ES, GS, VP, RS, MS, DS, FS, CB]

• Fixed Function States
• Programmable Shaders
• Descriptor Layout
• Renderpass (more later)
• Dynamic State

• Everything that isn’t a shader
• Buffer formats/layouts
• States:
- VertexInput
- InputAssembly
- Tessellation
- Viewport
- Raster
- Multisample
- DepthStencil
- ColorBlend

• Currently same as OpenGL
- Vertex
- Control
- Evaluation
- Geometry
- Fragment
• Note: Tessellation and Geometry are optional features

Describes the set of resources that a shader can access

• Uniforms
• Storage Buffers
• Images
• Samplers
• Push Constants

Pipelines – Dynamic State
• Per-draw state
- Tedious to compile each one
- Combinatorial explosion
• Dynamic state!
- Opt-in
- Only use when required
• States that can be dynamic:
- Viewport
- Scissor
- Line Width
- Depth Bias
- Blend Constant Colour
- Depth Bounds
- Stencil: Compare, Write, Reference

• Share common state

• Load/Store

Introduction to SPIR-V Shaders
Neil Hickey
Compiler Engineer, ARM
SPIR History


SPIR-V Purpose

[Diagram: front ends (Parse HLSL, Parse GLSL, Parse OpenCL C, Parse ISPC, Parse Static C++) feed a SPIR-V CFG; stages shown after it are Optimize SPIR-V CFG, Print SPIR-V, and the SPIR-V binary handed to the IHV compiler]


Developer Ecosystem

• Multiple Developer Advantages:

• Same front-end compiler for multiple platforms
• Reduces runtime kernel compilation time
• Don’t have to ship shader/kernel source code
• Drivers are simpler and more reliable


Vulkan and OpenCL
                      SPIR 1.2                 SPIR 2.0                      SPIR-V 1.0
LLVM Interaction      Uses LLVM 3.2            Uses LLVM 3.4                 100% Khronos defined; round-trip lossless conversion
Compute Constructs    Metadata/Intrinsics      Metadata/Intrinsics           Native
Graphics Constructs   No                       No                            Native
Supported Language    OpenCL C 1.2             OpenCL C 1.2, OpenCL C 2.0    OpenCL C 1.2 – 2.0, OpenCL C++ and GLSL
Feature Sets
OpenCL Ingestion      OpenCL C 1.2 Extension   OpenCL C 2.0 Extension        OpenCL 2.1 Core; OpenCL 1.2 / 2.0 Extensions
Vulkan Ingestion      -                        -                             Vulkan 1.0 Core


Compiler flow
[Diagram: GLSL, OpenCL C, OpenCL C++, and third party kernel and shader languages compile to SPIR-V; an LLVM to SPIR-V bi-directional translator connects SPIR-V with LLVM and other intermediate forms. Legend: Khronos has open sourced some of these tools and translators, and plans to open source the others soon.]

SPIR-V Tools
- SPIR-V Validator
- SPIR-V (Dis)Assembler

SPIR-V
• 32-bit word stream
• Extensible and easily parsed
• Retains data object and control flow information for effective code generation and translation
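
Because SPIR-V is a stream of 32-bit words, a module can be inspected with very little code. The sketch below is illustrative only: `SpirvHeader`, `parse_header`, and `list_instructions` are hypothetical names, and it assumes the words have already been loaded in host byte order. It relies on the layout given in the SPIR-V specification: a five-word header (magic number 0x07230203, version, generator, bound, schema) followed by instructions whose first word packs the word count in the high 16 bits and the opcode in the low 16 bits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical holder for the five header words of a SPIR-V module.
struct SpirvHeader {
    std::uint32_t version_major;
    std::uint32_t version_minor;
    std::uint32_t generator;
    std::uint32_t bound;
    std::uint32_t schema;
};

constexpr std::uint32_t kSpirvMagic = 0x07230203u;

// Fills `out` and returns true if `words` starts with a plausible header.
bool parse_header(const std::vector<std::uint32_t>& words, SpirvHeader& out) {
    if (words.size() < 5 || words[0] != kSpirvMagic) return false;
    out.version_major = (words[1] >> 16) & 0xFFu;  // version word: 0 | major | minor | 0
    out.version_minor = (words[1] >> 8) & 0xFFu;
    out.generator = words[2];
    out.bound = words[3];   // all result ids are below this value
    out.schema = words[4];  // reserved, 0
    return true;
}

// One decoded instruction: opcode, total word count, index of its first word.
struct Instruction {
    std::uint16_t opcode;
    std::uint16_t word_count;
    std::size_t first_word;
};

// Walks the instruction stream that follows the header.
std::vector<Instruction> list_instructions(const std::vector<std::uint32_t>& words) {
    std::vector<Instruction> result;
    std::size_t i = 5;
    while (i < words.size()) {
        const std::uint16_t count = static_cast<std::uint16_t>(words[i] >> 16);
        const std::uint16_t opcode = static_cast<std::uint16_t>(words[i] & 0xFFFFu);
        if (count == 0 || i + count > words.size()) break;  // malformed stream
        result.push_back({opcode, count, i});
        i += count;
    }
    return result;
}
```

For example, a module consisting only of `OpCapability Shader` (opcode 17, two words, operand 1) after a version 1.0 header decodes to a single instruction.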


SPIR-V Capabilities
• OpenCL and Vulkan
• Capabilities define feature sets
• Separate capabilities for Vulkan shaders and OpenCL kernels
• Validation layer checks correct capabilities requested

OpCapability Addresses
OpCapability Linkage
OpCapability Kernel
OpCapability Vector16
OpCapability Int16


SPIR-V Extensions
• OpExtension
• New functionality
• New instructions
• New semantics

OpExtInstImport "OpenCL.std"


Vulkan shaders vs. GL shaders
• Program GLSL/ESSL shaders in high level language
• Ship high level source with application
• Graphics drivers compile at runtime
• Each driver needs a full compilation tool chain

• Shaders in binary format

• Compile offline
• Ship intermediate language with application
• Graphics drivers “just” lower from IL
• Higher level compilation can be shared among vendors (provided by Khronos)


Vulkan shaders vs. GL shaders
GLSL:

#version 310 es
precision mediump float;
uniform sampler2D s;
in vec2 texcoord;
out vec4 color;
void main()
{
    color = texture(s, texcoord);
}

SPIR-V:

; SPIR-V
; Version: 1.0
; Generator: Khronos Glslang Reference Front End; 1
; Bound: 20
; Schema: 0
OpCapability Shader
%1 = OpExtInstImport "GLSL.std.450"
OpMemoryModel Logical GLSL450
OpEntryPoint Fragment %4 "main" %9 %17
OpExecutionMode %4 OriginUpperLeft
OpSource ESSL 310
OpName %4 "main"
OpName %9 "color"
OpName %13 "s"
OpName %17 "texcoord"
OpDecorate %9 RelaxedPrecision
OpDecorate %13 RelaxedPrecision
OpDecorate %13 DescriptorSet 0
OpDecorate %14 RelaxedPrecision
OpDecorate %17 RelaxedPrecision
OpDecorate %18 RelaxedPrecision
OpDecorate %19 RelaxedPrecision
%2 = OpTypeVoid
%3 = OpTypeFunction %2
%6 = OpTypeFloat 32
%7 = OpTypeVector %6 4
%8 = OpTypePointer Output %7
%9 = OpVariable %8 Output
%10 = OpTypeImage %6 2D 0 0 0 1 Unknown
%11 = OpTypeSampledImage %10
%12 = OpTypePointer UniformConstant %11
%13 = OpVariable %12 UniformConstant
%15 = OpTypeVector %6 2
%16 = OpTypePointer Input %15
%17 = OpVariable %16 Input
%4 = OpFunction %2 None %3
%5 = OpLabel
%14 = OpLoad %11 %13
%18 = OpLoad %15 %17
%19 = OpImageSampleImplicitLod %7 %14 %18
OpStore %9 %19
OpReturn
OpFunctionEnd


Khronos SPIR-V Tools
• Reference frontend (glslang)
glslangValidator -V -o shader.spv shader.frag

• SPIR-V disassembler (spirv-dis)
spirv-dis -o shader.spvasm shader.spv

• SPIR-V assembler (spirv-as)
spirv-as -o shader.spv shader.spvasm

• SPIR-V reflection (spirv-cross)
spirv-cross shader.spv


Vulkan shaders in a high level language

• GL_KHR_vulkan_glsl

• Exposes SPIR-V features

• Similar to GLSL with some changes

• Extends #version 140 and higher on desktop and #version 310 es for mobile content


Vulkan_glsl removed features
• Default uniforms

• Atomic-counter bindings

• Subroutines

• Packed block layouts


Vulkan_glsl new features
• Push constants

• Separate textures and samplers

• Descriptor sets

• Specialization constants

• Subpass inputs


Push Constants
• Push constants replace non-opaque uniforms
- Think of them as small, fast-access uniform buffer memory
• Update in Vulkan with vkCmdPushConstants
// New
layout(push_constant, std430) uniform PushConstants {
    mat4 MVP;
    vec4 MaterialData;
} RegisterMapped;

// Old, no longer supported in Vulkan GLSL
uniform mat4 MVP;
uniform vec4 MaterialData;

// Opaque uniform, still supported
uniform sampler2D sTexture;
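
On the application side, the data handed to vkCmdPushConstants has to match the block's layout. The following is a minimal sketch of a CPU-side mirror of the `PushConstants` block above, assuming std430 layout (a `mat4` is 64 bytes, a `vec4` is 16 bytes). The struct and the helper `fits_push_constants` are hypothetical; the 128-byte figure is the minimum that Vulkan 1.0 guarantees for `maxPushConstantsSize`, and real devices may offer more.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical CPU-side mirror of the GLSL PushConstants block (std430 layout).
struct PushConstants {
    float MVP[16];          // mat4: 16 floats, 64 bytes
    float MaterialData[4];  // vec4: 4 floats, 16 bytes
};

static_assert(offsetof(PushConstants, MVP) == 0, "mat4 starts the block");
static_assert(offsetof(PushConstants, MaterialData) == 64, "vec4 follows the mat4");
static_assert(sizeof(PushConstants) == 80, "the block occupies 80 bytes");

// Minimum push constant space guaranteed by Vulkan 1.0
// (the required minimum of VkPhysicalDeviceLimits::maxPushConstantsSize).
constexpr std::uint32_t kGuaranteedPushConstantBytes = 128;

// True if `size` bytes at `offset` fit within `limit` bytes and respect the
// 4-byte granularity that vkCmdPushConstants requires for offset and size.
constexpr bool fits_push_constants(std::uint32_t offset, std::uint32_t size,
                                   std::uint32_t limit = kGuaranteedPushConstantBytes) {
    return size > 0 && offset % 4 == 0 && size % 4 == 0 &&
           offset <= limit && size <= limit - offset;
}
```

With a check like this passing, the struct could be passed as the `pValues` argument of vkCmdPushConstants with `offset` 0 and `size` `sizeof(PushConstants)`.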


Separate textures and samplers
• sampler contains just filtering information
• texture contains just image information
• combined in code at the point of texture lookup

uniform sampler s;
uniform texture2D t;
in vec2 texcoord;
...
void main()
{
fragColor = texture(sampler2D(t,s), texcoord);
}


Descriptor sets
• Bound objects can optionally define a descriptor set
• Allows bound objects to be updated in one block
• Allows objects in other descriptor sets to remain the same
• Enabled with the set = ... syntax in the layout specifier

layout(set = 0, binding = 0) uniform sampler s;

layout(set = 1, binding = 0) uniform texture2D t;


Specialization constants
• Allows for special constants to be created whose value is overridable at pipeline creation time.
• Can be used in expressions
• Can be combined with other constants to form new specialization constants
• Declared using layout(constant_id=...)
• Can have a default value if not overridden at runtime

layout(constant_id = 1) const int arraySize = 12;

vec4 data[arraySize];


Specialization constants (2)
• gl_WorkGroupSize can be specialized with values for the x, y and z components.

layout(local_size_x_id = 2, local_size_z_id = 3) in;

• These specialization constants can be set at pipeline creation time by using VkSpecializationInfo

const VkSpecializationMapEntry entries[] =
{
    {
        1,                    // constantID
        0 * sizeof(uint32_t), // offset
        sizeof(uint32_t)      // size
    },
};


Specialization constants (3)
const uint32_t data[] = { 16 };
const VkSpecializationInfo info =
{
    1,                    // mapEntryCount
    entries,              // pMapEntries
    1 * sizeof(uint32_t), // dataSize
    data,                 // pData
};
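
To make the relationship between the map entries and the data blob concrete, here is a small sketch that needs no Vulkan headers. `MapEntry` is a hypothetical mirror of the three fields of VkSpecializationMapEntry, and `resolve_constant` is an illustrative helper, not a Vulkan function: it shows how a 32-bit constant's value would be looked up by `constantID`, falling back to the shader's default when no entry overrides it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical mirror of VkSpecializationMapEntry (same three fields).
struct MapEntry {
    std::uint32_t constantID;
    std::uint32_t offset;
    std::size_t size;
};

// Returns the value a 32-bit specialization constant would take: the bytes the
// matching entry points at inside `data`, or `default_value` if no entry matches.
std::uint32_t resolve_constant(std::uint32_t constant_id, std::uint32_t default_value,
                               const std::vector<MapEntry>& entries,
                               const void* data, std::size_t data_size) {
    for (const MapEntry& entry : entries) {
        if (entry.constantID != constant_id) continue;
        assert(entry.size == sizeof(std::uint32_t));     // this sketch handles 32-bit values only
        assert(entry.offset + entry.size <= data_size);  // entry must lie inside the blob
        std::uint32_t value = 0;
        std::memcpy(&value, static_cast<const unsigned char*>(data) + entry.offset, sizeof(value));
        return value;
    }
    return default_value;
}
```

With the entries and data from the slides, constant 1 (`arraySize`, default 12) resolves to 16, while a constant with no entry keeps its default.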


Subpass Inputs
• Vulkan supports subpasses within render passes
• Standardized GL_EXT_shader_pixel_local_storage!

// GLSL
#extension GL_EXT_shader_pixel_local_storage : require
__pixel_local_inEXT GBuffer {
layout(rgba8) vec4 albedo;
layout(rgba8) vec4 normal;
...
} pls;

// Vulkan
layout(input_attachment_index = 0) uniform subpassInput albedo;
layout(input_attachment_index = 1) uniform subpassInput normal;
...


Acknowledgements
• Hans-Kristian Arntzen – ARM
• Benedict Gaster – University of the West of England
• Neil Henning – Codeplay


Using SPIR-V in practice with SPIRV-Cross
Hans-Kristian Arntzen
Engineer, ARM
Contents
• Moving to offline compilation of SPIR-V
• Creating pipeline layouts with SPIRV-Cross
- Descriptor sets
- Push constants
- Multipass input attachments
• Making SPIR-V portable to other graphics APIs
• Debugging complex shaders with your C++ debugger of choice


Offline Compilation to SPIR-V
• Shader compilation can be part of your build system
• Catching compilation bugs at build time is always a plus
• Strict, mature GLSL frontends available
- glslang: https://github.com/KhronosGroup/glslang
- shaderc: https://github.com/google/shaderc
• Full freedom for other languages in the future

# Makefile rules
FRAG_SHADERS := $(wildcard *.frag)
SPIRV_FILES := $(FRAG_SHADERS:.frag=.frag.spv)

shaders: $(SPIRV_FILES)

%.frag.spv: %.frag
	glslc -o $@ $< $(GLSL_FLAGS) -std=310es


Vulkan Pipeline Layouts
• Need to know the “function signature” of our shaders

pipelineInfo.layout = <layout goes here>;

vkCreateGraphicsPipelines(..., &pipelineInfo, ..., &pipeline);


The Contents of a Pipeline Layout
layout(set = 0, binding = 1) uniform UBO {
    mat4 MVP;
};
layout(set = 1, binding = 2) uniform sampler2D uTexture;
layout(push_constant) uniform PushConstants {
    vec4 FastConstant;
} constants;

• Signature
- 16 bytes of push constant space
- Two descriptor sets
- Set #0 has one UBO at binding #1
- Set #1 has one combined image sampler at binding #2
• Need to figure this out automatically, or write every layout by hand
- Latter is fine for tiny applications
- Vulkan does not provide reflection here; after all, this is vendor-neutral information
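
One way to picture the "signature" is as a small data structure built from reflected resources. The sketch below is illustrative only: `ReflectedResource`, `Signature`, and `build_signature` are hypothetical names, and the resource list is assumed to come from some reflection step. It groups bindings by descriptor set, which is the shape an application would need before creating one descriptor set layout per set.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical record for one resource found by shader reflection.
struct ReflectedResource {
    std::uint32_t set;
    std::uint32_t binding;
    std::string type;  // e.g. "UBO" or "CombinedImageSampler"
};

// Hypothetical shader signature: set index -> (binding -> resource type),
// plus the number of push constant bytes the shader declares.
struct Signature {
    std::map<std::uint32_t, std::map<std::uint32_t, std::string>> sets;
    std::uint32_t push_constant_bytes = 0;
};

// Groups the reflected resources by descriptor set and binding.
Signature build_signature(const std::vector<ReflectedResource>& resources,
                          std::uint32_t push_constant_bytes) {
    Signature signature;
    signature.push_constant_bytes = push_constant_bytes;
    for (const ReflectedResource& resource : resources) {
        signature.sets[resource.set][resource.binding] = resource.type;
    }
    return signature;
}
```

For the shader above, the result holds two sets (a UBO at set 0, binding 1, and a combined image sampler at set 1, binding 2) and 16 bytes of push constants.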


Introducing SPIRV-Cross
• SPIRV-Cross is a new tool hosted by Khronos
- https://github.com/KhronosGroup/SPIRV-Cross
• Extensive reflection
• Decompilation to high level languages

Khronos SPIR-V Toolbox: glslang, SPIRV-Tools, SPIRV-LLVM, SPIRV-Cross


Reflecting Uniforms and Samplers
• SPIRV-Cross has a simple API to retrieve resources

#include <cstdio>
#include <utility>
#include <vector>
#include "spirv_cross.hpp"

using namespace std;
using namespace spirv_cross;

// load_spirv_file() stands for the application's own loader.
vector<uint32_t> spirv_binary = load_spirv_file();
Compiler comp(move(spirv_binary));

// The SPIR-V is now parsed, and we can perform reflection on it.
ShaderResources resources = comp.get_shader_resources();

for (auto &u : resources.uniform_buffers)
{
    uint32_t set = comp.get_decoration(u.id, spv::DecorationDescriptorSet);
    uint32_t binding = comp.get_decoration(u.id, spv::DecorationBinding);
    printf("Found UBO %s at set = %u, binding = %u!\n",
           u.name.c_str(), set, binding);
}


Stepping it up with Push Constants
• SPIRV-Cross can figure out which push constant elements are in use
- Push constant blocks are typically shared across the various stages
- Only parts of the push constant block are referenced in a single stage

layout(push_constant) uniform PushConstants {
    mat4 MVPInVertex;
    vec4 ColorInFragment;
} constants;

FragColor = constants.ColorInFragment; // Fragment only uses element #1.

uint32_t id = resources.push_constant_buffers[0].id;
vector<BufferRange> ranges = comp.get_active_buffer_ranges(id);
for (auto &range : ranges)
{
    printf("Accessing member #%u, offset %u, size %u\n",
           range.index, range.offset, range.range);
}

// Possible to get names for struct members as well
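
The active ranges can be folded into one contiguous range per stage, which is the shape a per-stage push constant range takes in a pipeline layout. This is a sketch under assumptions: `ActiveRange` is a hypothetical stand-in for the (index, offset, range) triples printed above, and `covering_range` is an illustrative helper rather than part of SPIRV-Cross.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one active member range reported by reflection.
struct ActiveRange {
    unsigned index;      // member index within the block
    std::size_t offset;  // byte offset of the member
    std::size_t range;   // byte size of the member
};

// Contiguous byte range for one shader stage.
struct StageRange {
    std::size_t offset;
    std::size_t size;
};

// Smallest contiguous range that covers every active member; {0, 0} if none are active.
StageRange covering_range(const std::vector<ActiveRange>& ranges) {
    if (ranges.empty()) return {0, 0};
    std::size_t begin = ranges.front().offset;
    std::size_t end = begin + ranges.front().range;
    for (const ActiveRange& r : ranges) {
        begin = std::min(begin, r.offset);
        end = std::max(end, r.offset + r.range);
    }
    return {begin, end - begin};
}
```

For the block above, a fragment stage that only reads `ColorInFragment` (member #1 at offset 64, 16 bytes) gets the range offset 64, size 16, while the vertex stage reading `MVPInVertex` gets offset 0, size 64.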


Subpass Input Attachments
• Subpass attachments are similar to regular images
- Set
- Binding
- Input attachment index

layout(set = 0, binding = 0, input_attachment_index = 0) uniform subpassInput uAlbedo;

layout(set = 0, binding = 1, input_attachment_index = 1) uniform subpassInput uNormal;

vec4 lastColor = subpassLoad(uLastPass);

for (auto &attachment : resources.subpass_inputs)

{
// ...
}


Taking SPIR-V Beyond Vulkan
• SPIR-V is a great format to rally around
- Makes sense to be able to use it in older graphics APIs as well
• Will take some time before exclusive Vulkan support is mainstream
• How to make use of Vulkan features while being compatible?
- Push constants
- Subpass
- Descriptor sets
• Without tools, Vulkan features will be harder to take advantage of


GL + GLES + Vulkan Pipeline
• Implemented in our internal demo engine
• Write shaders in Vulkan GLSL
• Use Vulkan features directly
• No need for platform #ifdefs
• Can target mobile and desktop GL from same SPIR-V binary


Subpasses in OpenGL
• The subpass attachment is really just a texture read from gl_FragCoord
- Enables reading directly from tile memory on tiled architectures
- Great for deferred rendering and programmable blending

// Vulkan GLSL
uniform subpassInput uAlbedo;
...
FragColor = accumulateLight(
subpassLoad(uAlbedo),
subpassLoad(uNormal).xyz,
subpassLoad(uDepth).x);

// Translated to GLSL in SPIRV-Cross

uniform sampler2D uAlbedo;
...
FragColor = accumulateLight(
texelFetch(uAlbedo, ivec2(gl_FragCoord.xy), 0),
texelFetch(uNormal, ivec2(gl_FragCoord.xy), 0).xyz,
texelFetch(uDepth, ivec2(gl_FragCoord.xy), 0).x);


Push Constants in OpenGL
• Push constants bundle up old-style uniforms into buffer blocks
- Translates directly to uniform structs
- Use reflection to stamp out a list of glUniform() calls

// Vulkan GLSL
layout(push_constant) uniform PushConstants {
vec4 Material;
} constants;

FragColor = constants.Material;

// Translated to GLSL in SPIRV-Cross

struct PushConstants {
vec4 Material;
};
uniform PushConstants constants;

FragColor = constants.Material;


Descriptor Sets in OpenGL
• OpenGL has a binding space per type
• Find some remapping scheme that fits your application
• SPIRV-Cross can tweak bindings before decompiling to GLSL

// Vulkan GLSL
layout(set = 1, binding = 1) uniform sampler2D uTexture;

// SPIRV-Cross
uint32_t newBinding = 4;
glsl.set_decoration(texture.id, spv::DecorationBinding, newBinding);
glsl.unset_decoration(texture.id, spv::DecorationDescriptorSet);
string glslSource = glsl.compile();

// GLSL
layout(binding = 4) uniform sampler2D uTexture;
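
One possible remapping scheme, shown here purely as an assumption and not as something SPIRV-Cross prescribes, is to reserve a fixed number of GL binding slots per descriptor set. The helper `flatten_binding` and the slot count are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Maps a Vulkan (set, binding) pair to a flat GL binding index by reserving
// `slots_per_set` consecutive GL bindings for each descriptor set.
inline std::uint32_t flatten_binding(std::uint32_t set, std::uint32_t binding,
                                     std::uint32_t slots_per_set) {
    assert(binding < slots_per_set);  // otherwise two sets would collide
    return set * slots_per_set + binding;
}
```

With three slots per set, `set = 1, binding = 1` maps to GL binding 4, which matches the value used in the example above. A real application may prefer separate counters per resource type, since OpenGL has a binding space per type.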


gl_InstanceIndex in OpenGL
• Vulkan adds the base instance to the instance ID
- GL does not
- Workaround is to have GL backend pass in the base index as a uniform

// Vulkan GLSL
layout(set = 0, binding = 0) uniform UBO {
mat4 MVPs[MAX_INSTANCES];
};

gl_Position = MVPs[gl_InstanceIndex] * Position;

// GLSL through SPIRV-Cross

layout(binding = 0) uniform UBO {
mat4 MVPs[MAX_INSTANCES];
};
uniform int SPIRV_Cross_BaseInstance; // Supplied by application

gl_Position = MVPs[(gl_InstanceID + SPIRV_Cross_BaseInstance)] * Position;


Debugging Shaders in C++
• If you have thought …
- “I wish I could assert() in a compute shader”
- “I wish I could instrument a shader with logging”
- “I wish I could use clang address sanitizer to debug out-of-bounds access”
- “I want to reproduce a shader bug outside the driver”
- “I want to run regression tests when optimizing a shader”
- “I want to step through a compute thread in <insert C++ debugger here>”
• … the C++ backend in SPIRV-Cross could be interesting
• Still a very experimental feature
• Hope to expand this further in the future


Basic Idea
• With GLM, C++ can be near GLSL compatible
• Reuse the GLSL backend to emit code which also works in C++
- Minor differences like references vs. in/out, etc
• Add some scaffolding to redirect shader resources
- Easily done with macros, the actual C++ output is kept clean
• The C++ output implements a simple C-compatible interface
• Add instrumentation to the C++ file as desired
• Compile C++ file to a dynamic library with debug symbols
• Instantiate from test program, bind buffers and invoke
- And have fun running shadertoy raymarchers at seconds per frame


On the Command Line

# Compile to SPIR-V
glslc -o test.spv test.comp

# Create C++ interface

spirv-cross --output test.cpp test.spv --cpp

# Add some instrumentation to the shader if you want

$EDITOR test.cpp

# Build library
g++ -o test.so -shared test.cpp -O0 -g -Iinclude/spirv_cross

# Run your test app

./<my app> --shader test.so


Another tool supporting Vulkan:
Mali Graphics Debugger is an advanced API tracer tool for Vulkan, OpenGL ES, EGL and OpenCL. It allows developers to trace their graphics and compute applications to debug issues and analyze the performance.

• Vulkan Support
- Trace all the function calls in the spec.
- Allows you to see exactly what calls compose your application.
- Contact the Mali forums and we would love to get you set up.
https://community.arm.com/groups/arm-mali-graphics


Investigation with the Mali Graphics Debugger
[Screenshot: Mali Graphics Debugger panes, labelled Frame Outline, Frame Statistics, Frame Capture, API Trace, Assets View, States, Uniforms, Vertex Attributes, Buffers, Framebuffers, Textures, Shaders, and Dynamic Help]


References
• SPIRV-Cross
- https://github.com/KhronosGroup/SPIRV-Cross
• Glslang
- https://github.com/KhronosGroup/glslang
• Shaderc
- https://github.com/google/shaderc
• SPIRV-Tools
- https://github.com/KhronosGroup/SPIRV-Tools
• Mali Graphics Debugger
- http://malideveloper.arm.com/resources/tools/mali-graphics-debugger/


Feeding Your Shaders

Jesse Barker
Principal Software Engineer

Moving to Vulkan: How to make your 3D graphics more explicit

May 26, 2016

© ARM 2016
What is a Vulkan Resource?
• Shader Input/Output
• Referenced via Descriptors
• Some are specialized in the hardware

• Buffers
• Images
• Samplers
• Input Attachments
• Vertex Input Attributes
• Render Targets

What are Vulkan Descriptors?

[Diagram: a descriptor pairs a handle (myImageView) with a type (SAMPLED_IMAGE); the image view refers to an image, which is backed by device memory]

What are Descriptor Sets?
// uniform blocks:
layout(set = 0, binding = 0) uniform Type0 { ... } ubo0;
// textures:
layout(set = 0, binding = 1) uniform sampler2D tex0;
// SSBO:
layout(set = 0, binding = 2) buffer Type2 { ... } ssbo0;
void main() {
    // ...
}

binding   type             stages
0         Uniform Buffer   Graphics
1         Image/Sampler    Graphics
2         Storage Buffer   Graphics

What is a Descriptor Pool?
• Parent object of a Descriptor Set
• Allows Descriptor Set management to be threaded
• Manages memory for hardware descriptors

typedef struct VkDescriptorPoolSize {
    VkDescriptorType type;
    uint32_t descriptorCount;
} VkDescriptorPoolSize;

typedef struct VkDescriptorPoolCreateInfo {
    VkStructureType sType;
    const void* pNext;
    VkDescriptorPoolCreateFlags flags;
    uint32_t maxSets;
    uint32_t poolSizeCount;
    const VkDescriptorPoolSize* pPoolSizes;
} VkDescriptorPoolCreateInfo;
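
Sizing a pool amounts to counting descriptors per type across every set that will be allocated from it. The sketch below shows that bookkeeping without Vulkan headers; `LayoutBinding`, `SetLayout`, `PoolRequirements`, and `size_pool` are hypothetical names, and descriptor types are plain strings here rather than VkDescriptorType values.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical description of one binding in a descriptor set layout.
struct LayoutBinding {
    std::string type;     // e.g. "UNIFORM_BUFFER"
    std::uint32_t count;  // number of descriptors at this binding
};

using SetLayout = std::vector<LayoutBinding>;

// Totals that would feed maxSets and the pPoolSizes array.
struct PoolRequirements {
    std::uint32_t max_sets = 0;
    std::map<std::string, std::uint32_t> per_type;
};

// Sums descriptor counts per type when `copies` sets of each layout are needed
// (for example, one copy per frame in flight).
PoolRequirements size_pool(const std::vector<SetLayout>& layouts, std::uint32_t copies) {
    PoolRequirements requirements;
    for (const SetLayout& layout : layouts) {
        requirements.max_sets += copies;
        for (const LayoutBinding& binding : layout) {
            requirements.per_type[binding.type] += binding.count * copies;
        }
    }
    return requirements;
}
```

Each entry of `per_type` would become one VkDescriptorPoolSize, and `max_sets` would become `maxSets`.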

Allocating Descriptor Sets
• Define desired layouts of descriptors
• Ask the Descriptor Pool to allocate a Descriptor Set per layout

What is a Pipeline Layout?
// uniform blocks:
layout(set = 0, binding = 0) uniform Type0 { ... } ubo0;
layout(set = 0, binding = 0) uniform Type1 { ... } ubo1;
// textures:
layout(set = 0, binding = 1) uniform sampler2D tex0;
layout(set = 1, binding = 0) uniform sampler2D tex1;
// SSBO:
layout(set = 1, binding = 1) buffer Type2 { ... } ssbo0;
void main() {
    // ...
}

Descriptor Set 0
binding   type             stages
0         Uniform Buffer   Graphics
0         Uniform Buffer   Graphics
1         Image/Sampler    Graphics

Descriptor Set 1
binding   type             stages
0         Image/Sampler    Graphics
1         Storage Buffer   Graphics

How do Descriptors get into Descriptor Sets?
VKAPI_ATTR void VKAPI_CALL vkUpdateDescriptorSets(
    VkDevice device,
    uint32_t descriptorWriteCount,
    const VkWriteDescriptorSet* pDescriptorWrites,
    uint32_t descriptorCopyCount,
    const VkCopyDescriptorSet* pDescriptorCopies);

typedef struct VkWriteDescriptorSet {
    VkStructureType sType;
    const void* pNext;
    VkDescriptorSet dstSet;
    uint32_t dstBinding;
    uint32_t dstArrayElement;
    uint32_t descriptorCount;
    VkDescriptorType descriptorType;
    const VkDescriptorImageInfo* pImageInfo;
    const VkDescriptorBufferInfo* pBufferInfo;
    const VkBufferView* pTexelBufferView;
} VkWriteDescriptorSet;

typedef struct VkCopyDescriptorSet {
    VkStructureType sType;
    const void* pNext;
    VkDescriptorSet srcSet;
    uint32_t srcBinding;
    uint32_t srcArrayElement;
    VkDescriptorSet dstSet;
    uint32_t dstBinding;
    uint32_t dstArrayElement;
    uint32_t descriptorCount;
} VkCopyDescriptorSet;

Finally, I’m ready to use my Descriptor Sets
VKAPI_ATTR void VKAPI_CALL vkCmdBindDescriptorSets(
    VkCommandBuffer commandBuffer,
    VkPipelineBindPoint pipelineBindPoint,
    VkPipelineLayout layout,
    uint32_t firstSet,
    uint32_t descriptorSetCount,
    const VkDescriptorSet* pDescriptorSets,
    uint32_t dynamicOffsetCount,
    const uint32_t* pDynamicOffsets);

• Bound sets must match pipeline layout
• Graphics or compute?
• Simple layout is best

What about Vertex Input?

Vertex Input Description
If your shader declares:

in vec3 position;
in vec2 texcoord;

Your C code declares:

struct Position
{
    float x, y, z;
};

struct Texcoord
{
    uint8_t u, v;
};

const VkVertexInputBindingDescription binding[] =
{
    {
        0,                            // binding
        sizeof(float) * 3,            // stride
        VK_VERTEX_INPUT_RATE_VERTEX   // inputRate
    },
    {
        1,                            // binding
        sizeof(uint8_t) * 2,          // stride
        VK_VERTEX_INPUT_RATE_VERTEX   // inputRate
    },
};
const VkVertexInputAttributeDescription attributes[] =
{
    {
        0,                            // location
        binding[0].binding,           // binding
        VK_FORMAT_R32G32B32_SFLOAT,   // format
        0                             // offset
    },
    {
        1,                            // location
        binding[1].binding,           // binding
        VK_FORMAT_R8G8_UNORM,         // format (UNORM data is read as floats in the shader)
        0                             // offset
    }
};
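
The strides and offsets in these descriptions can be derived from the C structs instead of being typed by hand. The sketch below checks the two-buffer layout used above and also shows a hypothetical interleaved alternative; `InterleavedVertex`, `AttributeLayout`, and the helper functions are illustrative names, not Vulkan types.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

struct Position { float x, y, z; };
struct Texcoord { std::uint8_t u, v; };

// Separate buffers, as above: each binding's stride is the size of its struct.
static_assert(sizeof(Position) == sizeof(float) * 3, "stride of binding 0");
static_assert(sizeof(Texcoord) == sizeof(std::uint8_t) * 2, "stride of binding 1");

// Hypothetical alternative: both attributes interleaved in a single buffer.
struct InterleavedVertex {
    Position position;
    Texcoord texcoord;
};

// Hypothetical subset of the fields of VkVertexInputAttributeDescription.
struct AttributeLayout {
    std::uint32_t location;
    std::uint32_t binding;
    std::uint32_t offset;
};

// Stride of the interleaved buffer; includes any padding the compiler adds
// (typically 16 bytes for this struct).
inline std::uint32_t interleaved_stride() {
    return static_cast<std::uint32_t>(sizeof(InterleavedVertex));
}

inline AttributeLayout position_attribute() {
    return {0, 0, static_cast<std::uint32_t>(offsetof(InterleavedVertex, position))};
}

inline AttributeLayout texcoord_attribute() {
    return {1, 0, static_cast<std::uint32_t>(offsetof(InterleavedVertex, texcoord))};
}
```

Using `sizeof` and `offsetof` keeps the descriptions correct if a member is added or reordered later.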

Questions?

The trademarks featured in this presentation are registered and/or unregistered trademarks of ARM
Limited (or its subsidiaries) in the EU and/or elsewhere. All rights reserved. All other marks featured
may be trademarks of their respective owners.
Copyright © 2016 ARM Limited

Andrew Garrard
Samsung R&D Institute UK

UK Khronos Chapter meet, May 2016

Vulkan: Making use of the GPU more efficient
•Vulkan aims to reduce the overheads of keeping the GPU busy

UK Khronos Chapter meet, May 2016 Vulkan subpasses — Page 96

[Diagram: command buffer recording spread across CPU cores: cores 2 and 3 each record three command buffers (CmdBuf) while core 4 performs three submits]

[Diagram: four Invoke commands, each invoking a secondary (2ry) command buffer]

[Diagram: three "Record command buffer" steps, shown alongside three, two, and one command buffers (CmdBuf)]

[Diagram: View 1, View 2]

[Diagram: a secondary (2ry) thread records a command buffer (slow-ish); the submitting thread submits the command buffer (fast)]

Vulkan: Making use of the GPU more efficient
•Vulkan aims to reduce the overheads of keeping the GPU busy
- Efficient generation of work on multiple CPU cores
- Reuse of command buffers to avoid CPU build time
- Potentially more efficient memory management
- Avoiding unpredictable shader compilation
•Mostly, the message has been that if you’re entirely limited by shader performance or bandwidth, Vulkan can’t help you (there is no magic wand)
Vulkan: Making use of the GPU more efficient
•Actually, that’s not entirely true...
•APIs like OpenGL were designed when the GPU looked very different (or was partly software)
•The way to design an efficient mobile GPU is not a perfect match for OpenGL
-Think a CPU’s command decode unit/microcode
•But the translation isn’t always perfectly efficient
TiledtoGPUs
Click edit Master title style
•Most (not all) mobile GPUs use tiling
- It’s all about the bandwidth (size and power limits)
[Diagram: scene description, binning pass, shading pass]
•On-chip tile memory is much faster than the main frame buffer
Not everything reaches memory
•Rendering requires lots of per-pixel data
- Z, stencil
- Full multisample resolution
•We usually only care about the final image
- We can throw away Z and stencil
- We only need a downsampled (A)RGB
- Don’t need to load anything from a previous frame
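To put rough numbers on this, the arithmetic below compares the bytes written if every per-sample value reached memory against the bytes in the final resolved image. The resolution, sample count and 4-byte formats used in the note afterwards are illustrative assumptions rather than figures from the talk, and both helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes written if colour and depth/stencil are stored for every sample. */
uint64_t full_sample_bytes(uint32_t width, uint32_t height, uint32_t samples,
                           uint32_t colorBytes, uint32_t depthStencilBytes)
{
    assert(samples > 0);
    return (uint64_t)width * height * samples * (colorBytes + depthStencilBytes);
}

/* Bytes in the resolved (downsampled) colour image alone. */
uint64_t resolved_bytes(uint32_t width, uint32_t height, uint32_t colorBytes)
{
    return (uint64_t)width * height * colorBytes;
}
```

Under these assumptions, a 1920x1080 target with 4 samples and 4 bytes each for colour and depth/stencil comes to 66,355,200 bytes per frame, against 8,294,400 bytes for the resolved image: an eightfold difference that a tiler can keep on chip.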
Sometimes we want the results of rendering
•Output from one rendering job can be used by
the next
•Z buffer for shadow maps
•Rendering for environment maps
•HDR bloom

•These can have low resolution and may not take much bandwidth
Sometimes you do need framebuffer resolution
•Deferred shading
[Diagram: a lightweight render stores per-surface content at each fragment (Z, diffuse/α, specular/specularity, normal); a full-screen quad is then rendered to perform fragment shading]

Sometimes you do need framebuffer resolution
•Deferred shading
•Deferred lighting
[Diagram: a lightweight render produces the lighting input (Z, specularity, normal); a full-screen quad is rendered to calculate the lighting output (diffuse, specular); the scene is then re-rendered with full fragment shading, using the lighting inputs]

Sometimes you do need framebuffer resolution
•Deferred shading
•Deferred lighting
•Order-independent transparency
•HDR tone mapping

Rendering outputs separately
•Rendering to each surface separately is bad

•Geometry has a per-bin cost

- Sometimes the cost is low, but it’s there
- Vertices in multiple bins get processed repeatedly
- Rendering the scene repeatedly is painful
•Even immediate-mode renderers hate this!
Multiple render targets don’t help much
•Using MRTs means multiple buffers in one pass
[Diagram: single scene traversal writing to multiple buffers]
- This is a typical approach for immediate-mode renderers (e.g. desktop/console systems)

•Reduces the geometry load (only process once)

•Still writing a lot of data off-chip
- Tilers are all about trying not to do this!
- Increased use of shader resources may slow some hardware
Pixel Local Storage (OpenGL ES extension)
•Tiler-friendly (at last)
- Store only the current tile values
- Read them later in the tile processing
•But not portable!
- Not practical on immediate renderers
- Debugging on desktop won’t work!
- Capabilities vary between devices
- Driver doesn’t have visibility
- Data access is restricted
Vulkan: Explicit dependencies
•Vulkan has direct support for this type of rendering workload
•By telling the driver how you intend to use the
rendered results, the driver can produce a
better mapping to the hardware
- The extra information is a little verbose, but simpler
than handling all possible cases yourself!

Vulkan render passes and subpasses
•A render pass groups dependent operations
- All images written in a render pass are the same size

[Diagram: geometry, lighting and fragment work inside a single render pass]
[Diagram: the same work as subpass 1 (geometry), subpass 2 (lighting) and subpass 3 (fragment)]
•A render pass contains a number of subpasses
- Subpasses describe access to attachments
- Dependencies can be defined between subpasses
•Each render pass instance has to be contained
within a single command buffer (unit of work)
- Some tilers schedule by render pass
Defining a render pass
•VkRenderPassCreateInfo
- VkAttachmentDescription *pAttachments
- Just the descriptions, not the actual attachments!
- VkSubpassDescription *pSubpasses
- VkSubpassDependency *pDependencies
•vkCreateRenderPass(device, createInfo,.. pass)
- Gives you a VkRenderPass object
- This is a template that you can use repeatedly
- When we use it, we get a render pass instance
Describing attachments for a render pass
•VkAttachmentDescription
- format/samples
- loadOp
- VK_ATTACHMENT_LOAD_OP_LOAD to preserve
- VK_ATTACHMENT_LOAD_OP_DONT_CARE for overwrites
- VK_ATTACHMENT_LOAD_OP_CLEAR for uniform clears (e.g. Z)
- storeOp
- VK_ATTACHMENT_STORE_OP_STORE to output it
- VK_ATTACHMENT_STORE_OP_DONT_CARE may discard after
the render pass
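The choice of load and store ops above can be sketched as a small decision helper. This is an illustration only: the two enums are local stand-ins that mirror the names and values in `<vulkan/vulkan.h>` so the block compiles without the SDK, while `AttachmentOps` and `choose_attachment_ops` are hypothetical names that are not part of Vulkan.

```c
#include <assert.h>
#include <stdbool.h>

/* Local stand-ins mirroring <vulkan/vulkan.h>; real code includes that header. */
typedef enum VkAttachmentLoadOp {
    VK_ATTACHMENT_LOAD_OP_LOAD = 0,
    VK_ATTACHMENT_LOAD_OP_CLEAR = 1,
    VK_ATTACHMENT_LOAD_OP_DONT_CARE = 2
} VkAttachmentLoadOp;

typedef enum VkAttachmentStoreOp {
    VK_ATTACHMENT_STORE_OP_STORE = 0,
    VK_ATTACHMENT_STORE_OP_DONT_CARE = 1
} VkAttachmentStoreOp;

/* Hypothetical helper type: the two ops chosen for one attachment. */
typedef struct AttachmentOps {
    VkAttachmentLoadOp loadOp;
    VkAttachmentStoreOp storeOp;
} AttachmentOps;

/* Hypothetical helper: pick ops from how the attachment is used. */
AttachmentOps choose_attachment_ops(bool keepPrevious, bool clearFirst,
                                    bool readAfterPass)
{
    AttachmentOps ops;

    /* Preserving the old contents and clearing them are mutually exclusive. */
    assert(!(keepPrevious && clearFirst));

    if (keepPrevious)
        ops.loadOp = VK_ATTACHMENT_LOAD_OP_LOAD;
    else if (clearFirst)
        ops.loadOp = VK_ATTACHMENT_LOAD_OP_CLEAR;
    else
        ops.loadOp = VK_ATTACHMENT_LOAD_OP_DONT_CARE;

    /* Only store what something reads after the render pass ends. */
    ops.storeOp = readAfterPass ? VK_ATTACHMENT_STORE_OP_STORE
                                : VK_ATTACHMENT_STORE_OP_DONT_CARE;
    return ops;
}
```

A depth buffer used only inside the pass would typically come out as CLEAR plus DONT_CARE, which is the case that lets a tiler avoid touching memory for it at all.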
Defining a subpass
•VkSubpassDescription
- pInputAttachments
- Which of the render pass’s attachments this subpass reads
- pColorAttachments
- Which ones this subpass writes (1:1 - optional)
- pResolveAttachments
- Which ones this subpass writes (resolving multisampling)
- pPreserveAttachments
- Which attachments need to persist across this subpass
- Subpasses are numbered and ordered
Defining subpass dependencies
•VkSubpassDependency
- srcSubpass
- dstSubpass
- Where the dependency applies (can be external)
- srcStageMask
- dstStageMask
- Execution dependencies between subpasses
- srcAccessMask
- dstAccessMask
- Memory dependencies between subpasses
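As a sketch of how these fields might be filled in, the block below describes a dependency from a G-buffer subpass to a lighting subpass that reads the G-buffer as input attachments. The typedefs, constants and struct are local stand-ins that mirror `<vulkan/vulkan.h>` so the example builds without the SDK; the function name and the choice of stages are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins mirroring <vulkan/vulkan.h>; real code includes that header. */
typedef uint32_t VkFlags;
typedef VkFlags VkPipelineStageFlags;
typedef VkFlags VkAccessFlags;
typedef VkFlags VkDependencyFlags;

#define VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT         0x00000080u
#define VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT 0x00000400u
#define VK_ACCESS_INPUT_ATTACHMENT_READ_BIT           0x00000010u
#define VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT          0x00000100u
#define VK_DEPENDENCY_BY_REGION_BIT                   0x00000001u

typedef struct VkSubpassDependency {
    uint32_t             srcSubpass;
    uint32_t             dstSubpass;
    VkPipelineStageFlags srcStageMask;
    VkPipelineStageFlags dstStageMask;
    VkAccessFlags        srcAccessMask;
    VkAccessFlags        dstAccessMask;
    VkDependencyFlags    dependencyFlags;
} VkSubpassDependency;

/* Hypothetical helper: lighting fragments wait for the G-buffer colour writes. */
VkSubpassDependency gbuffer_to_lighting_dependency(uint32_t gbufferSubpass,
                                                   uint32_t lightingSubpass)
{
    VkSubpassDependency dep = {0};

    /* Subpasses are numbered and ordered; the producer must come first. */
    assert(gbufferSubpass < lightingSubpass);

    dep.srcSubpass    = gbufferSubpass;
    dep.dstSubpass    = lightingSubpass;
    /* Execution dependency: colour output finishes before fragment shading. */
    dep.srcStageMask  = VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT;
    dep.dstStageMask  = VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT;
    /* Memory dependency: attachment writes become visible to input reads. */
    dep.srcAccessMask = VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT;
    dep.dstAccessMask = VK_ACCESS_INPUT_ATTACHMENT_READ_BIT;
    /* Each fragment only needs the data written at its own location. */
    dep.dependencyFlags = VK_DEPENDENCY_BY_REGION_BIT;
    return dep;
}
```

The by-region flag is what tells a tiler that the dependency is pixel local, so the data can plausibly stay in tile memory between the two subpasses.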
Vulkan framebuffers
•A VkFramebuffer defines the set of
attachments used by a render pass instance
•VkFramebufferCreateInfo
- renderPass
- pAttachments
- These are actual VkImageViews this time!
- width
- height
- layers
Starting to use a render pass
•vkCmdBeginRenderPass/vkCmdEndRenderPass
- Starts a render pass instance in a command buffer
- You start in the first (maybe only) subpass implicitly
- pRenderPassBegin contains configuration
•VkRenderPassBeginInfo
- VkRenderPass renderPass
- The render pass “template”
- VkFramebuffer framebuffer
- Specifies targets for rendering
Putting it all together…
[Diagram: how the objects fit together]
- VkAttachmentDescription, VkSubpassDescription and VkSubpassDependency arrays are referenced by VkRenderPassCreateInfo
- VkRenderPassCreateInfo is passed to vkCreateRenderPass, which constructs a VkRenderPass
- VkImageViews and the VkRenderPass are referenced by VkFramebufferCreateInfo
- VkFramebufferCreateInfo is passed to vkCreateFramebuffer, which constructs a VkFramebuffer
- The VkRenderPass and VkFramebuffer are referenced by VkRenderPassBeginInfo
- VkRenderPassBeginInfo and a VkCommandBuffer are passed to vkCmdBeginRenderPass
Key: objects are dark grey; functions are light grey; arrows between objects are references of some sort; arrows into functions are arguments; arrows out of functions are constructed objects

Simple rendering
•vkAllocateCommandBuffers (VK_COMMAND_BUFFER_LEVEL_PRIMARY)
•vkBeginCommandBuffer
•vkCmdBeginRenderPass
•vkCmdDraw (etc.)
•vkCmdEndRenderPass
•vkEndCommandBuffer
•vkQueueSubmit
[Diagram: a queue holding a command buffer that contains a render pass with four draws]
Multiple render passes
•You can have more than one render pass in a command buffer
- Yes, Leeloo multipass, we know…
- So a command buffer can render to many outputs
- E.g. you could render to the same shadow and environment maps every frame by reusing the same command buffer
- But it must be the same outputs each time you submit
- A specific render pass instance has fixed VkFramebuffers!
[Diagram: a command buffer containing two render passes, each with draws]
Two limitations…

•Different render passes ⇒ independent outputs

- Rendering goes off-chip, there’s no PLS-style on-chip
reuse of pixel contents
•You can’t reuse the same command buffer with
a different render target
- E.g. for double buffering or streamed content
- We’ll come back to this…
•Still sometimes all you need, though!
More than one subpass
•vkCmdNextSubpass moves to the next subpass
- Implicitly start in the first subpass of the render pass
- Dependencies say what you’re accessing from previous subpasses
- Same render pass so accesses stay on chip (if possible)
[Diagram: a command buffer containing one render pass; a new subpass starts partway through its draws]

Using multiple subpasses
•vkBeginCommandBuffer
•vkCmdBeginRenderPass
•vkCmdDraw (etc.)
•vkCmdNextSubpass
•vkCmdDraw (etc.)
•vkCmdEndRenderPass
•vkEndCommandBuffer
[Diagram: a command buffer containing one render pass; a new subpass starts partway through its draws]
Accessing subpass output in fragment shaders
•In SPIR-V, previous subpass content is read with OpImageRead
- Coordinates are sample-relative, and need to be 0
- OpTypeImage Dim = SubpassData
•In GLSL (using GL_KHR_vulkan_glsl):
- Types for subpass access are [ui]subpassInput(MS)
- layout(input_attachment_index = i, …) uniform subpassInput t; to select a subpass
- subpassLoad() to access the pixel
- C.f. __pixel_localEXT layouts in EXT_shader_pixel_local_storage when using OpenGL ES

Avoiding unnecessary allocations
•If we’re using subpasses, we likely don’t need
the images in memory
- A tiler may be able to process the subpasses entirely
on-chip, without needing an allocation
- Still need to “do the allocation” in case the tiler can’t
handle the request/on an immediate-mode renderer!
- Won’t commit resources unless it actually needs to
•vkCreateImage flags for “lazy committal”
- VK_IMAGE_USAGE_TRANSIENT_ATTACHMENT_BIT
Vulkan subpasses: advantages
•The driver knows what you’re doing
- It can reorder subpasses
- It can change the tile size
- It can balance resources between subpasses
- It will fall back to memory for you if it has to
- Under the hood, mechanism likely matches PLS
- Note: EXT_shader_pixel_local_storage is actually more explicit than Vulkan here (and may still be offered as an extension)
•Works on immediate mode renderers
- Probably MRTs and normal external writes
- Desktop debugging tools will work!
There’s more: Secondary command buffers
•Vulkan has two levels of command buffers
- Determined by vkAllocateCommandBuffers
•VK_COMMAND_BUFFER_LEVEL_PRIMARY
- Main command buffer, as we’ve seen so far
•VK_COMMAND_BUFFER_LEVEL_SECONDARY
- Command buffer that can be invoked from the
primary command buffer

Use of secondary command buffers
•vkBeginCommandBuffer
- Takes a VkCommandBufferBeginInfo
•VkCommandBufferBeginInfo
- flags include:
- VK_COMMAND_BUFFER_USAGE_RENDER_PASS_CONTINUE_BIT
- pInheritanceInfo
•VkCommandBufferInheritanceInfo
- renderPass and subpass
- framebuffer (can be null, more efficient if known)
Secondary command buffers and passes
•Why do we need the “continue bit”?
- Render passes (and subpasses) can’t start in a
secondary command buffer
- Non-render pass stuff can be in a secondary buffer
- You can run a compute shader outside a render pass
- Otherwise, the render pass is inherited from the
primary command buffer

Secondary command buffers and passes
•Why specify render pass/framebuffer?
- Command buffers need to know this when recording
- Some operations depend on render pass info (e.g. format)
- Framebuffer is optional (can just inherit)
- If you can specify the actual framebuffer, the command
buffer can be less generic and therefore may be faster

Invoking the secondary command buffer
•You can’t submit a secondary command buffer
•You have to invoke it from a primary command
buffer with vkCmdExecuteCommands
[Diagram: a primary command buffer containing render passes (one with a new subpass); each vkCmdExecuteCommands (vkCEC) call invokes a secondary buffer of draws]

Secondary command buffer code
•vkBeginCommandBuffer
•vkCmdBeginRenderPass
•vkCmdExecuteCommands
•vkCmdNextSubpass
•vkCmdExecuteCommands
•vkCmdEndRenderPass
•vkEndCommandBuffer
[Diagram: a primary command buffer with one render pass and a new subpass; each vkCmdExecuteCommands (vkCEC) call invokes a secondary buffer of draws]

Performance and parallelism
•Creating a command buffer can be slow
- Lots of state to check, may require compilation
- This happens in GLES as well, you just don’t control when!
•So create secondary command buffers on
different threads
- Lots of 4- and 8-core CPUs in cell phones these days
•Invoking the secondary buffer is lightweight
- Primary command buffer generation is quick(er)
What does this have to do with passes?
•Remember:
- Render passes exist within (primary) command buffers
- The command buffer sets up the GPU for the render pass
- On-chip rendering happens within a render pass
- If you want content to persist between render passes, it’ll
reach memory (or at least cache), not stay in the tile buffer
- You can’t use multiple threads to build work for a
primary command buffer in parallel
- You can build many secondary command buffers at once

[Diagram: a command buffer containing a render pass with four draws]
[Diagram: draws and vkCmdExecuteCommands (vkCEC) calls mixed in one sequence]
You can’t mix and match
•Within a subpass you can either (but not both):
- Execute rendering commands directly in the primary
command buffer
- VK_SUBPASS_CONTENTS_INLINE
- Invoke secondary command buffers from the primary
command buffer with vkCmdExecuteCommands
- VK_SUBPASS_CONTENTS_SECONDARY_COMMAND_BUFFERS
- Chosen by vkCmdBeginRenderPass/vkCmdNextSubpass
- Remember: you can only do these in a primary command
buffer!
Command buffer reuse: even faster
•Primary command buffers work with a fixed
render pass and framebuffer
- You can reuse a primary command buffer, but it will
always access the same images – often good enough
- May have to wait for execution to end; can’t be “one-time”
•What if you want to access different targets?
- E.g. a cycle of framebuffers or streamed content?
- You can round-robin several command buffers
- Or you can use secondary command buffers!
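The round-robin option can be sketched as plain index selection: record one primary command buffer per target up front, then pick the one that matches the current target at submit time. The function name is hypothetical and the index stands in for a real VkCommandBuffer handle.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: which pre-recorded primary command buffer to submit
 * when cycling through targetCount framebuffers in order. */
uint32_t command_buffer_for_frame(uint64_t frameNumber, uint32_t targetCount)
{
    assert(targetCount > 0);
    return (uint32_t)(frameNumber % targetCount);
}
```

With a swap chain, the index would more likely come from the image index returned by vkAcquireNextImageKHR than from a frame counter, since images are not guaranteed to be handed back in order.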
Compatible render passes and frame buffers
•The render pass a secondary command buffer
uses needn’t be the one it was recorded with
- It can be “compatible”
- Same formats, number of sub-passes, etc.
•You can have primary command buffers with
different outputs, and they can re-use
secondary command buffers
- The primary has to be different to record new targets
- The primary may have to patch secondary addresses
Almost-free use with changing framebuffers
•No cost for secondary command buffers
•Primary command buffer is simple and quick
[Diagram: two primary command buffers, one per target image, each containing a render pass whose vkCmdExecuteCommands (CEC) calls invoke the same two secondary command buffers]

So I can do bloom/DoF/rain/motion blur…!
•No! Remember, you can only access the
current pixel
•Tilers process one tile at a time
- If you tried to access a different pixel, the tile containing it might not be there
- You have to write out the whole image to do this
- Slow, painful, last resort!
- Yes, we can think of possible solutions too
- Give it time (lots of different hardware out there)
Coming out of the shadow(buffer)s
•Render passes are integral to the Vulkan API
- Reflects modern, high-quality rendering approaches
•The driver has more information to work with
- It can do more for you
- Remember this if you complain it’s verbose!
•Hardware resource management is hard
- Expect drivers to get better over time
•Another tool for better mobile gaming
Thank you
•Over to you…

Andrew Garrard
a.garrard at samsung.com


Keeping your GPU fed
without getting bitten
Tobias Hector
May 2016
© Copyright Khronos Group 2016 - Page 150
Introduction
• You have delicious draw calls
- Yummy!
• Your GPU wants to eat them
- It’s really hungry
• Keep it fed at all times
- So it keeps making pixels
• Don’t want it biting your hand
- Look at those teeth!

Keeping it fed
• GPU needs a constant supply of food
- It doesn’t want to wait

• Certain foods are tough to digest

- Provide multiple operations to hide stalls

• Draw calls provide a variety of nutrition

- Vertex work, raster work, tessellation, vitamins A-K, etc.

[Diagram: system timeline with CPU and GPU rows, each working through items 0, 1, then 2]
[Diagram: GPU timeline with vertex and fragment rows, each working through items 0, 1, then 2]

Not getting bitten
• GPU eating from lots of different plates
- Don’t touch anything it’s using!

• It doesn’t want a mouthful of beef choc chip ice cream

- Don’t change data whilst it’s accessing a resource

• Hey I’m eating that!

- Don’t delete resources whilst the GPU is still using them

On to the serious bits…


Terminology
• Operation
- Anything that can be executed
- Includes synchronization and memory barriers
• Execution Dependency
- Operations waiting on other operations
- All synchronization expresses these
• Memory Barrier
- Flush/invalidate caches
- Determination of access and visibility
• Memory Dependency
- Execution dependency involving a Memory Barrier
Note: Memory barrier does not mean quite the same thing as GL’s memory barrier, though there is some relation.

Synchronization Types
• 3 types of explicit synchronization in Vulkan
• Pipeline Barriers, Events and Subpass Dependencies
- Within a queue
- Explicit memory dependencies
• Semaphores
- Between Queues
• Fences
- Whole queue operations to CPU
Note: OpenGL has just two, very coarse synchronization primitives: memory barriers and fences. They are loosely similar to the equivalently named concepts in Vulkan.

Pipeline Barriers
• Pipeline Barriers
- Precise set of pipeline stages
- Memory Barriers to execute
- Single point in time

void vkCmdPipelineBarrier(
    VkCommandBuffer commandBuffer,
    VkPipelineStageFlags srcStageMask,
    VkPipelineStageFlags dstStageMask,
    VkDependencyFlags dependencyFlags,
    uint32_t memoryBarrierCount,
    const VkMemoryBarrier* pMemoryBarriers,
    uint32_t bufferMemoryBarrierCount,
    const VkBufferMemoryBarrier* pBufferMemoryBarriers,
    uint32_t imageMemoryBarrierCount,
    const VkImageMemoryBarrier* pImageMemoryBarriers);

Note: Executing a pipeline barrier is roughly equivalent to a glMemoryBarrier call, though with much more control.

Events
• Events
- Same info as Pipeline Barriers
- …but operate over a range

void vkCmdSetEvent(
    VkCommandBuffer commandBuffer,
    VkEvent event,
    VkPipelineStageFlags stageMask);

void vkCmdResetEvent(
    VkCommandBuffer commandBuffer,
    VkEvent event,
    VkPipelineStageFlags stageMask);

void vkCmdWaitEvents(
    VkCommandBuffer commandBuffer,
    uint32_t eventCount,
    const VkEvent* pEvents,
    VkPipelineStageFlags srcStageMask,
    VkPipelineStageFlags dstStageMask,
    uint32_t memoryBarrierCount,
    const VkMemoryBarrier* pMemoryBarriers,
    uint32_t bufferMemoryBarrierCount,
    const VkBufferMemoryBarrier* pBufferMemoryBarriers,
    uint32_t imageMemoryBarrierCount,
    const VkImageMemoryBarrier* pImageMemoryBarriers);

Events
• Events
- Same info as Pipeline Barriers
- …but operate over a range
• CPU interaction
- No explicit CPU wait
- No Memory Barriers
• Warning!
- OS may apply a timeout
- Set events soon after submission
- Could you just defer submission?

VkResult vkSetEvent(
    VkDevice device,
    VkEvent event);

VkResult vkResetEvent(
    VkDevice device,
    VkEvent event);

VkResult vkGetEventStatus(
    VkDevice device,
    VkEvent event);

Pipeline Barriers vs Events
• Use pipeline barriers for point synchronization
- The operation being waited on immediately precedes the operation that depends on it
- May be more optimal than set/wait event pair
• Use events if other work is possible between two operations
- Set immediately after the operation being waited on
- Wait immediately before the operation that depends on it
• Use events for CPU/GPU synchronization
- Memory accesses between processors
- Late latching of data to reduce latency

Memory Barrier Types
• Global Memory Barrier
- All memory-backed resources
• Buffer Barrier
- For a single buffer range
• Image Barrier
- For a single image subresource range
Note: OpenGL’s memory barriers imply execution dependencies, which Vulkan memory barriers do not – execution barriers are provided by a pipeline barrier, event or subpass dependency.

Global Memory Barriers
• Global Memory Barriers
- All memory used by accessed stages
- Effectively flushes entire caches
• Use when many resources transition
- Cheaper than one-by-one
- Don’t transition unnecessarily!
• User must define prior access
- Driver not tracking for you

typedef struct VkMemoryBarrier {
    VkStructureType sType;
    const void* pNext;
    VkAccessFlags srcAccessMask;
    VkAccessFlags dstAccessMask;
} VkMemoryBarrier;

Buffer Barriers
• Buffer Barriers
- A single buffer range
- Defines access stages
- Defines queue ownership
• User must define prior access
- Driver not tracking for you

typedef struct VkBufferMemoryBarrier {
    VkStructureType sType;
    const void* pNext;
    VkAccessFlags srcAccessMask;
    VkAccessFlags dstAccessMask;
    uint32_t srcQueueFamilyIndex;
    uint32_t dstQueueFamilyIndex;
    VkBuffer buffer;
    VkDeviceSize offset;
    VkDeviceSize size;
} VkBufferMemoryBarrier;

Image Barriers
• Image Barriers
- A single image subresource range
- Defines access stages
- Defines queue ownership
- Defines image layout
• User must define prior access
- Driver not tracking for you
- For images, this includes prior layout
• Appropriate layouts allow compression
- GPU may use image compression
- Saves bandwidth
- Use GENERAL instead of switching frequently

typedef struct VkImageMemoryBarrier {
    VkStructureType sType;
    const void* pNext;
    VkAccessFlags srcAccessMask;
    VkAccessFlags dstAccessMask;
    VkImageLayout oldLayout;
    VkImageLayout newLayout;
    uint32_t srcQueueFamilyIndex;
    uint32_t dstQueueFamilyIndex;
    VkImage image;
    VkImageSubresourceRange subresourceRange;
} VkImageMemoryBarrier;
Subpass Dependencies
• Subpass dependencies
- Similar info to Pipeline Barriers
- Explicitly between two subpasses
• Memory barriers
- Implicit for attachments
- Explicit for other resources
• Pixel local dependencies
- Same fragment/sample location
- Cheap for most implementations
- Use region dependency flag:
- VK_DEPENDENCY_BY_REGION_BIT

typedef struct VkSubpassDependency {
    uint32_t srcSubpass;
    uint32_t dstSubpass;
    VkPipelineStageFlags srcStageMask;
    VkPipelineStageFlags dstStageMask;
    VkAccessFlags srcAccessMask;
    VkAccessFlags dstAccessMask;
    VkDependencyFlags dependencyFlags;
} VkSubpassDependency;

Subpass Dependencies
• Subpass self-dependencies
- Subpasses can wait on themselves
- A pipeline barrier (vkCmdPipelineBarrier) in the subpass
• Forward progress only
- Can’t wait on later stages
- Must wait on earlier or same stage
• Pixel local only between fragments
- Must use flag:
- VK_DEPENDENCY_BY_REGION_BIT
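The two self-dependency rules above can be sketched as simple checks on a source and destination stage. The `GraphicsStage` enum is a hypothetical, simplified list of stages in logical pipeline order; its values are not the VkPipelineStageFlagBits values, whose bit order does not follow logical order. The function names are likewise assumptions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical enum: a subset of graphics stages in logical pipeline order. */
typedef enum GraphicsStage {
    STAGE_VERTEX_INPUT,
    STAGE_VERTEX_SHADER,
    STAGE_EARLY_FRAGMENT_TESTS,
    STAGE_FRAGMENT_SHADER,
    STAGE_LATE_FRAGMENT_TESTS,
    STAGE_COLOR_ATTACHMENT_OUTPUT,
    STAGE_COUNT
} GraphicsStage;

/* Forward progress: the source stage must be earlier than, or the same as,
 * the destination stage. */
bool self_dependency_is_forward(GraphicsStage src, GraphicsStage dst)
{
    assert(src < STAGE_COUNT && dst < STAGE_COUNT);
    return src <= dst;
}

/* Stages that operate per fragment/sample location. */
bool is_fragment_stage(GraphicsStage stage)
{
    return stage >= STAGE_EARLY_FRAGMENT_TESTS && stage < STAGE_COUNT;
}

/* Between fragment stages the dependency is pixel local, so the
 * by-region flag is required. */
bool self_dependency_needs_by_region(GraphicsStage src, GraphicsStage dst)
{
    return is_fragment_stage(src) && is_fragment_stage(dst);
}
```

For example, a wait from the fragment shader to colour attachment output is forward and needs the by-region flag, while a wait from the fragment shader back to the vertex shader is rejected.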


Subpass Dependencies
• Subpass external dependencies
- Wait on ‘external’ operations
- vkCmdWaitEvents in the subpass
- Events set outside the render pass

Example – Texture Upload
// Transition the buffer from host write to transfer read
bufferBarrier.srcAccessMask = VK_ACCESS_HOST_WRITE_BIT;
bufferBarrier.dstAccessMask = VK_ACCESS_TRANSFER_READ_BIT;

// Transition the image to transfer destination
imageBarrier.srcAccessMask = 0;
imageBarrier.dstAccessMask = VK_ACCESS_TRANSFER_WRITE_BIT;
imageBarrier.oldLayout = VK_IMAGE_LAYOUT_UNDEFINED;
imageBarrier.newLayout = VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL;

// One call carries both barriers: no global barriers, one buffer barrier, one image barrier
// (other barrier fields such as sType, buffer, image and subresourceRange are assumed to be set already)
vkCmdPipelineBarrier(commandBuffer, VK_PIPELINE_STAGE_HOST_BIT, VK_PIPELINE_STAGE_TRANSFER_BIT, 0,
    0, NULL, 1, &bufferBarrier, 1, &imageBarrier);

vkCmdCopyBufferToImage(commandBuffer, srcBuffer, image, VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL, 1, &copy);

// Transition the image from transfer destination to shader read
imageBarrier.srcAccessMask = VK_ACCESS_TRANSFER_WRITE_BIT;
imageBarrier.dstAccessMask = VK_ACCESS_SHADER_READ_BIT;
imageBarrier.oldLayout = VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL;
imageBarrier.newLayout = VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL;

vkCmdPipelineBarrier(commandBuffer, VK_PIPELINE_STAGE_TRANSFER_BIT, VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT, 0,
    0, NULL, 0, NULL, 1, &imageBarrier);
Example – Compute to Draw Indirect
// Add a subpass dependency to express the wait on an external event
externalDependency.srcSubpass = VK_SUBPASS_EXTERNAL;
externalDependency.srcStageMask = VK_PIPELINE_STAGE_COMPUTE_SHADER_BIT;
externalDependency.dstStageMask = VK_PIPELINE_STAGE_DRAW_INDIRECT_BIT;
externalDependency.srcAccessMask = VK_ACCESS_SHADER_WRITE_BIT;
externalDependency.dstAccessMask = VK_ACCESS_INDIRECT_COMMAND_READ_BIT;

// Dispatch a compute shader that generates indirect command structures
vkCmdDispatch(...);

// Set an event that can be later waited on (same source stage).
vkCmdSetEvent(commandBuffer, event, VK_PIPELINE_STAGE_COMPUTE_SHADER_BIT);

vkCmdBeginRenderPass(...);

// Transition the buffer from shader write to indirect command read
bufferBarrier.srcAccessMask = VK_ACCESS_SHADER_WRITE_BIT;
bufferBarrier.dstAccessMask = VK_ACCESS_INDIRECT_COMMAND_READ_BIT;
bufferBarrier.buffer = indirectBuffer;

// Wait on one event, with no global barriers, one buffer barrier and no image barriers
vkCmdWaitEvents(commandBuffer, 1, &event, VK_PIPELINE_STAGE_COMPUTE_SHADER_BIT, VK_PIPELINE_STAGE_DRAW_INDIRECT_BIT,
    0, NULL, 1, &bufferBarrier, 0, NULL);

vkCmdDrawIndirect(commandBuffer, indirectBuffer, ...);
Semaphores
• Semaphores
- Used to synchronize queues
- Not necessary for single-queue
• Fairly coarse grain
- Per submission batch
- E.g. a set of command buffers
- Multiple per submit command
• Implicit memory guarantees
- Effects visible to future operations on the same device
- Not guaranteed visible to host

typedef struct VkSubmitInfo {
    VkStructureType sType;
    const void* pNext;
    uint32_t waitSemaphoreCount;
    const VkSemaphore* pWaitSemaphores;
    const VkPipelineStageFlags* pWaitDstStageMask;
    uint32_t commandBufferCount;
    const VkCommandBuffer* pCommandBuffers;
    uint32_t signalSemaphoreCount;
    const VkSemaphore* pSignalSemaphores;
} VkSubmitInfo;

Example – Acquire and Present
// Acquire an image. Pass in a semaphore to be signalled
vkAcquireNextImageKHR(device, swapchain, UINT64_MAX, acquireSemaphore, VK_NULL_HANDLE, &imageIndex);

// Submit command buffers

submitInfo.waitSemaphoreCount = 1;
submitInfo.pWaitSemaphores = &acquireSemaphore;
submitInfo.commandBufferCount = 1;
submitInfo.pCommandBuffers = &commandBuffer;
submitInfo.signalSemaphoreCount = 1;
submitInfo.pSignalSemaphores = &graphicsSemaphore;

vkQueueSubmit(graphicsQueue, 1, &submitInfo, fence);

// Present images to the display

presentInfo.waitSemaphoreCount = 1;
presentInfo.pWaitSemaphores = &graphicsSemaphore;
presentInfo.swapchainCount = 1;
presentInfo.pSwapchains = &swapchain;
presentInfo.pImageIndices = &imageIndex;

vkQueuePresentKHR(presentQueue, &presentInfo);


Example – Acquire and Present (same queue)
// Acquire an image. Pass in a semaphore to be signalled
vkAcquireNextImageKHR(device, swapchain, UINT64_MAX, acquireSemaphore, VK_NULL_HANDLE, &imageIndex);

// Submit command buffers

submitInfo.waitSemaphoreCount = 1;
submitInfo.pWaitSemaphores = &acquireSemaphore;
submitInfo.commandBufferCount = 1;
submitInfo.pCommandBuffers = &commandBuffer;
submitInfo.signalSemaphoreCount = 0;

vkQueueSubmit(universalQueue, 1, &submitInfo, fence);

// Present images to the display

presentInfo.waitSemaphoreCount = 0;

presentInfo.swapchainCount = 1;
presentInfo.pSwapchains = &swapchain;
presentInfo.pImageIndices = &imageIndex;

vkQueuePresentKHR(universalQueue, &presentInfo);


Fences
• Fences
- Used to synchronize queue to CPU
• Very coarse grain
- Per queue submit command
• Implicit memory guarantees
- Effects visible to future operations on the same device
- Not guaranteed visible to host

GL’s fences are like a combination of a semaphore and a fence in Vulkan: they can synchronize GPU and CPU in multiple ways at a coarse granularity.

VkResult vkQueueSubmit(
    VkQueue queue,
    uint32_t submitCount,
    const VkSubmitInfo* pSubmits,
    VkFence fence);

VkResult vkResetFences(
    VkDevice device,
    uint32_t fenceCount,
    const VkFence* pFences);

VkResult vkGetFenceStatus(
    VkDevice device,
    VkFence fence);

VkResult vkWaitForFences(
    VkDevice device,
    uint32_t fenceCount,
    const VkFence* pFences,
    VkBool32 waitAll,
    uint64_t timeout);
Example – Multi-buffering
// Have enough resources and fences to have one per in-flight-frame, usually the swapchain image count
VkBuffer buffers[swapchainImageCount];
VkFence fence[swapchainImageCount];

// Can use the index from the presentation engine - 1:1 mapping between swapchain images and resources
vkAcquireNextImageKHR(device, swapchain, UINT64_MAX, semaphore, VK_NULL_HANDLE, &nextIndex);

// Make absolutely sure that the work has completed

vkWaitForFences(device, 1, &fence[nextIndex], VK_TRUE, UINT64_MAX);

// Reset the fences we waited on, so they can be re-used

vkResetFences(device, 1, &fence[nextIndex]);

// Change the data in your per-frame resources (with appropriate events/barriers!)

...

// Submit any work to the queue, with those fences being re-used for the next time around
vkQueueSubmit(graphicsQueue, 1, &submitInfo, fence[nextIndex]);


Wait Idle
• Ensures execution completes
- VERY heavy-weight
• vkQueueWaitIdle
- Wait for queue operations to finish
- Equivalent to waiting on a fence
• vkDeviceWaitIdle
- Waits for device operations to finish
- Includes vkQueueWaitIdle for queues

These are a lot like glFinish, and should be treated similarly: use them VERY SPARINGLY.


Wait Idle
• Useful primarily at teardown
- Use it to quickly ensure all work is done

• Favour other synchronization at all other times

- Extremely heavyweight, will cause serialization!


Programmer Guidelines
• Specify EXACTLY the right amount of synchronization
- Too much and you risk starving your GPU
- Miss any and your GPU will bite you

• Use the validation layers to help!

- Won’t catch everything yet, but improving over time

• Pay particular attention to the pipeline stages

- Fiddly but become intuitive as you use them

• Consider Image Layouts

- If your GPU can save bandwidth it will

• Different behaviour depending on implementation

- Test/Tune on every platform you can find!
Keep your GPU fed without getting bitten!

Questions?


Swapchains Unchained!
(What you need to know about Vulkan WSI)
Alon Or-bach, Chair, Vulkan System
Integration Sub-Group – May 2016
@alonorbach (disclaimers apply!)
Intro to Vulkan Window System Integration
• Explicit control for acquisition and presentation of images
- Designed to fit the Vulkan API and today’s compositing window systems
• Not all extensions are supported by every platform
- You MUST check and enable the extensions your app/engine uses!!!
• Today’s presentation should help you get presentation working
- Learn how to present through a swapchain
- Overview of Vulkan objects used by the WSI extensions

WSI Jargon Buster
• Platform
Our terminology for an OS / window system e.g. Android, Windows, Wayland, X11 via XCB
• Presentation Engine
The platform’s compositor or display engine
• Application
Your app or game engine


How many WSI extensions are there?
• Two cross-platform instance extensions
- VK_KHR_surface
- VK_KHR_display
• Six (platform) instance extensions
- VK_KHR_android_surface
- VK_KHR_mir_surface
- VK_KHR_wayland_surface
- VK_KHR_win32_surface
- VK_KHR_xcb_surface
- VK_KHR_xlib_surface
• Two cross-platform device extensions
- VK_KHR_swapchain
- VK_KHR_display_swapchain

Vulkan Surfaces
• VkSurfaceKHR
- Vulkan’s way to encapsulate a native window / surface
• Platform-independent surface queries
- Find out crucial information about your surface’s properties
- Such as format, transform, image usage
- Some platforms provide additional queries
• Presentation support is per queue family
- An implementation may support multiple platforms e.g. both xlib and xcb
- Or may not support presentation at all

Unlike an EGLSurface, creating a Vulkan Surface doesn’t mean you’ve got your render targets created …yet

[Diagram: Physical Devices A, B and C, each with its queue families, linked to Platform X and Platform Y]
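Because presentation support is reported per queue family, an application typically scans the families of a physical device for one it can use. The sketch below assumes the application has already gathered, for each family, whether it supports graphics and whether it can present to the chosen surface; `QueueFamilyInfo` and `pickPresentableFamily` are hypothetical names, not part of the Vulkan API.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical per-family summary filled in by the application from its
// queue family and surface support queries.
struct QueueFamilyInfo {
    bool supportsGraphics;
    bool supportsPresent;  // presentation support for one specific surface
};

// Returns the index of the first family that can both render and present,
// or -1 if no family on this device can present to the surface.
int pickPresentableFamily(const std::vector<QueueFamilyInfo>& families) {
    for (std::size_t i = 0; i < families.size(); ++i) {
        if (families[i].supportsGraphics && families[i].supportsPresent) {
            return static_cast<int>(i);
        }
    }
    return -1;
}
```

A result of -1 corresponds to the "may not support presentation at all" case; an engine could then try another physical device or fall back to separate graphics and present queues.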

Vulkan Swapchains: VK_KHR_swapchain
• Array of presentable images associated with a surface
- Application requests a minimum number of presentable images
- Implementation creates at least that number
- Implementation may have a limit
• Upfront allocation of presentable images
- No allocation hitching at crucial moment
- Pre-record fixed content command buffers
• Present mode determines behavior
- FIFO support mandatory
- Platforms can offer mailbox, immediate, FIFO relaxed

const VkSwapchainCreateInfoKHR createInfo =
{
    VK_STRUCTURE_TYPE_SWAPCHAIN_CREATE_INFO_KHR, // sType
    NULL,                                        // pNext
    0,                                           // flags
    mySurface,                                   // surface
    desiredNumberOfPresentableImages,            // minImageCount
    surfaceFormat,                               // imageFormat
    surfaceColorSpace,                           // imageColorSpace
    myExtent,                                    // imageExtent
    1,                                           // imageArrayLayers
    VK_IMAGE_USAGE_COLOR_ATTACHMENT_BIT,         // imageUsage
    VK_SHARING_MODE_EXCLUSIVE,                   // imageSharingMode
    0,                                           // queueFamilyIndexCount
    NULL,                                        // pQueueFamilyIndices
    surfaceProperties.currentTransform,          // preTransform
    VK_COMPOSITE_ALPHA_INHERIT_BIT_KHR,          // compositeAlpha
    swapchainPresentMode,                        // presentMode
    VK_TRUE,                                     // clipped
    VK_NULL_HANDLE                               // oldSwapchain
};

FIFO is like eglSwapInterval = 1
Mailbox/Immediate is like eglSwapInterval 0
FIFO relaxed is like EXT_swap_control_tear
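Since the implementation may impose limits on the image count, the requested minimum is usually clamped against the surface capabilities before the swapchain is created. This is a minimal sketch under the assumption that the limits mirror the `minImageCount` / `maxImageCount` fields of `VkSurfaceCapabilitiesKHR`, where a maximum of 0 means no upper limit; `ImageCountLimits` and `chooseMinImageCount` are illustrative names.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical mirror of the two surface capability fields that matter here.
struct ImageCountLimits {
    std::uint32_t minImageCount;
    std::uint32_t maxImageCount;  // 0 means "no upper limit"
};

// Clamp the application's wish into the range the surface supports.
std::uint32_t chooseMinImageCount(const ImageCountLimits& limits, std::uint32_t desired) {
    std::uint32_t count = std::max(desired, limits.minImageCount);
    if (limits.maxImageCount != 0) {
        count = std::min(count, limits.maxImageCount);
    }
    return count;
}
```

The result would be passed as `minImageCount`; the implementation is still free to create more images than that.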
Vulkan Swapchains: They’re good!
• Application knows which image within a swapchain it is presenting
- Content of image preserved between presents
• Application is responsible for explicitly recreating swapchains - no surprises
- Platform informs app if current swapchain is:
  - Suboptimal: e.g. after window resize, swapchain still usable for present via image scaling
  - Surface Lost: swapchain no longer usable for present
- Application is responsible to create a new swapchain

Similar but neater than how EGL_KHR_partial_update / EGL_EXT_buffer_age and preserved behavior achieves this

In EGL, the EGLSurface may be resized by the platform after an eglSwapBuffers call.
Vulkan requires the application to intervene
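The two notifications above lead to different reactions in an application's frame loop. The sketch below models that decision with stand-in enums; in real code the inputs would be result codes such as `VK_SUBOPTIMAL_KHR` and `VK_ERROR_SURFACE_LOST_KHR`, and `PresentOutcome`, `SwapchainAction` and `reactToPresent` are hypothetical names.

```cpp
// Stand-ins for the outcomes the platform can report after a present.
enum class PresentOutcome { Success, Suboptimal, SurfaceLost };

// What the application decides to do with its current swapchain.
enum class SwapchainAction { KeepUsing, RecreateWhenConvenient, RecreateNow };

SwapchainAction reactToPresent(PresentOutcome outcome) {
    switch (outcome) {
        case PresentOutcome::Success:
            return SwapchainAction::KeepUsing;
        case PresentOutcome::Suboptimal:
            // Still presentable (via scaling), so recreation can be deferred.
            return SwapchainAction::RecreateWhenConvenient;
        case PresentOutcome::SurfaceLost:
            // No longer presentable: a new swapchain is needed before the next frame.
            return SwapchainAction::RecreateNow;
    }
    return SwapchainAction::RecreateNow;  // not reached for valid enum values
}
```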
Vulkan Swapchains: They’re jolly good!
• Presenting and acquiring are separate operations
- No need to submit a new image to acquire another one, unless presentation engine cannot release it
• Application must only modify presentable images it has acquired
• Presentation engine must only display presentable images that have been presented!

Stalls in frame loop are very bad!

In EGL, calling eglSwapBuffers both presents the current back buffer and acquires a new one
Vulkan splits this up into separate operations

Steps to setup your presentable images
1 – Create a native window/surface: platform-specific APIs
2 – Create a Vulkan surface: VK_KHR_<platform>_surface
3 – Query information about your surface: VK_KHR_surface

VK_KHR_swapchain:
0 – Create your swapchain
1 – Acquire the next presentable image
2 – Submit command buffer(s) for that image
3 – Present the image

Legend (box colours in the original diagram): Setup, Steady-state, Response to suboptimal / surface_lost

Vulkan Displays: VK_KHR_display
• Vulkan’s way to discover display devices (screens, panels) outside a window system
- Reminder: Not supported on all platforms
• Defines VkDisplayKHR and VkDisplayModeKHR objects
- Represent the display devices and the modes they support connected to a VkPhysicalDevice
- Determine if a display supports multiple planes that are blended together
• Enables creation of a VkSurfaceKHR to represent a display plane

A Vulkan display represents an actual display!
(Whereas an EGLDisplay is actually just a connection to a driver, like a Vulkan Device)

[Diagram: a physical device with Planes 0 to 2 and Displays 0 and 1, each display with Modes 0 and 1; a surface is attached to a display plane]
VK_KHR_display_swapchain
• Extends the information provided at vkQueuePresentKHR
- What region to present from the swapchain image
- What region to present to on the display
- Whether the display should persist the image
• Adds ability to create a shared swapchain
- Swapchain that takes multiple VkSwapchainCreateInfoKHR structs
- Allows multiple displays to be presented to simultaneously
- No guarantee that presents are atomic ...presently!

Any questions?
[email protected]
@alonorbach
Moving To Vulkan
Asynchronous Compute
Chris Hebert, Dev Tech Software Engineer, Professional Visualization
Who am I?
Chris Hebert
@chrisjhebert

Dev Tech Software Engineer- Pro Vis

20 years in the industry
Joined NVIDIA in March 2015.
Real time graphics makes me happy
I also like helicopters
Chris Hebert - Circa 1974

NVIDIA/KHRONOS CONFIDENTIAL

Agenda
• Some Context
• Sharing The Load
• Pipeline Barriers


Some Context

GPU Architecture
In a nutshell
NVIDIA Maxwell 2
[Diagram: block diagram of an SM showing the register file, cores and load/store units]
Execution Model
Thread Hierarchies

[Diagram: logical view (a work group) beside the hardware view (the same work group split into warps of 32 threads on an SMM)]
Resource Partitioning
Resources Are Limited

Key resources impacting local execution:

• Program Counters: partitioned amongst threads
• Registers: partitioned amongst threads
• Shared Memory: partitioned amongst work groups

e.g. GTX 980 Ti
64K 32-bit registers per SM
96 KB shared memory per SM
Resource Partitioning
Registers

The more registers a kernel uses, the fewer warps can be resident on the SM

Fewer registers: more threads
More registers: fewer threads
Resource Partitioning
Shared Memory

The more shared memory a work group uses, the fewer work groups can be resident on the SM

Less SMEM: more groups
More SMEM: fewer groups
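The two trade-offs can be put into numbers with the GTX 980 Ti figures quoted earlier (64K 32-bit registers and 96 KB of shared memory per SM). This is a rough sketch only: it ignores register allocation granularity and any hardware cap on resident warps or work groups, and the function names are illustrative.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint32_t kRegistersPerSM   = 64 * 1024;  // 32-bit registers, from the slides
constexpr std::uint32_t kSharedBytesPerSM = 96 * 1024;  // bytes, from the slides
constexpr std::uint32_t kWarpSize         = 32;         // threads per warp

// Upper bound on resident warps if registers were the only limit.
std::uint32_t warpsLimitedByRegisters(std::uint32_t registersPerThread) {
    assert(registersPerThread > 0);
    return kRegistersPerSM / (registersPerThread * kWarpSize);
}

// Upper bound on resident work groups if shared memory were the only limit.
std::uint32_t groupsLimitedBySharedMemory(std::uint32_t sharedBytesPerGroup) {
    assert(sharedBytesPerGroup > 0);
    return kSharedBytesPerSM / sharedBytesPerGroup;
}
```

Doubling the registers per thread from 32 to 64 halves the bound from 64 warps to 32, and a work group that uses 48 KB of shared memory leaves room for only two groups per SM.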

Keeping It Moving
Occupancy

• Some small kernels may have low occupancy

• Depending on the algorithm
• Compute resources are limited
• Shared across threads or work groups on a per SM basis
• Warps stall when they have to wait for resources
• This latency can be hidden
• If there are other warps ready to execute.

Keeping It Moving
Occupancy – Simple Theoretical Example

• Simple kernel that updates positions of 20480 particles

• 1 FMAD - ~20 cycles (instruction latency)
• 20480 particles = 640 warps
• To hide this latency, according to Little's Law
• Required Warps = Latency x Throughput
• Throughput should be 32 threads * 16 SMs = 512 to keep GPU busy
• Required warps is 20*512 = 10240
• ….oh….

Keeping It Moving
Occupancy – Simple Theoretical Example

• Simple kernel that updates positions of 20480 particles

• 1 FMAD - ~20 cycles (instruction latency)
• 20480 particles = 640 warps
• To hide this latency, according to Little's Law, but only on 1 SM
• Required Warps = Latency x Throughput
• Throughput should be 32 threads * 1 SM = 32 to keep GPU busy
• Required warps is 20*32 = 640
• And we theoretically have 15 SMs to use for other stuff.
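The arithmetic of both versions of this example can be reproduced in a few lines. The sketch follows the slides' own simplification (throughput taken as 32 threads per SM, latency of 20 cycles); the function names are illustrative.

```cpp
#include <cstdint>

constexpr std::uint32_t kWarpSize = 32;  // threads per warp

// Number of warps needed to cover a given particle count (rounded up).
constexpr std::uint32_t warpsForParticles(std::uint32_t particles) {
    return (particles + kWarpSize - 1) / kWarpSize;
}

// Little's Law as used on the slides: required = latency x throughput,
// with throughput taken as 32 threads per SM.
constexpr std::uint32_t requiredByLittlesLaw(std::uint32_t latencyCycles,
                                             std::uint32_t smCount) {
    return latencyCycles * (kWarpSize * smCount);
}

static_assert(warpsForParticles(20480) == 640, "20480 particles fill 640 warps");
static_assert(requiredByLittlesLaw(20, 16) == 10240, "whole GPU: far more than we have");
static_assert(requiredByLittlesLaw(20, 1) == 640, "one SM: exactly what we have");
```

With all 16 SMs the kernel supplies 640 of the 10240 required, so the GPU is mostly idle; confined to one SM, the same 640 match the requirement and the other 15 SMs are free for other work.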

Queuing It Up
Working with 1 Queue
• Scheduler will distribute work across all SMs
• Kernels execute in sequence (there may be some overlap)
• Low occupancy kernels will waste GPU time

[Diagram: several command buffers, including transfers, feed one command queue that executes kernels one after another]

Sharing The Load

Queuing It Up
Working with N Queues

• NVIDIA hardware gives you 16 all-powerful queues
• 1 Queue family that supports all operations
• 16 queues available for use

[Diagram: command buffers are distributed over Command Queues #1, #2 and #3, each executing its own sequence of kernels]
Queuing It Up
Working with N Queues

• Application decides which queues for which kernels
• Load balance for best performance
• Profile (Nsight) to gain insights

[Diagram: command buffers are distributed over Command Queues #1, #2 and #3, each executing its own sequence of kernels]
Queuing It Up
Compute and Graphics In Harmony

• Some hardware can even run compute and graphics work concurrently
• Needs fast context switching and at high granularity (not just at draw commands)
• Simple Graphics work tends to have high occupancy
• Complex graphics work can reduce occupancy
• Profile for performance insights

Queuing It Up
Compute and Graphics In Harmony

• Profile to understand occupancy of both graphics and compute workloads
• Queues can support both compute and graphics

[Diagram: compute and graphics command buffers are distributed over Command Queues #1, #2 and #3, each executing its own sequence of kernels]
An Example
Compute and Graphics In Harmony

Free Surface Navier Stokes Solver

• 11 Compute Kernels
• 4 Shaders

• The output of each kernel is the input to the next

• Some kernels have very low occupancy
• Still opportunities for concurrency with compute

An Example
Many discretized operations are separable

Examples
• Fluid Sims
• Gaussian Blurs
• Convolution Kernels

[Diagram: two command queues, one processing the X axis (and half the Z) and one processing the Y axis (and the other half of Z), each feeding a group of SMs and each ending in a semaphore]

Driver handles dispatching groups
Use semaphores to synchronize
An Example
Compute and graphics run concurrently
[Diagram: a compute command queue and a graphics command queue each feed their own group of SMs, linked by a semaphore; the frame timeline shows compute working one frame ahead of graphics (compute on Frame N+1 while graphics is on Frame N, up to N+4 and N+3)]
An Example
Putting it all together
[Diagram: two compute command queues (Process X Axis and half the Z; Process Y Axis and the other half of Z) and one graphics command queue each feed their own SMs, linked by semaphores; the frame timeline shows compute working one frame ahead of graphics (compute on Frame N+1 while graphics is on Frame N, up to N+4 and N+3)]
Memory Transfers
More opportunity for concurrency

• Memory transfers are handled by the MMU
• Can run concurrently with kernels
• As long as the current kernel isn't using the memory

Why do this?

[Diagram: on a single command queue, kernels and transfers alternate (Kernel, Transfer, Kernel, Transfer, Kernel), so the MMU may be idle while a kernel runs and the ALUs may be idle during a transfer]
Memory Transfers
More opportunity for concurrency

When you can do this
• DtoH and HtoD transfers can run concurrently

Examples
• Large image processing
• Video processing

[Diagram: three queues running side by side: a host-to-device queue of transfers, a compute queue of kernels, and a device-to-host queue of transfers]

Conclusion
Takeaways

There is more than 1 queue available

Keep registers and shared memory to a minimum
Low occupancy leads to an underutilized GPU
Maximize GPU utilization by running kernels concurrently
Profile to understand the occupancy profiles of kernels and shaders
Some hardware can run kernels AND shaders concurrently
Use Semaphores to synchronize between queues
Be sensible at the beer festival


Thank You Enjoy Vulkan!!

Questions?
Chris Hebert, Dev Tech Software Engineer, Professional Visualization
Porting to Vulkan
Hans-Kristian Arntzen
Engineer, ARM
(Credit for slides: Marius Bjørge)
Agenda
• API flashback
• Engine design
- Command buffers
- Pipelines
- Render passes
- Memory management

API Flashback

Application

Application
Logic shift

Driver

API Flashback
vkDevice

vkQueue vkCommandPool

vkCommandBuffer

vkBeginRenderPass vkCmdBindXXX vkCmdBindPipeline vkCmdBindDescriptorSets vkCmdDraw vkEndRenderPass

vkRenderPass vkBuffer vkPipeline vkDescriptorSet

State vkBufferView
vkFramebuffer Shaders vkImageView
vkImageView vkRenderPass vkSampler

vkDeviceMemory vkDeviceMemory vkDescriptorPool

Heap

Porting from OpenGL to Vulkan?
• Most graphics engines today are designed around the principles of implicit driver
behaviour
- A direct port to Vulkan won’t necessarily give you a lot of benefits

• Approach it differently
- Re-design for Vulkan, and then port that to OpenGL

Allocating Memory
• Memory is first allocated and then bound to Vulkan objects
- Different Vulkan objects may have different memory requirements
- Allows for aliasing memory across different Vulkan objects
• Driver does no ref counting of any objects in Vulkan
- Cannot free memory until you are sure it is never going to be used again
- Also applies to API handles!

• Most of the memory allocated during run-time is transient

- Allocate, write and use in the same frame
- Block based memory allocator

Block Based Memory Allocator
• Relaxes memory reference counting
• Only entire blocks are freed/recycled
• Sub-allocations take refcount on block
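The three rules above can be sketched as a small bump allocator with a per-block reference count. This is an illustration only, not the allocator from the talk: it hands out offsets rather than real device memory, ignores alignment, and assumes `release` is called only once the GPU has finished with the sub-allocation (for example after the frame's fence has signalled). `BlockAllocator` and its members are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

class BlockAllocator {
public:
    struct Allocation {
        std::size_t block;   // index of the owning block
        std::size_t offset;  // byte offset inside that block
    };

    explicit BlockAllocator(std::size_t blockSize) : blockSize_(blockSize) {}

    // Bump-allocate from the current block; move to another block when full.
    Allocation allocate(std::size_t size) {
        assert(size > 0 && size <= blockSize_);
        if (blocks_.empty() || blocks_[current_].used + size > blockSize_) {
            current_ = acquireBlock();
        }
        Block& block = blocks_[current_];
        Allocation result = {current_, block.used};
        block.used += size;
        ++block.refCount;  // each sub-allocation takes a reference on its block
        return result;
    }

    // Drop one reference; only a block with no references left is recycled.
    void release(const Allocation& allocation) {
        Block& block = blocks_[allocation.block];
        assert(block.refCount > 0);
        if (--block.refCount == 0) {
            block.used = 0;
            if (allocation.block != current_) freeBlocks_.push_back(allocation.block);
        }
    }

    std::size_t blockCount() const { return blocks_.size(); }

private:
    struct Block {
        std::size_t used;
        std::size_t refCount;
    };

    // Reuse a recycled block if one exists, otherwise create a new one.
    std::size_t acquireBlock() {
        if (!freeBlocks_.empty()) {
            std::size_t index = freeBlocks_.back();
            freeBlocks_.pop_back();
            return index;
        }
        blocks_.push_back(Block{0, 0});
        return blocks_.size() - 1;
    }

    std::size_t blockSize_;
    std::size_t current_ = 0;
    std::vector<Block> blocks_;
    std::vector<std::size_t> freeBlocks_;
};
```

Individual sub-allocations are never freed on their own, which is what relaxes the reference counting: the allocator tracks one counter per block instead of the lifetime of every object.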

Command Buffers
• Request command buffers on the fly
- Begun with VK_COMMAND_BUFFER_USAGE_ONE_TIME_SUBMIT_BIT
- Recycled

• Separate command pools per

- Thread
- Frame
- Primary/secondary

Secondary Command Buffers

vkCommandPool vkCommandBuffer
Main thread
vkBeginRenderPass vkCmdExecuteCommands vkEndRenderPass

Thread 0 vkCommandPool Secondary command buffer

Thread 1 vkCommandPool Secondary command buffer

Thread 2 vkCommandPool Secondary command buffer

Shaders
• Standardize on SPIR-V binary shaders
• Extensively use the Khronos SPIRV-Cross library
- Cross compiling back to GLSL
- Provides shader reflection for
- Vertex attributes
- Subpass attachments
- Pipeline layouts
- Push constants

Pipelines

Pipeline state

Dynamic state Shaders Render pass

Blend State Pipeline layout

Rasterizer state Vertex input

Depth/stencil state Input assembly

Pipelines
• Not trivial to create all required pipeline state objects upfront
• Our approach:
- Keep track of all pipeline state per command buffer
- Flush pipeline creation when required
- In our case this is implemented as an async operation

Public interface: SetRenderState(), SetShaders(), SetVertexBuffer(), SetIndexBuffer(), Draw()
Internal (command buffer): Flush, RequestPipeline, CreateNewPipeline
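The Flush, RequestPipeline, CreateNewPipeline chain amounts to a lookup keyed on the tracked state, with creation only on a miss. The sketch below is a synchronous simplification (the talk describes an async operation) and every name in it is hypothetical: `PipelineKey` stands in for the full tracked state, `PipelineHandle` for a `VkPipeline`, and the creation callback for the real pipeline creation call.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <utility>

// Hypothetical, heavily reduced stand-in for the tracked pipeline state.
struct PipelineKey {
    std::uint32_t shaderId;
    std::uint32_t renderStateBits;
    std::uint32_t renderPassId;

    bool operator==(const PipelineKey& other) const {
        return shaderId == other.shaderId &&
               renderStateBits == other.renderStateBits &&
               renderPassId == other.renderPassId;
    }
};

struct PipelineKeyHash {
    std::size_t operator()(const PipelineKey& key) const {
        std::size_t h = std::hash<std::uint32_t>()(key.shaderId);
        h = h * 31 + std::hash<std::uint32_t>()(key.renderStateBits);
        h = h * 31 + std::hash<std::uint32_t>()(key.renderPassId);
        return h;
    }
};

using PipelineHandle = std::uint64_t;  // stand-in for a VkPipeline handle

class PipelineRequester {
public:
    using CreateFn = std::function<PipelineHandle(const PipelineKey&)>;

    explicit PipelineRequester(CreateFn create) : create_(std::move(create)) {}

    // Called at draw time: reuse a pipeline for this state or create a new one.
    PipelineHandle requestPipeline(const PipelineKey& key) {
        auto found = cache_.find(key);
        if (found != cache_.end()) return found->second;
        PipelineHandle handle = create_(key);
        cache_.emplace(key, handle);
        return handle;
    }

    std::size_t size() const { return cache_.size(); }

private:
    CreateFn create_;
    std::unordered_map<PipelineKey, PipelineHandle, PipelineKeyHash> cache_;
};
```

Because the key includes the render pass, two draws with identical shaders and state but incompatible render passes get separate pipelines.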

Pipelines
• In an ideal world…
- All pipeline combinations should be created upfront

• …but this requires detailed knowledge of every potential shader/state combination that
you might have in your scene
- As an example, one of our fragment shaders has ~9000 combinations
- Every one of these shaders can use different render state
- We also have to make sure the pipelines are bound to compatible render passes
- An explosion of combinations!

Pipeline cache
• Vulkan has built-in support for pipeline caching
- Store to disk and re-use on next run

• Can also speed up pipeline creation during run-time

- If the pipeline state is already in the cache it can be re-used

Pipeline state

Dynamic state Shaders Render pass

Blend State Pipeline layout

Rasterizer state Vertex input

Depth/stencil state Input assembly

vkPipelineCache
Disk

Pipeline layout
• Defines what kind of resources are in each binding slot in your shaders
- Textures, samplers, buffers, push constants, etc
• Can be shared among different pipeline objects

Pipeline layout
• Use SPIRV-Cross to automatically get binding information from SPIR-V shaders

SPIR-V shader

SPIRV-cross Pipeline layout

Descriptor set layout

Push constant range

Descriptor Sets
• Textures, uniform buffers, etc. are bound to shaders in descriptor sets
- Hierarchical invalidation
- Order descriptor sets by update frequency

• Ideally all descriptors are pre-baked during level load

- Keep track of low level descriptor sets per material
- But, this is not trivial

Descriptor Sets
• Our solution:
- Keep track of bindings and update descriptor sets when necessary
- Keep cache of descriptor sets used with immutable Vulkan objects

Public interface: SetShaders(), SetConstantData(), SetTexture(), Draw()
Internal (command buffer): Request cached descriptor sets, Allocate descriptor sets (from the descriptor pool), Write descriptor sets (using the descriptor set layouts), BindDescriptorSets

Descriptor Set emulation
• We also need to support this in OpenGL

• Our solution:
- Emulate descriptor sets in our OpenGL backend
- SPIRV-Cross collapses and serializes bindings

Descriptor Set emulation
Shader

Set 0             Set 1         Set 2
0 GlobalVSData    0 MeshData    0 MaterialData
1 GlobalFSData                  1 TexAlbedo
                                2 TexNormal
                                3 TexEnvmap

SPIR-V library to GLSL

Uniform block bindings    Texture bindings
0 GlobalVSData            0 TexAlbedo
1 GlobalFSData            1 TexNormal
2 MeshData                2 TexEnvmap
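One way to arrive at such a flattened numbering is to walk the resources in (set, binding) order and hand out consecutive slots per resource kind. This is a guess at the idea, not necessarily how SPIRV-Cross or the authors' backend does it; `ShaderResource` and `flattenBindings` are hypothetical names.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <vector>

enum class ResourceKind { UniformBlock, Texture };

// One shader resource with its Vulkan-style (set, binding) address.
struct ShaderResource {
    int set;
    int binding;
    ResourceKind kind;
    std::string name;
};

// Returns name -> flat OpenGL binding; uniform blocks and textures are
// numbered independently because GL keeps separate binding namespaces.
std::map<std::string, int> flattenBindings(std::vector<ShaderResource> resources) {
    std::sort(resources.begin(), resources.end(),
              [](const ShaderResource& a, const ShaderResource& b) {
                  return a.set != b.set ? a.set < b.set : a.binding < b.binding;
              });
    int nextUniformBlock = 0;
    int nextTexture = 0;
    std::map<std::string, int> flat;
    for (const ShaderResource& resource : resources) {
        int& next = (resource.kind == ResourceKind::UniformBlock) ? nextUniformBlock
                                                                  : nextTexture;
        flat[resource.name] = next++;
    }
    return flat;
}
```

Applied to the sets in the table, this scheme reproduces the listed bindings and would additionally place MaterialData at uniform block binding 3, an entry the slide's table does not show.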

Push Constants
• Push constants replace non-opaque uniforms
- Think of them as small, fast-access uniform buffer memory
• Update in Vulkan with vkCmdPushConstants
• Directly mapped to registers on Mali GPUs

// New
layout(push_constant, std430) uniform PushConstants {
mat4 MVP;
vec4 MaterialData;
} RegisterMapped;

// Old, no longer supported in Vulkan GLSL

uniform mat4 MVP;
uniform vec4 MaterialData;
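On the CPU side, the data handed to vkCmdPushConstants has to match the block's layout. A minimal sketch of a matching struct for the block above, with compile-time checks on its size and offsets, could look as follows; the 128-byte figure is the minimum `maxPushConstantsSize` that the Vulkan specification requires every implementation to support, while the struct and helper names are illustrative.

```cpp
#include <cstddef>
#include <cstdint>

// CPU-side mirror of the push constant block (mat4 followed by vec4).
struct PushConstants {
    float MVP[16];          // mat4, 64 bytes
    float MaterialData[4];  // vec4, 16 bytes
};

static_assert(sizeof(PushConstants) == 80, "mat4 (64 bytes) + vec4 (16 bytes)");
static_assert(offsetof(PushConstants, MaterialData) == 64, "vec4 directly follows the mat4");

// Smallest push constant budget any conformant implementation must offer.
constexpr std::uint32_t kGuaranteedPushConstantBytes = 128;

constexpr bool fitsGuaranteedBudget(std::size_t bytes) {
    return bytes <= kGuaranteedPushConstantBytes;
}

static_assert(fitsGuaranteedBudget(sizeof(PushConstants)), "80 bytes fits in 128");
```

Anything larger than the guaranteed budget would need a check against the device's reported limit, or a fallback to a uniform buffer.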

Push Constant Emulation
• But again, we need to support OpenGL as well

• Our solution:
- Use SPIRV-Cross to turn push constants into regular non-opaque uniforms
- Logic in our OpenGL/Vulkan backends redirects the push constant data appropriately

Render pass
• Used to denote beginning and end of rendering to a framebuffer
• Can be re-used but must be compatible
- Attachments: Framebuffer format, image layout, MSAA?
- Subpasses
- Attachment load/store

[Diagram labels: DepthStencil, Color targets]

Public interface: BeginRenderPass
Internal (command buffer): RequestFramebuffer, RequestRenderPass, CreateCompatibleRenderPass, CreateFramebuffer, BeginRenderPass

Subpass Inputs
• Vulkan supports subpasses within render passes
• Standardized GL_EXT_shader_pixel_local_storage!
• Also useful for desktop GPUs

// GLSL
#extension GL_EXT_shader_pixel_local_storage : require
__pixel_local_inEXT GBuffer {
layout(rgba8) vec4 albedo;
layout(rgba8) vec4 normal;
...
} pls;

// Vulkan
layout(input_attachment_index = 0) uniform subpassInput albedo;
layout(input_attachment_index = 1) uniform subpassInput normal;
...

Subpass Input Emulation
• Supporting subpasses in GL is not trivial, and probably not feasible on a lot of
implementations

• Our solution:
- Use SPIRV-Cross to rewrite subpass inputs to Pixel Local Storage variables or texture
lookups
- This will only support a subset of the Vulkan subpass features, but good enough for our
current use

Synchronization
• Submitted work is completed out of order by the GPU
• Dependencies must be tracked by the application and handled explicitly
- Using output from a previous render pass
- Using output from a compute shader
- Etc
• Synchronization primitives in Vulkan
- Pipeline barriers and events
- Fences
- Semaphores

Render passes and pipeline barriers
• Most of the time the application knows upfront how the output of a renderpass is going to be used afterwards
• Internally we have a couple of usage flags that we assign to a render pass
- On EndRenderPass we implicitly trigger a pipeline barrier

Public interface: BeginRenderPass, DrawSomething, EndRenderPass
Render pass usage flags: Pipeline stages? Memory domains?
Internal (command buffer): vkCmdEndRenderPass, vkCmdPipelineBarrier
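The step from usage flags to barrier parameters can be sketched as a simple mapping. The flags and masks below are local stand-ins invented for illustration; a real backend would define its own usage flags and translate them to the corresponding `VK_PIPELINE_STAGE_*` and `VK_ACCESS_*` bits before calling vkCmdPipelineBarrier.

```cpp
#include <cstdint>

// Hypothetical engine-level flags describing how a render pass output is consumed.
enum UsageFlag : std::uint32_t {
    kUsageSampledInFragment = 1u << 0,
    kUsageReadInCompute     = 1u << 1,
};

// Local stand-ins for destination pipeline stages and access types.
enum Stage : std::uint32_t {
    kStageFragmentShader = 1u << 0,
    kStageComputeShader  = 1u << 1,
};
enum Access : std::uint32_t {
    kAccessShaderRead = 1u << 0,
};

struct BarrierMasks {
    std::uint32_t dstStages;  // which later stages must wait
    std::uint32_t dstAccess;  // which kinds of access must see the results
};

// Decide, at EndRenderPass time, what the implicit barrier has to cover.
BarrierMasks barrierForUsage(std::uint32_t usageFlags) {
    BarrierMasks masks = {0, 0};
    if (usageFlags & kUsageSampledInFragment) {
        masks.dstStages |= kStageFragmentShader;
        masks.dstAccess |= kAccessShaderRead;
    }
    if (usageFlags & kUsageReadInCompute) {
        masks.dstStages |= kStageComputeShader;
        masks.dstAccess |= kAccessShaderRead;
    }
    return masks;
}
```

A render pass with no usage flags yields empty masks, in which case the backend could skip the barrier entirely.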

Image Layout Transitions
• Must match how the image is used at any time
• Pedantic or relaxed
- Some implementations will require careful tracking of previous and new layout to achieve
optimal performance
- For Mali we can be quite relaxed with this – most of the time we can keep the image
layout as VK_IMAGE_LAYOUT_GENERAL

Summary
• Don’t allocate or release during runtime
• Batching still applies
• Multi-thread your code!
• Use push-constants as much as possible
• Multi-pass is fantastic on mobile GPUs

7 pages
Linear Search&binary Search
No ratings yet
Linear Search&binary Search
5 pages
Commands Gitbash
No ratings yet
Commands Gitbash
2 pages