glTF Tutorials
By Marco Hutter, @javagl
This tutorial gives an introduction to glTF, the GL transmission format. It summarizes the most important
features and application cases of glTF, and describes the structure of the files that are related to glTF. It explains
how glTF assets may be read, processed, and used to display 3D graphics efficiently.
Some basic knowledge about JSON, the JavaScript Object Notation, is assumed. Additionally, a basic
understanding of common graphics APIs, like OpenGL or WebGL, is required.
Introduction
Basic glTF Structure
Example: A Minimal glTF File
Scenes and Nodes
Buffers, BufferViews, and Accessors
Example: A Simple Animation
Animations
Example: Simple Meshes
Meshes
Materials
Example: A Simple Material
Textures, Images, and Samplers
Example: A Simple Texture
Example: An Advanced Material
Example: Simple Cameras
Cameras
Example: A Simple Morph Target
Morph Targets
Example: Simple Skin
Skins
Acknowledgements:
Patrick Cozzi, Cesium, @pjcozzi
Alexey Knyazev, @lexaknyazev
Sarah Chow, @slchow
Introduction to glTF using WebGL
An increasing number of applications and services are based on 3D content. Online shops offer product
configurators with a 3D preview. Museums digitize their artifacts with 3D scans and allow visitors to explore their
collections in virtual galleries. City planners use 3D city models for planning and information visualization.
Educators create interactive, animated 3D models of the human body. Many of these applications run directly in
the web browser, which is possible because all modern browsers support efficient rendering with WebGL.
Image 1a: Screenshots of various websites and applications showing 3D models.
Demand for 3D content in various applications is constantly increasing. In many cases, the 3D content has to be
transferred over the web, and it has to be rendered efficiently on the client side. But until now, there has been a
gap between the 3D content creation and efficient rendering of that 3D content in the runtime applications.
3D content pipelines
3D content that is rendered in client applications comes from different sources and is stored in different file
formats. The list of 3D graphics file formats on Wikipedia shows an overwhelming number, with more than 70
different file formats for 3D data, serving different purposes and application cases.
For example, raw 3D data may be obtained with a 3D scanner. These scanners usually provide the geometry data
of a single object, which is stored in OBJ, PLY, or STL files. These file formats do not contain information about
the scene structure or how the objects should be rendered.
More sophisticated 3D scenes can be created with authoring tools. These tools allow one to edit the structure of
the scene, the light setup, cameras, animations, and, of course, the 3D geometry of the objects that appear in
the scene. Applications store this information in their own, custom file formats. For example, Blender stores the
scenes in .blend files, LightWave3D uses the .lws file format, 3ds Max uses the .max file format, and Maya
uses .ma files.
In order to render such 3D content, the runtime application must be able to read different input file formats. The
scene structure has to be parsed, and the 3D geometry data has to be converted into the format required by the
graphics API. The 3D data has to be transferred to the graphics card memory, and then the rendering process
can be described with sequences of graphics API calls. Thus, each runtime application has to create importers,
loaders, or converters for all file formats that it will support, as shown in Image 1b.
Image 1b: The 3D content pipeline today.
glTF: A transmission format for 3D scenes
The goal of glTF is to define a standard for representing 3D content, in a form that is suitable for use in runtime
applications. The existing file formats are not appropriate for this use case: some of them do not contain any scene
information, but only geometry data; others have been designed for exchanging data between authoring
applications, and their main goal is to retain as much information about the 3D scene as possible, resulting in
files that are usually large, complex, and hard to parse. Additionally, the geometry data may have to be
preprocessed so that it can be rendered with the client application.
None of the existing file formats were designed for the use case of efficiently transferring 3D scenes over the
web and rendering them as efficiently as possible. But glTF is not "yet another file format." It is the definition of a
transmission format for 3D scenes:
The scene structure is described with JSON, which is very compact and can easily be parsed.
The 3D data of the objects are stored in a form that can be directly used by the common graphics APIs, so
there is no overhead for decoding or preprocessing the 3D data.
Different content creation tools may now provide 3D content in the glTF format. And an increasing number of
client applications are able to consume and render glTF. Some of these applications are shown in Image 1a. So
glTF may help to bridge the gap between content creation and rendering, as shown in Image 1c.
Image 1c: The 3D content pipeline with glTF.
An increasing number of content creation tools will be able to provide glTF directly. Alternatively, other file
formats can be used to create glTF assets, using one of the open-source conversion utilities listed in the
Khronos glTF repository. For example, nearly all authoring applications can export their scenes in the COLLADA
format. So the COLLADA2GLTF tool can be used to convert scenes and models from these authoring
applications to glTF. OBJ files may be converted to glTF using obj2gltf. For other file formats, custom converters
can be used to create glTF assets, thus making the 3D content available for a broad range of runtime
applications.
The Basic Structure of glTF
The core of glTF is a JSON file. This file describes the whole contents of the 3D scene. It consists of a
description of the scene structure itself, which is given by a hierarchy of nodes that define a scene graph. The 3D
objects that appear in the scene are defined using meshes that are attached to the nodes. Materials define the
appearance of the objects. Animations describe how the 3D objects are transformed (e.g., rotated or translated)
over time, and skins define how the geometry of the objects is deformed based on a skeleton pose. Cameras
describe the view configuration for the renderer.
The JSON structure
The scene objects are stored in arrays in the JSON file. They can be accessed using the index of the respective
object in the array:
"meshes" :
[
{ ... },
{ ... },
...
],
These indices are also used to define the relationships between the objects. The example above defines multiple
meshes, and a node may refer to one of these meshes, using the mesh index, to indicate that the mesh should
be attached to this node:
"nodes":
[
{ "mesh": 0, ... },
{ "mesh": 5, ... },
...
],
The following image (adapted from the glTF concepts section) gives an overview of the top-level elements of the
JSON part of a glTF asset:
Image 2a: The glTF JSON structure
The image gives a quick overview of these elements; each of them is described in detail in the respective section
of the glTF specification. More detailed explanations of the relationships between these elements will be given in
the following sections.
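As a small, hypothetical sketch of how this index-based referencing may be resolved in JavaScript (the variable
gltfJsonString is only an assumption for the JSON text of the asset), a node and the mesh that it refers to could
be looked up like this:

// Parse the glTF JSON and resolve an index-based reference (sketch):
// a node only stores the index of its mesh in the top-level "meshes" array.
const gltf = JSON.parse(gltfJsonString);
const node = gltf.nodes[0];          // for example, { "mesh": 0, ... }
const mesh = gltf.meshes[node.mesh]; // the mesh object that the node refers to
console.log("Node 0 refers to mesh " + node.mesh, mesh);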
References to external data
The binary data, like geometry and textures of the 3D objects, are usually not contained in the JSON file.
Instead, they are stored in dedicated files, and the JSON part only contains links to these files. This allows the
binary data to be stored in a form that is very compact and can efficiently be transferred over the web.
Additionally, the data can be stored in a format that can be used directly in the renderer, without having to parse,
decode, or preprocess the data.
"Uniform
Resource Identifiers (URI)
Image 2b: The glTF structure
As shown in the image above, there are two types of objects that may contain such links to external resources,
namely buffers and images . These objects will later be explained in more detail.
Reading and managing external data
Reading and processing a glTF asset starts with parsing the JSON structure. After the structure has been
parsed, the buffer and image objects are available in the toplevel buffers and images arrays,
respectively. Each of these objects may refer to blocks of binary data. For further processing, this data is read
into memory. Usually, the data will be stored in an array so that it may be looked up using the same index
that is used for referring to the buffer or image object that it belongs to.
Binary data in buffers
A buffer contains a URI that points to a file containing the raw, binary buffer data:
"buffer01": {
"byteLength": 12352,
"type": "arraybuffer",
"uri": "buffer01.bin"
}
Image data in images
An image may refer to an external image file that can be used as the texture of a rendered object:
"image01": {
"uri": "image01.png"
}
The reference is given as a URI that usually points to a PNG or JPG file. These formats significantly reduce the
size of the files so that they may efficiently be transferred over the web. In some cases, the image objects may
not refer to an external file, but to data that is stored in a buffer . The details of this indirection will be explained
in the Textures, Images, and Samplers section.
Binary data in data URIs
Usually, the URIs that are contained in the buffer and image objects will point to a file that contains the actual
data. As an alternative, the data may be embedded into the JSON, in binary format, by using a data URI.
A Minimal glTF File
The following is a minimal but complete glTF asset, containing a single, indexed triangle. You can copy and
paste it into a .gltf file, and every glTF-based application should be able to load and render it. This section will
explain the basic concepts of glTF based on this example.
{
"scenes" : [
{
"nodes" : [ 0 ]
}
],
"nodes" : [
{
"mesh" : 0
}
],
"meshes" : [
{
"primitives" : [ {
"attributes" : {
"POSITION" : 1
},
"indices" : 0
} ]
}
],
"buffers" : [
{
"uri" : "data:application/octet‐
stream;base64,AAABAAIAAAAAAAAAAAAAAAAAAAAAAIA/AAAAAAAAAAAAAAAAAACAPwAAAAA=",
"byteLength" : 44
}
],
"bufferViews" : [
{
"buffer" : 0,
"byteOffset" : 0,
"byteLength" : 6,
"target" : 34963
},
{
"buffer" : 0,
"byteOffset" : 8,
"byteLength" : 36,
"target" : 34962
}
],
"accessors" : [
{
"bufferView" : 0,
"byteOffset" : 0,
"componentType" : 5123,
"count" : 3,
"type" : "SCALAR",
"max" : [ 2 ],
"min" : [ 0 ]
},
{
"bufferView" : 1,
"byteOffset" : 0,
"componentType" : 5126,
"count" : 3,
"type" : "VEC3",
"max" : [ 1.0, 1.0, 0.0 ],
"min" : [ 0.0, 0.0, 0.0 ]
}
],
"asset" : {
"version" : "2.0"
}
}
Image 3a: A single triangle.
The example here consists of a single scene. It refers to the only node in this example, which is the node with
the index 0. This node, in turn, refers to the only mesh, which has the index 0:
"scenes" : [
{
"nodes" : [ 0 ]
}
],
"nodes" : [
{
"mesh" : 0
}
],
More details about scenes and nodes and their properties will be given in the Scenes and Nodes section.
The meshes
A mesh represents an actual geometric object that appears in the scene. The mesh itself usually does not have
any properties, but only contains an array of mesh.primitive objects, which serve as building blocks for larger
models. Each mesh primitive contains a description of the geometry data that the mesh consists of.
"meshes" : [
{
"primitives" : [ {
"attributes" : {
"POSITION" : 1
},
"indices" : 0
} ]
}
],
A more detailed description of meshes and mesh primitives can be found in the meshes section.
Buffers
A buffer defines a block of raw, unstructured data with no inherent meaning. It contains a uri , which can
either point to an external file that contains the data, or it can be a data URI that encodes the binary data directly
in the JSON file.
In the example file, the second approach is used: there is a single buffer, containing 44 bytes, and the data of
this buffer is encoded as a data URI:
"buffers" : [
{
"uri" : "data:application/octet‐
stream;base64,AAABAAIAAAAAAAAAAAAAAAAAAAAAAIA/AAAAAAAAAAAAAAAAAACAPwAAAAA=",
"byteLength" : 44
}
],
This data contains the indices of the triangle, and the vertex positions of the triangle. But in order to actually use
this data as the geometry data of a mesh primitive, additional information about the structure of this data is
required. This information about the structure is encoded in the bufferView and accessor objects.
Buffer views
A bufferView describes a "chunk" or a "slice" of the whole, raw buffer data. In the given example, there are two
buffer views. They both refer to the same buffer. The first buffer view refers to the part of the buffer that contains
the data of the indices: it has a byteOffset of 0 referring to the whole buffer data, and a byteLength of 6. The
second buffer view refers to the part of the buffer that contains the vertex positions. It starts at a byteOffset of
8, and has a byteLength of 36; that is, it extends to the end of the whole buffer.
"bufferViews" : [
{
"buffer" : 0,
"byteOffset" : 0,
"byteLength" : 6,
"target" : 34963
},
{
"buffer" : 0,
"byteOffset" : 8,
"byteLength" : 36,
"target" : 34962
}
],
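A loader may turn each bufferView into a view of the previously loaded buffer data. The following sketch assumes
that bufferDataArray holds one ArrayBuffer per buffer, indexed like the glTF buffers array, as described in the
previous section:

// Create a DataView for the slice of the buffer data that a bufferView describes (sketch)
function getBufferViewData(gltf, bufferViewIndex, bufferDataArray) {
    const bufferView = gltf.bufferViews[bufferViewIndex];
    const arrayBuffer = bufferDataArray[bufferView.buffer];
    const byteOffset = bufferView.byteOffset || 0;
    return new DataView(arrayBuffer, byteOffset, bufferView.byteLength);
}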
Accessors
The second step of structuring the data is accomplished with accessor objects. They define how the data of a
bufferView has to be interpreted by providing information about the data types and the layout.
In the example, there are two accessor objects.
The first accessor describes the indices of the triangle. It refers to the bufferView with index 0, which is the
part of the buffer data that contains the indices, and its count , type , and componentType properties say that
there are three scalar elements, each being an unsigned short integer.
The second accessor describes the vertex positions. It contains a reference to the relevant part of the buffer
data, via the bufferView with index 1, and its count , type , and componentType properties say that there are
three elements of 3D vectors, each having float components.
"accessors" : [
{
"bufferView" : 0,
"byteOffset" : 0,
"componentType" : 5123,
"count" : 3,
"type" : "SCALAR",
"max" : [ 2 ],
"min" : [ 0 ]
},
{
"bufferView" : 1,
"byteOffset" : 0,
"componentType" : 5126, 3个 vec3<float>
"count" : 3,
"type" : "VEC3",
"max" : [ 1.0, 1.0, 0.0 ],
"min" : [ 0.0, 0.0, 0.0 ]
}
],
"meshes" : [
{
"primitives" : [ {
"attributes" : {
"POSITION" : 1
},
"indices" : 0
} ]
}
],
"asset" : {
"version" : "2.0"
}
Scenes and Nodes
Scenes
There may be multiple scenes stored in one glTF file, but in many cases, there will be only a single scene, which
then also is the default scene. Each scene contains an array of nodes , which are the indices of the root nodes
of the scene graphs. Again, there may be multiple root nodes, forming different hierarchies, but in many cases,
the scene will have a single root node. The simplest possible scene description has already been shown in
the previous section, consisting of a single scene with a single node:
"scenes" : [
{
"nodes" : [ 0 ]
}
],
"nodes" : [
{
"mesh" : 0
}
],
Nodes forming the scene graph
Each node can contain an array called children that contains the indices of its child nodes. So each node is
one element of a hierarchy of nodes, and together they define the structure of the scene as a scene graph.
Image 4a: The scene graph representation stored in the glTF JSON.
This hierarchy of nodes can be processed with a simple recursive traversal, given here as pseudocode:
traverse(node) {
// Process the meshes, cameras, etc., that are
// attached to this node - discussed later
processElements(node);
// Recursively process all children
for each (child in node.children) {
traverse(child);
}
}
In practice, some additional information will be required for the traversal: the processing of some elements that
are attached to nodes will require information about which node they are attached to. Additionally, the information
about the transforms of the nodes has to be accumulated during the traversal.
Local and global transforms
Each node can have a transform. Such a transform will define a translation, rotation, and/or scale. This transform
will be applied to all elements attached to the node itself and to all its child nodes. The hierarchy of nodes thus
allows one to structure the translations, rotations, and scalings that are applied to the scene elements.
Local transforms of nodes
There are different possible representations for the local transform of a node. The transform can be given directly
by the matrix property of the node. This is an array of 16 floating point numbers that describe the matrix in
column-major order. For example, the following matrix describes a scaling by (2, 1, 0.5), a rotation of 30
degrees around the x-axis, and a translation by (10, 20, 30):
"node0": {
"matrix": [
2.0, 0.0, 0.0, 0.0,
0.0, 0.866, 0.5, 0.0,
0.0, -0.25, 0.433, 0.0,
10.0, 20.0, 30.0, 1.0
]
}
The matrix defined here is as shown in Image 4b.
Image 4b: An example matrix.
"node0": {
"translation": [ 10.0, 20.0, 30.0 ],
"rotation": [ 0.259, 0.0, 0.0, 0.966 ],
"scale": [ 2.0, 1.0, 0.5 ]
}
The translation contains the x, y, and z components of the translation, the rotation is a quaternion given as
(x, y, z, w), and the scale contains the scaling factors along the three axes. Each of these properties can be
used to create a matrix, and the product of these matrices then is the local transform of the node:
Image 4c: A translation matrix.
Image 4d: A rotation matrix.
Image 4e: A scale matrix.
When computing the final, local transform matrix of the node, these matrices are multiplied together. It is
important to perform the multiplication of these matrices in the right order. The local transform matrix always has
to be computed as M = T * R * S , where T is the matrix for the translation part, R is the matrix for the
rotation part, and S is the matrix for the scale part. So the pseudocode for the computation is
translationMatrix = createTranslationMatrix(node.translation);
rotationMatrix = createRotationMatrix(node.rotation);
scaleMatrix = createScaleMatrix(node.scale);
localTransform = translationMatrix * rotationMatrix * scaleMatrix;
For the example matrices given above, the final, local transform matrix of the node will be as shown in Image 4f.
Image 4f: The final local transform matrix computed from the TRS properties.
This matrix will cause the vertices of the meshes to be scaled, then rotated, and then translated according to the
scale , rotation , and translation properties that have been given in the node.
When any of the three properties is not given, the identity matrix will be used. Similarly, when a node contains
neither a matrix property nor TRS properties, then its local transform will be the identity matrix.
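The following sketch computes the local transform of a node as a 4×4 matrix. It assumes that the gl-matrix
library is available; the fallback values correspond to the identity transform described above:

import { mat4 } from "gl-matrix";

// Compute the local transform of a node, from either its matrix or its TRS properties (sketch)
function computeLocalTransform(node) {
    if (node.matrix !== undefined) {
        return mat4.clone(node.matrix);
    }
    const translation = node.translation || [0.0, 0.0, 0.0];
    const rotation = node.rotation || [0.0, 0.0, 0.0, 1.0];
    const scale = node.scale || [1.0, 1.0, 1.0];
    // This corresponds to M = T * R * S
    return mat4.fromRotationTranslationScale(mat4.create(), rotation, translation, scale);
}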
Global transforms of nodes
Regardless of the representation in the JSON file, the local transform of a node can be stored as a 4×4 matrix.
The global transform of a node is given by the product of all local transforms on the path from the root to the
respective node:
Structure:              local transform      global transform
root                    R                    R
 +- nodeA               A                    R*A
     +- nodeB           B                    R*A*B
     +- nodeC           C                    R*A*C
It is important to point out that these global transforms can not be computed only once after the file has been loaded.
Later, it will be shown how animations may modify the local transforms of individual nodes. And these
modifications will affect the global transforms of all descendant nodes. Therefore, when the global transform of a
node is required, it has to be computed directly from the current local transforms of all nodes. Alternatively, and
as a potential performance improvement, an implementation could cache the global transforms, detect changes
in the local transforms of ancestor nodes, and update the global transforms only when necessary. The different
implementation options for this will depend on the programming language and the requirements for the client
application, and thus are beyond the scope of this tutorial.
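As an illustration of this recomputation, the following sketch computes the global transform of every node that is
reachable from a given root, multiplying the parent's global transform with each node's local transform during the
traversal. It uses the computeLocalTransform helper sketched above and assumes the gl-matrix library:

import { mat4 } from "gl-matrix";

// Recursively compute the global transforms of all nodes in a subtree (sketch)
function computeGlobalTransforms(gltf, nodeIndex, parentGlobalTransform, globalTransforms) {
    const node = gltf.nodes[nodeIndex];
    const localTransform = computeLocalTransform(node);
    const globalTransform = mat4.multiply(mat4.create(), parentGlobalTransform, localTransform);
    globalTransforms[nodeIndex] = globalTransform;
    for (const childIndex of node.children || []) {
        computeGlobalTransforms(gltf, childIndex, globalTransform, globalTransforms);
    }
}

// Usage: start at each root node of a scene
// const globalTransforms = [];
// for (const root of gltf.scenes[0].nodes) {
//     computeGlobalTransforms(gltf, root, mat4.create(), globalTransforms);
// }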
Buffers, BufferViews, and Accessors
An example of buffer , bufferView , and accessor objects was already given in the Minimal glTF File section.
This section will explain these concepts in more detail.
Buffers
A buffer represents a block of raw binary data, without an inherent structure or meaning. This data is referred
to by a buffer using its uri . This URI may either point to an external file, or be a data URI that encodes the
binary data directly in the JSON file. The minimal glTF file contained an example of a buffer , with 44 bytes of
data, encoded in a data URI:
"buffers" : [
{
"uri" : "data:application/octet‐
stream;base64,AAABAAIAAAAAAAAAAAAAAAAAAAAAAIA/AAAAAAAAAAAAAAAAAACAPwAAAAA=",
"byteLength" : 44
}
],
Image 5a: The buffer data, consisting of 44 bytes.
BufferViews
The first step of structuring the data from a buffer is accomplished with bufferView objects. A bufferView represents a
"slice" of the data of one buffer. This slice is defined using an offset and a length, in bytes. The minimal glTF file
defined two bufferView objects:
"bufferViews" : [
{
"buffer" : 0,
"byteOffset" : 0,
"byteLength" : 6,
"target" : 34963
},
{
"buffer" : 0,
"byteOffset" : 8,
"byteLength" : 36,
"target" : 34962
}
],
Image 5b: The buffer views, referring to parts of the buffer.
The bytes that are shown in light gray are padding bytes that are required for properly aligning the accessors, as
described below.
Data type
The type of an accessor's data is encoded in the type and the componentType properties. The value of the
type property is a string that specifies whether the data elements are scalars, vectors, or matrices. For
example, the value may be "SCALAR" for scalar values, "VEC3" for 3D vectors, or "MAT4" for 4×4 matrices.
The componentType specifies the type of the components of these elements. This is a GL constant that may, for
example, be 5123 (UNSIGNED_SHORT) or 5126 (FLOAT), to indicate that the elements have unsigned short or
floating point components, respectively.
Different combinations of these properties may be used to describe arbitrary data types. For example, the
minimal glTF file contained two accessors:
"accessors" : [
{
"bufferView" : 0,
"byteOffset" : 0,
"componentType" : 5123,
"count" : 3,
"type" : "SCALAR",
"max" : [ 2 ],
"min" : [ 0 ]
},
{
"bufferView" : 1,
"byteOffset" : 0,
"componentType" : 5126,
"count" : 3,
"type" : "VEC3",
"max" : [ 1.0, 1.0, 0.0 ],
"min" : [ 0.0, 0.0, 0.0 ]
}
],
Data layout
Additional properties of an accessor further specify the layout of the data. The count property of an accessor
indicates how many data elements it consists of. In the example above, the count has been 3 for both
accessors, standing for the three indices and the three vertices of the triangle, respectively. Each accessor also
has a byteOffset property. For the example above, it has been 0 for both accessors, because there was only
one accessor for each bufferView . But when multiple accessors refer to the same bufferView , then the
byteOffset describes where the data of the accessor starts, relative to the bufferView that it refers to.
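For tightly packed (non-interleaved) data, the count , type , componentType , and byteOffset properties are all
that is needed to create a typed array for an accessor. The following sketch assumes that bufferDataArray holds
one ArrayBuffer per buffer, as before; it ignores the byteStride of interleaved data, which is discussed below:

// Number of components per element for each accessor type
const NUM_COMPONENTS = {
    "SCALAR" : 1, "VEC2" : 2, "VEC3" : 3, "VEC4" : 4,
    "MAT2" : 4, "MAT3" : 9, "MAT4" : 16
};
// Typed array constructors for the glTF componentType constants
const COMPONENT_TYPE_ARRAYS = {
    5120 : Int8Array, 5121 : Uint8Array, 5122 : Int16Array,
    5123 : Uint16Array, 5125 : Uint32Array, 5126 : Float32Array
};

// Create a typed array for the (tightly packed) data of one accessor (sketch)
function getAccessorData(gltf, accessorIndex, bufferDataArray) {
    const accessor = gltf.accessors[accessorIndex];
    const bufferView = gltf.bufferViews[accessor.bufferView];
    const arrayBuffer = bufferDataArray[bufferView.buffer];
    const byteOffset = (bufferView.byteOffset || 0) + (accessor.byteOffset || 0);
    const TypedArray = COMPONENT_TYPE_ARRAYS[accessor.componentType];
    const elementCount = accessor.count * NUM_COMPONENTS[accessor.type];
    return new TypedArray(arrayBuffer, byteOffset, elementCount);
}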
Data alignment
The data that is referred to by an accessor may be sent to the graphics card for rendering, or be used at the
host side as animation or skinning data. Therefore, the data of an accessor has to be aligned based on the type
of the data. For example, when the componentType of an accessor is 5126 ( FLOAT ), then the data must be
aligned at 4byte boundaries, because a single float value consists of four bytes. This alignment requirement
of an accessor refers to its bufferView and the underlying buffer. Particularly, the byteOffset of an accessor
into its bufferView, and the total offset of the accessor into the underlying buffer (the sum of the byteOffset
values of the accessor and its bufferView), must both be multiples of the size of the accessor's componentType.
This is why, in the examples above, the buffer view with the vertex positions starts at a byteOffset of 8 rather
than 6: two padding bytes ensure that the float data is aligned at a 4-byte boundary.
Image 5c: The accessors defining how to interpret the data of the buffer views.
Data interleaving
The data of the attributes that are stored in a single bufferView may be stored as an Array of Structures. A
single bufferView may, for example, contain the data for vertex positions and for vertex normals in an
interleaved fashion. In this case, the byteOffset of an accessor defines the start of the first relevant data
element for the respective attribute, and the bufferView defines an additional byteStride property. This is the
number of bytes between the start of one element of its accessors, and the start of the next one. An example of
how interleaved position and normal attributes are stored inside a bufferView is shown in Image 5d.
Image 5d: Interleaved accessors in one buffer view.
Data contents
An accessor also contains min and max properties that summarize the contents of their data. They are the
component-wise minimum and maximum values of all data elements contained in the accessor. In the case of
vertex positions, the min and max properties thus define the bounding box of an object. This can be useful for
prioritizing downloads, or for visibility detection. In general, this information is also useful for storing and
processing quantized data that is dequantized at runtime, by the renderer, but details of this quantization are
beyond the scope of this tutorial.
Sparse accessors
With version 2.0, the concept of sparse accessors was introduced in glTF. This is a special representation of
data that allows very compact storage of multiple data blocks that have only a few different entries. For example,
when there is geometry data that contains vertex positions, this geometry data may be used for multiple objects.
This may be achieved by referring to the same accessor from both objects. If the vertex positions for both
objects are mostly the same and differ for only a few vertices, then it is not necessary to store the whole
geometry data twice. Instead, it is possible to store the data only once, and use a sparse accessor to store only
the vertex positions that differ for the second object.
The following is a complete glTF asset, in embedded representation, that shows an example of sparse
accessors:
{
"scenes" : [ {
"nodes" : [ 0 ]
} ],
"nodes" : [ {
"mesh" : 0
} ],
"meshes" : [ {
"primitives" : [ {
"attributes" : {
"POSITION" : 1
},
"indices" : 0
} ]
} ],
"buffers" : [ {
"uri" : "data:application/gltf‐
buffer;base64,AAAIAAcAAAABAAgAAQAJAAgAAQACAAkAAgAKAAkAAgADAAoAAwALAAoAAwAEAAsABAAMAAsABAAFAAwABQA
NAAwABQAGAA0AAAAAAAAAAAAAAAAAAACAPwAAAAAAAAAAAAAAQAAAAAAAAAAAAABAQAAAAAAAAAAAAACAQAAAAAAAAAAAAACg
QAAAAAAAAAAAAADAQAAAAAAAAAAAAAAAAAAAgD8AAAAAAACAPwAAgD8AAAAAAAAAQAAAgD8AAAAAAABAQAAAgD8AAAAAAACAQ
AAAgD8AAAAAAACgQAAAgD8AAAAAAADAQAAAgD8AAAAACAAKAAwAAAAAAIA/AAAAQAAAAAAAAEBAAABAQAAAAAAAAKBAAACAQA
AAAAA=",
"byteLength" : 284
} ],
"bufferViews" : [ {
"buffer" : 0,
"byteOffset" : 0,
"byteLength" : 72,
"target" : 34963
}, {
"buffer" : 0,
"byteOffset" : 72,
"byteLength" : 168
}, {
"buffer" : 0,
"byteOffset" : 240,
"byteLength" : 6
}, {
"buffer" : 0,
"byteOffset" : 248,
"byteLength" : 36
} ],
"accessors" : [ {
"bufferView" : 0,
"byteOffset" : 0,
"componentType" : 5123,
"count" : 36,
"type" : "SCALAR",
"max" : [ 13 ],
"min" : [ 0 ]
}, {
"bufferView" : 1,
"byteOffset" : 0,
"componentType" : 5126,
"count" : 14,
"type" : "VEC3",
"max" : [ 6.0, 4.0, 0.0 ],
"min" : [ 0.0, 0.0, 0.0 ],
"sparse" : {
"count" : 3,
"indices" : {
"bufferView" : 2,
"byteOffset" : 0,
"componentType" : 5123
},
"values" : {
"bufferView" : 3,
"byteOffset" : 0
}
}
} ],
"asset" : {
"version" : "2.0"
}
}
The result of rendering this asset is shown in Image 5e:
Image 5e: The result of rendering the simple sparse accessor asset.
The example contains two accessors: one for the indices of the mesh, and one for the vertex positions. The one
that refers to the vertex positions defines an additional accessor.sparse property, which contains the
information about the sparse data substitution that should be applied:
"accessors" : [
...
{
"bufferView" : 1,
"byteOffset" : 0,
"componentType" : 5126,
"count" : 14,
"type" : "VEC3",
"max" : [ 6.0, 4.0, 0.0 ],
"min" : [ 0.0, 0.0, 0.0 ],
"sparse" : {
"count" : 3,
"indices" : {
"bufferView" : 2,
"byteOffset" : 0,
"componentType" : 5123
},
"values" : {
"bufferView" : 3,
"byteOffset" : 0
}
}
} ],
This sparse object itself defines the count of elements that will be affected by the substitution. The
sparse.indices property refers to a bufferView that contains the indices of the elements which will be
replaced. The sparse.values property refers to a bufferView that contains the actual data.
Image 5f: The substitution that is done with the sparse accessor.
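A simple way to perform this substitution at loading time is to copy the base accessor data and then overwrite the
affected elements. In the following sketch, baseData, sparseIndices, and sparseValues are assumed to be typed
arrays that have already been read from the respective bufferViews, as described above:

// Apply a sparse substitution to a copy of the base accessor data (sketch)
function applySparseSubstitution(baseData, numComponents, sparseIndices, sparseValues) {
    const result = baseData.slice(); // copy, so that the shared base data stays untouched
    for (let i = 0; i < sparseIndices.length; i++) {
        const targetElement = sparseIndices[i];
        for (let c = 0; c < numComponents; c++) {
            result[targetElement * numComponents + c] = sparseValues[i * numComponents + c];
        }
    }
    return result;
}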
A Simple Animation
As shown in the Scenes and Nodes section, each node can have a local transform. This transform can be given
either by the matrix property of the node or by using the translation , rotation , and scale (TRS)
properties.
The following is the minimal glTF file that was shown previously, but extended with an animation. This section will
explain the changes and extensions that have been made to add this animation.
{
"scenes" : [
{
"nodes" : [ 0 ]
}
],
"nodes" : [
{
"mesh" : 0,
"rotation" : [ 0.0, 0.0, 0.0, 1.0 ]
}
],
"meshes" : [
{
"primitives" : [ {
"attributes" : {
"POSITION" : 1
},
"indices" : 0
} ]
}
],
"animations": [
{
"samplers" : [
{
"input" : 2,
"interpolation" : "LINEAR",
"output" : 3
}
],
"channels" : [ {
"sampler" : 0,
"target" : {
"node" : 0,
"path" : "rotation"
}
} ]
}
],
"buffers" : [
{
"uri" : "data:application/octet‐
stream;base64,AAABAAIAAAAAAAAAAAAAAAAAAAAAAIA/AAAAAAAAAAAAAAAAAACAPwAAAAA=",
"byteLength" : 44
},
{
"uri" : "data:application/octet‐
stream;base64,AAAAAAAAgD4AAAA/AABAPwAAgD8AAAAAAAAAAAAAAAAAAIA/AAAAAAAAAAD0/TQ/9P00PwAAAAAAAAAAAAC
APwAAAAAAAAAAAAAAAPT9ND/0/TS/AAAAAAAAAAAAAAAAAACAPw==",
"byteLength" : 100
}
],
"bufferViews" : [
{
"buffer" : 0,
"byteOffset" : 0,
"byteLength" : 6,
"target" : 34963
},
{
"buffer" : 0,
"byteOffset" : 8,
"byteLength" : 36,
"target" : 34962
},
{
"buffer" : 1,
"byteOffset" : 0,
"byteLength" : 100
}
],
"accessors" : [
{
"bufferView" : 0,
"byteOffset" : 0,
"componentType" : 5123,
"count" : 3,
"type" : "SCALAR",
"max" : [ 2 ],
"min" : [ 0 ]
},
{
"bufferView" : 1,
"byteOffset" : 0,
"componentType" : 5126,
"count" : 3,
"type" : "VEC3",
"max" : [ 1.0, 1.0, 0.0 ],
"min" : [ 0.0, 0.0, 0.0 ]
},
{
"bufferView" : 2,
"byteOffset" : 0,
"componentType" : 5126,
"count" : 5,
"type" : "SCALAR",
"max" : [ 1.0 ],
"min" : [ 0.0 ]
},
{
"bufferView" : 2,
"byteOffset" : 20,
"componentType" : 5126,
"count" : 5,
"type" : "VEC4",
"max" : [ 0.0, 0.0, 1.0, 1.0 ],
"min" : [ 0.0, 0.0, 0.0, ‐0.707 ]
}
],
"asset" : {
"version" : "2.0"
}
}
Image 6a: A single, animated triangle.
"nodes" : [
{
"mesh" : 0,
"rotation" : [ 0.0, 0.0, 0.0, 1.0 ]
}
],
The given value is the quaternion describing a "rotation about 0 degrees," so the triangle will be shown in its initial
orientation.
The animation data
Three elements have been added to the top-level arrays of the glTF JSON to encode the animation data:
"buffers" : [
...
{
"uri" : "data:application/octet‐
stream;base64,AAAAAAAAgD4AAAA/AABAPwAAgD8AAAAAAAAAAAAAAAAAAIA/AAAAAAAAAAD0/TQ/9P00PwAAAAAAAAAAAAC
APwAAAAAAAAAAAAAAAPT9ND/0/TS/AAAAAAAAAAAAAAAAAACAPw==",
"byteLength" : 100
}
],
"bufferViews" : [
...
{
"buffer" : 1,
"byteOffset" : 0,
"byteLength" : 100
}
],
Note that one could also have appended the animation data to the existing buffer that already contained the
geometry data of the triangle. In this case, the new buffer view would have referred to the buffer with index 0,
and used an appropriate byteOffset to refer to the part of the buffer that then contained the animation data.
In the example that is shown here, the animation data is added as a new buffer to keep the geometry data and
the animation data separated.
Two new accessors have also been added: the accessor with index 2, referred to here as the times accessor,
provides the key frame times, and the accessor with index 3, the rotations accessor, provides the rotation of the
triangle for each key frame, as a quaternion.
The actual data that is provided by the times accessor and the rotations accessor, using the data from the buffer
in the example, is shown in this table:
| time | rotation | meaning |
| 0.0 | (0.0, 0.0, 0.0, 1.0) | At 0.0 seconds, the triangle has its initial orientation (0 degrees) |
| 0.25 | (0.0, 0.0, 0.707, 0.707) | At 0.25 seconds, it has a rotation of 90 degrees around the z-axis |
| 0.5 | (0.0, 0.0, 1.0, 0.0) | At 0.5 seconds, it has a rotation of 180 degrees around the z-axis |
| 0.75 | (0.0, 0.0, 0.707, -0.707) | At 0.75 seconds, it has a rotation of 270 (= -90) degrees around the z-axis |
| 1.0 | (0.0, 0.0, 0.0, 1.0) | At 1.0 seconds, it has a rotation of 360 (= 0) degrees around the z-axis |
So this animation describes a rotation of 360 degrees around the z-axis that lasts 1 second.
The animation
Finally, this is the part where the actual animation is added. The top-level animations array contains a single
animation object. It consists of two elements:
"animations": [
{
"samplers" : [ buffer view idx
{
"input" : 2,
"interpolation" : "LINEAR",
"output" : 3
}
],
"channels" : [ {
"sampler" : 0,
"target" : {
"node" : 0,
"path" : "rotation"
}
} ]
}
],
Combining all this information, the given animation object says the following:
During the animation, the animated values are obtained from the rotations accessor. They are interpolated
linearly, based on the current simulation time and the key frame times that are provided by the times
accessor. The interpolated values are then written into the "rotation" property of the node with index 0.
A more detailed description and actual examples for the interpolation and the computations that are involved here
can be found in the Animations section.
Animations
As shown in the Simple Animation example, an animation can be used to describe how the translation ,
rotation , or scale properties of nodes change over time.
"animations": [
{
"samplers" : [
{
"input" : 2,
"interpolation" : "LINEAR",
"output" : 3
},
{
"input" : 2,
"interpolation" : "LINEAR",
"output" : 4
}
],
"channels" : [
{
"sampler" : 0,
"target" : {
"node" : 0,
"path" : "rotation"
}
},
{
"sampler" : 1,
"target" : {
"node" : 0,
"path" : "translation"
}
}
]
}
],
Animation samplers
The samplers array contains animation.sampler objects that define how the values that are provided by the
accessors have to be interpolated between the key frames, as shown in Image 7a. The input of each sampler
refers to an accessor that contains the key frame times, and the output refers to an accessor that contains the
corresponding key frame values.
Image 7a: Animation samplers.
In order to compute the value of the translation for the current animation time, the following algorithm can be
used:
Let the current animation time be given as currentTime .
Compute the next smaller and the next larger element of the times accessor:
previousTime = The largest element from the times accessor that is smaller than the currentTime
nextTime = The smallest element from the times accessor that is larger than the currentTime
Obtain the elements from the translations accessor that correspond to these times:
previousTranslation = The element from the translations accessor that corresponds to the previousTime
nextTranslation = The element from the translations accessor that corresponds to the nextTime
Compute the interpolation value. This is a value between 0.0 and 1.0 that describes the relative position of
the currentTime , between the previousTime and the nextTime :
interpolationValue = (currentTime - previousTime) / (nextTime - previousTime)
Use the interpolation value to compute the translation for the current time:
currentTranslation = previousTranslation + interpolationValue * (nextTranslation - previousTranslation)
Example:
Imagine the currentTime is 1.2. The next smaller element from the times accessor is 0.8. The next larger
element is 1.6. So
previousTime = 0.8
nextTime = 1.6
The corresponding values from the translations accessor can be looked up:
previousTranslation = (14.0, 3.0, -2.0)
nextTranslation = (18.0, 1.0, 1.0)
The interpolation value can be computed:
interpolationValue = (currentTime - previousTime) / (nextTime - previousTime)
                   = (1.2 - 0.8) / (1.6 - 0.8)
                   = 0.4 / 0.8
                   = 0.5
From the interpolation value, the current translation can be computed:
currentTranslation = previousTranslation + interpolationValue * (nextTranslation - previousTranslation)
                   = (14.0, 3.0, -2.0) + 0.5 * ( (18.0, 1.0, 1.0) - (14.0, 3.0, -2.0) )
                   = (14.0, 3.0, -2.0) + 0.5 * (4.0, -2.0, 3.0)
                   = (16.0, 2.0, -0.5)
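The same computation can be written as a small function. The following sketch evaluates a LINEAR sampler for
VEC3 (translation) output; times and values are assumed to be the typed arrays of the input and output
accessors, and clamping of currentTime to the key frame range is omitted for brevity:

// Linearly interpolate a VEC3 animation sampler at the given time (sketch)
function sampleVec3Linear(times, values, currentTime) {
    // Find the key frame interval that contains the current time
    let next = 1;
    while (next < times.length - 1 && times[next] < currentTime) {
        next++;
    }
    const previous = next - 1;
    const interpolationValue =
        (currentTime - times[previous]) / (times[next] - times[previous]);
    const result = [];
    for (let c = 0; c < 3; c++) {
        const p = values[previous * 3 + c];
        const n = values[next * 3 + c];
        result[c] = p + interpolationValue * (n - p);
    }
    return result;
}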
Animation channels
The animations contain an array of animation.channel objects. The channels establish the connection between
the input, which is the value that is computed from the sampler, and the output, which is the animated node
property. Therefore, each channel refers to one sampler, using the index of the sampler, and contains an
animation.channel.target . The target refers to a node, using the index of the node, and contains a path
that defines the property of the node that should be animated. The value from the sampler will be written into this
property.
In the example above, there are two channels for the animation. Both refer to the same node. The path of the first
channel refers to the rotation of the node, and the path of the second channel refers to the translation of
the node. So all objects (meshes) that are attached to the node will be translated and rotated by the animation,
as shown in Image 7b.
Image 7b: Animation channels.
Interpolation
The above example only covers LINEAR interpolation. Animations in a glTF asset can use three interpolation
modes:
STEP
LINEAR
CUBICSPLINE
Step
The STEP interpolation is not really an interpolation mode; it makes objects jump from keyframe to keyframe
without any sort of interpolation. When a sampler defines a STEP interpolation, simply apply the transformation of
the keyframe corresponding to previousTime .
Linear
Linear interpolation exactly corresponds to the above example. The general case is:
Calculate the interpolationValue :
interpolationValue = (currentTime - previousTime) / (nextTime - previousTime)
Point lerp(previousPoint, nextPoint, interpolationValue)
    return previousPoint + interpolationValue * (nextPoint - previousPoint)
In the case of rotations expressed as quaternions, you need to perform a spherical linear interpolation ( slerp )
between the previous and next values:
Quat slerp(previousQuat, nextQuat, interpolationValue)
    var dotProduct = dot(previousQuat, nextQuat)
    // make sure we take the shortest path in case the dot product is negative
    if (dotProduct < 0.0)
        nextQuat = -nextQuat
        dotProduct = -dotProduct
    // if the two quaternions are too close to each other, just linearly interpolate between the 4D vectors
    if (dotProduct > 0.9995)
        return normalize(previousQuat + interpolationValue * (nextQuat - previousQuat))
    // perform the spherical linear interpolation
    var theta_0 = acos(dotProduct)
    var theta = interpolationValue * theta_0
    var sin_theta = sin(theta)
    var sin_theta_0 = sin(theta_0)
    var scalePreviousQuat = cos(theta) - dotProduct * sin_theta / sin_theta_0
    var scaleNextQuat = sin_theta / sin_theta_0
    return scalePreviousQuat * previousQuat + scaleNextQuat * nextQuat
This example implementation is inspired by the Wikipedia article on slerp.
Cubic spline interpolation
Cubic spline interpolation needs more data than just the previous and next keyframe times and values; it also
needs, for each keyframe, a pair of tangent vectors that smooth out the curve around the keyframe points.
These tangents are stored in the sampler output data. For each keyframe described by the animation sampler, the
sampler output contains 3 elements:
The input tangent of the keyframe
The keyframe value
The output tangent
The input and output tangents are normalized vectors that need to be scaled by the duration of the keyframe
interval, which we call the deltaTime :
deltaTime = nextTime - previousTime
Note: the input tangent of the first keyframe and the output tangent of the last keyframe are ignored.
To calculate the actual tangents of a keyframe, you need to multiply the direction vectors obtained from the
sampler output by deltaTime :
previousTangent = deltaTime * previousOutputTangent
nextTangent = deltaTime * nextInputTangent
The mathematical function is described in Appendix C of the glTF 2.0 specification.
Here's a corresponding pseudocode snippet:
Point cubicSpline(previousPoint, previousTangent, nextPoint, nextTangent, interpolationValue)
    t = interpolationValue
    t2 = t * t
    t3 = t2 * t
    return (2 * t3 - 3 * t2 + 1) * previousPoint + (t3 - 2 * t2 + t) * previousTangent
         + (-2 * t3 + 3 * t2) * nextPoint + (t3 - t2) * nextTangent
Simple Meshes
A mesh represents a geometric object that appears in a scene. An example of a mesh has already been shown
in the minimal glTF file. This example had a single mesh attached to a single node , and the mesh consisted of
a single mesh.primitive that contained only a single attribute—namely, the attribute for the vertex positions.
But usually, the mesh primitives will contain more attributes. These attributes may, for example, be the vertex
normals or texture coordinates.
The following is a glTF asset that contains a simple mesh with multiple attributes, which will serve as the basis
for explaining the related concepts:
{
"scenes" : [
{
"nodes" : [ 0, 1]
}
],
"nodes" : [
{
"mesh" : 0
},
{
"mesh" : 0,
"translation" : [ 1.0, 0.0, 0.0 ]
}
],
"meshes" : [
{
"primitives" : [ {
"attributes" : {
"POSITION" : 1,
"NORMAL" : 2
},
"indices" : 0
} ]
}
],
"buffers" : [
{
"uri" : "data:application/octet‐
stream;base64,AAABAAIAAAAAAAAAAAAAAAAAAAAAAIA/AAAAAAAAAAAAAAAAAACAPwAAAAAAAAAAAAAAAAAAgD8AAAAAAAA
AAAAAgD8AAAAAAAAAAAAAgD8=",
"byteLength" : 80
}
],
"bufferViews" : [
{
"buffer" : 0,
"byteOffset" : 0,
"byteLength" : 6,
"target" : 34963
},
{
"buffer" : 0,
"byteOffset" : 8,
"byteLength" : 72,
"target" : 34962
}
],
"accessors" : [
{
"bufferView" : 0,
"byteOffset" : 0,
"componentType" : 5123,
"count" : 3,
"type" : "SCALAR",
"max" : [ 2 ],
"min" : [ 0 ]
},
{
"bufferView" : 1,
"byteOffset" : 0,
"componentType" : 5126,
"count" : 3,
"type" : "VEC3",
"max" : [ 1.0, 1.0, 0.0 ],
"min" : [ 0.0, 0.0, 0.0 ]
},
{
"bufferView" : 1,
"byteOffset" : 36,
"componentType" : 5126,
"count" : 3,
"type" : "VEC3",
"max" : [ 0.0, 0.0, 1.0 ],
"min" : [ 0.0, 0.0, 1.0 ]
}
],
"asset" : {
"version" : "2.0"
}
}
Image 8a shows the rendered glTF asset.
Image 8a: A simple mesh, attached to two nodes.
The mesh definition
The given example still contains a single mesh that has a single mesh primitive. But this mesh primitive contains
multiple attributes:
"meshes" : [
{
"primitives" : [ {
"attributes" : {
"POSITION" : 1,
"NORMAL" : 2
},
"indices" : 0
} ]
}
],
The rendered mesh instances
As can be seen in Image 8a, the mesh is rendered twice. This is accomplished by attaching the mesh to two
different nodes:
"nodes" : [
{
"mesh" : 0
},
{
"mesh" : 0,
"translation" : [ 1.0, 0.0, 0.0 ]
}
],
The next section will explain meshes and mesh primitives in more detail.
Meshes
The Simple Meshes example from the previous section showed a basic example of a mesh with a
mesh.primitive object that contained several attributes. This section will explain the meaning and usage of
mesh primitives, how meshes may be attached to nodes of the scene graph, and how they can be rendered with
different materials.
Mesh primitives
Each mesh contains an array of mesh.primitive objects. These mesh primitive objects are smaller parts or
building blocks of a larger object. A mesh primitive summarizes all information about how the respective part of
the object will be rendered.
Mesh primitive attributes
A mesh primitive defines the geometry data of the object using its attributes dictionary. This geometry data is
given by references to accessor objects that contain the data of vertex attributes. The details of the accessor
concept are explained in the Buffers, BufferViews, and Accessors section.
"meshes" : [
{
"primitives" : [ {
"attributes" : {
"POSITION" : 1,
"NORMAL" : 2
},
"indices" : 0
} ]
}
],
Together, the elements of these accessors define the attributes that belong to the individual vertices, as shown in
Image 9a.
Image 9a: Mesh primitive accessors containing the data of vertices.
Indexed and non-indexed geometry
The geometry data of a mesh.primitive may be either indexed geometry or geometry without indices. In the
given example, the mesh.primitive contains indexed geometry. This is indicated by the indices property,
which refers to the accessor with index 0, defining the data for the indices. For non-indexed geometry, this
property is omitted.
Mesh primitive mode
By default, the geometry data is assumed to describe a triangle mesh. For the case of indexed geometry, this
means that three consecutive elements of the indices accessor are assumed to contain the indices of a single
triangle. For non-indexed geometry, three elements of the vertex attribute accessors are assumed to contain the
attributes of the three vertices of a triangle.
Other rendering modes are possible: the geometry data may also describe individual points, lines, or triangle
strips. This is indicated by the mode that may be stored in the mesh primitive. Its value is a constant that
indicates how the geometry data has to be interpreted. The mode may, for example, be 0 when the geometry
consists of points, or 4 when it consists of triangles. These constants correspond to the GL constants POINTS
or TRIANGLES , respectively. See the primitive.mode specification for a list of available modes.
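Since the glTF mode and componentType values use the same numeric values as the corresponding WebGL
constants, they can be passed to the draw call directly. The following sketch assumes that the vertex attributes
and the index buffer of the primitive have already been set up and bound:

// Issue the WebGL draw call for one mesh primitive (sketch)
function drawPrimitive(gl, gltf, primitive) {
    const mode = primitive.mode !== undefined ? primitive.mode : gl.TRIANGLES; // default mode is 4
    if (primitive.indices !== undefined) {
        const indexAccessor = gltf.accessors[primitive.indices];
        // For example, componentType 5123 corresponds to gl.UNSIGNED_SHORT
        gl.drawElements(mode, indexAccessor.count, indexAccessor.componentType, 0);
    } else {
        const positionAccessor = gltf.accessors[primitive.attributes["POSITION"]];
        gl.drawArrays(mode, 0, positionAccessor.count);
    }
}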
Mesh primitive material
The mesh primitive may also refer to the material that should be used for rendering, using the index of this
material. In the given example, no material is defined, causing the objects to be rendered with a default
material that just defines the objects to have a uniform 50% gray color. A detailed explanation of materials and
the related concepts will be given in the Materials section.
Meshes attached to nodes
In the example from the Simple Meshes section, there is a single scene , which contains two nodes, and both
nodes refer to the same mesh instance, which has the index 0:
"scenes" : [
{
"nodes" : [ 0, 1]
}
],
"nodes" : [
{
"mesh" : 0
},
{
"mesh" : 0,
"translation" : [ 1.0, 0.0, 0.0 ]
}
],
"meshes" : [
{ ... }
],
So in this example, the mesh will be rendered twice because it is attached to two nodes: once with the global
transform of the first node, which is the identity transform, and once with the global transform of the second
node, which is a translation of 1.0 along the x-axis.
Materials
Introduction
The purpose of glTF is to define a transmission format for 3D assets. As shown in the previous sections, this
includes information about the scene structure and the geometric objects that appear in the scene. But a glTF
asset can also contain information about the appearance of the objects; that is, how these objects should be
rendered on the screen.
There are different possible representations for the properties of a material, and the shading model describes how
these properties are processed. Simple shading models, like the Phong or Blinn-Phong, are directly supported by
common graphics APIs like OpenGL or WebGL. These shading models are built on a set of basic material
properties. For example, the material properties involve information about the color of diffusely reflected light
(often in the form of a texture), the color of specularly reflected light, and a shininess parameter. Many file
formats contain exactly these parameters. For example, Wavefront OBJ files are combined with MTL files that
contain this texture and color information. Renderers can read this information and render the objects accordingly.
But in order to describe more realistic materials, more sophisticated shading and material models are required.
Physically Based Rendering (PBR)
To allow renderers to display objects with a realistic appearance under different lighting conditions, the shading
model has to take the physical properties of the object surface into account. There are different representations
of these physical material properties. One that is frequently used is the metallic-roughness model. Here, the
information about the object surface is encoded with three main parameters:
The base color, which is the "main" color of the object surface.
The metallic value. This is a parameter that describes how much the reflective behavior of the material
resembles that of a metal.
The roughness value, indicating how rough the surface is, affecting the light scattering.
The metallic-roughness model is the representation that is used in glTF. Other material representations, like the
specular-glossiness model, are supported via extensions.
The effects of different metallic and roughness values are illustrated in this image:
Image 10a: Spheres with different metallic and roughness values.
The base color, metallic, and roughness properties may be given as single values and are then applied to the
whole object. In order to assign different material properties to different parts of the object surface, these
properties may also be given in the form of textures. This makes it possible to model a wide range of real-world
materials with a realistic appearance.
Depending on the shading model, additional effects can be applied to the object surface. These are usually given
as a combination of a texture and a scaling factor:
An emissive texture describes the parts of the object surface that emit light with a certain color.
The occlusion texture can be used to simulate the effect of objects self-shadowing each other.
The normal map is a texture applied to modulate the surface normal in a way that makes it possible to
simulate finer geometric details without the cost of a higher mesh resolution.
glTF supports all of these additional properties, and defines sensible default values for the cases in which these
properties are omitted.
The following sections will show how these material properties are encoded in a glTF asset, including various
examples of materials:
A Simple Material
Textures, Images, and Samplers that serve as a basis for defining material properties
A Simple Texture showing an example of how to use a texture for a material
An Advanced Material combining multiple textures to achieve a sophisticated surface appearance for the
objects
A Simple Material
The examples of glTF assets that have been given in the previous sections contained a basic scene structure
and simple geometric objects. But they did not contain information about the appearance of the objects. When no
such information is given, viewers are encouraged to render the objects with a "default" material. And as shown
in the screenshot of the minimal glTF file, depending on the light conditions in the scene, this default material
causes the object to be rendered with a uniformly white or light gray color.
This section will start with an example of a very simple material and explain the effect of the different material
properties.
This is a minimal glTF asset with a simple material:
{
"scenes" : [
{
"nodes" : [ 0 ]
}
],
"nodes" : [
{
"mesh" : 0
}
],
"meshes" : [
{
"primitives" : [ {
"attributes" : {
"POSITION" : 1
},
"indices" : 0,
"material" : 0
} ]
}
],
"buffers" : [
{
"uri" : "data:application/octet‐
stream;base64,AAABAAIAAAAAAAAAAAAAAAAAAAAAAIA/AAAAAAAAAAAAAAAAAACAPwAAAAA=",
"byteLength" : 44
}
],
"bufferViews" : [
{
"buffer" : 0,
"byteOffset" : 0,
"byteLength" : 6,
"target" : 34963
},
{
"buffer" : 0,
"byteOffset" : 8,
"byteLength" : 36,
"target" : 34962
}
],
"accessors" : [
{
"bufferView" : 0,
"byteOffset" : 0,
"componentType" : 5123,
"count" : 3,
"type" : "SCALAR",
"max" : [ 2 ],
"min" : [ 0 ]
},
{
"bufferView" : 1,
"byteOffset" : 0,
"componentType" : 5126,
"count" : 3,
"type" : "VEC3",
"max" : [ 1.0, 1.0, 0.0 ],
"min" : [ 0.0, 0.0, 0.0 ]
}
],
"materials" : [
{
"pbrMetallicRoughness": {
"baseColorFactor": [ 1.000, 0.766, 0.336, 1.0 ],
"metallicFactor": 0.5,
"roughnessFactor": 0.1
}
}
],
"asset" : {
"version" : "2.0"
}
}
When rendered, this asset will show the triangle with a new material, as shown in Image 11a.
Image 11a: A triangle with a simple material.
Material definition
A new top-level array has been added to the glTF JSON to define this material: the materials array contains a
single element that defines the material and its properties:
"materials" : [
{
"pbrMetallicRoughness": {
"baseColorFactor": [ 1.000, 0.766, 0.336, 1.0 ],
"metallicFactor": 0.5,
"roughnessFactor": 0.1
}
}
],
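When reading such a material, a client should fall back to the default values that glTF 2.0 defines for omitted
properties. The following sketch resolves the metallic-roughness parameters; the helper function name is only an
example:

// Resolve the metallic-roughness parameters of a material, using the glTF 2.0 defaults (sketch)
function getMetallicRoughnessParameters(material) {
    const pbr = (material && material.pbrMetallicRoughness) || {};
    return {
        baseColorFactor : pbr.baseColorFactor || [ 1.0, 1.0, 1.0, 1.0 ],
        metallicFactor : pbr.metallicFactor !== undefined ? pbr.metallicFactor : 1.0,
        roughnessFactor : pbr.roughnessFactor !== undefined ? pbr.roughnessFactor : 1.0
    };
}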
Assigning the material to objects
The material is assigned to the triangle, namely to the mesh.primitive , by referring to the material using its
index:
"meshes" : [
{
"primitives" : [ {
"attributes" : {
"POSITION" : 1
},
"indices" : 0,
"material" : 0
} ]
}
],
The next section will give a short introduction to how textures are defined in a glTF asset. The use of textures will
then allow the definition of more complex and realistic materials.
Textures, Images, and Samplers
Textures are an important aspect of giving objects a realistic appearance. They make it possible to define the
main color of the objects, as well as other characteristics that are used in the material definition in order to
precisely describe what the rendered object should look like.
"textures": {
{
"source": 0,
"sampler": 0
}
},
"images": {
{
"uri": "testTexture.png"
}
},
"samplers": {
{
"magFilter": 9729,
"minFilter": 9987,
"wrapS": 33648,
"wrapT": 33648
}
},
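Because the sampler constants (9729, 9987, 33648, ...) are the same numeric values as the WebGL constants,
they can be passed to texParameteri directly. The following is a sketch of creating a WebGL texture for a glTF
texture object; the image parameter is assumed to be an already loaded Image element:

// Create a WebGL texture for one glTF texture object (sketch)
function createGlTexture(gl, gltf, textureIndex, image) {
    const texture = gltf.textures[textureIndex];
    const sampler = gltf.samplers[texture.sampler];
    const glTexture = gl.createTexture();
    gl.bindTexture(gl.TEXTURE_2D, glTexture);
    gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGBA, gl.RGBA, gl.UNSIGNED_BYTE, image);
    gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MAG_FILTER, sampler.magFilter || gl.LINEAR);
    gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MIN_FILTER, sampler.minFilter || gl.LINEAR);
    gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_S, sampler.wrapS || gl.REPEAT);
    gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_T, sampler.wrapT || gl.REPEAT);
    gl.generateMipmap(gl.TEXTURE_2D); // needed for mipmap-based minFilter values like 9987
    return glTexture;
}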
The next section will show how such a texture definition may be used inside a material.
A Simple Texture
As shown in the previous sections, the material definition in a glTF asset contains different parameters for the
color of the material or the overall appearance of the material under the influence of light. These properties may
be given via single values, for example, defining the color or the roughness of the object as a whole.
Alternatively, these values may be provided via textures that are mapped on the object surface. The following is
a glTF asset that defines a material with a simple, single texture:
{
"scenes" : [ {
"nodes" : [ 0 ]
} ],
"nodes" : [ {
"mesh" : 0
} ],
"meshes" : [ {
"primitives" : [ {
"attributes" : {
"POSITION" : 1,
"TEXCOORD_0" : 2
},
"indices" : 0,
"material" : 0
} ]
} ],
"materials" : [ {
"pbrMetallicRoughness" : {
"baseColorTexture" : {
"index" : 0
},
"metallicFactor" : 0.0,
"roughnessFactor" : 1.0
}
} ],
"textures" : [ {
"sampler" : 0,
"source" : 0
} ],
"images" : [ {
"uri" : "testTexture.png"
} ],
"samplers" : [ {
"magFilter" : 9729,
"minFilter" : 9987,
"wrapS" : 33648,
"wrapT" : 33648
} ],
"buffers" : [ {
"uri" : "data:application/gltf‐
buffer;base64,AAABAAIAAQADAAIAAAAAAAAAAAAAAAAAAACAPwAAAAAAAAAAAAAAAAAAgD8AAAAAAACAPwAAgD8AAAAAAAA
AAAAAgD8AAAAAAACAPwAAgD8AAAAAAAAAAAAAAAAAAAAAAACAPwAAAAAAAAAA",
"byteLength" : 108
} ],
"bufferViews" : [ {
"buffer" : 0,
"byteOffset" : 0,
"byteLength" : 12,
"target" : 34963
}, {
"buffer" : 0,
"byteOffset" : 12,
"byteLength" : 96,
"byteStride" : 12,
"target" : 34962
} ],
"accessors" : [ {
"bufferView" : 0,
"byteOffset" : 0,
"componentType" : 5123,
"count" : 6,
"type" : "SCALAR",
"max" : [ 3 ],
"min" : [ 0 ]
}, {
"bufferView" : 1,
"byteOffset" : 0,
"componentType" : 5126,
"count" : 4,
"type" : "VEC3",
"max" : [ 1.0, 1.0, 0.0 ],
"min" : [ 0.0, 0.0, 0.0 ]
}, {
"bufferView" : 1,
"byteOffset" : 48,
"componentType" : 5126,
"count" : 4,
"type" : "VEC2",
"max" : [ 1.0, 1.0 ],
"min" : [ 0.0, 0.0 ]
} ],
"asset" : {
"version" : "2.0"
}
}
Image 15a: The image for the simple texture example.
Bringing this all together in a renderer will result in the scene rendered in Image 15b.
Image 15b: A simple texture on a unit square.
The Textured Material Definition
The material definition in this example differs from the Simple Material that was shown earlier. While the simple
material only defined a single color for the whole object, the material definition now refers to the newly added
texture:
"materials" : [ {
"pbrMetallicRoughness" : {
"baseColorTexture" : {
"index" : 0
},
"metallicFactor" : 0.0,
"roughnessFactor" : 1.0
}
} ],
In order to apply a texture to a mesh primitive, there must be information about the texture coordinates that
should be used for each vertex. The texture coordinates are only another attribute for the vertices defined in the
mesh.primitive . By default, a texture will use the texture coordinates that have the attribute name
TEXCOORD_0 . If there are multiple sets of texture coordinates, the one that should be used for one particular
texture may be selected by adding a texCoord property to the texture reference:
"baseColorTexture" : {
"index" : 0,
"texCoord": 2
},
In this case, the texture would use the texture coordinates that are contained in the attribute called TEXCOORD_2 .
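In a renderer, the attribute name can be derived from the texCoord value, which defaults to 0 when it is
omitted. A small sketch, where material and primitive are assumed to be the JSON objects shown above:

// Determine which texture coordinate set a texture reference uses (sketch)
const textureInfo = material.pbrMetallicRoughness.baseColorTexture;
const texCoordSet = textureInfo.texCoord !== undefined ? textureInfo.texCoord : 0;
const attributeName = "TEXCOORD_" + texCoordSet;   // e.g. "TEXCOORD_2"
const accessorIndex = primitive.attributes[attributeName];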
An Advanced Material
The Simple Texture example in the previous section showed a material for which the "base color" was defined
using a texture. But in addition to the base color, there are other properties of a material that may be defined via
textures. These properties have already been summarized in the Materials section and are listed again here; a
sketch of a material definition that refers to such textures follows the list:
The base color,
The metallic value,
The roughness of the surface,
The emissive properties,
An occlusion texture, and
A normal map.
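For orientation, the following is a sketch of how a material that uses all of these texture slots could look in the JSON part of a glTF asset. The texture indices are placeholders; the actual WaterBottle model defines its own textures, images, and indices. Note that in glTF the metallic and roughness values are packed into a single metallicRoughnessTexture:

"materials" : [ {
  "pbrMetallicRoughness" : {
    "baseColorTexture" : { "index" : 0 },
    "metallicRoughnessTexture" : { "index" : 1 }
  },
  "normalTexture" : { "index" : 2 },
  "occlusionTexture" : { "index" : 3 },
  "emissiveTexture" : { "index" : 4 },
  "emissiveFactor" : [ 1.0, 1.0, 1.0 ]
} ],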
The effects of these properties cannot properly be demonstrated with trivial textures. Therefore, they will be
shown here using one of the official Khronos PBR sample models, namely, the WaterBottle model. Image 14a
shows an overview of the textures that are involved in this model, and the final rendered object:
Image 14a: An example of a material where the surface properties are defined via textures.
Explaining the implementation of physically based rendering is beyond the scope of this tutorial. The official
Khronos WebGL PBR repository contains a reference implementation of a PBR renderer based on WebGL, and
provides implementation hints and background information. The following images mainly aim at demonstrating the
effects of the different material property textures, under different lighting conditions.
Image 14b shows the effect of the roughness texture: the main part of the bottle has a low roughness, causing it
to appear shiny, compared to the cap, which has a rough surface structure.
Image 14b: The influence of the roughness texture.
Image 14c highlights the effect of the metallic texture: the bottle reflects the light from the surrounding
environment map.
Image 14c: The influence of the metallic texture.
Image 14d shows the emissive part of the texture: regardless of the dark environment setting, the text, which is
contained in the emissive texture, is clearly visible.
Image 14d: The emissive part of the texture.
Image 14e shows the part of the bottle cap for which a normal map is defined: the text appears to be embossed
into the cap. This makes it possible to model finer geometric details on the surface, even though the model itself
only has a very coarse geometric resolution.
Image 14e: The effect of a normal map.
Together, these textures and maps allow modeling a wide range of real-world materials. Thanks to the common
underlying PBR model, namely the metallic-roughness model, the objects can be rendered consistently by
different renderer implementations.
Previous: Simple Texture | Table of Contents | Next: Simple Cameras
Previous: Advanced Material | Table of Contents | Next: Cameras
Simple Cameras
The previous sections showed how a basic scene structure with geometric objects is represented in a glTF
asset, and how different materials can be applied to these objects. This did not yet include information about the
view configuration that should be used for rendering the scene. This view configuration is usually described as a
virtual camera that is contained in the scene, at a certain position, and pointing in a certain direction.
The following is a simple, complete glTF asset. It is similar to the assets that have already been shown: it
defines a simple scene containing node objects and a single geometric object that is given as a mesh ,
attached to one of the nodes. But this asset additionally contains two camera objects:
{
"scenes" : [
{
"nodes" : [ 0, 1, 2 ]
}
],
"nodes" : [
{
"rotation" : [ ‐0.383, 0.0, 0.0, 0.924 ],
"mesh" : 0
},
{
"translation" : [ 0.5, 0.5, 3.0 ],
"camera" : 0
},
{
"translation" : [ 0.5, 0.5, 3.0 ],
"camera" : 1
}
],
"cameras" : [
{
"type": "perspective",
"perspective": {
"aspectRatio": 1.0,
"yfov": 0.7,
"zfar": 100,
"znear": 0.01
}
},
{
"type": "orthographic",
"orthographic": {
"xmag": 1.0,
"ymag": 1.0,
"zfar": 100,
"znear": 0.01
}
}
],
"meshes" : [
{
"primitives" : [ {
"attributes" : {
"POSITION" : 1
},
"indices" : 0
} ]
}
],
"buffers" : [
{
"uri" : "data:application/octet‐
stream;base64,AAABAAIAAQADAAIAAAAAAAAAAAAAAAAAAACAPwAAAAAAAAAAAAAAAAAAgD8AAAAAAACAPwAAgD8AAAAA",
"byteLength" : 60
}
],
"bufferViews" : [
{
"buffer" : 0,
"byteOffset" : 0,
"byteLength" : 12,
"target" : 34963
},
{
"buffer" : 0,
"byteOffset" : 12,
"byteLength" : 48,
"target" : 34962
}
],
"accessors" : [
{
"bufferView" : 0,
"byteOffset" : 0,
"componentType" : 5123,
"count" : 6,
"type" : "SCALAR",
"max" : [ 3 ],
"min" : [ 0 ]
},
{
"bufferView" : 1,
"byteOffset" : 0,
"componentType" : 5126,
"count" : 4,
"type" : "VEC3",
"max" : [ 1.0, 1.0, 0.0 ],
"min" : [ 0.0, 0.0, 0.0 ]
}
],
"asset" : {
"version" : "2.0"
}
}
The geometry in this asset is a simple unit square. It is rotated by 45 degrees around the x-axis, to emphasize
the effect of the different cameras. Image 17a shows three options for rendering this asset. The first two examples
use the cameras that are defined in the asset. The last example shows how the scene looks from an external,
user-defined viewpoint.
Image 17a: The effect of rendering the scene with different cameras.
Camera definitions
The new top-level element of this glTF asset is the cameras array, which contains the camera objects:
"cameras" : [
{
"type": "perspective",
"perspective": {
"aspectRatio": 1.0,
"yfov": 0.7,
"zfar": 100,
"znear": 0.01
}
},
{
"type": "orthographic",
"orthographic": {
"xmag": 1.0,
"ymag": 1.0,
"zfar": 100,
"znear": 0.01
}
}
],
The differences between perspective and orthographic cameras and their properties, the effect of attaching the
cameras to the nodes, and the management of multiple cameras will be explained in detail in the Cameras
section.
Previous: Advanced Material | Table of Contents | Next: Cameras
Previous: Simple Cameras | Table of Contents | Next: Simple Morph Target
Cameras
The example in the Simple Cameras section showed how to define perspective and orthographic cameras, and
how they can be integrated into a scene by attaching them to nodes. This section will explain the differences
between both types of cameras, and the handling of cameras in general.
Perspective and orthographic cameras
There are two kinds of cameras: Perspective cameras, where the viewing volume is a truncated pyramid (often
referred to as "viewing frustum"), and orthographic cameras, where the viewing volume is a rectangular box. The
main difference is that rendering with a perspective camera causes a perspective distortion, whereas rendering
with an orthographic camera preserves lengths and angles.
The example in the Simple Cameras section contains one camera of each type: a perspective camera at
index 0, and an orthographic camera at index 1:
"cameras" : [
{
"type": "perspective",
"perspective": {
"aspectRatio": 1.0,
"yfov": 0.7,
"zfar": 100,
"znear": 0.01
}
},
{
"type": "orthographic",
"orthographic": {
"xmag": 1.0,
"ymag": 1.0,
"zfar": 100,
"znear": 0.01
}
}
],
Explaining the details of cameras, viewing, and projections is beyond the scope of this tutorial. The important
point is that most graphics APIs offer methods for defining the viewing configuration that are directly based on
these parameters. In general, these parameters can be used to compute a camera matrix. The camera matrix
can be inverted to obtain the view matrix, which will later be post-multiplied with the model matrix to obtain the
model-view matrix, which is required by the renderer.
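To make this concrete, the following is a minimal sketch of how the projection matrices could be computed from these camera parameters, in TypeScript, using the standard perspective and orthographic projection formulas and the column-major layout that WebGL expects. The function names are only illustrative; a particular rendering library may use different conventions.

// Column-major 4x4 projection matrix for a glTF perspective camera
function perspectiveMatrix(yfov: number, aspectRatio: number, znear: number, zfar: number): number[] {
  const f = 1.0 / Math.tan(yfov / 2.0);
  return [
    f / aspectRatio, 0, 0, 0,
    0, f, 0, 0,
    0, 0, (zfar + znear) / (znear - zfar), -1,
    0, 0, (2 * zfar * znear) / (znear - zfar), 0
  ];
}

// Column-major 4x4 projection matrix for a glTF orthographic camera
function orthographicMatrix(xmag: number, ymag: number, znear: number, zfar: number): number[] {
  return [
    1 / xmag, 0, 0, 0,
    0, 1 / ymag, 0, 0,
    0, 0, 2 / (znear - zfar), 0,
    0, 0, (zfar + znear) / (znear - zfar), 1
  ];
}

// For the perspective camera of the example asset:
const projection = perspectiveMatrix(0.7, 1.0, 0.01, 100);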
Camera orientation
A camera can be transformed to have a certain orientation and viewing direction in the scene. This is
accomplished by attaching the camera to a node . Each node may contain the index of a camera that is
attached to it. In the simple camera example, there are two nodes for the cameras. The first node refers to the
perspective camera with index 0, and the second one refers to the orthographic camera with index 1:
"nodes" : {
...
{
"translation" : [ 0.5, 0.5, 3.0 ],
"camera" : 0
},
{
"translation" : [ 0.5, 0.5, 3.0 ],
"camera" : 1
}
],
As shown in the Scenes and Nodes section, these nodes may have properties that define the transform matrix of
the node. The global transform of a node then defines the actual orientation of the camera in the scene. With the
option to apply arbitrary animations to the nodes, it is even possible to define camera flights.
When the global transform of the camera node is the identity matrix, then the eye point of the camera is at the
origin, and the viewing direction is along the negative z-axis. In the given example, both nodes have a
translation of (0.5, 0.5, 3.0), which causes the camera to be transformed accordingly: it is translated
by 0.5 in the x- and y-direction, to look at the center of the unit square, and by 3.0 along the z-axis, to
move it a bit away from the object.
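Since the camera nodes in this example only have a translation, the view matrix is simply the inverse translation. A small sketch in TypeScript, again using column-major order:

// The camera node has a translation of (0.5, 0.5, 3.0); the camera matrix is
// this translation, and the view matrix is its inverse, a translation of
// (-0.5, -0.5, -3.0).
function viewMatrixFromTranslation(tx: number, ty: number, tz: number): number[] {
  return [
    1, 0, 0, 0,
    0, 1, 0, 0,
    0, 0, 1, 0,
    -tx, -ty, -tz, 1
  ];
}
const viewMatrix = viewMatrixFromTranslation(0.5, 0.5, 3.0);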
Camera instancing and management
There may be multiple cameras defined in the JSON part of a glTF. Each camera may be referred to by multiple
nodes. Therefore, the cameras as they appear in the glTF asset are really "templates" for actual camera
instances: Whenever a node refers to one camera, a new instance of this camera is created.
There is no "default" camera for a glTF asset. Instead, the client application has to keep track of the currently
active camera. The client application may, for example, offer a drop-down menu that allows one to select the
active camera and thus to quickly switch between predefined view configurations. With a bit more implementation
effort, the client application can also define its own camera and interaction patterns for the camera control (e.g.,
zooming with the mouse wheel). However, the logic for the navigation and interaction has to be implemented
solely by the client application in this case. Image 17a shows the result of such an implementation, where the
user may select either the active camera from the ones that are defined in the glTF asset, or an "external
camera" that may be controlled with the mouse.
Previous: Simple Cameras | Table of Contents | Next: Simple Morph Target
Previous: Cameras | Table of Contents | Next: Morph Targets
A Simple Morph Target
Starting with version 2.0, glTF supports the definition of morph targets for meshes. A morph target stores
displacements or differences for certain mesh attributes. At runtime, these differences may be added to the
original mesh, with different weights, in order to animate parts of the mesh. This is often used in character
animations, for example, to encode different facial expressions of a virtual character.
The following is a minimal example that shows a mesh with two morph targets. The new elements will be
summarized here, and the broader concept of morph targets and how they are applied at runtime will be explained
in the next section.
{
"scenes":[
{
"nodes":[
0
]
}
],
"nodes":[
{
"mesh":0
}
],
"meshes":[
{
"primitives":[
{
"attributes":{
"POSITION":1
},
"targets":[
{
"POSITION":2
},
{
"POSITION":3
}
],
"indices":0
}
],
"weights":[
1.0,
0.5
]
}
],
"animations":[
{
"samplers":[
{
"input":4,
"interpolation":"LINEAR",
"output":5
}
],
"channels":[
{
"sampler":0,
"target":{
"node":0,
"path":"weights"
}
}
]
}
],
"buffers":[
{
"uri":"data:application/gltf‐
buffer;base64,AAABAAIAAAAAAAAAAAAAAAAAAAAAAIA/AAAAAAAAAAAAAAA/AAAAPwAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAIC/AACAPwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAIA/AACAPwAAAAA=",
"byteLength":116
},
{
"uri":"data:application/gltf‐
buffer;base64,AAAAAAAAgD8AAABAAABAQAAAgEAAAAAAAAAAAAAAAAAAAIA/AACAPwAAgD8AAIA/AAAAAAAAAAAAAAAA",
"byteLength":60
}
],
"bufferViews":[
{
"buffer":0,
"byteOffset":0,
"byteLength":6,
"target":34963
},
{
"buffer":0,
"byteOffset":8,
"byteLength":108,
"byteStride":12,
"target":34962
},
{
"buffer":1,
"byteOffset":0,
"byteLength":20
},
{
"buffer":1,
"byteOffset":20,
"byteLength":40
}
],
"accessors":[
{
"bufferView":0,
"byteOffset":0,
"componentType":5123,
"count":3,
"type":"SCALAR",
"max":[
2
],
"min":[
0
]
},
{
"bufferView":1,
"byteOffset":0,
"componentType":5126,
"count":3,
"type":"VEC3",
"max":[
1.0,
0.5,
0.0
],
"min":[
0.0,
0.0,
0.0
]
},
{
"bufferView":1,
"byteOffset":36,
"componentType":5126,
"count":3,
"type":"VEC3",
"max":[
0.0,
1.0,
0.0
],
"min":[
-1.0,
0.0,
0.0
]
},
{
"bufferView":1,
"byteOffset":72,
"componentType":5126,
"count":3,
"type":"VEC3",
"max":[
1.0,
1.0,
0.0
],
"min":[
0.0,
0.0,
0.0
]
},
{
"bufferView":2,
"byteOffset":0,
"componentType":5126,
"count":5,
"type":"SCALAR",
"max":[
4.0
],
"min":[
0.0
]
},
{
"bufferView":3,
"byteOffset":0,
"componentType":5126,
"count":10,
"type":"SCALAR",
"max":[
1.0
],
"min":[
0.0
]
}
],
"asset":{
"version":"2.0"
}
}
The asset contains an animation that interpolates between the different morph targets for a single triangle. A
screenshot of this asset is shown in Image 21a.
Image 21a: A triangle with two morph targets.
Most of the elements of this asset have already been explained in the previous sections: it contains a scene
with a single node and a single mesh. There are two buffer objects, one storing the geometry data and one
storing the data for the animation, and several bufferView and accessor objects that provide access to this
data. The new elements are the targets and weights of the mesh, and an animation that refers to these weights:
"animations":[
{
"samplers":[
{
"input":4,
"interpolation":"LINEAR",
"output":5
}
],
"channels":[
{
"sampler":0,
"target":{
"node":0,
"path":"weights"
}
}
]
}
],
This means that the animation will modify the weights of the mesh that is referred to by the target.node . The
result of applying the animation to these weights, and the computation of the final, rendered mesh will be
explained in more detail in the next section about Morph Targets.
Previous: Cameras | Table of Contents | Next: Morph Targets
Previous: Simple Morph Target | Table of Contents | Next: SimpleSkin
Morph Targets
The example in the previous section contains a mesh that consists of a single triangle with two morph targets:
{
"meshes":[
{
"primitives":[
{
"attributes":{
"POSITION":1
},
"targets":[
{
"POSITION":2
},
{
"POSITION":3
}
],
"indices":0
}
],
"weights":[
1.0,
0.5
]
}
],
The actual base geometry of the mesh, namely the triangle geometry, is defined by the mesh.primitive
attribute called "POSITION". The morph targets of the mesh.primitive are dictionaries that map the attribute
name "POSITION" to accessor objects that contain the displacements for each vertex. Image 22a shows the
initial triangle geometry in black, the displacement for the first morph target in red, and the displacement for
the second morph target in green.
Image 22a: The initial triangle and morph target displacements.
The weights of the mesh determine how strongly the displacements of each morph target are applied to this
base geometry. For the two morph targets of the example, the rendered vertex positions are computed as

renderedPrimitive.POSITION = primitive.POSITION +
    weights[0] * primitive.targets[0].POSITION +
    weights[1] * primitive.targets[1].POSITION;
This means that the current state of the mesh primitive is computed by taking the initial mesh primitive geometry
and adding a linear combination of the morph target displacements, where the weights are the factors for the
linear combination.
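The following is a small TypeScript sketch of this computation on the CPU, assuming that the accessor data has already been read into flat arrays of (x, y, z) components; in practice, this blending is usually performed in the vertex shader.

// Compute the rendered vertex positions from the base positions, the morph
// target displacements, and the current morph target weights.
function applyMorphTargets(
  basePositions: Float32Array,   // the POSITION accessor data
  targets: Float32Array[],       // one displacement array per morph target
  weights: number[]              // one weight per morph target
): Float32Array {
  const result = new Float32Array(basePositions); // start with the base geometry
  for (let t = 0; t < targets.length; t++) {
    const displacements = targets[t];
    const w = weights[t];
    for (let i = 0; i < result.length; i++) {
      result[i] += w * displacements[i]; // add the weighted displacement
    }
  }
  return result;
}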
The asset also contains an animation that modifies these weights over time. The sampler input refers to an
accessor that contains the key frame times, and the output refers to an accessor that contains the
corresponding pairs of morph target weights:

Time (seconds)   Weights
0.0              0.0, 0.0
1.0              0.0, 1.0
2.0              1.0, 1.0
3.0              1.0, 0.0
4.0              0.0, 0.0
Throughout the animation, the weights are interpolated linearly between these key frames and applied to the
morph target displacements. At each point in time, the rendered state of the mesh primitive is updated
accordingly. The following is an example of the state that is computed at 1.25 seconds.
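For example, assuming the key frame values from the table above, the time 1.25 seconds lies a quarter of the way between the key frames at 1.0 and 2.0 seconds, so the interpolated weights are

weights = (1.0 - 0.25) * (0.0, 1.0) + 0.25 * (1.0, 1.0) = (0.25, 1.0)

meaning that the rendered positions are the base positions plus 0.25 times the displacements of the first morph target plus 1.0 times the displacements of the second morph target.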
Image 22b: An intermediate state of the morph target animation.
Previous: Simple Morph Target | Table of Contents | Next: SimpleSkin
Previous: Morph Targets | Table of Contents | Next: Skins
A Simple Skin
glTF supports vertex skinning, which allows the geometry (vertices) of a mesh to be deformed based on the pose
of a skeleton. This is essential in order to give animated geometry, for example of virtual characters, a realistic
appearance. The core for the definition of vertex skinning in a glTF asset is the skin , but vertex skinning in
general implies several interdependencies between the elements of a glTF asset that have been presented so far.
The following is a glTF asset that shows basic vertex skinning for a simple geometry. The elements of this asset
will be summarized quickly in this section, referring to the previous sections where appropriate, and pointing out
the new elements that have been added for the vertex skinning functionality. The details and background
information for vertex skinning will be given in the next section.
{
"scenes" : [ {
"nodes" : [ 0 ]
} ],
"nodes" : [ {
"skin" : 0,
"mesh" : 0,
"children" : [ 1 ]
}, {
"children" : [ 2 ],
"translation" : [ 0.0, 1.0, 0.0 ]
}, {
"rotation" : [ 0.0, 0.0, 0.0, 1.0 ]
} ],
"meshes" : [ {
"primitives" : [ {
"attributes" : {
"POSITION" : 1,
"JOINTS_0" : 2,
"WEIGHTS_0" : 3
},
"indices" : 0
} ]
} ],
"skins" : [ {
"inverseBindMatrices" : 4,
"joints" : [ 1, 2 ]
} ],
"animations" : [ {
"channels" : [ {
"sampler" : 0,
"target" : {
"node" : 2,
"path" : "rotation"
}
} ],
"samplers" : [ {
"input" : 5,
"interpolation" : "LINEAR",
"output" : 6
} ]
} ],
"buffers" : [ {
"uri" : "data:application/gltf‐
buffer;base64,AAABAAMAAAADAAIAAgADAAUAAgAFAAQABAAFAAcABAAHAAYABgAHAAkABgAJAAgAAAAAAAAAAAAAAAAAAAC
APwAAAAAAAAAAAAAAAAAAAD8AAAAAAACAPwAAAD8AAAAAAAAAAAAAgD8AAAAAAACAPwAAgD8AAAAAAAAAAAAAwD8AAAAAAACA
PwAAwD8AAAAAAAAAAAAAAEAAAAAAAACAPwAAAEAAAAAA",
"byteLength" : 168
}, {
"uri" : "data:application/gltf‐
buffer;base64,AAABAAAAAAAAAAAAAAAAAAAAAQAAAAAAAAAAAAAAAAAAAAEAAAAAAAAAAAAAAAAAAAABAAAAAAAAAAAAAAA
AAAAAAQAAAAAAAAAAAAAAAAAAAAEAAAAAAAAAAAAAAAAAAAABAAAAAAAAAAAAAAAAAAAAAQAAAAAAAAAAAAAAAAAAAAEAAAAA
AAAAAAAAAAAAAAABAAAAAAAAAAAAAAAAAAAAgD8AAAAAAAAAAAAAAAAAAIA/AAAAAAAAAAAAAAAAAABAPwAAgD4AAAAAAAAAA
AAAQD8AAIA+AAAAAAAAAAAAAAA/AAAAPwAAAAAAAAAAAAAAPwAAAD8AAAAAAAAAAAAAgD4AAEA/AAAAAAAAAAAAAIA+AABAPw
AAAAAAAAAAAAAAAAAAgD8AAAAAAAAAAAAAAAAAAIA/AAAAAAAAAAA=",
"byteLength" : 320
}, {
"uri" : "data:application/gltf‐
buffer;base64,AACAPwAAAAAAAAAAAAAAAAAAAAAAAIA/AAAAAAAAAAAAAAAAAAAAAAAAgD8AAAAAAAAAvwAAgL8AAAAAAAC
APwAAgD8AAAAAAAAAAAAAAAAAAAAAAACAPwAAAAAAAAAAAAAAAAAAAAAAAIA/AAAAAAAAAL8AAIC/AAAAAAAAgD8=",
"byteLength" : 128
}, {
"uri" : "data:application/gltf‐
buffer;base64,AAAAAAAAAD8AAIA/AADAPwAAAEAAACBAAABAQAAAYEAAAIBAAACQQAAAoEAAALBAAAAAAAAAAAAAAAAAAAC
APwAAAAAAAAAAkxjEPkSLbD8AAAAAAAAAAPT9ND/0/TQ/AAAAAAAAAAD0/TQ/9P00PwAAAAAAAAAAkxjEPkSLbD8AAAAAAAAA
AAAAAAAAAIA/AAAAAAAAAAAAAAAAAACAPwAAAAAAAAAAkxjEvkSLbD8AAAAAAAAAAPT9NL/0/TQ/AAAAAAAAAAD0/TS/9P00P
wAAAAAAAAAAkxjEvkSLbD8AAAAAAAAAAAAAAAAAAIA/",
"byteLength" : 240
} ],
"bufferViews" : [ {
"buffer" : 0,
"byteOffset" : 0,
"byteLength" : 48,
"target" : 34963
}, {
"buffer" : 0,
"byteOffset" : 48,
"byteLength" : 120,
"target" : 34962
}, {
"buffer" : 1,
"byteOffset" : 0,
"byteLength" : 320,
"byteStride" : 16
}, {
"buffer" : 2,
"byteOffset" : 0,
"byteLength" : 128
}, {
"buffer" : 3,
"byteOffset" : 0,
"byteLength" : 240
} ],
"accessors" : [ {
"bufferView" : 0,
"byteOffset" : 0,
"componentType" : 5123,
"count" : 24,
"type" : "SCALAR",
"max" : [ 9 ],
"min" : [ 0 ]
}, {
"bufferView" : 1,
"byteOffset" : 0,
"componentType" : 5126,
"count" : 10,
"type" : "VEC3",
"max" : [ 1.0, 2.0, 0.0 ],
"min" : [ 0.0, 0.0, 0.0 ]
}, {
"bufferView" : 2,
"byteOffset" : 0,
"componentType" : 5123,
"count" : 10,
"type" : "VEC4",
"max" : [ 0, 1, 0, 0 ],
"min" : [ 0, 1, 0, 0 ]
}, {
"bufferView" : 2,
"byteOffset" : 160,
"componentType" : 5126,
"count" : 10,
"type" : "VEC4",
"max" : [ 1.0, 1.0, 0.0, 0.0 ],
"min" : [ 0.0, 0.0, 0.0, 0.0 ]
}, {
"bufferView" : 3,
"byteOffset" : 0,
"componentType" : 5126,
"count" : 2,
"type" : "MAT4",
"max" : [ 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, ‐0.5, ‐1.0, 0.0, 1.0
],
"min" : [ 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, ‐0.5, ‐1.0, 0.0, 1.0 ]
}, {
"bufferView" : 4,
"byteOffset" : 0,
"componentType" : 5126,
"count" : 12,
"type" : "SCALAR",
"max" : [ 5.5 ],
"min" : [ 0.0 ]
}, {
"bufferView" : 4,
"byteOffset" : 48,
"componentType" : 5126,
"count" : 12,
"type" : "VEC4",
"max" : [ 0.0, 0.0, 0.707, 1.0 ],
"min" : [ 0.0, 0.0, ‐0.707, 0.707 ]
} ],
"asset" : {
"version" : "2.0"
}
}
The result of rendering this asset is shown in Image 19a.
Image 19a: A scene with simple vertex skinning.
Elements of the simple skin example
The elements of the given example are briefly summarized here:
The scene contains a single node (node 0) to which the mesh and the skin are attached.
Two further nodes (nodes 1 and 2) define the skeleton; node 2 is a child of node 1.
The mesh primitive contains the new JOINTS_0 and WEIGHTS_0 attributes, in addition to the usual POSITION attribute.
The skin refers to the two skeleton nodes via its joints property, and refers to an accessor that contains the inverse bind matrices.
The animation modifies the rotation of node 2, causing the skeleton to bend during the animation.
Additional buffers, bufferViews, and accessors provide the geometry data, the skinning data, and the animation data.
Details about how these elements are interconnected to achieve the vertex skinning will be explained in the Skins
section.
Previous: Morph Targets | Table of Contents | Next: Skins
Previous: Simple Skin | Table of Contents
Skins
The process of vertex skinning is a bit complex. It brings together nearly all elements that are contained in a glTF
asset. This section will explain the basics of vertex skinning, based on the example in the Simple Skin section.
The geometry data
The geometry of the vertex skinning example is an indexed triangle mesh, consisting of 8 triangles and 10
vertices. They form a rectangle in the xy-plane, with the lower left point at the origin (0,0,0) and the upper right
point at (1,2,0). So the positions of the vertices are
0.0, 0.0, 0.0,
1.0, 0.0, 0.0,
0.0, 0.5, 0.0,
1.0, 0.5, 0.0,
0.0, 1.0, 0.0,
1.0, 1.0, 0.0,
0.0, 1.5, 0.0,
1.0, 1.5, 0.0,
0.0, 2.0, 0.0,
1.0, 2.0, 0.0
and the indices of the triangles are
0, 1, 3,
0, 3, 2,
2, 3, 5,
2, 5, 4,
4, 5, 7,
4, 7, 6,
6, 7, 9,
6, 9, 8,
Image 20a: The geometry for the skinning example, with outline rendering, in its initial configuration.
This geometry data is contained in the mesh primitive of the only mesh, which is attached to the main node of
the scene. The mesh primitive contains additional attributes, namely the "JOINTS_0" and "WEIGHTS_0"
attributes. The purpose of these attributes will be explained below.
The skeleton structure
In the given example, there are two nodes that define the skeleton. They are referred to as "skeleton nodes", or
"joint nodes", because they can be imagined as the joints between the bones of the skeleton. The skin refers
to these nodes, by listing their indices in its joints property.
"nodes" : [
...
{
"children" : [ 2 ],
"translation" : [ 0.0, 1.0, 0.0 ]
},
{
"rotation" : [ 0.0, 0.0, 0.0, 1.0 ]
}
],
The first joint node has a translation property, defining a translation of 1.0 along the y-axis. The second
joint node has a rotation property that initially describes a rotation of 0 degrees (thus, no rotation at all).
This rotation will later be changed by the animation to let the skeleton bend left and right and show the effect of
the vertex skinning.
The skin
The skin is the core element of the vertex skinning. In the example, there is a single skin:
"skins" : [
{
"inverseBindMatrices" : 4,
"joints" : [ 1, 2 ]
}
],
The inverseBindMatrices property refers to an accessor that contains one inverse bind matrix for each joint.
The inverse bind matrix of joint 1 is

1.0   0.0   0.0    0.0
0.0   1.0   0.0   -1.0
0.0   0.0   1.0    0.0
0.0   0.0   0.0    1.0

This matrix translates the mesh by -1.0 along the y-axis, as shown in Image 20b.
Image 20b: The transformation of the geometry with the inverse bind matrix of joint 1.
This transformation may look counterintuitive at first glance. But the goal of this transformation is to "undo" the
transformation that is done by the initial global transform of the respective joint node, so that the influence of the
joint on the mesh vertices can be computed based on the joint's actual, current global transform.
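In the given example this can be verified directly: in the initial pose, the global transform of the joint node is a translation of 1.0 along the y-axis, and the inverse bind matrix shown above is exactly the inverse of this transform, a translation of -1.0 along the y-axis. Their product is therefore the identity, so as long as no animation has been applied, the skinning does not deform the mesh at all.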
Vertex skinning implementation
Users of existing rendering libraries will hardly ever have to manually process the vertex skinning data contained
in a glTF asset: the actual skinning computations usually take place in the vertex shader, which is a low-level
implementation detail of the respective library. However, knowing how the vertex skinning data is supposed to be
processed may help to create proper, valid models with vertex skinning. So this section will give a short
summary of how the vertex skinning is applied, using some pseudocode and examples in GLSL.
The joint matrices
The vertex positions of a skinned mesh are eventually computed by the vertex shader. During these
computations, the vertex shader has to take into account the current pose of the skeleton in order to compute the
proper vertex position. This information is passed to the vertex shader as an array of matrices, namely the
joint matrices: a uniform variable that contains one 4×4 matrix for each joint of the
skeleton. In the shader, these matrices are combined to compute the actual skinning matrix for each vertex:
...
uniform mat4 u_jointMat[2];
...
void main(void)
{
mat4 skinMat =
a_weight.x * u_jointMat[int(a_joint.x)] +
a_weight.y * u_jointMat[int(a_joint.y)] +
a_weight.z * u_jointMat[int(a_joint.z)] +
a_weight.w * u_jointMat[int(a_joint.w)];
...
}
The joint matrix for each joint has to perform the following transformations to the vertices:
The vertices have to be prepared to be transformed with the current global transform of the joint node.
Therefore, they are transformed with the inverseBindMatrix of the joint node. This is the inverse of the
global transform of the joint node in its original state, when no animations have been applied yet.
The vertices have to be transformed with the current global transform of the joint node. Together with the
transformation from the inverseBindMatrix , this will cause the vertices to be transformed only based on
the current transform of the node, in the coordinate space of the current joint node.
The vertices have to be transformed with the inverse of the global transform of the node that the mesh is
attached to, because this transform is already applied via the model-view matrix, and thus has to be
cancelled out from the skinning computation.
So the pseudocode for computing the joint matrix of joint j may look as follows:
jointMatrix(j) =
globalTransformOfNodeThatTheMeshIsAttachedTo^‐1 *
globalTransformOfJointNode(j) *
inverseBindMatrixForJoint(j);
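A sketch of this computation in TypeScript could look as follows. It uses the gl-matrix library for the matrix operations; the globalTransform function is only a placeholder for whatever scene graph traversal the client application uses to compute the current global transform of a node.

import { mat4 } from "gl-matrix";

// Placeholder: computes the current global transform of a node by composing
// the local transforms along the path from the scene root to the node.
declare function globalTransform(nodeIndex: number): mat4;

// Compute one joint matrix per joint of the skin, following the pseudocode above.
function computeJointMatrices(
  meshNodeIndex: number,       // the node that the mesh (and the skin) is attached to
  jointNodeIndices: number[],  // the "joints" array of the skin
  inverseBindMatrices: mat4[]  // one 4x4 matrix per joint, read from the accessor
): mat4[] {
  const inverseMeshGlobal = mat4.create();
  mat4.invert(inverseMeshGlobal, globalTransform(meshNodeIndex));

  return jointNodeIndices.map((jointNodeIndex, j) => {
    // jointMatrix = inverse(meshNodeGlobal) * jointNodeGlobal * inverseBindMatrix
    const jointMatrix = mat4.create();
    mat4.multiply(jointMatrix, globalTransform(jointNodeIndex), inverseBindMatrices[j]);
    mat4.multiply(jointMatrix, inverseMeshGlobal, jointMatrix);
    return jointMatrix;
  });
}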
Note: Vertex skinning in other contexts often involves a matrix that is called "Bind Shape Matrix". This matrix is
supposed to transform the geometry of the skinned mesh into the coordinate space of the joints. In glTF, this
matrix is omitted, and it is assumed that this transform is either pre-multiplied with the mesh data, or
post-multiplied to the inverse bind matrices.
Image 20c shows the transformations that are applied to the geometry in the Simple Skin example, using the joint
matrix of joint 1. The image shows the transformation for an intermediate state of the animation, namely, when
the rotation of the joint node has already been modified by the animation, to describe a rotation of 45 degrees
around the z-axis.
Image 20c: The transformation of the geometry done for joint 1.
The last panel of Image 20c shows how the geometry would look if it were only transformed with the joint
matrix of joint 1. This state of the geometry is never really visible: The actual geometry that is computed in the
vertex shader will combine the geometries as they are created from the different joint matrices, based on the
joints and weights that are explained below.
The skinning joints and weights
As mentioned above, the mesh primitive contains new attributes that are required for the vertex skinning,
namely the "JOINTS_0" and the "WEIGHTS_0" attributes. Each attribute refers to an accessor
that provides one data element for each vertex of the mesh. In the given example, the accessor for the
"JOINTS_0" attribute contains the following data:
Vertex 0: 0, 1, 0, 0,
Vertex 1: 0, 1, 0, 0,
Vertex 2: 0, 1, 0, 0,
Vertex 3: 0, 1, 0, 0,
Vertex 4: 0, 1, 0, 0,
Vertex 5: 0, 1, 0, 0,
Vertex 6: 0, 1, 0, 0,
Vertex 7: 0, 1, 0, 0,
Vertex 8: 0, 1, 0, 0,
Vertex 9: 0, 1, 0, 0,
This means that every vertex should be influenced by joint 0 and joint 1. (The last two components of each vector
are ignored here. If there were more joints, then one entry of this accessor could, for example, contain

3, 1, 8, 4,

meaning that the corresponding vertex should be influenced by the joints 3, 1, 8, and 4.) The accessor for the
"WEIGHTS_0" attribute contains the corresponding weights, which determine how strongly each of these joints
influences the vertex:
Vertex 0: 1.00, 0.00, 0.0, 0.0,
Vertex 1: 1.00, 0.00, 0.0, 0.0,
Vertex 2: 0.75, 0.25, 0.0, 0.0,
Vertex 3: 0.75, 0.25, 0.0, 0.0,
Vertex 4: 0.50, 0.50, 0.0, 0.0,
Vertex 5: 0.50, 0.50, 0.0, 0.0,
Vertex 6: 0.25, 0.75, 0.0, 0.0,
Vertex 7: 0.25, 0.75, 0.0, 0.0,
Vertex 8: 0.00, 1.00, 0.0, 0.0,
Vertex 9: 0.00, 1.00, 0.0, 0.0,
Again, the last two components of each entry are not relevant, because there are only two joints.
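For example, for vertex 4, the joints are 0 and 1 and the weights are 0.5 and 0.5, so the skin matrix of this vertex is

skinMat = 0.5 * jointMatrix(0) + 0.5 * jointMatrix(1)

and the vertex ends up halfway between the positions that the two joints alone would produce.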
In the vertex shader, this information is used to create a linear combination of the joint matrices. This matrix is
called the skin matrix of the respective vertex. Therefore, the data of the "JOINTS_0" and "WEIGHTS_0"
attributes are passed to the shader. In this example, they are given as the a_joint and a_weight attribute
variables, respectively:
...
attribute vec4 a_joint;
attribute vec4 a_weight;
uniform mat4 u_jointMat[2];
...
void main(void)
{
mat4 skinMat =
a_weight.x * u_jointMat[int(a_joint.x)] +
a_weight.y * u_jointMat[int(a_joint.y)] +
a_weight.z * u_jointMat[int(a_joint.z)] +
a_weight.w * u_jointMat[int(a_joint.w)];
vec4 pos = u_modelViewMatrix * skinMat * vec4(a_position,1.0);
gl_Position = u_projectionMatrix * pos;
}
The skin matrix is then used to transform the original position of the vertex before it is transformed with the
model-view matrix. The result of this transformation can be imagined as a weighted transformation of the vertices
with the respective joint matrices, as shown in Image 20d.
Image 20d: Computation of the skin matrix.
The result of applying this skin matrix to the vertices for the given example is shown in Image 20e.
Image 20e: The geometry for the skinning example, with outline rendering, during the animation.
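As a final implementation note: on the JavaScript side of a WebGL-based renderer, the joint matrices that have been computed as described above are typically uploaded to the u_jointMat uniform array of the shader as one flat array. The following is a minimal sketch; the names of the program and of the variables are only illustrative.

// Upload the joint matrices to the u_jointMat uniform array of the shader.
// jointMatrices is assumed to contain one 4x4 column-major matrix per joint.
function uploadJointMatrices(
  gl: WebGLRenderingContext,
  program: WebGLProgram,
  jointMatrices: Float32Array[]
): void {
  const location = gl.getUniformLocation(program, "u_jointMat");
  const flat = new Float32Array(jointMatrices.length * 16);
  jointMatrices.forEach((m, i) => flat.set(m, i * 16)); // pack all matrices into one array
  gl.uniformMatrix4fv(location, false, flat);           // transpose must be false in WebGL
}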
Previous: Simple Skin | Table of Contents